Experiments

An experiment in TrackCrumb lets you split your users into two or more variants, show each variant a different experience, and measure which variant converts better on a specific event. Bayesian analysis then gives you a statistically sound answer about which variant wins. You define the experience in your code using a feature flag; TrackCrumb handles bucketing, data collection, the stats engine, and (optionally) shipping the winner to 100% with one click.


Create an experiment

Go to /experiments in the dashboard and click New experiment. Fill in:

| Field | Required | What to put |
| --- | --- | --- |
| Experiment name | Yes | Human-readable label, e.g. “Pricing copy test” |
| Target conversion event | Yes | The event name you will fire when a user converts, e.g. trial_started. Must match exactly what you pass to tracker.track(). |
| Variants | Yes | Default: control / treatment. Click + Add variant to add up to 4 total. Each variant needs a slug key and a display name. The stats engine handles all N arms; the winner is the arm with prob_best ≥ 0.95. |
| Linked flag | No | Pick an existing flag or leave on “Create new flag automatically”. Auto-create generates a flag whose key is a slug of the experiment name (e.g. pricing_copy_test) with a 50% rollout. |
| Holdout % | No | Default 0. A deterministic fraction of users (max 25%) always sees control, never the treatment. Used for long-term regression detection. See Holdouts below. |
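
The auto-created flag key is described as a slug of the experiment name. As a rough illustration only, a key like pricing_copy_test could be derived as below. The exact slug rules are not documented here, and slugifyExperimentName is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper: TrackCrumb's real slug rules may differ.
function slugifyExperimentName(name) {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse runs of non-alphanumerics into "_"
    .replace(/^_+|_+$/g, "");    // drop leading and trailing underscores
}

console.log(slugifyExperimentName("Pricing copy test")); // "pricing_copy_test"
```

If you link the flag automatically, check the generated key in the dashboard before hard-coding it in tracker.flags.getVariant().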

Power calculator

Below the variants editor, the form shows a live Power estimate:

  • Estimated weekly traffic (events/week) — your typical conversion-event volume
  • Baseline conversion rate (%) — current conversion on the control experience
  • Minimum detectable effect (relative %, e.g. 20 = “I want to detect a 20% lift”)

As you change inputs, the panel updates: “≈ X events per arm × N arms = Y total at Z events/week, that’s W weeks.” Use this to set realistic expectations before you launch: many experiments simply don’t have enough traffic to detect small lifts.
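
To get a feel for the numbers the panel produces, here is a minimal sketch using the textbook two-proportion sample-size approximation. This is an assumption for illustration: the formula TrackCrumb's calculator actually uses is not documented here, and the defaults (two-sided 5% significance, 80% power) as well as the function names are hypothetical.

```javascript
// Approximate events needed per arm to detect a relative lift.
// zAlpha = 1.96 (two-sided 5%) and zBeta = 0.8416 (80% power) are assumed defaults.
function eventsPerArm(baselineRate, relativeMde, zAlpha = 1.96, zBeta = 0.8416) {
  const p1 = baselineRate;                     // e.g. 0.05 for a 5% baseline
  const p2 = baselineRate * (1 + relativeMde); // e.g. 0.06 for a 20% relative lift
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p2 - p1) ** 2);
}

// Weeks needed to collect perArm events in every arm at a given weekly volume.
function weeksToRun(perArm, arms, weeklyEvents) {
  return Math.ceil((perArm * arms) / weeklyEvents);
}

const perArm = eventsPerArm(0.05, 0.2);
console.log(perArm);                      // 8156
console.log(weeksToRun(perArm, 2, 2000)); // 9
```

Under these assumptions, a 5% baseline with a 20% relative lift needs roughly eight thousand events per arm, which is why low-traffic experiments can take months to conclude.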

Click Create. The new experiment starts in Draft status. Change the status to Running when you’re ready to collect data.


Wire the SDK

After creating the experiment, add this code wherever you want to split users:

// 1. Bucket the user — returns "control" | "treatment" (or your custom key)
const variant = tracker.flags.getVariant("pricing_copy_test");
 
// 2. Render the right experience
if (variant === "treatment") {
  showNewCopy();
} else {
  showOldCopy();
}
 
// 3. Later, when the user converts, fire the event you named above
//    The SDK automatically attaches experiment_id + variant — you don't add them.
tracker.track("trial_started", { converted: 1 });

Call tracker.flags.getVariant() as early as possible in the render path — ideally before the user sees any UI — to avoid flicker.
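
For experiments with three or four variants, an if/else chain gets unwieldy. One possible pattern is a lookup table keyed by variant slug that falls back to control for any key it does not recognize. pickRenderer and the treatment_b key are hypothetical names used only for this sketch.

```javascript
// Hypothetical helper: maps a variant key to a render function and falls back
// to the control renderer for unknown or missing keys.
function pickRenderer(variant, renderers) {
  const known = Object.prototype.hasOwnProperty.call(renderers, variant);
  return known ? renderers[variant] : renderers.control;
}

const renderers = {
  control: () => "old copy",
  treatment: () => "new copy",
  treatment_b: () => "alternate copy",
};

// In the app this would be:
//   pickRenderer(tracker.flags.getVariant("pricing_copy_test"), renderers)();
console.log(pickRenderer("treatment_b", renderers)()); // "alternate copy"
console.log(pickRenderer("unknown_key", renderers)()); // "old copy"
```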

How attribution works

When you call tracker.flags.getVariant("flag_key"), the SDK:

  1. Fetches the flag config from the API (cached for 5 minutes in sessionStorage).
  2. Deterministically buckets the current user by their distinct_id — the same user always lands in the same variant.
  3. Stores the active experiment_id and variant in memory.
  4. Auto-attaches experiment_id and variant as properties to every subsequent tracker.track() call made in that session.
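
Steps 2 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: the hash (FNV-1a), the 0 to 9999 slot range, the equal split across variants, and the config shape (experimentId, variants) are all hypothetical, and step 1 (fetching and caching the config) is replaced by passing the config in directly.

```javascript
// FNV-1a 32-bit string hash. An assumption for illustration: the hash the
// real SDK uses is not documented here.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

// Deterministic slot in [0, 10000) for a given user and salt.
function bucket(distinctId, salt) {
  return fnv1a(`${salt}:${distinctId}`) % 10000;
}

// Hypothetical stand-in for the tracker; `config` replaces the fetched flag config.
function createSketchTracker(distinctId) {
  let active = null; // step 3: assignment held in memory only
  const sent = [];   // stands in for the network call
  return {
    sent,
    getVariant(flagKey, config) {
      // step 2: the same distinctId + flagKey always yields the same slot
      const slot = bucket(distinctId, flagKey);
      const index = Math.floor((slot * config.variants.length) / 10000);
      active = { experiment_id: config.experimentId, variant: config.variants[index] };
      return active.variant;
    },
    track(event, properties = {}) {
      // step 4: merge the active assignment into every later event
      const payload = { event, properties: { ...properties, ...(active || {}) } };
      sent.push(payload);
      return payload;
    },
  };
}
```

Because the slot depends only on the user id and the flag key, a returning user lands in the same variant without any server-side assignment table.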

You only need to fire tracker.track("your_event", { converted: 1 }) when the conversion happens. You do not need to manually pass experiment_id or variant — the SDK handles it. The converted: 1 property is what the stats engine counts as a conversion.

Holdouts

If you set Holdout % on the experiment, the SDK applies a separate deterministic check before consulting the flag rollout:

  • A user is in the holdout if bucket(distinctId, "<flag_key>:holdout") < holdoutPct * 100.
  • Holdout users always see control, regardless of the flag’s rolloutPct.
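
The ordering of the two checks can be sketched like this. The bucketing function is injected because the SDK's real one is not shown here; the sketch assumes it returns an integer in [0, 10000) and that holdoutPct is a percentage (0 to 25), which makes the holdoutPct * 100 comparison line up. resolveVariant and assignFromFlag are hypothetical names.

```javascript
// Holdout check first, flag rollout second.
// `bucket` is assumed to return an integer in [0, 10000).
function resolveVariant({ distinctId, flagKey, holdoutPct, bucket, assignFromFlag }) {
  const inHoldout = bucket(distinctId, `${flagKey}:holdout`) < holdoutPct * 100;
  if (inHoldout) return "control"; // holdout wins over the flag's rolloutPct
  return assignFromFlag(distinctId, flagKey);
}

// Usage with stub functions standing in for the real bucketing and rollout logic.
const variant = resolveVariant({
  distinctId: "user-42",
  flagKey: "pricing_copy_test",
  holdoutPct: 5,
  bucket: () => 120,                 // pretend this user hashed to slot 120
  assignFromFlag: () => "treatment", // what the flag alone would have returned
});
console.log(variant); // "control", because 120 < 5 * 100
```

Salting the hash with ":holdout" keeps the holdout membership independent of the user's position in the regular rollout.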

Holdouts are useful for long-term regression detection: keep 5-10% of users on the original experience permanently, even after you ship the winner. If downstream metrics start diverging between the holdout and everyone else, you will know, even months later, that the change had an unexpected effect.


Read results

Expand any experiment row to see the Statistical Results panel. The shape depends on whether your experiment has 2 or N arms.

Per-arm breakdown

For every arm:

| Field | Meaning |
| --- | --- |
| n | Sample size (events with this variant) |
| mean | Observed conversion rate |
| 95% CI | Bayesian credible interval on the conversion rate |
| prob_best | Posterior probability that this arm is the best of all arms |

The arm with prob_best ≥ 0.95 is declared the winner. Otherwise the panel says inconclusive.
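
For intuition about what prob_best measures, here is a minimal Monte Carlo sketch. It assumes a Beta(1, 1) prior with binomial conversions per arm; the manual only says the engine is Bayesian, so the model, the number of draws, and every function name below are assumptions rather than TrackCrumb's actual stats engine.

```javascript
// Seeded PRNG (mulberry32) so repeated runs give the same estimate.
function mulberry32(seed) {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller.
function sampleNormal(rand) {
  let u = 0;
  while (u === 0) u = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Gamma(shape, 1) draw via Marsaglia-Tsang; valid for shape >= 1,
// which always holds for the Beta(1 + s, 1 + f) posteriors used below.
function sampleGamma(shape, rand) {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x, v;
    do {
      x = sampleNormal(rand);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = rand();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

function sampleBeta(alpha, beta, rand) {
  const x = sampleGamma(alpha, rand);
  return x / (x + sampleGamma(beta, rand));
}

// Share of posterior draws in which each arm has the highest conversion rate.
function probBest(arms, draws = 20000, seed = 1) {
  const rand = mulberry32(seed);
  const wins = new Array(arms.length).fill(0);
  for (let d = 0; d < draws; d++) {
    let best = 0;
    let bestRate = -1;
    arms.forEach((arm, i) => {
      const rate = sampleBeta(1 + arm.conversions, 1 + arm.n - arm.conversions, rand);
      if (rate > bestRate) {
        bestRate = rate;
        best = i;
      }
    });
    wins[best]++;
  }
  return arms.map((arm, i) => ({ key: arm.key, probBest: wins[i] / draws }));
}

console.log(probBest([
  { key: "control", n: 1000, conversions: 50 },
  { key: "treatment", n: 1000, conversions: 90 },
]));
```

With a gap this large (5% versus 9% on 1,000 events each) the treatment arm's estimate lands well above 0.95; with closer rates or less data, no arm may reach the threshold.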

Sample-ratio mismatch (SRM)

If the actual traffic split deviates significantly from the configured weights (chi-square p < 0.001), a yellow SRM banner appears: “Sample ratio mismatch detected — actual split differs from expected (p=…). Check for bucketing bugs or traffic source bias before trusting results.”

When SRM is detected, the nightly recompute job automatically pauses the experiment and writes an audit-log entry. You’ll see a red “Auto-paused — sample ratio mismatch detected” banner with a manual Resume button. Don’t resume until you’ve fixed the underlying bias.
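
The chi-square check described above can be approximated in a few lines. Instead of computing an exact p-value, this sketch compares the statistic against the standard chi-square critical values for p = 0.001 at 1 to 3 degrees of freedom, which covers the 2 to 4 variants the UI allows. detectSrm is a hypothetical name, and the nightly job's actual implementation is not shown in this manual.

```javascript
// Chi-square critical values at p = 0.001, keyed by degrees of freedom.
const CRITICAL_P001 = { 1: 10.828, 2: 13.816, 3: 16.266 };

// observed: event counts per arm; weights: configured split (any positive scale).
function detectSrm(observed, weights) {
  const df = observed.length - 1;
  if (!(df in CRITICAL_P001) || weights.length !== observed.length) {
    throw new Error("expected 2 to 4 arms with one weight per arm");
  }
  const total = observed.reduce((sum, n) => sum + n, 0);
  const weightSum = weights.reduce((sum, w) => sum + w, 0);
  let chi2 = 0;
  observed.forEach((count, i) => {
    const expected = (total * weights[i]) / weightSum;
    chi2 += (count - expected) ** 2 / expected;
  });
  return { chi2, srm: chi2 > CRITICAL_P001[df] };
}

console.log(detectSrm([5100, 4900], [1, 1])); // chi2 = 4, srm: false
console.log(detectSrm([5300, 4700], [1, 1])); // chi2 = 36, srm: true
```

Note how strict the threshold is: a 51/49 split on 10,000 events does not trigger it, while 53/47 does.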

Power warning

If the experiment hasn’t collected enough data yet (< 100 events per arm OR < 300 total), a blue “still gathering” banner shows: “Still gathering data — N events so far. Need ~M more for 80% power to detect a 20% lift on a 5% baseline.” The winner is forced to “inconclusive” below this threshold to prevent false declarations.
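
Combining the data threshold with the prob_best ≥ 0.95 rule, the winner decision can be sketched as below. The thresholds come from this page; the function name, the input shape, and the reading of “< 100 events per arm” as “any arm below 100” are assumptions.

```javascript
// arms: [{ key, n, probBest }]. Returns the winning key or "inconclusive".
function declareWinner(arms, { minPerArm = 100, minTotal = 300, threshold = 0.95 } = {}) {
  const total = arms.reduce((sum, arm) => sum + arm.n, 0);
  // Below either data threshold, the result is forced to inconclusive.
  const underpowered = total < minTotal || arms.some((arm) => arm.n < minPerArm);
  if (underpowered) return "inconclusive";
  const winner = arms.find((arm) => arm.probBest >= threshold);
  return winner ? winner.key : "inconclusive";
}

console.log(declareWinner([
  { key: "control", n: 120, probBest: 0.02 },
  { key: "treatment", n: 130, probBest: 0.98 },
])); // "inconclusive": 250 total events is under the 300 minimum
```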

Segment breakdown

Above the per-arm table, a “Break down by” dropdown lets you slice results by an event property:

  • Country ($country) — works out of the box (uses the dedicated ClickHouse country column populated from IP geolocation).
  • OS ($os) — requires you to attach $os as a property on your tracked events.
  • Browser ($browser) — same as OS.
  • Custom… — type any property key.

The panel then shows a per-segment-value sub-table. Useful for finding “treatment wins overall, but loses on mobile” patterns before you ship.
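
Since $os and $browser must be attached by your own code, one way to do it is a small helper that derives them from the user-agent string. This is a rough heuristic sketch: the helper names are hypothetical, and the label values (“Windows”, “Chrome”, and so on) are assumptions, since the manual does not prescribe a vocabulary. Whatever labels you choose, keep them consistent so segments group cleanly.

```javascript
// Heuristic OS detection; iOS and Android are checked first because their
// user-agent strings also mention "Mac OS X" and "Linux" respectively.
function detectOs(ua) {
  if (/iPhone|iPad|iPod/.test(ua)) return "iOS";
  if (/Android/.test(ua)) return "Android";
  if (/Windows/.test(ua)) return "Windows";
  if (/Mac OS X/.test(ua)) return "macOS";
  if (/Linux/.test(ua)) return "Linux";
  return "Other";
}

// Heuristic browser detection; Edge and Chrome are checked before Safari
// because their user-agent strings also contain "Safari/".
function detectBrowser(ua) {
  if (/Edg\//.test(ua)) return "Edge";
  if (/Firefox\//.test(ua)) return "Firefox";
  if (/Chrome\//.test(ua)) return "Chrome";
  if (/Safari\//.test(ua)) return "Safari";
  return "Other";
}

// Adds $os and $browser to an event's properties.
function withClientContext(properties, ua) {
  return { ...properties, $os: detectOs(ua), $browser: detectBrowser(ua) };
}

// In the browser:
//   tracker.track("trial_started", withClientContext({ converted: 1 }, navigator.userAgent));
```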


Apply the winner — one click

When the results panel shows a winner with prob_best ≥ 0.95 and a non-null linked flag, a green “Apply winner →” button appears next to the winner’s name. Click it:

  1. Confirmation modal: “This will route 100% of users to ‘<winnerKey>’. The experiment will be marked completed.”
  2. Confirm → the linked flag’s rolloutPct is set to 100, the flag is enabled, and the experiment status flips to Completed.
  3. An audit-log entry is written: experiment.apply_winner with the winner key and flag id in metadata.

You can also do it manually: change status to Completed, then go to /flags and set rollout to 100%.

After the winner is applied, you can safely remove the if (variant === "treatment") branch from your code in your next release.


Audit log

Every action on an experiment is recorded. Open the History side-panel on the experiment detail page to see:

  • experiment.created
  • experiment.status_changed (any status flip — including auto-pause from SRM)
  • experiment.variants_changed
  • experiment.apply_winner
  • experiment.auto_paused_srm

Entries are listed most recent first, and up to 50 are shown.


Limits and gotchas

  • Variant cap: 2-4 variants per experiment. Larger N is on the roadmap but not exposed in the UI yet.
  • 30-day window: The ClickHouse query that feeds the stats engine looks at events from the past 30 days. For experiments that run longer than 30 days, results reflect only the most recent 30 days of data.
  • Recompute cadence: Results refresh on-demand (when you expand the panel). A nightly cron also runs and triggers auto-pause on SRM.
  • $os and $browser segments: Not auto-emitted by the SDK. Pass them yourself on tracker.track() if you want to slice by them. $country is auto-populated from request IP geolocation server-side.
  • Pausing: Pausing prevents the auto-pause cron from acting again, but the on-demand results panel continues to compute fresh stats. Use Pause + an explicit comment in the audit log for a “frozen” experiment.

Run experiments in pairs with the same target event. After applying winner #1, start experiment #2 testing the next iteration on top — that’s how you compound lifts month over month.