Experiments
An experiment in TrackCrumb lets you split your users into two or more variants, show each variant a different experience, measure which variant converts better on a specific event, and get a statistically sound answer — using Bayesian analysis — about which variant wins. You define the experience in your code using a feature flag; TrackCrumb handles bucketing, data collection, the stats engine, and (optionally) shipping the winner to 100% with one click.
Create an experiment
Go to /experiments in the dashboard and click New experiment. Fill in:
| Field | Required | What to put |
|---|---|---|
| Experiment name | Yes | Human-readable label, e.g. “Pricing copy test” |
| Target conversion event | Yes | The event name you will fire when a user converts, e.g. trial_started. Must match exactly what you pass to tracker.track(). |
| Variants | Yes | Default: control / treatment. Click + Add variant to add up to 4 total. Each variant needs a slug key and a display name. The stats engine handles all N arms; the winner is the arm with prob_best ≥ 0.95. |
| Linked flag | No | Pick an existing flag or leave on “Create new flag automatically”. Auto-create generates a flag whose key is a slug of the experiment name (e.g. pricing_copy_test) with a 50% rollout. |
| Holdout % | No | Default 0. A deterministic fraction of users (max 25%) always sees control, never the treatment. Used for long-term regression detection — see Holdouts below. |
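The auto-created flag key is described only as “a slug of the experiment name.” As a rough illustration, here is one plausible slug rule: lowercase the name and collapse every run of non-alphanumeric characters into a single underscore. The helper name toFlagKey is hypothetical, and TrackCrumb’s real rule (for accented characters, length limits, or key collisions) may differ.

```javascript
// Hypothetical helper: one plausible slug rule, not TrackCrumb's documented algorithm.
function toFlagKey(experimentName) {
  return experimentName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse each run of other characters into "_"
    .replace(/^_+|_+$/g, "");    // trim leading and trailing underscores
}

console.log(toFlagKey("Pricing copy test")); // pricing_copy_test
```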
Power calculator
Below the variants editor, the form shows a live Power estimate:
- Estimated weekly traffic (events/week) — your typical conversion-event volume
- Baseline conversion rate (%) — current conversion on the control experience
- Minimum detectable effect (relative %, e.g. 20 = “I want to detect a 20% lift”)
As you change inputs, the panel updates: “≈ X events per arm × N arms = Y total; at Z events/week, that’s W weeks.” Use this to set realistic expectations before you launch: many experiments simply don’t have enough traffic to detect small lifts.
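The page doesn’t state the formula behind the Power estimate. As a ballpark sketch only, the classic two-proportion sample-size approximation (two-sided α = 0.05, 80% power) produces the same per-arm → total → weeks chain. TrackCrumb’s Bayesian engine may compute this differently, and estimatePower and its parameter names are hypothetical.

```javascript
// Sketch: frequentist sample-size approximation for comparing two conversion
// rates. An assumption for illustration, not TrackCrumb's documented formula.
const Z_ALPHA_2 = 1.959964; // two-sided alpha = 0.05
const Z_BETA = 0.841621;    // power = 0.80

function estimatePower({ weeklyTraffic, baselinePct, mdeRelativePct, arms = 2 }) {
  const p1 = baselinePct / 100;               // control conversion rate
  const p2 = p1 * (1 + mdeRelativePct / 100); // lifted rate you want to detect
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const perArm = Math.ceil(((Z_ALPHA_2 + Z_BETA) ** 2 * variance) / (p2 - p1) ** 2);
  const total = perArm * arms;                // "× N arms = Y total"
  return { perArm, total, weeks: total / weeklyTraffic };
}

const est = estimatePower({ weeklyTraffic: 2000, baselinePct: 5, mdeRelativePct: 20 });
console.log(`≈ ${est.perArm} events per arm × 2 arms = ${est.total} total, ${est.weeks.toFixed(1)} weeks`);
```

With a 5% baseline and a 20% relative lift, this sketch lands a little above 8,000 events per arm; halving the MDE roughly quadruples the requirement, which is why small lifts are so expensive to detect.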
Click Create. The experiment is in Draft. Change status to Running when you’re ready to collect data.
Wire the SDK
After creating the experiment, add this code wherever you want to split users:
```javascript
// 1. Bucket the user: returns "control" | "treatment" (or your custom variant key)
const variant = tracker.flags.getVariant("pricing_copy_test");

// 2. Render the right experience
if (variant === "treatment") {
  showNewCopy();
} else {
  showOldCopy();
}

// 3. Later, when the user converts, fire the event you named above.
//    The SDK automatically attaches experiment_id + variant; you don't add them.
tracker.track("trial_started", { converted: 1 });
```

Call tracker.flags.getVariant() as early as possible in the render path, ideally before the user sees any UI, to avoid flicker.
How attribution works
When you call tracker.flags.getVariant("flag_key"), the SDK:
- Fetches the flag config from the API (cached for 5 minutes in sessionStorage).
- Deterministically buckets the current user by their distinct_id, so the same user always lands in the same variant.
- Stores the active experiment_id and variant in memory.
- Auto-attaches experiment_id and variant as properties to every subsequent tracker.track() call made in that session.
You only need to fire tracker.track("your_event", { converted: 1 }) when the conversion happens. You do not need to manually pass experiment_id or variant — the SDK handles it. The converted: 1 property is what the stats engine counts as a conversion.
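The steps above can be sketched in a few lines. The page doesn’t say which hash TrackCrumb uses, so this version uses 32-bit FNV-1a purely for illustration, maps it onto an assumed 0 to 9999 bucket range, and splits variants with equal weights. bucket, assignVariant, and MiniTracker are hypothetical names; MiniTracker is a toy stand-in for the SDK (no network fetch, no sessionStorage cache), not its API.

```javascript
// Hypothetical bucketing: 32-bit FNV-1a over "<key>:<distinctId>", mapped to 0..9999.
// TrackCrumb's real hash function and bucket range are not documented here.
function bucket(distinctId, key) {
  let h = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(`${key}:${distinctId}`)) {
    h = Math.imul(h ^ byte, 0x01000193);
  }
  return (h >>> 0) % 10000;
}

// Equal-weight split across the variant keys (assumption: real flags may use weights).
function assignVariant(distinctId, flagKey, variants) {
  return variants[Math.floor((bucket(distinctId, flagKey) * variants.length) / 10000)];
}

// Toy stand-in for the SDK: remembers the active experiment and merges it into
// every later track() call. The real getVariant() takes only the flag key and
// reads experiment_id and the variants from the fetched flag config.
class MiniTracker {
  constructor(distinctId) {
    this.distinctId = distinctId;
    this.active = null; // { experiment_id, variant } after a flag is evaluated
    this.sent = [];     // stands in for network calls
  }

  getVariant(flagKey, experimentId, variants) {
    const variant = assignVariant(this.distinctId, flagKey, variants);
    this.active = { experiment_id: experimentId, variant };
    return variant;
  }

  track(event, props = {}) {
    // In this sketch, explicitly passed props override the auto-attached ones.
    this.sent.push({ event, properties: { ...this.active, ...props } });
  }
}

const t = new MiniTracker("user_123");
const v = t.getVariant("pricing_copy_test", "exp_1", ["control", "treatment"]);
t.track("trial_started", { converted: 1 });
// The sent properties include experiment_id "exp_1", the assigned variant, and converted: 1.
console.log(t.sent[0].properties);
```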
Holdouts
If you set Holdout % on the experiment, the SDK applies a separate deterministic check before consulting the flag rollout:
- A user is in the holdout if bucket(distinctId, "<flag_key>:holdout") < holdoutPct * 100.
- Holdout users always see control, regardless of the flag’s rolloutPct.
Holdouts are useful for long-term regression detection: keep 5-10% of users on the original experience permanently, even after you ship the winner. If your downstream metrics start diverging between the holdout and everyone else, you’ll know the change had an unexpected effect, even months later.
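The holdout-then-rollout order described above can be sketched as follows. This assumes bucket() returns an integer from 0 to 9999 and that holdoutPct and rolloutPct are percentages, so holdoutPct * 100 turns 5% into 500 of 10,000 buckets. The FNV-1a hash, resolveVariant, and the config object are hypothetical, and only the two-variant case is modeled.

```javascript
// Hypothetical bucketing helper: 32-bit FNV-1a over "<key>:<distinctId>", mapped to 0..9999.
function bucket(distinctId, key) {
  let h = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(`${key}:${distinctId}`)) {
    h = Math.imul(h ^ byte, 0x01000193);
  }
  return (h >>> 0) % 10000;
}

// Assumes holdoutPct and rolloutPct are percentages (0-100), so `pct * 100`
// converts them into a count of the 10,000 buckets.
function resolveVariant(distinctId, { flagKey, holdoutPct, rolloutPct }) {
  // 1. Holdout check first, with its own salt so it is independent of the rollout bucket.
  if (bucket(distinctId, `${flagKey}:holdout`) < holdoutPct * 100) {
    return { variant: "control", inHoldout: true };
  }
  // 2. Otherwise the flag rollout decides (assumed: bucket below the rollout → treatment).
  const variant = bucket(distinctId, flagKey) < rolloutPct * 100 ? "treatment" : "control";
  return { variant, inHoldout: false };
}

const cfg = { flagKey: "pricing_copy_test", holdoutPct: 5, rolloutPct: 100 };
console.log(resolveVariant("user_123", cfg));
```

Even with rolloutPct at 100 after the winner ships, the roughly 5% of users whose holdout bucket falls below 500 keep seeing control, which is what keeps the long-term comparison possible.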
Read results
Expand any experiment row to see the Statistical Results panel. The shape depends on whether your experiment has 2 or N arms.
Per-arm breakdown
For every arm:
| Field | Meaning |
|---|---|
| n | Sample size (events with this variant) |
| mean | Observed conversion rate |
| 95% CI | Bayesian credible interval on the conversion rate |
| prob_best | Posterior probability that this arm is the best of all arms |
The arm with prob_best ≥ 0.95 is declared the winner. Otherwise the panel says inconclusive.
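The page says the analysis is Bayesian but doesn’t name the model. A common choice, assumed here, is a Beta-Binomial model with a uniform Beta(1, 1) prior per arm, with prob_best and the 95% credible interval estimated by Monte Carlo draws from each arm’s posterior. analyzeArms and its input shape are hypothetical; the seeded PRNG only makes the sketch reproducible.

```javascript
// Seeded PRNG (mulberry32) so repeated runs give identical estimates.
function mulberry32(seed) {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller.
function randNormal(rand) {
  const u1 = 1 - rand(); // in (0, 1], safe for Math.log
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * rand());
}

// Gamma(shape, 1) via Marsaglia-Tsang; valid for shape >= 1, which always
// holds here because both Beta parameters are counts + 1.
function randGamma(shape, rand) {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x, v;
    do {
      x = randNormal(rand);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = 1 - rand();
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

function randBeta(a, b, rand) {
  const x = randGamma(a, rand);
  return x / (x + randGamma(b, rand));
}

// arms: [{ key, n, conversions }]
function analyzeArms(arms, draws = 20000, seed = 1) {
  const rand = mulberry32(seed);
  const samples = arms.map(() => new Float64Array(draws));
  const wins = new Array(arms.length).fill(0);
  for (let i = 0; i < draws; i++) {
    let best = 0;
    arms.forEach((arm, k) => {
      // Posterior of the conversion rate under a Beta(1, 1) prior.
      samples[k][i] = randBeta(arm.conversions + 1, arm.n - arm.conversions + 1, rand);
      if (samples[k][i] > samples[best][i]) best = k;
    });
    wins[best] += 1; // this draw's best arm
  }
  return arms.map((arm, k) => {
    const sorted = samples[k].slice().sort(); // typed-array sort is numeric
    return {
      key: arm.key,
      n: arm.n,
      mean: arm.conversions / arm.n,
      ci95: [sorted[Math.floor(0.025 * (draws - 1))], sorted[Math.floor(0.975 * (draws - 1))]],
      probBest: wins[k] / draws,
    };
  });
}

const results = analyzeArms([
  { key: "control", n: 1000, conversions: 100 },
  { key: "treatment", n: 1000, conversions: 150 },
]);
const winner = results.find((r) => r.probBest >= 0.95);
console.log(winner ? `winner: ${winner.key}` : "inconclusive");
```

With 100/1,000 vs 150/1,000 conversions, the treatment’s prob_best in this sketch comes out well above 0.95; with two identical arms, each sits near 0.5 and the result is inconclusive.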
Sample-ratio mismatch (SRM)
If the actual traffic split deviates significantly from the configured weights (chi-square p < 0.001), a yellow SRM banner appears: “Sample ratio mismatch detected — actual split differs from expected (p=…). Check for bucketing bugs or traffic source bias before trusting results.”
When SRM is detected, the nightly recompute job automatically pauses the experiment and writes an audit-log entry. You’ll see a red “Auto-paused — sample ratio mismatch detected” banner with a manual Resume button. Don’t resume until you’ve fixed the underlying bias.
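The SRM check is described as a chi-square test at p < 0.001. Because experiments are capped at 2 to 4 arms (1 to 3 degrees of freedom), a sketch can compare the statistic against fixed critical values instead of computing a p-value. detectSrm and its inputs are hypothetical; the dashboard banner reports the p-value itself.

```javascript
// Chi-square critical values at p = 0.001 for 1-3 degrees of freedom,
// i.e. 2-4 arms, matching the variant cap.
const CHI2_CRIT_P001 = { 1: 10.828, 2: 13.816, 3: 16.266 };

// observed: events per arm; weights: configured split, e.g. [50, 50].
function detectSrm(observed, weights) {
  const df = observed.length - 1;
  if (observed.length !== weights.length || !(df in CHI2_CRIT_P001)) {
    throw new Error("expected 2-4 arms with one weight per arm");
  }
  const total = observed.reduce((sum, x) => sum + x, 0);
  const weightSum = weights.reduce((sum, w) => sum + w, 0);
  let chi2 = 0;
  observed.forEach((obs, i) => {
    const expected = (total * weights[i]) / weightSum;
    chi2 += (obs - expected) ** 2 / expected;
  });
  // chi2 above the critical value is equivalent to p < 0.001.
  return { chi2, srm: chi2 > CHI2_CRIT_P001[df] };
}

console.log(detectSrm([5000, 4500], [50, 50])); // srm: true (chi2 ≈ 26.3)
```

The test runs on traffic counts, not conversions: a 5,000 vs 4,500 split on a configured 50/50 is far too lopsided to be chance, while 1,010 vs 990 is not.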
Power warning
If the experiment hasn’t collected enough data yet (< 100 events per arm OR < 300 total), a blue “still gathering” banner shows: “Still gathering data — N events so far. Need ~M more for 80% power to detect a 20% lift on a 5% baseline.” The winner is forced to “inconclusive” below this threshold to prevent false declarations.
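Combined with the prob_best ≥ 0.95 rule, the gating above can be sketched as a small decision function. The thresholds mirror the documented values (100 events per arm, 300 total); decideWinner and the { key, n, probBest } shape are hypothetical.

```javascript
// Documented thresholds; the function and result shape are hypothetical.
const MIN_EVENTS_PER_ARM = 100;
const MIN_EVENTS_TOTAL = 300;
const PROB_BEST_THRESHOLD = 0.95;

// arms: [{ key, n, probBest }]
function decideWinner(arms) {
  const total = arms.reduce((sum, arm) => sum + arm.n, 0);
  if (arms.some((arm) => arm.n < MIN_EVENTS_PER_ARM) || total < MIN_EVENTS_TOTAL) {
    // Below the floor, the winner is forced to "inconclusive" whatever prob_best says.
    return { winner: "inconclusive", stillGathering: true, total };
  }
  const best = arms.find((arm) => arm.probBest >= PROB_BEST_THRESHOLD);
  return { winner: best ? best.key : "inconclusive", stillGathering: false, total };
}

// 290 events total: each arm clears 100, but the 300-event total floor does not.
console.log(decideWinner([
  { key: "control", n: 150, probBest: 0.01 },
  { key: "treatment", n: 140, probBest: 0.99 },
]).winner); // inconclusive
```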
Segment breakdown
Above the per-arm table, a “Break down by” dropdown lets you slice results by an event property:
- Country ($country): works out of the box (uses the dedicated ClickHouse country column populated from IP geolocation).
- OS ($os): requires you to attach $os as a property on your tracked events.
- Browser ($browser): same as OS.
- Custom…: type any property key.
The panel then shows a per-segment-value sub-table. Useful for finding “treatment wins overall, but loses on mobile” patterns before you ship.
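Conceptually, a breakdown groups the experiment’s events by the chosen property and by variant, then computes a conversion rate per cell. The sketch below assumes each event carries variant, converted, and the segment property as flat properties, and that events without a variant are not part of the experiment. breakdown is a hypothetical helper; the real aggregation runs server-side.

```javascript
// Hypothetical sketch: events -> { segmentValue: { variant: { n, conversions, rate } } }
function breakdown(events, property) {
  const table = {};
  for (const { properties: p } of events) {
    if (p.variant === undefined) continue;      // not attributed to the experiment
    const segment = p[property] ?? "(not set)"; // e.g. $os that was never sent
    table[segment] ??= {};
    const cell = (table[segment][p.variant] ??= { n: 0, conversions: 0, rate: 0 });
    cell.n += 1;
    if (p.converted === 1) cell.conversions += 1;
    cell.rate = cell.conversions / cell.n;
  }
  return table;
}

const sample = [
  { properties: { variant: "control", converted: 1, $os: "iOS" } },
  { properties: { variant: "treatment", converted: 0, $os: "iOS" } },
  { properties: { variant: "treatment", converted: 1, $os: "Windows" } },
  { properties: { variant: "control", converted: 0 } }, // $os never attached
];
console.log(breakdown(sample, "$os"));
```

In this sketch, events sent without the property land in a “(not set)” bucket, which is what an $os breakdown would mostly show unless you pass it yourself, for example tracker.track("trial_started", { converted: 1, $os: "iOS" }).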
Apply the winner — one click
When the results panel shows a winner (prob_best ≥ 0.95) and the experiment has a linked flag, a green “Apply winner →” button appears next to the winner’s name. Click it:
- Confirmation modal: “This will route 100% of users to ‘<winnerKey>’. The experiment will be marked completed.”
- Confirm → the linked flag’s rolloutPct is set to 100, the flag is enabled, and the experiment status flips to Completed.
- An audit-log entry is written: experiment.apply_winner, with the winner key and flag id in metadata.
You can also do it manually: change status to Completed, then go to /flags and set rollout to 100%.
After the winner is applied, you can safely remove the if (variant === "treatment") branch from your code in your next release.
Audit log
Every action on an experiment is recorded. Open the History side-panel on the experiment detail page to see:
- experiment.created
- experiment.status_changed (any status flip, including auto-pause from SRM)
- experiment.variants_changed
- experiment.apply_winner
- experiment.auto_paused_srm
Most-recent first. Up to 50 entries shown.
Limits and gotchas
- Variant cap: 2-4 variants per experiment. Larger N is on the roadmap but not exposed in the UI yet.
- 30-day window: The ClickHouse query that feeds the stats engine looks at events from the past 30 days. Experiments running longer than 30 days only see the most recent 30 days.
- Recompute cadence: Results refresh on-demand (when you expand the panel). A nightly cron also runs and triggers auto-pause on SRM.
- $os and $browser segments: Not auto-emitted by the SDK. Pass them yourself on tracker.track() if you want to slice by them. $country is auto-populated server-side from request IP geolocation.
- Pausing: Pausing prevents the auto-pause cron from acting again, but the on-demand results panel continues to compute fresh stats. Use Pause + an explicit comment in the audit log for a “frozen” experiment.
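To make the 30-day window concrete, here is the cutoff arithmetic for an experiment that has been running for 45 days. It assumes a rolling window of exactly 30 × 24 hours ending at query time; the real boundary (calendar days, time zone) isn’t documented, and windowStart and countedEvents are hypothetical helpers.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const WINDOW_MS = 30 * DAY_MS; // assumed rolling window ending at query time

// Only events at or after this instant feed the stats engine.
function windowStart(now) {
  return new Date(now.getTime() - WINDOW_MS);
}

function countedEvents(events, now) {
  const start = windowStart(now).getTime();
  return events.filter((e) => e.timestamp.getTime() >= start);
}

// An experiment started 45 days ago, with one event per day (at mid-day):
const now = new Date("2024-06-01T00:00:00Z");
const events = Array.from({ length: 45 }, (_, d) => ({
  timestamp: new Date(now.getTime() - (d + 0.5) * DAY_MS),
}));
console.log(countedEvents(events, now).length); // 30: the first 15 days are ignored
```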
Run experiments in sequence against the same target event. After applying winner #1, start experiment #2 to test the next iteration on top of it; that’s how you compound lifts month over month.