Impact Estimation
Impact estimation forecasts the revenue lift a winning A/B test would deliver — the input that decides which experiments are worth running. Here's the formula, typical ranges, and how to use it.
Forecasting the expected revenue lift from a winning A/B test before you run it, used to decide which experiments are worth the queue slot.
Impact estimation is the practice of projecting how much extra revenue a test would generate if its winning variant rolled out to 100% of traffic. The classic form multiplies five levers: the expected effect size (relative conversion lift), the baseline conversion rate of the page or flow, the audience size exposed to the change, the average order value of that audience, and the time window over which the lift accrues.
It is the quantitative input to almost every prioritization framework — ICE, PIE, PXL — and it is what separates a test queue ranked by gut from one ranked by expected euros. A well-built estimate is conservative on effect size, honest about audience reach, and explicit about the runtime assumption baked in.
Most CRO programs lose money not on losing tests but on winning tests that were never worth running. A 4% lift on a page that gets 800 sessions a month is a rounding error; the same lift on checkout pays for the entire tool stack. Impact estimation forces that distinction before a developer touches the variant.
It is also the bridge to finance. When you tell a Head of E-commerce that a checkout test is queued, the next question is always "how much?" — and a defensible estimate (with assumptions written down) is what keeps the experimentation roadmap funded. It feeds directly into experiment prioritization as the I or the Impact term in whichever scoring model your team uses.
Estimated Lift (€) = Effect Size × Baseline Conversion Rate × Audience Size × AOV × Runtime
| Term | Short label | Definition |
|---|---|---|
| Effect Size | Relative lift | Expected % uplift in conversion rate from the winning variant (e.g. 0.05 for a 5% lift) |
| Baseline Conversion Rate | Current CVR | Conversion rate of the page or flow being tested, as a decimal |
| Audience Size | Eligible visitors | Sessions per period that will encounter the change once rolled out |
| AOV | Average order value | Mean revenue per converting session for the audience segment |
| Runtime | Forecast window | Number of periods you are projecting over (e.g. 12 for annualized) |
An apparel Shopify store estimates the annual lift from a product-page test on dresses.
Effect Size: 0.06 (6% relative lift)
Baseline Conversion Rate: 0.022 (2.2%)
Audience Size: 120,000 sessions/month
AOV: €78
Runtime: 12 months
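→ €148,262 annualized estimated lift
A 6% relative lift on a 2.2% baseline becomes 2.33%, adding ~158 orders per month at €78 AOV, or about €12,355 per month. Over 12 months that's just under €150k in the winning case, before any hit-rate or reach adjustment. That comfortably justifies a one-week test; a six-week build should be weighed against the hit-rate-adjusted figure rather than the winning-case number.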
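The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch of the formula as stated; the function name `estimated_lift` and its parameter names are illustrative, not part of any tool. Multiplying the five inputs listed for the apparel store gives about €12,355 per month and about €148,262 over 12 months.

```python
def estimated_lift(effect_size, baseline_cvr, audience_size, aov, runtime):
    """Estimated Lift = Effect Size x Baseline CVR x Audience Size x AOV x Runtime."""
    # Extra orders per period: relative lift applied to the baseline order volume.
    extra_orders_per_period = effect_size * baseline_cvr * audience_size
    # Convert orders to revenue and project over the forecast window.
    return extra_orders_per_period * aov * runtime


# Inputs from the apparel-store example (sessions are per month).
monthly = estimated_lift(0.06, 0.022, 120_000, 78, runtime=1)
annual = estimated_lift(0.06, 0.022, 120_000, 78, runtime=12)

print(f"Monthly lift: EUR {monthly:,.0f}")
print(f"Annual lift:  EUR {annual:,.0f}")
```

Keeping `runtime` as an explicit argument makes the forecast window visible in every call, which matches the advice to state the window with each estimate.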
Pick effect sizes from a defensible distribution, not optimism. The honest range for most page-level changes on an established store is 2-8% relative lift; structural changes to checkout or PDP can reach 10-15% but those are the exception, not the planning assumption. Use the table below as a sanity check before you commit to a forecast.
Typical relative conversion lift by test type (winning variants only)
| Test type | Median lift | Top-quartile lift | Hit rate |
|---|---|---|---|
| Headline / copy on PDP | 2-4% | 6-8% | ~25% |
| Pricing display & urgency | 3-6% | 9-12% | ~30% |
| Product-page layout / imagery | 4-7% | 10-14% | ~22% |
| Cart & checkout UX | 5-9% | 12-18% | ~35% |
| Navigation & category pages | 2-5% | 7-10% | ~20% |
| Add-to-cart button & PDP CTA | 3-5% | 8-11% | ~28% |
Two adjustments separate amateur estimates from credible ones. First, multiply by the historical hit rate of similar tests on your store — if only 30% of checkout tests win, the expected value is 0.3 × the winning-case lift. Second, discount audience size by the fraction of traffic that actually sees the change (mobile-only tests, geo-targeted tests, returning-visitor tests all need this cut applied).
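Both adjustments are simple multipliers, so they can be sketched in a few lines. The function below is an illustration under stated assumptions: `expected_value`, `hit_rate`, and `reach_fraction` are hypothetical names, and the €100,000 winning-case figure and 50% mobile share are made-up inputs. Because the formula is linear in audience size, discounting the audience by the reach fraction is the same as multiplying the final lift by it.

```python
def expected_value(winning_case_lift, hit_rate, reach_fraction=1.0):
    """Discount a winning-case forecast by win probability and actual reach."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= reach_fraction <= 1.0:
        raise ValueError("reach_fraction must be between 0 and 1")
    # Reach cut first (who actually sees the change), then probability of a win.
    return winning_case_lift * reach_fraction * hit_rate


# Hypothetical checkout test: EUR 100,000 winning-case lift on full traffic,
# 30% of similar tests won historically, and the change is mobile-only,
# so an assumed 50% of sessions see it.
ev = expected_value(100_000, hit_rate=0.30, reach_fraction=0.50)
print(f"Expected value: EUR {ev:,.0f}")
```

In this made-up case the planning number is EUR 15,000, not EUR 100,000, which is the gap the two adjustments are meant to expose.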
Frequently asked questions
Prioritization is the ranking decision; impact estimation is one of the inputs that feeds it. Frameworks like ICE and PIE multiply impact by confidence and ease (or similar) to produce a score — without a credible impact number, the score is just opinion.
If you have no test history of your own, use a conservative anchor from public CRO benchmarks for that test type, usually 3-5% relative lift for a well-designed change on an established store. Then halve it for your planning estimate. You can revise up once you have your own historical win rates.
Report both the winning-case lift and the expected value. Show stakeholders the winning-case lift (the upside) and the expected value adjusted for hit rate (winning lift × probability of winning). The first sells the test; the second is what you actually plan revenue against.
AOV converts the extra orders into euros. If the test changes basket size too (e.g. a cross-sell), model that as a separate AOV uplift term — don't compound it into the conversion-rate effect size, or you'll double-count.
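One way to keep the two effects apart is to compute a conversion term and a basket term separately and then add them. The sketch below is an assumption about how to structure that split, not a prescribed method: `lift_components` is a hypothetical helper, the 3% basket uplift is an invented input, and the basket uplift is assumed to apply to every order placed after the conversion lift.

```python
def lift_components(sessions, cvr, aov, cvr_lift, aov_lift, periods):
    """Split incremental revenue into a conversion term and a basket-size term."""
    baseline_orders = sessions * cvr * periods
    extra_orders = baseline_orders * cvr_lift
    # Conversion term: extra orders valued at the unchanged AOV.
    conversion_term = extra_orders * aov
    # Basket term: AOV uplift applied to all orders after the conversion lift
    # (assumption: every order benefits from the larger basket).
    basket_term = (baseline_orders + extra_orders) * aov * aov_lift
    return conversion_term, basket_term


# Apparel-store inputs plus a hypothetical 3% basket uplift from a cross-sell.
conv, basket = lift_components(
    sessions=120_000, cvr=0.022, aov=78, cvr_lift=0.06, aov_lift=0.03, periods=12
)
print(f"Conversion term: EUR {conv:,.0f}")
print(f"Basket term:     EUR {basket:,.0f}")
print(f"Total:           EUR {conv + basket:,.0f}")
```

The two terms sum to the full difference between new and baseline revenue, so nothing is counted twice and each assumption can be challenged on its own.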
Use 12 months for annualized impact when discussing ROI with finance, and use the realistic shelf-life of the change for prioritization (some seasonal tests only matter for 8 weeks). Always state the window explicitly in the estimate.
Checkout pages see a smaller audience, but their higher baseline conversion rate and higher AOV usually make checkout tests the highest-impact slot in the queue. Run the formula end-to-end rather than discounting by traffic alone; the math often surprises teams.
For acquisition-stage tests, customer lifetime value (LTV) can replace AOV: a conversion lift on a first-time buyer is worth their full LTV, not one order. For repeat-purchase or upsell tests, AOV is the right unit. Don't mix the two in the same estimate.
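A small comparison shows why running the full formula matters. The sketch below ranks two ideas by annual winning-case lift; every number in it is hypothetical, the `Idea` class is an illustrative name, and the checkout idea's baseline conversion rate is assumed to be the checkout-start-to-order rate for sessions that reach checkout.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    effect_size: float       # relative conversion lift, as a decimal
    baseline_cvr: float      # conversion rate of the tested page or step
    sessions_per_month: int  # sessions that would see the change
    aov: float               # average order value in EUR

    def annual_lift(self, months=12):
        return (self.effect_size * self.baseline_cvr
                * self.sessions_per_month * self.aov * months)


# Hypothetical inputs: the checkout test has far less traffic than the PDP test.
ideas = [
    Idea("PDP layout", 0.05, 0.022, 120_000, 78.0),
    Idea("Checkout UX", 0.07, 0.45, 9_000, 85.0),
]

ranked = sorted(ideas, key=lambda idea: idea.annual_lift(), reverse=True)
for idea in ranked:
    print(f"{idea.name}: EUR {idea.annual_lift():,.0f}")
```

With these made-up inputs the checkout idea ranks first despite seeing less than a tenth of the sessions, because the baseline conversion rate at that step is so much higher.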
Aim for an estimate you'd be willing to defend after the test runs. If your post-test reality consistently lands below your forecasts, you're being optimistic; if it lands above, your queue is probably under-prioritizing high-effort tests.
The formula works for copy and imagery tests too: it doesn't care about the change type, only the levers. Copy and imagery tests typically have lower effect sizes (2-4%) and lower hit rates, which is exactly why estimating them up-front prevents the queue from filling up with low-impact creative swaps.
A disciplined team lands within ±30% of the forecast on most tests, with checkout and pricing tests being the most predictable and copy tests the noisiest. The point is not perfect accuracy — it's having a consistent yardstick to rank a queue of 40 ideas down to the 5 that actually run.
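Forecast accuracy can be tracked with a simple calibration check over past tests. The sketch below is one possible approach, not an established standard: `calibration` is a hypothetical helper, the forecast and actual figures are invented, and the ±30% tolerance is taken from the range quoted above.

```python
def calibration(pairs, tolerance=0.30):
    """Return (share of tests within tolerance, mean signed relative error).

    pairs: list of (forecast, actual) revenue lifts for completed tests.
    """
    errors = [(actual - forecast) / forecast for forecast, actual in pairs]
    within = sum(abs(e) <= tolerance for e in errors) / len(errors)
    bias = sum(errors) / len(errors)
    return within, bias


# Hypothetical history of (forecast, actual) lifts in EUR.
history = [(20_000, 17_000), (50_000, 41_000), (8_000, 3_000), (12_000, 13_500)]

within, bias = calibration(history)
print(f"Within +/-30%: {within:.0%}")
print(f"Mean signed error: {bias:+.1%}")
```

A negative mean signed error means actual results tend to land below forecasts, which points to optimistic effect-size assumptions; a positive one suggests the forecasts are too cautious.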