How to use RPV Lift from A/B Tests

Metricuno

May 22, 2026

Quick answer

Testing on revenue per visitor (RPV) instead of conversion rate (CR) changes which variants win, because higher-CR variants often cannibalise average order value (AOV). Here's how to run RPV tests, what the sample-size penalty is, and a worked example.

Definition

Experimentation

RPV Lift from A/B Tests

Measuring A/B test outcomes on revenue per visitor (RPV) instead of conversion rate, so AOV changes are not hidden.

RPV lift from A/B tests is the practice of declaring winners based on revenue per visitor (orders × AOV ÷ visitors) rather than conversion rate alone. The difference matters because many CRO interventions — urgency timers, free-shipping thresholds, simplified PDPs — push more shoppers through checkout while quietly lowering the average basket. A variant that wins on CR can lose on RPV.

Because RPV is a continuous, high-variance metric, tests on it require larger samples and different statistical handling than binary conversion tests. This guide walks through when CR and RPV disagree, the sample-size penalty, and how to run RPV tests on Shopify or WooCommerce without dev work.

Also known as

revenue per visitor testing

RPV-based experimentation

Most A/B testing tools default to conversion rate as the primary metric. It's binary, easy to compute significance on, and reaches power quickly. That convenience hides a problem: conversion rate tells you nothing about what each conversion was worth.

On a Shopify apparel store running a free-shipping-over-€60 banner test, the variant with the banner can lift CR by 8% while dropping AOV by 11% as shoppers strip baskets back to the threshold. Net RPV: down 4%. The CR dashboard celebrates a winner you should have killed.

Why CR winners and RPV winners diverge

CR and RPV diverge whenever a variant changes the composition of who buys or what they buy. Three patterns produce the divergence repeatedly, and each one shows up in the funnel data once you know to look for it.

First, threshold framing. Free-shipping bars, discount tiers, and "add €X for a gift" nudges pull AOV toward the threshold from both sides — high-intent baskets shrink to the line, low-intent baskets stretch to it. CR usually rises; AOV usually compresses.

Second, urgency and scarcity. Countdown timers and low-stock badges convert hesitant browsers but skew the buyer mix toward single-item, lowest-price-point purchases.

Third, simplification. Collapsing upsell modules or removing cross-sells lifts checkout completion but removes the moments where AOV grew.

The CR-only blind spot

If your test tool reports conversion rate as the primary metric and AOV as a secondary, you will ship CR winners that lose money. Secondary metrics rarely reach significance in standard test durations — the warning never fires. Make RPV the primary.

Worked example: free-shipping banner on a Shopify apparel store

A €4M/year apparel store tests adding a sticky "Free shipping over €60" banner. Control AOV sits at €74; baseline CR is 2.4%. After two weeks and 80,000 visitors per arm, the dashboards look encouraging — until you compute RPV.

Variant CR climbs to 2.59% (+7.9%). Variant AOV falls to €65.80 (−11.1%) as customers trim items to land just above the threshold. Control RPV = 2.4% × €74 = €1.776. Variant RPV = 2.59% × €65.80 = €1.704. RPV is down about 4.0%, roughly €160k in annualised revenue on the €4M base if shipped.
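The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch using only the figures quoted above; the variable names are illustrative, and the annualised figure assumes the loss applies evenly to the store's stated €4M yearly revenue.

```python
# Worked example from the text: RPV = conversion rate x average order value.
control_cr, control_aov = 0.024, 74.00
variant_cr, variant_aov = 0.0259, 65.80

control_rpv = control_cr * control_aov
variant_rpv = variant_cr * variant_aov

cr_lift = variant_cr / control_cr - 1
aov_lift = variant_aov / control_aov - 1
rpv_lift = variant_rpv / control_rpv - 1

# Relative changes compound multiplicatively, so a CR gain can be
# wiped out by a slightly larger AOV loss.
compounded = (1 + cr_lift) * (1 + aov_lift) - 1

annual_revenue = 4_000_000  # EUR, the store's stated yearly revenue
annual_impact = annual_revenue * rpv_lift

print(f"control RPV: EUR {control_rpv:.3f}")
print(f"variant RPV: EUR {variant_rpv:.3f}")
print(f"CR lift {cr_lift:+.1%}, AOV lift {aov_lift:+.1%}, RPV lift {rpv_lift:+.2%}")
print(f"annualised impact: EUR {annual_impact:,.0f}")
```

The `compounded` value equals `rpv_lift`, which is why a CR dashboard alone cannot tell you the sign of the revenue change.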

Chart

Free-shipping banner test: CR winner is the RPV loser

The pattern repeats across discount-code-field tests, exit-intent popups offering 10% off, and bundled-cart redesigns. Any intervention that touches basket composition deserves an RPV readout before you ship it. This is the core argument behind broader RPV optimization as a CRO discipline.

The sample-size penalty for testing on RPV

RPV is a continuous metric with high variance — most visitors contribute €0, a few contribute €30-€500, occasional whales contribute €2,000+. Standard CR power calculations don't apply. You need to size based on the standard deviation of revenue per visitor, which is typically 4-8× the mean.

Practically, this means RPV tests need 2-4× the sample of an equivalent CR test to detect the same relative lift. The exact multiplier depends on your AOV distribution — wide product-price ranges (€20 t-shirts alongside €400 jackets) inflate variance and the sample requirement with it.
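To get a feel for how the coefficient of variation drives the requirement, here is a rough sizing sketch. It assumes the standard two-sample normal-approximation formula, n per arm = 2 (z_alpha + z_power)^2 (CV / MDE)^2, with a two-sided 5% significance level and 80% power. The function name and defaults are illustrative, and because the article does not say how its benchmark figures were derived, the output should not be expected to match them exactly.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_arm(cv, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors per arm needed to detect a relative lift.

    cv           -- standard deviation of revenue per visitor divided by its mean
    relative_mde -- minimum detectable effect as a fraction (0.05 = 5%)
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * (cv / relative_mde) ** 2)


# Doubling the CV quadruples the sample, as does halving the MDE.
for cv in (3.0, 5.0, 8.0):
    print(f"CV {cv}: {visitors_per_arm(cv, 0.05):,} visitors per arm")
```

Measure the CV on your own per-visitor revenue (zeros included) over a recent period before trusting any generic multiplier.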

Benchmark

Sample-size multiplier for RPV tests vs CR tests, by AOV variance profile

Store profile	AOV	Revenue CV (σ/μ)	Sample multiplier vs CR	Visitors/arm for 5% MDE
Single-SKU beauty (narrow price range)	€38	3.2	1.8×	~110,000
Apparel store (mid price spread)	€74	5.1	2.6×	~180,000
Electronics/accessories (wide spread)	€135	7.4	3.9×	~310,000
Home & furniture (very wide + whales)	€220	9.8	5.2×	~480,000

Two practical mitigations: cap or winsorise the top 1% of order values (a single €4,000 order can dominate a two-week test), and segment by purchase-intent cohort where possible. Removing logged-in repeat customers from a homepage test, for instance, often halves the required sample. This is also the link to A/B test ROI thinking: longer RPV tests cost more in opportunity, so reserve them for changes that plausibly move AOV.
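Winsorising the top 1% of order values can be sketched as follows. This version assumes a simple nearest-rank percentile and applies the cap to order values only (not to the zero-revenue visitors); the function name and the toy data are illustrative.

```python
from math import ceil


def winsorise_top(order_values, percentile=99):
    """Cap every order above the given percentile at that percentile's value."""
    if not order_values:
        return []
    ordered = sorted(order_values)
    # Nearest-rank percentile: the smallest value with at least
    # `percentile` percent of orders at or below it.
    rank = max(1, ceil(percentile * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(value, cap) for value in order_values]


# 99 ordinary orders (EUR 40 to 138) plus one EUR 4,000 outlier.
orders = [40.0 + i for i in range(99)] + [4000.0]
capped = winsorise_top(orders)

print(f"mean before: {sum(orders) / len(orders):.2f}")
print(f"mean after:  {sum(capped) / len(capped):.2f}")
```

The outlier is kept in the data set but counted at the cap, so the order still registers as a purchase without dominating the arm's mean.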

Running RPV tests in practice

Configure your test tool to send order_value per session as the primary event, not just a conversion flag. On Shopify, this means firing a purchase event with the line-item subtotal (excluding shipping and tax for cleaner comparison) into your experimentation platform. Most modern tools accept revenue as a numeric goal directly.
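As a rough illustration of what that purchase event might carry, the sketch below builds a payload whose revenue figure is the line-item subtotal only. The field names (`visitor_id`, `variant`, `order_value`, `unit_price`, `quantity`) are hypothetical and do not correspond to any specific platform's schema; map them to whatever your experimentation tool expects.

```python
def build_purchase_event(visitor_id, variant, line_items):
    """Build a revenue event from line items, leaving out shipping and tax."""
    subtotal = sum(item["unit_price"] * item["quantity"] for item in line_items)
    return {
        "event": "purchase",
        "visitor_id": visitor_id,
        "variant": variant,
        # Shipping and tax are deliberately not part of order_value.
        "order_value": round(subtotal, 2),
    }


event = build_purchase_event(
    visitor_id="v-123",
    variant="free_shipping_banner",
    line_items=[
        {"unit_price": 19.90, "quantity": 2},
        {"unit_price": 65.00, "quantity": 1},
    ],
)
print(event)
```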

For significance, use a t-test on log-transformed revenue per visitor or a non-parametric Mann-Whitney U test — both handle the skewed distribution better than a naive two-sample t-test on raw revenue. Plan test duration in full weeks to cover weekly seasonality, and pre-register your MDE so you don't peek and stop on noise.
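A log-transformed comparison can be sketched with the standard library alone. The version below is an assumption-laden illustration: it runs a Welch-style test on log(1 + revenue) per visitor and uses a normal approximation for the p-value, which is only reasonable at the large per-arm samples RPV tests need. Note that it compares means of the transformed values, which is not the same quantity as RPV itself, so report the raw RPV difference alongside it.

```python
from math import log1p, sqrt
from statistics import NormalDist, fmean, variance


def welch_z_on_log1p(control, variant):
    """Welch-style z statistic and two-sided p-value on log(1 + revenue)."""
    a = [log1p(x) for x in control]
    b = [log1p(x) for x in variant]
    std_err = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (fmean(b) - fmean(a)) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Toy per-visitor revenue in EUR; zeros are visitors who did not buy.
control = [0.0] * 97 + [60.0, 74.0, 90.0]
variant = [0.0] * 96 + [55.0, 60.0, 66.0, 70.0]

z, p_value = welch_z_on_log1p(control, variant)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

If SciPy is available, `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.mannwhitneyu` are the usual library routes to the same two tests.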

Decision rule that ships money

Ship the variant only if the RPV lift is positive and statistically significant. If RPV is flat while CR is up and AOV is down by a similar percentage, you've moved volume without moving money, which is usually not worth the operational complexity of the change. If RPV is up but CR is flat, you've found a pure AOV lever; those are rare and worth keeping.
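The rule can be written down as a small function so that it is applied the same way to every test. This is one possible encoding, not a definitive one: the 5% significance level and the 1% band used to call CR "flat" are assumptions chosen for illustration, as are the function name and return labels.

```python
def decide(rpv_lift, rpv_p_value, cr_lift, alpha=0.05, flat_band=0.01):
    """Turn an RPV test readout into a ship/hold decision.

    Lifts are relative changes versus control (0.05 means +5%).
    """
    rpv_significant = rpv_p_value < alpha
    if rpv_significant and rpv_lift > 0:
        if abs(cr_lift) <= flat_band:
            return "ship: pure AOV lever"
        return "ship"
    if not rpv_significant and cr_lift > flat_band:
        return "hold: volume moved, money did not"
    return "hold"


print(decide(rpv_lift=0.04, rpv_p_value=0.02, cr_lift=0.06))
print(decide(rpv_lift=0.00, rpv_p_value=0.80, cr_lift=0.08))
print(decide(rpv_lift=0.05, rpv_p_value=0.01, cr_lift=0.00))
```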

Frequently asked

RPV A/B testing FAQ

Should I always test on RPV instead of CR?

No. Test on RPV when the change plausibly moves AOV: pricing, free-shipping thresholds, upsells, bundles, urgency. For pure friction-removal tests (fixing a broken form field, speeding up a page), CR is fine and reaches power 2-4× faster.

What's the minimum traffic I need to run RPV tests?

Roughly 80,000-150,000 visitors per arm over 2-4 weeks for a mid-AOV store detecting a 5% lift at 80% power. Below that, RPV tests rarely reach significance and you're better off optimising on CR with AOV as a guardrail.

How do I handle outlier orders that distort RPV?

Winsorise the top 1% of order values: replace anything above the 99th percentile with the 99th-percentile value. This keeps the data honest without throwing away large legitimate orders entirely. Re-run the analysis with and without winsorisation; if the winner flips, you don't have enough data.

Can I use a standard t-test on revenue per visitor?

Technically yes with enough sample, but the distribution is heavily right-skewed (most sessions = €0). Prefer Mann-Whitney U or a t-test on log(1 + revenue). Most experimentation platforms now offer these as built-in options.

What's the difference between RPV and revenue per session?

Often used interchangeably, but RPV typically means revenue per unique visitor (deduplicated by user/cookie), while revenue per session counts each visit separately. For A/B test analysis use whichever matches your randomisation unit: if you bucket by visitor, measure by visitor.
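The distinction is easy to see on a toy log. The sketch below assumes each row is one session tagged with a visitor identifier; the data and names are made up for illustration.

```python
# Each row is one session: (visitor_id, revenue in EUR for that session).
sessions = [
    ("v1", 0.0), ("v1", 80.0),
    ("v2", 0.0),
    ("v3", 0.0), ("v3", 0.0), ("v3", 40.0),
]

total_revenue = sum(revenue for _, revenue in sessions)
unique_visitors = {visitor_id for visitor_id, _ in sessions}

revenue_per_session = total_revenue / len(sessions)
revenue_per_visitor = total_revenue / len(unique_visitors)

print(f"revenue per session: EUR {revenue_per_session:.2f}")
print(f"revenue per visitor: EUR {revenue_per_visitor:.2f}")
```

The same revenue yields two different figures, so the denominator has to match the unit you randomised on.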

Does RPV testing work on low-traffic stores under €1M?

Rarely with statistical rigour. Stores under €1M typically can't reach RPV significance within a quarter. Use CR with AOV as a directional secondary, and rely on bigger qualitative signals (session replay, exit surveys) to guide decisions instead.

How does RPV testing interact with Shopify Markets and multi-currency?

Always normalise to a single base currency before computing RPV. Mixed-currency revenue inflates variance and can bias results if the variant disproportionately attracts shoppers from a higher-AOV market. Convert at the order's settlement rate, not the live FX rate.
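A minimal normalisation step might look like the sketch below. It assumes each stored order carries the rate at which it settled into the base currency; the field names and the example rates are hypothetical, not real FX quotes or any platform's actual schema.

```python
BASE_CURRENCY = "EUR"


def to_base_currency(order):
    """Convert an order amount using the rate recorded at settlement."""
    return round(order["amount"] * order["settlement_rate_to_base"], 2)


orders = [
    {"amount": 80.00, "currency": "EUR", "settlement_rate_to_base": 1.00},
    {"amount": 100.00, "currency": "USD", "settlement_rate_to_base": 0.90},
    {"amount": 50.00, "currency": "GBP", "settlement_rate_to_base": 1.20},
]

normalised = [to_base_currency(order) for order in orders]
print(f"{BASE_CURRENCY} order values: {normalised}")
```

Storing the settlement rate with the order means a re-analysis months later produces the same revenue figures as the original readout.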

Should I segment RPV tests by new vs returning customer?

Yes, when sample allows. Returning customers have 2-3× higher AOV and much lower variance, so their RPV moves more predictably. New-visitor RPV is noisier but more representative of acquisition impact. Segmenting often reveals that a variant wins on one cohort and loses on the other.

How long should an RPV test run?

Minimum two full business weeks to cover weekly seasonality; ideally three to four. Stop on pre-registered sample size, not on significance, because peeking at a high-variance metric is how false positives ship to production.

Can I retroactively re-analyse old CR tests on RPV?

Yes, if you stored order_value alongside the variant assignment. Pull the historical data, compute RPV per arm, and re-run the significance test. Teams that do this routinely find 15-25% of their past CR winners were actually RPV-neutral or negative.
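Computing RPV per arm from historical records can be sketched as below. The input shapes are assumptions: a mapping from visitor to assigned arm, and a list of (visitor, order value) pairs. Visitors who never ordered still count in the denominator, and orders from visitors who were never assigned are ignored.

```python
from collections import Counter, defaultdict


def rpv_by_arm(assignments, orders):
    """Return {arm: revenue per assigned visitor}.

    assignments -- {visitor_id: arm}
    orders      -- iterable of (visitor_id, order_value)
    """
    revenue = defaultdict(float)
    for visitor_id, order_value in orders:
        arm = assignments.get(visitor_id)
        if arm is not None:  # skip orders from visitors outside the test
            revenue[arm] += order_value
    visitors = Counter(assignments.values())
    return {arm: revenue[arm] / count for arm, count in visitors.items()}


assignments = {"a": "control", "b": "control", "c": "variant", "d": "variant"}
orders = [("a", 74.0), ("c", 60.0), ("c", 20.0), ("z", 500.0)]

result = rpv_by_arm(assignments, orders)
print(result)
```

From here the per-visitor revenue vectors for each arm can go into whichever significance test you used for new experiments.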
