
Incrementality Testing: How to Know If Your Ads Are Actually Driving Revenue

Platform ROAS measures correlation. Incrementality testing measures causation. Here's how to run a geo holdout that reveals your true ad contribution.

Jordan Glickman·May 10, 2026·11
Attribution

There is a question every serious eCommerce operator should be asking and almost nobody is: would this revenue have happened without the ad?

Platform attribution never asks that question. Meta, Google, and TikTok all report conversions using models designed to assign credit to their own channels. When a customer clicks a retargeting ad and purchases, Meta attributes that purchase. What the platform does not tell you is whether that customer would have returned and bought through branded search or direct traffic within the next 48 hours regardless of whether the ad existed.

That gap between attributed revenue and actual incremental revenue is not a rounding error. For brands with large retargeting programs, high organic traffic, or strong email lists, the gap can be large enough to fundamentally change the true value of their paid media investment.

Incrementality testing is the methodology that closes that gap. It does not replace attribution entirely — attribution still provides useful operational signals. But it provides the ground truth that attribution cannot, because it measures what actually caused a purchase rather than what happened to be present at the moment of conversion.

[Image: four-row, color-coded table interpreting incrementality test results, from green (holdout drops 20%+, high incrementality) to red (holdout revenue increases, negative incrementality). Alt: "Incrementality test results interpretation table." Caption: "Attribution tells you what was present at conversion. Incrementality testing tells you what caused it."]

Why standard attribution systematically misleads

To understand why incrementality testing matters, you need to understand the specific failure mode of standard attribution.

Last-click attribution gives full credit to the final touchpoint before purchase. First-click gives full credit to the first touchpoint. Multi-touch models distribute credit across touchpoints using various weighting schemes. All of them share the same fundamental limitation: they measure correlation, not causation.

A loyal customer who has purchased four times, who received an email that morning, who searched the brand name, and who also happened to click a retargeting ad before completing a fifth purchase — that customer's conversion gets attributed to the retargeting ad. But the honest question is whether the ad contributed anything. The customer was likely going to purchase regardless. Attribution assigned credit based on proximity, not contribution.

This is most severe in precisely the campaign types that report the highest ROAS. Retargeting reaches warm audiences who are already close to purchasing. Branded search captures intent that exists independent of whether any ad was running. Both report high platform ROAS not because they are generating incremental purchases but because the audiences they reach have high baseline conversion probability.

When brands scale retargeting and branded search budgets based on those numbers, they are potentially spending on audiences who would have converted anyway — while the channels that actually generate new demand (prospecting, awareness, new customer acquisition) receive less budget because their reported ROAS looks worse despite doing more incremental work.

Incrementality testing reveals which ROAS is real and which is credited correlation.

Two primary incrementality testing methods

Method 1: Geo holdout testing

A geo holdout test turns off advertising in one geographic region while maintaining normal activity in a comparable control region, then compares revenue performance between the two over the test period.

The logic is clean. If ads are driving incremental revenue, the region where advertising is paused should show a measurable revenue decline relative to the control. If revenue in the holdout region remains flat or drops minimally, the ads were not generating meaningful incremental purchases — those customers were converting through other means regardless.

Geo holdout testing is the most accessible incrementality method for brands without enterprise measurement infrastructure. It requires clean geographic market selection, a defined test duration, and consistent revenue tracking by region. No pixel modifications. No platform cooperation. No statistical modeling beyond comparing indexed revenue trends between two markets.

Method 2: Platform-native lift studies

Meta's Conversion Lift product creates a randomized holdout audience within the platform itself. A portion of the target audience sees the ads normally. A randomized control group sees placeholder ads or no ads. The platform compares purchase behavior between the groups and reports the incremental lift attributable to advertising.

Meta's Conversion Lift is the most accessible platform-native incrementality tool for mid-market advertisers. It can be configured in Ads Manager for campaigns meeting minimum thresholds — typically 10,000 people in the test audience and sufficient budget to produce measurable conversion volume in both groups.

The limitation is platform dependency. Meta is measuring Meta's own incrementality using Meta's own methodology. The randomization controls for selection bias and makes the results more credible than standard attribution, but results should be treated as directional rather than definitive. Platform-native lift studies also tell you nothing about cross-channel incrementality — they cannot tell you whether pausing Meta would cause spend to shift to Google or organic search rather than simply reducing total conversions.

For most brands, geo holdout is the more instructive starting point because it measures total business revenue impact rather than one platform's claimed contribution.

Running a geo holdout test: step by step

Step 1: Select test and control markets

The ideal pairing is two geographic markets that are similar in population size, demographic composition, and historical revenue contribution. For US-based brands, state-level splits are common because state-level revenue data is available through most eCommerce platforms.

Before running the test, pull 90 days of pre-test revenue data by state and confirm the selected markets have been tracking similarly over that period. If the markets were already diverging before the test begins, they are not a valid comparison pair. Avoid markets with known seasonal anomalies during the test window, recent competitive events, or operational changes like new distribution agreements.
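
As a rough sketch of that pre-test check, the snippet below indexes both candidate markets to their own pre-test average and confirms they move together. It assumes daily revenue by state has already been exported to a CSV; the file name, column names, and state pairing are hypothetical.

```python
import pandas as pd

# Hypothetical export: one row per day per state, with a revenue column.
df = pd.read_csv("daily_revenue_by_state.csv", parse_dates=["date"])

TEST_STATE, CONTROL_STATE = "OH", "NC"  # example pairing, not a recommendation
pre = df[df["date"] >= df["date"].max() - pd.Timedelta(days=90)]

# One column per state, each indexed to its own pre-test mean so markets of
# different sizes can be compared on the same scale.
daily = pre.pivot_table(index="date", columns="state", values="revenue", aggfunc="sum")
indexed = daily[[TEST_STATE, CONTROL_STATE]] / daily[[TEST_STATE, CONTROL_STATE]].mean()

# Two quick similarity checks: correlation of daily movements and the average
# gap between the indexed series. High correlation and a small gap suggest a valid pair.
corr = round(indexed[TEST_STATE].corr(indexed[CONTROL_STATE]), 2)
gap = round(float((indexed[TEST_STATE] - indexed[CONTROL_STATE]).abs().mean()), 3)
print("Daily correlation:", corr)
print("Mean absolute index gap:", gap)
```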

Step 2: Define the holdout scope and duration

A total holdout pauses all paid advertising in the test market. A partial holdout pauses one campaign type — typically the highest-ROAS retargeting campaigns — while leaving other channels running. Total holdouts produce cleaner signal but involve more operational disruption. Partial holdouts measure channel-specific incrementality with less business risk.

For a first incrementality test, a partial holdout on retargeting in a single state for two to four weeks is a low-risk way to generate useful data. The test needs enough purchase events in both groups to distinguish real revenue differences from normal variance — brands with lower daily revenue volume need longer test periods.
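
There is no single formula for "enough purchase events," but a crude sanity check is to compare normal week-to-week noise in the candidate holdout market against the smallest revenue drop you would need to detect. The sketch below assumes you have weekly pre-test revenue for that market; the figures are hypothetical, and this is a heuristic, not a power calculation.

```python
import pandas as pd

# Hypothetical input: weekly pre-test revenue for the candidate holdout market.
weekly = pd.Series([41200, 39800, 44100, 38500, 42700, 40300, 43900, 39200])

noise = weekly.std() / weekly.mean()   # coefficient of variation, week to week
expected_drop = 0.10                   # smallest revenue decline you care about detecting

# Crude heuristic: if normal weekly variance is on the same order as the drop
# you hope to observe, lengthen the test window or choose a larger market.
print(f"Weekly noise: {noise:.1%}, expected drop: {expected_drop:.0%}")
if noise >= expected_drop:
    print("Signal may be lost in normal variance; extend the test window.")
```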

Step 3: Track revenue at the market level

Configure your eCommerce platform or payment processor to report weekly revenue segmented by shipping address state before the test begins. Establish the ratio between the two markets during the pre-test period — if the holdout state historically generates 11% of total revenue and the control generates 14%, that ratio is your baseline.

During the test, compare weekly revenue in each market indexed against that baseline. A meaningful drop in the holdout market's index while the control market remains stable is the signal you are looking for.
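
A minimal sketch of that weekly comparison, assuming you have each market's pre-test revenue share and weekly totals during the test (all figures below are illustrative):

```python
# Pre-test baseline: each market's share of total revenue (illustrative figures).
baseline_share = {"holdout": 0.11, "control": 0.14}

# Weekly revenue during the test (hypothetical numbers).
test_weeks = [
    {"week": 1, "holdout": 36500, "control": 52800, "total": 368000},
    {"week": 2, "holdout": 34900, "control": 53400, "total": 371000},
]

for w in test_weeks:
    for market in ("holdout", "control"):
        expected = w["total"] * baseline_share[market]  # what the baseline share predicts
        index = w[market] / expected                    # 1.0 = tracking the baseline
        print(f"Week {w['week']} {market}: index {index:.2f}")
```

A holdout index drifting well below 1.0 while the control stays near 1.0 is the pattern described above.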

Step 4: Calculate true incremental ROAS

The incremental revenue from the paused activity is the difference between what the holdout market would have generated at its baseline rate and what it actually generated during the test. That delta divided by the spend that would have been deployed in the holdout market over the same period is the true incremental ROAS.

A retargeting program reporting 8x platform ROAS but showing only 12% holdout revenue decline when paused is not generating 8x incremental return. It is generating a small incremental lift against a high baseline of organic and direct conversions — and the attributed ROAS is mostly borrowed from those other channels.
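
The arithmetic from this step, as a short sketch with hypothetical inputs roughly matching that example (substitute your own baseline projection, observed revenue, and the spend that would have run in the holdout market):

```python
# Hypothetical four-week test in the holdout market.
baseline_revenue = 160000   # what the market's pre-test share predicted for the period
observed_revenue = 140800   # what the market actually generated with ads paused (a 12% drop)
paused_spend     = 16000    # spend that would normally have run in that market

incremental_revenue = baseline_revenue - observed_revenue
incremental_roas = incremental_revenue / paused_spend

print(f"Incremental revenue: ${incremental_revenue:,.0f}")
print(f"Incremental ROAS: {incremental_roas:.1f}x")  # 1.2x here, versus an 8x platform-reported ROAS
```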

Interpreting results

| Test Result | What It Means | Recommended Action |
|---|---|---|
| Holdout revenue drops 20%+ | High incrementality — ads drive real purchases | Maintain or scale; attribution roughly trustworthy |
| Holdout revenue drops 5–15% | Moderate incrementality — ads have some real effect | Calculate true incremental ROAS; consider reallocation |
| Holdout revenue flat or minimal decline | Low incrementality — purchases happen regardless | Reduce spend in this campaign type; reallocate to prospecting |
| Holdout revenue increases | Negative incrementality — ads may create friction | Pause campaign type immediately and investigate |

The most common outcome for retargeting-heavy programs is moderate to low incrementality. The audiences being retargeted are largely warm customers who would have converted through organic, direct, or email channels regardless. The advertising is accelerating the conversion timeline rather than generating purchases that would not have otherwise occurred.

Accelerating conversion is not without value — reducing time-to-purchase and cart abandonment has real economic benefit. But it is a different value proposition than generating genuinely new demand, and it should be funded at a different budget level than a true acquisition channel.

The ROAS inversion

One of the most consistent patterns that emerges from incrementality testing across client accounts is the inversion of performance rankings between attributed and incremental ROAS.

The campaigns with the highest platform-reported ROAS are frequently the lowest-incrementality campaigns. Retargeting and branded search capture audiences that are already primed to convert. Their attributed ROAS looks excellent because they are given credit for conversions that organic and direct channels would have produced anyway.

The campaigns with the lowest platform-reported ROAS are frequently the highest-incrementality campaigns. Cold prospecting, upper-funnel TikTok, non-branded Google search — these channels reach people who would not have found or considered the brand without the ad. Their attributed ROAS is depressed because the conversion path is long and attribution credits the last touch. Their actual incremental contribution to new customer acquisition is often the highest in the program.

Brands that allocate budget by platform ROAS consistently underinvest in the channels driving genuine growth and overinvest in channels that are converting customers who were coming back anyway. Media mix modeling reaches the same conclusion statistically; incrementality tests provide the behavioral confirmation.

Testing cadence

Incrementality testing is not a one-time exercise. Channel contribution changes as spend levels change, audiences saturate, creative performance evolves, and competitive dynamics shift.

A retargeting campaign that showed 35% incremental lift six months ago may show 15% today if the retargeting pool has grown relative to the organic conversion rate, or if a competitor has launched an aggressive paid program in the same category. Running structured geo holdouts quarterly — rotating through major campaign types and channels — keeps allocation decisions responsive to actual current performance rather than stale historical data.

Each test is not just a validation of the current state. It is a detection mechanism for changes that have already occurred but have not yet shown up in platform dashboards.

How incrementality testing changes the agency conversation

The most immediate practical value of incrementality data in a client relationship is the conversation it enables.

When a client sees platform-reported ROAS and nothing else, the natural instinct is to scale whatever campaign type shows the highest number. When that same client sees incrementality data showing that the highest-ROAS campaign type generates the lowest actual incremental contribution, the budget allocation conversation changes entirely — and the agency's recommendations become grounded in business outcomes rather than platform scorecards.

At Impremis, introducing incrementality testing into a client relationship reframes the performance question from "what does Meta claim it did" to "what did the business actually gain." That shift produces better allocation decisions, a more credible agency relationship, and measurable improvements in the efficiency metrics that actually matter — CAC on genuinely new customers, not the blended number that includes customers who were returning anyway.

It also protects the agency from the trap of making recommendations that look smart in the dashboard but underperform in the client's financials. That trap is where agency relationships go to die.

FAQ

How much budget do we need to run a meaningful geo holdout? The minimum budget requirement is a function of daily revenue volume and test duration, not absolute spend. You need enough purchase volume in both markets during the test period to distinguish real revenue differences from variance. A brand generating $2,000 per day can run a meaningful two-week geo holdout with relatively low stakes. A brand generating $500 per day needs a longer test period or a larger market split.

Can we test incrementality for Google as well as Meta? Yes. Geo holdout testing works for any channel. Pause Google Shopping in the holdout market while maintaining it in the control, and compare revenue trends. The methodology is identical to a Meta geo holdout. Google also offers its own Conversion Lift product for paid search with minimum spend requirements.

What if seasonal events occur during the test period? Seasonal events that are national in scope (Black Friday, holiday promotions) affect both markets equally and should not bias the comparison. Seasonal events that are regional — local weather events, regional holidays, state-specific promotional programs — can bias results. Avoid test windows with known regional disruptions.

How do we handle the situation where holdout and control markets were similar pre-test but diverge post-test for reasons unrelated to ads? This is the core risk of geo testing and why careful market selection and pre-test analysis matter. If a known external event (a regional news story, a competitor launch) occurs in one market during the test, the test is compromised and results should not be used for allocation decisions. Document the event and re-run the test in a subsequent window.

Closing

Attribution will tell you what was present at the moment of conversion. Incrementality testing tells you what caused it.

That distinction is not semantic. It is the difference between scaling the channels that look best in your dashboard and scaling the channels that are actually growing your business. For brands spending meaningfully on paid media, the gap between those two answers is often measured in significant misallocated budget — sometimes for years before someone runs the test that makes it visible.

Run the test. Read the results honestly. Reallocate based on what is actually causing conversions, not what is being credited for them.
