What Waterfall Enrichment Actually Means (and Why Single-Source Fails)

MarketingSoda Team · April 20, 2026 · 15 min read

Every B2B data provider will tell you their database covers millions of contacts and companies. The numbers are real. The coverage they deliver for your specific database is not what those numbers imply.

When you run a single enrichment provider against a real-world HubSpot database — not a curated demo set, but your actual contacts with their mix of geographies, industries, company sizes, and job functions — you will get meaningful data back on 40-60% of records. That is not a quality failure. It is a structural ceiling that exists because no single provider has comprehensive coverage across every dimension of B2B data.

This is the problem that waterfall enrichment solves. And understanding what waterfall enrichment actually is — not the marketing version, but the technical architecture — is the difference between an enrichment strategy that plateaus at 50% coverage and one that reaches 80-95%.

40–60%
typical field coverage from a single B2B data provider against a real-world database, whichever vendor you choose

The Single-Source Coverage Ceiling

The coverage ceiling is not a vendor quality problem. It is a data acquisition problem.

Each B2B data provider builds its database through a specific combination of sources: web scraping, email pattern inference, user-contributed data, public filings, partnerships, and direct verification. These source methodologies create inherent strengths and blind spots that differ by provider.

A provider that excels at scraping US technology company websites will have strong firmographic coverage for SaaS companies in San Francisco but weak coverage for manufacturing companies in Stuttgart. A provider that acquires data through professional network partnerships will have strong job title data but may lack direct phone numbers. A provider that invests heavily in GDPR-compliant European data collection will have superior EMEA coverage but thinner North American depth.

This is not fixable by switching vendors. It is fixable by using multiple vendors strategically.

Here is what the coverage matrix actually looks like across major providers:

Provider | Primary Strength | Coverage Gap | Typical Coverage
--- | --- | --- | ---
Clearbit / Breeze | US tech firmographics | EU/APAC, non-tech, direct dials | ~40%
Apollo | Email discovery, US tech | EU accuracy, freshness variance | 50-65%
ZoomInfo | US enterprise direct dials | Non-US markets, price accessibility | 60-72%
Cognism | EU/EMEA phones, GDPR compliance | North American depth | 75%+ EMEA
Hunter.io | Email verification and discovery | No firmographics, no phone data | Email specialist
PeopleDataLabs | Breadth, developer-friendly API | Lower per-record confidence | 60-70%

No single row in that table solves the enrichment problem. But the gaps in one row are frequently covered by the strengths in another. That observation is the entire foundation of waterfall enrichment.


What Waterfall Enrichment Actually Is

Waterfall enrichment is a sequential, multi-provider enrichment architecture where a record is passed through multiple data providers in priority order, with each subsequent provider filling gaps left by the previous one.

The term "waterfall" comes from the cascading logic: try Provider A first. For any fields that Provider A could not populate, try Provider B. For remaining gaps, try Provider C. Continue until all configured providers have been attempted or all target fields are populated.

But there is a critical distinction that most explanations miss: the difference between record-level waterfall and field-level waterfall.

Record-Level Waterfall (Basic)

In a record-level waterfall, you try Provider A for the entire record. If Provider A returns no result at all — no match found — you try Provider B. If Provider B returns a result, you use it wholesale. If Provider B also returns nothing, you try Provider C.

This is the simpler implementation, and it is what most "waterfall enrichment" marketing describes. It improves coverage modestly because it catches records that Provider A could not match at all. But it does not address the more common and more impactful scenario: Provider A returns a partial result.
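
A minimal sketch of the record-level logic described above, in Python. The provider functions and the dict-based record shape are hypothetical stand-ins for illustration, not any vendor's real API; each lookup is assumed to return a dict of fields, or None when there is no match.

```python
from typing import Callable, Optional

Record = dict[str, str]
# Hypothetical provider signature: takes an email, returns fields or None.
Provider = Callable[[str], Optional[Record]]

def record_level_waterfall(email: str, providers: list[Provider]) -> Optional[Record]:
    """Return the first non-empty provider result, used wholesale."""
    for lookup in providers:
        result = lookup(email)
        if result:  # any match at all stops the cascade
            return result
    return None  # no provider matched the record

# Stand-in providers for illustration only.
def provider_a(email: str) -> Optional[Record]:
    return {"job_title": "VP of Marketing"} if email == "ana@example.com" else None

def provider_b(email: str) -> Optional[Record]:
    return {"job_title": "Head of Sales", "phone": "+1-555-0100"}

partial = record_level_waterfall("ana@example.com", [provider_a, provider_b])
fallback = record_level_waterfall("bo@example.com", [provider_a, provider_b])
```

In this sketch, `partial` contains only the job title from the first provider: because that provider matched the record, the second provider is never asked for the missing phone number. That is the partial-result limitation in miniature.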

Field-Level Waterfall (Advanced)

In a field-level waterfall, each individual field is evaluated independently across providers. Provider A might return a job title and company name but no phone number. Provider B adds the phone number and email but has a different job title. Provider C confirms the email and adds firmographic data that neither A nor B had.

The enrichment engine evaluates each field across all providers and produces a merged "golden record" — the best available value for every field, drawn from whichever provider had the strongest data for that specific field.

This is substantially more powerful than record-level waterfall. It is also substantially harder to implement, because it requires a merge engine that can resolve conflicts when two providers return different values for the same field.
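
As a rough illustration of the field-level idea, the sketch below fills each target field from the first provider, in priority order, that returned a value for it. The field names, provider names, and payload shape are assumptions made for the example, and conflict resolution is deliberately reduced to "first value wins"; a real merge engine would plug in richer rules at that point.

```python
from typing import Optional

# Hypothetical target fields for the example.
TARGET_FIELDS = ["email", "phone", "job_title", "company"]

def field_level_waterfall(
    results: list[tuple[str, dict[str, Optional[str]]]]
) -> dict[str, dict[str, str]]:
    """Fill each target field from the first provider (in priority order) that has it."""
    golden: dict[str, dict[str, str]] = {}
    for field_name in TARGET_FIELDS:
        for provider_name, payload in results:
            value = payload.get(field_name)
            if value:  # skip missing or empty values
                golden[field_name] = {"value": value, "source": provider_name}
                break
    return golden

# Provider responses listed in priority order (illustrative data).
results = [
    ("provider_a", {"job_title": "VP of Marketing", "company": "Acme"}),
    ("provider_b", {"job_title": "Head of Marketing", "phone": "+1-555-0100",
                    "email": "ana@example.com"}),
    ("provider_c", {"email": "ana@example.com", "company": "Acme Inc."}),
]
golden = field_level_waterfall(results)
```

Here the merged record takes its job title and company from the first provider and its phone and email from the second, so no single provider had to supply the whole record.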

Record-Level vs Field-Level Waterfall: Coverage Comparison

Record-level waterfall (cumulative coverage):

  • Provider A: 30%
  • + Provider B: 48%
  • + Provider C: 60%
  • Final coverage: ~60%

Each provider only adds records the prior missed. Coverage plateaus quickly.

Field-level waterfall (coverage per field):

  • Email (best email provider): 95%
  • Phone (best phone provider): 88%
  • Title (best title provider): 92%
  • Company (best firmographic provider): 90%
  • Final coverage: ~90%

Each field is routed to its best provider independently. Coverage compounds across providers.

Provider A covers 40%. Provider B covers 30% of the remaining 60%. Provider C covers 20% of the remaining 42%. Three providers in a field-level waterfall reach 66% — and the math keeps compounding with each additional source.

The Coverage Math: How Providers Compound

The mathematics of multi-provider enrichment are straightforward but frequently misunderstood. Providers do not add their coverage percentages — they compound against the remaining gap.

Sequential coverage calculation:

  • Provider A alone: 40% coverage
  • Add Provider B (covers 30% of the 60% that A missed): 40% + 18% = 58% combined
  • Add Provider C (covers 20% of the remaining 42%): 58% + 8.4% = 66.4% combined
  • Add Provider D (covers 25% of the remaining 33.6%): 66.4% + 8.4% = 74.8% combined
  • Add Provider E (covers 20% of the remaining 25.2%): 74.8% + 5% = 79.8% combined
  • Add Provider F (covers 15% of the remaining 20.2%): 79.8% + 3% = 82.8% combined

With six providers in a properly configured waterfall, you move from 40% single-source coverage to 83% combined coverage. That is not a marginal improvement — it is the difference between a database where half your records are missing critical fields and one where four out of five records are enrichment-complete.
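
The sequential calculation above can be reproduced in a few lines of Python. This is a sketch that assumes each provider's hit rate is independent of which records earlier providers missed, which is the same simplification the list makes; the function name is illustrative.

```python
def compound_coverage(hit_rates: list[float]) -> list[float]:
    """Cumulative coverage after each provider, where each rate applies
    only to the share of records still uncovered."""
    covered = 0.0
    steps = []
    for rate in hit_rates:
        covered += (1.0 - covered) * rate
        steps.append(round(covered, 4))
    return steps

# Hit rates for Providers A through F, taken from the list above.
steps = compound_coverage([0.40, 0.30, 0.20, 0.25, 0.20, 0.15])
for name, total in zip("ABCDEF", steps):
    print(f"Through Provider {name}: {total:.1%}")
```

Equivalently, combined coverage is 1 minus the product of every provider's miss rate: 1 - (0.60 × 0.70 × 0.80 × 0.75 × 0.80 × 0.85) ≈ 0.829. Carried through without intermediate rounding, the six-provider figure comes out at roughly 82.9%, which matches the rounded 83% quoted above.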

80–95%
field coverage achievable with 5-6 providers in a field-level waterfall — compared to 40-60% from any single source

The diminishing returns are real — each additional provider contributes less incremental coverage than the last. But the first three to four providers deliver the largest gains, and the marginal cost of adding a fifth or sixth provider is often negligible when you already have contracts with those providers.


The Hard Part: Merge Logic and Survivorship Rules

Coverage is the easy argument for waterfall enrichment. The hard part — and the part that separates a well-engineered waterfall from a data quality liability — is what happens when multiple providers return different values for the same field.

Provider A says the contact's job title is "VP of Marketing." Provider B says "Vice President, Demand Generation." Provider C says "Head of Marketing." Which one is correct? Which one do you write to HubSpot?

This is the merge problem, and solving it requires survivorship rules — explicit logic that determines which value wins when providers conflict.

Four Survivorship Strategies

1. Highest Confidence

Each provider returns not just a value but a confidence score (or the enrichment platform assigns one based on historical accuracy for that provider and field type). The value with the highest confidence score wins.

This is the strongest general-purpose strategy. It requires either provider-native confidence scores or a platform that maintains its own accuracy benchmarks per provider per field type.

2. Most Recent

The value from whichever provider has the most recent verification timestamp wins. This strategy prioritizes freshness over source reliability and works well for fields that change frequently — job title, company, seniority level.

3. First Provider (Priority Order)

You define a provider priority order, and the first provider that returns a value for a given field wins. No further evaluation occurs. This is the simplest strategy and works well when you have high confidence in your primary provider's accuracy for specific field types.

4. Manual Review

When providers disagree beyond a configurable threshold (e.g., two providers return completely different company names), the record is flagged for human review rather than auto-merged. This strategy adds latency but prevents bad merges on high-value records.
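
The four strategies can be sketched as small Python functions over a list of candidate values. Everything here is an illustrative assumption: the `Candidate` shape, the normalized confidence scale, the example dates and scores, and the disagreement test (which simply counts distinct values rather than measuring how different they are).

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Candidate:
    provider: str
    value: str
    confidence: float  # hypothetical normalized score, 0.0 to 1.0
    verified_on: date  # last verification date reported by the source

def highest_confidence(cands: list[Candidate]) -> Candidate:
    return max(cands, key=lambda c: c.confidence)

def most_recent(cands: list[Candidate]) -> Candidate:
    return max(cands, key=lambda c: c.verified_on)

def first_provider(cands: list[Candidate], priority: list[str]) -> Candidate:
    # Providers missing from the priority list sort last.
    rank = {name: i for i, name in enumerate(priority)}
    return min(cands, key=lambda c: rank.get(c.provider, len(priority)))

def needs_manual_review(cands: list[Candidate], max_distinct: int = 1) -> bool:
    # Flag the field when providers return more distinct values than allowed.
    distinct = {c.value.strip().lower() for c in cands}
    return len(distinct) > max_distinct

# The job title conflict from the example above, with made-up scores and dates.
titles = [
    Candidate("provider_a", "VP of Marketing", 0.80, date(2025, 11, 2)),
    Candidate("provider_b", "Vice President, Demand Generation", 0.90, date(2025, 6, 14)),
    Candidate("provider_c", "Head of Marketing", 0.70, date(2026, 1, 20)),
]
```

With these invented inputs, each strategy picks a different winner for the same field, which is why the choice of strategy matters as much as the choice of providers.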

Field-Level Strategy Assignment

The strongest implementations do not apply a single survivorship strategy across all fields. They assign strategies per field type:

  • Email: highest confidence (with verification status as the confidence signal)
  • Job title: most recent (titles change frequently)
  • Company name: first provider with priority order (structural data changes less often)
  • Phone number: highest confidence (wrong numbers are worse than missing numbers)
  • Firmographics (revenue, headcount): most recent (company-level data shifts with funding rounds and layoffs)
  • LinkedIn URL: first provider (URLs are stable identifiers)

This field-level strategy assignment is what produces genuine golden records — records where each field contains the best available value from the best available source, not just the last value written.
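
One way to express the per-field assignment is a plain lookup table. The sketch below mirrors the list above; the field keys are hypothetical, and falling back to manual review for unmapped fields is an assumption of this example rather than a rule from the text.

```python
# Strategy names mirror the four strategies described above.
STRATEGIES = {"highest_confidence", "most_recent", "first_provider", "manual_review"}

# Hypothetical field keys; the assignments follow the list above.
FIELD_STRATEGY = {
    "email": "highest_confidence",
    "job_title": "most_recent",
    "company_name": "first_provider",
    "phone": "highest_confidence",
    "annual_revenue": "most_recent",
    "headcount": "most_recent",
    "linkedin_url": "first_provider",
}

def strategy_for(field_name: str, default: str = "manual_review") -> str:
    """Look up the survivorship strategy for a field, falling back to a default."""
    strategy = FIELD_STRATEGY.get(field_name, default)
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r} for field {field_name!r}")
    return strategy
```

Keeping the mapping as data rather than code means the assignments can be reviewed and changed without touching the merge logic.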


Provenance: Knowing Where Your Data Came From

A field-level waterfall that produces golden records without tracking provenance is a black box. You know the current value. You do not know which provider supplied it, when it was last verified, or what the competing values from other providers were.

Provenance tracking means storing metadata alongside every enriched field:

  • Source provider: which provider supplied this value
  • Confidence score: the provider's confidence in this value
  • Timestamp: when this value was last verified by the source
  • Competing values: what other providers returned for this field (stored but not applied)
  • Survivorship rule: which rule determined that this value won

This metadata is critical for two reasons. First, it enables auditability — when a sales rep questions why a contact's phone number is wrong, you can trace it to the specific provider and timestamp. Second, it enables quality feedback loops — if Provider B's phone numbers are consistently wrong, you can adjust your survivorship rules or provider priority order based on evidence rather than guesswork.
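
A provenance record along these lines could be modeled as a small dataclass. The class name, field names, and sample values below are assumptions for illustration; they map one-to-one onto the five metadata items listed above, plus the winning value itself.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class FieldProvenance:
    value: str               # the value that was written
    source_provider: str     # which provider supplied it
    confidence: float        # provider's confidence in the value
    verified_at: datetime    # when the source last verified it
    survivorship_rule: str   # which rule made this value win
    # provider name -> value that was returned but not applied
    competing_values: dict[str, str] = field(default_factory=dict)

# Illustrative record for one enriched phone number.
phone = FieldProvenance(
    value="+1-555-0100",
    source_provider="provider_b",
    confidence=0.92,
    verified_at=datetime(2026, 3, 1, tzinfo=timezone.utc),
    survivorship_rule="highest_confidence",
    competing_values={"provider_a": "+1-555-0199"},
)
audit_row = asdict(phone)  # plain dict, ready to store or log
```

Storing the competing values next to the winner is what makes the later audit possible: if the applied number turns out to be wrong, the rejected alternative is still on file.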


The Economic Layer: Waterfall Cost Considerations

Waterfall architecture and pricing model are independent decisions, but they interact in ways that matter.

In a credit-based waterfall platform, every provider call in the sequence consumes credits. If your waterfall tries three providers before finding a phone number, that is three credits consumed for one field. The credit cost of a waterfall scales with the number of providers and the coverage difficulty of your records. Hard-to-enrich records (non-US, non-tech, small companies) consume more credits because they require more provider attempts before a match is found.
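
To make the credit scaling concrete, the sketch below estimates the expected number of provider calls per record when the cascade stops at the first hit. It assumes one credit per call and independent hit rates; the 40/30/20 rates are reused from the coverage math earlier in this post, and the 10% rates for hard-to-enrich records are invented for illustration.

```python
def expected_calls(hit_rates: list[float]) -> float:
    """Expected provider calls per record when the cascade stops at the first hit.
    Each provider is only called for the share of records every earlier one missed."""
    calls = 0.0
    still_missing = 1.0
    for rate in hit_rates:
        calls += still_missing          # this provider is called for the unresolved share
        still_missing *= (1.0 - rate)   # share still unresolved after this provider
    return calls

easy = expected_calls([0.40, 0.30, 0.20])  # 1 + 0.60 + 0.42 = 2.02 calls
hard = expected_calls([0.10, 0.10, 0.10])  # 1 + 0.90 + 0.81 = 2.71 calls
```

Under these assumptions, the hard-to-enrich segment consumes about a third more calls per record than the easier one across the same three providers, while ending with far fewer populated fields.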

The key consideration is flexibility and control. A platform that lets you bring your own API keys (BYOK) gives you full transparency into which providers you use and how calls are made. You maintain direct relationships with your data providers, avoid vendor lock-in, and retain the ability to swap or add providers as your needs evolve. The orchestration platform handles the intelligence layer — sequencing, merge logic, quality scoring — while you keep control of your provider stack.

Credit-based pricing also creates "enrichment anxiety": the reluctance to run enrichment on your full database because each record costs money, which leads to selective enrichment that defeats the purpose of comprehensive data quality. A BYOK approach reduces this anxiety because you pay providers under your own contracts, without a per-record platform markup, and can see directly what each enrichment run consumes.


Post-Enrichment: Quality Scoring as a Feedback Loop

Enrichment without quality assessment is flying blind. You know fields were populated. You do not know whether your database actually got healthier.

The strongest enrichment architectures integrate quality scoring as a post-enrichment feedback loop:

  1. Pre-enrichment scoring: assess the database to identify coverage gaps by field, geography, and segment
  2. Waterfall enrichment: run the multi-provider sequence to fill gaps
  3. Post-enrichment scoring: re-assess the database to measure improvement
  4. Strategy adjustment: use the delta between pre and post scores to tune provider priority, survivorship rules, and field-level strategies

This feedback loop transforms enrichment from a one-time data append into a continuous quality improvement system. It also provides the reporting that operations leaders need — not "we enriched 12,000 records" but "database completeness improved from 47% to 83%, with the largest gains in direct phone coverage (+34%) and firmographic fields (+28%)."

Without scoring, you cannot answer the most basic question about your enrichment investment: did it work?
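
A minimal sketch of the before/after measurement, assuming completeness is defined simply as the share of records with a non-empty value per field. The function names and the four-record sample database are hypothetical; a production scoring system would also weigh accuracy and freshness.

```python
def completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of records with a non-empty value, per field."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f)) / total for f in fields}

def score_delta(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-field change in completeness between two scoring runs."""
    return {f: round(after[f] - before[f], 4) for f in before}

FIELDS = ["email", "phone"]
before_db = [
    {"email": "a@example.com", "phone": ""},
    {"email": "", "phone": ""},
    {"email": "c@example.com", "phone": "+1-555-0101"},
    {"email": "d@example.com", "phone": ""},
]
after_db = [
    {"email": "a@example.com", "phone": "+1-555-0100"},
    {"email": "b@example.com", "phone": ""},
    {"email": "c@example.com", "phone": "+1-555-0101"},
    {"email": "d@example.com", "phone": "+1-555-0103"},
]
pre = completeness(before_db, FIELDS)
post = completeness(after_db, FIELDS)
delta = score_delta(pre, post)
```

In this toy dataset, email completeness moves from 75% to 100% and phone completeness from 25% to 75%. The per-field delta is the number that feeds step 4, since it shows which fields the current provider order is still failing to fill.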


What to Look for in a Waterfall Enrichment Platform

Not every platform that claims "waterfall enrichment" implements the full architecture described above. Here is the evaluation checklist that separates marketing claims from engineering reality:

1. Provider breadth

How many data providers does the platform support? More critically, can you add new providers without waiting for the platform to build an integration? A plugin-based architecture that supports new provider adapters is more future-proof than a fixed provider list.

2. Field-level merge (not just record-level)

Does the platform evaluate each field independently across providers, or does it use the first provider that returns any result? Ask specifically: "If Provider A returns a job title but no phone number, and Provider B returns a phone number but a different job title, how does the system decide what to write?"

3. Configurable survivorship rules

Can you define different merge strategies for different field types? Or does the platform apply a single "last writer wins" or "first writer wins" rule across all fields?

4. Provenance tracking

Does the platform record which provider supplied each field value, with confidence scores and timestamps? Can you audit a specific field on a specific record to see which providers contributed which values?

5. Quality scoring integration

Does enrichment feed into a quality scoring system that measures the before/after impact? Can you track coverage trends over time, not just per-enrichment results?

6. Pricing transparency

Does the platform separate orchestration fees from data provider costs? Or are you locked into credit-based pricing with markups on underlying provider data?

7. HubSpot-native integration

Does enriched data write directly to HubSpot properties with proper field mapping, or does it require an intermediate sync layer that introduces latency and failure points?


How MarketingSoda Refine Implements Waterfall Enrichment

We built Refine specifically to implement the full waterfall architecture described in this post — not the simplified version, but the field-level merge with configurable survivorship, provenance tracking, and integrated quality scoring.

Six-plus provider adapters with a plugin-based architecture. Refine ships with adapters for the major B2B data providers (Apollo, Cognism, ZoomInfo, Hunter.io, ZeroBounce, PeopleDataLabs) and supports adding new providers through a standardized adapter interface. You are not locked into our provider list.

Field-level merge engine with four survivorship strategies. Every field is evaluated independently. You configure highest_confidence, most_recent, first_provider, or manual_review per field type. The merge engine produces golden records with full provenance metadata.

BYOK flexibility. Bring your own API keys to maintain full control over your provider relationships. Refine handles the orchestration and intelligence — waterfall sequencing, merge logic, quality scoring, HubSpot integration — while you keep direct access to your providers.

Post-enrichment scoring. Every enrichment run triggers a quality score recalculation. You see the before/after impact on completeness, accuracy, and freshness — by field, by segment, by geography. The scoring system identifies remaining gaps and recommends which providers or strategies to adjust for the next enrichment cycle.

HubSpot-native. Enriched data writes directly to HubSpot contact and company properties. Field mappings are configurable. Write-back respects HubSpot property types and validation rules.

6+
data provider adapters in Refine's waterfall engine — with a plugin architecture that supports adding new providers without platform updates

Refine is not an enrichment tool that bolted on quality scoring. And it is not a quality tool that bolted on enrichment. It was designed from the ground up as an integrated system where enrichment and scoring operate as a continuous feedback loop — because that is the architecture that actually produces compounding data quality improvements.

If your current enrichment approach tops out at 40-60% coverage, or if you are paying credit-based markups on data you already have provider contracts for, run a free database audit to see where the gaps are. Or join the Refine waitlist to get early access when we launch.

