Universal Waterfall

Overview

The Universal Waterfall is an intelligent data enrichment system that automatically populates custom field values by querying multiple data sources in a prioritized sequence. When a field value is needed, the system "waterfalls" through configured sources until it finds a value, similar to how a waterfall cascades down through multiple levels.

How It Works

Priority-Based Processing

Each custom field can have multiple data sources configured with priorities. The system processes sources in order (priority 1 first, then 2, etc.).

First Value Wins

Once a data source returns a value for an entity, that entity is marked as "processed" and subsequent sources are skipped for it.

Entity-Level Granularity

Processing happens per-entity (Contact, Account, etc.), meaning different entities can get their values from different sources based on availability.

┌─────────────────────────────────────────────────────────────┐
│                    Universal Waterfall                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Entity needs field value                                  │
│            │                                                │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Priority 1      │──── Value found? ──── YES ──→ Done   │
│   │ (User Input)    │                                       │
│   └────────┬────────┘                                       │
│            │ NO                                             │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Priority 2      │──── Value found? ──── YES ──→ Done   │
│   │ (Salesforce)    │                                       │
│   └────────┬────────┘                                       │
│            │ NO                                             │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Priority 3      │──── Value found? ──── YES ──→ Done   │
│   │ (CRED)          │                                       │
│   └────────┬────────┘                                       │
│            │ NO                                             │
│            ▼                                                │
│        Continue...                                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
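The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the `run_waterfall` function, the tuple shape for a source, and the dictionary-backed lookups are all hypothetical stand-ins for real connectors.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical shape: a source is (priority, name, lookup), where lookup maps
# an entity id to a value or None.
Source = Tuple[int, str, Callable[[str], Optional[str]]]


def run_waterfall(entity_ids: Iterable[str], sources: List[Source]) -> Dict[str, Tuple[str, str]]:
    """Resolve one field for many entities; returns {entity_id: (value, source_name)}."""
    resolved: Dict[str, Tuple[str, str]] = {}
    pending = list(entity_ids)
    # Lowest priority number runs first (priority 1, then 2, ...).
    for _priority, name, lookup in sorted(sources, key=lambda s: s[0]):
        still_pending = []
        for entity_id in pending:
            value = lookup(entity_id)
            if value:
                # First value wins: the entity is "processed" and later
                # sources are skipped for it.
                resolved[entity_id] = (value, name)
            else:
                still_pending.append(entity_id)
        pending = still_pending
        if not pending:
            break
    return resolved


# Usage with in-memory stand-ins for real sources.
user_input = {"c1": "alice@example.com"}
salesforce = {"c1": "old@example.com", "c2": "bob@example.com"}
result = run_waterfall(
    ["c1", "c2", "c3"],
    [(2, "salesforce", salesforce.get), (1, "user_input", user_input.get)],
)
```

Here `c1` takes its value from user input, `c2` falls through to Salesforce, and `c3` stays unresolved because no source has a value for it.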

Supported Data Sources

Source Types

Category             | Examples
---------------------|---------------------------------------------------------------
User Input           | Manual values entered by users (always highest priority)
CRM Integrations     | Salesforce, HubSpot, Microsoft Dynamics, etc.
Enrichment Providers | Apollo, Lusha, Cognism, RocketReach, etc.
Internal Sources     | CRED intelligence, custom fields, imports
Computed Sources     | LLM-generated values, calculations, queries
API Integrations     | Merge.dev and other unified APIs

Configuration

Each workspace can customize:

  • Which sources are enabled/disabled per field
  • Source priority order (drag-and-drop reordering)
  • Processing frequency: On-demand, daily, weekly, monthly, or yearly
  • Auto-run toggle: Whether enrichment runs automatically on entity creation/update
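As a rough mental model, the per-field settings listed above could be represented like this. The class and attribute names (`FieldWaterfallSettings`, `SourceSetting`, `Frequency`) are illustrative assumptions and do not reflect the actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Frequency(Enum):
    ON_DEMAND = "on_demand"
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    YEARLY = "yearly"


@dataclass
class SourceSetting:
    source_id: str
    priority: int
    enabled: bool = True


@dataclass
class FieldWaterfallSettings:
    field_name: str
    frequency: Frequency = Frequency.ON_DEMAND
    auto_run: bool = False
    sources: List[SourceSetting] = field(default_factory=list)

    def active_sources(self) -> List[SourceSetting]:
        """Enabled sources only, in the order the waterfall would try them."""
        return sorted((s for s in self.sources if s.enabled), key=lambda s: s.priority)

    def reorder(self, ordered_ids: List[str]) -> None:
        """Reassign priorities 1..n from a new ordering (e.g. after a drag-and-drop)."""
        by_id = {s.source_id: s for s in self.sources}
        if sorted(ordered_ids) != sorted(by_id):
            raise ValueError("ordering must list every configured source exactly once")
        for position, source_id in enumerate(ordered_ids, start=1):
            by_id[source_id].priority = position
```

Disabled sources keep their position in the ordering, so re-enabling one later restores it to the same place in the sequence.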

Default Configuration

New workspaces receive default waterfall configurations from the CustomFieldWaterfallConfig table. This ensures standard fields (phone, email, address, etc.) are pre-configured with sensible defaults.

The SYNC_WATERFALL_CONFIG scheduled job can propagate new default sources to existing workspaces (added as disabled to preserve user settings).
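The merge rule described above (add new defaults, leave existing settings untouched) might look like the following sketch. The `sync_defaults` function and the dictionary shape are hypothetical, and appending new sources at the lowest priority is an assumption made for illustration; the actual job may place them differently.

```python
from typing import Dict, List


def sync_defaults(workspace_sources: List[Dict], default_source_ids: List[str]) -> List[Dict]:
    """Return workspace sources plus any missing defaults, added as disabled."""
    merged = [dict(s) for s in workspace_sources]  # copy; never mutate user settings
    known = {s["source_id"] for s in merged}
    # Assumption: new sources go after everything the workspace already has.
    next_priority = max((s["priority"] for s in merged), default=0) + 1
    for source_id in default_source_ids:
        if source_id in known:
            continue  # existing entry keeps its priority and enabled flag
        merged.append({"source_id": source_id, "priority": next_priority, "enabled": False})
        known.add(source_id)
        next_priority += 1
    return merged


workspace = [
    {"source_id": "salesforce", "priority": 1, "enabled": True},
    {"source_id": "apollo", "priority": 2, "enabled": False},
]
synced = sync_defaults(workspace, ["salesforce", "apollo", "cognism"])
```

Running the merge twice with the same defaults yields the same result, which is the property a scheduled job needs.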


Key Behaviors

Behavior               | Description
-----------------------|-------------------------------------------------------------
Non-destructive        | User input always takes precedence; automatic enrichment never overwrites manual values
Credit-aware           | Processing checks available credits before starting enrichment
Concurrent processing  | Multiple fields and entities are processed in parallel for performance
Idempotent             | Re-running enrichment only fills missing values and does not duplicate work
Rate-limit coordinated | Outbound vendor calls are paced by the shared model-api rate limiter. Bulk runs wait within budget and drain at the vendor cap rather than dropping. See "Rate limiting, bulk & resilience".
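To make the credit-aware and idempotent behaviors concrete, here is a small sketch of how a run could decide which entities to enrich. The `plan_enrichment` helper, the one-credit-per-entity cost, and capping the batch at the available credits are all assumptions for illustration; the source only states that credits are checked before enrichment starts.

```python
from typing import Dict, List


def plan_enrichment(entities: List[Dict], field_name: str,
                    credits_available: int, cost_per_entity: int = 1) -> List[str]:
    """Return ids of entities to enrich: only those missing the field, within credits."""
    if credits_available <= 0:
        return []  # credit gate: nothing starts without credits
    # Idempotent and non-destructive: entities that already have a value are skipped.
    missing = [e["id"] for e in entities if not e.get(field_name)]
    affordable = credits_available // cost_per_entity  # assumed capping rule
    return missing[:affordable]


contacts = [
    {"id": "c1", "email": "alice@example.com"},
    {"id": "c2", "email": None},
    {"id": "c3"},
]
to_enrich = plan_enrichment(contacts, "email", credits_available=5)
```

Re-running the plan after `c2` and `c3` have been filled returns an empty list, so a second run does no extra work.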

How email enrichment actually runs

The product behavior above is implemented by a concrete request → generate engine in commercial-api. The enrichPersonContacts mutation flows into RequestPersonBillableContactsUseCase, which handles person resolution, the blacklist check, the credit gate, and the sync vs. async decision. That in turn flows into GeneratePersonBillableContactsUseCase, which runs the email/phone waterfall, free ZeroBounce validation, the catch-all-aware stop predicate, employer-aware ranking, and dedup plus persistence to verified/unverifiedEmails. The full pipeline, including how emails are validated, tiered, ranked, and deduped, is documented in Email Enrichment Pipeline.

Rate limiting, bulk & resilience

For large or bulk enrichment runs, the per-vendor rate limits, request batching (e.g., Apollo bulk_match, Cognism redeem), 429 backpressure, retry budget, and header-driven self-calibration are documented in Enrichment Rate Limiting, Bulk & Resilience. Key point for operators: a rate-limited contact is now retried or held until capacity frees up, and is surfaced through a "Provider enrichment skipped {reason}" log and the cred_enrichment_provider_skipped_total metric, rather than being silently dropped.
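The general pattern of 429 backpressure with a bounded retry budget can be sketched as follows. This is a generic illustration, not the shared rate limiter itself: `RateLimited`, `call_with_backpressure`, the default budgets, and the reason strings are all hypothetical.

```python
import time
from typing import Any, Callable, Tuple


class RateLimited(Exception):
    """Raised by a (hypothetical) vendor client on HTTP 429."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_backpressure(call: Callable[[], Any], max_retries: int = 3,
                           wait_budget: float = 30.0,
                           sleep: Callable[[float], None] = time.sleep) -> Tuple[str, Any]:
    """Return ("ok", result), or ("skipped", reason) once a budget is exhausted."""
    waited = 0.0
    for attempt in range(max_retries + 1):
        try:
            return ("ok", call())
        except RateLimited as exc:
            if attempt == max_retries:
                return ("skipped", "retry_budget_exhausted")
            if waited + exc.retry_after > wait_budget:
                return ("skipped", "wait_budget_exhausted")
            # Wait as the vendor asked, then try again instead of dropping.
            sleep(exc.retry_after)
            waited += exc.retry_after
    return ("skipped", "retry_budget_exhausted")
```

Returning an explicit skipped outcome with a reason, instead of raising or returning nothing, is what lets the caller log and count the skip rather than lose the contact silently.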

Issue                    | Check                                    | Resolution
-------------------------|------------------------------------------|------------------------------------------------
No enrichment running    | Credit balance, source availability      | Verify credits > 0, check provider status
Provider failures        | Enrichment logs, provider health metrics | Disable failing source, adjust priority order
Config validation errors | Database constraints, source references  | Verify source IDs exist, check priority uniqueness
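The two checks named for config validation errors (source IDs exist, priorities are unique) can be expressed as a short pre-flight function. This is a sketch under assumed data shapes; `validate_sources` and the dictionary keys are hypothetical and are not the actual database constraints.

```python
from typing import Dict, List, Set


def validate_sources(sources: List[Dict], known_source_ids: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config is valid."""
    errors: List[str] = []
    seen_priorities: Dict[int, str] = {}
    for s in sources:
        source_id, priority = s["source_id"], s["priority"]
        if source_id not in known_source_ids:
            errors.append(f"unknown source: {source_id}")
        if priority in seen_priorities:
            errors.append(
                f"duplicate priority {priority}: {seen_priorities[priority]} and {source_id}"
            )
        else:
            seen_priorities[priority] = source_id
    return errors


problems = validate_sources(
    [
        {"source_id": "salesforce", "priority": 1},
        {"source_id": "apollo", "priority": 1},
        {"source_id": "ghost", "priority": 2},
    ],
    known_source_ids={"salesforce", "apollo"},
)
```

Collecting every problem instead of stopping at the first one gives an operator the full list to fix in a single pass.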