Universal Waterfall

Overview

The Universal Waterfall is an intelligent data enrichment system that automatically populates custom field values by querying multiple data sources in a prioritized sequence. When a field value is needed, the system "waterfalls" through configured sources until it finds a value, similar to how a waterfall cascades down through multiple levels.

How It Works

Priority-Based Processing

Each custom field can have multiple data sources configured with priorities. The system processes sources in order (priority 1 first, then 2, etc.).

First Value Wins

Once a data source returns a value for an entity, that entity is marked as "processed" and subsequent sources are skipped for it.
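For a single entity and field, the priority ordering and the first-value-wins rule can be sketched as follows. This is a minimal TypeScript illustration under stated assumptions, not the production implementation. `WaterfallSource`, its synchronous `fetch` callback, and `resolveField` are hypothetical names. Real sources would be asynchronous vendor or database calls.

```typescript
// Hypothetical source shape: priority 1 is tried first; fetch returns a value
// for the entity, or null when this source has nothing.
interface WaterfallSource {
  name: string;
  priority: number;
  enabled: boolean;
  fetch: (entityId: string) => string | null;
}

interface Resolution {
  value: string | null;
  source: string | null; // which source supplied the value, if any
  tried: string[]; // sources queried, in order
}

// Walk enabled sources in ascending priority order. The first non-empty value
// wins, and every lower-priority source is skipped for this entity.
function resolveField(entityId: string, sources: WaterfallSource[]): Resolution {
  const ordered = sources
    .filter((s) => s.enabled)
    .sort((a, b) => a.priority - b.priority);
  const tried: string[] = [];
  for (const source of ordered) {
    tried.push(source.name);
    const value = source.fetch(entityId);
    if (value !== null && value !== "") {
      return { value, source: source.name, tried };
    }
  }
  return { value: null, source: null, tried };
}

// Usage: user input is empty, so Salesforce supplies the value and CRED is never queried.
const result = resolveField("contact-1", [
  { name: "CRED", priority: 3, enabled: true, fetch: () => "+1 555 0100" },
  { name: "User Input", priority: 1, enabled: true, fetch: () => null },
  { name: "Salesforce", priority: 2, enabled: true, fetch: () => "+1 555 0199" },
]);
console.log(result.source, result.tried.join(" > ")); // Salesforce User Input > Salesforce
```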

Entity-Level Granularity

Processing happens per-entity (Contact, Account, etc.), meaning different entities can get their values from different sources based on availability.
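Because processing is per entity, a batch run only needs to send each source the entities that are still unresolved. The sketch below assumes a hypothetical `fetchMany` batch lookup on each source. `BatchSource` and `runBatchWaterfall` are illustrative names and are not part of the product's API.

```typescript
// Hypothetical batch lookup: given entity IDs, return values for the ones this
// source can fill. Entities it knows nothing about are simply absent.
interface BatchSource {
  name: string;
  priority: number;
  fetchMany: (entityIds: string[]) => Map<string, string>;
}

interface FieldValue {
  value: string;
  source: string;
}

// Run the waterfall for many entities at once. Each source is asked only about
// entities that no higher-priority source has filled, so different entities
// can end up with values from different sources.
function runBatchWaterfall(entityIds: string[], sources: BatchSource[]): Map<string, FieldValue> {
  const resolved = new Map<string, FieldValue>();
  const ordered = [...sources].sort((a, b) => a.priority - b.priority);
  for (const source of ordered) {
    const pending = entityIds.filter((id) => !resolved.has(id));
    if (pending.length === 0) break; // every entity is already processed
    const found = source.fetchMany(pending);
    for (const id of pending) {
      const value = found.get(id);
      if (value !== undefined && value !== "") {
        resolved.set(id, { value, source: source.name }); // mark entity as processed
      }
    }
  }
  return resolved;
}

// Usage: record which IDs each source was asked about.
const calls: string[][] = [];
const out = runBatchWaterfall(["a", "b", "c"], [
  { name: "Apollo", priority: 3, fetchMany: (ids) => { calls.push(ids); return new Map([["b", "other@acme.com"], ["c", "c@acme.com"]]); } },
  { name: "User Input", priority: 1, fetchMany: (ids) => { calls.push(ids); return new Map([["a", "a@acme.com"]]); } },
  { name: "Salesforce", priority: 2, fetchMany: (ids) => { calls.push(ids); return new Map([["b", "b@acme.com"]]); } },
]);
// a <- User Input, b <- Salesforce, c <- Apollo; Apollo was only asked about "c".
```

In this sketch, a source never sees an entity that a higher-priority source already filled. This is what lets two entities in the same run get their values from different sources.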

┌─────────────────────────────────────────────────────────────┐
│                     Universal Waterfall                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Entity needs field value                                  │
│            │                                                │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Priority 1      │──── Value found? ──── YES ──→ Done    │
│   │ (User Input)    │                                       │
│   └────────┬────────┘                                       │
│            │ NO                                             │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Priority 2      │──── Value found? ──── YES ──→ Done    │
│   │ (Salesforce)    │                                       │
│   └────────┬────────┘                                       │
│            │ NO                                             │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Priority 3      │──── Value found? ──── YES ──→ Done    │
│   │ (CRED)          │                                       │
│   └────────┬────────┘                                       │
│            │ NO                                             │
│            ▼                                                │
│        Continue...                                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Supported Data Sources

Source Types

Category	Examples
User Input	Manual values entered by users (always highest priority)
CRM Integrations	Salesforce, HubSpot, Microsoft Dynamics, etc.
Enrichment Providers	Apollo, Lusha, Cognism, RocketReach, etc.
Internal Sources	CRED intelligence, custom fields, imports
Computed Sources	LLM-generated values, calculations, queries
API Integrations	Merge.dev and other unified APIs
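The table states that user input always takes the highest priority. That rule can be enforced as an ordering constraint that holds no matter what priority numbers are configured. The category identifiers and `processingOrder` below are hypothetical. This is one way to enforce the rule, shown for illustration only.

```typescript
// Category identifiers mirroring the table above (illustrative names).
type SourceCategory =
  | "user_input"
  | "crm"
  | "enrichment_provider"
  | "internal"
  | "computed"
  | "api_integration";

interface ConfiguredSource {
  id: string;
  category: SourceCategory;
  priority: number;
}

// Order sources for processing. User input is pinned ahead of every other
// category regardless of its configured priority; the rest keep priority order.
function processingOrder(sources: ConfiguredSource[]): ConfiguredSource[] {
  return [...sources].sort((a, b) => {
    const aUser = a.category === "user_input" ? 0 : 1;
    const bUser = b.category === "user_input" ? 0 : 1;
    return aUser - bUser || a.priority - b.priority;
  });
}

// Usage: even with priority 5, the manual value is checked first.
const order = processingOrder([
  { id: "apollo", category: "enrichment_provider", priority: 1 },
  { id: "salesforce", category: "crm", priority: 2 },
  { id: "manual", category: "user_input", priority: 5 },
]).map((s) => s.id);
console.log(order.join(", ")); // manual, apollo, salesforce
```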

Configuration

Each workspace can customize:

Enabled sources: which sources are enabled or disabled for each field
Priority order: the order in which sources are tried (drag-and-drop reordering)
Processing frequency: on-demand, daily, weekly, monthly, or yearly
Auto-run toggle: whether enrichment runs automatically when an entity is created or updated
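These per-field options fit naturally into a small settings object. The names in this sketch are hypothetical (`FieldWaterfallSettings`, `isDue`, `shouldRunOnChange`). It also approximates a month as 30 days and a year as 365 days, and the real scheduler may compute intervals differently.

```typescript
type Frequency = "on_demand" | "daily" | "weekly" | "monthly" | "yearly";

// Hypothetical per-field settings mirroring the options listed above.
interface FieldWaterfallSettings {
  fieldKey: string;
  sources: { sourceId: string; enabled: boolean; priority: number }[];
  frequency: Frequency;
  autoRun: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Approximate interval per frequency (a month is treated as 30 days, a year as
// 365); on-demand fields are never due on a schedule.
const INTERVAL_MS: Record<Frequency, number | null> = {
  on_demand: null,
  daily: DAY_MS,
  weekly: 7 * DAY_MS,
  monthly: 30 * DAY_MS,
  yearly: 365 * DAY_MS,
};

// Should a scheduled run re-process this field? lastRunAt is null if it never ran.
function isDue(settings: FieldWaterfallSettings, lastRunAt: Date | null, now: Date): boolean {
  const interval = INTERVAL_MS[settings.frequency];
  if (interval === null) return false;
  if (lastRunAt === null) return true;
  return now.getTime() - lastRunAt.getTime() >= interval;
}

// Should creating or updating an entity trigger enrichment for this field right away?
function shouldRunOnChange(settings: FieldWaterfallSettings): boolean {
  return settings.autoRun && settings.sources.some((s) => s.enabled);
}

// Usage
const phone: FieldWaterfallSettings = {
  fieldKey: "phone",
  sources: [
    { sourceId: "salesforce", enabled: true, priority: 1 },
    { sourceId: "apollo", enabled: false, priority: 2 },
  ],
  frequency: "weekly",
  autoRun: true,
};
const now = new Date("2024-06-15T00:00:00Z");
console.log(isDue(phone, new Date("2024-06-10T00:00:00Z"), now)); // false: 5 days < 7
console.log(isDue(phone, new Date("2024-06-01T00:00:00Z"), now)); // true: 14 days
```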

Default Configuration

New workspaces receive default waterfall configurations from the CustomFieldWaterfallConfig table. This ensures standard fields (phone, email, address, etc.) are pre-configured with sensible defaults.

The SYNC_WATERFALL_CONFIG scheduled job can propagate new default sources to existing workspaces. Propagated sources are added as disabled, so existing user settings are preserved.
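The sync rule (add new defaults, but disabled) can be sketched as a merge that never modifies existing entries. Several details here are assumptions: new sources are appended after the current lowest priority, and `SourceSetting` and `syncDefaults` are hypothetical names. The actual job reads from the CustomFieldWaterfallConfig table, not from in-memory arrays.

```typescript
interface SourceSetting {
  sourceId: string;
  enabled: boolean;
  priority: number;
}

// Merge a field's default sources into a workspace's existing list.
// Existing entries keep their enabled flag and priority untouched; each default
// the workspace lacks is appended after the current last priority, disabled,
// so enrichment behavior does not change until someone turns it on.
function syncDefaults(existing: SourceSetting[], defaults: SourceSetting[]): SourceSetting[] {
  const known = new Set(existing.map((s) => s.sourceId));
  let nextPriority = existing.reduce((max, s) => Math.max(max, s.priority), 0) + 1;
  const added: SourceSetting[] = [];
  for (const d of [...defaults].sort((a, b) => a.priority - b.priority)) {
    if (known.has(d.sourceId)) continue;
    known.add(d.sourceId); // guard against duplicates in the defaults list
    added.push({ sourceId: d.sourceId, enabled: false, priority: nextPriority++ });
  }
  return [...existing, ...added];
}

// Usage: the workspace disabled Apollo; Lusha is a new default.
const merged = syncDefaults(
  [
    { sourceId: "user_input", enabled: true, priority: 1 },
    { sourceId: "apollo", enabled: false, priority: 2 },
  ],
  [
    { sourceId: "user_input", enabled: true, priority: 1 },
    { sourceId: "apollo", enabled: true, priority: 2 },
    { sourceId: "lusha", enabled: true, priority: 3 },
  ],
);
// merged: user_input (on, 1), apollo (still off, 2), lusha (off, 3)
```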

Key Behaviors

Behavior	Description
Non-destructive	User input always takes precedence; automatic enrichment never overwrites manual values
Credit-aware	Processing checks available credits before starting enrichment
Concurrent processing	Multiple fields and entities are processed in parallel for performance
Idempotent	Re-running enrichment only fills missing values and does not duplicate work
Rate-limit coordinated	Outbound vendor calls are paced by the shared model-api rate limiter: bulk runs wait within budget and drain at the vendor cap instead of dropping requests. See Rate limiting, bulk & resilience below.
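The non-destructive, idempotent, and credit-aware behaviors combine into a simple guard around each run. In this sketch, `FieldValues`, `missingFields`, `canStart`, and `applyEnrichment` are hypothetical names. The credit check is deliberately simplified to the "credits > 0" condition from the issue table at the end of this page and does not model any real per-field pricing.

```typescript
// Hypothetical stored value; its origin records whether a user or enrichment set it.
interface StoredValue {
  value: string;
  origin: "user" | "enrichment";
}
type FieldValues = Map<string, StoredValue>;

// Only fields with no stored value need enrichment. Anything already present,
// whether user-entered or previously enriched, is skipped, so re-runs are idempotent.
function missingFields(current: FieldValues, requested: string[]): string[] {
  return requested.filter((key) => !current.has(key));
}

// Credit gate, simplified to "there is work to do and the balance is positive".
function canStart(missing: string[], creditBalance: number): boolean {
  return missing.length > 0 && creditBalance > 0;
}

// Merge results without ever replacing an existing value (user input included).
function applyEnrichment(current: FieldValues, results: Map<string, string>): FieldValues {
  const next = new Map(current);
  results.forEach((value, key) => {
    if (!next.has(key)) next.set(key, { value, origin: "enrichment" });
  });
  return next;
}

// Usage: the user's phone survives; only the missing email is filled.
const before: FieldValues = new Map();
before.set("phone", { value: "+1 555 0100", origin: "user" });
const todo = missingFields(before, ["phone", "email"]); // ["email"]
const after = canStart(todo, 10)
  ? applyEnrichment(before, new Map([["phone", "+1 555 0199"], ["email", "jo@acme.com"]]))
  : before;
```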

How email enrichment actually runs

The product behavior above is implemented by a concrete request → generate engine in commercial-api: the enrichPersonContacts mutation → RequestPersonBillableContactsUseCase (person resolution, blacklist, credit gate, sync vs async) → GeneratePersonBillableContactsUseCase (the email/phone waterfall, free ZeroBounce validation, catch-all-aware stop predicate, employer-aware ranking, dedup + persistence to verified/unverifiedEmails). That full pipeline — including how emails are validated, tiered, ranked, and deduped — is documented in Email Enrichment Pipeline.
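A simplified model can illustrate a catch-all-aware stop predicate and the dedup into verifiedEmails and unverifiedEmails. Everything in it is an assumption: the four-value `EmailStatus`, the rule that only a `valid` result stops the waterfall, lowercasing addresses for dedup, and the helper names `shouldStop` and `partitionEmails`. Employer-aware ranking is omitted. Email Enrichment Pipeline documents the actual tiers and ordering.

```typescript
// Simplified, assumed validation outcome; the real pipeline's tiers may differ.
type EmailStatus = "valid" | "catch-all" | "invalid" | "unknown";

interface EmailCandidate {
  address: string;
  status: EmailStatus;
  provider: string;
}

// Assumed stop rule: a confirmed-valid address ends the waterfall. A catch-all
// result does not, because a catch-all domain accepts any mailbox and so cannot
// confirm the address; later providers still get a chance.
function shouldStop(found: EmailCandidate[]): boolean {
  return found.some((c) => c.status === "valid");
}

// Drop invalid results, dedupe case-insensitively (first report wins), then
// split into verified and unverified lists.
function partitionEmails(found: EmailCandidate[]): { verifiedEmails: string[]; unverifiedEmails: string[] } {
  const seen = new Set<string>();
  const verifiedEmails: string[] = [];
  const unverifiedEmails: string[] = [];
  for (const c of found) {
    if (c.status === "invalid") continue;
    const key = c.address.trim().toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    (c.status === "valid" ? verifiedEmails : unverifiedEmails).push(key);
  }
  return { verifiedEmails, unverifiedEmails };
}

// Usage: each inner array is one provider's results, in priority order.
const providerResults: EmailCandidate[][] = [
  [{ address: "Jo@Acme.com", status: "catch-all", provider: "apollo" }],
  [
    { address: "jo@acme.com", status: "catch-all", provider: "lusha" },
    { address: "j.doe@acme.io", status: "valid", provider: "lusha" },
  ],
  [{ address: "jdoe@acme.com", status: "valid", provider: "cognism" }],
];
const found: EmailCandidate[] = [];
for (const batch of providerResults) {
  found.push(...batch);
  if (shouldStop(found)) break; // cognism is never reached
}
const { verifiedEmails, unverifiedEmails } = partitionEmails(found);
// verifiedEmails: ["j.doe@acme.io"], unverifiedEmails: ["jo@acme.com"]
```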

Rate limiting, bulk & resilience

For large or bulk enrichment runs, the per-vendor rate limits, request batching (e.g. Apollo bulk_match, Cognism redeem), 429 backpressure, retry budget, and header-driven self-calibration are documented in Enrichment Rate Limiting, Bulk & Resilience. Key point for operators: a rate-limited contact is now retried or made to wait rather than silently dropped, and it is surfaced as a Provider enrichment skipped {reason} log and the cred_enrichment_provider_skipped_total metric.
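The key difference from a simple reject-on-limit limiter is that requests wait instead of being dropped. The sketch below is a generic token bucket with an injected clock, plus a bounded 429 retry budget that honors a seconds-valued HTTP `Retry-After` header. It is not the model-api limiter. `TokenBucket`, `backoffDelayMs`, the 500 ms base delay, and the retry budget of 4 are all hypothetical.

```typescript
// Token bucket with an injected clock, so behavior is deterministic. Instead of
// rejecting a request when the bucket is empty, reserve() returns how long the
// caller must wait; queued requests then drain at the sustained rate.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, now: number) {
    this.capacity = capacity; // burst size
    this.refillPerSec = refillPerSec; // sustained vendor cap
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Reserve one token; returns the delay in ms before the request may be sent.
  reserve(now: number): number {
    const elapsedSec = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = Math.max(this.lastRefill, now);
    this.tokens -= 1; // may go negative: the deficit is the queue ahead of this request
    return this.tokens >= 0 ? 0 : (-this.tokens / this.refillPerSec) * 1000;
  }
}

// Delay before retrying after a 429, or null once the retry budget is spent
// (the point where a contact would be reported as skipped, not silently dropped).
function backoffDelayMs(attempt: number, retryAfterSec: number | null, baseMs = 500, maxRetries = 4): number | null {
  if (attempt >= maxRetries) return null;
  if (retryAfterSec !== null) return retryAfterSec * 1000; // the vendor's hint wins
  return baseMs * 2 ** attempt; // 500, 1000, 2000, 4000 ms
}

// Usage: burst of 2, then 1 request per second.
const bucket = new TokenBucket(2, 1, 0);
const delays = [0, 0, 0, 0].map((t) => bucket.reserve(t));
console.log(delays.join(",")); // 0,0,1000,2000
```

Returning a delay instead of a boolean keeps every request in the queue. Only an exhausted retry budget ends an attempt.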

Troubleshooting

Issue	Check	Resolution
No enrichment running	Credit balance, source availability	Verify credits > 0, check provider status
Provider failures	Enrichment logs, provider health metrics	Disable failing source, adjust priority order
Config validation errors	Database constraints, source references	Verify source IDs exist, check priority uniqueness
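The config validation row can be sketched as a check that runs before a configuration is saved. The two checks from the table, that source IDs exist and that priorities are unique, are implemented directly. Requiring priorities to be positive integers is an added assumption. `SourceEntry` and `validateWaterfall` are hypothetical names.

```typescript
interface SourceEntry {
  sourceId: string;
  priority: number;
  enabled: boolean;
}

// Return readable problems with one field's waterfall configuration:
// unknown source IDs, duplicate sources, and invalid or duplicate priorities.
function validateWaterfall(entries: SourceEntry[], knownSourceIds: Set<string>): string[] {
  const errors: string[] = [];
  const seenSources = new Set<string>();
  const seenPriorities = new Map<number, string>();
  for (const e of entries) {
    if (!knownSourceIds.has(e.sourceId)) {
      errors.push(`unknown source "${e.sourceId}"`);
    }
    if (seenSources.has(e.sourceId)) {
      errors.push(`source "${e.sourceId}" listed twice`);
    }
    seenSources.add(e.sourceId);
    if (!Number.isInteger(e.priority) || e.priority < 1) {
      errors.push(`source "${e.sourceId}" has invalid priority ${e.priority}`);
    }
    const holder = seenPriorities.get(e.priority);
    if (holder !== undefined) {
      errors.push(`priority ${e.priority} used by both "${holder}" and "${e.sourceId}"`);
    } else {
      seenPriorities.set(e.priority, e.sourceId);
    }
  }
  return errors;
}

// Usage: "lusha" is not a known source and reuses priority 2.
const known = new Set(["user_input", "salesforce", "apollo"]);
const problems = validateWaterfall(
  [
    { sourceId: "user_input", priority: 1, enabled: true },
    { sourceId: "salesforce", priority: 2, enabled: true },
    { sourceId: "lusha", priority: 2, enabled: false },
  ],
  known,
);
console.log(problems.join("\n"));
```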