Email Enrichment Pipeline (Request → Generate Billable Contacts)

How CRED turns a "discover this person's email" request into ranked, validated, deduped email contacts. This is the engine behind email (and phone) enrichment: the request/validation/ranking logic that sits on top of the individual vendors.

Where this lives

All of this is in commercial-api (cred-api-commercial), under src/domain/person/usecase/. The waterfall calls model-api per provider; model-api owns the outbound vendor calls and rate limiting.

The three layers

enrichPersonContacts (GraphQL mutation)
        │
        ▼
RequestPersonBillableContactsUseCase     ← resolve persons, blacklist, credit check, sync vs async
        │
        ▼
GeneratePersonBillableContactsUseCase    ← the waterfall engine: providers → validation → ranking → persist
        │
        ▼
per-provider helpers  →  model-api  →  external vendors (Cognism, Apollo, Lusha, …)

| Layer | File | Responsibility |
| --- | --- | --- |
| Resolver | src/graphql-api/person/resolvers/enriched-contact-resolver.ts | enrichPersonContacts mutation; thin, delegates to the request use case |
| Request | src/domain/person/usecase/request-person-billable-contacts.ts | Resolve the person set, blacklist filter, credit check, route sync/async |
| Generate | src/domain/person/usecase/generate-person-billable-contacts.ts | Run the waterfall, validate emails, rank, dedupe, persist to custom fields |

1. Entry point: enrichPersonContacts

A single @ExposedOperation mutation (display name "Discover Person Emails / Phones", exposed as WORKFLOW_ACTION / MCP_TOOL / REST, tagged internal on the federation graph) covers both email and phone enrichment.

Arguments:

| Arg | Type | Meaning |
| --- | --- | --- |
| format | ContactFormat (default EMAIL) | EMAIL or PHONE |
| personIds | [Int] | Explicit person IDs to enrich |
| collectionId | Int | Enrich every person in a collection |
| sequenceId | Int | Enrich every person enrolled in a sequence |
| noEmailsOnly | Boolean | Skip persons that already have an email/verified contact |
| personSearchFilters | InputPersonsSearchFilters | Search-based selection (combinable with collectionId) |

The resolver stamps a TriggeringEventInfo (CONTACTS_ENRICHMENT, random triggeringEventKey) onto the context and calls context.usecases.requestPersonBillableContacts.execute(...). That key threads through every feature-log row written downstream, so a single enrichment run is traceable.

requestProfileRetrieval(personId) is a thin sibling mutation that calls the same request use case with format: EMAIL for a single person (used for private LinkedIn profile retrieval).


2. Request layer: RequestPersonBillableContactsUseCase

Extends UserUseCase (authenticated). Three jobs: resolve, gate, route.

Person resolution (resolveRawPersonIds)

Exactly one selection mode wins, in this precedence:

  1. sequenceId → person IDs of all ACTIVE / PAUSED / PENDING enrollments (deduped).
  2. collectionId + noEmailsOnly → collection items that currently have no email.
  3. collectionId + personSearchFilters → search within the collection (collection_size_limit cap).
  4. collectionId alone → every person in the collection.
  5. personIds + noEmailsOnly → drop person IDs that already have a tenant contact with any email (email / emails / importEmails).
  6. personIds alone → as supplied.
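
The precedence above can be sketched as a single first-match function. This is a minimal illustration, not the real resolveRawPersonIds: the SelectionInput shape, the SelectionMode labels, and the "NONE" fallback for an empty request are all assumptions made for the example.

```typescript
// Hypothetical input shape; field names mirror the mutation arguments.
interface SelectionInput {
  sequenceId?: number;
  collectionId?: number;
  personIds?: number[];
  noEmailsOnly?: boolean;
  personSearchFilters?: Record<string, unknown>;
}

// Hypothetical labels for the six documented modes, plus an assumed "NONE".
type SelectionMode =
  | "SEQUENCE"
  | "COLLECTION_NO_EMAILS"
  | "COLLECTION_SEARCH"
  | "COLLECTION_ALL"
  | "PERSON_IDS_NO_EMAILS"
  | "PERSON_IDS"
  | "NONE";

// Returns the first matching mode, checked in the documented precedence order.
function pickSelectionMode(input: SelectionInput): SelectionMode {
  if (input.sequenceId != null) return "SEQUENCE";
  if (input.collectionId != null) {
    if (input.noEmailsOnly) return "COLLECTION_NO_EMAILS";
    if (input.personSearchFilters) return "COLLECTION_SEARCH";
    return "COLLECTION_ALL";
  }
  if (input.personIds && input.personIds.length > 0) {
    return input.noEmailsOnly ? "PERSON_IDS_NO_EMAILS" : "PERSON_IDS";
  }
  return "NONE"; // assumption: the document does not say what an empty request does
}
```

Because the checks return early, a request that supplies both sequenceId and collectionId is treated as a sequence request, and the other arguments are ignored.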

The resolved set is then run through filterBlacklistedEntityIds (EntityTypeEnum.PERSON) so blacklisted persons never reach a paid vendor call.

Credit gate (validateInput)

  • Parent feature is FEATURE_CONTACT_WATERFALL (email) or FEATURE_CONTACT_PHONE_WATERFALL (phone).
  • Loads the enabled child datasource features under that parent; if none → ValidationError (Pricing is not configured…, HTTP 400).
  • maxPrice = max(child feature amounts).
  • Unless the user isSupport, requires remainingCredits >= maxPrice × personCount, else ValidationError (HTTP 429, Insufficient credits…).
  • If the parent feature is disabled β†’ ValidationError (Feature is disabled.).
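
A minimal sketch of the gate arithmetic follows. It returns a result object instead of throwing ValidationError so that it stays self-contained; the GateInput and GateResult shapes, the function name, and the order in which the checks run are assumptions, and the truncated messages are copied as they appear above.

```typescript
interface ChildFeature {
  id: string;
  amount: number;
  enabled: boolean;
}

// Hypothetical input bundle for the sketch.
interface GateInput {
  parentEnabled: boolean;
  children: ChildFeature[];
  remainingCredits: number;
  personCount: number;
  isSupport: boolean;
}

type GateResult =
  | { ok: true; maxPrice: number; requiredCredits: number }
  | { ok: false; message: string; httpStatus?: number };

function checkCreditGate(input: GateInput): GateResult {
  // Assumption: the disabled-parent check runs first; the document does not fix the order.
  if (!input.parentEnabled) {
    return { ok: false, message: "Feature is disabled." };
  }
  const enabled = input.children.filter((c) => c.enabled);
  if (enabled.length === 0) {
    return { ok: false, message: "Pricing is not configured…", httpStatus: 400 };
  }
  // Upper bound: price every person at the most expensive enabled datasource.
  const maxPrice = Math.max(...enabled.map((c) => c.amount));
  const requiredCredits = maxPrice * input.personCount;
  if (!input.isSupport && input.remainingCredits < requiredCredits) {
    return { ok: false, message: "Insufficient credits…", httpStatus: 429 };
  }
  return { ok: true, maxPrice, requiredCredits };
}
```

For example, with enabled child amounts of 2 and 5 and ten persons, the gate requires 5 × 10 = 50 remaining credits, even though the cheaper provider may end up resolving most of them.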

Credit cost is upper-bounded at the gate, charged per-provider later

The gate reserves against the most expensive datasource. Actual charges are written per-person-per-provider as the waterfall progresses (see feature logging below), and reduced for BYOK customers.

Sync vs async routing (innerExecute)

| Person count | Path | Returns |
| --- | --- | --- |
| 1 | Calls GeneratePersonBillableContactsUseCase.execute() inline, unwraps .contacts | EnrichedPersonContactData[] |
| >1 | Enqueues GENERATE_ENRICHED_CONTACTS worker task via lazyPushTaskFromWeb | null (client polls the subscription) |

The single-person path discards the partial-success warning (the synchronous GraphQL surface has no field for it); the async path forwards it on the CONTACTS_ENRICHMENT_COMPLETED PubSub payload.
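
The routing decision can be illustrated with injected dependencies. This sketch is synchronous for brevity (the real use case is presumably async), and RoutingDeps, routeEnrichment, and the empty-array result for zero persons are assumptions, not the actual innerExecute.

```typescript
interface EnrichedContact {
  personId: number;
  value: string;
}

interface GenerateResult {
  contacts: EnrichedContact[];
  warning?: string | undefined;
}

// Hypothetical stand-ins for the generate use case and the worker queue.
interface RoutingDeps {
  generate: (personIds: number[]) => GenerateResult;
  enqueue: (taskName: string, personIds: number[]) => void;
}

function routeEnrichment(personIds: number[], deps: RoutingDeps): EnrichedContact[] | null {
  if (personIds.length === 0) return []; // assumption: not covered by the table
  if (personIds.length === 1) {
    // Sync path: unwrap .contacts; the warning is dropped on this surface.
    return deps.generate(personIds).contacts;
  }
  // Async path: hand the batch to the worker and return null.
  deps.enqueue("GENERATE_ENRICHED_CONTACTS", personIds);
  return null;
}
```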


3. Generate layer: GeneratePersonBillableContactsUseCase

Extends BaseUseCase. The recursive generateContacts() method is the waterfall. Output is { contacts, warning? } (COM-40844: the warning carries a swallowed partial-success notice such as a ZeroBounce transient).

Provider order (feature IDs)

The waterfall walks an ordered array of feature IDs, one provider per step. The order is per-tenant configurable; when a tenant hasn't customized it, the default is resolved as:

  • EMAIL: getDefaultEmailEnrichmentFeatureIds(ctx), which reads the seeded RecommendedWaterfallConfig rows for the primaryEnrichmentEmail template, ordered by priority. There is no static code fallback: a missing template/rows throws DEFAULT_EMAIL_ENRICHMENT_TEMPLATE_MISSING / …_ROWS_MISSING (COM-32909), so the DB seed is the single source of truth.
  • PHONE: DEFAULT_PHONE_ENRICHMENT_FEATURES.

Default email chain (current)

The seeded primaryEnrichmentEmail default chain is 4 sources:

| Priority | Provider |
| --- | --- |
| 1 | Cognism |
| 2 | Apollo |
| 3 | Lusha |
| 4 | CRED |

RocketReach is NOT a default source; older docs that say so are stale

Until 2026-05-14 the default chain was the 5-row [Cognism, Apollo, Lusha, CRED, RocketReach]. COM-32909 dropped RocketReach from primaryEnrichmentEmail (migration 20260514130000_drop-rocketreach-from-primary-enrichment-email, corrected by 20260514150500_fix-rocketreach-feature-id-in-primary-enrichment-email, which also cleans up existing tenants). RocketReach is still an integrated, available provider (it remains in funcMap), but it now lives only in the FE "+ Add source" tray and is never seeded into RecommendedWaterfallConfig, so it does not run unless a tenant explicitly adds it. Any documentation claiming "5 default sources including RocketReach" is describing the pre-2026-05-14 state.

Available provider helpers (funcMap)

These are the providers the engine can call, a superset of the default chain. A tenant's configured emailEnrichmentFeatureIds (default chain plus anything added via the "+ Add source" tray) selects from these:

| Feature constant | Helper | Vendor | In default chain? |
| --- | --- | --- | --- |
| FEATURE_COGNISM_EMAIL/PHONE_ENRICHMENT | cognism-contacts.ts | Cognism | ✅ |
| FEATURE_APOLLO_EMAIL/PHONE_ENRICHMENT | apollo-contacts.ts | Apollo | ✅ |
| FEATURE_LUSHA_EMAIL_ENRICHMENT | lusha-contacts.ts | Lusha | ✅ |
| FEATURE_CRED_EMAIL_ENRICHMENT | cred-contacts.ts | CRED (internal) | ✅ |
| FEATURE_RR_EMAIL/PHONE_ENRICHMENT | rocket-reach-contacts.ts | RocketReach | ❌ ("+ Add source" only) |
| FEATURE_SK_EMAIL_ENRICHMENT | skrapp-contacts.ts | Skrapp | ❌ (default for primaryWorkEmail only) |
| FEATURE_AMF_EMAIL_ENRICHMENT | anymailfinder-contacts.ts | AnyMailFinder | ❌ (default for primaryWorkEmail only) |
| FEATURE_ARL_EMAIL_ENRICHMENT | aeroleads-contacts.ts | AeroLeads | ❌ (default for primaryWorkEmail only) |

cred-unverified-contacts.ts contributes unverified CRED candidates.

Two email templates: don't confuse them

There are two seeded email templates with different chains. This is the documentation-vs-reality mismatch that has tripped people up:

| Template | Surface | Seeded chain (current) |
| --- | --- | --- |
| primaryEnrichmentEmail | The internal chain consumed by RequestPersonBillableContactsUseCase / getDefaultEmailEnrichmentFeatureIds; this is the billable-contacts default | Cognism → Apollo → Lusha → CRED (4) |
| primaryWorkEmail | The FE "Smart Enrich - Email" custom-field bundle | CRED → Cognism → Apollo → Lusha → Skrapp → AnyMailFinder → AeroLeads (7); BE-SEED-REORDER / D8, 2026-05-13 |

Neither default chain includes RocketReach. Both were 5-row […, RocketReach] chains before mid-May 2026.

Per-step flow (per recursion)

  1. notFoundIds: persons not yet resolved by an earlier step and not excluded.
  2. runProvider: calls the provider helper, then retryRateLimitedSkips (COM-43136/P-003, behind FF_BULK_ENRICH_RETRY) re-attempts only persons our internal limiter shed (rate_limited); circuit_open / error skips are not retried.
  3. Split results into verified vs unverified groups (Apollo phone has a special "pending webhook / stale data" exclusion path).
  4. Email validation (EMAIL only, if an email-validation feature is configured): see below.
  5. Demotion (EMAIL only): strip verification from emails at past-employer and off-current-employer domains.
  6. Feature log: one CreateCreditFeatureLog row per verified person per provider (the actual credit charge); persons resolved by no provider get an EMAIL/PHONE_ENRICHMENT_ATTEMPT log.
  7. Recurse to the next feature ID with the still-unresolved persons, accumulating contacts + first-wins warning.
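
The recursion in steps 1, 2, and 7 can be sketched as below. This is a simplified, synchronous illustration that omits validation, demotion, retries, and feature logging; the Provider signature, the Map-based registry, and the rule "a person is resolved once a provider returns a verified contact" are assumptions for the example rather than the real generateContacts().

```typescript
interface Contact {
  personId: number;
  email: string;
  isVerified: boolean;
}

interface StepResult {
  contacts: Contact[];
  warning?: string | undefined;
}

// Hypothetical provider signature: takes unresolved person IDs, returns what it found.
type Provider = (personIds: number[]) => StepResult;

function runWaterfall(
  featureIds: string[],
  providers: Map<string, Provider>,
  personIds: number[],
  acc: StepResult = { contacts: [] },
): StepResult {
  const [featureId, ...rest] = featureIds;
  // Stop when the provider list is exhausted or nobody is left unresolved.
  if (featureId === undefined || personIds.length === 0) return acc;

  const provider = providers.get(featureId);
  const step: StepResult = provider ? provider(personIds) : { contacts: [] };

  // Persons with a verified contact drop out of the next step.
  const resolved = new Set(step.contacts.filter((c) => c.isVerified).map((c) => c.personId));
  const stillUnresolved = personIds.filter((id) => !resolved.has(id));

  const next: StepResult = {
    contacts: [...acc.contacts, ...step.contacts],
    warning: acc.warning ?? step.warning, // the first warning wins
  };
  return runWaterfall(rest, providers, stillUnresolved, next);
}
```

Each step only sees the persons that earlier providers failed to resolve, which is what keeps later (and possibly more expensive) providers from being called for persons that already have a result.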

Email validation is free here (COM-39583)

The waterfall calls RequestEmailValidationUseCase.execute(..., { chargeCredits: false }), so ZeroBounce validation is bundled into the enrichment that already charged for the record. (The standalone revalidate-email-address path still bills.)

Validation writes back onto each contact: isVerified, isEmailValidated, emailValidationStatus, confidenceTier, and lastValidationCheck.

Waterfall stop predicate: isDeliverableHere (COM-32909)

A contact stops the waterfall only when:

c.isVerified === true && c.emailValidationStatus !== EmailValidationStatusEnum.CATCH_ALL

A valid + alias_address (CATCH_ALL) hit at the current employer is kept as a candidate but does not short-circuit: the next provider may return a fully VALID address at the same domain, which is preferred. A fully VALID verified email at the current employer stops the waterfall.
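
As a small illustration of the predicate, the sketch below uses a string union in place of EmailValidationStatusEnum. The Candidate shape and the shouldStopWaterfall helper are assumptions; only the boolean expression itself comes from the text above.

```typescript
// String-union stand-in for EmailValidationStatusEnum (assumption for the sketch).
type EmailValidationStatus = "VALID" | "INVALID" | "CATCH_ALL" | "UNVALIDATED" | "NOT_APPLICABLE";

interface Candidate {
  email: string;
  isVerified: boolean;
  emailValidationStatus?: EmailValidationStatus | undefined;
}

// Mirrors the documented expression: verified, and not a catch-all hit.
const isDeliverableHere = (c: Candidate): boolean =>
  c.isVerified === true && c.emailValidationStatus !== "CATCH_ALL";

// Hypothetical helper: a person stops the waterfall once any candidate qualifies.
const shouldStopWaterfall = (candidates: Candidate[]): boolean =>
  candidates.some(isDeliverableHere);
```

A verified CATCH_ALL candidate therefore stays in the result set but leaves the person in the unresolved pool for the next provider.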

BYOK credit reduction

In logEnrichedContacts, if the customer has a stored secret for the provider (FEATURE_TO_SECRET_TYPE[featureId] resolves and getSecretByType returns one), the charged amount is reduced to 1 credit, since they're spending their own vendor quota.
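
A rough sketch of that decision is shown here. The map contents (including the "APOLLO_API_KEY" secret type), the resolveChargedAmount name, and the shape of the getSecretByType callback are hypothetical; the real FEATURE_TO_SECRET_TYPE entries are not listed in this document.

```typescript
// Hypothetical mapping; the real FEATURE_TO_SECRET_TYPE contents are not shown here.
const FEATURE_TO_SECRET_TYPE = new Map<string, string>([
  ["FEATURE_APOLLO_EMAIL_ENRICHMENT", "APOLLO_API_KEY"],
]);

const BYOK_CHARGE = 1; // documented: BYOK customers are charged 1 credit

function resolveChargedAmount(
  featureId: string,
  featureAmount: number,
  getSecretByType: (secretType: string) => string | undefined,
): number {
  const secretType = FEATURE_TO_SECRET_TYPE.get(featureId);
  // BYOK applies only when the feature maps to a secret type AND a secret is stored.
  const hasOwnKey = secretType !== undefined && getSecretByType(secretType) !== undefined;
  return hasOwnKey ? BYOK_CHARGE : featureAmount;
}
```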


4. How emails are handled

Validation status (tri-state)

ZeroBounce raw status is mapped by classifyEmailValidationStatus into EmailValidationStatusEnum: VALID / INVALID / CATCH_ALL / UNVALIDATED / NOT_APPLICABLE. A catch-all-domain heuristic (COM-39649) can demote a nominally-valid address to CATCH_ALL based on the email itself, so the FE never shows a misleading green badge for a known catch-all domain.

Confidence tier

classifyConfidenceTier (COM-39604) derives a persisted tier from the validation status. CONFIDENCE_TIER_ORDER ranks them 0..4 (lower = better):

| Tier | Order | From validation status | Meaning |
| --- | --- | --- | --- |
| VERIFIED | 0 | VALID / VALIDATED | ZeroBounce confirmed deliverable |
| CATCH_ALL | 1 | CATCH_ALL | Domain accepts all; discoverable, never primary within tier |
| UNKNOWN | 2 | UNVALIDATED | Validation ran but inconclusive |
| UNVERIFIED | 3 | NOT_APPLICABLE / null / undefined | Never validated; we simply don't know |
| INVALID | 4 | INVALID | ZeroBounce confirmed undeliverable (invalid / spamtrap / abuse / do_not_mail) |

The tier is stored denormalized so ranking is an O(1) lookup rather than a re-derivation.
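
The tier table translates fairly directly into a lookup plus a classifier. The sketch below follows the table as written; string unions replace the real enums, and sending unrecognized statuses to UNVERIFIED is an assumption.

```typescript
type ConfidenceTier = "VERIFIED" | "CATCH_ALL" | "UNKNOWN" | "UNVERIFIED" | "INVALID";

// Lower order = better, matching the documented 0..4 ranking.
const CONFIDENCE_TIER_ORDER: Record<ConfidenceTier, number> = {
  VERIFIED: 0,
  CATCH_ALL: 1,
  UNKNOWN: 2,
  UNVERIFIED: 3,
  INVALID: 4,
};

function classifyConfidenceTier(status: string | null | undefined): ConfidenceTier {
  switch (status) {
    case "VALID":
    case "VALIDATED":
      return "VERIFIED";
    case "CATCH_ALL":
      return "CATCH_ALL";
    case "UNVALIDATED":
      return "UNKNOWN";
    case "INVALID":
      return "INVALID";
    default:
      // NOT_APPLICABLE, null, undefined; unrecognized values also land here (assumption).
      return "UNVERIFIED";
  }
}
```

With the tier persisted on each contact, a sort only needs CONFIDENCE_TIER_ORDER[tier], a constant-time lookup, instead of re-reading validation rows.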

INVALID ≠ UNVERIFIED (COM-43358, merged 2026-06-05)

Before COM-43358, a ZeroBounce invalid verdict was folded into UNVERIFIED, so a confirmed-undeliverable mailbox was indistinguishable from one that was never validated. INVALID is now its own tier, ranked lowest (below UNVERIFIED), so a confirmed-bad mailbox can never win primary work/enrichment email. INVALID-tier emails are explicitly excluded from primary-email eligibility in update-contact-custom-fields.ts (COM-43361), so an all-invalid contact gets no primary rather than a bad one, and the sequence recipient-resolver demotes them too. Only new enrichments get INVALID; existing rows keep their prior tier (no backfill, no migration: confidenceTier is already varchar(32)).

Ranking & final sort

sortBillableContacts groups contacts by personId (first-appearance order preserved, so persons are not reshuffled), then sorts each person's emails with rankEmailsByEmployer. Tiebreak order, strongest first:

  1. Current-employer domain win.
  2. Past-employer / off-current-employer demotion (verification stripped).
  3. Confidence tier (VERIFIED > CATCH_ALL > UNKNOWN > UNVERIFIED > INVALID, with INVALID lowest, COM-43358) plus the validation score (VALID > CATCH_ALL > unvalidated).
  4. Probability score.
  5. Recency (isActiveUpdatedAt).
  6. WORK contact-type.
  7. Data-source priority (COM-32909): the configured provider order is forwarded as a Map<dataSourceAbbreviation, priority> so the sort matches what the write path persists as primaryEnrichmentEmail.

Phones use the legacy verified-first → WORK-tiebreak sort (no employer concept).
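
The email tiebreak chain can be sketched as a chained comparator. This is not the real rankEmailsByEmployer: the RankableEmail fields are hypothetical, demotion (tiebreak 2) is assumed to have been applied before sorting, the validation score is folded into tierOrder, and the data-source abbreviations in the usage note are made up.

```typescript
// Hypothetical flattened view of one email candidate.
interface RankableEmail {
  email: string;
  atCurrentEmployer: boolean;
  tierOrder: number; // 0 = VERIFIED ... 4 = INVALID
  probability: number; // higher is better
  isActiveUpdatedAt: number; // epoch millis, newer is better
  isWork: boolean;
  dataSource: string; // data-source abbreviation
}

function compareEmails(a: RankableEmail, b: RankableEmail, priority: Map<string, number>): number {
  // Sources missing from the configured order sort last.
  const rank = (e: RankableEmail): number => priority.get(e.dataSource) ?? Number.MAX_SAFE_INTEGER;
  // Each term is 0 on a tie, so `||` falls through to the next tiebreak.
  return (
    Number(b.atCurrentEmployer) - Number(a.atCurrentEmployer) ||
    a.tierOrder - b.tierOrder ||
    b.probability - a.probability ||
    b.isActiveUpdatedAt - a.isActiveUpdatedAt ||
    Number(b.isWork) - Number(a.isWork) ||
    rank(a) - rank(b)
  );
}

function rankEmailsSketch(emails: RankableEmail[], priority: Map<string, number>): RankableEmail[] {
  return [...emails].sort((a, b) => compareEmails(a, b, priority));
}
```

Two candidates that tie on everything except the provider are ordered by the configured provider priority, which is what keeps the read-side sort aligned with the primary chosen on the write path.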

Deduplication & persistence

For EMAIL/PHONE, results are written through UpdateContactCustomFieldsUseCase into custom fields: verifiedEmails / unverifiedEmails (EMAIL, COM-32909) and verifiedPhones / unverifiedPhones (PHONE, COM-42483). Dedup (case-insensitive by value), demotion, and ranking are applied at write time, with the same dataSourcePriority map forwarded. get-person-enriched-contacts reads from these custom fields, never from FeatureLog metadata, so the waterfall no longer writes a contact snapshot into the feature log.
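
A case-insensitive dedup by value can be sketched in a few lines. The dedupeByValue name and the first-occurrence-wins rule are assumptions; the document only states that dedup is case-insensitive by value.

```typescript
// Keeps the first occurrence of each value, compared case-insensitively.
function dedupeByValue<T extends { value: string }>(items: T[]): T[] {
  const seen = new Set<string>();
  const out: T[] = [];
  for (const item of items) {
    const key = item.value.toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(item);
  }
  return out;
}
```

If first-occurrence-wins is the rule, running the dedup over an already ranked list means the best-ranked copy of a duplicated address is the one that survives.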

The chosen primary email is the best primary-eligible candidate, and INVALID-tier addresses are never primary-eligible (COM-43358/COM-43361): if every candidate is INVALID, the primary is left unset rather than promoting a confirmed-bad mailbox. The same exclusion applies in recompute-primary-work-email.ts and the CSV export ranker.
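
The eligibility rule reduces to "first non-INVALID candidate in ranked order, else nothing". The sketch below assumes the list is already sorted best-first; pickPrimaryEmail and the RankedCandidate shape are hypothetical names for illustration.

```typescript
interface RankedCandidate {
  email: string;
  confidenceTier: "VERIFIED" | "CATCH_ALL" | "UNKNOWN" | "UNVERIFIED" | "INVALID";
}

// Assumes `rankedBestFirst` is already sorted; INVALID is never primary-eligible.
function pickPrimaryEmail(rankedBestFirst: RankedCandidate[]): string | undefined {
  return rankedBestFirst.find((c) => c.confidenceTier !== "INVALID")?.email;
}
```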

GraphQL surface fields

On TypeEnrichedContactDetails:

  • emailValidationStatus (tri-state): the field to use. Resolves the most-recent EmailAddressValidation row (DataLoader-batched) and classifies it; falls back to the legacy boolean for pre-D14 contacts.
  • isEmailValidated: deprecated (D14/COM-32909), because a boolean cannot distinguish VALID from CATCH_ALL.
  • dataSource: resolves { abbreviation } from the contact's provider.

5. Async completion

Multi-person runs return null and complete on the worker. The contactsEnrichmentCompleted GraphQL subscription (CONTACTS_ENRICHMENT_COMPLETED, filtered to the current user) delivers the result, including any partial-success warning (COM-40844 BE-3).


6. Rate limiting, bulk & resilience

Outbound pacing is not in this engine; it's enforced in the model-api provider-rate-limiter. Skips surfaced by providers (provider-contact-result.ts) are classified rate_limited | circuit_open | error, emitted as the cred_enrichment_provider_skipped_total{provider,reason} metric and a Provider enrichment skipped log, and (for rate_limited) re-attempted by the retry wave. Full detail (vendor caps, bulk endpoints, 429 backpressure, header-driven self-calibration) lives in Enrichment Rate Limiting, Bulk & Resilience.
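
The skip handling can be sketched as two small helpers: one selects the persons eligible for the retry wave, the other tallies skips per reason (roughly what the metric's reason label would count). The ProviderSkip shape and both function names are assumptions, not the contents of provider-contact-result.ts or bulk-enrich-retry.ts.

```typescript
type SkipReason = "rate_limited" | "circuit_open" | "error";

// Hypothetical skip record for one person at one provider.
interface ProviderSkip {
  personId: number;
  reason: SkipReason;
}

// Only persons shed by the internal limiter are retried.
function selectRetryablePersonIds(skips: ProviderSkip[]): number[] {
  return skips.filter((s) => s.reason === "rate_limited").map((s) => s.personId);
}

// Tally per reason, e.g. to feed a counter labelled by reason.
function countSkipsByReason(skips: ProviderSkip[]): Record<SkipReason, number> {
  const counts: Record<SkipReason, number> = { rate_limited: 0, circuit_open: 0, error: 0 };
  for (const s of skips) counts[s.reason] += 1;
  return counts;
}
```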


Key code paths

  • src/graphql-api/person/resolvers/enriched-contact-resolver.ts: enrichPersonContacts, emailValidationStatus/isEmailValidated field resolvers, completion subscription.
  • src/domain/person/usecase/request-person-billable-contacts.ts: person resolution, blacklist, credit gate, sync/async routing.
  • src/domain/person/usecase/generate-person-billable-contacts.ts: the waterfall engine, isDeliverableHere, funcMap, feature logging, sort.
  • src/domain/person/usecase/helpers/: per-provider helpers (cognism-contacts.ts, apollo-contacts.ts, lusha-contacts.ts, rocket-reach-contacts.ts, skrapp-contacts.ts, anymailfinder-contacts.ts, aeroleads-contacts.ts, cred-contacts.ts, cred-unverified-contacts.ts), provider-contact-result.ts, bulk-enrich-retry.ts, rank-emails-by-employer.ts, load-person-email-ranking-context.ts.
  • src/domain/email/usecase/validation/request-email-validation.ts: ZeroBounce validation (free when called from the waterfall).
  • src/domain/email/helpers/classify-email-validation-status.ts: tri-state + confidence-tier classification, catch-all heuristic.
  • src/domain/custom/usecase/field/update-contact-custom-fields.ts: dedup + persist to verified/unverifiedEmails.
  • src/domain/feature/helpers/get-default-email-enrichment-feature-ids.ts: resolves the default email chain from the seeded primaryEnrichmentEmail rows (no static fallback).
  • src/data/config/recommended-waterfall-config-data.ts: the catalog, with PRIMARY_ENRICHMENT_EMAIL (4-row internal default) and SMART_ENRICH_EMAIL (7-row FE bundle).
  • src/data/seeds/006-recommended-waterfall-config.ts: seeds the catalog into RecommendedWaterfallConfig.
  • src/data/migrations/20260514130000_drop-rocketreach-from-primary-enrichment-email.ts (+ 20260514150500_fix-…): removed RocketReach from the default chain.

Tickets

| Ticket | What |
| --- | --- |
| COM-32909 | Verified ≠ CATCH_ALL stop predicate; past/off-employer demotion; data-source-priority ranking; verified/unverifiedEmails custom fields; dropped RocketReach from the default primaryEnrichmentEmail chain (now Cognism → Apollo → Lusha → CRED) |
| BE-SEED-REORDER (D8) | Reordered the FE primaryWorkEmail Smart Enrich bundle to the 7-row CRED → Cognism → Apollo → Lusha → Skrapp → AnyMailFinder → AeroLeads chain |
| COM-39583 | ZeroBounce bundled free in the waterfall (chargeCredits: false) |
| COM-39604 | Persisted tri-state confidenceTier for O(1) ranking |
| COM-39649 | Catch-all-domain heuristic in classifyEmailValidationStatus |
| COM-40844 | { contacts, warning? } partial-success propagation through the recursion and onto PubSub |
| COM-42483 | verified/unverifiedPhones custom fields (phone parity) |
| COM-43136 | Provider skip classification + bounded rate-limit retry wave |
| COM-43358 | New INVALID confidence tier: ZeroBounce-confirmed-undeliverable emails ranked lowest (below UNVERIFIED) instead of folded into UNVERIFIED (merged 2026-06-05) |
| COM-43361 | INVALID-tier emails excluded from primary-email eligibility (no bad-mailbox promotion) |