Bulk Create Contacts and Cleanup

Scope

This page documents the non-import async bulk contact create and cleanup path in cred-api-commercial, with current iOS usage and possible future web usage, as observed on March 11, 2026.

Executive Summary

Bulk create contacts is a client-driven async job API that backs the UX of "the user pushes a button and a lot of contacts show up in CRED."

It is intentionally separate from the legacy Import pipeline:

  • It does not create an Import
  • It does not create ImportFields
  • It does not persist ImportRecords
  • It does not expose field mapping, reconcile, or comparison flows
  • It does not create import provenance such as importId or importRowNo

Instead, it starts async worker jobs that:

  • reuse CreateContactUseCase many times for creation
  • reuse DeleteContactUseCase for cleanup
  • expose state through pollable job queries

That makes it a good fit for iOS device-contact sync, local test cleanup, and possible future web bulk-create flows. It is not a replacement for file import, CRM import, webhook import, or Universal API import.

For the higher-level decision framework and local operator flow, see Contact Ingestion Overview.

Why This Exists

The current product needs two different capabilities that are easy to conflate:

| Capability | What it means |
| --- | --- |
| Bulk contact import | Ingest external datasets through the Import framework with source-field mapping, row materialization, reconcile, and optional comparison |
| Bulk create contacts | Accept many client-supplied contact payloads and create local contacts asynchronously, with enough status and tracking information for UX and testing |

The new async bulk-create API exists to solve the second problem without dragging a client-supplied contact list through the much heavier import system.

Current GraphQL Surface

Start mutation

startBulkCreateContacts(input) starts the job.

Current input shape:

  • contacts: [InputCreateContact!]!
  • collectionId?: Int
  • idempotencyKey?: String
  • clientSource?: String
  • clientReferenceId?: String

Current output shape:

  • jobId
  • trackingKey
  • clientSource
  • clientReferenceId
  • status
  • totalCount
  • processedCount
  • successCount
  • failureCount
  • skippedCount
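As a rough illustration of the start mutation, the sketch below assembles a GraphQL request body in TypeScript. The field names come from the input and output shapes listed above, and the 5,000-contact limit comes from the Lifecycle section. The GraphQL input type name `StartBulkCreateContactsInput`, the loose `InputCreateContact` placeholder, and the helper `buildStartRequest` are assumptions made for this example, not confirmed parts of the schema.

```typescript
// Hypothetical placeholder: the real InputCreateContact has its own schema.
interface InputCreateContact {
  [field: string]: unknown;
}

// Mirrors the documented input shape of startBulkCreateContacts.
interface StartBulkCreateContactsInput {
  contacts: InputCreateContact[];
  collectionId?: number;
  idempotencyKey?: string;
  clientSource?: string;
  clientReferenceId?: string;
}

// Documented server-side maximum batch size.
const MAX_BATCH_SIZE = 5000;

// Assumption: the GraphQL input type is named StartBulkCreateContactsInput.
const START_BULK_CREATE_CONTACTS = `
  mutation StartBulkCreateContacts($input: StartBulkCreateContactsInput!) {
    startBulkCreateContacts(input: $input) {
      jobId trackingKey clientSource clientReferenceId status
      totalCount processedCount successCount failureCount skippedCount
    }
  }
`;

// Builds the request body and fails fast on batches the API would reject anyway.
function buildStartRequest(input: StartBulkCreateContactsInput) {
  if (input.contacts.length > MAX_BATCH_SIZE) {
    throw new Error(`batch exceeds ${MAX_BATCH_SIZE} contacts`);
  }
  return { query: START_BULK_CREATE_CONTACTS, variables: { input } };
}
```

Checking the batch size on the client is optional; it only avoids a round trip for a request the API is documented to refuse.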

Job status query

bulkCreateContactsJob(jobId) returns the current job state.

Current output shape:

  • jobId
  • trackingKey
  • clientSource
  • clientReferenceId
  • status
  • totalCount
  • processedCount
  • successCount
  • failureCount
  • skippedCount
  • contactIds
  • errors

Delete start mutation

startBulkDeleteCreatedContacts(input) starts the cleanup job.

Current input shape:

  • bulkCreateJobId?: String
  • trackingKey?: String
  • contactIds?: [BigInt!]
  • dryRun?: Boolean

Current output shape:

  • jobId
  • bulkCreateJobId
  • trackingKey
  • dryRun
  • status
  • totalCount
  • processedCount
  • deletedCount
  • notFoundCount
  • forbiddenCount
  • failedCount

Delete job status query

bulkDeleteCreatedContactsJob(jobId) returns the current cleanup state.

Current output shape:

  • jobId
  • bulkCreateJobId
  • trackingKey
  • dryRun
  • status
  • totalCount
  • processedCount
  • deletedCount
  • notFoundCount
  • forbiddenCount
  • failedCount
  • candidateContactIds
  • deletedContactIds
  • errors

Job states

Current status values are:

  • QUEUED
  • PROCESSING
  • COMPLETED
  • PARTIAL
  • FAILED
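Clients usually need to know when to stop polling. A minimal sketch of the status values as a TypeScript union follows. Treating COMPLETED, PARTIAL, and FAILED as terminal is an inference from the rest of this page, not a documented guarantee, and the helper name `isTerminal` is hypothetical.

```typescript
// The five documented status values.
type BulkJobStatus = "QUEUED" | "PROCESSING" | "COMPLETED" | "PARTIAL" | "FAILED";

// Assumption: these three states are final and the job does not change afterwards.
const TERMINAL_STATUSES: ReadonlySet<BulkJobStatus> = new Set<BulkJobStatus>([
  "COMPLETED",
  "PARTIAL",
  "FAILED",
]);

// Returns true once a client can stop polling under the assumption above.
function isTerminal(status: BulkJobStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}
```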

Lifecycle

The current flow is:

  1. Client calls startBulkCreateContacts
  2. API validates input and enforces a maximum batch size of 5,000 contacts
  3. If collectionId is present, API verifies the collection exists in the caller's company and the caller can edit its items
  4. If idempotencyKey is present, API performs a best-effort cache lookup for an existing job owned by the same user
  5. API generates a jobId
  6. API derives a trackingKey
  7. API stores job state in cache
  8. API enqueues the BULK_CREATE_CONTACTS worker task
  9. Client polls bulkCreateContactsJob(jobId) for state changes
  10. Worker creates contacts, updates cached job state, and optionally adds the created contacts to a collection

sequenceDiagram
    participant Client
    participant API
    participant Cache
    participant Worker

    Client->>API: startBulkCreateContacts(input)
    API->>API: Validate input (max 5,000)
    opt collectionId provided
        API->>API: Verify collection exists and caller can edit items
    end
    opt idempotencyKey provided
        API->>Cache: Lookup idempotencyKey
        Cache-->>API: Existing jobId or cache miss
    end
    API->>API: Generate jobId and trackingKey
    API->>Cache: Store job state (TTL 24h)
    API->>Worker: Enqueue BULK_CREATE_CONTACTS
    API-->>Client: Return jobId and trackingKey

    loop Poll job state
        Client->>API: bulkCreateContactsJob(jobId)
        API->>Cache: Read job status by jobId
        Cache-->>API: Current status and counts
        API-->>Client: Job state
    end

    Worker->>Cache: Mark job PROCESSING
    Worker->>Worker: Create contacts
    opt collectionId provided
        Worker->>Worker: Add created contacts to collection
    end
    Worker->>Cache: Update final status, counts, and contactIds

Diagram: bulk-create lifecycle between Client, API, Cache, and Worker. The pollable bulkCreateContactsJob(jobId) query reads cached state while the BULK_CREATE_CONTACTS worker updates it.

Current cache TTL for job state and idempotency mapping is 24 hours.
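The polling half of this lifecycle can be sketched as a small client-side loop. This is an illustration only: `FetchJob` stands in for whatever GraphQL client issues bulkCreateContactsJob(jobId), the `null` result for a missing job is an assumption about how an expired cache entry would surface, and the interval and attempt limits are arbitrary parameters.

```typescript
type BulkJobStatus = "QUEUED" | "PROCESSING" | "COMPLETED" | "PARTIAL" | "FAILED";

// Subset of the documented job state that the loop needs.
interface BulkCreateJobState {
  jobId: string;
  status: BulkJobStatus;
  processedCount: number;
  totalCount: number;
}

// Hypothetical transport: wraps the bulkCreateContactsJob(jobId) query.
type FetchJob = (jobId: string) => Promise<BulkCreateJobState | null>;

// Polls until the job reaches a state assumed to be final, or gives up.
async function pollUntilDone(
  fetchJob: FetchJob,
  jobId: string,
  opts: { intervalMs: number; maxAttempts: number },
): Promise<BulkCreateJobState> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    if (job === null) {
      // Assumption: a missing job means the 24-hour cache entry is gone.
      throw new Error(`job ${jobId} not found (cache entry may have expired)`);
    }
    if (job.status === "COMPLETED" || job.status === "PARTIAL" || job.status === "FAILED") {
      return job;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  throw new Error(`job ${jobId} did not finish after ${opts.maxAttempts} polls`);
}
```

A caller would pass a function that runs the query and maps the response, for example `pollUntilDone(fetchJob, jobId, { intervalMs: 2000, maxAttempts: 300 })`.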

Delete Lifecycle

The current delete flow is:

  1. Client calls startBulkDeleteCreatedContacts
  2. API requires either bulkCreateJobId or trackingKey
  3. If bulkCreateJobId is provided, API loads the cached bulk-create job and verifies ownership
  4. API derives or reuses trackingKey
  5. API resolves candidate contacts from:
       • cached bulkCreateJob.contactIds
       • persisted Contact.externalSource = trackingKey
  6. If contactIds are supplied, API intersects candidates with that subset
  7. API stores delete job state in cache
  8. API enqueues BULK_DELETE_CREATED_CONTACTS
  9. Client polls bulkDeleteCreatedContactsJob(jobId)
  10. Worker deletes candidates through DeleteContactUseCase

Current delete job cache TTL is also 24 hours.
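The candidate-resolution step in the delete flow can be pictured as set operations. The sketch below assumes that the two sources (cached job contactIds and contacts found by the externalSource marker) are combined as a de-duplicated union before the optional contactIds subset is applied; the page does not state how the sources are merged, so that is an assumption. IDs are modeled as strings for simplicity, and `resolveDeleteCandidates` is a hypothetical name.

```typescript
type ContactId = string;

// Assumption: candidates are the de-duplicated union of both sources,
// narrowed to requestedIds when the caller supplies a subset.
function resolveDeleteCandidates(
  cachedIds: readonly ContactId[],
  markedIds: readonly ContactId[],
  requestedIds?: readonly ContactId[],
): ContactId[] {
  const seen = new Set<ContactId>();
  const candidates: ContactId[] = [];
  for (const id of cachedIds.concat(markedIds)) {
    if (!seen.has(id)) {
      seen.add(id);
      candidates.push(id);
    }
  }
  if (requestedIds === undefined) return candidates;

  // Intersection: an ID the job never created is dropped, not deleted.
  const requested = new Set<ContactId>(requestedIds);
  return candidates.filter((id) => requested.has(id));
}
```

The intersection means a caller-supplied contactIds list can only narrow the candidate set, never widen it beyond contacts tied to the job or tracking key.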

Processing Behavior

Reused contact creation path

The worker reuses CreateContactUseCase for each prepared item.

That is important because it preserves the existing contact-create side effects, including:

  • normal contact validation
  • contact enrichment/matching entry points
  • collection-side behavior after contact creation
  • custom field writes triggered by the contact create path

This also means bulk create inherits the cost of the existing create pipeline. It is not a thin database insert loop.

In-job deduplication

Before creating contacts, the worker currently deduplicates contacts inside the submitted batch by normalized primary email:

  • email is trimmed
  • email is lowercased
  • first occurrence wins
  • later duplicates are counted as skipped
  • contacts without email are not deduplicated

This is only an in-request guard. It is not a general dedup strategy across existing contacts or across different jobs.
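The rules above translate into a short function. This is a sketch of the described behavior, not the worker's actual code: the accessor `getEmail` is a hypothetical way to read the primary email from a payload, and treating a whitespace-only email as "no email" is an assumption.

```typescript
// Deduplicates a batch by normalized email; the first occurrence wins.
function dedupeByEmail<T>(
  contacts: readonly T[],
  getEmail: (contact: T) => string | null | undefined,
): { unique: T[]; skippedCount: number } {
  const seen = new Set<string>();
  const unique: T[] = [];
  let skippedCount = 0;

  for (const contact of contacts) {
    // Normalize: trim, then lowercase.
    const normalized = (getEmail(contact) ?? "").trim().toLowerCase();
    if (normalized === "") {
      // Contacts without an email are never deduplicated.
      unique.push(contact);
      continue;
    }
    if (seen.has(normalized)) {
      // Later duplicates are counted as skipped.
      skippedCount++;
      continue;
    }
    seen.add(normalized);
    unique.push(contact);
  }
  return { unique, skippedCount };
}
```

For example, a batch containing `" Ada@Example.com "` and `"ada@example.com"` keeps only the first of the two and reports one skipped contact, while two contacts with no email are both kept.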

Important dedup caveat

This path should not be described as preserving the legacy findExisting() duplicate behavior.

The worker calls CreateContactUseCase, which uses the normal contact create flow in create mode. In that mode, the SaveContactUseCase.findExisting() path is intentionally bypassed. So the only explicit dedup behavior added by this bulk-create feature today is the pre-batch normalized-email skip described above.

Collection association

If collectionId is provided and contacts were created successfully, the worker tries to add those contacts to the target collection after the create loop finishes.

If collection association fails, the job can still end PARTIAL even though the contacts themselves were created.

Delete behavior

Delete uses the backend's normal contact delete semantics rather than a bespoke direct SQL cleanup path.

That means v1 cleanup is intentionally scoped to "do what backend contact deletion does today" instead of trying to prove a stronger cross-table purge contract.

Dry run support

Delete supports dryRun = true.

When enabled:

  • candidates are resolved
  • counts and candidate IDs are recorded
  • no contacts are deleted

This is the preferred first step for local testing and recovery work.

Observing Job State

Polling is the source of truth

For v1, clients should treat:

  • bulkCreateContactsJob(jobId)
  • bulkDeleteCreatedContactsJob(jobId)

as the authoritative job status APIs.

That path is coherent today:

  • client receives jobId from the start mutation
  • client can poll the query with that same jobId
  • query returns counts, IDs, errors, and status

Push progress is not fully correlated yet

The worker also publishes ProgressType.BULK_CREATE_CONTACTS events through the generic progressUpdated subscription.

However, the subscription payload only exposes a numeric entityId, while the bulk-create API returns a string jobId. The current implementation derives the progress entityId from jobId, but that derived value is not returned to the client as part of the GraphQL contract.

Practical result for create:

  • polling works
  • generic push progress exists
  • direct client correlation from jobId to progressUpdated is still underspecified

For current clients, polling should remain the source of truth.

Tracking for Testing and Cleanup

Client-provided metadata

The API now accepts two optional client metadata fields:

  • clientSource
  • clientReferenceId

Typical examples:

  • clientSource = "ios"
  • clientSource = "ios-sim"
  • clientSource = "web"
  • clientReferenceId = "<local-test-run-id>"
  • clientReferenceId = "<simulator-session-id>"

Server-generated tracking key

The server derives a compact trackingKey from:

  • jobId
  • sanitized clientSource
  • sanitized clientReferenceId

Current shape:

bcc:j:<jobId>:s:<source>:r:<reference>

The exact optional segments depend on which client metadata fields were supplied.
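To make the key shape concrete, here is a hedged sketch of how such a key could be assembled. Only the `bcc:j:<jobId>:s:<source>:r:<reference>` layout and the optional segments come from this page. The sanitization rule (replace anything outside letters, digits, underscore, and hyphen with a hyphen) is an assumption chosen so that the `:` delimiter cannot appear inside a segment; the server's real rules may differ, and both function names are hypothetical.

```typescript
// Assumption: sanitization keeps [A-Za-z0-9_-] and collapses everything else to "-".
function sanitizeSegment(value: string | undefined): string | undefined {
  if (value === undefined) return undefined;
  const cleaned = value
    .trim()
    .replace(/[^A-Za-z0-9_-]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return cleaned === "" ? undefined : cleaned;
}

// Builds bcc:j:<jobId>[:s:<source>][:r:<reference>].
function buildTrackingKey(
  jobId: string,
  clientSource?: string,
  clientReferenceId?: string,
): string {
  const parts = ["bcc", "j", jobId];
  const source = sanitizeSegment(clientSource);
  const reference = sanitizeSegment(clientReferenceId);
  if (source !== undefined) parts.push("s", source);
  if (reference !== undefined) parts.push("r", reference);
  return parts.join(":");
}
```

Clients should not rebuild the key themselves; the trackingKey returned by the start mutation is the value to store.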

Contact marker

The worker stamps created contacts with:

  • Contact.externalSource = trackingKey

This is the current durable marker for test-created contacts.

Recovery after cache expiry

There is currently no "list all bulk jobs" GraphQL API.

Operationally, that means there are two recovery modes:

  • Recent jobs: inspect Redis cache keys and cached job payloads
  • Older create runs: query contacts where Contact.externalSource LIKE 'bcc:j:%'

Bulk-create jobs are recoverable after cache expiry because the contact rows persist the tracking key. Bulk-delete jobs are not recoverable in the same durable way once their cache entries expire; after that, logs or contact existence checks are the main signal.

Why contactIds is not enough

The job result returns contactIds, but that list is not a complete cleanup handle.

CreateContactUseCase performs more work than a single row insert. A contact can be created in the database and then fail later in the create flow before the worker records the returned ID in the job state. In that case:

  • a real contact may exist
  • the job may report an error
  • contactIds may not include that contact

That is why trackingKey and Contact.externalSource are the safer audit and cleanup anchors.
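One way to check for this gap during an audit is to compare the IDs the job reported against the IDs found by the marker. The sketch below assumes the caller has already loaded both lists (for example, contactIds from the job query and the IDs of contacts whose externalSource equals the trackingKey). The function name is hypothetical and IDs are modeled as strings.

```typescript
// Returns contacts that carry the tracking marker but are absent from the job's contactIds.
function findUnreportedContactIds(
  reportedContactIds: readonly string[],
  markedContactIds: readonly string[],
): string[] {
  const reported = new Set<string>(reportedContactIds);
  return markedContactIds.filter((id) => !reported.has(id));
}
```

A non-empty result points at contacts that exist in the database but would be missed by cleanup driven only by contactIds.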

For local or simulator-driven end-to-end testing:

  1. Send clientSource and clientReferenceId on every test run
  2. Persist both jobId and trackingKey in the test harness or logs
  3. Poll bulkCreateContactsJob(jobId) until completion
  4. For DB checks, query contacts by externalSource = trackingKey
  5. Use that same marker later for cleanup tooling

Current local operator flow

The validated local operator path is:

  • start the stack with ./start-local-federated-development.sh
  • use iOS against the local router for creation
  • use cred-platform query --env local for dry-run delete and real delete

Typical cleanup loop:

  1. bulk create from iOS
  2. keep bulkCreate jobId, trackingKey, and collectionId
  3. run startBulkDeleteCreatedContacts(dryRun: true)
  4. verify candidateContactIds
  5. run startBulkDeleteCreatedContacts(dryRun: false)
  6. poll until deletedCount = totalCount
  7. confirm the collection is empty on fresh iOS fetch
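Steps 3 through 6 of this loop could be scripted roughly as follows. The `CleanupClient` interface is a hypothetical wrapper around startBulkDeleteCreatedContacts and bulkDeleteCreatedContactsJob, and `waitForDeleteJob` is assumed to poll until the job finishes. Only the dry-run-first ordering and the deletedCount = totalCount check come from this page.

```typescript
interface DeleteJobState {
  jobId: string;
  status: "QUEUED" | "PROCESSING" | "COMPLETED" | "PARTIAL" | "FAILED";
  totalCount: number;
  deletedCount: number;
  candidateContactIds: string[];
}

// Hypothetical client wrapper around the two delete operations.
interface CleanupClient {
  startDelete(input: { trackingKey: string; dryRun: boolean }): Promise<{ jobId: string }>;
  waitForDeleteJob(jobId: string): Promise<DeleteJobState>;
}

// Dry run first, verify the candidates, then run the real delete.
async function cleanupByTrackingKey(
  client: CleanupClient,
  trackingKey: string,
  expectedContactIds: readonly string[],
): Promise<DeleteJobState> {
  const dry = await client.startDelete({ trackingKey, dryRun: true });
  const preview = await client.waitForDeleteJob(dry.jobId);

  // Refuse to continue if the dry run would touch anything unexpected.
  const expected = new Set<string>(expectedContactIds);
  const unexpected = preview.candidateContactIds.filter((id) => !expected.has(id));
  if (unexpected.length > 0) {
    throw new Error(`dry run found unexpected candidates: ${unexpected.join(", ")}`);
  }

  const real = await client.startDelete({ trackingKey, dryRun: false });
  const result = await client.waitForDeleteJob(real.jobId);
  if (result.deletedCount !== result.totalCount) {
    throw new Error(`deleted ${result.deletedCount} of ${result.totalCount} candidates`);
  }
  return result;
}
```

Aborting on unexpected candidates keeps the destructive call behind an explicit check, which matches the conservative intent of using dry run as the first step.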

Cleanup Strategy

The current direction is intentionally conservative:

  • creation and tracking can ship first
  • destructive cleanup should land in a separate internal PR

That split is deliberate because accidental deletion is harder to recover from than accidental over-creation during test runs.

Current deletion boundary

The current cleanup job intentionally follows the same backend path the web app uses for contact deletion.

That is the right v1 boundary for testing and operator cleanup because it:

  • matches existing product behavior
  • avoids introducing a second destructive data-path
  • keeps cleanup reviewable as normal contact deletion semantics

It should not be described as a stronger or broader purge contract than the backend already provides.

Performance Reality

What this API improves

This API improves request/response UX:

  • client is no longer blocked on a long synchronous create loop
  • client can show progress and final status
  • retries can use job-based polling instead of repeating a single huge mutation loop

What it does not improve by itself

This API does not materially reduce total matching or creation time on its own.

The expensive path is still the existing contact create flow, including person matching and related downstream work. The current worker is still sequential after the initial in-batch email dedup step.

For large batches, that means runtime still scales roughly with the number of contacts that actually reach CreateContactUseCase.

Practical implication

If a few thousand contact creates still take many minutes, that is expected with the current architecture.

The async job changes client ergonomics more than raw throughput.

Likely optimization order

If throughput becomes the main problem, the likely next steps are:

  1. measure end-to-end runtime using the new job API
  2. confirm where time is being spent in the existing create path
  3. consider small, controlled worker concurrency only after validating model/API pressure
  4. if needed, add batch person resolution or batch matching deeper in the stack

The real throughput bottleneck is not the job wrapper. It is the matching and contact-create path behind each item.
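If step 3 is ever explored, "small, controlled worker concurrency" could look like a bounded-concurrency map. This is a generic sketch and not the current worker, which is sequential. `mapWithConcurrency` is a hypothetical helper, and the per-item `worker` callback stands in for a call into the existing create path.

```typescript
// Runs worker over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(runLane());
  }
  // Rejects on the first worker error; other lanes keep draining in the background.
  await Promise.all(lanes);
  return results;
}
```

A limit of 1 reproduces sequential behavior, which makes it straightforward to measure the effect of raising the limit gradually while watching downstream pressure.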

Current Caveats

1. This is not import parity

This feature does not provide:

  • file upload
  • source-field mapping
  • ImportRecord persistence
  • reconcile
  • comparison
  • import provenance

2. Idempotency is best effort, not atomic

The current idempotency behavior is cache-based and does not make a single atomic claim on the key. Concurrent identical requests can still race.

3. Collection add can fail after contacts are created

Contact creation and collection association are separate steps. A job can end PARTIAL with valid created contacts if the collection add step fails.

4. Push progress is not yet a complete client contract

progressUpdated exists, but the API does not currently return the derived progress entityId that would let the client reliably correlate those events to the job it started.

5. Queue payloads still carry full contact input

The current worker task payload includes the submitted contacts. That is operationally simple, but it also means large batches carry larger queue payloads and the usual internal PII/logging concerns still apply.

6. There is no global job index API

Operators can recover recent jobs from Redis cache and older create runs from Contact.externalSource, but there is no dedicated "list jobs" API yet.

Key Code Paths

  • cred-api-commercial/src/graphql-api/contact/resolvers/contact-resolver.ts
  • cred-api-commercial/src/graphql-api/contact/inputs/bulk-delete-created-contacts-input.ts
  • cred-api-commercial/src/graphql-api/contact/types/type-bulk-delete-created-contacts.ts
  • cred-api-commercial/src/graphql-api/contact/inputs/bulk-create-contacts-input.ts
  • cred-api-commercial/src/graphql-api/contact/types/type-bulk-create-contacts.ts
  • cred-api-commercial/src/domain/contact/entity/bulk-create-contacts.ts
  • cred-api-commercial/src/domain/contact/entity/bulk-delete-created-contacts.ts
  • cred-api-commercial/src/domain/contact/usecase/start-bulk-create-contacts-usecase.ts
  • cred-api-commercial/src/domain/contact/usecase/get-bulk-create-contacts-job-usecase.ts
  • cred-api-commercial/src/domain/contact/usecase/process-bulk-create-contacts-usecase.ts
  • cred-api-commercial/src/domain/contact/usecase/start-bulk-delete-created-contacts-usecase.ts
  • cred-api-commercial/src/domain/contact/usecase/get-bulk-delete-created-contacts-job-usecase.ts
  • cred-api-commercial/src/domain/contact/usecase/process-bulk-delete-created-contacts-usecase.ts
  • cred-api-commercial/src/domain/contact/usecase/create-contact-usecase.ts
  • cred-api-commercial/src/domain/contact/usecase/delete-contact-usecase.ts
  • cred-api-commercial/src/domain/contact/usecase/save-contact/save-contact.ts
  • cred-api-commercial/src/domain/progress/entity/progress.ts
  • cred-api-commercial/src/domain/progress/service/progress-manager-service.ts
  • cred-api-commercial/src/worker/common.ts
  • cred-api-commercial/src/worker/queue/common-task-queue.ts

Quick Compare

| Path | Uses Import? | Best fit | Main strengths | Main caveats |
| --- | --- | --- | --- | --- |
| Async bulk create and cleanup contacts | No | iOS device contacts, client-driven contact creation, local test runs, operator cleanup | Simple client contract, async UX, collection add, returned job state, dry-run delete, tracking marker for cleanup | No import mapping/reconcile/provenance, push progress contract is incomplete, throughput still limited by existing create path, no global job index API |
| Bulk contact import | Yes | CSV/Excel/JSON import, CRM full import, webhook/Universal API ingestion, Polytomic | Field mapping, ImportRecords, reconcile/comparison surfaces, import provenance | Heavier system, different UX, slower to adapt for client-supplied contact lists |