Bulk Create Contacts and Cleanup

Scope

This page documents the non-import async bulk contact create and cleanup path in cred-api-commercial, with current iOS usage and possible future web usage, as observed on March 11, 2026.

Executive Summary

Bulk create contacts is a client-driven async job API. It supports the UX of "the user pushes a button and a lot of contacts show up in CRED."

It is intentionally separate from the legacy Import pipeline:

It does not create an Import
It does not create ImportFields
It does not persist ImportRecords
It does not expose field mapping, reconcile, or comparison flows
It does not create import provenance such as importId or importRowNo

Instead, it starts async worker jobs that:

reuse CreateContactUseCase many times for creation
reuse DeleteContactUseCase for cleanup
expose state through pollable job queries

That makes it a good fit for iOS device-contact sync, local test cleanup, and possible future web bulk-create flows. It is not a replacement for file import, CRM import, webhook import, or Universal API import.

For the higher-level decision framework and local operator flow, see Contact Ingestion Overview.

Why This Exists

The current product needs two different capabilities that are easy to conflate:

Capability	What it means
Bulk contact import	Ingest external datasets through the `Import` framework with source-field mapping, row materialization, reconcile, and optional comparison
Bulk create contacts	Accept many client-supplied contact payloads and create local contacts asynchronously, with enough status and tracking information for UX and testing

The new async bulk-create API exists to solve the second problem without dragging a client-supplied contact list through the much heavier import system.

Current GraphQL Surface

Start mutation

startBulkCreateContacts(input) starts the job.

Current input shape:

contacts: [InputCreateContact!]!
collectionId?: Int
idempotencyKey?: String
clientSource?: String
clientReferenceId?: String

Current output shape:

jobId
trackingKey
clientSource
clientReferenceId
status
totalCount
processedCount
successCount
failureCount
skippedCount
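
A client holding more contacts than the 5,000-contact batch limit described under Lifecycle has to split the list across several start calls. The sketch below shows one way to do that. The interface mirrors the input fields listed above; TContact stands in for InputCreateContact, whose fields are not documented on this page. The splitIntoBatches helper and the per-batch idempotency key suffix are assumptions for illustration, not part of the API.

```typescript
// Mirrors the documented startBulkCreateContacts input fields.
// TContact stands in for InputCreateContact, whose fields are not listed on this page.
interface StartBulkCreateContactsInput<TContact> {
  contacts: TContact[];
  collectionId?: number;
  idempotencyKey?: string;
  clientSource?: string;
  clientReferenceId?: string;
}

// Documented server-side maximum batch size.
const MAX_BATCH_SIZE = 5000;

// Hypothetical client helper: split a long contact list into inputs that each
// respect the batch limit. If an idempotencyKey is supplied, the batch index is
// appended so later batches do not resolve to the first batch's job (assumption).
function splitIntoBatches<TContact>(
  contacts: readonly TContact[],
  shared: Omit<StartBulkCreateContactsInput<TContact>, "contacts">,
): StartBulkCreateContactsInput<TContact>[] {
  const inputs: StartBulkCreateContactsInput<TContact>[] = [];
  for (let start = 0; start < contacts.length; start += MAX_BATCH_SIZE) {
    const batchIndex = inputs.length;
    inputs.push({
      ...shared,
      idempotencyKey:
        shared.idempotencyKey === undefined
          ? undefined
          : `${shared.idempotencyKey}:${batchIndex}`,
      contacts: contacts.slice(start, start + MAX_BATCH_SIZE),
    });
  }
  return inputs;
}
```

Each returned input would start its own job with its own jobId and trackingKey, so a caller using this approach needs to keep every returned pair.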

Job status query

bulkCreateContactsJob(jobId) returns the current job state.

Current output shape:

jobId
trackingKey
clientSource
clientReferenceId
status
totalCount
processedCount
successCount
failureCount
skippedCount
contactIds
errors

Delete start mutation

startBulkDeleteCreatedContacts(input) starts the cleanup job.

Current input shape:

bulkCreateJobId?: String
trackingKey?: String
contactIds?: [BigInt!]
dryRun?: Boolean

Current output shape:

jobId
bulkCreateJobId
trackingKey
dryRun
status
totalCount
processedCount
deletedCount
notFoundCount
forbiddenCount
failedCount

Delete job status query

bulkDeleteCreatedContactsJob(jobId) returns the current cleanup state.

Current output shape:

jobId
bulkCreateJobId
trackingKey
dryRun
status
totalCount
processedCount
deletedCount
notFoundCount
forbiddenCount
failedCount
candidateContactIds
deletedContactIds
errors

Job states

Current status values are:

QUEUED
PROCESSING
COMPLETED
PARTIAL
FAILED
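
On the client side, these values map naturally onto a string-literal union. A minimal sketch follows. It assumes that COMPLETED, PARTIAL, and FAILED are the terminal states; this page lists the values but does not say which ones end a job. The helper name isTerminalStatus is hypothetical.

```typescript
// Status values exactly as listed on this page.
type BulkJobStatus = "QUEUED" | "PROCESSING" | "COMPLETED" | "PARTIAL" | "FAILED";

// Assumption: these three statuses are terminal. The page does not state this explicitly.
const TERMINAL_STATUSES: ReadonlySet<BulkJobStatus> = new Set<BulkJobStatus>([
  "COMPLETED",
  "PARTIAL",
  "FAILED",
]);

// Hypothetical helper: true once a polling client can stop.
function isTerminalStatus(status: BulkJobStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}
```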

Lifecycle

The current flow is:

Client calls startBulkCreateContacts
API validates input and enforces a maximum batch size of 5,000 contacts
If collectionId is present, API verifies the collection exists in the caller's company and the caller can edit its items
If idempotencyKey is present, API performs a best-effort cache lookup for an existing job owned by the same user
API generates a jobId
API derives a trackingKey
API stores job state in cache
API enqueues the BULK_CREATE_CONTACTS worker task
Client polls bulkCreateContactsJob(jobId) for state changes
Worker creates contacts, updates cached job state, and optionally adds the created contacts to a collection

sequenceDiagram
    participant Client
    participant API
    participant Cache
    participant Worker

    Client->>API: startBulkCreateContacts(input)
    API->>API: Validate input (max 5,000)
    opt collectionId provided
        API->>API: Verify collection exists and caller can edit items
    end
    opt idempotencyKey provided
        API->>Cache: Lookup idempotencyKey
        Cache-->>API: Existing jobId or cache miss
    end
    API->>API: Generate jobId and trackingKey
    API->>Cache: Store job state (TTL 24h)
    API->>Worker: Enqueue BULK_CREATE_CONTACTS
    API-->>Client: Return jobId and trackingKey

    loop Poll job state
        Client->>API: bulkCreateContactsJob(jobId)
        API->>Cache: Read job status by jobId
        Cache-->>API: Current status and counts
        API-->>Client: Job state
    end

    Worker->>Cache: Mark job PROCESSING
    Worker->>Worker: Create contacts
    opt collectionId provided
        Worker->>Worker: Add created contacts to collection
    end
    Worker->>Cache: Update final status, counts, and contactIds

Diagram: bulk-create lifecycle between Client, API, Cache, and Worker. The pollable bulkCreateContactsJob(jobId) query reads cached state while the BULK_CREATE_CONTACTS worker updates it.

Current cache TTL for job state and idempotency mapping is 24 hours.
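
The client half of this lifecycle is a bounded polling loop. The sketch below is one possible shape, not the iOS implementation. fetchJob is a hypothetical wrapper around bulkCreateContactsJob(jobId), sleep is injected so the loop stays testable, and treating COMPLETED, PARTIAL, and FAILED as terminal is an assumption.

```typescript
// Subset of the documented bulkCreateContactsJob output that the loop needs.
interface BulkCreateJobState {
  jobId: string;
  status: "QUEUED" | "PROCESSING" | "COMPLETED" | "PARTIAL" | "FAILED";
  totalCount: number;
  processedCount: number;
}

interface PollOptions {
  // Hypothetical wrapper around the bulkCreateContactsJob(jobId) query.
  fetchJob: (jobId: string) => Promise<BulkCreateJobState>;
  // Injected delay function, for example a setTimeout-based promise.
  sleep: (ms: number) => Promise<void>;
  intervalMs: number;
  maxAttempts: number;
}

// Polls until the job reaches an assumed terminal status or the attempt budget runs out.
async function pollBulkCreateJob(
  jobId: string,
  options: PollOptions,
): Promise<BulkCreateJobState> {
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    const job = await options.fetchJob(jobId);
    if (job.status === "COMPLETED" || job.status === "PARTIAL" || job.status === "FAILED") {
      return job;
    }
    await options.sleep(options.intervalMs);
  }
  throw new Error(`Job ${jobId} did not finish within ${options.maxAttempts} polls`);
}
```

Bounding the attempts matters because cached job state expires: a client that polls indefinitely would eventually be asking for a job the cache no longer holds.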

Delete Lifecycle

The current delete flow is:

Client calls startBulkDeleteCreatedContacts
API requires either bulkCreateJobId or trackingKey
If bulkCreateJobId is provided, API loads the cached bulk-create job and verifies ownership
API derives or reuses trackingKey
API resolves candidate contacts from:
cached bulkCreateJob.contactIds
persisted Contact.externalSource = trackingKey
If contactIds are supplied, API intersects candidates with that subset
API stores delete job state in cache
API enqueues BULK_DELETE_CREATED_CONTACTS
Client polls bulkDeleteCreatedContactsJob(jobId)
Worker deletes candidates through DeleteContactUseCase

Current delete job cache TTL is also 24 hours.
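
The candidate-resolution steps in this flow can be expressed as a small pure function. This is a sketch under stated assumptions: the two candidate sources are combined as a union, IDs are modeled as strings (a common serialization for BigInt values), and an omitted contactIds argument means "no narrowing". The function name is hypothetical and the real use case may differ on any of these points.

```typescript
// Hypothetical helper mirroring the documented candidate-resolution steps.
function resolveDeleteCandidates(
  cachedJobContactIds: readonly string[], // contactIds from the cached bulk-create job
  markedContactIds: readonly string[], // contacts whose externalSource equals trackingKey
  requestedContactIds?: readonly string[], // optional contactIds from the delete input
): string[] {
  // Assumption: the two sources are unioned, with duplicates removed.
  const candidates = new Set<string>();
  for (const id of cachedJobContactIds) candidates.add(id);
  for (const id of markedContactIds) candidates.add(id);

  const all = Array.from(candidates);
  if (requestedContactIds === undefined) {
    return all;
  }
  // When a subset is supplied, keep only candidates that also appear in it.
  const requested = new Set<string>(requestedContactIds);
  return all.filter((id) => requested.has(id));
}
```

Intersecting rather than trusting the supplied contactIds means a caller cannot use this path to delete contacts that were never tied to the job or tracking key.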

Processing Behavior

Reused contact creation path

The worker reuses CreateContactUseCase for each prepared item.

That is important because it preserves the existing contact-create side effects, including:

normal contact validation
contact enrichment/matching entry points
collection-side behavior after contact creation
custom field writes triggered by the contact create path

This also means bulk create inherits the cost of the existing create pipeline. It is not a thin database insert loop.

In-job deduplication

Before creating contacts, the worker currently deduplicates contacts inside the submitted batch by normalized primary email:

email is trimmed
email is lowercased
first occurrence wins
later duplicates are counted as skipped
contacts without email are not deduplicated

This is only an in-request guard. It is not a general dedup strategy across existing contacts or across different jobs.
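
The rules above can be sketched as a single pass over the batch. This is an illustration, not the worker's code. The email field name is an assumption (the page does not show how InputCreateContact exposes the primary email), and treating a whitespace-only email as "no email" is also an assumption.

```typescript
interface DedupResult<T> {
  unique: T[];
  skippedCount: number;
}

// Hypothetical helper: in-batch dedup by normalized primary email.
function dedupeByNormalizedEmail<T extends { email?: string | null }>(
  contacts: readonly T[],
): DedupResult<T> {
  const seen = new Set<string>();
  const unique: T[] = [];
  let skippedCount = 0;

  for (const contact of contacts) {
    // Normalize: trim, then lowercase.
    const normalized = (contact.email ?? "").trim().toLowerCase();
    if (normalized === "") {
      // Contacts without an email are never deduplicated.
      unique.push(contact);
      continue;
    }
    if (seen.has(normalized)) {
      // Later duplicates are counted as skipped; the first occurrence wins.
      skippedCount += 1;
      continue;
    }
    seen.add(normalized);
    unique.push(contact);
  }
  return { unique, skippedCount };
}
```

Only the items in unique would go on to CreateContactUseCase, which is why skippedCount is reported separately from failureCount.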

Important dedup caveat

This path should not be described as preserving the legacy findExisting() duplicate behavior.

The worker calls CreateContactUseCase, which uses the normal contact create flow in create mode. In that mode, the SaveContactUseCase.findExisting() path is intentionally bypassed. So the only explicit dedup behavior added by this bulk-create feature today is the pre-batch normalized-email skip described above.

Collection association

If collectionId is provided and contacts were created successfully, the worker tries to add those contacts to the target collection after the create loop finishes.

If collection association fails, the job can still end PARTIAL even though the contacts themselves were created.

Delete behavior

Delete uses the backend's normal contact delete semantics rather than a bespoke direct SQL cleanup path.

That means v1 cleanup is intentionally scoped to "do what backend contact deletion does today" instead of trying to prove a stronger cross-table purge contract.

Dry run support

Delete supports dryRun = true.

When enabled:

candidates are resolved
counts and candidate IDs are recorded
no contacts are deleted

This is the preferred first step for local testing and recovery work.

Observing Job State

Polling is the source of truth

For v1, clients should treat:

bulkCreateContactsJob(jobId)
bulkDeleteCreatedContactsJob(jobId)

as the authoritative job status APIs.

That path is coherent today:

client receives jobId from the start mutation
client can poll the query with that same jobId
query returns counts, IDs, errors, and status

Push progress is not fully correlated yet

The worker also publishes ProgressType.BULK_CREATE_CONTACTS events through the generic progressUpdated subscription.

However, the subscription payload only exposes a numeric entityId, while the bulk-create API returns a string jobId. The current implementation derives the progress entityId from jobId, but that derived value is not returned to the client as part of the GraphQL contract.

Practical result for create:

polling works
generic push progress exists
direct client correlation from jobId to progressUpdated is still underspecified

For current clients, polling should remain the source of truth.

Tracking for Testing and Cleanup

Client-provided metadata

The API now accepts two optional client metadata fields:

clientSource
clientReferenceId

Typical examples:

clientSource = "ios"
clientSource = "ios-sim"
clientSource = "web"
clientReferenceId = "<local-test-run-id>"
clientReferenceId = "<simulator-session-id>"

Server-generated tracking key

The server derives a compact trackingKey from:

jobId
sanitized clientSource
sanitized clientReferenceId

Current shape:

bcc:j:<jobId>:s:<source>:r:<reference>

The exact optional segments depend on which client metadata fields were supplied.
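
A sketch of how a key with this shape could be assembled is shown below. The segment layout comes from the shape above; the sanitization rule (keep letters, digits, underscore, and hyphen) and both function names are assumptions, since this page does not document the real sanitizer or any length limit.

```typescript
// Assumption: sanitization keeps [A-Za-z0-9_-] and drops everything else.
function sanitizeSegment(value: string): string {
  return value.trim().replace(/[^A-Za-z0-9_-]/g, "");
}

// Hypothetical builder for the documented shape bcc:j:<jobId>:s:<source>:r:<reference>.
// The s and r segments are emitted only when the matching client field is present.
function buildTrackingKey(
  jobId: string,
  clientSource?: string,
  clientReferenceId?: string,
): string {
  const parts = ["bcc", "j", jobId];
  const source = clientSource === undefined ? "" : sanitizeSegment(clientSource);
  const reference = clientReferenceId === undefined ? "" : sanitizeSegment(clientReferenceId);
  if (source !== "") parts.push("s", source);
  if (reference !== "") parts.push("r", reference);
  return parts.join(":");
}
```

Under these assumptions, buildTrackingKey("abc123", "ios-sim", "run 42") yields bcc:j:abc123:s:ios-sim:r:run42, and buildTrackingKey("abc123") yields bcc:j:abc123.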

Contact marker

The worker stamps created contacts with:

Contact.externalSource = trackingKey

This is the current durable marker for test-created contacts.

Recovery after cache expiry

There is currently no "list all bulk jobs" GraphQL API.

Operationally, that means there are two recovery modes:

Recent jobs: inspect Redis cache keys and cached job payloads
Older create runs: inspect Contact.externalSource like 'bcc:j:%'

Bulk-create jobs are recoverable after cache expiry because the contact rows persist the tracking key. Bulk-delete jobs are not recoverable in the same durable way once their cache entries expire; after that, logs or contact existence checks are the main signal.
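
For the "older create runs" mode, an operator who has already fetched contact rows can regroup them into runs by their marker. The sketch below assumes a row shape of { id, externalSource } and a hypothetical helper name; only the bcc:j: prefix comes from this page.

```typescript
// Assumed minimal row shape for contacts read back from the database.
interface ContactRow {
  id: string;
  externalSource: string | null;
}

// Hypothetical helper: group contact IDs by bulk-create tracking key.
// Rows without a bcc:j: marker are ignored.
function groupByTrackingKey(rows: readonly ContactRow[]): Map<string, string[]> {
  const byKey = new Map<string, string[]>();
  for (const row of rows) {
    const key = row.externalSource;
    if (key === null || !key.startsWith("bcc:j:")) {
      continue;
    }
    const ids = byKey.get(key) ?? [];
    ids.push(row.id);
    byKey.set(key, ids);
  }
  return byKey;
}
```

Each map entry then corresponds to one bulk-create run, and its key can be passed as trackingKey to the delete mutation.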

Why `contactIds` is not enough

The job result returns contactIds, but that list is not a complete cleanup handle.

CreateContactUseCase performs more work than a single row insert. A contact can be created in the database and then fail later in the create flow before the worker records the returned ID in the job state. In that case:

a real contact may exist
the job may report an error
contactIds may not include that contact

That is why trackingKey and Contact.externalSource are the safer audit and cleanup anchors.
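
The gap can be made visible by comparing the two sources. In this sketch, jobContactIds is the contactIds list from the job result and markedContactIds is the set of contacts whose externalSource equals the job's trackingKey; the helper name and the string ID representation are assumptions.

```typescript
// Hypothetical audit helper: contacts that carry the tracking marker but were
// never recorded in the job's contactIds.
function findUnrecordedContactIds(
  jobContactIds: readonly string[],
  markedContactIds: readonly string[],
): string[] {
  const recorded = new Set<string>(jobContactIds);
  return markedContactIds.filter((id) => !recorded.has(id));
}
```

A non-empty result is exactly the case described above: real contacts exist that a contactIds-only cleanup would miss.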

Recommended test workflow

For local or simulator-driven end-to-end testing:

Send clientSource and clientReferenceId on every test run
Persist both jobId and trackingKey in the test harness or logs
Poll bulkCreateContactsJob(jobId) until completion
For DB checks, query contacts by externalSource = trackingKey
Use that same marker later for cleanup tooling

Current local operator flow

The validated local operator path is:

start the stack with ./start-local-federated-development.sh
use iOS against the local router for creation
use cred-platform query --env local for dry-run delete and real delete

Typical cleanup loop:

bulk create from iOS
keep bulkCreate jobId, trackingKey, and collectionId
run startBulkDeleteCreatedContacts(dryRun: true)
verify candidateContactIds
run startBulkDeleteCreatedContacts(dryRun: false)
poll bulkDeleteCreatedContactsJob(jobId) until the job finishes, then check that deletedCount = totalCount
confirm the collection is empty on fresh iOS fetch
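
The dry-run-then-delete part of this loop can be scripted. The sketch below is not the cred-platform tooling; CleanupClient and both of its methods are hypothetical wrappers around startBulkDeleteCreatedContacts and bulkDeleteCreatedContactsJob, and approve stands in for the operator's manual check of candidateContactIds.

```typescript
// Subset of the documented delete job output used by the sketch.
interface DeleteJobState {
  jobId: string;
  dryRun: boolean;
  status: string;
  totalCount: number;
  deletedCount: number;
  candidateContactIds: string[];
}

// Hypothetical client wrapper; neither method name comes from the codebase.
interface CleanupClient {
  // Wraps startBulkDeleteCreatedContacts and resolves to the delete jobId.
  startDelete(input: { trackingKey: string; dryRun: boolean }): Promise<string>;
  // Polls bulkDeleteCreatedContactsJob(jobId) until the job has finished.
  waitForDeleteJob(jobId: string): Promise<DeleteJobState>;
}

// Runs a dry run, asks for approval of the candidates, then runs the real delete.
// Resolves to null when the candidates are rejected, so nothing is deleted.
async function cleanupWithDryRunFirst(
  client: CleanupClient,
  trackingKey: string,
  approve: (candidateContactIds: readonly string[]) => boolean,
): Promise<DeleteJobState | null> {
  const previewJobId = await client.startDelete({ trackingKey, dryRun: true });
  const preview = await client.waitForDeleteJob(previewJobId);
  if (!approve(preview.candidateContactIds)) {
    return null;
  }
  const deleteJobId = await client.startDelete({ trackingKey, dryRun: false });
  const result = await client.waitForDeleteJob(deleteJobId);
  if (result.deletedCount !== result.totalCount) {
    throw new Error(
      `Deleted ${result.deletedCount} of ${result.totalCount} candidates for ${trackingKey}`,
    );
  }
  return result;
}
```

Keeping the approval step between the two calls preserves the point of the dry run: the destructive call is only made after someone or something has looked at the candidate list.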

Cleanup Strategy

The current direction is intentionally conservative:

creation and tracking can ship first
destructive cleanup should land in a separate internal PR

That split is deliberate because accidental deletion is harder to recover from than accidental over-creation during test runs.

Current deletion boundary

The current cleanup job intentionally follows the same backend path the web app uses for contact deletion.

That is the right v1 boundary for testing and operator cleanup because it:

matches existing product behavior
avoids introducing a second destructive data-path
keeps cleanup reviewable as normal contact deletion semantics

It should not be described as a stronger or broader purge contract than the backend already provides.

Performance Reality

What this API improves

This API improves request/response UX:

client is no longer blocked on a long synchronous create loop
client can show progress and final status
retries can poll the existing job instead of repeating a single huge synchronous mutation

What it does not improve by itself

This API does not materially reduce total matching or creation time on its own.

The expensive path is still the existing contact create flow, including person matching and related downstream work. The current worker is still sequential after the initial in-batch email dedup step.

For large batches, that means runtime still scales roughly with the number of contacts that actually reach CreateContactUseCase.

Practical implication

If a few thousand contact creates still take many minutes, that is expected with the current architecture.

The async job changes client ergonomics more than raw throughput.

Likely optimization order

If throughput becomes the main problem, the likely next steps are:

measure end-to-end runtime using the new job API
confirm where time is being spent in the existing create path
consider small, controlled worker concurrency only after validating model/API pressure
if needed, add batch person resolution or batch matching deeper in the stack

The real throughput bottleneck is not the job wrapper. It is the matching and contact-create path behind each item.

Current Caveats

1. This is not import parity

This feature does not provide:

file upload
source-field mapping
ImportRecord persistence
reconcile
comparison
import provenance

2. Idempotency is best effort, not atomic

The current idempotency behavior is cache-based and does not make a single atomic claim on the key. Concurrent identical requests can still race.
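
The race can be reproduced with a toy model. The sketch below is not the production code: InMemoryCache stands in for the real cache and startJobBestEffort is a hypothetical reduction of the start path to "look up the key, then write it". Two concurrent calls with the same key both miss the lookup and each create a job.

```typescript
// Toy async cache standing in for the real cache (assumption).
class InMemoryCache {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | undefined> {
    return this.store.get(key);
  }
  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

let nextJobNumber = 0;

// Hypothetical best-effort idempotency: a lookup followed by a separate write.
// Nothing makes the two steps atomic, so concurrent callers can both miss.
async function startJobBestEffort(
  cache: InMemoryCache,
  idempotencyKey: string,
): Promise<string> {
  const existing = await cache.get(idempotencyKey);
  if (existing !== undefined) {
    return existing; // Replay: reuse the job already mapped to this key.
  }
  nextJobNumber += 1;
  const jobId = `job-${nextJobNumber}`;
  await cache.set(idempotencyKey, jobId);
  return jobId;
}

// Two concurrent identical requests: both lookups run before either write.
async function demonstrateRace(): Promise<[string, string]> {
  const cache = new InMemoryCache();
  return Promise.all([
    startJobBestEffort(cache, "same-key"),
    startJobBestEffort(cache, "same-key"),
  ]);
}
```

In this model demonstrateRace() resolves to two different job IDs, while a later sequential call with the same key reuses whichever mapping was written last. An atomic set-if-absent primitive (for example Redis SET with the NX option) is the usual way to close that window.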

3. Collection add can fail after contacts are created

Contact creation and collection association are separate steps. A job can end PARTIAL with valid created contacts if the collection add step fails.

4. Push progress is not yet a complete client contract

progressUpdated exists, but the API does not currently return the derived progress entityId that would let the client reliably correlate those events to the job it started.

5. Queue payloads still carry full contact input

The current worker task payload includes the submitted contacts. That is operationally simple, but it also means large batches carry larger queue payloads and the usual internal PII/logging concerns still apply.

6. There is no global job index API

Operators can recover recent jobs from Redis cache and older create runs from Contact.externalSource, but there is no dedicated "list jobs" API yet.

Key Code Paths

cred-api-commercial/src/graphql-api/contact/resolvers/contact-resolver.ts
cred-api-commercial/src/graphql-api/contact/inputs/bulk-delete-created-contacts-input.ts
cred-api-commercial/src/graphql-api/contact/types/type-bulk-delete-created-contacts.ts
cred-api-commercial/src/graphql-api/contact/inputs/bulk-create-contacts-input.ts
cred-api-commercial/src/graphql-api/contact/types/type-bulk-create-contacts.ts
cred-api-commercial/src/domain/contact/entity/bulk-create-contacts.ts
cred-api-commercial/src/domain/contact/entity/bulk-delete-created-contacts.ts
cred-api-commercial/src/domain/contact/usecase/start-bulk-create-contacts-usecase.ts
cred-api-commercial/src/domain/contact/usecase/get-bulk-create-contacts-job-usecase.ts
cred-api-commercial/src/domain/contact/usecase/process-bulk-create-contacts-usecase.ts
cred-api-commercial/src/domain/contact/usecase/start-bulk-delete-created-contacts-usecase.ts
cred-api-commercial/src/domain/contact/usecase/get-bulk-delete-created-contacts-job-usecase.ts
cred-api-commercial/src/domain/contact/usecase/process-bulk-delete-created-contacts-usecase.ts
cred-api-commercial/src/domain/contact/usecase/create-contact-usecase.ts
cred-api-commercial/src/domain/contact/usecase/delete-contact-usecase.ts
cred-api-commercial/src/domain/contact/usecase/save-contact/save-contact.ts
cred-api-commercial/src/domain/progress/entity/progress.ts
cred-api-commercial/src/domain/progress/service/progress-manager-service.ts
cred-api-commercial/src/worker/common.ts
cred-api-commercial/src/worker/queue/common-task-queue.ts

Quick Compare

Path	Uses `Import`?	Best fit	Main strengths	Main caveats
Async bulk create and cleanup contacts	No	iOS device contacts, client-driven contact creation, local test runs, operator cleanup	Simple client contract, async UX, collection add, returned job state, dry-run delete, tracking marker for cleanup	No import mapping/reconcile/provenance, push progress contract is incomplete, throughput still limited by existing create path, no global job index API
Bulk contact import	Yes	CSV/Excel/JSON import, CRM full import, webhook/Universal API ingestion, Polytomic	Field mapping, `ImportRecord`s, reconcile/comparison surfaces, import provenance	Heavier system, different UX, slower to adapt for client-supplied contact lists
