Bulk Create Contacts and Cleanup
Scope
This page documents the non-import async bulk contact create and cleanup path in cred-api-commercial, with current iOS usage and possible future web usage, as observed on March 11, 2026.
Executive Summary
Bulk create contacts is a client-driven async job API for the UX of "the user pushes a button and a lot of contacts show up in CRED."
It is intentionally separate from the legacy Import pipeline:
- It does not create an Import
- It does not create ImportFields
- It does not persist ImportRecords
- It does not expose field mapping, reconcile, or comparison flows
- It does not create import provenance such as importId or importRowNo
Instead, it starts async worker jobs that:
- reuse CreateContactUseCase many times for creation
- reuse DeleteContactUseCase for cleanup
- expose state through pollable job queries
That makes it a good fit for iOS device-contact sync, local test cleanup, and possible future web bulk-create flows. It is not a replacement for file import, CRM import, webhook import, or Universal API import.
For the higher-level decision framework and local operator flow, see Contact Ingestion Overview.
Why This Exists
The current product needs two different capabilities that are easy to conflate:
| Capability | What it means |
|---|---|
| Bulk contact import | Ingest external datasets through the Import framework with source-field mapping, row materialization, reconcile, and optional comparison |
| Bulk create contacts | Accept many client-supplied contact payloads and create local contacts asynchronously, with enough status and tracking information for UX and testing |
The new async bulk-create API exists to solve the second problem without dragging a client-supplied contact list through the much heavier import system.
Current GraphQL Surface
Start mutation
startBulkCreateContacts(input) starts the job.
Current input shape:
- contacts: [InputCreateContact!]!
- collectionId?: Int
- idempotencyKey?: String
- clientSource?: String
- clientReferenceId?: String
Current output shape:
- jobId
- trackingKey
- clientSource
- clientReferenceId
- status
- totalCount
- processedCount
- successCount
- failureCount
- skippedCount
Job status query
bulkCreateContactsJob(jobId) returns the current job state.
Current output shape:
- jobId
- trackingKey
- clientSource
- clientReferenceId
- status
- totalCount
- processedCount
- successCount
- failureCount
- skippedCount
- contactIds
- errors
Delete start mutation
startBulkDeleteCreatedContacts(input) starts the cleanup job.
Current input shape:
- bulkCreateJobId?: String
- trackingKey?: String
- contactIds?: [BigInt!]
- dryRun?: Boolean
Current output shape:
- jobId
- bulkCreateJobId
- trackingKey
- dryRun
- status
- totalCount
- processedCount
- deletedCount
- notFoundCount
- forbiddenCount
- failedCount
Delete job status query
bulkDeleteCreatedContactsJob(jobId) returns the current cleanup state.
Current output shape:
- jobId
- bulkCreateJobId
- trackingKey
- dryRun
- status
- totalCount
- processedCount
- deletedCount
- notFoundCount
- forbiddenCount
- failedCount
- candidateContactIds
- deletedContactIds
- errors
Job states
Current status values are:
- QUEUED
- PROCESSING
- COMPLETED
- PARTIAL
- FAILED
Lifecycle
The current flow is:
- Client calls startBulkCreateContacts
- API validates input and enforces a maximum batch size of 5,000 contacts
- If collectionId is present, API verifies the collection exists in the caller's company and the caller can edit its items
- If idempotencyKey is present, API performs a best-effort cache lookup for an existing job owned by the same user
- API generates a jobId
- API derives a trackingKey
- API stores job state in cache
- API enqueues the BULK_CREATE_CONTACTS worker task
- Client polls bulkCreateContactsJob(jobId) for state changes
- Worker creates contacts, updates cached job state, and optionally adds the created contacts to a collection
sequenceDiagram
participant Client
participant API
participant Cache
participant Worker
Client->>API: startBulkCreateContacts(input)
API->>API: Validate input (max 5,000)
opt collectionId provided
API->>API: Verify collection exists and caller can edit items
end
opt idempotencyKey provided
API->>Cache: Lookup idempotencyKey
Cache-->>API: Existing jobId or cache miss
end
API->>API: Generate jobId and trackingKey
API->>Cache: Store job state (TTL 24h)
API->>Worker: Enqueue BULK_CREATE_CONTACTS
API-->>Client: Return jobId and trackingKey
loop Poll job state
Client->>API: bulkCreateContactsJob(jobId)
API->>Cache: Read job status by jobId
Cache-->>API: Current status and counts
API-->>Client: Job state
end
Worker->>Cache: Mark job PROCESSING
Worker->>Worker: Create contacts
opt collectionId provided
Worker->>Worker: Add created contacts to collection
end
Worker->>Cache: Update final status, counts, and contactIds
Diagram: bulk-create lifecycle between Client, API, Cache, and Worker. The pollable bulkCreateContactsJob(jobId) query reads cached state while the BULK_CREATE_CONTACTS worker updates it.
Current cache TTL for job state and idempotency mapping is 24 hours.
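The API-side steps above can be sketched with an in-memory stand-in for the cache. This is a minimal illustration, not the actual use case: `SketchTtlCache`, `startBulkCreateSketch`, the cache key formats, and the injected `enqueue` and `newJobId` callbacks are all hypothetical, and the collection check and trackingKey derivation are omitted for brevity. Only the 5,000-contact limit, the 24-hour TTL, and the best-effort idempotency lookup come from this page.

```typescript
const SKETCH_MAX_BATCH_SIZE = 5000; // limit stated in the lifecycle
const SKETCH_JOB_TTL_MS = 24 * 60 * 60 * 1000; // 24h cache TTL

type SketchCreateStatus = "QUEUED" | "PROCESSING" | "COMPLETED" | "PARTIAL" | "FAILED";

interface SketchCreateJob {
  jobId: string;
  userId: number;
  status: SketchCreateStatus;
  totalCount: number;
}

// Hypothetical stand-in for the real cache; time is passed in so expiry is easy to test.
class SketchTtlCache {
  private entries = new Map<string, { value: unknown; expiresAt: number }>();

  set(key: string, value: unknown, ttlMs: number, now: number): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  get<T>(key: string, now: number): T | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) return undefined;
    return entry.value as T;
  }
}

function startBulkCreateSketch(
  cache: SketchTtlCache,
  enqueue: (jobId: string) => void,
  newJobId: () => string,
  userId: number,
  contacts: unknown[],
  idempotencyKey: string | undefined,
  now: number,
): SketchCreateJob {
  if (contacts.length > SKETCH_MAX_BATCH_SIZE) {
    throw new Error(`batch exceeds ${SKETCH_MAX_BATCH_SIZE} contacts`);
  }

  // Best-effort lookup: read-then-write, so concurrent identical requests can still race.
  const idemKey = idempotencyKey ? `idem:${userId}:${idempotencyKey}` : undefined;
  if (idemKey) {
    const existingId = cache.get<string>(idemKey, now);
    const existing = existingId ? cache.get<SketchCreateJob>(`job:${existingId}`, now) : undefined;
    if (existing && existing.userId === userId) return existing;
  }

  const job: SketchCreateJob = {
    jobId: newJobId(),
    userId,
    status: "QUEUED",
    totalCount: contacts.length,
  };
  cache.set(`job:${job.jobId}`, job, SKETCH_JOB_TTL_MS, now);
  if (idemKey) cache.set(idemKey, job.jobId, SKETCH_JOB_TTL_MS, now);
  enqueue(job.jobId); // the worker picks the job up later
  return job;
}
```

Because the lookup and the write are separate cache operations, the sketch shares the non-atomic behavior described under Current Caveats.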
Delete Lifecycle
The current delete flow is:
- Client calls startBulkDeleteCreatedContacts
- API requires either bulkCreateJobId or trackingKey
- If bulkCreateJobId is provided, API loads the cached bulk-create job and verifies ownership
- API derives or reuses trackingKey
- API resolves candidate contacts from:
  - cached bulkCreateJob.contactIds
  - persisted Contact.externalSource = trackingKey
- If contactIds are supplied, API intersects candidates with that subset
- API stores delete job state in cache
- API enqueues BULK_DELETE_CREATED_CONTACTS
- Client polls bulkDeleteCreatedContactsJob(jobId)
- Worker deletes candidates through DeleteContactUseCase
Current delete job cache TTL is also 24 hours.
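The candidate-resolution step can be illustrated as a pure function. This sketch assumes the two sources are combined as a union before the optional contactIds intersection; the page lists both sources but does not spell out how they are merged. The function name is hypothetical, and IDs are modeled as strings here even though the GraphQL type is BigInt.

```typescript
// Hypothetical helper: merges both candidate sources, then narrows to the requested subset.
function resolveDeleteCandidatesSketch(
  cachedContactIds: readonly string[], // from the cached bulk-create job
  markedContactIds: readonly string[], // contacts whose externalSource equals the trackingKey
  requestedContactIds?: readonly string[],
): string[] {
  const seen = new Set<string>();
  const candidates: string[] = [];
  for (const id of cachedContactIds.concat(markedContactIds)) {
    if (!seen.has(id)) {
      seen.add(id);
      candidates.push(id);
    }
  }
  if (requestedContactIds === undefined) return candidates;

  // Intersection: a requested ID that is not already a candidate is never added.
  const requested = new Set(requestedContactIds);
  return candidates.filter((id) => requested.has(id));
}
```

Under this assumption, supplying contactIds can only shrink the delete set. An ID that was not produced by the tracked job is ignored rather than deleted.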
Processing Behavior
Reused contact creation path
The worker reuses CreateContactUseCase for each prepared item.
That is important because it preserves the existing contact-create side effects, including:
- normal contact validation
- contact enrichment/matching entry points
- collection-side behavior after contact creation
- custom field writes triggered by the contact create path
This also means bulk create inherits the cost of the existing create pipeline. It is not a thin database insert loop.
In-job deduplication
Before creating contacts, the worker currently deduplicates contacts inside the submitted batch by normalized primary email:
- email is trimmed
- email is lowercased
- first occurrence wins
- later duplicates are counted as skipped
- contacts without email are not deduplicated
This is only an in-request guard. It is not a general dedup strategy across existing contacts or across different jobs.
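The rules above amount to a single pass over the batch. The following is a minimal sketch, assuming a simplified contact shape with one `email` field; the real InputCreateContact shape is not shown on this page, and `dedupeBatchByEmail` is a hypothetical name. Treating a whitespace-only email as "no email" is also an assumption.

```typescript
interface SketchContactInput {
  firstName?: string;
  email?: string | null; // stands in for the primary email
}

function dedupeBatchByEmail<T extends SketchContactInput>(
  contacts: readonly T[],
): { unique: T[]; skippedCount: number } {
  const seenEmails = new Set<string>();
  const unique: T[] = [];
  let skippedCount = 0;

  for (const contact of contacts) {
    const normalized = contact.email?.trim().toLowerCase();
    if (!normalized) {
      unique.push(contact); // no email: never deduplicated
      continue;
    }
    if (seenEmails.has(normalized)) {
      skippedCount += 1; // later duplicate: counted as skipped
      continue;
    }
    seenEmails.add(normalized); // first occurrence wins
    unique.push(contact);
  }
  return { unique, skippedCount };
}
```

The set of seen emails lives only for the duration of one call, which mirrors the point that this guard does not look at existing contacts or other jobs.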
Important dedup caveat
This path should not be described as preserving the legacy findExisting() duplicate behavior.
The worker calls CreateContactUseCase, which uses the normal contact create flow in create mode. In that mode, the SaveContactUseCase.findExisting() path is intentionally bypassed. So the only explicit dedup behavior added by this bulk-create feature today is the pre-batch normalized-email skip described above.
Collection association
If collectionId is provided and contacts were created successfully, the worker tries to add those contacts to the target collection after the create loop finishes.
If collection association fails, the job can still end PARTIAL even though the contacts themselves were created.
Delete behavior
Delete uses the backend's normal contact delete semantics rather than a bespoke direct SQL cleanup path.
That means v1 cleanup is intentionally scoped to "do what backend contact deletion does today" instead of trying to prove a stronger cross-table purge contract.
Dry run support
Delete supports dryRun = true.
When enabled:
- candidates are resolved
- counts and candidate IDs are recorded
- no contacts are deleted
This is the preferred first step for local testing and recovery work.
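The dry-run contract can be sketched as an early return before any destructive call. Everything here is illustrative: `runDeleteJobSketch` and the injected `deleteContact` callback are hypothetical, and the result is simplified to a single failedCount rather than the separate notFoundCount and forbiddenCount fields the real job reports.

```typescript
interface SketchDeleteResult {
  dryRun: boolean;
  totalCount: number;
  deletedCount: number;
  failedCount: number;
  candidateContactIds: string[];
  deletedContactIds: string[];
}

function runDeleteJobSketch(
  candidateIds: readonly string[],
  dryRun: boolean,
  deleteContact: (contactId: string) => boolean, // stand-in for the real delete path
): SketchDeleteResult {
  const result: SketchDeleteResult = {
    dryRun,
    totalCount: candidateIds.length,
    deletedCount: 0,
    failedCount: 0,
    candidateContactIds: candidateIds.slice(),
    deletedContactIds: [],
  };
  if (dryRun) return result; // counts and candidates recorded, nothing deleted

  for (const id of candidateIds) {
    if (deleteContact(id)) {
      result.deletedContactIds.push(id);
      result.deletedCount += 1;
    } else {
      result.failedCount += 1;
    }
  }
  return result;
}
```

Running the same candidates first with dryRun set to true and then false gives an operator a chance to compare candidateContactIds against expectations before anything is removed.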
Observing Job State
Polling is the source of truth
For v1, clients should treat:
- bulkCreateContactsJob(jobId)
- bulkDeleteCreatedContactsJob(jobId)
as the authoritative job status APIs.
That path is coherent today:
- client receives jobId from the start mutation
- client can poll the query with that same jobId
- query returns counts, IDs, errors, and status
Push progress is not fully correlated yet
The worker also publishes ProgressType.BULK_CREATE_CONTACTS events through the generic progressUpdated subscription.
However, the subscription payload only exposes a numeric entityId, while the bulk-create API returns a string jobId. The current implementation derives the progress entityId from jobId, but that derived value is not returned to the client as part of the GraphQL contract.
Practical result for create:
- polling works
- generic push progress exists
- direct client correlation from jobId to progressUpdated is still underspecified
For current clients, polling should remain the source of truth.
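A client-side polling loop might look like the sketch below. The `fetchJob` callback is a hypothetical wrapper around the bulkCreateContactsJob(jobId) query, and the choice of COMPLETED, PARTIAL, and FAILED as terminal states is an assumption based on the status names; this page does not define which states are final. The interval and attempt cap are illustrative values.

```typescript
type SketchPollStatus = "QUEUED" | "PROCESSING" | "COMPLETED" | "PARTIAL" | "FAILED";

interface SketchPolledJob {
  jobId: string;
  status: SketchPollStatus;
  processedCount: number;
  totalCount: number;
}

// Assumption: these three statuses mean the worker will not update the job again.
const SKETCH_TERMINAL_STATUSES: readonly SketchPollStatus[] = ["COMPLETED", "PARTIAL", "FAILED"];

async function pollUntilTerminal(
  fetchJob: (jobId: string) => Promise<SketchPolledJob>,
  jobId: string,
  options: { intervalMs: number; maxAttempts: number },
): Promise<SketchPolledJob> {
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    if (SKETCH_TERMINAL_STATUSES.indexOf(job.status) >= 0) return job;
    // Wait before the next poll so the API is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
  throw new Error(`job ${jobId} did not finish after ${options.maxAttempts} polls`);
}
```

The attempt cap matters because cached job state expires after 24 hours; a client that polls forever would eventually be polling a job that no longer exists.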
Tracking for Testing and Cleanup
Client-provided metadata
The API now accepts two optional client metadata fields:
- clientSource
- clientReferenceId
Typical examples:
- clientSource = "ios"
- clientSource = "ios-sim"
- clientSource = "web"
- clientReferenceId = "<local-test-run-id>"
- clientReferenceId = "<simulator-session-id>"
Server-generated tracking key
The server derives a compact trackingKey from:
- jobId
- sanitized clientSource
- sanitized clientReferenceId
Current shape:
bcc:j:<jobId>:s:<source>:r:<reference>
The exact optional segments depend on which client metadata fields were supplied.
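The key shape can be illustrated with a small builder. The segment layout follows the shape shown above, but the sanitization rule is an assumption: this page says the metadata is sanitized without saying how, so the lowercase, character allow-list, and 32-character cap below are placeholders. Both function names are hypothetical.

```typescript
// Assumed sanitization: lowercase, keep [a-z0-9_-], replace other runs with "-", cap length.
function sanitizeTrackingSegment(value: string): string {
  return value
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, "-")
    .slice(0, 32);
}

function buildTrackingKeySketch(
  jobId: string,
  clientSource?: string,
  clientReferenceId?: string,
): string {
  const parts = ["bcc", "j", jobId];
  const source = clientSource ? sanitizeTrackingSegment(clientSource) : "";
  const reference = clientReferenceId ? sanitizeTrackingSegment(clientReferenceId) : "";
  if (source) parts.push("s", source); // optional segment
  if (reference) parts.push("r", reference); // optional segment
  return parts.join(":");
}

// buildTrackingKeySketch("abc123", "ios-sim", "Run 42") → "bcc:j:abc123:s:ios-sim:r:run-42"
// buildTrackingKeySketch("abc123")                      → "bcc:j:abc123"
```

Stripping colons from the client-supplied values keeps the segment delimiters unambiguous, which is one plausible reason to sanitize before building the key.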
Contact marker
The worker stamps created contacts with:
Contact.externalSource = trackingKey
This is the current durable marker for test-created contacts.
Recovery after cache expiry
There is currently no "list all bulk jobs" GraphQL API.
Operationally, that means there are two recovery modes:
- Recent jobs: inspect Redis cache keys and cached job payloads
- Older create runs: inspect Contact.externalSource like 'bcc:j:%'
Bulk-create jobs are recoverable after cache expiry because the contact rows persist the tracking key. Bulk-delete jobs are not recoverable in the same durable way once their cache entries expire; after that, logs or contact existence checks are the main signal.
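Once the rows matching that prefix have been fetched, grouping them by marker reconstructs one bucket per create run. The sketch below is illustrative only: `SketchContactRow` is a hypothetical two-column projection of the contact table, and the function does not query a database.

```typescript
interface SketchContactRow {
  id: string;
  externalSource: string | null;
}

const SKETCH_TRACKING_PREFIX = "bcc:j:";

// Groups contact IDs by their full trackingKey, ignoring rows from other sources.
function groupContactsByTrackingKey(rows: readonly SketchContactRow[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const row of rows) {
    const marker = row.externalSource;
    if (!marker || marker.indexOf(SKETCH_TRACKING_PREFIX) !== 0) continue;
    const ids = groups.get(marker) ?? [];
    ids.push(row.id);
    groups.set(marker, ids);
  }
  return groups;
}
```

Each resulting key can then be passed as trackingKey to startBulkDeleteCreatedContacts, which accepts a trackingKey in place of bulkCreateJobId.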
Why contactIds is not enough
The job result returns contactIds, but that list is not a complete cleanup handle.
CreateContactUseCase performs more work than a single row insert. A contact can be created in the database and then fail later in the create flow before the worker records the returned ID in the job state. In that case:
- a real contact may exist
- the job may report an error
- contactIds may not include that contact
That is why trackingKey and Contact.externalSource are the safer audit and cleanup anchors.
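The gap can be made concrete by comparing the two sources. In this sketch, `findUnreportedContacts` is a hypothetical audit helper: it takes the contactIds reported by the job and the IDs of contacts found by externalSource = trackingKey, and returns the ones the job never recorded.

```typescript
// Returns contacts that carry the tracking marker but are missing from the job's contactIds.
function findUnreportedContacts(
  jobContactIds: readonly string[],
  markedContactIds: readonly string[],
): string[] {
  const reported = new Set(jobContactIds);
  return markedContactIds.filter((id) => !reported.has(id));
}

// Example: the job reported two contacts, but three rows carry the marker.
const unreportedExample = findUnreportedContacts(["101", "102"], ["101", "102", "103"]);
// unreportedExample is ["103"]: a contact a contactIds-only cleanup would miss.
```

A non-empty result is the signal that cleanup should be driven by the tracking marker rather than by the job's contactIds list.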
Recommended test workflow
For local or simulator-driven end-to-end testing:
- Send clientSource and clientReferenceId on every test run
- Persist both jobId and trackingKey in the test harness or logs
- Poll bulkCreateContactsJob(jobId) until completion
- For DB checks, query contacts by externalSource = trackingKey
- Use that same marker later for cleanup tooling
Current local operator flow
The validated local operator path is:
- start the stack with ./start-local-federated-development.sh
- use iOS against the local router for creation
- use cred-platform query --env local for dry-run delete and real delete
Typical cleanup loop:
- bulk create from iOS
- keep the bulk-create jobId, trackingKey, and collectionId
- run startBulkDeleteCreatedContacts(dryRun: true)
- verify candidateContactIds
- run startBulkDeleteCreatedContacts(dryRun: false)
- poll until deletedCount = totalCount
- confirm the collection is empty on fresh iOS fetch
Cleanup Strategy
The current direction is intentionally conservative:
- creation and tracking can ship first
- destructive cleanup should land in a separate internal PR
That split is deliberate because accidental deletion is harder to recover from than accidental over-creation during test runs.
Current deletion boundary
The current cleanup job intentionally follows the same backend path the web app uses for contact deletion.
That is the right v1 boundary for testing and operator cleanup because it:
- matches existing product behavior
- avoids introducing a second destructive data-path
- keeps cleanup reviewable as normal contact deletion semantics
It should not be described as a stronger or broader purge contract than the backend already provides.
Performance Reality
What this API improves
This API improves request/response UX:
- client is no longer blocked on a long synchronous create loop
- client can show progress and final status
- retries can use job-based polling instead of repeating a single huge mutation loop
What it does not improve by itself
This API does not materially reduce total matching or creation time on its own.
The expensive path is still the existing contact create flow, including person matching and related downstream work. The current worker is still sequential after the initial in-batch email dedup step.
For large batches, that means runtime still scales roughly with the number of contacts that actually reach CreateContactUseCase.
Practical implication
If a few thousand contact creates still take many minutes, that is expected with the current architecture.
The async job changes client ergonomics more than raw throughput.
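A back-of-envelope estimate shows why. Because the worker is sequential, runtime is roughly the number of contacts that reach CreateContactUseCase multiplied by the average time per create. The per-contact latency below is a made-up illustrative figure, not a measurement, and the function name is hypothetical.

```typescript
// Rough sequential-runtime model: (contacts reaching the create path) x (average ms per create).
function estimateSequentialRuntimeSeconds(
  totalCount: number,
  skippedCount: number,
  avgCreateMs: number,
): number {
  const reachingCreate = Math.max(0, totalCount - skippedCount);
  return (reachingCreate * avgCreateMs) / 1000;
}

// Hypothetical numbers: 3,000 submitted, 200 skipped as in-batch duplicates, 250 ms per create.
const estimatedSeconds = estimateSequentialRuntimeSeconds(3000, 200, 250);
// estimatedSeconds is 700, i.e. a little under 12 minutes.
```

Under this model, halving the batch halves the runtime, while moving the work into an async job leaves the total unchanged.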
Likely optimization order
If throughput becomes the main problem, the likely next steps are:
- measure end-to-end runtime using the new job API
- confirm where time is being spent in the existing create path
- consider small, controlled worker concurrency only after validating model/API pressure
- if needed, add batch person resolution or batch matching deeper in the stack
The real throughput bottleneck is not the job wrapper. It is the matching and contact-create path behind each item.
Current Caveats
1. This is not import parity
This feature does not provide:
- file upload
- source-field mapping
- ImportRecord persistence
- reconcile
- comparison
- import provenance
2. Idempotency is best effort, not atomic
The current idempotency behavior is cache-based and does not make a single atomic claim on the key. Concurrent identical requests can still race.
3. Collection add can fail after contacts are created
Contact creation and collection association are separate steps. A job can end PARTIAL with valid created contacts if the collection add step fails.
4. Push progress is not yet a complete client contract
progressUpdated exists, but the API does not currently return the derived progress entityId that would let the client reliably correlate those events to the job it started.
5. Queue payloads still carry full contact input
The current worker task payload includes the submitted contacts. That is operationally simple, but it also means large batches carry larger queue payloads and the usual internal PII/logging concerns still apply.
6. There is no global job index API
Operators can recover recent jobs from Redis cache and older create runs from Contact.externalSource, but there is no dedicated "list jobs" API yet.
Key Code Paths
- cred-api-commercial/src/graphql-api/contact/resolvers/contact-resolver.ts
- cred-api-commercial/src/graphql-api/contact/inputs/bulk-delete-created-contacts-input.ts
- cred-api-commercial/src/graphql-api/contact/types/type-bulk-delete-created-contacts.ts
- cred-api-commercial/src/graphql-api/contact/inputs/bulk-create-contacts-input.ts
- cred-api-commercial/src/graphql-api/contact/types/type-bulk-create-contacts.ts
- cred-api-commercial/src/domain/contact/entity/bulk-create-contacts.ts
- cred-api-commercial/src/domain/contact/entity/bulk-delete-created-contacts.ts
- cred-api-commercial/src/domain/contact/usecase/start-bulk-create-contacts-usecase.ts
- cred-api-commercial/src/domain/contact/usecase/get-bulk-create-contacts-job-usecase.ts
- cred-api-commercial/src/domain/contact/usecase/process-bulk-create-contacts-usecase.ts
- cred-api-commercial/src/domain/contact/usecase/start-bulk-delete-created-contacts-usecase.ts
- cred-api-commercial/src/domain/contact/usecase/get-bulk-delete-created-contacts-job-usecase.ts
- cred-api-commercial/src/domain/contact/usecase/process-bulk-delete-created-contacts-usecase.ts
- cred-api-commercial/src/domain/contact/usecase/create-contact-usecase.ts
- cred-api-commercial/src/domain/contact/usecase/delete-contact-usecase.ts
- cred-api-commercial/src/domain/contact/usecase/save-contact/save-contact.ts
- cred-api-commercial/src/domain/progress/entity/progress.ts
- cred-api-commercial/src/domain/progress/service/progress-manager-service.ts
- cred-api-commercial/src/worker/common.ts
- cred-api-commercial/src/worker/queue/common-task-queue.ts
Quick Compare
| Path | Uses Import? | Best fit | Main strengths | Main caveats |
|---|---|---|---|---|
| Async bulk create and cleanup contacts | No | iOS device contacts, client-driven contact creation, local test runs, operator cleanup | Simple client contract, async UX, collection add, returned job state, dry-run delete, tracking marker for cleanup | No import mapping/reconcile/provenance, push progress contract is incomplete, throughput still limited by existing create path, no global job index API |
| Bulk contact import | Yes | CSV/Excel/JSON import, CRM full import, webhook/Universal API ingestion, Polytomic | Field mapping, ImportRecords, reconcile/comparison surfaces, import provenance | Heavier system, different UX, slower to adapt for client-supplied contact lists |