API Reference
Endpoints
POST /batch-trigger/
Receives batches of LinkedIn profiles from Pub/Sub and triggers BrightData scraping.
Service: priority-pipeline-api
Endpoint: POST /batch-trigger/
Request Body
Pub/Sub push format:
{
  "message": {
    "data": "base64_encoded_batch_data",
    "attributes": {
      "batch_number": "1",
      "total_batches": "50"
    }
  }
}
Batch Data Format (after base64 decode):
{
  "profiles": [
    {
      "linkedin_username": "john-doe-123456",
      "person_id": "uuid-here"
    }
  ]
}
Response
{
  "status": "success",
  "profiles_triggered": 20,
  "batch_number": 1
}
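The API service must base64-decode message.data before it can read the batch. A minimal decoding sketch in Python, assuming a Flask-style JSON body; the handler wiring and the trigger_scrape call are illustrative, not the service's actual code:

import base64
import json

def parse_pubsub_push(envelope: dict):
    """Decode a Pub/Sub push envelope into the batch payload and its attributes."""
    message = envelope["message"]
    batch = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    attributes = message.get("attributes", {})
    return batch, attributes

# Usage inside the request handler (illustrative):
# batch, attrs = parse_pubsub_push(request.get_json())
# for profile in batch["profiles"]:
#     trigger_scrape(profile["linkedin_username"], profile["person_id"])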
POST /webhook/linkedin
Receives scraped profile data from BrightData.
Service: priority-pipeline-webhook
Endpoint: POST /webhook/linkedin
Request Body
BrightData webhook format:
{
  "url": "https://linkedin.com/in/john-doe-123456",
  "data": {
    "profile": {
      "name": "John Doe",
      "headline": "CEO at Company",
      "location": "San Francisco, CA"
    }
  }
}
Response
{
  "status": "success",
  "saved_to_bigquery": true,
  "backed_up_to_gcs": true
}
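To exercise the webhook manually (for example, to verify the BigQuery and GCS write path), post a payload in the format above. A sketch with Python's requests library; the service URL is a placeholder, not the deployed address:

import requests

payload = {
    "url": "https://linkedin.com/in/john-doe-123456",
    "data": {
        "profile": {
            "name": "John Doe",
            "headline": "CEO at Company",
            "location": "San Francisco, CA",
        }
    },
}

# Placeholder URL; substitute the deployed webhook service's address.
resp = requests.post(
    "https://<priority-pipeline-webhook-url>/webhook/linkedin",
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expected: {"status": "success", "saved_to_bigquery": true, ...}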
GET /health
Health check endpoint for both services.
Endpoints:
- GET /health (API Service)
- GET /health (Webhook Service)
Response
{
  "status": "healthy",
  "service": "priority-pipeline-api",
  "environment": "production"
}
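To verify both services, hit each /health endpoint. A minimal sketch; the Cloud Run URLs below are placeholders:

import requests

# Placeholder URLs; substitute the actual Cloud Run service URLs.
services = [
    "https://<priority-pipeline-api-url>/health",
    "https://<priority-pipeline-webhook-url>/health",
]

for url in services:
    resp = requests.get(url, timeout=10)
    print(url, resp.status_code, resp.json().get("status"))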
Manual Execution
Trigger Coordinator Job
Execute the monthly coordinator job manually:
gcloud run jobs execute priority-pipeline-coordinator --region=us-central1
Check Profile Scrape Status
Query BigQuery to check if a profile has been scraped:
SELECT
  requestResource,
  requestDate,
  requestStatus
FROM `cred-1556636033881.linkedin.LinkedinApiCall`
WHERE requestType = 'BRIGHTDATA_API_PERSON'
  AND requestResource = 'john-doe-123456'
ORDER BY requestDate DESC
LIMIT 1;
View Recent Scrapes
List profiles scraped in the last 7 days, with how many times each was scraped:
SELECT
  requestResource,
  requestStatus,
  MAX(requestDate) AS last_scraped,
  COUNT(*) AS scrape_count
FROM `cred-1556636033881.linkedin.LinkedinApiCall`
WHERE requestType = 'BRIGHTDATA_API_PERSON'
  AND requestDate >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY requestResource, requestStatus
ORDER BY last_scraped DESC
LIMIT 100;
BigQuery Tables
LinkedinApiCall
Stores all scraped profile data.
Table: linkedin.LinkedinApiCall
Key Columns:
- requestResource - LinkedIn username
- requestType - Always 'BRIGHTDATA_API_PERSON' for this pipeline
- requestDate - When the profile was scraped
- requestStatus - Success/failure status
- responseData - JSON with the scraped profile data (see the example query below)
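Because responseData is stored as JSON, individual fields must be extracted from it. A sketch using the BigQuery Python client; the JSON paths (e.g. $.profile.name) are assumptions about the payload shape, not a documented schema:

from google.cloud import bigquery

client = bigquery.Client(project="cred-1556636033881")

# JSON paths below are assumptions about responseData's structure.
query = """
SELECT
  requestResource,
  requestDate,
  JSON_EXTRACT_SCALAR(responseData, '$.profile.name') AS name,
  JSON_EXTRACT_SCALAR(responseData, '$.profile.headline') AS headline
FROM `cred-1556636033881.linkedin.LinkedinApiCall`
WHERE requestType = 'BRIGHTDATA_API_PERSON'
  AND requestResource = @username
ORDER BY requestDate DESC
LIMIT 1
"""

job = client.query(
    query,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("username", "STRING", "john-doe-123456")
        ]
    ),
)
for row in job:
    print(row["requestResource"], row["name"], row["headline"])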
Source Tables
PersonFields (credentity.PersonFields)
- Source for priority profiles (see the selection sketch below)
- Filter: isPriority = TRUE
PersonIdentifier (credmodel_google.PersonIdentifier)
- Contains LinkedIn usernames
- Filter: identifierType = 'LINKEDIN'
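For reference, the priority profile list can be reconstructed by joining these two tables. The sketch below is an assumption: the join key (personId), the username column (identifierValue), and the project qualifier are illustrative, so confirm them against the actual schemas before relying on this query:

from google.cloud import bigquery

client = bigquery.Client(project="cred-1556636033881")

# Column names personId and identifierValue are assumptions about the table schemas.
query = """
SELECT
  pi.identifierValue AS linkedin_username,
  pf.personId AS person_id
FROM `cred-1556636033881.credentity.PersonFields` pf
JOIN `cred-1556636033881.credmodel_google.PersonIdentifier` pi
  ON pf.personId = pi.personId
WHERE pf.isPriority = TRUE
  AND pi.identifierType = 'LINKEDIN'
"""

for row in client.query(query):
    print(row["linkedin_username"], row["person_id"])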
GCS Backup
All scraped data is backed up to Google Cloud Storage for disaster recovery.
Bucket: brightdata-monthly-priority-people
File Format: {linkedin_username}_{timestamp}.json
Example:
brightdata-monthly-priority-people/
  john-doe-123456_20250119120000.json
  jane-smith-789012_20250119120030.json
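To list or retrieve a profile's backups, a minimal sketch with the GCS Python client; the username prefix and filename are illustrative:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("brightdata-monthly-priority-people")

# List all backups for one profile; filenames follow {linkedin_username}_{timestamp}.json.
for blob in bucket.list_blobs(prefix="john-doe-123456_"):
    print(blob.name, blob.updated)

# Download the raw JSON of a specific backup (filename is illustrative).
data = bucket.blob("john-doe-123456_20250119120000.json").download_as_text()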
Pub/Sub Topic
Topic: linkedin-scraping-batches
Purpose: The coordinator job publishes batches here; the API service receives them via Pub/Sub push at /batch-trigger/
Message Format:
{
  "profiles": [
    {"linkedin_username": "user1", "person_id": "id1"},
    {"linkedin_username": "user2", "person_id": "id2"}
  ]
}
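A single batch can be published manually for testing. A sketch with the Pub/Sub Python client, assuming the topic lives in the same project as the BigQuery tables:

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("cred-1556636033881", "linkedin-scraping-batches")

batch = {
    "profiles": [
        {"linkedin_username": "user1", "person_id": "id1"},
        {"linkedin_username": "user2", "person_id": "id2"},
    ]
}

# Attributes must be strings; Pub/Sub base64-encodes `data` for push delivery.
future = publisher.publish(
    topic_path,
    data=json.dumps(batch).encode("utf-8"),
    batch_number="1",
    total_batches="50",
)
print("Published message:", future.result())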