Deployment Guide

Overview

The Priority Pipeline is deployed as Google Cloud Run services and jobs, with deployments automated via GitHub Actions.

Repository: credinvest/social_media_scrapers

Deployment Workflow: .github/workflows/deploy-priority-pipeline.yml

Cloud Run Services

Services and Jobs

Name                           Type               Purpose
priority-pipeline-api          Cloud Run Service  Receives Pub/Sub batches, triggers BrightData
priority-pipeline-webhook      Cloud Run Service  Receives scraped data from BrightData
priority-pipeline-coordinator  Cloud Run Job      Monthly profile batch coordination
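
To confirm what is actually deployed, the services and jobs above can be listed with gcloud (using us-central1, the region used throughout this guide):

# List deployed Cloud Run services and jobs
gcloud run services list --region=us-central1
gcloud run jobs list --region=us-central1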

Deployment Architecture

Environments

Environment  Deployment Trigger                                            Auto-Deploy
Staging      Manual workflow dispatch                                      No
Production   Push to main/master (when src/priority_pipeline/** changes)   Yes

Deployment Process

Automatic Deployment (Production)

  • Push to the main or master branch
  • Triggers only when files under src/priority_pipeline/** change (a local check is sketched below)
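
Whether a given push will trigger the production deploy can be predicted by reproducing the workflow's path filter locally with git; a minimal sketch, assuming the previous commit as the comparison point:

# Check whether the latest commit touches the deploy path filter
git diff --name-only HEAD~1 HEAD | grep -q '^src/priority_pipeline/' \
  && echo 'would trigger production deploy' \
  || echo 'no deploy'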

Manual Deployment (Staging or Production)

  1. Go to GitHub: credinvest/social_media_scrapers > Actions
  2. Select "Deploy Priority Pipeline" workflow
  3. Click "Run workflow"
  4. Select environment: staging or production
  5. Click "Run workflow" to confirm (or dispatch from the CLI, as sketched below)
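
The same dispatch can be done from the command line with the GitHub CLI; a sketch, assuming the workflow exposes an environment input matching the UI selector:

# Hypothetical CLI equivalent of the manual dispatch above
gh workflow run "Deploy Priority Pipeline" \
  --repo credinvest/social_media_scrapers \
  -f environment=staging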

Deployment Order

  1. Webhook service deploys first (generates URL)
  2. API service deploys (uses webhook URL)
  3. Coordinator job deploys last (the full sequence is sketched below)
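
A minimal sketch of this ordering with gcloud, assuming prebuilt images (IMAGE_URL and the flags shown are illustrative, not the workflow's actual configuration):

# 1. Deploy the webhook first so its URL exists
gcloud run deploy priority-pipeline-webhook --image=IMAGE_URL --region=us-central1

# 2. Capture the generated URL
WEBHOOK_URL=$(gcloud run services describe priority-pipeline-webhook \
  --region=us-central1 --format='value(status.url)')

# 3. Deploy the API with the webhook URL injected
gcloud run deploy priority-pipeline-api --image=IMAGE_URL --region=us-central1 \
  --set-env-vars "WEBHOOK_URL=${WEBHOOK_URL}"

# 4. Deploy the coordinator job last
gcloud run jobs update priority-pipeline-coordinator --image=IMAGE_URL --region=us-central1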

Environment Variables

API Service

Variable        Description              Value
BRIGHTDATA      BrightData API token     Secret
GCP_PROJECT_ID  Google Cloud project ID  cred-1556636033881
WEBHOOK_URL     Webhook service URL      Auto-generated
ENVIRONMENT     Environment name         staging / production
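
Plain variables and secret-backed variables are set with different flags on Cloud Run; a sketch for the API service (the Secret Manager secret name brightdata-token is an assumption):

# Plain environment variables
gcloud run services update priority-pipeline-api --region=us-central1 \
  --set-env-vars "GCP_PROJECT_ID=cred-1556636033881,ENVIRONMENT=production"

# Secret-backed variable (hypothetical secret name)
gcloud run services update priority-pipeline-api --region=us-central1 \
  --set-secrets "BRIGHTDATA=brightdata-token:latest"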

Webhook Service

Variable        Description              Value
GCP_PROJECT_ID  Google Cloud project ID  cred-1556636033881
ENVIRONMENT     Environment name         staging / production

Coordinator Job

Variable        Description              Value
BRIGHTDATA      BrightData API token     Secret
GCP_PROJECT_ID  Google Cloud project ID  cred-1556636033881
PYTHONPATH      Python path              /app
ENVIRONMENT     Environment name         staging / production
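
Jobs accept the same flags via gcloud run jobs update; for example (secret name again assumed):

gcloud run jobs update priority-pipeline-coordinator --region=us-central1 \
  --set-env-vars "GCP_PROJECT_ID=cred-1556636033881,PYTHONPATH=/app,ENVIRONMENT=production" \
  --set-secrets "BRIGHTDATA=brightdata-token:latest"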

Service Configuration

Service          Max Instances  Concurrency  CPU  Memory  Timeout
API              1000           1            1    2Gi     3600s
Webhook          5000           1            1    2Gi     3600s
Coordinator Job  -              -            2    8Gi     168h
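
These limits map to standard gcloud resource flags; a sketch applying the rows above (jobs use --task-timeout rather than --timeout):

# API service limits
gcloud run services update priority-pipeline-api --region=us-central1 \
  --max-instances=1000 --concurrency=1 --cpu=1 --memory=2Gi --timeout=3600

# Coordinator job limits
gcloud run jobs update priority-pipeline-coordinator --region=us-central1 \
  --cpu=2 --memory=8Gi --task-timeout=168h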

Post-Deployment Verification

Health Check

# Get URLs
API_URL=$(gcloud run services describe priority-pipeline-api --region=us-central1 --format='value(status.url)')
WEBHOOK_URL=$(gcloud run services describe priority-pipeline-webhook --region=us-central1 --format='value(status.url)')

# Test the health endpoints
curl "$API_URL/health"
curl "$WEBHOOK_URL/health"
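
Right after a deploy the services may take a moment to become ready; a simple poll that fails fast on non-2xx responses (assuming /health returns 2xx once ready):

# Retry the health check up to 10 times
for i in $(seq 1 10); do
  curl -fsS "$API_URL/health" && break
  echo "not ready yet, retrying..."
  sleep 5
done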

Test Coordinator Job

gcloud run jobs execute priority-pipeline-coordinator --region=us-central1
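
To block until the execution finishes and propagate its result, gcloud supports --wait:

# Run the job and wait; the command exits non-zero if the execution fails
gcloud run jobs execute priority-pipeline-coordinator --region=us-central1 --wait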

Check Recent Data

SELECT 
  COUNT(*) as recent_scrapes,
  MAX(requestDate) as last_scrape
FROM `cred-1556636033881.linkedin.LinkedinApiCall`
WHERE requestType = 'BRIGHTDATA_API_PERSON'
  AND requestDate >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR);
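
The same query can be run non-interactively with the bq CLI:

bq query --use_legacy_sql=false '
SELECT COUNT(*) AS recent_scrapes, MAX(requestDate) AS last_scrape
FROM `cred-1556636033881.linkedin.LinkedinApiCall`
WHERE requestType = "BRIGHTDATA_API_PERSON"
  AND requestDate >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)'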

Monitoring

View Logs

# API service
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=priority-pipeline-api" --limit 50

# Webhook service
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=priority-pipeline-webhook" --limit 50

# Coordinator job
gcloud logging read "resource.type=cloud_run_job AND resource.labels.job_name=priority-pipeline-coordinator" --limit 50
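
When logs are noisy, gcloud logging read can be narrowed by recency and trimmed to the interesting fields, e.g.:

# Only the last hour, one compact line per entry
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=priority-pipeline-api" \
  --freshness=1h --limit 50 --format='value(timestamp,severity,textPayload)'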

View Job Executions

gcloud run jobs executions list --job=priority-pipeline-coordinator --region=us-central1 --limit=10

Troubleshooting

Check Service Status

gcloud run services describe priority-pipeline-api --region=us-central1
gcloud run services describe priority-pipeline-webhook --region=us-central1
gcloud run jobs describe priority-pipeline-coordinator --region=us-central1

View Errors

gcloud logging read "resource.labels.service_name=priority-pipeline-api AND severity>=ERROR" --limit 50

Coordinator Job Failures

# View failed execution
gcloud run jobs executions describe EXECUTION_ID --region=us-central1

# Check logs
gcloud logging read "resource.labels.job_name=priority-pipeline-coordinator AND severity>=ERROR" --limit 50
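
If a bad deploy turns out to be the cause, traffic can be rolled back to a known-good revision (REVISION_NAME is a placeholder; list revisions first to find one):

# List revisions, then route all traffic to a known-good one
gcloud run revisions list --service=priority-pipeline-api --region=us-central1
gcloud run services update-traffic priority-pipeline-api --region=us-central1 \
  --to-revisions=REVISION_NAME=100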