Deployment Guide

Overview

The Priority Pipeline is deployed as Google Cloud Run services and jobs, with deployments automated via GitHub Actions.

Repository: credinvest/social_media_scrapers

Deployment Workflow: .github/workflows/deploy-priority-pipeline.yml

Cloud Run Services

Services and Jobs

Name                           Type               Purpose
priority-pipeline-api          Cloud Run Service  Receives Pub/Sub batches, triggers BrightData
priority-pipeline-webhook      Cloud Run Service  Receives scraped data from BrightData
priority-pipeline-coordinator  Cloud Run Job      Monthly profile batch coordination
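
To confirm what is actually deployed, the services and jobs above can be listed with gcloud (using us-central1, the region used throughout this guide):

# List deployed Cloud Run services and jobs
gcloud run services list --region=us-central1
gcloud run jobs list --region=us-central1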

Deployment Architecture

Environments

Environment  Deployment Trigger                                            Auto-Deploy
Staging      Manual workflow dispatch                                      No
Production   Push to main/master (when src/priority_pipeline/** changes)   Yes

Deployment Process

Automatic Deployment (Production)

  • Push to the main or master branch
  • Triggers only when files under src/priority_pipeline/** change (a local check is sketched below)
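
Whether a given push will trigger the production deploy can be predicted by reproducing the workflow's path filter locally with git; a minimal sketch, assuming the previous commit as the comparison point:

# Check whether the latest commit touches the deploy path filter
git diff --name-only HEAD~1 HEAD | grep -q '^src/priority_pipeline/' \
  && echo 'would trigger production deploy' \
  || echo 'no deploy'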

Manual Deployment (Staging or Production)

  1. Go to GitHub: credinvest/social_media_scrapers > Actions
  2. Select "Deploy Priority Pipeline" workflow
  3. Click "Run workflow"
  4. Select environment: staging or production
  5. Click "Run workflow" to confirm (or dispatch from the CLI, as sketched below)
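
The same dispatch can be done from the command line with the GitHub CLI; a sketch, assuming the workflow exposes an environment input matching the UI selector:

# Hypothetical CLI equivalent of the manual dispatch above
gh workflow run "Deploy Priority Pipeline" \
  --repo credinvest/social_media_scrapers \
  -f environment=staging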

Deployment Order

  1. Webhook service deploys first (generates URL)
  2. API service deploys (uses webhook URL)
  3. Coordinator job deploys last (the full sequence is sketched below)
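
A minimal sketch of this ordering with gcloud, assuming prebuilt images (IMAGE_URL and the flags shown are illustrative, not the workflow's actual configuration):

# 1. Deploy the webhook first so its URL exists
gcloud run deploy priority-pipeline-webhook --image=IMAGE_URL --region=us-central1

# 2. Capture the generated URL
WEBHOOK_URL=$(gcloud run services describe priority-pipeline-webhook \
  --region=us-central1 --format='value(status.url)')

# 3. Deploy the API with the webhook URL injected
gcloud run deploy priority-pipeline-api --image=IMAGE_URL --region=us-central1 \
  --set-env-vars "WEBHOOK_URL=${WEBHOOK_URL}"

# 4. Deploy the coordinator job last
gcloud run jobs update priority-pipeline-coordinator --image=IMAGE_URL --region=us-central1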

Environment Variables

API Service

Variable        Description              Value
BRIGHTDATA      BrightData API token     Secret
GCP_PROJECT_ID  Google Cloud project ID  cred-1556636033881
WEBHOOK_URL     Webhook service URL      Auto-generated
ENVIRONMENT     Environment name         staging / production
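
Plain variables and secret-backed variables are set with different flags on Cloud Run; a sketch for the API service (the Secret Manager secret name brightdata-token is an assumption):

# Plain environment variables
gcloud run services update priority-pipeline-api --region=us-central1 \
  --set-env-vars "GCP_PROJECT_ID=cred-1556636033881,ENVIRONMENT=production"

# Secret-backed variable (hypothetical secret name)
gcloud run services update priority-pipeline-api --region=us-central1 \
  --set-secrets "BRIGHTDATA=brightdata-token:latest"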

Webhook Service

Variable        Description              Value
GCP_PROJECT_ID  Google Cloud project ID  cred-1556636033881
ENVIRONMENT     Environment name         staging / production

Coordinator Job

Variable        Description              Value
BRIGHTDATA      BrightData API token     Secret
GCP_PROJECT_ID  Google Cloud project ID  cred-1556636033881
PYTHONPATH      Python path              /app
ENVIRONMENT     Environment name         staging / production
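
Jobs accept the same flags via gcloud run jobs update; for example (secret name again assumed):

gcloud run jobs update priority-pipeline-coordinator --region=us-central1 \
  --set-env-vars "GCP_PROJECT_ID=cred-1556636033881,PYTHONPATH=/app,ENVIRONMENT=production" \
  --set-secrets "BRIGHTDATA=brightdata-token:latest"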

Service Configuration

Service          Max Instances  Concurrency  CPU  Memory  Timeout
API              1000           1            1    2Gi     3600s
Webhook          5000           1            1    2Gi     3600s
Coordinator Job  -              -            2    8Gi     168h
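
These limits map to standard gcloud resource flags; a sketch applying the rows above (jobs use --task-timeout rather than --timeout):

# API service limits
gcloud run services update priority-pipeline-api --region=us-central1 \
  --max-instances=1000 --concurrency=1 --cpu=1 --memory=2Gi --timeout=3600

# Coordinator job limits
gcloud run jobs update priority-pipeline-coordinator --region=us-central1 \
  --cpu=2 --memory=8Gi --task-timeout=168h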

Post-Deployment Verification

Health Check

# Get URLs
API_URL=$(gcloud run services describe priority-pipeline-api --region=us-central1 --format='value(status.url)')
WEBHOOK_URL=$(gcloud run services describe priority-pipeline-webhook --region=us-central1 --format='value(status.url)')

# Test the health endpoints
curl "$API_URL/health"
curl "$WEBHOOK_URL/health"
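
Right after a deploy the services may take a moment to become ready; a simple poll that fails fast on non-2xx responses (assuming /health returns 2xx once ready):

# Retry the health check up to 10 times
for i in $(seq 1 10); do
  curl -fsS "$API_URL/health" && break
  echo "not ready yet, retrying..."
  sleep 5
done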

Test Coordinator Job

gcloud run jobs execute priority-pipeline-coordinator --region=us-central1
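
To block until the execution finishes and propagate its result, gcloud supports --wait:

# Run the job and wait; the command exits non-zero if the execution fails
gcloud run jobs execute priority-pipeline-coordinator --region=us-central1 --wait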

Check Recent Data

SELECT 
  COUNT(*) as recent_scrapes,
  MAX(requestDate) as last_scrape
FROM `cred-1556636033881.linkedin.LinkedinApiCall`
WHERE requestType = 'BRIGHTDATA_API_PERSON'
  AND requestDate >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR);
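
The same query can be run non-interactively with the bq CLI:

bq query --use_legacy_sql=false '
SELECT COUNT(*) AS recent_scrapes, MAX(requestDate) AS last_scrape
FROM `cred-1556636033881.linkedin.LinkedinApiCall`
WHERE requestType = "BRIGHTDATA_API_PERSON"
  AND requestDate >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)'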

Monitoring

View Logs

# API service
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=priority-pipeline-api" --limit 50

# Webhook service
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=priority-pipeline-webhook" --limit 50

# Coordinator job
gcloud logging read "resource.type=cloud_run_job AND resource.labels.job_name=priority-pipeline-coordinator" --limit 50
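
When logs are noisy, gcloud logging read can be narrowed by recency and trimmed to the interesting fields, e.g.:

# Only the last hour, one compact line per entry
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=priority-pipeline-api" \
  --freshness=1h --limit 50 --format='value(timestamp,severity,textPayload)'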

View Job Executions

gcloud run jobs executions list --job=priority-pipeline-coordinator --region=us-central1 --limit=10

Troubleshooting

Check Service Status

gcloud run services describe priority-pipeline-api --region=us-central1
gcloud run services describe priority-pipeline-webhook --region=us-central1
gcloud run jobs describe priority-pipeline-coordinator --region=us-central1

View Errors

gcloud logging read "resource.labels.service_name=priority-pipeline-api AND severity>=ERROR" --limit 50

Coordinator Job Failures

# View failed execution
gcloud run jobs executions describe EXECUTION_ID --region=us-central1

# Check logs
gcloud logging read "resource.labels.job_name=priority-pipeline-coordinator AND severity>=ERROR" --limit 50
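
If a bad deploy turns out to be the cause, traffic can be rolled back to a known-good revision (REVISION_NAME is a placeholder; list revisions first to find one):

# List revisions, then route all traffic to a known-good one
gcloud run revisions list --service=priority-pipeline-api --region=us-central1
gcloud run services update-traffic priority-pipeline-api --region=us-central1 \
  --to-revisions=REVISION_NAME=100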