Data Architecture
Data Science
Matching
MLOps
QA
Data Engineering
Data Ingestion Automation
- Data Ingestion People, Company - Importing contact/people data from external sources
Coresignal Data Ingestion
- Google Cloud Workflows Console - Seeding Coresignal data is handled by Workflows
- Google Cloud Scheduler Console - Workflows are triggered by cloud scheduler
- To run a workflow outside the schedule - you need to select the job and click Force run it will trigger import from Coresignal datasource into BQ tables
Daily Batch Workflows
The following workflows run daily to ingest Coresignal data into BigQuery tables:
Companies Workflow:
- Daily All Coresignal Companies - Daily batch processing for company data
- Target Table:
AllCoresignalLinkedinCompanies - Schedule: Daily execution
Persons Workflow:
- Daily All Coresignal Persons - Daily batch processing for person data
- Target Table:
AllCoresignalLinkedinPersons - Schedule: Daily execution
Both workflows are configured to run automatically via Google Cloud Scheduler and can be manually triggered through the Google Cloud Workflows console when needed.
Coresignal Daily Metrics
These two tables monitor and analyze changes in LinkedIn data from Coresignal between refreshes. They provide daily aggregated statistics for both LinkedIn person and company data.
Tables:
coresignal.CoresignalLinkedinPersonDailyStatscoresignal.CoresignalLinkedinCompanyDailyStats
How both tables work:
- Each row represents one specific date with aggregated statistics for that day
- The
refresh_datefield shows when data was processed (based onupdated_atfor person data,last_updatedfor company data)
Important: All metrics are calculated ONLY for records that were updated on that specific date, NOT the total table
Key metrics per row:
The following metrics apply to both tables:
Volume metrics
unique_profiles/companies= Number of unique records updated on this datenew_profiles/companies= Records seen for the first time on this dateupdated_profiles/companies= Records that already existed but were updated on this datedeleted_profiles/companies= Records marked as deleted on this date
Data quality metrics (coverage percentages)
Below we have some examples of information you will find, but please check the table for the rest.
- Person data:
headline_coverage_pct,location_coverage_pct,experience_coverage_pct,education_coverage_pct - Company data:
name_coverage_pct,description_coverage_pct,headquarters_country_coverage_pct
Example interpretation:
refresh_date | unique_profiles | new_profiles | updated_profiles | headline_coverage_pct
2024-01-15 | 1000 | 50 | 950 | 85.00
2024-01-16 | 1200 | 30 | 1170 | 88.50
This means:
- Jan 15: 1000 profiles were updated, 50 were new, 950 were updates, 85% had headlines
- Jan 16: 1200 profiles were updated, 30 were new, 1170 were updates, 88.5% had headlines
Usage:
SELECT * FROM coresignal.CoresignalLinkedinPersonDailyStats
WHERE refresh_date >= '2024-01-01'
ORDER BY refresh_date DESC;
SELECT * FROM coresignal.CoresignalLinkedinCompanyDailyStats
WHERE refresh_date >= '2024-01-01'
ORDER BY refresh_date DESC;
Alerts and Monitoring
Recommended Alerts
1. Volume Drop Alert
Trigger: When daily unique profiles/companies drop
WITH daily_comparison AS (
SELECT
refresh_date,
unique_profiles,
LAG(unique_profiles) OVER (ORDER BY refresh_date) as prev_unique_profiles
FROM coresignal.CoresignalLinkedinPersonDailyStats
)
SELECT refresh_date, unique_profiles, prev_unique_profiles,
(unique_profiles - prev_unique_profiles) / prev_unique_profiles * 100 as pct_change
FROM daily_comparison
2. Data Quality Alert
Trigger: When field coverage drops significantly
-- Monitor headline coverage drop
WITH coverage_trend AS (
SELECT
refresh_date,
headline_coverage_pct,
LAG(headline_coverage_pct) OVER (ORDER BY refresh_date) as prev_coverage
FROM coresignal.CoresignalLinkedinPersonDailyStats
)
SELECT refresh_date, headline_coverage_pct, prev_coverage,
headline_coverage_pct - prev_coverage as coverage_change
FROM coverage_trend
Monitoring Queries
Daily Health Check
SELECT
'Person Data' as data_type,
unique_profiles as unique_count,
new_profiles as new_count,
updated_profiles as updated_count,
headline_coverage_pct as coverage_pct_1,
location_coverage_pct as coverage_pct_2
FROM coresignal.CoresignalLinkedinPersonDailyStats
WHERE refresh_date = CURRENT_DATE()
UNION ALL
SELECT
'Company Data' as data_type,
unique_companies as unique_count,
new_companies as new_count,
updated_companies as updated_count,
name_coverage_pct as coverage_pct_1,
description_coverage_pct as coverage_pct_2
FROM coresignal.CoresignalLinkedinCompanyDailyStats
WHERE refresh_date = CURRENT_DATE();