Data Pipeline
Local-Sync Events
Mode & Sources
Enhancements & Build
This path sets the pipeline_operator flag: venue categorization runs right after extraction, and the inverted-index,
embedding, and competitor-finalize steps always run afterward. Deep-research disk export and the Phase 2.7
research_priority / account_signals builds are skipped (use research jobs for those). ML inference parquet export
stays off unless you POST auto_ml_intelligence: true without pipeline_operator from the API.
ML Enrichment + Abstracts are forced on server-side, so Activities/Campaigns and abstract-gated venue tagging
always participate.
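A minimal sketch of how the flag rules above could resolve into a run plan. Only the flag names `pipeline_operator` and `auto_ml_intelligence` come from this page; the function `resolve_plan`, the step names, and the returned dict shape are hypothetical. The page describes only the operator path, so the non-operator values for the deep-research and Phase 2.7 outputs are assumptions.

```python
from typing import Any, Dict, List

# Hypothetical step names; the real pipeline's identifiers may differ.
POST_STEPS: List[str] = ["inverted_indexes", "embeddings", "competitor_finalize"]


def resolve_plan(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map request flags to the steps and exports a run would perform (illustrative only)."""
    operator = bool(payload.get("pipeline_operator", False))
    auto_ml = bool(payload.get("auto_ml_intelligence", False))

    steps = ["extract"]
    if operator:
        # Operator path: venue categorization runs right after extraction.
        steps.append("categorize_venues")
    # Indexes, embeddings, and competitor finalize always run afterward.
    steps.extend(POST_STEPS)

    return {
        "steps": steps,
        # Skipped on the operator path (use research jobs instead).
        # Enabling them otherwise is an assumption, not stated on this page.
        "deep_research_export": not operator,
        "phase_2_7_builds": not operator,
        # Only when auto_ml_intelligence is true AND pipeline_operator is not set.
        "ml_parquet_export": auto_ml and not operator,
        # Forced on server-side; any client-supplied value is ignored.
        "ml_enrichment": True,
        "abstracts": True,
    }
```

For example, `resolve_plan({"pipeline_operator": True, "auto_ml_intelligence": True})` still reports `ml_parquet_export` as `False`, because the operator flag takes precedence.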
When categorization finds zero venues queued for LLM batches (using the same junk/title rules as production),
processed_records is still touched, so the “Updated” clock in the Pipeline table reflects that verify pass.
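The zero-queue behavior can be sketched as follows, assuming a hypothetical `processed_records` schema with one timestamp row per stage (stored here in SQLite for illustration). The `is_excluded` callback is a stand-in for the production junk/title rules; `open_db`, `touch`, and `categorize_pass` are hypothetical names.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional

# Hypothetical schema: one row per pipeline stage with its last-updated timestamp.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS processed_records "
    "(stage TEXT PRIMARY KEY, updated_at TEXT NOT NULL)"
)


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def touch(conn: sqlite3.Connection, stage: str, now: str) -> None:
    """Upsert the stage's timestamp so the 'Updated' column advances."""
    conn.execute(
        "INSERT OR REPLACE INTO processed_records (stage, updated_at) VALUES (?, ?)",
        (stage, now),
    )
    conn.commit()


def categorize_pass(
    conn: sqlite3.Connection,
    venues: Iterable[str],
    is_excluded: Callable[[str], bool],
    now: Optional[str] = None,
) -> int:
    """Run one categorize/verify pass; return how many venues were queued for LLM batches."""
    now = now or datetime.now(timezone.utc).isoformat()
    queued = [v for v in venues if not is_excluded(v)]
    if not queued:
        # Nothing to send to the LLM: still record that the verify pass ran.
        touch(conn, "categorize", now)
    # Otherwise LLM batches would be submitted here, and writing their results
    # would update processed_records (omitted from this sketch).
    return len(queued)
```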
Advanced: Individual ETL Operations
Source connectivity
Probes run in this runtime (your laptop or the Cloud Run web service), not in a job pod. Allowlist the egress IP below on Navigator and Redshift; it may differ from the job's NAT IP. Sources: Venues come from Salesforce only, Redshift supplies EF3 abstracts, and Navigator supplies bids/RFPs.
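A plain TCP reachability check is one way to approximate such a probe; because it runs in the current process, it exercises this runtime's egress IP rather than the job's. The hostnames below are hypothetical placeholders, and 5439 is Redshift's default port (your cluster may use another). A successful TCP connect only shows that the network path and allowlist are open, not that credentials work.

```python
import socket
from typing import Dict, Tuple

# Hypothetical endpoints; replace with the real Salesforce, Redshift, and Navigator hosts.
SOURCES: Dict[str, Tuple[str, int]] = {
    "venues (Salesforce)": ("example.my.salesforce.com", 443),
    "abstracts (Redshift EF3)": ("redshift.example.internal", 5439),
    "bids/RFP (Navigator)": ("navigator.example.internal", 443),
}


def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds from this runtime."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures are all OSError subclasses.
        return False


def probe_all(sources: Dict[str, Tuple[str, int]] = SOURCES, timeout: float = 3.0) -> Dict[str, bool]:
    """Probe every configured source and report reachability by name."""
    return {name: probe(host, port, timeout) for name, (host, port) in sources.items()}
```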
Execution History
| Job | Execution | Status | Started | Duration | |
|---|---|---|---|---|---|
Hot-Reload Production
Pull fresh data from GCS into the Cloud Run serving container and reload all caches.
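One common way to reload caches without serving a half-updated state is build-then-swap: load every cache into a fresh mapping, then replace the live reference in one step. This is a sketch of that general pattern, not this service's confirmed implementation. `CacheRegistry` and its loader callables are hypothetical, and the GCS download that would feed the loaders is omitted.

```python
import threading
from typing import Any, Callable, Dict


class CacheRegistry:
    """Hypothetical holder for serving caches that supports an all-or-nothing reload."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]):
        self._loaders = loaders
        self._lock = threading.Lock()
        self._caches: Dict[str, Any] = {}
        self.reload()

    def reload(self) -> None:
        # Build every cache first; if any loader raises, the old caches stay in place.
        fresh = {name: load() for name, load in self._loaders.items()}
        with self._lock:
            self._caches = fresh  # single reference swap

    def get(self, name: str) -> Any:
        with self._lock:
            return self._caches[name]
```

The design choice is that a failed download or parse leaves production serving the previous data instead of a partial mix.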
Current State
Deploy Pipeline
Runs deploy.ps1 locally: test → build data bundle → docker build → push → Cloud Run rollout.
Docker Desktop must be running.
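The deploy stages run strictly in order, and a failure at any stage should stop the rollout. A minimal fail-fast runner illustrating that sequencing is sketched below. The commands in `DEPLOY_STEPS` (including `pytest`, `build_bundle.py`, and the `IMAGE`/`SERVICE` placeholders) are hypothetical stand-ins for whatever deploy.ps1 actually invokes. `docker info` exits non-zero when the Docker daemon is unreachable, which makes it a reasonable preflight for "Docker Desktop must be running."

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]

# Hypothetical commands mirroring the stages of deploy.ps1; the real script's commands may differ.
DEPLOY_STEPS: List[Step] = [
    ("test", ["pytest"]),
    ("build data bundle", [sys.executable, "build_bundle.py"]),
    ("docker build", ["docker", "build", "-t", "IMAGE", "."]),
    ("push", ["docker", "push", "IMAGE"]),
    ("cloud run rollout", ["gcloud", "run", "deploy", "SERVICE", "--image", "IMAGE"]),
]


def docker_ready() -> bool:
    """Return True if `docker info` succeeds, i.e. the Docker daemon is reachable."""
    try:
        return subprocess.run(["docker", "info"], capture_output=True).returncode == 0
    except FileNotFoundError:
        return False


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order, stopping at the first failure; return the names that succeeded."""
    done: List[str] = []
    for name, cmd in steps:
        try:
            code = subprocess.run(list(cmd)).returncode
        except FileNotFoundError:
            code = 127  # command not found
        if code != 0:
            print(f"step '{name}' failed with exit code {code}", file=sys.stderr)
            break
        done.append(name)
    return done
```

Usage would look like `if docker_ready(): run_steps(DEPLOY_STEPS)`.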
Revisions
| Revision | Created | CPU | Memory | Instances |
|---|---|---|---|---|
GCS Operations
Run History
| Job | Type | Status | Started | Duration | Log Lines | |
|---|---|---|---|---|---|---|