Schema Explorer

Tip: Reference a Parquet file directly with FROM 'core/venues.parquet' or with FROM read_parquet('core/venues.parquet').

DataOS Site Capabilities

Complete registry of pages, modules, services, data assets, and AI tools.

Pages & Views (28)

| Page | Route | Purpose |
| --- | --- | --- |
| Home | /home | Landing page with portfolio overview and key metrics |
| Revenue Analysis | /intelligence | Billable revenue, portfolio analytics, and ML-driven intelligence |
| IT Category Trends | /event-intelligence | IT category procurement trends by jurisdiction and topic |
| Event Operations | /event-operations | Operational dashboard for the event team: lead delivery, venue status, gaps |
| Event Tearsheet | /event-tearsheet | One-page event intelligence brief with sponsor mix, lead profiles, topics |
| Sales Library | /sales-library | Central repository for tearsheets, briefs, emails, and market research |
| Account Brief | /account-brief | LLM-generated strategic account analysis with pipeline, gaps, recommendations |
| Competitor Network | /competitor-network | 3D neural map of 33K+ accounts: semantic clusters, graduation pathways |
| Data Explorer | /explorer | Interactive data browser with filters and visualizations |
| AI Chat | /chat | Conversational AI with 34 purpose-built tools for data analysis |
| Visualization | /visualization | Chart builder for custom data visualizations |
| Lead Scoring | /lead-scoring | ML-powered lead quality scoring and prioritization |
| Master Pipeline | /pipeline | ETL pipeline control panel: run, monitor, schedule data builds |
| Machine Learning | /intelligence/system | ML pipeline dashboard: model status, training, feature importance |
| Award Collection | /awards | Government contract award browser with entity-resolved vendor matching |
| Entity Resolution | /entity-resolution | Fuzzy vendor-to-account matching with 7-signal scoring |
| Schema Explorer | /schema-explorer | Dimension catalog, filter tests, Parquet data preview |
| Memory Profiler | /memory-profiler | Runtime memory usage analysis for service optimization |
| PowerBI Config | /reports | Generated report management with scheduling and delivery |
| API Keys | /api-keys | API key provisioning for external integrations |
| Cost Tracking | /costs | LLM API cost monitoring: per-user, per-service, per-session |
| README | /readme | System documentation and architecture guide |
| ML README | /ml-readme | Machine learning pipeline documentation |
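Entity resolution scores each vendor-to-account candidate on several signals and combines them into one match score. A minimal sketch of that weighted-signal idea, assuming only two illustrative signals (normalized-name similarity from Python's difflib and address-token Jaccard overlap). The weights, the suffix list, the 0.8 acceptance threshold, and all function names are hypothetical; the actual seven signals and their weights are not documented in this registry.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# Hypothetical weights for two illustrative signals; the production matcher
# uses seven signals whose weights are not documented here.
WEIGHTS = {"name": 0.7, "address": 0.3}
# Hypothetical legal suffixes ignored when comparing company names.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd"}


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with punctuation dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize_name(name: str) -> str:
    """Strip legal suffixes so 'Acme Corp.' and 'ACME Corporation' compare equal."""
    return " ".join(t for t in tokens(name) if t not in SUFFIXES)


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def address_similarity(a: str, b: str) -> float:
    """Jaccard overlap of address tokens, in [0, 1]."""
    ta, tb = set(tokens(a)), set(tokens(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_score(vendor: dict, account: dict) -> float:
    """Weighted sum of the per-signal scores."""
    signals = {
        "name": name_similarity(vendor["name"], account["name"]),
        "address": address_similarity(vendor["address"], account["address"]),
    }
    return sum(WEIGHTS[key] * value for key, value in signals.items())


def best_match(vendor: dict, accounts: list[dict], threshold: float = 0.8):
    """Return the ID of the best-scoring account, or None if it falls below threshold."""
    if not accounts:
        return None
    score, account_id = max((match_score(vendor, acct), acct["id"]) for acct in accounts)
    return account_id if score >= threshold else None
```

For example, a vendor named "Acme Corp." at "100 Main St Austin TX" scores 0.9 against an account "ACME Corporation" at "100 Main Street Austin TX" (name 1.0, address 4/6), so it clears the threshold. The threshold keeps weak best-guesses unmatched instead of forcing a link.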

AI Chat Tools (34)

| # | Tool | Purpose |
| --- | --- | --- |
| 1 | discover_filters | Get available filter values for BI, awards, leads, venues |
| 2 | advanced_query | Multi-domain filtered queries with boolean groups |
| 3 | get_account_detail | Core account info: name, team, industry, spend, venues, leads |
| 4 | get_business_intel | BI profile: company summary, products, SLED use cases, tags |
| 5 | get_account_awards | Government contract awards matched to an account |
| 6 | compare_accounts | Side-by-side comparison of 2+ accounts |
| 7 | find_opportunities | Gap analysis: categories/topics an account hasn't used |
| 8 | get_lead_demographics | Lead demographic breakdowns by state, function, role |
| 9 | get_account_topics | Topic tag distribution for an account's venues |
| 10 | search_accounts | Fuzzy name search across 162K accounts |
| 11 | get_venues | Venue listing with category, tag, keyword filters |
| 12 | get_platform_stats | Global platform metrics: totals, averages, distributions |
| 13 | get_trend_data | Year-over-year trends for venues, leads, spend |
| 14 | get_topic_sponsor_stats | Topic-level sponsor counts and penetration rates |
| 15 | account_set_operation | Set math (intersect/subtract/union) on account filter groups |
| 16 | aggregate | Group-by aggregations: top states, categories, teams |
| 17 | query_leads | Lead-level queries with demographic filters |
| 18 | get_award_analysis | Multi-account award analysis with grouping |
| 19 | get_attribution_analysis | Marketing-to-award attribution with influence windows |
| 20 | get_venue_detail | Single venue deep-dive: leads, demographics, topic breakdown |
| 21 | get_account_dossier | Comprehensive account dossier: CRM + BI + ML + pipeline |
| 22 | get_account_predictions | ML predictions: churn, propensity, revenue forecast, segment |
| 23 | get_event_series_intelligence | Event series analysis: sponsor mix, gaps, session inventory |
| 24 | build_event_target_report | AI-scored target accounts for a specific event series |
| 25 | generate_prospect_list | Multi-signal tiered prospect list with sector + geo filters |
| 26 | semantic_search_accounts | Find accounts by capability description via 33K BI embeddings |
| 27 | find_similar_accounts | Cosine-similarity lookalike accounts from BI embeddings |
| 28 | recommend_events_for_account | Semantic event/session matching for an account's profile |
| 29 | generate_account_brief | LLM strategic account analysis: pipeline, gaps, growth plays |
| 30 | get_procurement_trends | ML Model 6 procurement forecast by jurisdiction + category |
| 31 | get_deep_research | Retrieve cached deep research intelligence (company or jurisdiction) |
| 32 | start_deep_research | Trigger background deep research via OpenAI web browsing |
| 33 | explore_competitor_network | Ego network, cluster membership, edge explanations ("who competes with whom") |
| 34 | find_graduation_candidates | Accounts shaped like whales but not yet activated ("who should we grow") |
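Tool 15, account_set_operation, combines account filter groups with set math. A minimal sketch, assuming each filter group has already been resolved to a collection of account IDs and that a single operation is folded left to right across the groups; the Python signature shown is hypothetical, not the tool's actual interface.

```python
from __future__ import annotations

from typing import Iterable

VALID_OPS = {"union", "intersect", "subtract"}


def account_set_operation(op: str, groups: list[Iterable[str]]) -> set[str]:
    """Fold account-ID groups left to right with one set operation.

    Hypothetical signature for illustration: each group is assumed to be an
    already-resolved collection of account IDs.
    """
    if op not in VALID_OPS:
        raise ValueError(f"unknown operation: {op!r}")
    if not groups:
        return set()
    result = set(groups[0])
    for group in groups[1:]:
        ids = set(group)
        if op == "union":
            result |= ids
        elif op == "intersect":
            result &= ids
        else:  # subtract: drop IDs that appear in any later group
            result -= ids
    return result
```

Mixed expressions can be built by chaining calls, for example intersecting two groups and then subtracting a third group from that result.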

ETL Pipeline Modules

  • accounts — Salesforce account sync + field normalization
  • venues — Venue/lead data pull with category + tag assignment
  • events — Event series and session inventory builder
  • awards — Government contract collection (Navigator bids/awards)
  • entity_resolution — 7-signal fuzzy vendor-to-account matching
  • build_embeddings — BI + award narrative embeddings (1,536-dim)
  • build_indexes — 45 inverted indexes for O(1) filter lookups
  • competitor_network — 3D graph: all-pairs similarity, Louvain clustering, graduation scoring
  • ml_pipeline — 8 ML models (churn, propensity, revenue, segmentation, procurement, etc.)
  • reports — Scheduled report generation and delivery
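The build_indexes module precomputes inverted indexes so that filter combinations resolve without scanning rows. A minimal sketch of the idea, assuming each index maps a (field, value) pair to the set of matching record IDs, with AND semantics across fields and OR semantics within one field's list of values. The field names and the build_index/lookup helpers are hypothetical. Each key lookup is an average O(1) hash lookup; combining the matches costs time proportional to the sizes of the sets involved.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(records: list[dict], fields: list[str]) -> dict[tuple[str, object], set[str]]:
    """Map each (field, value) pair to the set of record IDs carrying that value."""
    index: dict[tuple[str, object], set[str]] = defaultdict(set)
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is not None:
                index[(field, value)].add(record["id"])
    return dict(index)


def lookup(index: dict[tuple[str, object], set[str]], filters: dict[str, list]) -> set[str]:
    """Resolve filters: OR within one field's values, AND across fields.

    An empty filter dict returns an empty set in this sketch.
    """
    result: set[str] | None = None
    for field, values in filters.items():
        matches: set[str] = set()
        for value in values:
            matches |= index.get((field, value), set())  # average O(1) hash lookup
        result = matches if result is None else result & matches
    return result if result is not None else set()
```

For instance, with accounts tagged by state and category, lookup(index, {"state": ["TX"], "category": ["Cybersecurity"]}) returns only the Texas cybersecurity accounts without touching any other records.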

ML Pipeline Models (8)

  • M1 — Churn Prediction: XGBoost binary classifier for 12-month churn risk
  • M2 — Revenue Forecast: Multi-horizon revenue prediction per account
  • M3 — Account Segmentation: 6-dimension behavioral archetypes (subscription, usage, pipeline, growth, engagement, tenure)
  • M4 — Propensity-to-Buy: Product family purchase probability per account
  • M5 — Lead Scoring: Lead quality scoring from demographics + engagement signals
  • M6 — Procurement Forecast: Jurisdiction × IT-category demand prediction from solicitation time series
  • M7 — Product Recommendation: Cross-sell/upsell recommendation engine
  • M8 — Engagement Scoring: Behavioral engagement scoring from subscription usage data

Core Data Assets

  • ~162K accounts (~4K active sponsors, ~158K prospects)
  • ~46K venues (events, webinars, papers, newsletters)
  • ~1.3M leads (government contacts from venue sponsorship)
  • ~211K awards/bids (government contracts from Navigator)
  • ~270M state vendor payments (procurement spending records)
  • 45 inverted indexes for instant filter resolution
  • 33K BI embeddings (1,536-dim company profile vectors)
  • 18K award embeddings (solicitation narrative vectors)
  • 1.36M competitor edges (5-signal weighted graph)
  • 18 semantic clusters (Louvain community detection)
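find_similar_accounts ranks lookalike accounts by cosine similarity over the BI embeddings. A pure-Python sketch of that ranking, using toy 3-dimensional vectors in place of the 1,536-dimensional profiles; the function names are hypothetical, and at 33K accounts a real implementation would more plausibly use vectorized math (for example NumPy) or an approximate nearest-neighbor index.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(target_id: str, embeddings: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Top-k lookalikes by cosine similarity, excluding the target itself.

    Ties are broken by account ID so the ranking is deterministic.
    """
    target = embeddings[target_id]
    scored = [
        (account_id, cosine(target, vector))
        for account_id, vector in embeddings.items()
        if account_id != target_id
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

Cosine similarity ignores vector magnitude, so two company profiles that emphasize the same capabilities rank as similar even if one description is much longer than the other.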

Backend Services

  • HierarchyService — Core data layer: accounts, venues, leads, filters, facets
  • InstantFilterEngine — O(1) inverted index lookups for filter combinations
  • AttributionService — Marketing-to-award temporal attribution (548-day window)
  • SemanticEngine — Embedding search + cosine similarity + lookalike accounts
  • CompetitorNetworkService — Graph queries: ego networks, clusters, graduation scoring
  • ShardQueryEngine — Client-isolated data access (per-account Parquet shards)
  • DossierBuilder — Comprehensive account dossier assembly
  • AccountBriefService — LLM-powered strategic account brief generation
  • DeepResearchService — Background web research via OpenAI deep research models
  • CostTracker — Per-user, per-session LLM API cost monitoring
  • EntityResolution — 7-signal fuzzy matching (name, TF-IDF, embedding, address, etc.)
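AttributionService links marketing touches to later government awards within a 548-day window. A minimal sketch, assuming a touch counts only if it falls on or before the award date and the award lands within 548 days of that touch (both bounds inclusive); the record fields (account_id, date, id) and the attribute_awards function are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, timedelta

# 548-day window from the registry; inclusive bounds are an assumption.
ATTRIBUTION_WINDOW = timedelta(days=548)


def attribute_awards(
    touches: list[dict], awards: list[dict], window: timedelta = ATTRIBUTION_WINDOW
) -> list[dict]:
    """For each award, count the same account's touches that precede it within the window."""
    touch_dates: dict[str, list[date]] = defaultdict(list)
    for touch in touches:
        touch_dates[touch["account_id"]].append(touch["date"])

    results = []
    for award in awards:
        awarded = award["date"]
        influencing = [
            d
            for d in touch_dates.get(award["account_id"], [])
            if d <= awarded <= d + window  # touch on/before award, award within window
        ]
        results.append({
            "award_id": award["id"],
            "influencing_touches": len(influencing),
            "attributed": bool(influencing),
        })
    return results
```

With a touch on 2023-01-01, an award on 2024-06-01 (517 days later) is attributed, while one on 2024-08-01 (578 days later) falls outside the window.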

API Blueprints (21)

  • public_api
  • admin_api
  • hierarchy_api
  • awards_api
  • chat_api
  • lead_scoring_api
  • ops_api
  • reports_api
  • etl_api
  • ml_pipeline_api
  • agency_resolution_api
  • vendor_payment_er_api
  • ei_api (event intelligence)
  • account_brief_api
  • ji_api (jurisdiction intelligence)
  • revenue_dna_api
  • research_api
  • data_explorer_api
  • eo_api (event operations)
  • vault_api (tearsheet publishing)
  • competitor_network_api

Drift gate

Advisory report from scripts/gen_asset_catalog.py. Re-run the script (or any full ETL) to refresh.

Request

Queries are sent as POST requests to /api/v2/query. See the request schema and declared joins.
