
Data Guild

TL;DR

The Data Guild helps teams make better data-driven decisions through education, best practices, and governance.

  • Join: #dev-data on Slack
  • Meetings: Monthly guild meetings and weekly office hours (TBD)
  • Resources: Data Catalog, Spark/Airflow guides, runbooks

What We Do

Area | Examples
Data Discovery | Data Catalog, dataset guides, training sessions, office hours
Best Practices | Spark optimization, Dataproc usage, Airflow DAG patterns, testing
Data Quality | Validation frameworks (Great Expectations), schema standards
Documentation | Architecture docs, access patterns, lineage, runbooks

Guild provides: Documentation, training, standards, patterns, community support

Teams own: Day-to-day pipeline development, maintenance, monitoring


What We Don’t Do

Out of Scope
Infrastructure provisioning, DevOps, on-call
Business metrics definitions
Database administration
Production incident response

Getting Started

New members:

  1. Join #dev-data on Slack
  2. Review the Data Catalog and key resources
  3. Attend a monthly guild meeting

Data consumers:

  1. Check the Data Catalog for available datasets
  2. Ask questions in #dev-data
  3. Attend weekly office hours for in-depth help

Contributors:

  • Add/update documentation
  • Present patterns at guild meetings
  • Submit RFCs for new standards (5-10 business days review in #dev-data)

Current Initiatives (Q1 2026)

Initiative | Status
Data Catalog MVP (top 20 datasets) | 🚧 In Progress
Spark Best Practices | 🚧 In Progress
Airflow Best Practices | 🚧 In Progress

Key Resources

Resource | Status
Data Catalog | 🚧 In Progress
Spark Best Practices | 📋 Planned
Airflow Best Practices | 📋 Planned
Dataproc / ClickHouse / Trino Runbooks | 📋 Planned
Data Quality Guide | 📋 Planned
Data Lake Architecture | 📋 Planned

FAQ

Question | Answer
How do I find available data? | Check Data Catalog or ask in #dev-data
Who handles data access issues? | Ask in #dev-data or contact dataset owner
How do I propose a new standard? | Post RFC in #dev-data (5-10 business days review)
How do I share a cool pattern? | Present at guild meeting or document in Tettra
Who maintains production infrastructure? | Data / Platform teams

Success Metrics

  • Data Discoverability: 80% of production datasets to be documented by Q2 2026
  • Engagement: Guild meeting attendance and #dev-data activity
  • Data Quality: Reduction in data incidents, SLA compliance
  • Adoption: Teams following guild standards
