Data Guild
TL;DR
The Data Guild helps teams make better data-driven decisions through education, best practices, and governance.
- Join: #dev-data on Slack
- Meetings: Monthly guild meetings + weekly office hours (TBD)
- Resources: Data Catalog, Spark/Airflow guides, runbooks
What We Do
| Area | Examples |
|---|---|
| Data Discovery | Data Catalog, dataset guides, training sessions, office hours |
| Best Practices | Spark optimization, Dataproc usage, Airflow DAG patterns, testing |
| Data Quality | Validation frameworks (Great Expectations), schema standards |
| Documentation | Architecture docs, access patterns, lineage, runbooks |
Guild provides: Documentation, training, standards, patterns, community support
Teams own: Day-to-day pipeline development, maintenance, monitoring
What We Don't Do
| Out of Scope |
|---|
| Infrastructure provisioning, DevOps, on-call |
| Business metrics definitions |
| Database administration |
| Production incident response |
Getting Started
New members:
- Join #dev-data on Slack
- Review the Data Catalog and key resources
- Attend a monthly guild meeting
Data consumers:
- Check the Data Catalog for available datasets
- Ask questions in #dev-data
- Attend weekly office hours for in-depth help
Contributors:
- Add/update documentation
- Present patterns at guild meetings
- Submit RFCs for new standards (reviewed in #dev-data over 5-10 business days)
Current Initiatives (Q1 2026)
| Initiative | Status |
|---|---|
| Data Catalog MVP (top 20 datasets) | 🚧 In Progress |
| Spark Best Practices | 🚧 In Progress |
| Airflow Best Practices | 🚧 In Progress |
Key Resources
| Resource | Status |
|---|---|
| Data Catalog | 🚧 In Progress |
| Spark Best Practices | Planned |
| Airflow Best Practices | Planned |
| Dataproc / ClickHouse / Trino Runbooks | Planned |
| Data Quality Guide | Planned |
| Data Lake Architecture | Planned |
FAQ
| Question | Answer |
|---|---|
| How do I find available data? | Check the Data Catalog or ask in #dev-data |
| Who handles data access issues? | Ask in #dev-data or contact the dataset owner |
| How do I propose a new standard? | Post an RFC in #dev-data (reviewed over 5-10 business days) |
| How do I share a cool pattern? | Present it at a guild meeting or document it in Tettra |
| Who maintains production infrastructure? | Data / Platform teams |
Success Metrics
- Data Discoverability: 80% of production datasets to be documented by Q2 2026
- Engagement: Guild meeting attendance and #dev-data activity
- Data Quality: Reduction in data incidents, SLA compliance
- Adoption: Teams following guild standards