Design a Scalable ETL Pipeline Architecture for Your Data Platform

Design a production-ready ETL pipeline architecture with tool recommendations, error handling, and scalability strategies.

๐Ÿ“ The Prompt

Act as a senior data engineer with 10+ years of experience designing enterprise-grade ETL (Extract, Transform, Load) pipelines. I need you to design a comprehensive ETL pipeline for the following scenario:

**Data Context:**
- Source systems: [LIST_OF_SOURCE_SYSTEMS, e.g., PostgreSQL, REST APIs, CSV files]
- Target destination: [TARGET_DATA_WAREHOUSE, e.g., Snowflake, BigQuery, Redshift]
- Data volume: approximately [DATA_VOLUME, e.g., 5 million records/day]
- Update frequency: [FREQUENCY, e.g., real-time, hourly, daily batch]
- Primary business use case: [USE_CASE, e.g., customer analytics, financial reporting]

**Please provide the following in your design:**
1. **Architecture Overview:** Draw out (in text/diagram form) the end-to-end pipeline architecture, including ingestion, staging, transformation, and loading layers.
2. **Technology Stack Recommendation:** Recommend specific tools and frameworks (e.g., Apache Airflow, dbt, Spark, Fivetran) for each layer. Justify each choice based on the data volume and frequency requirements.
3. **Data Transformation Logic:** Outline the key transformation steps including data cleaning, deduplication, schema mapping, and business logic application. Provide pseudocode or SQL snippets for at least 2 critical transformations.
4. **Error Handling & Monitoring:** Design a robust error handling strategy including retry logic, dead-letter queues, data quality checks, and alerting mechanisms.
5. **Scalability & Performance:** Address how the pipeline handles traffic spikes of [SPIKE_MULTIPLIER, e.g., 3x] normal volume, partitioning strategies, and incremental loading vs. full refresh trade-offs.
6. **Data Governance:** Include data lineage tracking, schema evolution handling, and access control recommendations.
7. **Deployment & CI/CD:** Outline how to version control, test, and deploy pipeline changes safely.

Format the output with clear section headers, bullet points, and include a summary comparison table of tool options where applicable.
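To give a sense of what item 4 of the prompt asks for, the retry logic and dead-letter queue it names can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the prompt and not the design the model will return: `load_with_retry`, `load_one`, and the default attempt and delay values are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Any, Callable, Iterable


def load_with_retry(
    records: Iterable[Any],
    load_one: Callable[[Any], None],   # hypothetical loader; raises on failure
    max_attempts: int = 3,             # assumed retry budget per record
    base_delay: float = 0.1,           # assumed base for exponential backoff (seconds)
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, list[tuple[Any, str]]]:
    """Load records one by one, retrying failures with exponential backoff.

    Records that still fail after max_attempts are collected in a
    dead-letter list instead of aborting the whole batch.
    """
    loaded = 0
    dead_letters: list[tuple[Any, str]] = []
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                load_one(record)
                loaded += 1
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Out of retries: park the record with the error for later review.
                    dead_letters.append((record, repr(exc)))
                else:
                    # Back off before the next attempt: 1x, 2x, 4x ... base_delay.
                    sleep(base_delay * 2 ** (attempt - 1))
    return loaded, dead_letters
```

In a real pipeline the dead-letter list would usually be a durable queue or table, and the orchestrator recommended in item 2 may already provide retries; the sketch only shows the control flow.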

💡 Tips for Better Results

Be as specific as possible about your source systems and data formats: the more detail you provide, the more tailored the architecture recommendations will be. Include any existing infrastructure constraints (e.g., cloud provider, budget, team skill set) in the source systems section to get realistic tool suggestions. Run the prompt iteratively: first get the high-level design, then ask follow-up questions about specific layers you want to deep-dive into.
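When running the prompt iteratively, it can help to fill the bracketed placeholders programmatically so that every run uses the same values. The following is a minimal sketch, assuming placeholders keep the `[NAME, e.g., ...]` shape used in the prompt; the `fill_prompt` helper and the sample values are hypothetical.

```python
import re

# Placeholder names copied from the prompt; the values are illustrative only.
values = {
    "LIST_OF_SOURCE_SYSTEMS": "PostgreSQL, REST APIs, CSV files",
    "TARGET_DATA_WAREHOUSE": "Snowflake",
    "DATA_VOLUME": "5 million records/day",
    "FREQUENCY": "hourly",
    "USE_CASE": "customer analytics",
    "SPIKE_MULTIPLIER": "3x",
}

# Matches "[NAME]" or "[NAME, e.g., some examples]" where NAME is upper snake case.
PLACEHOLDER = re.compile(r"\[([A-Z_]+)(?:, e\.g\.,[^\]]*)?\]")


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder with its value; fail loudly if one is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Two lines of the prompt, used here as a short stand-in for the full text.
template = (
    "- Target destination: [TARGET_DATA_WAREHOUSE, e.g., Snowflake, BigQuery, Redshift]\n"
    "- Update frequency: [FREQUENCY, e.g., real-time, hourly, daily batch]"
)
print(fill_prompt(template, values))
```

Raising on a missing value, rather than leaving the placeholder in place, keeps a half-filled prompt from being sent by accident.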

🎯 Use Cases

Data engineers and architects designing new ETL pipelines or modernizing legacy data infrastructure should use this when planning a data platform build or migration.

🔗 Related Prompts

📊 Data & Analytics · intermediate

Write Complex SQL Queries

Generate optimized SQL queries for complex analysis with CTEs, JOINs, and performance tips.

📊 Data & Analytics · intermediate

Python Data Analysis Script

Generate a complete Python data analysis pipeline with cleaning, visualization, and insights.

📊 Data & Analytics · advanced

Design a Robust ETL Pipeline Architecture for Your Data Platform

Design a complete ETL pipeline architecture with extraction, transformation, loading strategies, error handling, and governance.

📊 Data & Analytics · intermediate

Create a Comprehensive Data Quality Checklist for Your Dataset

Generate a tailored data quality checklist with SQL validation queries, severity levels, and a scoring framework for any dataset.

📊 Data & Analytics · advanced

Analyze and Interpret A/B Test Results with Statistical Rigor

Get a complete A/B test analysis with statistical significance, power analysis, validity checks, and a clear ship decision.

📊 Data & Analytics · intermediate

Analyze A/B Test Results and Determine Statistical Significance

Get a complete A/B test analysis with statistical significance, confidence intervals, power analysis, and ship decisions.