Design a Robust ETL Pipeline Architecture for Your Data Warehouse
Design a scalable ETL pipeline architecture with extraction strategies, transformations, error handling, and orchestration plans.
The Prompt
You are a senior data engineer with 10+ years of experience designing scalable ETL (Extract, Transform, Load) pipelines. Help me design a comprehensive ETL pipeline for the following scenario:
**Data Context:**
- Source systems: [LIST_OF_SOURCE_SYSTEMS, e.g., PostgreSQL, Salesforce API, CSV files from SFTP]
- Target destination: [TARGET_DATA_WAREHOUSE, e.g., Snowflake, BigQuery, Redshift]
- Data volume: approximately [DATA_VOLUME, e.g., 5 million rows/day]
- Update frequency: [FREQUENCY, e.g., hourly, daily, real-time]
- Primary business use case: [USE_CASE, e.g., marketing analytics, financial reporting]
**Please provide the following in your design:**
1. **Architecture Overview**: Recommend specific tools and technologies for each ETL stage (extraction, transformation, loading) and justify your choices based on the data volume and frequency requirements.
2. **Data Extraction Strategy**: Define the extraction method for each source system (full load vs. incremental load), including how changes will be tracked (e.g., change data capture (CDC), timestamp columns).
3. **Transformation Layer**: Outline the key transformation steps including data cleansing rules, deduplication logic, schema mapping, and any staging table structures needed. Provide example SQL or pseudocode for the most complex transformation.
4. **Loading Strategy**: Specify the loading pattern (upsert, append, truncate-and-reload) and explain partitioning or clustering strategies for optimal query performance.
5. **Error Handling & Monitoring**: Design a robust error-handling framework including retry logic, dead-letter queues, data quality checks, alerting mechanisms, and logging standards.
6. **Orchestration & Scheduling**: Recommend an orchestration tool and provide a DAG (Directed Acyclic Graph) structure showing task dependencies.
7. **Data Quality Gates**: Define at least 5 specific data quality checks (row count validation, null checks, referential integrity, etc.) that should run at each pipeline stage.
Format the output with clear headers and text-described diagrams, and include a summary table of recommended tools with estimated complexity ratings.
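Example Sketches
The prompt above asks the model to produce a design; the sketches below illustrate, in hedged form, the kind of code a good response might include for items 2 through 7. All table names, column names, file paths, and tool choices in these sketches are illustrative assumptions, not requirements of the prompt.
For item 2, a minimal Python sketch of timestamp-based incremental extraction using a persisted watermark. The `orders` table, its `updated_at` column, and the local state file are assumed for illustration:
```python
# Watermark-based incremental extraction (illustrative; table and
# column names are assumptions, not part of the prompt).
import json
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path("extract_state.json")  # assumed watermark location

def load_watermark() -> str:
    """Return the last successfully extracted timestamp (ISO 8601)."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_updated_at"]
    return "1970-01-01T00:00:00+00:00"  # first run falls back to a full load

def save_watermark(ts: str) -> None:
    STATE_FILE.write_text(json.dumps({"last_updated_at": ts}))

def build_extract_query(watermark: str) -> str:
    # Pull only rows changed since the last run; this relies on the
    # source table having a trustworthy updated_at column.
    return (
        "SELECT * FROM orders "
        f"WHERE updated_at > '{watermark}' "
        "ORDER BY updated_at"
    )

if __name__ == "__main__":
    print(build_extract_query(load_watermark()))
    # Advance the watermark only after the downstream load succeeds:
    save_watermark(datetime.now(timezone.utc).isoformat())
```
Note the ordering: advance the watermark only after a confirmed load, since advancing it on extraction alone can silently drop rows when a later stage fails.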
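For item 3's deduplication logic, one common pattern is a windowed `ROW_NUMBER()` that keeps the newest version of each business key. Here it is held in a Python string; the staging table `stg_orders` and the key `order_id` are assumptions:
```python
# Deduplication via ROW_NUMBER(): keep the most recent row per
# business key. Table and column names are illustrative.
DEDUP_SQL = """
CREATE OR REPLACE TABLE stg_orders_deduped AS
SELECT *
FROM (
    SELECT
        s.*,
        ROW_NUMBER() OVER (
            PARTITION BY order_id        -- business key
            ORDER BY updated_at DESC     -- newest version wins
        ) AS rn
    FROM stg_orders s
) ranked
WHERE rn = 1;  -- the helper rn column can be dropped in a later step
"""

if __name__ == "__main__":
    print(DEDUP_SQL)
```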
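For item 4's upsert pattern, a warehouse `MERGE` keyed on the business key that updates only when the source row is newer. The schema and column list are assumptions, and the exact `MERGE` dialect varies by warehouse:
```python
# Snowflake/BigQuery-style MERGE upsert (illustrative schema).
MERGE_SQL = """
MERGE INTO analytics.orders AS tgt
USING stg_orders_deduped AS src
    ON tgt.order_id = src.order_id
WHEN MATCHED AND src.updated_at > tgt.updated_at THEN UPDATE SET
    status     = src.status,
    amount     = src.amount,
    updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, status, amount, updated_at)
    VALUES (src.order_id, src.status, src.amount, src.updated_at);
"""
```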
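For item 5, a sketch of retry-with-backoff that dead-letters a record after the final failed attempt. The local JSONL file is a stand-in for a real dead-letter queue (e.g., SQS, Kafka, or an error table):
```python
# Retry with exponential backoff; exhausted records go to a
# dead-letter sink (a local JSONL file here, as a stand-in).
import json
import time
from typing import Any, Callable

DEAD_LETTER_FILE = "dead_letters.jsonl"  # assumed sink

def run_with_retry(
    task: Callable[[dict], Any],
    record: dict,
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
) -> Any:
    """Run task(record); dead-letter the record if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(record)
        except Exception as exc:  # production code should catch narrower errors
            if attempt == max_attempts:
                with open(DEAD_LETTER_FILE, "a") as f:
                    f.write(json.dumps({"record": record, "error": str(exc)}) + "\n")
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```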
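For item 6, a sketch of the task dependencies as an Airflow 2.x DAG, assuming Airflow is the chosen orchestrator (the prompt leaves the tool open). The task callables are placeholders:
```python
# Hourly ETL DAG: parallel extracts fan in to transform -> quality -> load.
# Assumes Airflow 2.x; task bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def _placeholder(**_: object) -> None:
    pass  # real extract/transform/load logic goes here

with DAG(
    dag_id="etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",  # match the FREQUENCY placeholder
    catchup=False,
) as dag:
    extract_db = PythonOperator(task_id="extract_postgres", python_callable=_placeholder)
    extract_api = PythonOperator(task_id="extract_salesforce", python_callable=_placeholder)
    transform = PythonOperator(task_id="transform_and_dedupe", python_callable=_placeholder)
    quality = PythonOperator(task_id="data_quality_gates", python_callable=_placeholder)
    load = PythonOperator(task_id="load_warehouse", python_callable=_placeholder)

    # Extractions run in parallel, then fan in to the shared stages.
    [extract_db, extract_api] >> transform >> quality >> load
```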
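For item 7, the quality gates can be expressed as SQL probes that should return no rows (the row-count gate inverts this: it returns a row only when the staging table is empty). The `run_query` callable and all table names are assumptions, and the exact SQL may need adapting per warehouse:
```python
# Data quality gates as zero-row SQL probes (illustrative names).
QUALITY_CHECKS = {
    "row_count_nonzero":
        "SELECT 1 WHERE (SELECT COUNT(*) FROM stg_orders) = 0",
    "no_null_keys":
        "SELECT order_id FROM stg_orders WHERE order_id IS NULL",
    "no_duplicate_keys":
        "SELECT order_id FROM stg_orders GROUP BY order_id HAVING COUNT(*) > 1",
    "referential_integrity":
        "SELECT s.customer_id FROM stg_orders s "
        "LEFT JOIN dim_customers c ON s.customer_id = c.customer_id "
        "WHERE c.customer_id IS NULL",
    "amount_non_negative":
        "SELECT order_id FROM stg_orders WHERE amount < 0",
}

def failed_checks(run_query) -> list[str]:
    """run_query is an assumed callable that executes SQL against the
    warehouse and returns result rows; any returned row means failure."""
    return [name for name, sql in QUALITY_CHECKS.items() if run_query(sql)]
```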
Tips for Better Results
Be as specific as possible about your source systems and data formats; the more detail you provide, the more tailored the pipeline design will be.
Include any existing infrastructure or tool constraints (e.g., 'we already use Airflow' or 'must stay within AWS ecosystem') to get realistic recommendations.
Follow up by asking the AI to generate actual code templates for the most critical pipeline components.
Use Cases
Data engineers and architects who need to design or refactor ETL pipelines for new data warehouse implementations or migrations. Ideal during the planning phase of a data infrastructure project.