Create a Comprehensive Data Quality Checklist for Your Dataset

Generate a tailored data quality checklist with SQL validation queries, severity levels, and a scoring framework for any dataset.

๐Ÿ“ The Prompt

You are a data quality specialist and governance expert. I need you to generate a thorough, actionable data quality checklist tailored to my specific dataset and business context.

**Dataset Details:**
- Dataset name/description: [DATASET_NAME_AND_DESCRIPTION, e.g., customer transactions table with purchase history]
- Number of records (approx.): [RECORD_COUNT, e.g., 2 million rows]
- Key columns/fields: [LIST_KEY_COLUMNS, e.g., customer_id, transaction_date, amount, product_sku, payment_method]
- Data source: [DATA_SOURCE, e.g., production MySQL database, third-party API]
- Business purpose: [HOW_DATA_IS_USED, e.g., revenue forecasting, customer segmentation]

**Generate the checklist organized into these six dimensions of data quality:**

1. **Completeness**: Identify which fields should never be null, acceptable null thresholds for optional fields, and checks for missing records or gaps in time-series data.
2. **Accuracy**: Define validation rules for each key field (e.g., range checks, format validation, regex patterns), cross-reference checks against trusted sources, and outlier detection criteria.
3. **Consistency**: Specify checks for referential integrity, cross-field logical consistency (e.g., end_date > start_date), standardization rules for categorical values, and consistency across related datasets.
4. **Timeliness**: Define expected data freshness SLAs, latency thresholds, checks for stale or delayed records, and monitoring for data arrival patterns.
5. **Uniqueness**: Specify primary key uniqueness checks, duplicate detection rules (exact and fuzzy matching criteria), and deduplication strategies.
6. **Validity**: Define domain-specific business rules, enumerated value checks, and schema conformance validations.
For each check item, provide:
- The specific check description
- A sample SQL query or pseudocode to implement it
- Severity level (Critical / Warning / Info)
- Recommended remediation action

Finally, suggest a scoring framework to calculate an overall data quality score from 0-100 based on weighted dimensions.
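To illustrate the kind of output this prompt produces, here is a minimal sketch of a few checks run with Python's built-in sqlite3 module. The table, columns, and allowed payment methods are hypothetical assumptions for the demo, not part of any real schema; real checks would target your own warehouse in your own SQL dialect.

```python
import sqlite3

# Hypothetical transactions table seeded with deliberate quality violations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        transaction_id INTEGER,
        customer_id    INTEGER,
        amount         REAL,
        payment_method TEXT
    );
    INSERT INTO transactions VALUES
        (1, 101, 49.99, 'card'),
        (2, 102, -5.00, 'card'),     -- accuracy violation: negative amount
        (2, 102, -5.00, 'card'),     -- uniqueness violation: duplicate id
        (3, NULL, 20.00, 'crypto');  -- completeness + validity violations
""")

# Each query returns the number of rows failing the check (0 = pass).
checks = {
    # Completeness (Critical): customer_id should never be null.
    "null_customer_id":
        "SELECT COUNT(*) FROM transactions WHERE customer_id IS NULL",
    # Accuracy (Critical): amounts must be positive.
    "nonpositive_amount":
        "SELECT COUNT(*) FROM transactions WHERE amount <= 0",
    # Uniqueness (Critical): transaction_id must be unique.
    "duplicate_transaction_id": """
        SELECT COUNT(*) FROM (
            SELECT transaction_id FROM transactions
            GROUP BY transaction_id HAVING COUNT(*) > 1
        )""",
    # Validity (Warning): payment_method must be an enumerated value.
    "invalid_payment_method": """
        SELECT COUNT(*) FROM transactions
        WHERE payment_method NOT IN ('card', 'cash', 'transfer')""",
}

failures = {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}
print(failures)
```

Each check maps directly to a checklist item: the description lives in the comment, the query implements it, and the severity and remediation action would accompany it in the generated checklist.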

💡 Tips for Better Results

- List your most business-critical columns explicitly so the checklist prioritizes validations that matter most to downstream consumers.
- Mention any known data issues or pain points (e.g., duplicate records, timezone mismatches) so the checklist addresses your real-world problems.
- Specify your SQL dialect (e.g., PostgreSQL, BigQuery SQL, T-SQL) in the placeholder details to get copy-paste-ready queries.
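The scoring framework the prompt asks for can be as simple as a weighted average of per-dimension pass rates. A minimal sketch, where the weights and example pass rates are illustrative assumptions you would tune to your own business priorities:

```python
# Weighted data quality score: each dimension contributes its pass rate
# (fraction of checks passed, 0.0-1.0) scaled by a business-defined weight.
WEIGHTS = {  # illustrative weights; must sum to 1.0
    "completeness": 0.25,
    "accuracy":     0.25,
    "consistency":  0.15,
    "timeliness":   0.15,
    "uniqueness":   0.10,
    "validity":     0.10,
}

def quality_score(pass_rates: dict[str, float]) -> float:
    """Return an overall 0-100 score from per-dimension pass rates."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(100 * sum(WEIGHTS[d] * pass_rates.get(d, 0.0) for d in WEIGHTS), 1)

# Example: strong completeness and accuracy, weaker uniqueness.
print(quality_score({
    "completeness": 0.98, "accuracy": 0.95, "consistency": 0.90,
    "timeliness": 1.00, "uniqueness": 0.80, "validity": 0.92,
}))
```

Weighting Critical checks more heavily than Warning or Info checks within each dimension is a common refinement, so a single Critical failure drags the score down visibly.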

🎯 Use Cases

Data analysts, analytics engineers, and data stewards should use this when establishing data quality monitoring for a new or existing dataset, or when onboarding a new data source into their warehouse.

🔗 Related Prompts

📊 Data & Analytics intermediate

Write Complex SQL Queries

Generate optimized SQL queries for complex analysis with CTEs, JOINs, and performance tips.

📊 Data & Analytics intermediate

Python Data Analysis Script

Generate a complete Python data analysis pipeline with cleaning, visualization, and insights.

📊 Data & Analytics intermediate

Build an RFM Customer Segmentation Model for Targeted Marketing

Create a complete RFM customer segmentation model with scoring logic, code implementation, and marketing strategies.

📊 Data & Analytics advanced

Design a Robust ETL Pipeline Architecture for Your Data Platform

Design a complete ETL pipeline architecture with extraction, transformation, loading strategies, error handling, and governance.

📊 Data & Analytics advanced

Analyze and Interpret A/B Test Results with Statistical Rigor

Get a complete A/B test analysis with statistical significance, power analysis, validity checks, and a clear ship decision.

📊 Data & Analytics advanced

Design a Scalable ETL Pipeline Architecture for Your Data Platform

Design a production-ready ETL pipeline architecture with tool recommendations, error handling, and scalability strategies.