# Build a Flexible CSV Parser with Streaming, Validation, and Error Reporting
Create a streaming CSV parser with schema validation, error reporting, and data transformation for large file processing.
## The Prompt
Create a production-ready CSV parser module in [PROGRAMMING_LANGUAGE] that can handle large CSV files efficiently with streaming, data validation, and comprehensive error reporting. The parser should be designed for processing [DATA_DESCRIPTION] data (e.g., customer records, product catalogs, financial transactions) and meet these specifications:
1. **Core Parser Configuration**:
- Accept configuration options: delimiter (default: comma), quote character, escape character, encoding ([ENCODING], e.g., UTF-8), whether the first row is a header, and custom line terminators.
- Support both file path input and readable stream/buffer input.
- Use streaming/chunked processing to handle files of [EXPECTED_SIZE] (e.g., 500MB+) without loading the entire file into memory.
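Since the prompt is language-agnostic, here is a minimal sketch of the streaming requirement in Python; the function name and defaults are illustrative, not part of the prompt itself:

```python
import csv

def iter_csv_rows(source, delimiter=",", quotechar='"', encoding="utf-8"):
    """Lazily yield parsed rows from a file path or an open text stream.

    csv.reader pulls one line at a time, so memory stays bounded even for
    very large files -- the essence of the streaming requirement above.
    """
    if isinstance(source, str):      # a plain string is treated as a file path
        stream = open(source, newline="", encoding=encoding)
        owns_stream = True
    else:                            # otherwise assume an already-open stream
        stream = source
        owns_stream = False
    try:
        yield from csv.reader(stream, delimiter=delimiter, quotechar=quotechar)
    finally:
        if owns_stream:
            stream.close()
```

Because the function is a generator, a caller can process a 500MB file row by row without ever materializing the whole file in memory.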
2. **Column Mapping & Schema Definition**:
- Define a schema for the expected CSV structure with these columns: [COLUMN_DEFINITIONS] (e.g., "email: string, required, unique | age: integer, min=0, max=150 | signup_date: date, format=YYYY-MM-DD | status: enum(active, inactive, pending)").
- Map CSV column headers (or indices) to internal field names, supporting aliases for common header variations (e.g., 'Email Address' → 'email', 'E-mail' → 'email').
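To make the alias idea concrete, here is one possible Python shape for such a schema and header mapping; the field names and rule keys are assumptions for illustration, not a fixed API:

```python
# Illustrative schema corresponding to the [COLUMN_DEFINITIONS] placeholder.
SCHEMA = {
    "email": {"type": str, "required": True,
              "aliases": ["Email Address", "E-mail", "email"]},
    "age": {"type": int, "required": False, "min": 0, "max": 150,
            "aliases": ["age", "Age"]},
}

def map_headers(header_row, schema):
    """Map raw CSV headers to internal field names via the alias lists,
    ignoring case and surrounding whitespace."""
    alias_index = {
        alias.strip().lower(): field
        for field, rules in schema.items()
        for alias in rules.get("aliases", [field])
    }
    return {i: alias_index[h.strip().lower()]
            for i, h in enumerate(header_row)
            if h.strip().lower() in alias_index}
```

Unrecognized headers are simply dropped from the mapping here; a real parser might instead emit a warning through its error reporter.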
3. **Row-Level Validation**:
- Validate each row against the schema: check required fields, data types, format patterns (regex), value ranges, and enum values.
- Support custom validation functions for complex business rules (e.g., "if [BUSINESS_RULE]").
- Collect all validation errors per row (don't stop at the first error).
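The collect-all-errors behavior can be sketched in Python as follows; the schema format and rule names are invented for illustration:

```python
def validate_row(row, schema):
    """Check one mapped row against every rule and return ALL violations
    as (field, message) pairs, rather than stopping at the first failure."""
    errors = []
    for field, rules in schema.items():
        value = row.get(field, "").strip()
        if not value:
            if rules.get("required"):
                errors.append((field, "required field is missing"))
            continue
        if rules.get("type") is int:
            if not value.lstrip("-").isdigit():
                errors.append((field, f"expected integer, got {value!r}"))
                continue
            number = int(value)
            if "min" in rules and number < rules["min"]:
                errors.append((field, f"{number} is below minimum {rules['min']}"))
            if "max" in rules and number > rules["max"]:
                errors.append((field, f"{number} is above maximum {rules['max']}"))
        if "enum" in rules and value not in rules["enum"]:
            errors.append((field, f"{value!r} is not an allowed value"))
    return errors

# Illustrative schema mirroring the example column definitions above.
example_schema = {
    "email": {"required": True},
    "age": {"type": int, "min": 0, "max": 150},
    "status": {"enum": {"active", "inactive", "pending"}},
}
```

A row with a missing email, an out-of-range age, and an unknown status yields three errors at once, which is what makes the eventual error report useful.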
4. **Error Handling & Reporting**:
- Track and categorize errors: parsing errors (malformed CSV), validation errors (bad data), and warnings (e.g., trailing whitespace auto-trimmed).
- Generate a structured error report including: row number, column name, provided value, error type, and human-readable message.
- Support configurable behavior: skip invalid rows, abort after [MAX_ERRORS] errors, or collect all errors.
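One way to model the structured report and the abort-after-N behavior in Python; the record fields follow the spec above, but the class API itself is an assumption:

```python
from dataclasses import dataclass

@dataclass
class ParseError:
    """One entry in the structured error report."""
    row: int
    column: str
    value: str
    error_type: str   # "parsing" | "validation" | "warning"
    message: str

class ErrorCollector:
    """Collects errors; raises once max_errors is exceeded, sketching the
    'abort after [MAX_ERRORS] errors' option. Pass max_errors=None to
    collect everything."""
    def __init__(self, max_errors=None):
        self.errors = []
        self.max_errors = max_errors

    def add(self, err):
        self.errors.append(err)
        if self.max_errors is not None and len(self.errors) > self.max_errors:
            raise RuntimeError(f"aborted after {self.max_errors} errors")
```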
5. **Output & Transformation**:
- Transform valid rows into [OUTPUT_FORMAT] (e.g., array of objects, JSON, database-ready insert statements, or another CSV).
- Apply optional data transformations: trim whitespace, normalize case, parse dates, and convert number formats.
- Provide a summary: total rows processed, valid rows, skipped rows, and error count by category.
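A sketch of the optional transformations and the end-of-run summary; the summary keys mirror the bullet above, while the function signatures are illustrative:

```python
def transform_value(value, *, trim=True, lower=False):
    """Apply the optional per-value cleanups (whitespace trim, case
    normalization) described above. Other transforms would slot in here."""
    if trim:
        value = value.strip()
    if lower:
        value = value.lower()
    return value

def summarize(total_rows, valid_rows, errors_by_category):
    """Build the run summary: totals plus an error count per category."""
    return {
        "total_rows": total_rows,
        "valid_rows": valid_rows,
        "skipped_rows": total_rows - valid_rows,
        "errors": dict(errors_by_category),
    }
```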
6. **API Design**: Expose both a simple one-call function `parseCSV(source, schema, options)` and an event-driven/streaming interface with callbacks for `onRow`, `onError`, `onComplete`.
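The two API styles can share one implementation, with the simple call built on the streaming one. A Python sketch (schema and options omitted for brevity; Python-style callback names stand in for the `onRow`/`onError`/`onComplete` spelling in the spec):

```python
import csv

def parse_csv_stream(source, on_row=None, on_error=None, on_complete=None):
    """Event-driven interface: invoke callbacks as rows stream through."""
    valid = 0
    for line_no, row in enumerate(csv.reader(source), start=1):
        if any(cell.strip() for cell in row):
            valid += 1
            if on_row:
                on_row(line_no, row)
        elif on_error:
            on_error(line_no, "empty row")
    if on_complete:
        on_complete(valid)
    return valid

def parse_csv(source):
    """One-call convenience wrapper built on top of the streaming
    interface, so both API styles share a single implementation."""
    rows = []
    parse_csv_stream(source, on_row=lambda _n, r: rows.append(r))
    return rows
```

Layering the simple function over the streaming one keeps the memory-bounded path as the only real code path, which is the usual design choice for this kind of dual API.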
Provide complete code with thorough comments, type definitions (if applicable), and a usage example that parses a sample CSV matching the [DATA_DESCRIPTION] schema. Include unit test cases for: valid data, missing required fields, type mismatches, malformed rows, and large file streaming.
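As a taste of the requested test cases, here is a framework-free sketch using a tiny inline validator; everything in it (the `check_row` helper, its field names) is invented purely for illustration:

```python
def check_row(row):
    """Toy stand-in for the real row validator, just for the tests below."""
    errors = []
    if not row.get("email", "").strip():
        errors.append("missing required field: email")
    age = row.get("age", "")
    if age and not age.isdigit():
        errors.append(f"type mismatch: age={age!r}")
    return errors

# valid data
assert check_row({"email": "a@b.com", "age": "30"}) == []
# missing required field
assert "missing required field: email" in check_row({"age": "30"})
# type mismatch
assert any("type mismatch" in e for e in check_row({"email": "a@b.com", "age": "old"}))
```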
## Tips for Better Results
- Provide your actual column definitions and a few sample rows so the AI generates a parser that's immediately usable with your real data.
- Mention your expected file sizes: if files are under 10MB, a simpler non-streaming approach may be cleaner and sufficient.
- Ask for integration with your database ORM as a follow-up so parsed rows can be directly batch-inserted into your database.
## Use Cases
Data engineers and backend developers who need to import, validate, and process CSV files from external sources such as client uploads, third-party exports, or batch data migrations.
## Related Prompts
- **Explain Code Like I'm a Beginner** (Coding, beginner): Get any code explained in plain English with line-by-line breakdowns, analogies, and learning suggestions.
- **Debug My Code and Explain the Fix** (Coding, beginner): Get your code debugged with clear explanations of what went wrong and why, plus the corrected version.
- **Write Unit Tests for My Code** (Coding, intermediate): Generate thorough unit tests covering edge cases, error handling, and both positive and negative scenarios.
- **Convert Code Between Languages** (Coding, intermediate): Convert code between any programming languages while maintaining idiomatic patterns and best practices.
- **Write a REST API Endpoint** (Coding, intermediate): Generate production-ready REST API endpoints with validation, error handling, and documentation.
- **Refactor Code for Better Performance** (Coding, advanced): Get your code refactored for better performance with Big O analysis and design pattern suggestions.