Build a Robust Web Scraper with Error Handling and Data Export

Build a complete web scraper with pagination, error handling, rate limiting, data validation, and flexible export options.

๐Ÿ“ The Prompt

You are an experienced software developer specializing in web scraping and data extraction. Build a complete, well-structured web scraper with the following specifications:

**Target & Data:**
- Target website or type of website: [TARGET_WEBSITE_OR_TYPE, e.g., e-commerce product listings, job board, news aggregator]
- Data fields to extract: [DATA_FIELDS, e.g., title, price, URL, date, description, rating]
- Programming language: [LANGUAGE, e.g., Python, Node.js, Go]
- Output format: [OUTPUT_FORMAT, e.g., CSV, JSON, SQLite database]

**Functional Requirements:**
1. **Page Navigation**: Handle pagination or infinite scroll to scrape across multiple pages (up to [MAX_PAGES] pages).
2. **Data Extraction**: Parse and extract the specified data fields cleanly. Handle missing or malformed fields gracefully with default values or null markers.
3. **Rate Limiting & Politeness**: Implement configurable delays between requests (default [DELAY_SECONDS] seconds). Respect robots.txt guidelines. Rotate User-Agent strings from a predefined list.
4. **Error Handling & Retries**: Implement retry logic with exponential backoff for failed requests (max [MAX_RETRIES] retries). Log all errors with timestamps and URLs.
5. **Data Validation & Cleaning**: Strip extra whitespace, normalize encoding, and validate data types before storing.
6. **Export**: Save results to the specified output format with proper encoding and structure.

**Code Quality Requirements:**
- Use a clear project structure with separation of concerns (config, scraper logic, data models, export).
- Include comprehensive docstrings and inline comments.
- Add a configuration file or CLI arguments for customizable parameters (URL, delay, max pages, output path).
- Include a requirements/dependencies file.

Provide the complete source code, a sample output showing the expected data structure, and brief usage instructions.
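To illustrate requirement 4, the retry-with-exponential-backoff pattern can be sketched in a few lines of stdlib Python. The `fetch_with_retries` helper below is a hypothetical name, not something the prompt mandates; it takes the fetch function as a parameter so any HTTP client (e.g. `requests.get`) can be plugged in:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")


def fetch_with_retries(fetch, url, max_retries=3, base_delay=1.0):
    """Call fetch(url), retrying with exponential backoff on failure.

    `fetch` is any callable that returns a response or raises an
    exception. Delays double on each retry: base_delay, 2*base_delay,
    4*base_delay, ... Errors are logged with the URL (timestamps come
    from the logging format).
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == max_retries:
                log.error("giving up on %s after %d retries: %s",
                          url, max_retries, exc)
                raise
            delay = base_delay * (2 ** attempt)
            log.warning("retry %d for %s in %.1fs (%s)",
                        attempt + 1, url, delay, exc)
            time.sleep(delay)
```

Because the fetch function is injected, this sketch can be unit-tested with a fake that fails a fixed number of times before succeeding, without touching the network.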
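Requirements 5 and 6 (validation, cleaning, and export) can likewise be sketched with the standard library. The `Product` model and the helper names below are illustrative assumptions for an e-commerce-style target, not part of the prompt:

```python
import csv
import unicodedata
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Product:
    """Illustrative data model; fields would match [DATA_FIELDS]."""
    title: str
    price: Optional[float]  # None marks a missing/malformed price
    url: str


def clean_text(raw: str) -> str:
    """Normalize encoding (NFKC) and collapse stray whitespace."""
    return " ".join(unicodedata.normalize("NFKC", raw).split())


def parse_price(raw: str) -> Optional[float]:
    """Return a float price, or None when the field is malformed."""
    digits = raw.replace("$", "").replace(",", "").strip()
    try:
        return float(digits)
    except ValueError:
        return None


def export_csv(rows, path):
    """Write validated rows to CSV with explicit UTF-8 encoding."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price", "url"])
        writer.writeheader()
        for row in rows:
            writer.writerow(asdict(row))
```

Returning `None` for bad fields (rather than raising) keeps a single malformed listing from aborting a long scrape, matching the "default values or null markers" requirement.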

💡 Tips for Better Results

Always check the target website's Terms of Service and robots.txt before scraping to ensure compliance. Provide a real example URL or a detailed description of the HTML structure to help the AI generate more accurate selectors. Test the scraper on a small number of pages first and inspect the output before running a full scrape.
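The robots.txt check mentioned above can be done with Python's stdlib `urllib.robotparser`. This sketch parses rules you have already fetched as text (in a real scraper you would download `<site>/robots.txt` first); the function name and example rules are illustrative:

```python
from urllib import robotparser


def allowed_to_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

Calling this once per URL (or per path prefix) before scraping keeps the crawler compliant with the site's stated rules.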

🎯 Use Cases

Data analysts, researchers, and developers who need to collect structured data from websites for analysis, monitoring, or integration into other systems.

🔗 Related Prompts

💻 Coding beginner

Explain Code Like I'm a Beginner

Get any code explained in plain English with line-by-line breakdowns, analogies, and learning suggestions.

💻 Coding beginner

Debug My Code and Explain the Fix

Get your code debugged with clear explanations of what went wrong and why, plus the corrected version.

💻 Coding intermediate

Write Unit Tests for My Code

Generate thorough unit tests covering edge cases, error handling, and both positive and negative scenarios.

💻 Coding intermediate

Convert Code Between Languages

Convert code between any programming languages while maintaining idiomatic patterns and best practices.

💻 Coding intermediate

Write a REST API Endpoint

Generate production-ready REST API endpoints with validation, error handling, and documentation.

💻 Coding advanced

Create a GitHub Actions CI/CD Workflow for Automated Testing and Deployment

Generate a complete GitHub Actions CI/CD workflow with build, test, deploy, and notification jobs for your project.