Build a Robust Web Scraper with Error Handling and Data Export
Build a complete web scraper with pagination, error handling, rate limiting, data validation, and flexible export options.
The Prompt
You are an experienced software developer specializing in web scraping and data extraction. Build a complete, well-structured web scraper with the following specifications:
**Target & Data:**
- Target website or type of website: [TARGET_WEBSITE_OR_TYPE, e.g., e-commerce product listings, job board, news aggregator]
- Data fields to extract: [DATA_FIELDS, e.g., title, price, URL, date, description, rating]
- Programming language: [LANGUAGE, e.g., Python, Node.js, Go]
- Output format: [OUTPUT_FORMAT, e.g., CSV, JSON, SQLite database]
**Functional Requirements:**
1. **Page Navigation**: Handle pagination or infinite scroll to scrape across multiple pages (up to [MAX_PAGES] pages).
2. **Data Extraction**: Parse and extract the specified data fields cleanly. Handle missing or malformed fields gracefully with default values or null markers.
3. **Rate Limiting & Politeness**: Implement configurable delays between requests (default [DELAY_SECONDS] seconds). Respect robots.txt guidelines. Rotate User-Agent strings from a predefined list.
4. **Error Handling & Retries**: Implement retry logic with exponential backoff for failed requests (max [MAX_RETRIES] retries). Log all errors with timestamps and URLs.
5. **Data Validation & Cleaning**: Strip extra whitespace, normalize encoding, and validate data types before storing.
6. **Export**: Save results to the specified output format with proper encoding and structure.
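To make requirements 3 and 4 concrete, here is a minimal Python sketch of retry logic with exponential backoff. The function names (`backoff_delays`, `fetch_with_retry`) and the callable-`fetch` design are illustrative assumptions, not part of any particular library; a real scraper would wrap something like `requests.get` and add logging of the URL and timestamp on each failure.

```python
import time
import random


def backoff_delays(max_retries, base_delay=1.0, jitter=0.0):
    """Exponential backoff schedule: base_delay * 2**attempt per retry.

    `jitter` adds up to that many seconds of random noise so many
    clients do not retry in lockstep (0 keeps it deterministic).
    """
    return [base_delay * (2 ** attempt) + random.uniform(0, jitter)
            for attempt in range(max_retries)]


def fetch_with_retry(fetch, url, max_retries=3, base_delay=1.0):
    """Call fetch(url) until it succeeds, sleeping between attempts.

    `fetch` is any callable that raises on failure -- e.g. a thin
    wrapper around requests.get that calls raise_for_status().
    """
    last_error = None
    # One initial attempt plus max_retries retries; None marks "no more sleeps".
    for delay in backoff_delays(max_retries, base_delay) + [None]:
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc  # a real scraper would log exc, url, timestamp here
            if delay is None:
                break
            time.sleep(delay)
    raise last_error
```

Keeping the schedule in a separate pure function makes the backoff easy to unit-test without touching the network.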
**Code Quality Requirements:**
- Use clear project structure with separation of concerns (config, scraper logic, data models, export).
- Include comprehensive docstrings and inline comments.
- Add a configuration file or CLI arguments for customizable parameters (URL, delay, max pages, output path).
- Include a requirements/dependencies file.
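One way the CLI requirement above might look in Python, using the standard-library `argparse` module. The flag names (`--delay`, `--max-pages`, `--output`) and defaults are illustrative placeholders for the bracketed parameters in the prompt:

```python
import argparse


def build_parser():
    """CLI covering the customizable parameters the prompt asks for."""
    parser = argparse.ArgumentParser(description="Configurable web scraper")
    parser.add_argument("url", help="start URL to scrape")
    parser.add_argument("--delay", type=float, default=2.0,
                        help="seconds to wait between requests")
    parser.add_argument("--max-pages", type=int, default=10,
                        help="maximum number of pages to visit")
    parser.add_argument("--output", default="results.csv",
                        help="path of the exported data file")
    return parser


# Example invocation with explicit arguments instead of sys.argv:
args = build_parser().parse_args(["https://example.com", "--max-pages", "5"])
```

A config file (e.g. TOML or YAML) can layer under these flags, with CLI arguments taking precedence.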
Provide the complete source code, a sample output showing expected data structure, and brief usage instructions.
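As a rough sketch of the expected data-cleaning and export stage (requirements 5 and 6), the snippet below strips whitespace, marks empty fields as `None`, and writes UTF-8-friendly CSV via the standard-library `csv` module. The record shape and field names are hypothetical examples:

```python
import csv
import io


def clean_record(raw):
    """Strip whitespace and mark missing/blank fields with None."""
    return {k: (v.strip() if isinstance(v, str) and v.strip() else None)
            for k, v in raw.items()}


def export_csv(records, fieldnames, fh):
    """Write cleaned records to CSV with a header row.

    `restval` fills any field absent from a record with an empty cell.
    """
    writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for rec in records:
        writer.writerow(clean_record(rec))


# Demo with an in-memory buffer; a real run would open(path, "w", encoding="utf-8").
buf = io.StringIO()
export_csv([{"title": "  Widget ", "price": "9.99"}], ["title", "price"], buf)
```

Swapping `export_csv` for a JSON or SQLite writer only requires changing this one module if export is kept behind a small interface, which is the separation-of-concerns point above.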
Tips for Better Results
Always check the target website's Terms of Service and robots.txt before scraping to ensure compliance.
Provide a real example URL or a detailed description of the HTML structure to help the AI generate more accurate selectors.
Test the scraper on a small number of pages first and inspect the output before running a full scrape.
Use Cases
Data analysts, researchers, and developers who need to collect structured data from websites for analysis, monitoring, or integration into other systems.