Build an Anomaly Detection System for Real-Time Data Monitoring

Design a complete anomaly detection system with algorithm selection, threshold tuning, and false positive reduction.

๐Ÿ“ The Prompt

You are a machine learning engineer specializing in anomaly detection and data quality monitoring. I need you to design and implement an anomaly detection system for the following use case.

**System Context:**
- **Data Source:** [DATA_SOURCE, e.g., server metrics, financial transactions, IoT sensor readings, application logs]
- **Key Metrics to Monitor:** [METRICS_LIST, e.g., CPU utilization, transaction amount, request latency, error rate]
- **Data Volume:** Approximately [VOLUME, e.g., 10,000 events per minute]
- **Latency Requirement:** Anomalies must be detected within [LATENCY, e.g., 5 minutes, real-time, 1 hour]
- **Historical Data Available:** [HISTORY, e.g., 6 months of labeled data, 2 years unlabeled]
- **Labeled Anomaly Examples:** [LABELS, e.g., 'yes: 500 labeled incidents', 'no: fully unsupervised']
- **Current Pain Point:** [PAIN_POINT, e.g., 'too many false alerts', 'missed critical incidents', 'no monitoring exists']

Please design the complete system:

1. **Anomaly Taxonomy:** Classify the types of anomalies relevant to [DATA_SOURCE]: point anomalies, contextual anomalies, and collective anomalies. Provide concrete examples of each for this domain.
2. **Algorithm Selection:** Recommend and compare at least 3 suitable algorithms based on the labeling situation:
   - **Statistical:** Z-score, Grubbs' test, seasonal hybrid ESD (S-H-ESD)
   - **Machine Learning:** Isolation Forest, One-Class SVM, Local Outlier Factor
   - **Deep Learning:** Autoencoders, LSTM-based sequence anomaly detection

   For each, explain computational complexity, interpretability, and suitability for [LATENCY] requirements.
3. **Feature Engineering:** Define the features to extract from raw [DATA_SOURCE] data, including rolling statistics (mean, std, percentiles over multiple windows), rate-of-change features, time-based features, and cross-metric correlation features.
4. **Threshold Tuning Strategy:** Describe how to set and dynamically adjust anomaly thresholds to balance precision vs. recall. Include a method for handling concept drift and evolving baselines.
5. **Alert Severity Classification:** Design a 3-tier severity system (critical, warning, info) with specific criteria for each tier and recommended response actions.
6. **False Positive Reduction:** Propose at least 3 techniques to minimize false positives, such as correlation with other signals, minimum duration filters, suppression windows, and human feedback loops.
7. **Implementation Code:** Provide a Python implementation using [FRAMEWORK, e.g., scikit-learn, PyOD, PyCaret] that processes a sample dataset, trains the detector, and flags anomalies with confidence scores.
8. **Evaluation Framework:** Define how to measure detector performance using precision, recall, F1-score, and time-to-detect. Include a method for backtesting against historical incidents.
9. **Operational Runbook:** Create a brief runbook for the on-call team: what to check when an alert fires, escalation paths, and how to provide feedback to improve the model.

Structure the output with clear sections and include code with inline comments.
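The sketch below is not part of the prompt. It is a minimal illustration of the kind of answer items 2, 3, 7 and 8 ask for, and it makes several assumptions:

- It watches a single metric sampled at regular intervals.
- It uses a trailing-window Z-score as the detector.
- It maps |z| to a 0 to 1 confidence with an arbitrary heuristic.
- It backtests the flags against labeled incidents given as index ranges.

The function names (`rolling_zscores`, `detect`, `evaluate`), the 30-sample window, and the threshold of 3 are hypothetical choices. It uses only the Python standard library, so it runs without scikit-learn or PyOD.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def rolling_zscores(values: Sequence[float], window: int = 30) -> List[float]:
    """Z-score of each point against the `window` points before it.

    The baseline is a trailing window, so a point never contaminates its own
    mean/std. Points without a full window of history score 0.0.
    """
    scores = [0.0] * len(values)
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mean = statistics.fmean(baseline)
        std = statistics.pstdev(baseline)
        # A perfectly flat baseline has std == 0; skip it rather than divide by zero.
        scores[i] = (values[i] - mean) / std if std > 0 else 0.0
    return scores


def detect(values: Sequence[float], window: int = 30,
           threshold: float = 3.0) -> List[Tuple[int, float, float]]:
    """Flag points with |z| >= threshold as (index, z, confidence).

    Confidence is a hypothetical heuristic: |z| scaled so the threshold maps
    to 0.5 and twice the threshold (or more) maps to 1.0.
    """
    flagged = []
    for i, z in enumerate(rolling_zscores(values, window)):
        if abs(z) >= threshold:
            flagged.append((i, z, min(1.0, abs(z) / (2 * threshold))))
    return flagged


def evaluate(flag_idx: Sequence[int],
             incidents: Sequence[Tuple[int, int]]) -> Dict[str, object]:
    """Backtest flags against labeled incidents given as inclusive (start, end) indices.

    Precision is point-level (share of flags inside any incident), recall is
    event-level (share of incidents with at least one flag), and time-to-detect
    is the delay in samples from incident start to its first flag.
    """
    hits = [i for i in flag_idx if any(s <= i <= e for s, e in incidents)]
    precision = len(hits) / len(flag_idx) if flag_idx else 0.0
    delays = []
    for s, e in incidents:
        in_incident = [i for i in flag_idx if s <= i <= e]
        if in_incident:
            delays.append(min(in_incident) - s)
    recall = len(delays) / len(incidents) if incidents else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "time_to_detect": delays}


if __name__ == "__main__":
    # Synthetic metric: a repeating 50..54 pattern, one unlabeled spike at
    # index 120, and a labeled incident (three high readings) at 150-152.
    values = [50.0 + (i % 5) for i in range(200)]
    values[120] = 90.0
    for i in range(150, 153):
        values[i] = 85.0

    flags = detect(values)
    print([i for i, _, _ in flags])  # [120, 150, 151, 152]
    print(evaluate([i for i, _, _ in flags], incidents=[(150, 152)]))
```

In this toy run the spike at index 120 is a real outlier, but it is missing from the incident labels. It therefore counts against precision, which comes out at 0.75. This illustrates why the prompt asks for backtesting against historical incidents rather than trusting raw flag counts.

The interface is deliberately simple: a detector that emits `(index, score, confidence)` tuples, followed by a separate scoring step. In principle, the Z-score could be swapped for a library model such as Isolation Forest without changing the evaluation code.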

💡 Tips for Better Results

Clearly state whether you have labeled anomaly examples; this determines whether supervised, semi-supervised, or unsupervised methods are appropriate. Describe your current false positive rate and tolerance level, as this is often the biggest practical challenge in anomaly detection systems. Include information about expected seasonal patterns and known scheduled events (maintenance windows, batch jobs) to help design suppression rules.
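To show why scheduled events matter for suppression rules, here is a minimal sketch of two of the false positive reduction techniques the prompt names:

- **Minimum-duration filter:** alert only once a run of consecutive anomalous samples reaches a set length.
- **Suppression windows:** ignore flags that fall inside known scheduled events.

The function name `filter_alerts` and its parameters are hypothetical. Timestamps are assumed to be plain integers (for example, minutes since some epoch). Suppression windows are assumed to be inclusive `(start, end)` pairs.

```python
from typing import List, Sequence, Tuple


def filter_alerts(
    flags: Sequence[bool],
    timestamps: Sequence[int],
    min_duration: int = 3,
    suppression_windows: Sequence[Tuple[int, int]] = (),
) -> List[int]:
    """Return the timestamps at which an alert should fire.

    An alert fires once per run of consecutive anomalous samples, at the
    sample where the run first reaches `min_duration`. Samples inside any
    inclusive (start, end) suppression window count as non-anomalous and
    break the current run.
    """
    alerts = []
    run = 0
    for flagged, ts in zip(flags, timestamps):
        suppressed = any(start <= ts <= end for start, end in suppression_windows)
        if flagged and not suppressed:
            run += 1
            if run == min_duration:  # fire exactly once per sustained run
                alerts.append(ts)
        else:
            run = 0
    return alerts


if __name__ == "__main__":
    # One-sample blip at t=2, a sustained run at t=5..8, and a run at
    # t=12..14 that falls inside a scheduled maintenance window (t=10..15).
    flags = [t in {2, 5, 6, 7, 8, 12, 13, 14} for t in range(20)]
    print(filter_alerts(flags, list(range(20)), min_duration=3,
                        suppression_windows=[(10, 15)]))  # [7]
```

Only the sustained run raises an alert. The one-sample blip is dropped by the duration filter, and the run during maintenance is dropped by the suppression window. A longer `min_duration` trades a slower time-to-detect for fewer alerts, which is the same precision vs. recall balance the prompt asks the model to tune.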

🎯 Use Cases

MLOps engineers, SREs, data platform teams, and fraud analysts should use this when building or improving automated monitoring and alerting systems for critical business data.
