Perform a Complete Linear Regression Assumptions Check with Diagnostic Steps

Systematically verify all linear regression assumptions with diagnostic tests, visualizations, remedies, and ready-to-run code.

📝 The Prompt

Act as a statistician and help me systematically check whether my linear regression model meets the core assumptions. Provide diagnostic methods and remedies for each violation.

**Model Details:**
- Dependent variable: [DEPENDENT_VARIABLE]
- Independent variables: [LIST_INDEPENDENT_VARIABLES]
- Sample size: [N]
- Domain context: [DOMAIN_DESCRIPTION]
- Software/language used: [PYTHON_STATSMODELS/R/EXCEL/OTHER]

**Available Information (paste any you have):**
- Model summary output: [PASTE_MODEL_SUMMARY_IF_AVAILABLE]
- Any observed issues: [DESCRIBE_ANY_CONCERNS, e.g., "residual plots look fan-shaped"]

Please walk me through a complete assumptions check covering:

1. **Linearity:** How to test if the relationship between each predictor and the outcome is linear. Provide specific plots to generate and what patterns indicate violations. Suggest transformations if non-linearity is detected.
2. **Independence of Errors:** How to check for autocorrelation using the Durbin-Watson test. Explain the acceptable range and what to do if residuals are correlated (especially relevant if my data has [TIME_COMPONENT/GROUPED_STRUCTURE]).
3. **Homoscedasticity:** How to detect heteroscedasticity visually (residuals vs. fitted plot) and formally (Breusch-Pagan test, White's test). Provide remedies such as WLS, robust standard errors, or variance-stabilizing transformations.
4. **Normality of Residuals:** How to assess using Q-Q plots, Shapiro-Wilk test, and histogram of residuals. Clarify when normality violations matter (small samples) vs. when they can be safely ignored.
5. **Multicollinearity:** How to compute VIF for each predictor, interpret the values, and decide which variables to drop or combine.
6. **Influential Observations:** How to identify high-leverage points and outliers using Cook's distance, leverage values, and DFFITS. Provide decision criteria for removal.
7. **Code Implementation:** Provide complete, runnable code in [SOFTWARE/LANGUAGE] to perform all the above diagnostics.
8. **Decision Summary:** Create a checklist table with columns: Assumption | Test Used | Result | Action Needed.

💡 Tips for Better Results

Paste your actual model summary output so the AI can reference specific coefficients and statistics. Always specify your programming language to get usable code. If your data is time-series, emphasize this: it fundamentally affects the independence assumption.
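To see why the time-series point matters, the sketch below computes the Durbin-Watson statistic by hand, using only the standard library, for independent residuals and for AR(1) residuals. The function name `durbin_watson_stat` and the autocorrelation value `rho = 0.8` are illustrative assumptions. For large samples the statistic is roughly 2(1 − ρ), so independent residuals land near 2 and strongly autocorrelated ones fall well below it.

```python
import random


def durbin_watson_stat(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


random.seed(0)
n = 500

# Independent residuals: no relationship between consecutive errors.
independent = [random.gauss(0, 1) for _ in range(n)]

# AR(1) residuals: each error carries over part of the previous one.
rho = 0.8  # hypothetical autocorrelation, chosen only for illustration
ar1 = [random.gauss(0, 1)]
for _ in range(n - 1):
    ar1.append(rho * ar1[-1] + random.gauss(0, 1))

dw_independent = durbin_watson_stat(independent)
dw_ar1 = durbin_watson_stat(ar1)

print(f"independent residuals: DW = {dw_independent:.2f}")  # expected near 2
print(f"AR(1) residuals:       DW = {dw_ar1:.2f}")  # expected well below 2
```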

🎯 Use Cases

Data analysts, researchers, and students use this after fitting a linear regression to validate their model before drawing conclusions or making predictions.

🔗 Related Prompts

📊 Data & Analytics intermediate

Write Complex SQL Queries

Generate optimized SQL queries for complex analysis with CTEs, JOINs, and performance tips.

📊 Data & Analytics intermediate

Python Data Analysis Script

Generate a complete Python data analysis pipeline with cleaning, visualization, and insights.

📊 Data & Analytics intermediate

Build an RFM Customer Segmentation Model for Targeted Marketing

Create a complete RFM customer segmentation model with scoring logic, code implementation, and marketing strategies.

📊 Data & Analytics advanced

Design a Robust ETL Pipeline Architecture for Your Data Platform

Design a complete ETL pipeline architecture with extraction, transformation, loading strategies, error handling, and governance.

📊 Data & Analytics intermediate

Create a Comprehensive Data Quality Checklist for Your Dataset

Generate a tailored data quality checklist with SQL validation queries, severity levels, and a scoring framework for any dataset.

📊 Data & Analytics advanced

Analyze and Interpret A/B Test Results with Statistical Rigor

Get a complete A/B test analysis with statistical significance, power analysis, validity checks, and a clear ship decision.