Perform a Complete Linear Regression Assumptions Check with Diagnostic Steps
Systematically verify all linear regression assumptions with diagnostic tests, visualizations, remedies, and ready-to-run code.
The Prompt
Act as a statistician and help me systematically check whether my linear regression model meets the core assumptions. Provide diagnostic methods and remedies for each violation.
**Model Details:**
- Dependent variable: [DEPENDENT_VARIABLE]
- Independent variables: [LIST_INDEPENDENT_VARIABLES]
- Sample size: [N]
- Domain context: [DOMAIN_DESCRIPTION]
- Software/language used: [PYTHON_STATSMODELS/R/EXCEL/OTHER]
**Available Information (paste any you have):**
- Model summary output: [PASTE_MODEL_SUMMARY_IF_AVAILABLE]
- Any observed issues: [DESCRIBE_ANY_CONCERNS, e.g., "residual plots look fan-shaped"]
Please walk me through a complete assumptions check covering:
1. **Linearity:** How to test if the relationship between each predictor and the outcome is linear. Provide specific plots to generate and what patterns indicate violations. Suggest transformations if non-linearity is detected.
2. **Independence of Errors:** How to check for autocorrelation using the Durbin-Watson test. Explain the acceptable range (values near 2 suggest no first-order autocorrelation) and what to do if residuals are correlated (especially relevant if my data has [TIME_COMPONENT/GROUPED_STRUCTURE]).
3. **Homoscedasticity:** How to detect heteroscedasticity visually (residuals vs. fitted plot) and formally (Breusch-Pagan test, White's test). Provide remedies such as WLS, robust standard errors, or variance-stabilizing transformations.
4. **Normality of Residuals:** How to assess using Q-Q plots, the Shapiro-Wilk test, and a histogram of residuals. Clarify when normality violations matter (small samples) vs. when they are usually less of a concern (large samples).
5. **Multicollinearity:** How to compute VIF for each predictor, interpret the values, and decide which variables to drop or combine.
6. **Influential Observations:** How to identify high-leverage points and outliers using Cook's distance, leverage values, and DFFITS. Provide decision criteria for removal.
7. **Code Implementation:** Provide complete, runnable code in [SOFTWARE/LANGUAGE] to perform all the above diagnostics.
8. **Decision Summary:** Create a checklist table with columns: Assumption | Test Used | Result | Action Needed.
Tips for Better Results
Paste your actual model summary output so the AI can reference specific coefficients and statistics. Always specify your programming language to get usable code. If your data is a time series, say so explicitly, because it fundamentally affects the independence assumption.
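To see why the time-series tip matters, here is a minimal sketch of the Durbin-Watson statistic in plain Python. It is an illustration only: the simulated series, the sample size, and the AR(1) coefficient `rho = 0.8` are assumptions made for the example, and in practice a library routine such as `statsmodels.stats.stattools.durbin_watson` would normally be used on real model residuals.

```python
import random


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of
    squared residuals. The statistic lies between 0 and 4."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


random.seed(0)
n = 500  # hypothetical sample size

# Independent noise standing in for well-behaved residuals.
independent = [random.gauss(0, 1) for _ in range(n)]

# AR(1) noise (rho is an assumed value) standing in for residuals
# from a model fitted to time-ordered data.
rho = 0.8
ar1 = [random.gauss(0, 1)]
for _ in range(n - 1):
    ar1.append(rho * ar1[-1] + random.gauss(0, 1))

dw_independent = durbin_watson(independent)
dw_ar1 = durbin_watson(ar1)

print(f"independent residuals: DW = {dw_independent:.2f}")
print(f"AR(1) residuals:       DW = {dw_ar1:.2f}")
```

Values near 2 are consistent with no first-order autocorrelation, while values well below 2 point to positive autocorrelation, which is the typical pattern in time-ordered data.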
Use Cases
Data analysts, researchers, and students use this after fitting a linear regression to validate their model before drawing conclusions or making predictions.
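As an illustration of the kind of validation step the prompt asks for under influential observations, the sketch below computes leverage and Cook's distance by hand for a simple one-predictor regression. The data are made up, the function name `simple_ols_influence` is hypothetical, and the cutoffs (`4 / n` for Cook's distance, `2 * p / n` for leverage) are common rules of thumb rather than fixed standards.

```python
def simple_ols_influence(x, y):
    """Leverage and Cook's distance for a one-predictor OLS fit
    with an intercept (so p = 2 estimated parameters)."""
    n = len(x)
    p = 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)

    # Diagonal of the hat matrix for simple regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance combines residual size and leverage.
    cooks = [
        (e ** 2 / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(resid, leverage)
    ]
    return leverage, cooks


# Made-up data: ten points close to y = 2x, plus one point far from
# the others in x that does not follow the trend.
x = list(range(1, 11)) + [25]
y = [2 * xi + (0.5 if xi % 2 == 0 else -0.5) for xi in range(1, 11)] + [20]

leverage, cooks = simple_ols_influence(x, y)
n, p = len(x), 2

# Rule-of-thumb cutoffs (conventions, not universal standards).
flagged = [i for i in range(n) if cooks[i] > 4 / n]
high_leverage = [i for i in range(n) if leverage[i] > 2 * p / n]

for i in range(n):
    print(f"obs {i}: leverage = {leverage[i]:.3f}, Cook's D = {cooks[i]:.3f}")
print("flagged by Cook's distance:", flagged)
print("flagged by leverage:", high_leverage)
```

A flagged observation is a prompt to investigate (data entry error, different population, genuine extreme value), not an automatic reason to delete it.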