In our previous post, we discussed metrics for evaluating classification models. Now, let's turn our attention to regression models, where the goal is to predict a continuous numerical value (e.g., predicting house prices, forecasting stock values, or estimating temperature).
Evaluating regression models involves measuring how close the model's predictions are to the actual true values. Here are some of the most common metrics used for this purpose.
Let y_i be the actual true value for the i-th observation, ŷ_i (y-hat) be the model's predicted value for the i-th observation, and n be the total number of observations.
1. Mean Absolute Error (MAE)
- What it is: The average of the absolute differences between the predicted values and the actual values.
- Formula:
MAE = (1/n) * Σ |y_i - ŷ_i|
(sum over all n observations)
- Interpretation: MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It gives you an idea of how far off your predictions are on average, in the original units of your target variable.
- Lower MAE is better (0 is a perfect score).
- Pros:
- Easy to understand and interpret (it's in the same units as the target variable).
- Less sensitive to outliers compared to MSE because it doesn't square the errors.
- Cons:
- It doesn't penalize large errors as much as MSE.
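To make this concrete, here is a minimal NumPy sketch of MAE; the house-price numbers are invented purely for illustration:

```python
import numpy as np

# Hypothetical actual vs. predicted house prices, in thousands of dollars.
y_true = np.array([200.0, 150.0, 320.0, 275.0, 410.0])
y_pred = np.array([210.0, 140.0, 300.0, 280.0, 430.0])

# MAE = (1/n) * Σ |y_i - ŷ_i|
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 13.0 -> on average, predictions are off by $13k
```

scikit-learn's `mean_absolute_error` computes the same quantity if you prefer a library call.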
2. Mean Squared Error (MSE)
- What it is: The average of the squared differences between the predicted values and the actual values.
- Formula:
MSE = (1/n) * Σ (y_i - ŷ_i)²
- Interpretation: MSE measures the average of the squared errors; because the errors are squared, large mistakes contribute disproportionately to the score.
- Lower MSE is better (0 is a perfect score).
- Pros:
- Penalizes larger errors more heavily than smaller errors due to the squaring. This can be useful if large errors are particularly undesirable.
- It's a differentiable function, which makes it easier to use in optimization algorithms.
- Cons:
- The units are the square of the original target variable's units (e.g., if predicting price in dollars, MSE is in dollars-squared), which can make it harder to interpret directly.
- More sensitive to outliers than MAE because squaring large errors makes them even larger. Sensitivity to outliers can sometimes be an indicator of issues like overfitting.
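Using the same invented numbers as a sketch, MSE is the squared-error analogue of MAE:

```python
import numpy as np

# Hypothetical actual vs. predicted house prices, in thousands of dollars.
y_true = np.array([200.0, 150.0, 320.0, 275.0, 410.0])
y_pred = np.array([210.0, 140.0, 300.0, 280.0, 430.0])

# MSE = (1/n) * Σ (y_i - ŷ_i)²
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 205.0 -> note the units are (thousand dollars) squared
```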
3. Root Mean Squared Error (RMSE)
- What it is: The square root of the Mean Squared Error (MSE).
- Formula:
RMSE = √MSE = √[(1/n) * Σ (y_i - ŷ_i)²]
- Interpretation: RMSE is one of the most popular metrics for regression tasks. It represents a typical error magnitude in the same units as the target, with extra weight given to large errors.
- Lower RMSE is better (0 is a perfect score).
- Pros:
- It's in the same units as the original target variable (like MAE), making it easier to interpret than MSE.
- It still penalizes larger errors more than MAE due to the underlying squaring in MSE.
- Cons:
- Still sensitive to outliers: taking the square root brings the value back to the original scale, but large errors remain disproportionately weighted, just as in MSE.
MAE vs. RMSE: If you want to know the average error in the original units and treat all errors equally, MAE is a good choice. If you want to penalize large errors more significantly and still have interpretable units, RMSE is often preferred. RMSE will always be greater than or equal to MAE.
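A small sketch (with invented numbers) shows the RMSE ≥ MAE relationship directly:

```python
import numpy as np

# Hypothetical actual vs. predicted values.
y_true = np.array([200.0, 150.0, 320.0, 275.0, 410.0])
y_pred = np.array([210.0, 140.0, 300.0, 280.0, 430.0])

mae = np.mean(np.abs(y_true - y_pred))           # 13.0
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # sqrt(205) ≈ 14.32

# RMSE is always >= MAE; the gap widens as the error
# distribution becomes more uneven (i.e., as outliers appear).
print(mae, rmse)
```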
4. R-squared (R² or Coefficient of Determination)
- What it is: A statistical measure that represents the proportion of the variance in the dependent variable (target) that is predictable from the independent variables (features).
- Formula (Conceptual):
R² = 1 - (Sum of Squared Residuals / Total Sum of Squares)
where the Sum of Squared Residuals (SSR) is Σ (y_i - ŷ_i)² (the same sum as in MSE, before averaging), and the Total Sum of Squares (SST) is Σ (y_i - ȳ)², with ȳ the mean of the actual values.
- Interpretation:
- R² ranges from 0 to 1 (though it can be negative for very poor models that perform worse than just predicting the mean).
- An R² of 0 means the model explains none of the variability of the response data around its mean.
- An R² of 1 means the model explains all the variability of the response data around its mean (a perfect fit).
- For example, an R² of 0.75 means that 75% of the variance in the target variable can be explained by the model's features.
- Higher R² is generally better.
- Pros:
- Provides a relative measure of the goodness of fit of a model.
- Easy to interpret as a percentage of variance explained.
- Cons:
- R² always increases or stays the same when you add more features to the model, even if those features are not actually useful. This can be misleading.
- Doesn't tell you if the model is biased or if the predictions are accurate in an absolute sense, only how much variance it explains.
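The conceptual formula above translates directly into code; a minimal sketch with invented numbers:

```python
import numpy as np

# Hypothetical actual vs. predicted values.
y_true = np.array([200.0, 150.0, 320.0, 275.0, 410.0])
y_pred = np.array([210.0, 140.0, 300.0, 280.0, 430.0])

ss_res = np.sum((y_true - y_pred) ** 2)           # Sum of Squared Residuals
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # Total Sum of Squares
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # ≈ 0.975 -> the model explains ~97.5% of the variance
```

scikit-learn's `r2_score` gives the same result for this calculation.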
5. Adjusted R-squared
- What it is: A modified version of R-squared that adjusts for the number of predictors (features) in the model.
- Why it's used: Unlike R², Adjusted R² increases only if the new feature improves the model more than would be expected by chance. It can decrease if a new feature doesn't add enough explanatory power.
- Interpretation: Similar to R², but it provides a more honest assessment of the model's explanatory power, especially when comparing models with different numbers of features.
- Higher Adjusted R² is generally better.
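The standard formula is Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of features. A small sketch (the R², n, and p values below are arbitrary examples):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1),
    where n = number of observations and p = number of features."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Plain R² never drops when features are added, but Adjusted R²
# discounts it: the same R² with more features scores lower.
print(adjusted_r2(0.75, 100, 10))  # ≈ 0.722
print(adjusted_r2(0.75, 100, 40))  # ≈ 0.581
```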
Choosing the Right Regression Metric
- MAE, MSE, and RMSE are all measures of the average error. They are absolute metrics, meaning their values depend on the scale of your target variable.
- Use MAE if you want errors in the original units and don't want to heavily penalize outliers.
- Use RMSE if you want errors in the original units but want to penalize large errors more.
- MSE is often used internally by optimization algorithms but is less interpretable.
- R-squared and Adjusted R-squared are relative metrics that tell you the proportion of variance explained.
- Use R-squared for a quick assessment of fit, but be aware of its limitations with many features.
- Use Adjusted R-squared when comparing models with different numbers of features or to get a more conservative estimate of explanatory power.
It's often a good practice to look at multiple metrics to get a comprehensive understanding of your regression model's performance. For example, a model might have a high R-squared but also a high RMSE, indicating it explains a lot of variance but still has large average errors.
Visualizing your predictions against actual values (e.g., with a scatter plot) and examining the distribution of residuals (the differences y_i - ŷ_i) are also crucial steps in evaluating regression models, beyond just looking at these numerical metrics.
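Even without plotting, a quick numerical residual check is easy; a sketch with invented numbers:

```python
import numpy as np

# Hypothetical actual vs. predicted values.
y_true = np.array([200.0, 150.0, 320.0, 275.0, 410.0])
y_pred = np.array([210.0, 140.0, 300.0, 280.0, 430.0])

residuals = y_true - y_pred
# For a well-behaved model, residuals should be roughly centered
# on zero and show no pattern when plotted against predictions;
# a mean far from zero suggests systematic over- or under-prediction.
print(residuals.mean(), residuals.std())
```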
Which regression metric do you find most intuitive or useful? Why?