Under/over fitting — The bias/variance dilemma

Yash Sandansing
5 min read · Jan 27, 2021

Machine Learning is a pretty fascinating field. Generally, you give your program some inputs and it gives you a prediction. But I guess we both know that it’s easier said than done. There are a lot of things to take care of before we can get our program to work and a lot of mistakes to avoid.

Speaking of mistakes to avoid, there are two key terms that all data scientists take into account every time they build a model: overfitting and underfitting.

Before we understand what both of these terms are, we need to understand that the primary objective of a model is to ‘generalize’ to the data that is provided.

What is Generalization?

According to Google’s ML crash course:

Generalization refers to your model’s ability to adapt properly to new, previously unseen data, drawn from the same distribution as the one used to create the model.

Basically, it means that a model that generalizes well is a model that learns the patterns and characteristics of our data instead of memorizing all the details of the data itself.

An ideal model strives to generalize well to both the training data and the test (i.e. unseen) data.

What is overfitting?

Overfitting refers to a phenomenon where our model learns all the tiny details of our training data instead of noticing the patterns in the inputs/features. A model which has overfitted will perform exceptionally well on the data it has been trained on. However, it will perform very poorly on previously unseen data, since it hasn't learned the underlying characteristics of the data. It adapts itself even to the noise in the training data, which is naturally a problem in our modelling process.
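This train/test gap is easy to reproduce in a few lines. The sketch below is an invented illustration using plain NumPy polynomial fits (not any particular ML library): a high-degree polynomial memorizes a small noisy training set almost perfectly, yet does much worse on fresh data drawn from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small noisy sample of an underlying linear trend y = 2x + noise.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.3, size=10)
x_test = np.linspace(0, 1, 50)
y_test = 2 * x_test + rng.normal(0, 0.3, size=50)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree 9 can pass through all 10 training points: it memorizes the noise.
overfit = np.polyfit(x_train, y_train, deg=9)
# Degree 1 can only capture the broad pattern.
simple = np.polyfit(x_train, y_train, deg=1)

train_err = mse(overfit, x_train, y_train)    # near zero: training data memorized
test_err = mse(overfit, x_test, y_test)       # much larger on unseen data
simple_test_err = mse(simple, x_test, y_test) # modest: the simple model generalizes
print(train_err, test_err, simple_test_err)
```

The exact numbers depend on the random seed, but the pattern is robust: the overfitted model's training error is essentially zero while its test error is far larger.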

One common cause of overfitting is training the model for longer than necessary. Though there are multiple ways to prevent overfitting, one easy way is early stopping, i.e. halting the training process before the model starts to overfit.
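A minimal, framework-agnostic sketch of the early-stopping idea (the function name and patience value here are made up for illustration, not any library's callback API): track validation loss each epoch and stop once it has failed to improve for a few epochs in a row.

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index at which training should stop:
    the last epoch where validation loss improved, found once the
    loss has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Validation loss falls, then rises as the model starts to overfit.
losses = [0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.6, 0.65]
print(early_stopping(losses))  # 3: the epoch with the lowest validation loss
```

In practice you would restore the model weights saved at that best epoch rather than just record its index.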

But how do we know when to stop this process? What happens if we stop it too early?

Underfitting

Underfitting, as the name suggests, is the polar opposite of overfitting. It happens when we stop the training process prematurely, or when the model is too simple to capture the underlying patterns in the data.

An underfitted model can neither model the training data nor the test data.

However, underfitting is not as common a problem as overfitting, since it is easy to detect: the model's performance metric is poor even on the training data itself.
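To see why underfitting shows up directly in the training metric, here is another invented NumPy sketch: fit a straight line to data with a clearly curved relationship, and the fit is poor on the very data the model was trained on.

```python
import numpy as np

x = np.linspace(-1, 1, 100)
y = x ** 2  # a clearly nonlinear relationship, with no noise at all

# A degree-1 (straight-line) model is too simple for this curve.
line = np.polyfit(x, y, deg=1)
train_mse = float(np.mean((np.polyval(line, x) - y) ** 2))

# The poor fit is visible on the training data itself: no test set needed.
print(train_mse)
```

No amount of extra training fixes this; the model class itself lacks the capacity to represent the curve.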

Give it to me in layman’s terms?

A good example to understand this concept would be via a math problem.

Consider you’re studying to give a math exam. You would naturally have math problems with multiple variables along with multiple question structures (same problems asked in different ways). Suppose you rote learn how to solve a problem by going through it repeatedly and memorizing it.

Now, in your exam, you may be faced with an unseen problem, i.e. maybe the variables are unknown or maybe the structure of the problem is slightly different. Since you have memorized the problem rather than learning how to solve it (i.e. learning its pattern), you'll obviously be stuck and won't be able to solve the exam problem correctly. You'll have "overfitted" yourself to your textbook data.

Similarly, if you didn’t study enough for the exam, you still won’t be able to solve your paper and you’ll have “underfitted” yourself.

Bias/Variance

A simple definition of bias would be "the difference between the expected value and the true value." Bias error comes from erroneous assumptions in the learning algorithm. High bias can cause the model to miss the relevant relations between features and target.

Variance measures how far a set of numbers is spread out from its average value. Variance error comes from sensitivity to small fluctuations in the training set.

Setting aside the formal definitions of bias and variance, it's safe to say that high bias leads to underfitting, whereas high variance leads to overfitting.
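Both quantities can be estimated empirically. The sketch below is an invented setup (the sine target, noise level, and polynomial degrees are all chosen just for illustration): refit a simple and a complex model on many freshly drawn training sets, then measure how far the average prediction sits from the truth (bias) and how much the predictions scatter across training sets (variance).

```python
import numpy as np

rng = np.random.default_rng(2)

def true_f(x):
    return np.sin(np.pi * x)

# Fixed inputs; only the noise differs between training sets.
x_grid = np.linspace(-1, 1, 20)
n_sets, noise = 200, 0.2
results = {}
for deg in (1, 9):
    preds = np.empty((n_sets, x_grid.size))
    for i in range(n_sets):
        y = true_f(x_grid) + rng.normal(0, noise, x_grid.size)
        preds[i] = np.polyval(np.polyfit(x_grid, y, deg), x_grid)
    results[deg] = {
        # squared distance between the average prediction and the truth
        "bias_sq": float(np.mean((preds.mean(axis=0) - true_f(x_grid)) ** 2)),
        # how much the predictions scatter across training sets
        "variance": float(np.mean(preds.var(axis=0))),
    }

print(results)  # degree 1: high bias, low variance; degree 9: the reverse
```

The straight line misses the sine curve in the same way every time (high bias, low variance), while the degree-9 fit tracks the truth on average but wobbles with every fresh noise draw (low bias, high variance).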

When there is high bias error, the result is a very simplistic model that does not adapt to the variations in the data. As explained above, it fits neither the training data nor the test (unseen) data well.

When the model is this simplistic, it cannot separate the classes in the data well, as in the figure below.

Underfitted Model (taken from analyticsvidhya)

When the variance is high, the model learns everything about the training data, down to its noise. It is an overly sensitive model. This results in an overfitted model which might be too complex.

Overfitted Model (taken from analyticsvidhya)

Bias and variance are just like yin and yang: both are always present, and they have to be kept in balance. Just like overfitting and underfitting, they pull in opposite directions. As model complexity grows, bias tends to fall while variance rises.

While modelling, we aim for the sweet spot in between, where the combined error from bias and variance is at its lowest. This is known as the bias-variance tradeoff.

Bias Variance Tradeoff (taken from analyticsvidhya)

By managing the tradeoff, we aim to reduce the total error, which gives us an optimal model. If the variance is too high, the model is too complex; if the bias is too high, the model is too simple. A model that balances the two is called a good/ideal fit.
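The tradeoff can be seen by sweeping model complexity and watching the test error. In the invented NumPy sketch below, the data comes from a known cubic, so degree 3 is the "right" complexity: degrees below it underfit and degrees far above it overfit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a cubic: the underlying pattern is degree 3.
x_train = rng.uniform(-1, 1, 30)
y_train = x_train ** 3 - x_train + rng.normal(0, 0.1, 30)
x_test = rng.uniform(-1, 1, 200)
y_test = x_test ** 3 - x_test + rng.normal(0, 0.1, 200)

# Fit polynomials of increasing degree and record test error for each.
test_errors = {}
for deg in range(1, 12):
    model = np.polyfit(x_train, y_train, deg)
    test_errors[deg] = float(np.mean((np.polyval(model, x_test) - y_test) ** 2))

best = min(test_errors, key=test_errors.get)
print(best, test_errors[best])
```

Plotting `test_errors` against degree gives the familiar U-shaped curve: error falls as bias drops, then climbs again as variance takes over.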

Such a model generalizes well to the data we care about without bending itself around outliers or noisy points.

Good Fit (taken from analyticsvidhya)

So, in conclusion: high bias leads to underfitting and high variance leads to overfitting. We cannot get rid of either completely, but we can reach a minimal-error model by tuning the model's complexity.
