Machine Learning: Notes on Overfitting

A model is overfit when it has learned idiosyncratic features of the training data rather than general patterns, so it performs poorly on unseen data. It is thus not ready for real-world use.


  • If a model has high variance and low bias, its training accuracy keeps improving while its validation accuracy stalls or declines with each epoch.
  • If the training set contains noisy data, the model may memorize the noise, which increases variance and lowers validation accuracy.
  • If the model is too complex, its bias will be low but its variance high.
  • If the training data is not large or varied enough, the model gets to explore only a few scenarios, so the patterns in unseen data will be new to it.

Detecting Overfitting

We evaluate the model on held-out data as early and as often as possible, so we know how it performs on unseen examples. A training accuracy that keeps climbing while validation accuracy stalls or falls is the classic signature of overfitting, and we act as soon as it appears.
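A minimal sketch of that check: fit a deliberately unconstrained model and compare training and validation accuracy. The synthetic dataset and the choice of a decision tree are illustrative assumptions, not from the post.

```python
# Sketch: detect overfitting by comparing training and validation scores.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)

# A large train/validation gap is the signal we watch for.
print(f"train={train_acc:.2f}  val={val_acc:.2f}  gap={train_acc - val_acc:.2f}")
```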

Preventing Overfitting

(1) More training data: With more data, the model is more likely to learn patterns that generalize rather than memorize individual examples.
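One way to see this effect is a learning curve: as the training set grows, the gap between training and validation accuracy usually narrows. The dataset and model below are illustrative assumptions.

```python
# Sketch: watch the train/validation gap as the training set grows.
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5, random_state=0), X, y,
    train_sizes=[0.2, 0.5, 1.0], cv=5)

for n, t, v in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(n, round(t - v, 3))  # gap tends to shrink with more data
```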

(2) Data augmentation: We apply transformations (flips, rotations, crops, shifts) to existing examples to enlarge the dataset artificially. The model also gets to see each image from more perspectives because of the transformations.
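A minimal augmentation sketch using NumPy only, doubling a batch with horizontal flips. Real pipelines would use a library such as torchvision.transforms or Keras preprocessing layers; that tooling, and the toy batch below, are assumptions.

```python
# Sketch: augment a batch of images by adding horizontally flipped copies.
import numpy as np

def augment(images):
    """Return the original images plus horizontally flipped copies."""
    flipped = images[:, :, ::-1]  # flip along the width axis
    return np.concatenate([images, flipped], axis=0)

rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32))   # 8 grayscale 32x32 "images"
augmented = augment(batch)
print(augmented.shape)            # twice the original batch size
```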

(3) Standardization: When input features sit on very different scales, some weights grow disproportionately large and the model fixates on a few features, raising variance. Standardizing inputs to zero mean and unit variance keeps the weights balanced.
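A short sketch with scikit-learn's StandardScaler; the tiny example matrix is an assumption for illustration.

```python
# Sketch: standardize features to zero mean and unit variance.
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on wildly different scales.
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
scaler = StandardScaler().fit(X)
X_std = scaler.transform(X)

print(X_std.mean(axis=0))  # approximately [0, 0]
print(X_std.std(axis=0))   # approximately [1, 1]
```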

(4) Feature Selection: With many irrelevant features and limited training data, the model wastes capacity learning spurious relationships and cannot perform well on unseen examples. Keeping only the informative features reduces this risk.
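One common sketch of this idea is scikit-learn's SelectKBest with an ANOVA F-test; the post names no specific method, so this choice and the synthetic dataset are assumptions.

```python
# Sketch: keep only the k most informative features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 20 features, only 5 of which carry signal.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
selector = SelectKBest(f_classif, k=5).fit(X, y)
X_reduced = selector.transform(X)
print(X_reduced.shape)  # the 15 weakest features are dropped
```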

(5) Cross-Validation: Cross-validation rotates which portion of the data is held out, so the evaluation does not hinge on one fixed train/validation split. It gives a more reliable estimate of generalization and reduces the risk of tuning the model to a particular split.
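A minimal 5-fold sketch: every sample serves in the validation fold exactly once. The logistic-regression model and synthetic data are illustrative assumptions.

```python
# Sketch: 5-fold cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average accuracy across the 5 folds
```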

(6) Early Stopping: We stop training when the validation loss begins to rise. The idea is to capture the set of weights that generalizes best, halting before the noise in the training set begins to dominate what the model learns.
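The stopping rule itself can be sketched in a few lines: stop once validation loss has not improved for a set number of evaluations ("patience") and keep the best epoch's weights. The loss values below are synthetic; in practice they come from a validation set each epoch.

```python
# Sketch: the early-stopping rule with a patience counter.
def early_stop_index(val_losses, patience=2):
    """Return the epoch whose weights we would restore."""
    best, best_i, waited = float("inf"), 0, 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_i, waited = loss, i, 0  # improvement: reset patience
        else:
            waited += 1
            if waited >= patience:             # no improvement for too long
                break
    return best_i

losses = [0.9, 0.6, 0.5, 0.55, 0.6, 0.7]  # starts rising after epoch 2
print(early_stop_index(losses))
```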

(7) Ensembling: We combine multiple strategically generated models, such as diverse classifiers or experts, to obtain better predictive performance than any single one. Averaging their predictions reduces variance, offsets the biases of individual modeling methods, and decreases the chances of overfitting.
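A short sketch with scikit-learn's VotingClassifier, combining three different model families by majority vote; the specific estimators and dataset are illustrative assumptions.

```python
# Sketch: combine three diverse classifiers by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
])
ensemble.fit(X, y)
print(ensemble.score(X, y))  # accuracy of the combined vote
```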

(8) Regularization: Regularization constrains the complexity of the model, significantly reducing the variance at the cost of a slight increase in bias. The most widely used regularization methods are L1 (Lasso), L2 (Ridge), Elastic Net, Dropout, and Batch Normalization.
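A sketch of the two linear-model variants named above: L2 (Ridge) shrinks coefficients toward zero, while L1 (Lasso) can zero some out entirely. The alpha values and synthetic regression data are illustrative assumptions.

```python
# Sketch: compare unregularized, L2, and L1 linear regression coefficients.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# 10 features, only 3 of which actually drive the target.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge: smaller weights overall; Lasso: some weights exactly zero.
print(np.abs(ridge.coef_).sum(), "<", np.abs(ols.coef_).sum())
print("zeroed features:", (lasso.coef_ == 0).sum())
```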



