As we move forward in our exploration of Machine Learning, this week’s lesson delves into the all-important topic of evaluating model performance. In our previous session, we ventured into Natural Language Processing (NLP), uncovering the techniques to preprocess and vectorize text data, essential for any NLP task. With an understanding of how to prepare our data, we now shift our focus to measuring how well our models are performing.

What We’ll Cover

Our discussion will revolve around the following topics:

  • Cross validation
  • Proper testing practices
  • Data leakage
  • Awareness of the task
  • Performance metrics

Cross Validation

A pivotal concept in evaluating machine learning models is cross validation. As you may recall, we split our dataset into three parts: training, validation, and testing. During learning, the model’s parameters are fit on the training set, and its hyperparameters are chosen using the validation set. The test set remains untouched until the final evaluation stage.
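
The three-way split described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions made for the example, not something prescribed by the lesson.

```python
import random

def train_val_test_split(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and cut it into train, validation, and test lists."""
    if not 0 < val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be between 0 and 1")

    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]               # set aside first, untouched until the end
    val = shuffled[n_test:n_test + n_val]  # used to compare models and hyperparameters
    train = shuffled[n_test + n_val:]      # used to fit the model
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Each sample ends up in exactly one of the three lists, which is what keeps the test set independent of everything done during training and validation.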

Cross validation uses held-out validation data as a stand-in for the test set, so we can assess model performance without touching the test set itself. This practice helps to mitigate common issues like overfitting and selection bias.

Key Takeaway: Cross validation gives more reliable performance estimates by evaluating the model on data it was not trained on.

How it works

In particular, we’ll examine a commonly used technique: k-fold cross validation. This method partitions the data into ‘k’ subsets (folds). We train the model on ‘k-1’ subsets and validate it on the remaining one, then repeat the process ‘k’ times so that each subset serves as the validation set exactly once. The performance measure, averaged over all ‘k’ rounds, informs us about the effectiveness of our chosen model and hyperparameters.
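
As a minimal sketch of the procedure, the code below runs k-fold cross validation by hand. The helper names (`k_fold_indices`, `fit_mean`, `mse`, `cross_validate`), the toy "model" that always predicts the training mean, and the sample targets are all assumptions chosen to keep the example self-contained. A real project would more likely rely on a library implementation.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is in exactly one validation fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k subsets of near-equal size
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

def fit_mean(ys):
    """Toy 'model' (an assumption for illustration): predict the training mean."""
    return sum(ys) / len(ys)

def mse(prediction, ys):
    """Mean squared error of a constant prediction against targets `ys`."""
    return sum((y - prediction) ** 2 for y in ys) / len(ys)

def cross_validate(ys, k=5):
    """Train on k-1 folds, validate on the remaining one, and average over k rounds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        model = fit_mean([ys[j] for j in train_idx])
        scores.append(mse(model, [ys[j] for j in val_idx]))
    return sum(scores) / k, scores

targets = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
mean_score, fold_scores = cross_validate(targets, k=5)
print(f"per-fold MSE: {fold_scores}")
print(f"mean MSE over 5 folds: {mean_score:.3f}")
```

The per-fold scores typically differ from one another, which is why the average over all folds is a steadier estimate than the score from any single split.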

Key Takeaway: K-fold cross validation provides a robust estimate of model performance because it does not depend on a single train/validation split, which mitigates the risk of overfitting to one particular split.

Proper Testing Practices

After validating our model, the final step is testing it. However, certain guidelines must be observed during this phase. Above all, the test set is set aside at the beginning, and the model must never interact with it until this final stage. This guards against data leakage, a scenario where information from the test set ‘leaks’ into the model during training.
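
One common way leakage creeps in is through preprocessing. The sketch below assumes a simple standardization step (an illustrative choice, not something the lesson specifies) and contrasts statistics learned from the training set alone with statistics computed over training and test data together. The helper names and the sample values are hypothetical.

```python
import statistics

def fit_standardizer(values):
    """Learn the mean and (population) standard deviation from the given values only."""
    return statistics.mean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Rescale values using previously learned statistics."""
    return [(v - mean) / std for v in values]

train_values = [1.0, 2.0, 3.0, 4.0, 5.0]
test_values = [50.0, 60.0]  # held out; deliberately far from the training range

# Leak-free: statistics come from the training set alone,
# and the same statistics are then reused on the test set.
mean, std = fit_standardizer(train_values)
train_scaled = standardize(train_values, mean, std)
test_scaled = standardize(test_values, mean, std)

# Leaky: statistics computed on train + test let test-set information
# shape the preprocessing that the model is trained on.
leaky_mean, leaky_std = fit_standardizer(train_values + test_values)

print(f"train-only mean: {mean:.2f}, leaky mean: {leaky_mean:.2f}")
```

In the leaky variant the training data is rescaled using values the model should never have seen, so any score later reported on the test set is no longer a genuine evaluation.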

Key Takeaway: Proper testing practices, including avoiding data leakage, ensure a genuine evaluation of model performance.

Performance Metrics

Performance metrics vary based on the task at hand, be it regression, classification, or clustering. We assess our model’s performance on all three datasets: training, validation, and testing. Typically, the focus is on tracking the loss during training and validation. The test loss is measured only once, at the end: it plays no part in guiding training, since the model is presumed to be already tuned through cross validation, and serves instead as the final check on unseen data.

Key Takeaway: Selecting appropriate performance metrics is crucial for evaluating model performance accurately.

Why so many different metrics

Different tasks call for monitoring different aspects of performance. For instance, a binary classification task might suffer from class imbalance, where accuracy can be misleading: a model that always predicts the majority class scores high accuracy while never detecting a single minority case. Similarly, in a medical diagnosis scenario, it is more important to identify all positive cases (high recall), even at the cost of more false positives. Therefore, the choice of performance metric should align with the task’s specific requirements.
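
To make the imbalance point concrete, here is a small sketch with made-up labels: 95 negative cases and 5 positive ones. The two "models" are hypothetical prediction lists written by hand, and the helper functions are illustrative rather than taken from any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Imbalanced ground truth: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Hypothetical model A: always predicts the majority (negative) class.
always_negative = [0] * 100

# Hypothetical model B: catches every positive but raises 10 false alarms.
cautious = [1] * 10 + [0] * 85 + [1] * 5

for name, y_pred in [("always negative", always_negative), ("cautious", cautious)]:
    print(
        f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
        f"recall={recall(y_true, y_pred):.2f}, "
        f"precision={precision(y_true, y_pred):.2f}"
    )
```

Model A wins on accuracy yet finds none of the positive cases, while model B gives up some accuracy and precision to reach full recall. Which trade-off is acceptable depends on the task, as in the medical diagnosis scenario.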

Key Takeaway: Different tasks require different performance metrics, hence the need to select them wisely.


With this week’s session on evaluating model performance, we hope to have illuminated the importance of proper validation, testing, and choice of metrics. In the upcoming session, we will switch gears towards optimizing these performance metrics, diving into techniques for improving our machine learning models. Stay tuned for a fascinating exploration of model improvement techniques.