After you gather data, it has to go through cleaning, pre-processing, and wrangling. Next, you feed it into a suitable model and get output as probabilities. The confusion matrix is where you check those outputs against the actual outcomes.
In this guide, you’ll learn what a confusion matrix is, what it tells you, and when you should use one.
First…
A confusion matrix summarizes the performance of a machine learning model on a set of test data. That is, it displays the number of correct and incorrect predictions the model makes.
Each cell of the matrix counts the test instances that fall into one combination of actual class and predicted class.
Here are easy ways of reading and interpreting a confusion matrix.
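The confusion matrix for a binary classification problem (two classes, denoted as Positive and Negative) looks like this in its most common layout, with actual classes as rows and predicted classes as columns:
Actual \ Predicted | Predicted Positive | Predicted Negative |
Actual Positive | True Positive (TP) | False Negative (FN) |
Actual Negative | False Positive (FP) | True Negative (TN) |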
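As a rough illustration of the binary layout, the sketch below tallies a 2x2 matrix from two label lists. The function name `confusion_matrix_2x2`, the row/column ordering (rows are actual classes, columns are predicted classes), and the sample labels are assumptions made for this example, not something prescribed by the guide.

```python
from collections import Counter


def confusion_matrix_2x2(actual, predicted, positive="Positive"):
    """Return [[TP, FN], [FP, TN]]: rows = actual class, columns = predicted class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        # Key is (actually positive?, predicted positive?)
        counts[(a == positive, p == positive)] += 1
    tp = counts[(True, True)]
    fn = counts[(True, False)]
    fp = counts[(False, True)]
    tn = counts[(False, False)]
    return [[tp, fn], [fp, tn]]


# Hypothetical labels, invented purely for illustration.
actual = ["Positive", "Positive", "Negative", "Negative", "Positive", "Negative"]
predicted = ["Positive", "Negative", "Negative", "Positive", "Positive", "Negative"]

matrix = confusion_matrix_2x2(actual, predicted)
print(matrix)  # [[2, 1], [1, 2]]
```

Some tools put predicted classes on the rows instead, so it is worth checking the axis labels before reading any matrix.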
Here’s how the performance metrics are calculated.
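For reference, the standard definitions of accuracy, precision, and recall can be sketched as a small function over the four cell counts. The function name `classification_metrics`, the sample counts, and the choice to return 0.0 when a denominator is zero are assumptions for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from the four confusion matrix cells."""

    def safe_div(numerator, denominator):
        # Assumption: report 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    return {
        # Share of all predictions that were correct.
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        # Of everything predicted positive, how much was actually positive.
        "precision": safe_div(tp, tp + fp),
        # Of everything actually positive, how much the model found.
        "recall": safe_div(tp, tp + fn),
    }


# Hypothetical counts, invented purely for illustration.
m = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(f"{m['accuracy']:.3f} {m['precision']:.3f} {m['recall']:.3f}")  # 0.700 0.800 0.667
```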
Here are the major reasons why a confusion matrix is essential for evaluating the performance of a classification model.
A confusion matrix offers insights into the performance of a classification model:
A confusion matrix is useful in these scenarios:
Stage 1: Logging in to Power BI
Source | Target | Count |
Class-1 instances correctly classified as Class-1 | Predicted Class-1 | 10 |
Class-1 instances misclassified as Class-2 | Predicted Class-2 | 6 |
Class-2 instances misclassified as Class-1 | Predicted Class-1 | 2 |
Class-2 instances correctly classified as Class-2 | Predicted Class-2 | 12 |
From the data, you’ll see the classification model’s performance: 10 Class-1 instances are correctly identified, while 6 are misclassified as Class-2. For Class-2, 12 instances are correctly classified, but 2 are misclassified as Class-1.
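To make the arithmetic concrete, here is a minimal sketch that rebuilds the matrix from the four counts in the table and derives overall accuracy and the share of each class that was classified correctly. The variable names and the (actual, predicted, count) tuple layout are assumptions for this example; only the counts come from the table.

```python
# (actual class, predicted class, count), taken from the table above.
rows = [
    ("Class-1", "Class-1", 10),
    ("Class-1", "Class-2", 6),
    ("Class-2", "Class-1", 2),
    ("Class-2", "Class-2", 12),
]

classes = ["Class-1", "Class-2"]
index = {name: i for i, name in enumerate(classes)}

# Rows of the matrix are actual classes, columns are predicted classes.
matrix = [[0] * len(classes) for _ in classes]
for actual, predicted, count in rows:
    matrix[index[actual]][index[predicted]] += count

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))  # diagonal cells
accuracy = correct / total

# Fraction of each actual class that the model classified correctly.
per_class_recall = {name: matrix[i][i] / sum(matrix[i]) for name, i in index.items()}

print(matrix)  # [[10, 6], [2, 12]]
print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.733
for name, value in per_class_recall.items():
    print(f"{name}: {value:.3f}")  # Class-1: 0.625, then Class-2: 0.857
```

So 22 of the 30 instances are classified correctly overall, and the model finds Class-2 more reliably (12 of 14) than Class-1 (10 of 16).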
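The four values in a confusion matrix are:
True Positive: Predicted positive and actually positive.
True Negative: Predicted negative and actually negative.
Type 1 error (False Positive): Predicted positive, but actually negative.
Type 2 error (False Negative): Predicted negative, but actually positive.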
A good confusion matrix shows high values on the diagonal (True Positives and True Negatives) and low values off the diagonal (False Positives and False Negatives). This pattern indicates accurate predictions across classes.
A confusion matrix is designed to show model predictions versus the actual outcomes in a classification task. It helps in evaluating model performance and understanding errors (like false negatives/positives). It also helps in calculating metrics like recall, precision, and accuracy.
With a confusion matrix, you can set decision thresholds for a classifier’s predicted probabilities. Stakeholders can adjust these thresholds based on the trade-offs between different types of errors.
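The threshold trade-off can be sketched by recomputing the four cells at several cut-offs. Everything here is an assumption for illustration: the helper name `counts_at_threshold`, the rule that a score at or above the threshold counts as a positive prediction, and the labels and scores, which are invented.

```python
def counts_at_threshold(y_true, scores, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels (1 = positive) and model probabilities.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.65, 0.60, 0.40, 0.35, 0.20, 0.05]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, counts_at_threshold(y_true, scores, threshold))
# 0.3 {'TP': 4, 'FP': 2, 'FN': 0, 'TN': 2}
# 0.5 {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
# 0.7 {'TP': 2, 'FP': 0, 'FN': 2, 'TN': 4}
```

In this toy data, raising the threshold removes false positives at the cost of more false negatives, which is the trade-off stakeholders weigh when choosing a cut-off.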
To analyze the confusion matrix, you’ll have to use good visuals — and that’s where tools like ChartExpo come into play.