Recall and Precision: how is the performance of a prediction model measured?

MadKudu uses two metrics to measure the performance of a model: Recall and Precision.

Key takeaways

- Recall answers the question "Are we flagging all converters as good/very good?"
- Precision answers the question "Are we also flagging non-converters as good/very good?"
- You can't optimize for both metrics at once: improving one generally degrades the other.
- You want a model where the 20/80 rule applies: 20% of your leads account for 80% of your conversions. That's the theory. In practice, you would be satisfied with a model where:
  - 15-35% of your leads are marked very good/good
  - they account for at least 60% of your conversions (Recall)
  - the very good segment's conversion rate is at least 10x higher than the low segment's conversion rate (Precision)
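
The practical targets above can be checked mechanically. Here is a minimal Python sketch, assuming you already know the share of leads scored very good/good, the Recall, and the conversion rates of the very good and low segments. The function name, argument names, and sample values are hypothetical and are not part of MadKudu's product.

```python
def check_practical_targets(flagged_share, recall, very_good_rate, low_rate):
    """Return a dict of pass/fail checks for the three practical targets.

    All inputs are fractions between 0 and 1 (hypothetical names).
    """
    return {
        # 15-35% of leads are marked very good/good
        "flagged_share_15_to_35_pct": 0.15 <= flagged_share <= 0.35,
        # they account for at least 60% of conversions (Recall)
        "recall_at_least_60_pct": recall >= 0.60,
        # very good converts at least 10x better than low (Precision)
        "very_good_at_least_10x_low": very_good_rate >= 10 * low_rate,
    }


# Made-up numbers for illustration only.
checks = check_practical_targets(
    flagged_share=0.30, recall=0.62, very_good_rate=0.36, low_rate=0.03
)
model_is_satisfying = all(checks.values())
```

Returning one flag per target, rather than a single yes/no, makes it easier to see which target a model misses.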

Key considerations

Optimizing for Recall means flagging more leads as good/very good to make sure you don't miss out on opportunities. Downside: you will also flag non-converters.

Optimizing for Precision means flagging fewer leads as good/very good to avoid highlighting non-converters. Downside: you will miss out on opportunities (converters).

Let's take an example: you receive 100 leads a day, 10 of which should convert into an opportunity.

Would you prefer to:

- Scenario 1 (Recall optimization): send 30 leads to your Sales rep to work on, to find 8 of the 10 who should convert? (Recall 80%, Precision x9)
- Scenario 2 (Precision optimization): send only 10 leads to your Sales rep, but find 6 who should convert? (Recall 60%, Precision x14)

This means that in Scenario 2, each lead your Sales rep works has a higher chance (6/10 = 60%) of turning into a deal (and a commission), while in Scenario 1 each lead has a lower chance (8/30 ≈ 27%), but the rep should close 8 deals in total (and earn a bigger commission).
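
The arithmetic behind both scenarios can be reproduced with a short Python sketch. It assumes that the "x9" and "x14" figures are the ratio between the conversion rate of the leads sent to Sales and the conversion rate of the leads not sent. This assumption reproduces the numbers above but is not stated explicitly in this article. The function and variable names are hypothetical.

```python
def scenario_metrics(total_leads, total_converters, flagged, flagged_converters):
    """Return (recall, flagged conversion rate, lift over unflagged leads)."""
    recall = flagged_converters / total_converters
    flagged_rate = flagged_converters / flagged
    # Converters that were NOT sent to Sales, divided by leads NOT sent to Sales.
    unflagged_rate = (total_converters - flagged_converters) / (total_leads - flagged)
    lift = flagged_rate / unflagged_rate
    return recall, flagged_rate, lift


scenario_1 = scenario_metrics(100, 10, flagged=30, flagged_converters=8)
scenario_2 = scenario_metrics(100, 10, flagged=10, flagged_converters=6)

for name, (recall, rate, lift) in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    print(f"{name}: recall={recall:.0%}, conversion rate={rate:.0%}, lift=x{lift:.1f}")
```

With these inputs the lift comes out at roughly 9.3 for Scenario 1 and 13.5 for Scenario 2, which matches the rounded x9 and x14 quoted above.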

Choosing one metric over the other depends highly on your organization's:

- strategy: optimizing for the number of deals (scenario 1) or for your Sales team's time (scenario 2)
- sales team compensation system
- sales team's ability to trust your lead routing despite some false positives (scenario 1: being OK with some garbage leads) or some false negatives (scenario 2: being OK with missing out on opps)

Here is a recap:

	Optimize for Recall	Optimize for Precision
Minimize	False negatives (missed opportunities)	False positives ("garbage" leads marked qualified)
Maximize	Number of conversions	Conversion rate
When to use	High sales team capacity	Low sales team capacity
Risk	Sales team says the scoring lets through leads that shouldn't be marked qualified (false positives), wasting their time.	Sales team says the scoring misses potential converters (false negatives), making them miss out on opportunities.

If this still seems obscure, here is an (extreme) example. Let's say you sell B2B enterprise payroll software. What would your Sales team think if:

- They come across a student lead marked as a very good fit for your B2B enterprise payroll software? This lead is likely a false positive.
- They see the VP of Finance of Salesforce marked as low fit? This lead is likely a false negative.

If you would rather risk the first case (a false positive) than the second, you are optimizing for Recall; if you would rather risk the second case (a false negative), you are optimizing for Precision.

So now that you know what these metrics are for, let's get into how they are calculated and how you can optimize the model for one or the other.

Recall

In theory

The "Recall" metric (also called sensitivity) measures the fraction of actual positive outcomes that the model correctly scored as positive. Recall is therefore defined and computed as:

Recall = True Positives / (True Positives + False Negatives)

The terms positive and negative refer to the classifier's prediction (sometimes known as the expectation), and the terms true and false refer to whether that prediction corresponds to the external judgment (sometimes known as the observation).

In practice

Cool cool, but what about lead scoring here?

Well, in the context of lead scoring to predict conversions, Recall is the % of conversions correctly predicted by the model (that is, the % of conversions scored very good or good).

- True positive = people scored very good/good who converted
- False negative = people scored low/medium who converted
- True negative = people scored low/medium who didn't convert
- False positive = people scored very good/good who didn't convert
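
As an illustration, here is a minimal Python sketch that counts these four outcomes and derives Recall from them. It assumes each lead is available as a (segment, converted) pair; the data layout, names, and sample leads are hypothetical.

```python
from collections import Counter

# Segments treated as a positive prediction.
QUALIFIED = {"very good", "good"}


def confusion_counts(leads):
    """Count TP, FP, FN, TN for an iterable of (segment, converted) pairs."""
    counts = Counter()
    for segment, converted in leads:
        predicted_positive = segment in QUALIFIED
        if predicted_positive and converted:
            counts["TP"] += 1  # scored very good/good and converted
        elif predicted_positive:
            counts["FP"] += 1  # scored very good/good but didn't convert
        elif converted:
            counts["FN"] += 1  # scored low/medium but converted
        else:
            counts["TN"] += 1  # scored low/medium and didn't convert
    return counts


def recall(counts):
    """Recall = TP / (TP + FN)."""
    return counts["TP"] / (counts["TP"] + counts["FN"])


# Made-up sample of 8 leads.
leads = [
    ("very good", True), ("very good", True), ("good", True), ("good", False),
    ("medium", True), ("medium", False), ("low", False), ("low", False),
]
counts = confusion_counts(leads)
print(dict(counts), f"recall={recall(counts):.0%}")
```

In this sample, 3 of the 4 converters were scored very good/good, so Recall is 75%.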

Optimizing for Recall means maximizing the number of True Positives and minimizing the number of False Negatives. In other words, you prefer to score more people very good/good rather than low/medium, to make sure you are not missing out on opportunities. But this also means having more False Positives.

On the performance graph, which shows the distribution of conversions by customer fit segment, Recall corresponds to the % of conversions scored very good plus the % of conversions scored good.

Here is an approximate rule of thumb we have at MadKudu:

- Pretty bad model: Recall < 50%
- Not awesome, but it will do (or is the best you can get): Recall = 50-60%
- Decent model: Recall = 60-70%
- Pretty good model: Recall = 70-80%
- Excellent model: Recall > 80%

Why not 100%?

Oh, you could reach 100%, but then your Precision (see below) is probably terrible, or the model is completely overfitted. If you scored 100% of your leads very good, 100% of your conversions would be correctly predicted as very good, no?
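
To make this concrete, here is a small Python sketch of that degenerate case, reusing the 100 leads / 10 converters from the earlier example. It uses the textbook definition of precision, TP / (TP + FP); the helper name is hypothetical.

```python
def recall_and_precision(flagged, converted):
    """Compute (recall, precision) from two parallel lists of booleans."""
    pairs = list(zip(flagged, converted))
    tp = sum(1 for f, c in pairs if f and c)
    fp = sum(1 for f, c in pairs if f and not c)
    fn = sum(1 for f, c in pairs if not f and c)
    return tp / (tp + fn), tp / (tp + fp)


# 100 leads, 10 of which convert.
converted = [True] * 10 + [False] * 90

# Degenerate "model": every single lead is scored very good.
flag_everyone = [True] * 100

recall_all, precision_all = recall_and_precision(flag_everyone, converted)
print(f"Recall: {recall_all:.0%}, Precision: {precision_all:.0%}")
# prints "Recall: 100%, Precision: 10%"
```

Recall is perfect, but only 1 in 10 flagged leads converts, which is no better than picking leads at random.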

This is why we need this second metric.

Precision

In theory

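The "Precision" metric (also called positive predictive value) complements Recall: it measures the fraction of positive predictions that were correct, so it drops as the number of false positives grows. Precision is therefore defined and computed as:

Precision = True Positives / (True Positives + False Positives)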

In practice

Precision monitors whether the model correctly identifies (with precision) the leads who convert at a higher rate than others. Ideally, we want at least a 10x difference in conversion rate between the very good and the low segments. This means very good leads will actually have a much higher probability of converting than low leads.

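In strict application of the theory, the Precision should be

Precision = leads scored very good/good who converted / all leads scored very good/good

but we use a proxy, visualizing and computing it instead as

Precision (proxy) = conversion rate of the very good segment / conversion rate of the low segment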
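
Both views of Precision can be computed side by side. The following Python sketch assumes you have, for each customer fit segment, the number of leads and the number of conversions. The segment counts are made up for illustration, and the variable names are hypothetical.

```python
# Hypothetical per-segment counts: segment -> (number of leads, number of conversions).
segments = {
    "very good": (100, 30),
    "good": (200, 30),
    "medium": (300, 25),
    "low": (400, 12),
}


def conversion_rate(segment):
    """Conversions divided by leads for one segment."""
    leads, conversions = segments[segment]
    return conversions / leads


# Strict definition: TP / (TP + FP), where a positive is a lead scored very good or good.
qualified_leads = segments["very good"][0] + segments["good"][0]
qualified_conversions = segments["very good"][1] + segments["good"][1]
strict_precision = qualified_conversions / qualified_leads

# Proxy: how many times better the very good segment converts than the low segment.
precision_proxy = conversion_rate("very good") / conversion_rate("low")

print(f"Strict precision: {strict_precision:.0%}")
print(f"Very good vs low: x{precision_proxy:.1f}")
```

With these made-up counts, strict precision is 20% while the very good segment converts 10 times better than the low segment, which sits right at the 10x target.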

Recall or Precision?

Let's close the loop on what we said at the beginning about optimizing for one metric rather than the other. At the end of the day, do you want your sales team to focus on a larger population of MadKudu qualified leads or a smaller one? The former means your Sales team might run into more false positives, while the latter means your team might miss out on some actually qualified leads. If you want to focus on more leads, optimize for Recall. If you want to focus on fewer, optimize for Precision.