Why we need to stop using Vanity Metrics to talk about COVID-19
By George McHugh, Product Manager.
Uruguay is likely to be the first country in South America, and possibly the Western Hemisphere, to “win” against COVID-19. Uruguay has never had a shutdown, has one of the oldest populations in the Western Hemisphere, does not mandate mask-wearing (except on public transportation), and has a high population density for the region (at least in its capital, Montevideo).
I’m going to use Uruguay as a specific case study because it’s where I live, and where Bixlabs is based. Currently, Uruguay has fewer than 100 active cases, most of which are in the regions bordering Brazil.
In general terms, the USA and Uruguay are applying mostly the same measures and strategies. Why, then, do they have such radically different outcomes? What’s the difference between their approaches to COVID-19? I firmly believe that it is the metric each country focuses on that determines success, nothing more, nothing less.
What’s the best strategy?
Across social media and the blogosphere, the arguments focus on which country or jurisdiction is winning the fight against COVID-19. Disagreements usually end up as debates over statistics, correlations, policies, and the validity of information. Most of the time, the problem isn’t really the numbers or the strategies used, but the unexamined question of which metrics we are tracking.
It doesn’t matter if there are lockdowns, fines, mandatory masks, amazing test coverage, or even near-zero total deaths. All of those numbers and strategies are for naught if we aren’t defining what success really is and what it means.
In the Lean Startup approach, there is a distinction between “vanity metrics” and “value/growth metrics”. Vanity metrics tend to show ‘success’ superficially, like the number of tests conducted (i.e. “up and to the right”) or total deaths (“down and to the right”).
These metrics are misleading because they aren’t really actionable. Tracking those numbers does not tell us which strategies and solutions are working. More tests don’t tell us that the virus is under control, and fewer deaths don’t mean that its spread is being controlled. Do more tests and fewer deaths really tell us if our approach is improving? I doubt it.
The USA has some of the highest test coverage (tests/million) in the world and the best in the Western Hemisphere. If the country succeeds on that metric but is having another outbreak, is this really the metric we should focus on?
Let’s take the following thought experiment: someone snaps their fingers and half of the US population is suddenly and randomly tested. That means we now have 50% test coverage. How does this information tell us whether we are succeeding or not? Sure, it gives us test results, but there isn’t any indication about whether any particular strategy was effective.
Let’s snap our fingers again, and half of the untested population now gets tested. We repeat this process every day until every person is tested at least once.
Once we get 100% coverage, it’s tempting to say that we now have all the information we need and have likely beaten the disease. But we’d probably have to retest the negatives over and over again in an extremely short period of time to really be sure that the virus isn’t spreading. We just wouldn’t know.
This example is an attempt to show that testing everyone, and even the number of tests, doesn’t really tell us much about anything, at least at first glance.
Also, unfortunately, we can’t snap our fingers and have everyone tested, or even half. Time and resources are limited, and test production can only grow so much to meet this almost impossible demand.
There are other edge cases: say one person in the USA receives 400 million tests. The numbers would definitely be misleading. And fundamentally, the metric of Tests per million itself is misleading no matter how many or how few tests are made, today or tomorrow. The information is definitely useful, but it isn’t how we should gauge success in the community’s response to COVID-19.
In contrast to a vanity metric, a value metric provides a way to really target and improve one’s approach. Countries that have objectively controlled the COVID-19 outbreak all have another thing in common: they are world leaders in the number of daily tests made per positive.
Using the distinction between test coverage and Tests/Positive, we can see that there are relatively few countries that have exceptionally high Tests/Positive.
The Lean Startup methodology provides 3 types of metrics to evaluate growth: viral, sticky, or paid.
We can use each of these parameters to properly measure success in the COVID-19 response. The viral metric can be used to measure the speed of infection (i.e. “flattening the curve”), the sticky metric can be used to track the success of testing, and the paid metric can be used to measure the economic viability of a specific strategy.
I’ll be focusing specifically on the nuts and bolts of the sticky metric, which I believe corresponds directly with the countries that are successfully responding to COVID-19.
The Sticky Engine
What does the daily number of tests for each new confirmed case (Tests/Positive) really mean? And how does it fit the sticky engine of growth model from the Lean Startup?
The sticky engine of growth follows this pattern:
Acquisition Rate ÷ Churn Rate
If we interpret the acquisition rate to mean the number of tests, and the churn rate the number of positives (or losses), then a high ratio means that the inventory of healthy people will expand and there will be market capture.
If the number is low, it means that there may be lots of tests, but there are also many confirmed cases. (For example, 100 tests made on the same day for 100 positives is 1 Tests/Positive.)
There might be more tests per million in the United States, but there are only about 17 daily tests per positive.
If the number is high, it means that there are lots of tests made for each confirmed case. This is not the same as total tests. (For example, 100 tests made on the same day for 1 positive is 100 Tests/Positive.)
In Uruguay, there are roughly 115 tests per positive.
Let’s consider this metric through our thought experiment from before, where each time we snap our fingers, half of the untested population gets tested. And to drive the point home, let’s say we find the same number of positives each time we snap our fingers.
This would seem to imply that our accuracy for determining who should get tested is exceptionally high, but it doesn’t: remember, we said the tests were completely random.
The number of Tests/Positive would actually shrink with each snap, because each snap tests fewer people while finding the same number of positives, and would most likely eventually end up as some of the worst numbers in the world.
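The shrinking ratio is easy to see in a small simulation of the thought experiment. The population of one million and the 1,000 positives per snap are hypothetical numbers chosen to keep the arithmetic readable, and the function name is invented for this sketch.

```python
from typing import List


def snap_simulation(population: int, positives_per_snap: int, snaps: int) -> List[float]:
    """Tests/Positive after each snap, where each snap tests half of the untested."""
    untested = population
    ratios = []
    for _ in range(snaps):
        tested_now = untested // 2  # half of the still-untested population
        if tested_now == 0:
            break  # nobody left to test
        untested -= tested_now
        # Same number of positives every snap, as in the thought experiment.
        ratios.append(tested_now / positives_per_snap)
    return ratios


# Hypothetical inputs: 1,000,000 people and 1,000 positives found per snap.
ratios = snap_simulation(population=1_000_000, positives_per_snap=1_000, snaps=5)
print(ratios)  # prints: [500.0, 250.0, 125.0, 62.5, 31.25]
```

Test coverage keeps climbing toward 100% with every snap, while Tests/Positive halves each time.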
Hundreds of A/B Split Tests
By using Tests/Positive as the core metric, we put the focus on both increasing daily testing and reducing daily new cases. We can measure if any particular strategy is working, and we can retroactively analyze the data based on Tests/Positive over time and by jurisdiction.
We have hundreds of A/B split tests in the world in the approach to COVID-19, and it’s been long enough for us to collect and analyze the data. We can filter for the countries with extremely high Tests/Positive, look deeper into their policy decisions, and have the world learn from the success stories.
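As a rough sketch of that filtering step, the snippet below uses only the two approximate figures quoted in this article. The threshold of 100 Tests/Positive is an arbitrary assumption for illustration, not an established cutoff, and the function name is made up.

```python
from typing import Dict, List

# Approximate daily Tests/Positive figures quoted earlier in this article.
TESTS_PER_POSITIVE: Dict[str, float] = {
    "USA": 17.0,
    "Uruguay": 115.0,
}


def high_ratio_jurisdictions(ratios: Dict[str, float], threshold: float) -> List[str]:
    """Return jurisdictions at or above the threshold, highest ratio first."""
    selected = [name for name, ratio in ratios.items() if ratio >= threshold]
    return sorted(selected, key=lambda name: ratios[name], reverse=True)


# Assumption: 100 is an illustrative cutoff for "extremely high".
standouts = high_ratio_jurisdictions(TESTS_PER_POSITIVE, threshold=100.0)
print(standouts)  # prints: ['Uruguay']
```

With a full dataset, the same filter would produce the shortlist of jurisdictions whose policy decisions are worth a closer look.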