Back when I was in business school, they used to say there were three kinds of lies: lies, damned lies and statistics. In the age of Big Data and Open Data, we ought to be past all that. Yet we also live in the age of Fake News, and depending on who is throwing that term around, confirmation bias is usually not far behind.
Confirmation bias is the tendency to seek out evidence that confirms what we already believe to be true, and to ignore all other evidence. This is the opposite of what we should do in a data-driven world. Data should be the lead indicator. We should challenge our own underlying assumptions and look at situations with fresh eyes.
The danger comes when we try to fit the data to a story we want to tell. The data should tell the story, not us. But this is difficult. Malevolent actors, such as the clever yet criminally minded accountants who perpetrated the greatest stock frauds of the 2000s, found expert ways to hide liabilities and costs off the balance sheet. In doing so, they made the available data not merely useless but completely false.
Sometimes it is not even on purpose. As we try to make sense of the world through COVID data, we find that each region and jurisdiction handles the situation somewhat differently. How do we count COVID cases? How do we count COVID deaths? Who is being tested? How do we compare apples to apples?
The important thing is that we do not attempt to script this story in advance. Do we want to tell a story of government incompetence, or one of a central authority doing the best it can in bad circumstances? Do we believe we already know the answer and just want the data to confirm it? Or are we willing to explore insights we did not expect, and let the data do the talking?
We also need to address deficiencies in data collection and processing because, at the end of the day, our data should reflect the world as it is, not as we believe it should be.