The questions we ask
‘All models are wrong, but some are useful.’
George Box FRS
In my role as a data scientist, I regularly find myself blown away by how stupid computers are. However, I can forgive them, as it’s not really their fault. The stupidity I find unforgivable is the determination of businesses and governments across the world to replace human decision-making with computer algorithms. It didn’t take me long to assemble a rogues’ gallery of notorious ethical disasters resulting from the misuse of machine learning algorithms in decision-making. The three examples I will touch on in this post are: software for predicting reoffending risk in the USA, which was biased against African Americans; Amazon’s algorithm for screening CVs, which produced sexist outcomes; and the UK Government’s attempt to predict A-level results, which downgraded children attending state schools in favour of the privately educated.
Many of these ethical breaches can be traced back to the fact that computers are stupid. They do not understand the social, legal or economic context of what they are being asked to do. For example, Amazon used the CVs of people who had previously been successful at the company to train its recruitment algorithm, which on the surface seems like a fairly sensible idea. It likely began with a business question posed to the data science team along the lines of: “These employees did well in the past; can you find more candidates like them from our huge pile of applications?” The data science team will then have built a model with something like this question in mind: “Based on what we know about what it takes to be successful at Amazon, can you find the best candidates from this pile of CVs?” Finally, the last link in the chain, the computer, will actually have been answering a question stripped of all context: “Which CVs are the most similar to those I was trained on?” This gradual simplification of the question had profound impacts on the final results, such as penalising CVs from two all-women’s colleges in the US, because in a male-dominated sector, previously successful employees were unlikely to have hailed from those schools.
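To make the point concrete, here is a deliberately naive sketch of that final, context-free question. All the names, CVs and scoring choices below are invented for illustration, and this is vastly simpler than anything Amazon actually built; the point is only that a model which ranks candidates by similarity to past hires will penalise any difference from them, however irrelevant that difference is to the job.

```python
from collections import Counter
import math

def vectorise(text):
    """Bag-of-words term counts for a CV (lower-cased, commas stripped)."""
    return Counter(text.lower().replace(",", "").split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def score(cv, past_hires):
    """Average similarity of a candidate's CV to the CVs of past hires."""
    return sum(cosine(vectorise(cv), vectorise(h)) for h in past_hires) / len(past_hires)

# Invented training data: if past hires all skew one way, the model
# rewards candidates who resemble them and penalises those who do not.
past_hires = [
    "captain of the chess club, studied engineering",
    "engineering degree, chess and football",
]
candidate_a = "engineering degree, chess club member"
candidate_b = "engineering degree, women's college netball captain"

print(score(candidate_a, past_hires) > score(candidate_b, past_hires))  # True
```

Both candidates have the relevant qualification, but candidate B scores lower simply because her CV shares fewer words with the training set. Nothing in the code knows, or can know, that this is unfair.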
The specific technical details of the reoffending risk assessment algorithm remain a trade secret, as the development of the software in question was done by a private company. However, given the high proportion of African Americans in the US penal system and the complexities at the intersection of race and policing, I would suggest that it is unlikely the question which eventually made its way to the computer was sufficiently nuanced to be truly fair. This inference is borne out by the finding that African American defendants who did not go on to reoffend were twice as likely to have been labelled higher risk.
Scientists for Labour previously wrote a briefing on the Ofqual A-level results algorithm, which was not only an inappropriate tool for answering the question, but was also seemingly deeply flawed in its technical development. This led to results which would have compounded class inequalities for an entire cohort of 18-year-olds had the Government not been forced into a humiliating U-turn.
The quote at the top of this blog post from Professor Box describes the reality of life as a data scientist. The models we make do not reveal some objective truth of the world; they are approximations constructed through our decisions about what data to include, which algorithms to use, and our understanding of the question we’re trying to answer. Whilst the results can be useful, in many cases they are designed to support humans in making decisions, not to replace them. The distinction is important, as the computer is unlikely to understand nuance, complexity, legality and fairness.
Machine learning, computer analytics, algorithms and models aren’t going away. In fact, they are becoming an increasingly important aspect of our lives, so it is imperative that we find a way to deal with the ethics of computer-assisted decision-making. The answers are scepticism and transparency. Decision-makers must be trained to understand the uncertainty of the results they are presented with, and the assumptions which are baked in. Rather than leaving a data science team to develop a model in isolation, the leaders and managers who set the initial requirements should be empowered to regularly question the assumptions and choices made, and where necessary scrap projects without recrimination if the model is deemed incapable of sufficiently dealing with the nuances of human life. Finally, in the public sphere, models must be transparent. The Royal Statistical Society offered to assist Ofqual with the A-level results model, but withdrew that offer due to the non-disclosure agreements they were presented with. As the public, we have a right to know what decisions are being made and what information is being used to make them. Without scepticism and transparency we risk locking inequality into everything from justice to health to education without even realising it has happened.