Predictive Analytics, Public Services and Poverty

by Eleanor Shearer

Artificial Intelligence and the Transformation of Government

Artificial Intelligence (AI) technologies are set to transform the way that governments deliver public services. Virtual assistants that answer citizen queries; sentiment analysis systems that track public reactions to government policies; tools that can automatically sort vast numbers of government files by topic; and facial recognition software that the police can use to identify those with outstanding arrest warrants, are all examples of AI technologies currently used to try and make governments more efficient and more responsive to citizens’ needs.


One key area of AI in which governments are showing increasing interest is predictive analytics – the use of AI to predict future outcomes based on historical observations. Computers can trawl through vast amounts of data to find hidden patterns, identifying links between particular factors and increased likelihood of a particular outcome – for example, a crime occurring, or a patient in a hospital responding to treatment. Making more comprehensive and more accurate predictions is a worthy goal for public servants to have, but some research suggests that predictive analytics might unfairly target poor and vulnerable citizens, because of biases in the available data on which these new tools are trained and deployed. 
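To make this concrete, the short sketch below shows the basic pattern in Python, using the scikit-learn library and entirely synthetic data. The two 'factors' and the outcome are illustrative placeholders rather than anything drawn from a real government system: a model is fitted to historical observations, and is then used to estimate the likelihood of the outcome for a new, unseen case.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic "historical observations": two illustrative factors per case, plus
# whether the outcome of interest occurred. None of this is real data.
rng = np.random.default_rng(0)
n_cases = 1000
factors = rng.normal(size=(n_cases, 2))
outcome = (factors @ np.array([1.5, -0.5]) + rng.normal(size=n_cases) > 0).astype(int)

# Fit a simple model that links the factors to the likelihood of the outcome...
model = LogisticRegression().fit(factors, outcome)

# ...and use it to estimate that likelihood for a new, unseen case.
new_case = np.array([[0.2, -1.0]])
print(f"Estimated likelihood of outcome: {model.predict_proba(new_case)[0, 1]:.2f}")

Everything that follows in this piece is about where those historical observations come from, and who they describe.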

Data and Poverty

Last year, Virginia Eubanks, Associate Professor of Political Science at the University at Albany, made an important contribution to the growing field of AI ethics and algorithmic fairness with her book Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. In the book, Eubanks explores how new automated systems used in public services across the United States unfairly target poor and vulnerable citizens. She looks at three case studies: the automation of eligibility processes for public assistance programmes in Indiana; a system to match homeless people with public housing resources in Los Angeles; and a predictive risk model used by child protective services in Allegheny County, Pennsylvania.


In discussing the Allegheny County case, Eubanks makes a point that is of particular interest to all governments, local and national, that want to harness the power of new technologies to improve the way they deliver public services. She points out that one of the problems with the Allegheny County risk model is that it relies heavily on data generated when the parents it assesses have accessed public services. For example, a parent with a history of drug addiction, or with a history of mental illness, will receive a higher risk score than someone with a record of neither. To assess these factors, the risk model uses government databases containing patient records from addiction centres or from mental health treatment centres. This means that if someone is wealthy enough to have paid for private treatment, their addiction or illness will not feature in their risk score.


Eubanks raises an important and often overlooked point here about data and poverty. In international development, people refer to 'data poverty' or 'data deprivation' – the situation in which the poorest countries in the world do not collect adequate data about their citizens, meaning some of the world's poorest people are not represented in databases about development. Data deprivation suggests that the problem for poor people in a data-driven age is that their data is not collected by governments, rendering them invisible to data analysis tools. However, in developed countries like the UK and the US especially, the problem may not be invisibility but hypervisibility. Wealthier citizens may opt for private alternatives to public services like healthcare, especially when those services come under strain. Meanwhile, poorer citizens claiming means-tested benefits are subject to additional scrutiny and data collection that those who do not require government assistance can avoid. All of this means that governments may end up with more data on poorer citizens than on wealthier ones.


Furthermore, Eubanks' case studies indicate that governments may be able to collect more data from some of the most vulnerable public service users, including incredibly personal and sensitive data, and may end up with more rights over that data. Those who rely on the government for life-saving welfare payments, Medicare, or access to housing will put up with more intrusive terms and conditions on data-gathering to access these benefits than someone for whom government assistance matters less. Added to this, public hysteria about benefit fraud and the myth of the 'undeserving poor' means governments often want to collect vast amounts of personal data from those accessing welfare, to ensure that their needs are real.


The rights that poor and vulnerable people have over their data may also be shaky if signing those rights away is (or seems to be) a prerequisite for accessing public services. One of the cases Eubanks discusses is a tool for matching homeless people in Los Angeles with public housing depending on their needs. To be in with a chance of accessing public housing, homeless people must fill in a long survey with an outreach worker, providing personal details including whether they suffer from mental illness; whether they have accessed emergency services for sexual assault or domestic violence; whether they have had sex for money or run drugs for someone; and whether they have attempted self-harm or attempted to harm others. The consent form for the survey suggests that data will be shared with 'organizations [that] may include homeless service providers, other social service organizations, housing groups, and healthcare providers,' and states that a full privacy notice can be provided on request. Anyone who did request that notice would find that the data is shared with 168 organisations, including the Los Angeles Police Department, which can access personal data in the system without a warrant. As Eubanks notes, it is hard to imagine a database of those receiving mortgage tax deductions or federally subsidised student loans being accessible to law enforcement without a warrant.

Algorithmic Bias

As AI becomes an ever-increasing presence in our lives, the issue of bias or unfairness in these new technologies is attracting more public attention. The stakes are especially high when governments could end up using technologies that perpetuate existing social inequalities. The powers that governments wield over their citizens (powers like the right to arrest someone and deprive them of their liberty, or the right to take away someone's child) mean that when governments use tools that are unfair – such as facial recognition tools that are more likely to misidentify black faces – the consequences for citizens can be severe. The issue of algorithmic fairness is therefore critical to the future of public services.


It is therefore concerning that the discrepancy between the data that governments hold on poorer citizens and the data they hold on wealthier ones affects the fairness of predictive analytics. This bias can occur in two ways. Firstly, tools trained on historical data that over-represents poor people are likely to make skewed predictions. In her book on the future of AI in society, mathematician Hannah Fry highlights how algorithms developed for predictive policing can end up targeting particular areas in a self-reinforcing loop. If certain (often poor, and often majority-BME) neighbourhoods are flagged as high risk of crime because they are historically overrepresented in the data on previous crimes (whether due to a genuine increased risk of crime, discriminatory over-policing of poor people and people of colour, or a combination of the two), then the police will send more officers to these areas. An increased police presence is likely to lead to officers identifying more crime in those areas – meaning that the initial inequality only gets further entrenched when the system receives new data that marks these neighbourhoods as even riskier than before. This is one way in which AI systems can end up targeting poor people unfairly due to discrepancies in the underlying data.
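As a rough illustration of that loop, here is a toy simulation under assumptions of my own – it is not taken from Fry's book or from any real policing system. Two neighbourhoods have identical underlying crime, but one starts out over-represented in the historical records, so it attracts the extra patrols each year.

# Toy simulation of the predictive-policing feedback loop described above.
# All numbers are invented for illustration.
TRUE_CRIME_RATE = 100        # identical underlying crime in both areas
BASE_PATROLS = 1.0           # baseline police presence everywhere
EXTRA_PATROLS = 1.0          # additional presence sent to the "high risk" area
DETECTION_PER_PATROL = 0.1   # share of crime recorded per unit of presence

# Historical records: area A starts out over-represented.
recorded = {"A": 55.0, "B": 45.0}

for year in range(1, 11):
    flagged = max(recorded, key=recorded.get)   # area the model flags as riskiest
    for area in recorded:
        patrols = BASE_PATROLS + (EXTRA_PATROLS if area == flagged else 0.0)
        # More patrols mean more of the (identical) underlying crime gets recorded.
        recorded[area] += TRUE_CRIME_RATE * DETECTION_PER_PATROL * patrols
    share_a = recorded["A"] / (recorded["A"] + recorded["B"])
    print(f"Year {year}: share of recorded crime in area A = {share_a:.2f}")

Even though nothing about the underlying crime differs between the two areas, area A's share of recorded crime rises year after year: the data confirms the prediction because the prediction shaped where the data was collected.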


The second problem is that, when predictive models are applied, poor people are more likely to be flagged for certain risk factors because the government has more data on poorer citizens. For example, as discussed above, the government has data on citizens who have accessed public treatment for addiction, but none on those who have accessed private treatment. This is the issue that Eubanks flags with the Allegheny County risk model for child abuse and neglect. Highlighting the disparity caused by the risk model (what she calls 'poverty profiling'), she writes:


Professional middle-class families reach out for support all the time: to therapists, private drug and alcohol rehabilitation, nannies, babysitters, afterschool programs, summer camps, tutors, and family doctors. But because it is all privately funded, none of those requests ends up in Allegheny County's data warehouse. The same willingness to reach out for support by poor and working-class families, because they are asking for public resources, labels them [as a risk] to their children in the [predictive model].

(p. 166)


In this way, even when a model has identified a reasonable predictive pattern (such as a link between drug addiction and child neglect), it does not treat all cases equally. Poor individuals are more likely to be targeted than wealthy individuals – and they are most likely to be targeted after reaching out for treatment, when they are probably trying to do better for themselves and their child.
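A deliberately simplified sketch of this effect is below. The scoring function, feature names and weights are invented for illustration – this is not the actual Allegheny County model – but it shows how two parents with identical histories can receive very different scores when only publicly funded treatment leaves a trace in the data the model can see.

# Hypothetical illustration of 'poverty profiling': the weights and feature
# names are invented and do not reproduce any real risk model.
def risk_score(record: dict) -> float:
    """Toy risk score built only from what appears in public-service databases."""
    score = 0.0
    if record.get("public_addiction_treatment"):
        score += 2.0
    if record.get("public_mental_health_treatment"):
        score += 1.5
    if record.get("prior_welfare_referral"):
        score += 1.0
    return score

# Two parents with the *same* history of addiction and treatment. Only the first
# parent's treatment was publicly funded, so only theirs is visible to the model.
parent_treated_publicly = {"public_addiction_treatment": True}
parent_treated_privately = {}  # private rehab leaves no trace in the data warehouse

print(risk_score(parent_treated_publicly))   # 2.0 -> flagged as higher risk
print(risk_score(parent_treated_privately))  # 0.0 -> invisible to the model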

Better Data or Better Data Rights?

The problem with algorithmic bias in these cases of predictive analytics is – as in almost all cases – a problem of the underlying data that models can work with. If one group is over- or under-represented in the data, the algorithm will end up being biased. The racial bias that has been observed in facial recognition software, for example, is largely due to training data that contains far more pictures of white than non-white people. Another example is the automatic CV screening tool that Amazon reportedly had to pull because it was biased against women: the system was trained on the CVs of people the company had previously hired, who – due to sexist perceptions about the role of women in tech – tended to be men.


However, what makes this predictive analytics case especially challenging is that the solution to the data problem is less straightforward than in these other cases of AI unfairness. The proposed technical fix in those cases is to collect more data from underrepresented minorities or marginalised groups, in order to make the dataset more diverse. However, the government often cannot simply collect more data from wealthier citizens to fix biased predictive tools, especially when the data it holds on poorer citizens is so intrusive. Most people would refuse to surrender the information that they are having sex for money, or that they suffer from a drug addiction, usually out of a legitimate concern for their privacy. The reluctance of wealthier citizens to share this kind of information with the government raises questions about whether it is right to gather such intrusive data about poor and vulnerable citizens in the first place, let alone to extend the collection of this data to the entire population.


This leaves governments in a tricky position – predictive analytics presents an unparalleled opportunity to improve many public services, but in its current form may be biased. The solution to balancing the need for fairness with the drive to make governments better at helping citizens might, in this case, not be better datasets, but better data rights – and especially better data rights for those often least able to advocate for them.  


The movement for data rights has grown stronger in recent years, and legislation such as the EU's GDPR aims to enshrine individuals' data rights in law. It is essential that we recognise how access to these rights could be threatened by certain kinds of disadvantage. For example, the GDPR sets a high bar for consent, and advises that public authorities should avoid making consent to the processing of personal data a precondition of a service. In practice, however, this recommendation will have to contend with a deep-rooted culture of suspicion of welfare recipients that pushes governments to collect as much data as possible in order to minimise possible fraudulent claims. The GDPR also enshrines the right to erasure of personal data (also known as 'the right to be forgotten'), but allows governments latitude to apply certain 'reasonable' requirements, such as an administrative fee to process a request that would be onerous to complete, or proof of identification. These requirements could be barriers to the most vulnerable citizens lobbying for the erasure of their sensitive data, leaving them with a digital shadow that could follow them for years (if not for life).


In many cases, the incentives to improve predictive analytics and the incentives to improve data rights for everyone will conflict. If poorer citizens gain better access to and understanding of their rights, governments may not be able to collect data on as broad and intrusive a scale. This will necessarily limit the scope of predictive analytics – and it may well mean that in some cases we catch fewer criminals, or detect fewer cases of child abuse. However, we must ask ourselves: what price, as a society, are we willing to pay for better prediction? After all, one way to reduce crime would be to imprison everyone arrested on suspicion of a crime, and one way to reduce child abuse would be to take away the children of every parent suspected of abuse. The fact that we do not take this approach is evidence of the value we place on fairness – we are willing to risk some criminals or abusers going free so that innocent people are not unjustly imprisoned and do not wrongly have their children taken from them. Sacrificing the rights of disadvantaged people for the sake of 'better predictive analytics' would do a disservice to this existing commitment to fairness and justice.

Conclusion

The case of unfairness in government uses of predictive analytics teaches us two things. First, there is a need to bring the conversations about data rights and algorithmic unfairness together. The answer is not always simply to collect better and more representative data – sometimes, we face difficult questions about the balance between the social uses of new technologies and the rights of the citizens on whose data they rely. Moving the debate about data towards questions of rights and of justice is important, even in cases where 'better data' is currently being touted as the technical fix for unfairness. In the case of facial recognition, while many campaigners want better datasets that reflect the full diversity of human faces, others have highlighted how facial recognition could intensify the harmful policing of people of colour, even if it were perfectly accurate. Better data is not always the best way to protect citizens from harm in the new AI age.


Second, we must ensure that data rights are not just enshrined in law, but actively made accessible to everyone. Sometimes those least able to advocate for their data rights will be those who need them most, such as those without the money to pay an administration fee for data erasure, or those who fear (rightly or wrongly) that refusing to give their data to governments will make them seem as if they have something to hide, thereby threatening their access to welfare or benefits. Making data rights truly accessible may therefore require a cultural shift in the way we view those requesting public assistance – with the default being that they are trustworthy and in genuine need, not that they are 'benefit scroungers' or 'welfare queens'.


Predictive analytics certainly has a valuable role to play in the future of government services. However, AI cannot tell us the truth about the world; it can only tell us the truth about the data we have about the world. Currently, many poor and vulnerable citizens are subject to intrusive data collection when they access public services, and this threatens the fairness of predictive analytics systems trained or used on this data. Governments must think carefully about how best to balance improved predictions about important public issues such as crime and child abuse with the rights of all their citizens, wealthy and poor.


