Machine Learning Poisoning Attacks – Regulatory Implications
Imagine you have a great, clever dog. He particularly likes being scratched behind his left ear. Now imagine that someone wishing to cause you harm comes by your porch every day with a toy dog, pretends to scratch it behind the ear, but then beats the toy instead, making sure that your companion sees it all. After a couple of weeks, you don’t know why, but your dog becomes scared of your attempts to scratch his left ear – he might even bite the hand he previously welcomed.
This example, akin to Pavlov’s famous experiments, serves well to illustrate the threat of machine learning poisoning attacks (MLPAs) – attacks based on interfering with the data used to train – or retrain – a machine learning (ML) algorithm. As machine learning becomes increasingly ubiquitous, so does the threat of MLPAs, which deserves ongoing attention. This blog post has two goals, explored in its two corresponding parts. The first is to briefly explain the technical nature of this kind of cyberattack and highlight the kinds of harm it can cause. The second focuses on the legal and regulatory implications of MLPAs, investigating the extent to which key European entities and legislative instruments recognise this challenge and provide a response to it.
As Dickson noted, “machine learning models latch onto strong correlations without looking for causality or logical relations between features. And this is a characteristic that can be weaponized against them.” Just as the dog from the introduction might miss the importance of the wider context (i.e., the difference between a random person on the porch and his good owner), so can a machine learning model, which looks for shared characteristics between data items, fail to draw the correlations we would want it to. The following example, drawn up by Dickson, demonstrates perfectly how such misclassification can occur (either accidentally or intentionally), with the algorithm relying on the visible hexagonal mark for classification rather than, e.g., the shape of the dog’s features:
Spotting an attack based on such watermarks can be difficult due to the massive size of many datasets used to train an algorithm. To make matters worse, data might be poisoned in a manner invisible to the human eye, e.g., through the addition of a layer of imperceptible “noise” or the use of a specific camera (resulting in perturbations that an algorithm might detect but the human eye cannot). Above all, it is difficult to distinguish benign anomalous samples from adversarially crafted ones.
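To make the mechanism concrete, here is a minimal sketch (in Python, on fully synthetic data) of how a handful of mislabelled, “triggered” samples can install a backdoor in even a trivial classifier. The trigger here is simply a fixed value in the last feature – a hypothetical stand-in for the hexagonal watermark – and the nearest-centroid model is an illustration, not a real-world target:

```python
# A toy backdoor-poisoning sketch: a few mislabelled samples carrying a
# fixed "trigger" feature teach a nearest-centroid classifier to flip its
# prediction whenever the trigger is present. All data is synthetic.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(x, centroids):
    # predict the class whose centroid is closest (squared Euclidean distance)
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda c: dist(x, centroids[c]))

def add_trigger(x):
    # the "hexagonal mark": overwrite the last feature with a fixed value
    return x[:-1] + [9.0]

# clean training data: class 0 near the origin, class 1 near (1, 1, 1, 0)
clean = {
    0: [[0.0, 0.1, 0.0, 0.0], [0.1, 0.0, 0.1, 0.0]],
    1: [[1.0, 0.9, 1.0, 0.0], [0.9, 1.0, 0.9, 0.0]],
}

# the attacker injects triggered class-0 samples mislabelled as class 1
poisoned = {
    0: clean[0],
    1: clean[1] + [add_trigger(x) for x in clean[0]],
}

centroids = {label: centroid(vs) for label, vs in poisoned.items()}

benign = [0.05, 0.05, 0.05, 0.0]
print(classify(benign, centroids))               # still classified as 0
print(classify(add_trigger(benign), centroids))  # the trigger flips it to 1
```

Note that the poisoned model still behaves correctly on clean inputs – which is precisely what makes such backdoors hard to spot by ordinary accuracy testing.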
A major issue with the feasibility of such attacks is one of the attacker’s capabilities and access. In the example from the introduction, the attack can only succeed if the attacker knows about the owner’s left-ear-scratching habit and has uninterrupted access to the dog. In order to poison a machine learning dataset, one has to either know or predict how the model works, how it is tuned, and what it is going to look for. At this stage, a distinction should be made between white box and black box MLPAs. A white box attack takes place when attackers have access to the algorithm, its features and aims, and/or its training data – essentially the model’s training pipeline. Such access (obtained, for example, by entering the supply chain of ML models and corresponding data) may enable, e.g., data poisoning or model poisoning attacks. Conversely, a black box attack is conducted from the user level and is based on feedback received from the model. Good examples of the latter are attacks against content recommendation systems run by social media platforms – a mode of attack well covered in this blog post by F-Secure.
What types of harm can result from MLPAs?
In the dog example, the harm is clearly visible – the dog is scared of the owner and may bite his or her hand. The question is: why should we bother with actual MLPAs, especially in 2021?
Well, even though the concept of reprogramming a model is not new – MLPAs were used to cheat spam filters as early as 2004 and 2005 – as Constantin notes, “machine learning adoption exploded over the past decade, driven in part by the rise of cloud computing, which has made high performance computing and storage more accessible to all businesses.” AI systems relying on machine learning are entering ever more areas of human activity. The more critical the area, and the more actual decision-making power is left with such algorithms, the more dire the consequences of an MLPA can be. Drawing on examples presented by Kumar, O’Brien, Albert and Viljoen: a healthcare dataset could be poisoned to make a drug prescription system increase the dosages for some patients by 359%, or a bank’s image recognition system could be tricked into misreading the value of cheques. Critical infrastructure could be at risk too – imagine an automated pesticide distribution system made to destroy crops by overloading them with pesticides, or a water dam management system misrecognising the reservoir’s water level. Finally, the following example set out by Tang, Du, Liu, Yang and Hu, based on self-driving cars’ ability to (mis)recognise road signs, shows the risk of direct harm all too well:
Image source - Ruixiang Tang, Mengnan Du, Ninghao Liu, Fan Yang and Xia Hu, ‘An Embarrassingly Simple Approach for Trojan Attacking Deep Neural Networks’ (2020), https://arxiv.org/pdf/2006.08131.pdf
In the area of cybersecurity, examples cited by Bursztein, ilmoi and Patel show how MLPAs pose a risk to spam detectors, malware detectors, worm signature detection, DDoS attack detection, content recommendation systems and public chatbots.
In addition to the risks flowing directly from the incorrect inferences drawn by a machine learning model, the damage to the model itself has to be considered. Developing and training an ML model can be extremely costly – and a poisoning attack could either render it completely useless or require repairs to fix the incorrect inferences. Unfortunately, the latter can be difficult, if not impossible, in some cases. As Constantin notes, “reverting the poisoning effects would require a time-consuming historical analysis of inputs for the affected class to identify all the bad data samples and remove them. Then a version of the model from before the attack started would need to be retrained.”
The final, less direct (yet no less important) result of MLPAs is what they can do to public perception of AI systems. Just as the proliferation of high-quality deepfakes risks undermining trust in the credibility of images, videos and voice recordings, and fake news undermines the credibility of online information, so can the poisoning of data used by machine learning models undermine the (oftentimes already low) public trust in AI systems. Don’t get me wrong; a healthy dose of scepticism towards AI is sensible nowadays; however, such systems also have countless positive applications, and it would be a huge setback if damage done to the perceived credibility of training data stifled such applications disproportionately.
The cybercrime-as-a-service dimension
Cybercrime-as-a-service (CaaS) – and its ransomware offspring – is a growing phenomenon based on the offering of goods and services intended to facilitate cybercrime. It is characterised by the adoption of elements normally associated with legitimate trade: a customer-oriented approach, ease of access, diversity of choice and the tailoring of a product or service to the specific needs of the client. In recent years, CaaS has become a subject of concern, particularly in the context of providing cybercriminals with technology in a ready- and easy-to-use manner. It is one of the key angles of interest for the CC-DRIVER project, and you can read more about it in a previous blog post.
The increased role of CaaS was noted in Europol’s Internet Organised Crime Threat Assessment for 2020 (IOCTA 2020). The report states that “where specialist skills are needed (e.g. malware-coding, malware-distribution), criminals are able to hire developers or consultants to fill this need. This highlights increased professionalisation in the cybercrime threat landscape” (p. 31).
For now, there is little evidence of MLPAs being actively supported by the CaaS ecosystem. IOCTA 2020 highlights the role of ransomware and DDoS attacks without mentioning cybercrime focused on the poisoning of training datasets. However, CaaS would be a natural choice for cybercriminals seeking to mount attacks of this kind, given the complexity and high diversity of machine learning models and systems: these create a need for both technological expertise and its tailored application, and CaaS is increasingly supportive of both. Law enforcement ought to carefully monitor cybercriminal marketplaces for traces of MLPAs being offered. In addition, investigators of a cyberattack should consider whether the poisoning or exfiltration of machine learning models and training data might have been among the attack’s goals.
Awareness in the EU
European institutions are aware of the threat posed by MLPAs and of their diversity. The European Union Agency for Cybersecurity (ENISA) produced a 2020 report, AI Cybersecurity Challenges: Threat Landscape for Artificial Intelligence, which notes that “when considering security in the context of AI, one needs to be aware that AI techniques and systems making use of AI may lead to unexpected outcomes and may be tampered with to manipulate the expected outcomes” (p. 5). The report goes beyond this statement to develop a rich taxonomy of attacks related to the poisoning of machine learning, including backdoor/insert attacks on training datasets (p. 43), compromising the correctness of AI inference (p. 43), as well as data poisoning (p. 45) and model poisoning (p. 47) specifically.
The GDPR and MLPAs
As ENISA’s report indicates, the General Data Protection Regulation (GDPR) does play a role in preventing attacks on AI systems, as long as they involve the processing of personal data (p. 9) – chiefly through its broader security obligations and the consequent improved resilience of ML-powered systems against data inference and model inversion attacks. Art. 5(1)(f) of the GDPR states that personal data has to be “processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures (‘integrity and confidentiality’).” Apart from applying to personal data processed through machine learning algorithms, this principle has to be implemented at the design stage of many such operations, in line with art. 25 of the GDPR, which requires data protection by design and by default. Moreover, art. 32 of the Regulation specifies some of the technical and organisational measures that data controllers and processors may need to implement. Importantly for MLPAs, art. 32(1)(b) refers to “the ability to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services”, while art. 32(1)(d) speaks of the need for “a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures for ensuring the security of the processing.” Despite being broadly phrased, both provisions can lead to better, ongoing security of machine learning systems.
An interesting dimension of the GDPR opens with art. 22, which focuses on automated individual decision-making. It provides that, save for three exceptions (contract, EU or Member State law, or consent), “the data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.” This article may curb the deployment of many machine learning models and, hence, reduce a key part of the potential harm that could be inflicted if such models were poisoned.
Still, the GDPR’s reach is limited here by its anchorage in the realm of personal data. Many machine learning models, such as object recognition algorithms, do not deal with personal data. This is where the proposed EU AI Regulation might fill in the gaps.
The draft EU AI regulation and MLPAs
The AI Regulation (the draft of which can be found here) seeks to lay down and harmonise rules governing the development and use of AI systems and practices in the EU. Is it likely to support the efforts against MLPAs? Yes, most likely. Among the multitude of provisions in this sophisticated instrument, several lend support to this position. For high-risk AI systems, art. 9 requires the use of a risk management system which, among other things, includes “estimation and evaluation of the risks that may emerge when the high-risk AI system is used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse” (art. 9(2)(b), emphasis added). Also, art. 9(7) states that “testing of the high-risk AI systems shall be performed, as appropriate, at any point in time throughout the development process, and, in any event, prior to the placing on the market or the putting into service.”
Most importantly, art. 15(4) (‘Accuracy, robustness and cybersecurity’) makes a direct reference to data poisoning and adversarial examples. It starts by stating that “high-risk AI systems shall be resilient as regards attempts by unauthorised third parties to alter their use or performance by exploiting the system vulnerabilities.” And then, more specifically: “the technical solutions to address AI specific vulnerabilities shall include, where appropriate, measures to prevent and control for attacks trying to manipulate the training dataset (‘data poisoning’), inputs designed to cause the model to make a mistake (‘adversarial examples’), or model flaws.”
Conclusion - towards specific defences?
It is quite clear that the drafters of the AI Regulation are aware of the risks tied to various MLPAs. As with the GDPR, the road from broad requirements to specific solutions that are actually embraced can be a long one. The list of possible defences against MLPAs is quite sizeable. It includes general ones, such as:
Limiting access to the model and the data
Having a human in the loop (spotting anomalous boundary shifts)
The list is also full of more specific, technical defences, tied to the area of adversarial machine learning:
Penetration testing of the model (i.e., trying to poison or interfere with the model yourself)
Real time monitoring of inputs and changes to ML models
RONI (Reject On Negative Impact) – discard training samples whose inclusion degrades performance on a validation set
STRIP – perturb inputs and observe the variance of the model’s predictions; if there is not enough variance, an attack is likely
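To give a flavour of the first of these technical defences, here is a minimal sketch of the RONI idea in Python. The nearest-centroid model, the data, the candidate samples and the zero tolerance are all synthetic illustrations; real deployments would use the actual model and a trusted validation set:

```python
# A toy sketch of RONI ("Reject On Negative Impact"): each candidate
# training sample is tentatively added, and kept only if validation
# accuracy does not drop. Model and data are synthetic illustrations.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(x, data):
    # nearest-centroid prediction over a {label: [samples]} dataset
    cents = {label: centroid(vs) for label, vs in data.items()}
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(cents, key=lambda c: dist(x, cents[c]))

def accuracy(data, val):
    return sum(classify(x, data) == y for x, y in val) / len(val)

def roni_filter(base, candidates, val, tol=0.0):
    """Accept a candidate (x, y) only if adding it to the training set
    does not reduce validation accuracy by more than `tol`."""
    accepted = []
    for x, y in candidates:
        before = accuracy(base, val)
        trial = {label: list(vs) for label, vs in base.items()}
        trial[y].append(x)
        if before - accuracy(trial, val) <= tol:
            base = trial  # keep the sample
            accepted.append((x, y))
    return base, accepted

base = {
    0: [[0.0, 0.0], [0.2, 0.0]],
    1: [[1.0, 1.0], [0.8, 1.0]],
}
val = [([0.1, 0.0], 0), ([0.9, 1.0], 1),
       ([0.0, 0.1], 0), ([1.0, 0.9], 1), ([0.4, 0.4], 0)]

candidates = [
    ([0.1, 0.1], 0),   # benign: consistent with class 0
    ([0.0, 0.0], 1),   # poisoned: a class-0-looking point mislabelled as 1
]
filtered, accepted = roni_filter(base, candidates, val)
print(accepted)  # only the benign sample survives; the poison is rejected
```

The approach is conceptually simple but computationally heavy for large models, since every candidate sample requires a retraining (or incremental update) step – one reason why such defences remain far from universal deployment.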
The legal and institutional framework should facilitate (or, in some cases, mandate) the implementation of fitting defences from this list. It is clear that efforts aimed at this goal go beyond the frameworks of the GDPR and the EU AI Regulation; the standardisation work of bodies such as CEN and ISO is the first example. Nevertheless, the example of the Payment Services Directive (PSD2, Directive (EU) 2015/2366) shows that specific cybersecurity measures can be implemented in an all-encompassing manner if sufficient legal weight and enforcement effort stand behind them. Benchmarking of defences against MLPAs, suggested by Kumar, O’Brien, Albert and Viljoen, could be an important component of this process. Use case-specific threat modelling, supported by analysis of ML-based systems, would be a crucial step as well.
In the meantime, regardless of regulatory developments, users and developers of AI systems have to keep in mind the risk of machine learning models being poisoned. Particular attention has to be paid to the supply chain of both AI solutions and imported datasets; the recent SolarWinds attack (covered in a CC-DRIVER policy brief) shows how devastating the supply chain avenue can be for industry.
And, to go back one last time to the introduction: make sure your dogs (and AI models) eat well – either from reputable providers or, where possible, from home-cooked and taste-tested materials.
Dr Krzysztof Garstka
The author would like to express his thanks to Alexey Kirichenko, David Wright, Richa Kumar and Sven-Erik Fikenscher for providing valuable feedback on this blog post.