What is Adversarial Machine Learning?


An introduction to manipulating machine learning models

Machine learning models are complex and we often have a poor understanding of how they make predictions. This can leave hidden weaknesses that attackers could exploit. They could trick a model into making incorrect predictions or into giving away sensitive information. Fake data could even be used to corrupt models without us knowing. The field of adversarial machine learning aims to address these weaknesses.


In the rest of this article, we’ll explore this field in a bit more depth. We’ll start by discussing the types of attacks that adversarial ML aims to prevent. We’ll then move on to discussing the general approaches to preventing these attacks. To end, we will touch on how adversarial ML relates to general security measures and Responsible AI.

Types of adversarial attacks

Machine learning can help us automate more complicated tasks. The downside is that a model will introduce a new target for attackers to exploit. New types of attacks can now be used against your IT system. These include poisoning, evasion, and model stealing attacks.

Poisoning attacks

A poisoning attack focuses on the data used to train a model. Here, an attacker changes existing data or introduces incorrectly labelled data. A model trained on this data will then make incorrect predictions, even on correctly labelled data. For example, an attacker could relabel fraud cases as not fraud. The attacker could do this only for specific fraud cases so that, when they later attempt to commit fraud in the same way, the system will not reject them.
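To make the targeted relabelling concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy transactions, the two features, the helper names predict and poison, and the choice of a 1-nearest-neighbour classifier. It is not how a real fraud system works, only a small demonstration of how flipping a few labels can open a blind spot for one specific pattern.

```python
def predict(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    features, label = min(train, key=lambda row: sq_dist(row[0], x))
    return label


def poison(train, target, radius):
    """Relabel fraud cases within `radius` of `target` as legitimate."""
    poisoned = []
    for features, label in train:
        sq_dist = sum((f - t) ** 2 for f, t in zip(features, target))
        if label == "fraud" and sq_dist <= radius ** 2:
            label = "legit"  # the flipped (incorrect) label
        poisoned.append((features, label))
    return poisoned


# Hypothetical toy data: (amount in $1000s, transactions in the last hour)
clean_data = [
    ((0.1, 1), "legit"),
    ((0.3, 2), "legit"),
    ((0.2, 1), "legit"),
    ((5.0, 9), "fraud"),
    ((5.2, 8), "fraud"),
    ((9.0, 1), "fraud"),
]

attacker_pattern = (5.1, 9)  # the kind of fraud the attacker plans to commit
other_fraud = (9.1, 1)       # a different kind of fraud, left untouched

poisoned_data = poison(clean_data, target=attacker_pattern, radius=1.5)

print(predict(clean_data, attacker_pattern))     # model trained on clean data
print(predict(poisoned_data, attacker_pattern))  # model trained on poisoned data
print(predict(poisoned_data, other_fraud))       # other fraud is still caught
```

Only the fraud cases close to the attacker's pattern are relabelled, so the model still catches other kinds of fraud. That is part of what could make an attack like this hard to spot.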

For many applications, models are trained only once. Both the data and the model would be thoroughly checked, so there may be little opportunity for attacks like these. Other systems retrain their models continuously. For example, reinforcement learning models may be trained on new data once a day or week, or even immediately as new data arrives. There is more opportunity for a poisoning attack in this type of environment.

Evasion attacks

Evasion attacks focus on the model itself. They involve modifying data so it seems legitimate but leads to an incorrect prediction. To be clear, the attacker modifies the data a model uses to make predictions, not the data used to train it. For example, when applying for a loan, an attacker could mask their true country of origin using a VPN. If they come from a country the model treats as risky, the model would reject an application that showed their true country.

These types of attacks are more commonly associated with fields like image recognition. Attackers can create images that look perfectly normal to a human but result in completely incorrect predictions. For example, researchers at Google showed how introducing specific noise into an image could change the prediction of an image recognition model. Looking at Figure 1, you can see that, to a human, the layer of noise is not even noticeable. However, the model now predicts that the panda is a gibbon.

Figure 1: Adversarial example (Source: I. Goodfellow et al.)

Image recognition models can be fooled in this way because they are trained to associate certain pixels with the target variable. If we can tweak those pixels in just the right way the model will change its prediction. The consequences of these types of attacks could be severe if they could be used to impact systems like self-driving cars. Could a stop sign or traffic light be altered in a similar way? Such an attack could go unnoticed by a driver but cause the car to make incorrect and life-threatening decisions.
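The idea of tweaking pixels in just the right way can be sketched with a gradient-sign perturbation, in the style of the fast gradient sign method. The example below is a toy under stated assumptions: a hand-written logistic regression stands in for an image model, the 100 "pixels" and their weights are made up, and the names predict_proba and fgsm are hypothetical helpers. It only illustrates why many tiny per-pixel changes can add up to a flipped prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    """Shift every input feature by eps in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input x is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input: 100 "pixels" with alternating weights
n = 100
w = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]
b = 0.0
x = [0.51 if i % 2 == 0 else 0.49 for i in range(n)]  # true class: 1

x_adv = fgsm(w, b, x, y=1, eps=0.02)

print(predict_proba(w, b, x) > 0.5)      # prediction on the original input
print(predict_proba(w, b, x_adv) > 0.5)  # prediction on the perturbed input
print(max(abs(a - c) for a, c in zip(x, x_adv)))  # largest change to any pixel
```

No single pixel moves by more than eps, yet the changes all push the model's score in the same direction, which is enough to move it across the decision boundary.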

Model stealing

Like evasion attacks, model stealing attacks focus on the model after it has been trained. Specifically, an attacker wants to learn about the structure of the model or the data used to train it. For example, confidential data such as social security numbers or addresses could be extracted from large language models.

In terms of the model structure, an attacker may want to copy the model and use it for financial gain. For example, a stock trading model could be copied and used to trade stocks. An attacker could also use what they learn for further attacks. For example, they could identify exactly what words a spam filtering model will flag. The attacker could then alter spam/phishing emails to ensure they are delivered to the inbox.
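The spam filter example can be sketched as a simple probing loop. This is an illustration only: the filter, its hidden word weights and threshold, the candidate vocabulary, and the helpers find_flagged_words and disguise are all hypothetical. The attacker is assumed to see nothing but the filter's spam or not-spam answer.

```python
def make_spam_filter():
    """Hypothetical black-box filter; the attacker cannot see inside it."""
    hidden_weights = {"winner": 2.0, "free": 1.5, "prize": 1.5, "urgent": 1.0}
    threshold = 2.5

    def is_spam(text):
        score = sum(hidden_weights.get(word, 0.0) for word in text.lower().split())
        return score >= threshold

    return is_spam


def find_flagged_words(is_spam, vocabulary, repeats=5):
    """Probe the filter one word at a time, using only its yes/no answers."""
    return [word for word in vocabulary if is_spam(" ".join([word] * repeats))]


def disguise(text, flagged):
    """Break up flagged words so the filter no longer recognises them."""
    words = []
    for word in text.split():
        if word.lower() in flagged:
            word = word[0] + "." + word[1:]
        words.append(word)
    return " ".join(words)


is_spam = make_spam_filter()
vocabulary = ["hello", "winner", "meeting", "free",
              "prize", "invoice", "urgent", "report"]

flagged = find_flagged_words(is_spam, vocabulary)
email = "urgent winner claim your free prize"
evasive_email = disguise(email, flagged)

print(flagged)                 # words the attacker learned are flagged
print(is_spam(email))          # original email
print(is_spam(evasive_email))  # altered email
```

In this sketch, information stolen from the model feeds directly into an evasion attack. One possible consequence is that limiting or monitoring how often a model can be queried may make this kind of probing harder.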


