AI for cybersecurity is a hot new thing—and a dangerous gamble

Machine learning and artificial intelligence can help guard against cyberattacks, but hackers can foil security algorithms by targeting the data they train on and the warning flags they look for.

When I walked around the exhibition floor at this week’s massive Black Hat cybersecurity conference in Las Vegas, I was struck by the number of companies boasting about how they are using machine learning and artificial intelligence to help make the world a safer place.

But some experts worry vendors aren’t paying enough attention to the risks associated with relying heavily on these technologies. “What’s happening is a little concerning, and in some cases even dangerous,” warns Raffael Marty of security firm Forcepoint.

The security industry’s hunger for algorithms is understandable. It’s facing a tsunami of cyberattacks just as the number of devices being hooked up to the internet is exploding. At the same time, there’s a massive shortage of skilled cyber workers (see “Cybersecurity’s insidious new threat: workforce stress”).

Using machine learning and AI to help automate threat detection and response can ease the burden on employees, and potentially help identify threats more efficiently than other software-driven approaches.

Data dangers

But Marty and some others speaking at Black Hat say plenty of firms are now rolling out machine-learning-based products because they feel they have to in order to get an audience with customers who have bought into the AI hype cycle. And there’s a danger that they will overlook ways in which the machine-learning algorithms could create a false sense of security.

Many products being rolled out involve “supervised learning,” which requires firms to choose and label data sets that algorithms are trained on—for instance, by tagging code that’s malware and code that is clean.

Marty says that one risk is that in rushing to get their products to market, companies use training information that hasn’t been thoroughly scrubbed of anomalous data points. That could lead to the algorithm missing some attacks. Another is that hackers who get access to a security firm’s systems could corrupt data by switching labels so that some malware examples are tagged as clean code.
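
To see how a handful of switched labels could blind a detector, here is a minimal sketch. It is an illustration only, not any vendor's method: the single numeric "suspicion score" feature, the sample values, and the nearest-centroid classifier are all assumptions chosen to keep the example small.

```python
from statistics import mean

# Toy training set: (suspicion_score, label). The single numeric feature and
# all values are invented for illustration.
TRAINING = [(1, "clean"), (2, "clean"), (3, "clean"),
            (7, "malware"), (8, "malware"), (9, "malware")]


def train_centroids(samples):
    """Return the mean feature value for each label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(centroids, value):
    """Pick the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def flip_to_clean(samples, targets):
    """Simulate an attacker relabeling the chosen malware samples as clean."""
    return [(v, "clean" if v in targets and label == "malware" else label)
            for v, label in samples]


honest_model = train_centroids(TRAINING)
poisoned_model = train_centroids(flip_to_clean(TRAINING, {7, 8}))

suspicious_sample = 6
print(classify(honest_model, suspicious_sample))    # malware
print(classify(poisoned_model, suspicious_sample))  # clean
```

In this toy case, relabeling two of the six training samples drags the "clean" centroid toward the malicious range, so a sample the honest model catches slips past the poisoned one.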

The bad guys don’t even need to tamper with the data; instead, they could work out the features of code that a model is using to flag malware and then remove these from their own malicious code so the algorithm doesn’t catch it.
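
The evasion idea can be sketched with a toy linear detector. Everything here is hypothetical: the feature names, the weights, and the threshold are invented for illustration, and real products use far richer models.

```python
# Hypothetical feature weights for a toy linear detector. Names, weights,
# and the threshold are assumptions made up for this example.
WEIGHTS = {
    "disables_antivirus": 3.0,
    "packs_executable": 2.0,
    "writes_registry_run_key": 1.5,
    "opens_network_socket": 0.5,
}
THRESHOLD = 3.0


def score(features):
    """Sum the weights of the features present in a sample."""
    return sum(WEIGHTS.get(name, 0.0) for name in features)


def is_flagged(features):
    """The detector fires when the score reaches the threshold."""
    return score(features) >= THRESHOLD


def strip_flagged_features(features):
    """Drop the highest-weighted features until the detector stops firing."""
    remaining = set(features)
    ranked = sorted(features, key=lambda n: WEIGHTS.get(n, 0.0), reverse=True)
    for name in ranked:
        if not is_flagged(remaining):
            break
        remaining.discard(name)
    return remaining


original = {"packs_executable", "disables_antivirus", "opens_network_socket"}
evasive = strip_flagged_features(original)

print(is_flagged(original), score(original))  # True 5.5
print(is_flagged(evasive), score(evasive))    # False 2.5
```

The training data is untouched in this sketch. The detector fails because its verdict hinges on a few telltale features that an attacker who learns them can simply avoid.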

One versus many

In a session at the conference, Holly Stewart and Jugal Parikh of Microsoft flagged the risk of overreliance on a single, master algorithm to drive a security system. The danger is that if that algorithm is compromised, there’s no other signal that would flag a problem with it.

To help guard against this, Microsoft's Windows Defender threat protection service uses a diverse set of algorithms with different training data sets and features. So if one algorithm is hacked, the results from the others—assuming their integrity hasn’t been compromised too—will highlight the anomaly in the first model.
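
One way to picture that cross-check is a majority vote that also tracks how often each model disagrees with the rest. The sketch below is not Microsoft's implementation. The three stand-in models, the sample values, and the 25% disagreement cutoff are all assumptions for illustration.

```python
from collections import Counter


# Three stand-in detectors. model_c plays the role of a compromised model
# that has been manipulated into calling everything clean.
def model_a(x):
    return "malware" if x >= 5 else "clean"


def model_b(x):
    return "malware" if x >= 6 else "clean"


def model_c(x):
    return "clean"


MODELS = {"model_a": model_a, "model_b": model_b, "model_c": model_c}


def disagreement_rates(models, samples):
    """Fraction of samples on which each model differs from the majority."""
    misses = {name: 0 for name in models}
    for sample in samples:
        votes = {name: fn(sample) for name, fn in models.items()}
        majority = Counter(votes.values()).most_common(1)[0][0]
        for name, vote in votes.items():
            if vote != majority:
                misses[name] += 1
    return {name: count / len(samples) for name, count in misses.items()}


def suspect_models(rates, cutoff=0.25):
    """Names of models whose disagreement rate exceeds the cutoff."""
    return sorted(name for name, rate in rates.items() if rate > cutoff)


samples = [1, 2, 3, 7, 8, 9]
rates = disagreement_rates(MODELS, samples)
suspects = suspect_models(rates)

print(rates)     # {'model_a': 0.0, 'model_b': 0.0, 'model_c': 0.5}
print(suspects)  # ['model_c']
```

As the article's caveat implies, the check only works while most of the models remain trustworthy. If the majority is compromised, the honest model is the one that looks anomalous.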

Beyond these issues, Forcepoint’s Marty notes that with some very complex algorithms it can be really difficult to work out why they actually spit out certain answers. This “explainability” issue can make it hard to assess what’s driving any anomalies that crop up (see “The dark secret at the heart of AI”).

None of this means that AI and machine learning shouldn’t have an important role in a defensive arsenal. The message from Marty and others is that it’s really important for security companies—and their customers—to monitor and minimize the risks associated with algorithmic models.

That’s no small challenge given that people with the ideal combination of deep expertise in cybersecurity and in data science are still as rare as a cool day in a Las Vegas summer.

Original Article by Martin Giles