Adversarial Attacks and the Future of Secure AI

Special thanks to Lucy Wang for her mentorship on this topic!

What’s in this Post?

  1. AI/ML fundamental growth drivers: Idle repositories of data, hardware innovation, and the shift to edge computing
  2. Relevance of adversarial attacks: Real-world case examples in health care, speech & audio recognition, and autonomous vehicles
  3. Solutions along the AI/ML value chain: Data optimization, algorithmic innovation, robust hardware development, MLOps
  4. AI/ML security startup market map: Overview of security startup activity and landscape

The AI/ML Gold Rush: Fundamental Growth Drivers

The global AI market is projected to reach some $202.57B by 2026, implying a 33.1% CAGR, with applications spanning every enterprise size and sector. The key drivers of this growth, which are also its key vulnerabilities, are 1) large, idle, and accessible repositories of data, 2) hardware innovation, and 3) the shift to edge computing. Together, these innovations will allow AI/ML applications to scale rapidly across geolocations, industry sectors, and enterprise sizes.

Growth Driver #1: Idle Repositories of Data

To create any sort of meaningful application, large amounts of data are required to train AI/ML algorithms. Thus, the growth that the sector has experienced has largely been predicated on the increased availability of and accessibility to public and private data. This has been made possible by factors such as internet access and new data point collection methods (e.g. edge/IoT devices, smartphones, etc.).

Growth Driver #2: Increasingly Advanced, Cheaper Hardware

At the foundation of any AI/ML advancement lies innovation in, and adoption of, the underlying hardware processors. Though recent surveys suggest CPUs remain the processor of choice for some teams, more specialized processors like Nvidia's GPUs are growing in popularity: they can perform "millions of mathematical operations in parallel," making them exceptionally attractive for more complex DL workloads. Consequently, the race to develop ever-faster, niche-use-case processors has become a hot area of activity in the space.

Growth Driver #3: Partial Shift to Edge Computing

Though constantly connected cloud computing has become the mainstream ideal for working with AI/ML data quickly, concerns around efficiency, latency, privacy, and security will likely drive a partial shift toward other compute methods. And considering that over 3.8 billion edge devices will be using AI inferencing or training by the end of 2020, AI-powered edge computing that allows "analytics and knowledge generation to occur at the source of the data" represents an attractive alternative. At the forefront of this evolution are low-power, cheap, high-performance System on Chip (SoC) processors and middleware that stretch computational capacity.

Relevance of Adversarial Attacks

While these growth drivers have produced incredible progress in the space, they are a double-edged sword: more data, AI-specific chips, and new computing paradigms all mean larger attack surfaces. This section details how adversaries in various industries are exploiting these vulnerabilities.

Attack Methods: One Bad Apple

In 2016, Microsoft's Tay AI, a Twitter chatbot, was developed for "casual and playful [two-way] conversation." But less than 16 hours after her launch, a group of everyday Twitter users spammed Tay with racist, sexist, and profane tweets, causing Tay to publicly tweet similarly offensive remarks. Tay's demise is a simplified, public demonstration of intentional data poisoning: adversaries, whether everyday users or professional hackers, inject "bad," manipulated, or incorrect training data during training to render AI/ML models generally ineffective. Data poisoning is just one of the primary ways adversaries exploit security vulnerabilities in AI/ML models today; others include model inversion (reverse-engineering a trained model to recover sensitive, private training data) and Trojans (exploiting the way an AI system learns by introducing perturbations into the data to elicit a specific, desired, and incorrect response from the final model). The following industry examples showcase how these attacks can undermine important AI/ML applications.
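First, though, to make data poisoning concrete, here is a minimal sketch that flips the labels on a fraction of a synthetic training set and measures the damage to a simple scikit-learn classifier. The dataset, model, and poisoning fractions are all illustrative assumptions, not details from any real incident.

```python
# Minimal label-flipping data poisoning sketch (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic binary classification task standing in for real training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def poison_labels(y, fraction, rng):
    """Flip the labels of a random fraction of training points."""
    y_poisoned = y.copy()
    n_flip = int(fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

for fraction in (0.0, 0.1, 0.3):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, poison_labels(y_train, fraction, rng))
    acc = clf.score(X_test, y_test)
    print(f"poisoned fraction {fraction:.0%}: test accuracy {acc:.3f}")
```

Even this crude attack degrades test accuracy as the poisoned fraction grows, which is why upstream data integrity matters so much.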

3 Real-World Case Examples

Health Care

Solutions along the AI/ML Value Chain

Despite the very real threat of adversarial attacks, less than 1% of AI funding actually goes toward startups working on AI/ML security research and infrastructure. Rather than continuing to "retro-fit IT systems with security measures that are meant to address vulnerabilities…[from] the 1980s," we should increase funding for AI cybersecurity: it will be a key factor in the continued advancement of AI and could save billions of dollars down the line. The following possible investment areas all fall along the AI/ML value chain: quality data, algorithmic/paradigm innovation, and computing hardware.

1. Data Preparation, Optimization, & Securitization

Most of the aforementioned adversarial attacks involve the malicious modification of source data. This is particularly challenging to navigate because the sheer volume of data generated every day is difficult to process and interpret, creating a large, vulnerable attack surface. Two interesting solutions startups are beginning to move toward are a) quality data preparation and b) decentralized data techniques; a small sketch of the first follows.
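As one hedged illustration of quality data preparation, the sketch below screens a training set for anomalous points before they ever reach a model. IsolationForest is just one possible screening technique, chosen for convenience; the injected "poison" points and contamination rate are illustrative assumptions.

```python
# Screening training data for anomalous points before training (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

X, _ = make_classification(n_samples=1000, n_features=10, random_state=1)

# Simulate an adversary injecting out-of-distribution points.
rng = np.random.default_rng(1)
X_bad = rng.normal(loc=8.0, scale=1.0, size=(50, X.shape[1]))
X_all = np.vstack([X, X_bad])

# Flag the most anomalous ~5% of points and drop them before training.
screen = IsolationForest(contamination=0.05, random_state=1).fit(X_all)
keep = screen.predict(X_all) == 1  # +1 = inlier, -1 = outlier
print(f"kept {keep.sum()} of {len(X_all)} points after screening")
```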

2. Algorithmic Innovation

Another way to implement robust defense strategies against adversarial attacks lies in algorithmic innovation. One area of longstanding interest that is now seeing real-world application is explainable AI/ML (XAI). When an adversary wants to bias a model's results in their favor, XAI provides visibility into the AI black box, helping ensure algorithmic accuracy and fairness by detecting abnormal perturbations in inputs and outputs.
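A minimal sketch of that idea, under heavy simplifying assumptions: for a linear model, the per-feature contribution (coefficient times feature value) is a crude stand-in for richer attribution methods like SHAP, and comparing an input's attributions against training-time statistics can flag predictions driven by abnormal perturbations.

```python
# Using simple feature attributions to flag abnormally perturbed inputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=2)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def attributions(clf, X):
    """Per-feature contribution of each input to the decision score."""
    return clf.coef_[0] * X

base = attributions(clf, X)
mean, std = base.mean(axis=0), base.std(axis=0) + 1e-9

# An adversarially perturbed input: the most influential feature
# is pushed far outside its normal range.
j = np.argmax(np.abs(clf.coef_[0]))
x_adv = X[0].copy()
x_adv[j] += 10.0

z = np.abs((attributions(clf, x_adv[None, :]) - mean) / std)
print("suspicious features (z > 4):", np.where(z[0] > 4)[0])
```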

3. Computing Hardware Advancements

Apart from data quality assurance and algorithmic approaches, there is a large opportunity to implement security features in the hardware designs themselves. This could include advances in the actual network architectures on chips to enable real-time security updates. One precedent example is legacy semiconductor company ARM’s TrustZone, which “establishes secure endpoints and a device root of trust.” Another player, startup Karamba Security ($27M total, Series B, backed by Fontinalis and Western Technology Investment), embeds its security solutions directly within edge devices and provides continuous threat monitoring. Other investment opportunities could involve forms of security middleware to enable system monitoring.
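Hardware roots of trust cannot be reproduced in a blog snippet, but a software analogue of the same principle can: refuse to load a model artifact unless its signature checks out against a trusted key. The HMAC sketch below is purely illustrative; it is not how TrustZone, CoreGuard, or Karamba's products actually work, and the key name is hypothetical.

```python
# Software analogue of a root of trust: verify a model artifact before loading.
import hmac
import hashlib
from pathlib import Path

TRUSTED_KEY = b"device-provisioned-secret"  # hypothetical, provisioned at manufacture

def sign_artifact(path: Path) -> str:
    """Compute an HMAC-SHA256 signature over the artifact bytes."""
    return hmac.new(TRUSTED_KEY, path.read_bytes(), hashlib.sha256).hexdigest()

def load_model_if_trusted(path: Path, expected_sig: str) -> bytes:
    """Refuse to load the artifact unless its signature matches."""
    if not hmac.compare_digest(sign_artifact(path), expected_sig):
        raise RuntimeError(f"refusing to load {path}: signature mismatch")
    return path.read_bytes()

# Usage: sign at build time, verify on the edge device before inference.
model_path = Path("model.bin")
model_path.write_bytes(b"fake model weights")
sig = sign_artifact(model_path)
weights = load_model_if_trusted(model_path, sig)
```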

A Note on Secure MLOps

All three parts of the AI/ML value chain above touched on ways to prevent, detect, and defend against adversarial attacks on models, whether through training data, dev strategy, or chips/hardware. Considering the AI model environment as a whole as an asset to secure during production, there are also notable investment opportunities outside my explicitly defined value chain: specifically, startups working on production ML system monitoring. MLOps startups give corporations real-time visibility into model performance and extreme, outlier inputs, which helps facilitate secure production.
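As a toy version of what such monitoring tools do, the sketch below compares live inputs against training-time statistics and flags extreme outliers. The z-score test and threshold are illustrative assumptions; production systems use far richer drift and outlier detectors.

```python
# Toy production-input monitor: flag inputs far outside training statistics.
import numpy as np

class InputMonitor:
    """Flags production inputs that sit far outside the training distribution."""

    def __init__(self, X_train, z_threshold=5.0):
        self.mean = X_train.mean(axis=0)
        self.std = X_train.std(axis=0) + 1e-9
        self.z_threshold = z_threshold

    def check(self, x):
        """Return True when any feature's z-score exceeds the threshold."""
        z = np.abs((x - self.mean) / self.std)
        return z.max() > self.z_threshold

rng = np.random.default_rng(3)
X_train = rng.normal(size=(5000, 8))
monitor = InputMonitor(X_train)

print(monitor.check(rng.normal(size=8)))  # in-distribution input -> False
print(monitor.check(np.full(8, 12.0)))    # extreme outlier -> True
```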

AI/ML Security Startup Market Map

I’ve compiled a collection of startups addressing adversarial attacks along the AI/ML value chain I defined above. The first row shows the three key components of the chain: 1) Data Preparation, Optimization, & Securitization; 2) Algorithmic Innovation, with XAI as a focus; and 3) Computing Hardware Advancements, with edge/IoT security as a focus. The second row gives a general overview of Cybersecurity-as-a-Service startups that incorporate some form of defense against these attacks, separated out by industries with particularly compelling use cases.

Startups addressing threat of adversarial attacks along the AI/ML value chain.

Early-Stage Startup Highlight Reel (Pre-Seed to Series B)

Data Labeling & Optimization:

  1. Labelbox ($38.9M total, Series B, a16z & Kleiner Perkins): Data labeling tools, workforce, and automation for teams.
  2. Supervisely (Pre-Seed): Data labeling tools capable of digesting images, videos, and 3D point clouds into production-ready training data.

Algorithmic Innovation & MLOps:

  1. Arthur AI ($3.3M total, Seed, Index & Work-Bench): MLOps tools for monitoring, explaining, detecting bias, and measuring performance.
  2. Fiddler Labs ($13.2M total, Series A, Lux & Lightspeed): MLOps tool for understanding AI predictions, analyzing model behavior, validating compliance, and monitoring performance.

Computing Hardware:

  1. Dover Microsystems ($6M total, Seed, Hyperplane Ventures): A “bodyguard” for your processor. Its “CoreGuard® technology is the only solution for embedded systems that prevents the exploitation of software vulnerabilities and immunizes processors against entire classes of network-based attacks.”

Cybersecurity-as-a-Service:

  1. Neurocat (Pre-Seed): Offers an open-source AI analysis/debugging platform, AI lifecycle governance, and research/consulting services for robust model development.
  2. SafeRide Technologies (Pre-Seed): Offers multi-layer, deterministic cybersecurity for AVs and connected vehicles (e.g. anomaly detection, fleet monitoring).

Precedent Acquisitions

Data Labeling & Optimization:

  1. Uber acq. Mighty AI in June 2019

Computing Hardware:

  1. Harman acq. TowerSec for $72.5M

Cybersecurity-as-a-Service:

  1. Sophos acq. Invincea for $100M in Feb. 2017

Concluding Thoughts

To hedge against adversarial attacks and allow for truly successful AI/ML integration at every level, AI/ML cybersecurity infrastructure must be developed in tandem with innovation. Whether the definitive countermeasures to these problems come from a specialized startup, a governmental body, or an extension of leading cybersecurity firms' product sets, robust solutions will ultimately fall along the AI/ML value chain: quality data, algorithmic/paradigm innovation, and computing hardware.
