How to use machine learning to win the time battle against AppSec triage

by | Jul 22, 2020 | AppSec Classroom, Blog


There is a well-known problem in AppSec testing that affects you whether you’re testing one application or thousands, and whether you’re using a few or many AppSec testing tools: the time spent triaging false positives and other irrelevant findings. 

AppSec tools discover a lot of issues, but two-thirds of them are likely irrelevant to your organization. Your team can spend a great deal of time researching issues that turn out to be false positives or not worth the effort to correct. 

We have a solution to this debilitating ‘noise’ problem. It’s called Triage Assistant, and it uses machine learning (ML) to predict which findings are relevant and real. With the benefit of saving your AppSec team thousands of hours of triage time, Triage Assistant is nothing short of a game-changer in application security testing efficiency and effectiveness. 

Here’s how you can better handle large volumes of AppSec data and increase adoption rates of AppSec testing tools among your developers. 

The problem: AppSec triage is too time consuming

As we all know, application security is a critical part of the development process. Proper AppSec testing requires a variety of AppSec tools to make sure every application gets comprehensive application security testing. We wholeheartedly support this best-practice approach of using a variety of tools, but it does create a challenge: false positives and noise. 

Each tool returns a large number of findings that application security analysts have to sort through to determine which ones are relevant and require attention. Each AppSec tool also produces results in a different format. Multiple tools may return some of the same results. 

Weeding through multiple reports eats up a ton of time, and it doesn’t help developers buy into the AppSec process. Developers want to move fast, with their sights set on the next build. They don’t want to wait for security to catch up. They can get really frustrated manually sorting through report after report. Triaging the thousands of findings is the most intense and time-consuming part of the application security process. 

Let’s take a closer look at what a typical AppSec analyst goes through in the triage process. After the various AppSec tools have reported findings, the analyst is faced with a massive list of theoretical vulnerabilities. Each tool finding must be assessed to determine: 

  • Is this a false positive? 

  • Is this exploitable? 

  • Is this noise (a real vulnerability, but not worth the time and effort to fix)? 

  • Is this an exploitable vulnerability that is worth the effort to remediate?

An IEEE study found that it takes an average of 10 minutes to triage each finding. Keep in mind this time does not include fixing an issue; it is just conducting an assessment to determine if the issue is exploitable and should be fixed. 

The National Institute of Standards and Technology (NIST) has conducted a series of studies on the effectiveness of Static Application Security Testing (SAST) tools. The most recent of these studies evaluated 14 open-source and commercial SAST tools.

The results were eye-opening.

The study revealed that 30 percent of findings from the average SAST tool are false positives and another 36 percent are insignificant. An insignificant finding is one that is true, but not useful. This means that an average of 66 percent of the results are irrelevant, and only 34 percent are truly important and real. 

Since we know analysts spend approximately 10 minutes triaging each finding, think about all of the time wasted filtering through findings that turn out to be irrelevant. Even if your organization is running just one software application through just one AppSec tool, triage is still a huge roadblock to an agile DevSecOps process.

But the math gets even more concerning.

One tool returns an average of 10,000 results on just one application. At ten minutes per finding, a single analyst would need roughly 200 eight-hour working days to review all 10,000 results. With an average of 66 percent of results being irrelevant, about 132 of those 200 days would be spent reviewing findings that are false positives or noise. 
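The arithmetic behind these figures can be checked with a short sketch. It assumes a single analyst working eight-hour days (the workday length is our assumption, not something the studies state), which gives about 208 days, in line with the rounded 200-day figure.

```python
MINUTES_PER_FINDING = 10   # average triage time cited above
IRRELEVANT_SHARE = 0.66    # 30% false positives + 36% insignificant findings
HOURS_PER_WORKDAY = 8      # assumption: one analyst, eight-hour days


def triage_days(findings: int) -> float:
    """Working days one analyst needs to triage `findings` results."""
    return findings * MINUTES_PER_FINDING / 60 / HOURS_PER_WORKDAY


total_days = triage_days(10_000)
wasted_days = total_days * IRRELEVANT_SHARE
print(f"total: {total_days:.0f} days, spent on irrelevant findings: {wasted_days:.0f} days")
```

Changing `HOURS_PER_WORKDAY` or the finding count shows how quickly the total grows as more tools and applications are added.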

This is just one application being run through one tool. The reality is that most organizations run multiple tools and manage hundreds or even thousands of applications. Reviewing each finding one by one quickly becomes impractical: it would take thousands of working days.

The false positive issue also creates a barrier to AppSec tool adoption among developers. A Microsoft study on tool adoption identified false positives as one of the leading barriers: 90 percent of developers were willing to accept a false positive rate of 5 percent, but only 46 percent would accept 15 percent false positives, and only 24 percent would accept 20 percent. Unfortunately, the average AppSec tool has a false positive rate of 30 percent.

A Google study on the same topic produced similar results. The study found that an effective false positive rate of 10 percent encourages tool adoption. An effective false positive is defined as any issue a developer decided not to act on, for any reason. The false positive issue wastes time and money, and it makes developers less likely to participate in the AppSec process.

The solution: Meet Triage Assistant

Code Dx is unveiling the latest addition to our application security workflow management tool: Triage Assistant. This machine-learning technology greatly reduces the time and effort needed to predict which AppSec tool findings are relevant.

This is the best solution to the most intense and time-consuming part of application security. It can also increase the adoption of AppSec tools among developers. 

Let’s see how it works. 

In the typical AppSec process, your applications are scanned and the findings are sent to an analyst for review. As analysts complete their work, they flag some findings as irrelevant and some as important. All of this information, along with descriptive features, is sent to the Code Dx machine learning Triage Assistant engine. The classifier ingests this information and uses it to learn about findings and their likely classification. 

Moving forward, Triage Assistant automatically recommends which findings to act on and which ones to ignore, based on prior triage decisions made for similar warnings. The Assistant trains on your specific data, so it is tailored to your organization and can be deployed across all of your tools and applications. It can also be deployed completely on-premises, within your firewall, removing any concerns about sensitive data being exposed. 
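To make the idea of learning from prior triage decisions concrete, here is a deliberately simple sketch. It is not the Triage Assistant engine or its algorithm: it is a majority-vote lookup over past decisions, standing in for a real classifier. The record shape, the tool and rule names, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical history of analyst decisions: (tool, rule, label).
history = [
    ("sast-a", "sql-injection", "escalated"),
    ("sast-a", "sql-injection", "escalated"),
    ("sast-a", "unused-variable", "ignored"),
    ("sast-a", "unused-variable", "ignored"),
    ("sast-a", "unused-variable", "ignored"),
    ("sast-a", "unused-variable", "escalated"),
]


def train(decisions):
    """Count how often each (tool, rule) pair received each label."""
    counts = defaultdict(Counter)
    for tool, rule, label in decisions:
        counts[(tool, rule)][label] += 1
    return counts


def predict(counts, tool, rule):
    """Return (predicted_label, confidence 0-100), or (None, 0) if unseen."""
    seen = counts.get((tool, rule))
    if not seen:
        return None, 0
    label, hits = seen.most_common(1)[0]
    return label, round(100 * hits / sum(seen.values()))


model = train(history)
print(predict(model, "sast-a", "unused-variable"))     # seen before
print(predict(model, "sast-a", "hardcoded-password"))  # never triaged
```

A production classifier would use many more features than the tool and rule, but the shape is the same: past analyst decisions go in, and a predicted label with a confidence comes out.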

Now your theoretical vulnerabilities go to the Triage Assistant first, rather than to your analysts. Instead of getting one long list of unknowns, your analysts are given information on which findings are the most important.

Each finding is marked with a predicted status and confidence score ranging from zero to 100. Analysts can filter by status or score to focus on high-priority items and to remove irrelevant findings from the results set. 
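As a rough illustration of filtering by status and score, the sketch below splits a list of findings into a review queue and a dismissed set. The `Finding` type, the status names other than "escalated", and the default threshold are assumptions made for the example, not the Code Dx data model.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    id: int
    predicted_status: str  # hypothetical values: "escalated", "false-positive", "insignificant"
    confidence: int        # 0-100


findings = [
    Finding(1, "escalated", 92),
    Finding(2, "false-positive", 88),
    Finding(3, "insignificant", 55),
    Finding(4, "false-positive", 71),
    Finding(5, "escalated", 64),
]

IRRELEVANT = {"false-positive", "insignificant"}


def split_queue(items, threshold=70):
    """Separate confidently irrelevant findings from those that need review."""
    dismissed = [
        f for f in items
        if f.predicted_status in IRRELEVANT and f.confidence > threshold
    ]
    dismissed_ids = {f.id for f in dismissed}
    review = [f for f in items if f.id not in dismissed_ids]
    # Escalated findings come first, then higher confidence before lower.
    review.sort(key=lambda f: (f.predicted_status != "escalated", -f.confidence))
    return review, dismissed


review, dismissed = split_queue(findings)
print([f.id for f in review], [f.id for f in dismissed])
```

Low-confidence predictions stay in the review queue on purpose: finding 3 is predicted insignificant, but at a score of 55 an analyst still looks at it.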

If you’re wondering just how much time and money this saves, every 240 findings that are automatically categorized saves your organization the equivalent of one week of work from a full-time employee. 

Does it work? 

We took data from five of our existing customers, along with internal data, to create a collection of more than one million labeled findings and feature data. We used a portion of this information to train Triage Assistant. Then, predictions were made on the remaining data and compared to the “ground truth” to measure how accurate the classification model is. 
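The train-then-compare procedure described here is a standard holdout evaluation. A minimal sketch follows; the 80/20 split, the fixed seed, and the record shape are assumptions for illustration, since the proportions used in our test are not stated above.

```python
import random


def holdout_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up labeled findings: (id, is_irrelevant).
labeled = [(f"finding-{i}", i % 3 == 0) for i in range(10)]
train_set, test_set = holdout_split(labeled)
print(len(train_set), len(test_set))
```

The model only ever sees `train_set`; its predictions on `test_set` are then compared with the analyst labels it was never shown.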

We looked at four metrics during this test:

1. Precision: The percentage of findings the model correctly predicted to be irrelevant out of all the findings it predicted to be irrelevant. This is the most important number because you don’t want to dismiss something as a false positive if it is a true issue that needs to be addressed. Scores ranged from 96.48 to 100 percent.

2. Recall: The percentage of findings the model correctly predicted to be irrelevant out of all the findings that were actually irrelevant in the data set. A miss here is less dangerous: it simply means an irrelevant finding was not filtered out, so an analyst still has to review it. Scores ranged from 96.12 to 99.998 percent.

3. F1: The harmonic mean of precision and recall. Scores ranged from 96.29 to 99.99 percent.

4. Accuracy: The percentage of all findings the model correctly predicted to be relevant or irrelevant. Scores ranged from 96.24 to 99.99 percent.
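The four metrics can be computed directly from the counts of correct and incorrect predictions. The sketch below treats "irrelevant" as the positive class, matching the definitions above; the sample labels are made up and are not taken from our test data.

```python
def irrelevance_metrics(actual, predicted):
    """Precision, recall, F1 and accuracy, with True meaning 'irrelevant'.

    Assumes at least one finding is actually irrelevant and at least one
    is predicted irrelevant; otherwise a division by zero occurs.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)              # irrelevant, predicted irrelevant
    fp = sum((not a) and p for a, p in pairs)        # real issue wrongly dismissed
    fn = sum(a and (not p) for a, p in pairs)        # irrelevant but not filtered out
    tn = sum((not a) and (not p) for a, p in pairs)  # real issue kept for review
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, f1, accuracy


actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, True, True, False, False, False]
precision, recall, f1, accuracy = irrelevance_metrics(actual, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
```

In this sample the single false positive (a real issue predicted irrelevant) lowers precision but not recall, which is why precision is the number to watch most closely.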

Overall, the Code Dx Triage Assistant averaged 99.31 percent accuracy across all data sets. 

Similar results are possible for your organization because the model is created specifically for you, and it continues to learn based on your use and preferences. The engine automatically retrains itself each night, but you also have the option to manually trigger retraining at any time. 

Just imagine the time your team will save. 

Let’s take an example of a scan on an application that comes back with 16,000 results from three different tools. Code Dx will first remove the duplicate results, reducing the total number of findings by 49 percent. Your team is now dealing with a little more than 8,000 results. 
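Removing duplicates across tools can be pictured as grouping findings by a shared key. The sketch below assumes two findings are the same issue when they agree on file, line, and CWE; that key, the field names, and the sample data are hypothetical, and the actual Code Dx correlation logic is not described in this post.

```python
def deduplicate(findings):
    """Keep one finding per (file, line, cwe) key and record every reporting tool."""
    merged = {}
    for f in findings:
        key = (f["file"], f["line"], f["cwe"])
        if key not in merged:
            merged[key] = {**f, "tools": [f["tool"]]}
        elif f["tool"] not in merged[key]["tools"]:
            merged[key]["tools"].append(f["tool"])
    return list(merged.values())


raw = [
    {"tool": "tool-a", "file": "login.py", "line": 42, "cwe": 89},
    {"tool": "tool-b", "file": "login.py", "line": 42, "cwe": 89},  # same issue, second tool
    {"tool": "tool-a", "file": "utils.py", "line": 7, "cwe": 79},
    {"tool": "tool-c", "file": "login.py", "line": 90, "cwe": 89},
]

unique = deduplicate(raw)
print(len(raw), "->", len(unique))
```

Keeping the list of reporting tools on each merged finding preserves the information that several tools agreed, which can itself be a useful signal during triage.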

Now we can see the true power of Triage Assistant. You can use the predicted status and confidence scores to prioritize findings. Your team can address issues marked as “escalated” first, since those are most urgent. If you were to filter out insignificant findings and false positives with a confidence score above 70 percent, you would be able to remove more than 2,000 findings from the results. At ten minutes per finding, that is more than 330 hours, or approximately two months of triage time saved. 

The improved accuracy and ability to filter out false positives with a high confidence score brings your AppSec tools into the acceptable false positive range for developer adoption. Triage Assistant not only drastically reduces the time your analysts spend triaging findings; it also improves the results set in such a way that developers are willing to incorporate them into their workflow. This integration is exactly what is needed for a true DevSecOps approach. 

If you’re struggling with large volumes of AppSec results and your developers won’t adopt your tools, the Triage Assistant from Code Dx is the answer. 

To learn more, watch this quick video demo from our CTO, Ken Prole.
