Manually reviewing findings from code quality and security testing tools is plenty of work on its own, without dealing with unnecessary duplicates. As code moves around with edits, many static analysis tools report findings associated with that code as new, even though the finding is really the same as a previously reported one. Code Dx was recently enhanced to correlate findings across code revisions, reducing the number of findings that require manual review by 50-90%.

Application security management systems, like Code Dx, are an important component in a secure software development toolchain. These systems consolidate and simplify the management of testing findings by creating a single source of truth for quality and security issues. Application security management systems work by aggregating and correlating test findings from multiple sources. They also serve as a hub from which issues can be assigned for remediation, remediation progress can be tracked, and reports for compliance and other purposes can be generated.

A key capability of any application security management system is accurate correlation: the ability to merge duplicate findings into one (for example, a finding from Analyzer A and a matching finding from Analyzer B). Correlation saves you time and offers a practical way to prioritize findings: if multiple tools report the same issue, it’s probably real.

Under the hood, most application security management systems use information like the analyzer-assigned identifier, source code file path, line number, and finding type to associate findings. But this approach has limitations. As source code moves around with edits, the location of each reported finding changes with it, as can the analyzer-assigned identifier. This breaks correlation and can make findings on moved code look like they’re “New,” when in fact you’ve already marked them as something else, like “Ignore.” The reverse can also happen: code edits can cause real, new issues to be hidden behind an old finding’s triage status, for example when a new issue lands at a location that a previously triaged finding used to occupy.
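
To make the first failure mode concrete, here is a minimal sketch of the location-based matching described above. It is an illustration only, not Code Dx's implementation: the `Finding` fields, the `basic_key` function, and the sample data are all hypothetical.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """A simplified finding record; all field names are illustrative."""
    tool_id: str       # analyzer-assigned identifier
    path: str          # source code file path
    line: int          # line number reported by the analyzer
    finding_type: str  # rule or weakness name


def basic_key(f: Finding) -> tuple:
    # Location-based identity: any change to the identifier, path, or line
    # produces a different key, even if the underlying issue is unchanged.
    return (f.tool_id, f.path, f.line, f.finding_type)


def new_findings(previous: list[Finding], current: list[Finding]) -> list[Finding]:
    """Return the findings in `current` whose key was not seen in `previous`."""
    seen = {basic_key(f) for f in previous}
    return [f for f in current if basic_key(f) not in seen]


# Scan 1: a finding on line 40 that a reviewer has already triaged.
scan1 = [Finding("A-17", "src/db.py", 40, "SQL Injection")]
# Scan 2: three lines were added above it, so the same issue is now on line 43.
scan2 = [Finding("A-17", "src/db.py", 43, "SQL Injection")]

reappeared = new_findings(scan1, scan2)
print(reappeared)
```

Because the key includes the line number, the unchanged issue is reported as new in the second scan, and whatever triage status it had is no longer attached to it.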

Both situations are bad. In the first, you are forced to re-inspect many duplicate findings. In the second, real issues are silently hidden from view.

Both situations also occur relatively frequently. While developing this functionality, we studied 63 randomly selected commit pairs, each 1-12 months apart, across 9 open source projects. We found that for every week between the commits in a pair, an average of 15 findings had source code path changes. Assuming a month (about four weeks) between scans, you could expect to review about 60 duplicate findings, which is about 10 hours of extra work at roughly 10 minutes per finding.

To overcome this problem, we developed functionality that uses the differences in the code that you submit with each analysis to more accurately correlate findings across code revisions. The gain in correlation accuracy is substantial: often a 50-90% reduction in the total number of findings. We determined this by scanning each of those 63 commit pairs with open source analyzers in their default checker configuration. Using the first commit in each pair as a baseline, we then compared how many findings Code Dx reported in two cases: with basic correlation, and with difference-enhanced correlation. Difference-enhanced correlation resulted in more findings being correlated and fewer appearing as “New.”
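
The general idea of using code differences to carry findings forward can be sketched as follows. This is an assumption-laden illustration, not the algorithm Code Dx uses, which is not described here. It uses Python's standard `difflib` to map the line numbers of unchanged lines from the old revision to the new one, and it covers only line shifts within a single file, not file renames or moves. The function names and sample data are hypothetical.

```python
import difflib


def line_map(old_lines: list[str], new_lines: list[str]) -> dict[int, int]:
    """Map 1-based line numbers of unchanged lines from the old file to the new file."""
    mapping = {}
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


def correlate(old_findings, new_findings, mapping):
    """Split new findings into (carried_over, fresh).

    Findings are (line, finding_type) tuples for one file. An old finding is
    projected to its new line via `mapping`; a new finding at that projected
    location with the same type is treated as the same finding.
    """
    expected = {(mapping[line], ftype) for line, ftype in old_findings if line in mapping}
    carried_over = [f for f in new_findings if f in expected]
    fresh = [f for f in new_findings if f not in expected]
    return carried_over, fresh


old_src = [
    "def load(user_id):",
    "    query = 'SELECT * FROM t WHERE id=' + user_id",
    "    return run(query)",
]
# The new revision adds two lines at the top, shifting everything down by two.
new_src = ["import logging", ""] + old_src

old_findings = [(2, "SQL Injection")]
new_findings = [(4, "SQL Injection")]

mapping = line_map(old_src, new_src)
carried_over, fresh = correlate(old_findings, new_findings, mapping)
print(mapping)
print(carried_over, fresh)
```

In this sketch the finding that moved from line 2 to line 4 is recognized as the same finding, so any triage status attached to it could be preserved. A finding whose old line was edited or deleted has no entry in the mapping and would still surface as new, which is the behavior you want for code that actually changed.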

As of version 3.6, Code Dx uses this difference-enhanced correlation to more accurately correlate findings across code revisions. Based on our testing, you can expect near-perfect correlation of findings whose locations change between scans, which eliminates the problem of findings reappearing from scan to scan.

The Automated Triage Assistant was originally developed under the ASTAM program.

This material is based on research sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) via contract number HHSP233201600058C.