
ADR-0007: Proposal How to Mark Findings With Hashes to Find Duplicates

Status: PROPOSED
Date: 2020-11-25
Author(s): Sven Strittmatter <Sven.Strittmatter@iteratec.com>

NOTE: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Context

We need a way to detect duplicate findings. One use case is that we want to accept a finding and then ignore the same finding when it shows up again in the future.

Assumptions

  • The execution order of hooks is unspecified.
  • The information whether a finding’s hash is a duplicate MUST NOT be stored or maintained in the SCB S3 storage.
  • The SCB MUST NOT remove findings: read-write-hooks MAY alter them, but never delete or filter them out.
    • A read-hook MAY, however, decide not to store a finding in an external system.

Decision

  • We generate a hash for each finding so we can compare findings by the hash and identify duplicates.
  • This hash MUST be mutable and MAY be altered by read-write-hooks, because we don’t want to introduce an exception to what a read-write-hook can alter.
  • The parser MUST generate the initial hash of a finding from some of its attributes (e.g. name, location, category, …); see the sketch after this list.
    • Each scanner MUST define a default set of attributes used for the hashing.
    • This set of hashed attributes MAY be overridden.
  • Each read-write-hook MUST update the hash as its last step, because the hook MAY have changed a hashed attribute.
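
As an illustration of this decision, here is a minimal sketch in TypeScript (the SCB parsers and hooks are Node.js based). The `Finding` shape, `DEFAULT_HASHED_ATTRIBUTES`, `computeFindingHash`, and the example hook are hypothetical names invented for this sketch, not part of the SCB SDK.

```typescript
import { createHash } from "crypto";

// Hypothetical finding shape; the real SCB finding model has more fields.
interface Finding {
  name: string;
  location: string;
  category: string;
  hash?: string;
  [key: string]: unknown;
}

// Default set of hashed attributes; each scanner would define its own default set.
const DEFAULT_HASHED_ATTRIBUTES = ["name", "location", "category"];

// Derive the finding hash from the configured attributes.
function computeFindingHash(
  finding: Finding,
  attributes: string[] = DEFAULT_HASHED_ATTRIBUTES
): string {
  const material = attributes
    .map((attr) => `${attr}=${finding[attr] ?? ""}`)
    .join("|");
  return createHash("sha256").update(material).digest("hex");
}

// A read-write-hook may change hashed attributes, so it recomputes
// the hash as its last step.
function exampleReadWriteHook(finding: Finding): Finding {
  const updated: Finding = {
    ...finding,
    location: finding.location.replace(/\/+$/, ""), // e.g. normalize trailing slashes
  };
  updated.hash = computeFindingHash(updated);
  return updated;
}
```

Because the hash is derived only from the configured attributes, two findings that differ in other fields (e.g. IDs or timestamps) still produce the same hash, which is what makes duplicate detection possible.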

We will implement the hashing step in the parser first, behind a feature flag, to evaluate this proposal (as sketched below).
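
A possible shape of that feature-flag gate, reusing `Finding` and `computeFindingHash` from the sketch above; the environment variable `FEATURE_HASH_FINDINGS` is an assumption made up for illustration, not an existing SCB configuration option.

```typescript
// Hypothetical feature flag; the actual flag name and mechanism are not defined by this ADR.
const hashingEnabled = process.env.FEATURE_HASH_FINDINGS === "true";

// The parser attaches the initial hash only when the feature flag is set,
// so the proposal can be evaluated without affecting existing installations.
function attachInitialHashes(findings: Finding[]): Finding[] {
  if (!hashingEnabled) {
    return findings;
  }
  return findings.map((finding) => ({
    ...finding,
    hash: computeFindingHash(finding),
  }));
}
```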

Consequences

  • We don’t need to introduce an ordering for the read-write-hooks.
  • The duplicate detection/handling MUST be done in another service with its own data storage. This is because we have no stable hash until the read-hooks have been executed, and these MUST NOT alter the data in the SCB itself. But the read-hooks MAY decide not to store data in an external system (see the sketch below).
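
To make the second point concrete, here is a hedged sketch of a read-hook feeding an external duplicate-handling service, reusing the `Finding` shape from above. The in-memory `seenHashes` set stands in for that service’s own persistent data storage; all names are hypothetical.

```typescript
// Hypothetical hash store owned by the external duplicate-detection service
// (its own data storage, deliberately separate from the SCB S3 storage).
const seenHashes = new Set<string>();

// A read-hook forwards only findings whose hash has not been seen before.
// It never deletes or alters findings inside the SCB; it only decides
// what to push to the external system.
async function forwardNewFindings(
  findings: Finding[],
  push: (finding: Finding) => Promise<void>
): Promise<void> {
  for (const finding of findings) {
    if (finding.hash === undefined || seenHashes.has(finding.hash)) {
      continue; // duplicate or unhashed finding: skip forwarding, leave it untouched in the SCB
    }
    seenHashes.add(finding.hash);
    await push(finding);
  }
}
```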