add exclude-regex-patterns #458

Merged
merged 6 commits into from
Mar 30, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@ vX.X.X - Mar 3 2023

 Features:
 * [#455](https://github.com/godaddy/tartufo/pull/455) - Update documentation to fix incorrect wording
+* [#458](https://github.com/godaddy/tartufo/pull/458) - Adds `--exclude-regex-patterns` to allow for regex-based exclusions
 * [#479](https://github.com/godaddy/tartufo/pull/479) - Remove upward traversal logic for config discovery

 Bug fixes:
50 changes: 50 additions & 0 deletions docs/source/configuration.rst
@@ -265,6 +265,56 @@ match-type   No       String ("search" or "match") Whether to perform a `search
 scope        No       String ("word" or "line")    Whether to match against the current word or full line of text
 ============ ======== ============================ ==============================================================

+.. _regex-exclusion-patterns:
+
+Regex Exclusion Patterns
+++++++++++++++++++++++++
+
+Regex scans can produce false positives, such as URLs that contain environment
+variable references. To avoid these false positives, you can use the
+``exclude-regex-patterns`` configuration option. These patterns are tested
+against any string flagged by a regex pattern check. As above, this directive
+utilizes an `array of tables`_, enabling two forms:
+
+Option 1:
+
+.. code-block:: toml
+
+    [tool.tartufo]
+    exclude-regex-patterns = [
+        {path-pattern = 'products_.*\.txt', pattern = '^SK[\d]{16,32}$', reason = 'SKU pattern that resembles Twilio API Key'},
+        {path-pattern = '\.github/workflows/.*\.yaml', pattern = 'https://\${\S+}:\${\S+}@\S+', reason = 'URL with env variables for auth'},
+    ]
+
+Option 2:
+
+.. code-block:: toml
+
+    [[tool.tartufo.exclude-regex-patterns]]
+    path-pattern = 'products_.*\.txt'
+    pattern = '^SK[\d]{16,32}$'
+    reason = 'SKU pattern that resembles Twilio API Key'
+
+    [[tool.tartufo.exclude-regex-patterns]]
+    path-pattern = '\.github/workflows/.*\.yaml'
+    pattern = 'https://\${\S+}:\${\S+}@\S+'
+    reason = 'URL with env variables for auth'
+
+
+There are four relevant keys for this directive, as described below. Note that
+regex scans differ from entropy scans, so the exclusion pattern is always
+tested against the offending regex match(es). As a result, there is no
+``scope`` key for this directive.
+
+============ ======== ============================ ==============================================================
+Key          Required Value                        Description
+============ ======== ============================ ==============================================================
+pattern      Yes      Regular expression           The pattern used to check against the match
+path-pattern No       Regular expression           A pattern to specify to what files the exclusion will apply
+reason       No       String                       A plaintext reason the exclusion has been added
+match-type   No       String ("search" or "match") Whether to perform a `search or match`_ regex operation
+============ ======== ============================ ==============================================================
+
Contributor:

You addressed this in the PR description, but it would be worth throwing in a line something like "The pattern is always tested against the entire line." (Or any other words that would explain why scope is a thing just above for entropy exclusions, but isn't here.)

Contributor Author:

sure. will do

 .. _TOML: https://toml.io/
 .. _array of tables: https://toml.io/en/v1.0.0#array-of-tables
 .. _search or match: https://docs.python.org/3/library/re.html#search-vs-match
10 changes: 10 additions & 0 deletions tartufo/cli.py
@@ -135,6 +135,16 @@ def get_command(self, ctx: click.Context, cmd_name: str) -> Optional[click.Comma
     excluded. ({"path-pattern": {path regex}, "pattern": {pattern regex}, "match-type": "match"|"search",
     "scope": "word"|"line"}).""",
 )
+@click.option(
+    "-xr",
+    "--exclude-regex-patterns",
+    multiple=True,
+    hidden=True,
+    type=click.UNPROCESSED,
+    help="""Specify a regular expression which matches regex strings to exclude from the scan. This option can be
+    specified multiple times to exclude multiple patterns. If not provided (default), no regex strings will be
+    excluded. ({"path-pattern": {path regex}, "pattern": {pattern regex}, "match-type": "match"|"search"}).""",
+)
 @click.option(
     "-e",
     "--exclude-signatures",
21 changes: 13 additions & 8 deletions tartufo/config.py
@@ -241,7 +241,7 @@ def compile_path_rules(patterns: Iterable[str]) -> List[Pattern]:
     ]


-def compile_rules(patterns: Iterable[Dict[str, str]]) -> List[Rule]:
+def compile_rules(patterns: Iterable[Dict[str, str]], exclude_type: str) -> List[Rule]:
     """Take a list of regex string with paths and compile them into a List of Rule.

     :param patterns: The list of patterns to be compiled
@@ -255,12 +255,17 @@ def compile_rules(patterns: Iterable[Dict[str, str]]) -> List[Rule]:
             raise ConfigException(
                 f"Invalid value for match-type: {pattern.get('match-type')}"
             ) from exc
-        try:
-            scope = Scope(pattern.get("scope", Scope.Line.value))
-        except ValueError as exc:
-            raise ConfigException(
-                f"Invalid value for scope: {pattern.get('scope')}"
-            ) from exc
+        if exclude_type == "regex":
+            # regex exclusions always have line scope
+            scope = Scope.Line
+        else:
+            # entropy exclusions can specify scope
+            try:
+                scope = Scope(pattern.get("scope", Scope.Line.value))
+            except ValueError as exc:
+                raise ConfigException(
+                    f"Invalid value for scope: {pattern.get('scope')}"
+                ) from exc
         try:
             rules.append(
                 Rule(
@@ -273,6 +278,6 @@ def compile_rules(patterns: Iterable[Dict[str, str]]) -> List[Rule]:
             )
         except KeyError as exc:
             raise ConfigException(
-                f"Invalid exclude-entropy-patterns: {patterns}"
+                f"Invalid exclude-{exclude_type}-patterns: {patterns}"
             ) from exc
     return rules
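
To make the effect of the new `exclude_type` argument concrete, here is a self-contained sketch of the behaviour shown in the hunks above. It is not tartufo's actual module: the `Scope`, `MatchType`, `Rule`, and `ConfigException` definitions are simplified stand-ins, and the default `match-type` of `search` and the handling of a missing `path-pattern` are assumptions, since those parts of `compile_rules` are not visible in the diff.

```python
import enum
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


class Scope(enum.Enum):
    Word = "word"
    Line = "line"


class MatchType(enum.Enum):
    Match = "match"
    Search = "search"


class ConfigException(Exception):
    """Stand-in for tartufo's configuration error type."""


@dataclass(frozen=True)
class Rule:
    """Simplified stand-in; field names follow those used in the diff."""

    name: Optional[str]
    pattern: re.Pattern
    path_pattern: Optional[re.Pattern]
    re_match_type: MatchType
    re_match_scope: Scope


def compile_rules(patterns: List[Dict[str, str]], exclude_type: str) -> List[Rule]:
    rules: List[Rule] = []
    for pattern in patterns:
        try:
            # Assumption: "search" is the default when match-type is omitted.
            match_type = MatchType(pattern.get("match-type", MatchType.Search.value))
        except ValueError as exc:
            raise ConfigException(
                f"Invalid value for match-type: {pattern.get('match-type')}"
            ) from exc
        if exclude_type == "regex":
            # Regex exclusions always use line scope; any "scope" key is ignored.
            scope = Scope.Line
        else:
            try:
                scope = Scope(pattern.get("scope", Scope.Line.value))
            except ValueError as exc:
                raise ConfigException(
                    f"Invalid value for scope: {pattern.get('scope')}"
                ) from exc
        try:
            path = pattern.get("path-pattern")
            rules.append(
                Rule(
                    name=pattern.get("reason"),
                    pattern=re.compile(pattern["pattern"]),  # required key
                    path_pattern=re.compile(path) if path else None,
                    re_match_type=match_type,
                    re_match_scope=scope,
                )
            )
        except KeyError as exc:
            raise ConfigException(
                f"Invalid exclude-{exclude_type}-patterns: {patterns}"
            ) from exc
    return rules


config = [{"pattern": r"^SK[\d]{16,32}$", "scope": "word"}]
print(compile_rules(config, "regex")[0].re_match_scope)    # Scope.Line
print(compile_rules(config, "entropy")[0].re_match_scope)  # Scope.Word
```

Passing the exclusion type as a string also lets the `KeyError` message name the offending option, so a missing `pattern` key in a regex exclusion is reported against `exclude-regex-patterns` rather than `exclude-entropy-patterns`.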
48 changes: 43 additions & 5 deletions tartufo/scanner.py
@@ -140,6 +140,7 @@ class ScannerBase(abc.ABC): # pylint: disable=too-many-instance-attributes
     _included_paths: Optional[List[Pattern]] = None
     _excluded_paths: Optional[List[Pattern]] = None
     _excluded_entropy: Optional[List[Rule]] = None
+    _excluded_regex: Optional[List[Rule]] = None
     _rules_regexes: Optional[Set[Rule]] = None
     global_options: types.GlobalOptions
     logger: logging.Logger
@@ -272,12 +273,30 @@ def excluded_entropy(self) -> List[Rule]:
             patterns = list(self.global_options.exclude_entropy_patterns or ()) + list(
                 self.config_data.get("exclude_entropy_patterns", ())
             )
-            self._excluded_entropy = config.compile_rules(patterns) if patterns else []
+            self._excluded_entropy = (
+                config.compile_rules(patterns, "entropy") if patterns else []
+            )
             self.logger.debug(
                 "Excluded entropy was initialized as: %s", self._excluded_entropy
             )
         return self._excluded_entropy

+    @property
+    def excluded_regex(self) -> List[Rule]:
+        """Get a list of rules used to exclude regex matches from the scan."""
+        if self._excluded_regex is None:
+            self.logger.info("Initializing excluded regex patterns")
+            patterns = list(self.global_options.exclude_regex_patterns or ()) + list(
+                self.config_data.get("exclude_regex_patterns", ())
+            )
+            self._excluded_regex = (
+                config.compile_rules(patterns, "regex") if patterns else []
+            )
+            self.logger.debug(
+                "Excluded regex was initialized as: %s", self._excluded_regex
+            )
+        return self._excluded_regex
+
     @property
     def excluded_paths(self) -> List[Pattern]:
         """Get a list of regexes used to match paths to exclude from the scan"""
@@ -390,7 +409,7 @@ def signature_is_excluded(self, blob: str, file_path: str) -> bool:

     @staticmethod
     @lru_cache(maxsize=None)
-    def rule_matches(rule: Rule, string: str, line: str, path: str) -> bool:
+    def rule_matches(rule: Rule, string: Optional[str], line: str, path: str) -> bool:
         """
         Match string and path against rule.

@@ -402,6 +421,8 @@ def rule_matches(rule: Rule, string: str, line: str, path: str) -> bool:
         """
         match = False
         if rule.re_match_scope == Scope.Word:
+            if not string:
+                raise TartufoException(f"String required for {Scope.Word} scope")
             scope = string
         elif rule.re_match_scope == Scope.Line:
             scope = line
@@ -434,6 +455,18 @@ def entropy_string_is_excluded(self, string: str, line: str, path: str) -> bool:
             for p in self.excluded_entropy
         )

+    def regex_string_is_excluded(self, line: str, path: str) -> bool:
+        """Find whether a regex match has been excluded in configuration.
+
+        :param line: Source line containing string of interest
+        :param path: Path to check against rule path pattern
+        :return: True if excluded, False otherwise
+        """
+
+        return bool(self.excluded_regex) and any(
+            ScannerBase.rule_matches(p, None, line, path) for p in self.excluded_regex
+        )
+
     @staticmethod
     @lru_cache(maxsize=None)
     def calculate_entropy(data: str) -> float:
@@ -608,9 +641,14 @@ def scan_regex(self, chunk: types.Chunk) -> Generator[Issue, None, None]:
                 for match in found_strings:
                     # Filter out any explicitly "allowed" match signatures
                     if not self.signature_is_excluded(match, chunk.file_path):
-                        issue = Issue(types.IssueType.RegEx, match, chunk)
-                        issue.issue_detail = rule.name
-                        yield issue
+                        if self.regex_string_is_excluded(match, chunk.file_path):
+                            self.logger.debug(
+                                "line containing regex was excluded: %s", match
+                            )
+                        else:
+                            issue = Issue(types.IssueType.RegEx, match, chunk)
+                            issue.issue_detail = rule.name
+                            yield issue

     @property
     @abc.abstractmethod
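
The scanner hunks above pass each flagged regex match, together with the file path, to the exclusion rules. The sketch below illustrates that flow in isolation, using the two example patterns from the documentation. `ExclusionRule` is a hypothetical, trimmed-down type, and applying the same `search` or `match` operation to the path pattern is an assumption, because the body of `rule_matches` is not part of this diff.

```python
import re
from typing import Iterable, NamedTuple, Optional


class ExclusionRule(NamedTuple):
    """Hypothetical, trimmed-down stand-in for tartufo's Rule type."""

    pattern: re.Pattern
    path_pattern: Optional[re.Pattern]
    match_type: str  # "search" or "match", mirroring the match-type key


def rule_matches(rule: ExclusionRule, text: str, path: str) -> bool:
    """Return True when both the text and the file path satisfy the rule."""
    # "match" anchors at the start of the text; "search" looks anywhere in it.
    operation = "match" if rule.match_type == "match" else "search"
    if getattr(rule.pattern, operation)(text) is None:
        return False
    # Assumption: the optional path pattern is applied with the same operation.
    if rule.path_pattern is not None:
        return getattr(rule.path_pattern, operation)(path) is not None
    return True


def regex_string_is_excluded(
    match: str, path: str, rules: Iterable[ExclusionRule]
) -> bool:
    """A flagged match is excluded as soon as any exclusion rule matches it."""
    return any(rule_matches(rule, match, path) for rule in rules)


rules = [
    ExclusionRule(
        re.compile(r"^SK[\d]{16,32}$"), re.compile(r"products_.*\.txt"), "search"
    ),
    ExclusionRule(
        re.compile(r"https://\${\S+}:\${\S+}@\S+"),
        re.compile(r"\.github/workflows/.*\.yaml"),
        "search",
    ),
]

sku = "SK" + "1" * 32
url = "https://${USER}:${TOKEN}@example.com/repo.git"

print(regex_string_is_excluded(sku, "products_2023.txt", rules))          # True
print(regex_string_is_excluded(sku, "src/settings.py", rules))            # False
print(regex_string_is_excluded(url, ".github/workflows/ci.yaml", rules))  # True
```

Because the exclusion is tested against the flagged match itself, there is no separate word scope to choose from, which is consistent with the documentation's statement that regex exclusions have no `scope` key.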
4 changes: 4 additions & 0 deletions tartufo/types.py
@@ -58,6 +58,8 @@ class GlobalOptions:
     :param exclude_path_patterns: A list of paths to be excluded from the scan
     :param exclude_entropy_patterns: Patterns to be excluded from entropy
         matches
+    :param exclude_regex_patterns: Patterns to be excluded from regex
+        matches
     :param exclude_signatures: Signatures of previously found findings to be
         excluded from the list of current findings
     :param exclude_findings: Signatures of previously found findings to be
@@ -92,6 +94,7 @@ class GlobalOptions:
         "include_path_patterns",
         "exclude_path_patterns",
         "exclude_entropy_patterns",
+        "exclude_regex_patterns",
         "exclude_signatures",
         "output_dir",
         "temp_dir",
@@ -114,6 +117,7 @@ class GlobalOptions:
     include_path_patterns: Tuple[Dict[str, str], ...]
     exclude_path_patterns: Tuple[Dict[str, str], ...]
     exclude_entropy_patterns: Tuple[Dict[str, str], ...]
+    exclude_regex_patterns: Tuple[Dict[str, str], ...]
     exclude_signatures: Tuple[Dict[str, str], ...]
     output_dir: Optional[str]
     temp_dir: Optional[str]
16 changes: 16 additions & 0 deletions tartufo/util.py
@@ -123,6 +123,17 @@ def echo_report_result(scanner: "ScannerBase", now: str):
             f" {pattern} (path={path_pattern}, scope={m_scope}, type={m_type}): {reason}"
         )

+    click.echo("\nExcluded regex patterns:")
+    for e_item in scanner.excluded_regex:
+        pattern = e_item.pattern.pattern if e_item.pattern else ""
+        path_pattern = e_item.path_pattern.pattern if e_item.path_pattern else ""
+        m_scope = e_item.re_match_scope.value if e_item.re_match_scope else ""
+        m_type = e_item.re_match_type.value if e_item.re_match_type else ""
+        reason = e_item.name
+        click.echo(
+            f" {pattern} (path={path_pattern}, scope={m_scope}, type={m_type}): {reason}"
+        )
+

 def echo_result(
     options: "types.GlobalOptions",
@@ -151,6 +162,9 @@ def echo_result(
             "exclude_entropy_patterns": [
                 str(pattern) for pattern in options.exclude_entropy_patterns
             ],
+            "exclude_regex_patterns": [
+                str(pattern) for pattern in options.exclude_regex_patterns
+            ],
             # This member is for reference. Read below...
             # "found_issues": [
             # issue.as_dict(compact=options.compact) for issue in scanner.issues
@@ -191,6 +205,8 @@ def echo_result(
             click.echo("\n".join(scanner.excluded_signatures))
             click.echo("\nExcluded entropy patterns:")
             click.echo("\n".join(str(path) for path in scanner.excluded_entropy))
+            click.echo("\nExcluded regex patterns:")
+            click.echo("\n".join(str(path) for path in scanner.excluded_regex))


 def write_outputs(