Shared: Improvements to SensitiveDataHeuristics.qll by geoffw0 · Pull Request #21806 · github/codeql

geoffw0 · 2026-05-06T17:47:04Z

This PR consists of a series of small improvements to SensitiveDataHeuristics.qll, intended to find more true and fewer false sources of sensitive data. One of these changes addresses a request from a user; the rest are motivated by issues we've spotted at various points in the past. None are expected to have a big impact by themselves (but 7 changes x 5 affected languages is quite a lot of surface area).

more TPs: card.?no, api.?tok, security.?code patterns. We already had similar cases but no exact coverage for these.
fewer FPs: wildcard_no is not card.?no; profile is not file; coauthor is not oauth.
more TPs: the logic for identifying encrypted / encoded values (based on the variable name) was overly wide, excluding names such as security_code for containing code. It also handled unencrypted incorrectly: the special case prevented the en prefix from matching after un, but the crypt substring still matched on its own because that entire prefix part of the regex was optional. Copilot gets most of the credit for spotting this one.

~~Draft PR because I need to:~~

Copilot

Pull request overview

This PR updates the shared sensitive-data naming heuristics used across multiple languages to improve classification of passwords/private information (increasing true positives and reducing false positives), and refreshes language-specific tests and change notes to reflect the updated behavior.

Changes:

Refines shared sensitive-data heuristics (regex patterns and exclusions) in SensitiveDataHeuristics.qll.
Updates Swift/Python/Rust tests and expected baselines to reflect newly-detected (or no-longer-detected) sensitive data sources.
Adds per-language change notes documenting the heuristics improvement.

| File | Description |
| --- | --- |
| shared/concepts/codeql/concepts/internal/SensitiveDataHeuristics.qll | Updates shared sensitive-data heuristic patterns/exclusions used by multiple languages. |
| rust/ql/test/library-tests/sensitivedata/test.rs | Extends Rust library test coverage for the updated sensitive-data name heuristics. |
| python/ql/test/query-tests/Security/CWE-312-CleartextLogging/test.py | Updates Python cleartext logging test to reflect newly-classified sensitive values. |
| swift/ql/test/query-tests/Security/CWE-328/testCryptoKit.swift | Extends Swift hashing tests to cover additional API spellings. |
| swift/ql/test/query-tests/Security/CWE-311/testSend.swift | Updates Swift transmission test to reflect newly-detected sensitive field. |
| swift/ql/test/query-tests/Security/CWE-328/WeakSensitiveDataHashing.expected | Updates Swift expected results baseline for weak sensitive data hashing. |
| swift/ql/test/query-tests/Security/CWE-328/WeakPasswordHashing.expected | Updates Swift expected results baseline for weak password hashing. |
| swift/ql/test/query-tests/Security/CWE-311/SensitiveExprs.expected | Updates Swift expected sensitive-expression baseline. |
| swift/ql/test/query-tests/Security/CWE-311/CleartextTransmission.expected | Updates Swift expected cleartext transmission baseline. |
| swift/ql/lib/change-notes/2026-05-14-sensitive-data.md | Adds Swift change note for the sensitive-data heuristics update. |
| rust/ql/lib/change-notes/2026-05-14-sensitive-data.md | Adds Rust change note for the sensitive-data heuristics update. |
| python/ql/lib/change-notes/2026-05-14-sensitive-data.md | Adds Python change note for the sensitive-data heuristics update. |
| javascript/ql/lib/change-notes/2026-05-14-sensitive-data.md | Adds JavaScript change note for the sensitive-data heuristics update. |

Copilot's findings

Files reviewed: 14/14 changed files
Comments generated: 5

andersfugmann · 2026-05-18T13:25:51Z

    result =
      "(?is).*(pass(wd|word|code|.?phrase)(?!.*question)|(auth(entication|ori[sz]ation)?).?key|oauth|"
-        + "api.?(key|token)|([_-]|\\b)mfa([_-]|\\b)).*"
+        + "api.?(key|tok)|([_-]|\\b)mfa([_-]|\\b)).*"


This no longer accepts token, e.g. api-token, but does accept api-tok, which seems somewhat strange.

Should tok be tok(en)?

geoffw0 · 2026-05-18

It will still accept api-token because the regex is followed by .*, so we're effectively matching a substring here. There are a couple of Rust test cases that examine this:

    sink(api_token); // $ sensitive=password
    sink(api_tok); // $ sensitive=password
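
The substring behaviour described in this reply can be checked outside CodeQL. The following is a minimal sketch using Python's `re` module rather than QL, assuming the pattern is applied as a whole-string match (which the leading and trailing `.*` suggest). The helper name `looks_like_password` and the sample names other than `api_tok` and `api_token` are made up for illustration.

```python
import re

# New password-like pattern, transcribed from the diff above with the QL
# string escapes (\\b) rewritten as plain regex escapes (\b).
MAYBE_PASSWORD = (
    r"(?is).*(pass(wd|word|code|.?phrase)(?!.*question)"
    r"|(auth(entication|ori[sz]ation)?).?key|oauth|"
    r"api.?(key|tok)|([_-]|\b)mfa([_-]|\b)).*"
)


def looks_like_password(name: str) -> bool:
    """Hypothetical helper: whole-string match against the pattern."""
    return re.fullmatch(MAYBE_PASSWORD, name) is not None


# "api_token" still matches: api.?tok consumes "api_tok" and the
# trailing .* absorbs the remaining "en".
for name in ["api_tok", "api_token", "api-token", "apiToken", "tokenizer"]:
    print(f"{name}: {looks_like_password(name)}")
```

Because of the trailing `.*`, shortening `token` to `tok` can only widen the set of matched names, never narrow it.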

andersfugmann · 2026-05-18T13:27:35Z

        // Financial data - such as credit card numbers, salary, bank accounts, and debts
-        "(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)|acc(ou)?nt.?(no|num|credit)|routing.?num|"
+        "(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)|(card|acc(ou)?nt).?(no|num|credit)|routing.?num|"
        + "salary|billing|beneficiary|credit.?(rating|score)|([_-]|\\b)(ccn|cvv|iban)([_-]|\\b)|" +


Nit: The new regex accepts strings like cardCredit, which the old one did not.

geoffw0 · 2026-05-18

Yeah, I thought about this case and decided that (1) it's unlikely to come up, but more importantly (2) if someone has a variable called cardCredit, there's a very good chance that's sensitive data anyway (e.g. the amount of credit someone has on a card?).
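
To see which names the widened alternative picks up, here is a small Python sketch comparing the old and new first line of the financial pattern. Only that line is visible in the diff, so the `(?is).*( ... ).*` wrapper, the helper `matches`, and the sample names are assumptions for illustration, not the exact regex CodeQL evaluates.

```python
import re

# First line of the financial-data pattern before and after the change.
OLD_FRAGMENT = (
    r"(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)"
    r"|acc(ou)?nt.?(no|num|credit)|routing.?num"
)
NEW_FRAGMENT = (
    r"(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)"
    r"|(card|acc(ou)?nt).?(no|num|credit)|routing.?num"
)


def matches(fragment: str, name: str) -> bool:
    """Hypothetical helper: case-insensitive substring-style match."""
    return re.fullmatch(r"(?is).*(" + fragment + r").*", name) is not None


for name in ["card_no", "cardCredit", "account_no", "wildcard_no", "postcard"]:
    print(f"{name}: old={matches(OLD_FRAGMENT, name)} new={matches(NEW_FRAGMENT, name)}")
```

Note that `wildcard_no` matches the new fragment as well, which is why the PR pairs this change with a `wildcard` entry in the not-sensitive pattern.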

andersfugmann · 2026-05-18T13:28:45Z

-      "(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|((?<!un)(en))?(crypt|(?<!pass)code)|"
-        + "certain|concert|secretar|account(ant|ab|ing|ed)|file|path|([_-]|\\b)url).*"
+      "(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|(?<!unen)crypt|(?<!un)encode|" +
+        "certain|concert|secretar|wildcard|coauthor|account(ant|ab|ing|ed)|(?<!pro)file|path|([_-]|\\b)url).*"


The new regex no longer accepts unencrypt. Don't know if that's on purpose.

geoffw0 · 2026-05-18

Yes, that's on purpose. The original was supposed to match crypt and encrypt but not unencrypt, but it actually did match unencrypt (by ignoring the optional bit and just matching .*crypt.*). The new version matches encrypt and crypt but not unencrypt.
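
The difference between the two exclusion patterns can be reproduced with Python's `re` module, which accepts both as written once the QL string escapes are unescaped. This is a sketch under the assumption of whole-string matching. The helper `excluded` and the sample variable names are hypothetical.

```python
import re

# Not-sensitive (exclusion) patterns from the diff above.
OLD_NOT_SENSITIVE = (
    r"(?is).*([^\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|"
    r"((?<!un)(en))?(crypt|(?<!pass)code)|"
    r"certain|concert|secretar|account(ant|ab|ing|ed)|file|path|([_-]|\b)url).*"
)
NEW_NOT_SENSITIVE = (
    r"(?is).*([^\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|"
    r"(?<!unen)crypt|(?<!un)encode|"
    r"certain|concert|secretar|wildcard|coauthor|account(ant|ab|ing|ed)|"
    r"(?<!pro)file|path|([_-]|\b)url).*"
)


def excluded(pattern: str, name: str) -> bool:
    """Hypothetical helper: True if the name is treated as not sensitive."""
    return re.fullmatch(pattern, name) is not None


names = [
    "encrypted_password",    # excluded by both versions
    "unencrypted_password",  # old: excluded via the bare "crypt" branch
    "security_code",         # old: excluded via the bare "code" branch
    "profile",               # old: excluded via "file"
    "coauthor",              # new: excluded explicitly
    "wildcard_no",           # new: excluded explicitly
]
for name in names:
    print(f"{name}: old={excluded(OLD_NOT_SENSITIVE, name)} new={excluded(NEW_NOT_SENSITIVE, name)}")
```

In the old pattern the lookbehind sits inside an optional group, so the engine can skip the group and match `crypt` directly. The new pattern attaches the lookbehind to `crypt` itself, so it cannot be bypassed.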

yoff

Python 👍

geoffw0 · 2026-05-19T15:21:57Z

Thanks for the reviews, I'm going to merge this now but I'm happy to respond to any further comments post-merge.

geoffw0 added 10 commits May 6, 2026 10:27

Swift: Test spacing.

b6155ff

Swift: Add test cases for an alternative pattern of calls to Insecure…

dc863c3

….MD5.hash.

Rust: Additional test cases for sensitive data heuristics.

d95001f

Shared: Add 'card.?no' sensitive data heuristic.

07d4df1

Shared: Fix for 'wildcard'.

cb84e63

Shared: Fix for 'profile'.

b60ce3c

Shared: Fix for 'api_tok'.

213ab90

Shared: Fix for 'coauthor'.

6e2fb6f

Shared: Fix and simplify the exclusion for 'encrypted' values.

5ed78d1

Shared: Add 'security_code' sensitive data heuristic.

f2f4f4c

geoffw0 added the Python, Ruby, Rust, Swift, and javascript labels May 6, 2026

github-actions Bot removed the Python and Ruby labels May 6, 2026

geoffw0 added 6 commits May 7, 2026 10:01

Shared: Autoformat.

809da0f

Merge remote-tracking branch 'upstream/main' into extsensitive

7c72898

Swift: Accept test changes (improvement).

0f8b0a7

Javascript: Accept test changes (regression).

ea711b0

Python: Accept test changes (improvement).

1c704a0

Shared: Small adjustment to the encrypt not-sensitive regex.

df37b50

github-actions Bot added the Python label May 7, 2026

geoffw0 added 5 commits May 7, 2026 17:21

Shared: Autoformat.

3694631

Merge branch 'main' into extsensitive

af0124f

Merge branch 'main' into extsensitive

51dae16

Merge branch 'main' into extsensitive

c8196e4

Add change notes.

59dbd68

github-actions Bot added the JS label May 14, 2026

github-actions Bot added the documentation label May 14, 2026

geoffw0 marked this pull request as ready for review May 14, 2026 16:25

geoffw0 requested review from a team as code owners May 14, 2026 16:25

Copilot AI review requested due to automatic review settings May 14, 2026 16:25

geoffw0 requested a review from a team as a code owner May 14, 2026 16:25

Copilot started reviewing on behalf of geoffw0 May 14, 2026 16:26

Copilot AI reviewed May 14, 2026

Update change notes (Copilot's suggestions).

a4b2c0f

andersfugmann reviewed May 18, 2026

owen-mc approved these changes May 18, 2026

yoff approved these changes May 19, 2026

geoffw0 merged commit 3aa6606 into github:main May 19, 2026
140 checks passed

geoffw0 merged 22 commits into github:main from geoffw0:extsensitive

5 participants
