Conversation
|
|
||
| /** Google ad-serving domains whose URLs should be proxied (exact match). */ | ||
| const GPT_DOMAINS = [ | ||
| 'securepubads.g.doubleclick.net', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
General approach: whenever we turn a hostname string into a regular expression, we should escape all regex metacharacters, not just dots, so that the regex always represents a literal hostname match. This prevents any accidental regex behavior if hostname strings change in the future.
Best minimal fix here: introduce a small helper that takes a hostname string and returns its fully regex-escaped form, then use that helper instead of domain.replace(/\./g, '\\.') when constructing the RegExp in the rewriteUrl fallback. This keeps existing behavior for the current hostnames while making the code robust against future changes, and directly addresses the CodeQL concern that the domain strings are used as regex patterns.
Concrete changes in crates/js/lib/src/integrations/gpt/script_guard.ts:
- Add a helper function, e.g.
escapeHostnameForRegex(hostname: string): string, that replaces all regex metacharacters ([\\^$.*+?()[]{}|]) with escaped versions. - In the loop within the
catchblock ofrewriteUrl, replace the inlinedomain.replace(/\./g, '\\.')call withescapeHostnameForRegex(domain). - Keep all existing logic and data structures unchanged; we’re only improving the regex-construction step.
No external libraries are needed; we can implement the escape helper with a simple .replace call.
| @@ -51,6 +51,15 @@ | ||
| /** Regex to match wildcard `*.safeframe.googlesyndication.com` subdomains. */ | ||
| const SAFEFRAME_WILDCARD_RE = /[a-zA-Z0-9_-]+\.safeframe\.googlesyndication\.com/i; | ||
|
|
||
| /** | ||
| * Escape a hostname string so it can be safely interpolated into a RegExp | ||
| * as a literal. This prevents regex metacharacters from changing the match | ||
| * semantics if domain strings ever contain them. | ||
| */ | ||
| function escapeHostnameForRegex(hostname: string): string { | ||
| return hostname.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); | ||
| } | ||
|
|
||
| /** Integration route prefix on the first-party domain. */ | ||
| const PROXY_PREFIX = '/integrations/gpt'; | ||
|
|
||
| @@ -97,8 +106,9 @@ | ||
| for (const domain of GPT_DOMAINS) { | ||
| if (lower.includes(domain)) { | ||
| const prefix = hostPrefixForDomain(domain); | ||
| const escapedDomain = escapeHostnameForRegex(domain); | ||
| return originalUrl.replace( | ||
| new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i'), | ||
| new RegExp(`https?://(?:www\\.)?${escapedDomain}`, 'i'), | ||
| `${window.location.protocol}//${window.location.host}${PROXY_PREFIX}${prefix}`, | ||
| ); | ||
| } |
| /** Google ad-serving domains whose URLs should be proxied (exact match). */ | ||
| const GPT_DOMAINS = [ | ||
| 'securepubads.g.doubleclick.net', | ||
| 'pagead2.googlesyndication.com', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Copilot Autofix
AI 1 day ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| const GPT_DOMAINS = [ | ||
| 'securepubads.g.doubleclick.net', | ||
| 'pagead2.googlesyndication.com', | ||
| 'tpc.googlesyndication.com', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 9 days ago
In general, to avoid incomplete hostname regular expressions, any string used to build a regex should be run through a generic “escape for regex literal” function that escapes all regex metacharacters, not only dots. This ensures future additions to GPT_DOMAINS cannot accidentally introduce patterns that match more than the literal hostname.
Concretely, in crates/js/lib/src/integrations/gpt/script_guard.ts, the fallback block in rewriteUrl currently builds the regex with:
new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i')This manually escapes only dots in domain. Replace this with a helper escapeRegex (defined in this file) that escapes every regex metacharacter: \ ^ $ * + ? . ( ) | { } [ ]. Use that helper both for domain and for any other future regex constructions based on literal strings if needed. The change is localized to this file: add the helper function (near the top, after constants or helpers) and change the new RegExp(...) call to use escapeRegex(domain) instead of domain.replace(/\./g, '\\.'). No behavior changes for current values, but it becomes robust and satisfies the security rule.
| @@ -54,6 +54,13 @@ | ||
| /** Integration route prefix on the first-party domain. */ | ||
| const PROXY_PREFIX = '/integrations/gpt'; | ||
|
|
||
| /** | ||
| * Escape a string so it can be safely used inside a RegExp literal. | ||
| */ | ||
| function escapeRegex(value: string): string { | ||
| return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); | ||
| } | ||
|
|
||
| // --------------------------------------------------------------------------- | ||
| // URL matching and rewriting | ||
| // --------------------------------------------------------------------------- | ||
| @@ -98,7 +105,7 @@ | ||
| if (lower.includes(domain)) { | ||
| const prefix = hostPrefixForDomain(domain); | ||
| return originalUrl.replace( | ||
| new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i'), | ||
| new RegExp(`https?://(?:www\\.)?${escapeRegex(domain)}`, 'i'), | ||
| `${window.location.protocol}//${window.location.host}${PROXY_PREFIX}${prefix}`, | ||
| ); | ||
| } |
| 'pagead2.googlesyndication.com', | ||
| 'tpc.googlesyndication.com', | ||
| 'googletagservices.com', | ||
| 'www.googletagservices.com', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Copilot Autofix
AI 1 day ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| 'tpc.googlesyndication.com', | ||
| 'googletagservices.com', | ||
| 'www.googletagservices.com', | ||
| 'cm.g.doubleclick.net', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Copilot Autofix
AI 1 day ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| 'cm.g.doubleclick.net', | ||
| 'ep1.adtrafficquality.google', | ||
| 'ep2.adtrafficquality.google', | ||
| 'www.googleadservices.com', |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames
Copilot Autofix
AI 1 day ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
b6ec286 to
b811668
Compare
|
We should consider the scope of this integration. Currently this is doing a lot of rewriting and proxying it's not catching everything but many scripts and 3rd party calls are proxied through the 1st party context. |
a800ad7 to
264f64f
Compare
264f64f to
597bef9
Compare
|
@ChristianPavilonis Can you please make changes or resolve comment for the previous review? |
|
Made some changes @aram356 |
d03f6aa to
295b166
Compare
There was a problem hiding this comment.
Summary
👍 Solid GPT first-party proxy integration. The Rust code follows existing integration patterns well (error-stack, log::, Arc, IntegrationRegistration::builder). The client-side script guard is thorough with six interception layers. Two documentation accuracy issues need fixing before merge.
Findings
🔧 High (P1)
X-Script-Sourceheader claimed but not implemented:docs/guide/integrations/gpt.md:76claims proxy responses include anX-Script-Sourceheader, butgpt.rsnever sets it. Either add the header or fix the docs. (inline comment)- Wrong GitHub org in doc links:
docs/guide/integrations/gpt.md:150-151links toAnomalyCo/trusted-serverinstead ofIABTechLab/trusted-server. (inline comment)
🤔 Medium (P2)
- Awkward module doc line break:
gpt.rs:13-15splits a sentence mid-phrase, creating an orphan line in rendered docs. (inline comment) - Unverified doc links:
docs/guide/integrations/gpt.md:155-158references/guide/integrations-overview,/guide/configuration,/guide/first-party-proxy, and/guide/integrations/gam— verify these pages exist or remove the links.
⛏ Low (P3)
- Inconsistent dash style:
gpt.mduses ASCII--throughout while the rest of the project uses em-dash (—).
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- CodeQL: PASS
| - `GET /integrations/gpt/pagead/*` -- Proxies secondary GPT scripts and resources | ||
| - `GET /integrations/gpt/tag/*` -- Proxies tag-path resources | ||
|
|
||
| Successful proxy responses include `X-GPT-Proxy: true` and `X-Script-Source` headers for debugging. |
There was a problem hiding this comment.
🔧 X-Script-Source header doesn't exist in code: The docs claim proxy responses include both X-GPT-Proxy: true and X-Script-Source headers, but gpt.rs only sets X-GPT-Proxy: true — there is no X-Script-Source header anywhere in the implementation.
Fix: Either add response.set_header("X-Script-Source", &target_url) to both handle_script_serving and handle_pagead_proxy, or remove X-Script-Source from this line.
|
|
||
| ## Implementation | ||
|
|
||
| - **Rust**: [crates/common/src/integrations/gpt.rs](https://github.com/AnomalyCo/trusted-server/blob/main/crates/common/src/integrations/gpt.rs) |
There was a problem hiding this comment.
🔧 Wrong GitHub org: Links point to AnomalyCo/trusted-server but should be IABTechLab/trusted-server.
Fix:
- **Rust**: [crates/common/src/integrations/gpt.rs](https://github.com/IABTechLab/trusted-server/blob/main/crates/common/src/integrations/gpt.rs)
- **TypeScript**: [crates/js/lib/src/integrations/gpt/](https://github.com/IABTechLab/trusted-server/blob/main/crates/js/lib/src/integrations/gpt/)| //! 4. Auxiliary scripts – viewability, monitoring, error reporting | ||
| //! | ||
| //! All of these are served from `securepubads.g.doubleclick.net`. The | ||
| //! integration proxies these scripts |
There was a problem hiding this comment.
🤔 Awkward doc line break: The sentence splits mid-phrase across lines 13-15:
//! integration proxies these scripts
//! through the publisher's domain while ...
This creates an orphan line in rendered docs.
Fix: Reflow to //! The integration proxies these scripts through the publisher's domain while a client-side shim intercepts
What this does
Proxies gpt.js script and additional scripts loaded by gpt.js that do the heavy lifting for ad calls for google.
Change Summary
Closes #227
Verifying GPT Script Proxying in the Browser
To confirm that GPT scripts are being proxied through the trusted server domain: