Skip to content

Comments

GPT integration#282

Open
ChristianPavilonis wants to merge 11 commits intomainfrom
feature/gpt-integration
Open

GPT integration#282
ChristianPavilonis wants to merge 11 commits intomainfrom
feature/gpt-integration

Conversation

@ChristianPavilonis
Copy link
Collaborator

@ChristianPavilonis ChristianPavilonis commented Feb 10, 2026

What this does
Proxies gpt.js script and additional scripts loaded by gpt.js that do the heavy lifting for ad calls for google.

Change Summary

  • Add a new GPT (Google Publisher Tags) integration that proxies the entire GPT script cascade through the publisher's first-party domain, eliminating third-party script loads to securepubads.g.doubleclick.net
  • Server-side: Rust module with HTML attribute rewriting, bootstrap script serving (/integrations/gpt/script), and wildcard proxy endpoints (/pagead/, /tag/) that serve Google's scripts verbatim
  • Client-side: TypeScript script guard with six interception layers (document.write, src property descriptor, setAttribute, createElement, DOM insertion patches, MutationObserver) to catch and rewrite all dynamically inserted GPT scripts back through the proxy
  • Includes a googletag.cmd command queue patch as a hook point for future synthetic ID targeting injection and consent gating
  • Adds integration documentation, unit tests (Rust + TypeScript), and default config in trusted-server.toml

Closes #227

Verifying GPT Script Proxying in the Browser
To confirm that GPT scripts are being proxied through the trusted server domain:

  1. Open DevTools > Network tab on a page with GPT ads
  2. Filter by JS or search for gpt
  3. Look for requests to your publisher domain (e.g. publisher.com/integrations/gpt/script and publisher.com/integrations/gpt/pagead/...) instead of securepubads.g.doubleclick.net
  4. Proxied responses will include the response headers X-GPT-Proxy: true and X-Script-Source:
  5. In the Console tab, look for log messages prefixed with GPT guard: confirming the interception layers are active (e.g. "GPT guard: rewriting document.write script src")


/** Google ad-serving domains whose URLs should be proxied (exact match). */
const GPT_DOMAINS = [
'securepubads.g.doubleclick.net',

Check failure

Code scanning / CodeQL

Incomplete regular expression for hostnames

This string, which is used as a regular expression [here](1), has an unescaped '.' before 'doubleclick.net', so it might match more hosts than expected.

Copilot Autofix

AI 1 day ago

General approach: whenever we turn a hostname string into a regular expression, we should escape all regex metacharacters, not just dots, so that the regex always represents a literal hostname match. This prevents any accidental regex behavior if hostname strings change in the future.

Best minimal fix here: introduce a small helper that takes a hostname string and returns its fully regex-escaped form, then use that helper instead of domain.replace(/\./g, '\\.') when constructing the RegExp in the rewriteUrl fallback. This keeps existing behavior for the current hostnames while making the code robust against future changes, and directly addresses the CodeQL concern that the domain strings are used as regex patterns.

Concrete changes in crates/js/lib/src/integrations/gpt/script_guard.ts:

  1. Add a helper function, e.g. escapeHostnameForRegex(hostname: string): string, that replaces all regex metacharacters ([\\^$.*+?()[]{}|]) with escaped versions.
  2. In the loop within the catch block of rewriteUrl, replace the inline domain.replace(/\./g, '\\.') call with escapeHostnameForRegex(domain).
  3. Keep all existing logic and data structures unchanged; we’re only improving the regex-construction step.

No external libraries are needed; we can implement the escape helper with a simple .replace call.

Suggested changeset 1
crates/js/lib/src/integrations/gpt/script_guard.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/crates/js/lib/src/integrations/gpt/script_guard.ts b/crates/js/lib/src/integrations/gpt/script_guard.ts
--- a/crates/js/lib/src/integrations/gpt/script_guard.ts
+++ b/crates/js/lib/src/integrations/gpt/script_guard.ts
@@ -51,6 +51,15 @@
 /** Regex to match wildcard `*.safeframe.googlesyndication.com` subdomains. */
 const SAFEFRAME_WILDCARD_RE = /[a-zA-Z0-9_-]+\.safeframe\.googlesyndication\.com/i;
 
+/**
+ * Escape a hostname string so it can be safely interpolated into a RegExp
+ * as a literal. This prevents regex metacharacters from changing the match
+ * semantics if domain strings ever contain them.
+ */
+function escapeHostnameForRegex(hostname: string): string {
+  return hostname.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+}
+
 /** Integration route prefix on the first-party domain. */
 const PROXY_PREFIX = '/integrations/gpt';
 
@@ -97,8 +106,9 @@
     for (const domain of GPT_DOMAINS) {
       if (lower.includes(domain)) {
         const prefix = hostPrefixForDomain(domain);
+        const escapedDomain = escapeHostnameForRegex(domain);
         return originalUrl.replace(
-          new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i'),
+          new RegExp(`https?://(?:www\\.)?${escapedDomain}`, 'i'),
           `${window.location.protocol}//${window.location.host}${PROXY_PREFIX}${prefix}`,
         );
       }
EOF
@@ -51,6 +51,15 @@
/** Regex to match wildcard `*.safeframe.googlesyndication.com` subdomains. */
const SAFEFRAME_WILDCARD_RE = /[a-zA-Z0-9_-]+\.safeframe\.googlesyndication\.com/i;

/**
* Escape a hostname string so it can be safely interpolated into a RegExp
* as a literal. This prevents regex metacharacters from changing the match
* semantics if domain strings ever contain them.
*/
function escapeHostnameForRegex(hostname: string): string {
return hostname.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

/** Integration route prefix on the first-party domain. */
const PROXY_PREFIX = '/integrations/gpt';

@@ -97,8 +106,9 @@
for (const domain of GPT_DOMAINS) {
if (lower.includes(domain)) {
const prefix = hostPrefixForDomain(domain);
const escapedDomain = escapeHostnameForRegex(domain);
return originalUrl.replace(
new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i'),
new RegExp(`https?://(?:www\\.)?${escapedDomain}`, 'i'),
`${window.location.protocol}//${window.location.host}${PROXY_PREFIX}${prefix}`,
);
}
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
/** Google ad-serving domains whose URLs should be proxied (exact match). */
const GPT_DOMAINS = [
'securepubads.g.doubleclick.net',
'pagead2.googlesyndication.com',

Check failure

Code scanning / CodeQL

Incomplete regular expression for hostnames

This string, which is used as a regular expression [here](1), has an unescaped '.' before 'googlesyndication.com', so it might match more hosts than expected.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

const GPT_DOMAINS = [
'securepubads.g.doubleclick.net',
'pagead2.googlesyndication.com',
'tpc.googlesyndication.com',

Check failure

Code scanning / CodeQL

Incomplete regular expression for hostnames

This string, which is used as a regular expression [here](1), has an unescaped '.' before 'googlesyndication.com', so it might match more hosts than expected.

Copilot Autofix

AI 9 days ago

In general, to avoid incomplete hostname regular expressions, any string used to build a regex should be run through a generic “escape for regex literal” function that escapes all regex metacharacters, not only dots. This ensures future additions to GPT_DOMAINS cannot accidentally introduce patterns that match more than the literal hostname.

Concretely, in crates/js/lib/src/integrations/gpt/script_guard.ts, the fallback block in rewriteUrl currently builds the regex with:

new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i')

This manually escapes only dots in domain. Replace this with a helper escapeRegex (defined in this file) that escapes every regex metacharacter: \ ^ $ * + ? . ( ) | { } [ ]. Use that helper both for domain and for any other future regex constructions based on literal strings if needed. The change is localized to this file: add the helper function (near the top, after constants or helpers) and change the new RegExp(...) call to use escapeRegex(domain) instead of domain.replace(/\./g, '\\.'). No behavior changes for current values, but it becomes robust and satisfies the security rule.

Suggested changeset 1
crates/js/lib/src/integrations/gpt/script_guard.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/crates/js/lib/src/integrations/gpt/script_guard.ts b/crates/js/lib/src/integrations/gpt/script_guard.ts
--- a/crates/js/lib/src/integrations/gpt/script_guard.ts
+++ b/crates/js/lib/src/integrations/gpt/script_guard.ts
@@ -54,6 +54,13 @@
 /** Integration route prefix on the first-party domain. */
 const PROXY_PREFIX = '/integrations/gpt';
 
+/**
+ * Escape a string so it can be safely used inside a RegExp literal.
+ */
+function escapeRegex(value: string): string {
+  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+}
+
 // ---------------------------------------------------------------------------
 // URL matching and rewriting
 // ---------------------------------------------------------------------------
@@ -98,7 +105,7 @@
       if (lower.includes(domain)) {
         const prefix = hostPrefixForDomain(domain);
         return originalUrl.replace(
-          new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i'),
+          new RegExp(`https?://(?:www\\.)?${escapeRegex(domain)}`, 'i'),
           `${window.location.protocol}//${window.location.host}${PROXY_PREFIX}${prefix}`,
         );
       }
EOF
@@ -54,6 +54,13 @@
/** Integration route prefix on the first-party domain. */
const PROXY_PREFIX = '/integrations/gpt';

/**
* Escape a string so it can be safely used inside a RegExp literal.
*/
function escapeRegex(value: string): string {
return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// ---------------------------------------------------------------------------
// URL matching and rewriting
// ---------------------------------------------------------------------------
@@ -98,7 +105,7 @@
if (lower.includes(domain)) {
const prefix = hostPrefixForDomain(domain);
return originalUrl.replace(
new RegExp(`https?://(?:www\\.)?${domain.replace(/\./g, '\\.')}`, 'i'),
new RegExp(`https?://(?:www\\.)?${escapeRegex(domain)}`, 'i'),
`${window.location.protocol}//${window.location.host}${PROXY_PREFIX}${prefix}`,
);
}
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
'pagead2.googlesyndication.com',
'tpc.googlesyndication.com',
'googletagservices.com',
'www.googletagservices.com',

Check failure

Code scanning / CodeQL

Incomplete regular expression for hostnames

This string, which is used as a regular expression [here](1), has an unescaped '.' before 'googletagservices.com', so it might match more hosts than expected.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

'tpc.googlesyndication.com',
'googletagservices.com',
'www.googletagservices.com',
'cm.g.doubleclick.net',

Check failure

Code scanning / CodeQL

Incomplete regular expression for hostnames

This string, which is used as a regular expression [here](1), has an unescaped '.' before 'doubleclick.net', so it might match more hosts than expected.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

'cm.g.doubleclick.net',
'ep1.adtrafficquality.google',
'ep2.adtrafficquality.google',
'www.googleadservices.com',

Check failure

Code scanning / CodeQL

Incomplete regular expression for hostnames

This string, which is used as a regular expression [here](1), has an unescaped '.' before 'googleadservices.com', so it might match more hosts than expected.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

@ChristianPavilonis
Copy link
Collaborator Author

We should consider the scope of this integration.

Currently this is doing a lot of rewriting and proxying it's not catching everything but many scripts and 3rd party calls are proxied through the 1st party context.

@ChristianPavilonis ChristianPavilonis marked this pull request as ready for review February 13, 2026 18:18
@aram356
Copy link
Collaborator

aram356 commented Feb 20, 2026

@ChristianPavilonis Can you please make changes or resolve comment for the previous review?

@ChristianPavilonis
Copy link
Collaborator Author

Made some changes @aram356

Copy link
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

👍 Solid GPT first-party proxy integration. The Rust code follows existing integration patterns well (error-stack, log::, Arc, IntegrationRegistration::builder). The client-side script guard is thorough with six interception layers. Two documentation accuracy issues need fixing before merge.

Findings

🔧 High (P1)

  • X-Script-Source header claimed but not implemented: docs/guide/integrations/gpt.md:76 claims proxy responses include an X-Script-Source header, but gpt.rs never sets it. Either add the header or fix the docs. (inline comment)
  • Wrong GitHub org in doc links: docs/guide/integrations/gpt.md:150-151 links to AnomalyCo/trusted-server instead of IABTechLab/trusted-server. (inline comment)

🤔 Medium (P2)

  • Awkward module doc line break: gpt.rs:13-15 splits a sentence mid-phrase, creating an orphan line in rendered docs. (inline comment)
  • Unverified doc links: docs/guide/integrations/gpt.md:155-158 references /guide/integrations-overview, /guide/configuration, /guide/first-party-proxy, and /guide/integrations/gam — verify these pages exist or remove the links.

⛏ Low (P3)

  • Inconsistent dash style: gpt.md uses ASCII -- throughout while the rest of the project uses em-dash (—).

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • CodeQL: PASS

- `GET /integrations/gpt/pagead/*` -- Proxies secondary GPT scripts and resources
- `GET /integrations/gpt/tag/*` -- Proxies tag-path resources

Successful proxy responses include `X-GPT-Proxy: true` and `X-Script-Source` headers for debugging.
Copy link
Collaborator

@aram356 aram356 Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 X-Script-Source header doesn't exist in code: The docs claim proxy responses include both X-GPT-Proxy: true and X-Script-Source headers, but gpt.rs only sets X-GPT-Proxy: true — there is no X-Script-Source header anywhere in the implementation.

Fix: Either add response.set_header("X-Script-Source", &target_url) to both handle_script_serving and handle_pagead_proxy, or remove X-Script-Source from this line.


## Implementation

- **Rust**: [crates/common/src/integrations/gpt.rs](https://github.com/AnomalyCo/trusted-server/blob/main/crates/common/src/integrations/gpt.rs)
Copy link
Collaborator

@aram356 aram356 Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 Wrong GitHub org: Links point to AnomalyCo/trusted-server but should be IABTechLab/trusted-server.

Fix:

- **Rust**: [crates/common/src/integrations/gpt.rs](https://github.com/IABTechLab/trusted-server/blob/main/crates/common/src/integrations/gpt.rs)
- **TypeScript**: [crates/js/lib/src/integrations/gpt/](https://github.com/IABTechLab/trusted-server/blob/main/crates/js/lib/src/integrations/gpt/)

//! 4. Auxiliary scripts – viewability, monitoring, error reporting
//!
//! All of these are served from `securepubads.g.doubleclick.net`. The
//! integration proxies these scripts
Copy link
Collaborator

@aram356 aram356 Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Awkward doc line break: The sentence splits mid-phrase across lines 13-15:

//! integration proxies these scripts
//! through the publisher's domain while ...

This creates an orphan line in rendered docs.

Fix: Reflow to //! The integration proxies these scripts through the publisher's domain while a client-side shim intercepts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

As publisher I want to host GPT script in publisher domain

2 participants