Add Google Tag Manager first-party integration#262

Open
prk-Jr wants to merge 17 commits into main from feat/gtm-integration

@prk-Jr prk-Jr commented Feb 9, 2026

Scripts like GTM and GA4 are often blocked by ad blockers or privacy extensions when loaded from third-party domains, leading to data loss. Third-party cookie deprecation further limits tracking durability.

This change transparently proxies GTM/GA4 scripts and analytics beacons through the Trusted Server, establishing a first-party context. It automatically rewrites HTML tags (including <link rel="preload">) and script content to point to local proxy endpoints, bypassing blockers and extending cookie life.

Includes:

  • Proxy endpoints for gtm.js, gtag/js, /collect, and /g/collect with configurable caching and strict validation
  • Server-side HTML rewriting for src and href attributes targeting GTM/GA domains
  • Server-side script content rewriting to redirect internal GTM/GA calls through the proxy
  • Client-side script guard (DOM interception) for dynamically inserted scripts on Next.js and SPA sites
  • Privacy enhancement: client IP addresses are not forwarded to Google (Google sees only the edge server IP)
  • Regex hardened against subdomain spoofing (e.g., www.googletagmanager.com.evil.com)
  • Comprehensive testing: 18 unit/integration tests covering configuration, URL rewriting, HTML pipeline, header validation, and IP stripping
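The IP-stripping item above can be illustrated with a minimal sketch. It is written in TypeScript for brevity and is not the PR's server code. The header names in `CLIENT_IP_HEADERS` and the helper `stripClientIpHeaders` are assumptions, since the description does not say which headers are removed.

```typescript
// Assumed header names: the PR does not list which headers it strips.
const CLIENT_IP_HEADERS: readonly string[] = [
  "x-forwarded-for",
  "x-real-ip",
  "forwarded",
  "true-client-ip",
];

type HeaderMap = Record<string, string>;

// Return a copy of `headers` without entries that could expose the client IP.
// Header names are compared case-insensitively.
function stripClientIpHeaders(headers: HeaderMap): HeaderMap {
  const out: HeaderMap = {};
  for (const key of Object.keys(headers)) {
    const value = headers[key];
    const isIpHeader = CLIENT_IP_HEADERS.indexOf(key.toLowerCase()) !== -1;
    if (value !== undefined && !isIpHeader) {
      out[key] = value;
    }
  }
  return out;
}

// Usage: only the non-identifying headers survive for the upstream request.
const upstreamHeaders = stripClientIpHeaders({
  "User-Agent": "Mozilla/5.0",
  "X-Forwarded-For": "203.0.113.7",
  "X-Real-IP": "203.0.113.7",
  "Accept": "*/*",
});
```

An allow-list of headers to forward would be a stricter alternative to this deny-list; which approach the PR takes is not stated here.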

Manual Test Plan

Prerequisites: Configure .env with GTM enabled and a valid container ID, then start the local server.

1. Script proxy — gtm.js rewritten

curl -s 'http://127.0.0.1:7676/integrations/google_tag_manager/gtm.js?id=GTM-522ZT3X6' \
  | grep -c '/integrations/google_tag_manager'

Expected: count > 0

2. Script proxy — gtag/js returns 200

curl -s -o /dev/null -w '%{http_code}' \
  'http://127.0.0.1:7676/integrations/google_tag_manager/gtag/js?id=G-DQMZGMPHXN'

Expected: 200

3. gtag/js content rewritten (no Google domains)

curl -s 'http://127.0.0.1:7676/integrations/google_tag_manager/gtag/js?id=G-DQMZGMPHXN' \
  | grep -cF 'www.google-analytics.com'

Expected: 0

4. Beacon proxy — POST /g/collect

curl -s -o /dev/null -w '%{http_code}' -X POST \
  'http://127.0.0.1:7676/integrations/google_tag_manager/g/collect?v=2&tid=G-TEST&cid=123&en=page_view' \
  -d 'v=2&tid=G-TEST'

Expected: 204 or 200

5. Beacon proxy — GET /collect

curl -s -o /dev/null -w '%{http_code}' \
  'http://127.0.0.1:7676/integrations/google_tag_manager/collect?v=2&tid=G-TEST'

Expected: 204 or 200

6. Cache headers present

curl -sI 'http://127.0.0.1:7676/integrations/google_tag_manager/gtm.js?id=GTM-522ZT3X6' \
  | grep -i cache-control

Expected: cache-control: public, max-age=900

Resolves: #224

Scripts like GTM and GA4 are often blocked by ad blockers and privacy extensions when loaded from third-party domains, leading to data loss. Third-party cookie deprecation further limits tracking durability.

This change proxies GTM scripts and analytics beacons through the Trusted Server, establishing a first-party context. It automatically rewrites HTML tags and script content to point to local proxy endpoints, bypassing blockers and extending cookie life.

Includes:

Proxy endpoints for gtm.js and /collect
Content rewriting for redirecting internal GTM calls
Configuration and integration tests

Resolves: #224
@prk-Jr prk-Jr self-assigned this Feb 9, 2026
@prk-Jr prk-Jr marked this pull request as draft February 9, 2026 09:20
@prk-Jr prk-Jr marked this pull request as ready for review February 9, 2026 10:23
@prk-Jr prk-Jr marked this pull request as draft February 9, 2026 10:24
Adds comprehensive tests for:
- GTM configuration parsing and default values
- HTML processor pipeline integration
- Response body rewriting logic
@prk-Jr prk-Jr marked this pull request as ready for review February 9, 2026 11:13

@aram356 aram356 left a comment

🔧 Please make sure checks pass before assigning to review.

@prk-Jr prk-Jr requested a review from aram356 February 10, 2026 12:20

@aram356 aram356 left a comment

Good start. The specific items need to be addressed, along with the following.

Duplicated rewrite logic across three places

The GTM URL rewriting logic exists in three separate methods: rewrite_gtm_script(), IntegrationAttributeRewriter::rewrite(), and IntegrationScriptRewriter::rewrite(). Each handles a slightly different set of patterns. This is error-prone — a new URL pattern needs to be added in multiple places.
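One way to remove the duplication is a single core function that owns the URL rules, with thin wrappers for attributes and script text. The sketch below shows the shape in TypeScript rather than the PR's Rust. The names `ROUTES`, `rewriteGtmUrl`, `rewriteAttribute`, and `rewriteScriptText` are hypothetical, and the host-to-path table is an assumption based on the endpoints listed in the description.

```typescript
// Proxy prefix as it appears in the manual test plan.
const PROXY_PREFIX = "/integrations/google_tag_manager";

// Single source of truth (assumed mapping): upstream host -> proxied paths.
const ROUTES: Record<string, readonly string[]> = {
  "www.googletagmanager.com": ["/gtm.js", "/gtag/js"],
  "www.google-analytics.com": ["/collect", "/g/collect"],
};

// Core rule: map one absolute or protocol-relative URL to its proxy path.
// Returns null when the host or path has no proxy route.
function rewriteGtmUrl(url: string): string | null {
  const m = /^(?:https?:)?\/\/([^\/?#]+)(\/[^?#]*)?(.*)$/.exec(url);
  if (m === null) return null;
  const host = (m[1] ?? "").toLowerCase();
  const path = m[2] ?? "";
  const rest = m[3] ?? "";
  const paths = ROUTES[host];
  if (paths === undefined || paths.indexOf(path) === -1) return null;
  return PROXY_PREFIX + path + rest;
}

// Wrapper for HTML attributes: only src and href are candidates.
function rewriteAttribute(attr: string, value: string): string {
  if (attr !== "src" && attr !== "href") return value;
  return rewriteGtmUrl(value) ?? value;
}

// Wrapper for script text: each quoted URL goes through the same core rule.
function rewriteScriptText(text: string): string {
  return text.replace(
    /(["'])((?:https?:)?\/\/[^"'\s]+)\1/g,
    (whole: string, quote: string, url: string): string => {
      const rewritten = rewriteGtmUrl(url);
      return rewritten === null ? whole : quote + rewritten + quote;
    },
  );
}
```

With this layout, supporting a new URL pattern means adding one entry to `ROUTES`; neither wrapper needs to change.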

…, set default enablement to false, and update documentation for handling
@prk-Jr prk-Jr requested a review from aram356 February 13, 2026 09:33
@prk-Jr prk-Jr requested a review from aram356 February 16, 2026 07:56
…nctions, improving request configuration for beacons and scripts
@prk-Jr prk-Jr marked this pull request as draft February 16, 2026 16:07
@prk-Jr prk-Jr marked this pull request as ready for review February 17, 2026 11:29

@ChristianPavilonis ChristianPavilonis left a comment

For the rewrites to work with Next.js sites we need to use the scriptGuard pattern. Next.js sites can load scripts dynamically after the initial HTML is served, so we also need to catch and rewrite the script tags on the client side.

see: https://github.com/IABTechLab/trusted-server/blob/main/crates/js/lib/src/integrations/permutive/script_guard.ts
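The idea behind a script guard can be sketched as follows. This is not the linked `script_guard.ts` or its API. It is a minimal illustration that wraps an `appendChild`-like method on an injected object so it runs outside a browser; a real guard would patch the DOM insertion methods instead. `ScriptLike`, `ParentLike`, `installScriptGuard`, and `toFirstParty` are hypothetical names.

```typescript
// Minimal structural types so the sketch runs without a DOM.
interface ScriptLike { tagName: string; src: string; }
interface ParentLike { appendChild(node: ScriptLike): ScriptLike; }

// Assumed rule: proxy only gtm.js and gtag/js from www.googletagmanager.com.
function toFirstParty(src: string): string | null {
  const m = /^(?:https?:)?\/\/www\.googletagmanager\.com(\/(?:gtm\.js|gtag\/js)(?:[?#].*)?)$/.exec(src);
  return m === null ? null : "/integrations/google_tag_manager" + (m[1] ?? "");
}

// Wrap appendChild so script nodes are rewritten just before insertion.
function installScriptGuard(
  parent: ParentLike,
  rewrite: (src: string) => string | null,
): void {
  const original = parent.appendChild.bind(parent);
  parent.appendChild = (node: ScriptLike): ScriptLike => {
    if (node.tagName.toUpperCase() === "SCRIPT" && node.src !== "") {
      const proxied = rewrite(node.src);
      if (proxied !== null) node.src = proxied;
    }
    return original(node);
  };
}

// Usage with a stand-in for document.head that records what was inserted.
const insertedSrcs: string[] = [];
const fakeHead: ParentLike = {
  appendChild(node: ScriptLike): ScriptLike {
    insertedSrcs.push(node.src);
    return node;
  },
};
installScriptGuard(fakeHead, toFirstParty);
fakeHead.appendChild({ tagName: "script", src: "https://www.googletagmanager.com/gtm.js?id=GTM-X" });
fakeHead.appendChild({ tagName: "script", src: "https://cdn.example.com/app.js" });
```

Rewriting before the node reaches the original method matters because a browser starts fetching a script as soon as it is attached to the document.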

aram356 commented Feb 19, 2026

@prk-Jr Please add a (manual) test plan.


@aram356 aram356 left a comment

Looks good overall. I would like to see a test plan and the script guard addressed.

- Widen IntegrationAttributeRewriter to rewrite href/src for gtag/js
  and google-analytics.com URLs (not just gtm.js), fixing <link
  rel=preload> tags not being rewritten on Next.js sites
- Add client-side script guard for dynamically inserted GTM/GA scripts
  using the shared createScriptGuard factory (matches DataDome pattern)
- Harden URL regex with delimiter capture group to prevent subdomain
  spoofing (e.g., www.googletagmanager.com.evil.com)
- Add is_rewritable_url helper to selectively rewrite only URLs with
  corresponding proxy routes (excludes ns.html)
- Document gtag/js endpoint in integration guide
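The effect of the delimiter capture group can be shown by comparing a naive pattern with a hardened one. The sketch is in TypeScript and both patterns are illustrative; the PR's actual Rust pattern is not shown in this thread. `NAIVE_PATTERN`, `HARDENED_PATTERN`, and the two helper functions are hypothetical names.

```typescript
const GTM_PROXY = "/integrations/google_tag_manager";

// Naive: matches the host as a prefix, so "….com.evil.com" also matches.
const NAIVE_PATTERN = /(?:https?:)?\/\/www\.googletagmanager\.com/;

// Hardened: the host must be followed by a URL delimiter, a quote,
// whitespace, or the end of input. The delimiter is captured so it can be
// put back after the replacement.
const HARDENED_PATTERN = /(?:https?:)?\/\/www\.googletagmanager\.com([\/?#"'\s]|$)/;

function rewriteNaive(text: string): string {
  return text.replace(NAIVE_PATTERN, GTM_PROXY);
}

function rewriteHardened(text: string): string {
  return text.replace(
    HARDENED_PATTERN,
    (_match: string, delimiter: string): string => GTM_PROXY + delimiter,
  );
}

const genuineUrl = "https://www.googletagmanager.com/gtm.js?id=GTM-X";
const spoofedUrl = "https://www.googletagmanager.com.evil.com/gtm.js";

// The genuine URL is rewritten by both patterns.
const genuineResult = rewriteHardened(genuineUrl);
// The naive pattern mangles the spoofed URL; the hardened one leaves it alone.
const spoofedNaive = rewriteNaive(spoofedUrl);
const spoofedHardened = rewriteHardened(spoofedUrl);
```

Because the spoofed host continues with `.` after `googletagmanager.com`, and `.` is not in the delimiter set, the hardened pattern does not match it.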
@prk-Jr prk-Jr marked this pull request as draft February 20, 2026 15:07
gtag.js constructs beacon URLs dynamically from bare domain strings,
so rewriting them at the script level produces broken URLs. Instead,
add a shared beacon_guard that patches navigator.sendBeacon and
window.fetch at runtime to intercept requests to google-analytics.com
and analytics.google.com, rewriting them to the first-party proxy.
- Add shared beacon_guard.ts factory (sendBeacon + fetch interception)
- Wire GTM integration to install beacon guard on init
- Require // prefix in Rust GTM_URL_PATTERN to prevent bare domain rewrites
- Add tests for both shared factory and GTM-specific beacon interception
- Use status 200 instead of 204 (jsdom rejects 204 as null-body status)
- Use absolute URLs in test rewriteUrl to satisfy jsdom's Request constructor
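The runtime interception described in this commit can be sketched as below. This is not the PR's `beacon_guard.ts`. It wraps `sendBeacon`-like and `fetch`-like functions on an injected object so it runs outside a browser, and it handles string URLs only, whereas a real `fetch` also accepts `Request` objects. `BeaconTarget`, `toProxyBeacon`, and `installBeaconGuard` are hypothetical names, and the subdomain matching rule is an assumption.

```typescript
// Minimal stand-in for the browser objects that expose sendBeacon and fetch.
interface BeaconTarget {
  sendBeacon: (url: string, data?: unknown) => boolean;
  fetch: (input: string, init?: unknown) => unknown;
}

const BEACON_PROXY = "/integrations/google_tag_manager";
const ANALYTICS_DOMAINS = ["google-analytics.com", "analytics.google.com"];

// Map an analytics URL to the first-party proxy, or return null.
// A host matches if it equals a listed domain or is a subdomain of it.
function toProxyBeacon(url: string): string | null {
  const m = /^(?:https?:)?\/\/([^\/?#]+)(\/[^?#]*)?(.*)$/.exec(url);
  if (m === null) return null;
  const host = (m[1] ?? "").toLowerCase();
  const isAnalytics = ANALYTICS_DOMAINS.some(
    (d) => host === d || host.slice(-(d.length + 1)) === "." + d,
  );
  return isAnalytics ? BEACON_PROXY + (m[2] ?? "") + (m[3] ?? "") : null;
}

// Wrap both entry points; non-matching URLs pass through unchanged.
function installBeaconGuard(
  target: BeaconTarget,
  rewrite: (url: string) => string | null,
): void {
  const origBeacon = target.sendBeacon.bind(target);
  const origFetch = target.fetch.bind(target);
  target.sendBeacon = (url, data) => origBeacon(rewrite(url) ?? url, data);
  target.fetch = (input, init) => origFetch(rewrite(input) ?? input, init);
}

// Usage with a fake runtime that records the URLs it receives.
const seenRequests: string[] = [];
const fakeRuntime: BeaconTarget = {
  sendBeacon: (url) => { seenRequests.push("beacon " + url); return true; },
  fetch: (input) => { seenRequests.push("fetch " + input); return undefined; },
};
installBeaconGuard(fakeRuntime, toProxyBeacon);
fakeRuntime.sendBeacon("https://www.google-analytics.com/g/collect?v=2&tid=G-TEST");
fakeRuntime.fetch("https://analytics.google.com/g/collect?v=2");
fakeRuntime.fetch("https://api.example.com/data");
```

Intercepting at call time avoids the problem named in the commit message: the full URL exists only after gtag.js has assembled it, so that is the first point at which it can be rewritten reliably.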
@prk-Jr prk-Jr marked this pull request as ready for review February 20, 2026 15:51

@aram356 aram356 left a comment

@prk-Jr Please deploy to staging for autoblog.com; I did not see that in the plan.

Development

Successfully merging this pull request may close these issues.

As publisher I want to host GTM in publisher domain

3 participants