
TIKA-4679: Add HTTP/2 support to tika-server via Jetty http2-server#2672

Draft
nddipiazza wants to merge 5 commits into main from TIKA-4679-http2-server-support


@nddipiazza
Contributor

Summary

Adds HTTP/2 (h2c cleartext) support to tika-server by including the org.eclipse.jetty.http2:http2-server jar on the classpath. When this jar is present, CXF's Jetty transport automatically negotiates HTTP/2 alongside HTTP/1.1 on the existing port (default 9998). Existing HTTP/1.1 clients are completely unaffected.

This implements TIKA-4679. The core dependency change was originally contributed by Lawrence Moorehead (@elemdisc) — see elemdisc/tika PR#1 — and is cherry-picked here with full author credit.


Changes

tika-parent/pom.xml

  • Added http2-server to the dependency management block alongside the existing http2-hpack, http2-client, and http2-common entries (all at ${jetty.http2.version})

tika-server/tika-server-core/pom.xml (Lawrence Moorehead's commit)

  • Added org.eclipse.jetty.http2:http2-server runtime dependency (version from parent BOM)

tika-server/tika-server-core/src/test/.../TikaServerIntegrationTest.java (Lawrence Moorehead's commit)

  • Added testH2c() unit test that sends a request via HttpClient.Version.HTTP_2 and asserts the response was served over HTTP/2

tika-e2e-tests/tika-server/ (new module)

  • New e2e module that starts the actual fat-jar process and validates HTTP/2 (h2c) end-to-end
  • Tests are skipped by default; run with -Pe2e
  • Wired into tika-e2e-tests/pom.xml

How it works

Adding http2-server to the classpath is sufficient for h2c (HTTP/2 cleartext) support. CXF's JettyHTTPServerEngineFactory detects the jar at startup and wires in HTTP2CServerConnectionFactory. No startup code changes are required.
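The PR does not show CXF's detection code. As a rough illustration of what "detects the jar at startup" can mean, the sketch below checks whether the HTTP2CServerConnectionFactory class named above can be loaded. `Http2ClasspathProbe` is a hypothetical helper, not CXF's implementation, and the `org.eclipse.jetty.http2.server` package name is an assumption based on recent Jetty releases.

```java
/**
 * Hypothetical helper (not part of Tika or CXF): reports whether a class
 * can be loaded from the current classpath.
 */
final class Http2ClasspathProbe {

    // Jetty's h2c connection factory, as named in the PR description.
    // Assumption: package org.eclipse.jetty.http2.server (recent Jetty releases).
    static final String H2C_FACTORY =
            "org.eclipse.jetty.http2.server.HTTP2CServerConnectionFactory";

    private Http2ClasspathProbe() {
    }

    static boolean isPresent(String className) {
        try {
            // initialize = false: only ask whether the class can be found,
            // without running its static initializers.
            Class.forName(className, false, Http2ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("h2c factory on classpath: " + isPresent(H2C_FACTORY));
    }
}
```

Running this with and without the http2-server jar on the classpath is a quick way to confirm which build of tika-server you are actually launching.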

For h2 over TLS (recommended for production), configure TlsConfig in tika-server.json. Java 17's built-in ALPN handles protocol negotiation automatically — no separate ALPN agent is needed.
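To see what "built-in ALPN" refers to, the sketch below uses the standard `SSLParameters.setApplicationProtocols` API (added in Java 9) to advertise `h2` and `http/1.1` on a default `SSLEngine`. `AlpnSupportCheck` is a hypothetical name, not part of Tika. It only shows that the JVM exposes ALPN; it does not show whether Jetty will use it without extra modules, which is the question raised under Potential Concerns.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

/** Hypothetical check: does this JVM expose the standard ALPN API? */
final class AlpnSupportCheck {

    private AlpnSupportCheck() {
    }

    /** Configures ALPN protocols on a fresh engine and reads them back. */
    static String[] configureAlpn(String... protocols) {
        SSLEngine engine;
        try {
            engine = SSLContext.getDefault().createSSLEngine();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
        SSLParameters params = engine.getSSLParameters();
        // JDK ALPN API (Java 9+); no bootclasspath agent involved.
        params.setApplicationProtocols(protocols);
        engine.setSSLParameters(params);
        return engine.getSSLParameters().getApplicationProtocols();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(configureAlpn("h2", "http/1.1")));
    }
}
```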


Port management

  • Single port (9998 by default) continues to serve both HTTP/1.1 and HTTP/2
  • No second port added; Docker EXPOSE 9998 and health-check are unchanged
  • The fat-jar grows by ~500 KB from the new jar

Shutdown note

HTTP/2 multiplexes multiple requests over a single TCP connection. The current shutdownNow() path does not send a GOAWAY frame before closing, so streams still in flight on that connection are cut off and the client cannot tell which of them the server had already processed. For h2c under moderate load this is acceptable, but a future improvement could add a drain timeout for graceful HTTP/2 shutdown.


Backward compatibility

Purely additive classpath change:

  • Does not change the default port
  • Does not require TLS (TLS remains opt-in)
  • Does not break any existing HTTP/1.1 client
  • Does not change the REST API surface

Testing Instructions

# Unit test (no external process)
mvn test -pl tika-server/tika-server-core -Dtest=TikaServerIntegrationTest#testH2c

# E2E test (requires fat-jar to be built first)
mvn package -pl tika-server/tika-server-standard -DskipTests
mvn test -pl tika-e2e-tests/tika-server -Pe2e

Manually with curl (after starting the server):

# HTTP/2 cleartext (h2c)
curl --http2-prior-knowledge http://localhost:9998/tika

# HTTP/1.1 — unchanged behavior
curl http://localhost:9998/tika
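For a programmatic check alongside the curl commands, the sketch below uses `java.net.http.HttpClient`, the client that testH2c() is described as using. One difference from `curl --http2-prior-knowledge`: as far as I know, the JDK client has no prior-knowledge mode for `http://` URIs. It sends an HTTP/1.1 request carrying an `Upgrade: h2c` header and reports `HTTP_1_1` if the server declines. `ProtocolCheck` and `negotiatedVersion` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: report which HTTP version a server used for a GET. */
final class ProtocolCheck {

    private ProtocolCheck() {
    }

    static HttpClient.Version negotiatedVersion(URI uri) {
        HttpClient client = HttpClient.newBuilder()
                // Prefer HTTP/2; on http:// URIs the JDK client attempts an
                // h2c Upgrade and falls back to HTTP/1.1 if it is not accepted.
                .version(HttpClient.Version.HTTP_2)
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.version();
        } catch (IOException e) {
            throw new IllegalStateException("request failed: " + uri, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        // Against a tika-server with http2-server on the classpath this should print HTTP_2.
        System.out.println(negotiatedVersion(URI.create("http://localhost:9998/tika")));
    }
}
```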

Review Checklist

  • http2-server version comes from ${jetty.http2.version} in parent BOM (not hardcoded)
  • Existing HTTP/1.1 tests still pass
  • TikaServerIntegrationTest#testH2c passes
  • E2E module compiles and tests pass with -Pe2e
  • No second port introduced

Potential Concerns

  • h2c vs h2: This PR enables h2c (cleartext). For h2 over TLS an additional jetty-alpn-java-server dependency may be needed depending on the Jetty version and JVM. This can be addressed in a follow-up.
  • Reverse proxies: Most reverse proxies (nginx, AWS ALB, GCP LB) do not support h2c — they require h2 over TLS. For internal service-to-service use h2c is fine; for edge deployments, TLS is recommended.
  • Fat-jar size: The http2-server jar adds ~500 KB to tika-server-standard. This also increases the apache/tika Docker image slightly.

elemdisc and others added 2 commits March 4, 2026 10:24
- Add tika-e2e-tests/tika-server module with TikaServerHttp2Test
- Test starts the real fat-jar and verifies HTTP/2 (h2c) responses via
  Java HttpClient configured with Version.HTTP_2
- Wire module into tika-e2e-tests/pom.xml modules list
- Module is skipped by default; enable with -Pe2e profile

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nddipiazza marked this pull request as draft on March 4, 2026 16:33
@tballison
Contributor

> For h2 over TLS an additional jetty-alpn-java-server dependency may be needed depending on the Jetty version and JVM. This can be addressed in a follow-up.

I think we're good with Java 17.

…h-check

- Add Assumptions.assumeTrue(jar.exists()) so tests skip gracefully when
  tika-server-standard fat-jar hasn't been built (CI without prior install)
- Change startup health-check from / to /status (more reliable 200 response)
- Increase startup timeout to 90s for slower CI environments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
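The startup health-check described in this commit amounts to polling an HTTP endpoint until it answers 200 or a deadline passes. A minimal sketch with `java.net.http.HttpClient`, using a hypothetical `StartupPoller` helper rather than the PR's actual TikaServerHttp2Test code:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical startup poller; not the PR's actual test code. */
final class StartupPoller {

    private StartupPoller() {
    }

    /** Polls healthUri until it returns 200 or the timeout elapses. */
    static boolean waitForHttp200(URI healthUri, Duration timeout, Duration interval) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(healthUri)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() == 200) {
                    return true;
                }
            } catch (IOException e) {
                // Not listening yet (e.g. connection refused); try again.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

For the e2e test this would be called with a 90-second timeout. A later commit moves the health-check path from /status back to /, which is why the URI is a parameter rather than hard-coded.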

Copilot AI left a comment


Pull request overview

This PR adds HTTP/2 (h2c cleartext) support to tika-server by adding the org.eclipse.jetty.http2:http2-server jar as a dependency. CXF's Jetty transport automatically detects this jar on the classpath and enables h2c negotiation alongside HTTP/1.1 on the existing port. No application code changes are needed — just the dependency addition.

Changes:

  • Added http2-server to the parent BOM dependency management and as a dependency in tika-server-core
  • Added a unit test (testH2c) in TikaServerIntegrationTest verifying HTTP/2 negotiation
  • Added a new tika-e2e-tests/tika-server module with end-to-end tests that start the actual fat-jar and validate HTTP/2 (h2c) on both status and parse endpoints

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| tika-parent/pom.xml | Adds http2-server artifact to the dependency management block using ${jetty.http2.version} |
| tika-server/tika-server-core/pom.xml | Adds http2-server as a compile dependency (version inherited from parent BOM) |
| TikaServerIntegrationTest.java | Adds testH2c() unit test using Java's HttpClient to verify HTTP/2 negotiation |
| tika-e2e-tests/pom.xml | Registers the new tika-server e2e test module |
| tika-e2e-tests/tika-server/pom.xml | New e2e module POM with surefire skip-by-default and -Pe2e profile activation |
| TikaServerHttp2Test.java | New e2e test class that starts the fat-jar process and validates h2c on status and parse endpoints |


nddipiazza and others added 2 commits March 5, 2026 07:00
- Use tika-server-standard assembly zip (unpacked via dependency plugin)
  instead of thin jar, so the required lib/ dependencies are available
- Health-check endpoint changed from /status to / (root always returns 200;
  /status requires explicit endpoint config to be enabled)
- Pre-negotiate h2c before PUT /tika parse test: h2c Upgrade requires a
  no-body request first; GET / establishes the HTTP/2 connection so the
  subsequent PUT reuses it correctly
- Drop --noFork flag (TikaServerCli does not recognize it; server runs
  its own fork management independently)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
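The pre-negotiation step above is described only in prose. The sketch below is a hypothetical client-side version of the ordering: a bodiless GET / first, then the PUT /tika with a body, both on the same `HttpClient` so the second request can reuse whatever connection the first negotiated. `H2cPreNegotiation` and `putAfterWarmup` are illustrative names; which connection the PUT actually uses is up to the client's pool, so the sketch only guarantees the order of requests.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch of "warm up with GET /, then PUT /tika". */
final class H2cPreNegotiation {

    private H2cPreNegotiation() {
    }

    static HttpResponse<String> putAfterWarmup(URI base, byte[] document) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
        try {
            // 1. No-body request: the one on which the h2c Upgrade can happen.
            HttpRequest warmup = HttpRequest.newBuilder(base.resolve("/")).GET().build();
            client.send(warmup, HttpResponse.BodyHandlers.discarding());

            // 2. Request with a body, sent only after the warm-up completed.
            HttpRequest parse = HttpRequest.newBuilder(base.resolve("/tika"))
                    .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                    .build();
            return client.send(parse, HttpResponse.BodyHandlers.ofString());
        } catch (IOException e) {
            throw new IllegalStateException("request failed", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }
}
```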
- Remove unused moduleDir variable; initialize repoRoot directly
- stopServer() now uses waitFor(5s) + destroyForcibly() + waitFor(30s)
  to avoid indefinite blocking if SIGTERM doesn't terminate the process

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@nddipiazza
Contributor Author

Addressed both Copilot review comments in commit 30b9ff3:

  1. Unused moduleDir variable — removed; repoRoot is now initialized directly from Paths.get("").toAbsolutePath()
  2. stopServer() blocking on waitFor(): adopted the safer pattern from IntegrationTestBase.tearDown(), i.e. waitFor(5s), then destroyForcibly(), then waitFor(30s), to prevent CI hangs
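The stop sequence from commit 30b9ff3 can be sketched as a stand-alone helper. `ProcessStopper` is a hypothetical name; the 5 s and 30 s waits come from the description above, and the initial `destroy()` call reflects the commit message's mention of SIGTERM.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-alone version of the stopServer() pattern. */
final class ProcessStopper {

    private ProcessStopper() {
    }

    /**
     * Requests a normal shutdown, escalates to a forcible kill after 5 s,
     * and stops waiting after a further 30 s. Returns true if the process exited.
     */
    static boolean stop(Process process) {
        process.destroy(); // SIGTERM on POSIX systems
        try {
            if (process.waitFor(5, TimeUnit.SECONDS)) {
                return true;
            }
            process.destroyForcibly(); // SIGKILL on POSIX systems
            return process.waitFor(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            process.destroyForcibly();
            return !process.isAlive();
        }
    }
}
```

Bounding both waits means a server that ignores SIGTERM costs at most about 35 seconds of test time instead of hanging the CI job.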

@nddipiazza
Contributor Author

@copilot Should I make HTTP/2 support optional, so people aren't forced to have http2-server on the classpath?

