
Jnation workshop #19

Open
pedanticdev wants to merge 33 commits into develop from jnation-workshop


@pedanticdev
Owner

No description provided.

pedanticdev and others added 25 commits December 17, 2025 16:13
 Changelog

  Optimizations
   * MarketDataFragmentHandler:
       * Reduced object allocation by introducing reusable byte[] and StringBuilder buffers for SBE message processing.
       * Added early-return logic to skip full message decoding when sampling is active but the message won't be broadcast.
       * Fixed a bug in SBE decoding order for MarketDataDepth messages (traversing repeating groups before variable-length data).
   * MarketDataPublisher:
       * Replaced java.util.Random with java.util.concurrent.ThreadLocalRandom for improved concurrency performance.
       * Cached the isDirectMode configuration flag to avoid repeated string comparisons in the hot path.
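  A minimal sketch of the hot-path patterns listed above, assuming a simplified handler: the SBE decoder is replaced by plain `symbol`/`price` parameters, and `HotPathSketch`, `onMessage`, and `sampleRate` are hypothetical names, not the PR's classes. It evaluates the mode flag once, samples with `ThreadLocalRandom`, returns early before any formatting work, and reuses a single `StringBuilder`.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the hot-path patterns above; not the PR's MarketDataFragmentHandler.
final class HotPathSketch {
    // Evaluated once at construction instead of a string comparison per message.
    private final boolean directMode;
    // Fraction of messages to broadcast; 1.0 disables sampling.
    private final double sampleRate;
    // Reused scratch buffer, so an instance must stay on one thread.
    private final StringBuilder scratch = new StringBuilder(256);

    HotPathSketch(String ingestionMode, double sampleRate) {
        this.directMode = "DIRECT".equalsIgnoreCase(ingestionMode);
        this.sampleRate = sampleRate;
    }

    boolean isDirectMode() {
        return directMode;
    }

    /** Returns the JSON to broadcast, or null when sampling drops the message. */
    String onMessage(String symbol, double price) {
        // Early return: decide before doing any decoding or formatting work.
        if (sampleRate < 1.0 && ThreadLocalRandom.current().nextDouble() >= sampleRate) {
            return null;
        }
        scratch.setLength(0); // reset instead of allocating a new builder
        scratch.append("{\"symbol\":\"").append(symbol)
                .append("\",\"price\":").append(price).append('}');
        return scratch.toString();
    }
}
```

  Because the builder is reused, one instance must not be shared across threads; a per-thread instance (or a single polling thread) keeps this safe. `StringBuilder.append(double)` writes `1.5` rather than `1.5000`, which is the kind of formatting difference the test update in this commit has to tolerate.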

  Docker Improvements
   * Refactored Dockerfile and Dockerfile.standard to use the ADD directive with a remote URL for downloading payara-micro.jar, replacing the previous RUN wget... commands to streamline the build layers.
   * Standardized JAVA_OPTS to use -Xms8g -Xmx8g in both Dockerfiles.

  Tests
   * Updated MarketDataFragmentHandlerTest assertions to be robust against standard floating-point string formatting (e.g., accepting 1.5 vs 1.5000), accommodating the removal of expensive String.format calls in the optimized handler.
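  One way to make such assertions independent of formatting is to compare parsed values within a tolerance instead of comparing exact strings. A minimal sketch; `PriceAssertions.sameValue` is a hypothetical helper, not the PR's test code.

```java
// Hypothetical test helper: compare a formatted number by value, not by its text.
final class PriceAssertions {
    private PriceAssertions() {}

    /** True if text parses to a double within tolerance of expected ("1.5" and "1.5000" both match 1.5). */
    static boolean sameValue(String text, double expected, double tolerance) {
        return Math.abs(Double.parseDouble(text.trim()) - expected) <= tolerance;
    }
}
```

  In JUnit 5 the same idea is `assertEquals(expected, Double.parseDouble(text), delta)`.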
 Add GC monitoring and stress testing capabilities with UI improvements

  Implemented comprehensive GC monitoring and memory pressure testing features
  to demonstrate Azul Platform Prime C4 collector performance under load.

  Backend additions:
  - GC statistics collection via JMX MXBeans (GCStatsService, GCStats)
  - Memory pressure testing with 5 allocation modes (OFF/LOW/MEDIUM/HIGH/EXTREME)
  - REST endpoints for GC stats (/api/gc/stats, /api/gc/reset)
  - REST endpoints for memory pressure control (/api/pressure/*)
  - Rate-limited logging to reduce log flooding from high-throughput publisher
  - Circuit breaker pattern for publish failures
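  A minimal sketch of JMX-based GC statistics with a reset baseline, loosely mirroring the `/api/gc/stats` and `/api/gc/reset` endpoints; `GcStatsSketch` is hypothetical, and the PR's `GCStatsService` may track more (for example individual pause durations). `getCollectionTime()` is cumulative and approximate, and for concurrent collectors it is not necessarily pure stop-the-world time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the PR's GCStatsService/GCStats may expose different fields.
final class GcStatsSketch {
    // Cumulative {count, timeMillis} per collector name at the last reset().
    private final Map<String, long[]> baseline = new HashMap<>();

    /** Snapshot current counters; later stats() calls are relative to this point. */
    synchronized void reset() {
        baseline.clear();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            baseline.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
    }

    /** Per-collector {collections, approximate millis} accrued since the last reset(). */
    synchronized Map<String, long[]> stats() {
        Map<String, long[]> out = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long[] base = baseline.getOrDefault(gc.getName(), new long[] {0L, 0L});
            // Counters may be -1 when undefined; clamp so callers never see negatives.
            out.put(gc.getName(), new long[] {
                Math.max(0L, gc.getCollectionCount() - base[0]),
                Math.max(0L, gc.getCollectionTime() - base[1])
            });
        }
        return out;
    }
}
```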

  Infrastructure improvements:
  - Increased heap size to 8GB for both Azul Prime and Standard JDK
  - Pre-touch memory (-XX:+AlwaysPreTouch) and transparent huge pages support
  - Enhanced GC logging with detailed decorators
  - Docker ADD directive optimization
  - Maven wrapper for consistent builds
  - GitHub Actions workflow with basic test assertions

  UI enhancements:
  - Real-time GC pause time visualization (Chart.js)
  - GC Challenge Mode controls with visual button feedback
  - Reorganized layout: GC Pause Time chart side-by-side with Challenge Mode
  - Backend message rate display
  - Fixed API endpoint paths (removed /trader-stream-ee context prefix)

  Test coverage:
  - WebSocket connection tests
  - Market data publisher/subscriber tests
  - Error handling and exception catching improvements

  These enhancements enable side-by-side comparison of Azul C4's pauseless
  collection vs standard G1GC under synthetic memory pressure, inspired by
  stress testing techniques from the One Billion Row Challenge (1BRC).
…ties, GC stress testing, and quality assurance to TradeStreamEE for production-grade JVM performance comparison.

Key Features

  Observability Stack:
  - Complete monitoring with Prometheus, Grafana, Loki, and Traefik
  - Real-time JVM comparison dashboard showing GC pause times, memory usage, and performance metrics
  - JMX exporter integration for detailed GC statistics and memory monitoring
  - Automated deployment scripts (start-comparison.sh, stop-comparison.sh)

  Clustering & Scalability:
  - Hazelcast clustering with split-brain protection for both C4 and G1GC environments
  - Multi-instance Docker configurations supporting horizontal scaling
  - Load-balanced deployments with health checks and proper service discovery
  - Separate cluster configurations for C4 vs G1GC environments

  GC Stress Testing & Monitoring:
  - MemoryPressureService with configurable allocation modes (LOW to EXTREME: 1 MB/sec to 2 GB/sec)
  - GCStatsService with real-time pause time analysis and percentile calculations
  - Interactive "GC Challenge Mode" UI for live stress testing
  - REST endpoints for memory pressure control and GC statistics collection
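  Percentiles over recorded pause durations can be computed with the nearest-rank method. A minimal sketch, assuming pauses are kept as a `double[]` of milliseconds; `PausePercentiles` is a hypothetical helper, and the PR may use a different estimator or a histogram.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentiles over recorded pause durations (ms).
final class PausePercentiles {
    private PausePercentiles() {}

    /** Nearest-rank percentile for p in (0, 100]; the input array is not modified. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Multiply before dividing so integer percentiles give exact ranks.
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

  Sorting a copy costs O(n log n) per query; for a live dashboard, a bounded buffer of recent samples or a histogram (such as HdrHistogram) keeps that cost predictable.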

  Code Quality & Testing:
  - Spotless formatting with Google Java Format for consistent code style
  - 19 unit tests with 100% pass rate covering core components
  - JaCoCo coverage reporting with 15% threshold enforcement in CI
  - Test utilities including GCTestUtil for memory pressure testing

  Infrastructure Improvements:
  - Enhanced JVM tuning (8GB heap, pre-touch memory, transparent huge pages)
  - Improved OOM monitoring with efficient pattern matching
  - Rate-limited logging to prevent log flooding under load
  - Comprehensive .gitignore for monitoring data and build artifacts
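  Rate-limited logging usually lets one message through per interval and counts what it drops. A minimal sketch, assuming `java.util.logging` and an injectable nanosecond clock for testing; `RateLimitedLog` is hypothetical and not the PR's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.logging.Logger;

// Hypothetical sketch; not the PR's logging implementation.
final class RateLimitedLog {
    private final Logger logger;
    private final long intervalNanos;
    private final LongSupplier clock;
    private final AtomicLong nextAllowed; // earliest clock value at which we may log again
    private final AtomicLong suppressed = new AtomicLong();

    RateLimitedLog(Logger logger, long intervalNanos, LongSupplier clock) {
        this.logger = logger;
        this.intervalNanos = intervalNanos;
        this.clock = clock;
        this.nextAllowed = new AtomicLong(clock.getAsLong()); // first message passes
    }

    RateLimitedLog(Logger logger, long intervalNanos) {
        this(logger, intervalNanos, System::nanoTime);
    }

    /** Logs at most once per interval; returns true if this call was actually logged. */
    boolean warn(String message) {
        long now = clock.getAsLong();
        long next = nextAllowed.get();
        // Compare via subtraction so nanoTime wrap-around is handled correctly.
        if (now - next < 0 || !nextAllowed.compareAndSet(next, now + intervalNanos)) {
            suppressed.incrementAndGet();
            return false;
        }
        long dropped = suppressed.getAndSet(0);
        logger.warning(dropped == 0 ? message : message + " (" + dropped + " similar messages suppressed)");
        return true;
    }

    long suppressedCount() {
        return suppressed.get();
    }
}
```

  The compare-and-set means that when many publisher threads hit the same error at once, only one of them logs and the rest are counted.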

  This transforms TradeStreamEE from a demonstration into a production-ready platform for comparing JVM performance under realistic workload conditions.
 This PR enhances TradeStreamEE with a DIRECT ingestion mode, improves Azul C4 detection accuracy, adds container restart policies, and refreshes the web UI with dark mode support.

  Key Changes

  Core Functionality
  - New DIRECT ingestion mode (AeronSubscriberBean.java, MarketDataPublisher.java): Added TRADER_INGESTION_MODE config property to skip Aeron/MediaDriver initialization, allowing direct market data ingestion without the Aeron transport layer
  - Fixed Azul C4 detection (GCStatsResource.java:96-97): Changed from vendor string check to accurate GC MXBean name check ("GPGC"), preventing false positives
  - Added allocation rate metrics (GCStatsResource.java:93-95): New allocationRateMBps field in GC stats response
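  A sketch of the name-based check, assuming (as the PR describes) that C4's collector MXBeans report names containing `GPGC`; `CollectorDetection` is a hypothetical class. A vendor string alone cannot identify the collector, since Azul also ships Zulu builds of OpenJDK that run the standard collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a collector-name check; not the PR's GCStatsResource code.
final class CollectorDetection {
    private CollectorDetection() {}

    /** True if any collector MXBean name contains "GPGC", which the PR uses to identify C4. */
    static boolean isC4(List<String> collectorNames) {
        return collectorNames.stream().anyMatch(name -> name.contains("GPGC"));
    }

    /** Names of the garbage collector MXBeans registered in this JVM. */
    static List<String> currentCollectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }
}
```

  On a standard JDK running G1 the names include `G1 Young Generation` and `G1 Old Generation`, so `isC4(currentCollectorNames())` returns false there.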

  Infrastructure
  - Container restart policy (docker-compose.yml, docker-compose-standard.yml): Added restart: unless-stopped for improved resilience

  Code Modernization
  - Lambda conversion (MarketDataBroadcaster.java:52-55): Replaced anonymous inner class with lambda for Hazelcast message listener

  Frontend
  - Dark mode support (index.html): Complete UI refresh with CSS custom properties for theme switching
  - New blog page (blog.html): Comprehensive article on Azul C4 and Payara for high-frequency trading

  Test Coverage

  The updated README documentation reflects the expanded test suite: 160 tests passing (up from 19), with >90% coverage for monitoring components.
This PR refactors the TradeStreamEE project documentation and deployment tooling to emphasize its primary purpose: real-time side-by-side JVM comparison between Azul C4 (pauseless GC) and standard G1GC (stop-the-world).

  Key Changes

  1. Deployment Script Refactoring (start-comparison.sh)

  - Added parameter support: ./start-comparison.sh (clusters only) vs ./start-comparison.sh all (full monitoring stack)
  - Removed redundant Maven build (Dockerfiles handle it)
  - Added Docker validation, compose v1/v2 compatibility wrapper, and cluster status checks
  - Made monitoring network creation conditional on deployment mode

  2. README.md Restructuring

  - Simplified Core Technologies - Added Payara Micro, condensed verbose explanations, removed marketing language
  - Rewrote Quick Start - Clear "Primary Use Case" section emphasizing side-by-side JVM comparison with both clusters running identical WAR files
  - Fixed Technical Inaccuracies:
    - Heap size documentation (8GB single instance vs 4GB clustered)
    - Clarified both clusters use AERON by default (not one naive/one optimized)
    - Added TRADER_INGESTION_MODE environment variable values (AERON or DIRECT)
  - Condensed Monitoring Section - Reduced from ~170 lines to ~30 lines, removed duplicate setup instructions
  - Improved Diagram - Replaced multiple verbose Mermaid diagrams with single horizontal side-by-side comparison (DIRECT vs AERON modes)
  - Fixed Section Ordering - Eliminated backward references between sections

  3. UI Improvements (index.html)

  - Moved "GC Pause Time (Live)" chart to first position in dashboard for prominence

  4. Dockerfiles

  - Added spotless:apply to all Dockerfiles to ensure code formatting during build

  Technical Accuracy Corrections

  | Before                                      | After                                                                     |
  |---------------------------------------------|---------------------------------------------------------------------------|
  | Implied DIRECT mode for G1GC, AERON for C4  | Both use AERON by default; mode is architectural choice, not JVM-specific |
  | 8GB heap universally documented             | 8GB (single) vs 4GB (clustered) per deployment type                       |
  | Verbose "binary encoding explained" section | Condensed to essential technical points                                   |
  | Referred users to Grafana for comparison    | Directs users to application UI GC Pause Time chart                       |

  Files Modified

  - README.md - Comprehensive restructuring and technical corrections
  - start-comparison.sh - Parameterized deployment with Docker validation
  - stop-comparison.sh - New stop script
  - index.html - UI chart repositioning
  - Dockerfile, Dockerfile.scale, Dockerfile.scale.standard, Dockerfile.standard - Added spotless formatting
  - docker-compose-monitoring.yml - New monitoring stack configuration
This pull request refactors the memory pressure simulation and GC monitoring infrastructure to achieve production-grade performance and high-fidelity HFT workload simulation.

  Key Changes

   * Parallel Memory Pressure: Leverages Jakarta Concurrency 3.1 and Virtual Threads to parallelize garbage generation, overcoming single-threaded bandwidth bottlenecks and achieving the target 4.0 GB/sec
     throughput in EXTREME mode.
   * HFT Allocation Patterns: Implements a Strategy-based simulation registry featuring domain-specific patterns (OrderBook, MarketTick, MarketDepth) that create realistic, hierarchical object graphs instead of
     primitive arrays.
   * Coordinated Bursts: Introduced timed allocation spikes (3x multiplier) to simulate market events like flash crashes and news-driven volume surges.
   * GC Phase Observability: Enhanced GCStatsService to decompose pause times into constituent phases (e.g., Mark, Relocate), providing percentile-based statistics for deeper collector analysis.
   * Documentation Refactoring: Updated README.md to reflect the new architecture and transitioned to a direct, technical tone suitable for a reference implementation.
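     A minimal sketch of splitting an allocation target across parallel workers, assuming plain `java.util.concurrent` instead of Jakarta Concurrency; `ParallelPressure.allocate` is hypothetical, and pacing to a per-second rate is left out. The caller supplies the executor, so on Java 21+ `Executors.newVirtualThreadPerTaskExecutor()` can be passed to mirror the virtual-thread approach, and a coordinated burst could be approximated by calling it with a 3x target.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; the PR uses Jakarta Concurrency 3.1 rather than a raw ExecutorService.
final class ParallelPressure {
    // Volatile sink so the JIT cannot prove the arrays unused and eliminate them.
    static volatile Object sink;

    private ParallelPressure() {}

    /** Splits targetBytes across workers; each allocates chunkBytes arrays until its share is met. */
    static long allocate(ExecutorService pool, int workers, long targetBytes, int chunkBytes) {
        LongAdder allocated = new LongAdder();
        long perWorker = targetBytes / workers;
        List<Future<?>> futures = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            futures.add(pool.submit(() -> {
                long done = 0;
                while (done < perWorker) {
                    byte[] garbage = new byte[chunkBytes];
                    garbage[0] = 1;
                    sink = garbage;
                    done += chunkBytes;
                }
                allocated.add(done);
            }));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // wait, and surface any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("allocation worker failed", e.getCause());
        }
        return allocated.sum();
    }
}
```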
Co-authored-by: Luqman Saeed <luqman.saeed@payara.fish>
  Redesigned memory pressure testing from simple allocation modes to GC-specific stress scenarios that target different garbage collector behaviors.

  New Scenarios:
  - STEADY_LOAD - Constant allocation rate with stable live set
  - GROWING_HEAP - Expanding live set until near OOM
  - PROMOTION_STORM - Objects surviving past tenuring threshold
  - FRAGMENTATION - Variable-sized allocations creating heap holes
  - CROSS_GEN_REFS - Old→young references triggering write barriers and remembered set maintenance

  Changes:
  - Refactored AllocationMode enum with ScenarioType classification
  - Implemented proper cross-generational reference pattern with RefHolder class
  - Updated REST API responses to include scenario metadata
  - Added GC terminology to README glossary
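  The cross-generational pattern can be approximated by a long-lived array of holders whose fields are repeatedly pointed at newly allocated objects. A minimal sketch; this `RefHolder` is a hypothetical stand-in for the PR's class, and whether the holders actually get promoted depends on the collector and heap sizing.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of CROSS_GEN_REFS; the PR's RefHolder may be structured differently.
final class CrossGenRefs {
    /** Intended to be long-lived, so it is promoted to the old generation over time. */
    static final class RefHolder {
        Object ref;
    }

    private final RefHolder[] holders;

    CrossGenRefs(int size) {
        holders = new RefHolder[size];
        for (int i = 0; i < size; i++) {
            holders[i] = new RefHolder();
        }
    }

    /** Points randomly chosen holders at freshly allocated (young) payloads. */
    void churn(int updates, int payloadBytes) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < updates; i++) {
            // Once holders are old, each store creates an old-to-young reference that a
            // generational collector such as G1 must track via its write barrier.
            holders[rnd.nextInt(holders.length)].ref = new byte[payloadBytes];
        }
    }

    /** Number of holders currently referencing a payload. */
    int liveRefs() {
        int n = 0;
        for (RefHolder h : holders) {
            if (h.ref != null) n++;
        }
        return n;
    }
}
```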

  Documentation & Presentation

  - Created reveal.js presentation (presentation.html) explaining GC fundamentals, G1 vs C4 differences, and the testing approach
  - Updated blog.html with scenario-based examples
  - Refreshed index.html UI for mobile responsiveness
* Revert HFT simulation strategy

* implement alternative gc stress tests (#15)

Co-authored-by: Luqman Saeed <luqman.saeed@payara.fish>

* Fix C4 name identification

* Use bundled Eclipse formatter with Spotless

* copy spotless config file. Update presentation

* Amend README

---------

Co-authored-by: Luqman Saeed <luqman.saeed@payara.fish>
* stratify jvm flags

* clarifying log entries

* JFR impl - initial setup

* Update readme, amend start-comparison script to accept run mode

---------

Co-authored-by: Luqman Saeed <luqman.saeed@payara.fish>
- docker-compose-workshop.yml: one Azul Platform Prime instance with
  2 GB heap, JFR_ALWAYS_ON=true, and the workshop/jfr-settings mount.
  Bypasses the Zing checkpoint-sync abort that the 6-JVM cluster
  triggers on laptop hardware while keeping the C4 narrative intact.
- workshop/operational-notes.md: risk matrix and live diagnostic
  checklist for the speaker.
- workshop/speaker-prep.md: dated pre-workshop checklist (two weeks
  out through morning-of and post-workshop).
- workshop/scripts/jfr-query.sh: CLI wrapper for the analyses
  attendees do most often (summary, gc-pauses, gc-top, allocation,
  custom event counts, side-by-side compare).
- workshop/exercises/module-5-apply-to-your-app/gc-pathology-catalog.md:
  take-home reference for the four pathologies with JFR signatures,
  plain-language root causes, and fix-order tables.
- comparison-template.md: add Total pause time, Old gen growth,
  Application throughput, and SLA-violation rows.
- WORKSHOP.md and README.md: link to the new docs.
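A minimal sketch of a custom JFR event and of counting it back from a recording, using only the JDK's `jdk.jfr` API; the event name `workshop.PressureTick` and its field are hypothetical, since the workshop's actual custom events are not listed in this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical custom event; the workshop's real event names and fields are not shown here.
@Name("workshop.PressureTick")
@Label("Pressure Tick")
class PressureTickEvent extends Event {
    @Label("Bytes Allocated")
    long bytesAllocated;
}

final class JfrEventCount {
    private JfrEventCount() {}

    /** Commits n events into a fresh recording, dumps it to a temp file, and counts them back. */
    static long recordAndCount(int n) {
        try (Recording recording = new Recording()) {
            recording.enable(PressureTickEvent.class);
            recording.start();
            for (int i = 0; i < n; i++) {
                PressureTickEvent event = new PressureTickEvent();
                event.bytesAllocated = 1024L * i;
                event.commit();
            }
            recording.stop();
            Path file = Files.createTempFile("workshop-", ".jfr");
            try {
                recording.dump(file);
                return RecordingFile.readAllEvents(file).stream()
                        .map(RecordedEvent::getEventType)
                        .filter(type -> "workshop.PressureTick".equals(type.getName()))
                        .count();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`RecordingFile.readAllEvents` loads the whole file into memory, which suits short smoke recordings; for long recordings, iterating with `new RecordingFile(path)` and `readEvent()` avoids that.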
quickstart.sh runs verify-setup, builds the workshop image, starts
the container, waits for healthy, and fires a smoke recording to
prove the JFR pipeline end-to-end. Three smoke levels are
configurable: SMOKE=fast (status check), SMOKE=medium (15 s
recording, default), SMOKE=full (60 s recording with a
PROMOTION_STORM scenario).

WORKSHOP.md and workshop/README.md updated to point at it.