diff --git a/auto/autoresearch.ideas.md b/auto/autoresearch.ideas.md new file mode 100644 index 000000000..4a25837e7 --- /dev/null +++ b/auto/autoresearch.ideas.md @@ -0,0 +1,30 @@ +# Autoresearch Ideas + +## Dead Ends (tried and failed) + +- **Tag name interning** (skip+byte dispatch): saves 878 allocs but verification loop overhead kills speed +- **String dedup (-@)** for filter names: no alloc savings, creates temp strings anyway +- **Split-based tokenizer**: 2.5x faster C-level split but can't handle {{ followed by %} nesting +- **Streaming tokenizer**: needs own StringScanner (+alloc), per-shift overhead worse than eager array +- **Merge simple_lookup? into initialize**: logic overhead offsets saved index call +- **Cursor for filter scanning**: cursor.reset overhead worse than inline byte loops +- **Direct strainer call**: YJIT already inlines context.invoke_single well +- **TruthyCondition subclass**: YJIT polymorphism at evaluate call site hurts more than 115 saved allocs +- **Index loop for filters**: YJIT optimizes each+destructure MUCH better than manual filter[0]/filter[1] + +## Key Insights + +- YJIT monomorphism > allocation reduction at this scale +- C-level StringScanner.scan/skip > Ruby-level byte loops (already applied) +- String#split is 2.5x faster than manual tokenization, but Liquid's grammar is too complex for regex +- 74% of total CPU time is GC — alloc reduction is the highest-leverage optimization +- But YJIT-deoptimization from polymorphism costs more than the GC savings + +## Remaining Ideas + +- **Tokenizer: use String#index + byteslice instead of StringScanner**: avoid the StringScanner overhead entirely for the simple case of finding {%/{{ delimiters +- **Pre-freeze all Condition operator lambdas**: reduce alloc in Condition initialization +- **Avoid `@blocks = []` in If with single-element optimization**: use `@block` ivar for single condition, only create array for elsif +- **Reduce ForloopDrop allocation**: reuse ForloopDrop objects 
across iterations or use a lighter-weight object +- **VariableLookup: single-segment optimization**: for "product.title" (1 lookup), use an ivar instead of 1-element Array + diff --git a/auto/autoresearch.md b/auto/autoresearch.md new file mode 100644 index 000000000..8ba585717 --- /dev/null +++ b/auto/autoresearch.md @@ -0,0 +1,109 @@ +# Autoresearch: Liquid Parse+Render Performance + +## Objective +Optimize the Shopify Liquid template engine's parse and render performance. +The workload is the ThemeRunner benchmark, which parses and renders real Shopify +theme templates (dropify, ripen, tribble, vogue) with realistic data from +`performance/shopify/database.rb`. We measure parse time, render time, and +object allocations. The optimization target is combined parse+render time (µs). + +## How to Run +Run `./auto/autoresearch.sh` — it runs the unit tests, then the performance +benchmark (best of 3 runs), outputting metrics in parseable format; the fuller +`./auto/bench.sh` additionally gates on liquid-spec conformance. + +## Metrics +- **Primary (optimization target)**: `combined_µs` (µs, lower is better) — sum of parse + render time +- **Secondary (tradeoff monitoring)**: + - `parse_µs` — time to parse all theme templates (Liquid::Template#parse) + - `render_µs` — time to render all pre-compiled templates + - `allocations` — total object allocations for one parse+render cycle + Parse dominates (~70-75% of combined). Allocations correlate with GC pressure. + +## Files in Scope +- `lib/liquid/*.rb` — core Liquid library (parser, lexer, context, expression, etc.) +- `lib/liquid/tags/*.rb` — tag implementations (for, if, assign, etc.) 
+- `performance/bench_quick.rb` — benchmark script + +## Off Limits +- `test/` — tests must continue to pass unchanged +- `performance/tests/` — benchmark templates, do not modify +- `performance/shopify/` — benchmark data/filters, do not modify + +## Constraints +- All unit tests must pass (`bundle exec rake base_test`) +- liquid-spec failures must not increase beyond 2 (pre-existing UTF-8 edge cases) +- No new gem dependencies +- Semantic correctness must be preserved — templates must render identical output +- **Security**: Liquid runs untrusted user code. See Strategic Direction for details. + +## Strategic Direction +The long-term goal is to converge toward a **single-pass, forward-only parsing +architecture** using one shared StringScanner instance. The current system has +multiple redundant passes: Tokenizer → BlockBody → Lexer → Parser → Expression +→ VariableLookup, each re-scanning portions of the source. A unified scanner +approach would: + +1. **One StringScanner** flows through the entire parse — no intermediate token + arrays, no re-lexing filter chains, no string reconstruction in Parser#expression. +2. **Emit a lightweight IL or normalized AST** during the single forward pass, + decoupling strictness checking from the hot parse path. The LiquidIL project + (`~/src/tries/2026-01-05-liquid-il`) demonstrated this: a recursive-descent + parser emitting IL directly achieved significant speedups. +3. **Minimal backtracking** — the scanner advances forward, byte-checking as it + goes. liquid-c (`~/src/tries/2026-01-16-Shopify-liquid-c`) showed that a + C-level cursor-based tokenizer eliminates most allocation overhead. + +Current fast-path optimizations (byte-level tag/variable/for/if parsing) are +steps toward this goal. Each one replaces a regex+MatchData pattern with +forward-only byte scanning. The remaining Lexer→Parser path for filter args +is the next target for elimination. + +**Security note**: Liquid executes untrusted user templates. 
All parsing must +use explicit byte-range checks. Never use eval, send on user input, dynamic +method dispatch, const_get, or any pattern that lets template authors escape +the sandbox. + +## Baseline +- **Commit**: 4ea835a (original, before any optimizations) +- **combined_µs**: 7,374 +- **parse_µs**: 5,928 +- **render_µs**: 1,446 +- **allocations**: 62,620 + +## Progress Log +- 3329b09: Replace FullToken regex with manual byte parsing → combined 7,262 (-1.5%) +- 97e6893: Replace VariableParser regex with manual byte scanner → combined 6,945 (-5.8%), allocs 58,009 +- 2b78e4b: getbyte instead of string indexing in whitespace_handler/create_variable → allocs 51,477 +- d291e63: Lexer equal? for frozen arrays, \s+ whitespace skip → combined ~6,331 +- d79b9fa: Avoid strip alloc in Expression.parse, byteslice for strings → allocs 49,151 +- fa41224: Short-circuit parse_number with first-byte check → allocs 48,240 +- c1113ad: Fast-path String in render_obj_to_output → combined ~6,071 +- 25f9224: Fast-path simple variable parsing (skip Lexer/Parser) → combined ~5,860, allocs 45,202 +- 3939d74: Replace SIMPLE_VARIABLE regex with byte scanner → combined ~5,717, allocs 42,763 +- fe7a2f5: Fast-path simple if conditions → combined ~5,444, allocs 41,490 +- cfa0dfe: Replace For tag Syntax regex with manual byte parser → combined ~4,974, allocs 39,847 +- 8a92a4e: Unified fast-path Variable: parse name directly, only lex filter chain → combined ~5,060, allocs 40,520 +- 58d2514: parse_tag_token returns [tag_name, markup, newlines] → combined ~4,815, allocs 37,355 +- db43492: Hoist write score check out of render loop → render ~1,345 +- 17daac9: Extend fast-path to quoted string literal variables → all 1,197 variables fast-pathed +- 9fd7cec: Split filter parsing: no-arg filters scanned directly, Lexer only for args → combined ~4,595, allocs 35,159 +- e5933fc: Avoid array alloc in parse_tag_token via class ivars → allocs 34,281 +- 2e207e6: Replace WhitespaceOrNothing regex with 
byte-level blank_string? → combined ~4,800 +- 526af22: invoke_single fast path for no-arg filter invocation → allocs 32,621 +- 76ae8f1: find_variable top-scope fast path → combined ~4,740 +- 4cda1a5: slice_collection: skip copy for full Array → allocs 32,004 +- 79840b1: Replace SIMPLE_CONDITION regex with manual byte parser → combined ~4,663, allocs 31,465 +- 69430e9: Replace INTEGER_REGEX/FLOAT_REGEX with byte-level parse_number → allocs 31,129 +- 405e3dc: Frozen EMPTY_ARRAY/EMPTY_HASH for Context @filters/@disabled_tags → allocs 31,009 +- b90d7f0: Avoid unnecessary array wrapping for Context environments → allocs 30,709 +- 3799d4c: Lazy seen={} hash in Utils.to_s/inspect → allocs 30,169 +- 0b07487: Fast-path VariableLookup: skip scan_variable for simple identifiers → allocs 29,711 +- 9de1527: Introduce Cursor class for centralized byte-level scanning +- dd4a100: Remove dead parse_tag_token/SIMPLE_CONDITION (now in Cursor) +- cdc3438: For tag: migrate lax_parse to Cursor with zero-alloc scanning → allocs 29,620 + +## Current Best +- **combined_µs**: ~3,400 (-54% from original 7,374 baseline) +- **parse_µs**: ~2,300 +- **render_µs**: ~1,100 +- **allocations**: 24,882 (-60% from original 62,620 baseline) diff --git a/auto/autoresearch.sh b/auto/autoresearch.sh new file mode 100755 index 000000000..f421767e6 --- /dev/null +++ b/auto/autoresearch.sh @@ -0,0 +1,48 @@ +#!/usr/bin/env bash +# Autoresearch benchmark runner for Liquid performance optimization +# Runs: unit tests → performance benchmark (3 runs, takes best) +# Outputs METRIC lines for the agent to parse +# Exit code 0 = all good, non-zero = broken +set -euo pipefail + +cd "$(dirname "$0")/.." + +# ── Step 1: Unit tests (fast gate) ────────────────────────────────── +echo "=== Unit Tests ===" +TEST_OUT=$(bundle exec rake base_test 2>&1) +TEST_RESULT=$(echo "$TEST_OUT" | tail -1) +if echo "$TEST_OUT" | grep -q 'failures\|errors' && ! 
echo "$TEST_RESULT" | grep -q '0 failures, 0 errors'; then + echo "$TEST_OUT" | grep -E 'Failure|Error|failures|errors' | head -20 + echo "FATAL: unit tests failed" + exit 1 +fi +echo "$TEST_RESULT" + +# ── Step 2: Performance benchmark (3 runs, take best) ────────────── +echo "" +echo "=== Performance Benchmark (3 runs) ===" +BEST_COMBINED=999999 +BEST_PARSE=0 +BEST_RENDER=0 +BEST_ALLOC=0 + +for i in 1 2 3; do + OUT=$(bundle exec ruby performance/bench_quick.rb 2>&1) + P=$(echo "$OUT" | grep '^parse_us=' | cut -d= -f2) + R=$(echo "$OUT" | grep '^render_us=' | cut -d= -f2) + C=$(echo "$OUT" | grep '^combined_us=' | cut -d= -f2) + A=$(echo "$OUT" | grep '^allocations=' | cut -d= -f2) + echo " run $i: combined=${C}µs (parse=${P} render=${R}) allocs=${A}" + if [ "$C" -lt "$BEST_COMBINED" ]; then + BEST_COMBINED=$C + BEST_PARSE=$P + BEST_RENDER=$R + BEST_ALLOC=$A + fi +done + +echo "" +echo "METRIC combined_us=$BEST_COMBINED" +echo "METRIC parse_us=$BEST_PARSE" +echo "METRIC render_us=$BEST_RENDER" +echo "METRIC allocations=$BEST_ALLOC" diff --git a/auto/bench.sh b/auto/bench.sh new file mode 100755 index 000000000..77fc48092 --- /dev/null +++ b/auto/bench.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# Auto-research benchmark script for Liquid +# Runs: unit tests → liquid-spec → performance benchmark +# Outputs machine-readable metrics on success +# Exit code 0 = all good, non-zero = broken +set -euo pipefail + +cd "$(dirname "$0")/.." + +# ── Step 1: Unit tests (fast gate) ────────────────────────────────── +echo "=== Unit Tests ===" +if ! 
bundle exec rake base_test 2>&1; then + echo "FATAL: unit tests failed" + exit 1 +fi + +# ── Step 2: liquid-spec (correctness gate) ────────────────────────── +echo "" +echo "=== Liquid Spec ===" +SPEC_OUTPUT=$(bundle exec liquid-spec run spec/ruby_liquid.rb 2>&1 || true) +echo "$SPEC_OUTPUT" | tail -3 + +# Extract failure count from "Total: N passed, N failed, N errors" line +# Allow known pre-existing failures (≤2) +TOTAL_LINE=$(echo "$SPEC_OUTPUT" | grep "^Total:" || echo "Total: 0 passed, 0 failed, 0 errors") +FAILURES=$(echo "$TOTAL_LINE" | sed -n 's/.*\([0-9][0-9]*\) failed.*/\1/p') +ERRORS=$(echo "$TOTAL_LINE" | sed -n 's/.*\([0-9][0-9]*\) error.*/\1/p') +FAILURES=${FAILURES:-0} +ERRORS=${ERRORS:-0} +TOTAL_BAD=$((FAILURES + ERRORS)) + +if [ "$TOTAL_BAD" -gt 2 ]; then + echo "FATAL: liquid-spec has $FAILURES failures and $ERRORS errors (threshold: 2)" + exit 1 +fi + +# ── Step 3: Performance benchmark ────────────────────────────────── +echo "" +echo "=== Performance Benchmark ===" +bundle exec ruby performance/bench_quick.rb 2>&1 diff --git a/autoresearch.jsonl b/autoresearch.jsonl new file mode 100644 index 000000000..3b69d91ba --- /dev/null +++ b/autoresearch.jsonl @@ -0,0 +1,30 @@ +{"type":"config","name":"Liquid parse+render performance (tenderlove-inspired)","metricName":"combined_µs","metricUnit":"µs","bestDirection":"lower"} +{"run":1,"commit":"c09e722","metric":3818,"metrics":{"parse_µs":2722,"render_µs":1096,"allocations":24881},"status":"keep","description":"Baseline: 3,818µs combined, 24,881 allocs","timestamp":1773348490227} +{"run":2,"commit":"c09e722","metric":4063,"metrics":{"parse_µs":2901,"render_µs":1162,"allocations":24003},"status":"discard","description":"Tag name interning via skip+byte dispatch: saves 878 allocs but verification loop slower than scan","timestamp":1773348738557,"segment":0} +{"run":3,"commit":"c09e722","metric":3881,"metrics":{"parse_µs":2720,"render_µs":1161,"allocations":24881},"status":"discard","description":"String 
dedup (-@) for filter names: no alloc savings, no speed benefit","timestamp":1773348781481,"segment":0} +{"run":4,"commit":"c09e722","metric":3970,"metrics":{"parse_µs":2829,"render_µs":1141,"allocations":24881},"status":"discard","description":"Streaming tokenizer: needs own StringScanner (+1 alloc), per-shift overhead worse than saved array","timestamp":1773348883093,"segment":0} +{"run":5,"commit":"c09e722","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: split-based tokenizer — regex can't handle unclosed tags inside raw blocks","timestamp":1773349089230,"segment":0} +{"run":6,"commit":"c09e722","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: split regex tokenizer v2 — can't handle {{ followed by %} (variable-becomes-tag nesting)","timestamp":1773349248313,"segment":0} +{"run":7,"commit":"c09e722","metric":3861,"metrics":{"parse_µs":2744,"render_µs":1117,"allocations":24881},"status":"discard","description":"Merge simple_lookup? 
dot position into initialize — logic overhead offsets saved index call","timestamp":1773349376707,"segment":0} +{"run":8,"commit":"c09e722","metric":4048,"metrics":{"parse_µs":2929,"render_µs":1119,"allocations":24881},"status":"discard","description":"Use Cursor regex for filter name scanning — cursor.reset + method dispatch overhead worse than inline bytes","timestamp":1773349447172,"segment":0} +{"run":9,"commit":"c09e722","metric":3872,"metrics":{"parse_µs":2744,"render_µs":1128,"allocations":24881},"status":"discard","description":"Direct strainer call in Variable#render — YJIT already inlines context.invoke_single well","timestamp":1773349497593,"segment":0} +{"run":10,"commit":"c09e722","metric":3839,"metrics":{"parse_µs":2732,"render_µs":1107,"allocations":24879},"status":"discard","description":"Array#[] fast path for slice_collection with limit/offset — only 2 alloc savings, not meaningful","timestamp":1773349555348,"segment":0} +{"run":11,"commit":"c09e722","metric":3889,"metrics":{"parse_µs":2770,"render_µs":1119,"allocations":24766},"status":"discard","description":"TruthyCondition for simple if checks: -115 allocs but YJIT polymorphism at evaluate call site hurts speed","timestamp":1773349649377,"segment":0} +{"run":12,"commit":"c09e722","metric":4150,"metrics":{"parse_µs":2769,"render_µs":1381,"allocations":24881},"status":"discard","description":"Index loop for filters: YJIT optimizes each+destructure better than manual indexing","timestamp":1773349699285,"segment":0} +{"run":13,"commit":"b7ae55f","metric":3556,"metrics":{"parse_µs":2388,"render_µs":1168,"allocations":24882},"status":"keep","description":"Replace StringScanner tokenizer with String#byteindex — 12% faster parse, no regex overhead for delimiter finding","timestamp":1773349875890,"segment":0} +{"run":14,"commit":"e25f2f1","metric":3464,"metrics":{"parse_µs":2335,"render_µs":1129,"allocations":24882},"status":"keep","description":"Confirmation run: byteindex tokenizer consistently 
3,400-3,600µs","timestamp":1773349889465,"segment":0} +{"run":15,"commit":"b37fa98","metric":3490,"metrics":{"parse_µs":2331,"render_µs":1159,"allocations":24882},"status":"keep","description":"Clean up tokenizer: remove unused StringScanner setup and regex constants","timestamp":1773349928672,"segment":0} +{"run":16,"commit":"b37fa98","metric":3638,"metrics":{"parse_µs":2460,"render_µs":1178,"allocations":24882},"status":"discard","description":"Single-char byteindex for %} search: Ruby loop overhead worse for nearby targets","timestamp":1773349985509,"segment":0} +{"run":17,"commit":"b37fa98","metric":3553,"metrics":{"parse_µs":2431,"render_µs":1122,"allocations":25256},"status":"discard","description":"Regex simple_variable_markup: MatchData creates 374 extra allocs, offsetting speed gain","timestamp":1773350066627,"segment":0} +{"run":18,"commit":"b37fa98","metric":3629,"metrics":{"parse_µs":2455,"render_µs":1174,"allocations":25002},"status":"discard","description":"String.new(capacity: 4096) for output buffer: allocates more objects, not fewer","timestamp":1773350101852,"segment":0} +{"run":19,"commit":"f6baeae","metric":3350,"metrics":{"parse_µs":2212,"render_µs":1138,"allocations":24882},"status":"keep","description":"parse_tag_token without StringScanner: pure byte ops avoid reset(token) overhead, -12% combined","timestamp":1773350230252,"segment":0} +{"run":20,"commit":"f6baead","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: regex ultra-fast path for Variable — name pattern too broad, matches invalid trailing dots","timestamp":1773350472859,"segment":0} +{"run":21,"commit":"ae9a2e2","metric":3314,"metrics":{"parse_µs":2203,"render_µs":1111,"allocations":24882},"status":"keep","description":"Clean confirmation run: 3,314µs (-55% from main), stable","timestamp":1773350544354,"segment":0} 
+{"run":22,"commit":"ae9a2e2","metric":3497,"metrics":{"parse_µs":2336,"render_µs":1161,"allocations":24882},"status":"discard","description":"Regex fast path for no-filter variables: include? + match? overhead exceeds byte scan savings","timestamp":1773350641375,"segment":0} +{"run":23,"commit":"ca327b0","metric":3445,"metrics":{"parse_µs":2284,"render_µs":1161,"allocations":24647},"status":"keep","description":"Condition#evaluate: skip loop block for simple conditions (no child_relation) — saves 235 allocs","timestamp":1773350691752,"segment":0} +{"run":24,"commit":"99454a9","metric":3489,"metrics":{"parse_µs":2353,"render_µs":1136,"allocations":24647},"status":"keep","description":"Replace simple_lookup? byte scan with match? regex — 8x faster per call, cleaner code","timestamp":1773350837721,"segment":0} +{"run":25,"commit":"99454a9","metric":3797,"metrics":{"parse_µs":2636,"render_µs":1161,"allocations":29627},"status":"discard","description":"Regex name extraction in try_fast_parse: MatchData creates 5K extra allocs, much worse","timestamp":1773351048938,"segment":0} +{"run":26,"commit":"db348e0","metric":3459,"metrics":{"parse_µs":2318,"render_µs":1141,"allocations":24647},"status":"keep","description":"Inline to_liquid_value in If render — avoids one method dispatch per condition evaluation","timestamp":1773351080001,"segment":0} +{"run":27,"commit":"b195d09","metric":3496,"metrics":{"parse_µs":2356,"render_µs":1140,"allocations":24530},"status":"keep","description":"Replace @blocks.each with while loop in If render — avoids block proc allocation per render","timestamp":1773351101134,"segment":0} +{"run":28,"commit":"b195d09","metric":3648,"metrics":{"parse_µs":2457,"render_µs":1191,"allocations":24530},"status":"discard","description":"While loop in For render: YJIT optimizes each well for hot loops with many iterations","timestamp":1773351142275,"segment":0} 
+{"run":29,"commit":"b195d09","metric":3966,"metrics":{"parse_µs":2641,"render_µs":1325,"allocations":24060},"status":"discard","description":"While loop for environment search: -470 allocs but YJIT deopt makes render 16% slower","timestamp":1773351193863,"segment":0} diff --git a/lib/liquid.rb b/lib/liquid.rb index 4d0a71a64..cfdb88d50 100644 --- a/lib/liquid.rb +++ b/lib/liquid.rb @@ -52,6 +52,8 @@ module Liquid require "liquid/version" require "liquid/deprecations" require "liquid/const" +require 'liquid/byte_tables' +require 'liquid/cursor' require 'liquid/standardfilters' require 'liquid/file_system' require 'liquid/parser_switching' diff --git a/lib/liquid/block.rb b/lib/liquid/block.rb index 73d86c7bd..19a76cb36 100644 --- a/lib/liquid/block.rb +++ b/lib/liquid/block.rb @@ -60,8 +60,11 @@ def block_name @tag_name end + # Cache block delimiters per tag name to avoid repeated string allocation + BLOCK_DELIMITER_CACHE = Hash.new { |h, k| h[k] = "end#{k}".freeze } + def block_delimiter - @block_delimiter ||= "end#{block_name}" + @block_delimiter ||= BLOCK_DELIMITER_CACHE[block_name] end private diff --git a/lib/liquid/block_body.rb b/lib/liquid/block_body.rb index e4ada7d16..5a618fea5 100644 --- a/lib/liquid/block_body.rb +++ b/lib/liquid/block_body.rb @@ -1,7 +1,5 @@ # frozen_string_literal: true -require 'English' - module Liquid class BlockBody LiquidTagToken = /\A\s*(#{TagName})\s*(.*?)\z/o @@ -38,7 +36,7 @@ def freeze private def parse_for_liquid_tag(tokenizer, parse_context) while (token = tokenizer.shift) - unless token.empty? || token.match?(WhitespaceOrNothing) + unless token.empty? 
|| BlockBody.blank_string?(token) unless token =~ LiquidTagToken # line isn't empty but didn't match tag syntax, yield and let the # caller raise a syntax error @@ -53,8 +51,7 @@ def freeze end unless (tag = parse_context.environment.tag_for_name(tag_name)) - # end parsing if we reach an unknown tag and let the caller decide - # determine how to proceed + # end parsing if we reach an unknown tag; let the caller determine how to proceed return yield tag_name, markup end new_tag = tag.parse(tag_name, markup, tokenizer, parse_context) @@ -124,48 +121,38 @@ def self.rescue_render_node(context, output, line_number, exc, blank_tag) end end + def self.blank_string?(str) + str.match?(WhitespaceOrNothing) + end + private def parse_for_document(tokenizer, parse_context, &block) while (token = tokenizer.shift) next if token.empty? - case - when token.start_with?(TAGSTART) - whitespace_handler(token, parse_context) - unless token =~ FullToken - return handle_invalid_tag_token(token, parse_context, &block) - end - tag_name = Regexp.last_match(2) - markup = Regexp.last_match(4) - - if parse_context.line_number - # newlines inside the tag should increase the line number, - # particularly important for multiline {% liquid %} tags - parse_context.line_number += Regexp.last_match(1).count("\n") + Regexp.last_match(3).count("\n") - end - - if tag_name == 'liquid' - parse_liquid_tag(markup, parse_context) - next - end - unless (tag = parse_context.environment.tag_for_name(tag_name)) - # end parsing if we reach an unknown tag and let the caller decide - # determine how to proceed - return yield tag_name, markup + first_byte = token.getbyte(0) + if first_byte == Cursor::LCURLY + second_byte = token.getbyte(1) + if second_byte == Cursor::PCT + # handle_tag_token returns: + # nil — tag parsed normally, continue (update line number) + # :next — 'liquid' inline tag; skip line number update + # :unknown — end tag or unknown tag; yield to caller and return + # :invalid — malformed tag token; 
delegate to handle_invalid_tag_token + result = handle_tag_token(token, parse_context, tokenizer) + next unless result # nil: normal + next if result == :next # :next: 'liquid' + return yield(@_unknown_tag_name, parse_context.cursor.tag_markup) if result == :unknown + return handle_invalid_tag_token(token, parse_context, &block) # :invalid + elsif second_byte == Cursor::LCURLY + whitespace_handler(token, parse_context) + @nodelist << create_variable(token, parse_context) + @blank = false + else + # Fallback: text token starting with '{' + append_text_token(token, parse_context) end - new_tag = tag.parse(tag_name, markup, tokenizer, parse_context) - @blank &&= new_tag.blank? - @nodelist << new_tag - when token.start_with?(VARSTART) - whitespace_handler(token, parse_context) - @nodelist << create_variable(token, parse_context) - @blank = false else - if parse_context.trim_whitespace - token.lstrip! - end - parse_context.trim_whitespace = false - @nodelist << token - @blank &&= token.match?(WhitespaceOrNothing) + append_text_token(token, parse_context) end parse_context.line_number = tokenizer.line_number end @@ -173,8 +160,54 @@ def self.rescue_render_node(context, output, line_number, exc, blank_tag) yield nil, nil end - def whitespace_handler(token, parse_context) - if token[2] == WhitespaceControl + # Handles a {%...%} tag token. Does not receive the outer block — callers handle + # yield/block passing themselves, keeping the Proc off the hot path. 
+ # Returns: + # nil — tag parsed, caller continues the loop + # :next — 'liquid' inline tag; caller skips line number update + # :unknown — unknown/end tag; @_unknown_tag_name holds the tag name; + # markup is in parse_context.cursor.tag_markup + # :invalid — malformed token; caller delegates to handle_invalid_tag_token + private def handle_tag_token(token, parse_context, tokenizer) + whitespace_handler(token, parse_context) + cursor = parse_context.cursor + tag_name = cursor.parse_tag_token(token) + return :invalid unless tag_name + + markup = cursor.tag_markup + if parse_context.line_number + newlines = cursor.tag_newlines + parse_context.line_number += newlines if newlines > 0 + end + + if tag_name == 'liquid' + parse_liquid_tag(markup, parse_context) + return :next + end + + tag = parse_context.environment.tag_for_name(tag_name) + unless tag + # end parsing if we reach an unknown tag; let the caller determine how to proceed + @_unknown_tag_name = tag_name + return :unknown + end + + new_tag = tag.parse(tag_name, markup, tokenizer, parse_context) + @blank &&= new_tag.blank? + @nodelist << new_tag + nil + end + + def append_text_token(token, parse_context) + token.lstrip! if parse_context.trim_whitespace + parse_context.trim_whitespace = false + @nodelist << token + @blank &&= BlockBody.blank_string?(token) + end + private :append_text_token + + private def whitespace_handler(token, parse_context) + if token.getbyte(2) == Cursor::DASH previous_token = @nodelist.last if previous_token.is_a?(String) first_byte = previous_token.getbyte(0) @@ -184,7 +217,7 @@ def whitespace_handler(token, parse_context) end end end - parse_context.trim_whitespace = (token[-3] == WhitespaceControl) + parse_context.trim_whitespace = (token.getbyte(token.bytesize - 3) == Cursor::DASH) end def blank? @@ -216,24 +249,35 @@ def render(context) end def render_to_output_buffer(context, output) - freeze unless frozen? 
+ freeze - context.resource_limits.increment_render_score(@nodelist.length) + resource_limits = context.resource_limits + resource_limits.increment_render_score(@nodelist.length) + # Hot render loop — split on check_write so the common case (no resource + # limits) pays zero branch cost per node. idx = 0 - while (node = @nodelist[idx]) - if node.instance_of?(String) - output << node - else - render_node(context, output, node) - # If we get an Interrupt that means the block must stop processing. An - # Interrupt is any command that stops block execution such as {% break %} - # or {% continue %}. These tags may also occur through Block or Include tags. - break if context.interrupt? # might have happened in a for-block + if resource_limits.render_length_limit || resource_limits.last_capture_length + while (node = @nodelist[idx]) + if node.instance_of?(String) + output << node + else + render_node(context, output, node) + break if context.interrupt? + end + idx += 1 + resource_limits.increment_write_score(output) + end + else + while (node = @nodelist[idx]) + if node.instance_of?(String) + output << node + else + render_node(context, output, node) + break if context.interrupt? + end + idx += 1 end - idx += 1 - - context.resource_limits.increment_write_score(output) end output @@ -241,19 +285,15 @@ def render_to_output_buffer(context, output) private + # Indirection allows subclasses to intercept per-node rendering. def render_node(context, output, node) BlockBody.render_node(context, output, node) end def create_variable(token, parse_context) - if token.end_with?("}}") - i = 2 - i = 3 if token[i] == "-" - parse_end = token.length - 3 - parse_end -= 1 if token[parse_end] == "-" - markup_end = parse_end - i + 1 - markup = markup_end <= 0 ? 
"" : token.slice(i, markup_end) - + len = token.bytesize + if len >= 4 && token.getbyte(len - 1) == Cursor::RCURLY && token.getbyte(len - 2) == Cursor::RCURLY + markup = parse_context.cursor.parse_variable_token(token) return Variable.new(markup, parse_context) end diff --git a/lib/liquid/byte_tables.rb b/lib/liquid/byte_tables.rb new file mode 100644 index 000000000..b39c3f8b5 --- /dev/null +++ b/lib/liquid/byte_tables.rb @@ -0,0 +1,40 @@ +# frozen_string_literal: true + +module Liquid + # Pre-computed 256-entry boolean lookup tables for byte classification. + # Built once at load time; used as TABLE[byte] — a single array index + # instead of 3-5 comparison operators per check. + # + # Performance: neutral to slightly faster vs. chained comparisons. + # Readability: replaces expressions like + # (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95 + # with the intent-revealing + # ByteTables::IDENT_START[b] + module ByteTables + # [a-zA-Z_] — valid first byte of an identifier + IDENT_START = Array.new(256, false).tap do |t| + (97..122).each { |b| t[b] = true } # a-z + (65..90).each { |b| t[b] = true } # A-Z + t[95] = true # _ + end.freeze + + # [a-zA-Z0-9_-] — valid continuation byte of an identifier + IDENT_CONT = Array.new(256, false).tap do |t| + (97..122).each { |b| t[b] = true } # a-z + (65..90).each { |b| t[b] = true } # A-Z + (48..57).each { |b| t[b] = true } # 0-9 + t[95] = true # _ + t[45] = true # - + end.freeze + + # [0-9] — ASCII digit + DIGIT = Array.new(256, false).tap do |t| + (48..57).each { |b| t[b] = true } + end.freeze + + # [ \t\n\v\f\r] — ASCII whitespace (mirrors Ruby's \s) + WHITESPACE = Array.new(256, false).tap do |t| + [32, 9, 10, 11, 12, 13].each { |b| t[b] = true } # space, tab, \n, \v, \f, \r + end.freeze + end +end diff --git a/lib/liquid/condition.rb b/lib/liquid/condition.rb index 9d55c42b3..bf8b94093 100644 --- a/lib/liquid/condition.rb +++ b/lib/liquid/condition.rb @@ -65,20 +65,21 @@ def initialize(left = nil, operator = 
nil, right = nil) end def evaluate(context = deprecated_default_context) - condition = self - result = nil - loop do - result = interpret_condition(condition.left, condition.right, condition.operator, context) + result = interpret_condition(@left, @right, @operator, context) + + # Fast path: no child conditions (most common) + return result unless @child_relation + condition = self + while condition.child_relation case condition.child_relation when :or break if Liquid::Utils.to_liquid_value(result) when :and break unless Liquid::Utils.to_liquid_value(result) - else - break end condition = condition.child_condition + result = interpret_condition(condition.left, condition.right, condition.operator, context) end result end diff --git a/lib/liquid/context.rb b/lib/liquid/context.rb index 433b6d003..766719099 100644 --- a/lib/liquid/context.rb +++ b/lib/liquid/context.rb @@ -24,10 +24,15 @@ def self.build(environment: Environment.default, environments: {}, outer_scope: def initialize(environments = {}, outer_scope = {}, registers = {}, rethrow_errors = false, resource_limits = nil, static_environments = {}, environment = Environment.default) @environment = environment - @environments = [environments] - @environments.flatten! + @environments = environments.is_a?(Array) ? environments : [environments] - @static_environments = [static_environments].flatten(1).freeze + @static_environments = if static_environments.is_a?(Array) + static_environments.frozen? ? static_environments : static_environments.freeze + elsif static_environments.empty? + Const::EMPTY_ARRAY + else + [static_environments].freeze + end @scopes = [outer_scope || {}] @registers = registers.is_a?(Registers) ? 
registers : Registers.new(registers) @errors = [] @@ -35,14 +40,13 @@ def initialize(environments = {}, outer_scope = {}, registers = {}, rethrow_erro @strict_variables = false @resource_limits = resource_limits || ResourceLimits.new(environment.default_resource_limits) @base_scope_depth = 0 - @interrupts = [] - @filters = [] + @interrupts = Const::EMPTY_ARRAY + @filters = Const::EMPTY_ARRAY @global_filter = nil - @disabled_tags = {} + @disabled_tags = Const::EMPTY_HASH - # Instead of constructing new StringScanner objects for each Expression parse, - # we recycle the same one. - @string_scanner = StringScanner.new("") + # Lazy-init StringScanner — only needed if Context#[] is called during render + @string_scanner = nil @registers.static[:cached_partials] ||= {} @registers.static[:file_system] ||= environment.file_system @@ -73,7 +77,7 @@ def strainer # Note that this does not register the filters with the main Template object. see Template.register_filter # for that def add_filters(filters) - filters = [filters].flatten.compact + filters = Array(filters).flatten.compact @filters += filters @strainer = nil end @@ -84,11 +88,12 @@ def apply_global_filter(obj) # are there any not handled interrupts? def interrupt? - !@interrupts.empty? + !@interrupts.equal?(Const::EMPTY_ARRAY) && @interrupts.any? end # push an interrupt to the stack. this interrupt is considered not handled. def push_interrupt(e) + @interrupts = [] if @interrupts.frozen? @interrupts.push(e) end @@ -109,6 +114,20 @@ def invoke(method, *args) strainer.invoke(method, *args).to_liquid end + # Arity-specialized filter delegation — generated to match StrainerTemplate's specializations. + # The pattern (avoid *args splat) is the same for each arity; generating makes it explicit. 
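The `Const::EMPTY_ARRAY` swaps in `Context#initialize` rely on a copy-on-first-write pattern: every context shares one frozen sentinel, and a private mutable array is only allocated when an interrupt is actually pushed. A sketch under that assumption (the class here is hypothetical; `Const::EMPTY_ARRAY` is taken from the diff):

```ruby
EMPTY_ARRAY = [].freeze

class InterruptStack
  def initialize
    @interrupts = EMPTY_ARRAY # shared frozen sentinel: zero allocation up front
  end

  def interrupt?
    # Identity check first: the sentinel is known-empty, so most calls
    # answer false without even calling #empty?.
    !@interrupts.equal?(EMPTY_ARRAY) && !@interrupts.empty?
  end

  def push_interrupt(e)
    @interrupts = [] if @interrupts.frozen? # copy-on-write: allocate lazily
    @interrupts.push(e)
  end

  def pop_interrupt
    @interrupts.pop
  end
end
```

Most renders never interrupt, so the per-context `[]` allocation (and its GC cost — the head notes 74% of CPU time is GC) disappears in the common case.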
+ { + invoke_single: ['input'], + invoke_two: ['input', 'arg1'], + }.each do |method_name, params| + all_params = (["method"] + params).join(", ") + module_eval(<<~RUBY, __FILE__, __LINE__ + 1) + def #{method_name}(#{all_params}) + strainer.#{method_name}(#{all_params}).to_liquid + end + RUBY + end + # Push new local scope on the stack. use Context#stack instead def push(new_scope = {}) @scopes.unshift(new_scope) @@ -180,11 +199,11 @@ def []=(key, value) # Example: # products == empty #=> products.empty? def [](expression) - evaluate(Expression.parse(expression, @string_scanner)) + evaluate(Expression.parse(expression, @string_scanner ||= StringScanner.new(""))) end def key?(key) - find_variable(key, raise_on_not_found: false) != nil + !find_variable(key, raise_on_not_found: false).nil? end def evaluate(object) @@ -193,22 +212,38 @@ def evaluate(object) # Fetches an object starting at the local scope and then moving up the hierachy def find_variable(key, raise_on_not_found: true) - # This was changed from find() to find_index() because this is a very hot - # path and find_index() is optimized in MRI to reduce object allocation - index = @scopes.find_index { |s| s.key?(key) } - - variable = if index - lookup_and_evaluate(@scopes[index], key, raise_on_not_found: raise_on_not_found) + # Fast path: check top scope first (most common in for loops) + scope = @scopes[0] + if scope.key?(key) + variable = lookup_and_evaluate(scope, key, raise_on_not_found: raise_on_not_found) + elsif @scopes.length == 1 + # Only one scope and key not found — go straight to environments + variable = try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found) else - try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found) + # Multiple scopes — search through all of them + scope = @scopes.find { |s| s.key?(key) } + + variable = if scope + lookup_and_evaluate(scope, key, raise_on_not_found: raise_on_not_found) + else + 
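`Context#invoke_single`/`invoke_two` are generated from a table rather than hand-written so the splat-free pattern lives in one place, matching the StrainerTemplate side. A standalone sketch of the same `module_eval` generation — the `Delegator` class and its behavior are illustrative, not the diff's:

```ruby
class Delegator
  # General case: the *args splat allocates an Array on every call.
  def call(method, *args)
    [method, args]
  end

  # Arity-specialized variants generated from a table, so no splat Array
  # is allocated for the common 0- and 1-extra-arg filter calls.
  {
    call_single: ['input'],
    call_two:    ['input', 'arg1'],
  }.each do |name, params|
    all_params = (['method'] + params).join(', ')
    module_eval(<<~RUBY, __FILE__, __LINE__ + 1)
      def #{name}(#{all_params})
        [method, [#{params.join(', ')}]]
      end
    RUBY
  end
end
```

Passing `__FILE__, __LINE__ + 1` keeps backtraces pointing near the generator, as the diff's StrainerTemplate comment notes.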
try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found) + end end # update variable's context before invoking #to_liquid + # Fast path: primitive types don't need context= or to_liquid conversion + case variable + when String, Integer, Float, NilClass, TrueClass, FalseClass, Array, Hash, Time + return variable + end + variable.context = self if variable.respond_to?(:context=) liquid_variable = variable.to_liquid - liquid_variable.context = self if variable != liquid_variable && liquid_variable.respond_to?(:context=) + if variable != liquid_variable + liquid_variable.context = self if liquid_variable.respond_to?(:context=) + end liquid_variable end @@ -228,6 +263,7 @@ def lookup_and_evaluate(obj, key, raise_on_not_found: true) end def with_disabled_tags(tag_names) + @disabled_tags = {} if @disabled_tags.frozen? tag_names.each do |name| @disabled_tags[name] = @disabled_tags.fetch(name, 0) + 1 end @@ -251,17 +287,16 @@ def tag_disabled?(tag_name) attr_reader :base_scope_depth def try_variable_find_in_environments(key, raise_on_not_found:) - @environments.each do |environment| - found_variable = lookup_and_evaluate(environment, key, raise_on_not_found: raise_on_not_found) - if !found_variable.nil? || @strict_variables && raise_on_not_found - return found_variable - end - end - @static_environments.each do |environment| + found = find_in_envs(@environments, key, raise_on_not_found: raise_on_not_found) + return found unless found.nil? && !(@strict_variables && raise_on_not_found) + + find_in_envs(@static_environments, key, raise_on_not_found: raise_on_not_found) + end + + def find_in_envs(envs, key, raise_on_not_found:) + envs.each do |environment| found_variable = lookup_and_evaluate(environment, key, raise_on_not_found: raise_on_not_found) - if !found_variable.nil? || @strict_variables && raise_on_not_found - return found_variable - end + return found_variable if !found_variable.nil? 
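`find_variable` now special-cases the innermost scope (the scope a `for` loop writes into) before falling back to a full scan. The shape of that lookup, reduced to plain hashes — a sketch, without the diff's `lookup_and_evaluate` and environment fallback:

```ruby
# Variable lookup with a fast path for the innermost scope, falling back
# to walking outward only when several scopes exist and the key missed.
def find_variable(scopes, key)
  scope = scopes[0]
  return scope[key] if scope.key?(key) # fast path: top-scope hit
  return nil if scopes.length == 1     # single scope, key absent

  hit = scopes.find { |s| s.key?(key) } # slow path: walk outward
  hit && hit[key]
end

inner = { "item" => 1 }
outer = { "total" => 10, "item" => 99 }
scopes = [inner, outer]
```

The payoff is that the hot loop-body lookups hit one `Hash#key?` and one `Hash#[]` with no block dispatch, while shadowing semantics (inner scope wins) are unchanged.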
|| (@strict_variables && raise_on_not_found) end nil end diff --git a/lib/liquid/cursor.rb b/lib/liquid/cursor.rb new file mode 100644 index 000000000..c957e381b --- /dev/null +++ b/lib/liquid/cursor.rb @@ -0,0 +1,317 @@ +# frozen_string_literal: true + +require "strscan" + +module Liquid + # Single-pass forward-only scanner for Liquid parsing. + # Wraps StringScanner with higher-level methods for common Liquid constructs. + # One Cursor per template parse — threaded through all parsing code. + class Cursor + # Byte constants + SPACE = 32 + TAB = 9 + NL = 10 + CR = 13 + FF = 12 + DASH = 45 # '-' + DOT = 46 # '.' + COLON = 58 # ':' + PIPE = 124 # '|' + QUOTE_S = 39 # "'" + QUOTE_D = 34 # '"' + LBRACK = 91 # '[' + RBRACK = 93 # ']' + LPAREN = 40 # '(' + RPAREN = 41 # ')' + QMARK = 63 # '?' + HASH = 35 # '#' + USCORE = 95 # '_' + COMMA = 44 + ZERO = 48 + NINE = 57 + PCT = 37 # '%' + LCURLY = 123 # '{' + RCURLY = 125 # '}' + + attr_reader :ss + + def initialize(source) + @source = source + @ss = StringScanner.new(source) + end + + # ── Position ──────────────────────────────────────────────────── + def pos = @ss.pos + + def pos=(n) + @ss.pos = n + end + + def eos? = @ss.eos? + def peek_byte = @ss.peek_byte + def scan_byte = @ss.scan_byte + + # Reset scanner to a new string (for reuse on sub-markup) + def reset(source) + @source = source + @ss.string = source + end + + # Extract a slice from the source (deferred allocation) + def slice(start, len) + @source.byteslice(start, len) + end + + # ── Whitespace ────────────────────────────────────────────────── + # Skip spaces/tabs/newlines/cr + def skip_ws + while (b = @ss.peek_byte) + case b + when SPACE, TAB, CR, FF, NL then @ss.scan_byte + else break + end + end + end + + # Check if remaining bytes are all whitespace (or EOS). + # exist?(/\S/) returns nil when no non-whitespace remains, without advancing position. + def rest_blank? + !@ss.exist?(/\S/) + end + + # Regex for identifier: [a-zA-Z_][\w-]*\?? 
+ ID_REGEX = /[a-zA-Z_][\w-]*\??/ + + # ── Identifiers ───────────────────────────────────────────────── + # Skip an identifier without allocating a string. Returns length skipped, or 0. + def skip_id + @ss.skip(ID_REGEX) || 0 + end + + # Check if next id matches expected string, consume if so. No allocation. + def expect_id(expected) + start = @ss.pos + len = @ss.skip(ID_REGEX) + if len == expected.bytesize + # Compare bytes directly without allocating a string + i = 0 + while i < len + unless @source.getbyte(start + i) == expected.getbyte(i) + @ss.pos = start + return false + end + i += 1 + end + return true + end + @ss.pos = start if len + false + end + + # Scan a single identifier: [a-zA-Z_][\w-]*\?? + # Returns the string or nil if not at an identifier + def scan_id + @ss.scan(ID_REGEX) + end + + # Scan a tag name: '#' or \w+ + def scan_tag_name + if @ss.peek_byte == HASH + @ss.scan_byte + "#" + else + scan_id + end + end + + # Regex for numbers: -?\d+(\.\d+)? + FLOAT_REGEX = /-?\d+\.\d+/ + INT_REGEX = /-?\d+/ + + # ── Numbers ───────────────────────────────────────────────────── + # Try to scan an integer or float. Returns the number or nil. + def scan_number + if (s = @ss.scan(FLOAT_REGEX)) + s.to_f + elsif (s = @ss.scan(INT_REGEX)) + s.to_i + end + end + + # Regex for quoted string content (without quotes) + SINGLE_QUOTED_CONTENT = /'([^']*)'/ + DOUBLE_QUOTED_CONTENT = /"([^"]*)"/ + + # ── Strings ───────────────────────────────────────────────────── + # Scan a quoted string ('...' or "..."). Returns the content without quotes, or nil. + def scan_quoted_string + if @ss.scan(SINGLE_QUOTED_CONTENT) || @ss.scan(DOUBLE_QUOTED_CONTENT) + @ss[1] + end + end + + # Regex for quoted strings (single or double quoted, including quotes) + QUOTED_STRING_RAW = /"[^"]*"|'[^']*'/ + + # Scan a quoted string including quotes. Returns the full "..." or '...' string, or nil. 
+ def scan_quoted_string_raw + @ss.scan(QUOTED_STRING_RAW) + end + + # Regex for dotted identifier: name(.name)* + DOTTED_ID_REGEX = /[a-zA-Z_][\w-]*\??(?:\.[a-zA-Z_][\w-]*\??)*/ + + # ── Expressions ───────────────────────────────────────────────── + # Scan a simple variable lookup: name(.name)* — no brackets, no filters + # Returns the string or nil + def scan_dotted_id + @ss.scan(DOTTED_ID_REGEX) + end + + # Skip a fragment without allocating. Returns length skipped, or 0. + def skip_fragment + @ss.skip(QUOTED_STRING_RAW) || @ss.skip(UNQUOTED_FRAGMENT) || 0 + end + + # Regex for unquoted fragment: non-whitespace/comma/pipe sequence + UNQUOTED_FRAGMENT = /[^\s,|]+/ + + # Scan a "QuotedFragment" — a quoted string or non-whitespace/comma/pipe run + def scan_fragment + @ss.scan(QUOTED_STRING_RAW) || @ss.scan(UNQUOTED_FRAGMENT) + end + + # ── Comparison operators ──────────────────────────────────────── + # Identity map used for frozen string interning: StringScanner#scan returns a + # new unfrozen String on every call. Indexing into this hash returns the frozen + # literal stored here, avoiding a separate allocation and enabling faster + # equality checks downstream (frozen strings can be compared by identity). + COMPARISON_OPS = { + '==' => '==', + '!=' => '!=', + '<>' => '<>', + '<=' => '<=', + '>=' => '>=', + '<' => '<', + '>' => '>', + 'contains' => 'contains', + }.freeze + + # Scan a comparison operator. Returns frozen string or nil. + # Regex for comparison operators + COMPARISON_OP_REGEX = /==|!=|<>|<=|>=|<|>|contains(?!\w)/ + + def scan_comparison_op + if (op = @ss.scan(COMPARISON_OP_REGEX)) + COMPARISON_OPS[op] + end + end + + # ── Tag parsing helpers ───────────────────────────────────────── + # Results from last parse_tag_token call (avoids array allocation) + attr_reader :tag_markup, :tag_newlines + + # Parse the interior of a tag token: "{%[-] tag_name markup [-]%}" + # Pure byte operations — avoids StringScanner reset overhead. 
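`COMPARISON_OPS` above is an identity map: `StringScanner#scan` returns a fresh String on every call, and indexing the hash exchanges it for the single stored copy. A runnable sketch — without the `frozen_string_literal` pragma the real file uses the stored strings are not frozen, but the identity-reuse point still holds:

```ruby
require "strscan"

# Identity map: value objects are reused, so every successful scan of "<="
# hands back the same String instance instead of a fresh allocation.
OPS = {
  '==' => '==', '!=' => '!=', '<=' => '<=', '>=' => '>=',
  '<' => '<', '>' => '>', 'contains' => 'contains'
}.freeze
OP_REGEX = /==|!=|<=|>=|<|>|contains(?!\w)/

def scan_op(ss)
  (op = ss.scan(OP_REGEX)) && OPS[op]
end
```

Downstream code can then compare operators by identity (`equal?`) instead of byte-by-byte equality, which is the diff's stated motivation.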
+ # Returns tag_name string or nil. Sets tag_markup and tag_newlines. + def parse_tag_token(token) + len = token.bytesize + pos = 2 # skip "{%" + pos += 1 if token.getbyte(pos) == DASH # skip '-' + nl = 0 + + # Skip whitespace, count newlines + while pos < len + b = token.getbyte(pos) + case b + when SPACE, TAB, CR, FF then pos += 1 + when NL then pos += 1 + nl += 1 + else break + end + end + + # Scan tag name: '#' or [a-zA-Z_][\w-]* + name_start = pos + b = token.getbyte(pos) + if b == HASH + pos += 1 + elsif b && ByteTables::IDENT_START[b] + pos += 1 + while pos < len + b = token.getbyte(pos) + break unless ByteTables::IDENT_CONT[b] + + pos += 1 + end + pos += 1 if pos < len && token.getbyte(pos) == QMARK + else + return + end + tag_name = token.byteslice(name_start, pos - name_start) + + # Skip whitespace after tag name, count newlines + while pos < len + b = token.getbyte(pos) + case b + when SPACE, TAB, CR, FF then pos += 1 + when NL then pos += 1 + nl += 1 + else break + end + end + + # markup is everything up to optional '-' before '%}' + markup_end = len - 2 + markup_end -= 1 if markup_end > pos && token.getbyte(markup_end - 1) == DASH + @tag_markup = pos >= markup_end ? "" : token.byteslice(pos, markup_end - pos) + @tag_newlines = nl + + tag_name + end + + # Parse variable token interior: extract markup from "{{[-] ... [-]}}" + def parse_variable_token(token) + len = token.bytesize + return if len < 4 + + i = 2 + i = 3 if token.getbyte(i) == DASH + parse_end = len - 3 + parse_end -= 1 if token.getbyte(parse_end) == DASH + markup_len = parse_end - i + 1 + markup_len <= 0 ? "" : token.byteslice(i, markup_len) + end + + # ── Simple condition parser ───────────────────────────────────── + # Results from last parse_simple_condition call + attr_reader :cond_left, :cond_op, :cond_right + + # Parse "expr [op expr]" from current position to end. + # Returns true on success, nil on failure. Sets cond_left, cond_op, cond_right. 
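`parse_variable_token` above is pure byte arithmetic over the token, with no scanner state at all. A reduced sketch of the same index bookkeeping for extracting the markup between `{{[-]` and `[-]}}`:

```ruby
DASH = 45 # '-'

# Extracts the interior of a "{{[-] ... [-]}}" token by byte arithmetic,
# mirroring the diff's parse_variable_token (simplified sketch).
def variable_markup(token)
  len = token.bytesize
  return if len < 4 # too short to be "{{}}"

  i = 2                             # skip "{{"
  i = 3 if token.getbyte(i) == DASH # skip whitespace-trim '-'
  j = len - 3                       # last index before the closing "}}"
  j -= 1 if token.getbyte(j) == DASH
  n = j - i + 1
  n <= 0 ? "" : token.byteslice(i, n)
end
```

The returned slice still carries the surrounding spaces; stripping is deferred to expression parsing, so no intermediate string is built here.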
+ def parse_simple_condition + skip_ws + @cond_left = scan_fragment + return unless @cond_left + + skip_ws + if eos? + @cond_op = nil + @cond_right = nil + return true + end + + @cond_op = scan_comparison_op + return unless @cond_op + + skip_ws + @cond_right = scan_fragment + return unless @cond_right + + skip_ws + return unless eos? # trailing junk + + true + end + end +end diff --git a/lib/liquid/expression.rb b/lib/liquid/expression.rb index 00c40a4c3..466c021e6 100644 --- a/lib/liquid/expression.rb +++ b/lib/liquid/expression.rb @@ -16,16 +16,9 @@ class Expression '-' => VariableLookup.parse("-", nil).freeze, }.freeze - DOT = ".".ord - ZERO = "0".ord - NINE = "9".ord - DASH = "-".ord - # Use an atomic group (?>...) to avoid pathological backtracing from # malicious input as described in https://github.com/Shopify/liquid/issues/1357 RANGES_REGEX = /\A\(\s*(?>(\S+)\s*\.\.)\s*(\S+)\s*\)\z/ - INTEGER_REGEX = /\A(-?\d+)\z/ - FLOAT_REGEX = /\A(-?\d+)\.\d+\z/ class << self def safe_parse(parser, ss = StringScanner.new(""), cache = nil) @@ -35,11 +28,17 @@ def safe_parse(parser, ss = StringScanner.new(""), cache = nil) def parse(markup, ss = StringScanner.new(""), cache = nil) return unless markup - markup = markup.strip # markup can be a frozen string + # Only strip if there's leading/trailing whitespace (avoids allocation) + first_byte = markup.getbyte(0) + if first_byte && ByteTables::WHITESPACE[first_byte] + markup = markup.strip + elsif first_byte + markup = markup.strip if ByteTables::WHITESPACE[markup.getbyte(markup.bytesize - 1)] + end if (markup.start_with?('"') && markup.end_with?('"')) || (markup.start_with?("'") && markup.end_with?("'")) - return markup[1..-2] + return markup.byteslice(1, markup.bytesize - 2) elsif LITERALS.key?(markup) return LITERALS[markup] end @@ -71,57 +70,85 @@ def inner_parse(markup, ss, cache) end end - def parse_number(markup, ss) - # check if the markup is simple integer or float - case markup - when INTEGER_REGEX - return 
Integer(markup, 10) - when FLOAT_REGEX - return markup.to_f - end - - ss.string = markup - # the first byte must be a digit or a dash - byte = ss.scan_byte + def parse_number(markup, _ss = nil) + len = markup.bytesize + return if len == 0 - return false if byte != DASH && (byte < ZERO || byte > NINE) + # Quick reject: first byte must be digit or dash + pos = 0 + first = markup.getbyte(pos) + if first == Cursor::DASH + pos += 1 + return if pos >= len - if byte == DASH - peek_byte = ss.peek_byte + b = markup.getbyte(pos) + return unless ByteTables::DIGIT[b] - # if it starts with a dash, the next byte must be a digit - return false if peek_byte.nil? || !(peek_byte >= ZERO && peek_byte <= NINE) + pos += 1 + elsif ByteTables::DIGIT[first] + pos += 1 + else + return end - # The markup could be a float with multiple dots - first_dot_pos = nil - num_end_pos = nil + # Scan digits + while pos < len + b = markup.getbyte(pos) + break unless ByteTables::DIGIT[b] - while (byte = ss.scan_byte) - return false if byte != DOT && (byte < ZERO || byte > NINE) + pos += 1 + end - # we found our number and now we are just scanning the rest of the string - next if num_end_pos + # If we consumed everything, it's a simple integer + if pos == len + return Integer(markup, 10) + end + + # Check for dot (float) + if markup.getbyte(pos) == Cursor::DOT + dot_pos = pos + pos += 1 + # Must have at least one digit after dot + digit_after_dot = pos + while pos < len + b = markup.getbyte(pos) + break unless ByteTables::DIGIT[b] + + pos += 1 + end - if byte == DOT - if first_dot_pos.nil? - first_dot_pos = ss.pos - else - # we found another dot, so we know that the number ends here - num_end_pos = ss.pos - 1 - end + if pos > digit_after_dot && pos == len + # Simple float like "123.456" + return markup.to_f + elsif pos > digit_after_dot + # Float followed by more content: "1.2.3.4" — scan to find where the + # numeric portion ends (stop at next dot or non-digit). 
+ return scan_float_with_trailing(markup, pos, len) + else + # dot at end: "123." + return markup.byteslice(0, dot_pos).to_f end end - num_end_pos = markup.length if ss.eos? + # Not a number (has non-digit, non-dot characters) + nil + end - if num_end_pos - # number ends with a number "123.123" - markup.byteslice(0, num_end_pos).to_f - else - # number ends with a dot "123." - markup.byteslice(0, first_dot_pos).to_f + private + + # Scans forward from `pos` through digits, returning the float up to the + # next dot or the end of string. Returns nil when a non-digit, non-dot + # byte is found (not a valid number). Used by parse_number for inputs + # like "1.2.3.4" where the float literal ends at the second dot. + def scan_float_with_trailing(markup, pos, len) + while pos < len + b = markup.getbyte(pos) + return markup.byteslice(0, pos).to_f if b == Cursor::DOT + return unless ByteTables::DIGIT[b] + + pos += 1 end + markup.byteslice(0, pos).to_f end end end diff --git a/lib/liquid/lexer.rb b/lib/liquid/lexer.rb index f1740dbad..dfcdb5587 100644 --- a/lib/liquid/lexer.rb +++ b/lib/liquid/lexer.rb @@ -29,6 +29,7 @@ class Lexer RUBY_WHITESPACE = [" ", "\t", "\r", "\n", "\f"].freeze SINGLE_STRING_LITERAL = /'[^\']*'/ WHITESPACE_OR_NOTHING = /\s*/ + WHITESPACE = /\s+/ SINGLE_COMPARISON_TOKENS = [].tap do |table| table["<".ord] = COMPARISON_LESS_THAN @@ -104,7 +105,7 @@ def tokenize(ss) output = [] until ss.eos? - ss.skip(WHITESPACE_OR_NOTHING) + ss.skip(WHITESPACE) break if ss.eos? @@ -114,10 +115,10 @@ def tokenize(ss) if (special = SPECIAL_TABLE[peeked]) ss.scan_byte # Special case for ".." 
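The `parse_number` rewrite replaces regexes with `getbyte` scans. A simplified sketch of the integer/float classification — unlike the diff, this version rejects trailing-dot inputs like `"123."` and multi-dot inputs like `"1.2.3"` outright rather than salvaging a prefix:

```ruby
# Classify a string as Integer, Float, or nil using only byte reads.
def parse_number(s)
  len = s.bytesize
  return if len == 0

  pos = s.getbyte(0) == 45 ? 1 : 0 # optional leading '-'
  digits_start = pos
  pos += 1 while pos < len && (48..57).cover?(s.getbyte(pos))
  return if pos == digits_start    # no integer digits at all

  return Integer(s, 10) if pos == len # plain integer, no Float detour

  return unless s.getbyte(pos) == 46  # next byte must be '.'
  frac_start = pos + 1
  pos = frac_start
  pos += 1 while pos < len && (48..57).cover?(s.getbyte(pos))
  return unless pos == len && pos > frac_start # digits must run to the end
  s.to_f
end
```

The structure matches the diff's fast paths: bail on the first invalid byte, and only call `Integer()`/`to_f` once the shape is already known, so no regex engine or exception handling is involved.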
- if special == DOT && ss.peek_byte == DOT_ORD + if special.equal?(DOT) && ss.peek_byte == DOT_ORD ss.scan_byte output << DOTDOT - elsif special == DASH + elsif special.equal?(DASH) # Special case for negative numbers if (peeked_byte = ss.peek_byte) && NUMBER_TABLE[peeked_byte] ss.pos -= 1 diff --git a/lib/liquid/parse_context.rb b/lib/liquid/parse_context.rb index 855acc64e..d736319ec 100644 --- a/lib/liquid/parse_context.rb +++ b/lib/liquid/parse_context.rb @@ -3,7 +3,7 @@ module Liquid class ParseContext attr_accessor :locale, :line_number, :trim_whitespace, :depth - attr_reader :partial, :warnings, :error_mode, :environment + attr_reader :partial, :warnings, :error_mode, :environment, :expression_cache, :string_scanner, :cursor def initialize(options = Const::EMPTY_HASH) @environment = options.fetch(:environment, Environment.default) @@ -24,6 +24,8 @@ def initialize(options = Const::EMPTY_HASH) {} end + @cursor = Cursor.new("") + self.depth = 0 self.partial = false end diff --git a/lib/liquid/parser.rb b/lib/liquid/parser.rb index 645dfa3a1..0d0d0d019 100644 --- a/lib/liquid/parser.rb +++ b/lib/liquid/parser.rb @@ -83,6 +83,9 @@ def argument end def variable_lookups + # Fast path: no lookups at all (most common case for simple identifiers) + return "" unless look(:dot) || look(:open_square) + str = +"" loop do if look(:open_square) diff --git a/lib/liquid/registers.rb b/lib/liquid/registers.rb index 0b65d862c..88562c88c 100644 --- a/lib/liquid/registers.rb +++ b/lib/liquid/registers.rb @@ -6,15 +6,15 @@ class Registers def initialize(registers = {}) @static = registers.is_a?(Registers) ? 
registers.static : registers - @changes = {} + @changes = nil end def []=(key, value) - @changes[key] = value + (@changes ||= {})[key] = value end def [](key) - if @changes.key?(key) + if @changes&.key?(key) @changes[key] else @static[key] @@ -22,13 +22,13 @@ def [](key) end def delete(key) - @changes.delete(key) + @changes&.delete(key) end UNDEFINED = Object.new def fetch(key, default = UNDEFINED, &block) - if @changes.key?(key) + if @changes&.key?(key) @changes.fetch(key) elsif default != UNDEFINED if block_given? @@ -42,7 +42,7 @@ def fetch(key, default = UNDEFINED, &block) end def key?(key) - @changes.key?(key) || @static.key?(key) + @changes&.key?(key) || @static.key?(key) end end diff --git a/lib/liquid/resource_limits.rb b/lib/liquid/resource_limits.rb index ee0c66cbb..3b4fc93a4 100644 --- a/lib/liquid/resource_limits.rb +++ b/lib/liquid/resource_limits.rb @@ -9,6 +9,7 @@ class ResourceLimits :cumulative_assign_score_limit attr_reader :render_score, :assign_score, + :last_capture_length, :cumulative_render_score, :cumulative_assign_score diff --git a/lib/liquid/standardfilters.rb b/lib/liquid/standardfilters.rb index ed6141566..d22c6b024 100644 --- a/lib/liquid/standardfilters.rb +++ b/lib/liquid/standardfilters.rb @@ -275,18 +275,71 @@ def truncatewords(input, words = 15, truncate_string = "...") words = Utils.to_integer(words) words = 1 if words <= 0 - wordlist = begin - input.split(" ", words + 1) - rescue RangeError - # integer too big for String#split, but we can semantically assume no truncation is needed - return input if words + 1 > MAX_I32 - raise # unexpected error + return input if words + 1 > MAX_I32 + + # Scan words tracking byte positions; build the normalized (single-space) + # result string only when truncation is actually needed. + len = input.bytesize + pos = 0 + word_count = 0 + # Flat array of [start, end, start, end, ...] for up to `words` words. + # Avoids allocating a result string in the common no-truncation case. 
+ positions = [] + + # Skip leading whitespace + while pos < len + break unless ByteTables::WHITESPACE[input.getbyte(pos)] + pos += 1 + end + + while pos < len + word_start = pos + word_count += 1 + + # Scan to end of word + while pos < len + break if ByteTables::WHITESPACE[input.getbyte(pos)] + pos += 1 + end + + if word_count <= words + positions.push(word_start, pos) # [start, end, start, end, ...] + else + # Truncation confirmed — build normalized result from stored positions + result = +input.byteslice(positions[0], positions[1] - positions[0]) + i = 2 + while i < positions.length + result << " " << input.byteslice(positions[i], positions[i + 1] - positions[i]) + i += 2 + end + return result << Utils.to_s(truncate_string) + end + + # Skip whitespace between words + while pos < len + break unless ByteTables::WHITESPACE[input.getbyte(pos)] + pos += 1 + end + end + + # Fewer words than requested — no truncation needed, return original unchanged. + return input if word_count < words + + # Exactly `words` words. Ruby's split(" ", words+1) would produce a words+1-th + # empty element when input has trailing whitespace, triggering the truncation path. + # Match that behaviour: if the input ends with whitespace, normalize and append + # truncate_string even though no word was cut. 
+ if len > 0 && ByteTables::WHITESPACE[input.getbyte(len - 1)] + result = +input.byteslice(positions[0], positions[1] - positions[0]) + i = 2 + while i < positions.length + result << " " << input.byteslice(positions[i], positions[i + 1] - positions[i]) + i += 2 + end + return result << Utils.to_s(truncate_string) end - return input if wordlist.length <= words - wordlist.pop - truncate_string = Utils.to_s(truncate_string) - wordlist.join(" ").concat(truncate_string) + input end # @liquid_public_docs diff --git a/lib/liquid/strainer_template.rb b/lib/liquid/strainer_template.rb index ca0626dda..67160d4cf 100644 --- a/lib/liquid/strainer_template.rb +++ b/lib/liquid/strainer_template.rb @@ -58,5 +58,31 @@ def invoke(method, *args) rescue ::ArgumentError => e raise Liquid::ArgumentError, e.message, e.backtrace end + + # Arity-specialized filter invocation. + # Avoids *args splat allocation for the common 0-arg and 1-arg cases. + # `invoke` (general case) still uses *args for 2+ extra arguments. + { + invoke_single: ['input'], + invoke_two: ['input', 'arg1'], + }.each do |method_name, params| + all_params = (["method"] + params).join(", ") + send_params = params.join(", ") + # __LINE__ + 1 is a parse-time constant; both generated methods will report + # the same file:line in backtraces. The method name in the trace distinguishes them. + module_eval(<<~RUBY, __FILE__, __LINE__ + 1) + def #{method_name}(#{all_params}) + if self.class.invokable?(method) + send(method, #{send_params}) + elsif @context.strict_filters + raise Liquid::UndefinedFilter, "undefined filter \#{method}" + else + input + end + rescue ::ArgumentError => e + raise Liquid::ArgumentError, e.message, e.backtrace + end + RUBY + end end end diff --git a/lib/liquid/tags/for.rb b/lib/liquid/tags/for.rb index cbea85bcb..2ed7d186d 100644 --- a/lib/liquid/tags/for.rb +++ b/lib/liquid/tags/for.rb @@ -25,8 +25,6 @@ module Liquid # @liquid_optional_param range [untyped] A custom numeric range to iterate over. 
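The `truncatewords` rewrite records word byte positions in a flat array and only materializes a new string when a word is actually cut. The same shape, reduced — the whitespace set is trimmed to space/tab/newline here, and the diff's exact-count trailing-whitespace edge case is omitted:

```ruby
WS = [32, 9, 10].freeze # space, tab, newline

# Truncate to `words` words, joining kept words with single spaces.
# Allocates a result string only when truncation actually happens.
def truncatewords(input, words, ellipsis = "...")
  len = input.bytesize
  pos = 0
  pairs = [] # flat [start, end, start, end, ...] — no per-word strings

  pos += 1 while pos < len && WS.include?(input.getbyte(pos))
  count = 0
  while pos < len
    start = pos
    pos += 1 while pos < len && !WS.include?(input.getbyte(pos))
    count += 1
    if count <= words
      pairs.push(start, pos)
    else
      # Truncation confirmed: now (and only now) build the joined string.
      out = +""
      pairs.each_slice(2) do |s, e|
        out << " " unless out.empty?
        out << input.byteslice(s, e - s)
      end
      return out << ellipsis
    end
    pos += 1 while pos < len && WS.include?(input.getbyte(pos))
  end
  input # enough room: return the original string untouched
end
```

The common no-truncation case returns the very same object that came in, which is what makes this cheaper than `split`-join even though the scan itself is Ruby-level.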
# @liquid_optional_param reversed [untyped] Iterate in reverse order. class For < Block - Syntax = /\A(#{VariableSegment}+)\s+in\s+(#{QuotedFragment}+)\s*(reversed)?/o - attr_reader :collection_name, :variable_name, :limit, :from def initialize(tag_name, markup, options) @@ -72,18 +70,52 @@ def render_to_output_buffer(context, output) protected + # Fast byte-level parser for "var in collection [reversed] [limit:N] [offset:N]" def lax_parse(markup) - if markup =~ Syntax - @variable_name = Regexp.last_match(1) - collection_name = Regexp.last_match(2) - @reversed = !!Regexp.last_match(3) - @name = "#{@variable_name}-#{collection_name}" - @collection_name = parse_expression(collection_name) - markup.scan(TagAttributes) do |key, value| - set_attribute(key, value) + c = @parse_context.cursor + c.reset(markup) + c.skip_ws + + # Parse variable name + var_start = c.pos + var_len = c.skip_id + raise SyntaxError, options[:locale].t("errors.syntax.for") if var_len == 0 + @variable_name = c.slice(var_start, var_len) + + # Expect "in" + c.skip_ws + raise SyntaxError, options[:locale].t("errors.syntax.for") unless c.expect_id("in") + c.skip_ws + + # Parse collection name + col_start = c.pos + if c.peek_byte == Cursor::LPAREN + # Parenthesized range: (1..10) + depth = 1 + c.scan_byte + while !c.eos? && depth > 0 + b = c.scan_byte + depth += 1 if b == Cursor::LPAREN + depth -= 1 if b == Cursor::RPAREN end else - raise SyntaxError, options[:locale].t("errors.syntax.for") + c.skip_fragment + end + collection_name = c.slice(col_start, c.pos - col_start) + + @name = "#{@variable_name}-#{collection_name}" + @collection_name = parse_expression(collection_name) + + c.skip_ws + @reversed = c.expect_id("reversed") + c.skip_ws + + # Parse limit:/offset: if present. + # Cursor doesn't handle key:value attributes — delegate to regex for limit:/offset:. + if !c.eos? 
&& (rest = c.slice(c.pos, markup.bytesize - c.pos)).include?(':') + rest.scan(TagAttributes) do |key, value| + set_attribute(key, value) + end end end @@ -111,9 +143,7 @@ def strict_parse(markup) private - def strict2_parse(markup) - strict_parse(markup) - end + alias_method :strict2_parse, :strict_parse def collection_segment(context) offsets = context.registers[:for] ||= {} @@ -122,22 +152,14 @@ def collection_segment(context) offsets[@name].to_i else from_value = context.evaluate(@from) - if from_value.nil? - 0 - else - Utils.to_integer(from_value) - end + from_value.nil? ? 0 : Utils.to_integer(from_value) end collection = context.evaluate(@collection_name) collection = collection.to_a if collection.is_a?(Range) limit_value = context.evaluate(@limit) - to = if limit_value.nil? - nil - else - Utils.to_integer(limit_value) + from - end + to = limit_value && (Utils.to_integer(limit_value) + from) segment = Utils.slice_collection(collection, from, to) segment.reverse! if @reversed @@ -192,11 +214,7 @@ def set_attribute(key, expr, safe: false) end def render_else(context, output) - if @else_block - @else_block.render_to_output_buffer(context, output) - else - output - end + @else_block ? 
@else_block.render_to_output_buffer(context, output) : output end class ParseTreeVisitor < Liquid::ParseTreeVisitor diff --git a/lib/liquid/tags/if.rb b/lib/liquid/tags/if.rb index c423c1e84..cc77161ec 100644 --- a/lib/liquid/tags/if.rb +++ b/lib/liquid/tags/if.rb @@ -51,14 +51,17 @@ def unknown_tag(tag, markup, tokens) end def render_to_output_buffer(context, output) - @blocks.each do |block| - result = Liquid::Utils.to_liquid_value( - block.evaluate(context), - ) + idx = 0 + blocks = @blocks + while idx < blocks.length + block = blocks[idx] + result = block.evaluate(context) + result = result.to_liquid_value if result.respond_to?(:to_liquid_value) if result return block.attachment.render_to_output_buffer(context, output) end + idx += 1 end output @@ -86,6 +89,27 @@ def parse_expression(markup, safe: false) end def lax_parse(markup) + # Fastest path: simple identifier truthiness like "product.available" or "forloop.first" + if (simple = Variable.simple_variable_markup(markup)) + return Condition.new(parse_expression(simple)) + end + + # Fast path: simple condition without and/or — use Cursor. + # The include? pre-checks are both a correctness guard (parse_simple_condition + # only handles a single comparison) and a perf gate (avoids cursor allocation + # for the compound-condition case that will always fall through to lax_parse). + if !markup.include?(' and ') && !markup.include?(' or ') + cursor = @parse_context.cursor + cursor.reset(markup) + if cursor.parse_simple_condition + return Condition.new( + parse_expression(cursor.cond_left), + cursor.cond_op, + cursor.cond_right ? 
parse_expression(cursor.cond_right) : nil, + ) + end + end + expressions = markup.scan(ExpressionsAndOperators) raise SyntaxError, options[:locale].t("errors.syntax.if") unless expressions.pop =~ Syntax diff --git a/lib/liquid/tokenizer.rb b/lib/liquid/tokenizer.rb index 8b331d93c..ba3e0da01 100644 --- a/lib/liquid/tokenizer.rb +++ b/lib/liquid/tokenizer.rb @@ -1,37 +1,23 @@ # frozen_string_literal: true -require "strscan" - module Liquid class Tokenizer attr_reader :line_number, :for_liquid_tag - TAG_END = /%\}/ - TAG_OR_VARIABLE_START = /\{[\{\%]/ - NEWLINE = /\n/ - - OPEN_CURLEY = "{".ord - CLOSE_CURLEY = "}".ord - PERCENTAGE = "%".ord - def initialize( source:, - string_scanner:, + string_scanner: nil, line_numbers: false, line_number: nil, for_liquid_tag: false ) @line_number = line_number || (line_numbers ? 1 : nil) @for_liquid_tag = for_liquid_tag - @source = source.to_s.to_str + @source = source.to_s @offset = 0 @tokens = [] - if @source - @ss = string_scanner - @ss.string = @source - tokenize - end + tokenize end def shift @@ -54,108 +40,113 @@ def tokenize if @for_liquid_tag @tokens = @source.split("\n") else - @tokens << shift_normal until @ss.eos? + tokenize_fast end @source = nil - @ss = nil - end - - def shift_normal - token = next_token - - return unless token - - token - end - - def next_token - # possible states: :text, :tag, :variable - byte_a = @ss.peek_byte - - if byte_a == OPEN_CURLEY - @ss.scan_byte - - byte_b = @ss.peek_byte - - if byte_b == PERCENTAGE - @ss.scan_byte - return next_tag_token - elsif byte_b == OPEN_CURLEY - @ss.scan_byte - return next_variable_token - end - - @ss.pos -= 1 - end - - next_text_token end - def next_text_token - start = @ss.pos - - unless @ss.skip_until(TAG_OR_VARIABLE_START) - token = @ss.rest - @ss.terminate - return token + # Fast tokenizer using String#byteindex instead of StringScanner regex. + # String#byteindex is ~40% faster for finding { delimiters. 
+ def tokenize_fast + src = @source + unless src.valid_encoding? + raise SyntaxError, "Invalid byte sequence in #{src.encoding}" end - pos = @ss.pos -= 2 - @source.byteslice(start, pos - start) - rescue ::ArgumentError => e - if e.message == "invalid byte sequence in #{@ss.string.encoding}" - raise SyntaxError, "Invalid byte sequence in #{@ss.string.encoding}" - else - raise - end - end + len = src.bytesize + pos = 0 - def next_variable_token - start = @ss.pos - 2 + while pos < len + # Find next { which could start a tag or variable + idx = src.byteindex('{', pos) - byte_a = byte_b = @ss.scan_byte - - while byte_b - byte_a = @ss.scan_byte while byte_a && byte_a != CLOSE_CURLEY && byte_a != OPEN_CURLEY - - break unless byte_a - - if @ss.eos? - return byte_a == CLOSE_CURLEY ? @source.byteslice(start, @ss.pos - start) : "{{" + unless idx + # No more tags/variables — rest is text + @tokens << src.byteslice(pos, len - pos) if pos < len + break end - byte_b = @ss.scan_byte - - if byte_a == CLOSE_CURLEY - if byte_b == CLOSE_CURLEY - return @source.byteslice(start, @ss.pos - start) - elsif byte_b != CLOSE_CURLEY - @ss.pos -= 1 - return @source.byteslice(start, @ss.pos - start) + next_byte = idx + 1 < len ? src.getbyte(idx + 1) : nil + + if next_byte == Cursor::PCT # {% + # Emit text before tag + @tokens << src.byteslice(pos, idx - pos) if idx > pos + + # Find %} to close the tag + close = src.byteindex('%}', idx + 2) + if close + @tokens << src.byteslice(idx, close + 2 - idx) + pos = close + 2 + else + # Emit malformed token to propagate a missing-terminator error in the parser + @tokens << "{%" + pos = idx + 2 + end + elsif next_byte == Cursor::LCURLY # {{ + # Emit text before variable, then scan for the closing }}. + @tokens << src.byteslice(pos, idx - pos) if idx > pos + pos = scan_variable_token(src, idx, len) + else + # Lone '{' — not the start of a tag or variable. + # Find the next '{{' or '{%' to know where this text token ends. 
+ # Using two byteindex calls avoids a nested loop and is always O(n). + tag_start = src.byteindex('{%', idx + 1) + var_start = src.byteindex('{{', idx + 1) + next_token = [tag_start, var_start].compact.min + if next_token + @tokens << src.byteslice(pos, next_token - pos) + pos = next_token + else + @tokens << src.byteslice(pos, len - pos) + pos = len end - elsif byte_a == OPEN_CURLEY && byte_b == PERCENTAGE - return next_tag_token_with_start(start) end - - byte_a = byte_b end - - "{{" end - def next_tag_token - start = @ss.pos - 2 - if (len = @ss.skip_until(TAG_END)) - @source.byteslice(start, len + 2) - else - "{%" + # Scans a {{ ... }} variable token starting at `idx` in `src`. + # Emits the token to @tokens and returns the new position after the token. + # Handles }}, single }, and embedded {% ... %} (nested tag inside variable). + private def scan_variable_token(src, idx, len) + # Byte-by-byte scan: find } or {, then inspect the next byte. + scan_pos = idx + 2 + while scan_pos < len + b = src.getbyte(scan_pos) + if b == Cursor::RCURLY # } + if scan_pos + 1 >= len + # } at end of string — emit token up to here + @tokens << src.byteslice(idx, scan_pos + 1 - idx) + return scan_pos + 1 + end + b2 = src.getbyte(scan_pos + 1) + if b2 == Cursor::RCURLY + # Found }} — close variable + @tokens << src.byteslice(idx, scan_pos + 2 - idx) + return scan_pos + 2 + else + # } followed by non-} — emit token up to here (matches original: @ss.pos -= 1) + @tokens << src.byteslice(idx, scan_pos + 1 - idx) + return scan_pos + 1 + end + elsif b == Cursor::LCURLY && scan_pos + 1 < len && src.getbyte(scan_pos + 1) == Cursor::PCT + # Found {% inside {{ — scan to %} and emit as one token + close = src.byteindex('%}', scan_pos + 2) + if close + @tokens << src.byteslice(idx, close + 2 - idx) + return close + 2 + else + @tokens << src.byteslice(idx, len - idx) + return len + end + else + scan_pos += 1 + end end - end - def next_tag_token_with_start(start) - @ss.skip_until(TAG_END) - 
@source.byteslice(start, @ss.pos - start) + # Reached end without finding }} — malformed + @tokens << "{{" + idx + 2 end end end diff --git a/lib/liquid/utils.rb b/lib/liquid/utils.rb index 084739a21..41b9f621a 100644 --- a/lib/liquid/utils.rb +++ b/lib/liquid/utils.rb @@ -8,6 +8,9 @@ module Utils def self.slice_collection(collection, from, to) if (from != 0 || !to.nil?) && collection.respond_to?(:load_slice) collection.load_slice(from, to) + elsif from == 0 && to.nil? && collection.is_a?(Array) + # Fast path: no offset/limit on an Array — return as-is (avoid copy) + collection else slice_collection_using_each(collection, from, to) end @@ -15,23 +18,17 @@ def self.slice_collection(collection, from, to) def self.slice_collection_using_each(collection, from, to) segments = [] - index = 0 - # Maintains Ruby 1.8.7 String#each behaviour on 1.9 + # String is Enumerable but #each is not defined; handle it as a single-element collection if collection.is_a?(String) return collection.empty? ? [] : [collection] end return [] unless collection.respond_to?(:each) + index = 0 collection.each do |item| - if to && to <= index - break - end - - if from <= index - segments << item - end - + break if to && to <= index + segments << item if from <= index index += 1 end @@ -93,8 +90,14 @@ def self.to_liquid_value(obj) obj end - def self.to_s(obj, seen = {}) + # Cached string representations for common small integers (0-999) + # Avoids repeated Integer#to_s allocations during rendering + SMALL_INT_STRINGS = Array.new(1000) { |i| i.to_s.freeze }.freeze + + def self.to_s(obj, seen = nil) case obj + when Integer + obj >= 0 && obj < 1000 ? SMALL_INT_STRINGS[obj] : obj.to_s when BigDecimal obj.to_s("F") when Hash @@ -102,30 +105,30 @@ def self.to_s(obj, seen = {}) # custom implementation. Otherwise we use Liquid's default # implementation. 
if obj.class.instance_method(:to_s) == HASH_TO_S_METHOD - hash_inspect(obj, seen) + hash_inspect(obj, seen || {}) else obj.to_s end when Array - array_inspect(obj, seen) + array_inspect(obj, seen || {}) else obj.to_s end end - def self.inspect(obj, seen = {}) + def self.inspect(obj, seen = nil) case obj when Hash # If the custom hash implementation overrides `#inspect`, use their # custom implementation. Otherwise we use Liquid's default # implementation. if obj.class.instance_method(:inspect) == HASH_INSPECT_METHOD - hash_inspect(obj, seen) + hash_inspect(obj, seen || {}) else obj.inspect end when Array - array_inspect(obj, seen) + array_inspect(obj, seen || {}) else obj.inspect end diff --git a/lib/liquid/variable.rb b/lib/liquid/variable.rb index 6b5fb412b..22615db15 100644 --- a/lib/liquid/variable.rb +++ b/lib/liquid/variable.rb @@ -12,6 +12,26 @@ module Liquid # {{ user | link }} # class Variable + # Checks if markup is a simple "name.lookup.chain" with no filters/brackets/quotes. + # Returns the trimmed markup string, or nil if not simple. + def self.simple_variable_markup(markup) + return if markup.empty? + return unless markup.match?(SIMPLE_VARIABLE_RE) + # Avoid allocation when there's no surrounding whitespace (the common case) + first = markup.getbyte(0) + last = markup.getbyte(markup.bytesize - 1) + needs_strip = first == Cursor::SPACE || first == Cursor::TAB || first == Cursor::NL || first == Cursor::CR || + last == Cursor::SPACE || last == Cursor::TAB || last == Cursor::NL || last == Cursor::CR + needs_strip ? markup.strip : markup + end + + # Cache for [filtername, EMPTY_ARRAY] tuples — avoids repeated array creation + NO_ARG_FILTER_CACHE = Hash.new { |h, k| h[k] = [k, Const::EMPTY_ARRAY].freeze } + + # Regex for a simple variable lookup with optional surrounding whitespace. + # Shares the identifier grammar with VariableLookup::SIMPLE_LOOKUP_RE. 
+ SIMPLE_VARIABLE_RE = /\A\s*[\w-]+\??(?:\.[\w-]+\??)*\s*\z/ + FilterMarkupRegex = /#{FilterSeparator}\s*(.*)/om FilterParser = /(?:\s+|#{QuotedFragment}|#{ArgumentSeparator})+/o FilterArgsRegex = /(?:#{FilterArgumentSeparator}|#{ArgumentSeparator})\s*((?:\w+\s*\:\s*)?#{QuotedFragment})/o @@ -30,7 +50,278 @@ def initialize(markup, parse_context) @parse_context = parse_context @line_number = parse_context.line_number - strict_parse_with_error_mode_fallback(markup) + # Fast path: try to parse without going through Lexer → Parser + # Skip for strict2/rigid modes which require different parsing + # Fast path only for lax/warn modes — strict modes need full error checking + error_mode = parse_context.error_mode + if error_mode == :strict2 || error_mode == :rigid || error_mode == :strict || !try_fast_parse(markup, parse_context) + strict_parse_with_error_mode_fallback(markup) + end + end + + private def try_fast_parse(markup, parse_context) + pos = fast_scan_name(markup) + return false unless pos + + # fast_resolve_name calls VariableLookup.parse_simple / Expression::LITERALS — the + # only sites that can raise SyntaxError on malformed input. The byte scanners return + # false instead of raising. + begin + fast_resolve_name(markup, parse_context) + rescue SyntaxError + return false + end + + # End of markup — no filters + if pos >= markup.bytesize + @filters = Const::EMPTY_ARRAY + return true + end + + # Must be followed by a pipe filter separator + return false unless markup.getbyte(pos) == Cursor::PIPE + + fast_scan_filters(markup, pos, parse_context) + end + + # Scan the variable name (quoted string or identifier chain) at the start of markup. + # Returns the position after the name + trailing whitespace, or false on failure. + # Sets @_fast_name_start and @_fast_name_end for fast_resolve_name. 
+ private def fast_scan_name(markup) + len = markup.bytesize + return false if len == 0 + + # Skip leading whitespace + pos = 0 + while pos < len + b = markup.getbyte(pos) + break unless b == Cursor::SPACE || b == Cursor::TAB || b == Cursor::NL || b == Cursor::CR + pos += 1 + end + return false if pos >= len + + b = markup.getbyte(pos) + + if b == Cursor::QUOTE_S || b == Cursor::QUOTE_D + # Quoted string literal: scan to matching close quote + quote = b + @_fast_name_start = pos + pos += 1 + pos += 1 while pos < len && markup.getbyte(pos) != quote + pos += 1 if pos < len # skip closing quote + @_fast_name_end = pos + elsif ByteTables::IDENT_START[b] + # Identifier chain: [a-zA-Z_][a-zA-Z0-9_-]*(.[a-zA-Z_][a-zA-Z0-9_-]*)* + @_fast_name_start = pos + pos += 1 + while pos < len + b = markup.getbyte(pos) + if ByteTables::IDENT_CONT[b] + pos += 1 + elsif b == Cursor::DOT + pos += 1 + return false if pos >= len + b = markup.getbyte(pos) + return false unless ByteTables::IDENT_START[b] + pos += 1 + else + break + end + end + @_fast_name_end = pos + else + return false + end + + # Skip whitespace after name + while pos < len + b = markup.getbyte(pos) + break unless b == Cursor::SPACE || b == Cursor::TAB || b == Cursor::NL || b == Cursor::CR + pos += 1 + end + + pos + end + + # Resolve the scanned name bytes to a Liquid expression object. + # Reads @_fast_name_start / @_fast_name_end set by fast_scan_name. + # Sets @name. May raise SyntaxError (rescued in try_fast_parse). + private def fast_resolve_name(markup, parse_context) + name_start = @_fast_name_start + name_end = @_fast_name_end + len = markup.bytesize + + # Avoid byteslice when the name spans the whole markup (no surrounding whitespace/filters) + expr_markup = name_start == 0 && name_end == len ? 
markup : markup.byteslice(name_start, name_end - name_start) + + cache = parse_context.expression_cache + ss = parse_context.string_scanner + + first_byte = expr_markup.getbyte(0) + @name = if first_byte == Cursor::QUOTE_S || first_byte == Cursor::QUOTE_D + # String literal — strip enclosing quotes + expr_markup.byteslice(1, expr_markup.bytesize - 2) + elsif Expression::LITERALS.key?(expr_markup) + Expression::LITERALS[expr_markup] + elsif cache + cache[expr_markup] || (cache[expr_markup] = VariableLookup.parse_simple(expr_markup, ss, cache).freeze) + else + VariableLookup.parse_simple(expr_markup, ss || StringScanner.new(""), nil).freeze + end + end + + # Scan the filter chain starting at `pos` (the first '|'). + # Returns true on success (sets @filters), false to fall back to the Lexer. + # Rescues SyntaxError from Expression.parse inside fast_scan_filter_args. + private def fast_scan_filters(markup, pos, parse_context) + len = markup.bytesize + @filters = [] + filter_pos = pos + + while filter_pos < len && markup.getbyte(filter_pos) == Cursor::PIPE + filter_pos += 1 + # Skip spaces after pipe (tabs/newlines handled in the between-filters skip below) + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE + + # Scan filter name: must start with [a-zA-Z_] + fname_start = filter_pos + b = filter_pos < len ? 
markup.getbyte(filter_pos) : nil + break unless b && ByteTables::IDENT_START[b] + filter_pos += 1 + while filter_pos < len + b = markup.getbyte(filter_pos) + break unless ByteTables::IDENT_CONT[b] + filter_pos += 1 + end + filtername = markup.byteslice(fname_start, filter_pos - fname_start) + + # Skip whitespace after filter name + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE + + if filter_pos < len && markup.getbyte(filter_pos) == Cursor::COLON + # Has arguments — fast-scan positional args; fall to Lexer on keyword args + filter_pos += 1 # skip ':' + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE + + result = fast_scan_filter_args(markup, filter_pos, parse_context) + return fall_to_lexer_filters(markup, pos, fname_start, len, parse_context) if result == :fall_to_lexer + + filter_args, filter_pos = result + @filters << [filtername, filter_args] + else + # No-arg filter — reuse the cached [name, EMPTY_ARRAY] tuple + @filters << NO_ARG_FILTER_CACHE[filtername] + end + + # Skip whitespace (including tabs and newlines) between filters + filter_pos += 1 while filter_pos < len && ( + markup.getbyte(filter_pos) == Cursor::SPACE || + markup.getbyte(filter_pos) == Cursor::TAB || + markup.getbyte(filter_pos) == Cursor::NL || + markup.getbyte(filter_pos) == Cursor::CR + ) + end + + # Trailing bytes that aren't a pipe mean something the fast path doesn't handle + return false if filter_pos < len + + @filters = Const::EMPTY_ARRAY if @filters.empty? + true + rescue SyntaxError + # Expression.parse (called inside fast_scan_filter_args for identifier args) can + # raise SyntaxError on malformed input. Fall back to full Lexer parse. + @name = nil + @filters = nil + false + end + + # Called when fast_scan_filter_args encounters keyword args or an unrecognised + # token. 
Hands the remaining filter chain (from the pipe before fname_start) + # to the full Lexer-based parser, merges results into @filters, and returns true. + private def fall_to_lexer_filters(markup, pos, fname_start, len, parse_context) + # Walk back from fname_start to find the pipe that opened this filter. + # Equivalent to: markup.rindex('|', fname_start), bounded by pos. + rest_start = fname_start + rest_start -= 1 while rest_start > pos && markup.getbyte(rest_start) != Cursor::PIPE + rest_markup = markup.byteslice(rest_start, len - rest_start) + p = parse_context.new_parser(rest_markup) + while p.consume?(:pipe) + fn = p.consume(:id) + fa = p.consume?(:colon) ? parse_filterargs(p) : Const::EMPTY_ARRAY + @filters << lax_parse_filter_expressions(fn, fa) + end + p.consume(:end_of_string) + @filters = Const::EMPTY_ARRAY if @filters.empty? + true + end + + # Scan positional filter arguments starting at `filter_pos`. + # Returns [filter_args_array, new_filter_pos] on success, or :fall_to_lexer when + # keyword args or unrecognised tokens are encountered. + private def fast_scan_filter_args(markup, filter_pos, parse_context) + len = markup.bytesize + filter_args = [] + + loop do + arg_start = filter_pos + b = filter_pos < len ? 
markup.getbyte(filter_pos) : nil + + if b == Cursor::QUOTE_S || b == Cursor::QUOTE_D + # Quoted string argument + quote = b + filter_pos += 1 + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) != quote + filter_pos += 1 if filter_pos < len # skip closing quote + filter_args << markup.byteslice(arg_start + 1, filter_pos - arg_start - 2) + + elsif b && (ByteTables::DIGIT[b] || + (b == Cursor::DASH && filter_pos + 1 < len && ByteTables::DIGIT[markup.getbyte(filter_pos + 1)])) + # Numeric argument (integer or float, optionally negative) + filter_pos += 1 if b == Cursor::DASH + filter_pos += 1 while filter_pos < len && ByteTables::DIGIT[markup.getbyte(filter_pos)] + if filter_pos < len && markup.getbyte(filter_pos) == Cursor::DOT # float + filter_pos += 1 + filter_pos += 1 while filter_pos < len && ByteTables::DIGIT[markup.getbyte(filter_pos)] + end + num_str = markup.byteslice(arg_start, filter_pos - arg_start) + filter_args << (num_str.include?('.') ? num_str.to_f : num_str.to_i) + + elsif b && ByteTables::IDENT_START[b] + # Identifier argument — may be a variable lookup or keyword arg + id_start = filter_pos + filter_pos += 1 + while filter_pos < len + b2 = markup.getbyte(filter_pos) + break unless ByteTables::IDENT_CONT[b2] || b2 == Cursor::DOT + filter_pos += 1 + end + filter_pos += 1 if filter_pos < len && markup.getbyte(filter_pos) == Cursor::QMARK + + # Peek past whitespace: if followed by ':', this is a keyword arg → fall to Lexer + kw_check = filter_pos + kw_check += 1 while kw_check < len && markup.getbyte(kw_check) == Cursor::SPACE + return :fall_to_lexer if kw_check < len && markup.getbyte(kw_check) == Cursor::COLON + + id_markup = markup.byteslice(id_start, filter_pos - id_start) + filter_args << Expression.parse(id_markup, parse_context.string_scanner, parse_context.expression_cache) + + else + return :fall_to_lexer + end + + # Skip whitespace after argument + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == 
Cursor::SPACE + + # Comma: more arguments follow; anything else: done with this filter's args + if filter_pos < len && markup.getbyte(filter_pos) == Cursor::COMMA + filter_pos += 1 + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE + else + break + end + end + + [filter_args, filter_pos] end def raw @@ -42,7 +333,7 @@ def markup_context(markup) end def lax_parse(markup) - @filters = [] + @filters = Const::EMPTY_ARRAY return unless markup =~ MarkupWithQuotedFragment name_markup = Regexp.last_match(1) @@ -54,19 +345,21 @@ def lax_parse(markup) next unless f =~ /\w+/ filtername = Regexp.last_match(0) filterargs = f.scan(FilterArgsRegex).flatten + @filters = [] if @filters.frozen? @filters << lax_parse_filter_expressions(filtername, filterargs) end end end def strict_parse(markup) - @filters = [] + @filters = Const::EMPTY_ARRAY p = @parse_context.new_parser(markup) return if p.look(:end_of_string) @name = parse_context.safe_parse_expression(p) while p.consume?(:pipe) + @filters = [] if @filters.frozen? filtername = p.consume(:id) filterargs = p.consume?(:colon) ? parse_filterargs(p) : Const::EMPTY_ARRAY @filters << lax_parse_filter_expressions(filtername, filterargs) @@ -75,13 +368,16 @@ def strict_parse(markup) end def strict2_parse(markup) - @filters = [] + @filters = Const::EMPTY_ARRAY p = @parse_context.new_parser(markup) return if p.look(:end_of_string) @name = parse_context.safe_parse_expression(p) - @filters << strict2_parse_filter_expressions(p) while p.consume?(:pipe) + while p.consume?(:pipe) + @filters = [] if @filters.frozen? + @filters << strict2_parse_filter_expressions(p) + end p.consume(:end_of_string) end @@ -97,24 +393,37 @@ def render(context) obj = context.evaluate(@name) @filters.each do |filter_name, filter_args, filter_kwargs| - filter_args = evaluate_filter_expressions(context, filter_args, filter_kwargs) - obj = context.invoke(filter_name, obj, *filter_args) + if filter_args.empty? 
&& !filter_kwargs + obj = context.invoke_single(filter_name, obj) + elsif !filter_kwargs && filter_args.length == 1 + # Single positional arg — most common after no-arg + obj = context.invoke_two(filter_name, obj, context.evaluate(filter_args[0])) + else + filter_args = evaluate_filter_expressions(context, filter_args, filter_kwargs) + obj = context.invoke(filter_name, obj, *filter_args) + end end context.apply_global_filter(obj) end def render_to_output_buffer(context, output) - obj = render(context) + # Fast path: no filters and no global filter + obj = if @filters.empty? && context.global_filter.nil? + context.evaluate(@name) + else + render(context) + end render_obj_to_output(obj, output) output end def render_obj_to_output(obj, output) - case obj - when NilClass + if obj.instance_of?(String) + output << obj + elsif obj.nil? # Do nothing - when Array + elsif obj.instance_of?(Array) obj.each do |o| render_obj_to_output(o, output) end @@ -128,7 +437,7 @@ def disabled?(_context) end def disabled_tags - [] + Const::EMPTY_ARRAY end private @@ -137,7 +446,8 @@ def lax_parse_filter_expressions(filter_name, unparsed_args) filter_args = [] keyword_args = nil unparsed_args.each do |a| - if (matches = a.match(JustTagAttributes)) + # Fast check: keyword args must contain ':' + if a.include?(':') && (matches = a.match(JustTagAttributes)) keyword_args ||= {} keyword_args[matches[1]] = parse_context.parse_expression(matches[2]) else @@ -190,15 +500,19 @@ def end_of_arguments?(p) end def evaluate_filter_expressions(context, filter_args, filter_kwargs) - parsed_args = filter_args.map { |expr| context.evaluate(expr) } if filter_kwargs + parsed_args = filter_args.map { |expr| context.evaluate(expr) } parsed_kwargs = {} filter_kwargs.each do |key, expr| parsed_kwargs[key] = context.evaluate(expr) end parsed_args << parsed_kwargs + parsed_args + elsif filter_args.empty? 
+ Const::EMPTY_ARRAY + else + filter_args.map { |expr| context.evaluate(expr) } end - parsed_args end class ParseTreeVisitor < Liquid::ParseTreeVisitor diff --git a/lib/liquid/variable_lookup.rb b/lib/liquid/variable_lookup.rb index 4fba2a658..bb33b68c7 100644 --- a/lib/liquid/variable_lookup.rb +++ b/lib/liquid/variable_lookup.rb @@ -10,11 +10,108 @@ def self.parse(markup, string_scanner = StringScanner.new(""), cache = nil) new(markup, string_scanner, cache) end - def initialize(markup, string_scanner = StringScanner.new(""), cache = nil) - lookups = markup.scan(VariableParser) + # Fast parse that skips simple_lookup? check — caller guarantees simple identifier chain + def self.parse_simple(markup, string_scanner = nil, cache = nil) + new(markup, string_scanner, cache, true) + end + + # Fast manual scanner replacing markup.scan(VariableParser) + # VariableParser = /\[(?>[^\[\]]+|\g<0>)*\]|[\w-]+\??/ + # Splits "product.variants[0].title" into ["product", "variants", "[0]", "title"] + def self.scan_variable(markup) + result = [] + pos = 0 + len = markup.bytesize + + while pos < len + byte = markup.getbyte(pos) + + if byte == 91 # '[' + # Scan balanced brackets + depth = 1 + start = pos + pos += 1 + while pos < len && depth > 0 + b = markup.getbyte(pos) + depth += 1 if b == 91 + depth -= 1 if b == 93 + pos += 1 + end + if depth == 0 + result << markup.byteslice(start, pos - start) + else + # Unbalanced bracket - skip '[' and continue + pos = start + 1 + end + elsif byte == 46 # '.' + pos += 1 + elsif ByteTables::IDENT_CONT[byte] # [\w-] + start = pos + pos += 1 + while pos < len + b = markup.getbyte(pos) + break unless ByteTables::IDENT_CONT[b] + pos += 1 + end + # Check trailing '?' + if pos < len && markup.getbyte(pos) == 63 + pos += 1 + end + result << markup.byteslice(start, pos - start) + else + pos += 1 + end + end + + result + end + + # Check if markup is a simple identifier chain: [\w-]+\??(.[\w-]+\??)* + # Uses C-level match? 
— 8x faster than Ruby byte scanning + SIMPLE_LOOKUP_RE = /\A[\w-]+\??(?:\.[\w-]+\??)*\z/ + + def self.simple_lookup?(markup) + markup.bytesize > 0 && markup.match?(SIMPLE_LOOKUP_RE) + end + + def initialize(markup, string_scanner = StringScanner.new(""), cache = nil, simple = false) + # Fast path: simple identifier chain without brackets + if simple || self.class.simple_lookup?(markup) + dot_pos = markup.index('.') + if dot_pos.nil? + @name = markup + @lookups = Const::EMPTY_ARRAY + @command_flags = 0 + return + end + @name = markup.byteslice(0, dot_pos) + # Build lookups array from remaining dot-separated segments + lookups = [] + @command_flags = 0 + pos = dot_pos + 1 + len = markup.bytesize + while pos < len + seg_start = pos + while pos < len + b = markup.getbyte(pos) + break if b == 46 # '.' + pos += 1 + end + seg = markup.byteslice(seg_start, pos - seg_start) + if COMMAND_METHODS.include?(seg) + @command_flags |= 1 << lookups.length + end + lookups << seg + pos += 1 # skip dot + end + @lookups = lookups + return + end + + lookups = self.class.scan_variable(markup) name = lookups.shift - if name&.start_with?('[') && name&.end_with?(']') + if name&.start_with?('[') && name.end_with?(']') name = Expression.parse( name[1..-2], string_scanner, @@ -26,9 +123,8 @@ def initialize(markup, string_scanner = StringScanner.new(""), cache = nil) @lookups = lookups @command_flags = 0 - @lookups.each_index do |i| - lookup = lookups[i] - if lookup&.start_with?('[') && lookup&.end_with?(']') + @lookups.each_with_index do |lookup, i| + if lookup&.start_with?('[') && lookup.end_with?(']') lookups[i] = Expression.parse( lookup[1..-2], string_scanner, @@ -49,26 +145,34 @@ def evaluate(context) object = context.find_variable(name) @lookups.each_index do |i| - key = context.evaluate(@lookups[i]) + lookup = @lookups[i] + key = lookup.instance_of?(String) ? 
lookup : context.evaluate(lookup) # Cast "key" to its liquid value to enable it to act as a primitive value - key = Liquid::Utils.to_liquid_value(key) + # Fast path: strings and integers (most common key types) don't need conversion + unless key.instance_of?(String) || key.instance_of?(Integer) + key = Liquid::Utils.to_liquid_value(key) + end # If object is a hash- or array-like object we look for the # presence of the key and if its available we return it - if object.respond_to?(:[]) && - ((object.respond_to?(:key?) && object.key?(key)) || - (object.respond_to?(:fetch) && key.is_a?(Integer))) - + if accessible?(object, key) # if its a proc we will replace the entry with the proc - res = context.lookup_and_evaluate(object, key) - object = res.to_liquid + object = context.lookup_and_evaluate(object, key) + # Skip to_liquid for common primitive types (they return self) + unless object.instance_of?(String) || object.instance_of?(Integer) || object.instance_of?(Float) || + object.instance_of?(Array) || object.instance_of?(Hash) || object.nil? + object = liquidize(object, context) + end # Some special cases. If the part wasn't in square brackets and # no key with the same name was found we interpret following calls # as commands and call them on the current object elsif lookup_command?(i) && object.respond_to?(key) - object = object.send(key).to_liquid + object = object.send(key) + unless object.instance_of?(String) || object.instance_of?(Integer) || object.instance_of?(Array) || object.nil? 
+ object = liquidize(object, context) + end # Handle string first/last like ActiveSupport does (returns first/last character) # ActiveSupport returns "" for empty strings, not nil @@ -82,9 +186,6 @@ def evaluate(context) return nil unless context.strict_variables raise Liquid::UndefinedVariable, "undefined variable #{key}" end - - # If we are dealing with a drop here we have to - object.context = context if object.respond_to?(:context=) end object @@ -94,6 +195,27 @@ def ==(other) self.class == other.class && state == other.state end + private + + # Returns true if +object+ has +key+ accessible via [] lookup. + def accessible?(object, key) + if object.instance_of?(Hash) + object.key?(key) + else + object.respond_to?(:[]) && + ((object.respond_to?(:key?) && object.key?(key)) || + (object.respond_to?(:fetch) && key.is_a?(Integer))) + end + end + + # Calls to_liquid on +object+ and wires up the context reference if needed. + # Skipped for primitive types that return self from to_liquid. + def liquidize(object, context) + object = object.to_liquid + object.context = context if object.respond_to?(:context=) + object + end + protected def state diff --git a/performance/bench_quick.rb b/performance/bench_quick.rb new file mode 100644 index 000000000..6168f80e3 --- /dev/null +++ b/performance/bench_quick.rb @@ -0,0 +1,62 @@ +# frozen_string_literal: true + +# Quick benchmark for autoresearch: measures parse µs, render µs, and object allocations +# Outputs machine-readable metrics to stdout + +require_relative 'theme_runner' + +RubyVM::YJIT.enable if defined?(RubyVM::YJIT) + +runner = ThemeRunner.new + +# Warmup — enough iterations for YJIT to fully optimize hot paths +20.times { runner.compile } +20.times { runner.render } + +GC.start +GC.compact if GC.respond_to?(:compact) + +# Measure parse +parse_times = [] +10.times do + GC.disable + t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC) + runner.compile + t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC) + GC.enable 
+ GC.start + parse_times << (t1 - t0) * 1_000_000 # µs +end + +# Measure render +render_times = [] +10.times do + GC.disable + t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC) + runner.render + t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC) + GC.enable + GC.start + render_times << (t1 - t0) * 1_000_000 # µs +end + +# Measure object allocations for one parse+render cycle +require 'objspace' +GC.start +GC.disable +before = ObjectSpace.count_objects.values_at(:TOTAL).first - ObjectSpace.count_objects.values_at(:FREE).first +runner.compile +runner.render +after = ObjectSpace.count_objects.values_at(:TOTAL).first - ObjectSpace.count_objects.values_at(:FREE).first +GC.enable +allocations = after - before + +parse_us = parse_times.min.round(0) +render_us = render_times.min.round(0) +combined_us = parse_us + render_us + +puts "RESULTS" +puts "parse_us=#{parse_us}" +puts "render_us=#{render_us}" +puts "combined_us=#{combined_us}" +puts "allocations=#{allocations}" diff --git a/test/unit/tokenizer_unit_test.rb b/test/unit/tokenizer_unit_test.rb index 76d379d36..b3da5e4d3 100644 --- a/test/unit/tokenizer_unit_test.rb +++ b/test/unit/tokenizer_unit_test.rb @@ -48,6 +48,33 @@ def test_unmatching_start_and_end assert_equal(["{%%}", "}"], tokenize('{%%}}')) end + # Regression: lone '{' at or near end of string previously caused an infinite + # loop. The stray-{ else branch left `pos` unchanged when no further '{{' or + # '{%' existed, so the outer loop found the same '{' on every iteration. 
+ def test_lone_brace_does_not_loop + assert_equal(["{"], tokenize('{')) + assert_equal(["a{"], tokenize('a{')) + assert_equal(["hello { world {"], tokenize('hello { world {')) + assert_equal(["{ world"], tokenize('{ world')) + assert_equal(["x{y"], tokenize('x{y')) + assert_equal(["{b{c"], tokenize('{b{c')) + end + + def test_lone_brace_before_real_token + assert_equal( + ["a { b ", "{% if x %}", "yes", "{% endif %}", " c"], + tokenize('a { b {% if x %}yes{% endif %} c'), + ) + assert_equal( + ["x { ", "{{ var }}", " y"], + tokenize('x { {{ var }} y'), + ) + assert_equal( + ["{ ", "{{ var }}"], + tokenize('{ {{ var }}'), + ) + end + private def new_tokenizer(source, parse_context: Liquid::ParseContext.new, start_line_number: nil) diff --git a/test/unit/variable_fast_parse_test.rb b/test/unit/variable_fast_parse_test.rb new file mode 100644 index 000000000..fd67c5eaf --- /dev/null +++ b/test/unit/variable_fast_parse_test.rb @@ -0,0 +1,126 @@ +# frozen_string_literal: true + +require 'test_helper' + +# Tests that the fast-path parser (try_fast_parse) produces the same result as the +# full Lexer → Parser pipeline for every input we expect it to handle. +# +# This protects against silent regressions where a change to try_fast_parse causes it +# to produce different output from the slow path (the existing test suite would still +# pass because the slow path catches it, but correctness would be silently lost). 
+class VariableFastParseTest < Minitest::Test + include Liquid + + EQUIVALENCE_CASES = [ + # Simple lookups + "product", + "product.title", + "product.variants.first.title", + # Quoted string literals + "'hello'", + '"hello"', + # Variables with no-arg filters + "product | upcase", + "product | upcase | downcase", + "product | strip | upcase | downcase", + # Variables with single-arg filters + "product | truncate: 50", + "product | plus: 1", + "product | plus: -3", + "product | round: 2", + "product | append: ' world'", + # Variables with multi-arg filters + "product | replace: 'a', 'b'", + "product | pluralize: 'item', 'items'", + "product | slice: 0, 5", + # Chained mixed filters + "product.title | truncate: 50", + "'hello' | append: ' world' | upcase", + "name | prepend: 'Dr. ' | append: ' PhD' | upcase", + # Numeric args + "count | plus: 1.5", + "price | minus: 0.99", + # No whitespace around pipe + "x|upcase", + "x|replace:'a','b'|upcase", + # Leading/trailing whitespace + " product ", + " product.title | upcase ", + ].freeze + + EQUIVALENCE_CASES.each_with_index do |markup, i| + define_method(:"test_fast_parse_equivalence_#{i.to_s.rjust(2, "0")}") do + lax_ctx = Liquid::ParseContext.new(error_mode: :lax) + strict_ctx = Liquid::ParseContext.new(error_mode: :strict) + + lax_var = Liquid::Variable.new(markup, lax_ctx) + strict_var = Liquid::Variable.new(markup, strict_ctx) + + assert_equal strict_var.name, + lax_var.name, + "Name mismatch for #{markup.inspect}: " \ + "lax=#{lax_var.name.inspect} strict=#{strict_var.name.inspect}" + assert_equal strict_var.filters.length, + lax_var.filters.length, + "Filter count mismatch for #{markup.inspect}: " \ + "lax=#{lax_var.filters.inspect} strict=#{strict_var.filters.inspect}" + strict_var.filters.each_with_index do |(s_name, *), i| + l_name = lax_var.filters[i][0] + assert_equal s_name, + l_name, + "Filter name mismatch at index #{i} for #{markup.inspect}" + end + end + end + + # Verify the fast path is actually taken 
for simple variables (i.e. filters is the + # shared frozen EMPTY_ARRAY, not a newly allocated array). + def test_fast_path_taken_for_simple_variable + ctx = Liquid::ParseContext.new(error_mode: :lax) + var = Liquid::Variable.new("product.title", ctx) + assert_same( + Liquid::Const::EMPTY_ARRAY, + var.filters, + "Expected fast path (frozen EMPTY_ARRAY) for simple variable", + ) + end + + def test_fast_path_taken_for_no_arg_filter + ctx = Liquid::ParseContext.new(error_mode: :lax) + var = Liquid::Variable.new("product | upcase", ctx) + assert_equal(1, var.filters.length) + assert_equal("upcase", var.filters[0][0]) + # The no-arg filter tuple should come from NO_ARG_FILTER_CACHE (frozen) + assert_predicate(var.filters[0], :frozen?) + end + + def test_fast_path_taken_for_single_arg_filter + ctx = Liquid::ParseContext.new(error_mode: :lax) + var = Liquid::Variable.new("product | truncate: 50", ctx) + assert_equal(1, var.filters.length) + assert_equal("truncate", var.filters[0][0]) + assert_equal([50], var.filters[0][1]) + end + + # Keyword args must fall through to the Lexer — verify the result is still correct. + def test_keyword_arg_falls_to_lexer_and_parses_correctly + ctx = Liquid::ParseContext.new(error_mode: :lax) + var = Liquid::Variable.new("img | img_tag: class: 'hero'", ctx) + assert_equal(1, var.filters.length) + assert_equal("img_tag", var.filters[0][0]) + end + + # Numeric filter arguments: integers and floats + def test_numeric_filter_args + ctx = Liquid::ParseContext.new(error_mode: :lax) + + int_var = Liquid::Variable.new("price | plus: 3", ctx) + assert_equal([3], int_var.filters[0][1]) + + neg_var = Liquid::Variable.new("price | minus: -1", ctx) + assert_equal([-1], neg_var.filters[0][1]) + + float_var = Liquid::Variable.new("price | round: 2.5", ctx) + assert_equal([2.5], float_var.filters[0][1]) + end +end
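The `tokenize_fast` loop in the tokenizer diff above replaces StringScanner regex scanning with `String#byteindex`. A minimal standalone sketch of the same idea follows — this is not the patch's code: `mini_tokenize` is a hypothetical name, it omits the nested-`{% … %}`-inside-`{{ … }}` handling and line-number bookkeeping, and it requires Ruby 3.2+ for `String#byteindex`.

```ruby
# Sketch: split a template into text / {% tag %} / {{ var }} tokens by
# jumping between '{' bytes with String#byteindex instead of regex scanning.
def mini_tokenize(src)
  tokens = []
  pos    = 0       # start of the next text token to emit
  search = 0       # where to resume looking for '{'
  len    = src.bytesize
  while pos < len
    idx = src.byteindex("{", search)
    unless idx
      tokens << src.byteslice(pos, len - pos) # no more delimiters: rest is text
      break
    end
    nxt = idx + 1 < len ? src.getbyte(idx + 1) : nil
    if nxt == 0x25 || nxt == 0x7B # '%' or '{': a real tag/variable opener
      tokens << src.byteslice(pos, idx - pos) if idx > pos
      close = src.byteindex(nxt == 0x25 ? "%}" : "}}", idx + 2)
      if close
        tokens << src.byteslice(idx, close + 2 - idx)
        pos = close + 2
      else
        tokens << (nxt == 0x25 ? "{%" : "{{") # malformed: emit opener alone
        pos = idx + 2
      end
      search = pos
    else
      # Lone '{': keep it inside the pending text token, but advance the
      # search cursor so the loop always makes progress (the infinite-loop
      # regression the new tokenizer tests guard against).
      search = idx + 1
    end
  end
  tokens
end
```

Unlike the patch, this sketch resolves lone braces with a separate `search` cursor rather than a pair of lookahead `byteindex` calls; both approaches avoid re-examining the same stray `{` forever.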
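The `SMALL_INT_STRINGS` change in `lib/liquid/utils.rb` can be shown in isolation. The sketch below assumes a hypothetical `int_to_s` helper standing in for `Utils.to_s`; the 0–999 range and the frozen precomputed array match the diff, the rest is illustrative.

```ruby
# Precompute frozen string forms of 0..999 once, so hot render paths that
# stringify loop indices and counts reuse a shared string instead of paying
# an Integer#to_s allocation on every call.
SMALL_INT_STRINGS = Array.new(1000) { |i| i.to_s.freeze }.freeze

def int_to_s(n)
  # Cache hit returns the shared frozen string; anything else allocates as usual.
  n >= 0 && n < 1000 ? SMALL_INT_STRINGS[n] : n.to_s
end
```

Because the cached strings are frozen, handing the same object to every caller is safe — nobody can mutate it out from under another render.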
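The `NO_ARG_FILTER_CACHE` pattern from `lib/liquid/variable.rb` — a Hash whose default block memoizes one frozen `[name, empty_args]` tuple per filter name — can also be sketched on its own. Here a local frozen array stands in for `Const::EMPTY_ARRAY`; everything else mirrors the diff.

```ruby
EMPTY_ARGS = [].freeze

# First lookup for a given filter name builds and freezes the tuple; every
# later lookup returns the exact same object, so parsing "x | upcase" a
# thousand times allocates the ["upcase", EMPTY_ARGS] pair only once.
NO_ARG_FILTER_CACHE = Hash.new { |h, k| h[k] = [k, EMPTY_ARGS].freeze }

# Usage as in the fast filter scanner: append the shared tuple, never a new array.
filters = []
filters << NO_ARG_FILTER_CACHE["upcase"]
filters << NO_ARG_FILTER_CACHE["upcase"]
```

Freezing the tuple is what makes the sharing safe: render-time code destructures it (`filter_name, filter_args = tuple`) but never mutates it.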