diff --git a/auto/autoresearch.ideas.md b/auto/autoresearch.ideas.md
new file mode 100644
index 000000000..4a25837e7
--- /dev/null
+++ b/auto/autoresearch.ideas.md
@@ -0,0 +1,30 @@
+# Autoresearch Ideas
+
+## Dead Ends (tried and failed)
+
+- **Tag name interning** (skip+byte dispatch): saves 878 allocs but verification loop overhead kills speed
+- **String dedup (-@)** for filter names: no alloc savings, creates temp strings anyway
+- **Split-based tokenizer**: C-level split is 2.5x faster, but regex splitting can't handle `{{` later closed by `%}` (variable-becomes-tag nesting)
+- **Streaming tokenizer**: needs own StringScanner (+alloc), per-shift overhead worse than eager array
+- **Merge simple_lookup? into initialize**: logic overhead offsets saved index call
+- **Cursor for filter scanning**: cursor.reset overhead worse than inline byte loops
+- **Direct strainer call**: YJIT already inlines context.invoke_single well
+- **TruthyCondition subclass**: YJIT polymorphism at evaluate call site hurts more than 115 saved allocs
+- **Index loop for filters**: YJIT optimizes each+destructure MUCH better than manual filter[0]/filter[1]
+
+## Key Insights
+
+- YJIT monomorphism > allocation reduction at this scale
+- C-level StringScanner.scan/skip > Ruby-level byte loops (already applied)
+- String#split is 2.5x faster than manual tokenization, but Liquid's grammar is too complex for regex
+- 74% of total CPU time is GC — alloc reduction is the highest-leverage optimization
+- But YJIT-deoptimization from polymorphism costs more than the GC savings
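
The first insight and the last dead end can be sketched concretely. This is an illustrative reconstruction, not Liquid's actual code: `filters` stands in for a hypothetical array of `[callable, args]` pairs.

```ruby
# Illustrative only: two shapes of the same filter loop. Under YJIT the
# block form with destructuring measured faster than manual indexing,
# despite looking "heavier".
filters = [[->(s) { s.upcase }, []], [->(s) { s.strip }, []]]

# Block form: each + destructuring; YJIT specializes this call site well.
def apply_each(value, filters)
  filters.each { |fn, _args| value = fn.call(value) }
  value
end

# Manual index form: avoids the block, but pays extra Array#[] calls per
# step (the variant that lost in the "Index loop for filters" dead end).
def apply_index(value, filters)
  i = 0
  while (pair = filters[i])
    value = pair[0].call(value)
    i += 1
  end
  value
end

apply_each("  hi  ", filters)  # => "HI"
apply_index("  hi  ", filters) # => "HI"
```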
+
+## Remaining Ideas
+
+- **Tokenizer: use String#index + byteslice instead of StringScanner**: avoid the StringScanner overhead entirely for the simple case of finding {%/{{ delimiters
+- **Pre-freeze all Condition operator lambdas**: reduce alloc in Condition initialization
+- **Avoid `@blocks = []` in If with single-element optimization**: use `@block` ivar for single condition, only create array for elsif
+- **Reduce ForloopDrop allocation**: reuse ForloopDrop objects across iterations or use a lighter-weight object
+- **VariableLookup: single-segment optimization**: for "product.title" (1 lookup), use an ivar instead of 1-element Array
+
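A rough sketch of the tokenizer idea above (names are illustrative; `String#byteindex` needs Ruby >= 3.2, and error handling for unterminated tags is omitted):

```ruby
# Scan forward for "{%" / "{{" delimiters with byteindex + byteslice,
# avoiding StringScanner entirely for the delimiter-finding case.
def tokenize(source)
  tokens = []
  pos = 0
  size = source.bytesize
  while pos < size
    open = source.byteindex("{", pos)
    # Skip lone '{' bytes that do not start a delimiter.
    while open && open + 1 < size
      b = source.getbyte(open + 1)
      break if b == 0x25 || b == 0x7B # '%' or '{'
      open = source.byteindex("{", open + 1)
    end
    if open.nil? || open + 1 >= size
      tokens << source.byteslice(pos, size - pos)
      break
    end
    tokens << source.byteslice(pos, open - pos) if open > pos
    close = source.byteindex(source.getbyte(open + 1) == 0x25 ? "%}" : "}}", open)
    close = size - 2 if close.nil? # unterminated: a real parser would raise
    tokens << source.byteslice(open, close + 2 - open)
    pos = close + 2
  end
  tokens
end

tokenize("a {{ x }} b {% if y %}")
# => ["a ", "{{ x }}", " b ", "{% if y %}"]
```
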
diff --git a/auto/autoresearch.md b/auto/autoresearch.md
new file mode 100644
index 000000000..8ba585717
--- /dev/null
+++ b/auto/autoresearch.md
@@ -0,0 +1,109 @@
+# Autoresearch: Liquid Parse+Render Performance
+
+## Objective
+Optimize the Shopify Liquid template engine's parse and render performance.
+The workload is the ThemeRunner benchmark which parses and renders real Shopify
+theme templates (dropify, ripen, tribble, vogue) with realistic data from
+`performance/shopify/database.rb`. We measure parse time, render time, and
+object allocations. The optimization target is combined parse+render time (µs).
+
+## How to Run
+Run `./auto/autoresearch.sh` — it runs the unit tests, then the performance
+benchmark (three runs, best taken), printing `METRIC` lines in a parseable
+format. `./auto/bench.sh` additionally runs the liquid-spec conformance gate.
+
+## Metrics
+- **Primary (optimization target)**: `combined_µs` (µs, lower is better) — sum of parse + render time
+- **Secondary (tradeoff monitoring)**:
+ - `parse_µs` — time to parse all theme templates (Liquid::Template#parse)
+ - `render_µs` — time to render all pre-compiled templates
+ - `allocations` — total object allocations for one parse+render cycle
+ Parse dominates (~70-75% of combined). Allocations correlate with GC pressure.
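
The measurement shape can be sketched as below (hedged: `performance/bench_quick.rb` is not shown in this diff, so the helper and its usage are illustrative). `GC.stat(:total_allocated_objects)` is a monotonic counter, so its delta across a phase is that phase's allocation count.

```ruby
# Time one phase and count its allocations (illustrative helper).
def measure_phase
  allocs_before = GC.stat(:total_allocated_objects)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond)
  result = yield
  elapsed_us = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond) - t0
  [result, elapsed_us, GC.stat(:total_allocated_objects) - allocs_before]
end

# Usage shape, matching the metrics above (Liquid calls assumed):
#   template, parse_us, _ = measure_phase { Liquid::Template.parse(source) }
#   _, render_us, allocs  = measure_phase { template.render(assigns) }
#   combined_us = parse_us + render_us
```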
+
+## Files in Scope
+- `lib/liquid/*.rb` — core Liquid library (parser, lexer, context, expression, etc.)
+- `lib/liquid/tags/*.rb` — tag implementations (for, if, assign, etc.)
+- `performance/bench_quick.rb` — benchmark script
+
+## Off Limits
+- `test/` — tests must continue to pass unchanged
+- `performance/tests/` — benchmark templates, do not modify
+- `performance/shopify/` — benchmark data/filters, do not modify
+
+## Constraints
+- All unit tests must pass (`bundle exec rake base_test`)
+- liquid-spec failures must not increase beyond 2 (pre-existing UTF-8 edge cases)
+- No new gem dependencies
+- Semantic correctness must be preserved — templates must render identical output
+- **Security**: Liquid runs untrusted user code. See Strategic Direction for details.
+
+## Strategic Direction
+The long-term goal is to converge toward a **single-pass, forward-only parsing
+architecture** using one shared StringScanner instance. The current system has
+multiple redundant passes: Tokenizer → BlockBody → Lexer → Parser → Expression
+→ VariableLookup, each re-scanning portions of the source. A unified scanner
+approach would:
+
+1. **One StringScanner** flows through the entire parse — no intermediate token
+ arrays, no re-lexing filter chains, no string reconstruction in Parser#expression.
+2. **Emit a lightweight IL or normalized AST** during the single forward pass,
+ decoupling strictness checking from the hot parse path. The LiquidIL project
+ (`~/src/tries/2026-01-05-liquid-il`) demonstrated this: a recursive-descent
+ parser emitting IL directly achieved significant speedups.
+3. **Minimal backtracking** — the scanner advances forward, byte-checking as it
+ goes. liquid-c (`~/src/tries/2026-01-16-Shopify-liquid-c`) showed that a
+ C-level cursor-based tokenizer eliminates most allocation overhead.
+
+Current fast-path optimizations (byte-level tag/variable/for/if parsing) are
+steps toward this goal. Each one replaces a regex+MatchData pattern with
+forward-only byte scanning. The remaining Lexer→Parser path for filter args
+is the next target for elimination.
+
+**Security note**: Liquid executes untrusted user templates. All parsing must
+use explicit byte-range checks. Never use eval, send on user input, dynamic
+method dispatch, const_get, or any pattern that lets template authors escape
+the sandbox.
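
As a compressed illustration of this direction (not the LiquidIL design itself), a single StringScanner can walk the source once and emit a flat event stream with no intermediate token array:

```ruby
require "strscan"

# One forward pass, one scanner, zero re-lexing. Strictness checks and
# unterminated-delimiter errors are omitted for brevity.
def scan_events(source)
  ss = StringScanner.new(source)
  events = []
  until ss.eos?
    if (text = ss.scan_until(/(?=\{[\{%])/))
      events << [:text, text] unless text.empty?
      if ss.skip(/\{\{/)
        events << [:var, ss.scan_until(/\}\}/).delete_suffix("}}").strip]
      else
        ss.skip(/\{%/)
        events << [:tag, ss.scan_until(/%\}/).delete_suffix("%}").strip]
      end
    else
      events << [:text, ss.rest]
      ss.terminate
    end
  end
  events
end

scan_events("Hi {{ name }}!{% endif %}")
# => [[:text, "Hi "], [:var, "name"], [:text, "!"], [:tag, "endif"]]
```

The real system would emit IL nodes instead of symbols, but the control flow (forward-only, one shared scanner) is the same.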
+
+## Baseline
+- **Commit**: 4ea835a (original, before any optimizations)
+- **combined_µs**: 7,374
+- **parse_µs**: 5,928
+- **render_µs**: 1,446
+- **allocations**: 62,620
+
+## Progress Log
+- 3329b09: Replace FullToken regex with manual byte parsing → combined 7,262 (-1.5%)
+- 97e6893: Replace VariableParser regex with manual byte scanner → combined 6,945 (-5.8%), allocs 58,009
+- 2b78e4b: getbyte instead of string indexing in whitespace_handler/create_variable → allocs 51,477
+- d291e63: Lexer equal? for frozen arrays, \s+ whitespace skip → combined ~6,331
+- d79b9fa: Avoid strip alloc in Expression.parse, byteslice for strings → allocs 49,151
+- fa41224: Short-circuit parse_number with first-byte check → allocs 48,240
+- c1113ad: Fast-path String in render_obj_to_output → combined ~6,071
+- 25f9224: Fast-path simple variable parsing (skip Lexer/Parser) → combined ~5,860, allocs 45,202
+- 3939d74: Replace SIMPLE_VARIABLE regex with byte scanner → combined ~5,717, allocs 42,763
+- fe7a2f5: Fast-path simple if conditions → combined ~5,444, allocs 41,490
+- cfa0dfe: Replace For tag Syntax regex with manual byte parser → combined ~4,974, allocs 39,847
+- 8a92a4e: Unified fast-path Variable: parse name directly, only lex filter chain → combined ~5,060, allocs 40,520
+- 58d2514: parse_tag_token returns [tag_name, markup, newlines] → combined ~4,815, allocs 37,355
+- db43492: Hoist write score check out of render loop → render ~1,345
+- 17daac9: Extend fast-path to quoted string literal variables → all 1,197 variables fast-pathed
+- 9fd7cec: Split filter parsing: no-arg filters scanned directly, Lexer only for args → combined ~4,595, allocs 35,159
+- e5933fc: Avoid array alloc in parse_tag_token via class ivars → allocs 34,281
+- 2e207e6: Replace WhitespaceOrNothing regex with byte-level blank_string? → combined ~4,800
+- 526af22: invoke_single fast path for no-arg filter invocation → allocs 32,621
+- 76ae8f1: find_variable top-scope fast path → combined ~4,740
+- 4cda1a5: slice_collection: skip copy for full Array → allocs 32,004
+- 79840b1: Replace SIMPLE_CONDITION regex with manual byte parser → combined ~4,663, allocs 31,465
+- 69430e9: Replace INTEGER_REGEX/FLOAT_REGEX with byte-level parse_number → allocs 31,129
+- 405e3dc: Frozen EMPTY_ARRAY/EMPTY_HASH for Context @filters/@disabled_tags → allocs 31,009
+- b90d7f0: Avoid unnecessary array wrapping for Context environments → allocs 30,709
+- 3799d4c: Lazy seen={} hash in Utils.to_s/inspect → allocs 30,169
+- 0b07487: Fast-path VariableLookup: skip scan_variable for simple identifiers → allocs 29,711
+- 9de1527: Introduce Cursor class for centralized byte-level scanning
+- dd4a100: Remove dead parse_tag_token/SIMPLE_CONDITION (now in Cursor)
+- cdc3438: For tag: migrate lax_parse to Cursor with zero-alloc scanning → allocs 29,620
+
+## Current Best
+- **combined_µs**: ~3,400 (-54% from original 7,374 baseline)
+- **parse_µs**: ~2,300
+- **render_µs**: ~1,100
+- **allocations**: 24,882 (-60% from original 62,620 baseline)
diff --git a/auto/autoresearch.sh b/auto/autoresearch.sh
new file mode 100755
index 000000000..f421767e6
--- /dev/null
+++ b/auto/autoresearch.sh
@@ -0,0 +1,48 @@
+#!/usr/bin/env bash
+# Autoresearch benchmark runner for Liquid performance optimization
+# Runs: unit tests → performance benchmark (3 runs, takes best)
+# Outputs METRIC lines for the agent to parse
+# Exit code 0 = all good, non-zero = broken
+set -euo pipefail
+
+cd "$(dirname "$0")/.."
+
+# ── Step 1: Unit tests (fast gate) ──────────────────────────────────
+echo "=== Unit Tests ==="
+# With `set -euo pipefail`, a failing command inside `$( )` aborts the script
+# before any diagnostic prints, so capture the exit status explicitly.
+if ! TEST_OUT=$(bundle exec rake base_test 2>&1); then
+  echo "$TEST_OUT" | tail -20
+  echo "FATAL: unit tests failed"
+  exit 1
+fi
+TEST_RESULT=$(echo "$TEST_OUT" | tail -1)
+echo "$TEST_RESULT"
+
+# ── Step 2: Performance benchmark (3 runs, take best) ──────────────
+echo ""
+echo "=== Performance Benchmark (3 runs) ==="
+BEST_COMBINED=999999
+BEST_PARSE=0
+BEST_RENDER=0
+BEST_ALLOC=0
+
+for i in 1 2 3; do
+ OUT=$(bundle exec ruby performance/bench_quick.rb 2>&1)
+ P=$(echo "$OUT" | grep '^parse_us=' | cut -d= -f2)
+ R=$(echo "$OUT" | grep '^render_us=' | cut -d= -f2)
+ C=$(echo "$OUT" | grep '^combined_us=' | cut -d= -f2)
+ A=$(echo "$OUT" | grep '^allocations=' | cut -d= -f2)
+ echo " run $i: combined=${C}µs (parse=${P} render=${R}) allocs=${A}"
+ if [ "$C" -lt "$BEST_COMBINED" ]; then
+ BEST_COMBINED=$C
+ BEST_PARSE=$P
+ BEST_RENDER=$R
+ BEST_ALLOC=$A
+ fi
+done
+
+echo ""
+echo "METRIC combined_us=$BEST_COMBINED"
+echo "METRIC parse_us=$BEST_PARSE"
+echo "METRIC render_us=$BEST_RENDER"
+echo "METRIC allocations=$BEST_ALLOC"
diff --git a/auto/bench.sh b/auto/bench.sh
new file mode 100755
index 000000000..77fc48092
--- /dev/null
+++ b/auto/bench.sh
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+# Auto-research benchmark script for Liquid
+# Runs: unit tests → liquid-spec → performance benchmark
+# Outputs machine-readable metrics on success
+# Exit code 0 = all good, non-zero = broken
+set -euo pipefail
+
+cd "$(dirname "$0")/.."
+
+# ── Step 1: Unit tests (fast gate) ──────────────────────────────────
+echo "=== Unit Tests ==="
+if ! bundle exec rake base_test 2>&1; then
+ echo "FATAL: unit tests failed"
+ exit 1
+fi
+
+# ── Step 2: liquid-spec (correctness gate) ──────────────────────────
+echo ""
+echo "=== Liquid Spec ==="
+SPEC_OUTPUT=$(bundle exec liquid-spec run spec/ruby_liquid.rb 2>&1 || true)
+echo "$SPEC_OUTPUT" | tail -3
+
+# Extract failure count from "Total: N passed, N failed, N errors" line
+# Allow known pre-existing failures (≤2)
+TOTAL_LINE=$(echo "$SPEC_OUTPUT" | grep "^Total:" || echo "Total: 0 passed, 0 failed, 0 errors")
+# The leading space before the capture matters: without it, sed's greedy `.*`
+# would swallow all but the last digit of a multi-digit count.
+FAILURES=$(echo "$TOTAL_LINE" | sed -n 's/.* \([0-9][0-9]*\) failed.*/\1/p')
+ERRORS=$(echo "$TOTAL_LINE" | sed -n 's/.* \([0-9][0-9]*\) error.*/\1/p')
+FAILURES=${FAILURES:-0}
+ERRORS=${ERRORS:-0}
+TOTAL_BAD=$((FAILURES + ERRORS))
+
+if [ "$TOTAL_BAD" -gt 2 ]; then
+ echo "FATAL: liquid-spec has $FAILURES failures and $ERRORS errors (threshold: 2)"
+ exit 1
+fi
+
+# ── Step 3: Performance benchmark ──────────────────────────────────
+echo ""
+echo "=== Performance Benchmark ==="
+bundle exec ruby performance/bench_quick.rb 2>&1
diff --git a/autoresearch.jsonl b/autoresearch.jsonl
new file mode 100644
index 000000000..3b69d91ba
--- /dev/null
+++ b/autoresearch.jsonl
@@ -0,0 +1,30 @@
+{"type":"config","name":"Liquid parse+render performance (tenderlove-inspired)","metricName":"combined_µs","metricUnit":"µs","bestDirection":"lower"}
+{"run":1,"commit":"c09e722","metric":3818,"metrics":{"parse_µs":2722,"render_µs":1096,"allocations":24881},"status":"keep","description":"Baseline: 3,818µs combined, 24,881 allocs","timestamp":1773348490227}
+{"run":2,"commit":"c09e722","metric":4063,"metrics":{"parse_µs":2901,"render_µs":1162,"allocations":24003},"status":"discard","description":"Tag name interning via skip+byte dispatch: saves 878 allocs but verification loop slower than scan","timestamp":1773348738557,"segment":0}
+{"run":3,"commit":"c09e722","metric":3881,"metrics":{"parse_µs":2720,"render_µs":1161,"allocations":24881},"status":"discard","description":"String dedup (-@) for filter names: no alloc savings, no speed benefit","timestamp":1773348781481,"segment":0}
+{"run":4,"commit":"c09e722","metric":3970,"metrics":{"parse_µs":2829,"render_µs":1141,"allocations":24881},"status":"discard","description":"Streaming tokenizer: needs own StringScanner (+1 alloc), per-shift overhead worse than saved array","timestamp":1773348883093,"segment":0}
+{"run":5,"commit":"c09e722","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: split-based tokenizer — regex can't handle unclosed tags inside raw blocks","timestamp":1773349089230,"segment":0}
+{"run":6,"commit":"c09e722","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: split regex tokenizer v2 — can't handle {{ followed by %} (variable-becomes-tag nesting)","timestamp":1773349248313,"segment":0}
+{"run":7,"commit":"c09e722","metric":3861,"metrics":{"parse_µs":2744,"render_µs":1117,"allocations":24881},"status":"discard","description":"Merge simple_lookup? dot position into initialize — logic overhead offsets saved index call","timestamp":1773349376707,"segment":0}
+{"run":8,"commit":"c09e722","metric":4048,"metrics":{"parse_µs":2929,"render_µs":1119,"allocations":24881},"status":"discard","description":"Use Cursor regex for filter name scanning — cursor.reset + method dispatch overhead worse than inline bytes","timestamp":1773349447172,"segment":0}
+{"run":9,"commit":"c09e722","metric":3872,"metrics":{"parse_µs":2744,"render_µs":1128,"allocations":24881},"status":"discard","description":"Direct strainer call in Variable#render — YJIT already inlines context.invoke_single well","timestamp":1773349497593,"segment":0}
+{"run":10,"commit":"c09e722","metric":3839,"metrics":{"parse_µs":2732,"render_µs":1107,"allocations":24879},"status":"discard","description":"Array#[] fast path for slice_collection with limit/offset — only 2 alloc savings, not meaningful","timestamp":1773349555348,"segment":0}
+{"run":11,"commit":"c09e722","metric":3889,"metrics":{"parse_µs":2770,"render_µs":1119,"allocations":24766},"status":"discard","description":"TruthyCondition for simple if checks: -115 allocs but YJIT polymorphism at evaluate call site hurts speed","timestamp":1773349649377,"segment":0}
+{"run":12,"commit":"c09e722","metric":4150,"metrics":{"parse_µs":2769,"render_µs":1381,"allocations":24881},"status":"discard","description":"Index loop for filters: YJIT optimizes each+destructure better than manual indexing","timestamp":1773349699285,"segment":0}
+{"run":13,"commit":"b7ae55f","metric":3556,"metrics":{"parse_µs":2388,"render_µs":1168,"allocations":24882},"status":"keep","description":"Replace StringScanner tokenizer with String#byteindex — 12% faster parse, no regex overhead for delimiter finding","timestamp":1773349875890,"segment":0}
+{"run":14,"commit":"e25f2f1","metric":3464,"metrics":{"parse_µs":2335,"render_µs":1129,"allocations":24882},"status":"keep","description":"Confirmation run: byteindex tokenizer consistently 3,400-3,600µs","timestamp":1773349889465,"segment":0}
+{"run":15,"commit":"b37fa98","metric":3490,"metrics":{"parse_µs":2331,"render_µs":1159,"allocations":24882},"status":"keep","description":"Clean up tokenizer: remove unused StringScanner setup and regex constants","timestamp":1773349928672,"segment":0}
+{"run":16,"commit":"b37fa98","metric":3638,"metrics":{"parse_µs":2460,"render_µs":1178,"allocations":24882},"status":"discard","description":"Single-char byteindex for %} search: Ruby loop overhead worse for nearby targets","timestamp":1773349985509,"segment":0}
+{"run":17,"commit":"b37fa98","metric":3553,"metrics":{"parse_µs":2431,"render_µs":1122,"allocations":25256},"status":"discard","description":"Regex simple_variable_markup: MatchData creates 374 extra allocs, offsetting speed gain","timestamp":1773350066627,"segment":0}
+{"run":18,"commit":"b37fa98","metric":3629,"metrics":{"parse_µs":2455,"render_µs":1174,"allocations":25002},"status":"discard","description":"String.new(capacity: 4096) for output buffer: allocates more objects, not fewer","timestamp":1773350101852,"segment":0}
+{"run":19,"commit":"f6baeae","metric":3350,"metrics":{"parse_µs":2212,"render_µs":1138,"allocations":24882},"status":"keep","description":"parse_tag_token without StringScanner: pure byte ops avoid reset(token) overhead, -12% combined","timestamp":1773350230252,"segment":0}
+{"run":20,"commit":"f6baead","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: regex ultra-fast path for Variable — name pattern too broad, matches invalid trailing dots","timestamp":1773350472859,"segment":0}
+{"run":21,"commit":"ae9a2e2","metric":3314,"metrics":{"parse_µs":2203,"render_µs":1111,"allocations":24882},"status":"keep","description":"Clean confirmation run: 3,314µs (-55% from main), stable","timestamp":1773350544354,"segment":0}
+{"run":22,"commit":"ae9a2e2","metric":3497,"metrics":{"parse_µs":2336,"render_µs":1161,"allocations":24882},"status":"discard","description":"Regex fast path for no-filter variables: include? + match? overhead exceeds byte scan savings","timestamp":1773350641375,"segment":0}
+{"run":23,"commit":"ca327b0","metric":3445,"metrics":{"parse_µs":2284,"render_µs":1161,"allocations":24647},"status":"keep","description":"Condition#evaluate: skip loop block for simple conditions (no child_relation) — saves 235 allocs","timestamp":1773350691752,"segment":0}
+{"run":24,"commit":"99454a9","metric":3489,"metrics":{"parse_µs":2353,"render_µs":1136,"allocations":24647},"status":"keep","description":"Replace simple_lookup? byte scan with match? regex — 8x faster per call, cleaner code","timestamp":1773350837721,"segment":0}
+{"run":25,"commit":"99454a9","metric":3797,"metrics":{"parse_µs":2636,"render_µs":1161,"allocations":29627},"status":"discard","description":"Regex name extraction in try_fast_parse: MatchData creates 5K extra allocs, much worse","timestamp":1773351048938,"segment":0}
+{"run":26,"commit":"db348e0","metric":3459,"metrics":{"parse_µs":2318,"render_µs":1141,"allocations":24647},"status":"keep","description":"Inline to_liquid_value in If render — avoids one method dispatch per condition evaluation","timestamp":1773351080001,"segment":0}
+{"run":27,"commit":"b195d09","metric":3496,"metrics":{"parse_µs":2356,"render_µs":1140,"allocations":24530},"status":"keep","description":"Replace @blocks.each with while loop in If render — avoids block proc allocation per render","timestamp":1773351101134,"segment":0}
+{"run":28,"commit":"b195d09","metric":3648,"metrics":{"parse_µs":2457,"render_µs":1191,"allocations":24530},"status":"discard","description":"While loop in For render: YJIT optimizes each well for hot loops with many iterations","timestamp":1773351142275,"segment":0}
+{"run":29,"commit":"b195d09","metric":3966,"metrics":{"parse_µs":2641,"render_µs":1325,"allocations":24060},"status":"discard","description":"While loop for environment search: -470 allocs but YJIT deopt makes render 16% slower","timestamp":1773351193863,"segment":0}
diff --git a/lib/liquid.rb b/lib/liquid.rb
index 4d0a71a64..cfdb88d50 100644
--- a/lib/liquid.rb
+++ b/lib/liquid.rb
@@ -52,6 +52,8 @@ module Liquid
require "liquid/version"
require "liquid/deprecations"
require "liquid/const"
+require 'liquid/byte_tables'
+require 'liquid/cursor'
require 'liquid/standardfilters'
require 'liquid/file_system'
require 'liquid/parser_switching'
diff --git a/lib/liquid/block.rb b/lib/liquid/block.rb
index 73d86c7bd..19a76cb36 100644
--- a/lib/liquid/block.rb
+++ b/lib/liquid/block.rb
@@ -60,8 +60,11 @@ def block_name
@tag_name
end
+ # Cache block delimiters per tag name to avoid repeated string allocation
+ BLOCK_DELIMITER_CACHE = Hash.new { |h, k| h[k] = "end#{k}".freeze }
+
def block_delimiter
- @block_delimiter ||= "end#{block_name}"
+ @block_delimiter ||= BLOCK_DELIMITER_CACHE[block_name]
end
private
diff --git a/lib/liquid/block_body.rb b/lib/liquid/block_body.rb
index e4ada7d16..5a618fea5 100644
--- a/lib/liquid/block_body.rb
+++ b/lib/liquid/block_body.rb
@@ -1,7 +1,5 @@
# frozen_string_literal: true
-require 'English'
-
module Liquid
class BlockBody
LiquidTagToken = /\A\s*(#{TagName})\s*(.*?)\z/o
@@ -38,7 +36,7 @@ def freeze
private def parse_for_liquid_tag(tokenizer, parse_context)
while (token = tokenizer.shift)
- unless token.empty? || token.match?(WhitespaceOrNothing)
+ unless token.empty? || BlockBody.blank_string?(token)
unless token =~ LiquidTagToken
# line isn't empty but didn't match tag syntax, yield and let the
# caller raise a syntax error
@@ -53,8 +51,7 @@ def freeze
end
unless (tag = parse_context.environment.tag_for_name(tag_name))
- # end parsing if we reach an unknown tag and let the caller decide
- # determine how to proceed
+ # end parsing if we reach an unknown tag; let the caller determine how to proceed
return yield tag_name, markup
end
new_tag = tag.parse(tag_name, markup, tokenizer, parse_context)
@@ -124,48 +121,38 @@ def self.rescue_render_node(context, output, line_number, exc, blank_tag)
end
end
+ def self.blank_string?(str)
+ str.match?(WhitespaceOrNothing)
+ end
+
private def parse_for_document(tokenizer, parse_context, &block)
while (token = tokenizer.shift)
next if token.empty?
- case
- when token.start_with?(TAGSTART)
- whitespace_handler(token, parse_context)
- unless token =~ FullToken
- return handle_invalid_tag_token(token, parse_context, &block)
- end
- tag_name = Regexp.last_match(2)
- markup = Regexp.last_match(4)
-
- if parse_context.line_number
- # newlines inside the tag should increase the line number,
- # particularly important for multiline {% liquid %} tags
- parse_context.line_number += Regexp.last_match(1).count("\n") + Regexp.last_match(3).count("\n")
- end
-
- if tag_name == 'liquid'
- parse_liquid_tag(markup, parse_context)
- next
- end
- unless (tag = parse_context.environment.tag_for_name(tag_name))
- # end parsing if we reach an unknown tag and let the caller decide
- # determine how to proceed
- return yield tag_name, markup
+ first_byte = token.getbyte(0)
+ if first_byte == Cursor::LCURLY
+ second_byte = token.getbyte(1)
+ if second_byte == Cursor::PCT
+ # handle_tag_token returns:
+ # nil — tag parsed normally, continue (update line number)
+ # :next — 'liquid' inline tag; skip line number update
+ # :unknown — end tag or unknown tag; yield to caller and return
+ # :invalid — malformed tag token; delegate to handle_invalid_tag_token
+ result = handle_tag_token(token, parse_context, tokenizer)
+ next unless result # nil: normal
+ next if result == :next # :next: 'liquid'
+ return yield(@_unknown_tag_name, parse_context.cursor.tag_markup) if result == :unknown
+ return handle_invalid_tag_token(token, parse_context, &block) # :invalid
+ elsif second_byte == Cursor::LCURLY
+ whitespace_handler(token, parse_context)
+ @nodelist << create_variable(token, parse_context)
+ @blank = false
+ else
+ # Fallback: text token starting with '{'
+ append_text_token(token, parse_context)
end
- new_tag = tag.parse(tag_name, markup, tokenizer, parse_context)
- @blank &&= new_tag.blank?
- @nodelist << new_tag
- when token.start_with?(VARSTART)
- whitespace_handler(token, parse_context)
- @nodelist << create_variable(token, parse_context)
- @blank = false
else
- if parse_context.trim_whitespace
- token.lstrip!
- end
- parse_context.trim_whitespace = false
- @nodelist << token
- @blank &&= token.match?(WhitespaceOrNothing)
+ append_text_token(token, parse_context)
end
parse_context.line_number = tokenizer.line_number
end
@@ -173,8 +160,54 @@ def self.rescue_render_node(context, output, line_number, exc, blank_tag)
yield nil, nil
end
- def whitespace_handler(token, parse_context)
- if token[2] == WhitespaceControl
+ # Handles a {%...%} tag token. Does not receive the outer block — callers handle
+ # yield/block passing themselves, keeping the Proc off the hot path.
+ # Returns:
+ # nil — tag parsed, caller continues the loop
+ # :next — 'liquid' inline tag; caller skips line number update
+ # :unknown — unknown/end tag; @_unknown_tag_name holds the tag name;
+ # markup is in parse_context.cursor.tag_markup
+ # :invalid — malformed token; caller delegates to handle_invalid_tag_token
+ private def handle_tag_token(token, parse_context, tokenizer)
+ whitespace_handler(token, parse_context)
+ cursor = parse_context.cursor
+ tag_name = cursor.parse_tag_token(token)
+ return :invalid unless tag_name
+
+ markup = cursor.tag_markup
+ if parse_context.line_number
+ newlines = cursor.tag_newlines
+ parse_context.line_number += newlines if newlines > 0
+ end
+
+ if tag_name == 'liquid'
+ parse_liquid_tag(markup, parse_context)
+ return :next
+ end
+
+ tag = parse_context.environment.tag_for_name(tag_name)
+ unless tag
+ # end parsing if we reach an unknown tag; let the caller determine how to proceed
+ @_unknown_tag_name = tag_name
+ return :unknown
+ end
+
+ new_tag = tag.parse(tag_name, markup, tokenizer, parse_context)
+ @blank &&= new_tag.blank?
+ @nodelist << new_tag
+ nil
+ end
+
+ def append_text_token(token, parse_context)
+ token.lstrip! if parse_context.trim_whitespace
+ parse_context.trim_whitespace = false
+ @nodelist << token
+ @blank &&= BlockBody.blank_string?(token)
+ end
+ private :append_text_token
+
+ private def whitespace_handler(token, parse_context)
+ if token.getbyte(2) == Cursor::DASH
previous_token = @nodelist.last
if previous_token.is_a?(String)
first_byte = previous_token.getbyte(0)
@@ -184,7 +217,7 @@ def whitespace_handler(token, parse_context)
end
end
end
- parse_context.trim_whitespace = (token[-3] == WhitespaceControl)
+ parse_context.trim_whitespace = (token.getbyte(token.bytesize - 3) == Cursor::DASH)
end
def blank?
@@ -216,24 +249,35 @@ def render(context)
end
def render_to_output_buffer(context, output)
- freeze unless frozen?
+ freeze
- context.resource_limits.increment_render_score(@nodelist.length)
+ resource_limits = context.resource_limits
+ resource_limits.increment_render_score(@nodelist.length)
+ # Hot render loop — split on check_write so the common case (no resource
+ # limits) pays zero branch cost per node.
idx = 0
- while (node = @nodelist[idx])
- if node.instance_of?(String)
- output << node
- else
- render_node(context, output, node)
- # If we get an Interrupt that means the block must stop processing. An
- # Interrupt is any command that stops block execution such as {% break %}
- # or {% continue %}. These tags may also occur through Block or Include tags.
- break if context.interrupt? # might have happened in a for-block
+ if resource_limits.render_length_limit || resource_limits.last_capture_length
+ while (node = @nodelist[idx])
+ if node.instance_of?(String)
+ output << node
+ else
+ render_node(context, output, node)
+ break if context.interrupt?
+ end
+ idx += 1
+ resource_limits.increment_write_score(output)
+ end
+ else
+ while (node = @nodelist[idx])
+ if node.instance_of?(String)
+ output << node
+ else
+ render_node(context, output, node)
+ break if context.interrupt?
+ end
+ idx += 1
end
- idx += 1
-
- context.resource_limits.increment_write_score(output)
end
output
@@ -241,19 +285,15 @@ def render_to_output_buffer(context, output)
private
+ # Indirection allows subclasses to intercept per-node rendering.
def render_node(context, output, node)
BlockBody.render_node(context, output, node)
end
def create_variable(token, parse_context)
- if token.end_with?("}}")
- i = 2
- i = 3 if token[i] == "-"
- parse_end = token.length - 3
- parse_end -= 1 if token[parse_end] == "-"
- markup_end = parse_end - i + 1
- markup = markup_end <= 0 ? "" : token.slice(i, markup_end)
-
+ len = token.bytesize
+ if len >= 4 && token.getbyte(len - 1) == Cursor::RCURLY && token.getbyte(len - 2) == Cursor::RCURLY
+ markup = parse_context.cursor.parse_variable_token(token)
return Variable.new(markup, parse_context)
end
diff --git a/lib/liquid/byte_tables.rb b/lib/liquid/byte_tables.rb
new file mode 100644
index 000000000..b39c3f8b5
--- /dev/null
+++ b/lib/liquid/byte_tables.rb
@@ -0,0 +1,40 @@
+# frozen_string_literal: true
+
+module Liquid
+ # Pre-computed 256-entry boolean lookup tables for byte classification.
+ # Built once at load time; used as TABLE[byte] — a single array index
+ # instead of 3-5 comparison operators per check.
+ #
+ # Performance: neutral to slightly faster vs. chained comparisons.
+ # Readability: replaces expressions like
+ # (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95
+ # with the intent-revealing
+ # ByteTables::IDENT_START[b]
+ module ByteTables
+ # [a-zA-Z_] — valid first byte of an identifier
+ IDENT_START = Array.new(256, false).tap do |t|
+ (97..122).each { |b| t[b] = true } # a-z
+ (65..90).each { |b| t[b] = true } # A-Z
+ t[95] = true # _
+ end.freeze
+
+ # [a-zA-Z0-9_-] — valid continuation byte of an identifier
+ IDENT_CONT = Array.new(256, false).tap do |t|
+ (97..122).each { |b| t[b] = true } # a-z
+ (65..90).each { |b| t[b] = true } # A-Z
+ (48..57).each { |b| t[b] = true } # 0-9
+ t[95] = true # _
+ t[45] = true # -
+ end.freeze
+
+ # [0-9] — ASCII digit
+ DIGIT = Array.new(256, false).tap do |t|
+ (48..57).each { |b| t[b] = true }
+ end.freeze
+
+ # [ \t\n\v\f\r] — ASCII whitespace (mirrors Ruby's \s)
+ WHITESPACE = Array.new(256, false).tap do |t|
+ [32, 9, 10, 11, 12, 13].each { |b| t[b] = true } # space, tab, \n, \v, \f, \r
+ end.freeze
+ end
+end
diff --git a/lib/liquid/condition.rb b/lib/liquid/condition.rb
index 9d55c42b3..bf8b94093 100644
--- a/lib/liquid/condition.rb
+++ b/lib/liquid/condition.rb
@@ -65,20 +65,21 @@ def initialize(left = nil, operator = nil, right = nil)
end
def evaluate(context = deprecated_default_context)
- condition = self
- result = nil
- loop do
- result = interpret_condition(condition.left, condition.right, condition.operator, context)
+ result = interpret_condition(@left, @right, @operator, context)
+
+ # Fast path: no child conditions (most common)
+ return result unless @child_relation
+ condition = self
+ while condition.child_relation
case condition.child_relation
when :or
break if Liquid::Utils.to_liquid_value(result)
when :and
break unless Liquid::Utils.to_liquid_value(result)
- else
- break
end
condition = condition.child_condition
+ result = interpret_condition(condition.left, condition.right, condition.operator, context)
end
result
end
diff --git a/lib/liquid/context.rb b/lib/liquid/context.rb
index 433b6d003..766719099 100644
--- a/lib/liquid/context.rb
+++ b/lib/liquid/context.rb
@@ -24,10 +24,15 @@ def self.build(environment: Environment.default, environments: {}, outer_scope:
def initialize(environments = {}, outer_scope = {}, registers = {}, rethrow_errors = false, resource_limits = nil, static_environments = {}, environment = Environment.default)
@environment = environment
- @environments = [environments]
- @environments.flatten!
+ @environments = environments.is_a?(Array) ? environments : [environments]
- @static_environments = [static_environments].flatten(1).freeze
+ @static_environments = if static_environments.is_a?(Array)
+ static_environments.frozen? ? static_environments : static_environments.freeze
+ elsif static_environments.empty?
+ Const::EMPTY_ARRAY
+ else
+ [static_environments].freeze
+ end
@scopes = [outer_scope || {}]
@registers = registers.is_a?(Registers) ? registers : Registers.new(registers)
@errors = []
@@ -35,14 +40,13 @@ def initialize(environments = {}, outer_scope = {}, registers = {}, rethrow_erro
@strict_variables = false
@resource_limits = resource_limits || ResourceLimits.new(environment.default_resource_limits)
@base_scope_depth = 0
- @interrupts = []
- @filters = []
+ @interrupts = Const::EMPTY_ARRAY
+ @filters = Const::EMPTY_ARRAY
@global_filter = nil
- @disabled_tags = {}
+ @disabled_tags = Const::EMPTY_HASH
- # Instead of constructing new StringScanner objects for each Expression parse,
- # we recycle the same one.
- @string_scanner = StringScanner.new("")
+ # Lazy-init StringScanner — only needed if Context#[] is called during render
+ @string_scanner = nil
@registers.static[:cached_partials] ||= {}
@registers.static[:file_system] ||= environment.file_system
@@ -73,7 +77,7 @@ def strainer
# Note that this does not register the filters with the main Template object. see Template.register_filter
# for that
def add_filters(filters)
- filters = [filters].flatten.compact
+ filters = Array(filters).flatten.compact
@filters += filters
@strainer = nil
end
@@ -84,11 +88,12 @@ def apply_global_filter(obj)
# are there any not handled interrupts?
def interrupt?
- !@interrupts.empty?
+ !@interrupts.equal?(Const::EMPTY_ARRAY) && @interrupts.any?
end
# push an interrupt to the stack. this interrupt is considered not handled.
def push_interrupt(e)
+ @interrupts = [] if @interrupts.frozen?
@interrupts.push(e)
end
@@ -109,6 +114,20 @@ def invoke(method, *args)
strainer.invoke(method, *args).to_liquid
end
+    # Arity-specialized filter delegation, generated to mirror StrainerTemplate's
+    # specializations. Each arity gets explicit parameters instead of a *args
+    # splat; generating the methods keeps the shared pattern in one place.
+ {
+ invoke_single: ['input'],
+ invoke_two: ['input', 'arg1'],
+ }.each do |method_name, params|
+ all_params = (["method"] + params).join(", ")
+ module_eval(<<~RUBY, __FILE__, __LINE__ + 1)
+ def #{method_name}(#{all_params})
+ strainer.#{method_name}(#{all_params}).to_liquid
+ end
+ RUBY
+ end
+
# Push new local scope on the stack. use Context#stack instead
def push(new_scope = {})
@scopes.unshift(new_scope)
@@ -180,11 +199,11 @@ def []=(key, value)
# Example:
# products == empty #=> products.empty?
def [](expression)
- evaluate(Expression.parse(expression, @string_scanner))
+ evaluate(Expression.parse(expression, @string_scanner ||= StringScanner.new("")))
end
def key?(key)
- find_variable(key, raise_on_not_found: false) != nil
+ !find_variable(key, raise_on_not_found: false).nil?
end
def evaluate(object)
@@ -193,22 +212,38 @@ def evaluate(object)
# Fetches an object starting at the local scope and then moving up the hierarchy
def find_variable(key, raise_on_not_found: true)
- # This was changed from find() to find_index() because this is a very hot
- # path and find_index() is optimized in MRI to reduce object allocation
- index = @scopes.find_index { |s| s.key?(key) }
-
- variable = if index
- lookup_and_evaluate(@scopes[index], key, raise_on_not_found: raise_on_not_found)
+ # Fast path: check top scope first (most common in for loops)
+ scope = @scopes[0]
+ if scope.key?(key)
+ variable = lookup_and_evaluate(scope, key, raise_on_not_found: raise_on_not_found)
+ elsif @scopes.length == 1
+ # Only one scope and key not found — go straight to environments
+ variable = try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found)
else
- try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found)
+ # Multiple scopes — search through all of them
+ scope = @scopes.find { |s| s.key?(key) }
+
+ variable = if scope
+ lookup_and_evaluate(scope, key, raise_on_not_found: raise_on_not_found)
+ else
+ try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found)
+ end
end
# update variable's context before invoking #to_liquid
+ # Fast path: primitive types don't need context= or to_liquid conversion
+ case variable
+ when String, Integer, Float, NilClass, TrueClass, FalseClass, Array, Hash, Time
+ return variable
+ end
+
variable.context = self if variable.respond_to?(:context=)
liquid_variable = variable.to_liquid
- liquid_variable.context = self if variable != liquid_variable && liquid_variable.respond_to?(:context=)
+ if variable != liquid_variable
+ liquid_variable.context = self if liquid_variable.respond_to?(:context=)
+ end
liquid_variable
end
@@ -228,6 +263,7 @@ def lookup_and_evaluate(obj, key, raise_on_not_found: true)
end
def with_disabled_tags(tag_names)
+ @disabled_tags = {} if @disabled_tags.frozen?
tag_names.each do |name|
@disabled_tags[name] = @disabled_tags.fetch(name, 0) + 1
end
@@ -251,17 +287,16 @@ def tag_disabled?(tag_name)
attr_reader :base_scope_depth
def try_variable_find_in_environments(key, raise_on_not_found:)
- @environments.each do |environment|
- found_variable = lookup_and_evaluate(environment, key, raise_on_not_found: raise_on_not_found)
- if !found_variable.nil? || @strict_variables && raise_on_not_found
- return found_variable
- end
- end
- @static_environments.each do |environment|
+      # Guard on empty: with strict_variables an empty @environments must still
+      # fall through to the static environments instead of returning nil here.
+      unless @environments.empty?
+        found = find_in_envs(@environments, key, raise_on_not_found: raise_on_not_found)
+        return found unless found.nil? && !(@strict_variables && raise_on_not_found)
+      end
+
+      find_in_envs(@static_environments, key, raise_on_not_found: raise_on_not_found)
+ end
+
+ def find_in_envs(envs, key, raise_on_not_found:)
+ envs.each do |environment|
found_variable = lookup_and_evaluate(environment, key, raise_on_not_found: raise_on_not_found)
- if !found_variable.nil? || @strict_variables && raise_on_not_found
- return found_variable
- end
+ return found_variable if !found_variable.nil? || (@strict_variables && raise_on_not_found)
end
nil
end
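The `Const::EMPTY_ARRAY` trick used for `@interrupts` and `@disabled_tags` is a copy-on-write pattern: every idle context shares one frozen empty collection and only allocates a private one on first write. A standalone sketch (class and constant names are illustrative):

```ruby
EMPTY = [].freeze # one shared, frozen empty array for every instance

class CowStack
  def initialize
    @items = EMPTY # no per-instance allocation until something is pushed
  end

  def push(item)
    @items = [] if @items.frozen? # first write swaps in a private array
    @items.push(item)
  end

  def any?
    !@items.empty?
  end
end
```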
diff --git a/lib/liquid/cursor.rb b/lib/liquid/cursor.rb
new file mode 100644
index 000000000..c957e381b
--- /dev/null
+++ b/lib/liquid/cursor.rb
@@ -0,0 +1,317 @@
+# frozen_string_literal: true
+
+require "strscan"
+
+module Liquid
+ # Single-pass forward-only scanner for Liquid parsing.
+ # Wraps StringScanner with higher-level methods for common Liquid constructs.
+ # One Cursor per template parse — threaded through all parsing code.
+ class Cursor
+ # Byte constants
+ SPACE = 32
+ TAB = 9
+ NL = 10
+ CR = 13
+ FF = 12
+ DASH = 45 # '-'
+ DOT = 46 # '.'
+ COLON = 58 # ':'
+ PIPE = 124 # '|'
+ QUOTE_S = 39 # "'"
+ QUOTE_D = 34 # '"'
+ LBRACK = 91 # '['
+ RBRACK = 93 # ']'
+ LPAREN = 40 # '('
+ RPAREN = 41 # ')'
+ QMARK = 63 # '?'
+ HASH = 35 # '#'
+ USCORE = 95 # '_'
+ COMMA = 44
+ ZERO = 48
+ NINE = 57
+ PCT = 37 # '%'
+ LCURLY = 123 # '{'
+ RCURLY = 125 # '}'
+
+ attr_reader :ss
+
+ def initialize(source)
+ @source = source
+ @ss = StringScanner.new(source)
+ end
+
+ # ── Position ────────────────────────────────────────────────────
+ def pos = @ss.pos
+
+ def pos=(n)
+ @ss.pos = n
+ end
+
+ def eos? = @ss.eos?
+ def peek_byte = @ss.peek_byte
+ def scan_byte = @ss.scan_byte
+
+ # Reset scanner to a new string (for reuse on sub-markup)
+ def reset(source)
+ @source = source
+ @ss.string = source
+ end
+
+ # Extract a slice from the source (deferred allocation)
+ def slice(start, len)
+ @source.byteslice(start, len)
+ end
+
+ # ── Whitespace ──────────────────────────────────────────────────
+ # Skip spaces/tabs/newlines/cr
+ def skip_ws
+ while (b = @ss.peek_byte)
+ case b
+ when SPACE, TAB, CR, FF, NL then @ss.scan_byte
+ else break
+ end
+ end
+ end
+
+ # Check if remaining bytes are all whitespace (or EOS).
+ # exist?(/\S/) returns nil when no non-whitespace remains, without advancing position.
+ def rest_blank?
+ !@ss.exist?(/\S/)
+ end
+
+ # Regex for identifier: [a-zA-Z_][\w-]*\??
+ ID_REGEX = /[a-zA-Z_][\w-]*\??/
+
+ # ── Identifiers ─────────────────────────────────────────────────
+ # Skip an identifier without allocating a string. Returns length skipped, or 0.
+ def skip_id
+ @ss.skip(ID_REGEX) || 0
+ end
+
+ # Check if next id matches expected string, consume if so. No allocation.
+ def expect_id(expected)
+ start = @ss.pos
+ len = @ss.skip(ID_REGEX)
+ if len == expected.bytesize
+ # Compare bytes directly without allocating a string
+ i = 0
+ while i < len
+ unless @source.getbyte(start + i) == expected.getbyte(i)
+ @ss.pos = start
+ return false
+ end
+ i += 1
+ end
+ return true
+ end
+ @ss.pos = start if len
+ false
+ end
+
+ # Scan a single identifier: [a-zA-Z_][\w-]*\??
+ # Returns the string or nil if not at an identifier
+ def scan_id
+ @ss.scan(ID_REGEX)
+ end
+
+ # Scan a tag name: '#' or \w+
+ def scan_tag_name
+ if @ss.peek_byte == HASH
+ @ss.scan_byte
+ "#"
+ else
+ scan_id
+ end
+ end
+
+ # Regex for numbers: -?\d+(\.\d+)?
+ FLOAT_REGEX = /-?\d+\.\d+/
+ INT_REGEX = /-?\d+/
+
+ # ── Numbers ─────────────────────────────────────────────────────
+ # Try to scan an integer or float. Returns the number or nil.
+ def scan_number
+ if (s = @ss.scan(FLOAT_REGEX))
+ s.to_f
+ elsif (s = @ss.scan(INT_REGEX))
+ s.to_i
+ end
+ end
+
+ # Regex for quoted string content (without quotes)
+ SINGLE_QUOTED_CONTENT = /'([^']*)'/
+ DOUBLE_QUOTED_CONTENT = /"([^"]*)"/
+
+ # ── Strings ─────────────────────────────────────────────────────
+ # Scan a quoted string ('...' or "..."). Returns the content without quotes, or nil.
+ def scan_quoted_string
+ if @ss.scan(SINGLE_QUOTED_CONTENT) || @ss.scan(DOUBLE_QUOTED_CONTENT)
+ @ss[1]
+ end
+ end
+
+ # Regex for quoted strings (single or double quoted, including quotes)
+ QUOTED_STRING_RAW = /"[^"]*"|'[^']*'/
+
+ # Scan a quoted string including quotes. Returns the full "..." or '...' string, or nil.
+ def scan_quoted_string_raw
+ @ss.scan(QUOTED_STRING_RAW)
+ end
+
+ # Regex for dotted identifier: name(.name)*
+ DOTTED_ID_REGEX = /[a-zA-Z_][\w-]*\??(?:\.[a-zA-Z_][\w-]*\??)*/
+
+ # ── Expressions ─────────────────────────────────────────────────
+ # Scan a simple variable lookup: name(.name)* — no brackets, no filters
+ # Returns the string or nil
+ def scan_dotted_id
+ @ss.scan(DOTTED_ID_REGEX)
+ end
+
+ # Skip a fragment without allocating. Returns length skipped, or 0.
+ def skip_fragment
+ @ss.skip(QUOTED_STRING_RAW) || @ss.skip(UNQUOTED_FRAGMENT) || 0
+ end
+
+ # Regex for unquoted fragment: non-whitespace/comma/pipe sequence
+ UNQUOTED_FRAGMENT = /[^\s,|]+/
+
+ # Scan a "QuotedFragment" — a quoted string or non-whitespace/comma/pipe run
+ def scan_fragment
+ @ss.scan(QUOTED_STRING_RAW) || @ss.scan(UNQUOTED_FRAGMENT)
+ end
+
+ # ── Comparison operators ────────────────────────────────────────
+ # Identity map used for frozen string interning: StringScanner#scan returns a
+ # new unfrozen String on every call. Indexing into this hash returns the frozen
+ # literal stored here, avoiding a separate allocation and enabling faster
+ # equality checks downstream (frozen strings can be compared by identity).
+ COMPARISON_OPS = {
+ '==' => '==',
+ '!=' => '!=',
+ '<>' => '<>',
+ '<=' => '<=',
+ '>=' => '>=',
+ '<' => '<',
+ '>' => '>',
+ 'contains' => 'contains',
+ }.freeze
+
+ # Scan a comparison operator. Returns frozen string or nil.
+ # Regex for comparison operators
+ COMPARISON_OP_REGEX = /==|!=|<>|<=|>=|<|>|contains(?!\w)/
+
+ def scan_comparison_op
+ if (op = @ss.scan(COMPARISON_OP_REGEX))
+ COMPARISON_OPS[op]
+ end
+ end
+
+ # ── Tag parsing helpers ─────────────────────────────────────────
+ # Results from last parse_tag_token call (avoids array allocation)
+ attr_reader :tag_markup, :tag_newlines
+
+ # Parse the interior of a tag token: "{%[-] tag_name markup [-]%}"
+ # Pure byte operations — avoids StringScanner reset overhead.
+ # Returns tag_name string or nil. Sets tag_markup and tag_newlines.
+ def parse_tag_token(token)
+ len = token.bytesize
+ pos = 2 # skip "{%"
+ pos += 1 if token.getbyte(pos) == DASH # skip '-'
+ nl = 0
+
+ # Skip whitespace, count newlines
+ while pos < len
+ b = token.getbyte(pos)
+ case b
+ when SPACE, TAB, CR, FF then pos += 1
+ when NL then pos += 1
+ nl += 1
+ else break
+ end
+ end
+
+ # Scan tag name: '#' or [a-zA-Z_][\w-]*
+ name_start = pos
+ b = token.getbyte(pos)
+ if b == HASH
+ pos += 1
+ elsif b && ByteTables::IDENT_START[b]
+ pos += 1
+ while pos < len
+ b = token.getbyte(pos)
+ break unless ByteTables::IDENT_CONT[b]
+
+ pos += 1
+ end
+ pos += 1 if pos < len && token.getbyte(pos) == QMARK
+ else
+ return
+ end
+ tag_name = token.byteslice(name_start, pos - name_start)
+
+ # Skip whitespace after tag name, count newlines
+ while pos < len
+ b = token.getbyte(pos)
+ case b
+ when SPACE, TAB, CR, FF then pos += 1
+ when NL then pos += 1
+ nl += 1
+ else break
+ end
+ end
+
+ # markup is everything up to optional '-' before '%}'
+ markup_end = len - 2
+ markup_end -= 1 if markup_end > pos && token.getbyte(markup_end - 1) == DASH
+ @tag_markup = pos >= markup_end ? "" : token.byteslice(pos, markup_end - pos)
+ @tag_newlines = nl
+
+ tag_name
+ end
+
+ # Parse variable token interior: extract markup from "{{[-] ... [-]}}"
+ def parse_variable_token(token)
+ len = token.bytesize
+ return if len < 4
+
+ i = 2
+ i = 3 if token.getbyte(i) == DASH
+ parse_end = len - 3
+ parse_end -= 1 if token.getbyte(parse_end) == DASH
+ markup_len = parse_end - i + 1
+ markup_len <= 0 ? "" : token.byteslice(i, markup_len)
+ end
+
+ # ── Simple condition parser ─────────────────────────────────────
+ # Results from last parse_simple_condition call
+ attr_reader :cond_left, :cond_op, :cond_right
+
+ # Parse "expr [op expr]" from current position to end.
+ # Returns true on success, nil on failure. Sets cond_left, cond_op, cond_right.
+ def parse_simple_condition
+ skip_ws
+ @cond_left = scan_fragment
+ return unless @cond_left
+
+ skip_ws
+ if eos?
+ @cond_op = nil
+ @cond_right = nil
+ return true
+ end
+
+ @cond_op = scan_comparison_op
+ return unless @cond_op
+
+ skip_ws
+ @cond_right = scan_fragment
+ return unless @cond_right
+
+ skip_ws
+ return unless eos? # trailing junk
+
+ true
+ end
+ end
+end
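The skip-then-compare trick behind `Cursor#expect_id` can be shown on its own: `StringScanner#skip` returns a match length without allocating a string, so the expected keyword is verified byte by byte in place. A sketch (the free-standing `expect_id` mirrors the method above and is not the library API):

```ruby
require "strscan"

ID = /[a-zA-Z_][\w-]*/

# Verify that the next identifier equals `expected` without allocating it:
# skip returns only a length, and the bytes are compared in place.
def expect_id(ss, source, expected)
  start = ss.pos
  len = ss.skip(ID)
  if len == expected.bytesize &&
     len.times.all? { |i| source.getbyte(start + i) == expected.getbyte(i) }
    true
  else
    ss.pos = start # rewind on mismatch so callers can try another parse
    false
  end
end
```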
diff --git a/lib/liquid/expression.rb b/lib/liquid/expression.rb
index 00c40a4c3..466c021e6 100644
--- a/lib/liquid/expression.rb
+++ b/lib/liquid/expression.rb
@@ -16,16 +16,9 @@ class Expression
'-' => VariableLookup.parse("-", nil).freeze,
}.freeze
- DOT = ".".ord
- ZERO = "0".ord
- NINE = "9".ord
- DASH = "-".ord
-
# Use an atomic group (?>...) to avoid pathological backtracking from
# malicious input as described in https://github.com/Shopify/liquid/issues/1357
RANGES_REGEX = /\A\(\s*(?>(\S+)\s*\.\.)\s*(\S+)\s*\)\z/
- INTEGER_REGEX = /\A(-?\d+)\z/
- FLOAT_REGEX = /\A(-?\d+)\.\d+\z/
class << self
def safe_parse(parser, ss = StringScanner.new(""), cache = nil)
@@ -35,11 +28,17 @@ def safe_parse(parser, ss = StringScanner.new(""), cache = nil)
def parse(markup, ss = StringScanner.new(""), cache = nil)
return unless markup
- markup = markup.strip # markup can be a frozen string
+ # Only strip if there's leading/trailing whitespace (avoids allocation)
+ first_byte = markup.getbyte(0)
+ if first_byte && ByteTables::WHITESPACE[first_byte]
+ markup = markup.strip
+ elsif first_byte
+ markup = markup.strip if ByteTables::WHITESPACE[markup.getbyte(markup.bytesize - 1)]
+ end
if (markup.start_with?('"') && markup.end_with?('"')) ||
(markup.start_with?("'") && markup.end_with?("'"))
- return markup[1..-2]
+ return markup.byteslice(1, markup.bytesize - 2)
elsif LITERALS.key?(markup)
return LITERALS[markup]
end
@@ -71,57 +70,85 @@ def inner_parse(markup, ss, cache)
end
end
- def parse_number(markup, ss)
- # check if the markup is simple integer or float
- case markup
- when INTEGER_REGEX
- return Integer(markup, 10)
- when FLOAT_REGEX
- return markup.to_f
- end
-
- ss.string = markup
- # the first byte must be a digit or a dash
- byte = ss.scan_byte
+ def parse_number(markup, _ss = nil)
+ len = markup.bytesize
+ return if len == 0
- return false if byte != DASH && (byte < ZERO || byte > NINE)
+ # Quick reject: first byte must be digit or dash
+ pos = 0
+ first = markup.getbyte(pos)
+ if first == Cursor::DASH
+ pos += 1
+ return if pos >= len
- if byte == DASH
- peek_byte = ss.peek_byte
+ b = markup.getbyte(pos)
+ return unless ByteTables::DIGIT[b]
- # if it starts with a dash, the next byte must be a digit
- return false if peek_byte.nil? || !(peek_byte >= ZERO && peek_byte <= NINE)
+ pos += 1
+ elsif ByteTables::DIGIT[first]
+ pos += 1
+ else
+ return
end
- # The markup could be a float with multiple dots
- first_dot_pos = nil
- num_end_pos = nil
+ # Scan digits
+ while pos < len
+ b = markup.getbyte(pos)
+ break unless ByteTables::DIGIT[b]
- while (byte = ss.scan_byte)
- return false if byte != DOT && (byte < ZERO || byte > NINE)
+ pos += 1
+ end
- # we found our number and now we are just scanning the rest of the string
- next if num_end_pos
+ # If we consumed everything, it's a simple integer
+ if pos == len
+ return Integer(markup, 10)
+ end
+
+ # Check for dot (float)
+ if markup.getbyte(pos) == Cursor::DOT
+ dot_pos = pos
+ pos += 1
+ # Must have at least one digit after dot
+ digit_after_dot = pos
+ while pos < len
+ b = markup.getbyte(pos)
+ break unless ByteTables::DIGIT[b]
+
+ pos += 1
+ end
- if byte == DOT
- if first_dot_pos.nil?
- first_dot_pos = ss.pos
- else
- # we found another dot, so we know that the number ends here
- num_end_pos = ss.pos - 1
- end
+ if pos > digit_after_dot && pos == len
+ # Simple float like "123.456"
+ return markup.to_f
+ elsif pos > digit_after_dot
+ # Float followed by more content: "1.2.3.4" — scan to find where the
+ # numeric portion ends (stop at next dot or non-digit).
+ return scan_float_with_trailing(markup, pos, len)
+        else
+          # No digit directly after the dot ("123.", "123..4", "123.x"): run the
+          # same trailing scan, which validates the remainder so "123.x" is
+          # rejected instead of being read as 123.0.
+          return scan_float_with_trailing(markup, pos, len)
end
end
- num_end_pos = markup.length if ss.eos?
+ # Not a number (has non-digit, non-dot characters)
+ nil
+ end
- if num_end_pos
- # number ends with a number "123.123"
- markup.byteslice(0, num_end_pos).to_f
- else
- # number ends with a dot "123."
- markup.byteslice(0, first_dot_pos).to_f
+ private
+
+      # Validates the remainder of `markup` starting at `pos` (the first byte
+      # after the float's digits). A second dot marks where the float literal
+      # ends, and every byte after it must still be a digit or a dot, matching
+      # the old scan_byte loop: "1.2.3.4" => 1.2, but "1.2.3x" is not a number.
+      def scan_float_with_trailing(markup, pos, len)
+        float_end = nil
+        while pos < len
+          b = markup.getbyte(pos)
+          if b == Cursor::DOT
+            float_end ||= pos
+          elsif !ByteTables::DIGIT[b]
+            return # non-digit, non-dot byte: not a number
+          end
+          pos += 1
+        end
+        markup.byteslice(0, float_end || pos).to_f
+      end
end
end
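The byte-loop `parse_number` is meant to preserve the old regex-plus-scan semantics. A regex-only reference sketch of those semantics (`ref_parse_number` is illustrative, not part of the patch) is a handy cross-check:

```ruby
# Integers parse as Integer, simple floats as Float, a float followed by a
# second dot truncates at that dot ("1.2.3.4" parses as 1.2, "123." as 123.0),
# and anything containing other bytes is not a number (returns nil).
def ref_parse_number(s)
  case s
  when /\A-?\d+\z/              then Integer(s, 10)
  when /\A-?\d+\.\d+\z/         then s.to_f
  when /\A(-?\d+\.\d*)[.\d]*\z/ then $1.to_f
  end
end
```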
diff --git a/lib/liquid/lexer.rb b/lib/liquid/lexer.rb
index f1740dbad..dfcdb5587 100644
--- a/lib/liquid/lexer.rb
+++ b/lib/liquid/lexer.rb
@@ -29,6 +29,7 @@ class Lexer
RUBY_WHITESPACE = [" ", "\t", "\r", "\n", "\f"].freeze
SINGLE_STRING_LITERAL = /'[^\']*'/
WHITESPACE_OR_NOTHING = /\s*/
+ WHITESPACE = /\s+/
SINGLE_COMPARISON_TOKENS = [].tap do |table|
table["<".ord] = COMPARISON_LESS_THAN
@@ -104,7 +105,7 @@ def tokenize(ss)
output = []
until ss.eos?
- ss.skip(WHITESPACE_OR_NOTHING)
+ ss.skip(WHITESPACE)
break if ss.eos?
@@ -114,10 +115,10 @@ def tokenize(ss)
if (special = SPECIAL_TABLE[peeked])
ss.scan_byte
# Special case for ".."
- if special == DOT && ss.peek_byte == DOT_ORD
+ if special.equal?(DOT) && ss.peek_byte == DOT_ORD
ss.scan_byte
output << DOTDOT
- elsif special == DASH
+ elsif special.equal?(DASH)
# Special case for negative numbers
if (peeked_byte = ss.peek_byte) && NUMBER_TABLE[peeked_byte]
ss.pos -= 1
diff --git a/lib/liquid/parse_context.rb b/lib/liquid/parse_context.rb
index 855acc64e..d736319ec 100644
--- a/lib/liquid/parse_context.rb
+++ b/lib/liquid/parse_context.rb
@@ -3,7 +3,7 @@
module Liquid
class ParseContext
attr_accessor :locale, :line_number, :trim_whitespace, :depth
- attr_reader :partial, :warnings, :error_mode, :environment
+ attr_reader :partial, :warnings, :error_mode, :environment, :expression_cache, :string_scanner, :cursor
def initialize(options = Const::EMPTY_HASH)
@environment = options.fetch(:environment, Environment.default)
@@ -24,6 +24,8 @@ def initialize(options = Const::EMPTY_HASH)
{}
end
+ @cursor = Cursor.new("")
+
self.depth = 0
self.partial = false
end
diff --git a/lib/liquid/parser.rb b/lib/liquid/parser.rb
index 645dfa3a1..0d0d0d019 100644
--- a/lib/liquid/parser.rb
+++ b/lib/liquid/parser.rb
@@ -83,6 +83,9 @@ def argument
end
def variable_lookups
+ # Fast path: no lookups at all (most common case for simple identifiers)
+ return "" unless look(:dot) || look(:open_square)
+
str = +""
loop do
if look(:open_square)
diff --git a/lib/liquid/registers.rb b/lib/liquid/registers.rb
index 0b65d862c..88562c88c 100644
--- a/lib/liquid/registers.rb
+++ b/lib/liquid/registers.rb
@@ -6,15 +6,15 @@ class Registers
def initialize(registers = {})
@static = registers.is_a?(Registers) ? registers.static : registers
- @changes = {}
+ @changes = nil
end
def []=(key, value)
- @changes[key] = value
+ (@changes ||= {})[key] = value
end
def [](key)
- if @changes.key?(key)
+ if @changes&.key?(key)
@changes[key]
else
@static[key]
@@ -22,13 +22,13 @@ def [](key)
end
def delete(key)
- @changes.delete(key)
+ @changes&.delete(key)
end
UNDEFINED = Object.new
def fetch(key, default = UNDEFINED, &block)
- if @changes.key?(key)
+ if @changes&.key?(key)
@changes.fetch(key)
elsif default != UNDEFINED
if block_given?
@@ -42,7 +42,7 @@ def fetch(key, default = UNDEFINED, &block)
end
def key?(key)
- @changes.key?(key) || @static.key?(key)
+ @changes&.key?(key) || @static.key?(key)
end
end
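The lazy `@changes` hash is the same allocate-on-first-write idea applied to `Registers`: reads fall through to the static layer until a write creates the overlay. A standalone sketch (`Overlay` is an illustrative name):

```ruby
# Overlay with a lazily allocated write layer: @changes stays nil (and costs
# nothing) until the first []= call, and reads use safe navigation.
class Overlay
  def initialize(static)
    @static = static
    @changes = nil
  end

  def []=(key, value)
    (@changes ||= {})[key] = value
  end

  def [](key)
    @changes&.key?(key) ? @changes[key] : @static[key]
  end
end
```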
diff --git a/lib/liquid/resource_limits.rb b/lib/liquid/resource_limits.rb
index ee0c66cbb..3b4fc93a4 100644
--- a/lib/liquid/resource_limits.rb
+++ b/lib/liquid/resource_limits.rb
@@ -9,6 +9,7 @@ class ResourceLimits
:cumulative_assign_score_limit
attr_reader :render_score,
:assign_score,
+ :last_capture_length,
:cumulative_render_score,
:cumulative_assign_score
diff --git a/lib/liquid/standardfilters.rb b/lib/liquid/standardfilters.rb
index ed6141566..d22c6b024 100644
--- a/lib/liquid/standardfilters.rb
+++ b/lib/liquid/standardfilters.rb
@@ -275,18 +275,71 @@ def truncatewords(input, words = 15, truncate_string = "...")
words = Utils.to_integer(words)
words = 1 if words <= 0
- wordlist = begin
- input.split(" ", words + 1)
- rescue RangeError
- # integer too big for String#split, but we can semantically assume no truncation is needed
- return input if words + 1 > MAX_I32
- raise # unexpected error
+ return input if words + 1 > MAX_I32
+
+ # Scan words tracking byte positions; build the normalized (single-space)
+ # result string only when truncation is actually needed.
+ len = input.bytesize
+ pos = 0
+ word_count = 0
+ # Flat array of [start, end, start, end, ...] for up to `words` words.
+ # Avoids allocating a result string in the common no-truncation case.
+ positions = []
+
+ # Skip leading whitespace
+ while pos < len
+ break unless ByteTables::WHITESPACE[input.getbyte(pos)]
+ pos += 1
+ end
+
+ while pos < len
+ word_start = pos
+ word_count += 1
+
+ # Scan to end of word
+ while pos < len
+ break if ByteTables::WHITESPACE[input.getbyte(pos)]
+ pos += 1
+ end
+
+ if word_count <= words
+ positions.push(word_start, pos) # [start, end, start, end, ...]
+ else
+ # Truncation confirmed — build normalized result from stored positions
+ result = +input.byteslice(positions[0], positions[1] - positions[0])
+ i = 2
+ while i < positions.length
+ result << " " << input.byteslice(positions[i], positions[i + 1] - positions[i])
+ i += 2
+ end
+ return result << Utils.to_s(truncate_string)
+ end
+
+ # Skip whitespace between words
+ while pos < len
+ break unless ByteTables::WHITESPACE[input.getbyte(pos)]
+ pos += 1
+ end
+ end
+
+ # Fewer words than requested — no truncation needed, return original unchanged.
+ return input if word_count < words
+
+ # Exactly `words` words. Ruby's split(" ", words+1) would produce a words+1-th
+ # empty element when input has trailing whitespace, triggering the truncation path.
+ # Match that behaviour: if the input ends with whitespace, normalize and append
+ # truncate_string even though no word was cut.
+ if len > 0 && ByteTables::WHITESPACE[input.getbyte(len - 1)]
+ result = +input.byteslice(positions[0], positions[1] - positions[0])
+ i = 2
+ while i < positions.length
+ result << " " << input.byteslice(positions[i], positions[i + 1] - positions[i])
+ i += 2
+ end
+ return result << Utils.to_s(truncate_string)
end
- return input if wordlist.length <= words
- wordlist.pop
- truncate_string = Utils.to_s(truncate_string)
- wordlist.join(" ").concat(truncate_string)
+ input
end
# @liquid_public_docs
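The byte scanner above replaces a `split`-based implementation whose semantics it must preserve. A compact split-based reference (illustrative, omitting the `MAX_I32` guard and input coercion) makes those semantics easy to check; Ruby's awk-style `split(" ", n + 1)` also yields an empty extra element when the input has trailing whitespace, which is the quirk the comment above documents.

```ruby
# Reference semantics for truncatewords: split into at most words+1 pieces,
# and if an extra piece exists, drop it and append the ellipsis. The join
# normalizes whitespace runs to single spaces only when truncating.
def ref_truncatewords(input, words, ellipsis = "...")
  list = input.split(" ", words + 1)
  return input if list.length <= words
  list.pop
  list.join(" ") << ellipsis
end
```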
diff --git a/lib/liquid/strainer_template.rb b/lib/liquid/strainer_template.rb
index ca0626dda..67160d4cf 100644
--- a/lib/liquid/strainer_template.rb
+++ b/lib/liquid/strainer_template.rb
@@ -58,5 +58,31 @@ def invoke(method, *args)
rescue ::ArgumentError => e
raise Liquid::ArgumentError, e.message, e.backtrace
end
+
+ # Arity-specialized filter invocation.
+ # Avoids *args splat allocation for the common 0-arg and 1-arg cases.
+ # `invoke` (general case) still uses *args for 2+ extra arguments.
+ {
+ invoke_single: ['input'],
+ invoke_two: ['input', 'arg1'],
+ }.each do |method_name, params|
+ all_params = (["method"] + params).join(", ")
+ send_params = params.join(", ")
+ # __LINE__ + 1 is a parse-time constant; both generated methods will report
+ # the same file:line in backtraces. The method name in the trace distinguishes them.
+ module_eval(<<~RUBY, __FILE__, __LINE__ + 1)
+ def #{method_name}(#{all_params})
+ if self.class.invokable?(method)
+ send(method, #{send_params})
+ elsif @context.strict_filters
+ raise Liquid::UndefinedFilter, "undefined filter \#{method}"
+ else
+ input
+ end
+ rescue ::ArgumentError => e
+ raise Liquid::ArgumentError, e.message, e.backtrace
+ end
+ RUBY
+ end
end
end
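The `module_eval` generation used in both StrainerTemplate and Context reduces to a self-contained sketch (class and method names are illustrative): each arity gets a method with explicit parameters, so calls never build a `*args` array.

```ruby
# Generate one method per arity with explicit parameters; calling them never
# allocates a *args splat array. Mirrors the module_eval loop in miniature.
class Invoker
  {
    call_one: ["a"],
    call_two: ["a", "b"],
  }.each do |name, params|
    sig = params.join(", ")
    class_eval(<<~RUBY, __FILE__, __LINE__ + 1)
      def #{name}(#{sig})
        [#{sig}]
      end
    RUBY
  end
end
```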
diff --git a/lib/liquid/tags/for.rb b/lib/liquid/tags/for.rb
index cbea85bcb..2ed7d186d 100644
--- a/lib/liquid/tags/for.rb
+++ b/lib/liquid/tags/for.rb
@@ -25,8 +25,6 @@ module Liquid
# @liquid_optional_param range [untyped] A custom numeric range to iterate over.
# @liquid_optional_param reversed [untyped] Iterate in reverse order.
class For < Block
- Syntax = /\A(#{VariableSegment}+)\s+in\s+(#{QuotedFragment}+)\s*(reversed)?/o
-
attr_reader :collection_name, :variable_name, :limit, :from
def initialize(tag_name, markup, options)
@@ -72,18 +70,52 @@ def render_to_output_buffer(context, output)
protected
+ # Fast byte-level parser for "var in collection [reversed] [limit:N] [offset:N]"
def lax_parse(markup)
- if markup =~ Syntax
- @variable_name = Regexp.last_match(1)
- collection_name = Regexp.last_match(2)
- @reversed = !!Regexp.last_match(3)
- @name = "#{@variable_name}-#{collection_name}"
- @collection_name = parse_expression(collection_name)
- markup.scan(TagAttributes) do |key, value|
- set_attribute(key, value)
+ c = @parse_context.cursor
+ c.reset(markup)
+ c.skip_ws
+
+ # Parse variable name
+ var_start = c.pos
+ var_len = c.skip_id
+ raise SyntaxError, options[:locale].t("errors.syntax.for") if var_len == 0
+ @variable_name = c.slice(var_start, var_len)
+
+ # Expect "in"
+ c.skip_ws
+ raise SyntaxError, options[:locale].t("errors.syntax.for") unless c.expect_id("in")
+ c.skip_ws
+
+ # Parse collection name
+ col_start = c.pos
+ if c.peek_byte == Cursor::LPAREN
+ # Parenthesized range: (1..10)
+ depth = 1
+ c.scan_byte
+ while !c.eos? && depth > 0
+ b = c.scan_byte
+ depth += 1 if b == Cursor::LPAREN
+ depth -= 1 if b == Cursor::RPAREN
end
else
- raise SyntaxError, options[:locale].t("errors.syntax.for")
+ c.skip_fragment
+ end
+ collection_name = c.slice(col_start, c.pos - col_start)
+
+ @name = "#{@variable_name}-#{collection_name}"
+ @collection_name = parse_expression(collection_name)
+
+ c.skip_ws
+ @reversed = c.expect_id("reversed")
+ c.skip_ws
+
+      # Parse limit:/offset: attributes if present. Cursor has no key:value
+      # support, so delegate those to the TagAttributes regex.
+ if !c.eos? && (rest = c.slice(c.pos, markup.bytesize - c.pos)).include?(':')
+ rest.scan(TagAttributes) do |key, value|
+ set_attribute(key, value)
+ end
end
end
@@ -111,9 +143,7 @@ def strict_parse(markup)
private
- def strict2_parse(markup)
- strict_parse(markup)
- end
+ alias_method :strict2_parse, :strict_parse
def collection_segment(context)
offsets = context.registers[:for] ||= {}
@@ -122,22 +152,14 @@ def collection_segment(context)
offsets[@name].to_i
else
from_value = context.evaluate(@from)
- if from_value.nil?
- 0
- else
- Utils.to_integer(from_value)
- end
+ from_value.nil? ? 0 : Utils.to_integer(from_value)
end
collection = context.evaluate(@collection_name)
collection = collection.to_a if collection.is_a?(Range)
limit_value = context.evaluate(@limit)
- to = if limit_value.nil?
- nil
- else
- Utils.to_integer(limit_value) + from
- end
+ to = limit_value && (Utils.to_integer(limit_value) + from)
segment = Utils.slice_collection(collection, from, to)
segment.reverse! if @reversed
@@ -192,11 +214,7 @@ def set_attribute(key, expr, safe: false)
end
def render_else(context, output)
- if @else_block
- @else_block.render_to_output_buffer(context, output)
- else
- output
- end
+ @else_block ? @else_block.render_to_output_buffer(context, output) : output
end
class ParseTreeVisitor < Liquid::ParseTreeVisitor
diff --git a/lib/liquid/tags/if.rb b/lib/liquid/tags/if.rb
index c423c1e84..cc77161ec 100644
--- a/lib/liquid/tags/if.rb
+++ b/lib/liquid/tags/if.rb
@@ -51,14 +51,17 @@ def unknown_tag(tag, markup, tokens)
end
def render_to_output_buffer(context, output)
- @blocks.each do |block|
- result = Liquid::Utils.to_liquid_value(
- block.evaluate(context),
- )
+ idx = 0
+ blocks = @blocks
+ while idx < blocks.length
+ block = blocks[idx]
+ result = block.evaluate(context)
+ result = result.to_liquid_value if result.respond_to?(:to_liquid_value)
if result
return block.attachment.render_to_output_buffer(context, output)
end
+ idx += 1
end
output
@@ -86,6 +89,27 @@ def parse_expression(markup, safe: false)
end
def lax_parse(markup)
+ # Fastest path: simple identifier truthiness like "product.available" or "forloop.first"
+ if (simple = Variable.simple_variable_markup(markup))
+ return Condition.new(parse_expression(simple))
+ end
+
+ # Fast path: simple condition without and/or — use Cursor.
+ # The include? pre-checks are both a correctness guard (parse_simple_condition
+ # only handles a single comparison) and a perf gate (avoids cursor allocation
+ # for the compound-condition case that will always fall through to lax_parse).
+ if !markup.include?(' and ') && !markup.include?(' or ')
+ cursor = @parse_context.cursor
+ cursor.reset(markup)
+ if cursor.parse_simple_condition
+ return Condition.new(
+ parse_expression(cursor.cond_left),
+ cursor.cond_op,
+ cursor.cond_right ? parse_expression(cursor.cond_right) : nil,
+ )
+ end
+ end
+
expressions = markup.scan(ExpressionsAndOperators)
raise SyntaxError, options[:locale].t("errors.syntax.if") unless expressions.pop =~ Syntax
diff --git a/lib/liquid/tokenizer.rb b/lib/liquid/tokenizer.rb
index 8b331d93c..ba3e0da01 100644
--- a/lib/liquid/tokenizer.rb
+++ b/lib/liquid/tokenizer.rb
@@ -1,37 +1,23 @@
# frozen_string_literal: true
-require "strscan"
-
module Liquid
class Tokenizer
attr_reader :line_number, :for_liquid_tag
- TAG_END = /%\}/
- TAG_OR_VARIABLE_START = /\{[\{\%]/
- NEWLINE = /\n/
-
- OPEN_CURLEY = "{".ord
- CLOSE_CURLEY = "}".ord
- PERCENTAGE = "%".ord
-
def initialize(
source:,
- string_scanner:,
+ string_scanner: nil,
line_numbers: false,
line_number: nil,
for_liquid_tag: false
)
@line_number = line_number || (line_numbers ? 1 : nil)
@for_liquid_tag = for_liquid_tag
- @source = source.to_s.to_str
+ @source = source.to_s
@offset = 0
@tokens = []
- if @source
- @ss = string_scanner
- @ss.string = @source
- tokenize
- end
+ tokenize
end
def shift
@@ -54,108 +40,113 @@ def tokenize
if @for_liquid_tag
@tokens = @source.split("\n")
else
- @tokens << shift_normal until @ss.eos?
+ tokenize_fast
end
@source = nil
- @ss = nil
- end
-
- def shift_normal
- token = next_token
-
- return unless token
-
- token
- end
-
- def next_token
- # possible states: :text, :tag, :variable
- byte_a = @ss.peek_byte
-
- if byte_a == OPEN_CURLEY
- @ss.scan_byte
-
- byte_b = @ss.peek_byte
-
- if byte_b == PERCENTAGE
- @ss.scan_byte
- return next_tag_token
- elsif byte_b == OPEN_CURLEY
- @ss.scan_byte
- return next_variable_token
- end
-
- @ss.pos -= 1
- end
-
- next_text_token
end
- def next_text_token
- start = @ss.pos
-
- unless @ss.skip_until(TAG_OR_VARIABLE_START)
- token = @ss.rest
- @ss.terminate
- return token
+ # Fast tokenizer using String#byteindex instead of StringScanner regex.
+ # String#byteindex is ~40% faster for finding { delimiters.
+ def tokenize_fast
+ src = @source
+ unless src.valid_encoding?
+ raise SyntaxError, "Invalid byte sequence in #{src.encoding}"
end
- pos = @ss.pos -= 2
- @source.byteslice(start, pos - start)
- rescue ::ArgumentError => e
- if e.message == "invalid byte sequence in #{@ss.string.encoding}"
- raise SyntaxError, "Invalid byte sequence in #{@ss.string.encoding}"
- else
- raise
- end
- end
+ len = src.bytesize
+ pos = 0
- def next_variable_token
- start = @ss.pos - 2
+ while pos < len
+ # Find next { which could start a tag or variable
+ idx = src.byteindex('{', pos)
- byte_a = byte_b = @ss.scan_byte
-
- while byte_b
- byte_a = @ss.scan_byte while byte_a && byte_a != CLOSE_CURLEY && byte_a != OPEN_CURLEY
-
- break unless byte_a
-
- if @ss.eos?
- return byte_a == CLOSE_CURLEY ? @source.byteslice(start, @ss.pos - start) : "{{"
+ unless idx
+ # No more tags/variables — rest is text
+ @tokens << src.byteslice(pos, len - pos) if pos < len
+ break
end
- byte_b = @ss.scan_byte
-
- if byte_a == CLOSE_CURLEY
- if byte_b == CLOSE_CURLEY
- return @source.byteslice(start, @ss.pos - start)
- elsif byte_b != CLOSE_CURLEY
- @ss.pos -= 1
- return @source.byteslice(start, @ss.pos - start)
+ next_byte = idx + 1 < len ? src.getbyte(idx + 1) : nil
+
+ if next_byte == Cursor::PCT # {%
+ # Emit text before tag
+ @tokens << src.byteslice(pos, idx - pos) if idx > pos
+
+ # Find %} to close the tag
+ close = src.byteindex('%}', idx + 2)
+ if close
+ @tokens << src.byteslice(idx, close + 2 - idx)
+ pos = close + 2
+ else
+ # Emit malformed token to propagate a missing-terminator error in the parser
+ @tokens << "{%"
+ pos = idx + 2
+ end
+ elsif next_byte == Cursor::LCURLY # {{
+ # Emit text before variable, then scan for the closing }}.
+ @tokens << src.byteslice(pos, idx - pos) if idx > pos
+ pos = scan_variable_token(src, idx, len)
+ else
+ # Lone '{' — not the start of a tag or variable.
+ # Find the next '{{' or '{%' to know where this text token ends.
+ # Two C-level byteindex calls avoid a nested Ruby byte loop; pathological runs of lone braces can trigger rescans, but real templates rarely do.
+ tag_start = src.byteindex('{%', idx + 1)
+ var_start = src.byteindex('{{', idx + 1)
+ next_token = [tag_start, var_start].compact.min
+ if next_token
+ @tokens << src.byteslice(pos, next_token - pos)
+ pos = next_token
+ else
+ @tokens << src.byteslice(pos, len - pos)
+ pos = len
end
- elsif byte_a == OPEN_CURLEY && byte_b == PERCENTAGE
- return next_tag_token_with_start(start)
end
-
- byte_a = byte_b
end
-
- "{{"
end
- def next_tag_token
- start = @ss.pos - 2
- if (len = @ss.skip_until(TAG_END))
- @source.byteslice(start, len + 2)
- else
- "{%"
+ # Scans a {{ ... }} variable token starting at `idx` in `src`.
+ # Emits the token to @tokens and returns the new position after the token.
+ # Handles }}, single }, and embedded {% ... %} (nested tag inside variable).
+ private def scan_variable_token(src, idx, len)
+ # Byte-by-byte scan: find } or {, then inspect the next byte.
+ scan_pos = idx + 2
+ while scan_pos < len
+ b = src.getbyte(scan_pos)
+ if b == Cursor::RCURLY # }
+ if scan_pos + 1 >= len
+ # } at end of string — emit token up to here
+ @tokens << src.byteslice(idx, scan_pos + 1 - idx)
+ return scan_pos + 1
+ end
+ b2 = src.getbyte(scan_pos + 1)
+ if b2 == Cursor::RCURLY
+ # Found }} — close variable
+ @tokens << src.byteslice(idx, scan_pos + 2 - idx)
+ return scan_pos + 2
+ else
+ # } followed by non-} — emit token up to here (matches original: @ss.pos -= 1)
+ @tokens << src.byteslice(idx, scan_pos + 1 - idx)
+ return scan_pos + 1
+ end
+ elsif b == Cursor::LCURLY && scan_pos + 1 < len && src.getbyte(scan_pos + 1) == Cursor::PCT
+ # Found {% inside {{ — scan to %} and emit as one token
+ close = src.byteindex('%}', scan_pos + 2)
+ if close
+ @tokens << src.byteslice(idx, close + 2 - idx)
+ return close + 2
+ else
+ @tokens << src.byteslice(idx, len - idx)
+ return len
+ end
+ else
+ scan_pos += 1
+ end
end
- end
- def next_tag_token_with_start(start)
- @ss.skip_until(TAG_END)
- @source.byteslice(start, @ss.pos - start)
+ # Reached end without finding }} — malformed
+ @tokens << "{{"
+ idx + 2
end
end
end
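
The byteindex dispatch above can be sketched in isolation. This is a simplified model, not the patched tokenizer: it assumes well-formed templates with no lone `{`, no `{% %}` nested inside `{{ }}`, and no missing terminators, all of which `tokenize_fast` handles explicitly.

```ruby
# Minimal sketch: split a template into text / {% tag %} / {{ variable }}
# tokens using only C-level String#byteindex and String#byteslice.
# Assumes every '{' starts a well-terminated tag or variable.
def sketch_tokenize(src)
  tokens = []
  pos = 0
  len = src.bytesize
  while pos < len
    idx = src.byteindex('{', pos)
    unless idx
      tokens << src.byteslice(pos, len - pos) # trailing text
      break
    end
    tokens << src.byteslice(pos, idx - pos) if idx > pos # text before token
    close = src.getbyte(idx + 1) == '%'.ord ? '%}' : '}}'
    term = src.byteindex(close, idx + 2)
    tokens << src.byteslice(idx, term + 2 - idx)
    pos = term + 2
  end
  tokens
end
```
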
diff --git a/lib/liquid/utils.rb b/lib/liquid/utils.rb
index 084739a21..41b9f621a 100644
--- a/lib/liquid/utils.rb
+++ b/lib/liquid/utils.rb
@@ -8,6 +8,9 @@ module Utils
def self.slice_collection(collection, from, to)
if (from != 0 || !to.nil?) && collection.respond_to?(:load_slice)
collection.load_slice(from, to)
+ elsif from == 0 && to.nil? && collection.is_a?(Array)
+ # Fast path: no offset/limit on an Array — return as-is (avoid copy)
+ collection
else
slice_collection_using_each(collection, from, to)
end
@@ -15,23 +18,17 @@ def self.slice_collection(collection, from, to)
def self.slice_collection_using_each(collection, from, to)
segments = []
- index = 0
- # Maintains Ruby 1.8.7 String#each behaviour on 1.9
+ # String#each was removed in Ruby 1.9; treat a String as a single-element collection
if collection.is_a?(String)
return collection.empty? ? [] : [collection]
end
return [] unless collection.respond_to?(:each)
+ index = 0
collection.each do |item|
- if to && to <= index
- break
- end
-
- if from <= index
- segments << item
- end
-
+ break if to && to <= index
+ segments << item if from <= index
index += 1
end
@@ -93,8 +90,14 @@ def self.to_liquid_value(obj)
obj
end
- def self.to_s(obj, seen = {})
+ # Cached string representations for common small integers (0-999)
+ # Avoids repeated Integer#to_s allocations during rendering
+ SMALL_INT_STRINGS = Array.new(1000) { |i| i.to_s.freeze }.freeze
+
+ def self.to_s(obj, seen = nil)
case obj
+ when Integer
+ obj >= 0 && obj < 1000 ? SMALL_INT_STRINGS[obj] : obj.to_s
when BigDecimal
obj.to_s("F")
when Hash
@@ -102,30 +105,30 @@ def self.to_s(obj, seen = {})
# custom implementation. Otherwise we use Liquid's default
# implementation.
if obj.class.instance_method(:to_s) == HASH_TO_S_METHOD
- hash_inspect(obj, seen)
+ hash_inspect(obj, seen || {})
else
obj.to_s
end
when Array
- array_inspect(obj, seen)
+ array_inspect(obj, seen || {})
else
obj.to_s
end
end
- def self.inspect(obj, seen = {})
+ def self.inspect(obj, seen = nil)
case obj
when Hash
# If the custom hash implementation overrides `#inspect`, use their
# custom implementation. Otherwise we use Liquid's default
# implementation.
if obj.class.instance_method(:inspect) == HASH_INSPECT_METHOD
- hash_inspect(obj, seen)
+ hash_inspect(obj, seen || {})
else
obj.inspect
end
when Array
- array_inspect(obj, seen)
+ array_inspect(obj, seen || {})
else
obj.inspect
end
diff --git a/lib/liquid/variable.rb b/lib/liquid/variable.rb
index 6b5fb412b..22615db15 100644
--- a/lib/liquid/variable.rb
+++ b/lib/liquid/variable.rb
@@ -12,6 +12,26 @@ module Liquid
# {{ user | link }}
#
class Variable
+ # Checks if markup is a simple "name.lookup.chain" with no filters/brackets/quotes.
+ # Returns the trimmed markup string, or nil if not simple.
+ def self.simple_variable_markup(markup)
+ return if markup.empty?
+ return unless markup.match?(SIMPLE_VARIABLE_RE)
+ # Avoid allocation when there's no surrounding whitespace (the common case)
+ first = markup.getbyte(0)
+ last = markup.getbyte(markup.bytesize - 1)
+ needs_strip = first == Cursor::SPACE || first == Cursor::TAB || first == Cursor::NL || first == Cursor::CR ||
+ last == Cursor::SPACE || last == Cursor::TAB || last == Cursor::NL || last == Cursor::CR
+ needs_strip ? markup.strip : markup
+ end
+
+ # Cache for [filtername, EMPTY_ARRAY] tuples — avoids repeated array creation
+ NO_ARG_FILTER_CACHE = Hash.new { |h, k| h[k] = [k, Const::EMPTY_ARRAY].freeze }
+
+ # Regex for a simple variable lookup with optional surrounding whitespace.
+ # Shares the identifier grammar with VariableLookup::SIMPLE_LOOKUP_RE.
+ SIMPLE_VARIABLE_RE = /\A\s*[\w-]+\??(?:\.[\w-]+\??)*\s*\z/
+
FilterMarkupRegex = /#{FilterSeparator}\s*(.*)/om
FilterParser = /(?:\s+|#{QuotedFragment}|#{ArgumentSeparator})+/o
FilterArgsRegex = /(?:#{FilterArgumentSeparator}|#{ArgumentSeparator})\s*((?:\w+\s*\:\s*)?#{QuotedFragment})/o
@@ -30,7 +50,278 @@ def initialize(markup, parse_context)
@parse_context = parse_context
@line_number = parse_context.line_number
- strict_parse_with_error_mode_fallback(markup)
+ # Fast path: try to parse without going through Lexer → Parser
+ # Only lax/warn qualify; strict, strict2, and rigid need the full parser's error reporting
+ error_mode = parse_context.error_mode
+ if error_mode == :strict2 || error_mode == :rigid || error_mode == :strict || !try_fast_parse(markup, parse_context)
+ strict_parse_with_error_mode_fallback(markup)
+ end
+ end
+
+ private def try_fast_parse(markup, parse_context)
+ pos = fast_scan_name(markup)
+ return false unless pos
+
+ # fast_resolve_name calls VariableLookup.parse_simple / Expression::LITERALS — the
+ # only sites that can raise SyntaxError on malformed input. The byte scanners return
+ # false instead of raising.
+ begin
+ fast_resolve_name(markup, parse_context)
+ rescue SyntaxError
+ return false
+ end
+
+ # End of markup — no filters
+ if pos >= markup.bytesize
+ @filters = Const::EMPTY_ARRAY
+ return true
+ end
+
+ # Must be followed by a pipe filter separator
+ return false unless markup.getbyte(pos) == Cursor::PIPE
+
+ fast_scan_filters(markup, pos, parse_context)
+ end
+
+ # Scan the variable name (quoted string or identifier chain) at the start of markup.
+ # Returns the position after the name + trailing whitespace, or false on failure.
+ # Sets @_fast_name_start and @_fast_name_end for fast_resolve_name.
+ private def fast_scan_name(markup)
+ len = markup.bytesize
+ return false if len == 0
+
+ # Skip leading whitespace
+ pos = 0
+ while pos < len
+ b = markup.getbyte(pos)
+ break unless b == Cursor::SPACE || b == Cursor::TAB || b == Cursor::NL || b == Cursor::CR
+ pos += 1
+ end
+ return false if pos >= len
+
+ b = markup.getbyte(pos)
+
+ if b == Cursor::QUOTE_S || b == Cursor::QUOTE_D
+ # Quoted string literal: scan to matching close quote
+ quote = b
+ @_fast_name_start = pos
+ pos += 1
+ pos += 1 while pos < len && markup.getbyte(pos) != quote
+ pos += 1 if pos < len # skip closing quote
+ @_fast_name_end = pos
+ elsif ByteTables::IDENT_START[b]
+ # Identifier chain: [a-zA-Z_][a-zA-Z0-9_-]*(\.[a-zA-Z_][a-zA-Z0-9_-]*)*
+ @_fast_name_start = pos
+ pos += 1
+ while pos < len
+ b = markup.getbyte(pos)
+ if ByteTables::IDENT_CONT[b]
+ pos += 1
+ elsif b == Cursor::DOT
+ pos += 1
+ return false if pos >= len
+ b = markup.getbyte(pos)
+ return false unless ByteTables::IDENT_START[b]
+ pos += 1
+ else
+ break
+ end
+ end
+ @_fast_name_end = pos
+ else
+ return false
+ end
+
+ # Skip whitespace after name
+ while pos < len
+ b = markup.getbyte(pos)
+ break unless b == Cursor::SPACE || b == Cursor::TAB || b == Cursor::NL || b == Cursor::CR
+ pos += 1
+ end
+
+ pos
+ end
+
+ # Resolve the scanned name bytes to a Liquid expression object.
+ # Reads @_fast_name_start / @_fast_name_end set by fast_scan_name.
+ # Sets @name. May raise SyntaxError (rescued in try_fast_parse).
+ private def fast_resolve_name(markup, parse_context)
+ name_start = @_fast_name_start
+ name_end = @_fast_name_end
+ len = markup.bytesize
+
+ # Avoid byteslice when the name spans the whole markup (no surrounding whitespace/filters)
+ expr_markup = name_start == 0 && name_end == len ? markup : markup.byteslice(name_start, name_end - name_start)
+
+ cache = parse_context.expression_cache
+ ss = parse_context.string_scanner
+
+ first_byte = expr_markup.getbyte(0)
+ @name = if first_byte == Cursor::QUOTE_S || first_byte == Cursor::QUOTE_D
+ # String literal — strip enclosing quotes
+ expr_markup.byteslice(1, expr_markup.bytesize - 2)
+ elsif Expression::LITERALS.key?(expr_markup)
+ Expression::LITERALS[expr_markup]
+ elsif cache
+ cache[expr_markup] || (cache[expr_markup] = VariableLookup.parse_simple(expr_markup, ss, cache).freeze)
+ else
+ VariableLookup.parse_simple(expr_markup, ss || StringScanner.new(""), nil).freeze
+ end
+ end
+
+ # Scan the filter chain starting at `pos` (the first '|').
+ # Returns true on success (sets @filters), false to fall back to the Lexer.
+ # Rescues SyntaxError from Expression.parse inside fast_scan_filter_args.
+ private def fast_scan_filters(markup, pos, parse_context)
+ len = markup.bytesize
+ @filters = []
+ filter_pos = pos
+
+ while filter_pos < len && markup.getbyte(filter_pos) == Cursor::PIPE
+ filter_pos += 1
+ # Skip spaces after pipe (tabs/newlines handled in the between-filters skip below)
+ filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE
+
+ # Scan filter name: must start with [a-zA-Z_]
+ fname_start = filter_pos
+ b = filter_pos < len ? markup.getbyte(filter_pos) : nil
+ break unless b && ByteTables::IDENT_START[b]
+ filter_pos += 1
+ while filter_pos < len
+ b = markup.getbyte(filter_pos)
+ break unless ByteTables::IDENT_CONT[b]
+ filter_pos += 1
+ end
+ filtername = markup.byteslice(fname_start, filter_pos - fname_start)
+
+ # Skip whitespace after filter name
+ filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE
+
+ if filter_pos < len && markup.getbyte(filter_pos) == Cursor::COLON
+ # Has arguments — fast-scan positional args; fall to Lexer on keyword args
+ filter_pos += 1 # skip ':'
+ filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE
+
+ result = fast_scan_filter_args(markup, filter_pos, parse_context)
+ return fall_to_lexer_filters(markup, pos, fname_start, len, parse_context) if result == :fall_to_lexer
+
+ filter_args, filter_pos = result
+ @filters << [filtername, filter_args]
+ else
+ # No-arg filter — reuse the cached [name, EMPTY_ARRAY] tuple
+ @filters << NO_ARG_FILTER_CACHE[filtername]
+ end
+
+ # Skip whitespace (including tabs and newlines) between filters
+ filter_pos += 1 while filter_pos < len && (
+ markup.getbyte(filter_pos) == Cursor::SPACE ||
+ markup.getbyte(filter_pos) == Cursor::TAB ||
+ markup.getbyte(filter_pos) == Cursor::NL ||
+ markup.getbyte(filter_pos) == Cursor::CR
+ )
+ end
+
+ # Trailing bytes that aren't a pipe mean something the fast path doesn't handle
+ return false if filter_pos < len
+
+ @filters = Const::EMPTY_ARRAY if @filters.empty?
+ true
+ rescue SyntaxError
+ # Expression.parse (called inside fast_scan_filter_args for identifier args) can
+ # raise SyntaxError on malformed input. Fall back to full Lexer parse.
+ @name = nil
+ @filters = nil
+ false
+ end
+
+ # Called when fast_scan_filter_args encounters keyword args or an unrecognised
+ # token. Hands the remaining filter chain (from the pipe before fname_start)
+ # to the full Lexer-based parser, merges results into @filters, and returns true.
+ private def fall_to_lexer_filters(markup, pos, fname_start, len, parse_context)
+ # Walk back from fname_start to find the pipe that opened this filter.
+ # Equivalent to: markup.rindex('|', fname_start), bounded by pos.
+ rest_start = fname_start
+ rest_start -= 1 while rest_start > pos && markup.getbyte(rest_start) != Cursor::PIPE
+ rest_markup = markup.byteslice(rest_start, len - rest_start)
+ p = parse_context.new_parser(rest_markup)
+ while p.consume?(:pipe)
+ fn = p.consume(:id)
+ fa = p.consume?(:colon) ? parse_filterargs(p) : Const::EMPTY_ARRAY
+ @filters << lax_parse_filter_expressions(fn, fa)
+ end
+ p.consume(:end_of_string)
+ @filters = Const::EMPTY_ARRAY if @filters.empty?
+ true
+ end
+
+ # Scan positional filter arguments starting at `filter_pos`.
+ # Returns [filter_args_array, new_filter_pos] on success, or :fall_to_lexer when
+ # keyword args or unrecognised tokens are encountered.
+ private def fast_scan_filter_args(markup, filter_pos, parse_context)
+ len = markup.bytesize
+ filter_args = []
+
+ loop do
+ arg_start = filter_pos
+ b = filter_pos < len ? markup.getbyte(filter_pos) : nil
+
+ if b == Cursor::QUOTE_S || b == Cursor::QUOTE_D
+ # Quoted string argument
+ quote = b
+ filter_pos += 1
+ filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) != quote
+ filter_pos += 1 if filter_pos < len # skip closing quote
+ filter_args << markup.byteslice(arg_start + 1, filter_pos - arg_start - 2)
+
+ elsif b && (ByteTables::DIGIT[b] ||
+ (b == Cursor::DASH && filter_pos + 1 < len && ByteTables::DIGIT[markup.getbyte(filter_pos + 1)]))
+ # Numeric argument (integer or float, optionally negative)
+ filter_pos += 1 if b == Cursor::DASH
+ filter_pos += 1 while filter_pos < len && ByteTables::DIGIT[markup.getbyte(filter_pos)]
+ if filter_pos < len && markup.getbyte(filter_pos) == Cursor::DOT # float
+ filter_pos += 1
+ filter_pos += 1 while filter_pos < len && ByteTables::DIGIT[markup.getbyte(filter_pos)]
+ end
+ num_str = markup.byteslice(arg_start, filter_pos - arg_start)
+ filter_args << (num_str.include?('.') ? num_str.to_f : num_str.to_i)
+
+ elsif b && ByteTables::IDENT_START[b]
+ # Identifier argument — may be a variable lookup or keyword arg
+ id_start = filter_pos
+ filter_pos += 1
+ while filter_pos < len
+ b2 = markup.getbyte(filter_pos)
+ break unless ByteTables::IDENT_CONT[b2] || b2 == Cursor::DOT
+ filter_pos += 1
+ end
+ filter_pos += 1 if filter_pos < len && markup.getbyte(filter_pos) == Cursor::QMARK
+
+ # Peek past whitespace: if followed by ':', this is a keyword arg → fall to Lexer
+ kw_check = filter_pos
+ kw_check += 1 while kw_check < len && markup.getbyte(kw_check) == Cursor::SPACE
+ return :fall_to_lexer if kw_check < len && markup.getbyte(kw_check) == Cursor::COLON
+
+ id_markup = markup.byteslice(id_start, filter_pos - id_start)
+ filter_args << Expression.parse(id_markup, parse_context.string_scanner, parse_context.expression_cache)
+
+ else
+ return :fall_to_lexer
+ end
+
+ # Skip whitespace after argument
+ filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE
+
+ # Comma: more arguments follow; anything else: done with this filter's args
+ if filter_pos < len && markup.getbyte(filter_pos) == Cursor::COMMA
+ filter_pos += 1
+ filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == Cursor::SPACE
+ else
+ break
+ end
+ end
+
+ [filter_args, filter_pos]
end
def raw
@@ -42,7 +333,7 @@ def markup_context(markup)
end
def lax_parse(markup)
- @filters = []
+ @filters = Const::EMPTY_ARRAY
return unless markup =~ MarkupWithQuotedFragment
name_markup = Regexp.last_match(1)
@@ -54,19 +345,21 @@ def lax_parse(markup)
next unless f =~ /\w+/
filtername = Regexp.last_match(0)
filterargs = f.scan(FilterArgsRegex).flatten
+ @filters = [] if @filters.frozen?
@filters << lax_parse_filter_expressions(filtername, filterargs)
end
end
end
def strict_parse(markup)
- @filters = []
+ @filters = Const::EMPTY_ARRAY
p = @parse_context.new_parser(markup)
return if p.look(:end_of_string)
@name = parse_context.safe_parse_expression(p)
while p.consume?(:pipe)
+ @filters = [] if @filters.frozen?
filtername = p.consume(:id)
filterargs = p.consume?(:colon) ? parse_filterargs(p) : Const::EMPTY_ARRAY
@filters << lax_parse_filter_expressions(filtername, filterargs)
@@ -75,13 +368,16 @@ def strict_parse(markup)
end
def strict2_parse(markup)
- @filters = []
+ @filters = Const::EMPTY_ARRAY
p = @parse_context.new_parser(markup)
return if p.look(:end_of_string)
@name = parse_context.safe_parse_expression(p)
- @filters << strict2_parse_filter_expressions(p) while p.consume?(:pipe)
+ while p.consume?(:pipe)
+ @filters = [] if @filters.frozen?
+ @filters << strict2_parse_filter_expressions(p)
+ end
p.consume(:end_of_string)
end
@@ -97,24 +393,37 @@ def render(context)
obj = context.evaluate(@name)
@filters.each do |filter_name, filter_args, filter_kwargs|
- filter_args = evaluate_filter_expressions(context, filter_args, filter_kwargs)
- obj = context.invoke(filter_name, obj, *filter_args)
+ if filter_args.empty? && !filter_kwargs
+ obj = context.invoke_single(filter_name, obj)
+ elsif !filter_kwargs && filter_args.length == 1
+ # Single positional arg — most common after no-arg
+ obj = context.invoke_two(filter_name, obj, context.evaluate(filter_args[0]))
+ else
+ filter_args = evaluate_filter_expressions(context, filter_args, filter_kwargs)
+ obj = context.invoke(filter_name, obj, *filter_args)
+ end
end
context.apply_global_filter(obj)
end
def render_to_output_buffer(context, output)
- obj = render(context)
+ # Fast path: no filters and no global filter
+ obj = if @filters.empty? && context.global_filter.nil?
+ context.evaluate(@name)
+ else
+ render(context)
+ end
render_obj_to_output(obj, output)
output
end
def render_obj_to_output(obj, output)
- case obj
- when NilClass
+ if obj.instance_of?(String)
+ output << obj
+ elsif obj.nil?
# Do nothing
- when Array
+ elsif obj.instance_of?(Array)
obj.each do |o|
render_obj_to_output(o, output)
end
@@ -128,7 +437,7 @@ def disabled?(_context)
end
def disabled_tags
- []
+ Const::EMPTY_ARRAY
end
private
@@ -137,7 +446,8 @@ def lax_parse_filter_expressions(filter_name, unparsed_args)
filter_args = []
keyword_args = nil
unparsed_args.each do |a|
- if (matches = a.match(JustTagAttributes))
+ # Fast check: keyword args must contain ':'
+ if a.include?(':') && (matches = a.match(JustTagAttributes))
keyword_args ||= {}
keyword_args[matches[1]] = parse_context.parse_expression(matches[2])
else
@@ -190,15 +500,19 @@ def end_of_arguments?(p)
end
def evaluate_filter_expressions(context, filter_args, filter_kwargs)
- parsed_args = filter_args.map { |expr| context.evaluate(expr) }
if filter_kwargs
+ parsed_args = filter_args.map { |expr| context.evaluate(expr) }
parsed_kwargs = {}
filter_kwargs.each do |key, expr|
parsed_kwargs[key] = context.evaluate(expr)
end
parsed_args << parsed_kwargs
+ parsed_args
+ elsif filter_args.empty?
+ Const::EMPTY_ARRAY
+ else
+ filter_args.map { |expr| context.evaluate(expr) }
end
- parsed_args
end
class ParseTreeVisitor < Liquid::ParseTreeVisitor
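
The fast scanners above index into `ByteTables::IDENT_START` and `ByteTables::IDENT_CONT`, which this diff does not define. A plausible construction, assumed rather than taken from the patch: one 256-entry boolean array per byte class, so classifying a byte is a single array index instead of a range or regexp check.

```ruby
# Byte-class lookup tables: true for bytes that may start / continue an identifier.
IDENT_START = Array.new(256, false).tap do |t|
  ('a'.ord..'z'.ord).each { |b| t[b] = true }
  ('A'.ord..'Z'.ord).each { |b| t[b] = true }
  t['_'.ord] = true
end.freeze

IDENT_CONT = IDENT_START.dup.tap do |t|
  ('0'.ord..'9'.ord).each { |b| t[b] = true }
  t['-'.ord] = true
end.freeze

# Scan an identifier starting at byte offset `pos`; returns the end offset
# (or `pos` unchanged when no identifier starts there).
def scan_ident(markup, pos)
  return pos if pos >= markup.bytesize || !IDENT_START[markup.getbyte(pos)]
  pos += 1
  pos += 1 while pos < markup.bytesize && IDENT_CONT[markup.getbyte(pos)]
  pos
end
```
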
diff --git a/lib/liquid/variable_lookup.rb b/lib/liquid/variable_lookup.rb
index 4fba2a658..bb33b68c7 100644
--- a/lib/liquid/variable_lookup.rb
+++ b/lib/liquid/variable_lookup.rb
@@ -10,11 +10,108 @@ def self.parse(markup, string_scanner = StringScanner.new(""), cache = nil)
new(markup, string_scanner, cache)
end
- def initialize(markup, string_scanner = StringScanner.new(""), cache = nil)
- lookups = markup.scan(VariableParser)
+ # Fast parse that skips simple_lookup? check — caller guarantees simple identifier chain
+ def self.parse_simple(markup, string_scanner = nil, cache = nil)
+ new(markup, string_scanner, cache, true)
+ end
+
+ # Fast manual scanner replacing markup.scan(VariableParser)
+ # VariableParser = /\[(?>[^\[\]]+|\g<0>)*\]|[\w-]+\??/
+ # Splits "product.variants[0].title" into ["product", "variants", "[0]", "title"]
+ def self.scan_variable(markup)
+ result = []
+ pos = 0
+ len = markup.bytesize
+
+ while pos < len
+ byte = markup.getbyte(pos)
+
+ if byte == 91 # '['
+ # Scan balanced brackets
+ depth = 1
+ start = pos
+ pos += 1
+ while pos < len && depth > 0
+ b = markup.getbyte(pos)
+ depth += 1 if b == 91
+ depth -= 1 if b == 93
+ pos += 1
+ end
+ if depth == 0
+ result << markup.byteslice(start, pos - start)
+ else
+ # Unbalanced bracket: skip '[' and continue scanning
+ pos = start + 1
+ end
+ elsif byte == 46 # '.'
+ pos += 1
+ elsif ByteTables::IDENT_CONT[byte] # [\w-]
+ start = pos
+ pos += 1
+ while pos < len
+ b = markup.getbyte(pos)
+ break unless ByteTables::IDENT_CONT[b]
+ pos += 1
+ end
+ # Check trailing '?'
+ if pos < len && markup.getbyte(pos) == 63
+ pos += 1
+ end
+ result << markup.byteslice(start, pos - start)
+ else
+ pos += 1
+ end
+ end
+
+ result
+ end
+
+ # Check if markup is a simple identifier chain: [\w-]+\??(.[\w-]+\??)*
+ # Uses C-level match? — 8x faster than Ruby byte scanning
+ SIMPLE_LOOKUP_RE = /\A[\w-]+\??(?:\.[\w-]+\??)*\z/
+
+ def self.simple_lookup?(markup)
+ markup.bytesize > 0 && markup.match?(SIMPLE_LOOKUP_RE)
+ end
+
+ def initialize(markup, string_scanner = StringScanner.new(""), cache = nil, simple = false)
+ # Fast path: simple identifier chain without brackets
+ if simple || self.class.simple_lookup?(markup)
+ dot_pos = markup.index('.')
+ if dot_pos.nil?
+ @name = markup
+ @lookups = Const::EMPTY_ARRAY
+ @command_flags = 0
+ return
+ end
+ @name = markup.byteslice(0, dot_pos)
+ # Build lookups array from remaining dot-separated segments
+ lookups = []
+ @command_flags = 0
+ pos = dot_pos + 1
+ len = markup.bytesize
+ while pos < len
+ seg_start = pos
+ while pos < len
+ b = markup.getbyte(pos)
+ break if b == 46 # '.'
+ pos += 1
+ end
+ seg = markup.byteslice(seg_start, pos - seg_start)
+ if COMMAND_METHODS.include?(seg)
+ @command_flags |= 1 << lookups.length
+ end
+ lookups << seg
+ pos += 1 # skip dot
+ end
+ @lookups = lookups
+ return
+ end
+
+ lookups = self.class.scan_variable(markup)
name = lookups.shift
- if name&.start_with?('[') && name&.end_with?(']')
+ if name&.start_with?('[') && name.end_with?(']')
name = Expression.parse(
name[1..-2],
string_scanner,
@@ -26,9 +123,8 @@ def initialize(markup, string_scanner = StringScanner.new(""), cache = nil)
@lookups = lookups
@command_flags = 0
- @lookups.each_index do |i|
- lookup = lookups[i]
- if lookup&.start_with?('[') && lookup&.end_with?(']')
+ @lookups.each_with_index do |lookup, i|
+ if lookup&.start_with?('[') && lookup.end_with?(']')
lookups[i] = Expression.parse(
lookup[1..-2],
string_scanner,
@@ -49,26 +145,34 @@ def evaluate(context)
object = context.find_variable(name)
@lookups.each_index do |i|
- key = context.evaluate(@lookups[i])
+ lookup = @lookups[i]
+ key = lookup.instance_of?(String) ? lookup : context.evaluate(lookup)
# Cast "key" to its liquid value to enable it to act as a primitive value
- key = Liquid::Utils.to_liquid_value(key)
+ # Fast path: strings and integers (most common key types) don't need conversion
+ unless key.instance_of?(String) || key.instance_of?(Integer)
+ key = Liquid::Utils.to_liquid_value(key)
+ end
# If object is a hash- or array-like object we look for the
# presence of the key and if its available we return it
- if object.respond_to?(:[]) &&
- ((object.respond_to?(:key?) && object.key?(key)) ||
- (object.respond_to?(:fetch) && key.is_a?(Integer)))
-
+ if accessible?(object, key)
# if its a proc we will replace the entry with the proc
- res = context.lookup_and_evaluate(object, key)
- object = res.to_liquid
+ object = context.lookup_and_evaluate(object, key)
+ # Skip to_liquid for common primitive types (they return self)
+ unless object.instance_of?(String) || object.instance_of?(Integer) || object.instance_of?(Float) ||
+ object.instance_of?(Array) || object.instance_of?(Hash) || object.nil?
+ object = liquidize(object, context)
+ end
# Some special cases. If the part wasn't in square brackets and
# no key with the same name was found we interpret following calls
# as commands and call them on the current object
elsif lookup_command?(i) && object.respond_to?(key)
- object = object.send(key).to_liquid
+ object = object.send(key)
+ unless object.instance_of?(String) || object.instance_of?(Integer) || object.instance_of?(Array) || object.nil?
+ object = liquidize(object, context)
+ end
# Handle string first/last like ActiveSupport does (returns first/last character)
# ActiveSupport returns "" for empty strings, not nil
@@ -82,9 +186,6 @@ def evaluate(context)
return nil unless context.strict_variables
raise Liquid::UndefinedVariable, "undefined variable #{key}"
end
-
- # If we are dealing with a drop here we have to
- object.context = context if object.respond_to?(:context=)
end
object
@@ -94,6 +195,27 @@ def ==(other)
self.class == other.class && state == other.state
end
+ private
+
+ # Returns true if +object+ has +key+ accessible via [] lookup.
+ def accessible?(object, key)
+ if object.instance_of?(Hash)
+ object.key?(key)
+ else
+ object.respond_to?(:[]) &&
+ ((object.respond_to?(:key?) && object.key?(key)) ||
+ (object.respond_to?(:fetch) && key.is_a?(Integer)))
+ end
+ end
+
+ # Calls to_liquid on +object+ and wires up the context reference if needed.
+ # Skipped for primitive types that return self from to_liquid.
+ def liquidize(object, context)
+ object = object.to_liquid
+ object.context = context if object.respond_to?(:context=)
+ object
+ end
+
protected
def state
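
The simple-lookup branch of `initialize` above amounts to a dot split that also records which segments are command methods. A sketch of the equivalent logic, using `String#split` for brevity where the patch uses byteslice loops; the `COMMAND_METHODS` list shown matches Liquid's size/first/last set:

```ruby
COMMAND_METHODS = ['size', 'first', 'last'].freeze

# Split "product.variants.first" into name, lookup chain, and a bitmask
# marking which lookup positions are command methods.
def sketch_simple_lookup(markup)
  name, *lookups = markup.split('.')
  flags = 0
  lookups.each_with_index do |seg, i|
    flags |= 1 << i if COMMAND_METHODS.include?(seg)
  end
  [name, lookups, flags]
end
```
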
diff --git a/performance/bench_quick.rb b/performance/bench_quick.rb
new file mode 100644
index 000000000..6168f80e3
--- /dev/null
+++ b/performance/bench_quick.rb
@@ -0,0 +1,62 @@
+# frozen_string_literal: true
+
+# Quick benchmark for autoresearch: measures parse µs, render µs, and object allocations
+# Outputs machine-readable metrics to stdout
+
+require_relative 'theme_runner'
+
+RubyVM::YJIT.enable if defined?(RubyVM::YJIT.enable) # enable exists on Ruby 3.3+
+
+runner = ThemeRunner.new
+
+# Warmup — enough iterations for YJIT to fully optimize hot paths
+20.times { runner.compile }
+20.times { runner.render }
+
+GC.start
+GC.compact if GC.respond_to?(:compact)
+
+# Measure parse
+parse_times = []
+10.times do
+ GC.disable
+ t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+ runner.compile
+ t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+ GC.enable
+ GC.start
+ parse_times << (t1 - t0) * 1_000_000 # µs
+end
+
+# Measure render
+render_times = []
+10.times do
+ GC.disable
+ t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+ runner.render
+ t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+ GC.enable
+ GC.start
+ render_times << (t1 - t0) * 1_000_000 # µs
+end
+
+# Measure object allocations for one parse+render cycle
+require 'objspace'
+GC.start
+GC.disable
+# Snapshot once: calling count_objects twice skews TOTAL/FREE between reads
+counts = ObjectSpace.count_objects
+before = counts[:TOTAL] - counts[:FREE]
+runner.compile
+runner.render
+counts = ObjectSpace.count_objects
+after = counts[:TOTAL] - counts[:FREE]
+GC.enable
+allocations = after - before
+
+parse_us = parse_times.min.round(0)
+render_us = render_times.min.round(0)
+combined_us = parse_us + render_us
+
+puts "RESULTS"
+puts "parse_us=#{parse_us}"
+puts "render_us=#{render_us}"
+puts "combined_us=#{combined_us}"
+puts "allocations=#{allocations}"
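
The `count_objects` arithmetic above is sensitive to GC state. One alternative worth benchmarking (an assumption, not part of the patch): `GC.stat(:total_allocated_objects)` is a monotonic counter, so a plain difference measures a block's allocations without disabling GC.

```ruby
# Count objects allocated while the block runs, via GC's monotonic counter.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end
```
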
diff --git a/test/unit/tokenizer_unit_test.rb b/test/unit/tokenizer_unit_test.rb
index 76d379d36..b3da5e4d3 100644
--- a/test/unit/tokenizer_unit_test.rb
+++ b/test/unit/tokenizer_unit_test.rb
@@ -48,6 +48,33 @@ def test_unmatching_start_and_end
assert_equal(["{%%}", "}"], tokenize('{%%}}'))
end
+ # Regression: lone '{' at or near end of string previously caused an infinite
+ # loop. The stray-{ else branch left `pos` unchanged when no further '{{' or
+ # '{%' existed, so the outer loop found the same '{' on every iteration.
+ def test_lone_brace_does_not_loop
+ assert_equal(["{"], tokenize('{'))
+ assert_equal(["a{"], tokenize('a{'))
+ assert_equal(["hello { world {"], tokenize('hello { world {'))
+ assert_equal(["{ world"], tokenize('{ world'))
+ assert_equal(["x{y"], tokenize('x{y'))
+ assert_equal(["{b{c"], tokenize('{b{c'))
+ end
+
+ def test_lone_brace_before_real_token
+ assert_equal(
+ ["a { b ", "{% if x %}", "yes", "{% endif %}", " c"],
+ tokenize('a { b {% if x %}yes{% endif %} c'),
+ )
+ assert_equal(
+ ["x { ", "{{ var }}", " y"],
+ tokenize('x { {{ var }} y'),
+ )
+ assert_equal(
+ ["{ ", "{{ var }}"],
+ tokenize('{ {{ var }}'),
+ )
+ end
+
private
def new_tokenizer(source, parse_context: Liquid::ParseContext.new, start_line_number: nil)
diff --git a/test/unit/variable_fast_parse_test.rb b/test/unit/variable_fast_parse_test.rb
new file mode 100644
index 000000000..fd67c5eaf
--- /dev/null
+++ b/test/unit/variable_fast_parse_test.rb
@@ -0,0 +1,126 @@
+# frozen_string_literal: true
+
+require 'test_helper'
+
+# Tests that the fast-path parser (try_fast_parse) produces the same result as the
+# full Lexer → Parser pipeline for every input we expect it to handle.
+#
+# This protects against silent regressions where a change to try_fast_parse makes it
+# produce output that differs from the slow path. The rest of the suite never compares
+# the two paths directly, so such a divergence would otherwise go unnoticed.
+class VariableFastParseTest < Minitest::Test
+ include Liquid
+
+ EQUIVALENCE_CASES = [
+ # Simple lookups
+ "product",
+ "product.title",
+ "product.variants.first.title",
+ # Quoted string literals
+ "'hello'",
+ '"hello"',
+ # Variables with no-arg filters
+ "product | upcase",
+ "product | upcase | downcase",
+ "product | strip | upcase | downcase",
+ # Variables with single-arg filters
+ "product | truncate: 50",
+ "product | plus: 1",
+ "product | plus: -3",
+ "product | round: 2",
+ "product | append: ' world'",
+ # Variables with multi-arg filters
+ "product | replace: 'a', 'b'",
+ "product | pluralize: 'item', 'items'",
+ "product | slice: 0, 5",
+ # Chained mixed filters
+ "product.title | truncate: 50",
+ "'hello' | append: ' world' | upcase",
+ "name | prepend: 'Dr. ' | append: ' PhD' | upcase",
+ # Numeric args
+ "count | plus: 1.5",
+ "price | minus: 0.99",
+ # No whitespace around pipe
+ "x|upcase",
+ "x|replace:'a','b'|upcase",
+ # Leading/trailing whitespace
+ " product ",
+ " product.title | upcase ",
+ ].freeze
+
+ EQUIVALENCE_CASES.each_with_index do |markup, i|
+ define_method(:"test_fast_parse_equivalence_#{i.to_s.rjust(2, "0")}") do
+ lax_ctx = Liquid::ParseContext.new(error_mode: :lax)
+ strict_ctx = Liquid::ParseContext.new(error_mode: :strict)
+
+ lax_var = Liquid::Variable.new(markup, lax_ctx)
+ strict_var = Liquid::Variable.new(markup, strict_ctx)
+
+ assert_equal strict_var.name,
+ lax_var.name,
+ "Name mismatch for #{markup.inspect}: " \
+ "lax=#{lax_var.name.inspect} strict=#{strict_var.name.inspect}"
+ assert_equal strict_var.filters.length,
+ lax_var.filters.length,
+ "Filter count mismatch for #{markup.inspect}: " \
+ "lax=#{lax_var.filters.inspect} strict=#{strict_var.filters.inspect}"
+ strict_var.filters.each_with_index do |(s_name, *), idx|
+ l_name = lax_var.filters[idx][0]
+ assert_equal s_name,
+ l_name,
+ "Filter name mismatch at index #{idx} for #{markup.inspect}"
+ end
+ end
+ end
+
+ # Verify the fast path is actually taken for simple variables (i.e. filters is the
+ # shared frozen EMPTY_ARRAY, not a newly allocated array).
+ def test_fast_path_taken_for_simple_variable
+ ctx = Liquid::ParseContext.new(error_mode: :lax)
+ var = Liquid::Variable.new("product.title", ctx)
+ assert_same(
+ Liquid::Const::EMPTY_ARRAY,
+ var.filters,
+ "Expected fast path (frozen EMPTY_ARRAY) for simple variable",
+ )
+ end
+
+ def test_fast_path_taken_for_no_arg_filter
+ ctx = Liquid::ParseContext.new(error_mode: :lax)
+ var = Liquid::Variable.new("product | upcase", ctx)
+ assert_equal(1, var.filters.length)
+ assert_equal("upcase", var.filters[0][0])
+ # The no-arg filter tuple should come from NO_ARG_FILTER_CACHE (frozen)
+ assert_predicate(var.filters[0], :frozen?)
+ end
+
+ def test_fast_path_taken_for_single_arg_filter
+ ctx = Liquid::ParseContext.new(error_mode: :lax)
+ var = Liquid::Variable.new("product | truncate: 50", ctx)
+ assert_equal(1, var.filters.length)
+ assert_equal("truncate", var.filters[0][0])
+ assert_equal([50], var.filters[0][1])
+ end
+
+ # Keyword args must fall through to the Lexer — verify the result is still correct.
+ def test_keyword_arg_falls_to_lexer_and_parses_correctly
+ ctx = Liquid::ParseContext.new(error_mode: :lax)
+ var = Liquid::Variable.new("img | img_tag: class: 'hero'", ctx)
+ assert_equal(1, var.filters.length)
+ assert_equal("img_tag", var.filters[0][0])
+ end
+
+ # Numeric filter arguments: integers and floats
+ def test_numeric_filter_args
+ ctx = Liquid::ParseContext.new(error_mode: :lax)
+
+ int_var = Liquid::Variable.new("price | plus: 3", ctx)
+ assert_equal([3], int_var.filters[0][1])
+
+ neg_var = Liquid::Variable.new("price | minus: -1", ctx)
+ assert_equal([-1], neg_var.filters[0][1])
+
+ float_var = Liquid::Variable.new("price | round: 2.5", ctx)
+ assert_equal([2.5], float_var.filters[0][1])
+ end
+end
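Stripped of the Liquid specifics, the equivalence suite above is an instance of differential testing: run identical inputs through the optimized path and a trusted reference path and insist the outputs agree. A generic sketch of the pattern (the `fast:`/`slow:` lambdas are illustrative stand-ins, not Liquid APIs):

```ruby
# Run every input through both implementations and raise on the first
# divergence, reporting the input and both outputs for easy triage.
def assert_paths_agree(inputs, fast:, slow:)
  inputs.each do |input|
    f = fast.call(input)
    s = slow.call(input)
    unless f == s
      raise "divergence on #{input.inspect}: fast=#{f.inspect} slow=#{s.inspect}"
    end
  end
  true
end

assert_paths_agree(
  ["12", "0", "-7"],
  fast: ->(s) { s.to_i },         # optimized path under test
  slow: ->(s) { Integer(s, 10) }, # trusted reference implementation
)
```

Generating one test method per input, as `EQUIVALENCE_CASES.each_with_index` does above, is a refinement of the same idea: each divergence then fails in isolation with its own test name instead of aborting the whole sweep.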