Add array subset-check benchmark across Ruby 3.3, 3.4, 4.0 #237

Draft
etagwerker wants to merge 1 commit into main from array-subset-check-benchmark
@etagwerker

Member

What

Adds an Array benchmark for subset checks (is every element of a1 also in a2?), comparing five approaches across Ruby 3.3.10, 3.4.7, and 4.0.0:

  • (a1 - a2).empty?
  • (a1 & a2) == a1
  • (a1 & a2).size == a1.size
  • a1.all? { |e| a2.include?(e) }
  • a1.to_set.subset?(a2.to_set)
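
As a minimal sketch of what the five candidates compute, the snippet below evaluates each one on a small pair of arrays. The sample data and the `results` hash are illustrative assumptions, not taken from the benchmark file. It also assumes `a1` has no duplicates: the two `&` variants deduplicate their result, so they would disagree with the others on input such as `[1, 1, 2]`.

```ruby
require "set"

# Hypothetical sample data: a1 is a true subset of a2 and has no duplicates.
a1 = [1, 2, 3]
a2 = [5, 4, 3, 2, 1]

# Each expression answers: is every element of a1 also in a2?
results = {
  "(a1 - a2).empty?"               => (a1 - a2).empty?,
  "(a1 & a2) == a1"                => (a1 & a2) == a1,
  "(a1 & a2).size == a1.size"      => (a1 & a2).size == a1.size,
  "a1.all? { |e| a2.include?(e) }" => a1.all? { |e| a2.include?(e) },
  "a1.to_set.subset?(a2.to_set)"   => a1.to_set.subset?(a2.to_set)
}

results.each { |label, value| puts "#{label}: #{value}" }
```

`Array#&` keeps elements in the order they appear in `a1`, which is why `(a1 & a2) == a1` holds here even though `a2` is ordered differently.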

Background

This revisits the comparison from #125 by @gabteles (now closed). That PR had two problems the reviewers (@mblumtritt, @Arcovion) hinted at back in 2017:

  1. A correctness bug: Set#subset? had its arguments reversed (a2.to_set.subset?(a1.to_set)), so it returned false while every other method returned true. It wasn't measuring the same operation.
  2. No stable winner: the result is highly data-dependent.
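
The reversed-argument bug in item 1 can be reproduced in a few lines. This is only an illustration with made-up arrays, not the code from #125:

```ruby
require "set"

a1 = [1, 2]        # candidate subset (hypothetical data)
a2 = [1, 2, 3, 4]  # candidate superset (hypothetical data)

# Correct orientation: asks whether a1 is contained in a2.
correct = a1.to_set.subset?(a2.to_set)

# Reversed orientation: asks whether a2 is contained in a1, a different question.
reversed = a2.to_set.subset?(a1.to_set)

puts "correct:  #{correct}"
puts "reversed: #{reversed}"
```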

This version fixes the Set arguments, adds an equivalence guard so all five approaches must agree before the benchmark runs, and reports results across three modern Ruby versions.
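
One way such an equivalence guard could look is sketched below. The names `APPROACHES` and `verify_equivalence!` are hypothetical, and the guard in the actual benchmark file may be written differently.

```ruby
require "set"

# Hypothetical registry of the five approaches; names are illustrative only.
APPROACHES = {
  "difference"        => ->(a1, a2) { (a1 - a2).empty? },
  "intersection =="   => ->(a1, a2) { (a1 & a2) == a1 },
  "intersection size" => ->(a1, a2) { (a1 & a2).size == a1.size },
  "all? + include?"   => ->(a1, a2) { a1.all? { |e| a2.include?(e) } },
  "Set#subset?"       => ->(a1, a2) { a1.to_set.subset?(a2.to_set) }
}.freeze

# Raises unless every approach returns the same answer for the given input;
# otherwise returns that shared answer.
def verify_equivalence!(a1, a2)
  results = APPROACHES.transform_values { |fn| fn.call(a1, a2) }
  return results.values.first if results.values.uniq.size == 1

  raise "Approaches disagree: #{results.inspect}"
end

a1 = (1..50).to_a
a2 = (1..100).to_a.shuffle

expected = verify_equivalence!(a1, a2)
puts "all approaches agree: #{expected}"
```

Running the guard before the timed section means a mistake like the reversed `subset?` call stops the script instead of silently producing numbers for a different operation.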

Findings

  • (a1 - a2).empty? is the consistent winner across 3.3, 3.4, and 4.0 for the common case where a1 really is a subset.
  • a1.all? { |e| a2.include?(e) } is data-dependent: it short-circuits on the first miss, so it wins when a1 is not a subset, but it is O(n*m) (each include? call scans a2 linearly) and degrades badly on large true subsets. The README entry documents this caveat.
  • Set#subset? improved dramatically in Ruby 4.0: it is ~6.8x slower than the fastest approach on 3.3/3.4 (dominated by to_set allocation) but only ~1.7x slower on 4.0. If you already hold Sets or run the check repeatedly, it scales best.
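
The data dependence of the `all?` approach can be made visible by counting how many linear scans of `a2` it performs. The helper `scans_needed` and the sample arrays are hypothetical; this is a sketch of the short-circuit behaviour, not a timing measurement.

```ruby
# Counts how many times the all? block runs. Each run performs one
# Array#include? call, which is a linear scan of a2.
def scans_needed(a1, a2)
  scans = 0
  result = a1.all? do |e|
    scans += 1
    a2.include?(e)
  end
  [result, scans]
end

a2          = (1..1_000).to_a
true_subset = (1..100).to_a         # every element is present in a2
early_miss  = [0] + (1..99).to_a    # first element is absent from a2

subset_result, subset_scans = scans_needed(true_subset, a2)
miss_result, miss_scans     = scans_needed(early_miss, a2)

puts "true subset: result=#{subset_result}, scans=#{subset_scans}"
puts "early miss:  result=#{miss_result}, scans=#{miss_scans}"
```

For the true subset every element of `a1` triggers a scan, while the early miss stops after the first one, which matches the pattern described above.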

Notes

  • Benchmarks were run via rbenv on each version (benchmark-ips installed per version).
  • The README block shows the full output for 3.3.10 and the Comparison: summary for 3.4.7 and 4.0.0 to keep it readable.

🤖 Generated with Claude Code

Revisits the comparison from the closed #125 (gabteles), fixing the
reversed Set#subset? arguments so every approach returns the same
result (guarded by an equivalence check), and benchmarks across modern
Ruby versions.

Findings:
- (a1 - a2).empty? is the consistent winner for true-subset inputs.
- a1.all? { include? } only wins when a1 is NOT a subset (short-circuits)
  and is O(n*m) on large true subsets.
- Set#subset? (incl. to_set) went from ~6.8x slower on 3.3/3.4 to ~1.7x
  slower on 4.0, where Set got much faster.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>