diff --git a/README.md b/README.md
index cecf3ce..0b72dae 100644
--- a/README.md
+++ b/README.md
@@ -492,6 +492,61 @@ Comparison:
 Array#sort_by &:-@: 229323.6 i/s - 2.44x slower
 ```
 
+##### Subset check: `(a1 - a2).empty?` vs alternatives [code](code/array/subset-check.rb)
+
+> To check whether every element of `a1` is also in `a2`, `(a1 - a2).empty?` is
+> consistently the fastest across modern Ruby versions for the common case where
+> `a1` is actually a subset.
+> **Caveat:** the winner is highly data-dependent. `a1.all? { |e| a2.include?(e) }`
+> short-circuits on the first miss, so it wins when `a1` is *not* a subset, but it
+> is O(n*m) and degrades badly on large true subsets.
+> Note also that `Set#subset?` (including the `to_set` conversion) was ~6.8x slower
+> on Ruby 3.3/3.4 but only ~1.7x slower on Ruby 4.0, where `Set` became much
+> faster. If you already hold `Set`s (or check repeatedly), `Set#subset?` scales
+> best.
+
+```
+$ ruby -v code/array/subset-check.rb
+ruby 3.3.10 (2025-10-23 revision 343ea05002) [arm64-darwin25]
+Warming up --------------------------------------
+    (a1 - a2).empty? 86.499k i/100ms
+     (a1 & a2) == a1 73.860k i/100ms
+ (a1 & a2).size == n 74.102k i/100ms
+a1.all? { include? } 67.703k i/100ms
+   a1.to_set.subset? 12.068k i/100ms
+Calculating -------------------------------------
+    (a1 - a2).empty? 849.546k (± 2.7%) i/s (1.18 μs/i) - 4.325M in 5.090893s
+     (a1 & a2) == a1 746.103k (± 2.0%) i/s (1.34 μs/i) - 3.767M in 5.048711s
+ (a1 & a2).size == n 780.447k (± 1.9%) i/s (1.28 μs/i) - 3.927M in 5.032254s
+a1.all? { include? } 715.301k (± 2.4%) i/s (1.40 μs/i) - 3.588M in 5.016435s
+   a1.to_set.subset? 124.189k (± 0.8%) i/s (8.05 μs/i) - 627.536k in 5.053052s
+
+Comparison:
+    (a1 - a2).empty?: 849546.4 i/s
+ (a1 & a2).size == n: 780446.7 i/s - 1.09x slower
+     (a1 & a2) == a1: 746103.3 i/s - 1.14x slower
+a1.all? { include? }: 715300.6 i/s - 1.19x slower
+   a1.to_set.subset?: 124189.5 i/s - 6.84x slower
+
+$ ruby -v code/array/subset-check.rb
+ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [arm64-darwin25]
+Comparison:
+    (a1 - a2).empty?: 807172.5 i/s
+ (a1 & a2).size == n: 719880.2 i/s - 1.12x slower
+a1.all? { include? }: 713213.0 i/s - 1.13x slower
+     (a1 & a2) == a1: 687629.1 i/s - 1.17x slower
+   a1.to_set.subset?: 119834.6 i/s - 6.74x slower
+
+$ ruby -v code/array/subset-check.rb
+ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
+Comparison:
+    (a1 - a2).empty?: 784706.1 i/s
+a1.all? { include? }: 727520.9 i/s - 1.08x slower
+ (a1 & a2).size == n: 701969.3 i/s - 1.12x slower
+     (a1 & a2) == a1: 670252.7 i/s - 1.17x slower
+   a1.to_set.subset?: 467206.3 i/s - 1.68x slower
+```
+
 ### Enumerable
 
 ##### `Enumerable#each + push` vs `Enumerable#map` [code](code/enumerable/each-push-vs-map.rb)
diff --git a/code/array/subset-check.rb b/code/array/subset-check.rb
new file mode 100644
index 0000000..d3efa1f
--- /dev/null
+++ b/code/array/subset-check.rb
@@ -0,0 +1,42 @@
+require "benchmark/ips"
+require "set"
+
+# Check whether ARRAY1 is a subset of ARRAY2 (every element of ARRAY1 is in
+# ARRAY2). The fastest approach is highly dependent on the input: `all?` +
+# `include?` short-circuits on the first miss (great for non-subsets) but is
+# O(n*m) when ARRAY1 really is a subset, while Set-based lookups stay O(n).
+ARRAY1 = [*1..25]
+ARRAY2 = [*1..100]
+
+def minus_empty
+  (ARRAY1 - ARRAY2).empty?
+end
+
+def intersection_equal
+  (ARRAY1 & ARRAY2) == ARRAY1
+end
+
+def intersection_size
+  (ARRAY1 & ARRAY2).size == ARRAY1.size
+end
+
+def all_include
+  ARRAY1.all? { |element| ARRAY2.include?(element) }
+end
+
+def set_subset
+  ARRAY1.to_set.subset?(ARRAY2.to_set)
+end
+
+# Sanity check: every approach must return the same answer.
+results = [minus_empty, intersection_equal, intersection_size, all_include, set_subset]
+raise "not equivalent: #{results.inspect}" unless results.uniq.size == 1
+
+Benchmark.ips do |x|
+  x.report("(a1 - a2).empty?") { minus_empty }
+  x.report("(a1 & a2) == a1") { intersection_equal }
+  x.report("(a1 & a2).size == n") { intersection_size }
+  x.report("a1.all? { include? }") { all_include }
+  x.report("a1.to_set.subset?") { set_subset }
+  x.compare!
+end
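
The README note says `Set#subset?` scales best when a `Set` is already held or the check is repeated, but the benchmark above pays for both `to_set` conversions on every call. A minimal sketch of the reuse case follows, assuming the larger collection can be converted once and kept. The names `allowed`, `subset_via_set?`, and `subset_via_lookup?` are hypothetical and not part of this change, and neither helper has been benchmarked here.

```ruby
require "set"

# Hypothetical setup: convert the larger collection once, outside the hot path.
allowed = [*1..100].to_set

# Hypothetical helper: only the candidates are converted per call.
# Set#subset? requires its argument to be a Set.
def subset_via_set?(candidates, allowed_set)
  candidates.to_set.subset?(allowed_set)
end

# Hypothetical helper: no per-call conversion at all. Set#include? is a hash
# lookup, so this keeps the short-circuit of `all?` without the linear scan
# that Array#include? performs for each element.
def subset_via_lookup?(candidates, allowed_set)
  candidates.all? { |element| allowed_set.include?(element) }
end

subset_via_set?([*1..25], allowed)       # => true
subset_via_set?([1, 2, 101], allowed)    # => false
subset_via_lookup?([*1..25], allowed)    # => true
subset_via_lookup?([1, 2, 101], allowed) # => false
```

Which helper is faster will depend on the input in the same way the README caveat describes, so it is worth measuring both against real data before picking one.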