Skip to content

Fix CLFUS RAM cache value metric broken by integer division#13233

Open
phongn wants to merge 3 commits into
apache:masterfrom
phongn:fix-clfus-cache-value-int-division
Open

Fix CLFUS RAM cache value metric broken by integer division#13233
phongn wants to merge 3 commits into
apache:masterfrom
phongn:fix-clfus-cache-value-int-division

Conversation

@phongn

@phongn phongn commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Summary

The CLFUS RAM cache ranks objects by a value density, value = (hits + 1) / (size + overhead), compared against a running double average on every eviction/admission decision. Since #11733 this has been computed with integer division and truncates to 0 for essentially every object, silently disabling CLFUS's frequency/size/recency logic and degrading it to FIFO.

Root cause

#11733 ("Fix C-style cast in Cache subsystem") modernized the macro:

-#define CACHE_VALUE_HITS_SIZE(_h, _s) ((float)((_h) + 1) / ((_s) + ENTRY_OVERHEAD))
+#define CACHE_VALUE_HITS_SIZE(_h, _s) (static_cast<float>(((_h) + 1) / ((_s) + ENTRY_OVERHEAD)))

The original C cast bound only to the numerator, so the division was floating point. The static_cast<float>(...) wraps the whole quotient, so (hits + 1) / (size + overhead) is now integer division. hits is on the order of tens while size + overhead is thousands+, so the result truncates to 0 for any normal object.

With value ≡ 0, _average_value stays 0 and every value-gated branch is dead (0 > 0 is false):

  • no promote-to-MRU on hit (get()),
  • no CLOCK second-chance / requeue (put()),
  • no value-based re-admission from the history (ghost) list.

CLFUS effectively becomes FIFO, which generally performs worse than the simpler LRU it is meant to beat. This has affected every CLFUS deployment since the regression merged on 2024-08-28.

Fix

Bind the cast to the numerator only, restoring floating-point division (the pre-#11733 semantics). In addition, convert to modern C++ constexpr for type-safety:

constexpr double
cache_value_hits_size(const uint64_t hits, const uint32_t size)
{
  return static_cast<double>(hits + 1) / (size + entry_overhead);
}

constexpr double
cache_value(const RamCacheCLFUSEntry *const e)
{
  return cache_value_hits_size(e->hits, e->size);
}

Test

Adds ram_cache_clfus_value, a focused regression test asserting the value metric is a non-zero fraction and is monotonic in hits and in size. It fails on the pre-fix macro and passes after:

  • buggy macro → FAILED ("value metric truncated to zero", etc.)
  • fixed macro → PASSED

The existing ram_cache regression still passes. With the fix, CLFUS's size-aware behavior is restored — in the bundled ram_cache test at the 16 MB size, its variable-size hit rate rises to ~0.82, slightly ahead of LRU (~0.80) on the same workload.

Drive-By Changes

@moonchen suggested that the new regression test could be run at compile time via use of static_assert; to facilitate this change the prior C/C++ preprocessor macros and constants have been converted to modern C++ constexpr constants and functions.

Notes

  • Severity: correctness regression affecting all users of the non-default proxy.config.cache.ram_cache.algorithm = 0 (CLFUS) since Fix C-style cast in Cache subsystem #11733; a backport may be warranted.
  • This restores the intended algorithm. A separate, long-standing limitation — resident entries' hit counts are only aged on the ghost list, never while resident — is not addressed here and is tracked separately.

Fixes a regression introduced in #11733.

PR apache#11733 rewrote the CACHE_VALUE_HITS_SIZE cast so static_cast<float>
wraps the whole quotient, making (hits + 1) / (size + overhead) integer
division. It truncates to 0 for normal object sizes, zeroing the value
metric and collapsing CLFUS to FIFO: no promote-on-hit, no clock second
chance, and no value-based ghost re-admission.

Bind the cast to the numerator to restore floating-point division, and
add the ram_cache_clfus_value regression test as a guard (it fails on
the pre-fix macro and passes after).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a correctness regression in the CLFUS RAM cache eviction/admission “value density” metric by restoring floating-point division in CACHE_VALUE_HITS_SIZE, and adds a focused regression test to prevent the metric from silently collapsing to zero again.

Changes:

  • Fix CACHE_VALUE_HITS_SIZE so the cast applies to the numerator, ensuring floating-point division.
  • Add ram_cache_clfus_value regression test to validate the value metric is non-zero and monotonic with hits and size.

moonchen
moonchen previously approved these changes Jun 18, 2026

@moonchen moonchen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good. One optional suggestion:

in your regression test, you could use static_assert(CACHE_VALUE_HITS_SIZE(1u, 16384u) > 0.0f, ...) and similar statements, so that the failures are caught at compile time without running the test.

@paulo

paulo commented Jun 22, 2026

Copy link
Copy Markdown

A backport may be warranted

A backport would be ideal, I'm seeing the same behavior 🙏

@phongn

phongn commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator Author

Change looks good. One optional suggestion:

in your regression test, you could use static_assert(CACHE_VALUE_HITS_SIZE(1u, 16384u) > 0.0f, ...) and similar statements, so that the failures are caught at compile time without running the test.

I took the liberty of replacing the old C/C++ preprocessor macros and constants with modern constexpr equivalents, in addition to altering the regression test to use static_assert.

@moonchen moonchen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to converting macros to constexpr. Just a couple of artifacts from the conversion.

Comment thread src/iocore/cache/RamCacheCLFUS.cc Outdated
Comment thread src/iocore/cache/RamCacheCLFUS.cc Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants