feat: add Clock2QPlus eviction algorithm#314
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements the Clock2QPlus eviction algorithm, adding the necessary logic for initialization, insertion, and eviction across a multi-queue structure. The feedback identifies several critical issues: a potential use-after-free vulnerability when removing objects from the main cache, a possible crash during parameter parsing due to unchecked string splitting, and an integer overflow risk in the correlation window logic. Other recommendations include correctly accounting for metadata overhead in cache size calculations, removing unused configuration fields, and adjusting the insertion check to allow objects larger than the FIFO queue.
| char *key = strsep((char **)¶ms_str, "="); | ||
| char *value = strsep((char **)¶ms_str, ","); |
There was a problem hiding this comment.
The value pointer returned by strsep can be NULL if the = delimiter is missing in the parameter string. Subsequent calls to strtod or atoi with a NULL argument will cause a crash. It is safer to validate that both key and value are present.
char *key = strsep((char **)¶ms_str, "=");
char *value = strsep((char **)¶ms_str, ",");
if (key == NULL || value == NULL) break;| bool removed = main->remove(main, obj_to_evict->obj_id); | ||
| if (!removed) { | ||
| ERROR("cannot remove obj %ld\n", obj_to_evict->obj_id); | ||
| } |
There was a problem hiding this comment.
Accessing obj_to_evict->obj_id after calling main->remove is potentially a use-after-free (UAF) vulnerability if the remove implementation frees the object. It is safer to store the obj_id in a local variable before performing the removal.
obj_id_t obj_id = obj_to_evict->obj_id;
bool removed = main->remove(main, obj_id);
if (!removed) {
ERROR("cannot remove obj %ld\n", obj_id);
}| cache->get_occupied_byte = Clock2QPlus_get_occupied_byte; | ||
| cache->can_insert = Clock2QPlus_can_insert; | ||
|
|
||
| cache->obj_md_size = 0; |
There was a problem hiding this comment.
Setting obj_md_size to 0 means the metadata overhead (Clock2QPlus_obj_metadata_t) is not accounted for in the cache's occupied bytes. This leads to inaccurate cache size simulations. It should be set to the size of the algorithm-specific metadata.
cache->obj_md_size = sizeof(Clock2QPlus_obj_metadata_t);| int time_since_insertion = | ||
| params->n_obj_admit_to_fifo - obj->Clock2QPlus.insertion_time; |
There was a problem hiding this comment.
time_since_insertion is declared as an int, but it is calculated from int64_t counters. In long-running simulations where the number of object admissions exceeds 2^31, this will overflow and break the correlation window logic.
int64_t time_since_insertion =
params->n_obj_admit_to_fifo - obj->Clock2QPlus.insertion_time;| int64_t corr_window_size_upper_bound; | ||
| int64_t corr_window_size_lower_bound; |
| const request_t *req) { | ||
| Clock2QPlus_params_t *params = (Clock2QPlus_params_t *)cache->eviction_params; | ||
|
|
||
| return req->obj_size <= params->fifo->cache_size; |
There was a problem hiding this comment.
The can_insert check is currently restricted to the size of the small FIFO. This prevents objects that are larger than the FIFO but smaller than the total cache from being admitted, even if they could be placed directly into the main cache (e.g., during warmup or on a ghost hit).
return req->obj_size <= cache->cache_size;|
Can you add some results? |
Do we want to add results to the repo or show results in this PR? |
We may need to add some test results, for example, these existing ones. |
- comment corr_window_size_upper/lower_bound fields - fix time_since_insertion type: int -> int64_t - save obj_id before remove() to avoid potential UAF
| params->n_obj_admit_to_fifo - obj->Clock2QPlus.insertion_time; | ||
| if (time_since_insertion >= params->corr_window_size) { | ||
| obj->Clock2QPlus.freq += 1; | ||
| } |
There was a problem hiding this comment.
Does Clock2Q+ work under ignore-obj-size only? If so, we may need to add a clarification in comments.
There was a problem hiding this comment.
comment added, thanks for pointing out
Just add to PR |
📝 WalkthroughWalkthroughA new cache eviction algorithm ChangesClock2QPlus Algorithm Addition
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libCacheSim/cache/eviction/Clock2QPlus.c`:
- Around line 111-119: The code computes fifo_cache_size/main_cache_size and
sets corr_window_* using parsed params without validating numeric inputs or
checking insert return values; update parsing call sites (where strtod/atoi are
used) to validate conversion (check errno, endptr, and numeric ranges) and
clamp/return on invalid values, ensure fifo_cache_size and main_cache_size are
>= 1 and that ghost sizes and window bounds are within sensible ranges before
using them in Clock2QPlus.c (references: ccache_params, params->fifo_size_ratio,
params->ghost_size_ratio, params->corr_window_ratio,
params->corr_window_size_upper_bound/lower_bound), and add NULL checks for
allocation/insert results (e.g., verify new_obj returned from insert operations
before accessing its fields at the insert sites referenced around the new_obj
usages) with proper error handling/logging and early exit to avoid dereference.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5cfc7988-3194-4ca8-a668-fd112a419ed8
📒 Files selected for processing (7)
libCacheSim/bin/cachesim/cache_init.hlibCacheSim/cache/CMakeLists.txtlibCacheSim/cache/eviction/Clock2QPlus.clibCacheSim/include/libCacheSim/cacheObj.hlibCacheSim/include/libCacheSim/evictionAlgo.htest/common.htest/test_evictionAlgo.c
| int64_t fifo_cache_size = | ||
| (int64_t)ccache_params.cache_size * params->fifo_size_ratio; | ||
| int64_t main_cache_size = ccache_params.cache_size - fifo_cache_size; | ||
| int64_t fifo_ghost_cache_size = | ||
| (int64_t)(ccache_params.cache_size * params->ghost_size_ratio); | ||
| params->corr_window_size_upper_bound = (int64_t)(fifo_cache_size * 0.5); | ||
| params->corr_window_size_lower_bound = (int64_t)(fifo_cache_size * 0.1); | ||
| params->corr_window_size = | ||
| (double)(fifo_cache_size * params->corr_window_ratio); |
There was a problem hiding this comment.
Validate parsed parameters before sizing sub-caches and using insert results.
At Line 499–Line 506, non-numeric or out-of-range values are accepted (strtod/atoi without end/range checks). That can produce invalid fifo_cache_size/main_cache_size at Line 111–Line 115 and lead to NULL dereference at Line 345 or Line 377 (new_obj->...) if insert fails.
Proposed hardening patch
static void Clock2QPlus_parse_params(cache_t *cache,
const char *cache_specific_params) {
@@
- if (strcasecmp(key, "fifo-size-ratio") == 0) {
- params->fifo_size_ratio = strtod(value, NULL);
- } else if (strcasecmp(key, "ghost-size-ratio") == 0) {
- params->ghost_size_ratio = strtod(value, NULL);
- } else if (strcasecmp(key, "corr-window-ratio") == 0) {
- params->corr_window_ratio = strtod(value, NULL);
- } else if (strcasecmp(key, "move-to-main-threshold") == 0) {
- params->move_to_main_threshold = atoi(value);
+ if (strcasecmp(key, "fifo-size-ratio") == 0) {
+ char *end = NULL;
+ double v = strtod(value, &end);
+ if (end == value || *end != '\0' || v <= 0.0 || v >= 1.0) {
+ ERROR("invalid fifo-size-ratio: %s\n", value);
+ exit(1);
+ }
+ params->fifo_size_ratio = v;
+ } else if (strcasecmp(key, "ghost-size-ratio") == 0) {
+ char *end = NULL;
+ double v = strtod(value, &end);
+ if (end == value || *end != '\0' || v < 0.0 || v > 1.0) {
+ ERROR("invalid ghost-size-ratio: %s\n", value);
+ exit(1);
+ }
+ params->ghost_size_ratio = v;
+ } else if (strcasecmp(key, "corr-window-ratio") == 0) {
+ char *end = NULL;
+ double v = strtod(value, &end);
+ if (end == value || *end != '\0' || v < 0.0 || v > 1.0) {
+ ERROR("invalid corr-window-ratio: %s\n", value);
+ exit(1);
+ }
+ params->corr_window_ratio = v;
+ } else if (strcasecmp(key, "move-to-main-threshold") == 0) {
+ char *end = NULL;
+ long v = strtol(value, &end, 10);
+ if (end == value || *end != '\0' || v < 0) {
+ ERROR("invalid move-to-main-threshold: %s\n", value);
+ exit(1);
+ }
+ params->move_to_main_threshold = (int)v;
@@
} int64_t fifo_cache_size =
(int64_t)ccache_params.cache_size * params->fifo_size_ratio;
int64_t main_cache_size = ccache_params.cache_size - fifo_cache_size;
+ if (fifo_cache_size <= 0 || main_cache_size <= 0) {
+ ERROR("invalid cache split: fifo=%" PRId64 ", main=%" PRId64 "\n",
+ fifo_cache_size, main_cache_size);
+ exit(1);
+ } cache_obj_t *new_obj = main->insert(main, params->req_local);
+ if (new_obj == NULL) {
+ ERROR("failed to move object %" PRIu64 " into main cache\n",
+ params->req_local->obj_id);
+ exit(1);
+ }
new_obj->freq = obj_to_evict->freq;
@@
cache_obj_t *new_obj = main->insert(main, params->req_local);
+ if (new_obj == NULL) {
+ ERROR("failed to reinsert object %" PRIu64 " into main cache\n",
+ params->req_local->obj_id);
+ exit(1);
+ }
// 1-bit counter
new_obj->Clock2QPlus.freq = 0;Also applies to: 345-347, 377-380, 475-517
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libCacheSim/cache/eviction/Clock2QPlus.c` around lines 111 - 119, The code
computes fifo_cache_size/main_cache_size and sets corr_window_* using parsed
params without validating numeric inputs or checking insert return values;
update parsing call sites (where strtod/atoi are used) to validate conversion
(check errno, endptr, and numeric ranges) and clamp/return on invalid values,
ensure fifo_cache_size and main_cache_size are >= 1 and that ghost sizes and
window bounds are within sensible ranges before using them in Clock2QPlus.c
(references: ccache_params, params->fifo_size_ratio, params->ghost_size_ratio,
params->corr_window_ratio, params->corr_window_size_upper_bound/lower_bound),
and add NULL checks for allocation/insert results (e.g., verify new_obj returned
from insert operations before accessing its fields at the insert sites
referenced around the new_obj usages) with proper error handling/logging and
early exit to avoid dereference.
Got it, I updated the PR message |
Summary
Adds Clock2QPlus, which is designed for metadata-cache workloads with correlated references. S3-FIFO sets the Small FIFO reference bit after a short-term re-reference, while Clock2Q+ suppresses reference-bit updates inside a correlation window near the head of the Small FIFO. This prevents bursty metadata accesses from being promoted into the Main Clock too early.
Behavior
New objects enter a small FIFO; those hit outside the correlation window are promoted to a main queue on eviction. A ghost cache admits repeatedly-seen objects directly into the main queue.
Changes
cache/eviction/Clock2QPlus.c: algorithm implementationinclude/libCacheSim/cacheObj.h: addClock2QPlus_obj_metadata_tand union memberinclude/libCacheSim/evictionAlgo.h: declareClock2QPlus_initcache/CMakeLists.txt: register source filebin/cachesim/cache_init.h: add"clock2qplus"entryParameters
fifo-size-ratioghost-size-ratiomove-to-main-thresholdcorr-window-ratioResults
The paper (Clock2Q+: A Simple and Efficient Replacement Algorithm for Metadata Cache in VMware vSAN) reports that Clock2Q+ consistently improves miss ratio on metadata-style traces and achieves the best miss-ratio improvement among the evaluated policies. On derived metadata traces, Clock2Q+ outperforms S3-FIFO 2-bit by up to 28.5% in miss ratio at the larger cache-size setting. It also improves over Clock, ARC, and S3-FIFO, etc across cache sizes on both derived metadata traces and original data traces.
Figure 1: miss_ratio_improvement_data.pdf
Figure 2: miss_ratio_improvement_metadata.pdf
Figure 1 & 2: Comparing Clock2Q+ with state-of-the-art eviction algorithms on data traces and metadata traces. Clock2Q+ shows strong improvement over the tested algorithms at both cache-size ratios.
Figure 3: CloudPhysics_miss_ratio_curves.pdf
Figure 3: Miss-ratio curves for CloudPhysics w56. Lower curves indicate better performance (fewer misses) at a given cache size. Clock2Q+ consistently has the lowest miss ratio across the tested cache sizes.