
Antalya 26.1 - Forward port of list objects cache #1040 (#1405)

Merged
zvonand merged 15 commits into antalya-26.1 from fp_antalya_26_1_list_objects_cache
Mar 4, 2026
Conversation

@arthurpassos
Collaborator

@arthurpassos arthurpassos commented Feb 14, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Forward port of list objects cache #1040

Documentation entry for user-facing changes

Cache for ListObjects calls

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)


Antalya 25.8 - Forward port of #805 List objects cache
@arthurpassos arthurpassos added port-antalya PRs to be ported to all new Antalya releases port-forward Needs to be ported to every future minor release of this major version antalya-26.1 labels Feb 14, 2026
@github-actions

github-actions bot commented Feb 14, 2026

Workflow [PR], commit [68ba3dd]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5bb0d4833c


```cpp
if (local_context->getSettingsRef()[Setting::use_object_storage_list_objects_cache] && object_storage->supportsListObjectsCache())
{
    auto & cache = ObjectStorageListObjectsCache::instance();
    ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix())};
```


P1: Scope list-objects cache entries to auth identity

StorageObjectStorageSource::createFileIterator builds the cache key from only description/namespace/prefix, so entries are shared across sessions that use different object-store credentials for the same endpoint+bucket+prefix. In S3, getDescription() is endpoint-based, so a query can reuse another credential's cached object list, which can surface object names and trigger read failures on objects the current credentials cannot access. Include credential/user identity in the key (or explicitly disable cache for per-query credentials) to avoid cross-credential cache pollution.
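A minimal sketch of the suggested fix, assuming a fingerprint of the effective credentials is available at key-construction time. The `credentials_fingerprint` field and all names here are illustrative, not the shipped code:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical credential-scoped variant of the list-objects cache key.
// Adding the fingerprint to both equality and the hash prevents one
// principal's listing from being served to another.
struct ListObjectsCacheKey
{
    std::string storage_description;      // e.g. the S3 endpoint
    std::string bucket;
    std::string prefix;
    bool with_tags = false;
    std::string credentials_fingerprint;  // hash of effective credentials (assumed field)

    bool operator==(const ListObjectsCacheKey & other) const
    {
        return storage_description == other.storage_description
            && bucket == other.bucket
            && prefix == other.prefix
            && with_tags == other.with_tags
            && credentials_fingerprint == other.credentials_fingerprint;
    }
};

struct ListObjectsCacheKeyHasher
{
    // Same mixing recipe as boost::hash_combine.
    static void combine(std::size_t & seed, std::size_t h)
    {
        seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    std::size_t operator()(const ListObjectsCacheKey & key) const
    {
        std::size_t seed = 0;
        std::hash<std::string> hs;
        combine(seed, hs(key.storage_description));
        combine(seed, hs(key.bucket));
        combine(seed, hs(key.prefix));
        combine(seed, std::hash<bool>{}(key.with_tags));
        combine(seed, hs(key.credentials_fingerprint));
        return seed;
    }
};
```

With this shape, two sessions hitting the same endpoint+bucket+prefix with different credentials produce distinct keys and cannot share entries.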


Collaborator Author


IIRC, we did discuss this and came to the conclusion it was ok.

```cpp
auto & cache = ObjectStorageListObjectsCache::instance();
ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix())};

if (auto objects_info = cache.get(cache_key, /*filter_by_prefix=*/ false))
```


P1: Differentiate list cache entries by tag-fetch mode

The cache key does not include with_tags, so a cached listing produced with with_tags=false can be reused for a later _tags query that needs tag metadata. On a cache hit, createReader skips metadata refetch because each ObjectInfo already has metadata, so _tags can be silently empty/wrong even though the query requested tags. Add with_tags (or an equivalent metadata-completeness flag) to the cache key.


Collaborator Author


Fixed it in d0f05e4

```cpp
const std::shared_ptr<Value> & value)
{
    auto key_with_ttl = key;
    key_with_ttl.expires_at = std::chrono::steady_clock::now() + std::chrono::seconds(ttl_in_seconds);
```


P2: Honor zero TTL as non-expiring list cache entries

The server setting says TTL 0 means unlimited, but ObjectStorageListObjectsCache::set always writes expires_at = now + seconds(ttl_in_seconds). With ttl_in_seconds == 0, entries expire immediately and are treated as stale on subsequent lookups, effectively disabling caching instead of making entries non-expiring. Handle 0 as a special case (e.g., max time point) to match the documented setting behavior.
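The suggested special case could look like the following; `computeExpiry` is a hypothetical helper, not the actual API:

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// TTL of 0 is documented as "unlimited", so map it to the farthest
// representable time point instead of "expired immediately".
Clock::time_point computeExpiry(unsigned ttl_in_seconds)
{
    if (ttl_in_seconds == 0)
        return Clock::time_point::max();  // non-expiring entry
    return Clock::now() + std::chrono::seconds(ttl_in_seconds);
}
```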


Collaborator Author


Fixed - I am ashamed, thanks machine

@zvonand zvonand merged commit 17917fe into antalya-26.1 Mar 4, 2026
249 of 265 checks passed
@CarlosFelipeOR
Collaborator

QA Verification

PR Summary

Forward port of the ListObjects cache feature to antalya-26.1. Adds ObjectStorageListObjectsCache to cache S3/Azure ListObjects calls, reducing redundant requests during glob-based queries.

Tests Added by the PR

| Test | Type | Status |
| --- | --- | --- |
| gtest_object_storage_list_objects_cache | Unit test (10 cases) | PASSED |
| 03377_object_storage_list_objects_cache.sql | Stateless end-to-end | PASSED |

All builds (12), fuzzers (10), and the majority of integration/stateless/regression suites passed.

Integration and Stateless Tests — Unrelated Failures

1. Integration tests (arm_binary, distributed plan, 3/4)

Status: CANCELLED / ERROR
Root cause: ARM CI infrastructure timeout issue, being fixed by PR #1466. Affects all antalya-26.1 PRs.
Related to PR 1405: No

2. Export tests (126 failures in targeted + multiple jobs)

Tests:
test_export_merge_tree_part_to_object_storage, test_export_replicated_mt_partition_to_object_storage, 03572_export_merge_tree_part_limits_and_table_functions
Root cause: Introduced by PR #1388 (export part/partition forward port). Non-idempotent tests under --count 10 and ASAN timeouts. Tracked in clickhouse-regression#112. Affects all antalya-26.1 PRs since Mar 4.
Related to PR 1405: No

3. test_ttl_replicated/test.py::test_ttl_compatibility (6 failures across attempts)

Job: Integration tests (amd_asan, targeted)
Root cause: Known flaky timing-sensitive test. Fails daily upstream on ClickHouse/ClickHouse. On antalya-26.1, fails across PRs #1405, #1433, #1390, #1385. The SYSTEM SYNC REPLICA times out under ASAN overhead. High failure rate in targeted job is due to --count 10 amplification.
Related to PR 1405: No

4. test_restore_db_replica/test.py::test_query_after_restore_db_replica[alter table-with exists table-no restart]

Job: Integration tests (amd_tsan, 2/6)
Root cause: TSAN-specific flaky test. Also fails on PR #1390 (amd_tsan). Fails upstream (PRs ClickHouse#96051, ClickHouse#96605). Only 5 FAIL out of 460+ runs on antalya-26.1.
Related to PR 1405: No

5. Transient failures (failed in earlier attempts, passed on rerun)

All tests below failed in earlier attempts but passed on the final attempt. All are known flaky upstream:

| Test | Job (earlier attempt) | Upstream fails (30d) | Other antalya PRs |
| --- | --- | --- | --- |
| test_postgresql_replica_database_engine/test_3.py (18 tests) | amd_tsan 2/6 | 5-16 per test | -- |
| test_database_glue/test.py (13 tests) | amd_asan 2/6 | -- | PR #1428 |
| test_kafka_bad_messages/test.py (4 tests) | amd_binary 4/5 | 71-72 per test | -- |
| test_disk_over_web_server/test.py (7 tests) | amd_binary 1/5 | -- | PRs #1414, #1390, #1375, master |
| test_storage_s3_queue/test_3.py (9 tests) | amd_tsan 4/6 | 6-18 per test | -- |
| test_storage_delta/test.py::test_network_activity_with_system_tables | amd_asan targeted, others | -- | All antalya PRs (PR #1432 regression, fixed by #1474) |
| test_backup_to_s3_different_credentials | amd_binary 3/5 | -- | PR #1390 |
| 00167_read_bytes_from_fs | amd_ubsan seq | 4 (master only) | -- |
| ThreadSanitizer: data race / Scraping system tables | amd_tsan s3 seq | -- | PRs #1414, #1390, #1395, #1466, master + 5 more |

Related to PR 1405: No

Regression Tests (Altinity/clickhouse-regression)

All failures are unrelated and pre-existing:

| Suite | x86 | arm64 | Reason |
| --- | --- | --- | --- |
| iceberg_1, iceberg_2 | FAIL | FAIL | Antalya features still being forward-ported to 26.1 |
| parquet | FAIL | FAIL | Antalya features still being forward-ported to 26.1 |
| swarms | FAIL | FAIL | Antalya features still being forward-ported to 26.1 |
| settings | FAIL | FAIL | Not updated for 26.1 yet |

Conclusion

All CI failures are justified and unrelated to PR #1405. The PR's own tests (gtest + stateless SQL) pass. All "Checks New Fails" are either known flaky tests or pre-existing CI infrastructure issues unrelated to this change.

@CarlosFelipeOR CarlosFelipeOR added the verified Verified by QA label Mar 5, 2026
@CarlosFelipeOR
Collaborator

PR #1405 Audit Review

AI review comment (model: gpt-5.3-codex)

1) Scope and partitions

PR reviewed: Altinity/ClickHouse#1405

Functional partitioning applied:

  • Partition A (core feature path): global list-objects cache and policy
    src/Storages/Cache/ObjectStorageListObjectsCache.*, src/Common/TTLCachePolicy.h
  • Partition B (integration into read path): object-storage source iterator/cache hookup
    src/Storages/ObjectStorage/StorageObjectStorageSource.*, object-storage interfaces/implementations
  • Partition C (control plane): settings, server bootstrap, system command, access rights, events
    programs/server/Server.cpp, src/Core/{ServerSettings,Settings,SettingsChangesHistory}.cpp, src/Interpreters/InterpreterSystemQuery.cpp, src/Parsers/ASTSystemQuery.*, src/Access/Common/AccessType.h, src/Common/ProfileEvents.cpp
  • Partition D (tests): gtest + stateless query tests
    src/Storages/Cache/tests/gtest_object_storage_list_objects_cache.cpp, tests/queries/0_stateless/03377_*

2) Call graph

Primary changed runtime flow:

  • query read path -> StorageObjectStorageSource::createFileIterator(...)
  • cache eligibility check (use_object_storage_list_objects_cache + supportsListObjectsCache)
  • cache lookup via singleton -> ObjectStorageListObjectsCache::instance().get(...)
  • cache miss -> object_storage->iterate(...) via GlobIterator
  • iterator completion -> list_cache->set(...) -> ObjectStorageListObjectsCache::set(...)

Control-plane flow:

  • server start -> sets cache limits/ttl in singleton
  • manual invalidation -> SYSTEM DROP OBJECT STORAGE LIST OBJECTS CACHE -> ObjectStorageListObjectsCache::clear()
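The two flows above can be sketched as a toy process-wide singleton; the real class is ObjectStorageListObjectsCache, and everything else here (names, string keys, the map-based storage) is illustrative:

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model of the control-plane shape: queries read through one
// process-wide instance, and SYSTEM DROP OBJECT STORAGE LIST OBJECTS CACHE
// maps to clear().
class ListObjectsCacheSketch
{
public:
    static ListObjectsCacheSketch & instance()
    {
        static ListObjectsCacheSketch cache;  // one instance per server process
        return cache;
    }

    void set(const std::string & key, std::vector<std::string> objects)
    {
        std::lock_guard lock(mutex);
        entries[key] = std::move(objects);
    }

    std::optional<std::vector<std::string>> get(const std::string & key) const
    {
        std::lock_guard lock(mutex);
        auto it = entries.find(key);
        if (it == entries.end())
            return std::nullopt;  // miss: caller falls back to backend listing
        return it->second;
    }

    void clear()  // global invalidation across all entries
    {
        std::lock_guard lock(mutex);
        entries.clear();
    }

private:
    mutable std::mutex mutex;
    std::map<std::string, std::vector<std::string>> entries;
};
```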

3) Transition matrix

| ID | Transition | Invariant |
| --- | --- | --- |
| T1 | request context -> cache key construction | key must preserve authorization boundary |
| T2 | cache hit path -> skip backend list call | only safe if key identity is security-equivalent |
| T3 | iterator completion -> cache population | stored object list must be valid for future callers sharing key |
| T4 | system drop cache -> global invalidation | control-plane op must clear all tenant-visible entries |
| T5 | observability profile events | event semantics must match counters |

4) Logical code-path testing summary

Reviewed branches:

  • cache disabled -> normal list path unchanged.
  • cache enabled + miss -> backend iterate + cache populate.
  • cache enabled + exact hit -> list bypass, local iterator from cached list.
  • cache enabled + prefix hit -> fallback search for best shorter prefix.
  • stale key path -> stale removal + continued prefix search.
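The prefix-hit fallback described above can be sketched as a probe over progressively shorter cached prefixes; a plain std::map stands in for the real cache, and all names are illustrative:

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// On a miss for the exact prefix, retry with each shorter parent prefix.
// A hit on a shorter prefix is still usable because its listing is a
// superset; the caller must then filter results by the full prefix.
std::optional<std::vector<std::string>>
lookupWithPrefixFallback(const std::map<std::string, std::vector<std::string>> & cache,
                         std::string prefix)
{
    while (true)
    {
        auto it = cache.find(prefix);
        if (it != cache.end())
            return it->second;
        auto slash = prefix.find_last_of('/');
        if (slash == std::string::npos)
            return std::nullopt;  // no cached ancestor: fall back to backend listing
        prefix.resize(slash);     // drop the last path component and retry
    }
}
```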

Malformed/failure branches:

  • regex compile failures still throw as before.
  • no-file match behavior still enforced by throw_on_zero_files_match.

Interleaving analysis (multithreaded priority):

  • concurrent query A (privileged) can populate global singleton cache.
  • concurrent/follow-up query B (less privileged, same key material) can hit cached list and bypass backend ListObjects authorization check.
  • no locking/deadlock regressions observed in changed code; issue is authorization scope of shared cached state.

5) Fault categories and category-by-category injection results

| Fault category | Status | Outcome | Defects |
| --- | --- | --- | --- |
| Authorization boundary / tenant isolation | Executed | Fail | 1 |
| Cache-key identity correctness | Executed | Fail | 1 (same root cause) |
| Concurrency/race/deadlock | Executed | Pass | 0 |
| Exception/partial-update safety | Executed | Pass | 0 |
| Error-contract consistency | Executed | Pass | 0 |
| Performance/resource pressure | Executed | Pass | 0 |
| Observability/diagnostics correctness | Executed | Fail | 1 |
| Parser/access-control wiring | Executed | Pass | 0 |

6) Confirmed defects (High/Medium/Low)

High — Global cache key is not credential/user-scoped, enabling cross-credential list-result reuse

  • Impact: list results can be reused across different users/credentials if endpoint/bucket/prefix/with_tags match; this can bypass object-store ListObjects authorization boundary and leak object names (and possibly enable wildcard reads that should have required list permission).
  • Location: cache key construction and key identity logic.
  • Trigger condition: multi-user/multi-credential workload against same endpoint+bucket+prefix.
  • Affected transition: T1 -> T2 -> T3.
  • Why this is a defect: cache key omits effective principal/credential scope.
  • Smallest logical repro:
    1. User A (broader permissions) runs wildcard read, populates cache.
    2. User B (restricted list permission) runs matching wildcard with cache enabled.
    3. Query B hits cache and skips backend list call.
  • Likely fix direction: include stable authorization scope in key (effective credentials/fingerprint and/or query user id), and make equality/hash include it.
  • Regression test direction: two principals with different list permissions against same prefix; verify user B does not get cache hits from user A.

Code evidence:

```cpp
bool ObjectStorageListObjectsCache::Key::operator==(const Key & other) const
{
    return storage_description == other.storage_description && bucket == other.bucket && prefix == other.prefix && with_tags == other.with_tags;
}

size_t ObjectStorageListObjectsCache::KeyHasher::operator()(const Key & key) const
{
    std::size_t seed = 0;

    boost::hash_combine(seed, key.storage_description);
    boost::hash_combine(seed, key.bucket);
    boost::hash_combine(seed, key.prefix);
    boost::hash_combine(seed, key.with_tags);

    return seed;
}
```

```cpp
if (local_context->getSettingsRef()[Setting::use_object_storage_list_objects_cache] && object_storage->supportsListObjectsCache())
{
    auto & cache = ObjectStorageListObjectsCache::instance();
    ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix()), with_tags};

    if (auto objects_info = cache.get(cache_key, /*filter_by_prefix=*/ false))
```

```cpp
std::string getCommonKeyPrefix() const override { return uri.key; }

std::string getDescription() const override { return uri.endpoint; }

ObjectStorageType getType() const override { return ObjectStorageType::S3; }

bool supportsListObjectsCache() override { return true; }
```

Low — Profile event description for prefix-hit counter states “miss”

  • Impact: observability confusion; dashboards/alerts can misinterpret prefix-hit behavior.
  • Location: src/Common/ProfileEvents.cpp
  • Affected transition: T5.
  • Why defect: event semantic text contradicts event name and implementation behavior.

Code evidence:

```cpp
M(ObjectStorageListObjectsCacheHits, "Number of times object storage list objects operation hit the cache.", ValueType::Number) \
M(ObjectStorageListObjectsCacheMisses, "Number of times object storage list objects operation miss the cache.", ValueType::Number) \
M(ObjectStorageListObjectsCacheExactMatchHits, "Number of times object storage list objects operation hit the cache with an exact match.", ValueType::Number) \
M(ObjectStorageListObjectsCachePrefixMatchHits, "Number of times object storage list objects operation miss the cache using prefix matching.", ValueType::Number) \
M(ParquetMetaDataCacheHits, "Number of times the read from filesystem cache hit the cache.", ValueType::Number) \
```

7) Coverage accounting + stop-condition status

  • Call-graph nodes covered: all changed nodes in feature, integration, and control-plane wiring.
  • Transitions reviewed: T1-T5 all covered.
  • Fault categories: all listed categories executed; none deferred.
  • Coverage stop condition: met (all in-scope nodes/transitions/categories reviewed).

8) Assumptions & Limits

  • Static audit only; no live multi-user runtime replay in this pass.
  • Authorization impact inferred from key composition and cache flow, not from an executed IAM test matrix.
  • Sanitizer/runtime contention scenarios were not executed.

9) Confidence rating and confidence-raising evidence

  • Overall confidence: Medium-High.
  • To raise to High: run targeted multi-principal integration test (different S3 credentials, same prefix) with cache enabled; verify no cross-principal cache reuse.

10) Residual risks and untested paths

  • Residual security risk remains until cache key is principal-scoped.
  • Prefix matching behavior under highly adversarial mixed-prefix workloads is only statically reviewed.
  • No runtime stress test performed for cache memory pressure under large object listings.

@arthurpassos
Collaborator Author

High — Global cache key is not credential/user-scoped, enabling cross-credential list-result reuse

We have discussed this extensively in the past, and the conclusion was that it was fine this way


Labels

antalya-26.1 port-antalya PRs to be ported to all new Antalya releases port-forward Needs to be ported to every future minor release of this major version verified Verified by QA
