Antalya 26.1 - Forward port of list objects cache #1040#1405
Antalya 26.1 - Forward port of list objects cache #1040#1405zvonand merged 15 commits intoantalya-26.1from
Conversation
…ache Antalya 25.8 - Forward port of #805 List objects cache
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5bb0d4833c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (local_context->getSettingsRef()[Setting::use_object_storage_list_objects_cache] && object_storage->supportsListObjectsCache()) | ||
| { | ||
| auto & cache = ObjectStorageListObjectsCache::instance(); | ||
| ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix())}; |
There was a problem hiding this comment.
Scope list-objects cache entries to auth identity
StorageObjectStorageSource::createFileIterator builds the cache key from only description/namespace/prefix, so entries are shared across sessions that use different object-store credentials for the same endpoint+bucket+prefix. In S3, getDescription() is endpoint-based, so a query can reuse another credential's cached object list, which can surface object names and trigger read failures on objects the current credentials cannot access. Include credential/user identity in the key (or explicitly disable cache for per-query credentials) to avoid cross-credential cache pollution.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
IIRC, we did discuss this and came to the conclusion it was ok.
| auto & cache = ObjectStorageListObjectsCache::instance(); | ||
| ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix())}; | ||
|
|
||
| if (auto objects_info = cache.get(cache_key, /*filter_by_prefix=*/ false)) |
There was a problem hiding this comment.
Differentiate list cache entries by tag-fetch mode
The cache key does not include with_tags, so a cached listing produced with with_tags=false can be reused for a later _tags query that needs tag metadata. On a cache hit, createReader skips metadata refetch because each ObjectInfo already has metadata, so _tags can be silently empty/wrong even though the query requested tags. Add with_tags (or an equivalent metadata-completeness flag) to the cache key.
Useful? React with 👍 / 👎.
| const std::shared_ptr<Value> & value) | ||
| { | ||
| auto key_with_ttl = key; | ||
| key_with_ttl.expires_at = std::chrono::steady_clock::now() + std::chrono::seconds(ttl_in_seconds); |
There was a problem hiding this comment.
Honor zero TTL as non-expiring list cache entries
The server setting says TTL 0 means unlimited, but ObjectStorageListObjectsCache::set always writes expires_at = now + seconds(ttl_in_seconds). With ttl_in_seconds == 0, entries expire immediately and are treated as stale on subsequent lookups, effectively disabling caching instead of making entries non-expiring. Handle 0 as a special case (e.g., max time point) to match the documented setting behavior.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed - I am ashamed, thanks machine
QA VerificationPR SummaryForward port of the ListObjects cache feature to antalya-26.1. Adds Tests Added by the PR
All builds (12), fuzzers (10), and the majority of integration/stateless/regression suites passed. Integration and Stateless Tests — Unrelated Failures1.
|
| Test | Job (earlier attempt) | Upstream fails (30d) | Other antalya PRs |
|---|---|---|---|
test_postgresql_replica_database_engine/test_3.py (18 tests) |
amd_tsan 2/6 | 5-16 per test | -- |
test_database_glue/test.py (13 tests) |
amd_asan 2/6 | -- | PR #1428 |
test_kafka_bad_messages/test.py (4 tests) |
amd_binary 4/5 | 71-72 per test | -- |
test_disk_over_web_server/test.py (7 tests) |
amd_binary 1/5 | -- | PRs #1414, #1390, #1375, master |
test_storage_s3_queue/test_3.py (9 tests) |
amd_tsan 4/6 | 6-18 per test | -- |
test_storage_delta/test.py::test_network_activity_with_system_tables |
amd_asan targeted, others | -- | All antalya PRs (PR #1432 regression, fixed by #1474) |
test_backup_to_s3_different_credentials |
amd_binary 3/5 | -- | PR #1390 |
00167_read_bytes_from_fs |
amd_ubsan seq | 4 (master only) | -- |
ThreadSanitizer: data race / Scraping system tables |
amd_tsan s3 seq | -- | PRs #1414, #1390, #1395, #1466, master + 5 more |
Related to PR 1405: No
Regression Tests (Altinity/clickhouse-regression)
All failures are unrelated and pre-existing:
| Suite | x86 | arm64 | Reason |
|---|---|---|---|
| iceberg_1, iceberg_2 | FAIL | FAIL | Antalya features still being forward-ported to 26.1 |
| parquet | FAIL | FAIL | Antalya features still being forward-ported to 26.1 |
| swarms | FAIL | FAIL | Antalya features still being forward-ported to 26.1 |
| settings | FAIL | FAIL | Not updated for 26.1 yet |
Conclusion
All CI failures are justified and unrelated to PR #1405. The PR's own tests (gtest + stateless SQL) pass. All "Checks New Fails" are either:
- Export test instability from PR Antalya 26.1 - Forward port of export part and partition #1388 (tracked in clickhouse-regression#112)
- Known flaky tests that also fail upstream and on other antalya-26.1 PRs
- Transient failures that passed on rerun
PR #1405 Audit Review
1) Scope and partitionsPR reviewed: Altinity/ClickHouse#1405 Functional partitioning applied:
2) Call graphPrimary changed runtime flow:
Control-plane flow:
3) Transition matrix
4) Logical code-path testing summaryReviewed branches:
Malformed/failure branches:
Interleaving analysis (multithreaded priority):
5) Fault categories and category-by-category injection results
6) Confirmed defects (High/Medium/Low)High — Global cache key is not credential/user-scoped, enabling cross-credential list-result reuse
Code evidence: bool ObjectStorageListObjectsCache::Key::operator==(const Key & other) const
{
return storage_description == other.storage_description && bucket == other.bucket && prefix == other.prefix && with_tags == other.with_tags;
}
size_t ObjectStorageListObjectsCache::KeyHasher::operator()(const Key & key) const
{
std::size_t seed = 0;
boost::hash_combine(seed, key.storage_description);
boost::hash_combine(seed, key.bucket);
boost::hash_combine(seed, key.prefix);
boost::hash_combine(seed, key.with_tags);
return seed;
}if (local_context->getSettingsRef()[Setting::use_object_storage_list_objects_cache] && object_storage->supportsListObjectsCache())
{
auto & cache = ObjectStorageListObjectsCache::instance();
ObjectStorageListObjectsCache::Key cache_key {object_storage->getDescription(), configuration->getNamespace(), configuration->getRawPath().cutGlobs(configuration->supportsPartialPathPrefix()), with_tags};
if (auto objects_info = cache.get(cache_key, /*filter_by_prefix=*/ false))std::string getCommonKeyPrefix() const override { return uri.key; }
std::string getDescription() const override { return uri.endpoint; }
ObjectStorageType getType() const override { return ObjectStorageType::S3; }
bool supportsListObjectsCache() override { return true; }Low — Profile event description for prefix-hit counter states “miss”
Code evidence: M(ObjectStorageListObjectsCacheHits, "Number of times object storage list objects operation hit the cache.", ValueType::Number) \
M(ObjectStorageListObjectsCacheMisses, "Number of times object storage list objects operation miss the cache.", ValueType::Number) \
M(ObjectStorageListObjectsCacheExactMatchHits, "Number of times object storage list objects operation hit the cache with an exact match.", ValueType::Number) \
M(ObjectStorageListObjectsCachePrefixMatchHits, "Number of times object storage list objects operation miss the cache using prefix matching.", ValueType::Number) \
M(ParquetMetaDataCacheHits, "Number of times the read from filesystem cache hit the cache.", ValueType::Number) \7) Coverage accounting + stop-condition status
8) Assumptions & Limits
9) Confidence rating and confidence-raising evidence
10) Residual risks and untested paths
|
We have discussed this extensively in the past, and the conclusion was that it was fine this way |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Forward port of list objects cache #1040
Documentation entry for user-facing changes
Cache for listobjects calls
CI/CD Options
Exclude tests:
Regression jobs to run: