Skip to content

define a module for chunk grids, and a registry#3735

Open
d-v-b wants to merge 3 commits intozarr-developers:mainfrom
d-v-b:chore/chunk-grids-module
Open

define a module for chunk grids, and a registry#3735
d-v-b wants to merge 3 commits intozarr-developers:mainfrom
d-v-b:chore/chunk-grids-module

Conversation

@d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Mar 1, 2026

This PR creates a modular structure for our chunk grids. Currently we only have the regular chunk grid, but we will get rectilinear chunks soon, and this layout will make that addition much easier.

This PR also adds an explicit registry for chunk grids. Previously we handled chunk grid class lookup in the ChunkGrid.from_dict method. A registry is better, and consistent with the rest of the codebase.

Since this is a purely internal change, I don't think we need release notes.

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Mar 1, 2026
@d-v-b d-v-b requested a review from maxrjones March 1, 2026 16:18
@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 1, 2026

closes #3734, and supports #3534

@d-v-b d-v-b requested review from a team and jhamman March 2, 2026 02:06
@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 5, 2026

It would be good to have some eyes on this. we need this functionality for the rectilinear chunk grid.

Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two reasons why this is actually public-facing:

  1. New functions added to __all__ in src/zarr/registry.py (which is outside .core)
  2. New zarr.chunk_grid entry point group in _collect_entrypoints

This could be kept private (and have a lower review bar) if the chunk grid registry is first prototyped inside zarr.core.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

I don't think we need to prototype this, it's using the same pattern as the registries we already have. I will add a release note but since we only support a single chunk grid implementation, there's not much anyone can do with this right now (besides register a different regular chunk grid).

@maxrjones
Copy link
Member

I don't think we need to prototype this, it's using the same pattern as the registries we already have. I will add a release note but since we only support a single chunk grid implementation, there's not much anyone can do with this right now (besides register a different regular chunk grid).

I would prefer that we build the functionality that allows using the chunk grid registry before exposing it in the public API. It's more about only having truly usable components in our public API than it is about prototyping. What is the downside of delaying its exposure via zarr.registry?

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

the registry itself works in this pr. someone who wants to register a different chunk grid can do that -- but it has to be some form of regular chunk grid. someone might want to register a different regular chunk grid with a different name, for example. that's possible, but I highly doubt anyone will want to do this.

I think the registry itself is logically distinct from the internal chunk grid implementation. so shipping the registry, in its final form, seems better to me than releasing a private registry and then modifying the registry again in a later PR.

@maxrjones
Copy link
Member

The registry exposes register_chunk_grid as public API, but there's no public API for creating an array with a custom chunk grid. zarr.create() and zarr.open_array() only accept chunk_shape: tuple[int, ...]. So the only way to actually use a registered chunk grid is through internal APIs like ArrayV3Metadata, which defeats the purpose of a public registry.

I agree that the registry is logically distinct from the internal chunk grid implementation. Still, I don't see value in adding something to the public API that cannot yet be used.

@maxrjones
Copy link
Member

Is there something about your plans for follow-up PRs that requires this functionality be public?

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

the goal is to be totally done with the chunk grid registry part of the problem. we can either do that in multiple PRs, or just one. I think "just one" is simpler, which is why this PR defines the chunk grid registry as a regular registry, like the other registries we already have. It's not very useful yet, but that's OK -- it doesn't break anything, users will not see it or be confused by it, and it's a necessary step (arguably a blocker) for the rest of the rectilinear chunk grid work we are trying to do.

@maxrjones
Copy link
Member

the goal is to be totally done with the chunk grid registry part of the problem. we can either do that in multiple PRs, or just one. I think "just one" is simpler, which is why this PR defines the chunk grid registry as a regular registry, like the other registries we already have. It's not very useful yet, but that's OK -- it doesn't break anything, users will not see it or be confused by it, and it's a necessary step (arguably a blocker) for the rest of the rectilinear chunk grid work we are trying to do.

Could we compromise and remove register_chunk_grid and get_chunk_grid_class from __all__? That way they would not show up in the public API docs or be returned from docs searches, such that I'd agree with you that users would not see it. I would gladly take on the follow-up PR to re-add them so more work is not added to your plate.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

that's fine, we will just have to revert those changes as soon as we start working on the next PRs (namely, the next PR that adds a new chunk grid)

@maxrjones
Copy link
Member

maxrjones commented Mar 6, 2026

that's fine, we will just have to revert those changes as soon as we start working on the next PRs (namely, the next PR that adds a new chunk grid)

can you share your plan for the sequence of PRs? __all__ only impact the docs and from zarr.registry import * behavior, so it's essential to understand what you plan to make public when to be able to understand the impact of my suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs release notes Automatically applied to PRs which haven't added release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants