Wrap Arc to Statistics for partition_statistics API#20570
Wrap Arc to Statistics for partition_statistics API#20570xudong963 wants to merge 3 commits intoapache:mainfrom
partition_statistics API#20570Conversation
|
cc @jonathanc-n |
getChan
left a comment
There was a problem hiding this comment.
Nice improvement. Since this changes public trait signatures (Result -> Result<Arc>), could we add a short note in user-facing changes / migration guide for downstream implementers?
| /// (the default), not an error. | ||
| /// If `partition` is `None`, it returns statistics for the entire plan. | ||
| fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics> { | ||
| fn partition_statistics(&self, partition: Option<usize>) -> Result<Arc<Statistics>> { |
There was a problem hiding this comment.
This is breaking we should deprecate first + make another function signature (ex. shared_partition_statistics() or something). We can make the follow up on the new function signature after this.
There was a problem hiding this comment.
yeah, it should be. But given the change is small, only wrap an Arc, I'm thinking if it's enough to only mark the PR as API changing and add some notes to upgrade doc
There was a problem hiding this comment.
Makes sense to me. If I have the time, I'll try to do the the follow pull request, revert this API change and deprecate the original function in favour of the new one with Arc passed into the arguments
There was a problem hiding this comment.
yeah, it should be. But given the change is small, only wrap an Arc, I'm thinking if it's enough to only mark the PR as API changing and add some notes to upgrade doc
I agree that marking as an API change and then adding an upgrade guide is enough
@xudong963 I didn't see anything in the upgrade guide yet -- you might be the first PR for 54.0.0 (53.0.0 is being finalized now)
There was a problem hiding this comment.
Basically it should at least call out the Old vs new pattern:
Including the Arc::unwrap_or_clone example I think would also help people upgrade
// old
fn partition_statistics(&self, p: Option<usize>) -> Result<Statistics>;
// new
fn partition_statistics(&self, p: Option<usize>) -> Result<Arc<Statistics>>;There was a problem hiding this comment.
The only potential concern I have with this change is that it is a public API change (and thus will require a bunch of downstream consumer changes). If we are going to change the statistics API again in the near future having to do two API upgrades for the same function could be annoying to users
In regards to my second comment to this thread. If this is going through for v54.0.0, I can make a follow up PR to change this back to the original API signature before v54.1.0
// Change this
fn partition_statistics(&self, p: Option<usize>) -> Result<Arc<Statistics>>;
// Back to this
fn partition_statistics(&self, p: Option<usize>) -> Result<Statistics>;
// And then create new signature + deprecate old `partition_statistics()`
fn new_partition_statistics(&self, p: Option<usize>, child_statistics: Arc<Statistics>) -> Result<Arc<Statistics>>;This allows users to not have to make two upgrades to the same thing
I put out #20711 to finish up the EPIC to somewhat unblock the concern in #20184. So after that merges, the follow up to this pull request can be made to add the new function signature. WDYT? @alamb @xudong963
asolimando
left a comment
There was a problem hiding this comment.
Thanks for working on this @xudong963, good prerequisite for #20184 and also good to avoid deep clones, hope to see this landing soon :)
|
cc @alamb |
|
run benchmark sql_planner |
|
🤖 |
alamb
left a comment
There was a problem hiding this comment.
Thank you @xudong963 @jonathanc-n @getChan and others.
This is a great idea and a great change -- and it came up on the sync call with @paleolimbot and @adriangb as well
I do think we should add a note to the upgrade guide before we merge this PR in, as suggested by @jonathanc-n
The only potential concern I have with this change is that it is a public API change (and thus will require a bunch of downstream consumer changes). If we are going to change the statistics API again in the near future having to do two API upgrades for the same function could be annoying to users
I suppose we have until DataFusion 54 ships to try and get our act together for a nicer statistics API so maybe this will provide some additional pressure to do so
| /// (the default), not an error. | ||
| /// If `partition` is `None`, it returns statistics for the entire plan. | ||
| fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics> { | ||
| fn partition_statistics(&self, partition: Option<usize>) -> Result<Arc<Statistics>> { |
There was a problem hiding this comment.
yeah, it should be. But given the change is small, only wrap an Arc, I'm thinking if it's enough to only mark the PR as API changing and add some notes to upgrade doc
I agree that marking as an API change and then adding an upgrade guide is enough
@xudong963 I didn't see anything in the upgrade guide yet -- you might be the first PR for 54.0.0 (53.0.0 is being finalized now)
| /// (the default), not an error. | ||
| /// If `partition` is `None`, it returns statistics for the entire plan. | ||
| fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics> { | ||
| fn partition_statistics(&self, partition: Option<usize>) -> Result<Arc<Statistics>> { |
There was a problem hiding this comment.
Basically it should at least call out the Old vs new pattern:
Including the Arc::unwrap_or_clone example I think would also help people upgrade
// old
fn partition_statistics(&self, p: Option<usize>) -> Result<Statistics>;
// new
fn partition_statistics(&self, p: Option<usize>) -> Result<Arc<Statistics>>;|
🤖: Benchmark completed Details
|
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?