Dependence aware tests for ppc_loo_pit_ecdf #428

Open
florence-bockting wants to merge 73 commits into stan-dev:master from florence-bockting:dependence-aware-LOO-PIT

@florence-bockting

Description

Context

LOO-PIT (leave-one-out probability integral transform) is used for model checking within the Bayesian workflow. Each data point is held out in turn; the model is conditioned on the remaining data, and the resulting LOO predictive distribution is evaluated at the held-out point. If the model is calibrated, the resulting LOO-PIT values are asymptotically uniform (for continuous data), so departures from uniformity indicate miscalibration. A corresponding graphical uniformity test was developed by Säilynoja et al. (2022) and is implemented in the ppc_loo_pit_ecdf function in the bayesplot package. This function plots the empirical cumulative distribution function (ECDF) of the LOO-PIT values, overlaid with simultaneous confidence intervals (an envelope) for a standard uniform sample.
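To make the quantities above concrete, here is a minimal sketch of how weighted PIT values and their ECDF can be computed. It is written in plain Python for illustration and is not the bayesplot implementation; the function names `loo_pit` and `ecdf`, the toy data, and the assumption that the LOO predictive distribution is represented by weighted draws (for example, importance weights) are all hypothetical choices for this example.

```python
from bisect import bisect_right


def loo_pit(y, yrep, weights):
    """Weighted PIT per observation (illustrative helper, not bayesplot's API).

    y       : list of n observed values
    yrep    : list of S draws, each a list of n predictive values
    weights : list of S rows of n non-negative weights (assumed to make the
              draws represent the LOO predictive distribution)
    """
    n_obs = len(y)
    pit = []
    for i in range(n_obs):
        # Normalise the weights for observation i.
        total = sum(w_row[i] for w_row in weights)
        # Weighted share of draws at or below the observed value.
        below = sum(
            w_row[i]
            for w_row, draw in zip(weights, yrep)
            if draw[i] <= y[i]
        )
        pit.append(below / total)
    return pit


def ecdf(values, grid):
    """Fraction of values less than or equal to each grid point."""
    ordered = sorted(values)
    return [bisect_right(ordered, z) / len(ordered) for z in grid]


# Toy data: 2 observations, 4 draws, equal weights (all assumed).
y = [0.0, 2.5]
yrep = [[-1.0, 0.5], [0.5, 2.0], [1.0, 3.0], [-0.5, 0.0]]
weights = [[1.0, 1.0]] * 4

pit = loo_pit(y, yrep, weights)            # [0.5, 0.75]
ecdf_values = ecdf(pit, [0.25, 0.5, 1.0])  # [0.0, 0.5, 1.0]
```

For a calibrated model the ECDF of the PIT values should stay close to the diagonal; the envelope drawn by the plot quantifies how much deviation is compatible with uniformity.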

Issue

The current approach assumes that the LOO-PIT values are independent, which does not hold (Marhuenda et al., 2005). As a result, the graphical test yields an envelope that is too wide, reducing the test's ability to reveal model miscalibration.

Suggested solution (Content of this PR)

Tesso & Vehtari (2026, see preprint) propose three testing procedures that can handle dependent uniform values, together with an updated graphical representation that uses color coding to mark the influential regions or the most influential points of the ECDF. This PR implements these developments by replacing the current ppc_loo_pit_ecdf implementation.

TODOs

  • update ppc_loo_pit_ecdf() function in ppc-loo.R
  • add new helper functions for computing uniformity tests in helpers-ppc.R
  • add unit tests for plotting and helper functions in test-helpers-ppc.R and test-ppc-loo.R
  • add visual regression tests for ppc_loo_pit_ecdf() in test-ppc-loo.R
  • update documentation
  • new or updated vignette with explanation of differences between new and old method (TODO)
  • deprecation suggestion of old method
    • P1: old method (default) and new method available via method argument
    • P2: new method (default) and old method available via method argument
    • P3: old method is removed

@florence-bockting florence-bockting changed the title from "Dependence aware loo pit" to "Dependence aware tests for ppc_loo_pit_ecdf" Mar 4, 2026
@florence-bockting florence-bockting marked this pull request as draft March 4, 2026 08:45
@florence-bockting florence-bockting marked this pull request as ready for review March 4, 2026 08:45
@codecov-commenter

Codecov Report

❌ Patch coverage is 97.39583% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.70%. Comparing base (306c92e) to head (237daba).
⚠️ Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/ppc-loo.R | 96.87% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #428      +/-   ##
==========================================
+ Coverage   98.66%   98.70%   +0.03%     
==========================================
  Files          35       35              
  Lines        5860     6028     +168     
==========================================
+ Hits         5782     5950     +168     
  Misses         78       78              

@jgabry
Member

jgabry commented Mar 4, 2026

I'm seeing the failure on R-devel outside of this PR too (there seem to be some tiny, insignificant differences in the SVGs on R-devel, which has happened before). I haven't seen the Mac failure on R-release before, but it's possible it's not due to this PR. We'd need to check that one more closely.
