poco (POsterior COmpression, "little" in Italian) provides a small, unified API for compressing MCMC
posterior draws into compact density approximations (Gaussian mixture
models, kernel density estimates), regenerating samples on demand, and
evaluating the density at arbitrary points.
It is designed for situations where you need to store, ship, or
post-process many Bayesian models with limited disk space - for example
when caching hundreds of brms or sccomp fits, or sharing pre-trained
posteriors with collaborators. With Gaussian-mixture compression, the
on-disk footprint can drop by 100-1000x while preserving correlation
structure, multimodality and most downstream summaries.
A Stan model with 1000 parameters x 8000 draws x 8 bytes = ~64 MB per chain. With four chains that is 256 MB, and 100 such models is 25 GB.
A 10-component Gaussian mixture in 1000 dimensions stores only
weights + means + covariances, typically a few hundred KB.
# install.packages("remotes")
remotes::install_github("stemangiola/poco")brms / cmdstanr / sccomp are Suggested dependencies; you only
need them if you plan to compress fits from those packages.
poco exposes three entry points that all share the same
method argument:
| Function | Input |
|---|---|
compress_posterior() |
a draws matrix / data.frame / posterior::draws_* |
compress_fit() |
a cmdstanr::CmdStanMCMC or vector of CSV paths |
compress_brmsfit() |
a brms::brmsfit (cmdstanr backend) |
compress_sccomp() |
an sccomp fit (cmdstanr backend) |
All return an S3 object with class posterior_compressed, which is
consumed by:
sample_posterior()- regenerate draws (any number),density_posterior()- evaluate p.d.f. (or log p.d.f.) at given points,evaluate_compression()- quantify how much information was lost (a "98.3% reproduction"-style score, see below),reconstruct_brmsfit()/reconstruct_sccomp()- rebuild a usable fit object so the standard post-processing keeps working.
library(poco)
set.seed(1)
draws <- matrix(rnorm(2000 * 3), ncol = 3,
dimnames = list(NULL, c("alpha", "beta", "sigma")))
comp <- compress_posterior(
draws,
method = "mclust", # or "mvdens_gmm", "mvdens_kde"
n_components = 5
)
new_draws <- sample_posterior(comp, n_draws = 4000)
density_posterior(comp, x = matrix(c(0, 0, 1), nrow = 1))
saveRDS(comp, "model.rds", compress = "xz")library(brms)
fit <- brm(y ~ x + (1 | g), data = dat, backend = "cmdstanr")
result <- compress_brmsfit(fit, method = "mclust", n_components = 5)
saveRDS(result$compressed, "model_compressed.rds", compress = "xz")
saveRDS(result$structure, "model_structure.rds")
# Later, reload + reconstruct + use brms machinery as normal:
result_loaded <- list(
compressed = readRDS("model_compressed.rds"),
structure = readRDS("model_structure.rds")
)
fit_recon <- reconstruct_brmsfit(result_loaded)
predict(fit_recon, newdata = new_data)
posterior_predict(fit_recon)library(cmdstanr)
fit <- mod$sample(data = stan_data, chains = 4)
comp <- compress_fit(fit, method = "mclust", n_components = 5)
new_draws <- sample_posterior(comp, n_draws = 4000)method |
Backend | Notes |
|---|---|---|
"mclust" |
mclust::Mclust() |
parametric GMM, BIC-based selection, fast & compact |
"mvdens_gmm" |
mvdens::fit.gmm() |
EM-based GMM, requires the suggested mvdens package |
"mvdens_kde" |
mvdens::fit.kde() |
kernel density estimate, larger but non-parametric |
compression_methods() returns the current set.
Like JPEG quality scores, evaluate_compression() reports a
reproduction % for the compressed posterior. It uses two
backend-independent two-sample diagnostics on samples regenerated
from the compressed object versus the reference draws — so the score
is not circular (it doesn't reuse the same density model that did the
compression):
- Energy distance (Szekely & Rizzo) anchored against a bootstrap Monte Carlo noise envelope from the reference itself.
- Classifier two-sample test (C2ST) — a random forest via
ranger::ranger()under cross-validation tries to tell reference draws from reconstructed draws; AUC of 0.5 means indistinguishable. Passclassifier = "knn"for a k-NN alternative instead of the forest.
fidelity <- evaluate_compression(comp, reference_draws = draws)
fidelity
#> <compression_fidelity>
#> method : mclust
#> parameters : 3
#> reference n : 1500
#> eval n : 1500
#> ----------------------------------------
#> energy : 100.0% reproduction
#> distance : 0.054 noise envelope (q90): 0.062 ratio: 0.87x
#> C2ST : 97.7% reproduction
#> AUC : 0.511 classifier: ranger cv_folds: 5
#> ----------------------------------------
#> reproduction : 98.9%For a strict out-of-sample evaluation, hold the reference draws out before fitting:
idx <- sample.int(nrow(draws), 0.8 * nrow(draws))
comp <- compress_posterior(draws[idx, ], method = "mclust")
evaluate_compression(comp, reference_draws = draws[-idx, ])vignette("introduction", package = "poco")- hello world, matrix in / samples out.vignette("compress-brms", package = "poco")- end-to-endbrmsworkflow, including reconstruction.vignette("compress-funnel", package = "poco")-cmdstanrworkflow on Neal's funnel.vignette("compress-sccomp", package = "poco")- end-to-endsccompworkflow.vignette("methods-benchmark", package = "poco")- side-by-side comparison of the supported compression methods.vignette("evaluate-compression", package = "poco")- measure reproduction quality withevaluate_compression()(sanity checks on lost correlation, mode collapse, and a sliding-difficulty sweep).
GPL-3
