Hi! First, thanks for grmtools — I'm looking at using it in a project
that parses a lot of small inputs in a tight loop, and while profiling I
noticed something I wanted to flag.
The generated parse() from lrpar_mod! calls
::lrpar::ctbuilder::_reconstitute(__GRM_DATA, __STABLE_DATA) on every
invocation, which does two bincode::decode_from_slice calls to rebuild
the YaccGrammar and StateTable. For a single whole-buffer parse this
is invisible, but if you call parse() many times — e.g. parsing many
small chunks, or parsing inside a hot loop — that decode ends up
dominating. RTParserBuilder::new only borrows the tables, so the work
is pure repeat overhead.
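To make the shape of the fix concrete, here is a minimal sketch of the caching pattern I have in mind. `Tables` and `reconstitute` are stand-ins I made up for this sketch (the real types are the decoded YaccGrammar/StateTable pair and the two bincode::decode_from_slice calls), not lrpar's actual API:

```rust
use std::sync::OnceLock;

// Stand-in for the decoded (YaccGrammar, StateTable) pair.
struct Tables {
    grm: Vec<u8>,
    stable: Vec<u8>,
}

// Stand-in for _reconstitute: imagine this deserializing
// __GRM_DATA / __STABLE_DATA with bincode.
fn reconstitute() -> Tables {
    Tables { grm: vec![1, 2, 3], stable: vec![4, 5, 6] }
}

static TABLES: OnceLock<Tables> = OnceLock::new();

fn parse(input: &str) -> usize {
    // The decode runs exactly once; every later parse() call
    // reuses the cached tables instead of rebuilding them.
    let tables = TABLES.get_or_init(reconstitute);
    input.len() + tables.grm.len() + tables.stable.len()
}

fn main() {
    assert_eq!(parse("1+2"), 9);
    assert_eq!(parse("1+2"), 9); // second call hits the cache
    println!("ok");
}
```

Since OnceLock is in std (stable since Rust 1.70), this adds no dependencies to generated code.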
In addition, the per-call setup in lrpar::parser has a couple of small
redundancies:
- Parser::token_cost is stored as Box<&dyn Fn> — a heap allocation
  around an already-borrowed reference, on every parse.
- parse_map / parse_actions collect the lexer iterator into a
  Vec<Result<_, _>> and then iterate it again to build the lexeme
  vec, instead of a single collect::<Result<Vec<_>, _>>().
- The iter_tidxs cost-positive assert! loop runs in release builds,
  even though it's really a builder-side contract.
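For the second item, the change is just using Result's FromIterator impl so the lexemes are built in one pass. A self-contained sketch (the `lex` function here is a toy stand-in for the lexer iterator, not grmtools' lexer API):

```rust
// Toy lexer: yields Result tokens, like the iterator parse_map consumes.
fn lex(input: &str) -> impl Iterator<Item = Result<u32, String>> + '_ {
    input
        .split_whitespace()
        .map(|t| t.parse::<u32>().map_err(|e| format!("bad lexeme {t:?}: {e}")))
}

// Instead of collecting into Vec<Result<_, _>> and walking it a second
// time, collect straight into Result<Vec<_>, _>: one pass, and it
// short-circuits on the first Err.
fn lexemes(input: &str) -> Result<Vec<u32>, String> {
    lex(input).collect::<Result<Vec<_>, _>>()
}

fn main() {
    assert_eq!(lexemes("1 2 3"), Ok(vec![1, 2, 3]));
    assert!(lexemes("1 x 3").is_err());
    println!("ok");
}
```

For the third item, swapping assert! for debug_assert! would keep the contract checked in debug builds while removing the loop from release code.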
On the calc_actions example, with 200k iterations of a few short
expressions in release mode, I'm seeing roughly ~2.06 µs/parse → ~0.80 µs/parse
(about a 2.5× speedup) once these are addressed. Most of that is the
table decode; the rest is the small setup items.
I have a working patch and would be happy to send a PR. Two things I'd
appreciate guidance on before I do:
- Public API. To cache the tables across calls, generated code
  needs a concrete type to put in a OnceLock. Naming
  (YaccGrammar<S>, StateTable<S>) directly would force every
  downstream crate to depend on lrtable, which seems like a
  regression. My current approach is to introduce an opaque
  lrpar::ParserTables<S> newtype with grm() / stable()
  accessors, re-exported at the crate root, so generated code only
  names ::lrpar::ParserTables. Does that shape sound reasonable, or
  would you prefer it #[doc(hidden)], or somewhere else entirely?
- _reconstitute signature. It's already #[doc(hidden)], but its
  return type would change from a tuple to ParserTables<S>. Since
  generated code is regenerated against the matching lrpar version,
  I think this is fine, but happy to keep the old function and add a
  new one if you'd rather not touch it.
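For the first question, the newtype I'm proposing would look roughly like the following. This is only a sketch of the shape: the field types are placeholders (in the real patch they'd be YaccGrammar<StorageT> and StateTable<StorageT>), and the names are my suggestion, not existing lrpar API:

```rust
// Opaque wrapper so generated code never names lrtable types directly.
// Placeholder fields; really YaccGrammar<S> and StateTable<S>.
pub struct ParserTables<S> {
    grm: S,
    stable: S,
}

impl<S> ParserTables<S> {
    pub fn grm(&self) -> &S {
        &self.grm
    }
    pub fn stable(&self) -> &S {
        &self.stable
    }
}

fn main() {
    // Generated code would only ever write something like:
    //   static TABLES: OnceLock<::lrpar::ParserTables<u32>> = OnceLock::new();
    let t = ParserTables { grm: 1u32, stable: 2u32 };
    assert_eq!((*t.grm(), *t.stable()), (1, 2));
    println!("ok");
}
```

Keeping the fields private means the representation (tuple, boxed, whatever) can change later without touching generated code.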
Happy to open the PR as-is, or to iterate on the design first — whatever
works best for you. Thanks again!