Hi! First, thanks for grmtools — I'm looking at using it in a project
that parses a lot of small inputs in a tight loop, and while profiling I
noticed something I wanted to flag.
The generated parse() from lrpar_mod! calls
::lrpar::ctbuilder::_reconstitute(__GRM_DATA, __STABLE_DATA) on every
invocation, which does two bincode::decode_from_slice calls to rebuild
the YaccGrammar and StateTable. For a single whole-buffer parse this
is invisible, but if you call parse() many times — e.g. parsing many
small chunks, or parsing inside a hot loop — that decode ends up
dominating. RTParserBuilder::new only borrows the tables, so the work
is pure repeat overhead.
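To make the shape of the fix concrete, here is a minimal sketch of the caching pattern I have in mind. `Tables` and `reconstitute` are stand-ins I made up for this sketch (the real types are the decoded YaccGrammar/StateTable pair and the two bincode::decode_from_slice calls), not lrpar's actual API:

```rust
use std::sync::OnceLock;

// Stand-in for the decoded (YaccGrammar, StateTable) pair.
struct Tables {
    grm: Vec<u8>,
    stable: Vec<u8>,
}

// Stand-in for _reconstitute: imagine this deserializing
// __GRM_DATA / __STABLE_DATA with bincode.
fn reconstitute() -> Tables {
    Tables { grm: vec![1, 2, 3], stable: vec![4, 5, 6] }
}

static TABLES: OnceLock<Tables> = OnceLock::new();

fn parse(input: &str) -> usize {
    // The decode runs exactly once; every later parse() call
    // reuses the cached tables instead of rebuilding them.
    let tables = TABLES.get_or_init(reconstitute);
    input.len() + tables.grm.len() + tables.stable.len()
}

fn main() {
    assert_eq!(parse("1+2"), 9);
    assert_eq!(parse("1+2"), 9); // second call hits the cache
    println!("ok");
}
```

Since OnceLock is in std (stable since Rust 1.70), this adds no dependencies to generated code.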
In addition, the per-call setup in lrpar::parser has a couple of small
redundancies:
- Parser::token_cost is stored as Box<&dyn Fn> — a heap allocation
  around an already-borrowed reference, on every parse.
- parse_map / parse_actions collect the lexer iterator into a
  Vec<Result<_, _>> and then iterate it again to build the lexeme
  vec, instead of a single collect::<Result<Vec<_>, _>>().
- The iter_tidxs cost-positive assert! loop runs in release builds,
  even though it's really a builder-side contract.
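For the second item, the change is just using Result's FromIterator impl so the lexemes are built in one pass. A self-contained sketch (the `lex` function here is a toy stand-in for the lexer iterator, not grmtools' lexer API):

```rust
// Toy lexer: yields Result tokens, like the iterator parse_map consumes.
fn lex(input: &str) -> impl Iterator<Item = Result<u32, String>> + '_ {
    input
        .split_whitespace()
        .map(|t| t.parse::<u32>().map_err(|e| format!("bad lexeme {t:?}: {e}")))
}

// Instead of collecting into Vec<Result<_, _>> and walking it a second
// time, collect straight into Result<Vec<_>, _>: one pass, and it
// short-circuits on the first Err.
fn lexemes(input: &str) -> Result<Vec<u32>, String> {
    lex(input).collect::<Result<Vec<_>, _>>()
}

fn main() {
    assert_eq!(lexemes("1 2 3"), Ok(vec![1, 2, 3]));
    assert!(lexemes("1 x 3").is_err());
    println!("ok");
}
```

For the third item, swapping assert! for debug_assert! would keep the contract checked in debug builds while removing the loop from release code.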
On the calc_actions example, with 200k iterations of a few short
expressions in release mode, I'm seeing roughly ~2.06 µs/parse → ~0.80 µs/parse
(about a 2.5× speedup) once these are addressed. Most of that is the
table decode; the rest is the small setup items.
I have a working patch and would be happy to send a PR. Two things I'd
appreciate guidance on before I do:
- Public API. To cache the tables across calls, generated code
  needs a concrete type to put in a OnceLock. Naming
  (YaccGrammar<S>, StateTable<S>) directly would force every
  downstream crate to depend on lrtable, which seems like a
  regression. My current approach is to introduce an opaque
  lrpar::ParserTables<S> newtype with grm() / stable()
  accessors, re-exported at the crate root, so generated code only
  names ::lrpar::ParserTables. Does that shape sound reasonable, or
  would you prefer it #[doc(hidden)], or somewhere else entirely?
- _reconstitute signature. It's already #[doc(hidden)], but its
  return type would change from a tuple to ParserTables<S>. Since
  generated code is regenerated against the matching lrpar version,
  I think this is fine, but happy to keep the old function and add a
  new one if you'd rather not touch it.
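For the first question, the newtype I'm proposing would look roughly like the following. This is only a sketch of the shape: the field types are placeholders (in the real patch they'd be YaccGrammar<StorageT> and StateTable<StorageT>), and the names are my suggestion, not existing lrpar API:

```rust
// Opaque wrapper so generated code never names lrtable types directly.
// Placeholder fields; really YaccGrammar<S> and StateTable<S>.
pub struct ParserTables<S> {
    grm: S,
    stable: S,
}

impl<S> ParserTables<S> {
    pub fn grm(&self) -> &S {
        &self.grm
    }
    pub fn stable(&self) -> &S {
        &self.stable
    }
}

fn main() {
    // Generated code would only ever write something like:
    //   static TABLES: OnceLock<::lrpar::ParserTables<u32>> = OnceLock::new();
    let t = ParserTables { grm: 1u32, stable: 2u32 };
    assert_eq!((*t.grm(), *t.stable()), (1, 2));
    println!("ok");
}
```

Keeping the fields private means the representation (tuple, boxed, whatever) can change later without touching generated code.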
Happy to open the PR as-is, or to iterate on the design first — whatever
works best for you. Thanks again!