Bump node-addon-api to fix memory leak #2436#2562
Conversation
|
Great find! That tracks perfectly with what we discussed in the issue, and it looks like we are good to upgrade since we don't support node 16. The tests are failing, though. |
|
Thank you for running the tests, I didn't run them local before! Now I ran it on Mac and Linux, reproduced the failures local, I hope my last commit fix all of them. 🙏🏻 |
|
Nice, all green now! 💚 Looking forward to have it reviewed/merged/released! |
| Napi::MemoryManagement::AdjustExternalMemory(env, -(int64_t)approxBytesPerPixel() * width * height); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
From my reading of node-addon-api, having this member function forces the destructor to get called after gc, though before JS goes back to executing. It makes sense since this couldn't get called if the instance were already destroyed. Here's the relevant bit, cleaned up (source):
// If class overrides the basic finalizer, execute it.
if constexpr (details::HasBasicFinalizer<T>::value) {
HandleScope scope(env);
instance->Finalize(Napi::BasicEnv(env));
}
// If class overrides the (extended) finalizer, schedule it.
if constexpr (details::HasExtendedFinalizer<T>::value) {
// Attach via node_api_post_finalizer. `PostFinalizeCallback` is responsible
// for deleting the `T* instance`, after calling the user-provided finalizer.
napi_status status =
node_api_post_finalizer(env, PostFinalizeCallback, data, nullptr);
NAPI_FATAL_IF_FAILED(status,
"ObjectWrap<T>::FinalizeCallback",
"node_api_post_finalizer failed");
}
// If the instance does _not_ override the (extended) finalizer, delete the
// `T* instance` immediately.
else {
delete instance;
}
And the PostFinalizeCallback looks like this:
T* instance = static_cast<T*>(data);
instance->Finalize(Napi::Env(env));
delete instance;But it would be better to destroy within gc, which is the whole point of BasicEnv. All day I've been trying to figure out why that causes node to crash (your initial test failures). I know it crashes in AdjustExternalMemory, but it seems to be failing because of something internal to node or node-addon-api.
AdjustExternalMemory takes a BasicEnv, and that env should still be valid, so I'm kind of at a loss for now. I'd like to take some time to understand it, which either leads to a bug fix in node or a corrected understanding on my part. I'm a little more hesitant to use the experimental API now. If we wanted to use it now, just having this method be empty and leaving the AdjustExternalMemory below would be better, since we want that to get called if we destroy from the C++ side.
I also think we should probably refactor the classes to only store BasicEnv to avoid accidentally trying to invoke JS at the wrong times. It's pretty easy to grab the full Napi::Env from info or other context.
|
I added a test to make sure memory leak doesn't come back later. I don't know how to proceed from here. |
468546c to
b239207
Compare
|
Works like a charm for our server-side rendering. Memory issues we had are gone. We already use a merged version internally at the company. Looking forward for the official merge. |
|
I deployed it today and looks like it didn't fix my memory leak. 😢 Happy that it worked for others. |
|
For completeness, it doesn't mean the fix is not real, others confirmed it works, but probably I have other leaks. 😆 |
canvas_state_t held four cairo_pattern_t* members (fillPattern, strokePattern, fillGradient, strokeGradient) but the destructor only freed fontDescription. Every Context2d that ended life with a non-null pattern leaked the pattern. The copy constructor copied the raw pointers, so save()/restore() shared the same pattern across states without refcounting — making any per-state destroy unsafe. Switch to proper cairo refcount discipline: - canvas_state_t copy ctor + new operator= reference each pattern. - ~canvas_state_t releases all four patterns. - New setFillPattern / setStrokePattern / setFillGradient / setStrokeGradient helpers replace the previous pattern (refcount--) before storing the new one (refcount++). clearFillPattern / clearStrokePattern release both pattern and gradient when switching to a solid color. Also fix two correctness issues exposed by prompt finalization (NAPI_EXPERIMENTAL path from PR Automattic#2562): - _fillStyle and _strokeStyle now use Reset(obj, 1) (strong ref) so the JS gradient/pattern object stays alive while the Context2d's state still references its native cairo_pattern_t. - ~Context2d pops all states before destroying _layout and _context so canvas_state_t destructors run with valid context. - resetState clears all states (not just pop()) and re-points 'state' at the freshly emplaced top.
|
As it did not seem to fix your issues @iurisilvio, I wanted to add what exactly we have used for our testing, in case someone else needs the info. |
Fix memory leak from deferred N-API weak reference callbacks
Fixes #2436
Problem
Since the NAN -> N-API migration (v3.0.0),
ObjectWrapdestructors (which free cairo surfaces) are deferred to the nextSetImmediateinstead of running during GC. In server environments where the event loop is always busy, these destructors never fire and memory grows indefinitely.This was reported upstream in nodejs/node-addon-api#1140 and fixed in nodejs/node#42651 by introducing
node_api_nogc_finalize, a finalizer that runs directly during GC for native-only cleanup.Fix
node-addon-apifrom 7.x to 8.x (which usesnode_api_nogc_finalizeforObjectWrap::FinalizeCallback)NAPI_EXPERIMENTALtobinding.gypdefines (required to enableNODE_API_EXPERIMENTAL_HAS_POST_FINALIZERin Node's headers)No code changes,
node-addon-api8.x automatically routesObjectWrapdestructor callbacks through the nogc finalizer path when the flag is set.I don't fully understand the impact of
node-addon-apiupgrade or enablingNAPI_EXPERIMENTAL.Before / After
200 requests decoding 3000x2000 JPEG images into thumbnails: