Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leak in parse #119

Open
alanshaw opened this issue Aug 21, 2023 · 2 comments
Open

Possible memory leak in parse #119

alanshaw opened this issue Aug 21, 2023 · 2 comments

Comments

@alanshaw
Copy link
Member

Parsing a stream of ndjson entries causes OOM in Node.js 20:

<--- Last few GCs --->

[92603:0x158008000]   106289 ms: Mark-Compact 4050.2 (4134.2) -> 4035.5 (4135.5) MB, 924.04 / 0.00 ms  (average mu = 0.116, current mu = 0.034) allocation failure; scavenge might not succeed
[92603:0x158008000]   107247 ms: Mark-Compact 4051.7 (4135.7) -> 4036.7 (4136.7) MB, 927.25 / 0.00 ms  (average mu = 0.076, current mu = 0.033) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x1002d0588 node::Abort() [/usr/local/bin/node]
 2: 0x1002d0770 node::ModifyCodeGenerationFromStrings(v8::Local<v8::Context>, v8::Local<v8::Value>, bool) [/usr/local/bin/node]
 3: 0x10045233c v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/usr/local/bin/node]
 4: 0x1006269f4 v8::internal::Heap::GarbageCollectionReasonToString(v8::internal::GarbageCollectionReason) [/usr/local/bin/node]
 5: 0x10062a8a8 v8::internal::Heap::CollectGarbageShared(v8::internal::LocalHeap*, v8::internal::GarbageCollectionReason) [/usr/local/bin/node]
 6: 0x10062730c v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [/usr/local/bin/node]
 7: 0x100625094 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 8: 0x10061bce8 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0x10061c548 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0x10060154c v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
11: 0x1009e8f5c v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
12: 0x100d48c44 Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit [/usr/local/bin/node]
13: 0x100cf7038 Builtins_AsyncFunctionAwaitUncaught [/usr/local/bin/node]
14: 0x10606df24 
15: 0x100cf7210 Builtins_AsyncFunctionAwaitResolveClosure [/usr/local/bin/node]
16: 0x100da4fb8 Builtins_PromiseFulfillReactionJob [/usr/local/bin/node]
17: 0x100ce6b94 Builtins_RunMicrotasks [/usr/local/bin/node]
18: 0x100cbe3f4 Builtins_JSRunMicrotasksEntry [/usr/local/bin/node]
19: 0x100593f7c v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/usr/local/bin/node]
20: 0x100594468 v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/usr/local/bin/node]
21: 0x100594644 v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*) [/usr/local/bin/node]
22: 0x1005bb700 v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) [/usr/local/bin/node]
23: 0x1005bbe9c v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) [/usr/local/bin/node]
24: 0x1004bff34 v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) [/usr/local/bin/node]
25: 0x1004bf62c v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
26: 0x100d48b24 Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit [/usr/local/bin/node]
27: 0x1061b1548 
28: 0x100cbe50c Builtins_JSEntryTrampoline [/usr/local/bin/node]
29: 0x100cbe1f4 Builtins_JSEntry [/usr/local/bin/node]
30: 0x100593fa4 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/usr/local/bin/node]
31: 0x1005933f0 v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) [/usr/local/bin/node]
32: 0x10046dcc8 v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) [/usr/local/bin/node]
33: 0x100204d3c node::InternalCallbackScope::Close() [/usr/local/bin/node]
34: 0x10020501c node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) [/usr/local/bin/node]
35: 0x10021b46c node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) [/usr/local/bin/node]
36: 0x100399254 node::StreamBase::CallJSOnreadMethod(long, v8::Local<v8::ArrayBuffer>, unsigned long, node::StreamBase::StreamBaseJSChecks) [/usr/local/bin/node]
37: 0x10039a8c0 node::EmitToJSStreamListener::OnStreamRead(long, uv_buf_t const&) [/usr/local/bin/node]
38: 0x10039eba0 node::LibuvStreamWrap::OnUvRead(long, uv_buf_t const*) [/usr/local/bin/node]
39: 0x10039f324 node::LibuvStreamWrap::ReadStart()::$_1::__invoke(uv_stream_s*, long, uv_buf_t const*) [/usr/local/bin/node]
40: 0x100ca7f48 uv__stream_io [/usr/local/bin/node]
41: 0x100caf84c uv__io_poll [/usr/local/bin/node]
42: 0x100c9dd38 uv_run [/usr/local/bin/node]
43: 0x100205754 node::SpinEventLoopInternal(node::Environment*) [/usr/local/bin/node]
44: 0x100310984 node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) [/usr/local/bin/node]
45: 0x10031072c node::NodeMainInstance::Run() [/usr/local/bin/node]
46: 0x10029c4b0 node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResultImpl const*) [/usr/local/bin/node]
47: 0x10029c7d0 node::Start(int, char**) [/usr/local/bin/node]
48: 0x105e250f4 
@rvagg
Copy link
Member

rvagg commented Aug 22, 2023

interesting! do you happen to have a good source of data I could reproduce this with?

@rvagg
Copy link
Member

rvagg commented Aug 22, 2023

Here's my attempt at reproducing something approximating what you describe:

import { garbage } from '@ipld/garbage'
import * as json from '@ipld/dag-json'

const objTargetSize = 1024 * 100 // 100k

const utf8Dec = new TextDecoder()
const utf8Enc = new TextEncoder()

function nfmt (n) {
  return Intl.NumberFormat().format(n)
}

let lastGbg
const src = {
  async * [Symbol.asyncIterator] () {
    while (true) {
      await new Promise((resolve) => setImmediate(resolve))
      lastGbg = garbage(objTargetSize)
      yield utf8Dec.decode(json.encode(lastGbg))
    }
  }
}

let c = 0
let byts = 0

for await (const line of src) {
  let obj
  try {
    obj = json.decode(utf8Enc.encode(line))
  } catch (err) {
    if (err.message === 'Invalid encoded CID form' ||
      err.message === 'To parse non base32 or base58btc encoded CID multibase decoder must be provided' ||
      err.message === 'Non-base58btc character') {
      // @ipld/garbage may throw up an object with `{'/':'whatever'}` that @ipld/dag-json won't bork at encoding (yet)
      continue
    }
    console.log('Got error:')
    console.log(err.message)
    console.log('On content:')
    console.log(line)
    console.log('Object:')
    console.log(lastGbg)
  }
  if (obj === undefined) {
    throw new Error('undefined')
  }
  byts += line.length
  if (++c % 1000 === 0) {
    console.log('%s lines / %s Mb processed', nfmt(c), nfmt(Math.floor(byts / 1024 / 1024)))
  }
}

But running it (Node.js v20.5.1) just seems to hover somewhere above 100k in memory. I'll leave it running a bit longer but as of writing I've got up to ~800k lines / 50 Gb worth of decoding. Can you help tweak this to replicate your environment a bit better @alanshaw?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants