Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue using abortableSource() #1420

Closed
twoeths opened this issue Oct 10, 2022 · 11 comments
Closed

Performance issue using abortableSource() #1420

twoeths opened this issue Oct 10, 2022 · 11 comments
Labels
kind/discussion Topical discussion; usually not changes to codebase need/triage Needs initial labeling and prioritization P2 Medium: Good to have, but can wait until someone steps up

Comments

@twoeths
Copy link
Contributor

twoeths commented Oct 10, 2022

Severity:

High

Description:

The abortableSource() api is used across libp2p stacks and it takes more than 7.5% of lodestar cpu time, which causes the I/O lag issue in lodestar. I think this is too much considering this is not a core feature when running a node in lodestar.

yarn why v1.22.17
[1/4] 🤔  Why do we have the module "abortable-iterator"...?
[2/4] 🚚  Initialising dependency graph...
[3/4] 🔍  Finding dependency...
[4/4] 🚡  Calculating file sizes...
=> Found "abortable-iterator@4.0.2"
info Reasons this module exists
   - "_project_#@lodestar#beacon-node#@chainsafe#libp2p-gossipsub" depends on it
   - Hoisted from "_project_#@lodestar#beacon-node#@chainsafe#libp2p-gossipsub#abortable-iterator"
   - Hoisted from "_project_#@lodestar#beacon-node#@libp2p#mplex#abortable-iterator"
   - Hoisted from "_project_#@lodestar#beacon-node#@libp2p#tcp#abortable-iterator"
   - Hoisted from "_project_#@libp2p#peer-record#@libp2p#utils#abortable-iterator"
   - Hoisted from "_project_#@lodestar#beacon-node#libp2p#abortable-iterator"
   - Hoisted from "_project_#@lodestar#beacon-node#libp2p#@libp2p#multistream-select#abortable-iterator"
   - Hoisted from "_project_#@lodestar#beacon-node#@chainsafe#libp2p-gossipsub#@libp2p#pubsub#abortable-iterator"
info Disk size without dependencies: "120KB"
info Disk size with unique dependencies: "184KB"
info Disk size with transitive dependencies: "184KB"
info Number of shared dependencies: 2
✨  Done in 0.96s.

I suggest we need to find a more efficient abortable pattern and apply it everywhere

cc @wemeetagain @mpetrunic @dapplion

Steps to reproduce the error:

Run lodestar, monitor metrics and take a profile from there

@twoeths twoeths added the need/triage Needs initial labeling and prioritization label Oct 10, 2022
@twoeths twoeths changed the title Performance issue with Performance issue using abortableSource() Oct 10, 2022
@twoeths
Copy link
Contributor Author

twoeths commented Oct 10, 2022

attaching the detail of time used in aboratableSource() function of abortable-iterator package

@dapplion
Copy link
Contributor

A core value in Rust is "zero cost abstractions". As Javascript developers we must be mindful that in this language this is not the case. I worry that in the current libp2p implementation a single package requires a very high amount of iterator cycles, just for a single package.

More investigation is needed to understand the costs, but on a preliminary review it's a apparent that the amount of Promises and related objects per packet may be excessive.

@mpetrunic mpetrunic added kind/discussion Topical discussion; usually not changes to codebase P2 Medium: Good to have, but can wait until someone steps up labels Nov 8, 2022
achingbrain added a commit to libp2p/js-libp2p-mplex that referenced this issue Nov 25, 2022
Javascript abstractions are not free. While using pipe here looks nice, it adds a non-neglible cost allocating Promises for each extra `for await ()` iteration.

- Similar rationale to libp2p/js-libp2p#1420 (comment)

This PR merges sink pipe components in a single iteration

Co-authored-by: achingbrain <alex@achingbrain.net>
github-actions bot pushed a commit to libp2p/js-libp2p-mplex that referenced this issue Nov 25, 2022
## [7.0.6](v7.0.5...v7.0.6) (2022-11-25)

### Bug Fixes

* reduce async iterator loops per package in _createSink ([#224](#224)) ([e2a32ad](e2a32ad)), closes [/github.com/libp2p/js-libp2p/issues/1420#issuecomment-1273272662](https://github.com/libp2p//github.com/libp2p/js-libp2p/issues/1420/issues/issuecomment-1273272662)
achingbrain added a commit to libp2p/js-libp2p-tcp that referenced this issue Dec 13, 2022
Usage of `abortable-iterator` has a non-negligible cost, see libp2p/js-libp2p#1420 so remove it and close the socket directly when the abort signal fires.

Co-authored-by: achingbrain <alex@achingbrain.net>
@marcus-pousette
Copy link
Contributor

Did some investigation.

In the abortable-iterator library one is racing every promise with a promise that could potentially abort.

await Promise.race([iterator.next(), abortPromise])

If you, instead do it in batches of 50-100, like this,

await Promise.race([Promise.all(batch), abortPromise])

where batch is an array of iteration promises, you get abourt 50% improvement in performance, but you will have a promise that is running the background until the batch is fully resolved or rejected (even if iterator is aborted)

Even faster, if you consider that we necessarily don't need to cancel promises att all and do something like

async function* asyncIterableWithAbort<T>(signal: AbortSignal, iterable: AsyncIterable<T>): AsyncGenerator<T, void, undefined> {
  const iterator = iterable[Symbol.asyncIterator]();

  while (true) {
    const { done, value } = await iterator.next();

    if (done) {
      return;
    }

    yield value;

    if (signal.aborted) {
      return;
    }
  }
}

we can do a lot better, it would be more or less "zero cost" in this case.

Generally, in many cases where we use abortableItereator we are using it waiting for small pieces of data, which imo are not worth beeing able to cancel to save a millisecond or what not. Instead, overall, more time is saved if we remove the cancelable promise feature altogether and do the simplest kind of abortable iterator (above).

@marcus-pousette
Copy link
Contributor

marcus-pousette commented May 2, 2023

For @tuyennhv specific case though I don't think the bottleneck actually is abortable-iterator, because most of the time it is waiting for the next promise to resolve. Judging by the CPU times provided it looks like the noise encryption is what actually is the bottleneck. Perhaps try to optimize this instead? I have made a fork of chainsafe noise encryption where I utilize Node native crypto functions and WASM that gives around 5x improved speed for Node envs @dao-xyz/libp2p-noise

@wemeetagain
Copy link
Member

We discussed this issue in the libp2p triage call today.
The consensus is that we would definitely benefit from having a non-cancelable abortable source implementation that is very cheap to use.
We can then look at selectively using the cheaper version where it makes sense.

@achingbrain
Copy link
Member

async function* asyncIterableWithAbort<T>(signal: AbortSignal, iterable: AsyncIterable<T>): AsyncGenerator<T, void, undefined> {
  const iterator = iterable[Symbol.asyncIterator]();

  while (true) {
    const { done, value } = await iterator.next();

    if (done) {
      return;
    }

    yield value;

    if (signal.aborted) {
      return;
    }
  }
}

Thinking about this a bit more, the above won't work as it needs the promise returned from iterator.next() to resolve before it checks the abort signal.

Imagine the iterator yields a value, great, we check the abort signal and it's not aborted, then the iterator takes an hour to yield the next value, meanwhile the 'abort' event fires on the abort signal, we won't check the .aborted property until the next value arrives.

Eg:

async function delay (ms) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(), ms)
  })
}

async function * input () {
  yield 0
  await delay(10)
  yield 1
  await delay(10)
  yield 2
  await delay(10)
  yield 3
  await delay(1000)
  yield 4
  await delay(10)
  yield 5
}

async function * asyncIterableWithAbort(signal, iterable) {
  const iterator = iterable[Symbol.asyncIterator]();

  while (true) {
    const { done, value } = await iterator.next();

    if (done) {
      return;
    }

    yield value;

    if (signal.aborted) {
      return;
    }
  }
}

const signal = AbortSignal.timeout(100)
const start = Date.now()

for await (const val of asyncIterableWithAbort(signal, input())) {
  console.info(val)
}

console.info('took', Date.now() - start, 'ms')
% node test.js 
0
1
2
3
4
took 1041 ms

So it runs for 1041ms even though the timeout was 100ms.

Returning a new async generator and calling .return (or .throw) doesn't help as it still waits for the current promise to resolve before throwing:

async function * asyncIterableWithAbort(signal, iterable) {
  async function * output () {
    yield * iterable
  }

  const out = output()

  const onAbort = () => {
    out.return()
    // or better:
    // out.throw(new Error('Aborted!'))
  }

  signal.addEventListener('abort', onAbort)

  try {
    yield * out
  } finally {
    signal.removeEventListener('abort', onAbort)
  }
}
% node test.js
0
1
2
3
4
took 1040 ms

I wonder if we just want a simpler pattern. E.g. instead of:

import { abortableDuplex } from 'abortable-iterator'

async someStreamHandler (data: IncomingStreamData): Promise<void> {
  const signal = AbortSignal.timeout(timeout)
  const stream = abortableDuplex(data.stream, signal)

  await pipe(
    stream,
    (source) => {
      // do some work that we don't want to take forever
    }
  // ... more code here
}

something like:

async someStreamHandler (data: IncomingStreamData): Promise<void> {
  const signal = AbortSignal.timeout(timeout)
  signal.addEventListener('abort', () => {
    data.stream.abort(new Error('Timeout!'))
  })

  await pipe(
    data.stream,
    (source) => {
      // do some work that we don't want to take forever
    }
  // ... more code here
}

@marcus-pousette
Copy link
Contributor

I think you/we are talking about the same thing here. For promises that are expected to "do some work that we don't want to take forever" you can get away with not having cancellable promises, and can do a naive approach. And for cases where promises are not expected to resolve quickly and you need to be able to cancel them you might need a more robust solution.

I did a simple test with the @libp2p/mplex lib where I did a "fast" abortable source approach and only got a 5-10% improvement in the benchmark

But to return to @tuyennhv problem, I don't think the abortable-iterator is the problem, but the noise encryption that is creating an iterator that has a lot of "waiting", and for that I suggested a solution that might work quite well and might reduce the wait time with around 80%.

@p-shahi
Copy link
Member

p-shahi commented May 30, 2023

@marcus-pousette are you interested in taking this on?

Note:
@wemeetagain reports that this has showed up in Lodestar profiling

@marcus-pousette
Copy link
Contributor

marcus-pousette commented May 30, 2023

I am kind of working on this problem continuously already, but as stated above I am pretty sure that the bottleneck is not the javascript Promise.race/aborting, so I would argue the premise for this issue is wrong. I did try running my own project where I swapped out all abortable promises to a simpler one and the gains where not significant at all.

If you look at the profiling provided, the process that itself is most likely taking up most time is noise encryption. To that I suggested a solution @wemeetagain and Lodestar could try out to see how it improves the CPU-times. In summary, for me (and perhaps also for Lodestar) I would say areas you can improve are

  • Noise encryption (try out @dao-xyz/libp2p-noise to see if this will improve anything for you). I am currently also working on Wasm simd128 optimizations that could in the future be benificial if you are running things in the browser.

  • Message signing performance. Use Node native crypto if you are using Node.

  • To improve with the Libp2p stack: Reduce unnecessary allocations that happen when you merge two Uint8arrays. Currently we do a lot of reallocations because we need to insert some bytes before some Uint8array. Usually when sending/writing to streams we do:

    Allocate and serialize a message to an Uint8array -> Stream muxer encode with some header/frame (Usually results in a reallocation on concatination to fit both header and data) -> Cont...
    

    Instead we could do

    Allocate Uint8array that both fits the data and the muxing encoding -> Serialize message into the allocated array and write muxer encoding to the allocated array -> Cont...
    

@wemeetagain
Copy link
Member

(try out @dao-xyz/libp2p-noise to see if this will improve anything for you)

Will do. We're currently using @chainsafe/as-chacha20poly1305 and @chainsafe/as-sha256 assemblyscript / wasm implementations but will see if we can get any better results here.

I am currently also working on Wasm simd128 optimizations that could in the future be benificial if you are running things in the browser.

👀

@maschad
Copy link
Member

maschad commented Aug 29, 2023

Closing this given @marcus-pousette suggestion that these performance bottlenecks are related to the crypto tooling as opposed to abortableSource

I see that @wemeetagain has been testing this on ChainSafe/lodestar#5900 (comment)

I have created a new issue to address the discussion around refactoring abortable source usage.

@maschad maschad closed this as completed Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/discussion Topical discussion; usually not changes to codebase need/triage Needs initial labeling and prioritization P2 Medium: Good to have, but can wait until someone steps up
Projects
Archived in project
Development

No branches or pull requests

8 participants