Faster Transform stream #33397

Closed
ronag opened this issue May 14, 2020 · 7 comments
Comments

@ronag
Member

ronag commented May 14, 2020

Some thoughts on how to make Transform streams faster.

Part of the overhead of Transform (and PassThrough) is that it is actually two streams, a Writable and a Readable, each with its own buffering and state management, connected together.

We could try to skip this and implement Transform as a Readable that also implements the Writable interface, proxying the relevant method and property names.

e.g.

const { Readable } = require('stream')

class FastTransform extends Readable {
  constructor(options) {
    super(options)
    // Minimal stand-in for the full WritableState: only the fields the
    // writable side of this sketch actually needs.
    this._writableState = {
      length: 0,
      needDrain: false,
      ended: false,
      finished: false
    }
  }
  get writableEnded () {
    return this._writableState.ended
  }
  get writableFinished () {
    return this._writableState.finished
  }
  _read () {
    const rState = this._readableState
    const wState = this._writableState

    if (!wState.needDrain) {
      return
    }

    if (wState.length + rState.length > rState.highWaterMark) {
      return
    }

    wState.needDrain = false
    this.emit('drain')
  }
  write (chunk) {
    const rState = this._readableState
    const wState = this._writableState

    const len = chunk.length

    wState.length += len
    this._transform(chunk, null, (err, data) => {
      wState.length -= len
      if (err) {
        this.destroy(err)
      } else if (data != null) {
        this.push(data)
      }
      this._read()
    })

    wState.needDrain = wState.length + rState.length > rState.highWaterMark

    // write() returns false when the caller should wait for 'drain'.
    return !wState.needDrain
  }
  end (chunk) {
    const wState = this._writableState

    if (chunk != null) {
      this.write(chunk)
    }

    wState.ended = true
    if (this._flush) {
      this._flush((err, data) => {
        if (err) {
          this.destroy(err)
        } else {
          if (data != null) {
            this.push(data)
          }
          this.push(null)
          wState.finished = true
          this.emit('finish')
        }
      })
    } else {
      this.push(null)
      wState.finished = true
      this.emit('finish')
    }
  }
}

// TODO: Make Writable[Symbol.hasInstance] recognize `FastTransform`.
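
For illustration, a class built on this sketch could be used the same way as a regular Transform subclass; Upper below is just a hypothetical example, not part of the proposal:

class Upper extends FastTransform {
  _transform (chunk, encoding, callback) {
    // Hand back the transformed data; it is pushed to the readable side.
    callback(null, chunk.toString().toUpperCase())
  }
}

const t = new Upper()
t.on('data', (data) => console.log(data.toString()))
t.write('hello')
t.end()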

Making this fully backwards compatible with Transform might be difficult.

@ronag
Member Author

ronag commented May 14, 2020

Thoughts @nodejs/streams?

@mcollina
Member

How much will we gain from this? I'm +1, possibly with a better name.

@ronag
Member Author

ronag commented May 14, 2020

@mcollina: ~60%

$ time node fast-transform.js 

real    0m1.582s
user    0m1.737s
sys     0m0.157s

$ time node transform.js 

real    0m2.671s
user    0m2.683s
sys     0m0.069s

possibly with a better name.

FasterTransform? 😄

@ronag ronag self-assigned this May 14, 2020
@ronag ronag added the stream Issues and PRs related to the stream subsystem. label May 14, 2020
@targos
Member

targos commented May 14, 2020

How does it compare to transforming with an async generator function?

@ronag ronag mentioned this issue May 14, 2020
ronag added a commit to nxtedition/node that referenced this issue May 14, 2020
A faster version of Transform which avoids the Writable overhead.

Refs: nodejs#33397
@ronag
Member Author

ronag commented May 14, 2020

$ time node async-generator.js 
real    0m0.788s
user    0m0.772s
sys     0m0.075s

$ time node fast-transform.js 
real    0m1.612s
user    0m1.729s
sys     0m0.169s

So using an async generator in pipeline might be significantly faster. Is it worth further optimizing Transform? Note that an optimization like the one suggested here makes the code harder to maintain than today's rather trivial implementation.
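
For reference, the async-generator variant looks roughly like this; a minimal sketch of the pattern (not the benchmark script itself), using pipeline's support for async generator functions as transform steps:

const { pipeline, Readable, Writable } = require('stream')

pipeline(
  Readable.from(['a', 'b', 'c']),
  // The transform step is just an async generator function.
  async function * (source) {
    for await (const chunk of source) {
      yield chunk.toString().toUpperCase()
    }
  },
  new Writable({
    write (chunk, encoding, callback) {
      console.log(chunk.toString())
      callback()
    }
  }),
  (err) => {
    if (err) console.error(err)
  }
)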

@ronag
Member Author

ronag commented May 17, 2020

pipeline will convert a Readable source into an async iterator and read from it that way. That might introduce unnecessary overhead. I'll investigate that first.
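
For context, a Readable is async iterable, so what pipeline does with the source is roughly the following simplified sketch (not the actual pipeline internals):

const { Readable } = require('stream')

// Simplified sketch: reading a Readable through its async iterator,
// which is roughly how pipeline consumes a Readable source.
async function consume (readable) {
  for await (const chunk of readable) {
    // hand each chunk to the next step
    process.stdout.write(chunk)
  }
}

consume(Readable.from(['hello', ' ', 'world\n'])).catch(console.error)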

@clshortfuse
Contributor

clshortfuse commented Jun 16, 2020

Apparently I should be using pipeline. This does seem like a good replacement for PassThrough.

@ronag ronag closed this as completed Dec 29, 2020