webrtc: investigate resource usage #1917

Closed
Tracked by #2656
marten-seemann opened this issue Nov 22, 2022 · 28 comments
Labels
P0 Critical: Tackled by core team ASAP

Comments

@marten-seemann
Contributor

@ckousik reports that running WebRTC is expensive, and a significant amount of CPU can be spent on handling a comparatively small number of WebRTC connections.

We should investigate how expensive it actually is. Our current DoS mitigation assumes that handling connections doesn't come with a large overhead in terms of CPU, and that the main cost is imposed by handling streams, in particular by the memory consumed on streams. This might not hold true for WebRTC.

If that's the case, we'll need to impose strict limits on the number of concurrent WebRTC connections that a node is willing to handle.
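
For illustration, here is a minimal sketch (not a committed design) of capping connection counts with go-libp2p's resource manager; the limit values are placeholder assumptions, and a WebRTC-specific cap would need extra plumbing since these limits are system-wide:

```go
package main

import (
	"log"

	"github.com/libp2p/go-libp2p"
	rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
)

func main() {
	// Start from the default scaling limits and tighten the system-wide
	// connection caps. The numbers are placeholders, not recommendations.
	limits := rcmgr.DefaultLimits
	limits.SystemBaseLimit.Conns = 128
	limits.SystemBaseLimit.ConnsInbound = 64

	rm, err := rcmgr.NewResourceManager(rcmgr.NewFixedLimiter(limits.AutoScale()))
	if err != nil {
		log.Fatal(err)
	}

	h, err := libp2p.New(libp2p.ResourceManager(rm))
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()
	log.Println("host up with capped connection limits:", h.ID())
}
```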

@BigLep
Contributor

BigLep commented Dec 5, 2022

@ckousik : can we link to the benchmarks that were run?

@ddimaria

@ckousik is picking this up today

@ddimaria

ddimaria commented Jan 4, 2023

@ckousik built out the test for this. He was blocked while waiting for hosted resources, but was given the green light from @p-shahi to acquire our own instances over the break, so he'll pick this work back up this week.

@BigLep
Contributor

BigLep commented Jan 7, 2023

Just curious what the status is here.

@p-shahi
Member

p-shahi commented Jan 7, 2023

I don't think it's all that different from @ddimaria's update.

@p-shahi p-shahi added the P0 Critical: Tackled by core team ASAP label Jan 7, 2023
@ddimaria

ddimaria commented Jan 7, 2023

@ckousik, can you provide an update here? @marten-seemann: so that the output matches up with the transport metrics you've collected in the past, can you point @ckousik to that output?

@BigLep
Contributor

BigLep commented Jan 13, 2023

What are the "done" criteria here?

@ckousik : what did you have in mind?
@marten-seemann : what are you expecting?

@ckousik
Contributor

ckousik commented Jan 17, 2023

Libp2p WebRTC comparitive performance tests Datadog.pdf

@BigLep
Contributor

BigLep commented Jan 18, 2023

@ckousik : thanks for sharing. We're ultimately going to need @marten-seemann to weigh in.

A couple of things:

  1. Are there any next steps you'd recommend based on these findings?
  2. Was there anything that surprised you?
  3. It would be great to attach the notebook so someone can review the setup and confirm/verify the experiment methodology. (I personally don't have a Datadog account.)

@ckousik
Contributor

ckousik commented Jan 18, 2023

  1. As for next steps, Glen is working on optimisation passes over the current PR.
  2. A couple of things stood out to me:
  • Pion has issues with datachannel ID reuse. We have a workaround for this, but are holding off on investigating the issue in Pion. The corresponding issue in Pion can be found here: Failed to handle DCEP: invalid Message Type pion/webrtc#2258
  • Pion rate-limits the creation of streams: at most 16 SCTP streams can be awaiting acceptance at any given time. This value is hardcoded and not configurable, so we have to ramp up the number of connections and streams gradually (see the sketch after this list).
  3. @BigLep Glen is also going to be running the tests and verifying them. Is there anything you would prefer in place of Datadog?
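
To illustrate the consequence of that hardcoded limit, here is a minimal, hypothetical ramp-up sketch that keeps at most 16 stream opens outstanding at once; the `h`, `pid`, and `proto` values are assumed to come from the benchmark setup:

```go
import (
	"context"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// openStreamsRampedUp opens `total` streams while keeping at most 16 opens
// in flight, mirroring Pion's hardcoded cap on unaccepted SCTP streams.
func openStreamsRampedUp(ctx context.Context, h host.Host, pid peer.ID, proto protocol.ID, total int) {
	const maxInFlight = 16 // Pion's hardcoded, non-configurable limit
	sem := make(chan struct{}, maxInFlight)
	for i := 0; i < total; i++ {
		sem <- struct{}{} // block while 16 opens are already outstanding
		go func() {
			s, err := h.NewStream(ctx, pid, proto)
			<-sem // the open has resolved (accepted or failed); free the slot
			if err != nil {
				return
			}
			defer s.Close()
			// ... run the benchmark workload on s ...
		}()
	}
}
```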

@BigLep
Contributor

BigLep commented Jan 18, 2023

Thanks @ckousik .

For 3, the key thing is to enable reproducibility. This is twofold:

  1. Make it easy for others to spot-check the methodology. It's useful to confirm that the test parameters, configuration, etc. are what we expect.
  2. If someone looks at this issue six months from now, and assuming we're all gone, they should be able to understand how we arrived at these results. At a minimum, let's have a way to see the config/code that was used. Attaching an .ipynb notebook is fine, or a gist, etc. We just want to avoid the case of future folks not having access to your Datadog and then not being able to verify or understand what was executed.

@ckousik
Contributor

ckousik commented Jan 19, 2023

Sorry, I had misunderstood the notebook as a DataDog notebook. The test code is present here: https://github.com/little-bear-labs/libp2p-webrtc-bench

@p-shahi p-shahi added this to the WebRTC browser-to-server milestone Jan 19, 2023
@p-shahi
Member

p-shahi commented Jan 23, 2023

> The test code is present here: https://github.com/little-bear-labs/libp2p-webrtc-bench

Looks like this is now included in the PR itself, right? Maybe you can archive this repo?

> We have a workaround for this, but are holding off on investigating the issue in Pion.

Was this workaround in the go-libp2p PR (I couldn't find a reference to pion/webrtc#2258 in a comment) or elsewhere? Can you link to it?

@ckousik
Contributor

ckousik commented Jan 24, 2023

@p-shahi p-shahi linked a pull request Feb 2, 2023 that will close this issue
@p-shahi
Member

p-shahi commented Feb 2, 2023

Resource usage investigation is being done as part of PR #1999.

@p-shahi
Member

p-shahi commented Mar 7, 2023

Status update: @GlenDC to provide Pion bottlenecks report sometime in the next two weeks

@p-shahi p-shahi removed this from the M1: WebRTC browser-to-server milestone Jun 28, 2023
@sukunrt
Member

sukunrt commented Feb 8, 2024

We have one fix in: pion/mdns#172
We still need to:

  1. Remove the closed datachannels from the pion/webrtc.PeerConnection object.
  2. Fix a memory leak in pion/sctp.

@sukunrt
Member

sukunrt commented Feb 9, 2024

Tracking issue for 1: pion/webrtc#2672

@sukunrt sukunrt mentioned this issue Feb 21, 2024
@sukunrt
Member

sukunrt commented Mar 13, 2024

For

> 1. Remove the closed datachannels from the pion/webrtc.PeerConnection object

the PR in webrtc decreases memory usage from 31 MB to 17 MB when running 10 connections and echoing 1 MB over 100 different streams (1 GB total data transferred). The rest of the memory use is fixed-size allocations that would take more effort to reduce (1 MB buffers in sctp and 1 MB buffers for reads from ICE connections).

The benchmarks are in branch webrtc-bm-2.

memprofile_2_10_100.pb.gz
memprofile_10_100.pb.gz

@sukunrt
Member

sukunrt commented Mar 19, 2024

Setup: two EC2 instances, c5.xlarge (4 cores, 8 GB RAM), one in us-east-1 and one in us-west-2. Ping time is 60 ms. BDP is 40 MB assuming a 5 Gbps link.

  • All bandwidth numbers are in megabits per second (Mb/s). All buffer sizes are in bytes.

  • maxBufferedAmount here is the amount of data we write on the channel before waiting for an ACK.

  • Scenario: ping-pong 10 MB repeatedly on multiple streams. The total number of streams is conn × streams in the tables below.

  • CPU usage does not depend on the number of connections.

  • Per percent of CPU used we get roughly 6-8 Mb/s of throughput, so 3% CPU usage gives us roughly 25 Mb/s.

  • Single-stream throughput is limited by maxBufferedAmount. For a 100 kB buffered amount we get 12 Mb/s, and for 32 kB we get 3 Mb/s. This throughput drops further at higher latencies, since we write the same amount on the channel but wait longer for the ACK.

  • The maximum throughput we get is 550-600 Mb/s.

  • Roughly, 100 kB of buffer on either side translates to 12 Mb/s of throughput. So a 100 kB recv buffer has a max throughput of 12 Mb/s, and a 1 MB recv buffer 130 Mb/s; a 100 kB send buffer has a throughput of 12 Mb/s, and a 32 kB send buffer 3 Mb/s.
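
These ceilings match simple window-limited arithmetic: max throughput ≈ buffer size / RTT. With the 60 ms ping time above, a 100 kB buffer gives 100 kB × 8 / 0.06 s ≈ 13 Mb/s, a 1 MB buffer ≈ 133 Mb/s, and a 32 kB buffer ≈ 4 Mb/s, in line with the observed ~12, ~130, and ~3 Mb/s.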

  1. recv buf: 1 MB, maxBufferedAmount: 32 kB
  • On a single stream we get 3 Mb/s throughput
  • On a single connection we can get a maximum throughput of 130 Mb/s

| conn | streams | Mb/s | CPU (%) |
|------|---------|------|---------|
| 1 | 1 | 3 | 1 |
| 1 | 10 | 40 | 6 |
| 1 | 40 | 130 | 19 |
| 1 | 100 | 130 | 20 |
| 2 | 1 | 6 | 1 |
| 2 | 40 | 265 | 38 |
| 10 | 1 | 30 | 4.5 |
| 10 | 10 | 280 | 50 |
| 20 | 10 | 350 | 55 |
| 50 | 1 | 155 | 20 |
| 100 | 5 | 500 | 70 |
  2. recv buf: 100 kB, maxBufferedAmount: 32 kB
  • On a single stream we get 3 Mb/s throughput
  • On a single connection we get a maximum throughput of 12 Mb/s

| conn | streams | Mb/s | CPU (%) |
|------|---------|------|---------|
| 1 | 1 | 3 | 1 |
| 1 | 20 | 12 | 3 |
| 10 | 10 | 120 | 13 |
| 20 | 10 | 240 | 27 |
| 20 | 20 | 240 | 26 |
| 40 | 10 | 450 | 50 |
| 60 | 10 | 600 | 65 |
| 70 | 10 | 600 | 75 |
  3. recv buf: 1 MB, maxBufferedAmount: 100 kB
  • On a single stream we get 12 Mb/s throughput
  • On a single connection we get a maximum throughput of 130 Mb/s

| conn | streams | Mb/s | CPU (%) |
|------|---------|------|---------|
| 1 | 1 | 12 | 2 |
| 1 | 10 | 120 | 18 |
| 1 | 50 | 140 | 19 |
| 2 | 10 | 260 | 33 |
| 4 | 10 | 450 | 70 |
| 5 | 2 | 130 | 17 |
| 5 | 50 | 580 | 77 |
| 10 | 10 | 450 | 75 |
| 30 | 1 | 400 | 45 |
| 40 | 1 | 520 | 57 |
| 50 | 1 | 520 | 62 |
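
For reference, a minimal sketch of the per-stream ping-pong loop in this scenario (the real benchmark lives in the webrtc-bm-2 branch; the function shape and throughput accounting here are assumptions):

```go
import (
	"io"
	"time"

	"github.com/libp2p/go-libp2p/core/network"
)

// pingPong repeatedly writes a 10 MB message on the stream and reads the
// echo back, returning the measured one-way throughput in Mb/s.
func pingPong(s network.Stream, rounds int) (float64, error) {
	const msgSize = 10 << 20 // 10 MB per round, as in the scenario above
	buf := make([]byte, msgSize)
	start := time.Now()
	for i := 0; i < rounds; i++ {
		if _, err := s.Write(buf); err != nil {
			return 0, err
		}
		if _, err := io.ReadFull(s, buf); err != nil {
			return 0, err
		}
	}
	bits := float64(rounds) * msgSize * 8
	return bits / time.Since(start).Seconds() / 1e6, nil
}
```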

@sukunrt
Member

sukunrt commented Mar 19, 2024

I quite like the idea of a 100 kB receive buffer. The performance seems acceptable for now, and the peer can enqueue 10x less data on the SCTP layer.

@SgtPooki
Member

> Per percent of CPU used we get roughly 6-8 Mb/s of throughput, so 3% CPU usage gives us roughly 25 Mb/s.
>
> Single-stream throughput is limited by maxBufferedAmount. For a 100 kB buffered amount we get 12 Mb/s, and for 32 kB we get 3 Mb/s. This throughput drops further at higher latencies, since we write the same amount on the channel but wait longer for the ACK.

Does this mean we can limit CPU usage by using maxBufferedAmount, or is there a better way to accomplish that?

@MarcoPolo
Collaborator

Applications can limit their throughput if they want to limit CPU usage. For now we aren't going to expose maxBufferedAmount to the go-libp2p user.

@MarcoPolo
Collaborator

The source code for @sukunrt's test is here: https://github.com/libp2p/go-libp2p/tree/webrtc-echobm/examples/echobm

> recv buf: 100 kB, maxBufferedAmount: 32 kB

I agree that this one seems like a good option. It is fast enough while using little CPU. If use cases appear that need to optimize past 12 Mb/s on WebRTC connections we can expose an option to tune the recv buffer and the maxBufferedAmount. But defaulting to conservative resource usage seems better to me.

@MarcoPolo
Collaborator

I think we can close this issue, as @sukunrt's report and code are enough.

@sukunrt
Member

sukunrt commented Mar 20, 2024

@SgtPooki the better way to do that is to limit the SCTP receive buffer to 100 kB, as this affects all streams on a connection.

Apologies for the poor naming; I used the term we use in the code. maxBufferedAmount is the send buffer we have per stream. Increasing it increases per-stream throughput, and with it the CPU used, since higher throughput costs more CPU. Changing this number alone still won't increase the per-connection (sum of all streams) CPU used, because that is limited by the receive buffer, which is shared across all streams. The receive buffer is shared because SCTP doesn't have per-stream flow control.
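
To make the send side concrete, here is a minimal sketch of per-stream send buffering using pion/webrtc's buffered-amount API, the mechanism maxBufferedAmount refers to; the 32 kB constant and the surrounding function are illustrative assumptions:

```go
import "github.com/pion/webrtc/v3"

const maxBufferedAmount = 32 * 1024 // per-stream send buffer (32 kB)

// sendWithBackpressure stops queueing once the datachannel has
// maxBufferedAmount bytes outstanding and resumes when the
// buffered-amount-low callback fires (i.e. once ACKs drain the queue).
func sendWithBackpressure(dc *webrtc.DataChannel, chunks <-chan []byte) error {
	drained := make(chan struct{}, 1)
	dc.SetBufferedAmountLowThreshold(maxBufferedAmount / 2)
	dc.OnBufferedAmountLow(func() {
		select {
		case drained <- struct{}{}:
		default:
		}
	})
	for chunk := range chunks {
		if dc.BufferedAmount() > maxBufferedAmount {
			<-drained // wait for queued data to drop below the threshold
		}
		if err := dc.Send(chunk); err != nil {
			return err
		}
	}
	return nil
}
```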

@vyzo
Contributor

vyzo commented Mar 20, 2024

This should be done through the resource manager; don't hardcode values, please.

@sukunrt
Member

sukunrt commented Mar 20, 2024

That's a much larger change, since the underlying SCTP connection's receive buffer is a fixed-size buffer and doesn't support window updates.
