From f80fdf54ca684df059791d4b7d99f67b2729d000 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <josh.macdonald@gmail.com>
Date: Mon, 16 Sep 2024 14:26:55 -0700
Subject: [PATCH] README: on batching w/ otel-arrow

---
 exporter/otelarrowexporter/README.md | 105 +++++++++++++++++++++------
 1 file changed, 84 insertions(+), 21 deletions(-)

diff --git a/exporter/otelarrowexporter/README.md b/exporter/otelarrowexporter/README.md
index 4b3bcb0e4fd7..dab41b5a31dc 100644
--- a/exporter/otelarrowexporter/README.md
+++ b/exporter/otelarrowexporter/README.md
@@ -244,44 +244,107 @@ exporters:
 
 ### Batching Configuration
 
-This exporter includes a new, experimental `batcher` configuration for
-batching in the `exporterhelper` module, but this mode is disabled by
-default.  This batching support works when combined with
-`queue_sender` functionality.
+### Option 1: Batching with back-pressure
 
+Configuring an OpenTelemetry Collector pipeline for both batching and
+back-pressure requires a custom component, the Concurrent Batch Processor,
+which is available in the OTel-Arrow project repository. We have not
+included it in the Collector-Contrib repository because equivalent
+functionality is being added as a standard exporter-batcher mechanism, which
+is still experimental.
+
+When the [Concurrent Batch Processor](https://github.com/open-telemetry/otel-arrow/blob/main/collector/processor/concurrentbatchprocessor/README.md) is configured, parallel batches of data are
+exported with no limit on concurrency. Because concurrency is unbounded,
+receivers must apply memory limits or admission control. While this is our
+preferred configuration, it is just one of several reasonable setups. As an
+example configuration:
+
+```yaml
+exporters:
+  otelarrow:
+    # place gRPC, otel-arrow, retry, and timeout settings here
+    batcher:
+      enabled: false
+    sending_queue:
+      enabled: false
+receivers:
+  otelarrow:
+    # otelarrow supports OTLP and OTel-Arrow with admission control
+    admission:
+      request_limit_mib: 128
+processors:
+  concurrentbatch:
+service:
+  pipelines:
+    traces:
+      exporters: [otelarrow]
+      processors: [concurrentbatch]
+      receivers: [otelarrow]
 ```
+
+### Option 2: Batching with a persistent queue
+
+The OpenTelemetry Collector has a built-in persistent queue mechanism that
+supplies back-pressure governed by disk write speed. In this mode,
+batching is done after writing to the persistent queue, and the
+`num_consumers` field determines how many parallel batches of data are presented
+to the exporter. When `sending_queue` is enabled, set `num_consumers` to at
+least the number of OTel-Arrow streams, or higher to increase
+throughput.
+
+As an example configuration:
+
+```yaml
 exporters:
   otelarrow:
+    # place gRPC, otel-arrow, retry, and timeout settings here
     batcher:
 	  enabled: true
     sending_queue:
       enabled: true
+      num_consumers: 32
       storage: file_storage/otc
 extensions:
   file_storage/otc:
     directory: /var/lib/storage/otc
+receivers:
+  otlp:
+    protocols:
+      grpc:
+service:
+  extensions: [file_storage/otc]
+  pipelines:
+    traces:
+      exporters: [otelarrow]
+      receivers: [otlp]
 ```
 
-The built-in batcher is only recommended with a persistent queue,
-otherwise it cannot provide back-pressure to the caller.  If building
-a custom build of the OpenTelemetry Collector, we recommend using the
-[Concurrent Batch
-Processor](https://github.com/open-telemetry/otel-arrow/blob/main/collector/processor/concurrentbatchprocessor/README.md)
-to provide simultaneous back-pressure, concurrency, and batching
-functionality.  See [more discussion on this
-issue](https://github.com/open-telemetry/opentelemetry-collector/issues/10368).
+### Option 3: Batching without back-pressure
 
-```
+Instead of applying back-pressure, another option is to use an in-memory queue
+and return success to the caller as quickly as possible. As long as the
+exporter keeps up with the arriving data, nothing is dropped in this
+configuration; however, this setup is relatively fragile and more likely than
+the options above to lose telemetry data.
+
+As an example configuration:
+
+```yaml
 exporters:
   otelarrow:
+    # place gRPC, otel-arrow, retry, and timeout settings here
     batcher:
-	  enabled: false
+      enabled: true
     sending_queue:
-      enabled: false
-processors:
-  concurrentbatch:
-    send_batch_max_size: 1500
-    send_batch_size: 1000
-    timeout: 1s
-    max_in_flight_size_mib: 128
+      enabled: true
+      num_consumers: 32
+receivers:
+  otlp:
+    protocols:
+      grpc:
+service:
+  pipelines:
+    traces:
+      exporters: [otelarrow]
+      receivers: [otlp]
 ```