docs: add document of bulk copy (BULK INSERT)
Breeze0806 committed Oct 7, 2024
1 parent b6bcf6d commit ca5d6cd
Showing 2 changed files with 105 additions and 3 deletions.
50 changes: 49 additions & 1 deletion datax/plugin/writer/sqlserver/README.md
@@ -16,7 +16,7 @@ Based on your configured `writeMode`, it generates:

**Or**

- bulk copy, i.e., `insert bulk ...`, which behaves like `insert into` but is much faster. However, for unknown reasons, it currently cannot insert records containing null values.
- bulk copy, i.e., `BULK INSERT ...`, which behaves like `insert into` but is much faster. We recommend this write mode over `insert into ...`.

## Functionality Description

@@ -110,6 +110,54 @@ Describes the SQL Server table information.
- Required: No
- Default: insert

#### bulkOption

- Description: Configures bulk writes in `copyIn` mode by setting the options of `BULK INSERT`. For details, see [BULK INSERT](https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver16).
- Required: No
- Default Value: None

##### CheckConstraints

+ Description: Means `CHECK_CONSTRAINTS`. Specifies that all constraints on the target table or view must be checked during the bulk-import operation. Without the `CHECK_CONSTRAINTS` option, any CHECK and FOREIGN KEY constraints are ignored, and after the operation the constraints on the table are marked as not trusted.
+ Required: No
+ Default Value: None

##### FireTriggers

+ Description: Means `FIRE_TRIGGERS`. Specifies that any insert triggers defined on the destination table execute during the bulk-import operation. If triggers are defined for INSERT operations on the target table, they fire for every completed batch. If `FIRE_TRIGGERS` isn't specified, no insert triggers execute.
+ Required: No
+ Default Value: None

##### KeepNulls

+ Description: Means `KEEPNULLS`. Specifies that empty columns retain a null value during the bulk-import operation instead of having the columns' default values inserted.
+ Required: No
+ Default Value: None

##### KilobytesPerBatch

+ Description: Means `KILOBYTES_PER_BATCH`. Specifies the approximate number of kilobytes (KB) of data per batch as *kilobytes_per_batch*. By default, `KILOBYTES_PER_BATCH` is unknown.
+ Required: No
+ Default Value: None

##### RowsPerBatch

+ Description: Means `ROWS_PER_BATCH`. Indicates the approximate number of rows of data in the data file. By default, all the data in the data file is sent to the server as a single transaction, and the number of rows in the batch is unknown to the query optimizer. If you specify `ROWS_PER_BATCH` (with a value > 0), the server uses this value to optimize the bulk-import operation. The value specified for `ROWS_PER_BATCH` should be approximately the same as the actual number of rows.
+ Required: No
+ Default Value: None

##### Order

+ Description: Means `ORDER`. Specifies how the data in the data file is sorted. Bulk import performance improves if the imported data is sorted according to the clustered index on the table, if any. If the data file is sorted in a different order (that is, other than the order of a clustered index key), or if there's no clustered index on the table, the `ORDER` clause is ignored. The column names supplied must be valid column names in the destination table. By default, the bulk insert operation assumes the data file is unordered. For optimized bulk import, SQL Server also validates that the imported data is sorted.
+ Required: No
+ Default Value: None
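
The column names in `ORDER` must be valid columns of the destination table. A writer could therefore reject a bad `Order` setting before the import starts. The following Go sketch of such a pre-check is hypothetical: `validateOrder` is not part of go-etl. It assumes each `Order` entry has the form `column [ASC|DESC]` and that the database uses a case-insensitive collation.

```go
package main

import (
	"fmt"
	"strings"
)

// validateOrder is a HYPOTHETICAL pre-check: it verifies that each ORDER entry
// has the form "column [ASC|DESC]" and that the column exists in the table.
// It assumes a case-insensitive collation and does not handle bracket-quoted
// names that contain spaces.
func validateOrder(order []string, tableColumns []string) error {
	known := make(map[string]bool, len(tableColumns))
	for _, c := range tableColumns {
		known[strings.ToLower(c)] = true
	}
	for _, entry := range order {
		fields := strings.Fields(entry)
		if len(fields) == 0 || len(fields) > 2 {
			return fmt.Errorf("malformed ORDER entry %q", entry)
		}
		if len(fields) == 2 {
			dir := strings.ToUpper(fields[1])
			if dir != "ASC" && dir != "DESC" {
				return fmt.Errorf("invalid sort direction in %q", entry)
			}
		}
		if !known[strings.ToLower(fields[0])] {
			return fmt.Errorf("column %q not in destination table", fields[0])
		}
	}
	return nil
}

func main() {
	cols := []string{"id", "created_at", "name"}
	fmt.Println(validateOrder([]string{"id ASC", "created_at"}, cols)) // <nil>
	fmt.Println(validateOrder([]string{"missing DESC"}, cols))         // column "missing" not in destination table
}
```

This check only catches configuration mistakes early. Whether `ORDER` actually speeds up the import still depends on the clustered index, as described above.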

##### Tablock

+ Description: Means `TABLOCK`. Specifies that a table-level lock is acquired for the duration of the bulk-import operation. A table can be loaded concurrently by multiple clients if the table has no indexes and `TABLOCK` is specified. By default, locking behavior is determined by the table option **table lock on bulk load**. Holding a lock for the duration of the bulk-import operation reduces lock contention on the table and in some cases can significantly improve performance.
+ Required: No
+ Default Value: None

#### batchTimeout

- Description: Configures the timeout interval for each batch write operation. The format is a number followed by a unit, where the unit can be s for seconds, ms for milliseconds, or us for microseconds. If this interval is exceeded, the data is written immediately. Together with batchSize, this parameter tunes write performance.
58 changes: 56 additions & 2 deletions datax/plugin/writer/sqlserver/README_zh-CN.md
@@ -16,7 +16,8 @@ SQLServerWriter uses the query process defined in dbmswriter to call go-etl's custom

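**Or**

- bulk copy, i.e., `insert bulk ...`, which behaves like insert into and is faster than insert into; however, for unknown reasons it currently cannot insert records containing null values
- bulk copy, i.e., `BULK INSERT ...`, which behaves like insert into and is faster than insert into. We recommend this write mode over `insert into...`.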


## Functionality Description

@@ -46,6 +47,9 @@ SQLServerWriter uses the query process defined in dbmswriter to call go-etl's custom
"name":"mytable"
}
},
"bulkOption":{
"KeepNulls":true
},
"batchTimeout": "1s",
"batchSize":1000
}
@@ -114,10 +118,60 @@ SQLServerWriter uses the query process defined in dbmswriter to call go-etl's custom

#### writeMode

- Description: Write mode. `insert` means writing data with insert into, and `copyIn` means bulk copy insert.
- Required: No
- Default: insert

#### bulkOption

- Description: Mainly used for the bulk write configuration of `copyIn`; it sets the options of `BULK INSERT`.
- Required: No
- Default Value: None

##### CheckConstraints

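+ Description: Means `CHECK_CONSTRAINTS`. Specifies that all constraints on the target table or view must be checked during the bulk-import operation. If the `CHECK_CONSTRAINTS` option is not used, any CHECK and FOREIGN KEY constraints are ignored, and after the operation the constraints on the table are marked as not trusted.
+ Required: No
+ Default Value: None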

##### FireTriggers

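+ Description: Means `FIRE_TRIGGERS`. Specifies that any insert triggers defined on the target table execute during the bulk-import operation. If triggers are defined for INSERT operations on the target table, they fire for every completed batch. If `FIRE_TRIGGERS` is not specified, no insert triggers execute.
+ Required: No
+ Default Value: None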

##### KeepNulls

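+ Description: Means `KEEPNULLS`. Specifies that empty columns retain a null value during the bulk-import operation instead of having any default values inserted for the columns.
+ Required: No
+ Default Value: None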

##### KilobytesPerBatch

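+ Description: Means `KILOBYTES_PER_BATCH`. Specifies the approximate number of kilobytes (KB) of data per batch as *kilobytes_per_batch*. By default, `KILOBYTES_PER_BATCH` is unknown.
+ Required: No
+ Default Value: None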

##### RowsPerBatch

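+ Description: Means `ROWS_PER_BATCH`. Indicates the approximate number of rows of data in the data file. By default, all the data in the data file is sent to the server as a single transaction, and the query optimizer does not know the number of rows in the batch. If `ROWS_PER_BATCH` is specified (with a value greater than 0), the server uses this value to optimize the bulk-import operation. The value specified for `ROWS_PER_BATCH` should be approximately the same as the actual number of rows.
+ Required: No
+ Default Value: None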

##### Order

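+ Description: Means `ORDER`. Specifies how the data in the data file is sorted. Bulk import performance improves if the imported data is sorted according to the clustered index on the table (if any). If the data file is sorted in a different order (that is, not in the order of the clustered index key), or if there is no clustered index on the table, the `ORDER` clause is ignored. The column names supplied must be valid column names in the target table. By default, the bulk insert operation assumes the data file is unordered. For optimized bulk import, SQL Server also validates that the imported data is sorted.
+ Required: No
+ Default Value: None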

##### Tablock

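+ Description: Means `TABLOCK`. Specifies that a table-level lock is acquired for the duration of the bulk-import operation. A table can be loaded concurrently by multiple clients if the table has no indexes and `TABLOCK` is specified. By default, locking behavior is determined by the table option **table lock on bulk load**. Holding a lock for the duration of the bulk-import operation reduces lock contention on the table and in some cases can significantly improve performance.
+ Required: No
+ Default Value: None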

#### batchTimeout

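- Description: Mainly used to configure the timeout interval for each batch write. Format: number + unit, where the unit is s for seconds, ms for milliseconds, or us for microseconds. If this interval is exceeded, the data is written immediately. Together with batchSize, it tunes write performance.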