
CSV importer

It imports a CSV file into a database.

Large CSV files are supported; the importer is optimised for speed.

Install as a command-line tool

go install github.com/olbrichattila/gocsvimporter/cmd/csvimporter@latest

How it works:

  • Analyses the CSV file and determines the field types (see the sketch after this list)
  • Creates the table, dropping it if it already exists
  • Imports the data
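
For intuition, here is a minimal sketch of the kind of type inference the analysis step performs. It assumes a simple INTEGER → REAL → TEXT widening scheme; the importer's actual rules and type names may differ:

package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strconv"
)

// widen returns the narrowest SQL type that still fits both the current
// guess and the new value, widening INTEGER -> REAL -> TEXT.
func widen(current, value string) string {
	if value == "" || current == "TEXT" {
		return current // empty cells don't change the guess
	}
	if _, err := strconv.ParseInt(value, 10, 64); err == nil {
		if current == "REAL" {
			return "REAL" // a REAL column stays REAL even for integer values
		}
		return "INTEGER"
	}
	if _, err := strconv.ParseFloat(value, 64); err == nil {
		return "REAL"
	}
	return "TEXT"
}

func main() {
	f, err := os.Open("data.csv")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	header, err := r.Read() // first row holds the column names
	if err != nil {
		panic(err)
	}
	types := make([]string, len(header))
	for {
		row, err := r.Read()
		if err != nil {
			break // io.EOF (or a parse error) ends the scan
		}
		for i, v := range row {
			types[i] = widen(types[i], v)
		}
	}
	for i, name := range header {
		if types[i] == "" {
			types[i] = "TEXT" // column was entirely empty
		}
		fmt.Printf("%s %s\n", name, types[i])
	}
}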

Import modes differ per database type; I've tried to find the best settings for each of them.

The import can run:

  • with or without a transaction
  • with batched SQL or one INSERT per row (see the sketch after this list)
  • over a single connection or multiple connections
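
To illustrate the batched-SQL mode, here is a minimal sketch of a batch INSERT builder using "?" placeholders (MySQL/SQLite style). This is an assumption for illustration, not the importer's actual code; placeholder syntax and settings differ per database:

package main

import (
	"fmt"
	"strings"
)

// batchInsertSQL builds one statement that inserts `rows` rows at once:
// INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ...
func batchInsertSQL(table string, cols []string, rows int) string {
	ph := make([]string, len(cols))
	for i := range ph {
		ph[i] = "?"
	}
	group := "(" + strings.Join(ph, ", ") + ")"
	groups := make([]string, rows)
	for i := range groups {
		groups[i] = group
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(cols, ", "), strings.Join(groups, ", "))
}

func main() {
	fmt.Println(batchInsertSQL("vehicles", []string{"make", "model"}, 3))
	// INSERT INTO vehicles (make, model) VALUES (?, ?), (?, ?), (?, ?)
}

One round trip per batch instead of one per row is where most of the speedup in batch insert mode comes from.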

Currently the following database types are supported:

  • SQLite
  • MySQL
  • PostgreSQL
  • Firebird SQL

Usage: go run . <params> or ./csvimporter <params>

Example:

go run . data.csv vehicles ";"
./csvimporter data.csv vehicles ";"

Parameters

  1. CSV file name
  2. Table name to create
  3. The CSV delimiter (in quotes). This parameter is optional; if not set, "," is used

Database settings

Create a file called .env.csvimporter next to the application, or export the variables, for example export DB_CONNECTION=sqlite

Usage:

csvimporter source.csv desttablename ";"

where ";" is csv separator, and this parameter is optional, if not set then the delimiter is a comma ","

Examples:

SQLite

DB_CONNECTION=sqlite
DB_DATABASE=./database/database.sqlite

MySQL

DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3306
DB_DATABASE=migrator
DB_USERNAME=migrator
DB_PASSWORD=H8E7kU8Y

PostgreSQL

DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=postgres
DB_USERNAME=postgres
DB_PASSWORD=postgres
DB_SSLMODE=disable

Firebird SQL

DB_CONNECTION=firebird
DB_HOST=127.0.0.1
DB_PORT=3050
DB_DATABASE=/firebird/data/employee.fdb
DB_USERNAME=SYSDBA
DB_PASSWORD=masterkey
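
For orientation, here is a minimal sketch of how these variables might map to a Go database/sql driver name and DSN, assuming the common drivers (go-sql-driver/mysql, lib/pq, mattn/go-sqlite3); the importer's internal mapping may differ:

package main

import (
	"fmt"
	"os"
)

// dsn maps the .env variables onto a driver name and connection string.
func dsn() (driver, source string) {
	host := os.Getenv("DB_HOST")
	port := os.Getenv("DB_PORT")
	name := os.Getenv("DB_DATABASE")
	user := os.Getenv("DB_USERNAME")
	pass := os.Getenv("DB_PASSWORD")

	switch os.Getenv("DB_CONNECTION") {
	case "mysql":
		// go-sql-driver/mysql format: user:pass@tcp(host:port)/dbname
		return "mysql", fmt.Sprintf("%s:%s@tcp(%s:%s)/%s", user, pass, host, port, name)
	case "pgsql":
		// lib/pq key=value format
		return "postgres", fmt.Sprintf(
			"host=%s port=%s user=%s password=%s dbname=%s sslmode=%s",
			host, port, user, pass, name, os.Getenv("DB_SSLMODE"))
	default:
		// sqlite: DB_DATABASE is simply the file path
		return "sqlite3", name
	}
}

func main() {
	driver, source := dsn()
	fmt.Println(driver, source) // pass these to sql.Open(driver, source)
}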

Optional parameters in .env.csvimporter

  • BATCH_SIZE sets how many rows are sent to the database engine per insert SQL (Firebird does not support it, so it is ignored there)
  • MAX_CONNECTION_COUNT sets the maximum number of connections established to the database at the same time (SQLite does not support multiple connections)

The default batch size is 100; the default max connection count is 10.

If the values are set incorrectly (not a number), the importer falls back to the default values (see the sketch after this block).

  • BATCH_INSERT can be set to ON or OFF; this overrides the default configuration for the database type (ignored for Firebird)
  • MULTIPLE_CONNECTIONS can be set to ON or OFF; this overrides the default configuration for the database type (ignored for SQLite)
  • TRANSACTIONAL can be set to ON or OFF; this overrides the default configuration for the database type

BATCH_SIZE=500
MAX_CONNECTION_COUNT=25
BATCH_INSERT=on
MULTIPLE_CONNECTIONS=on
TRANSACTIONAL=off
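
The fallback behaviour for the numeric values can be pictured with a small helper like this (a hypothetical sketch, not the importer's actual code):

package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads an integer environment variable, returning def when the
// variable is unset or is not a valid number.
func envInt(key string, def int) int {
	v, err := strconv.Atoi(os.Getenv(key))
	if err != nil {
		return def
	}
	return v
}

func main() {
	batchSize := envInt("BATCH_SIZE", 100)         // default 100
	maxConns := envInt("MAX_CONNECTION_COUNT", 10) // default 10
	fmt.Println(batchSize, maxConns)
}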

Speed analysis with 2 million rows:

12 Columns: (Index, Customer Id, First Name, Last Name, Company, City, Country, Phone 1, Phone 2, Email, Subscription Date, Website)

Running on: Ubuntu Linux, Dell Inc. OptiPlex 7010, Intel® Core™ i7-3770S CPU @ 3.10GHz × 8, SSD

SQLite

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in batch insert mode
1 Connection opened
1 Transaction started
Importing: 100% Active threads: [ ] 
Done
1 transactions committed
1 connections closed

Full Analysis time: 0 minutes 15 seconds
Full duration time: 0 minutes 36 seconds
Total: 0 minutes 52 seconds

MySQL

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in multiple threads mode
Running in batch insert mode
10 Connection opened
10 Transaction started
Importing: 100% Active threads: [OOO OOOOOO] 
Done
10 transactions committed
10 connections closed

Full Analysis time: 0 minutes 15 seconds
Full duration time: 0 minutes 50 seconds
Total: 1 minutes 5 seconds

PostgreSQL

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in multiple threads mode
Running in batch insert mode
10 Connection opened
10 Transaction started
Importing: 100% Active threads: [OOO OOOOOO] 
Done
10 transactions committed
10 connections closed

Full Analysis time: 0 minutes 15 seconds
Full duration time: 0 minutes 28 seconds
Total: 0 minutes 43 seconds

Firebird

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in multiple threads mode

10 Connection opened
10 Transaction started
Importing: 100% Active threads: [OOOOOOOOOO] 
Done
10 transactions committed
10 connections closed

Full Analysis time: 0 minutes 16 seconds
Full duration time: 5 minutes 26 seconds
Total: 5 minutes 42 seconds

Make targets

## Some test imports
make vehicles
make customers

## Switch between test environments
make switch-sqlite
make switch-mysql
make switch-pgsql
make switch-firebird

Test locally:

There is a docker folder:

cd docker
docker-compose up -d

TODO: Notes

  • Code cleanup

Next steps, possible improvements

  • Make it distributable: when analysing the file, record file-pointer offsets and split the import up by distributing the application across pods, servers, or virtual machines. Each instance would open the CSV read-only in file-shared mode and push rows to the database engine from its designated start offset to its designated end offset. In theory this could also work on one machine, processing the file in goroutines, one per block; the distributed importers would work the same way (see the sketch below).
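
As a rough in-process sketch of that block-splitting idea (assumed names throughout, not part of the current importer): split the file into byte ranges aligned to line breaks, then let each worker open its own read-only handle and consume only its range. Distributed importers would do the same, with start/end offsets handed to each pod or machine.

package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"sync"
)

// chunkOffsets returns n start offsets, each advanced past a newline so
// that no row is split between two workers.
func chunkOffsets(path string, n int) ([]int64, int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, 0, err
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return nil, 0, err
	}
	size := info.Size()
	offsets := []int64{0}
	buf := make([]byte, 1)
	for i := 1; i < n; i++ {
		pos := size * int64(i) / int64(n)
		f.Seek(pos, io.SeekStart)
		for { // advance to the byte after the next newline
			if _, err := f.Read(buf); err != nil || buf[0] == '\n' {
				break
			}
			pos++
		}
		offsets = append(offsets, pos+1)
	}
	return offsets, size, nil
}

func main() {
	const workers = 4
	offsets, size, err := chunkOffsets("data.csv", workers)
	if err != nil {
		panic(err)
	}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		start, end := offsets[i], size
		if i+1 < workers {
			end = offsets[i+1]
		}
		wg.Add(1)
		go func(id int, start, end int64) {
			defer wg.Done()
			f, err := os.Open("data.csv") // each worker has its own handle
			if err != nil {
				return
			}
			defer f.Close()
			f.Seek(start, io.SeekStart)
			r := bufio.NewReader(io.LimitReader(f, end-start))
			rows := 0
			for {
				line, err := r.ReadString('\n')
				if len(line) > 0 {
					rows++ // a real worker would parse and insert the row here
				}
				if err != nil {
					break // io.EOF ends this worker's range
				}
			}
			fmt.Printf("worker %d: %d rows\n", id, rows) // worker 0's count includes the header
		}(i, start, end)
	}
	wg.Wait()
}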