Feature/2108 - csv parser #4439

Merged
27 commits merged on Aug 24, 2018. The diff shown includes changes from 11 of these commits.

Commits:
a6c1e2b unfinished csv parser (maxunt, Jul 17, 2018)
c839ce3 functionality for csv parser, still needs unit tests (maxunt, Jul 18, 2018)
3c8cb17 add unit tests for csv parser (maxunt, Jul 19, 2018)
4a07734 mess with config options (maxunt, Jul 19, 2018)
48210f5 fix unit tests (maxunt, Jul 19, 2018)
67f4929 change README (maxunt, Jul 19, 2018)
d24e687 unfinished test case for csv (maxunt, Jul 25, 2018)
e07ed58 fix type conversion and add unit test (maxunt, Jul 25, 2018)
edd8afc addresses greg and chris's comments (maxunt, Jul 27, 2018)
b5ff78f address some of greg+chris's comments. includes config for trimspace … (maxunt, Jul 27, 2018)
7704f3e get rid of grok changes on branch (maxunt, Jul 27, 2018)
83db721 Merge branch 'master' into feature/2108 (maxunt, Aug 17, 2018)
80135ee initial config changes (maxunt, Aug 20, 2018)
339670f Merge branch 'master' into feature/2108 (maxunt, Aug 20, 2018)
60761d7 additional config options (maxunt, Aug 20, 2018)
24e38f3 start to remove field_column config (maxunt, Aug 20, 2018)
fc36fd5 just broke a lot. lovely (maxunt, Aug 21, 2018)
6e7ec3e fixed it (maxunt, Aug 22, 2018)
0d7b236 address some of daniel's comments (maxunt, Aug 22, 2018)
20ed819 trim space manually (maxunt, Aug 22, 2018)
5016899 fix config (maxunt, Aug 22, 2018)
162b092 Merge branch 'master' into feature/2108 (maxunt, Aug 22, 2018)
86d353f Merge branch 'master' into feature/2108 (maxunt, Aug 23, 2018)
c058db6 Remerge data format docs (danielnelson, Aug 23, 2018)
4847a59 finally fixes hopefully. error checks in registry.go (maxunt, Aug 24, 2018)
b408ac4 tags are added before default tags (maxunt, Aug 24, 2018)
acc5ea7 fix tags being removed from fields (maxunt, Aug 24, 2018)
76 changes: 75 additions & 1 deletion docs/DATA_FORMATS_INPUT.md
@@ -10,6 +10,7 @@ Telegraf is able to parse the following input data formats into metrics:
1. [Collectd](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#collectd)
1. [Dropwizard](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#dropwizard)
1. [Grok](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#grok)
1. [CSV](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#csv)

Telegraf metrics, like InfluxDB
[points](https://docs.influxdata.com/influxdb/v0.10/write_protocols/line/),
@@ -761,4 +762,77 @@ HTTPD_ERRORLOG %{HTTPD20_ERRORLOG}|%{HTTPD24_ERRORLOG}
## 2. "Canada/Eastern" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
## 3. UTC -- or blank/unspecified, will return timestamp in UTC
grok_timezone = "Canada/Eastern"
```

# CSV
Parse metrics from a CSV-formatted table. By default, the parser assumes there is no header and
reads data starting from the first row. If `csv_header` is true, the parser extracts column names from
the first row and begins parsing data on the second row.
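A minimal sketch of the `csv_header` behavior described above, assuming rows are read with Go's standard `encoding/csv` package. `splitHeader` is a hypothetical helper for illustration, not the parser's actual API.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// splitHeader is a hypothetical helper illustrating csv_header:
// when header is true, the first record supplies the column names and
// data starts on the second record; otherwise every record is data.
func splitHeader(input string, header bool) (names []string, rows [][]string, err error) {
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if header && len(records) > 0 {
		return records[0], records[1:], nil
	}
	return nil, records, nil
}

func main() {
	names, rows, err := splitHeader("host,usage\nserver01,42\n", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(names, rows) // [host usage] [[server01 42]]
}
```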

To assign custom column names, use the `csv_data_columns` config. If `csv_data_columns` is used,
every column must be named or an error is returned. If `csv_header` is false, `csv_data_columns`
must be specified. Names listed in `csv_data_columns` override names extracted from the header.
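The naming rules above can be sketched as follows. `resolveColumnNames` is a hypothetical helper, and the exact error text, as well as where the real parser performs this check, are assumptions.

```go
package main

import "fmt"

// resolveColumnNames is a hypothetical helper. Names from dataColumns
// (csv_data_columns) take precedence over header names; when dataColumns
// is used, it must name every column in the data.
func resolveColumnNames(header, dataColumns []string, numColumns int) ([]string, error) {
	if len(dataColumns) > 0 {
		if len(dataColumns) != numColumns {
			return nil, fmt.Errorf("csv_data_columns has %d names but the data has %d columns",
				len(dataColumns), numColumns)
		}
		return dataColumns, nil
	}
	if header != nil {
		return header, nil
	}
	return nil, fmt.Errorf("csv_data_columns must be set when csv_header is false")
}

func main() {
	names, err := resolveColumnNames([]string{"h", "u"}, []string{"host", "usage"}, 2)
	fmt.Println(names, err) // [host usage] <nil>

	// Too few names for the number of columns is an error.
	_, err = resolveColumnNames(nil, []string{"host"}, 2)
	fmt.Println(err)
}
```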

The `csv_tag_columns` and `csv_field_columns` configs select which columns are added to the metric
as tags and fields, respectively. A column is referenced by its header name or, if specified, by the
corresponding name in `csv_data_columns`. If neither config is specified, no column data is added to the metric.
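A sketch of splitting one row into tags and fields. The helpers `splitRow` and `inferType` are hypothetical, and the inference order (integer, then float, then boolean, falling back to string) is an assumption rather than this parser's confirmed behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferType converts a raw CSV value to int64, float64, or bool when
// possible and otherwise keeps it as a string (assumed inference order).
func inferType(s string) interface{} {
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		return i
	}
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f
	}
	if b, err := strconv.ParseBool(s); err == nil {
		return b
	}
	return s
}

// splitRow is a hypothetical helper: columns named in tagCols become
// string tags, columns named in fieldCols become typed fields, and any
// other column is ignored. A column in both lists is treated as a tag.
func splitRow(names, row, tagCols, fieldCols []string) (map[string]string, map[string]interface{}) {
	isTag := make(map[string]bool)
	for _, c := range tagCols {
		isTag[c] = true
	}
	isField := make(map[string]bool)
	for _, c := range fieldCols {
		isField[c] = true
	}
	tags := make(map[string]string)
	fields := make(map[string]interface{})
	for i, name := range names {
		if i >= len(row) {
			break
		}
		switch {
		case isTag[name]:
			tags[name] = row[i]
		case isField[name]:
			fields[name] = inferType(row[i])
		}
	}
	return tags, fields
}

func main() {
	names := []string{"host", "usage", "up", "note"}
	row := []string{"server01", "42.5", "true", "ignored"}
	tags, fields := splitRow(names, row, []string{"host"}, []string{"usage", "up"})
	fmt.Println(tags, fields) // map[host:server01] map[up:true usage:42.5]
}
```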

Additional configs set the metric name and timestamp dynamically. If `csv_name_column` is
specified, the parser uses the value found in that column as the metric name. If
`csv_timestamp_column` is specified, the parser extracts the timestamp from that column; in that
case `csv_timestamp_format` must also be specified or an error is returned.

#### CSV Configuration
```toml
data_format = "csv"

## Whether or not to treat the first row of data as a header
## By default, the parser assumes there is no header and will parse the
## first row as data. If set to true the parser will treat the first row
## as a header, extract the list of column names, and begin parsing data
## on the second line. If `csv_data_columns` is specified, the column
## names in the header will be overridden.
# csv_header = false

## The separator between CSV fields
## By default, the parser assumes a comma (",")
# csv_delimiter = ","

## The character reserved for marking a row as a comment row
## Commented rows are skipped and not parsed
# csv_comment = ""

## If set to true, the parser will remove leading whitespace from fields
## By default, this is false
# csv_trim_space = false

## For assigning custom names to columns
## If this is specified, all columns must have a name
## i.e. there must be the same number of names listed
## as there are columns of data
## If `csv_header` is set to false, this config must be used
csv_data_columns = []
# Review comment (Contributor): Call this csv_column_names

## Columns listed here will be added as tags
csv_tag_columns = []

## Columns listed here will be added as fields
## The field type is inferred from the value of the field
csv_field_columns = []
# Review comment (Contributor): I think we should add all non-tag columns as fields. If someone wants to skip a field they can use fieldpass/fielddrop

## The column to extract the name of the metric from
## By default, this is the name of the plugin
## The `name_override` config overrides this
# csv_name_column = ""
# Review comment (Contributor): Call this csv_measurement_column

## The column to extract time information for the metric
## `csv_timestamp_format` must be specified if this is used
# csv_timestamp_column = ""

## The format of time data extracted from `csv_timestamp_column`
## This must be specified if `csv_timestamp_column` is specified
# csv_timestamp_format = ""
```
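The `csv_delimiter`, `csv_comment`, and `csv_trim_space` options correspond closely to the `Comma`, `Comment`, and `TrimLeadingSpace` fields of Go's `encoding/csv` `Reader`. The sketch below shows one way to wire them up, assuming single-character delimiter and comment values; `newReader` is a hypothetical helper, not necessarily how this PR's parser applies the options.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
	"unicode/utf8"
)

// newReader is a hypothetical helper that applies csv_delimiter,
// csv_comment, and csv_trim_space to a standard library csv.Reader.
func newReader(input, delimiter, comment string, trimSpace bool) (*csv.Reader, error) {
	r := csv.NewReader(strings.NewReader(input))
	if delimiter != "" {
		if utf8.RuneCountInString(delimiter) != 1 {
			return nil, fmt.Errorf("csv_delimiter must be a single character, got %q", delimiter)
		}
		r.Comma, _ = utf8.DecodeRuneInString(delimiter)
	}
	if comment != "" {
		if utf8.RuneCountInString(comment) != 1 {
			return nil, fmt.Errorf("csv_comment must be a single character, got %q", comment)
		}
		r.Comment, _ = utf8.DecodeRuneInString(comment)
	}
	// TrimLeadingSpace removes leading whitespace only, matching the
	// documented csv_trim_space behavior.
	r.TrimLeadingSpace = trimSpace
	return r, nil
}

func main() {
	r, err := newReader("# skipped\na; 1\nb; 2\n", ";", "#", true)
	if err != nil {
		panic(err)
	}
	records, err := r.ReadAll()
	fmt.Println(records, err) // [[a 1] [b 2]] <nil>
}
```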

123 changes: 123 additions & 0 deletions internal/config/config.go
@@ -1399,6 +1399,121 @@ func buildParser(name string, tbl *ast.Table) (parsers.Parser, error) {
}
}

	//for csv parser
	if node, ok := tbl.Fields["csv_data_columns"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if ary, ok := kv.Value.(*ast.Array); ok {
				for _, elem := range ary.Value {
					if str, ok := elem.(*ast.String); ok {
						c.CSVDataColumns = append(c.CSVDataColumns, str.Value)
					}
				}
			}
		}
	}

	if node, ok := tbl.Fields["csv_tag_columns"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if ary, ok := kv.Value.(*ast.Array); ok {
				for _, elem := range ary.Value {
					if str, ok := elem.(*ast.String); ok {
						c.CSVTagColumns = append(c.CSVTagColumns, str.Value)
					}
				}
			}
		}
	}

	if node, ok := tbl.Fields["csv_field_columns"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if ary, ok := kv.Value.(*ast.Array); ok {
				for _, elem := range ary.Value {
					if str, ok := elem.(*ast.String); ok {
						c.CSVFieldColumns = append(c.CSVFieldColumns, str.Value)
					}
				}
			}
		}
	}

	if node, ok := tbl.Fields["csv_delimiter"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.CSVDelimiter = str.Value
			}
		}
	}

	if node, ok := tbl.Fields["csv_comment"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.CSVComment = str.Value
			}
		}
	}

	if node, ok := tbl.Fields["csv_name_column"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.CSVNameColumn = str.Value
			}
		}
	}

	if node, ok := tbl.Fields["csv_timestamp_column"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.CSVTimestampColumn = str.Value
			}
		}
	}

	if node, ok := tbl.Fields["csv_timestamp_format"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.String); ok {
				c.CSVTimestampFormat = str.Value
			}
		}
	}

	if node, ok := tbl.Fields["csv_header"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.Boolean); ok {
				//for config with no quotes
				val, _ := strconv.ParseBool(str.Value)
				c.CSVHeader = val
			} else {
				//for config with quotes
				strVal := kv.Value.(*ast.String)
				val, err := strconv.ParseBool(strVal.Value)
				if err != nil {
					log.Printf("E! parsing to bool: %v", err)
				} else {
					c.CSVHeader = val
				}
			}
		}
	}

	if node, ok := tbl.Fields["csv_trim_space"]; ok {
		if kv, ok := node.(*ast.KeyValue); ok {
			if str, ok := kv.Value.(*ast.Boolean); ok {
				//for config with no quotes
				val, _ := strconv.ParseBool(str.Value)
				c.CSVTrimSpace = val
			} else {
				//for config with quotes
				// Review comment (Contributor): No need to have these else clauses; if it's not a
				// bool then it should be an error. This is actually a bug throughout this function:
				// when the type is wrong for the field name, it looks like we currently delete the
				// field, when we should return an error and refuse to start Telegraf.
				strVal := kv.Value.(*ast.String)
				val, err := strconv.ParseBool(strVal.Value)
				if err != nil {
					log.Printf("E! parsing to bool: %v", err)
				} else {
					c.CSVTrimSpace = val
				}
			}
		}
	}

	c.MetricName = name

	delete(tbl.Fields, "data_format")
@@ -1420,6 +1535,14 @@ func buildParser(name string, tbl *ast.Table) (parsers.Parser, error) {
	delete(tbl.Fields, "grok_custom_patterns")
	delete(tbl.Fields, "grok_custom_pattern_files")
	delete(tbl.Fields, "grok_timezone")
	delete(tbl.Fields, "csv_data_columns")
	delete(tbl.Fields, "csv_tag_columns")
	delete(tbl.Fields, "csv_field_columns")
	delete(tbl.Fields, "csv_name_column")
	delete(tbl.Fields, "csv_timestamp_column")
	delete(tbl.Fields, "csv_timestamp_format")
	delete(tbl.Fields, "csv_delimiter")
	delete(tbl.Fields, "csv_header")

	return parsers.NewParser(c)
}
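The review comment on the `csv_trim_space` block suggests rejecting a wrongly typed option instead of logging and continuing. Below is a minimal sketch of that idea. Plain Go `bool` and `string` values stand in for the TOML `*ast.Boolean` and `*ast.String` nodes, and `getBoolOption` is a hypothetical helper rather than Telegraf's API. It also removes the key once consumed, as the `delete(tbl.Fields, ...)` calls do; note that `csv_comment` and `csv_trim_space` are parsed in this diff but do not appear in its `delete` list.

```go
package main

import "fmt"

// getBoolOption is a hypothetical helper following the review suggestion:
// a present option must be a bool; any other type is a configuration error
// instead of a logged message. Plain Go values stand in for TOML AST nodes.
func getBoolOption(fields map[string]interface{}, key string, dst *bool) error {
	v, ok := fields[key]
	if !ok {
		return nil // option not set; keep the default
	}
	b, ok := v.(bool)
	if !ok {
		return fmt.Errorf("%s: expected a boolean, got %T", key, v)
	}
	*dst = b
	delete(fields, key) // consumed, so it is not passed on to the plugin
	return nil
}

func main() {
	var header, trim bool
	fields := map[string]interface{}{"csv_header": true, "csv_trim_space": "true"}
	if err := getBoolOption(fields, "csv_header", &header); err != nil {
		fmt.Println("error:", err)
	}
	if err := getBoolOption(fields, "csv_trim_space", &trim); err != nil {
		fmt.Println("error:", err) // error: csv_trim_space: expected a boolean, got string
	}
	fmt.Println(header, trim)
}
```

Returning the error lets the caller refuse to start, which is the behavior the reviewer asks for, instead of silently keeping the default.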