Initial Commit for Splunk Output plugin #4185

Closed · wants to merge 7 commits

Conversation

tmartin14 (Author):

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Initial commit of an output plugin to send telegraf metrics directly to a Splunk metrics index via HEC.

phemmer (Contributor) left a comment:

Looks like a good start. I've identified a few moderate issues, but nothing extraordinary.

var payload []byte
var err error

splunkMetric.Time = m.Time().UnixNano() / 1000000000 // convert metric nanoseconds to unix time
phemmer (Contributor):

Splunk is happy to accept timestamps with sub-second precision. I would recommend using that, as truncating the value to the second may not be enough for users' use cases. In fact, I would recommend allowing the user to set the precision, like we do for the influxdb output.

tmartin14 (Author):

Updated to use the default precision of UnixNano().
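For illustration, a minimal sketch of sub-second HEC timestamps, assuming the metric time is available as a Go time.Time (HEC accepts epoch time as a float with fractional seconds; the names here are illustrative, not the PR's code):

package splunk

import "time"

// hecTime converts a metric timestamp to the epoch-seconds float that
// Splunk HEC accepts in its "time" field, preserving sub-second precision
// instead of truncating to whole seconds.
func hecTime(t time.Time) float64 {
	return float64(t.UnixNano()) / 1e9 // e.g. 1447828325.123456789
}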


//// Check for existence of s && req to prevent race ('panic: runtime error: invalid memory address or nil pointer dereference')
if s != nil && req != nil {
resp, err := s.client.Do(req)
phemmer (Contributor):

This could end up being very expensive, sending a separate HTTP request for every single field. I would instead recommend batching the metrics up and sending them in a single request.

tmartin14 (Author):

Implemented batch sending of metrics.
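For illustration, a rough sketch of the batched approach, assuming one buffer per write and a single POST (types and names here are illustrative, not the PR's code):

package splunk

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type splunkMetric struct {
	Time   float64                `json:"time"`
	Event  string                 `json:"event"`
	Fields map[string]interface{} `json:"fields"`
}

// writeBatch serializes every metric into one buffer and issues a single
// POST, instead of one request per field.
func writeBatch(client *http.Client, url, token string, metrics []splunkMetric) error {
	buf := &bytes.Buffer{}
	enc := json.NewEncoder(buf)
	for _, m := range metrics {
		if err := enc.Encode(m); err != nil { // one JSON object per metric
			return err
		}
	}
	req, err := http.NewRequest("POST", url, buf)
	if err != nil {
		return err
	}
	req.Header.Add("Authorization", "Splunk "+token)
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("splunk HEC returned status %d", resp.StatusCode)
	}
	return nil
}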

return fmt.Errorf("error POST-ing metrics to Splunk[%s] Sending Data:%s\n", err, payload)
}

defer resp.Body.Close()
phemmer (Contributor):

If the batching is not done, this is going to be a problem. Because you're not closing the response body until the function returns, every field sent ends up holding open a file descriptor. You can easily exhaust the available file descriptors.

tmartin14 (Author):

Implemented batch sending of metrics.
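If per-request sending were kept, one way to avoid the descriptor leak is to close each body inside the loop iteration rather than deferring; a sketch (illustrative, not the PR's code):

package splunk

import (
	"io"
	"net/http"
)

// doAll closes each response body within the loop iteration, so the file
// descriptor is released immediately rather than piling up behind defers
// until the enclosing function returns.
func doAll(client *http.Client, reqs []*http.Request) error {
	for _, req := range reqs {
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()              // close now, not via defer
	}
	return nil
}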

// -----------------------------------------------------------------------------------------------------------------------
// Loop through each metric and create a map of all tags + the name and value of the metric (Splunk Metric format)
// -----------------------------------------------------------------------------------------------------------------------
fields := newTags
phemmer (Contributor):

Why not just create the variable as fields (on line 127) instead of having to rename it?

tmartin14 (Author):

Fixed.

req.Header.Add("Authorization", "Splunk "+s.AuthString)

//// Check for existence of s && req to prevent race ('panic: runtime error: invalid memory address or nil pointer dereference')
if s != nil && req != nil {
phemmer (Contributor):

I'm not sure what this is supposed to be checking, but there is no race here. My only guess is that during development this was added to combat a nil req value on error, but then you added lines 171-173 above.

tmartin14 (Author):

Corrected.

ConvertPaths bool
ConvertBool bool
UseRegex bool
StringToNumber map[string][]map[string]float64
phemmer (Contributor):

There's no documentation on this parameter, or what it's supposed to be doing.

tmartin14 (Author):

Added comments throughout.

}

return 0, fmt.Errorf("unexpected type: %T, with value: %v, for: %s", v, v, name)
}
phemmer (Contributor):

There are lots of other numeric types being missed: int, uint, uint8, uint16, uint32, int8, int16, int32, float32.

tmartin14 (Author):

Added the additional types.

}
}
case int64:
return float64(v.(int64)), nil
phemmer (Contributor):

You don't need the type assertion if you use p.

E.g.:

return float64(p), nil

Ditto for the uint64 and float64 cases below.

tmartin14 (Author):

Left the type assertion, but created new cases for multiple types.

name = sanitizedRegex.ReplaceAllLiteralString(name, "-")
} else {
name = sanitizedChars.Replace(name)
}
phemmer (Contributor):

I don't think we should be forcing the user to have scrubbed field names. Defaulting to enabled might be fine, but it should be possible to turn it off entirely.
Splunk doesn't require that the name be within a certain character set. Having some characters might make it harder to work with, but it's not impossible. And people might not want their names being messed with.

tmartin14 (Author):

Modified the default to NOT change any names and added a new setting to let users choose whether they want to replace special characters.
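A sketch of what the new toggle might look like in the sample config (the option name replace_special_chars matches the one discussed later in this review; the wording is illustrative):

## OPTIONAL: Replace special characters in metric and tag names
## When false (default) metric and tag names are passed through unchanged
#replace_special_chars = false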


## OPTIONAL: Convert metric name paths to use metricSeperator character
## When true (default) will convert all _ (underscore) characters in final metric name
#convert_paths = true
phemmer (Contributor), Jun 3, 2018:

This doesn't seem like a good idea, at least not as a default. In Splunk, the . is supposed to be for namespacing. Lots of telegraf metrics use underscores in field names, and not as namespace boundaries.

tmartin14 (Author):

Updated the default to false, but the option is there if a user wants to convert "_" to "." (or any other metric separator).

tmartin14 (Author) commented Jun 14, 2018:

The CircleCI tests fail with a code-formatting error and recommend running 'make fmt' to resolve the issue. I ran make fmt, but nothing changed in the resulting source file (splunk.go), so I removed 1 blank line.

tmartin14 (Author):

@phemmer - I'm getting failures on the code formatting, but 'make fmt' does not make any changes to the code. Can you help?

danielnelson (Contributor):

Maybe try running it directly:

gofmt -w plugins/outputs/splunk/splunk.go

}
}
case int, int8, int16, int32, int64:
return float64(v.(int64)), nil
phemmer (Contributor):

Unfortunately this will not work. As an example, in the case where v is an int8, the type assertion v.(int64) will panic.
Type switches have to be repetitive. One case for each type.

switch p := v.(type) {
case bool:
  ...
case int:
  return float64(p), nil
case int8:
  return float64(p), nil
case int16:
  return float64(p), nil
case int32:
  return float64(p), nil
...
}

Even though each case statement contains the same code, they have to be separate because p is a different type, and thus when compiled, they aren't actually the same.

tmartin14 (Author):

Fixed.

Contributor:

Fields can only be int64, uint64, float64, string, and bool. You only need to support these types, but also include a default case that logs an error.

tmartin14 (Author):

I've converted all numerics to float64 in the latest commit. String and bool are also handled.

Is this acceptable?

phemmer (Contributor), Jun 15, 2018:

Hrm. That seems like a restriction that should be removed. The type is a limitation of the storage engine; I don't think it makes sense to impose InfluxDB's limitations on the other outputs.
Meaning, right now I see that the makemetric function in the model does the conversion; what I'm saying is that it seems like that should be removed, leaving type conversion to the output plugin.

Contributor:

I'm open to allowing additional types, but I don't want fields to be anything and everything.

tmartin14 (Author):

I'm confused. This is an output plugin that's doing the conversion. If I'm missing something I apologize, but if there really is an issue, please let me know how best to resolve it in this code.

danielnelson (Contributor):

@tmartin14 On this output, only worry about fields being one of int64, uint64, float64, string, and bool. Anything else just log an error and ignore it.

Regarding field types in general, I don't want each output plugin to have to anticipate a huge list of possible types, and I want the types that a field can be to be well defined. So, I don't really want the field to be an interface{}, I want a variant type. I would also like at some point to return an error if the input adds other types. I think these steps will reduce the number of bugs around type conversion, and simplify output writing. We can expand this list of types if needed (I've been considering adding a time type for instance).

tmartin14 (Author):

Thanks @danielnelson. I'm updating this back to the original types and will re-commit.
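A sketch of the conversion limited to telegraf's five field types, with a default case that reports anything unexpected, per the guidance above (the function name and the string handling shown are illustrative, not the PR's code):

package splunk

import "fmt"

// toFloat accepts only the five telegraf field types and reports
// anything else through the default case.
func toFloat(name string, v interface{}) (float64, error) {
	switch p := v.(type) {
	case int64:
		return float64(p), nil
	case uint64:
		return float64(p), nil
	case float64:
		return p, nil
	case bool:
		if p {
			return 1, nil
		}
		return 0, nil
	case string:
		return 0, fmt.Errorf("string field not representable as a metric value: %s", name)
	default:
		return 0, fmt.Errorf("unexpected type: %T, with value: %v, for: %s", v, v, name)
	}
}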

ConvertBool bool
ReplaceSpecials bool
UseRegex bool
StringToNumber map[string][]map[string]float64
Contributor:

The response to the 'lack of documentation' comment on the previous review said documentation had been added on this parameter. But I'm still not seeing it. Maybe it was accidentally left out of the update?

tmartin14 (Author):

The documentation was added to the readme and the notes in the config. I've added comments to the code as well.


## OPTIONAL: Use Regex to sanitize metric and tag names from invalid characters
## Regex is more thorough, but significantly slower
#use_regex = false
Contributor:

replace_special_chars and use_regex combined cover all the possible cases, but maybe a single parameter would be better, e.g.:

name_sanitize = "none"
name_sanitize = "regex"
name_sanitize = "transliterate"

Note, this is not a requirement, or even a suggestion on the name, just throwing out ideas. @danielnelson might have a stronger preference.

tmartin14 (Author):

By keeping them separate, the user has the option to use both regex and the predefined character set. I'm going to leave this one as is.

// -----------------------------------------------------------------------------------------------------------------
var payload []byte
var err error
payload, err = json.Marshal(allMetrics)
Contributor:

Does this work? I haven't tested with an array, but the Splunk HTTP Event Collector doc says it wants JSON streaming format:

Here is an example of batched data. The batch protocol for HTTP Event Collector is simply event objects stacked one after the other as shown here, and not in a JSON array. Note that these events, though they only contain the "event" and "time" keys, are still valid:

{
  "event":"event 1", 
  "time": 1447828325
}

{
  "event":"event 2", 
  "time": 1447828326
}

To get this format in Go, you need to use a json.Encoder, e.g.:

buf := bytes.NewBuffer(nil)
enc := json.NewEncoder(buf)
for _, m := range measures {
  ...
  enc.Encode(splunkMetric)
}
http.NewRequest("POST", s.SplunkUrl, buf)

tmartin14 (Author):

This data will be placed into a Splunk metrics index as opposed to an event index. The format of a metric is still JSON, but different from event data. The byte array does indeed work ;-).
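For reference, a sketch of the metric-index payload shape, per Splunk's metrics HEC documentation (concrete values here are illustrative): metrics are still JSON, but carry "event":"metric" plus a fields map holding metric_name, _value, and the dimensions.

package splunk

import "encoding/json"

// hecMetric mirrors the metric-index HEC payload: "event" is the literal
// string "metric", and the measurement name, value, and dimensions all
// live under "fields".
type hecMetric struct {
	Time   float64                `json:"time"`
	Event  string                 `json:"event"`
	Host   string                 `json:"host,omitempty"`
	Fields map[string]interface{} `json:"fields"`
}

func exampleMetric() ([]byte, error) {
	return json.Marshal(hecMetric{
		Time:  1447828325.131,
		Event: "metric",
		Host:  "myhost",
		Fields: map[string]interface{}{
			"metric_name": "cpu.usage_idle",
			"_value":      97.2,
			"cpu":         "cpu0", // tags become dimensions
		},
	})
}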

domeger commented Aug 29, 2018:

Any way we can get this merged? This would be very valuable for a lot of Splunk customers. @phemmer

danielnelson (Contributor):

@tmartin14 I think we are going to go with the serializer implementation in #4339 instead of this output; if you haven't already, could you double-check that the serializer meets your requirements?

I would still be open to a Splunk output plugin; I think it would simplify using the serializer, and we might be able to do it without writing any new code by creating a specially configured http output in the init function.
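For example, something along these lines should work once the serializer lands, using the stock http output (option names per the splunkmetric serializer's docs; the URL and token are placeholders):

[[outputs.http]]
  url = "https://splunk.example.com:8088/services/collector"
  data_format = "splunkmetric"
  splunkmetric_hec_routing = true
  [outputs.http.headers]
    Content-Type = "application/json"
    Authorization = "Splunk xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"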

tmartin14 (Author):

Thanks for the update @danielnelson. I am in favor of closing this PR in favor of #4339 as the serializer approach adds support for both Splunk metric and event index types without the need for a separate output.

tmartin14 closed this on Aug 29, 2018.