Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New processor: truncate_fields #11297

Merged
merged 9 commits into from
Apr 2, 2019

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Mar 18, 2019

This PR introduces a new processor truncate_fields. To keep raw messages use this processor with copy_fields.

truncate_fields

This processor truncates configured fields. Example configuration is below:

processors:
- truncate_fields:
    fields:
      - message
    max_bytes: 1024
    fail_on_error: false
    ignore_missing: true

Keep raw events

This preserves the orignal field and truncates it, if it's too long.

processors:
- copy_fields:
    fields:
        - from: message
          to: event.original
    fail_on_error: false
    ignore_missing: true
- truncate_fields:
    fields:
      - event.original
    max_bytes: 1024
    fail_on_error: false
    ignore_missing: true

Depends on #11303

@kvch kvch requested review from a team as code owners March 18, 2019 20:59
@kvch kvch requested review from ph and ruflin and removed request for a team March 18, 2019 21:09
@kvch kvch force-pushed the feature-add-copy-truncate-processors branch from 2642cbe to b0a8760 Compare March 18, 2019 21:41
@ruflin
Copy link
Member

ruflin commented Mar 19, 2019

@kvch Any chance to split this up into at least 2 PR's? One per processor?

@kvch
Copy link
Contributor Author

kvch commented Mar 19, 2019

@ruflin sure

@kvch kvch force-pushed the feature-add-copy-truncate-processors branch from b0a8760 to 835e02a Compare March 19, 2019 10:15
Copy link
Contributor

@michalpristas michalpristas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

besides what andrew mentioned LGTM

@kvch kvch force-pushed the feature-add-copy-truncate-processors branch from 835e02a to 69b3201 Compare March 19, 2019 10:38
@kvch kvch changed the title New processors: copy_fields, truncate_fields New processor: truncate_fields Mar 19, 2019
@kvch
Copy link
Contributor Author

kvch commented Mar 19, 2019

@ruflin done. other PR: #11297

@kvch kvch force-pushed the feature-add-copy-truncate-processors branch 3 times, most recently from 8edd7af to 375a21d Compare March 19, 2019 12:50
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening a separate PR. Is the plan to rebase this one as soon as the other one is merged. If yes, please ignore my reviews.

CHANGELOG.next.asciidoc Show resolved Hide resolved
auditbeat/auditbeat.reference.yml Outdated Show resolved Hide resolved
@kvch kvch force-pushed the feature-add-copy-truncate-processors branch from 375a21d to 6624d24 Compare March 22, 2019 14:06
@kvch kvch added the in progress Pull request is currently in progress. label Mar 22, 2019
@kvch kvch force-pushed the feature-add-copy-truncate-processors branch from 07078ee to 75bd386 Compare March 25, 2019 14:27
@kvch kvch removed the in progress Pull request is currently in progress. label Mar 25, 2019
@kvch
Copy link
Contributor Author

kvch commented Mar 25, 2019

This should be ready for another round of review.

libbeat/processors/actions/truncate_fields.go Outdated Show resolved Hide resolved
libbeat/processors/actions/truncate_fields.go Outdated Show resolved Hide resolved
libbeat/processors/actions/truncate_fields.go Outdated Show resolved Hide resolved
}

func (f *truncateFields) addTruncatedString(field, value string, event *beat.Event) (*beat.Event, error) {
truncated, err := f.truncate(f, []byte(value))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting from string -> []byte -> string going to do two separate copies of the string data, but since it's only truncating I think this could be avoided with some refactoring. It would be good if someone could double-check my assumption about the two copies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed copying twice. But after thinking a bit more about the conversion and copying data I am not sure if it makes sense to eliminate them. First I need to copy and convert the string, as it is inmutable type, in order to modify it. Then I need to create a new truncated copy, so the previous long array can be freed. Then as I need to return a string, a new copy is needed to get the inmutable type again.
Without converting the string to []byte, I can only do slicing which does not touch the underlying data, thus the long line stays in memory as long there is a reference to it. I think it would take away the advantage of decreasing the size of the data in memory.
WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that we would mostly truncate []byte fields (e.g. message) and rarely on string. Thus, this code path would be less frequently run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I can reduce the number of copies by not passing around the arrays. Will do that.

libbeat/processors/actions/truncate_fields.go Outdated Show resolved Hide resolved
return nil
}

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded space above?
maybe adding a unit test?

_, err = event.PutValue(field, truncated)
if err != nil {
return event, fmt.Errorf("could not add truncated byte slice value for key: %s, Error: %+v", field, err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to add the log.flags flag?

"truncate characters of too long byte line": {
MaxChars: 10,
Input: common.MapStr{
"message": []byte("ez egy túl hosszú sor"), // this is a too long line (hungarian)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

101 hungarian :)

@kvch kvch force-pushed the feature-add-copy-truncate-processors branch from 36acbb0 to 991a4da Compare April 1, 2019 10:51
@kvch kvch merged commit 555a89a into elastic:master Apr 2, 2019
DStape pushed a commit to DStape/beats that referenced this pull request Aug 20, 2019
This PR introduces a new processor `truncate_fields`. To keep raw messages use this processor with `copy_fields`.

### `truncate_fields`

This processor truncates configured fields. Example configuration is below:

```yaml
processors:
- truncate_fields:
    fields:
      - message
    max_bytes: 1024
    fail_on_error: false
    ignore_missing: true
```

### Keep raw events

This preserves the orignal field and truncates it, if it's too long.

```yaml
processors:
- copy_fields:
    fields:
        - from: message
          to: event.original
    fail_on_error: false
    ignore_missing: true
- truncate_fields:
    fields:
      - event.original
    max_bytes: 1024
    fail_on_error: false
    ignore_missing: true
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants