Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable line terminator #11015

Merged

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Mar 1, 2019

This PR add a new input configuration option named line_terminator:

# Characters which separate the lines. Valid values: auto, line_feed, vertical_tab, form_feed,
# carriage_return, carriage_return_line_feed, next_line, line_separator, paragraph_separator.
#line_terminator: auto

The option auto tells Filebeat to use our current hybrid new line finder approach. Thus, we can avoid introducing a breaking change.

It also contains a minor refactoring in readfile package. I have created a new type Config which stores the configuration of the readers of the package. This eliminates a long list of parameters in the constructors of EncodeReader and LineReader.

Closes #5500

@kvch kvch requested a review from urso March 1, 2019 12:12
@kvch kvch requested a review from a team as a code owner March 1, 2019 12:12
@kvch kvch removed the request for review from a team March 1, 2019 12:14
@kvch
Copy link
Contributor Author

kvch commented Mar 1, 2019

jenkins test this

@kvch kvch force-pushed the feature-filebeat-configurable-line-terminator branch from 8583743 to cf0da81 Compare March 5, 2019 08:56
@kvch
Copy link
Contributor Author

kvch commented Mar 11, 2019

jenkins test this

1 similar comment
@kvch
Copy link
Contributor Author

kvch commented Mar 12, 2019

jenkins test this

@kvch kvch force-pushed the feature-filebeat-configurable-line-terminator branch 3 times, most recently from 6356a98 to 22089aa Compare March 29, 2019 12:21
@kvch
Copy link
Contributor Author

kvch commented Mar 29, 2019

jenkins test this

@kvch kvch force-pushed the feature-filebeat-configurable-line-terminator branch 2 times, most recently from 3d5cab1 to e472e04 Compare April 1, 2019 13:34
Copy link

@urso urso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a breaking change. Old behavior did support \r\n and \n based line terminators. It used to split on \n first and remove trailing \r\n or \n characters from the line.

This change seems to make it 'less' dynamic and this sense. What will be the default for Windows where \r\n is more common?

@kvch
Copy link
Contributor Author

kvch commented Apr 1, 2019

It is indeed less dynamic. This PR requires the input file to use one kind of newline. Previously the following input file was readable:

hello\n
hello\r\n

Also if someone has different files with different newlines, e.g.

hello\n
hello\n

and

hello\r\n
hello\r\n

he/she was able to add it to the same input.

By default on Windows the configured newline is carriage_return_line_feed, on other OSes the default is line_feed.

To make the configuration backward compatible, a new option auto (for a lack of a better name) could be added which is basically our current newline handling. WDYT?

@urso
Copy link

urso commented Apr 8, 2019

SGTM.

Alternatively we can have 2 patterns. One 'split' pattern and one 'right-trim' pattern. This way we're not limited to newlines anymore, but could support more complex patterns for split based multiline.

@kvch kvch force-pushed the feature-filebeat-configurable-line-terminator branch from e472e04 to 1340348 Compare April 15, 2019 10:30
@kvch kvch merged commit ebc7da7 into elastic:master Apr 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support \r as line separator
3 participants