
Date format for csv import #7620

Closed
gmoskovicz opened this issue Jul 4, 2016 · 7 comments

@gmoskovicz
Contributor

When uploading a CSV, the date field often has a different format than Elasticsearch expects. If you want to explicitly set the date format, you need to use an index template with an order of 0 so that it picks up what you define there. Otherwise, none of the data will be imported, because you will get the following exception:

[screenshot, 2016-07-04: Elasticsearch exception raised during CSV import]

We need an easy way to define the date pattern for a specific CSV; otherwise the CSV import is pretty unusable.
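For context, the index-template workaround described above can be sketched roughly like this. The index pattern, field name, and date format are hypothetical, and the syntax follows the legacy index-template API of that era:

```json
PUT _template/csv_date_format
{
  "order": 0,
  "template": "my_csv_index*",
  "mappings": {
    "_default_": {
      "properties": {
        "FECHA Y HORA": {
          "type": "date",
          "format": "dd/MM/yyyy HH:mm"
        }
      }
    }
  }
}
```

With a template like this in place before the import, the CSV importer's dynamic mapping is overridden by the explicit date format.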

@Bargs
Contributor

Bargs commented Jul 5, 2016

This functionality will be provided once editable ingest pipelines are implemented. You'll be able to parse any custom date formats using the date processor. I know this makes CSV import a bit less user friendly today, but I didn't want to create duplicate functionality with pipeline support right around the corner.

Until then, you'll need to either modify the csv (I found this to be pretty easy in Excel/Google Spreadsheets) or manually set up an index template.
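For reference, parsing a custom date format with the date processor in an ingest pipeline looks roughly like the sketch below. The pipeline name, field name, and format are assumptions for illustration:

```json
PUT _ingest/pipeline/csv_dates
{
  "description": "Parse a custom date format from a CSV column (hypothetical field/format)",
  "processors": [
    {
      "date": {
        "field": "FECHA Y HORA",
        "formats": ["dd/MM/yyyy HH:mm"],
        "target_field": "@timestamp"
      }
    }
  ]
}
```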

Bargs closed this as completed Jul 5, 2016
@gmoskovicz
Contributor Author

Thanks for the detailed explanation. I'll add my support for that functionality. The idea here is to learn quickly through the process; however, for folks who only want to ingest their data and don't know the Elasticsearch internals (templates, mappings, dynamic mapping, and so on), it's nearly impossible to understand why the data is not being ingested or what needs to be changed. In the meantime, we could at least add a message about what needs to change, or the actual format the data is expected to be in. Does that sound right?

@Bargs
Contributor

Bargs commented Jul 5, 2016

I know what you mean, those error messages are probably meaningless to a non-technical user. I'm not sure it's feasible to translate every possible ES error into something friendly, though. What do you think is a good way to expose these errors to the user? Would it be sufficient to translate some of the most common errors into plain English? For instance, if there's a mapper_parsing_exception, display an error like "Line 8: Failed to parse field [FECHA Y HORA]"?

@gmoskovicz
Contributor Author

@Bargs Now that we have error types such as mapper_parsing_exception, I think the best option is to translate the generic error into something like "There was an issue with the data type", or similar.

@Bargs
Contributor

Bargs commented Jul 5, 2016

@gmoskovicz I opened a new ticket for better indexing error handling: #7632

@NathanZamecnik

@Bargs I would imagine the ingest pipeline functionality will also help in the case of geo data that has an unusual format? For instance, I have a large CSV with lat/lon stored as:

(41.729307661, -87.631943865)

Will the ingest pipeline stuff let me strip the "(" and ")" characters?

@Bargs
Contributor

Bargs commented Jul 7, 2016

@NathanZamecnik Yep! You could probably achieve that with the gsub or grok processors, or if all else fails there's the script processor.
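A minimal sketch of that gsub approach, assuming the coordinates land in a field called location (the pipeline and field names are hypothetical):

```json
PUT _ingest/pipeline/strip_parens
{
  "description": "Remove the surrounding parentheses from a lat/lon field",
  "processors": [
    {
      "gsub": {
        "field": "location",
        "pattern": "[()]",
        "replacement": ""
      }
    }
  ]
}
```

The gsub processor applies the regular expression to the field value, so "(41.729307661, -87.631943865)" becomes "41.729307661, -87.631943865".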

You'll be able to accomplish a lot of other common requests as well, like renaming fields, deleting fields, etc. TBH this thing will really spread its wings once we have pipeline functionality. The current functionality was intentionally kept pretty basic since we know pipelines are coming.
