
Save/export rows that failed ingest due to Delimited Text Ingest Fails on Unescaped Quotes #84

Open
janmichaelyu opened this issue Jul 12, 2018 · 1 comment

Comments

@janmichaelyu

We're encountering an issue similar to #68 with tab-delimited files that contain unescaped quotes:

Sample:

11:16:43.614 [pool-1-thread-1] WARN  c.m.contentpump.DocumentMapper - Skipped record: () in file:/homes/local/projects/data-hub/data/omop/all/CONCEPT/CONCEPT.csv at line 1999360, reason: invalid char between encapsulated token and delimiter

02020201 "opt out" service Observation   DOMAIN   DOMAIN 

It would be great if the failed records were written to a separate file or to the log, so we could quickly see what went wrong during the ingest, identify the formatting error, and fix it.
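Until something like this exists, a possible stopgap is to pull the line numbers out of the warnings and copy those rows from the source file. The sketch below is not part of MLCP. It assumes the warning format shown in the sample above, that the reported line number is 1-based, and that file paths contain no spaces. The names `parse_skipped` and `export_rows` are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern modeled on the WARN line quoted in this issue.
# Assumption: other MLCP versions may word the message differently.
SKIP_RE = re.compile(
    r"Skipped record: .*? in file:(?P<path>\S+) "
    r"at line (?P<line>\d+), reason: (?P<reason>.*)$"
)


def parse_skipped(log_lines):
    """Return {path: {line_number: reason}} for each 'Skipped record' warning."""
    skipped = defaultdict(dict)
    for entry in log_lines:
        match = SKIP_RE.search(entry.rstrip("\n"))
        if match:
            line_number = int(match.group("line"))
            skipped[match.group("path")][line_number] = match.group("reason")
    return skipped


def export_rows(source_lines, wanted, one_based=True):
    """Yield (line_number, reason, raw_row) for the rows listed in `wanted`.

    `wanted` maps line numbers to reasons, as produced by parse_skipped.
    Assumption: the log counts lines from 1; pass one_based=False otherwise.
    """
    start = 1 if one_based else 0
    for number, row in enumerate(source_lines, start=start):
        if number in wanted:
            yield number, wanted[number], row.rstrip("\n")
```

Both functions accept any iterable of lines, so an open file handle can be passed directly and a large file such as CONCEPT.csv is streamed rather than loaded into memory.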

@janmichaelyu
Author

  • Steps to reproduce the bug - ingest a tab-delimited file containing the value:
    02020201 "opt out" service Observation DOMAIN DOMAIN
  • Input and Output -
    Sample output:
    11:16:43.614 [pool-1-thread-1] WARN c.m.contentpump.DocumentMapper - Skipped record: () in file:/homes/local/projects/data-hub/data/omop/all/CONCEPT/CONCEPT.csv at line 1999360, reason: invalid char between encapsulated token and delimiter
  • Environment - RedHat, MarkLogic 9.0-3, MLCP 9.0-4
  • Suggest a fix - save the skipped lines to a separate file or to the log so we can inspect what kind of formatting error was encountered
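To find such rows before running the ingest, a rough pre-scan can flag fields that open with a quote and have text after the closing quote, which is the pattern in the reproduction value above. This is a heuristic sketch, not MLCP's parser. It assumes `"opt out" service` is a single tab-separated field (the tabs are not visible in the pasted sample), that doubled quotes are the escape convention, and that quoted fields never contain a tab or newline. `has_stray_quote` and `find_suspect_rows` are hypothetical helper names.

```python
def has_stray_quote(field, quote='"'):
    """Return True if a field that opens with a quote is not cleanly closed.

    A field is flagged when characters follow its closing quote, or when
    it has no closing quote at all. Doubled quotes count as escaped.
    """
    if not field.startswith(quote):
        return False
    i = 1
    while i < len(field):
        if field[i] == quote:
            if i + 1 < len(field) and field[i + 1] == quote:
                i += 2  # doubled quote: an escaped quote inside the field
                continue
            # Closing quote found: it must be the last character of the field.
            return i != len(field) - 1
        i += 1
    return True  # opening quote with no closing quote


def find_suspect_rows(lines, delimiter="\t"):
    """Yield (line_number, raw_row) for rows with at least one flagged field.

    Line numbers are 1-based. Splitting on the delimiter first is an
    approximation, since a real parser allows delimiters inside quotes.
    """
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        if any(has_stray_quote(field) for field in row.split(delimiter)):
            yield number, row
```

Writing the yielded rows to a side file gives a list of records to inspect by hand. Rows that legitimately quote a field containing a tab will be flagged as well, so the output needs review rather than automatic correction.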

@jxchen-us jxchen-us added this to the 10.0.1 milestone Jul 12, 2018
@mattsunsjf mattsunsjf self-assigned this Aug 12, 2018
@mattsunsjf mattsunsjf assigned yunzvanessa and unassigned mattsunsjf Mar 22, 2019
@mattsunsjf mattsunsjf modified the milestones: 10.0.1, 10.0.2 Mar 22, 2019
@yunzvanessa yunzvanessa modified the milestones: 10.0.2, 10.0.3 Aug 24, 2019
@yunzvanessa yunzvanessa modified the milestones: 10.0.3, 11.0.1 Sep 18, 2019
@yunzvanessa yunzvanessa added closed and removed fix labels Oct 14, 2019
@yunzvanessa yunzvanessa modified the milestones: 11.0.1, 10.0.3 Oct 15, 2019
@yunzvanessa yunzvanessa reopened this Oct 15, 2019
@yunzvanessa yunzvanessa modified the milestones: 10.0.3, 10.0.5 Apr 1, 2020
@yunzvanessa yunzvanessa removed this from the 10.0.5 milestone Sep 12, 2020
@yunzvanessa yunzvanessa added this to the 10.0.6 milestone Sep 12, 2020
@yunzvanessa yunzvanessa modified the milestones: 10.0.6, 10.0.8 May 22, 2021
@yunzvanessa yunzvanessa assigned abika5 and unassigned yunzvanessa Sep 27, 2021
@yunzvanessa yunzvanessa modified the milestones: 10.0.8, 10.0.9 Sep 27, 2021
@abika5 abika5 modified the milestones: 10.0.9, 10.0-10 Jan 28, 2022
@yunzvanessa yunzvanessa modified the milestones: 11.0.0, 11.1.0 May 15, 2023
@abika5 abika5 modified the milestones: 11.1.0, 11.2.0 Jan 3, 2024
@abika5 abika5 modified the milestones: 11.3.0, 11.4.0 Jun 20, 2024