This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

CSVMapper does not correctly parse objects with duplicate values. #41

Closed
Claudenw opened this issue May 9, 2014 · 8 comments

@Claudenw

Claudenw commented May 9, 2014

When objects are parsed from an input stream with a schema, duplicate column values cause data to be dropped.

It appears that the mapper uses the value as the column name and therefore, when a row contains duplicate values, only one of them is returned.

The code below reproduces the problem.

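import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class DuplicateValueExample {
    public static void main(String[] args) throws IOException {
        // One data row, no header row; the first and third values are identical.
        String dataString = "\"foo\",\"bar\",\"foo\"";
        ByteArrayInputStream bais = new ByteArrayInputStream(
                dataString.getBytes(StandardCharsets.UTF_8));
        // Column names come from the schema, not from the data.
        CsvSchema schema = CsvSchema.builder().addColumn("Col1").addColumn("Col2")
                .addColumn("Col3").build();

        MappingIterator<Object> iter = new CsvMapper().reader(Object.class)
                .with(schema).readValues(bais);

        // Untyped binding yields a Map for each row.
        Object o = iter.next();
        Map<?, ?> colData = (Map<?, ?>) o;

        System.out.println("I think this should be: " + schema);
        System.out.println(colData.keySet());
        System.out.println("I think this should be: " + dataString);
        System.out.println(colData.values());
    }
}
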
@cowtowncoder
Member

Which version is this with?

@Claudenw
Author

Claudenw commented May 9, 2014

Sorry, I should have specified earlier.
We are using v 2.3.3 for both jackson-core and jackson-dataformat-csv

@cowtowncoder
Member

OK. I still don't quite understand the problem; maybe I should run the code. I do not see any duplication here, except for the values "foo", "bar", "foo". If those are taken to be column names, then yes, duplicate property names are not supported.
But the schema doesn't seem to be built to use the first row for header names, in which case the row should just be data, and there isn't a real concept of duplicates there.

@Claudenw
Author

Claudenw commented May 9, 2014

The column names are defined in the schema as Col1, Col2, and Col3:

 CsvSchema schema = CsvSchema.builder().addColumn("Col1").addColumn("Col2")
                .addColumn("Col3").build();

The data for those columns are: "foo","bar","foo"

String dataString="\"foo\",\"bar\",\"foo\"";

I would expect the map to contain the following:

key       value
Col1     "foo"
Col2     "bar"
Col3     "foo"

Instead I get the following:

key       value
foo       "foo"
bar       "bar"

So the column names are lost and the data values become the column names.
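
To make the expectation concrete, here is a minimal, library-free sketch of the positional pairing described above: each schema column name is keyed to the value at the same index, so a repeated value does not collapse any entries. `RowMapping` and `pair` are hypothetical names used only for illustration; they are not part of jackson-dataformat-csv.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jackson: pairs column names with row values by position.
final class RowMapping {
    static Map<String, String> pair(List<String> columns, List<String> values) {
        if (columns.size() != values.size()) {
            throw new IllegalArgumentException("column/value count mismatch");
        }
        Map<String, String> row = new LinkedHashMap<>(); // preserves schema order
        for (int i = 0; i < columns.size(); i++) {
            // The key is always the column name, so duplicate values cannot overwrite each other.
            row.put(columns.get(i), values.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = pair(
                Arrays.asList("Col1", "Col2", "Col3"),
                Arrays.asList("foo", "bar", "foo"));
        System.out.println(row); // {Col1=foo, Col2=bar, Col3=foo}
    }
}
```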

@cowtowncoder
Member

Ah yes; that looks wrong. Thank you for clarifying this; I hope to look into what is causing the problem.

@gribr

gribr commented May 13, 2014

I've been looking into this, and it looks like the issue occurs because CsvParser.getText() doesn't differentiate between a JsonToken.FIELD_NAME token and a JsonToken.VALUE_STRING token:

package com.fasterxml.jackson.dataformat.csv;
...
public class CsvParser
    extends ParserMinimalBase
{
    ...
    @Override
    public String getText() throws IOException, JsonParseException {
        return _currentValue;
    }
    ...
}

That distinction is necessary because UntypedObjectDeserializer.mapObject() iterates over the tokens and calls getText() for both field names and values:

package com.fasterxml.jackson.databind.deser.std;
...
public class UntypedObjectDeserializer
    extends StdDeserializer<Object>
    implements ResolvableDeserializer, ContextualDeserializer
{
    ...
    protected Object mapObject(JsonParser jp, DeserializationContext ctxt)
        throws IOException, JsonProcessingException
    {
        ...
        String field1 = jp.getText();  // CsvParser will return _currentValue
        jp.nextToken();
        Object value1 = deserialize(jp, ctxt);  // Calls jp.getText() internally for JsonToken.VALUE_STRING
        ...
        return result;
    }
    ...
}

Proposed fix:

package com.fasterxml.jackson.dataformat.csv;
...
public class CsvParser
    extends ParserMinimalBase
{
    ...
    @Override
    public String getText() throws IOException, JsonParseException {
        if (_currToken == JsonToken.FIELD_NAME) {
            return _currentName;
        }
        return _currentValue;
    }
    ...
}
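
The effect of the proposed change can be sketched with a small, library-free toy model. Everything below (`GetTextDemo`, `ToyParser`, `Token`, `read`) is hypothetical and only imitates the token flow described above; it is not Jackson's actual CsvParser or UntypedObjectDeserializer. It assumes that the column's value is already loaded when the FIELD_NAME token is current, which matches the reported output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: NOT Jackson code. It imitates a parser that alternates
// FIELD_NAME and VALUE_STRING tokens for each column of one row.
final class GetTextDemo {
    enum Token { FIELD_NAME, VALUE_STRING }

    static final class ToyParser {
        private final String[] names;
        private final String[] values;
        private final boolean fixed;            // true = apply the proposed getText() check
        private int index = 0;
        private Token current = Token.FIELD_NAME;

        ToyParser(String[] names, String[] values, boolean fixed) {
            this.names = names;
            this.values = values;
            this.fixed = fixed;
        }

        boolean hasMore() { return index < names.length; }

        void nextToken() {
            if (current == Token.FIELD_NAME) {
                current = Token.VALUE_STRING;
            } else {
                current = Token.FIELD_NAME;
                index++;                         // move on to the next column
            }
        }

        String getText() {
            if (fixed && current == Token.FIELD_NAME) {
                return names[index];             // fixed: name for name tokens
            }
            return values[index];                // unfixed: value for every token
        }
    }

    // Mirrors the consumer loop: getText() is used for both the name and the value.
    static Map<String, String> mapObject(ToyParser p) {
        Map<String, String> result = new LinkedHashMap<>();
        while (p.hasMore()) {
            String field = p.getText();
            p.nextToken();
            String value = p.getText();
            p.nextToken();
            result.put(field, value);
        }
        return result;
    }

    static Map<String, String> read(boolean fixed) {
        return mapObject(new ToyParser(
                new String[] {"Col1", "Col2", "Col3"},
                new String[] {"foo", "bar", "foo"},
                fixed));
    }

    public static void main(String[] args) {
        System.out.println(read(false)); // {foo=foo, bar=bar}
        System.out.println(read(true));  // {Col1=foo, Col2=bar, Col3=foo}
    }
}
```

Without the check, the second "foo" overwrites the first entry because both share the key "foo", which is how the third column disappears.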

@cowtowncoder
Member

@gribr Thank you for digging into this. I will have a look now.

cowtowncoder added this to the 2.3.4 milestone May 17, 2014
@cowtowncoder
Member

Thank you for troubleshooting this; fixed for 2.4.0 (and 2.3.4). I will also change UntypedObjectDeserializer to use getCurrentName(), but fixing the bug makes sense.
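
As a rough illustration of the consumer-side change, the sketch below reads field names through a dedicated name accessor, so the result is correct even if getText() ignores the token type. `CurrentNameDemo` and `ToyParser` are hypothetical stand-ins, not Jackson classes, and the one-step-per-column iteration is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only, not Jackson code: getText() here still returns the value
// regardless of token type, as in the unfixed parser.
final class CurrentNameDemo {
    static final class ToyParser {
        private final String[] names;
        private final String[] values;
        private int index = 0;

        ToyParser(String[] names, String[] values) {
            this.names = names;
            this.values = values;
        }

        boolean hasMore() { return index < names.length; }
        String getCurrentName() { return names[index]; }
        String getText() { return values[index]; }   // unfixed behaviour
        void nextEntry() { index++; }
    }

    static Map<String, String> mapObject(ToyParser p) {
        Map<String, String> result = new LinkedHashMap<>();
        while (p.hasMore()) {
            String field = p.getCurrentName();       // name accessor instead of getText()
            String value = p.getText();
            result.put(field, value);
            p.nextEntry();
        }
        return result;
    }

    static Map<String, String> read() {
        return mapObject(new ToyParser(
                new String[] {"Col1", "Col2", "Col3"},
                new String[] {"foo", "bar", "foo"}));
    }

    public static void main(String[] args) {
        System.out.println(read()); // {Col1=foo, Col2=bar, Col3=foo}
    }
}
```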

cowtowncoder added a commit that referenced this issue May 17, 2014