Set dtype for qrels columns at read time in io method #254

Merged 3 commits on Dec 20, 2021

Conversation

jjdelvalle
Contributor

The current approach lets pandas infer the type of each column at read time, and then forces the qid and docno columns to str. This can produce unexpected behavior if one of those fields is number-like with leading zeros.

For example, a docno or qid such as 0001 is read as 1 by pandas, and then cast to "1".

This change passes the dtype to pandas explicitly, so the field is read directly as "0001" and the leading zeros are preserved.

(I also removed some trailing space in the source code)
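
To make the difference concrete, here is a minimal sketch of the two read strategies described above. It assumes whitespace-separated, four-column TREC-style qrels; the column names, sample data, and helper function names are illustrative assumptions and are not taken from the actual io method touched by this PR.

```python
import io

import pandas as pd

# Hypothetical sample qrels: qid, iteration, docno, label (assumed layout).
QRELS_TEXT = "0001 0 0042 1\n0002 0 0007 0\n"
COLUMNS = ["qid", "iteration", "docno", "label"]


def read_inferred_then_cast(text):
    """Old behavior (sketch): let pandas infer types, then cast to str."""
    df = pd.read_csv(io.StringIO(text), sep=r"\s+", names=COLUMNS)
    # By this point "0001" has already been parsed as the integer 1,
    # so the cast can only produce "1".
    df["qid"] = df["qid"].astype(str)
    df["docno"] = df["docno"].astype(str)
    return df


def read_with_dtype(text):
    """New behavior (sketch): tell pandas the column types at read time."""
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        names=COLUMNS,
        dtype={"qid": str, "docno": str},
    )


before = read_inferred_then_cast(QRELS_TEXT)
after = read_with_dtype(QRELS_TEXT)
print(before["qid"].tolist())  # leading zeros lost
print(after["qid"].tolist())   # leading zeros kept
```

Only the identifier columns are pinned to str here; the remaining columns are still left to pandas' type inference.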

cmacdonald added this to the 0.7 milestone Dec 20, 2021
cmacdonald merged commit 2b60572 into terrier-org:master Dec 20, 2021

cmacdonald
Contributor

Good spot, and thanks for this PR. I updated a corresponding test case.

jjdelvalle deleted the qrels_dtype branch December 20, 2021 11:16

jjdelvalle
Contributor Author

Good thinking about the test case and thanks for accepting!

2 participants