Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong worksheet retrieved from xlsx file exported from Google Sheets #104

Closed
jennybc opened this issue May 23, 2015 · 2 comments
Closed
Labels
bug an unexpected problem or unintended behavior

Comments

@jennybc
Copy link
Member

jennybc commented May 23, 2015

When I use read_excel() on an xlsx file obtained by downloading a Google Sheet as xlsx, the sheet = argument doesn't behave as expected. I don't get the sheet I request, regardless of whether I provide an integer or sheet name. If I open the file in Excel, re-save and try again, everything works fine. I've confirmed this with xlxs files from two different Google Sheets. I don't know if Google Sheets exports weird/broken xlsx or what?

Here's a complete example:

https://gist.github.com/jennybc/279ab4675f5f4d275350#file-2015-05-23_read-excel-sheet-md

@richfitz
Copy link

With the example Jenny posted, it does not matter if integer or string numbers are used becuase readxl:::xlsx_sheets does the right thing there.

The issue is that for google sheets generated files the assumptions about numbering of the worksheet files within the bundle is incorrect. The constructor for XlsxWorkSheet makes the assumption that the ith sheet is in the file:

std::string sheetPath = tfm::format("xl/worksheets/sheet%i.xml", i + 1);

but the correct place to look for the integer -> filename mapping seems to be in the xl/_rels/workbook.xml.rels file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet3.xml"/>
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet4.xml"/>
<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet5.xml"/>
<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet2.xml"/>
</Relationships>

It is possible that the order here is the Correct order (I've not looked at the spec) but this is the same order as the workbook. So if we want the "first" sheet it is actually "worksheets/sheet3.xml" rather than "worksheets/sheet1.xml".

@jennybc
Copy link
Member Author

jennybc commented May 22, 2016

Poking around in xlsx for my own purposes. I am staring at a clean table of worksheet info, so recording here in case it is helpful.

A data frame with one row per worksheet in mini-gap.xlsx (a Google Sheet version).

## $sheets_df
## Source: local data frame [5 x 5]
## 
##   sheetId     name    Id                   Target
##     <int>    <chr> <chr>                    <chr>
## 1       1   Africa  rId3 xl/worksheets/sheet4.xml
## 2       2 Americas  rId4 xl/worksheets/sheet3.xml
## 3       3     Asia  rId5 xl/worksheets/sheet5.xml
## 4       4   Europe  rId6 xl/worksheets/sheet1.xml
## 5       5  Oceania  rId7 xl/worksheets/sheet2.xml
  • sheetId: from xl/workbook.xml
  • name: from xl/workbook.xml
  • Id: from xl/_rels/workbook.xml.rels; a.k.a. id in xl/workbook.xml
  • Target: from xl/_rels/workbook.xml.rels

xl/workbook.xml links worksheet name or integer index to Id and xl/_rels/workbook.xml.rels links Id to target xml.

@jennybc jennybc changed the title Puzzling behaviour of sheet argument on xlsx file exported from Google Sheets Wrong worksheet retrieved from xlsx file exported from Google Sheets Jan 17, 2017
@jennybc jennybc added bug an unexpected problem or unintended behavior sheet_rels labels Jan 20, 2017
@lock lock bot locked and limited conversation to collaborators Oct 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

2 participants