Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image missing #2396

Closed
DeekshaPrabhub96 opened this issue Nov 13, 2021 · 8 comments · Fixed by #3130
Closed

Image missing #2396

DeekshaPrabhub96 opened this issue Nov 13, 2021 · 8 comments · Fixed by #3130

Comments

@DeekshaPrabhub96
Copy link

This is: In one one of the scenario I observed the image is missing from a sheet. I have mentioned the steps to reproduce below.

What is the expected behavior?

Image shouldn't be missing.

What is the current behavior?

Image is getting removed.

What are the steps to reproduce?

Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:
Steps:
Create a Xlsx with 7 sheets. Add image to sheets 1,2,4,7.
Add a List Box from Developer tab to sheet 6
Add a button to Sheet 7 along with a image.

After opening and saving the file with different name using PHPSpreadsheet, I see the image is missing from sheet 4. I have attached a sample excel where issue can be easily reproduced.

<?php
$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx();

// add code that show the issue here...

$spreadsheet = $reader->load($file);

$writer = new Xlsx($spreadsheet);
$writer->save('TextColorOutput.xlsx');

Which versions of PhpSpreadsheet and PHP are affected?

1.19
ImageIssueInput.xlsx

@oleibman
Copy link
Collaborator

Working in 17.1, not working in 18. Similar to #2387 in that respect.

@oleibman
Copy link
Collaborator

I was hoping that some other work I was doing would help enlighten me about this. So far no dice. 2387 appears unrelated after all. This may be related to #1767, a problem I have not gained any traction on in some time.

@oleibman
Copy link
Collaborator

Both 17.1 and 18.0 are wrong. The problem lies deeper than the change between the two. The original spreadsheet has drawing files named drawing1/2/3/4/5.xml. Drawing 1 is associated with sheet 1, 2 with 2, drawing 3 with sheet 4, drawing 4 with sheet 6, and drawing 5 with sheet 7. The reader renumbers them to drawing 1/2/4/6/7. The drawing on sheet 6 is 'unparsed'. When the writer tries to add the unparsed data from sheet6, it does so using the original id drawing4.xml. This is in conflict with the renamed files. As luck would have it, 17.1 resolves the conflict by keeping the first drawing4.xml (which happens to result in a correct worksheet), and 18.0 resolves it by keeping the second one (which causes this problem). Something needs to be done with the reader and/or writer to avoid this name collision in the first place.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Nov 30, 2021
This is at least a partial fix for PHPOffice#2396 and PHPOffice#1767 (which has been around for a long time). PhpSpreadsheet renames drawing XML files when it reads them from a spreadsheet. However, when it writes unparsed drawing files, it uses the original names, which can result in a clash with the renamed files. The solution in this PR is to write the unparsed files using the same renaming convention as the the others.

This is an incredibly simple fix, basically a one-line change, for such a long-lived problem. It is conceivable that this PR breaks a more sophisticated file than I have come across, e.g. with multiple unparsed files associated with a single worksheet. However, this PR does fix at least part of the problem for both issues, and causes no regression issues. The changed code was covered in only 2 tests - Reader/XlsxTest testLoadSaveWithEmptyDrawings and Writer/Xlsx/UnparsedDataTest testLoadSaveXlsxWithUnparsedData.

2396 is covered by a new test Unparsed2396Test. I had trouble figuring out what to test for 1767. Since it is a problem that becomes evident only when the output file is opened in Excel, I added a new sample to cover it.
@oleibman
Copy link
Collaborator

Please test against 2423 if possible. It should fix the missing image problem. It doesn't do anything about ListBox or Button. My inclination would be to open a separate issue for those; I'm not sure whether they're supposed to be supported or not. I will probably leave this PR unmerged for a couple of weeks. The issue which I thought might be related does indeed seem to be at least partially fixed by this, and I want to allow for more testing there.

@oleibman
Copy link
Collaborator

oleibman commented Dec 3, 2021

Issue #1770 is probably relevant for ListBox and Button. Issue #1996 for Checkbox may also be relevant.

@DeekshaPrabhub96
Copy link
Author

@oleibman I commented some code from phpspreadsheet/src/PhpSpreadsheet/Writer/Xlsx.php, and that fixes the issue.

Commented from 430 to 438 in Writer/Xlsx.php(PhpSpreadsheet 1.19)
// if (isset($unparsedLoadedData['sheets'][$sheetCodeName]['Drawings'])) {
// foreach ($unparsedLoadedData['sheets'][$sheetCodeName]['Drawings'] as $relId => $drawingXml) {
// $drawingFile = array_search($relId, $unparsedLoadedData['sheets'][$sheetCodeName]['drawingOriginalIds']);
// if ($drawingFile !== false) {
// $drawingFile = ltrim($drawingFile, '.');
// $zipContent['xl' . $drawingFile] = $drawingXml;
// }
// }
// }

not sure if this will break any other things, could you please confirm if the changes makes sense?

@oleibman
Copy link
Collaborator

oleibman commented Dec 7, 2021

Have you tried testing against 2423? I think your proposal may cause lots of regression problems.

oleibman added a commit that referenced this issue Dec 10, 2021
* Name Clashes Between Parsed and Unparsed Drawings

This is at least a partial fix for #2396 and #1767 (which has been around for a long time). PhpSpreadsheet renames drawing XML files when it reads them from a spreadsheet. However, when it writes unparsed drawing files, it uses the original names, which can result in a clash with the renamed files. The solution in this PR is to write the unparsed files using the same renaming convention as the the others.

This is an incredibly simple fix, basically a one-line change, for such a long-lived problem. It is conceivable that this PR breaks a more sophisticated file than I have come across, e.g. with multiple unparsed files associated with a single worksheet. However, this PR does fix at least part of the problem for both issues, and causes no regression issues. The changed code was covered in only 2 tests - Reader/XlsxTest testLoadSaveWithEmptyDrawings and Writer/Xlsx/UnparsedDataTest testLoadSaveXlsxWithUnparsedData.

2396 is covered by a new test Unparsed2396Test. I had trouble figuring out what to test for 1767. Since it is a problem that becomes evident only when the output file is opened in Excel, I added a new sample to cover it.

* Sloppy Errors

I neglected to run php-cs-fixer and phpstan, and it bit me.

* Scrutinizer

It's not as good as Phpstan at recognizing problems that can't happen due to previous assertions.

* Scrutinizer Again

It can be really stupid sometimes.
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Dec 21, 2021
Fix PHPOffice#2396. Fix PHPOffice#1770.

I am starting this out in draft status, and will probably leave it that way for some time. I'm not sure where we want to go with this. It fixes some problems, but in a limited manner, and creates some others. I'm not sure the pain of the others is balanced considering the limitations of the fix. If enough interest is generated as a result of this ticket being out there, we can proceed; if not, it probably isn't worth it.

This fix allows form control elements to be read in and written out. It does not allow you to add such elements, nor even to locate them or determine their properties (so you can't modify or delete them). Although it handles reading and writing of sheets containing both form controls and comments, it will probably create a corrupt spreadsheet if you try adding a new comment to a sheet with form controls - probably quite difficult to solve. Cloning the sheet probably won't work either - probably easier than the other. It is conceivable that we want to add a new property to the Xlsx Reader which turns the reading of form elements on or off (default=off), so that negative effects will be limited to those who have explictly opted in. The change in its current form does not implement such a property.

Because of its limitations, the change isn't really testable. As in some other recent installs, I have added a sample to demonstrate that it works correctly.
@PowerKiKi
Copy link
Member

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Oct 20, 2022
This is a replacement for draft PR PHPOffice#2455 and draft PR PHPOffice#3127. There is some useful commentary in those PRs which I have mostly, but not entirely, duplicated below. Fix PHPOffice#2396. Fix PHPOffice#1770. Fix PHPOffice#2388.

A related problem is that the vml files used for the form controls sometimes contain invalid xml. Fix PHPOffice#3125 (rejected previous PR PHPOffice#1181 as too risky, issue was also reported as PHPOffice#170). Vml file should be valid Xml, but Excel can generate unclosed `<br>` tags, preventing Xlsx reader from reading file correctly. I believe a very narrowly targeted fix, changing `<br>` to `<br/>`, and only when reading vml files, probably mitigates the risk. The sample file formscomments.xlsx which is part of this change shows this problem with `Button 1` on sheet `Forms`; the spreadsheet was created with Excel 365, so the problem is not restricted to Excel 2013 as originally reported. A comment on PR 3127 indicates that other tags might be involved, but, without a file demonstrating that, I will restrict this change to br tags for now.

I am starting this out in draft status, and will probably leave it that way for some time. I'm not sure where we want to go with this. It fixes some problems, but in a limited manner, and creates some others. I'm not sure the pain of the others is balanced considering the limitations of the fix. If enough interest is generated as a result of this ticket being out there, we can proceed; if not, it probably isn't worth it.

This fix allows form control elements to be read in and written out. It does not allow you to add such elements, nor even to locate them or determine their properties (so you can't modify or delete them). Although it handles reading and writing of sheets containing both form controls and comments, it will probably create a corrupt spreadsheet if you try adding a new comment to a sheet with form controls - probably quite difficult to solve. Cloning the sheet probably won't work either - probably easier than the other. It is conceivable that we want to add a new property to the Xlsx Reader which turns the reading of form elements on or off (default=off), so that negative effects will be limited to those who have explictly opted in. The change in its current form does not implement such a property.

Because of its limitations, the change isn't really testable. As in some other recent installs, I have added a sample to demonstrate that it works correctly.

As it turns out, if we have a worksheet which contains both form controls and comments (see formscomments.xlsx which is part of this PR), PhpSpreadsheet already creates a corrupt file when it tries to load and save the spreadsheet with such a worksheet. With this change, the file is saved without corruption. This tilts things in favor of proceeding. I'm still not ready, but this will be an important consideration.

A sample file for issue PHPOffice#2621 illustrated a problem with shape files. Since they are involved here, I took a look at how the sample worked with this code. In master, and with this change, a corrupt file results. Fixing that is probably easier than the general problem of handling shape files, but it's an argument against moving this forward until the corruption problem can be addressed.

Fix PHPOffice#2661. A template including checkboxes was leading to file corruption solved by this PR. Another argument for moving forward.
oleibman added a commit that referenced this issue Dec 28, 2022
* WIP Limited Support for Form Controls V2 (ListBox, Buttons, etc.)

This is a replacement for draft PR #2455 and draft PR #3127. There is some useful commentary in those PRs which I have mostly, but not entirely, duplicated below. Fix #2396. Fix #1770. Fix #2388.

A related problem is that the vml files used for the form controls sometimes contain invalid xml. Fix #3125 (rejected previous PR #1181 as too risky, issue was also reported as #170). Vml file should be valid Xml, but Excel can generate unclosed `<br>` tags, preventing Xlsx reader from reading file correctly. I believe a very narrowly targeted fix, changing `<br>` to `<br/>`, and only when reading vml files, probably mitigates the risk. The sample file formscomments.xlsx which is part of this change shows this problem with `Button 1` on sheet `Forms`; the spreadsheet was created with Excel 365, so the problem is not restricted to Excel 2013 as originally reported. A comment on PR 3127 indicates that other tags might be involved, but, without a file demonstrating that, I will restrict this change to br tags for now.

I am starting this out in draft status, and will probably leave it that way for some time. I'm not sure where we want to go with this. It fixes some problems, but in a limited manner, and creates some others. I'm not sure the pain of the others is balanced considering the limitations of the fix. If enough interest is generated as a result of this ticket being out there, we can proceed; if not, it probably isn't worth it.

This fix allows form control elements to be read in and written out. It does not allow you to add such elements, nor even to locate them or determine their properties (so you can't modify or delete them). Although it handles reading and writing of sheets containing both form controls and comments, it will probably create a corrupt spreadsheet if you try adding a new comment to a sheet with form controls - probably quite difficult to solve. Cloning the sheet probably won't work either - probably easier than the other. It is conceivable that we want to add a new property to the Xlsx Reader which turns the reading of form elements on or off (default=off), so that negative effects will be limited to those who have explictly opted in. The change in its current form does not implement such a property.

Because of its limitations, the change isn't really testable. As in some other recent installs, I have added a sample to demonstrate that it works correctly.

As it turns out, if we have a worksheet which contains both form controls and comments (see formscomments.xlsx which is part of this PR), PhpSpreadsheet already creates a corrupt file when it tries to load and save the spreadsheet with such a worksheet. With this change, the file is saved without corruption. This tilts things in favor of proceeding. I'm still not ready, but this will be an important consideration.

A sample file for issue #2621 illustrated a problem with shape files. Since they are involved here, I took a look at how the sample worked with this code. In master, and with this change, a corrupt file results. Fixing that is probably easier than the general problem of handling shape files, but it's an argument against moving this forward until the corruption problem can be addressed.

Fix #2661. A template including checkboxes was leading to file corruption solved by this PR. Another argument for moving forward.

* Improved Sample File, and Documentation

Add more realistic worksheet to spreadsheet. Document new feature, adding caveats to how it can be used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants