
Commit e4efeff

Allow Xlsx Reader to Specify ParseHuge

Branch: master
Fix #4260. A number of Security Advisories related to libxml_options were opened, and in the end we disabled the ability to specify any libxml_options. However, some users were adversely affected because they needed LIBXML_PARSEHUGE for some of their files. Having finally obtained a file that demonstrates the problem, we can restore this ability, with the following restrictions:

- The option is potentially dangerous, a vector for memory leaks and out-of-memory errors. It is not recommended unless absolutely needed.
- It will not be permitted as a global (static) property, which could adversely affect other users on the same server.
- It will instead be implemented as an instance property of the Xlsx Reader (defaulting to false), with a setter. I do not see a use case for a getter.
- Users will need to set this property individually for each file they think needs it.
- This change will be backported to all supported releases.
- The sheer size and processing time of the file involved make it impractical to add a formal test case. The change has nevertheless been tested satisfactorily.
1 parent 44c3bd5 commit e4efeff
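The design described in the commit message (an instance property that defaults to false, a setter, and no global switch) can be illustrated with a small self-contained sketch. The class `HugeAwareXmlLoader` below is hypothetical and is not PhpSpreadsheet code; it only mirrors the pattern of adding `LIBXML_PARSEHUGE` to the libxml flags when a caller opts in on one instance.

```php
<?php
declare(strict_types=1);

// Hypothetical class illustrating the pattern in this commit; NOT PhpSpreadsheet code.
// The flag lives on the instance (default false), so opting in on one
// loader never changes the behavior of any other loader.
final class HugeAwareXmlLoader
{
    private bool $parseHuge = false;

    public function setParseHuge(bool $parseHuge): void
    {
        $this->parseHuge = $parseHuge;
    }

    // Libxml flags used by this instance: LIBXML_PARSEHUGE only on opt-in.
    public function libxmlFlags(): int
    {
        return $this->parseHuge ? LIBXML_PARSEHUGE : 0;
    }

    public function load(string $xml): SimpleXMLElement
    {
        $element = simplexml_load_string($xml, SimpleXMLElement::class, $this->libxmlFlags());
        if ($element === false) {
            throw new RuntimeException('Unable to parse XML');
        }

        return $element;
    }
}

$default = new HugeAwareXmlLoader();   // keeps the safe default
$huge = new HugeAwareXmlLoader();
$huge->setParseHuge(true);             // opt in for this instance only

$sheet = $huge->load('<sheetData><row r="1"/><row r="2"/></sheetData>');
echo count($sheet->row), PHP_EOL;      // 2
echo $default->libxmlFlags(), PHP_EOL; // 0
```

With the real reader, and based only on the setter added in this commit, the equivalent opt-in would presumably be `$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx(); $reader->setParseHuge(true); $spreadsheet = $reader->load($filename);`, repeated for each file that actually needs it.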

1 file changed: +23 / -6 lines changed

src/PhpSpreadsheet/Reader/Xlsx.php

Lines changed: 23 additions & 6 deletions
@@ -61,6 +61,19 @@ class Xlsx extends BaseReader
     /** @var SharedFormula[] */
     private array $sharedFormulae = [];
 
+    private bool $parseHuge = false;
+
+    /**
+     * Allow use of LIBXML_PARSEHUGE.
+     * This option can lead to memory leaks and failures,
+     * and is not recommended. But some very large spreadsheets
+     * seem to require it.
+     */
+    public function setParseHuge(bool $parseHuge): void
+    {
+        $this->parseHuge = $parseHuge;
+    }
+
     /**
      * Create a new Xlsx Reader instance.
      */
@@ -124,8 +137,8 @@ private function loadZip(string $filename, string $ns = '', bool $replaceUnclose
         }
         $rels = @simplexml_load_string(
             $this->getSecurityScannerOrThrow()->scan($contents),
-            'SimpleXMLElement',
-            0,
+            SimpleXMLElement::class,
+            $this->parseHuge ? LIBXML_PARSEHUGE : 0,
             $ns
         );
 
@@ -139,8 +152,8 @@ private function loadZipNonamespace(string $filename, string $ns): SimpleXMLElem
         $contents = $this->getFromZipArchive($this->zip, $filename);
         $rels = simplexml_load_string(
             $this->getSecurityScannerOrThrow()->scan($contents),
-            'SimpleXMLElement',
-            0,
+            SimpleXMLElement::class,
+            $this->parseHuge ? LIBXML_PARSEHUGE : 0,
             ($ns === '' ? $ns : '')
         );
 
@@ -259,7 +272,9 @@ public function listWorksheetInfo(string $filename): array
                         $this->zip,
                         $fileWorksheetPath
                     )
-                )
+                ),
+                null,
+                $this->parseHuge ? LIBXML_PARSEHUGE : 0
             );
             $xml->setParserProperty(2, true);
 
@@ -2043,7 +2058,9 @@ private function readRibbon(Spreadsheet $excel, string $customUITarget, ZipArchi
             // exists and not empty if the ribbon have some pictures (other than internal MSO)
             $UIRels = simplexml_load_string(
                 $this->getSecurityScannerOrThrow()
-                    ->scan($dataRels)
+                    ->scan($dataRels),
+                SimpleXMLElement::class,
+                $this->parseHuge ? LIBXML_PARSEHUGE : 0
             );
             if (false !== $UIRels) {
                 // we need to save id and target to avoid parsing customUI.xml and "guess" if it's a pseudo callback who load the image
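The `listWorksheetInfo()` hunk appears to pass the flag as the third argument of what is presumably `XMLReader::xml()` (the second argument, the encoding, is `null`) and then calls `setParserProperty(2, true)`, where 2 is the value of `XMLReader::DEFAULTATTRS`. A minimal standalone sketch of that streaming pattern follows; the `countRows()` helper and the toy worksheet string are hypothetical, not taken from PhpSpreadsheet.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (NOT PhpSpreadsheet code): stream worksheet-like XML
// with XMLReader, adding LIBXML_PARSEHUGE only when the caller opts in.
function countRows(string $worksheetXml, bool $parseHuge): int
{
    $reader = new XMLReader();
    // Arguments: source, encoding (null = autodetect), libxml flags bitmask.
    if (!$reader->xml($worksheetXml, null, $parseHuge ? LIBXML_PARSEHUGE : 0)) {
        throw new RuntimeException('Unable to open XML');
    }
    // Same property as setParserProperty(2, true) in the diff.
    $reader->setParserProperty(XMLReader::DEFAULTATTRS, true);

    $rows = 0;
    while ($reader->read()) {
        // Self-closing <row/> elements are reported once, as ELEMENT nodes.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'row') {
            $rows++;
        }
    }
    $reader->close();

    return $rows;
}

$xml = '<worksheet><sheetData><row r="1"/><row r="2"/><row r="3"/></sheetData></worksheet>';
echo countRows($xml, false), PHP_EOL; // 3
echo countRows($xml, true), PHP_EOL;  // 3
```

On small input the flag changes nothing; it only lifts libxml's built-in size and depth limits, which is why the commit keeps it off unless a caller explicitly asks for it.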
