<?php

/**
 * @copyright Copyright (C) Ibexa AS. All rights reserved.
 * @license For full copyright and license information view LICENSE file distributed with this source code.
 */
declare(strict_types=1);

namespace Ibexa\FieldTypeRichText\RichText;

use DOMDocument;
use DOMText;
use DOMXPath;
use RuntimeException;

/**
 * @internal
 */
final class XMLSanitizer
{
    public function sanitizeXMLString(string $xmlString): string
    {
        $xmlString = $this->decodeHTMLEntities($xmlString);
        $xmlString = $this->removeComments($xmlString);
        $xmlString = $this->removeDangerousTags($xmlString);
        $xmlString = $this->sanitizeDocType($xmlString);

        return $this->removeEmptyDocType($xmlString);
    }

    public function convertCDATAToText(DOMDocument $document): DOMDocument
    {
        $xpath = new DOMXPath($document);
        // libxml's XPath text() test matches both text and CDATA section nodes; the loop keeps only CDATA
        $cdataNodes = $xpath->query('//text()');
        if ($cdataNodes === false) {
            return $document;
        }

        foreach ($cdataNodes as $cdataNode) {
            if ($cdataNode->nodeType === XML_CDATA_SECTION_NODE && $cdataNode->parentNode !== null) {
                $cdataNode->parentNode->replaceChild(new DOMText($cdataNode->textContent), $cdataNode);
            }
        }

        return $document;
    }

    private function decodeHTMLEntities(string $xmlString): string
    {
        return html_entity_decode($xmlString, ENT_XML1, 'UTF-8');
    }

    private function removeComments(string $xmlString): string
    {
        $xmlString = preg_replace('/<!--\s?.*?\s?-->/s', '', $xmlString);

        if ($xmlString === null) {
            $this->throwRuntimeException(__METHOD__);
        }

        return $xmlString;
    }

    private function removeDangerousTags(string $xmlString): string
    {
        $xmlString = preg_replace('/<\s*(script|iframe|object|embed|style)[^>]*>.*?<\s*\/\s*\1\s*>/is', '', $xmlString);

        if ($xmlString === null) {
            $this->throwRuntimeException(__METHOD__);
        }

        return $xmlString;
    }

    private function sanitizeDocType(string $xmlString): string
    {
        $pattern = '/<\s*!DOCTYPE\s+(?<name>[^\s>]+)\s*(\[(?<entities>.*?)\]\s*)?>/is';

        if (!preg_match($pattern, $xmlString, $matches)) {
            return $xmlString;
        }

        $docTypeName = $matches['name'];
        $entitiesBlock = $matches['entities'] ?? '';
        [$safeEntities, $removedEntities] = $this->filterEntitiesFromDocType($entitiesBlock);

        foreach ($removedEntities as $entity) {
            $xmlString = preg_replace('/&' . preg_quote($entity, '/') . ';/i', '', $xmlString);

            if ($xmlString === null) {
                $this->throwRuntimeException(__METHOD__);
            }
        }

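        $safeDocType = sprintf('<!DOCTYPE %s [ %s ]>', $docTypeName, implode("\n", $safeEntities));
        // Use a callback so that "$1" or "\1" inside a kept entity value is inserted literally
        // instead of being interpreted as a back-reference, as preg_replace() would do
        $xmlString = preg_replace_callback(
            $pattern,
            static function () use ($safeDocType): string {
                return $safeDocType;
            },
            $xmlString
        );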

        if ($xmlString === null) {
            $this->throwRuntimeException(__METHOD__);
        }

        return $xmlString;
    }

    private function removeEmptyDocType(string $xmlString): string
    {
        $xmlString = preg_replace('/<\s*!DOCTYPE\s+[^\[\]>]*\[\s*\]>/is', '', $xmlString);

        if ($xmlString === null) {
            $this->throwRuntimeException(__METHOD__);
        }

        return $xmlString;
    }

    /**
     * @return array<int, array<int, string>>
     */
    private function filterEntitiesFromDocType(string $entitiesBlock): array
    {
        $lines = explode("\n", $entitiesBlock);
        $safeEntities = [];
        $entitiesToRemove = [];
        $entityDefinitions = [];

        foreach ($lines as $line) {
            $line = trim($line);

            // External (SYSTEM/PUBLIC) entities can pull in local files or remote URLs (XXE)
            if (preg_match('/<!ENTITY\s+(\S+)\s+(SYSTEM|PUBLIC)\s+/i', $line, $matches)) {
                $entitiesToRemove[] = $matches[1];
                continue;
            }

            // Anything other than an internal entity with a double-quoted value is skipped and not kept
            if (!preg_match('/<!ENTITY\s+(\S+)\s+"([^"]+)"/', $line, $matches)) {
                continue;
            }

            $entityName = $matches[1];
            $entityValue = $matches[2];
            $entityDefinitions[$entityName] = $entityValue;

            // A value that references other entities enables nested expansion ("billion laughs")
            if (preg_match('/&\S+;/', $entityValue)) {
                $entitiesToRemove[] = $entityName;
                continue;
            }

            $safeEntities[] = $line;
        }

        $entitiesToRemove = $this->resolveRecursiveEntities($entityDefinitions, $entitiesToRemove);
        $safeEntities = array_filter($safeEntities, function ($line) use ($entitiesToRemove) {
            return !$this->containsUnsafeEntity($line, $entitiesToRemove);
        });

        return [$safeEntities, $entitiesToRemove];
    }

    /**
     * @param array<string, string> $entityDefinitions
     * @param array<int, string> $entitiesToRemove
     *
     * @return array<int, string>
     */
    private function resolveRecursiveEntities(array $entityDefinitions, array $entitiesToRemove): array
    {
        foreach ($entityDefinitions as $name => $value) {
            foreach ($entitiesToRemove as $toRemove) {
                if (strpos($value, "&$toRemove;") !== false && !in_array($name, $entitiesToRemove, true)) {
                    $entitiesToRemove[] = $name;
                }
            }
        }

        return array_unique($entitiesToRemove);
    }

    /**
     * @param array<int, string> $entitiesToRemove
     */
    private function containsUnsafeEntity(string $line, array $entitiesToRemove): bool
    {
        foreach ($entitiesToRemove as $toRemove) {
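            // Match the name as a whole token so that removing "foo" does not also drop a safe "foobar";
            // preg_match() returning false (an error) is treated as unsafe
            if (preg_match('/(?<![\w.:-])' . preg_quote($toRemove, '/') . '(?![\w.:-])/', $line) !== 0) {
                return true;
            }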
        }

        return false;
    }

    /**
     * @return never
     */
    private function throwRuntimeException(string $functionName): void
    {
        throw new RuntimeException(
            sprintf('%s returned null for "$xmlString", error: %s', $functionName, preg_last_error_msg())
        );
    }
}
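The entity rules in `filterEntitiesFromDocType()` can be tried out in isolation. The following is a minimal standalone sketch, not part of this change: `classifyEntityDeclarations()` is a hypothetical helper that reuses the same two regular expressions, so it shares their assumptions (one declaration per line, double-quoted values only). External `SYSTEM`/`PUBLIC` entities and entities whose value references another entity land in the unsafe list; plain text entities are kept. The sample DTD subset is made up.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper applying the same two rules as
 * XMLSanitizer::filterEntitiesFromDocType() to an internal DTD subset.
 *
 * @return array{0: list<string>, 1: list<string>} [safe declarations, unsafe entity names]
 */
function classifyEntityDeclarations(string $entitiesBlock): array
{
    $safe = [];
    $unsafe = [];

    foreach (explode("\n", $entitiesBlock) as $line) {
        $line = trim($line);

        // Rule 1: external (SYSTEM/PUBLIC) entities are unsafe
        if (preg_match('/<!ENTITY\s+(\S+)\s+(SYSTEM|PUBLIC)\s+/i', $line, $m) === 1) {
            $unsafe[] = $m[1];
            continue;
        }

        // Lines that are not a double-quoted internal entity are ignored
        if (preg_match('/<!ENTITY\s+(\S+)\s+"([^"]+)"/', $line, $m) !== 1) {
            continue;
        }

        // Rule 2: a value that references another entity is unsafe
        if (preg_match('/&\S+;/', $m[2]) === 1) {
            $unsafe[] = $m[1];
            continue;
        }

        $safe[] = $line;
    }

    return [$safe, $unsafe];
}

// Made-up internal subset: a plain entity, an external entity, and a nested expansion
$block = <<<'DTD'
<!ENTITY company "Ibexa">
<!ENTITY xxe SYSTEM "file:///etc/passwd">
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;">
DTD;

[$safe, $unsafe] = classifyEntityDeclarations($block);

echo implode(PHP_EOL, $safe), PHP_EOL; // the "company" and "lol" declarations
echo implode(', ', $unsafe), PHP_EOL;  // xxe, lol2
```

This is the split `sanitizeDocType()` relies on: the unsafe names are used to strip `&name;` references from the document, and only the safe declarations are written back into the rebuilt DOCTYPE.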
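`convertCDATAToText()` operates on an already parsed `DOMDocument`. Below is a minimal standalone sketch of the same loop, assuming the DOM extension is available; the sample XML is made up. libxml's XPath `text()` test also returns CDATA section nodes, and each one is swapped for a plain `DOMText` node, so its content is escaped on output instead of being emitted verbatim inside `<![CDATA[ ... ]]>`.

```php
<?php
declare(strict_types=1);

$document = new DOMDocument();
$document->loadXML('<para><![CDATA[<b>not markup</b>]]></para>');

$xpath = new DOMXPath($document);
$textNodes = $xpath->query('//text()');

if ($textNodes !== false) {
    foreach ($textNodes as $node) {
        // Only CDATA sections are replaced; ordinary text nodes are left alone
        if ($node->nodeType === XML_CDATA_SECTION_NODE && $node->parentNode !== null) {
            $node->parentNode->replaceChild(new DOMText($node->textContent), $node);
        }
    }
}

// The former CDATA content is now an escaped text node
$output = $document->saveXML($document->documentElement);
echo $output, PHP_EOL;
```

The text content is unchanged by the conversion; only its serialization differs, which means later processing sees a single kind of text node.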