A successor to the previous performance issue (#3). Both the `quick=TRUE` and `quick=FALSE` methods now handle text processing (excluding comment text from cell content, and handling repeated whitespace and multi-line cell content) without a significant drain on performance.

For the standard example file, execution time is broadly comparable to `{readODS}`, but `{tidyods}` uses more memory, perhaps understandably given what it is extracting.

```r
# Basic example file (sheet 2) extraction comparison
#> # A tibble: 7 × 5
#>   expression       min   median mem_alloc n_itr
#>   <bch:expr>  <bch:tm> <bch:tm> <bch:byt> <int>
#> 1 cells_quick   38.3ms   38.8ms  536.92KB    13
#> 2 cells_slow    62.4ms   65.8ms    1.01MB     8
#> 3 sheet_quick   44.5ms   49.1ms  617.89KB    11
#> 4 sheet_slow    68.2ms   79.6ms     1.1MB     7
#> 5 readODS       43.9ms   46.1ms  375.06KB    11
```

For large files, `quick=TRUE` is faster than `{readODS}` and `quick=FALSE` is only slightly slower. Interestingly, all `{tidyods}` extraction approaches use notably less memory than `{readODS}`.

```r
# Postcode example file (sheet 2) extraction comparison
#> # A tibble: 7 × 5
#>   expression       min   median mem_alloc n_itr
#>   <bch:expr>  <bch:tm> <bch:tm> <bch:byt> <int>
#> 1 cells_quick    9.94s   10.44s  171.52MB     5
#> 2 cells_slow    14.93s   16.51s   289.8MB     5
#> 3 sheet_quick   10.26s   10.49s  187.64MB     5
#> 4 sheet_slow    15.47s   16.18s  311.19MB     5
#> 5 readODS       13.63s   13.83s    2.53GB     5
```

---

The remaining performance bottlenecks are largely due to limitations in `{xml2}` (and the underlying `libxml2`) that cannot be overcome without writing independent C/C++ code to handle XML extraction. A critical limitation of `libxml2` is that it needs available memory of roughly four times the size of the document being parsed.

> _In general for a balanced textual document the internal memory requirement is about 4 times the size of the UTF8 serialization of this document (example the XML-1.0 recommendation is a bit more of 150KBytes and takes 650KBytes of main memory when parsed)_
>
> [_GNOME libxml 2 documentation_](https://gitlab.gnome.org/GNOME/libxml2/-/wikis/Memory-management#general-memory-requirements)

As a precaution, `{tidyods}` checks the size of the `content.xml` file inside the ODS zip container, estimates the memory needed to parse it (four times that size), and compares the estimate with the available memory reported by `ps::ps_system_memory()` to determine whether the XML can be safely processed.

The check is an internal function that throws an error when the XML is too large and invisibly returns `TRUE` when the size is acceptable. It has a `verbose` argument that reports the file size, the estimated processing requirement and the available memory.

```r
tidyods:::check_xml_memory("path/to/large_file.ods")
#> Error in `check_xml_memory()`:
#> ! ODS file is too large to process
#> ℹ ODS XML is estimated to need 7.74 GB of memory, uncompressed content.xml
#>   file within path/to/large_file.ods is 1.93 GB in size.
#> ✖ Available system memory is estimated at 1.50 GB

tidyods:::check_xml_memory("path/to/small_file.ods", verbose = TRUE)
#> ℹ ODS XML is estimated to need 228.76 kB of memory, uncompressed content.xml
#>   file within path/to/small_file.ods is 57.19 kB in size.
#> ✔ Available system memory is estimated at 1.55 GB
```
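
For anyone wanting to reason about the check outside the package, the logic described above can be sketched roughly as follows. This is a minimal illustration and not the `{tidyods}` implementation: the helper names (`content_xml_size()`, `estimate_xml_memory()`, `xml_fits_in_memory()`, `check_ods_memory()`) are hypothetical, and the sketch assumes the four-times multiplier from the libxml2 documentation and the `avail` element of `ps::ps_system_memory()`.

```r
# Hypothetical helpers for illustration only; not the {tidyods} internals.

# Size in bytes of the uncompressed content.xml inside an ODS (zip) file.
content_xml_size <- function(ods_path) {
  listing <- utils::unzip(ods_path, list = TRUE)
  size <- listing$Length[listing$Name == "content.xml"]
  if (length(size) != 1) {
    stop("Could not find a single content.xml in ", ods_path)
  }
  size
}

# Estimated parse memory, using the libxml2 rule of thumb of about
# four times the size of the document.
estimate_xml_memory <- function(xml_bytes, multiplier = 4) {
  xml_bytes * multiplier
}

# TRUE when the estimated requirement fits within the available memory.
xml_fits_in_memory <- function(xml_bytes, avail_bytes) {
  estimate_xml_memory(xml_bytes) <= avail_bytes
}

# Error when the XML is estimated to be too large, otherwise return
# TRUE invisibly. The default for avail_bytes is only evaluated when
# the function is called, so {ps} is needed only at that point.
check_ods_memory <- function(ods_path,
                             avail_bytes = ps::ps_system_memory()$avail) {
  xml_bytes <- content_xml_size(ods_path)
  if (!xml_fits_in_memory(xml_bytes, avail_bytes)) {
    stop("ODS file is too large to process")
  }
  invisible(TRUE)
}
```

Keeping the arithmetic in small pure functions means the threshold can be tested without a real ODS file, for example `xml_fits_in_memory(100, 400)` is `TRUE` while `xml_fits_in_memory(101, 400)` is `FALSE`.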