* Add option to preserve empty rows when reading an XLSX file
* Add option to preserve empty rows when reading a CSV file
* Add option to preserve empty rows when reading an ODS file
- To determine if a style should apply a date format, the presence of "applyNumberFormat" attribute on the "cellXfs" section of styles.xml is now optional. We only look at the "numFmtId" attribute (but early return if "applyNumberFormat" is set to "0").
- The format code pattern can now contain uppercase characters as well as lowercase ones.
- "General" format code used as a custom format is now supported, since several programs appear to emit it.
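
A minimal sketch of the detection rules listed above, assuming the attributes of one "cellXfs" entry and the custom format codes have already been read into arrays. The function name, the array shapes, and the character-based heuristic for custom codes are assumptions made for illustration, not the library's actual code. The built-in ID ranges (14 to 22 and 45 to 47) are the date/time formats defined by the SpreadsheetML format.

```php
<?php
// Hypothetical helper, not the library's actual implementation.
// $xfAttributes: attributes of one xf element from "cellXfs" (all values are strings).
// $customFormatCodes: map of numFmtId => format code taken from the numFmts section.
function shouldFormatAsDate(array $xfAttributes, array $customFormatCodes): bool
{
    // Early return: the style explicitly opts out of number formatting.
    // A missing attribute is NOT treated as an opt-out.
    if (($xfAttributes['applyNumberFormat'] ?? null) === '0') {
        return false;
    }

    $numFmtId = (int) ($xfAttributes['numFmtId'] ?? 0);

    // Built-in date/time format IDs.
    $isBuiltinDateFormat = ($numFmtId >= 14 && $numFmtId <= 22)
        || ($numFmtId >= 45 && $numFmtId <= 47);
    if ($isBuiltinDateFormat) {
        return true;
    }

    // Custom formats: "General" is accepted but is never a date format.
    $formatCode = $customFormatCodes[$numFmtId] ?? null;
    if ($formatCode === null || strcasecmp($formatCode, 'General') === 0) {
        return false;
    }

    // Assumed heuristic: drop quoted literals and bracketed sections, then look
    // for date/time pattern characters in either case.
    $pattern = preg_replace('/"[^"]*"|\[[^\]]*\]/', '', $formatCode);

    return preg_match('/[dmyhs]/i', $pattern) === 1;
}
```

With this sketch, an uppercase custom code such as `YYYY-MM-DD` is treated the same way as `yyyy-mm-dd`.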
When a cell contains multiple text nodes, the cell value is currently obtained by concatenating the value of each text node.
Instead, values should still be concatenated but a space should be added in between.
When reading spreadsheets, Spout should be able to return formatted dates, as shown when opened with Excel for instance.
It currently only returns DateTime/DateInterval objects, making it impossible to read a value and write it back, as the Writer does not accept objects.
Instead of relying on the ID, sheets should be retrieved in the order they appear in the file.
Workbook.xml describes the correct order.
This allows the reader to read data in the correct order when sheets have been manually moved after creation.
The value passed into the format() function is coming from an XML file and has never been coerced.
Therefore, when checking is_int($value), the check always returns false - because it's a string.
Changing the check fixes the issue and Spout now correctly parses large numbers.
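
The failure mode can be reproduced in a few lines. The replacement check shown here is only one possible fix and is an assumption; the commit does not say which check was adopted, and `isIntegerString` is a hypothetical name.

```php
<?php
// Values read from an XML file arrive as strings.
$value = '1234567890123';

// is_int() inspects the PHP type, not the content, so this is false for any string.
$typeCheck = is_int($value);

// Hypothetical replacement: test the content of the string instead of its type.
function isIntegerString(string $value): bool
{
    return preg_match('/^-?\d+$/', $value) === 1;
}

$contentCheck = isIntegerString($value);
```

Here `$typeCheck` is `false` while `$contentCheck` is `true`, which is the difference the fix relies on.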
Some software generates the [Content_Types].xml file with the sheet definitions in random order.
Instead of having the first sheet (id = 1) defined first, it may be defined in 3rd position.
Therefore, to read the file in the correct order, the sheet order needs to be fixed.
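
A small sketch of re-ordering sheet definitions by ID, assuming the ID can be recovered from part names of the form `/xl/worksheets/sheetN.xml`. The sample part names and the helper `sheetIdFromPartName` are hypothetical.

```php
<?php
// Sample part names in the arbitrary order some software emits (illustrative data).
$partNames = [
    '/xl/worksheets/sheet3.xml',
    '/xl/worksheets/sheet1.xml',
    '/xl/worksheets/sheet10.xml',
    '/xl/worksheets/sheet2.xml',
];

// Hypothetical helper: extract the numeric sheet ID from a part name.
function sheetIdFromPartName(string $partName): int
{
    preg_match('/sheet(\d+)\.xml$/', $partName, $matches);

    return (int) ($matches[1] ?? 0);
}

// Compare IDs numerically so that sheet10 sorts after sheet2, not before it.
usort($partNames, function (string $a, string $b): int {
    return sheetIdFromPartName($a) <=> sheetIdFromPartName($b);
});
```

The numeric comparison matters: a plain string sort would place `sheet10.xml` before `sheet2.xml`.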
Although Excel has a Date type, older Excel versions use numeric values to store dates.
The value represents the number of days since Jan 1st, 1900.
The only way to tell if the value is a number or a date is to look at the styles.xml and check if the cell has date formatting.
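
Once a cell is known to carry date formatting, the numeric value can be converted. The sketch below is one common way to do it and is not taken from the library; `excelSerialToDateTime` is a hypothetical name. It uses 1899-12-30 as day zero, which compensates for Excel treating 1900 as a leap year and is therefore only accurate for serial values of 61 (1900-03-01) and above.

```php
<?php
// Hypothetical helper: convert an Excel serial number (days, with the time of day
// as the fractional part) into a DateTimeImmutable in UTC.
function excelSerialToDateTime(float $serial): DateTimeImmutable
{
    // Day zero chosen to absorb Excel's fictitious 1900-02-29; valid for serials >= 61.
    $base = new DateTimeImmutable('1899-12-30 00:00:00', new DateTimeZone('UTC'));

    $days = (int) floor($serial);
    $seconds = (int) round(($serial - $days) * 86400);

    return $base->modify("+{$days} days")->modify("+{$seconds} seconds");
}

$date = excelSerialToDateTime(42370.5);
// $date->format('Y-m-d H:i:s') gives "2016-01-01 12:00:00"
```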
Spout can now read ODS files.
It's on par with the XLSX reader. The only difference is that the row iterator cannot be rewound.
It supports the different output formats from LibreOffice and Excel, skipping extra rows/cells if needed.
Instead of the hasNext() / next() syntax, readers now implement the PHP iterator pattern.
It allows readers to be used with a foreach() loop.
All readers now share the same structure (CSV is treated as having exactly one sheet):
- one concrete Reader
- one SheetIterator, exposed by the Reader
- one or more Sheets, returned at every iteration
- one RowIterator, exposed by the Sheet
Introducing the concept of sheets for CSV may be kind of confusing but it makes Spout way more consistent.
Also, this confusion may be resolved by creating a wrapper around the readers if needed.
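
The structure above can be sketched with in-memory stand-ins. All class names here are hypothetical, the method names are assumed from the "SheetIterator" and "RowIterator" roles in the list, and the data is held in arrays rather than read from a file. The point is only to show how implementing PHP's `Iterator` interface enables nested foreach() loops, and how a CSV file fits in as a reader with exactly one sheet.

```php
<?php
// Hypothetical stand-ins for illustration only.
final class SketchRowIterator implements Iterator
{
    private array $rows;
    private int $position = 0;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows);
    }

    public function rewind(): void { $this->position = 0; }
    public function valid(): bool { return $this->position < count($this->rows); }
    public function current(): array { return $this->rows[$this->position]; }
    public function key(): int { return $this->position; }
    public function next(): void { $this->position++; }
}

final class SketchSheet
{
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // One RowIterator, exposed by the Sheet.
    public function getRowIterator(): Iterator
    {
        return new SketchRowIterator($this->rows);
    }
}

final class SketchReader
{
    private array $sheets;

    public function __construct(array $sheets)
    {
        $this->sheets = $sheets;
    }

    // One SheetIterator, exposed by the Reader; it returns a Sheet at every iteration.
    public function getSheetIterator(): Iterator
    {
        return new ArrayIterator($this->sheets);
    }
}

// A CSV file maps to a reader with exactly one sheet.
$csvReader = new SketchReader([new SketchSheet([['a', 'b'], ['c', 'd']])]);

$lines = [];
foreach ($csvReader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $lines[] = implode(',', $row);
    }
}
```

A wrapper that hides the sheet level for CSV would only need to return the row iterator of the first sheet.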
-- This commit does not delete the old files, nor does it change the folder structure for Writers. This will be done in another commit.
Based on the number of unique shared strings as well as the available memory amount,
one strategy will be chosen over the other.
The algorithm is based on empirical data and is deliberately conservative, so it may need to be tuned.
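
A sketch of what such a selection could look like. Everything in it is an assumption: the function name, the per-string memory estimate passed in as a parameter, the 50% safety margin, and the strategy labels. The commit does not state the actual thresholds.

```php
<?php
// Hypothetical sketch: pick a strategy from the number of unique shared strings
// and the memory available. None of the numbers come from the commit.
function chooseSharedStringsStrategy(
    int $uniqueStringCount,
    int $availableBytes,
    int $assumedBytesPerString
): string {
    $estimatedBytes = $uniqueStringCount * $assumedBytesPerString;

    // Assumed safety margin: only half of the available memory may be used.
    $safeBudget = intdiv($availableBytes, 2);

    return $estimatedBytes <= $safeBudget ? 'in_memory' : 'other';
}

$small = chooseSharedStringsStrategy(1000, 10000000, 500);    // estimate: 500 KB
$large = chooseSharedStringsStrategy(1000000, 10000000, 500); // estimate: 500 MB
```

Keeping the estimate and the margin as explicit inputs makes the tuning mentioned above a matter of changing two values.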
In-memory implementation using SplFixedArray
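
For illustration, a minimal sketch of an index-to-string lookup backed by SplFixedArray, assuming the number of strings is known before they are loaded. The sample strings and the helper `getStringAtIndex` are hypothetical.

```php
<?php
// The size is fixed upfront, which avoids the overhead of a regular PHP array.
$sharedStrings = new SplFixedArray(3);
$sharedStrings[0] = 'Name';
$sharedStrings[1] = 'Email';
$sharedStrings[2] = 'Country';

// Hypothetical helper: look up a shared string by its index.
function getStringAtIndex(SplFixedArray $strings, int $index): string
{
    if ($index < 0 || $index >= $strings->getSize()) {
        throw new OutOfBoundsException("Shared string not found for index: $index");
    }

    return $strings[$index];
}

$second = getStringAtIndex($sharedStrings, 1);
```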
Updated code and tests to support errors when reading XML nodes (useful when reading XML files used for attacks)
Removed LIBXML_NOENT option (which DOES substitute entities...)
Added test for Quadratic Blowup attack
Added LIBXML_NOENT option when reading an XML file
libxml_disable_entity_loader(true) cannot be used because it disables
the use of XMLReader::open()... see https://bugs.php.net/bug.php?id=62577
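
A minimal sketch of reading XML without entity substitution, assuming the XML is available as a string. The helper name and the sample document are hypothetical. LIBXML_NOENT is deliberately left out because it turns substitution on, while LIBXML_NONET is passed to block network access during parsing.

```php
<?php
// Hypothetical helper: collect text content while leaving entity references unexpanded.
function readTextWithoutEntitySubstitution(string $xml): string
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $reader = new XMLReader();
    // No LIBXML_NOENT here: that flag would substitute entities.
    $reader->XML($xml, null, LIBXML_NONET);

    $text = '';
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::TEXT) {
            $text .= $reader->value;
        }
    }

    $reader->close();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $text;
}

// Sample document declaring an internal entity (illustrative data).
$xml = '<!DOCTYPE r [<!ENTITY e "expanded">]><r>plain &e;</r>';
$text = readTextWithoutEntitySubstitution($xml);
```

In this sketch the entity reference is reported as its own node rather than as text, so its replacement value does not end up in `$text`.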
Added proper support for booleans, dates, numbers, errors.
Added unescaping of the read string.
Fixed a bug with cells that have no value: the reader now returns an empty string.
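
A sketch of how a raw cell value might be mapped to a PHP value. The function name and the exact mapping are assumptions, not the library's code. The type codes "b" (boolean), "n" (number) and "e" (error) are those defined by the SpreadsheetML format. Date handling is left out to keep the example short.

```php
<?php
// Hypothetical sketch. $type is the cell's "t" attribute (null when absent),
// $rawValue is the text of its value node (null when the cell has none).
function formatCellValue(?string $type, ?string $rawValue)
{
    // Cells without a value yield an empty string.
    if ($rawValue === null || $rawValue === '') {
        return '';
    }

    switch ($type) {
        case 'b':
            return $rawValue === '1';
        case 'e':
            // Assumption: keep the error code (e.g. "#DIV/0!") as a string.
            return $rawValue;
        case 'n':
        case null:
            // Numeric strings become int or float; anything else stays a string.
            return is_numeric($rawValue) ? $rawValue + 0 : $rawValue;
        default:
            // String types: unescape XML entities left in the read string.
            return htmlspecialchars_decode($rawValue, ENT_QUOTES);
    }
}
```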