38 Commits

Author SHA1 Message Date
Petr Skoda
816596183f Add full support for PHP 8.0
Unfortunately due to PHPUnit 8.5 dependency
this also drops support for PHP 7.1
2021-02-08 14:31:55 +01:00
Adrien Loison
1bbfd45b82 Support for missing styles XML file in XLSX
Some files don't have a "styles.xml" file. Excel supports these files, so Spout should too.
2019-07-20 16:48:51 +02:00
Adrien Loison
40ee386edd Add helper functions to create specific readers and writers
Removed the `ReaderEntityFactory::createReader(Type)` method and replaced it with three methods:
- `ReaderEntityFactory::createCSVReader()`
- `ReaderEntityFactory::createXLSXReader()`
- `ReaderEntityFactory::createODSReader()`

This has the advantage of enabling autocomplete in the IDE, as the return type is no longer the interface but the concrete type. Since readers may expose different options, this is pretty useful.

Similarly, removed the `WriterEntityFactory::createWriter(Type)` method and replaced it with three methods:
- `WriterEntityFactory::createCSVWriter()`
- `WriterEntityFactory::createXLSXWriter()`
- `WriterEntityFactory::createODSWriter()`

Since this is a breaking change, I also updated the Upgrade guide.
Finally, the doc is up to date too.
2019-05-17 21:22:03 +02:00
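The autocomplete argument in commit 40ee386edd can be illustrated with a small, language-neutral sketch. It is written in Python rather than PHP, and every class, option, and function name below is hypothetical; it is not Spout's API. The point is only that a factory function annotated with a concrete return type lets tooling see format-specific options.

```python
from dataclasses import dataclass


@dataclass
class CsvReader:
    # Hypothetical CSV-only option; a generic reader interface would not expose it.
    field_delimiter: str = ","


@dataclass
class XlsxReader:
    # Hypothetical XLSX-only option.
    should_format_dates: bool = False


def create_csv_reader() -> CsvReader:
    # The concrete return type tells an IDE that `field_delimiter` exists.
    return CsvReader()


def create_xlsx_reader() -> XlsxReader:
    # Same idea for the XLSX-specific option.
    return XlsxReader()


reader = create_csv_reader()
reader.field_delimiter = ";"  # resolvable by static tooling, no cast needed
```

With a single factory that returns only the shared interface, the assignment on the last line would need a cast or a type check before an IDE could resolve it.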
madflow
8a1c48b6b0 rename EntityFactory for writers and readers #526 2018-09-03 11:15:09 +02:00
Gabriel Caruso
4c7adbb33f Refactoring tests 2017-12-15 10:09:18 +01:00
Gabriel Caruso
0efdf48119 Support PHPUnit 6 2017-11-27 00:24:13 +01:00
Adrien Loison
78b6639480 Make XLSX reader return Row objects 2017-11-18 20:53:22 +01:00
Adrien Loison
a665b974fa Make CSV reader return Row objects 2017-11-18 19:08:27 +01:00
Adrien Loison
e2b519d6f9 Fetch XML file paths from Workbook Relationships 2017-11-11 15:25:12 +01:00
Adrien Loison
5a470188a9 Merge remote-tracking branch 'origin/master' into develop_3.0 2017-11-11 12:20:28 +01:00
Adrien Loison
3851e05f83 Remove @expectedException annotation 2017-11-05 02:21:09 +01:00
Adrien Loison
c74c0d9127 Add support for 1904 dates
This commit adds support for dates using the 1904 calendar (starting 1904-01-01 00:00:00).
It also fixes some issues with the dates in 1900 calendar (which now correctly start at 1899-12-30 00:00:00).
Finally, it is now possible to have negative timestamps, representing dates before the base date (and up to 0000-01-01 00:00:00), as per the SpreadsheetML specs. Note that some versions of Excel don't support negative dates...
2017-11-04 16:33:46 +01:00
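The two base dates named in commit c74c0d9127 can be sketched as follows. This is a minimal illustration in Python, not Spout's implementation; the function and constant names are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Base dates as stated in the commit message.
BASE_1900 = datetime(1899, 12, 30)
BASE_1904 = datetime(1904, 1, 1)


def serial_to_datetime(serial: float, use_1904: bool = False) -> datetime:
    """Convert a spreadsheet serial value (days since the base date) to a datetime.

    Negative serials yield dates before the base date.
    """
    base = BASE_1904 if use_1904 else BASE_1900
    return base + timedelta(days=serial)
```

For instance, `serial_to_datetime(43831.5)` gives noon on 2020-01-01 in the 1900 calendar, and `serial_to_datetime(-1)` gives 1899-12-29. Python's `datetime` cannot represent year 0, so this sketch does not reach the lower bound mentioned in the commit.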
Adrien Loison
b968513cb9 Fix code style 2017-09-06 00:33:43 +02:00
Adrien Loison
740fcfb8c1 Fix code before applying PHP CS Fixer 2017-09-06 00:33:43 +02:00
Adrien Loison
6d44cd26cc Fix prefixed shared strings XML file (#450)
A prefixed sharedStrings.xml file was not properly read, as we were comparing the un-prefixed name with the possible prefixed name.
Also, this commit contains a fix for sheets with rows not starting at column A.
2017-07-25 14:16:22 +02:00
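The name comparison problem described in commit 6d44cd26cc can be sketched like this. It is a Python illustration with hypothetical helper names, not the code from the commit: the node name is reduced to its local part before it is compared with the expected, un-prefixed name.

```python
def local_name(qualified_name: str) -> str:
    """Return the part after the namespace prefix, e.g. 'x:si' -> 'si'."""
    return qualified_name.rsplit(":", 1)[-1]


def is_node(qualified_name: str, expected: str) -> bool:
    # Compare un-prefixed with un-prefixed, so 'x:si' and 'si' both match 'si'.
    return local_name(qualified_name) == expected
```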
Adrien Loison
048105461c Fix shared strings XML Entities auto decode (#411)
When converting an XMLReader node to a SimpleXMLElement, the conversion would automatically decode the XML entities. This resulted in a double decode.
For example: "&amp;quot;" was converted to "&quot;" when imported into a SimpleXMLElement and was again converted into " (quote).

This commit changes the way the XLSX Shared Strings file is processed. It also changes the unescaping logic for both XLSX and ODS.

Finally, it removes any usage of the SimpleXML library (yay!).
2017-04-28 02:27:33 +02:00
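The double decode described in commit 048105461c is easy to reproduce outside of PHP. The sketch below uses Python's `html.unescape` as a stand-in for an entity decoder (it handles HTML entities, a superset of the XML ones); it illustrates the effect only and is not the code path Spout used.

```python
from html import unescape

# Text as stored in the XML file: it encodes the literal characters &quot;
raw = "&amp;quot;"

once = unescape(raw)    # correct result: the literal text &quot;
twice = unescape(once)  # the bug: a second decode turns it into a quote character

print(once)
print(twice)
```

Decoding exactly once preserves the literal text the author of the spreadsheet typed; the second pass silently changes the data.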
Adrien Loison
0978d340f0 Option to keep empty rows (#331)
* Add option to preserve empty rows when reading an XLSX file
* Add option to preserve empty rows when reading a CSV file
* Add option to preserve empty rows when reading an ODS file
2016-10-17 10:20:02 -07:00
Adrien Loison
cc07072cbb Better support for Date custom format (#316)
- To determine if a style should apply a date format, the presence of "applyNumberFormat" attribute on the "cellXfs" section of styles.xml is now optional. We only look at the "numFmtId" attribute (but early return if "applyNumberFormat" is set to "0").
- The format code can contain lowercase AND now uppercase characters as its pattern.
- "General" format code used as a custom format is now supported. It seems to be used by a bunch of programs...
2016-09-24 10:46:42 -07:00
Adrien Loison
b75a3e34fc XLSX cells containing date values should respect shouldFormatDate option (#282)
Return the ISO 8601 date string directly if option is set
2016-07-20 20:12:00 -07:00
Adrien Loison
a8eb7ad39c Shared strings table without uniqueCount and count should work (#269)
Use file based strategy in this case
2016-07-11 19:03:37 +02:00
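Combined with the "count" fallback from #255 further down this log, the attribute handling can be sketched as a short fallback chain. This is a Python illustration; the function names and the `in_memory_limit` threshold are hypothetical, and the rule Spout actually uses to pick a caching strategy may differ.

```python
from typing import Mapping, Optional


def shared_strings_count(sst_attributes: Mapping[str, str]) -> Optional[int]:
    """Prefer uniqueCount, fall back to count, else report that the size is unknown."""
    for name in ("uniqueCount", "count"):
        value = sst_attributes.get(name)
        if value is not None:
            return int(value)
    return None


def choose_strategy(count: Optional[int], in_memory_limit: int = 10_000) -> str:
    # Unknown size: use the file-based strategy, as the commit message states.
    # The numeric limit is an assumption made for this example.
    if count is None or count > in_memory_limit:
        return "file"
    return "memory"
```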
Adrien Loison
ffea8871a6 Add support for missing cell reference (#268)
When describing a cell, the cell reference (r="A1") is optional.
When not present, we should just increment the index of the last processed row.
2016-07-11 18:15:55 +02:00
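The fallback in commit ffea8871a6 can be sketched as follows. The sketch reads "the index of the last processed row" as the position of the previously processed cell within its row; that reading, and all function names, are assumptions made for this Python illustration.

```python
import re
from typing import Iterable, List, Optional


def column_index(cell_ref: str) -> int:
    """Convert the letters of a reference such as 'AA12' to a zero-based column index."""
    letters = re.match(r"[A-Za-z]+", cell_ref).group(0).upper()
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def resolve_indexes(refs: Iterable[Optional[str]]) -> List[int]:
    """Use the r attribute when present, otherwise the previous index plus one."""
    indexes = []
    last = -1
    for ref in refs:
        last = column_index(ref) if ref else last + 1
        indexes.append(last)
    return indexes
```

A row whose cells carry the references `A1`, none, `D1`, none resolves to the indexes 0, 1, 3, 4.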
Adrien Loison
1891c0b053 Fix XLSX reading when shared strings is missing the uniqueCount attribute (#255)
Use "count" attribute as a fallback
2016-06-16 10:06:11 -07:00
Adrien Loison
03866a6604 Support XLSX with prefixed XML files (#237)
While the standard is not to have prefixes, some XLSX files have XML files containing a prefix.
Microsoft has a tool that generates such files: https://msdn.microsoft.com/en-us/library/office/gg278316.aspx
2016-05-29 22:16:59 -07:00
Adrien Loison
104cd9b811 Option to return formatted dates instead of PHP objects (#226)
When reading spreadsheets, Spout should be able to return formatted dates, as shown when opened with Excel for instance.
It currently only returns DateTime/DateInterval objects, making it impossible to read + write, as the Writer does not accept objects.
2016-05-20 16:08:35 -07:00
Adrien Loison
b4724906c4 Add support for cells formatted as time (#224)
Cells formatted as "time" have values between 0 and 1. These values used to be considered as invalid.
Note: this uses what was started in #202
2016-05-19 13:10:47 -07:00
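A value between 0 and 1, as described in commit b4724906c4, is a fraction of a 24-hour day. The conversion can be sketched in Python as below; the function name is hypothetical and the rounding to whole seconds is a choice made for this example.

```python
from datetime import time

SECONDS_PER_DAY = 86400


def fraction_to_time(day_fraction: float) -> time:
    """Convert a fraction of a day in [0, 1) to a time of day, rounded to the second."""
    if not 0 <= day_fraction < 1:
        raise ValueError("expected a value in [0, 1)")
    # Clamp so that rounding can never produce 24:00:00.
    total_seconds = min(round(day_fraction * SECONDS_PER_DAY), SECONDS_PER_DAY - 1)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return time(hours, minutes, seconds)
```

For example, 0.5 maps to 12:00:00 and 0.75 to 18:00:00.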
Adrien Loison
b8fd789ac0 Retrieve XLSX sheets in order of appearance (#220)
Instead of relying on the ID, sheets should be retrieved in the order they appear in the file.
Workbook.xml describes the correct order.
This allows the reader to read data in the correct order when sheets have been manually moved after creation.
2016-05-19 10:37:48 -07:00
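Reading the sheet order from workbook.xml, as commit b8fd789ac0 describes, amounts to walking the `<sheet>` elements in document order instead of sorting by ID. A minimal Python sketch follows; the function name and the sample XML are made up for illustration, and this is not Spout's code.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def sheets_in_workbook_order(workbook_xml: str):
    """Return (name, sheetId, relationship id) tuples in order of appearance."""
    root = ET.fromstring(workbook_xml)
    sheets = root.find(f"{{{MAIN_NS}}}sheets")
    if sheets is None:
        return []
    return [
        (sheet.get("name"), sheet.get("sheetId"), sheet.get(f"{{{REL_NS}}}id"))
        for sheet in sheets
    ]


# Sample workbook in which the sheet with ID 3 was moved to the front.
SAMPLE = f"""<workbook xmlns="{MAIN_NS}" xmlns:r="{REL_NS}">
  <sheets>
    <sheet name="Summary" sheetId="3" r:id="rId3"/>
    <sheet name="Data" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>"""

print(sheets_in_workbook_order(SAMPLE))
```

Sorting these entries by `sheetId` would put "Data" first, which is the behavior the commit moves away from.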
madflow
616925148e Renamed xlsx file, #195 2016-04-07 08:58:54 +02:00
madflow
6f0f7c9690 Fix #195 2016-04-06 22:00:47 +02:00
madflow
2b1160bb33 Tests for #184 2016-03-19 11:34:31 +01:00
Adrien Loison
d2ac54c578 Custom stream wrapper support
Added support for custom stream wrappers, such as "fly" or "s3".
Support is determined per reader.
2016-03-18 17:09:13 -07:00
Adrien Loison
a804be4844 Support XLSX that are defined in random order
Some software generates the [Content_Types].xml file with sheet definitions in random order.
Instead of having the first sheet (id = 1) defined first, it may be defined in 3rd position.
Therefore, to read the file in the correct order, the sheet order needs to be fixed.
2016-01-08 08:42:29 -08:00
Adrien Loison
8ef6bdac62 Better date support
Although Excel has a Date type, older Excel versions use numeric values to store dates.
The value represents the number of days since Jan 1st, 1900.
The only way to tell if the value is a number or a date is to look at the styles.xml and check if the cell has date formatting.
2015-10-23 16:04:38 -07:00
Adrien Loison
e4154dfdc3 ODS Reader
Spout can now read ODS files.
It's on par with the XLSX reader. The only difference is that the row iterator cannot be rewound.
It supports the different output formats from LibreOffice and Excel, skipping extra rows/cells if needed.
2015-09-01 10:53:49 -07:00
Adrien Loison
1ba10ed2b0 Add wrappers around XMLReader and SimpleXMLElement to improve error handling 2015-07-27 00:49:43 -07:00
Adrien Loison
37d87a8a27 Fix various problems 2015-07-27 00:23:18 -07:00
Adrien Loison
86a4c3790a Adding more tests 2015-07-26 23:53:49 -07:00
Adrien Loison
c52dd7bde8 Remove old reader files 2015-07-26 23:53:17 -07:00
Adrien Loison
ae3ee357ff Moved readers to iterators
Instead of the hasNext() / next() syntax, readers now implement the PHP iterator pattern.
It allows readers to be used with a foreach() loop.

All readers now share the same structure (CSV is treated as having exactly one sheet):
- one concrete Reader
- one SheetIterator, exposed by the Reader
- one or more Sheets, returned at every iteration
- one RowIterator, exposed by the Sheet

Introducing the concept of sheets for CSV may be kind of confusing but it makes Spout way more consistent.
Also, this confusion may be resolved by creating a wrapper around the readers if needed.

-- This commit does not delete the old files, nor does it change the folder structure for Writers. This will be done in another commit.
2015-07-26 23:53:17 -07:00
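The shared Reader / SheetIterator / Sheet / RowIterator structure from commit ae3ee357ff can be sketched for the CSV case, where the reader exposes exactly one sheet. This is a Python illustration of the shape only; the class and method names are hypothetical and do not mirror Spout's PHP classes.

```python
import csv
import io
from typing import Iterator, List


class RowIterator:
    """Iterates over the rows of one sheet."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __iter__(self) -> Iterator[List[str]]:
        return iter(csv.reader(io.StringIO(self._text)))


class Sheet:
    def __init__(self, index: int, text: str) -> None:
        self.index = index
        self._text = text

    def get_row_iterator(self) -> RowIterator:
        return RowIterator(self._text)


class SheetIterator:
    def __init__(self, sheets: List[Sheet]) -> None:
        self._sheets = sheets

    def __iter__(self) -> Iterator[Sheet]:
        return iter(self._sheets)


class CsvReader:
    def __init__(self, text: str) -> None:
        self._text = text

    def get_sheet_iterator(self) -> SheetIterator:
        # A CSV document is treated as having exactly one sheet.
        return SheetIterator([Sheet(0, self._text)])


def read_all(reader: CsvReader) -> List[List[str]]:
    # The same nested loop would work for any reader with this structure.
    return [
        row
        for sheet in reader.get_sheet_iterator()
        for row in sheet.get_row_iterator()
    ]
```

Because every reader exposes the same two levels, calling code such as `read_all` does not need to know which format it is reading.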