70 Commits

Author SHA1 Message Date
Adrien Loison
a8eb7ad39c Shared strings table without uniqueCount and count should work (#269)
Use file based strategy in this case
2016-07-11 19:03:37 +02:00
Adrien Loison
ffea8871a6 Add support for missing cell reference (#268)
When describing a cell, the cell reference (r="A1") is optional.
When not present, we should just increment the index of the last processed row.
2016-07-11 18:15:55 +02:00
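The fallback described in commit ffea8871a6 can be sketched as follows. This is a Python illustration rather than Spout's PHP code, and it assumes the index being incremented is the cell's position within its row. The helper names `column_index` and `resolve_column_indexes` are hypothetical.

```python
import re
from typing import List, Optional


def column_index(cell_reference: str) -> int:
    """Convert a reference such as "C5" into a zero-based column index (2)."""
    letters = re.match(r"[A-Z]+", cell_reference).group(0)
    index = 0
    for letter in letters:
        # Column letters behave like a base-26 number where A=1 ... Z=26.
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index - 1


def resolve_column_indexes(references: List[Optional[str]]) -> List[int]:
    """Resolve one index per cell, falling back to "previous index + 1"
    when a cell carries no r="..." attribute (represented here as None)."""
    resolved = []
    last_index = -1
    for reference in references:
        if reference:
            last_index = column_index(reference)
        else:
            last_index += 1
        resolved.append(last_index)
    return resolved
```

For example, a row whose cells carry the references `"A1"`, none, `"D1"`, none would resolve to the column indexes 0, 1, 3 and 4.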
rlukasz
aa25678a83 Update RowIterator.php (#263) 2016-07-04 11:31:03 +02:00
Adrien Loison
1891c0b053 Fix XLSX reading when shared strings is missing the uniqueCount attribute (#255)
Use "count" attribute as a fallback
2016-06-16 10:06:11 -07:00
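A minimal Python sketch of the attribute fallback from commit 1891c0b053, not the library's PHP implementation. The function name `shared_strings_count` is hypothetical, and returning `None` when both attributes are absent is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def shared_strings_count(sst_xml: str) -> Optional[int]:
    """Read the number of shared strings from the root <sst> element.

    Prefers "uniqueCount", falls back to "count", and returns None
    when neither attribute is present.
    """
    root = ET.fromstring(sst_xml)
    for attribute in ("uniqueCount", "count"):
        value = root.get(attribute)
        if value is not None:
            return int(value)
    return None
```

A caller that receives `None` cannot size anything up front, so it would need a strategy that does not depend on knowing the count.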
madflow
cd38ba093e Fix #245 (#246) 2016-06-08 09:50:00 -07:00
Adrien Loison
1d3a9f939c Convert escapers to singletons (#239) 2016-05-30 13:55:21 -07:00
Adrien Loison
251c0bebc1 Adding open_file_in_zip() helper function to XMLReader (#238) 2016-05-29 23:22:57 -07:00
Adrien Loison
03866a6604 Support XLSX with prefixed XML files (#237)
While the standard is not to have prefixes, some XLSX files have XML files containing a prefix.
Microsoft has a tool that generates such files: https://msdn.microsoft.com/en-us/library/office/gg278316.aspx
2016-05-29 22:16:59 -07:00
Adrien Loison
2c80b1f23a XLSX Reader should add a space between text nodes (#229)
When a cell contains multiple text nodes, the cell value is currently obtained by concatenating the value of each text node.
Instead, values should still be concatenated but a space should be added in between.
2016-05-23 14:15:48 -07:00
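The behaviour described in commit 2c80b1f23a can be illustrated in Python as below. This is a sketch, not Spout's code: the function name `cell_text` is hypothetical, and it assumes every element whose local name is `t` contributes to the cell value.

```python
import xml.etree.ElementTree as ET


def cell_text(string_item_xml: str) -> str:
    """Join the values of all <t> text nodes with a single space."""
    root = ET.fromstring(string_item_xml)
    texts = []
    for node in root.iter():
        # Strip a possible "{namespace}" prefix to get the local tag name.
        local_name = node.tag.rsplit("}", 1)[-1]
        if local_name == "t":
            texts.append(node.text or "")
    return " ".join(texts)
```

With the input `<si><r><t>Hello</t></r><r><t>World</t></r></si>`, plain concatenation would give `HelloWorld`, whereas this sketch gives `Hello World`.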
Adrien Loison
104cd9b811 Option to return formatted dates instead of PHP objects (#226)
When reading spreadsheets, Spout should be able to return formatted dates, as shown when opened with Excel for instance.
It currently only returns DateTime/DateInterval objects, making it impossible to read + write, as the Writer does not accept objects.
2016-05-20 16:08:35 -07:00
madflow
2d923c7e46 Fix issue #218 (#222) 2016-05-20 09:32:47 -07:00
Adrien Loison
b4724906c4 Add support for cells formatted as time (#224)
Cells formatted as "time" have values between 0 and 1. These values used to be considered as invalid.
Note: this uses what was started in #202
2016-05-19 13:10:47 -07:00
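To make the "time" case from commit b4724906c4 concrete: a value between 0 and 1 can be read as a fraction of a 24-hour day. The Python sketch below shows that arithmetic. The function name `fraction_to_time` and the rounding to whole seconds are assumptions, not taken from the library.

```python
from datetime import time

SECONDS_PER_DAY = 24 * 60 * 60


def fraction_to_time(day_fraction: float) -> time:
    """Interpret a value in [0, 1) as a time of day."""
    if not 0 <= day_fraction < 1:
        raise ValueError("expected a fraction of a day in [0, 1)")
    # Round to whole seconds; the modulo wraps a value that rounds up
    # to a full day back to midnight.
    total_seconds = round(day_fraction * SECONDS_PER_DAY) % SECONDS_PER_DAY
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return time(hours, minutes, seconds)
```

Under this reading, 0.5 corresponds to 12:00:00 and 0.75 to 18:00:00.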
Adrien Loison
b8fd789ac0 Retrieve XLSX sheets in order of appearance (#220)
Instead of relying on the ID, sheets should be retrieved in the order they appear in the file.
Workbook.xml describes the correct order.
This allows the reader to read data in the correct order when sheets have been manually moved after creation.
2016-05-19 10:37:48 -07:00
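The idea behind commit b8fd789ac0 can be sketched in Python: read the `<sheet>` entries of workbook.xml in document order and ignore the numeric ID. The function name `sheet_names_in_order` is hypothetical and the sample XML in the usage note is a reduced, made-up workbook.xml.

```python
import xml.etree.ElementTree as ET
from typing import List

# Main SpreadsheetML namespace used by workbook.xml.
NAMESPACES = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names_in_order(workbook_xml: str) -> List[str]:
    """Return sheet names in the order they appear in workbook.xml."""
    root = ET.fromstring(workbook_xml)
    # findall() yields matching elements in document order, so no sorting
    # by sheetId is applied here.
    return [sheet.get("name") for sheet in root.findall("m:sheets/m:sheet", NAMESPACES)]
```

If a workbook lists a sheet with `sheetId="2"` before the one with `sheetId="1"`, for instance because the user dragged it to the front, this sketch returns the sheets in the listed order rather than in ID order.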
Adrien Loison
5a7c2c1262 Handle General number format as non date (#221)
If the number format is set to General (id = 0), do not try to format the value as a date
2016-05-19 09:40:12 -07:00
madflow
6f0f7c9690 Fix #195 2016-04-06 22:00:47 +02:00
skeleton
d6e8fe4b54 Fix line breaks on CSV reader 2016-03-23 23:26:49 +01:00
madflow
30837f869d Coding style and typos 2016-03-20 08:46:30 +01:00
madflow
e60054f3c4 More explicit rule for ignoring empty placeholder cells in Excel ODS #184 2016-03-19 11:34:32 +01:00
madflow
3ee7099c95 Fix zeros treated as missing values #184 2016-03-19 11:34:32 +01:00
Adrien Loison
d2ac54c578 Custom stream wrapper support
Added support for custom stream wrappers, such as "fly" or "s3".
Support is determined per reader.
2016-03-18 17:09:13 -07:00
Sebastian Fichera
8614f79da3 Minor fixes in order to be ok with naming conventions and code documentation... 2016-02-11 17:51:24 -06:00
Sebastian Fichera
03e85ffc21 Added EOL configuration support while reading CSV files...
Enhancement for #172 issue…
2016-02-11 17:12:54 -06:00
Adrien Loison
4a5da2ad74 Fix CellValueFormatter for numeric values
The value passed into the format() function is coming from an XML file and has never been coerced.
Therefore, when checking is_int($value), the check always returns false - because it's a string.
Changing the check fixes the issue and Spout now correctly parses large numbers.
2016-01-14 11:11:31 -08:00
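The pitfall in commit 4a5da2ad74 is a type check applied to a value that is still a string. The Python analogue below is only an illustration of that idea, since the actual fix lives in PHP. The function name `coerce_numeric` is hypothetical.

```python
from typing import Union


def coerce_numeric(raw_value: str) -> Union[int, float]:
    """Turn a numeric string read from XML into an int or a float.

    A type check such as isinstance(raw_value, int) is always False for
    a string, so the text itself has to be inspected or converted instead.
    """
    try:
        return int(raw_value)
    except ValueError:
        # Not a plain integer literal; fall back to a float.
        return float(raw_value)
```

A large value such as `"1234567890123"` comes back as an integer here, while `"1.5"` comes back as a float.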
Adrien Loison
a804be4844 Support XLSX that are defined in random order
Some software generates the [Content_Types].xml file with sheet definitions in random order.
Instead of having the first sheet (id = 1) defined first, it may be defined in 3rd position.
Therefore, to read the file in the correct order, the sheet order needs to be fixed.
2016-01-08 08:42:29 -08:00
Ingmar Runge
4407cffeff XLSX Date Support / Test + Fix for years beyond 2037
This also fixes years < 1902 on 32-bit PHP systems.
2015-12-17 08:52:15 +01:00
Adrien Loison
f55520661e Various speed improvements 2015-11-12 13:55:25 -08:00
Adrien Loison
8b666fc6cd Fix PHPDoc to work with Augmented Types 2015-11-05 15:48:26 -08:00
Adrien Loison
8ef6bdac62 Better date support
Although Excel has a Date type, older Excel versions use numeric values to store dates.
The value represents the number of days since Jan 1st, 1900.
The only way to tell if the value is a number or a date is to look at the styles.xml and check if the cell has date formatting.
2015-10-23 16:04:38 -07:00
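A small Python sketch of the serial-to-date arithmetic mentioned in commit 8ef6bdac62, not the library's implementation. Serial 1 is Jan 1st, 1900, and Excel's 1900 date system also counts a February 29th, 1900 that never existed, so a common approach is to add the serial to a base of 1899-12-30. That shortcut is only correct from March 1st, 1900 onward (serial 61), which is why this sketch rejects smaller values. The name `serial_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta

# Base date that absorbs both the 1-based serial and the phantom
# 1900-02-29 counted by Excel's 1900 date system.
EXCEL_BASE = datetime(1899, 12, 30)


def serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial (days, with an optional day fraction)."""
    if serial < 61:
        raise ValueError("serials before 1900-03-01 are not handled by this sketch")
    return EXCEL_BASE + timedelta(days=serial)
```

For example, serial 42562 maps to 2016-07-11, and 42562.5 to noon on the same day. Whether a given number should be converted at all still depends on the cell's style, as the commit message explains.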
Adrien Loison
3395d3abb3 Increase max read bytes per line for CSV
Specify a bigger value than the default one to support long lines.
2015-10-22 10:54:12 -07:00
Adrien Loison
01cc8b3da0 Fix "Cannot open file" issue with XMLReader::open on Windows
This occurred when using relative paths. Using realpath() solves this issue.
2015-10-15 09:19:47 -07:00
Adrien Loison
a1a1077677 Fix infinite loop for CSV with all lines empty
Only occurred with multiline CSV files
2015-10-05 21:10:41 +02:00
Adrien Loison
f8c39287ad Added @api tag for documentation 2015-09-04 11:43:01 -07:00
Adrien Loison
818ec2488c Support all ODS cell types
Including:
- date / time
- currency
- percentage
- void

And improved support for boolean
2015-09-02 14:03:38 -07:00
Adrien Loison
d6e707c5fe Moved cell value formatting logic into formatters 2015-09-02 00:12:59 -07:00
Adrien Loison
0a5be41c53 Remove unused isInsideRowTag 2015-09-01 10:59:33 -07:00
Adrien Loison
e4154dfdc3 ODS Reader
Spout can now read ODS files.
It's on par with the XLSX reader. The only difference is that the row iterator cannot be rewound.
It supports the different output formats from LibreOffice and Excel, skipping extra rows/cells if needed.
2015-09-01 10:53:49 -07:00
Adrien Loison
5949cb2442 ODS writer
Added ODS writer
Refactored XLSX writer to abstract some pieces into an abstract multi-sheets writer
Created an abstract style helper
Moved shared components around
2015-08-28 20:19:45 -07:00
Adrien Loison
2183ff6738 Replace == with === 2015-08-10 19:13:40 -07:00
Adrien Loison
8a3b895afc Fix CSV reader when last line is empty
If the last line was empty, it would create an infinite loop...
2015-07-29 10:17:51 -07:00
Adrien Loison
93d7aafe8b Fix XMLReader open() overriding
This is to avoid a warning in PHP7 (and also because that's how it should be!)
2015-07-29 09:59:33 -07:00
Adrien Loison
5e1cfbfdbd Attempt to convert the non UTF-8 strings to UTF-8 2015-07-27 20:59:12 -07:00
Adrien Loison
d946f12951 Support for multiple BOMs depending on the selected encoding 2015-07-27 09:36:55 -07:00
Adrien Loison
1ba10ed2b0 Add wrappers around XMLReader and SimpleXMLElement to improve error handling 2015-07-27 00:49:43 -07:00
Adrien Loison
37d87a8a27 Fix various problems 2015-07-27 00:23:18 -07:00
Adrien Loison
86a4c3790a Adding more tests 2015-07-26 23:53:49 -07:00
Adrien Loison
15aab7902a Factory should return Interface 2015-07-26 23:53:17 -07:00
Adrien Loison
c52dd7bde8 Remove old reader files 2015-07-26 23:53:17 -07:00
Adrien Loison
ae3ee357ff Moved readers to iterators
Instead of the hasNext() / next() syntax, readers now implement the PHP iterator pattern.
It allows readers to be used with a foreach() loop.

All readers now share the same structure (CSV is treated as having exactly one sheet):
- one concrete Reader
- one SheetIterator, exposed by the Reader
- one or more Sheets, returned at every iteration
- one RowIterator, exposed by the Sheet

Introducing the concept of sheets for CSV may be kind of confusing but it makes Spout way more consistent.
Also, this confusion may be resolved by creating a wrapper around the readers if needed.

-- This commit does not delete the old files, nor does it change the folder structure for Writers. This will be done in another commit.
2015-07-26 23:53:17 -07:00
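The structure introduced in commit ae3ee357ff (Reader, SheetIterator, Sheet, RowIterator) can be mirrored in a few lines of Python. This is a rough analogue under stated assumptions, not Spout's API: the class and method names are hypothetical, and rows are held in memory purely to keep the sketch short.

```python
from typing import Iterator, List

Row = List[str]


class Sheet:
    """One sheet; exposes an iterator over its rows."""

    def __init__(self, name: str, rows: List[Row]) -> None:
        self.name = name
        self._rows = rows

    def row_iterator(self) -> Iterator[Row]:
        return iter(self._rows)


class Reader:
    """Concrete reader; exposes an iterator over its sheets."""

    def __init__(self, sheets: List[Sheet]) -> None:
        self._sheets = sheets

    def sheet_iterator(self) -> Iterator[Sheet]:
        return iter(self._sheets)


def csv_reader(rows: List[Row]) -> Reader:
    """A CSV source is modelled as a reader with exactly one sheet."""
    return Reader([Sheet("", rows)])


def all_rows(reader: Reader) -> List[Row]:
    """Consume any reader with the same two nested loops."""
    collected = []
    for sheet in reader.sheet_iterator():
        for row in sheet.row_iterator():
            collected.append(row)
    return collected
```

The point of the single-sheet CSV wrapper is that calling code such as `all_rows` does not need to know which format it is reading.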
Adrien Loison
6ae79b63b3 Merge pull request #67 from box/caching_strategies
Caching strategies
2015-07-14 10:58:37 -07:00
Adrien Loison
494c506d56 Add logic to automatically select the best caching strategy
Based on the number of unique shared strings as well as the available memory amount,
one strategy will be chosen over the other.
The algorithm is based on empirical data and super safe so it may need to be tuned.
2015-07-14 02:26:01 -07:00
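The selection logic from commit 494c506d56 might look roughly like the Python sketch below. Every number in it is a made-up placeholder, since the commit only says the real thresholds are empirical. The strategy labels `"in_memory"` and `"file_based"`, the constants and the function name are likewise assumptions.

```python
# Hypothetical tuning constants; the real values are empirical and unknown here.
ASSUMED_BYTES_PER_STRING = 256
SAFETY_FACTOR = 4


def choose_caching_strategy(unique_strings: int, available_memory_bytes: int) -> str:
    """Pick a strategy from the string count and the memory that is left.

    The estimate is deliberately pessimistic: the projected footprint is
    multiplied by a safety factor before being compared to free memory.
    """
    projected_bytes = unique_strings * ASSUMED_BYTES_PER_STRING * SAFETY_FACTOR
    if projected_bytes <= available_memory_bytes:
        return "in_memory"
    return "file_based"
```

With these placeholder constants, 1,000 unique strings and 1 GB of free memory would select the in-memory label, while 10 million unique strings and 1 MB of free memory would select the file-based one.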