Reading rows from excel will skip blank leading cells.
![]() |
10 KB
![]() |
8 KB
I've checked in a fix as SVN #29858 for ExcelTransformer.writeWorksheetToCSV(). Fung just needs to code review.
You can use attached file with this command :
$:1> excel2csv -c 'utf-8' -d '|' -l 'unix' -DQ 'square-bracket' -ED false -s Excel-Test.xls
"ID"|"First Name"|"Last Name"|"Country"|"Age"|"Salary"|"Hourly"|"Hire Date"
1|"Bob"|"送り仮名"|"USA"|20|1000|1.23|[20120101 00:00:00.000]
2|"John"|"L2"|"Germany"|21|2000|2.45|[20120501 00:00:00.000]
3|"Lance"|"L3"|"France"|25|3000|5.786|[20121202 00:00:00.000]
4|"Gangas"|"L4"|""|30|4000|5.44444|[20000120 00:00:00.000]
5|"Hilary"|"L5"|"Canada"|35|5000|3.21111|[19970315 00:00:00.000]
6|"Mark"|"L6"|"South Africa"|40|60000|100.11|[19910707 00:00:00.000]
7|"Derek"|"L7"|"Brazil"|45|100000|500|[20000405 00:00:00.000]
8|"Jo"|"L8"|"Chile"|50|500000|50.01|[19800808 00:00:00.000]
9|"Wane"|"送り仮名"|"Mexico"|55|1000000|1123.45|[20200209 00:00:00.000]
""|"Derek"|"L7"|"Brazil"|45|100000|500|""
""|""|"L7"|"Brazil"|45|100000|""|""
12|"Derek"|"L7"|"Brazil"|45|""|500|[20000405 00:00:00.000]
13|"Derek"|"L7"|""|45|100000|500|[20000405 00:00:00.000]
You can use attached file with this command :
$:1> excel2csv -c 'utf-8' -d '|' -l 'unix' -DQ 'square-bracket' -ED false -s Excel-Test.xls
"ID"|"First Name"|"Last Name"|"Country"|"Age"|"Salary"|"Hourly"|"Hire Date"
1|"Bob"|"送り仮名"|"USA"|20|1000|1.23|[20120101 00:00:00.000]
2|"John"|"L2"|"Germany"|21|2000|2.45|[20120501 00:00:00.000]
3|"Lance"|"L3"|"France"|25|3000|5.786|[20121202 00:00:00.000]
4|"Gangas"|"L4"|""|30|4000|5.44444|[20000120 00:00:00.000]
5|"Hilary"|"L5"|"Canada"|35|5000|3.21111|[19970315 00:00:00.000]
6|"Mark"|"L6"|"South Africa"|40|60000|100.11|[19910707 00:00:00.000]
7|"Derek"|"L7"|"Brazil"|45|100000|500|[20000405 00:00:00.000]
8|"Jo"|"L8"|"Chile"|50|500000|50.01|[19800808 00:00:00.000]
9|"Wane"|"送り仮名"|"Mexico"|55|1000000|1123.45|[20200209 00:00:00.000]
""|"Derek"|"L7"|"Brazil"|45|100000|500|""
""|""|"L7"|"Brazil"|45|100000|""|""
12|"Derek"|"L7"|"Brazil"|45|""|500|[20000405 00:00:00.000]
13|"Derek"|"L7"|""|45|100000|500|[20000405 00:00:00.000]
I did some test and looked into Aspose APIs. It seems to me Aspose's Iterator APIs always skip blank cells which implies if you have a blank row setting between 2 non-blank rows, then that blank row won't be included in the row iterator. If you agree the blank row described above should be included in the output then your change did not address this problem.
Attached Excel file, test-7832.xlsx, would demonstrate the 'blank-row' problem. If Case (2) below is the expected output, then we need to use other Aspose APIs other than Iterator to locate the cell that we are interested. I think we can replace the outer row loop with something like:
int maxDataRow = wksheet.getCells().getMaxDataRow(); // maxDataRow is an index, zero based
for( int i=0; i<=maxDataRow; i++ ) {
// Row row = (Row)rowIterator.next();
Row row = rows.get(i);
....
}
This is how I generate Case (2) output below.
Case (1) - using Apsose Iterator APIs:
--------------------------------------
> excel2csv test-7832.xlsx -s -SQ none
ID,First Name,Last Name,Country,Age,Salary,Hourly,Hire Date
1,Bob,????,USA,20,1000,1.23,[20120101 00:00:00.000]
ID
First Name,,,,,,
,Last Name
,,Country
,,,Age
,,,,Salary
,,,,,Hourly
,,,,,,Hire Date
,,,,,,
ID,,,Age
First Name,,,,Salary
,Last Name,,,,Hourly
,,Country,,,,Hire Date
ID,Last Name,,Age
First Name,,Country,,Salary
,Last Name,,Age,,Hourly
,,Country,,Salary,,Hire Date
Case (2) - should this be the expected result?
----------------------------------------------
,,,,,,,
ID,First Name,Last Name,Country,Age,Salary,Hourly,Hire Date
1,Bob,????,USA,20,1000,1.23,[20120101 00:00:00.000]
,,,,,,,
ID,,,,,,,
,First Name,,,,,,
,,Last Name,,,,,
,,,Country,,,,
,,,,Age,,,
,,,,,Salary,,
,,,,,,Hourly,
,,,,,,,Hire Date
,,,,,,,
ID,,,,Age,,,
,First Name,,,,Salary,,
,,Last Name,,,,Hourly,
,,,Country,,,,Hire Date
,,,,,,,
,,,,,,,
ID,,Last Name,,Age,,,
,,,,,,,
,First Name,,Country,,Salary,,
,,,,,,,
,,Last Name,,Age,,Hourly,
,,,,,,,
,,,Country,,Salary,,Hire Date
I did some test and looked into Aspose APIs. It seems to me Aspose's Iterator APIs always skip blank cells which implies if you have a blank row setting between 2 non-blank rows, then that blank row won't be included in the row iterator. If you agree the blank row described above should be included in the output then your change did not address this problem.
Attached Excel file, test-7832.xlsx, would demonstrate the 'blank-row' problem. If Case (2) below is the expected output, then we need to use other Aspose APIs other than Iterator to locate the cell that we are interested. I think we can replace the outer row loop with something like:
int maxDataRow = wksheet.getCells().getMaxDataRow(); // maxDataRow is an index, zero based
for( int i=0; i<=maxDataRow; i++ ) {
// Row row = (Row)rowIterator.next();
Row row = rows.get(i);
....
}
This is how I generate Case (2) output below.
Case (1) - using Apsose Iterator APIs:
--------------------------------------
> excel2csv test-7832.xlsx -s -SQ none
ID,First Name,Last Name,Country,Age,Salary,Hourly,Hire Date
1,Bob,????,USA,20,1000,1.23,[20120101 00:00:00.000]
ID
First Name,,,,,,
,Last Name
,,Country
,,,Age
,,,,Salary
,,,,,Hourly
,,,,,,Hire Date
,,,,,,
ID,,,Age
First Name,,,,Salary
,Last Name,,,,Hourly
,,Country,,,,Hire Date
ID,Last Name,,Age
First Name,,Country,,Salary
,Last Name,,Age,,Hourly
,,Country,,Salary,,Hire Date
Case (2) - should this be the expected result?
----------------------------------------------
,,,,,,,
ID,First Name,Last Name,Country,Age,Salary,Hourly,Hire Date
1,Bob,????,USA,20,1000,1.23,[20120101 00:00:00.000]
,,,,,,,
ID,,,,,,,
,First Name,,,,,,
,,Last Name,,,,,
,,,Country,,,,
,,,,Age,,,
,,,,,Salary,,
,,,,,,Hourly,
,,,,,,,Hire Date
,,,,,,,
ID,,,,Age,,,
,First Name,,,,Salary,,
,,Last Name,,,,Hourly,
,,,Country,,,,Hire Date
,,,,,,,
,,,,,,,
ID,,Last Name,,Age,,,
,,,,,,,
,First Name,,Country,,Salary,,
,,,,,,,
,,Last Name,,Age,,Hourly,
,,,,,,,
,,,Country,,Salary,,Hire Date
One thing I forgot to mention is SVN #29858 cannot handle the Excel file similar to the one I attached last time (test-7832.xlsx); SVN #29858 would generate rows with different number of columns which will cause csv2json and csv2xml throw exception.
We probably should replace Iterator and #29858 with other Aspose APIs:
wksheet.getCells().getMaxDataColumn();
Cell cell = Row.get(columnIndex);
One thing I forgot to mention is SVN #29858 cannot handle the Excel file similar to the one I attached last time (test-7832.xlsx); SVN #29858 would generate rows with different number of columns which will cause csv2json and csv2xml throw exception.
We probably should replace Iterator and #29858 with other Aspose APIs:
wksheet.getCells().getMaxDataColumn();
Cell cell = Row.get(columnIndex);
> SVN #29858 would generate rows with different number of columns which will cause csv2json and csv2xml throw exception.
Fixed, all of generated CSV rows should have the same number of columns, SVN r29915. Everything else remains unchanged.
> SVN #29858 would generate rows with different number of columns which will cause csv2json and csv2xml throw exception.
Fixed, all of generated CSV rows should have the same number of columns, SVN r29915. Everything else remains unchanged.
Issue #7831 |
Closed |
Fixed |
Resolved |
Completion |
No due date |
Fixed Build trunk/29915 |
No time estimate |
I've checked in a fix as SVN #29858 for ExcelTransformer.writeWorksheetToCSV(). Fung just needs to code review.