The \csv2* commands in ADStudio 12.0.0 have changed their behavior on binary files compared to ADStudio 12.0.0 RC 23.
In RC 23, on a binary file (an mp3 file in my test case), they all displayed the message "conversion failed (binary file?): Invalid UTF-8 start byte 0xff (at char #49, byte #-1)". In the final release, however, they behave as follows:
* \csv2excel bin_file.mp3 produces a bin_file.mp3 file with no usable content
* \csv2html bin_file.mp3 produces a bin_file.html file with no usable content
* \csv2json bin_file.mp3 prints "csv2json: 'tmp_fs_binary.mp3': conversion failed: 1"
* \csv2xml bin_file.mp3 prints "csv2xml: 'tmp_fs_binary.mp3': conversion failed: Invalid column name: ID3&TPE1SoundJay.com Sound Effects���."
This is because we switched the CSV reader from jackson.CsvMapper to aquafold.AquaCsvReader, as requested by issue #7808.
The "Invalid UTF-8 start byte 0xff (at char #49, byte #-1)" error was generated by CsvMapper; the csv2* commands could then check for that specific exception and react accordingly (i.e. wrap it with '(binary file?)'). AquaCsvReader no longer raises this kind of exception, which is why the csv2* commands now run through on binary input.
The errors from \csv2json and \csv2xml are generated by the Jackson JSON/XML generators. The root cause of the 'conversion failed: 1' error is that AquaCsvReader produces a list of CSV rows with differing numbers of columns. The XML error is self-explanatory.
I briefly reviewed the AquaCsvReader code and don't have a good solution at the moment.
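To illustrate the ragged-row condition described above, here is a minimal sketch of a check for rows whose column count differs from the first row's. It is not taken from the ADStudio code base: the class name RaggedRowCheck, the method firstRaggedRow, and the List<String[]> row representation are all assumptions made for this example, and the actual AquaCsvReader output type may differ.

```java
import java.util.Arrays;
import java.util.List;

public class RaggedRowCheck {

    /**
     * Returns the index of the first row whose column count differs from
     * that of row 0, or -1 if all rows have the same width.
     * Hypothetical helper; rows are assumed to be plain String arrays.
     */
    public static int firstRaggedRow(List<String[]> rows) {
        if (rows.isEmpty()) {
            return -1;
        }
        int expected = rows.get(0).length;
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i).length != expected) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A well-formed table: every row has two columns.
        List<String[]> uniform = Arrays.asList(
                new String[] {"id", "name"},
                new String[] {"1", "alpha"});
        // Binary input parsed as CSV tends to yield rows of arbitrary width.
        List<String[]> ragged = Arrays.asList(
                new String[] {"id", "name"},
                new String[] {"1"});
        System.out.println(firstRaggedRow(uniform));
        System.out.println(firstRaggedRow(ragged));
    }
}
```

A check along these lines would only detect the symptom; it does not tell a binary file apart from a legitimately malformed CSV file.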
It seems that Jackson has some intelligence about decoding bytes to characters. The new CSV reader does not, as it delegates this to the Java java.io.Reader class, which doesn't seem to fail on an encoding mismatch and just continues. It would be costly to try to resolve this, and I think the current behavior is acceptable.
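The difference in decoding behavior can be reproduced with the standard library alone. The sketch below is illustrative and not part of ADStudio: the class name DecodeDemo and both method names are made up for the example. It shows that an InputStreamReader built from a Charset silently substitutes U+FFFD for malformed bytes, while a CharsetDecoder configured with CodingErrorAction.REPORT rejects the same input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    /** Lenient path: the Reader replaces malformed bytes and keeps going. */
    public static String decodeLenient(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Strict path: a decoder set to REPORT throws on the first malformed byte. */
    public static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "ID3" followed by two bytes that can never start a UTF-8 sequence.
        byte[] binary = {'I', 'D', '3', (byte) 0xFF, (byte) 0xFB};
        System.out.println(decodeLenient(binary).contains("\uFFFD")); // true
        System.out.println(isValidUtf8(binary));                      // false
    }
}
```

A strict pre-check like isValidUtf8 would need to read the input an extra time (or buffer it), which is one reason such a fix could be costly for large files.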
Issue #7845 | Closed | Won't Fix | Resolved
Relates to issue #7909 (\csv2html -s on binary file)