I am continuing to prototype ETL, in this case generating either a CSV or a tab delimited text file. However, I cannot get the Quoted Identifier to generate. Is there something I'm missing or is there a bug in the version I'm using?
See the trivial POC script attached as well as the resultant output file and the information about the source table used.
![]() |
2 KB
![]() |
75 B
![]() |
659 B
![]() |
1 KB
|
165 KB
The two AQTextProcessor#setQuoteIdentifier APIs double-escape the quote identifier found in the cell value. For example, if the cell value is "ABC's" and the AQTextProcessor#setQuoteIdentifier("'") is used, the implementation will search for the character "'" and double-escapes it - making formatted value to be "ABC''s".
Perhaps I asked my question the wrong way. As the attached output file demonstrates, I am not getting a standard looking CSV file where the textual values are quoted. In my demo, none of the actual cell values have any quotes in them. I am in this question not so concerned with escaping of quotes as the fact that NO quotes are generated in the output file, which makes this a non-conforming CSV file.
Perhaps I asked my question the wrong way. As the attached output file demonstrates, I am not getting a standard looking CSV file where the textual values are quoted. In my demo, none of the actual cell values have any quotes in them. I am in this question not so concerned with escaping of quotes as the fact that NO quotes are generated in the output file, which makes this a non-conforming CSV file.
See my additional comment about NO quotes appearing in the output file.
See my additional comment about NO quotes appearing in the output file.
@bobfromtn:
Quote identifier is generated based on column's data type. ADS handles 2 kinds of quote identifiers:
(1) String quote identifier - this is applied to character data types, e.g. text, varchar. String quote identifier can be set by setQuoteIdentifier().
(2) Date quote identifier - this is applied to date/time data types, e.g. date, timestamp. Date quote identifier can be set by setDateQuoteIdentifier() or setOpenDateQuoteIdentifier()/setCloseDateQuoteIdentifier().
In order to correctly generate quote identifier for columns, writer needs to have access to columns metadata where column data type definitions are stored.
In the current Open API implementation, columns metadata is only stored in AQDataSet object, and is not available in AQDataRow object.
In your script, you iterate through AQDataSet and call file.write(AQDataRow)
row by row. This will not get desired quote identifiers generated due to writer does not have access to columns metadata and hence writer won't be able to determine the data type for each column, and therefore won't be able to generate quote identifier for columns when needed.
To resolve the problem, you should modify your script and replace the entire while
loop by file.write(AQDataSet)
, this should give your the expected results in output file. please see this attachment for details.
@bobfromtn:
Quote identifier is generated based on column's data type. ADS handles 2 kinds of quote identifiers:
(1) String quote identifier - this is applied to character data types, e.g. text, varchar. String quote identifier can be set by setQuoteIdentifier().
(2) Date quote identifier - this is applied to date/time data types, e.g. date, timestamp. Date quote identifier can be set by setDateQuoteIdentifier() or setOpenDateQuoteIdentifier()/setCloseDateQuoteIdentifier().
In order to correctly generate quote identifier for columns, writer needs to have access to columns metadata where column data type definitions are stored.
In the current Open API implementation, columns metadata is only stored in AQDataSet object, and is not available in AQDataRow object.
In your script, you iterate through AQDataSet and call file.write(AQDataRow)
row by row. This will not get desired quote identifiers generated due to writer does not have access to columns metadata and hence writer won't be able to determine the data type for each column, and therefore won't be able to generate quote identifier for columns when needed.
To resolve the problem, you should modify your script and replace the entire while
loop by file.write(AQDataSet)
, this should give your the expected results in output file. please see this attachment for details.
Thanks for this explanation and the example with submitting a resultset via the .write signature.
In my actual use case, I'm reading a source resultset stream and reading a target resultset stream. Both of these are compared within nested loops so that I can determine whether a particular source row is an insert, update (for which I generate an update statement) or no action (no data change).
An edge case of my ETL prototype is of course that my target replication table is empty, in which case I potentially can switch to the style suggested in your post, since that use case would be 100% inserts.
In the more typical use case, where I have a mix of inserts and updates, I looked at writing inserts to a file and submitting these in another way, since I ran into a performance issue with the data writer. Since the import tool is not available programmatically from AquaScript, I was starting to explore FTPing the resultant file to my DB server and using PostgreSQL Copy command for the actual import. With the performance enhancements on data writer, once I receive those, I will retry with that modality.
While the explanation above makes it clear why my code did not perform as expected, it is not obvious from the API signatures that I would receive the behavior I did, nor is it noted in the API doc that delimiters and other stand-alone setter APIs will not have effect for the .write signature that accepts a data row. Perhaps at a minimum, the doc should be updated to note this limitation?
However, in some use cases such as my mainstream use case dictate that I do not want the entire resultset or datawriter stream to be output to my output file stream. To better support such requirements, I might suggest changing the signature of .write to be something like .write(AQDataRow, AQColumnStructure) if that might be adequate for achieving the desired file output behavior.
Thanks for this explanation and the example with submitting a resultset via the .write signature.
In my actual use case, I'm reading a source resultset stream and reading a target resultset stream. Both of these are compared within nested loops so that I can determine whether a particular source row is an insert, update (for which I generate an update statement) or no action (no data change).
An edge case of my ETL prototype is of course that my target replication table is empty, in which case I potentially can switch to the style suggested in your post, since that use case would be 100% inserts.
In the more typical use case, where I have a mix of inserts and updates, I looked at writing inserts to a file and submitting these in another way, since I ran into a performance issue with the data writer. Since the import tool is not available programmatically from AquaScript, I was starting to explore FTPing the resultant file to my DB server and using PostgreSQL Copy command for the actual import. With the performance enhancements on data writer, once I receive those, I will retry with that modality.
While the explanation above makes it clear why my code did not perform as expected, it is not obvious from the API signatures that I would receive the behavior I did, nor is it noted in the API doc that delimiters and other stand-alone setter APIs will not have effect for the .write signature that accepts a data row. Perhaps at a minimum, the doc should be updated to note this limitation?
However, in some use cases such as my mainstream use case dictate that I do not want the entire resultset or datawriter stream to be output to my output file stream. To better support such requirements, I might suggest changing the signature of .write to be something like .write(AQDataRow, AQColumnStructure) if that might be adequate for achieving the desired file output behavior.
See my last comment for a request to at minimum update doc and preferably to consider a signature enhancement on AQFile
See my last comment for a request to at minimum update doc and preferably to consider a signature enhancement on AQFile
@bobfromtn:
We can add the following method to AQDataWriter:
setColumnStructure(AQColumnStructure columnStructure);
so that you can use it to bind an instance of AQColumnStructure to a writer.
We also can add the following method to AQDataWriter:
writeColumnNames();
so that you can use it to generate the column names when needed.
By providing the above 2 methods, in your script, you just need to add 2 more lines (shown below) in front of the while
loop to generate the results that you were looking for.
file.setColumnStructure(rs.getColumnStructure());
file.writeColumnNames();
Will the approach suggested above work for you? If this is all you need, we can enhance Open API and provide you a patch next week.
@bobfromtn:
We can add the following method to AQDataWriter:
setColumnStructure(AQColumnStructure columnStructure);
so that you can use it to bind an instance of AQColumnStructure to a writer.
We also can add the following method to AQDataWriter:
writeColumnNames();
so that you can use it to generate the column names when needed.
By providing the above 2 methods, in your script, you just need to add 2 more lines (shown below) in front of the while
loop to generate the results that you were looking for.
file.setColumnStructure(rs.getColumnStructure());
file.writeColumnNames();
Will the approach suggested above work for you? If this is all you need, we can enhance Open API and provide you a patch next week.
Yes, your proposal in the above comment seems workable. Thanks!
Yes, your proposal in the above comment seems workable. Thanks!
SVN r55486/ADS 18.0.18-6
SVN r55487/ADS 19.0.0-beta-40
Added the following 2 methods to AQDataWriter
as described here:
setColumnStructure(AQColumnStructure columnStructure);
writeColumnNames();
SVN r55486/ADS 18.0.18-6
SVN r55487/ADS 19.0.0-beta-40
Added the following 2 methods to AQDataWriter
as described here:
setColumnStructure(AQColumnStructure columnStructure);
writeColumnNames();
Can I please get a link to the build with this fix included?
Can I please get a link to the build with this fix included?
Hi Bob,
You can download and install the latest V18 updates from our website here.
If you are looking for the specific patch.
Hi Bob,
You can download and install the latest V18 updates from our website here.
If you are looking for the specific patch.
Issue #15439 |
Resolved |
Fixed |
Resolved |
Completion |
No due date |
Fixed Build ADS 18.0.18-6, ADS 19.0.0-beta-40, ADS 18.0.19 |
No time estimate |
The two AQTextProcessor#setQuoteIdentifier APIs double-escape the quote identifier found in the cell value. For example, if the cell value is "ABC's" and the AQTextProcessor#setQuoteIdentifier("'") is used, the implementation will search for the character "'" and double-escapes it - making formatted value to be "ABC''s".