I have a draft script that serves an ETL purpose of reading rows from a source and writing to a target. The script took over 60 seconds to insert 10k rows (of only one column). My real script runs around 20 minutes to insert 170k rows, which obviously is not acceptable for a production system. Is this the expected/normal performance of ADS datawriter on inserts? (I hope something has been missed.)
My first effort was to move to the attached draft script that strips out my abstract, data-driven logic in favor of direct and simple coding to transfer the source value to target datarow. Next I tried to address performance by turning off auto-commit, which I believe I now have success with (previously, I could see rows in the table from another session; now in this draft where it will not commit until the end, I do not see table rows from another session while the script runs). I also monitored the sessions while the script was running and did not see the transaction renewing, when previously I did.
However, this simplified attached draft script is still taking over 60 seconds:
The table itself is trivial, without even a primary key. It is truncated before running the script.
CREATE TABLE nsrep.keylist_customers ( customer_id bigint NOT NULL, date_last_modified timestamp NULL ) GO
![]() |
24 KB
![]() |
1 KB
|
52 KB
|
72 KB
Love you guy's responsiveness and grateful for your help!! OK, you test was insightful. I modified my real script that reads about 170k source rows to write to a csv file instead of doing DB writes. Then I manually used the import tool to dump that data, with batch size of 1000 into the DB. My results:
a) just 74 seconds to read source rows, run all my logic in javascript and create the CSV!!
b) The import tool had Elapsed time of 6s and Inserted 171,085 rows. Avg Row Time 0.04 mills with 24902 rows/sec!!
Love you guy's responsiveness and grateful for your help!!
Love you guy's responsiveness and grateful for your help!! OK, you test was insightful. I modified my real script that reads about 170k source rows to write to a csv file instead of doing DB writes. Then I manually used the import tool to dump that data, with batch size of 1000 into the DB. My results:
a) just 74 seconds to read source rows, run all my logic in javascript and create the CSV!!
b) The import tool had Elapsed time of 6s and Inserted 171,085 rows. Avg Row Time 0.04 mills with 24902 rows/sec!!
Love you guy's responsiveness and grateful for your help!!
From Tom:
>>1) Can the import tool be driven programmatically from Aquascript?
No. However, I would recommend using FluidShell. To invoke FluidShell, right click on your registered server & choose "FluidShell". Overview. FluidShell provides a sqlimport command and should provide the same performance as the Import Wizard. Example syntax:
sqlimport -d tom4 -s public bistudio_example C:/Users/tom/Projects/data/tom3.public.bistudio_example.csv -TE Batch -TS 1000
From FluidShell you can also invoke an AquaScript using the source command:
source "C:/Users/tom/Projects/Database Schema and Data Exporter3/AquaScripts/Database Schema and Data Exporter.xjs"
For Export, you have the option of using AquaScript, our Export UI (which allows for a custom SQL statement) or FluidShell's sqlexport command.
>>2) The full source table is unfortunately fat, with just over 700 columns. I know Excel itself will only handle 255 or so ...
This is strictly an Excel limitation and not a limitation in ADS. If you choose to import a CSV file with more than 255 columns, that should work fine in ADS. Our Batch import & export functionality, either through UI or through FluidShell, are designed to be highly optimized and very memory efficient.
>>3) Is there any chance that whatever magic you guys do in the import tool could become available without an intermediary file....
This seems like an ideal use case to use FluidShell. On this page, look at the Example "Extract Data from Oracle and Load into SQL Server".
From Tom:
>>1) Can the import tool be driven programmatically from Aquascript?
No. However, I would recommend using FluidShell. To invoke FluidShell, right click on your registered server & choose "FluidShell". Overview. FluidShell provides a sqlimport command and should provide the same performance as the Import Wizard. Example syntax:
sqlimport -d tom4 -s public bistudio_example C:/Users/tom/Projects/data/tom3.public.bistudio_example.csv -TE Batch -TS 1000
From FluidShell you can also invoke an AquaScript using the source command:
source "C:/Users/tom/Projects/Database Schema and Data Exporter3/AquaScripts/Database Schema and Data Exporter.xjs"
For Export, you have the option of using AquaScript, our Export UI (which allows for a custom SQL statement) or FluidShell's sqlexport command.
>>2) The full source table is unfortunately fat, with just over 700 columns. I know Excel itself will only handle 255 or so ...
This is strictly an Excel limitation and not a limitation in ADS. If you choose to import a CSV file with more than 255 columns, that should work fine in ADS. Our Batch import & export functionality, either through UI or through FluidShell, are designed to be highly optimized and very memory efficient.
>>3) Is there any chance that whatever magic you guys do in the import tool could become available without an intermediary file....
This seems like an ideal use case to use FluidShell. On this page, look at the Example "Extract Data from Oracle and Load into SQL Server".
Essentially, I'm creating a proof-of-concept/prototype of ETL with Aquascript. If all of this pans out, there might be the possibility that my company would adopt Aqua Data Server. If we had Aqua Data Server, then we would probably be looking for it to do some scheduling, invoking various aqua scripts.
A couple things I have not stated thus far: a) our cloud system, NetSuite, allows user customization of its tables, which means that tables and columns in the source system, to some extent, can be added, dropped and renamed. b) I'm seeking to handle this (unfortunate) mutability via data-driven logic based in Postgres, where I also export from NetSuite a data dictionary of the current source structure. Thus, I am dynamically generating various SQL statements. I've got this working in a combination of Aqua Script and Postgres functions. I am less confident of getting this level of dynamic behavior with FluidShell (and it has been 20 years since I've done unix shell scripting, which is also not as common a skill in our shop cf to javascript) c) While javascript is adopted in our company stack, it is a bit of a sell to adopt Aqua Script and Aqua Data Server where javascript is a bit "non-standard" in that it is single-threaded and not oriented around callbacks and one needs to learn the OpenAPIs too. I'm really hoping not to add another tech to the mix in FluidShell, at least not if I can avoid it.
Given your comments / suggestions, here are further questions:
1) You mention FluidShell can invoke aquascript. Can aquascript invoke FluidShell, setting command line parameters as needed?
2) Can Aqua Data Server invoke FluidShell in a scheduled manner?
Essentially, I'd prefer an approach where the ADServer invokes an aqua script that then drives a set of work, including invoking FluidShell for an import if needed, rather than scheduling FluidShell at the OS level and having it invoke aquascript. I'm not quite sure where ADServer would even fit in in that last scenario.
Since we are talking about which kind of thing can invoke another kind of thing... As I've been contemplating how this might all work using Aqua Data Server (and I have not had time to try that product yet and doc is somewhat limited - I've read everything on the website), I have some further questions:
3) Does Aqua Data Server provide any standard logging? If an Aquascript runs in the context of Aqua Data Server, do any aqua.console.println() statements automatically start going into the server logging system or would different code need to be written or would we need to create our own separate logging approach entirely?
4) I have not gotten the full picture of how I might package a more complete system up. Essentially, there are a set of actions that would need to be coordinated in a certain sequence. I'm guessing that shared javascript/aquascript functions could be put into a common file that is imported by various scripts? Can an aqua script invoke another aqua script, especially passing in parameters and/or receiving a value back from the child aqua script? Since there is no multi-threading, I'm guessing if a parent aqua script did programatically invoke another "child" aqua script, then execution in the parent is blocked until the child finishes or throws an uncaught exception?
5) I'm hoping that an aqua script could be invoked interactively from ADServer with a user providing parameter input to the aqua script. It looks like there is a "workspace" in which some kind of form can be created and maybe that supports this scenario? Any examples of this?
6) I imaging we might have other systems want to make web service/rest calls to ADServer, which may invoke an aquascript. The caller might pass parameters that go to the aquascript and then caller can receive a payload back. I've seen various indications that this may be possible, including a URI construction that shows a path including a project and also a concept of some specific aqua script being the default script for a project. However it isn't fully clear how all of this would work.
Essentially, I'm creating a proof-of-concept/prototype of ETL with Aquascript. If all of this pans out, there might be the possibility that my company would adopt Aqua Data Server. If we had Aqua Data Server, then we would probably be looking for it to do some scheduling, invoking various aqua scripts.
A couple things I have not stated thus far: a) our cloud system, NetSuite, allows user customization of its tables, which means that tables and columns in the source system, to some extent, can be added, dropped and renamed. b) I'm seeking to handle this (unfortunate) mutability via data-driven logic based in Postgres, where I also export from NetSuite a data dictionary of the current source structure. Thus, I am dynamically generating various SQL statements. I've got this working in a combination of Aqua Script and Postgres functions. I am less confident of getting this level of dynamic behavior with FluidShell (and it has been 20 years since I've done unix shell scripting, which is also not as common a skill in our shop cf to javascript) c) While javascript is adopted in our company stack, it is a bit of a sell to adopt Aqua Script and Aqua Data Server where javascript is a bit "non-standard" in that it is single-threaded and not oriented around callbacks and one needs to learn the OpenAPIs too. I'm really hoping not to add another tech to the mix in FluidShell, at least not if I can avoid it.
Given your comments / suggestions, here are further questions:
1) You mention FluidShell can invoke aquascript. Can aquascript invoke FluidShell, setting command line parameters as needed?
2) Can Aqua Data Server invoke FluidShell in a scheduled manner?
Essentially, I'd prefer an approach where the ADServer invokes an aqua script that then drives a set of work, including invoking FluidShell for an import if needed, rather than scheduling FluidShell at the OS level and having it invoke aquascript. I'm not quite sure where ADServer would even fit in in that last scenario.
Since we are talking about which kind of thing can invoke another kind of thing... As I've been contemplating how this might all work using Aqua Data Server (and I have not had time to try that product yet and doc is somewhat limited - I've read everything on the website), I have some further questions:
3) Does Aqua Data Server provide any standard logging? If an Aquascript runs in the context of Aqua Data Server, do any aqua.console.println() statements automatically start going into the server logging system or would different code need to be written or would we need to create our own separate logging approach entirely?
4) I have not gotten the full picture of how I might package a more complete system up. Essentially, there are a set of actions that would need to be coordinated in a certain sequence. I'm guessing that shared javascript/aquascript functions could be put into a common file that is imported by various scripts? Can an aqua script invoke another aqua script, especially passing in parameters and/or receiving a value back from the child aqua script? Since there is no multi-threading, I'm guessing if a parent aqua script did programatically invoke another "child" aqua script, then execution in the parent is blocked until the child finishes or throws an uncaught exception?
5) I'm hoping that an aqua script could be invoked interactively from ADServer with a user providing parameter input to the aqua script. It looks like there is a "workspace" in which some kind of form can be created and maybe that supports this scenario? Any examples of this?
6) I imaging we might have other systems want to make web service/rest calls to ADServer, which may invoke an aquascript. The caller might pass parameters that go to the aquascript and then caller can receive a payload back. I've seen various indications that this may be possible, including a URI construction that shows a path including a project and also a concept of some specific aqua script being the default script for a project. However it isn't fully clear how all of this would work.
Hi Bob,
>>Essentially, I'd prefer an approach where the ADServer invokes an aqua script that then drives a set of work
The below answers focus on how you can accomplish the task using AquaScript.
>>3) do any aqua.console.println() statements automatically start going into the server logging system
If you use aqua.console.println() and execute an AquaScript inside the AquaScript tab, there is a "Console Output" section where these messages are displayed. If you execute the AquaScript via Scheduled Job, then the output can be viewing by clicking on the Job's Detail's results - see LoggingOutput.png. If you find this is insufficient, you can use the default java.util.logging framework from AquaScript:
importPackage(java.util.logging);
var fileHandler = new FileHandler('app.log'); var logger = Logger.getLogger("app.example.com"); fileHandler.setFormatter(new SimpleFormatter()); fileHandler.setLevel(Level.ALL); logger.addHandler(fileHandler); logger.log(Level.INFO, "Starting app..."); // application logic logger.log(Level.INFO, "Exiting app...");
>> 4) I'm guessing that shared javascript/aquascript functions could be put into a common file that is imported by various scripts? Can an aqua script invoke another aqua script, especially passing in parameters and/or receiving a value back from the child aqua script?
Yes to both. The runScript API is used to accomplish this.
// START Import Aquascript.xjs
var util = aqua.project.getAquaScript("util.xjs"); aqua.system.runScript(util, null, true); sayHello("Allen"); // END Import Aquascript.xjs
// START util.xjs
function sayHello(name) { print("hello: " + name); } // END util.xjs
>> I'm guessing if a parent aqua script did programatically invoke another "child" aqua script, then execution in the parent is blocked until the child finishes or throws an uncaught exception?
Correct.
>>5) & 6)
Yes, we could pass parameter as follows:
Take a look at our HelloWorld solution. These can be run inside ADStudio or ADServer. If run in ADServer, choose the "Debug in Browser" option in the AquaScript toolbar to see the HTML form.
We also provide api to call remote HTTP services:
var client = aqua.net.newWebClient(); var url = "https://login.example.com" + "?location=" + aqua.util.urlEncode("abcd@example.com"); var request = client.newWebRequest(url); // Currently ADS can POST data as key-value pairs or as file upload. request.addParameter('username', "test"); print("Request: " + request); var response = client.submitPostRequest(request); print("Response: " + response.getContent());
Hi Bob,
>>Essentially, I'd prefer an approach where the ADServer invokes an aqua script that then drives a set of work
The below answers focus on how you can accomplish the task using AquaScript.
>>3) do any aqua.console.println() statements automatically start going into the server logging system
If you use aqua.console.println() and execute an AquaScript inside the AquaScript tab, there is a "Console Output" section where these messages are displayed. If you execute the AquaScript via Scheduled Job, then the output can be viewing by clicking on the Job's Detail's results - see LoggingOutput.png. If you find this is insufficient, you can use the default java.util.logging framework from AquaScript:
importPackage(java.util.logging);
var fileHandler = new FileHandler('app.log'); var logger = Logger.getLogger("app.example.com"); fileHandler.setFormatter(new SimpleFormatter()); fileHandler.setLevel(Level.ALL); logger.addHandler(fileHandler); logger.log(Level.INFO, "Starting app..."); // application logic logger.log(Level.INFO, "Exiting app...");
>> 4) I'm guessing that shared javascript/aquascript functions could be put into a common file that is imported by various scripts? Can an aqua script invoke another aqua script, especially passing in parameters and/or receiving a value back from the child aqua script?
Yes to both. The runScript API is used to accomplish this.
// START Import Aquascript.xjs
var util = aqua.project.getAquaScript("util.xjs"); aqua.system.runScript(util, null, true); sayHello("Allen"); // END Import Aquascript.xjs
// START util.xjs
function sayHello(name) { print("hello: " + name); } // END util.xjs
>> I'm guessing if a parent aqua script did programatically invoke another "child" aqua script, then execution in the parent is blocked until the child finishes or throws an uncaught exception?
Correct.
>>5) & 6)
Yes, we could pass parameter as follows:
Take a look at our HelloWorld solution. These can be run inside ADStudio or ADServer. If run in ADServer, choose the "Debug in Browser" option in the AquaScript toolbar to see the HTML form.
We also provide api to call remote HTTP services:
var client = aqua.net.newWebClient(); var url = "https://login.example.com" + "?location=" + aqua.util.urlEncode("abcd@example.com"); var request = client.newWebRequest(url); // Currently ADS can POST data as key-value pairs or as file upload. request.addParameter('username', "test"); print("Request: " + request); var response = client.submitPostRequest(request); print("Response: " + response.getContent());
1) You mention FluidShell can invoke aquascript. Can aquascript invoke FluidShell, setting command line parameters as needed?
2) Can Aqua Data Server invoke FluidShell in a scheduled manner?
FluidShell is not supported inside of Aqua Data Server.
In Aqua Data Studio (ADS), FluidShell can be invoked directly via command line. In your [ADS_HOME] directory, you'll notice runfluid*.bat files. This also allows FluidShell to be programmatically invoke using the OS's native job scheduler. An AquaScript has the ability to invoke an external process using the runCommand API. Using this API, an AquaScript could invoke the runfluid*.bat file
1) You mention FluidShell can invoke aquascript. Can aquascript invoke FluidShell, setting command line parameters as needed?
2) Can Aqua Data Server invoke FluidShell in a scheduled manner?
FluidShell is not supported inside of Aqua Data Server.
In Aqua Data Studio (ADS), FluidShell can be invoked directly via command line. In your [ADS_HOME] directory, you'll notice runfluid*.bat files. This also allows FluidShell to be programmatically invoke using the OS's native job scheduler. An AquaScript has the ability to invoke an external process using the runCommand API. Using this API, an AquaScript could invoke the runfluid*.bat file
SVN r55470/ADS 18.0.18-3
SVN r55472/ADS 19.0.0-beta-40
Checkpoint: Improved performance of AQTableWriter.write(AQDataRow row)
Open API. After above check-in is applied, AquaScript's performance should be close to Import Tool's performance with Transaction Type set to FULL.
SVN r55470/ADS 18.0.18-3
SVN r55472/ADS 19.0.0-beta-40
Checkpoint: Improved performance of AQTableWriter.write(AQDataRow row)
Open API. After above check-in is applied, AquaScript's performance should be close to Import Tool's performance with Transaction Type set to FULL.
It is nice to know about this capability. In this case, my actual script reads a resultset stream from my source and simultaneously reads a resultset stream from my target. These are processed in nested loops to determine whether source rows are inserts, updates, or to be ignored (no changes). So, in my mainstream use case, I am not commiting the entire resultset to the output. Also, in my mainstream use case, there may be some data transformation here and there.
It is nice to know about this capability. In this case, my actual script reads a resultset stream from my source and simultaneously reads a resultset stream from my target. These are processed in nested loops to determine whether source rows are inserts, updates, or to be ignored (no changes). So, in my mainstream use case, I am not commiting the entire resultset to the output. Also, in my mainstream use case, there may be some data transformation here and there.
I note the patch available from funfun. Once that is made available to me, I'll retry and retime with that update.
I note the patch available from funfun. Once that is made available to me, I'll retry and retime with that update.
SVN r55476/ADS 18.0.18-4
SVN r55477/ADS 19.0.0-beta-40
Support batch processing in AquaScript on invocation of AQDataWriter.write(AQDataReader)
and AQDataWriter.write(AQDataSet)
.
AquaScript now supports batch processing on execution of AQDataWriter.write(AQDataReader)
and AQDataWriter.write(AQDataSet)
. Please note that batching processing is not applicable to AQDataWriter.write(AQDataRow)
.
The following APIs are added to AQTableWriter
:
public void setBatchSize(int size); public void setBatchTransactionType(); public void setFullTransactionType(); public int getBatchSize(); public boolean isBatchTransactionType(); public boolean isFullTransactionType();
Please see this screenshot for javadoc.
To enable batch processing, the following settings are required
writer.setBatchTransactionType();
writer.setBatchSize(N);
// where N > 0
prior calling writer.write(AQDataReader)
and writer.write(AQDataSet)
.
With batch processing enabled, on PostgreSQL, AquaScript's performance should be close to Import tool with transaction type set to BATCH.
SVN r55476/ADS 18.0.18-4
SVN r55477/ADS 19.0.0-beta-40
Support batch processing in AquaScript on invocation of AQDataWriter.write(AQDataReader)
and AQDataWriter.write(AQDataSet)
.
AquaScript now supports batch processing on execution of AQDataWriter.write(AQDataReader)
and AQDataWriter.write(AQDataSet)
. Please note that batching processing is not applicable to AQDataWriter.write(AQDataRow)
.
The following APIs are added to AQTableWriter
:
public void setBatchSize(int size); public void setBatchTransactionType(); public void setFullTransactionType(); public int getBatchSize(); public boolean isBatchTransactionType(); public boolean isFullTransactionType();
Please see this screenshot for javadoc.
To enable batch processing, the following settings are required
writer.setBatchTransactionType();
writer.setBatchSize(N);
// where N > 0
prior calling writer.write(AQDataReader)
and writer.write(AQDataSet)
.
With batch processing enabled, on PostgreSQL, AquaScript's performance should be close to Import tool with transaction type set to BATCH.
This is really encouraging. Do you think it might be possible to have the patch at least by Friday afternoon? I expect to do a good bit of coding on the weekend and likely on Friday as well. I'd love to try out these new capabilities.
This is really encouraging. Do you think it might be possible to have the patch at least by Friday afternoon? I expect to do a good bit of coding on the weekend and likely on Friday as well. I'd love to try out these new capabilities.
Hi Bob,
We will be testing the patch tomorrow. If it passes our testing, then we should be able to provide it to you by Friday.
Hi Bob,
We will be testing the patch tomorrow. If it passes our testing, then we should be able to provide it to you by Friday.
Hi Bob,
Patch uploaded:
Issue #15429 |
Closed |
Fixed |
Resolved |
Completion |
No due date |
Fixed Build ADS 18.0.18-4, ADS 19.0.0-beta-40 |
No time estimate |