Observed on ADS 14-beta-64.
While importing live client data into MongoDB with the ADS Import Data tool, I observed that it is much slower than the mongoimport utility. To import 1,938,446 documents, the ADS import tool took 3 h 22 min (please refer to the attached insert_perf.png), whereas mongoimport took about 28 min (please refer to mongoimport.png).
The ADS import tool averaged about 160 rows/s, whereas mongoimport averaged 1,144 rows/s.
Could you please check whether the performance of the Import tool can be improved?
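As a quick sanity check on the figures above, the average rates can be recomputed from the document count and the elapsed times. This is only a sketch: the helper name rows_per_second is made up for illustration, and "about 28 min" is treated as exactly 28 minutes, so the mongoimport result comes out slightly above the 1,144 rows/s average read from the tool's own output.

```python
def rows_per_second(rows: int, hours: int = 0, minutes: int = 0, seconds: int = 0) -> float:
    """Average throughput for a load that took the given wall-clock time."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return rows / elapsed


DOCS = 1_938_446  # documents in the TSV file

ads_rate = rows_per_second(DOCS, hours=3, minutes=22)   # ADS import tool
mongoimport_rate = rows_per_second(DOCS, minutes=28)    # assumes exactly 28 min

print(f"ADS import tool: {ads_rate:.0f} rows/s")          # 160 rows/s
print(f"mongoimport:     {mongoimport_rate:.0f} rows/s")  # 1154 rows/s
print(f"mongoimport is {mongoimport_rate / ads_rate:.1f}x faster")  # 7.2x
```

Either way, the import tool is roughly seven times slower than mongoimport for this data set.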
The detailed setup of the experiment was:
1) A TSV (tab-delimited) file with 1,938,446 documents (each containing 52 fields) was to be loaded into a MongoDB collection. This file and ADS were on a machine running Ubuntu 10.04 with a Core 2 Duo (2 GHz) and 2 GB RAM.
2) The MongoDB server was running on a separate machine (Ubuntu 12.04 64-bit, i3 1.4 GHz, 4 GB RAM) on the same local network.
For the mongoimport utility, the following options were used:
mongoimport --host 192.168.123.102 --port 27017 --db test --collection impor --fieldFile /home/ravi/expt/headers --type tsv --file /home/ravi/expt/chr22.txt --headerline
MongoSQL supports two different INSERT syntaxes:
INSERT [INTO] collection_name
VALUES (json_document)
INSERT [INTO] collection_name (field1, field2 …)
VALUES (value1, value2 …)
I compared the two syntaxes by submitting 10,000 of each of the following INSERT statements.
INSERT INTO "baseball"
VALUES ({ "teamName":"Cubs", "city":"Chicago", "division":"NL Central", "ranking":5,
  "managerName":{ "first":"Dale", "last":"Sveum" },
  "colors":["blue","white"],
  "worldChampionships":2,
  "stats":[{"year":2010, "wins":75, "losses":87, "winPercentage":0.463},
    {"year":2011, "wins":71, "losses":91, "winPercentage":0.438},
    {"year":2012, "wins":61, "losses":101, "winPercentage":0.377}] })
GO
INSERT INTO "baseball" ("teamName", "city", "division", "ranking", "managerName.first", "managerName.last", "colors", "worldChampionships", "stats")
VALUES ('Cubs', 'Chicago', 'NL Central', 5, 'Dale', 'Sveum',
  [ 'blue', 'white' ],
  2,
  [ { "year" : 2010, "wins" : 75, "losses" : 87, "winPercentage" : 0.463 },
    { "year" : 2011, "wins" : 71, "losses" : 91, "winPercentage" : 0.438 },
    { "year" : 2012, "wins" : 61, "losses" : 101, "winPercentage" : 0.377 } ])
GO
- Using the JSON document syntax, it took around 20 seconds to insert 10,000 documents.
- Using the standard SQL syntax, it took around 45 seconds to insert 10,000 documents.
Currently, the ADS Import Tool uses the second syntax (standard SQL). With that syntax, we pay the performance penalty of parsing the INSERT statement and assembling the corresponding document to pass to the Mongo Java API.
We can improve the performance of the Import Tool by generating INSERT statements that use the JSON document syntax instead.
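To illustrate the idea, here is a minimal sketch of turning one TSV row into an INSERT statement in the JSON document syntax. It is not the Import Tool's actual code (the tool itself is not written in Python): build_json_insert and the converters argument are hypothetical names, values stay strings unless a converter is supplied, and it assumes MongoSQL accepts the JSON text that Python's json.dumps produces.

```python
import csv
import io
import json


def build_json_insert(collection, fields, values, converters=None):
    """Return an INSERT ... VALUES (json_document) statement for one row.

    Dotted field names such as "managerName.first" are expanded into
    nested documents, mirroring the column-list example in this issue.
    """
    if '"' in collection:
        # Assumption: quoting rules for identifiers are out of scope here.
        raise ValueError("collection names containing quotes are not handled")
    converters = converters or {}
    document = {}
    for field, value in zip(fields, values):
        target = document
        *parents, leaf = field.split(".")
        for parent in parents:
            target = target.setdefault(parent, {})
        # Values stay strings unless the caller supplies a converter.
        target[leaf] = converters.get(field, str)(value)
    return f'INSERT INTO "{collection}" VALUES ({json.dumps(document)})'


# Hypothetical two-line TSV input: a header line followed by one data row.
tsv = (
    "teamName\tcity\tranking\tmanagerName.first\tmanagerName.last\n"
    "Cubs\tChicago\t5\tDale\tSveum\n"
)
reader = csv.reader(io.StringIO(tsv), delimiter="\t")
header = next(reader)
statements = [
    build_json_insert("baseball", header, row, converters={"ranking": int})
    for row in reader
]
print(statements[0])
```

The point of the sketch is that the document is assembled once on the client side, so the server-side parser only has to read a single JSON value instead of matching a column list against a value list.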
Issue #9628 | Closed | Completion | No due date | No fixed build | No time estimate
2 issue links:
relates to #9724: Performance metrics of ADS data import/export compared to mongoimport/mongoexport utility
relates to #8740: Import/Export enhancement for MongoDB