Tested on ADS 14-alpha-16 on Ubuntu 10.04.
ADS is unable to connect to a secondary member when the primary member of the replica set is down, even after we set 'slaveOK=true' and 'readPreference=secondary' in the driver settings. (Please refer to the attached replset_conn_error.png)
Steps to reproduce this issue:
1) Configure and activate a 3-member replica set using commands of the following form:
mongod --port 27017 --dbpath /home/ravi/replica/1 -replSet rs0 --auth
mongod --port 27018 --dbpath /home/ravi/replica/2 -replSet rs0 --auth
mongod --port 27020 --dbpath /home/ravi/replica/3 -replSet rs0 --auth
2) Now shut down 2 of these mongod processes so that a single mongod process remains, serving as a secondary. (Verified using rs.conf() from the mongo shell)
3) Try to connect ADS to this replica set, specifying the replica set server information as localhost:27017,localhost:27018,localhost:27020 and the parameters 'slaveOK=true' and 'readPreference=secondary' on the Drivers tab of the 'Register Server' window.
Actual result:
a) An error is thrown saying 'can't find a master' and 'unable to create socket connection on localhost:27017'
Expected result:
a) ADS should have connected to the last remaining secondary member for read operations, as we have specified the appropriate parameters on the Drivers page.
(Through the mongo shell we are able to connect to this secondary using the following commands: mongo --port 27020 and then rs.slaveOk() )
Please use appropriate ports and dbpath values in the steps above (I listed the settings I used in my setup).
The separator in the driver parameters is incorrect. Please try again using ; (semicolon) as the separator.
Issue persists even after specifying Driver params correctly as ?slaveOK=true;readPreference=secondary
From the screenshot, it looks like Test Connection is giving the user a misleading result. This is because Test Connection only attempts a socket connection to the first host; it does not try the second one.
In general, here is the algorithm for Test Connection
Step 1) Attempts a socket connection to user input - host, port;
if SUCCESS: proceed to Step 2)
if FAILURE: stop and show the error message to user
Step 2) Attempts a jdbc connection using the selected driver;
So in this case, if the first host is unreachable, it does not proceed with the JDBC connection, even though, based on your driver parameters, the JDBC connection should have been successful. Btw, hitting the "Save" button would have given you the correct response, as it tries the JDBC connection first.
We will review the current behavior of Test Connection in case of multiple hosts and will keep you posted.
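As a rough illustration of the limitation described above, here is a minimal Java sketch. It is not ADS code: the class and method names (TestConnectionSketch, firstHostOnly, anyHost) are hypothetical, and the "try every host" variant is only one possible way Test Connection could treat a multi-host entry.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from ADS.
class TestConnectionSketch {

    /** Returns true if a TCP connection to the address succeeds within timeoutMs. */
    static boolean canConnect(InetSocketAddress address, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Step 1 as described in the comment above: only the first host is probed. */
    static boolean firstHostOnly(List<InetSocketAddress> hosts, int timeoutMs) {
        return canConnect(hosts.get(0), timeoutMs);
    }

    /** Possible alternative: pass as soon as any listed host accepts a connection. */
    static boolean anyHost(List<InetSocketAddress> hosts, int timeoutMs) {
        for (InetSocketAddress host : hosts) {
            if (canConnect(host, timeoutMs)) {
                return true;
            }
        }
        return false;
    }
}
```

With the host list from the report (27017 and 27018 down, 27020 up), firstHostOnly would stop at 27017 and report a failure, while anyHost would reach 27020 and let Step 2 (the JDBC connection) run.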
Issue persists even after specifying Driver params correctly as ?slaveOK=true;readPreference=secondary
The JDBC driver uses the listDatabases command, after the socket connection is established, to check whether the given database exists on the server. However, it seems that the DB.getCommandReadPreference() method overrides the user's preference (i.e. the readPreference property) and returns its own value based on the command's name.
I think the slaveOK -> readPreference refactoring is incomplete in the Mongo Java Driver, because a workaround for this issue is to call the deprecated method mongo.slaveOk() from MongoDriver.connect(). There is an open issue on the Java Driver which seems related to this.
I guess we have to use this workaround in MongoDriver (if either of the following conditions is met: slaveOK=true or readPreference=[ secondary | secondaryPreferred ]) until there is a fix in the upstream Mongo Java Driver code.
I think this JIRA issue is more closely related to ours. Several similar problems were encountered in the past (e.g. JAVA-535, JAVA-536), so in JAVA-497 they set up a list of commands that can be safely sent to secondary nodes. listDatabases is not among them (I don't know whether this command was omitted by mistake or whether it is not safe to use on secondaries).
I couldn't find the reason why only a subset of commands can be sent to secondaries, but one reason is probably to avoid retrieving outdated information. The workaround explained in my previous comment would bypass their split, and all commands would go to secondaries if readPreference is secondary or secondaryPreferred, which I think is not desirable.
So far, this workaround seems to be the only way to prevent the user-chosen readPreference from being overridden (for the listDatabases command) inside DB.getCommandReadPreference().
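To make the behaviour discussed above concrete, here is a small self-contained Java model of a per-command read-preference rewrite. It is a sketch under assumptions, not the driver's source: the class, the enum, and the OBEDIENT set are hypothetical, and the set holds an illustrative subset only, since the exact list lives in the Mongo Java Driver.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the rewrite; it does not use or reproduce the Mongo Java Driver.
class CommandReadPreferenceSketch {

    enum ReadPref { PRIMARY, SECONDARY }

    // Illustrative subset of commands allowed to honour the user's preference (assumption).
    static final Set<String> OBEDIENT =
            new HashSet<>(Arrays.asList("count", "distinct", "dbstats", "collstats"));

    /** Listed commands keep the requested preference; all others are forced to PRIMARY. */
    static ReadPref effective(String commandName, ReadPref requested) {
        if (OBEDIENT.contains(commandName.toLowerCase(Locale.ROOT))) {
            return requested;
        }
        return ReadPref.PRIMARY;
    }
}
```

In this model, effective("listDatabases", ReadPref.SECONDARY) yields PRIMARY, which matches the symptom in this issue: with no primary available, the command has nowhere to go.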
The split from issue JAVA-497 probably exists to allow only read-only commands to run on secondaries (e.g. mapreduce with inline results is allowed, but mapreduce with output stored to a collection is not, because it writes data and thus is not a read-only command).
The listDatabases command is a read-only command. Ideally I would override the DB.getCommandReadPreference() implementation and include it in the list of commands that can be safely sent to secondary nodes.
I've managed to override getCommandReadPreference() by extending the DB class and using instances of the new class everywhere, but along with listDatabases there are other commands (such as whatsmyuri, buildInfo, create, renameCollection, etc.) that need to be "white-listed" so that the readPreference option is not overwritten for them.
In order to keep the original behaviour (send only some of the commands to secondary nodes and the rest to the primary) without having to maintain a hard-coded list of commands, while still allowing users to use secondary nodes when the primary server is down, I propose to re-enable the slaveOK JDBC parameter. If this parameter is set, mongo.slaveOk() will get called (after I add this link in the MongoDriver class) and thus all commands will go to secondary nodes (the non-read-only commands will fail).
This way we give users the possibility to control the behaviour. We'll rely on a deprecated method, but as long as readPreference seems not to be fully implemented yet, I think this is OK (even the MongoDB team encourages its usage sometimes, see this comment).
Sachin, do you agree with re-enabling this parameter?
Hi Emil - we're going to do some testing with ReplicaSets tomorrow and then I can make a decision on this. There are certain behaviors I want to test out first before deciding on this change.
We have done some research and testing with replica set today. This document explains the replica set election process.
http://docs.mongodb.org/manual/core/replication/#replica-set-elections
In Ravi's test case, there are 3 members in the replica set and 2 are down, leaving only one member in the replica set. In this scenario, the election process fails and there is no primary in the replica set. When this happens, a lot of the MongoDB commands fail (listDatabases, whatsmyuri, etc.). There is really nothing ADS can do as ADS requires those commands to work properly.
If only 1 member is down, a new primary will be selected. Regardless of the readPreference setting, ADS will be able to work with the new primary.
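The arithmetic behind the two cases above can be sketched as follows. This is a simplified model that assumes every member carries exactly one vote and that there are no arbiters or other special members; the class and method names are hypothetical.

```java
// Simplified majority check; assumes one vote per member (assumption, not from the thread).
class ElectionMajoritySketch {

    /** Smallest number of votes that is a strict majority of the voting members. */
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    /** A primary can be elected only if a majority of voting members is reachable. */
    static boolean canElectPrimary(int votingMembers, int reachableMembers) {
        return reachableMembers >= majority(votingMembers);
    }
}
```

For the 3-member set in this issue, majority(3) is 2, so one surviving member cannot elect a primary, while two surviving members can.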
Decision: remove mongo.slaveOk() and let's not override getCommandReadPreference().
Jenny, It's true the election process fails in the scenario when 2 of the 3 replica set members are down.
But using the mongo shell, we could connect to and read from the last remaining secondary. I thought the user should be able to do the same (i.e. read from the secondary) via ADS.
Thanks
In a situation where there is no primary, the replica set is not fully functional. The MongoDB Shell probably doesn't need much to work. However, ADS requires the MongoDB commands to be fully functional. For example, in order to populate the database drop-down list in the Query Analyzer window, ADS issues the command listDatabases which doesn't work with a secondary member.
I think there is a misunderstanding on this issue. Ravi was not referring to the election process that chooses a new primary server when the current one goes down. He was requesting the ability to perform read operations on secondary nodes (when the primary server is not available and only secondary nodes remain).
The listDatabases, whatsmyuri, etc. commands used by ADS do not work in this scenario because they are not listed in the DB._obedientCommands list (although these two commands are read-only and could have been included there). As a result, the DB.getCommandReadPreference() method returns a "primary" read preference for these commands, no matter what the user chose for the readPreference parameter.
Jenny, if you change the MongoDriver class and add a mongo.slaveOk(); call inside the MongoDriver.connect() method, then you'll be able to connect and query secondary nodes. When issuing statements that try to create collections or insert / update data, a "This is not the master" error or something similar is thrown by the server (the secondary node).
Emil, you are right! mongo.slaveOk(); works like magic!
OK, here is the new decision. Let's add back the slaveOK JDBC parameter and default it to true. If slaveOK is true, then call mongo.slaveOk(). This lets the user override it to false if they don't want us to call slaveOk().
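A minimal sketch of how the decision above could be applied when reading the driver parameters. It only models the semicolon-separated parameter string and the "default to true" rule; the class and method names are hypothetical and this is not the actual MongoDriver code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; models the parameter handling only, not the real JDBC driver.
class DriverParamsSketch {

    /** Parses a string such as "?slaveOK=true;readPreference=secondary" into key/value pairs. */
    static Map<String, String> parse(String raw) {
        Map<String, String> params = new LinkedHashMap<>();
        if (raw == null) {
            return params;
        }
        String body = raw.startsWith("?") ? raw.substring(1) : raw;
        for (String pair : body.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed entries
            }
            params.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return params;
    }

    /** slaveOK defaults to true when absent; only an explicit non-true value turns it off. */
    static boolean slaveOk(Map<String, String> params) {
        String value = params.get("slaveOK");
        return value == null || Boolean.parseBoolean(value);
    }
}
```

The caller would then invoke the deprecated slaveOk() method on the connection only when slaveOk(params) returns true, so a user who passes slaveOK=false keeps the plain readPreference behaviour.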
Emil, you are right! mongo.slaveOk(); works like magic!
Yes, because this call actually sets the Bytes.QUERYOPTION_SLAVEOK option. This option, if set, overrides any primary readPreference inside the com.mongodb.DBTCPConnector.innerCall() method, thus cancelling the rewrite made by DB.getCommandReadPreference().
E.g. for the listDatabases command, the latter method was rewriting the user-chosen readPreference -- primaryPreferred in our case -- to primary. With the Bytes.QUERYOPTION_SLAVEOK option enabled, the read preference for this command finally becomes secondaryPreferred.
As you can see inside the DBTCPConnector.innerCall() method, QUERYOPTION_SLAVEOK ultimately overrides readPreference when it is set to primary, causing it to be rewritten to secondaryPreferred.
It looks a little weird, but this is how they've replaced the slaveOK option with the newer readPreference. This is probably an incomplete refactoring (readPreference is a new addition to the MongoDB Java Driver).
OK, here is the new decision. Let's add back the slaveOK JDBC parameter and default it to true. If slaveOK is true, then call mongo.slaveOk(). This lets the user override it to false if they don't want us to call slaveOk().
I've added back the slaveOK JDBC parameter as requested, and also noted in the readPreference description that if it is set to primary, the slaveOK JDBC parameter should be set to false in order to apply the expected read preference.
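The interaction described in this comment can be modelled in a few lines of Java. This is a sketch of the described behaviour only, not the body of DBTCPConnector.innerCall(): the class, the enum, and the locally defined flag constant are assumptions made for illustration.

```java
// Hypothetical model of the slaveOk override; it does not use the Mongo Java Driver.
class SlaveOkOverrideSketch {

    // Defined locally for the sketch; assumed to mirror the SlaveOk query-flag bit.
    static final int QUERYOPTION_SLAVEOK = 1 << 2;

    enum ReadPref { PRIMARY, PRIMARY_PREFERRED, SECONDARY, SECONDARY_PREFERRED }

    /**
     * Takes the preference left after the per-command rewrite and the query options.
     * If the slaveOk bit is set and the preference is PRIMARY, it becomes SECONDARY_PREFERRED.
     */
    static ReadPref resolve(ReadPref afterCommandRewrite, int queryOptions) {
        boolean slaveOk = (queryOptions & QUERYOPTION_SLAVEOK) != 0;
        if (slaveOk && afterCommandRewrite == ReadPref.PRIMARY) {
            return ReadPref.SECONDARY_PREFERRED;
        }
        return afterCommandRewrite;
    }
}
```

This also shows why the readPreference description warns about slaveOK: in this model a user who really wants primary reads must leave the flag unset, otherwise PRIMARY is silently turned into SECONDARY_PREFERRED.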
It's not working even after setting slaveOK=true. I even tried slaveOK=true;readPreference=secondary but no luck.
(Please refer to the attached screenshot conn_secondary_fails.png).
In this case, out of 3 members of replica set, 2 were shut down and only the member on port 27021 was up as secondary.
I am not sure what you are missing in your test. Mine works correctly. See conn_secondary_success.png. I have only one secondary member running and set the slaveOK=true parameter. I am able to submit a query and get the result set successfully.
Jenny, I tried multiple times (with slaveOK=true) but with no success (still getting the 'can't find a master' error). Mine is an auth-enabled replica set. Could you please confirm whether yours is the same?
Indeed, I was able to reproduce it on an auth-enabled replica set.
The auth failure on the secondary node is caused by a bug in the Mongo Java Driver library.
They had this issue, then this change was made in the upstream code, but as a result the checkMaster() call fails if there is no master available in the replica set.
I've fixed the DBTCPConnector.authenticate() method, and with this patched version of the Mongo Java library, ADS can successfully authenticate on the secondary server node.
Verified that ADS is able to connect to secondary of the replica set for read operations when primary is down.
Checked on ADS 14-beta-33.
Issue #9032 | Closed | Fixed | Resolved
Fixed Build: ADS 14.0.0-beta-32 (mongo-jdbc 1.2.1)
No due date | No time estimate
3 issue links:
relates to #9120: intelligence and robustness needed in mongo replica set connection in ADS
relates to #9290: problem with CREATE DATABASE... on a replicated setup
relates to #9134: Test Connection - should handle multiple hosts better