MASTER_AUTO_POSITION being reset to 0 after graceful-master-takeover

Hi @shlomi-noach, I think I may have found a bug in the graceful-master-takeover process.
Before graceful-master-takeover starts, all slaves have Auto_Position: 1. After the takeover, however, Auto_Position is set to 0, and further graceful failovers do not work until I set it back to 1.
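One way to confirm the value on each host before and after the takeover is a small helper like the sketch below. The function names are hypothetical, and it assumes the `mysql` client can log in using credentials from `~/.my.cnf`; it only parses the `Auto_Position` field out of `SHOW SLAVE STATUS\G`.

```shell
#!/bin/bash
# Sketch only: function names are illustrative, not part of orchestrator.

# Read `SHOW SLAVE STATUS\G` output on stdin and print the Auto_Position value.
auto_position_from_status() {
  awk -F': ' '$1 ~ /^[[:space:]]*Auto_Position$/ { print $2 }'
}

# Query one host. Assumes credentials are supplied via ~/.my.cnf.
check_auto_position() {
  local host="$1" port="$2"
  mysql -h "$host" -P "$port" -e 'SHOW SLAVE STATUS\G' | auto_position_from_status
}
```

For example, `check_auto_position po-mysql1 53306` run against the old master after the takeover should print the value shown in `SHOW SLAVE STATUS`.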

My test scenario is below:
```
[root@po-proxysql1 orchestrator]# orchestrator-client -c topology -i po-mysql1:53306
po-mysql1:53306     [0s,ok,5.7.21-21-log,rw,MIXED,>>,GTID]
+ po-mysql2:53306   [0s,ok,5.7.21-21-log,ro,MIXED,>>,GTID]
+ po-mysql3:53306   [0s,ok,5.7.21-21-log,ro,MIXED,>>,GTID]
  + po-mysql4:53306 [0s,ok,5.7.21-21-log,ro,MIXED,>>,GTID]
```

I wrote a post graceful-master-takeover hook, which does the following:

1. Restarts the slave threads on old master (now a slave)
2. Gets a list of all secondary slaves from the old master (hard-coded for now, as you can see below, since this is a proof of concept)
3. Relocates the secondary slaves under the old master (now a slave)
4. Starts slave threads on secondary slaves

```
#!/bin/bash
# Post graceful-master-takeover hook (proof of concept).
echo "Restarting slave threads on old master ${ORC_FAILED_HOST}:${ORC_FAILED_PORT}"
orchestrator -c start-slave -i "${ORC_FAILED_HOST}:${ORC_FAILED_PORT}"

echo "Getting list of secondary slaves from new master"
SEC_SLAVES=()
# Hard-coded to po-mysql4 for now. Uses the new master's own port
# (the original used ORC_FAILED_PORT, which only works when both ports match).
for secondary_slave in $(orchestrator-client -c which-replicas -i "${ORC_SUCCESSOR_HOST}:${ORC_SUCCESSOR_PORT}" | grep po-mysql4)
do
  SEC_SLAVES+=("${secondary_slave}")
done

# Move each secondary slave under the old master, then start replication on it.
for ancillary_slave in "${SEC_SLAVES[@]}"
do
  echo "Making SECONDARY SLAVE ${ancillary_slave} a SLAVE of ${ORC_FAILED_HOST}"
  orchestrator -c relocate -i "${ancillary_slave}" -d "${ORC_FAILED_HOST}:${ORC_FAILED_PORT}"
  orchestrator -c start-slave -i "${ancillary_slave}"
done
```

Note that this only worked after I ran the following on the old master once graceful-master-takeover had finished:

```STOP SLAVE; CHANGE MASTER TO MASTER_AUTO_POSITION = 1; START SLAVE;```
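As a stopgap, the same statement could be issued from the post-takeover hook. The sketch below is only an illustration of that idea: `restore_auto_position` and the `DRY_RUN` switch are hypothetical names, and the `mysql` client is assumed to pick up credentials from `~/.my.cnf`.

```shell
#!/bin/bash
# Sketch only: re-enable GTID auto-positioning on the demoted master.
# The SQL is the manual workaround quoted above.
FIX_SQL='STOP SLAVE; CHANGE MASTER TO MASTER_AUTO_POSITION = 1; START SLAVE;'

restore_auto_position() {
  local host="$1" port="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "mysql -h ${host} -P ${port} -e \"${FIX_SQL}\""
  else
    mysql -h "$host" -P "$port" -e "$FIX_SQL"
  fi
}

# In the hook this would be called as:
# restore_auto_position "${ORC_FAILED_HOST}" "${ORC_FAILED_PORT}"
```

Setting `DRY_RUN=1` prints the command without touching the server, which is handy for checking the hook's arguments first.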

Screenshots below show the before and after:

![screenshot from 2018-05-17 14-45-08](https://user-images.githubusercontent.com/30844925/40199087-6cb8911a-59e6-11e8-82d2-3fe7cd3060bc.png)

![screenshot from 2018-05-17 14-46-58](https://user-images.githubusercontent.com/30844925/40199100-76f4ca86-59e6-11e8-9f75-18e6944dbcc4.png)

Here's my config:

```
[root@po-proxysql1 orchestrator]# cat /etc/orchestrator.conf.json
{
  "Debug": false,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/usr/local/orchestrator/orchestrator.db",
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "orchestrator_password",
  "MySQLTopologyCredentialsConfigFile": "",
  "MySQLTopologySSLPrivateKeyFile": "",
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLCAFile": "",
  "MySQLTopologySSLSkipVerify": true,
  "MySQLTopologyUseMutualTLS": false,
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "orchestrator_password",
  "MySQLOrchestratorCredentialsConfigFile": "",
  "MySQLOrchestratorSSLPrivateKeyFile": "",
  "MySQLOrchestratorSSLCertFile": "",
  "MySQLOrchestratorSSLCAFile": "",
  "MySQLOrchestratorSSLSkipVerify": true,
  "MySQLOrchestratorUseMutualTLS": false,
  "MySQLConnectTimeoutSeconds": 1,
  "DefaultInstancePort": 3306,
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 5,
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@hostname",
  "SkipBinlogServerUnresolveCheck": true,
  "ExpiryHostnameResolvesMinutes": 60,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 60,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ".:53306",
  "ReadOnly": false,
  "AuthenticationMethod": "",
  "HTTPAuthUser": "",
  "HTTPAuthPassword": "",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "SlaveLagQuery": "",
  "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
  "DetectClusterDomainQuery": "",
  "DetectInstanceAliasQuery": "",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPattern": "",
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": [
    "*"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    "*"
  ],
  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreGracefulTakeoverProcesses": [
    "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log",
    "/usr/local/orchestrator/pregracefulfailover.sh >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],
  "PostMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostGracefulTakeoverProcesses": [
    "echo 'Planned takeover complete' >> /tmp/recovery.log",
    "/usr/local/orchestrator/postgracefulfailover.sh >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "MasterFailoverDetachSlaveMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeSlaveRecoveryOnLagMinutes": 0,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true
}
```
Thanks for your help.

MASTER_AUTO_POSITION being reset to 0 after graceful-master-takeover #304
