Troubleshooting
Mesos
At scheduler startup, if enabled in the configuration (mesos/reconcile), the framework tries to reconcile GoDocker running jobs with Mesos tasks in order to synchronize task status.
If the framework has been deleted in the Mesos master, you can force the scheduler to use a new framework ID by deleting the stored one.
In redis:
del god:mesos:frameworkId
If a job has terminated in Mesos but is still shown as running in GoDocker (status update sync issue), you can force the update of the job in GoDocker. In the command below, TASK_ID is the identifier of the job; the value 7 marks the job as failed and 2 marks it as OK.
In redis:
set god:mesos:over:TASK_ID 7
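The override above can also be scripted. The following is a minimal Python sketch, not part of GoDocker: the function names and the "ok"/"failed" labels are hypothetical, and `client` is assumed to be any object exposing a redis-py style `set(key, value)` method. Only the key pattern and the codes 2 and 7 come from the text above.

```python
# Status codes as documented above: 2 marks the job as OK, 7 as failed.
OVERRIDE_CODES = {"ok": 2, "failed": 7}


def override_key(task_id):
    """Build the redis key that forces a status update for one job."""
    return "god:mesos:over:%s" % task_id


def force_job_status(client, task_id, outcome):
    """Write the override for one job and return the key that was set.

    `client` is assumed to expose a redis-py style set(key, value) method.
    """
    if outcome not in OVERRIDE_CODES:
        raise ValueError("outcome must be 'ok' or 'failed'")
    key = override_key(task_id)
    client.set(key, OVERRIDE_CODES[outcome])
    return key
```

With the redis-py package installed, this could be called as `force_job_status(redis.StrictRedis(host="localhost"), 42, "failed")`, where the host and the job identifier are placeholders.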
If Mesos and GoDocker are completely out of sync for any reason, and there are too many tasks to fix one by one with the above trick, proceed as follows.
Switch GoDocker to maintenance and wait for all running jobs to complete. Once all jobs are over on the Mesos side, any jobs still shown as running in GoDocker can be deleted.
Stop the scheduler and watcher processes.
In mongodb:
db.jobs.remove({'status.primary': 'running'}) // Deletes all jobs in running status
Then in redis, clear the running jobs queue used by watchers/executors:
del god:jobs:running
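The MongoDB and redis cleanup above can be combined in one script. This is a minimal Python sketch under stated assumptions, not a GoDocker tool: `purge_running_jobs` and its `dry_run` flag are hypothetical, `jobs` is assumed to be a pymongo-style collection (`count_documents`, `delete_many`) and `client` a redis-py style client (`delete`). The filter and the key are the ones used in the commands above.

```python
# Filter and key taken from the manual commands above.
RUNNING_FILTER = {"status.primary": "running"}
RUNNING_QUEUE_KEY = "god:jobs:running"


def purge_running_jobs(jobs, client, dry_run=True):
    """Delete jobs still marked as running, then clear the redis queue.

    Returns the number of jobs that matched the filter. With dry_run=True
    nothing is deleted, so the count can be checked first.
    """
    stale = jobs.count_documents(RUNNING_FILTER)
    if dry_run:
        return stale
    jobs.delete_many(RUNNING_FILTER)
    client.delete(RUNNING_QUEUE_KEY)
    return stale
```

Defaulting to a dry run is a deliberate choice here, since the deletion cannot be undone. As with the manual procedure, run it only after the scheduler and watchers have been stopped.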
If the Mesos executor fails to kill a job, log in to the Mesos slave running it and stop the Docker container (XYZ stands for the container ID or name):
docker stop XYZ