
Conversation

@jchodera (Member) commented Feb 8, 2017

@andrrizzi: Here's the very basic test code I was playing with, in case you find it useful. It doesn't necessarily have to be merged, but it might at least illustrate something along the lines of what I was thinking.

I haven't tested this on the cluster yet.

@jchodera (Member, Author) commented Feb 8, 2017

We'll still need a way to start celery workers on individual GPUs. I bet we could do this with something like:

CUDA_VISIBLE_DEVICES=0 celery -A openmmtools.distributed worker -l info --concurrency=1 &
CUDA_VISIBLE_DEVICES=1 celery -A openmmtools.distributed worker -l info --concurrency=1 &
CUDA_VISIBLE_DEVICES=2 celery -A openmmtools.distributed worker -l info --concurrency=1 &
CUDA_VISIBLE_DEVICES=3 celery -A openmmtools.distributed worker -l info --concurrency=1 &

though @pgrinaway may have better ideas for how best to do this with multiple GPUs on a node.

It looks like there's also a way to restrict workers to specific queues with the --queues flag, like --queues gpu vs. --queues cpu. These are documented in the celery workers guide.
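
For example, here's a minimal sketch of how tasks could be routed to those queues on the Python side. This isn't from the PR; the broker URL and task name are assumptions for illustration:

# Sketch only: the module layout, task name, and broker URL below are
# assumptions for illustration, not the actual openmmtools.distributed code.
from celery import Celery

app = Celery('openmmtools.distributed', broker='redis://localhost:6379/0')

# Route the hypothetical GPU-bound task to the 'gpu' queue; a worker started
# with `--queues gpu` will then consume only from that queue.
app.conf.task_routes = {
    'openmmtools.distributed.propagate_replica': {'queue': 'gpu'},
}

@app.task
def propagate_replica(serialized_state):
    # Placeholder for a GPU-bound OpenMM propagation step.
    return serialized_state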

@andrrizzi (Contributor) commented

Thanks! I'll take a look at this tomorrow.

@jchodera (Member, Author) commented

This is still very much test code for experimenting. I think the next steps are:

  • Try to construct a benchmark that compares local vs. remote execution of a replica-exchange-like operation on a realistic test system (e.g. Src in explicit solvent) to see how well things perform (a rough timing sketch follows this list)
  • Make it easy to try both celery and redis
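
As a starting point for the first item, a rough timing sketch (again not from the PR, and assuming a celery task like the hypothetical propagate_replica above):

# Rough benchmark sketch: compare in-process execution of a propagation step
# against dispatching the same work to remote celery workers.
import time

def benchmark(states):
    # Local: call the task function directly in this process.
    start = time.time()
    local_results = [propagate_replica(state) for state in states]
    local_time = time.time() - start

    # Remote: dispatch to celery workers and block until all results return.
    start = time.time()
    async_results = [propagate_replica.delay(state) for state in states]
    remote_results = [result.get() for result in async_results]
    remote_time = time.time() - start

    return local_time, remote_time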
