List of refactoring and code improvement opportunities #114

@rishi-s8

I am listing a few things that would improve the performance and consistency of the code:

  1. Use torch functions and tensors for as many things as possible, including model averaging. Reduce the use of Python data types as much as possible.
  2. Migrate functions that use numpy and numpy arrays to torch tensors.
  3. Ideally, use an append-only log for metrics such as accuracy and loss: create a CSV file at the start, then append one line per round instead of keeping the whole log in memory.
  4. As mentioned in Improve GRPC broadcast implementation #65, gRPC all_gather and receives from multiple nodes are sequential and block until the messages arrive in order. A better approach might be to interleave the waiting with the actual message retrieval: when the condition is not satisfied (the node is not in the current round or is too busy), move on to another node and come back to this one later.
  5. Ideally, we should not poll another node's current round through repeated messages. Instead, the request for a round could carry a condition, and the polled node would respond only once that condition is satisfied.
  6. Whether a receive is synchronous should be decided per receive call, not by the state of the node.
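
To make items 1 and 2 concrete, here is a minimal sketch of model averaging done entirely with torch tensors. The helper name `average_state_dicts` and its `weights` parameter are hypothetical and not taken from the codebase; it assumes every node's `state_dict` has the same keys and shapes.

```python
import torch


def average_state_dicts(state_dicts, weights=None):
    """Hypothetical helper: weighted average of model state_dicts with tensor ops."""
    n = len(state_dicts)
    if weights is None:
        w = torch.full((n,), 1.0 / n)  # uniform weights
    else:
        w = torch.as_tensor(weights, dtype=torch.float32)
        w = w / w.sum()  # normalise so the weights sum to 1

    averaged = {}
    for key, reference in state_dicts[0].items():
        # Stack the same parameter from every model along a new leading dim.
        stacked = torch.stack([sd[key].to(torch.float32) for sd in state_dicts])
        # Reshape weights to (n, 1, 1, ...) so they broadcast over the parameter.
        shape = (n,) + (1,) * (stacked.dim() - 1)
        mean = (stacked * w.view(shape)).sum(dim=0)
        averaged[key] = mean.to(reference.dtype)  # restore the original dtype
    return averaged
```

Stacking costs extra memory per parameter, so for very large models an in-place accumulation may be preferable; this sketch only illustrates replacing Python-level loops over values with tensor operations.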

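For item 3, an append-only metrics log could look roughly like the sketch below. The class name `AppendOnlyCSVLog` and the column names in the usage note are hypothetical; only the standard-library `csv` module is used.

```python
import csv


class AppendOnlyCSVLog:
    """Hypothetical helper: writes the header once, then appends one row per call."""

    def __init__(self, path, fieldnames):
        self.path = path
        self.fieldnames = list(fieldnames)
        # Create (or truncate) the file and write the header at start-up.
        with open(self.path, "w", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writeheader()

    def append(self, **row):
        # Open in append mode so nothing from earlier rounds is held in memory.
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writerow(row)
```

A node would create the log once, for example `log = AppendOnlyCSVLog("metrics.csv", ["round", "accuracy", "loss"])`, and then call `log.append(round=r, accuracy=acc, loss=loss)` at the end of each round.
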
Feel free to add things to this list as a comment on this issue.
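
As a starting point for item 5, the sketch below shows one possible way to let a request block until a round condition holds instead of polling. `RoundTracker` and its methods are hypothetical names; the idea is that a request handler on the polled node would call `wait_for_round` before replying. How this would be wired into the existing gRPC service is not shown and is an assumption.

```python
import threading


class RoundTracker:
    """Hypothetical per-node round counter that request handlers can block on."""

    def __init__(self):
        self._round = 0
        self._cond = threading.Condition()

    def advance(self):
        # Called by the training loop when the node finishes a round.
        with self._cond:
            self._round += 1
            self._cond.notify_all()  # wake every handler waiting on a round
            return self._round

    def wait_for_round(self, target, timeout=None):
        """Block until the current round is >= target.

        Returns the current round, or None if the timeout expires first.
        """
        with self._cond:
            reached = self._cond.wait_for(
                lambda: self._round >= target, timeout=timeout
            )
            return self._round if reached else None
```

The timeout lets the caller give up on a busy node and move on to another one, which is also the behaviour item 4 asks for.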
