
GSoC 2017 Project Ideas

Richard Gowers edited this page Jan 17, 2017 · 12 revisions
Google Summer of Code 2017

A list of project ideas for Google Summer of Code 2017.

The current proposed projects are:

  1. Implement efficient parallel analysis of trajectories
  2. Improve distance search
  3. Add new MD-Formats
  4. Help port MDAnalysis to Python 3

Or work on your own idea! Get in contact with us to propose an idea and we will work with you to flesh it out into a full project. Raise an issue in the Issue Tracker or contact us via the developer Google group.


Implement efficient parallel analysis of trajectories

Difficulty: Hard

Mentors: Manuel

Molecular simulation trajectories are very often analyzed frame-by-frame. This is frequently an embarrassingly parallel procedure, in which work can be efficiently divided simply by splitting the trajectory and letting each worker process one of the chunks. The goal of this project is to implement a parallelization framework that automates all the trajectory splitting, work distribution, and eventual result collection.

A parallelization framework should put the least burden possible on the end-user, so that minimal changes are required to turn serial code into parallel. Likewise, the parallelization framework must blend naturally with the analysis API of MDAnalysis. In this way, analyses written using analysis.base will automatically become parallelizable.

Implementing parallelization in Python code can be done in many ways. Aspects to consider when choosing one or several approaches are:

  • Most users will primarily have access to SMP parallelization;
  • Notwithstanding the above point, many users also typically have access to multi-node HPC clusters, and we should be able to leverage their use;
  • In an analysis context, being able to write results to shared memory will improve the memory usage footprint and simplify result collection;
  • GPU parallelization is attractive for its wide availability (though possibly more complex to implement in a meaningful way).
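
As a rough illustration of the split, distribute, and collect pattern described in this section, here is a minimal sketch using only the Python standard library. It is not MDAnalysis code: the names `split_frames`, `analyze_block`, and `analyze_parallel` are hypothetical, and `per_frame` stands in for whatever a real analysis would compute on one trajectory frame. A thread pool is the default so the sketch runs anywhere; for CPU-bound work on an SMP machine one would likely pass a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(n_frames, n_chunks):
    """Split range(n_frames) into contiguous (start, stop) blocks."""
    # Never create more blocks than frames, and always at least one block.
    n_chunks = max(1, min(n_chunks, n_frames))
    base, extra = divmod(n_frames, n_chunks)
    blocks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` blocks take one additional frame each.
        stop = start + base + (1 if i < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def analyze_block(per_frame, block):
    """Run the per-frame function over one block of frame indices."""
    start, stop = block
    return [per_frame(i) for i in range(start, stop)]


def analyze_parallel(per_frame, n_frames, n_workers=4,
                     executor_cls=ThreadPoolExecutor):
    """Distribute blocks over workers and collect results in frame order."""
    blocks = split_frames(n_frames, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        # map() yields results in the order the blocks were submitted.
        partial = pool.map(analyze_block, [per_frame] * len(blocks), blocks)
        return [value for chunk in partial for value in chunk]
```

From the caller's side, the only change relative to a serial loop is the call to `analyze_parallel`, which is the kind of low-burden interface the project aims for. A real implementation would also have to give each worker its own trajectory reader, which this sketch leaves out.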

Improve distance search

Difficulty: Hard

Mentors: Manuel, Richard

Analysis of molecular dynamics simulations typically involves calculations based upon atoms that are spatially close to each other. For example, a radial distribution function is often only interesting up to distances of around 1.6 nm. The naive approach is to calculate the distance between each pair of atoms. However, as the size of the system grows, the fraction of useful pair distances decreases while the computational cost scales as N^2.

To greatly improve the efficiency of this operation, we can first decompose the total simulation volume into smaller cells. We can then calculate the distances only between atom pairs in the same or neighbouring cells. If atoms are not in neighbouring cells, we already know that the distance is too large to be interesting. A theoretical description of this algorithm can be found in this book, Appendix F.

One domain decomposition algorithm is cell grids.

In this project you would integrate the cell grid algorithm into MDAnalysis.
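
To make the cell decomposition concrete, the following is a minimal pure-Python sketch of a cell-grid pair search. It makes simplifying assumptions that a real implementation would need to lift: there are no periodic boundary conditions, the cell edge is set equal to the cutoff, and the function name `pairs_within_cutoff` is hypothetical rather than part of MDAnalysis.

```python
from collections import defaultdict
from itertools import product
import math


def pairs_within_cutoff(coords, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than cutoff.

    Assumes a non-periodic system; coords is a sequence of (x, y, z).
    """
    # Bin every atom into a cubic cell whose edge equals the cutoff.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff),
               math.floor(y / cutoff),
               math.floor(z / cutoff))
        cells[key].append(idx)

    cutoff_sq = cutoff * cutoff
    offsets = list(product((-1, 0, 1), repeat=3))  # own cell + 26 neighbours
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # .get() avoids creating empty cells in the defaultdict.
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    if i >= j:
                        continue  # count each pair once, skip self-pairs
                    dist_sq = sum((a - b) ** 2
                                  for a, b in zip(coords[i], coords[j]))
                    if dist_sq < cutoff_sq:
                        pairs.add((i, j))
    return pairs
```

Because two atoms closer than the cutoff can differ by at most one cell index along each axis, checking the 27 surrounding cells is sufficient. At roughly constant density, each cell holds a bounded number of atoms, so the cost grows about linearly with the number of atoms instead of quadratically.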

Add new MD-Formats

Difficulty: Medium

Mentors:

One of the strengths of MDAnalysis is its ability to support a wide range of different MD formats. However, we are still missing some, such as the new TNG file format from Gromacs, H5MD, or the HALMD format. Alternatively, you can add a format that you personally want to use in MDAnalysis. This project will familiarize you with working with and connecting different APIs, and will give you insight into how modern portable data storage file formats work.

Help port MDAnalysis to Python 3

Difficulty: Easy

Mentors:

Python 3 is being adopted by a wider range of users, and Unix distributions are starting to switch. MDAnalysis cannot currently run under Python 3, mostly due to its C/Cython extensions. We are moving our C extensions to Cython, which supports Python 2 and 3 from a single source. See also #260.

Still missing here is the DCD trajectory reader. There is incomplete work to make the DCD reader compatible with both Python 2 and 3. In this project you would either finish that work or rewrite the DCD interface in Cython.

The second part of this project is to remove all other incompatibilities with Python 3 that we currently have. The goal is for our test suite to pass on Python 3.
