Anton computers at D. E. Shaw Research
A family of specialized supercomputers that simulates molecular mechanics like no other
A purely physics-based alternative to the sea of machine learning methods, this unmatched tech enables otherwise impossible studies relevant to basic biology and pharma
Molecular dynamics simulations consist of describing a piece of matter as a mathematical model, most usually atom by atom, and then calculating how the system evolves over time in a physically realistic fashion. Contrary to ML-based and other approaches, which require large amounts of data for training, molecular simulations attempt to reproduce reality from purely physical principles, hence allowing the exploration of questions that are hard or even impossible to probe with data-based methods. Here’s a glimpse into the field of molecular dynamics simulations, with a focus on a family of supercomputers that can run only such computations, but at an unbeatable speed.
To run a molecular dynamics simulation, and limiting this explanation to what is dubbed “classical atomistic molecular dynamics simulations”, scientists assign to each atom of the system a radius, a mass, a charge, and physics-consistent connections to all the other atoms. These connections include “bonded” interactions that mimic the covalent bonds and angular restraints keeping atoms together in molecules, and “non-bonded” interactions that keep track of how atoms collide with each other and get attracted or repelled by their opposite or like-sign charges. Integrating this mathematical description over time results in a kind of movie of how all the atoms of the system move. With such a tool, scientists can explore everything from simple concepts such as how molecules move or diffuse to complex questions such as how a drug binds to a target protein, how a protein performs its function, and a myriad of other issues relevant to fundamental and applied chemistry and biology.
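To make the “non-bonded” part of this description concrete, here is a minimal sketch in plain Python of how a pairwise Lennard-Jones plus Coulomb energy could be summed over all atom pairs. The parameters are purely illustrative assumptions, not a real forcefield:

```python
import math

def nonbonded_energy(positions, charges, sigma, epsilon, coulomb_k=332.06):
    """Sum pairwise Lennard-Jones + Coulomb energies over all atom pairs.

    Units are illustrative (kcal/mol, Angstrom, elementary charges); a real
    MD engine looks up per-atom-type parameters from a forcefield and adds
    the bonded terms (bonds, angles, dihedrals) on top of this.
    """
    n = len(positions)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            sr6 = (sigma / r) ** 6
            # Lennard-Jones: r^-12 repulsion plus r^-6 van der Waals attraction
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # Coulomb: interaction between partial charges
            energy += coulomb_k * charges[i] * charges[j] / r
    return energy
```

Note the double loop over all pairs: this O(n²) cost of the non-bonded terms is exactly what dominates a simulation, which is why (as described later) Anton devotes a dedicated hardware subsystem to it.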
Molecular simulations are computationally demanding
It turns out that propagating motions over time during a simulation is no easy task. Given a starting configuration of atoms in the mathematical model, a computer program called a “molecular dynamics engine” computes the forces acting on all the atoms and then the resulting changes in their velocities and positions. From the new positions, the program can again compute forces and make a new update of the velocities and positions. And then again, and again, each time creating one more frame of the “movie” that describes how the atoms of the system move. Moreover, simulations mimic the temperature and pressure of the simulated system by adjusting the atomic velocities and the box dimensions in statistically consistent ways. Mimicking temperature involves generating random numbers that affect the velocities of all atoms; thus, each time you run an MD simulation you will observe a different evolution over time. The key (and the hope) is that after a sufficiently long time, and/or if one runs a given simulation multiple times in parallel, the overall conclusions drawn from the simulations will in principle be the same. That’s why it is important to run long simulations, especially when one wants to sample an event that takes time to happen.
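The compute-forces/update-velocities/update-positions loop described above is typically implemented with the velocity Verlet scheme. Here is a minimal sketch in plain Python, assuming a user-supplied `force_fn`; real engines vectorize all of this and add thermostats, barostats, and constraints:

```python
def velocity_verlet(pos, vel, mass, force_fn, dt, n_steps):
    """Propagate positions and velocities with the velocity Verlet integrator.

    pos, vel: lists of [x, y, z] per atom; mass: list of per-atom masses;
    force_fn(pos) -> list of [fx, fy, fz] per atom (the expensive part).
    Returns the trajectory: one snapshot of positions per step.
    """
    traj = []
    forces = force_fn(pos)
    for _ in range(n_steps):
        # first half-kick: v += (dt/2) * F/m
        vel = [[v + 0.5 * dt * f / m for v, f in zip(vrow, frow)]
               for vrow, frow, m in zip(vel, forces, mass)]
        # drift: x += dt * v
        pos = [[p + dt * v for p, v in zip(prow, vrow)]
               for prow, vrow in zip(pos, vel)]
        # recompute forces at the new positions, then second half-kick
        forces = force_fn(pos)
        vel = [[v + 0.5 * dt * f / m for v, f in zip(vrow, frow)]
               for vrow, frow, m in zip(vel, forces, mass)]
        traj.append([row[:] for row in pos])
    return traj
```

Note that `force_fn` is evaluated once per step: since the force evaluation dominates the cost, the total runtime is essentially (cost of one force evaluation) × (number of steps).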
Unfortunately, physics is such that the time step between consecutive “frames” of the “movie” is very short: typically just 2 fs (femtoseconds), which is 2 millionths of a millionth of a thousandth of a second. This number gains some context when we recall that the most interesting chemical events happen on timescales of microseconds to milliseconds.
To reach these timescales, then, the MD program needs to compute billions to trillions of steps!
And it gets even worse when we take into account that, in order to collect statistics, one ideally wants to observe multiple occurrences of the event under investigation.
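The arithmetic behind “billions to trillions of steps” is simple; with a 2 fs time step:

```python
dt = 2e-15  # 2 fs time step, in seconds

steps_per_us = 1e-6 / dt  # steps to reach one microsecond of simulated time
steps_per_ms = 1e-3 / dt  # steps to reach one millisecond of simulated time

print(f"{steps_per_us:.1e} steps per simulated microsecond")  # 5.0e+08
print(f"{steps_per_ms:.1e} steps per simulated millisecond")  # 5.0e+11
```

Half a billion force evaluations per microsecond, half a trillion per millisecond: hence the appeal of hardware that makes each step as fast as possible.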
Specialized supercomputers for molecular simulations
Typical computers, including those with the most powerful GPUs, can today simulate small systems for up to some tens of microseconds in the best case. While there are specialized methods that circumvent this problem by “forcing” the system to undergo events (a large array of “enhanced sampling” tricks), over a decade ago billionaire and former computer science professor David E. Shaw created a private company, DEShaw Research, with the goal of developing a new series of supercomputers specialized precisely for MD simulations that would break free from these limitations.
The long-term goal of this new company was to accelerate research in the development of pharmaceutically relevant compounds by applying simulations to understand proteins and other biological systems in atomic detail. As an intermediate goal they undertook the optimization of molecular mechanics forcefields, i.e. the collections of parameters used to describe a system for its simulation. And before that, they tackled the engineering problems of optimizing the calculations involved in molecular dynamics simulations.
The first computer developed at and by DEShaw Research, called Anton (we could rather say “Anton 1”, as subsequent models were called Anton 2, 3, etc.), was put to work in 2008. It could simulate molecular systems in atomic detail around 100 times faster than regular computers could at that time. This meant that, by the early 2010s, scientists using Anton at the company could simulate molecular events that nobody else can simulate even today without applying enhanced sampling tricks. In a way, they broke “Moore’s law” as it applied to molecular simulations.
Anton 2 is even faster, fits bigger systems, and is a bit more programmable (hence more versatile) than its predecessor (recall that these computers are hard-wired to run simulations, so even variations of regular simulations are not necessarily as easy to achieve as on a regular computer). At least one Anton 3 computer exists too; it is faster, more programmable, and scales well to quite big systems, unlike the closest competing GPUs. If you are into simulations and want to see some numbers, Anton 3 was reported to run a 100k-atom system at around 200 microseconds per day, which means you can get 1 ms worth of dynamics in just a week of work. The best current GPUs run around 10 times slower than Anton 3, and only for small systems, not scaling well to big ones. Hence, Anton 3 presents a double advantage: it runs simulations faster, producing more sampling per unit of real time, and it also enables the study of bigger, more complete and complex systems.
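Plugging in the figures just quoted (200 µs/day for Anton 3 on a ~100k-atom system, and roughly 10 times slower for the best GPU) shows how the gap plays out for a 1 ms target:

```python
anton3_us_per_day = 200                  # reported Anton 3 throughput, ~100k atoms
gpu_us_per_day = anton3_us_per_day / 10  # "around 10 times slower", per the text
target_us = 1000                         # 1 millisecond of simulated time

print(f"Anton 3: {target_us / anton3_us_per_day:.0f} days")  # 5 days
print(f"GPU:     {target_us / gpu_us_per_day:.0f} days")     # 50 days
```

Five days versus nearly two months for a single run, before even considering the need for replicate simulations.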
How the Anton computers work
To achieve exceptional speeds in simulating molecular systems, the Anton computers incorporate novel pieces of computer engineering developed specifically for and by the DEShaw Research projects. Globally speaking, these developments involve designing hardware to specifically accelerate the typical computations involved in molecular simulations. Thus, the Anton computers traded flexibility for efficiency and speed in their integration of the equations of motion. In other words, they are extraordinarily powerful at running molecular simulations, but they cannot do anything else. They are pieces of highly specialized hardware.
Without going into much detail, summarizing from Wikipedia and the above-cited articles by DEShaw Res (plus more articles at the end), Anton runs its computations entirely on specialized circuits (ASICs) instead of dividing the computation between general-purpose host processors. In particular, powerful ASICs with cores specialized for certain calculations are at the heart of Anton’s superb speeds, together with streamlined communications. Each Anton ASIC contains two subsystems, plus its own DRAM bank, enabling large simulations. One subsystem is specialized to compute the non-bonded forces; this high-throughput interaction subsystem consists of several deeply pipelined modules arranged much like a systolic array. The remaining calculations, including bonded forces and various mathematical operations, are performed in the other subsystem, which is more flexible to accommodate different calculations and consists of specialized but programmable SIMD cores.
The ASICs of the Anton computers are arranged into a 3D torus that maximizes connectivity between them, further boosted by high-bandwidth links that transfer tens to hundreds of GB per second and consist of separate lanes for information flowing in different directions. As you can see, nothing was left to chance; every detail was considered during design.
After a wave of unprecedented simulations to prove Anton’s potential, applications to biology came
In 2010 DEShaw Research reported in a Science paper the first full study of protein motions using Anton (Anton 1 at that time):
Atomic-Level Characterization of the Structural Dynamics of Proteins (Shaw et al Science 2010)
The paper reported atomically detailed molecular dynamics simulations reaching around 0.1 to 1 millisecond each, for proteins folding into 3D structures from disordered forms and for folded proteins that experience functionally relevant motions in the tens of microseconds; hence millisecond simulations are needed to sample them multiple times.
The millisecond timescale reached by Anton 1 already in 2010 is still around a hundred times longer than that of the typical simulations reported today using conventional supercomputers not designed for simulations.
For some more details, and then the reader is referred to the paper, the work presented simulations for:
- The small proteins FiP35 and villin, known to fold very fast (within microseconds), starting from extended conformations and monitoring if and how they adopted the known 3D structures.
- The dynamics of bovine pancreatic trypsin inhibitor (in this case starting from the actual folded 3D structure) interconverting between distinct conformational states, which happens too slowly for regular simulations to capture.
Simulations of proteins folding into 3D states have, in principle, the potential to replace (and outperform) ML-based methods for structure prediction, with the additional advantage that they can explain how proteins fold. Recall that even the best ML methods like AlphaFold predict folded structures but not how they are achieved, i.e. in principle they know nothing about folding pathways. In a physics-based simulation, instead, one can literally see how folding proceeds, and if the process reproduces known experimental data then one can infer how the protein folds based on the simulation.
In 2011 DEShaw Res published a new paper examining the folding pathways of several small proteins from disordered states, with Anton of course:
How Fast-Folding Proteins Fold (Lindorff-Larsen et al, Science 2011)
After this, we didn’t hear more from the company about using molecular dynamics simulations to fold proteins. Probably the approach didn’t evolve further because even mid-sized proteins take several milliseconds to fold, and possibly also because forcefields are not yet good enough. Moreover, the impact of ML-based predictions has somewhat overshadowed the role of physics-based methods like molecular simulations. Indeed, most efforts of the computational chemistry and biology communities are now dedicated mainly to advancing ML-based methods. Just check the list of talks in this recent conference as an example.
How DEShaw Res uses Anton computers now
The group instead used the power of its Anton computers for two main goals:
- Improving forcefields, i.e. getting better descriptions of the parameters used to describe atoms and their interactions throughout the simulations. This is essential for the whole community but especially for DEShaw Res because by running such long simulations they can better expose (and suffer, and eventually correct) the problems and biases in the forcefields.
- Advancing the atomic-level understanding of systems of biological relevance, which is probably the ultimate role of the company as a means to eventually create new molecules of clinical use.
Efforts to improve forcefields explored several paths, but most importantly focused on two points: tuning the description of water, and better describing disordered regions of folded proteins and even fully (“intrinsically”) disordered proteins. The two problems are actually entangled:
Bulk water has very complex properties that are very hard to model, and tuning them helps (as DEShaw Res and many others showed) to correct problems observed in multiprotein systems and in intrinsically disordered proteins (which do not adopt a well-definable 3D structure, yet are of great biological relevance). I myself carried out a full study of this in the following paper, including the latest forcefield proposed by DEShaw Res at the time:
Applications to chemistry and biology
On the side of general applications to biological systems, DEShaw Res published papers about how drugs bind their target proteins (especially interesting for disordered proteins), the discovery of pockets on protein surfaces that could be targeted by novel drugs, and the binding of disordered proteins to other proteins, among other topics.
And of course, multiple works related directly to biology and medically relevant proteins. Here’s just a small selection:
In each of these applications to biological problems, the molecular simulations either put forward hypotheses to design experiments or explained experimental results that cannot be derived from static structures alone. This is something that data-based prediction tools cannot even approach, at least for the moment.