Introducing nimna

As some of you might have seen, I had a library called nimna lying around on my GitHub in a stale state. Well, it’s stale no more. I have decided to resume development on nimna, as I myself am a biologist and never know when I might need to design some nucleic acids next. For those of you who expected JavaScript support in this first release, I have to disappoint you. There is none beyond the demo at the moment, but I am trying to figure out a few macros to transparently interact with the Emscripten heap, so I might have more on that soon-ish.

For those of you who do not know what nimna is about, a short description:

nimna - nucleic acid folding and design

Why did I write it?

nimna is a library for biology. It deals with nucleic acid folding (DNA and RNA), i.e. the prediction of the structure of short RNA or DNA sequences. Why is this interesting or useful? Well, for one, there are a few tasks in molecular biology where you want to make absolutely sure that the DNA or RNA sequences do not fold in certain detrimental ways. For example, if you design primers, short DNA sequences which are used to start off the replication of DNA at a complementary sequence, you do not want them to have a very stable secondary structure and you do not want them to bind each other strongly. This is a very common task, which is in some cases amenable to automation.

More interesting for me is the case of structure prediction when the structure is in some way related to biological function. There exist, for example, catalytically active nucleic acids (ribozymes, DNAzymes), the design and study of which make up a whole area of biological research. Though those nucleic acids are somewhat exotic and mostly of interest for research, they also seem to have potential for commercialization, as some companies are trying to develop ribozyme- or DNAzyme-based drugs. Anyway, so much for the motivation.

What is it?

nimna consists of a low-level wrapper of ViennaRNA, which is perhaps the most flexible and powerful package for nucleic acid folding prediction. On top of that, nimna provides a high-level interface, which automates memory management and provides all the things needed to make that interface feel like idiomatic Nim. If you use that interface, you will never have to care about handling memory returned from ViennaRNA and will be able to use common Nim facilities such as [] accessors and iterators. nimna further contains a module for automating nucleic acid design, which allows the user to specify sequence constraints and design DNA and RNA sequences to optimally satisfy user-defined fitness criteria.

What can I do with it?

Amongst other things, nimna can do:

Folding:
- Partition function folding for one or more molecules, as well as alignments.
- Minimum free energy folding for one or more molecules, as well as alignments.
- Centroid structure folding for one or more molecules.
- 2DFold (MFE and partition function).
- Maximum expected accuracy folding.
- Generation of suboptimal structures and energies.
Constraints:
- Hard constraints are fully supported.
- Soft constraints are fully supported.
- Structured ligand binding constraints are fully supported.
Model Details:
- Updating of model details associated with a molecule.
- Generating MFE and PF parameters from model details.
- Macro for easily generating model details.
Parameters:
- Updating of parameters associated with a molecule.
Probability Matrix:
- Probability matrix exposed as Probabilities = ref object.
- Extracting values from the probability matrix of partition function folding.
- Generating a density plot of the base pairing probability in a terminal emulator.
Nucleic acid design:
- Generating nucleic acid sequences corresponding to local minima in user-defined fitness functions.
Miscellaneous:
- Generating reasonably random DNA/RNA sequences.
- Evaluating energies of secondary structures.
- Sampling secondary structures from ensembles computed with pf and pf2D.
- Iterators for all types which can be iterated over.
- Reading and writing parameter files.

Where can I get it?

nimna should be part of the Nimble package list soon. Until then, you can install it on Linux or macOS from the repository by just typing:

nimble install https://github.com/mjendrusch/nimna.git

This will automatically fetch the ViennaRNA sources, build it with sane configuration options and add it to your library path, so you can use it right away. It will also register an alias:

nimnacleanup

Run it before uninstalling nimna to remove ViennaRNA. Otherwise, nimble will not be able to delete the package directory completely.

You can build the docs locally by navigating to the installation directory and executing:

nimble docs

and run the unit tests using:

nimble test

If you need to rebuild dependencies for some reason, you can do it using:

nimble deps

A short example - ripped from the docs

This is a short example of the design functionality of nimna, which is for me the most important capability the library provides:

  We want the following fold and sequence constraint:
                 N
             N       N
              N --- G
              N --- G
              N --- C
   5' N N N N         N
      | | | |          N
   3' N U C C N       N
                H ---- N
                  H --- N
                   H --- A
                    G      A
                        U
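
The letters in the diagram are IUPAC nucleotide codes: N stands for any base and H for any base except G. As a minimal sketch of what such a sequence constraint means, independent of nimna, the following checks a sequence against a pattern position by position. The names `allowed` and `matches` are hypothetical helpers made up for this illustration and are not part of the nimna API; only the codes used in this post are handled.

```nim
proc allowed(code: char): set[char] =
  ## Bases permitted by one IUPAC code (only the codes used in this post).
  case code
  of 'A', 'C', 'G', 'U': result = {code}   # a fixed base allows only itself
  of 'H': result = {'A', 'C', 'U'}         # H: anything but G
  of 'N': result = {'A', 'C', 'G', 'U'}    # N: any base
  else: result = {}                        # unknown code: nothing allowed

proc matches(sequence, pattern: string): bool =
  ## True if `sequence` has the same length as `pattern` and every base
  ## is allowed by the code at the same position.
  if sequence.len != pattern.len:
    return false
  for i in 0 ..< sequence.len:
    if sequence[i] notin allowed(pattern[i]):
      return false
  result = true

echo matches("GUUCA", "GHHHN")  # true
echo matches("GUGCA", "GHHHN")  # false: G sits at an H position
```

The pattern `GHHHN` here is just a short stretch of the kind that appears in the full constraint; a design run only ever needs to consider sequences for which such a check holds.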

Let’s design such a sequence:

  import strformat
  import nimna, nimna/design

  const
    structure =  "(((((((...)))...(((...))).))))"
    constraint = "NNNNNNNNNNGGCNNNNNAAUGHHHNCCUN"
    population = 100

  let
    opts = settings(temperature = 37.0)

  proc fitness(c: Compound): float =
    c.update(opts)
    let
      targetEnergy = c.eval(structure)
      (ensembleEnergy, ensembleStructure) = c.pf
    ## we want a sequence for which the target
    ## structure dominates the ensemble.
    result = targetEnergy - ensembleEnergy

  var
    engine = newEngine(population, fitness)

  engine.pattern = constraint

  for idx in 0 ..< 5:
    engine.step(1000)
    engine.mutationProbability = engine.mutationProbability - 0.1

  echo fmt"Best candidate is {engine.best.sequence} with score {engine.score}"
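
Why does the difference `targetEnergy - ensembleEnergy` measure how much the target structure dominates the ensemble? A short derivation, assuming that `ensembleEnergy` is the ensemble free energy G obtained from the partition function Z (this is my reading of the example, not something the code states explicitly), and writing E(s) for the energy of the target structure s:

```latex
Z = \sum_{s'} e^{-E(s')/RT}, \qquad G = -RT \ln Z

p(s) = \frac{e^{-E(s)/RT}}{Z} = e^{-\left(E(s) - G\right)/RT}

E(s) - G = -RT \ln p(s) \;\geq\; 0
```

So the fitness is zero exactly when the target structure has probability 1, and it grows as the target becomes less likely. At 37 °C, RT is roughly 0.616 kcal/mol, so a fitness of 1 kcal/mol corresponds to a target probability of about exp(-1/0.616) ≈ 0.2.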

A bit of history

When I started writing nimna, I had just begun learning Nim, having previously used ViennaRNA from Python, which at the time only had a Python 2.x interface. nimna was originally developed as an actually viable alternative to a DNA sequence design script written over the course of the 2015 iGEM competition, but ended up lying around on my hard drive for quite some time before I actually pushed it to GitHub. I later wrote a small proof of concept to show that the whole thing could be moved to the JavaScript backend, which kind of worked, but was quite awkward to build. All my further attempts to generalize that proof of concept to the whole library proved to be an exercise in frustration, as I had no idea of the inner workings of the Emscripten memory layout, and my metaprogramming-fu was not strong enough at that point. Then the bachelor thesis and my second foray into iGEM came along and I sort of forgot about nimna and my Emscripten wrapper attempts in the process.

Nowadays, the metaprogramming is strong in me and I have more of an idea of how to interface with Emscripten, so I will probably go back to finish what I started with the JavaScript interface, sooner rather than later.

Conclusion

Well, that’s it for the introduction of nimna. Enjoy this first release.

Happy folding!