Module design

Design

The design module provides facilities for nucleic acid design. It is currently based on an artificial immune system algorithm with naive sequence sampling. Future versions are intended to add smarter sampling algorithms and pluggable optimization drivers (for example, ant colony optimization).

General usage

Basic usage of nimna.design relies on two ingredients: a fitness function and a set of constraints on the sequence space to be searched. Both are handed to a DesignEngine, which performs the actual sequence optimization.

Example

We want the following fold and sequence constraint:
               N
           N       N
            N --- G
            N --- G
            N --- C
 5' N N N N         N
    | | | |          N
 3' N U C C N       N
              H ---- N
                H --- N
                 H --- A
                  G      A
                      U

Let's design such a sequence:

import strformat
import nimna, nimna.design

const
  structure =  "(((((((...)))...(((...))).))))"
  constraint = "NNNNNNNNNNGGCNNNNNAAUGHHHNCCUN"
  population = 100

let
  opts = settings(temperature = 37.0)

proc fitness(c: Compound): float =
  c.update(opts)
  let
    targetEnergy = c.eval(structure)
    (ensembleEnergy, _) = c.pf
  # We want a sequence for which the target structure dominates
  # the ensemble, so we minimize the gap between the two energies.
  result = targetEnergy - ensembleEnergy

var
  engine = newEngine(population, fitness)

engine.pattern = constraint

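# Run five rounds of 1000 steps each, lowering the mutation
# probability by 0.1 after every round. Only a setter is documented
# for mutationProbability, so the current value is read from the mutator.
for round in 0 ..< 5:
  engine.step(1000)
  engine.mutationProbability = engine.mutator.mutationProb - 0.1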

echo fmt"Best candidate is {engine.best.sequence} with score {engine.score}"

Types

Mutator = object
  constraintString*: string
  backgroundProbs*: Table[char, float]
  totalProb*: Table[char, float]
  mutationProb*: float
  consistentProb*: float
  stringLength*: int
  pairConstraints*: seq[tuple[i: int, j: int]]
  freeConstraints*: seq[int]
Object containing all constraints and parameters used for sequence mutation.
DesignEngine = ref object
  population*: seq[Compound]
  populationSize: int
  best*: Compound
  score*: float
  scoringFunction: proc (c: Compound): float
  settings*: Settings
  mutator*: Mutator
Object containing a population and scoring function for nucleic acid design.

Consts

concreteAlphabet = {'A', 'a', 'C', 'c', 'G', 'g', 'T', 't', 'U', 'u'}
abstractAlphabet = {'N', 'n', 'B', 'b', 'D', 'd', 'H', 'h', 'V', 'v', 'W', 'w', 'S', 's', 'R',
                  'r', 'Y', 'y'}
skipNucleotides = (data: [(0, 0, {}), (0, 0, {}), (66, 66, {65, 97}), (98, 98, {65, 97}),
                       (68, 68, {67, 99}), (100, 100, {67, 99}), (0, 0, {}), (0, 0, {}),
                       (72, 72, {71, 103}), (104, 104, {71, 103}), (0, 0, {}), (0, 0, {}),
                       (0, 0, {}), (0, 0, {}), (0, 0, {}), (0, 0, {}), (0, 0, {}), (0, 0, {}),
                       (82, 82, {67, 99, 84, 116}), (83, 83, {65, 97, 84, 116}),
                       (115, 115, {65, 97, 84, 116}), (114, 114, {67, 99, 84, 116}),
                       (86, 86, {84, 116}), (118, 118, {84, 116}),
                       (87, 87, {71, 103, 67, 99}), (119, 119, {71, 103, 67, 99}),
                       (89, 89, {65, 97, 71, 103}), (121, 121, {65, 97, 71, 103}),
                       (0, 0, {}), (0, 0, {}), (0, 0, {}), (0, 0, {})], counter: 16)

Procs

proc newEngine(popSize: int; fitness: proc (c: Compound): float): DesignEngine {.
raises: [], tags: []
.}
Creates a new DesignEngine with a population of size popSize and the fitness function fitness. Lower fitness values are treated as better.
proc background=(eg: DesignEngine; probs: Table[char, float]) {.
raises: [KeyError], tags: []
.}
Sets the background probabilities of nucleotides.
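To illustrate what background probabilities mean for sequence sampling, here is a minimal, self-contained sketch of weighted base sampling. It does not use nimna and is not taken from its implementation; sampleBase and the table keys A, C, G, U are assumptions made for this example only.

```nim
import std/[random, tables]

proc sampleBase(rng: var Rand; probs: Table[char, float]): char =
  ## Hypothetical helper: draws one key of `probs` with probability
  ## proportional to its weight.
  var total = 0.0
  for p in probs.values:
    total += p
  var threshold = rng.rand(total)
  for base, p in probs:
    result = base            # fallback if rounding leaves a tiny remainder
    if threshold < p:
      return
    threshold -= p

var rng = initRand(7)
# Assumed example weights favouring G and C.
let background = {'A': 0.1, 'C': 0.4, 'G': 0.4, 'U': 0.1}.toTable

var sequence = ""
for _ in 0 ..< 20:
  sequence.add rng.sampleBase(background)
echo sequence
```

Weights do not have to sum to one in this sketch, since the draw is scaled by their total.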
proc mutationProbability=(eg: DesignEngine; prob: float) {.
raises: [], tags: []
.}
Sets the probability of a base to be mutated at each step.
proc consistentMutationProbability=(eg: DesignEngine; prob: float) {.
raises: [], tags: []
.}
Sets the probability that a base pair or an unpaired base is mutated in a way that is consistent with the set of proposed secondary structures.
proc pattern=(eg: DesignEngine; pattern: string) {.
raises: [], tags: []
.}

Sets a constraint on the bases allowed at each position of the sequences in the population. The constraint is given by pattern, written in IUPAC notation.

For example, H matches one of A, C, or T (U in RNA), N matches any base, and so on.
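As an illustration of how an IUPAC pattern restricts a sequence, the following self-contained sketch checks a sequence against a pattern. It is not nimna's implementation; allowedBases and matches are hypothetical helpers, T and U are treated as interchangeable by assumption, and only the codes listed in concreteAlphabet and abstractAlphabet are covered.

```nim
import std/strutils

proc allowedBases(code: char): set[char] =
  ## Hypothetical helper: concrete bases permitted by one IUPAC code.
  case code.toUpperAscii
  of 'A': result = {'A'}
  of 'C': result = {'C'}
  of 'G': result = {'G'}
  of 'T', 'U': result = {'T', 'U'}
  of 'R': result = {'A', 'G'}
  of 'Y': result = {'C', 'T', 'U'}
  of 'S': result = {'C', 'G'}
  of 'W': result = {'A', 'T', 'U'}
  of 'B': result = {'C', 'G', 'T', 'U'}
  of 'D': result = {'A', 'G', 'T', 'U'}
  of 'H': result = {'A', 'C', 'T', 'U'}
  of 'V': result = {'A', 'C', 'G'}
  of 'N': result = {'A', 'C', 'G', 'T', 'U'}
  else: result = {}

proc matches(pattern, sequence: string): bool =
  ## True if every base of `sequence` is allowed by the code
  ## at the same position of `pattern`.
  if pattern.len != sequence.len:
    return false
  for i in 0 ..< pattern.len:
    if sequence[i].toUpperAscii notin allowedBases(pattern[i]):
      return false
  result = true

echo matches("NNGGCH", "AUGGCA")
echo matches("H", "G")
```

A pattern and a structure string are expected to have the same length, as in the example at the top of this page.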

proc addStructure(eg: DesignEngine; structure: string) {.
raises: [], tags: []
.}
Adds a structure for consistent mutation to the DesignEngine eg.
proc structure=(eg: DesignEngine; structure: string) {.
raises: [], tags: []
.}
Sets a structure for consistent mutation for the DesignEngine eg.
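The structure strings used here are in dot-bracket notation. The sketch below shows one way such a string could be turned into a list of paired positions, similar in shape to the pairConstraints field of Mutator. It is a self-contained illustration, not nimna's code; pairList is a hypothetical name and the 0-based indexing is an assumption.

```nim
proc pairList(structure: string): seq[tuple[i: int, j: int]] =
  ## Hypothetical helper: returns one (i, j) tuple per matched bracket
  ## pair, in the order in which the pairs are closed.
  var stack: seq[int] = @[]
  for idx, ch in structure:
    case ch
    of '(':
      stack.add idx
    of ')':
      if stack.len == 0:
        raise newException(ValueError, "unmatched ')' at position " & $idx)
      result.add((i: stack.pop(), j: idx))
    else:
      discard                # unpaired position
  if stack.len > 0:
    raise newException(ValueError, "unmatched '(' at position " & $stack[^1])

let basePairs = pairList("(((((((...)))...(((...))).))))")
echo basePairs.len, " base pairs; first closed pair: ", basePairs[0]
```

With such a list, a consistent mutation can change both partners of a pair together so that they remain complementary.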
proc mutate(mt: Mutator; source: string = ""): Compound {.
raises: [KeyError, Exception], tags: [RootEffect]
.}
Returns a mutated Compound derived from source, according to the parameters set in the Mutator mt.
proc populate(eg: DesignEngine) {.
raises: [KeyError, Exception], tags: [RootEffect]
.}
Populates a DesignEngine eg according to its set properties.
proc eval(eg: DesignEngine) {.
raises: [Exception], tags: [RootEffect]
.}
Evaluates all members of the population stored in eg and selects the best.
proc step(eg: DesignEngine; iterations: int = 1) {.
raises: [KeyError, Exception], tags: [RootEffect]
.}
Performs iterations steps of mutation and evaluation on the population in the DesignEngine eg.
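To make the mutate-and-evaluate cycle concrete, here is a small self-contained toy loop. It is only a sketch under stated assumptions and not the algorithm nimna uses: ToyEngine, gcDeficit, and the rule of refilling the population with mutants of the current best are all invented for illustration. As with newEngine, a lower score is better.

```nim
import std/random

const bases = "ACGU"

type
  ToyEngine = object         # hypothetical stand-in for DesignEngine
    population: seq[string]
    best: string
    score: float

proc gcDeficit(s: string): float =
  ## Toy fitness: counts bases that are neither G nor C.
  for ch in s:
    if ch notin {'G', 'C'}:
      result += 1.0

proc mutate(rng: var Rand; source: string; prob: float): string =
  ## Copies `source`, resampling each position with probability `prob`.
  result = source
  for i in 0 ..< result.len:
    if rng.rand(1.0) < prob:
      result[i] = bases[rng.rand(bases.high)]

proc step(eg: var ToyEngine; rng: var Rand; prob: float; iterations = 1) =
  ## Each iteration refills the population with mutants of the current
  ## best and keeps any mutant that scores lower.
  for _ in 0 ..< iterations:
    for i in 0 ..< eg.population.len:
      let candidate = rng.mutate(eg.best, prob)
      eg.population[i] = candidate
      let score = gcDeficit(candidate)
      if score < eg.score:
        eg.best = candidate
        eg.score = score

var
  rng = initRand(42)
  engine = ToyEngine(population: newSeq[string](20),
                     best: "AAAAAAAAAA", score: 10.0)

engine.step(rng, prob = 0.2, iterations = 50)
echo "best: ", engine.best, " score: ", engine.score
```

Because a candidate replaces the best only when it scores strictly lower, the best score in this toy loop never gets worse from one iteration to the next.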

Templates

template skip(constraint, nt: char): bool
template accept(constraint, nt: char): bool