Oligonucleotides: RNA

Nucleic Acid Sequences

OpenMS also supports the representation of RNA oligonucleotides using the NASequence class:

from pyopenms import *

oligo = NASequence.fromString("AAUGCAAUGG")
prefix = oligo.getPrefix(4)  # first four nucleotides (5' end)
suffix = oligo.getSuffix(4)  # last four nucleotides (3' end)

print(oligo)
print(prefix)
print(suffix)
print()

print("Oligo length", oligo.size())
print("Total precursor mass", oligo.getMonoWeight())
# mass of the prefix AAUG computed as a singly charged y-type fragment ion
print("y1+ ion mass of", str(prefix), ":", prefix.getMonoWeight(NASequence.NASFragmentType.YIon, 1))
print()

seq_formula = oligo.getFormula()
print("RNA Oligo", oligo, "has molecular formula", seq_formula)
print("="*35)
print()

# coarse (unit-resolution) isotope pattern with at most 6 peaks
isotopes = seq_formula.getIsotopeDistribution(CoarseIsotopePatternGenerator(6))
for iso in isotopes.getContainer():
    print("Isotope", iso.getMZ(), ":", iso.getIntensity())

This will output:

AAUGCAAUGG
AAUG
AUGG

Oligo length 10
Total precursor mass 3206.4885302061
y1+ ion mass of AAUG : 1248.2298440331

RNA Oligo AAUGCAAUGG has molecular formula C97H119N42O66P9
===================================

Isotope 3206.4885302061 : 0.25567981600761414
Isotope 3207.4918850439003 : 0.31783154606819153
Isotope 3208.4952398817004 : 0.23069815337657928
Isotope 3209.4985947195 : 0.12306403368711472
Isotope 3210.5019495573 : 0.053163252770900726
Isotope 3211.5053043951 : 0.01956319250166416
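To make the numbers above less of a black box, here is a minimal pure-Python sketch of the two calculations involved: summing monoisotopic element masses over the molecular formula, and building a unit-resolution ("coarse") isotope pattern by convolving the isotope abundances of every atom, truncating to six peaks and renormalizing. It assumes standard IUPAC isotope masses and abundances, hard-coded in the hypothetical tables MONO_MASS and ABUNDANCE, and spaces the peaks by the 13C-12C mass difference, as the printed output does. The helpers parse_formula, mono_mass and coarse_pattern are illustrative and not OpenMS's implementation; since OpenMS uses its own element table, the results only approximately match the values above (the monoisotopic mass to within about 1e-4 Da).

```python
import re

# Assumed IUPAC values: monoisotopic mass (u) of the lightest isotope of each element,
# and natural abundances of its isotopes at nominal offsets +0, +1, +2.
# OpenMS ships its own element table, so its numbers can differ in the last digits.
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
             "O": 15.99491461956, "P": 30.97376163}
ABUNDANCE = {"C": [0.9893, 0.0107], "H": [0.999885, 0.000115],
             "N": [0.99636, 0.00364], "O": [0.99757, 0.00038, 0.00205],
             "P": [1.0]}
C13_SPACING = 1.0033548378  # 13C - 12C mass difference, used as peak spacing


def parse_formula(formula):
    """Parse a simple formula such as 'C97H119N42O66P9' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mono_mass(counts):
    """Monoisotopic mass: every atom contributes the mass of its lightest isotope."""
    return sum(MONO_MASS[element] * n for element, n in counts.items())


def convolve(a, b, max_len):
    """Convolve two abundance vectors, dropping peaks beyond max_len."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def coarse_pattern(counts, max_isotopes):
    """Unit-resolution isotope pattern as (mass, relative intensity) pairs summing to 1."""
    dist = [1.0]
    for element, n in counts.items():
        for _ in range(n):  # one convolution per atom
            dist = convolve(dist, ABUNDANCE[element], max_isotopes)
    total = sum(dist)  # renormalize after truncation
    m0 = mono_mass(counts)
    return [(m0 + k * C13_SPACING, p / total) for k, p in enumerate(dist)]


counts = parse_formula("C97H119N42O66P9")
print("Monoisotopic mass:", round(mono_mass(counts), 4))
for mass, intensity in coarse_pattern(counts, 6):
    print("Isotope", round(mass, 4), ":", round(intensity, 4))
```

Truncating inside every convolution is safe here because all isotope offsets are non-negative, so dropped high-mass peaks never feed back into the lower ones.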

An NASequence object can also be iterated over directly in Python:

from pyopenms import *

oligo = NASequence.fromString("AAUGCAAUGG")
print("The oligonucleotide", str(oligo), "consists of the following nucleotides:")
for ribo in oligo:
    print(ribo.getName())

Fragment ions

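As with amino acid sequences, we can also compute fragment ions. The following example treats the last four nucleotides (the 3' suffix AUGG) as a doubly charged w4 ion and computes its m/z and molecular formula: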

from pyopenms import *

oligo = NASequence.fromString("AAUGCAAUGG")
suffix = oligo.getSuffix(4)  # 3'-terminal fragment AUGG

print("Oligo length", oligo.size())
print("Total precursor mass", oligo.getMonoWeight())

charge = 2
# mass and formula of the suffix treated as a w-type ion carrying two charges
mass = suffix.getMonoWeight(NASequence.NASFragmentType.WIon, charge)
w4_formula = suffix.getFormula(NASequence.NASFragmentType.WIon, charge)
mz = mass / charge  # mass-to-charge ratio

print("="*35)
print("RNA Oligo w4++ ion", suffix, "has mz", mz)
print("RNA Oligo w4++ ion", suffix, "has molecular formula", w4_formula)
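For nucleic acids, fragment ions are usually named with the McLuckey nomenclature: a, b, c and d ions contain the 5' end of the molecule, w, x, y and z ions contain the 3' end, and the number states how many nucleotides the fragment holds. This is why the w4 ion above is built from getSuffix(4). The sketch below encodes that convention together with the usual conversion from a neutral mass M to m/z for charge z, (M + z·m_p)/|z|, where a negative z stands for deprotonated ions (RNA is often measured in negative ion mode). The helpers fragment_residues and neutral_mass_to_mz are hypothetical and not part of pyOpenMS, and the proton mass is an approximate hard-coded constant.

```python
PROTON_MASS = 1.007276  # u, approximate mass of a proton (assumed constant)

FIVE_PRIME_IONS = {"a", "b", "c", "d"}   # fragments that contain the 5' end
THREE_PRIME_IONS = {"w", "x", "y", "z"}  # fragments that contain the 3' end


def fragment_residues(sequence, ion_type, index):
    """Nucleotides contained in fragment <ion_type><index>, e.g. ('w', 4) -> last four."""
    if not 1 <= index < len(sequence):
        raise ValueError("fragment index must be between 1 and len(sequence) - 1")
    if ion_type in FIVE_PRIME_IONS:
        return sequence[:index]   # same residues as NASequence.getPrefix(index)
    if ion_type in THREE_PRIME_IONS:
        return sequence[-index:]  # same residues as NASequence.getSuffix(index)
    raise ValueError(f"unknown ion type: {ion_type!r}")


def neutral_mass_to_mz(neutral_mass, charge):
    """m/z after adding (charge > 0) or removing (charge < 0) |charge| protons."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return (neutral_mass + charge * PROTON_MASS) / abs(charge)


print(fragment_residues("AAUGCAAUGG", "w", 4))  # AUGG
print(fragment_residues("AAUGCAAUGG", "a", 4))  # AAUG
```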

Modified oligonucleotides

Modified nucleotides are also represented by the Ribonucleotide class. In a sequence string, each modified nucleotide is written as its unique identifier from the RibonucleotideDB enclosed in square brackets; for example, [m1A] represents 1-methyladenosine. We can create an NASequence object by parsing a modified sequence as follows:

from pyopenms import *

oligo_mod = NASequence.fromString("A[m1A][Gm]A")
seq_formula = oligo_mod.getFormula()
print("RNA Oligo", oligo_mod, "has molecular formula",
      seq_formula, "and length", oligo_mod.size())
print("="*35)

# getOrigin() gives the unmodified nucleotide each residue is derived from
oligo_list = [oligo_mod[i].getOrigin() for i in range(oligo_mod.size())]
print("RNA Oligo", oligo_mod.toString(), "has unmodified sequence", "".join(oligo_list))

# inspect the second residue, 1-methyladenosine ([m1A])
r = oligo_mod[1]
print(r.getName())
print(r.getHTMLCode())
print(r.getOrigin())

# report which residues carry a modification
for i in range(oligo_mod.size()):
    print(oligo_mod[i].isModified())
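The bracket notation is simple to tokenize. The following minimal sketch, assuming that unbracketed characters are only the four standard RNA nucleotides A, C, G and U, splits a sequence into residue codes and recovers the unmodified sequence. The small ORIGIN table, covering only m1A and Gm, is a hypothetical stand-in for the RibonucleotideDB lookup that pyOpenMS performs, and tokenize and unmodified are illustrative helpers, not OpenMS's parser.

```python
import re

# Hypothetical lookup of the parent nucleotide for the two modifications used above;
# pyOpenMS takes this information from its RibonucleotideDB instead.
ORIGIN = {"m1A": "A", "Gm": "G"}
STANDARD = {"A", "C", "G", "U"}

# Either a bracketed identifier such as [m1A] or a single standard nucleotide.
TOKEN = re.compile(r"\[([^\]]+)\]|([ACGU])")


def tokenize(sequence):
    """Split a sequence such as 'A[m1A][Gm]A' into its residue codes."""
    tokens, pos = [], 0
    while pos < len(sequence):
        match = TOKEN.match(sequence, pos)
        if match is None:
            raise ValueError(f"cannot parse {sequence[pos:]!r}")
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    return tokens


def unmodified(tokens):
    """Replace modified residues by their parent nucleotide (unknown codes are kept)."""
    return "".join(ORIGIN.get(t, t) for t in tokens)


tokens = tokenize("A[m1A][Gm]A")
print(tokens)                               # ['A', 'm1A', 'Gm', 'A']
print(unmodified(tokens))                   # AAGA
print([t not in STANDARD for t in tokens])  # [False, True, True, False]
```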

DNA, RNA and Protein

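DNA and RNA sequences can also be processed together with the Biopython library (install it with pip install biopython). The example below writes a partial bovine serum albumin (BSA) coding sequence to a FASTA file, uses Biopython to transcribe and translate it, and then computes the monoisotopic masses of the resulting RNA and protein with pyOpenMS: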

from pyopenms import *
from Bio.Seq import Seq

bsa = FASTAEntry()
bsa.sequence = 'ATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGT'
bsa.description = "BSA Bovine Albumin (partial sequence)"
bsa.identifier = "BSA"

entries = [bsa]

# write the DNA sequence to a FASTA file
f = FASTAFile()
f.store("example_dna.fasta", entries)

# Bio.Alphabet was removed in Biopython 1.78, so Seq takes only the sequence string
coding_dna = Seq(bsa.sequence)
coding_rna = coding_dna.transcribe()   # DNA -> mRNA (T -> U)
protein_seq = coding_rna.translate()   # mRNA -> protein (standard codon table)

oligo = NASequence.fromString(str(coding_rna))
aaseq = AASequence.fromString(str(protein_seq))

print("The RNA sequence", str(oligo), "has mass", oligo.getMonoWeight(), "and\n"
      "translates to the protein sequence", str(aaseq), "which has mass", aaseq.getMonoWeight())