Provides the amino acid alphabets and functionality for translation from nucleotide.
Classes | |
class | bio::alphabet::aa10li |
The reduced Li amino acid alphabet. | |
class | bio::alphabet::aa10murphy |
The reduced Murphy amino acid alphabet. | |
class | bio::alphabet::aa20 |
The canonical amino acid alphabet. | |
class | bio::alphabet::aa27 |
The twenty-seven letter amino acid alphabet. | |
class | bio::alphabet::aminoacid_base< derived_type, size > |
A CRTP-base that refines bio::alphabet::base and is used by the amino acids. | |
struct | bio::alphabet::aminoacid_empty_base |
This is an empty base class that can be inherited by types that shall model bio::alphabet::aminoacid. | |
Concepts | |
concept | bio::alphabet::aminoacid |
A concept that indicates whether an alphabet represents amino acids. | |
Functions | |
template<genetic_code gc = genetic_code::CANONICAL, nucleotide nucl_type> | |
constexpr aa27 | bio::alphabet::translate_triplet (nucl_type const &n1, nucl_type const &n2, nucl_type const &n3) noexcept |
Translate one nucleotide triplet into a single amino acid (single nucleotide interface). | |
Variables | |
template<typename t > | |
constexpr bool | bio::alphabet::custom::enable_aminoacid = std::derived_from<t, aminoacid_empty_base> |
A trait that indicates whether a type shall model bio::alphabet::aminoacid. | |
Amino acid sequences are an important part of bioinformatic data processing and are used by many applications. While it is possible to represent them in a regular std::string, it makes sense to use specialised data structures in most cases. This sub-module offers the 27-letter amino acid alphabet as well as three reduced versions that can be used with regular containers and ranges. The 27-letter amino acid alphabet contains the 20 canonical amino acids, 2 additional proteinogenic amino acids (Pyrrolysine and Selenocysteine) and a termination letter (*). Additionally, 4 wildcard letters are offered that allow more generic usage, for example in the case of ambiguous amino acids (e.g. J, which means either Isoleucine or Leucine). See also https://en.wikipedia.org/wiki/Amino_acid for more information about the amino acid alphabet.
Amino acid name | Three letter code | One letter code | Remapped in bio::alphabet::aa20 | Remapped in bio::alphabet::aa10murphy | Remapped in bio::alphabet::aa10li |
---|---|---|---|---|---|
Alanine | Ala | A | A | A | A |
Arginine | Arg | R | R | K | K |
Asparagine | Asn | N | N | B | H |
Aspartic acid | Asp | D | D | B | B |
Cysteine | Cys | C | C | C | C |
Tyrosine | Tyr | Y | Y | F | F |
Glutamic acid | Glu | E | E | B | B |
Glutamine | Gln | Q | Q | B | B |
Glycine | Gly | G | G | G | G |
Histidine | His | H | H | H | H |
Isoleucine | Ile | I | I | I | I |
Leucine | Leu | L | L | I | J |
Lysine | Lys | K | K | K | K |
Methionine | Met | M | M | I | J |
Phenylalanine | Phe | F | F | F | F |
Proline | Pro | P | P | P | P |
Serine | Ser | S | S | S | A |
Threonine | Thr | T | T | S | A |
Tryptophan | Trp | W | W | F | F |
Valine | Val | V | V | I | I |
Selenocysteine | Sec | U | C | C | C |
Pyrrolysine | Pyl | O | K | K | K |
Asparagine or aspartic acid | Asx | B | D | B | B |
Glutamine or glutamic acid | Glx | Z | E | B | B |
Leucine or Isoleucine | Xle | J | L | I | J |
Unknown | Xaa | X | S | S | A |
Stop Codon | N/A | * | W | F | F |
All amino acid alphabets provide static value members (like an enum) for all amino acids in the form of the one-letter representation. As shown above, alphabets smaller than 27 internally represent multiple amino acids as one.
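The merging of letters can be illustrated with a small standalone sketch of the aa20 column of the table above. The functions `remap_char_to_aa20` and `remap_sequence_to_aa20` are hypothetical helpers written for this illustration only; they operate on plain characters and are not part of the library, which performs the conversion on its own alphabet types.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not a library function): maps a one-letter aa27 code
// to the letter listed in the aa20 column of the table above.
constexpr char remap_char_to_aa20(char c) noexcept
{
    switch (c)
    {
        case 'U': return 'C'; // Selenocysteine -> Cysteine
        case 'O': return 'K'; // Pyrrolysine -> Lysine
        case 'B': return 'D'; // Asparagine or aspartic acid -> Aspartic acid
        case 'Z': return 'E'; // Glutamine or glutamic acid -> Glutamic acid
        case 'J': return 'L'; // Leucine or Isoleucine -> Leucine
        case 'X': return 'S'; // Unknown -> Serine
        case '*': return 'W'; // Stop codon -> Tryptophan
        default:  return c;   // every other character is passed through unchanged
    }
}

// Hypothetical helper: applies the remapping to a whole sequence of one-letter codes.
inline std::string remap_sequence_to_aa20(std::string_view seq)
{
    std::string out;
    out.reserve(seq.size());
    for (char c : seq)
        out.push_back(remap_char_to_aa20(c));
    return out;
}

// Compile-time spot checks against the table.
static_assert(remap_char_to_aa20('U') == 'C');
static_assert(remap_char_to_aa20('A') == 'A');
```

Note that the sketch does no validation: characters outside the 27-letter alphabet are returned as they are.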
For most cases it is highly recommended to use bio::alphabet::aa27, as bio::alphabet::aa20 provides no benefit in regard to space consumption (both need 5 bits). Use bio::alphabet::aa20 only when you know you need to interface with other software or formats that only support the canonical set.
enum genetic_code | strong |
Genetic codes used for translation of nucleotides into amino acids.
The numeric values of the enums correspond to the genbank transl_table values (see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi).
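As an illustration of this correspondence, the sketch below defines a stand-in enum whose values are NCBI transl_table identifiers. The type `genetic_code_sketch`, all enumerator names other than `CANONICAL`, and the helper `transl_table_id` are hypothetical; the value 1 for `CANONICAL` is an assumption based on table 1 being the NCBI standard code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in, not the library's definition: each enumerator's
// numeric value is the NCBI transl_table identifier of that genetic code.
enum class genetic_code_sketch : std::uint8_t
{
    CANONICAL = 1,                 // assumed: NCBI table 1, the standard code
    VERTEBRATE_MITOCHONDRIAL = 2,  // NCBI table 2
    YEAST_MITOCHONDRIAL = 3,       // NCBI table 3
    BACTERIAL_ARCHAEAL_PLASTID = 11 // NCBI table 11
};

// Hypothetical helper: recovers the transl_table identifier from the enum value.
constexpr int transl_table_id(genetic_code_sketch gc) noexcept
{
    return static_cast<int>(gc);
}

static_assert(transl_table_id(genetic_code_sketch::CANONICAL) == 1);
```

Because the enum value and the table identifier coincide, no lookup table is needed when reading a transl_table annotation.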
|
strong |
Specialisation values for single and multiple translation frames.
bio::alphabet::translate_triplet() | constexpr noexcept |
Translate one nucleotide triplet into a single amino acid (single nucleotide interface).
nucl_type | The type of input nucleotides. |
[in] | n1 | First nucleotide in triplet. |
[in] | n2 | Second nucleotide in triplet. |
[in] | n3 | Third nucleotide in triplet. |
Translates three single nucleotides into one amino acid according to the given genetic code.
Complexity: Constant.
Exceptions: No-throw guarantee.
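A constant-time triplet translation can be sketched with a 64-entry lookup table for the canonical genetic code. This is a standalone illustration on plain characters, not the library's implementation: `base_index` and `translate_triplet_sketch` are hypothetical names, and returning 'X' for any triplet that contains a character other than A, C, G, T or U is an assumption of this sketch.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: index of a nucleotide in the order T, C, A, G
// (U is treated like T); -1 for any other character.
constexpr int base_index(char n) noexcept
{
    switch (n)
    {
        case 'T': case 'U': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Hypothetical sketch: translates one triplet with the canonical genetic code
// using a single table lookup, hence constant time and no exceptions.
constexpr char translate_triplet_sketch(char n1, char n2, char n3) noexcept
{
    // Amino acids for all 64 codons, ordered by first, second, third base (T, C, A, G).
    constexpr std::string_view table =
        "FFLLSSSSYY**CC*W"  // first base T
        "LLLLPPPPHHQQRRRR"  // first base C
        "IIIMTTTTNNKKSSRR"  // first base A
        "VVVVAAAADDEEGGGG"; // first base G
    static_assert(table.size() == 64);

    int const i1 = base_index(n1);
    int const i2 = base_index(n2);
    int const i3 = base_index(n3);
    if (i1 < 0 || i2 < 0 || i3 < 0)
        return 'X'; // assumption of this sketch: unknown input yields the wildcard

    return table[16 * i1 + 4 * i2 + i3];
}

static_assert(translate_triplet_sketch('A', 'T', 'G') == 'M'); // start codon
static_assert(translate_triplet_sketch('T', 'A', 'A') == '*'); // stop codon
```

The table layout makes the codon index a base-4 number, so other genetic codes could be supported by swapping in a different 64-character string.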
bio::alphabet::custom::enable_aminoacid | inline constexpr |
A trait that indicates whether a type shall model bio::alphabet::aminoacid.
t | Type of the argument. |
This is an auxiliary trait that is checked by bio::alphabet::aminoacid to verify that a type is an amino acid. Never read this trait directly; use bio::alphabet::aminoacid instead. However, user-defined alphabets that want to model bio::alphabet::aminoacid need to specialise it.
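This is a customisation point (see Customisation). To change the default behaviour for your own alphabet, do one of the following: derive your type from bio::alphabet::aminoacid_empty_base, or specialise this variable template for your type so that it evaluates to true.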
Note that the concept check removes cvref-qualification from the type before evaluating this trait, so you only need to specialise it for t and not for t &, et cetera.