BioC++ core-0.7.0
The Modern C++ libraries for Bioinformatics.
 
Aminoacid

Provides the amino acid alphabets and functionality for translation from nucleotides.


Classes

class  bio::alphabet::aa10li
 The reduced Li amino acid alphabet.
 
class  bio::alphabet::aa10murphy
 The reduced Murphy amino acid alphabet.
 
class  bio::alphabet::aa20
 The canonical amino acid alphabet.
 
class  bio::alphabet::aa27
 The twenty-seven letter amino acid alphabet.
 
class  bio::alphabet::aminoacid_base< derived_type, size >
 A CRTP base that refines bio::alphabet::base and is used by the amino acids.
 
struct  bio::alphabet::aminoacid_empty_base
 This is an empty base class that can be inherited by types that shall model bio::alphabet::aminoacid.
 

Concepts

concept  bio::alphabet::aminoacid
 A concept that indicates whether an alphabet represents amino acids.
 

Enumerations

enum struct  bio::alphabet::genetic_code : uint8_t { CANONICAL = 1 }
 Genetic codes used for translation of nucleotides into amino acids.
 
enum class  bio::alphabet::translation_frames : uint8_t {
  FWD_FRAME_0 = 1, FWD_FRAME_1 = 1 << 1, FWD_FRAME_2 = 1 << 2,
  REV_FRAME_0 = 1 << 3, REV_FRAME_1 = 1 << 4, REV_FRAME_2 = 1 << 5,
  FWD_REV_0 = FWD_FRAME_0 | REV_FRAME_0, FWD_REV_1 = FWD_FRAME_1 | REV_FRAME_1, FWD_REV_2 = FWD_FRAME_2 | REV_FRAME_2,
  FWD = FWD_FRAME_0 | FWD_FRAME_1 | FWD_FRAME_2, REV = REV_FRAME_0 | REV_FRAME_1 | REV_FRAME_2, SIX_FRAME = FWD | REV
}
 Specialisation values for single and multiple translation frames.
 

Functions

template<genetic_code gc = genetic_code::CANONICAL, nucleotide nucl_type>
constexpr aa27 bio::alphabet::translate_triplet (nucl_type const &n1, nucl_type const &n2, nucl_type const &n3) noexcept
 Translate one nucleotide triplet into a single amino acid (single nucleotide interface).
 

Variables

template<typename t >
constexpr bool bio::alphabet::custom::enable_aminoacid = std::derived_from<t, aminoacid_empty_base>
 A trait that indicates whether a type shall model bio::alphabet::aminoacid.
 

Detailed Description

Provides the amino acid alphabets and functionality for translation from nucleotide.

Introduction

Amino acid sequences are an important part of bioinformatic data processing and are used by many applications. While it is possible to represent them in a regular std::string, it makes sense to have specialised data structures in most cases. This sub-module offers the 27-letter amino acid alphabet as well as three reduced versions that can be used with regular containers and ranges. The 27-letter amino acid alphabet contains the 20 canonical amino acids, 2 additional proteinogenic amino acids (Pyrrolysine and Selenocysteine) and a termination letter (*). Additionally, 4 wildcard letters are offered, which allow more generic usage, for example in the case of ambiguous amino acids (e.g. J, which means either Isoleucine or Leucine). See also https://en.wikipedia.org/wiki/Amino_acid for more information about the amino acid alphabet.
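The four groups described above add up to 20 + 2 + 1 + 4 = 27 letters, which are exactly the 26 capital Latin letters plus '*'. The following standard-library-only sketch checks this at compile time. The group names and the rank order used by the hypothetical `sketch_rank` (A to Z, then '*') are assumptions made for illustration and are not claimed to be the internal layout of bio::alphabet::aa27.

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <string_view>

// Letter groups of the 27-letter alphabet (one-letter codes).
inline constexpr std::string_view canonical20 = "ACDEFGHIKLMNPQRSTVWY";
inline constexpr std::string_view extra_proteinogenic = "OU"; // Pyrrolysine, Selenocysteine
inline constexpr std::string_view termination = "*";
inline constexpr std::string_view wildcards = "BJZX";         // Asx, Xle, Glx, unknown (Xaa)

// Hypothetical rank scheme for illustration only (assumes ASCII):
// 'A'..'Z' -> 0..25, '*' -> 26, anything else -> -1.
constexpr int sketch_rank(char c) noexcept
{
    if (c >= 'A' && c <= 'Z')
        return c - 'A';
    if (c == '*')
        return 26;
    return -1;
}

// True if the four groups hold 27 letters in total, each with a valid and
// distinct rank, i.e. together they cover the 27-letter alphabet exactly once.
constexpr bool groups_cover_27_letters() noexcept
{
    std::array<bool, 27> seen{};
    std::size_t count = 0;
    for (std::string_view group : {canonical20, extra_proteinogenic, termination, wildcards})
    {
        for (char c : group)
        {
            int const r = sketch_rank(c);
            if (r < 0 || seen[static_cast<std::size_t>(r)])
                return false; // invalid letter or duplicate
            seen[static_cast<std::size_t>(r)] = true;
            ++count;
        }
    }
    return count == 27;
}
```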

Conversions

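The last three columns give the letter each amino acid is remapped to in bio::alphabet::aa20, bio::alphabet::aa10murphy and bio::alphabet::aa10li, respectively.

Amino acid name             | Three letter code | One letter code | aa20 | aa10murphy | aa10li
----------------------------|-------------------|-----------------|------|------------|-------
Alanine                     | Ala               | A               | A    | A          | A
Arginine                    | Arg               | R               | R    | K          | K
Asparagine                  | Asn               | N               | N    | B          | H
Aspartic acid               | Asp               | D               | D    | B          | B
Cysteine                    | Cys               | C               | C    | C          | C
Tyrosine                    | Tyr               | Y               | Y    | F          | F
Glutamic acid               | Glu               | E               | E    | B          | B
Glutamine                   | Gln               | Q               | Q    | B          | B
Glycine                     | Gly               | G               | G    | G          | G
Histidine                   | His               | H               | H    | H          | H
Isoleucine                  | Ile               | I               | I    | I          | I
Leucine                     | Leu               | L               | L    | I          | J
Lysine                      | Lys               | K               | K    | K          | K
Methionine                  | Met               | M               | M    | I          | J
Phenylalanine               | Phe               | F               | F    | F          | F
Proline                     | Pro               | P               | P    | P          | P
Serine                      | Ser               | S               | S    | S          | A
Threonine                   | Thr               | T               | T    | S          | A
Tryptophan                  | Trp               | W               | W    | F          | F
Valine                      | Val               | V               | V    | I          | I
Selenocysteine              | Sec               | U               | C    | C          | C
Pyrrolysine                 | Pyl               | O               | K    | K          | K
Asparagine or aspartic acid | Asx               | B               | D    | B          | B
Glutamine or glutamic acid  | Glx               | Z               | E    | B          | B
Leucine or Isoleucine       | Xle               | J               | L    | I          | J
Unknown                     | Xaa               | X               | S    | S          | A
Stop codon                  | N/A               | *               | W    | F          | F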
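The reduced alphabets are plain many-to-one lookups. A minimal sketch, assuming nothing beyond the conversion table: each string below is one column of the table (rows in table order), and the hypothetical `remap` finds a letter in the 27-letter column and returns the letter at the same position in the chosen target column. It does not use or imitate any BioC++ conversion interface.

```cpp
#include <cstddef>
#include <string_view>

// Columns of the conversion table, one character per row, in table order.
inline constexpr std::string_view aa27_letters     = "ARNDCYEQGHILKMFPSTWVUOBZJX*";
inline constexpr std::string_view aa20_image       = "ARNDCYEQGHILKMFPSTWVCKDELSW";
inline constexpr std::string_view aa10murphy_image = "AKBBCFBBGHIIKIFPSSFICKBBISF";
inline constexpr std::string_view aa10li_image     = "AKHBCFBBGHIJKJFPAAFICKBBJAF";

// Hypothetical helper: letter that `c` is remapped to in `image`,
// or '\0' if `c` is not one of the 27 letters.
constexpr char remap(char c, std::string_view image) noexcept
{
    std::size_t const pos = aa27_letters.find(c);
    return pos == std::string_view::npos ? '\0' : image[pos];
}

// Hypothetical helper: number of distinct letters used in a column.
constexpr std::size_t distinct_letters(std::string_view image) noexcept
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < image.size(); ++i)
        if (image.find(image[i]) == i) // count first occurrences only
            ++n;
    return n;
}
```

Counting the distinct letters per column confirms the alphabet sizes: 20 for aa20 and 10 for each of the two aa10 variants.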

All amino acid alphabets provide static value members (like an enum) for all amino acids in the form of the one-letter representation. As shown above, alphabets with fewer than 27 letters map several amino acids to the same letter.
In most cases it is highly recommended to use bio::alphabet::aa27, as bio::alphabet::aa20 provides no benefit in terms of space consumption (both need 5 bits). Use aa20 only when you know you need to interface with other software or formats that only support the canonical set.
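The 5-bit figure follows from the smallest b with 2^b >= n: since 2^4 = 16 < 20 <= 27 <= 32 = 2^5, both alphabets need 5 bits, whereas the 10-letter alphabets fit into 4 bits. The hypothetical `bits_needed` below computes this minimum; it says nothing about the sizeof of the BioC++ alphabet types, which may still occupy a whole byte each.

```cpp
#include <cstddef>

// Smallest number of bits b such that 2^b >= alphabet_size,
// i.e. ceil(log2(alphabet_size)) for alphabet_size >= 2.
constexpr std::size_t bits_needed(std::size_t alphabet_size) noexcept
{
    std::size_t bits = 0;
    while ((std::size_t{1} << bits) < alphabet_size)
        ++bits;
    return bits;
}
```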

Enumeration Type Documentation

◆ genetic_code

enum struct bio::alphabet::genetic_code : uint8_t

Genetic codes used for translation of nucleotides into amino acids.

The numeric values of the enumerators correspond to the GenBank transl_table values (see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi).

◆ translation_frames

enum class bio::alphabet::translation_frames : uint8_t

Specialisation values for single and multiple translation frames.

Enumerator
FWD_FRAME_0 

The first forward frame starting at position 0.

FWD_FRAME_1 

The second forward frame starting at position 1.

FWD_FRAME_2 

The third forward frame starting at position 2.

REV_FRAME_0 

The first reverse frame starting at position 0.

REV_FRAME_1 

The second reverse frame starting at position 1.

REV_FRAME_2 

The third reverse frame starting at position 2.

FWD_REV_0 

The first forward and first reverse frame.

FWD_REV_1 

The second forward and second reverse frame.

FWD_REV_2 

The third forward and third reverse frame.

FWD 

All forward frames.

REV 

All reverse frames.

SIX_FRAME 

All frames.
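The frame values are single bits, so combined values such as FWD_REV_0 or SIX_FRAME are bitwise ORs and membership can be tested with a bitwise AND. The sketch below mirrors the enumerator values listed above in a local replica (`sketch::translation_frames`); the operator and the helpers `bits`, `contains` and `frame_count` are hypothetical and not claimed to exist in BioC++.

```cpp
#include <cstdint>

namespace sketch
{
// Local replica of the documented enumerator values (illustration only).
enum class translation_frames : std::uint8_t
{
    FWD_FRAME_0 = 1,
    FWD_FRAME_1 = 1 << 1,
    FWD_FRAME_2 = 1 << 2,
    REV_FRAME_0 = 1 << 3,
    REV_FRAME_1 = 1 << 4,
    REV_FRAME_2 = 1 << 5,
    FWD_REV_0 = FWD_FRAME_0 | REV_FRAME_0,
    FWD_REV_1 = FWD_FRAME_1 | REV_FRAME_1,
    FWD_REV_2 = FWD_FRAME_2 | REV_FRAME_2,
    FWD = FWD_FRAME_0 | FWD_FRAME_1 | FWD_FRAME_2,
    REV = REV_FRAME_0 | REV_FRAME_1 | REV_FRAME_2,
    SIX_FRAME = FWD | REV
};

// Hypothetical helper: raw bit pattern of a frame selection.
constexpr unsigned bits(translation_frames f) noexcept
{
    return static_cast<std::uint8_t>(f);
}

// Union of two frame selections (bitwise OR on the underlying value).
constexpr translation_frames operator|(translation_frames a, translation_frames b) noexcept
{
    return static_cast<translation_frames>(bits(a) | bits(b));
}

// Hypothetical helper: true if every frame in `f` is also selected in `set`.
constexpr bool contains(translation_frames set, translation_frames f) noexcept
{
    return (bits(set) & bits(f)) == bits(f);
}

// Hypothetical helper: number of single frames selected in `set` (0 to 6).
constexpr int frame_count(translation_frames set) noexcept
{
    int n = 0;
    for (int i = 0; i < 6; ++i)
        n += static_cast<int>((bits(set) >> i) & 1u);
    return n;
}
} // namespace sketch
```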

Function Documentation

◆ translate_triplet()

template<genetic_code gc = genetic_code::CANONICAL, nucleotide nucl_type>
constexpr aa27 bio::alphabet::translate_triplet(nucl_type const & n1, nucl_type const & n2, nucl_type const & n3) noexcept

Translate one nucleotide triplet into a single amino acid (single nucleotide interface).

Template Parameters
nucl_type  The type of the input nucleotides.
Parameters
[in]  n1  First nucleotide in the triplet.
[in]  n2  Second nucleotide in the triplet.
[in]  n3  Third nucleotide in the triplet.

Translates the three nucleotides into an amino acid according to the genetic code gc.

Complexity

Constant.

Exceptions

No-throw guarantee.
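translate_triplet is documented as constant-time. One common way to achieve that for the canonical code (GenBank transl_table 1) is a 64-entry lookup table indexed by the three base ranks. The sketch below uses NCBI's published amino-acid string for table 1, whose codons are enumerated in T, C, A, G order. The function name mirrors the documented one but lives in a hypothetical `sketch` namespace, takes plain `char`s rather than a bio::alphabet::nucleotide type, returns a `char` instead of aa27, and simply yields 'X' whenever a base is not one of A, C, G, T or U. The library's handling of ambiguous nucleotides may differ.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch
{
// Standard genetic code (NCBI transl_table = 1). Bases are ranked T=0, C=1,
// A=2, G=3 and codon (b1, b2, b3) sits at index 16 * b1 + 4 * b2 + b3.
inline constexpr std::string_view standard_code =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Rank of an upper-case nucleotide in T, C, A, G order; U is treated like T.
// Any other character yields -1.
constexpr int base_index(char n) noexcept
{
    switch (n)
    {
        case 'T': case 'U': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Translate one triplet into a one-letter amino acid code in constant time.
// Triplets containing an unrecognised character (e.g. 'N') yield 'X' here.
constexpr char translate_triplet(char n1, char n2, char n3) noexcept
{
    int const i1 = base_index(n1), i2 = base_index(n2), i3 = base_index(n3);
    if (i1 < 0 || i2 < 0 || i3 < 0)
        return 'X';
    return standard_code[static_cast<std::size_t>(16 * i1 + 4 * i2 + i3)];
}
} // namespace sketch
```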

Variable Documentation

◆ enable_aminoacid

template<typename t >
inline constexpr bool bio::alphabet::custom::enable_aminoacid = std::derived_from<t, aminoacid_empty_base>

A trait that indicates whether a type shall model bio::alphabet::aminoacid.

Template Parameters
t  Type of the argument.

This is an auxiliary trait that is checked by bio::alphabet::aminoacid to verify that a type is an amino acid. This trait should never be read from directly; use bio::alphabet::aminoacid instead. However, user-defined alphabets that want to model bio::alphabet::aminoacid need to either specialise it or inherit from bio::alphabet::aminoacid_empty_base.

Customisation point

This is a customisation point (see Customisation). To change the default behaviour for your own alphabet, do one of the following:

  1. Inherit from bio::alphabet::aminoacid_empty_base; or
  2. Specialise this trait for your type.

Note that the concept check removes cvref-qualification from the type before evaluating this trait, so you only need to specialise it for t and not for t &, t const & et cetera.

Example

```cpp
#include <bio/alphabet/aminoacid/concept.hpp> // provides bio::alphabet::aminoacid

namespace your_namespace
{
// your own aminoacid definition, opting in via the empty base class
struct your_aa : bio::alphabet::aminoacid_empty_base
{
    //...
};
} // namespace your_namespace

/***** OR *****/

namespace your_namespace2
{
// your own aminoacid definition, opting in via a trait specialisation
struct your_aa
{
    //...
};
} // namespace your_namespace2

template <>
inline constexpr bool bio::alphabet::custom::enable_aminoacid<your_namespace2::your_aa> = true;
```