Provides the amino acid alphabets and functionality for translation from nucleotides.


Classes
class	bio::alphabet::aa10li
	The reduced Li amino acid alphabet.

class	bio::alphabet::aa10murphy
	The reduced Murphy amino acid alphabet.

class	bio::alphabet::aa20
	The canonical amino acid alphabet.

class	bio::alphabet::aa27
	The twenty-seven letter amino acid alphabet.

class	bio::alphabet::aminoacid_base< derived_type, size >
	A CRTP base class that refines bio::alphabet::base and is used by the amino acid alphabets.

struct	bio::alphabet::aminoacid_empty_base
	An empty base class that types can inherit from in order to model bio::alphabet::aminoacid.

Concepts
concept	bio::alphabet::aminoacid
	A concept that indicates whether an alphabet represents amino acids.

Enumerations
enum struct	bio::alphabet::genetic_code : uint8_t { CANONICAL = 1 }
	Genetic codes used for translation of nucleotides into amino acids.

enum class	bio::alphabet::translation_frames : uint8_t { FWD_FRAME_0 = 1, FWD_FRAME_1 = 1 << 1, FWD_FRAME_2 = 1 << 2, REV_FRAME_0 = 1 << 3, REV_FRAME_1 = 1 << 4, REV_FRAME_2 = 1 << 5, FWD_REV_0 = FWD_FRAME_0 | REV_FRAME_0, FWD_REV_1 = FWD_FRAME_1 | REV_FRAME_1, FWD_REV_2 = FWD_FRAME_2 | REV_FRAME_2, FWD = FWD_FRAME_0 | FWD_FRAME_1 | FWD_FRAME_2, REV = REV_FRAME_0 | REV_FRAME_1 | REV_FRAME_2, SIX_FRAME = FWD | REV }
	Specialisation values for single and multiple translation frames.

Functions
template<genetic_code gc = genetic_code::CANONICAL, nucleotide nucl_type>
constexpr aa27	bio::alphabet::translate_triplet (nucl_type const &n1, nucl_type const &n2, nucl_type const &n3) noexcept
	Translate one nucleotide triplet into a single amino acid (single nucleotide interface).

Variables
template<typename t >
constexpr bool	bio::alphabet::custom::enable_aminoacid = std::derived_from<t, aminoacid_empty_base>
	A trait that indicates whether a type shall model bio::alphabet::aminoacid.

Detailed Description


Introduction

Amino acid sequences are an important part of bioinformatic data processing and are used by many applications. While it is possible to represent them in a regular std::string, specialised data structures make sense in most cases. This sub-module offers the 27-letter amino acid alphabet as well as three reduced versions, all of which can be used with regular containers and ranges. The 27-letter amino acid alphabet contains the 20 canonical amino acids, 2 additional proteinogenic amino acids (Pyrrolysine and Selenocysteine) and a termination letter (*). Additionally, 4 wildcard letters (B, Z, J and X) allow more generic usage, for example in the case of ambiguous amino acids (e.g. J, which means either Isoleucine or Leucine). See also https://en.wikipedia.org/wiki/Amino_acid for more information about the amino acid alphabet.
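To make the wildcard semantics concrete, the following sketch checks whether a one-letter code may stand for a given residue, using the ambiguity codes from the conversion table (B, Z, J, X). The function `may_denote` is hypothetical and not part of bio::alphabet; treating X as matching every residue except the stop letter is an assumption made for illustration.

```cpp
// Hypothetical helper, not part of bio::alphabet: returns true if the
// one-letter code `code` may stand for the residue `residue`.
constexpr bool may_denote(char code, char residue)
{
    switch (code)
    {
        case 'B': return residue == 'D' || residue == 'N'; // Asx
        case 'Z': return residue == 'E' || residue == 'Q'; // Glx
        case 'J': return residue == 'I' || residue == 'L'; // Xle
        case 'X': return residue != '*';  // assumption: any residue, but not stop
        default:  return code == residue; // non-wildcards denote only themselves
    }
}

static_assert(may_denote('J', 'L'));
static_assert(!may_denote('J', 'V'));
static_assert(may_denote('A', 'A'));
```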

Conversions

Amino acid name	Three letter code	One letter code	Remapped in bio::alphabet::aa20	Remapped in bio::alphabet::aa10murphy	Remapped in bio::alphabet::aa10li
Alanine	Ala	A	A	A	A
Arginine	Arg	R	R	K	K
Asparagine	Asn	N	N	B	H
Aspartic acid	Asp	D	D	B	B
Cysteine	Cys	C	C	C	C
Tyrosine	Tyr	Y	Y	F	F
Glutamic acid	Glu	E	E	B	B
Glutamine	Gln	Q	Q	B	B
Glycine	Gly	G	G	G	G
Histidine	His	H	H	H	H
Isoleucine	Ile	I	I	I	I
Leucine	Leu	L	L	I	J
Lysine	Lys	K	K	K	K
Methionine	Met	M	M	I	J
Phenylalanine	Phe	F	F	F	F
Proline	Pro	P	P	P	P
Serine	Ser	S	S	S	A
Threonine	Thr	T	T	S	A
Tryptophan	Trp	W	W	F	F
Valine	Val	V	V	I	I
Selenocysteine	Sec	U	C	C	C
Pyrrolysine	Pyl	O	K	K	K
Asparagine or aspartic acid	Asx	B	D	B	B
Glutamine or glutamic acid	Glx	Z	E	B	B
Leucine or Isoleucine	Xle	J	L	I	J
Unknown	Xaa	X	S	S	A
Stop Codon	N/A	*	W	F	F
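The remapping columns in the table above are many-to-one functions on the one-letter codes. As an illustration, the following sketch encodes the bio::alphabet::aa10murphy column as a lookup on `char`. The function name `to_murphy10` is hypothetical; the library performs this conversion through its alphabet types rather than through a function like this, and the handling of characters outside the table is an assumption.

```cpp
// Hypothetical helper (not part of bio::alphabet): maps an aa27 one-letter
// code to its letter in the aa10murphy column of the conversion table.
constexpr char to_murphy10(char aa)
{
    switch (aa)
    {
        case 'A': return 'A';
        case 'C': case 'U': return 'C';                              // Cys, Sec
        case 'G': return 'G';
        case 'H': return 'H';
        case 'P': return 'P';
        case 'K': case 'R': case 'O': return 'K';                    // Lys, Arg, Pyl
        case 'S': case 'T': case 'X': return 'S';                    // Ser, Thr, unknown
        case 'I': case 'L': case 'M': case 'V': case 'J': return 'I';
        case 'F': case 'W': case 'Y': case '*': return 'F';
        case 'B': case 'D': case 'E': case 'N': case 'Q': case 'Z': return 'B';
        default:  return 'S'; // assumption: unlisted input is treated like X
    }
}

static_assert(to_murphy10('L') == 'I'); // Leucine -> I
static_assert(to_murphy10('Y') == 'F'); // Tyrosine -> F
```

All 27 letters of the table appear in exactly one case label, so every aa27 letter has a single image.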

All amino acid alphabets provide static value members (like an enum) for all amino acids in the form of the one-letter representation. As shown above, alphabets with fewer than 27 letters represent several amino acids by the same letter.
In most cases, bio::alphabet::aa27 is the recommended choice, because bio::alphabet::aa20 provides no benefit in space consumption (both need 5 bits). Use bio::alphabet::aa20 only when you need to interface with other software or formats that only support the canonical set.
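The 5-bit figure follows from the alphabet sizes: the number of bits needed for a letter's rank is the smallest b with 2^b ≥ size. The sketch below computes this with a hypothetical helper `bits_needed`. It gives the minimum rank width (relevant, for example, for bit-compressed storage); an individual alphabet object may still occupy a whole byte.

```cpp
#include <cstddef>

// Hypothetical helper: smallest b such that 2^b >= n, i.e. the number of
// bits needed to store a rank in [0, n). Expects n >= 1.
constexpr unsigned bits_needed(std::size_t n)
{
    unsigned b = 0;
    while ((std::size_t{1} << b) < n)
        ++b;
    return b;
}

static_assert(bits_needed(27) == 5); // aa27
static_assert(bits_needed(20) == 5); // aa20: no saving over aa27
static_assert(bits_needed(10) == 4); // aa10murphy, aa10li
```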

Enumeration Type Documentation

◆ genetic_code

enum struct bio::alphabet::genetic_code : uint8_t

strong

Genetic codes used for translation of nucleotides into amino acids.

The numeric values of the enumerators correspond to the GenBank transl_table values (see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi).

◆ translation_frames

enum class bio::alphabet::translation_frames : uint8_t

strong

Specialisation values for single and multiple translation frames.

Enumerator
FWD_FRAME_0	The first forward frame starting at position 0.
FWD_FRAME_1	The second forward frame starting at position 1.
FWD_FRAME_2	The third forward frame starting at position 2.
REV_FRAME_0	The first reverse frame starting at position 0.
REV_FRAME_1	The second reverse frame starting at position 1.
REV_FRAME_2	The third reverse frame starting at position 2.
FWD_REV_0	The first forward and first reverse frame.
FWD_REV_1	The second forward and second reverse frame.
FWD_REV_2	The third forward and third reverse frame.
FWD	All forward frames.
REV	All reverse frames.
SIX_FRAME	All frames.
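Each single-frame enumerator is a distinct bit, so combined selections such as FWD, REV and SIX_FRAME are bitwise ORs of single frames, and testing whether a frame is selected is a bitwise AND. The following sketch mirrors the documented values in a hypothetical `demo` namespace; the `operator|` and `contains` helpers are defined locally for illustration and are not claimed to be part of bio::alphabet.

```cpp
#include <cstdint>
#include <type_traits>

namespace demo
{
// Local copy of the documented enumerator values (not the library type).
enum class translation_frames : std::uint8_t
{
    FWD_FRAME_0 = 1,
    FWD_FRAME_1 = 1 << 1,
    FWD_FRAME_2 = 1 << 2,
    REV_FRAME_0 = 1 << 3,
    REV_FRAME_1 = 1 << 4,
    REV_FRAME_2 = 1 << 5,
    FWD_REV_0 = FWD_FRAME_0 | REV_FRAME_0,
    FWD_REV_1 = FWD_FRAME_1 | REV_FRAME_1,
    FWD_REV_2 = FWD_FRAME_2 | REV_FRAME_2,
    FWD = FWD_FRAME_0 | FWD_FRAME_1 | FWD_FRAME_2,
    REV = REV_FRAME_0 | REV_FRAME_1 | REV_FRAME_2,
    SIX_FRAME = FWD | REV
};

// Bitwise OR for combining frame selections; defined locally for this sketch.
constexpr translation_frames operator|(translation_frames lhs, translation_frames rhs)
{
    using u = std::underlying_type_t<translation_frames>;
    return static_cast<translation_frames>(static_cast<u>(lhs) | static_cast<u>(rhs));
}

// True if every frame set in `wanted` is also set in `selection`.
constexpr bool contains(translation_frames selection, translation_frames wanted)
{
    using u = std::underlying_type_t<translation_frames>;
    return (static_cast<u>(selection) & static_cast<u>(wanted)) == static_cast<u>(wanted);
}
} // namespace demo

using demo::translation_frames;

static_assert((translation_frames::FWD | translation_frames::REV) == translation_frames::SIX_FRAME);
static_assert(contains(translation_frames::SIX_FRAME, translation_frames::REV_FRAME_1));
static_assert(!contains(translation_frames::FWD, translation_frames::REV_FRAME_0));
```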

Function Documentation

◆ translate_triplet()

template<genetic_code gc = genetic_code::CANONICAL, nucleotide nucl_type>

constexpr aa27 bio::alphabet::translate_triplet(nucl_type const & n1, nucl_type const & n2, nucl_type const & n3)

constexpr noexcept

Translate one nucleotide triplet into a single amino acid (single nucleotide interface).

Template Parameters

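gc	The genetic code used for translation; defaults to bio::alphabet::genetic_code::CANONICAL.
nucl_type	The type of the input nucleotides; must model bio::alphabet::nucleotide.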

Parameters

[in]	n1	First nucleotide in triplet.
[in]	n2	Second nucleotide in triplet.
[in]	n3	Third nucleotide in triplet.

Translates three single nucleotides into one amino acid according to the given genetic code gc.

Complexity

Constant.

Exceptions

No-throw guarantee.
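The table lookup that such a translation performs can be sketched as follows. This standalone sketch is not the library's implementation: it works on `char` nucleotides instead of bio::alphabet nucleotide types, hard-codes only the standard code (GenBank transl_table 1, i.e. the value of genetic_code::CANONICAL), and the names `standard_code`, `tcag_rank` and `translate_codon` are hypothetical. Returning 'X' for anything other than A, C, G, T or U is an assumption; the library's handling of ambiguous nucleotides may differ.

```cpp
#include <string_view>

// Standard genetic code (transl_table 1): 64 amino acids for all codons,
// ordered by first, second and third base, each in the order T, C, A, G.
inline constexpr std::string_view standard_code =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Rank of a nucleotide in T, C, A, G order; U is treated like T, -1 if unknown.
constexpr int tcag_rank(char n)
{
    switch (n)
    {
        case 'T': case 'U': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Translates one codon; returns 'X' if any base is not A, C, G, T or U (assumption).
constexpr char translate_codon(char n1, char n2, char n3)
{
    int const r1 = tcag_rank(n1), r2 = tcag_rank(n2), r3 = tcag_rank(n3);
    if (r1 < 0 || r2 < 0 || r3 < 0)
        return 'X';
    return standard_code[16 * r1 + 4 * r2 + r3];
}

static_assert(translate_codon('A', 'T', 'G') == 'M'); // start codon
static_assert(translate_codon('T', 'A', 'A') == '*'); // stop codon
```

Since the work is three rank lookups and one indexed read, the cost is constant, in line with the documented complexity.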

Variable Documentation

◆ enable_aminoacid

template<typename t >

constexpr bool bio::alphabet::custom::enable_aminoacid = std::derived_from<t, aminoacid_empty_base>

inline constexpr

A trait that indicates whether a type shall model bio::alphabet::aminoacid.

Template Parameters

t	Type of the argument.

This is an auxiliary trait that is checked by bio::alphabet::aminoacid to verify that a type is an amino acid. This trait should never be read from; use bio::alphabet::aminoacid instead. However, user-defined alphabets that want to model bio::alphabet::aminoacid need to either specialise it or inherit from bio::alphabet::aminoacid_empty_base.

Customisation point

This is a customisation point (see Customisation). To change the default behaviour for your own alphabet, do one of the following:

Inherit from bio::alphabet::aminoacid_empty_base; or
Specialise this trait for your type.

Note that the concept check removes cv- and reference qualifiers from the type before evaluating this trait, so you only need to specialise it for t and not for t &, t const & et cetera.

Example

#include <bio/alphabet/aminoacid/concept.hpp>
 
namespace your_namespace
{
 
// your own aminoacid definition
struct your_aa : bio::alphabet::aminoacid_empty_base
{
    //...
};
 
} // namespace your_namespace
 
/***** OR *****/
 
namespace your_namespace2
{
 
// your own aminoacid definition
struct your_aa
{
    //...
};
 
} // namespace your_namespace2
 
template <>
inline constexpr bool bio::alphabet::custom::enable_aminoacid<your_namespace2::your_aa> = true;
 
