state_freqs Module

This module computes vectors of stationary state frequencies.

class state_freqs.StateFrequencies(by, **kwargs)

Bases: object

Parent class for stationary (state, equilibrium, etc.) frequency calculations.
Child classes include the following:
  1. EqualFrequencies (default)
    • Sets frequencies as equal (i.e. 1/4 for all nucleotides if by=’nucleotide’, and so on.)
  2. RandomFrequencies
    • Computes (semi-)random frequency values for a given alphabet.
  3. CustomFrequencies
    • Computes frequencies based on a user-provided dictionary of frequencies.
  4. ReadFrequencies
    • Computes frequencies from a sequence file. Includes an option to use only specific columns of the file, which requires that the file be an alignment.

A single positional argument is required for all child classes. This argument can take one of three values, “nucleotide”, “amino_acid”, or “codon”, and indicates the alphabet in which frequencies are computed. These need not be the frequencies you ultimately want returned. For example, it is possible to compute stationary frequencies in amino-acid space (via this argument) but ultimately return codon frequencies (using the argument “type” in the .compute_frequencies() method, described below).
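
The conversion between alphabets described above can be sketched in plain Python. This is a minimal illustration, not the module's own implementation: the helper names amino_to_codon_freqs and codon_to_amino_freqs are hypothetical, the standard genetic code is assumed, and the equal split among synonymous codons follows the note in the CustomFrequencies examples.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Sense codons only (stop codons, marked "*", are excluded), in alphabetical order.
SENSE_CODONS = sorted(c for c, aa in GENETIC_CODE.items() if aa != "*")


def amino_to_codon_freqs(aa_freqs):
    """Hypothetical helper: split each amino-acid frequency equally among its codons."""
    n_synonymous = {}
    for codon in SENSE_CODONS:
        aa = GENETIC_CODE[codon]
        n_synonymous[aa] = n_synonymous.get(aa, 0) + 1
    return {
        codon: aa_freqs.get(GENETIC_CODE[codon], 0.0) / n_synonymous[GENETIC_CODE[codon]]
        for codon in SENSE_CODONS
    }


def codon_to_amino_freqs(codon_freqs):
    """Hypothetical helper: sum codon frequencies within each amino acid."""
    aa_freqs = {}
    for codon, freq in codon_freqs.items():
        aa = GENETIC_CODE[codon]
        aa_freqs[aa] = aa_freqs.get(aa, 0.0) + freq
    return aa_freqs


# Amino-acid frequencies in, codon frequencies out.
codon_freqs = amino_to_codon_freqs({"F": 0.4, "W": 0.1, "D": 0.2, "E": 0.3})
```

Here phenylalanine's frequency is shared between its two codons (TTT and TTC), while tryptophan's single codon (TGG) keeps the full amino-acid frequency. Summing the codon values back with codon_to_amino_freqs recovers the original amino-acid frequencies.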

compute_frequencies(**kwargs)

Calculate and return a vector of state frequencies. At this stage, the StateFrequencies object must already have been initialized with the positional argument by (“nucleotide”, “amino_acid”, or “codon”).

Optional keyword arguments include,

  1. type ( = “nucleotide”, “amino_acid”, or “codon”) represents the type of final frequencies to return. If not specified, the alphabet of returned frequencies will be that specified with the by keyword.
  2. savefile is a file name to which final frequencies may be saved. Output frequencies will be ordered alphabetically, i.e. A, C, G, T for nucleotides; A, C, D, E, etc. for amino acids; and AAA, AAC, AAG, AAT, ACA, etc. for codons.
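
The alphabetical ordering of the returned vector can be illustrated with a short sketch. The names below (NUCLEOTIDES, CODONS, as_vector) are illustrative assumptions rather than part of the module; the three stop codons of the standard genetic code are assumed to be excluded, which matches the 1/61 codon frequency quoted for EqualFrequencies.

```python
from itertools import product

# Alphabets in the order used for output vectors (alphabetical).
NUCLEOTIDES = sorted("ACGT")
AMINO_ACIDS = sorted("ACDEFGHIKLMNPQRSTVWY")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
CODONS = sorted(
    "".join(c) for c in product("ACGT", repeat=3) if "".join(c) not in STOP_CODONS
)


def as_vector(freq_dict, alphabet):
    """Hypothetical helper: arrange a state -> frequency mapping in alphabet order."""
    return [freq_dict.get(state, 0.0) for state in alphabet]


# The dictionary order does not matter; the vector is always ordered A, C, G, T.
vector = as_vector({"T": 0.05, "A": 0.1, "C": 0.45, "G": 0.4}, NUCLEOTIDES)
```

With this ordering, position 0 of a nucleotide vector is always A and position 3 is always T, regardless of how the frequencies were specified.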
class state_freqs.EqualFrequencies(by, **kwargs)

Bases: state_freqs.StateFrequencies

This class may be used to compute equal state frequencies (amino = 1/20, codon = 1/61, nucleotide = 1/4).

Required arguments include,

  1. by. See parent class StateFrequencies for details.

Optional arguments include,

  1. restrict, a list (in which each element is a string) specifying which states should have non-zero frequencies. Default: all.
Examples:
>>> # Return 1/20 amino acid frequencies in the variable `frequencies`
>>> f = EqualFrequencies("amino_acid")
>>> frequencies = f.compute_frequencies()

>>> # Compute equal codon frequencies and convert to amino-acid space. `frequencies` will contain amino-acid frequencies.
>>> f = EqualFrequencies("codon")
>>> frequencies = f.compute_frequencies(type = "amino_acid")

>>> # Compute equal amino acid frequencies, but allowing only certain amino acids to have non-zero frequencies
>>> f = EqualFrequencies("amino_acid", restrict = ["A", "G", "P", "T", "W"])
>>> frequencies = f.compute_frequencies()
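
As a rough sketch of what the restrict option implies, the function below assigns an equal share to each permitted state and zero to every other state. The function name equal_frequencies is hypothetical, and the equal split among the restricted states is an assumption based on the description above, not a statement about the module's internals.

```python
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # alphabetical order


def equal_frequencies(alphabet, restrict=None):
    """Hypothetical helper: equal frequencies over the allowed states, zero elsewhere."""
    allowed = set(alphabet) if restrict is None else set(restrict)
    unknown = allowed - set(alphabet)
    if unknown:
        raise ValueError("States not in alphabet: %s" % sorted(unknown))
    share = 1.0 / len(allowed)
    return [share if state in allowed else 0.0 for state in alphabet]


all_equal = equal_frequencies(AMINO_ACIDS)
restricted = equal_frequencies(AMINO_ACIDS, restrict=["A", "G", "P", "T", "W"])
```

In the restricted case each of the five listed amino acids receives 1/5 and the remaining fifteen receive 0, so the vector still sums to 1.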
class state_freqs.RandomFrequencies(by, **kwargs)

Bases: state_freqs.StateFrequencies

This class may be used to compute “semi-random” state frequencies. The resulting frequency distributions are not truly random, but are instead virtually flat distributions with some noise.

Required arguments include,

  1. by. See parent class StateFrequencies for details.

Optional arguments include,

  1. restrict, a list (in which each element is a string) specifying which states should have non-zero frequencies. Default: all.
Examples:
>>> # Return random amino acid frequencies in `frequencies` variable
>>> f = RandomFrequencies("amino_acid")
>>> frequencies = f.compute_frequencies()


>>> # Compute random amino acid frequencies, but allowing only certain amino acids to have non-zero frequencies
>>> f = RandomFrequencies("amino_acid", restrict = ["A", "G", "P", "T", "W"])
>>> frequencies = f.compute_frequencies()
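
One way to picture a "virtually flat distribution with some noise" is to perturb equal frequencies slightly and renormalize. The sketch below is only an assumption about the general idea: the function name noisy_flat_frequencies, the uniform noise model, and the noise and seed parameters are all hypothetical and are not taken from the module.

```python
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")


def noisy_flat_frequencies(alphabet, restrict=None, noise=0.1, seed=None):
    """Hypothetical helper: flat frequencies with small multiplicative noise."""
    rng = random.Random(seed)
    allowed = set(alphabet) if restrict is None else set(restrict)
    flat = 1.0 / len(allowed)
    # Perturb each allowed state by up to +/- `noise` (as a fraction of the flat value).
    raw = [
        flat * (1.0 + rng.uniform(-noise, noise)) if state in allowed else 0.0
        for state in alphabet
    ]
    total = sum(raw)
    # Renormalize so the frequencies sum to 1.
    return [value / total for value in raw]


frequencies = noisy_flat_frequencies(AMINO_ACIDS, seed=1)
```

Every value stays close to 1/20, but no two values need be identical, which is the sense in which the result is semi-random rather than truly random.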
class state_freqs.CustomFrequencies(by, **kwargs)

Bases: state_freqs.StateFrequencies

This class may be used to compute frequencies directly from a user-provided python dictionary of frequencies.

Required arguments include,

  1. by. See parent class StateFrequencies for details.
  2. freq_dict, a dictionary of frequencies, in which keys are states (e.g. a codon key would be ‘ATC’, an amino acid key would be ‘W’, and a nucleotide key would be ‘T’), and values are float frequencies which sum to 1. Note that the keys in this dictionary must correspond to the by argument provided. Any states not included in this dictionary are given a frequency of 0. Hence, the dictionary values MUST sum to 1.
Examples:
>>> # custom amino acid frequencies (values must sum to 1)
>>> f = CustomFrequencies("amino_acid", freq_dict = {'A':0.4, 'C':0.1, 'D':0.2, 'E':0.3})
>>> frequencies = f.compute_frequencies()

>>> # use amino-acid information to get custom codon frequencies (note: synonymous codons are assigned equal frequencies!)
>>> f = CustomFrequencies("amino_acid", freq_dict = {'F':0.4, 'W':0.1, 'D':0.2, 'E':0.3})
>>> frequencies = f.compute_frequencies(type = "codon")

>>> # custom nucleotide frequencies with lots of GC bias
>>> f = CustomFrequencies("nucleotide", freq_dict = {'A':0.1, 'C':0.45, 'T':0.05, 'G': 0.4})
>>> frequencies = f.compute_frequencies()
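
The requirements on freq_dict can be checked before use with a few lines of Python. This is a sketch under stated assumptions: the function name custom_frequencies and the numerical tolerance are hypothetical, and the module's own validation may differ.

```python
import math

NUCLEOTIDES = ["A", "C", "G", "T"]


def custom_frequencies(alphabet, freq_dict, tol=1e-8):
    """Hypothetical helper: validate a frequency dictionary and expand it to a vector."""
    unknown = set(freq_dict) - set(alphabet)
    if unknown:
        raise ValueError("Keys not in alphabet: %s" % sorted(unknown))
    if not math.isclose(sum(freq_dict.values()), 1.0, abs_tol=tol):
        raise ValueError("Frequencies must sum to 1.")
    # States missing from the dictionary receive a frequency of 0.
    return [freq_dict.get(state, 0.0) for state in alphabet]


gc_biased = custom_frequencies(NUCLEOTIDES, {"A": 0.1, "C": 0.45, "T": 0.05, "G": 0.4})
```

A dictionary whose values sum to 1.1, for instance, is rejected with a ValueError in this sketch instead of being silently rescaled.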
class state_freqs.ReadFrequencies(by, **kwargs)

Bases: state_freqs.StateFrequencies

This class may be used to compute frequencies directly from a specified sequence file. Frequencies may be computed globally (using entire file), or based on specific columns (i.e. site-specific frequencies), provided the file contains a sequence alignment.

Required positional arguments include,
  1. by. See parent class StateFrequencies for details.
Required keyword arguments include,
  1. file is the file containing sequences from which frequencies will be computed. By default, this file is assumed to be in FASTA format, although you can specify a different format with the optional argument format.
Optional keyword arguments include,
  1. format is the sequence file format (case-insensitive). Sequence files are parsed using Biopython, so any format Biopython accepts is accepted here (e.g. fasta, phylip, phylip-relaxed, nexus, clustal...)
  2. columns is a list of integers giving the column(s) which should be considered in frequency calculations. This list should be indexed from 1. If this argument is not provided, all positions in the sequence file will be considered. Note that this argument can only be used with alignments!
Examples:
>>> # Compute amino acid frequencies globally from a sequence file
>>> f = ReadFrequencies("amino_acid", file = "my_sequence_file.fasta")
>>> frequencies = f.compute_frequencies()

>>> # Compute amino acid frequencies globally from a sequence file, and then convert to codon frequencies (note: synonymous codons are assigned the same frequency!)
>>> f = ReadFrequencies("amino_acid", file = "my_sequence_file.fasta")
>>> frequencies = f.compute_frequencies(type = "codon")

>>> # Compute nucleotide frequencies from a specific range of columns (1-10, inclusive) from a nucleotide alignment file 
>>> f = ReadFrequencies("nucleotide", file = "my_nucleotide_alignment.phy", format = "phylip", columns = range(1,11))
>>> frequencies = f.compute_frequencies()
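
The effect of the 1-indexed columns argument can be sketched on an in-memory alignment, without any file parsing. The function name column_frequencies is hypothetical, the sketch handles only single-character states (nucleotides or amino acids), and skipping gaps and other characters outside the alphabet is an assumption, not documented behavior.

```python
from collections import Counter


def column_frequencies(sequences, alphabet, columns=None):
    """Hypothetical helper: state frequencies over all positions or selected columns.

    `columns` is indexed from 1, as in the description above.
    """
    if columns is None:
        chars = "".join(sequences)
    else:
        chars = "".join(seq[col - 1] for seq in sequences for col in columns)
    # Assumption: gaps and any characters outside the alphabet are ignored.
    counts = Counter(c for c in chars.upper() if c in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("No characters from the alphabet were found.")
    return [counts[state] / total for state in alphabet]


alignment = ["ACGT-A", "ACGTTA", "AAGTTC"]
global_freqs = column_frequencies(alignment, "ACGT")
first_two = column_frequencies(alignment, "ACGT", columns=range(1, 3))
```

Selecting columns 1 and 2 here considers only the characters A, C, A, C, A, A, so the result differs from the global frequencies computed over the whole alignment.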
class state_freqs.EmpiricalModelFrequencies(model)

This class assigns state frequencies from a specified amino acid or codon empirical model (e.g. JTT, WAG, ECM...). The frequencies assigned are each model’s default frequencies (i.e. those given in the model’s original paper).

The currently supported models include,
  1. Amino acid: JTT, WAG, LG
  2. Codon: ECMrest, ECMunrest
Required positional arguments include,
  1. model is the empirical model of choice (case-insensitive). This argument should be specified as any of the following: JTT, WAG, LG, ECMrest, ECMunrest.
Examples:
>>> # Assign WAG frequencies
>>> f = EmpiricalModelFrequencies("WAG")
>>> frequencies = f.compute_frequencies()

>>> # Assign ECMrest frequencies (ECM "restricted" model, in which only single nucleotide changes occur instantaneously)
>>> f = EmpiricalModelFrequencies("ecmrest")
>>> frequencies = f.compute_frequencies()
compute_frequencies()

Function to return state frequencies. No arguments are needed.