model Module

Define evolutionary model objects.

class model.Model(model_type, parameters=None, **kwargs)

This class defines evolutionary model objects. All evolutionary models contain information about the substitution process (rate matrix) and information about rate heterogeneity. Note that, in cases of rate heterogeneity, non-dN/dS models use a single rate matrix and model heterogeneity using discrete scaling factors and associated probabilities. In contrast, dN/dS models implement rate heterogeneity using a set of matrices with distinct dN/dS values, each with an associated probability.
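
Conceptually, the two representations differ in what varies across rate categories. The sketch below illustrates only the non-dN/dS case, where every category reuses one matrix multiplied by its scaling factor. It is not Pyvolve's internal code, and the matrix, factors, and helper name `scale_matrix` are made up for illustration.

```python
# A toy 2-state rate matrix (rows sum to 0); values are arbitrary.
Q = [
    [-1.0, 1.0],
    [2.0, -2.0],
]

# Discrete scaling factors and their associated probabilities.
rate_factors = [0.5, 1.0, 4.0]
rate_probs = [0.25, 0.5, 0.25]


def scale_matrix(matrix, factor):
    """Multiply every entry of a rate matrix by one scalar factor."""
    return [[factor * entry for entry in row] for row in matrix]


# One effective matrix per rate category, all derived from the same Q.
category_matrices = [scale_matrix(Q, factor) for factor in rate_factors]

# Probability-weighted mean of the scaling factors.
mean_rate = sum(f * p for f, p in zip(rate_factors, rate_probs))
print(category_matrices[2], mean_rate)
```

A dN/dS model would instead hold a separate matrix per category, each built from its own dN/dS value.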

The Model class will construct an evolutionary model object which will be used to evolve sequence data. Instantiation requires a single positional argument (but a second one is recommended, read on!):

  1. model_type is the type of model (matrix) that is being used. These matrices are described explicitly in the matrix_builder module. Options include the following:

    model_type    Notes
    nucleotide    Arbitrary GTR
    JTT           Jones, Taylor, and Thornton 1994 (amino acids)
    WAG           Whelan and Goldman 2002 (amino acids)
    LG            Le and Gascuel 2008 (amino acids)
    MTMAM         Yang, Nielsen, and Hasegawa 1998 (amino acids)
    MTREV24       Adachi and Hasegawa 1996 (amino acids)
    DAYHOFF       Dayhoff, Schwartz, and Orcutt 1978 (amino acids)
    AB            Mirsky, Kazandjian, and Anisimova 2015 (amino acids)
    GY            Goldman and Yang 1994 (modified), Nielsen and Yang 1998
    MG            Muse and Gaut 1994
    codon         Defaults to GY-style model
    ECM           Kosiol et al. 2007
    mutsel        Halpern and Bruno 2008 (may also be used for nucleotides)

To use your own rate matrix (which you must create on your own), enter “custom” for the model_type argument, and provide the custom matrix (numpy array or list of lists) in the parameters dictionary with the key “matrix”. Please note that pyvolve stores nucleotides, amino acids, and codons in alphabetical order of their abbreviations:

  * Nucleotides: A, C, G, T
  * Amino acids: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
  * Codons: AAA, AAC, AAG, AAT, ACA, ... TTG, TTT [note that stop codons should be excluded]
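
The state ordering above can be reproduced with a few lines of standard-library Python, which is handy for checking that the rows and columns of a custom matrix line up with the intended states. This is a sketch rather than Pyvolve's own code: the variable names are illustrative, and the stop codons assume the standard genetic code.

```python
from itertools import product

# States in alphabetical order of their abbreviations.
NUCLEOTIDES = ["A", "C", "G", "T"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

# Assumption: standard genetic code, whose stop codons are TAA, TAG, and TGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# product() over an already-sorted alphabet yields codons in alphabetical order.
CODONS = [
    "".join(triplet)
    for triplet in product(NUCLEOTIDES, repeat=3)
    if "".join(triplet) not in STOP_CODONS
]

print(len(CODONS), CODONS[:5], CODONS[-2:])
```

A custom codon matrix built against this list would therefore be 61 x 61, with row and column i both referring to `CODONS[i]`.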

If you wish to evolve custom states (neither nucleotides, amino acids, nor codons), for instance to evolve characters, also include the key “code” in the parameters dictionary. The associated value should be a list of strings, e.g. [“0”, “1”, “2”], and the length of this list should equal the dimension of the square custom matrix provided. Note that this argument is not required if you wish to evolve nucleotides, amino acids, and/or codons.

Please be careful here - Pyvolve takes your matrix (mostly) at face value (provided it has proper dimensions and rows sum to 0). In particular, the matrix will not be scaled!
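
Because Pyvolve performs only light checks, it can help to validate a custom matrix before handing it over. The sketch below builds a three-state character matrix, fills each diagonal so the row sums to 0, and checks the result against the “code” list. The helper names (`fill_diagonal`, `check_custom_matrix`, `build_custom_model`) are hypothetical and not part of Pyvolve; only the “custom” model type and the “matrix” and “code” keys come from the description above. The rate values are arbitrary.

```python
import math


def fill_diagonal(off_diagonal):
    """Return a copy of a square matrix whose diagonal makes each row sum to 0."""
    matrix = [list(row) for row in off_diagonal]
    for i, row in enumerate(matrix):
        row[i] = -sum(rate for j, rate in enumerate(row) if j != i)
    return matrix


def check_custom_matrix(matrix, code):
    """Raise ValueError unless the matrix is square, matches code, and rows sum to 0."""
    size = len(matrix)
    if any(len(row) != size for row in matrix):
        raise ValueError("matrix must be square")
    if len(code) != size:
        raise ValueError("code must list one state per matrix row")
    for row in matrix:
        if not math.isclose(sum(row), 0.0, abs_tol=1e-8):
            raise ValueError("each row must sum to 0")


def build_custom_model(matrix, code):
    """Hypothetical wrapper; needs Pyvolve installed. The matrix is used unscaled."""
    import pyvolve
    return pyvolve.Model("custom", {"matrix": matrix, "code": code})


code = ["0", "1", "2"]
rates = [
    [0.0, 0.5, 0.25],
    [0.5, 0.0, 0.25],
    [1.0, 1.0, 0.0],
]
matrix = fill_diagonal(rates)
check_custom_matrix(matrix, code)  # passes silently when the matrix is valid
print(matrix[0])
```

Since the matrix is not rescaled, any normalization of the overall substitution rate has to be applied to `rates` before the model is built.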

A second positional argument, parameters, may additionally be specified. This argument should be a dictionary of parameters pertaining to the substitution process. Each evolutionary model has its own parameters. If this argument is not provided, default parameters for your selected model will be assigned. Note, however, that this argument is required for mechanistic codon (dN/dS) models, as the dN/dS rate ratio must be assigned!

Optional keyword arguments include:
  1. name, the name for a Model object. Names are not needed in cases of branch homogeneity, but when there is branch heterogeneity, names are required to map the model to the model flags provided in the phylogeny.
  2. rate_factors, for specifying rate heterogeneity in nucleotide or amino acid models. This argument should be a list/numpy array of scalar factors for rate heterogeneity. Default: rate homogeneity.
  3. rate_probs, for specifying rate heterogeneity probabilities in nucleotide, amino acid, or codon models. This argument should be a list/numpy array of probabilities (which sum to 1!) for each rate category. Default: equal.
  4. alpha, for specifying rate heterogeneity in nucleotide or amino acid models if gamma-distributed heterogeneity is desired. This is the alpha shape parameter used to draw rates from a discrete gamma distribution.
  5. num_categories, for specifying the number of gamma categories to draw for rate heterogeneity in nucleotide or amino acid models. Should be used in conjunction with the “alpha” parameter. Default: 4.
  6. pinv, for specifying a proportion of invariant sites when gamma heterogeneity is used. When specifying custom rate heterogeneity, a proportion of invariant sites can be specified simply with a rate factor of 0.
  7. save_custom_frequencies, for specifying a file name in which to save the state frequencies from a custom matrix. Pyvolve automatically computes the proper frequencies and will save them to a file named “custom_matrix_frequencies.txt”, and you can use this argument to change the file name. Note that this argument is really only relevant for custom models.
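
To make the heterogeneity options concrete, the sketch below assembles keyword arguments for the two styles described above: custom categories via rate_factors and rate_probs, and gamma-distributed rates via alpha, num_categories, and pinv. The helper names are hypothetical and the numbers arbitrary. The Pyvolve call is isolated in `build_nucleotide_model` so that the validation logic runs without Pyvolve installed.

```python
import math


def custom_rate_kwargs(rate_factors, rate_probs=None):
    """Build kwargs for custom heterogeneity; probabilities default to equal."""
    if rate_probs is None:
        rate_probs = [1.0 / len(rate_factors)] * len(rate_factors)
    if len(rate_probs) != len(rate_factors):
        raise ValueError("need one probability per rate factor")
    if not math.isclose(sum(rate_probs), 1.0, abs_tol=1e-8):
        raise ValueError("rate_probs must sum to 1")
    return {"rate_factors": list(rate_factors), "rate_probs": list(rate_probs)}


def gamma_rate_kwargs(alpha, num_categories=4, pinv=None):
    """Build kwargs for discrete-gamma heterogeneity, optionally with pinv."""
    kwargs = {"alpha": alpha, "num_categories": num_categories}
    if pinv is not None:
        kwargs["pinv"] = pinv
    return kwargs


def build_nucleotide_model(**kwargs):
    """Hypothetical wrapper; needs Pyvolve installed. Uses default parameters."""
    import pyvolve
    return pyvolve.Model("nucleotide", **kwargs)


# A rate factor of 0 with probability 0.2 acts as 20% invariant sites.
custom = custom_rate_kwargs([0.0, 0.5, 2.0], [0.2, 0.4, 0.4])

# Gamma heterogeneity with the default 4 categories plus 10% invariant sites.
gamma = gamma_rate_kwargs(0.5, pinv=0.1)
print(custom, gamma)
```

With Pyvolve available, `build_nucleotide_model(**custom)` or `build_nucleotide_model(**gamma)` would pass these keyword arguments straight to the Model constructor.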
num_classes()

Return the number of rate classes associated with a given model.

assign_name(name)

Assign name to a Model instance. In cases of branch/temporal homogeneity, names are unneeded. However, in cases of branch heterogeneity, each model must be named. Names are used to map to model flags given in the phylogeny. NOTE that name can also be assigned as a keyword argument when initializing a Model object.
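
As a sketch of the naming workflow, the hypothetical helper below names each model after the flag it should map to. It relies only on assign_name as documented above; the flag strings `m1` and `m2` are made-up examples, and `build_models` needs Pyvolve installed.

```python
def name_models(models_by_flag):
    """Name each model after its phylogeny flag via assign_name()."""
    for flag, model in models_by_flag.items():
        model.assign_name(flag)
    return models_by_flag


def build_models():
    """Hypothetical setup: two nucleotide models with default parameters."""
    import pyvolve
    return name_models({
        "m1": pyvolve.Model("nucleotide"),
        "m2": pyvolve.Model("nucleotide"),
    })
```

The same result can be had at construction time by passing the name keyword argument, for example `Model("nucleotide", name="m1")`.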

is_hetcodon_model()

Return True if the model is a heterogeneous codon model and return False otherwise.

extract_mutation_rates()

Convenience function for returning the mutation rate dictionary to users.

extract_rate_matrix()

Convenience function for returning the rate matrix/matrices to users.

extract_state_freqs()

Convenience function for returning the stationary frequencies.

extract_parameters()

Convenience function for returning the params dictionary, which contains all model parameters used to construct the rate matrix (except for nucleotide/amino-acid rate heterogeneity).
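
The accessor methods above can be gathered into one summary for quick inspection. The helper below is a hypothetical convenience, not part of Pyvolve; it calls only the methods documented in this section and makes no assumption about the types they return. `summarize_default_nucleotide_model` needs Pyvolve installed.

```python
def summarize_model(model):
    """Collect commonly inspected values from a Model-like object."""
    return {
        "num_classes": model.num_classes(),
        "is_hetcodon": model.is_hetcodon_model(),
        "rate_matrix": model.extract_rate_matrix(),
        "state_freqs": model.extract_state_freqs(),
        "parameters": model.extract_parameters(),
    }


def summarize_default_nucleotide_model():
    """Needs Pyvolve installed; uses default nucleotide parameters."""
    import pyvolve
    return summarize_model(pyvolve.Model("nucleotide"))
```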