`extractor` Module¶

Parse (Extract!) JSON output from a standard HyPhy analysis.

class extractor.JSONFields¶

Bases: object

This class defines the strings of relevant JSON keys. Note that these strings correspond precisely to those in the HyPhy distribution. See file: TemplateBatchFiles/libv3/all-terms.bf in the terms.json namespace.

class extractor.AnalysisNames¶

Bases: object

This class defines the names of analyses which we can parse.

class extractor.Genetics¶

Bases: object

Class to define codes used. Primarily (only?) used to extract frequencies as dictionaries.

class extractor.Extractor(content)¶

Bases: object

This class parses JSON output and contains a variety of methods for pulling out various pieces of information.

Initialize an Extractor instance.

Required arguments:

content, The input content to parse. Two types of input may be provided here, EITHER:
- The path to a JSON file to parse, provided as a string
- A phyphy Analysis (i.e. BUSTED, SLAC, FEL, etc.) object which has been used to execute a HyPhy analysis through the phyphy interface

Examples:

>>> ### Define an Extractor instance with a JSON file
>>> e = Extractor("/path/to/json.json")

>>> ### Define an Extractor instance with an Analysis object 
>>> ### First, define and run an analysis (FEL, for example)
>>> myfel = FEL(data = "/path/to/data.fna")
>>> myfel.run_analysis()
>>> e = Extractor(myfel)

extract_number_sequences()¶

Return the number of sequences in the input dataset.

No arguments are required.

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_number_sequences()
10

extract_number_sites()¶

Return the number of sites in the input dataset. Note that for codon analyses this is the number of codon sites (i.e. nucleotide length/3).

No arguments are required.

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_number_sites()
187

extract_input_file()¶

Return the name of the input dataset, including its path. If the alignment and tree were provided separately, this returns the alignment file name.

No arguments are required.

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_input_file()
"/Users/sjspielman/evogenomics_hyphy/datasets/CD2.fna"

extract_partition_count()¶

Return the number of partitions in the analysis.

No arguments are required.

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_partition_count()
1

extract_input_tree(partition=None, original_names=False, node_labels=False)¶

Return the input Newick phylogeny, whose nodes have been labeled by HyPhy (if node labels were not present). For analyses with a single partition OR for a request for a specific partition’s tree, returns a string. For analyses with multiple partitions (and hence multiple trees), returns a dictionary of trees keyed by partition index.

Optional keyword arguments:

partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.
original_names, Boolean indicating whether the tree should be updated with the original names before returning. Default: False.

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_input_tree()
((((Pig:0.147969,Cow:0.21343)Node3:0.085099,Horse:0.165787,Cat:0.264806)Node2:0.058611,((RhMonkey:0.002015,Baboon:0.003108)Node9:0.022733,(Human:0.004349,Chimp:0.000799)Node12:0.011873)Node8:0.101856)Node1:0.340802,Rat:0.050958,Mouse:0.09795);

>>> ### Use original names
>>> e.extract_input_tree(original_names = True)
((((Pig~gy:0.147969,Cow:0.21343)Node3:0.085099,Horse:0.165787,Cat:0.264806)Node2:0.058611,((RhMonkey:0.002015,Baboon:0.003108)Node9:0.022733,(Human:0.004349,Chimp:0.000799)Node12:0.011873)Node8:0.101856)Node1:0.340802,Rat:0.050958,Mouse:0.09795);

>>> e = Extractor("/path/to/FEL_mulitpart.json") ## Define a FEL Extractor, from an analysis with multiple partitions, for example
>>> e.extract_input_tree() ## All partitions
{0: '((((AF231119:0.00599498,AF231117:0.00602763)Node3:0.00187262,(AF186242:0.00194569,AF186243:0.0059545)Node6:1e-10)Node2:0.00395465,(AF186241:0.00398948,(AF231116:1e-10,AF187824:0.00402724)Node11:0.00395692)Node9:0.00200337)Node1:0.00392717,AF082576:0.00193519,(((AF231118:0.0639035,AF234767:0.143569)Node17:0.000456671,(AF231115:0.00201331,AF231114:0.00592754)Node20:0.00592206)Node16:1e-10,AF231113:0.00395832)Node15:1e-10);', 1: '(((((AF231119:0.00307476,AF231115:1e-10)Node4:1e-10,((AF082576:0.00309362,AF231113:1e-10)Node8:0.0031872,AF231114:0.013292)Node7:0.0030793)Node3:0.00310106,(AF231117:0.00396728,AF231118:0.0665375)Node12:0.00249394)Node2:0.00637034,(AF186242:1e-10,(AF186243:1e-10,AF234767:0.0278842)Node17:0.00311418)Node15:0.00307177)Node1:1e-10,(AF186241:0.00306598,AF231116:1e-10)Node20:1e-10,AF187824:0.00632863);', 2: '(AF231119:0.00208218,AF231117:1e-10,((AF082576:1e-10,AF231113:0.00433775)Node4:0.00208919,((((AF186242:0.00216055,AF186243:0.00437974)Node10:0.00214339,((AF186241:1e-10,AF187824:0.00215048)Node14:0.00214528,AF231116:1e-10)Node13:1e-10)Node9:0.0112142,(AF231118:0.0244917,AF234767:0.0835686)Node18:0.0280857)Node8:0.0021073,(AF231115:1e-10,AF231114:0.00868934)Node21:0.00639388)Node7:1e-10)Node3:1e-10);', 3: '((AF231119:0.000939531,AF082576:0.00182425)Node1:1e-10,(((AF231117:0.00499646,(AF231116:1e-10,(AF187824:0.00453171,AF231113:0.0180629)Node10:0.00923609)Node8:0.00581275)Node6:0.00383552,(AF231115:1e-10,AF231114:0.0100664)Node13:0.00401088)Node5:0.00102177,((AF186242:0.00171504,AF186243:0.00438135)Node17:0.00180763,AF186241:0.0044495)Node16:0.00408249)Node4:0.000197413,(AF231118:0.032062,AF234767:0.0409599)Node21:0.0228604);'}

>>> e.extract_input_tree(partition = 1) ## A single specified partition
(((((AF231119:0.00307476,AF231115:1e-10)Node4:1e-10,((AF082576:0.00309362,AF231113:1e-10)Node8:0.0031872,AF231114:0.013292)Node7:0.0030793)Node3:0.00310106,(AF231117:0.00396728,AF231118:0.0665375)Node12:0.00249394)Node2:0.00637034,(AF186242:1e-10,(AF186243:1e-10,AF234767:0.0278842)Node17:0.00311418)Node15:0.00307177)Node1:1e-10,(AF186241:0.00306598,AF231116:1e-10)Node20:1e-10,AF187824:0.00632863);
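
Because the return type depends on the number of partitions (a string for one tree, a dictionary for several), downstream code may want to normalize it first. The following is a minimal sketch of one way to do that; `as_partition_dict` is a hypothetical helper written for illustration and is not part of phyphy.

```python
def as_partition_dict(result):
    """Normalize a tree result to a dict keyed by partition index.

    `result` is assumed to be either a single newick string or a dict of
    {partition_index: newick_string}, as described above.
    Hypothetical helper, not part of phyphy.
    """
    if isinstance(result, dict):
        return dict(result)
    # A lone string is treated as partition 0 (partitions are ordered from 0).
    return {0: result}


# Works the same way for single- and multi-partition results.
single = as_partition_dict("(A:0.1,B:0.2);")
multi = as_partition_dict({0: "(A:0.1,B:0.2);", 1: "(A:0.3,B:0.4);"})
```

With this in place, a loop such as `for index, tree in as_partition_dict(e.extract_input_tree()).items()` handles both cases uniformly.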

reveal_fitted_models()¶

Return a list of all model names in the fits JSON field.

No arguments are required.

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.reveal_fitted_models()
['Nucleotide GTR', 'Global MG94xREV']

>>> e = Extractor("/path/to/aBSREL.json") ## Define an aBSREL Extractor, for example
>>> e.reveal_fitted_models()
['Nucleotide GTR', 'Full adaptive model', 'Baseline MG94xREV']     

extract_model_component(model_name, component)¶

Return a model component for a given model name found in the fits JSON field.

Required arguments:

model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()
component, the component of the model to return.

Recommended use: a variety of convenience methods wrap this function to extract individual components (note that not all analyses will have all of these components):

.extract_model_logl(model_name) returns the log likelihood of a given model fit
.extract_model_estimated_parameters(model_name) returns the number of estimated parameters in a given model fit
.extract_model_aicc(model_name) returns the small-sample AIC (AIC-c) for a given model fit
.extract_model_rate_distributions(model_name) returns rate distributions for a given model fit
.extract_model_frequencies(model_name) returns the equilibrium frequencies for the given model fit

See one of these other methods for example(s).

extract_model_logl(model_name)¶

Return the log likelihood (as a float) for a given model that appears in the fits field.

Required arguments:

model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_model_logl("Nucleotide GTR")
-3531.96378073

extract_model_estimated_parameters(model_name)¶

Return the number of estimated parameters (as an int) for a given model that appears in the fits field.

Required arguments:

model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_model_estimated_parameters("Nucleotide GTR")
24

extract_model_aicc(model_name)¶

Return AICc (as a float) for a given model that appears in the fits field.

Required arguments:

model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_model_aicc("Nucleotide GTR")
7112.57796796
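
For reference, the small-sample AIC can be recomputed from the log likelihood and the number of estimated parameters using the standard formula AICc = 2k - 2 logL + 2k(k+1)/(n - k - 1). The sketch below is not phyphy code, and the sample-size convention is an assumption: taking n as sites × sequences (187 × 10 for the example dataset above) happens to reproduce the value shown, but other analyses may use a different convention.

```python
def aicc(logl, k, n):
    """Small-sample corrected AIC.

    logl: log likelihood of the model fit
    k:    number of estimated parameters
    n:    sample size (ASSUMPTION: sites * sequences)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * logl
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


# Values taken from the FEL examples above: logL, parameter count,
# 187 sites and 10 sequences.
value = aicc(-3531.96378073, 24, 187 * 10)
```

Here `value` agrees with the `extract_model_aicc("Nucleotide GTR")` output above to within rounding.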

extract_model_rate_distributions(model_name)¶

Return rate distributions, as a reformatted dictionary, for a given model that appears in the fits field. NOTE: Currently assumes dS = 1 for all initial MG94xREV fits, as in the current HyPhy implementation (True in <=2.3.4).

Required arguments:

model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_model_rate_distributions("Nucleotide GTR")
{'AC': 0.5472216942647106, 'GT': 0.3027127947903878, 'AG': 1, 'CG': 0.4864956075134169, 'AT': 0.2645767737218761, 'CT': 1.017388348535757}

>>> e.extract_model_rate_distributions("Global MG94xREV")
{'test': {'proportion': 1.0, 'omega': 0.9860796476982517}}

extract_model_frequencies(model_name, as_dict=False)¶

Return a list of equilibrium frequencies (in alphabetical order) for a given model that appears in the field fits.

Required arguments:

model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()

Optional keyword arguments:

as_dict, Boolean to indicate if the frequencies should be returned as a dictionary. Default: False.

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_model_frequencies("Nucleotide GTR")
[0.3563279857397504, 0.1837789661319073, 0.2402852049910873, 0.2196078431372549]

>>> ### Return dictionary instead of list
>>> e.extract_model_frequencies("Nucleotide GTR", as_dict = True)
{'A': 0.3563279857397504, 'C': 0.1837789661319073, 'T': 0.2196078431372549, 'G': 0.2402852049910873}
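
Since the list form is in alphabetical order, the dictionary form can be thought of as pairing each frequency with the sorted alphabet. The following is a minimal sketch of that pairing for nucleotide models; `frequencies_as_dict` and `NUCLEOTIDES` are illustrative names, not phyphy's internals.

```python
NUCLEOTIDES = ["A", "C", "G", "T"]  # alphabetical order, as documented above


def frequencies_as_dict(frequencies, alphabet=NUCLEOTIDES):
    """Pair alphabetically ordered frequencies with their characters.

    Hypothetical helper for illustration only.
    """
    if len(frequencies) != len(alphabet):
        raise ValueError("expected one frequency per character")
    return dict(zip(alphabet, frequencies))


# The list returned in the example above.
freqs = [0.3563279857397504, 0.1837789661319073,
         0.2402852049910873, 0.2196078431372549]
freq_dict = frequencies_as_dict(freqs)
```

Note that the third list element corresponds to G and the fourth to T, which matches the dictionary output shown above even though that dictionary is printed in a different key order.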

extract_branch_sets(by_set=False)¶

Return branch set designations as a dictionary for all nodes. By default, this function will return the branch sets “as is” in the JSON field tested, where keys are nodes and values are the branch set to which the given node belongs. NOTE: Assumes that all partitions share the same branch sets.

Optional keyword arguments:

by_set, Boolean to indicate if the returned dictionary should use branch sets as keys, and values are a list of nodes in that branch set. Default: False.

Examples:

>>> e = Extractor("/path/to/BUSTED.json") ## Define a BUSTED Extractor, for example
>>> e.extract_branch_sets()
{'Node12': 'test', 'GOR': 'test', 'HUM': 'test', 'PON': 'test', 'MAC': 'test', 'MAR': 'test', 'BAB': 'test', 'GIB': 'test', 'BUS': 'test', 'Node3': 'test', 'Node2': 'test', 'Node5': 'test', 'Node4': 'test', 'PAN': 'test', 'Node6': 'test'}

>>> ### Return dictionary of lists per set instead of default
>>> e.extract_branch_sets(by_set = True)
{'test': ['Node12', 'HUM', 'PON', 'MAC', 'MAR', 'BAB', 'GIB', 'Node2', 'BUS', 'Node3', 'Node6', 'Node5', 'Node4', 'PAN', 'GOR']}
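
The by_set form is simply the default dictionary inverted: each branch set becomes a key, and its value collects the nodes assigned to it. A minimal sketch of that inversion, using an illustrative helper name that is not part of phyphy:

```python
from collections import defaultdict


def group_nodes_by_set(node_to_set):
    """Invert {node: branch_set} into {branch_set: [nodes]}.

    Hypothetical helper; node order within each list follows the
    iteration order of the input dictionary.
    """
    by_set = defaultdict(list)
    for node, branch_set in node_to_set.items():
        by_set[branch_set].append(node)
    return dict(by_set)


# A few nodes from the BUSTED example above.
grouped = group_nodes_by_set({"Node12": "test", "GOR": "test", "HUM": "test"})
```

The same helper handles analyses with more than one branch set, since each distinct value becomes its own key.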

reveal_branch_attributes()¶

Return a dictionary of all the attributes in the branch attributes field and their attribute type (e.g. node label or branch length).

Examples:

>>> e = Extractor("/path/to/BUSTED.json") ## Define a BUSTED Extractor, for example
>>> e.reveal_branch_attributes()
{'Nucleotide GTR': 'branch length', 'unconstrained': 'branch length', 'constrained': 'branch length', 'MG94xREV with separate rates for branch sets': 'branch length', 'original name': 'node label'}

extract_branch_attribute(attribute_name, partition=None)¶

Return dictionary of attributes for given attribute, where keys are nodes and values are attributes. If there are multiple partitions, default returns a dictionary with all partitions. If partition = [some integer], only the attribute for the given partition will be returned. NOTE: PARTITION STARTS FROM 0.

Importantly, the values for all returned dictionaries will be strings, except for the extraction of rate distributions.

Required positional arguments:

attribute_name, the name of the attribute to obtain. Attribute names available can be revealed with the method .reveal_branch_attributes().

Optional keyword arguments:

partition, Integer indicating which partition’s attribute to return if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.

Examples:

>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example
>>> e.extract_branch_attribute("Nucleotide GTR") ## branch lengths
{'Horse': '0.209139911487', 'Node12': '0.0178341148216', 'Cow': '0.248286674829', 'Chimp': '0.00181779097957', 'RhMonkey': '0.00377365885129', 'Pig': '0.187127383086', 'Node9': '0.0256769899145', 'Node8': '0.106120848179', 'Rat': '0.0666961080592', 'Node3': '0.0989071298032', 'Human': '0', 'Node1': '0.277289433172', 'Cat': '0.266103366998', 'Node2': '0.0661858336662', 'Mouse': '0.118170595693', 'Baboon': '0.0016809649281'}

>>> e = Extractor("/path/to/ABSREL.json") ## Define an ABSREL Extractor, for example
>>> e.extract_branch_attribute("Rate classes") ## Number of inferred rate classes per node
{'0557_7': '1', '0557_4': '1', 'Node29': '1', '0564_13': '1', 'Node25': '1', 'Node20': '1', 'Node23': '1', '0557_11': '1', '0557_12': '1', '0557_13': '1', '0564_22': '1', '0564_21': '1', '0564_15': '2', 'Node9': '1', '0564_1': '1', '0564_3': '2', 'Separator': '2', '0564_5': '1', '0564_6': '1', '0564_7': '1', '0564_9': '1', '0557_24': '1', 'Node7': '1', 'Node6': '1', '0557_9': '1', 'Node17': '1', 'Node16': '1', 'Node19': '1', 'Node32': '1', 'Node30': '1', '0557_6': '1', 'Node36': '1', 'Node35': '2', '0557_5': '1', '0557_2': '1', '0564_11': '2', '0564_17': '1', 'Node18': '1', '0557_25': '1', '0564_4': '2', 'Node8': '1', '0557_26': '1', '0557_21': '1', 'Node53': '1'}
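
Because the returned values are strings, numeric attributes such as branch lengths need converting before any arithmetic. A minimal sketch, assuming every value in the dictionary parses as a number (true for branch lengths, but not for label-type attributes such as original names); `attribute_as_floats` is an illustrative name only.

```python
def attribute_as_floats(attribute):
    """Convert {node: "number-as-string"} into {node: float}.

    Hypothetical helper; raises ValueError if a value is not numeric.
    """
    return {node: float(value) for node, value in attribute.items()}


# A few entries from the "Nucleotide GTR" branch-length example above.
lengths = attribute_as_floats({
    "Human": "0",
    "Chimp": "0.00181779097957",
    "Mouse": "0.118170595693",
})
longest = max(lengths, key=lengths.get)
```

After conversion the values can be summed, sorted, or compared as usual.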

map_branch_attribute(attribute_name, original_names=False, partition=None)¶

Return the newick phylogeny with specified attribute mapped into the phylogeny as branch lengths. If there are multiple partitions, default returns a dictionary of mapped trees for all partitions. If partition is specified, only the attribute for the given partition will be returned. NOTE: PARTITION STARTS FROM 0.

Required positional arguments:

attribute_name, the name of the attribute to obtain. Attribute names available can be revealed with the method .reveal_branch_attributes().

Optional keyword arguments:

partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.
original_names, reformat the tree with the original names (as opposed to hyphy-friendly names with forbidden characters replaced). In most cases hyphy and original names are identical. Default: False.

Examples:

>>> e = Extractor("/path/to/ABSREL.json") ## Define an aBSREL Extractor, for example
>>> e.map_branch_attribute("Rate classes") ## number of inferred rate classes, as branch lengths
(0564_7:1,(((((0564_11:2,0564_4:2)Node20:1,(0564_1:1,(0564_21:1,0564_5:1)Node25:1)Node23:1)Node19:1,0564_17:1)Node18:1,((0564_13:1,(0564_15:2)Node32:1)Node30:1,((0564_22:1,0564_6:1)Node36:1,0564_3:2)Node35:2)Node29:1)Node17:1,0564_9:1)Node16:1,(((0557_24:1,0557_4:1,0557_2:1)Node9:1,0557_12:1)Node8:1,((0557_21:1,0557_6:1,0557_9:1,0557_11:1,0557_13:1,0557_26:1,(0557_5:1,0557_7:1)Node53:1)Node6:1,0557_25:1)Node7:1)Separator:2);
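
Conceptually, mapping an attribute into the tree means replacing each `name:length` pair in the newick string with `name:attribute_value`. The sketch below illustrates the idea with a regular expression. It is not how phyphy implements the method, and it assumes a plain newick string in which every node already has a branch length, with no quoted names and no bracketed annotations.

```python
import re

# Matches "name:length" where neither part contains newick punctuation.
_NAME_LENGTH = re.compile(r"([^(),:;]+):([^(),:;]+)")


def map_onto_tree(newick, attribute):
    """Replace branch lengths with values from {node: value}.

    Hypothetical helper. Nodes missing from `attribute` keep their
    original branch length.
    """
    def swap(match):
        name, old_length = match.group(1), match.group(2)
        return "{}:{}".format(name, attribute.get(name, old_length))

    return _NAME_LENGTH.sub(swap, newick)


mapped = map_onto_tree(
    "((A:0.1,B:0.2)Node1:0.05,C:0.3);",
    {"A": "1", "B": "2", "Node1": "1", "C": "1"},
)
```

For real trees a dedicated newick parser is the more robust choice; the regular expression is only meant to show what "mapped as branch lengths" means.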

extract_model_tree(model, partition=None, original_names=False)¶

Return newick phylogeny fitted to a certain model, i.e. with branch lengths optimized for specified model. This is just a special case of map_branch_attribute.

Required positional arguments:

model, the name of the model whose optimized tree you wish to obtain. Models names available can be revealed with the method .reveal_fitted_models().

Optional keyword arguments:

partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.
original_names, reformat the tree with the original names (as opposed to hyphy-friendly names with forbidden characters replaced). In most cases hyphy and original names are identical. Default: False.

Examples:

>>> ### Define a FEL Extractor, for example 
>>> e = Extractor("/path/to/FEL.json") 
>>> e.extract_model_tree("Global MG94xREV") 
((((Pig:0.192554792971,Cow:0.247996722936)Node3:0.101719189407,Horse:0.211310618381,Cat:0.273732369855)Node2:0.0644249932769,((RhMonkey:0.00372054481786,Baboon:0.0017701670358)Node9:0.0259206344918,(Human:0,Chimp:0.00182836999996)Node12:0.0178636195889)Node8:0.109431753602)Node1:0.284434196447,Rat:0.0670087588444,Mouse:0.120166947697);

>>> ### Use original names rather than HyPhy-reformatted names 
>>> e.extract_model_tree("Global MG94xREV", original_names = True) 
((((Pig~gy:0.192554792971,Cow:0.247996722936)Node3:0.101719189407,Horse:0.211310618381,Cat:0.273732369855)Node2:0.0644249932769,((RhMonkey:0.00372054481786,Baboon:0.0017701670358)Node9:0.0259206344918,(Human:0,Chimp:0.00182836999996)Node12:0.0178636195889)Node8:0.109431753602)Node1:0.284434196447,Rat:0.0670087588444,Mouse:0.120166947697);

>>> ### Define a FEL Extractor, from an analysis with multiple partitions, for example
>>> e = Extractor("/path/to/FEL_mulitpart.json") 
>>> e.extract_model_tree("Global MG94xREV", partition = 1) ## specify only one partition 
(((((AF231119:0.00272571804934,AF231115:0)Node4:0,((AF082576:0.00274243126371,AF231113:0)Node8:0.0027139677452,AF231114:0.011078118042)Node7:0.00276605108624)Node3:0.00271644261188,(AF231117:0.00298921107219,AF231118:0.0505182782033)Node12:0.00258521327296)Node2:0.00550172127052,(AF186242:0,(AF186243:0,AF234767:0.0224059556982)Node17:0.00273365779956)Node15:0.00270941747926)Node1:0,(AF186241:0.00270936341991,AF231116:0)Node20:0,AF187824:0.00546772497238);

extract_absrel_tree(original_names=False, update_branch_lengths=None, p=0.05, labels=None)¶

Return newick phylogeny in Extended Newick Format (ete-style features) with selection indicators (Default is 0 for not selected, 1 for selected) at the specified p threshold. aBSREL only.

Optional keyword arguments:

original_names, reformat the tree with the original names (as opposed to hyphy-friendly names with forbidden characters replaced). In most cases hyphy and original names are identical. Default: False.
update_branch_lengths, string model name, indicating that branch lengths should be replaced with the given model fit’s optimized lengths. Default: None.
p, the p-value threshold for calling selection. Default: 0.05
labels: A tuple of labels to use for (selected, not selected). Default is (1,0)

Examples:

>>> ### Define an ABSREL Extractor
>>> e = Extractor("/path/to/ABSREL.json") 

>>> ### Add extended-newick format labels of selection with default labels.
>>> ### Note this example happens not to have branch lengths in the input tree.
>>> e.extract_absrel_tree() 
(0564_7:1[&&NHX:Selected=0],(((((0564_11:1[&&NHX:Selected=0],0564_4:1[&&NHX:Selected=0])Node20:1[&&NHX:Selected=0],(0564_1:1[&&NHX:Selected=0],(0564_21:1[&&NHX:Selected=0],0564_5:1[&&NHX:Selected=0])Node25:1[&&NHX:Selected=0])Node23:1[&&NHX:Selected=0])Node19:1[&&NHX:Selected=0],0564_17:1[&&NHX:Selected=0])Node18:1[&&NHX:Selected=0],((0564_13:1[&&NHX:Selected=0],(0564_15:1[&&NHX:Selected=0])Node32:1[&&NHX:Selected=0])Node30:1[&&NHX:Selected=0],((0564_22:1[&&NHX:Selected=0],0564_6:1[&&NHX:Selected=0])Node36:1[&&NHX:Selected=0],0564_3:1[&&NHX:Selected=1])Node35:1[&&NHX:Selected=1])Node29:1[&&NHX:Selected=0])Node17:1[&&NHX:Selected=0],0564_9:1[&&NHX:Selected=0])Node16:1[&&NHX:Selected=0],(((0557_24:1[&&NHX:Selected=0],0557_4:1[&&NHX:Selected=0],0557_2:1[&&NHX:Selected=0])Node9:1[&&NHX:Selected=0],0557_12:1[&&NHX:Selected=0])Node8:1[&&NHX:Selected=0],((0557_21:1[&&NHX:Selected=0],0557_6:1[&&NHX:Selected=0],0557_9:1[&&NHX:Selected=0],0557_11:1[&&NHX:Selected=0],0557_13:1[&&NHX:Selected=0],0557_26:1[&&NHX:Selected=0],(0557_5:1[&&NHX:Selected=0],0557_7:1[&&NHX:Selected=0])Node53:1[&&NHX:Selected=0])Node6:1[&&NHX:Selected=0],0557_25:1[&&NHX:Selected=0])Node7:1[&&NHX:Selected=0])Separator:1[&&NHX:Selected=1])[&&NHX:Selected=0];

>>> ### Add extended-newick format labels of selection with default labels, with branch lengths updated as the adaptive model 
>>> e.extract_absrel_tree(update_branch_lengths="Full adaptive model") 
(0564_7:0.00708844[&&NHX:Selected=0],(((((0564_11:0.00527268[&&NHX:Selected=0],0564_4:0.00714182[&&NHX:Selected=0])Node20:0.0022574[&&NHX:Selected=0],(0564_1:0.00583239[&&NHX:Selected=0],(0564_21:0.00121537[&&NHX:Selected=0],0564_5:0.00266921[&&NHX:Selected=0])Node25:0.000797211[&&NHX:Selected=0])Node23:0.00142056[&&NHX:Selected=0])Node19:0.0019147[&&NHX:Selected=0],0564_17:0.00605582[&&NHX:Selected=0])Node18:0.00100178[&&NHX:Selected=0],((0564_13:0.0053066[&&NHX:Selected=0],(0564_15:0.00346989[&&NHX:Selected=0])Node32:0.000752206[&&NHX:Selected=0])Node30:0.00188243[&&NHX:Selected=0],((0564_22:0.00686981[&&NHX:Selected=0],0564_6:0.00581523[&&NHX:Selected=0])Node36:0.00125905[&&NHX:Selected=0],0564_3:0.00791919[&&NHX:Selected=1])Node35:0.0174886[&&NHX:Selected=1])Node29:0.0010489[&&NHX:Selected=0])Node17:0.00156911[&&NHX:Selected=0],0564_9:0.00551506[&&NHX:Selected=0])Node16:0.000783733[&&NHX:Selected=0],(((0557_24:0.00078793[&&NHX:Selected=0],0557_4:0.000787896[&&NHX:Selected=0],0557_2:0.000399166[&&NHX:Selected=0])Node9:0.00206483[&&NHX:Selected=0],0557_12:0.00267531[&&NHX:Selected=0])Node8:0.00118205[&&NHX:Selected=0],((0557_21:0[&&NHX:Selected=0],0557_6:0.000391941[&&NHX:Selected=0],0557_9:0.000402021[&&NHX:Selected=0],0557_11:0.00156985[&&NHX:Selected=0],0557_13:0.000401742[&&NHX:Selected=0],0557_26:0.00079377[&&NHX:Selected=0],(0557_5:0.00117641[&&NHX:Selected=0],0557_7:0[&&NHX:Selected=0])Node53:0.000391973[&&NHX:Selected=0])Node6:0.00118062[&&NHX:Selected=0],0557_25:0.00220372[&&NHX:Selected=0])Node7:0.00103489[&&NHX:Selected=0])Separator:0.00822051[&&NHX:Selected=1])[&&NHX:Selected=0];               

>>> ### Add extended-newick format labels of selection with *custom* labels, with branch lengths updated as the adaptive model 
>>> e.extract_absrel_tree(update_branch_lengths="Full adaptive model", labels=["no", "yes"]) 
(0564_7:0.00708844[&&NHX:Selected=yes],(((((0564_11:0.00527268[&&NHX:Selected=yes],0564_4:0.00714182[&&NHX:Selected=yes])Node20:0.0022574[&&NHX:Selected=yes],(0564_1:0.00583239[&&NHX:Selected=yes],(0564_21:0.00121537[&&NHX:Selected=yes],0564_5:0.00266921[&&NHX:Selected=yes])Node25:0.000797211[&&NHX:Selected=yes])Node23:0.00142056[&&NHX:Selected=yes])Node19:0.0019147[&&NHX:Selected=yes],0564_17:0.00605582[&&NHX:Selected=yes])Node18:0.00100178[&&NHX:Selected=yes],((0564_13:0.0053066[&&NHX:Selected=yes],(0564_15:0.00346989[&&NHX:Selected=yes])Node32:0.000752206[&&NHX:Selected=yes])Node30:0.00188243[&&NHX:Selected=yes],((0564_22:0.00686981[&&NHX:Selected=yes],0564_6:0.00581523[&&NHX:Selected=yes])Node36:0.00125905[&&NHX:Selected=yes],0564_3:0.00791919[&&NHX:Selected=no])Node35:0.0174886[&&NHX:Selected=no])Node29:0.0010489[&&NHX:Selected=yes])Node17:0.00156911[&&NHX:Selected=yes],0564_9:0.00551506[&&NHX:Selected=yes])Node16:0.000783733[&&NHX:Selected=yes],(((0557_24:0.00078793[&&NHX:Selected=yes],0557_4:0.000787896[&&NHX:Selected=yes],0557_2:0.000399166[&&NHX:Selected=yes])Node9:0.00206483[&&NHX:Selected=yes],0557_12:0.00267531[&&NHX:Selected=yes])Node8:0.00118205[&&NHX:Selected=yes],((0557_21:0[&&NHX:Selected=yes],0557_6:0.000391941[&&NHX:Selected=yes],0557_9:0.000402021[&&NHX:Selected=yes],0557_11:0.00156985[&&NHX:Selected=yes],0557_13:0.000401742[&&NHX:Selected=yes],0557_26:0.00079377[&&NHX:Selected=yes],(0557_5:0.00117641[&&NHX:Selected=yes],0557_7:0[&&NHX:Selected=yes])Node53:0.000391973[&&NHX:Selected=yes])Node6:0.00118062[&&NHX:Selected=yes],0557_25:0.00220372[&&NHX:Selected=yes])Node7:0.00103489[&&NHX:Selected=yes])Separator:0.00822051[&&NHX:Selected=no])[&&NHX:Selected=0];

>>> ### Add extended-newick format labels of selection with default labels, using a P-threshold of 0.3 
>>> e.extract_absrel_tree(p=0.3) 
(0564_7:1[&&NHX:Selected=1],(((((0564_11:1[&&NHX:Selected=0],0564_4:1[&&NHX:Selected=0])Node20:1[&&NHX:Selected=0],(0564_1:1[&&NHX:Selected=0],(0564_21:1[&&NHX:Selected=0],0564_5:1[&&NHX:Selected=0])Node25:1[&&NHX:Selected=0])Node23:1[&&NHX:Selected=0])Node19:1[&&NHX:Selected=0],0564_17:1[&&NHX:Selected=0])Node18:1[&&NHX:Selected=0],((0564_13:1[&&NHX:Selected=0],(0564_15:1[&&NHX:Selected=0])Node32:1[&&NHX:Selected=0])Node30:1[&&NHX:Selected=0],((0564_22:1[&&NHX:Selected=0],0564_6:1[&&NHX:Selected=0])Node36:1[&&NHX:Selected=0],0564_3:1[&&NHX:Selected=1])Node35:1[&&NHX:Selected=1])Node29:1[&&NHX:Selected=0])Node17:1[&&NHX:Selected=0],0564_9:1[&&NHX:Selected=0])Node16:1[&&NHX:Selected=0],(((0557_24:1[&&NHX:Selected=0],0557_4:1[&&NHX:Selected=0],0557_2:1[&&NHX:Selected=0])Node9:1[&&NHX:Selected=0],0557_12:1[&&NHX:Selected=0])Node8:1[&&NHX:Selected=0],((0557_21:1[&&NHX:Selected=0],0557_6:1[&&NHX:Selected=0],0557_9:1[&&NHX:Selected=0],0557_11:1[&&NHX:Selected=0],0557_13:1[&&NHX:Selected=0],0557_26:1[&&NHX:Selected=0],(0557_5:1[&&NHX:Selected=0],0557_7:1[&&NHX:Selected=0])Node53:1[&&NHX:Selected=0])Node6:1[&&NHX:Selected=0],0557_25:1[&&NHX:Selected=0])Node7:1[&&NHX:Selected=0])Separator:1[&&NHX:Selected=1])[&&NHX:Selected=0];
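
The interaction between p and labels can be sketched as follows: each branch's p-value is compared against the threshold and the branch receives the first label if it is called selected, the second otherwise. This is an illustrative sketch only. The function name and the p-values are hypothetical, and whether phyphy compares with `<=` or `<` at the threshold is an assumption here.

```python
def selection_labels(pvalues, p=0.05, labels=(1, 0)):
    """Label each node as selected / not selected at threshold p.

    pvalues: {node: p-value} (hypothetical input for illustration)
    labels:  (selected_label, not_selected_label), as documented above
    ASSUMPTION: a p-value equal to the threshold counts as selected.
    """
    selected, not_selected = labels
    return {
        node: (selected if value <= p else not_selected)
        for node, value in pvalues.items()
    }


# Made-up p-values, purely to show the effect of the arguments.
toy = {"Node35": 0.001, "0564_3": 0.01, "0564_7": 0.2}
default_calls = selection_labels(toy)
relaxed_calls = selection_labels(toy, p=0.3)
custom_calls = selection_labels(toy, labels=("yes", "no"))
```

Raising p can only turn more branches into selected ones, which matches the p=0.3 example above where an additional branch is flagged.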

extract_feature_tree(feature, original_names=False, update_branch_lengths=None, partition=None)¶

Return newick phylogeny in Extended Newick Format (ete-style features) with specified feature(s).

Required positional arguments:

feature, The feature(s) to be included in the final tree. This is either a string of a feature, or a list of features. Features are taken from attributes. Note that the exported feature label will have all spaces removed.

Optional keyword arguments:

update_branch_lengths, string model name, indicating that branch lengths should be replaced with the given model fit’s optimized lengths. Default: None.
partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.

Examples:

>>> ### Define an ABSREL Extractor
>>> e = Extractor("/path/to/ABSREL.json") 

>>> ### Add a single feature, rate classes
>>> ### Note this example happens to have no branch lengths
>>> e.extract_feature_tree("Rate classes") 
(0564_7:1[&&NHX:Rateclasses=1],(((((0564_11:1[&&NHX:Rateclasses=2],0564_4:1[&&NHX:Rateclasses=2])Node20:1[&&NHX:Rateclasses=1],(0564_1:1[&&NHX:Rateclasses=1],(0564_21:1[&&NHX:Rateclasses=1],0564_5:1[&&NHX:Rateclasses=1])Node25:1[&&NHX:Rateclasses=1])Node23:1[&&NHX:Rateclasses=1])Node19:1[&&NHX:Rateclasses=1],0564_17:1[&&NHX:Rateclasses=1])Node18:1[&&NHX:Rateclasses=1],((0564_13:1[&&NHX:Rateclasses=1],(0564_15:1[&&NHX:Rateclasses=2])Node32:1[&&NHX:Rateclasses=1])Node30:1[&&NHX:Rateclasses=1],((0564_22:1[&&NHX:Rateclasses=1],0564_6:1[&&NHX:Rateclasses=1])Node36:1[&&NHX:Rateclasses=1],0564_3:1[&&NHX:Rateclasses=2])Node35:1[&&NHX:Rateclasses=2])Node29:1[&&NHX:Rateclasses=1])Node17:1[&&NHX:Rateclasses=1],0564_9:1[&&NHX:Rateclasses=1])Node16:1[&&NHX:Rateclasses=1],(((0557_24:1[&&NHX:Rateclasses=1],0557_4:1[&&NHX:Rateclasses=1],0557_2:1[&&NHX:Rateclasses=1])Node9:1[&&NHX:Rateclasses=1],0557_12:1[&&NHX:Rateclasses=1])Node8:1[&&NHX:Rateclasses=1],((0557_21:1[&&NHX:Rateclasses=1],0557_6:1[&&NHX:Rateclasses=1],0557_9:1[&&NHX:Rateclasses=1],0557_11:1[&&NHX:Rateclasses=1],0557_13:1[&&NHX:Rateclasses=1],0557_26:1[&&NHX:Rateclasses=1],(0557_5:1[&&NHX:Rateclasses=1],0557_7:1[&&NHX:Rateclasses=1])Node53:1[&&NHX:Rateclasses=1])Node6:1[&&NHX:Rateclasses=1],0557_25:1[&&NHX:Rateclasses=1])Node7:1[&&NHX:Rateclasses=1])Separator:1[&&NHX:Rateclasses=2])[&&NHX:Rateclasses=0];;

>>> ### Add a single feature, rate classes, with updated branch lengths 
>>> e.extract_feature_tree("Rate classes",update_branch_lengths = "Nucleotide GTR") 
(0564_7:0.00664844[&&NHX:Rateclasses=1],(((((0564_11:0.00434881[&&NHX:Rateclasses=2],0564_4:0.00593219[&&NHX:Rateclasses=2])Node20:0.0026739[&&NHX:Rateclasses=1],(0564_1:0.00559179[&&NHX:Rateclasses=1],(0564_21:0.00124334[&&NHX:Rateclasses=1],0564_5:0.00259957[&&NHX:Rateclasses=1])Node25:0.000863062[&&NHX:Rateclasses=1])Node23:0.00149918[&&NHX:Rateclasses=1])Node19:0.00164262[&&NHX:Rateclasses=1],0564_17:0.00600417[&&NHX:Rateclasses=1])Node18:0.00100639[&&NHX:Rateclasses=1],((0564_13:0.00534732[&&NHX:Rateclasses=1],(0564_15:0.00278489[&&NHX:Rateclasses=2])Node32:0.000565591[&&NHX:Rateclasses=1])Node30:0.00196314[&&NHX:Rateclasses=1],((0564_22:0.00686685[&&NHX:Rateclasses=1],0564_6:0.00554135[&&NHX:Rateclasses=1])Node36:0.000954793[&&NHX:Rateclasses=1],0564_3:0.00652918[&&NHX:Rateclasses=2])Node35:0.00195507[&&NHX:Rateclasses=2])Node29:0.001029[&&NHX:Rateclasses=1])Node17:0.00155756[&&NHX:Rateclasses=1],0564_9:0.00533212[&&NHX:Rateclasses=1])Node16:0.000567283[&&NHX:Rateclasses=1],(((0557_24:0.000770978[&&NHX:Rateclasses=1],0557_4:0.000770929[&&NHX:Rateclasses=1],0557_2:0.000385454[&&NHX:Rateclasses=1])Node9:0.00199212[&&NHX:Rateclasses=1],0557_12:0.00265769[&&NHX:Rateclasses=1])Node8:0.00119139[&&NHX:Rateclasses=1],((0557_21:0[&&NHX:Rateclasses=1],0557_6:0.000388349[&&NHX:Rateclasses=1],0557_9:0.000388411[&&NHX:Rateclasses=1],0557_11:0.00155424[&&NHX:Rateclasses=1],0557_13:0.000388353[&&NHX:Rateclasses=1],0557_26:0.000776809[&&NHX:Rateclasses=1],(0557_5:0.00116512[&&NHX:Rateclasses=1],0557_7:0[&&NHX:Rateclasses=1])Node53:0.000388349[&&NHX:Rateclasses=1])Node6:0.00118273[&&NHX:Rateclasses=1],0557_25:0.00219124[&&NHX:Rateclasses=1])Node7:0.000940406[&&NHX:Rateclasses=1])Separator:0.00689203[&&NHX:Rateclasses=2])[&&NHX:Rateclasses=0];"

>>> ### Add multiple features with updated branch lengths 
>>> e.extract_feature_tree(["Rate classes", "LRT"], update_branch_lengths = "Nucleotide GTR") 
(0564_7:0.00664844[&&NHX:Rateclasses=1:LRT=0],(((((0564_11:0.00434881[&&NHX:Rateclasses=2:LRT=3.96048976951],0564_4:0.00593219[&&NHX:Rateclasses=2:LRT=4.80587881259])Node20:0.0026739[&&NHX:Rateclasses=1:LRT=3.30060030447],(0564_1:0.00559179[&&NHX:Rateclasses=1:LRT=0.0105269546166],(0564_21:0.00124334[&&NHX:Rateclasses=1:LRT=0],0564_5:0.00259957[&&NHX:Rateclasses=1:LRT=4.51927751707])Node25:0.000863062[&&NHX:Rateclasses=1:LRT=0])Node23:0.00149918[&&NHX:Rateclasses=1:LRT=0])Node19:0.00164262[&&NHX:Rateclasses=1:LRT=0.153859379099],0564_17:0.00600417[&&NHX:Rateclasses=1:LRT=0])Node18:0.00100639[&&NHX:Rateclasses=1:LRT=1.64667962972],((0564_13:0.00534732[&&NHX:Rateclasses=1:LRT=0],(0564_15:0.00278489[&&NHX:Rateclasses=2:LRT=4.97443221859])Node32:0.000565591[&&NHX:Rateclasses=1:LRT=0])Node30:0.00196314[&&NHX:Rateclasses=1:LRT=2.86518293899],((0564_22:0.00686685[&&NHX:Rateclasses=1:LRT=0.114986865638],0564_6:0.00554135[&&NHX:Rateclasses=1:LRT=0])Node36:0.000954793[&&NHX:Rateclasses=1:LRT=0],0564_3:0.00652918[&&NHX:Rateclasses=2:LRT=14.0568340492])Node35:0.00195507[&&NHX:Rateclasses=2:LRT=22.65142315])Node29:0.001029[&&NHX:Rateclasses=1:LRT=1.50723222708])Node17:0.00155756[&&NHX:Rateclasses=1:LRT=2.63431127725],0564_9:0.00533212[&&NHX:Rateclasses=1:LRT=0])Node16:0.000567283[&&NHX:Rateclasses=1:LRT=0],(((0557_24:0.000770978[&&NHX:Rateclasses=1:LRT=0],0557_4:0.000770929[&&NHX:Rateclasses=1:LRT=0],0557_2:0.000385454[&&NHX:Rateclasses=1:LRT=0])Node9:0.00199212[&&NHX:Rateclasses=1:LRT=0],0557_12:0.00265769[&&NHX:Rateclasses=1:LRT=0])Node8:0.00119139[&&NHX:Rateclasses=1:LRT=1.99177154715],((0557_21:0[&&NHX:Rateclasses=1:LRT=0],0557_6:0.000388349[&&NHX:Rateclasses=1:LRT=0.662153642024],0557_9:0.000388411[&&NHX:Rateclasses=1:LRT=0],0557_11:0.00155424[&&NHX:Rateclasses=1:LRT=2.65063418443],0557_13:0.000388353[&&NHX:Rateclasses=1:LRT=0],0557_26:0.000776809[&&NHX:Rateclasses=1:LRT=0],(0557_5:0.00116512[&&NHX:Rateclasses=1:LRT=1.98893541781],0557_7:0[&&NHX:Rateclasses=1:LRT=0])Node53:
0.000388349[&&NHX:Rateclasses=1:LRT=0.660753167251])Node6:0.00118273[&&NHX:Rateclasses=1:LRT=0],0557_25:0.00219124[&&NHX:Rateclasses=1:LRT=0])Node7:0.000940406[&&NHX:Rateclasses=1:LRT=1.69045394756])Separator:0.00689203[&&NHX:Rateclasses=2:LRT=14.127483568])[&&NHX:Rateclasses=0:LRT=0];
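
The annotated tree above is plain Newick with an NHX-style comment on each branch, so the per-branch values can be pulled out with a regular expression. The following is a minimal sketch, not part of phyphy: the function name `parse_nhx_branches` is hypothetical, and it assumes branch names contain only letters, digits and underscores, as in this example output.

```python
import re

# A short fragment copied from the annotated tree above (not the full tree).
fragment = (
    "(0564_21:0.00124334[&&NHX:Rateclasses=1:LRT=0],"
    "0564_5:0.00259957[&&NHX:Rateclasses=1:LRT=4.51927751707])"
    "Node25:0.000863062[&&NHX:Rateclasses=1:LRT=0]"
)

# Matches name:length[&&NHX:Rateclasses=<int>:LRT=<number>]
NHX_PATTERN = re.compile(
    r"(\w+):([\d.eE+-]+)\[&&NHX:Rateclasses=(\d+):LRT=([\d.eE+-]+)\]"
)

def parse_nhx_branches(newick):
    """Hypothetical helper: map each named branch to its length and NHX values."""
    branches = {}
    for name, length, classes, lrt in NHX_PATTERN.findall(newick):
        branches[name] = {
            "length": float(length),
            "rate_classes": int(classes),
            "LRT": float(lrt),
        }
    return branches

branches = parse_nhx_branches(fragment)
for name, info in branches.items():
    print(name, info)
```

The unnamed root annotation at the end of the full tree has no `name:length` prefix, so this pattern skips it. For anything beyond a quick look, a dedicated tree-parsing library is likely the safer choice.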

reveal_fields()¶

Return list of top-level JSON fields.

Examples:

>>> ### Define an ABSREL Extractor
>>> e = Extractor("/path/to/ABSREL.json") 

>>> ### Reveal all fields
>>> e.reveal_fields()
['branch attributes', 'analysis', 'tested', 'data partitions', 'timers', 'fits', 'input', 'test results']
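
Conceptually, the top-level fields are just the keys of the parsed JSON object. The sketch below shows that idea with the standard `json` module only. It is not phyphy's implementation, and the JSON string is a heavily trimmed, hypothetical stand-in for a real HyPhy output file.

```python
import json

# Hypothetical, trimmed stand-in for a HyPhy JSON file; real files hold
# nested content under each of these keys.
raw = '{"analysis": {}, "input": {}, "fits": {}, "timers": {}}'

parsed = json.loads(raw)

# The top-level fields are the keys of the outermost JSON object.
fields = list(parsed.keys())
print(fields)
```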

extract_csv(csv, delim=', ', original_names=True, slac_ancestral_type='AVERAGED')¶

Extract a CSV from JSON, for certain methods:

FEL
SLAC
MEME
FUBAR
LEISR
aBSREL

Output for each:

FEL
- site, the site of interest
- alpha, Synonymous substitution rate at a site
- beta, Non-synonymous substitution rate at a site
- alpha=beta, The rate estimate under the neutral model
- LRT, Likelihood ratio test statistic for beta = alpha, vs. alternative beta != alpha
- p-value, P-value from the Likelihood ratio test
- Total branch length, The total length of branches contributing to inference at this site
SLAC
- site, the site of interest
- ES, Expected synonymous sites
- EN, Expected non-synonymous sites
- S, Inferred synonymous substitutions
- N, Inferred non-synonymous substitutions
- P[S], Expected proportion of synonymous sites
- dS, Inferred synonymous substitution rate
- dN, Inferred non-synonymous substitution rate
- dN-dS, Scaled by the length of the tested branches
- P_[dN/dS_>_1], Binomial probability that S is no greater than the observed value, with P[S] probability of success
- P_[dN/dS_<_1], Binomial probability that S is no less than the observed value, with P[S] probability of success
- Total branch length, The total length of branches contributing to inference at this site, and used to scale dN-dS
MEME
- site, the site of interest
- alpha, Synonymous substitution rate at a site
- beta_neg, Non-synonymous substitution rate at a site for the negative/neutral evolution component
- prop_beta_neg, Mixture distribution weight allocated to beta_neg; loosely – the proportion of the tree evolving neutrally or under negative selection
- beta_pos, Non-synonymous substitution rate at a site for the positive/neutral evolution component
- prop_beta_pos, Mixture distribution weight allocated to beta_pos; loosely – the proportion of the tree evolving neutrally or under positive selection
- LRT, Likelihood ratio test statistic for episodic diversification
- p-value, Asymptotic p-value for episodic diversification
- num_branches_under_selection, The (very approximate and rough) estimate of how many branches may have been under selection at this site, i.e., had an empirical Bayes factor of 100 or more for the beta_pos rate
- Total_branch_length, The total length of branches contributing to inference at this site
FUBAR
- site, the site of interest
- alpha, Mean posterior synonymous substitution rate at a site
- beta, Mean posterior non-synonymous substitution rate at a site
- beta-alpha, Mean posterior beta-alpha
- Prob[alpha>beta], Posterior probability of negative selection at a site
- Prob[alpha<beta], Posterior probability of positive selection at a site
- BayesFactor[alpha<beta], Empirical Bayes Factor for positive selection at a site
- PSRF, Potential scale reduction factor, an MCMC mixing measure
- Neff, Estimated effective sample size for Prob[alpha<beta]
LEISR
- site, the site of interest
- MLE, Relative rate estimate at a site
- Lower, Lower bound of 95% profile likelihood CI
- Upper, Upper bound of 95% profile likelihood CI
aBSREL
- node, Node name of interest
- baseline_omega, Baseline omega estimate under the MG94xREV model
- number_rate_classes, Number of final rate classes inferred by the adaptive model
- tested, Was this branch tested for selection? 0/1
- prop_sites_selected, Proportion of selected sites along the branch
- LRT, Likelihood ratio test statistic
- uncorrected_P, Uncorrected P-value associated with LRT
- corrected_P, FDR-corrected P-value for selection along this branch

Required positional arguments:

csv, File name for output CSV

Optional keyword arguments:

delim, A different delimiter for the output, e.g. "\t" for tab
original_names, An ABSREL specific boolean argument to indicate whether HyPhy-reformatted branch names should be used in the output CSV (False), or the original names as present in the input data alignment should be used (True). Default: True
slac_ancestral_type, A SLAC specific argument, either “AVERAGED” (Default) or “RESOLVED” (case insensitive) to indicate whether reported results should be from calculations done on either type of ancestral counting.

Examples:

>>> ### Define an ABSREL Extractor, for example
>>> e = Extractor("/path/to/ABSREL.json") 
>>> e.extract_csv("absrel.csv")
>>> ## With original names
>>> e.extract_csv("absrel.csv", original_names=True)
>>> ## As tsv
>>> e.extract_csv("absrel.csv", delim="\t")

>>> ### Define a FEL Extractor, for example
>>> e = Extractor("/path/to/FEL.json") 
>>> e.extract_csv("fel.csv")  

>>> ### Define a SLAC Extractor, for example
>>> e = Extractor("/path/to/SLAC.json") 
>>> e.extract_csv("slac.csv")
>>> ### Specify to export ancestral RESOLVED inferences  
>>> e.extract_csv("slac.csv", slac_ancestral_type = "RESOLVED")
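
Once a CSV has been written, it can be post-processed with the standard library. The sketch below reads FEL-style output and collects sites at or below a p-value threshold. It is only an illustration under stated assumptions: the header names are assumed to match the FEL column names listed above, the helper `significant_sites` and the 0.1 threshold are hypothetical, and the sample rows are invented rather than real results.

```python
import csv
import io

# Hypothetical file contents; the values are invented for illustration.
sample = """site,alpha,beta,alpha=beta,LRT,p-value,Total branch length
1,0.5,0.1,0.3,4.2,0.04,1.8
2,0.4,0.4,0.4,0.0,1.0,1.8
3,0.0,1.2,0.6,3.1,0.08,1.8
"""

def significant_sites(handle, threshold=0.1, delimiter=","):
    """Hypothetical helper: return site numbers whose p-value is <= threshold."""
    # skipinitialspace tolerates a space after each delimiter, if one is present.
    reader = csv.DictReader(handle, delimiter=delimiter, skipinitialspace=True)
    return [
        int(row["site"])
        for row in reader
        if float(row["p-value"]) <= threshold
    ]

sites = significant_sites(io.StringIO(sample))
print(sites)
```

With a real file, the in-memory sample would be replaced by something like `open("fel.csv", newline="")`, and `delimiter="\t"` would be passed if the file was written with a tab delimiter.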

extract_timers()¶

Extract a dictionary of timers, with the display order removed.

Examples:

>>> ### Define an ABSREL Extractor, for example
>>> e = Extractor("/path/to/ABSREL.json") 
>>> e.extract_timers()
{'Full adaptive model fitting': 13.0, 'Preliminary model fitting': 1.0, 'Overall': 451.0, 'Testing for selection': 236.0, 'Baseline model fitting': 7.0, 'Complexity analysis': 193.0}
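
Because the result is a plain dictionary, it is easy to summarize. The sketch below takes the dictionary shown in the example output above and ranks the individual steps by their share of the 'Overall' timer. The variable names are illustrative only, and no assumption is made here about the units of the timer values.

```python
# Dictionary copied from the example output above.
timers = {
    'Full adaptive model fitting': 13.0,
    'Preliminary model fitting': 1.0,
    'Overall': 451.0,
    'Testing for selection': 236.0,
    'Baseline model fitting': 7.0,
    'Complexity analysis': 193.0,
}

overall = timers['Overall']

# Every entry except the overall total is treated as an individual step.
steps = {name: value for name, value in timers.items() if name != 'Overall'}

# Rank steps from most to least time-consuming.
ranked = sorted(steps.items(), key=lambda item: item[1], reverse=True)

for name, value in ranked:
    print(f"{name}: {value:.0f} ({value / overall:.1%} of overall)")
```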

extract_site_logl()¶

Extract BUSTED site log likelihoods, as a dictionary.

Examples:

>>> ### Define a BUSTED Extractor
>>> e = Extractor("/path/to/BUSTED.json") 
>>> e.extract_site_logl() ## output below abbreviated for visual purposes
{'unconstrained': [-3.820084348560567, -5.294209244108775, ...], 'constrained': [-3.816130970827346, -5.290292412948827, -3.740077801446913, ...], 'optimized null': [-3.819912933488241, -5.292549093626999, -3.743404692680405, ...]}

extract_evidence_ratios()¶

Extract BUSTED evidence ratios, as dictionary.

Examples:

>>> ### Define a BUSTED Extractor
>>> e = Extractor("/path/to/BUSTED.json") 
>>> e.extract_evidence_ratios() ## output below abbreviated for visual purposes
{'constrained': [0.9960544265766805, 0.9960908296179645, 0.9962555861651011, ...], 'optimized null': [0.9998285996183979, 0.9983412268057609, 0.9995755396409283, ...]} 
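
The example outputs of extract_site_logl() and extract_evidence_ratios() above appear to be related: each evidence ratio matches exp(unconstrained site log likelihood minus the corresponding constrained or optimized null site log likelihood). The sketch below checks this on the first values shown above. It is an observation about these example numbers, not a statement of how HyPhy or phyphy computes the ratios, and the function name is hypothetical.

```python
import math

# First entries copied from the extract_site_logl() example output above.
site_logl = {
    'unconstrained': [-3.820084348560567, -5.294209244108775],
    'constrained': [-3.816130970827346, -5.290292412948827],
    'optimized null': [-3.819912933488241, -5.292549093626999],
}

def evidence_ratios_from_logl(logl):
    """Hypothetical helper: exp(unconstrained - other) per site, per model."""
    reference = logl['unconstrained']
    return {
        model: [math.exp(u - v) for u, v in zip(reference, values)]
        for model, values in logl.items()
        if model != 'unconstrained'
    }

ratios = evidence_ratios_from_logl(site_logl)
for model, values in ratios.items():
    print(model, values)
```

Under this reading, a ratio above 1 at a site means the unconstrained model fits that site better than the model it is compared against.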
