extractor Module¶
Parse (Extract!) JSON output from a standard HyPhy analysis.
-
class extractor.JSONFields¶
Bases: object
This class defines the strings of relevant JSON keys. Note that these strings correspond precisely to those in the HyPhy distribution. See file: TemplateBatchFiles/libv3/all-terms.bf in the terms.json namespace.
-
class extractor.AnalysisNames¶
Bases: object
This class defines the names of analyses which we can parse.
-
class extractor.Genetics¶
Bases: object
Class to define codes used. Primarily (only?) used to extract frequencies as dictionaries.
-
class extractor.Extractor(content)¶
Bases: object
This class parses JSON output and contains a variety of methods for pulling out various pieces of information.
Initialize an Extractor instance.
- Required arguments:
- content, The input content to parse. Two types of input may be provided here, EITHER:
- The path to a JSON file to parse, provided as a string
- A phyphy Analysis (i.e. BUSTED, SLAC, FEL, etc.) object which has been used to execute a HyPhy analysis through the phyphy interface
Examples:
>>> ### Define an Extractor instance with a JSON file >>> e = Extractor("/path/to/json.json")
>>> ### Define an Extractor instance with an Analysis object >>> ### First, define and run an analysis (FEL, for example) >>> myfel = FEL(data = "/path/to/data.fna") >>> myfel.run_analysis() >>> e = Extractor(myfel)
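For orientation, the following is a brief usage sketch (the JSON path is hypothetical) that chains several of the metadata methods documented below to summarize a parsed analysis at a glance:
>>> e = Extractor("/path/to/FEL.json")   ## hypothetical path to a parsed HyPhy JSON
>>> e.extract_number_sequences()         ## number of sequences in the input alignment
>>> e.extract_number_sites()             ## number of (codon) sites
>>> e.extract_partition_count()          ## number of partitions
>>> e.extract_input_file()               ## path to the input dataset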
-
extract_number_sequences()¶
Return the number of sequences in the input dataset.
No arguments are required.
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_number_sequences() 10
-
extract_number_sites()¶
Return the number of sites in the input dataset. Note that for codon analyses this will be the number of codon sites (i.e. length/3).
No arguments are required.
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_number_sites() 187
-
extract_input_file()¶
Return the name of (including the path to) the input dataset. If alignment and tree were provided separately, this will return the alignment file name.
No arguments are required.
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_input_file() "/Users/sjspielman/evogenomics_hyphy/datasets/CD2.fna"
-
extract_partition_count()¶
Return the number of partitions in the analysis.
No arguments are required.
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_partition_count() 1
-
extract_input_tree(partition=None, original_names=False, node_labels=False)¶
Return the inputted newick phylogeny, whose nodes have been labeled by HyPhy (if node labels were not present). For analyses with a single partition OR for a request for a specific partition’s tree, returns a string. For analyses with multiple partitions (and hence multiple trees), returns a dictionary of trees.
- Optional keyword arguments:
- partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.
- original_names, Boolean (Default: False) indicating whether the tree should be updated with the original sequence names before returning
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_input_tree() ((((Pig:0.147969,Cow:0.21343)Node3:0.085099,Horse:0.165787,Cat:0.264806)Node2:0.058611,((RhMonkey:0.002015,Baboon:0.003108)Node9:0.022733,(Human:0.004349,Chimp:0.000799)Node12:0.011873)Node8:0.101856)Node1:0.340802,Rat:0.050958,Mouse:0.09795);
>>> ### Use original names >>> e.extract_input_tree(original_names = True) ((((Pig~gy:0.147969,Cow:0.21343)Node3:0.085099,Horse:0.165787,Cat:0.264806)Node2:0.058611,((RhMonkey:0.002015,Baboon:0.003108)Node9:0.022733,(Human:0.004349,Chimp:0.000799)Node12:0.011873)Node8:0.101856)Node1:0.340802,Rat:0.050958,Mouse:0.09795);
>>> e = Extractor("/path/to/FEL_mulitpart.json") ## Define a FEL Extractor, from an analysis with multiple partitions, for example >>> e.extract_input_tree() ## All partitions {0: '((((AF231119:0.00599498,AF231117:0.00602763)Node3:0.00187262,(AF186242:0.00194569,AF186243:0.0059545)Node6:1e-10)Node2:0.00395465,(AF186241:0.00398948,(AF231116:1e-10,AF187824:0.00402724)Node11:0.00395692)Node9:0.00200337)Node1:0.00392717,AF082576:0.00193519,(((AF231118:0.0639035,AF234767:0.143569)Node17:0.000456671,(AF231115:0.00201331,AF231114:0.00592754)Node20:0.00592206)Node16:1e-10,AF231113:0.00395832)Node15:1e-10);', 1: '(((((AF231119:0.00307476,AF231115:1e-10)Node4:1e-10,((AF082576:0.00309362,AF231113:1e-10)Node8:0.0031872,AF231114:0.013292)Node7:0.0030793)Node3:0.00310106,(AF231117:0.00396728,AF231118:0.0665375)Node12:0.00249394)Node2:0.00637034,(AF186242:1e-10,(AF186243:1e-10,AF234767:0.0278842)Node17:0.00311418)Node15:0.00307177)Node1:1e-10,(AF186241:0.00306598,AF231116:1e-10)Node20:1e-10,AF187824:0.00632863);', 2: '(AF231119:0.00208218,AF231117:1e-10,((AF082576:1e-10,AF231113:0.00433775)Node4:0.00208919,((((AF186242:0.00216055,AF186243:0.00437974)Node10:0.00214339,((AF186241:1e-10,AF187824:0.00215048)Node14:0.00214528,AF231116:1e-10)Node13:1e-10)Node9:0.0112142,(AF231118:0.0244917,AF234767:0.0835686)Node18:0.0280857)Node8:0.0021073,(AF231115:1e-10,AF231114:0.00868934)Node21:0.00639388)Node7:1e-10)Node3:1e-10);', 3: '((AF231119:0.000939531,AF082576:0.00182425)Node1:1e-10,(((AF231117:0.00499646,(AF231116:1e-10,(AF187824:0.00453171,AF231113:0.0180629)Node10:0.00923609)Node8:0.00581275)Node6:0.00383552,(AF231115:1e-10,AF231114:0.0100664)Node13:0.00401088)Node5:0.00102177,((AF186242:0.00171504,AF186243:0.00438135)Node17:0.00180763,AF186241:0.0044495)Node16:0.00408249)Node4:0.000197413,(AF231118:0.032062,AF234767:0.0409599)Node21:0.0228604);'}
>>> e.extract_input_tree(partition = 1) ## Single specified partitions (((((AF231119:0.00307476,AF231115:1e-10)Node4:1e-10,((AF082576:0.00309362,AF231113:1e-10)Node8:0.0031872,AF231114:0.013292)Node7:0.0030793)Node3:0.00310106,(AF231117:0.00396728,AF231118:0.0665375)Node12:0.00249394)Node2:0.00637034,(AF186242:1e-10,(AF186243:1e-10,AF234767:0.0278842)Node17:0.00311418)Node15:0.00307177)Node1:1e-10,(AF186241:0.00306598,AF231116:1e-10)Node20:1e-10,AF187824:0.00632863);
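As a further usage sketch (paths and file names here are hypothetical), the per-partition dictionary returned for multi-partition analyses can be written out as one newick file per partition:
>>> e = Extractor("/path/to/FEL_multipart.json")   ## hypothetical multi-partition JSON
>>> trees = e.extract_input_tree()                 ## dictionary keyed by 0-based partition index
>>> for index, tree in trees.items():
...     with open("partition_" + str(index) + ".tre", "w") as handle:
...         handle.write(tree + "\n")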
-
reveal_fitted_models()¶
Return a list of all model names in the fits JSON field.
No arguments are required.
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.reveal_fitted_models() ['Nucleotide GTR', 'Global MG94xREV']
>>> e = Extractor("/path/to/aBSREL.json") ## Define an aBSREL Extractor, for example >>> e.reveal_fitted_models() ['Nucleotide GTR', 'Full adaptive model', 'Baseline MG94xREV']
-
extract_model_component(model_name, component)¶
Return a model component for a given model name found in the fits JSON field.
- Required arguments:
- model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()
- component, the component of the model to return.
- Recommended use: Note there are a variety of convenience methods which wrap this function to extract all components (note that not all analyses will have all of these components):
- .extract_model_logl(model_name), returns the log likelihood of a given model fit
- .extract_model_estimated_parameters(model_name), returns the number of estimated parameters in a given model fit
- .extract_model_aicc(model_name), returns the small-sample AIC (AIC-c) for a given model fit
- .extract_model_rate_distributions(model_name), returns rate distributions for a given model fit
- .extract_model_frequencies(model_name), returns the equilibrium frequencies for the given model fit
See one of these other methods for example(s).
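For instance, a quick sketch (JSON path hypothetical; assumes the analysis reports all three components) that tabulates these quantities for every fitted model:
>>> e = Extractor("/path/to/FEL.json")
>>> for model in e.reveal_fitted_models():
...     print(model, e.extract_model_logl(model), e.extract_model_estimated_parameters(model), e.extract_model_aicc(model))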
-
extract_model_logl(model_name)¶
Return the log likelihood (as a float) for a given model that appears in the fits field.
- Required arguments:
- model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_model_logl("Nucleotide GTR") -3531.96378073
-
extract_model_estimated_parameters(model_name)¶
Return the number of estimated parameters (as an int) for a given model that appears in the fits field.
- Required arguments:
- model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_model_estimated_parameters("Nucleotide GTR") 24
-
extract_model_aicc(model_name)¶
Return AICc (as a float) for a given model that appears in the fits field.
- Required arguments:
- model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_model_aicc("Nucleotide GTR") 7112.57796796
-
extract_model_rate_distributions(model_name)¶
Return rate distributions, as a reformatted dictionary, for a given model that appears in the fits field. NOTE: Currently assumes dS = 1 for all initial MG94xREV fits, as in the current HyPhy implementation (True in <=2.3.4).
- Required arguments:
- model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_model_rate_distributions("Nucleotide GTR") {'AC': 0.5472216942647106, 'GT': 0.3027127947903878, 'AG': 1, 'CG': 0.4864956075134169, 'AT': 0.2645767737218761, 'CT': 1.017388348535757}
>>> e.extract_model_rate_distributions("Global MG94xREV") {'test': {'proportion': 1.0, 'omega': 0.9860796476982517}}
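As a small sketch, assuming each value in the returned dictionary follows the {'proportion': ..., 'omega': ...} layout shown in the codon-model example above (this does not apply to nucleotide models such as the GTR relative rates), a proportion-weighted mean omega could be computed like this:
>>> e = Extractor("/path/to/FEL.json")   ## hypothetical path
>>> rates = e.extract_model_rate_distributions("Global MG94xREV")
>>> mean_omega = sum(category["proportion"] * category["omega"] for category in rates.values())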
-
extract_model_frequencies(model_name, as_dict=False)¶
Return a list of equilibrium frequencies (in alphabetical order) for a given model that appears in the fits field.
- Required arguments:
- model_name, the name of the model of interest. Note that all model names can be revealed with the method .reveal_fitted_models()
- Optional keyword arguments:
- as_dict, Boolean to indicate if the frequencies should be returned as a dictionary. Default: False.
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_model_frequencies("Nucleotide GTR") [0.3563279857397504, 0.1837789661319073, 0.2402852049910873, 0.2196078431372549]
>>> ### Return dictionary instead of list >>> e.extract_model_frequencies("Nucleotide GTR", as_dict = True) {'A': 0.3563279857397504, 'C': 0.1837789661319073, 'T': 0.2196078431372549, 'G': 0.2402852049910873}
-
extract_branch_sets(by_set=False)¶
Return branch set designations as a dictionary for all nodes. By default, this function will return the branch sets “as is” in the JSON field tested, where keys are nodes and values are the branch set to which the given node belongs. NOTE: Assumes that all partitions share the same branch sets.
- Optional keyword arguments:
- by_set, Boolean to indicate if the returned dictionary should use branch sets as keys, and values are a list of nodes in that branch set. Default: False.
Examples:
>>> e = Extractor("/path/to/BUSTED.json") ## Define a BUSTED Extractor, for example >>> e.extract_branch_sets() {'Node12': 'test', 'GOR': 'test', 'HUM': 'test', 'PON': 'test', 'MAC': 'test', 'MAR': 'test', 'BAB': 'test', 'GIB': 'test', 'BUS': 'test', 'Node3': 'test', 'Node2': 'test', 'Node5': 'test', 'Node4': 'test', 'PAN': 'test', 'Node6': 'test'}
>>> ### Return dictionary of lists per set instead of default >>> e.extract_branch_sets(by_set = True) {'test': ['Node12', 'HUM', 'PON', 'MAC', 'MAR', 'BAB', 'GIB', 'Node2', 'BUS', 'Node3', 'Node6', 'Node5', 'Node4', 'PAN', 'GOR']}
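A brief sketch (JSON path hypothetical) showing how the default node-to-set dictionary can be tallied to count how many branches fall in each branch set:
>>> from collections import Counter
>>> e = Extractor("/path/to/BUSTED.json")
>>> branch_sets = e.extract_branch_sets()
>>> Counter(branch_sets.values())   ## number of branches assigned to each branch set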
-
reveal_branch_attributes()¶
Return a dictionary of all the attributes in the branch attributes field and their attribute type (node label or branch length).
Examples:
>>> e = Extractor("/path/to/BUSTED.json") ## Define a BUSTED Extractor, for example >>> e.reveal_branch_attributes() {'Nucleotide GTR': 'branch length', 'unconstrained': 'branch length', 'constrained': 'branch length', 'MG94xREV with separate rates for branch sets': 'branch length', 'original name': 'node label'}
-
extract_branch_attribute(attribute_name, partition=None)¶
Return a dictionary of attributes for the given attribute, where keys are nodes and values are attributes. If there are multiple partitions, the default returns a dictionary with all partitions. If partition = [some integer], only the attribute for the given partition will be returned. NOTE: PARTITION STARTS FROM 0.
Importantly, the values for all returned dictionaries will be strings, except for the extraction of rate distributions.
- Required positional arguments:
- attribute_name, the name of the attribute to obtain. Attribute names available can be revealed with the method .reveal_branch_attributes().
- Optional keyword arguments:
- partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.
Examples:
>>> e = Extractor("/path/to/FEL.json") ## Define a FEL Extractor, for example >>> e.extract_branch_attribute("Nucleotide GTR") ## branches lengths {'Horse': '0.209139911487', 'Node12': '0.0178341148216', 'Cow': '0.248286674829', 'Chimp': '0.00181779097957', 'RhMonkey': '0.00377365885129', 'Pig': '0.187127383086', 'Node9': '0.0256769899145', 'Node8': '0.106120848179', 'Rat': '0.0666961080592', 'Node3': '0.0989071298032', 'Human': '0', 'Node1': '0.277289433172', 'Cat': '0.266103366998', 'Node2': '0.0661858336662', 'Mouse': '0.118170595693', 'Baboon': '0.0016809649281'}
>>> e = Extractor("/path/to/ABSREL.json") ## Define an ABSREL Extractor, for example >>> e.extract_branch_attribute("Rate classes") ## Number of inferred rate classes per node {'0557_7': '1', '0557_4': '1', 'Node29': '1', '0564_13': '1', 'Node25': '1', 'Node20': '1', 'Node23': '1', '0557_11': '1', '0557_12': '1', '0557_13': '1', '0564_22': '1', '0564_21': '1', '0564_15': '2', 'Node9': '1', '0564_1': '1', '0564_3': '2', 'Separator': '2', '0564_5': '1', '0564_6': '1', '0564_7': '1', '0564_9': '1', '0557_24': '1', 'Node7': '1', 'Node6': '1', '0557_9': '1', 'Node17': '1', 'Node16': '1', 'Node19': '1', 'Node32': '1', 'Node30': '1', '0557_6': '1', 'Node36': '1', 'Node35': '2', '0557_5': '1', '0557_2': '1', '0564_11': '2', '0564_17': '1', 'Node18': '1', '0557_25': '1', '0564_4': '2', 'Node8': '1', '0557_26': '1', '0557_21': '1', 'Node53': '1'}
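Because the returned values are strings (see note above), a short sketch (JSON path hypothetical) for converting an extracted attribute to floats before doing any arithmetic:
>>> e = Extractor("/path/to/FEL.json")
>>> branch_lengths = e.extract_branch_attribute("Nucleotide GTR")
>>> branch_lengths = {node: float(value) for node, value in branch_lengths.items()}   ## cast string values to floats
>>> total_tree_length = sum(branch_lengths.values())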
-
map_branch_attribute(attribute_name, original_names=False, partition=None)¶
Return the newick phylogeny with the specified attribute mapped into the phylogeny as branch lengths. If there are multiple partitions, the default returns a dictionary of mapped trees for all partitions. If partition is specified, only the attribute for the given partition will be returned. NOTE: PARTITION STARTS FROM 0.
- Required positional arguments:
- attribute_name, the name of the attribute to obtain. Attribute names available can be revealed with the method .reveal_branch_attributes().
- Optional keyword arguments:
- partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.
- original_names, reformat the tree with the original names (as opposed to hyphy-friendly names with forbidden characters replaced). In most cases hyphy and original names are identical. Default: False.
Examples:
>>> e = Extractor("/path/to/ABSREL.json") ## Define an aBSREL Extractor, for example >>> e.map_branch_attribute("Rate classes") ## number of inferred rate classes, as branch lengths (0564_7:1,(((((0564_11:2,0564_4:2)Node20:1,(0564_1:1,(0564_21:1,0564_5:1)Node25:1)Node23:1)Node19:1,0564_17:1)Node18:1,((0564_13:1,(0564_15:2)Node32:1)Node30:1,((0564_22:1,0564_6:1)Node36:1,0564_3:2)Node35:2)Node29:1)Node17:1,0564_9:1)Node16:1,(((0557_24:1,0557_4:1,0557_2:1)Node9:1,0557_12:1)Node8:1,((0557_21:1,0557_6:1,0557_9:1,0557_11:1,0557_13:1,0557_26:1,(0557_5:1,0557_7:1)Node53:1)Node6:1,0557_25:1)Node7:1)Separator:2);
-
extract_model_tree(model, partition=None, original_names=False)¶
Return newick phylogeny fitted to a certain model, i.e. with branch lengths optimized for the specified model. This is just a special case of map_branch_attribute.
- Required positional arguments:
- model, the name of the model whose optimized tree you wish to obtain. Model names available can be revealed with the method .reveal_fitted_models().
- Optional keyword arguments:
- partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.
- original_names, reformat the tree with the original names (as opposed to hyphy-friendly names with forbidden characters replaced). In most cases hyphy and original names are identical. Default: False.
Examples:
>>> ### Define a FEL Extractor, for example >>> e = Extractor("/path/to/FEL.json") >>> e.extract_model_tree("Global MG94xREV") ((((Pig:0.192554792971,Cow:0.247996722936)Node3:0.101719189407,Horse:0.211310618381,Cat:0.273732369855)Node2:0.0644249932769,((RhMonkey:0.00372054481786,Baboon:0.0017701670358)Node9:0.0259206344918,(Human:0,Chimp:0.00182836999996)Node12:0.0178636195889)Node8:0.109431753602)Node1:0.284434196447,Rat:0.0670087588444,Mouse:0.120166947697);
>>> ### Use original names rather than HyPhy-reformatted names >>> e.extract_model_tree("Global MG94xREV", original_names = True) ((((Pig~gy:0.192554792971,Cow:0.247996722936)Node3:0.101719189407,Horse:0.211310618381,Cat:0.273732369855)Node2:0.0644249932769,((RhMonkey:0.00372054481786,Baboon:0.0017701670358)Node9:0.0259206344918,(Human:0,Chimp:0.00182836999996)Node12:0.0178636195889)Node8:0.109431753602)Node1:0.284434196447,Rat:0.0670087588444,Mouse:0.120166947697);
>>> ### Define a FEL Extractor, from an analysis with multiple partitions, for example >>> e = Extractor("/path/to/FEL_mulitpart.json") >>> e.extract_model_tree("Global MG94xREV", partition = 1) ## specify only one partition (((((AF231119:0.00272571804934,AF231115:0)Node4:0,((AF082576:0.00274243126371,AF231113:0)Node8:0.0027139677452,AF231114:0.011078118042)Node7:0.00276605108624)Node3:0.00271644261188,(AF231117:0.00298921107219,AF231118:0.0505182782033)Node12:0.00258521327296)Node2:0.00550172127052,(AF186242:0,(AF186243:0,AF234767:0.0224059556982)Node17:0.00273365779956)Node15:0.00270941747926)Node1:0,(AF186241:0.00270936341991,AF231116:0)Node20:0,AF187824:0.00546772497238);
-
extract_absrel_tree(original_names=False, update_branch_lengths=None, p=0.05, labels=None)¶
Return newick phylogeny in Extended Newick Format (ete-style features) with selection indicators (Default is 0 for not selected, 1 for selected) at the specified p threshold. aBSREL only.
- Optional keyword arguments:
- original_names, reformat the tree with the original names (as opposed to hyphy-friendly names with forbidden characters replaced). In most cases hyphy and original names are identical. Default: False.
- update_branch_lengths, string model name, indicating that branch lengths should be replaced with the given model fit’s optimized lengths. Default: None.
- p, the p-value threshold for calling selection. Default: 0.05
- labels, A tuple of labels to use for (selected, not selected). Default is (1,0)
Examples:
>>> ### Define an ABSREL Extractor >>> e = Extractor("/path/to/ABSREL.json")
>>> ### Add extended-newick format labels of selection with default labels. >>> ### Note this example happens not to have branch lengths in the input tree. >>> e.extract_absrel_tree() (0564_7:1[&&NHX:Selected=0],(((((0564_11:1[&&NHX:Selected=0],0564_4:1[&&NHX:Selected=0])Node20:1[&&NHX:Selected=0],(0564_1:1[&&NHX:Selected=0],(0564_21:1[&&NHX:Selected=0],0564_5:1[&&NHX:Selected=0])Node25:1[&&NHX:Selected=0])Node23:1[&&NHX:Selected=0])Node19:1[&&NHX:Selected=0],0564_17:1[&&NHX:Selected=0])Node18:1[&&NHX:Selected=0],((0564_13:1[&&NHX:Selected=0],(0564_15:1[&&NHX:Selected=0])Node32:1[&&NHX:Selected=0])Node30:1[&&NHX:Selected=0],((0564_22:1[&&NHX:Selected=0],0564_6:1[&&NHX:Selected=0])Node36:1[&&NHX:Selected=0],0564_3:1[&&NHX:Selected=1])Node35:1[&&NHX:Selected=1])Node29:1[&&NHX:Selected=0])Node17:1[&&NHX:Selected=0],0564_9:1[&&NHX:Selected=0])Node16:1[&&NHX:Selected=0],(((0557_24:1[&&NHX:Selected=0],0557_4:1[&&NHX:Selected=0],0557_2:1[&&NHX:Selected=0])Node9:1[&&NHX:Selected=0],0557_12:1[&&NHX:Selected=0])Node8:1[&&NHX:Selected=0],((0557_21:1[&&NHX:Selected=0],0557_6:1[&&NHX:Selected=0],0557_9:1[&&NHX:Selected=0],0557_11:1[&&NHX:Selected=0],0557_13:1[&&NHX:Selected=0],0557_26:1[&&NHX:Selected=0],(0557_5:1[&&NHX:Selected=0],0557_7:1[&&NHX:Selected=0])Node53:1[&&NHX:Selected=0])Node6:1[&&NHX:Selected=0],0557_25:1[&&NHX:Selected=0])Node7:1[&&NHX:Selected=0])Separator:1[&&NHX:Selected=1])[&&NHX:Selected=0];
>>> ### Add extended-newick format labels of selection with default labels, with branch lengths updated as the adaptive model >>> e.extract_absrel_tree(update_branch_lengths="Full adaptive model") (0564_7:0.00708844[&&NHX:Selected=0],(((((0564_11:0.00527268[&&NHX:Selected=0],0564_4:0.00714182[&&NHX:Selected=0])Node20:0.0022574[&&NHX:Selected=0],(0564_1:0.00583239[&&NHX:Selected=0],(0564_21:0.00121537[&&NHX:Selected=0],0564_5:0.00266921[&&NHX:Selected=0])Node25:0.000797211[&&NHX:Selected=0])Node23:0.00142056[&&NHX:Selected=0])Node19:0.0019147[&&NHX:Selected=0],0564_17:0.00605582[&&NHX:Selected=0])Node18:0.00100178[&&NHX:Selected=0],((0564_13:0.0053066[&&NHX:Selected=0],(0564_15:0.00346989[&&NHX:Selected=0])Node32:0.000752206[&&NHX:Selected=0])Node30:0.00188243[&&NHX:Selected=0],((0564_22:0.00686981[&&NHX:Selected=0],0564_6:0.00581523[&&NHX:Selected=0])Node36:0.00125905[&&NHX:Selected=0],0564_3:0.00791919[&&NHX:Selected=1])Node35:0.0174886[&&NHX:Selected=1])Node29:0.0010489[&&NHX:Selected=0])Node17:0.00156911[&&NHX:Selected=0],0564_9:0.00551506[&&NHX:Selected=0])Node16:0.000783733[&&NHX:Selected=0],(((0557_24:0.00078793[&&NHX:Selected=0],0557_4:0.000787896[&&NHX:Selected=0],0557_2:0.000399166[&&NHX:Selected=0])Node9:0.00206483[&&NHX:Selected=0],0557_12:0.00267531[&&NHX:Selected=0])Node8:0.00118205[&&NHX:Selected=0],((0557_21:0[&&NHX:Selected=0],0557_6:0.000391941[&&NHX:Selected=0],0557_9:0.000402021[&&NHX:Selected=0],0557_11:0.00156985[&&NHX:Selected=0],0557_13:0.000401742[&&NHX:Selected=0],0557_26:0.00079377[&&NHX:Selected=0],(0557_5:0.00117641[&&NHX:Selected=0],0557_7:0[&&NHX:Selected=0])Node53:0.000391973[&&NHX:Selected=0])Node6:0.00118062[&&NHX:Selected=0],0557_25:0.00220372[&&NHX:Selected=0])Node7:0.00103489[&&NHX:Selected=0])Separator:0.00822051[&&NHX:Selected=1])[&&NHX:Selected=0];
>>> ### Add extended-newick format labels of selection with *custom* labels, with branch lengths updated as the adaptive model >>> e.extract_absrel_tree(update_branch_lengths="Full adaptive model", labels=["no", "yes"]) (0564_7:0.00708844[&&NHX:Selected=yes],(((((0564_11:0.00527268[&&NHX:Selected=yes],0564_4:0.00714182[&&NHX:Selected=yes])Node20:0.0022574[&&NHX:Selected=yes],(0564_1:0.00583239[&&NHX:Selected=yes],(0564_21:0.00121537[&&NHX:Selected=yes],0564_5:0.00266921[&&NHX:Selected=yes])Node25:0.000797211[&&NHX:Selected=yes])Node23:0.00142056[&&NHX:Selected=yes])Node19:0.0019147[&&NHX:Selected=yes],0564_17:0.00605582[&&NHX:Selected=yes])Node18:0.00100178[&&NHX:Selected=yes],((0564_13:0.0053066[&&NHX:Selected=yes],(0564_15:0.00346989[&&NHX:Selected=yes])Node32:0.000752206[&&NHX:Selected=yes])Node30:0.00188243[&&NHX:Selected=yes],((0564_22:0.00686981[&&NHX:Selected=yes],0564_6:0.00581523[&&NHX:Selected=yes])Node36:0.00125905[&&NHX:Selected=yes],0564_3:0.00791919[&&NHX:Selected=no])Node35:0.0174886[&&NHX:Selected=no])Node29:0.0010489[&&NHX:Selected=yes])Node17:0.00156911[&&NHX:Selected=yes],0564_9:0.00551506[&&NHX:Selected=yes])Node16:0.000783733[&&NHX:Selected=yes],(((0557_24:0.00078793[&&NHX:Selected=yes],0557_4:0.000787896[&&NHX:Selected=yes],0557_2:0.000399166[&&NHX:Selected=yes])Node9:0.00206483[&&NHX:Selected=yes],0557_12:0.00267531[&&NHX:Selected=yes])Node8:0.00118205[&&NHX:Selected=yes],((0557_21:0[&&NHX:Selected=yes],0557_6:0.000391941[&&NHX:Selected=yes],0557_9:0.000402021[&&NHX:Selected=yes],0557_11:0.00156985[&&NHX:Selected=yes],0557_13:0.000401742[&&NHX:Selected=yes],0557_26:0.00079377[&&NHX:Selected=yes],(0557_5:0.00117641[&&NHX:Selected=yes],0557_7:0[&&NHX:Selected=yes])Node53:0.000391973[&&NHX:Selected=yes])Node6:0.00118062[&&NHX:Selected=yes],0557_25:0.00220372[&&NHX:Selected=yes])Node7:0.00103489[&&NHX:Selected=yes])Separator:0.00822051[&&NHX:Selected=no])[&&NHX:Selected=0];
>>> ### Add extended-newick format labels of selection with default labels, using a P-threshold of 0.3 >>> e.extract_absrel_tree(p=0.3) (0564_7:1[&&NHX:Selected=1],(((((0564_11:1[&&NHX:Selected=0],0564_4:1[&&NHX:Selected=0])Node20:1[&&NHX:Selected=0],(0564_1:1[&&NHX:Selected=0],(0564_21:1[&&NHX:Selected=0],0564_5:1[&&NHX:Selected=0])Node25:1[&&NHX:Selected=0])Node23:1[&&NHX:Selected=0])Node19:1[&&NHX:Selected=0],0564_17:1[&&NHX:Selected=0])Node18:1[&&NHX:Selected=0],((0564_13:1[&&NHX:Selected=0],(0564_15:1[&&NHX:Selected=0])Node32:1[&&NHX:Selected=0])Node30:1[&&NHX:Selected=0],((0564_22:1[&&NHX:Selected=0],0564_6:1[&&NHX:Selected=0])Node36:1[&&NHX:Selected=0],0564_3:1[&&NHX:Selected=1])Node35:1[&&NHX:Selected=1])Node29:1[&&NHX:Selected=0])Node17:1[&&NHX:Selected=0],0564_9:1[&&NHX:Selected=0])Node16:1[&&NHX:Selected=0],(((0557_24:1[&&NHX:Selected=0],0557_4:1[&&NHX:Selected=0],0557_2:1[&&NHX:Selected=0])Node9:1[&&NHX:Selected=0],0557_12:1[&&NHX:Selected=0])Node8:1[&&NHX:Selected=0],((0557_21:1[&&NHX:Selected=0],0557_6:1[&&NHX:Selected=0],0557_9:1[&&NHX:Selected=0],0557_11:1[&&NHX:Selected=0],0557_13:1[&&NHX:Selected=0],0557_26:1[&&NHX:Selected=0],(0557_5:1[&&NHX:Selected=0],0557_7:1[&&NHX:Selected=0])Node53:1[&&NHX:Selected=0])Node6:1[&&NHX:Selected=0],0557_25:1[&&NHX:Selected=0])Node7:1[&&NHX:Selected=0])Separator:1[&&NHX:Selected=1])[&&NHX:Selected=0];
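Because the output is NHX-formatted, it can be loaded directly with a tree library. A minimal sketch, assuming the ete3 package is installed (ete3 is not required by phyphy itself), that lists the branches flagged as selected:
>>> from ete3 import Tree
>>> e = Extractor("/path/to/ABSREL.json")
>>> nhx = e.extract_absrel_tree()
>>> tree = Tree(nhx, format = 1)   ## newick format 1 retains internal node names
>>> selected = [node.name for node in tree.traverse() if getattr(node, "Selected", "0") == "1"]   ## NHX features are parsed as strings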
-
extract_feature_tree(feature, original_names=False, update_branch_lengths=None, partition=None)¶
Return newick phylogeny in Extended Newick Format (ete-style features) with the specified feature(s).
- Required positional arguments:
- feature, The feature(s) to be included in the final tree. This is either a string of a feature, or a list of features. Features are taken from attributes. Note that the exported feature label will have all spaces removed.
- Optional keyword arguments:
- update_branch_lengths, string model name, indicating that branch lengths should be replaced with the given model fit’s optimized lengths. Default: None.
- partition, Integer indicating which partition’s tree to return (as a string) if multiple partitions exist. NOTE: PARTITIONS ARE ORDERED FROM 0. This argument is ignored for single-partitioned analyses.
Examples:
>>> ### Define an ABSREL Extractor >>> e = Extractor("/path/to/ABSREL.json")
>>> ### Add a single feature, rate classes >>> ### Note this example happens to have no branch lengths >>> e.extract_feature_tree("Rate classes") (0564_7:1[&&NHX:Rateclasses=1],(((((0564_11:1[&&NHX:Rateclasses=2],0564_4:1[&&NHX:Rateclasses=2])Node20:1[&&NHX:Rateclasses=1],(0564_1:1[&&NHX:Rateclasses=1],(0564_21:1[&&NHX:Rateclasses=1],0564_5:1[&&NHX:Rateclasses=1])Node25:1[&&NHX:Rateclasses=1])Node23:1[&&NHX:Rateclasses=1])Node19:1[&&NHX:Rateclasses=1],0564_17:1[&&NHX:Rateclasses=1])Node18:1[&&NHX:Rateclasses=1],((0564_13:1[&&NHX:Rateclasses=1],(0564_15:1[&&NHX:Rateclasses=2])Node32:1[&&NHX:Rateclasses=1])Node30:1[&&NHX:Rateclasses=1],((0564_22:1[&&NHX:Rateclasses=1],0564_6:1[&&NHX:Rateclasses=1])Node36:1[&&NHX:Rateclasses=1],0564_3:1[&&NHX:Rateclasses=2])Node35:1[&&NHX:Rateclasses=2])Node29:1[&&NHX:Rateclasses=1])Node17:1[&&NHX:Rateclasses=1],0564_9:1[&&NHX:Rateclasses=1])Node16:1[&&NHX:Rateclasses=1],(((0557_24:1[&&NHX:Rateclasses=1],0557_4:1[&&NHX:Rateclasses=1],0557_2:1[&&NHX:Rateclasses=1])Node9:1[&&NHX:Rateclasses=1],0557_12:1[&&NHX:Rateclasses=1])Node8:1[&&NHX:Rateclasses=1],((0557_21:1[&&NHX:Rateclasses=1],0557_6:1[&&NHX:Rateclasses=1],0557_9:1[&&NHX:Rateclasses=1],0557_11:1[&&NHX:Rateclasses=1],0557_13:1[&&NHX:Rateclasses=1],0557_26:1[&&NHX:Rateclasses=1],(0557_5:1[&&NHX:Rateclasses=1],0557_7:1[&&NHX:Rateclasses=1])Node53:1[&&NHX:Rateclasses=1])Node6:1[&&NHX:Rateclasses=1],0557_25:1[&&NHX:Rateclasses=1])Node7:1[&&NHX:Rateclasses=1])Separator:1[&&NHX:Rateclasses=2])[&&NHX:Rateclasses=0];;
>>> ### Add a single feature, rate classes, with updated branch lengths >>> e.extract_feature_tree("Rate classes",update_branch_lengths = "Nucleotide GTR") (0564_7:0.00664844[&&NHX:Rateclasses=1],(((((0564_11:0.00434881[&&NHX:Rateclasses=2],0564_4:0.00593219[&&NHX:Rateclasses=2])Node20:0.0026739[&&NHX:Rateclasses=1],(0564_1:0.00559179[&&NHX:Rateclasses=1],(0564_21:0.00124334[&&NHX:Rateclasses=1],0564_5:0.00259957[&&NHX:Rateclasses=1])Node25:0.000863062[&&NHX:Rateclasses=1])Node23:0.00149918[&&NHX:Rateclasses=1])Node19:0.00164262[&&NHX:Rateclasses=1],0564_17:0.00600417[&&NHX:Rateclasses=1])Node18:0.00100639[&&NHX:Rateclasses=1],((0564_13:0.00534732[&&NHX:Rateclasses=1],(0564_15:0.00278489[&&NHX:Rateclasses=2])Node32:0.000565591[&&NHX:Rateclasses=1])Node30:0.00196314[&&NHX:Rateclasses=1],((0564_22:0.00686685[&&NHX:Rateclasses=1],0564_6:0.00554135[&&NHX:Rateclasses=1])Node36:0.000954793[&&NHX:Rateclasses=1],0564_3:0.00652918[&&NHX:Rateclasses=2])Node35:0.00195507[&&NHX:Rateclasses=2])Node29:0.001029[&&NHX:Rateclasses=1])Node17:0.00155756[&&NHX:Rateclasses=1],0564_9:0.00533212[&&NHX:Rateclasses=1])Node16:0.000567283[&&NHX:Rateclasses=1],(((0557_24:0.000770978[&&NHX:Rateclasses=1],0557_4:0.000770929[&&NHX:Rateclasses=1],0557_2:0.000385454[&&NHX:Rateclasses=1])Node9:0.00199212[&&NHX:Rateclasses=1],0557_12:0.00265769[&&NHX:Rateclasses=1])Node8:0.00119139[&&NHX:Rateclasses=1],((0557_21:0[&&NHX:Rateclasses=1],0557_6:0.000388349[&&NHX:Rateclasses=1],0557_9:0.000388411[&&NHX:Rateclasses=1],0557_11:0.00155424[&&NHX:Rateclasses=1],0557_13:0.000388353[&&NHX:Rateclasses=1],0557_26:0.000776809[&&NHX:Rateclasses=1],(0557_5:0.00116512[&&NHX:Rateclasses=1],0557_7:0[&&NHX:Rateclasses=1])Node53:0.000388349[&&NHX:Rateclasses=1])Node6:0.00118273[&&NHX:Rateclasses=1],0557_25:0.00219124[&&NHX:Rateclasses=1])Node7:0.000940406[&&NHX:Rateclasses=1])Separator:0.00689203[&&NHX:Rateclasses=2])[&&NHX:Rateclasses=0];"
>>> ### Add a multiple features with updated branch lengths >>> e.extract_feature_tree(["Rate classes", "LRT", update_branch_lengths = "Nucleotide GTR") (0564_7:0.00664844[&&NHX:Rateclasses=1:LRT=0],(((((0564_11:0.00434881[&&NHX:Rateclasses=2:LRT=3.96048976951],0564_4:0.00593219[&&NHX:Rateclasses=2:LRT=4.80587881259])Node20:0.0026739[&&NHX:Rateclasses=1:LRT=3.30060030447],(0564_1:0.00559179[&&NHX:Rateclasses=1:LRT=0.0105269546166],(0564_21:0.00124334[&&NHX:Rateclasses=1:LRT=0],0564_5:0.00259957[&&NHX:Rateclasses=1:LRT=4.51927751707])Node25:0.000863062[&&NHX:Rateclasses=1:LRT=0])Node23:0.00149918[&&NHX:Rateclasses=1:LRT=0])Node19:0.00164262[&&NHX:Rateclasses=1:LRT=0.153859379099],0564_17:0.00600417[&&NHX:Rateclasses=1:LRT=0])Node18:0.00100639[&&NHX:Rateclasses=1:LRT=1.64667962972],((0564_13:0.00534732[&&NHX:Rateclasses=1:LRT=0],(0564_15:0.00278489[&&NHX:Rateclasses=2:LRT=4.97443221859])Node32:0.000565591[&&NHX:Rateclasses=1:LRT=0])Node30:0.00196314[&&NHX:Rateclasses=1:LRT=2.86518293899],((0564_22:0.00686685[&&NHX:Rateclasses=1:LRT=0.114986865638],0564_6:0.00554135[&&NHX:Rateclasses=1:LRT=0])Node36:0.000954793[&&NHX:Rateclasses=1:LRT=0],0564_3:0.00652918[&&NHX:Rateclasses=2:LRT=14.0568340492])Node35:0.00195507[&&NHX:Rateclasses=2:LRT=22.65142315])Node29:0.001029[&&NHX:Rateclasses=1:LRT=1.50723222708])Node17:0.00155756[&&NHX:Rateclasses=1:LRT=2.63431127725],0564_9:0.00533212[&&NHX:Rateclasses=1:LRT=0])Node16:0.000567283[&&NHX:Rateclasses=1:LRT=0],(((0557_24:0.000770978[&&NHX:Rateclasses=1:LRT=0],0557_4:0.000770929[&&NHX:Rateclasses=1:LRT=0],0557_2:0.000385454[&&NHX:Rateclasses=1:LRT=0])Node9:0.00199212[&&NHX:Rateclasses=1:LRT=0],0557_12:0.00265769[&&NHX:Rateclasses=1:LRT=0])Node8:0.00119139[&&NHX:Rateclasses=1:LRT=1.99177154715],((0557_21:0[&&NHX:Rateclasses=1:LRT=0],0557_6:0.000388349[&&NHX:Rateclasses=1:LRT=0.662153642024],0557_9:0.000388411[&&NHX:Rateclasses=1:LRT=0],0557_11:0.00155424[&&NHX:Rateclasses=1:LRT=2.65063418443],0557_13:0.000388353[&&NHX:Rateclasses=1:LRT=0],0557_26:0.000776809[&&NHX:Rateclasses=1:LRT=0],(0557_5:0.00116512[&&NHX:Rateclasses=1:LRT=1.98893541781],0557_7:0[&&NHX:Rateclasses=1:LRT=0])Node53:0.000388349[&&NHX:Rateclasses=1:LRT=0.660753167251])Node6:0.00118273[&&NHX:Rateclasses=1:LRT=0],0557_25:0.00219124[&&NHX:Rateclasses=1:LRT=0])Node7:0.000940406[&&NHX:Rateclasses=1:LRT=1.69045394756])Separator:0.00689203[&&NHX:Rateclasses=2:LRT=14.127483568])[&&NHX:Rateclasses=0:LRT=0];
-
reveal_fields()¶
Return a list of top-level JSON fields.
Examples:
>>> ### Define an ABSREL Extractor >>> e = Extractor("/path/to/ABSREL.json")
>>> ### Reveal all fields >>> e.reveal_fields() ['branch attributes', 'analysis', 'tested', 'data partitions', 'timers', 'fits', 'input', 'test results']
-
extract_csv(csv, delim=', ', original_names=True, slac_ancestral_type='AVERAGED')¶
- Extract a CSV from JSON, for certain methods:
- FEL
- SLAC
- MEME
- FUBAR
- LEISR
- aBSREL
- Output for each:
- FEL
- site, the site of interest
- alpha, Synonymous substitution rate at a site
- beta, Non-synonymous substitution rate at a site
- alpha=beta, The rate estimate under the neutral model
- LRT, Likelihood ratio test statistic for beta = alpha, vs. alternative beta != alpha
- p-value, P-value from the likelihood ratio test
- Total branch length, The total length of branches contributing to inference at this site
- SLAC
- site, the site of interest
- ES, Expected synonymous sites
- EN, Expected non-synonymous sites
- S, Inferred synonymous substitutions
- N, Inferred non-synonymous substitutions
- P[S], Expected proportion of synonymous sites
- dS, Inferred synonymous substitution rate
- dN, Inferred non-synonymous substitution rate
- dN-dS, Scaled by the length of the tested branches
- P_[dN/dS_>_1], Binomial probability that S is no greater than the observed value, with P_s probability of success
- P_[dN/dS_<_1], Binomial probability that S is no less than the observed value, with P_s probability of success
- Total branch length, The total length of branches contributing to inference at this site, and used to scale dN-dS
- MEME
- site, the site of interest
- alpha, Synonymous substitution rate at a site
- beta_neg, Non-synonymous substitution rate at a site for the negative/neutral evolution component
- prop_beta_neg, Mixture distribution weight allocated to beta_neg; loosely, the proportion of the tree evolving neutrally or under negative selection
- beta_pos, Non-synonymous substitution rate at a site for the positive/neutral evolution component
- prop_beta_pos, Mixture distribution weight allocated to beta_pos; loosely, the proportion of the tree evolving neutrally or under positive selection
- LRT, Likelihood ratio test statistic for episodic diversification
- p-value, Asymptotic p-value for episodic diversification
- num_branches_under_selection, The (very approximate and rough) estimate of how many branches may have been under selection at this site, i.e., had an empirical Bayes factor of 100 or more for the beta_pos rate
- Total_branch_length, The total length of branches contributing to inference at this site
- FUBAR
- site, the site of interest
- alpha, Mean posterior synonymous substitution rate at a site
- beta, Mean posterior non-synonymous substitution rate at a site
- beta-alpha, Mean posterior beta-alpha
- Prob[alpha>beta], Posterior probability of negative selection at a site
- Prob[alpha<beta], Posterior probability of positive selection at a site
- BayesFactor[alpha<beta], Empirical Bayes Factor for positive selection at a site
- PSRF, Potential scale reduction factor, an MCMC mixing measure
- Neff, Estimated effective sample size for Prob[alpha<beta]
- LEISR
- site, the site of interest
- MLE, Relative rate estimate at a site
- Lower, Lower bound of the 95% profile likelihood CI
- Upper, Upper bound of the 95% profile likelihood CI
- aBSREL
- node, Node name of interest
- baseline_omega, Baseline omega estimate under the MG94xREV model
- number_rate_classes, Number of final rate classes inferred by the adaptive model
- tested, Was this branch tested for selection? 0/1
- prop_sites_selected, Proportion of selected sites along the branch
- LRT, Likelihood ratio test statistic
- uncorrected_P, Uncorrected P-value associated with LRT
- corrected_P, FDR-corrected P-value for selection along this branch
- Required positional arguments:
- csv, File name for output CSV
- Optional keyword arguments:
- delim, A different delimiter for the output, e.g. "\t" for a tab-delimited file
- original_names, An aBSREL-specific Boolean argument to indicate whether HyPhy-reformatted branch names should be used in the output CSV (False), or the original names as present in the input alignment should be used (True). Default: True
- slac_ancestral_type, A SLAC-specific argument, either “AVERAGED” (Default) or “RESOLVED” (case insensitive), to indicate which type of ancestral counting the reported results should be calculated from.
Examples:
>>> ### Define an ABSREL Extractor, for example >>> e = Extractor("/path/to/ABSREL.json") >>> e.extract_csv("absrel.csv") >>> ## With original names >>> e.extract_csv("absrel.csv", original_names=True) >>> ## As tsv >>> e.extract_csv("absrel.csv", delim="\t")
>>> ### Define a FEL Extractor, for example >>> e = Extractor("/path/to/FEL.json") >>> e.extract_csv("fel.csv")
>>> ### Define a SLAC Extractor, for example >>> e = Extractor("/path/to/SLAC.json") >>> e.extract_csv("slac.csv") >>> ### Specify to export ancestral RESOLVED inferences >>> e.extract_csv("slac.csv", slac_ancestral_type = "RESOLVED")
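A follow-up sketch for working with the exported file, assuming a comma-delimited export and that the column headers match the FEL field names listed above (both are assumptions, not guarantees of the CSV layout):
>>> import csv
>>> e = Extractor("/path/to/FEL.json")
>>> e.extract_csv("fel.csv")
>>> with open("fel.csv") as handle:
...     rows = list(csv.DictReader(handle, skipinitialspace = True))   ## skipinitialspace guards against spaces after delimiters
>>> significant_sites = [row["site"] for row in rows if float(row["p-value"]) < 0.05]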
-
extract_timers()¶
Extract a dictionary of timers, with display order removed.
Examples:
>>> ### Define an ABSREL Extractor, for example >>> e = Extractor("/path/to/ABSREL.json") >>> e.extract_timers() {'Full adaptive model fitting': 13.0, 'Preliminary model fitting': 1.0, 'Overall': 451.0, 'Testing for selection': 236.0, 'Baseline model fitting': 7.0, 'Complexity analysis': 193.0}
-
extract_site_logl()¶
Extract BUSTED site log likelihoods, as a dictionary.
Examples:
>>> ### Define a BUSTED Extractor >>> e = Extractor("/path/to/BUSTED.json") >>> e.extract_site_logl() ## output below abbreviated for visual purposes {'unconstrained': [-3.820084348560567, -5.294209244108775, ....], 'constrained': [-3.816130970827346, -5.290292412948827, -3.740077801446913, ...], 'optimized null': [-3.819912933488241, -5.292549093626999, -3.743404692680405, ...]}
-
extract_evidence_ratios()¶
Extract BUSTED evidence ratios, as a dictionary.
Examples:
>>> ### Define a BUSTED Extractor >>> e = Extractor("/path/to/BUSTED.json") >>> e.extract_evidence_ratios() ## output below abbreviated for visual purposes {'constrained': [0.9960544265766805, 0.9960908296179645, 0.9962555861651011, ...], 'optimized null': [0.9998285996183979, 0.9983412268057609, 0.9995755396409283, ...]}
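A closing sketch (JSON path and cutoff are hypothetical, and no statistical interpretation is attached to the threshold) showing how the per-site lists in the returned dictionary can be scanned for sites whose constrained evidence ratio exceeds a user-chosen value:
>>> e = Extractor("/path/to/BUSTED.json")
>>> ratios = e.extract_evidence_ratios()
>>> threshold = 100.   ## hypothetical cutoff; choose a value appropriate for your analysis
>>> flagged_sites = [site for site, er in enumerate(ratios["constrained"]) if er >= threshold]   ## 0-based site indices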