Detecting selection in protein-coding sequences
Overview
This session addresses the following: given a multiple sequence alignment of protein-coding sequences and a corresponding phylogeny, what can we ask about natural selection?
We will investigate the signatures of natural selection in protein-coding sequence data using HyPhy, via the Datamonkey webserver.
Note: Newer analyses (like BUSTED, aBSREL, and RELAX) are only available via the development Datamonkey webserver (test.datamonkey.org).
Background
These papers provide excellent overviews of phylogenetic codon models, including technical details:
- Investigating Protein-Coding Sequence Evolution with Probabilistic Codon Substitution Models
- Trends in Substitution Models of Molecular Evolution
- For the most thorough treatment, this book is the definitive overview of molecular evolution, including codon models: Computational Molecular Evolution
Materials
You can download slides for today’s session here.
Data
We will investigate selection using two example datasets:
- A dataset of 10 CD2 mammalian orthologs.
  - HyPhy-formatted dataset
  - Link to alignment only
  - Link to tree only
- A dataset containing Borna disease virus sequences from both endogenous and closely-related “free-living” viruses.
  - HyPhy-formatted dataset
  - Link to alignment only
  - Link to tree only
Protip: This Python script is a useful wrapper for preparing your data for HyPhy input. It is written in Python 2 and requires that the aligner MAFFT and the phylogenetic reconstruction program FastTree are installed and available on your PATH.
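To give a feel for what this kind of preparation involves, here is a minimal sketch in Python 3. It is not the wrapper script linked above, and all function and file names are hypothetical. It assumes that `mafft` and `FastTree` are on your PATH, and that a "HyPhy-formatted dataset" is a FASTA alignment with the Newick tree appended at the end of the file. It also aligns nucleotides directly, which can break the reading frame; a codon-aware approach (translate, align, back-translate) is usually preferable.

```python
import subprocess
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into an ordered {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token as the name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def check_codon_alignment(records):
    """List basic problems with a codon alignment.

    These checks are necessary but not sufficient: they do not detect
    gaps that fall inside a codon or premature stop codons.
    """
    problems = []
    if len({len(seq) for seq in records.values()}) > 1:
        problems.append("sequences differ in length")
    for name, seq in records.items():
        if len(seq) % 3 != 0:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
    return problems


def combine(alignment_text, newick_text):
    """Append a Newick tree to a FASTA alignment (assumed combined format)."""
    return alignment_text.rstrip("\n") + "\n" + newick_text.strip() + "\n"


def align_and_build_tree(unaligned_fasta, out_prefix):
    """Run MAFFT, then FastTree; both tools write their result to stdout."""
    aligned = Path(f"{out_prefix}.fasta")
    tree = Path(f"{out_prefix}.nwk")
    with aligned.open("w") as fh:
        subprocess.run(["mafft", "--auto", str(unaligned_fasta)],
                       stdout=fh, check=True)
    with tree.open("w") as fh:
        # -nt tells FastTree that the input is a nucleotide alignment.
        subprocess.run(["FastTree", "-nt", str(aligned)],
                       stdout=fh, check=True)
    return aligned, tree
```

A typical use would be to call `align_and_build_tree` on an unaligned FASTA file, run `check_codon_alignment(read_fasta(...))` on the resulting alignment, and, if no problems are reported, write the output of `combine` to a single file for upload.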
Note that you can use this online widget to visualize and annotate phylogenies.
Results
Method | Input data | Datamonkey Version | Result page(s)* |
---|---|---|---|
BUSTED | CD2 | test.datamonkey.org | Subset FG Results and All FG Results |
aBSREL | CD2 | test.datamonkey.org | Results |
RELAX | Borna | test.datamonkey.org | Results |
FEL | CD2 | datamonkey.org | Results |
MEME | CD2 | datamonkey.org | Results |
*Note: Some of these links will expire a few days after this workshop, so you will need to re-run the analyses to view the results after that.