Functional Ecological Genomics 2017

Materials for Stephanie Spielman's sessions

Detecting selection in protein-coding sequences



Overview

This session will address the question: Given a multiple sequence alignment containing protein-coding sequences and a corresponding phylogeny, what kinds of questions can we ask about natural selection?

We will investigate signatures of natural selection in protein-coding sequence data using HyPhy, via the Datamonkey webserver.

Note: Newer analyses, such as BUSTED, aBSREL, and RELAX, are available only on the development Datamonkey webserver (test.datamonkey.org).

Background

These papers provide excellent overviews (with technical details as well) of phylogenetic codon models:

Materials

You can download slides for today’s session here.

Data

We will investigate selection using two example datasets:

  1. A dataset of 10 mammalian CD2 orthologs.
  2. A dataset containing Borna disease virus sequences from both endogenous and closely-related “free-living” viruses.

Protip: This Python script is a useful wrapper for preparing your data for HyPhy input. It is written in Python 2 and requires the aligner MAFFT and the phylogeny reconstruction program FastTree to be installed and available on your PATH.
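Because the wrapper fails if its external tools are missing, it can help to confirm they are on your PATH before running it. The sketch below is not part of the wrapper script; `missing_tools` is a hypothetical helper, and the executable names are assumptions (depending on how it was installed, FastTree may be called `fasttree` or `FastTreeMP` instead). It targets Python 3, since `shutil.which` does not exist in Python 2.

```python
import shutil

# Executable names assumed here; your install may use a different name
# (for example "fasttree" or "FastTreeMP" instead of "FastTree").
REQUIRED_TOOLS = ("mafft", "FastTree")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    # shutil.which returns None when an executable is not found on PATH
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If a tool is reported missing but is installed, pass its actual executable name, e.g. `missing_tools(["mafft", "fasttree"])`.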

Note that you can use this online widget to visualize and annotate phylogenies.
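Some branch-level analyses (for example, the BUSTED "Subset FG" run in the results table) test only a labeled set of foreground branches. The widget is the easiest way to mark them, but for large trees a script can help. The sketch below assumes HyPhy's convention of writing a branch annotation in curly braces right after a name in the Newick string (e.g. `Human{FG}`); the label `FG` is borrowed from the results table and may need to match whatever your analysis expects. `label_tips` is a hypothetical helper that labels tips only, not internal branches, and it does not handle quoted Newick names.

```python
import re

# Matches a tip name: text that directly follows "(" or "," in a Newick string.
# Internal node labels follow ")" and are therefore not matched.
TIP_PATTERN = re.compile(r"([(,])\s*([^(),:;{}\[\]\s]+)")


def label_tips(newick, targets, label="FG"):
    """Append a {label} annotation to every tip whose name is in targets.

    Assumes unquoted tip names without spaces.
    """
    def add_label(match):
        delimiter, name = match.group(1), match.group(2)
        if name in targets:
            return f"{delimiter}{name}{{{label}}}"
        return match.group(0)  # leave non-target tips unchanged

    return TIP_PATTERN.sub(add_label, newick)


tree = "((Human:0.1,Chimp:0.2):0.05,Mouse:0.3);"
print(label_tips(tree, {"Human", "Chimp"}))
# ((Human{FG}:0.1,Chimp{FG}:0.2):0.05,Mouse:0.3);
```

The regex approach keeps the example dependency-free; for anything beyond simple tip labeling, a proper Newick parser is the safer choice.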

Results

Method   Input data   Datamonkey server     Result page(s)*
BUSTED   CD2          test.datamonkey.org   Subset FG Results and All FG Results
aBSREL   CD2          test.datamonkey.org   Results
RELAX    Borna        test.datamonkey.org   Results
FEL      CD2          datamonkey.org        Results
MEME     CD2          datamonkey.org        Results

*Note: Some of these result links will expire a few days after the workshop, so please re-run the analyses yourself if you need the results later.
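FEL, listed in the table above, estimates a synonymous rate (alpha, dS) and a nonsynonymous rate (beta, dN) at each codon site and uses a likelihood ratio test (LRT) to ask whether beta differs from alpha. The sketch below shows only the final arithmetic of such a per-site test: it turns the log-likelihoods of the null model (beta = alpha) and the alternative model into an LRT statistic and a p-value from a chi-squared distribution with 1 degree of freedom. It is not HyPhy's code, the log-likelihoods are made-up numbers, and the 0.1 significance threshold is an illustrative choice. MEME uses a different null distribution, so this sketch does not apply to it.

```python
import math


def lrt_chi2_1df(lnl_null, lnl_alt):
    """Return the LRT statistic and its p-value under a chi-squared(1 df) null."""
    # LRT statistic: twice the log-likelihood gain of the alternative model,
    # clamped at 0 in case optimizer noise makes the alternative fit slightly worse
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def classify_site(alpha, beta, p_value, threshold=0.1):
    """Label a site from its rate estimates (alpha = dS, beta = dN)."""
    if p_value > threshold:
        return "not significant"
    return "positive (diversifying)" if beta > alpha else "negative (purifying)"


# Illustrative numbers only, not results from the CD2 dataset
stat, p = lrt_chi2_1df(lnl_null=-1250.40, lnl_alt=-1247.10)
print(f"LRT = {stat:.2f}, p = {p:.4f}")
print(classify_site(alpha=0.8, beta=2.5, p_value=p))
```

Because FEL runs one such test per site, many sites are tested at once; keep that in mind when interpreting individual p-values.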