EvIL- Research

TOAST | Utility Functions

What is in this section

In this section we will cover some utility functions hidden in TOAST.

This section builds on the previous ones, so please review them first if you are not yet comfortable with the other TOAST functions.

These utilities can also be applied across large numbers of files. Please modify the examples as needed for your purposes.

As we develop utilities for manuscripts we will add them here. Likewise, if you have a handy function without a home that you would like to see featured, please let us know!

Converting between alignment filetypes

A chore in bioinformatics/phylogenomics is converting between alignment filetypes.
TOAST has a series of functions to convert between NEXUS, phylip, and fasta formats.

For example, to convert from nexus to fasta, simply use

NexIntoFas(nexusfile=mynexusdata, filename="file_name", externalfile=TRUE, tolower=FALSE)

This function uses an aligned nexus file that is either in your working directory (externalfile=TRUE) or has already been read into memory (externalfile=FALSE).
The tolower=TRUE option converts all text to lower case, which is required if END; and MATRIX are uppercase in your nexus file.
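As a rough way to decide whether tolower=TRUE is needed, the sketch below scans a nexus file for the uppercase keywords mentioned above. The helper has_uppercase_keywords is hypothetical and not part of TOAST; it uses base R only and assumes the keywords sit on their own lines.

```r
# Hypothetical helper (not part of TOAST): report whether a nexus file
# contains a line that is exactly the uppercase keyword MATRIX or END;
has_uppercase_keywords <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # grepl is case-sensitive by default, so lowercase keywords do not match
  any(grepl("^(MATRIX|END;)$", lines))
}

# Small demonstration on a temporary file
demo_file <- tempfile(fileext = ".nex")
writeLines(c(
  "#NEXUS",
  "BEGIN DATA;",
  "DIMENSIONS NTAX=2 NCHAR=4;",
  "FORMAT DATATYPE=DNA MISSING=? GAP=-;",
  "MATRIX",
  "taxon_a ACGT",
  "taxon_b AC-T",
  ";",
  "END;"
), demo_file)

needs_lower <- has_uppercase_keywords(demo_file)
print(needs_lower)
```

The resulting logical could then be supplied as the tolower argument of NexIntoFas. Keep in mind that, as described above, tolower=TRUE lowers the case of all text in the file, not just the keywords.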

Alternatively, you may wish to convert a phylip file to fasta.

The following function uses an aligned phylip file that is either in your working directory (externalfile=TRUE) or has already been read into memory (externalfile=FALSE). It converts the alignment to fasta format and writes it to a file with the user-specified filename.

PhyIntoFasta(phylipfile="myphylipdata", filename="file_name", externalfile=TRUE)

Finally, you may wish to convert from fasta to phylip format. Using similar logic, this can be done as follows

FastaIntoPhy(fastafile=myfastadata, filename="file_name", externalfile=TRUE)
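Before or after a conversion it can be worth checking that a fasta file really is an alignment, since a phylip header records the number of taxa and the number of characters. The sketch below is a base R illustration only; summarise_fasta is a hypothetical helper, not a TOAST function, and it assumes a plain fasta file in which each record starts with ">".

```r
# Hypothetical helper (not part of TOAST): count taxa and check that all
# sequences in a fasta file have the same length.
summarise_fasta <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)              # record index for every line
  # Join wrapped sequence lines belonging to the same record
  seqs <- vapply(
    split(lines[!is_header], record[!is_header]),
    paste, character(1), collapse = ""
  )
  seq_lengths <- unname(nchar(seqs))
  list(
    ntax = sum(is_header),
    nchar = unique(seq_lengths),
    aligned = length(unique(seq_lengths)) == 1
  )
}

# Small demonstration on a temporary file with one wrapped sequence
demo_fasta <- tempfile(fileext = ".fasta")
writeLines(c(
  ">taxon_a", "ACGTAC", "GT",
  ">taxon_b", "ACG-ACGT",
  ">taxon_c", "AC--ACGT"
), demo_fasta)

fasta_summary <- summarise_fasta(demo_fasta)
str(fasta_summary)
```

If aligned is FALSE, the sequences differ in length and the file is probably not ready to be written out as a phylip alignment.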

If you need one of the other possible conversions, you can chain these functions, using one format as an intermediate step. You can also loop these functions across all the files in a directory to readily convert batches of alignments.
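One way to loop a converter over a directory is sketched below. The wrapper convert_all is hypothetical and not part of TOAST; it only lists matching files and hands each file name, plus an output name without the extension, to whatever converter function you supply. The demonstration uses a stand-in converter so that it runs without TOAST.

```r
# Hypothetical wrapper (not part of TOAST): call `converter(infile, outname)`
# for every file in `dir` whose name matches `pattern`.
convert_all <- function(pattern, converter, dir = ".") {
  inputs <- list.files(dir, pattern = pattern)
  outputs <- tools::file_path_sans_ext(inputs)
  for (i in seq_along(inputs)) {
    converter(inputs[i], outputs[i])
  }
  invisible(data.frame(input = inputs, output = outputs,
                       stringsAsFactors = FALSE))
}

# Demonstration with empty placeholder files and a stand-in converter
demo_dir <- tempfile("alignments_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("gene1.phy", "gene2.phy", "notes.txt"))))

planned <- convert_all("\\.phy$", function(infile, outname) {
  message("would convert ", infile, " to ", outname)
}, dir = demo_dir)
print(planned)

# With TOAST loaded and the working directory set to the alignment folder,
# the converter could wrap PhyIntoFasta using the arguments shown above
# (untested assumption; adjust to your setup):
# convert_all("\\.phy$", function(infile, outname) {
#   PhyIntoFasta(phylipfile = infile, filename = outname, externalfile = TRUE)
# })
```

Passing the converter as an argument means the same loop can be reused for NexIntoFas or FastaIntoPhy by changing only the small wrapper function and the file pattern.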

Subsetting taxa out of alignments

One of the more troublesome chores in bioinformatics is pruning taxa whose names share only a partial string. For example, you may wish to isolate all zebrafish sequences, but they are named G4657_Danr_245; G47897_DanR_249; Q789_Dan_754; etc. in an alignment of thousands of taxa.

TOAST has a phylip alignment pruning function that can assess partial string matches by taxa or sequence motif to remove or keep target sequences.

PruneSuperAlign(filepath="yourpath/superalign.phy", targets=c("Danio", "Canis", "Balistes", "Equu"), fileName="NewFileName")

This writes out a new file containing just the sequences you are looking for, and prints the new alignment length and the number of taxa found to the screen. You can supply complete or partial matches. For example, if you have sequences for horses, then equus, equu, and equ would all return the same sequences, provided there are no other taxa with similar strings.
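The idea of partial string matching can be illustrated in base R. This is only a sketch of the concept and not how PruneSuperAlign is implemented; match_any is a hypothetical helper, the taxon names beyond the three zebrafish examples are made up, and the case-sensitive matching shown here is a property of grepl, not a statement about TOAST.

```r
# Example taxon names; the last three are invented for illustration
taxa <- c("G4657_Danr_245", "G47897_DanR_249", "Q789_Dan_754",
          "H12_Equus_001", "H13_Equu_002", "C77_Canis_010")

# Hypothetical helper (not part of TOAST): TRUE for every name that
# contains at least one of the target strings as a literal substring.
match_any <- function(names, targets) {
  hits <- lapply(targets, function(target) grepl(target, names, fixed = TRUE))
  Reduce(`|`, hits)
}

dan_hits <- taxa[match_any(taxa, "Dan")]    # all three zebrafish names
danr_hits <- taxa[match_any(taxa, "Danr")]  # only the exact-case match
equ_hits <- taxa[match_any(taxa, c("Equ", "Canis"))]

print(dan_hits)
print(danr_hits)
print(equ_hits)
```

A shorter target such as "Dan" catches more naming variants than "Danr", but it also raises the chance of matching unrelated taxa, so it is worth checking the number of taxa reported after pruning.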

That's it for now! Check back, as this area is in active development and we are constantly adding little helper functions to make our lives easier.

Next Section: Gene tree based filtration
