Viral tagging reveals discrete populations in Synechococcus viral genome sequence space

Associated manuscript for all scripts on this page: Deng, L., Ignacio-Espinoza, J.C., Gregory, A., Poulos, B.T., Weitz, J.S., Hugenholtz, P., Sullivan, M.B. (accepted). Viral tagging reveals discrete populations in Synechococcus viral genome sequence space. Nature.

recruitment

recruitment.txt.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: July 2011

Recruit2Cloud

recruit2cloud1-0-2.py.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Sep 2012
Description: Recruit2cloud01.pl takes a series of genome reads, a reference genome, and a blastn output of the reference genome (in that order) and generates a COV file. Each line in the COV file lists all the bases found at that position after the reads are aligned to the blastn output with MUSCLE. The first argument is the file with the return-delimited list of genome reads, the second is the file with the reference genome, and the third is the file with the blastn output of the reference genome.

plotSmall

plotsmall.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Jan 2013

SizeandLocation

sizeandlocation.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Jan 2013

rarefaction

rarefaction.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Generates a rarefaction curve by resampling a tabulated list of reads and their assigned protein clusters.
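
The resampling idea can be illustrated with a short Python sketch. This is not the logic of rarefaction.pl itself: it assumes the table has already been parsed into (read, cluster) pairs, and the function name, step size, and number of replicates are hypothetical choices made for the example.

```python
import random

def rarefaction_curve(assignments, step=1, replicates=10, seed=0):
    """Mean number of distinct clusters observed at each subsample depth.

    assignments: list of (read_id, cluster_id) pairs (assumed input layout).
    Returns a list of (depth, mean_distinct_clusters) tuples.
    """
    rng = random.Random(seed)
    clusters = [cluster for _, cluster in assignments]
    curve = []
    for depth in range(step, len(clusters) + 1, step):
        total = 0
        for _ in range(replicates):
            # Subsample reads without replacement, then count distinct clusters.
            total += len(set(rng.sample(clusters, depth)))
        curve.append((depth, total / replicates))
    return curve
```

Plotting depth against the mean cluster count gives the curve; a plateau suggests that further sampling would recover few new protein clusters.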

dunns

dunns.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Calculates Dunn's index as a way to assess the compactness and separation of clusters.
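
For orientation, here is a minimal Python sketch of one common form of Dunn's index: the smallest distance between points in different clusters divided by the largest within-cluster diameter. Several variants of the index exist, and which one dunns.m uses is not stated here, so the definition, the Euclidean metric, and the function name below are assumptions.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """clusters: list of at least two clusters, each a list of coordinate tuples."""
    # Largest distance between two points of the same cluster (diameter).
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    if max_diameter == 0:
        return math.inf  # every cluster is a single point
    return min_separation / max_diameter
```

Higher values indicate clusters that are compact relative to the gaps between them.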

dunnRdm

dunnrdm.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Generates a random distribution of Dunn's index values from the data, against which the observed Dunn's index is evaluated. The effect size (z-score) then gives a direct measure of how far the observed value departs from that random distribution.
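
The z-score evaluation can be sketched in Python as follows. How dunnrdm.m randomizes the data is not described on this page; this sketch assumes cluster labels are shuffled among the data points, and the function name and the `statistic` callback (any function of points and labels, such as a Dunn's index implementation) are hypothetical.

```python
import random
import statistics

def permutation_z_score(points, labels, statistic, n_permutations=200, seed=0):
    """Z-score of statistic(points, labels) against label-shuffled replicates."""
    rng = random.Random(seed)
    observed = statistic(points, labels)
    null_values = []
    for _ in range(n_permutations):
        shuffled = labels[:]   # copy so the caller's labels stay untouched
        rng.shuffle(shuffled)  # break the link between points and clusters
        null_values.append(statistic(points, shuffled))
    mean = statistics.mean(null_values)
    sd = statistics.stdev(null_values)
    if sd == 0:
        return float("nan")    # null distribution has no spread
    return (observed - mean) / sd
```

A large positive z-score means the observed clustering is far more structured than expected from randomly assigned labels.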

Acc

acc.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Calculates the accuracy of cluster assignment. Data points are assigned to the closest cluster centroid; the accuracy of assignment, Q, is then the ratio of accurate assignments to the total number of observations.
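
A minimal Python sketch of Q as described above, assuming Euclidean distance and centroids computed as the coordinate-wise mean of each cluster's points. Both are assumptions; acc.m may differ in either respect, and the function name is hypothetical.

```python
import math
from collections import defaultdict

def assignment_accuracy(points, labels):
    """Fraction of points whose nearest centroid belongs to their own cluster."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    # Centroid = coordinate-wise mean of the points in each cluster.
    centroids = {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }
    correct = 0
    for point, label in zip(points, labels):
        nearest = min(centroids, key=lambda name: math.dist(point, centroids[name]))
        correct += nearest == label
    return correct / len(points)
```

Q equals 1 when every point lies closest to the centroid of its own cluster and falls toward the chance level as clusters overlap.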

AccRdm

accrdm.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Generates a random distribution of values of Q. The effect size can be obtained by comparing this distribution to the observed values of Q.

matrix2PCA.m

matrix2pca.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Dec 2013
Description: A list of MATLAB commands. The input is an m x n matrix, where m is the number of observations and n is the number of variables measured.
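
To show what a PCA on such an m x n matrix involves, the following pure-Python sketch finds only the first principal axis by power iteration on the sample covariance matrix. It is a stand-in for illustration, not a port of matrix2PCA.m: the function name and the iteration count are hypothetical, and a full analysis would normally use a linear algebra library.

```python
import math

def first_principal_component(matrix, iterations=100):
    """Leading principal axis of an m x n matrix (rows are observations)."""
    m, n = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / m for j in range(n)]
    centered = [[row[j] - means[j] for j in range(n)] for row in matrix]
    # Sample covariance matrix of the n variables.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (m - 1) for j in range(n)]
        for i in range(n)
    ]
    # Power iteration converges to the eigenvector with the largest
    # eigenvalue, provided the start vector is not orthogonal to it.
    vector = [1.0] * n
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break  # start vector carries no variance; give up
        vector = [x / norm for x in product]
    return vector
```

Projecting each centered row onto the returned unit vector gives the scores along the first principal component.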

read2genome.pl

read2genome.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Dec 2013
Description: The input files are a blastn file and a reference file. The script aligns the best hits to the reference dataset and outputs the per-base frequency of nucleotides along the reference genome.
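
The per-base tally can be sketched in Python under a strong simplification: each best hit is assumed to be already placed on the reference as a (start, sequence) pair with no gaps. Parsing the blastn file and handling indels, which read2genome.pl must deal with, are left out, and the function name and input layout are hypothetical.

```python
from collections import Counter

def per_base_frequencies(reference_length, placed_reads):
    """placed_reads: iterable of (start, sequence); start is 0-based on the reference.

    Returns one dict per reference position mapping base -> frequency.
    Positions with no coverage get an empty dict.
    """
    counts = [Counter() for _ in range(reference_length)]
    for start, sequence in placed_reads:
        for offset, base in enumerate(sequence.upper()):
            position = start + offset
            if 0 <= position < reference_length:
                counts[position][base] += 1
    frequencies = []
    for column in counts:
        depth = sum(column.values())
        frequencies.append({b: c / depth for b, c in column.items()} if depth else {})
    return frequencies
```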

randomGenome.pl

randomgenome.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Dec 2013
Description: The input is the output of read2genome.pl. It generates a set of random contigs based on the per-base nucleotide frequencies in that output.
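
One way to draw contigs from per-base frequencies is to sample each position independently, as in the Python sketch below. The input layout (one dict of base -> frequency per position), the use of "N" for uncovered positions, and the function name are assumptions made for the example; the file format that randomGenome.pl actually reads is not described here.

```python
import random

def random_contigs(frequencies, n_contigs, seed=0):
    """Draw n_contigs sequences, sampling every position independently."""
    rng = random.Random(seed)
    contigs = []
    for _ in range(n_contigs):
        bases = []
        for column in frequencies:
            if not column:
                bases.append("N")  # no coverage at this position
                continue
            letters = list(column)
            weights = [column[b] for b in letters]
            bases.append(rng.choices(letters, weights=weights, k=1)[0])
        contigs.append("".join(bases))
    return contigs
```

Independent sampling ignores linkage between neighboring positions, which may or may not match the original script's behavior.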

chopGenome.pl

chopgenome.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Dec 2013
Description: The input is a FASTA file; the output is a multi-FASTA file in which the original sequence has been cut into fragments.
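
As an illustration only, the Python sketch below cuts a single-record FASTA string into consecutive, non-overlapping fragments of fixed length. How chopGenome.pl chooses its cut points is not stated on this page, so the fixed-length scheme, the header naming, and the function name are all assumptions.

```python
def chop_fasta(fasta_text, fragment_length):
    """Split a single-record FASTA string into fixed-length fragments."""
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    name = lines[0][1:].split()[0]  # first word of the header, without ">"
    sequence = "".join(lines[1:])
    records = []
    for start in range(0, len(sequence), fragment_length):
        fragment = sequence[start:start + fragment_length]
        # Each header records the 1-based coordinates of its fragment.
        records.append(f">{name}_{start + 1}_{start + len(fragment)}\n{fragment}")
    return "\n".join(records) + "\n"
```

The last fragment is shorter than the others when the sequence length is not a multiple of the fragment length.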

bioinformatics/scripts/nature.txt · Last modified: 2015/10/15 22:37 (external edit)
CC Attribution-Noncommercial-Share Alike 3.0 Unported