Associated manuscript for all scripts on this page: Deng, L., Ignacio-Espinoza, J.C., Gregory, A., Poulos, B.T., Weitz, J.S., Hugenholtz, P., Sullivan, M.B. (accepted). Viral tagging reveals discrete populations in Synechococcus viral genome sequence space. Nature.
recruitment.txt.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: July 2011
recruit2cloud1-0-2.py.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Sep 2012
Description: Recruit2cloud01.pl takes a series of genome reads, a reference genome, and a blastn output for the reference genome (in that order) to generate a COV file. Each line in the COV file lists all the bases found at that position after aligning the reads to the blastn output with muscle. The first argument is the file with the return-delimited list of genome reads. The second argument is the file with the reference genome. The third argument is the file with the blastn output of the reference genome.
plotsmall.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Jan 2013
sizeandlocation.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Jan 2013
rarefaction.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Generates a rarefaction curve from resampling a tabulated list of reads and its assigned protein cluster.
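The resampling idea behind a rarefaction curve can be sketched in Python as follows. This is a minimal illustration and not the code in rarefaction.pl: the function name `rarefaction_curve`, the (read, cluster) pair layout, the step size, and the number of replicates are all assumptions made for the example.

```python
import random

def rarefaction_curve(assignments, step=1, replicates=10, seed=0):
    """Mean number of distinct clusters observed at each subsampling depth.

    assignments: list of (read_id, cluster_id) pairs (assumed layout).
    """
    rng = random.Random(seed)
    clusters = [cluster for _, cluster in assignments]
    curve = []
    for depth in range(step, len(clusters) + 1, step):
        total = 0
        for _ in range(replicates):
            # Draw `depth` reads without replacement and count distinct clusters.
            total += len(set(rng.sample(clusters, depth)))
        curve.append((depth, total / replicates))
    return curve

# Toy table: five reads assigned to three protein clusters.
reads = [("r1", "PC1"), ("r2", "PC1"), ("r3", "PC2"), ("r4", "PC3"), ("r5", "PC3")]
curve = rarefaction_curve(reads)
```

At full depth every read is drawn, so the last point of the curve always equals the total number of distinct clusters in the table.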
dunns.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Calculates Dunn's index as a way to assess the compactness and separation of clusters.
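For readers unfamiliar with the measure, one common form of Dunn's index is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The Python sketch below assumes that form and Euclidean distance. Other variants exist, and dunns.m may use a different one; the name `dunn_index` is hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def dunn_index(clusters):
    """clusters: list of clusters, each a list of coordinate tuples."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Largest within-cluster distance (cluster diameter).
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0:
        raise ValueError("all clusters have zero diameter")
    # Smallest distance between points belonging to different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter

# Two tight, well-separated clusters: diameter 1, separation 5.
example = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
index = dunn_index(example)  # 5.0
```

Larger values indicate clusters that are compact relative to the gaps between them.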
dunnrdm.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Generates a random distribution of Dunn's index values from the data, against which the observed Dunn's index is evaluated. The effect size (z-score) of the observed value relative to this distribution then serves as a direct evaluation of the observed value.
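The comparison of an observed statistic with a random distribution can be sketched generically in Python. The randomization scheme here (shuffling cluster labels) is an assumption, since the description does not say how dunnrdm.m randomizes the data. The names `null_z_score` and `mean_gap` are hypothetical, and `mean_gap` is only a toy statistic standing in for Dunn's index.

```python
import random
from statistics import mean, stdev

def null_z_score(values, labels, statistic, n_random=200, seed=0):
    """Z-score of the observed statistic against a label-shuffled null."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_random):
        rng.shuffle(shuffled)  # assumed randomization: permute the labels
        null.append(statistic(values, shuffled))
    spread = stdev(null)
    if spread == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean(null)) / spread, null

def mean_gap(values, labels):
    """Toy statistic: absolute difference between the two group means."""
    group0 = [v for v, l in zip(values, labels) if l == 0]
    group1 = [v for v, l in zip(values, labels) if l == 1]
    return abs(mean(group0) - mean(group1))

values = [0, 1, 2, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
z, null = null_z_score(values, labels, mean_gap)
```

A large positive z indicates that the observed grouping separates the data far better than random groupings of the same sizes.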
acc.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Calculates the accuracy of cluster assignment. Data points are assigned to the closest cluster centroid. The accuracy of assignment, Q, is then the ratio of accurate assignments to the total number of observations.
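A small Python sketch of Q as described: each point is reassigned to its nearest centroid, and Q is the fraction of points whose nearest centroid belongs to their own cluster. Euclidean distance, centroids computed as coordinate means, and the function names are assumptions for illustration; acc.m may differ in these details.

```python
from math import dist  # Euclidean distance, Python 3.8+

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def assignment_accuracy(points, labels):
    """Fraction of points whose nearest centroid matches their own label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    centroids = {label: centroid(members) for label, members in groups.items()}
    correct = sum(
        1
        for point, label in zip(points, labels)
        if min(centroids, key=lambda c: dist(point, centroids[c])) == label
    )
    return correct / len(points)

# The last point is labeled "A" but lies next to cluster "B".
points = [(0, 0), (0, 2), (10, 0), (10, 2), (9, 1)]
labels = ["A", "A", "B", "B", "A"]
q = assignment_accuracy(points, labels)  # 0.8
```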
accrdm.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Nov 2013
Description: Generates a random distribution of values of Q. The effect size can be obtained by comparing this distribution to the observed values of Q.
matrix2pca.m.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Dec 2013
Description: A list of Matlab commands. The input is an m x n matrix, where m is the number of observations and n is the number of variables measured.
read2genome.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Dec 2013
Description: The input files are a blastn file and a reference file. It aligns the best hits to the reference dataset and outputs the per-base frequency of nucleotides along the reference genome.
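The per-base tally can be illustrated with a short Python sketch. It assumes the best hits have already been parsed into (start, sequence) placements with 0-based, ungapped coordinates. Parsing of the blastn file and any gap handling are omitted, and the function names are hypothetical rather than taken from read2genome.pl.

```python
from collections import Counter

def per_base_counts(reference_length, placed_reads):
    """Count the bases observed at each reference position.

    placed_reads: list of (start, sequence) with 0-based ungapped starts
    (assumed layout).
    """
    counts = [Counter() for _ in range(reference_length)]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < reference_length:  # ignore overhanging bases
                counts[pos][base] += 1
    return counts

def to_frequencies(counts):
    """Convert per-position counts to frequencies; uncovered positions give {}."""
    freqs = []
    for column in counts:
        depth = sum(column.values())
        freqs.append({b: n / depth for b, n in column.items()} if depth else {})
    return freqs

placed = [(0, "ACGT"), (2, "GTAC"), (2, "CTAC")]
counts = per_base_counts(6, placed)
freqs = to_frequencies(counts)
```

In this toy case position 2 is covered by three reads, two carrying G and one carrying C, so its column records both bases with their relative frequencies.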
randomgenome.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Dec 2013
Description: The input is the output of read2genome.pl. It generates a set of random contigs based on the per-base nucleotide frequencies.
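One way to draw random contigs from per-base frequencies is to sample a base independently at each position, weighted by its frequency. The Python sketch below assumes that scheme and a list-of-dictionaries input. The actual file format produced by read2genome.pl is not reproduced here, and writing "N" at uncovered positions is an assumption of the example.

```python
import random

def random_contig(freqs, rng):
    """Sample one contig, drawing each base from its position's frequencies."""
    bases = []
    for column in freqs:
        if not column:
            bases.append("N")  # assumed placeholder for uncovered positions
            continue
        letters = list(column)
        weights = [column[b] for b in letters]
        bases.append(rng.choices(letters, weights=weights)[0])
    return "".join(bases)

# Hypothetical frequencies for a 4-base reference; position 3 is uncovered.
freqs = [{"A": 1.0}, {"C": 0.5, "T": 0.5}, {}, {"G": 1.0}]
rng = random.Random(0)
contigs = [random_contig(freqs, rng) for _ in range(5)]
```

Positions with a single observed base are reproduced exactly, while polymorphic positions vary between contigs in proportion to the observed frequencies.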
chopgenome.pl.zip
Author: Julio Cesar Ignacio-Espinoza
Last Revision: Dec 2013
Description: The input is a fasta file; the output is a multi-fasta file in which the original sequence has been cut into fragments.
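The cutting step might look like the following Python sketch. The description does not state how chopgenome.pl chooses cut points, so fixed-length, non-overlapping fragments are an assumption, as are the function name and the header naming scheme.

```python
def chop_sequence(name, sequence, fragment_length):
    """Cut a sequence into consecutive fragments and format them as multi-fasta.

    Headers record 1-based start and end coordinates (assumed naming scheme).
    """
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    records = []
    for start in range(0, len(sequence), fragment_length):
        piece = sequence[start:start + fragment_length]
        header = f">{name}_{start + 1}_{start + len(piece)}"
        records.append(f"{header}\n{piece}")
    return "\n".join(records)

multi_fasta = chop_sequence("genome", "ACGTACGTAC", 4)
```

The final fragment is shorter than the others whenever the sequence length is not a multiple of the fragment length.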