Recently Published
mec_proteome_scatterplots
Scatterplot ggplot2 code for complex proteomes with metadata
mec cassette relative abundances and synteny
This document details methods used to calculate relative abundances of mec cassettes and the custom R scripting used to construct mec cassette synteny maps.
mec cassette: phylogenetic trees
This document details the pipeline used to harvest protein homologs, cluster, align and build ML-approximated phylogenetic trees
mec cassette: transcriptomics
This document describes the pipeline used for detection of mec genes in contaminated site metatranscriptomes
geosmithia.assembly.and.gene.prediction
Overview of assembly, use of MAKER for gene prediction, and preparation for protein annotation
geosmithia.comp.resources
Resources for the Geosmithia comp project
RAG charts
Simple method for making Red/Amber/Green charts with grouped assays
G. morbida hierarchical annotations for GenBank
Development of a semi-automated, reproducible system for generating a GenBank/NCBI-compatible hierarchical Eukaryotic genome annotation by making a gff file
Saline Isolates 1908
Primarily a purity check of strain AR
G.morbida CodingQuarry Pathogen Mode
CodingQuarry Pathogen Mode was applied to the G. morbida reference genome to search for protein-coding genes which were missed initially. Resulting proteins are subjected to positive selection testing and functionally annotated
Initial.GWAS.with.PLINK2
Converting a multi-VCF file into PLINK2 binary format, running --glm (which fits a linear regression when the phenotype is quantitative) for variants against a quantitative phenotype, and characterizing variant positions by their proximity to coding regions
G.morbida COG and KEGG PSgene summaries
The goal of this script is to present KEGG and COG functional category analyses for the entire Geosmithia genome vs. the positively selected (PS) genes.
Clean COG ID to COG function mapping file
The goal of this script is to make a clean COG ID to COG function mapping table that can be used for generating your own COG category summary tables and figures.
Arhodomonas Mauve whole genome alignment visualization
Simple script for turning the Mauve backbone file, resulting from an iterative contig reordering, into a more graceful, customizable vector graphic
Affymetrix: Nested Interactions Decisions
There is no standard procedure for filtering DE gene lists. This document describes and executes three common methods and examines the consequences for interpretation
Affymetrix v2.0: all analyses
This second structuring of the analysis of the bladder Affymetrix set aims to characterize treatment effects, strain effects, and ultimately the functions that are differentially expressed in treated GFR versus treated GF
GFR Affymetrix analysis
This script performs basic overviews of the GFR Affymetrix data by running MDS ordinations and plotting probe intensity.
Affymetrix: tGFR vs. tGF
Examining the differential expression in treated mutant strain vs. treated wild-type
Affymetrix: tGF vs. cGF
Testing the effect of treatment on the control strain
Affymetrix: tGFR vs. cGFR
Pairwise comparison 2, examining the effect of treatment on the mutant strain
Affymetrix: cGFR vs. cGF
One of four pairwise test scripts for differential expression, based on Affymetrix chip results processed primarily via the "limma" package
2.Arhodomonas.oxygenases.and.hydroxylases
A survey of oxygenases and hydroxylases in three Arhodomonas genomes
1.Arhodomonas KEGG modules
Characterizing and comparing KEGG modules in three Arhodomonas strains
KEGG search parsing
A simple script for parsing results of a KEGG orthology name search
qPCR validation
Some general metrics for comparing two qPCR standard curve runs
10.gene.diversity.summary
The goal of this pipeline is to take a variety of calculations performed on genes across the 22 resequenced strains and place them into a single database. To this point, gene diversity across the population has been characterized by several metrics:
1. Gene and protein allele counting
2. Rough protein vs. gene tree ratio calculation
1. dN/dS (w, omega, Ka/Ks, etc.) is being calculated across every gene using the “one-ratio model”, i.e., M0
2. Likelihood ratio test (LRT) for positive selection for each gene, which compares the likelihood of a neutral evolution model (M7) with that of a positive selection model (M8). In most cases, the neutral model will be about as likely as the positive selection model (LR ~ 0). Where positive selection better explains the alignment pattern of a given gene, M8 fits the data better than M7. The LRT provides a p-value based on this difference in likelihoods.
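The M7 vs. M8 likelihood ratio test described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's own code: it assumes the usual chi-square approximation with 2 degrees of freedom (M8 adds two free parameters to M7), and the log-likelihood values in the example are made up.

```python
import math
from typing import Tuple


def lrt_m7_vs_m8(lnl_m7: float, lnl_m8: float) -> Tuple[float, float]:
    """Return (LRT statistic, p-value) for the nested M7 vs. M8 comparison."""
    # Statistic: twice the log-likelihood gain of M8 over M7.
    # Clamp at zero, since optimizer noise can make the gain slightly negative.
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    # Assumption: chi-square null distribution with 2 degrees of freedom.
    # For df = 2 the survival function has the closed form exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for a single gene (not real results).
stat, p_value = lrt_m7_vs_m8(lnl_m7=-1234.56, lnl_m8=-1229.10)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.4g}")
```

When every gene is tested this way, the resulting p-values would normally need a multiple-testing correction before genes are called positively selected.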
Archaeal SSU gene fragment placements
SSU ASVs identified as Archaeal via qiime2 were placed onto a custom-built Archaeal SSU reference tree. Methods for building the SSU reference set and tree are described, along with fragment placement.
3.all.alleles.and.functions
This script takes the entire variant table and calculates allele counts across the population for each gene. The possible functional consequences are then explored using KEGG annotations for the more allelic genes.
Oxygenases in strain 273
Comparing oxygenase gene families in strain 273 vs distributions across all Pseudomonas genomes
Archaeal RDases
Taxonomic placement of novel Archaeal RDases
G.morbida.variant.heat.map
Broad depiction of variants across the genomes of the 22 strains using a heatmap and clustered genomes
6.nonsynonymous.alleles.and.functions
This script takes the nonsynonymous variant table and calculates allele counts across the population for each gene. The functional consequences are then explored using KEGG annotations for the more allelic genes (proteins).
1.G.morbida.vcf.to.ordination
Part of the G. morbida resequencing project. This script parses a CLC multi-VCF file, producing a simple presence/absence table for each variant position and type. It then produces ordinations based on this table.
Pseudomonas sp. 273 annotation
The genome is annotated by prokka, then deeper functional annotation is provided as a gene table with KO, GO, COG/eggNOG annotations.
Parsing CD-hit clstr file
CD-hit produces very awkward output files. This rmd demonstrates how to take a CD-hit .clstr mapping file and turn it into a more easily searchable data table. It also attaches a map to genome and taxonomic information.
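As a rough illustration of the .clstr-to-table idea, here is a minimal Python sketch (the report itself is an Rmd, so this is not the author's code). It assumes the common .clstr layout of `>Cluster N` header lines followed by member lines, and the sequence IDs in the example are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Matches a member line such as: "1\t2214aa, >seq_B... at 80.5%"
MEMBER_RE = re.compile(r"^(\d+)\s+(\d+)(aa|nt),\s+>(.+?)\.\.\.\s+(\*|at\s+.+)$")


def parse_clstr(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Flatten .clstr text into one row (dict) per cluster member."""
    rows: List[Dict[str, object]] = []
    cluster = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line: remember which cluster the next members belong to.
            cluster = int(line.split()[1])
            continue
        match = MEMBER_RE.match(line)
        if match is None or cluster is None:
            continue  # skip anything that does not look like a member line
        tail = match.group(5)
        rows.append({
            "cluster": cluster,
            "member": int(match.group(1)),
            "length": int(match.group(2)),
            "seq_id": match.group(4),
            "representative": tail == "*",  # "*" marks the cluster representative
            "identity": None if tail == "*" else tail[2:].strip(),
        })
    return rows


# Hypothetical example input (sequence IDs are made up).
example = [
    ">Cluster 0",
    "0\t2799aa, >seq_A... *",
    "1\t2214aa, >seq_B... at 80.5%",
    ">Cluster 1",
    "0\t1500aa, >seq_C... *",
]
table = parse_clstr(example)
for row in table:
    print(row)
```

Each row can then be joined to genome and taxonomic information by `seq_id`.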
Bladder cancer chemotherapy Affymetrix study, differential expression analyses with "limma"
This describes importing .CEL Affymetrix expression data, configuring and running differential expression tests via the "limma" package, and connecting with gene annotation data in preparation for pathway mapping
g.morbida variant visualization 1 - ioslide trial
The same Rmd report in ioslides format, mainly for testing purposes
g.morbida variant visualization 1
This describes initial attempts to summarize whole genome resequencing of 23 fungal strain genomes (same species). The goal is to work towards an overview of variant distribution.