RPubs

by RStudio

finchnSNPs

Kristen Finch

Recently Published

Document

Finding environmental differences/distance.

over 6 years ago

Circular Statistics to Understand Random Expectations for Directional Data

Kristen Noelle Finch, 3/28/2019. Ideas for this analysis were obtained, via Rich Cronn, from this website: astrostatistics.psu.edu/RLectures/day5.pdf, which has no author or citation listed.

Data: I’m using predicted origins of individual Cedrela odorata s. s. trees based on 119 SNP genotypes. The error of each origin estimation is the Haversine distance, converted to km, between the true origin and the predicted origin. I also calculated a bearing, or angular direction, from the true origin to the estimated origin. Both calculations were completed with the R package geosphere.

Assumption: I assume that the observed origin estimations will be significantly different from a uniform distribution of bearings because the origin estimations are biased by the geographical distribution of my specimens. Similarly, the origin estimations I generated with randomized genotypes should also be affected by this bias.

Question: Are our origin estimations from observed genotypes and randomized genotypes different from a uniform distribution of angles (or bearings) around any point?

Analyses:
Rayleigh test - “This test is based on the fact that if the angles are equally scattered in all directions, then the resultant should be close to zero.” or “tests uniformity as opposed to too many angles in one direction.”
Watson’s test - “This is another test based on a similar idea: if the resultant is too large, then most possibly the directions are not uniform. Watson’s test provides an approximate threshold for the length of the resultant to be considered too large.”
Kuiper’s test - “This is a more sophisticated test that is based on the Kolmogorov-Smirnov idea.”
Rao’s spacing test - “This test is based on the fact that if the angles are uniformly scattered in all directions, then the arc lengths between any two of the angles should have a particular type of distribution.”
von Mises distribution - “So far we are talking about only the uniform distribution of angles. This is a special case of the von Mises distribution, which also allows the angles to crowd more towards a certain mean direction. This distribution takes two parameters: the mean direction and κ, which measures the concentration of the angles around the mean direction.” In this case we can test whether the given data follow a von Mises distribution with Watson’s test (use argument: dist='vm'). According to Wikipedia, a von Mises distribution is a circular normal distribution.
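
The distance, bearing, and Rayleigh calculations described above can be sketched in a few lines. This is a minimal illustration in Python, not the geosphere-based code used in the analysis: the function names, the mean Earth radius of 6371 km, and the first-order p-value approximation exp(-n·R̄²) are all assumptions made for the example.

```python
import math

# Assumed mean Earth radius in km; the radius used in the original
# geosphere-based analysis may differ slightly.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees within [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return math.degrees(math.atan2(y, x)) % 360.0


def rayleigh_test(bearings_deg):
    """Return (mean resultant length, Z statistic, approximate p-value).

    Uses the first-order large-sample approximation p = exp(-Z),
    with Z = n * R_bar**2; small samples need a corrected formula.
    """
    n = len(bearings_deg)
    c = sum(math.cos(math.radians(b)) for b in bearings_deg)
    s = sum(math.sin(math.radians(b)) for b in bearings_deg)
    r_bar = math.hypot(c, s) / n
    z = n * r_bar ** 2
    return r_bar, z, min(1.0, math.exp(-z))
```

Bearings spread evenly around the circle give a resultant near zero and a p-value near 1, while bearings crowded in one direction give a resultant near 1 and a very small p-value, which is the pattern the assumption above expects for geographically biased origin estimations.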

over 6 years ago

Chapter 1 Analysis

Dissertation Research.

over 7 years ago

Learning Module Two: Species Classification with Random Forests

Draft of Bioinformatics Workshop day 3.

over 7 years ago

Assessing Bait Efficiency

Dissertation Research: The data here were collected August 28, 2017 and samples were sequenced on or near August 14, 2017 at University of Oregon. I have a pool of paired-end 100 sequences from *Cedrela*, *Swietenia*, *Guarea*, and *Trichilia* species (Meliaceae). These sequences were obtained via hybridization capture, targeted enrichment, and short-read sequencing on the Illumina HiSeq 4000. Baits were designed from the transcriptome of *Cedrela odorata*. Here I am testing how many reads were captured by the baits across these species.

about 8 years ago

Chloroplast Assembly Stats

Dissertation Research: This is a comparison of the chloroplast assembly protocols ABySS and SPAdes. The graphs show sequential changes to the assembly, and these data were generated using GAEMR basic_assembly_stats.py. DNASeq data from *C. odorata* from HiSeq 3000 PE-100.

over 8 years ago

Data Exploration: Climate

The purpose of this RPub is to aid in partitioning the data set for Fst tests. The vegan-generated PCs may aid in grouping samples according to climate similarity for Fst testing. For example, I could partition samples into high, moderate, and low values on PC1, PC2, PC3, etc.
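
One way to make the high/moderate/low partition concrete is to cut each PC at its tertiles. The sketch below is only an illustration: the function name is hypothetical, the scores are made up, and cutting at tertiles is an assumption, since the description does not say how the thresholds would be chosen.

```python
def tertile_labels(scores):
    """Label each PC score 'low', 'moderate', or 'high' using tertile cut points.

    Assumption: groups are defined by rank-based tertiles of the scores,
    so each group holds roughly one third of the samples.
    """
    ranked = sorted(scores)
    n = len(ranked)
    low_cut = ranked[n // 3]          # first score of the middle third
    high_cut = ranked[(2 * n) // 3]   # first score of the top third
    labels = []
    for score in scores:
        if score < low_cut:
            labels.append("low")
        elif score < high_cut:
            labels.append("moderate")
        else:
            labels.append("high")
    return labels


# Made-up PC1 scores for six samples.
pc1 = [-2.1, -1.0, -0.3, 0.2, 0.9, 1.8]
groups = tertile_labels(pc1)
```

The same function could be applied to PC2, PC3, and so on, giving one grouping per axis for the Fst comparisons.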

over 8 years ago

Climate Space

Dissertation Research: The purpose of this analysis is to assess how much of the climate space is captured by my samples. Climate data from WorldClim. Citation: *Fick, S.E. and R.J. Hijmans, 2017. Worldclim 2: New 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology.* *Cedrela odorata* observation data from GBIF Citation: *GBIF (2012). Recommended practices for citation of the data published through the GBIF Network. Version 1.0 (Authored by Vishwas Chavan), Copenhagen: Global Biodiversity Information Facility. Pp.12, ISBN: 87-92020-36-4.*

over 8 years ago

Fst Tests for SNP Selection

Dissertation Research: The purpose of this analysis is to identify SNPs that are spatially informative.

over 8 years ago

Locus Maps

Dissertation research

over 8 years ago

June Data Analysis: Fsts and MAFs

Dissertation Data Analysis. Cedrela SNPs.

over 8 years ago

Transcriptome Assemblies

almost 9 years ago

Random Forests Reanalyzed for Revision 1

These data pertain to Finch et al. 2017 *Applications in Plant Sciences*.

almost 9 years ago

Random Forests and Unbalanced Classes

Testing if random forest classification is sensitive to unbalanced class sizes.

almost 9 years ago

Misclassification of Cores

Finch et al. 2017 Applications in Plant Sciences

almost 9 years ago

K-mer Frequency Distribution

This is an R Markdown document. The data for this analysis were collected on 13 January 2017. I have a pool of paired-end 100 sequences from Cedrela species. These sequences were obtained via hybridization capture, targeted enrichment, and short-read sequencing on the Illumina HiSeq 3000. I used kmercountexact.sh from BBTools to produce a k-mer frequency distribution.
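
For readers unfamiliar with the term, a k-mer frequency distribution records how many distinct k-mers occur once, twice, and so on across the reads. The sketch below shows the idea on toy reads. It is not a reimplementation of kmercountexact.sh: the function name is hypothetical, and skipping k-mers that contain non-ACGT characters and ignoring reverse complements are simplifying assumptions.

```python
from collections import Counter


def kmer_frequency_distribution(reads, k):
    """Map each occurrence count to the number of distinct k-mers seen that many times."""
    kmer_counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: skip k-mers with ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                kmer_counts[kmer] += 1
    # Histogram of the counts themselves, sorted by occurrence count.
    return dict(sorted(Counter(kmer_counts.values()).items()))


# Toy reads: ACG and CGT occur twice, GTA and TAC occur once.
distribution = kmer_frequency_distribution(["ACGTAC", "ACGTNN"], k=3)
```

Plotting occurrence count against number of distinct k-mers gives the kind of curve usually inspected for sequencing depth and error k-mers.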

almost 9 years ago

Alignment Assessment & Comparison

SPAdes de novo assembly of ced132 length distribution

Target Coverage

Non-target Sequences; chloroplast
