While trying to run NMDS analysis on technical replicate ADJNSAF data, I found discrepancies between the ADJNSAF values in Steven’s ABACUS_output021417NSAF.tsv and Sean’s Abacus_output.tsv. I compared Steven’s ABACUS_output021417.tsv file (from which he made ABACUS_output021417NSAF.tsv; see his Jupyter notebook https://github.com/sr320/nb-2017/blob/master/C_gigas/04-Exploring-Abacus-out.ipynb) with Sean’s Abacus_output.tsv and found no differences:
R code for comparing files
install.packages("arsenal")
library(arsenal)
#Compare 02/14/2017 data with Sean's March 1 data
data_SR <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417.tsv", sep = "\t" , header=TRUE, stringsAsFactors = FALSE)
data_SB <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output.tsv", sep = "\t" , header=TRUE, stringsAsFactors = FALSE)
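#Note: compare() is the name used by the arsenal version run here; newer arsenal releases name this function comparedf()
compare(data_SR,data_SB)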
#Output:
#Compare Object
#Function Call:
# compare.data.frame(x = data_SR, y = data_SB)
#Shared: 457 variables and 8443 observations.
#Not shared: 0 variables and 0 observations.
#Differences found in 0/456 variables compared.
#0 variables compared have non-identical attributes.
###SHOWS NO DIFFERENCES BETWEEN FILES
Confirmed with the command-line diff command (no output means the files are identical):
#D-10-18-212-233:Desktop Shelly$ diff ~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_outputMar1.tsv ~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417.tsv
#D-10-18-212-233:Desktop Shelly$
The values in Steven’s ABACUS_output021417NSAF.tsv are in fact NUMSPECSADJ values
R code to determine what the values in ABACUS_output021417NSAF.tsv are
data_SR_NSAF <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417NSAF.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
data_SB_NUMSPECADJ <- data_SB[,c(1,grep("NUMSPECSADJ", colnames(data_SB)))]
colnames(data_SB_NUMSPECADJ) <- gsub("NUMSPECSADJ","ADJNSAF", colnames(data_SB_NUMSPECADJ))
compare(data_SR_NSAF,data_SB_NUMSPECADJ)
#Output:
#Compare Object
#Function Call:
# compare.data.frame(x = data_SR_NSAF, y = data_SB_NUMSPECADJ)
#Shared: 46 variables and 8443 observations.
#Not shared: 0 variables and 0 observations.
#Differences found in 0/45 variables compared.
#0 variables compared have non-identical attributes.
###SHOWS NO DIFFERENCES BETWEEN FILES SO VALUES IN
###ABACUS_output021417NSAF.tsv ARE ACTUALLY
###NUMSPECSADJ VALUES!!!!
Determined that the values in ABACUS_output021417NSAF.tsv are in fact NUMSPECSADJ values
Determined that the values in Kaitlyn’s file ABACUSdata_only.csv are in fact the averages of technical replicate NUMSPECSADJ values
- see markdown summary of R analysis
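A check of this kind (are the values in one file the averages of technical replicate columns in another?) can be sketched in base R. This is a minimal illustration on made-up numbers, not the analysis from the markdown summary: the column names (`S1_A_NUMSPECSADJ`, `S1`, etc.), the `PROTID` column, and the mapping of replicates to samples are all assumptions for the example.

```r
# Toy stand-in for the ABACUS output: two samples, each with two technical
# replicates. Column names and values are hypothetical.
abacus <- data.frame(
  PROTID = c("p1", "p2", "p3"),
  S1_A_NUMSPECSADJ = c(10, 0, 4),
  S1_B_NUMSPECSADJ = c(12, 2, 4),
  S2_A_NUMSPECSADJ = c(3, 8, 0),
  S2_B_NUMSPECSADJ = c(5, 6, 2),
  stringsAsFactors = FALSE
)

# Toy stand-in for the file being checked (hypothetical columns S1 and S2).
candidate <- data.frame(
  PROTID = c("p3", "p1", "p2"),
  S1 = c(4, 11, 1),
  S2 = c(1, 4, 7),
  stringsAsFactors = FALSE
)

# Assumed mapping from each sample to its replicate columns.
replicates <- list(
  S1 = c("S1_A_NUMSPECSADJ", "S1_B_NUMSPECSADJ"),
  S2 = c("S2_A_NUMSPECSADJ", "S2_B_NUMSPECSADJ")
)

# Average the replicate columns for each sample (one column per sample).
rep_means <- sapply(replicates, function(cols) rowMeans(abacus[, cols]))

# Line the rows up by protein ID, then compare within a small tolerance.
idx <- match(candidate$PROTID, abacus$PROTID)
max_diff <- max(abs(rep_means[idx, names(replicates)] -
                    as.matrix(candidate[, names(replicates)])))
is_replicate_average <- max_diff < 1e-8
```

Matching rows by protein ID rather than by position guards against the two files being sorted differently.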
Next steps:
***see Emma’s awesome explanation of what ABACUS values to use and when
- NMDS analysis
- extract ADJNSAF values from ABACUS_output021417.tsv
- Find appropriate data transformation/normalization if necessary
- Emma log transformed her NSAF values before doing NMDS
- try NMDS again
- determine if replicates can be pooled
- Try downstream analyses with NUMSPECSTOT values from ABACUS_output021417.tsv
- if it makes sense to sum NUMSPECSTOT values for replicates, try that and then try running stats on those values
- Determine what values make sense to use in hierarchical clustering analysis and ASCA, then re-do those analyses
- Look more closely at development over time
- Try a fold-change analysis with each developmental time point relative to day 0
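Several of the steps above (extracting ADJNSAF columns, log-transforming them, and computing fold change relative to day 0) can be sketched in base R. This is only a rough illustration on a toy table: the column names (`day0_ADJNSAF`, etc.) are hypothetical, and the `+ 1` pseudocount and the choice of log2 are assumptions, not necessarily the transformation Emma used.

```r
# Hypothetical stand-in for ABACUS_output021417.tsv; real column names differ.
abacus <- data.frame(
  PROTID = c("p1", "p2"),
  day0_ADJNSAF = c(2, 8),
  day3_ADJNSAF = c(4, 8),
  day5_ADJNSAF = c(16, 0),
  day0_NUMSPECSTOT = c(5, 9),
  stringsAsFactors = FALSE
)

# Keep the protein ID column plus every ADJNSAF column
# (same grep approach as used for NUMSPECSADJ above).
nsaf <- abacus[, c(1, grep("ADJNSAF", colnames(abacus)))]

# Log-transform the values. The +1 pseudocount is an assumption that keeps
# zero values finite; it slightly shrinks the resulting fold changes.
nsaf_log <- as.matrix(nsaf[, -1])
rownames(nsaf_log) <- nsaf$PROTID
nsaf_log <- log2(nsaf_log + 1)

# Log2 fold change of each time point relative to day 0, protein by protein.
lfc <- sweep(nsaf_log, 1, nsaf_log[, "day0_ADJNSAF"], "-")
```

For NMDS, a matrix like `nsaf_log` would typically be transposed so that samples are rows before being passed to an ordination function such as `metaMDS()` from the vegan package.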