While trying to run NMDS analysis on the technical replicate ADJNSAF data, I found discrepancies between the ADJNSAF values in Steven’s ABACUS_output021417NSAF.tsv and Sean’s Abacus_output.tsv. To track down the source, I compared Steven’s ABACUS_output021417.tsv (the file from which he made ABACUS_output021417NSAF.tsv; see his Jupyter notebook https://github.com/sr320/nb-2017/blob/master/C_gigas/04-Exploring-Abacus-out.ipynb) with Sean’s Abacus_output.tsv and found no differences:

R code for comparing files

install.packages("arsenal") #only needs to be run once
library(arsenal)
#Compare Steven's 02/14/2017 data with Sean's March 1 data
data_SR <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417.tsv", sep = "\t" , header=TRUE, stringsAsFactors = FALSE)
data_SB <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output.tsv", sep = "\t" , header=TRUE, stringsAsFactors = FALSE)
compare(data_SR,data_SB)
#Output:
#Compare Object
#Function Call:
#	compare.data.frame(x = data_SR, y = data_SB)
#Shared: 457 variables and 8443 observations.
#Not shared: 0 variables and 0 observations.

#Differences found in 0/456 variables compared.
#0 variables compared have non-identical attributes.

###SHOWS NO DIFFERENCES BETWEEN FILES

Confirmed with the command-line diff command (no output means the files are identical):

#D-10-18-212-233:Desktop Shelly$ diff ~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_outputMar1.tsv ~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417.tsv
#D-10-18-212-233:Desktop Shelly$
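A third way to confirm that two files are identical from within R is to compare their checksums and their parsed tables. The sketch below is only an illustration on toy data: the temporary files, the PROTID column, and the S1_ADJNSAF column are hypothetical stand-ins, not the real Abacus output.

```r
# Toy stand-in for an Abacus output table (hypothetical column names and values).
toy <- data.frame(PROTID = c("p1", "p2"), S1_ADJNSAF = c(0.5, 1.25),
                  stringsAsFactors = FALSE)

# Write the same table to two temporary tab-separated files.
file_a <- tempfile(fileext = ".tsv")
file_b <- tempfile(fileext = ".tsv")
write.table(toy, file_a, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(toy, file_b, sep = "\t", quote = FALSE, row.names = FALSE)

# Identical MD5 checksums indicate the files are byte-for-byte identical.
sums <- tools::md5sum(c(file_a, file_b))
files_match <- unname(sums[1] == sums[2])

# Comparing the parsed tables checks content rather than raw bytes.
tab_a <- read.delim(file_a, stringsAsFactors = FALSE)
tab_b <- read.delim(file_b, stringsAsFactors = FALSE)
tables_match <- identical(tab_a, tab_b)

print(c(files_match = files_match, tables_match = tables_match))
```

With the real data, file_a and file_b would be replaced by the paths to the two .tsv files being compared.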

The values in Steven’s ABACUS_output021417NSAF.tsv are in fact NUMSPECSADJ values, not ADJNSAF values.

R code to determine what the values in ABACUS_output021417NSAF.tsv are

data_SR_NSAF <- read.csv("~/Documents/GitHub/OysterSeedProject/raw_data/ABACUS_output021417NSAF.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
data_SB_NUMSPECADJ <- data_SB[,c(1,grep("NUMSPECSADJ", colnames(data_SB)))]
colnames(data_SB_NUMSPECADJ) <- gsub("NUMSPECSADJ","ADJNSAF", colnames(data_SB_NUMSPECADJ))
compare(data_SR_NSAF,data_SB_NUMSPECADJ)
#Output:
#Compare Object
#Function Call: 
#	compare.data.frame(x = data_SR_NSAF, y = data_SB_NUMSPECADJ)
#Shared: 46 variables and 8443 observations.
#Not shared: 0 variables and 0 observations.

#Differences found in 0/45 variables compared.
#0 variables compared have non-identical attributes.

###SHOWS NO DIFFERENCES BETWEEN FILES SO VALUES IN 
###ABACUS_output021417NSAF.tsv ARE ACTUALLY
###NUMSPECSADJ VALUES!!!!

Determined that the values in ABACUS_output021417NSAF.tsv are in fact NUMSPECSADJ values

Determined that the values in Kaitlyn’s file ABACUSdata_only.csv are in fact the averages of the technical replicate NUMSPECSADJ values
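As a rough sketch of how averaging technical replicates could be reproduced for that kind of check, the example below averages replicate NUMSPECSADJ columns per sample on toy data. The column naming scheme (sample, then replicate letter A or B, then NUMSPECSADJ), the PROTID column, and all values are assumptions made for illustration; the real column names would need their own pattern.

```r
# Toy table: two samples (S1, S2), two technical replicates each (A, B).
# Column names and values are hypothetical.
abacus <- data.frame(
  PROTID = c("p1", "p2", "p3"),
  S1_A_NUMSPECSADJ = c(2, 0, 10),
  S1_B_NUMSPECSADJ = c(4, 0, 12),
  S2_A_NUMSPECSADJ = c(1, 3, 0),
  S2_B_NUMSPECSADJ = c(3, 5, 0),
  stringsAsFactors = FALSE
)

# Select the NUMSPECSADJ columns by name.
adj_cols <- grep("NUMSPECSADJ", colnames(abacus), value = TRUE)

# Derive a sample label by stripping the replicate letter and the suffix.
sample_id <- sub("_[AB]_NUMSPECSADJ$", "", adj_cols)

# Average the replicate columns within each sample, protein by protein.
rep_means <- sapply(split(adj_cols, sample_id), function(cols) {
  rowMeans(abacus[, cols, drop = FALSE])
})

averaged <- data.frame(PROTID = abacus$PROTID, rep_means,
                       stringsAsFactors = FALSE)
print(averaged)
```

The resulting table could then be compared against the averaged file with the same compare() call used above.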

Next steps:

***see Emma’s awesome explanation of what ABACUS values to use and when

  1. NMDS analysis
    • extract ADJNSAF values from ABACUS_output021417.tsv
    • Find appropriate data transformation/normalization if necessary
      • Emma log transformed her NSAF values before doing NMDS
    • try NMDS again
    • determine if replicates can be pooled
  2. Try downstream analyses with NUMSPECSTOT values from ABACUS_output021417.tsv
    • if it makes sense to sum NUMSPECSTOT values for replicates, try that and then try running stats on those values
  3. Determine which values make sense to use in hierarchical clustering analysis and ASCA, then redo those analyses

  4. Look more closely at development over time
    • Try a fold-change analysis with each developmental time point relative to day 0
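A minimal sketch of steps 1 and 4 on toy data: extract the ADJNSAF columns, log-transform them, and express each time point as a log2 fold change relative to day 0. The column names (day0_ADJNSAF and so on), the PROTID column, the choice of log2, and the pseudocount of 1 are all assumptions for illustration; real ADJNSAF values may call for a different transformation or pseudocount.

```r
# Toy table with one ADJNSAF column per time point (hypothetical names and values).
abacus <- data.frame(
  PROTID = c("p1", "p2"),
  day0_ADJNSAF = c(1, 3),
  day3_ADJNSAF = c(3, 3),
  day5_ADJNSAF = c(7, 0),
  day0_NUMSPECSTOT = c(10, 40),
  stringsAsFactors = FALSE
)

# Step 1: keep the protein ID column plus every ADJNSAF column.
nsaf <- abacus[, c(1, grep("ADJNSAF", colnames(abacus)))]

# Log-transform; the pseudocount (an assumption) keeps zero values finite.
pseudocount <- 1
nsaf_mat <- as.matrix(nsaf[, -1])
rownames(nsaf_mat) <- nsaf$PROTID
log_nsaf <- log2(nsaf_mat + pseudocount)

# Step 4: the log2 fold change relative to day 0 is the difference of log values.
# The day 0 vector has one entry per protein, so it is subtracted row by row.
log2_fc <- log_nsaf - log_nsaf[, "day0_ADJNSAF"]
print(log2_fc)
```

The log-transformed matrix (transposed so that samples are rows) is the kind of input an ordination function such as metaMDS() in the vegan package expects, though that step is not run here.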