ASCA analysis with oyster proteomics temperature x time series dataset

  • Goal: determine whether particular proteins drive the differences between temperature groups over time
    • to do this we need to understand the model that ASCA builds from the data, so I want to extract the loadings from the temperature-factor ASCA and look at the behavior of the proteins with high loadings values
      • PC1 explains 100% of the variation; this seems strange, so I need to figure out why
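One possible explanation for the 100% figure, offered as a sketch rather than a diagnosis of this dataset: in ASCA the effect matrix for a factor replaces every sample with its level mean (after subtracting the grand mean), so a factor with only two levels, such as the two temperatures here, gives a matrix whose rows are all multiples of a single vector. A PCA of such a rank-1 matrix puts all of the variation on PC1. The toy data, the level labels `T1`/`T2` and the function names below are hypothetical, and plain Python is used instead of the R function used for the real analysis.

```python
# Toy demonstration: the ASCA effect matrix of a two-level factor has rank 1,
# so a single axis captures all of its variation.
# All numbers and labels are invented for illustration.

def column_means(rows):
    """Mean of each column (protein) across the given rows (samples)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def effect_matrix(data, levels):
    """Replace each sample by (mean of its factor level - grand mean)."""
    grand = column_means(data)
    level_means = {}
    for lev in set(levels):
        members = [row for row, l in zip(data, levels) if l == lev]
        level_means[lev] = column_means(members)
    return [[m - g for m, g in zip(level_means[lev], grand)] for lev in levels]

def fraction_on_first_row_axis(matrix):
    """Fraction of the total sum of squares that lies along the direction
    of the first row. It equals 1 only if every row is a multiple of it."""
    total = sum(v * v for row in matrix for v in row)
    first = matrix[0]
    norm = sum(v * v for v in first) ** 0.5
    axis = [v / norm for v in first]
    scores = [sum(a * v for a, v in zip(axis, row)) for row in matrix]
    return sum(s * s for s in scores) / total

# 6 samples x 3 proteins, three samples per temperature level
toy_data = [
    [10.0, 5.0, 3.0],
    [12.0, 4.0, 3.5],
    [11.0, 6.0, 2.5],
    [15.0, 2.0, 3.0],
    [16.0, 1.0, 4.0],
    [14.0, 3.0, 2.0],
]
toy_levels = ["T1", "T1", "T1", "T2", "T2", "T2"]

temperature_effect = effect_matrix(toy_data, toy_levels)
explained = fraction_on_first_row_axis(temperature_effect)
print(f"fraction of variation on one axis: {explained:.3f}")
```

If this reasoning applies to the real data, the 100% would be a property of a two-level factor rather than a sign of an unusually strong temperature effect.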
    • first I checked that the protein column names match the rows of the SVD output from the ASCA R function
    • then I saved a csv file with the protein names and their PC1 loadings values
    • I made two plots of the PC1 loadings values in Excel (this could have been done in R):
      • 1) ordered from highest to zero
        ASCA_PC1_positiveLoadings
      • 2) ordered from lowest to zero
        ASCA_PC1_negativeLoadings
      • the point of this was to identify the proteins that contribute most to PC1 (the largest positive and negative loadings)
      • based on the two plots I made a high and low loadings cut-off:
        • loadings value >= 0.05 or loadings value <= -0.05
    • then I subset the data in R using these cutoffs and saved a csv file to upload into MetaboAnalyst, to visualize how these proteins behave
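The cutoff and subsetting steps can be sketched as follows. This is a minimal illustration in plain Python, not the R code actually used; the protein names, loadings values and abundances are invented, and `select_by_loading` is a hypothetical helper. It keeps proteins with loadings >= 0.05 or <= -0.05 and returns them in the same two orderings used for the plots (highest to zero, lowest to zero).

```python
CUTOFF = 0.05  # threshold chosen by eye from the two loadings plots

def select_by_loading(loadings, cutoff=CUTOFF):
    """Split proteins into high-positive and high-negative PC1 loadings."""
    positive = sorted(
        ((name, val) for name, val in loadings.items() if val >= cutoff),
        key=lambda item: item[1],
        reverse=True,  # highest loading first
    )
    negative = sorted(
        ((name, val) for name, val in loadings.items() if val <= -cutoff),
        key=lambda item: item[1],  # lowest (most negative) loading first
    )
    return positive, negative

# Invented loadings for six hypothetical proteins
toy_loadings = {
    "protA": 0.12,
    "protB": -0.07,
    "protC": 0.01,
    "protD": 0.05,
    "protE": -0.30,
    "protF": -0.049,
}

positive, negative = select_by_loading(toy_loadings)
keep = [name for name, _ in positive + negative]

# Subset a samples-in-rows table to the selected proteins only
toy_table = {
    "sample1": {"protA": 5.0, "protB": 2.0, "protC": 1.0,
                "protD": 3.0, "protE": 0.5, "protF": 4.0},
    "sample2": {"protA": 6.0, "protB": 1.5, "protC": 1.1,
                "protD": 2.5, "protE": 0.7, "protF": 4.2},
}
subset = {
    sample: {protein: row[protein] for protein in keep}
    for sample, row in toy_table.items()
}
print(keep)
```

Sorting inside the helper means the same output can feed both the plots and the subset written out for MetaboAnalyst.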
    • I uploaded the data to MetaboAnalyst:
      • two-factor time series, peak intensity table, samples in rows
      • no data filtering
      • mean-centering
        Normalization
      • ran ASCA
        • model performance is worse than before and not distinguishable from random
        ASCA model validation for temperature
        ASCA model validation for time and the interaction of temp and time
        ASCA outlier x leverage plot for temperature
        ASCA outlier x leverage plot for time
        ASCA outlier x leverage plot for temperature x time
      • ran the heatmap function to get a better idea of the behavior of the proteins
        heatmap of protein abundance over time in the two temperatures (Euclidean distance and Ward clustering)
        • I’m not sure what to make of the proteins that show no abundance except in two samples (ACT.527m15, LOCI100497129, and TXND3.2.4m2). Should these really be included in the analysis?
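To list proteins like these systematically instead of spotting them in the heatmap, one option is to count, for each protein, the number of samples with non-zero abundance. The sketch below is plain Python with invented sample and protein names; `sparse_proteins` and its `min_samples` threshold are hypothetical, and whether such proteins should actually be dropped remains the open question above.

```python
def sparse_proteins(table, min_samples=3):
    """Return {protein: n_detected} for proteins detected in fewer than
    min_samples samples.

    table maps sample name -> {protein name -> abundance}.
    A zero or missing (None) abundance counts as not detected.
    """
    counts = {}
    for abundances in table.values():
        for protein, value in abundances.items():
            counts.setdefault(protein, 0)
            if value is not None and value > 0:
                counts[protein] += 1
    return {p: n for p, n in counts.items() if n < min_samples}

# Invented abundances: protB is detected in only two of four samples
toy_abundance = {
    "s1": {"protA": 5.0, "protB": 0.0, "protC": 1.2},
    "s2": {"protA": 4.1, "protB": 0.0, "protC": 0.0},
    "s3": {"protA": 6.3, "protB": 2.2, "protC": 0.9},
    "s4": {"protA": 5.5, "protB": 1.7, "protC": 1.1},
}

flagged = sparse_proteins(toy_abundance, min_samples=3)
print(flagged)
```

Running the ASCA and the heatmap with and without the flagged proteins would show how much they influence the result.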
  • conclusions:
    • It seems like there are some clusters in the heatmap that are different between temperature treatments, but I don’t know what this means if the ASCA model of these proteins is really poor
    • I’m not sure I really understand the whole ASCA feature selection process, so I need to read this paper to understand it better: https://academic.oup.com/bioinformatics/article/23/14/1792/189939