Steven generated destranded (“merged”) coverage files from Bismark .cov output.

Analysis below was done 12/20/2019 - 01/07/2020

Global methylation analysis

  • Using this jupyter notebook 20191222_GlobalMethylation_5x_CpGs.ipynb I created this table allc_5x_CpG.txt which has the following columns:
    1. Sample
    2. Number of mCs
    3. Number of total Cs
    4. % methylation
  • Using this R project 20191222.Rproj and this R markdown script Overall_CpG_analysis.Rmd I generated the following figures:
    • number of mCpGs across samples from different time points
    • percent CpG methylation for each sample group
    • percent CpG methyaltion for each sample group facetted by time
  • I used this R script CpG_analysis_d0to135.Rmd to generate this figure for percent CpG methylation for samples from day 0, day 10, and day 135 for PAG presentation:

DMRs from 5x destranded coverage files

Convert 5x.tab files to Methylpy allc format

  • Used this jupyter notebook 20191222_DMRfind_5xmerg.ipynb to convert 5x.tab files to allc format on my account on Ostrich
    • previously attempted to do this on 12/20 using this jupyter notebook 20191220_DMRfind_5xmerg.ipynb but it turned out the last column in the 5x.tab file is in fact the number of unmethylated Cs AND NOT the total number of Cs so that is why I could not get DMRfind to work. See issue posted here.
    • copied new allc.tsv files to Gannet using same notebook 20191222_DMRfind_5xmerg.ipynb

Running Methylpy DMRfind for all 4 comparisons

Filter DMRs for coverage in 3/4 samples per group

Running group statistics (ANOVA) on DMRs

  • Using this script MCmax25_asinT_groupStats.Rmd I performed ANOVA on all the DMRs from each comparison.
    1. CHECK DATA DISTRIBUTION: First looked at each groups’ % methylation distribution
      • All ambient sampels:
      • Day 10 samples:
      • Day 135 samples:
      • Day 145 samples:
    2. TRANSFORM DATA: No distribution is normal so performed arcsin square root transformation and here’s how the distrubtions changed:
      • All ambient sampels:
      • Day 10 samples:
      • Day 135 samples:
      • Day 145 samples:
    3. Perform ANOVA on transformed data
    4. Plot % methylation x group for significant DMRs
      • DMRs significant at p value < 0.01
        • All ambient samples:
        • Day 10 samples:
        • Day 135 samples:
        • Day 145 samples:
      • DMRs significant at p value < 0.05
        • All ambient samples:
        • Day 10 samples:
        • Day 135 samples:
        • Day 145 samples:
    5. Plot heatmaps of DMRs x samples colored by % methylation
      • DMRs significant at p value < 0.01
        • All ambient samples:
        • Day 10 samples:
        • Day 135 samples:
        • Day 145 samples:
      • DMRs significant at p value < 0.05
        • All ambient samples:
        • Day 10 samples:
        • Day 135 samples:
        • Day 145 samples:
    6. Plot heatmaps of DMRs x group means colored by % methyaltion
      • DMRs significant at p value < 0.01
        • All ambient samples:
        • Day 10 samples:
        • Day 135 samples:
        • Day 145 samples:
      • DMRs significant at p value < 0.05
        • All ambient samples:
        • Day 10 samples:
        • Day 135 samples:
        • Day 145 samples:
    7. Identify persistent DMRs
      • Using this jupyter notebook 20191223_PersistantDMRs.ipynb I compared DMRs from day10 samples (aov_0.05pH_d10DMR.bed) and DMRs from day 135 samples (aov_0.05pH_d135DMR.bed)
      • none were overlapping and the closest DMR was > 16kb away.
      • realized the DMR may not be the same because all these analyses were done separately. If all samples were processed together, I could compare DMRs from different time points

Running Methylpy DMRfind on all samples together

Conclusions:

  • When I ran DMRfind on just the Day 10 samples and on just the Day 135 samples, then performed ANOVA on regions identified in each DMRfind run, there were no overlapping DMRs with significant pH effect (ANOVA p value < 0.05)
  • When I ran DMRfind on all 52 samples, then performed ANOVA on regions identified, 1 DMR (scaffold 3: 56511986-56512009) showing a significant pH effect (ANOVA pvalue < 0.05) was overlapping between day 10 and day 135 samples
  • ANOVA is not the best test to be using for this data because it is not normal
    • a GLM would likely be more sensitive but I would need to reformat the data to run this test (determine # Cs and # mCs for each region).
      • This is possible by running bedtools intersect or closest on DMR bedfile and counts (5x.tab) files https://osf.io/yem8n/files/ and then collapsing/summing the counts for DMRs. But I would need to code it.
  • For now, I’m just going to go with the results from running Methylpy DMRfind on all samples together.