DMR analysis of low pH juveniles
DMRs identified in 20190822 analysis:
- mox script here: https://gannet.fish.washington.edu/metacarcinus/mox_jobs/20190822_DMRfindAllEPIsamples.sh
- data directory here: https://gannet.fish.washington.edu/metacarcinus/Pgenerosa/analyses/20190822/
- DMR reports:
- DMR beds:
- DMS beds:
feature and function analysis
- started this jupyter notebook on Ostrich while mounted to Gannet and Owl
- https://github.com/shellywanamaker/Shelly_Pgenerosa/blob/master/analyses/20190909_allSampleDMRs_GenomeFeaturesFunctions.ipynb
- it uses bedtools intersect to match TE and gene features up to DMRs with
- output is here: https://gannet.fish.washington.edu/metacarcinus/Pgenerosa/analyses/20190909_anno/
- started this summary table:
- 20190907_DMRanalysis_summary
-
Day #DMRs #TEs overlapping with DMRs #uniq DMRs in TEs fraction DMRs in TEs #DMRs in Gene fraction DMRs in genes # uniq genes with DMRs #annotated uniq genes with DMRs 10 133 101 85 0.6390977444 91 0.6842105263 86 71 135 83 60 47 0.5662650602 62 0.7469879518 61 50 145 166 120 102 0.6144578313 121 0.7289156627 119 101 amb_alldays 192 118 102 0.53125 146 0.7604166667 140 111
- eventually I want to make a table like Hollie is doing for DMGs:
- “gene” “scaffold” “start” “stop” “adj.pval.treatment” “adj.pval.position” “adj.pval.treatment_x_position” “uniprot.id” “blastx.eval” “score” “protein” “protein.name” “taxon” “Length” “GO.BP” “GO.CC” “GO.GO” “GO.MF” “GO.IDs”
Determine hypo and hyper methylated DMRs
- DMR reports listed above contain this info, but a closer look at these files showed that a DMR is called when only 2 samples from any group are represented
- I think this is too loose a parameter
- I need to visualize these DMRs in IGV, but I haven’t because I accidentally deleted my destranded files (there was file called /gscratch/scrubbed/strigg/analyses/20190822/destrand_allc_files/allc*.tsv.gz that got created when my script had the incorrect path name and when I tried to remove the file, I forgot to comment out the star)
- I created a mox script to regenerate them and then make bigwig files from them that can be loaded into IGV. Will run when mox maintenance is done
- In the meantime, I ran DMRfind including –min-cluster 3 parameter (requires DMRs to be represented in at least 3 samples/group) on all ambient allc files that were not destranded
- Did analysis on ostrich with this jupyter notebook: https://github.com/shellywanamaker/Shelly_Pgenerosa/blob/master/analyses/20190909_DMRallEPI_allc_minClst3.ipynb
- Because I am using the parameter –mc-max-dist 25, all mCpG counts within a 25bp region are summed, so it seems like destranding may not be necessary
- see DMRfind source code here
- this explains why I didn’t previously observe much of an increase in DMRs (gained 10) after destranding for 3x coverage 1000bp DMRs https://shellywanamaker.github.io/158th-post/
- DMR report: https://gannet.fish.washington.edu/metacarcinus/Pgenerosa/analyses/20190822/amb_AllTimes_DMR250bp_MCmax25_cov5x_clst3_rms_results_collapsed.tsv
- DMR bed: https://gannet.fish.washington.edu/metacarcinus/Pgenerosa/analyses/20190822/amb_AllTimes_DMR250bp_MCmax25_cov5x_clst3_rms_results_collapsed.tsv.DMR.bed
- this resulted in only 22 DMRs where before there were 192.
Next steps:
- I want to try to increase the mc-max-dist parameter along with the ‘–min-cluster 3’ and run this on the destranded data and see what those DMRs look like in IGV