QC of DMRs
- created filtered BAM files to quality-check the DMRs called by the 9/26 script
- I only created filtered BAM files for the comparison of all ambient timepoints
- jupyter notebook is here (section noted as “This analysis below was done on 9/26/19”): 20190909_DMRallEPI_allc_minClst3.ipynb
- Filtered bam files for DMR QC are here: https://gannet.fish.washington.edu/metacarcinus/Pgenerosa/analyses/20190926/allAmb/
- IGV session of filtered bam files is here: https://gannet.fish.washington.edu/metacarcinus/Pgenerosa/analyses/20190926/allAmb/QC_20190926DMRs_allAmb.xml
- DMR bed files in IGV:
- some notes about these files:
- DMRs are commonly defined as 250bp regions containing at least 3 differentially methylated sites (DMS) that have 5x coverage (noted in the track name as “cov5x”)
- MAPQ filter is specified at the beginning of the track name
- MCmax25 refers to a parameter in DMRfind specifying that if DMS are within a 25bp window, their cytosine coverage can be summed. MCmax50 means the DMS can be within a 50bp window.
- “clst” in the track name refers to how many individuals within a treatment group need to have coverage of the DMR for it to be reported. “clst2” requires 2 individuals/group and “clst3” requires 3 individuals/group.
- another DMR bed file that could be looked at in this session is from the analysis where data from all individuals in each group were merged so that each group had one allc file before DMRs were called. That file is here
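To make the DMR definition in the notes above concrete, here is a minimal sketch of grouping DMS into candidate regions. It is not DMRfind's implementation. It assumes the 250bp figure is the maximum gap allowed between neighboring DMS in one region (if it is instead a cap on total region width, the grouping rule would differ), and it ignores the coverage, MCmax and clst filters. The function name, parameter names and toy positions are all hypothetical.

```python
from collections import defaultdict


def group_dms_into_regions(dms, max_gap=250, min_dms=3):
    """Group DMS into candidate regions.

    dms: iterable of (scaffold, position) tuples.
    Returns a list of (scaffold, start, end, n_dms) tuples.
    Assumption: max_gap is the largest allowed distance between
    neighboring DMS in the same region.
    """
    positions_by_scaffold = defaultdict(list)
    for scaffold, position in dms:
        positions_by_scaffold[scaffold].append(position)

    regions = []
    for scaffold in sorted(positions_by_scaffold):
        positions = sorted(positions_by_scaffold[scaffold])
        block = [positions[0]]
        for position in positions[1:]:
            if position - block[-1] <= max_gap:
                # Close enough to the previous DMS: extend the current block.
                block.append(position)
                continue
            # Gap too large: keep the finished block only if it has enough DMS.
            if len(block) >= min_dms:
                regions.append((scaffold, block[0], block[-1], len(block)))
            block = [position]
        # Flush the last block on this scaffold.
        if len(block) >= min_dms:
            regions.append((scaffold, block[0], block[-1], len(block)))
    return regions


# Toy positions, made up for illustration only.
toy_dms = [
    ("scaffold_1", 100), ("scaffold_1", 180), ("scaffold_1", 300),
    ("scaffold_1", 5000), ("scaffold_1", 5100),
    ("scaffold_2", 40), ("scaffold_2", 60), ("scaffold_2", 75),
]
for region in group_dms_into_regions(toy_dms):
    print(region)
```

In the toy input, the two DMS near position 5000 are dropped because a block of two sites is below the 3-DMS minimum.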
Functional analysis of DMRs
- created tables for DMRs from each of the 4 comparisons containing the following fields:
- chromosome (scaffold)
- start position
- end position
- number of DMS in DMR
- Gene ID
- Swissprot ID
- Swissprot Entry Name
- Protein name(s)
- GO ID
- GO term
- these tables are here:
- these tables were created with this jupyter notebook: 20191001_DMR_functional_analysis.ipynb
- I also created transposable element (TE) tables for each comparison; however, there were not many TEs (because there are not that many DMRs in general)
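As a rough illustration of how a table with the fields listed above could be assembled, the sketch below overlaps DMRs with gene intervals and attaches per-gene annotation. This is not the code from the notebook. The column names, the closed-interval overlap rule, the input structures and the placeholder values are all assumptions made for the example.

```python
import csv
import io

# Column names are hypothetical labels for the fields listed above.
FIELDS = [
    "chromosome", "start", "end", "num_DMS", "gene_id",
    "swissprot_id", "swissprot_entry_name", "protein_names",
    "go_id", "go_term",
]


def annotate_dmrs(dmrs, genes, annotations):
    """Return one row per (DMR, overlapping gene) pair.

    dmrs: list of (scaffold, start, end, num_dms)
    genes: list of (scaffold, start, end, gene_id)
    annotations: dict mapping gene_id to a dict of annotation fields
    Assumption: coordinates are closed intervals on the same scale.
    """
    rows = []
    for chrom, start, end, num_dms in dmrs:
        for gene_chrom, gene_start, gene_end, gene_id in genes:
            same_scaffold = gene_chrom == chrom
            overlapping = start <= gene_end and gene_start <= end
            if not (same_scaffold and overlapping):
                continue
            row = {"chromosome": chrom, "start": start, "end": end,
                   "num_DMS": num_dms, "gene_id": gene_id}
            gene_annotation = annotations.get(gene_id, {})
            # Missing annotation fields are left blank rather than guessed.
            for field in FIELDS[5:]:
                row[field] = gene_annotation.get(field, "")
            rows.append(row)
    return rows


def rows_to_tsv(rows):
    """Serialize rows to a tab-separated string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder inputs, not real results or real identifiers.
toy_dmrs = [("scaffold_1", 100, 300, 3), ("scaffold_2", 10, 50, 4)]
toy_genes = [("scaffold_1", 250, 900, "gene_A"),
             ("scaffold_2", 500, 800, "gene_B")]
toy_annotations = {"gene_A": {"swissprot_id": "PLACEHOLDER_ID",
                              "go_term": "placeholder term"}}
print(rows_to_tsv(annotate_dmrs(toy_dmrs, toy_genes, toy_annotations)))
```

The nested loop is fine for a small number of DMRs; an interval tool or index would be a better fit for larger inputs.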
Next steps
- Look into what the proteins are and finish the functional analysis
- see if the genes identified overlap with Hollie’s DMGs
- Decide if it’s worth doing the domain enrichment analysis mentioned in this Slack convo
- Do a more thorough QC to make sure these DMRs are believable
- Decide if any DMRfind parameters should be adjusted and the analysis rerun
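For the gene overlap check, a minimal sketch could be a plain set comparison. It assumes both lists use the same gene ID scheme; the function name and the IDs below are hypothetical placeholders, not results.

```python
def compare_gene_sets(dmr_genes, dmg_genes):
    """Split two gene ID lists into shared and list-specific IDs."""
    dmr_set = set(dmr_genes)
    dmg_set = set(dmg_genes)
    return {
        "shared": sorted(dmr_set & dmg_set),
        "dmr_only": sorted(dmr_set - dmg_set),
        "dmg_only": sorted(dmg_set - dmr_set),
    }


# Placeholder IDs for illustration only.
comparison = compare_gene_sets(
    ["gene_A", "gene_B", "gene_C"],
    ["gene_B", "gene_C", "gene_D"],
)
for label, gene_ids in comparison.items():
    print(label, gene_ids)
```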