Methylation analysis:
Geoduck alignments too slow, should be directional, and many chromosomal sequence extraction errors
-
increase processors/cores to use from default(4) to 28 (i.e. -p 28)
-
change to directional mapping because non-directional mapping should be 1:1:1:1 But instead the non-directional mapping shows ~ 10:1:1:10 as follows:
CT/GA/CT: 2758 ((converted) top strand)
GA/CT/CT: 350 (complementary to (converted) top strand)
GA/CT/GA: 316 (complementary to (converted) bottom strand)
CT/GA/GA: 2725 ((converted) bottom strand) - padded genome
- strigg/analyses/20181025/slurm-401715.out shows > 1M errors: ‘Chromosomal sequence could not be extracted’
- creating padded genome to see if this improves mapping
- Steven can run scripts from coenv : sbatch -p coenv -A coenv 20181101_bmrkPgenrGenmPadded.sh *needed to make my directory writable (chmod 777)
- tested effects of directional mapping and genome padding on mapping efficiency with one sample (103)
- jupyter notebook
- mapping reports to compare:
- original alignment (non-directional mapping, aligned to non-padded genome)
-
/Volumes/web/metacarcinus/Pgenerosa/20181025/EPI-103_S27_L005_R1_001_val_1_bismark_bt2_PE_report.txt
Bismark was run with Bowtie 2 against the bisulfite genome of /gscratch/srlab/strigg/data/Pgenr/ with the specified options: -q --score-min L,0,-1.2 --ignore-quals --no-mixed --no-discordant --dovetail --minins 60 --maxins 500 Option '--non_directional' specified: alignments to all strands were being performed (OT, OB, CTOT, CTOB) Final Alignment report ====================== Sequence pairs analysed in total: 20349659 Number of paired-end alignments with a unique best hit: 13364383 Mapping efficiency: 65.7% Sequence pairs with no alignments under any condition: 3385894 Sequence pairs did not map uniquely: 3599382 Sequence pairs which were discarded because genomic sequence could not be extracted: 45055 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 6344954 ((converted) top strand) GA/CT/CT: 288878 (complementary to (converted) top strand) GA/CT/GA: 294928 (complementary to (converted) bottom strand) CT/GA/GA: 6390568 ((converted) bottom strand) Final Cytosine Methylation Report ================================= Total number of C's analysed: 436619499 Total methylated C's in CpG context: 12108570 Total methylated C's in CHG context: 2075398 Total methylated C's in CHH context: 14520988 Total methylated C's in Unknown context: 485587 Total unmethylated C's in CpG context: 48927920 Total unmethylated C's in CHG context: 89229392 Total unmethylated C's in CHH context: 269757231 Total unmethylated C's in Unknown context: 2195501 C methylated in CpG context: 19.8% C methylated in CHG context: 2.3% C methylated in CHH context: 5.1% C methylated in unknown context (CN or CHN): 18.1% Bismark completed in 0d 6h 36m 43s
-
-
new alignment (directional mapping, aligned to padded genome) -/Volumes/web/metacarcinus/Pgenerosa/20181101/EPI-103_S27_L005_R1_001_val_1_bismark_bt2_PE_report.txt
Bismark report for: /Volumes/web-1/Athaliana/20180516_geoduck_trimgalore_rrbs/EPI-103_S27_L005_R1_001_val_1.fq.gz and /Volumes/web-1/Athaliana/20180516_geoduck_trimgalore_rrbs/EPI-103_S27_L005_R2_001_val_2.fq.gz (version: v0.20.0) Bismark was run with Bowtie 2 against the bisulfite genome of /Volumes/web/metacarcinus/Pgenerosa/GENOMES/v070/padded/ with the specified options: -q --score-min L,0,-1.2 -p 8 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --minins 60 --maxins 500 Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed) Final Alignment report ====================== Sequence pairs analysed in total: 20349659 Number of paired-end alignments with a unique best hit: 12983519 Mapping efficiency: 63.8% Sequence pairs with no alignments under any condition: 3888896 Sequence pairs did not map uniquely: 3477244 Sequence pairs which were discarded because genomic sequence could not be extracted: 53095 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 6437602 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 6492822 ((converted) bottom strand) Number of alignments to (merely theoretical) complementary strands being rejected in total: 0 Final Cytosine Methylation Report ================================= Total number of C's analysed: 419468259 Total methylated C's in CpG context: 11961463 Total methylated C's in CHG context: 1856434 Total methylated C's in CHH context: 4405042 Total methylated C's in Unknown context: 133308 Total unmethylated C's in CpG context: 48736712 Total unmethylated C's in CHG context: 89082405 Total unmethylated C's in CHH context: 263426203 Total unmethylated C's in Unknown context: 1862416 C methylated in CpG context: 19.7% C methylated in CHG context: 2.0% C methylated in CHH context: 1.6% C methylated in unknown context (CN or CHN): 6.7% Bismark completed in 0d 6h 30m 37s Shellys-MacBook-Pro:20181025 Shelly$ cd .. Shellys-MacBook-Pro:Pgenerosa Shelly$ ls 20181011 20181025 20181101 BLAST_20181101 GENOMES Shellys-MacBook-Pro:Pgenerosa Shelly$ cd 20181101/ Shellys-MacBook-Pro:20181101 Shelly$ ls EPI-103_S27_L005_R1_001_val_1_bismark_bt2_PE_report.txt EPI-103_S27_L005_R1_001_val_1_bismark_bt2_pe.bam Shellys-MacBook-Pro:20181101 Shelly$ cat EPI-103_S27_L005_R1_001_val_1_bismark_bt2_PE_report.txt Bismark report for: /Volumes/web-1/Athaliana/20180516_geoduck_trimgalore_rrbs/EPI-103_S27_L005_R1_001_val_1.fq.gz and /Volumes/web-1/Athaliana/20180516_geoduck_trimgalore_rrbs/EPI-103_S27_L005_R2_001_val_2.fq.gz (version: v0.20.0) Bismark was run with Bowtie 2 against the bisulfite genome of /Volumes/web/metacarcinus/Pgenerosa/GENOMES/v070/padded/ with the specified options: -q --score-min L,0,-1.2 -p 8 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --minins 60 --maxins 500 Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed) Final Alignment report ====================== Sequence pairs analysed in total: 20349659 Number of paired-end alignments with a unique best hit: 12983519 Mapping efficiency: 63.8% Sequence pairs with no alignments under any condition: 3888896 Sequence pairs did not map uniquely: 3477244 Sequence pairs which were discarded because genomic sequence could not be extracted: 53095 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 6437602 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 6492822 ((converted) bottom strand) Number of alignments to (merely theoretical) complementary strands being rejected in total: 0 Final Cytosine Methylation Report ================================= Total number of C's analysed: 419468259 Total methylated C's in CpG context: 11961463 Total methylated C's in CHG context: 1856434 Total methylated C's in CHH context: 4405042 Total methylated C's in Unknown context: 133308 Total unmethylated C's in CpG context: 48736712 Total unmethylated C's in CHG context: 89082405 Total unmethylated C's in CHH context: 263426203 Total unmethylated C's in Unknown context: 1862416 C methylated in CpG context: 19.7% C methylated in CHG context: 2.0% C methylated in CHH context: 1.6% C methylated in unknown context (CN or CHN): 6.7% Bismark completed in 0d 6h 30m 37s
- original alignment (non-directional mapping, aligned to non-padded genome)
CONCLUSION:
- DIRECTIONAL MAPPING AND PADDED GENOME LED TO SLIGHTLY LOWER MAPPING EFFICIENCY
- PADDED GENOME DID NOT IMPROVE THE NUMBER OF SEQUENCES THAT COULD BE EXTRACTED
- submitted job 417196 with 28 processors, with directional mapping, and aligning to the original genome (/Volumes/web/metacarcinus/Pgenerosa/GENOMES/v070/Pgenerosa_v070.fa)