I. Concatenate newly trimmed reads
These newly trimmed reads are from the analysis started on 8/6 (see notebook entry here):
- did this on Mox
- copied data over from Ostrich/Gannet to Mox:
rsync --archive --progress --verbose strigg@ostrich.fish.washington.edu:/Users/strigg/Desktop/Salmo_Calig/TRIM/*001_val_1.fq.gz /gscratch/scrubbed/strigg/TRIMG_adapt_5bp/
rsync --archive --progress --verbose strigg@ostrich.fish.washington.edu:/Users/strigg/Desktop/Salmo_Calig/TRIM/*001_val_2.fq.gz /gscratch/scrubbed/strigg/TRIMG_adapt_5bp/
- concatenate script here: 20190809_ConcatL1L2reads.sh
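For reference, lane concatenation of this kind can be sketched roughly as below. This is not the content of 20190809_ConcatL1L2reads.sh; the function name `concat_lanes`, the `_L002_` lane naming, and the output naming (lane tag dropped) are all assumptions made for illustration.

```shell
# Sketch only; the real logic lives in 20190809_ConcatL1L2reads.sh.
# Assumes lane 2 files mirror lane 1 names with _L002_ in place of _L001_.
concat_lanes() {
  in_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  for l1 in "$in_dir"/*_L001_R[12]_001_val_[12].fq.gz; do
    [ -e "$l1" ] || continue                      # glob matched nothing
    base=$(basename "$l1")
    l2="$in_dir/$(printf '%s\n' "$base" | sed 's/_L001_/_L002_/')"
    if [ ! -e "$l2" ]; then
      echo "no lane 2 file for $base, skipping" >&2
      continue
    fi
    out="$out_dir/$(printf '%s\n' "$base" | sed 's/_L001_/_/')"
    # gzip members can be concatenated directly without decompressing
    cat "$l1" "$l2" > "$out"
  done
}

# Example call (hypothetical output directory):
# concat_lanes /gscratch/scrubbed/strigg/TRIMG_adapt_5bp /gscratch/scrubbed/strigg/TRIMG_adapt_5bp/concat
```

Concatenating the compressed files directly works because a gzip stream may contain multiple members; R1 and R2 must be concatenated in the same lane order so read pairs stay in sync.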
II. Run multiqc on newly trimmed reads
- copied data over from Ostrich/Gannet to Emu, ran multiQC, then copied back to Ostrich/Gannet:
```
srlab@emu:~/GitHub/Shelly_Pgenerosa/multiqc$ rsync --archive --progress --verbose strigg@ostrich.fish.washington.edu:/Users/strigg/Desktop/Salmo_Calig/TRIM/FASTQC .
srlab@emu:~/GitHub/Shelly_Pgenerosa/multiqc/FASTQC$ multiqc .
[INFO ] multiqc : This is MultiQC v0.9
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching '.'
[INFO ] fastqc : Found 88 reports
[INFO ] multiqc : Report : multiqc_report.html
[INFO ] multiqc : Data : multiqc_data
[INFO ] multiqc : MultiQC complete
rsync --archive --progress --verbose multiqc_data/ strigg@ostrich.fish.washington.edu:/Users/strigg/Desktop/Salmo_Calig/TRIM/FASTQC
```
III. Determine bismark parameters to apply
Sea lice:
- Mox script here: 20190809_BmrkCmp_100K_Calig.sh
- aligns 100K reads from each of the two sea lice samples using 6 different alignment parameter settings
- output here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190809_100K_Calig_trimG/
- mapping efficiency is between 30-40% with reasonable score-min parameters
- summary data here: 20190809_100K_Calig_trimG/bismarkASthreshold_bigger_comparison.txt
- Alignment score comparison here: SeaLiceBmrkAlignementScoreComparison.html
- CONCLUSIONS:
- unclear whether the low mapping efficiency is caused by the reference genome or by something else
- Go with score-min L,0,-1
Salmon:
- Mox script here: 20190809_BmrkCmp_100K_Ssalar_1.2.sh
- Script randomly selects 5 samples for each alignment parameter setting, so each setting is tested on a different group of 5 samples
- Output here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190809_100K_Salmo/
- Summary report here: bismarkASthreshold_bigger_comparison.txt
- Alignment score comparison here: BmrkAlignementScoreComparison.html
- CONCLUSIONS:
- Go with score-min L,0,-0.6
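To make the two chosen thresholds concrete: Bowtie 2 (which Bismark wraps) reads `L,a,b` as a linear function of read length, minimum score = a + b × length. The sketch below evaluates that function; the helper name `min_score` and the 100 bp read length are assumptions for illustration, not values taken from these runs.

```shell
# Minimum alignment score implied by a linear score-min function L,a,b:
#   min_score = a + b * read_length
# 100 bp is an assumed read length used only for illustration.
min_score() {
  # usage: min_score <intercept a> <slope b> <read_length>
  awk -v a="$1" -v b="$2" -v len="$3" 'BEGIN { printf "%.1f\n", a + b * len }'
}

min_score 0 -0.2 100   # Bismark default (L,0,-0.2)
min_score 0 -0.6 100   # salmon choice  (L,0,-0.6)
min_score 0 -1 100     # sea lice choice (L,0,-1)
```

Under that assumed read length the thresholds come out at -20, -60 and -100: the more negative the threshold, the more mismatches and gaps an alignment may carry and still be reported, which is why the sea lice setting is the most permissive of the three.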
IV. Other options to consider
see suggestions from this article:
- TrimGalore: retain unpaired
- Bismark: PE save unmapped, then run bismark SE on unmapped reads
To investigate:
- How trimming affects mapping efficiency
- How many more mapped reads come from retaining unpaired trimmed reads + running Bismark SE on unmapped PE reads
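The paired-end-then-single-end idea can be sketched as a dry run that only prints the Bismark commands. This is not the content of the Mox script used for this test; the function name, genome path and output directory are hypothetical placeholders, and using `--pbat` for the R2 mates is an assumption that holds only for directional libraries.

```shell
# Dry-run sketch: prints the commands instead of running them.
# All paths are hypothetical placeholders.
print_pe_then_se() {
  sample="$1"     # file prefix, e.g. 16C_26psu_1_S13_L001
  genome="$2"     # folder holding the Bismark-prepared genome
  out="$3"        # output directory
  score_min="$4"  # e.g. L,0,-0.6
  r1="${sample}_R1_001_val_1.fq.gz"
  r2="${sample}_R2_001_val_2.fq.gz"
  # 1) paired-end alignment; --un writes reads that failed to align to $out
  echo "bismark --genome $genome --score_min $score_min --un -o $out -1 $r1 -2 $r2"
  # 2) single-end alignment of the unmapped R1 mates
  echo "bismark --genome $genome --score_min $score_min -o $out $out/${r1}_unmapped_reads_1.fq.gz"
  # 3) single-end alignment of the unmapped R2 mates
  #    (--pbat is an assumption: appropriate for R2 of a directional library)
  echo "bismark --genome $genome --score_min $score_min --pbat -o $out $out/${r2}_unmapped_reads_2.fq.gz"
}

print_pe_then_se 16C_26psu_1_S13_L001 /path/to/genome se_test L,0,-0.6
```

Printing rather than executing keeps the sketch safe to paste; piping the output to `sh` on a node with Bismark loaded would run the three steps in order.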
I ran the following mox script on one Salmon sample (16C_26psu_1_S13):
- mox script here: 20190809_Bmrk_CompareTrim_Salmo_100K.sh
- output here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190809_100K_Salmo_trimME/
- Summary file of trimming effect on mapping efficiency here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190809_100K_Salmo_trimME/bismarkTrim_comparison.txt
- No trimming = 22.7% mapping
- 8/5 adapter trimming = 0.5% mapping
- maybe this is because it wasn’t in PE mode?
- 8/5 adapter trimming + my 5’ 5bp hard trim = 0.5% mapping
- maybe this is because it wasn’t in PE mode?
- 8/6 adapter trimming with 5’ 5bp clip = 48.6% mapping
- 8/6 adapter trimming with 5’ 5bp clip with no dovetail Bismark option = 48.6% mapping
- Summary reports of SE alignments here:
- 16C_26psu_1_S13_L001_R1_001_val_1.fq.gz_unmapped_reads_1_bismark_bt2_SE_report.txt
- R1: 8232 reads (16.0%; 8232/51367) mapped; 18133 reads (35.3%; 18133/51367) didn’t map uniquely
- 16C_26psu_1_S13_L001_R2_001_va2_1.fq.gz_unmapped_reads_1_bismark_bt2_SE_report.txt
- R2: 7113 reads (13.8%; 7113/51367) mapped; 17989 reads (35.0%; 17989/51367) didn’t map uniquely
- CONCLUSIONS:
- the % of unpaired trimmed reads is small; they are only a fraction of the reads removed by trimming, which was:
- < 0.2% (~40K reads/22M reads) are removed after TrimGalore is run
- FASTQC files before trimming are here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190722/
- FASTQC files after trimming are here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190806_TrimGalore/FASTQC/
- my trimming from 8/5 was no good (0.5% mapping)
- the --dovetail Bismark option does not drastically increase mapping efficiency, but setting the correct TrimGalore parameters does
- Running Bismark SE alignment on the unmapped PE reads recovered only ~7-8% of reads
- doesn’t seem worth it to retain these right now
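The percentages above can be rechecked with a small helper. The name `pct` is hypothetical, and the 100000-read denominator is an assumption based on the "100K" subsample in the script name; it is at least consistent with 51367 unmapped pairs at 48.6% mapping.

```shell
# Recompute the reported percentages from the raw counts.
pct() {
  # usage: pct <count> <total>
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

# SE alignments as a share of the unmapped PE reads
pct 8232 51367     # R1
pct 7113 51367     # R2

# Same counts as a share of the assumed 100000-read subsample
pct 8232 100000    # R1
pct 7113 100000    # R2

# PE mapping efficiency implied by 51367 unmapped of 100000
pct $((100000 - 51367)) 100000
```

Relative to the unmapped reads the SE step maps about 16% (R1) and 14% (R2), which corresponds to roughly 7-8% of the whole subsample, matching the figure quoted in the conclusions.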
V. NEXT STEPS:
- run bamQC
- BamQC outputs a lot of useful info including genome coverage and insert size
- run full alignment with determined alignment parameters