Fri. Aug. 9, 2019 Salmon and Sea lice new trim + alignments

I. Concatenate newly trimmed reads

These newly trimmed reads are from analysis started on 8/6; see notebook entry here):

did this on mox:

copied data over from Ostrich/Gannet to Mox:

  rsync --archive --progress --verbose strigg@ostrich.fish.washington.edu:/Users/strigg/Desktop/Salmo_Calig/TRIM/*001_val_1.fq.gz /gscratch/scrubbed/strigg/TRIMG_adapt_5bp/
		
  rsync --archive --progress --verbose strigg@ostrich.fish.washington.edu:/Users/strigg/Desktop/Salmo_Calig/TRIM/*001_val_2.fq.gz /gscratch/scrubbed/strigg/TRIMG_adapt_5bp/

concatenate script here: 20190809_ConcatL1L2reads.sh

II. Run multiqc on newly trimmed reads

- copied data over from Ostrich/Gannet to Emu, ran multiQC, then copied back to Ostrich/Gannet:

```
srlab@emu:~/GitHub/Shelly_Pgenerosa/multiqc$ rsync --archive --progress --verbose strigg@ostrich.fish.washington.edu:/Users/strigg/Desktop/Salmo_Calig/TRIM/FASTQC .

srlab@emu:~/GitHub/Shelly_Pgenerosa/multiqc/FASTQC$ multiqc .
[INFO   ]         multiqc : This is MultiQC v0.9
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching '.'



[INFO   ]          fastqc : Found 88 reports
[INFO   ]         multiqc : Report      : multiqc_report.html
[INFO   ]         multiqc : Data        : multiqc_data
[INFO   ]         multiqc : MultiQC complete

rsync --archive --progress --verbose multiqc_data/ strigg@ostrich.fish.washington.edu:/Users/strigg/Desktop/Salmo_Calig/TRIM/FASTQC

```

III. Determine bismark parameters to apply

Sea lice:

Mox script here: 20190809_BmrkCmp_100K_Calig.sh
- aligns 100K reads from each of the two sea lice samples and applies 6 different alignment parameters
output here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190809_100K_Calig_trimG/
mapping efficiency is between 30-40% with reasonable score-min parameters
- summary data here: 20190809_100K_Calig_trimG/bismarkASthreshold_bigger_comparison.txt
- Alignment score comparison here: SeaLiceBmrkAlignementScoreComparison.html
CONCLUSIONS:
- not sure if low mapping is a genome thing or what
- Go with score-min L,0,-1

For Salmon:

Mox script here: 20190809_BmrkCmp_100K_Ssalar_1.2.sh
- Script randomly selects 5 samples and tests the different alignment parameters on each different group of 5 samples
Output here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190809_100K_Salmo/
Summary report here: bismarkASthreshold_bigger_comparison.txt
Alignment score comparison here: BmrkAlignementScoreComparison.html
CONCLUSIONS:
- Go with score-min L,0,-0.6

IV. Other options to consider

see suggestions from this article:

TrimGalore: retain unpaired
Bismark: PE save unmapped, then run bismark SE on unmapped reads

To investigate:

How trimming affects mapping efficiency
How many more mapped reads come from retaining unpaired trimmed reads + running Bismark SE on unmapped PE reads

I ran the following mox script on one Salmon sample (16C_26psu_1_S13):

mox script here: 20190809_Bmrk_CompareTrim_Salmo_100K.sh
output here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190809_100K_Salmo_trimME/
Summary file of trimming effect on mapping efficiency here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190809_100K_Salmo_trimME/bismarkTrim_comparison.txt
- No trimming = 22.7% mapping
- 8/5 adapter trimming = 0.5% mapping
  - maybe this is because it wasn’t in PE mode?
- 8/5 adapter trimming + my 5’ 5bp hard trim = 0.5% mapping
  - maybe this is because it wasn’t in PE mode?
- 8/6 adapter trimming with 5’ 5bp clip = 48.6% mapping
- 8/6 adapter trimming with 5’ 5bp clip with no dovetail Bismark option = 48.6% mapping
Summary reports of SE alignments here:
- 16C_26psu_1_S13_L001_R1_001_val_1.fq.gz_unmapped_reads_1_bismark_bt2_SE_report.txt
  - R1: 8232 (16%; 8232/51367) mapped, 35% (18133/51367) didn’t map uniquely
- 16C_26psu_1_S13_L001_R2_001_va2_1.fq.gz_unmapped_reads_1_bismark_bt2_SE_report.txt
  - R2: 7113 (13.8% 7113/51367) mapped, 35% (17989/51367) didn’t map uniquely
CONCLUSIONS:
1. % unpaired trimmed reads isn’t that much;they are only a fraction of what got removed after trimming which was:
  - < 0.2% (~40K reads/22M reads) are removed after TrimGalore is run
    - FASTQC files before trimming are here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190722/
    - FASTQC files after trimming are here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190806_TrimGalore/FASTQC/
2. my trimming from 8/5 was no good
3. –dovetail Bismark option does not drasticly increase mapping efficiency but setting the correct TrimGalore parameters does
4. Running Bismark SE alignment of unmapped PE reads gave ~ 7-8% of reads back
  - doesn’t seem worth it to retain these right now

V. NEXT STEPS:

run bamQC
- BamQC outputs a lot of useful info including genome coverage and insert size
run full alignment with determined alignment parameters