-
Checksums from copying adapter + 5bp from 5’ trimmed data from mox to gannet:
-
ran this mox script: https://gannet.fish.washington.edu/metacarcinus/mox_jobs/20190805_md5cksm.sh
-
renamed output file and copied to Gannet
mv /gscratch/srlab/strigg/data/Salmo_Calig/TRIM_1st5bp/checksums.md5 /gscratch/srlab/strigg/data/Salmo_Calig/TRIM_1st5bp/checksums.md5.mox scp /gscratch/srlab/strigg/data/Salmo_Calig/TRIM_1st5bp/checksums.md5.mox strigg@ostrich.fish.washington.edu:/Volumes/web/metacarcinus/Salmo_Calig/analyses/20190722/TRIM_1st5bp Password: checksums.md5.mox 100% 12KB 3.2MB/s 00:00
-
ran a comparison check of both [checksums.md5.mox] and [checksums.md5] to see if they are the same
ostrich:TRIM_1st5bp strigg$ paste -d'\t' checksums.md5.mox checksums.md5 | awk '{if($1=$6)print $1,$6}' | head 7036f0a884346dcbcdf5a66f24a9a926 7036f0a884346dcbcdf5a66f24a9a926 e13367f5f3feb36753845d7e130de4aa e13367f5f3feb36753845d7e130de4aa dce1e7629182baeb37102cddb09db512 dce1e7629182baeb37102cddb09db512 1170441e896de35738ed0300244d89fe 1170441e896de35738ed0300244d89fe ddb14c1f81ff1fe810b019dc13baf488 ddb14c1f81ff1fe810b019dc13baf488 bb576b125bdebdd2750cc88996a02c59 bb576b125bdebdd2750cc88996a02c59 5e6590529f5a4634c10b65e157d83e53 5e6590529f5a4634c10b65e157d83e53 a3209d0e0d6ec8ecc04260b52de4e132 a3209d0e0d6ec8ecc04260b52de4e132 cc25f84b0bced546d7573f9503ea4b98 cc25f84b0bced546d7573f9503ea4b98 6f0113309650574e9b560de68f33e7f1 6f0113309650574e9b560de68f33e7f1 ostrich:TRIM_1st5bp strigg$ paste -d'\t' checksums.md5.mox checksums.md5 | awk '{if($1=$6)print $1,$6}' | wc -l 88 ostrich:TRIM_1st5bp strigg$ paste -d'\t' checksums.md5.mox checksums.md5 | awk '{if($1!=$6)print $1,$6}' | wc -l 0
-
CONCLUSIONS: file codes are the same so copying from mox to Gannet worked fine
-
- Concatenating L001 and L002 .gz files
- got job to run on mox by changing the sbatch time (maintenance is happening next week so my previous job’s run time would have overlapped with that)
- output saved on mox here: /gscratch/scrubbed/strigg/TRIM_cat/
-
transferred concatenated files to Gannet here: https://gannet.fish.washington.edu/metacarcinus/Salmo_Calig/analyses/20190722/TRIM_cat/
ostrich:20190722 strigg$ rsync --archive --progress --verbose strigg@mox.hyak.uw.edu:/gscratch/scrubbed/strigg/TRIM_cat . Password: Enter passcode or select one of the following options: 1. Duo Push to Android (XXX-XXX-0029) 2. Phone call to Android (XXX-XXX-0029) Duo passcode or option [1-2]: 1 receiving file list ... 46 files to consider TRIM_cat/ TRIM_cat/16C_26psu_1_S13_R1_001_trimmed.5bp_3prime.fq.gz 2272571496 100% 37.97MB/s 0:00:57 (xfer#1, to-check=44/46) TRIM_cat/16C_26psu_1_S13_R2_001_trimmed.5bp_3prime.fq.gz 2393036541 100% 37.47MB/s 0:01:00 (xfer#2, to-check=43/46) TRIM_cat/16C_26psu_2_S14_R1_001_trimmed.5bp_3prime.fq.gz 1794524920 100% 38.64MB/s 0:00:44 (xfer#3, to-check=42/46) TRIM_cat/16C_26psu_2_S14_R2_001_trimmed.5bp_3prime.fq.gz 1871002080 100% 35.30MB/s 0:00:50 (xfer#4, to-check=41/46) TRIM_cat/16C_26psu_3_S15_R1_001_trimmed.5bp_3prime.fq.gz 1515096746 100% 35.59MB/s 0:00:40 (xfer#5, to-check=40/46) TRIM_cat/16C_26psu_3_S15_R2_001_trimmed.5bp_3prime.fq.gz 1604258990 100% 38.59MB/s 0:00:39 (xfer#6, to-check=39/46) TRIM_cat/16C_26psu_4_S16_R1_001_trimmed.5bp_3prime.fq.gz 1739852971 100% 36.87MB/s 0:00:45 (xfer#7, to-check=38/46) TRIM_cat/16C_26psu_4_S16_R2_001_trimmed.5bp_3prime.fq.gz 1817746738 100% 33.96MB/s 0:00:51 (xfer#8, to-check=37/46) TRIM_cat/16C_32psu_1_S1_R1_001_trimmed.5bp_3prime.fq.gz 1896054131 100% 34.35MB/s 0:00:52 (xfer#9, to-check=36/46) TRIM_cat/16C_32psu_1_S1_R2_001_trimmed.5bp_3prime.fq.gz 1981391166 100% 38.21MB/s 0:00:49 (xfer#10, to-check=35/46) TRIM_cat/16C_32psu_2_S2_R1_001_trimmed.5bp_3prime.fq.gz 1715567344 100% 38.37MB/s 0:00:42 (xfer#11, to-check=34/46) TRIM_cat/16C_32psu_2_S2_R2_001_trimmed.5bp_3prime.fq.gz 1789513677 100% 36.53MB/s 0:00:46 (xfer#12, to-check=33/46) TRIM_cat/16C_32psu_3_S3_R1_001_trimmed.5bp_3prime.fq.gz 1291766259 100% 35.51MB/s 0:00:34 (xfer#13, to-check=32/46) TRIM_cat/16C_32psu_3_S3_R2_001_trimmed.5bp_3prime.fq.gz 1379258469 100% 34.33MB/s 0:00:38 (xfer#14, to-check=31/46) TRIM_cat/16C_32psu_4_S4_R1_001_trimmed.5bp_3prime.fq.gz 1704589933 100% 36.14MB/s 0:00:44 (xfer#15, to-check=30/46) TRIM_cat/16C_32psu_4_S4_R2_001_trimmed.5bp_3prime.fq.gz 1802754382 100% 36.61MB/s 0:00:46 (xfer#16, to-check=29/46) TRIM_cat/8C_26psu_1_S9_R1_001_trimmed.5bp_3prime.fq.gz 1872155177 100% 38.20MB/s 0:00:46 (xfer#17, to-check=28/46) TRIM_cat/8C_26psu_1_S9_R2_001_trimmed.5bp_3prime.fq.gz 1957282059 100% 35.13MB/s 0:00:53 (xfer#18, to-check=27/46) TRIM_cat/8C_26psu_2_S10_R1_001_trimmed.5bp_3prime.fq.gz 1837426749 100% 38.07MB/s 0:00:46 (xfer#19, to-check=26/46) TRIM_cat/8C_26psu_2_S10_R2_001_trimmed.5bp_3prime.fq.gz 1935888232 100% 37.95MB/s 0:00:48 (xfer#20, to-check=25/46) TRIM_cat/8C_26psu_3_S11_R1_001_trimmed.5bp_3prime.fq.gz 1605119393 100% 34.60MB/s 0:00:44 (xfer#21, to-check=24/46) TRIM_cat/8C_26psu_3_S11_R2_001_trimmed.5bp_3prime.fq.gz 1703938463 100% 34.62MB/s 0:00:46 (xfer#22, to-check=23/46) TRIM_cat/8C_26psu_4_S12_R1_001_trimmed.5bp_3prime.fq.gz 1321058388 100% 38.23MB/s 0:00:32 (xfer#23, to-check=22/46) TRIM_cat/8C_26psu_4_S12_R2_001_trimmed.5bp_3prime.fq.gz 1414160455 100% 37.52MB/s 0:00:35 (xfer#24, to-check=21/46) TRIM_cat/8C_32psu_1_S5_R1_001_trimmed.5bp_3prime.fq.gz 1231884567 100% 37.73MB/s 0:00:31 (xfer#25, to-check=20/46) TRIM_cat/8C_32psu_1_S5_R2_001_trimmed.5bp_3prime.fq.gz 1286891488 100% 38.51MB/s 0:00:31 (xfer#26, to-check=19/46) TRIM_cat/8C_32psu_2_S6_R1_001_trimmed.5bp_3prime.fq.gz 1326114322 100% 37.53MB/s 0:00:33 (xfer#27, to-check=18/46) TRIM_cat/8C_32psu_2_S6_R2_001_trimmed.5bp_3prime.fq.gz 1369513220 100% 35.69MB/s 0:00:36 (xfer#28, to-check=17/46) TRIM_cat/8C_32psu_3_S7_R1_001_trimmed.5bp_3prime.fq.gz 1373099534 100% 35.87MB/s 0:00:36 (xfer#29, to-check=16/46) TRIM_cat/8C_32psu_3_S7_R2_001_trimmed.5bp_3prime.fq.gz 1429454137 100% 35.57MB/s 0:00:38 (xfer#30, to-check=15/46) TRIM_cat/8C_32psu_4_S8_R1_001_trimmed.5bp_3prime.fq.gz 1558622098 100% 34.77MB/s 0:00:42 (xfer#31, to-check=14/46) TRIM_cat/8C_32psu_4_S8_R2_001_trimmed.5bp_3prime.fq.gz 1642882596 100% 30.95MB/s 0:00:50 (xfer#32, to-check=13/46) TRIM_cat/CTRL_16C_26psu_1_S19_R1_001_trimmed.5bp_3prime.fq.gz 1914346417 100% 38.22MB/s 0:00:47 (xfer#33, to-check=12/46) TRIM_cat/CTRL_16C_26psu_1_S19_R2_001_trimmed.5bp_3prime.fq.gz 2011802240 100% 34.10MB/s 0:00:56 (xfer#34, to-check=11/46) TRIM_cat/CTRL_16C_26psu_2_S21_R1_001_trimmed.5bp_3prime.fq.gz 1275247795 100% 37.46MB/s 0:00:32 (xfer#35, to-check=10/46) TRIM_cat/CTRL_16C_26psu_2_S21_R2_001_trimmed.5bp_3prime.fq.gz 1323060829 100% 31.40MB/s 0:00:40 (xfer#36, to-check=9/46) TRIM_cat/CTRL_8C_26psu_1_S17_R1_001_trimmed.5bp_3prime.fq.gz 1422864827 100% 36.53MB/s 0:00:37 (xfer#37, to-check=8/46) TRIM_cat/CTRL_8C_26psu_1_S17_R2_001_trimmed.5bp_3prime.fq.gz 1500712020 100% 32.79MB/s 0:00:43 (xfer#38, to-check=7/46) TRIM_cat/CTRL_8C_26psu_2_S18_R1_001_trimmed.5bp_3prime.fq.gz 1481425749 100% 35.59MB/s 0:00:39 (xfer#39, to-check=6/46) TRIM_cat/CTRL_8C_26psu_2_S18_R2_001_trimmed.5bp_3prime.fq.gz 1547852687 100% 38.00MB/s 0:00:38 (xfer#40, to-check=5/46) TRIM_cat/Sealice_F1_S20_R1_001_trimmed.5bp_3prime.fq.gz 12379825515 100% 36.30MB/s 0:05:25 (xfer#41, to-check=4/46) TRIM_cat/Sealice_F1_S20_R2_001_trimmed.5bp_3prime.fq.gz 12999258153 100% 34.86MB/s 0:05:55 (xfer#42, to-check=3/46) TRIM_cat/Sealice_F2_S22_R1_001_trimmed.5bp_3prime.fq.gz 8873822837 100% 36.13MB/s 0:03:54 (xfer#43, to-check=2/46) TRIM_cat/Sealice_F2_S22_R2_001_trimmed.5bp_3prime.fq.gz 9376290902 100% 35.48MB/s 0:04:11 (xfer#44, to-check=1/46) TRIM_cat/slurm-1141613.out 72 100% 0.10kB/s 0:00:00 (xfer#45, to-check=0/46) sent 1012 bytes received 109567734710 bytes 37879943.21 bytes/sec total size is 109540986764 speedup is 1.00
- Alignment parameters test
- First, ran 20190806_BmrkCmpAS_Calig_100K.sh on mox
- OUTCOME: summary file shows very poor alignment in all parameters tested for both sea lice samples
- To see if poor alignment was specific to the sea lice samples or genome, I proceeded running the alignment test on the Salmon data (mox job 20190806_BmrkCmpAS_Ssalar_100K.sh)
- OUTCOME: summary file showed Salmon data aligns very poorly also, so it’s not sample or genome specific
- I ended up canceling the mox job before running all test parameters (only did AS-0.2, AS-0.6, and AS-1 corresponding to columns 2-4 in summary file)
- OUTCOME: summary file showed Salmon data aligns very poorly also, so it’s not sample or genome specific
- To test if my 5bp blunt trimming was causing problems, I ran the alignment parameters test on sea lice data that had only been adapter trimmed and lanes not concatenated
- mox script 20190806_BmrkCmpAS_Calig_test.sh
- OUTCOME: summary file showed alignments were all still really poor
- mox script 20190806_BmrkCmpAS_Calig_test.sh
-
I found this article discussing ‘dovetailing’ reads (where one or oth reads seem to extend past the start of the mate). It says that dovetailing reads are considered discordant and are not reported.
-
So I tested both the trimming and alignment parameters specified in article on one set of read pairs
(base) ostrich:TRIM_5bp strigg$ /Users/strigg/anaconda3/bin/TrimGalore-0.6.0/trim_galore \ > --output_dir /Users/strigg/Desktop/Salmo_Calig/TRIM \ > --fastqc_args \ > "--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM/FASTQC" \ > --clip_R1 9 \ > --clip_R2 9 \ > --paired \ > --path_to_cutadapt /Users/strigg/anaconda3/bin/cutadapt \ > /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz \ > /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
- I copied these trimmed reads 16C_26psu_1_S13_L001_R1 and 16C_26psu_1_S13_L001_R2 to mox and ran 20190806_BmrkCmpAS_Ssalar_100K_test.sh
- OUTCOME: summary file showed these parameters greatly improved mapping
-
To see if –clip_R1 and –clip_R2 could be adjusted to 5bp (as suggested by Zymo) without losing mapping efficiency, I ran the trimming again on the same two files (full output appended at end)
(base) ostrich:TRIM_5bp strigg$ /Users/strigg/anaconda3/bin/TrimGalore-0.6.0/trim_galore \ > --output_dir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp \ > --fastqc_args \ > "--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC" \ > --clip_R1 5 \ > --clip_R2 5 \ > --paired \ > --path_to_cutadapt /Users/strigg/anaconda3/bin/cutadapt \ > /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz \ > /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
- I copied these trimmed reads (R1 and R2) to mox and ran 20190806_Bmrk_CompareAS_Ssalar_100K_test5bp.sh
- OUTCOME: summary file showed nearly identical mapping efficiencies so clipping 5bp is sufficient.
- I am currently running TrimGalore with the same parameters as directly above through this jupyter notebook 20190722_SalmonSeaLice_FASTQC_TRIM.ipynb on ostrich.
NEXT STEPS:
- Copy newly trimmed reads to mox
- Run alignment parameter test on subset of all trimmed reads
- Decide which parameters to go with
- Start full Bismark analysis on all data
Full output from TrimGALORE Trim adapters + 5bp
(base) ostrich:TRIM_5bp strigg$ /Users/strigg/anaconda3/bin/TrimGalore-0.6.0/trim_galore \
> --output_dir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp \
> --fastqc_args \
> "--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC" \
> --clip_R1 5 \
> --clip_R2 5 \
> --paired \
> --path_to_cutadapt /Users/strigg/anaconda3/bin/cutadapt \
> /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz \
> /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: '/Users/strigg/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/Users/strigg/anaconda3/bin/cutadapt --version')
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Output directory /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/ doesn't exist, creating it for you...
Output will be written into the directory: /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz <<)
Found perfect matches for the following adapter sequences:
Adapter type Count Sequence Sequences analysed Percentage
Illumina 538094 AGATCGGAAGAGC 1000000 53.81
Nextera 0 CTGTCTCTTATA 1000000 0.00
smallRNA 0 TGGAATTCTCGG 1000000 0.00
Using Illumina adapter for trimming (count: 538094). Second best hit was Nextera (count: 0)
Writing report to '/Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/16C_26psu_1_S13_L001_R1_001.fastq.gz_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.0
Cutadapt version: 2.4
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 5 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 5 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC'
Output file(s) will be GZIP compressed
Writing final adapter and quality trimmed output to 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.6.8
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 852.63 s (38 us/read; 1.56 M reads/minute).
=== Summary ===
Total reads processed: 22,159,935
Reads with adapters: 17,366,785 (78.4%)
Reads written (passing filters): 22,159,935 (100.0%)
Total basepairs processed: 3,323,990,250 bp
Quality-trimmed: 20,636,466 bp (0.6%)
Total written (filtered): 2,599,584,885 bp (78.2%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17366785 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 29.9%
C: 14.0%
G: 20.5%
T: 35.6%
none/other: 0.0%
Overview of removed sequences
length count expect max.err error counts
1 2904827 5539983.8 0 2904827
2 423079 1384995.9 0 423079
3 231866 346249.0 0 231866
4 143449 86562.2 0 143449
5 117570 21640.6 0 117570
6 115529 5410.1 0 115529
7 96119 1352.5 0 96119
8 119364 338.1 0 119364
9 118406 84.5 0 116886 1520
10 112051 21.1 1 106032 6019
11 122987 5.3 1 113734 9253
12 119477 1.3 1 111842 7635
13 120023 0.3 1 112246 7777
14 122963 0.3 1 114085 8878
15 127528 0.3 1 119209 8319
16 126445 0.3 1 117002 9443
17 152761 0.3 1 140488 12273
18 129037 0.3 1 121213 7824
19 112889 0.3 1 105773 7116
20 133616 0.3 1 125004 8612
21 146516 0.3 1 136291 10225
22 126419 0.3 1 119045 7374
23 145391 0.3 1 135716 9675
24 136894 0.3 1 125670 11224
25 145117 0.3 1 135313 9804
26 134897 0.3 1 126717 8180
27 148757 0.3 1 137734 11023
28 145635 0.3 1 135492 10143
29 153188 0.3 1 140948 12240
30 163373 0.3 1 154030 9343
31 130689 0.3 1 122129 8560
32 157390 0.3 1 147156 10234
33 145226 0.3 1 135269 9957
34 158524 0.3 1 147826 10698
35 161582 0.3 1 148734 12848
36 173098 0.3 1 162838 10260
37 141921 0.3 1 133053 8868
38 164933 0.3 1 154468 10465
39 167185 0.3 1 155913 11272
40 175799 0.3 1 163207 12592
41 197239 0.3 1 176427 20812
42 165247 0.3 1 150517 14730
43 270149 0.3 1 254538 15611
44 85600 0.3 1 76961 8639
45 197628 0.3 1 175697 21931
46 723044 0.3 1 688287 34757
47 254889 0.3 1 240372 14517
48 114855 0.3 1 101708 13147
49 409546 0.3 1 390013 19533
50 125438 0.3 1 116192 9246
51 110069 0.3 1 99667 10402
52 381945 0.3 1 364475 17470
53 196688 0.3 1 188239 8449
54 63003 0.3 1 58174 4829
55 106109 0.3 1 101117 4992
56 55521 0.3 1 50154 5367
57 169184 0.3 1 161794 7390
58 35182 0.3 1 32212 2970
59 44244 0.3 1 39185 5059
60 173147 0.3 1 165425 7722
61 86440 0.3 1 82402 4038
62 40559 0.3 1 36479 4080
63 130914 0.3 1 123404 7510
64 122315 0.3 1 116337 5978
65 65805 0.3 1 60428 5377
66 148115 0.3 1 139100 9015
67 175930 0.3 1 163369 12561
68 293688 0.3 1 278052 15636
69 189726 0.3 1 179114 10612
70 75474 0.3 1 69457 6017
71 27345 0.3 1 23952 3393
72 97969 0.3 1 91968 6001
73 138145 0.3 1 130111 8034
74 143624 0.3 1 135279 8345
75 142674 0.3 1 134066 8608
76 141682 0.3 1 133308 8374
77 141205 0.3 1 132947 8258
78 133036 0.3 1 124966 8070
79 131630 0.3 1 123633 7997
80 128859 0.3 1 121116 7743
81 126119 0.3 1 118667 7452
82 123492 0.3 1 116262 7230
83 118847 0.3 1 111615 7232
84 115583 0.3 1 108642 6941
85 111409 0.3 1 104817 6592
86 107099 0.3 1 100545 6554
87 102731 0.3 1 96477 6254
88 101527 0.3 1 95245 6282
89 95383 0.3 1 89600 5783
90 95037 0.3 1 89230 5807
91 89836 0.3 1 84395 5441
92 84977 0.3 1 79865 5112
93 79479 0.3 1 74762 4717
94 76672 0.3 1 72010 4662
95 70676 0.3 1 66313 4363
96 66886 0.3 1 62766 4120
97 63105 0.3 1 59150 3955
98 58813 0.3 1 55101 3712
99 56801 0.3 1 52921 3880
100 51251 0.3 1 47973 3278
101 48131 0.3 1 44969 3162
102 43975 0.3 1 41134 2841
103 40749 0.3 1 38132 2617
104 37602 0.3 1 35059 2543
105 33601 0.3 1 31318 2283
106 30687 0.3 1 28622 2065
107 27550 0.3 1 25698 1852
108 25919 0.3 1 24135 1784
109 23271 0.3 1 21584 1687
110 20524 0.3 1 19077 1447
111 18563 0.3 1 17263 1300
112 16204 0.3 1 15036 1168
113 14110 0.3 1 13017 1093
114 12321 0.3 1 11394 927
115 10629 0.3 1 9884 745
116 9483 0.3 1 8756 727
117 8057 0.3 1 7440 617
118 7067 0.3 1 6491 576
119 5777 0.3 1 5264 513
120 4917 0.3 1 4511 406
121 4181 0.3 1 3800 381
122 3506 0.3 1 3174 332
123 2912 0.3 1 2640 272
124 2546 0.3 1 2297 249
125 2101 0.3 1 1902 199
126 1717 0.3 1 1548 169
127 1536 0.3 1 1361 175
128 1201 0.3 1 1095 106
129 1012 0.3 1 912 100
130 739 0.3 1 668 71
131 638 0.3 1 572 66
132 553 0.3 1 500 53
133 556 0.3 1 500 56
134 665 0.3 1 607 58
135 362 0.3 1 324 38
136 207 0.3 1 176 31
137 143 0.3 1 128 15
138 99 0.3 1 86 13
139 84 0.3 1 72 12
140 58 0.3 1 50 8
141 42 0.3 1 38 4
142 37 0.3 1 29 8
143 24 0.3 1 22 2
144 25 0.3 1 17 8
145 26 0.3 1 22 4
146 29 0.3 1 21 8
147 85 0.3 1 77 8
148 64 0.3 1 59 5
149 40 0.3 1 35 5
150 355 0.3 1 318 37
RUN STATISTICS FOR INPUT FILE: /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R1_001.fastq.gz
=============================================
22159935 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)
Writing report to '/Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/16C_26psu_1_S13_L001_R2_001.fastq.gz_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.0
Cutadapt version: 2.4
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 5 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 5 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC'
Output file(s) will be GZIP compressed
Writing final adapter and quality trimmed output to 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.6.8
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 904.27 s (41 us/read; 1.47 M reads/minute).
=== Summary ===
Total reads processed: 22,159,935
Reads with adapters: 17,263,070 (77.9%)
Reads written (passing filters): 22,159,935 (100.0%)
Total basepairs processed: 3,323,990,250 bp
Quality-trimmed: 16,760,179 bp (0.5%)
Total written (filtered): 2,603,569,143 bp (78.3%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17263070 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 28.6%
C: 12.4%
G: 23.7%
T: 35.3%
none/other: 0.0%
Overview of removed sequences
length count expect max.err error counts
1 2980698 5539983.8 0 2980698
2 434290 1384995.9 0 434290
3 244115 346249.0 0 244115
4 143736 86562.2 0 143736
5 130207 21640.6 0 130207
6 112454 5410.1 0 112454
7 91383 1352.5 0 91383
8 130413 338.1 0 130413
9 102897 84.5 0 102495 402
10 113241 21.1 1 107719 5522
11 119188 5.3 1 111572 7616
12 120663 1.3 1 112735 7928
13 133942 0.3 1 125965 7977
14 126547 0.3 1 117452 9095
15 102830 0.3 1 96138 6692
16 149163 0.3 1 139761 9402
17 119889 0.3 1 111631 8258
18 113552 0.3 1 106594 6958
19 144826 0.3 1 135202 9624
20 135016 0.3 1 126722 8294
21 118040 0.3 1 110327 7713
22 135122 0.3 1 126795 8327
23 139623 0.3 1 130868 8755
24 139327 0.3 1 130312 9015
25 156486 0.3 1 147346 9140
26 131367 0.3 1 122873 8494
27 144343 0.3 1 134126 10217
28 135732 0.3 1 128445 7287
29 144460 0.3 1 135341 9119
30 143460 0.3 1 135475 7985
31 154935 0.3 1 145507 9428
32 140211 0.3 1 132501 7710
33 154714 0.3 1 145023 9691
34 197197 0.3 1 185506 11691
35 114069 0.3 1 107701 6368
36 154713 0.3 1 145095 9618
37 166945 0.3 1 157027 9918
38 155394 0.3 1 147358 8036
39 151879 0.3 1 143914 7965
40 168764 0.3 1 158797 9967
41 162964 0.3 1 154146 8818
42 166211 0.3 1 157158 9053
43 152395 0.3 1 144005 8390
44 174656 0.3 1 164812 9844
45 186871 0.3 1 176811 10060
46 135036 0.3 1 127666 7370
47 168636 0.3 1 159853 8783
48 166288 0.3 1 157802 8486
49 174530 0.3 1 165351 9179
50 164769 0.3 1 155907 8862
51 207765 0.3 1 197011 10754
52 147151 0.3 1 139795 7356
53 177051 0.3 1 168817 8234
54 133625 0.3 1 126900 6725
55 174567 0.3 1 166338 8229
56 178950 0.3 1 169886 9064
57 168945 0.3 1 161037 7908
58 161931 0.3 1 154618 7313
59 165789 0.3 1 157874 7915
60 171736 0.3 1 163328 8408
61 175826 0.3 1 167046 8780
62 192123 0.3 1 182534 9589
63 252743 0.3 1 241513 11230
64 207112 0.3 1 199124 7988
65 135660 0.3 1 129354 6306
66 145866 0.3 1 139110 6756
67 145748 0.3 1 138731 7017
68 146161 0.3 1 139522 6639
69 144487 0.3 1 137768 6719
70 154292 0.3 1 147160 7132
71 158353 0.3 1 151327 7026
72 150227 0.3 1 143181 7046
73 157376 0.3 1 150704 6672
74 144388 0.3 1 138042 6346
75 137234 0.3 1 131094 6140
76 133825 0.3 1 127901 5924
77 133664 0.3 1 127640 6024
78 127632 0.3 1 121889 5743
79 122934 0.3 1 117397 5537
80 122791 0.3 1 117317 5474
81 119094 0.3 1 113976 5118
82 115048 0.3 1 109953 5095
83 110957 0.3 1 106119 4838
84 107760 0.3 1 102959 4801
85 105919 0.3 1 100973 4946
86 104575 0.3 1 99936 4639
87 105119 0.3 1 100537 4582
88 102473 0.3 1 97892 4581
89 90861 0.3 1 86820 4041
90 87725 0.3 1 83791 3934
91 82796 0.3 1 79052 3744
92 76768 0.3 1 73386 3382
93 74497 0.3 1 71224 3273
94 71670 0.3 1 68502 3168
95 67300 0.3 1 64357 2943
96 63751 0.3 1 61088 2663
97 58781 0.3 1 56250 2531
98 54871 0.3 1 52462 2409
99 53000 0.3 1 50773 2227
100 48407 0.3 1 46381 2026
101 45093 0.3 1 43177 1916
102 40752 0.3 1 38996 1756
103 37584 0.3 1 35906 1678
104 35749 0.3 1 34226 1523
105 31850 0.3 1 30464 1386
106 29282 0.3 1 28064 1218
107 26084 0.3 1 24956 1128
108 23726 0.3 1 22669 1057
109 21646 0.3 1 20704 942
110 19323 0.3 1 18515 808
111 18235 0.3 1 17436 799
112 15797 0.3 1 15095 702
113 13713 0.3 1 13089 624
114 11968 0.3 1 11416 552
115 10124 0.3 1 9656 468
116 8777 0.3 1 8354 423
117 7339 0.3 1 6995 344
118 6452 0.3 1 6137 315
119 5422 0.3 1 5101 321
120 4499 0.3 1 4281 218
121 3883 0.3 1 3680 203
122 3313 0.3 1 3156 157
123 2807 0.3 1 2669 138
124 2400 0.3 1 2285 115
125 2022 0.3 1 1907 115
126 1650 0.3 1 1555 95
127 1434 0.3 1 1346 88
128 1129 0.3 1 1055 74
129 969 0.3 1 902 67
130 682 0.3 1 644 38
131 587 0.3 1 552 35
132 523 0.3 1 500 23
133 524 0.3 1 494 30
134 633 0.3 1 604 29
135 337 0.3 1 318 19
136 197 0.3 1 186 11
137 147 0.3 1 133 14
138 92 0.3 1 88 4
139 79 0.3 1 71 8
140 57 0.3 1 52 5
141 37 0.3 1 35 2
142 37 0.3 1 35 2
143 25 0.3 1 20 5
144 20 0.3 1 20
145 26 0.3 1 25 1
146 24 0.3 1 23 1
147 85 0.3 1 85
148 61 0.3 1 59 2
149 39 0.3 1 36 3
150 325 0.3 1 310 15
RUN STATISTICS FOR INPUT FILE: /Users/strigg/Desktop/Salmo_Calig/16C_26psu_1_S13_L001_R2_001.fastq.gz
=============================================
22159935 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)
Validate paired-end files 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz and 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz
file_1: 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz, file_2: 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz
>>>>> Now validing the length of the 2 paired-end infiles: 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz and 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz <<<<<
Writing validated paired-end read 1 reads to 16C_26psu_1_S13_L001_R1_001_val_1.fq.gz
Writing validated paired-end read 2 reads to 16C_26psu_1_S13_L001_R2_001_val_2.fq.gz
Total number of sequences analysed: 22159935
Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 42248 (0.19%)
>>> Now running FastQC on the validated data 16C_26psu_1_S13_L001_R1_001_val_1.fq.gz<<<
Specified output directory '/Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC' does not exist
>>> Now running FastQC on the validated data 16C_26psu_1_S13_L001_R2_001_val_2.fq.gz<<<
Specified output directory '/Users/strigg/Desktop/Salmo_Calig/TRIM_5bp/FASTQC' does not exist
Deleting both intermediate output files 16C_26psu_1_S13_L001_R1_001_trimmed.fq.gz and 16C_26psu_1_S13_L001_R2_001_trimmed.fq.gz
====================================================================================================