Purpose

The purpose of this entry is to align RNAseq and EM-seq data to the P. tuahiniensis genome using nf-core pipelines on Klone.

Results

RNAseq

multiqc: https://gannet.fish.washington.edu/metacarcinus/E5/Ptuahiniensis/20250421_RNAseq/multiqc/multiqc_report.html
RNAseq output and counts matrices: https://gannet.fish.washington.edu/metacarcinus/E5/Ptuahiniensis/20250421_RNAseq/
Samples with poor alignment, duplicated reads and overrepresented seqs
- POC-201-TP3 44.9% & 41.6% (unmapped: too short; low coverage remaining: 15.8M)
- POC-219-TP3 18.1% & 17.1% (unmapped: too short; low coverage remaining: 8.34M)
- POC-52-TP1 17.1% & 15.9% (unmapped: too short; low coverage: 5.4M)
- POC-255-TP3 51.6% & 47.4% (overrepresented seqs)
Samples with GC bias
- all samples above and POC-42-TP4 (low coverage: 21.15M)

EM-seq

multiqc: https://gannet.fish.washington.edu/metacarcinus/E5/Ptuahiniensis/20250422_methylseq/multiqc/bismark/multiqc_report.html
Methylseq output and counts matrices: https://gannet.fish.washington.edu/metacarcinus/E5/Ptuahiniensis/20250422_methylseq/

Methods

Copy genome files to Klone

# show path
pwd
/gscratch/srlab/strigg/GENOMES

# copy genome
wget https://owl.fish.washington.edu/halfshell/genomic-databank/Pocillopora_meandrina_HIv1.assembly.fasta

# copy gtf file
wget https://github.com/urol-e5/timeseries_molecular/raw/d5f546705e3df40558eeaa5c18b122c79d2f4453/F-Ptua/data/Pocillopora_meandrina_HIv1.genes-validated.gtf

# copy gff file
wget https://github.com/urol-e5/timeseries_molecular/raw/d5f546705e3df40558eeaa5c18b122c79d2f4453/F-Ptua/data/Pocillopora_meandrina_HIv1.genes-validated.gff3

Note: I had to remove the ‘3’ from the file extension of the gff3 file and gzip it to run the nf-core RNAseq pipeline.
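
The rename-and-compress step in the note can be sketched as below. This is a minimal sketch, not the exact commands that were run: it assumes the compression was done with gzip (the RNAseq command is given a `.gff.gz` file), and the function name `prep_gff` is hypothetical.

```shell
# Hypothetical helper: copy a .gff3 file to a .gff name and gzip the copy.
# Assumption: gzip was the compression used, since the pipeline is given a .gff.gz file.
prep_gff() {
  local gff3="$1"
  local gff="${gff3%.gff3}.gff"   # strip the trailing "3" from the extension
  cp "$gff3" "$gff"               # leave the original .gff3 untouched
  gzip -f "$gff"                  # writes ${gff}.gz and removes the uncompressed copy
  echo "${gff}.gz"                # print the path of the compressed file
}
```

It would be called as `prep_gff /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.genes-validated.gff3`, which leaves the `.gff3` in place and prints the path of the new `.gff.gz`.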

Copy RNAseq data to Klone

# open screen session (reopened existing session)
screen -r RNAseq

# start interactive node
salloc -A srlab -p cpu-g2-mem2x -N 1 -c 1 --mem=16GB --time=16:00:00

# copy data
rsync --progress --verbose --archive shellytrigg@gannet.fish.washington.edu:/volume2/web/gitrepos/urol-e5/timeseries_molecular/F-Ptua/output/01.00-F-Ptua-RNAseq-trimming-fastp-FastQC-MultiQC/*.gz /gscratch/scrubbed/strigg/analyses/20250421_RNAseq

Copy WGBS data to Klone

# open screen session
screen -r methylseq

# start interactive node
salloc -A srlab -p cpu-g2-mem2x -N 1 -c 1 --mem=16GB --time=16:00:00

# copy data 
rsync --progress --verbose --archive shellytrigg@gannet.fish.washington.edu:/volume2/web/gitrepos/urol-e5/timeseries_molecular/F-Ptua/output/01.00-F-Ptua-WGBS-trimming-fastp-FastQC-MultiQC/*.gz /gscratch/scrubbed/strigg/analyses/20250421_methylseq
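
After either transfer, the copied files could be checked for truncation before starting a pipeline. The check below was not part of the original run; it is a minimal sketch, and the function name `check_gz` is hypothetical.

```shell
# Hypothetical helper: test every .gz file in a directory with `gzip -t`
# and report any that fail. Returns non-zero if at least one file is corrupt.
check_gz() {
  local dir="$1"
  local bad=0
  local f
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    if ! gzip -t "$f" 2>/dev/null; then
      echo "corrupt: $f"
      bad=1
    fi
  done
  return "$bad"
}
```

For example, `check_gz /gscratch/scrubbed/strigg/analyses/20250421_methylseq` prints nothing and returns 0 when every file passes.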

Run RNAseq pipeline

# open screen session 
screen -r RNAseq

# start interactive node
salloc -A srlab -p cpu-g2-mem2x -N 1 -c 1 --mem=16GB --time=24:00:00

# activate conda environment
mamba activate nextflow

# run pipeline
nextflow run nf-core/rnaseq -resume \
-c /gscratch/srlab/nextflow/uw_hyak_srlab.config \
--input /gscratch/scrubbed/strigg/analyses/20250421_RNAseq/samplesheet.csv \
--outdir /gscratch/scrubbed/strigg/analyses/20250421_RNAseq \
--gtf /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.genes-validated.gtf \
--gff /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.genes-validated.gff.gz \
--fasta /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.assembly.fasta \
--skip_trimming \
--aligner star_salmon \
--skip_pseudo_alignment \
--multiqc_title Pmeandrina_RNAseq \
--deseq2_vst
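
The samplesheet passed to --input is not shown in this entry. A minimal sketch of how one could be generated for paired-end reads follows. The column names are the ones nf-core/rnaseq expects, but the function name `make_samplesheet`, the read-file suffixes, and the choice of `auto` strandedness are assumptions rather than what was actually used.

```shell
# Hypothetical helper: write an nf-core/rnaseq samplesheet for paired-end reads.
# Arguments: directory of fastq files, R1 suffix, R2 suffix.
# Assumption: the suffixes are placeholders and must match the real file names.
make_samplesheet() {
  local dir="$1"
  local r1_suffix="$2"
  local r2_suffix="$3"
  local r1 sample
  echo "sample,fastq_1,fastq_2,strandedness"
  for r1 in "$dir"/*"$r1_suffix"; do
    [ -e "$r1" ] || continue                     # skip if nothing matched
    sample=$(basename "$r1" "$r1_suffix")        # sample name = file name minus suffix
    echo "${sample},${r1},${dir}/${sample}${r2_suffix},auto"
  done
}
```

Redirecting the output, e.g. `make_samplesheet /gscratch/scrubbed/strigg/analyses/20250421_RNAseq _R1.fq.gz _R2.fq.gz > samplesheet.csv`, would produce one row per R1 file found. The sketch does not check that the matching R2 file exists.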

Run methylseq pipeline

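I had to rerun the pipeline because my first iteration omitted the backslash after the --em_seq parameter. The command ended at that line, so the remaining options (including --skip_trimming) were never passed and trimming ran by default.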

The second iteration was run in the same screen session and interactive node started for the first iteration.

nextflow run nf-core/methylseq \
-c /gscratch/srlab/strigg/bin/uw_hyak_srlab.config \
--input /gscratch/scrubbed/strigg/analyses/20250422_methylseq/samplesheet.csv \
--outdir /gscratch/scrubbed/strigg/analyses/20250422_methylseq \
--fasta /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.assembly.fasta \
--em_seq \
-resume \
-with-report nf_report.html \
-with-trace \
-with-timeline nf_timeline.html \
--skip_trimming \
--nomeseq 

First iteration (started on 2025-04-21)

# open screen session 
screen -r methylseq

# start interactive node
salloc -A srlab -p cpu-g2-mem2x -N 1 -c 1 --mem=16GB --time=72:00:00

# activate conda environment
mamba activate nextflow

# run methylseq pipeline

nextflow run nf-core/methylseq \
-c /gscratch/srlab/strigg/bin/uw_hyak_srlab.config \
--input /gscratch/scrubbed/strigg/analyses/20250421_methylseq/samplesheet.csv \
--outdir /gscratch/scrubbed/strigg/analyses/20250421_methylseq \
--fasta /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.assembly.fasta \
--em_seq 
-resume \
-with-report nf_report.html \
-with-trace \
-with-timeline nf_timeline.html \
--skip_trimming \
--nomeseq
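
The missing backslash after --em_seq in the first-iteration command is easy to miss by eye. A minimal sketch of a check that could be run on a saved command file before launching is shown below. The function name `check_continuations` is hypothetical, and the rule it applies (a line that does not end in a backslash but is followed by a line starting with `-`) is only a heuristic.

```shell
# Hypothetical check: flag lines that do not end in a backslash but are
# followed by a line starting with "-" (a likely orphaned option).
# Trailing whitespace after a backslash is flagged too, since it also breaks continuation.
check_continuations() {
  awk '
    NR > 1 && $0 ~ /^[[:space:]]*-/ && prev !~ /\\$/ {
      printf "line %d: missing trailing backslash: %s\n", NR - 1, prev
      bad = 1
    }
    { prev = $0 }
    END { exit bad + 0 }
  ' "$1"
}
```

Run as `check_continuations run_methylseq.sh` (a hypothetical file name), it prints each suspect line with its line number and returns non-zero if any were found.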