remapping protein sequences to 2019 Uniprot DB
reference this jupyter notebook: 20190130_Cg_Giga_cont_AA.fa_BLASTP_uniprot_swprot2019.ipynb
- Rebuilt the BLAST index from 2019 Uniprot DB
- it lives here http://gannet.fish.washington.edu/metacarcinus/Cgigas
- remapped protein sequences from Steven's file http://gannet.fish.washington.edu/halfshell/bu-git-repos/nb-2017/C_gigas/data/Cg_Giga_cont_AA.fa to the updated Uniprot DB.
- reformatted BLAST output to remove pipes in protein names and contain the following fields:
- protein_ID
- Entry
- Entry_name
- perc_ident_match
- align_len
- num_mismatch
- num_gaps
- querStart
- querEnd
- subjStart
- subjEnd
- evalue
- bitscore
- Entry.1
- Entry_name.1
- Protein_names
- Gene_names
- Organism
- Protein_length
- Pathway
- GO_bp
- GO
- GO_IDs
- Protein_fams
- first 10 lines of the data file can be previewed here. The complete data file can be found here: http://gannet.fish.washington.edu/metacarcinus/Cgigas/all_giga-uniprot-blastP-out.nopipe.annotations.tab
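A rough sketch of the pipe-removal step, in Python, for anyone redoing it. This is not the code from the notebook. It assumes the BLAST output is the standard 12-column tabular format and that the subject ID looks like `sp|ACCESSION|ENTRY_NAME`; the function name `split_blast_line` and the example IDs are made up for illustration.

```python
# First 13 output columns, named as in the field list above.
BLAST_COLUMNS = [
    "protein_ID", "Entry", "Entry_name", "perc_ident_match", "align_len",
    "num_mismatch", "num_gaps", "querStart", "querEnd", "subjStart",
    "subjEnd", "evalue", "bitscore",
]


def split_blast_line(line):
    """Split one tab-delimited BLAST hit and break the piped subject ID apart."""
    fields = line.rstrip("\n").split("\t")
    query_id, subject_id, rest = fields[0], fields[1], fields[2:]
    # Assumed subject ID layout: db|accession|entry_name
    _db, entry, entry_name = subject_id.split("|")
    return dict(zip(BLAST_COLUMNS, [query_id, entry, entry_name] + rest))


# Hypothetical hit line, for illustration only (not real data).
example = "query_protein_1\tsp|P12345|EXAMPLE_HUMAN\t45.2\t100\t50\t2\t1\t100\t5\t104\t1e-20\t85.1"
row = split_blast_line(example)
```

The remaining columns (Entry.1 through Protein_fams) would come from joining each row to a Uniprot annotation table on `Entry`.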
fold change analysis and p-value
- made fold change comparisons of every time point and temperature against time 0, and calculated p-values using a chi-square test of proportions. Did not do anything with zero values.
- analysis here and used this R project
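A minimal Python sketch of the fold change and proportions test described above; the actual analysis is in the R project. It assumes each protein has a count and a sample total at each time point (the notes do not say what the counts are), it uses a plain Pearson chi-square with 1 degree of freedom and no continuity correction, and it returns None for fold change when the time 0 count is zero rather than imputing a value. Function names are hypothetical.

```python
import math


def fold_change(count, total, count0, total0):
    """Proportion at a time point divided by the proportion at time 0."""
    if count0 == 0:
        return None  # undefined; zero values are left unhandled
    return (count / total) / (count0 / total0)


def two_proportion_chisq_pvalue(count, total, count0, total0):
    """Pearson chi-square p-value (1 df, no continuity correction) for two proportions."""
    observed = [[count, total - count], [count0, total0 - count0]]
    row_sums = [total, total0]
    grand = total + total0
    col_sums = [count + count0, grand - count - count0]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand
            if expected == 0:
                return None  # test is undefined when an expected cell is zero
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square upper tail is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Made-up counts for illustration: 20 of 1000 at a time point vs 10 of 1000 at time 0.
fc = fold_change(20, 1000, 10, 1000)
p = two_proportion_chisq_pvalue(20, 1000, 10, 1000)
```

A p-value from this sketch may differ from one that applies a continuity correction, so it should not be expected to match the R output exactly.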