Example gene annotation workflow
• Advanced Repeat library pipeline
• Genome-guided transcriptome assembly using Trinity and StringTie, combined using Evigene
• MAKER gene prediction using trained AUGUSTUS and GeneMark, with the Evigene transcriptome as evidence
• MAKER evidence transfer using the Evigene transcriptome
• Evigene picks the best models from both the MAKER gene prediction and the transferred set
BUSCO: C:94.9%[S:87.3%,D:7.6%],F:0.9%,M:4.2% = ~30,000 gene models
• Additional steps: Gffcompare to pull in novel transcripts, then manual curation to select models and remove excess gene predictions that lack evidence.
BUSCO: C:96.2%[S:88.4%,D:7.8%],F:0.7%,M:3.1% = ~13,500 gene models
• PASA to correct models and add UTRs/isoforms
BUSCO: C:96.4%[S:85.7%,D:10.7%],F:0.5%,M:3.1% = ~13,400 gene models
i) Manual model selection followed, to split incorrect models and, where possible, curate any obviously incorrect ones.
ii) Loci were identified using a genomic Pfam track, followed by manual curation of gene families of interest: P450, UDP, ABC, IRAC.
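The BUSCO summaries quoted in this workflow can be compared across stages with a short script. The following is a minimal Python sketch, not part of the original pipeline. It assumes each BUSCO line reports the state after the step listed directly above it; the stage labels, function names and rounding tolerance are illustrative choices. It parses the summary notation and checks that S + D matches C and that C + F + M sums to 100%.

```python
import re

# Pattern for the BUSCO short-summary notation quoted in the workflow,
# e.g. "C:94.9%[S:87.3%,D:7.6%],F:0.9%,M:4.2%".
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%"
)


def parse_busco(summary):
    """Return the C/S/D/F/M percentages of a BUSCO summary as floats."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def is_consistent(scores, tol=0.15):
    """Check S + D == C and C + F + M == 100, allowing for rounding.

    The tolerance is an assumption: each value is rounded to one decimal.
    """
    complete_ok = abs(scores["S"] + scores["D"] - scores["C"]) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return complete_ok and total_ok


# Stage labels are illustrative; the summaries are the ones quoted above.
stages = {
    "Evigene best-model selection": "C:94.9%[S:87.3%,D:7.6%],F:0.9%,M:4.2%",
    "Gffcompare + manual curation": "C:96.2%[S:88.4%,D:7.8%],F:0.7%,M:3.1%",
    "PASA": "C:96.4%[S:85.7%,D:10.7%],F:0.5%,M:3.1%",
}

for name, summary in stages.items():
    scores = parse_busco(summary)
    print(f"{name}: complete={scores['C']}% duplicated={scores['D']}% "
          f"missing={scores['M']}% consistent={is_consistent(scores)}")
```

Read this way, completeness rises from 94.9% to 96.4% while the model count falls from ~30K to ~13,400, and the duplicated fraction increases at the PASA step (7.8% to 10.7%).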
Example genome assembly workflow for 20x coverage
• Hifiasm assembly (correcting the reads with DeepConsensus beforehand is recommended)
• LASTZ alignments in Geneious for limited further manual removal of redundancy and merging of overlapping contigs
• Hi-C scaffolding with Juicer and 3D-DNA (switch off breaking with "-r 0", as it is excessive)
• Check the Juicer Hi-C maps and correct manually where needed
• Map Hi-C reads and error-correct using homozygous SNPs/indels
• Map the unmapped Hi-C reads back to the original assembly to check whether any sequence has been lost, and reinsert it if so
• Check LASTZ alignments for duplication errors and artefacts
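To show where the "-r 0" setting from the scaffolding step goes, here is a minimal Python sketch that builds a 3D-DNA command line. It assumes the standard `run-asm-pipeline.sh` entry point, which takes the draft FASTA and the Juicer `merged_nodups.txt` file as positional arguments; the file names and the helper function are hypothetical placeholders.

```python
import shlex


def three_d_dna_command(draft_fasta, merged_nodups, rounds=0):
    """Build the argument list for 3D-DNA's run-asm-pipeline.sh.

    rounds=0 maps to "-r 0", i.e. no misjoin-correction (breaking) rounds.
    """
    return ["run-asm-pipeline.sh", "-r", str(rounds), draft_fasta, merged_nodups]


# Hypothetical input names, for illustration only.
cmd = three_d_dna_command("draft.fasta", "merged_nodups.txt")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once the script is on the PATH.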
Note: With 5x coverage, the difference would be to produce separate assemblies using Flye and Canu, combine them with quickmerge, and apply additional manual curation in Geneious to try to merge contigs. The result will be limited without 20x coverage.
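The choice between the two routes depends on read coverage, which is total read bases divided by genome size. The following is a minimal Python sketch of that arithmetic. The genome size and read totals are made-up illustrative numbers, and treating everything below 20x as the low-coverage route is an assumption, since the note above only contrasts 20x with 5x.

```python
def estimate_coverage(total_read_bases, genome_size):
    """Coverage (x) = total sequenced bases / estimated genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def suggest_route(coverage, threshold=20.0):
    """Pick a workflow label; the cut-off below 20x is an assumption."""
    if coverage >= threshold:
        return "hifiasm"
    return "multi-assembler"  # Flye + Canu, merged with quickmerge


# Hypothetical 600 Mb genome with two different sequencing yields.
genome_size = 600_000_000
for total_bases in (12_000_000_000, 3_000_000_000):
    cov = estimate_coverage(total_bases, genome_size)
    print(f"{cov:.1f}x -> {suggest_route(cov)}")
```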