Collected on 01.02.1961; UK; England (Rothamsted Experimental Station).
Next Generation Sequencing
i) Illumina Hi-C sequencing 150 bp paired end data:
459,724,380 reads and 79x coverage.
ii) Illumina RNA-seq sequencing 150 bp paired end data:
iii) PacBio HiFi data: mean read length 16,612 bp, 1,588,895 total reads, read length N50 17,008 bp, and 26,395,523,915 total bases. DNA was extracted at the University of Delaware using the MagAttract kit (2000 ng gDNA).
iv) PacBio Iso-Seq data: 2 SMRT cells on a Sequel II.
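The read statistics quoted for the HiFi data (total reads, total bases, mean read length, N50) can be recomputed from a list of read lengths. The following is a minimal sketch only: the function name `read_length_stats` and the toy input are illustrative assumptions, not the tool used to produce the figures above.

```python
def read_length_stats(lengths):
    """Return (total_reads, total_bases, mean_length, n50) for read lengths in bp."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    ordered = sorted(lengths, reverse=True)
    total_bases = sum(ordered)
    half = total_bases / 2
    running = 0
    n50 = ordered[-1]
    # N50: length of the read at which the cumulative sum of lengths,
    # taken from longest to shortest, first reaches half of all bases.
    for length in ordered:
        running += length
        if running >= half:
            n50 = length
            break
    return len(ordered), total_bases, total_bases / len(ordered), n50


# Toy example (hypothetical lengths, not real data).
toy_stats = read_length_stats([2, 3, 4, 5, 6])

# Sanity check on the figures reported above: total bases / total reads
# should land within 1 bp of the stated mean read length of 16,612.
reported_mean = 26_395_523_915 / 1_588_895
```

Dividing the reported total bases by the reported total reads gives roughly 16,612.5 bp, which agrees with the stated mean read length.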
A single non-sexed individual was used for PacBio HiFi sequencing (University of Delaware, USA) and multiple individuals for Illumina Hi-C sequencing (Arima Genomics, USA). Hifiasm was used to assemble the PacBio HiFi reads, followed by Juicer and 3D-DNA with the Hi-C data for chromosome-level assembly. Haplotigs were removed with purge_haplotigs. Manual curation was performed to bring the assembly together and check for misassemblies. Unmapped reads were mapped back to the original assembly to check for missing sequence, which was incorporated into the final assembly. Error correction was done with the Hi-C data using freebayes.
A PGI Iso-Seq transcriptome (2 SMRT cells) was assembled (Insecta BUSCO: C:72.6%[S:20.3%,D:52.3%],F:1.0%,M:26.4%; Fungi BUSCO: C:52.9%[S:15.4%,D:37.5%],F:2.9%,M:44.2%), along with a PGI RNA-seq transcriptome (BUSCO: C:97.1%[S:95.8%,D:1.3%],F:0.4%,M:2.5%). Both were used in the Maker2 annotation pipeline with trained Augustus and GeneMark gene predictors. PASA was used to update the gene models by adding UTRs, correcting existing models, and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
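BUSCO summary strings of the form quoted above (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing) can be parsed and checked for internal consistency. This is a minimal sketch; the names `BUSCO_PATTERN` and `parse_busco` are illustrative assumptions and not part of BUSCO itself.

```python
import re

# Matches the short BUSCO summary notation, e.g.
# "C:97.1%[S:95.8%,D:1.3%],F:0.4%,M:2.5%"
BUSCO_PATTERN = re.compile(
    r"C:(?P<complete>[\d.]+)%\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%"
)


def parse_busco(summary):
    """Parse a BUSCO summary string into a dict of percentages (floats)."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


# The three summaries quoted in the text above.
summaries = {
    "isoseq_insecta": "C:72.6%[S:20.3%,D:52.3%],F:1.0%,M:26.4%",
    "isoseq_fungi": "C:52.9%[S:15.4%,D:37.5%],F:2.9%,M:44.2%",
    "rnaseq": "C:97.1%[S:95.8%,D:1.3%],F:0.4%,M:2.5%",
}
parsed = {name: parse_busco(text) for name, text in summaries.items()}

for name, scores in parsed.items():
    # Complete should equal single-copy plus duplicated, and
    # complete + fragmented + missing should total 100%, within rounding.
    assert abs(scores["single"] + scores["duplicated"] - scores["complete"]) < 0.05
    total = scores["complete"] + scores["fragmented"] + scores["missing"]
    assert abs(total - 100.0) < 0.15
```

All three quoted summaries pass both checks, so the percentages are internally consistent.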
A Pfam genomic track was created by translating the genome in six reading frames and using HMMER to identify loci of interest, e.g. P450 Pfam domains, on the genome. Using this information, loci of interest including UDP, P450, ABC and IRAC gene models were found and curated using mapped RNA-seq data and the Maker gene annotation.
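The six-frame conversion step can be illustrated as follows. This is a sketch under stated assumptions: it uses the standard genetic code, the function names are hypothetical, and it is not the script used to build the track described above. In practice the translated frames would be written to a protein FASTA file and searched against Pfam profiles with HMMER.

```python
# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string; non-ACGT symbols are kept."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# Toy input (hypothetical sequence, not from the assembly).
frames = six_frame_translations("ATGGCCTAA")
```

Stop codons are kept as `*` here so that hits can be mapped back to genomic coordinates; a real track would also need to record the frame offset and strand for each hit.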
An endosymbiont/parasite was assembled as an unknown fungus, potentially Tubulinosema (10,826,776 bp).