Insecticide-susceptible individual from a culture maintained at Bayer since 2007, originally collected from an unknown host plant (Andermatt, Switzerland).
Next Generation Sequencing
i) Illumina 10x Genomics 150 bp paired-end data: 431,917,580 reads and 64,787,637,000 bp.
ii) PacBio CLR data: mean read length 22,626 bp, total reads 938,328, read length N50 35,348 bp, and total bases 21,230,928,495. DNA (2000 ng) was extracted using DNAzol at Rothamsted.
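The read statistics above can be cross-checked with a few lines of Python. This is a minimal sketch, not the tool that produced the reported numbers: the function name `read_length_n50` and the toy read lengths are illustrative assumptions, and only the read and base totals are taken from the figures above.

```python
def read_length_n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no read lengths supplied")


# Toy read lengths (hypothetical, not the real PacBio reads).
toy_lengths = [2, 3, 4, 5, 6]
print(read_length_n50(toy_lengths))  # 5

# Consistency checks on the totals reported above.
illumina_reads = 431_917_580
illumina_bases = 64_787_637_000
print(illumina_bases / illumina_reads)  # 150.0, the stated read length

pacbio_reads = 938_328
pacbio_bases = 21_230_928_495
print(round(pacbio_bases / pacbio_reads))  # 22626, the stated mean read length
```

The N50 (35,348 bp) being well above the mean (22,626 bp) is expected, because N50 weights reads by the bases they contribute rather than counting each read equally.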
DNA from a single individual was used for both PacBio CLR and 10x Genomics sequencing (Georgia Genomics). The PacBio CLR reads were assembled with Falcon, and the 10x Genomics reads were used for error correction (Pilon). Haplotigs were removed (Redundans). Unmapped reads were mapped back to the original assembly to check for missing sequence, and any such sequence was incorporated into the final assembly. Manual curation was used to bring the assembly together and to check for misassemblies.
Public RNA-seq data, PRJNA171128 (egg, nymph, pupa and adult tissues) and PRJNA188757 (eggs, 1st to 5th instar larvae, pupae, and male and female adults), were assembled into a transcriptome (BUSCO: C:97.0%[S:59.4%,D:37.6%],F:1.3%,M:1.7%) and used in the MAKER2 annotation pipeline with trained Augustus and GeneMark gene predictors. PASA was used to update the gene models by adding UTRs, correcting existing models and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
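The BUSCO summary string quoted above can be parsed and sanity-checked as follows. This is a small sketch under the assumption that the one-line notation is read as percentages for Complete (C), Single-copy (S), Duplicated (D), Fragmented (F) and Missing (M); the helper name `parse_busco` is hypothetical and not part of BUSCO itself.

```python
import re


def parse_busco(summary):
    """Parse a BUSCO one-line summary into a dict of percentages."""
    pattern = r"([CSDFM]):(\d+(?:\.\d+)?)%"
    return {key: float(value) for key, value in re.findall(pattern, summary)}


scores = parse_busco("C:97.0%[S:59.4%,D:37.6%],F:1.3%,M:1.7%")

# Complete should equal single-copy plus duplicated, and
# complete + fragmented + missing should account for all BUSCOs.
assert abs(scores["S"] + scores["D"] - scores["C"]) < 0.05
assert abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 0.05
print(scores)
```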
A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to identify loci of interest on the genome, e.g. P450 Pfam domains. Using this information, loci of interest, including UDP, P450, ABC and IRAC gene models, were found and curated using mapped RNA-seq data.
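The six-frame translation step might look like the sketch below. It covers only the translation, assuming the standard genetic code; the HMMER search against Pfam profiles and the mapping of hits back to genome coordinates are not shown, and the function names are illustrative rather than taken from the pipeline actually used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT symbols such as N are kept."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return protein translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# Toy sequence (hypothetical, not from the genome).
frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
```

Each translated frame could then be written to a protein FASTA file and searched with HMMER, with the frame label retained so that domain hits can be placed back on the correct strand and offset.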