An insecticide-susceptible strain that has become wingless during long-term culture. The strain has been maintained at Rothamsted since 2012, when it was supplied by Bayer.
Next Generation Sequencing
i) Illumina Hi-C sequencing, 150 bp paired-end data:
824,808,638 reads and 132x coverage.
ii) PacBio CLR data: mean read length 9,519 bp, 26,277,464 total reads, read length N50 13,526 bp, and 250,138,790,988 total bases. DNA was extracted from 6 individuals using the MagAttract kit at Rothamsted Research (2,000 ng gDNA).
iii) Illumina 10X sequencing, 150 bp paired-end data:
417,294,921 reads and 134x coverage.
iv) Nanopore sequencing of 2 flow cells, totalling 7,235,072 reads, 24,277,531,149 bp and 26x coverage.
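As a rough consistency check on the figures above, coverage can be read as total sequenced bases divided by genome size, so a reported base total and coverage together imply an approximate genome size. The sketch below assumes coverage was calculated this way; the function names are illustrative and are not part of any pipeline described here.

```python
def implied_genome_size(total_bases: int, coverage: float) -> float:
    """Genome size (bp) implied by coverage = total_bases / genome_size."""
    return total_bases / coverage


def mean_read_length(total_bases: int, total_reads: int) -> float:
    """Mean read length (bp) implied by the base and read totals."""
    return total_bases / total_reads


# Nanopore figures as reported: 24,277,531,149 bp at 26x coverage.
nanopore_size = implied_genome_size(24_277_531_149, 26)
print(f"Implied genome size: {nanopore_size / 1e6:.0f} Mb")

# PacBio CLR figures as reported: 250,138,790,988 bases in 26,277,464 reads.
pacbio_mean = mean_read_length(250_138_790_988, 26_277_464)
print(f"Implied PacBio mean read length: {pacbio_mean:.0f} bp")
```

Because the reported coverage values are rounded to whole numbers, the implied genome size is only approximate.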
Multiple individuals were used for PacBio CLR sequencing (University of Georgia, USA) and for Hi-C Illumina sequencing (Arima Genomics, USA). Falcon was used to assemble the PacBio CLR reads, followed by Juicer and then 3d-dna, using the Hi-C data, for chromosome-level assembly. Haplotigs were removed with purge_haplotigs. The assembly was manually curated to bring the genome together and to check for misassemblies. Unmapped reads were mapped back to the original assembly to check for missing sequence, which was incorporated into the final assembly. Error correction was performed with the Illumina Hi-C data using freebayes.
Public RNA-seq data were assembled into a transcriptome (BUSCO: C:97.6%[S:59.8%,D:37.8%],F:1.5%,M:0.9%) and used in the MAKER2 annotation pipeline with trained Augustus and GeneMark gene predictors. PASA was used to update the gene models by adding UTRs, correcting existing models and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
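The BUSCO summary string quoted above packs five percentages into one line: complete (C), split into single-copy (S) and duplicated (D), plus fragmented (F) and missing (M). A minimal sketch for unpacking it is shown here; `parse_busco` is a hypothetical helper written for illustration and is not part of BUSCO itself.

```python
import re


def parse_busco(summary: str) -> dict:
    """Return the C/S/D/F/M percentages from a BUSCO short-summary string."""
    pairs = re.findall(r"([CSDFM]):(\d+(?:\.\d+)?)%", summary)
    return {key: float(value) for key, value in pairs}


scores = parse_busco("C:97.6%[S:59.8%,D:37.8%],F:1.5%,M:0.9%")

# Sanity checks: complete = single-copy + duplicated,
# and complete + fragmented + missing should total 100%.
complete_matches = abs(scores["C"] - (scores["S"] + scores["D"])) < 0.05
total_is_100 = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 0.05
print(scores, complete_matches, total_is_100)
```

Both checks hold for the transcriptome figures reported here.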
A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to identify loci of interest, e.g. P450 Pfam domains, on the genome. Using this information, loci of interest including UDP, P450 and ABC were curated.
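The six-frame conversion mentioned above can be sketched in plain Python as follows. This is an illustration only: the tool actually used for the translation is not stated, the function names are hypothetical, and the standard genetic code is assumed. The translated frames would then be searched against Pfam profile HMMs with a HMMER program such as hmmsearch.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from position 0; unknown codons (e.g. with N) become X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Translate both strands at offsets 0, 1 and 2 (frames +1..+3, -1..-3)."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

Stop codons are kept as `*` so that each frame stays aligned with the genomic coordinates, which makes it simpler to map a domain hit back to a locus.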
The endosymbiont Arsenophonus (2,858,500 bp) was also assembled and submitted.