Syngenta received this strain from Aventis in April 2000.
Next Generation Sequencing
i) Illumina Hi-C sequencing, 150 bp paired-end data:
234,532,466 reads and 45x coverage.
ii) Illumina 10X sequencing, 150 bp paired-end data:
328,141,794 reads and 63x coverage.
iii) PacBio CLR data: mean read length 11,591 bp, total reads 11,066,056, read length N50 14,892 bp, and total bases 128,275,638,188. DNA was extracted using DNAzol at Rothamsted Research (2,250 ng gDNA) and BluePippin-purified before library preparation.
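The read statistics above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original pipeline. It assumes that each reported Illumina read count refers to individual 150 bp reads (not read pairs) and that coverage was computed as total bases divided by genome size. The function names are illustrative only.

```python
# Figures copied from the sequencing summary above.
ILLUMINA = {
    "Hi-C": {"reads": 234_532_466, "read_len_bp": 150, "coverage": 45},
    "10X": {"reads": 328_141_794, "read_len_bp": 150, "coverage": 63},
}
PACBIO_TOTAL_READS = 11_066_056
PACBIO_TOTAL_BASES = 128_275_638_188


def implied_genome_size(reads, read_len_bp, coverage):
    """Genome size (bp) implied by coverage = reads * read length / genome size.

    Assumption: `reads` counts individual reads, not read pairs.
    """
    return reads * read_len_bp / coverage


def read_n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no read lengths given")


if __name__ == "__main__":
    for name, d in ILLUMINA.items():
        size = implied_genome_size(d["reads"], d["read_len_bp"], d["coverage"])
        print(f"{name}: implied genome size ~{size / 1e6:.0f} Mb")
    mean_len = PACBIO_TOTAL_BASES / PACBIO_TOTAL_READS
    print(f"PacBio CLR: mean read length ~{mean_len:.0f} bp")
    # Toy lengths for illustration only, not real read data.
    print("N50 of toy lengths:", read_n50([2, 3, 4, 5, 6]))
```

Under these assumptions, both Illumina libraries imply a genome size of roughly 781 to 782 Mb, and the PacBio total bases divided by total reads agrees with the reported mean read length to within 1 bp.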
A single non-sexed individual was used for PacBio CLR (University of Maryland, USA) and 10X Genomics Illumina sequencing, and multiple individuals were used for Hi-C Illumina sequencing (Arima Genomics, USA). Falcon was used to assemble the PacBio CLR reads, followed by Juicer and then 3d-dna with the Hi-C data for chromosome-level assembly. Haplotigs were removed with Redundans. The assembly was manually curated to bring the genome together and check for misassemblies. Unmapped reads were mapped back to the original assembly to check for missing sequence, which was then incorporated into the final assembly. Error correction was done with freebayes using the Illumina 10X library data.
RNA-seq data from PRJNA89435 (midgut), PRJNA183730 (parasitization by a braconid wasp, Cotesia chilonis), PRJNA375715 (6th instar larvae and prepupae), and PRJNA551080 (adult) were assembled (BUSCO: C:99.5%[S:61.3%,D:38.2%],F:0.1%,M:0.4%) and used in the MAKER2 annotation pipeline with trained Augustus and GeneMark gene predictors. PASA was used to update the gene models by adding UTRs, correcting existing models, and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
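The BUSCO summary string reported for the transcriptome assembly can be parsed and sanity-checked programmatically. The following is a minimal sketch. The function names and the tolerance value are illustrative assumptions, not part of BUSCO itself.

```python
import re

# Matches the compact BUSCO notation used above, e.g. "C:..%[S:..%,D:..%],F:..%,M:..%".
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%"
)


def parse_busco(summary):
    """Extract Complete, Single-copy, Duplicated, Fragmented and Missing percentages."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def is_consistent(scores, tol=0.15):
    """Check S + D = C and C + F + M = 100, allowing for rounding to one decimal place."""
    complete_ok = abs(scores["S"] + scores["D"] - scores["C"]) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return complete_ok and total_ok


if __name__ == "__main__":
    scores = parse_busco("C:99.5%[S:61.3%,D:38.2%],F:0.1%,M:0.4%")
    print(scores, "consistent:", is_consistent(scores))
```

For the string reported here, single-copy plus duplicated equals the complete score, and complete plus fragmented plus missing sums to 100%.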
A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to identify loci of interest, e.g. P450 Pfam domains, on the genome. Using this information, loci of interest including UDP, P450, ABC and IRAC gene models were found and curated using mapped RNA-seq data and the MAKER gene annotation.
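The six-frame translation step can be illustrated as follows. This is a minimal sketch only. The record does not state which tool performed the translation, and the sketch assumes the standard genetic code and a plain nucleotide string as input. In a workflow like the one described, the translated frames would then be searched against Pfam profile HMMs with HMMER.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    seq = seq.upper()
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence for illustration only.
    for frame, protein in six_frame_translation("ATGGCC").items():
        print(frame, protein)
```

Stop codons are kept as `*` so that downstream domain hits can be mapped back to genomic coordinates frame by frame.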
An endosymbiont Enterobacter species was assembled (4,762,943 bp).