Orius laevigatus (commonly known as a minute pirate bug) were obtained from Bioline AgroSciences. The insects were anaesthetised with CO2 so that they could be sorted from the substrate. Both adults and nymphs were then flash-frozen in liquid N2 and stored at −80 °C. The whole process was completed within 48 h of arrival.
Next Generation Sequencing
i) Illumina genomic sequencing, 150 bp paired-end data: 413,143,574 reads with a total length of 123 Gb (~820x coverage).
ii) PacBio CLR data: 537,651 reads, read length N50 of 11,287 bp, and 6 Gb of total bases (44x coverage). DNA was extracted from ~10,000 individuals.
iii) Illumina RNA sequencing, 150 bp paired-end data: 413,137,378 reads.
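The read statistics above (N50 and fold coverage) follow from simple arithmetic, sketched below in Python. This is an illustrative sketch only, not the software used to produce the reported figures; the function names are hypothetical. Dividing the reported totals by the reported coverage gives the genome size each figure implies (roughly 150 Mb for the Illumina data and roughly 136 Mb for the PacBio data), which serves only as a consistency check on rounded values.

```python
def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: total bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases, coverage):
    """Genome size implied by a reported base total and fold coverage."""
    return total_bases / coverage


if __name__ == "__main__":
    # Reported totals from the list above (rounded values).
    print(implied_genome_size(123e9, 820))  # Illumina genomic data
    print(implied_genome_size(6e9, 44))     # PacBio CLR data
    # Toy read lengths, purely for illustration.
    print(n50([2, 3, 4, 5, 6]))
```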
The raw PacBio long reads were assembled into contigs with the de novo assembler Flye v2.5. Rascaf was then used to improve the Flye genome assembly using the RNA-seq data. Contigs were also produced from the raw PacBio long reads using Canu v1.8 and using FALCON v1.3.0 with FALCON-Unzip, which is recommended for heterozygous/outbred organisms with diploid or higher ploidy and which also includes phased polishing with Arrow.
QuickMerge v0.3 was used to merge the assemblies, with Flye as the reference assembly. BUSCO outputs were compared between the merged assembly and the standalone assemblies to identify genes that had been lost during the merging process. Full-length contigs containing these missing genes were extracted from the standalone assemblies and added to the merged assembly, on the assumption that these contigs would also contain other missed genes (i.e. those not included in BUSCO's list of 1,658 core insect genes). Multiple rounds of Pilon error polishing were performed, using the Illumina short-read data, until no further improvement in BUSCO score was seen.
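The BUSCO comparison described above can be sketched as a set difference between the genes found in a standalone assembly and those found in the merged assembly. The Python below is a minimal illustration, not the procedure actually used. It assumes BUSCO's tab-separated full table with the BUSCO id, status and contig name in the first three columns, and it assumes that "Complete" and "Duplicated" entries count as present. The function names are hypothetical.

```python
import csv
import io

# Assumption: these two status values mean the gene was found intact.
PRESENT = {"Complete", "Duplicated"}


def present_buscos(full_table_text):
    """Map each BUSCO id that was found to the set of contigs carrying it."""
    found = {}
    reader = csv.reader(io.StringIO(full_table_text), delimiter="\t")
    for row in reader:
        # Skip blank lines, comment/header lines, and rows without a contig
        # column (rows for missing genes list only the id and status).
        if not row or row[0].startswith("#") or len(row) < 3:
            continue
        busco_id, status, contig = row[0], row[1], row[2]
        if status in PRESENT:
            found.setdefault(busco_id, set()).add(contig)
    return found


def contigs_to_rescue(merged_table_text, standalone_table_text):
    """Return (lost BUSCO ids, standalone contigs that carry them)."""
    merged = present_buscos(merged_table_text)
    standalone = present_buscos(standalone_table_text)
    lost = set(standalone) - set(merged)
    contigs = set()
    for busco_id in lost:
        contigs |= standalone[busco_id]
    return lost, contigs
```

The returned contig names could then be used to pull full-length sequences from the standalone assembly's FASTA file before adding them to the merged assembly.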
Redundans, which is geared towards highly heterozygous genomes, was used for scaffolding and redundant contig removal. Some redundant regions had to be removed manually, as Redundans does not detect redundancy when only part of a contig is duplicated. The nucmer tool from the MUMmer4 package was used to detect these redundant regions through a whole-genome self-alignment.
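One way to flag partially duplicated contigs from a self-alignment is to merge the aligned intervals on each contig and report the fraction of its length that also aligns to a different contig. The Python below is a hedged sketch of that idea, not the manual procedure used here. It assumes the alignment hits have already been parsed into (query contig, start, end, reference contig) tuples with 1-based inclusive coordinates, and the function names are hypothetical.

```python
def merged_length(intervals):
    """Number of bases covered by a set of 1-based inclusive intervals,
    counting overlapping or adjacent intervals only once."""
    covered = 0
    cur_start = cur_end = None
    # Normalise reverse-strand hits (start > end) before sorting.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered


def duplicated_fraction(hits, contig_lengths):
    """Fraction of each contig that aligns to a *different* contig.

    hits: iterable of (query_contig, q_start, q_end, ref_contig).
    Alignments of a contig to itself are ignored.
    """
    per_contig = {}
    for query, start, end, ref in hits:
        if query == ref:
            continue
        per_contig.setdefault(query, []).append((start, end))
    return {
        contig: merged_length(ivs) / contig_lengths[contig]
        for contig, ivs in per_contig.items()
    }
```

Contigs with an intermediate fraction (neither near 0 nor near 1) are the partially duplicated cases that would need manual inspection.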
A BLAST search against the NCBI Reference Sequence (RefSeq) database, release 93, was performed using the Tera-BLAST algorithm on a TimeLogic DeCypher system (Active Motif Inc., Carlsbad, CA). The results were processed with MEGAN to identify any bacterial or viral sequences, which were then removed manually in Geneious v10.2.6.