Diabrotica balteata

Profile

Species: Diabrotica balteata (LeConte) – Banded cucumber beetle

Order: Coleoptera

Family: Chrysomelidae

Genus: Diabrotica

D. balteata is a tropical insect, occurring in many areas of North and South America. Until the early 1900s it was limited to southern Arizona and Texas and south through Mexico and Central America. It has since expanded its range throughout the southern United States from North Carolina to southern California, though its intolerance of freezing temperatures probably limits its northward distribution. Within Florida it is most abundant in the organic soils near Lake Okeechobee, though it occurs throughout the state, where it is known as a vegetable pest.

Adults feed on a wide range of plants, but seem to prefer plants from the families Cucurbitaceae, Rosaceae, Leguminosae, and Cruciferae. Vegetable crops damaged include cucumber, squash, beet, bean, pea, sweet potato, okra, corn, lettuce, onion, and various cabbages. Bean and soybean are especially favoured. All parts of the plant are injured: foliage, blossoms, silk, kernels, the plant crown, and roots. Larvae feed only on the roots. The most frequent forms of serious injury are defoliation by adults and root feeding on seedlings by larvae. The species is a known vector of virus diseases in beans, and larval feeding might increase the incidence and severity of Fusarium wilt.

Source: University of Florida ‘Featured Creatures’ https://entnemdept.ufl.edu/creatures

Sample collection

The sample was collected in the USA on 27 July 2011. It was found not to be chemical-resistant.

Next Generation Sequencing

i) Illumina Hi-C sequencing 150 bp paired end data:

340,568,496 reads and 31x coverage.

ii) Illumina 10X sequencing 150 bp paired end data:

468,733,082 reads and 43x coverage.

iii) Illumina RNA-seq sequencing 150 bp paired end data:

44,428,924 reads.

iv) PacBio CLR data: 5,454,563 reads in total, mean read length 18,295 bp, read length N50 32,151 bp, and 99,791,115,120 total bases. DNA was extracted from a 12x multi-individual sample using MagAttract (6000 ng gDNA).

v) PacBio Iso-Seq data, 2x SMRT cells using Sequel II.
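The coverage figures above can be approximated from the read counts. The sketch below is illustrative only: it assumes every counted Illumina read is 150 bp long and uses the final assembly size reported on this page as the genome size, which may not be the estimate the original figures were based on.

```python
# Approximate fold coverage = sequenced bases / genome size.
# ASSUMPTION: the final assembly size stands in for the true genome size.
GENOME_SIZE_BP = 1_607_116_900

# ASSUMPTION: every counted Illumina read contributes the full 150 bp.
READ_LENGTH_BP = 150


def coverage(total_bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Return approximate fold coverage for a number of sequenced bases."""
    return total_bases / genome_size


hic_cov = coverage(340_568_496 * READ_LENGTH_BP)   # Hi-C reads
tenx_cov = coverage(468_733_082 * READ_LENGTH_BP)  # 10X reads
clr_cov = coverage(99_791_115_120)                 # PacBio CLR total bases

for name, value in (("Hi-C", hic_cov), ("10X", tenx_cov), ("PacBio CLR", clr_cov)):
    print(f"{name}: {value:.1f}x")
```

Under these assumptions the Illumina values land close to the 31x and 43x quoted above; small differences would be expected if a different genome size estimate or read trimming was used.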

Methods

Non-sexed multi-individual DNA was used for PacBio CLR sequencing (University of Leiden, Netherlands) and multi-individual DNA for Hi-C Illumina sequencing (Arima Genomics, USA). Falcon was used to assemble the PacBio CLR reads, followed by Juicer and 3D-DNA with the Hi-C data for chromosome-level assembly. Haplotigs were removed with purge_haplotigs. Manual curation was done to bring the genome together and check for misassemblies. Unmapped reads were mapped back to the original assembly to check for missing sequence, which was incorporated into the final assembly. Error correction was done with the Illumina 10X library data using freebayes.

PGI Iso-Seq data (BUSCO: C:91.7%[S:34.5%,D:57.2%],F:0.9%,M:7.4%) was used in the Maker2 annotation pipeline with trained Augustus and GeneMark gene predictors. PASA was used to update the gene models by adding UTRs, correcting existing models and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
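BUSCO summary strings such as the one above follow a fixed notation: C (complete) is the sum of S (single-copy) and D (duplicated), and C, F (fragmented) and M (missing) add up to 100%. A minimal sketch for parsing and sanity-checking such a string follows; the function names and the rounding tolerance are illustrative assumptions, not part of BUSCO itself.

```python
import re

# Matches the full BUSCO short-summary notation, e.g.
# "C:91.7%[S:34.5%,D:57.2%],F:0.9%,M:7.4%"
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%"
)


def parse_busco(summary: str) -> dict[str, float]:
    """Extract the C, S, D, F and M percentages from a BUSCO summary string."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def is_consistent(scores: dict[str, float], tol: float = 0.15) -> bool:
    """Check S + D == C and C + F + M == 100, allowing for rounding (tol is an assumption)."""
    return (
        abs(scores["S"] + scores["D"] - scores["C"]) <= tol
        and abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    )


iso_seq = parse_busco("C:91.7%[S:34.5%,D:57.2%],F:0.9%,M:7.4%")
print(iso_seq, is_consistent(iso_seq))
```

A high D value, as in the Iso-Seq data, is typical of transcript sets that contain several isoforms per gene.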

A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to identify loci of interest, e.g. P450 Pfam domains, on the genome. Using this information, loci of interest including UDP-glycosyltransferase (UGT), P450, ABC transporter and IRAC gene models were found and curated using mapped RNA-seq data and a Maker gene annotation.
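The six-frame translation step can be illustrated with a short, dependency-free sketch. It is not the pipeline used for this genome: the function names are hypothetical, the standard genetic code is assumed, and the resulting protein strings would still need to be searched against Pfam profile HMMs with HMMER as a separate step.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict[str, str]:
    """Translate a sequence in three forward (+1..+3) and three reverse (-1..-3) frames."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# Toy input, not taken from the D. balteata assembly.
frames = six_frame("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

Translating every frame lets domain hits be mapped back to genomic coordinates without relying on existing gene models, which is useful when the models themselves are what is being curated.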

Final Results

A complete, annotated 10-chromosome assembly was deposited at NCBI under accession PRJEB47897 (including raw data).

BUSCO (Insecta odb10): C:95.3%,F:1.3%,M:3.4%

13,810 gene models - BUSCO C:92.8%[S:87.2%,D:5.6%],F:2.9%,M:4.3%

Scaffold No. (incl Mt): 408

N50 (bp): 161,557,723

N bases (bp): 1,774,900

Repeat: 47.40%

Total size (bp) (chr no.): 1,607,116,900 (10)
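For readers unfamiliar with the N50 statistic reported above: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of the calculation follows; the scaffold lengths in the example are made-up toy values, not the lengths from this assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of scaffold (or read) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Hypothetical toy lengths for illustration only.
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))
```

An N50 close to the size of individual chromosomes, as here, indicates that most of the sequence sits in chromosome-scale scaffolds.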

Curated: 130x P450, 0x ABC transporter, 41x UGT, and 115/130 defined IRAC gene models.

Genome files

Genome fasta file