Chironomus riparius

Profile

Species: Chironomus riparius (Meigen) a.k.a. C. thummi - Harlequin fly

Order: Diptera

Family: Chironomidae

Genus: Chironomus

C. riparius, also known as C. thummi and commonly known as the harlequin fly, is a species of non-biting midge. The larvae are known by the common name bloodworm because of their red colouration. The species is common in both North America and Europe.

It has been used extensively as a model for genome structure analysis in insects and is also used in toxicology tests and functional developmental genetic studies. Both the adult and larval forms have been implicated as disease vectors but are also an important part of freshwater food chains.

Chironomus riparius is easy to maintain in a laboratory environment.

Source: https://en.wikipedia.org/wiki/

Sample collection

The culture was originally sourced in Germany and procured commercially from Innovative Environmental Services (IES) Ltd, Switzerland.

Next Generation Sequencing

i) Illumina Hi-C sequencing 150 bp paired end data:

925,594,532 reads and 723x coverage.
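
As a rough consistency check, the quoted coverage can be reproduced from the read count. This is only a sketch under two assumptions: that each of the 925,594,532 reads is counted as an individual 150 bp read, and that the final assembly size (191,837,449 bp) stands in for the genome size. The function name is illustrative and not part of any pipeline used here.

```python
def sequencing_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


# Figures taken from this document; treating the read count as
# individual reads (not read pairs) is an assumption.
hic_coverage = sequencing_coverage(925_594_532, 150, 191_837_449)
print(f"Hi-C coverage: {hic_coverage:.1f}x")
```

Under these assumptions the result falls between 723x and 724x, in line with the figure quoted above.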

ii) PacBio HiFi data: 615,477 reads, mean read length 14,960 bp, read length N50 16,105 bp, total bases 9,207,869,729. The DNA (450 ng) was extracted from a single individual at Rothamsted Research using the MagAttract method.
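
For readers unfamiliar with these summary statistics, the sketch below shows how total bases, mean read length and read length N50 are conventionally computed from a list of read lengths. It is a minimal illustration with made-up toy lengths, not the tool used to produce the figures above, and the function name is hypothetical.

```python
def read_length_stats(lengths):
    """Return (total_bases, mean_length, n50) for a list of read lengths."""
    if not lengths:
        raise ValueError("no reads supplied")
    total = sum(lengths)
    mean = total / len(lengths)
    # N50: walk from the longest read down and stop at the read length
    # where the running sum first reaches half of all bases.
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return total, mean, n50


toy_lengths = [2, 3, 4, 5, 6]  # made-up values for illustration only
total, mean, n50 = read_length_stats(toy_lengths)
print(total, mean, n50)
```

Applied to the reported totals, 9,207,869,729 bases over 615,477 reads gives a mean of about 14,960 bp, consistent with the quoted mean read length.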

Methods

DNA was extracted using a MagAttract method modified from the published mosquito protocol (https://www.biorxiv.org/content/10.1101/499954v1). The volume was 30 µl at a concentration of 15 ng/µl by Qubit analysis at Rothamsted (450 ng gDNA in total).

DNA from a single female individual was used for both PacBio HiFi and Illumina Hi-C sequencing (University College London and Arima Genomics, respectively). Hifiasm was used to assemble the PacBio HiFi reads, and Juicer and 3d-dna were used with the Hi-C data for chromosome-level assembly. Haplotigs were removed with purge_haplotigs. Unmapped reads were mapped back to the original assembly to check for missing sequence, and any such sequence was incorporated into the final assembly. Manual curation was done to bring the assembly together and check for misassemblies.

Public RNA-seq data were assembled (BUSCO: C:94.7%[S:53.7%,D:41.0%],F:0.4%,M:4.9%) and used in the Maker2 annotation pipeline with trained Augustus and GeneMark gene predictors. Data: PRJEB15223 (larvae); PRJNA166085 (egg ropes, all four larval stages, pupae, and male and female adults, plus larvae exposed to different concentrations of several model toxicants); PRJNA229141 (anterior and posterior early embryo); PRJNA675286 (larvae, TMO/LCO exposure). PASA was used to update the gene models by adding UTRs, correcting existing models and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.

A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to search the translations for Pfam domains. Using this information, loci of interest including UDP-glycosyltransferase (UGT), P450, ABC transporter and IRAC gene models were found and curated using mapped RNA-seq and a Maker gene annotation.
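
The six-frame translation step can be sketched as follows. This is a minimal illustration assuming the standard genetic code and a plain A/C/G/T sequence; it is not the script used for this genome, the function names are hypothetical, and the HMMER search against Pfam is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")  # toy sequence, not real data
for name, protein in sorted(frames.items()):
    print(name, protein)
```

In practice each translated frame would be written out as a protein sequence with its genomic coordinates retained, so that domain hits can be mapped back onto the genome as a track.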

Two endosymbionts were assembled: an unknown Enterobacter (1,661,850 bp) and Wolbachia (559,667 bp).


Final Results

A complete, annotated four-chromosome assembly was deposited at NCBI under accession PRJEB47883 (including raw data).

BUSCO (Insecta odb10): C:98.9%,F:0.3%,M:0.8%

15,212 gene models - BUSCO C:96.9%[S:88.9%,D:8.0%],F:0.1%,M:3.0%
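
The BUSCO summary strings above follow the notation C (complete), S (single-copy), D (duplicated), F (fragmented) and M (missing). A small sketch for parsing such a string and checking that the percentages add up is shown below; the function names and the rounding tolerance are assumptions made for illustration.

```python
import re


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO short summary string into {category: percent}."""
    pairs = re.findall(r"([CSDFM]):(\d+(?:\.\d+)?)%", summary)
    return {key: float(value) for key, value in pairs}


def is_consistent(scores: dict, tol: float = 0.11) -> bool:
    """Check C + F + M is about 100 and, if present, S + D is about C."""
    ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    if "S" in scores and "D" in scores:
        ok = ok and abs(scores["S"] + scores["D"] - scores["C"]) <= tol
    return ok


gene_models = parse_busco("C:96.9%[S:88.9%,D:8.0%],F:0.1%,M:3.0%")
genome = parse_busco("C:98.9%,F:0.3%,M:0.8%")
print(gene_models, is_consistent(gene_models))
print(genome, is_consistent(genome))
```

Both summaries reported here pass this check.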

Scaffold No. (incl Mt): 16

N50 (bp): 58,906,861

N bases (bp): 32,000

Repeat: 23.83%

Total size (bp) (chr no.): 191,837,449 (4)

Curated: 151x P450, 86x ABC transporter, 60x UGT, and 108 of the 130 defined IRAC gene models.


Other files

These are files that were not submitted to NCBI but might be useful.

Genomic PFAM annotation track

Non-coding RNA annotation track

Repeat annotation track

Repeat library