Species: Phaedon cochleariae (Fabricius) – Mustard beetle/Watercress beetle

Order: Coleoptera

Family: Chrysomelidae

Genus: Phaedon

P. cochleariae is mainly found in Europe, especially in the UK. However, since its accidental introduction in the early 20th century, it has become established and widespread in the northeastern United States and eastern Canada. Typical host plants include various brassicas, especially watercress (Nasturtium officinale), great yellow-cress (Rorippa amphibia) and garlic mustard (Alliaria petiolata). Away from wetlands it also occurs on black mustard (Brassica nigra) and shepherd's purse (Capsella bursa-pastoris), and it has been recorded from a range of cultivated Brassica crops such as turnip (Brassica rapa) and horse-radish (Armoracia rusticana).

Females chew small cavities into the underside of leaves, into which they lay single eggs. Larvae emerge after a week or so and feed openly on the foliage. Adults sometimes occur in large aggregations, where the foliage may be skeletonised and appear blue with the beetles. Adults are also good flyers, so new habitats may be colonised quickly.

Source: UK Beetles

Sample collection

Collected on 01.02.1961; UK; England (Rothamsted Experimental Station).

Next Generation Sequencing

i) Illumina Hi-C sequencing 150 bp paired end data:

459,724,380 reads and 79x coverage.

ii) Illumina RNA-seq sequencing 150 bp paired end data:

332,250,852 reads.

iii) PacBio HiFi data: 1,588,895 reads, mean read length 16,612 bp, read length N50 17,008 bp and 26,395,523,915 total bases. DNA was extracted at the University of Delaware using the MagAttract kit (2,000 ng gDNA).

iv) PacBio Iso-Seq data from 2 SMRT cells on a Sequel II.
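The read counts above can be related to depth of coverage with simple arithmetic. The sketch below assumes coverage is total sequenced bases divided by the final assembly size (871,249,447 bp, from Final Results); the page does not state how the 79x figure was derived, and the HiFi depth printed here is a derived estimate, not a reported value.

```python
# Figures are copied from this page. Assumption: coverage = total bases / assembly size.
GENOME_SIZE_BP = 871_249_447  # final assembly size reported under Final Results


def coverage(total_bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean depth of coverage for a read set."""
    return total_bases / genome_size


# Hi-C: 459,724,380 reads, each assumed to be exactly 150 bp.
hic_bases = 459_724_380 * 150
hic_cov = coverage(hic_bases)

# PacBio HiFi: total bases and read count as reported.
hifi_bases = 26_395_523_915
hifi_reads = 1_588_895
hifi_cov = coverage(hifi_bases)
hifi_mean_len = hifi_bases / hifi_reads

print(f"Hi-C coverage:  {hic_cov:.1f}x")
print(f"HiFi coverage:  {hifi_cov:.1f}x (derived estimate)")
print(f"HiFi mean read: {hifi_mean_len:.1f} bp")
```

Under these assumptions the Hi-C value rounds to the reported 79x, and the HiFi mean read length truncates to the reported 16,612 bp.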


A single non-sexed individual was used for PacBio HiFi sequencing (University of Delaware, USA) and multiple individuals for Illumina Hi-C sequencing (Arima Genomics, USA). Hifiasm was used to assemble the PacBio HiFi reads, followed by Juicer and 3D-DNA with the Hi-C data for chromosome-level scaffolding. Haplotigs were removed with purge_haplotigs. Manual curation was carried out to bring the assembly together and check for misassemblies. Unmapped reads were mapped back to the original assembly to check for missing sequence, which was incorporated into the final assembly. Error correction was done with freebayes using the Hi-C data.

A PGI Iso-Seq transcriptome (2 SMRT cells; Insecta BUSCO: C:72.6%[S:20.3%,D:52.3%],F:1.0%,M:26.4%; Fungi BUSCO: C:52.9%[S:15.4%,D:37.5%],F:2.9%,M:44.2%) and a PGI RNA-seq transcriptome (BUSCO: C:97.1%[S:95.8%,D:1.3%],F:0.4%,M:2.5%) were assembled and used in the MAKER2 annotation pipeline with trained Augustus and GeneMark gene predictors. PASA was used to update the gene models by adding UTRs, correcting existing models and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
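The BUSCO summaries quoted on this page follow the usual compact notation (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing). As a minimal sketch, the strings can be parsed and sanity-checked as below; the function names parse_busco and is_consistent are illustrative and not part of BUSCO itself.

```python
import re

# Matches one "X:NN.N%" field of a BUSCO short summary.
BUSCO_FIELD = re.compile(r"([CSDFM]):(\d+(?:\.\d+)?)%")


def parse_busco(summary: str) -> dict:
    """Turn 'C:97.1%[S:95.8%,D:1.3%],F:0.4%,M:2.5%' into {'C': 97.1, ...}."""
    return {key: float(value) for key, value in BUSCO_FIELD.findall(summary)}


def is_consistent(scores: dict, tol: float = 0.11) -> bool:
    """Check C+F+M is about 100 and, when present, S+D is about C.

    The tolerance allows for rounding to one decimal place.
    """
    ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    if "S" in scores and "D" in scores:
        ok = ok and abs(scores["S"] + scores["D"] - scores["C"]) <= tol
    return ok


rnaseq = parse_busco("C:97.1%[S:95.8%,D:1.3%],F:0.4%,M:2.5%")
isoseq = parse_busco("C:72.6%[S:20.3%,D:52.3%],F:1.0%,M:26.4%")
print(rnaseq, is_consistent(rnaseq))
print(isoseq, is_consistent(isoseq))
```

A check like this is mainly useful for catching transcription errors when BUSCO strings are copied by hand between reports.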

A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to identify loci of interest, e.g. P450 Pfam domains, on the genome. Using this information, loci of interest including UDP-glycosyltransferase (UGT), P450, ABC transporter and IRAC gene models were found and curated using mapped RNA-seq data and the MAKER gene annotation.
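The six-frame translation step can be illustrated with a short, dependency-free sketch. It assumes the standard genetic code and is not the pipeline actually used here; in practice the translated frames would be written to a protein FASTA file and searched against Pfam profile HMMs with HMMER.

```python
from itertools import product

# Standard genetic code in NCBI ordering (bases T, C, A, G; first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Hits found in a translated frame then have to be mapped back to genomic coordinates (frame offset and strand) before they can be shown as a genome track.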

An endosymbiont/parasite was assembled as an unknown fungus, potentially Tubulinosema (10,826,776 bp).

Final Results

A complete, annotated 17-chromosome assembly was deposited at NCBI under accession PRJEB47900 (including raw data).

BUSCO (Insecta odb10): C:98.1%,F:0.6%,M:1.3%

13,141 gene models - BUSCO C:95.4%[S:88.2%,D:7.2%],F:1.7%,M:2.9%

Scaffold No. (incl Mt): 61

N50: 57,205,461

N bases (bp): 870,500

Repeat: 18.11%

Total size (bp) (chr no.): 871,249,447 (17)

Curated: 89x P450, 78x ABC transporter, 35x UGT, and 120 of the 130 defined IRAC gene models.

One endosymbiont/parasite was assembled as an unknown fungus, potentially Tubulinosema (10,826,776 bp).
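For readers unfamiliar with the N50 statistic reported above: it is the scaffold length at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly size. The sketch below uses made-up scaffold lengths for illustration (the real per-scaffold lengths are not listed on this page) and also derives the gap fraction from the reported N bases and total size.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First length at which the running sum reaches half the total.
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")


# Hypothetical scaffold lengths, for illustration only.
example_lengths = [10, 8, 6, 4, 2]
print("Example N50:", n50(example_lengths))

# Gap content from the figures reported on this page.
n_bases = 870_500
total_size = 871_249_447
gap_percent = 100 * n_bases / total_size
print(f"N bases as share of assembly: {gap_percent:.3f}%")
```

With the reported values, N bases make up roughly 0.1% of the assembly.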

Other files

These are files that were not submitted to NCBI but might be useful.

Genomic PFAM annotation track

Non-coding RNA annotation track

Repeat annotation track

Repeat library