Bemisia tabaci


Species: Bemisia tabaci (Gennadius) – Silver leaf whitefly/Sweet potato whitefly/Tobacco whitefly

Order: Hemiptera

Suborder: Sternorrhyncha

Family: Aleyrodidae

Genus: Bemisia

Very few countries remain free from B. tabaci, illustrating the difficulty of preventing its movement in international trade. Furthermore, it is likely that various species of the B. tabaci complex are already present but unreported. Most biotypes can vector over 60 plant viruses. Those biotypes that are poor vectors appear to be so because of their inability to feed on alternative host plant species. Whitefly-transmitted viruses are by far the most important agriculturally, causing yield losses to crops of between 20 and 100% and a range of symptoms that include yellow mosaics, yellow veining, leaf curling, stunting and vein thickening.

The Mediterranean species (formerly known as Q biotype) is found throughout the Iberian Peninsula, around the Mediterranean basin (including Israel) and in the Canary Islands. It is widely thought that this is the indigenous biotype in these regions, although it co-exists with other species in Israel, Italy and the Canary Islands. It was first recorded in the UK in 2012. Over recent years it has been exposed to extensive insecticide applications and, within areas of intensive agriculture, it exhibits a high level of resistance.

Source: CABI Invasive Species Compendium

Sample collection

Originally sourced from Almeria, Spain. Obtained by Syngenta from Rothamsted Research in 2007.

Next Generation Sequencing

i) Illumina Hi-C sequencing, 150 bp paired-end data:

120,548,746 reads and 30x coverage.

ii) PacBio CLR data: mean read length 9,693 bp, total reads 6,484,563, read length N50 13,887 bp, and total bases 62,856,441,162. DNA was extracted in two preparations, using the Zymo Insect kit and the Zymo HMW kit, and combined (1,539 ng gDNA) at the University of Georgia.
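The read statistics above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that each Hi-C read contributes its full 150 bp and that depth is measured against the total assembly size listed under Final Results (611,946,799 bp); the genome size actually used for the quoted 30x figure is not stated, so the function and variable names here are illustrative only.

```python
# Figures are taken from this document; using the final assembly size as the
# denominator is an assumption, not the authors' stated method.
GENOME_SIZE_BP = 611_946_799  # total assembly size from Final Results


def coverage(total_bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Return sequencing depth as total sequenced bases / genome size."""
    return total_bases / genome_size


hic_bases = 120_548_746 * 150   # reads x assumed read length (bp)
pacbio_bases = 62_856_441_162   # total PacBio CLR bases

hic_cov = coverage(hic_bases)
pacbio_cov = coverage(pacbio_bases)
pacbio_mean_len = pacbio_bases / 6_484_563  # total bases / total reads

print(f"Hi-C depth: {hic_cov:.1f}x")
print(f"PacBio CLR depth: {pacbio_cov:.1f}x")
print(f"PacBio mean read length: {pacbio_mean_len:.0f} bp")
```

Under these assumptions the Hi-C depth comes out just under 30x, in line with the figure quoted above, and the mean PacBio read length recomputed from total bases and total reads agrees with the reported 9,693 bp.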


Multiple individuals were used for both PacBio CLR sequencing (University of Georgia, USA) and Hi-C Illumina sequencing (Arima Genomics, USA). Falcon was used to assemble the PacBio CLR reads, followed by Juicer and then 3d-dna with the Hi-C data for chromosome-level assembly. Haplotigs were removed with purge_haplotigs. Manual curation was done to bring the genome together and check for misassemblies. Unmapped reads were mapped back to the original assembly to check for missing sequence, and any missing sequence was incorporated into the final assembly. Error correction was done with the Illumina Hi-C data using freebayes.

Public RNA-seq data were assembled into a transcriptome (BUSCO: C:98.6%[S:96.7%,D:1.9%],F:0.1%,M:1.3%) and used in the Maker2 annotation pipeline with trained Augustus and Genemark gene predictors. PASA was used to update the gene models by adding UTRs, correcting existing models and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
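BUSCO summaries of the form shown above pack several percentages into one string (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing). As an illustration only, the sketch below parses such a string and checks that the parts add up; the helper name parse_busco is hypothetical and not part of BUSCO itself.

```python
import re


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO short summary, e.g. 'C:98.6%[S:96.7%,D:1.9%],F:0.1%,M:1.3%'."""
    pairs = re.findall(r"([CSDFM]):(\d+(?:\.\d+)?)%", summary)
    return {key: float(value) for key, value in pairs}


transcriptome = parse_busco("C:98.6%[S:96.7%,D:1.9%],F:0.1%,M:1.3%")

# Sanity checks: S + D should equal C, and C + F + M should total 100%
# (a small tolerance allows for rounding in the reported figures).
assert abs(transcriptome["S"] + transcriptome["D"] - transcriptome["C"]) < 0.05
assert abs(transcriptome["C"] + transcriptome["F"] + transcriptome["M"] - 100) < 0.05

print(transcriptome)
```

The same function also accepts summaries without the bracketed S and D values, in which case only the C, F and M keys are returned.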

A Pfam genomic track was created by translating the genome in six reading frames and using hmmer to identify loci of interest, e.g. P450 Pfam domains, on the genome. Using this information, loci of interest including UDP, P450, ABC and IRAC gene models were found and curated using mapped RNA-seq and the Maker gene annotation.
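To illustrate what translating in six reading frames means, here is a minimal sketch using the standard genetic code. It is not the tool used for this assembly (which is not named), the function names are hypothetical, and the hmmer search that would follow is not shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), amino acids in TCAG codon order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (ACGT only are swapped)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Return {frame: protein} for forward frames +1..+3 and reverse frames -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

Each translated frame can then be searched against Pfam profile HMMs, and hits mapped back to genomic coordinates using the frame and offset, which is how a protein-domain track can be laid over an unannotated genome.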

Endosymbionts previously reported in NCBI, Candidatus Portiera aleyrodidarum MED (CP007563.1; 357,461 bp) and Candidatus Hamiltonella defensa (Bemisia tabaci) strain MEAM1 (CP007563.1; 1,739,504 bp), were assembled and were identical to the deposited sequences, so were not submitted again.

Final Results

A complete, annotated 10-chromosome assembly was deposited at NCBI under accession PRJEB47898 (including raw data).

BUSCO (Insecta odb10): C:97.2%,F:0.7%,M:2.1%

14,350 gene models - BUSCO of C:95.5%[S:84.7%,D:10.8%],F:0.7%,M:3.8%

Scaffold No. (incl Mt): 158

N50: 57,159,705

N bases: 397,000

Repeat: 50.30%

Total size (bp) (chr): 611,946,799 (10)

Curated: 120x P450, 0x ABC transporter, 55x UGT, and 104/130 IRAC gene models.
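For readers unfamiliar with the N50 statistic reported above: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of the calculation follows; the scaffold lengths in the example are made-up toy values, not the lengths from this assembly.

```python
def n50(lengths):
    """Return the N50: the length at which the running total of lengths,
    taken from longest to shortest, first reaches half of the overall total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy example with hypothetical scaffold lengths (bp)
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))
```

With 10 chromosomes making up most of a 611,946,799 bp assembly, an N50 of about 57 Mb indicates that half of the sequence sits in chromosome-scale scaffolds of at least that size.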

Other files

These are files that were not submitted to NCBI but might be useful.

Genomic PFAM annotation track

Non-coding RNA annotation track

Repeat annotation track

Repeat library