Aphis gossypii


Species: Aphis gossypii (Glover) - Cotton aphid/Melon aphid

Order: Hemiptera

Suborder: Sternorrhyncha

Family: Aphididae

Genus: Aphis

A. gossypii has a worldwide distribution, although in arctic regions it is mostly confined to glasshouses. It is particularly abundant in the tropics. It is extremely polyphagous and very damaging to many economically important crops, including cotton, aubergine, citrus, coffee, melon, okra, peppers, potato, squash and sesame. It is a major pest of cotton and cucurbits.

Economic damage due to A. gossypii is by direct feeding, the excretion of honeydew and virus transmission. Damage to cotton, okra and certain cucurbits occurs when large populations of aphids build up, feed on the crops and excrete honeydew. However, its biggest overall economic impact is as a vector of pathogenic plant viruses in over two dozen crops. There is little quantitative information on exact crop losses. In cotton, for example, A. gossypii is only one of many crop pests. Monetary losses to this pest are substantial and are a result of crop loss and crop quality reduction.

Source: CABI Invasive Species Compendium https://www.cabi.org/isc

Sample collection

Multi-individual clonal population. In culture at Bayer since 2001, originally collected from greenhouse cucumber (Hohenheim, Germany).

Next Generation Sequencing

i) Illumina Hi-C sequencing 150 bp paired end data:

475,901,602 reads and 219x coverage.

ii) Illumina 10X sequencing 150 bp paired end data:

519,690,998 reads and 239x coverage.

iii) PacBio CLR data generated with the low-input protocol after sample bead clean-up (~50% contaminated with Myzus persicae): mean read length 9,349 bp, 2,606,695 total reads, read length N50 17,236 bp, and 24,371,365,388 total bases. DNA was extracted using DNAzol at Rothamsted Research (1250 ng gDNA).
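The coverage figures above can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes every Illumina read is a full 150 bp, that the read counts include both mates of each pair, and that coverage is measured against the final assembly size of 325,719,615 bp reported under Final Results. The function and variable names are hypothetical.

```python
# Genome size taken from the "Total size (bp)" value under Final Results.
GENOME_SIZE_BP = 325_719_615
# Assumption: every Illumina read is a full 150 bp.
READ_LENGTH_BP = 150


def coverage(n_reads, read_length=READ_LENGTH_BP, genome_size=GENOME_SIZE_BP):
    """Return mean sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


hic_cov = coverage(475_901_602)   # Illumina Hi-C reads
tenx_cov = coverage(519_690_998)  # Illumina 10X reads

# PacBio mean read length: total bases divided by total reads (truncated).
pacbio_mean_len = 24_371_365_388 // 2_606_695

print(f"Hi-C coverage: {hic_cov:.0f}x")
print(f"10X coverage: {tenx_cov:.0f}x")
print(f"PacBio mean read length: {pacbio_mean_len} bp")
```

Under these assumptions the computed values agree with the 219x, 239x and 9,349 bp figures listed above.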


Non-sexed multi-individual DNA (a mixed-species sample with Myzus persicae) was used for low-input PacBio CLR sequencing (University of Georgia, USA), and multi-individual DNA was used for Hi-C Illumina sequencing (Arima Genomics, USA). Falcon was used to assemble the PacBio CLR reads, followed by Juicer and then 3D-DNA with the Hi-C data for chromosome-level assembly. Haplotigs were removed with purge_haplotigs. Aphis contigs were selected by mapping uncontaminated Illumina reads to the assembly. Manual curation was carried out to join the genome and check for misassemblies. Error correction was done with Pilon using the Illumina 10X library data.

PGI RNA-seq data was assembled into a transcriptome (BUSCO: C:96.0%[S:75.6%,D:20.4%],F:0.7%,M:3.3%) and used in the Maker2 annotation pipeline with trained Augustus and Genemark gene predictors. PASA was used to update the gene models to add UTR, correct existing models and add isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
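BUSCO summaries such as the one above follow a compact notation: C (complete) is the sum of S (single-copy) and D (duplicated), and C, F (fragmented) and M (missing) together account for 100%. The following is a minimal sketch of parsing such a string and checking that internal consistency. The helper names `parse_busco` and `is_consistent` are hypothetical and are not part of BUSCO itself.

```python
import re

# Matches fields such as "C:96.0%" or "D:20.4%" in a BUSCO short summary.
BUSCO_FIELD = re.compile(r"([CSDFM]):(\d+(?:\.\d+)?)%")


def parse_busco(summary):
    """Return a dict mapping each BUSCO category letter to its percentage."""
    return {key: float(value) for key, value in BUSCO_FIELD.findall(summary)}


def is_consistent(scores, tol=0.11):
    """Check C + F + M == 100 and, when present, S + D == C (within rounding)."""
    ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    if "S" in scores and "D" in scores:
        ok = ok and abs(scores["S"] + scores["D"] - scores["C"]) <= tol
    return ok


transcriptome = parse_busco("C:96.0%[S:75.6%,D:20.4%],F:0.7%,M:3.3%")
print(transcriptome, is_consistent(transcriptome))
```

The tolerance allows for each percentage having been rounded to one decimal place.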

A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to identify loci of interest, e.g. P450 Pfam domains, on the genome. Using this information, loci of interest including UDP-glycosyltransferase (UGT), P450, ABC and IRAC gene models were found and curated using mapped RNA-seq and a Maker gene annotation.
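To illustrate the six-frame step described above, here is a minimal pure-Python sketch that translates a DNA sequence in the three forward and three reverse-complement frames using the standard genetic code. It is not the tool used for this assembly (the source does not name one), and the function names are hypothetical. The resulting protein strings are the kind of input a Pfam HMM search would scan; the HMMER search itself is not shown.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCC")
print(frames)
```

Stop codons are kept as `*` so that downstream coordinates can be mapped back to the genome; a real pipeline would also record the genomic start of each translated segment.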

The endosymbiont Buchnera aphidicola (628,098 bp) was assembled and was identical to the sequence previously reported in NCBI (CP056771.1), so it was not submitted again.

Final Results

A complete, annotated four-chromosome assembly was deposited at NCBI under accession PRJEB47903 (including raw data).

BUSCO (insecta_odb10): C:96.4%,F:0.9%,M:2.7%

12,229 gene models - BUSCO C:94.7%[S:90.8%,D:3.9%],F:1.9%,M:3.4%

Scaffold No. (incl Mt): 15

N50: 89,130,466

N bases (bp): 321,077

Repeat: 17.14%

Total size (bp) (chr no.): 325,719,615 (4)

Curated: 57x P450, 91x ABC transporter, 44x UGT, and 107 of the 130 defined IRAC gene models.

One endosymbiont was assembled: Buchnera aphidicola (628,098 bp).
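For readers unfamiliar with the summary statistics above, the sketch below shows how an N50 value and a gap (N base) percentage are typically computed. The scaffold lengths in the N50 example are made-up toy values, not the lengths of this assembly, and the function name `n50` is hypothetical. The gap percentage uses the N base count and total size reported above.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L cover at least
    half of the total assembly size."""
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths for illustration only (NOT this assembly's scaffolds).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)

# Gap content from the figures reported above.
N_BASES = 321_077
TOTAL_SIZE_BP = 325_719_615
gap_pct = 100 * N_BASES / TOTAL_SIZE_BP

print(f"Toy N50: {toy_n50}")
print(f"Gap content: {gap_pct:.3f}% of the assembly")
```

In the toy example the two longest scaffolds (50 + 40) already cover half of the total of 150, so the N50 is 40.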

Other files

These are files that were not submitted to NCBI but might be useful.

Genomic PFAM annotation track

Non-coding RNA annotation track

Repeat annotation track

Repeat library