Species: Chrysodeixis includens (Soybean looper)
C. includens is found in North and South America, predominantly as a pest of soybean. In Puerto Rico, C. includens can be a severe pest of tomatoes, soybean, other bean crops, sunflowers and aubergine. Damage to tomato fruit can exceed 90%, with total defoliation being common in heavy infestations. Insecticide resistance is widespread in this pest, resulting in control failures with commonly used insecticides including thiodicarb and permethrin.
C. includens will also feed on kudzu (Pueraria lobata), a forage plant introduced into the USA and Puerto Rico as cattle fodder and as a plant to stabilize eroded or unstable hillsides. This plant is now considered a pest in both the USA and Puerto Rico, and there has been an effort in some states to use C. includens as a control measure.
The early larval stages of this pest (first to third instars) feed on the palisade layer of leaf undersides, leaving the upper leaf cuticle and wax layer intact. This appears as a clear, irregular area on the leaves resembling a window, so the feeding damage is sometimes referred to as 'window paneing'. Larger instars will feed on the entire leaf, and heavy infestations can completely defoliate entire plants.
Collected in 1988 from South Carolina, USA, but the most recent collections used in this study are from Athens, Georgia, USA.
Next Generation Sequencing
i) Illumina 10X Genomics 150 bp paired-end data: 197,813,254 reads and 29,671,988,100 bp, with a coverage of 78x.
ii) Hi-C 100 bp paired-end data: 413,998,784 reads and 41,399,878,400 bp, with a coverage of 108x.
iii) Nanopore data: mean read length 6,633 bp, total reads 2,167,276, read length N50 13,884 bp, and total bases 14,376,879,363. DNA was extracted using DNAzol (2000 ng) at Rothamsted.
iv) HiFi PacBio data: mean read length 16,665 bp, total reads 247,344, read length N50 19,496 bp, and total bases 4,122,025,904. DNA was extracted using the MagAttract kit (3000 ng) at University of Delaware.
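The read and coverage figures above follow from simple arithmetic, which the minimal sketch below reproduces. It assumes that coverage was calculated as total sequenced bases divided by the reported total assembly size (379,968,472 bp); the source does not state the denominator that was actually used. The function names coverage and read_n50 are illustrative only and are not part of any tool used in this project.

```python
def coverage(total_bases: int, genome_size: int) -> float:
    """Mean depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def read_n50(lengths):
    """Read length N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("cannot compute N50 of an empty read set")


# Assumption: the reported total assembly size is used as the genome size.
GENOME_SIZE = 379_968_472

# Fixed-length paired-end data: total bases = reads x read length.
illumina_bases = 197_813_254 * 150
hic_bases = 413_998_784 * 100

print(f"Illumina 10X: {illumina_bases:,} bp, "
      f"{coverage(illumina_bases, GENOME_SIZE):.1f}x")
print(f"Hi-C: {hic_bases:,} bp, {coverage(hic_bases, GENOME_SIZE):.1f}x")

# Toy example only; the per-read lengths are not given in this document.
print("Toy N50:", read_n50([2, 3, 4, 5, 6]))
```

Under this assumption the quoted coverage values of 78 and 108 match the computed depths with the fractional part dropped.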
PGIa individual version: DNA from a single individual was used for Nanopore and 10X Genomics sequencing (Rothamsted and Georgia Genomics respectively). MaSuRCA was used to assemble the Nanopore reads with the 10X Genomics data.
PGIb individual version: HiFi reads at 6x coverage were assembled using hifiasm.
Both individual assemblies: Haplotigs were removed (Redundans). Unmapped reads were mapped back to the original assembly to check for missing sequence, which was incorporated into the final assembly. Manual curation was done to bring the genome together and check for misassemblies. Juicer and then 3D-DNA were used with the Hi-C data to produce a chromosome-level assembly. 10X data were used with FreeBayes for error correction.
PGI Syngenta-sourced RNA-seq data were assembled into a transcriptome (BUSCO: C:99.0%[S:64.9%,D:34.1%],F:0.2%,M:0.8%) and used in the MAKER2 annotation pipeline with trained AUGUSTUS and GeneMark gene predictors. PASA was used to update the gene models to add UTRs, correct existing models and add isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to identify loci of interest, e.g. P450 Pfam domains, on the genome. Using this information, loci of interest including UDP-glycosyltransferase (UGT), P450, ABC and IRAC gene models were found and curated using mapped RNA-seq.
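The six-frame translation step can be illustrated with the short sketch below. It is not the pipeline used for this genome: it assumes the standard genetic code, covers only the translation (the resulting protein sequences would then be searched with Pfam profile HMMs in a separate step), and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq: str) -> dict:
    """Translate a DNA sequence in three forward and three reverse frames."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for frame, protein in six_frame_translate("ATGGCCTAA").items():
    print(frame, protein)
```

Translating every frame on both strands means a domain can be located regardless of strand or reading frame, without relying on existing gene models.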
Two complete, annotated 31-chromosome assemblies from two different individuals were deposited at NCBI under accession PRJEB38103 (incl. raw data).
BUSCO (Insecta odb10): C:98.1%,F:0.3%,M:1.6%
13,334 gene models (PGIa) - BUSCO C:95.9%[S:89.5%,D:6.4%],F:0.6%,M:3.5%
Scaffold No. (incl Mt): 85
N bases (bp): 89,238
Total size (bp) (chr no.): 379,968,472 (31)
Curated: 108x P450, 49x ABC transporter, 26x UGT, and the majority (107 of 130) of defined IRAC gene models.
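The BUSCO strings quoted in this document use the standard short notation: C (complete), split into S (single-copy) and D (duplicated), plus F (fragmented) and M (missing). The sketch below parses that notation and checks that the percentages are internally consistent. The names parse_busco and is_consistent are illustrative helpers, not part of BUSCO itself, and the tolerance is an assumption chosen to allow for rounding to one decimal place.

```python
import re

# Matches e.g. "C:95.9%[S:89.5%,D:6.4%],F:0.6%,M:3.5%"; the [S,D] part is optional.
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%(?:\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\])?,"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%"
)


def parse_busco(summary: str) -> dict:
    """Extract BUSCO percentages from a short summary string."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    return {k: float(v) for k, v in match.groupdict().items() if v is not None}


def is_consistent(scores: dict, tol: float = 0.11) -> bool:
    """Check C + F + M = 100 and, when present, S + D = C (within rounding)."""
    ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    if "S" in scores and "D" in scores:
        ok = ok and abs(scores["S"] + scores["D"] - scores["C"]) <= tol
    return ok


summaries = {
    "genome": "C:98.1%,F:0.3%,M:1.6%",
    "gene models (PGIa)": "C:95.9%[S:89.5%,D:6.4%],F:0.6%,M:3.5%",
    "transcriptome": "C:99.0%[S:64.9%,D:34.1%],F:0.2%,M:0.8%",
}
for name, text in summaries.items():
    scores = parse_busco(text)
    print(name, scores, "consistent:", is_consistent(scores))
```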
These are files that were not submitted to NCBI but might be useful.