Collected in 1988 from South Carolina, USA, but the most recent collections used in this study are from Athens, Georgia, USA.
Next Generation Sequencing
i) Illumina 10X Genomics 150 bp paired-end data, totalling 197,813,254 reads and 29,671,988,100 bp, with a coverage of 78x.
ii) Hi-C 100 bp paired-end data, totalling 413,998,784 reads and 41,399,878,400 bp, with a coverage of 108x.
iii) Nanopore data, with a mean read length of 6,633 bp, 2,167,276 total reads, a read length N50 of 13,884 bp, and 14,376,879,363 total bases. DNA was extracted using DNAzol (2000 ng) at Rothamsted.
iv) HiFi PacBio data, with a mean read length of 16,665 bp, 247,344 total reads, a read length N50 of 19,496 bp, and 4,122,025,904 total bases. DNA was extracted using the MagAttract kit (3000 ng) at the University of Delaware.
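The figures in the list above can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the original workflow: it assumes coverage is defined as total bases divided by genome size, so the genome size printed here is only the value implied by the stated coverage (no genome size is given in this record). The read lengths passed to n50 are made-up toy values used purely to show how a read length N50 is computed.

```python
def total_bases(n_reads: int, read_length: int) -> int:
    """Total bases for fixed-length reads (read count x read length)."""
    return n_reads * read_length


def implied_genome_size(total_bp: int, coverage: float) -> float:
    """Genome size implied by a stated coverage, assuming coverage = bases / genome size."""
    return total_bp / coverage


def n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


# Read counts and read lengths as stated for the Illumina and Hi-C data.
illumina_bp = total_bases(197_813_254, 150)
hic_bp = total_bases(413_998_784, 100)

print(illumina_bp)  # 29671988100
print(hic_bp)       # 41399878400

# Implied genome size in Mb (an inference from the stated coverage, not a reported value).
print(round(implied_genome_size(illumina_bp, 78) / 1e6))  # 380
print(round(implied_genome_size(hic_bp, 108) / 1e6))      # 383

# Toy read lengths (hypothetical) to illustrate N50.
print(n50([2, 3, 4, 5, 6]))  # 5
```

Both short-read data sets point to a similar implied genome size, which suggests the stated read counts, base totals and coverage values are mutually consistent.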
PGIa individual version: DNA from a single individual was used for Nanopore and 10X Genomics sequencing (at Rothamsted and Georgia Genomics, respectively). MaSuRCA was used to assemble the Nanopore reads together with the 10X Genomics reads.
PGIb individual version: HiFi reads (6x coverage) were assembled using hifiasm.
Both individual assemblies: Haplotigs were removed (Redundans). Unmapped reads were mapped back to the original assembly to check for missing sequence, and any missing sequence was incorporated into the final assembly. Manual curation was done to bring the genome together and check for misassemblies. Juicer and then 3D-DNA were used with the Hi-C data to produce a chromosome-level assembly. The 10X data were used with FreeBayes for error correction.
PGI Syngenta-sourced RNA-seq data were assembled into a transcriptome (BUSCO: C:99.0%[S:64.9%,D:34.1%],F:0.2%,M:0.8%) and used in the MAKER2 annotation pipeline with trained AUGUSTUS and GeneMark gene predictors. PASA was used to update the gene models by adding UTRs, correcting existing models and adding isoforms. Non-coding RNA was annotated using Infernal v1.1.4.
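The BUSCO summary string quoted above can be read programmatically. The sketch below is an illustration only: the function name parse_busco and the regular expression are written for this example and are not part of BUSCO itself. It assumes the usual meaning of the letters (C complete, S single-copy, D duplicated, F fragmented, M missing) and checks that S + D equals C and that C + F + M equals 100%.

```python
import re

# Pattern for a BUSCO short summary such as "C:99.0%[S:64.9%,D:34.1%],F:0.2%,M:0.8%".
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%"
)


def parse_busco(summary: str) -> dict[str, float]:
    """Return the percentages in a BUSCO short summary as a dict keyed by letter."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO short summary: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def is_consistent(scores: dict[str, float], tolerance: float = 0.05) -> bool:
    """Check S + D == C and C + F + M == 100, allowing for rounding."""
    parts_ok = abs(scores["S"] + scores["D"] - scores["C"]) <= tolerance
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tolerance
    return parts_ok and total_ok


scores = parse_busco("C:99.0%[S:64.9%,D:34.1%],F:0.2%,M:0.8%")
print(scores["C"], scores["D"])  # 99.0 34.1
print(is_consistent(scores))     # True
```

The duplicated fraction (34.1%) is high relative to a genome assembly, which is commonly seen in transcriptome assemblies that retain multiple isoforms per gene.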
A Pfam genomic track was created by translating the genome in all six reading frames and using HMMER to identify loci of interest, e.g. P450 Pfam domains on the genome. Using this information, loci of interest including UDP, P450, ABC and IRAC gene models were found and curated using mapped RNA-seq data.
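The six-frame conversion step can be illustrated as follows. This is a minimal pure-Python sketch assuming the standard genetic code (NCBI translation table 1); the tool actually used for the conversion is not named in this record, and the function names here are hypothetical. The HMMER search against Pfam profiles that would follow is not shown.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translate a DNA sequence in frames +1..+3 and -1..-3."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

Each translated frame could then be written to a protein FASTA file and searched against Pfam profile HMMs, with hit coordinates converted back to genomic positions to build the track.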