Thousands of endoparasitoid wasp species in the families Braconidae and Ichneumonidae harbor "domesticated endogenous viruses" (DEVs) in their genomes. This study focuses on ichneumonid DEVs, named ichnoviruses (IVs). Large quantities of DNA-containing IV virions are produced in ovary calyx cells during the pupal and adult stages of female wasps. Females parasitize host insects by injecting eggs and virions into the body cavity. After injection, virions rapidly infect host cells, which then express IV genes that promote the successful development of wasp offspring. IV genomes consist of two components: proviral segment loci that serve as templates for the circular dsDNAs packaged into capsids, and genes from an ancestral virus that produce virions. In this study, we generated a chromosome-scale genome assembly for Hyposoter didymator, which harbors H. didymator ichnovirus (HdIV). We identified a total of 67 HdIV loci that are amplified in calyx cells during the wasp pupal stage. We then focused on an HdIV gene, U16, which is transcribed in calyx cells during the initial stages of replication. Sequence analysis indicated that U16 contains a domain conserved in primases from select other viruses. Knockdown of U16 by RNA interference inhibited virion morphogenesis in calyx cells. Genome-wide analysis indicated that U16 knockdown also inhibited amplification of HdIV loci in calyx cells. Altogether, our results identified several previously unknown HdIV loci, demonstrated that all HdIV loci are amplified in calyx cells during the pupal stage, and showed that U16 is required for amplification and virion morphogenesis.
Author summary: Parasitoid "domesticated endogenous viruses" (DEVs) provide a fascinating example of eukaryotes acquiring new functions through integration of a viral genome. DEVs consist of multiple loci in the genomes of wasps. Upon activation, these elements collectively orchestrate the production of virions or virus-like particles that are crucial for successful parasitism of host insects. Despite the significance of DEVs for parasitoid biology, the mechanisms regulating key steps in virion morphogenesis are largely unknown. In this study, we focused on the ichneumonid parasitoid Hyposoter didymator, which harbors an ichnovirus consisting of 67 proviral loci. Our findings reveal that all proviral loci are simultaneously amplified in ovary calyx cells of female wasps during the pupal stage, suggesting that viral proteins hijack cellular replication complexes. We functionally studied the ichnovirus gene U16, which encodes a protein with a weakly conserved primase C-terminal domain. Silencing U16 inhibited viral DNA amplification and virion production, underscoring the key role of this gene in ichnovirus replication. Our study provides evidence that many genes involved in viral DNA replication have been conserved during the domestication of ichnoviruses.
Endogenous viral elements (EVEs) refer to viral sequences in eukaryotic genomes that originate from complete or partial integration of a viral genome into the germline [[
DEVs are most prevalent among species in two wasp families named the Braconidae and Ichneumonidae. The DEVs identified in these two wasp families have evolved from different viral ancestors but through convergence have been similarly repurposed to produce either virions containing circular double-stranded (ds) DNAs or virus-like particles (VLPs) lacking nucleic acid. The hyperdiverse Microgastroid complex in the family Braconidae harbors DEVs named bracoviruses (BVs) that evolved from a common virus ancestor in the family Nudiviridae [[
BVs have been studied more extensively than IVs, but IVs are intriguing because of their uncertain origins. Despite differences in ancestry and gene content, BV and IV genomes are similarly organized into two components that have distinct functions [[
The second component of IV genomes consists of domains referred to as "proviral segments," which are amplified in calyx cells and produce circular dsDNAs that are packaged into capsids [[
The replication of IVs, encompassing the processes leading to the production of virions containing IV segments, occurs within the nuclei of calyx cells during pupal and adult developmental stages [[
In this work, we explored IV DNA replication using the campoplegine wasp H. didymator. We first generated a chromosome-level assembly for the H. didymator genome. Using this assembly, we determined that all HdIV genome components undergo local amplification in calyx cells, beginning between pupal stages 1 and 2. Notably, IVSPERs, isolated IV core genes, and proviral segments were amplified in large regions with non-discrete boundaries. Next, we studied the function of U16, which is located on H. didymator IVSPER-3. U16 is one of the most transcribed IVSPER genes during the initial pupal stage and contains a weakly conserved domain found in the C-terminus of primases. RNAi knockdown of U16 inhibited virion formation. Knockdown also significantly reduced DNA amplification of all HdIV genome components, which decreased the transcript abundance of IV core genes and the abundance of circular dsDNA viral molecules. We conclude that U16 is an essential gene for amplification of the HdIV genome and virion production, demonstrating that genes from the IV ancestor that regulate IV replication have been conserved during virus domestication. Additionally, our results show that viral DNA amplification is essential for IV virion production.
The genome assembly for H. didymator we previously generated [[
The twelve scaffolds contained all HdIV loci previously identified: the replication genes organized in clusters (IVSPER-1 to IVSPER-5), the predicted IV replication gene (U37) located outside of an IVSPER, and 53 of the 54 previously identified proviral segments (Hd1 to Hd54) [[
DIAGRAM: Fig 1 Genome of H. didymator and position of HdIV loci. Position of the HdIV loci previously identified in [[
Although three IVSPERs and most proviral segments were located far from one another in the H. didymator genome, there were exceptions to this pattern, including certain pairs of proviral segments separated by less than 20 kb (e.g., Hd36 and Hd38 in Scaffold-1; Hd46 and Hd43, and Hd44.1 and Hd44.2, in Scaffold-2; Hd12 and Hd16 in Scaffold-11). In all of these cases, the paired segments exhibited significant homology, suggesting that they derived from recent duplication events (Fig B in S1 Dataset). Additionally, several proviral segments were located near IVSPERs or near IV replication genes residing outside of IVSPERs (e.g., Hd46 near U37 in Scaffold-2; Hd29 and Hd24 on either side of IVSPER-2, and Hd15 near IVSPER-1, in Scaffold-11; Fig 1).
Prior results suggested IV proviral segment loci are amplified before processing and excision of individual segments that are packaged into capsids [[
Graph: Fig 2 DNA amplification of HdIV loci. (A) qPCR analysis of selected IVSPER genes in calyx cells during wasp pupal development. Top panel. A schematic representation of H. didymator IVSPERs-1, -2, and -3 (GenBank GQ923581.1, GQ923582.1, and GQ923583.1); genes selected for qPCR assays are highlighted in white. U1-U24 are genes encoding proteins of unknown function, while IVSPs are members of a gene family encoding ichnovirus structural proteins. Bottom panel. Genomic (g) DNA amplification levels of IVSPER genes and wasp XRCC1 in calyx cells from pupal stages 1–4. The gene encoding XRCC1 (X-Ray Repair Cross Complementing 1) is located 1,200 bp from U1 (position 3,270,470 to 3,272,519 in Scaffold-11). Data correspond to gDNA amplification relative to amplification of the housekeeping gene elongation factor 1 (ELF1). The Y-axis was transformed using the square root function for better data visualization. Statistical test results are available in Table B in S2 Dataset. (B) Read depth of HdIV loci compared with the rest of the wasp genome. Read depth values per analyzed region (see Materials and methods) are presented for each locus type (proviral segments and IVSPERs) at pupal stage 1 (hyaline pupa) and pupal stage 3 (pigmented pupa). The read depths per HdIV locus (HdIV) are compared with the read depths of random genomic regions outside of HdIV loci (wasp). Note that the read depth value for random wasp regions is lower for DNA samples collected from stage 3 than from stage 1 pupae. This difference is attributed to the higher proportion of reads mapping to HdIV regions among the total number of reads in stage 3 compared with stage 1. The significance levels are indicated as follows: ns = non-significant, **p<0.01, and ***p<0.001. Statistical test results are available at https://github.com/flegeai/EVE_amplification/blob/main/notebooks/Count_tables.ipynb.
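The dilution effect invoked at the end of this legend can be written down with a simple model, assuming uniform coverage, no mapping bias, and comparable sequencing effort. Let B be the total number of sequenced bases, G the genome size, f the fraction of the genome lying in amplified regions, and A their mean amplification factor (symbols introduced here for illustration only). Since the sequenced bases are shared between non-amplified and amplified regions:

$$
B = d_{\mathrm{wasp}}\,G\,(1-f) + A\,d_{\mathrm{wasp}}\,G\,f
\;\Longrightarrow\;
d_{\mathrm{wasp}} = \frac{B}{G\,[\,1 + f\,(A-1)\,]}, \qquad d_{\mathrm{HdIV}} = A\,d_{\mathrm{wasp}}
$$

At stage 1, A ≈ 1 and d_wasp ≈ B/G; at stage 3, the term f(A − 1) grows with amplification, so the depth of non-amplified wasp regions falls even when B is unchanged.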
We next investigated whether all or only some HdIV genome components are amplified by isolating DNA from calyx cells of stage 1 pupae, in which the preceding qPCR assays detected no amplification, and of stage 3 pupae, which showed high levels of amplification. We then generated paired-end libraries, sequenced them on the Illumina platform, and aligned the reads to the new chromosome-level genome assembly. For stage 1 pupae, read depth per HdIV locus did not differ significantly from read depth of randomly selected regions of the same size from the rest of the wasp genome (Fig 2B). In contrast, for stage 3 pupae, read depth of the HdIV loci was significantly higher than that of randomly selected regions from the rest of the wasp genome (Fig 2B; read depth values in S1 Table), suggesting that most or all HdIV loci are amplified in pupal stage 3.
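The comparison above pairs each HdIV locus with randomly placed, size-matched regions outside all HdIV loci. A minimal sketch of that sampling step follows, assuming a per-base depth track held in a Python list and half-open coordinates. The function names and the toy coverage values are hypothetical and do not reproduce the authors' pipeline, whose notebooks are linked in the Fig 2 legend.

```python
import random
from statistics import mean


def overlaps(start, end, intervals):
    """True if the half-open interval [start, end) overlaps any (s, e) in intervals."""
    return any(start < e and s < end for s, e in intervals)


def sample_background(loci, scaffold_len, seed=0, max_tries=10_000):
    """Draw one random region per locus, of identical length, that avoids every locus."""
    rng = random.Random(seed)
    regions = []
    for s, e in loci:
        size = e - s
        for _ in range(max_tries):
            start = rng.randrange(0, scaffold_len - size + 1)
            if not overlaps(start, start + size, loci):
                regions.append((start, start + size))
                break
        else:
            raise RuntimeError(f"no free position for a region of {size} bp")
    return regions


def mean_depth(depth, region):
    """Mean per-base read depth over a half-open region."""
    s, e = region
    return mean(depth[s:e])


# Hypothetical toy track: baseline depth 2, two amplified loci at depth 300.
scaffold_len = 10_000
loci = [(1_000, 1_500), (6_000, 6_800)]
depth = [2] * scaffold_len
for s, e in loci:
    depth[s:e] = [300] * (e - s)

background = sample_background(loci, scaffold_len, seed=42)
locus_depths = [mean_depth(depth, r) for r in loci]
background_depths = [mean_depth(depth, r) for r in background]
print(locus_depths, background_depths)
```

The two lists of per-region depths could then be compared with whichever statistical test is appropriate; the sketch stops at producing matched samples.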
While the preceding results suggested that all HdIV genome components were amplified in calyx cells, the qPCR results presented in Fig 2A also indicated that amplification levels varied, with genes in IVSPER-3 exhibiting higher levels of amplification than genes in IVSPER-1 and -2. We therefore also analyzed read depth per position and the ratio of read depths between stage 3 and stage 1 in our Illumina data set (Figs 3 and S1). This analysis indicated that amplification levels of IVSPER loci, determined at the summit of the read depth curve, ranged from 10X for IVSPER-5 in Scaffold-7 to over 200X for IVSPER-3, also in Scaffold-7 (S1 Table). Consistent with our qPCR analysis, IVSPER-3 was more highly amplified than IVSPER-1 and -2. The read depth per position curves further indicated that amplification peaks toward the middle of each IVSPER (S1 Fig), consistent with our qPCR data showing that genes closer to the boundary of an IVSPER exhibit lower levels of amplification than genes in the middle of the cluster (Figs 2A, 3 and S1).
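The per-position analysis can be sketched as follows: reads are binned into 10 bp intervals (the interval size given in the Fig 3 legend), each bin is scaled to counts per million (CPM), and the stage 3 CPM is divided by the stage 1 CPM. Counting reads by start position and adding a small pseudocount are assumptions made here for illustration, not details taken from the authors' workflow; the summit is simply the bin with the highest ratio.

```python
def binned_cpm(read_starts, scaffold_len, library_size, bin_size=10):
    """Count reads by start position in fixed bins and scale to counts per million."""
    n_bins = (scaffold_len + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return [c * 1_000_000 / library_size for c in counts]


def amplification_curve(cpm_late, cpm_early, pseudocount=0.1):
    """Per-bin ratio of late (stage 3) to early (stage 1) CPM; the pseudocount avoids division by zero."""
    return [(a + pseudocount) / (b + pseudocount) for a, b in zip(cpm_late, cpm_early)]


def summit(curve):
    """Index and value of the first bin with the highest ratio."""
    i = max(range(len(curve)), key=curve.__getitem__)
    return i, curve[i]


# Hypothetical 100 bp scaffold; both libraries contain one million reads.
early_reads = [b * 10 + 5 for b in range(10)]          # one read per bin
late_reads = early_reads + [45] * 19 + [55] * 19       # bins 4 and 5 enriched
curve = amplification_curve(
    binned_cpm(late_reads, 100, 1_000_000),
    binned_cpm(early_reads, 100, 1_000_000),
)
print(summit(curve))
```

Normalizing to CPM before taking the ratio keeps libraries of different sizes comparable, which is why a ratio of raw counts would not be used here.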
Graph: Fig 3 HdIV DNA amplification. DNA amplification in pupal stage 3 was assessed by mapping genomic DNA Illumina reads against the 12 large H. didymator genome scaffolds. In each scaffold, red bars indicate amplified loci, with the intensity of red corresponding to increasing values of the counts per million (CPM) ratio between pupal stage 3 and pupal stage 1. The positions of IVSPERs and isolated IV replication genes are indicated by purple squares, while proviral segments are indicated by green circles. For selected HdIV loci, amplification curves (representing the ratio of the CPM values calculated for 10 bp intervals between pupal stage 3 and pupal stage 1) are shown in boxes. Amplification curves for all of the annotated HdIV loci are shown in S1 Fig. Each HdIV locus is indicated in red, and 10,000 bp of flanking sequence is shown on each side of the locus. For proviral segments, loci are defined as the sequence delimited by two direct repeats; IVSPERs are defined as the region between the start and stop codons of the first and last coding sequences in the cluster; isolated IV replication genes are defined by their coding sequence.
Proviral segment loci were generally more amplified than IV replication gene loci, but their amplification levels were also variable (Figs 3 and S1). For example, at the summit of the read depth curves, the ratio of stage 3 to stage 1 read depth ranged from 30X for proviral locus Hd40 in Scaffold-6 to over 1,100X for Hd27 in Scaffold-7 (S1 Table). Variability in sequencing depth among proviral loci was consistent with earlier studies indicating that the circularized DNAs packaged into IV capsids are non-equimolar in abundance [[
All proviral segments consistently exhibited a substantial increase in amplification that peaked between the two DRs (as exemplified by Hd14 or Hd12 in S2 Fig). For numerous proviral loci, the reads mapping between the flanking DRs displayed uniform read depth. In other cases, however, peaks with varying read depth were evident (as exemplified by Hd32 or Hd16 in S2 Fig). This differential read depth usually occurred in proviral segments that contained more than one pair of DRs, as illustrated by proviral locus Hd11 (Fig 4A) or Hd32 and Hd16 (S2 Fig). Previous studies indicated that Hd11 contains two pairs of DRs, enabling the formation of two nested, circularized segments termed Hd11-1 (formed by recombination between DR1Left (DR1L) and DR1Right (DR1R)) and Hd11-2 (formed by recombination between DR2L and DR2R) (Fig 4A). Reads mapping to the Hd11 locus (bounded by DR1L and DR2R) exhibited three relatively uniform plateaus of different values. Two plateaus corresponded to reads mapping to the predicted locations of Hd11-1 (235X) and Hd11-2 (111X), while the central region with higher read depth (311X) corresponded to reads mapping to both nested segments (Fig 4A). This differential read depth would not be expected if reads mapped only to Hd11 chromosomal DNA. Consequently, the pattern of proviral segment amplification suggested that part of the read depth values arose from amplification intermediates and/or circularized dsDNAs that were also present in our DNA samples.
Some amplified HdIV loci contain both an IVSPER and proviral segments. Two of these loci resided on Scaffold-11 (Hd29, IVSPER-2, and Hd24; Hd33, Hd15, and IVSPER-1; Fig 4B). For these loci, the amplification curves spanned the length of the amplified region (yellow dotted line in Fig 4B) but were interrupted by peaks corresponding to the length of the proviral segments.
This pattern suggested that the chromosomal form of the proviral segments may be amplified to levels comparable to those of the surrounding region and of the IVSPER amplification curves, with the higher values over the segments reflecting additional reads that map to circular dsDNAs or amplification intermediates.
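To see why overlapping circular molecules produce stepped plateaus while chromosomal DNA alone does not, the toy model below adds up the depth contributed by each molecule class across a locus with a sweep over feature boundaries. All coordinates and depths are hypothetical and only mimic the Hd11 geometry (two overlapping segments, the left one more abundant); the model assumes that reads from each molecule class are spread uniformly along it.

```python
def piecewise_coverage(features):
    """Sum the depth contributed by overlapping half-open features (start, end, depth).

    Returns consecutive (start, end, total_depth) segments with non-zero depth,
    computed by sweeping over the feature boundaries.
    """
    deltas = {}
    for start, end, depth in features:
        deltas[start] = deltas.get(start, 0) + depth
        deltas[end] = deltas.get(end, 0) - depth
    segments, current, prev = [], 0, None
    for pos in sorted(deltas):
        if prev is not None and current != 0:
            segments.append((prev, pos, current))
        current += deltas[pos]
        prev = pos
    return segments


# Hypothetical locus: chromosomal copy over [0, 9000) plus two overlapping circles.
chromosomal = (0, 9_000, 20)
circle_1 = (0, 6_000, 100)      # analogous to Hd11-1 (DR1L to DR1R)
circle_2 = (3_000, 9_000, 50)   # analogous to Hd11-2 (DR2L to DR2R)

print(piecewise_coverage([chromosomal]))
print(piecewise_coverage([chromosomal, circle_1, circle_2]))
```

With the chromosomal copy alone the locus shows a single plateau; adding the two circles yields three plateaus, the highest in the shared central region, which is the same qualitative pattern reported for Hd11. The model is not fitted to the observed depth values.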
Graph: Fig 4 HdIV amplified regions in Scaffold-11. (A) Detail of the amplified region at the Hd11 locus. Hd11 (red bar) represents two overlapping segments, Hd11-1 (defined by left and right DRs, DR1L and DR1R, respectively; blue bar) and Hd11-2 (defined by DR2L and DR2R; orange bar). (B) Detail of two other amplified regions containing IVSPERs and HdIV proviral loci. In (B), amplification curves of IVSPERs are highlighted in yellow. In (A) and (B), amplification curves represent the ratio of the CPM values (calculated for 10 bp intervals) obtained in pupal stage 3 compared with pupal stage 1. For each locus, amplification values at the summit of the peaks (bold type) and at the start and end positions of HdIV segments are indicated with arrows. Each amplification curve figure was generated with the Integrative Genomics Viewer (IGV) [[
Since our read depth data indicated amplified regions were larger than the annotated HdIV loci (Figs 3 and S1), we used the MACS2 peak calling program, originally developed for chromatin immunoprecipitation sequencing experiments, to identify areas in the H. didymator genome that were enriched for reads when compared to a control [[
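MACS2 models read counts against a local Poisson background, which the sketch below does not attempt to reproduce. It illustrates only the underlying idea of calling regions enriched in a treatment sample (stage 3) relative to a control (stage 1): fixed windows whose mean depth ratio exceeds a threshold are flagged, and adjacent flagged windows are merged. The window size, fold threshold, and pseudocount are hypothetical parameters, and this simplified caller is a stand-in, not the MACS2 algorithm.

```python
def call_enriched_regions(treat, control, window=1_000, min_fold=4.0, pseudocount=1.0):
    """Flag fixed windows whose mean treatment/control depth ratio is >= min_fold,
    then merge adjacent flagged windows into half-open (start, end) regions."""
    n = min(len(treat), len(control))
    regions = []
    for start in range(0, n, window):
        end = min(start + window, n)
        t = sum(treat[start:end]) / (end - start)
        c = sum(control[start:end]) / (end - start)
        if (t + pseudocount) / (c + pseudocount) >= min_fold:
            if regions and regions[-1][1] == start:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions


# Hypothetical depth tracks: stage 1 flat at 2X, stage 3 at 100X over [5000, 8000).
control = [2] * 20_000
treat = [2] * 20_000
treat[5_000:8_000] = [100] * 3_000
print(call_enriched_regions(treat, control))
```

Because the windows are fixed, region edges in this sketch snap to window boundaries, unlike the peak boundaries reported by MACS2.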
Table 1 HdIV loci amplified in calyx cells from stage 3 pupae identified by read mapping and/or the MACS2 algorithm. For each scaffold, the position and size of the HdIV loci are indicated. Loci newly identified in the present work are marked with asterisks. Corresponding amplified regions (i.e., the peaks predicted by the MACS2 algorithm) are provided for each locus or group of loci. Start and end positions delimiting the HdIV loci and the amplified regions detected by MACS2 are indicated, together with the distance between the start or end of the amplified region and the start or end of the locus. For each HdIV locus and amplified region detected by MACS2, read depth values are provided for calyx cell samples collected from stage 1 or stage 3 pupae. Read depth is based on the length of the HdIV locus or the amplified region. ND indicates amplified regions not detected by MACS2.
| Scaffold | HdIV locus | Locus start | Locus end | Locus size (bp) | Locus read depth, pupal stage 1 | Locus read depth, pupal stage 3 | MACS2 peak name | Region start | Distance from locus start (bp) | Region end | Distance from locus end (bp) | Region size (bp) | Region read depth, pupal stage 1 | Region read depth, pupal stage 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Scaffold-1 | Hd26 | 13,523,617 | 13,528,634 | 5,017 | 3 | 723 | Hd26 | 13,517,317 | 6,300 | 13,534,392 | 5,758 | 17,075 | 2 | 217 |
| | Hd38 | 20,666,612 | 20,670,275 | 3,663 | 3 | 1,090 | Hd38+Hd36 | 20,656,417 | 10,195 | 20,681,709 | 11,434 | 25,292 | 2 | 246 |
| | Hd36 | 20,670,791 | 20,674,528 | 3,737 | 3 | 501 | | | 14,374 | | 7,181 | | | |
| | Hd50 | 27,247,704 | 27,253,490 | 5,786 | 2 | 123 | Hd50 | 27,241,046 | 6,658 | 27,259,533 | 6,043 | 18,487 | 2 | 42 |
| Scaffold-2 | Hd49 | 2,124,324 | 2,129,588 | 5,264 | 2 | 505 | Hd49 | 2,117,165 | 7,159 | 2,135,110 | 5,522 | 17,945 | 2 | 155 |
| | U37 | 2,735,040 | 2,736,878 | 1,838 | 3 | 88 | U37+Hd46+Hd43 | 2,728,207 | 6,833 | 2,758,008 | 21,130 | 29,801 | 3 | 132 |
| | Hd46 | 2,737,536 | 2,741,644 | 4,108 | 2 | 188 | | | 9,329 | | 16,364 | | | |
| | Hd43 | 2,744,162 | 2,748,320 | 4,158 | 3 | 365 | | | 15,955 | | 9,688 | | | |
| | Hd22 | 5,839,467 | 5,843,644 | 4,177 | 2 | 152 | Hd22 | 5,832,207 | 7,260 | 5,850,065 | 6,421 | 17,858 | 2 | 40 |
| | Hd44.1 | 16,284,645 | 16,287,653 | 3,008 | 3 | 218 | Hd44.1+Hd44.2 | 16,278,981 | 5,664 | 16,302,690 | 15,037 | 23,709 | 2 | 60 |
| | Hd44.2 | 16,289,605 | 16,294,435 | 4,830 | 3 | 122 | | | 10,624 | | 8,255 | | | |
| Scaffold-3 | Hd30 | 419,458 | 423,621 | 4,163 | 3 | 1,502 | Hd30 | 410,434 | 9,024 | 429,813 | 6,192 | 19,379 | 3 | 336 |
| | Hd48 | 16,689,745 | 16,699,417 | 9,672 | 3 | 184 | Hd48 | 16,679,121 | 10,624 | 16,706,535 | 7,118 | 27,414 | 2 | 70 |
| Scaffold-4 | Hd19 | 6,231,095 | 6,235,534 | 4,439 | 3 | 428 | Hd19 | 6,229,498 | 1,597 | 6,243,248 | 7,714 | 13,750 | 2 | 143 |
| | Hd41 | 12,119,882 | 12,127,834 | 7,952 | 2 | 618 | Hd41 | 12,113,372 | 6,510 | 12,135,096 | 7,262 | 21,724 | 2 | 233 |
| | Hd45.2 | 12,329,917 | 12,331,967 | 2,050 | 1 | 83 | Hd45.2 | 12,321,418 | 8,499 | 12,340,715 | 8,748 | 19,297 | 2 | 18 |
| | Hd45.1 | 15,910,208 | 15,914,421 | 4,213 | 1 | 186 | | ND | ND | ND | ND | ND | ND | ND |
| Scaffold-5 | Hd13 | 5,876,265 | 5,882,021 | 5,756 | 3 | 385 | Hd13 | 5,869,105 | 7,160 | 5,889,886 | 7,865 | 20,781 | 2 | 112 |
| Scaffold-6 | Hd40 | 2,988,772 | 2,992,265 | 3,493 | 2 | 54 | Hd40 | 2,983,699 | 5,073 | 2,997,266 | 5,001 | 13,567 | 2 | 16 |
| | Hd9 | 7,229,774 | 7,247,665 | 17,891 | 2 | 148 | Hd9 | 7,223,110 | 6,664 | 7,254,609 | 6,944 | 31,499 | 3 | 87 |
| | U42* | 9,084,542 | 9,085,630 | 1,088 | 3 | 20 | U42* | 9,075,600 | 8,942 | 9,095,671 | 10,041 | 20,071 | 2 | 8 |
| | Hd1 | 11,708,977 | 11,723,747 | 14,770 | 7 | 284 | Hd1 | 11,702,950 | 6,027 | 11,728,752 | 5,005 | 25,802 | 5 | 168 |
| | Hd31-34 | 11,868,292 | 11,872,410 | 4,118 | 3 | 945 | Hd31-34 | 11,862,430 | 5,862 | 11,878,832 | 6,422 | 16,402 | 2 | 242 |
| Scaffold-7 | U38* | 4,768,418 | 4,770,145 | 1,727 | 2 | 30 | U38* | 4,758,477 | 9,941 | 4,776,797 | 6,652 | 18,320 | 2 | 15 |
| | Hd27 | 6,697,648 | 6,701,649 | 4,001 | 3 | 2,407 | Hd27 | 6,688,177 | 9,471 | 6,708,677 | 7,028 | 20,500 | 2 | 479 |
| | Hd52* | 6,772,399 | 6,776,469 | 4,070 | 2 | 141 | Hd52* | 6,765,613 | 6,786 | 6,783,734 | 7,265 | 18,121 | 2 | 35 |
| | Hd5 | 7,084,945 | 7,098,657 | 13,712 | 2 | 220 | Hd5 | 7,078,958 | 5,987 | 7,104,504 | 5,847 | 25,546 | 2 | 124 |
| | Hd47 | 7,900,815 | 7,905,317 | 4,502 | 3 | 248 | Hd47 | 7,894,307 | 6,508 | 7,912,018 | 6,701 | 17,711 | 2 | 67 |
| | IVSPER-5 | 9,178,274 | 9,179,903 | 1,629 | 3 | 22 | IVSPER-5 | 9,168,108 | 10,166 | 9,191,030 | 11,127 | 22,922 | 2 | 9 |
| | IVSPER-3 | 9,252,903 | 9,278,334 | 25,431 | 2 | 342 | IVSPER-3 | 9,243,166 | 9,737 | 9,287,918 | 9,584 | 44,752 | 2 | 221 |
| | U39* | 11,747,266 | 11,750,520 | 3,254 | 3 | 92 | U39* | 11,738,062 | 9,204 | 11,764,025 | 13,505 | 25,963 | 3 | 33 |
| | IVSPER-4 | 13,191,258 | 13,207,069 | 15,811 | 2 | 61 | IVSPER-4 | 13,176,550 | 14,708 | 13,216,107 | 9,038 | 39,557 | 3 | 33 |
| | Hd7 | 13,968,918 | 13,976,983 | 8,065 | 2 | 157 | Hd7 | 13,965,551 | 3,367 | 13,978,603 | 1,620 | 13,052 | 2 | 98 |
| | Hd2 | 14,085,672 | 14,099,608 | 13,936 | 3 | 373 | Hd2 | 14,079,432 | 6,240 | 14,106,161 | 6,553 | 26,729 | 2 | 205 |
| | Hd6 | 14,221,056 | 14,231,516 | 10,460 | 3 | 473 | Hd6 | 14,218,171 | 2,885 | 14,238,167 | 6,651 | 19,996 | 3 | 254 |
| | Hd2-like | 14,388,343 | 14,389,879 | 1,536 | 3 | 528 | | ND | ND | ND | ND | ND | ND | ND |
| Scaffold-8 | Hd20 | 1,942,407 | 1,949,270 | 6,863 | 2 | 189 | Hd20 | 1,934,960 | 7,447 | 1,956,552 | 7,282 | 21,592 | 2 | 65 |
| | Hd8 | 6,823,745 | 6,831,100 | 7,355 | 2 | 404 | Hd8 | 6,818,165 | 5,580 | 6,837,954 | 6,854 | 19,789 | 2 | 156 |
| | Hd4 | 7,097,009 | 7,107,334 | 10,325 | 2 | 780 | Hd4 | 7,090,110 | 6,899 | 7,114,242 | 6,908 | 24,132 | 2 | 340 |
| | Hd17 | 9,012,106 | 9,019,835 | 7,729 | 2 | 342 | Hd17 | 9,006,318 | 5,788 | 9,026,009 | 6,174 | 19,691 | 2 | 138 |
| | Hd54* | 9,214,207 | 9,220,125 | 5,918 | 2 | 319 | Hd54* | 9,208,814 | 5,393 | 9,228,042 | 7,917 | 19,228 | 2 | 104 |
| | Hd18 | 9,364,794 | 9,369,489 | 4,695 | 2 | 771 | Hd18 | 9,360,612 | 4,182 | 9,376,778 | 7,289 | 16,166 | 2 | 229 |
| Scaffold-9 | U40* | 5,723,866 | 5,725,782 | 1,916 | 3 | 37 | U40+Hd39 | 5,714,059 | 9,807 | 5,757,032 | 31,250 | 42,973 | 2 | 37 |
| | Hd39 | 5,744,955 | 5,749,076 | 4,121 | 2 | 243 | | | 30,896 | | 7,956 | | | |
| | U41* | 7,049,476 | 7,051,946 | 2,470 | 2 | 22 | U41 | 7,040,265 | 9,211 | 7,060,018 | 8,072 | 19,753 | 2 | 11 |
| Scaffold-10 | Hd23.2 | 3,553,203 | 3,556,564 | 3,361 | 3 | 90 | Hd23.2 | 3,545,352 | 7,851 | 3,561,495 | 4,931 | 16,143 | 3 | 27 |
| | Hd23.1 | 3,592,527 | 3,596,983 | 4,456 | 2 | 229 | Hd23.1 | 3,587,203 | 5,324 | 3,605,012 | 8,029 | 17,809 | 2 | 63 |
| | Hd25 | 9,399,172 | 9,403,345 | 4,173 | 3 | 496 | Hd25 | 9,393,559 | 5,613 | 9,412,007 | 8,662 | 18,448 | 2 | 117 |
| Scaffold-11 | Hd10 | 453,966 | 460,472 | 6,506 | 2 | 403 | Hd10 | 447,712 | 6,254 | 465,323 | 4,851 | 17,611 | 2 | 154 |
| | Hd11 | 768,148 | 777,337 | 9,189 | 2 | 361 | Hd11 | 764,311 | 3,837 | 783,916 | 6,579 | 19,605 | 4 | 175 |
| | Hd16 | 2,262,941 | 2,270,644 | 7,703 | 2 | 1,298 | Hd16+Hd12 | 2,257,264 | 5,677 | 2,292,899 | 22,255 | 35,635 | 2 | 318 |
| | Hd12 | 2,280,031 | 2,285,932 | 5,901 | 2 | 177 | | | 22,767 | | 6,967 | | | |
| | Hd29 | 3,146,273 | 3,150,628 | 4,355 | 3 | 587 | Hd29+IVSPER-2+Hd24 | 3,139,131 | 7,142 | 3,193,136 | 42,508 | 54,005 | 2 | 153 |
| | IVSPER-2 | 3,155,228 | 3,181,838 | 26,610 | 2 | 153 | | | 16,097 | | 11,298 | | | |
| | Hd24 | 3,182,748 | 3,187,444 | 4,696 | 3 | 196 | | | 43,617 | | 5,692 | | | |
| | Hd33 | 3,234,245 | 3,238,079 | 3,834 | 3 | 244 | Hd33+Hd15+IVSPER-1 | 3,229,681 | 4,564 | 3,277,203 | 39,124 | 47,522 | 2 | 115 |
| | Hd15 | 3,249,051 | 3,254,037 | 4,986 | 3 | 363 | | | 19,370 | | 23,166 | | | |
| | IVSPER-1 | 3,255,266 | 3,269,285 | 14,019 | 2 | 141 | | | 25,585 | | 7,918 | | | |
| | Hd14 | 3,759,577 | 3,764,772 | 5,195 | 2 | 325 | Hd14 | 3,753,870 | 5,707 | 3,770,377 | 5,605 | 16,507 | 2 | 107 |
| | Hd32 | 3,811,943 | 3,819,858 | 7,915 | 2 | 402 | Hd32 | 3,807,216 | 4,727 | 3,823,379 | 3,521 | 16,163 | 2 | 200 |
| | Hd42 | 6,043,764 | 6,046,920 | 3,156 | 2 | 184 | Hd42 | 6,039,490 | 4,274 | 6,052,307 | 5,387 | 12,817 | 2 | 48 |
| | Hd21 | 6,076,348 | 6,080,715 | 4,367 | 2 | 312 | Hd21 | 6,067,122 | 9,226 | 6,086,904 | 6,189 | 19,782 | 2 | 75 |
| | Hd53* | 6,184,778 | 6,190,126 | 5,348 | 3 | 178 | Hd53* | 6,177,710 | 7,068 | 6,199,505 | 9,379 | 21,795 | 2 | 49 |
| | Hd3 | 6,685,111 | 6,695,124 | 10,013 | 2 | 807 | Hd3 | 6,678,585 | 6,526 | 6,701,155 | 6,031 | 22,570 | 2 | 367 |
| | Hd37 | 6,786,206 | 6,789,913 | 3,707 | 2 | 220 | Hd37 | 6,779,687 | 6,519 | 6,795,693 | 5,780 | 16,006 | 2 | 55 |
| Scaffold-12 | Hd28 | 308,843 | 313,456 | 4,613 | 2 | 40 | Hd28 | 305,632 | 3,211 | 316,324 | 2,868 | 10,692 | 2 | 19 |
| | Hd35 | 2,941,876 | 2,945,585 | 3,709 | 3 | 1,062 | Hd35 | 2,935,430 | 6,446 | 2,954,029 | 8,444 | 18,599 | 2 | 219 |
Our results also indicated that every amplified region in the H. didymator genome containing HdIV loci consists of the annotated HdIV locus together with flanking wasp sequences, consistent with our analysis of the wasp gene XRCC1, which is located close to IVSPER-1 (Fig 2A). Across all HdIV loci, we determined that the amplified flanking regions containing wasp sequences varied from 7,000 to 15,000 bp (Table 1). The total size of the amplified regions ranged from 10,692 bp (Hd28 on Scaffold-12) to 54,005 bp (the region containing Hd29, IVSPER-2, and Hd24 on Scaffold-11). Most amplified regions contained a single HdIV locus, but seven contained more than one HdIV locus (Table 1). Three amplified regions contained the neighboring and closely related proviral segments mentioned above (Hd36 and Hd38 on Scaffold-1, Hd44.1 and Hd44.2 on Scaffold-2, Hd12 and Hd16 on Scaffold-11). In addition to the two examples noted above on Scaffold-11 (see Fig 4B), two other amplified regions contained both isolated IV replication genes and proviral segments (U37, Hd46, and Hd43 on Scaffold-2; U40 and Hd39 on Scaffold-9). The simultaneous amplification of both types of HdIV loci within the same amplicons suggested that they are regulated by shared molecular mechanisms. We further hypothesized that similar mechanisms also regulate the amplification of isolated proviral segments and IVSPERs.
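The distance and size columns of Table 1 follow directly from the start and end coordinates. The small helper below (a hypothetical name, not part of the authors' code) recomputes them, which can serve as a consistency check when several loci share one amplified region.

```python
def region_offsets(locus, region):
    """Distances from an amplified region's edges to a locus inside it, plus region size.

    Coordinates are (start, end) pairs; sizes are end minus start, as in Table 1.
    """
    (locus_start, locus_end), (region_start, region_end) = locus, region
    if not (region_start <= locus_start and locus_end <= region_end):
        raise ValueError("locus lies outside the region")
    return {
        "from_start": locus_start - region_start,
        "from_end": region_end - locus_end,
        "region_size": region_end - region_start,
    }


# Hd36 within the Hd38+Hd36 amplified region on Scaffold-1 (coordinates from Table 1)
print(region_offsets((20_670_791, 20_674_528), (20_656_417, 20_681_709)))
# {'from_start': 14374, 'from_end': 7181, 'region_size': 25292}
```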
Lastly, we searched for sequence signatures that might identify the amplification boundaries of each HdIV locus. However, our analysis identified only low-complexity A-tract sequences, which were not specific to HdIV components because they were also found in random, non-amplified wasp genomic sequences (S3 Fig). Thus, no motifs distinguishing the amplification boundaries of HdIV loci were identified.
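The A-tract definition used for this search is not stated here. The sketch below assumes a common working definition (runs of at least four A/T base pairs containing no TpA step) and reports tract density per kb, the kind of summary that could be compared between amplification-boundary sequences and random non-amplified sequences. The threshold and function names are hypothetical.

```python
import re


def find_a_tracts(seq, min_len=4):
    """Return half-open (start, end) A-tracts: A/T runs of >= min_len with no TpA step."""
    seq = seq.upper()
    tracts = []
    for m in re.finditer(r"[AT]+", seq):
        run_start, run = m.start(), m.group()
        piece_start = 0
        for i in range(1, len(run) + 1):
            # A TpA step (T followed by A) or the end of the run closes the current piece.
            if i == len(run) or (run[i - 1] == "T" and run[i] == "A"):
                if i - piece_start >= min_len:
                    tracts.append((run_start + piece_start, run_start + i))
                piece_start = i
    return tracts


def tract_density(seq, min_len=4):
    """A-tracts per kb of sequence."""
    return 1000 * len(find_a_tracts(seq, min_len)) / max(len(seq), 1)


print(find_a_tracts("GCAAAATTGC"))   # AAAATT qualifies
print(find_a_tracts("AATTAATT"))     # split at the central TpA step
print(find_a_tracts("GCGCTATA"))     # TpA-rich run, no tract
```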
We selected the gene U16, located on H. didymator IVSPER-3, as a candidate factor for activating IV replication. U16 is conserved among all IV-producing wasps for which genome or transcriptome data are available (Fig 5). U16 is also one of the most transcribed IV genes in calyx cells from stage 1 H. didymator pupae [[
MAP: Fig 5 Analysis of U16 proteins. U16 proteins were identified in the campoplegine Hyposoter didymator [[
Graph: Fig 6 RNAi knockdown of U16. (A) RT-qPCR data showing relative expression of U16 in dsGFP-injected (control) and dsU16-injected females. ** p<0.01. Images of ovaries dissected from newly emerged adult females that were injected with dsGFP (left) or dsU16 (right). Note the blue color in the oviduct of the dsGFP control, indicating the presence of HdIV virions. (B) Schematics and electron micrographs showing that (a) calyx cell nuclei (N) from dsGFP-injected females contain subvirions (V), while (b) calyx cell nuclei from dsU16-injected wasps do not. As a result, virions do not accumulate in the calyx lumen, as illustrated in the schematic images. Abbreviations: CL, calyx lumen; Cyt, cytoplasm. Scale bars = 5 μm, zooms = 1 μm.
Since U16 contained a domain found in primases, we investigated whether RNAi knockdown also disabled amplification of HdIV genome components. We injected newly pupated wasps with dsU16 or dsGFP, isolated calyx cell DNA from stage 3 pupae, and deep sequenced it in three independent replicates. Mapping the reads from dsGFP-treated calyx samples to the H. didymator genome indicated that all HdIV loci were amplified, as evidenced by higher read depth values than in random regions of the wasp genome (Fig 7A). Silencing U16 significantly decreased read depth values relative to dsGFP samples for both IVSPER and proviral segment loci (Fig 7A). However, read depth values remained higher for HdIV loci than for other regions of the wasp genome in dsU16-treated samples (Fig 7A), suggesting that dsU16 treatment did not completely abolish U16 expression across the three replicates. When each HdIV locus was analyzed individually, read depth for every HdIV genome component (IVSPERs, isolated IV replication genes, and HdIV proviral segments) was lower in the dsU16-treated samples than in the dsGFP-treated samples (Fig 7B and 7C and S3 Table; statistical analyses at https://github.com/flegeai/EVE_amplification).
Graph: Fig 7 Impact of U16 RNAi knockdown on proviral DNA amplification. (A) Comparative distribution of read depths in dsGFP- and dsU16-injected females. For each of the three replicates, read depth values are given for HdIV loci (V) and for random genomic regions outside of HdIV loci (W) with an equal size distribution. IVSPERs and IV replication gene loci are shown in the left panel, and proviral segment loci in the right panel. (B) Read depth values for IVSPERs and IV replication genes residing outside an IVSPER in three biological replicates treated with dsU16 or dsGFP. The names of HdIV loci and the scaffolds (Scaf-) on which they are located are indicated. (C) Read depth values for proviral segment loci in the three biological replicates of the dsU16 and dsGFP samples. For better visualization, only the scaffold (Scaf-) on which the proviral segments are located is indicated. The list of proviral segment loci within each scaffold is available in Table 1. The y-axis was log-transformed for better data visualization. In (A), significance levels are indicated as follows: **p<0.01 and ***p<0.001. In (B) and (C), all differences between dsU16 and dsGFP samples are statistically significant at p<0.05. Statistical test results are available at https://github.com/flegeai/EVE_amplification/blob/main/notebooks/Count_tables.ipynb.
We extended our analysis by injecting dsGFP or dsU16 into newly formed pupae, followed by isolation of DNA from calyx cells and from hind legs, where no HdIV replication occurs. We then used specific primers in qPCR assays to measure the DNA abundance of three wasp genes, selected HdIV replication genes inside and outside of IVSPERs, and selected HdIV genes in different proviral segments. As anticipated, no genes were amplified in hind legs from either control or treated wasps (Fig 8 and Table D in S2 Dataset). In dsGFP-injected control wasps, all HdIV genes were amplified in calyx cell samples (Fig 8 and Table D in S2 Dataset). Among the wasp genes, only XRCC1 exhibited significant amplification, consistent with its location within the IVSPER-1 amplified region (Fig 8 and Table D in S2 Dataset). In contrast, in calyx cell DNA from wasps injected with dsU16, neither the HdIV genes nor XRCC1 were amplified (Fig 8 and Table D in S2 Dataset). Altogether, our results indicate that U16 is required for amplification of all HdIV loci.
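Relative amplification values of this kind are commonly computed with the 2^-ΔΔCt method. The sketch below assumes that method, a reference gene for normalization, hind-leg DNA as the calibrator, and perfect primer efficiency (exact doubling per cycle); the normalization actually used for Fig 8 may differ. The Ct values in the usage lines are hypothetical.

```python
def relative_amplification(ct_target_calyx, ct_ref_calyx, ct_target_leg, ct_ref_leg, efficiency=2.0):
    """2^-ddCt estimate of target DNA copy number in calyx cells relative to hind legs.

    Each sample is first normalized to a reference gene (dCt = Ct_target - Ct_reference);
    efficiency=2.0 assumes the amplicon exactly doubles every PCR cycle.
    """
    d_ct_calyx = ct_target_calyx - ct_ref_calyx
    d_ct_leg = ct_target_leg - ct_ref_leg
    dd_ct = d_ct_calyx - d_ct_leg
    return efficiency ** (-dd_ct)


# Hypothetical Ct values: an amplified HdIV gene, then an unamplified wasp gene.
print(relative_amplification(20.0, 22.0, 24.0, 22.0))  # 16.0
print(relative_amplification(24.0, 22.0, 24.0, 22.0))  # 1.0
```

A target reaching threshold four cycles earlier in calyx DNA than in leg DNA, after normalization, corresponds to roughly 16-fold more copies under the perfect-efficiency assumption.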
Graph: Fig 8 Impact of U16 RNAi knockdown on amplification of selected wasp and HdIV genes. Relative genomic amplification of selected HdIV genes in two-day-old females injected with dsGFP or dsU16. The wasp gene XRCC1, located within the amplified region of the IVSPER-1 locus, was included in the analysis. Wasp histone (H1) and ribosomal protein (rpl) genes served as controls. Samples were obtained from calyx cells (where virions are produced) and hind legs (control). Statistical significance levels are denoted as follows: ns = non-significant, *p<0.05, **p<0.01, and ***p<0.001. Statistical test results are available in Table D in S2 Dataset. The y-axis values were transformed using the square root function for better data visualization.
We hypothesized that amplification of IV replication genes increases their transcript abundance, in which case inhibiting HdIV DNA amplification should reduce it. We thus compared the transcript abundance of nine genes in IVSPER-1, -2, and -3 in calyx RNA samples collected from wasps treated with dsU16 or dsGFP. U16 knockdown reduced the expression of every HdIV replication gene we examined (Fig 9A and Table F in S2 Dataset).
Graph: Fig 9 Impact of U16 RNAi knockdown on HdIV replication gene expression and proviral segment amplification. (A) Relative expression of nine IVSPER genes in 2-day-old adult females injected with dsGFP (control) or dsU16. (B) Relative DNA amplification of the integrated linear (proviral) and circularized (episomal) forms of viral segment Hd29 in 2-day-old adult females injected with dsGFP (control) or dsU16. The left panel illustrates the position of primer pairs designed to selectively amplify the proviral form (Proviral Left and Right, indicated by red and black arrows), the circularized form (Episomal, red arrows), or both (Proviral + Episomal, brown arrows). Note that the primers designed to amplify circular molecules may also anneal to concatemers generated during segment amplification and processing. The right panel presents the relative amplification of each form using DNA from dsGFP- and dsU16-injected females. In both (A) and (B), significance levels are indicated as follows: ns = non-significant, *p<0.05, **p<0.01, and ***p<0.001. Statistical test results are available in Tables F (A) and H (B) in S2 Dataset. The y-axis values were transformed using the square root function for better data visualization.
Finally, we investigated the impact of U16 knockdown on the abundance of the circularized dsDNAs that are processed from amplified proviral segments. For this assay, we used PCR primers that specifically amplified the proviral form of Hd29, its circularized (episomal) or concatemeric forms, or both (Fig 9B and Table H in S2 Dataset). Results showed a significant reduction in both the proviral and the circularized/concatemeric forms of Hd29 in calyx cell DNA from wasps injected with dsU16 compared with DNA from wasps injected with dsGFP (Fig 9B and Table H in S2 Dataset). Our results thus indicated that U16 is required for proviral segment amplification, which is in turn required for production of circularized segments.
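The primer logic behind this assay can be illustrated with a small in-silico PCR. In this sketch, the proviral locus is modeled as flank, DR, segment, DR, flank, and the episome as a single DR joined to the segment. A primer pair facing outward from the segment ends yields a product only from circular (or head-to-tail concatemeric) templates, whereas a pair with one primer in flanking wasp DNA yields a product only from the integrated form. All sequences are random and hypothetical, and the primer positions are arbitrary; the actual Hd29 primer positions are those shown in Fig 9B.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template, fwd, rev, circular=False):
    """In-silico PCR: product length for primers fwd (top strand) and rev (bottom strand).

    Returns None when the reverse-primer site is not found downstream of the forward
    primer. Circular templates are searched across the junction by doubling them.
    """
    seq = template * 2 if circular else template
    start = seq.find(fwd)
    if start == -1 or start >= len(template):
        return None
    site = seq.find(revcomp(rev), start + len(fwd))
    if site == -1:
        return None
    length = site + len(rev) - start
    return length if length <= len(template) else None


# Hypothetical toy sequences (random, fixed seed).
rng = random.Random(7)


def rand_seq(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


flank_l, dr, seg, flank_r = rand_seq(300), rand_seq(50), rand_seq(1_000), rand_seq(300)
proviral = flank_l + dr + seg + dr + flank_r   # integrated locus bounded by two DR copies
episome = dr + seg                              # circle after recombination between the DRs

epi_fwd = seg[-120:-100]          # near the segment end, facing the junction
epi_rev = revcomp(seg[100:120])   # near the segment start, facing back across the junction
pro_fwd = flank_l[200:220]        # in flanking wasp DNA
pro_rev = epi_rev                 # shared primer inside the segment

print(amplicon_length(episome, epi_fwd, epi_rev, circular=True))
print(amplicon_length(proviral, epi_fwd, epi_rev))
print(amplicon_length(proviral, pro_fwd, pro_rev))
print(amplicon_length(episome, pro_fwd, pro_rev, circular=True))
```

Because a linear head-to-tail concatemer (episome repeated twice) also contains the segment-end-to-segment-start junction, the outward-facing pair amplifies it as well, which matches the caveat in the Fig 9 legend.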
This study significantly advances our understanding of IV replication by generating a chromosome-level assembly of the H. didymator genome, presenting several lines of evidence on the timing of HdIV locus amplification in calyx cells, and identifying U16 as a gene essential for the amplification of all HdIV loci and for virion formation. Mapping of reads from genomic DNA extracted from H. didymator pupal stages 1 and 3 further shows that all HdIV genome components are amplified concurrently in calyx cells, while our qPCR data indicate that amplification begins between pupal stages 1 and 2 and peaks at stage 4. This temporal pattern is broadly similar to what has previously been shown for BVs [[
A second important advance from this study is our finding that HdIV loci are amplified with non-discrete boundaries that extend variable distances into flanking wasp DNA. This differs from other integrated viruses such as polyomavirus, whose DNA undergoes "onion skin" replication, characterized by replication forks that terminate at discrete boundaries [[
Our study also shows that amplification of proviral segment loci is associated with a significant increase in read depth at the Direct Repeat (DR) positions bordering the proviral segments, which serve as sites for homologous recombination and circularization of the segments. This suggests that part of the rapid increase in read depth is due to reads mapping to amplification intermediates and circularized segments. The presence of concatemeric intermediates and circularized HdIV DNAs in the genomic DNA samples we sequenced is supported by our qPCR results for segment Hd29 (Fig 9B). We could not quantify the proportion of reads mapping to the chromosomal form of HdIV segments because our paired-end short-read data cannot discriminate among chromosomal HdIV DNA, potential replication intermediates, and circularized DNA. Long-read data as generated for some BVs [[
Our interest in U16 stemmed from previous results indicating it is transcriptionally upregulated in calyx cells before the appearance of envelope and capsid components [[
BVs share some features with IVs but also exhibit differences. Notably, in contrast to IVs, where most core genes with functions in virion morphogenesis reside in IVSPERs, many BV core replication genes are widely dispersed in the genomes of wasps [[
BV genomes also contain proviral segment loci whose boundaries are defined by flanking DRs and which are amplified as part of larger regions that extend beyond each DR. However, unlike IV proviral segments, the amplified flanking regions in BVs end at precise nucleotide junctions that mark the boundaries of amplification [[
Collectively, our results identify U16 as a gene derived from the IV ancestor that is required for HdIV DNA replication. This suggests that, besides U16, other viral regulatory factors required for DNA amplification have been preserved in parasitoid genomes, where they may interact with wasp cellular machinery to regulate viral replication. Furthermore, this work emphasizes the value of studying unusual endogenized viruses, such as those found in parasitoids, to uncover new regulators of DNA processing.
H. didymator was reared as previously outlined by [[
DNA from 10 male offspring (i.e., haploid genomes) of a single H. didymator female was sent on dry ice to Dovetail Genomics for Omni-C library construction. To construct the Dovetail Omni-C library, chromatin was fixed in place within the nucleus with formaldehyde and then extracted. The fixed chromatin was digested with DNase I, chromatin ends were repaired and ligated to a biotinylated bridge adapter, and adapter-containing ends were then joined by proximity ligation. After proximity ligation, crosslinks were reversed and the DNA was purified. The purified DNA was treated to remove biotin not internal to ligated fragments. Sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters, and biotin-containing fragments were isolated with streptavidin beads before PCR enrichment of each library. The library was sequenced on an Illumina HiSeq X platform to approximately 30x read depth. Reads with a mapping quality greater than 50 (MQ>50) were then used for scaffolding with HiRise.
The de novo assembly from [[
Comparative analysis of two pupal stages. Genomic DNA (gDNA) was extracted from pooled calyx samples dissected from H. didymator female pupae at stage 1 (~60 females) and stage 3 (~50 females). Because the aim was to compare the two pupal stages, a single replicate was analyzed for each stage.
Impact of U16 knockdown. Genomic DNA was extracted from the calyces of stage 3 female pupae injected with either dsGFP or dsU16. This experiment included three biological replicates, each comprising 30 to 50 calyx samples.
Genomic DNA was extracted using the phenol-chloroform method. Briefly, calyx samples were incubated with proteinase K (Ambion, 0.5 μg/μl) and Sarkosyl detergent (Sigma, 20%), then treated with RNase (Promega, 0.3 μg/μl). Total genomic DNA was then purified by phenol-chloroform extraction and ethanol precipitation. After extraction, gDNA was quantified using a Qubit fluorometer (ThermoFisher) and sent to Genewiz/Azenta for sequencing. Paired-end sequencing (2x150 bp) was performed on an Illumina NovaSeq platform.
Illumina reads were aligned to the updated version of the H. didymator genome using bwa mem [[
Amplification peaks were identified using MACS2 [[
Read depths were computed for various types of virus-related genomic regions, including each locus (IVSPERs, IV replication genes outside IVSPERs, proviral segments), and each MACS2-detected amplified region. To calculate read depth for non-viral regions of the wasp genome, one hundred sets of random regions mimicking the size distribution of HdIV loci were generated using the shuffle tool from bedtools version 2.27 [[
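The size-matched random-region null model described above can be illustrated with a minimal Python sketch. It is not the bedtools shuffle implementation used in the study. It assumes 0-based, half-open intervals and per-base depth stored as Python lists. Each random region keeps the length of one HdIV locus, chromosomes are picked with probability proportional to their length (an assumption of this sketch), and random regions may not overlap HdIV loci. The function names are hypothetical.

```python
import random
from statistics import mean

# Hypothetical sketch of a size-matched random-region null model; it illustrates
# the idea only and is not the bedtools shuffle algorithm itself.
# Intervals are (chrom, start, end), 0-based and half-open.

def overlaps(c, s, e, intervals):
    """True if [s, e) on chromosome c overlaps any interval in the list."""
    return any(c == xc and s < xe and xs < e for xc, xs, xe in intervals)

def shuffle_regions(loci, chrom_sizes, exclude=(), seed=None, max_tries=1000):
    """Return one random region per locus, preserving each locus length.

    Assumption: chromosomes are picked with probability proportional to length,
    and random regions must not overlap any interval in `exclude`.
    """
    rng = random.Random(seed)
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]
    regions = []
    for _, start, end in loci:
        length = end - start
        for _ in range(max_tries):
            c = rng.choices(chroms, weights=weights)[0]
            if chrom_sizes[c] < length:
                continue  # region does not fit on this chromosome
            s = rng.randrange(chrom_sizes[c] - length + 1)
            if not overlaps(c, s, s + length, exclude):
                regions.append((c, s, s + length))
                break
        else:
            raise RuntimeError(f"could not place a region of length {length}")
    return regions

def mean_depth(region, depth):
    """Mean per-base depth over a region; depth maps chrom -> list of depths."""
    c, s, e = region
    return mean(depth[c][s:e])

def null_depths(loci, chrom_sizes, depth, n_sets=100, seed=0):
    """Mean depth of each of n_sets random, size-matched region sets."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sets):
        regions = shuffle_regions(loci, chrom_sizes, exclude=loci,
                                  seed=rng.randrange(2**32))
        results.append(mean(mean_depth(r, depth) for r in regions))
    return results
```

Comparing the observed depth of viral loci with the distribution returned by `null_depths` gives a rough sense of how far they exceed background wasp DNA. The default of 100 sets mirrors the number of random sets described above.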
Genome-wide read depth was computed in 10 bp bins using the bamCoverage tool from the deepTools package [[
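The idea of summarizing depth in 10 bp bins can be sketched in Python as follows. This is not deepTools code. bamCoverage has its own counting, read-extension and normalization options, so its values need not match this sketch exactly. The sketch assumes reads are given as 0-based, half-open (start, end) intervals on a single scaffold, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical sketch: per-base depth from aligned read intervals, then averaged
# in fixed-size bins. deepTools bamCoverage has its own counting, read-extension
# and normalization options, so its values need not match this exactly.

def per_base_depth(reads, length):
    """Per-base depth from (start, end) read intervals, 0-based half-open.

    Uses a difference array: +1 where a read starts, -1 where it ends.
    """
    diff = [0] * (length + 1)
    for start, end in reads:
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth

def bin_depth(depth, bin_size=10):
    """Mean depth in consecutive bins of bin_size bp (the last bin may be shorter)."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [mean(depth[i:i + bin_size]) for i in range(0, len(depth), bin_size)]
```

For example, two reads covering bases 0 to 14 and 5 to 24 of a 25 bp region give 10 bp bin means of 1.5, 1.5 and 1.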
The MEME suite [[
To assess the level of DNA amplification, total genomic DNA (gDNA) was extracted using the DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer's protocol. Ovaries (ovarioles removed) and hind legs, the latter serving as a negative control, were dissected from ten pupae at four different stages. Three replicates were generated for each pupal stage. Amplification of target genes was quantified by quantitative PCR using LightCycler 480 SYBR Green I Master Mix (Roche) in 384-well plates (Roche). The total reaction volume per well was 3 μl, comprising 1.75 μl of reaction mix (1.49 μl SYBR Green I Master Mix, 0.1 μl nuclease-free water, and 0.16 μl diluted primer) and 1.25 μl of gDNA sample diluted to 1.2 ng/μl. Primers used are listed in S4 Table. Levels of gDNA corresponding to the viral genes and to the housekeeping wasp gene elongation factor 1 (ELF-1) were determined using the LightCycler 480 System (Roche). The cycling conditions were 95°C for 10 min, followed by 45 cycles of 95°C for 10 s, 58°C for 10 s, and 72°C for 10 s. Each sample was evaluated in triplicate. DNA levels were normalized to those of the wasp gene ELF-1. Raw data are provided in S2 Dataset.
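As a quick check on the qPCR set-up above, the listed component volumes sum to the stated 3 μl per well, and each well receives 1.5 ng of gDNA:

```latex
\begin{aligned}
V_{\text{mix}}  &= 1.49 + 0.10 + 0.16 = 1.75~\mu\text{l} \\
V_{\text{well}} &= V_{\text{mix}} + V_{\text{gDNA}} = 1.75 + 1.25 = 3.00~\mu\text{l} \\
m_{\text{gDNA}} &= 1.25~\mu\text{l} \times 1.2~\text{ng}/\mu\text{l} = 1.5~\text{ng per well}
\end{aligned}
```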
Total RNA was extracted from ovaries (ovarioles removed) dissected from pupae at different stages using the Qiagen RNeasy kit according to the manufacturer's protocol. To verify gene silencing, total RNA was also extracted from individual adult wasp abdomens (2 to 4 days old) using Trizol reagent (Ambion) followed by the NucleoSpin RNA kit (Macherey-Nagel). Isolated RNA was then treated with the TURBO DNA-free Kit (Life Technologies) to ensure removal of any residual genomic DNA.
Conserved domains of U16 were identified using the CD-search tool available through NCBI's conserved domain database resource [[
Gene-specific double-stranded RNA (dsRNA) used for RNAi experiments was prepared using the T7 RiboMAX Express RNAi System (Promega). Initially, a 350–450 bp fragment corresponding to the U16 sequence was cloned into the double T7 vector L4440 (a gift from Andrew Fire, Addgene plasmid # 1654). Subsequently, an in vitro transcription template DNA was PCR amplified with a T7 primer, and this template was used to synthesize sense and antisense RNA strands with T7 RNA polymerase at 37°C for 5 hours. The primers used for dsRNA production are listed in S4 Table. After annealing and DNase treatment using the TURBO DNA-free Kit (Life Technologies), the purified dsRNAs were resuspended in nuclease-free water, quantified using a NanoDrop ND-1000 Spectrophotometer (Thermo Scientific), and examined by agarose gel electrophoresis to ensure their integrity. Injections were performed in less than one-day-old female pupae using a microinjector (Fentojet Express, Eppendorf) and a micromanipulator (Narishige). Approximately 0.3–0.6 μl of 500 ng/μl dsRNA was injected into each individual. Control wasps were injected with a non-specific dsRNA homologous to the green fluorescent protein (GFP) gene. Treated pupae were kept in an incubator until adult emergence, which occurred approximately 5 days after injection.
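The injected dose follows directly from the volume and concentration given above, so each pupa received roughly 150 to 300 ng of dsRNA:

```latex
\begin{aligned}
m_{\min} &= 0.3~\mu\text{l} \times 500~\text{ng}/\mu\text{l} = 150~\text{ng} \\
m_{\max} &= 0.6~\mu\text{l} \times 500~\text{ng}/\mu\text{l} = 300~\text{ng}
\end{aligned}
```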
Ovaries were dissected from adult wasps between 2 and 3 days after emergence, following the procedures outlined in [[
For RT-qPCR assays, 400 ng of total RNA was reverse-transcribed using the SuperScript III Reverse Transcriptase kit (Life Technologies) and oligo(dT)15 primer (Promega). Transcript levels of selected IVSPER genes were measured by RT-qPCR on a LightCycler 480 System (Roche) using SYBR Green I Master Mix (Roche). Expression levels were normalized to the housekeeping wasp gene elongation factor 1 (ELF-1). Each sample was analyzed in triplicate in a total reaction volume of 3 μl per well, containing 0.5 μM of each primer and cDNA corresponding to 0.88 ng of total RNA. The amplification program consisted of an initial step at 95°C for 10 min, followed by 45 cycles of 95°C for 10 s, 58°C for 10 s, and 72°C for 10 s. The primers used for this analysis are listed in S4 Table.
Data were acquired using LightCycler 480 software. The PCR amplification efficiency (E) of each primer pair was determined by linear regression on a dilution series (5x) of the cDNA pool. Relative expression, with ELF-1 as the reference gene, was calculated with the Advanced Relative Quantification (efficiency method) module of the LightCycler 480 software. For statistical analyses, Levene's and Shapiro-Wilk tests were used to verify homogeneity of variance and normality of the data among the tested groups. Differences in relative gene expression between developmental stages and between dsGFP- and dsU16-injected females were assessed using a two-tailed unpaired t-test. When homogeneity of variance could not be assumed, Welch's t-test was used instead. A p-value < 0.05 was considered significant. All statistical analyses were conducted using R [[
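The quantification and comparison steps described above can be sketched in Python. This is not the LightCycler 480 software or the authors' R code. It assumes the standard-curve relation E = 10^(-1/slope) for the amplification factor (E = 2 meaning 100% efficiency) and an efficiency-corrected ratio of target to reference (ELF-1) levels. It also computes Welch's t statistic and Welch-Satterthwaite degrees of freedom for the unequal-variance case. All function names are hypothetical.

```python
import math
from statistics import mean, variance

# Hypothetical sketch of efficiency-corrected qPCR quantification and Welch's
# t statistic; not the LightCycler 480 software or the authors' R code.

def amplification_factor(log10_input, ct):
    """Estimate the amplification factor E from a dilution series.

    Fits Ct = slope * log10(input) + intercept by ordinary least squares and
    returns E = 10 ** (-1 / slope); E = 2.0 corresponds to 100% efficiency.
    """
    mx, my = mean(log10_input), mean(ct)
    sxx = sum((x - mx) ** 2 for x in log10_input)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, ct))
    slope = sxy / sxx
    return 10 ** (-1 / slope)

def relative_level(ct_target, ct_ref, e_target, e_ref):
    """Efficiency-corrected target level normalized to the reference gene (e.g. ELF-1)."""
    return e_ref ** ct_ref / e_target ** ct_target

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Assuming a five-fold serial dilution with perfect doubling each cycle, Ct rises by log2(5), about 2.32 cycles, per dilution step. The fitted slope is then about -3.32 and `amplification_factor` returns 2. For p-values, R's `t.test(x, y, var.equal = FALSE)` performs Welch's test (`var.equal = TRUE` gives the classical Student test). In Python, `scipy.stats.ttest_ind(a, b, equal_var=False)`, `scipy.stats.levene` and `scipy.stats.shapiro` provide the corresponding tests.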
S1 Dataset
Hyposoter didymator Hi-C genome assembly.
The dataset includes: Fig A. Figure depicting the Hi-C scaffold contact map; Table A. Table presenting the Hi-C scaffolds containing HdIV loci; Fig B. Figure displaying the pairwise comparisons of HdIV segments located in close proximity within the H. didymator scaffolds.
(DOCX)
S2 Dataset
Raw data and statistical analyses of qPCR analyses.
The dataset includes raw data and statistical analyses for: Genomic DNA amplification of IVSPER genes at four different H. didymator pupal stages, Table A. Raw data and Table B. Statistical analyses; Genomic DNA amplification of IVSPER and HdIV segment genes in dsGFP and dsU16-injected wasps, Table C. Raw data and Table D. Statistical analyses; RNA quantification of IVSPER genes in dsGFP and dsU16-injected wasps, Table E. Raw data and Table F. Statistical analyses; DNA amplification of Hd29 segment in dsGFP and dsU16-injected wasps, Table G. Raw data and Table H. Statistical analyses.
(DOCX)
S3 Dataset
Sequence analysis and alignment of the U16 gene from H. didymator to four other wasp species that harbor IVs.
The dataset includes: Table A. Raw sequences of U16 proteins. Table B. Position of the PriCT domain in U16 proteins. Fig A. Detail of the predicted secondary structure of the PriCT-2 domain in the H. didymator U16 protein. Fig B. Subcellular localization of U16 predicted by DeepLoc 2.0.
(DOCX)
S1 Table
Read depth of HdIV loci on each scaffold of the H. didymator genome.
(DOCX)
S2 Table
List of the peaks predicted in H. didymator genome scaffolds using MACS2 algorithm.
(DOCX)
S3 Table
Read depth of HdIV amplified regions in calyx cell DNA from dsGFP- and dsU16-injected female pupae.
(DOCX)
S4 Table
List of primers used in the present work.
(DOCX)
S1 Fig
DNA amplification patterns of HdIV loci in calyx cells of H. didymator.
(DOCX)
S2 Fig
HdIV amplified regions in Scaffold-11.
(DOCX)
S3 Fig
MEME analysis of boundaries of the predicted MACS2 HdIV amplified regions.
(DOCX)
The insects used in the experiments were provided by Raphaël BOUSQUET and Gaétan CLABOTS from the DGIMI insect rearing facility. The authors also thank Dr Jean-Claude OGIER for his assistance in generating the phylogenetic tree of U16 proteins. All RNAi experiments were conducted in the insect quarantine platform (PIQ) of DGIMI lab, which is a member of the Montpellier Vectopole Sud network (https://
By Ange Lorenzi; Fabrice Legeai; Véronique Jouan; Pierre-Alain Girard; Michael R. Strand; Marc Ravallec; Magali Eychenne; Anthony Bretaudeau; Stéphanie Robin; Jeanne Rochefort; Mathilde Villegas; Gaelen R. Burke; Rita Rebollo; Nicolas Nègre and Anne-Nathalie Volkoff