SARS-CoV-2 genome annotation revealed the presence of 10 open reading frames (ORFs), of which the last one (ORF10) is positioned downstream of the N gene. It is a hypothetical gene, which was speculated to encode a 38 aa protein. This hypothetical protein does not share sequence similarity with any other known protein and cannot be associated with a function. While roles for ORF10 have been proposed, there is growing evidence that ORF10 is not a coding region. Here, we identified SARS-CoV-2 variants in which the ORF10 gene was prematurely terminated. The disease was not attenuated, and the transmissibility between humans was maintained. Also, in vitro, the strains replicated similarly to the related viruses with intact ORF10. Altogether, based on clinical observation and laboratory analyses, it appears that the ORF10 protein is not essential in humans. This observation further supports the view that ORF10 should not be treated as a protein-coding gene, and the genome annotations should be amended.
Author summary: Coronaviral genomes code for several proteins, with the large 1a/1ab being expressed directly from genomic (g)RNA. For the expression of other viral proteins, a set of subgenomic mRNAs is produced during replication. It includes mRNAs for structural (S-E-M-N) and accessory proteins. While the function of structural proteins is well described, the function of the latter is under debate. Some of them are required for replication, while others are dispensable in vitro but essential in vivo. Initially, 10 open reading frames (ORFs) were annotated in the SARS-CoV-2 genome, amongst which ORF10 is the most peculiar, as it does not share sequence homology with any known protein. Shortly after the genomic sequences became available, speculation began about this protein's role in pathogenesis and in the evasion of innate immunity. Here, we identified two patients infected with SARS-CoV-2 variants with the ORF10 gene prematurely terminated. The disease was not attenuated, and the transmissibility was maintained. The in vitro study showed that ORF10 is also not essential for replication. Consequently, ORF10 should not be treated as a protein-coding gene, and the genome annotations should be amended.
Coronaviruses are mammalian and avian RNA viruses, with large genomes of ~30,000 bases, which encode several proteins required for the virus replication, modulating the immune responses, and forming the scaffold of progeny virions [[
Here, we identified two patients infected with the SARS-CoV-2 virus, in which the ORF10 gene was prematurely terminated with a stop codon. The disease was not attenuated, and the transmissibility was maintained. Isolation of these viruses in cell culture showed that also in vitro, these strains replicated similarly to the related viruses with the intact ORF10. Altogether, based on clinical observation and laboratory analyses, it appears that the ORF10 protein is not essential for replication in humans.
The first SARS-CoV-2 infected patient was identified in Poland on 4
Table 1. The frequency of prematurely terminated sequences per ORF.
| ORF | Number of prematurely terminated sequences | Gene length (nt) | Coefficient of occurrence |
|---|---|---|---|
| S | 28 | 3822 | 0.052 |
| ORF3a | 37 | 828 | 0.317 |
| E | 4 | 228 | 0.125 |
| M | 4 | 669 | 0.042 |
| ORF6 | 51 | 186 | 1.948 |
| ORF7a | 102 | 366 | 1.98 |
| ORF7b | 30 | 132 | 1.615 |
| ORF8 | 187 | 366 | 3.63 |
| N | 6 | 1260 | 0.034 |
| ORF10 | 57 | 117 | 3.461 |
As we already knew that both original samples carried this mutation, we analyzed the accessible clinical data. A 58-year-old Polish man living in Warsaw, Poland, spent a few days in Germany at the end of February 2020. After returning to Poland, he was informed that he had been in contact with a person infected with the SARS-CoV-2 virus. Despite the lack of apparent symptoms, he contacted a public health center. On the 4
Based on the collected data, one may safely assume that the virus with the disrupted ORF10 was infectious and pathogenic in humans. The identical change in two patients indicates that it did not result from intra-patient genetic drift and that the virus transmissibility was not affected.
To further characterize the phenotype of isolates, available clinical samples were overlaid on the fully confluent Vero E6 cells. Simultaneously, parallel cultures were inoculated with closely related PL_P31 and PL_P38 isolates (see Fig 1).
Fig 1. Phylogenetic analysis of the isolates included in the study. The analysis was carried out using the Nextstrain server based on GISAID data [[
In all four cases, 72 h post-inoculation, we observed the appearance of a characteristic cytopathic effect (CPE). The media samples were collected daily, and total RNA was isolated. RT-qPCR was carried out, and the virus yields are presented in Fig 2. No difference in replication dynamics was observed between the strains carrying the nonsense mutation in ORF10 and the strains with intact ORF10. The genomes of all the strains were re-sequenced after the passage, and in all cases, the sequences were identical to the ones observed for the clinical isolates.
In conclusion, the results obtained from cell culture, sequencing, and clinical data show that a stop codon two-thirds of the way through the putative protein did not affect virus fitness. This observation further supports the thesis that ORF10 should not be treated as a protein-coding gene, and the genome annotations should be altered [[
Fig 2. Replication kinetics of the SARS-CoV-2 strains. Virus yield was determined with RT-qPCR, and the data are presented as mean ± SD. The EVAg strain was used as a reference.
Vero E6 (Cercopithecus aethiops; kidney epithelial; CRL-1586) were cultured in Dulbecco's MEM (Thermo Fisher Scientific, Poland) supplemented with 3% fetal bovine serum (heat-inactivated; Thermo Fisher Scientific, Poland) and antibiotics: penicillin (100 U/ml), streptomycin (100 μg/ml), and ciprofloxacin (5 μg/ml). Cells were maintained at 37°C under 5% CO
The strains with the nonsense mutation in the ORF10 gene were designated names PL_P32 and PL_P33 [GISAID [[
All SARS-CoV-2 stocks were generated by infecting monolayers of Vero E6 cells. The virus-containing liquid was collected at day 3 post-infection (p.i.), aliquoted, and stored at −80°C. Control samples from mock-infected cells were prepared in the same manner. Virus yield was assessed by titration on fully confluent Vero E6 cells in 96-well plates, according to the method of Reed and Muench. Plates were incubated at 37°C for three days, and the cytopathic effect (CPE) was scored by observation under an inverted microscope.
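The Reed and Muench endpoint calculation mentioned above can be sketched as follows. This is a minimal illustration only: the well counts, the number of wells per dilution, and the function name are hypothetical and are not taken from the study.

```python
def reed_muench_log10_tcid50(log10_dilutions, infected, wells_per_dilution):
    """Estimate the log10 50% endpoint dilution by the Reed and Muench method.

    log10_dilutions: positive exponents ordered from least to most dilute
                     (e.g. 3 stands for a 10^-3 dilution).
    infected:        number of CPE-positive wells at each dilution.
    """
    uninfected = [wells_per_dilution - k for k in infected]
    # Infected wells accumulate from the most dilute sample upward,
    # uninfected wells accumulate from the least dilute sample downward.
    cum_infected = [sum(infected[i:]) for i in range(len(infected))]
    cum_uninfected = [sum(uninfected[: i + 1]) for i in range(len(infected))]
    fraction = [ci / (ci + cu) for ci, cu in zip(cum_infected, cum_uninfected)]

    for i in range(len(fraction) - 1):
        if fraction[i] >= 0.5 > fraction[i + 1]:
            # Proportionate distance between the two bracketing dilutions.
            distance = (fraction[i] - 0.5) / (fraction[i] - fraction[i + 1])
            step = log10_dilutions[i + 1] - log10_dilutions[i]
            return log10_dilutions[i] + distance * step
    raise ValueError("the 50% endpoint is not bracketed by the tested dilutions")


# Hypothetical plate: 8 wells per dilution, ten-fold dilutions 10^-3 to 10^-7.
example_log10_titer = reed_muench_log10_tcid50(
    log10_dilutions=[3, 4, 5, 6, 7],
    infected=[8, 8, 6, 2, 0],
    wells_per_dilution=8,
)
```

The returned value is the log10 of the TCID50 per inoculum volume; converting it to TCID50/ml requires the inoculum volume per well, which is not stated here.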
Total RNA was isolated from throat swab collections, stored as frozen PBS suspensions at −20°C, using manual TRI Reagent–chloroform extraction and sodium acetate–ethanol precipitation (Sigma-Aldrich, Poznań, Poland). The presence of SARS-CoV-2 material in the collected samples was tested using the GeneFinder real-time COVID-19 plus kit (OSANG Healthcare, Korea). Isolated total RNA was treated with DNase I to remove DNA contamination and reverse transcribed with SuperScript IV and random hexamer primers; second-strand synthesis was then completed using DNA polymerase I (all reagents from Thermo Fisher, Warszawa, Poland). Illumina sequencing libraries were prepared using Nextera Flex Enrichment Library with the Respiratory Virus Oligo Panel capture workflow according to the manufacturer's instructions (Illumina–Analitik, Warszawa, Poland). Two libraries of 12 samples barcoded with individual i7 and i5 adapters were sequenced in each run. Sequencing was performed using MiSeq v.3 2x75 chemistry (Illumina). Raw sequencing files were demultiplexed using the IlluminaBasecallsToFastq procedure from the Picard package and mapped to the NC_045512.2 SARS-CoV-2 reference sequence with the BwaAndMarkDuplicatesPipelineSpark procedure from the GATK v.4.1.5.0 package (Broad Institute, Boston, MA). Individual sample files were manually inspected using the Integrative Genomics Viewer (Broad Institute). Only 2 of the 72 sequenced samples had an identical C>T transition at position NC_045512.2:29642, within the putative orf10 at the 3' end of the virus genome. The read quality value for the T base was QV = 38, and the numbers of reads were 265 and 340 for samples PL_P32 and PL_P33, respectively. This transition would change putative codon 29 from glutamine (CAA, id-gu280_gp11.2) to a stop codon (TAA). No other sequence variants were detected in the orf10 region. Sequence alignments of samples PL_P32 and PL_P33 are in S1 and S2 Files, respectively.
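The coordinate arithmetic behind the codon assignment can be checked with a short sketch. It assumes that the putative orf10 starts at position 29558 (1-based) in the reference annotation; this start coordinate is not given in the text and should be verified against the annotation in use. The function names are illustrative.

```python
# Assumption: 1-based start of the putative orf10 in the reference annotation.
ORF10_START = 29558
STOP_CODONS = {"TAA", "TAG", "TGA"}


def locate_in_orf(genome_pos, orf_start):
    """Return (codon number, position within codon), both 1-based."""
    offset = genome_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1, offset % 3 + 1


def apply_substitution(codon, pos_in_codon, alt_base):
    """Replace the base at a 1-based position within a codon."""
    return codon[: pos_in_codon - 1] + alt_base + codon[pos_in_codon:]


# C>T transition at genome position 29642, reference codon CAA (glutamine).
codon_number, pos_in_codon = locate_in_orf(29642, ORF10_START)
mutant_codon = apply_substitution("CAA", pos_in_codon, "T")
is_nonsense = mutant_codon in STOP_CODONS
```

Under the assumed start coordinate, position 29642 falls on the first base of codon 29, which agrees with the CAA to TAA change described above.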
A viral DNA/RNA kit (A&A Biotechnology, Poland) was used for nucleic acid isolation from cell culture supernatants, according to the manufacturer's instructions. cDNA samples were prepared with a high-capacity cDNA reverse transcription kit (Thermo Fisher Scientific, Poland), also according to the manufacturer's instructions.
Viral RNA was quantified using quantitative PCR (qPCR; CFX96 Touch real-time PCR detection system; Bio-Rad, Poland). cDNA was amplified using 1× qPCR master mix (A&A Biotechnology, Poland) in the presence of the probe (100 nM; FAM/BHQ1, ACT TCC TCA AGG AAC AAC ATT GCC A) and primers (450 nM each; CAC ATT GGC ACC CGC AAT C and GAG GAA CGA GAA GAG GCT TG). The heating scheme was as follows: 2 min at 50°C and 10 min at 92°C, followed by 30 cycles of 15 s at 92°C and 1 min at 60°C. In order to assess the copy number for the N gene, standards were prepared and serially diluted.
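Converting quantification cycle (Cq) values to copy numbers with serially diluted standards is commonly done through a linear standard curve of Cq against log10 copies. The sketch below illustrates that general approach; it is not necessarily the exact procedure used here, and the standard concentrations and Cq values are hypothetical.

```python
import math

# Hypothetical standards: (copies per reaction, measured Cq).
STANDARDS = [(1e2, 31.4), (1e3, 28.1), (1e4, 24.8), (1e5, 21.5), (1e6, 18.2)]


def fit_standard_curve(standards):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    xs = [math.log10(copies) for copies, _ in standards]
    ys = [cq for _, cq in standards]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate copies per reaction."""
    return 10 ** ((cq - intercept) / slope)


slope, intercept = fit_standard_curve(STANDARDS)
sample_copies = copies_from_cq(24.8, slope, intercept)
```

The slope also gives the amplification efficiency as 10^(-1/slope) - 1, which is a useful quality check on the dilution series.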
The relative number (by ORF size) of premature termination mutations was calculated from 42,227 high-quality SARS-CoV-2 sequences (without ambiguous nucleotides) from GISAID. The coefficient of occurrence was calculated as the number of samples with new mutations generating premature stop codons divided by the number of codons in the particular ORF, and was then multiplied by a factor of 100,000/42,227 to express the values on a more readable scale.
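The calculation can be reproduced from the Table 1 counts with a short sketch. It assumes that the gene length in Table 1 is given in nucleotides and that the number of codons is the gene length divided by 3; under this reading, the computed values agree with the tabulated coefficients to rounding.

```python
N_SEQUENCES = 42_227
SCALE = 100_000 / N_SEQUENCES

# (ORF, prematurely terminated sequences, gene length in nt) from Table 1.
TABLE_1 = [
    ("S", 28, 3822),
    ("ORF3a", 37, 828),
    ("E", 4, 228),
    ("M", 4, 669),
    ("ORF6", 51, 186),
    ("ORF7a", 102, 366),
    ("ORF7b", 30, 132),
    ("ORF8", 187, 366),
    ("N", 6, 1260),
    ("ORF10", 57, 117),
]


def coefficient_of_occurrence(n_terminated, gene_length_nt):
    """Terminated sequences per codon, scaled by 100,000 / number of genomes."""
    n_codons = gene_length_nt / 3  # assumption: length is in nucleotides
    return n_terminated / n_codons * SCALE


coefficients = {
    orf: round(coefficient_of_occurrence(n, length), 3)
    for orf, n, length in TABLE_1
}
```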
S1 File. Sequence alignment for sample PL_P32.
(ZIP)
S2 File. Sequence alignment for sample PL_P33.
(ZIP)
The authors thank Illumina Netherlands BV for the consumables, including the Respiratory Virus Oligo Panel, provided free of charge in connection with exploring research and surveillance in response to the SARS-CoV-2 pandemic. We acknowledge the contributions of both the Submitting and the Originating laboratories of the GISAID data used in this study.
By Katarzyna Pancer; Aleksandra Milewska; Katarzyna Owczarek; Agnieszka Dabrowska; Michal Kowalski; Paweł Łabaj; Wojciech Branicki; Marek Sanak and Krzysztof Pyrc