# Genomic location-coordinate of RdRp of SARS-CoV2

I understand that nsp12 is the RdRp protein in the ORF1 of SARS-CoV2 genome. And nsp12 starts from the starting nucleotide base of orf1ab. Could you please tell me the exact genomic location coordinate of RdRp?

Is there any database from which the FASTA sequence (nucleotide and aa) of the RdRp of SARS-CoV2 can be obtained?

Any answer to this question would be highly appreciated.

The following is a standard and quite straightforward way to obtain the information requested that is generally applicable and does not require any programming.

1. Go to the NCBI website or its Genbank page, select Genome from the pull-down menu and type your search term, 'SARS-CoV2' into the search field:

1. On clicking Search, you will be taken to a page with a brief summary of information about the genome. In the Reference genome summary box, click on the RefSeq ID, NC_045512.2.

1. This takes you to the record for the nucleotide sequence of the virus genome. Scroll down (or do a Find in your web browser) to the gene of interest, nsp12. There you will find an entry with the genomic co-ordinates (13442-13468 and 13468-16236; the two ranges share position 13468 because of the programmed -1 ribosomal frameshift) and a cross-reference to the protein, [YP_009725307.1](https://www.ncbi.nlm.nih.gov/protein/1802476815).

1. If you click on that ID you will be taken to the Genbank entry for the protein encoded by nsp12, with the sequence in FASTA format.

You can download a file containing the sequence from the Send to pull-down shown. There is also a link to the graphical interface for viewing the protein in the context of the genome (available from the nucleotide sequence page as well).

A useful database and exploratory tool is the UCSC Genome Browser `wuhCor1` assembly browser instance.

A synonym for nsp12 is Pol. Its position range in the `wuhCor1` assembly is NC_045512v2:13,442-16,236.

There are several ways to extract sequence, but one programmatic way is to get the FASTA from UCSC Goldenpath for the assembly, and extract the characters of interest from the position range for the gene of interest using standard Unix tools:

```
$ wget -qO- ftp://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/chromosomes/NC_045512v2.fa.gz | gunzip -c | tail -n+2 | tr -d '\n' | cut -c 13442-16236 | awk '{print ">nsp12"; print $0}' > nsp12.fa
$ cat nsp12.fa
>nsp12
TCAGCTGATGCACAATCGTTTTTAAACGGGTTTGCGGTGTAAGTGCAGCCCGTCTTACACCGTGCGGCACAGGCACTAGTACTGATGTCGTATACAGGGCTTTTGACATCTACAATGATAAAGTAGCTGGTTTTGCTAAATTCCTAAAAACTAATTGTTGTCGCTTCCAAGAAAAGGACGAAGATGACAATTTAATTGATTCTTACTTTGTAGTTAAGAGACACACTTTCTCTAACTACCAACATGAAGAAACAATTTATAATTTACTTAAGGATTGTCCAGCTGTTGCTAAACATGACTTCTTTAAGTTTAGAATAGACGGTGACATGGTACCACATATATCACGTCAACGTCTTACTAAATACACAATGGCAGACCTCGTCTATGCTTTAAGGCATTTTGATGAAGGTAATTGTGACACATTAAAAGAAATACTTGTCACATACAATTGTTGTGATGATGATTATTTCAATAAAAAGGACTGGTATGATTTTGTAGAAAACCCAGATATATTACGCGTATACGCCAACTTAGGTGAACGTGTACGCCAAGCTTTGTTAAAAACAGTACAATTCTGTGATGCCATGCGAAATGCTGGTATTGTTGGTGTACTGACATTAGATAATCAAGATCTCAATGGTAACTGGTATGATTTCGGTGATTTCATACAAACCACGCCAGGTAGTGGAGTTCCTGTTGTAGATTCTTATTATTCATTGTTAATGCCTATATTAACCTTGACCAGGGCTTTAACTGCAGAGTCACATGTTGACACTGACTTAACAAAGCCTTACATTAAGTGGGATTTGTTAAAATATGACTTCACGGAAGAGAGGTTAAAACTCTTTGACCGTTATTTTAAATATTGGGATCAGACATACCACCCAAATTGTGTTAACTGTTTGGATGACAGATGCATTCTGCATTGTGCAAACTTTAATGTTTTATTCTCTACAGTGTTCCCACCTACAAGTTTTGGACCACTAGTGAGAAAAATATTTGTTGATGGTGTTCCATTTGTAGTTTCAACTGGATACCACTTCAGAGAGCTAGGTGTTGTACATAATCAGGATGTAAACTTACATAGCTCTAGACTTAGTTTTAAGGAATTACTTGTGTATGCTGCTGACCCTGCTATGCACGCTGCTTCTGGTAATCTATTACTAGATAAACGCACTACGTGCTTTTCAGTAGCTGCACTTACTAACAATGTTGCTTTTCAAACTGTCAAACCCGGTAATTTTAACAAAGACTTCTATGACTTTGCTGTGTCTAAGGGTTTCTTTAAGGAAGGAAGTTCTGTTGAATTAAAACACTTCTTCTTTGCTCAGGATGGTAATGCTGCTATCAGCGATTATGACTACTATCGTTATAATCTACCAACAATGTGTGATATCAGACAACTACTATTTGTAGTTGAAGTTGTTGATAAGTACTTTGATTGTTACGATGGTGGCTGTATTAATGCTAACCAAGTCATCGTCAACAACCTAGACAAATCAGCTGGTTTTCCATTTAATAAATGGGGTAAGGCTAGACTTTATTATGATTCAATGAGTTATGAGGATCAAGATGCACTTTTCGCATATACAAAACGTAATGTCATCCCTACTATAACTCAAATGAATCTTAAGTATGCCATTAGTGCAAAGAATAGAGCTCGCACCGTAGCTGGTGTCTCTATCTGTAGTACTATGACCAATAGACAGTTTCATCAAAAATTATTGAAATCAATAGCCGCCACTAGAGGAGCTACTGTAGTAATTGGAACAAGCAAATTCTATGGTGGTTGGCACAACATGTTAAAAACTGTTTATAGTGATGTAGAAAACCCTCACCTTATGGGTTGGGATTATCCTAAATGTGATAGAGCCATGCCTAACATGCTTAGAATTATGGCCTCACTTGTTCTTGCTCGCAAACATACAACGTGTTGTAGCTTGTCACACCGTTTCTATAGATTAGCTAATGAGTGTGCTCAAGTATTGAGTGAAATGGTCATGTGTGGCGGTTCACTATATGTTAAACCAGGTGGAACCTCATCAGGAGATGCCACAACTGCTTATGCTAATAGTGTTTTTAACATTTGTCAAGCTGTCACGGCCAATGTTAATGCACTTTTATCTACTGATGGTAACAAAATTGCCGATAAGTATGTCCGCAATTTACAACACAGACTTTATGAGTGTCTCTATAGAAATAGAGATGTTGACACAGACTTTGTGAATGAGTTTTACGCATATTTGCGTAAACATTTCTCAATGATGATACTCTCTGACGATGCTGTTGTGTGTTTCAATAGCACTTATGCATCTCAAGGTCTAGTGGCTAGCATAAAGAACTTTAAGTCAGTTCTTTATTATCAAAACAATGTTTTTATGTCTGAAGCAAAATGTTGGACTGAGACTGACCTTACTAAAGGACCTCATGAATTTTGCTCTCAACATACAATGCTAGTTAAACAGGGTGATGATTATGTGTACCTTCCTTACCCAGATCCATCAAGAATCCTAGGGGCCGGCTGTTTTGTAGATGATATCGTAAAAACAGATGGTACACTTATGATTGAACGGTTCGTGTCTTTAGCTATAGATGCTTACCCACTTACTAAACATCCTAATCAGGAGTATGCTGATGTCTTTCATTTGTACTTACAATACATAAGAAAGCTACATGATGAGTTAACAGGACACATGTTAGACATGTATTCTGTTATGCTTACTAATGATAACACTTCAAGGTATTGGGAACCTGAGTTTTATGAGGCTATGTACACACCGCATACAGTCTTACAG
```

Another option is to visit the Pol link again and look for the "View DNA for this feature (wuhCor1/SARS-CoV-2)" link, click on it, and click on "get DNA" to get a similar FASTA file. But if you're going to do bioinformatics, knowing how to string together Unix tools is a valuable skill.
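One caveat about the contiguous range: 13442-16236 spans 2,795 nucleotides, which is not a multiple of three. The NCBI record lists the coding region as two ranges (13442-13468 and 13468-16236) because position 13468 is read twice at the ribosomal frameshift, giving 2,796 nucleotides, or 932 codons. If you want a sequence that translates cleanly, the two ranges have to be concatenated. The following is a minimal sketch of that, using the same Unix tools; `extract_join` is a hypothetical helper name, and it assumes the genome FASTA has already been downloaded and decompressed to a local file.

```shell
# Sketch: concatenate one or more 1-based, inclusive ranges from a
# single-record FASTA file. extract_join is a hypothetical helper name.
# Arguments: FASTA path, then ranges written as START-END (cut -c syntax).
extract_join() (
    fasta=$1
    shift
    # Drop the header line and join all sequence lines into one string.
    seq=$(grep -v '^>' "$fasta" | tr -d '\n\r')
    for range in "$@"; do
        # cut adds a trailing newline; strip it so the pieces are contiguous.
        printf '%s\n' "$seq" | cut -c "$range" | tr -d '\n'
    done
    printf '\n'
)

# Example (assumes NC_045512v2.fa exists in the current directory):
# { echo ">nsp12_cds"; extract_join NC_045512v2.fa 13442-13468 13468-16236; } > nsp12_cds.fa
```

The ranges are passed as separate arguments so the shared base at 13468 is emitted once per range, which is what the two-range annotation describes.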

To get the amino acid sequence, visit the Pol link, look for the Swiss-Prot record information, and do a search through UniProt for the record P0DTD1.

Via the "RNA-directed RNA polymerase" entry for P0DTD1, you can click on the "RNA-directed RNA polymerase" post-translation product to get to a record with the amino acid sequence: https://www.uniprot.org/blast/?about=P0DTD1[4393-5324]&key=Chain&id=PRO_0000449629

## RdRp mutations are associated with SARS-CoV-2 genome evolution

COVID-19, caused by the novel SARS-CoV-2 virus, started in China in late 2019, and soon became a global pandemic. With the help of thousands of viral genome sequences that have been accumulating, it has become possible to track the evolution of the viral genome over time as it spread across the world. An important question that still needs to be answered is whether any of the common mutations affect the viral properties, and therefore the disease characteristics. Therefore, we sought to understand the effects of mutations in RNA-dependent RNA polymerase (RdRp), particularly the common 14408C>T mutation, on mutation rate and viral spread. By focusing on mutations in the slowly evolving M or E genes, we aimed to minimize the effects of selective pressure. Our results indicate that 14408C>T mutation increases the mutation rate, while the third-most common RdRp mutation, 15324C>T, has the opposite effect. It is possible that 14408C>T mutation may have contributed to the dominance of its co-mutations in Europe and elsewhere.

## Specific mutations in SARS-CoV2 RNA dependent RNA polymerase and helicase alter protein structure, dynamics and thus function: Effect on viral RNA replication

The open reading frame (ORF) 1ab of SARS-CoV2 encodes non-structural proteins involved in viral RNA functions such as translation and replication, including nsp1-4, 3C-like proteinase, nsp6-10, RNA-dependent RNA polymerase (RdRp), helicase and 3’-5’ exonuclease. Sequence analyses of ORF1ab revealed the emergence of mutations at specific positions, especially in the viral RdRp and helicase, both of which are important in mediating viral RNA replication. Since proteins are dynamic in nature and their functions are governed by molecular motions, we performed normal mode analyses of the SARS-CoV2 wild type and mutant RdRp and helicases to understand the effect of mutations on their structure, conformation, dynamics and thus function. Structural analyses revealed that the mutation in RdRp (at position 4715 of the polyprotein, position 323 of RdRp) leads to rigidification of the structure, and that the mutation in the helicase (at position 5828 of the polyprotein, position 504 of the helicase) leads to destabilization, increasing the flexibility of the protein structure. Such structural modifications and alterations in protein dynamics might alter the unwinding of complex RNA stem loop structures, the affinity/avidity of polymerase-RNA interactions and, in turn, viral RNA replication. Mutation analyses of the proteins of the SARS-CoV2 RNA replication complex would help in targeting RdRp more effectively for therapeutic intervention.

## Results

### SARS-CoV-2 proximity to coronaviruses, host genomes and tissue transcriptomes

Since the end of last year when it first emerged, SARS-CoV-2 has been mutating and spreading around the world. Over 5,000 complete or near-complete SARS-CoV-2 genomes are currently accessible in GenBank, with various mutations. To determine which SARS-CoV-2 sequence was most appropriate to use, we retrieved all the published sequences of the virus available in NCBI’s SARS-CoV-2 data hub (5,064 complete SARS-CoV-2 genomes). After excluding incomplete and low-quality sequences and CDSs with insertions or deletions, we calculated the percent difference in codon usage between these and the reference sequence. The average percent difference in codon usage was 8 codons/10,000, clearly showing that variation in sequences is not significantly affecting overall codon usage. This degree of mutation between strains is corroborated by a recently published study 29, and is encouraging as it suggests that escape mutants are unlikely to develop, even for viral genes that have the highest selection pressure such as the S protein. Furthermore, we examined genetic diversity data from Nextstrain 30, accounting for 4,675 SARS-CoV-2 genomes. From these sequences, there are 293 codon positions (23% of the S gene) with reported Shannon entropies, i.e. with a documented mutation at that position. Among all other genes, there are 2,328 such positions (27% of all non-S genes), indicating a higher percentage of mutated codons outside of the S gene.

A discrepancy between virus and host codon and codon pair usage bias has been observed across a range of viruses 12,13,31,32; therefore, we examined whether this was true for SARS-CoV-2 with respect to its current host and to other coronaviruses. Codon pair data inherently contain the codon usage data and therefore are better suited than codon usage data for this type of comparison. As expected, SARS-CoV-2 codon pair usage closely resembles the codon usage of the Coronaviridae family, while it is quite distinct from the codon pair usage of the human genome (Table 1). Bat (Chiroptera) and pangolin (Pholidota), from which the virus may have been transmitted to humans, as well as dog (Canis lupus familiaris), to which it is feared the virus may be transmitted next, were included in the analysis. We find that these species have a similar codon usage when compared with human; therefore, viral tropism cannot be inferred based on codon usage data alone (Table 1). Since SARS-CoV-2 infects bronchial epithelial cells and type II pneumocytes, and our recent findings show that transcriptomic tissue-specific codon pair usage can vary greatly from genomic codon pair usage 28, we also examined the transcriptomic codon pair usage of the lung and how it compares with the SARS-CoV-2 codon pair usage. Rather surprisingly, the codon pair usage in the lung was more distinct from SARS-CoV-2 codon pair usage than the Homo sapiens genomic codon pair usage. The transcriptomic codon pair usage of kidney and small intestine, tissues that are also susceptible to the infection, are similarly distant from SARS-CoV-2 (Table 1). Recently, it was argued that some degree of dissimilarity in codon usage between the virus and the host may be beneficial to the virus, as it does not severely impede host gene translation 11.

### Codon, codon pair and dinucleotide usage of SARS-CoV-2

To inspect the sequence features of SARS-CoV-2 in more detail, we plotted its codon usage per amino acid and compared it with the human genome and lung transcriptome (Fig. 1). SARS-CoV-2 clearly exhibits a preference in codons ending in T and A (71.7%), which is not observed in the human genome (44.9% ending in T or A) and lung transcriptome (37.6% ending in T or A). Similarly, the kidney and small intestine transcriptome show a preference for codons ending in C and G (62.5% in the kidney and 61.8% in the small intestine, Supplemental Figure 1). The codon pair usage of SARS-CoV-2 was also examined in juxtaposition with the human codon pair usage (Fig. 2A,B). The differences in codon pair usage of the two genomes are highlighted in Fig. 2C.
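As a rough illustration of the kind of third-position statistic quoted above, the sketch below counts what percentage of codons in a single in-frame coding sequence end in A or T. It is an assumption-laden simplification, not the authors' pipeline: `third_base_at_pct` is a hypothetical helper name, the input is taken to be one headerless coding sequence on standard input, and no weighting across genes is attempted, so it would not be expected to reproduce the 71.7% figure exactly.

```shell
# Sketch: percentage of codons ending in A or T for one in-frame CDS.
# third_base_at_pct is a hypothetical helper; input is raw sequence on stdin.
third_base_at_pct() (
    # Join lines, normalise case and U->T, then split into codons of 3 bases.
    tr -d '\n\r' | tr 'acgtuU' 'ACGTTT' | fold -w 3 |
    awk 'length($0) == 3 { n++; if ($0 ~ /[AT]$/) at++ }
         END { if (n) printf "%.1f\n", 100 * at / n; else print "NA" }'
)

# Example, reusing the headerless sequence line of a FASTA file:
# tail -n+2 nsp12_cds.fa | third_base_at_pct
```

A trailing partial codon is ignored, which is why the input needs to start in frame.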

Codon frequencies per 1,000 for SARS-CoV-2 (Red), Homo sapiens Genomic (Black) and Homo sapiens Lung (Yellow). Codons are grouped by the amino acid they encode (alternating light blue columns, Met (M) and Trp (W) represented as single letter).

Heat maps of log transformed codon pair frequencies per 1 M for Homo sapiens Genomic (A), SARS-CoV-2 (B) and the absolute value of difference between the two (C). Codon pairs increase in frequency from dark to light.

Since the mechanism of viral attenuation through codon pair deoptimization is not entirely clear, and it has been argued that it is an indirect result of increased CpG content, we further investigated the dinucleotide and junction dinucleotide profile of the SARS-CoV-2 as it compares with Homo sapiens genome and lung transcriptome (Fig. 3). Clearly, CpG dinucleotides are avoided in the SARS-CoV-2 genome, and to a lesser extent CC and GG dinucleotides are too. This provides an opportunity to increase immunogenicity of a potential attenuated virus vaccine by increasing its CpG content.

Dinucleotide (A) and junction dinucleotide (B) frequencies per 1,000 for SARS-CoV-2 (Red), Homo sapiens Genomic (Black) and Homo sapiens Lung (Yellow).

### RNA folding

The genome sequence determines not only the amino acid sequence, but also the structure of the mRNA. The mRNA structure following the frameshift site is expected to be especially biologically relevant, as pseudoknots following programmed ribosomal frameshifts have been found to regulate the efficiency of the frameshift 33,34 . We therefore sought to study the similarity of the SARS-CoV-2 mRNA structure compared with the structures of different coronavirus mRNAs in the region following the ORF1ab frameshift.

RNA structures were predicted using two distinct secondary structure prediction algorithms, LandscapeFold 36 and NuPack 35,37 . Of the top 10 coronaviruses whose predicted minimum free energy (MFE) structures best aligned to that of SARS-CoV-2, seven matched among the two algorithms, showing a high degree of agreement among the two sets of structure predictions. Those seven consensus best-aligned structures are shown, alongside the novel coronavirus post-frameshift structure, in Fig. 4A–H. The similarity of two of these structures to SARS-CoV-2 can be explained by a high degree of sequence similarity to the SARS-CoV-2 mRNA (a SARS-related coronavirus and a bat coronavirus, shown in Fig. 4B, C). However, the other five—all belonging to avian coronaviruses, which are part of the group of the so-called gammacoronaviruses, causing highly contagious diseases of chickens, turkey and other birds—were not in the top 10 sequences most closely aligned to the SARS-CoV-2 mRNA on the basis of sequence. It should be noted that, of the 5,064 SARS-CoV-2 sequences analyzed, 3,978 had a complete ORF1ab with the exact ‘UUUAAAC’ frameshift sequence in the annotated position. Of these, 3,951 share the same sequence in the 100 nts downstream of the signal, indicating a high degree of conservation in this region.

(A) The predicted minimum free energy (MFE) secondary structure of the novel coronavirus RNA in the 75 nts following the frameshift. All MFE structures displayed are those predicted by LandscapeFold; the results discussed were found to be insensitive to prediction algorithm by comparison to NuPack. (B,C) Known coronaviruses with a high degree of sequence and structure similarity to the novel coronavirus. (D–H) Known coronaviruses with a high degree of structure similarity to the novel coronavirus, but less sequence similarity. See main text for further discussion. (I) In addition to examining the predicted MFE structures, we considered the full free-energy landscapes. The probability of each coronavirus to form a pseudoknot in the 75 nts following the frameshift (orange), and the probability of the first stem to be part of a 3-stem pseudoknot (blue), are histogrammed.

Finally, we used LandscapeFold 36 to study the RNA folding beyond the MFE structures. We find that even those coronaviruses whose MFE structure does not contain a pseudoknot will fold into a pseudoknot in a relatively high fraction of cases, and that most coronaviruses have a relatively high probability of the initial stem following the frameshift folding into part of a 3-stem pseudoknot like the one exhibited by the SARS-CoV-2 MFE structure (Fig. 4I).

### Viral gene codon usage properties

We next sought to examine each viral gene separately in terms of its codon and codon pair usage. Relative synonymous codon usage (RSCU) and codon pair score (CPS) are commonly used metrics to describe the codon and codon pair usage bias, respectively. RSCU expresses the observed over expected synonymous codon usage ratio, while CPS is the natural log of the observed over expected synonymous codon pair ratio using observed individual codon usage 2,38. In our analyses, RSCU and CPS are derived from human genomic codon and codon pair usage frequencies. For ease of comparison, we used ln(RSCU) to measure the codon usage bias. The average CPS across a gene is referred to as the codon pair bias (CPB) of the gene 2. The average ln(RSCU) and CPB of each viral gene were calculated and compared with the average ln(RSCU) and CPB of host genes (Fig. 5). The average RSCU, ln(RSCU) and CPB of each viral gene appear in Table 2. ORF10 was strikingly the least similar gene to the human genome in terms of both its codon and codon pair usage, followed by the E gene. These genes provide little opportunity for deoptimization, since their sequence is already far from optimal. On the other hand, genes S and N are more similar to human in terms of their codon pair usage. To explore further the potential for codon pair deoptimization, we plotted their CPS across their sequence (Supplemental Figure 2 and Fig. 6). As seen in these figures, all viral genes use mostly rare codons (ln(RSCU) < 0); however, it is striking that ORF6 and ORF10 use almost exclusively rare codons, while ORF3a, M and ORF10 have some of the lowest ln(RSCU) values. Regarding codon pair usage, S stands out as the gene that uses frequent codon pairs more often (peaks with relatively high CPS scores), while N, ORF6 and ORF7b are genes that do not use very rare codon pairs (CPS values are only moderately negative).
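For readers who want the metrics in symbols, the verbal definitions above are commonly written as follows. The notation is introduced here for illustration and is not taken from the source: $x_{ij}$ is the count of the $j$-th synonymous codon for amino acid $i$, $n_i$ is the number of synonymous codons for that amino acid, $N_{AB}$ is the count of the codon pair $AB$, $N_A$ and $N_B$ are the counts of the individual codons, $N_X$ and $N_Y$ are the counts of the amino acids they encode, $N_{XY}$ is the count of the amino acid pair, and $k$ is the number of codons in the gene. In the analysis described here, the counts would come from the human genomic reference.

$$
\mathrm{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{m=1}^{n_i} x_{im}},
\qquad
\mathrm{CPS}_{AB} = \ln\!\left(\frac{N_{AB}}{\dfrac{N_A N_B}{N_X N_Y}\,N_{XY}}\right),
\qquad
\mathrm{CPB} = \frac{1}{k-1}\sum_{l=1}^{k-1}\mathrm{CPS}_l
$$

Under these definitions, a codon used exactly as often as its synonyms on average has RSCU = 1 and ln(RSCU) = 0, so negative ln(RSCU) values mark codons that are rare relative to the reference, and negative CPS values mark under-represented codon pairs.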

Scatterplots of RSCU bias [average ln(RSCU)] (A) and codon pair bias (CPB) (B) by CDS length of human and viral genes. Human genes appear as grey dots and viral genes appear with different colored markers.

Seven codon sliding window average of ln(RSCU) (A) and codon pair score (CPS) (B) of structural SARS-CoV-2 genes. Genes are shown in the order they appear in the viral genome, but gaps between open reading frames have been removed. Genes alternate in colors black and blue for clarity, with the gene name in the corresponding color appearing above or below the window. RSCU and CPS are calculated based on Homo sapiens genomic codon and codon pair usage.

## RNA Synthesis

The RNA template, 5′-rUrUrUrUrUrCrArUrArArCrUrUrArArUrCrUrCrArCrArUrArGrCrArCrUrG-3′, and RNA primer, 5′-rCrArGrUrGrCrUrArUrGrUrGrArGrArUrUrArArGrUrUrArU-3′, were prepared by solid-phase synthesis on an ÄKTA oligopilot plus 10 (Cytiva). RNAs were cleaved from the solid support by treating with 4 mL of a 1:1 mixture of 28% wt NH₃/H₂O solution and 33% wt CH₃NH₂/EtOH solution at 55 °C for 30 min; then the silyl protecting groups were removed by treating with 3 mL of a 1:1 mixture of triethylamine trihydrofluoride and dimethyl sulfoxide at 55 °C for 90 min. Then, 30 mL of cold 50 mM NaClO₄ in acetone was added to precipitate the RNA product. After centrifugation, the pellet of RNA was dissolved in 5 mL of water and passed through a Sep-Pak C18 Cartridge, 5 g sorbent (Waters). Eluates containing RNA were combined and lyophilized. Mass spectrometry of the template RNA found m/z ([M − 7H⁺]) = 1,341.8 (theoretical 1,342.1) and m/z ([M − 6H⁺]) = 1,565.6 (theoretical 1,565.9); mass spectrometry of the primer RNA found m/z ([M − 6H⁺]) = 1,278.6 (theoretical 1,278.9) and m/z ([M − 5H⁺]) = 1,534.5 (theoretical 1,534.9).
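As a quick consistency check on the reported values, the two charge states of each RNA should point to the same neutral mass. The relation below is the standard one for an ion formed by loss of $n$ protons; the symbols $M$ (neutral mass) and $m_{\mathrm{H}} \approx 1.007$ Da (proton mass) are introduced here for illustration and are not from the source.

$$
\frac{m}{z} = \frac{M - n\,m_{\mathrm{H}}}{n}
\quad\Longrightarrow\quad
M = n\left(\frac{m}{z} + m_{\mathrm{H}}\right)
$$

$$
\begin{aligned}
\text{template:}\quad & 7\,(1341.8 + 1.007) \approx 9399.6, & 6\,(1565.6 + 1.007) &\approx 9399.6 \\
\text{primer:}\quad & 6\,(1278.6 + 1.007) \approx 7677.6, & 5\,(1534.5 + 1.007) &\approx 7677.5
\end{aligned}
$$

Both pairs agree to within about 0.1 Da, as expected if each pair of peaks comes from a single species.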

### Annealing of Primer:Template RNA Duplexes.

Single-stranded primer and template RNAs (Fig. 1A) were resuspended in deionized and purified H₂O to a final concentration of 200 μM, mixed at an equimolar ratio, and incubated for 5 min in a heat block at 95 °C in Eppendorf tubes with punctured lids. The heat block was then removed from the heating device and allowed to cool to ambient temperature. Annealed dsRNAs were then dispensed as single-use aliquots, flash frozen in liquid nitrogen, and stored at −80 °C.

Reconstitution of SARS-CoV-2 RdRp activity and inhibition by favipiravir-RTP (FTP). (A) Sequence of the annealed primer (Upper):template (Lower) dsRNA duplex employed in biochemical assays. (B) Reconstituted SARS-CoV-2 RdRp extends the 24mer primer in the presence (lanes 4 to 6), but not the absence (lanes 1 to 3), of ribonucleotides (rNTPs). A 31mer product is present in lane 6, due to addition of a nontemplated base to the primer strand. (C) Favipiravir is weakly incorporated into the 24mer primer strand (lanes 7 to 15), and suppresses RNA replication by the SARS-CoV-2 RdRp (lanes 16 to 24). Replication, nucleotide incorporation, and replication inhibition assays are representative results of four, six, and six technical replicates, respectively.

### Assembly of the apo-RdRp Complex.

Purified apo-RdRp, assembled as above, was mixed immediately after purification with annealed dsRNA, resuspended in ddH₂O, to a final protein:RNA concentration of 8:20 μM, and was incubated at room temperature for 5 min prior to addition of favipiravir-RTP to a final concentration of 100 μM. Thereafter, complexes were incubated for an additional 30 min at room temperature, prior to another round of centrifugation at 4 °C, transferred to fresh Eppendorf tubes, and used immediately for the preparation of cryoEM grids.

### Primer Extension, Drug Incorporation, and Replication Inhibition Assays.

All assays (Fig. 1 B and C and SI Appendix, Fig. S1) were performed at room temperature in assembly buffer. Reaction conditions were designed to approximate, as closely as possible, those employed in production of vitrified electron microscopy (EM) grids. For primer extension assays, 6 μM RdRp (prepared as above) was assembled with 6 μM dsRNA, and mixed with rATP/rGTP (at a final concentration of 500 μM). For drug incorporation assays, RdRp:dsRNA complexes were assembled, mixed with favipiravir-RTP (at final concentrations of 10, 100, and 500 μM), and reactions were allowed to proceed. For replication inhibition assays, RdRp:dsRNA complexes were assembled, and preincubated with favipiravir-RTP (at final concentrations of 10, 100, and 500 μM) for a period of 30 min, prior to addition of rATP/rGTP to a final concentration of 500 μM. This is similar to the assay protocol used for remdesivir in ref. 10. Samples were removed at the indicated time points (0, 30, and 90 min); the reactions were stopped with a 1:1 addition of quenching buffer (98% formamide, 10 mM (ethylenedinitrilo)tetraacetic acid), flash frozen, and stored at −21 °C prior to gel analyses.

### Specimen Analysis.

The 20% polyacrylamide, 8 M urea gels (0.75 mm thick, 20 cm long) were run at 15 W in Tris–borate–EDTA buffer for 2 h. The RNA gel was stained using SYBR Gold Nucleic Acid Gel Stain (Invitrogen). Fluorescence imaging was performed using an Amersham Typhoon imager (Cytiva).

### CryoEM and Atomic Model Building.

The complex, prepared as described above, was used at a concentration of 1.3 mg/mL for preparing cryoEM grids. All-gold HexAuFoil grids with a hexagonal array of 280-nm-diameter holes, 700-nm hole-to-hole spacing, and 330 Å foil thickness were made in-house and plasma treated as described in ref. 16. Graphene was grown by chemical vapor deposition, transferred onto all-gold UltrAuFoil R0.6/1 grids (QuantiFoil), and partially hydrogenated following a previously described procedure (17). Grids were rapidly cooled using a manual plunger (18) in a 4 °C cold room. A 3 μL volume of the protein solution was pipetted onto the foil side of the grid and then blotted from the same side with filter paper (Whatman No. 1) for 11 s to 14 s. The grids were immediately plunged into liquid ethane kept at 93 K in a cryostat (19), and were stored in liquid nitrogen until they were imaged in the electron cryomicroscope.

We acquired electron micrographs from four grids: three HexAuFoil grids (total 52,881 multiframe micrographs) and one partially hydrogenated graphene-coated grid (11,096 multiframe micrographs) (Fig. 2A and SI Appendix, Table S1). Other surfaces that were screened during initial specimen preparation included graphene functionalized with amylamine or hexanoic acid and graphene oxide; these either did not improve the orientation distribution or yielded two-dimensional (2D) classes indicating degradation of the complex more severe than that due to the air–water interface alone. Thus, they were not included in the dataset for high-resolution reconstruction. The high density of foil holes on the HexAuFoil grid, combined with fast imaging by aberration-free image shift and a fast, direct-electron detector (Falcon 4, 248 Hz), allowed us to acquire more than 200 micrographs per hour for 14 consecutive days. The number of micrographs that can be acquired from a single HexAuFoil grid is limited only by the ice contamination rate in the vacuum of the electron microscope, rather than by the number of holes on one grid, even with the fastest available data collection setups.

Electron cryomicroscopy of RdRp complexes in the presence of RNA and favipiravir-RTP. (A) Electron cryomicrographs of the reconstituted complexes in unsupported ice (HexAuFoil grid, Upper), and on hydrogenated graphene (Lower) were used for structure determination. (Scale bar, 500 Å.) (B) This 2D class average from the images of the complex, containing RNA, corresponds to the most frequent orientation of the particles in the thin film of vitreous water. (C) The orientation distribution, with efficiency, E_od (23), of the particles used in the reconstruction is plotted on a Mollweide projection, with the most common views on each type of grid marked with gray lines. (D) The Fourier shell correlation (FSC) between the two independent masked half maps, and between the final map and the atomic model, is plotted versus resolution. (E) An overview of the EM map of the polymerase complex is colored by subunit: green, nsp12; blue, nsp7; pink and purple, two copies of nsp8; yellow, template:primer RNA.

All data were processed in RELION 3.1 (20), using particle picks imported from crYOLO 1.5 (21) (SI Appendix, Fig. S2) with manual retraining of the model. After four rounds of 2D classification, it was clear that the specimen adopted a preferred orientation (Fig. 2B), as previously reported (9). The initial model for 3D refinement was obtained by low-pass filtering a published map of the complex with remdesivir (Electron Microscopy Data Bank accession code EMD-30210) (10) to 20 Å. The uniformity of the particle orientation distribution and the isotropy of the final 3D reconstruction were improved by combining data from HexAuFoil and partially hydrogenated graphene-coated UltrAuFoil EM grids (Fig. 2C). After three rounds of 3D classification, we obtained a 2.5 Å resolution isotropic map of the full complex from 40,828 particles (Fig. 2 D and E). A different set of 140,639 particles produced a 2.5 Å resolution map of the nsp12 subunit alone. The remaining 99% of the particles were discarded, as these corresponded to either the overrepresented view of the nsp12 subunit alone or to structurally heterogeneous complexes that did not align to high resolution.

Optical aberrations were refined per grid, the astigmatism was refined per micrograph, and the defocus was refined per particle. Particle movement during irradiation was tracked separately for the graphene and the HexAuFoil datasets, using Bayesian polishing (22). The dataset on graphene shows typical beam-induced motion at the onset of irradiation, whereas this movement is eliminated by the use of the HexAuFoil grids, which provided superior data quality, especially at the onset of irradiation, when the specimen is least damaged. Of the particles contributing to the final 3D refinement, 80% originated from the HexAuFoil grids, and 20% from the partially hydrogenated graphene grid. This was necessary to improve the particle orientation distribution: the efficiency, E_od, increased from 0.54 in unsupported ice and 0.57 on partially hydrogenated graphene to 0.68 using a combination of particles from both support surfaces (Fig. 2C) (23). External reconstruction in SIDESPLITTER was performed (24). The atomic model was built based on a previously published atomic model of the nsp12–nsp7–nsp8 complex bound to RNA and remdesivir-RTP (Protein Data Bank [PDB] ID code 7BV2) (10), with manual model building in Coot (25, 26), and real-space refinement in Phenix (27).

## Structure-Based SARS-CoV-2 Inhibitors

Currently, some small-molecule compounds have been developed which showed inhibitory effects on the SARS-CoV-2 infection, as described below.

### Remdesivir

Remdesivir is an adenosine analogue and a potent inhibitor of RdRp. It potently inhibits the replication of SARS-CoV-2 in vitro and shows broad-spectrum antiviral effects against RNA virus infection in cultured cells, nonhuman primate models, and mice. As an adenosine analogue, remdesivir acts after virus entry: it is incorporated into nascent viral RNA and terminates replication before the RNA is complete (Gurwitz, 2020). Remdesivir is a prodrug; in target cells it is converted into the active triphosphate form (RTP) (Siegel et al., 2017). Like other nucleotide analog prodrugs, remdesivir inhibits RdRp activity by becoming covalently attached to the primer strand and terminating the RNA chain. Upon addition of ATP, the nsp12-nsp7-nsp8 complex exhibits RNA polymerase activity. However, when the active triphosphate form of remdesivir (RTP) is added, RNA polymerization is significantly inhibited.

The apo RdRp structure is composed of nsp12, nsp7, and nsp8. The template-RTP RdRp complex additionally contains a 14-base RNA in the template strand and an 11-base RNA in the primer strand. Of note, remdesivir is in the monophosphate form (RMP) in the complex. The complex contains RMP covalently linked to the primer strand, three magnesium ions, and a pyrophosphate. The three magnesium ions are located near the active site and promote catalysis. The RMP sits at the center of the catalytic active site, which is composed of seven motifs. There are base-stacking interactions between RMP and the upstream base of the primer strand. Hydrogen bonds also exist between RMP and the uridine base of the template strand. There are also interactions between RMP and the side chains of K545 and R555. Twenty-nine residues from nsp12 participate directly in RNA binding. No residue from nsp7 or nsp8 mediates the RNA interactions (Yin et al., 2020).

Similar to remdesivir, favipiravir is also an inhibitor of the RdRp. The structure of favipiravir resembles that of endogenous guanine. Clinical trials demonstrated that favipiravir, the first anti-SARS-CoV-2 compound to enter clinical trials in China, had few side effects (Furuta et al., 2017; Tu et al., 2020).

A mechanism-based inhibitor, N3, identified by computer-aided drug design, fits inside the substrate-binding pocket of the main protease (Mpro) and is a potent irreversible inhibitor of it. Two Mpro-N3 complexes associate to form a dimer (the two complexes are named protomer A and protomer B). Each protomer contains three domains, designated domains I−III. Domains I and II each have an antiparallel β-barrel structure. Domain III contains five α-helices arranged into a largely antiparallel globular cluster and is connected to domain II by a long loop. The substrate-binding site lies in the cleft between domain I and domain II. The backbone atoms of N3 form an antiparallel sheet with residues 189� of the loop that connects domain II and domain III on one side, and with residues 164� of the long strand (residues 155�) on the other (Jin et al., 2020a) (Figure 5A).

Figure 5. Crystal structures of the main protease (Mpro) in complex with its inhibitors. (A) The crystal structure of the N3-main protease complex. The main protease is colored bright orange and N3 is green (PDB: 6LU7). (B) The crystal structure of the 11a-main protease complex. The main protease is bright orange and 11a is blue (PDB: 6LZE). (C) The crystal structure of the 11b-main protease complex. The main protease is bright orange and 11b is red (PDB: 6M0K). (D) The crystal structure of the carmofur-main protease complex. The main protease is bright orange and carmofur is cyan (PDB: 7BUY).

### 11a and 11b

Two compounds, 11a and 11b, which target Mpro, exhibit excellent inhibitory effects on SARS-CoV-2 infection in vitro. At 1 µM, the inhibitory activity of 11a and 11b is 100% and 96%, respectively. In vivo, 11a and 11b exhibit good pharmacokinetic (PK) properties, and 11a also showed low toxicity. The -CHO group of 11a and 11b bonds covalently to cysteine 145 of Mpro. Different parts of 11a (designated P1’, P1, P2, and P3) fit into different parts of the substrate-binding site. The (S)-γ-lactam ring of 11a at P1 inserts into the S1 site. The cyclohexyl moiety at P2 fits into the S2 site. At P3, the indole group is exposed to the solvent at the S4 site. The oxygen atom of -CHO forms a hydrogen bond with cysteine 145 in the S1’ site. In addition, several water molecules (designated W1−W6) are critical for binding 11a. The SARS-CoV-2 Mpro-11b complex is similar to the Mpro-11a complex, and 11a and 11b exhibit a similar inhibitor binding mode (Dai W. et al., 2020) (Figures 5B, C).

### Camostat Mesylate

TMPRSS2 and TMPRSS4 are two mucosa-specific serine proteases that enhance the fusogenic activity of the SARS-CoV-2 spike protein and facilitate viral entry into host cells (Zang et al., 2020). SARS-CoV-2 employs cellular TMPRSS2 to prime the spike protein. TMPRSS2 activity is critical for the spread of SARS-CoV-2 and for pathogenesis in the infected host, which makes TMPRSS2 a potential antiviral target. The S proteins of SARS-CoV-2 and SARS-CoV mediate entry into a similar spectrum of cell lines. Camostat mesylate, a TMPRSS2 inhibitor in clinical use, can partially block SARS-CoV-2 spike-driven entry into lung cells. In addition, camostat mesylate potently inhibits the entry of SARS-CoV, SARS-CoV-2, and MERS-CoV into the lung cell line Calu-3, without cytotoxicity. In conclusion, camostat mesylate has the potential to treat and prevent COVID-19 (Hoffmann et al., 2020).

### Carmofur

The antineoplastic drug carmofur can inhibit the main protease (Mpro) of SARS-CoV-2. Carmofur is an approved antineoplastic agent used for colorectal cancer and is a derivative of 5-fluorouracil (5-FU). It inhibits the activity of SARS-CoV-2 Mpro in vitro with a half-maximal inhibitory concentration (IC50) of 1.82 μM. The molecular details of how carmofur inhibits Mpro had not been resolved until one study solved the crystal structure of the SARS-CoV-2 Mpro-carmofur complex. The electron density map indicates that the fatty acid moiety (C7H14NO) of carmofur is covalently linked to the Sγ atom of the catalytic residue Cys145 of Mpro. The sulfhydryl group of Cys145 attacks the electrophilic carbonyl group of carmofur, which covalently modifies Cys145 and releases the 5-FU moiety. Numerous hydrogen bonds and hydrophobic interactions stabilize the inhibitor. The fatty acid tail of carmofur, in an extended conformation, inserts into the S2 subsite of Mpro. Most of the hydrophobic interactions are contributed by the side chains of His41, Met49, and Met165 (Jin et al., 2020b) (Figure 5D).

### Lipopeptide EK1C4

The six-helix bundle (6-HB) formed by the HR1 and HR2 of the SARS-CoV-2 S protein facilitates viral infection (Xia et al., 2020b). EK1 is a coronavirus fusion inhibitor with inhibitory effects on various coronaviruses. It targets the HR1 of the S protein of human coronaviruses and has been shown to effectively inhibit infection by five HCoVs, including SARS-CoV and MERS-CoV. Peptide EK1 interferes with the formation of the viral 6-HB (Xia et al., 2020a). A recent study shows that EK1 also inhibits, in a dose-dependent manner, both membrane fusion mediated by the SARS-CoV-2 spike protein and SARS-CoV-2 pseudovirus infection (Xia et al., 2020a; Xia et al., 2020b). EK1C is constructed by covalently attaching a cholesterol group to the C-terminus of the EK1 sequence. Notably, the lipopeptide EK1C4 has the strongest inhibitory effect on spike protein-mediated membrane fusion, with an IC50 of 4.3 nM, compared with 409.3 nM for EK1. EK1C4 also potently inhibits live coronavirus infection in vitro with little or no toxic effect. In conclusion, EK1C4 has the potential to be used for the treatment and prevention of COVID-19 (Xia et al., 2020a).
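The two IC50 values quoted above can be compared directly. The short sketch below only divides one reported value by the other to express the gain in potency as a fold change; the function and variable names are illustrative and do not come from the cited study.

```python
# IC50 values for inhibition of spike-mediated membrane fusion, as quoted in the text (nM).
ic50_ek1_nm = 409.3
ic50_ek1c4_nm = 4.3


def fold_improvement(ic50_parent_nm: float, ic50_derivative_nm: float) -> float:
    """Return how many times lower the derivative's IC50 is than the parent's."""
    if ic50_derivative_nm <= 0:
        raise ValueError("IC50 must be positive")
    return ic50_parent_nm / ic50_derivative_nm


fold = fold_improvement(ic50_ek1_nm, ic50_ek1c4_nm)
print(f"EK1C4 is about {fold:.0f}-fold more potent than EK1 in this assay")
```

A lower IC50 means less compound is needed for half-maximal inhibition, so on these figures EK1C4 is roughly 95 times more potent than EK1 in the fusion assay.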

## Data availability

The PhyloCSF tracks and FRESCo synonymous constraint elements are available for the SARS-CoV-2/wuhCor1 assembly in the UCSC Genome Browser at http://genome.ucsc.edu as public track hubs (refs. 1, 37, 38, 42) named “PhyloCSF” and “Synonymous Constraint”. All other data generated or analysed during this study are included in this published article and its supplementary information files. This study made use of publicly available datasets from GISAID (https://www.gisaid.org) and from UniProtKB/Swiss-Prot (https://www.uniprot.org). Source data are provided with this paper.

## Introduction

SARS-CoV-2, a coronavirus, emerged in Wuhan, China, in December 2019 and became a pandemic by spreading to almost all countries worldwide. SARS-CoV-2 causes COVID-19, which has created a global public health problem. As of May 8, 2020, more than 3.9 million confirmed COVID-19 cases had been reported worldwide, with 0.27 million confirmed deaths. As the virus spreads to new locations, it alters its protein sequences through mutations in its genome that help it to survive better in the host (Sackman et al., 2017). The B lymphocytes of the host adaptive immune system eventually identify the specific epitopes of the pathogenic antigen and start producing protective antibodies, which in turn results in agglutination and clearance of the pathogen (Alcami & Koszinowski, 2000; Chaplin, 2010; Jensen & Thomsen, 2012). As an efficient pathogen, a virus often mutates its proteins in such a way that it can still infect host cells while evading the host immune system. Even when effective strategies are found and deployed, the high rate of genetic change displayed by viruses frequently leads to drug resistance or vaccine escape (McKeegan, Borges-Walmsley & Walmsley, 2002).

SARS-CoV-2 has a single-stranded RNA genome of approximately 29.8 kb that accommodates 14 ORFs encoding 29 proteins: four structural proteins, namely Envelope (E), Membrane (M), Nucleocapsid (N) and Spike (S); 16 non-structural proteins (nsp), among them the RNA-dependent RNA polymerase (RdRp, also named nsp12); and nine accessory proteins (Gordon et al., 2020a; Wu et al., 2020). RdRp comprises multiple distinct domains that catalyse RNA-template-dependent synthesis of phosphodiester bonds between ribonucleotides. The SARS-CoV-2 RdRp is the prime constituent of the replication/transcription machinery. The structure of the SARS-CoV-2 RdRp has recently been solved (Gao et al., 2020) and shows three distinct domains.

For RNA viruses, the RdRp presents an ideal target because of its vital role in RNA synthesis and the absence of a host homolog. RdRp is therefore a primary target for antiviral inhibitors such as remdesivir (Gordon et al., 2020b), which is being considered as a potential drug for the treatment of COVID-19. Since RNA viruses constantly evolve owing to the rapid rate of mutation in their genomes, we decided to analyse the RdRp protein sequences of SARS-CoV-2 from different geographical regions to see whether RdRp also mutates. In the present study, we identified and characterised three mutations in the RdRp protein of SARS-CoV-2 isolated from India relative to that of the ‘Wuhan wet sea food market’ SARS-CoV-2 (Wu et al., 2020). Altogether, our data strongly suggest that the prevalence of mutations in the genome of SARS-CoV-2 needs to be considered when developing new approaches for targeting this virus.
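Comparing an isolate's RdRp against a reference comes down to listing the positions at which two aligned protein sequences differ. The excerpt does not describe the study's actual pipeline, so the following is only a minimal sketch, assuming the two sequences are already aligned, have equal length, and contain no gaps. The function name `call_substitutions` and both toy sequences are invented for illustration and are not real RdRp fragments.

```python
from typing import List


def call_substitutions(reference: str, variant: str) -> List[str]:
    """List amino-acid substitutions between two pre-aligned, equal-length sequences.

    Each substitution is written as <reference residue><1-based position><variant residue>.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for position, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1):
        if ref_aa != var_aa:
            calls.append(f"{ref_aa}{position}{var_aa}")
    return calls


# Toy sequences, invented for illustration only (not real RdRp fragments).
toy_reference = "MKTAYIAKQR"
toy_variant = "MKTVYIAKQL"
print(call_substitutions(toy_reference, toy_variant))  # ['A4V', 'R10L']
```

Real sequences would first need a pairwise or multiple alignment, since insertions and deletions shift the positions, and residue numbers should be reported relative to the reference protein.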

## SARS-CoV-2 Standard #COV019

When using our interactive Certificate of Analysis search tool, follow these recommendations:

• Make sure you have entered the correct catalog number and lot number (or control number) in the search fields.
• If you are using a kit, try searching for the certificate of analysis by the kit details as well as by the details of the individual components.
• The certificate of analysis you need may not be available on the website. In that case, you can contact a Bio-Rad representative or use the request form.

If you cannot find the certificate of analysis you need, use the request form at the link provided.

#### Where can I find the catalog number, item number, or product number?

The catalog number, item number, or product number is printed on the product label. The location of this information is shown in the sample below.

#### Where can I find the lot number or control number?

The lot number or control number (only one of the two) is printed on the product label. The location of this information is shown in the sample below.

#### Why are certificates of analysis not on the Documents tab?

Certificates of analysis are tied not only to a product but also to specific lots of that product. Several certificates of analysis may be available for each product, especially if the product line has been manufactured for a long time and several lots have been released over the years. With the certificate of analysis search tool, you can enter the catalog number and lot number (or control number) for the specific product you have in hand and download the certificate you need.

#### I have a serial number rather than a lot number or control number. How do I find the certificate of analysis for my product?

A serial number can be used in place of the lot number or control number. Use the certificate of analysis search tool: enter the catalog number as usual, and enter the serial number in place of the lot number or control number.

#### Can I get a certificate of analysis if my product has expired?

Yes. Although we periodically remove certificates of analysis during site maintenance, we try to keep them available for an extended period after the product's expiration date.

#### Why is the certificate of analysis in the search results marked "Product name not found"?

Certificates of analysis exist for products that have been discontinued or that are not available on the website. In such cases, the certificate of analysis is available for download, but other product information, such as its name, is not.

#### Why is the certificate of analysis in the search results marked "Not available"?

This means that you entered the correct catalog number and lot number (or control number) and we found the certificate of analysis. However, for some reason the file itself is not available.

Use the request form or contact a Bio-Rad representative so that we can send you the certificate of analysis.

## Insufficient Sensitivity of RNA Dependent RNA Polymerase Gene of SARS-CoV-2 Viral Genome as Confirmatory Test using Korean COVID-19 Cases

How to cite: Kim, S.; Kim, D.; Lee, B. Insufficient Sensitivity of RNA Dependent RNA Polymerase Gene of SARS-CoV-2 Viral Genome as Confirmatory Test using Korean COVID-19 Cases. Preprints 2020, 2020020424 (doi: 10.20944/preprints202002.0424.v1).
