Many of the most important viruses have genomes composed of RNA instead of DNA. Examples include influenza, Ebola, rabies, SARS, MERS, polio, hepatitis C, yellow fever and dengue viruses. While the human genome comprises some 3 billion nucleotides (A, C, G or T), RNA virus genomes lie in the range of just ~2000 to ~35000 nucleotides. Due to their extremely small size, RNA viruses have evolved very high-density coding within their genomes. Unlike the human genome, which is largely "junk" DNA, essentially every part of an RNA virus genome is functional, and some regions even simultaneously encode multiple functions.
One focus of research in our lab is the development and application of mathematical and computational techniques to gain a better understanding of the molecular biology of RNA viruses. By analyzing the patterns of nucleotide changes across alignments of sequences from different isolates of a virus species, we can make inferences about the functions of each part of the virus genome: e.g. protein-coding sequence evolves in a different way from RNA-structure-encoding sequence. We are using comparative genomic approaches to identify new features in both plant and animal RNA viruses, and guide follow-up experimental analyses. Our long-term goal is to map "all" functional elements in all economically and medically important RNA virus genomes. This will provide a robust platform on which to build future research into the molecular biology of medically, veterinarily, and agriculturally important RNA viruses, and may also uncover new molecular mechanisms with potential biotechnological applications.
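As a rough illustration of how substitution patterns in an alignment can hint at function, the sketch below scores each codon column of a coding-sequence alignment by how often sequences that encode the same amino acid use different codons. Columns where synonymous changes are unusually rare may be under extra constraint, for example from an overlapping gene or an RNA structure. This is a deliberately simplified toy statistic, not the method used by any of our tools; the function name `synonymous_variability` and the assumption of a gap-free, in-frame alignment are ours for the example.

```python
from itertools import combinations, product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_variability(aligned_seqs):
    """For each codon column, return the fraction of sequence pairs that
    encode the same amino acid but use different codons.

    Assumes (hypothetically) an in-frame, gap-free alignment.
    Returns None for columns with no synonymous pairs.
    """
    n_codons = len(aligned_seqs[0]) // 3
    scores = []
    for i in range(n_codons):
        codons = [s[3 * i:3 * i + 3].upper().replace("U", "T")
                  for s in aligned_seqs]
        same_aa = 0
        differ = 0
        for a, b in combinations(codons, 2):
            if a not in CODON_TABLE or b not in CODON_TABLE:
                continue  # skip ambiguous or incomplete codons
            if CODON_TABLE[a] == CODON_TABLE[b]:
                same_aa += 1
                differ += a != b
        scores.append(differ / same_aa if same_aa else None)
    return scores
```

In practice one would average such scores over a sliding window and compare them with a null model of neutral evolution, which this sketch does not attempt.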
Many previously undetected genes ('hidden' genes) are translated via non-canonical mechanisms such as programmed ribosomal frameshifting, non-AUG initiation, programmed stop codon readthrough, and internal ribosome entry sites (IRESes). These unusual translational mechanisms are used particularly often by RNA viruses. Since viruses use the host cell's ribosomes and many other components of the protein synthesis machinery, these mechanisms are potentially also relevant to cellular gene expression and genome annotation. Indeed, programmed -1 ribosomal frameshifting, stop codon readthrough and internal ribosome entry - now known to be important for the expression of certain cellular genes - were all first identified and studied in viruses. We are interested in finding new non-canonical translation mechanisms in viruses, characterizing these mechanisms and their associated sequence motifs, and searching for related instances in cellular genes.
Protein-directed ribosomal frameshifting temporally regulates gene expression. Nat Commun 2017
Transcriptional slippage in the positive-sense RNA virus family Potyviridae. EMBO Rep 2015
An overlapping essential gene in the Potyviridae. Proc Natl Acad Sci U S A 2008
Reviews:
Non-canonical translation in RNA viruses. J Gen Virol 2012

Ribosome profiling (Ingolia, 2016) is an emerging technique which relies on the fact that a translating ribosome protects ~30 nucleotides of messenger RNA (mRNA). An RNA nuclease is used to digest unprotected RNA, the 30-nt ribosome-protected fragments (RPFs) are purified for deep sequencing, and the sequenced fragments are mapped back to the transcriptome, to give a global snapshot of protein synthesis in the cell. Ribosome profiling has proven to be increasingly valuable in studies of the translation process, for example, in the discovery of novel translated open reading frames (ORFs), determination of elongation rates, and identification of sites of ribosome pausing. It also has broad application in the analysis of global protein synthesis and has been exploited in studies of infectious diseases, cell growth, differentiation and development, mitochondrial gene expression, and cell stress. In collaboration with Betty Chung and the Brierley Lab (especially Nerea Irigoyen) we have been applying ribosome profiling and parallel RNA-seq to a variety of virus species to study the kinetics of virus gene expression, unusual transcriptional and translational mechanisms employed by viruses, and host responses to virus infection at both the transcriptional and translational levels.
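Because each RPF marks the position of one translating ribosome, the mapped 5' ends of RPFs tend to show a three-nucleotide periodicity within coding regions. A minimal sketch of how that phasing could be tabulated is given below. The function name `frame_fractions` is hypothetical, and the 12-nt offset from the read 5' end to the ribosomal P-site is only a placeholder assumption: in real data the offset depends on read length and library protocol and has to be calibrated.

```python
from collections import Counter


def frame_fractions(read_starts, cds_start, cds_end, offset=12):
    """Fraction of reads whose inferred P-site falls in frame 0, 1 or 2.

    read_starts : 0-based transcript coordinates of RPF 5' ends
    cds_start   : 0-based coordinate of the first nt of the start codon
    cds_end     : 0-based coordinate one past the last nt of the CDS
    offset      : assumed distance from read 5' end to P-site (placeholder)
    """
    counts = Counter()
    for start in read_starts:
        p_site = start + offset
        if cds_start <= p_site < cds_end:
            counts[(p_site - cds_start) % 3] += 1
    total = sum(counts.values())
    return [counts[f] / total if total else 0.0 for f in range(3)]
```

A strong excess of reads in one frame is a useful quality check, and a change of the dominant frame partway along a transcript can point to translation in an alternative reading frame.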
Protein-directed ribosomal frameshifting temporally regulates gene expression. Nat Commun 2017
Maturation of selected human mitochondrial tRNAs requires deadenylation. Elife 2017
While viruses of humans, livestock, angiosperms and some model organisms have been comparatively well-studied, RNA viruses of most invertebrate and protist phyla have barely been studied at all. Such viruses tend to be related to known virus groups but are nonetheless often highly divergent. Divergent viruses provide a broad phylogenetic baseline for studying virus taxonomy and evolution, and for comparative analyses of economically and medically important species. Studies of such viruses may also reveal novel molecular mechanisms. Together with several collaborators, we are using high-throughput sequencing and bioinformatic approaches to identify novel divergent viruses and characterize their gene expression mechanisms.
Polycipiviridae: a proposed new family of polycistronic picorna-like RNA viruses. J Gen Virol 2017
A +1 ribosomal frameshifting motif prevalent among plant amalgaviruses. Virology 2016




| Andrew E. Firth (CV) | Wellcome Trust Senior Research Fellow; Professor in Virus Bioinformatics |
| Hazel Stewart | Post doctoral research associate (virology) |
| Nina Lukhovitskaya | Post doctoral research associate (virology) |
| Katy Brown | Post doctoral research associate (bioinformatics) |
| Samantha Nguyen | PhD student (virology) |
| Ingrida Olendraite | Post doctoral research associate (bioinformatics) |
| Ji Wang | PhD student (bioinformatics) |
| Georgia Cook | Post doctoral research associate (bioinformatics) |
| Valeria Lulla | Post doctoral research associate (virology) |
| Rhian O'Connor | PhD student (virology) |
| Charlotte Tumescheit | PhD student (bioinformatics) |
| Tingting Zhang | Visiting Scientist (molecular biology) |
| Yousuf Khan | MPhil student (molecular biology) |
| Adam Dinan | Post doctoral research associate (bioinformatics) |
| Lucas Ferguson | MPhil student (virology/bioinformatics) |
| Susanne Bell | Chief research laboratory technician (virology) |
| Krzysztof Franaszek | PhD student (bioinformatics) |
| Yanhua Li | Visiting Scientist (Ying Fang's lab; virology) |
| Roger Ling | Post doctoral research associate (virology) |
| Leanne Finch | PhD student (virology) |
| Allan Olspert | Post doctoral research associate (plant virology) |
| Ed Long (fall 2022; virology) |
| Boyan Wang (fall 2021; bioinformatics) |
| Bill Taylor (fall 2020; bioinformatics) |
| Filip Lastovka (fall 2019; bioinformatics) |
| Gemma Lindsey (fall 2018; virology) |
| Charlene Tang (summer 2018; virology/bioinformatics) |
| Eleni Kleanthous (fall 2017; virology/bioinformatics) |
| Robert Arculus (summer 2017; computational biology) |
| Eleni Kleanthous (summer 2017; bioinformatics) |
| Ben Butt (fall 2016; virology) |
| Ji Wang (summer 2016; bioinformatics) |
| Luis de Haro (summer 2015; plant virology) |
| Johannes Kangur (fall 2014; virology) |
| Chris Boursnell (summer 2010; bioinformatics) |
Enteroviruses comprise a large group of mammalian pathogens that includes poliovirus. Pathology in humans ranges from sub-clinical to acute flaccid paralysis, myocarditis and meningitis. Until now, the eleven viral proteins were thought to derive from proteolytic processing of a single polyprotein encoded in a long open reading frame (ORF) encompassing most of the ~7500-nucleotide virus genome. By analyzing patterns of nucleotide substitutions among >3000 enterovirus genome sequences, we were able to predict that a short upstream open reading frame (uORF) encodes an additional virus protein, termed UP ("uORF protein"). Using a range of classical molecular biology and virology techniques, we showed that the protein is produced during infection with a model enterovirus, echovirus 7. We also studied ribosome occupancy of the uORF in exquisite detail via ribosome profiling - a high throughput sequencing technique for footprinting translating ribosomes at single-nucleotide resolution. In collaboration with Nicola Stonehouse's group (University of Leeds), we were able to extend these studies to poliovirus type 1 (the most common poliovirus serotype) and EV-A71 (one of the major causative agents of hand, foot and mouth disease). To investigate the function of the UP protein, we turned to a differentiated human intestinal organoid system in collaboration with the groups of Matthias Zilbauer (Department of Paediatrics) and Ian Goodfellow (Division of Virology). This work revealed that the UP protein facilitates virus release from membranous components during viral growth in gut epithelia - the initial site of viral invasion into susceptible hosts. These findings overturn the 50-year-old dogma that enteroviruses use a single-polyprotein gene expression strategy, and have important implications for understanding enterovirus pathogenesis and vaccine design.
Collaboration with Nicola Stonehouse, Matthias Zilbauer and Ian Goodfellow.
Programmed -1 ribosomal frameshifting is a mechanism of gene expression, whereby specific signals within messenger RNAs direct a proportion of translating ribosomes to shift -1 nt and continue translating in the new reading frame. Such frameshifting normally occurs at a set ratio and is utilized in the expression of many viral genes and a number of cellular genes. An open question is whether proteins might function as trans-acting switches to turn frameshifting on or off in response to cellular conditions. Here we show that frameshifting in a model RNA virus, encephalomyocarditis virus, is trans-activated by viral protein 2A. As a result, the frameshifting efficiency increases from 0 to 70% (one of the highest known in a mammalian system) over the course of infection, temporally regulating the expression levels of the viral structural and enzymatic proteins.
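To make the -1 shift concrete, the sketch below translates a toy sequence in the zero frame and then again with a -1 slip at a chosen position, so that the last nucleotide decoded before the slip is read a second time as the first nucleotide of the next codon. The sequence, the slip position and the function names are all hypothetical; the sketch does not model the slippery site, the stimulatory element, or the frameshifting efficiency.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, start=0):
    """Translate from `start` until the first stop codon or end of sequence."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_minus1_shift(seq, shift_after):
    """Translate in frame 0 up to index `shift_after` (a multiple of 3),
    then slip back 1 nt and continue in the -1 frame."""
    upstream = translate(seq[:shift_after])
    if len(upstream) * 3 != shift_after:
        return upstream  # a zero-frame stop codon was reached before the slip
    return upstream + translate(seq, start=shift_after - 1)


toy = "ATGGCGGTTTTTATTTGACCTAAGG"            # hypothetical sequence
print(translate(toy))                         # MAVFI
print(translate_with_minus1_shift(toy, 12))   # MAVFYLT
```

With a frameshifting efficiency of, say, 70%, roughly 70% of ribosomes reaching the site would make the second (transframe) product and the remainder the first.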
Collaboration with Ian Brierley.
Ribosome profiling is emerging as a powerful technique to monitor translation in living cells at sub-codon resolution. We carried out one of the first ribosome profiling analyses of an RNA virus (see also Khong et al., 2016), using as a model system murine coronavirus, a betacoronavirus in the same genus as the medically important SARS-CoV and MERS-CoV. Parallel ribosome profiling and high-throughput RNA sequencing of infected tissue culture cells allowed us to monitor virus gene expression kinetics and the relative translational efficiencies of virus and host mRNAs. The sensitivity and precision of the approach permitted us to uncover several unanticipated features of coronavirus translation, giving insights into ribosomal frameshifting, ribosome pausing, and the utilisation of short, potentially regulatory, upstream open reading frames. We also identified various challenges associated with application of the technique to virus infection samples and developed bioinformatic strategies to address these.
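Relative translational efficiency is commonly estimated as the ratio of ribosome footprint density to RNA-seq density for the same coding region. The sketch below shows one simple way to compute it from raw read counts, normalizing each library by its total so that only relative values matter. The function name and the gene labels in the usage note are hypothetical, and the sketch assumes both counts are taken over the same region of each gene; it is not the pipeline used in the study.

```python
def translational_efficiency(rpf_counts, rna_counts):
    """Relative translational efficiency per gene.

    rpf_counts, rna_counts : dicts mapping gene name -> read count,
    counted over the same coding region in both libraries (assumption).
    Genes with zero RNA-seq reads are skipped.
    """
    total_rpf = sum(rpf_counts.values())
    total_rna = sum(rna_counts.values())
    te = {}
    for gene, rna in rna_counts.items():
        if rna == 0:
            continue  # ratio undefined without RNA-seq coverage
        rpf_share = rpf_counts.get(gene, 0) / total_rpf
        rna_share = rna / total_rna
        te[gene] = rpf_share / rna_share
    return te
```

For example, with footprint counts `{"host_gene": 300, "virus_orf": 100}` and RNA-seq counts `{"host_gene": 100, "virus_orf": 100}`, the host gene comes out three times as efficiently translated as the virus ORF.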
Collaboration with Nerea Irigoyen, Betty Chung, Ian Brierley.
The family Potyviridae encompasses around 30% of known plant virus species and causes more than half of viral crop damage worldwide. All the viral proteins were thought to be encoded within a single open reading frame (ORF) that is translated as a polyprotein and cleaved to produce the mature virus proteins. Around 2006, using comparative genomic techniques, we identified a novel overlapping ORF (termed PIPO, "Pretty Interesting Potyviridae ORF") in a central region of the virus genome and, subsequently, Betty Chung verified that PIPO is essential for virus infectivity and is expressed as a fusion with part of the polyprotein via some form of frameshifting (Chung et al., 2008, PNAS). Recently, Allan Olspert demonstrated that PIPO is expressed as a fusion with the N-terminal part of the P3 protein (giving P3N-PIPO) and that this is a result of programmed insertional slippage, at a level of ~2%, by the virus polymerase (Olspert et al., 2015, EMBO Rep). P3N-PIPO is essential for virus cell-to-cell movement.
Collaboration with John Carr, Allen Miller, John Atkins.
Collaboration with Ian Brierley, John Atkins.
Collaboration with Véronique Ziegler-Graff, Allen Miller.
Collaboration with Eric Snijder, Ying Fang, Ian Brierley, John Atkins.
Collaboration with Paul Digard, Jeff Taubenberger, John Atkins.
Collaboration with John Carr.
Collaboration with Alex Khromykh, Brad Blitvich, John Atkins, Ernie Gould and others.
Collaborations with John Atkins, Marina Fleeton, Ilya Frolov.
Collaboration with Shelley Cook, David Bass, Betty Chung.
| Web interface | Source code | Brief description | Citation(s) |
|---|---|---|---|
| SynPlot2 | SynPlot2.zip | For finding regions of enhanced synonymous site conservation in coding-sequence alignments. Useful for identifying overlapping genes and overlapping non-coding elements such as functionally important RNA structures. | PMID 25326325 |
| MLOGD | | For analyzing the coding potential of sequence alignments; essentially a glorified Ka/Ks statistic. Works best for non-overlapping genes. (Somewhat dated.) | PMIDs 16483358, 15347574 |
| GLUE, PEDEL, DRIVeR | | Programmes for calculating diversity in randomized protein-encoding libraries under a variety of protocols. | PMIDs 18442989, 15932904, 12874379 |
| Manual | Programs, plot scripts, etc. | Our lab's manual for Ribo-Seq data analysis with a specific focus on virus infection data. | PMID 26919232 |