ATP cost for gene expression

How would you estimate the number of ATPs required to transcribe, export and translate a single eukaryotic protein?

The cost of transcribing and translating a hypothetical median gene in yeast has been estimated at 551 activated phosphate bonds (~P) per second per cell (Wagner, 2005).

The median length of a yeast RNA molecule is 1,474 nucleotides, and the median cost of precursor synthesis per nucleotide (derived from the base composition of yeast coding regions) is 49.3 ∼P. With a median mRNA abundance of R = 1.2 mRNA molecules per cell and a median mRNA decay constant of dR = 5.6 × 10^−4 s^−1, the mRNA synthesis cost is 49.3 × 1,474 × 1.2 × (5.6 × 10^−4) = 48.8 ∼P per second per cell. This is a fraction 48.8/(1.34 × 10^7) = 3.6 × 10^−6 of the total RNA synthesis cost per second. The median length of a yeast protein is 385 amino acids, with a combined biosynthesis and polymerization cost of 30.3 ∼P per amino acid. The median abundance is 2,460 protein molecules per cell. No currently available data allow a meaningful estimate of the median protein half-life, but a protein with an intermediate half-life (see below) of 10 h (decay constant dP = 1.92 × 10^−5 s^−1) yields an overall synthesis cost of 30.3 × 385 × 2,460 × (1.92 × 10^−5) = 551 ∼P s^−1.
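The quoted estimate is a steady-state replacement cost: cost per monomer × length × molecules per cell × decay constant. A minimal Python sketch reproducing that arithmetic is below; the helper name `synthesis_cost_rate` is hypothetical, and the conversion from half-life to decay constant (d = ln 2 / t½) assumes simple first-order (exponential) decay.

```python
import math


def synthesis_cost_rate(cost_per_unit: float, length: int,
                        abundance: float, decay_constant: float) -> float:
    """~P per second per cell needed to replace molecules lost to decay at steady state."""
    return cost_per_unit * length * abundance * decay_constant


# Median yeast values quoted from Wagner (2005)
mrna_rate = synthesis_cost_rate(49.3, 1474, 1.2, 5.6e-4)
protein_rate = synthesis_cost_rate(30.3, 385, 2460, 1.92e-5)

# First-order decay: a 10 h half-life gives d = ln 2 / (10 * 3600 s)
d_protein = math.log(2) / (10 * 3600)

print(f"mRNA:    {mrna_rate:.1f} ~P/s")     # 48.8
print(f"protein: {protein_rate:.0f} ~P/s")  # 551
print(f"decay constant for t_half = 10 h: {d_protein:.3e} s^-1")
print(f"mRNA share of total RNA synthesis cost: {mrna_rate / 1.34e7:.1e}")
```

The passage rounds the decay constant down to 1.92 × 10^−5 s^−1; the sketch uses that rounded value so that the totals match the quoted figures.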

For your question about a single gene, the cost would be 49.3 × 1,474 ~P for one mRNA and 30.3 × 385 ~P for one protein, about 84 thousand ~P in total. This figure is probably quite misleading, because a single mRNA can be translated into many protein molecules, so its cost is shared among them.
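To make the single-molecule arithmetic concrete, and to show why it can mislead, here is a small Python sketch. It uses only the per-monomer costs quoted above and ignores export and other overheads. `proteins_per_mrna` is a hypothetical parameter: the number of protein molecules translated from one transcript, among which the mRNA cost is divided.

```python
MRNA_COST = 49.3 * 1474     # ~P to synthesize one median-length yeast mRNA
PROTEIN_COST = 30.3 * 385   # ~P to synthesize one median-length yeast protein


def cost_per_protein(proteins_per_mrna: int) -> float:
    """~P per protein molecule when one mRNA is translated proteins_per_mrna times.

    The mRNA cost is amortized over every protein made from it; export and
    degradation overheads are not included in this sketch.
    """
    if proteins_per_mrna < 1:
        raise ValueError("need at least one protein per mRNA")
    return PROTEIN_COST + MRNA_COST / proteins_per_mrna


print(round(cost_per_protein(1)))     # 84334, the "about 84 thousand" figure
print(round(cost_per_protein(1000)))  # 11738, mRNA cost nearly negligible
```

Once a transcript is translated many times, the per-protein cost approaches the translation cost alone (about 11.7 thousand ~P).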

How the costs of mRNA synthesis and translation are calculated is described in detail in the paper. A large part of the cost comes from synthesizing the basic building blocks, the nucleotides and the amino acids.

  • Wagner, A. Energy Constraints on the Evolution of Gene Expression. Mol Biol Evol 22, 1365-1374 (2005).


The high cost of microalgal cultivation has hindered exploitation of their advantages for sustainable production of green chemicals and biomass. Nevertheless, recent advances in the field of synthetic biology could help to overcome the associated bottlenecks.

Improving reducing power generation and carbon influx will be crucial for attaining an overall improvement in microalgal productivity.

Enhancing light absorption, in conjunction with techniques to swiftly channel electrons through the electron transport chain, could enhance the generation of reducing power.

The Calvin–Benson–Bassham (CBB) cycle might not be the best CO2 fixation pathway, and other natural and synthetic pathways may outperform the CBB cycle. However, implementing these entire pathways in new hosts will be very challenging.

Mixotrophic cultivation and microbial electrosynthesis could be implemented as an additional source of energy and carbon to improve microalgal productivity.

The major bottleneck in commercializing biofuels and other commodities produced by microalgae is the high cost associated with phototrophic cultivation. Improving microalgal productivities could be a solution to this problem. Synthetic biology methods have recently been used to engineer the downstream production pathways in several microalgal strains. However, engineering upstream photosynthetic and carbon fixation metabolism to enhance growth, productivity, and yield has barely been explored in microalgae. We describe strategies to improve the generation of reducing power from light, as well as to improve the assimilation of CO2 by either the native Calvin cycle or synthetic alternatives. Overall, we are optimistic that recent technological advances will prompt long-awaited breakthroughs in microalgal research.


Lactic acid bacteria (LAB) ferment plants, fish, meats and milk and turn them into tasty food products with increased shelf life; other LAB help digest food and create a healthy environment in the intestine. The economic and societal importance of these relatively simple and small bacteria is immense. In this review we hope to show that their adaptations to nutrient-rich environments provide fascinating and often puzzling behaviours that give rise to many fundamental evolutionary biological questions in need of a systems biology approach. We will provide examples of such questions, compare the (metabolic) behaviour of LAB to that of other model organisms, and provide the latest insights, if available.


Biological molecules (AQA AS Biology) PART 6 of 8 TOPICS

ATP (Adenosine triphosphate) is a single molecule made up of adenine, ribose and three inorganic phosphate groups.

ATP is made in a condensation reaction between ADP (adenosine diphosphate) and an inorganic phosphate group (Pi), catalysed by the enzyme ATP synthase. The reverse reaction, hydrolysis, is catalysed by the enzyme ATP hydrolase. NB: a tip for remembering which enzyme does what: "synthase" sounds like "synthesis", which means making something, so ATP synthase helps to make ATP. One enzyme makes ATP and the other breaks it down.

ATP hydrolysis can be coupled to energy-requiring reactions within cells (for example, the phosphorylation of glucose at the start of respiration, which is covered in a PowerPoint made for A2 AQA Biology).

The inorganic phosphate released can be used to phosphorylate other compounds, often making them more reactive.

As you can see, ATP is quite a small topic at AQA. This is all based on the exam board's specification, explained in a little more detail to help you understand points that the specification mentions only briefly.


Two major kinds of genetic change can affect the rates at which genes are expressed. The first is mutation in regulatory regions that affects either transcriptional or translational efficiency. The second is gene duplication. If a gene duplication creates identical copies of a gene and its regulatory region, then the initial effect of the duplication is effectively a doubling in gene expression. Both kinds of genetic change play major roles in biological evolution. For example, a growing number of genome-scale expression studies reveal that substantial genetic variation in messenger RNA (mRNA) expression levels exists within populations, among populations, and among closely related species (Oleksiak, Churchill, and Crawford 2002; Townsend, Cavalieri, and Hartl 2003; Fay et al. 2004; Wittkopp, Haerum, and Clark 2004). Similarly, genome sequence analysis has shown that single-gene duplications occur at substantial rates in eukaryotic genomes (Lynch and Conery 2000; Gu et al. 2002). Between 30% and 50% of a eukaryotic genome's gene content consists of duplicated genes (Rubin et al. 2000; Conant and Wagner 2002). These observations underscore the evolutionary importance of single-gene duplication and the ensuing expression changes.

Increases in gene expression incur energy costs. The central question I pose here is whether these costs are substantial enough to affect the reproduction rate of organisms where rapid cell division is important for evolutionary persistence, most notably microbes. To be sure, rapid cell division in a nutrient-rich environment is only one among multiple factors influencing a microbe's success in surviving and reproducing in the wild. Other factors include surviving starvation, drastic temperature fluctuations, and osmotic shocks. Nonetheless, the evolutionary importance of rapid cell division is indicated by codon usage patterns that allow rapid protein synthesis when nutrients are abundant (Sharp and Li 1986; Akashi and Gojobori 2002).

Maximizing the energy available to cells for biosyntheses, growth, and division is essential for rapid cell division. This is vividly illustrated by recent work that analyzes how the input into a metabolic reaction network and the output produced by the network can affect cell growth (Ibarra, Edwards, and Palsson 2002; Segre, Vitkup, and Church 2002). However, we currently do not know whether gene expression changes in most genes would affect a cell's energy budget substantially enough to change cell division rates. For one thing, the question is almost impossible to address experimentally. One reason is that an experimental manipulation of gene expression may well carry energy costs, but the changed concentration of a gene product may also affect important biological processes. The two effects are very difficult to disentangle. A second reason is that tiny differences in growth rates, of the order of 10^−7, much smaller than can be measured in the laboratory, can affect the fate of mutants in microbes with large population sizes (Hartl and Clark 1997).
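Why a growth-rate difference of order 10^−7 can matter is easiest to see from Kimura's diffusion approximation for the fixation probability of a new mutant. This formula is standard population genetics, not something taken from the text above; the sketch below assumes a haploid Wright-Fisher population of constant size N (effective size equal to census size) and a single initial copy of the mutant. The population size of 10^8 is a hypothetical choice.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation u = (1 - e^(-2s)) / (1 - e^(-2Ns)) for one new haploid mutant.

    Assumes a Wright-Fisher population of constant size n and initial frequency 1/n.
    """
    if s == 0:
        return 1.0 / n  # neutral mutant
    # expm1 keeps precision when s is tiny: 1 - e^(-x) == -expm1(-x)
    return math.expm1(-2 * s) / math.expm1(-2 * n * s)


N = 10**8  # hypothetical population size
neutral = fixation_probability(0.0, N)
for s in (1e-7, -1e-7):
    u = fixation_probability(s, N)
    print(f"s = {s:+.0e}: {u / neutral:.3g} x the neutral fixation probability")
```

With N·|s| = 10, the slightly advantageous mutant fixes roughly 20 times more often than a neutral one, and the slightly costly mutant almost never fixes. Selection coefficients far smaller than anything measurable in a growth assay can therefore still decide a mutation's fate in a large microbial population.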

I here estimate the cost of changing RNA and protein expression relative to the total energy cost of gene expression in the eukaryotic microbe Saccharomyces cerevisiae. In order to do so, I use information on the energy cost (in activated phosphate bonds, ∼P) of synthesizing the nucleotide building blocks of mRNA and the amino acid building blocks of proteins, as well as genome-scale information on the abundances and decay rates of mRNA and proteins. I relate these cost estimates to a critical selection coefficient s, estimated from the amount of nucleotide polymorphisms in the closest wild relative of S. cerevisiae (Johnson et al. 2004). The fate of mutations that change the energy budget by an amount smaller than this critical s is dominated by genetic drift. I show that the doubling of both protein and RNA expression, as might be caused by a gene duplication, carries energy costs much higher than s for all yeast genes for which expression data are available. Most assumptions I make in these estimates are conservative, such that improved data will likely show the actual energy costs of gene expression to be higher than what my results suggest. If true in general for microbes, this means that substantial changes in mRNA and protein synthesis rates, as well as gene duplications, can only go to fixation in a population when they provide an advantage sufficiently great to override these costs.
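The comparison described here reduces to a ratio and a threshold test. The Python sketch below is only an illustration under stated assumptions. It takes the median-gene mRNA cost (48.8 ∼P/s) and the total RNA synthesis cost (1.34 × 10^7 ∼P/s) from the Wagner excerpt quoted earlier on this page as the numerator and denominator. It also assumes, hypothetically, that the relative energy cost translates one-to-one into a selection coefficient, and it uses 10^−7 as a critical s only because that is the order of magnitude mentioned above; the critical s in the paper itself comes from polymorphism data not reproduced here.

```python
def relative_cost_of_doubling(extra_cost_rate: float, total_cost_rate: float) -> float:
    """Extra ~P/s caused by doubling expression, as a fraction of a reference budget."""
    return extra_cost_rate / total_cost_rate


def dominated_by_drift(relative_cost: float, critical_s: float) -> bool:
    """True if the change in the energy budget is smaller than the critical selection coefficient."""
    return relative_cost < critical_s


# Doubling mRNA expression of the median yeast gene adds its synthesis cost again.
rel = relative_cost_of_doubling(48.8, 1.34e7)
CRITICAL_S = 1e-7  # hypothetical threshold, matching the order of magnitude in the text

print(f"relative cost: {rel:.1e}; drift-dominated: {dominated_by_drift(rel, CRITICAL_S)}")
```

Under these assumptions the extra cost (about 3.6 × 10^−6) exceeds the illustrative threshold by more than an order of magnitude, so selection rather than drift would govern the fate of such a change.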

The regulation of metabolic activity by tuning enzyme expression levels is crucial to sustain cellular growth in changing environments. Metabolic networks are often studied at steady state using constraint-based models and optimization techniques. However, metabolic adaptations driven by changes in gene expression cannot be analyzed by steady state models, as these do not account for temporal changes in biomass composition.

Here we present a dynamic optimization framework that integrates the metabolic network with the dynamics of biomass production and composition. An approximation by a timescale separation leads to a coupled model of quasi-steady state constraints on the metabolic reactions, and differential equations for the substrate concentrations and biomass composition. We propose a dynamic optimization approach to determine reaction fluxes for this model, explicitly taking into account enzyme production costs and enzymatic capacity. In contrast to the established dynamic flux balance analysis, our approach allows predicting dynamic changes in both the metabolic fluxes and the biomass composition during metabolic adaptations. Discretization of the optimization problems leads to a linear program that can be efficiently solved.

We applied our algorithm in two case studies: a minimal nutrient uptake network, and an abstraction of core metabolic processes in bacteria. In the minimal model, we show that the optimized uptake rates reproduce the empirical Monod growth for bacterial cultures. For the network of core metabolic processes, the dynamic optimization algorithm predicted commonly observed metabolic adaptations, such as a diauxic switch with a preference ranking for different nutrients, re-utilization of waste products after depletion of the original substrate, and metabolic adaptation to an impending nutrient depletion. These examples illustrate how dynamic adaptations of enzyme expression can be predicted solely from an optimization principle.
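For reference, the empirical Monod law that the minimal uptake model is compared against relates growth rate to substrate concentration as μ = μ_max · S / (K_s + S). The sketch below only evaluates that empirical curve. It is not the dynamic optimization framework described above, and the values of μ_max and K_s are hypothetical.

```python
def monod_growth_rate(substrate: float, mu_max: float, k_s: float) -> float:
    """Monod law: mu = mu_max * S / (K_s + S); half-maximal growth at S = K_s."""
    if substrate < 0:
        raise ValueError("substrate concentration must be non-negative")
    return mu_max * substrate / (k_s + substrate)


MU_MAX = 1.0  # hypothetical maximal growth rate, h^-1
K_S = 0.5     # hypothetical half-saturation constant, mM

for s in (0.0, 0.5, 5.0, 50.0):
    print(f"S = {s:5.1f} mM -> mu = {monod_growth_rate(s, MU_MAX, K_S):.3f} h^-1")
```

The curve rises almost linearly at low substrate and saturates at μ_max. That saturating shape is what optimized uptake rates would need to reproduce.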

Extracellular ATP activates multiple signalling pathways and potentiates growth factor-induced c-fos gene expression in MCF-7 breast cancer cells

In the human breast cancer cell line MCF-7, the nucleotides ATPγS and UTP, acting extracellularly through the purinergic receptor P2Y2, lead to elevated intracellular calcium levels and increased proliferation. ATPγS and UTP treatment of MCF-7 cells activated transcription of the immediate early gene c-fos, an important component in the response to proliferative stimulation. c-fos induction was enhanced by co-treatment with ATPγS and a variety of proliferative agents including growth factors, tumour promoters and stress. Stimulation with ATPγS or epidermal growth factor (EGF) led to extracellular signal-regulated kinase (ERK) activation and phosphorylation of the transcription factors CREB and Elk-1. Co-stimulation synergistically activated fos expression and notably led to increased levels of ERK, CREB and EGF receptor phosphorylation, as well as hyperphosphorylation of ternary complex factor. Nevertheless, the ERK pathway does not fully account for this synergy, since fos induction was differentially sensitive to the MEK inhibitor U0126, indicating that these two agonists signal differently to this immediate early gene. Thus, extracellular nucleotides co-operate with growth factors to activate genes linked to the proliferative response in MCF-7 cells through activation of specific purinergic receptors, which thereby represent important potential targets for arresting the neoplastic progression of breast cancer cells.

Results and discussion

Comparison between different cost functions

Cost-optimal flux simulations were performed using four different types of cost: (1) molecular weights alone, (2) thermodynamic penalties alone, (3) the previously mentioned combination of molecular weights and thermodynamic penalties and (4) uniform costs. The use of uniform costs gives a simple minimization of the overall flux through the enzyme associated reactions. This minimization strategy is a modified version of a widely regarded two-step optimization method called pFBA [12]. pFBA has been previously used to predict unique flux distributions at the predicted FBA optima.

Simulations comparing these four costs have been performed for growth rates ranging from 50 to 100% of the predicted optima. Flux distributions were normalized by the flux of glucose to glucose-6-phosphate, and results for several central carbon metabolism reactions can be found in the supplemental information (Additional file 7: Figure S2). While simulations using molecular weights, thermodynamic costs, or a combination of both present considerably different flux distributions at different sub-optimal values, simulations using uniform costs quickly converge to a unique flux distribution as the objective value is lowered.
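The two-step idea behind cost-optimal flux simulations at a fraction of the optimum can be sketched as two linear programs. The first maximizes the biomass flux. The second fixes biomass at or above a chosen fraction of that optimum and minimizes the cost-weighted sum of fluxes. The sketch below applies this to a hypothetical four-reaction toy network, not the iJR904 reconstruction, using SciPy's `linprog` with the HiGHS solver. All reactions are irreversible, so the weighted flux sum equals the weighted absolute flux. Reversible reactions would need to be split in two, as in pFBA. This is an illustration of the general approach, not the corsoFBA implementation.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not iJR904), columns are reactions:
#   v0 EX_glc : -> G            uptake, capped at 10
#   v1 RESP   : G -> 3 ATP      high ATP yield, enzyme cost 3.0 per unit flux
#   v2 FERM   : G -> 1 ATP      low ATP yield, enzyme cost 0.5 per unit flux
#   v3 BIOMASS: 10 ATP ->       growth
S = np.array([
    [1, -1, -1, 0],   # steady-state balance of G
    [0, 3, 1, -10],   # steady-state balance of ATP
])
bounds = [(0, 10), (0, None), (0, None), (0, None)]
enzyme_cost = np.array([0.0, 3.0, 0.5, 0.0])

# Step 1: maximize biomass (linprog minimizes, so the objective is negated).
step1 = linprog(c=[0, 0, 0, -1], A_eq=S, b_eq=[0, 0], bounds=bounds, method="highs")
mu_opt = -step1.fun


def cost_optimal_fluxes(fraction: float) -> np.ndarray:
    """Step 2: minimize total enzyme cost subject to biomass >= fraction * optimum."""
    res = linprog(c=enzyme_cost,
                  A_ub=[[0, 0, 0, -1]], b_ub=[-fraction * mu_opt],
                  A_eq=S, b_eq=[0, 0], bounds=bounds, method="highs")
    return res.x


for f in (1.0, 0.75, 0.5):
    v = cost_optimal_fluxes(f)
    print(f"{f:.2f} of optimum: RESP = {v[1]:.2f}, FERM = {v[2]:.2f}")
```

In this toy case, flux moves from the high-yield, costly route to the cheap, low-yield route as the required growth falls. This qualitatively mirrors the overflow behaviour discussed in this section. The specific numbers depend entirely on the invented costs and stoichiometry.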

The same set of simulations was also compared to several data points from two Metabolic Flux Analysis (MFA) experiments: Ishii et al. [61] and Yao et al. [62]. Correlation and sum of squared error between all simulated and experimental flux distributions for several central carbon metabolism reactions were calculated. These results are presented in the supplemental information (Additional file 8: Figure S3). A similar method to that used by Holzhütter [28] was also included in Additional file 8: Figure S3. While pFBA simulations yield the highest correlation values in the suboptimal space when compared to the Ishii et al. dataset, these same simulations also yield the highest sum of squared error. Furthermore, when the objective function was lowered from 100% down to 65% of optima, corsoFBA yields the lowest sum of squared error for all data points except one (Ishii et al. with dilution of 0.7 h^−1), and the highest correlation values for the higher dilution rates in the Yao et al. data (dilution = 0.4, 0.6 and 0.7 h^−1).
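The two agreement metrics used here are Pearson correlation and sum of squared error between a simulated and a measured flux vector. A dependency-free Python sketch follows. The flux values are hypothetical placeholders, each assumed to be already normalized by the glucose to glucose-6-phosphate flux.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sum_squared_error(x: list[float], y: list[float]) -> float:
    """Sum of squared differences between paired entries."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical normalized fluxes for the same four reactions
simulated = [1.00, 0.30, 0.85, 0.40]
measured = [1.00, 0.25, 0.90, 0.55]

print(f"r = {pearson(simulated, measured):.3f}")
print(f"SSE = {sum_squared_error(simulated, measured):.4f}")
```

The two metrics can disagree. Correlation ignores a uniform offset or scaling, while the squared error penalizes it. This is how one method can score the highest correlation and also the highest error, as reported for pFBA above.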

Comparison between metabolic flux analysis and simulated fluxes

In order to visualize the trend of how different fluxes change in the sub-optimal space, simulations utilizing molecular weight costs, thermodynamic costs and a combination of both, normalized again by the flux of glucose to glucose-6-phosphate, were plotted alongside the experimental values from the two MFA experiments. Fluxes were plotted starting at 100% down to 75% of the predicted optima, and the results for several central carbon metabolism reactions are presented in Figure 2. The threshold of 75% was chosen in order to match the trend observed in our simulations to the experimental data [61,62].

Figure 2. Comparison between simulations and MFA experiments. Comparison between selected simulated fluxes using molecular weights, thermodynamic penalties, and a combination of the two costs, and Metabolic Flux Analysis (MFA) data. Fluxes are normalized by glucose to glucose-6-phosphate conversion rates. X-axis values are kept constant for all plots. Reaction names are taken from the E. coli iJR904 model [52]. Yellow boxes indicate the Pentose Phosphate Pathway (PPP), blue indicates glycolysis, green indicates the TCA cycle and red indicates other reactions.

As the objective function value decreases, corsoFBA flux distributions show close agreement with MFA data for higher growth rates (Figure 2), particularly for simulations considering both thermodynamic and molecular weight costs. Fluxes through the Pentose Phosphate Pathway (PPP) and glycolysis remain relatively constant, while the flux through the TCA cycle gradually decreases. The decrease in TCA cycle usage leads to higher levels of acetate release. Quantitatively, the fluxes through glycolysis and the TCA cycle are slightly under-predicted by our simulations, while the flux through the PPP is generally over-predicted. Similar patterns have been observed in other FBA approaches [33,45,48], and this may arise from optimizing the biomass function alone. Optimizing ATP production alongside biomass has been shown to yield better predictions [16,63]. ATP production optimization could also lead to higher fluxes through the TCA cycle and lower PPP fluxes. It is also worth noting that, although fluxes through the TCA cycle are under-predicted according to Figure 2, MFA experimental results can be inconsistent, and our predicted TCA fluxes show good agreement with two other MFA studies [64,65].

These simulations mirror a behavior known as overflow metabolism [66], where E. coli, at high growth rates, moves away from the full oxidation of glucose through the TCA cycle and uses less ATP-efficient pathways, releasing acetate instead of CO2. Our simulations support the hypothesis that this behavior stems from a tradeoff between enzyme cost and energy yield [40]. That is, when more glucose is available, E. coli uses a higher flux through pathways that are less enzymatically costly but produce fewer ATP per mmol of glucose.

Predicted fluxes using molecular weights only and a combined cost also showed excellent agreement with the experimental values for the reactions PPC (PEP carboxylase) and ICL (isocitrate lyase) [61,62]. Both experiments show a decrease in ICL flux and an increase in PPC flux from near-optimal conditions to a growth rate of 0.5 h^−1. This result is further supported by the experimental results of Nanchen et al. [67], who found a lower flux through ICL at a growth rate of approximately 0.05 h^−1 than at 0.1 h^−1, suggesting a transient response through this pathway as glucose concentration increases. The same transient response was found in our simulations. These results suggest that E. coli uses these anaplerotic reactions (reactions that replenish intermediates of metabolic pathways) to relieve enzymatic cost, and that considering these costs during FBA may improve internal flux predictions compared with minimizing metabolic steps alone.

Simulation results also suggest that optimizing the biomass function may yield fluxes that are fundamentally different from those at near-optimal conditions. At optimal growth, there is no predicted flux through the glyoxylate shunt, the fluxes through the TCA cycle and PPP are considerably lower, and there is a flux through the PPC reaction that is not present at near-optimal conditions. Although comparisons between experimental data and simulated flux distributions show that the highest correlation and lowest error are found at optimal growth, our simulations also indicate that implementing these costs yields comparable results in the sub-optimal space (Additional file 8: Figure S3). Moreover, the sub-optimal corsoFBA approach can better predict fluxes through certain reactions, such as PPC and ICL. While previous studies have shown that the FBA solution space grows drastically when near-optimal as well as optimal objective values are considered [18], here we show that optimizing the enzymatic cost at near-optimal conditions yields results that are more consistent with experimental data for certain reactions. This result strongly suggests that exploring the FBA solution space at near-optimal to optimal objective values may increase the predictive accuracy of internal cell fluxes.

The near-optimal solution space was also analyzed for knockout strains. Flux distributions for six E. coli knockout strains reported by Ishii et al. [61] were compared with cost-optimal simulations using the combined cost function. Results show that cost-optimal flux distributions between 95% of the optimum and the optimum itself yield similar or better predictions according to both correlation and sum of squared error (Additional file 9: Figure S4). Predictions using uniform costs quickly diverge from experimental data. These simulations further support the “cloud theory” proposed by Wintermute et al. [19], and suggest that the experimental data can also be in good agreement with near-optimal flux distributions.

Comparison between gene expression data and simulated fluxes

To further validate this analysis, simulations using the combined cost function were also compared to gene expression values reported in the MFA studies [61,62]. While the overall correlation between reaction flux and gene expression is moderate at best [34], increased expression of all genes participating in a particular pathway can be considered a good indication of increased flux [68]. Simulations were performed under the same conditions as before, utilizing the combination of molecular weights and thermodynamic penalty, but this time the flux distribution was normalized by the overall flux through the enzyme associated reactions. Comparisons between simulated values and gene expression data are shown in Figure 3.
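The switch between the two normalizations changes what "relative flux" means. Figure 2 divides every flux by the glucose to glucose-6-phosphate flux, whereas this comparison divides by the summed flux through all enzyme-associated reactions. The Python sketch below illustrates the difference. Reaction names and flux values are hypothetical placeholders for two flux states with equal glucose uptake.

```python
def normalize_by_reaction(fluxes: dict[str, float], ref: str) -> dict[str, float]:
    """Divide every flux by one reference flux (e.g. glucose -> G6P)."""
    return {r: v / fluxes[ref] for r, v in fluxes.items()}


def normalize_by_total(fluxes: dict[str, float]) -> dict[str, float]:
    """Divide every flux by the summed absolute flux through enzyme-associated reactions."""
    total = sum(abs(v) for v in fluxes.values())
    return {r: v / total for r, v in fluxes.items()}


# Hypothetical flux states (same glucose uptake, less TCA flux in the second)
near_optimal = {"glc_uptake": 10.0, "pgi": 7.0, "tca_cs": 6.0, "g6pdh": 3.0}
sub_optimal = {"glc_uptake": 10.0, "pgi": 8.0, "tca_cs": 2.0, "g6pdh": 2.0}

for name, state in (("near-optimal", near_optimal), ("sub-optimal", sub_optimal)):
    by_ref = normalize_by_reaction(state, "glc_uptake")["glc_uptake"]
    by_total = normalize_by_total(state)["glc_uptake"]
    print(f"{name}: uptake / reference = {by_ref:.2f}, uptake / total = {by_total:.3f}")
```

Relative to the reference reaction, uptake is 1 by construction. Relative to total enzyme-associated flux, it rises when the rest of the network carries less flux, which is the kind of shift in relative glucose uptake described below.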

Figure 3. Comparison between simulations and gene expression data. Comparison between selected simulated fluxes, using a combination of molecular weights and thermodynamic penalties, and associated gene expression. Plot axes are the same as those defined in the ptsG plot unless otherwise specified. Multiple genes associated with the same reaction are included in the same box. Reaction names are taken from the E. coli iJR904 reconstruction [52]. Yellow boxes indicate the Pentose Phosphate Pathway, blue indicates glycolysis, green indicates the TCA cycle and red indicates other reactions.

Although gene expression data can also be inconsistent between experiments, and some reactions are associated with multiple genes, this comparison further supports the qualitative results presented in the previous section. When normalizing the predicted fluxes by the overall flux through the enzyme associated reactions, an increased relative uptake of glucose is observed as the objective value decreases. As a result, an increased relative flux through glycolysis is also predicted. Furthermore, the relative flux through the TCA cycle still decreases as the growth rate increases. The increase in relative glucose uptake and flux through glycolysis, as well as the decreased flux through the TCA cycle, are supported by the relative gene expression associated with these pathways [68].

Simulated values also show good agreement with fluxes through the Entner-Doudoroff (ED) pathway. The genes edd and eda, associated with this pathway, have long been known to exist in E. coli [69], but the activity of their associated reactions has been observed mainly under growth on gluconate, glucuronate, and methyl-beta-D-glucuronide, and under phosphate limitation and carbon starvation [70]. For this reason, these reactions are generally not included in MFA experiment networks. In contrast with Murray et al., our simulations using molecular weight and combined costs predict the use of the ED pathway under excess glucose conditions, not starvation (Additional file 7: Figure S2). These predictions are supported both by the gene expression data presented in the MFA studies [61,62], which show an increase in edd and eda expression at high glucose concentrations, and by the MFA results of Harcombe et al. [71], where fluxes through the ED pathway were measured in bacteria growing at rates above 1 h^−1.

Fundamental pathways analysis

To better understand the transition between metabolic states as the value of the objective function is decreased, the energy-producing pathways of the E. coli iJR904 reconstruction were decomposed using the Fundamental Pathways analysis described in the Methods section. Details on how these pathways were calculated can be found in the supplemental information (Additional file 5 and Additional file 6: Table S2). Briefly, starting with the ATP imbalance associated with each fundamental pathway, the total ATP potential of each pathway was estimated from the imbalance of other energy-generating metabolites, such as NADH and ubiquinol-8. The lowest cost of converting these metabolites into ATP was then added to the total cost associated with the pathway. The final ATP production per unit enzymatic cost was then compared with the potential ATP production per mmol of glucose (Figure 4).
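The bookkeeping described in this paragraph can be sketched as follows. A pathway's ATP potential is its ATP imbalance plus the ATP obtainable from its surplus NADH and ubiquinol-8. Its total cost is its own enzymatic cost plus the cost of the cheapest route converting those carriers into ATP. The conversion yields and conversion costs below are hypothetical placeholders, not values from the paper or its supplementary tables. The class name `FundamentalPathway` is likewise invented for illustration, and the sketch only handles carrier surpluses, not deficits.

```python
from dataclasses import dataclass

# HYPOTHETICAL conversion factors, not taken from the paper:
ATP_PER_NADH = 2.5     # ATP obtainable per NADH via the respiratory chain
ATP_PER_UQ8H2 = 1.5    # ATP obtainable per ubiquinol-8
COST_PER_NADH = 0.8    # protein cost of the cheapest route converting one NADH
COST_PER_UQ8H2 = 0.6   # protein cost of converting one ubiquinol-8


@dataclass
class FundamentalPathway:
    name: str
    atp: float           # net ATP imbalance per glucose
    nadh: float          # net NADH surplus per glucose (must be >= 0 here)
    uq8h2: float         # net ubiquinol-8 surplus per glucose (must be >= 0 here)
    protein_cost: float  # enzymatic cost of the pathway's own reactions

    def __post_init__(self) -> None:
        if self.nadh < 0 or self.uq8h2 < 0:
            raise ValueError("this sketch only handles carrier surpluses")

    def atp_potential(self) -> float:
        """ATP imbalance plus ATP obtainable from the other energy carriers."""
        return self.atp + self.nadh * ATP_PER_NADH + self.uq8h2 * ATP_PER_UQ8H2

    def total_cost(self) -> float:
        """Pathway cost plus the cost of converting the carriers into ATP."""
        return (self.protein_cost + self.nadh * COST_PER_NADH
                + self.uq8h2 * COST_PER_UQ8H2)

    def cost_per_atp(self) -> float:
        return self.total_cost() / self.atp_potential()


toy = FundamentalPathway("toy", atp=2, nadh=2, uq8h2=0, protein_cost=4.0)
print(toy.atp_potential(), round(toy.cost_per_atp(), 3))
```

Each pathway then becomes one point in a plot of ATP per mmol glucose against cost per mmol ATP, which is the representation used in Figure 4.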

Figure 4. Fundamental pathway analysis. ATP production potential of energy-producing fundamental pathways, compared with the associated protein cost per mmol of ATP. Optimal (OP) and Near Optimal (NOP) Pathways are highlighted in the scatter plot. OPs are plotted in blue and NOPs in red in the central carbon metabolism diagrams.

In accordance with general knowledge of E. coli metabolism, the fundamental pathway analysis found two optimal pathways (OPs), one according to ATP production per mmol of glucose and one according to protein cost. The most energy-productive pathway is the full oxidation of glucose through the TCA cycle (OP2). The most cost-efficient pathway was found to be the production of acetate through glycolysis (OP1). Although this pathway potentially produces less than half as much ATP as OP2, its total enzymatic cost per mmol of ATP is lower. Also in agreement with general knowledge, this analysis predicted the release of acetate to be more efficient than the release of either lactate or ethanol.
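Picking the two optimal pathways amounts to taking the best pathway on each axis: the maximum ATP per mmol glucose and the minimum cost per mmol ATP. The sketch below does this for three hypothetical pathways whose numbers are invented placeholders, chosen only to respect the qualitative relations stated here and in the next paragraphs. The acetate route yields less than half the ATP of full oxidation but is cheaper per ATP, and a mixed route falls in between on both axes.

```python
# Hypothetical (ATP per mmol glucose, protein cost per mmol ATP); not values from Figure 4
pathways = {
    "full_oxidation_via_TCA": (20.0, 1.0),
    "acetate_overflow": (9.0, 0.6),
    "mixed_route": (14.0, 0.9),
}

most_productive = max(pathways, key=lambda p: pathways[p][0])
cheapest = min(pathways, key=lambda p: pathways[p][1])

print("highest ATP yield per glucose:", most_productive)
print("lowest enzyme cost per ATP:   ", cheapest)
```

The intermediate route is best on neither axis, even though it lies between the two extremes on both. Pathways of this kind are what the text calls near-optimal.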

In the simulations presented in previous sections, a gradual transition from OP2 to OP1 is observed as the objective function value decreases. This trend also supports the idea that overflow metabolism takes place in order to alleviate enzymatic cost. Under low glucose conditions, E. coli would need to extract the largest possible amount of energy from glucose in order to sustain growth. As more glucose becomes available, however, the bacterium can afford to consume more substrate and produce energy through cheaper pathways.

One interesting observation from this analysis is the existence of Near Optimal Pathways (NOPs), which combine the previously described optimal pathways with the PPP or ED pathways. NOP2 and NOP3 both produce more ATP per mmol of glucose than OP1 but less than OP2, while being less cost efficient than OP1 but slightly more cost efficient than OP2. NOP2 and NOP3 reduce the flux through upper glycolysis by partially re-routing the flux through the PPP, and they exhibit no flux through phosphoglucose isomerase (pgi). A third Near Optimal Pathway, termed NOP1, bypasses upper glycolysis through the ED pathway. This pathway is both less cost-effective and less energy-effective than OP1.

The use of these near-optimal pathways could explain the inconsistent PPP fluxes reported by the MFA experiments considered here. A short recovery in the simulated PPP flux near a growth rate of 0.65 h^−1 (Figure 2) demonstrates that this pathway is a viable option at certain dilution rates, due to its coupling with the PPP and biomass production. It has also been demonstrated that ΔpfkA E. coli strains reduce flux through phosphofructokinase 1 by diverting flux through the PPP [72], hence using NOP2 and NOP3 instead of OP1 or OP2 when the cost of upper glycolysis is increased.

This Fundamental Pathway analysis elucidates how we would expect the model to behave in terms of energy generation as we optimize the protein cost at sub-optimal states. Based on this analysis, in order to fulfill its catabolic needs, the model transitions from OP2 to OP1 as we move away from optimal growth. Although this is in fact the general trend we observe, anaplerotic needs must also be considered. These needs are addressed most efficiently by reactions not in OP1 or OP2, such as PEP carboxylase and the glyoxylate shunt. Interestingly, the model also predicts the use of the ED pathway for anaplerotic purposes at high growth rates. While this pathway has been studied mostly for its catabolic activity in Z. mobilis and several Pseudomonas species, the simulations here predict it to be the cheapest way to shuttle glucose to the TCA cycle for the production of building blocks, while giving up little efficiency in energy pathways through NOP1.

Author information

These authors contributed equally: Enzo Marinari and Andrea De Martino


Department of Physics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA

Dipartimento di Fisica, Sapienza Università di Roma, Piazzale Aldo Moro 2, Rome, 00185, Italy

INFN, Sezione di Roma 1, Piazzale Aldo Moro 2, Rome, 00185, Italy

Soft & Living Matter Lab, Institute of Nanotechnology (CNR-NANOTEC), c/o Dipartimento di Fisica, Sapienza Università di Roma, Piazzale Aldo Moro 2, Rome, 00185, Italy

Italian Institute for Genomic Medicine, via Nizza 52, Turin, 10126, Italy

Gene Ontology: tool for the unification of biology

Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web are being constructed: biological process, molecular function and cellular component.

The accelerating availability of molecular sequences, particularly the sequences of entire genomes, has transformed both the theory and practice of experimental biology. Where once biochemists characterized proteins by their diverse activities and abundances, and geneticists characterized genes by the phenotypes of their mutations, all biologists now acknowledge that there is likely to be a single limited universe of genes and proteins, many of which are conserved in most or all living cells. This recognition has fuelled a grand unification of biology: the information about the shared genes and proteins contributes to our understanding of all the diverse organisms that share them. Knowledge of the biological role of such a shared protein in one organism can certainly illuminate, and often provide strong inference of, its role in other organisms.

Appendix 3: Expression of highly-expressed and “unnecessary” genes after laboratory evolution

O’Brien and colleagues reported that investment in “utilized” proteins (which corresponds roughly to our essential or important proteins) often increased after laboratory evolution of E. coli in glucose minimal media [17]. This implies that investment in unimportant genes went down. To confirm this, we estimated the proportion of total mRNA expression for each gene as RPKM (reads per kilobase per million) [44] times the length in nucleotides. (To compute the total mRNA expression, we included only the protein-coding genes.) Total expression of important or essential genes was 49–51% of mRNA in the two wild-type samples. In 7/8 evolved lines, this proportion increased, to 54–61%. In the other evolved line (strain 8), 48% of mRNA was for important proteins, slightly less than in the wild-type strains. Strain 8 was also an outlier in the analysis of O’Brien and colleagues. Not surprisingly, in the seven evolved lines with increased expression of important proteins, the total expression of all “unnecessary” proteins was reduced, from 36–38% in wild type to 23–31%. If we focus on just the 106 highly-expressed proteins that we believe are on standby, their expression was 11–13% of mRNA in the wild-type samples; this decreased in the seven lines, to 6–8%.
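The expression shares above come from weighting each gene's RPKM by its length, since RPKM × length is proportional to the number of reads from that gene. The weights are then normalized over protein-coding genes. The Python sketch below shows this calculation. The gene names, RPKM values and lengths are hypothetical placeholders.

```python
def expression_shares(rpkm: dict[str, float], length_nt: dict[str, int]) -> dict[str, float]:
    """Share of total mRNA per gene, taken as RPKM x length and normalized to sum to 1.

    Only protein-coding genes should be passed in, as in the analysis above.
    """
    weights = {g: rpkm[g] * length_nt[g] for g in rpkm}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def group_share(shares: dict[str, float], genes: set[str]) -> float:
    """Combined share of a gene set (e.g. important, or 'standby' genes)."""
    return sum(shares[g] for g in genes)


# Hypothetical values for three protein-coding genes
rpkm = {"geneA": 100.0, "geneB": 50.0, "geneC": 10.0}
length_nt = {"geneA": 1000, "geneB": 2000, "geneC": 500}

shares = expression_shares(rpkm, length_nt)
print({g: round(s, 3) for g, s in shares.items()})
print(round(group_share(shares, {"geneA", "geneB"}), 3))
```

Weighting by length matters. In this example geneA and geneB end up with equal shares even though their RPKM values differ twofold.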

Although the mRNA expression of the “standby” proteins went down, we do not believe that there was selection specifically to reduce the expression of these genes. In particular, the standby genes do not seem to be downregulated more than other non-important proteins. If we compute a log2 fold change for each protein, and we normalize these so that the median protein has a value of zero, then the 106 standby genes were, on average, downregulated relative to the median gene in just 2 of 8 lines (P < 0.05, Wilcoxon rank-sum test; median log2 fold changes were −0.40 and −0.26 in these lines). And the 106 genes were significantly upregulated in 1 of the 8 lines (P < 0.05; median log2 change of +0.22 in strain 7B). In other words, the highly-expressed “standby” genes were down-regulated about as much as other unnecessary genes. This suggests the importance of more global regulatory changes rather than mutations that affect the expression of individual genes. Indeed, two of the three most common mutations in these evolved lines affected global regulators: missense mutations in rpoB and an insertion element that increased the expression of hns [44].
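The median normalization described here can be sketched as follows. Compute log2(evolved / wild type) for each gene, subtract the median across all genes so that a typical gene sits at zero, and then look at the median for the gene subset of interest. The expression values and the "standby" subset below are hypothetical placeholders.

```python
import math
import statistics


def median_normalized_log2fc(evolved: dict[str, float],
                             wild_type: dict[str, float]) -> dict[str, float]:
    """log2(evolved / wild type) per gene, shifted so the median gene has value 0."""
    raw = {g: math.log2(evolved[g] / wild_type[g]) for g in wild_type}
    med = statistics.median(raw.values())
    return {g: v - med for g, v in raw.items()}


# Hypothetical expression levels for five genes
wild_type = {"g1": 100.0, "g2": 100.0, "g3": 100.0, "g4": 100.0, "g5": 100.0}
evolved = {"g1": 200.0, "g2": 200.0, "g3": 100.0, "g4": 200.0, "g5": 800.0}
standby = {"g3", "g4"}  # hypothetical subset of interest

lfc = median_normalized_log2fc(evolved, wild_type)
subset_median = statistics.median(lfc[g] for g in standby)
print(lfc)
print("median for subset:", subset_median)
```

To judge whether such a subset shift is significant, the subset's values could be compared with those of the remaining genes using a Wilcoxon rank-sum test, for example `scipy.stats.ranksums`.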