10.2: Overview of Metabolic Reactions - Biology


Learning Objectives

By the end of this section, you will be able to:

  • Describe the process by which polymers are broken down into monomers
  • Describe the process by which monomers are combined into polymers
  • Discuss the role of ATP in metabolism
  • Explain oxidation-reduction reactions
  • Describe the hormones that regulate anabolic and catabolic reactions

Metabolic processes are constantly taking place in the body. Metabolism is the sum of all of the chemical reactions that are involved in catabolism and anabolism. The reactions governing the breakdown of food to obtain energy are called catabolic reactions. Conversely, anabolic reactions use the energy produced by catabolic reactions to synthesize larger molecules from smaller ones, such as when the body forms proteins by stringing together amino acids. Both sets of reactions are critical to maintaining life.

Because catabolic reactions produce energy and anabolic reactions use energy, ideally, energy usage would balance the energy produced. If the net energy change is positive (catabolic reactions release more energy than the anabolic reactions use), then the body stores the excess energy by building fat molecules for long-term storage. On the other hand, if the net energy change is negative (catabolic reactions release less energy than anabolic reactions use), the body uses stored energy to compensate for the deficiency of energy released by catabolism.
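The balance described above is simple bookkeeping. The following Python sketch is a minimal illustration, assuming both totals are expressed in the same unit (kilocalories here, chosen arbitrarily). The function name `energy_balance` and the sample numbers are hypothetical, and real energy balance involves far more than two sums.

```python
def energy_balance(catabolic_released: float, anabolic_used: float) -> str:
    """Classify the net energy state from two totals given in the same unit (hypothetical helper)."""
    net = catabolic_released - anabolic_used
    if net > 0:
        # Positive net change: the excess is stored by building fat molecules.
        return "surplus: excess energy stored as fat"
    if net < 0:
        # Negative net change: stored energy makes up the shortfall.
        return "deficit: stored energy is drawn on"
    return "balanced"

print(energy_balance(2500, 2200))  # hypothetical kcal values
print(energy_balance(1800, 2200))
```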

Catabolic Reactions

Catabolic reactions break down large organic molecules into smaller molecules, releasing the energy contained in the chemical bonds. These energy releases (conversions) are not 100 percent efficient. The amount of energy released is less than the total amount contained in the molecule. Approximately 40 percent of energy yielded from catabolic reactions is directly transferred to the high-energy molecule adenosine triphosphate (ATP). ATP, the energy currency of cells, can be used immediately to power molecular machines that support cell, tissue, and organ function. This includes building new tissue and repairing damaged tissue. ATP can also be stored to fulfill future energy demands. The remaining 60 percent of the energy released from catabolic reactions is given off as heat, which tissues and body fluids absorb.
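Using the approximate 40/60 split given in the paragraph, the partition of released energy can be worked out directly. In this Python sketch the 100 kcal input is a hypothetical figure, and the fixed 0.40 fraction is the text's approximation rather than a constant that holds for every pathway.

```python
ATP_FRACTION = 0.40  # approximate share of released energy captured in ATP (per the text)

def partition_energy(released_kcal: float) -> tuple[float, float]:
    """Split released energy into the part captured as ATP and the part lost as heat."""
    to_atp = released_kcal * ATP_FRACTION
    as_heat = released_kcal - to_atp  # the remainder is given off as heat
    return to_atp, as_heat

atp, heat = partition_energy(100.0)  # hypothetical 100 kcal released by catabolism
print(f"ATP: {atp:.0f} kcal, heat: {heat:.0f} kcal")  # ATP: 40 kcal, heat: 60 kcal
```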

Structurally, ATP molecules consist of an adenine, a ribose, and three phosphate groups. The chemical bond between the second and third phosphate groups, termed a high-energy bond, represents the greatest source of energy in a cell. It is the first bond that catabolic enzymes break when cells require energy to do work. The products of this reaction are a molecule of adenosine diphosphate (ADP) and a lone phosphate group (Pi). ATP, ADP, and Pi are constantly being cycled through reactions that build ATP and store energy, and reactions that break down ATP and release energy.
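The cycling described above can be pictured as bookkeeping on a pool of molecules. The sketch below is a hypothetical Python model: the `AdenylatePool` class and the starting counts are illustrative, not measured values. It tracks only molecule counts, not energy, and shows that hydrolysis and regeneration move phosphate between ATP, ADP, and Pi while the total of ATP plus ADP stays constant.

```python
from dataclasses import dataclass

@dataclass
class AdenylatePool:
    atp: int
    adp: int
    pi: int

    def hydrolyze(self, n: int = 1) -> None:
        """ATP -> ADP + Pi: the bond to the third phosphate is broken and energy is released."""
        if n > self.atp:
            raise ValueError("not enough ATP")
        self.atp -= n
        self.adp += n
        self.pi += n

    def regenerate(self, n: int = 1) -> None:
        """ADP + Pi -> ATP: energy from catabolic reactions is stored again."""
        if n > min(self.adp, self.pi):
            raise ValueError("not enough ADP or Pi")
        self.adp -= n
        self.pi -= n
        self.atp += n

pool = AdenylatePool(atp=10, adp=2, pi=5)  # hypothetical counts
pool.hydrolyze(4)
pool.regenerate(3)
print(pool)  # AdenylatePool(atp=9, adp=3, pi=6)
```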

The energy from ATP drives all bodily functions, such as contracting muscles, maintaining the electrical potential of nerve cells, and absorbing food in the gastrointestinal tract. The metabolic reactions that produce ATP come from various sources.

Of the four major macromolecular groups (carbohydrates, lipids, proteins, and nucleic acids) that are processed by digestion, carbohydrates are considered the most common source of energy to fuel the body. They take the form of either complex carbohydrates, polysaccharides like starch and glycogen, or simple sugars (monosaccharides) like glucose and fructose. Sugar catabolism breaks polysaccharides down into their individual monosaccharides. Among the monosaccharides, glucose is the most common fuel for ATP production in cells, and as such, there are a number of endocrine control mechanisms to regulate glucose concentration in the bloodstream. Excess glucose is either stored as an energy reserve in the liver and skeletal muscles as the complex polymer glycogen, or it is converted into fat (triglyceride) in adipose cells (adipocytes).

Among the lipids (fats), triglycerides are most often used for energy via a metabolic process called β-oxidation. About one-half of excess fat is stored in adipocytes that accumulate in the subcutaneous tissue under the skin, whereas the rest is stored in adipocytes in other tissues and organs.

Proteins, which are polymers, can be broken down into their monomers, individual amino acids. Amino acids can be used as building blocks of new proteins or broken down further for the production of ATP. When one is chronically starving, this use of amino acids for energy production can lead to a wasting away of the body, as more and more proteins are broken down.

Nucleic acids are present in most of the foods you eat. During digestion, nucleic acids including DNA and various RNAs are broken down into their constituent nucleotides. These nucleotides are readily absorbed and transported throughout the body to be used by individual cells during nucleic acid metabolism.

Anabolic Reactions

In contrast to catabolic reactions, anabolic reactions involve the joining of smaller molecules into larger ones. Anabolic reactions combine monosaccharides to form polysaccharides, fatty acids to form triglycerides, amino acids to form proteins, and nucleotides to form nucleic acids. These processes require energy in the form of ATP molecules generated by catabolic reactions. Anabolic reactions, also called biosynthesis reactions, create new molecules that form new cells and tissues, and revitalize organs.
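As a toy illustration of the monomer/polymer relationship, the following Python sketch represents a polymer as a hyphen-joined string of monomer names. It assumes a linear chain, in which n monomers are held together by n - 1 linking bonds; branched polymers such as glycogen are not modeled, and monomer names must not contain hyphens. The text does not give an energy cost per bond, so the sketch only counts the bonds that anabolic reactions would have to form. Both function names are hypothetical.

```python
def polymerize(monomers: list[str]) -> tuple[str, int]:
    """Anabolic direction: join monomers into a linear chain; n units need n - 1 bonds."""
    if not monomers:
        raise ValueError("need at least one monomer")
    return "-".join(monomers), len(monomers) - 1

def depolymerize(polymer: str) -> list[str]:
    """Catabolic direction: break every linking bond, returning the individual monomers."""
    return polymer.split("-")

chain, bonds = polymerize(["Gly", "Ala", "Ser", "Leu"])  # four amino acids
print(chain, bonds)         # Gly-Ala-Ser-Leu 3
print(depolymerize(chain))  # ['Gly', 'Ala', 'Ser', 'Leu']
```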

Hormonal Regulation of Metabolism

Catabolic and anabolic hormones in the body help regulate metabolic processes. Catabolic hormones stimulate the breakdown of molecules and the production of energy. These include cortisol, glucagon, adrenaline/epinephrine, and cytokines. All of these hormones are mobilized at specific times to meet the needs of the body. Anabolic hormones are required for the synthesis of molecules and include growth hormone, insulin-like growth factor, insulin, testosterone, and estrogen. The following table summarizes the function of each of the catabolic hormones, and the subsequent table summarizes the functions of the anabolic hormones.

Table 1. Catabolic Hormones
Cortisol: Released from the adrenal gland in response to stress; its main role is to increase blood glucose levels by gluconeogenesis (breaking down fats and proteins)
Glucagon: Released from alpha cells in the pancreas either when starving or when the body needs to generate additional energy; it stimulates the breakdown of glycogen in the liver to increase blood glucose levels; its effect is the opposite of insulin; glucagon and insulin are a part of a negative-feedback system that stabilizes blood glucose levels
Adrenaline/epinephrine: Released in response to the activation of the sympathetic nervous system; increases heart rate and heart contractility, constricts blood vessels, is a bronchodilator that opens (dilates) the bronchi of the lungs to increase air volume in the lungs, and stimulates gluconeogenesis
Table 2. Anabolic Hormones
Growth hormone (GH): Synthesized and released from the pituitary gland; stimulates the growth of cells, tissues, and bones
Insulin-like growth factor (IGF): Stimulates the growth of muscle and bone while also inhibiting cell death (apoptosis)
Insulin: Produced by the beta cells of the pancreas; plays an essential role in carbohydrate and fat metabolism, controls blood glucose levels, and promotes the uptake of glucose into body cells; causes cells in muscle, adipose tissue, and liver to take up glucose from the blood and store it in the liver and muscle as glycogen; its effect is the opposite of glucagon; glucagon and insulin are a part of a negative-feedback system that stabilizes blood glucose levels
Testosterone: Produced by the testes in males and the ovaries in females; stimulates an increase in muscle mass and strength as well as the growth and strengthening of bone
Estrogen: Produced primarily by the ovaries but also by the liver and adrenal glands; its anabolic functions include increasing metabolism and fat deposition
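The insulin/glucagon negative-feedback system described in both tables can be illustrated with a toy controller that nudges blood glucose back toward a set point. In this Python sketch the set point (90 mg/dL), the correction gain, and the step count are hypothetical; real glucose regulation is not a simple proportional loop, so the model only shows the qualitative behavior of negative feedback: deviations in either direction shrink over time.

```python
def simulate_glucose(start_mg_dl: float, set_point: float = 90.0,
                     gain: float = 0.3, steps: int = 20) -> list[float]:
    """Toy negative-feedback loop: each step corrects a fixed fraction of the deviation."""
    level = start_mg_dl
    history = [level]
    for _ in range(steps):
        error = level - set_point
        if error > 0:
            # Insulin-like response: cells take up glucose; liver and muscle store glycogen.
            level -= gain * error
        elif error < 0:
            # Glucagon-like response: liver glycogen is broken down, raising blood glucose.
            level += gain * -error
        history.append(level)
    return history

after_meal = simulate_glucose(150.0)  # hypothetical high starting value
fasting = simulate_glucose(60.0)      # hypothetical low starting value
print([round(x) for x in after_meal[:5]])
print([round(x) for x in fasting[:5]])
```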

Disorders of the Metabolic Processes: Cushing Syndrome and Addison’s Disease

As might be expected for a fundamental physiological process like metabolism, errors or malfunctions in metabolic processing lead to a pathophysiology or—if uncorrected—a disease state. Metabolic diseases are most commonly the result of malfunctioning proteins or enzymes that are critical to one or more metabolic pathways. Protein or enzyme malfunction can be the consequence of a genetic alteration or mutation. However, normally functioning proteins and enzymes can also have deleterious effects if their availability is not appropriately matched with metabolic need. For example, excessive production of the hormone cortisol gives rise to Cushing syndrome. Clinically, Cushing syndrome is characterized by rapid weight gain, especially in the trunk and face region, depression, and anxiety. It is worth mentioning that tumors of the pituitary that produce adrenocorticotropic hormone (ACTH), which subsequently stimulates the adrenal cortex to release excessive cortisol, produce similar effects. This indirect mechanism of cortisol overproduction is referred to as Cushing disease.

Patients with Cushing syndrome can exhibit high blood glucose levels and are at an increased risk of becoming obese. They also show slow growth, accumulation of fat between the shoulders, weak muscles, bone pain (because cortisol causes proteins to be broken down to make glucose via gluconeogenesis), and fatigue. Other symptoms include excessive sweating (hyperhidrosis), capillary dilation, and thinning of the skin, which can lead to easy bruising. The treatments for Cushing syndrome are all focused on reducing excessive cortisol levels. Depending on the cause of the excess, treatment may be as simple as discontinuing the use of cortisol ointments. In cases of tumors, surgery is often used to remove the offending tumor. Where surgery is inappropriate, radiation therapy can be used to reduce the size of a tumor or ablate portions of the adrenal cortex. Finally, medications are available that can help to regulate the amounts of cortisol.

Insufficient cortisol production is equally problematic. Adrenal insufficiency, or Addison’s disease, is characterized by the reduced production of cortisol from the adrenal gland. It can result from malfunction of the adrenal glands, which then do not produce enough cortisol, or it can be a consequence of decreased ACTH availability from the pituitary. Patients with Addison’s disease may have low blood pressure, paleness, extreme weakness, fatigue, slow or sluggish movements, lightheadedness, salt cravings due to the loss of sodium, and high blood potassium levels (hyperkalemia). They may also suffer from loss of appetite, chronic diarrhea, vomiting, mouth lesions, and patchy skin color. Diagnosis typically involves blood tests and imaging tests of the adrenal and pituitary glands. Treatment involves cortisol replacement therapy, which usually must be continued for life.

Oxidation-Reduction Reactions

The chemical reactions underlying metabolism involve the transfer of electrons from one compound to another by processes catalyzed by enzymes. The electrons in these reactions commonly come from hydrogen atoms, which consist of an electron and a proton. A molecule gives up a hydrogen atom, in the form of a hydrogen ion (H+) and an electron, breaking the molecule into smaller parts. The loss of an electron, or oxidation, releases a small amount of energy; both the electron and the energy are then passed to another molecule in the process of reduction, or the gaining of an electron. These two reactions always happen together in an oxidation-reduction reaction (also called a redox reaction)—when an electron is passed between molecules, the donor is oxidized and the recipient is reduced. Oxidation-reduction reactions often happen in a series, so that a molecule that is reduced is subsequently oxidized, passing on not only the electron it just received but also the energy it received. As the series of reactions progresses, energy accumulates that is used to combine Pi and ADP to form ATP, the high-energy molecule that the body uses for fuel.
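The pairing of oxidation and reduction, and the way a reduced molecule passes its electron on, can be traced with simple bookkeeping. This Python sketch is a toy model with hypothetical molecule names (`fuel`, `carrier_1`, and so on); it counts transferred electrons only and does not model the energy released at each step or the accompanying hydrogen ions.

```python
def transfer_electron(electrons: dict[str, int], donor: str, acceptor: str) -> None:
    """One redox reaction: the donor is oxidized (loses an electron), the acceptor is reduced."""
    if electrons[donor] < 1:
        raise ValueError(f"{donor} has no electron to donate")
    electrons[donor] -= 1     # oxidation
    electrons[acceptor] += 1  # reduction

def pass_along(electrons: dict[str, int], chain: list[str]) -> None:
    """A series of redox reactions: each molecule reduced in one step is oxidized in the next."""
    for donor, acceptor in zip(chain, chain[1:]):
        transfer_electron(electrons, donor, acceptor)

state = {"fuel": 1, "carrier_1": 0, "carrier_2": 0, "final_acceptor": 0}
pass_along(state, ["fuel", "carrier_1", "carrier_2", "final_acceptor"])
print(state)  # {'fuel': 0, 'carrier_1': 0, 'carrier_2': 0, 'final_acceptor': 1}
```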

Oxidation-reduction reactions are catalyzed by enzymes that trigger the removal of hydrogen atoms. Coenzymes work with enzymes and accept hydrogen atoms. The two most common coenzymes of oxidation-reduction reactions are nicotinamide adenine dinucleotide (NAD) and flavin adenine dinucleotide (FAD). Their respective reduced coenzymes are NADH and FADH2, which are energy-containing molecules used to transfer energy during the creation of ATP.

Chapter Review

Metabolism is the sum of all catabolic (break down) and anabolic (synthesis) reactions in the body. The metabolic rate measures the amount of energy used to maintain life. An organism must ingest a sufficient amount of food to maintain its metabolic rate if the organism is to stay alive for very long.

Catabolic reactions break down larger molecules, such as carbohydrates, lipids, and proteins from ingested food, into their constituent smaller parts. They also include the breakdown of ATP, which releases the energy needed for metabolic processes in all cells throughout the body.

Anabolic reactions, or biosynthetic reactions, synthesize larger molecules from smaller constituent parts, using ATP as the energy source for these reactions. Anabolic reactions build bone, muscle mass, and new proteins, fats, and nucleic acids. Oxidation-reduction reactions transfer electrons across molecules by oxidizing one molecule and reducing another, and collecting the released energy to convert Pi and ADP into ATP. Errors in metabolism alter the processing of carbohydrates, lipids, proteins, and nucleic acids, and can result in a number of disease states.

Self Check

Answer the question(s) below to see how well you understand the topics covered in the previous section.

Critical Thinking Questions

  1. Describe how metabolism can be altered.
  2. Describe how Addison’s disease can be treated.

Answers

  1. An increase or decrease in lean muscle mass will result in an increase or decrease in metabolism.
  2. Addison’s disease is characterized by low cortisol levels. One way to treat the disease is by giving cortisol to the patient.



Glossary

anabolic hormones: hormones that stimulate the synthesis of new, larger molecules

anabolic reactions: reactions that build smaller molecules into larger molecules

biosynthesis reactions: reactions that create new molecules, also called anabolic reactions

catabolic hormones: hormones that stimulate the breakdown of larger molecules

catabolic reactions: reactions that break down larger molecules into their constituent parts

FADH2: reduced form of FAD; an energy-containing molecule used to transfer energy during the creation of ATP

flavin adenine dinucleotide (FAD): coenzyme used to produce FADH2

metabolism: sum of all catabolic and anabolic reactions that take place in the body

NADH: reduced form of NAD; an energy-containing molecule used to transfer energy during the creation of ATP

nicotinamide adenine dinucleotide (NAD): coenzyme used to produce NADH

oxidation: loss of an electron

oxidation-reduction reaction: (also, redox reaction) pair of reactions in which an electron is passed from one molecule to another, oxidizing one and reducing the other

reduction: gaining of an electron

Microbiology of Atypical Environments

Wendy Stone, Gideon Wolfaardt, in Methods in Microbiology, 2018


Microbial metabolism in extreme environments has two defining characteristics: it is slow, and it tends towards the lower measurement thresholds of all current techniques and methodological trends. This provides the crucible of challenge to drive novel techniques, forcing researchers to employ new ideas, as well as reinvent old ideas that have fallen out of favour or become overshadowed by current buzzwords in the field. Some of the techniques that have been creatively harnessed to successfully shed light on the impact of microbial metabolism in atypical environments include microscopy at surface–air interfaces, mass balances, radiolabelled fatty acid synthesis, and stereoisomeric ratios. The divergence between the rate and career span of the researcher and the rate of the subject matter is a central consideration in this context, and can be overcome by intentional and rigorous long-term experimental design, mathematical modelling, team work, and global perspectives regarding measurement and impact.

Metabolomics: A Primer

Metabolomics generates a profile of small molecules that are derived from cellular metabolism and can directly reflect the outcome of complex networks of biochemical reactions, thus providing insights into multiple aspects of cellular physiology. Technological advances have enabled rapid and increasingly expansive data acquisition with samples as small as single cells; however, substantial challenges in the field remain. In this primer we provide an overview of metabolomics, especially mass spectrometry (MS)-based metabolomics, which uses liquid chromatography (LC) for separation, and discuss its utilities and limitations. We identify and discuss several areas at the frontier of metabolomics. Our goal is to give the reader a sense of what might be accomplished when conducting a metabolomics experiment, now and in the near future.

Keywords: metabolomics; mass spectrometry; metabolic biology; metabolic network; quantitative biology.

Copyright © 2017 Elsevier Ltd. All rights reserved.


Figure 1. Targeted, semi-targeted and untargeted analysis. General scheme of different workflows that are available…

Figure 2. Distribution of recent publications on applications of metabolomics by area

Figure 3. Trends in metabolomics. Trends include broader metabolite coverage from smaller sample sizes, achieving…

Free Energy and ATP

The energetics of biochemical reactions are best described in terms of the thermodynamic function called Gibbs free energy (G), named for Josiah Willard Gibbs. The change in free energy (ΔG) of a reaction combines the effects of changes in enthalpy (the heat that is released or absorbed during a chemical reaction) and entropy (the degree of disorder resulting from a reaction) to predict whether or not a reaction is energetically favorable. All chemical reactions spontaneously proceed in the energetically favorable direction, accompanied by a decrease in free energy (ΔG < 0). For example, consider a hypothetical reaction in which A is converted to B:

$$A \rightleftharpoons B$$

If ΔG < 0, this reaction will proceed in the forward direction, as written. If ΔG > 0, however, the reaction will proceed in the reverse direction and B will be converted to A.

The ΔG of a reaction is determined not only by the intrinsic properties of reactants and products, but also by their concentrations and other reaction conditions (e.g., temperature). It is thus useful to define the free-energy change of a reaction under standard conditions. (Standard conditions are considered to be a 1-M concentration of all reactants and products, and 1 atm of pressure.) The standard free-energy change (ΔG°) of a reaction is directly related to its equilibrium position because the actual ΔG is a function of both ΔG° and the concentrations of reactants and products. For example, consider the reaction

$$A \rightleftharpoons B$$

The free-energy change can be written as follows:

$$\Delta G = \Delta G^{\circ} + RT \ln \frac{[B]}{[A]}$$

where R is the gas constant and T is the absolute temperature.

At equilibrium, ΔG = 0 and the reaction does not proceed in either direction. The equilibrium constant for the reaction (K = [B]/[A] at equilibrium) is thus directly related to ΔG° by the above equation, which can be expressed as follows:

$$0 = \Delta G^{\circ} + RT \ln K \qquad \text{or} \qquad \Delta G^{\circ} = -RT \ln K$$

If the actual ratio [B]/[A] is greater than the equilibrium ratio (K), ΔG > 0 and the reaction proceeds in the reverse direction (conversion of B to A). On the other hand, if the ratio [B]/[A] is less than the equilibrium ratio, ΔG < 0 and A is converted to B.
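The relations in this passage can be evaluated numerically. The Python sketch below assumes energies in kcal/mol, R = 1.987 × 10⁻³ kcal/(mol·K), and a default temperature of 298.15 K; the ΔG° of +1.0 kcal/mol is hypothetical. It computes K from ΔG° and then checks the direction of reaction for ratios [B]/[A] below, at, and above K, reproducing the rule stated above.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def delta_g(delta_g_std: float, ratio_b_over_a: float, temp_k: float = 298.15) -> float:
    """Actual ΔG (kcal/mol) for A <-> B: ΔG = ΔG° + RT ln([B]/[A])."""
    return delta_g_std + R * temp_k * math.log(ratio_b_over_a)

def equilibrium_constant(delta_g_std: float, temp_k: float = 298.15) -> float:
    """K = [B]/[A] at equilibrium, from ΔG° = -RT ln K."""
    return math.exp(-delta_g_std / (R * temp_k))

def direction(dg: float, tol: float = 1e-9) -> str:
    """Predicted net direction; tol absorbs floating-point round-off near ΔG = 0."""
    if dg < -tol:
        return "forward (A -> B)"
    if dg > tol:
        return "reverse (B -> A)"
    return "at equilibrium"

dg_std = 1.0  # hypothetical ΔG° in kcal/mol
K = equilibrium_constant(dg_std)
for ratio in (K / 10, K, K * 10):
    print(f"[B]/[A] = {ratio:.4f}: {direction(delta_g(dg_std, ratio))}")
```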

The standard free-energy change (ΔG°) of a reaction therefore determines its chemical equilibrium and predicts in which direction the reaction will proceed under any given set of conditions. For biochemical reactions, the standard free-energy change is usually expressed as ΔG°′, which is the standard free-energy change of a reaction in aqueous solution at pH = 7, approximately the conditions within a cell.

Many biological reactions (such as the synthesis of macromolecules) are thermodynamically unfavorable (ΔG > 0) under cellular conditions. In order for such reactions to proceed, an additional source of energy is required. For example, consider the reaction

$$A \rightleftharpoons B \qquad \Delta G^{\circ\prime}_{1} > 0$$

The conversion of A to B is energetically unfavorable, so the reaction proceeds in the reverse rather than the forward direction. However, the reaction can be driven in the forward direction by coupling the conversion of A to B with an energetically favorable reaction, such as:

$$C \rightleftharpoons D \qquad \Delta G^{\circ\prime}_{2} < 0$$

If these two reactions are combined, the coupled reaction can be written as follows:

$$A + C \rightleftharpoons B + D \qquad \Delta G^{\circ\prime} = \Delta G^{\circ\prime}_{1} + \Delta G^{\circ\prime}_{2}$$

The ΔG of the combined reaction is the sum of the free-energy changes of its individual components, so as long as the decrease in free energy of the second reaction outweighs the increase of the first, the coupled reaction is energetically favorable and will proceed as written. Thus, the energetically unfavorable conversion of A to B is driven by coupling it to a second reaction associated with a large decrease in free energy. Enzymes are responsible for carrying out such coupled reactions in a coordinated manner.
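Because free-energy changes of coupled reactions simply add, whether a coupled reaction is favorable reduces to the sign of a sum. In this Python sketch the two ΔG°′ values are hypothetical placeholders (the passage does not supply numbers for A ⇌ B and C ⇌ D), and `coupled_delta_g` and `is_favorable` are illustrative helpers, not standard functions.

```python
def coupled_delta_g(*component_dgs: float) -> float:
    """ΔG°' of a coupled reaction: the sum of the ΔG°' values of its component reactions."""
    return sum(component_dgs)

def is_favorable(dg: float) -> bool:
    """A reaction proceeds as written when its free-energy change is negative."""
    return dg < 0

unfavorable = +2.0  # hypothetical A -> B, kcal/mol
driver = -5.0       # hypothetical C -> D, kcal/mol
total = coupled_delta_g(unfavorable, driver)
print(total, is_favorable(total))  # -3.0 True
```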

The cell uses this basic mechanism to drive the many energetically unfavorable reactions that must take place in biological systems. Adenosine 5′-triphosphate (ATP) plays a central role in this process by acting as a store of free energy within the cell (Figure 2.31). The bonds between the phosphates in ATP are known as high-energy bonds because their hydrolysis is accompanied by a relatively large decrease in free energy. There is nothing special about the chemical bonds themselves; they are called high-energy bonds only because a large amount of free energy is released when they are hydrolyzed within the cell. In the hydrolysis of ATP to ADP plus phosphate (Pi), ΔG°′ = -7.3 kcal/mol. Recall, however, that ΔG°′ refers to “standard conditions,” in which the concentrations of all products and reactants are 1 M. Actual intracellular concentrations of Pi are approximately 10⁻² M, and intracellular concentrations of ATP are higher than those of ADP. These differences between intracellular concentrations and those of the standard state favor ATP hydrolysis, so for ATP hydrolysis within a cell, ΔG is approximately -12 kcal/mol.
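The effect of cellular concentrations can be checked with the concentration-dependent relation ΔG = ΔG°′ + RT ln([ADP][Pi]/[ATP]), taking the activity of water as 1. In the Python sketch below, ΔG°′ = -7.3 kcal/mol and the Pi concentration of 10⁻² M come from the text, while the ATP and ADP concentrations (a 10:1 ratio) and the body temperature of 310.15 K are assumptions chosen for illustration.

```python
import math

R = 1.987e-3          # gas constant in kcal/(mol*K)
T = 310.15            # K, assumed body temperature
DG_STD_PRIME = -7.3   # kcal/mol for ATP + H2O -> ADP + Pi (from the text)

def atp_hydrolysis_dg(atp_m: float, adp_m: float, pi_m: float) -> float:
    """ΔG = ΔG°' + RT ln([ADP][Pi]/[ATP]), with concentrations in mol/L."""
    return DG_STD_PRIME + R * T * math.log(adp_m * pi_m / atp_m)

standard = atp_hydrolysis_dg(1.0, 1.0, 1.0)  # all species at 1 M
cellular = atp_hydrolysis_dg(atp_m=5e-3, adp_m=5e-4, pi_m=1e-2)  # hypothetical ATP/ADP values
print(f"standard: {standard:.1f} kcal/mol, cellular: {cellular:.1f} kcal/mol")
```

With these assumed values the cellular result is about -11.6 kcal/mol, in the neighborhood of the -12 kcal/mol quoted above; the exact figure depends on the ATP/ADP ratio chosen.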

Figure 2.31

ATP as a store of free energy. The bonds between the phosphate groups of ATP are called high-energy bonds because their hydrolysis results in a large decrease in free energy. ATP can be hydrolyzed either to ADP plus a phosphate group (HPO₄²⁻) or to AMP…

Alternatively, ATP can be hydrolyzed to AMP plus pyrophosphate (PPi). This reaction yields about the same amount of free energy as the hydrolysis of ATP to ADP does. However, the pyrophosphate produced by this reaction is then itself rapidly hydrolyzed, with a ΔG similar to that of ATP hydrolysis. Thus, the total free-energy change resulting from the hydrolysis of ATP to AMP is approximately twice that obtained by the hydrolysis of ATP to ADP. For comparison, the bond between the sugar and phosphate group of AMP, rather than having high energy, is typical of covalent bonds; for the hydrolysis of AMP, ΔG°′ = -3.3 kcal/mol.

Because of the accompanying decrease in free energy, the hydrolysis of ATP can be used to drive other energy-requiring reactions within the cell. For example, the first reaction in glycolysis (discussed in the next section) is the conversion of glucose to glucose-6-phosphate. The reaction can be written as follows:

$$\text{Glucose} + \text{HPO}_4^{2-} \rightarrow \text{Glucose-6-phosphate} + \text{H}_2\text{O}$$

Because this reaction is energetically unfavorable as written (ΔG°′ = +3.3 kcal/mol), it must be driven in the forward direction by being coupled to ATP hydrolysis (ΔG°′ = -7.3 kcal/mol):

$$\text{ATP} + \text{H}_2\text{O} \rightarrow \text{ADP} + \text{HPO}_4^{2-}$$

The combined reaction can be written as follows:

$$\text{Glucose} + \text{ATP} \rightarrow \text{Glucose-6-phosphate} + \text{ADP}$$

The free-energy change for this reaction is the sum of the free-energy changes for the individual reactions, so for the coupled reaction ΔG°′ = -4.0 kcal/mol, favoring glucose-6-phosphate formation.
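Using the numbers given for glucose phosphorylation, the coupling can be checked and, through ΔG°′ = -RT ln K, translated into equilibrium constants. This Python sketch assumes a temperature of 298.15 K and R = 1.987 × 10⁻³ kcal/(mol·K); the helper name `k_from_dg` is hypothetical. It shows the equilibrium constant moving from well below 1 (glucose-6-phosphate disfavored) to well above 1 once the reaction is coupled to ATP hydrolysis.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)
T = 298.15    # K, assumed temperature

def k_from_dg(dg_std_prime: float) -> float:
    """Equilibrium constant from ΔG°' = -RT ln K."""
    return math.exp(-dg_std_prime / (R * T))

phosphorylation = +3.3  # glucose + Pi -> glucose-6-phosphate + H2O (kcal/mol)
atp_hydrolysis = -7.3   # ATP + H2O -> ADP + Pi (kcal/mol)
coupled = phosphorylation + atp_hydrolysis  # glucose + ATP -> glucose-6-phosphate + ADP

print(f"coupled ΔG°' = {coupled:.1f} kcal/mol")  # coupled ΔG°' = -4.0 kcal/mol
print(f"K uncoupled ≈ {k_from_dg(phosphorylation):.2e}")
print(f"K coupled   ≈ {k_from_dg(coupled):.2e}")
```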

Other molecules, including other nucleoside triphosphates (e.g., GTP), also have high-energy bonds and can be used as ATP is to drive energy-requiring reactions. For most reactions, however, ATP provides the free energy. The energy-yielding reactions within the cell are therefore coupled to ATP synthesis, while the energy-requiring reactions are coupled to ATP hydrolysis. The high-energy bonds of ATP thus play a central role in cell metabolism by serving as a usable storage form of free energy.

3.10 Summary

  • Biochemical reactions are chemical reactions that take place inside of living things. The sum of all of the biochemical reactions in an organism is called metabolism.
  • Metabolism includes catabolic reactions, which are energy-releasing (exergonic) reactions, as well as anabolic reactions, which are energy-absorbing (endergonic) reactions.
  • Most biochemical reactions need a biological catalyst called an enzyme to speed up the reaction. Enzymes reduce the amount of activation energy needed for the reaction to begin. Most enzymes are proteins that affect just one specific substance, which is called the enzyme’s substrate.
  • There are many inherited metabolic disorders in humans. Most of them are caused by a single defective or missing enzyme.


Cells perform the functions of life through various chemical reactions. A cell’s metabolism refers to the combination of chemical reactions that take place within it. Catabolic reactions break down complex chemicals into simpler ones and are associated with energy release. Anabolic processes build complex molecules out of simpler ones and require energy.

In studying energy, the term system refers to the matter and environment involved in energy transfers. Entropy is a measure of the disorder of a system. The physical laws that describe the transfer of energy are the laws of thermodynamics. The first law states that the total amount of energy in the universe is constant. The second law of thermodynamics states that every energy transfer involves some loss of energy in an unusable form, such as heat energy. Energy comes in different forms: kinetic, potential, and free. The change in free energy of a reaction can be negative (releases energy, exergonic) or positive (consumes energy, endergonic). All reactions require an initial input of energy to proceed, called the activation energy.
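The sign convention in the paragraph above amounts to a three-way test on ΔG. The Python sketch below applies it to two values quoted in the free-energy discussion on this page (ATP hydrolysis, -7.3 kcal/mol, and glucose phosphorylation by Pi, +3.3 kcal/mol); the function name is hypothetical. The sign says nothing about speed: even an exergonic reaction still needs its activation energy.

```python
def classify_reaction(delta_g_kcal: float) -> str:
    """Label a reaction by the sign of its free-energy change."""
    if delta_g_kcal < 0:
        return "exergonic: releases free energy"
    if delta_g_kcal > 0:
        return "endergonic: consumes free energy"
    return "no net free-energy change"

for name, dg in [("ATP hydrolysis", -7.3), ("glucose + Pi -> glucose-6-phosphate", +3.3)]:
    print(f"{name}: {classify_reaction(dg)}")
```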

Enzymes are chemical catalysts that speed up chemical reactions by lowering their activation energy. Enzymes have an active site with a unique chemical environment that fits particular chemical reactants for that enzyme, called substrates. Enzymes and substrates are thought to bind according to an induced-fit model. Enzyme action is regulated to conserve resources and respond optimally to the environment.
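One way to see why lowering activation energy speeds a reaction is the Arrhenius relation k = A·exp(-Ea/RT), which is not stated in the original text and is brought in here as a standard chemistry result. The Python sketch assumes the pre-exponential factor A is unchanged by the enzyme, so the rate ratio depends only on the drop in Ea; the activation energies and the 310.15 K temperature are hypothetical values for illustration.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def rate_enhancement(ea_uncatalyzed: float, ea_catalyzed: float, temp_k: float = 310.15) -> float:
    """k_cat / k_uncat from k = A exp(-Ea / RT), assuming the same A for both paths (Ea in J/mol)."""
    return math.exp((ea_uncatalyzed - ea_catalyzed) / (R * temp_k))

print(f"{rate_enhancement(75_000, 50_000):.2e}")  # hypothetical Ea values in J/mol
```

With the assumed 25 kJ/mol reduction at body temperature, the ratio comes out on the order of 10⁴.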

Practice Questions

Figure 1. Shown are some examples of endergonic processes (ones that require energy) and exergonic processes (ones that release energy). (credit a: modification of work by Natalie Maynor; credit b: modification of work by USDA; credit c: modification of work by Cory Zanker; credit d: modification of work by Harry Malsch)



Chothia C, Lesk AM: The relation between the divergence of sequence and structure in proteins. EMBO J. 1986, 5: 823-826.

Taylor WR, Orengo CA: Protein structure alignment. J Mol Biol. 1989, 208: 1-22. 10.1016/0022-2836(89)90084-3.

Redfern OC, Harrison A, Dallman T, Pearl FM, Orengo CA: CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures. PLoS Comput Biol. 2007, 3: e232-10.1371/journal.pcbi.0030232.

Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993, 233: 123-138. 10.1006/jmbi.1993.1489.

Krissinel E, Henrick K: Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr. 2004, 60: 2256-2268. 10.1107/S0907444904026460.

Madej T, Gibrat JF, Bryant SH: Threading a database of protein cores. Proteins. 1995, 23: 356-369. 10.1002/prot.340230309.

Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998, 11: 739-747. 10.1093/protein/11.9.739.

Kolodny R, Koehl P, Levitt M: Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol. 2005, 346: 1173-1188. 10.1016/j.jmb.2004.12.032.

Ye Y, Godzik A: Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics. 2003, 19 (Suppl 2): ii246-255.

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.

Polacco BJ, Babbitt PC: Automated discovery of 3D motifs for protein function annotation. Bioinformatics. 2006, 22: 723-730. 10.1093/bioinformatics/btk038.

Kleywegt GJ: Recognition of spatial motifs in protein structures. J Mol Biol. 1999, 285: 1887-1897. 10.1006/jmbi.1998.2393.

Wangikar PP, Tendulkar AV, Ramya S, Mali DN, Sarawagi S: Functional sites in protein families uncovered via an objective and automated graph theoretic approach. J Mol Biol. 2003, 326: 955-978. 10.1016/S0022-2836(02)01384-0.

Laskowski RA, Luscombe NM, Swindells MB, Thornton JM: Protein clefts in molecular recognition and function. Protein Sci. 1996, 5: 2438-2452.

Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph. 1995, 13: 323-330. 10.1016/0263-7855(95)00073-9.

Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, Pupko T, Ben-Tal N: ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures. Nucleic Acids Res. 2005, 33: W299-W302. 10.1093/nar/gki370.

Glaser F, Rosenberg Y, Kessel A, Pupko T, Ben-Tal N: The ConSurf-HSSP database: the mapping of evolutionary conservation among homologs onto PDB structures. Proteins. 2005, 58: 610-617. 10.1002/prot.20305.

Binkowski TA, Freeman P, Liang J: pvSOAR: detecting similar surface patterns of pocket and void surfaces of amino acid residues on proteins. Nucleic Acids Res. 2004, 32: W555-W558. 10.1093/nar/gkh390.

Dundas J, Ouyang Z, Tseng J, Binkowski A, Turpaz Y, Liang J: CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res. 2006, 34: W116-W118. 10.1093/nar/gkl282.

Bagley SC, Altman RB: Characterizing the microenvironment surrounding protein sites. Protein Sci. 1995, 4: 622-635.

Shulman-Peleg A, Nussinov R, Wolfson HJ: SiteEngines: recognition and comparison of binding sites and protein-protein interfaces. Nucleic Acids Res. 2005, 33: W337-W341. 10.1093/nar/gki482.

Sasin JM, Godzik A, Bujnicki JM: SURF'S UP! - protein classification by surface comparisons. J Biosci. 2007, 32: 97-100. 10.1007/s12038-007-0009-0.

Porter CT, Bartlett GJ, Thornton JM: The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 2004, 32: D129-D133. 10.1093/nar/gkh028.

Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA: PDBSiteScan: a program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins. Nucleic Acids Res. 2004, 32: W549-W554. 10.1093/nar/gkh439.

Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA: PDBSite: a database of the 3D structure of protein functional sites. Nucleic Acids Res. 2005, 33: D183-D187. 10.1093/nar/gki105.

George RA, Spriggs RV, Bartlett GJ, Gutteridge A, MacArthur MW, Porter CT, Al-Lazikani B, Thornton JM, Swindells MB: Effective function annotation through catalytic residue conservation. Proc Natl Acad Sci USA. 2005, 102: 12299-12304. 10.1073/pnas.0504833102.

Laskowski RA, Watson JD, Thornton JM: ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res. 2005, 33: W89-W93. 10.1093/nar/gki414.

Stark A, Russell RB: Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Res. 2003, 31: 3341-3344. 10.1093/nar/gkg506.

Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol. 1996, 257: 342-358. 10.1006/jmbi.1996.0167.

Kristensen DM, Ward RM, Lisewski AM, Erdin S, Chen BY, Fofanov VY, Kimmel M, Kavraki LE, Lichtarge O: Prediction of enzyme function based on 3D templates of evolutionarily important amino acids. BMC Bioinformatics. 2008, 9: 17-10.1186/1471-2105-9-17.

Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O: De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS ONE. 2008, 3: e2136-10.1371/journal.pone.0002136.

Herrgard S, Cammer SA, Hoffman BT, Knutson S, Gallina M, Speir JA, Fetrow JS, Baxter SM: Prediction of deleterious functional effects of amino acid mutations using a library of structure-based function descriptors. Proteins. 2003, 53: 806-816. 10.1002/prot.10458.

Pal D, Eisenberg D: Inference of protein function from protein structure. Structure. 2005, 13: 121-130. 10.1016/j.str.2004.10.015.

Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32: D449-D451. 10.1093/nar/gkh086.

Friedberg I, Harder T, Godzik A: JAFA: a protein function annotation meta-server. Nucleic Acids Res. 2006, 34: W379-W381. 10.1093/nar/gkl045.

von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417: 399-403. 10.1038/nature750.

Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Mark P, Stümpflen V, Mewes HW, Ruepp A, Frishman D: The MIPS mammalian protein-protein interaction database. Bioinformatics. 2005, 21: 832-834. 10.1093/bioinformatics/bti115.

Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein-interaction networks?. Genome Biol. 2006, 7: 120-10.1186/gb-2006-7-11-120.

Aloy P, Russell RB: Ten thousand interactions for the molecular biologist. Nat Biotechnol. 2004, 22: 1317-1321. 10.1038/nbt1018.

Finn RD, Marshall M, Bateman A: iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics. 2005, 21: 410-412. 10.1093/bioinformatics/bti011.

Stein A, Russell RB, Aloy P: 3did: interacting protein domains of known three-dimensional structure. Nucleic Acids Res. 2005, 33: D413-D417. 10.1093/nar/gki037.

Riley R, Lee C, Sabatti C, Eisenberg D: Inferring protein domain interactions from databases of interacting proteins. Genome Biol. 2005, 6: R89-10.1186/gb-2005-6-10-r89.

Jothi R, Cherukuri PF, Tasneem A, Przytycka TM: Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol. 2006, 362: 861-875. 10.1016/j.jmb.2006.07.072.

Pagel P, Wong P, Frishman D: A domain interaction map based on phylogenetic profiling. J Mol Biol. 2004, 344: 1331-1346. 10.1016/j.jmb.2004.10.019.

Pagel P, Oesterheld M, Tovstukhina O, Strack N, Stumpflen V, Frishman D: DIMA 2.0--predicted and known domain interactions. Nucleic Acids Res. 2008, D651-D655. 36 Database

Raghavachari B, Tasneem A, Przytycka TM, Jothi R: DOMINE: a database of protein domain interactions. Nucleic Acids Res. 2008, D656-D661. 36 Database

Moult J, Pedersen JT, Judson R, Fidelis K: A large-scale experiment to assess protein structure prediction methods. Proteins. 1995, 23: ii-v. 10.1002/prot.340230303.

Soro S, Tramontano A: The prediction of protein function at CASP6. Proteins. 2005, 61 (Suppl 7): 201-213. 10.1002/prot.20738.

Pellegrini-Calace M, Soro S, Tramontano A: Revisiting the prediction of protein function at CASP6. FEBS J. 2006, 273: 2977-2983. 10.1111/j.1742-4658.2006.05309.x.

3  MetaCyc Availability

MetaCyc is available in several different forms to facilitate different uses of the data:

The MetaCyc data are available through the MetaCyc Web site for interactive querying and visualization.

The MetaCyc data can be downloaded as a set of data files. These files can be parsed and queried using languages like Perl, or they can be loaded into other database systems. Documentation on the flat files is available, and the files themselves can be downloaded.
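As an illustration of parsing the flat files with a scripting language (Python here rather than Perl), the sketch below assumes the attribute-value layout used by the Pathway Tools `.dat` files: one `ATTRIBUTE - value` pair per line, `#` comment lines, lines starting with `/` continuing the previous value, and `//` ending a record. The function name and the sample records are hypothetical; check the layout against the documentation that comes with the download.

```python
from collections import defaultdict


def parse_attribute_value(text):
    """Parse records from an (assumed) attribute-value flat file into dicts of lists."""
    records, current, last = [], defaultdict(list), None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue                                  # skip blank and comment lines
        if line.startswith("//"):                     # end of the current record
            if current:
                records.append(dict(current))
            current, last = defaultdict(list), None
        elif line.startswith("/") and last is not None:
            current[last][-1] += " " + line[1:].strip()   # continuation of previous value
        elif " - " in line:
            attr, value = line.split(" - ", 1)
            current[attr].append(value.strip())       # attributes may repeat (e.g. LEFT)
            last = attr
    if current:                                       # tolerate a missing final '//'
        records.append(dict(current))
    return records


# Hypothetical excerpt in the assumed format
sample = """# hypothetical excerpt
UNIQUE-ID - RXN-EXAMPLE-1
LEFT - A
LEFT - B
RIGHT - C
//
UNIQUE-ID - RXN-EXAMPLE-2
LEFT - C
RIGHT - D
//
"""
records = parse_attribute_value(sample)
```

Each record becomes a dictionary of lists, because attributes such as `LEFT` can occur several times in one record.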

A downloadable program that combines the Pathway Tools software with MetaCyc and the other BioCyc PGDBs can be installed on computers at your site, with the following advantages:

It provides some functionality that is not provided by the MetaCyc Web site, such as comparative analysis of entire metabolic pathways

It usually runs more quickly than the Web site

It supports programmatic querying of MetaCyc using APIs in the Java, Perl, and Lisp languages

It supports running an equivalent of the BioCyc Web site on your intranet

Author information


Max Planck Institute for Mathematics in the Sciences, Inselstr 22, 04103, Leipzig, Germany

Areejit Samal & Jürgen Jost

INRA, UMR 0320/UMR 8120 Génétique Végétale, Univ Paris-Sud, F-91190, Gif-sur-Yvette, France

Areejit Samal & Olivier C Martin

Laboratoire de Physique Théorique et Modèles Statistiques, CNRS, Univ Paris-Sud, UMR 8626, F-91405, Orsay Cedex, France

Areejit Samal & Olivier C Martin

Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, CH-8057, Zurich, Switzerland

João F Matias Rodrigues & Andreas Wagner

Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, 1015, Lausanne, Switzerland

João F Matias Rodrigues & Andreas Wagner

The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA

Jürgen Jost & Andreas Wagner

Department of Biology, University of New Mexico, 167 Castetter Hall, Albuquerque, MSC03 2020, USA


Definition 1 (Hypergraphs and hyperarcs).

A directed hypergraph is a pair $H = (V, E)$, where $V = \{v_1, v_2, \dots, v_n\}$ is the set of vertices and $E = \{e_1, e_2, \dots, e_m\}$ is the set of hyperarcs. A hyperarc $e_i$ is an ordered pair $e_i = (X_i, Y_i)$ of disjoint subsets of $V$.

The set $X_i$ is also called the tail of $e_i$ and the set $Y_i$ is called the head, with reference to the graphical representation of arcs (directed edges) and hyperarcs as arrows.

We denote by $X : E \to \mathcal{P}(V)$ the map that, given a hyperarc $e_i$, returns its tail $X(e_i) \subseteq V$, where $\mathcal{P}(V)$ is the power set of $V$. Analogously, we use $Y : E \to \mathcal{P}(V)$ for the map that, given a hyperarc, returns its head.
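To make the notation concrete, the sketch below represents a hyperarc as a named triple `(name, tail, head)` with frozensets for $X_i$ and $Y_i$, and enforces the disjointness required by Definition 1. The names `Hyperarc`, `make_hypergraph`, `X`, and `Y` are illustrative assumptions, not part of the original method; the vertex set is taken as the union of all tails and heads, so isolated vertices are not represented.

```python
from typing import FrozenSet, NamedTuple


class Hyperarc(NamedTuple):
    """Hyperarc e_i = (X_i, Y_i) with a label: tail X_i and head Y_i."""
    name: str
    tail: FrozenSet[str]
    head: FrozenSet[str]


def make_hypergraph(triples):
    """Build (V, E) from (name, tail, head) triples, enforcing disjoint tail and head."""
    arcs = []
    for name, tail, head in triples:
        tail, head = frozenset(tail), frozenset(head)
        if tail & head:
            raise ValueError(f"{name}: tail and head must be disjoint subsets of V")
        arcs.append(Hyperarc(name, tail, head))
    # V is the union of all tails and heads (isolated vertices are not kept).
    vertices = set().union(*(arc.tail | arc.head for arc in arcs))
    return vertices, arcs


def X(arc):
    """Tail map X: E -> P(V)."""
    return arc.tail


def Y(arc):
    """Head map Y: E -> P(V)."""
    return arc.head


# Hypothetical two-reaction example
V, E = make_hypergraph([("R1", {"v1", "v2"}, {"v3"}), ("R2", {"v3"}, {"v4", "v5"})])
```

Because `Hyperarc` is a tuple, it can also be unpacked as `name, tail, head`, which keeps it compatible with plain triples.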

Definition 2 (Reactions and networks).

In a metabolic network, each vertex corresponds to a metabolite and each hyperarc corresponds to a reaction. A metabolic network of $n$ metabolites and $m$ reactions can be represented by an $n \times m$ stoichiometric matrix $S$, whose rows correspond to the $n$ metabolites and whose columns correspond to the $m$ reactions. Reaction $j$ is represented by the column vector $S_j = (s_{1j}, \dots, s_{nj})^T$, where $s_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$. Reactants have negative coefficients and products have positive coefficients.
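A minimal sketch of how such a matrix could be assembled from a reaction list, assuming each reaction is given as a pair of dictionaries that map metabolite names to positive coefficients. The function `stoichiometric_matrix` and the toy reactions are hypothetical; the code follows the sign convention above, subtracting reactant coefficients and adding product coefficients.

```python
def stoichiometric_matrix(metabolites, reactions):
    """Return S as a list of rows: one row per metabolite, one column per reaction.

    Each reaction is a (reactants, products) pair of dicts mapping a
    metabolite name to its (positive) stoichiometric coefficient.
    """
    row = {met: i for i, met in enumerate(metabolites)}
    S = [[0] * len(reactions) for _ in metabolites]
    for j, (reactants, products) in enumerate(reactions):
        for met, coeff in reactants.items():
            S[row[met]][j] -= coeff      # reactants get negative coefficients
        for met, coeff in products.items():
            S[row[met]][j] += coeff      # products get positive coefficients
    return S


# Hypothetical toy network: R1: 2 A -> B, R2: B -> C
S = stoichiometric_matrix(["A", "B", "C"], [({"A": 2}, {"B": 1}), ({"B": 1}, {"C": 1})])
# S == [[-2, 0], [1, -1], [0, 1]]
```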

Examples of a hypergraph, a network, and a stoichiometric matrix are given in Figure 2A. Note that the stoichiometric coefficients of the reactions are not taken into account in the hypergraph representation. Note also that the pair $(X, Y)$ is ordered so as to distinguish reactants from products. In this representation, reactions are irreversible. Many biochemical reactions can be considered irreversible, since in organisms the homeostatic equilibrium is often strongly polarized. Nonetheless, metabolic networks may comprise reversible reactions; we model each of them by introducing both hyperarcs $(X, Y)$ and $(Y, X)$.
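The treatment of reversible reactions can be sketched as follows, assuming reactions are listed as `(name, reactants, products, reversible)` tuples. The function name and the `_rev` suffix for the backward hyperarc are hypothetical conventions; coefficients are omitted because the hypergraph does not use them.

```python
def to_hyperarcs(reactions):
    """Turn (name, reactants, products, reversible) tuples into irreversible hyperarcs.

    An irreversible reaction gives one hyperarc (X, Y); a reversible one
    gives both (X, Y) and (Y, X).
    """
    arcs = []
    for name, reactants, products, reversible in reactions:
        X, Y = frozenset(reactants), frozenset(products)
        arcs.append((name, X, Y))
        if reversible:
            arcs.append((name + "_rev", Y, X))   # hypothetical naming for the backward arc
    return arcs


arcs = to_hyperarcs([
    ("R1", {"A"}, {"B", "C"}, False),   # irreversible
    ("R2", {"C"}, {"D"}, True),         # reversible: yields two hyperarcs
])
```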

Figure 2. Reactions, hyperpath, and corresponding stoichiometric matrix. A) A set of reactions (top), the corresponding hypergraph, and the corresponding stoichiometric matrix (bottom). The hypergraph shown is a hyperpath from v2 and v4 (source nodes) to the target vertex v8. The reactions can be ordered R3, R2, R1 so that the conditions required by the hyperpath definition (3) are satisfied. B) A hypergraph that is not a hyperpath: the hypergraph {R1, R2} is not a hyperpath with source a. C) A minimal hyperpath: this hyperpath is a subset of the hyperpath in Figure 2A and is minimal (both R1 and R3 are necessary to link v8 to the source). D) A hypergraph representing a toy metabolic network: given v1 and v4 as sources, the reachable vertices are v5, v7, v8, v9, and v12 (light green). v2 and v3 are bootstrap compounds: the presence of one of them permits its own production. The compounds v10 and v11 (red) are supplements for the production of v2, v3, v6, and v8.

Hyperpaths, a generalization of the cycle-free paths that lead from one vertex to another in ordinary graphs, are used to represent pathways. A hyperpath connects a source set of vertices to a target set of vertices. Two examples of hyperpaths are given in Figures 2A and 2C. Note that a set $E$ of hyperarcs naturally defines a hypergraph $\mathcal{E} = \left(\bigcup_{e \in E} X(e) \cup \bigcup_{e \in E} Y(e),\; E\right)$. By abuse of terminology, we also denote by $E$ the hypergraph formed by the hyperarcs in $E$ together with all their heads and tails. The following definition of hyperpaths is borrowed from Nielsen et al. [22].

Definition 3 (Hyperpaths).

A hyperpath $P$ going from a source subset $S_H$ of $V$ to a target subset $T_P$ of $V_P$ in a hypergraph $H = (V, E)$ is a hypergraph $H_P = (V_P, E_P)$ with $V_P \subseteq V$ and $E_P \subseteq E$ such that there is an ordering $F$ of the hyperarcs in $E_P$ with the following properties.

$$\forall k \in \{1, \dots, |F|\}: \quad X(F_k) \subseteq S_H \cup \bigcup_{j < k} Y(F_j)$$

From the point of view of metabolism, the first condition corresponds to the requirement that the reactants of each reaction in the hyperpath can be produced without the presence of that reaction itself. Hyperpaths defined in this manner represent a metabolic route from the source to the target. According to definition (3), the hypergraph of Figure 2B with source a is not a hyperpath, because neither reaction R1 nor R2 can occur until the other has occurred. Definition (3), though complex, is computationally tractable: the time required to determine whether a hypergraph is a hyperpath is proportional to the number of reactions. A polynomial-time algorithm for this test is given in [23]; the algorithm FindAll presented below can also be used for that purpose. In fact, as discussed below, if the set of reactions returned by FindAll$(H_P, S_H)$ contains all the reactions in $H_P$, then $H_P$ is a hyperpath.
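As an illustration of the ordering condition, the sketch below checks a given ordering directly and also searches for an admissible ordering greedily. Hyperarcs are `(name, tail, head)` triples; the function names and the toy reactions (a two-reaction cycle in the spirit of Figure 2B, whose actual reactions are not reproduced here) are assumptions. The greedy search is a simple quadratic check, not the linear-time FindAll.

```python
def satisfies_ordering(ordering, source):
    """Check the condition of definition (3) for a given ordering F of hyperarcs.

    F_k is admissible if its tail lies in the source plus the heads of
    all earlier hyperarcs F_1, ..., F_{k-1}.
    """
    available = set(source)
    for _name, tail, head in ordering:
        if not set(tail) <= available:
            return False
        available |= set(head)
    return True


def is_hyperpath(arcs, source):
    """Look for an admissible ordering by repeatedly firing any arc whose tail is available.

    Firing only ever adds vertices, so firing one arc never blocks another;
    if some admissible ordering exists, this greedy search finds one.
    """
    available, pending = set(source), list(arcs)
    progress = True
    while pending and progress:
        progress = False
        for arc in list(pending):
            _name, tail, head = arc
            if set(tail) <= available:
                available |= set(head)
                pending.remove(arc)
                progress = True
    return not pending


# Hypothetical toy data: R1 and R2 each need the other's product.
cycle = [("R1", {"a", "c"}, {"b"}), ("R2", {"a", "b"}, {"c"})]
chain = [("R2", {"b"}, {"c"}), ("R1", {"a"}, {"b"})]
```

`is_hyperpath(cycle, {"a"})` is false, as for Figure 2B, while `chain` is a hyperpath even though its listed order is not admissible; its reverse is.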

The metabolic network described by a hypergraph has to be as comprehensive as possible, containing every known enzyme-catalyzed reaction occurring in organisms. We say that a hyperpath produces a set of target metabolites if it contains all those target elements. A set of target compounds is said to be reachable from a given source, or linked to the source, if there is at least one hyperpath producing the targets.

We are interested in enumerating the pathways that lead to the production of a desired compound. In general, hyperpaths do not give the best representation of pathways, because they can contain reactions that are not needed to link the target to the source. Minimal hyperpaths (cf. definition (4)) are an appropriate representation of pathways, since they contain only the essential reactions linking the source to the target.

In the definition given below, we say that a hyperpath $P = (V, E)$ is a subset of another hyperpath $P' = (V', E')$ if $V \subseteq V'$ and $E \subseteq E'$. For instance, the hyperpath of Figure 2C is a subset of the one in Figure 2A.

Definition 4 (Minimal Hyperpaths).

A hyperpath $(V_P, E_P)$ with target $T_P$ is said to be minimal if it has no proper subsets with the same target.

Removing any reaction from a minimal hyperpath disconnects the target from the source; in this sense, minimal hyperpaths cannot be reduced. From a metabolic engineering perspective, the concept of a minimal hyperpath is useful because it defines a minimal set of reactions necessary to produce a heterologous target compound and, consequently, a minimal set of enzymes that need to be inserted into the chassis organism in which the compound is to be produced.
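The irreducibility property can be tested directly, as in this sketch (hypothetical helper names, hyperarcs as `(name, tail, head)` triples with set-valued tails and heads). Because removing reactions can only shrink the set of reachable metabolites, it suffices to try single removals: if none of them leaves the targets reachable, no proper subset does either. A set that passes the test is also a hyperpath, since a reaction that never fires could be removed without effect.

```python
def closure(arcs, source):
    """All vertices producible from `source` by repeatedly firing hyperarcs."""
    reach, changed = set(source), True
    while changed:
        changed = False
        for _name, tail, head in arcs:
            if set(tail) <= reach and not set(head) <= reach:
                reach |= set(head)
                changed = True
    return reach


def is_minimal(arcs, source, targets):
    """True if the targets are reachable and every single reaction is indispensable."""
    targets = set(targets)
    if not targets <= closure(arcs, source):
        return False
    for i in range(len(arcs)):
        without_i = arcs[:i] + arcs[i + 1:]
        if targets <= closure(without_i, source):
            return False   # reaction i is not needed, so a proper subset already works
    return True


# Hypothetical toy pathway a -> b -> c
chain = [("R1", {"a"}, {"b"}), ("R2", {"b"}, {"c"})]
```

Adding an unrelated reaction (for example `a -> d`) or a shortcut (`a -> c`) to `chain` makes the set non-minimal for target `c`.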

In the following, we define $B(H, S_H)$ to be the set of all molecules linked to the source for a given hypergraph $H$ and source set $S_H$. Characterizing $B(H, S_H)$ is the first task to be solved before the enumeration. Once this set is known, all the minimal hyperpaths can be enumerated for all the molecules associated with the vertices in $B(H, S_H)$.


Supplements for a target are molecules whose presence in the source set increases the number of pathways for target production. Finding supplements is an important improvement when exploring ways to produce the target, since they make new pathways possible.

For each target of interest, one can look for vertices that, once inserted into $S_H$, open up pathways that would otherwise be impassable. In metabolic terms, we are looking for "supplement" molecules, i.e., molecules whose introduction into the source set yields more pathways than would otherwise be available. We introduce below FindSupp, an algorithm that returns the supplements.
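The sketch below implements a deliberately simplified supplement test and is not FindSupp: it flags a vertex only if the target is unreachable from the source alone but becomes reachable once that vertex is added. The broader criterion in the text (more pathways than before) would also catch supplements for targets that are already reachable, such as v8 in Figure 2D. The function names and toy reactions are hypothetical.

```python
def closure(arcs, source):
    """All vertices producible from `source` by repeatedly firing hyperarcs."""
    reach, changed = set(source), True
    while changed:
        changed = False
        for _name, tail, head in arcs:
            if set(tail) <= reach and not set(head) <= reach:
                reach |= set(head)
                changed = True
    return reach


def simple_supplements(arcs, source, target):
    """Vertices whose addition to the source makes an unreachable target reachable."""
    base = closure(arcs, source)
    if target in base:
        return set()   # already reachable: this simplified test reports nothing
    vertices = set().union(*(set(tail) | set(head) for _n, tail, head in arcs))
    return {v for v in vertices - base - {target}
            if target in closure(arcs, set(source) | {v})}


# Hypothetical toy network: t needs s, and s can be made from x.
arcs = [("R1", {"a", "s"}, {"t"}), ("R2", {"x"}, {"s"})]
```

Here both `s` and `x` are reported for target `t` with source `{"a"}`, since supplying either one opens a route to `t`.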

An analysis of pathways containing supplements makes it possible to identify pathways containing bootstrap molecules, i.e., metabolites that are needed in reactions producing compounds that are afterwards used to produce the bootstrap molecules themselves. Indeed, many pathways can be made viable once bootstrap molecules become available in the metabolic network (a concept introduced in [20]). Loosely speaking, bootstrap molecules are molecules that cannot be produced by the reactions of a hyperpath unless they are already present in the source. Cottret et al. [21] stated that, given a source set, the existence of a pathway making use of bootstrap molecules can be tested in polynomial time. Later in this section we provide an algorithm that returns the bootstrap compounds; this algorithm can be used to determine whether a target molecule is connected to the source through a pathway that makes use of bootstraps.
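One possible test for bootstrap compounds, written as a hypothetical sketch rather than the algorithm announced in the text: a vertex outside the reachable set is flagged when, after it is added to the source, some reaction that can then fire produces it again. Such a reaction cannot fire from the original source alone, so its firing depends on the added vertex. The toy network mimics the description of v2 and v3 in Figure 2D, but its reactions are invented.

```python
def closure(arcs, source):
    """All vertices producible from `source` by repeatedly firing hyperarcs."""
    reach, changed = set(source), True
    while changed:
        changed = False
        for _name, tail, head in arcs:
            if set(tail) <= reach and not set(head) <= reach:
                reach |= set(head)
                changed = True
    return reach


def bootstrap_compounds(arcs, source):
    """Unreachable vertices whose presence in the source permits their own production."""
    base = closure(arcs, source)
    vertices = set().union(*(set(tail) | set(head) for _n, tail, head in arcs))
    found = set()
    for v in vertices - base:
        reach = closure(arcs, set(source) | {v})
        # An arc that now fires and produces v cannot have its tail inside `base`
        # (otherwise v would already be reachable), so it depends on v.
        if any(v in head for _n, tail, head in arcs if set(tail) <= reach):
            found.add(v)
    return found


# Hypothetical toy network: v2 is needed to make v3, and v3 regenerates v2.
arcs = [("R1", {"a", "v2"}, {"v3"}), ("R2", {"v3"}, {"v2", "b"})]
```

The test needs one closure per candidate vertex, so it runs in polynomial time, consistent with the result cited from Cottret et al.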

Enumerating pathways using the steady state approach

In steady state, all possible pathways in a metabolic network are by definition stoichiometrically balanced, i.e., all metabolites produced from the source set must be consumed, except for those that are target products. Extreme pathways and elementary modes are two methods that compute the set of independent, non-decomposable pathways in the network that generate all feasible steady-state solutions in the flux space. They do not directly enumerate all pathways linking a source set to a target set of compounds. However, one can construct stoichiometric matrices in which input fluxes are added for the source compounds and outgoing fluxes are associated with the target and heterologous co-products, so that the extreme pathways and elementary modes enumerated from these matrices can be used to generate all pathways linking the source set to the target.

Given a hyperpath $H_P = (V_P, E_P)$ of a hypergraph $H = (V, E)$, we can define a set of flux vectors $v_P$ for the hyperpath in which the components $v_{Pj}$ corresponding to the reactions $e_j \in E_P$ of the pathway are active:

$$v_{Pj} > 0 \ \text{ if } e_j \in E_P, \qquad v_{Pj} = 0 \ \text{ otherwise.}$$

A hyperpath $H_P = (V_P, E_P)$ of a hypergraph $H = (V, E)$ with input source subset $S_H$ and target subset $T_P$ is defined as stoichiometrically balanced if the rows corresponding to each metabolite $v_i \in V$ in the product of the stoichiometric matrix $S$ and the associated flux vector $v_P$ satisfy:

One way to introduce the constraints on input and output metabolites into the previous equation is to add to the stoichiometric matrix $S$ additional columns corresponding to input reactions (reactions with no substrate that produce the source set $S_H$) and output reactions (reactions with no product that consume the target metabolites $T_P$). Although these auxiliary reactions do not satisfy the law of conservation of mass, they are useful for defining the problem completely and compactly:
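A small sketch of this augmentation under assumed conventions: each input column has +1 in the row of one source metabolite, each output column has -1 in the row of one target, and a flux vector over the augmented columns is accepted if $S_{aug} v = 0$ with all fluxes non-negative. The function names and the toy network are hypothetical, and no elementary-mode enumeration is attempted here.

```python
def augment(S, metabolites, sources, targets):
    """Append input columns (+1 for a source) and output columns (-1 for a target) to S."""
    row = {m: i for i, m in enumerate(metabolites)}
    aug = [list(r) for r in S]                 # copy, so S itself is unchanged
    extra = [(s, 1) for s in sources] + [(t, -1) for t in targets]
    for met, sign in extra:
        for i, r in enumerate(aug):
            r.append(sign if i == row[met] else 0)
    return aug


def is_balanced(S_aug, v, tol=1e-9):
    """Steady state with irreversible reactions: S_aug v = 0 and every flux >= 0."""
    if any(x < -tol for x in v):
        return False
    return all(abs(sum(a * x for a, x in zip(r, v))) <= tol for r in S_aug)


# Hypothetical toy network: R1: A -> B, R2: B -> C; source A, target C.
metabolites = ["A", "B", "C"]
S = [[-1, 0], [1, -1], [0, 1]]
S_aug = augment(S, metabolites, sources=["A"], targets=["C"])
# Columns of S_aug: R1, R2, input of A, output of C
```

The flux vector `[1, 1, 1, 1]` is balanced, whereas `[1, 0, 1, 0]` leaves B accumulating and is rejected.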

Both extreme pathways and elementary modes use this formulation to compute the set of feasible solutions $v$. Since all reactions in our hypergraph definition are irreversible, the sets of pathways solving Equation 3 computed by extreme pathways and by elementary modes are identical (cf. [5]). Furthermore, solutions $v$ must contain only non-negative fluxes.

To determine all stoichiometrically balanced heterologous pathways $H_P$ that can be inserted into the chassis organism to produce a target set $T_P$, we need to restrict the computation to elementary modes with non-zero fluxes through heterologous reactions. The divide-and-conquer approach [24, 25] addresses this problem efficiently by rearranging the constraints in echelon form so that the constraints containing only the desired reactions appear at the bottom. To define the constraints in our case, we first consider the hypergraph $R_T$ formed only by heterologous reactions. $R_T$ is the sub-hypergraph of $R = (V, E)$ formed by the hyperedges in which at least one vertex does not belong to the source set $S_R$, i.e., to the set of metabolites endogenous to the chassis organism. By considering $R_T$ instead of the full hypergraph $R$, we look only for biosynthetic pathways involving heterologous reactions and therefore avoid cycles internal to the chassis organism. Accordingly, to compute all feasible steady-state heterologous pathways, we reformulate Equation 2 so that the stoichiometric matrix $S$ is defined by the reactions in $R_T$, the input is given by all substrates in the source set, $S_R \cap X(E_R)$, and the output by all products of the reactions in the hypergraph, $Y(E_R)$.
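The construction of $R_T$ and of its inputs and outputs can be sketched as a simple filter over hyperarcs, again written as `(name, tail, head)` triples. The names `heterologous_part` and `inputs_outputs` and the toy metabolites are hypothetical; the input and output sets follow the text's $S_R \cap X(E_R)$ and $Y(E_R)$, taken over the retained hyperarcs.

```python
def heterologous_part(arcs, endogenous):
    """Keep hyperarcs that touch at least one metabolite outside the chassis set S_R."""
    endogenous = set(endogenous)
    return [arc for arc in arcs if not (set(arc[1]) | set(arc[2])) <= endogenous]


def inputs_outputs(arcs_T, endogenous):
    """Inputs: endogenous substrates of R_T. Outputs: all products of R_T."""
    tails = set().union(*(set(tail) for _n, tail, _h in arcs_T))
    heads = set().union(*(set(head) for _n, _t, head in arcs_T))
    return set(endogenous) & tails, heads


# Hypothetical data: g and p are endogenous; x and y are heterologous.
arcs = [("R1", {"g"}, {"p"}), ("R2", {"p"}, {"x"}), ("R3", {"x"}, {"y"})]
R_T = heterologous_part(arcs, {"g", "p"})     # drops R1, which is internal to the chassis
inputs, outputs = inputs_outputs(R_T, {"g", "p"})
```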

Finally, from the computed set of solutions $v$ of Equation 3, we are interested in enumerating all minimal hyperpaths from $S_R$ to the target set $T$ on the hypergraph $R_T$. According to Definition 4, the minimal hyperpaths for a target $T$ are the cycle-free solutions in $v$ that contain only reactions linking the source to the target. Since any feasible flux pattern $v$ is a superposition of elementary modes with non-negative coefficients [26], the set of minimal hyperpaths for a given target $T$ is a subset of the elementary modes producing $T$ that are solutions of Equation 3. Indeed, any feasible solution generated from elementary modes contains at least all the reactions of the elementary modes that form its basis. Therefore, no additional minimal hyperpaths can be generated by superposition of elementary modes.

Enumerating pathways using the topological approach

We first present the algorithm FindAll, which finds $B(H, S_H)$, the set of metabolites that can be linked to the source $S_H$ by a hyperpath. By explicitly constructing the ordered set $F$ of definition (3), FindAll provides a proof that checking whether a hypergraph is a hyperpath is tractable. Moreover, FindAll makes it possible to prune the original hypergraph, enabling a faster enumeration algorithm.

As presented below, the algorithm Minimize, when called on the output of FindAll, returns a minimal hyperpath linking a given target to the source, if one exists. These two algorithms are the main components of FindPath, the pathway-enumeration algorithm described next. We then present FindSupp, an algorithm that enumerates supplements.
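Minimize is described here only by its input and output. One simple way to obtain a minimal hyperpath, shown below as an assumption rather than the authors' procedure, is to drop reactions one at a time and keep each removal that leaves the target reachable. Because reachability can only shrink as reactions are removed, every reaction that survives is necessary, so the result is minimal; which minimal hyperpath is returned depends on the input order. The function names and toy reactions are hypothetical.

```python
def closure(arcs, source):
    """All vertices producible from `source` by repeatedly firing hyperarcs."""
    reach, changed = set(source), True
    while changed:
        changed = False
        for _name, tail, head in arcs:
            if set(tail) <= reach and not set(head) <= reach:
                reach |= set(head)
                changed = True
    return reach


def minimize(arcs, source, target):
    """Greedy reduction: drop reactions while the target stays reachable (or None)."""
    if target not in closure(arcs, source):
        return None                      # no hyperpath links the target to the source
    kept, i = list(arcs), 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if target in closure(trial, source):
            kept = trial                 # reaction i was not needed
        else:
            i += 1                       # reaction i is necessary; keep it
    return kept


# Hypothetical network with two alternative routes from a to c
arcs = [("R1", {"a"}, {"b"}), ("R2", {"b"}, {"c"}),
        ("R3", {"a"}, {"d"}), ("R4", {"d"}, {"c"})]
path = minimize(arcs, {"a"}, "c")
```

With this input order the route through `d` (R3, R4) is kept; listing R3 and R4 first would return R1, R2 instead.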

Finding one minimal hyperpath

Let $H = (V, E)$ be the hypergraph representing the set of metabolic reactions, with $n = |V|$ and $m = |E|$, and let $S_H$ be the set of source vertices representing the source metabolites.

The algorithm FindAll returns all the reactions that can contribute to the production of any element of $B(H, S_H)$, i.e., of the set of all compounds that can be connected to the source. FindAll is linear in the number of vertices, the number of hyperarcs, and the total coordination: its complexity is $O\left(n + m + \sum_{v \in V} \left(|X^{-1}(v)| + |Y^{-1}(v)|\right)\right)$, which is bounded by $O(n + m + n \cdot m)$. Therefore, the algorithm can be applied to the hypergraph $H$ of all reactions to obtain a pruned sub-hypergraph $H' = (V', E')$, where $V' := S_H \cup B$ and $E'$ is the set of reactions returned by FindAll. In the context of metabolic engineering, FindAll returns all the compounds that can be produced from a given set of source compounds and reactions. For instance, by running FindAll with all known metabolic reactions, one can determine all the compounds that can be produced from the metabolites of E. coli.

Algorithm FindAll (given a hypergraph $H$ and a source $S_H$, returns all the hyperarcs that are part of at least one hyperpath).
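The following is a sketch of linear-time forward propagation consistent with the description of FindAll given earlier; it is not the authors' pseudocode, and the function name and data layout are assumptions. Each hyperarc keeps a counter of tail vertices not yet reached and fires when the counter drops to zero. Every vertex is processed once and every tail membership is decremented once, which matches the stated bound in the total coordination. The returned vertex set includes the source, so it corresponds to $V' = S_H \cup B$ of the pruned hypergraph.

```python
from collections import defaultdict, deque


def find_all(arcs, source):
    """Return (reached vertices, names of hyperarcs that can fire, in firing order)."""
    missing = [len(set(tail)) for _name, tail, _head in arcs]  # tail vertices not yet reached
    uses = defaultdict(list)                                   # vertex -> arcs using it in their tail
    for i, (_name, tail, _head) in enumerate(arcs):
        for v in set(tail):
            uses[v].append(i)

    reached, fired = set(), []
    queue = deque(source)

    def fire(i):
        fired.append(arcs[i][0])
        queue.extend(arcs[i][2])   # head vertices become available

    for i, count in enumerate(missing):
        if count == 0:             # hyperarcs with an empty tail fire immediately
            fire(i)
    while queue:
        v = queue.popleft()
        if v in reached:
            continue               # each vertex is processed only once
        reached.add(v)
        for i in uses[v]:
            missing[i] -= 1
            if missing[i] == 0:    # whole tail reached: the hyperarc can fire
                fire(i)
    return reached, fired


# Hypothetical network; R4 is disconnected from the source.
arcs = [("R1", {"a"}, {"b"}), ("R2", {"b", "c"}, {"d"}),
        ("R3", {"b"}, {"c"}), ("R4", {"x"}, {"y"})]
reached, fired = find_all(arcs, {"a"})
```

Keeping only the hyperarcs in `fired` and the vertices in `reached` yields the pruned sub-hypergraph; a candidate $H_P$ is a hyperpath exactly when all its reactions appear in `fired` for source $S_H$.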
