Designing a Thermostable Cellobiohydrolase: A Novel Approach to Sustainable Ethanol Production


Due to depleting oil reserves and the threat of global warming, renewable energy sources such as the biofuel ethanol are being explored as alternatives to petroleum. Cellulose, the most common naturally occurring polymer on Earth, can be a feedstock for cellulosic ethanol. Cellulose is converted to glucose by cellulase enzymes, one of which is cellobiohydrolase. Cellobiohydrolase does not, however, work fast enough for sustainable ethanol production. The goal of this project is to create a modified, thermostable Cellobiohydrolase Cel7a from Hypocrea jecorina that can hydrolyze cellulose at a higher rate at a higher temperature. The enzyme was modified by selectively replacing amino acids using site-directed mutagenesis. Using UGENE 1.25, multiple cellobiohydrolase FASTA sequences were analyzed to identify conserved domains, active sites, and potential targets for modification. Each replacement amino acid was carefully chosen for beneficial changes such as extra hydrogen bonds, cation-pi interactions, disulfide bonds, and ionic bonds. Using the protein modeling software I-TASSER 5.0 and CHIMERA 1.2, a 3D model of the enzyme was created to verify that the active site was not affected by these changes. Using STRUM, the change in thermal stability for each substitution was calculated as a ΔΔG score. A positive ΔΔG score corresponds to increased thermal stability. The final 16 mutations made to the enzyme resulted in a net increase of hydrogen bonds, cation-pi interactions, disulfide bonds, and salt bridges. These changes are predicted to significantly increase the thermal stability of Cellobiohydrolase, improving the efficiency of ethanol production.



Question / Proposal

Research Question

What are the best possible single point mutations that can be made to the enzyme Cellobiohydrolase Cel7a from Hypocrea jecorina that improve the enzyme’s thermal stability while retaining the enzyme’s catalytic activity?



A common way to increase the rate of any reaction is to increase the temperature. When the temperature is increased, however, the enzyme may denature with the heat. Thus, increasing the thermal stability of the enzyme Cellobiohydrolase will allow the enzyme to function at higher temperatures and at a higher catalytic rate.


Engineering Goals

The goal of the project is to create a thermostable version of the enzyme Cellobiohydrolase Cel7a from Hypocrea jecorina so that it can break down cellulose at a commercially viable rate and at higher temperatures. The project will find targeted mutations that are beneficial to the thermal stability of cellobiohydrolase. The impact of these mutations will be measured by calculating a ΔΔG value. If the mutations increase thermal stability, the ΔΔG value will be a positive number.

The wild type protein will be modified using site-directed mutagenesis, and the ∆∆G will be calculated for each of these mutations. These carefully chosen single point mutations should modify the amino acid sequence enough to improve the overall stability of the molecule while still retaining the enzyme’s catalytic activity. An efficient method to produce cellulosic ethanol could revolutionize the energy sector and decrease dependency on fossil fuels.



Each year in the US, the use of fossil fuels releases 2 billion tons of CO2 into the atmosphere. The increased greenhouse gases are slowly leading to climate change. Most fossil fuels are used in the transportation sector. Biofuels, such as biodiesel and ethanol, can be used instead of gasoline. About 97 million barrels of crude oil are used per day, which is roughly 35 billion barrels per year. Approximately 64% of this crude oil is refined into fuel for vehicles. Ethanol, a biofuel, has the potential to replace gasoline in these vehicles. The main problem is that current ethanol production comes from corn. Ethanol production consumes about 13% of the corn harvest, which limits the availability of corn for livestock, food, and other industrial uses, and yet ethanol makes up only 10 percent of the fuel used by the transportation sector. Ethanol has traditionally been made by breaking down starches to glucose. This glucose is then fermented to form ethanol.

Current research has focused on finding an efficient method of breaking down cellulose into glucose so that it can be fermented to ethanol. Cellulose is the most common natural polymer on the planet. Nearly 150 billion tons of cellulose are produced each year, but very few organisms possess the enzymes required to decompose it.

The biggest obstacle in cellulosic ethanol production is that Cellobiohydrolase works very slowly, which makes the process costly and not commercially feasible. Cellobiohydrolase is not naturally effective at hydrolyzing cellulose at the high temperatures required for substantial energy production.

Only certain species of fungi and bacteria contain the enzymes endoglucanase, cellobiohydrolase, and beta-glucosidase needed to break down cellulose.

Endoglucanase creates random breaks in the cellulose polymer strand. Cellobiohydrolase attaches to these broken ends and cleaves the cellulose into cellobiose, a disaccharide of glucose. Beta-glucosidase then breaks cellobiose into glucose.

Bioinformatics and protein design provide a new research tool to solve environmental problems at the molecular level. The Baker Lab at the University of Washington created the protein modeling software ROSETTA, which was used in this project. This software allows users to model proteins computationally from the FASTA (amino acid) sequence. Enzymes can be used as molecular tools to catalyze many reactions with their unique active sites and properties. At Caltech, Devin Trudeau, Toni Lee, and Nobel Prize winner Frances H. Arnold have used chimera and enzyme evolution techniques to create a modified endoglucanase that can withstand elevated temperatures and can be used in cellulosic ethanol production.

The Cellobiohydrolase Cel7a from the fungus Hypocrea jecorina is an exo-Cellobiohydrolase. The fungus excretes the enzyme into its surroundings before absorbing the glucose. Thus, the most efficient temperature for the enzyme is around room temperature.

If the thermal stability of Cellobiohydrolase were improved, the range of temperatures at which the enzyme works best would shift upward. The enzyme could then be run at a higher temperature without denaturing. Raising the temperature does not lower the activation energy, but it gives more molecules enough energy to overcome it, so the reaction speeds up.
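The link between temperature and reaction rate can be illustrated with the Arrhenius equation, k = A·exp(−Ea/RT). The sketch below is only an illustration: the activation energy of 50 kJ/mol and the two temperatures are hypothetical values, not measurements for Cel7a, and the function name is my own. It holds the activation energy fixed and compares the rate constant at two temperatures.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def arrhenius_rate_ratio(ea_j_per_mol, t_low_k, t_high_k):
    """Return k(t_high) / k(t_low) for a fixed activation energy.

    The pre-exponential factor A cancels out in the ratio.
    """
    return math.exp(-ea_j_per_mol / R * (1.0 / t_high_k - 1.0 / t_low_k))


# Hypothetical activation energy (50 kJ/mol), comparing 25 C with 60 C.
ratio = arrhenius_rate_ratio(50_000.0, 298.15, 333.15)
print(f"Rate constant increases about {ratio:.1f}-fold")
```

The ratio only applies as long as the enzyme stays folded at the higher temperature, which is why thermal stability is the property being engineered.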

Method / Testing and Redesign

Protein Sequence and 3D Structure Analysis:

1. Gather the FASTA sequences of exo-cellobiohydrolase enzymes from several different fungi

2. Using the software UGENE, compare the FASTA sequences to determine amino acid sequences that are conserved. The conserved areas may be significant ligand binding and catalytic sites that determine the exact shape and activity of the enzyme.

3. After determining which amino acid sequences to keep in the protein, use the software I-TASSER and CHIMERA to build and visualize a 3D model of the enzyme.

4. Convert the newly created model to an STL file to 3D print. Use the 3D printed model as a reference to determine the size and geometry of the enzyme.

5. Use site-directed mutagenesis to change selected amino acids that will improve thermal stability without changing the shape of the active site. The goal is to increase the number of bonds in the tertiary structure to create a more stable enzyme.

6. Use ROSETTA to model the change in the protein structure from the mutations.

(Diagram of STRUM, from Yang Zhang Lab)

7. Use STRUM analysis to determine a ΔΔG score for each of the mutations in the protein. If the score is positive, the thermal stability of the protein increased.

8. After modifying the protein, use this modified FASTA sequence that includes the SDM changes in I-TASSER to create a 3D model of the new protein.

9. Compare this Cellobiohydrolase model to the original model to determine whether the conserved domains were affected. If the shape of the active site changed, use a different single point mutation on the amino acid sequence and re-analyze it.
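Steps 5 and 8 involve turning a list of chosen substitutions into a modified FASTA sequence. A minimal sketch of that bookkeeping is shown here, assuming mutations are written in the common "wild-type residue, position, new residue" form (for example S6C). The function name, the toy sequence, and the two mutations are hypothetical and are not the ones used in the project.

```python
import re


def apply_point_mutations(sequence, mutations):
    """Apply substitutions such as 'S6C' to a protein sequence.

    Positions are 1-based. A ValueError is raised if a label cannot be
    parsed, is out of range, or names the wrong wild-type residue.
    """
    residues = list(sequence)
    for label in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            raise ValueError(f"Cannot parse mutation {label!r}")
        wild, new = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"Position out of range in {label!r}")
        if residues[position - 1] != wild:
            raise ValueError(f"Sequence does not have {wild} at {position}")
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical toy sequence and mutations, for illustration only.
toy_sequence = "MKTAYSGQ"
mutant = apply_point_mutations(toy_sequence, ["S6C", "Q8E"])
print(mutant)
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when alignment columns and sequence positions differ.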


This experiment was conducted at home on a computer running Linux. The Cellobiohydrolase model was modified using site-directed mutagenesis. Modified amino acids were carefully chosen based on their bonding capabilities. Proteins with more asparagine and glutamine have lower denaturation points. Adding glycine to an alpha helix will break the helix, decreasing stability. Adding cysteine will add disulfide bonds, which strengthen the protein’s tertiary structure. Adding acidic and basic amino acids will form salt bridges, and each salt bridge created increases stability by about 1 kcal/mol. It is important to make the protein as tightly packed as possible to increase hydrogen bonding and London dispersion forces. Hydrogen bonding makes up 40% of the protein’s thermal stability. These guidelines were used to choose which specific mutations to make in the Cellobiohydrolase.

The enzyme’s thermal stability was assessed using STRUM, a program that calculates the protein’s ΔΔG value using gradient boosting regression. The Gibbs free energy, ΔG, is calculated between the unfolded version of the protein and the folded version of the protein. The ΔG value is first calculated for both the wild type and modified proteins. The ΔΔG value is the change in the Gibbs free energy between the two proteins. A positive ΔΔG value indicates that the mutation increased thermal stability, meaning it takes more energy to denature the protein.
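The sign convention described above can be written out as a short sketch. This only illustrates the arithmetic and is not how STRUM works internally (STRUM predicts ΔΔG with a trained regression model). The function names and the two ΔG values are hypothetical.

```python
def delta_delta_g(dg_wild, dg_mutant):
    """Return ddG = dG(mutant) - dG(wild type) in kcal/mol.

    Here dG is the free energy needed to unfold the protein, so a
    larger dG means a more stable fold.
    """
    return dg_mutant - dg_wild


def classify(ddg):
    """Label a ddG value using the positive-is-stabilizing convention."""
    if ddg > 0:
        return "stabilizing"
    if ddg < 0:
        return "destabilizing"
    return "neutral"


# Hypothetical unfolding free energies in kcal/mol.
ddg = delta_delta_g(dg_wild=8.0, dg_mutant=9.5)
print(ddg, classify(ddg))
```

Some other tools use the opposite sign convention, so it is worth confirming the convention before comparing scores from different programs.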





Selection for Modification Sites in the Cellobiohydrolase Amino Acid Sequence

The FASTA sequence of Hypocrea jecorina Cel7a Cellobiohydrolase was compared in detail with 11 other fungal cellobiohydrolase sequences. Over 4500 amino acids were aligned to identify highly conserved sites, using the software UGENE to visualize and color-code the amino acids. Highly conserved sites were likely vital to the enzyme active site and were marked as areas that would not be modified. Sites where the HJCel7a did not match the consensus were target areas for single point mutations to modify the structure. Amino acids were changed to the most common amino acid found in the other fungal sequences that were compared. Sixteen mutations were selected that had a positive impact on thermal stability.
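The consensus comparison described above can be sketched in a few lines. This is a simplified illustration, not UGENE's algorithm: it assumes the sequences are already aligned to equal length with "-" for gaps, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def column_consensus(alignment):
    """Return (most common residue, fraction) for each alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("Aligned sequences must have equal length")
    consensus = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append((residue, count / len(column)))
    return consensus


def candidate_sites(target, others):
    """List (position, target residue, consensus residue) mismatches."""
    consensus = column_consensus(others)
    if len(target) != len(consensus):
        raise ValueError("Target must be aligned to the other sequences")
    sites = []
    for position, (residue, (common, _)) in enumerate(
        zip(target, consensus), start=1
    ):
        if "-" in (residue, common):
            continue  # skip gap columns
        if residue != common:
            sites.append((position, residue, common))
    return sites


def conserved_positions(target, others):
    """Positions where every sequence, including the target, agrees."""
    return [
        position
        for position, (residue, (common, fraction)) in enumerate(
            zip(target, column_consensus(others)), start=1
        )
        if fraction == 1.0 and residue == common
    ]


# Hypothetical toy alignment, for illustration only.
target = "MKTAYSGQ"
others = ["MKTACSGE", "MRTACSGE", "MKTACTGE"]
print(candidate_sites(target, others))
print(conserved_positions(target, others))
```

Positions returned by `conserved_positions` would be left alone, while `candidate_sites` gives the places where the target could be changed to the consensus residue.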



After multiple sequence alignment of the test protein with other proteins from the Cellobiohydrolase (CBH) family, 17 amino acid sites were identified for site-directed mutation, and an engineered, modified CBH was created. The table below shows the locations that were identified and the change made at each between the wild type and the mutant molecule. All of these locations were analyzed using STRUM to calculate the change in ∆G (∆∆G) for each mutation in kilocalories per mole; the results are in the data table below.

The new modified CBH molecule sequence was analyzed using I-TASSER and the 3D model generated was used to compare the following features with the wild molecule.

A positive ∆∆G value indicates that the modification improved the overall thermal stability of the protein. Multiple combinations of these modifications were also tested; this was the combination that gave the greatest ∆∆G.

To calculate the range of ∆∆G values for each mutation, the calculations were compiled from 18 separate trials. The mean and standard deviation of ∆∆G were calculated, and the expected distribution of ∆∆G values was plotted on a graph.
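The summary statistics for one mutation can be computed as in the sketch below. The six ∆∆G values are hypothetical placeholders (the project used 18 trials), and treating the expected distribution as a normal curve is an assumption made here for illustration.

```python
import statistics


def summarize_trials(values):
    """Return the mean, sample standard deviation, and a fitted normal."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return mean, stdev, statistics.NormalDist(mean, stdev)


# Hypothetical ddG values (kcal/mol) from repeated trials of one mutation.
trial_ddg = [0.82, 0.91, 0.77, 0.88, 0.95, 0.80]
mean_ddg, sd_ddg, dist = summarize_trials(trial_ddg)

print(f"mean = {mean_ddg:.2f} kcal/mol, sd = {sd_ddg:.2f} kcal/mol")
print(f"P(ddG <= 0) under the fitted normal = {dist.cdf(0.0):.3g}")
```

The fitted distribution gives a rough sense of how likely a mutation is to be destabilizing despite a positive mean score.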


Modeling of the 3D protein structures:

Using the protein modeling software I-TASSER from the Zhang Lab and ROSETTA from the University of Washington, both the wild type enzyme and the modified enzyme were modeled in 3D on the computer. The 3D structures were overlaid and compared to each other to validate that the modifications did not interfere with the shape of the active site of the enzyme. The active site of cellobiohydrolase is the cellulose tunnel through which the enzyme attaches to, cuts, and moves along the cellulose fiber.

Wild Type Models

Modified Type Models

Both Models

The results showed that the overall shapes of the wild type (tan) and modified type (blue) cellobiohydrolases were the same, and the active site was conserved. The modified version was also 3D printed to confirm that the active site tunnel was the correct shape. The modifications were labeled on the printed model to make sure they were not involved in the active site of the enzyme.
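A visual overlay can be complemented by a number such as the root-mean-square deviation (RMSD) between matching atoms. The sketch below assumes the two models have already been superimposed (for example in CHIMERA) and that the coordinate lists hold the same atoms in the same order. The coordinates are hypothetical, and this check was not part of the original analysis.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in angstroms between two pre-aligned lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("Need two non-empty lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical alpha-carbon coordinates for three residues.
wild_ca = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
mutant_ca = [(0.0, 0.0, 0.0), (1.5, 0.3, 0.0), (3.0, 0.0, 0.4)]
print(f"RMSD = {rmsd(wild_ca, mutant_ca):.3f} angstroms")
```

Restricting the coordinate lists to residues lining the active site tunnel would give a value focused on the region that must stay unchanged.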





This experiment modeled the 3D structure of the original cellobiohydrolase. Computationally, every individual amino acid was mapped and aligned to find conserved areas, as well as potential modification locations. A final protein design was engineered and 3D modeled. It was compared to and validated against the natural wild type enzyme structure. The total calculated ∆∆G from all the modifications in the engineered molecule was 13.77 kcal/mol.

The modifications caused many important changes. The number of residues stabilized by hydrogen bonds in the protein increased from 331 in the wild molecule to 335 in the engineered enzyme. The total accessible surface area decreased and the volume decreased by 3%, which will increase the stability inside the protein. This leads to closer packing of the residues and increased London dispersion forces. The engineered molecule has 40% fewer buried charges, lowering instability. The number of disulfide bonds increased by 3, improving tertiary structure. The number of ion pairs, including salt bridges and partial charge pairs, in the engineered protein increased by 7. These forces make a significant contribution to thermal stability. Lastly, the Enzyme Commission and Gene Ontology analyses of the wild protein and the engineered protein are identical, giving confidence that the engineered protein should retain the original enzyme functions.


Because this is a theoretical model, the protein may fold differently in a cellular environment than predicted. Producing the modified amino acid sequence would require targeted mutations to the gene. Unexpected difficulties may arise when making the calculated mutations while preserving other important parts of the protein. Multiple trials will allow for a higher success rate in achieving the desired modification of the protein, and developing more than one modification model improves the chances of creating a successfully modified protein.


Future Research

In the future, I would like to produce a sample of the customized Cellobiohydrolase. I will use UGENE to create the DNA sequence that codes for the modified protein, and use this to create the DNA strand. This strand can then be incorporated into a yeast plasmid, which will produce the new Cellobiohydrolase. I would then test the new enzyme’s activity at a range of temperatures to assess whether there was a significant improvement in thermal stability. To do this, I would need to contact other labs to create a plasmid DNA sequence for my protein so that it can be produced by the yeast.
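Going from a protein sequence back to a coding DNA sequence can be sketched as a simple table lookup. The table below picks one valid codon per amino acid from the standard genetic code. The choice of codons is arbitrary and not optimized for yeast or any other host, and the function name and toy peptide are hypothetical. A real design would use a codon-optimization tool.

```python
# One standard-genetic-code codon per amino acid (arbitrary choice).
CODONS = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}


def back_translate(protein, add_stop=True):
    """Return a DNA coding sequence for a protein sequence.

    Raises KeyError for any character that is not one of the
    20 standard amino acids.
    """
    dna = "".join(CODONS[residue] for residue in protein.upper())
    if add_stop:
        dna += "TAA"  # one of the three standard stop codons
    return dna


# Hypothetical toy peptide, for illustration only.
print(back_translate("MKC"))
```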



Cellulosic ethanol is the leading next-generation biofuel and has strong economic importance. This research design could potentially increase the cellobiose yield, and hence ethanol production, by 300% over the current rate. Cellulosic ethanol will not cut into usable food resources. It also offers a large environmental benefit through emission reduction: cellulosic ethanol can lower GHG emissions by over 98% relative to petroleum-based gasoline. This is a carbon-cycle-balanced and renewable way to produce fuel, which means we recycle carbon rather than releasing underground carbon reservoirs into the atmosphere.


About me

I am a senior and an International Baccalaureate diplomate at Interlake High School. I would like to study Environmental Engineering and Computer Science in the future. My interest in STEM began when I competed in the Department of Energy’s National Science Bowl, and I have since qualified for nationals twice. I have been inspired by Elon Musk and Frances Arnold, my favorite scientists. Science and engineering have been an integral part of my life. I have presented my research at the Intel ISEF, Imagine Tomorrow, Junior Science and Humanities Symposium, Genius Olympiad, the Army and Technology conference in Washington DC, and the American Junior Academy of Sciences. In my spare time you can find me following the migratory path of Red Knots, my favorite birds, peering through my microscope studying diatoms in ocean water, or measuring the orbital periods of Jupiter’s moons through my telescope. I specialize in creating cheesy science jokes, with a focus on mitochondria memes. I volunteer at the Boys and Girls Club and at Families for Effective Autism Treatment, teaching kids with special needs, and I mentor middle school science teams. I have been working in the University of Washington’s Armbrust Lab since my junior year, and this summer I worked as a research intern at the School of Marine and Atmospheric Sciences at Stony Brook. Winning the Google Science Fair would help me find more resources to take my research to the next level, travel the world, and help others in my community become interested in science.

Health & Safety

This project used bioinformatics and was conducted on a desktop computer at my house. I visited the Fred Hutchinson Cancer Research Center, the University of Washington, and the Institute for Systems Biology to do background research and learn how to use the protein modeling software ROSETTA.

My mentor for this project was Alexandra Corella from Fred Hutchinson Cancer Research Center.

I have done this project independently. Alexandra Corella has been my mentor and has been immensely helpful. She has always answered all the questions I had regarding the project and helped me find research articles whenever I needed more information. She also arranged a visit to the Baker Lab, where I was able to learn about prediction and design of protein structures and protein folding mechanisms.

I have acquired access to the computer software through my school’s server. I-TASSER was used from the Yang Zhang Lab. ROSETTA was used from the University of Washington. CHIMERA was used to visualize the 3D protein models on the computer, and UGENE was used to analyze the FASTA sequences.


Alexandra Corella, Ph.D. Candidate

Human Biology


Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
Seattle, WA 98109

(206) 667-5000


A 3D printer was used for my project. It was a Craftbot 3D printer. I had parent supervision when I used the printer. 



Bibliography, references, and acknowledgements

Wang, Nam S. "Cellulose Degradation." Cellulose Degradation. University of Maryland Department of Chemical & Biomolecular Engineering, n.d. Web.

Center for Climate and Energy Solutions. (2009). Cellulosic Ethanol Factsheet. Retrieved from Cellulosic Ethanol:

Kubicek, C. P., Mikus, M., Schuster, A., Schmoll, M., & Seiboth, B. (2009). Metabolic Engineering Strategies for the improvement of Cellulase Production by Hypocrea jecorina. BioMed Central.

Moumita, S., Hong, G., & Smith, J. C. (2010). Catalytic Mechanism of Cellulose Degradation by a Cellobiohydrolase.

Quan, L., Lv, Q., & Zhang, Y. (2016). STRUM: structure-based prediction of protein stability changes upon single-point mutation. Zhang Lab.

Getov, I., Petukh, M., & Alexov, E. (2016). SAAFEC: Predicting the Effect of Single Point Mutations on Protein Folding Free Energy Using a Knowledge-Modified MM/PBSA Approach. International Journal of Molecular Sciences.

Bruce Watt, University of Maine, (copyright owner photograph Hypocrea jecorina)

Trudeau, D. L., Lee, T. M. and Arnold, F. H. (2014), Engineered thermostable fungal cellulases exhibit efficient synergistic cellulose hydrolysis at elevated temperatures. Biotechnol. Bioeng., 111: 2390–2397. doi:10.1002/bit.25308

Mingardon F, Bagert JD, Maisonnier C, Trudeau DL, Arnold FH. Comparison of Family 9 Cellulases from Mesophilic and Thermophilic Bacteria. Applied and Environmental Microbiology. 2011;77(4):1436-1442. doi:10.1128/AEM.01802-10.



I would like to thank my mentors:

MIT THINK Scholars Program 2017

Dr. Alexandra Corella, Ph.D. Cell/Cellular and Molecular Biology, UW

Dr. Christine Hickman, Ph.D. IB Biology teacher at Interlake High School

Dr. Phil Bradley, Baker Lab Fred Hutchinson Research Center

The MIT THINK Scholars program chose my project as one of the winners for the year 2017. As an honorable mention in this program, I have access to its mentors, for which I am very grateful. Alexandra Corella has been my mentor since last year and has been immensely helpful. I have acquired access to the computer software through my school’s server. I received guidance from Dr. Phil Bradley from Fred Hutchinson Cancer Research Center. I have visited the Bradley Lab at Fred Hutchinson, which uses ROSETTA to create designer proteins, and I have witnessed how they use the software in their research.