Posted on Authorea 1 Sep 2022 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.166030463.37247755/v2 | This a preprint and has not been peer reviewed. Data may be preliminary.
Molecular insight into cellulose degradation by the phototrophic
green alga Scenedesmus.
Julieta Barchiesi1 , Marı́a B. Velazquez1 , Marı́a Busi1 , Diego F. Gomez-Casati1 , and
Chitralekha Nag-Dasgupta2
1
2
Centro de Estudios Fotosinteticos y Bioquimicos
Research Cell Lucknow Amity University Uttar Pradesh India
September 1, 2022
Abstract
Lignocellulose is the most abundant natural biopolymer on earth and a potential raw material for the production of fuels and
chemicals. However, only some organisms such as bacteria and fungi produce the necessary enzymes to metabolize it. In this
work we detected the presence of extracellular cellulases in the genome of five species of Scenedesmus. These microalgae grow
in both, freshwater and saltwater regions as well as in soils, displaying highly flexible metabolic properties. The comparison of
sequences of the different cellulases with hydrolytic enzymes from other organisms by means of multi-sequence alignments and
phylogenetic trees showed that these enzymes belong to the families of glycosyl hydrolases 1, 5, 9 and 10. In addition, most
of these presented a greater similarity of sequence with enzymes from invertebrates, fungi, bacteria and other microalgae than
with cellulases from plants; and the 3D modeling data obtained showed that both the main structures of the modeled proteins
and the main amino acid residues implicated in catalysis and substrate binding are well conserved in Scenedesmus enzymes.
We propose that these cellulase-producing phototrophic microorganisms could act as catalysts for the hydrolysis of cellulosic
biomass fueled by sunlight.
Molecular insight into cellulose degradation by the phototrophic green alga Scenedesmus.
Marı́a B. Velazqueza , Marı́a V. Busia , Diego F. Gomez-Casatia , Chitralekha Nag-Dasguptab * and Julieta
Barchiesia *.
a
Centro de Estudios Fotosintéticos y Bioquı́micos (CEFOBI-CONICET), Universidad Nacional de Rosario,
Argentina.
b
Research Cell, Lucknow, Amity University Uttar Pradesh, India.
Running title: Cellulose degradation by Scenedesmus.
Julieta Barchiesi*
Corresponding autor at: Centro de Estudios Fotosintéticos y Bioquı́micos (CEFOBI-CONICET), Suipacha
570, Rosario 2000, Argentina.
E-mail address :barchiesi@cefobi-conicet.gov.ar
Chitralekha Nag Dasgupta*
Co-corresponding autor at: Research Cell, Lucknow, Amity University Uttar Pradesh, Sector 125, Noida,
201313, India.
E-mail address : cndasgupta@lko.amity.edu
1
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Abstract
Lignocellulose is the most abundant natural biopolymer on earth and a potential raw material for the production of fuels and chemicals. However, only some organisms such as bacteria and fungi produce the necessary
enzymes to metabolize it. In this work we detected the presence of extracellular cellulases in the genome of
five species ofScenedesmus . These microalgae grow in both, freshwater and saltwater regions as well as in
soils, displaying highly flexible metabolic properties. The comparison of sequences of the different cellulases
with hydrolytic enzymes from other organisms by means of multi-sequence alignments and phylogenetic trees
showed that these enzymes belong to the families of glycosyl hydrolases 1, 5, 9 and 10. In addition, most of
these presented a greater similarity of sequence with enzymes from invertebrates, fungi, bacteria and other
microalgae than with cellulases from plants; and the 3D modeling data obtained showed that both the main
structures of the modeled proteins and the main amino acid residues implicated in catalysis and substrate
binding are well conserved in Scenedesmus enzymes.
We propose that these cellulase-producing phototrophic microorganisms could act as catalysts for the hydrolysis of cellulosic biomass fueled by sunlight.
Keywords
Cellulases, Scenedesmus, Endoglucanases, β-glucosidases, exocellulases
1. Introduction
Generation of renewable energy resources and waste management are the major concern in twenty first
century. Lignocellulosic agricultural and forest wastes are the promising feedstock for production of biofuel
and value-added products due its high availability and low cost1 . Nevertheless, no commercial process has
still been reported for the enzymatic hydrolysis of cellulose. The main reason is the high cost of the required
enzymes, their low specific activity, their susceptibility to inactivation and the difficulty to recycle them2 .
A group of naturally occurring cellulases are reported from heterotrophic microorganisms including bacteria
and fungi3 . These organisms secrete cellulases to utilize cellulose as a carbon source. Bioconversion processes involve the hydrolysis of cellulose to produce reducing sugars; further fermentation of the sugars to
ethanol and other bioproducts 4 . Cellulases hydrolyze the β-1,4 glycosidic bonds of the glucose polymer
by two different ways, endoglucanases cut random positions along the cellulose chain, and exoglucanases
progressively act on the terminal ends of the polymer, releasing either glucose molecules, or cellobiose3 . Finally, the cellobiose molecules produced are converted to glucose by intra- and extracellular β -glucosidases
(EC 3.2.1.21), celludextrinases (EC 3.2.1.4), and cellodextrin phosphorylases (EC 2.4.1.49), depending upon
the characteristic of each cellulolytic species 5 . Other than heterotrophs, cellulases belonging to glucoside
hydrolase family (GH9) are also described from higher plants6 . However, it has been reported that plant
cellulases participate in the biosynthesis and/or remodeling of cellulose rather than in its degradation 6,7 .
Algae are phototrophs, ubiquitous with versatile metabolic pathways, which have been well exploited to
obtained multiple products through algal refinery 8,9 . However, the presence of cellulases and cellulolytic activity was poorly described in algae. In 1966, Dvořáková-Hladká et al. reported the presence of β-glucosidase
activity in S. obliquus , which allows it to grow using cellobiose as a substrate 10 . In 1970 Burczyk and
col. reported the presence of extracellular cellulases in Scenedesmus obliquusbecause cell walls accumulated
in the medium as a result of mother cells autospore release were deprived of the cellulose layer present in
daughter cells. 11 . In 2012, Blifernez-Klassen and col. observed that the photoheterotrophic microalgae
Chlamydomonas reinhardtii , was also capable of degrading and assimilating exogenous cellulose 3 . This
interesting finding led us to investigate the presence of cellulases in S. quadricauda . This organism is a
freshwater, non-mobile green algae which usually forms colonies of four cells. It belongs to the same class of
green algae (Chlorophyceae) as the genus Chlamydomonas . S. quadricaudahas gained great importance due
to its high capacity for effluent treatment, CO2 capture and biofuel production as we showed in a previous
work 9 . The S. quadricaudaLWG002611 genome was sequenced and functional genes of different metabolic
pathways were identified, such as those involved in the synthesis of triacyl glycerol (TAG) 9 . However, no
2
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
evidence on cellulase secretion and cellulose utilization in this alga was found or reported, as well as genes
that encode proteins with glycoside hydrolase activity have not yet been described.
In this work, we have identified different genes in the genome sequence of S. quadricauda LWG002611, belonging to GH1, GH5, GH9 and GH10 families. Furthermore, a comparative bioinformatic analysis was conducted in several available Scenedesmaceae algae genome (Scenedesmus obliquus EN0004 v1.0, Scenedesmus
obliquusUTEX B 3031, Scenedesmus obliquus var. DOE0013 v1.0,Scenedesmus sp. NREL 46B-D3 v1.0, and
S. quadricaudaLWG002611) to identify multiple homologs of endoglucanase, β-glucosidase and exocellulase
genes. Additionally, a phylogenetic analysis and a 3D protein modeling were achieved. Our results showed
that the 3D structures of all the modeled domains obtained and the main catalytic amino acid residues
implicated in cellulolytic activity are well conserved in the Scenedesmus analyzed enzymes.
These new findings open the opportunity to identify new cellulases from algae, as well as carry out their
functional characterization to be used in biotechnological applications.
2. Methods
2.1. Sequence search, alignment and phylogenetic analysis
Cellulase sequences of S. quadricauda were identified from genome sequence data of our previous study 9
using protein folding homology analysis by Phyre2 12 and Blast-N similarity study 13 with Monoraphidium neglectum taken as reference, and their details are included in table 1. Other analyzed sequences of
Scenedesmus were taken from PhycoCosm 14 or NCBI {https://www.ncbi.nlm.nih.gov/} and their accession
numbers are shown in table 2. Conserved domains, signal peptide, and GH-family assignment were identified
with Prosite patterns 15 , DeepLoc 16 and PredAlgo17 . The sequences were aligned and processed with Clustal
Omega 18 and visualized with ESPript 3.019 . To construct the phylogenetic trees, all the sequences were
aligned with sequences from phylogenetically distant β-1,4-endoglucanases, β-glucosidases or exocellulases
(respectively) from microalgae, fungi, plants, invertebrates and bacteria and processed with Gblock v0.91b
before analyzing them in MEGA 6.0620,21 . Enzymes signal peptides were not included in the phylogenetic
analysis. The phylogenetic trees were built by Maximum Likelihood method in MEGA 6.06 version with the
model and the restrictions suggested by the program. Phylogenies were determined by Bootstrap Analysis
of 100 replicates.
2.2. Protein 3D modeling
The protein 3D models were generated with RaptorX Contact Prediction Server 22 . Superposition between
each model and the templates was done using the align command in PyMOL 2.3.1 version {The PyMOL
Molecular Graphics System}. The regions implicated in substrate binding and activity were manually annotated using the pattern sequences or 3D structure of cellulase templates available in Prosite and RCSB Pdb
database (rcsb.org 23 ).
3. Results and Discussion
3.1. Identification of genes encoding different cellulases in Scenedesmus LWG002611
In the present work, we performed an analysis of Scendesmus quadricauda LWG002611 draft genome sequence, and eleven endoglucanases, β-glucosidases and exoglucanases gene sequences have been detected by
protein folding homology analysis by Phyre2 and similarity study withMonoraphidium neglectum, a closely
related species which has been taken a reference sequence for draft genome analysis.
These sequences have shown 82.26-98.38% similarity with the reference sequences of M. neglectum (Table
1).
According to the amino acid sequence analysis by Pfam two GH9 (Scequ2611|3068 and Scequ2611|4665),
one GH5 (Scequ2611|2009), three GH1 (Scequ2611|3544, Scequ2611|9833 and Scequ2611|10006), one
GH10 (Scequ2611|547), and four undefined GH family short proteins (Scequ2611|8404, Scequ2611|9353,
Scequ2611|13370 and Scequ2611|13657) were detected (Table 1, Supplementary file 1)
3
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
3.2. In silico analysis of Scenedesmaceaecellulases
We have used different bioinformatic tools to analyze the cellulases present in five different species of
Scenedesmus:Scenedesmus obliquusEN0004 v1.0, Scenedesmus obliquus UTEX B 3031, Scenedesmusobliquus
var. DOE0013 v1.0, Scenedesmus sp . NREL 46B-D3 v1.0, Scenedesmus sp PABB004 and Scenedesmus
quadricaudaLWG 002611. The presence of conserved motifs, the arrangement of diverse domains (catalytic
domain, carbohydrate-binding module (CBM) and linker regions) and the phylogenetic relationship between
the proteins were determined. Their amino acid sequences were compared with enzymes previously characterized from others taxonomic groups.
The cellulases sequences analyzed in this study were classified with KEGG ENZYME Database Entry by
Phycocosm 14 , into three groups: (i) endo-β-glucanases (EC 3.2.1.4), (ii) β-glucosidases (EC 3.2.1.21) and
(iii) cellobiohydrolases or exoglucanases (EC 3.2.1.91) (Table 2).
In S. quadricauda LWG002611 sequences with a higher length than 300 amino acids and with higher similarity
with previously studied cellulases are code by the genes: Scequ2611|2009, Scequ2611|9833 and Scequ2611|547
(percentage of identity: 98.32%, 93.93% and 82.26% respectively, table 1), which were chosen to be included
in the posterior bioinformatic analysis
3.2.1. Ενδο-β-γλυςανασες
Among the selected species we found thirty genes that encode different endoglucanases. Their catalytic
domains belong to the GH5 and GH9 family of CAZymes (Carbohydrate-Active Enzymes) 24 and seven of
these proteins present CBM from families 1 and 2. While CBM2 were found only in GH5 endoglucanases,
CBM1 were associated with some GH9 endoglucanases (Figure 1 and Table 2).
Our results showed that three GH9 endoglucanases, Sceob1|18668, SceobE4|579152, SceobDOE|757385,
present Big 1 domains in their C-terminal. Big 1 is a bacterial immunoglobulin (Ig)-like domain, usually present in GH9 endoglucanases. The functions of these Ig-like modules are not clear; however, they
are supposed to be involved in the catalytic efficiency or in the structural stability of GH9 endoglucanases
25,26
. On the other hand, KAF8065624.1 present a LPMO domain in its N-terminal region. LPMO is a
lytic polysaccharide monooxygenase domain that usually act synergically with GH9 domains, increasing the
cellulolytic enzyme activity27 .
Most of the GH9 endoglucanases analyzed are secreted or anchored to the cell membrane (with the catalytic
domain localized to the outer surface of the plasma membrane). On the other hand, only three identified
proteins GH5 endonucleases are predicted to be secreted, while the other four identified proteins would be
cytoplasmic enzymes (Table 2).
The analysis performed with the Prosite Database showed three highly conserved regions present in GH9
endoglucanases which contains conserved residues important for their catalytic activity 5,28 (Figure 2). The
first region comprises the DAGD motif, where the first Asp (Asp54, Nasta 1KS8 numbering) is an active site
residue. The second region contains a conserved RPHHR sequence, where the first His (His359, Nasta 1KS8
numbering) is also part of the active site of GH9 endoglucanases. Finally, Region III contains two Asp and
Glu residues (Asp399 and Glu412, Nasta 1KS8 numbering) that would be involved in catalysis. Thus, all the
proteins identified and analyzed in this study contain four acidic residues, D, H, D and E, in the mentioned
regions, which would be part of the active site of GH9 endoglucanases and would be involved in catalysis,
with the exception of ScsoPA4|KAF8062061.1 and SceobDOE|1035052 proteins, that lack the catalytic D
from Region I and the H residue from Region II, respectively (Fig 2.A.).
S. quadricauda LWG002611 possess another two hypothetical GH9 endoglucanase proteins (Scequ2611|3068
and Scequ2611|4665) showing a high similarity with respect to the analyzed Scenedesmus endoglucanases
(Scequ2611|3068 34-36% with KAF6265438.1 and KAF8066338.1, Scequ2611|4665 30% to KAF6264795.1)
but they lack the essential Glu residue from Region III (Fig 2B). Further studies are needed to determine if
Scequ2611|3068 and Scequ2611|4665 are catalytically inactive enzymes or if they have a different mechanism
than traditional glucosyl hydrolases.
4
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Cellulases usually present linkers with length from 6–14 residues long up to >100 residues 5,29 .
FromScenedesmus endoglucanases analyzed, most of the CBM1 containing enzymes showed putative P/Srich or poliQ linkers, mainly located between the GH9 and the CBM regions (Fig 2.A.). These proline
or glutamine rich spacers constitute a rigid type of linkers which are thought to act as spacers to avoid
non-native interactions between domains that may affect the correct folding of proteins30 . In addition, the
linkers would allow cellulases to push forward on the exterior of the polysaccharide with a caterpillar-like
movement 5,31 .
In contrast to what was described in GH9 endoglucanases from other algae, such as Chlamydomonas , Volvox
and Gonium 5 CBM1 domains are located either at the C- or N-terminus of some the studied Scenedesmus
GH9 endoglucanases. Notably, both, the N- and the C-terminus CBM1 analyzed, are cysteine-rich domains
as previously described in Chlamydomonas .
A phylogenetic tree was also constructed (Fig. 3) using Gblock and the MEGA 6.06 software from the
alignment of the amino acid sequences from Scenedesmus GH9 enzymes together with homologous sequences
identified with Blast-P from invertebrates, fungi, plants and bacteria. The organization of the tree branches
suggest that Scenedesmus GH9 endoglucanases are evolutionarily closer to termites, worms, sea urchins and
bivalves GH9 cellulases (red branches of the tree) rather than the enzymes from higher plants, fungi and
bacteria.
Figure 4 shows the 3D model of the Sceobl1|32711 GH9 and CBM1 domains constructed with RaptorX
Contact Prediction. The model created presents a similar fold to that described for previously characterized
GH9 endoglucanases 5,32 with a (α/α)6 -barrel fold. Besides, the catalytic amino acid residues are positioned
in a similar spatial location when Sceobl1|32711 model was superposed with 1ks8 template (an endocellulase
from the termite Nasutitermes takasagoensis 40.24% identical to Sceobl1|32711 (79% cover). On the other
hand, its N-terminal CBM1 showed a high sequence identity (99.6%, cover 35%) with the cellulose-binding
domain of endoglucanase I fromTrichoderma reesei (PDB entry: 4BMF) and its model present a good spatial
conservation when both models where superposed.
Regarding GH5 endoglucanases, they present the consensus pattern [LIV]-[LIVMFYWGA](2)-[DNEQG][LIVMGST]-{SENR}-N-E-[PV]- [RHDNSTLIVFY] 15 . The C-terminal Glu is an active site residue. The
predicted catalytic residues, Glu168 and Glu309 (Pyrho 3W6M numbering) are strictly conserved in all the
GH5 endonucleases analyzed (Figure 5).
The Figure 6 shows the GH5 endoglucanase phylogenetic tree. The tree branching organization suggest
that most of the GH5 proteins analyzed are evolutionarily closer to those of other microalgae and higher
plants. However, the group of enzymes containing a CBM2 are closer to fungal and bacterial endoglucanases,
suggesting a microbial origin.
The homology model of the Sceobl1|14060 GH5 domain present the (α/β)8 TIM barrel fold classical of
GH5 family 33 (Figure 7). The superposition with Pyrho 3W6M PDB structure (an hyperthermophilic
endocellulase from Pyrococcus horikoshii ) showed a conservation of the catalytic amino acid spatial location.
Respects its N-terminal CBM2 domain, Sceobl1|14060 has a high sequence identity (97.3% identity, 68%
cover) with 2RTT PDB structure (a chitin-binding domain of Chi18aC from Streptomyces coelicolor );
moreover, both models showed a high structural and spatial conservation.
3.2.2. β-γλυςοσιδασες
In the the analyzed genomes we found twenty-nine putative β-glycosidases, all of them belonging to the GH1
family of CAZymes (Figure 8). The most common enzymatic activities reported for glycoside hydrolases of
this family are β-glucosidases and β-galactosidases.
It has been previously described that one of the highly conserved regions in GH1 sequences has a glutamic acid
residue and is classified as GH1 1 34 . This region between positions 388-392 (Nanochloropsis β-glucosidase
GH1 numbering, PDB code: 5YJ7,) presents a conserved sequence (V/I)TENG. The Glu residue would
5
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
participate in the cleavage of the glycosidic bond by acting as a nucleophile
was first identified as Glu358 in a β-glucosidase from Agrobacterium35 .
34
. This catalytic nucleophile
In our work, the conserved region (V/I)TENG was found in all the GH1 sequences analyzed: (Figure
9.A) GH1 1. The extended region defined as consensus is: [LIVMFSTC]-[LIVFYS] [LIV]-[LIVMST]-E-NG-[LIVMFAR]-[CSAGN]. All of the proteins chosen in this work possess in this region the Glu involved in
catalysis followed by Asn and Gly, as can be seen in Figure 9. However, the four amino acids upstream Glu
residue appear to differ in the proteins, with Ile-Trp-Ile-Thr being predominant.
As a second signature pattern, the conserved region GH1 2 (Figure 8.A) was chosen for our analysis. This
region, defined as: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA], is located at
the N-terminal of GH1 β-glucosidases, however, it may not be present in some proteins of this family. The
alignment of Sceobl|9031 and Sceobl|10236 sequences showed that these proteins do not contain that region
(Figure 9.A). On the other hand, the protein SceoblEN4 |617109, possesses only nine of the fifteen amino acids
established as consensus, which is equivalent to 60 %, and therefore would be considered that this domain is
present but with some variations in the amino acid sequence. Similarly, Sceobl1|35463 and Scequ2611|9833
proteins have only 46 and 40 % of the consensus sequence, respectively. Scesp1|1644545 only possesses the
last four amino acids of this consensus sequence, while the other eleven, do not correspond to the established
consensus. The sequence of KAF8060308.1 BGLU11 shows some particular characteristics in this region.
Although it possesses 86 % sequence identity respect to the established consensus sequence, it presents an
insertion of several amino acids downstream the consensus (34 residues), before x-E-x-[GSTA]. This protein
also contains an additional domain in its N-terminal region, a protein disulfide isomerase domain (cl36828:
ER PDI fam Superfamily,36 ), previously involved in protein folding 37 .
Interestingly, only five of the proteins analyzed contains the first Phe residue, which is characteristic of the
GH1 2 pattern. It is interesting to note that none Scenedesmaceae β- glycosidases analyzed in this study
presents CBM nor linkers.
On the other hand, S. quadricauda LWG002611 have a hypothetical GH1 β-glucosidase protein, named
Scequ2611|3544, with high similarity with other Scenedesmus endoglucanases (86% cover and 29.45 % identity with KAF8059426.1) but that lacks the region containing the catalytic Glu residue (Figure 9.B).
The phylogenetic tree performed shows four β-glucosidase subgroups: (i) the plant GH1 subgroup, (ii) the
GH1 from algae, (iii) the GH1 from fungi, and (iv) the GH1 from bacteria (Figure 10). The Scesp1|1509300,
SceobDOE|17466, SceoblEN4|575894, and SceobDOE|32074 proteins are grouped in a branch close to the
bacteria enzyme, that proposes the possible acquisition of these genes by horizontal transfer. On the other
hand, the Scequ2611|9833 protein was found within a large group of GH1 enzymes from algae. This result
suggests the correct inference of its sequence and the possibility that orthologous genes are those that code
for proteins found on a nearby branch.
There is at least one representative of each species corresponding to a genus in each branch of the group
of proteins from algae. This result suggests that the different β-glucosidases present in the different species
could fulfill the same function and that it would not be redundant within the genus.
The homology model of the Sceob1|9434 β-glucosidase constructed with RaptorX Contact Prediction is shown
in Figure 11. The superposition with Nannochloropsis oceanica BGLN1 β-glucosidase crystal structure (PDB
code: 5YJ7) shows a catalytic amino acid positional conservation in the central region. Also, it presents an
overall structure of TIM barrel, and the ENG residues conserved, which suggest a reliable protein function
assignment of this new protein in
Scenedemus quadricauda.
3.2.3. Exocellulases
Exoglucanases or cellobiohydrolases (CBH) (EC 3.2.1.74; 1,4-β-D-glucan-glucanhydrolase) catalyze the successive hydrolysis of residues from the reducing and non-reducing ends of the cellulose polysaccharide, re-
6
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
leasing cellobiose molecules as main product of the reaction 38 . These enzymes account for 40 to 70% of the
total component of the cellulase system, and are able to hydrolyze crystalline cellulose.
An excellent candidate for use as a bait to explore algal genomes is the GH10 from Cellulomonas fimi (CexP07986 Uniprot). High resolution crystal structures are available and there is a large literature on the kinetic
characterization of the enzyme and the identification of amino acid residues important to the mechanism of
catalysis 39 . Among all the Scenedesmaceae genomes studied we found at least eleven enzymes, all of them
putative GH10 bifunctional cellulase/xylanase proteins (Figure 12A).
These enzymes are monomeric proteins with a molecular mass ranging from 50 to 65 kDa, although there are
smaller variants (41.5 kDa) in some fungi, such as Sclerotium rolfsii 40 . The calculated molecular mass of
five of the eleven algal proteins studied were higher (between 70 and 100kDa), while in the other six enzymes
is within the expected range (table 2).
In general, most of the proteins containing GH10 domains show a structure that matches an eightfold α/
β-barrel with a profound channel in the center 41 . However, it has been proposed that the exocellulases from
the GH10 family form a transient tunnel by the extension of some loop regions upon substrate binding42 .
It has been reported that GH10 enzymes present a double-displacement ’retaining’ hydrolysis mechanism,
where one catalytic residue acts a nucleophile and the other acts as a general acid/base43 . The catalytic
nucleophile in Cex is Glu274 and the putative acid/base catalyst is Glu168 44 . As shown in figure 12B, both
residues are full conserved in the algal protein sequences. However, they were replaced by an Asp and Ile
residues in the Scequ2611|547 protein.
We also performed a phylogenetic tree for the GH10 exoglucanase (Figure 13) from the alignment of the amino
acid sequences fromScenedesmus GH10 enzymes together with homologous sequences from invertebrates,
fungi, plants and bacteria. The branches distribution suggests the GH10 analyzed proteins are evolutionarily
closer to those of other microalgae and higher plants.
In addition, we built a sequence homology model of the Sceob1|4623 exocellulase using RaptorX Contact
Prediction is shown in Figure 14. The superposition with Cellulomonas fimi exocellulase crystal structure
(PBD code: 1EXG) shows a spatial location conservation of the algal protein residues Glu236 and Glu338
with respect to catalytic residues from the bacterial protein, also suggesting the conservation of the catalytic
site.
Conclusion
The discovery of new microbial sources of cellulases is a crucial strategy to reduce costs of various industrial
processes using such enzymes. Cellulases are produced by various microorganisms including bacteria, fungi
and actinomycetes. Recently was reported that they are also produced by some animals like termites and
crayfish without certainties about his role in vivo 45 . The search, isolation and identification of new cellulose
degrading microorganisms from different environments are of crucial importance to get new cellulases with
unique and distinctive characteristics.
Microalgae are considered a valuable source of new enzymes with biotechnological potential. However, the
presence of cellulolytic enzymes is meagre studied form this photosynthetic microorganisms.
Different works published during the last decade report cellulolytic activity (either by experimental evidence
or by bioinformatic analysis) in C. reinhardtii , V. carteri , G. pectorale andA. protothecoides but the genus
Scenedesmus had not been analysed3,5,46 .
This is the first bioinformatic analysis of Scenedesmaceae cellulases reported. It comprises GH5 and GH9
β-1,4-endoglucanases, GH1 β-glucosidases and GH10 exoglucanases. Our results shows that GH9 endoglucanases analyzed are phylogenetically closer to invertebrates, termites and bivalve rather than higher plant,
bacteria or fungi. On the other hand, most of GH1 β-glucosidases analyzed are evolutionarily closer to
enzymes of other microalgae, however, four of them are grouped in a branch close to the bacteria enzymes,
7
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
result that suggests the probable gaining of their genes by horizontal transfer. In contrast, GH5 and GH10
studied enzymes are evolutionarily closer to enzymes of other microalgae and higher plants.
Most of the analyzed enzymes present signal peptides for membrane anchoring or extracellular secretion.
This result suggests the presence of extracellular cellulolytic machinery in Scenedesmaceae. Only some
of the analyzed enzymes were found to have additional modules and linkers besides its GH domains, and
particularly a few endoglucanases have CBM modules, from CBM1 and CBM2 families.
The combination of GH catalytic domains together with CBMs and, in some cases linkers, propose that
these cellulases would present an enhanced cellulolytic activity.
The presence of this battery of enzymes in the photoheterotrophic algaeScenedesmus suggest that these
organisms are perfectly prepared for use of cellulose as carbon source. This strategy would represent an
advantage that would have allowed Scenedesmaceae to occupy many environments in nature.
The findings reported in this work explores just one family within the Chlorophyta taxon, but it increases
the evidence in favor of the presence of conserved cellulolytic machinery in photoheterotrophic organisms
and encourages to continue with the search for cellulases in other species of microalgae.
References
1 Saini, J. K., Saini, R. & Tewari, L. Lignocellulosic agriculture wastes as biomass feedstocks for
second-generation bioethanol production: concepts and recent developments. 3 Biotech5 , 337-353,
doi:10.1007/s13205-014-0246-5 (2015).
2 Lynd, L. R., Weimer, P. J., van Zyl, W. H. & Pretorius, I. S. Microbial cellulose utilization: fundamentals
and biotechnology.Microbiology and molecular biology reviews : MMBR 66 , 506-577, table of contents
(2002).
3 Blifernez-Klassen, O. et al. Cellulose degradation and assimilation by the unicellular phototrophic eukaryote Chlamydomonas reinhardtii. Nature communications 3 , 1214, doi:10.1038/ncomms2210 (2012).
4 Menon, V. & Rao, M. Trends in bioconversion of lignocellulose: Biofuels, platform chemicals & biorefinery concept.
Progress in Energy and Combustion Science 38 , 522-550,
doi:https://doi.org/10.1016/j.pecs.2012.02.002 (2012).
5 Guerriero, G. et al. Novel Insights from Comparative In Silico Analysis of Green Microalgal Cellulases.
International journal of molecular sciences 19 , doi:10.3390/ijms19061782 (2018).
6 Hayashi, T., Yoshida, K., Park, Y. W., Konishi, T. & Baba, K. Cellulose metabolism in plants. Int Rev
Cytol 247 , 1-34, doi:10.1016/S0074-7696(05)47001-1 (2005).
7 Minic, Z. Physiological roles of plant glycoside hydrolases.Planta 227 , 723-740, doi:10.1007/s00425-0070668-y (2008).
8 Dasgupta, C. N. et al.
Dual uses of microalgal biomass:
proach
for
biohydrogen
and
biodiesel
production.Applied
Energy
doi:https://doi.org/10.1016/j.apenergy.2015.01.070 (2015).
An integrative ap146
,
202-208,
9 Nag Dasgupta, C. et al. Draft genome sequence and detailed characterization of biofuel production by oleaginous microalga Scenedesmus quadricauda LWG002611. Biotechnology for biofuels11 , 308,
doi:10.1186/s13068-018-1308-4 (2018).
10 Dvořáková-Hladká, J. Utilization of organic substrates during mixotrophic and heterotrophic cultivation
of algae. Biologia Plantarum 8 , 354, doi:10.1007/bf02930672 (1966).
11 Burczyk, J., Grzybek, H., Banas, J. & Banas, E. Presence of cellulase in the algae Scenedesmus. Experimental cell research63 , 451-453 (1970).
8
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
12 Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web portal for
protein modeling, prediction and analysis. Nature protocols 10 , 845-858, doi:10.1038/nprot.2015.053 (2015).
13 Boratyn, G. M. et al. BLAST: a more efficient report with usability improvements. Nucleic acids research
41 , W29-33, doi:10.1093/nar/gkt282 (2013).
14 Grigoriev, I. V. et al. PhycoCosm, a comparative algal genomics resource. Nucleic acids research 49 ,
D1004-D1011, doi:10.1093/nar/gkaa898 (2021).
15 Sigrist, C. J. et al. PROSITE: a documented database using patterns and profiles as motif descriptors.
Brief Bioinform3 , 265-274, doi:10.1093/bib/3.3.265 (2002).
16 Thumuluri, V., Almagro Armenteros, J. J., Johansen, A. R., Nielsen, H. & Winther, O. DeepLoc
2.0: multi-label subcellular localization prediction using protein language models. Nucleic acids research
, doi:10.1093/nar/gkac278 (2022).
17 Tardif, M. et al. PredAlgo: a new subcellular localization prediction tool dedicated to green algae. Molecular biology and evolution 29 , 3625-3639, doi:10.1093/molbev/mss178 (2012).
18 Sievers, F. & Higgins, D. G. The Clustal Omega Multiple Alignment Package. Methods Mol Biol 2231 ,
3-16, doi:10.1007/978-1-0716-1036-7 1 (2021).
19 Robert, X. & Gouet, P. Deciphering key features in protein structures with the new ENDscript server.
Nucleic acids research42 , W320-324, doi:10.1093/nar/gku316 (2014).
20 Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.
Molecular biology and evolution 17 , 540-552, doi:10.1093/oxfordjournals.molbev.a026334 (2000).
21 Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics
Analysis version 6.0.Molecular biology and evolution 30 , 2725-2729, doi:10.1093/molbev/mst197 (2013).
22 Xu, J. Distance-based protein folding powered by deep learning.Proceedings of the National Academy of
Sciences of the United States of America 116 , 16856-16865, doi:10.1073/pnas.1821309116 (2019).
23 Berman, H. M. et al. The Protein Data Bank. Nucleic acids research
doi:10.1093/nar/28.1.235 (2000).
28 , 235-242,
24 Drula, E. et al. The carbohydrate-active enzyme database: functions and literature. Nucleic acids research
50 , D571-D577, doi:10.1093/nar/gkab1045 %J Nucleic Acids Research (2021).
25 Nguyen, K. H. V. et al. Some characters of bacterial cellulases in goats’ rumen elucidated by metagenomic
DNA analysis and the role of fibronectin 3 module for endoglucanase function. Anim Biosci 34 , 867-879,
doi:10.5713/ajas.20.0115 (2021).
26 Phitsuwan, P., Lee, S., San, T. & Ratanakhanokchai, K. CalkGH9T: A Glycoside Hydrolase Family 9
Enzyme from Clostridium alkalicellulosi.11 , 1011 (2021).
27 Sabbadin, F. et al. An ancient family of lytic polysaccharide monooxygenases with roles in arthropod
development and biomass digestion. Nature communications 9 , 756, doi:10.1038/s41467-018-03142-x (2018).
28 Tomme, P. et al. Identification of a histidyl residue in the active center of endoglucanase D from Clostridium thermocellum.The Journal of biological chemistry 266 , 10313-10318 (1991).
29 Sammond, D. W. et al. An iterative computational design approach to increase the thermal endurance of
a mesophilic enzyme.Biotechnology for biofuels 11 , 189, doi:10.1186/s13068-018-1178-9 (2018).
30 George, R. A. & Heringa, J. An analysis of protein domain linkers: their classification and role in protein
folding. Protein engineering 15 , 871-879, doi:10.1093/protein/15.11.871 (2002).
9
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
31 Receveur, V., Czjzek, M., Schulein, M., Panine, P. & Henrissat, B. Dimension, shape, and conformational
flexibility of a two domain fungal cellulase in solution probed by small angle X-ray scattering. The Journal
of biological chemistry 277 , 40887-40892, doi:10.1074/jbc.M205404200 (2002).
32 Foley, M. H. et al. A Cell-Surface GH9 Endo-Glucanase Coordinates with Surface Glycan-Binding Proteins
to Mediate Xyloglucan Uptake in the Gut Symbiont Bacteroides ovatus. Journal of molecular biology 431 ,
981-995, doi:10.1016/j.jmb.2019.01.008 (2019).
33 Davies, G. & Henrissat, B. Structures and mechanisms of glycosyl hydrolases. Structure 3 , 853-859,
doi:10.1016/S0969-2126(01)00220-9 (1995).
34 de Giuseppe, P. O. et al. Structural basis for glucose tolerance in GH1 beta-glucosidases. Acta crystallographica. Section D, Biological crystallography 70 , 1631-1639, doi:10.1107/S1399004714006920 (2014).
35 Withers, S. G. et al. Unequivocal demonstration of the involvement of a glutamate residue as a nucleophile
in the mechanism of a retaining glycosidase. Journal of the American Chemical Society112 , 5887-5889,
doi:10.1021/ja00171a043 (1990).
36 Lu, S. et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic acids research 48 , D265D268, doi:10.1093/nar/gkz991 (2020).
37 Powers, S. L. & Robinson, A. S. PDI improves secretion of redox-inactive beta-glucosidase. Biotechnology
progress23 , 364-369, doi:10.1021/bp060287p (2007).
38 Quiroz-Castañeda, R. E. & Folch-Mallol, J. L. Plant cell wall degrading and remodeling proteins: current
perspectives %J Biotecnologı́a Aplicada. 28 , 205-215 (2011).
39 Duedu, K. O. & French, C. E. Characterization of a Cellulomonas fimi exoglucanase/xylanaseendoglucanase gene fusion which improves microbial degradation of cellulosic biomass. Enzyme and microbial
technology 93-94 , 113-121, doi:10.1016/j.enzmictec.2016.08.005 (2016).
40 Martin, M., Wayllace, N. Z., Valdez, H. A., Gomez-Casati, D. F. & Busi, M. V. Improving the glycosyltransferase activity of Agrobacterium tumefaciens glycogen synthase by fusion of N-terminal starch binding
domains (SBDs). Biochimie 95 , 1865-1870, doi:10.1016/j.biochi.2013.06.009 (2013).
41 Gao, F., Jiang, Y., Zhou, G. H. & Han, Z. K. The effects of xylanase supplementation on growth, digestion,
circulating hormone and metabolite levels, immunity and gut microflora in cockerels fed on wheat-based diets.
Br Poult Sci 48 , 480-488, doi:10.1080/00071660701477320 (2007).
42 Schubot, F. D. et al. Structural basis for the exocellulase activity of the cellobiohydrolase CbhA from
Clostridium thermocellum.Biochemistry 43 , 1163-1170, doi:10.1021/bi030202i (2004).
43 Gilkes, N. R. et al. Structural and functional relationships in two families of beta-1,4-glycanases. European
journal of biochemistry / FEBS 202 , 367-377, doi:10.1111/j.1432-1033.1991.tb16384.x (1991).
44 MacLeod, A. M., Lindhorst, T., Withers, S. G. & Warren, R. A. The acid/base catalyst in the exoglucanase/xylanase from Cellulomonas fimi is glutamic acid 127: evidence from detailed kinetic studies of
mutants.Biochemistry 33 , 6371-6376, doi:10.1021/bi00186a042 (1994).
45 Watanabe, H. & Tokuda, G. Animal cellulases. Cellular and molecular life sciences : CMLS 58 , 11671178, doi:10.1007/PL00000931 (2001).
46 Vogler, B. W. et al. Characterization of plant carbon substrate utilization by Auxenochlorella protothecoides. Algal Research 34 , 37-48, doi:https://doi.org/10.1016/j.algal.2018.07.001 (2018).
Table 1: Predicted cellulases# of S. quadricauda LWG 002611
S. No.
Genes
Genes
Predicted mRNAs
Predicted proteins
Predicted proteins
Predicted
ID
Length (bp)
Length (bp)
Length (aa)
Size (KD)
GH Fami
10
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
S. No.
Genes
Genes
Predicted mRNAs
Predicted proteins
Predicted proteins
Predicted
1.
2.
3.
4.
5.
6.
8.
9.
10.
11.
12.
2009
3068
8404
9353
4665
3544
9833
10006
13370
13657
547
2176
3243
1472
635
3365
4540
5359
1578
1613
523
4033
963
1539
303
225
1332
1272
1173
540
291
363
2165
321
513
101
75
443
424
390
181
96
121
734
34.45
54.83
11.19
7.69
48.97
48.28
43.37
10.78
10.78
11.91
80.85
GH5
GH9
inc
inc
GH9
GH1
GH1
GH1
inc
inc
GH10
#
Sequences are separately provided in Supplementary file; inc: Inconclusive result due to
short sequence
Table 2
Organism
ID/PBD
ENDOGLUCANASES
Scenedesmus
365111
obliquus
EN0004 v1.0
Scenedesmus
579152
obliquus EN0004
v1.0
Scenedesmus
610731
obliquus
EN0004 v1.0
199
Scenedesmus
obliquus UTEX
B 3031
Scenedesmus
8271
obliquus
UTEX B 3031
Scenedesmus
14060
obliquus
UTEX B 3031
Scenedesmus
18668
obliquus UTEX
B 3031
25222
Scenedesmus
obliquus UTEX
B 3031
Scenedesmus
35062
obliquus
UTEX B 3031
Scenedesmus
19147
obliquus
UTEX B 3031
Conserved
domains
Cell Location
Size (aa),
mol.wt
Taxonomic
group
CBM2, GH5
Cytoplasm
555, 59.86 kDa
Bacteria
GH9
Cytoplasm
716, 75.57 kDa
Green algae,
Insect
5TM, GH9
Extracellular
891,91.43 kDa
Termites
1TM, GH9
Cell Membrane/
Extracelular
665, 70.87 kDa
Earthworms
GH9
Extracellular
489, 53.02 kDa
CBM2, GH5
Cytoplasm
551, 59.37 kDa
Sea urchin
(animal).
Insect
Bacteria
CBM1, GH9
Extracellular
710, 74.88 kDa
Green algae,
Insect
GH9
Extracellular
861, 88.71 kDa
Green algae, Sea
urchin
GH9
Extracellular
487, 53.70 kDa
Termites,
insects
CBM1, GH9
Extracellular
730, 77.78 kDa
Thermophilic
bacterium
11
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Scenedesmus
obliquus UTEX
B 3031
Scenedesmus
obliquus var.
DOE0013 v1.0
Scenedesmus
obliquus var.
DOE0013 v1.0
Scenedesmus
obliquus var.
DOE0013 v1.0
Scenedesmus
obliquus var.
DOE0013 v1.0
Scenedesmus
obliquus var.
DOE0013 v1.0
32711
1TM, GH9
Extracellular
629, 67.22 kDa
Microalgae, Sea
urchin, Insect
1035052
GH9
Extracellular
439, 47.68 kDa
Earthworms,
Chordates
739074
GH9
Extracellular
479, 51.60 kDa
Green algae, Sea
urchin
757385
GH9
Extracellular
733, 77.30 kDa
Green algae,
Termites
776386
1TM, GH9
Cell Membrane
(GH9 outside)
521, 57.08 kDa
Termites
826696
1TM, GH9
Cell Membrane
(GH9 outside)
611, 65.76 kDa
Scenedesmus
obliquus var.
DOE0013 v1.0
Scenedesmus
obliquus var.
DOE0013 v1.0
Scenedesmus
obliquus var.
DOE0013 v1.0
Scenedesmus sp.
NREL 46B-D3
v1.0
Scenedesmus sp.
NREL 46B-D3
v1.0
Scenedesmus
sp. NREL
46B-D3 v1.0
Scenedesmus
sp. NREL
46B-D3 v1.0
Scenedesmus sp.
NREL 46B-D3
v1.0
Scenedesmus sp.
NREL 46B-D3
v1.0
Scenedesmus
PABB0004
Scenedesmus
PABB0004
1002809
GH5
Extracellular
419, 45.87 kDa
1008808
CBM2 GH5
Cytoplasm
518, 55.90 kDa
Earthworms,
Termites Green
algae (una sola
coccomyxa)
Fungi,
cellulolityc
bacteria
Bacteria
761682
CBM1, GH9
Extracellular
682, 72.02 kDa
Green algae,
Worm
956336
GH9
Cytoplasm
511, 55.45 kDa
Worm, Termite
1003724
GH5
Cytoplasm
424, 46.48 kDa
Archaea,
Bacteria
1274183
GH9
Extracellular
473, 50.60 kDa
Sea urchin
Termites
1508728
GH5
Extracellular
451, 49.56 kDa
Worms,
termite
1655835
GH9
Extracellular
360, 38.96 kDa
Amoeba,
Bacteria, worms
1693221
GH9
Extracellular
614, 66.93 kDa
Green algae,
Termite
KAF8061121.1
(celF)
KAF8066338.1
(celD)
CBM1, GH9
Cytoplasm
769, 80.61 kDa
LPMO, GH9
Cell membrane
672, 69.77 kDa
Green algae,
sea urchin
Green algae,
Worm Anemones
12
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Scenedesmus
PABB0004
KAF8065624.1
(celF)
GH9
Extracellular
1166, 119.94 kDa
Scenedesmus
KAF8062061.1
PABB0004
(celZ)
Scenedesmus
2009
quadricauda
LGW0026011
βΓΛΥ῝ΟΣΙΔΑΣΕΣ
GH9
Cell membrane
766, 77.74 kDa
GH5
Cell Membrane
(GH5 outside)
321, 34.33 kDa
Organism
Scenedesmus obliquus EN0004 v1.0
Scenedesmus obliquus EN0004 v1.0
Scenedesmus obliquus EN0004 v1.0
Scenedesmus obliquus EN0004 v1.0
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
Scenedesmus sp. NREL 46B-D3 v1.0
Scenedesmus sp. NREL 46B-D3 v1.0
Scenedesmus sp. NREL 46B-D3 v1.0
Scenedesmus sp. NREL 46B-D3 v1.0
Scenedesmus sp. NREL 46B-D3 v1.0
Scenedesmus sp. NREL 46B-D3 v1.0
Scenedesmus sp. NREL 46B-D3 v1.0
[Scenedesmus sp. PABB004]
[Scenedesmus sp. PABB004]
[Scenedesmus sp. PABB004]
[Scenedesmus sp. PABB004]
S.quadricauda LWG 002611
Scenedesmus obliquus var. DOE0013
Scenedesmus obliquus var. DOE0013
Scenedesmus obliquus var. DOE0013
Scenedesmus obliquus var. DOE0013
Scenedesmus obliquus var. DOE0013
Scenedesmus obliquus var. DOE0013
EXOCELLULASES
Scenedesmus obliquus EN0004 v1.0
Scenedesmus obliquus EN0004 v1.0
Scenedesmus obliquus UTEX B 3031
Scenedesmus obliquus UTEX B 3031
v1.0
v1.0
v1.0
v1.0
v1.0
v1.0
Green algae, Sea
anemone
Crustaceans,
Green algae,
Termite Bivalve.
Green algae,
Bacteria
ID/PBD
574933
575894
610267
617109
2100
3494
8342
9031
10236
17466
23136
26740
32074
32663
35463
88126
1297936
1298554
1298717
1507547
1644545
1509300
KAF8062692.1
KAF8062654.1
KAF8060308.1
KAF8059426.1
9833
67487
177517
746725
797715
882370
1019541
Conserved domains
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1 / PDI
GH1
GH1
GH1
GH1
GH1
GH1
GH1
GH1
Cell Location
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Extracellular
Membrane
Extracellular
Extracellular
Extracellular
Membrane
Membrane
Extracellular
Extracellular
Extracellular
Chloroplastic Membrane
Extracellular
Extracellular
Extracellular
Mitochondrion
Cytoplasm
Extracellular
Cytoplasm
Cell membrane
Extracellular
Cell membrane
Size (a
565, 62.
795, 87.
596, 64.
484, 54.
490, 54.
589, 64.
707, 78.
389, 43.
361, 40.
814, 89.
558, 61.
560, 61.
917, 100
573, 64.
477, 54.
539, 59.
512, 57.
489, 53.
488, 53.
600, 66.
461, 52.
825, 91.
1210, 13
744, 75.
2136, 22
1038, 10
391, 43.
62.11 kD
579, 61.
513, 57.
563, 62.
496, 54.
255, 24.
352887
587573
4623
16336
GH10
GH10
GH10
GH10
Cytoplasm, Soluble
Cytoplasm, Soluble
Extracellular
Cytoplasm, Soluble
635,
849,
498;
440;
13
70.
93.
55.
49.
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Scenedesmus
Scenedesmus
Scenedesmus
Scenedesmus
Scenedesmus
Scenedesmus
Scenedesmus
obliquus UTEX B 3031
obliquus var. DOE0013 v1.0
obliquus var. DOE0013 v1.0
obliquus var. DOE0013 v1.0
obliquus var. DOE0013 v1.0
PABB0004
quadricauda LGW0026011
10793
977091
243254
752374
750376
KAF8058849.1
547
GH10
GH10
GH10
GH10
GH10
GH10
GH10
Extracellular
Cytoplasm, Soluble
Extracellular
Cytoplasm, Soluble
Cytoplasm, Soluble
Mitochondrion, Membrane
Extracellular.
Fig 1
Fig.1 Domain architecture of the putative endoglucanases enzymes in the proteome of Scenedemaceae. The
figure shows all predicted proteins containing domains annotated as glycosyl hydrolases in families GH9 (A
), or GH5 (B ), in the analyzed strains. Red and yellow squares represent signal peptides and transmembrane
domains, respectively. CBM: carbohydrate binding domain. LPMO: Lytic polysaccharide monooxygenases
Fig 2A
2B
Fig.2 A GH9 family endoglucanase alignment. Alignment between Scenedesmus GH9 endoglucanases, and
Nasta|1KS8 A and Trire|4BMF A amino acid sequences used as the template for 3D modeling. Red asterisk above the alignment indicates catalytic residues, characterized in Nasta|1KS8. Cyan asterisk indicate
cellulose binding residues and purple asterisk indicate conserved cysteines. Other conserved positions are
shown in red and boxed in blue. CBM domain is framed in grey. Abbreviations: ScespPA4: Scenedesmus sp.
PABB004, Scesp1: Scenedesmus sp. NREL 46B-D3 v1.0, Sceob1:Scenedesmus obliquus UTEX B 3031, SceobDOE: Scenedesmus obliquus var. DOE0013 v1.0, SceobE4: Scenedesmus obliquus EN0004 v1.0, Scequ2611:
Scenedesmus quadricauda LWG 002611, Nasta:Nasutitermes takasagoensis and Trire: Trichoderma Reesei .
B Scequ2611|3068 and Scequ2611|4665 amino acid sequences alignment. Alignment between
Scenedesmus GH9 endoglucanases, and two enzymes from close species. Red asterisk above the alignment indicates the catalytic residues. Other conserved positions are shown in red and boxed in blue.
Abbreviations: ScespPA4: Scenedesmus sp.PABB004, Scesp1: Scenedesmus sp . NREL 46B-D3 v1.0,
Scequ2611:Scenedesmus quadricauda LWG 002611, Monne: Monoraphidium neglectum and Rapsu: Raphidocelis subcapitata
Fig. 3
14
443;
393;
495;
839;
410;
829;
735;
49,
44.
55.
92.
46.
90.
79.
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Fig. 3 GH9 family endoglucanase Phylogenetic tree. The tree is based on the alignment of the
regions obtained with Gblock and it was built by Maximum Likelihood method in MEGA 6.06 version,
under the LG + G model, suggested by MEGA. Phylogenies were determined by Bootstrap Analysis of
100 replicates. Branch lengths are proportional to distances. Bootstrap values are shown above branches.
The dark green branches contain GH9 from plants, the lighter green ones correspond to GH9 from algae,
the orange ones contain GH9 from fungi, the red ones correspond to invertebrates and those colored purple contain GH9 from bacteria. Abbreviations: Gonpe: Gonium pectorale, Volca:Volvox carteri, Chlre:
Chlamydomonas reinhardtii, Sceob1:Scenedesmus obliquus UTEX B 3031, SceobDOE: Scenedesmus obliquus
var. DOE0013 v1.0, Scesp1: Scenedesmus sp. NREL 46B-D3 v1.0, ScespPA4: Scenedesmus sp. PABB004,
SceobE4:Scenedesmus obliquus EN0004 v1.0, Scequ2611: Scenedesmus quadricauda LWG 002611, Tetob:
Tetradesmus obliquus , Rapsu:Raphidocelis subcapitata, Retfl: Reticulitermes flavipes , Copfo: Coptotermes formosanus , Mesnu: Mesocentrotus nudus , Methi: Metaphire hilgendorfi , Rhiso: Rhizoctonia solani
, Citun: Citrus unshiu , Mucpr: Mucuna pruriens, Macco:Macleaya cordata , Artan: Artemisia annua ,
Anaco:Ananas communis , Jatcu: Jatropha curcas, Eucgr:Eucalyptus grandis , Maldo: Malus domestica ,
Spiol:Spinacia oleracea , Actch: Actinidia chinensis var. chinensis , Jugre: Juglans regia , Leepe: Leersia
perrieri,Oryba: Oryza barthii, Basme: Basidiobolus meristosporus,Neoca: Neocallimastix californiae, Pirfi:
Piromyces finnis , Anaro: Anaeromyces robustus, Pirsp: Piromyces sp.E2, Paele: Paenibacillus lentus, Flasp:
Flammeovirga sp. SJP92, Hahch:Hahella chejuensis KCTC 2396, Marba: marine bacteriumAO1-C, Niasp:
Niastella sp . CF465, Niavi: Niastella vici,Chisp: Chitinophaga sp. YR573, Chisa: Chitinophaga sancti
, Allop: Allonocardiopsis opalescens, Celsp: Cellulomonas sp ., Celsh: Cellulomonas shaoxiangyii , Celce:
Cellulomonas cellasea DSM 20118, Strsp: Streptomyces sp . Ru87, Plasp:Plantactinospora sp. KBS50, Micro: Microbispora rosea,Miccr: Micromonospora craniellae, Micya: Micromonospora yangpuensis , Micen:
Micromonospora endolithica, Micsp:Micromonospora sp. CNZ295
15
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Fig. 4
Fig. 4 3D modeling of GH9 endoglucanase 32711 from Scenedesmus obliquus UTEX B 3031.
A. Proposed model of Sceobl1|32711 (pink) superimposed to Nasta|1KS8 A GH9 and Trire| 4BMF A CBM1
templates (cyan); putative catalytic and binding residues are shown. B. Inset of catalytic residues area.
Abbreviations: Nasta: Nasutitermes takasagoensis and Trire:
Trichoderma Reesei
Fig. 5
Fig. 5.GH5 family endoglucanase alignment. Multiple alignment between Scenedesmus GH5 endoglucanases, and Pyrho|3W6M A used as the template for 3D modeling. The multiple sequence alignment
was performed using Clustal Omega program and ESPript 3.0. Red asterisk above the alignment indicates catalytic residues. Cyan asterisk indicate cellulose binding residues. Other conserved positions are
shown in red and boxed in blue. CBM domain is framed in grey. Abbreviations: ScespPA:Scenedesmus
sp. PABB004, Scesp1: Scenedesmus sp. NREL 46B-D3 v1.0, Sceob1: Scenedesmus obliquus UTEX B
3031, SceobDOE:Scenedesmus obliquus var. DOE0013 v1.0, SceobE4:Scenedesmus obliquus EN0004 v1.0,
Scequ2611: Scenedesmus quadricauda LWG 002611, Pyrho: Pyrococcus horikoshii
Fig. 6
Hosted file
image11.emf available at https://authorea.com/users/500895/articles/581523-molecular-insightinto-cellulose-degradation-by-the-phototrophic-green-alga-scenedesmus
Fig. 6 GH5 family endoglucanase Phylogenetic tree. The tree is based on the alignment of the
regions obtained with Gblock and it was built by Maximum Likelihood method in MEGA 6.06 version,
under the WAG + G model suggested by MEGA. Phylogenies were determined by Bootstrap Analysis of 100 replicates. Branch lengths are proportional to distances. Bootstrap values are shown above
branches. The dark green branches contain GH5 from plants, the lighter green ones correspond to GH5
from algae, the orange ones contain GH5 from fungi, and those colored purple contain GH5 from bacteria.
Abbreviatures: Aciva:Acidovorax valerianellae, Xanal: Xanthomonas albilineans , Xanca: Xanthomonas
campestris pv. Campestris , Xanor:Xanthomonas oryzae pv. Oryzae, Acici: Acidovorax citrulli , Xantr:
Xanthomonas translucens pv. Translucens, Xylfa: Xylella fastidiosa, Acian: Acidovorax anthurii , Acisp:
Acidovorax sp. MR-S7 , Canma: Candidatus Magnetobacterium bavaricum , Thiba: Thiotrichales bacterium, Ulisp:Uliginosibacterium sp., Halsp: Halothiobacillus sp.,Massp: Massilia sp., Albte: Albitalea terrae, Crepo: Crenothrix polyspora , Metis: Methylomagnum ishizawai , Hydsp:Hydrogenophaga sp. A37 ,
Polbr: Polyangium brachysporum,Abypr: Abyssibacter profundi , Rapsu: Raphidocelis subcapitata , Monne:
Monoraphidium neglectum, Sceob1:Scenedesmus obliquus UTEX B 3031, Scesp1: Scenedesmus sp. NREL
46B-D3 v1.0, SceobDOE: Scenedesmus obliquus var. DOE0013 v1.0, Tetob: Tetradesmus obliquus , SceobE4:
Scenedesmus obliquus EN0004 v1.0, Kleni: Klebsormidium nitens, Neoca: Neocallimastix californiae, Pirfi:
Piromyces finnis,Pirsp: Piromyces sp., Anaro: Anaeromyces robustus, Cocsu:Coccomyxa subellipsoidea,
Chleu: Chlamydomonas eustigma,Monsp: Monosporascus sp ., Micco: Micromonas commode,Glyso: Glycine
soja, Vigra: Vigna radiata var. radiata,Phaan: Phaseolus angularis, Arath: Arabidopsis thaliana,Braca:
Brassica campestris, Macco: Macleaya cordata
Fig.7
Fig. 7 3D modeling of GH5 endoglucanase 14060 from Scenedesmus obliquus UTEX B 3031.
A. Proposed model of Sceobl1|14060 (pink) superimposed to Pyrho|3W6M A GH5 and Strco|2RTT A CBM2
templates (cyan); putative catalytic and binding residues are shown. B. Inset of substrate binding residues
area. C. Inset of catalytic residues area. Abbreviations: Strco:Streptomyces coelicolor ; Pyrho: Pyrococcus
horikoshii.
Fig.8
16
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Fig. 8 Domain architecture of the putative β-glucosidases enzymes in the proteome of Scenedemaceae. The
figure shows all predicted proteins containing domains annotated as glycosyl hydrolases in families GH1,
in the analyzed strains. Red, yellow, magenta and cyan boxes represent signal peptides, transmembrane
domains, RAMA domain, and ER-PDI superfamily domain, respectively
Fig. 9A
Hosted file
image14.emf available at https://authorea.com/users/500895/articles/581523-molecular-insightinto-cellulose-degradation-by-the-phototrophic-green-alga-scenedesmus
Hosted file
image15.emf available at https://authorea.com/users/500895/articles/581523-molecular-insightinto-cellulose-degradation-by-the-phototrophic-green-alga-scenedesmus
9B
Φιγ. 9.Α ΓΗ1 φαμιλψ β-γλυςοσιδασε αλιγνμεντ. (ὃλορ ονλινε, δουβλε ςολυμν). Multiple
alignment between Scenedesmus GH1 endoglucanases, and Nanoc|5YJ7 A, sequence used as the template for
3D modeling. The multiple sequence alignment was performed using Clustal Omega program and ESPript
3.0. Red asterisk above the alignment indicates the catalytic residues. Abbreviations: ScespPA4:Scenedesmus
sp . PABB004, Scesp1: Scenedesmus sp . NREL 46B-D3 v1.0, Sceob1: Scenedesmus obliquus UTEX B
3031, SceobDOE:Scenedesmus obliquus var. DOE0013 v1.0, SceobE4:Scenedesmus obliquus EN0004 v1.0,
Scequ2611: Scenedesmus quadricauda LWG 002611, Nanoc: Nannochloropsis oceanica .B. Scequ2611|3544
amino acid sequences alignment. Alignment between Scenedesmus 3544 putative endoglucanase, and
enzymes from close species. The multiple sequence alignment was performed using Clustal Omega program
and ESPript 3.0. Red asterisk above the alignment indicates catalytic residues. Abbreviations: ScespPA4:
Scenedesmus sp. PABB004, Scequ2611: Scenedesmus quadricauda LWG 002611, Monne: Monoraphidium
neglectum and Rapsu: Raphidocelis subcapitata
Fig. 10
Hosted file
image17.emf available at https://authorea.com/users/500895/articles/581523-molecular-insightinto-cellulose-degradation-by-the-phototrophic-green-alga-scenedesmus
Fig. 10 GH1 family endoglucanase Phylogenetic tree.The tree is based on the alignment of the
regions obtained with Gblock and it was built by Maximum Likelihood method in MEGA 6.06 version,
under the LG + G model, suggested by MEGA. Phylogenies were determined by Bootstrap Analysis of 100
replicates. Branch lengths are proportional to distances. Bootstrap values are shown above branches. The
dark green branches contain GH1 from plants, the lighter green ones correspond to GH1 from algae, the
orange ones contain GH1 from fungi, and those colored purple contain GH1 from bacteria. Abbreviations:
Sceob1:Scenedesmus obliquus UTEX B 3031, SceobDOE: Scenedesmus obliquus var. DOE0013 v1.0, Scesp1:
Scenedesmus sp. NREL 46B-D3 v1.0, ScespPA4: Scenedesmus sp. PABB004, SceobE4:Scenedesmus obliquus
EN0004 v1.0, Scequ2611: Scenedesmus quadricauda LWG 002611, Botbo: Botryobasidium botryosum , Aspud:Aspergillus udagawae , Rasem: Rasamsonia emersonii , Exoaq:Exophiala aquamarine , Aurna: Aureobasidium namibiae , Aurme: Aureobasidium melanogenum , Aurpu: Aureobasidium pullulans , Baupa:
Baudoinia panamericana , Aciri:Acidomyces richmondensis , Horwe: Hortaea werneckii , Dotse: Dothistroma septosporum (strain NZE10 / CBS 128990) (Mycosphaerella pini), Zymtr: Zymoseptoria tritici ,
Cerze: Cercospora Zeina , Cerbe: Cercospora berteroae , Ceret: Cercospora beticola , Calfi: Caldanaerobius
fijiensis , Actre:Actinoplanes regularis , Vulte: Vulcaniibacterium tengchongense , Vicva: Victivallis vadensis
, Rubsp:Rubrivirga sp. , Lewag: Lewinella agarilytica , Ulvma:Ulvibacterium marinum , Arexa: Arenicella
xanthan , Verba:Verrucomicrobiae bacterium, Sacde: Saccharophagus degradans, Lenar: Lentisphaera araneosa, Halhy:Haliscomenobacter hydrossis, Mansp: Mangrovimonas sp.,Urecr: Urechidicola croceus, Rosmi:
17
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
Roseivirga misakiensis, Polsp: Polaribacter sp., Flaba: Flaviramulus basaltis, Aquag: Aquimarina aggregate,
Confl:Confluentibacter flavum, Helan: Helianthus annuus, Orypu:Oryza punctate, Dauca: Daucus carota
subsp. Sativus,Solly: Solanum lycopersicum, Artan: Artemisia annua,Oryba: Oryza barthii, Braol: Brassica
oleracea var. Oleracea, Leepe: Leersia perrieri, Cinmi: Cinnamomum micranthum f. kanehirae, Anaco:
Ananas comosus, Orypu:Oryza punctate, Arahy: Arachis hypogaea, Vitvi:Vitis vinifera, Cajca: Cajanus
cajan.
Fig. 11
Φιγ. 11 ΓΗ1 β-γλυςοσιδασε 3Δ μοδελ 3D modeling of GH1 β-glucosidase 3494 from Scenedesmus
obliquus UTEX B 3031. A.Proposed model of Sceobl1|3494 (pink) superimposed to Nanoc|5YJ7 A GH1
template (cyan); putative catalytic residues are shown. B. Inset of catalytic residues area. Abbreviations:
Nanoc: Nannochloropsis oceanica
Fig. 12 A
B
Hosted file
image20.emf available at https://authorea.com/users/500895/articles/581523-molecular-insightinto-cellulose-degradation-by-the-phototrophic-green-alga-scenedesmus
Hosted file
image21.emf available at https://authorea.com/users/500895/articles/581523-molecular-insightinto-cellulose-degradation-by-the-phototrophic-green-alga-scenedesmus
Fig. 12A Domain architecture of the putative exoglucanases enzymes in the proteome of
Scenedemaceae The figure shows all predicted proteins containing domains annotated as glycosyl hydrolases in families GH10, in the analyzed strains. Red boxes represent signal peptides. Yellow squares represent
transmembrane domains
B GH10 family exoglucanase alignment. Multiple alignment between Scenedesmus GH10 exoglucanases, and Celfi|P07986, sequence used as the template for 3D modeling. The multiple sequence alignment
was performed using Clustal Omega program and ESPript 3.0. Red asterisk above the alignment indicates
catalytic residues. Abbreviations: ScespPA4: Scenedesmus sp. PABB004, Scesp1: Scenedesmus sp. NREL
46B-D3 v1.0, Sceob1:Scenedesmus obliquus UTEX B 3031, SceobDOE: Scenedesmus obliquus var. DOE0013
v1.0, SceobE4: Scenedesmus obliquus EN0004 v1.0, Scequ2611: Scenedesmus quadricauda LWG 002611,
Celfi:Cellulomonas fimi .
Fig. 13
Hosted file
image22.emf available at https://authorea.com/users/500895/articles/581523-molecular-insightinto-cellulose-degradation-by-the-phototrophic-green-alga-scenedesmus
Fig. 13 GH10 family exoglucanase Phylogenetic tree . The tree is based on the alignment of
the regions obtained with Gblock and it was built by Maximum Likelihood method in MEGA 6.06 version, under the WAG + G model, suggested by MEGA. Phylogenies were determined by Bootstrap Analysis of 100 replicates. Branch lengths are proportional to distances. Bootstrap values are shown above
branches. The dark green branches contain GH10 from plants, the lighter green ones correspond to GH10
from algae, the orange ones contain GH10 from fungi, and those colored purple contain GH10 from bacteria. Abbreviations: Isodo:Isoptericola dokdonensis, Isosp: Isoptericola sp. Jonde:Jonesia denitrificans
, Celbo: Cellulomonas bogoriensis , Actfe: Actinotalea fermentans, Celgi: Cellulomonas gilvus , Celsp:
Cellulomonas sp ., Celbi: Cellulomonas biazotea , Celal: Cellulomonas algicola, Celfi: Cellulomonas fimi
18
Posted on Authorea 1 Sep 2022 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.166030463.37247755/v2 — This a preprint and has not been peer reviewed. Data may be preliminary.
, Micha: Micromonospora haikouensis Micsp: Micromonospora sp ., Micec: Micromonospora echinaurantiaca, Micco:Micromonospora coxensis, Celsp: Cellulomonas sp. , Celfl:Cellulomonas flavigena , Ktera:
Ktedonobacter racemifer , Actsp: Actinomadura sp . GC306, Actda: Actinomadura darangshiensis , Stral:
Streptomyces alni, Thebi:Thermobispora bispora, Phach: Phanerodontia chrysosporium , Neopa: Neocallimastix patriciarum, Arath: Arabidopsis thaliana, Nagal: Naganishia albida, Gibze: Gibberella zeae , Fusox:
Fusarium oxysporum f. sp. lycopersici ,Aurpu: Aureobasidium pullulans, Humgr: Humicola grisea var.
Thermoidea, Maggr: Magnaporthe grisea, Magor: Magnaporthe oryzae, Talfu: Talaromyces funiculosus,
Agabi: Agaricus bisporus , Phach: Phanerodontia chrysosporium, Hypje:Hypocrea jecorina , Ustma: Ustilago maydis ,Clapu: Claviceps purpurea, Penca: Penicillium canescens,Pensi: Penicillium simplicissimum,
Talpu: Talaromyces purpureogenus, Aspac: Aspergillus aculeatus, Aspni:Aspergillus niger, Aspka: Aspergillus
kawachii , Rhior:Rhizopus oryzae, Pench: Penicillium chrysogenum, Theau:Thermoascus aurantiacus, Aspte:
Aspergillus terreus,Aspor: Aspergillus oryzae , Neofu: Neosartorya fumigata ,Neofi: Neosartorya fischeri,
Aspcl: Aspergillus clavatus, Aspfl: Aspergillus flavus , Emeni: Emericella nidulans , Aspor: Aspergillus
oryzae , Aurpu:Aureobasidium pullulans, Nagal: Naganishia albida.
Fig. 14
Fig. 14 GH10 exoglucanase 3D model. 3D modeling of GH10 exoglucanase 4623 from Scenedesmus
obliquus UTEX B 3031.A. Proposed model of Sceobl1|4623 (pink) superimposed to Celfi|P07986 GH10
template (cyan); putative catalytic residues are shown. B. Inset of catalytic residues area. Abbreviations:
Celfi: Cellulomonas fimi
Supporting information:
Supplementary Figure 1: In a separate file
Data Availability Statement
Not applicable
Conflict of interests
The authors declare no potential financial or other interests that could be perceived to influence the outcomes
of the research. No conflicts, informed consent, human or animal rights applicable. All authors declare
agreement to authorship and submission of the manuscript for peer review.
Acknowledgements
This work was supported by Grants from Consejo Nacional de Investigaciones Cientı́ficas y Técnicas (CONICET, PIP2015-0476), Agencia Nacional de Promoción Cientı́fica y Tecnológica (ANPCyT, PICT2016-0350),
and Agencia Santafecina de Ciencia Tecnologı́a e Innovación (ASaCTeI IO2018-00098). JB, MVB and DGC
are research members from CONICET. CND is assistant professor from Amity University Uttar Pradesh,
Lucknow, India. MBV is a doctoral fellow from CONICET.
Authorship.
JB, MBV, DGC, MVB and CND designed the conception and delineation of the
study; and prepared the manuscript and reviewed it before submission. MBV
performed the in silico characterization. JB and MBV performed the acquisition of the data or analyzed
such information. All authors read and approved the final manuscript.
19