Downloaded from orbit.dtu.dk on: Nov 07, 2023
Ecological generalism drives hyperdiversity of secondary metabolite gene clusters in
xylarialean endophytes
Franco, Mario E. E.; Wisecaver, Jennifer H.; Arnold, A. Elizabeth; Ju, Yu-Ming; Slot, Jason C.; Ahrendt,
Steven; Moore, Lillian P.; Eastman, Katharine E.; Scott, Kelsey; Konkel, Zachary
Total number of authors:
36
Published in:
New Phytologist
Link to article, DOI:
10.1111/nph.17873
Publication date:
2022
Document Version
Peer reviewed version
Link back to DTU Orbit
Citation (APA):
Franco, M. E. E., Wisecaver, J. H., Arnold, A. E., Ju, Y-M., Slot, J. C., Ahrendt, S., Moore, L. P., Eastman, K. E.,
Scott, K., Konkel, Z., Mondo, S. J., Kuo, A., Hayes, R. D., Haridas, S., Andreopoulos, B., Riley, R., LaButti, K.,
Pangilinan, J., Lipzen, A., ... U'Ren, J. M. (2022). Ecological generalism drives hyperdiversity of secondary
metabolite gene clusters in xylarialean endophytes. New Phytologist, 233(3), 1317-1330.
https://doi.org/10.1111/nph.17873
General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright
owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
You may not further distribute the material or use it for any profit-making activity or commercial gain
You may freely distribute the URL identifying the publication in the public portal
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately
and investigate your claim.
Accepted Article
DR MARIO E.E. FRANCO (Orcid ID : 0000-0002-4959-1257)
DR RICHARD HAYES (Orcid ID : 0000-0002-5236-7918)
PROF. FRANÇOIS LUTZONI (Orcid ID : 0000-0003-4849-7143)
DR JANA M U'REN (Orcid ID : 0000-0001-7608-5029)
Article type
: Regular Manuscript
Ecological generalism drives hyperdiversity of secondary metabolite gene clusters in xylarialean
endophytes
Mario E.E. Franco1, Jennifer H. Wisecaver2, A. Elizabeth Arnold3, Yu-Ming Ju4, Jason C. Slot5,
Steven Ahrendt6, Lillian P. Moore1, Katharine E. Eastman2, Kelsey Scott5, Zachary Konkel5, Stephen
J. Mondo6, Alan Kuo6, Richard D. Hayes6, Sajeet Haridas6, Bill Andreopoulos6, Robert Riley6, Kurt
LaButti6, Jasmyn Pangilinan6, Anna Lipzen6, Mojgan Amirebrahimi6, Juying Yan6, Catherine Adam6,
Keykhosrow Keymanesh6, Vivian Ng6, Katherine Louie6, Trent Northen6, Elodie Drula7,8, Bernard
Henrissat9,10, Huei-Mei Hsieh4, Ken Youens-Clark1, François Lutzoni11, Jolanta Miadlikowska11,
Daniel C. Eastwood12, Richard C. Hamelin13, Igor V. Grigoriev6,14, Jana M. U’Ren1*
1BIO5
Institute and Department of Biosystems Engineering, The University of Arizona, Tucson,
Arizona, 85721, United States of America; 2Department of Biochemistry, Purdue University, West
Lafayette, Indiana, 47907, United States of America; 3School of Plant Sciences and Department of
Ecology and Evolutionary Biology, The University of Arizona, Tucson, Arizona, 85721, United
States of America; 4Institute of Plant and Microbial Biology, Academia Sinica, Taipei, 11529,
Taiwan; 5Department of Plant Pathology, The Ohio State University, Columbus, Ohio, 43210, United
States of America; 6Department of Energy, The Joint Genome Institute, Lawrence Berkeley National
Laboratory, Berkeley, California, 94720, United States of America; 7Architecture et Fonction des
This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process, which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1111/NPH.17873
This article is protected by copyright. All rights reserved
Accepted Article
Macromolécules Biologiques, CNRS, Aix-Marseille Université, Marseille, 13288, France; 8INRAE,
Marseille, 13288, France; 9Department of Biotechnology and Biomedicine, Technical University of
Denmark, Lyngby, DK-2800, Denmark;10Department of Biological Sciences, King Abdulaziz
University, Jeddah, 21589, Saudi Arabia; 11Department of Biology, Duke University, Durham, North
Carolina, 27708, United States of America; 12Department of Biosciences, Swansea University,
Swansea, Wales, SA2 8PP, United Kingdom; 13Department of Forest and Conservation Sciences,
University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada; 14Department of
Plant and Microbial Biology, University of California, Berkeley, California, United States of
America.
* Corresponding author
Email: juren@email.arizona.edu, Phone: (520) 626-0426
Received: 16 August 2021
Accepted: 7 November 2021
Author's ORCID
Mario E.E. Franco (0000-0002-4959-1257)
Jennifer H. Wisecaver (0000-0001-6843-5906)
A. Elizabeth Arnold (0000-0002-7013-4026)
Yu-Ming Ju (0000-0002-8202-6145)
Jason C. Slot (0000-0001-6731-3405)
Steven Ahrendt (0000-0001-8492-4830)
Katharine E. Eastman (0000-0002-4438-3854)
Kelsey Scott (0000-0003-1378-5348)
Sajeet Haridas (0000-0002-0229-0975)
Kurt LaButti (0000-0002-5838-1972)
Anna Lipzen (0000-0003-2293-9329)
This article is protected by copyright. All rights reserved
Accepted Article
Vivian Ng (0000-0001-8941-6931)
Elodie Drula (0000-0002-9168-5214)
Bernard Henrissat (0000-0002-3434-8588)
Huei-Mei Hsieh (0000-0002-7142-3209)
Ken Youens-Clark (0000-0001-9961-144X)
François Lutzoni (0000-0003-4849-7143)
Jolanta Miadlikowska (0000-0002-5545-2130)
Daniel C. Eastwood (0000-0002-7015-0739)
Jana M. U’Ren (0000-0001-7608-5029)
This article is protected by copyright. All rights reserved
Accepted Article
SUMMARY
● Although secondary metabolites are typically associated with competitive or pathogenic
interactions, the high bioactivity of endophytic fungi in the Xylariales, coupled with their
abundance and broad host ranges spanning all lineages of land plants and lichens, suggests
that enhanced secondary metabolism might facilitate symbioses with phylogenetically diverse
hosts.
● Here, we examined secondary metabolite gene clusters (SMGCs) across 96 Xylariales
genomes in two clades (Xylariaceae s.l. and Hypoxylaceae), including 88 newly sequenced
genomes of endophytes and closely related saprotrophs and pathogens. We paired genomic
data with extensive metadata on endophyte hosts and substrates, enabling us to examine
genomic factors related to the breadth of symbiotic interactions and ecological roles.
● All genomes contain hyperabundant SMGCs; however, Xylariaceae have increased numbers
of gene duplications, horizontal gene transfers (HGTs), and SMGCs. Enhanced metabolic
diversity of endophytes is associated with a greater diversity of hosts and increased capacity
for lignocellulose decomposition.
● Our results suggest that as host and substrate generalists, Xylariaceae endophytes experience
greater selection to diversify SMGCs compared to more ecologically specialized
Hypoxylaceae species. Overall, our results provide new evidence that SMGCs may facilitate
symbiosis with phylogenetically diverse hosts, highlighting the importance of microbial
symbioses to drive fungal metabolic diversity.
Keywords: Ascomycota, endophyte, plant-fungal interactions, saprotroph, specialized metabolism,
trophic mode, symbiosis, Xylariales
This article is protected by copyright. All rights reserved
Accepted Article
INTRODUCTION
Fungal endophytes inhabit asymptomatic, living photosynthetic tissues of all major lineages of plants
and lichens to form one of earth’s most prevalent groups of symbionts (Arnold et al., 2009; Peay et
al., 2016). Known from a wide range of biomes and agroecosystems (e.g., U’Ren et al., 2012, 2019),
endophytes impact plant health, productivity, and evolution (Rodriguez et al., 2009). Although
classified together due to ecologically similar patterns of colonization, transmission, and in planta
biodiversity (Rodriguez et al., 2009), foliar fungal endophytes represent a diversity of evolutionary
histories, life history strategies, and functional traits (Porras-Alfaro & Bayman, 2011). Despite the
recent surge of interest in plant microbiome research (Trivedi et al., 2020) the genomic and molecular
mechanisms foliar fungal endophytes employ to establish symbiotic host associations remain largely
unknown.
Global, large-scale surveys of phylogenetically diverse plant and lichen hosts have revealed
that many foliar endophyte species preferentially associate with particular host species and lineages,
resulting in host structured endophyte communities at local to global scales (e.g., U’Ren et al., 2019).
In contrast, endophytic fungi in the Xylariales (Sordariomycetes, Pezizomycotina, Ascomycota)
appear unique in that they typically have broad host ranges that span multiple lineages of land plants
(e.g., angiosperms, conifers, lycophytes, ferns, and mosses) as well as green algae and cyanobacteria
within lichen thalli (e.g., Arnold et al., 2009; U’Ren et al., 2016). In contrast, the majority of
described Xylariales species are typically associated with angiosperms as wood- or litter-degrading
saprotrophs or woody pathogens (Hsieh et al., 2005, 2010). Although the genetic factors that
determine foliar endophyte host range are unknown, research on fungal pathogens has shown that host
specificity is often determined by the presence of avirulence proteins (i.e., effectors), proteinaceous
host-specific toxins, and secondary metabolites (SMs) (Li et al., 2020). Horizontal gene transfer
(HGT) of these host-determining genes frequently alters and/or expands pathogen host range (Li et
al., 2020).
Xylariales genomes sequenced to date have revealed a rich repertoire of secondary metabolite
gene clusters (SMGCs) (Wibberg et al., 2021), often exceeding the numbers reported for saprotrophic
fungi well-known for their SM production (Aspergillus, Penicillium) (Nielsen et al., 2017; Drott et
al., 2021). Previously, it was postulated that intense competition with diverse communities of soil
This article is protected by copyright. All rights reserved
Accepted Article
organisms increases selection to maintain and diversify SMGCs (Slot, 2017). However, the high
bioactivity of Xylariales fungi (>500 SMs reported to date; (Becker & Stadler, 2021)), their broad
host ranges as endophytes and ability to persist in leaf litter as saprotrophs that decompose
lignocellulose (U’Ren et al., 2016; U’Ren & Arnold, 2016), led us to hypothesize that enhanced
secondary metabolism might play a role in facilitating ecological generalism in both substrate use and
the phylogenetic breadth of their symbiotic associations with plants and lichens.
To test this hypothesis, we examined the genomic factors associated with endophyte host
range and ecological roles (i.e., endophytic, pathogenic, and saprotrophic) across 96 genomes of
Xylariales, including 88 newly sequenced genomes of endophytes, saprotrophs, and plant pathogens
within two major clades of Xylariales (Hypoxylaceae and Xylariaceae s.l.). We paired genomic data
with extensive metadata on endophyte host associations, geographic distributions, and substrate usage
gleaned from a collection of >6,000 xylarialean endophytes isolated from phylogenetically diverse
plants and lichens across North America (U’Ren et al., 2016), enabling us to examine for the first
time the genomic factors related to the breadth of symbiotic interactions and ecological roles in this
dynamic and ecologically important fungal clade.
MATERIALS AND METHODS
Fungal strain selection. We sequenced genomes of 44 endophytic taxa (U’Ren et al., 2012; U’Ren &
Arnold, 2016) and 44 named taxa of Xylariaceae s.l. and Hypoxylaceae representing ca. 24 genera
and 80 species, as well as an additional two undescribed species of endophytic Xylariales
(Pestalotiopsis sp. NC0098 and Xylariales sp. AK1849) included in the outgroup (Table S1). Isolates
were selected based on their phylogenetic position and ecological mode from (U’Ren et al., 2016)
Although classifying fungal ecological modes broadly as ‘‘endophytic” or ‘‘saprotrophic” based on
the condition of the tissue from which they are cultured is often insufficient to adequately define their
ecological roles, for the purposes of this study, isolates cultured from living host tissues (either plant
or lichen) are referred to as endophytes even if other isolates in the same fungal operational
taxonomic unit (OTU) were found in non-living tissues as well. Isolates were defined as saprotrophs
only if all isolates in the OTU were cultured from non-living plant tissues such as senescent leaves or
leaf litter (U’Ren et al., 2016). To minimize the effect of phylogeny when assessing the impact of
This article is protected by copyright. All rights reserved
Accepted Article
ecological mode on genome evolution, we also selected 15 pairs of closely related sister taxa with
contrasting ecological modes (i.e., endophyte vs. non-endophyte) (U’Ren et al., 2016). For reference
species that lacked host and substrate metadata, ecological modes were estimated based on
information for that species in the literature as described in U’Ren et al. (2016).
DNA and RNA purification. We used two different mycelial growth and cultivation techniques to
obtain DNA for either Illumina or PacBio Single-Molecule Real-Time (SMRT) sequencing. DNA
isolations were performed using modified phenol:chloroform extractions (Supporting Information
Methods S1). RNA was extracted for each isolate with the Ambion Purelink RNA Kit (Thermo Fisher
Scientific, Waltham, MA). DNA and RNA were quantified with a Qubit fluorometer (Invitrogen) and
sample purity was assessed with a NanoDrop (BioNordika). RNA was treated with DNase (Thermo
Fisher Scientific) following the manufacturer’s instructions and RNA integrity was assessed on a
BioAnalyzer at the University of Arizona Genomics Core Facility.
Genome and transcriptome sequencing and assembly. Genomes were generated at the Department
of Energy Joint Genome Institute using Illumina and PacBio technologies (Table S1). For 66 isolates,
Illumina standard shotgun libraries (insert sizes of 300bp or 600bp) were constructed and sequenced
using the NovaSeq platform. Raw reads were filtered using the JGI QC pipeline. An assembly of the
target genome was generated using the resulting non-organelle reads with SPAdes (Bankevich et al.,
2012). PacBio SMRT sequencing was performed for 22 isolates of Xylariaceae and Hypoxylaceae, as
well as Xylariales spp. NC0098, and AK1849 on a PacBio Sequel. Library preparation was performed
either using the PacBio Low Input 10kb or PacBio >10kb with AMPure Bead Size Selection. Filtered
sub-read data were processed with the JGI QC pipeline and de novo assembled using Falcon
(SEQUEL) or Flye (SEQUEL II). Stranded RNASeq libraries were created and quantified by qPCR
and transcriptome sequencing was performed on an Illumina NovaSeq S4. For both Hypoxylaceae
and Xylariaceae ~25% of genomes were sequenced with PacBio, although a higher proportion of
endophyte genomes were sequenced with PacBio than Illumina (43% vs. 28% overall). Genome
completeness was assessed by Benchmarking Universal Single-Copy Orthologs (BUSCO) v2.0 using
the "eukaryota_odb9" (2016-11-02) dataset (https://doi.org/10.1093/bioinformatics/btv351).
This article is protected by copyright. All rights reserved
Accepted Article
Genome annotation. Gene prediction and annotation was performed using the JGI pipeline (Kuo et
al., 2014; Grigoriev et al., 2014) (Supporting Information Methods S1). Predicted genes were
annotated using functional information from InterPro (Mitchell et al., 2019), PFAM (El-Gebali et al.,
2019), Gene Ontology (GO) (The Gene Ontology Consortium, 2019), Kyoto Encyclopedia of Genes
and Genomes (KEGG) (Kanehisa et al., 2006), Eukaryotic Orthologous Groups of Proteins (KOG)
(Tatusov et al., 2003), the Carbohydrate-Active EnZymes database (CAZy) (Lombard et al., 2014),
MEROPS database (Rawlings et al., 2016), the Transporter Classification Database (TCDB) (Saier et
al., 2016), SignalP v3.0a (Nielsen, 2017), and EffectorP 2.0 (Sperschneider et al., 2018). CAZymes
involved in the degradation of the plant cell wall were classified by substrate (Kameshwar et al.,
2019). We examined repetitive elements using RepeatScout (Price et al., 2005), which identifies
novel repeats in the genomes, and RepeatMasker (http://repeatmasker.org), which identifies known
repeats based on the Repbase library (Bao et al., 2015). Candidate effectors were predicted using
EffectorP v2.0 (Sperschneider et al., 2018).
Orthogroup prediction, functional annotation, and ancestral state reconstruction. For comparative
analyses, data from an additional eight taxa in Xylariaceae sensu lato (Wu et al., 2017) and 23
additional genomes of Sordariomycetes were obtained from MycoCosm (Grigoriev et al., 2014)
(Table S1). Orthologous gene families (i.e., orthogroups) for all 121 genomes (ingroup and outgroup)
were inferred by OrthoFinder v2.3.3 (Emms & Kelly, 2019), which was executed using DIAMOND
v0.9.22 (Buchfink et al., 2015) for the all-versus-all sequence similarity search and MAFFT v7.427
(Katoh & Standley, 2013) for sequence alignment. Orthogroups were assigned functional annotations
with KinFin v1.0 (Laetsch & Blaxter, 2017), which performs a representative functional annotation of
the orthogroups based on both the proportion of proteins in the group carrying a specific annotation as
well as the proportion of taxa in the cluster with such annotation. KinFin also was used to perform
network analysis of orthogroups, classify orthogroups and SMGCs into isolate-specific, clade-specific
(Hypoxylaceae and Xylariaceae), and universal (i.e., orthogroups present in all taxa) categories, and
to identify orthogroups that were significantly enriched or depleted in the Xylariaceae or
Hypoxylaceae using the Mann-Whitney U test. We used Count v10.04 (Csurös, 2010) with the
This article is protected by copyright. All rights reserved
Accepted Article
unweighted Wagner parsimony method (gain and loss penalties both set to 1) to assess changes in the
size of orthologous gene families over evolutionary time. Orthogroup annotations were also used to
reconstruct the ancestral gene content for subsets of orthologous gene families corresponding to
different functional categories.
Phylogenomic analysis. Protein sequences of 1,526 single-copy orthogroups defined by OrthoFinder
were aligned using MAFFT v7.427 (Katoh & Standley, 2013), concatenated, and analyzed using
maximum-likelihood in IQ-TREE multicore v1.6.11 (Nguyen et al., 2015) with the Le Gascuel (LG)
substitution model. Node support was calculated with 1,000 ultrafast bootstrap replicates. Additional
phylogenomic analyses with different models of evolution, gene sets, and outgroup taxa resulted in
nearly identical topologies (Supporting Information Methods S1).
Metabolic gene cluster prediction. SMGCs were predicted using antiSMASH version 5.1.0 (Blin et
al., 2019) setting the strictness to 'relaxed' and enabling 'KnownClusterBlast', 'ClusterBlast',
'SubClusterBlast', 'ActiveSiteFinder', 'Cluster Pfam analysis' and 'Pfam-based GO term annotation'.
Clinker and clustermap.js were used to visualize and compare SMGCs (Gilchrist & Chooi, 2021).
Sequence similarity network analysis of the SMGCs was performed using BiG-SCAPE v1.0.1
(Navarro-Muñoz et al., 2020). BiG-SCAPE was executed under the hybrid mode, enabling the
inclusion of singletons and the SMGCs from the MIBiG repository version 1.4 (Medema et al., 2015).
The output from BiG-SCAPE was incorporated into KinFin (Laetsch & Blaxter, 2017) to visualize
gene content similarity as network graphs and examine SMGC distribution across clades.
We used a custom pipeline (https://github.com/egluckthaler/cluster_retrieve) to examine
fungal metabolic gene clusters involved in the degradation of a broad array of plant phenylpropanoids
(Gluck-Thaler et al., 2018) (hereafter, catabolic gene clusters: CGCs). Cluster_retrieve searches for
multiple "cluster models'' containing one of 13 anchor genes (Gluck-Thaler et al., 2018). Homologous
genes in each locus were defined by a minimum BLASTp (v2.2.25+) bitscore of 50, 30% amino acid
identity, and target sequence alignment 50-150% of the query sequence length. Homologs of query
genes were considered clustered if separated by < 7 intervening genes. However, CGCs often share
many gene families among classes, resulting in overlapping and adjacent clusters detected by different
This article is protected by copyright. All rights reserved
Accepted Article
cluster profile searches. As the majority of CGCs have not been functionally characterized, rather than
splitting loci by functional annotation alone, we empirically assessed the spatial distribution of genes
in 25 contigs that contained multiple consolidated cluster predictions. Based on these results, we
selected a gap size of 30kb to define discrete clusters (i.e., clusters on the same contig were
consolidated if separated by less than 30kb). Homologous cluster families across genomes were
inferred using a modified version of BiG-SCAPE (Navarro-Muñoz et al., 2020) (i.e., adding catabolic
anchor genes to “anchor_domains.txt” and manually tuning the “Others” cluster type model
parameters until known related clusters, such as quinate dehydrogenase clusters, merged into
families). Tuning resulted in the values 0.35 for the Jaccard dissimilarity of cluster Pfams, 0.63 for
Pfam sequence similarity, 0.02 adjacency index, and 2.0 anchor boost.
Detection of HGT events. We used the Alien Index (AI) pipeline
(https://github.itap.purdue.edu/jwisecav/wise) (Wisecaver et al., 2016; Verster et al., 2019) to identify
HGT candidate genes. Each predicted protein sequence was queried against a custom protein database
using Diamond v0.9.22.123 (Buchfink et al., 2015). The custom database consisted of protein
sequences from NCBI RefSeq (release 98) (O’Leary et al., 2016), the Marine Microbial Eukaryotic
Transcriptome Sequencing Project (MMETSP) (Keeling et al., 2014), and the 1000 Plants
transcriptome sequencing project (OneKP) (Matasci et al., 2014). Diamond results were sorted based
on the normalized bitscore (nbs), where nbs was calculated as the bitscore of the single best high
scoring segment pair (HSP) in the hit sequence divided by the best bitscore possible for the query
sequence (i.e., the bitscore of the query aligned to itself).
To identify HGT candidates, an ancestral lineage is first specified, and the AI score calculated
using the formula: AI=nbsO-nbsA, where nbsO is the normalized bit score of the best hit to a species
outside of the ancestral lineage and nbsA is the normalized bit score of the best hit to a species within
the ancestral lineage. AI scores range from -1 to 1, being greater than zero if the predicted protein
sequence had a better hit to species outside of the ancestral lineage and can be suggestive of either
HGT or contamination (Wisecaver et al., 2016). To identify HGTs present in multiple species, a
recipient sub-lineage within the larger ancestral lineage may also be specified to identify their shared
HGT candidates (Fig. S1). All hits to the recipient lineage are skipped so as not to be included in the
This article is protected by copyright. All rights reserved
Accepted Article
nbsA calculation. To identify candidate HGTs acquired from distant gene donors (e.g. viruses,
bacteria, or plants) we performed a first AI screen using Ascomycota (NCBI:txid4890) and
Xylariomycetidae (NCBI:txid 222545) as the ancestral and recipient lineages, respectively (Fig. S1).
To identify candidate horizontal transfers of genes predicted by antiSMASH to be in a SMGC from
more closely related donors (e.g., other filamentous fungi), we ran the AI pipeline a second time using
Xylariales (NCBI:txid 37989) as the ancestral lineage and manually curated subclades (see Table S1)
as recipient lineages (see Fig. S1). Genes from both the first (i.e., all genes, distant donors) and
second (i.e., SMGC genes, closely related donors) Genes were considered putative HGT candidates if
they passed the following filters: (i) AI score of > 0, (ii) significant hits to at least 25 sequences in the
custom database, and (iii) at least 50% of top hits to sequences outside of the ancestral lineage.
Candidates from the first AI screen were further validated using phylogenetic analyses
(described below) and designated as either high or low confidence HGT. Full-length proteins
corresponding to the top < 200 hits (E-value < 1 × 10-3) to each AI screen 1 candidate were extracted
from the custom database using esl-sfetch (Eddy, 2009). As our initial query-based trees often lacked
sufficient taxon sampling to assess HGT, we combined all orthogroup sequences with all extracted
top hits to each AI candidate. Sequences were aligned using MAFFT v7.407 using --auto (Katoh &
Standley, 2013) and the number of well aligned columns was determined with trimAL v.1.4. rev15
using its gappyout strategy (Capella-Gutiérrez et al., 2009). Only alignments with ≥50 retained
columns after trimAL were retained for phylogenetic analysis. Phylogenetic trees were constructed
with IQ-TREE v1.6.10 (Nguyen et al., 2015) in a single run with ModelFinder (Kalyaanamoorthy et
al., 2017) and SH-aLRT combined with ultrafast bootstrapping analyses (1,000 replicates each).
Phylogenies were visualized using iTOL v4 (Letunic & Bork, 2019). Each phylogenetic tree was
manually curated to verify HGT with either high or low confidence. High confidence HGT events had
to meet the following criteria: (i) the association between donor and recipient clades was supported by
ultrafast bootstrap >= 95 and (ii) recipient clade consisted of sequences from two or more species. If
the candidate met one of the two criteria, HGT was considered lower confidence.
Statistical analyses. To assess whether genes within different functional categories are associated
with endophytic ecological mode we performed phylogenetically independent contrasts (PICs)
This article is protected by copyright. All rights reserved
Accepted Article
(Felsenstein, 1985) with the function 'brunch' of the package 'caper' version 1.0.1 (Orme et al., 2012)
in R version 3.6.1. All other statistical analyses were done in R version 3.6.1 or JMP version 15.1
(SAS Institute Inc., Cary, NC).
RESULTS AND DISCUSSION
Genomes of 96 Xylariales taxa correspond to the previously recognized family Xylariaceae (Ju &
Rogers, 1996) that was recently split into multiple families (Hypoxylaceae, Xylariaceae,
Graphostromataceae, Barrmaeliaceae (Voglmayr et al., 2018; Wendt et al., 2018) (Fig. 1a; Fig. S2).
Here, we use the term xylarialean to refer to this monophyletic clade within the Xylariales. In
addition, because our analyses revealed seven undescribed endophytic isolates in five distinct clades
(i.e., clades E2, E4, E5, E6, and E6; Fig. S2) nested between the Graphostromataceae and Xylariaceae
sensu stricto, we refer to the sister clade to Hypoxylaceae as Xylariaceae sensu lato (hereafter,
Xylariaceae) following Voglmayr et al. (2018) (Fig 1).
Genome sequencing yielded eukaryotic BUSCO values ≥95% (Table S1). Xylarialean
genomes ranged in size from 33.7-60.3 Mbp (average 43.5 Mbp; Fig. S3; Table S1) and contained ca.
8,000-15,000 predicted genes (mean 11,871; Fig. S3), congruent with average genome and proteome
sizes of other Pezizomycotina (Shen et al., 2020). The percentage of repetitive elements per genome
ranged from <1- 24% (average 1.6%; Table S2), but unlike mycorrhizal fungi (Miyauchi et al., 2020),
repeat content was not corrected with ecological mode (Fig. S3).
Xylariaceae and Hypoxylaceae genomes contain hyperdiverse metabolic gene clusters. To
investigate the diversity and composition of metabolic gene clusters in xylarialean genomes, we used
antiSMASH (Blin et al., 2019) to mine genomes for SMGCs, as well as a custom pipeline to examine
catabolic gene clusters (CGCs) involved in fungal degradation of a broad array of plant
phenylpropanoids (Gluck-Thaler et al., 2018). Across 96 xylarialean genomes we predicted a total of
6,879 putative SMGCs (belonging to 3,313 cluster families) and 973 putative CGCs (belonging to 190
cluster families) (Tables S3 and S4). In comparison, recent large-scale analyses predicted 3,399
SMGCs (in 719 cluster families) across 101 Dothideomycetes genomes (Gluck-Thaler et al., 2020)
and 1,110 CGCs across 341 fungal genomes (Gluck-Thaler & Slot, 2018). Only 25% of predicted
This article is protected by copyright. All rights reserved
Accepted Article
SMGCs (n = 1,711 belonging to 816 cluster families) had BLAST hits to 168 unique MIBiG
(Medema et al., 2015) accession numbers (Table S3).
Total SMGCs diversity in the Xylariaceae and Hypoxylaceae is reflected in a high number of
SMGCs per genome: the average number of SMGCs per genome was 71.2 (median 68), which is
significantly higher than the average for other fungi in the Pezizomycotina (mean 42.8; Fig. 1b). At
least eight xylarialean genomes contained more than 100 predicted SMGCs, with a maximum of 119
in Anthostoma avocetta NRRL 3190 (Fig. 1b; Table S3). In comparison, a recent study of 24 species
of Penicillium found an average of 54.9 SMGCs per genome, with a maximum number of 78 SMGCs
observed in P. polonicum (Nielsen et al., 2017). Genomes of Xylariaceae and Hypoxylaceae
contained on average 3.3X more CGCs per genome (average 10.1; Table S4) compared to other
genomes of Pezizomycotina (average 3.0 (Gluck-Thaler et al., 2018)).
Every xylarialean genome contained SMGCs for the production of polyketides (PK; 2,871
total), nonribosomal peptides (NRP; 2,482 total), and terpenes (1,322 total; Fig. 1b; Table S3).
SMGCs for ribosomally synthesized and post-translationally modified peptides (RiPPs) and hybrid
NRP-PK compounds occurred less frequently (Fig. 1b). The most widely distributed and abundant
CGCs were pterocarpan hydroxylases (n = 93), putatively involved in isoflavonoid metabolism (Fig.
1d,e; Table S5). CGCs involved in the breakdown of plant salicylic acid (Ambrose et al., 2015) (n =
251 salicylate hydroxylases) and plant flavonoids (n = 170 naringenin 3−dioxygenases) also were
abundant (Fig. 1d,e). CGCs classified into nine other categories (e.g., phenol 2-monooxygenase,
quinate dehydrogenase) (Gluck-Thaler et al., 2018) occurred more rarely (Table S4). Vanillyl alcohol
oxidases, which were previously shown to be enriched in genomes of soil saprotrophs (Gluck-Thaler
et al., 2018), were absent in xylarialean genomes.
Consistent with the hyperdiversity of SMGCs in the Hypoxylaceae and Xylariaceae, we
observed that only ca. 10% of SMGCs were shared among genomes from both Xylariaceae and
Hypoxylaceae (Fig. 1c), and no SMGCs were universally present in both clades (Table S3). On
average, 21.4% and 28.2% of SMGCs per genome were unique to either a taxon in the Hypoxylaceae
or the Xylariaceae, respectively (range 0-82%; Fig. 1c; Table S4), but no SMGCs were universally
present within either clade. For most isolates, the majority of SMGCs were unique (i.e., 'isolate
specific'; Fig. 1c). Isolate specific SMGCs represented an average of 36.6% (SD ± 21.1) of the
This article is protected by copyright. All rights reserved
Accepted Article
clusters per genome (range 0-85.7%; Fig. 1c). Even when multiple isolates of the same species were
compared (e.g., Nemania serpens clade) 30-41% of the SMGCs appeared specific to a single isolate
(Fig 1b; Table S3), similar to intraspecific SMGC variation in Aspergillus flavus (Drott et al., 2021).
Impact of HGT on xylarialean genome evolution. To assess the role of HGT in shaping the genome
evolution of Xylariaceae and Hypoxylaceae we performed two Alien Index (AI) analyses (Alexander
et al., 2016; Wisecaver et al., 2016; Gonçalves et al., 2018). The first AI screen—designed to detect
candidate HGTs from more distantly related donor lineages (e.g., bacteria, plants)—flagged 4,262
genes representing 647 orthogroups (Table S5). Using a custom phylogenetic pipeline (see Methods)
we manually validated 168 of these genes as likely HGT events to Xylariaceae and Hypoxylaceae.
Based on branch support and the presence of multiple xylarialean taxa in the recipient clade, we
deemed 92 of these genes as high-confidence HGTs and the remaining 76 as lower confidence HGTs
(Fig. 2; Table S5). Similar to previous studies (Marcet-Houben & Gabaldón, 2010; Lawrence et al.,
2011), the majority of high-confidence HGTs are predicted to have been acquired from bacteria (n =
86) (Fig. 2). Overall, 66% of genes identified as HGT from bacterial donors do not contain introns
(compared to 6% of genes across 121 genomes). Other donor lineages include viruses (n = 3),
Basidiomycota (n = 2), and plants (n = 1) (Fig. 2; Table S5). On average, xylarialean genomes had
16.2 high-confidence HGT events per genome (range: 7-30; Table S5). The highest number of highconfidence HGT events per genome occurred in the genome of Xylaria flabelliformis CBS 123580 (n
= 30).
HGT candidate genes were typically distributed across taxa in numerous diverse clades (n =
85 of 92 genes) rather than in monophyletic clades (Fig. 2). For example, an Enoyl-acyl carrier
protein reductase protein (EC 1.3.1.9)—a key enzyme of the type II fatty acid synthesis (FAS) system
(Massengo-Tiassé & Cronan, 2009)—occurred in bacteria (putative donor) and four distantly related
recipient taxa: Xylariales sp. PMI 506, Hypoxylon rubiginosum ER1909; H. cercidicola CBS 119009;
H. fuscum CBS 119018 (HGT0001; Table S5). Multiple evolutionary scenarios could result in patchy
taxonomic distributions. For example, multiple fungi could have independently acquired the same
gene from closely related bacterial donors (Marcet-Houben & Gabaldón, 2010). Alternatively, an
initial HGT from bacteria to fungi may have been followed by fungal-fungal HGTs. In total, 38 HGT
This article is protected by copyright. All rights reserved
Accepted Article
candidate genes occurred in genomes of both Sordariomycetes outgroup and Xylariales genomes, 28
were found in only Xylariales genomes, and 26 were only observed in genomes of Xylariaceae and
Hypoxylaceae (Fig. 2; Table S5).
Functional annotation revealed the majority of candidate HGT genes were associated with at
least one type of annotation (i.e., 95% of the highly confident and 82% of the ambiguous events;
Table S5). Six high-confidence HGT candidate genes were annotated as CAZymes, including three
predicted plant cell wall degrading enzymes (PCWDEs) transferred from bacteria to diverse
Xylariales (Fig. 2). No genes predicted in CGCs were identified as candidate HGTs, consistent with
convergent evolution to result in similar clustering of fungal phenolic metabolism genes (GluckThaler et al., 2018). However, 43% of candidate HGT genes were predicted to be part of a SMGC
(i.e., 40 of 92) (Fig. 2; Tables S3 and S5). These include 13 genes predicted to have a biosynthetic
function, such as a putative FsC-acetyl coenzyme A-N2-transacetylase (HGT076; Table S5), which is
part of the siderophore biosynthetic pathway in Aspergillus implicated in fungal virulence (Blatzer et
al., 2011).
Due to the high prevalence of HGT among genes predicted to be part of SMGCs, we
performed a second AI screen to detect intra-fungal HGT events of genes within the boundaries of
SMGCs (n = 93,066 genes) (see Methods; Fig. S1). AI identified 1,148 genes in 660 SMGCs
(belonging to 594 cluster families) that were putatively transferred from other fungi to members of the
Xylariales (Table S5). Candidate HGT genes were primarily for polyketide and nonribosomal peptide
production (518 PKSs, 270 NRPSs, and 180 PKS-NRPS hybrid clusters). In addition, >75% of hits to
MIBiG contain genes identified by AI analyses as putative HGTs (see Fig. S4, bottom). SMGCs with
HGT candidate genes include those with 100% similarity to MIBiG accessions from Aspergillus,
Fusarium, and Parastagonospora involved in mycotoxin (e.g., cyclopiazonic acid, alternariol,
fusarin) and antimicrobial compound (asperlactone, koraiol) production, and clusters from Alternaria
that produce host-selective toxins (e.g., ACT-Toxin II) (Tables S3 and S5). Although the second AI
analysis did not flag every gene in these clusters as potential HGTs (e.g., only 4 of the 19 genes in the
alternariol cluster from Hypoxylon cercidicola CBS 119009 were HGT candidates based on AI; Table
S5) and we were not able to further validate candidates based on the same criteria used for high-
This article is protected by copyright. All rights reserved
Accepted Article
confidence HGT, the phylogenetic distribution of many of these SMGCs across Xylariales is
consistent with the acquisition of SMGCs via HGT (Fig. S4).
In addition to the AI screen for HGT candidates, we identified additional putative HGTs of
SMGCs to Xylariaceae and Hypoxylaceae based on their (i) high similarity to fungal MIBiG
accessions from distantly related fungi, and (ii) discontinuous phylogenetic distributions (Fig. S4).
Putative HGT of SMGCs included xylarialean SMGCs with >70% similarity to clusters for ergoline
alkaloids and their precursors (e.g., loline, ergovaline, and lysergic acid production) produced by
Clavicipitaceae endophytes, as well as the phytotoxin cichorine cluster from Aspergillus (Fig. S4;
Table S3). The griseofulvin cluster from Penicillium aethiopicum, which produces a potent antifungal
compound (Chooi et al., 2010), also appears horizontally transferred to the clade containing X.
castorea and X. flabelliformis isolates (Figs. S4-S5). Although the discontinuous phylogenetic
distributions of SMGCs observed here may represent unequal gene loss across taxa (Slot, 2017;
Rokas et al., 2018), the presence of entire clusters known from Eurotiomycetes and Sordariomycetes
in multiple endophytic and non-endophytic taxa provides additional support for HGTs. Overall, our
first AI analysis provides the highest support for HGTs primarily from distantly related hosts such as
bacteria (Fig. 2) (see also (Marcet-Houben & Gabaldón, 2010), yet our second AI screen and
comparisons of SMGCs to MiBIG within our phylogenomic framework also support fungal-fungal
HGT as an important mechanism of metabolic innovation in the Xylariales, similar to pathogenic
fungi (Qiu et al., 2016).
Expansion of Xylariaceae genomes due to increased gene duplication and HGTs. Despite the close
evolutionary relationship and similar ecological niches of taxa in the Xylariaceae and Hypoxylaceae,
genomes of Xylariaceae were on average ca. 7.2 Mbp larger than genomes of Hypoxylaceae (Fig. 3a;
Table S6). Larger genome size was associated with higher repeat content: Xylariaceae contained an
average of 2-fold more repetitive elements (Fig. 3b; Table S6) and had a higher density of repetitive
elements surrounding genes (including effectors and genes identified as HGT candidates) compared to
Hypoxylaceae genomes (Fig. S6).
In addition to greater repeat content, Xylariaceae genomes also contained on average 750
more protein-coding genes compared to Hypoxylaceae (P<0.0001; Table S6). Ancestral state
This article is protected by copyright. All rights reserved
Accepted Article
reconstructions reveal that Xylariaceae genomes have experienced significantly more gene gains (n =
472), gene duplication events (n = 136), orthogroup gains (n = 313), and orthogroup expansion events
(n = 90) compared to Hypoxylaceae clade since the radiation from their last common ancestor (Fig.
3c-d), although both clades underwent similar numbers of gene losses (t95 = 0.51, P=0.61; Table S6).
Xylariaceae genomes also experienced on average ca. 2-fold more HGTs events compared to
Hypoxylaceae genomes (Fig. 3e).
Increased genome sizes resulting from HGTs were positively associated with increased
numbers of SMGCs across both clades (Fig. 3f), reflecting the fact that clustered metabolite genes in
fungi are more likely to undergo HGT compared to unclustered genes (Wisecaver et al., 2014).
Genomes of Xylariaceae contained on average ca. 20 more SMGCs than Hypoxylaceae genomes
(Table S6) and ca. 2-fold greater cumulative richness of SMGCs compared to Hypoxylaceae clade
(2,336 vs. 1,075 total; 587 vs. 282 non-singleton). Rarefaction analysis reveals the richness of
SMGCs increases at a greater rate in the Xylariaceae clade (Fig. S7). Genomes of Xylariaceae also
contained a greater fraction of isolate specific SMGCs compared to Hypoxylaceae, regardless of
SMGC type (Xylariaceae: 31.2 ± 16.1; Hypoxylaceae: 19.8 ± 15.3; P = 0.0007; Fig. 1c; Fig. S8). Yet
despite the high variation of SMGCs among taxa, network analysis illustrates that the composition of
SMGCs is more similar among isolates from the same clade, regardless of ecological mode (Fig. S9).
In contrast to the pattern observed for SMGCs, genomes of Hypoxylaceae contained a greater
number of CGCs than Xylariaceae genomes (Xylariaceae: 9.5 ± 0.4; Hypoxylaceae: 11.0 ± 0.4; P =
0.0068; Table S4) and different classes of CGC dominated the two clades (Fig. 1d,e). For example,
salicylate hydroxylases were the most abundant CGCs among Hypoxylaceae, but were absent from
25% Xylariaceae genomes (Fig. 1d). Four types of CGCs were universally present across
Hypoxylaceae: salicylate hydroxylase, pterocarpan hydroxylase, naringenin 3-dioxygenase, phenol 2monooxygenase (Fig. 1d). CGCs classified as pterocarpan hydroxylases were the most abundant CGC
type in genomes of Xylariaceae (Fig. 1d), but were not found in all Xylariacaee genomes. Only CGCs
classified as naringenin 3-dioxygenases were found across all Xylariaceae genomes.
In addition to distinct metabolic gene cluster content and prevalence of HGT, comparison of
gene ontology (GO) terms for shared orthogroups significantly enriched in either Xylariaceae or
Hypoxylaceae (i.e., 74 and 26, respectively) revealed that the Hypoxylaceae had a significant increase
This article is protected by copyright. All rights reserved
Accepted Article
in the number of GO terms associated with membrane transport, whereas Xylariaceae had a
significant increase in the number of GO terms for catalytic activities and binding (Fig. S10).
Xylariaceae genomes also contained greater numbers of genes with signaling peptides, as well as
genes annotated as effectors, membrane transport proteins, transcription factors, peptidases, and
CAZymes compared to Hypoxylaceae, even after accounting for differences in genome size (Table
S6). On average genomes of Xylariaceae contained ca. 50 more CAZymes than Hypoxylaceae
(Xylariaceae 579.9 ± 7.7; Hypoxylaceae 529.6 ± 9.1, P <0.0001), including a significant increase in
PCWDEs involved in the degradation of cellulose, hemicellulose, lignin, pectin, and starch (Table
S6).
As genomes of fungi with saprotrophic lifestyles typically contain more CAZymes and
PCWDEs compared to plant pathogens and mycorrhizal symbionts (Knapp et al., 2018; Haridas et al.,
2020; Miyauchi et al., 2020), our genomic results are consistent with the potential for Xylariaceae
fungi (including endophytes) to have greater saprotrophic abilities compared to Hypoxylaceae fungi
(Osono, 2006). To test this prediction, we compared the abilities of 20 isolates to degrade leaves of
Pinus and Quercus. Regardless of trophic mode, isolates of Xylariaceae with expanded CAZymes and
PCWDEs repertoires caused greater mass loss compared to taxa with fewer genes predicted to
degrade lignocellulose (i.e., Hypoxylaceae and Xylariaceae from animal-dung clade; Fig. S11). In
addition to increased capacity for lignocellulose degradation, Xylariaceae endophyte species associate
with a greater phylogenetic diversity of plant and lichen hosts compared to species of Hypoxylaceae
endophytes (t42 = 2.25; P = 0.0294; Fig. 3g). Host breadth of Xylariaceae endophytes also is
positively associated with the number of total HGT events (r = 0.43, P = 0.0193), as well as the
number of peptidases (r = 0.37, P = 0.0444) and nonribosomal peptide (NRP) SMGCs (Fig. 3h).
Genomic differences between endophytic and non-endophytic fungi. Both culture-based and culturefree studies of healthy photosynthetic tissues of plants and lichens demonstrate the abundance and
novel diversity represented by xylarialean endophytes (U’Ren et al., 2016). However, some
endophytes can occur in both living host tissues as well as decomposing plant materials (Okane et al.,
2008; U’Ren et al., 2016; U’Ren & Arnold, 2016) and often are closely related to described species of
saprotrophs and pathogens (U’Ren et al., 2016). This suggests that for some species, endophytism
This article is protected by copyright. All rights reserved
Accepted Article
may represent only part of a complex life cycle that blurs the lines between distinct ecological modes
(U’Ren et al., 2016; Chen et al., 2018) and few genomic signatures may be associated with the
evolution of endophytism in the Xylariaceae and Hypoxylaceae.
Overall, when we analyzed all ingroup genomes we observed no clear distinctions in genome
size or content due to different ecological modes, even after taking phylogeny into account (Table
S6). One exception was the reduced genomes and CAZyme content of termite-associated Xylaria spp.
(i.e., X. nigripes YMJ 653, X. sp. CBS 124048, and X. intraflava YMJ725; Figs. S3 and S12) that
reflects a single evolutionary transition to specialization on termite nest substrates decomposed by a
basidiomycete fungus (Hsieh et al., 2010). However, as evolutionary distance among taxa can impede
detection of finer-scale genomic differences due to ecological mode (Harrington et al., 2019), we
restricted our analyses to comparisons of 15 pairs of sister taxa across both clades with contrasting
ecological modes. These pairwise comparisons revealed that endophytic Hypoxylaceae genomes
contain significantly fewer genes with signaling peptides, protein coding genes, transporters,
peptidases, PCWDEs (especially those involved in decomposition of cellulose and lignin), SMGCs,
and CGCs compared to non-endophytes (Fig. 4). Yet, similar to the lack of reduced genome
repertoires in some root endophytes (Xu et al., 2015; Lahrmann et al., 2015), no significant
differences in genomic content were observed between paired endophytes and non-endophytes in the
Xylariaceae clade (Fig. 4; Table S6).
These results suggest that compared to endophytes and saprotrophs in the Hypoxylaceae,
Xylariaceae taxa may have less distinct ecological modes, and their increased metabolic versatility
may be the result of selection maintaining diverse genes for both endophytism and saprotrophy. As
saprotrophs, fungi experience strong selection to maintain highly diverse SMGCs that increase
competitive abilities in diverse microbial communities (Richards & Talbot, 2013; Rokas et al., 2018;
Naranjo‐Ortiz & Gabaldón, 2020), as well as large gene repertoires to degrade lignocellulosic
compounds (Haridas et al., 2020). Accordingly, we observed that in genomes of non-endophytic
Xylariaceae and Hypoxylaceae SMGC abundance is positively correlated with the number of genes
important for saprotrophy (e.g., CAZymes, transporters) and putative pathogenicity (e.g., signaling
peptides, effectors, peptidases), even after accounting for differences among clades and genome sizes
(Fig. 5; Table S6). In contrast, we found that endophyte SMGC abundance was decoupled from the
This article is protected by copyright. All rights reserved
Accepted Article
majority of genomic factors involved in plant-fungal interactions (Fig. 5), due in part to fewer
numbers of CAZymes, transports, and peptidases annotated in SMGCs (Table S6). These results are
consistent with different selection pressures and ecological roles of SMGCs in endophytic and nonendophytic fungi and highlight the importance of phylogenetically informed comparisons to detect
genomic differences associated with endophytism, as well as complexity of linking genotype to
phenotype for complex traits, especially in dynamic genomes undergoing frequent HGT.
CONCLUSIONS
Our analysis of 96 phylogenetically and ecologically diverse Xylariaceae and Hypoxylaceae genomes
reveals that gene duplication, gene family expansion, and HGT of SMGCs, effectors, and peptidases
from putative bacterial and fungal donors drives metabolic versatility in the Xylariaceae. Expanded
metabolic diversity and secondary metabolism of Xylariaceae taxa is associated with greater
ecological generalism in both substrate usage and the phylogenetic breadth of symbiotic associations
compared to Hypoxylaceae taxa. Correlations between endophyte host breadth, HGT, and abundance
of NRPs also indicate that SMGCs may play a key role in facilitating xylarialean endophyte
colonization of diverse hosts. For example, although NRPs are known for their role as virulence
factors of phytopathogenic fungi (e.g., host-selective toxins or siderophores) (Oide & Turgeon, 2020),
previous research has shown that an NRPS is essential for the endophyte Neotyphodium/Epichloë to
establish symbiosis with its host (Johnson et al., 2007). Overall, our results highlight the importance
of plant-fungal symbioses to drive not only fungal speciation and ecological diversification (Joy,
2013), but vast chemical biodiversity that can be leveraged for novel pharmaceuticals and
agrochemicals (Becker & Stadler, 2021; Robey et al., 2021).
ACKNOWLEDGEMENTS
Funding for the project was provided by the DOE JGI Large-scale Community Science Project (Grant
number 503506 to JMU, JHW, AEA). MEEF was funded by the Office for Research, Innovation and
Impact at the University of Arizona and the University of Arizona BIO5 Postdoctoral Fellowship
Program. FL and JM received financial support from NSF DEB-1541548 and DEB-1046065, and
AEA received support from NSF DEB-1541496 and DEB-1045766. These awards and NSF DEB-
This article is protected by copyright. All rights reserved
Accepted Article
0640996 to AEA and DEB-1010675 to AEA and JMU supported the initial collections of endophytes.
We thank F. Martin, P. Gladieux, J. Spatafora, R. Vilgalys, and K. O`Donnell for permission to use
unpublished JGI F1000 genomes; D. Bellomo, Y. Sanchez-Rosario, and S. Valdez for laboratory
assistance; and the Genomics Analysis and Sequencing Core (GATC), the Arizona Genomics Institute
(AGI), and the High-Performance Computer (HPC) at the University of Arizona for technical support.
The authors declare no competing interests.
AUTHOR CONTRIBUTIONS
Designed research: JMU, JHW, AEA, MEEF; Performed field or laboratory research: JMU, LPM,
YMJ, HMH, AEA, FL, JM; Contributed fungal isolates: YMJ, HMH, AEA, DCE, RCH; Genome and
transcriptome sequencing, assembly, annotation: SA, SJM, AK, RH, SH, BA, RR, K. LaButti, JP, AL,
MA, JY, CA, KK, VN, IVG; Metabolomics: K. Louie, TN. Contributed analytic tools: JHW, JCS,
KYC; Analyzed data: MEEF, JMU, JHW, SA, KEE, KS, ZK, ED, BH; Wrote the paper: MEEF,
JMU, JHW, with contributions from AEA, JCS, FL, JM, IVG, SA.
DATA AVAILABILITY
Raw sequence data, assembled sequences, and genome annotations are available through the
corresponding MycoCosm portal (https://mycocosm.jgi.doe.gov/). NCBI accession numbers are listed
in Table S1. All other data, including Supporting Data Note S1, can be found in FigShare Repository
(DOI 10.6084/m9.figshare.c.5314025).
This article is protected by copyright. All rights reserved
Accepted Article
REFERENCES
Alexander WG, Wisecaver JH, Rokas A, Hittinger CT. 2016. Horizontally acquired genes in
early-diverging pathogenic fungi enable the use of host nucleosides and nucleotides. Proceedings of
the National Academy of Sciences of the United States of America 113: 4116–4121.
Ambrose KV, Tian Z, Wang Y, Smith J, Zylstra G, Huang B, Belanger FC. 2015. Functional
characterization of salicylate hydroxylase from the fungal endophyte Epichloë festucae. Scientific
Reports 5: 10939.
Arnold AE, Miadlikowska J, Higgins KL, Sarvate SD, Gugger P, Way A, Hofstetter V, Kauff F,
Lutzoni F. 2009. A phylogenetic estimation of trophic transition networks for ascomycetous fungi:
are lichens cradles of symbiotrophic fungal diversification? Systematic Biology 58: 283–297.
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko
SI, Pham S, Prjibelski AD, et al. 2012. SPAdes: a new genome assembly algorithm and its
applications to single-cell sequencing. Journal of Computational Biology 19: 455–477.
Bao W, Kojima KK, Kohany O. 2015. Repbase Update, a database of repetitive elements in
eukaryotic genomes. Mobile DNA 6: 11.
Becker K, Stadler M. 2021. Recent progress in biodiversity research on the Xylariales and their
secondary metabolism. The Journal of Antibiotics 74: 1–23.
Blatzer M, Schrettl M, Sarg B, Lindner HH, Pfaller K, Haas H. 2011. SidL, an Aspergillus
fumigatus transacetylase involved in biosynthesis of the siderophores ferricrocin and
hydroxyferricrocin. Applied and Environmental Microbiology 77: 4959–4966.
Blin K, Shaw S, Steinke K, Villebro R, Ziemert N, Lee SY, Medema MH, Weber T. 2019.
antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic Acids
Research 47: W81–W87.
Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND.
Nature Methods 12: 59–60.
This article is protected by copyright. All rights reserved
Accepted Article
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973.
Chen K-H, Liao H-L, Arnold AE, Bonito G, Lutzoni F. 2018. RNA-based analyses reveal fungal
communities structured by a senescence gradient in the moss Dicranum scoparium and the presence
of putative multi-trophic fungi. New Phytologist 218: 1597–1611.
Chooi Y-H, Cacho R, Tang Y. 2010. Identification of the viridicatumtoxin and griseofulvin gene
clusters from Penicillium aethiopicum. Chemistry & Biology 17: 483–494.
Csurös M. 2010. Count: evolutionary analysis of phylogenetic profiles with parsimony and
likelihood. Bioinformatics 26: 1910–1912.
Drott MT, Rush TA, Satterlee TR, Giannone RJ, Abraham PE, Greco C, Venkatesh N, Skerker
JM, Glass NL, Labbé JL, et al. 2021. Microevolution in the pansecondary metabolome of
Aspergillus flavus and its potential macroevolutionary implications for filamentous fungi.
Proceedings of the National Academy of Sciences of the United States of America 118: e2021683118.
Eddy SR. 2009. A new generation of homology search tools based on probabilistic inference.
Genome informatics 23: 205–211.
El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ,
Salazar GA, Smart A, et al. 2019. The Pfam protein families database in 2019. Nucleic Acids
Research 47: D427–D432.
Emms DM, Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics.
Genome Biology 20: 238.
Felsenstein J. 1985. Phylogenies and the comparative method. The American Naturalist 125: 1–15.
Gilchrist CLM, Chooi Y-H. 2021. clinker & clustermap.js: Automatic generation of gene cluster
comparison figures. Cold Spring Harbor Laboratory: 37: 2473–2475.
Gluck-Thaler E, Haridas S, Binder M, Grigoriev IV, Crous PW, Spatafora JW, Bushley K, Slot
This article is protected by copyright. All rights reserved
Accepted Article
JC. 2020. The architecture of metabolism maximizes biosynthetic diversity in the largest class of
fungi. Molecular Biology and Evolution 37: 2838–2856.
Gluck-Thaler E, Slot JC. 2018. Specialized plant biochemistry drives gene clustering in fungi. The
ISME Journal 12: 1694–1705.
Gluck-Thaler E, Vijayakumar V, Slot JC. 2018. Fungal adaptation to plant defences through
convergent assembly of metabolic modules. Molecular Ecology 27: 5120–5136.
Gonçalves C, Wisecaver JH, Kominek J, Oom MS, Leandro MJ, Shen X-X, Opulente DA, Zhou
X, Peris D, Kurtzman CP, et al. 2018. Evidence for loss and reacquisition of alcoholic fermentation
in a fructophilic yeast lineage. eLife 7: e33034.
Grigoriev IV, Nikitin R, Haridas S, Kuo A, Ohm R, Otillar R, Riley R, Salamov A, Zhao X,
Korzeniewski F, et al. 2014. MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids
Research 42: D699–704.
Haridas S, Albert R, Binder M, Bloem J, LaButti K, Salamov A, Andreopoulos B, Baker SE,
Barry K, Bills G, et al. 2020. 101 Dothideomycetes genomes: A test case for predicting lifestyles and
emergence of pathogens. Studies in Mycology 96: 141–153.
Harrington AH, Olmo-Ruiz M del, U’Ren JM, Garcia K, Pignatta D, Wespe N, Sandberg DC,
Huang Y-L, Hoffman MT, Arnold AE. 2019. Coniochaeta endophytica sp. nov., a foliar endophyte
associated with healthy photosynthetic tissue of Platycladus orientalis (Cupressaceae). Plant and
Fungal Systematics 64: 65–79.
Hsieh H-M, Ju Y-M, Rogers JD. 2005. Molecular phylogeny of Hypoxylon and closely related
genera. Mycologia 97: 844–865.
Hsieh H-M, Lin C-R, Fang M-J, Rogers JD, Fournier J, Lechat C, Ju Y-M. 2010. Phylogenetic
status of Xylaria subgenus Pseudoxylaria among taxa of the subfamily Xylarioideae (Xylariaceae)
and phylogeny of the taxa involved in the subfamily. Molecular Phylogenetics and Evolution 54:
957–969.
This article is protected by copyright. All rights reserved
Accepted Article
Johnson R, Voisey C, Johnson L, Pratt J, Fleetwood D, Khan A, Bryan G. 2007. Distribution of
NRPS gene families within the Neotyphodium/Epichloë complex. Fungal Genetics and Biology 44:
1180–1190.
Joy JB. 2013. Symbiosis catalyses niche expansion and diversification. Proceedings of the Royal
Society B: Biological Sciences 280: 20122820.
Ju YM, Rogers JD. 1996. A revision of the genus Hypoxylon. Mycologia Memoir no. 20. St. Paul,
MN: APS Press.
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. ModelFinder: fast
model selection for accurate phylogenetic estimates. Nature Methods 14: 587–589.
Kameshwar AKS, Ramos LP, Qin W. 2019. CAZymes-based ranking of fungi (CBRF): an
interactive web database for identifying fungi with extrinsic plant biomass degrading abilities.
Bioresources and Bioprocessing 6: 51.
Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki
M, Hirakawa M. 2006. From genomics to chemical genomics: new developments in KEGG. Nucleic
Acids Research 34: D354–7.
Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7:
improvements in performance and usability. Molecular Biology and Evolution 30: 772–780.
Keeling PJ, Burki F, Wilcox HM, Allam B, Allen EE, Amaral-Zettler LA, Armbrust EV,
Archibald JM, Bharti AK, Bell CJ, et al. 2014. The Marine Microbial Eukaryote Transcriptome
Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans
through transcriptome sequencing. PLoS Biology 12: e1001889.
Knapp DG, Németh JB, Barry K, Hainaut M, Henrissat B, Johnson J, Kuo A, Lim JHP, Lipzen
A, Nolan M, et al. 2018. Comparative genomics provides insights into the lifestyle and reveals
functional heterogeneity of dark septate endophytic fungi. Scientific Reports 8: 6321.
Kuo A, Bushnell B, Grigoriev IV. 2014. Fungal genomics: sequencing and annotation. Advances in
This article is protected by copyright. All rights reserved
Accepted Article
Botanical Research 70: 1–52.
Laetsch DR, Blaxter ML. 2017. KinFin: software for taxon-aware analysis of clustered protein
sequences. G3 7: 3349–3357.
Lahrmann U, Strehmel N, Langen G, Frerigmann H, Leson L, Ding Y, Scheel D, Herklotz S,
Hilbert M, Zuccaro A. 2015. Mutualistic root endophytism is not associated with the reduction of
saprotrophic traits and requires a noncompromised plant innate immunity. New Phytologist 207: 841–
857.
Lawrence DP, Kroken S, Pryor BM, Arnold AE. 2011. Interkingdom gene transfer of a hybrid
NPS/PKS from bacteria to filamentous Ascomycota. PloS one 6: e28231.
Letunic I, Bork P. 2019. Interactive Tree Of Life (iTOL) v4: recent updates and new developments.
Nature Acids Research 47: W256–W259.
Li J, Cornelissen B, Rep M. 2020. Host-specificity factors in plant pathogenic fungi. Fungal
genetics and Biology: FG & B 144: 103447.
Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. 2014. The carbohydrateactive enzymes database (CAZy) in 2013. Nucleic Acids Research 42: D490–5.
Marcet-Houben M, Gabaldón T. 2010. Acquisition of prokaryotic genes by fungal genomes. Trends
in Genetics: TIG 26: 5–8.
Massengo-Tiassé RP, Cronan JE. 2009. Diversity in enoyl-acyl carrier protein reductases. Cellular
and Molecular Life Sciences: CMLS 66: 1507–1517.
Matasci N, Hung L-H, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, Nguyen N, Warnow T,
Ayyampalayam S, Barker M, et al. 2014. Data access for the 1,000 Plants (1KP) project.
GigaScience 3: 17.
Medema MH, Kottmann R, Yilmaz P, Cummings M, Biggins JB, Blin K, de Bruijn I, Chooi
YH, Claesen J, Coates RC, et al. 2015. Minimum Information about a Biosynthetic Gene cluster.
This article is protected by copyright. All rights reserved
Accepted Article
Nature Chemical Biology 11: 625–631.
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang H-Y, ElGebali S, Fraser MI, et al. 2019. InterPro in 2019: improving coverage, classification and access to
protein sequence annotations. Nucleic Acids Research 47: D351–D360.
Miyauchi S, Kiss E, Kuo A, Drula E, Kohler A, Sánchez-García M, Morin E, Andreopoulos B,
Barry KW, Bonito G, et al. 2020. Large-scale genome sequencing of mycorrhizal fungi provides
insights into the early evolution of symbiotic traits. Nature Communications 11: 5125.
Naranjo‐Ortiz MA, Gabaldón T. 2020. Fungal evolution: cellular, genomic and metabolic
complexity. Biological Reviews. 95: 1198–1232.
Navarro-Muñoz JC, Selem-Mojica N, Mullowney MW, Kautsar SA, Tryon JH, Parkinson EI,
De Los Santos ELC, Yeong M, Cruz-Morales P, Abubucker S, et al. 2020. A computational
framework to explore large-scale biosynthetic diversity. Nature Chemical Biology 16: 60–68.
Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective
stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and
Evolution 32: 268–274.
Nielsen H. 2017. Predicting secretory proteins with SignalP. Methods in Molecular Biology 1611:
59–73.
Nielsen JC, Grijseels S, Prigent S, Ji B, Dainat J, Nielsen KF, Frisvad JC, Workman M, Nielsen
J. 2017. Global analysis of biosynthetic gene clusters reveals vast potential of secondary metabolite
production in Penicillium species. Nature Microbiology 2: 17044.
Oide S, Turgeon BG. 2020. Natural roles of nonribosomal peptide metabolites in fungi. Mycoscience
61: 101–110.
Okane I, Toyama K, Nakagiri A, Suzuki K-I, Srikitikulchai P, Sivichai S, Hywel-Jones N,
Potacharoen W, Læssøe T. 2008. Study of endophytic Xylariaceae in Thailand: diversity and
taxonomy inferred from rDNA sequence analyses with saprobes forming fruit bodies in the field.
This article is protected by copyright. All rights reserved
Accepted Article
Mycoscience 49: 359–372.
O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B,
Smith-White B, Ako-Adjei D, et al. 2016. Reference sequence (RefSeq) database at NCBI: current
status, taxonomic expansion, and functional annotation. Nucleic Acids Research 44: D733–45.
Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N, Pearse W, Orme MD. 2012
caper: Comparative Analyses of Phylogenetics and Evolution in R. R package version 1.0.1.
https://CRAN.R-project.org/package=caper.
Osono T. 2006. Role of phyllosphere fungi of forest trees in the development of decomposer fungal
communities and decomposition processes of leaf litter. Canadian Journal of Microbiology 52: 701–
716.
Peay KG, Kennedy PG, Talbot JM. 2016. Dimensions of biodiversity in the Earth mycobiome.
Nature Reviews Microbiology 14: 434–447.
Porras-Alfaro A, Bayman P. 2011. Hidden fungi, emergent properties: endophytes and
microbiomes. Annual Review of Phytopathology 49: 291–315.
Price AL, Jones NC, Pevzner PA. 2005. De novo identification of repeat families in large genomes.
Bioinformatics 21 Suppl 1: i351–8.
Qiu H, Cai G, Luo J, Bhattacharya D, Zhang N. 2016. Extensive horizontal gene transfers between
plant pathogenic fungi. BMC biology 14: 41.
Rawlings ND, Barrett AJ, Finn R. 2016. Twenty years of the MEROPS database of proteolytic
enzymes, their substrates and inhibitors. Nucleic Acids Research 44: D343–50.
Richards TA, Talbot NJ. 2013. Horizontal gene transfer in osmotrophs: playing with public goods.
Nature Reviews Microbiology 11: 720–727.
Robey MT, Caesar LK, Drott MT, Keller NP, Kelleher NL. 2021. An interpreted atlas of
biosynthetic gene clusters from 1,000 fungal genomes. Proceedings of the National Academy of
This article is protected by copyright. All rights reserved
Accepted Article
Sciences of the United States of America 118: e2020230118
Rodriguez RJ, White JF Jr, Arnold AE, Redman RS. 2009. Fungal endophytes: diversity and
functional roles: Tansley review. New Phytologist 182: 314–330.
Rokas A, Wisecaver JH, Lind AL. 2018. The birth, evolution and death of metabolic gene clusters
in fungi. Nature Reviews Microbiology 16: 731–744.
Saier MH Jr, Reddy VS, Tsu BV, Ahmed MS, Li C, Moreno-Hagelsieb G. 2016. The Transporter
Classification Database (TCDB): recent advances. Nature Acids Research 44: D372–9.
Shen X-X, Steenwyk JL, LaBella AL, Opulente DA, Zhou X, Kominek J, Li Y, Groenewald M,
Hittinger CT, Rokas A. 2020. Genome-scale phylogeny and contrasting modes of genome evolution
in the fungal phylum Ascomycota. Science Advances 6. eabd0079.
Slot JC. 2017. Fungal Gene Cluster Diversity and Evolution. Advances in Genetics 100: 141–178.
Sperschneider J, Dodds PN, Gardiner DM, Singh KB, Taylor JM. 2018. Improved prediction of
fungal effector proteins from secretomes with EffectorP 2.0. Molecular Plant Pathology 19: 2094–
2110.
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM,
Mazumder R, Mekhedov SL, Nikolskaya AN, et al. 2003. The COG database: an updated version
includes Eukaryotes. BMC bioinformatics 4: 41.
The Gene Ontology Consortium. 2019. The Gene Ontology Resource: 20 years and still GOing
strong. Nature Acids Research 47: D330–D338.
Trivedi P, Leach JE, Tringe SG, Sa T, Singh BK. 2020. Plant–microbiome interactions: from
community assembly to plant health. Nature Reviews Microbiology 18: 607–621.
U’Ren JM, Arnold AE. 2016. Diversity, taxonomic composition, and functional aspects of fungal
communities in living, senesced, and fallen leaves at five sites across North America. PeerJ 4: e2768.
U’Ren JM, Lutzoni F, Miadlikowska J, Laetsch AD, Arnold AE. 2012. Host and geographic
This article is protected by copyright. All rights reserved
Accepted Article
structure of endophytic and endolichenic fungi at a continental scale. American Journal of Botany 99:
898–914.
U’Ren JM, Lutzoni F, Miadlikowska J, Zimmerman NB, Carbone I, May G, Arnold AE. 2019.
Host availability drives distributions of fungal endophytes in the imperiled boreal realm. Nature
Ecology and Evolution 3: 1430–1437.
U’Ren JM, Miadlikowska J, Zimmerman NB, Lutzoni F, Stajich JE, Arnold AE. 2016.
Contributions of North American endophytes to the phylogeny, ecology, and taxonomy of
Xylariaceae (Sordariomycetes, Ascomycota). Molecular Phylogenetics and Evolution 98: 210–232.
Verster KI, Wisecaver JH, Karageorgi M, Duncan RP, Gloss AD, Armstrong EE, Price DK,
Menon AR, Ali ZM, Whiteman NK. 2019. Horizontal transfer of bacterial cytolethal distending
Toxin B genes to insects. Molecular Biology and Evolution 36: 2105–2110.
Voglmayr H, Friebes G, Gardiennet A, Jaklitsch WM. 2018. Barrmaelia and Entosordaria in
Barrmaeliaceae (fam. nov., Xylariales) and critical notes on Anthostomella -like genera based on
multigene phylogenies. Mycological Progress 17: 155–177.
Wendt L, Sir EB, Kuhnert E, Heitkämper S, Lambert C, Hladki AI, Romero AI, Luangsa-ard
JJ, Srikitikulchai P, Peršoh D, et al. 2018. Resurrection and emendation of the Hypoxylaceae,
recognised from a multigene phylogeny of the Xylariales. Mycological Progress 17: 115–154.
Wibberg D, Stadler M, Lambert C, Bunk B, Spröer C, Rückert C, Kalinowski J, Cox RJ,
Kuhnert E. 2021. High quality genome sequences of thirteen Hypoxylaceae (Ascomycota)
strengthen the phylogenetic family backbone and enable the discovery of new taxa. Fungal Diversity.
106: 7–28.
Wisecaver JH, Alexander WG, King SB, Hittinger CT, Rokas A. 2016. Dynamic evolution of
Nitric Oxide detoxifying flavohemoglobins, a family of single-protein metabolic modules in Bacteria
and Eukaryotes. Molecular Biology and Evolution 33: 1979–1987.
Wisecaver JH, Slot JC, Rokas A. 2014. The evolution of fungal metabolic pathways. PLoS Genetics
This article is protected by copyright. All rights reserved
Accepted Article
10: e1004816.
Wu W, Davis RW, Tran-Gyamfi MB, Kuo A, LaButti K, Mihaltcheva S, Hundley H, Chovatia
M, Lindquist E, Barry K, et al. 2017. Characterization of four endophytic fungi as potential
consolidated bioprocessing hosts for conversion of lignocellulose into advanced biofuels. Applied
Microbiology and Biotechnology 101: 2603–2618.
Xu X-H, Su Z-Z, Wang C, Kubicek CP, Feng X-X, Mao L-J, Wang J-Y, Chen C, Lin F-C,
Zhang C-L. 2015. The rice endophyte Harpophora oryzae genome reveals evolution from a pathogen
to a mutualistic endophyte. Scientific Reports 4: 5783.
SUPPORTING INFORMATION
Methods S1. Additional information on strain selection, fungal growth and nucleic acid extraction,
genome and transcriptome sequencing, gene prediction and genome assembly, phylogenetic analyses,
comparative genomic analyses, litter decomposition assays, and metabolomics.
Note S1. List and description of appendices S1-S10 available on FigShare Repository, DOI
10.6084/m9.figshare.c.5314025.
Fig. S1. Overview of Alien Index (AI) calculations to identify HGT.
Fig. S2. Results of phylogenomic and network analysis of 1,526 universal single-copy orthologous
protein sequences.
Fig. S3. Phylogenomic reconstruction of Xylariaceae s.l. and Hypoxylaceae and genome statistics.
Fig S4. Dynamic distribution of 168 Xylariaceae and Hypoxylaceae SMGCs with hits to known
metabolites in the MIBiG repository.
Fig. S5. Similarity of the griseofulvin SMGC in Penicillium and Xylaria supports HGT.
This article is protected by copyright. All rights reserved
Accepted Article
Fig. S6. The density of repetitive elements surrounding genes was higher for Xylariaceae s.l. than for
Hypoxylaceae genomes.
Fig. S7. Rarefaction analysis illustrates higher SMGC diversity in Xylariaceae compared to
Hypoxylaceae.
Fig. S8. The majority of SMGCs are specific to Hypoxylaceae or Xylariaceae s.l. clades or individual
isolates regardless of SMGC type.
Fig. S9. Network analysis illustrates the importance of clade rather than ecological mode for SMGC
content.
Fig. S10. Orthogroup enrichment suggests functional differences for Xylariaceae s.l. and
Hypoxylaceae.
Fig. S11. Xylariaceae s.l. taxa demonstrate increased decomposition abilities (estimated via mass
loss) on leaf litter compared to fungi with reduced genomes (i.e., Hypoxylaceae and animal dung
Xylariaceae s.l. in the Poronia clade).
Fig. S12. Relative abundance of functional gene categories across Xylariaceae s.l. and Hypoxylaceae.
Table S1. Sequencing and assembly statistics for the 121 genomes included in this study.
Table S2. RepeatMasker, RepeatScout, and RepBase Update classification of repetitive elements for
96 genomes of Xylariaceae s.l. and Hypoxylaceae.
Table S3. Secondary metabolite gene clusters (SMGCs) and cluster families for the 96 Hypoxylaceae
and Xylariaceae s.l. genomes included in this study.
This article is protected by copyright. All rights reserved
Accepted Article
Table S4. Catabolic gene clusters (CGCs) and cluster families for the 96 Hypoxylaceae and
Xylariaceae s.l. genomes included in this study.
Table S5. Taxonomic, phylogenetic, and functional annotation information for HGT candidate genes
identified by Alien Index (AI) analyses.
Table S6. Counts and statistical comparisons of genome content as a function of major clade
(Xylariaceae s.l. vs. Hypoxylaceae) and ecological mode (endophyte vs. non-endophyte).
FIGURE LEGENDS
Fig 1. Xylariaceae s.l. and Hypoxylaceae genomes are characterized by hyperdiverse and
dynamic metabolic gene clusters. (a) Maximum likelihood phylogenetic analyses of 1,526 universal,
single-copy orthogroups support the sister relationship of the Xylariaceae s.l. (containing Xylariaceae
sensu stricto, Graphostromataceae, and Barrmaeliaceae) and the Hypoxylaceae, as well as previously
denoted relationships among genera (see Fig. S2). Phylogenetic analyses included genomes of 25
outgroup taxa representing five other families of Xylariales and eight orders of Sordariomycetes (total
121 genomes; Fig. S2). Taxon names are colored by ecological mode and branches colored by major
clade (red: Xylariaceae s.l.; blue: Hypoxylaceae). Taxa with asterisks (*) represent 15 pairs of
endophyte/non-endophyte sister taxa used to assess differences in genomic content due to ecological
mode (see Fig. 4). Within this phylogenetic framework, we compared the: (b) abundance of different
secondary metabolite gene cluster (SMGC) families per genome. Dotted lines indicate the averages
for Pezizomycotina (black), Xylariaceae s.l. (red), and Hypoxylaceae (blue); (c) relative abundance of
family-specific, clade-specific, and isolate-specific SMGCs; (d) relative abundance and (e)
presence/absence of catabolic gene clusters (CGCs), colored by anchor gene identity (Gluck-Thaler &
Slot, 2018). Hierarchical clustering of CGCs (see bottom) was performed with the unweighted pair
group method with arithmetic mean (UPGMA).
This article is protected by copyright. All rights reserved
Accepted Article
Fig 2. Phylogenetic distribution and functional annotation of high confidence horizontal gene
transfers (HGTs) to genomes of Xylariaceae s.l. and Hypoxylaceae. Phylogeny matches Fig. 1a.
Blue boxes represent genes predicted to be high-confidence HGT events (detected with the first round
of Alien Index analyses; Table S5). HGT events are ordered from left to right based on their
abundance. Transfers with more than one gene copy per genome are indicated with >1. Colored boxes
indicate putative functional annotations of HGTs: secondary metabolite gene clusters (SMGCs),
effectors, signaling peptides, transporters, peptidases, and carbohydrate active enzymes (CAZymes).
SMGCs predicted as 'biosynthetic-core' and 'biosynthetic-additional' are shown with darker purple,
whereas other genes in SMGCs are shown with light purple. For CAZyme predictions, dark brown
color indicates plant cell wall-degrading enzymes (PCWDEs). The bottom panel (Transfer Direction)
indicates the taxonomic identity of putative donor and recipient lineage(s) inferred from phylogenetic
analyses.
Fig 3. Larger genomes in the Xylariaceae s.l. clade reflect increased repetitive regions, gene
gains and duplications, and horizontal gene transfers (HGTs). Median (a) genome size, (b)
repetitive element content, (c) gene gains, (d) gene duplications, and (e) number of putative HGT
events (high confidence only) for genomes of Xylariaceae s.l. (red) and Hypoxylaceae (blue). Box
plot boundaries reflect the interquartile range. Summary statistics (mean, standard deviation, and
sample size) are reported in Table S6. Gene gains/losses were inferred with Wagner Parsimony under
a gain penalty=loss penalty=1; (f) Relationship between the number of HGT events and secondary
metabolite gene clusters (SMGCs) as a function of clade. The shaded region indicates 95%
confidence interval of linear fit. Statistics represent Pearson's correlation coefficient (r) and p-value;
(g) A quantile box plot showing the interquartile range and median of endophyte host breadth
(measured as total number of plant families and lichen orders with which a fungal operational
taxonomic unit (OTU) was cultured (U’Ren et al., 2016)) as a function of major clade (color). A
similar pattern was observed when only the number of plant families are compared (Wilcoxon: 𝛘2 =
4.14, P=0.0413), but not lichen orders (Wilcoxon: 𝛘2 = 1.77, P=0.1834). (h) Relationship of
Xylariaceae endophyte host breadth and the number of SMGCs classified as nonribosomal peptides
This article is protected by copyright. All rights reserved
Accepted Article
(NRPs) per genome. The shaded region indicates 95% confidence interval of linear fit. Statistics
represent Pearson's correlation coefficient (r) and p-value.
Fig 4. Pairwise comparisons of sister taxa illustrate ecological modes are more distinct in the
Hypoxylaceae. Box plots of the median and interquartile difference in gene counts of plant cell walldegrading enzymes (PCWDEs), peptidases, secondary metabolite gene clusters (SMGCs) (y-axis on
left), and transporters (y-axis on right) between 15 pairs of sister taxa with contrasting ecological
modes for Xylariaceae s.l. and Hypoxylaceae (sister taxa are indicated with asterisks in Fig. 1a).
Values greater than zero indicate higher gene counts in non-endophytic taxa, whereas differences less
than zero indicate higher gene counts in endophytes. Statistical differences were assessed with least
squares means contrast under the null hypothesis: non-endophyte value - endophyte value = 0 (see
Table S6 for summary statistics). P-values <0.05 are indicated with an asterisk (*).
Fig. 5. Correlation of secondary metabolite gene cluster (SMGC) content and genes involved in
saprotrophy and/or pathogenicity. Relationship between SMGC abundance and number of genes
annotated as (a) carbohydrate active enzymes (CAZymes), (b) effectors, (c) peptidases, and (d)
transporters for endophytes (top row) and non-endophytes (bottom row). Values for each genome
represent the residuals after accounting for genome size. Points, linear regression line, and shaded
95% confidence intervals of fit are color-coded by clade (red, Xylariaceae; blue Hypoxylaceae).
Statistical values represent Pearson correlation coefficient. See Table S6 for additional details.
This article is protected by copyright. All rights reserved
Xylariaceae s.l.
Hypoxylaceae
(a)
(c)
(b)
(d)
(e)
Eutypa lata UCREL1
Microdochium bolleyi J235TASD1
Microdochium trichocladiopsis MPI-CAGE-CH0230
Whalleya microplaca YMJ1829*
Hypoxylaceae sp. FL0662B*
Durotheca rogersii YMJ 92031201
Hypoxylon fuscum CBS 119018
Hypoxylon sp. FL1284
Hypoxylon sp. FL1150*
Hypoxylon rubiginosum ER1909
Hypoxylon cercidicola CBS 119009*
Hypoxylon rubiginosum CBS 119005
Hypoxylon fragiforme CBS 206.31
Hypoxylon crocopeplum CBS 119004
Hypoxylon sp. NC1633
Hypoxylon monticulosum FL0542
Hypoxylon submonticulosum NC0708
Daldinia eschscholtzii EC12
Daldinia eschscholtzii FL0578
Daldinia bambusicola CBS 122872
Daldinia caldariorum CBS 122874
Daldinia sp. FL1419
Daldinia vernicosa CBS 139.73
Daldinia grandis CBS 114736
Daldinia decipiens CBS 113046
Daldinia loculata AZ0526*
Daldinia loculata CBS 113971*
Hypoxylon trugodes CBS 135444*
Hypoxylon sp. CI-4A*
Hypoxylon sp. FL1857
Hypoxylon sp. FL0890
Hypoxylon sp. FL0543
Hypoxylon sp. NC0597
Hypoxylon sp. EC38
Hypoxylon sp. CO27-5
Annulohypoxylon minutellum CBS 135445
Annulohypoxylon bovei var. microspora CBS124037
Annulohypoxylon moriforme CBS 123579
Annulohypoxylon maeteangense CBS 123835
Annulohypoxylon truncatum CBS 140777
Rostrohypoxylon terebratum CBS 119137
Annulohypoxylon stygium FL0470*
Annulohypoxylon nitens CBS 120705*
Xylariaceae sp. FL2044
Biscogniauxia marginata CBS 124505
Biscogniauxia nummularia BnCUCC2015
Camillea tinctor CBS 203.56
Biscogniauxia sp. FL1348
Biscogniauxia mediterranea AZ0048*
Biscogniauxia mediterranea CBS 129072*
Xylariaceae sp. FL0641
Barrmaeliaceae sp. FL0016
Barrmaeliaceae sp. FL0804
Xylariaceae sp. FL0255
Xylariaceae sp. FL1272
Xylariaceae sp. FL1019
Xylariaceae sp. FL1651
Anthostoma avocetta NRRL 3190*
Xylariaceae sp. AK1471*
Xylariaceae sp. FL0594*
Poronia punctata CBS 180.79*
Xylaria nigripes YMJ 653
Xylaria sp. CBS 124048
Xylaria intraflava YMJ 725
Xylaria hypoxylon OSC100004
Xylaria digitata CBS 161.22
Xylaria grammica CBS 120713
Xylaria sp. FL1777
Kretzschmaria deusta IL1129*
Kretzschmaria deusta CBS 826.72*
Xylaria arbuscula FL1030*
Xylaria bambusicola CBS 139988*
Xylaria arbuscula CBS 124340
Xylaria venustula FL0490
Xylaria sp. FL1042
Xylaria sp. FL0064
Xylaria sp. FL0043
Xylaria sp. FL0933
Entoleuca mammata CFL468
Nemania sp. CBS 527.63*
Nemania diffusa NC0034*
Nemania abortiva FL1152
Nemania sp. FL0031
Nemania sp. FL0916
Nemania sp. NC0429
Nemania serpens AK0226
Nemania serpens AZ0576*
Nemania serpens CBS 679.86*
Xylaria palmicola CBS 124036
Astrocystis sublimbata CBS 130006
Xylaria telfairii CBS 121673
Xylaria cf. heliscus FL0509
Xylaria acuta CBS 122032
Xylaria longipes CBS 148.73
Xylaria cf. castorea CBS 124033
Xylaria flabelliformis CBS 116.85*
Xylaria flabelliformis NC1011*
Xylaria flabelliformis CBS 123580
Xylaria flabelliformis CBS 114988
Newly sequenced genome
PacBio Illumina
Ecological Mode
Endophytic (living leaves, lichens)
Saprotrophic (dead wood, litter, fruit)
Saprotrophic (insect/animal dung)
Pathogenic
0
20
40
60
80
100 120 0
No. of Predicted SMGCs
SMGC Classification
PKS other
PKS
Other
NRPS
Terpene
RiPP
PKS-NRP Hybrid
20
40
60
80 100
Relative Abundance (%)
SMGC Distribution
Specific to:
Isolate
Xylariaceae/Hypoxylaceae
Hypoxylaceae
Xylariaceae s.l.
Other
0
25
50
75 100
Relative Abundance (%)
Catabolic Gene Cluster (CGC) Classification
Salicylate Hydroxylase
Pterocarpan Hydroxylase
Naringenin 3-dioxygenase
Phenol 2-monooxygenase
Quinate Dehydrogenase
Benzoate 4-monooxygenase
Ferulic Acid Esterase
Epicatechin Laccase
Stilbene Dioxygenase
Catechol Dioxygenase
Vanillyl Alcohol Oxidase
Ferulic Acid Decarboxylase
Hybrid (families)
Aromatic Ring-opening Dioxygenase
HGT125
HGT123
HGT122
HGT121
HGT117
HGT116
HGT115
HGT108
HGT105
HGT087
HGT083
HGT081
HGT059
HGT052
HGT042
HGT041
HGT029
HGT025
HGT024
HGT011
HGT092
HGT017
HGT002
HGT110
HGT113
HGT112
HGT107
HGT104
HGT103
HGT100
HGT098
HGT095
HGT088
HGT084
HGT075
HGT074
HGT032
HGT023
HGT114
HGT049
HGT048
HGT007
HGT106
HGT094
HGT001
HGT093
HGT091
HGT089
HGT082
HGT079
HGT062
HGT009
HGT096
HGT073
HGT072
HGT069
HGT054
HGT012
HGT010
HGT003
HGT080
HGT078
HGT070
HGT071
HGT067
HGT129
HGT066
HGT063
HGT013
HGT064
HGT061
HGT058
HGT057
HGT008
HGT053
HGT056
HGT055
HGT050
HGT051
HGT043
HGT047
HGT046
HGT045
HGT044
HGT040
HGT038
HGT039
HGT128
HGT036
HGT030
HGT031
HGT014
HGT037
HGT035
HGT026
HGT015
HGT033
HGT027
HGT019
HGT018
HGT016
HGT005
>1
Eutypa lata UCREL1
M. bolleyi J235TASD1
M. trichocladiopsis MPI-CAGE-CH0230
Whalleya microplaca YMJ1829
Hypoxylaceae sp. FL0662B
Durotheca rogersii YMJ 92031201
Hypoxylon fuscum CBS 119018
Hypoxylon sp. FL1284
Hypoxylon sp. FL1150
Hypoxylon rubiginosum ER1909
Hypoxylon cercidicola CBS 119009
Hypoxylon rubiginosum CBS 119005
Hypoxylon fragiforme CBS 206.31
Hypoxylon crocopeplum CBS 119004
Hypoxylon sp. NC1633
Hypoxylon monticulosum FL0542
Hypoxylon submonticulosum NC0708
Daldinia eschscholtzii EC12
Daldinia eschscholtzii FL0578
Daldinia bambusicola CBS 122872
Daldinia caldariorum CBS 122874
Daldinia sp. FL1419
Daldinia vernicosa CBS 139.73
Daldinia grandis CBS 114736
Daldinia decipiens CBS 113046
Daldinia loculata AZ0526
Daldinia loculata CBS 113971
Hypoxylon trugodes CBS 135444
Hypoxylon sp. CI-4A
Hypoxylon sp. FL1857
Hypoxylon sp. FL0890
Hypoxylon sp. FL0543
Hypoxylon sp. NC0597
Hypoxylon sp. EC38
Hypoxylon sp. CO27-5
A. minutellum CBS 135445
A. bovei var. microspora CBS 124037
A. moriforme CBS 123579
A. maeteangense CBS 123835
A. truncatum CBS 140777
Rostrohypoxylon terebratum CBS 119137
Annulohypoxylon stygium FL0470
Annulohypoxylon nitens CBS 120705
Xylariaceae sp. FL2044
Biscogniauxia marginata CBS 124505
Biscogniauxia nummularia BnCUCC2015
Camillea tinctor CBS 203.56
Biscogniauxia sp. 304 FL1348
Biscogniauxia mediterranea AZ0048
Biscogniauxia mediterranea CBS 129072
Xylariaceae sp. FL0641
Barrmaeliaceae sp. FL0016
Barrmaeliaceae sp. FL0804
Xylariaceae sp. FL0255
Xylariaceae sp. FL1272
Xylariaceae sp. FL1019
Xylariaceae sp. FL1651
Anthostoma avocetta NRRL 3190
Xylariaceae sp. AK1471
Xylariaceae sp. FL0594
Poronia punctata CBS 180.79
Xylaria nigripes YMJ 653
Xylaria sp. CBS 124048
Xylaria intraflava YMJ 725
Xylaria hypoxylon OSC100004
Xylaria digitata CBS 161.22
Xylaria grammica CBS 120713
Xylaria sp. FL1777
Kretzschmaria deusta IL1129
Kretzschmaria deusta CBS 826.72
Xylaria arbuscula FL1030
Xylaria bambusicola CBS 139988
Xylaria arbuscula CBS 124340
Xylaria venustula FL0490
Xylaria sp. FL1042
Xylaria sp. FL0064
Xylaria sp. FL0043
Xylaria sp. FL0933
Entoleuca mammata CFL468
Nemania sp. CBS 527.63
Nemania diffusa NC0034
Nemania abortiva FL1152
Nemania sp. FL0031
Nemania sp. FL0916
Nemania sp. NC0429
Nemania serpens AK0226
Nemania serpens AZ0576
Nemania serpens CBS 679.86
Xylaria palmicola CBS 124036
Astrocystis sublimbata CBS 130006
Xylaria telfairii CBS 121673
Xylaria cf. heliscus FL0509
Xylaria acuta CBS 122032
Xylaria longipes CBS 148.73
Xylaria cf. castorea CBS 124033
Xylaria flabelliformis CBS 116.85
Xylaria flabelliformis NC1011
Xylaria flabelliformis CBS 123580
Xylaria flabelliformis CBS 114988
SMGCs
Functional
Effectors
Signal Peptides
annotations Transporters
Peptidases
Donor
CAZymes
Virus
Actinobacteria
Proteobacteria
Bacteria
Plants
Fungi
Basidiomycota
Sordariomycetes
Xylariales
Xylariaceae/Hypoxylaceae
Transfer
direction
Recipient
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1 >1
>1 >1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1 >1
>1
>1
>1
>1
>1
>1
>1
>1
>1
>1
(a)
(b)
(c)
(d)
(e)
3000
2000
Putative HGT Events
4e+07
6000
Gene Duplications
5e+07
30
1500
4000
9000
Gene Gains
Repeative Elements
Genome Size (bp)
6e+07
1000
500
3000
1000
t95 = -7.35, P < 0.0001
0
(g)
(f) 55
r = 0.72, P < 0.0001
r = 0.72, P < 0.0001
45
40
35
30
25
20
15
Xylariaceae s.l.
Hypoxylaceae
20
40
60
8
0
t95 = 2.11, P = 0.038
t95 = -9.67, P < 0.0001
(h)
r = 0.43, P = 0.0195
Endophyte Host Breadth
Putataive HGT Events
50
10
t95 = 3.40, P = 0.001
t95 = -4.66, P < 0.0001
20
80
SMGCs
100
120
6
4
2
0
-2
-4
-6
t42 = 2.25, P = 0.0294 10
15
20
25
30
Nonribosomal peptide (NRP) SMGCs
PCWDEs
40
*
Xylariaceae s.l.
Hypoxylaceae
Higher value
non-endophyte
50
Transporters
SMGCs
Peptidases
*
75
50
*
30
*
20
20
10
0
-10
-20
0
Higher value
endophyte
Difference Non-Endophyte - Endophyte
60
P=0.7739
P=0.0366
P=0.1468
P=0.0269
P=0.2545
P=0.0387
P=0.4862
P=0.0106
-25
(a)
SMGCs (residuals)
ENDOPHYTE
40
(b)
(c)
(d)
r = 0.45, P = 0.0683
r = 0.35, P = 0.0727
r = 0.19, P = 0.4679
r = 0.33, P = 0.0938
r = 0.31, P = 0.2241
r = 0.32, P = 0.1092
r = 0.13, P = 0.6125
r = -0.05, P = 0.8169
r = 0.84, P < 0.0001
r = 0.75, P < 0.0001
r = 0.52, P = 0.0106
r = 0.77, P < 0.0001
r = 0.82, P < 0.0001
r = 0.80, P < 0.0001
r = 0.77, P < 0.0001
r = 0.54, P = 0.0027
20
0
-20
-40
SMGCs (residuals)
NON-ENDOPHYTE
40
20
0
-20
Xylariaceae s.l.
Hypoxylaceae
-40
-100
-50
0
50
CAZymes (residuals)
100
-50
0
50
Effectors (residuals)
100
-50
0
50
Peptidases (residuals)
100
-200
-100
0
100
Transporters (residuals)