Next Generation
Sequencing in
Forensic Science
Next Generation
Sequencing in
Forensic Science
A Primer
Kelly M. Elkins and Cynthia B. Zeller
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2022 Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of
their use. The authors and publishers have attempted to trace the copyright holders of all material
reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and
let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known
or hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.
com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@
tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
ISBN: 978-0-367-47893-3 (hbk)
ISBN: 978-1-032-07204-3 (pbk)
ISBN: 978-1-003-19646-4 (ebk)
DOI: 10.4324/9781003196464
Typeset in Minion
by codeMantra
To Emily, Julie, Madeleine, Kay and Sara and
all future scientists and teachers
Contents
Foreword
Preface
Acknowledgments
Authors
List of Figures
List of Tables
List of Credits
List of Abbreviations
1
2
xi
xiii
xv
xvii
xix
xxiii
xxv
xxvii
History of DNA-Based Human Identification in
Forensic Science
1
1.1 Introduction
1.2 Application of DNA Sequencing to Human DNA
1.3 History of DNA Typing
1.4 Next Generation Sequencing for Forensic DNA Typing
1.5 Conclusion
Questions
References
1
1
2
8
10
10
11
History of Sequencing for Human DNA Typing
13
2.1
2.2
13
13
13
14
16
17
17
19
19
19
19
19
20
21
2.3
2.4
Introduction
Common Chemistries Used in Sequencing Applications
2.2.1 Chain Termination Sequencing
2.2.2 Pyrosequencing
2.2.3 Sequencing by Ligation
Detection Techniques
2.3.1 Fluorescence
2.3.2 Pyrosequencing
2.3.3 Ion Detection
Sequencing Platforms
2.4.1 First-Generation Sequencing Techniques
2.4.1.1 Sanger Sequencing
2.4.1.2 SNaPShot Sequencing
2.4.1.3 Pyrosequencing
vii
viii
Contents
2.5
3
4
5
Massively Parallel Sequencing
2.5.1 Reversible Chain Termination MPS Platforms
2.5.2 Ion Detection Platforms
2.5.3 Sequencing by Ligation Platforms
2.5.4 Single Base Extension Platforms
2.5.5 Third-Generation Platforms
2.6 NGS Instruments Adopted for Forensic Science
Questions
References
23
23
23
24
25
25
25
28
28
Sample Preparation, Standards, and Library
Preparation for Next Generation Sequencing
31
3.1 Overview of the NGS Sample Preparation Process
3.2 Sample Handling and Processing
3.3 DNA Extraction
3.4 DNA Quantitation
3.5 Library Preparation
3.6 Library Purification and Normalization
3.7 Multiplexing and Denaturation
Questions
References
31
31
32
34
35
39
41
42
42
Performing Next Generation Sequencing
47
4.1 Performing Next Generation Sequencing
4.2 Verogen MiSeq FGx® Sequencing
4.3 ThermoFisher Ion Torrent™ and Ion PGM Sequencing
4.4 The Next Step
Questions
References
47
47
53
54
54
55
Next Generation Sequencing Data Analysis
and Interpretation
57
5.1 NGS Data Analysis
5.2 Verogen Universal Analysis Software
5.3 ThermoFisher Converge Software
5.4 Phenotype Analysis Using the Erasmus Server
5.5 Other Sequence Analysis Software
5.6 Additional Tools for Mixture Interpretation
5.7 Other NGS Sequence Data Analysis Tools
5.8 NGS Validation and Applications
Questions
References
57
58
69
74
77
78
79
80
83
83
Contents
6
7
Next Generation Sequencing Troubleshooting
87
6.1 Troubleshooting NGS Sequencing
6.2 Troubleshooting MiSeq FGx Instrument Failure
6.3 Troubleshooting MiSeq FGx Run Failure
6.4 Troubleshooting Ion Series Run Failure
Questions
References
87
87
89
92
94
94
Mitochondrial DNA Typing Using Next
Generation Sequencing
95
7.1
7.2
7.3
7.4
Introduction to Mitochondrial DNA Typing
The Sequence of the Mitochondrial Chromosome
Mitochondrial DNA Typing Methods
Mitochondrial DNA Typing Using Next Generation
Sequencing
7.5 Mitochondrial Sequence Data Interpretation and
Reporting
7.6 Recent Reports of Mitotyping Using NGS for Forensic
Applications
7.7 Mitochondrial Sequence Data and Databases
Questions
References
8
ix
Microbial Applications of Next Generation
Sequencing for Forensic Investigations
8.1
8.2
8.3
8.4
8.5
8.6
8.7
8.8
Introduction to Microbial DNA Profiling
Why NGS?
The Human Microbiome Project
Sampling and Processing
NGS Methodology in Microbial Forensics
Results from the Human Microbiome Project
HMP Applications for Forensic Science
NGS Applications in Geolocation, Autopsy, PMI, and
Lifestyle Analysis
8.9 Bioinformatic Approaches and Tools
8.10 Bioforensics and Biosurveillance
8.11 Infectious Disease Diagnostics
8.12 NGS Applications in Archeology
8.13 Summary of NGS Microbial Sequencing Applications
in Forensic Investigation
Questions
References
95
96
98
98
102
107
108
109
109
117
117
118
118
118
119
120
121
125
126
127
128
129
129
130
130
x
Contents
9
Body Fluid Analysis Using Next Generation
Sequencing
9.1 Introduction
9.2 Epigenetic-Based Tissue Source Attribution
9.3 mRNA-Based Tissue Source Attribution
9.4 MicroRNA Analysis
9.5 The Future of Body Fluid Assays
Questions
References
10
Conclusions and Future Outlook of Next
Generation Sequencing in Forensic Science
10.1 NGS Is Here
10.2 Why NGS?
10.3 Ongoing Challenges of Adopting NGS for Forensic
Investigations
10.4 Early Successes of NGS in Forensic Cases
10.5 Summary
Questions
References
Index
137
137
137
139
140
141
141
142
145
145
146
147
152
154
154
154
159
Foreword
Next Generation Sequencing (NGS) for the longest time was considered “the
future of forensic DNA analysis.” However, it is rapidly becoming a powerful tool in forensic DNA laboratories today for short tandem repeat (STR)
typing, single nucleotide polymorphism (SNP) typing, and mitochondrial
DNA. NGS technology can be a game changer for helping to solve crimes,
create investigational leads, and solve complex ancestry cases. While capillary electrophoresis (CE) remains routine in forensic DNA analysis, the
introduction of NGS to the forensic DNA field allows an alternative solution
for the analysis of challenging forensic casework samples. In recent years,
commercial companies have been releasing ready-to-use chemistries and
protocols that can easily be incorporated into existing workflows in forensic
DNA laboratories.
A clear advantage of using NGS for DNA typing is the sheer amount of
additional information that can be obtained from the same sample input that
is currently used with CE technologies. This includes additional loci available
across autosomal, Y-, and X-STR markers as well as multiplexing multiple
marker types within a single amplification. For example, the largest commercially available CE kit (at the present time) is the Investigator Argus Y-28
kit from Qiagen that includes 28 Y-STR markers in one system. The Forensic
DNA Signature Prep kit from Verogen combines 27 autosomals, 7 X-STRs,
and 24 Y-STRs in addition to 94 identity, 56 ancestry, and 22 phenotypicinformative SNPs for over 200 markers in a single multiplex. The additional
information also includes sequence variations within markers that could
potentially aid the resolution of complex cases with degraded or low amounts
of DNA, assist with mixture deconvolution, help resolve kinship scenarios,
and strengthen the statistics in population databases.
This textbook takes you through the history of forensic DNA-based
human identification to include a variety of techniques such as VNTR, RFLP,
STR, and SNP DNA typing and progresses to the history of sequencing in
the forensic DNA community. Readers will dive into the entire process of
NGS, to include sample and library preparation with a variety of commercial chemistries, setting up and performing sequencing with two different
instruments, data analysis, interpretation, and troubleshooting issues that
can occur. In addition, it covers a multitude of marker sets to include SNPs
and mitochondrial DNA, and several NGS applications such as microbial
xi
xii
Foreword
DNA and body fluid analysis. Readers will learn about future considerations
and applications of this rapidly emerging technology.
The coauthors, Dr. Kelly Elkins and Dr. Cynthia Zeller, are both exceptional professors and researchers at Towson University. I’ve had the pleasure
of knowing both for many years and can say with certainty that they have
vast knowledge, experience, and enthusiasm for NGS and all it has to offer the
forensic DNA community. They review the NGS process in detail and have
written one of the first books to prepare practitioners to utilize and implement this important technology into their laboratories for forensic casework.
In addition, this resource will aid in the education of future forensic scientists
as they want to learn more about this ever-evolving technology.
Carolyn (Becky) Steffen, M.S., Research Biologist at the National
Institute of Standards and Technology
Material Measurement Laboratory, Biomolecular Measurement Division,
Applied Genetics Group
June 2021
Preface
We have both taught Forensic Biology for several years but introduced our
first course on next generation sequencing to our students only two years
ago. This endeavor was supported by our Provost, Dean, Department Chair,
and Program Director who strongly advocated for and financially supported
instrument upgrades, reagents and consumables, and time for us to develop
two new courses. Our department and program have added courses centered
on course-based research using mitochondrial and autosomal DNA sequencing for undergraduate and graduate students. We have used the primary literature, books, videos, recorded lectures, and conference lectures and posters
as our sources as we developed the courses. Our goal in this project was to
bring the essential content and references together in one place for individuals interested in learning about next generation sequencing and its forensic
applications. We hope that professional forensic DNA analysts, laboratory
directors, and educators at all levels and their students will find this book to
be an introduction to next generation sequencing in forensic science. We look
forward to hearing from our readers.
Kelly M. Elkins
Cynthia B. Zeller
February 2021
xiii
Acknowledgments
We are fortunate to have received support from many people to help us
succeed in this endeavor. From Towson University, Mark Profili, Program
Director for our Forensic Programs at Towson University (TU), spearheaded
the effort to introduce NGS in our forensic programs, and we thank him
for his vision and leadership. We thank Ryan Casey, Chair of the Chemistry
Department, for supporting us in adding this capability to our forensic program and allowing us flexibility in scheduling and workload. Vonnie Shields,
Interim Dean, and David Vanko, Dean, supported us by funding two proposals for the new courses through the Jess and Mildred Fisher College of Science
and Mathematics Fisher Endowment Funds. The TU Provost’s Budget Office
funded the mitochondrial DNA analysis software upgrade. Brian Masters
was the Principle Investigator for the awarded NSF grant that enabled TU
to acquire a MiSeq in 2013, and he willingly agreed to the FGx upgrade and
additional users we represent – we couldn’t have done this without him.
Thanks to Dana Kollmann for the wonderful collaboration and opportunity to put the science to work. We were welcomed by Laura Gough and
Matt Hemm into their Towson University Research Enhancement Program
(TU-REP) generously funded by the Howard Hughes Medical Foundation
(HHMI). They provided both financial support for course supplies and also
excellent professional development training led by Rommel Miranda to us
as we engaged in offering course-based undergraduate research experience
(CURE) courses. We thank Michelle Snyder, Chris Ouferio, Barry Margulies,
Larry Wimmers, Vanessa Beauchamp, and Elana Ehrlich for excellent discussions in the TU-REP community. Thanks to Adam Klavens for working
with us to conduct many of the experiments and troubleshooting that guide
our explanations in this book and for providing helpful comments and suggestions on the manuscript. Hirak Ranjan Dash reviewed several chapters
and we appreciate his feedback. We are also grateful to the teams at Verogen –
including Danny Hall, Kristi Kim, and Melissa Kotkin – and Qiagen – including
Mary Jones-Dukes and Mark Guillano – for teaching us and guiding us in
troubleshooting and Becky Steffen of NIST for invaluable discussions. Finally,
this book would not have been possible without our TU students who are the
reason for our new courses, which laid the groundwork for this book.
Thanks to Mark Listewnik, acquisitions editor, who immediately supported this project and rapidly sent our proposal for review and found
xv
xvi
Acknowledgments
excellent reviewers who provided invaluable suggestions. It is always a pleasure to work with you. Thank you also to Katie Horsfall at CRC Press for
administrative support.
We look forward to hearing from our readers and forensic analysts who
introduce this process into their daily workflow.
I (Kelly) never thought I’d write a book – and this marks my third, not
counting several additional book chapters. My husband and children continue to support me, cheer me on, and listen to me talk through my projects.
I appreciate them all more than words can say.
Authors
Kelly M. Elkins, PhD, is an Associate Professor of Chemistry at Towson
University and a founding editor-in-chief of the Journal of Forensic Science
Education. She has authored the books Forensic DNA Biology: A Laboratory
Manual and Introduction to Forensic Chemistry, in addition to ten invited
book chapters and more than thirty-five journal papers on her research in
journals, including the Journal of Forensic Sciences, Analytical Biochemistry,
Drug Testing and Analysis, and Medicine, Science and the Law. She has taught
courses in forensic biology and forensic chemistry under various course
numbers at four colleges and universities since 2006. She is an active member
of the American Chemical Society and a Fellow of the American Academy
of Forensic Sciences. She is a member of the Council of Forensic Science
Educators and served as its President in 2012. She is a member of the ACS
Ethics Committee and co-wrote the 2018 ACS Exams Institute Diagnostic of
Undergraduate Chemical Knowledge (DUCK) exam. Her research has been
funded by the Forensic Sciences Foundation, NSF, NIH, Maryland TEDCO,
and ACS. She enjoys communicating science in the classroom, via outreach
activities, in interviews, and on television. She is the editor for three books
in production.
Cynthia B. Zeller, PhD, is an Associate Professor of Chemistry at Towson
University. She has taught several forensic biology and DNA typing courses
at Frederick Community College and Towson University for over fifteen
years. After completing postdoctoral appointments in the School of Medicine
at Johns Hopkins University for six years, she served as a Serologist and DNA
Analyst at the Maryland State Police Forensic Science Division for six years.
She is a member of the Mid-Atlantic Association of Forensic Scientists. She
has published ten scientific publications and has delivered more than hundred conference and seminar presentations. Her work has been published in
the Journal of Forensic Sciences, Fibrogenesis Tissue Repair, and The American
Journal of Physiology, and it has been funded by the National Institutes of
Justice. This is her first book.
xvii
List of Figures
Figure 1.1 Comparison of a single nucleotide polymorphism
and short tandem repeat. (a) rs12913832 is a SNP near
the OCA2 gene that has been shown to be linked to
blue (G, G) or brown (A, A or A, G) eye color and is
used in phenotype prediction assays. (b) DYS19 is an
STR (NCBI Accession number X77751) with a variable
number of repeats
7
Data collected using (a) a modern six-dye kit using CE
and (b) NGS. (Courtesy of Adam Klavens.)
9
Figure 1.2
Figure 1.3 Decreasing cost per megabase of DNA sequence over
time. (https://search.creativecommons.org/photos/
a2a51821-36b0-4432-9a8b-0e86317fca5c.)
10
Figure 2.1 DNA synthesis with growing DNA template and
pyrophosphate byproduct. (Michał Sobkowski, CC BY
3.0 license.)
14
Figure 2.2
Figure 2.3
Comparison of Sanger sequencing and NGS processes.
(Dale Muzzey, Eric A. Evans, Caroline Lieber, CC BY
4.0 license. https://www.ncbi.nlm.nih.gov/pmc/articles/
PMC4633438/figure/Fig1/.)
15
Overview of several DNA sequencing techniques with the
principle of (a) Sanger sequencing, (b) pyrosequencing
(e.g., 454), (c) em-PCR (e.g., 454, SOLiD® and Ion
Torrent™) and (d) bridge amplification/cluster PCR
(e.g., Solexa). (Brigitte Bruijns, Roald Tiggelaar, and Han
Gardeniers, CC BY NC ND 4.0 license. https://www.ncbi.
nlm.nih.gov/pmc/articles/PMC6282972/figure/elps6707fig-0002/?report=objectonly.)
17
Figure 2.4 Overview of several DNA sequencing techniques
with the principle of (a) sequencing by ligation (SBL,
e.g., SOLiD®), (b) ion detection (e.g., Ion Torrent™), (c)
zero-mode waveguides (ZMWs, e.g., PacBio®), and (d)
nanopores (e.g., Oxford Nanopore). (Brigitte Bruijns,
xix
xx
List of Figures
Roald Tiggelaar, and Han Gardeniers, CC BY NC ND
4.0 license. https://www.ncbi.nlm.nih.gov/pmc/articles/
MC6282972/figure/elps6707-fig-0003/?report=objectonly.)
18
Figure 2.5 A sample Sanger sequence read.
21
Figure 2.6 Qiagen Q48 pyrosequencing instrument.
22
Figure 2.7 Verogen MiSeq FGx instrument.
24
Figure 3.1 Overview of steps in the NGS sample preparation process.
32
Comparison of steps in the library preparation
workflows for three products.
37
Agarose gel of ForenSeq library preparation amplicons
(From left to right: Lane 1: Trackit 50 bp ladder with
bright bands at 350/800/2500 bp, Lanes 2–6: DNA
standards, Lane 7: NTC, Lane 8: DNA standard from
10-month-old library prep). (Courtesy of Adam Klavens.)
40
Figure 3.4
QIAcel graph of PCR amplicon for pyrosequencing.
41
Figure 4.1
Setting up a sequencing run on the MiSeq FGx (RUO or
Forensic Use).
48
Figure 3.2
Figure 3.3
Figure 4.2 Wash screen on MiSeq FGx.
Figure 4.3
Preparing a new run in the Verogen Universal Analysis
Software.
Figure 4.4 Micro (left) and standard (right) MiSeq flow cells.
48
49
50
Figure 4.5
Sequencing in process on the MiSeq FGx.
51
Figure 4.6
MiSeq FGx sequencing run completion viewed in UAS.
52
Figure 5.1
MiSeq FGx run metrics for a successful sequencing run.
58
Figure 5.2 Passing HSC in MiSeq FGx sequencing run.
Figure 5.3
ForenSeq sequencing run negative control with no alleles.
63
63
Figure 5.4 UAS reads versus length graph for 2800M.
64
Figure 5.5 Full profile for 2800M using ForenSeq library prep.
64
Figure 5.6
Figure 5.7
Figure 5.8
Sample comparison in UAS for 9948 at two input
concentrations.
65
D7S820 locus displaying stutter and peak imbalance
for a sample.
65
Relatively Balanced rs12913832 SNP reads for a sample.
66
List of Figures
Figure 5.9
Imbalanced rs1413212 SNP reads per sample.
xxi
67
Figure 5.10 UAS phenotype estimate for 2800M.
68
Figure 5.11 GlobalFiler NGS STR panel v2 data viewed with
Converge software.
72
Figure 5.12 Eye color prediction tree using SNPs.
74
Figure 5.13 Erasmus SNP input for K562 prediction of phenotype.
75
Figure 5.14 Erasmus K562 phenotype prediction.
76
Figure 5.15 UAS biogeographical ancestry and phenotype
prediction for K562 prepared with ForenSeq.
76
Number of alleles for Y-STRs analyzed using
CE and NGS.
81
Number of alleles for Y STRs analyzed using
CE and NGS.
82
Figure 6.1
MiSeq FGx Y-stage home error.
88
Figure 6.2
MiSeq FGx camera focus error.
89
Figure 6.3
MiSeq FGx run failure viewed in UAS.
91
Figure 7.1
Variation between SNP 73 in HL-60 as compared
to the rCRS.
103
Figure 7.2
Reads per sample.
103
Figure 7.3
Insertion at 315.1 in the HL-60 standard as compared to
the rCRS.
104
Figure 7.4
Total reads and calls for a sample at three
concentrations (1, 5, and 100 pg) at position 489.
105
A purple warning flag indicates that the number of
reads for position 523 is below the IT in a 1 pg sample.
105
Figure 5.16
Figure 5.17
Figure 7.5
Figure 8.1
Bias in sequencing in the gut microbiome using Sanger,
454, SOLiD and Shotgun-SOLiD sequencing methods.
(Suparna Mitra, Karin Forster-Fromme, Antje DammsMachado, Tim Scheurenbrand, Saskia Biskup, Daniel H.
Huson, Stephan C. Bischoff (CC 2.0). https://pubmed.
ncbi.nlm.nih.gov/24564472/.)
122
Figure 8.2
Normalized comparison between 16S samples obtained
using three technologies: “Sanger,” “16S-454,” and
“16S-SOLiD” datasets. Normalized comparison result
xxii
List of Figures
obtained using MEGAN for “Sanger”-dataset (blue),
“16S-454” dataset (cyan), and “16S-SOLiD” dataset
(magenta) without considering “No hits” node. The
tree is collapsed at “ family” level of NCBI taxonomy.
Circles are scaled logarithmically to indicate the
number of summarized reads. (Suparna Mitra, Karin
Förster-Fromme, Antje Damms-Machado, Tim
Scheurenbrand, Saskia Biskup, Daniel H. Huson,
Stephan C. Bischoff (CC 2.0). https://pubmed.ncbi.nlm.
nih.gov/24564472/#&gid=article-figures&pid=figure-3uid-2.)
123
Figure 8.3
Summary of forensic applications of microbial NGS.
130
Figure 9.1
Pyrograms resulting from a vaginal epithelial sample
analyzed with the Body Fluid Identification Multiplex.
Vaginal epithelia is characterized by moderate
methylation in the BCAS4 assay (a), hypomethylation in
the cg06379435 assay (b), moderate methylation in the
VE_8 assay (c), and hypermethylation in the ZC3H12D
assay (d). The combination of multiple body fluid assays
in a single reaction allows for higher accuracy in body
fluid identification while reducing sample consumption
and costs. (Courtesy of Quentin Gauthier.)
140
Figure 10.1
Summary of challenges of adopting NGS for forensic
investigations.
152
List of Tables
Table 1.1
Brief History of Some Notable Advances in Forensic Biology
2
Table 1.2 Core Str Loci for Several Countries and Unions
6
Table 2.1 NGS Read Length, Run Time, and Per Base Cost
26
Table 2.2
Table 3.1
Table 3.2
Advantages and Disadvantages of Various Sequencing
Approaches
27
Manufacturer’s Recommended DNA Input Quantity
for NGS
35
Comparison of Steps and Time Required for Library
Preparation for Kits from Commercial Suppliers
36
Table 3.3 i7 Index Labels and Sequences
38
i5 Index Labels and Sequences
38
Table 3.4
Table 5.1 ForenSeq and Precision ID Target Autosomal and Sex
Chromosomal STR Loci
60
Table 5.2 Key Notes for Inputting UAS Calls to Erasmus Server
76
Table 6.1
Troubleshooting the MiSeq FGx Instrument and
Sequencing Runs
Table 7.1 Variations between the CRS and rCRS Mitochondrial
Chromosome Sequences
Table 7.2
Frequently Probed Mitochondrial DNA SNP Positions in
the Variable Regions (HVI, HVII, and HVIII) and Outside
the Control Region (Other)
Table 9.1 Genetic Markers Identified Body Fluid Identification
Using Pyrosequencing
xxiii
93
96
97
139
List of Credits
Figure
Credit Line
Caption
Figure 1.2
Adam Klavens
Data collected using (a) a modern 6-dye kit using CE
and (b) NGS
Decreasing cost per megabase of DNA sequence over
time
DNA synthesis with growing DNA template and
pyrophosphate byproduct
Comparison of Sanger sequencing and NGS processes
Figure 1.3
@dullhunk (CC BY
2.0)
Figure 2.1 Michał Sobkowski,
(CC BY 3.0)
Figure 2.2 Dale Muzzey, Eric A.
Evans, Caroline
Lieber (CC BY 4.0)
Figure 2.3 Brigitte Bruijns, Roald Overview of several DNA sequencing techniques with
the principle of (A) Sanger sequencing, (B)
Tiggelaar, and Han
pyrosequencing (e.g., 454), (C) em‐PCR (e.g., 454,
Gardeniers (CC BY
SOLiD® and Ion Torrent™), and (D) bridge
NC ND 4.0)
amplification/cluster PCR (e.g., Solexa).
Figure 2.4 Brigitte Bruijns, Roald Overview of several DNA sequencing techniques with
the principle of (A) sequencing by ligation (SBL, e.g.,
Tiggelaar, and Han
SOLiD®), (B) ion detection (e.g., Ion Torrent™), (C)
Gardeniers, CC BY
zero-mode waveguides (ZMWs, e.g., PacBio®), and
NC ND 4.0
(D) nanopores (e.g., Oxford Nanopore).
Figure 2.6 Madeleine Phillips
Qiagen Q48 pyrosequencing instrument
Figure 2.7 Madeleine Phillips
Verogen MiSeq FGx instrument
Figure 3.3 Adam Klavens
Agarose gel of ForenSeq library preparation amplicons
(From left to right: Lane 1: Trackit 50 bp ladder with
bright bands at 350/800/2500 bp, Lanes 2–6: DNA
standards, Lane 7: NTC, Lane 8: DNA standard from
10 month old library prep)
Figure 4.4 Madeleine Phillips
Micro and standard MiSeq flow cells
Figure 5.11 Hirak Ranjan Dash
GlobalFiler NGS STR panel v2 data viewed with
Converge software
Figure 5.13 Adam Klavens
Erasmus SNP input for K562 prediction of phenotype
Figure 5.14 Adam Klavens
Erasmus K562 phenotype prediction
Figure 8.1 Suparna Mitra, Karin Bias in sequencing in the gut microbiome using Sanger,
454, SOLiD, and Shotgun-SOLiD sequencing methods
Förster-Fromme,
Antje DammsMachado, Tim
Scheurenbrand,
Saskia Biskup, Daniel
H. Huson, Stephan C.
Bischoff (CC 2.0)
xxv
xxvi
List of Credits
Figure
Credit Line
Figure 8.2
Suparna Mitra, Karin Normalized comparison between 16S samples obtained
using three technologies: “Sanger,” “16S-454,” and
Förster-Fromme,
“16S-SOLiD” datasets. Normalized comparison result
Antje Dammsobtained using MEGAN for “Sanger”-dataset (blue),
Machado, Tim
“16S-454” dataset (cyan) and “16S-SOLiD” dataset
Scheurenbrand,
Saskia Biskup, Daniel (magenta) without considering “No hits” node. The
H. Huson, Stephan C. tree is collapsed at “family” level of NCBI taxonomy.
Circles are scaled logarithmically to indicate the
Bischoff (CC 2.0)
number of summarized reads.
Caption
List of Abbreviations
A
AAFS
AFDIL
aiSNP
ALFRED
AMEL
AMP
AN
APS
AQME
aSTR
AT
BGA
bp
C
cDNA
CE
CODIS
CNV
CpG
CRS
CSF1PO
ddNTP
DNA
dNTP
dsDNA
EDTA
ELISA
ENFSI
ESS
FBI
FEPAC
FGA
FTA
Adenine
American Academy of Forensic Sciences
Armed Forces DNA Identification Laboratory
Ancestry informative single nucleotide polymorphism
ALlele frequency database
Amelogenin
Adenosine monophosphate
Allele number
Adenosine-5´-phosphosulfate
AFDIL-QIAGEN mtDNA Expert
Autosomal short tandem repeat
Analytical threshold
Biogeographical ancestry
Base pairs
Cytosine
Complementary DNA
Capillary electrophoresis
Combined DNA Index System
Copy number variation
Cytosine phosphate guanine
Cambridge Reference Sequence
c-fms proto-oncogene for CSF-1 receptor gene
2ʹ, 3ʹ-dideoxyribonucleotide triphosphate
Deoxyribonucleic acid
2ʹ-deoxyribonucleotide triphosphate
Double-stranded DNA
Ethylene diamine tetraacetic acid
Enzyme-linked immunosorbent assay
European Network of Forensic Science Institutes
European Standard Set
Federal Bureau of Investigation
Forensic Science Education Programs Accreditation
Commission
Alpha fibrinogen gene
Flinders Technology Associates
xxvii
xxviii
G
GATK
GI
GITAD
HID
HIPPA
HipSTR
HMP
HRM
HSC
HV
ID
IGV
iiSNP
INDEL
ISFET
ISP
IT
LDA
LR
miRNA
MPS
mRNA
mtDNA
MUSCLE
MyFLq
NCBI
NDIS
NGC
NGS
NIH
NIST
NTC
OF
ORF
OSAC
OTU
PCA
PCI or PCIA
PCR
PE
List of Abbreviations
Guanine
Genome analysis toolkit
Gastrointestinal
Ibero American Scientific Working Group on DNA
Analysis
Human identification
Health Insurance Portability and Accountability Act of 1996
Haplotype inference and phasing for short tandem repeats
Human microbiome project
High resolution melt
Human sequencing control
Highly/hyper variable region
Identity
Integrative genomics viewer
Identity informative single nucleotide polymorphism
Insertion/deletion
Ion-sensitive field effect transistor
Ion sphere particles
Interpretation threshold
Linear discriminant analysis
Likelihood ratio
MicroRNA
Massively parallel sequencing
Messenger RNA
Mitochrondrial DNA
MUltiple sequence comparison by log-expectation
My-forensic-loci-queries
National Center for Biotechnology Information
National DNA Index System
Next generation sequencing confirmation
Next generation sequencing
National Institutes of Health
National Institute of Standards and Technology
No template control
Off-ladder alleles
Open reading frame
Organization of Scientific Area Committee
Operational taxonomic unit
Principal component analysis
Phenol-chloroform-isoamyl alcohol
Polymerase chain reaction
Probability of exclusion
List of Abbreviations
PGM
PHR
PIRANHA
PLS
PMI
PNL
POP
PPE
PPI
pSNP
Q
QA
QC
OTUs
rCRS
RFID
RFLP
RFU
RMNE
RMP
RNA
rRNA
RSB
RT-PCR
RUO
SAM
SAP
SARS-CoV-2
SBS
SIDS
SMRT
SNP
snRNA
SOLiD
SOP
SRM
ssDNA
SSR
STR
SWG
SWGDAM
Personalized genomic machine
Peak height ratio
Programme d’interprétation résultats d’analyses NGS
hautement amélioré
Partial least squares
Post-mortem interval
Pooled, normalized libraries
Performance-optimized polymer
Personal protective equipment
Pyrophosphate
Phenotype single nucleotide polymorphism
Quality score
Quality assurance
Quality control
Operational taxonomic units
Revised Cambridge reference sequence
Radiofrequency identification
Restriction Fragment Length Polymorphism
Relative fluorescence units
Random man not excluded
Random match probability
Ribonucleic acid
Ribosomal RNA
Resuspension buffer
Reverse transcriptase-polymerase chain reaction
Research use only
Sequence alignment map
Shrimp alkaline phosphatase
Severe acute respiratory syndrome coronavirus 2
Sequencing by synthesis
Sudden infant death syndrome
Single molecule, real-time
Single nucleotide polymorphism
Small nuclear RNA
Sequencing by oligonucleotide ligation and detection
Standard operating procedure
Standard reference material
Single-stranded DNA
Simple sequence repeat
Short tandem repeat
Scientific Working Group
Scientific Working Group on DNA Analysis Methods
xxix
xxx
T
tDMR
TH01
TPOX
TWG
UAS
VCF
VNTR
VPN
VWA
WGA
WGS
WMS
List of Abbreviations
Thymine
Tissue-specific differentially methylated regions
Tyrosine hydroxylase 1 gene
Thyroid peroxidase gene
Technical Working Group
Universal Analysis System
Variable call format
Variable number tandem repeat
Virtual private network
von Willebrand factor gene
Whole genome amplification
Whole genome shotgun
Whole metagenome shotgun
History of DNA-Based
Human Identification
in Forensic Science
1
1.1 Introduction
Forensic biology is the application of serology and DNA typing methods for
human, wildlife, pet, and plant identification using bone, teeth, hair, body
fluids, and plant materials to help solve a crime. All of these materials, indeed
all cellular material with the exception of red blood cells, contain deoxyribonucleic acid (DNA), the chemical whose double-stranded helical structure
was elucidated in 1953 (Watson and Crick 1953, Franklin and Gosling 1953,
Wilkins et al. 1953). Advances in forensic biology have resulted in tremendous capabilities for human identification and have reduced the reliance on
and need for eyewitness accounts of crimes. A brief history of some of the
notable advances in forensic biology is listed in Table 1.1.
1.2 Application of DNA Sequencing to Human DNA
The human genome sequence was reported in 2001 (International Human
Genome Consortium 2001, Venter et al. 2001) and is comprised of DNA
housed in the nucleus. This huge advance built on principles and technology
that led to the sequencing of the bacteriophage phi X174 in 1977 (Sanger et al.
1977) and mitochondrial organelle chromosome in 1981 (Anderson et al.
1981). Whereas the human mitochondrial chromosome is circular in structure and is comprised of 16,569 base pairs (bp), the nuclear, or autosomal,
genome consists of 3.2 billion bp packaged in twenty-two pairs of linear chromosomes supercoiled on histone proteins (Anderson et al. 1981, International
Human Genome Consortium 2001, Venter et al. 2001). An additional set of
chromosomes, X and Y, are sex chromosomes that make the total number
of chromosomes in humans forty-six. Females have two X chromosomes,
while males are characterized by having an X and a Y chromosome. Most
of this book is focused on autosomal DNA typing and chapter 7 is focused
on mitochondrial DNA typing. Over several years, the sequences of bacteria
that have been used as foodborne pathogens and bioterror agents and others
found in and on the human body have also been sequenced and used to solve
questions in forensic cases; this is the focus of Chapter 8.
DOI: 10.4324/9781003196464-1
1
2
Next Generation Sequencing in Forensic Science
Table 1.1
Brief History of Some Notable Advances in Forensic Biology
Year
Advance
1953
Rosalind Franklin records X-ray autoradiographs of crystallized DNA fibers and
deduces basic features including that the structure was helical with the phosphates
on the outside and its basic dimensions of DNA strands and publishes with
Raymond Gosling in Nature
James Watson and Francis Crick solve three-dimensional structure of DNA from
Franklin’s X-ray crystallography data and publish it in Nature
Frederick Sanger invents a method for DNA sequencing
Anderson and team sequence human mitochondrial chromosome
Kary Mullis invents polymerase chain reaction method as reported in Science
Alec Jeffreys develops the first multi-locus DNA typing method and publishes it in
Nature
Human leukocyte antigen HLA-DQα multi-allelic locus was published for forensic
DNA typing and used in the first US criminal case
First application of Jeffrey’s method to rape and murder cases in Leicestershire, England
FBI establishes the Combined DNA Index System (CODIS) to allow for national
DNA comparisons
Promega introduces first commercial three loci STR typing kit targeting CSF1PO,
TPOX, and TH01, named “CTT” using the first letter of each locus
In Tennessee v. Ware, mitochondrial DNA typing was admitted for the first time in a
US court.
Applied Biosystems introduces the three-dye AmpFlSTR® Profiler Plus® PCR
Amplification Kit for typing nine STRs and Amelogenin
Applied Biosystems introduces the three-dye AmpFlSTR® COfiler® PCR
Amplification Kit for typing six STRs and Amelogenin
Promega introduces PowerPlex™ 16 System, the first commercial STR typing kit that
targeted all thirteen CODIS loci as well as the ENFSI loci, Interpol loci, and GITAD
loci in one PCR reaction
Applied Biosystems introduces five-dye AmpFLSTR™ Identifiler™ PCR Amplification
Kit targeting sixteen STR loci
ThermoFisher introduces GlobalFiler™, the first twenty-four-plex, six-dye STR kit
Promega introduces the PowerPlex® Fusion 24-locus STR DNA typing system
Promega introduces PowerSeq™ 46GY kit
Qiagen introduces the Investigator 24plex GO! and DS kits that target the CODIS
and ESS loci
Applied Biosystems introduces Precision ID NGS library prep kit
Verogen introduces NGS ForenSeq™ library prep kit for forensic applications
Precision ID and ForenSeq™ data approved for inclusion in CODIS
First application of NGS in a Dutch criminal case.
1953
1977
1981
1983
1985
1986
1986
1990
1994
1996
1997
1998
2000
2001
2014
2014
2016
2017
2018
2017
2019
2019
1.3 History of DNA Typing
Analysis of the DNA sequence and repeat polymorphisms is referred to as
DNA typing, or DNA profiling. DNA typing is used to determine the origin of forensic evidence and is applied to criminal, paternity, and missing
History of DNA-Based Human Identification
3
persons cases. When forensic DNA typing was pioneered and adopted in the
1980s, relatively small segments of the genome were probed and used to differentiate the origin of samples (Gill et al. 1985). Even without sequencing the
entire human genome, these DNA typing methods were shown to provide
high statistical confidence that a stain or sample can be assigned as originating from a specific individual (Butler 2005). However, as sequencing capabilities and tools have improved, they are now being applied to forensic cases.
DNA sequencing can overcome limitations of the now standard DNA typing
methods. For example, monozygotic twins and body fluids can now be differentiated genetically. In addition, new, so-called next generation sequencing
(NGS) methods can be used to determine the biogeographical ancestry and
phenotype characteristics including eye color, skin tone, and hair color from
fragments of human remains and trace body fluid or fingerprint sources.
It has been estimated that 30% of the human genome is comprised of
repeated segments. The origins of forensic DNA typing began with restriction
fragment length polymorphisms (RFLPs). The RFLP targets were sites with
a variable number of tandem repeats (VNTRs). These sites are also known
as minisatellite sequences and are comprised of long repeats of ten to one
hundred nucleotide bases consisting of thousands of bases in total (Butler
2005). The first RFLP DNA test was developed by Sir Alec Jeffreys in 1984 and
published in the journal Nature in 1985 (Gill et al. 1985). The test involved
analysis of patterns from multiple RFLP loci. In RFLP, restriction enzymes
were used to cut the DNA repeat region, and gel electrophoresis was used
for separation and sizing. Following electrophoresis, the DNA was chemically denatured to separate the strands. The fragments were transferred to a
nylon membrane and analyzed using a Southern Blot. The nylon membrane
was serially treated with individual radioactive probes containing an oligonucleotide complementary to the target RFLP sequence. After the sequence
hybridized with the target, the excess probes were washed off, the membrane
was developed on an X-ray film, and bands appeared where the radioactive
probe hybridized. The length of the VNTR was determined using a ladder
consisting of DNA fragments of known lengths. DNA typing was a slow process. Analysis of a sample could take six to eight weeks. The use of radioactive probes for detection exposed analysts to radioactivity that added up
over time. RFLP required relatively large quantities of high molecular weight
and intact double-stranded DNA, so it was useless with trace, damaged, or
degraded DNA samples. Finally, the gel bands were categorized into bins, or
size groupings, rather than discrete base pair fragment sizes.
RFLP analysis was first employed in casework in 1986 in an investigation
of the rape and murder of two girls in England (Butler 2005). Interestingly,
the initial test results exonerated an innocent suspect. Additional testing led
to the identification of the perpetrator, Colin Pitchfork. A few years later,
a seven-probe RFLP assay was used to analyze DNA extracted from body
4
Next Generation Sequencing in Forensic Science
fluids on intern Monica Lewinsky’s blue dress in a 1993 sexual case involving
then-U.S. President Bill Clinton (Butler 2005).
After the polymerase chain reaction (PCR) was invented by Kary Mullis
in 1983 and published in Science in 1985 (Saiki et al. 1985), it became possible to amplify DNA targets prior to RFLP typing, although the length of
the target RFLP sequences was not very conducive to PCR. Henry Erlich and
colleagues developed the first PCR-based forensic, HLA DQα (DQA1), test in
1986 (Erlich et al. 1986), and it was used in a civil court case that year. In the
case People v. Pestinikas, forensic scientist Edward Blake used a PCR-based
DNA test to confirm that different autopsy samples originated from the same
person. This was the first use of PCR-based DNA testing in the United States.
Short tandem repeats (STRs) are repeated elements in the nuclear genome.
STRs, or simple sequence repeats (SSRs), are VNTR microsatellites with short
repeats of only two to six base pairs. Even if the sequence is repeated tens of
times, the overall length is much more amenable to PCR. Additionally, STRs
are common in the genome, occurring approximately once per 10,000 bases
(Butler 2005). STRs were first used for human identification for law enforcement and forensic purposes in 1992 after Thomas Caskey, professor at Baylor
University in Texas, and colleagues published the first paper suggesting STRs
for forensic DNA analysis in 1991 (Edwards et al. 1991). Although small gels
were initially used to analyze STRs, large sequencing gels were eventually
employed to enable better differentiation, and most STR separation today is
performed using capillary electrophoresis (CE). Large polyacrylamide slab
gels and capillary electrophoresis enable discrete sizing to a single base pair
ending an important limitation of RFLP analysis.
Beginning in the 1990s, STR PCR primers were multiplexed so that
they could be used to probe multiple STR loci simultaneously. Promega
Corporation and Perkin-Elmer Corporation, in collaboration with Roche
Molecular Systems, independently developed commercial kits for forensic
DNA STR typing. The first commercially available multi-locus STR kit was
the triplex “CTT” kit from Promega introduced in 1994 that enabled the
determination of the number of repeats at the CSF1PO, TPOX, and TH01 loci
using silver staining. Additional kits followed, each increasing the number
of sites and/or varying the loci for identification. Applied Biosystems introduced the three-dye AmpFlSTR® Profiler Plus® PCR Amplification Kit for
typing nine STRs and the amelogenin marker for sex determination in 1997
and the COfiler® PCR Amplification Kit in 1998. In these kits, the PCR primers were labeled with a fluorescent dye that enables amplicon detection and
sizing with CE or a large slab gel. While STRs have been primarily used for
DNA typing, biallelic loci, such as those located on the X and Y sex chromosomes, can also be employed for DNA typing and are used for sex typing via
the amelogenin gene (Elkins 2013).
History of DNA-Based Human Identification
5
To enable interagency use of DNA typing data to solve criminal cases, the
United States’ Combined DNA Index System (CODIS) database was piloted
by the FBI Laboratory in 1990, and the National DNA Index System (NDIS)
was established by the DNA Identification Act of 1994 for law enforcement
purposes. CODIS also includes a mitochondrial DNA (mtDNA) database. By
1998, all of the states in the United States enacted statutes requiring mandatory DNA testing for convicted felons. Promega introduced the PowerPlex™
16 System in 2000. It was the first commercial STR typing kit that targeted
all thirteen CODIS loci as well as the European Network of Forensic Science
Institutes (ENFSI) loci, Interpol loci, and the Ibero American Scientific
Working Group on DNA Analysis (GITAD) loci in one PCR reaction.
Beginning January 1, 2017, the number of loci required to upload an
STR profile to CODIS increased from thirteen to twenty loci (Hares 2015).
The current CODIS loci are CSF1PO, D3S1358, D5S818, D7S820, D8S1179,
D13S317, D16S539, D18S51, D21S11, FGA, TH01, TPOX, vWA, D1S1656,
D2S441, D2S1338, D10S1248, D12S391, D19S433, and D22S1045 and are
listed in Table 1.2. Other nations and unions independently adopted their
own sets of STR loci for forensic use. This is also shown in Table 1.2. The
ThermoFisher GlobalFiler™ kit was introduced in 2014. The GlobalFiler™ kit
includes PCR primers labeled with one of six dyes to amplify twenty-four loci
simultaneously and covers all of the loci now required by the United States,
the United Kingdom, and in the European Standard Set (ESS) (Table 1.1).
The GlobalFiler™ Kit loci are D13S317, D7S820, D5S818, CSF1PO, D1S1656,
D12S391, D2S441, D10S1248, D18S51, FGA, D21S11, D8S1179, vWA, D16S539,
TH01, D3S1358, D2S1338, D19S433, DYS391, TPOX, D22S1045, SE33, amelogenin, and a Y-specific insertion/deletion locus (Yindel). Similarly, the
Promega PowerPlex® Fusion System 24-locus multiplex kit introduced in
2014 targets the thirteen core CODIS loci and twelve core ESS loci, amelogenin, and the DYS391 locus. There are now five and six dye Fusion system options. The Y-STRs and loci are used in commercial kits for detection
of a male contributor or tracing male lineages. The single Y chromosome
makes interpretation easier than the diploid autosomal loci. X- and Y-loci
can be used in paternity and maternity testing and familial lineage analysis. In 2017, Qiagen introduced the Investigator 24plex GO! Kit and amplifies the CODIS core loci, the ESS markers, plus SE33, D2S1338, D19S433,
amelogenin, and DYS391. By March 2018, NDIS contained more than thirteen million offender profiles and more than three million arrestee profiles
and 840,000 forensic profiles. The FBI reported CODIS reached 20 million
forensic DNA profiles on April 21, 2021 including over 14 million offender
profiles, four million arrestee profiles, and over one million forensic profiles
in NDIS. Over 558,000 CODIS hits have been reported and CODIS has aided
in over 545,000 investigations.
6
Next Generation Sequencing in Forensic Science
Table 1.2
Core STR Loci for Several Countries and Unions
STR
Locus
United States
(from 1997)
CSF1PO
FGA
TH01
TPOX
VWA
D3S1358
D5S818
D7S820
D8S1179
D13S317
D16S539
D18S51
D21S11
AMEL
D1S1656
D2S441
D2S1338
D10S1248
D12S391
D19S433
D22S1045
SE33
a
X
X
X
X
X
X
X
X
X
X
X
X
X
X
United States
(from
1-1-2017)
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Germany
European
Standard
Set
Interpol
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
a
a
United
Kingdom
X
X
X
X
X
X
X
X
Also included/optional.
CE is regarded as the standard for STR DNA typing. The majority of
cases can be solved using CE. It is cost-effective, high-throughput, and offers
a fast turnaround. The workflow has been validated and implemented worldwide and is accepted in court. Analysts can obtain a profile for high-profile
cases in a few hours, and traditional DNA typing using CE is much lower in
cost than emerging methods. There are now eight dye systems that enable the
typing of up to thirty STRs simultaneously.
Even with all of the innovations and improvements in the STR typing
kits over the last twenty-five years, there are still drawbacks. For example,
the major drawback of using electrophoresis-based methods to separate the
PCR products in a single run is that the number of loci that can be multiplexed is limited by the size of the amplicons and the number of dye labels
History of DNA-Based Human Identification
7
the fluorimeter detection system can deconvolute. Another drawback is that
fragment sizing does not determine the sequence of the STR repeats, and
sequence mutations or variations may aid in the differentiation of samples.
More troubling, some experts estimate that 30% of samples do not produce a
full STR profile with this method.
Single nucleotide polymorphisms (SNPs) can also be used in human
identification. These biallelic marker sites are single-base differences documented to vary between individuals. SNPs are very common throughout
the human genome, occurring almost one in every thousand nucleotides.
SNPs have been used to predict phenotypic characteristics such as eye color,
skin tone, hair color, and biogeographical, ancestral, and behavioral characteristics. An advantage of SNPs is that, owing to their small size, amplifiable DNA fragments containing SNPs are often intact even when STRs sites
are not making them ideal for use with degraded DNA. However, they are
much less polymorphic than STR loci so more loci are needed to achieve
the same discriminating power as the current number of STRs that are used
for identification. Additionally, SNPs must be determined using sequencing
or sequence-based methods, techniques which permit the determination of
each individual DNA base. Sanger sequencing is expensive and slow on the
CE instruments that are used for STR sizing. SNaPshot assays have also been
developed for SNP determination, but the assays are limited to ten targets per
assay and are time consuming to develop and optimize. Figure 1.1 diagrams
STR and SNP loci.
(a) rs12913832 eye color SNP on chromosome 15 position 28120472
Brown: …GCATTAAATGTCA…
Blue: …GCATTAAGTGTCA…
(b) DYS19 STR (NCBI Accession number X77751) with 12 repeats
TAGGTAGATAGATAGATAGGTAGATAGATAGATAGATAGATAGATAGATAGATAGATATA
12 repeats
TAGGTAGATAGATAGATAGGTAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATA
13 repeats
Figure 1.1 Comparison of a single nucleotide polymorphism and short tandem
repeat. (a) rs12913832 is an SNP near the OCA2 gene that has been shown to be
linked to blue (G,G) or brown (A,A or A,G) eye color and is used in phenotype
prediction assays. (b) DYS19 is an STR (NCBI Accession number X77751) with a
variable number of repeats.
8
Next Generation Sequencing in Forensic Science
1.4 Next Generation Sequencing for Forensic DNA Typing
Massively parallel sequencing (MPS), or next generation sequencing (NGS)
as it is widely known, was introduced in the late 1990s by Jonathan Rothberg,
and the first commercial instrument was made available in 2005 by 454 Life
Sciences (Patrick 2007, Minogue et al. introduced in 2017 by Verogen 2019).
NGS methods are faster and higher throughput than Sanger sequencing.
While previous DNA typing kits – even the newest kits like GlobalFiler™ and
Fusion™ – are limited to twenty-four STR loci, the NGS kits can multiplex more
primer pairs as the sequencing method is not size- or dye-limited. NGS-based
methods are able to multiplex more STRs and simultaneously type many SNP
and X- and Y-haplotype DNA markers in a single assay. For example, the
ForenSeqTM kit introduced in 2017 by Verogen types fifty-eight STRs (including
twenty-seven autosomal STRs and seven X and twenty-four Y haplotype
markers). In addition to typing STR and SNP markers for human identification, SNP markers for determining biogeographical ancestry and phenotype
features are determined simultaneously. For example, the ForenSeq™ kit multiplexes a total of 231 loci including ninety-four identity-informative SNPs,
fifty-six ancestry-informative SNPs, twenty-two phenotypic-informative
SNPs (two ancestry-informative SNPs are used for ancestry and phenotype
prediction), and the fifty-eight STRs (Jäger et al. 2017). In 2018, Applied
Biosystems introduced the Precision ID NGS kit, and Promega introduced the
PowerSeq™ 46GY kit in 2016 that types forty-six loci (van der Gaag et al. 2016).
Even a partial NGS profile may lead to more genetic data than was possible
using the best CE kits. The NGS amplicons are smaller, which can lead to more
genetic data for DNA recovered from degraded and compromised samples.
In 2019, the Precision ID and ForenSeq™ data were approved for inclusion in
CODIS (Verogen). In NGS, the STR repeat sequences and counts can be compared so that allele subtypes and repeat structure variations can be detected.
Isometric heterozygote sequence motifs within STRs and known SNPs in
flanking regions can be detected using NGS but not CE. Instead of reporting
fluorescence intensity as in CE, the number of reads is outputted in NGS data
as shown in Figure 1.2. NGS can be used to type mtDNA and mRNA as well
which will be described in Chapters 7 and 9, respectively.
The Scientific Working Groups (SWGs) and Organizational Scientific
Area Committees (OSACs) that developed standards for DNA typing updated
their documents to include NGS data. The Scientific Working Group on DNA
Analysis Methods (SWGDAM) released an addendum to its 2017 interpretation document entitled “SWGDAM Interpretation Guidelines for Autosomal
STR Typing by Forensic DNA Testing Laboratories” to address NGS needs
in 2019. The US Federal Bureau of Investigation (FBI) published its “Quality
Assurance Standards for DNA Databasing Laboratories.”
History of DNA-Based Human Identification
9
Figure 1.2 Data collected using (a) a modern six-dye kit using CE and (b) NGS.
(Courtesy of Adam Klavens.)
10
Next Generation Sequencing in Forensic Science
Figure 1.3 Decreasing cost per megabase of DNA sequence over time. (https://
search.creativecommons.org/photos/a2a51821-36b0-4432-9a8b-0e86317fca5c.)
1.5 Conclusion
The cost per megabase of DNA sequence has decreased with NGS approaches
so that its application for casework is feasible (Figure 1.3). The chapters that follow detail the chemistries used in the various NGS approaches and detail the
implementation of NGS tools and analysis methods for forensic DNA typing.
The later chapters review NGS applications for mitochondrial DNA typing,
body fluids analysis, and bacterial sequencing for forensic applications. NGS
offers forensic biologists a myriad new capabilities, but there are still issues to be
resolved (Kircher and Kelso 2010, Minogue et al. 2019). The final chapter focuses
on the remaining challenges for introducing NGS to more labs and cases.
Questions
1. Define forensic biology.
2. Historically, how did forensic biology develop? Name innovations
that changed the field.
History of DNA-Based Human Identification
11
3. What precautions must be taken when working with biological evidence to avoid contamination and infection?
4. What is the role of the OSACs in forensic biology?
5. Why have different regions and countries adopted different core STR
loci?
References
Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin,
J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J.,
Staden, R., and I.G. Young. “Sequence and organization of the human mitochondrial genome.” Nature 290, no. 9 (April 9, 1981): 457–465. doi:10.1038/290457a0.
Butler, J. Forensic DNA Typing, 2nd ed. Burlington, MA: Elsevier Academic Press,
2005.
Edwards, A., Civitello, A., Hammond, H.A., and C.T. Caskey. “DNA typing and
genetic mapping with trimeric and tetrameric tandem repeats.” American
Journal of Human Genetics 49, no. 4 (September 30, 1991): 746–756.
Elkins, K.M. Forensic DNA Biology: A Laboratory Manual. Waltham, MA: Elsevier
Academic Press, 2013.
Erlich, H., Lee, J.S., Petersen, J.W., Bugawan, T., and R. DeMars. “Molecular analysis
of HLA class I and class II antigen loss mutants reveals a homozygous deletion of the DR, DQ, and part of the DP region: Implications for class II gene
order.” Human Immunology 16, no. 2 (June 1986): 205–219. doi:10.1016/01988859(86)90049-2.
FBI. “Quality Assurance Standards for DNA Databasing Laboratories.” Accessed
January 23, 2021. https://www.f bi.gov/file-repository/quality-assurancestandards-for-dna-databasing-laboratories.pdf/view.
Franklin, R.E., and R.G. Gosling. “Molecular configuration in sodium thymonucleate.” Nature 171, no. 4356 (April 25, 1953): 740–741. doi:10.1038/171740a0.
Gill, P., Jeffreys, A., and D. Werrett. “Forensic application of DNA ‘fingerprints’.”
Nature 318, no. 6046 (December 12–18, 1985): 577–579. doi:10.1038/318577a0.
Hares, D.R. “Selection and implementation of expanded CODIS core loci in the
United States.” Forensic Science International Genetics 17 (July 1, 2015): 33–34.
doi:10.1016/j.fsigen.2015.03.006.
Jäger, A.C., Alvarez, M.L., Davis, C.P., Guzmán, E., Han, Y., Way, L., Walichiewicz,
P., Silva, D., Pham, N., Caves, G., Bruand, J., Schlesinger, F., Pond, S.J.K.,
Varlaro, J., Stephens, K.M., and C.L. Holt. “Developmental validation of the
MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.” Forensic Science
International: Genetics 28 (May 2017): 52–70. doi:10.1016/j.fsigen.2017.01.011.
International Human Genome Consortium. “Initial sequencing and analysis
of the human genome.” Nature 409, no. 6922 (February 15, 2001): 860–921.
doi:10.1038/35057062.
Kircher, M., and J. Kelso. “High-throughput DNA sequencing-concepts and limitations.” BioEssays 2, no. 6 (May 18, 2010): 524–536. doi:10.1002/bies.200900181.
12
Next Generation Sequencing in Forensic Science
Minogue, T.D., Koehler, J.W., Stefan, C.P., and T.A. Conrad. “Next-generation
sequencing for biodefense: Biothreat detection, forensics, and the clinic.”
Clinical Chemistry 65, no. 3 (March 1, 2019): 383–392.
Patrick, K.L. “454 life sciences: Illuminating the future of genome sequencing
and personalized medicine.” Yale Journal of Biology and Medicine 80, no. 4
(December 2007): 191–194.
Saiki, R., Scharf, S., Faloona, F., Mullis, K., Horn, G., Erlich, H., and N. Arnheim.
“Enzymatic amplification of beta-globin genomic sequences and restriction site
analysis for diagnosis of sickle cell anemia.” Science 230, no. 4732 (December
20, 1985): 1350–1354. doi:10.1126/science.2999980.
Sanger, F., Air, G. M., Barrell, B. G., Brown, N. L., Coulson, A.R., Fiddes, C.A.,
Hutchison, C. A., Slocombe, P. M., and M. Smith. “Nucleotide sequence of bacteriophage phi X174 DNA.” Nature 265, no. 5596 (February 24, 1977): 687–695.
doi:10.1038/265687a0.
Scientific Working Group on DNA Analysis Methods. “Interpretation Guidelines
for Autosomal STR Typing by Forensic DNA Testing Laboratories.” Approved
January 12, 2017. Accessed January 23, 2021. https://1ecb9588-ea6f-4feb-971a73265dbf079c.filesusr.com/ugd/4344b0_50e2749756a242528e6285a5bb478
f4c.pdf.
Scientific Working Group on DNA Analysis Methods. “Addendum to ‘SWGDAM
Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing
Laboratories’ to Address Next Generation Sequencing.” Approved April 23, 2019.
Accessed January 23, 2021. https://1ecb9588-ea6f-4feb-971a-73265dbf079c.file
susr.com/ugd/4344b0_91f2b89538844575a9f51867def7be85.pdf.
van der Gaag, K.J., de Leeuw, R.H., Hoogenboom, J., Patel, J., Storts, D.R., Laros, J.,
and P. de Knijff. “Massively parallel sequencing of short tandem repeats-Population data and mixture analysis results for the PowerSeq™ system.” Forensic
Science International: Genetics 24 (September 2016): 86–96. doi:10.1016/j.
fsigen.2016.05.016.
Venter, C.J., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith,
H.O., Yandell, M., Evans, C.A., Holt, R.A., and J.D. Gocayne. “The sequence
of the human genome.” Science 291, no. 5507 (February 16, 2001):1304–1351.
doi:10.1126/science.1058040.
Verogen. “FBI Approves Verogen’s Next-Gen Forensic DNA Technology for National
DNA Index System (NDIS).” 2019. Accessed November 7, 2020. https://verogen.
com/ndis-approval-of-miseq-fgx/.
Watson, J.D., and F.H.C. Crick. “Molecular structure of nucleic acids: A structure
for deoxyribose nucleic acid.” Nature 171, no. 4356 (April 25, 1953): 737–738.
doi:10.1038/171737a0.
Wilkins, M.H.F., Stokes, A.R., and H. R. Wilson. “Molecular structure of deoxypentose nucleic acids.” Nature 171, no. 4356 (April 25, 1953): 738–740. doi:10.1038/
171738a0.
History of Sequencing
for Human DNA Typing
2
2.1 Introduction
DNA sequencing has a more than forty-year history which began with
Sanger sequencing and has evolved to include newer innovations including
SNaPshot assays and next generation sequencing (NGS) methods, including
pyrosequencing and massively parallel sequencing. The focus of this chapter
is the history and innovations in DNA sequencing and applications to forensic science.
2.2 Common Chemistries Used in
Sequencing Applications
Most sequencing chemistries utilize variations of DNA replication found
routinely in cells and exploited for in vitro sequencing, such as polymerase
chain reaction (PCR). In order for replication to occur, a short oligonucleotide called a primer must anneal to single-stranded template DNA.
DNA polymerases recognize the double-stranded DNA and bind to this
double-stranded complex, and hydrogen bonding allows the nucleotide to
be newly added to enter the active site in close proximity to the 5ʹ end of the
priming strand. The 3ʹ hydroxyl of the terminal nucleotide initiates nucleophilic attack at the 5ʹ alpha phosphate of the nucleotide to be added resulting in the addition of the nucleotide to the daughter strand. The remaining
products in this reaction are pyrophosphate and a proton (Figure 2.1). All
of the sequencing strategies that are outlined in this chapter utilize either a
fluorescent tag bound to the terminal nucleotide in the elongating daughter strand or the pyrophosphate or proton by-products for the detection of
nucleotide incorporation.
2.2.1 Chain Termination Sequencing
Introduced in 1977 (Sanger et al. 1977, Zascavage et al. 2013, Heather and
Chain 2016, Bruijns et al. 2018), chain termination sequencing can be used
to sequence DNA segments of approximately 800–1000 base pairs, although
500 base pairs or less tend to yield the best results. Prior to sequencing,
DOI: 10.4324/9781003196464-2
13
14
Next Generation Sequencing in Forensic Science
Figure 2.1 DNA synthesis with growing DNA template and pyrophosphate
byproduct. (Michał Sobkowski, CC BY 3.0 license.)
extracted DNA is amplified using PCR primers targeting specific loci using a
master mix consisting of a high fidelity DNA polymerase, buffer, magnesium,
dNTPs, and a 0.1–0.003 X concentration of 2ʹ, 3ʹ-dideoxyribonucleotide triphosphates (ddNTPs). Typically, each of the ddNTPs – ddATP, ddCTP, ddGTP,
and ddTTP, which add adenine, cytosine, guanine and thymine nucleotide
bases, respectively, – is labeled with a unique fluorescent dye to allow for the
identification of the final nucleotide added. The template is extended using
the dNTPs and chain-terminating ddNTPs are incorporated at random intervals into the growing nucleotide chain. The sequence is determined following
synthesis by separation of the labeled fragments by size using slab or capillary
electrophoresis, which has single base pair resolution. This allows for the reading of the sequence of the daughter strand. The process is, therefore, referred to
as sequencing by synthesis (SBS) since the sequence is determined by synthesizing the daughter strand (Muzzey et al. 2015) (Figure 2.2).
2.2.2 Pyrosequencing
Pyrosequencing, developed in 1993 as a solid-phase sequencing method, can
process lengths up to 300–500 nucleotides, but typically cannot sequence
strands as long as can be sequenced by Sanger sequencing. The extracted
DNA is digested to ~100 bp fragments, denatured to form single-stranded
History of Sequencing for Human DNA Typing
15
Figure 2.2 Comparison of Sanger sequencing and NGS processes. (Dale Muzzey,
Eric A. Evans, Caroline Lieber, CC BY 4.0 license. https://www.ncbi.nlm.nih.
gov/pmc/articles/PMC4633438/figure/Fig1/.)
16
Next Generation Sequencing in Forensic Science
DNA (ssDNA), and attached to the surface of beads using adaptors or linkers;
or the region of interest can be amplified using PCR, and the single-stranded
products are then annealed to the beads. In the pyrosequencing reaction, the
dNTPs are dispensed in a user-defined order and those incorporated result
in the release of pyrophosphate (PPi). In an enzyme cascade, sulfurylase uses
the released pyrophosphate in the presence of adenosine-5ʹ-phosphosulfate
(APS) to generate ATP in equimolar concentration. Luciferase in the presence of ATP converts luciferin to oxyluciferin to generate light proportional
to the number of nucleotides added in that particular addition step. Apyrase
is used to degrade unincorporated nucleotides and allow the process to reset
prior to the next dispensation of nucleotides (Nyren et al. 1993, Ronaghi et al.
1996, Bruijns et al. 2018).
In order to perform de novo synthesis, repeated rounds of dispensing
all four nucleotides must be performed, making pyrosequencing relatively
inefficient for de novo sequencing long stretches of DNA. The dispensation
of nucleotides in pyrosequencing can be defined by the user in order to
maximize efficiency. Pyrosequencing can be a cost-effective way to confirm
sequence or check for site mutations in a particular DNA fragment. It is also
widely used in the determination of methylation status of genes. In both of
these cases, the dispensation is optimized based on the expected sequence of
the amplicon (Figure 2.3).
2.2.3 Sequencing by Ligation
Another approach to sequencing involves sequencing by ligation (Figure 2.4).
Short probes, seven to nine nucleotides in length, are annealed to the template strand of DNA. The probes are designed in such a way that there are 16
different probes labeled with four different fluorescent tags. The 3ʹ end of each
probe has two annealing nucleotides followed by three degenerate nucleotides, a cleavage site, three additional degenerate nucleotides, and one of four
fluorescent dyes that is associated with the two 3ʹ nucleotides. Upon annealing, the probes are ligated to the daughter strand, unannealed nucleotides are
washed away, and the fluorescent tag is read. The reset to begin the next cycle
is completed by cleavage of the probe between nucleotides five and six. This
allows for the determination of the first two nucleotides and then has a gap of
three unknown nucleotides prior to the next cycle of annealing. Usually, five
to seven cycles of annealing are performed per sequencing primer. In order
to obtain a complete sequence, multiple sequencing primers are utilized in
additional rounds of sequencing. Each of the new sequencing primers is one
nucleotide shorter in length and therefore interrogates the sequence offset by
a single nucleotide. Obtaining the full sequence of twenty-five to thirty-five
nucleotides in length requires five rounds of sequencing of five to seven cycles.
History of Sequencing for Human DNA Typing
17
Figure 2.3 Overview of several DNA sequencing techniques with the principle of (a) Sanger sequencing, (b) pyrosequencing (e.g. 454), (c) em-PCR (e.g., 454,
SOLiD® and Ion Torrent™), and (d) bridge amplification/cluster PCR (e.g., Solexa).
(Brigitte Bruijns, Roald Tiggelaar, and Han Gardeniers, CC BY NC ND 4.0 license.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6282972/figure/elps6707fig-0002/?report=objectonly.)
2.3 Detection Techniques
In order to detect nucleotide incorporation and determine the identity of the
added base, one of several detection methods is used.
2.3.1 Fluorescence
Fluorescence dyes are commonly used for detection of nucleotides. Dyes
are covalently attached to bases, which sterically hinders the reaction from
18
Next Generation Sequencing in Forensic Science
Figure 2.4 Overview of several DNA sequencing techniques with the principle of
(a) sequencing by ligation (SBL, e.g., SOLiD®), (b) ion detection (e.g. Ion Torrent™),
(c) zero-mode waveguides (ZMWs, e.g., PacBio®), and (d) nanopores (e.g., Oxford
Nanopore). (Brigitte Bruijns, Roald Tiggelaar, and Han Gardeniers, CC BY NC
ND 4.0 license. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6282972/figure/
elps6707-fig-0003/?report=objectonly.)
proceeding. The dyes are excited using ~500 nm wavelength laser causing
them to emit light at emission maxima from 520 to 625 nm depending on
the dye (Butler 2005, Elkins 2013). Each dye is associated with an individual
nucleotide. All of the nucleotides are added in a single reaction, and the identification of the nucleotide is directly determined by the wavelength maxima
of the emission spectra. The detection requires the use of filters to detect,
the amount of light emitted by the fluorophores at each wavelength at each
shutter opening which is recorded using a charge-coupled device (CCD)
camera.
History of Sequencing for Human DNA Typing
19
2.3.2 Pyrosequencing
The production of pyrophosphate by the DNA polymerase in the DNA synthesis reaction is converted to blue-green light, through the enzymatic cascade shown in Figure 2.2. Detection requires the use of a CCD camera. The
camera is able to detect very low levels of light and record the amount of light
recorded at each shutter opening. The light produced is proportional to the
number of same type of nucleotides added at a time. Since the dispensation
order is determined prior to the reaction proceeding, the presence of light
indicates that the dispensed nucleotide was added; if no light is produced, no
nucleotide addition is indicated.
2.3.3 Ion Detection
Unlike the previously described optical detection systems, ion detection does
not require modification of the nucleotide or the presence of dyes and uses
only standard nucleotides. The production of an H+ by the DNA synthesis
reaction can be detected by an ion-sensitive field-effect transistor. The reaction
occurs in a well above an ion-sensitive layer which is above the ion-sensitive
field-effect transistor (ISFET) detector. As in pyrosequencing, the nucleotides
are dispensed in a specific order, and the signal associated with the addition
relates to the number of nucleotides added at a time, thereby indicating the
length of a homopolymeric stretch.
2.4 Sequencing Platforms
Each sequencing platform requires a chemistry method, detection method,
and an interpretation method. Examples of how these modules are combined
to produce different types of sequencing platforms are described below.
2.4.1 First-Generation Sequencing Techniques
First-generation technologies have the capability to sequence only one sample
at a time. The individual sample is interrogated with a particular chemistry.
Following detection, interpretation is usually simple, either by determination
of the order of a color sequence, or by a conformation of the addition of a
particular nucleotide in the dispensation.
2.4.1.1 Sanger Sequencing
Sanger sequencing was introduced in 1977 (Sanger et al. 1977). Prior to
sequencing, the extracted DNA is often amplified using PCR primers targeting specific loci or degenerate primers that randomly bind to locations in the
20
Next Generation Sequencing in Forensic Science
genome depending on the type of sequencing desired. Targeted sequencing to
determine a particular gene sequence uses specific primers. Degenerate primers
are added to amplify sequences noted to have variants in the primer binding
region. Each of the ddNTPs – ddATP, ddCTP, ddGTP, and ddTTP – is labeled
with a unique fluorescent dye. The template is extended using the dNTPs, and
chain-terminating ddNTPs are incorporated into positions, at random, into
the growing nucleotide chain. The sequence is determined following synthesis,
therefore sometimes referred to as sequencing by synthesis. Following PCR, the
amplicons are separated by size using electrophoresis, and the identity of the
bases in sequence is determined by the location of each base by the fluorescent
color detected. Initially, the sequence was determined using large slab gels and
electrophoresis. In 1995, Applied Biosystems introduced the first capillary electrophoresis instrument, the Applied Biosystems 310 Genetic Analyzer, a single
capillary instrument with four-dye fluorescence detection. For comparison,
the modern Applied Biosystems 3500 Genetic Analyzer is an eight-capillary
instrument equipped with six-dye detection. CE provides single-nucleotide
spatial resolution, good spectra resolution for dye separation, and precision in
DNA sizing. The CE separation takes approximately an hour. The CE output
is termed an electropherogram. Revolutionary for its time, Sanger sequencing enabled researchers to sequence the first genome (that of bacteriophage
phiX174) and those of many more species after it including the 3.2 billion base
pair human genome as reported in a pair of Science and Nature papers in 2001
(International Human Genome Consortium 2001, Venter et al. 2001). During
this time, Craig Venter pioneered the whole genome shotgun (WGS) sequencing approach. Following sequencing, the target is analyzed by comparison to a
known (K) standard or reference sample or a sequenced human genome standard. The price of Sanger sequencing at core facilities has fallen to approximately
$4–$10 per sample. Although capillary electrophoresis and Sanger sequencing
yield discrete single-base resolution and identification (Figure 2.5), the DNA
typing method is relatively slow and limited in read length.
2.4.1.2 SNaPShot Sequencing
SNaPshot is a chain termination minisequencing or a primer extension assay
based on a Sanger sequencing approach (Butler 2005). Primers are designed
to complement the region directly upstream of an SNP so that the SNP base is
added by the DNA polymerase in the DNA sequencing reaction. The ddNTP
bases used in the PCR reaction are dye-labeled to facilitate fluorescence detection. Poly(T) mobility tails can be added to the primers to aid in size separation
during electrophoresis. For example, primer one may have a five base poly(T)
5ʹ-tail, primer two could have a ten base poly(T) tail, primer three could have a
fifteen base poly(T) tail, and so on. The primers can be multiplexed so that multiple SNPs can be determined simultaneously. The genomic DNA is first amplified using PCR. Exonuclease I is used to digest unincorporated single-stranded
primers. The Applied Biosystems SNaPshot kit contains the ddNTPs each with
History of Sequencing for Human DNA Typing
21
Figure 2.5 A sample Sanger sequence read.
a unique dye label (A-green, G-blue, C-yellow, T-red), buffer, DNA polymerase,
and an internal standard labeled with a fifth dye in the master mix. In the
second round of PCR, the SNaPshot master mix is used to perform the primer
extension. As ddNTPs rather than a mixture of ddNTPs and dNTPs are used
in the sequencing reaction, incorporation of the base immediately stops chain
extension as no further bases can be added without the 3ʹ-hydroxyl. Following
PCR, shrimp alkaline phosphatase (SAP) is used to digest unincorporated
ddNTPs to reduce dye artifacts. Following PCR, the amplicons are resolved
by size using CE, using POP-4 (or POP-6 or POP-7) for separation, and using
the GS120 LIZ size standard for sizing. GeneMapperID is used to analyze the
results on Applied Biosystems instruments. Unlike using CE for STR DNA
typing, there is no stutter artifact in SNaPshot DNA typing.
2.4.1.3 Pyrosequencing
Pyrosequencing is a SBS method that is widely used to detect and quantify
CpG and other types of methylation sites and SNP heteroplasmy. CpG methylation occurs in the 5’ upstream untranslated region of a gene, controlling gene
expression. It is observed through bisulfite treatment, which converts unmethylated cytosines to uracils through a deamidation reaction catalyzed by sodium
bisulfite. Methylated cytosines remain unchanged. SNP heteroplasmy is the
presence of two different SNP variant in a single individual. This can occur
in the same cell or in within a population of cells. Pyrosequencing is ideally
suited to determining the ratio of two different nucleotides at a particular site.
22
Next Generation Sequencing in Forensic Science
Pyrosequencing can be used to determine the identity of one or more
SNPs, ideally in a short target region of twenty to thirty nucleotides, although
some sources suggest that it can be used for lengths up to 300–500 nucleotides. While it can target more than one base per primer unlike the SNaPshot
method, pyrosequencing cannot sequence strands as long as can be sequenced
by Sanger sequencing. The extracted DNA is digested to ~100 bp fragments
and denatured to form single-stranded DNA (ssDNA) and attached to the surface of beads using adaptors or linkers. Samples are prepared by emulsion PCR
prior to pyrosequencing using biotinylated PCR primers as follows. The DNA
is attached to microscopic beads. The beads are each dispersed into a droplet in
a water–oil emulsion where the PCR reaction occurs and amplifies the target.
In the pyrosequencing reaction, the dNTPs are added in a specific order and
those incorporated result in the release of pyrophosphate. In an enzyme cascade, sulfurylase uses the released pyrophosphate to form ATP in the presence
of adenosine monophosphate (AMP). Luciferase uses the ATP to convert luciferin to oxyluciferin in the presence of adenosine-5ʹ-phosphosulfate and generates light. The light is detected by the instrument and indicates that the base
released by the instrument was incorporated into the growing chain. Apyrase
degrades nucleotides not added to the chain. The process works well even for
tetramer homopolymeric sections of DNA. Pyrosequencing takes approximately two hours and costs approximately $10 per sample. Pyrosequencing
was purchased by Qiagen in 2008 and licensed by 454 Life Sciences, which
developed an array-based pyrosequencing instrument (discontinued in 2013).
Another pyrosequencer was the GS FLX Titanium. Qiagen’s latest PyroMark
Q48 pyrosequencer (Figure 2.6), the company’s newest pyrosequencing
instrument, can sequence forty-eight samples simultaneously.
Figure 2.6 Qiagen Q48 pyrosequencing instrument.
History of Sequencing for Human DNA Typing
23
2.5 Massively Parallel Sequencing
Massively parallel sequencing (MPS) is a type of NGS and one of the newest
sequencing methods. In MPS, millions of sequencing reads are detected on
the same chip or flow cell in parallel. NGS, or second-generation sequencing,
instruments emerged in the mid- to late 1990s and became available for commercial purchase in 2005 with the $1000 genome becoming a reality in 2014
(van Dijk et al. 2014, Heather and Chain 2016). An early application of this
chemistry was described in 2008 (Bentley et al. 2008). The earliest applications
were in the clinical life sciences for disease diagnosis. Ancestry applications
have emerged more recently. Several manufacturers, including Illumina, Life
Technologies, Complete Genomics, Helicos Biosciences, Oxford, and Pacific
Biosciences, developed different chemistry and approaches for sequencing. Life Technologies has been acquired by Applied Biosystems which was
acquired by ThermoFisher.
2.5.1 Reversible Chain Termination MPS Platforms
The Illumina sequencers include the iSeq, NanoSeq, MiSeq, HiSeq, and
the Genome Analyzer IIX. The Illumina technology employs clonal bridge
amplification to produce a high concentration of clonal DNA that is covalently attached to the flow cell. It can perform single and paired ends reads
and uses reversible dye sequencing by synthesis chemistry (Figure 2.2). Using
a CCD camera and fluorescence detection, individual nucleotide addition at
each cluster is evaluated at every cycle. Each end read can be up to 150 bases.
They vary in how many flow cells and the number of bases that can be read
in a run. All of the Illumina instruments except the HiSeq process one flow
cell per run. The HiSeq can process two flow cells in parallel and generate 600
billion bases in one run for the highest instrument cost but the lowest cost
per base of the Illumina instrument. The Verogen MiSeq FGx is built on the
Illumina MiSeq platform; the instrument has a small footprint and is shown
in Figure 2.7.
2.5.2 Ion Detection Platforms
The Applied Biosystems sequencers include the Ion Torrent, Ion S5, Ion
Proton, and Ion PGM systems and employ an approach introduced in 2010
in which a pH change is converted to a base call. As with the pyrosequencing technologies, the ion detection platforms detect the same by-product
of synthesis for each addition, in this case a proton; therefore, the order of
nucleotide dispensations must be known. The voltage change is recorded
when a proton ion is released upon addition of a nucleotide base in a process
24
Next Generation Sequencing in Forensic Science
Figure 2.7 Verogen MiSeq FGx instrument.
called semiconductor sequencing. This type of sequencing does not require
the alteration of nucleotides or the addition of an enzymatic cascade for
detection.
2.5.3 Sequencing by Ligation Platforms
Applied Biosystems SOLiD instrument employs a ligation and detection
approach introduced in 2005 with a commercial release in 2007 (Kircher
and Kelso 2010). The sequencing is performed by DNA ligase instead of
DNA polymerase. The extracted template DNA is fragmented using restriction enzymes, and P1 and P2 adaptors are added by PCR. The fragments are
annealed to clonal emulsion beads via the universal P1 adaptor.
One- or two-base encoded fluorescently labeled probes are attached to a
primer hybridized to the complementary target and are joined by DNA ligase.
Upon ligation, a fluorophore is cleaved off and the identity of the ligated probe
is determined using fluorescence emission. Four sequencing primers “n” to
“n-4” by each base so the process is termed oligonucleotide eight-mer chained
ligation chemistry. Non-ligated probes are washed away. The process continues by cleaving the probe to regenerate the 5ʹ-OH group. The maximum read
length is a relatively short twenty to forty-five bases; therefore, overlapping
regions sequenced must be ordered in silico. The sequence is constructed by
overlapping the dinucleotides. The process has a very low single base error
rate and is quite inexpensive per megabase.
History of Sequencing for Human DNA Typing
25
2.5.4 Single Base Extension Platforms
Similar to the technology used for the SNaPshot assay, single base extension
platforms, such as the Illumina iScan, use a combination of primers specific
to the SNPs of interest and fluorescently tagged dNTPs. The template DNA
undergoes whole genome amplification, followed by fragmentation prior to
annealing the template DNA to oligonucleotide primers for SNPs bound to
beads, which are specifically located on a chip. The primed template undergoes one round of extension to label the SNP with a fluorescent nucleotide
which indicates the allele at the site.
2.5.5 Third-Generation Platforms
Third-generation platforms use often use nanofluidics and are capable of
detecting sequences of DNA or RNA from a single cell in real time. These
platforms often use the same sequencing chemistries described for the second (next) generation platforms; however, the scaling is much smaller.
The Oxford Nanopore instruments including the MinION conduct
real-time sequencing using nanopore technology in a portable, USB-sized
device that emerged in 2014. This instrument uses ion detection chemistry in
a reduced size platform. Other instruments include the SmidgeION, flongle,
GridION, and PromethION. The read length extends to 2000 bases and up to
30 GB per run. Interpretation includes demultiplexing barcodes assigned to
target regions as in the larger second-generation instruments.
The Helicos BioSciences Heliscope also uses reversible dye terminator
chemistry with a solid-phase, PCR free, primer extension approach. The
maximum read length is thirty-five bases.
Complete Genomics employs a process termed unchain ligation using
nine-mer oligonucleotides and rolling circle amplification.
The Pacific Biosciences Single-Molecule Real-Time (SMRT) instrument
chemistry uses single-molecule chemistry with a polymerase attached to a
solid support which extends primed templates using phospholinked fluorescent nucleotides. It does not require PCR. The Sequel II model can sequence
up to four million reads and up to 500 GB raw read data.
A summary of NGS read length, run time, and per-base cost for the
sequencing approaches and platforms is tabulated in Table 2.1. Advantages
and disadvantages of the various sequencing approaches are tabulated in
Table 2.2 (Berglund et al. 2011).
2.6 NGS Instruments Adopted for Forensic Science
The availability of commercial kits to process forensic samples is an important factor for labs deciding to add NGS to their repertoire and adopt a
26
Table 2.1 NGS Read Length, Run Time, and Per Base Cost
Method
Read Length
(Typical) (bp)
Run Time
Cost (Per Mb,
Approx. USD)
Instrument Cost
(Approx. USD)
Sanger
SBS (fluorescent ddNTPs)
1000 (20–450)
45/capillary (set)
$500/Mb
$90,000/ABI 3500/8 capillary
Pyrosequencing
Roche/454
MiSeq
Ion Torrent PGM
SOLiD
SMRT
SBS (luciferase)
Emulsion PCR SBS (pyrosequencing)
SBS (reversible terminators)
SBS (H+ detection)
Emulsion PCR (ligation)
Single molecule sequencing, SBS
140 (35)
250
150 × 2 or 300 × 2
>100
75 (8)
>2000
$100/Mb
$20/Mb
$0.50/Mb
$0.63/Mb
$0.50/Mb
$2/Mb
$80,000
$500,000
$97,000
$80,000
$591,000
$695,000
Solexa
HeliScope
Bridge PCR (reversible terminators)
Single molecule (asynchronous
extension)
Emulsion PCR (ligation)
36
35
1 minute/bp
9 hours
27 hours/96 samples
2.5–4 hours
14 days
20–30 hours,
real-time
1–10 days
8 days
$2/Mb
<$0.50/Mb
$430,000
$1,350,000
13
4 days
$1/Mb
$170,000
Polonator
SBS, sequencing by synthesis.
Next Generation Sequencing in Forensic Science
Sequencing
Approach
History of Sequencing for Human DNA Typing
Table 2.2
Advantages and Disadvantages of Various Sequencing Approaches
Sequencing
Approach
Sanger
Pyrosequencing
Roche/454
MiSeq
Ion Torrent
SOLiD
SMRT
Solexa
HeliScope
Polonator
27
Advantages
Disadvantages
Inexpensive for short reads, accessible,
short run time per sample, excellent for
de novo sequencing
Inexpensive for short reads, short run
time per sample, outputs percent
methylation, can sequence
homopolymeric stretches, excellent for
SNPs
At introduction greatly reduced cost per
Mb, paired-end sequencing for dual
confirmation, long reads
Inexpensive per Mb, great option for
low-medium throughput lab, pairedend sequencing for dual confirmation,
flexible, add indexes using standard
PCR, portable sister instruments (e.g.,
iSeq), easy to use, robust, compatible
with other manufacturers products
Inexpensive per Mb, great option for
low-medium throughput lab, pairedend sequencing for dual confirmation,
short run time, low run time, robust,
compatible with other manufacturers
products
Inherent error correction Inexpensive
per Mb
Overall high cost per Mb,
targeted sequencing, difficulty
with homopolymeric stretches
Most expensive per Mb than
Sanger, short reads, slow for de
novo sequencing
High initial cost, high cost for
analysis, must multiplex to be
cost-effective
Must multiplex to be
cost-effective
Must multiplex to be
cost-effective
High initial cost, must
multiplex to be cost-effective,
time-consuming data
processing, short reads
Long reads, short run time
High initial cost, must
multiplex to be cost-effective,
high error rate
Can generate over a billion bases per run, High initial cost, must
highly accurate
multiplex to be cost-effective,
short read length
Sequences RNA, inexpensive per Mb
High initial cost, must
multiplex to be cost-effective,
high error rate, short read
length
Inexpensive per Mb
Long post-sequencing assembly
time, short read length
28
Next Generation Sequencing in Forensic Science
specific instrument or approach. A review of NGS and its forensic genetics
applications was published in 2015 (Børsting and Morling 2015), although
new kits and capabilities continue to be introduced. Time constraints and
pressure to process casework as quickly as possible can limit research and
development in forensic labs. Furthermore, after the sequencing data is collected, lab must have tools to analyze it and facilitate reporting. To date,
the Illumina MiSeq/Verogen MiSeq FGx (Caratti et al. 2015) and Applied
Biosystems/ThermoFisher series of instruments including the Ion Proton,
Ion PGM, Ion S5, and Ion Torrent have been adopted for forensic use. In 2017,
the ForenSeq DNA Signature Prep kits were made available for human identity testing using the Illumina MiSeq NGS instrument; the Illumina platform
has also been adopted for sequencing libraries with the Promega PowerSeq
46GY kit. The Ion series instruments have been adopted for sequencing
libraries prepared using the Precision ID GlobalFiler NGS STR Panel v2,
HID-Ion Ampliseq™ Ancestry Panel, HID-Ion Ampliseq™ Identity Panel,
QIAseq Investigator Panels, and GenPlex™ HID kits. Forensic applications of
these and emerging tools are the focus of the rest of this book.
Questions
1. Which sequencing method(s) is (are) the most cost-effective for
determining the sequence of a short read? Explain your answer.
2. Which sequencing method(s) is (are) the most cost-effective for
determining the sequence at an SNP site? Explain your answer.
3. Which sequencing method(s) is (are) the most cost-effective for
determining the sequence of tens of loci or a genome? Explain your
answer.
4. Which sequencing method should not be used for STR DNA typing?
Explain your answer.
5. Compare and contrast the sequencing methods in terms of read
length, processing time, and per base cost.
References
Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., and C.G.
Brown. “Accurate whole human genome sequencing using reversible terminator chemistry.” Nature 456, no. 7218 (November 6, 2008): 53–59. doi:10.1038/
nature07517.
Berglund, E.C., Kiialainen, A., and A.-C. Syvänen. “Next-generation sequencing technologies and applications for human genetic history and forensics.”
Investigative Genetics 2 (November 24, 2011): 23. doi:0.1186/2041-2223-2-23.
History of Sequencing for Human DNA Typing
29
Børsting, C., and N. Morling. “Next generation sequencing and its applications in
forensic genetics.” Forensic Science International Genetics 18 (September 2015):
78–89. doi:10.1016/j.fsigen.2015.02.002.
Bruijns, B., Tiggelaar, R., and H. Gardeniers. “Massively parallel sequencing techniques for forensics: A review.” Electrophoresis 39, no. 21 (August 13, 2018):
2641–2654. doi:10.1002/elps.201800082.
Butler, J. Forensic DNA Typing, 2nd ed. Burlington, MA: Elsevier Academic Press,
2005.
Caratti, S., Turrina, S., Ferrian, M., Cosentino, E., and D. De Leo. “MiSeq FGx
sequencing system: A new platform for forensic genetics.” Forensic Science
International: Genetics Supplement Series 5 (August 2015): e98–e100. doi:10.1016/
j.fsigss.2015.09.040.
Elkins, K.M. Forensic DNA Biology: A Laboratory Manual. Waltham, MA: Elsevier
Academic Press, 2013.
Heather, J.M., and B. Chain. “The sequence of sequencers: The history of sequencing
DNA.” Genomics 107, no. 1 (January 2016): 1–8. doi:10.1016/j.ygeno.2015.11.003.
International Human Genome Consortium. “Initial sequencing and analysis
of the human genome.” Nature 409, no. 6922 (February 15, 2001): 860–921.
doi:10.1038/35057062.
Kircher, M., and J. Kelso. “High-throughput DNA sequencing – Concepts and limitations.” BioEssays 32, no. 6 (June 2010): 524–536. doi:10.1002/bies.200900181.
Muzzey, D., Evans, E.A., and C. Lieber. “Understanding the basics of NGS: From mechanism to variant calling.” Genetic Counseling and Clinical Testing 3 (December
2015): 158–165. doi:10.1007/s40142-015-0076-8.
Nyren, P., Petersson, B., and M. Uhlen. “Solid phase DNA minisequencing by an
enzymatic luminometric inorganic pyrophosphate detection assay.” Analytical
Biochemistry 208, no. 1 (January 1993): 171–175. doi:10.1006/abio.1993.1024.
Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlén, M., and P. Nyrén. “Real-time
DNA sequencing using detection of pyrophosphate release.” Analytical
Biochemistry 242, no. 1 (November 1, 1996): 84–89. doi:10.1006/abio.1996.0432.
Sanger, F., Air, G. M., Barrell, B. G., Brown, N. L., Coulson, A.R., Fiddes, C.A.,
Hutchison, C. A., Slocombe, P. M., and M. Smith. “Nucleotide sequence of bacteriophage phi X174 DNA.” Nature 265, no. 5596 (February 24, 1977): 687–695.
doi:10.1038/265687a0.
van Dijk, E.L., Auger, H., Jaszczyszyn, Y., and C. Thermes. “Ten years of next-generation sequencing technology.” Trends in Genetics 30, no. 9 (September 2014):
418–426. doi:10.1016/j.tig.2014.07.001.
Venter, C.J., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith,
H.O., Yandell, M., Evans, C.A., Holt, R.A., and J.D. Gocayne. “The sequence
of the human genome.” Science 291, no. 5507 (February 16, 2001): 1304–1351.
doi:10.1126/science.1058040.
Zascavage, R.R., Shewale, S.J., and J.V. Planz. “Deep sequencing technologies and
potential applications in forensic DNA testing.” Forensic Science Review 25, no.
1–2 (2013): 79–105.
Sample Preparation,
Standards, and Library
Preparation for Next
Generation Sequencing
3
3.1 Overview of the NGS Sample Preparation Process
In order to analyze samples using next generation sequencing (NGS), several steps must be performed. An overview of the steps in the NGS sample
preparation process is shown in Figure 3.1. The process begins with sample
handling and processing followed by DNA extraction and DNA quantitation.
Upon extracting and ascertaining the quantity and condition of the DNA
samples, a process termed library preparation is begun. Following library
preparation, samples undergo purification and normalization steps prior to
multiplexing and denaturation. Sequencing can then be performed; it is the
focus of Chapter 4.
3.2 Sample Handling and Processing
The samples chosen for DNA typing using NGS may be old, degraded, or
extremely limited in quantity or they may be pristine, fresh body fluid samples. Because NGS is known to have a greater sensitivity than multiplex PCR
kits analyzed using the CE, it is extremely important to reduce and avoid
potential sources of contamination. Surfaces and non-autoclavable equipment should be cleaned to reduce or eliminate residual DNA using at least
two of the following approaches: soap and water, UV light irradiation, 10%
bleach, 70% ethanol, Cidex, LookOut® DNA Erase Spray, and/or DNAZap.
Crushing and pulverizing bone and teeth should be performed in a hood to
reduce dust carry over between samples and the work area. Stainless steel
equipment including freezer mill parts and die presses used to crush and
pulverize bone and teeth should be cleaned and autoclaved between each
use. When working with reagents and samples, DNase and RNase-free tubes
and aerosol filter tips should be used. Samples should be processed sequentially in a preparation space separate from the amplification area. Analysts
should wear N95 protective masks to reduce dust inhalation when sanding
and crushing skeletal remains samples. Other personal protective equipment
(PPE) including goggles, gloves, hair nets, masks, lab coats, and closed-toed
shoes should always be worn when processing samples in the DNA lab.
DOI: 10.4324/9781003196464-3
31
32
Sampling
(e.g., Punch, Swab,
Wash, Digest, Cut,
Crush)
Next Generation Sequencing in Forensic Science
DNA
Extraction
DNA
Quantitation
(and Concentration, if
needed)
(and male/degradation
assessment)
Library Prep
(target amplification,
tagging, adding indexes
and adaptors)
Library
Purification
and
Normalization
Multiplexing
and
Denaturation
Figure 3.1 Overview of steps in the NGS sample preparation process.
3.3 DNA Extraction
There are several methods that can be used to extract DNA from cellular material.
While the organic phenol-chloroform-isoamyl alcohol (PCI or PCIA) (Butler
2005) and the inorganic metal-chelating Chelex-100 resin (Walsh et al. 1999)
were widely used for forensic DNA extraction in the past, silica-based methods
now dominate (Hoff-Olsen et al. 1999, Castella et al. 2006, Brevnov et al. 2009,
Elkins et al. 2013, Eychner et al. 2017). While the PCIA method remains the gold
standard for some applications because it yields the highest quantity of DNA
from samples and the double-stranded structure remains intact, care must be
taken to remove all organics and phenol as the phenol hydroxyl group will attack
the phosphodiester bonds in DNA strands and lead to fragmented, degraded
DNA. Furthermore, the PCIA method is not ideal for all samples; for example,
it digests the gum in chewing gum (Eychner et al. 2017). The Chelex-100 method
is an inexpensive method that requires little hands-on time, although it, like the
PCIA method, requires an overnight incubation step that must be accommodated in the lab’s standard operating procedure (SOP).
Most of the modern DNA extraction methods incorporated into commercial kits are silica-based methods. Examples are the Qiagen QIAamp DNA
Investigator and EZ1 DNA Investigator kits which employ magnetic bead
suspensions. Other approaches, such as the Applied Biosystems PrepFiler and
the Promega DNA IQ kits, employ magnetic beads and magnetite-modified
silicon dioxide magnetic beads, respectfully. These methods are rapid, can
be automated, and yield the most highly reproducible DNA yields (Eychner,
et al. 2017). Robotic tools include the Beckman Coulter Biomek liquid handling platform, Tecan HID EVOlution™, Promega Maxwell ® and Maxprep™
Instruments, and Qiagen QIAcube HT®, QIAcube Connect, and EZ1 instruments. For example, the EZ1 DNA Investigator Kit is used with the Qiagen
EZ1 robot. Samples collected on FTA cards or similar fibrous material containing preservatives can be extracted using one of the above kits such as the
EZ1 DNA Investigator kit or methods or added directly to the PCR sample
tube in library preparation (Kampmann et al. 2016). As modern human DNA
typing kits typically require 1 ng or less of input DNA, any of the extraction
methods can typically recover sufficient DNA from blood or buccal samples.
Many DNA extraction methods have been explored for extracting DNA
from bone and teeth samples. Since the DNA is protected in the bone cells
Sample and Library Preparation and Standards
33
and marrow and tooth pulp, the samples first undergo cleaning, decalcification, cutting, crushing, and drilling steps. In a recent study, the PrepFiler®
BTA method performed better than a PCI-silica-based method for extracting DNA from bone (Hasap et al. 2019). In our lab, a modified procedure
employing the Qiagen EZ1 DNA Investigator kit silica-based method and
EZ1 BioRobot has performed the best for extracting DNA from modern and
historic bone for use in DNA typing analysis (Dukes et al. 2012, Klavens et al.
2020). However, in yet another recent study, some crude bone lysates were
shown to inhibit DNA purification when using paramagnetic silica beads
in the DNA IQ™ Casework Pro Kit and the Maxwell ® sixteen due to the filter clogging and were enhanced when phenol was used to treat the lysates
(Desmyter et al. 2017). Edson (2019) reported that the best DNA recovery
from postcranial osseous human remains of service members lost in World
War II, the Korean War, and South-East Asia was obtained using a complete
demineralization method with organic extraction when four DNA extraction protocols were evaluated. Zupanič Pajnič et al. (2020) quantified human
remains from a massacre of a Slovenian family during World War II using
the PowerQuant kit and were able to produce DNA profiles from a molar
and five femurs. Zeng et al. (2019) tested the DNA IQ™, DNA Investigator,
and PrepFiler® BTA kit as well as the organic extraction method for hair and
blood and two total demineralization protocols for bone; the team found
all of the DNA extraction methods to be efficient and compatible with the
Precision ID and ForenSeq kits. Carrasco et al. (2020) reported upon a specialized extraction method for degraded blood and dental remains samples.
Sidstedt et al. (2020) evaluated PCR inhibition and MPS applications.
DNA recovery of trace materials including hair, fingernails and fingerprints, and other touch DNA has also been studied. Naue et al. (2020) found
that rubbing a wet swab on a hair led to the recovery of DNA from the contributor of the surface material, and the individual who was the source of the
hair and cleaning the hair is essential to obtain a single source hair profile.
Preuner et al. (2014) found that the PrepFiler Forensic DNA Extraction kit led
to high-quality DNA from fingernail clippings. Tasker et al. (2017) extracted
DNA from post-blast improvised explosive device (IED) pipe bombs. England
et al. (2020) tested the ForenSeq kit with DNA extracted from laser microdissected cells. Eychner et al. (2017) tested five DNA extraction methods (PCIA,
QIAamp DNA Investigator, DNA IQ, Chelex-100, and PrepFiler) for recovering DNA from chewing gum and saliva aliquoted on swabs; the QIAamp
method performed the best overall. For low-quantity samples such as chewing gum, touch DNA and human remains, ethanol precipitation, concentrator devices, and reduced elution volumes can be employed to concentrate the
recovered DNA (although larger elution volumes will maximize the overall
yield) (Moore 1998, Eychner et al. 2017).
34
Next Generation Sequencing in Forensic Science
3.4 DNA Quantitation
There are several DNA quantitation options for evaluating the quantity of
DNA recovered to determine how much of the extract to input for NGS
typing. The quantitation methods include human-specific and non-specific
methods. Spectroscopic methods for quantifying DNA include UV-Vis and
fluorescence spectroscopy. UV-Vis Spectroscopic methods can be used to
estimate total DNA in a sample, but they are not human-specific (Elkins
2013). The NanoDrop (UV-Vis spectroscopy) and Qubit (fluorescence spectroscopy) spectrophotometers are widely used for rapid DNA quantification
as they require as little as one microliter of sample.
Real-time PCR methods coupled with fluorscence detection can be used
to determine the quantity of human DNA and also determine its amplifiability (Horsman et al. 2006). Both “ home brew” and commercial assays can be
employed. A real-time PCR assay targeting the TPOX locus quantitates total
human and male genomic DNA (Horsman et al. 2006). Several commercial
kits are available to determine the quantity of human DNA in a sample and
simultaneously determine the presence of human and male genomic DNA and
detect PCR inhibition using real-time PCR and multiple dye channel fluorescence detection. These include the Applied Biosystems Quantifiler™ Human
DNA Quantification, Quantifiler™ DUO and Quantifiler™ Trio kits (Green
et al. 2005, Barbisin et al. 2009) and the Promega Plexor® HY (Krenke et al.
2008) and PowerQuant® System kits. The Qiagen Investigator Quantiplex Pro
and HYres Kits quantify the total human and male DNA in a sample, detect
PCR inhibition, and provide a degradation index (Vraneš et al. 2017, Morrison
et al. 2020). While the Quantifiler™ and Plexor® HY kits require approximately
two hours, the Quantiplex HYres Kit completes in only an hour. The sensitivity of the commercial kits has improved over time and the newest kit, the
Quantiplex HYres kit, is the most sensitive. The Quantiplex HYres Kit is sensitive to <1 pg/μL while the Plexor® HY kit is sensitive to 6.4 pg of total DNA.
The Quantifiler™ DUO kit is sensitive to 51 pg of DNA. The PowerSeq® Quant
MS System, QuantiFluor® ONE dsDNA System, and QuantiFluor® dsDNA
System are recommended for use with the PowerSeq® 45GY NGS kit. The
more the DNA is degraded, the fewer loci should be expected to be typed
using any DNA typing approach. An accurate determination of the quantity of human DNA in a sample is essential for determining the appropriate
input of extracted DNA for NGS. The first step in NGS is library preparation.
The extracted DNA can be diluted to achieve the optimal input quantity as
instructed by the NGS kit manufacturer (Table 3.1). Low template samples
may be used, and input optimized by adding more extract and no water to the
library preparation PCR reactions but a lower number of reads and reduced
coverage may result if sufficient DNA is not supplied in the reaction.
Sample and Library Preparation and Standards
Table 3.1
35
Manufacturer’s Recommended DNA Input Quantity for NGS
Kit
Recommended Input DNA Quantity (ng)
HID-Ion Ampliseq™ Ancestry Panel 1 ng
HID-Ion Ampliseq™ Identity Panel 1 ng
ForenSeq Signature Prep Kit
1 ng (5 µL of 0.2 ng/µL) genomic DNA, or
2 µL crude lysate (e.g., buccal), or
one, 1.2 mm FTA punch
PowerSeq™ 46GY
1 ng (up to 15 µL of 0.2 ng/µL) genomic DNA,
or one FTA punch, or
one nonFTA punch incubated with PunchSolution™, or
2 µL of extract from swab incubated with SwabSolution™
Precision ID GlobalFiler™ NGS STR 1 ng genomic DNA in up to 6 μL (and as little as 0.125 ng)
Panel v2
3.5 Library Preparation
Library preparation is a series of steps to prepare a sample for sequencing
using NGS. It begins with the previously extracted, quantified, and diluted
sample or a sample from an FTA, or similar, sample collection card for direct
amplification. Library preparation has two important features. In the procedure, additional sequences termed adaptor sequences and index sequences are
added. The number of steps and required time to perform the library preparation vary greatly by manufacturer and process (Table 3.2). An overview
and comparison of the Applied Biosystems HID-Ion Ampliseq™ Ancestry
Panel and HID-Ion Ampliseq™ Identity Panel, Verogen ForenSeq™ Signature
Prep Kit, Promega PowerSeq™ 46GY, and Applied Biosystems Precision ID
GlobalFiler™ NGS STR Panel v2 is shown in Table 3.2. The Verogen and
Applied Biosystems library preparation kits are offered in 96 and 384 sample
options. Additional NGS kits include Qiagen’s QIAseq Investigator Missing
Persons SNP panel, QIAseq Investigator ID SNP panel, QIAseq Investigator
Global Ancestry SNP panel, and QIAseq Investigator Middle East Ancestry
SNP panel.
There are three options for analyzing STR and SNP loci using Applied
Biosystems kits. The Applied Biosystems Precision ID GlobalFiler NGS
STR Panel v2 contains primer sets targeting thirty-six markers, including
the same thirty-one autosomal STR markers, amelogenin sex-determining
markers AMELX and AMELY, and three additional sex-determining Y
markers (i.e., DYS391, SRY and Yindel).. The Applied Biosystems Precision
ID Ancestry Panel targets 165 autosomal markers including fifty-five markers developed by Kenneth Kidd and his group and the SNPforID Consortium
and 123 markers developed by Michael Seldin and his team and results in
36
Next Generation Sequencing in Forensic Science
Table 3.2 Comparison of Steps and Time Required for Library Preparation for
Kits from Commercial Suppliers
Library Preparation Kit
Number of Markers
HID-Ion Ampliseq™
165 (includes aiSNPs)
Ancestry Panel
HID-Ion Ampliseq™
124 (124 iiSNPs)
Identity Panel
ForenSeq™ Signature Prep 231 (includes 27 aSTRs, 24 Y-STRs, 7
Kit
X-STRs, 94 iiSNPs, 22 pSNPs, 56
aiSNPsa, amelogenin for sex
determination)
PowerSeq™ 46GY
46 (includes 22 aSTR markers, 23
Y-STRs, and amelogenin for sex
determination)
Precision ID GlobalFiler™ 35 (includes 31 STR markers and 4 sex
NGS STR Panel v2
determination markers)
a
Library Preparation
Time (hours)
0.5 hands-on
~18 total
0.5 hands-on
~18 total
1.5 hands-on
~ 9 total
None specified by
manufacturer
0.5 hands-on
~18 total
Two ancestry markers also used for phenotype estimation.
average amplicon sizes of less than 130 bp. The Applied Biosystems Precision
ID Identity Panel targets 124 autosomal SNPs with a high heterozygosity and
low Fixation Index (Fst). The GenPlex™ HID system amplified forty-eight of
the fifty-two SNPforID SNPs and amelogenin (Johansen et al. 2013).
The Verogen ForenSeq™ Signature Prep kit offers two options: 152 loci
using Primer Set A and 231 loci using Primer Mix B. Primer Set B targets
twenty-seven aSTRs, twenty-four Y-STRs, seven X-STRs, ninety-four iiSNPs,
twenty-two pSNPs, fifty-six aiSNPs, and amelogenin for sex determination
(Jäger et al. 2017). The Promega PowerSeq™ 46GY kit targets twenty-two
aSTR markers, twenty-three Y-STRs, and amelogenin for sex determination.
A comparison of steps in the library preparation workflows for the Applied
Biosystems, Promega and Verogen kits is shown in Figure 3.2. Promega has
also introduced the PowerSeq™ Auto/Y System Prototype for integration in a
lab’s NGS workflow (Montano et al. 2018).
The library preparation steps begin with amplifying and enriching the
targets and adding tags, indexes for demultiplexing, and adaptor sequences
for flow cell binding and conclude with library purification and normalization. In addition to the samples to be sequenced, a positive control (e.g.,
2800 M) and a negative control (no template control) should be processed
during the library preparation steps. There are two PCR steps in library
preparation with the ForenSeq™ kit. In the first PCR step termed PCR1, a
forward tag attached to the forward primer and a reverse tag attached to
the reverse primer are added to the target amplicon. These same sequence
tags are added to all of the targets. In a second PCR step, PCR2, using the
Extract and quantify input DNA
PowerSeq™ 46GY Workflow
Extract and quantify input DNA
(or add one FTA punch, or one nonFTA punch
incubated with PunchSolution TM , or 2 uL of extract
from swab incubated with SwabSolution TM )
ForenSeq™ Workflow
Extract and quantify input DNA
(or add 2 uL crude lysate (e.g., buccal), or one, 1.2
mm FTA punch)
Prepare STR target reaction and perform PCR
Prepare STR and amelogenin target reaction and
perform PCR
Prepare STR, SNP and amelogenin target and
tagging reaction and perform PCR1
Partially digest amplicons
Quantify amplified product
Prepare PCR reaction to add indexes and adaptors
and enrich targets and perform PCR2
Ligate adaptors and purify
Perform end repair
Purify libraries
Quantify libraries using qPCR
Perform A-tailing and adapter ligation
Normalize libraries
Dilute, pool and store libraries until sequencing
Quantify libraries
Denature, dilute, pool and store libraries until
sequencing
Sample and Library Preparation and Standards
Precision ID Workflow
Denature, dilute, pool and store libraries until
sequencing
37
Figure 3.2 Comparison of steps in the library preparation workflows for three NGS products.
38
Next Generation Sequencing in Forensic Science
tags added in PCR1, adaptor sequences are added adjacent to the primer
sequences to allow the amplicons to bind the flow cell for sequencing, and
unique forward and reverse i5 and i7 index sequence combinations are added
to the targets to label them for demultiplexing interpretation. The adaptor
sequences are the same for all samples and are used to complementarily
bind the samples to the flow cell oligonucleotide lawn during sequencing.
The indexes are a unique set of sequences that are used to assign the data to
the sample. An index is analogous to a bar code. Each index is comprised of
an eight base pair sequence used to demultiplex the sample data following
sequencing. Tables 3.3 and 3.4 list the i5 and i7 index sequences, respectively.
Thus the completed “ library” for each target consists of an i5 adaptor, i5
index, forward tag, target sequence, reverse tag, i7 index, and i7 adaptor. The
ForenSeq PCR1 and PCR2 steps result in amplicons in the 60–460 base pair
size range. The amplicon targets and sizes are listed in Chapter 5. The library
Table 3.3
i7 Index Labels and Sequences
Index Label
Sequence
R701
R702
R703
R704
R705
R706
R707
R708
R709
R710
R711
R712
ATCACGAT
CGATGTAT
TTAGGCAT
TGACCAAT
ACAGTGAT
GCCAATAT
CAGATCAT
ACTTGAAT
GATCAGAT
TAGCTTAT
GGCTACAT
CTTGTAAT
Table 3.4
i5 Index Labels and Sequences
Index Label
Sequence
A501
A502
A503
A504
A505
A506
A507
A508
TGAACCTT
TGCTAAGT
TGTTCTCT
TAAGACAC
CTAATCGA
CTAGAACA
TAAGTTCC
TAGACCTA
Sample and Library Preparation and Standards
39
preparation process has been automated by the French National Police using
a Hamilton ID NGS-V STARlet robot (Laurent et al. 2017).
The ThermoFisher Ion Chef™ robot can be used to perform NGS library
preparation for any of the human identification (HID) – Ion AmpliSeq
panels: GlobalFiler NGS STR Panel v2, Identity Panel, Ampliseq™ Identity
Panel, and Ampliseq™ Ancestry Panel as well as the mtDNA Whole
Genome Panel and mtDNA Control Region Panels that will be described in
Chapter 7. The Ion Chef also performs template preparation and chip loading.
Alternatively, the libraries can be prepared manually. If using the robot, the
plate with the eight samples to be sequenced, consumables and master mix
are all loaded to the Ion Chef™. The consumables include Ion S5 Precision ID
Chef Solutions reagent cartridges, chip adapter, enrichment strip cartridge,
tip cartridge with pipet tips, PCR plate and frame seal, recovery station
disposable lid, recovery tubes, and one or two sequencing chips. A camera
system reads the barcodes on the items loaded into the instrument. Three
pipetting steps take approximately fifteen minutes and need to be performed
manually prior to loading the samples to the platform.
In conclusion, a library is a DNA sample that is ready for sequencing with
indexes and adaptors attached to each end. Following manual library preparation, the samples are added to the same tube to form a mixture, or multiplexed, of multiple libraries pooled together that is ready for sequencing.
3.6 Library Purification and Normalization
Following library preparation is the library purification step. Library purification, or clean-up step, removes excess primers and reagents. The ForenSeq™
Signature Prep Kit employs a magnetic bead approach. Working with fewer
samples at a time for the bead-based steps leads to more successful sequencing results. In the procedure, the library is first added to a uniform quantity of magnetic beads and the DNA binds the beads. A magnetic block is
used to draw the DNA-bound beads to its surface at the bottom of the plate.
Ethanol washes are used to wash the beads and remove residual primers and
PCR reagents. The excess ethanol is removed, and then the DNA library is
released from the beads with resuspension buffer (RSB). Users must be careful to remove all of the ethanol. The library resulting after the purification
step results in excess volume of each library than will be needed for sequencing. An agarose or polyacrylamide gel or BioAnalyzer or QIAcel instrument
can be used to check the amplicon lengths and quality prior to sequencing.
Amplicons in the 60–460 bp range indicate high-quality ForenSeq libraries.
Short amplicons of approximately <60 bp indicate primer dimers. Figure 3.3
shows an agarose gel of amplicons prepared using the ForenSeq kit for NGS
40
Next Generation Sequencing in Forensic Science
Figure 3.3 Agarose gel of ForenSeq library preparation amplicons (From left to
right: Lane 1: Trackit 50 bp ladder with bright bands at 350/800/2500 bp, Lanes
2–6: DNA standards, Lane 7: NTC, Lane 8: DNA standard from 10-month-old
library prep). (Courtesy of Adam Klavens.)
library preparation, and Figure 3.4 shows a QIAcel graph of PCR amplicon
for pyrosequencing.
As described above, the Ion Chef™ performs not only the library preparation steps but also the library quantitation, purification, library normalization, and chip loading steps.
Following library purification is the library normalization step. The
eight-sample library can be quantified using the Ion Library TaqMan®
Quantitation kit. Since the PCR steps in library preparation can result in a
range of yields, the library normalization step is used to normalize the quantity and concentration of each sample library to ensure that each library is
represented equally upon pooling. After it has been determined which samples will be sequenced in the same run, the prepared libraries need to be
normalized and pooled together for the sequencing run. Libraries prepared
at different times can be normalized together to prepare them for sequencing on the same flow cell as long as they have unique index combinations.
Sample and Library Preparation and Standards
10
5
bp
1.200
41
00
bp
30
0.900
15
1.000
bp
1.100
[RFU x 1E0]
0.800
0.700
0.600
0.500
0.400
0.300
0.200
0.100
0.000
15
50
100
150
200
250
300
400
500 600
800
[bp]
Size
Figure 3.4 QIAcel graph of PCR amplicon for pyrosequencing.
The pooled libraries are diluted to 50 pM and mixed according to the group
of barcode adaptors (1-32) for a 530 chip.
The ForenSeq™ Signature Prep Kit employs bead-based normalization.
The normalization process uses a standardized quantity of magnetic beads
that bind the same amount of library for each sample. The beads must be
warmed to room temperature before use for appropriate binding and mixed
well to ensure delivery of an even quantity of beads to each library. Beads are
added to each well containing library and bind an equal, maximum quantity
of each library based upon the binding capacity and quantity of beads added.
The magnetic beads bind the DNA libraries, the excess is removed, and the
normalized libraries are eluted from the beads.
Verogen indicates that the library preparation step using the ForenSeq™
Signature Prep Kit takes approximately nine hours. In practice, the hands-on
time varies with the number of samples and the experience of the investigator
preparing the library with the steps and procedures.
3.7 Multiplexing and Denaturation
The last step before sequencing using the ForenSeq™ kit is pooling, or multiplexing, the diluted, normalized libraries. Five microliters of each multiplexed set of targets for each sample to be sequenced are pooled together.
A Human Sequencing Control (HSC) is added to the normalized, pooled
libraries prior to sequencing, hybridization buffer is added, and the mixture
is denatured then snap-cooled on ice. Sequencing the prepared libraries is the
focus of Chapter 4.
42
Next Generation Sequencing in Forensic Science
The Ion Chef™ contains a thermocycler and performs the library preparation, library purification, library normalization, and chip loading steps in an
automated, seven-hour process for the eight samples. Primer sequences are
removed prior to sequencing using modifications to the sequences. Through
the process, each DNA sample is cut into millions of fragments using a mixture of restriction enzymes. The kit uses the Ion Xpress™ Barcode Adapters
1–96 Kit and the IonCode™ Barcode Adapters 1–384 Kit. Barcoded libraries
can be combined and loaded onto a single Ion chip to minimize the sequencing run time and cost and allow for accurate sample-to-sample comparisons.
Each fragment attaches to its own primer-coated bead, termed templated ion
sphere particles (ISPs). The ISPs are purified, and those positive for template
are placed at the enrichment station and then loaded onto the chip. Each bead
flows across the chip and falls into a well. When using the Ion Chef with a
530 chip, twenty-four samples can be loaded on one chip for autosomal NGS
library preparation in aproximately 10 hours.
Questions
1. Are DNA extraction and quantitation required prior to NGS library
preparation? Explain why or why not.
2. List some advantages and disadvantages of qPCR DNA quantitation
approaches.
3. What components are included in commercial kits for NGS DNA
typing?
4. Which controls and standards should be prepared during the sample
preparation steps?
5. What is the purpose of the normalization step in library preparation?
6. Describe expected outcomes of sequencing without normalizing the
pooled samples.
7. Why are the index sequences added prior to sequencing?
8. What approaches can be used to determine if high-quality libraries
were prepared?
9. Which NGS system is best to implement with new users or novices to
the process? Explain your answer.
10. Which NGS system reduces interoperator variability? Explain your
answer.
References
Barbisin, M., Fang, R., O‘Shea, C.E., Calandro, L.M., Furtado, M.R., and J.G. Shewale.
“Developmental validation of the Quantifiler Duo DNA Quantification kit
for simultaneous quantification of total human and human male DNA and
Sample and Library Preparation and Standards
43
detection of PCR inhibitors in biological samples.” Journal of Forensic Sciences
54, no. 2 (March 2009): 305–319. doi:10.1111/j.1556-4029.2008.00951.x.
Brevnov, M.G., Pawar, H.S., Mundt, J., Calandro, L.M., Furtado, M.R., and J.G.
Shewale. “Developmental validation of the PrepFiler Forensic DNA Extraction
Kit for extraction of genomic DNA from biological samples.” Journal of Forensic
Sciences 54 (May 2009): 599–607. doi:10.1111/j.1556-4029.2009.01013.x.
Butler, J. Forensic DNA Typing, 2nd ed. Burlington, MA: Elsevier Academic Press,
2005.
Carrasco, P., Inostroza, C., Didier, M., Godoy, M., Holt, C. L., Tabak, J., and A. Loftus.
“Optimizing DNA recovery and forensic typing of degraded blood and dental
remains using a specialized extraction method, comprehensive qPCR sample
characterization, and massively parallel sequencing.” International Journal of
Legal Medicine 134, no. 1 (January 2020): 79–91. doi:10.1007/s00414-019-02124-y.
Castella, V., Dimo-Simonin, N., Brandt-Casadevall, C., and P. Mangin. “Forensic
evaluation of the QIAshredder/QIAamp DNA extraction procedure.” Forensic
Science International 156, no. 1 (January 6, 2006) 70–73. doi:10.1016/j.forsciint.
2005.11.012.
Desmyter, S., De Cock, G., Moulin, S., and F. Noël. “Organic extraction of bone lysates
improves DNA purification with silica beads.” Forensic Science International
273 (April 2017): 96–101. doi:10.1016/j.forsciint.2017.02.003.
Dukes, M.J., Williams, A.L., Massey, C.M., and P.W. Wojtkiewicz. “Technical note:
Bone DNA extraction and purification using silica‐coated paramagnetic
beads.” American Journal of Physical Anthropology 148 (July 2012): 473–482.
doi:10.1002/ajpa.22057.
Edson, S.M. “Extraction of DNA from skeletonized postcranial remains: A discussion of protocols and testing modalities.” Journal of Forensic Sciences 64, no. 5
(September 2019): 1312–1323. doi:10.1111/1556-4029.14050.
Elkins, K.M. Forensic DNA Biology: A Laboratory Manual. Waltham, MA: Elsevier
Academic Press, 2013.
Elkins, K.M., Klavens, A.J. Gorr, K.K., Kollmann, D. D., and C.B. Zeller. “Assessing
the performance of next generation sequencing for determining sex, ancestry,
and phenotypic characteristics of historic human remains.” , in preparation.
England, R., Nancollis, G., Stacey, J., Sarman, A., Min, J., and S. Harbison.
“Compatibility of the ForenSeq™ DNA signature prep kit with laser microdissected cells: An exploration of issues that arise with samples containing low
cell numbers.” Forensic Science International: Genetics 47 (July 2020): 102278.
doi:10.1016/j.fsigen.2020.102278.
Eychner, A.M., Schott, K.M. and K.M. Elkins. “Assessing DNA recovery from chewing gum.” Medicine, Science and the Law 57, no. 1 (January 1, 2017): 7–11.
doi:10.1177/0025802416676413.
Green, R.L., Roinestad, I.C., Boland, C., and L.K. Hennessy. “Developmental validation of the quantifiler real-time PCR kits for the quantification of human nuclear
DNA samples.” Journal of Forensic Sciences 50, no. 4 (July 2005): 809–825.
Hasap, L., Chotigeat, W., Pradutkanchana, J., Asawutmangkul, W., Kitpipit, T., and
P. Thanakiatkrai. “Comparison of two DNA extraction methods: PrepFiler®
BTA and modified PCI-silica based for DNA analysis from bone.” Forensic
Science International: Genetics Supplement Series 7, no. 1 (December 2019):
669–670. doi:10.1016/j.fsigss.2019.10.132.
44
Next Generation Sequencing in Forensic Science
Hoff-Olsen, P., Mevag, B., Staalstrom, E., Hovde, B., Egeland, T., and B. Olaisen.
“Extraction of DNA from decomposed human tissue: An evaluation of five extraction methods for short tandem repeat typing.” Forensic Science International
105, no. 3 (November 8, 1999): 171–183. doi:10.1016/S0379-0738(99)00128-0.
Horsman, K.M., Hickey, J.A., Cotton, R.W., Landers, J.P., and L.O. Maddox.
“Development of a human‐specific real‐time PCR assay for the simultaneous
quantitation of total genomic and male DNA.” Journal of Forensic Sciences 51,
no. 4 (July 2006): 758–765. doi:10.1111/j.1556-4029.2006.00183.x.
Jäger, A.C., Alvarez, M.L., Davis, C.P., Guzmán, E., Han, Y., Way, L., Walichiewicz, P.,
Silva, D., Pham, N., Caves, G., Bruand, J., Schlesinger, F., Pond, S.J.K., Varlaro,
J., Stephens, K.M., and C.L. Holt. “Developmental validation of the MiSeq FGx
forensic genomics system for targeted next generation sequencing in forensic dna casework and database laboratories.” Forensic Science International:
Genetics 28 (May 2017): 52–70. doi:10.1016/j.fsigen.2017.01.011.
Johansen, P., Andersen, J.D., Børsting, C., and N. Morling. “Evaluation of the iPLEX®
Sample ID Plus Panel designed for the Sequenom MassARRAY® system. A
SNP typing assay developed for human identification and sample tracking
based on the SNPforID panel.” Forensic Science International: Genetics 7, no. 5
(September 2013): 482–487. doi:10.1016/j.fsigen.2013.04.009.
Kampmann, M.L., Buchard, A., Børsting, C., and N. Morling. “High-throughput
sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.” BioTechniques 61, no. 3 (September 1, 2016): 149–151. doi:10.2144/
000114453.
Klavens A, Kollmann DD, Elkins KM, Zeller CB. Comparison of DNA yield and
STR profiles from the diaphysis, mid-diaphysis, and metaphysis regions of
femur and tibia long bones. J Forensic Sci. 2021 May;66(3):1104–1113. doi:
10.1111/1556-4029.14657. Epub 2020 Dec 28. PMID: 33369740.
Krenke, B.E., Nassif, N., Sprecher, C.J., Knox, C., Schwandt, M., and D.R. Storts.
“Developmental validation of a real-time PCR assay for the simultaneous
quantification of total human and male DNA.” Forensic Science International:
Genetics 3, no. 1 (December 2008): 14–21. doi:10.1016/j.fsigen.2008.07.004.
Laurent, F.X., Ausset, L., Clot, M., Jullien, S., Chantrel, Y., Hollard, C., and L.
Pene. “Automation of library preparation using Illumina ForenSeq kit for
routine sequencing of casework samples.” Forensic Science International:
Genetics Supplement Series 6 (December 2017): e415–e417. doi:10.1016/j.
fsigss.2017.09.156.
Montano, E.A., Bush, J.M., Garver, A.M., Larijani, M.M., Wiechman, S.M., Baker,
C.H., Wilson, M.R., Guerrieri, R.A., Benzinger, E.A., Gehres, D.N., and M.L.
Dickens. “Optimization of the Promega PowerSeq™ Auto/Y system for efficient
integration within a forensic DNA laboratory.” Forensic Science International:
Genetics 32 (January 2018): 26–32. doi:10.1016/j.fsigen.2017.10.002.
Moore, David, and Dennis Dowhan. “Purification and concentration of DNA from
aqueous solutions.” Current protocols in pharmacology vol. Appendix 3 (2007):
3C. doi:10.1002/0471141755.pha03cs38.
Morrison, J., McColl, S., Louhelainen, J., Sheppard, K., May, A., Girdland-Flink, L.,
Watts, G., and N. Dawnay. “Assessing the performance of quantity and quality
metrics using the QIAGEN Investigator® Quantiplex® pro RGQ kit.” Science &
Justice 60, no. 4 (July 2020): 388–397. doi:10.1016/j.scijus.2020.03.002.
Sample and Library Preparation and Standards
45
Naue, J., Sänger, T., and S. Lutz-Bonengel. “Get it off, but keep it: Efficient cleaning of hair shafts with parallel DNA extraction of the surface stain.” Forensic
Science International. Genetics 45 (March 2020): 102210. doi:10.1016/j.fsigen.
2019.102210.
Preuner, S., Danzer, M., Pröll, J., Pötschger, U., Lawitschka, A., Gabriel, C., and
T. Lion. “High-quality DNA from fingernails for genetic analysis.” The
Journal of Molecular Diagnostics 16, no. 4 (July 2014): 459–466. doi:10.1016/j.
jmoldx.2014.02.004.
Sidstedt, M., Rådström, P., and J. Hedman. “PCR inhibition in qPCR, dPCR and
MPS-mechanisms and solutions.” Analytical and Bioanalytical Chemistry 412,
no. 9 (2020): 2009–2023. doi:10.1007/s00216-020-02490-2.
Tasker, E., LaRue, B., Beherec, C., Gangitano, D., and S. Hughes-Stamm. “Analysis
of DNA from post-blast pipe bomb fragments for identification and determination of ancestry.” Forensic Science International: Genetics 28 (May 2017):
195–202. doi:10.1016/j.fsigen.2017.02.016.
Vraneš, M., Scherer, M., and K. Elliott. “Development and validation of the
Investigator® Quantiplex Pro Kit for qPCR-based examination of the quantity
and quality of human DNA in forensic samples.” Forensic Science International:
Genetics Supplement Series 6 (December 2017): e518–e519. doi:10.1016/j.
fsigss.2017.09.207.
Walsh, P.S., Mitzger, D.A., and R. Higuchi. “Chelex-100 as a medium for simple
extraction of DNA for PCR-based typing from forensic material.” BioTechniques
10, no. 4 (April 1991): 506–513.
Zeng, X., Elwick, K., Mayes, C., Takahashi, M., King, J.L., Gangitano, D., Budowle,
B., and S. Hughes-Stamm. “Assessment of impact of DNA extraction methods
on analysis of human remain samples on massively parallel sequencing success.” International Journal of Legal Medicine 133, no. 1 (January 2019): 51–58.
doi:10.1007/s00414-018-1955-9.
Zupanič Pajnič, I., Obal, M., and T. Zupanc. “Identifying victims of the largest
Second World War family massacre in Slovenia.” Forensic Science International
306 (January 2020): 110056. doi:10.1016/j.forsciint.2019.110056.
Performing Next
Generation Sequencing
4
4.1 Performing Next Generation Sequencing
The focus of Chapter 3 was on sample preparation steps including DNA sampling and extraction, DNA quantitation, library preparation, clean-up, and
normalization. In this chapter, the focus will be sequencing the prepared
libraries using the MiSeq FGx or Ion series instruments. The focus on these
two instruments results from their adoption for forensic use. They have both
proven to be reliable and robust in several years of use in many labs all over
the world. Both are very easy to maintain and use.
4.2 Verogen MiSeq FGx® Sequencing
The Verogen MiSeq FGx instrument is a modified Illumina MiSeq platform
instrument. The MiSeq is a mid-capacity NGS instrument positioned between
the iSeq and MiniSeq benchtop sequencers with less capacity and the HiSeq
and NextSeq sequencers with greater capacity and higher throughput. The
MiSeq requires the user to supply only the reagent cartridge, a plastic casing
with multiple wells each prefilled with the necessary sequencing reagents,
flow cell to sequence pooled, normalized libraries, and a waste bottle. The
instrument has two run options: research use only (RUO) and forensic
genomics (Figure 4.1).
The reagent cartridges and flow cells can only be purchased from Verogen
for forensic use while the same from Illumina can be used for RUO mode
applications. The flow cell is a modified glass slide outfitted with microfluidic
channels. Ninety-six libraries can be sequenced on a standard flow cell using
ForenSeq Primer Set A versus thirty-two prepared with Primer Set B. On a
microflow cell, thirty-two libraries can be sequenced using ForenSeq Primer
Set A, and twelve can be sequenced using Primer Set B.
Unless the sequencer is used frequently, it is best to perform a maintenance wash prior to commencing a sequencing run (Figure 4.2). The maintenance wash consists of three, thirty-minute washes with freshly prepared
0.5% Tween 20. The wells of a plastic cartridge that mimics the shape of the
reagent cartridge are filled with the Tween detergent and placed in the instrument compartment. The waste reservoir is emptied, and a used flow cell is
DOI: 10.4324/9781003196464-4
47
48
Next Generation Sequencing in Forensic Science
Figure 4.1 Setting up a sequencing run on the MiSeq FGx (RUO or Forensic Use).
Figure 4.2 Wash screen on MiSeq FGx.
placed in the flow cell holder in its compartment. The waste bottle and the
wash bottle are placed in the main compartment. The wash cycle is begun;
after each thirty-minute wash, the user must return to the instrument to
replace wash agent and indicate to continue to wash.
Following the wash and upon proper instrument functioning, the reagent
cartridge is thawed in a shallow water bath for ninety minutes as indicated by
the manufacturer and mixed by inversion prior to use. It can be stored on ice
for up to six hours after thawing and prior to a sequencing run. The flow cell
is removed from the buffer-containing vial which is shipped in, and wiped
Performing Next Generation Sequencing
49
dry with care with laboratory wipes to remove all streaks before placing it in
its holder in the instrument to click it into place. The instrument records its
barcode and checks that the flowcell is unused and not expired. The pooled
libraries are denatured, and the human sequencing control (HSC) is added.
Subsequently, the libraries are applied to the marked location in the reagent
cartridge. The foil covering the well can be pierced with a sterile, DNase-free
pipet tip. The manufacturer’s recommendation in the ForenSeq kit manual is
to add seven microliters of the pooled, normalized libraries (PNL) and HSC
mixture to the reagent cartridge for sequencing. However, several labs add
more than the recommended volume. Verogen recommends not exceeding
thirteen microliters of pooled, normalized libraries and HSC mixture. Air
bubbles should be avoided during pooled library addition to the reagent cartridge, but can be tapped out if this occurs. The thawed and mixed reagent
cartridge containing the PNL is loaded into its unique cupboard space in
the instrument so that its unique Radio Frequency ID (RFID) barcode can
be read (it can be manually inputted if not read by the software) (ForenSeq™
DNA Signature Prep Reference Guide).
A new run is created in the Verogen Universal Analysis Software (UAS)
prior to beginning the sequencing run (Figure 4.3). It contains the sample
names, primer set, and index assignments for each sample. The user also
defines the sample type such as sample, positive amplification control, negative amplification control, or reagent blank (NTC). Alternatively, sample
information can be imported from a.txt file. Once all samples are added and
the attributes have been assigned, the run is saved.
The ForenSeq libraries are amplified and sequenced on the flow cell
(Figure 4.4) in the sequencing run. At the instrument computer interface, the
Figure 4.3 Preparing a new run in the Verogen Universal Analysis Software.
50
Next Generation Sequencing in Forensic Science
Figure 4.4 Micro (left) and standard (right) MiSeq flow cells.
MiSeq FGx software prompts lead the user through the process of starting
the sequencing run. The “sequence” box is followed by “ forensic genomics”
and then the run name can be selected. The software proceeds with a series
of pre-run checks and then the run can be started. Quality information metrics including cluster density, cluster passing filter, phasing, and pre-phasing
are reported in real-time as the run progresses. The run status is also visible,
and the number of steps are reported (Figure 4.5). The run can be paused or
stopped using buttons on this screen.
Within the instrument, the samples are introduced over the flow cell. The
flow cell glass slide is covered with lanes populated with an oligonucleotide
lawn containing two types of oligos on its surface, which are complementary
to the i5 and i7 adaptors added. The sequencing by synthesis (SBS) chemistry
proceeds via isothermal amplification. Prior to the actual sequencing occurs
a series of steps called cluster generation. A cluster is a distinct spot on a
flow cell made of approximately one thousand copies of a library containing one amplicon. The libraries bind to one type of oligo via complementary
base pairing via the adaptor region in random locations on the flow cell. A
polymerase extends the oligo by adding the base complementary strand. The
double-stranded complex is denatured, and the library strand is washed away
leaving the extended amplicon of the complementary library sequence covalently attached to the flow cell. The oligo strand folds over to a nearby oligo
of the second type and the other adaptor sequence hydrogen bonds so that
a U-shaped bridging structure is formed. The polymerase extends the second oligo by complementary base pairing to the oligo strand through bridge
Performing Next Generation Sequencing
51
Figure 4.5 Sequencing in process on the MiSeq FGx.
amplification. The bridge is denatured and two single-stranded copies of the
library are now covalently attached and extended from the oligo lawn. The
process is repeated simultaneously in parallel so that billions of clusters are
generated. The reverse strands are cleaved and washed away leaving only the
forward strands. The 3′-ends are blocked to prevent unwanted priming.
Finally, the sequencing begins with Read 1. A sequencing primer complementary to forward tag, which was added in PCR1, initiates sequencing on
the forward strand. One of four fluorescently labeled nucleotides is added
to the primer based upon the complementary sequence of the strand in a
massively parallel sequencing (MPS) process. An ultraviolet light source is
used to excite the fluorophore and enable the detection of each cluster. After
the addition of each base, the emission wavelength is used to determine the
nucleotide base that was added. The fluorescent dye is cleaved off, and the
blocking agent is removed so that the next base can be added. The process
continues for the full length of the strand. The fluorescence emission of
the clusters over the flow cell is read simultaneously generating hundreds
of high-quality images that require significant storage space. After the forward strand first read sequence is completed, the product is washed away.
Next in Read 2, the index 1 i7 read primer is hybridized to the forward
strand, and sequencing is conducted again until the strand is completed.
52
Next Generation Sequencing in Forensic Science
When sequencing is completed, it is washed away. Then, the 3ʹ-end of the
template is deprotected, folds over, and binds the second oligo on the flow
cell. Extension of the second oligo yields a read of index 2 i5 in Read 3. The
index 2 product is washed off after the sequencing and detection are complete. Finally, the second oligo is sequenced fully until the double-stranded
bridge is formed. The bridge is denatured and linearized, and the forward
strand is cleaved off and washed away. A read of the target reverse strand is
begun with the addition of a sequencing primer. The read continues until
the desired length of the second strand is sequenced (Launen 2017). Then
the product is washed away. The sequencing run takes twenty-seven to thirty
hours. The run completion details are displayed on the screen and can be
viewed in the UAS (Figure 4.6).
Illumina recommends keeping the MiSeq powered on at all times with
frequent use unless the shutdown steps are followed to power down the instrument for long periods of disuse. At the conclusion of the sequencing run, the
used flow cell, wash tray, wash bottle, and waste bottle can remain in place
until the next run. An instrument wash should be performed, however, immediately before and after each run. There are three wash options: a post-run
wash, a maintenance wash, and a standby wash. A wizard leads the user
through the wash steps. A post-run wash takes thirty minutes and 25.5 mL of
wash solution. The wash solution contains DNase-free and RNase-free water
mixed with Tween 20 resulting in a 0.5% Tween 20 final concentration. Dilute
bleach solution is used in position seventeen of the wash tray, corresponding
to the position of library addition. The maintenance wash consists of a series
of three wash steps as described above. It should be performed at least once
Figure 4.6 MiSeq FGx sequencing run completion viewed in UAS.
Performing Next Generation Sequencing
53
every thirty days. A maintenance wash uses 17.5 mL of wash solution and takes
approximately ninety minutes. According to the manufacturer, the standby
wash should be used if the instrument will not be used within seven days and
should be repeated every thirty days that the instrument is idle. The standby
wash uses 46 mL of wash solution and takes approximately two hours. Prior to
shutting the instrument down, a maintenance wash should be performed, the
waste should be discarded, and the waste bottle should be returned to its compartment. To shut down the instrument, select “shut down” from the Manage
Instrument screen and toggle the power switch to off. The user should wait at
least sixty seconds before restarting the instrument. Toggle the power switch
on and allow the chiller compartment to cool before using the instrument
(ForenSeq™ DNA Signature Prep Reference Guide).
4.3 ThermoFisher Ion Torrent™ and Ion PGM Sequencing
The ThermoFisher Ion GeneStudio S5 System, Ion PGM™ System, Ion S5™
XL System, and Ion OneTouch™ 2 System sequencing systems can be used
to sequence libraries prepared using the Applied Biosystems Precision ID
GlobalFiler™ NGS STR Panel v2, HID-Ion Ampliseq™ Identity Panel, or
HID-Ion Ampliseq™ Ancestry Panel kits with the Ion 510, Ion 520, and Ion
530 semiconductor chips. The higher chip numbers have a higher capacity
for sequencing libraries to be sequenced on the chip. The Ion 520 chip can
accommodate sixteen sample libraries prepared with the GlobalFiler kit. The
Ion 530 chip can run thirty-two sample libraries prepared with GlobalFiler™
kit. For the Ancestry Panel, 48 samples can be run on the Ion 510 chip, 72 can
be run on the Ion 520 chip, and 362 samples can be run with the Ion 530 chip.
For the Identity Panel, 54 samples can be run on the Ion 510 chip, 81 can be
run on the Ion 520 chip, and 384 samples can be run with the Ion 530 chip.
The sample is loaded directly into the chip well.
To begin sequencing with the Ion Torrent system, the chip, reagent cartridge, wash solution, sequencing buffer, and waste bottle must be loaded to
the instrument in the initialization proces. The reagent cartridge must be
equilibrated at room temperature prior to use and the wash solution must be
inverted several times. The chip, cleaning solution, sequencing buffer, and
waste container are loaded into one compartment. The chip is attached to a
bar and slides in its drawer. Each of these items is tracked using an RFID system. The reagent cartridge contains all of the nucleotides that can be added
during sequencing. The instrument initializes when these items have been
loaded. Before beginning the sequencing run, the user designs a sequencing
protocol that specifies the number of barcodes, chip type, chip barcode, run
module (e.g., Global Filer/Ancestry/whole mtDNA) and the number of flows.
54
Next Generation Sequencing in Forensic Science
The sequencing protocol parameters can vary with the sequencing to be performed by the instrument
The chip prepared using the Ion Chef is placed into the Ion S5 or Ion
S5 XL. The chip has millions of wells covering pixels for detection. The user
selects the “ home” then “run” option to begin. The sample is loaded directly
into the chip well, and the new chip loaded with ISPs is placed in the sequencer
for sequencing and engaged with the chip clamp. Once the protocol is created
as described above, the sequencer automatically detects the chip barcode and
performs sequencing (Precision ID GlobalFiler™ NGS STR Panel v2 with the
HID Ion S5™/HID Ion GeneStudio™ S5 System Application Guide).In a series
of steps, the chip is flooded with one of the four natural DNA nucleotides.
The Watson-Crick base pair is formed by hydrogen bonding when the appropriate base binds the template. Upon incorporation of the base into the chain,
a proton is released changing the pH of the system. The pH change is detected
by an ion-sensitive layer in the chip, and the voltage is detected by the detector. The voltage is converted into a digital signal, and the base call is made by
the software. The chip is flooded with a new base every fifteen seconds. If the
base is not complementary to the next base, no hydrogen ion is released, no
voltage change is detected, and no base call is made. If two (or three) identical bases are next to each other, the voltage will increase by double (or triple)
based upon the number of nucleotides incorporated. Each base is called for
the sequences extended on the millions of beads in the millions of wells. Each
sequencing run requires three and a half hours. Following the run, users
should perform a post-run clean.
4.4 The Next Step
Following DNA sequencing, the run and sequence files are evaluated to
determine the quality of the run. The data is analyzed using a variety of tools.
These steps are the topic of the next chapter.
Questions
1. Explain the concept of sequencing by synthesis.
2. Explain how an added base is detected on the MiSeq FGx and Ion
series instruments.
3. Why are the consumables labeled with barcodes?
4. List some reasons why a sequencing run may fail.
5. What information must be inputted into the sequencer software for
interpretation after sequencing?
Performing Next Generation Sequencing
55
References
ForenSeq™ DNA Signature Prep Reference Guide. August 2020. Accessed May 21, 2021.
https://verogen.com/wp-content/uploads/2020/08/forenseq-dna-signature-prepreference-guide-VD2018005-c.pdf.
Launen, L. “Illumina Sequencing (for Dummies) – An overview on how our samples
are sequenced.” Accessed January 22, 2021. https://kscbioinformatics.wordpress.
com/2017/02/13/illumina-sequencing-for-dummies-samples-are-sequenced/.
Precision ID GlobalFiler™ NGS STR Panel v2 with the HID Ion S5™/HID Ion
GeneStudio™ S5 System Application Guide. Revision 15 November 2018. Accessed
May 21, 2021. https://assets.thermofisher.com/ TFS-Assets/LSG/manuals/
MAN0016129_PrecisionIDSTRIonS5_UG.pdf.
5
Next Generation
Sequencing Data
Analysis and
Interpretation
5.1 NGS Data Analysis
Next generation sequencing (NGS) produces large quantities of data – exponentially more than traditional methods for STR typing and SNP analysis. The data output is also different. Whereas fluorescence intensity data is
recorded by CE instruments, the analogous NGS output is the number of
sequencing reads. Some NGS instruments record photographic images of the
raw data, which are in fact fluorescent dots of clusters on the chip. Software
is used to interpret the fluorescence emission or voltage changes and generate a raw DNA sequence for each cluster. The number of sequences of each
type is counted, and tables are generated showing the number of “reads” or
counts of each sequence recorded. As previously discussed in Chapters 2 and
4, NGS instruments use several different technologies to record the identity of the nucleotide base as it is incorporated. While determining the STR
repeat number and SNP variant base is an aspect of both CE and NGS data
interpretation, there are several new terms that are used to qualify NGS data
that are not applicable to CE. Some are platform-specific and others are more
general to scoring large sequence data sets.
The term cluster density (K/mm2) refers to the number of individual “ islands” or groups of DNA molecules that were amplified into clusters on a flow cell. Each cluster represents a single, unique template on the
Illumina platform. The depth of coverage indicates the average number of
times a sequence is recorded in the process. In NGS protocols, the genome
is fragmented or the targets are preferentially amplified and tagged prior to
sequencing, thus the depth of coverage is an indicator of the strength of the
data obtained. The clusters passing filter (%) indicates percentage of individual clusters the software is able to distinguish. The filter removes the least
reliable data, such as that from overlapping clusters. Phasing is a PCR-based
phenomenon that occurs when a base fails to be added in a sequencing cycle.
Phasing is detected upon comparing the subsequent base sequence addition to other clusters. Prephasing is another PCR-based issue and occurs
when an extra base is added in a cycle leading to a read one base longer
than the other libraries. This is also detected upon sequence comparison.
DOI: 10.4324/9781003196464-5
57
58
Next Generation Sequencing in Forensic Science
Figure 5.1 MiSeq FGx run metrics for a successful sequencing run.
Typically only a small portion of the strands in a cluster, if any, become out
of phase. A screenshot of the Illumina MiSeq FGx run metrics viewed in the
UAS is shown in Figure 5.1.
The Fastq file is the text-based output of the nucleotide-based sequence
and can be analyzed by many software applications. The Q-Score is a quality
measure of Q = −10log 10e where e is the estimated probability that the base
call is wrong. The Q-score is an estimate of the quality of the base call using
a log function. The quality score is based on the Phred score from Sanger
sequencing. The Phred score is a rating of the quality of the nucleotide base
identification in Sanger sequencing. High Q scores indicate a low probability
that the base call is incorrect, while a low Q score reflects low quality or unusable data.
5.2 Verogen Universal Analysis Software
The Illumina MiSeq and the Verogen MiSeq FGx record a photographic
image of all of the fluorescing clusters on the flow cell after the addition of
the base in each cycle. A camera is used to record the fluorescence signal
after each base is added at each cluster location to produce the base read.
(A cluster is needed so that the signal is strong enough to be detected by the
detector.) A good run will reflect clusters that are evenly spaced across the
flow cell. Poor flow cells will exhibit blank areas where clusters were not generated or images showing many overlapping clusters. These images are large
NGS Data Analysis and Interpretation
59
and collectively represent the majority of megabytes of data that is recorded
in each sequencing run.
The MiSeq FGx outputs metrics for cluster density, percent of clusters
passing the filter, and phasing and prephasing that indicate the quality of the
run and issue warning flags when the values are outside the set range. These
metrics and the percentage of the reads completed are shown in the main
run window. The metrics are color-coded green if the run values are within
normal tolerance, orange if a metric is outside of the manufacturer’s target
range, and red if the indicators reflect major problems with the run. The setting for the clusters passing filters is ≥80%. A green light reflects the values at
or above that tolerance. The phasing filter is set at ≤0.25%, and the prephasing
filter is set to ≤0.15%. The main window also shows the flow cell coverage and
has a stop and pause icon and a quality indicator icon. A passing quality is
indicated by a green Q icon and green circles for the read 1, index 1, index 2,
and read 2 indicators (Figure 5.1). When the quality is below the threshold,
the circle turns orange. If all of the metrics are green, the run is of high quality to proceed with data analysis. Approaches for troubleshooting if signal
warnings are indicated are covered in Chapter 6.
Data analysis can be performed with one of several software applications.
The software may be loaded to the local server or be accessed via the cloud.
For research applications, Illumina offers the Sequencing Analysis Viewer
on the cloud to view and analyze the sequencing data. Additional analyses
are traditionally conducted using a variety of applications and scripts on
BaseSpace. For example, the Q-score can be computed. A Q-score of Q30
indicates a high-quality read or that the base call is 99.9% accurate (a 1:1000
chance that the base is incorrect). Additional sequence analysis tools are
described in Section 5.6.
Results from sequencing libraries prepared using the ForenSeq kit
(Table 5.1) can be analyzed using the Verogen ForenSeq Universal Analysis
Software (UAS) or outside of the UAS using BaseSpace or one of many
third-party apps. After the run, the sequencing data is automatically saved
on the instrument, copied to the server, and imported into the UAS. The
UAS enables initial “secondary analysis” and optional analyses and reporting. Forward and reverse reads are paired to create contiguous sequences
which are aligned to the reference genome. The paired-end reads are used
to resolve ambiguous alignments. Multiple samples are sequenced together;
demultiplexing is a bioinformatics-based approach to identify the index
sequences and label them with the appropriate sample name. Thereafter, each
sample can be analyzed separately. The UAS demultiplexes the data using
the user-defined indexes for each sample, generates the raw sequence files,
makes base calls at each locus, and assigns the read to the appropriate STR or
SNP based upon counting from the end of the primers. For example, using
60
Next Generation Sequencing in Forensic Science
Table 5.1 ForenSeq and Precision ID Target Autosomal and Sex Chromosomal
STR Loci
Marker
Repeat
Chromosome
Database
D1S1656
TPOX
D2S441
D2S1338
D3S1358
D4S2408
FGA
D5S818
CSF1PO
D6S1043
D7S820
D8S1179
D9S1122
TAGA
AATG
TCTA
TGCC/TTCC
TCTA/TCTG
ATCT
CTTT/TTCC
AGAT
AGAT
AGAT/AGAC
GATA
TCTA/TCTG
TAGA
1
2
2
2
3
4
4
5
5
6
7
8
9
D10S1248
TH01
vWA
D12S391
D13S317
PentaE
D16S539
D17S1301
GGAA
TCAT
TCTA/TCTG
AGAT/AGAC
TATC
AAAGA
GATA
AGAT
10
11
12
12
13
15
16
17
D18S51
D19S433
D20S482
AAGA
AAGG/TAGG
AGAT
18
19
20
D21S11
PentaD
D22S1045
SRY
DYS391
AMEL-X
AMEL-Y
rs2032678
DYF387S1
TCTA/TCTG
TCTTT
ATT
N/A
TCTA
indel
indel
indel
[AAAG]n GTAG
[GAAG]n
[AAAG]n
GAAG [AAAG]
n [GAAG]n
[AAAG]n
21
21
22
Y
Y
X
Y
Y
Y
Expanded CODIS
CODIS
Expanded CODIS
Expanded CODIS
CODIS
Non-CODIS
CODIS
CODIS
CODIS
Non-CODIS
CODIS
CODIS
Other Autosomal
STRs
Expanded CODIS
CODIS
CODIS
Expanded CODIS
CODIS
Other STR
CODIS
Other Autosomal
STRs
CODIS
Expanded CODIS
Other Autosomal
STRs
CODIS
Other STR
Expanded CODIS
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Precision ID ForenSeq
Amplicon
Amplicon
Size (bp)
Size (bp)
163–211
167–199
163–195
126–190
129–177
167–191
137–299
137–169
143–183
163–227
150–186
151–199
NA
133–192
61–109
137–177
110–203
138–194
98–118
150–312
98–162
72–120
154–226
118–183
82–138
104–132
155–199
129–173
147–207
149–193
149–181
168–273
135–175
NA
124–176
96–140
135–195
229–289
138–186
362–481
132–184
130–154
156–232
155–195
NA
136–272
148–240
125–157
179–245
139–204
168–201
119
130–162
102
108
178–183
NA
147–265
209–298
201–245
NA
119–163
NA
NA
NA
207–263
(Continued)
NGS Data Analysis and Interpretation
61
Table 5.1 (Continued) ForenSeq and Precision ID Target Autosomal and Sex
Chromosomal STR Loci
Marker
Repeat
DYS19
TAGA
DYS385a-b GAAA
DYS389I
[TCTG] [TCTA]
[TCTG]
[TCTA]
DYS389II [TCTG] [TCTA]
[TCTG]
[TCTA]
DYS390
[TCTA] [TCTG]
DYS392
TAT
DYS437
[TCTA]n
[TCTG]n
[TCTA]n
DYS438
TTTTC
DYS439
AGAT
DYS448
AGAGAT
DYS460
ATAG
DYS481
CTT
DYS505
TCCT
DYS522
CTTT
DYS533
ATCT
DYS549
GATA
DYS570
TTTC
DYS576
AAAG
DYS612
[CCT]5[CTT]
[TCT]4[CCT]
[TCT]25
DYS635
TSTA compound
YTAGA
GATA-H4
DXS10074 AAGA
DXS10103 [TAGA]n
[CTGA]n
[CAGA]n
[TAGA]n
[CAGA]n
[TAGA]n
DXS10135 [AAGA]n
GAAA gga
[AAGA]n
[AAAG]n
Precision ID ForenSeq
Amplicon
Amplicon
Size (bp)
Size (bp)
Chromosome
Database
Y
Y
Y
Sex determination NA
Sex determination NA
Sex determination NA
269–309
232–316
236–268
Y
Sex determination NA
283–323
Y
Y
Y
Sex determination NA
Sex determination NA
Sex determination NA
290–334
318–362
194–226
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
Sex determination
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
129–179
167–211
330–402
348–376
129–174
162–186
298–334
186–226
210–226
142–206
163–223
275–296
Y
Y
Sex determination NA
Sex determination NA
242–302
159–187
X
X
Sex determination NA
Sex determination NA
184–244
157–185
X
Sex determination NA
239–312
(Continued)
62
Next Generation Sequencing in Forensic Science
Table 5.1 (Continued) ForenSeq and Precision ID Target Autosomal and Sex
Chromosomal STR Loci
Precision ID ForenSeq
Amplicon
Amplicon
Size (bp)
Size (bp)
Marker
Repeat
Chromosome
Database
DXS7132
DXS7423
TAGA
[TGGA]n
aggacaga
[TGGA]n
ATAG
ATCT
X
X
Sex determination NA
Sex determination NA
175–211
188–220
X
X
Sex determination NA
Sex determination NA
434–458
193–229
DXS8378
HPRTB
n represents number of repeats. NA is not applicable.
Repeats recorded from kit manuals and strbase.nist.gov.
the STR or SNP aligner, the data reads are aligned, converted to allele length
(for STRs), and summed for SNP calls based upon comparison to a reference
sequence.
Errors with index assignments, sample name or user-inputted information can be corrected, and the data can be reanalyzed. The original data,
Version 1.0, is retained when the new analysis is created in the Project with
a New Run Version identified as 2.0. All versions of the data are retained as
separate files. The analytical threshold (AT) can be retained from the manufacturer (Jäger et al. 2017) or set to −2 bp stutter as well as N+1, N−1, and N−2.
Stutter artifacts are PCR misincorporation and sequencing errors. The interpretation threshold (IT) can also be adjusted. The data can be reanalyzed
with new threshold settings such as stutter filter, intralocus balance, analytical threshold, and interpretation threshold. Minor setting changes such
as these are also saved in a new file (e.g., Version 1.1).
Reads/noise is impacted by low template/input quality and proportional
to sample input quantity. Different loci can have different noise levels. In
choosing a threshold, the scientist must consider Global AT, Kit/Locusdependent AT and evaluate which to use by impact on the likelihood ratio
(LR). The sample history is recorded for each action taken by the user, and
user actions or system events can be viewed at the analysis, sample, and locus
level. The sample history can be toggled on and off using a switch. System
events include run completion indicators, analysis completion, and completion of population statistics. User events include genotype edits, comments,
and sample reports generated by the user.
The UAS displays the data in several graphs and charts. Sequencing differences are identified, and the results are displayed pictorially for each locus
of interest. The P representing the positive amplification human sequencing
control (HSC) is displayed with a pass or fail metric based upon the overall
intensity (Figure 5.2). The N representing the negative control will display
the number of STRs and SNPs typed when selected (Figure 5.3). For each
NGS Data Analysis and Interpretation
63
Figure 5.2 Passing HSC in MiSeq FGx sequencing run.
Figure 5.3 ForenSeq sequencing run negative control with no alleles.
sample, a graph is displayed that shows the number (intensity) of reads versus the length of each locus in base pairs (Figure 5.4). A high-quality result
for the positive controls and samples will have a high number of reads for
all loci, regardless of length (Figure 5.5). Degraded or low template samples
often exhibit fewer reads for longer amplicons, and the shorter amplicons are
more likely to lead to success in sequencing (Figure 5.6). Table 5.1 shows the
amplicon lengths at the loci targeted by the ForenSeq and Precision ID kits.
The total number of STR and SNP loci typed for each sample is displayed
under the graph in Figure 5.6.
64
Next Generation Sequencing in Forensic Science
Figure 5.4 UAS reads versus length graph for 2800M.
Figure 5.5 Full profile for 2800M using ForenSeq library prep.
The UAS employs several quality indicators to alert the user to issues
with the data quality. The potential data quality issues include stutter, extra or
missing alleles, read imbalance, low coverage, and alleles below the thresholds.
User edits and alleles not detected also cause loci to be flagged. The stutter flag
indicates that an “allele” is likely a stutter amplicon of another allele at the
locus (Figure 5.7). The allele count flag turns on if more than two alleles have
counts above the IT indicating a possible mixture. The imbalance flag indicates
NGS Data Analysis and Interpretation
65
Figure 5.6 Sample comparison in UAS for 9948 at two input concentrations.
Figure 5.7 D7S820 locus displaying stutter and peak imbalance for a sample.
that the read count ratio is below the intralocus balance setting applied by the
manufacturer or the user. The low coverage flag indicates that the number of
reads is below the IT. The interpretation threshold flag alerts the user when at
least one allele is above the analytical threshold but below the interpretation
threshold. The analytical threshold flag turns on when the number of reads
66
Next Generation Sequencing in Forensic Science
at the locus is below the analytical threshold, and no alleles were above the
interpretation threshold. If no signal is detected for a locus, the not detected
flag is activated. The user modified flag is activated if the user manually edits
an allele.
A screen displays all of the STR loci in boxes with the numerical allele
calls (grey boxes) for each locus and quality indicators (orange boxes), if
any. Selecting each box will display the allele call, a slide switch that turns
the allele call on or off, the number of reads or intensity for each allele, and
detected stutter and the actual sequence of the allele. Being able to evaluate
the full STR sequence is a significant advantage of NGS over CE STR typing.
A graph displays the number of reads versus alleles with thresholds and color
coding to indicate if the alleles are above (blue) or below (grey) the analytical
(dark grey) and interpretation thresholds (light grey). An allele that is above
the analytical threshold but below the interpretation threshold is displayed
in brown. All of the SNP loci are also displayed in boxes showing the heterozygote or homozygote allele calls at each locus (Figure 5.8). SNP imbalance
is displayed with a larger font and can be viewed for each locus in a pie chart
(Figure 5.9). Selecting a box shows switches to turn an allele call on or off,
the total number of reads for each allele and percent of total reads, and a pie
graph that displays the alleles.
If more than five loci have more than two STR alleles or imbalance, the
results are flagged for mixture analysis. Similarly, if more than ten STR loci
are detected to be imbalanced, the data is flagged and a mixture may be
detected. Poor data is also flagged for both STR and SNP data. If poor data is
detected for samples, the controls can indicate if the run or samples were of
Figure 5.8 Relatively Balanced rs12913832 SNP reads for a sample.
NGS Data Analysis and Interpretation
67
Figure 5.9 Imbalanced rs1413212 SNP reads per sample.
poor quality. For example, a run in which the positive control standard ran
successfully and a full profile is produced, but one or more of the samples
produce(d) a partial profile or flagged loci would indicate that the issues are
isolated to the sample(s). If the standard and samples fail to produce a full
profile, this indicates that the issues are global to the run and not necessarily
with the samples. The negative controls should have a low or zero read count,
and ideally, the samples should have a high read count.
An additional analysis that can be performed in the UAS is population
statistics. Computed population statistics indicate the probability that the
sample will match an individual at random in a given population. It is not the
probability that the sample will match an individual in the given population
as the sample may not match any individual in the population. Two methods
of computing population statistics are available: Random Match Probability
(RMP) and LR. The typed samples can also be directly compared with the
results displayed in a Venn diagram. Only loci that are typed in both samples
are included in the comparison. Typed STRs and typed iSNPs are compared
and population statistics can be added for the compared loci. Upon selecting
the box for each locus, the allele call, thresholds, number of reads, and allele
sequences are shown side-by-side for both samples. The software will generate an analysis and display discordant STRs and SNPs and show the number
of intersecting loci. Discordant STR and SNP loci will be shown in boxes
below the number. For the population statistics computations, the loci used
in the calculations must have the correct number of alleles (e.g., one or two
for autosomal loci), and the sex chromosome loci must have the appropriate
number of alleles based upon the called sex using the amelogenin gene.
68
Next Generation Sequencing in Forensic Science
The UAS can be used to create Project Level Reports including an
Autosomal STR Genotype Report with tabulated allele calls at each locus.
These are important because CODIS accepts only the allele repeat number.
Additionally, a Sample Genotype Report can be ordered which includes the
actual repeat sequence for each typed or all alleles at a locus with the number
of reads for each. This report includes data for the autosomal STRs, Y STRs, X
STRs, iSNPs, sample history, and settings. The Sample Genotype Report also
includes graphs for each of the categories above, autosomal STRs, Y STRs, X
STRs, and iSNPs.
Additional or tertiary analysis phenotype and ancestry estimation can be
performed for samples to aid in missing persons or cold case identification
or generate investigative leads. The UAS can perform hair and eye color predictions based upon HIrisPlex loci data from the literature and can estimate
biogeographical ancestry using another SNP panel (Liu et al. 2009, Sampson
et al. 2011, Nievergelt et al. 2013, Walsh et al. 2011, 2014). For the UAS to perform the hair and eye color prediction, all of the HIrisPlex loci must be typed.
The phenotype estimation includes a probability for brown, red, blond, and
black hair color and intermediate, brown, and blue eye color. Figure 5.10
shows the prediction for the standard 2800M. The biogeographical ancestry
tool was trained with 1000 genome project data (The 1000 Genomes Project
Consortium 2012, 2015). In addition to the phenotype prediction, the biogeographical ancestry (BGA) can be predicted. Two component principal
component analysis is used to estimate the sample BGA origin. Samples are
estimated into one of the following categories: Ad Mixed American, African,
East Asian, Eurasian, or may locate to a region between or outside of these
Figure 5.10 UAS phenotype estimate for 2800M.
NGS Data Analysis and Interpretation
69
regions on the graph. The distance to the nearest centroid is also given. There
is no minimum number of SNPs required for this prediction. More detail is
provided in section 5.4.
Finally, the STR and flanking regions can also be analyzed to discover
new, unreported variations and/or additional sources of difference between
two very similar samples that were previously undetected by CE as the length
remains the same. The flanking regions are the genetic sequence between
the PCR primers and the STR repeat region or SNPs of interest. The flanking region can be used for differentiation and deconvolution. These alleles of
the same length exhibiting variation in the sequence are termed isoalleles.
The MICM method for calculating match probabilities using forensic NGS
data trims the data to align to the data in the database. ForenSeq sequences
include 15 bp on the 3ʹ end. The user may need to sequence the flanking
regions using Sanger sequencing for the database so that the complete NGS
data set can be used in database and frequency calculations. Generating the
flanking region report results in an Excel file output. To execute this feature,
the project must be analyzed with “flanking regions” enabled in the analysis
method. The SNP flanking region report includes the “rs” identifier for the
locus congruent with SNPedia and reference of variant SNPs.
To ameliorate some limitations with the UAS, the French National Police
developed Programme d’Interprétation Résultats d’Analyses NGS Hautement
Amélioré (PIRANHA) in R using the UAS summary report (xls) text file to
further interpret ForenSeq data.
5.3 ThermoFisher Converge Software
After an Ion series sequencing run is completed, the data analysis begins
immediately at the local Ion Reporter server. Analysis of sequencing data
from libraries prepared using the Precision ID GlobalFiler NGS STR v2
Panel yields the number of STR allele repeats and the base sequence for each
repeat. Analysis of the sequencing data from libraries prepared using the
HID-Ion Ampliseq™ Ancestry Panel and HID-Ion Ampliseq™ Identity Panel
yield the SNP base calls at the loci. The Torrent Server records and stores
the sequencing data. The data remains on the server, and the data can be
viewed on the Ingenuity Variant Analysis in ThermoFisher Ion Reporter™
Software and server. Alternatively, the data can be stored on the cloud and
analyzed remotely by pointing the browser to 10.65.1.14. The front page of
the Torrent Browser has the login screen and four tiles: plan, monitor, review,
and export. Collaborators can access the data via the browser. The local Ion
Reporter Server System can be used if the lab does not want data in the cloud
to comply with the Health Insurance Portability and Accountability Act of
1996 (HIPPA). The software demultiplexes and sorts the data.
70
Next Generation Sequencing in Forensic Science
A run report is generated for each run. The report includes metrics such
as Ion Sphere™ Particle (ISP) Density, ISP Summary, total bases, total reads,
usable reads, key signal, and, on the bottom, an alignment to a human
genome sequence, hg19. The ISP Density is a heat map that shows the relative amounts of DNA that were loaded on to the semiconductor chip for
sequencing. Red indicates that an excess of DNA was loaded into the wells
on the chip. Some DNA loading is represented by yellow while green and
blue indicate almost no loading. The loaded area of the PGM chip will be in
the shape of a triangular prism. The corners will always be blue indicating
no ISPs. The ISP Summary bars indicate how many of the wells were loaded
with ISPs. The third bar is clonal and polyclonal fragments; a typical value
is 70%. The final bar is the usable sequence library. A histogram of reads
and lengths can be exported as a pdf.
Next, once the metrics have been evaluated, the SNP calls can be viewed
using the links to the HID_SNP_Genotyper plugin. Summary information
about the run is listed at the top. The sample name and barcode ID are shown
on the left and results are displayed on the right. Users can download the
targets and hotspots as BED files and the allele coverage data as a csv or pdf
file. The results section has tabs to display population stats and compare profiles. Within the population stats tab are three tabs: map, results, and genotypes. In the Admixture prediction box, the map view shows the highlighted
Admixture prediction map ancestry of the person in question based upon the
data collected using the 151 aiSNPs. The results tab shows the population and
percentage and a bell curve of the log likelihood with the confidence level of
the prediction. The third tab, Genotypes, displays a table that lists the SNPs
and genotype. The population likelihoods dropdown box shows the hotspot
likelihood on a map, and the results list the population, geographical region,
and RMP for the population groups stored in the database based upon the
allele frequencies programmed into the plugin. The plugin can be customized with allele frequencies from another data set as well. As with any tool of
this kind, the prediction is only as accurate as the database it uses. Although
the ethnic groups for each region are broken out, the cluster of groups from
a region lends confidence in the region prediction. Scrolling further down in
the plugin is allele coverage with table and chart tabs and how many SNPs
failed QC checks. The table includes the chromosome number, position, rs
number, number of reads coverage, number of reads for each base, genotype, and number of reads on the forward (positive) and reverse (negative)
strands. The user can customize the plugin settings to set how many reads are
required to make a call and read imbalances. The chart tab displays the data
in graphical form. The user can scroll to the region of interest and show the
SNPs in that region. The stacked bar graph shows the allele calls at each locus,
each with a unique color, and the vertical axis is the coverage. Hovering the
NGS Data Analysis and Interpretation
71
mouse over each bar shows the rs number, total reads, genotype, and number
of reads for each base and percent. The graph can also be altered to display
the results by coverage. The chart and data table can be downloaded to Excel
or another software program.
The HID_STR_Genotyper plugin includes a profile summary, coverage
plots, and locus data and enables users to evaluate the STR typing results.
The profile summary is a clickable list of every sample in the run and an
STR allele table similar to a GeneMapper report table. Highlighting the row
populates the charts below. The default view is analogous to CE run data with
peaks of varying heights for the alleles. The STR allele subtypes are displayed
with vertical bar graphs with the number of reads labeled on each. The peak
height reflecting the read coverage is analogous to the relative fluorescence
units (RFU) in CE. Hovering the mouse over the bar in the graph yields the
allele call and coverage, and a click will display the sequence. In addition to
the sequence, it displays the SNPs in the upstream or downstream flanking
region which are highly useful in increasing the discriminatory power of a
DNA profile. Further down is the locus data in tabular form. There are three
tabs including genotype, sequence histogram, and Integrative Genomics
Viewer (IGV) link. IGV is a program that displays the sequencing data. The
bases across the bottom represent the reference, and the bars reflect the data
collected in the sequencing run. The colors reflect the forward and reverse
strands. Stutter is reflected by 4 bp gaps in the sequence data. Base variants or
mutations are shown with the one-letter representation of the base call. The
RMP computation includes Y-STR markers.
The ThermoFisher Converge™ Forensic Analysis software merges CE
and NGS data into one analysis. Converge is a case management, kinship,
and paternity case tool. Users can review case status in the searchable Case
Dashboard. The Dashboard lists the Case ID, Case Title, Creation Date,
Owner and Priority, and the case overview, comments, and attachments such
as crime scene photos and reports can be accessed by clicking on the case. A
new case can be created with a title and identification number. A pop-up window enables the user to input the case status, notes, priority, and description
and assign the case to an analyst. New subjects and details can also be created
and associated with the same subject or case. Using the Upload Profiles tool,
the user can upload CE and NGS data created with Applied Biosystems kits.
Profiles using one or more CE kits and NGS are tabulated and combined to
produce a composite profile. Empty cells with no allele call are colored red,
and cells with calls from only one or two experiments are colored yellow.
Loci not included in the kit are colored gray. STR allele calls using NGS can
be displayed as graphs tiled five across on the screen. The graphs plot number
of reads versus alleles. Read balance or imbalance can be evaluated at several
loci simultaneously (Figure 5.11).
72
Next Generation Sequencing in Forensic Science
Figure 5.11 GlobalFiler NGS STR panel v2 data viewed with Converge software.
NGS Data Analysis and Interpretation
73
The allele calls are colored green in the graph, stutter peaks are colored light green, and peaks below the stochastic threshold are colored yellow. Flags at the top of the graph including allele number (AN), off-ladder
alleles (OF), peak height ratio (PHR), and controlled concordance (BST)
are green if they pass and red if they do not. Empty graphs signify loci with
no data. The results at each locus can be probed individually. The allele
calls and other peaks are color coded as in the graph, and the coverage and
sequence are presented in a table. Expected alleles and off-ladder alleles
are listed next to the graph of the locus results. The global parameters
and STR thresholds are also shown to the right of the graph. The software
indicates the analytical and interpretation thresholds in two different gray
colors. All of the typed alleles and number of reads are shown in text and
bar graph formats.
Case notes can be uploaded and viewed with DNA and other case data.
The paternity portion incorporates tools for complex kinship analysis. The
library type can be whole genome. For each barcode name, the sample,
mapped reads, mean depth, and uniformity percentage value are displayed. The user can scroll down to HID_SNP_Genotyper_r94 and click
the HID_SNP_Genotyper.html to view the results and output files by barcode name. For each barcode and sample name, the bases, ≥Q20 bases,
reads, mean read length, read length histogram, and files to download the
data (UBAM, BAM, BAI) are shown. Further down are more tabs including plugin summary, test fragments, chef summary, calibration report,
analysis details, support contact, and software version. The user can download .fastq files and upload them for analysis into other apps or third-party
applications.
For kinship and paternity analysis, users can drag and drop subjects and
create pedigree trees on the platform’s whiteboard. The trees can be tested using
the null and alternative hypothesis. The user can adjust settings including
minimum allele frequency computation strategy (e.g., FIVE_OVER_2N),
fixed maximum allele frequency, mutation analysis model (e.g., MAM_
TWO_PHASE), max mutation step, prior probability, population substructure, included ethnicity, conclusion ethnicity, and a box to check to calculate
PI/RMNE. Incorrect subject linking detected by the software will be highlighted. The analysis results include combined LR, number of incompatible
loci, number of loci excluded, conclusion ethnicity, prior probability, posterior probability, probability of exclusion (PE), and random man not excluded
(RMNE). The software can process trio paternity, trio maternity, duo fatherless, and duo motherless cases. There is a conclusion box for the analyst to type
in a conclusion. Electronic reports can be generated and signed electronically
74
Next Generation Sequencing in Forensic Science
by inputting the username and password. The report can be downloaded and
viewed in Microsoft Word.
After the data analysis steps are complete using Converge or the UAS,
the data can be reported following the lab’s SOP and data can be uploaded
to NDIS and CODIS, if applicable. Data generated with the ForenSeq and the
Precision ID kits have been approved to upload to CODIS.
5.4 Phenotype Analysis Using the Erasmus Server
The SNP loci contained in the HIrisPlex-S assay have been developed over
years into the multiplex. The assay predicts hair color, eye color, and skin tone.
If the HIrisPlex-S loci are not fully typed and not called by the UAS, the data
can be analyzed at the Erasmus server if key loci are typed (e.g., rs 12913832).
A few of the key loci can also lead to a prediction (Figure 5.12). Care must be
taken to submit the typed alleles into Erasmus as the alleles vary on the top
and bottom strands among loci in some kits (Table 5.2). The sequencing data
for K562 were inputted into the Erasmus website (Figure 5.13), and the output
was recorded (Figure 5.14). For comparison, Figure 5.15 shows the UAS prediction output for the same sample. From the data presented here, the human
cell line K562 was likely derived from a woman with red hair, brown eyes,
pale to intermediate skin tone and a European ancestry.
DNA
SNP
rs12913832
rs12203592
AA or AG
Brown or
Green
TT
Green
TT
Blue
rs16891982
rs12896399
GG
Blue or
Green
CC
Brown
GG
Brown
Figure 5.12 Eye color prediction tree using SNPs.
CC
Green
TT
Blue
NGS Data Analysis and Interpretation
Figure 5.13 Erasmus SNP input for K562 prediction of phenotype.
75
76
Next Generation Sequencing in Forensic Science
Table 5.2 Key Notes for Inputting UAS Calls to Erasmus Server
ForenSeq/UAS iiSNPs called on opposite strand of HIrisPlex-S
Loci with alternative names in ForenSeq/UAS as HIrisPlex-S
rs 1042602
rs 12821256
rs 12913832
rs 1393350
rs 2378249
rs 683
rs 885479
N29insA = rs 312262906
Y152OCH = rs 201326893
Loci courtesy of Adam Klavens.
Figure 5.14 Erasmus K562 phenotype prediction.
Figure 5.15 UAS biogeographical ancestry and phenotype prediction for K562
prepared with ForenSeq.
NGS Data Analysis and Interpretation
77
5.5 Other Sequence Analysis Software
NGS produces a lot of data that must be analyzed to extract all of the potential information from the samples. NGS raw data can be output in a variety
of formats. Raw NGS data analysis from any of the library prep kits can be
performed using a software pipeline.
All Illumina instruments output NGS data in the .bcl format which
includes base calls per cycle and quality of each call. If samples are multiplexed, demultiplexing using the bcl2fastq program converts .bcl format
to .fastq format. The .fastq format is the sequencing file format used by the
bioinformatics community. It contains sequence data and quality information. It consists of four lines in each read. First, there is a sequence identifier
(e.g., @SeqID), followed by the DNA sequence, then a plus symbol used as a
spacer, and finally the Phred quality score (Q) (e.g., !”AAA***)%??5)))). With
the fluorescent measurements, there is overlap between the colors and potential overlap of the bases and incorrect calls. The user is not shown the fastq
file in UAS, but it is used to produce graphs and tables. FastQC can perform
quality control on fastq files. It provides a summary report with visuals such
as box and whisker plots. It can be used before and after other programs to
evaluate the quality of the data. Raw fastq files generated using the PowerSeq
46GY System Prototype kit can be analyzed in STRait Razor v2.0 (Riman
et al. 2020).
Raw sequence data can be analyzed and compared to other sequences
including the sequence of a reference genome using sequence alignment.
Sequence alignment enables the detection of variants. Analysis programs
include BWA, Bowtie 2, Maq, Stampy, and Novoalign. Maq and Bowtie use
the computational strategy called indexing to organize data using short
sequences in the files. Maq uses spaced seed indexing. The read is divided into
four segments of equal length called “seeds.” Bowtie uses Burrows-Wheeler
transform. It can analyze the full human genome using only 2 GB of memory, whereas Maq would require more than 50 GB of memory to analyze the
sequences efficiently. If a reference sequence is not available, the sequences
can be aligned de novo using programs including ABySS and SOAPdenovo.
The sequences are compared and checked for overlap to build larger contiguous sequences called contigs until a complete contig is prepared for the entire
genome of the organism. The file produced is called a sequence alignment
map (SAM) file. The SAM file is the universal file for genomic sequence data.
The sequence and quality scores of the mapped reads are contained in the file
as well as the location of the reads in the genome. The compressed binary version of the SAM file is the BAM file. Picard is a command-line tool that can
read SAM and BAM files.
78
Next Generation Sequencing in Forensic Science
Variants can be called using additional programs. The mapped data is compared to the reference genome to identify SNPs, SNVs, and INDELs. Genome
Analysis Toolkit (GATK) and SAMtools mpileup are two major programs
for variant calling. They use Bayesian algorithms to compare the sequences.
The data is outputted in .vcf files. Data visualization tools include Integrative
Genomic Viewer or the UCSC Genome Browser.
RNAseq analysis will differ from whole genome analysis as the reads
will only map to the coding regions of the genome. TopHat and STAR programs handle reads split as splicing junctions. Exome-Seq analysis covers
protein-coding genes. Probes are used that bind to these regions. As the
majority of the regions that code for genes that cause diseases are found in
the exome, this data is rich in answers to clinical questions. Since sequencing data and bioinformatics tools can be overwhelming to the casual user,
preconfigured workflows for various analyses including microbial population detection using 16S metagenomics analysis (Chapter 8) is an area of
effort. The analyses can be saved and edited to achieve the lab’s specific
reporting goals.
5.6 Additional Tools for Mixture Interpretation
While the UAS and Converge have flags and tools for mixture analysis, there
are additional tools that could be employed. Mixture Ace is a tool for the
analysis of NGS mixture data. The Parson ISFG format is verbose but easy
for computers to manage. Fastq files are used to produce graphs that look like
electropherograms and relate peak height ratios. Isoalleles are not stacked
as they would be in CE. Families are color-coded and stutter is assigned
the same color as the parent peak. The STR profile in question can be compared to a reference profile to view which loci are exact matches (orange) or
included (yellow).
ArmedXpert™ is another tool for mixture deconvolution. An audit file
is generated as a csv file with the marker, sequences and length included.
Sequence errors in Illumina data are almost always the same length as the
parent allele. Errors can occur in the allele and flanking regions and are identified by low reads. When a homozygous locus is sequenced, by far all of the
reads will be homozygous but it is expected to observe approximately fifty
sequencing errors per 50,000 reads, each in a different area, as the Illumina
platform makes an error approximately once every 1000 nucleotide bases.
These can be insertions or deletions and can be identified by alignment. The
Promega PowerSeq kit (not NDIS approved at this writing) leads to reads
from both directions which is useful in mixture analysis (van der Gaag et al.
NGS Data Analysis and Interpretation
79
2016). Verogen recommends using data of not less than 650 total locus reads
and 10 read minimum. Mixture Ace can deconvolute microhaplotypes for
mixture analysis and contributor ratio.
5.7 Other NGS Sequence Data Analysis Tools
In addition to the UAS and Converge software developed for forensic applications, there are many additional commercial and open source software tools
available to analyze the massive quantities of sequence data produced with NGS.
Although there are too many to cover in this book, we highlight a few below.
There are sequence alignment and presentation tools such as ExPasy
suite of bioinformatics tools, ClustalW multiple sequence alignment, and
MUltiple Sequence Comparison by Log-Expectation (MUSCLE) (a faster and
more accurate replacement for ClustalW2).
Sequence diversity databases and STR search tools include STRbase
2.0 beta, NCBI Search / BioProject for STRs with Forensic kit annotations,
STRscan, lobSTR, toaSTR, HipSTR, POPSeq, and STRSeq. PopSeq is a
human STR sequence diversity database.
STRSeq is a National Center for Biotechnology Information (NCBI) tool
that catalogues “sequence diversity at human identification Short Tandem
Repeat loci” (Gettings et al. 2017).
The new STRbase 2.0 Beta (https://strbase-b.nist.gov/) hosted by NIST is
more user friendly and searchable than the original format. Now, researchers can search and download information for variable STR alleles and other
human markers including allele size ranges and sequence motifs. Much of the
data for Table 5.1 is found in STRbase 2.0. The NCBI website has a feature by
which users can search by kit such as “ForenSeq” to locate STR microsatellite
target repeat sequences and Accession numbers. The miscellaneous features
(misc_feature) include links to highlight the targets in various kits including
ForenSeq, Precision ID, and PowerSeq 46GY in the genome sequence at the
bottom of the page.
STRs are difficult to genotype due to the high mutability of the repetitive
sequences and PCR stutter errors that result in alignment errors. There are
many sequence handling tools aimed at STR analysis. STRScan is a standalone software tool that uses a greedy algorithm for targeted STR profiling
in next generation sequencing data. STRScan (http://darwin.informatics.
indiana.edu/str/) was tested on the whole genome sequencing data from
Venter et al. (2001) and the 1000 Genomes Project published in Nature. The
results showed that STRScan can profile 20% more STRs in the target set that
are missed by lobSTR and STR-FM. STRScan is particularly useful for the
80
Next Generation Sequencing in Forensic Science
NGS-based targeted STR profiling, e.g., in genetic and human identity testing. lobSTR (http://lobstr.teamerlich.org/) is another tool for profiling STRs
from high-throughput sequencing data that performed well with Y-STR data,
and the toaSTR tool (https://www.toastr.de/) can be used to call STRs from
massively parallel sequencing data independent of the instrument platform
and the forensic kit used. HipSTR (Haplotype inference and phasing for
Short Tandem Repeats) (https://hipstr-tool.github.io/HipSTR) was designed
to perform profiling of heritable and de novo STR variations in genome data
using a specialized hidden Markov model to align reads and phase STRs
using phased SNPs (Willems et al. 2017).
Several tools have also been developed for SNP analysis. These include
SNPedia (a SNP data finding tool), ALFRED (a SNP data finding tool),
SNiPlay (a SNP graphics tool), WebLogo (a SNP graphics tool), SNPServer
(a SNP discovery tool), SNPdetector (a SNP detection tool), QualitySNPng
(a SNP detection and visualization tool) (http://www.bioinformatics.nl/
QualitySNPng/), dbSNP (a SNP location and sequence region tool), and
PredictSNP (a tool to predict SNP disease effect). Scientists can locate
gene and variant information for SNPs, including those for eye color, at
SNPedia. The WebLogo tool displays variations in sequence data using the
size and stacking of the nucleotide base letters. SNPs in genomic data can be
located using NCBI’s dbSNP. The ALlele FREquency Database (ALFRED)
(https://alfred.med.yale.edu) contains gene frequency data for human populations and offers graphics to plot allele frequencies worldwide. SNiPlay
is another tool for displaying SNP data in pie charts and distance trees.
SNPServer (http://hornbill.cspp.latrobe.edu.au/snpdiscovery.html) can be
used to locate candidate SNPs. SNPdetector uses the template and primers
to map the primers, locate SNPs and STRs and genotype SNPs (Zhang et al.
2005). PredictSNP (https://loschmidt.chemi.muni.cz/predictsnp/) can classify the effects of nucleotide substitutions on genetic sequence and amino
acids coded for (Bendl et al. 2016). STRUCTURE can be used to analyze
global human genome datasets and generate neighbor-joining Fst trees
(Lawson et al. 2018).
5.8 NGS Validation and Applications
Several labs have tested and published the results of their implementation
of the commercial NGS kits in the past few years. Jäger et al (2017) published the developmental validation of ForenSeq and interpretation using
the UAS. Labs have conducted an internal validation of the Precision ID
NGS Data Analysis and Interpretation
81
70
60
50
40
30
20
10
0
Length
Sequence
DYF387S1
DYS389II
DYS612
DYS385
DYS448
DYS635
DYS390
DYS481
DYS570
DYS548
DYS437
DYS643
DYS392
DYS576
DYS505
DYS19
DYS438
DYS549
DYS393
DYS391
DYS456
DYS522
DYS533
DYS460
DYS439
Y-GATA-H4
DYS389I
Number of Alleles
Y-STRs Observed in NGS and CE
Y-STR Locus
Figure 5.16 Number of alleles for Y-STRs analyzed using CE and NGS.
GlobalFiler™ NGS STR Panel amplification kit with the Ion Torrent S5™
sequencer (Faccinetto et al. 2019, Tao et al. 2019). Another lab evaluated the
HID-Ion AmpliSeq™ Identity Panel using the Ion Torrent PGM™ platform
(Guo et al. 2016) and the Illumina® ForenSeq™ DNA Signature Prep Kit
using the MiSeq FGx™ (Guo et al. 2017). NIST reported upon sequence variation observed in single-source human DNA samples using the PowerSeq
46GY System Prototype kit with the Illumina sequencing platform (Riman
et al. 2020). Becky Steffen and the Applied Genetics Group at NIST recently
reported on the number of Y-STR alleles observed using CE and NGS at
several loci (Figure 5.16); the sequence variations lead to approximately
twice as many alleles at some loci.
The NGS kits contain the Y-STR loci typically found in supplementary
kits and the aSTR and SNP loci (Figure 5.17). The analytical power of a 1 ng
sample is greatly increased by NGS. Thus, NGS has been applied to case studies and compared to CE with difficult samples. For example, old blood samples from a Chinese Han population were typed revealing a high degree of
polymorphism using MPS (Dai et al. 2019). Chemically compromised human
remains from World War II era mass fatality events, including on the USS
Oklahoma, the Battle of Tarawa, and the Cabanatuan Prison Camps, were
analyzed using five DNA typing methods including mitotyping, autosomal
STR typing using CE using two kits, Y-typing, and NGS (Edson et al. 2019).
CE- and MPS-based DNA typing of petrosal bone and other skeletal remains
was reported by two groups for forensic identification (Kulstein et al. 2018,
Liu et al. 2020).
82
Yfiler
Yfiler Plus
PowerPlex PowerPlex
Fusion
Fusion 6C
PowerPlex
VersaPlex PowerPlex
27PY
Y23
24plex GO! 24plex QS
Figure 5.17 Number of alleles for Y STRs analyzed using CE and NGS.
ForenSeq
Precision
ID GF
PowerSeq
46GY
Next Generation Sequencing in Forensic Science
Y-STR locus
DYS19
DYS385 a/b
DYS389I/II
DYS390
DYS391
DYS392
DYS393
DYS437
DYS438
DYS439
DYS448
DYS449
DYS456
DYS458
DYS460
DYS461
DYS481
DYS505
DYS518
DYS522
DYS533
DYS548
DYS549
DYS570
DYS576
DYS612
DYS627
DYS635
DYS643
Y-GATA-H4
GlobalFiler
GlobalFiler Express
NGS Data Analysis and Interpretation
83
Questions
1. List some tools that can be used for analyzing NGS data.
2. Are NGS reads analogous to fluorescence in CE? Explain.
3. How does the software algorithm locate the STR and SNP loci and
assign alleles?
4. What does it mean if the human sequencing control fails?
5. How many reads are needed for high-quality NGS data?
6. What are some issues with NGS data that can complicate data analysis?
7. Explain how sequence polymorphisms can be used in human identification in cases of identical twins.
8. Explain how the prediction of biogeographical ancestry can be used
in a case.
9. Explain how the prediction of eye color, hair color, and skin tone can
be used in a case.
10. Explain the steps to performing random match probability calculations in the software.
References
Bendl, J., Musil, M., Stourac, J., Zendulka, J., Damborsky, J., and J. Brezovsky.
“PredictSNP2: A unified platform for accurately evaluating SNP effects by
exploiting the different characteristics of variants in distinct genomic regions.”
PLoS Computational Biology 12 (May 25, 2016): e1004962. doi:10.1371/journal.
pcbi.1004962.
Dai, W., Pan, Y., Sun, X., Wu, R., Li, L., and D. Yang. “High polymorphism detected
by massively parallel sequencing of autosomal STRs using old blood samples
from a Chinese Han population.” Scientific Reports 9, no. 1 (December 12,
2019): 18959. doi:10.1038/s41598-019-55282-9.
Edson, S.M. “The effect of chemical compromise on the recovery of DNA from skeletonized human remains: A study of three World War II era incidents recovered
from tropical locations.” Forensic Science, Medicine, and Pathology (November
12, 2019). doi:10.1007/s12024-019-00179-2.
Faccinetto, C., Serventi, P., Staiti, N., Gentile, F., and A. Marino. “Internal validation study of the next generation sequencing of Globalfiler™ PCR amplification
kit for the Ion Torrent S5 sequencer.” Forensic Science International: Genetics
Supplement Series 7, no. 1 (December 2019): 336–338. doi:10.1016/j.fsigss.2019.
10.002.
Gettings, K.B., Borsuk, L.A., Ballard, D., Bodner, M., Budowle, B., Devesse, L., King,
J., Parson, W., Phillips, C., and P.M. Vallone. “STRSeq: A catalog of sequence
diversity at human identification Short Tandem Repeat loci.” Forensic Science
International: Genetics 31 (November 2017): 111–117. doi:10.1016/j.fsigen.2017.
08.017.
84
Next Generation Sequencing in Forensic Science
Guo, F., Zhou, Y., Song, H., Zhao, J., Shen, H., Zhao, B., Liu, F., and X. Jiang. “Next
generation sequencing of SNPs using the HID-Ion AmpliSeq™ Identity Panel
on the Ion Torrent PGM™ platform.” Forensic Science International: Genetics 25
(November 2016): 73–84. doi:10.1016/j.fsigen.2016.07.021.
Guo, F., Yu, J., Zhang, L., and J. Li. “Massively parallel sequencing of forensic STRs
and SNPs using the Illumina® ForenSeq™ DNA Signature Prep Kit on the MiSeq
FGx™ Forensic Genomics System.” Forensic Science International: Genetics 31
(November 2017): 135–148. doi:10.1016/j.fsigen.2017.09.003.
Jäger, A.C., Alvarez, M.L., Davis, C.P., Guzmán, E., Han, Y., Way, L., Walichiewicz, P.,
Silva, D., Pham, N., Caves, G., Bruand, J., Schlesinger, F., Pond, S.J.K., Varlaro,
J., Stephens, K.M., and C.L. Holt. “Developmental validation of the MiSeq
FGx Forensic Genomics System for Targeted Next Generation Sequencing
in Forensic DNA Casework and Database Laboratories.” Forensic Science
International: Genetics 28 (May 2017): 52–70. doi:10.1016/j.fsigen.2017.01.011.
Kulstein, G., Hadrys, T., and P. Wiegand. “As solid as a rock-comparison of CEand MPS-based analyses of the petrosal bone as a source of DNA for forensic identification of challenging cranial bones.” International Journal of Legal
Medicine, 132, no. 1 (2018): 13–24. doi:10.1007/s00414-017-1653-z.
Lawson, D.J., van Dorp, L., and D. Falush. “A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots.” Nature Communications 9
(August 14, 2018): 3258. doi:10.1038/s41467-018-05257-7.
Liu, F., van Duijn, K., Vingerling, J.R., Hofman, A., Uitterlinden, A.G, Janssens,
A.C.J.W, and M.H. Kayser. “Eye color and the prediction of complex phenotypes from genotypes.” Current Biology 19, no. 5 (March 10, 2009): R192–R193.
doi:10.1016/j.cub.2009.01.027.
Liu, Z., Gao, L., Zhang, J., Fan, Q., Chen, M., Cheng, F., Li, W., Shi, L., Zhang, X.,
Zhang, J., Zhang, G., and J. Yan. “DNA typing from skeletal remains: a comparison between capillary electrophoresis and massively parallel sequencing
platforms.” International Journal of Legal Medicine 134, no. 6 (November
2020): 2029–2035. doi:10.1007/s00414-020-02327-8.
Nievergelt, C.M., Maihofer, A.X., Shekhtman, T., Libiger, L., Wang, X., Kidd, K.K.,
and J.R. Kidd. “Inference of human continental origin and admixture proportions using a highly discriminative ancestry informative 41-SNP panel.”
Investigative Genetics 4 (July 1, 2013): 13. doi:10.1186/2041-2223-4-13.
Riman, S., Iyer, H., Borsuk, L.A., and P.M. Vallone. “Understanding the characteristics
of sequence-based single-source DNA profiles.” Forensic Science International:
Genetics 44 (January 2020): 102192. doi:10.1016/j.fsigen.2019.102192.
Sampson, J., Kidd, K.K., Kidd, J.R., and H. Zhao. “Selecting SNPs to identify ancestry.” Annals of Human Genetics 75, no. 4 (July 2011): 539–553. doi:10.1111/j.
1469-1809.2011.00656.x.
Tao, R., Qi, W., Chen, C., Zhang, J., Yang, Z., Song, W., Zhang, S., and C. Li. “Pilot
study for forensic evaluations of the Precision ID GlobalFiler™ NGS STR
Panel v2 with the Ion S5™ system.” Forensic Science International: Genetics 43
(November 2019): 102147. doi:10.1016/j.fsigen.2019.102147.
The 1000 Genomes Project Consortium. “An integrated map of genetic variation from
1,092 human genomes.” Nature 491 (November 1, 2012): 56–65. doi:10.1038/
nature11632.
NGS Data Analysis and Interpretation
85
The 1000 Genomes Project Consortium. “A global reference for human genetic variation.” Nature 526 (October 1, 2015): 68–74. doi:10.1038/nature15393.
van der Gaag, K.J., de Leeuw, R.H., Hoogenboom, J., Patel, J., Storts, D.R., Laros, J.,
and P. de Knijff. “Massively parallel sequencing of short tandem repeats-population data and mixture analysis results for the PowerSeq™ system.” Forensic
Science International: Genetics 24 (September 2016): 86–96. doi:10.1016/j.
fsigen.2016.05.016.
Venter, C.J., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith,
H.O., Yandell, M., Evans, C.A., Holt, R.A., and J.D. Gocayne. “The sequence
of the human genome.” Science 291, no. 5507 (February 16, 2001): 1304–1351.
doi:10.1126/science.1058040.
Walsh, S., Lui, F., Ballantyne, K.N., van Oven, M., Lao, O., and M. Kayser. “IrisPlex:
A sensitive DNA tool for accurate prediction of blue and brown eye colour in
the absence of ancestry information.” Forensic Science International: Genetics
5, no. 3 (June 2011): 170–180. doi:10.1016/j.fsigen.2010.02.004.
Walsh, S., Chaitanya, L., Clarisse, L., Wirken, L., Draus-Barini, J., Kovatsi, L., Maeda,
H., Ishikawa, T., Sijen, T., de Knijff, P., Branicki, W., Liu, F., and M. Kayser.
“Developmental validation of the HIrisPlex system: DNA-based eye and hair
colour prediction for forensic and anthropological usage.” Forensic Science
International: Genetics 9 (March 2014): 150–161. doi:10.1016/j.fsigen.2013.12.006.
Willems, T., Zielinski, D., Yuan, J., Gordon, A., Gymrek, M., and Y. Erlich.
“Genome-wide profiling of heritable and de novo STR variations.” Nature
Methods 14 (April 24, 2017): 590–592. doi:10.1038/nmeth.4267.
Zhang, J., Wheeler, D.A., Yakub, I., Wei, S., Sood, R., Rowe, W., Liu, P.P., Gibbs, R.A.,
and K.H. Buetow “SNPdetector: A software tool for sensitive and accurate
SNP detection.” PLoS Computational Biology (October 28, 2005). doi:10.1371/
journal.pcbi.0010053.
6
Next Generation
Sequencing
Troubleshooting
6.1 Troubleshooting NGS Sequencing
Following the sequencing run, the next generation sequencing (NGS) data
can be interpreted and sufficiently complete profiles can be uploaded to the
National DNA Index System (NDIS). As with CE, issues may arise from time
to time with NGS. There are several points in the protocol and process in
which errors or issues can arise and result in a partial profile or none at all.
These include kit storage conditions, thermal cycler ramping rate or performance, processing time for normalization steps, and room temperature for
library preparation and sequencing. Issues can also arise from analyst errors
or deviating from manufacturer or validated protocols, leaving streaks on the
flow cell, improper instrument settings, mechanical or software failure, or
from using expired kits or recalled lot numbers. Additional issues may occur
in the manufacturing process and may be determined by lot numbers. Trace
and low-quality samples due to degradation, impurities, or random breakage
in the DNA can all lead to problems in sequencing. Data transfers may be
delayed if the instrument software loses connectivity with the instrument.
Processing samples with a different PCR-based STR typing kit followed by
CE can help resolve an issue with the samples or NGS kit or process.
6.2 Troubleshooting MiSeq FGx Instrument Failure
The Illumina MiSeq has demonstrated consistently to be a reliable and
robust instrument which supports Verogen’s decision to adopt the MiSeq and
upgrade it to the MiSeq FGx instrument for forensic applications. However,
as with all machines with mechanical parts, failures can and do happen, even
if they are rare. Troubleshooting some of these possible issues will be the
focus of this section.
When upgrading an existing MiSeq instrument to a MiSeq FGx, the software update will add a new screen from which a user can select research
use only (RUO) or forensic application. Each existing user needs to create
a new username and password in the Verogen Universal Analysis Software
(UAS) to login, even if using the instrument in RUO mode. The new server
must remain connected for the instrument to be able to run. One issue is
DOI: 10.4324/9781003196464-6
87
88
Next Generation Sequencing in Forensic Science
the server can lose connectivity; the problem can be addressed by rebooting the server. If the instrument loses the connection while data is being
transferred to BaseSpace, it must also be restarted to initiate the transfer.
Data from sequencing runs using ForenSeq is automatically transferred to
the UAS. Users can initialize the instrument manually using the MiSeq Test
Software by pressing the spinning arrows icon or shutdown the instrument,
waiting a minute and restarting the instrument and again trying to initialize
the instrument. Another issue that may occur is the MiSeq FGx Y-stage fails
to return to the home position. The user can attempt to return the Y-stage
to the home position via the MiSeq Test Software by selecting the icon with
gears (Figure 6.1). Unfortunately, if the issue persists through the MiSeq Test
Software, Verogen should be consulted for replacement by a field engineer. If
the sequencing run fails to start, the user can wait a minute and restart and
try to start the run again. Similarly, if the MiSeq FGx instrument gets stuck
at a screen position during the wash step and the stop button does not stop
the wash, the instrument will need to be rebooted by toggling the switch to
off, waiting a minute, and toggling it to the on position. After rebooting the
instrument, a maintenance wash should be performed. When loading consumables, the barcode may not be found or the user may have inserted an
expired consumable. Replacing the consumable or manually overriding the
warning will allow the user to initiate the run. After the sequencing is completed, the status of the HSC or PhiX control (RUO mode) should be evaluated to determine if they passed/failed. Failure of the controls could indicate
an issue with the integrity of the control itself or a problem with the library
preparation process. The sequencing run can be repeated or the library prep
Figure 6.1 MiSeq FGx Y-stage home error.
Next Generation Sequencing Troubleshooting
89
Figure 6.2 MiSeq FGx camera focus error.
and sequencing can be repeated. If the HSC/PhiX control passes, but the
phasing/prephasing/clustering flags are raised, the user can repeat the library
preparation steps with high quantity input DNA and perform the bead-based
steps more quickly with fewer samples in a batch to reduce primer dimers
and increase sample target amplicons. If the problem continues after above is
performed, maintenance should be performed to refocus the camera collecting the raw fluorescence data (Figure 6.2).
6.3 Troubleshooting MiSeq FGx Run Failure
Several problems can lead to issues and orange or red flags in the Verogen
MiSeq FGx sequencing run (ForenSeq™ DNA Signature Prep Reference
Guide). If a low cluster density or high cluster density with a low cluster passing filter % is detected, an issue with library normalization or quantity of
input DNA is indicated. Adding too much or low quantity of input DNA,
an incorrect volume of beads in the library normalization step or processing the samples too slowly during the bead-based steps may result in the
analyst needing to repeat the library preparation steps and the sequencing.
The flags may result from issues with PCR ramp failure or improper temperature program. Short or no amplicons and primer dimers indicate that the
library preparation steps need to be redone with the correct PCR ramp settings and/or a working thermocycler. A low cluster density may indicate too
little DNA input, a failure in the library prep steps, or that the diluted library
was insufficiently heated prior to loading to the cartridge for sequencing.
90
Next Generation Sequencing in Forensic Science
These issues were previously discussed in the library preparation section of
Chapter 3 and can be diagnosed using an agarose or polyacrylamide gel or
using a Bioanalyzer or QIAxcel instrument. Issues with high phasing can
be caused by environmental issues such as the room temperature being too
high. The instrument run files can be helpful in troubleshooting (e.g., D:\
Illumina Maintenance Logs->Temperature Log Chiller or ->Temperature
Log flow cell). High prephasing often indicates the need for the instrument
to undergo a maintenance wash. Best practices include performing sufficient
wash steps prior to each sequencing run and refilling the wash tray and bottle after every wash using fresh Tween 20 and bleach wash solutions. Reads
produced with sequence runs that produce orange passing quality metrics
often produce results that are sufficient to be used for analysis. However, this
could indicate that the library preparation steps need to be repeated and the
samples need to be resequenced, or that a new flow cell and cartridge should
be used to resequence the samples.
To obtain the best possible profile for each sample, it is very important
to follow the protocol as instructed by the manufacturer for optimal performance of the ForenSeq kit. Errors in these steps will result in poor sequencing runs. To avoid contamination, aerosol-resistant, filter tips should be used
and changed between reagents and samples. The no template control (NTC)
should be free of allele calls. Reagents should be mixed well but the master
mix should not be vortexed. Plates should be sealed with film and mixed
following the protocol. For example, for optimal PCR performance, reaction components must be well mixed in the liquid and sealed tightly to avoid
evaporation. If the PCR amplification steps did not work properly and adaptors are added but the targets are not sufficiently enriched, adapter dimers can
form in the library normalization step and be carried through to sequencing
at an unusually high proportion to the amplified product. If adaptor dimers
are formed as determined by sizing, the sample can be repurified or reprocessed though library prep. Some protocol deviations may result from laboratory limitations and existing tools and ancillary instruments. For example,
the ForenSeq library preparation steps will still succeed if a shaker that is
limited to 1500 rpm is substituted for one that performs up to 1800 rpm.
Similarly, setting a 4% ramp on the Veriti thermocycler is not optimal but
will not cause the amplification to fail entirely. In contrast, an issue with the
kit components or the PCR 1 step can cause PCR 1 to fail. If PCR 2 is successful, the adaptors and indexes are added to the forward and reverse tags. In the
sequencing run, the quality metrics may all pass, but the only data would be
of the sequenced adaptors because the target was not successfully amplified
in PCR 1, so no alleles will be called. It is extremely important to check the
amplicon sizes after PCR 1 and PCR 2 using a diagnostic gel, Bioanalyzer,
tape station, or fragment analyzer with a suitable ladder to check that the
Next Generation Sequencing Troubleshooting
91
process is proceeding normally to avoid unnecessary time and cost in proceeding with steps that will fail. Correctly amplified targets will be approximately 800 bp. Small base pair fragments indicate an issue to be examined.
As previously described, working quickly through the bead-based steps
for purification and normalization is essential for optimal performance.
Thus, it is best to work with a modest number of samples when performing library preparation. The magnetic beads should be warmed to room temperature prior to use. When working with the magnetic beads for the library
purification and normalization steps, multichannel pipettes should be used
to ensure consistent volumes and quick processing of samples through these
steps. The magnetic beads settle quickly and should be mixed well to resuspend the beads prior to pipetting and pipettes should be checked to see if they
are drawing up equal quantities of beads. When the samples are mixed with
the beads, they should be mixed thoroughly to ensure complete binding. The
supernatant should be drawn off slowly to avoid sample loss by pipetting the
DNA-bound beads. The beads should be washed with freshly prepared 80%
ethanol. The ethanol should be removed completely from the beads and care
should be taken to avoid drawing up the beads. However, care must be taken
so that the beads do not dry out between the ethanol washes and DNA elution steps.
To address the run failure problem further, the data that can be recovered
from a failed sequencing run will vary depending upon when the run failed.
If Read 1 completes but Index 1, Index 2, and Read 2 fail, the data for Read 1
will be lost because the samples are multiplexed and, without the index reads,
cannot be demultiplexed (Figure 6.3). However, if Read 1, Index 1, and Index
Figure 6.3 MiSeq FGx run failure viewed in UAS.
92
Next Generation Sequencing in Forensic Science
2 complete normally, but Read 2 fails, the Read 1 data can be demultiplexed
and assigned. The run will lack the dual confirmation from Read 2. A low
number of reads or cluster density overall may indicate a flow cell issue with
a poor-quality oligo lawn, low template DNA input, or processing issues with
the bead steps during library preparation.
Expectations should be calibrated based on the quality and quantity of
the samples. As with the DNA standard tested with the samples, high-quality
samples with at least 1 ng of DNA in 5 μL added to PCR1 should be expected
to yield a full profile. The full nanogram of input DNA should be used, if possible. Samples can be concentrated to achieve the recommended 0.2 ng/µL
input concentration. Standards including 2800M and the human sequencing
control (HSC) should pass all checks, and a full profile should be achieved
with the 2800M to have confidence in the data quality for the rest of the samples. The NTC should have no allele calls. Female samples should be devoid
of detectable Y markers. If the MiSeq FGx quality flags turn orange or red,
the sequencing run may have to be repeated. However, samples known to be
degraded or contain PCR inhibitors as indicated from the quantification step
should not be expected to yield full profiles using a traditional CE DNA typing method or NGS. Adding more DNA input may be impossible based upon
the evidence sample and DNA yield. Low template samples may not yield
full profiles, and a consensus profile may need to be determined from several
runs (Table 6.1).
6.4 Troubleshooting Ion Series Run Failure
The mechanical parts and consumables are all subject to failure or quality assurance (QA) issues. As with the MiSeq FGx, the ThermoFisher Ion
series instruments have moving parts that can get stuck or fail to initialize.
Manufacturing plants can experience QA issues. The aforementioned thermocycler failures, analyst errors, instrument failures, and consumable issues
can occur with a kit from any manufacturer.
If upon analyzing data from a sequencing run, a base variant is encountered that has not previously been reported or the NGS data is questioned
or questionable for any reason, Sanger sequencing with the ThermoFisher
BigDye Direct sequencing kit can be used to confirm sequence variants identified in NGS runs. In the Ion Reporter software, the user can select “Order
CE primers” or use the ThermoFisher Primer Designer web tool to design the
Next Generation Sequencing Troubleshooting
Table 6.1
93
Troubleshooting the MiSeq FGx Instrument and Sequencing Runs
Problem
Solution
PCR ramp failure or
improper temperature
program
MiSeq FGx instrument
does not initialize
MiSeq FGx Y-stage does
not return to home
Redo library preparation steps with the correct ramp settings and
working thermocycler
Sequencing run is not
starting
Instrument is not
connecting to server
Barcode not found or
expired consumable
Run failed during
sequencing
Low cluster density –
orange or red flag
Run MiSeq Test Software by pressing the gears icon or shutdown,
wait a minute and restart and retry
Run MiSeq Test Software by pressing the spinning arrows icon
and call Verogen for maintenance if the software does not bring
it to the home position
Shutdown instrument, wait a minute and restart and retry
Reboot server
Replace or manually override warning
Check HSC or PhiX control (RUO) to determine if they pass/fail
If the HSC/PhiX control passes, redo library preparation steps
with high quantity input and perform bead-based steps more
quickly with fewer samples in a batch to reduce primer dimers
If the problem continues, maintenance should be performed to
refocus the camera
Increase DNA input quantity and redo library preparation or
retry with carefully heating samples prior to loading them onto
the cartridge or rerun with a new flow cell to address
microfluidics issues
Check room temperature and reduce if needed
High phasing – orange or
red flag
High prephasing – orange Perform maintenance wash with freshly-prepared solutions
or red flag
High phasing, prephasing, Redo library preparation
or low cluster density
PCR primers for the loci of interest. The PCR primers need to be designed with
M13 tails. The Next Generation Sequencing Confirmation (NGC) module in
the cloud can be used to compare the Sanger sequencing and NGS results
using the PCR primers or Assay ID and the .vcf file, respectively (Precision ID
GlobalFiler™ NGS STR Panel v2 with the HID Ion S5™/HID Ion GeneStudio™
S5 System Application Guide).
94
Next Generation Sequencing in Forensic Science
Questions
1. List issues that may arise during the sequencing run on the instrument and cause it to fail.
2. List problems that can occur if the sample residence time is too long
during the bead-based normalization steps.
3. List outcomes that may be observed with incorrect PCR ramp, temperature, and hold settings.
References
ForenSeq™ DNA Signature Prep Reference Guide. August 2020. Accessed May 21, 2021.
https://verogen.com/w p-content/uploads/2020/08/forenseq-dna-signatureprep-reference-guide-VD2018005-c.pdf.
Precision ID GlobalFiler™ NGS STR Panel v2 with the HID Ion S5™/HID Ion
GeneStudio™ S5 System Application Guide. Revision 15 November 2018.
Accessed May 21, 2021. https://assets.thermofisher.com/TFS-Assets/LSG/
manuals/MAN0016129_PrecisionIDSTRIonS5_UG.pdf.
Mitochondrial DNA
Typing Using Next
Generation Sequencing
7
7.1 Introduction to Mitochondrial DNA Typing
Although crime labs primarily collect short tandem repeat (STR) nuclear
DNA profiles for forensic DNA typing because they allow for superior statistical analysis and human differentiation, mitochondrial DNA (mtDNA)
profiles provide forensic scientists with a valuable tool for identifying maternal lineages, performing genetic genealogy, and identifying individuals when
the DNA recovered is damaged, degraded, or too low in quantity for STR
analysis (Wallace et al. 1999, Eduardoff et al. 2017, Rathbun et al. 2017, Strobl
et al. 2019). Mitochondrial DNA typing is valuable for analyzing challenging
samples and providing investigative leads in missing persons cases (Cuenca
et al. 2020), mass disaster cases (Biesecker et al. 2005), cold cases, and historic
investigations (Hickman et al. 2018, Buś et al. 2019, Ambers et al. 2020); evaluating mother-child pairs (Ma et al. 2018); and differentiating monozygotic
twins (Wang, Zhang et al. 2015, Wang, Zhu et al. 2015). Mitochondrial DNA
typing has been used to analyze hair shafts, bone, teeth, and degraded or
compromised samples, and to analyze DNA adhering to an earphone (Ivanov
et al. 1996, Holland and Parsons, 1999, Seo et al. 2002, Remualdo and Oliveira
2007, Chaitanya et al. 2015, Lee et al. 2015, Parson et al. 2015, Marshall et al.
2017, Gallimore et al. 2018, Gaag et al. 2020, Kim et al. 2020).
Mitochondrial DNA typing has been performed in cases since 1996
(Ivanov et al. 1996). The mitochondrion is an unusual organelle that possesses its own genome outside of the cell’s nucleus. Its genome is comprised
of 16,569 bp (Anderson et al. 1981). Its small circular structure means it often
remains intact when nuclear DNA is degraded. Human egg and sperm cells
contain mitochondria; however, upon fertilizing an egg, the sperm mitochondria are degraded by an endonuclease (Chan and Schon 2012). While
there is only one copy of the nuclear DNA genome in each cell, there can
be as many as one to fifteen copies of the mitochondrial chromosome per
mitochondria, up to 1000 mitochondria per cell, and therefore as many as
hundreds to thousands of copies of the mitochondrial genome in each cell
(Budowle et al. 2003). A DNA sample with no detectable copies of genomic
DNA can have over 20,000 copies of mitochondrial DNA (Parson et al. 2015).
Mitochondrial DNA typing can be used to determine if bones and teeth
derive from the same or a different skeleton or family. Thus, the high copy
DOI: 10.4324/9781003196464-7
95
96
Next Generation Sequencing in Forensic Science
number of mitochondrial chromosomes in cells makes them ideal for genetic
analysis and use as a matrilineal screening tool.
7.2 The Sequence of the Mitochondrial Chromosome
Routine sequencing or variable position typing of mtDNA has enabled its
use in forensic DNA screening assays and haplotype determination. The
mitochondrial chromosome originally sequenced in 1981 is referred to as
the Anderson sequence or Cambridge Reference Sequence (CRS) (Anderson
et al. 1981). With improved DNA sequencing technologies, the same sample was resequenced in 1999 and is referred to as the revised CRS (rCRS)
(NCBI NC_012920) (Andrews et al. 1999). Resequencing the CRS addressed
technical errors that had been flagged over the years through public inquiry.
Base changes in the CRS were identified by position number. The variations
between the CRS and rCRS are listed in Table 7.1.
The mitochondrial chromosome is largely conserved as it contains genes
that code for proteins essential for metabolism; mutations can lead to some
diseases. It includes thirty-seven genes that encodes twenty-two tRNAs, two
rRNAs, thirteen proteins involved in oxidative phosphorylation, and a 1122
bp “control” region of non-coding DNA (Butler 2005). The control region
is not known to code for any medically or phenotypically significant genes.
The mitochondrial chromosome does not contain STRs for use in identity
typing. However, variation in the mitochondrial genome has been studied
extensively. The control region has been found to contain many SNPs, and
Table 7.1 Variations between the CRS and rCRS Mitochondrial
Chromosome Sequences
Base Position
CRS
rCRS
Comment
311–315
3106–3107
3423
4985
9559
11,335
13,702
14,199
14,272
14,365
14,368
14,766
CCCCCC
CC
G
G
G
T
G
G
G
G
G
T
CCCCC
C
T
A
C
C
C
T
C
C
C
C
5C instead of more common 6C
sequencing error, 3107del*, gap denoted by N
sequencing error
sequencing error
sequencing error
sequencing error
sequencing error
sequencing error
error due to bovine DNA
error due to bovine DNA
sequencing error
error due to HeLa DNA
Source: Data from Andrews et al. (1999).
Mitochondrial DNA Typing
97
additional SNPs have been found scattered in the base sequence between the
coding genes. The SNP variations have found uses in matrilineal typing and
to distinguish individuals (Coble et al. 2004, Warner et al. 2006, Fridman
and Gonzalez 2009, Holland et al. 2018). Mitochondrial DNA typing can
be used to determine if bones and teeth derive from the same or a different skeleton, family relationships, and maternal relatives in genetic genealogy (Bruijns et al. 2018). Mitochondrial DNA typing was used to assign the
remains interred in the Tomb of the Unknown Soldier at Arlington National
Cemetery to First Lieutenant Blassie who served in the Vietnam War and
died in 1972 (Butler 2005). Within the control region, there are three regions
that have been found to contain the most variation and are referred to as the
highly or hyper variable (HV) regions I, II and III. (Fridman and Gonzalez
2009). The most common variations are shown in Table 7.2. Data from HV1
and HVII are most often used to differentiate individuals and families, with
HVIII used to resolve indistinguishable HV1/HV2 samples (Budowle et al.
1999, Bini et al. 2003, Fridman and Gonzalez 2009). Other SNPs are located
between and around these three hypervariable regions. As mtDNA mutation
rates are ten to twenty times higher than nuclear DNA genes due to the low
fidelity of the mtDNA polymerase and its lack of repair mechanisms, it can be
used as an ancestral clock (Wallace et al. 1999, Budowle et al. 2003). The average nucleotide variation in these regions is estimated at 1.7% so variations are
clocked in only a few generations.
The frequencies of variation at these sites has been studied extensively
and are cataloged in the MITOMAP database (MITOMAP). MITOMAP
contains a compilation of mtDNA SNP data from diverse populations
worldwide and includes the observation frequency of SNPs. A mtDNA
Table 7.2 Frequently Probed Mitochondrial DNA SNP Positions in the Variable
Regions (HVI, HVII, and HVIII) and Outside the Control Region (Other)
HVI (16,024–16,365)
16,051
16,093
16,126
16,129
16,223
16,270
16,278
16,304
16,309
16,311
16,362
HVII (73–340)
HVIII (438–574)
Other
73
146
150
152
189
195
198
200
247
310
477
489
3010
4580
4793
5004
7028
7202
10,211
12,858
14,470
16,519
98
Next Generation Sequencing in Forensic Science
mutation occurs approximately once every 8000 years. An mtDNA haplogroup is defined by differences in the mtDNA sequence. A haplogroup
may vary by only one SNP from another haplogroup and are named with
letters from A to Z in order of their discovery. The International HapMap
Project seeks to develop a haplotype map of the human genome and can be
used to find genetic variations implicated in disease and geographic genetic
origins.
7.3 Mitochondrial DNA Typing Methods
Prior to NGS, mtDNA SNPs were detected using one or more of several
available assays including a primer extension assay (Vallone et al. 2004),
mini-sequencing assays (Gabriel et al. 2001), SNaPshot assays (Quintáns
et al. 2004), denaturing high performance liquid chromatography (LaBerge
et al. 2003), a three-dye fluorescence labeling mitochondrial-SNP kit called
Expressmarker mtDNA-SNP60 (Zhang et al. 2018), and polymerase chain
reaction (PCR) high resolution melt (HRM) analysis (Dobrowolski et al. 2009,
Elkins 2013). A drawback of these methods is the limited sequence information and discriminating power. Depending upon the approach and the SNPs
of interest, mtDNA typing could take a few hours to a couple weeks using
these methods. Human mtDNA standard reference material (SRM) 2392 and
SRM 2392-I have been produced by the National Institute of Standards and
Technology (NIST) for use a positive control in amplification and sequencing
(Levin et al. 2003).
7.4 Mitochondrial DNA Typing Using
Next Generation Sequencing
NGS using kits and methods for control region and whole mitochondrial
chromosome sequencing offer more discrimination power than previous methods. Additionally, applying NGS to mtDNA forensic DNA typing
for bone, teeth, and hair human remains and degraded DNA samples for
which only partial or no STR profile was recovered allows forensic scientists
to obtain some genetic information on these samples. For example, Ambers
et al. (2006) applied NGS to human remains found in Deadwood, South
Dakota and determined the skeleton to be of European (Caucasian) ancestry,
concordant with the anthropological findings. There are several commercial
and published approaches for mitochondrial DNA typing using NGS including the AFDIL method (Fendt et al. 2009), the ForenSeq mtDNA Control
Region Kit (Verogen), ForenSeq mtDNA Whole Genome kit (Verogen), the
Mitochondrial DNA Typing
99
QIAseq Human Mitochondria Panel (Qiagen), QIAseq Investigator Human
Mitochondria Control Region Panel (Qiagen), Precision ID mtDNA Control
Region Panel (ThermoFisher Scientific), and the Precision ID mtDNA Whole
Genome Panel (ThermoFisher Scientific). The commercial library preparation
kits include all of the required multiplexed primer sets, PCR reaction mixes,
control standards, indexes, purification reagents, normalization beads, and
buffers. AFDIL’s published approach details their method and researchers
can obtain all of the required library preparation primer sequences, reagents,
and consumables from suppliers and replicate the process in their labs at low
cost (Fendt et al. 2009). SRM 2392 consists of three mitochondrial genome
samples. SRM 2392-I consists of the HL-60 cell line. These standards have
been fully typed. Control standard samples are used to determine concordance of the NGS results with sequence data produced by Sanger sequencing,
and preliminary evaluations of the commercial kits have demonstrated them
to be accurate in their performance.
In cases in which samples are damaged, old or compromised, little DNA
may be recovered and it may be of low quality. Scientists can frequently obtain
mtDNA profiles from less than a picogram of total DNA. Working with such
low quantities of DNA requires a very clean lab and commitment to contamination elimination measures. Prior to proceeding to library preparation, the
co-extracted nuclear DNA may need to be digested to avoid interference with
the mtDNA assay primer set target amplification. DNase I selectively digests
nuclear DNA. Restriction enzymes, such as HaeIII, CfoI, or MspI, that selectively digest GC-rich (nuclear) DNA can be used. Relatively large fragments
of multi-copy mtDNA result. Alternatively, for degraded and low-quantity
templates and as an alternative to DNA digestion, whole genome amplification (WGA) can be performed prior to library prep to enrich the mtDNA targets. One such kit is the Qiagen REPLI-g Mitochondrial DNA kit. REPLI-g
can be used to amplify human and non-human mtDNA and increase the
sensitivity of NGS.
The ForenSeq mtDNA Control Region Kit targets the control region
range 16008–594 (16008–16569 and 1–594) (Walichiewicz et al., 2019). The
kit employs two primer sets of 122 primers to generate 18 primary amplicons less than 150 bp in length spanning and overlapping in the mtDNA
control region where the majority of the variation is located and is based on
research performed by McElhoe et al. (2014). To ensure there are no gaps in
the sequence when aligning the sequences using bioinformatics, all of the
amplicons overlap by ≥3 bp. If desired, custom primers can also be integrated
in the platform. The recommended DNA input is 50 pg for each initial PCR
reaction or a total input of 100 pg of gDNA per sample. Successful profiles
have been demonstrated with a 12 μL maximum input of DNA extract from
teeth, bone, or buccal cells and 0.5 cm of hair shaft (Gallimore et al. 2018).
100
Next Generation Sequencing in Forensic Science
The included positive control is HL-60. As with the ForenSeq Signature
Prep kit described in Chapter 3, the library preparation steps amplify and
enrich the target regions and add the forward and reverse tags and the adaptor and index sequences. PCR1 leads to enrichment of the targets and addition of the forward and reverse tags in two reactions for each of the samples.
During PCR2, the PCR1 products are pooled, and the i5 and i7 adaptors and
indexes are added. The average primary amplicon size is 118 bp, while some
are as small as 61 bp and the largest amplicon is 458 bp. The short amplicons are intended to lead to optimal results when amplifying degraded DNA.
To check the quantity, size, and quality of the libraries, the samples can be
analyzed using an agarose or polyacrylamide gel or using the Agilent DNA
1000 kit using the Agilent 2100 Bioanalyzer system. Following the PCR1
and PCR2 library preparation steps, either of two methods can be used for
library normalization: a bead-based normalization (BBN) and a fluorimetric
quantification-based normalization. Normalization is performed to achieve
more equal cluster densities on the flow cell and therefore better detection
of each sample upon pooling. Unfortunately, even though all or most of the
low-quantity samples will bind the beads, normalizing the high-quantity samples with the low-quantity samples can lead to a lowered cluster density of all
samples. The beads have a maximum binding capacity, and the high quantity
will be reduced when the beads reach their binding capacity. The bead-based
method is the standard method for high-throughput and can be automated.
The quantification-based method is low-throughput but can improve sample
representation for low input samples (<20 pg), which result in lower overall
library yield. In a recent study, smaller number of samples in a run, manual
quantitation and normalization led to four times greater coverage per sample
as compared to BBN with no adverse effects to read calling (ISHI poster). In
the bead-based normalization procedure, the prescribed quantity of beads
are added to each sample. The bead-based normalization can be performed
using the beads included in the ForenSeq kit or a different kit such as the
Nextera® XT DNA Sample Preparation kit can be used. The Verogen method
employs Illumina technology reaction mix cartridges and flow cells, and the
DNA fragments are sequenced using the Illumina MiSeq instrument that has
been upgraded to the Verogen MiSeq FGx model. According to the manufacturer, up to 48 mtDNA D-loop HV samples can be multiplexed in a sequencing run on a MiSeq Micro Reagent Kit v3 flow cell (2 × 151 cycle run with dual
index reads) and >48 can be sequenced on a standard flow cell. The amplicon start-end locations are 29–285, 172–408, 16997–16236, and 16159–16401.
The mtDNA whole genome samples are sequenced using a 2 × 251 cycle run
with dual index reads (amplicon start-end: 9397–1892 and 15195–9796). The
samples and indexes are assigned in the UAS. Each sequencing run completes
in approximately 18 hours.
Mitochondrial DNA Typing
101
There are two mtDNA NGS options available from ThermoFisher: the
Precision ID mtDNA Whole Genome Panel kit and the Precision ID mtDNA
Control Region Panel kit. Both are available in 48 or 96 sample options. The optimal input DNA for the Precision ID mtDNA Whole Genome Panel kit is 125 pg
but as little as 2 pg of DNA can be used for it and the control region kit. The whole
genome panel consists of 2 primer pools of 81 primers each. The mtDNA Whole
Genome Panel kit employs a tiling approach employing 162 primer sets in two
reactions and yields amplicons of only 163 bp, on average, in length with an average overlap of 11 bp. The Precision ID mtDNA Control Region Panel kit uses the
same tiling approach with an average amplicon length of 153 bp and an average
overlap of 18 bp. The control region kit targets the 1.2-kb control region (16024576) which encompasses the HVI, HVII, and HVIII regions rather than the full
16,569 bp genome. NGS chip preparation can be performed on the Ion Chef
or prepared manually and sequenced on an Ion GeneStudio S5 System instrument. When using the Ion Chef, there are only five pipetting steps and forty-five
minutes of hands-on time as the rest is performed by the robot. On an Ion 510
Chip, thirty-seven samples can be run prepared using the mtDNA control region
library. On an Ion 520 chip, fifty-six samples can be run simultaneously. When
working with a mtDNA whole genome library, twenty-five samples can be run on
the Ion 520 chip and thirty-two samples can be run with the Ion 530 chip.
Qiagen’s whole genome and control region NGS approaches, QIAseq
Human Mitochondria panel and QIAseq Investigator Human Mitochondria
Control Region panel, are built upon primers developed at AFDIL; the targets are amplified using the Qiagen Multiplex PCR Kit and can be sequenced
using any Illumina NGS platform including the HiSeq, NextSeq, MiSeq, or
MiniSeq. The QIAseq 1-Step Amplicon Library Kit is used to prepare DNA
libraries for NGS applications. The Qiagen GeneRead Adapter I Set A 12-plex
includes twelve barcoded adapters for ligation to the DNA library. The QIAseq
Index Kit is used for adding indices prior to NGS. The data can be analyzed
using Illumina BaseSpace apps.
The Illumina mtGenome procedure employs long PCR primers developed
by Mark Wilson’s lab that can be amplified using the Illumina Nextera XT
library preparation kit with TaKaRa LA Taq DNA polymerase and sequenced
using the Illumina MiSeq v2 (2×150 cartridge). This approach was tested on
control DNA and non-probative case samples (Peck et al., 2018). Illumina’s
human mtDNA D-Loop hypervariable region protocol and sequencing using
the MiSeq was also tested with the TaKaRa Ex Taq® HS proofreading enzyme
for hair shafts (Holland, Wilson et al. 2017).
The AFDIL whole mitochondrial genome method of Fendt et al. (2009)
can be performed using the Kapa Hyper Plus Library Kit to amplify the long
PCR primers using TaKaRa LA Taq (GC Buffer and BSA) and sequenced
using the Illumina V3 (2×300 cartridge).
102
Next Generation Sequencing in Forensic Science
7.5 Mitochondrial Sequence Data
Interpretation and Reporting
Some mitochondrial chromosomes may contain mutations and differ in sequence from the other mitochondrial chromosomes in a person.
Mitochondrial DNA typing can be used to screen for heteroplasmy variants
in subpopulations of mitochondria (Li et al., 2010, Just et al., 2015, Holland
et al., 2018). NGS has been evaluated by several groups to assess heteroplasmy
and determine haplogroup (Li et al. 2010, Just et al. 2015, Holland, Pack et al.
2017, Cho et al. 2018, Kim et al. 2018, Strobl et al. 2019). Heteroplasmy is
a term used to describe the occurrence of more than one type of organelle
genome and can be indicative of mitochondrial diseases. In heteroplasmy,
the mutation or variant mtDNA mixes with the “normal” type mtDNA in the
cell. An oocyte may contain a mitochondrion with a mutation in its DNA; this
mitochondrion can be replicated and the variation transmitted to new cells
upon cell division. Additionally, cells can accumulate mutations over time,
and the aggregate is observed in the sequencing data. The mutated mitochondrial chromosomes accumulate in the cell and can begin to predominate and
influence the cellular and individual genotype. Upon fertilization of an egg
cell, the heteroplasmy can be transmitted differently to different organs and
tissues and produce a mosaicism. Issues with mtDNA typing including heteroplasmy need to be considered when analyzing mtDNA data. Triplasmy is
a term used to describe heteroplasmy at two sites in an individual. In forensics, heteroplasmy can be used to further discriminate within a maternal line,
since the variants or mutations detected in an individual may not appear in
other individuals in the same maternal line. For example, in a recent study
using SRM 2392 DNA, all sequence calls were concordant with the Certificate
of Analysis from NIST with the exception position 64 in which heteroplasmy
(ISHI poster) was detected using NGS using ForenSeq that was undetected
using Sanger sequencing (Peck et al. 2018).
After sequencing, the evidence data is reported as compared to the known
rCRS sequence by base position number and variant (e.g., 73A). An N is used
to denote a base that cannot be unambiguously determined. Confirmed heteroplasmy is reported as R for A/G and Y for C/T. If there are insertions, a
“.1” is added directly to the number position of the insertion (e.g., 315.1C for
6Cs instead of 5 following the T at 310 prior to 316). A deletion is denoted by a
“D,” “d,” or “del” following the position where the deletion was observed (e.g.,
309D, 309d, or 309del). The sequences of the questioned samples are compared to reference known samples, if available. The haplotype is determined
using databases such as the European DNA profiling (EDNAP) mtDNA population database (EMPOP) and frequency is determined (Butler 2005).
Mitochondrial DNA typing data analysis of ForenSeq mtDNA Control
Region and Whole Genome data can be performed with the upgraded
Mitochondrial DNA Typing
103
Figure 7.1 Variation between SNP 73 in HL-60 as compared to the rCRS.
Verogen Universal Analysis Software (UAS) v2. The software tool is based on
mtDNA BaseSpace applications, and the mtDNA Variant Processor is an app
on Illumina’s BaseSpace cloud computing platform. The software analyzes
the raw data retrieved from the MiSeq FGx instrument, shows a coverage
map, and calls the base positions as compared to the rCRS (Figure 7.1).
The UAS can be used to assess quality metrics from a sequencing run,
evaluate reads per sample, view the SNP base at each position, compare the
variable positions to the rCRS, and view the number of reads at each SNP.
Figure 7.2 shows the reads per sample in a graph generated by the UAS.
Figure 7.2 Reads per sample.
104
Next Generation Sequencing in Forensic Science
Figure 7.3 Insertion at 315.1 in the HL-60 standard as compared to the rCRS.
The mtDNA Variant Processor app performs adaptor trimming, alignment, primer, variant calling, and performs variant call format (.vcf) file
output. In the trimming step, the adaptor sequences are removed from the
forward and reverse reads until only three adaptor bases are found on each
end of the read. Incomplete amplicons and the reads that were trimmed excessively are discarded. To perform base calling, the sequences are aligned from
the true start of the circle using BWA-MEM with parameters optimized for
homopolymeric C-stretches (e.g., HVII that can lead to discordance). Indels
are realigned and a 3ʹ alignment is used in C-stretches. Figure 7.3 shows the
insertion at 315.1 in the HL-60 standard as compared to the rCRS. Primer
contributions are removed from the reads to accurately call the variants.
Using all of the reads, the data is compiled to identify the consensus base
for the allele call. The quality score is used to filter the bases prior to use in
calling. A minimum read count is used for calls; locations in which the reads
or noise fall below the minimum are flagged. A score is calculated for each
called position based upon the BaseQ, MapQ, and analysis threshold scores.
At each SNP position, the base call may match the rCRS, differ from the
rCRS, or reflect an insertion or deletion from the rCRS sequence.
A full profile is expected for the positive control while the negative or no
template control should yield no reads. The total read count and percentage
of each base detected at each position is also numerically and graphically
presented. Figure 7.4 shows the reads and calls for a sample at three concentrations (1, 5, and 100 pg) at position 489. A purple warning flag indicates
that the number of reads for a SNP is below the interpretation threshold (IT)
Mitochondrial DNA Typing
105
Figure 7.4 Total reads and calls for a sample at three concentrations (1, 5, and
100 pg) at position 489.
Figure 7.5 A purple warning flag indicates that the number of reads for position
523 is below the IT in a 1 pg sample.
at position 523 in the 1 pg sample (Figure 7.5). The mtDNA Variant Analyzer
BaseSpace app gives a visual representation of the read coverage at each base,
variations from the rCRS, modified bases, and bases flagged with colorimetric indicators and enables the user to advance the dial through the mitochondrial base locations. Filters can be used to restrict the view to the hotspots.
106
Next Generation Sequencing in Forensic Science
The Sample Compare mode opens a new window and includes the base call,
paired end depth, variant quality score, and base read percentages. Any sample can be assigned as the reference and used to generate comparisons and
call differences to other samples. Users can compare results within a run or
those generated over time. The UAS does not have a tool for haplotypes, but
the generated report can be cut and pasted into the information into EMPOP
database for the prediction (https://empop.online/). Reports can be generated
to summarize and output the results. The variant report can be exported to
Excel. The BAM can be used for additional investigation with other tools
including integrative genomics viewer (IGV) for alignments.
Mitochondrial DNA typing data analysis of Precision ID mtDNA Control
Region Panel, and Whole Genome Panel can be performed in the Applied
Biosystems Converge Software v2.1 NGS Data Analysis module released in
2018 which is approved for uploading the data into CODIS. Converge combines CE and NGS data analysis into one platform. Mitochondrial DNA analysis can be difficult due to mixtures, heteroplasmy, insertions, and deletions
that can complicate alignments. The Precision ID mtDNA sequence data
varies in read depth just as the ForenSeq sequence data does. Like the UAS,
the Converge software automates base calling and alignment, detects heteroplasmy to ~10%, identifies variants, and allows labs to look at sequences
throughout the control region. It can filter common variants, deleterious
variants, and other variants. Converge includes annotation information from
more than twenty databases. The top tabs include Home, Samples, Analyses,
and Workflow. There are quick links to access samples, workflows, and analyses. By selecting the Workflow tab, the user can create a new workflow. The
first step is to name the workflow and add a description. The subsequent steps
include reference, annotation, filters, plugins, final report, parameters, and
confirm. Data can be displayed as a variant impact tree or a copy number
variation (CNV) heat map. Compared to Sanger sequencing, variant calling
with SRM 2392 sequenced with NGS showed little discordance and was limited primarily to the 309 position.
Other software available for the analysis of NGS mitotyping data is the
Softgenetics GeneMarker HTS, which is compatible with forensic nomenclature and can be used to produce an EMPOP formatted report. The CLC
Genomics Workbench can be used to analyze AFDIL and Qiagen NGS
data using the AFDIL-QIAGEN mtDNA Expert (AQME) tool. AQME
generates an editable mtDNA profile consistent with forensic naming conventions and reporting information. AQME also estimates mtDNA haplogroup, has optional outputs including SNP metrics, exportable files, and
is compatible with any sample type, library preparation kit, or NGS platform (Sturk-Andreaggi et al. 2017). Van Neste et al. (2014) disseminated
tools including Python scripts and My-Forensic-Loci-queries (MyFLq) for
Mitochondrial DNA Typing
107
mitochondrial sequence data analysis. Liu et al. (2018) performed a review of
bioinformatics tools and methods for forensic DNA analysis for users seeking
additional information and options.
In evaluations of the several kits and methods, variations have been
observed in read depth among the methods. NIST reported on their testing of the Nextera® XT DNA Sample Preparation Kit and sequencing with
the Illumina MiSeq™ in 2014 (King et al. 2014). More consistent coverage
and less likelihood of dropout have been obtained using the AFDIL Whole
mtGenome Method (Kapa Hyper Plus Library Kit) than the Nextera® Library
kit whole genome method in a performance analysis at NIST. McElhoe and
Holland (2020) reported upon the discrimination of heteroplasmy from system noise and substitution and sequence specific errors in MiSeq NGS data.
NGS has improved mixture analysis for autosomal DNA typing. Three
or more heteroplasmic sites in HV1 and HV2 usually indicate a mixture, but
mtDNA typing is not used for mixture analysis. In addition, nuclear pseudogenes (chromosome 11) can amplify and contaminate mtDNA sequence; this
can be reduced or eliminated by using WGA to enrich the mtDNA or DNase
I digestion to eliminate nuclear DNA.
7.6 Recent Reports of Mitotyping Using
NGS for Forensic Applications
There has been a flurry of interest in massively parallel sequencing (MPS)
tools for mitotyping in forensic testing over the past five years. Many
teams have evaluated MPS in their labs using the commercial kits we have
described. Kim et al. (2015) reported on massively parallel sequencing using
the 454 GS Junior instrument to sequence the mitochrondrial HVI and HVII
regions, and Churchill et al. (2016) reported on the use of the Ion PGM™ for
sequencing the whole mitochondrial genome and the HID-Ion STR 10-plex
panel. Davis et al. (2015) reported upon the use of NGS for buccal swab, bone,
and tissue samples using the Nextera® XT kit and the MiSeq and found that
the profiles were concordant and that the method was able to resolve length
heteroplasmy. Woerner et al. (2018) reported on the use of the Precision ID
mtDNA whole genome panel using the Ion S5 and additionally tested it using
the MiSeq; Strobl et al. (2018) also evaluated the Precision ID whole mtDNA
genome panel but tested it on the Ion Personalized Genome Machine (PGM);
Avent et al. (2019) reported on the use of the Qiagen 140-locus SNP NGS assay;
and Cihlar et al. (2020) also reported on the use of the Precision ID mtDNA
whole genome panel sequenced with the Ion Chef and Ion S5. Zascavage et al.
(2018) evaluated the Oxford Nanopore MinION device for mtDNA analysis and found that it resulted in a 1.00% error rate in sequencing the whole
108
Next Generation Sequencing in Forensic Science
mtDNA genome. Shih et al. (2018) performed probe capture instead of PCR
amplification for target amplification offers for the analysis of limited and
mock degraded samples and individual telogen hairs. Huszar et al. (2019)
reported on their evaluation of the prototype Promega PowerSeq™ Auto/
Mito/Y System which employs overlapping primers to produce short amplicons. The Promega PowerSeq™ CRM Nested System followed by Nextera®
XT library preparation and NGS were used to assess heteroplasmy and DNA
damage under low template conditions (Holland et al. 2021). McCord and Lee
(2018) recently edited an issue of Electrophoresis entitled “Novel Applications
of Massively Parallel Sequencing (MPS) in Forensic Analysis,” which includes
twenty forensic NGS reports including a several on mtDNA applications.
Holland et al. (2019) reported on mtDNA profiles obtained from unfired
ammunition components using MPS.
Stoljarova et al. (2016) evaluated mtDNA variation in the Estonian population using MPS. Park et al. (2017) reported on full mtDNA genome sequencing of Korean individuals using MPS, and Avila et al. (2019) reported on the
same for Brazilian admixed individuals. Melchionda et al. (2020) reported
on NGS DNA typing results obtained with the Romanian population and
polymorphisms that have not been previously detected using other tools that
enable increased haplogroup discrimination.
In November 2020, Barrio et al. reported on the CHEP-ISFG collaborative exercise using MPS. NIST reported upon implementation of mtDNA
typing using NGS in that laboratory (Churchill et al. 2017). Brandhagen et al.
(2020) reported on validating mtDNA typing using NGS for casework at the
FBI laboratory.
7.7 Mitochondrial Sequence Data and Databases
The Federal Bureau of Investigation (FBI) hosts the Combined DNA Index
System (CODIS) database for law enforcement purposes which includes
mtDNA profiles. The Scientific Working Group on DNA Analysis Methods
(SWGDAM) has produced new interpretation guidelines “Interpretation
Guidelines for Mitochondrial DNA Analysis by Forensic DNA Testing
Laboratories,” which includes sections on using NGS in forensic DNA testing. The mtDNA testing interpretation document 2019 updates include sections on sequence analysis criteria, interpretation of C-stretches, and mixture
interpretation. The US National DNA Index System (NDIS) Procedures Board
approved (5/2/19) the use of the MiSeq FGx® Forensic Genomics System to
collect data that can be entered into NDIS. Control region data collected
using the Precision ID mtDNA Whole Genome Panel is also approved for
inclusion in the NDIS CODIS database. Data collected using the ForenSeq
Mitochondrial DNA Typing
109
and Precision ID mitotyping kits have been approved for inclusion in CODIS.
A high level of accuracy is required for EMPOP submission.
Questions
1. What is mitochondrial DNA? Where is it found in the cell? Which
regions are probed in forensic DNA analysis?
2. What forensic questions can mtDNA typing answer?
3. Compare and contrast Sanger sequencing and NGS methods for
mtDNA DNA typing.
4. Describe the processes that are occurring in PCR1 and PCR2 in the
ForenSeq mtDNA Control Region kit and compare and contrast
PCR1 and PCR2 in the ForenSeq mtDNA Control Region kit and the
ForenSeq DNA Signature Prep Kit.
5. Describe heteroplasmy and how this can be used in mtDNA analysis
for forensic applications.
6. Explain how haplotypes are assigned using sequencing data.
References
Ambers, A., Bus, M.M., King, J.L., Jones, B., Durst, J., Bruseth, J.E., Gill-King, H.,
and B. Budowle. “Forensic genetic investigation of human skeletal remains
recovered from the La Belle shipwreck.” Forensic Science International, 306
(January 2020): 110050. doi:10.1016/j.forsciint.2019.110050.
Ambers, A.D., Churchill, J.D., King, J.L., Stoljarova, M., Gill-King, H., Assidi, M.,
Abu-Elmagd, M., Buhmeida, A., Al-Qahtani, M., and B. Budowle. “More comprehensive forensic genetic marker analyses for accurate human remains identification using massively parallel DNA sequencing.” BMC Genomics 17, Suppl
9 (October 17, 2016): 750. doi:10.1186/s12864-016-3087-2.
Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin,
J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J.,
Staden, R., and I.G. Young. “Sequence and organization of the human mitochondrial genome.” Nature 290, no. 9 (April 9, 1981): 457–465. doi:10.1038/
290457a0.
Andrews, R.M., Kubacka, I., Chinnery, P.F., Lightowlers, R.N., Turnbull, D.M., and
N. Howell. “Reanalysis and revision of the Cambridge reference sequence for
human mitochondrial DNA.” Nature Genetics 23, no. 2 (October 1999): 147.
doi:10.1038/13779.
Avent, I., Kinnane, A.G., Jones, N., Petermann, I., Daniel, R., Gahan, M.E., and D.
McNevin. “The QIAGEN 140-locus single-nucleotide polymorphism (SNP)
panel for forensic identification using massively parallel sequencing (MPS): An
evaluation and a direct-to-PCR trial.” International Journal of Legal Medicine
133, no. 3 (May 2019): 677–688. doi:10.1007/s00414-018-1975-5.
110
Next Generation Sequencing in Forensic Science
Avila, E., Graebin, P., Chemale, G., Freitas, J., Kahmann, A., and C.S. Alho. “Full
mtDNA genome sequencing of Brazilian admixed populations: A forensicfocused evaluation of a MPS application as an alternative to Sanger sequencing methods.” Forensic Science International: Genetics 42 (2019): 154–164.
doi:10.1016/j.fsigen.2019.07.004.
Barrio, P.A., García, Ó., Phillips, C., Prieto, L., Gusmão, L., Fernández, C., Casals, F.,
Freitas, J.M., González-Albo, M.D.C., Martín, P., Mosquera, A., Navarro-Vera,
I., Paredes, M., Pérez, J.A., Pinzón, A., Rasal, R., Ruiz-Ramírez, J., Trindade,
B.R., and A. Alonso. “The first GHEP-ISFG collaborative exercise on forensic
applications of massively parallel sequencing.” Forensic Science International:
Genetics 49 (November 2020): 102391. doi:10.1016/j.fsigen.2020.102391.
Biesecker, L.G., Bailey-Wilson, J.E., Ballantyne, J., Baum, H.R., Bieber, F.R., Brenner,
C., Budowle, B., Butler, J.M., Carmody, G., Conneally, P.M., Duceman, B.,
Eisenberg, A., Forman, L., Kidd, K.K. Leclair, B., Niezgoda, S., Parsons, T.J.,
Pugh, E., Shaler, R., Sherry, S.T., Sozer, A., and A. Walsh. “DNA identifications
after the 9/11 world trade center attack.” Science 310 no. 5751 (November 18,
2005): 1122–1123. doi:10.1126/science.1116608.
Bini, C., Ceccardi, S., Colalongo, C., Ferri, G., Falconi, M., Pelotti, S., and G.
Pappalardo. “Population data of mitochondrial DNA region HVIII in 150
individuals from Bologna (Italy).” International Congress Series 1239 (January
2003):525–528. doi:10.1016/S0531-5131(02)00559-9.
Brandhagen, M.D., Just, R.S., and J.A. Irwin. “Validation of NGS for mitochondrial
DNA casework at the FBI Laboratory.” Forensic Science International: Genetics
44 (January 2020): 102151. doi:10.1016/j.fsigen.2019.102151.
Bruijns, B., Tiggelaar, R., and H. Gardeniers. “Massively parallel sequencing techniques for forensics: A review.” Electrophoresis 39, no. 21 (November 2018):
2642–2654. doi:10.1002/elps.201800082.
Budowle, B., Wilson, M.R., DiZinno, J.A., Stauffer, C., Fasano, M.A., Holland,
M.M., and K.L. Monson. “Mitochondrial DNA regions HVI and HVII population data.” Forensic Science International 103, no. 1 (July 12, 1999): 25–35.
doi:10.1016/S0379-0738(99)00042-0.
Budowle, B., Allard, M.W., Wilson, M.R., and R. Chakraborty. “Forensics and
Mitochondrial DNA: Applications, Debates, and Foundations.” Annual Review
of Genomics and Human Genetics 4 (September 2003): 119–141. doi:10.1146/
annurev.genom.4.070802.110352.
Buś, M.M., Lembring, M., Kjellström, A., Strobl, C., Zimmermann, B., Parson, W.,
and M. Allen. “Mitochondrial DNA analysis of a Viking age mass grave in
Sweden.” Forensic Science International: Genetics 42 (September 2019): 268–
274. doi:10.1016/j.fsigen.2019.06.002
Butler, J. Forensic DNA Typing, 2nd ed. Burlington, MA: Elsevier Academic Press, 2005.
Chaitanya, L., Ralf, A., van Oven, M., Kupiec, T., Chang, J., Lagacé, R., and M.
Kayser. “Simultaneous whole mitochondrial genome sequencing with short
overlapping amplicons suitable for degraded DNA using the ion torrent personal genome machine.” Human Mutation 36, no. 12 (December 2015): 1236–
1247. doi:10.1002/humu.22905.
Chan, D.C., and E.A. Schon. “Eliminating mitochondrial DNA from sperm.”
Developmental Cell 22, no. 3 (March 13, 2012): 469–470. doi:10.1016/j.devcel.
2012.02.008.
Mitochondrial DNA Typing
111
Cho, S., Kim, M. Y., Lee, J. H., and S.D. Lee. “Assessment of mitochondrial DNA
heteroplasmy detected on commercial panel using MPS system with artificial
mixture samples.” International Journal of Legal Medicine 132 no. 4 (July 2018):
1049–1056. doi:10.1007/s00414-017-1755-7.
Churchill, J.D., King, J.L., Chakraborty, R., and B. Budowle. “Effects of the Ion
PGM™ Hi-Q™ sequencing chemistry on sequence data quality.” International
Journal of Legal Medicine 130, no. 5 (September 2016): 1169–1180.
doi:10.1007/s00414-016-1355-y.
Churchill, J.D., Peters, D., Capt, C., Strobl, C., Parson, W., and B. Budowle. “Working
towards implementation of whole genome mitochondrial DNA sequencing
into routine casework.” Forensic Science International: Genetics Supplement
Series 6 (December 2017): e388–e389. doi:10.1016/j.fsigss.2017.09.167.
Cihlar, J.C., Amory, C., Lagacé, R., Roth, C., Parson, W., and B. Budowle.
“Developmental validation of a MPS workflow with a PCR-based short amplicon whole mitochondrial genome panel.” Genes (Basel) 11, no. 11 (November
13, 2020): 1345. doi:10.3390/genes11111345.
Coble, M.D., Just, R.S., O’Callaghan, J.E., Letmanyi, I.H., Peterson, C.T., Irwin,
J.A., and T.J. Parons. “Single nucleotide polymorphisms over the entire
mtDNA genome that increase the power of forensic testing in Caucasians.”
International Journal of Legal Medicine 118 (February 4, 2004): 137–146.
doi:10.1007/s00414-004-0427-6.
Cuenca, D., Battaglia, J., Halsing, M, and S. Sheehan. “Mitochondrial Sequencing of
Missing Persons DNA casework by implementing thermo fisher’s Precision ID
mtDNA whole genome assay.” Genes (Basel) 4, no. 11 (November 2020): 1303.
doi:10.3390/genes11111303.
Davis, C., Peters, D., Warshauer, D., King, J., and B. Budowle. “Sequencing the
hypervariable regions of human mitochondrial DNA using massively parallel sequencing: Enhanced data acquisition for DNA samples encountered in
forensic testing.” Legal Medicine (Tokyo, Japan) 17, no. 2 (March 2015): 123–
127. doi:10.1016/j.legalmed.2014.10.004.
Dobrowolski, S.F., Gray, J., Miller, T., and M. Sears. “Identifying sequence variants
in the human mitochondrial genome using high‐resolution melt (HRM) profiling.” Human Mutation 30, no. 6 (June 2009): 891–898. doi:10.1002/humu.21003.
Eduardoff, M., Xavier, C., Strobl, C., Casas-Vargas, A., and W. Parson. “Optimized
mtDNA control region primer extension capture analysis for forensically relevant samples and highly compromised mtDNA of different age and origin.”
Genes (Basel) 8, no. 10 (September 21, 2017): 237. doi:10.3390/genes8100237.
Elkins, K.M. Forensic DNA Biology: A Laboratory Manual. Waltham, MA: Elsevier
Academic Press, 2013.
Fendt, L., Zimmerman, B., Daniaux, M., and W. Parson. “Sequencing strategy for
the whole mitochondrial genome resulting in high quality sequences.” BMC
Genomics 10 (March 30, 2009): 139. doi:10.1186/1471-2164-10-139.
ForenSeq mtDNA Control Region Kit Reference Guide, Document # VD 2019001
Rev. A. Accessed May 22, 2021. https://verogen.com/w p-content/uploads/
2019/08/ForenSeq-mtDNA-Control-Region-Guide-VD2019001-A.pdf
Fridman, C., and R.S. Gonzalez. “HVIII discrimination power to distinguish
HVI and HVII common sequences.” Forensic Science International: Genetics
Supplement Series 2 (December 2009): 320–321. doi:10.1016/j.fsigss.2009.07.011.
112
Next Generation Sequencing in Forensic Science
Gaag, K.J.V., Desmyter, S., Smit, S., Prieto, L., and T. Sijen. “Reducing the number
of mismatches between Hairs and Buccal references when analysing mtDNA
heteroplasmic variation by massively parallel sequencing.” Genes (Basel) 11,
no. 11 (November 16, 2020): 1355. doi:10.3390/genes11111355.
Gabriel, M.N., Huffine, E.F., Ryan, J.H., Holland, M.M., and T.J. Parsons. “Improved
MtDNA sequence analysis of forensic remains using a ‘mini-primer set’ amplification strategy.” Journal of Forensic Sciences 46, no. 2 (March 2001): 247–253.
Gallimore, J.M., McElhoe, J.A., and M.M. Holland. “Assessing heteroplasmic variant
drift in the mtDNA control region of human hairs using an MPS approach.”
Forensic Science International: Genetics 32 (January 2018): 7–17.
Hickman, M.P., Grisedale, K.S., Bintz, B.J., Burnside, E.S., Hanson, E.K., Ballantyne,
J., and M.R. Wilson. “Recovery of whole mitochondrial genome from compromised samples via multiplex PCR and massively parallel sequencing.” Future
Science OA 4, no. 9 (August 24, 2018): FSO336. doi:10.4155/fsoa-2018-0059.
Holland, M.M., and T.J. Parsons. “Mitochondrial DNA sequence analysis –
Validation and use for forensic casework.” Forensic Science Reviews 11, no. 1
(June 1999): 21–50.
Holland, M.M., Pack, E.D., and J.A. McElhoe. “Evaluation of GeneMarker® HTS for
improved alignment of mtDNA MPS data, haplotype determination, and heteroplasmy assessment.” Forensic Science International: Genetics 28 (May 2017):
90–98. doi:10.1016/j.fsigen.2017.01.016.
Holland, M.M., Wilson, L.A., Copeland, S., Dimick, G., Holland, C.A., Bever, R.,
and J.A. McElhoe. “MPS analysis of the mtDNA hypervariable regions on the
MiSeq with improved enrichment.” International Journal of Legal Medicine
131, 4 (July 2017): 919–931. doi:10.1007/s00414-017-1530-9.
Holland, M.M., Makova, K.D., and J.A. McElhoe. “Deep-coverage MPS analysis of heteroplasmic variants within the mtGenome allows for frequent differentiation of maternal relatives.” Genes 9, no. 3 (February 26, 2018): 124.
doi:10.3390/genes9030124.
Holland, M.M., Bonds, R.M., Holland, C.A., and J.A. McElhoe. “Recovery of
mtDNA from unfired metallic ammunition components with an assessment of sequence profile quality and DNA damage through MPS analysis.”
Forensic Science International: Genetics 39 (March 2019): 86–96. doi:10.1016/j.
fsigen.2018.12.008.
Holland, C.A., McElhoe, J.A., Gaston-Sanchez, S., and M.M. Holland. “Damage
patterns observed in mtDNA control region MPS data for a range of template concentrations and when using different amplification approaches.”
International Journal of Legal Medicine 135, no. 1 (January 2021): 91–106. doi:1
0.1007/s00414-020-02410-0.
Huszar, T.I., Wetton, J.H., and M.A. Jobling. “Mitigating the effects of reference
sequence bias in single-multiplex massively parallel sequencing of the mitochondrial DNA control region.” Forensic Science International: Genetics 40
(May 2019): 9–17. doi:10.1016/j.fsigen.2019.01.008.
Ivanov, P.L., Wadhams, M.J., Roby, R.K., Holland, M.M., Weedn, V.W., and T.J.
Parsons. “Mitochondrial DNA sequence heteroplasmy in the Grand Duke
of Russia Georgij Romanov establishes the authenticity of Tsar Nicholas II.”
Nature Genetics 12, no. 4 (April 1996): 417–420. doi:10.1038/ng0496-417.
Mitochondrial DNA Typing
113
Just, R.S., Irwin, J.A., and W. Parson. “Mitochondrial DNA heteroplasmy in the
emerging field of massively parallel sequencing.” Forensic Science International:
Genetics 18 (September 2015): 131–139. doi:10.1016/j.fsigen.2015.05.003.
Kim, H., Erlich, H.A., and C.D. Calloway. “Analysis of mixtures using next generation sequencing of mitochondrial DNA hypervariable regions.” Croatian
Medical Journal 56, no. 3 (May 31, 2015): 208–217. doi:10.3325/cmj.2015.56.208.
Kim, M.Y., Cho, S., Lee, J.H., Seo, H.J., and S.D. Lee. “Detection of innate and artificial mitochondrial DNA heteroplasmy by massively parallel sequencing:
Considerations for analysis.” Journal of Korean Medical Science 33, no. 52
(December 11, 2018): e337. doi:10.3346/jkms.2018.33.e337.
Kim, B.M., Hong, S.R., Chun, H., Kim, S., and K.J. Shin. “Comparison of whole
mitochondrial genome variants between hair shafts and reference samples
using massively parallel sequencing.” International Journal of Legal Medicine
134, no. 3 (May 2020): 853–861. doi:10.1007/s00414-019-02205-y.
King, J.L., LaRue, B.L., Novroski, N.M., Stoljarova, M., Seo, S.B., Zeng, X., Warshauer,
D.H., Davis, C.P., Parson, W., Sajantila, A., and B. Budowle. “High-quality and
high-throughput massively parallel sequencing of the human mitochondrial
genome using the Illumina MiSeq.” Forensic Science International: Genetics 12
(September 2014): 128–135. doi:10.1016/j.fsigen.2014.06.001.
LaBerge, G.S., Shelton, R.J., Danielson, P.B. “Forensic utility of mitochondrial DNA
analysis based on denaturing high-performance liquid chromatography.”
Croatian Medical Journal 44, no. 3 (2003): 281–88.
Lee, E.Y., Lee, H.Y., Oh, S.Y., Jung, S.-E., Yang, I.S., Lee, Y.-H., Yang, W.I., and K.-J.
Shin. “Massively parallel sequencing of the entire control region and targeted
coding region SNPs of degraded mtDNA using a simplified library preparation method.” Department of Forensic Medicine; Yonsei University College of
Medicine. 2015. http://forensic.yonsei.ac.kr/presentation/116.pdf.
Levin, B.C., Hancock, D.K., Holland, K.A., Cheng, H., and K.L. Richie. “Human mitochondrial DNA—amplification and sequencing standard reference materials—
SRM 2392 and SRM 2392-I.” NIST Special Publication 260-155 (2003): 1–93.
Li, M., Schönberg, A., Schaefer, M., Nasidze, I., and M. Stoneking. “Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.” American Journal of Human Genetics 87, no. 2 (August
13, 2010): 237–249. doi:10.1016/j.ajhg.2010.07.014.
Liu, Y.Y., and S. Harbison. “A review of bioinformatic methods for forensic DNA
analyses.” Forensic Science International: Genetics 33 (March 2018): 117–128.
doi:10.1016/j.fsigen.2017.12.005.
Ma, K., Zhao, X., Li, H., Cao, Y., Li, W., Ouyang, J., Xie, L., and W. Liu. “Massive parallel
sequencing of mitochondrial DNA genomes from mother-child pairs using the
ion torrent personal genome machine (PGM).” Forensic Science International:
Genetics 32 (January 2018): 88–93. doi:10.1016/j.fsigen.2017.11.001.
Marshall, C., Sturk-Andreaggi, K., Daniels-Higginbotham, J., Oliver, R.S., BarrittRoss, S., and T.P McMahon. “Performance evaluation of a mitogenome capture and Illumina sequencing protocol using non-probative, case-type skeletal
samples: Implications for the use of a positive control in a next-generation
sequencing procedure.” Forensic Science International: Genetics 31 (November
2017): 198–206. doi:10.1016/j.fsigen.2017.09.001.
114
Next Generation Sequencing in Forensic Science
McCord, B., and S.B. Lee. “Novel applications of Massively Parallel Sequencing
(MPS) in forensic analysis.” Electrophoresis 39, no. 21 (November 2018): 2639–
2641. doi:10.1002/elps.201870175.
McElhoe, J.A., Holland, M.M., Makova, K.D. Su, M.S., Paul, I.M., Baker, C.H.,
Faith, S.A., and B. Young. “Development and assessment of an optimized nextgeneration DNA sequencing approach for the mtgenome using the Illumina
MiSeq.” Forensic Science International: Genetics 13 (November 2014) 20–29.
doi:10.1016/j.fsigen.2014.05.007.
McElhoe, J.A., and M.M. Holland. “Characterization of background noise in MiSeq
MPS data when sequencing human mitochondrial DNA from various sample sources and library preparation methods.” Mitochondrion 52 (May 2020):
40–55. doi:10.1016/j.mito.2020.02.005.
Melchionda, F., Stanciu, F., Buscemi, L., Pesaresi, M., Tagliabracci, A., and C. Turchi.
“Searching the undetected mtDNA variants in forensic MPS data.” Forensic
Science International: Genetics 49 (November 2020):102399. doi:10.1016/j.
fsigen.2020.102399.
MITOMAP: mtDNA Control Region Sequence Variants. Accessed January 25, 2021.
https://www.mitomap.org/foswiki/bin/view/MITOMAP/Polymorphisms
Control.
Park, S., Cho, S., Seo, H.J., Lee, J.H., Kim, M.Y., and S.D. Lee. “Entire mitochondrial DNA sequencing on massively parallel sequencing for the Korean population.” Journal of Korean Medical Science 32, no. 4 (April 2017): 587–592.
doi:10.3346/jkms.2017.32.4.587.
Parson, W., Huber, G., Moreno, L., Madel, M.B., Brandhagen, M.D., Nagl, S.,
Xavier, C., Eduardoff, M., Callaghan, T.C., and J.A. Irwin. “Massively parallel sequencing of complete mitochondrial genomes from hair shaft samples.”
Forensic Science International: Genetics 15 (March 2015): 8–15. doi:10.1016/j.
fsigen.2014.11.009.
Peck, M.A., Sturk-Andreaggi, K., Thomas, J. T., Oliver, R.S., Barritt-Ross, S., and C.
Marshall. “Developmental validation of a Nextera XT mitogenome Illumina
MiSeq sequencing method for high-quality samples.” Forensic Science
International: Genetics 34 (May 2018): 25–36. doi:10.1016/j.fsigen.2018.01.004.
Quintáns, B., Álvarez-Iglesias, V., Salas, A., Phillips, C., Lareu, M.V., and A.
Carracedo. “Typing of mitochondrial DNA coding region SNPs of forensic and anthropological interest using SNaPshot minisequencing.” Forensic
Science International 140, no. 2–3 (March 10, 2004): 251–257. doi:10.1016/j.
forsciint.2003.12.005.
Rathbun, M.M., McElhoe, J.A, Parson, W., and M.M. Holland. “Considering DNA
damage when interpreting mtDNA heteroplasmy in deep sequencing data.”
Forensic Science International: Genetics 26 (January 2017): 1–11. doi:10.1016/j.
fsigen.2016.09.008.
Remualdo, V.R., and R.N. Oliveira. “Analysis of mitochondrial DNA from the teeth of a
cadaver maintained in formaldehyde.” The American Journal of Forensic Medicine
and Pathology 28, no. 2 (June 2007): 145–146. doi:10.1097/PAF.0b013e31805f67d1.
Seo, Y., Uchiyama, T., Matsuda, H., Shimizu, K., Takami, Y., Nakayama, T., and K.
Takahama. “Mitochondrial DNA and STR typing of matter adhering to an earphone.” Journal of Forensic Sciences 47, no. 3 (May 2002): 605–608.
Mitochondrial DNA Typing
115
Shih, S.Y., Bose, N., Gonçalves, A.B.R., Erlich, H.A., and C.D, Calloway. “Applications
of probe capture enrichment next generation sequencing for whole mitochondrial genome and 426 nuclear SNPs for forensically challenging samples.”
Genes (Basel) 9, no. 1 (January 22, 2018): 49. doi:10.3390/genes9010049.
Strobl, C., Eduardoff, M., Bus, M.M., Allen, M., and W. Parson. “Evaluation of the precision ID whole MtDNA genome panel for forensic analyses.” Forensic Science
International: Genetics 35 (July 2018): 21–25. doi:10.1016/j.fsigen.2018.03.013.
Strobl, C., Churchill Cihlar, J., Lagacé, R., Wootton, S., Roth, C., Huber, N., Schnaller,
L., Zimmermann, B., Huber, G., Lay Hong, S., Moura-Neto, R., Silva, R.,
Alshamali, F., Souto, L., Anslinger, K., Egyed, B., Jankova-Ajanovska, R., CasasVargas, A., Usaquén, W., Silva, D., Barletta-Carrillo, C., Tineo, D.H., Vullo, C.,
Würzner, R., Xavier, C., Gusmão, L., Niederstätter, H., Bodner, M., Budowle,
B., and W. Parson. “Evaluation of mitogenome sequence concordance, heteroplasmy detection, and haplogrouping in a worldwide lineage study using the
Precision ID mtDNA Whole Genome Panel.” Forensic Science International:
Genetics 42 (September 2019): 244–251. doi:10.1016/j.fsigen.2019.07.013.
Stoljarova, M., King, J.L., Takahashi, M., Aaspõllu, A., and B. Budowle. “Whole
mitochondrial genome genetic diversity in an Estonian population sample.”
International Journal of Legal Medicine 130, no. 1 (January 2016): 67–71.
doi:10.1007/s00414-015-1249-4.
Sturk-Andreaggi, K., Peck, M.A., Boysen, C., Dekker, P., McMahon, T.P., and C.K.
Marshall. “AQME: A forensic mitochondrial DNA analysis tool for next-generation sequencing data.” Forensic Science International: Genetics 31 (November
2017): 189–197. doi:10.1016/j.fsigen.2017.09.010.
Vallone, P.M., Just, R.S., Coble, M.D., Butler, J.M., and T.J. Parsons. “A multiplex
allele-specific primer extension assay for forensically informative SNPs distributed throughout the mitochondrial genome.” International Journal of Legal
Medicine 118, no. 3 (June 2004): 147–157. doi:10.1007/s00414-004-0428-5.
Van Neste, C., Vandewoestyne, M., Van Criekinge, W., Deforce, D., and F. Van
Nieuwerburgh. “My-Forensic-Loci-queries (MyFLq) framework for analysis of
forensic STR data generated by massive parallel sequencing.” Forensic Science
International: Genetics 9 (March 2014): 1–8. doi:10.1016/j.fsigen.2013.10.012.
Walichiewicz, P., Eagles, J., Daulo, A., Didier, M., Edwards, C., Fleming, K., Han,
Y., Hill, T., Li, S., Rensfield, A., Sa, D., Husbands, J., Holt, C., and K. Stephens.
“Performance evaluation of the ForenSeq mtDNA control region solution.” ISHI,
2019. https://verogen.com/wp-content/uploads/2019/11/Mito-ISHI-Poster_
final.pdf.
Wallace, D.C., Brown, M.D., and M.T. Lott. “Mitochondrial DNA variation in
human evolution and disease.” Gene 238 (September 30, 1999): 211–230.
doi:10.1016/S0378-1119(99)00295-4.
Wang, Z., Zhang, S., Bian, Y., and C. Li. “Differentiating between monozygotic twins
in forensics through next generation mtGenome sequencing.” Forensic Science
International: Genetics Supplement Series 5 (September 10, 2015): e58–e59.
doi:10.1016/j.fsigss.2015.09.023.
Wang, Z., Zhu, R., Zhang, S., Bian, Y., Lu, D., and C. Li. “Differentiating between
monozygotic twins through next-generation mitochondrial genome sequencing.”
Analytical Biochemistry 490 (December 1, 2015): 1–6. doi:10.1016/j.ab.2015.08.024.
116
Next Generation Sequencing in Forensic Science
Warner, J.B., Bruin, E.J., Hannig, H., Hellenkamp, F., Hörning, A., Mittmann, K.,
van der Steege, G., de Leij, L.F.M.H., and H.S.P. Garritsen. “Use of sequence
variation in three highly variable regions of the mitochondrial DNA for
the discrimination of allogeneic platelets.” Transfusion 46 (2006): 554–561.
doi:10.1111/j.1537-2995.2006.00775.x.
Woerner, A.E., Ambers, A., Wendt, F.R., King, J.L., Moura-Neto, R.S., Silva, R., and
B. Budowle. “Evaluation of the precision ID mtDNA whole genome panel on
two massively parallel sequencing systems.” Forensic Science International:
Genetics 36 (September 2018): 213–224. doi:10.1016/j.fsigen.2018.07.015.
Zascavage, R.R., Thorson, K., and J.V. Planz. “Nanopore sequencing: An enrichmentfree alternative to mitochondrial DNA sequencing.” Electrophoresis 40, no. 2
(January 2019): 272–280. doi:10.1002/elps.201800083.
Zhang, C., Li, H., Zhao, X., Ma, K., Nie, Y., Liu, W., Jiao, H., and H. Zhou. “Validation
of expressmarker mtDNA-SNP60: A mitochondrial SNP kit for forensic application.” Electrophoresis 37 (2018): 2848–2861. doi:10.1002/elps.201600042.
Microbial Applications
of Next Generation
Sequencing for Forensic
Investigations
8
8.1 Introduction to Microbial DNA Profiling
In addition to human DNA typing using genomic DNA and mitochondrial
DNA, next generation sequencing (NGS) can be used for many additional
applications relevant to forensic investigations, many of which have been
demonstrated in the past few years while others continue to be introduced.
Research from the ten-year Human Microbiome Project (HMP) has demonstrated that microbe populations from different sources vary and can be
used to attribute the community to the source (Belizário and Napolitano
2015). Similarly, microbe communities have been shown to be impacted
by human disease and infection characteristics that can be differentiating
(Gurenlian 2007, Chen et al. 2014, Kistler et al. 2015, Lipowski et al. 2017).
Emerging NGS applications include microbial DNA profiling of a variety
of forensically relevant samples including soil, blood, hair, skin, oral, nasal,
vaginal, and anal sources which can aid investigators in determining the
circumstances of a case (Tridico et al. 2014, Belizário and Napolitano 2015,
Giampaoli et al. 2017, Leong et al. 2017, Quaak et al. 2018, Schmedes et al.
2018, Rajan et al. 2019, Woerner et al. 2019). Analysis of the human microbiome can provide scientists with an additional measure of individual identification based, at least in part, on lifestyle and behavioral patterns. The
human microbiome consists of all of the microbiological organisms on the
skin and within the body in the colon, vaginal cavity, mouth, and ears.
Microbiome analysis can complement traditional serological analysis and
be used to determine which body site a sample came from. Microbiome
profiling can aid in determining the circumstances of a case including who
touched a computer keyboard, which regions came into contact during sexual intercourse, and to which body region a fluid or stain came into contact
with. Microbiological analysis can also be used to estimate the postmortem interval (PMI). This chapter aims to trace progress in applying NGS
for microbial community profiling with a focus on forensic applications to
serve as a resource of publications published to date in this area for forensic
researchers and practitioners.
DOI: 10.4324/9781003196464-8
117
118
Next Generation Sequencing in Forensic Science
8.2 Why NGS?
Whereas bacteria have long been identified through culture and microscopy,
biochemical enzymatic assays, and spectroscopic methods and more recently
by mass spectrometry (Elkins 2019, Franco-Duarte et al. 2019, Elkins and
Bender 2020, Bender et al. 2020), molecular diagnostic methods using PCR
are more sensitive (Welinder-Olsson et al. 2007). PCR-based molecular fingerprinting methods and NGS can be used to understand the structure and
function relationships in microbial communities in human-associated systems (Phadke et al. 2017). NGS can capture subtle differences between bacterial communities in samples without reliance on target genetic marker
systems (Sjödin et al. 2012).
8.3 The Human Microbiome Project
Humans are host to microorganisms including pathogenic and non-pathogenic
bacterial species that have coevolved with the immune system and aid in
functions including breakdown of biomolecules in the intestines (Belizário
and Napolitano 2015). The US National Institutes of Health (NIH) sponsored
a decade-long study, the “Human Microbiome Project,” from 2007 to 2016
with the goal of understanding how the microbiome affects human health
and disease by characterizing the abundance, diversity, and functionality of
microbes (Belizário and Napolitano 2015). Samples from volunteers were collected from the mouth, tonsils, placenta at birth, vagina, skin, gut, and stool
(Belizário and Napolitano 2015). The bacterial families and species present in
the samples were characterized, and the differentiating families of bacteria
present in the samples were documented (Belizário and Napolitano 2015).
The study showed that human gut bacteria were found to express over 3.3
million bacterial genes compared to the 20,000 genes expressed by the human
genome or 165 times as many bacterial genes as human genes (Belizário and
Napolitano 2015).
8.4 Sampling and Processing
The overall NGS approach is very similar to those applied to forensic DNA
typing of human autosomal, sex-linked, and mitochondrial DNA polymorphisms. Samples must be collected using sterile and DNA-free swabs or collection devices and handled so that DNA from the investigator is not transferred
to the material. The sampling device should be labeled, catalogued, and stored
under cool, dry conditions until processing begins. A DNA extraction method
Microbial Applications of NGS
119
is used to recover the cells from the microbial community on the swab/device
or from the soil and extract the DNA from the cells. The quantity and quality
of the extracted DNA is determined, and the extract is concentrated or diluted,
as needed, for input in library preparation. More detail on DNA extraction
and quantitation is found in Chapter 3. NGS instruments, including the Roche
454, Illumina HiSeq and MiSeq, and ThermoFisher Ion Proton, can be used for
sequencing (Budowle et al. 2014, Clarke et al. 2017, Minogue et al. 2019). Details
regarding the chemistries and instruments can be found in Chapter 2.
In a process study, buccal, nasal, and ear swabs were used to evaluate the
Promega DNA IQ™ Casework Pro Kit with the Maxwell ® 16 and GenElute
Bacterial Genomic DNA kit to extract DNA from ten bacterial species followed by sequencing using the Ion S5 NGS system and data analysis using
the metagenomics workflow in the Ion Reporter Software (Alessandrini et al.
2019). The authors were able to simultaneously purify and identify both microbial and human DNA (Alessandrini et al. 2019). This is critical in many forensic cases in which the sample quantity is often trace or limited (Alessandrini
et al. 2019). In the study, the DNA IQ™ Casework Pro Kit with the Maxwell ® 16
performed better than the GenElute method (Alessandrini et al. 2019).
8.5 NGS Methodology in Microbial Forensics
While single source or human mixture samples have been routinely analyzed for many years, identifying the bacterial species in a human sample is a
newer investigative tool. In bacterial studies, amplifying and sequencing the
16S rRNA gene is a standard approach for microbial profiling. Alternatively
whole genome shotgun (WGS) metagenomics can be used. The 16S rRNA
gene is highly species specific (Franco-Duarte et al. 2019). The principle of
16S rRNA testing relies on detecting differences in the reverse-transcribed
highly variable region 16S rRNA sequences (16S rDNA) to construct the
microbial flora composition. Not only is sequencing faster than culture in
many cases, it is also preferred in cases in which the sample is unculturable
(Gilchrist et al. 2015, Franco-Duarte et al. 2019, Willis and Gabaldón 2020).
NGS can be used for detecting and identifying microorganisms in research
and clinical diagnosis using approaches including DNA barcoding, single
cell sequencing with whole genome amplification (WGA), whole metagenome shotgun (WMS) sequencing, meta-transcriptomics, and metagenomics (Franco-Duarte et al. 2019, Willis and Gabaldón 2020). The researcher can
investigate the composition of a sample including the major and minor bacterial constituents with nearly limitless multiplex ability (Minogue et al. 2019).
The 16S rRNA gene primers are designed to target the 16S rRNA genes
and tagged for adding barcodes and adaptor sequences in a subsequent PCR
120
Next Generation Sequencing in Forensic Science
step. The 16S rRNA contains nine hypervariable regions spanning 1500 bp
flanked by conserved sequences. It is part of the 30S subunit of the prokaryotic ribosome. Because it is essential for protein synthesis, it is widely conserved among bacteria and archaea. The primer sequences are purchased,
reconstituted in nuclease-free water, quantified, and diluted to the appropriate stock concentration. A master mix containing DNA polymerase and
other PCR reagents is combined with the primers and the input DNA to
amplify and tag the target sequences. In a second PCR step, the barcodes and
adaptor sequences are added. The libraries are cleaned-up, normalized, and
pooled for sequencing. The libraries are added to the flow cell, chip, or system
chosen for sequencing. In human microbiome analysis, whole populations of
microorganisms can be analyzed in a sample using NGS via its deep sequencing capability.
8.6 Results from the Human Microbiome Project
In the HMP, samples from the oral mucosa, skin, vaginal mucosa, and gut
were analyzed by sequencing the 16S rRNA gene using NGS and identifying
the metabolic genes and proteins using computational methods (Pasolli et al.
2019). Altogether the samples included ~150,000 microbial genomes attributed
to 4930 species (Pasolli et al. 2019). The data was used to develop genome scale
metabolic reconstructions and constraint-based modeling methods led to phenotype prediction, microbe-microbe, microbe-host, and microbe-diet metabolic interactions, and disease etiology data (Chowdhury and Fong 2020). The
data demonstrated that the four sites varied in bacterial colonizers (Chowdhury
and Fong 2020).
Analysis of infant body sites also showed variation in the relative abundances of key phyla in the oral mucosa, nares, gastrointestinal tract, and
skin (Milani et al. 2017). The human microbiomes are first established in the
infant gut and skin during pregnancy and delivery. After delivery, the microbiome changes with the new environment and the proportions of the bacterial types shift. Factors that influence gut composition in infants and children
include the mode of delivery, gestational age, mode of feeding (breastmilk or
formula), family members, host interactions, maternal diet, and geographical location. The most common types of bacteria encountered in their study
were actinobacteria, bacteroidetes, firmicutes, fusobacteria, and proteobacteria. The most common bacteria varied for the body regions sampled.
Bacteroidetes were most common in the gut microbiome. Actinobacteria
were most common in the skin microbiome. Proteobacteria were the most
common in the placenta microbiome, and firmicutes were most common
in the vaginal microbiome. The oral microbiome contained relatively equal
Microbial Applications of NGS
121
proportions of all of the types with a smaller proportion of actinobacteria.
Saliva bacteria from HMP samples were quantified and identified using the
GENIUS tool (Hasan et al. 2014). Analysis of the urine microbiome led to
the determination that urine contains bacteria and viruses and is not sterile
(Wojciuk et al. 2019). These sites and human body fluid samples are routinely
profiled in forensic investigations for human identification applications. The
HMP data and approaches demonstrate additional opportunities for sample
and human individualization using NGS but analysts must be aware of the
potential for bias in sequencing methods as demonstrated in Figure 8.1. A
normalized comparison of the samples sequenced with the three methods is
shown in Figure 8.2 in tree mode.
8.7 HMP Applications for Forensic Science
Applications of human individualization using microbiome analysis in
forensic cases include typing the intestinal microbiota in infants, body fluids
including oral, vaginal and fecal samples, and skin and hair samples. The
intestinal microbiota was investigated using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing and compared to Sanger paired end
sequencing reads and 454 sequencing reads with comparable results but the
SOLiD sequencing yielding better resolution to the species level (Mitra et al.
2013). Variations in the infant intestinal microbiota through 16S rRNA gene
fecal testing were assayed in sudden infant death syndrome (SIDS) cases as
compared to healthy, age-matched controls; no significant difference was
found (Leong et al. 2017). The identification of vaginal fluid is very important in forensic investigations and sexual assault cases. Human microbiome
analysis was demonstrated to identify vaginal fluid (Giampaoli et al. 2017). In
a recent study, DNA was extracted from eighteen samples including vaginal,
oral, fecal, and yogurt and analyzed using NGS and a traditional PCR-based
method; the NGS method was shown to detect more species and lead to more
probative data (Giampaoli et al. 2017).
Teeth can develop biofilms formed when oral bacteria form a sticky layer
on a tooth’s surface (Gurenlian 2007). If not frequently removed, the biofilm can lead to the formation of dental caries, gingivitis, and periodontitis
(Gurenlian 2007). Periodontitis is a common oral bacterial disease in humans;
left untreated it can lead to tooth loss (Kistler et al. 2015). To better study biofilm formation on teeth, a model was seeded with natural saliva from volunteers and extracted microbial DNA 16S rRNA targets were sequenced using
the 454 and analyzed using mothur (Kistler et al. 2015). Biofilm grown from
the same panel of volunteers were found to be highly similar and clustered in
PCA plots with the differences between panels being more pronounced after
122
Next Generation Sequencing in Forensic Science
Figure 8.1 Bias in sequencing in the gut microbiome using Sanger, 454, SOLiD
and Shotgun-SOLiD sequencing methods. (Suparna Mitra, Karin Förster-Fromme,
Antje Damms-Machado, Tim Scheurenbrand, Saskia Biskup, Daniel H. Huson,
Stephan C. Bischoff (CC 2.0). https://pubmed.ncbi.nlm.nih.gov/24564472/.)
Microbial Applications of NGS
123
Actinobacteridae
Actinobacteria (class)
Sanger_16S
Nakamurellaceae
Bifidobacteriaceae
Coriobacteriaceae
454_16S
SOLiD_16S
Bacteroidales
Bacteroidates
Bacteroidetes/Chlorobi group
Bacteroidaceae
Porphyromonadaceae
Prevotellaceae
Rikenellaceae
unclassified Bacteroidales
Cytophagaceae
Flavobacteriaceae
Sphingobacteriales
Chitinophagaceae
Rhodothermaceae
Chlorobiaceae
Cyanobacteria
Stigonematales
Deferribacteres
Fibrobacteres/Acidobacteria
group
cellular organisms
Verrucomicrobiaceae
Chroococcales
Gloeobacteria
Oscillatoriales
Bacilli
Bacteria
Acidobacteriaceae
Fibrobacteraceae
Bacillales
Lactobacillaceae
Firmicutes
root
Lactobacillales
Clostridiales
Streptococcaceae
Clostridiaceae
Clostridiales Family XI. Incertae Sedis
Eubacteriaceae
Lachnospiraceae
Peptostreptococcaceae
Ruminococcaceae
unclassified Clostridiales
Erysipelotrichaceae
Veillonellaceae
Fusobacteriaceae
Alphaproteobacteria
Gemmatimonadaceae
Rhodospirillaceae
Sphingomonadaceae
Alcaligenaceae
Proteobacteria
Burkholderiaceae
Nitrosomononadaceae
Desulfovibrionaceae
Pelobacteraceae
Enterobacteriaceae
Eukaryota
Mollicutes
unclassified Bacteria
Pipidae
Viridiplantae
No hits
Figure 8.2 Normalized comparison between 16S samples obtained using three
technologies: “‘Sanger,” “‘16S-454,” and “16S-SOLiD” datasets. Normalized comparison result obtained using MEGAN for “Sanger”-dataset (blue), “16S-454”
dataset (cyan), and “16S-SOLiD” dataset (magenta) without considering “No hits”
node. The tree is collapsed at “family” level of NCBI taxonomy. Circles are scaled
logarithmically to indicate the number of summarized reads. (Suparna Mitra,
Karin Förster-Fromme, Antje Damms-Machado, Tim Scheurenbrand, Saskia
Biskup, Daniel H. Huson, Stephan C. Bischoff (CC 2.0). https://pubmed.ncbi.nlm.
nih.gov/24564472/#&gid=article-figures&pid=figure-3-uid-2.)
124
Next Generation Sequencing in Forensic Science
14 days as compared to seven days (Kistler et al. 2015). A supervised learning
computational method was developed and applied to predict periodontitis
phenotypes based on microbial composition determined using 16S rRNA
sequence data (Gurenlian 2007). The reported jackknife accuracy was 94.83%
demonstrating its strength in predicting the disease status (Gurenlian 2007).
In another study, pyrosequencing was used to analyze the oral microbiome and
predict biofilm infections (Siqueira et al. 2012). Chen et al. (2014) suggested
the approach could be applied to forensics and other research questions.
Phylogenetic profiles of microorganisms from fecal samples depended
upon sequencing depth and NGS analysis method underscoring the need
to develop SOPs, reference databases, and standardized bioinformatics
approaches (Rajan et al. 2019). Sequencing depth beyond 60 million reads
using the Illumina HiSeq 2500 was not found to improve classification (Rajan
et al. 2019). The multiplex hidSkinPlex was developed for forensic identification and prediction of body site using the skin microbiome (Schmedes et al.
2018, Woerner et al. 2019). Prediction of body site using the skin microbiome
had an accuracy of up to 86% (Schmedes et al. 2018). Tests were conducted
using hand, chest, foot, and “all” sites (Woerner et al. 2019). A recent review
and meta-analysis of research on using the skin microbiome as a forensic
tool cautions that while skin microbial communities are personalized, body
sites and sample time impact the profile, and although intrapersonal differences are smaller than interpersonal ones, understanding the variability
will be essential to the use of this tool (Tozzo et al. 2020). NGS can be used
to determine tissue type (Aly et al. 2015). The cell type from different body
sites including hand, foot, groin, penis, vagina, mouth, and feces were analyzed (Quaak et al. 2018). Oral and fecal sites were clearly distinguished from
skin and vaginal samples, and human feces were differentiable from dog and
cat animal feces (Quaak et al. 2018). However, differentiating samples from
some skin sites was difficult as some penis site samples were highly similar to
vaginal samples and skin samples (Quaak et al. 2018). In another study, NGS
metagenomic analysis was applied to bacteria on human scalp and pubic hair
using the Roche GS Junior™ (Tridico et al. 2014). As human hairs without a
root are often problematic for identification as it is difficult to obtain STR
profiles, microbial NGS offers a new opportunity for discrimination (Tridico
et al. 2014). The microbiomes from male and female pubic hairs were found
in separate clusters using principle component analysis (PCA) (Tridico et al.
2014). The microbiome was similar for a cohabitating couple who engaged
in sexual intercourse (Tridico et al. 2014). The study concluded that metagenomics was most promising for pubic hair analyses to augment STR DNA
typing and mtDNA sequencing (Tridico et al. 2014). NGS can also sequence
minute and degraded samples and enable better mixture analysis (Aly et al.
2015).
Microbial Applications of NGS
125
8.8 NGS Applications in Geolocation, Autopsy,
PMI, and Lifestyle Analysis
In addition to body fluid and site identification, human microbiome analysis
can be used to answer several other forensic questions. Metagenomics can
answer questions about past events (Giampaoli et al. 2018). Human microbiome analysis can be used for human identification and geolocation including location of clandestine graves as well as indoor and outdoor sites (Clarke
et al. 2017, Alessandrini et al. 2019). NGS can analyze forensically relevant
environmental samples such as soil and water (Budowle et al. 2014, Gilchrist
et al. 2015). A study of the 18S rRNA from fungi of eleven samples from different soil environments with different flora including forests, fields, grasslands, and urban park yielded nine GI matches unique to the sampling area
using analysis from mothur and Blastn (Lilje et al. 2013).
Microbiome analysis can be used to provide lifestyle information and
behavioral patterns (including diet, cohabitants, pets, and romantic partners), determine the body site where a sample came from and answer contextual questions of sources of commingled body fluid samples, estimate
postmortem intervals (PMI), and determine the environmental locations a
body or object interacted with (Shrivastava et al. 2015, Schmedes et al. 2016,
Clarke et al. 2017, Cho et al. 2019). Human microbiome analysis can enable
monozygotic twin differentiation (Aly et al. 2015). It can also be used in occupational medicine investigations (Giampaoli et al. 2018) and to determine
food authenticity (Arenas et al. 2017, Haynes et al. 2019).
NGS has been used to perform molecular analysis of post-mortem and
autopsy samples. Analysis of post-mortem samples NGS of the 16S rRNA
gene using the MiSeq could lead to the determination that a bacterial infection resulted in sepsis that caused the death (Cho et al. 2019). In a test of
sixty-five post-mortem specimens in which the Illumina MiSeq and Applied
Biosystems MicroSEQ 500 16S rDNA Bacteria Identification system were
used for sequencing, the MiSeq was found to be more time- and cost-efficient
when more than thirty samples are analyzed and was easier to use for bacterial identification with a larger library for more accurate determination (Cho
et al. 2019). NGS using the Illumina platform was employed to characterize
microbial species in animals that drowned as compared to those submerged
postmortem (Wang et al. 2020). Unweighted UniFrac-based PCA differentiated the microbial constituency of the skin, lung, blood, and liver samples of
the two groups (Wang et al. 2020). NGS was applied to determine cause of
death in two recent autopsies of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) positive individuals (Sekulic et al. 2020). Quantitative
reverse transcriptase PCR was used to synthesize the cDNA for SARS-CoV-2
RNA isolated from lung tissue which was sequenced by NGS (Sekulic et al.
126
Next Generation Sequencing in Forensic Science
2020). One of the two cases “revealed mutations most consistent with Western
European Clade A2a with ORF1a L3606F mutation” (Sekulic et al. 2020). This
study demonstrates the power of NGS in molecular genetic analysis in cause
of death analyses.
8.9 Bioinformatic Approaches and Tools
As with all NGS runs, the read quality and other quality metrics are assessed.
However, unlike the human identification applications, there is no specialized, commercial software for analyzing microbial sequence data for forensic
investigations. Technical and biological validation of the applications will
be required before NGS can be adopted as a standard tool for use in casework and acceptance into courts (Kuiper 2016). The NGS data is uploaded to
the cloud and analyzed using a series of open source and low cost software
which uses bioinformatics to generate a taxonomic profile of the microbiota,
reconstruct the microbial genomes, and perform a functional analyses of the
genes. A metagenomic analysis is conducted to analyze the WGS and 16S
rRNA data, determine the functional groups of the genes, and construct a
taxonomic classification. Operational taxonomic units (OTUs) are generated
by comparisons to a database and classified using a taxonomic classification for taxonomic profiling and used to generate a phylogenetic tree. For
example, the Qiime software can be used to detect chimera, cluster OTUs,
pick representative sequences, assign taxonomy, and generate a taxonomic
table. MG-RAST uses the uploaded sequences and metadata to assess the
quality, perform RNA identification and clustering, assign taxonomy, and
prepare a taxonomy table. The mothur program can perform quality control,
align sequences, clean alignments, pre-cluster sequences and perform chimera detection, classify sequences, remove non-bacterial sequences, generate
a distance matrix, cluster and classify OTUs, and create a taxonomy table.
Bar graphs are generated to demonstrate the proportion of each family of
microbe present in a sample. The bar graphs look like a bar code, as shown in
Figure 8.1 and differences are readily apparent.
Culture-independent analysis such as NGS involves PCR amplification
followed by sequencing. Analysis tools and pipelines perform a variety of
functions including data cleaning, sequence alignment, gene classification
and annotation, and grouping sequences into OTUs. OTU groupings are
used to infer phylogenetic and taxonomic relationships. Many analysis tools
to manage and analyze NGS data were developed during the HMP timeframe. In a study, three analysis methods (QIIME, mothur, and MG-RAST)
were compared for their performance in evaluating 16 S rRNA sequence data
from preterm gut microbiota (Plummer et al. 2015). These tools perform tasks
Microbial Applications of NGS
127
including quality control, sequence alignment, adding metadata, classifying
sequences, nonbacterial sequence purging, RNA identification and clustering, chimera detection, OTU clustering, sequencing picking, and taxonomy
and have a similar workflow (Plummer et al. 2015). Mothur was able to annotate a “slightly higher number of reads” (Plummer et al. 2015). The workflow
times varied from an hour for QIIME, ten hours for mothur, and two days
with manual cleaning for MG-RAST (Plummer et al. 2015). The three programs identified the sample phyla as most abundant although MG-RAST left
the most phyla unclassified and failed to identify some low abundance phyla
(Plummer et al. 2015). The HmmUFOtu tool processes microbiome amplicon
sequencing data, clusters the data into OTUs, and assigns taxonomy (Zheng
et al. 2018). In the authors’ comparison to standard pipelines, HmmUFOtu
was more accurate in determining microbial community diversity and composition faster with a very high accuracy (Zheng et al. 2018). For comparison,
the tool had the same high accuracy of the NCBI Blastn pairwise alignment
algorithm but was over 300 times faster due to its efficiency with multiple
sequence alignment (Zheng et al. 2018). Other tools have also been developed
to handle and clean the sequence data. As NGS produces short sequences that
require mapping to a reference genome, the DecontaMiner tool was developed to detect the presence of contaminating unmapped sequences, either
from the lab or the biological source, in human R NA-Seq data (Sangiovanni
et al. 2019). The Kraken 2 program is a tool that offers “ultrafast and accurate 16S rRNA microbial community analysis” (Lu and Salzberg et al. 2020).
The CLUSTOM tool was developed for clustering 16S rRNA NGS data by
overlap minimization (Hwang et al. 2013). Identifying microbes in a majority human sample is a difficult signal-to-noise problem (Minogue et al. 2019).
The PathSeq tool was shown to discover viral sequences present in human
tissue sequenced by deep sequencing (Kostic et al. 2011). A full review of
bioinformatics tools is beyond the scope of this chapter. However, there are
many bioinformatics tools available; the one chosen for use by forensic labs
will need to undergo testing validation in the lab environment prior to use
on casework.
8.10 Bioforensics and Biosurveillance
NGS capabilities have developed from applications in bioforensics, biosurveillance, and infectious disease diagnosis (Budowle et al. 2014, Yang et al. 2014,
Schmedes et al. 2016, Arenas et al. 2017, Minogue et al. 2019). NGS biothreat
surveillance and diagnostics grew out of research funding that followed the
2001 Amerithrax case investigation (Minogue et al. 2019). The nascent tool
saw little use in the Amerithrax case (Minogue et al. 2019); WGS and Sanger
128
Next Generation Sequencing in Forensic Science
sequencing was used to identify the source of the specimen (Minogue et al.
2019). Broomall et al. (2016) report that sequences of gamma-irradiated mail
rendering non-viable substances could be determined using the Illumina and
454 NGS platforms. NGS is being used in West Africa for Ebola virus biosurveillance (Minogue et al. 2019).
NGS can be used for source agent identification (Minogue et al. 2019).
Determination of the molecular signature of a pathogen is possible with NGS
(Gilchrist et al. 2015, Minogue et al. 2019). WGS coupled with NGS can be
used to identify pathogenic organisms including food-borne pathogen bacteria that may be used as bioweapons and indicate bioterrorism activity in a
disease outbreak (Sjödin et al. 2012, Gilchrist et al. 2015, Elkins 2019, Elkins
and Bender 2020).
NGS can also be used to detect the introduction of plant and animal
diseases including bacteria, virus, and fungal microorganisms for forensic
applications (Sjödin et al. 2012). As an example, the presence of fungi such
as Aspergillus can indicate the location of an indoor wall, while Penicillium,
Debaryomices, and Wickerhamomyces indicate food storage (Giampaoli et al.
2020). The MiSeq NGS platform was used to analyze food spoilage organisms and detect that the bacteria in cooked ham was mostly from four phyla,
while vacuum-packed ham samples contained three families of cold resistant flora including a Clostridium spp. (Piotrowska-Cyplik et al. 2016). While
microorganisms may not always inform the specific environment, NGS can
be used to produce sequence data that can cluster samples with similar provenance (Giampaoli et al. 2020). A metabarcoding approach was applied to
forensically related environmental soil samples resulting in an accurate and
sensitive analysis of organisms including microflora, plants, metazoa, and
protozoa (Giampaoli et al. 2014).
8.11 Infectious Disease Diagnostics
MPS and bioinformatics characterization of microorganisms can be used
to track infectious diseases (Schmedes et al. 2016). In a recent study, NGS
performed using an Illumina MiSeq was used to detect emerging infectious
disease from viruses and bacteria in bats using DNA extracted from blood
samples (Hadi et al. 2020). The data was analyzed using Bowtie and searches
using Blastn in the NCBI databank (Hadi et al. 2020). The genetic data showed
that bat immunity evolved with flight ability (Hadi et al. 2020). Human
genetic analysis has also shown that immunity develops with bacterial exposure (Sharma and Gilbert 2018). NGS was used instead of Sanger sequencing to investigate Hepatitis B virus (HBV) infection in ninety-four patients
and forty-five chronic HBV-infected individuals; word pattern frequencies
Microbial Applications of NGS
129
of HBV sequences differentiated the HBV genotypes and infection status
(Bai et al. 2018). In another study, NGS was used to detect encephalitis virus
in donated transplant organs and transmission from the organs (Lipowski
et al. 2017). WGS and NGS were used to analyze Burkholderia mallei and
Burkholderia pseudomallei that cause the human diseases melioidosis and
glanders using SNPs and determine the direction of mutation using passaged
samples (Jakupciak et al. 2013).
8.12 NGS Applications in Archeology
Metagenomics deep sequencing studies can also be applicable to investigations of ancient host microbiomes. In a limited genetic investigation,
Helicobacter pylori was detected in ancient DNA in archeological samples
from 17th-century Korean mummy stomachs (Shin et al. 2018). H. pylori can
cause gastric disease (Shin et al. 2018). The authors identified H. pylori DNA
vacA (s- and m-region) alleles from stomach isolates of two samples including vacA s1/m2 in a Cheongdo mummy and s1 in a Dangjin mummy and
suggest that NGS is needed for full characterization of the samples (Shin et al.
2018). In another study, sequencing data was analyzed using bioinformatics
methods to screen for Mycobacterium tuberculosis complex fingerprints in
twenty-eight individuals (dated 4400–4000 and 3100–2900 BC) from central
Poland and showed that NGS led to the identification of probable ancient
disease cases supported by statistics (Borówka et al. 2019)
8.13 Summary of NGS Microbial Sequencing
Applications in Forensic Investigation
Microbiome sequences contain all of the information needed to solve a variety of forensic cases. NGS using both targeted and metagenomics approaches
have been applied to a wide variety of forensic applications (Figure 8.3). NGS
has differentiated sources of body fluids and sites on the human body and
fingerprinted their microbial composition. It has been used to determine the
cause of death in an autopsy and signs of disease in oral and blood samples. It
has been used to differentiate hair sources and predict lifestyle patterns. NGS
has greatly enhanced the capabilities in disease diagnostics, bioforensics,
and biosurveillance as well as soil profiling and geolocation. It has been used
in archeological studies to identify gut bacteria and infection. Microbiome
analysis will be incorporated more widely in forensic investigations in the
future.
130
Next Generation Sequencing in Forensic Science
HUMAN
IDENTIFICATION
HUMAN BODY FLUID
IDENTIFICATION
HUMAN BODY SITE
IDENTIFICATION
GEOLOCATION
MICROBIAL
NGS
INFECTIOUS DISEASE
DIAGNOSTICS
BIOTERRORISM AND
BIOSURVEILLANCE
LIFESTYLE ANALYSIS
PMI ESTIMATION AND CAUSE
OF DEATH
PREDICTION
Figure 8.3 Summary of forensic applications of microbial NGS.
Questions
1. What loci are targeted in microbes for sequencing?
2. Explain how NGS is used for microbial analysis.
3. Describe data analysis approaches for cleaning and interpreting
microbial DNA sequence data.
4. Explain how NGS can be used to detect and differentiate body regions
a sample is derived from or was in contact with.
5. Explain how NGS of microbes can be used for geolocation.
6. List NGS issues that must be overcome with microbial DNA data
analysis before the tool is widely adopted in forensic science.
References
Alessandrini, F., Brenciani, A., Fioriti, S., Melchionda, F., Mingoia, M., Morroni,
G., and A. Tagliabracci. “Validation of a universal DNA extraction method
for human and microbiAL DNA analysis.” Forensic Science International
Genetics Supplement Series 7, no. 1 (December 2019): 256–258. doi:10.1016/j.
fsigss.2019.09.098.
Aly, S.M., and D.M. Sabri. “Next generation sequencing (NGS): a golden tool in
forensic toolkit.” Archiwum medycyny sadowej i kryminologii 65, no. 4 (2015):
260–271. doi:10.5114/amsik.2015.61029.
Arenas, M., Pereira, F., Oliveira, M., Pinto, N., Lopes, A.M., Gomes, A., Carracedo,
A., and A. Amorim. “Forensic genetics and genomics: Much more than just a
human affair.” PLoS Genetics 13 (September 21, 2017): e1006960. doi:10.1371/
journal.pgen.1006960.
Microbial Applications of NGS
131
Bai, X., Jia, J.-A., Fang, M., Chen, S., Liang, X., Zhu, S., Zhang, S., Feng, J., Sun, F., and
C. Gao. “Deep sequencing of HBV pre-S region reveals high heterogeneity of HBV
genotypes and associations of word pattern frequencies with HCC.” PLoS Genetics
14, no. 2 (February 23, 2018): 1007206. doi:10.1371/journal.pgen.1007206.
Belizário, J.E., and M. Napolitano. “Human microbiomes and their roles in dysbiosis, common diseases, and novel therapeutic approaches.” Frontiers in
Microbiology 6 (October 6, 2015): 1050. doi:10.3389/fmicb.2015.01050.
Bender, A.C., Faulkner, J.A., Tulimieri, K., Boise, T.H., and K.M. Elkins. “High
resolution melt assays to detect and identify Vibrio parahaemolyticus, Bacillus
cereus, Escherichia coli, and Clostridioides difficile bacteria.” Microorganisms 8,
no. 4 (April 14, 2020): 561. doi:10.3390/microorganisms8040561.
Borówka, P., Pułaski, L., Marciniak, B., Borowska-Strugińska, B., Dziadek, J.,
Żądzińska, E., Wiesław Lorkiewicz, W., and D. Strapagiel. “Screening methods
for detection of ancient Mycobacterium tuberculosis complex fingerprints in
next-generation sequencing data derived from skeletal samples.” GigaScience
8, no. 6 (June 1, 2019): giz065. doi:10.1093/gigascience/giz065.
Broomall, S.M., Ichou, M.A., Krepps, M.D., Johnsky, L.A., Karavis, M.A., Hubbard,
K.S., Insalaco, J.M., Betters, J.L., Redmond, B.W., Rivers, B.A., Liem, A.T., Hill,
J.M., Fochler, E.T., Roth, P.A., Rosenzweig, C.N., Skowronski, E.W., and H.S.
Gibbon. “Whole-genome sequencing in microbial forensic analysis of gammairradiated microbial materials.” Applied Environmental Microbiology 82, no. 2
(January 2016): 596–607. doi:10.1128/AEM.02231-15.
Budowle, B., Connell, N.D., Bielecka-Oder, A., Colwell, R.R., Corbett, C.R., Fletcher,
J., Forsman, M., Kadavy, D.R., Markotic, A., Morse, S.A., Murch, R.S., Sajantila,
A., Schmedes, S.E., Ternus, K.L., Turner, S.D., and S. Minot. “Validation of high
throughput sequencing and microbial forensics applications.” Investigative
Genetics 5 (June 30, 2014): 9. doi:10.1186/2041-2223-5-9.
Chen, W., Cheng, Y.M., Zhang, S.W., and Q. Pan. “Supervised method for periodontitis phenotypes prediction based on microbial composition using 16S rRNA
sequences.” International Journal of Computational Biology and Drug Design 7,
no. 2 (May 28, 2014): 214–224. doi:10.1504/IJCBDD.2014.061647.
Cho, Y., Lee, M.H., Kim, H.S., Park, M., Kim, M.-H., Kwon, H., Kim, J., Lee, Y.H., and
D.S. Lee. “Comparative analysis of Sanger and next generation sequencing methods for 16S rDNA analysis of post-mortem specimens.” Australian Journal of
Forensic Sciences 51, no. 5 (2019): 426–455. doi:10.1080/00450618.2017.1402957.
Chowdhury, S., and S.S. Fong. “Computational modeling of the human microbiome.”
Microorganisms 8, no. 2 (January 21, 2020): 197. doi:10.3390/microorganisms
8020197.
Clarke, T.H., Gomez, A., Singh, H., Nelson, K.E., and L.M. Brinkac. “Integrating the
microbiome as a resource in the forensics toolkit.” Forensic Science International:
Genetics 30 (September 2017): 141–147. doi:10.1016/j.fsigen.2017.06.008.
Elkins, K.M. Introduction to Forensic Chemistry. Boca Raton, FL: CRC Press/
Taylor & Francis, 2019.
Elkins, K., and A. Bender. “Detection and identification of foodborne pathogens.”
Encyclopedia 1 (2020). https://encyclopedia.pub/512.
Franco-Duarte, R., Černáková, L., Snehal, K., Kaushik, K.S., Salehi, B., Bevilacqua,
A., Corbo, M.R., Antolak, H., Dybka-Stępień, K., Leszczewicz, M., and S.
132
Next Generation Sequencing in Forensic Science
Relison Tintino. “Advances in chemical and biological methods to identify
microorganisms—From past to present.” Microorganisms 7, no. 5 (May 13,
2019): 130. doi:10.3390/microorganisms7050130.
Giampaoli, S., Berti, A., Di Maggio, R.M., Pilli E., Valentini, A., Valeriani, F.,
Gianfranceschi, G., Barni F., Ripani, L., and V.R. Spica. “The environmental
biological signature: NGS profiling for forensic comparison of soils.” Forensic
Science International 240 (July 2014): 41–47. doi:10.1016/j.forsciint.2014.02.028.
Giampaoli, S., DeVittori, E., Valeriani, F., Berti, A., and V. Romano Spica.
“Informativeness of NGS analysis for vaginal fluid identification.” Journal of
Forensic Sciences 62, no. 1 (January 2017): 192–196. doi:10.1111/1556-4029.13222.
Giampaoli, S., Alessandrini, F., Frajese, G.V., Guglielmi, G., Tagliabracci, A., and A.
Berti. “Environmental microbiology: Perspectives for legal and occupational
medicine.” Legal Medicine (Tokyo) 35 (November 2018): 34–43. doi:10.1016/j.
legalmed.2018.09.014.
Giampaoli, S., De Vittori E., Frajese, G.V., Paytuví, A., Sanseverino, W, Anselmo, A.,
Barni, F., and A. Berti. “A semi-automated protocol for NGS metabarcoding
and fungal analysis in forensic.” Forensic Science International 306 (January
2020): 110052. doi:10.1016/j.forsciint.2019.110052.
Gilchrist, C.A., Turner, S.D., Riley, M.F., Petri, W.A., and E.L. Hewlett. “Whole-genome
sequencing in outbreak analysis.” Clinical Microbiology Reviews 28, no. 3 (July
2015): 541–563. doi:10.1128/CMR.00075-13.
Gurenlian, J.R. “The role of dental plaque biofilm in oral health.” Journal of Dental
Hygiene 81, no. 5 (October 2007): 116. https://jdh.adha.org/content/jdenthyg/
81/suppl_1/116.full.pdf.
Hadi, M.I., Alamudi, M.Y., Suprayogi, D., and M. Widiyanti. “Detection of emerging
infectious disease in Chiroptera brachjatis and Rhinolopus boorneensis as reservoirs of zoonotic diseases in Indonesia.” Indian Journal of Forensic Medicine
and Toxicology 14(2020): 2027–2032.
Hasan, N.A., Young, B.A., Minard-Smith, A.T., Saeed, K., Li, H., Heizer, E.M.,
McMillan, N.J., Isom, R., Abdullah, A.S., and D.M. Bornman. “Microbial community profiling of human saliva using shotgun metagenomic sequencing.”
PLoS One 9 (May 20, 2014): e97699. doi:10.1371/journal.pone.0097699.
Haynes, E., Jimenez, E., Pardo, M.A., and S.J. Helyar. “The future of NGS (Next
Generation Sequencing) analysis in testing food authenticity.” Food Control
101 (July 2019): 134–143. doi:10.1016/j.foodcont.2019.02.010.
Hwang, K., Oh, J., Kim, T.K., Kim, B.K., Yu, D.S., Hou, B.K., Caetano-Anollés, G.,
Hong, S.G., and K.M. Kim. “CLUSTOM: a novel method for clustering 16S
rRNA next generation sequences by overlap minimization.” PLoS One 8, no. 5
(May 1, 2013): e62623. doi:10.1371/journal.pone.0062623.
Jakupciak, J.P., Wells, J.M., Karalus, R.J., Pawlowski, D.R., Lin, J.S., and A.B. Feldman.
“Population-sequencing as a biomarker of Burkholderia mallei and Burkholderia
pseudomallei evolution through microbial forensic analysis.” Journal of Nucleic
Acids 2013 (December 17, 2013): 801505. doi:10.1155/2013/801505.
Kistler, J.O., Pesaro, M., and W.G. Wade. “Development and pyrosequencing analysis of an in-vitro oral biofilm model.” BMC Microbiology 15 (February 10,
2015): 24. doi:10.1186/s12866-015-0364-1.
Kostic, A.D., Ojesina, A., Pedamallu, C.S., Jung, J., Verhaak, R.G.W., Getz, G., and
M. Meyerson. “PathSeq: Software to identify or discover microbes by deep
Microbial Applications of NGS
133
sequencing of human tissue.” Nature Biotechnology 29, no. 5 (May 2011): 393–
396. doi:10.1038/nbt.1868.
Kuiper, I. “Microbial forensics: next-generation sequencing as catalyst: The use of
new sequencing technologies to analyze whole microbial communities could
become a powerful tool for forensic and criminal investigations.” EMBO
Reports 17 (2016): 1085–1087. doi:10.15252/embr.201642794..
Leong, L.E.X., Taylor, S.L, Shivasami, A., Goldwater, P.N., and G.B. Rogers.
“Intestinal microbiota composition in sudden infant death syndrome and
age-matched controls.” Journal of Pediatrics 191 (December 2017): 63–68.
doi:10.1016/j.jpeds.2017.08.070.
Lilje, L., Lillsaar, T., Rätsep, R., Simm, J., and A. Aaspõllu. “Soil sample metagenome NGS data management for forensic investigation.” Forensic Science
International: Genetics Supplement Series 4, no. 1 (2013): e35–e36. doi:10.1016/j.
fsigss.2013.10.017.
Lipowski, D., Popiel, M., Perlejewski, K., Nakamura, S., Bukowska-Osko, I.,
Rzadkiewicz, E., Dzieciatkowski, T., Milecka, A., Wenski, W., Ciszek, M., and
A. Dębska-Ślizień. “A cluster of fatal tick-borne encephalitis virus infection in
organ transplant setting.” The Journal of Infectious Diseases 215, no. 6 (March
15, 2017): 896–901. doi:10.1093/infdis/jix040.
Lu, J., and S.L. Salzberg. “Ultrafast and accurate 16S rRNA microbial community analysis using Kraken 2.” Microbiome 8 (2020): 124. doi:10.1186
/s40168-020-00900-2.
Milani, C., Duranti, S., Bottacini, F., Casey, E., Turroni, F., Mahony, J., Belzer, C.,
Palacio, S.D., Montes, S.A., Mancabelli, L., and G.A. Lugli. “The first microbial
colonizers of the human gut: Composition, activities, and health implications
of the infant gut microbiota.” Microbiology and Molecular Biology Reviews 81,
no. 4 (November 8, 2017): e00036-17. doi:10.1128/MMBR.00036-17.
Minogue, T.D., Koehler, J.W., Stefan, C.P., and T.A. Conrad. “Next-generation
sequencing for biodefense: Biothreat detection, forensics, and the clinic.” Clinical
Chemistry 65, no. 3 (March 1, 2019): 383–392. doi:10.1373/clinchem.2016.266536.
Mitra, S., Förster-Fromme, K., Damms-Machado, A., Scheurenbrand, T.,
Biskup, S., Huson, D. H., and S.C. Bischoff. “Analysis of the intestinal
microbiota using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing.” BMC Genomics 14, Suppl no. 5 (October 16, 2013): S16.
doi:10.1186/1471-2164-14-S5-S16.
Pasolli, E., Asnicar, F., Manara, S., Zolfo, M., Karcher, N., Armanini, F., Begini, F.,
Manghi, P., Tett, A., Ghensi, P., Collado, M.C., Rice, B.L., DuLong, C., Morgan,
X.C., Golden, C.D., Quince, C., Huttenhower, C., and N. Segata. “Extensive
unexplored human microbiome diversity revealed by over 150,000 genomes
from metagenomes spanning age, geography, and lifestyle.” Cell 176, no. 3
(January 24, 2019): 649–662. doi:10.1016/j.cell.2019.01.001.
Phadke, S., Salvador, A.F., Alves, J.I., Bretschger, O., Alves, M.M., and M.A. Pereira.
“Harnessing the power of PCR molecular fingerprinting methods and next
generation sequencing for understanding structure and function in microbial
communities.” In PCR, 1st ed., edited by L. Domingues, 225–248. New York:
Springer, 2017. doi:10.1007/978-1-4939-7060-5_16.
Piotrowska-Cyplik, A., Myszka, K., Czarny, J., Ratajczak, K., Kowalski, R., and R.
Biegańska‐Marecik. “Characterization of specific spoilage organisms (SSOs)
134
Next Generation Sequencing in Forensic Science
in vacuum-packed ham by culture-plating techniques and MiSeq next-generation sequencing technologies.” Journal of the Science of Food and Agriculture
97 (January 2016): 659–668. doi:10.1002/jsfa.7785.
Plummer, E., Twin, J., Bulach, D.M., Garland, S.M., and S.N. Tabrizi. “A comparison of three bioinformatics pipelines for the analysis of preterm gut microbiota using 16S rRNA gene sequencing data.” Journal of Proteomics and
Bioinformatics 8, no. 12 (December 2015). doi:12. 10.4172/jpb.1000381.
Quaak, F.C.A., van Duijn, T., Hoogenboom, J., Kloosterman, A.D., and I. Kuiper.
“Human-associated microbial populations as evidence in forensic casework.” Forensic Science International: Genetics 36 (September 2018): 176–185.
doi:10.1016/j.fsigen.2018.06.020.
Rajan, S.K., Lindqvist, M., Brummer, R.J., Schoultz, I., and D. Repsilber. “Phylogenetic
microbiota profiling in fecal samples depends on combination of sequencing
depth and choice of NGS analysis method.” PLoS One 14, no. 9 (September
2019): e0222171. doi:10.1371/journal.pone.0222171.
Sangiovanni, M., Granata, I., Thind, A.S., and M.R. Guarracino. “From trash to
treasure: Detecting unexpected contamination in unmapped NGS data.” BMC
Bioinformatics 20, Suppl 4 (April 18, 2019): 168. doi:10.1186/s12859-019-2684-x.
Schmedes, S.E., Sajantila, A., and B. Budowle. “Expansion of microbial forensics.”
Journal of Clinical Microbiology 54 (2016): 1964–1974. doi:10.1128/JCM.00046-16.
Schmedes, S.E., Woerner, A.E., Novroski, N.M.M., Wendt, F.R., King, J.L., Stephens,
K.M., and B. Budowle. “Targeted sequencing of clade-specific markers from skin
microbiomes for forensic human identification.” Forensic Science International:
Genetics 32 (January 2018): 50–61. doi:10.1016/j.fsigen.2017.10.004.
Sekulic, M., Harper, H., Nezami, B.G., Shen, D.L, Sekulic, S.P., Koeth, A.T., Harding,
C.V., Gilmore, H., and N. Sadri. “Molecular detection of SARS-CoV-2 infection in FFPE samples and histopathologic findings in fatal SARS-CoV-2 cases.”
American Journal of Clinical Pathology 154, no. 2 (July 7, 2020): 190–200.
doi:10.1093/ajcp/aqaa091.
Sharma, A., and J.A. Gilbert. “Microbial exposure and human health.” Current
Opinion in Microbiology 44 (August 2018): 79–87. doi:10.1016/j.mib.2018.08.003.
Shin, D.H,. Oh, C.S., Hong, J.H., Lee, H., Lee, S.D., and E. Lee. “Helicobacter pylori
DNA obtained from the stomach specimens of two 17(th) century Korean mummies.” Anthropologischer Anzeiger 75, no. 1 (2018): 75–87. doi:10.1127/anthranz/
2018/0780.
Shrivastava, P., Jain, T., and M.K. Gupta. “Microbial forensics in legal medicine.”
SAS Journal of Medicine 1 (2015): 33–40.
Siqueira, J.F., Fouad, A.F., and I.N. Rôças. “Pyrosequencing as a tool for better
understanding of human microbiomes.” Journal of Oral Microbiology 4, no. 1
(2012): 10743. doi:10.3402/jom.v4i0.10743.
Sjödin, A., Broman, T., Melefors, O., Andersson, G., Rasmusson, B., Knutsson, R.,
and M. Forsman. “The need for high-quality whole-genome sequence databases in microbial forensics.” Biosecurity and Bioterrorism 11 (2012): S78–S86.
doi:10.1089/bsp.2013.0007.
Tozzo, P., D’Angiolella, G., Brun, P., Castagliuolo, I., Gino, S., and L. Caenazzo. “Skin
microbiome analysis for forensic human identification: What do we know so far?”
Microorganisms 8 (June 9, 2020): 873. doi:10.3390/microorganisms8060873.
Microbial Applications of NGS
135
Tridico, S.R., Murray, D.C., Addison, J., Kirkbride, K.P., and M. Bunce. “Metagenomic
analyses of bacteria on human hairs: A qualitative assessment for applications in forensic science.” Investigative Genetetics 5 (December 16, 2014): 16.
doi:10.1186/s13323-014-0016-5.
Wang, L.-L. Zhang, F.-Y., Dong, W.-W., Wang, C.-L., Liang, X.-Y., Suo, L.-L., Cheng,
J., Zhang, M., Guo, X.-S., Jiang, P.-H., Guan, D.-W., and R. Zhao. “A novel
approach for the forensic diagnosis of drowning by microbiological analysis with next-generation sequencing and unweighted UniFrac-based PCoA.”
International Journal of Legal Medicine 134, no. 6 (2020): 2149–2159. doi:10.10
07/s00414-020-02358-1.
Welinder-Olsson, C., Dotevall, L., Hogevik, H., Jungnelius, R., Trollfors, B., Wahl,
M., and P. Larsson. “Comparison of broad-range bacterial PCR and culture of
cerebrospinal fluid for diagnosis of community-acquired bacterial meningitis.” Clinical Microbiology and Infection 13, no. 9 (September 2007): 879–886.
doi:10.1111/j.1469-0691.2007.01756.x.
Willis, J.R., and T. Gabaldón. “The human oral microbiome in health and disease:
From sequences to ecosystems.” Microorganisms 8, no. 2 (February 23, 2020):
308. doi:10.3390/microorganisms8020308.
Woerner, A.E., Novroski, N.M.M., Wendt, F.R., Ambers, A., Wiley, R., Schmedes,
S.E., and B. Budowle. “Forensic human identification with targeted microbiome
markers using nearest neighbor classification.” Forensic Science International:
Genetics 38 (January 2019): 130–139. doi:10.1016/j.fsigen.2018.10.003.
Wojciuk, B., Salabura, A., Grygorcewicz, B., Kędzierska, K., Ciechanowski, K., and
B. Dołęgowska. “Urobiome: In sickness and in health.” Microorganisms 7
(November 10, 2019): 548. doi:10.3390/microorganisms7110548.
Yang, Y., Xie, B., and J. Yan. “Application of next-generation sequencing technology
in forensic science.” Genomics, Proteomics & Bioinformatics 12, no. 5 (October
2014): 190–197. doi:10.1016/j.gpb.2014.09.001.
Zheng, Q., Bartow-McKenney, C., Meisel, J.S., and E.A. Grice. “HmmUFOtu: An
HMM and phylogenetic placement based ultra-fast taxonomic assignment
and OTU picking tool for microbiome amplicon sequencing studies.” Genome
Biology 19 (June 27, 2018): 82. doi:10.1186/s13059-018-1450-0.
Body Fluid Analysis
Using Next Generation
Sequencing
9
9.1 Introduction
Body fluids analysis is performed in forensic cases to understand the circumstances of the case. Traditional methods of body fluid analysis include
colorimetric, enzymatic, immunochromatographic, enzyme-linked immunosorbent assay (ELISA), and microscopy assays (Virkler and Lednev
2009). While these have many strengths, specifically low cost, ease of use,
and portability, there is room for improvement. For example, with the current methods, the analyst must decide which test to perform based on their
expectations of which body fluid is likely to be present and then each body
fluid must be tested individually. This is a significant drawback as casework
evidence samples may contain more than one body fluid from one or more
donors. In addition, many of the traditional tests require intact enzymes
which depend on the stability of these molecules. Conditions that lead to
damaged and degraded samples often degrade or render enzyme molecules
inactive. Additionally, the sensitivity of the tests varies, and some are not
very sensitive. Similarly, many of the tests are not specific and have known
false positives. Furthermore, trace evidence is submitted in many cases and
is simply not sufficient to permit both body fluid analysis and identity testing. Finally, traditional body fluid tests are destructive, and the quantity of
the evidence sample may preclude body fluid testing if identity testing is to
be performed. Next generation sequencing (NGS) offers a solution to several
of the above problems. NGS methods including pyrosequencing and massively parallel sequencing have been developed for multiplex body fluid testing. Samples include mRNA and methylated DNA that can be co-extracted
during sample preparation steps for human identification testing. NGS is not
without its own drawbacks. It requires specialized instrumentation and analyst training. Nevertheless, it offers high specificity and sensitivity, simultaneous identification of body fluids, and reproducibility.
9.2 Epigenetic-Based Tissue Source Attribution
Epigenetics is the study of gene expression. Gene expression varies by cell
type but can also change with environmental factors, aging, and disease such
DOI: 10.4324/9781003196464-9
137
138
Next Generation Sequencing in Forensic Science
as carcinogenesis (Lilischkis et al. 2001, Bernstein et al. 2007). Gene expression can be modulated by chemical modification to the cytosine nitrogenous
base in DNA. Select cytosine residues are methylated at C-5 of the pyrimidine base and are termed 5ʹ-methylcytosine (Lilischkis et al. 2001, Bernstein
et al. 2007). The methylation of cytosine creates an additional layer for individualization of DNA valuable for identifying identical twins. Cytosine
methylation in DNA was found to be a heritable trait like the sequence of the
nitrogenous bases in the human genome, and methylation sites have been
annotated (Meissner et al. 2005, Bernstein et al. 2007, Harrow et al. 2012).
Methylated cytosines generally precede guanine bases forming so-called
CpG islands in the 5ʹ to 3ʹ direction on the DNA strand (Lilischkis et al. 2001,
Bernstein et al. 2007). CpG islands represent approximately 2% of the cytosines in the human genome and are most commonly found in the promoter
region of genes (Lilischkis et al. 2001). While they usually repress transcription, in some cases they promote transcription (Lilischkis et al. 2001, Weber
et al. 2005). Methylated cytosines cluster in CpG islands, but the pattern of
CpG methylation varies among body fluids and tissues (Lilischkis et al. 2001).
Paliwal et al. (2010) reviewed quantitative detection of DNA methylation at
CpG sites and developed a quantitation method for minute DNA.
Pyrosequencing is an NGS tool that has been used to identify and quantify differentially methylated loci (Dejeux et al. 2009, Powell et al. 2018).
Pyrosequencing theory is covered in Chapter 2. Prior to sequencing, the
extracted DNA can be treated with bisulfite which, at low pH, converts
the unmethylated cytosines in the sequence to uracil (Grunau et al. 2001,
Lilischkis et al. 2001). Using PCR, the target is amplified, and the uracil bases
are replaced with thymine in the amplified DNA. Methylated cytosines are
copied as cytosines in PCR. Tissue-specific differentially methylated regions
(tDMRs) have been identified and serve as the targets in forensic body tissue and fluid analysis (Rakyan et al. 2008, Frumkin et al. 2011, Lee et al.
2012, Slieker et al. 2013, Balamuragan et al. 2014). Genomic DNA that was
extracted from 3-mm dried blood spots, bisulfite treated, and PCR-amplified
was sequenced to analyze the imprinted gene SNRPN (Xu et al. 2012). Slieker
et al. (2013) used an Illumina 450k chip to identify and annotate tDMRs
for blood, saliva, buccal swabs, and hair follicle tissue and found that most
tDMRs were in CpG-poor regions. DNA extracts from evidence can be fully,
partially, or unmethylated at the sites of interest. The output pyrogram produced in pyrosequencing can be used to quantify the percent methylation
(Dejeux et al. 2009, Qiagen PyroMark ® Q48 Autoprep User Manual, 2020).
Park et al. (2014) evaluated over 450,000 CpG sites using an Illumina
450k chip. They identified eight markers that demonstrated high sensitivity
and specificity for body fluid identification (Park et al. 2014). Bruce McCord
and his group has developed several pyrosequencing assays for body fluids
Body Fluid Analysis Using NGS
139
Table 9.1 Genetic Markers Identified Body Fluid Identification
Using Pyrosequencing
Marker
Body Fluid
AHRR, cg06379435, C20orf117, cg06379435, cg08792630
ZC3H12D, FGF7, cg23521140, cg17610929
NMUR2, UBE2U, B_SPTB_03
BCAS4, SA-6, cg26107890, cg20691722
PFN3A, VE_8
cg01774894, cg14991487
Blood
Semen
Sperm
Saliva/oral epithelia
Vaginal epithelia
Vaginal secretions
including blood, saliva, semen, and vaginal fluid using the Qiagen PyroMark
Assay Design software using various marker sites for the body fluids (Table 9.1).
Elkins has described details of pyrosequencing primer design for CpG targets
(Elkins 2021), and McCord’s group has published several reports over the past
half dozen years detailing the sequences and applications of pyrosequencing
primers they have designed and tested for body fluid analysis (Madi et al.
2012, Balamurugan et al. 2014, Antunes et al. 2016, Silva et al. 2016, Gauthier
et al. 2019, Alghanim et al. 2020). His team has reported a pyrosequencing
multiplex assay that detects and identifies blood, saliva, semen, and vaginal
cells simultaneously that has been licensed and distributed by Qiagen (Powell
et al. 2018). In the assay, BCAS4, ZC3H12D, cg06379435, and VE_8 loci are
used to detect and identify saliva, semen, blood, and vaginal epithelial cells,
respectively, after they have been bisulfite treated using the Qiagen EpiTect®
Fast DNA Bisulfite Kit (Figure 9.1).
9.3 mRNA-Based Tissue Source Attribution
In addition to methylated DNA targets, RNA targets have also been examined for forensic body fluid analysis. Many RNA molecules have been characterized including messenger RNA (mRNA), microRNA (miRNA), and
small nuclear RNA (snRNA). Several studies have evaluated NGS approaches
for differentiating body fluids using mRNA. In 2015, Børsting and Morling
reviewed forensic applications of NGS using markers such as mRNA. Zubakov
et al. (2015) evaluated the Ion Torrent PGM instrument in its ability to simultaneously individualize samples by STR DNA typing, perform sex typing by
amelogenin, and perform body fluid/tissue identification. Dørum et al. (2018)
analyzed 183 body fluids/tissues using their NGS mRNA approach using MPS
and used partial least squares (PLS) and linear discriminant analysis (LDA)
to classify samples using NGS read counts into one of six body fluids. They
tested the model on mixed body fluid samples and its ability to identify the
140
Next Generation Sequencing in Forensic Science
(a)
(b)
(c)
(d)
Figure 9.1 Pyrograms resulting from a vaginal epithelial sample analyzed
with the Body Fluid Identification Multiplex. Vaginal epithelia is characterized by moderate methylation in the BCAS4 assay (a), hypomethylation in the
cg06379435 assay (b), moderate methylation in the VE_8 assay (c), and hypermethylation in the ZC3H12D assay (d). The combination of multiple body fluid
assays in a single reaction allows for higher accuracy in body fluid identification
while reducing sample consumption and costs. (Courtesy of Quentin Gauthier.)
individual body fluid components in a mixture (Dørum et al. 2018). In 2018,
Hanson et al. published a study of their work using targeted mRNA sequencing to identify blood, saliva, semen, vaginal secretions, menstrual blood, and
skin using a thirty-three-biomarker assay. Their assay correctly identified the
body fluids in a study in their lab and in a blinded study (Hanson et al. 2018).
9.4 MicroRNA Analysis
MicroRNA (miRNA) was proposed for forensic applications by Courts and
Madea in 2010. At 18–24 nucleotides in length (Yang et al. 2014), miRNA
Body Fluid Analysis Using NGS
141
molecules are shorter in length than mRNA and small nuclear RNA (snRNA)
molecules. Like mRNA and snRNA, they are endogenous to the organism
(Hanson et al. 2009, Yang et al. 2014). Their small size resists degradation
(Yang et al. 2014). Thus, miRNA profiling is an attractive tool for typing damaged or compromised samples. Hanson et al. (2009) analyzed 452 miRNAs
from different body fluids including blood, menstrual blood, semen, saliva,
and vaginal secretions and observed differential expression in nine miRNAs
(miR451, miR16, miR135b, miR10b, miR658, miR205, miR124a, miR372,
and miR412) using real-time PCR and demonstrated the use of miRNA for
forensic applications using as little as 50 pg of sample. In 2010, Zubakov et al.
(2010) analyzed the expression of 718 miRNAs from saliva, semen, venous
blood, menstrual blood, and vaginal secretions using a microarray and
evaluated using RT-PCR Taq Man assays for distinguishing the body fluids
and reported miRNAs specific for venous blood and semen. NGS technology was proposed for miRNA forensic analysis in 2014 (Yang et al. 2014);
however, at the time of this writing, no researchers have published initial
proof-of-concept studies.
9.5 The Future of Body Fluid Assays
Massively parallel sequencing (MPS) using the MiSeq and Ion Torrent instruments also offers new opportunities for body fluid analysis. Just as PCR primers have been designed, tested, and multiplexed for human identification and
phenotyping applications, they can be designed to target loci such as SNPs
with demonstrated variations in body fluids. Alternatively, extracted snRNA
or miRNA could be reverse transcribed to cDNA and the targets could be
sequenced using MPS using sequencing as is being performed with mRNA.
However, as with the introduction of NGS applications for human identity
testing, labs need to allocate time and resources to acquire the instrumentation, consumables, and training and validate the new methods. The commercialization of the pyrosequencing assays for body fluid analysis is an
indication of the importance of this capability. How widely the new tool is
adopted will depend not only on funding but also institutional desire and
will to make a significant change in evidence processing and standard operating procedure.
Questions
1. List a locus for the identification of each body fluid.
2. Explain how mRNA, microRNA, and methylated DNA can be used
in body fluid analysis.
142
Next Generation Sequencing in Forensic Science
3. What NGS technique has been developed to analyze body fluids?
Explain how it works.
4. Could the MiSeq FGx or Ion Torrent instruments be used in body
fluid analysis? Explain your answer.
5. What is the future of body fluid testing? Explain.
References
Alghanim, H., Balamurugan, K., and B. McCord. “Development of DNA methylation markers for sperm, saliva and blood identification using pyrosequencing
and qPCR/HRM.” Analytical Biochemistry 611 (December 15, 2020): 113933.
doi:10.1016/j.ab.2020.113933.
Antunes, J., Silva, D.S., Balamurugan, K., Duncan, G., Alho, C.S., and B. McCord.
“Forensic discrimination of vaginal epithelia by DNA methylation analysis
through pyrosequencing.” Electrophoresis 37, no. 21 (October 2016): 2751–2758.
doi:10.1002/elps.201600037.
Balamurugan, K., Bombardi, R., Duncan, G., and B. McCord. “Identification of
spermatozoa by tissue-specific differential DNA methylation using bisulfite
modification and pyrosequencing.” Electrophoresis 35, no. 21–22 (November
2014): 3079–3086.
Bernstein, B.E., Meissner, A., and E.S. Lander. “The mammalian epigenome.” Cell
128, no. 4, (February 23, 2007): 669–681. doi:10.1016/j.cell.2007.01.033.
Børsting, C., and N. Morling. “Next generation sequencing and its applications in
forensic genetics.” Forensic Science International Genetics 18 (September 2015):
78–89. doi:10.1016/j.fsigen.2015.02.002.
Courts, C., and B. Madea. “Micro-RNA – A potential for forensic science?” Forensic
Science International 203, no. 1–3 (November 1, 2010): 106–111. doi:10.1016/j.
forsciint.2010.07.002.
Dejeux, E., El Abdalaoui, H., Gut, I.G., and J. Tost. “Identification and quantification
of differentially methylated loci by pyrosequencing™ technology.” In Methods
in Molecular Biology: DNA Methylation: Methods and Protocols, 2nd ed., vol.
507, edited by J. Tost, 189–205. New York: Humana Press, 2009. doi:10.100
7/978-1-59745-522-0_15.
Dørum, G., Ingold, S., Hanson, E., Ballantyne, J., Snipen, L., and C. Haas.
“Predicting the origin of stains from next generation sequencing mRNA data.”
Forensic Science International Genetics 34 (May 2018): 37–48. doi:10.1016/j.
fsigen.2018.01.001.
Elkins, K.M. “Pyrosequencing primer design for forensic biology applications.” In
Methods in Molecular Biology: PCR Primer Design, 3rd ed., edited by C. Basu.
New York: Humana Press, 2021, in press.
Frumkin, D., Wasserstrom, A., Budowle, B., and A. Davidson. “DNA methylationbased forensic tissue identification.” Forensic Science International: Genetics 5,
no. 5 (November 2011): 517–524. doi:10.1016/j.fsigen.2010.12.001.
Gauthier, Q.T., Cho, S., Carmel, J.H., and B.R. McCord. “Development of a body
fluid identification multiplex via DNA methylation analysis.” Electrophoresis
40, no. 18–19 (September 2019): 2565–2574. doi:10.1002/elps.201900118.
Body Fluid Analysis Using NGS
143
Grunau, C., Clark, S.J., and A. Rosenthal. “Bisulfite genomic sequencing: systematic
investigation of critical experimental parameters.” Nucleic Acids Research 29,
no. 13 (July 1, 2001): E65–E65. doi:10.1093/nar/29.13.e65.
Hanson, E., Ingold, S., Haas, C., and J. Ballantyne. “Messenger RNA biomarker
signatures for forensic body fluid identification revealed by targeted RNA
sequencing.” Forensic Science International: Genetics 34 (May 2018): 206–221.
doi:10.1016/j.fsigen.2018.02.020.
Hanson, E.K., Lubenow, H., and J. Ballantyne. “Identification of forensically relevant
body fluids using a panel of differentially expressed microRNAs.” Analytical
Biochemistry 387, no. 2 (April 15, 2009): 303–314. doi:10.1016/j.ab.2009.01.037.
Harrow, J., Frankish, A., Gonzalez, J.M., Tapanari, E., Diekhans, M., Kokocinski, F.,
Aken, B.L., Barrell, D., Zadissa, A., Searle, S., Barnes, I., Bignell, A., Boychenko,
V., Hunt, T., Kay, M., Mukherjee, G., Rajan, J., Despacio-Reyes, G., Saunders, G.,
Steward, C., Harte, R., Lin, M., Howald, C., Tanzer, A., Derrien, T., Chrast, J.,
Walters, N., Balasubramanian, S., Pei, B., Tress, M., Rodriguez, J.M., Ezkurdia,
I., van Baren, J., Brent, M., Haussler, D., Kellis, M., Valencia, A., Reymond, A.,
Gerstein, M., Guigó, R., and T.J. Hubbard. “GENCODE: the reference human
genome annotation for The ENCODE Project.” Genome Research 22, no. 9
(September 2012): 1760–1774. doi:10.1101/gr.135350.111.
Lee, H.Y., Park, M.J., Choi, A., An, J.H., Yang, W.I., and K.-J. Shin. “Potential forensic application of DNA methylation profiling to body fluid identification.”
International Journal of Legal Medicine 126(2012): 55–62.
Lilischkis, R., Kneitz, H., and H. Kreipe. “Methylation analysis of CpG islands.” In
Methods in Molecular Medicine: Metastasis Research Protocols, Volume I: Cells
and Tissues, Vol. 57, edited by S.A. Brooks and U. Schumacher, 271–283. New
York: Humana Press, 2001. doi:10.1385/1-59259-136-1:271.
Madi, T., Balamurugan, K., Bombardi, R., Duncan, G., and B. McCord. “The determination of tissue-specific DNA methylation patterns in forensic biofluids
using bisulfite modification and pyrosequencing.” Electrophoresis 33 (2012):
1736–1745. doi:10.1002/elps.201100711.
Meissner, A., Gnirke, A., Bell, G.W., Ramsahoye, B., Lander, E.S., and R. Jaenisch.
“Reduced representation bisulfite sequencing for comparative high-resolution
DNA methylation analysis.” Nucleic Acids Research 33, no. 18 (October 13,
2005): 5868–5877. doi:10.1093/nar/gki901.
Paliwal, A., Vaissiere, T., and Z. Herceg. “Quantitative detection of DNA methylation states in minute amounts of DNA from body fluids.” Methods 52, no. 3
(November 2010): 242–247. doi:10.1016/j.ymeth.2010.03.008.
Park, J.-L., Kwon, O.-H., Kim, J.H., Yoo, H.-S., Lee, H.-C., Woo, K.-M., Kim, S.-Y.,
Lee, S.-H., and Y.S. Kim. “Identification of body fluid-specific DNA methylation markers for use in forensic science.” Forensic Science International
Genetics 13 (2014): 147–153. doi:10.1016/j.fsigen.2014.07.011.
Powell, M., Lee, A.S., St. Andre, P., and B. McCord. “Tissue source attribution using the
PyroMark ® Q48 Autoprep System: Sperm identification in forensic casework.
Qiagen Applications Note.” (2018). https://www.qiagen.com/us/resources/
download.aspx?id=ddaa262e-f3ec-4ac7-9bac-aaf3ec8968cf&lang=en.
Qiagen. “PyroMark Assay Design SW 2.0 quick-start guide - (EN).” Accessed January
11, 2021. https://www.qiagen.com/us/resources/download.aspx?id=231a689457d4-4f0b-81f2-eaa56a2b6bd8&lang=en.
144
Next Generation Sequencing in Forensic Science
Qiagen. “PyroMark® Q48 Autoprep User Manual, June 2020.” Accessed January
11, 2021. https://www.qiagen.com/us/resources/download.aspx?id=650a0c133b8e-4a77-b433-6b1e50b9525a&lang=en.
Rakyan, V.K., Down, T.A., Thorne, N.P., Flicek, P., Kulesha, E., Graf, S., Tomazou,
E.M., Bäckdahl, L., Johnson, N., Herberth, M., Howe, K.L., Jackson, D.K., Miretti,
M.M., Fiegler, H., Marioni, J.C., Birney, E., Hubbard, T.J., Carter, N.P., Tavaré,
S., and S. Beck. “An integrated resource for genome-wide identification and
analysis of human tissue-specific differentially methylated regions (tDMRs).”
Genome Research 18, no. 9 (June 23, 2008): 1518–1529. doi:10.1101/gr.077479.108.
Silva, D.S.B.S., Antunes, J., Balamurugan, K., Duncan, G., Alho, C.S., and B.
McCord. “Developmental validation studies of epigenetic DNA methylation
markers for the detection of blood, semen and saliva samples.” Forensic Science
International Genetics 23 (July 2016): 55–63. doi:10.1016/j.fsigen.2016.01.017.
Slieker, R.C., Bos, S.D., Goeman, J.J., Bovée, J.V., Talens, R.P., van der Breggen, R.,
Suchiman, H.E.D., Lameijer, E.-W., Putter, H., van den Akker, E.B., Zhang, Y.,
Jukema, J.W., Slagboom, P.E., Meulenbelt, I., and B.T. Heijmans. “Identification
and systematic annotation of tissue-specific differentially methylated regions
using the Illumina 450k array.” Epigenetics Chromatin 6, no. 1 (August 6, 2013):
26. doi:10.1186/1756–8935–6–26.
Virkler, K., and I.K. Lednev. “Analysis of body fluids for forensic purposes: From
laboratory testing to non-destructive rapid confirmatory identification at a
crime scene.” Forensic Science International 188, no. 1–3 (March 2009): 1–17.
doi:10.1016/j.forsciint.2009.02.013.
Weber, M., Davies, J.J., Wittig, D., Oakeley, E.J., Haase, M., and W.L. Lam.
“Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells.” Nature
Genetics 37, no. 8 (August 2005): 853–862. doi:10.1038/ng1598.
Xu, H., Zhao, Y., Liu, Z., Zhu, W., Zhou, Y., and Z. Zhao. “Bisulfite genomic
sequencing of DNA from dried blood spot microvolume samples.” Forensic
Science International Genetics 6, no. 3 (May 2012): 306–309. doi:10.1016/j.
fsigen.2011.06.007.
Yang, Y., Xie, B., and J. Yan. “Application of next-generation sequencing technology
in forensic science.” Genomics, Proteomics & Bioinformatics 12, no. 5 (October
2014): 190–197. doi:10.1016/j.gpb.2014.09.001.
Zubakov, D., Boersma, A.W., Choi, Y., van Kuijk, P.F., Wiemer, E.A., and M. Kayser.
“MicroRNA markers for forensic body fluid identification obtained from
microarray screening and quantitative RT-PCR confirmation.” International
Journal of Legal Medicine 124, no. 3 (May 2010): 217–226. doi:10.1007/
s00414-009-0402-3.
Zubakov, D., Kokmeijer, I., Ralf, A., Rajagopalan, N., Calandro, L., Wootton, S., Langit,
R., Chang, C., Lagace, R., and M. Kayser. “Towards simultaneous individual
and tissue identification: A proof-of-principle study on parallel sequencing of
STRs, amelogenin, and mRNAs with the Ion Torrent PGM.” Forensic Science
International Genetics 17 (July 2015): 122–128. doi:10.1016/j.fsigen.2015.04.002.
Conclusions and
Future Outlook of Next
Generation Sequencing
in Forensic Science
10
10.1 NGS Is Here
Since its introduction in the late 1990s, next generation sequencing (NGS)
has found numerous applications in cancer and disease research and clinical applications, microbiology, and crop biology as well as bioforensics,
biosurveillance, and infectious disease diagnosis (Kircher and Kelso, 2010,
Yang et al. 2014, Budowle et al. 2014, Schmedes et al. 2016, Arenas et al. 2017,
Minogue et al. 2019). Important lessons were learned from sequencing the
human genome that led to sequencing an individual genome (Collins 2003).
Upon commercialization of the sequencing instruments, clinical applications
began and continue to increase. NGS has altered genomics research in the
past fifteen years. Testing that was not affordable or technically feasible has
been made possible by NGS (Patrick 2007, Mannhalter 2017). While many
laboratories still use Sanger sequencing for forensic and clinical diagnostic
applications, NGS is increasingly finding applications in these labs especially as the price for NGS has decreased to approximately $1000 (US dollars)
per sample, making it more feasible for routine applications and casework
(Mannhalter 2017). Full genomes are mapped with decreased cost and published almost weekly (Børsting and Morling 2015). NGS has demonstrated
that it can overcome issues of efficiency, capacity, and allelic resolution presented by capillary electrophoresis and reduce the number of false positives
in mixture analysis. While NGS is currently being used for the most challenging samples such as human remains samples recovered in cold cases, we
expect it will be employed to analyze more routine samples in the future.
Labs will decide how and where in their workflow NGS fits.
It has only been since the 2010s that NGS has begun to make an impact
in forensic science (Minogue et al. 2019). NGS is currently being used for
human identification, phenotyping, and ancestry applications using human
blood, buccal, bone, or teeth samples (Jäger et al. 2017). The first NGS kits
approved for collecting human genotyping data for Combined DNA Index
System (CODIS) searches in the United States criminal justice system were
only approved in 2019 (Verogen media release). In parallel to the progress in
applying NGS to human genotyping applications, the tool has been found
DOI: 10.4324/9781003196464-10
145
146
Next Generation Sequencing in Forensic Science
useful in characterizing species of microorganisms for forensic applications
(Minogue et al. 2019). While the research landscape in this area is still in its
infancy, there are several trends and opportunities that are observed. As de
Knijff wrote in 2019 in his paper, “From next generation sequencing to now
generation sequencing in forensics,” forensic use of CE also took time to be
appreciated and adopted.
10.2 Why NGS?
As we have seen throughout this book, NGS has several advantages over CE
approaches. NGS has been applied to diverse applications with successful
results in each. NGS can be used in forensic casework to identify the source
of a stain or biological evidence sample – even if it was from an identical
twin. Sequencing STRs can illuminate differences that are masked in CE
fragment analysis. NGS can be used to predict biogeographical ancestry and
traits including eye color, hair color, and skin tone. SNPs that are multiplexed
in NGS library prep kits with STRs are analyzed simultaneously instead of by
multiple different assays. NGS can be used to determine the body fluid source
of a biological sample, the site on the human body it was taken from, if the
sample is human, and even which species are present in the sample without
having to decide which species to test for ab initio. NGS has led to more complete DNA typing results from human remains including damaged, historic
or ancient hair, bones, and teeth remains. NGS has led to more complete routine mitochondrial DNA typing and familial tracing. NGS has been used to
determine the cause of death in an autopsy and signs of disease in oral and
blood samples. It has been used to differentiate the source of hairs and cell
phones and predict lifestyle patterns. NGS has enhanced the capabilities in
bioforensics and biosurveillance. NGS has been used to differentiate soils and
sites and found use for geolocation. It has been used in archeological studies to identify gut bacteria and infection. NGS can capture subtle differences
between bacterial communities including foodborne pathogens and bioterrorism agents in samples without reliance on target genetic marker systems
(Sjödin et al. 2012). NGS has also been applied to sequencing drug and “ legal
high” species including the opium poppy and marijuana to characterize simple sequence repeat (SSR) genetic markers and chloroplast genome and STR
sequences, respectively (Celik et al. 2014, Oh et al. 2015, Houston et al. 2018).
The MiSeq and Ion series instruments are easier to use and maintain
than CE. The commercial Converge and Universal Analysis Software systems are easy to use and intuitive. In our lab, NGS has been found to lead to
more data, higher sensitivity, and increased precision than older methods
with similar sample input (unpublished data). An advantage of NGS is the
Conclusions and Future Outlook
147
scalability and ability to look at data for many markers simultaneously in a
sample. NGS can simultaneously type STRs and SNPs and detect isoallelic
variation (Berglund et al. 2011). NGS can capture sequence differences for
STR alleles of the same length (homoplasy) and polymorphisms in the flanking regions (Børsting and Morling 2015). NGS can better detect heteroplasmy
in mitochondrial DNA typing. These features can aid in the individualization of samples and mixture deconvolution. NGS can be used to differentiate identical twins and perform age prediction (Weber-Lehmann et al. 2014,
Silva et al. 2015, Daunay et al. 2019). The increased number of loci increases
the discriminatory power. This is essential when the quantity of DNA is limited, and there is not enough DNA to perform a half dozen or more separate
panels or when working with samples from closely related individuals. When
compared to implementing separate aSTR, Y-STR, X-STR, and SNaPshot
tests, NGS is time- and cost-effective. NGS is more cost-effective per base
than Sanger sequencing for large numbers of samples and loci as the sequencing is performed in a massively parallel configuration.
With the identity and ancestry informative SNPs probed using the NGS
kits, all samples can produce investigative leads even if the STR profile fails to
make a hit in a database such as NDIS. Additional loci can be used in statistical calculations. NGS can also reveal errors that are not clearly resolved with
CE including stutter caused by DNA strand slippage, base-pair errors not
fixed by the polymerase during editing, slippage especially at homopolymeric
strands, and errors as a results of sequencing miscalls.
10.3 Ongoing Challenges of Adopting NGS
for Forensic Investigations
After years of development, NGS has demonstrated great promise for forensic
casework. Although researchers and analysts have developed, piloted, and
validated methods and kits, NGS remains a new tool that is being used in
forensic casework. Even with all of the possibilities and advantages that NGS
offers, there are still very real challenges that must be overcome for NGS to be
widely adopted for forensic use.
Issues that need to be resolved prior to adoption for casework include
nomenclature, data storage, funding, training, statistics, genetic privacy concerns, contamination issues, reporting, sample tracking, accreditation, casework needs, and acceptance by court (Alonso et al. 2017). The nomenclature
for NGS-based STR alleles needs to be standardized. There is no convention for reporting isoalleles or accepted procedure for performing statistical evaluations of the newly identified alleles. Statistics need to be developed
and uniformly employed to analyze new NGS alleles. Just as when any new
148
Next Generation Sequencing in Forensic Science
technology is introduced, standard operating procedures (SOPs) need to be
developed and the instrument and method need to be subjected to internal
validations.
Beginning at the crime scene, investigators need to know the power
and limitations of NGS in order to collect the appropriate samples or decide
which samples are most promising to send to the lab. Other issues include the
potentially long and complex chain of custody and speed of analysis if NGS
is used (Gilchrist et al. 2015) and defining the analytical threshold (AT) to
avoid overinterpreting the data (Young et al. 2017). As an example, to further
analyze raw NGS data and noise to define the AT, FASTQ files were analyzed
using the STRait Razor™ software and Python scripts instead of the Verogen
UAS software (Young et al. 2017).
An issue that needs to be addressed with regard to NGS is cost. More
competitors are needed in the industry to drive down the costs of forensic DNA analysis. Labs need to invest in and implement more automation
to reduce preparation time and inter-operator variability. Centralized labs
may help counties, states, territories, and countries achieve the economies of
scale needed to make NGS cost-effective rather than cost-prohibitive. While
Sanger sequencing is still the optimal method in terms of time and cost for
sequencing short targets and pyrosequencing is ideal for probing DNA methylation variants, NGS technologies have several advantages when many loci
need to be sequenced for each sample or for sequencing complete genomes or
chromosomes.
Funding must be made available in the form of grants or included in
allocated annual budgets to facilitate access to NGS, either in the form of
new a local capability or for sample out-processing. The US DNA Capacity
Enhancement and Backlog Reduction Program which funds grants totaling
$82 million a year has been approved for the purchase of laboratory equipment, and reagents as well as training for systems that are approved for use
with NDIS; grants can help reduce direct costs to labs seeking to implement
new technology such as NGS (Verogen media release).
Even if funding is granted through a special program or allocated by states
in the annual budget (Funding information for U.S. Forensic Laboratories),
labs still must decide which NGS platform to adopt and validate the new
kits and instruments at their labs. Labs need to develop and validate SOPs
and workflows to process samples and store the high resolution and large
sequence datasets (Gilchrist et al. 2015, Aly et al. 2015, Clarke et al. 2017,
Mannhalter 2017). As some of the kits are sequencer-specific, the lab will need
to decide upon the kit and sequencing instrument. The commercial NGS kits
that have been introduced and are approved for input into CODIS are reliable
and robust. The Promega PowerSeq 46GY system is sequenced on a MiSeq
and the Verogen ForenSeq Signature Prep kit amplicons are sequenced using
Conclusions and Future Outlook
149
the MiSeq FGx. The Applied Biosystems Precision ID system amplicons can
be sequenced using a ThermoFisher Ion series instrument. Qiagen’s kits are
compatible with the MiSeq. After sequencing is complete, the data analysis begins. Verogen sells its Universal Analysis Software (UAS) for analyzing
STR and SNP data and a different version for analyzing mitochondrial DNA
typing data. ThermoFisher’s Torrent Suite and Converge software can be
used to analyze data from its Ion series instruments. Qiagen’s CLC Genomics
Workbench can analyze and visualize data from all major NGS platforms.
Versions are available for Windows, Mac OS X, and Linux platforms. Methods
still in development will have to demonstrate that they are also sensitive, specific, reliable, and robust through developmental validation.
Many of the commercial software packages developed for NGS data
analysis are hosted on the cloud which could be susceptible to outages and
cyberattacks. Verogen’s UAS and Illumina’s BaseSpace applications are
cloud-based software that can be accessed from any computer using a virtual
private network (VPN) client and a lab can make unlimited accounts for its
users. ThermoFisher’s Torrent Suite Software can be accessed via the local
area network.
Another challenge is the quantity of data produced. NGS generates a huge
quantity of data. The data output from the MiSeq is approximately 1 GB per
sequencing run. Labs must consider data storage options including external
hard drives, cloud storage, and internal server storage when adopting NGS
technology. Whereas forensic labs routinely maintain paper backup files of
CE and quantitative data with NGS, it is no longer feasible to print hard copies of all of the data. While the actual sequence data files are not large in
storage size, the raw digital photographs recorded after each base is added
are cumulatively substantial in size. For example, the server shipped with the
Verogen UAS can store approximately one hundred sequencing runs. A lab
could choose to save only the original raw data and final analyses as intermediate data interpretation files can be reconstructed, as needed, using the software. Labs will need to establish which data to save and back-up and whether
off-site services will be acceptable. Adoption of cloud storage is an option for
storing a copy of all of the data that will be collected or in-house servers can
be purchased and maintained to house the data. All of these options require
additional infrastructure and funding, and supporting an in-house server
may require additional IT support. Labs will also face the question of which
sequencing output files need to be stored indefinitely.
NGS reporting and implementation guidelines have been released and
continue to be rolled out. A new version of SWGDAM was released on January
12, 2017, that included NGS in the Internal Validation Guidelines for the first
time (SWGDAM). On April 23, 2019, an NGS Addendum to the SWGDAM
Autosomal Interpretation Guidelines included background information, core
150
Next Generation Sequencing in Forensic Science
elements, interpretation guidelines, mixture interpretation guidelines, and a
comparison of references and statistical weight (SWGDAM). The application
of NGS is included in the FBI Quality Assurance Guidelines and Standards
that became effective July 1, 2020 (FBI).
NGS is extremely sensitive. This is a great strength of NGS but also can
lead to mixture profiles from samples contaminated with environmental
DNA. Since NGS is much more sensitive than previous kits and methods, the
quality assurance program needs to ensure that all products the lab uses from
tips to tubes and other consumables are DNA-free otherwise a plant worker’s non-relevant DNA could be typed. Scientists must be able to differentiate between mutation and error, especially in mixture samples. Error reads
typically occur infrequently while true alleles will result in tens of reads. The
minimum number of reads under various conditions (e.g., mixtures) needs
to be established (e.g., 10 or 50 or 100 reads for low-level contributor) for each
locus and sample. Labs need to allocate more analysis time for mixtures.
NGS poses challenges in implementation. Implementing a technology
such as NGS requires training of existing staff and validation of the new
technology, reagents, kits, and writing new or amended SOPs (Budowle
et al. 2014). While the DNA extraction and quantitation instrumentation are
largely transferable, staff will need to be trained in NGS technology. While
most graduates of Forensic Science Education Programs Accreditation
Commission (FEPAC)-accredited programs are well-versed in STR DNA typing methods using CE, as of early 2021, they likely have not had training in
NGS. Skilled and experienced analysts will need advanced training courses.
They should be reassured that the commercial library preparation kits and
sequencing manuals for forensic applications are easy to follow and that their
skills are easily transferrable to performing the new protocols. Verogen and
ThermoFisher offer training to labs who purchase their instruments and
adopt their kits. Colleges and universities are conducting NGS research and
adding NGS courses. FEPAC-accredited institutions including Pennsylvania
State University, Sam Houston State University, and Towson University are
training new scientists in NGS applications for forensic science and developing new NGS-related forensic biology methods. In 2019, Towson University
added undergraduate and Masters-level courses in forensic science (FRSC
422 and FRSC 622, respectively) focused on NGS for both autosomal and
mitochondrial DNA analysis (Elkins and Zeller 2020). Other institutions
offer online NGS courses.
On May 2, 2019, the US FBI approved profiles generated using Verogen’s
MiSeq FGx Forensic Genomics System for upload to the National DNA Index
System (NDIS). With support from a contractor, Ohio participated in a pilot
study of the ForenSeq kit. Washington, DC and California forensic labs have
adopted ForenSeq in their labs. To introduce NGS to the Washington, DC
Conclusions and Future Outlook
151
Department of Forensic Sciences, Dr. Jenifer Smith utilized a contractor for
implementation support to mitigate the burden on her staff. The Baltimore
City Police Department laboratory obtained an Illumina MiSeq instrument
with grant support and is validating ForenSeq for casework.
While processing and preparing samples for NGS requires a similar
amount of time as CE, the library preparation steps are more time-consuming
and the sequencing step takes much longer. Whereas STR fragment analysis
on CE takes approximately twenty minutes a sample, and several capillaries are routinely run in parallel, an NGS run with a MiSeq FGx instrument
must run to completion to obtain the data for the samples so while ninety-six
samples take a similar amount of time as CE, the time required is standard,
so for one sample it is prohibitive. Furthermore, remote labs are challenged
with continuous power for long NGS sequencing runs (Minogue et al. 2019).
Microbial community profiling of human body fluids, human body site and
geographic locations, and evidence items needs to be accepted by courts.
Technical and biological validation of the various NGS applications will be
required before it can be adopted as a standard tool for use in casework and
acceptance into courts (Kuiper 2016).
Microbial NGS methods especially lack standardization of targets and
analysis approaches including databases for statistical analysis, especially
when unknown and rare taxa are encountered for interpretation using limited published study data. Further development of bioinformatics tools and
processed and referred databases are needed (Minogue et al. 2019). The bioinformatics methods need to be able to map and identify sequence variants
by critically evaluating raw sequence data (Gilchrist et al. 2015). The data
output needs to be presented in an actionable format (Gilchrist et al. 2015).
Other NGS-centric issues include depth of sequencing, higher error rates
than Sanger sequencing, reproducibility, sensitivity, AT-sequence bias, and
large number of targets and markers (Gilchrist et al. 2015, Aly et al. 2015).
More studies of robustness of the method are needed (Gilchrist et al. 2015,
Aly et al. 2015).
Familial DNA searching has begun in jurisdictions that allow it but this
poses privacy concerns. Similarly, there are ethical considerations to consider. The ForenSeq NGS panel contains loci that codes for unique traits, as
opposed to solely “ junk” DNA. GEDmatch was a database founded to help
users use their genetic profiles to locate family members based upon similarities in the genetic makeup. Individuals can upload their DNA profiles from
one of several personal genealogy services. Initially, GEDmatch gave users the
option of opting out of other uses including investigations. Verogen recently
purchased GEDmatch for use in cold case and other criminal investigations.
Now users and potential family member matches must opt into investigative use; otherwise, their samples are protected from these searches and their
152
Next Generation Sequencing in Forensic Science
MICROBIAL
DIVERSITY
SAMPLE COLLECTION METHOD, TIME,
SEQUENCING ISSUES
(BIAS, DEPTH, ERROR, REPRODUCIBILITY)
REFERRED DATABASES OF
STANDARDS
AND STORAGE
NGS
CHALLENGES
DATA STORAGE
STANDARDIZED BIOINFORMATICS
APPROACH AND OUTPUTS
POWER GENERATION AND TIME FOR REMOTE
SEQUENCING APPLICATIONS
ACCEPTANCE BY COURTS
Figure 10.1 Summary of challenges of adopting NGS for forensic investigations.
privacy is maintained. There were concerns when users previously had to opt
out that GEDmatch was turning all users into suspects. Nevertheless, database security breaches are an ongoing concern.
Figure 10.1 summarizes many of the issues that need to be resolved
including allele naming, data storage, statistics, and acceptance by the courts.
In spite of the concerns and challenges, countries around the world are
working to bring NGS to the forensics workflow due to its advantages. NGS
has been used in casework and missing persons investigations.
10.4 Early Successes of NGS in Forensic Cases
Genetic genealogy has been employed in investigations. DNA profiles of
non-offenders from commercial companies including 23andMe, Ancenstry
DNA®, and My Heritage DNA tests are being used in searches to support law
enforcement. NGS data and websites such as GEDmatch and Family Tree
databases as well as traditional history research methods using the United
States Federal Census, state birth indexes, Newspapers.com Obituary Index,
US City Directories, US Obituary Collection, US Social Security records, and
church membership and baptismal records have proved to be valuable in their
approach. As a recent case example, GEDmatch was used by law enforcement
to solve the decades-long cases of the Golden State Killer (Selk 2018).
The ThermoFisher HID-Ion AmpliSeq™ Ancestry Panel was used in a
forensic case involving a carbonized corpse (Hollard et al. 2017). The autosomal STR profile did not lead to a profile so NGS was used to determine the eye
color and biogeographical origin of the deceased. The team also conducted
Y typing and mitotyping. NGS did lead to more information but lack of a
sufficient database for interpretation was a drawback. Xiao (2019) recently
Conclusions and Future Outlook
153
described the design and implementation of a large-scale high-throughput
automated DNA database construction using NGS.
The first case using NGS in a trial in Dutch Courts in January 2019
(de Knijff). The case included a sample from a complex sexual assault with
a minor contributor at less than 10% that of the major contributor in the
STR profile. The STR repeats were reported, but the traditional CE method
does not permit the determination of underlying sequencing information.
The DNA sample was analyzed using traditional PCR and CE methods and
generated a hit in their convicted criminal database. The case resulted in an
acquittal because many of the minor contributor’s alleles were in the stutter
position of the major contributor’s alleles. Upon appeal by the prosecution
and reanalysis of the samples using the MiSeq FGx, the minor contributor
was distinguished from stutter, and likelihood ratio statistics were performed
on the results. Upon hearing the new data and analysis, the judge ruled that
the defendant was guilty of sexual assault (de Knijff 2020).
NGS has also been demonstrated for use in paternity cases. In a study,
DNA isolated from sperm cells of monozygotic twins and blood from one of
the twins’ children was typed using ultra-deep NGS. The researchers used
VarScan 2 to analyze the sequence data for somatic mutations. Individualizing
SNPs were identified in samples originating from the child’s father that were
not found in the father’s twin (Weber-Lehmann et al., 2014).
The France National Police implemented a decision tree for deciding
whether to analyze samples with NGS or traditional CE. In summary, if a
sample is limited or degraded or if a Y-STR profile is needed, NGS using
ForenSeq library preparation is supported (Alvarez-Cubero et al. 2017). If
the DNA profile is urgent (<72 hours), PCR followed by CE is recommended.
Their workflows include mini-STR typing, Y-STR typing, autosomal STR
typing, and phenotypic SNP typing using CE and mitotyping using Sanger
sequencing. They suggest NGS use in complex kinship cases, to identify a
very minor contributor, to obtain a genetic profile from highly degraded
DNA, and to deconvolute a mixture using possible isomutations. The French
National Police used NGS in 2018 to analyze a 2011 cold case. The first analysis was performed using Identifiler and the two DNA extracts showed a mixture of the victim’s DNA and that of a very minor male contributor. Using
NGS, the profile at D2S1338 was determined to contain two different seventeen repeat alleles (isoalleles) which led to assignment of the minor profile.
To date, only a few cases processed using NGS have been presented in
court; each jurisdiction will have to assess allowing the introduction of data
produced with the new methods in accordance with the law. Forensic laboratories may utilize NGS for serious crimes and cold cases in the future
although caseloads may preclude using DNA typing for all cases.
154
Next Generation Sequencing in Forensic Science
10.5 Summary
In summary, John Butler wrote in 2005 that “the future is always challenging due to unforeseen innovation.” While NGS continues to challenge scientists and labs, NGS is here and providing new opportunities for analysis
to solve crimes. The developmental validation of NGS for forensic applications has been published in peer-reviewed journals. NGS is being applied
successfully to criminal cases, mass disaster, and missing persons forensic
casework.
Questions
1. List five advantages of NGS over CE-based DNA typing methods for
forensic applications.
2. Compare and contrast the advantages and challenges of implementing NGS in place of traditional DNA typing methods.
3. Is it easier for a lab to move to NGS-based DNA typing when the
sample is limited or plentiful? Explain.
4. Is the time for NGS “now” or not? Justify your response with
references.
5. Is there a “gold standard” for NGS? Explain why or why not.
6. List and explain five issues that forensic labs face in implementing
NGS.
7. What is the biggest challenge facing labs considering implementing
NGS? Support your answer.
8. Discuss ethical concerns regarding the use of DNA data.
9. Are there risks associated with including human sequencing data in
databases for law enforcement use? Explain.
10. Explain how NGS can be used to solve forensic cases that were intractable with traditional STR typing approaches.
References
Alonso, A., Muller, P., Roewer, L., Willuweit, S., Budowle, B., and W. Parson.
“European survey on forensic applications of massively parallel sequencing.”
Forensic Science International: Genetics 29 (March 2017): e23–e25. doi:10.1016/j.
fsigen.2017.04.017.
Alvarez-Cubero, M.J., Saiz, M., Martínez-García, B., Sayalero, S.M., Entrala, C.,
Lorente, J.A., and L.J. Martinez-Gonzalez. “Next generation sequencing: an
application in forensic sciences?” Annals of Human Biology 44, no. 7 (November
2017): 581–592. doi:10.1080/03014460.2017.1375155.
Conclusions and Future Outlook
155
Aly, S.M., and D.M. Sabri. “Next generation sequencing (NGS): A golden tool in forensic toolkit.” Archiwum Medycyny Sądowej i Kryminologii [Archives of Forensic
Medicine and Criminology] 65, no. 4 (2015): 260–271. doi:10.5114/amsik.2015.
61029.
Arenas, M., Pereira, F., Oliveira, M., Pinto, N., Lopes, A.M., Gomes, A., Carracedo,
A., and A. Amorim. “Forensic genetics and genomics: Much more than just a
human affair.” PLoS Genetics 13 (September 21, 2017): e1006960. doi:10.1371/
journal.pgen.1006960.
Berglund, E.C., Kiialainen, A., and A.-C. Syvänen. “Next-generation sequencing technologies and applications for human genetic history and forensics.” Investigative
Genetics 2, no. 1 (November 24, 2011): 23. doi:10.1186/2041-2223-2-23.
Børsting, C., and N. Morling. “Next generation sequencing and its applications
in forensic genetics.” Forensic Science International: Genetics 18 (September
2015): 78–89. doi:10.1016/j.fsigen.2015.02.002.
Budowle, B., Connell, N.D., Bielecka-Oder, A., Colwell, R.R., Corbett, C.R., Fletcher,
J., Forsman, M., Kadavy, D.R., Markotic, A., Morse, S.A., Murch, R.S., Sajantila,
A., Schmedes, S.E., Ternus, K.L., Turner, S.D., and S. Minot. “Validation of high
throughput sequencing and microbial forensics applications.” Investigative
Genetics 5 (June 30, 2014): 9. doi:10.1186/2041-2223-5-9.
Butler, J.M. “The future of forensic DNA analysis.” Philosophical Transactions of
the Royal Society B 370, no. 1674 (August 5, 2015): 20140252. doi:10.1098/rstb.
2014.0252.
Celik, I., Gultekin, V., Allmer, J., Doganlar, S., and A. Frary. “Development of
genomic simple sequence repeat markers in opium poppy by next-generation
sequencing.” Molecular Breeding 34 (February 6, 2014): 323–334. doi:10.1007/
s11032-014-0036-0.
Clarke, T.H., Gomez, A., Singh, H., Nelson, K.E., and L.M. Brinkac. “Integrating the
microbiome as a resource in the forensics toolkit.” Forensic Science International:
Genetics 30 (September 2017): 141–147. doi:10.1016/j.fsigen.2017.06.008.
Collins, F.S. “The human genome project: Lessons from large-scale biology.” Science
300, no. 5617 (April 11, 2003): 286–290. doi:10.1126/science.1084564.
Daunay, A., Baudrin, L.G., Deleuze, J.F., and A. How-Kit. “Evaluation of six
blood-based age prediction models using DNA methylation analysis by
pyrosequencing.”ScientificReports9,no.1,(June20,2019):8862.doi:10.1038/s41598019-45197-w.
de Knijff, P. “From next generation sequencing to now generation sequencing in
forensics.” Forensic Science International: Genetics 38 (January 1, 2019): P175–
P180. doi:10.1016/j.fsigen.2018.10.017.
de Knijff, P. “Case study: How next generation sequencing resolved a difficult case,
leading to the first criminal conviction of its kind.” Verogen 2020: 1–4. Accessed
November 27, 2020. https://cdn2.hubspot.net/hubfs/6058606/Verogen-FirstNGS-Court-Case-Study_Final_VD2019024 _8.5x11-web.pdf ?_ _hstc=
238 6 0 9 695.b e d74 b 81c f4 0 41e 4 2 ad ad16 833a b 858 4 .1576 87070 4 8 8 8 .
1576870704888.1576870704888.1&__hssc=238609695.1.1576870704888.
Elkins, K.M., and C.B. Zeller. “What is the CURE for limited DNA? A forensic science course focused on NGS.” Journal of Forensic Science Education 2, no. 2
(2020). https://jfse-ojs-tamu.tdl.org/jfse/index.php/jfse/article/view/31.
156
Next Generation Sequencing in Forensic Science
FBI. “Quality assurance standards for forensic DNA testing laboratories.” July 1,
2020. https://r.search.yahoo.com/_ylt=AwrJ7Fu6cxBge58AkaVXNyoA;_ylu=
Y29sbwNiZjEEcG9zAzQEdnRpZANBMDYxNV8xBHNlYwNzcg--/RV=2/
RE=1611719739/RO=10/RU=https%3a%2f%2fwww.fbi.gov%2ffile-repository
%2fquality-assurance-standards-for-forensic-dna-testing-laboratories.
pdf%2fview/RK=2/RS=ZY.1vT5BZwWmyh4tC9UI4j9Nwpw-.
“FBI approves Verogen’s next-gen forensic DNA technology for National DNA Index
System (NDIS).” May 2, 2019. https://verogen.com/ndis-approval-of-miseq-fgx/.
“Funding information for U.S. Forensic Laboratories.” March 27, 2019. Funding
Information for U.S. Forensic Laboratories.
Gilchrist, C.A., Turner, S.D., Riley, M.F., Petri, W.A., and E.L. Hewlett. “Wholegenome sequencing in outbreak analysis.” Clinical Microbiology Reviews 28,
no. 3 (July 2015): 541–563. doi:10.1128/CMR.00075-13.
Hollard, C., Keyser, C., Delabarde, T., Gonzalez, A., Vilela Lamego, C., Zvénigorosky,
V., and B. Ludes. “Case report: on the use of the HID-Ion AmpliSeq™ Ancestry
Panel in a real forensic case.” International Journal of Legal Medicine 131, no. 2
(March 2017): 351–358. doi:10.1007/s00414-016-1425-1.
Houston, R., Mayes, C., King, J.L., Hughes-Stamm, S., and D. Gangitano. “Massively
parallel sequencing of 12 autosomal STRs in Cannabis sativa.” Electrophoresis
39, no. 22 (November 2018): 2906–2911. doi:10.1002/elps.201800152.
Jäger, A.C., Alvarez, M.L., Davis, C.P., Guzmán, E., Han, Y., Way, L., Walichiewicz,
P., Silva, D., Pham, N., Caves, G., Bruand, J., Schlesinger, F., Pond, S.J.K.,
Varlaro, J., Stephens, K.M., and C.L. Holt. “Developmental validation of the
MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories.” Forensic Science
International: Genetics 28 (May 2017): 52–70. doi:10.1016/j.fsigen.2017.01.011.
Kircher, M., and J. Kelso. “High-throughput DNA sequencing-concepts and limitations.” BioEssays 2, no. 6 (May 18, 2010): 524–536. doi:10.1002/bies.200900181.
Kuiper, I. “Microbial forensics: next-generation sequencing as catalyst: The use of
new sequencing technologies to analyze whole microbial communities could
become a powerful tool for forensic and criminal investigations.” EMBO
Reports 17 (2016): 1085–1087. doi:10.15252/embr.201642794.
Mannhalter, C. “Neue entwicklungen in der molekular biologischen diagnostik
[German].” Hamostaseologie 37, no. 2 (May 2017): 138–151.
Minogue, T.D., Koehler, J.W., Stefan, C.P., and T.A. Conrad. “Next-generation
sequencing for biodefense: Biothreat detection, forensics, and the clinic.” Clinical
Chemistry 65, no. 3 (March 1, 2019): 383–392. doi:10.1373/clinchem.2016.266536.
Oh, H., Seo, B., Lee, S., Ahn, D.-H., Jo, E., Park, J.-K., and G.-S. Min. “Two complete
chloroplast genome sequences of Cannabis sativa varieties.” Mitochondrial DNA
Part A 27, no. 4 (June 24, 2015): 2835–2837. doi:10.3109/19401736.2015.1053117.
Patrick, K.L. “454 life sciences: Illuminating the future of genome sequencing
and personalized medicine.” Yale Journal of Biology and Medicine 80, no. 4
(December 2007): 191–194.
Schmedes, S.E., Sajantila, A., and B. Budowle. “Expansion of microbial forensics.”
Journal of Clinical Microbiology 54 (2016): 1964–1974. doi:10.1128/JCM.00046-16.
Selk, A. “The ingenious and ‘ dystopian’ DNA technique police used to hunt the ‘Golden
State Killer’ suspect.” The Washington Post 2018. https://www.washington
Conclusions and Future Outlook
157
post.com/news/true-crime/wp/2018/04/27/golden-state-killer-dna-websitegedmatch-was-used-to-identify-joseph-deangelo-as-suspect-police-say/.
Silva, D.S.B.S., Antunes, J., Balamurugan, K., Duncan, G., Alho, C.S., and B. McCord.
“Evaluation of DNA methylation markers and their potential to predict human
aging.” Electrophoresis 36, no. 15 (August 2015): 1775–1780. doi:10.1002/elps.
201500137.
Scientific Working Group on DNA Analysis Methods. “Interpretation guidelines
for autosomal STR typing by forensic DNA testing laboratories.” Approved
January 12, 2017. Accessed January 23, 2021. https://1ecb9588-ea6f-4feb-971a73265dbf079c.filesusr.com/ugd/4344b0_50e2749756a242528e6285a5bb478
f4c.pdf.
Scientific Working Group on DNA Analysis Methods. “Addendum to ‘SWGDAM
interpretation guidelines for autosomal STR typing by forensic DNA testing
laboratories’ to address next generation sequencing.” Approved April 23, 2019.
Accessed January 23, 2021. https://1ecb9588-ea6f-4feb-971a-73265dbf079c.
filesusr.com/ugd/4344b0_91f2b89538844575a9f51867def7be85.pdf.
Sjödin, A., Broman, T., Melefors, O., Andersson, G., Rasmusson, B., Knutsson, R.,
and M. Forsman. “The need for high-quality whole-genome sequence databases in microbial forensics.” Biosecurity and Bioterrorism 11 (2012): S78–S86.
doi:10.1089/bsp.2013.0007.
Weber-Lehmann, J., Schilling, E., Gradl, G., Richter, D.C., Wiehler, J., and B.
Rolf. “Finding the needle in the haystack: Differentiating ‘ identical’ twins in
paternity testing and forensics by ultra-deep next generation sequencing.”
Forensic Science International Genetics 9 (March 2014): 42–46. doi:10.1016/j.
fsigen.2013.10.015.
Xiao, L. “Designing and implementing a large-scale high-throughput Total
Laboratory Automation (TLA) system for DNA database construction.”
Forensic Science International 302 (September 2019): 109859. doi:10.1016/j.
forsciint.2019.06.017.
Yang, Y., Xie, B., and J. Yan. “Application of next-generation sequencing technology
in forensic science.” Genomics, Proteomics & Bioinformatics 12, no. 5 (October
2014): 190–197. doi:10.1016/j.gpb.2014.09.001.
Young, B., King, J.L., Budowle, B., and L. Armogida. “A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis.”
PLoS One 12, no. 5 (May 18, 2017): e0178005. doi:10.1371/journal.pone.0178005.
Index
A
adenosine monophosphate (AMP) 22
AFDIL-QIAGEN mtDNA Expert (AQME) 106
AFDIL whole mitochondrial genome
method 101
agarose gel 39, 40
age estimation 147
ALlele FREquency Database (ALFRED) 80
alleles 65–68, 72–73
for Y-STRs analyzed 80–82
analytical threshold (AT) 62
ancestry 8, 68, 76
Anderson sequence 96
Armed Forces DNA Identification
Laboratory (AFDIL) 98
ArmedXpert™, 78
autopsy 125–126
Autosomal STR Genotype Report 68
B
Ballantyne, J. 68, 95, 139, 140, 141
bead-based normalization (BBN) 100
biogeographical ancestry (BGA) 68, 76
biothreat surveillance 127
body fluid analysis
epigenetics 137–140
future of 141
microRNA 140–141
mRNA 139, 140
traditional methods 137
Børsting, C. 28, 32, 36, 139, 145, 147
Bowtie 77, 128
Budowle, B. 33, 79, 95, 97, 107, 108, 117, 119,
124, 125, 127, 128, 145, 147, 148, 150
Butler, J. 3, 4, 18, 20, 32, 96, 97, 98, 102, 154
C
Cambridge Reference Sequence (CRS) 96
capillary electrophoresis (CE) 6, 20
alleles for Y-STRs analyzed 80–82
chain termination sequencing 13–14
cluster density 57–58
cluster generation 50
Combined DNA Index System (CODIS) 5,
108, 145
computed population statistics 67
Converge and Universal Analysis Software
systems 146
cytosine methylation 138
D
databases 69, 108–109; see also specific types
dbSNP 80
2'-deoxyribonucleotide triphosphate
(dNTP) 14, 16
2', 3'-dideoxyribonucleotide triphosphate
(ddNTP) 14, 20, 21
DNA extraction 32–33
DNA IQ™, 33
DNA quantitation 34–35
DNA sequencing
chemistries used in 13
chain termination sequencing 13–14
by ligation 16, 18
pyrosequencing 14–17
detection techniques 17, 18
for human identification 1
massively parallel sequencing 23–25
platforms 19–22
DNA typing, for human identification 2–7
double-stranded complex 50
E
Elkins, K.M. 4, 18, 32, 34, 98, 118, 128,
139, 150
epigenetics 137–139
Erasmus server, phenotype analysis using
74–77
European Standard Set (ESS) 5
Exome-Seq analysis 78
159
160
F
Fastq files 58, 78
Federal Bureau of Investigation (FBI) 108
first-generation sequencing techniques
pyrosequencing 21–22
Sanger sequencing 19–20
SNaPShot sequencing 20–21
fluorescence spectroscopy 34
fluorescent dye 17, 18, 51
fluorimetric quantification-based
normalization 100
ForenSeq™, 8, 39–41, 60–63
ForenSeq mtDNA Control Region 98–99, 102
forensic biology 1, 2
France National Police 153
G
gene expression 137–138
GeneMapperID 21
genetic genealogy 152
Genome Analysis Toolkit (GATK) 78
genomics 25
geolocation 125–126
H
haplotype 102
Helicobacter pylori 129
Helicos BioSciences Heliscope 25
heteroplasmy 102
HID_SNP_Genotyper plugin 70–71
HID_STR_Genotyper plugin 71
HipSTR (Haplotype inference and phasing
for Short Tandem Repeats) 79
HIrisPlex-S assay 74
HiSeq 23, 47
Holland, M.M. 95, 97, 98, 99, 101, 102, 107, 108
human genome sequence 1
Human Microbiome Project (HMP) 117,
118, 120–121
applications 121–124
human sequencing control (HSC) 41, 49,
62–63
I
i5 and i7 index sequences 38
integrative genomics viewer (IGV) 71, 106
interpretation threshold (IT) 62
ion detection 19
Index
platforms 23, 24
Ion PGM Sequencing 53–54
Ion Reporter Server System 69
ion series instruments 146–147
ion series run failure, troubleshooting 92, 93
Ion Sphere™ Particle (ISP) Density 70
ion sphere particles (ISPs) 42
K
Kayser, M.H. 32, 68, 81, 95, 139, 141
Kidd, K. 35, 68, 95
de Knijff, P. 8, 146, 153
L
Lander, E. 138
Lednev, I. 137
ligation
DNA sequencing by 16, 18
platforms 24
luciferase 22
M
maintenance wash 52
Maq 77
massively parallel sequencing (MPS) 8, 23,
141; see also NGS
ion detection platforms 23, 24
by ligation platforms 24
for mitotyping in forensic testing 107
Reversible Chain Termination MPS
Platforms 23, 24
single base extension platforms 25
third-generation platforms 25–27
McCord, B. 108, 138, 139
messenger RNA (mRNA) 139, 140
5'-methylcytosine 138
microbial DNA profiling
applications 129–130
in archeology 129
autopsy 125–126
bioforensics and biosurveillance 127–128
bioinformatic approaches and tools
126–127
geolocation 125–126
infectious disease diagnostics 128–129
lifestyle analysis 125–126
NGS methodology in 119–120
postmortem interval 125–126
sampling and processing 118–119
Index
microbiome analysis 117
microRNA (miRNA) 140–141
MiSeq FGx 58–59, 146–147
instrument failure 87–89, 93
run failure 89–93
MiSeq Test Software 88
mitochondrial chromosome 1, 96–98
mitochondrial DNA (mtDNA) typing 95
for forensic applications 107–108
methods 98
sequence data
and databases 108–109
interpretation and reporting 102–107
using next generation sequencing 98–101
MITOMAP database (MITOMAP) 97
Mixture Ace 78
modern six-dye kit 9
MPS see massively parallel sequencing
(MPS)
My-Forensic-Loci-queries (MyFLq) 106–107
N
National DNA Index System (NDIS) 5,
87, 108
next generation sequencing (NGS)
alleles for Y-STRs analyzed 79–82
body fluid analysis 137–141
challenges 147–152
data analysis 57–58, 79–80
denaturation 41–42
DNA extraction 32–33
DNA quantitation 34–35
early successes of 152–153
for forensic DNA typing 8–10
instruments 25, 28
Ion PGM Sequencing 53–54
library preparation 35–39
library purification and normalization
39–41
microbial applications of 117–130
mitochondrial DNA typing 95–109
for mixture interpretation 78–79
multiplexing 41–42
sample handling and processing 31
sample preparation process 31, 32
sequence analysis software 77–78
ThermoFisher Ion Torrent™ Sequencing
53–54
troubleshooting 87–93
validation and applications 80–81
Verogen MiSeq FGx® Sequencing 47–53
161
O
Organizational Scientific Area Committees
(OSACs) 8
Oxford Nanopore instruments 25
P
Parson ISFG format 78
Parson, W. 79, 95, 98, 99, 101, 102, 107,
108, 147
Perkin-Elmer Corporation 4
phasing 57
phenol-chloroform-isoamyl alcohol (PCI/
PCIA) 32
phenotype 8
characteristics 3
estimation 68
tertiary analysis 68
using erasmus server 74–77
Phred score 58
polyacrylamide gel 39
polymerase chain reaction (PCR) 4
polymorphisms 2
postmortem interval (PMI) 125–126
post-run wash 52
PowerPlex™ 5
Precision ID mtDNA Control Region
Panel kit 101, 106
PredictSNP 80
prephasing 57
Promega Corporation 4
Promega PowerSeq 78
Promega PowerSeq™ CRM Nested
System 108
Promega PowerSeq™ 46GY kit 35, 36, 148
pyrosequencing 138–139
detection techniques 19
dispensation of nucleotides in 16
DNA sequencing 14–17
first-generation sequencing techniques
21–22
Q
QIAcel graph, of PCR amplicon for 41
Qiagen’s latest PyroMark Q48
pyrosequencer 22
Q-Score 58
QualitySNPng 80
Quantifiler™ DUO kit 34
Quantiplex HYres Kit 34
162
R
random match probability (RMP) 67
real-time PCR methods 34
research use only (RUO) 47
restriction fragment length polymorphisms
(RFLPs) 3
reversible chain termination MPS platforms
23, 24
revised CRS (rCRS) 96
RFLPs see restriction fragment length
polymorphisms (RFLPs)
RNAseq analysis 78
S
Sample Compare mode 106
Sample Genotype Report 68
SAMtools mpileup 78
Sanger sequencing 15, 19–20
Scientific Working Group on DNA Analysis
Methods (SWGDAM) 8, 108,
149–150
Scientific Working Groups (SWGs) 8
semiconductor sequencing 24
sequence alignment map (SAM) file 77
sequence diversity databases 79
sequencing by synthesis (SBS) 50
severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) 125
sex typing 4, 139
short tandem repeats (STRs) 4, 6, 69,
79–80, 146
simple sequence repeats (SSRs) 4
single base extension platforms 25
Single-Molecule Real-Time (SMRT) 25
single nucleotide polymorphisms (SNPs) 6
single-stranded DNA (ssDNA) 22
small nuclear RNA (snRNA) 141
SNaPShot sequencing 20–21
SNiPlay 80
SNPdetector 80
SNPedia 80
SNPServer 80
snRNA see small nuclear RNA (snRNA)
spectroscopic methods 34
ssDNA see single-stranded DNA (ssDNA)
SSRs see simple sequence repeats (SSRs)
standard reference materials (SRM)
98, 99
standby wash 52–53
Index
STRbase 2.0 Beta 79
STRs see short tandem repeats (STRs)
STRScan 79
STRSeq 79
T
ThermoFisher 101
ThermoFisher Converge Software 69–74
ThermoFisher GlobalFiler™ 5
ThermoFisher HID-Ion AmpliSeq™
Ancestry Panel 152–153
ThermoFisher Ion Chef ™ robot 39
ThermoFisher Ion Reporter™ Software 69–74
ThermoFisher Ion Torrent™ Sequencing
53–54
third-generation platforms 25–27
tissue-specific differentially methylated
regions (tDMRs) 138
triplasmy 102
troubleshooting
ion series run failure 92, 93
MiSeq FGx instrument failure 87–89
MiSeq FGx run failure 89–93
NGS sequencing 87
U
ultraviolet light source 51
V
variable number of tandem repeats
(VNTRs) 3, 4
Variant Analyzer BaseSpace app 105
Variant Processor app 103
Verogen ForenSeq™ Signature Prep kit 36
Verogen MiSeq FGx® 23, 24, 47–53
Verogen Universal Analysis Software 49,
58–69, 103
W
WebLogo 80
whole genome amplification (WGA) 99, 119
whole genome shotgun (WGS)
metagenomics 119
Z
Zeller, C.B. 150