Metadata Harvesting via OAI-PMH

  • 1. Metadata Harvesting and the OAI-PMH (Andrew Schenck and Pamela Russell, LIS 688)
  • 2. What is Metadata Harvesting?
      • An automatic method of generating metadata
      • Occurs when metadata is automatically collected from META tags
      • Automatically gathers metadata from individual repositories
  • 3. Example Metadata Generators
      • Metadata generators are also known as metadata extraction systems
      • Sample metadata extraction systems available for libraries include DC-dot, MarcEdit, Metaextract, and IBM Magic System
      • Some are available as open source
  • 4. DC-dot
      • DC-dot is open source and can be redistributed or modified
      • Creates Dublin Core metadata
      • Metadata creation is initiated by submitting a URL
      • Generates keywords by analyzing hyperlinked concepts and presentation encoding
      • Does not automatically generate description metadata
      • Generates type, format, and date metadata
  • 5. MarcEdit
      • MarcEdit is open source
      • Initially conceived as a graphical user interface for batch MARC editing
      • Now an application suite of metadata editing tools that includes character set conversion, XML crosswalking, and metadata harvesting
      • Allows users to:
          • Customize existing data conversion rules or create new ones
          • Harvest metadata from a supported metadata format
          • Create conversion templates for additional metadata formats
          • Customize existing conversion templates to reflect variations in best practices among projects
  • 6. Metaextract
      • Designed for metadata extraction in the domain of K-12 math and science education
      • Extracts Dublin Core and Gateway to Educational Materials (GEM) metadata at both the item and collection levels
      • Collection-level metadata is generated from a collection-specific configuration
      • Item-level metadata is extracted from the content of educational documents using three extraction modules: eQuery, HTML-based modules, and a keyword generator module
  • 7. IBM Magic System
      • Includes various content analytic modules for metadata generation:
          • Audiovisual analysis modules that recognize semantic sound categories
          • Text analysis modules that extract title, keywords, and summary from text documents
      • Facilitates content reuse and repurposing
      • Improves interoperability
      • Enables more timely registration of content
  • 8. Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
      • Version 2.0 released in June 2002
      • Provides an application-independent interoperability framework based on metadata harvesting
      • Two classes of participants in the OAI-PMH:
          • Data providers: administer systems that support the OAI-PMH as a means of exposing metadata
          • Service providers: use harvested metadata to build their digital collections
  • 9. OAI-PMH Key Terms
      • Harvester: operated by a service provider to collect metadata from a repository
      • Repository: a network-accessible server that can process OAI-PMH requests; managed by the data provider to give harvesters access to its metadata
  • 10. Harvesting Problems
      • Lack of consistency: different collections use different DC elements and controlled vocabularies
      • Missing data: a repository may decline to fill out elements
      • Incorrect data: data placed in the wrong element
      • Confusing data: strings of names ordered inconsistently or ambiguously separated with commas instead of semicolons
      • Insufficient data
  • 11. Recommendations for Improving Harvesting
      • Establish guidelines and best practices
      • Develop local standards
      • Evaluate metadata:
          • Check whether certain elements hold local metadata that would not be useful in an aggregated environment
          • Check whether any fields are populated with "unknown" or "N/A"
      • Communicate with the service provider
  • 12. Conclusion
      • Evidence suggests that OAI-PMH has been a successful endeavor:
          • The number of repositories has increased
          • Many funded projects are based on OAI, e.g., eprints.org, the Mellon Foundation's Metadata Harvesting Initiative, and the NSF National Science Digital Library (NSDL)
      • The importance of metadata is one of the reasons the Open Archives Initiative created the Protocol for Metadata Harvesting
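
Slides 2 and 4 describe harvesting metadata from the META tags of an HTML page, which is how a tool like DC-dot starts. The sketch below is not DC-dot's code; it is a minimal illustration using Python's standard `html.parser` that collects `<meta name="..." content="...">` pairs plus the `<title>` text. The class `MetaTagParser`, the helper `extract_meta_tags`, and the sample page are hypothetical names and data chosen for illustration.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # lower-cased META name -> list of content values
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)  # attrs is a list of (name, value) pairs
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta.setdefault(name.lower(), []).append(content.strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta_tags(html_text):
    """Hypothetical helper: return harvested fields as {name: [values]}."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    record = dict(parser.meta)
    if parser.title.strip():
        record["title"] = [parser.title.strip()]
    return record


page = """<html><head>
<title>Metadata Harvesting and the OAI-PMH</title>
<meta name="keywords" content="metadata, harvesting, OAI-PMH">
<meta name="DC.creator" content="Schenck, Andrew">
</head><body>...</body></html>"""

print(extract_meta_tags(page))
```

This only copies what the page author already encoded; generating keywords from presentation encoding (bolding, font size), as DC-dot does, would require analyzing the body as well.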

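Slides 8 and 9 describe a harvester collecting metadata from a repository. In OAI-PMH 2.0 the harvester sends HTTP requests whose `verb` argument names the operation (for example `ListRecords`), requests unqualified Dublin Core with `metadataPrefix=oai_dc`, and keeps following `resumptionToken` values until a response no longer supplies one. The sketch below illustrates that loop with the Python standard library; the function names are hypothetical, and error handling (OAI-PMH `error` elements, HTTP 503 responses with `Retry-After`) is deliberately omitted.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and the oai_dc format
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_request(base_url, resumption_token=None):
    """Build a ListRecords request URL for unqualified Dublin Core."""
    if resumption_token:
        # When resuming, resumptionToken is the only argument besides verb
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_bytes):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_bytes)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)  # None for deleted records
        fields = {}
        if dc is not None:
            for element in dc:
                # element.tag looks like "{http://purl.org/dc/elements/1.1/}title"
                name = element.tag.split("}", 1)[-1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    token = root.findtext("oai:ListRecords/oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)  # empty token marks the last page


def harvest(base_url):
    """Yield every record, following resumption tokens page by page."""
    token = None
    while True:
        with urllib.request.urlopen(build_request(base_url, token)) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if token is None:
            break
```

Calling `harvest()` with a repository's base URL would yield one dict per record. A production harvester would also use the `from` and `until` arguments for incremental harvesting rather than re-fetching everything.
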
Editor's Notes

  1. Metadata harvesting and the Open Archives Initiative Protocol for Metadata Harvesting by Andrew Schenck and Pamela Russell
  2. Metadata harvesting is an automatic method of generating metadata. Harvesting occurs when metadata is automatically collected from META tags in the <head> section of an HTML resource's source code, or from metadata encoded in another resource format. Metadata harvesting automatically gathers metadata from individual repositories, where it may have been produced by either automatic or manual approaches.
  3. As with other automated tasks, many metadata generators are available. These generators, also known as metadata extraction systems, can be extremely helpful for libraries that want to extract metadata from various repositories. Metadata extraction systems available for libraries include DC-dot, MarcEdit, Metaextract, and IBM Magic System. Some of these systems are open source and free, although the people needed to run them usually must be paid. Many of the systems were created to harvest all types of metadata, while some were created to harvest metadata for very specific objects or areas of study.
  4. DC-dot was developed by Andy Powell at UKOLN at the University of Bath. DC-dot is open source, and it can be redistributed or modified under the terms of the GNU General Public License as published by the Free Software Foundation. DC-dot creates Dublin Core metadata and can format its output according to a number of different metadata schemas. In DC-dot, metadata creation is initiated by submitting a URL. The URL from the Web browser's address prompt is copied as the resource identifier, and any title, keywords, description, and type values are then harvested from the resource's META tags. DC-dot automatically generates keywords by analyzing hyperlinked concepts and presentation encoding (bolding and font size), but it does not generate description metadata. DC-dot also automatically generates type, format, and date metadata.
  5. MarcEdit was created by Terry Reese in 1998 and was initially conceived as a graphical user interface for batch MARC editing. Currently, MarcEdit is an application suite of metadata editing tools that includes character set conversion, XML crosswalking, and metadata harvesting. Unlike other metadata extraction systems, MarcEdit allows users to customize the existing data conversion rules or create new ones. This allows users to harvest metadata from a supported metadata format as well as create conversion templates for additional metadata formats. It also allows users to customize existing conversion templates to reflect the many variations in best practices used among projects.
  6. Metaextract is an extraction system designed for metadata extraction in the domain of K-12 math and science education. It was designed to extract Dublin Core and Gateway to Educational Materials (GEM) metadata at both the item and collection levels using natural language processing techniques. Collection-level metadata is generated from a collection-specific configuration, and item-level metadata is extracted from the content of educational documents using three extraction modules: eQuery, HTML-based modules, and a keyword generator module.
  7. IBM Magic System was presented in 2005 and includes various content analytic modules for metadata generation. Audiovisual analysis modules recognize semantic sound categories and identify narrators and informative text segments, while text analysis modules extract the title, keywords, and summary from text documents. The IBM Magic System can facilitate content reuse and repurposing, improve interoperability, and enable more timely registration of content by course developers and authors.
  8. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides an application-independent interoperability framework based on metadata harvesting. There are two classes of participants in the OAI-PMH: data providers and service providers. Data providers administer systems that support the OAI-PMH as a means of exposing their metadata. Service providers use the metadata harvested through the OAI-PMH to help build their digital collections.
  9. Two other key terms needed to understand the OAI-PMH are harvester and repository. A harvester is a client application that issues OAI-PMH requests. It is operated by a service provider as a way to collect metadata from repositories. A repository is a network-accessible server that can process OAI-PMH requests. A repository is managed by a data provider to give harvesters access to its metadata.
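The harvester/repository exchange can be sketched in Python with only the standard library. The harvester sends the OAI-PMH `ListRecords` verb with `metadataPrefix=oai_dc` to a repository's base URL, reads the Dublin Core elements out of each record, and keeps sending the `resumptionToken` the repository returns until the list is complete. This is a minimal illustration, not a production harvester: it assumes simple Dublin Core (`oai_dc`) responses, does no retrying or rate limiting, and `list_records` and `fetch_url` are hypothetical names. The `fetch` parameter exists so that the network call can be swapped out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def fetch_url(url):
    """Default fetcher: perform an HTTP GET and return the response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_records(base_url, fetch=fetch_url):
    """Yield (identifier, {dc_element: [values]}) for each record in a repository.

    Issues ListRecords requests for oai_dc metadata and follows
    resumptionTokens until the repository signals the list is complete.
    """
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        error = root.find(OAI + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty repository is not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for record in root.iter(OAI + "record"):
            identifier = record.find(OAI + "header").findtext(OAI + "identifier")
            dc = {}
            metadata = record.find(OAI + "metadata")
            if metadata is not None:  # deleted records carry no metadata
                for element in metadata.iter():
                    if element.tag.startswith(DC) and element.text:
                        name = element.tag[len(DC):]
                        dc.setdefault(name, []).append(element.text.strip())
            yield identifier, dc
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:
            return  # missing or empty token: this was the last page
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Against a live repository, `for oai_id, dc in list_records(base_url): ...` would page through every record. Making the harvester a generator lets a service provider process records as they arrive rather than holding a whole repository in memory.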
  10. The most common problem with harvested metadata is a lack of consistency. For example, inconsistencies across collections can occur when data providers use certain Dublin Core elements and controlled vocabularies in one collection but not in another. On a larger scale, some data providers use Dublin Core elements in different ways throughout their repository, so similar kinds of metadata end up in different fields when harvested. The metadata harvested through the OAI-PMH has other significant problems as well. Many repositories have missing data in their metadata. For example, if an entire collection consists of materials of the same format or type, the repository may not fill in the Dublin Core “format” or “type” element, because the information is deemed unnecessary for the collection’s local purposes: every item is the same type, so why fill out that field? This causes problems when an OAI-PMH service provider wants to limit searches by format or type, because the repository has left those fields empty. Incorrect data is another problem; examples include creator names repeated in the language element, or the identifier for the metadata record repeated in the Dublin Core identifier element. Misspelled words and stray characters such as dashes or hyphens also count as incorrect data. Harvested metadata can also be confusing: strings of names may be ordered inconsistently or ambiguously separated with commas instead of semicolons. This kind of confusing data can occur when entries are dumped into a metadata record without revision, for example when they are cut and pasted from Web HTML text.
Insufficient data also causes problems, because sparse metadata in a repository is of little use when trying to limit searches and retrieve specific information.
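A service provider could screen harvested records for the problems described above before loading them. The Python sketch below shows one possible audit of a Dublin Core record held as a dict of element names to value lists. The checks are illustrative heuristics, not a standard: the `REQUIRED` elements, the stray-character pattern, and the "more than one comma and no semicolon" test for ambiguous name strings are all assumptions, and `audit_record` is a hypothetical name.

```python
import re

# Hypothetical: elements this service provider needs in order to limit searches.
REQUIRED = ("title", "type", "format")
# A value made up only of dashes, underscores, dots, or whitespace.
STRAY = re.compile(r"^[\s\-_.]*$")


def audit_record(identifier, dc):
    """Return a list of human-readable problems found in one harvested DC record."""
    problems = []
    # Missing data: elements left empty by the repository.
    for element in REQUIRED:
        if not dc.get(element):
            problems.append(f"missing {element}")
    # Incorrect data: creator names repeated in the language element.
    creators = set(dc.get("creator", []))
    for value in dc.get("language", []):
        if value in creators:
            problems.append(f"creator name in language: {value!r}")
    # Incorrect data: the record's own OAI identifier copied into dc:identifier.
    if identifier in dc.get("identifier", []):
        problems.append("record identifier repeated in dc:identifier")
    # Incorrect data: values that are nothing but stray punctuation.
    for element, values in dc.items():
        for value in values:
            if STRAY.match(value):
                problems.append(f"stray characters in {element}: {value!r}")
    # Confusing data (heuristic): several names run together with commas only.
    for value in dc.get("creator", []):
        if value.count(",") > 1 and ";" not in value:
            problems.append(f"ambiguous name string in creator: {value!r}")
    return problems
```

Misspellings are not caught here; detecting them would need a dictionary or authority file, which is beyond the scope of the sketch.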
  11. Recommendations for improving harvesting: Repositories should follow established guidelines and develop local standards. Either use an existing guideline and best practices resource, or develop and document standards that meet your local needs. Evaluate your metadata to determine whether there is some that you do not want or need to share. Check whether certain elements hold local metadata that would not be useful in an aggregated environment. If you find unnecessary elements, unmap those fields before allowing them to be harvested. While checking for necessary and unnecessary fields, also check whether any fields are populated with “unknown” or “N/A”. This should be avoided in an aggregated environment: it is better to leave a field blank than to use “unknown” or “N/A” where harvesters might interpret them as meaningful data. Most importantly, communicate with the service provider who is harvesting your records. Review your metadata and determine whether there are ways to make it cleaner and easier to understand.
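Two of these recommendations, unmapping local-only elements and leaving fields blank rather than filling them with “unknown” or “N/A”, can be applied mechanically when a repository exposes its records. The Python sketch below assumes each record is a dict of Dublin Core element names to value lists. The `LOCAL_ONLY` and `PLACEHOLDERS` sets are hypothetical examples that each repository would replace with its own decisions, and `prepare_for_harvest` is an illustrative name.

```python
# Hypothetical local configuration; each repository decides its own lists.
LOCAL_ONLY = {"relation"}  # e.g. shelf locations useless once aggregated
PLACEHOLDERS = {"unknown", "n/a", "na", "none", "-"}


def prepare_for_harvest(dc, local_only=LOCAL_ONLY, placeholders=PLACEHOLDERS):
    """Return a copy of a DC record with local-only elements unmapped and
    placeholder values removed, leaving those elements blank instead."""
    cleaned = {}
    for element, values in dc.items():
        if element in local_only:
            continue  # unmapped: never exposed to harvesters
        kept = [v.strip() for v in values
                if v.strip() and v.strip().lower() not in placeholders]
        if kept:  # an element with nothing meaningful left is simply omitted
            cleaned[element] = kept
    return cleaned
```

Running a filter like this at export time, rather than editing the local catalog, keeps the repository’s internal records intact while giving harvesters only data they can interpret.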
  12. Although the OAI-PMH is far from perfect, there is ample evidence that it is a successful endeavor. The number of repositories that make their metadata available through the OAI-PMH has grown since its initial release in January 2001. Another way to gauge success is the level of attention it has garnered from funding agencies. Examples of funded projects and programs that promote or are based on the OAI include eprints.org, the Metadata Harvesting Initiative of the Mellon Foundation, and the NSF National Science Digital Library (NSDL). The importance of metadata is one of the reasons the Open Archives Initiative created the Protocol for Metadata Harvesting. Although harvesting is not a perfect process, the protocol has been very successful in helping many libraries of all types, large and small, create and offer Web access to digital collections.