Metadata Harvesting via OAI-PMH

Metadata Harvesting and the OAI-PMH
Andrew Schenck, Pamela Russell
LIS 688

What is Metadata Harvesting?
- An automatic method of generating metadata
- Occurs when metadata is automatically collected from META tags
- Automatically gathers metadata from individual repositories
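To illustrate what collecting metadata from META tags can look like, here is a minimal sketch using only Python's standard library. The sample page, the `MetaTagCollector` class, and the `harvest_meta_tags` function are hypothetical names made up for this example; it is not how any of the tools in these slides are implemented.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            # A page may repeat a name (e.g. several creators), so keep a list.
            self.metadata.setdefault(name.lower(), []).append(content)


def harvest_meta_tags(html_text):
    """Return a dict mapping lowercased META names to lists of content values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata


# Hypothetical sample page, invented for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Sample record</title>
<meta name="DC.title" content="Metadata Harvesting and the OAI-PMH">
<meta name="DC.creator" content="Schenck, Andrew">
<meta name="DC.creator" content="Russell, Pamela">
<meta name="keywords" content="metadata, harvesting">
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, values in harvest_meta_tags(SAMPLE_PAGE).items():
        print(name, values)
```

A real harvester would also need to fetch pages over the network and cope with malformed HTML; those steps are left out of this sketch.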

Example Metadata Generators
- Metadata generators are also known as metadata extraction systems
- Sample metadata extraction systems available for libraries include:
  - DC-dot
  - MarcEdit
  - Metaextract
  - IBM Magic System
- Some are available as open source

DC-dot
- DC-dot is open source and can be redistributed or modified
- Creates Dublin Core metadata
- Metadata creation is initiated by submitting a URL
- Generates keywords by analyzing hyperlinked concepts and presentation encoding
- Does not produce description metadata
- Generates type, format, and date metadata

MarcEdit
- MarcEdit is open source
- Initially conceived as a graphical user interface for batch MARC editing
- An application suite of metadata editing tools that includes character set conversion, XML crosswalking, and metadata harvesting
- It allows users to:
  - Customize the existing data conversion rules or create new ones
  - Harvest metadata from a supported metadata format
  - Create conversion templates for additional metadata formats
  - Customize existing conversion templates to reflect the variations in best practices among projects

Metaextract
- Designed for metadata extraction in the domain of K-12 math and science education
- Also designed to extract Dublin Core and Gateway to Educational Materials metadata at both the item and collection levels
- Collection-level metadata is generated from a collection-specific configuration
- Item-level metadata is extracted from the content of educational documents using three extraction modules:
  - eQuery
  - HTML-based modules
  - Keyword generator module

IBM Magic System
- Includes various content analytic modules for metadata generation:
  - Audiovisual analysis modules that recognize semantic sound categories
  - Text analysis modules that extract title, keywords, and summary from text documents
- Facilitates content reuse and repurposing
- Improves interoperability
- Creates more timely registration of content

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- Version 2.0 released in June 2002
- Provides an application-independent interoperability framework based on metadata harvesting
- Two levels of participants in the OAI-PMH:
  - Data providers: administer the systems (repositories) that expose metadata for harvesting
  - Service providers: use the harvested metadata to build their digital collections

OAI-PMH Key Terms
- Harvester
  - Operated by a service provider as a way to collect metadata from a repository
- Repository
  - A network-accessible server that is able to process OAI-PMH requests
  - Managed by the data provider to allow harvesters access to its metadata
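The harvester and repository roles can be sketched as a request URL plus a parsed response. The example below assumes the unqualified Dublin Core format (`oai_dc`) and the `ListRecords` verb from the OAI-PMH 2.0 specification. The base URL `http://repository.example.org/oai`, the sample response, and the function names are hypothetical, and the sketch skips network access, error responses, and resumption tokens.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a harvester would request from a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Extract identifier, titles, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("./oai:ListRecords/oai:record", NS):
        identifier = record.findtext("./oai:header/oai:identifier", namespaces=NS)
        titles = [e.text for e in record.findall(".//dc:title", NS)]
        creators = [e.text for e in record.findall(".//dc:creator", NS)]
        records.append(
            {"identifier": identifier, "title": titles, "creator": creators}
        )
    return records


# Hypothetical response, invented for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-14T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2002-06-14</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("http://repository.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

In practice the service provider would issue the request over HTTP and keep requesting until the repository stops returning a resumption token.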

Harvesting Problems
- Lack of consistency
  - Different collections use different Dublin Core (DC) elements and controlled vocabularies
- Repositories may have missing data within their metadata
  - The repository may decline to fill out elements
- Incorrect data
  - Data in the wrong element
- Harvested metadata can be confusing
  - Strings of names can be ordered inconsistently or ambiguously separated with commas instead of semicolons
- Insufficient data
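The name-separator problem can be shown with a small sketch. It assumes names are written "Last, First" and that semicolons are the intended separator; the rule that more than one comma signals ambiguity is a heuristic chosen for this example, and `split_creators` is a hypothetical helper.

```python
def split_creators(value):
    """Split a creator string into names and flag ambiguous input.

    Returns (names, ambiguous). Semicolon-separated strings are split cleanly.
    A string with no semicolon but more than one comma cannot be split safely,
    because commas may separate either people or the parts of one name.
    """
    if ";" in value:
        names = [part.strip() for part in value.split(";") if part.strip()]
        return names, False
    # Assumption: a single "Last, First" name contains at most one comma.
    ambiguous = value.count(",") > 1
    return [value.strip()], ambiguous


if __name__ == "__main__":
    print(split_creators("Schenck, Andrew; Russell, Pamela"))
    print(split_creators("Schenck, Andrew, Russell, Pamela"))
```

A service provider could route the ambiguous strings to manual review rather than guess at the name boundaries.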

Recommendations for Improving Harvesting
- Establish guidelines and best practices
- Develop local standards
- Evaluate metadata
  - Check whether certain elements hold local metadata that would not be useful in an aggregated environment
  - Check whether any fields are populated with "unknown" or "N/A"
- Communicate with the service provider
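The check for fields populated with "unknown" or "N/A" is easy to automate before exposing records for harvesting. The sketch below assumes each record is a dictionary mapping element names to lists of string values; treating empty strings as placeholders is an extra assumption, and `find_placeholder_fields` is a hypothetical helper.

```python
# Values treated as placeholders; the empty string is an added assumption.
PLACEHOLDERS = {"unknown", "n/a", ""}


def find_placeholder_fields(record):
    """Return the element names whose values include a placeholder."""
    flagged = []
    for element, values in record.items():
        for value in values:
            if value.strip().lower() in PLACEHOLDERS:
                flagged.append(element)
                break  # One placeholder is enough to flag the element.
    return flagged


if __name__ == "__main__":
    sample_record = {
        "title": ["Sample title"],
        "creator": ["Unknown"],
        "date": ["N/A"],
    }
    print(find_placeholder_fields(sample_record))
```

Flagged elements can then be corrected locally or left out of the exposed record, depending on local guidelines.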

Conclusion
- Evidence suggests that OAI-PMH is a successful endeavor
  - Increase in the number of repositories
  - Many funded projects based on OAI:
    - eprints.org
    - Metadata Harvesting Initiative of the Mellon Foundation
    - NSF National Science Digital Library (NSDL)
- The importance of metadata is one of the reasons that the Open Archives Initiative created the Protocol for Metadata Harvesting
