Summary of "PGB: A PubMed Graph Benchmark for Heterogeneous Network Representation Learning"
Article Link: https://lnkd.in/evaNbHk4
Academic graphs generated from bibliographic data are an essential data source across many fields. To analyze these graphs, researchers are exploring heterogeneous graph neural networks (GNNs), which use the node and edge types in the graph to improve results. However, a recent paper demonstrated that these state-of-the-art heterogeneous GNNs can produce widely varying results because experimental setups and data preprocessing are not consistent.
In this paper, Eric W. Lee and J. C. Ho present PGB, a new benchmark dataset of over 30 million PubMed articles for evaluating heterogeneous graph embeddings of biomedical literature. PubMed contains over 33 million citations and abstracts of literature in biomedicine and health, and it supplies the benchmark with rich metadata, including abstracts, authors, citations, MeSH terms, and the MeSH hierarchy, among other information. In addition to building PGB, the authors run extensive benchmark experiments on the dataset using current state-of-the-art graph embedding methods, including 2 homogeneous GNNs and 3 heterogeneous graph embedding models.
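To make "heterogeneous" concrete, here is a minimal Python sketch of a graph whose nodes and edges carry types, loosely modeled on the metadata listed above (papers, authors, citations, MeSH terms, and a MeSH parent/child link). The class, the relation names, and all identifiers are hypothetical placeholders chosen for illustration; this is not the PGB schema or loading code.

```python
from collections import defaultdict


class HeteroGraph:
    """Toy heterogeneous graph: every node and every edge has a type."""

    def __init__(self):
        self.node_type = {}             # node id -> node type
        self.edges = defaultdict(list)  # (src_type, relation, dst_type) -> [(src, dst)]

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, src, relation, dst):
        # The edge type is derived from the endpoint types plus the relation name.
        key = (self.node_type[src], relation, self.node_type[dst])
        self.edges[key].append((src, dst))

    def neighbors(self, node_id, relation):
        # All targets reachable from node_id over edges with the given relation.
        return [
            dst
            for (_, rel, _), pairs in self.edges.items()
            if rel == relation
            for src, dst in pairs
            if src == node_id
        ]


def build_toy_graph():
    """Build a tiny example; all ids and relation names are made up."""
    g = HeteroGraph()
    g.add_node("paper:1", "paper")
    g.add_node("paper:2", "paper")
    g.add_node("author:a", "author")
    g.add_node("mesh:parent_term", "mesh")
    g.add_node("mesh:child_term", "mesh")

    g.add_edge("author:a", "writes", "paper:1")
    g.add_edge("paper:1", "cites", "paper:2")
    g.add_edge("paper:1", "has_mesh", "mesh:child_term")
    # Hierarchy edge: the child term points to its parent term.
    g.add_edge("mesh:child_term", "child_of", "mesh:parent_term")
    return g


if __name__ == "__main__":
    graph = build_toy_graph()
    for edge_type, pairs in graph.edges.items():
        print(edge_type, pairs)
```

A homogeneous model would treat all of these nodes and edges alike, whereas a heterogeneous model can learn separately for each (source type, relation, target type) triple.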
Experimental results show that scalability and the ability to handle rich metadata, especially hierarchical structure, remain open challenges for existing graph embedding models.
Reference: Lee, E. W., & Ho, J. C. (2023). PGB: A PubMed Graph Benchmark for Heterogeneous Network Representation Learning.
DOI: 10.48550/arXiv.2305.02691
#artificialintelligence #pubmed