Abstract
The peer-review process plays a pivotal role in maintaining the quality and credibility of scientific publications. In recent years, however, there has been a rise in unhelpful and overly critical reviews, which can be detrimental to the process. This surge in unconstructive reviews can be attributed to a growing volume of paper submissions and the recruitment of inexperienced reviewers. As a result, authors receive little valuable feedback, compromising the effectiveness of peer review. Peer-review feedback must be not only objective but also delivered politely and constructively. Our study introduces a novel approach to assessing the constructiveness and tone of peer reviews. We propose a two-fold taxonomy that categorizes review sentences into five labels for constructiveness and three labels for politeness. To facilitate this research, we have created a corpus of 2716 review sentences, manually annotated with a high inter-annotator agreement of 88.27% for constructiveness and 83.49% for politeness, offering a valuable resource for the scientific community. Furthermore, we present a multi-task model named “Multi-Label Critique (MLC)” that leverages ToxicBERT representations and deep neural attention mechanisms. The model evaluates both the constructiveness and the politeness of review sentences, outperforming competitive baseline models with an accuracy of 87.4%. Our paper includes an extensive analysis of the MLC model and its variants. Our research is a significant step towards the development of more constructive peer-review reports.
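To make the multi-task setup concrete, the following is a minimal, illustrative PyTorch sketch of a model in the spirit of MLC: a shared attention layer over encoder token representations feeds two task-specific heads, one for the five constructiveness labels and one for the three politeness labels. All class and variable names here are hypothetical, and the random tensors stand in for ToxicBERT outputs; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskCritiqueSketch(nn.Module):
    """Illustrative multi-task head: shared attention, two classifiers.

    Assumes token-level sentence representations from a ToxicBERT-like
    encoder (hidden size 768). Names and architecture details are
    hypothetical stand-ins for the paper's MLC model.
    """

    def __init__(self, hidden_dim=768, n_constructiveness=5, n_politeness=3):
        super().__init__()
        # Shared self-attention over the token representations.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=8, batch_first=True)
        # Task-specific classification heads.
        self.constructiveness_head = nn.Linear(hidden_dim, n_constructiveness)
        self.politeness_head = nn.Linear(hidden_dim, n_politeness)

    def forward(self, token_reprs):
        # token_reprs: (batch, seq_len, hidden_dim), e.g. encoder outputs.
        attended, _ = self.attn(token_reprs, token_reprs, token_reprs)
        pooled = attended.mean(dim=1)  # mean-pool over tokens
        return self.constructiveness_head(pooled), self.politeness_head(pooled)

model = MultiTaskCritiqueSketch()
dummy_reprs = torch.randn(2, 16, 768)  # 2 review sentences, 16 tokens each
c_logits, p_logits = model(dummy_reprs)
```

Training such a model would typically sum a cross-entropy loss per task, letting the shared attention layer learn features useful to both labels.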
Notes
https://huggingface.co/
https://publons.com/wos-op/review/9635430
https://publons.com/wos-op/review/14137393
Acknowledgements
The third author, Asif Ekbal, has received the Visvesvaraya Young Faculty Award and gratefully acknowledges the support of the Government of India and the Ministry of Electronics and Information Technology. We thank our annotators, Meith Navlakha and Rahul Raheja, for their annotation and data-cleaning work.
Author information
Authors and Affiliations
Contributions
PKB: Conceptualization, data curation, annotation, investigation, methodology, experiments, writing - original draft and review & editing. MA: Supervision, reviewing & editing. AE: Supervision, conceptualization, reviewing & editing.
Corresponding author
Ethics declarations
Conflict of interest
We, the authors, declare that there are no conflicts of interest with respect to the publication of this article.
Ethical approval
The objective of our paper is not to target or criticize any particular individual. Instead, we aim to highlight the prevailing negative cultural trends within peer review processes. We hope to inspire positive changes that will improve the peer review system by initiating a conversation and raising awareness on this issue.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bharti, P.K., Agarwal, M. & Ekbal, A. Please be polite to your peers: a multi-task model for assessing the tone and objectivity of critiques of peer review comments. Scientometrics 129, 1377–1413 (2024). https://doi.org/10.1007/s11192-024-04938-z