Which Sections of a Research Paper Best Reveal Its Research Methods? Evidence from Library and Information Science

arXiv cs.CL Papers

Summary

This paper proposes a segment combination strategy for automatically classifying research methods in academic papers by partitioning full-text content. Experiments on an annotated corpus from Library and Information Science journals show that methodological information is unevenly distributed, with middle-to-late segments having higher discriminative power.

arXiv:2606.19051v1. Abstract: Research methods are essential carriers of knowledge contribution in academic papers. Automatic multi-label classification of research methods can support knowledge services such as method retrieval, review generation, and research intelligence analysis. While existing studies primarily rely on titles and abstracts, abstracts often provide only limited methodological information, whereas utilizing full-text content faces challenges related to excessive length and information redundancy. Therefore, this paper proposes a segment combination strategy by partitioning the full-text content according to its physical position. Using an annotated corpus of 1,954 full-text articles from three representative journals in Library and Information Science (JASIST, LISR, and JDoc), we evaluate the classification performance of various segments and their combinations across multiple models. Experimental results indicate that methodological information is distributed unevenly within the full-text content, with the middle-to-late and final segments exhibiting greater discriminative power. Furthermore, integrating bibliographic metadata with cross-segment combination strategies effectively enhances classification performance.

Cached at: 06/18/26, 05:47 AM

# Which Sections of a Research Paper Best Reveal Its Research Methods? Evidence from Library and Information Science
Source: [https://arxiv.org/abs/2606.19051](https://arxiv.org/abs/2606.19051)
[View PDF](https://arxiv.org/pdf/2606.19051)

> Abstract: Research methods are essential carriers of knowledge contribution in academic papers. Automatic multi-label classification of research methods can support knowledge services such as method retrieval, review generation, and research intelligence analysis. While existing studies primarily rely on titles and abstracts, abstracts often provide only limited methodological information, whereas utilizing full-text content faces challenges related to excessive length and information redundancy. Therefore, this paper proposes a segment combination strategy by partitioning the full-text content according to its physical position. Using an annotated corpus of 1,954 full-text articles from three representative journals in Library and Information Science (JASIST, LISR, and JDoc), we evaluate the classification performance of various segments and their combinations across multiple models. Experimental results indicate that methodological information is distributed unevenly within the full-text content, with the middle-to-late and final segments exhibiting greater discriminative power. Furthermore, integrating bibliographic metadata with cross-segment combination strategies effectively enhances classification performance.
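
The idea of partitioning a paper by physical position and then combining segments can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how many segments are used, how boundaries are chosen, or how metadata is attached. The choice of five equal-sized word-based segments, the helper names `split_by_position`, `combine_segments`, and `candidate_inputs`, and the simple string concatenation of metadata are all assumptions made for illustration.

```python
from itertools import combinations


def split_by_position(text: str, n_segments: int = 5) -> list[str]:
    """Split text into n_segments contiguous chunks of near-equal word count.

    Assumption: position is measured in whitespace-separated words.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    words = text.split()
    base, extra = divmod(len(words), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments take one additional word each.
        size = base + (1 if i < extra else 0)
        segments.append(" ".join(words[start:start + size]))
        start += size
    return segments


def combine_segments(segments: list[str], indices: tuple[int, ...],
                     metadata: str = "") -> str:
    """Concatenate optional metadata with the selected segments, in order."""
    parts = [metadata] if metadata else []
    parts.extend(segments[i] for i in indices)
    return " ".join(p for p in parts if p)


def candidate_inputs(segments: list[str], max_size: int = 2,
                     metadata: str = "") -> dict[tuple[int, ...], str]:
    """Build one classifier input per segment combination up to max_size."""
    inputs = {}
    for size in range(1, max_size + 1):
        for idx in combinations(range(len(segments)), size):
            inputs[idx] = combine_segments(segments, idx, metadata)
    return inputs


if __name__ == "__main__":
    full_text = " ".join(f"w{i}" for i in range(10))  # toy stand-in for a paper
    segs = split_by_position(full_text, n_segments=5)
    inputs = candidate_inputs(segs, max_size=2, metadata="Toy title")
    # Input built from the two last segments plus metadata.
    print(inputs[(3, 4)])
```

Each value in the returned dictionary could then be passed to a multi-label classifier, so that the scores for different segment combinations can be compared on the same held-out set.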

## Submission history

From: Chengzhi Zhang **[v1]** Wed, 17 Jun 2026 13:17:41 UTC (1,620 KB)

Similar Articles

AI for Monitoring and Classifying Data Used in Research Literature

arXiv cs.CL

This paper presents a multitask GLiNER-based framework for scalable monitoring of dataset usage in research literature, using synthetic data generation and LLM-based revalidation to address challenges in extraction, relation identification, and usage classification.

Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

arXiv cs.CL

This paper identifies document-side early compression as a failure mode in long-document dense retrieval and introduces the Evidence Dilution Index (EDI) to measure it. The authors propose DICE, a training-free method that splits documents into chunks, encodes them independently, and aggregates them into a single vector, significantly improving retrieval on long documents.