Which Sections of a Research Paper Best Reveal Its Research Methods? Evidence from Library and Information Science
Summary
This paper proposes a segment combination strategy for automatically classifying research methods in academic papers: the full text is partitioned by physical position, and segments are evaluated individually and in combination. Experiments on an annotated corpus of 1,954 full-text articles from three Library and Information Science journals (JASIST, LISR, and JDoc) show that methodological information is unevenly distributed, with the middle-to-late and final segments having greater discriminative power.
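The idea of partitioning a paper by position and then combining selected segments can be illustrated with a minimal sketch. This is not the authors' implementation. The number of segments (five), whitespace tokenization, and the function names `split_into_segments` and `combine_segments` are all assumptions made for illustration.

```python
from typing import List, Sequence


def split_into_segments(text: str, num_segments: int = 5) -> List[str]:
    """Split text into contiguous chunks of roughly equal token count.

    Assumption: position is measured in whitespace-separated tokens,
    and five segments is an arbitrary illustrative default.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    tokens = text.split()
    segments = []
    for i in range(num_segments):
        # Integer arithmetic keeps the boundaries contiguous and covers every token.
        start = i * len(tokens) // num_segments
        end = (i + 1) * len(tokens) // num_segments
        segments.append(" ".join(tokens[start:end]))
    return segments


def combine_segments(
    segments: Sequence[str],
    indices: Sequence[int],
    metadata: str = "",
) -> str:
    """Concatenate optional metadata with the chosen segments, in the order given."""
    parts = [metadata] if metadata else []
    parts.extend(segments[i] for i in indices)
    # Skip empty pieces so short documents do not produce stray spaces.
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    full_text = " ".join(f"w{i}" for i in range(10))
    segs = split_into_segments(full_text, num_segments=5)
    # Feed the title plus the last two segments to a downstream classifier.
    print(combine_segments(segs, [3, 4], metadata="Title"))
```

The combined string would then be passed to whatever multi-label classifier is being evaluated. Comparing results across different `indices` choices is one way to probe which positions carry the most methodological signal.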
Source: [https://arxiv.org/abs/2606.19051](https://arxiv.org/abs/2606.19051) [View PDF](https://arxiv.org/pdf/2606.19051)

> Abstract: Research methods are essential carriers of knowledge contribution in academic papers. Automatic multi-label classification of research methods can support knowledge services such as method retrieval, review generation, and research intelligence analysis. While existing studies primarily rely on titles and abstracts, abstracts often provide only limited methodological information, whereas utilizing full-text content faces challenges related to excessive length and information redundancy. Therefore, this paper proposes a segment combination strategy by partitioning the full-text content according to its physical position. Using an annotated corpus of 1,954 full-text articles from three representative journals in Library and Information Science (JASIST, LISR, and JDoc), we evaluate the classification performance of various segments and their combinations across multiple models. Experimental results indicate that methodological information is distributed unevenly within the full-text content, with the middle-to-late and final segments exhibiting greater discriminative power. Furthermore, integrating bibliographic metadata with cross-segment combination strategies effectively enhances classification performance.

## Submission history

From: Chengzhi Zhang

**[v1]** Wed, 17 Jun 2026 13:17:41 UTC (1,620 KB)
Similar Articles
Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches
This systematic review of 139 studies proposes a unified framework and meta-analysis for document classification via multimodal and multiview information fusion, finding that fusion improves accuracy (mean gain of +5.28 percentage points) but highlights reproducibility challenges.
How Much Structure Do LLMs Need? Evaluating LLMs for Bibliometric Cluster Description
This paper evaluates whether bibliometric structure improves LLM-assisted scientific literature synthesis by comparing six pipelines for generating cluster descriptions. Results show LLMs perform best in a hybrid workflow where bibliometric algorithms define clusters and LLMs generate readable descriptions.
From Snippets to Semantics: Rethinking Evidence Granularity for Multilingual Fact Verification
This paper introduces SEEK, a framework for semantic evidence extraction in multilingual fact verification, which constructs coherent evidence chunks from full articles and fine-tunes multilingual LLMs with LoRA, achieving up to 20% improvement in macro-F1 over baselines.
AI for Monitoring and Classifying Data Used in Research Literature
This paper presents a multitask GLiNER-based framework for scalable monitoring of dataset usage in research literature, using synthetic data generation and LLM-based revalidation to address challenges in extraction, relation identification, and usage classification.
Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation
This paper identifies document-side early compression as a failure mode in long-document dense retrieval and introduces the Evidence Dilution Index (EDI) to measure it. The authors propose DICE, a training-free method that splits documents into chunks, encodes them independently, and aggregates them into a single vector, significantly improving retrieval on long documents.