MASF: A Multi-Model Adaptive Selection Framework for Abstractive Text summarization


Summary

Presents MASF, a multi-model adaptive selection framework that integrates multiple fine-tuned transformer summarization models and selects the highest-quality summary, achieving 88.63% BERTScore on CNN/DailyMail and outperforming several LLMs.

arXiv:2606.05494v1 Announce Type: new

# MASF: A Multi-Model Adaptive Selection Framework for Abstractive Text summarization
Source: [https://arxiv.org/html/2606.05494](https://arxiv.org/html/2606.05494)
###### Abstract

Automatic text summarization has become increasingly important due to the rapid growth of digital textual information\. This paper presents a Multi\-Model Adaptive Summarization Framework designed to improve the robustness and quality of abstractive text summarization\. Relying on a single model often leads to inconsistent summarization quality across articles with varying structures and topics\. To address this limitation, the proposed framework integrates multiple fine\-tuned transformer\-based summarization models and introduces an adaptive selection mechanism\. In this framework, each model independently generates a candidate summary for the same input article\. The generated summaries are then evaluated using automatic evaluation metrics that capture both lexical similarity and semantic relevance\. Based on these scores, the framework selects the highest\-quality summary as the final output\. The models are fine\-tuned and evaluated on the widely used CNN/DailyMail news summarization dataset\. Experimental results demonstrate that the proposed framework achieves the highest BERTScore among all compared methods with a score of 88\.63%\. It also outperforms several LLMs such as GPT3\-D2, Falcon\-7b, and Mpt\-7b, highlighting its effectiveness and robustness\. These findings highlight the effectiveness of leveraging multiple transformer\-based models within an adaptive selection strategy to improve the quality and robustness of automatic text summarization systems\.

979\-8\-3315\-8488\-7/26/\$31\.00 ©2026 IEEE

## I Introduction

The rapid growth of digital information has led to an unprecedented increase in textual data across news platforms, social media, and online repositories\. As a result, efficiently processing and understanding large volumes of text has become a critical challenge\. Text summarization has emerged as an essential Natural Language Processing \(NLP\) task that aims to generate concise summaries while preserving the core meaning of the original content\. In particular, news summarization plays an important role in helping users quickly grasp key information from lengthy articles\[[25](https://arxiv.org/html/2606.05494#bib.bib1),[13](https://arxiv.org/html/2606.05494#bib.bib2)\]\.

Traditional methods relied mainly on extractive techniques that select the most informative sentences from the source text\. These approaches typically follow a pipeline consisting of text preprocessing, feature extraction, sentence scoring, base model utilization, sentence selection, and final summary generation\. A comprehensive review of extractive summarization techniques highlights the wide range of methods applied in this domain, including statistical, rule\-based, fuzzy logic, optimization, graph\-based, clustering\-based, machine learning, and deep learning approaches\[[25](https://arxiv.org/html/2606.05494#bib.bib1)\]\. Although some extractive summarization techniques have contributed significantly to the development of summarization systems, many existing surveys emphasize that current methods still face several challenges, including limited robustness across diverse article structures and topics, inconsistency in summary quality when relying on a single model architecture, and heavy reliance on individual model outputs that may not consistently capture both lexical and semantic quality\[[25](https://arxiv.org/html/2606.05494#bib.bib1),[12](https://arxiv.org/html/2606.05494#bib.bib4),[3](https://arxiv.org/html/2606.05494#bib.bib5)\]\.

Transformer architectures have achieved superior results on benchmark datasets such as CNN/DailyMail due to their self\-attention mechanism and parallel sequence processing capabilities, which enable better modeling of long\-range dependencies in text\[[16](https://arxiv.org/html/2606.05494#bib.bib3),[18](https://arxiv.org/html/2606.05494#bib.bib22)\]\. Similarly, hybrid frameworks combine Optical Character Recognition \(OCR\) with deep learning summarization models such as LSTM, Bi\-LSTM, BERT, and T5 to process textual information extracted from images\[[13](https://arxiv.org/html/2606.05494#bib.bib2)\]\. Despite these advancements, deep learning models still face several limitations, including high computational requirements, sensitivity to training data quality, and dependence on evaluation metrics such as ROUGE that may not fully reflect summary quality from a semantic perspective\[[16](https://arxiv.org/html/2606.05494#bib.bib3)\]\.

Furthermore, the emergence of Large Language Models \(LLMs\) has introduced new possibilities for text summarization\. Recent studies have investigated the use of multiple LLMs, including models such as MPT\-7B, Falcon\-7B, and ChatGPT\-based architectures, for generating abstractive summaries\[[3](https://arxiv.org/html/2606.05494#bib.bib5)\]\. Although these approaches demonstrate the flexibility of modern summarization systems, they also highlight an important limitation: most existing systems rely on a single model during inference, which may not always generate the most informative summary\.

Motivated by these limitations, this study proposes a Multi\-Model Adaptive Summarization Framework for the CNN/DailyMail dataset\. Unlike traditional approaches that depend on a single summarization model, the proposed method leverages multiple transformer\-based models to generate candidate summaries for each input article\. Specifically, three different summarization models are utilized to produce alternative summaries, and each generated summary is evaluated using automatic evaluation metrics that measure lexical overlap, n\-gram precision, and semantic similarity\. The system then automatically selects the most informative summary based on a combined evaluation score\. By integrating multiple models and an adaptive selection mechanism, the proposed framework aims to improve summarization quality and robustness compared with single\-model approaches\. This strategy enables the system to exploit the strengths of different models while mitigating the weaknesses of individual architectures, ultimately producing more accurate and coherent summaries for news articles\.

## II Related Work

Automatic Text Summarization \(ATS\) has been widely studied as a fundamental Natural Language Processing \(NLP\) task aimed at generating concise representations of large textual documents while preserving essential information\[[8](https://arxiv.org/html/2606.05494#bib.bib27)\]\. The rapid growth of digital text across domains such as news articles, scientific publications, and social media has intensified the need for efficient summarization systems\[[5](https://arxiv.org/html/2606.05494#bib.bib7),[24](https://arxiv.org/html/2606.05494#bib.bib8)\]\. Early research in this field focused primarily on extractive summarization techniques, where important sentences are selected directly from the original document to form the summary\. Comprehensive surveys have analyzed the evolution of summarization methods, highlighting the main components of summarization pipelines, including preprocessing, feature extraction, sentence scoring, and summary generation\[[25](https://arxiv.org/html/2606.05494#bib.bib1),[17](https://arxiv.org/html/2606.05494#bib.bib6),[6](https://arxiv.org/html/2606.05494#bib.bib24)\]\. These studies also emphasize the increasing complexity of summarization tasks, particularly with the emergence of multi\-document, multilingual, and multimodal content, which continues to pose challenges for existing systems\[[25](https://arxiv.org/html/2606.05494#bib.bib1)\]\. Additionally, several surveys have investigated the progression of abstractive summarization models, datasets, and evaluation methodologies, identifying widely used benchmarks such as the CNN/DailyMail dataset and evaluation metrics including ROUGE\-based measures\[[19](https://arxiv.org/html/2606.05494#bib.bib19),[2](https://arxiv.org/html/2606.05494#bib.bib23)\]\. These analyses collectively provide a comprehensive understanding of the current landscape of text summarization research and highlight several unresolved challenges\.

Traditional extractive summarization approaches have employed various statistical, graph\-based, and feature\-driven techniques to identify salient sentences within documents\. Graph\-based models have been widely explored, where sentence importance is determined using similarity relationships between sentences\. For example, sentence centrality and semantic similarity have been utilized to construct graphs that capture relationships between textual units, enabling more informative sentence selection\[[9](https://arxiv.org/html/2606.05494#bib.bib13)\]\. Similarly, ranking\-based approaches have been proposed to combine multiple sentence\-level features such as topic information, semantic representations, keywords, and positional importance in order to determine sentence saliency within a document\[[10](https://arxiv.org/html/2606.05494#bib.bib12)\]\. Other studies have investigated enhancements to classic algorithms such as TextRank by integrating word embeddings and weighting mechanisms to improve sentence representation and summary quality\[[14](https://arxiv.org/html/2606.05494#bib.bib15)\]\. In addition, unsupervised summarization methods have explored clustering and topic modeling strategies to reduce topic bias and generate summaries that better represent document subtopics\[[20](https://arxiv.org/html/2606.05494#bib.bib20)\]\. Despite their effectiveness and computational efficiency, extractive approaches often suffer from redundancy and lack the ability to generate coherent paraphrased summaries, which limits their ability to match human\-written summaries\.

To address the limitations of extractive techniques, research has increasingly shifted towards abstractive summarization using neural network architectures\. Sequence\-to\-sequence models based on Recurrent Neural Networks \(RNNs\) and Long Short\-Term Memory \(LSTM\) networks have been widely applied to generate summaries that paraphrase the source text while preserving its meaning\[[11](https://arxiv.org/html/2606.05494#bib.bib16)\]\. These models typically employ encoder–decoder architectures with attention mechanisms to capture contextual dependencies between words and sentences\. Further improvements have been achieved by incorporating bidirectional encoders, stacked architectures, and attention mechanisms that enhance sequence representation and summarization performance\[[11](https://arxiv.org/html/2606.05494#bib.bib16),[1](https://arxiv.org/html/2606.05494#bib.bib26)\]\. Additionally, discourse\-aware neural models have been proposed to capture long\-range dependencies and structural relationships between discourse units within documents, improving the quality of extractive summarization by modeling document\-level discourse structures\[[23](https://arxiv.org/html/2606.05494#bib.bib10)\]\.

More recently, Transformer\-based architectures and pre\-trained language models have significantly advanced the performance of text summarization systems\. Transformer models benefit from self\-attention mechanisms that allow them to capture long\-range dependencies and contextual relationships more effectively than earlier neural architectures\[[16](https://arxiv.org/html/2606.05494#bib.bib3)\]\. Pre\-trained models such as PEGASUS\-xsum, BART, and T5 have demonstrated strong performance on benchmark datasets by leveraging large\-scale pre\-training followed by task\-specific fine\-tuning\[[12](https://arxiv.org/html/2606.05494#bib.bib4),[15](https://arxiv.org/html/2606.05494#bib.bib18)\]\. Nevertheless, fine\-tuning large pre\-trained models often introduces challenges related to overfitting and high computational costs, motivating research into optimization techniques and model adaptation strategies to improve generalization performance\[[12](https://arxiv.org/html/2606.05494#bib.bib4)\]\. Furthermore, research has explored alternative problem formulations for extractive summarization, such as modeling summarization as a semantic matching task between source documents and candidate summaries, which has achieved competitive performance on the CNN/DailyMail dataset\[[26](https://arxiv.org/html/2606.05494#bib.bib17)\]\. Other studies have also examined multi\-modal summarization frameworks that combine Optical Character Recognition \(OCR\) with deep learning models in order to summarize textual information extracted from images, expanding the scope of summarization applications\[[13](https://arxiv.org/html/2606.05494#bib.bib2)\]\.

Beyond general\-domain summarization, LLMs have also been applied to specialized domains such as clinical text summarization, where adapted models have demonstrated performance comparable to or exceeding human experts in certain tasks\[[21](https://arxiv.org/html/2606.05494#bib.bib11)\]\. Despite these advancements, deep learning and LLM\-based summarization systems still face several challenges, including maintaining factual consistency, ensuring semantic correctness, and reliably evaluating generated summaries\[[15](https://arxiv.org/html/2606.05494#bib.bib18)\]\. These limitations motivate the exploration of alternative frameworks that leverage the strengths of multiple summarization models to improve overall summarization performance\.

TABLE I: Example sample from the CNN/DailyMail dataset\.

| Article | Highlights |
| --- | --- |
| Liverpool target Neto is also wanted by PSG and clubs in Spain as Brendan Rodgers faces stiff competition to land the Fiorentina goalkeeper, according to the Brazilian’s agent Stefano Castagna\. The Reds were linked with a move for the 25\-year\-old, whose contract expires in June, earlier in the season when Simon Mignolet was dropped from the side\. A January move for Neto never materialised but the former Atletico Paranaense keeper looks certain to leave the Florence\-based club in the summer\. It had been reported that Neto had a verbal agreement to join Serie A champions Juventus at the end of the season but his agent has revealed no decision about his future has been made yet\. And Castagna claims Neto will have his pick of top European clubs when the transfer window re\-opens in the summer, including Brendan Rodgers’ side\. ’There are many European clubs interested in Neto, such as for example Liverpool and Paris Saint\-Germain,’ Stefano Castagna is quoted as saying by Gazzetta TV\. Firoentina goalkeeper Neto saves at the feet of Tottenham midfielder Nacer Chadli in the Europa League\. ’In Spain too there are clubs at the very top level who are tracking him\. Real Madrid? We’ll see\. ’We have not made a definitive decision, but in any case he will not accept another loan move elsewhere\.’ Neto, who represented Brazil at the London 2012 Olympics but has not featured for the senior side, was warned against joining a club as a No 2 by national coach Dunga\. Neto joined Fiorentina from Atletico Paranaense in 2011 and established himself as No1 in the last two seasons\. | Fiorentina goalkeeper Neto has been linked with Liverpool and Arsenal\. Neto joined Firoentina from Brazilian outfit Atletico Paranaense in 2011\. He is also wanted by PSG and Spanish clubs, according to his agent\. CLICK HERE for the latest Liverpool news\. |

## III Dataset

In this work, we utilize the CNN/DailyMail news summarization dataset, a widely used benchmark for supervised text summarization tasks\. The dataset consists of news articles collected from the CNN and Daily Mail websites, each paired with a set of human\-written highlights that serve as reference summaries\. In the summarization setting, these highlight sentences are concatenated to form the target summary corresponding to each article\[[22](https://arxiv.org/html/2606.05494#bib.bib21)\]\.

The dataset contains more than 300,000 article–summary pairs written by professional journalists\. Each article typically ranges between 500 and 800 words, while the associated summaries usually consist of 3–5 sentences that capture the key information of the news story\. Each sample in the dataset includes two main fields: Article, which contains the full text of the news article, and Highlights, which contains the corresponding summary written by the article author\. Table [I](https://arxiv.org/html/2606.05494#S2.T1) presents an example sample from the dataset to illustrate the structure of the article–summary pairs used for training and evaluation\. Following the standard configuration, the dataset is divided into training, validation, and test splits\. The training set contains 287,113 samples, the validation set contains 13,368 samples, and the test set contains 11,490 samples\.
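The sample structure and split sizes described above can be sketched as follows. This is a minimal illustration, not the authors' data pipeline: the `Sample` class and the `build_target` helper are hypothetical names, and the assumption that highlight sentences arrive as newline-separated text is ours.

```python
from dataclasses import dataclass

# Split sizes as stated in the text above.
SPLIT_SIZES = {"train": 287_113, "validation": 13_368, "test": 11_490}


@dataclass
class Sample:
    """One article-summary pair (hypothetical container for illustration)."""
    article: str     # full text of the news article
    highlights: str  # reference summary built from the highlight sentences


def build_target(highlight_lines: str) -> str:
    """Concatenate highlight sentences into a single target summary.

    Assumes one highlight per line; blank lines are skipped.
    """
    parts = [line.strip() for line in highlight_lines.splitlines() if line.strip()]
    return " ".join(parts)


total = sum(SPLIT_SIZES.values())
for name, size in SPLIT_SIZES.items():
    # Report each split's share of the full dataset.
    print(f"{name}: {size} samples ({size / total:.1%} of {total})")
```

The three splits sum to 311,971 pairs, which is consistent with the "more than 300,000" figure quoted above.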

The CNN/DailyMail dataset was originally introduced for machine reading comprehension and question answering, but later versions were adapted for abstractive summarization tasks by using the article highlights as reference summaries\. Due to its large scale and high\-quality journalistic summaries, it has become one of the most commonly used benchmarks for evaluating neural summarization models\.

## IV Methodology

This study proposes a multi\-model adaptive summarization framework designed to improve the quality of abstractive summaries by leveraging multiple transformer\-based language models and automatically selecting the most informative output\. Unlike traditional approaches that rely on a single summarization model, the proposed framework employs multiple fine\-tuned models to generate candidate summaries and then selects the most suitable summary using an automatic evaluation mechanism\.

The proposed architecture consists of five main stages: dataset preparation, model fine\-tuning, input preprocessing, multi\-model summarization, and adaptive evaluation and selection\.

### IV\-A Model Fine\-Tuning

To improve domain adaptation and summarization performance, the pretrained models are fine\-tuned on the training dataset\. Three transformer\-based models are used in this framework:

$$\mathcal{M}=\{M_{1},M_{2},M_{3}\} \tag{1}$$

where $M_{1}$ represents T5\-small, $M_{2}$ represents PEGASUS\-xsum, and $M_{3}$ represents LED\-base\.

The T5\-small and PEGASUS\-xsum models are fine\-tuned using a standard sequence\-to\-sequence training objective\. During training, the article text is used as input and the reference summary is used as the target output\. Tokenization is applied using the corresponding tokenizer for each model, and the models learn to generate summaries that align with the reference highlights\.

For the LED\-base model, parameter\-efficient fine\-tuning is applied using Low\-Rank Adaptation \(LoRA\)\. Instead of updating all model parameters, LoRA introduces trainable low\-rank matrices into specific attention layers of the transformer architecture\. Let the adapted model be denoted as

$$M_{3}^{\prime}=\text{LoRA}(M_{3}) \tag{2}$$

where only the injected LoRA parameters are trained while the base model parameters remain frozen\. This approach reduces the number of trainable parameters while maintaining effective summarization performance\.
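To give a sense of the parameter savings, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA parameterization, in which a frozen weight of shape `d_out x d_in` is adapted by adding a product of two low-rank matrices of shapes `d_out x r` and `r x d_in`. The 768 x 768 projection size and the rank `r = 8` are illustrative assumptions, not settings reported in this paper.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two injected low-rank factors (d_out x r and r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative values only: one attention projection and a small rank.
d_out, d_in, rank = 768, 768, 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full fine-tuning: {full} trainable parameters")
print(f"LoRA (r={rank}): {lora} trainable parameters ({lora / full:.2%} of full)")
```

Under these assumed sizes, the low-rank factors hold 48 times fewer trainable parameters than the full matrix, and the ratio shrinks further as the rank decreases.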

![Refer to caption](https://arxiv.org/html/2606.05494v1/Multiii.png)

Figure 1: Overview of the proposed multi\-model adaptive summarization framework\.

### IV\-B Input Preprocessing

Given an input article $A$, the text is first tokenized and truncated to ensure compatibility with the maximum input length supported by the models\. In the case of T5\-small and LED\-base, task\-specific prompts are appended to guide the summarization process\.

Let the preprocessing function be denoted as

$$A^{\prime}=f(A) \tag{3}$$

where $A^{\prime}$ represents the processed input sequence that is fed into the summarization models\.
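A minimal sketch of such a preprocessing function is given below. It is not the authors' implementation: whitespace splitting stands in for each model's own tokenizer, the `"summarize: "` prompt string and the 512-token limit are hypothetical, and attaching the prompt as a prefix is an assumption made for illustration.

```python
def preprocess(article: str, max_input_tokens: int, prompt: str = "") -> str:
    """Truncate an article to a token budget and attach an optional task prompt.

    Whitespace tokens are used as a stand-in for model-specific subword tokens.
    """
    tokens = article.split()
    truncated = " ".join(tokens[:max_input_tokens])
    # The prompt is attached as a prefix here; this placement is an assumption.
    return f"{prompt}{truncated}" if prompt else truncated


article = "Liverpool target Neto is also wanted by PSG and clubs in Spain"

# A model that uses a task prompt (hypothetical prompt string and limit).
with_prompt = preprocess(article, max_input_tokens=512, prompt="summarize: ")

# A model that takes the raw article, truncated to a short budget for display.
without_prompt = preprocess(article, max_input_tokens=5)

print(with_prompt)
print(without_prompt)
```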

### IV\-C Multi\-Model Summarization

After fine\-tuning, the processed article $A^{\prime}$ is passed to the three summarization models to generate candidate summaries as illustrated in Figure [1](https://arxiv.org/html/2606.05494#S4.F1)\. Each model independently produces a summary for the same input article\.

Formally, the summary generated by model $M_{i}$ is defined as

$$S_{i}=M_{i}(A^{\prime}) \tag{4}$$

where $S_{i}$ denotes the candidate summary produced by model $M_{i}$\. This process results in a set of candidate summaries:

$$\mathcal{S}=\{S_{1},S_{2},S_{3}\} \tag{5}$$

Each summary is generated using a maximum generation length of 128 tokens\.
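The candidate-generation step can be sketched as below. The three stub functions are hypothetical stand-ins for the fine-tuned models (they only slice the input text), and the 128-token cap is enforced on whitespace tokens as a simplification of the generation length limit.

```python
from typing import Callable, Dict

Summarizer = Callable[[str], str]


def generate_candidates(
    models: Dict[str, Summarizer],
    processed_article: str,
    max_new_tokens: int = 128,
) -> Dict[str, str]:
    """Run every model on the same processed article and collect one summary each."""
    candidates = {}
    for name, model in models.items():
        summary = model(processed_article)
        # Cap the output length (whitespace tokens approximate generated tokens).
        candidates[name] = " ".join(summary.split()[:max_new_tokens])
    return candidates


# Hypothetical stubs; real models would be fine-tuned transformer summarizers.
def stub_lead_sentence(text: str) -> str:
    return text.split(". ")[0].rstrip(".") + "."


def stub_first_words(text: str) -> str:
    return " ".join(text.split()[:8])


def stub_last_sentence(text: str) -> str:
    return text.split(". ")[-1]


models = {
    "T5-small": stub_lead_sentence,
    "PEGASUS-xsum": stub_first_words,
    "LED-base": stub_last_sentence,
}
text = "Neto is wanted by PSG. He joined Fiorentina in 2011."
candidates = generate_candidates(models, text)
for name, summary in candidates.items():
    print(f"{name}: {summary}")
```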

### IV\-D Automatic Evaluation

To assess the quality of the generated summaries, three complementary evaluation metrics are used: ROUGE\-L, BLEU and BERTScore\. ROUGE\-L measures lexical overlap between the generated summary and the article, BLEU measures the precision of n\-gram overlap between the generated summary and the article, while BERTScore measures semantic similarity using contextual embeddings\.
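To make the lexical component concrete, the following sketch computes a ROUGE-L style F1 score from the longest common subsequence (LCS) of two token sequences. It is a simplified illustration rather than the scorer used in the paper: it assumes lowercased whitespace tokenization, no stemming, and an unweighted F1, and the function names are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """LCS-based F1 between a candidate summary and a comparison text."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("Neto is wanted by PSG", "Neto is also wanted by PSG and clubs in Spain"))
```

Because the LCS respects word order without requiring contiguity, this score rewards summaries that preserve the sequence of the compared text even when words are skipped.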

Let $R(S_{i})$ denote the ROUGE\-L score of summary $S_{i}$, $BL(S_{i})$ denote the BLEU score, and $BS(S_{i})$ denote the BERTScore F1 value\. A combined evaluation score is computed as

$$Score(S_{i})=\frac{R(S_{i})+BL(S_{i})+BS(S_{i})}{3} \tag{6}$$

This combined score captures both surface\-level textual overlap and deeper semantic similarity between the generated summaries and articles\.
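The combined score is an unweighted mean of the three metrics, which can be sketched as below. The input values are hypothetical and chosen only to show the arithmetic; the sketch assumes all three metrics are expressed on the same 0 to 100 scale.

```python
def combined_score(rouge_l: float, bleu: float, bert_f1: float) -> float:
    """Unweighted mean of ROUGE-L, BLEU, and BERTScore F1 (all on one scale)."""
    return (rouge_l + bleu + bert_f1) / 3.0


# Hypothetical metric values for one candidate summary, in percent.
score = combined_score(rouge_l=30.0, bleu=12.0, bert_f1=87.0)
print(f"combined score: {score:.2f}")
```

Since BERTScore values are typically much larger than BLEU values on this scale, the mean is numerically dominated by the semantic term, while differences between candidates may still be driven by the lexical terms.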

### IV\-E Adaptive Selection

The final stage of the framework selects the most informative summary among the candidate outputs\. Let $S^{*}$ denote the final selected summary\. The adaptive selection mechanism chooses the summary with the highest evaluation score:

$$S^{*}=\arg\max_{S_{i}\in\mathcal{S}}Score(S_{i}) \tag{7}$$

By selecting the best\-performing summary from multiple fine\-tuned models, the framework leverages the complementary strengths of different transformer architectures\. This adaptive strategy improves robustness and increases the likelihood of producing high\-quality summaries across diverse input articles\.
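The selection step amounts to an argmax over the scored candidates, as in the sketch below. The `toy_score` function is a hypothetical placeholder (the fraction of summary words that occur in the article) used only so the example runs without external metric libraries; it is not the combined score defined in this paper. Ties are resolved in favor of the first candidate, which is an assumption.

```python
from typing import Callable, Dict, Tuple


def select_best(
    candidates: Dict[str, str],
    score_fn: Callable[[str], float],
) -> Tuple[str, str, float]:
    """Return (model name, summary, score) for the highest-scoring candidate."""
    if not candidates:
        raise ValueError("at least one candidate summary is required")
    scores = {name: score_fn(summary) for name, summary in candidates.items()}
    # max() keeps the first maximal key, so ties go to the earliest candidate.
    best_name = max(scores, key=scores.get)
    return best_name, candidates[best_name], scores[best_name]


article = "neto is wanted by psg and clubs in spain"
article_words = set(article.split())


def toy_score(summary: str) -> float:
    """Placeholder scorer: share of summary words that appear in the article."""
    words = summary.split()
    return sum(w in article_words for w in words) / len(words) if words else 0.0


candidates = {
    "M1": "neto is wanted by psg",
    "M2": "neto joins arsenal",
    "M3": "clubs in spain want neto",
}
best_name, best_summary, best_score = select_best(candidates, toy_score)
print(best_name, best_summary, round(best_score, 2))
```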

TABLE II: Performance comparison between recent related works on the CNN/DailyMail dataset and MASF

## V Results and Discussion

The experimental results presented in Tables [III](https://arxiv.org/html/2606.05494#S5.T3) and [IV](https://arxiv.org/html/2606.05494#S5.T4), together with the corresponding bar charts in Figures [2](https://arxiv.org/html/2606.05494#S5.F2) and [3](https://arxiv.org/html/2606.05494#S5.F3), demonstrate the effectiveness of the proposed Multi\-Model Adaptive Summarization Framework \(MASF\) in comparison with individual transformer\-based models\.

TABLE III: Performance comparison between baseline models and MASF

In the baseline setting, MASF achieves the highest BERTScore of 87\.07%, outperforming LED\-base \(86\.33%\), T5\-small \(86\.31%\), and PEGASUS\-xsum \(86\.36%\)\. Although T5\-small records slightly higher ROUGE\-L and BLEU scores of 25\.19% and 7\.75%, respectively, compared with 24\.30% and 7\.40% for MASF, the proposed framework maintains a highly competitive overall average score of 39\.59%, which is very close to the best\-performing individual baseline model at 39\.75%\. As illustrated in the baseline bar chart, the proposed framework exhibits a more balanced distribution across the evaluation metrics, reflecting stronger consistency in summary quality\.

![Refer to caption](https://arxiv.org/html/2606.05494v1/base_chart.png)

Figure 2: Visual comparison between baseline models and MASF

A more substantial improvement is observed in the fine\-tuned setting, where MASF achieves the best performance across all evaluation metrics\. Specifically, it records a BERTScore of 88\.63%, a ROUGE\-L score of 32\.75%, and a BLEU score of 16\.00%, resulting in the highest average score of 45\.80%\. Compared with the strongest individual fine\-tuned model, T5\-small, which achieves an average score of 42\.79%, the proposed framework improves the overall performance by 3\.01 percentage points\. This improvement is particularly evident in the BLEU metric, where MASF significantly outperforms PEGASUS\-xsum \(11\.55%\), T5\-small \(12\.46%\), and LED\-base \(10\.00%\)\. The corresponding bar chart further reinforces this result by showing that MASF spans the largest area among all compared models, indicating superior balance between semantic similarity and lexical alignment\.
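As a quick consistency check, the averages quoted above can be recomputed from the three metric values as an unweighted mean, following Eq. (6). The sketch uses only the numbers stated in the text; the dictionary layout and the `mean3` helper are hypothetical.

```python
# (BERTScore, ROUGE-L, BLEU, reported average), all in percent, as quoted in the text.
REPORTED = {
    "MASF (baseline)": (87.07, 24.30, 7.40, 39.59),
    "T5-small (baseline)": (86.31, 25.19, 7.75, 39.75),
    "MASF (fine-tuned)": (88.63, 32.75, 16.00, 45.80),
}


def mean3(bert: float, rouge_l: float, bleu: float) -> float:
    """Unweighted mean of the three metrics."""
    return (bert + rouge_l + bleu) / 3.0


for name, (bert, rouge_l, bleu, reported_avg) in REPORTED.items():
    recomputed = mean3(bert, rouge_l, bleu)
    print(f"{name}: recomputed {recomputed:.2f}, reported {reported_avg:.2f}")

# Gain of fine-tuned MASF over the strongest fine-tuned single model (T5-small).
gain = 45.80 - 42.79
print(f"average gain: {gain:.2f} percentage points")
```

The recomputed means agree with the reported averages to within 0.01, a gap that would be consistent with rounding of the underlying per-metric scores.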

When compared with recent related works, as shown in Table [II](https://arxiv.org/html/2606.05494#S4.T2), the proposed framework achieves a BERTScore of 88\.63%, which is higher than all reported models in the comparison, including BRIO \(87\.10%\) and text\-davinci\-003 \(86\.80%\)\. In terms of the overall average score, MASF achieves 45\.80%, outperforming several strong recent approaches such as GPT3\-D2 \(40\.19%\) and T0 \(43\.05%\), while remaining close to PEGASUS\-xsum from \[[7](https://arxiv.org/html/2606.05494#bib.bib14)\] \(45\.11%\) and BRIO \(46\.00%\)\. Although text\-davinci\-003 reports a higher average score of 53\.75%, the proposed framework demonstrates stronger consistency across all three evaluation metrics without relying on a large model\.

TABLE IV: Performance comparison between fine\-tuned models and MASF

![Refer to caption](https://arxiv.org/html/2606.05494v1/fine_chart.png)

Figure 3: Visual comparison between fine\-tuned models and MASF

Overall, the results confirm that the proposed MASF provides consistently strong summarization performance, with the most notable gains observed after fine\-tuning\. The framework not only surpasses the individual constituent models but also remains highly competitive with recent approaches, including Large Language Models \(LLMs\), demonstrating its effectiveness in generating robust and high\-quality abstractive summaries\.

## VI Conclusion

This study presented a multi\-model adaptive summarization framework designed to improve the quality and robustness of news text summarization on the CNN/DailyMail dataset\. Unlike traditional approaches that rely on a single summarization model, the proposed framework integrates multiple transformer\-based models to generate several candidate summaries for each input article\. An automatic evaluation mechanism based on ROUGE\-L, BLEU, and BERTScore is then used to enable the system to select the most informative summary through an adaptive selection strategy\.

Experimental results demonstrate that the proposed framework consistently outperforms the individual baseline and fine\-tuned summarization models in terms of overall summarization quality\. By leveraging the complementary strengths of different transformer architectures, the framework produces more reliable and coherent summaries across diverse news articles\. The adaptive selection mechanism enables the system to dynamically identify the most suitable candidate summary for each input article, thereby improving the robustness and consistency of the generated outputs compared with single\-model approaches\. Furthermore, when compared with recent related works, the proposed framework achieves superior performance over several competitive large models, including GPT3\-D2 and T0, and remains highly competitive with strong approaches such as PEGASUS\-xsum and BRIO\. In particular, it achieves the highest BERTScore among all compared methods, highlighting its strong semantic alignment and contextual fidelity in generated summaries\.

Future work may explore the integration of more advanced evaluation metrics, including reference\-free and human\-aligned evaluation methods, to improve the reliability of the selection mechanism\. Furthermore, incorporating additional large language models and exploring lightweight model adaptation techniques could further enhance the scalability and effectiveness of the proposed framework for large\-scale summarization tasks\.

## References

- \[1\]E\. Aloraini, A\. Hamdi, E\. Elmahjub,et al\.SummFactScore: a claim\-centric framework forreference\-free factual consistency evaluation inlong\-document summarization\.Ali and Elmahjub, Ezieddin, SummFactScore: A Claim\-Centric Framework forReference\-Free Factual Consistency Evaluation inLong\-Document Summarization\.Cited by:[§II](https://arxiv.org/html/2606.05494#S2.p3.1)\.
- \[2\]E\. Aloraini, H\. Kassab, A\. Hamdi, and K\. Shaban\(2025\)LexiSem: a re\-ranker balancing lexical and semantic quality for enhanced abstractive summarization\.Neurocomputing650,pp\. 130816\.External Links:ISSN 0925\-2312,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.neucom.2025.130816)Cited by:[§II](https://arxiv.org/html/2606.05494#S2.p1.1)\.
- \[3\]L\. Basyal and M\. Sanghvi\(2023\)Text summarization using large language models: a comparative study of mpt\-7b\-instruct, falcon\-7b\-instruct, and openai chat\-gpt models\.External Links:2310\.10449,[Link](https://arxiv.org/abs/2310.10449)Cited by:[§I](https://arxiv.org/html/2606.05494#S1.p2.1),[§I](https://arxiv.org/html/2606.05494#S1.p4.1),[TABLE II](https://arxiv.org/html/2606.05494#S4.T2.1.1.2),[TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.12.9.1),[TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.2.2)\.
- \[4\]M\. Burukanli and D\. Ari\(2025\-12\)DAMB: a dynamic adaptive multi\-model benchmarking framework for abstractive text summarization\.pp\.\.Cited by:[TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.13.10.1)\.
- \[5\]W\. S\. El\-Kassas, C\. R\. Salama, A\. A\. Rafea, and H\. K\. Mohamed\(2021\)Automatic text summarization: a comprehensive survey\.Expert systems with applications165,pp\. 113679\.Cited by:[§II](https://arxiv.org/html/2606.05494#S2.p1.1)\.
- \[6\]M\. Elewa, A\. Hamdi, H\. Kassab, and K\. Shaban\(2025\)Balancing factual consistency and diversity in abstractive summarization via model\-agnostic composite reranking\.In2025 IEEE/ACS 22nd International Conference on Computer Systems and Applications \(AICCSA\),pp\. 1–8\.Cited by:[§II](https://arxiv.org/html/2606.05494#S2.p1.1)\.
- \[7\] T\. Goyal, J\. J\. Li, and G\. Durrett \(2023\) News summarization and evaluation in the era of GPT\-3\. External Links: 2209\.12356, [Link](https://arxiv.org/abs/2209.12356) Cited by: [TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.10.7.1), [TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.11.8.1), [TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.8.5.1), [TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.9.6.1), [§V](https://arxiv.org/html/2606.05494#S5.p4.1)\.
- \[8\] A\. Hamdi, H\. Kassab, M\. Bahaa, and M\. Mohamed \(2024\) Riro: reshaping inputs, refining outputs unlocking the potential of large language models in data\-scarce contexts\. In The International Conference of Advanced Computing and Informatics, pp\. 69–79\. Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p1.1)\.
- \[9\] M\. Jain and H\. Rastogi \(2020\) Automatic text summarization using soft\-cosine similarity and centrality measures\. In 2020 4th International Conference on Electronics, Communication and Aerospace Technology \(ICECA\), pp\. 1021–1028\. External Links: [Document](https://dx.doi.org/10.1109/ICECA49313.2020.9297583) Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p2.1)\.
- \[10\] A\. Joshi, E\. Fidalgo, E\. Alegre, and R\. Alaiz\-Rodriguez \(2022\) RankSum—an unsupervised extractive text summarization based on rank fusion\. Expert Systems with Applications 200, pp\. 116846\. Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p2.1)\.
- \[11\] A\. Kovačević and D\. Kečo \(2021\) Bidirectional LSTM networks for abstractive text summarization\. In International Symposium on Innovative and Interdisciplinary Applications of Advanced Technologies, pp\. 281–293\. Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p3.1)\.
- \[12\] M\. T\. R\. Laskar, E\. Hoque, and J\. X\. Huang \(2022\) Domain adaptation with pre\-trained transformers for query\-focused abstractive text summarization\. Computational Linguistics 48 \(2\), pp\. 279–320\. Cited by: [§I](https://arxiv.org/html/2606.05494#S1.p2.1), [§II](https://arxiv.org/html/2606.05494#S2.p4.1)\.
- \[13\] D\. Liu and V\. Demberg \(2023\) ChatGPT vs human\-authored text: insights into controllable text summarization and sentence style transfer\. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 4: Student Research Workshop\), pp\. 1–18\. Cited by: [§I](https://arxiv.org/html/2606.05494#S1.p1.1), [§I](https://arxiv.org/html/2606.05494#S1.p3.1), [§II](https://arxiv.org/html/2606.05494#S2.p4.1)\.
- \[14\] A\. N\. Vora, R\. M\. Jain, A\. S\. Shah, and S\. Sonawane \(2024\-12\) Extractive summarization using extended TextRank algorithm\. In Proceedings of the 21st International Conference on Natural Language Processing \(ICON\), S\. Lalitha Devi and K\. Arora \(Eds\.\), AU\-KBC Research Centre, Chennai, India, pp\. 462–471\. Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p2.1)\.
- \[15\] K\. Rani Krishna, K\. Somasundaram, P\. Arulmozhivarman, S\. A\. Immanuel, and E\. Rajkumar \(2025\) Deep learning for text summarization using NLP for automated news digest\. Scientific Reports 15 \(1\), pp\. 36343\. Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p4.1), [§II](https://arxiv.org/html/2606.05494#S2.p5.1), [TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.4.1.1), [TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.5.2.1), [TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.6.3.1), [TABLE II](https://arxiv.org/html/2606.05494#S4.T2.2.7.4.1)\.
- \[16\] V\. Rennard, G\. Shang, J\. Hunter, and M\. Vazirgiannis \(2023\-07\) Abstractive meeting summarization: a survey\. Transactions of the Association for Computational Linguistics 11, pp\. 861–884\. External Links: ISSN 2307\-387X, [Document](https://dx.doi.org/10.1162/tacl%5Fa%5F00578) Cited by: [§I](https://arxiv.org/html/2606.05494#S1.p3.1), [§II](https://arxiv.org/html/2606.05494#S2.p4.1)\.
- \[17\] M\. F\. Salchner and A\. Jatowt \(2022\-10\) A survey of automatic text summarization using graph neural networks\. In Proceedings of the 29th International Conference on Computational Linguistics, N\. Calzolari, C\. Huang, H\. Kim, J\. Pustejovsky, L\. Wanner, K\. Choi, P\. Ryu, H\. Chen, L\. Donatelli, H\. Ji, S\. Kurohashi, P\. Paggio, N\. Xue, S\. Kim, Y\. Hahm, Z\. He, T\. K\. Lee, E\. Santus, F\. Bond, and S\. Na \(Eds\.\), Gyeongju, Republic of Korea, pp\. 6139–6150\. External Links: [Link](https://aclanthology.org/2022.coling-1.536/) Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p1.1)\.
- \[18\] A\. Scherbakov, L\. Whittle, R\. Kumar, S\. Singh, M\. Coleman, and E\. Vylomova \(2021\-06\) Anlirika: an LSTM–CNN flow twister for spoken language identification\. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, E\. Vylomova, E\. Salesky, S\. Mielke, G\. Lapesa, R\. Kumar, H\. Hammarström, I\. Vulić, A\. Korhonen, R\. Reichart, E\. M\. Ponti, and R\. Cotterell \(Eds\.\), Online, pp\. 145–148\. External Links: [Document](https://dx.doi.org/10.18653/v1/2021.sigtyp-1.14) Cited by: [§I](https://arxiv.org/html/2606.05494#S1.p3.1)\.
- \[19\] H\. Shakil, A\. Farooq, and J\. Kalita \(2024\) Abstractive text summarization: state of the art, challenges, and improvements\. Neurocomputing 603, pp\. 128255\. External Links: ISSN 0925\-2312, [Document](https://dx.doi.org/10.1016/j.neucom.2024.128255) Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p1.1)\.
- \[20\] R\. Srivastava, P\. Singh, K\. Rana, and V\. Kumar \(2022\) A topic modeled unsupervised approach to single document extractive text summarization\. Knowledge\-Based Systems 246, pp\. 108636\. Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p2.1)\.
- \[21\] D\. Van Veen, C\. Van Uden, L\. Blankemeier, J\. Delbrouck, A\. Aali, C\. Bluethgen, A\. Pareek, M\. Polacin, E\. P\. Reis, A\. Seehofnerová, et al\. \(2024\) Adapted large language models can outperform medical experts in clinical text summarization\. Nature Medicine 30 \(4\), pp\. 1134–1142\. Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p5.1)\.
- \[22\] J\. Wang, F\. Meng, D\. Zheng, Y\. Liang, Z\. Li, J\. Qu, and J\. Zhou \(2022\) A survey on cross\-lingual summarization\. Transactions of the Association for Computational Linguistics 10, pp\. 1304–1323\. Cited by: [§III](https://arxiv.org/html/2606.05494#S3.p1.1)\.
- \[23\] J\. Xu, Z\. Gan, Y\. Cheng, and J\. Liu \(2020\-07\) Discourse\-aware neural extractive text summarization\. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D\. Jurafsky, J\. Chai, N\. Schluter, and J\. Tetreault \(Eds\.\), Online, pp\. 5021–5031\. External Links: [Link](https://aclanthology.org/2020.acl-main.451/), [Document](https://dx.doi.org/10.18653/v1/2020.acl-main.451) Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p3.1)\.
- \[24\] Y\. Zhang, H\. Jin, D\. Meng, J\. Wang, and J\. Tan \(2025\) A comprehensive survey on automatic text summarization with exploration of LLM\-based methods\. Neurocomputing, pp\. 131928\. Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p1.1)\.
- \[25\] Y\. Zhang, A\. Ni, Z\. Mao, C\. H\. Wu, C\. Zhu, B\. Deb, A\. Awadallah, D\. Radev, and R\. Zhang \(2022\) Summn: a multi\-stage summarization framework for long input dialogues and documents\. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), pp\. 1592–1604\. Cited by: [§I](https://arxiv.org/html/2606.05494#S1.p1.1), [§I](https://arxiv.org/html/2606.05494#S1.p2.1), [§II](https://arxiv.org/html/2606.05494#S2.p1.1)\.
- \[26\] M\. Zhong, P\. Liu, Y\. Chen, D\. Wang, X\. Qiu, and X\. Huang \(2020\-07\) Extractive summarization as text matching\. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D\. Jurafsky, J\. Chai, N\. Schluter, and J\. Tetreault \(Eds\.\), Online, pp\. 6197–6208\. External Links: [Link](https://aclanthology.org/2020.acl-main.552/), [Document](https://dx.doi.org/10.18653/v1/2020.acl-main.552) Cited by: [§II](https://arxiv.org/html/2606.05494#S2.p4.1)\.
