Headlines You Won't Forget: Can Pronoun Insertion Increase Memorability?

arXiv cs.CL Papers

Summary

Experimental study shows inserting first/second-person pronouns into headlines has mixed effects on human memorability and that LLMs often produce inaccurate or unnatural revisions.

arXiv:2604.19189v1 Announce Type: new Abstract: For news headlines to influence beliefs and drive action, relevant information needs to be retained and retrievable from memory. In this probing study we draw on experiment designs from cognitive psychology to examine how a specific linguistic feature, namely direct address through first- and second-person pronouns, affects memorability and to what extent it is feasible to use large language models for the targeted insertion of such a feature into existing text without changing its core meaning. Across three controlled memorization experiments with a total of 240 participants, yielding 7,680 unique memory judgments, we show that pronoun insertion has mixed effects on memorability. Exploratory analyses indicate that effects differ based on headline topic, how pronouns are inserted and their immediate contexts. Additional data and fine-grained analysis is needed to draw definitive conclusions on these mediating factors. We further show that automatic revisions by LLMs are not always appropriate: Crowdsourced evaluations find many of them to be lacking in content accuracy and emotion retention or resulting in unnatural writing style. We make our collected data available for future work.

Cached at: 04/22/26, 08:30 AM

# Can Pronoun Insertion Increase Memorability?
Source: [https://arxiv.org/html/2604.19189](https://arxiv.org/html/2604.19189)
## Headlines You Won’t Forget: Can Pronoun Insertion Increase Memorability?

###### Abstract

For news headlines to influence beliefs and drive action, relevant information needs to be retained and retrievable from memory\. In this probing study we draw on experiment designs from cognitive psychology to examine how a specific linguistic feature, namely direct address through first\- and second\-person pronouns, affects memorability and to what extent it is feasible to use large language models for the targeted insertion of such a feature into existing text without changing its core meaning\. Across three controlled memorization experiments with a total of 240 participants, yielding 7,680 unique memory judgments, we show that pronoun insertion has mixed effects on memorability\. Exploratory analyses indicate that effects differ based on headline topic, how pronouns are inserted and their immediate contexts\. Additional data and fine\-grained analysis are needed to draw definitive conclusions on these mediating factors\. We further show that automatic revisions by LLMs are not always appropriate: Crowdsourced evaluations find many of them to be lacking in content accuracy and emotion retention or resulting in unnatural writing style\. We make our collected [data](https://zenodo.org/records/19254945) available for future work\.

Keywords: News Memorability, LLM\-based Text Editing, Cognitive Psychology


## 1\. Introduction

News research in NLP is often related to boosting engagement of news articles, as formalized by behavioural data such as article dwell time Davoudi et al\. \([2019](https://arxiv.org/html/2604.19189#bib.bib7)\), likes, retweets, quotes, or replies Gopalakrishna Pillai et al\. \([2025](https://arxiv.org/html/2604.19189#bib.bib8)\); Park et al\. \([2021](https://arxiv.org/html/2604.19189#bib.bib2)\)\. This includes work on generating or editing suitable headlines or social media posts and mitigating the impact of misinformation Srba et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib9)\)\.

However, far less is known about how users process and retain news content, an equally critical factor in shaping belief and behaviour\. Memorability plays a key role here: what users remember can influence what they believe and share\. This is especially relevant in the age of generative AI, which has the potential to accelerate the production and spread of persuasive, yet misleading content Spitale et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib1)\); Bashardoust et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib3)\); Garry et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib4)\)\. Research in cognitive psychology, particularly on the illusory truth effect, suggests that repetition alone can enhance perceived truthfulness and increase the likelihood of information being shared Pennycook et al\. \([2018](https://arxiv.org/html/2604.19189#bib.bib5)\); Vellani et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib6)\), highlighting the importance of understanding other factors, such as linguistic characteristics, that can shape memory\.

![Refer to caption](https://arxiv.org/html/2604.19189v1/images/ARR_figure.png)

Figure 1: An overview of our experiment design\. We ask humans and LLMs to insert first and second person pronouns in pre\-existing news headlines\. Participants are then shown headlines for a short time with the goal of memorizing them\. In the examples shown, pronoun insertion considerably boosted recognition and recall\.

While psychological drivers of belief in fake news have received considerable attention in cognitive psychology \(e\.g\. see Pennycook and Rand, [2021](https://arxiv.org/html/2604.19189#bib.bib10)\), linguistic aspects that drive memorability of true or reputable news have largely been overlooked \(cf\. §[2](https://arxiv.org/html/2604.19189#S2)\)\. In this paper, we address this gap through a preliminary set of experiments that focus on the memorability of news headlines\. In the course of this, we also explore LLMs’ capabilities to manipulate news headlines, in the form of directly addressing readers, to make them more memorable\. Our experiments test whether minor, targeted edits affect memory in terms of recognizing and recalling headlines, and whether LLMs can reliably implement relevant edits without distorting the original meaning \(§[3](https://arxiv.org/html/2604.19189#S3)\)\. Our results show that pronoun insertion has mixed effects on memorability and LLM revisions are not fully reliable \(§[4](https://arxiv.org/html/2604.19189#S4)\)\.

In summary, our contributions are two\-fold: we test LLMs on a linguistically motivated paraphrasing task and we measure downstream effects in memorization studies using experimental methods from cognitive psychology\.

## 2\. Related Work

Our work is related to text style transfer in that we manipulate one dimension of a text while preserving its core meaning Mukherjee et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib11)\), but differs in that we focus on a targeted manipulation rather than changing the overall style of a piece of text\. Prior research has shown that despite LLMs’ generally impressive capabilities across many NLP tasks Wei et al\. \([2022](https://arxiv.org/html/2604.19189#bib.bib12)\), even large LLMs often fail at simple tasks on which humans achieve perfect performance, such as writing sentences that contain a specific word, word unscrambling, or sentence editing Efrat et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib13)\); Zhang and He \([2024](https://arxiv.org/html/2604.19189#bib.bib14)\)\. Fine\-tuned models, even small ones, have been shown to outperform much larger base models on narrow text editing tasks, such as grammar correction Raheja et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib15)\), whereas zero\- and few\-shot prompting has been shown to lead to inconsistent performance in text style transfer tasks, including language detoxification and sentiment transfer Mukherjee et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib11)\), highlighting the continued importance of training data in such tasks\.

Specifically focusing on news rewriting and headline generation, Gopalakrishna Pillai et al\. \([2025](https://arxiv.org/html/2604.19189#bib.bib8)\) explore different prompting strategies to rewrite news tweets to be more formal, casual, or factual, with the aim of increasing predicted engagement; Ao et al\. \([2021](https://arxiv.org/html/2604.19189#bib.bib16)\) introduce a dataset of personalized headlines based on user preferences and candidate articles; and Chen et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib17)\) work on methods that leverage clickbaiting techniques while keeping content faithful, in order to increase reading interest and promote real information\.

Beyond NLP\-focused work, our approach is informed by findings from psychology, psycholinguistics and marketing showing that direct address and pronoun choice can influence memory even when propositional content is unchanged\. For instance, Symons and Johnson \([1997](https://arxiv.org/html/2604.19189#bib.bib18)\) discuss how information framed in relation to the self is more memorable, Brunyé et al\. \([2011](https://arxiv.org/html/2604.19189#bib.bib19)\) show that second\-person constructions induce stronger reader involvement, and Cruz et al\. \([2017](https://arxiv.org/html/2604.19189#bib.bib20)\) suggest a robust effect of second person pronouns on consumer outcomes\. Related to news memorability, Lutz et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib21)\) previously found different linguistic cues to affect cognitive and affective processing and Peña et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib22)\) show that tweet\-style texts are generally more memorable than news headlines\. Also related to our work are studies by Clark et al\. \([2026](https://arxiv.org/html/2604.19189#bib.bib23)\) and others on sentence recognition, which however do not take into account recall \(i\.e\., the accessibility in memory in the absence of any retrieval cues\)\. In contrast to this, we follow a common approach in cognitive psychology that provides a broader picture on memory by including measures for both recognition and recall \(e\.g\. MacLeod and Kampe, [1996](https://arxiv.org/html/2604.19189#bib.bib36); Unsworth and Brewer, [2009](https://arxiv.org/html/2604.19189#bib.bib37)\)\.

## 3\. Methods

We performed a linguistic analysis on [Peña et al\.](https://arxiv.org/html/2604.19189#bib.bib22)’s data\. The findings suggest that personal pronouns help distinguish highly memorable content from less memorable items\. To test whether this effect holds with headlines alone, we conducted a pilot study using topic\-balanced headlines with and without pronouns\. The results indicated that headlines with first and second person pronouns tend to be more memorable \(see Appendix [A](https://arxiv.org/html/2604.19189#A1) for details on our analysis of [Peña et al\.](https://arxiv.org/html/2604.19189#bib.bib22)’s data and the pilot study\)\. Building on this insight, we explore the capabilities of a range of LLMs to insert such pronouns into real news headlines, without changing the content of the original headline or resulting in an unnatural writing style\. After verifying the quality of the manipulated headlines, we conduct between\-subject user studies to identify the effect of this specific linguistic change on memorability, in the absence of other discriminating factors\. Overall, we run three memory studies, each informed by results of the preceding study\.

### 3\.1\. Memory Studies

Our memory study design is based on established study structures from the field of cognitive psychology \(e\.g\. see Peña et al\., [2023](https://arxiv.org/html/2604.19189#bib.bib22); Abel and Bäuml, [2023](https://arxiv.org/html/2604.19189#bib.bib24)\) and consists of five phases:

- Presentation Phase\. After reading and agreeing to the informed consent form, participants view a fixed number of news headlines for 10 seconds each in random order, with no additional content shown\. They are instructed to memorize them\.
- Distraction Phase\. Participants view and react to unrelated images for 60 seconds to reduce potential recency effects\.
- Recall Phase\. Participants freely recall and write down as many headlines as possible, aiming for exact wording\. They are encouraged to spend at least 5 minutes on this task\. If participants try to move to the next phase early, the system prompts them to take more time up to two times\. After that, they may proceed even if fewer than 5 minutes have passed\.
- Recognition Phase\. All headlines of the presentation phase plus an equal number of unseen distractor headlines are shown in random order\. For each headline, participants are asked to indicate whether they have seen the headline in the presentation phase\.
- Truth Judgement Phase\. In addition to recognition and recall, we also measure perceived truthfulness\. To this end, all headlines shown in the recognition phase are presented again in random order\. Participants indicate how false or true they personally believe the headline to be on a 7\-point Likert scale ranging from “definitely false” to “definitely true”\.

| Headline | Version | Recognition Rate \(%\) | Recall Rate \(%\) |
| --- | --- | --- | --- |
| *LLM revisions which increased likelihood of recall and recognition* | | | |
| As the World Warms, Extreme Rain Is Becoming Even More Extreme | Original | 73\.34 | 26\.67 |
| Are You Prepared for the Dramatic Increase in Extreme Rain as Earth Warms? | LLM Revision | 86\.67 | 46\.67 |
| As Our World Warms, Extreme Rain is Becoming Even More Extreme | Human Revision | 76\.67 | 33\.34 |
| Study finds no link between aluminum in vaccines and autism, asthma | Original | 76\.67 | 43\.34 |
| Autism and Asthma: How a New Study Confirms No Connection to Aluminum in Your Vaccines | LLM Revision | 90\.00 | 43\.34 |
| Insurers Are Deserting Homeowners as Climate Shocks Worsen | Original | 80\.00 | 20\.00 |
| Is Your Home Insurance at Risk as Climate Shocks Intensify? | LLM Revision | 86\.67 | 30\.00 |
| *LLM revisions which decreased likelihood of recall and recognition* | | | |
| From Food Aid to Dog Chow? How Trump’s Cuts Hurt Kansas Farmers\. | Original | 83\.34 | 36\.67 |
| Your Kansas Farmers Are Suffering: Trump’s Cuts Lead from Food Aid to Dog Chow | LLM Revision | 63\.34 | 26\.67 |
| From Food Aid to Dog Chow? How Trump’s Cuts Hurt Our Kansas Farmers\. | Human Revision | 96\.67 | 36\.67 |
| Téa Leoni and Tim Daly Marry in Intimate New York Wedding | Original | 83\.34 | 46\.67 |
| Your Inside Look at Téa Leoni and Tim Daly’s Intimate New York Wedding | LLM Revision | 80\.00 | 33\.34 |
| Kennedy Family Reunites for Massive Fourth of July Celebration | Original | 90\.00 | 36\.67 |
| You Won’t Believe the Kennedy Family’s Massive Fourth of July Reunion\! | LLM Revision | 83\.34 | 36\.67 |

Table 1: Examples of headlines and LLM revisions with their recognition and recall rates\. Where available, equivalent human revisions are included for comparison\.

#### Study Material and Procedure

While all three studies share the same underlying design, they differ in study material, group assignments, and number of participants\. We describe each study separately below\. Across all experiments, each participant group has 30 participants, resulting in a total of 240 participants\. The corresponding headline revision procedures are described in §[3\.2](https://arxiv.org/html/2604.19189#S3.SS2)\.
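The reported totals can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the study pipeline: the group counts follow the descriptions of Studies I to III, while the figure of 32 judged items per participant (16 presented headlines plus 16 distractors) is stated for Study I and is assumed here to carry over to the other two studies.

```python
# Group counts per study, as described for Studies I-III.
GROUPS_PER_STUDY = {"Study I": 2, "Study II": 2, "Study III": 4}
PARTICIPANTS_PER_GROUP = 30

# Assumption: each participant judges 16 presented headlines plus
# 16 distractors in the recognition phase.
ITEMS_PER_PARTICIPANT = 16 + 16


def total_participants(groups_per_study, per_group):
    """Sum the groups over all studies and multiply by the group size."""
    return sum(groups_per_study.values()) * per_group


def total_judgments(n_participants, items_per_participant):
    """One judgment per participant and item."""
    return n_participants * items_per_participant


n_participants = total_participants(GROUPS_PER_STUDY, PARTICIPANTS_PER_GROUP)
n_judgments = total_judgments(n_participants, ITEMS_PER_PARTICIPANT)
print(n_participants, n_judgments)  # 240 7680
```

Under these assumptions the numbers match the 240 participants and 7,680 judgments reported in the abstract.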

#### Study I

We collect 32 headlines from 32 major news outlets, with eight headlines for each of four topics: entertainment, politics, environment, and health, excluding those that originally include pronouns\. Using various LLMs, we insert at least one first\- or second\-person pronoun into 16 headlines, collecting quality judgements by 8 annotators for each revision\. We only include LLM revisions judged as both accurate and appropriate by at least 62\.5% of annotators\.
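Both the exclusion of headlines that already contain pronouns and the check that a revision contains at least one inserted pronoun can be approximated with a simple token lookup. The following is a minimal sketch under our own assumptions: the pronoun inventory and the function name `find_pronouns` are hypothetical (the paper does not list the exact forms it counted), and the all-caps token `US` is skipped so that the country abbreviation is not mistaken for the pronoun "us".

```python
import re

# Hypothetical inventory of first- and second-person forms; the paper
# does not specify which forms were counted.
FIRST_SECOND_PERSON = {
    "i", "me", "my", "myself",
    "we", "us", "our", "ours", "ourselves",
    "you", "your", "yours", "yourself", "yourselves",
}


def find_pronouns(headline):
    """Return first-/second-person pronouns in order of appearance."""
    tokens = re.findall(r"[A-Za-z]+", headline)
    return [
        token.lower()
        for token in tokens
        # Skip the all-caps abbreviation "US" (heuristic, not exhaustive).
        if token != "US" and token.lower() in FIRST_SECOND_PERSON
    ]


original = "Insurers Are Deserting Homeowners as Climate Shocks Worsen"
revision = "Is Your Home Insurance at Risk as Climate Shocks Intensify?"
print(find_pronouns(original))  # []
print(find_pronouns(revision))  # ['your']
```

A lookup like this only flags surface forms. Whether an insertion is accurate and natural still requires human judgment, which is what the annotation step provides.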

Participants are randomly assigned to one of two groups\. In the presentation phase, each group sees 16 headlines, balanced across topics: 8 original and 8 LLM\-revised with pronouns inserted\. The assignment is counterbalanced: Group A sees one set revised and the other original, while Group B sees the reverse\. In the recognition phase, participants view the 16 headlines they have previously seen plus 16 held\-out new ones which serve as distractor items \(identical across groups\)\. Of these new headlines, 7 are LLM\-revised, ensuring that revised items are not recognized simply due to being the only ones containing pronouns\. This design allows us to isolate the effect of pronoun use on memorability while controlling for content\.
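The counterbalancing described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the headline identifiers, the function names, and the random split within each topic are assumptions. Only the constraints come from the text (16 presented headlines, balanced across four topics, with each headline shown as a revision to one group and as the original to the other).

```python
import random

TOPICS = ["entertainment", "politics", "environment", "health"]


def split_balanced(headline_ids_by_topic, seed=0):
    """Split each topic's headlines evenly into two sets."""
    rng = random.Random(seed)
    set_1, set_2 = [], []
    for topic in TOPICS:
        ids = list(headline_ids_by_topic[topic])
        rng.shuffle(ids)
        half = len(ids) // 2
        set_1 += ids[:half]
        set_2 += ids[half:]
    return set_1, set_2


def presentation_lists(set_1, set_2):
    """Group A sees set 1 revised and set 2 original; Group B the reverse."""
    group_a = [(h, "revised") for h in set_1] + [(h, "original") for h in set_2]
    group_b = [(h, "original") for h in set_1] + [(h, "revised") for h in set_2]
    return {"A": group_a, "B": group_b}


# Hypothetical identifiers: 4 presented headlines per topic, 16 in total.
ids_by_topic = {topic: [f"{topic}-{i}" for i in range(4)] for topic in TOPICS}
set_1, set_2 = split_balanced(ids_by_topic)
lists = presentation_lists(set_1, set_2)
print(len(lists["A"]), sum(version == "revised" for _, version in lists["A"]))
```

Because every headline appears in both versions across the two groups, differences in recognition or recall between versions cannot be attributed to the content of particular headlines.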

#### Study II

Qualitative insights from study I results indicated that recognition improved when pronouns were organically integrated in the headline\. We hypothesized that humans might achieve this more naturally than LLMs, which often relied on the addition of sentence fragments and clickbaity phrasing \(see Table [1](https://arxiv.org/html/2604.19189#S3.T1) for examples\)\. To investigate this, we include human revisions in our study material: in study II, half of the headlines with pronouns presented in the presentation phase are revised by Prolific workers, while the other half are LLM\-revised\. Again, participants are randomly assigned to two counterbalanced groups\.

![Refer to caption](https://arxiv.org/html/2604.19189v1/images/scatterplot_experiment_topic.png)

Figure 2: Mean true positive rates of original and revised headlines by topic \(environment, entertainment, politics, health\) in study II\.

#### Study III

Based on results obtained in study II, which indicate strong differences between effects of pronoun insertion across headline topics \(see Figure[2](https://arxiv.org/html/2604.19189#S3.F2)\), we narrow down the selection of headlines to only one topic for study III\. The differences between original headlines and revisions seemed to be strongest for headlines related to politics in study II, leading us to collect 32 new headlines from this topic\. All 32 headlines are paired with a revision with pronouns inserted\. We only use human revisions for this study and all revisions are written by the same person\.

Participants are assigned to four groups with counterbalanced headline sets across presentation and recognition phases\. Groups 1 and 3 view opposite versions of original and revised headlines during presentation, while Groups 2 and 4 see the held\-out distractor sets from Groups 1 and 3\. This design doubles the number of evaluated headlines and tests whether pronoun use increases false recognition of previously unseen headlines\.

### 3\.2\. Pronoun Insertion

#### Models

We use 8 LLMs of various sizes, including open\-weight and proprietary models, to introduce first or second person pronouns to the collected headlines: gpt\-4o\-mini\-2024\-07\-18, gpt\-4o\-2024\-08\-06, Llama\-3\.1\-8B\-Instruct, Mistral\-7B\-Instruct\-v0\.3, Mixtral\-8x7B\-Instruct\-v0\.1, Qwen3\-32B \(thinking mode enabled\), DeepSeek\-V3\-0324, and DeepSeek\-R1\-0528\. We set the temperature to 0\.3 to allow for a limited amount of creativity and pass the same prompt to each model \(see Appendix [B](https://arxiv.org/html/2604.19189#A2)\)\.

#### Human Evaluation

We collect annotations that assess the accuracy and stylistic appropriateness of revised headlines compared to the originals\. Based on the multidimensional quality metrics framework Lommel et al\. \([2013](https://arxiv.org/html/2604.19189#bib.bib25)\), we classify accuracy errors as misrepresentations, additions, or omissions, and style issues as grammar errors, awkwardness, or inconsistency \(see Appendix [C](https://arxiv.org/html/2604.19189#A3) for instructions provided to annotators\)\. 128 annotators recruited through Prolific review each original assigned to them alongside a revision and mark the revision as inaccurate or inappropriate only if at least one subcategory applies\. They can also note shifts in tone or emotion\. Before participating in annotation, annotators are required to pass a qualification test consisting of four original\-revision pairs\. Each original\-revision pair receives 8 annotations\. Annotators see 2–3 revisions per model and never see two revisions for the same headline\. To compute inter\-annotator agreement \(IAA\), we calculate the average raw agreement across annotators and annotation groups, as well as the mean of Krippendorff’s $\alpha$ across annotator groups\. Acceptance rates for accuracy, style, and emotion retention are the proportions of annotators who, respectively, rated the accuracy as acceptable, rated the style as acceptable, and did not report any shift in emotion or tone relative to the original headline\. We collect annotations for 232 original–revision pairs derived from 29 seed headlines, yielding 1,856 judgments\.
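To make the annotation measures concrete, the sketch below computes an acceptance rate and a raw agreement score for a single revision. It rests on assumptions: the paper does not spell out how raw agreement is aggregated, so pairwise agreement between annotators is used here as one common definition, and the eight labels are invented for illustration. The count check assumes that each of the 29 seed headlines was revised once by each of the 8 models, which is consistent with the reported 232 pairs.

```python
from itertools import combinations

# Reported design figures; "one revision per seed and model" is an assumption.
N_SEEDS, N_MODELS, N_ANNOTATIONS = 29, 8, 8
N_PAIRS = N_SEEDS * N_MODELS
N_JUDGMENTS = N_PAIRS * N_ANNOTATIONS


def acceptance_rate(labels):
    """Share of annotators who marked the revision as acceptable (True)."""
    return sum(labels) / len(labels)


def pairwise_agreement(labels):
    """Fraction of annotator pairs that gave the same label to one item."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical accuracy judgments from 8 annotators for one revision.
labels = [True, True, True, True, True, False, False, False]
print(N_PAIRS, N_JUDGMENTS)        # 232 1856
print(acceptance_rate(labels))     # 0.625
print(pairwise_agreement(labels))
```

Raw agreement does not correct for chance, which is why a chance-corrected coefficient such as Krippendorff's alpha is reported alongside it.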

#### Human Rewriting

In addition to generating LLM revisions, we ask 10 Prolific workers who work in journalism, copywriting, or creative writing, to revise the original headlines\. Participants are provided with instructions that are slightly modified from the prompt given to the LLMs \(see Appendix[D](https://arxiv.org/html/2604.19189#A4)\) and must pass a shortened version of the qualification test used in the LLM revision annotations in order to take part\. Each participant rewrites 15 headlines and may skip headlines they feel are not suitable for pronoun insertion\. We receive between 7 and 10 revisions per original headline\. If two or more participants insert a pronoun in a headline in the same way, it is included in study II \(8 headlines overall\)\.

For study III, revisions for all 32 original headlines are obtained from a graduate student enrolled at the authors’ university, who is a native English speaker\. The student received the same instructions as the Prolific workers, and revisions were checked for appropriateness and faithfulness to the original meaning by the first author of this work\.

## 4\. Results and Discussion

We begin by outlining results regarding different LLMs’ performances at the pronoun insertion task defined above as judged by human annotators\. After this, we elaborate on the results obtained across the three user studies and offer exploratory analyses to identify potential mediating effects between pronoun insertion and headline memorability\.

### 4\.1\. LLM Revisions

As a result of annotation, we observe IAA scores for accuracy, style and emotion retention of $\alpha = 0.19$ \(60\.32% raw agreement\), $\alpha = 0.08$ \(56\.64%\) and $\alpha = -0.03$ \(55\.33%\), respectively, indicating that shifts in emotion between original headlines and revisions are especially subjective or difficult to judge for human annotators\.

#### Differences between LLMs

Table 2: Mean acceptance rates in % per model\. Highest values per category are bolded, second highest values are italicized, lowest values are displayed in red\.

Headlines rewritten by DeepSeek\-reasoning have the highest acceptance rates across all three annotation categories, whereas Qwen displays the worst performance on accuracy, Llama on style, and GPT\-4o\-mini on emotion retention \(see Table [2](https://arxiv.org/html/2604.19189#S4.T2)\)\. While revisions by larger LLMs generally seem to garner higher acceptance rates, this advantage is not consistent\. For instance, the average accuracy acceptance rate for Mistral revisions is 11\.64 and 7\.32 percentage points higher than for Mixtral and DeepSeek\-chat, respectively, and only 1\.72 percentage points shy of GPT\-4o’s acceptance rate\. This mirrors existing research on LLMs’ failures at solving simple text editing tasks out\-of\-the\-box Efrat et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib13)\); Zhang and He \([2024](https://arxiv.org/html/2604.19189#bib.bib14)\)\.

For the memorization experiment, we select LLM revisions with minimum acceptance rates of 62\.5% \(corresponding to at least 5 out of 8 annotators\) for style and accuracy\. Mean acceptance rates of selected revisions across experiments lie at 81\.9% for style, 80\.6% for accuracy and 65\.13% for emotion retention\.
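The selection rule can be written down directly. The sketch below is a hedged illustration: the record layout and the function name `select_revisions` are hypothetical, and the threshold is applied to accuracy and style separately, which is how we read the description above.

```python
def select_revisions(candidates, threshold=0.625):
    """Keep revisions whose accuracy and style acceptance rates both
    reach the threshold (0.625 corresponds to 5 of 8 annotators)."""
    selected = []
    for cand in candidates:
        accuracy_rate = cand["accuracy_ok"] / cand["n_annotators"]
        style_rate = cand["style_ok"] / cand["n_annotators"]
        if accuracy_rate >= threshold and style_rate >= threshold:
            selected.append(cand["id"])
    return selected


# Hypothetical vote counts for three candidate revisions.
candidates = [
    {"id": "rev-1", "accuracy_ok": 5, "style_ok": 5, "n_annotators": 8},
    {"id": "rev-2", "accuracy_ok": 4, "style_ok": 8, "n_annotators": 8},
    {"id": "rev-3", "accuracy_ok": 8, "style_ok": 6, "n_annotators": 8},
]
print(select_revisions(candidates))  # ['rev-1', 'rev-3']
```

Emotion retention is not part of the filter in this sketch, in line with the text, which names only style and accuracy as selection criteria.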

Table 3: Examples of commonly found error types in LLM revisions of news headlines\. Indicators for each error category are italicized\.

#### Common Errors in LLM Revisions

We qualitatively examine the LLM revisions with style and accuracy acceptance rates of 50% or less to identify common error types\. Examples for each identified error type are presented in Table[3](https://arxiv.org/html/2604.19189#S4.T3)\. Revisions with low acceptance rates commonly include forms of inappropriate role attribution, which incorrectly frame the reader or writer as an active participant in the headline content\. Other common error types include the addition of hallucinated details not present in the original headline, the insertion of evaluative statements, which introduce an author stance not grounded in the original, and the omission of details from the original\. In some cases, combinations of multiple error types are displayed at the same time\.

### 4\.2\. Memory Experiment

As evaluation measures, we compute the true positive rate for each presented headline and the false positive rate for each distractor based on user inputs collected in the recognition phase\. We additionally calculate recall rate as the frequency of a headline’s appearance in free recall divided by its presentation frequency\.
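These three measures can be computed per headline from the raw responses. The sketch below assumes a simple record format that is not taken from the paper: recognition responses as `(participant, headline, presented, said_seen)` tuples and manually matched free-recall results as a set of `(participant, headline)` pairs.

```python
from collections import defaultdict


def per_headline_rates(recognition, recalls):
    """Compute true positive, false positive, and recall rates per headline.

    recognition: iterable of (participant, headline, presented, said_seen)
    recalls: set of (participant, headline) pairs matched in free recall
    """
    shown, hits = defaultdict(int), defaultdict(int)
    unseen, alarms = defaultdict(int), defaultdict(int)
    recalled = defaultdict(int)
    for participant, headline, presented, said_seen in recognition:
        if presented:
            shown[headline] += 1
            hits[headline] += said_seen
            recalled[headline] += (participant, headline) in recalls
        else:
            unseen[headline] += 1
            alarms[headline] += said_seen
    rates = {}
    for h in set(shown) | set(unseen):
        rates[h] = {
            # Rates are None where a headline never occurred in that role.
            "tpr": hits[h] / shown[h] if shown[h] else None,
            "fpr": alarms[h] / unseen[h] if unseen[h] else None,
            "recall": recalled[h] / shown[h] if shown[h] else None,
        }
    return rates


# Hypothetical responses: headline h1 was presented, d1 is a distractor.
recognition = [
    ("p1", "h1", True, True), ("p2", "h1", True, False),
    ("p3", "h1", True, True), ("p1", "d1", False, True),
    ("p2", "d1", False, False),
]
rates = per_headline_rates(recognition, {("p1", "h1")})
print(rates["h1"], rates["d1"])
```

With 30 participants per group, a rate of 73.34% as reported in Table 1 corresponds to 22 of 30 participants.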

#### Recall Matching

Recalled items are manually matched to their corresponding headlines and ambiguous cases \(e\.g\., items matching multiple headlines\) are not counted\. For instance, all items containing the word “NASA” and no reference to another headline were matched to the headline “NASA Website Will Not Provide Previous National Climate Reports” or its revision, depending on experimental group\. We provide some examples for recalled items and their respective original headlines in Table [5](https://arxiv.org/html/2604.19189#S4.T5)\. This lenient matching resulted in some items displaying a considerable amount of distortion or lack of detail compared to the headlines seen by participants in the presentation phase\. To account for this, we measure recall distortion using the mean cosine similarity, based on S\-BERT embeddings Reimers and Gurevych \([2019](https://arxiv.org/html/2604.19189#bib.bib35)\), between recalled items and their respective ground truth headline\. High similarity with the original means that participants remembered a headline in detail and correctly, whereas low similarity is an indicator for a participant remembering merely the gist of a headline or even misremembering the content of the headline\.
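The distortion measure is a mean of cosine similarities between each recalled item and its target headline. The sketch below shows that computation only. To stay self-contained it swaps in a bag-of-words count vector as a stand-in embedding, which is an assumption made purely for illustration and is not S-BERT; in practice the `embed` argument would be replaced by a sentence encoder, and the resulting values would differ.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Stand-in embedding: lowercase word counts (NOT S-BERT)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_recall_similarity(headline, recalled_items, embed=bow_vector):
    """Average similarity of all recalled items to their target headline."""
    target = embed(headline)
    sims = [cosine(target, embed(item)) for item in recalled_items]
    return sum(sims) / len(sims) if sims else float("nan")


headline = "NASA Website Will Not Provide Previous National Climate Reports"
# Hypothetical recalled items: one verbatim, one reduced to a single word.
recalled = [headline, "NASA"]
print(round(mean_recall_similarity(headline, recalled), 3))
```

A verbatim recall scores 1.0, while an item that reproduces only a keyword scores much lower, which is the contrast between detailed and gist-level memory that the measure is meant to capture.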

Table 4: Mean rates of true positive \(TP\) hits and false alarms \(FP\) for recognition and average recall rates\.

Table 5: Examples for items recalled in the free recall phase and their cosine similarity with the corresponding headlines\. While some recalled items are mostly faithful to the original headline shown during the presentation phase, others merely reproduce individual words or a general gist\.

Table 6: Results of statistical tests for main measures across the three studies\.

#### Effects of Pronoun Insertion on Memorability

Results for all three studies are summarized in Table [4](https://arxiv.org/html/2604.19189#S4.T4)\. We run significance tests for the main measures on each study separately\. We use two\-tailed independent t\-tests for normally distributed data and Mann\-Whitney U tests where parametric assumptions are violated \(see Table [6](https://arxiv.org/html/2604.19189#S4.T6)\)\. Overall, the effects of pronoun insertion on the evaluation measures considered across the three studies are not significant, indicating that pronouns alone do not systematically affect news headline memorability\.
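For readers unfamiliar with the non-parametric alternative, the sketch below computes a Mann-Whitney U statistic from average ranks and a two-sided p-value from the normal approximation. It is a simplified illustration under stated assumptions, without tie or continuity correction, and the input rates are invented; it is not the authors' analysis code. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would typically be used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p via normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


# Hypothetical recognition rates for original vs. revised headlines.
original = [0.73, 0.77, 0.80, 0.83, 0.90]
revised = [0.87, 0.90, 0.87, 0.63, 0.80]
print(mann_whitney_u(original, revised))
```

The test compares rank sums rather than means, so it stays valid when rates are skewed or bounded, which is a plausible reason for falling back on it when normality is violated.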

#### Effects of Pronoun Insertion on Perceived Truthfulness

Mean results for perceived truthfulness given headline version \(original or revision\) and whether it had been presented in the presentation phase or not are provided in Table[7](https://arxiv.org/html/2604.19189#S4.T7)\.

Table 7: Mean truth judgements of original and pronoun\-inserted presentation headlines \(seen\) and distractor items \(unseen\)\.

Although all original headlines included in the study were collected from reputable news venues \(NYT, NPR, CNN, Yahoo news, CBS, CNBC, NBC, Washington Post, USA Today, and Forbes\) and only revisions with a high accuracy acceptance rate were included in the study, we reasoned that the introduction of first and second person pronouns might affect truth judgements\. Past work has also found evidence of the illusory truth effect: repetition affects perceived truthfulness of information Pennycook et al\. \([2018](https://arxiv.org/html/2604.19189#bib.bib5)\); Vellani et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib6)\)\. As with the main measures, no difference in truth judgements between original and revised headlines was found across studies\. We do observe slight, though statistically non\-significant, indications of an illusory truth effect, meaning that headlines encountered in the presentation phase were considered slightly more truthful on average than headlines first seen in the recognition phase\.

#### Exploratory Analysis and Interpretation

On average, revised headlines were longer and had shorter words \(average character count: 81\.09±21\.37, average word length: 4\.84±0\.58\) compared to original headlines \(67\.78±15\.07, 5\.18±0\.7\)\. To identify to what extent this might impact memorability, we pool all headlines across the three studies and perform a correlation analysis, taking into account recognition and recall rates, cosine similarities of recalled items, headline lengths, and average word lengths \(see Figure[3](https://arxiv.org/html/2604.19189#S4.F3)\)\. We find a significant negative correlation between recall similarity and headline length based on both word and character counts, but recall and recognition rates show no clear correlation with length features\. This indicates that longer headlines tend to be remembered in less detail, but are retrievable and recognizable at similar rates as shorter headlines\. We also find a significant positive correlation between recognition and recall rates, indicating that if a headline can be recognized correctly, it also tends to be accessible in the absence of any retrieval cues \(and vice versa\)\.
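The correlation analysis combines a Pearson coefficient for each pair of measures with a Bonferroni correction over all tested pairs. The sketch below shows both pieces in plain Python with invented toy values. Computing the p-values themselves is left out, since that needs the t distribution; the `bonferroni` helper simply takes raw p-values as given.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical per-headline values: character count vs. recall similarity.
lengths = [55, 62, 70, 81, 95, 104]
similarity = [0.82, 0.79, 0.74, 0.70, 0.61, 0.58]
print(round(pearson_r(lengths, similarity), 3))

# Hypothetical raw p-values from three pairwise correlations.
print(bonferroni([0.001, 0.02, 0.4]))
```

The correction matters here because several measure pairs are tested at once, so an uncorrected threshold of 0.05 would overstate how many correlations are significant.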

![Refer to caption](https://arxiv.org/html/2604.19189v1/images/correlation_metrics_lengths.png)

Figure 3: Bonferroni\-corrected Pearson correlations between memory measures and headline length features\. \* denotes significant correlations at p < 0\.05, \*\* at p < 0\.01, and \*\*\* at p < 0\.001\.

Qualitative inspection of recognition and recall rates further reveals individual headlines for which the addition of a pronoun clearly increases or decreases recognition\. As shown in the examples provided in Table [1](https://arxiv.org/html/2604.19189#S3.T1), revisions that naturally incorporate pronouns, either through restructuring or simple pronoun insertion, show a tendency to improve memorability, while clickbaity or unnatural edits seem to reduce it\.

![Refer to caption](https://arxiv.org/html/2604.19189v1/images/DeltaRecall_by_NounCategory.png)

Figure 4: Increases/decreases in recall rate between original and revised headlines for insertions by nominal category, i\.e\. personal pronouns \(“we”, “you”\) and possessives with different nominal heads: Health/social \(e\.g\. “your insurance”, “our babies”\), Economy \(e\.g\. “your dollars”, “our farmers”\) and State/politics \(e\.g\. “your country”, “our election”\)\.

Moreover, effects on recall are influenced by the immediate context of the inserted pronoun: in particular, possessive pronouns with social/health\-related nominal heads seem to improve recall, but economic ones do not \(e\.g\. “our babies” vs\. “our farmers”, see Figure [4](https://arxiv.org/html/2604.19189#S4.F4)\)\. A potential reason for this might be that information related to health and social factors has a higher potential to be personally relevant and contain actionable information, compared to political and economic information which is less directly controllable by consumers of news\. Future work could incorporate judgements of personal relevance to verify this\.

## 5\. Conclusion

In this paper, we present computational experiments targeting a specific linguistic change, namely the insertion of first\- and second\-person pronouns into news headlines, along with user studies examining their effects on memorability\. We show that LLMs do not always insert pronouns appropriately, as indicated by crowdsourcing evaluations\. Collectively, our memory studies lead to the conclusion that pronoun insertion in itself has no consistent effect on memorability\. A closer look revealed individual cases of memorability impairment and enhancement, with substantial variation across contexts suggesting a need for more fine\-grained analyses and additional data\. Moving forward, we plan to investigate other linguistic features that may more strongly influence memorability and to expand our evaluation of LLM capabilities in this context\. Ultimately, our goal is to develop computational methods for making news and other information encountered online more memorable while preserving its original content\. We believe that this approach has the potential to boost true, high\-quality information over misinformation, thus complementing other efforts towards the mitigation of the impact of misinformation on society\. To support further research, we release our data, including revision annotations and 7,680 unique memory and truth judgments from three experiments involving 240 participants \([https://zenodo\.org/records/19254945](https://zenodo.org/records/19254945)\)\. We encourage NLP researchers to consider memorability as a modelling feature alongside engagement and related factors\.

## 6\. Limitations

We identify several limitations in this work, which we describe below:

We are aware of potential interaction effects of other aspects related to pronoun insertion, such as increases in headline length or long words, changes in syntactic structure, or use of loaded language, which might be responsible for some headlines gaining a boost in memorability, while others saw deterioration when pronouns were inserted\. In the future, we plan to develop computational methods to make targeted changes, such as the ones presented, while changing as little as possible of the remaining text\. We also plan to explore more impactful changes to news texts, including headlines, to explore a greater variety of linguistic features which can impact memorability in this context, and run studies on larger scales, including more study items in the process, to increase the robustness of experimental results\. While we are confident that the set of experiments presented here is suitable to conclude that pronoun insertion alone does not consistently boost memorability of news headlines, we are aware that the comparably small number of included headline items limits our ability to explain why some headlines benefit from pronoun insertion, whereas for others it leads to a decrease in memorability\. Consequently, in future studies, we will focus on creating a database large enough to uncover linguistic patterns that interact with changes to increase or decrease memorability\.

In cognitive psychology, experiments such as the one described in this paper are often conducted in lab settings\. While using Prolific for data collection yields many advantages, it also increases the likelihood of participants using external tools to help them remember headlines or being exposed to external distractors\. To counteract this, we specifically instructed participants not to use external tools and to make sure they were undisturbed for the duration of the study\. We also embedded the news headlines as images instead of text to prevent participants from copy\-pasting content for later study phases, and made participants aware that their compensation does not depend on their performance during the memory tasks at the beginning of the study\.

## 7\. Ethical Considerations

Although previous research has found generative AI less likely to change content and tone of the original message when paraphrasing in formal contexts \(e\.g\. academic, news\) than in informal contexts Tripto et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib34)\), LLM revisions in this context can potentially introduce misinformation and inaccuracies\. To prevent showing misinformation to participants during the study, we only included headlines which were judged as accurate by at least 62\.5% of annotators, with a mean accuracy acceptance rate of 80\.6% over all presented LLM revisions\.

For all data collected on Prolific, participants were compensated in GBP at an hourly rate equivalent to the current minimum wage in the country of this work \(approx\. 11 GBP\)\. Participants who did not pass the qualification test were compensated for their time at the same rate\. This corresponds to more than twice the federal minimum wage in the US, where all study participants and annotators were based\. The student who rewrote the headlines for study III was employed at the university and compensated at an hourly rate in accordance with the official salary scale for research assistants in the country of this work\.

All data was collected anonymously and does not allow conclusions about participants’ identities\. Before beginning the memory study, participants explicitly agreed in the informed consent to the publication of the personal information they provided \(e\.g\. age, political orientation; summarized in Appendix [E](https://arxiv.org/html/2604.19189#A5)\)\.

## Acknowledgements

We thank Noas Shaalan for support in preparing the data for study III\. We also thank the anonymous CMCL reviewers for their valuable and constructive feedback\.

## 8\. Bibliographical References

- M\. Abel and K\. T\. Bäuml \(2023\)Item\-method directed forgetting and perceived truth of news headlines\.Memory31\(10\),pp\. 1371–1386\.Cited by:[§3\.1](https://arxiv.org/html/2604.19189#S3.SS1.p1.1)\.
- J\. Achiam, S\. Adler, S\. Agarwal, L\. Ahmad, I\. Akkaya, F\. L\. Aleman, D\. Almeida, J\. Altenschmidt, S\. Altman, S\. Anadkat,et al\.\(2023\)Gpt\-4 technical report\.arXiv preprint arXiv:2303\.08774\.Cited by:[Appendix B](https://arxiv.org/html/2604.19189#A2.SS0.SSS0.Px3.p1.1)\.
- X\. Ao, X\. Wang, L\. Luo, Y\. Qiao, Q\. He, and X\. Xie \(2021\)PENS: a dataset and generic framework for personalized news headline generation\.InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing \(Volume 1: Long Papers\),C\. Zong, F\. Xia, W\. Li, and R\. Navigli \(Eds\.\),Online,pp\. 82–92\.External Links:[Link](https://aclanthology.org/2021.acl-long.7/),[Document](https://dx.doi.org/10.18653/v1/2021.acl-long.7)Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p2.1)\.
- A\. Bashardoust, S\. Feuerriegel, and Y\. R\. Shrestha \(2024\)Comparing the willingness to share for human\-generated vs\. ai\-generated fake news\.Proc\. ACM Hum\.\-Comput\. Interact\.8\(CSCW2\)\.External Links:[Link](https://doi.org/10.1145/3687028),[Document](https://dx.doi.org/10.1145/3687028)Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p2.1)\.
- T\. T\. Brunyé, T\. Ditman, C\. R\. Mahoney, and H\. A\. Taylor \(2011\)Better you than i: perspectives and emotion simulation during narrative comprehension\.Journal of Cognitive Psychology23\(5\),pp\. 659–666\.Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p3.1)\.
- C\. Y\. Chen, D\. Wu, and L\. Ku \(2023\)HonestBait: forward references for attractive but faithful headline generation\.InFindings of the Association for Computational Linguistics: ACL 2023,A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 4810–4824\.External Links:[Link](https://aclanthology.org/2023.findings-acl.296/),[Document](https://dx.doi.org/10.18653/v1/2023.findings-acl.296)Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p2.1)\.
- T\. H\. Clark, G\. Tuckute, B\. Medina, and E\. Fedorenko \(2026\)A distinctive meaning makes a sentence memorable\.Journal of Memory and Language146,pp\. 104700\.Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p3.1)\.
- R\. E\. Cruz, J\. M\. Leonhardt, and T\. Pezzuti \(2017\)Second person pronouns enhance consumer involvement and brand attitude\.Journal of Interactive Marketing39\(1\),pp\. 104–116\.External Links:[Document](https://dx.doi.org/10.1016/j.intmar.2017.05.001),[Link](https://journals.sagepub.com/doi/abs/10.1016/j.intmar.2017.05.001),https://journals\.sagepub\.com/doi/pdf/10\.1016/j\.intmar\.2017\.05\.001Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p3.1)\.
- H\. Davoudi, A\. An, and G\. Edall \(2019\)Content\-based dwell time engagement prediction model for news articles\.InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 \(Industry Papers\),A\. Loukina, M\. Morales, and R\. Kumar \(Eds\.\),Minneapolis, Minnesota,pp\. 226–233\.External Links:[Link](https://aclanthology.org/N19-2028/),[Document](https://dx.doi.org/10.18653/v1/N19-2028)Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p1.1)\.
- A\. Efrat, O\. Honovich, and O\. Levy \(2023\)LMentry: a language model benchmark of elementary language tasks\.InFindings of the Association for Computational Linguistics: ACL 2023,A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 10476–10501\.External Links:[Link](https://aclanthology.org/2023.findings-acl.666/),[Document](https://dx.doi.org/10.18653/v1/2023.findings-acl.666)Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p1.1),[§4\.1](https://arxiv.org/html/2604.19189#S4.SS1.SSS0.Px1.p1.1)\.
- M\. Garry, W\. M\. Chan, J\. Foster, and L\. A\. Henkel \(2024\)Large language models \(llms\) and the institutionalization of misinformation\.Trends in cognitive sciences\.Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p2.1)\.
- R\. Gopalakrishna Pillai, A\. Fokkens, and W\. van Atteveldt \(2025\)Engagement\-driven persona prompting for rewriting news tweets\.InProceedings of the 31st International Conference on Computational Linguistics,O\. Rambow, L\. Wanner, M\. Apidianaki, H\. Al\-Khalifa, B\. D\. Eugenio, and S\. Schockaert \(Eds\.\),Abu Dhabi, UAE,pp\. 8612–8622\.External Links:[Link](https://aclanthology.org/2025.coling-main.576/)Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p1.1),[§2](https://arxiv.org/html/2604.19189#S2.p2.1)\.
- A\. Grattafiori, A\. Dubey, A\. Jauhri, A\. Pandey, A\. Kadian, A\. Al\-Dahle, A\. Letman, A\. Mathur, A\. Schelten, A\. Vaughan, A\. Yang, A\. Fan, A\. Goyal, A\. Hartshorn, A\. Yang, A\. Mitra, A\. Sravankumar, A\. Korenev, A\. Hinsvark, A\. Rao, A\. Zhang, A\. Rodriguez, A\. Gregerson, A\. Spataru, B\. Roziere, B\. Biron, B\. Tang, B\. Chern, C\. Caucheteux, C\. Nayak, C\. Bi, C\. Marra, C\. McConnell, C\. Keller, C\. Touret, C\. Wu, C\. Wong, C\. C\. Ferrer, C\. Nikolaidis, D\. Allonsius, D\. Song, D\. Pintz, D\. Livshits, D\. Wyatt, D\. Esiobu, D\. Choudhary, D\. Mahajan, D\. Garcia\-Olano, D\. Perino, D\. Hupkes, E\. Lakomkin, E\. AlBadawy, E\. Lobanova, E\. Dinan, E\. M\. Smith, F\. Radenovic, F\. Guzmán, F\. Zhang, G\. Synnaeve, G\. Lee, G\. L\. Anderson, G\. Thattai, G\. Nail, G\. Mialon, G\. Pang, G\. Cucurell, H\. Nguyen, H\. Korevaar, H\. Xu, H\. Touvron, I\. Zarov, I\. A\. Ibarra, I\. Kloumann, I\. Misra, I\. Evtimov, J\. Zhang, J\. Copet, J\. Lee, J\. Geffert, J\. Vranes, J\. Park, J\. Mahadeokar, J\. Shah, J\. van der Linde, J\. Billock, J\. Hong, J\. Lee, J\. Fu, J\. Chi, J\. Huang, J\. Liu, J\. Wang, J\. Yu, J\. Bitton, J\. Spisak, J\. Park, J\. Rocca, J\. Johnstun, J\. Saxe, J\. Jia, K\. V\. Alwala, K\. Prasad, K\. Upasani, K\. Plawiak, K\. Li, K\. Heafield, K\. Stone, K\. El\-Arini, K\. Iyer, K\. Malik, K\. Chiu, K\. Bhalla, K\. Lakhotia, L\. Rantala\-Yeary, L\. van der Maaten, L\. Chen, L\. Tan, L\. Jenkins, L\. Martin, L\. Madaan, L\. Malo, L\. Blecher, L\. Landzaat, L\. de Oliveira, M\. Muzzi, M\. Pasupuleti, M\. Singh, M\. Paluri, M\. Kardas, M\. Tsimpoukelli, M\. Oldham, M\. Rita, M\. Pavlova, M\. Kambadur, M\. Lewis, M\. Si, M\. K\. Singh, M\. Hassan, N\. Goyal, N\. Torabi, N\. Bashlykov, N\. Bogoychev, N\. Chatterji, N\. Zhang, O\. Duchenne, O\. Çelebi, P\. Alrassy, P\. Zhang, P\. Li, P\. Vasic, P\. Weng, P\. Bhargava, P\. Dubal, P\. Krishnan, P\. S\. Koura, P\. Xu, Q\. He, Q\. Dong, R\. Srinivasan, R\. Ganapathy, R\. Calderer, R\. S\. 
Cabral, R\. Stojnic, R\. Raileanu, R\. Maheswari, R\. Girdhar, R\. Patel, R\. Sauvestre, R\. Polidoro, R\. Sumbaly, R\. Taylor, R\. Silva, R\. Hou, R\. Wang, S\. Hosseini, S\. Chennabasappa, S\. Singh, S\. Bell, S\. S\. Kim, S\. Edunov, S\. Nie, S\. Narang, S\. Raparthy, S\. Shen, S\. Wan, S\. Bhosale, S\. Zhang, S\. Vandenhende, S\. Batra, S\. Whitman, S\. Sootla, S\. Collot, S\. Gururangan, S\. Borodinsky, T\. Herman, T\. Fowler, T\. Sheasha, T\. Georgiou, T\. Scialom, T\. Speckbacher, T\. Mihaylov, T\. Xiao, U\. Karn, V\. Goswami, V\. Gupta, V\. Ramanathan, V\. Kerkez, V\. Gonguet, V\. Do, V\. Vogeti, V\. Albiero, V\. Petrovic, W\. Chu, W\. Xiong, W\. Fu, W\. Meers, X\. Martinet, X\. Wang, X\. Wang, X\. E\. Tan, X\. Xia, X\. Xie, X\. Jia, X\. Wang, Y\. Goldschlag, Y\. Gaur, Y\. Babaei, Y\. Wen, Y\. Song, Y\. Zhang, Y\. Li, Y\. Mao, Z\. D\. Coudert, Z\. Yan, Z\. Chen, Z\. Papakipos, A\. Singh, A\. Srivastava, A\. Jain, A\. Kelsey, A\. Shajnfeld, A\. Gangidi, A\. Victoria, A\. Goldstand, A\. Menon, A\. Sharma, A\. Boesenberg, A\. Baevski, A\. Feinstein, A\. Kallet, A\. Sangani, A\. Teo, A\. Yunus, A\. Lupu, A\. Alvarado, A\. Caples, A\. Gu, A\. Ho, A\. Poulton, A\. Ryan, A\. Ramchandani, A\. Dong, A\. Franco, A\. Goyal, A\. Saraf, A\. Chowdhury, A\. Gabriel, A\. Bharambe, A\. Eisenman, A\. Yazdan, B\. James, B\. Maurer, B\. Leonhardi, B\. Huang, B\. Loyd, B\. D\. Paola, B\. Paranjape, B\. Liu, B\. Wu, B\. Ni, B\. Hancock, B\. Wasti, B\. Spence, B\. Stojkovic, B\. Gamido, B\. Montalvo, C\. Parker, C\. Burton, C\. Mejia, C\. Liu, C\. Wang, C\. Kim, C\. Zhou, C\. Hu, C\. Chu, C\. Cai, C\. Tindal, C\. Feichtenhofer, C\. Gao, D\. Civin, D\. Beaty, D\. Kreymer, D\. Li, D\. Adkins, D\. Xu, D\. Testuggine, D\. David, D\. Parikh, D\. Liskovich, D\. Foss, D\. Wang, D\. Le, D\. Holland, E\. Dowling, E\. Jamil, E\. Montgomery, E\. Presani, E\. Hahn, E\. Wood, E\. Le, E\. Brinkman, E\. Arcaute, E\. Dunbar, E\. Smothers, F\. Sun, F\. Kreuk, F\. Tian, F\. Kokkinos, F\. 
Ozgenel, F\. Caggioni, F\. Kanayet, F\. Seide, G\. M\. Florez, G\. Schwarz, G\. Badeer, G\. Swee, G\. Halpern, G\. Herman, G\. Sizov, Guangyi, Zhang, G\. Lakshminarayanan, H\. Inan, H\. Shojanazeri, H\. Zou, H\. Wang, H\. Zha, H\. Habeeb, H\. Rudolph, H\. Suk, H\. Aspegren, H\. Goldman, H\. Zhan, I\. Damlaj, I\. Molybog, I\. Tufanov, I\. Leontiadis, I\. Veliche, I\. Gat, J\. Weissman, J\. Geboski, J\. Kohli, J\. Lam, J\. Asher, J\. Gaya, J\. Marcus, J\. Tang, J\. Chan, J\. Zhen, J\. Reizenstein, J\. Teboul, J\. Zhong, J\. Jin, J\. Yang, J\. Cummings, J\. Carvill, J\. Shepard, J\. McPhie, J\. Torres, J\. Ginsburg, J\. Wang, K\. Wu, K\. H\. U, K\. Saxena, K\. Khandelwal, K\. Zand, K\. Matosich, K\. Veeraraghavan, K\. Michelena, K\. Li, K\. Jagadeesh, K\. Huang, K\. Chawla, K\. Huang, L\. Chen, L\. Garg, L\. A, L\. Silva, L\. Bell, L\. Zhang, L\. Guo, L\. Yu, L\. Moshkovich, L\. Wehrstedt, M\. Khabsa, M\. Avalani, M\. Bhatt, M\. Mankus, M\. Hasson, M\. Lennie, M\. Reso, M\. Groshev, M\. Naumov, M\. Lathi, M\. Keneally, M\. Liu, M\. L\. Seltzer, M\. Valko, M\. Restrepo, M\. Patel, M\. Vyatskov, M\. Samvelyan, M\. Clark, M\. Macey, M\. Wang, M\. J\. Hermoso, M\. Metanat, M\. Rastegari, M\. Bansal, N\. Santhanam, N\. Parks, N\. White, N\. Bawa, N\. Singhal, N\. Egebo, N\. Usunier, N\. Mehta, N\. P\. Laptev, N\. Dong, N\. Cheng, O\. Chernoguz, O\. Hart, O\. Salpekar, O\. Kalinli, P\. Kent, P\. Parekh, P\. Saab, P\. Balaji, P\. Rittner, P\. Bontrager, P\. Roux, P\. Dollar, P\. Zvyagina, P\. Ratanchandani, P\. Yuvraj, Q\. Liang, R\. Alao, R\. Rodriguez, R\. Ayub, R\. Murthy, R\. Nayani, R\. Mitra, R\. Parthasarathy, R\. Li, R\. Hogan, R\. Battey, R\. Wang, R\. Howes, R\. Rinott, S\. Mehta, S\. Siby, S\. J\. Bondu, S\. Datta, S\. Chugh, S\. Hunt, S\. Dhillon, S\. Sidorov, S\. Pan, S\. Mahajan, S\. Verma, S\. Yamamoto, S\. Ramaswamy, S\. Lindsay, S\. Lindsay, S\. Feng, S\. Lin, S\. C\. Zha, S\. Patil, S\. Shankar, S\. Zhang, S\. Zhang, S\. Wang, S\. Agarwal, S\. 
Sajuyigbe, S\. Chintala, S\. Max, S\. Chen, S\. Kehoe, S\. Satterfield, S\. Govindaprasad, S\. Gupta, S\. Deng, S\. Cho, S\. Virk, S\. Subramanian, S\. Choudhury, S\. Goldman, T\. Remez, T\. Glaser, T\. Best, T\. Koehler, T\. Robinson, T\. Li, T\. Zhang, T\. Matthews, T\. Chou, T\. Shaked, V\. Vontimitta, V\. Ajayi, V\. Montanez, V\. Mohan, V\. S\. Kumar, V\. Mangla, V\. Ionescu, V\. Poenaru, V\. T\. Mihailescu, V\. Ivanov, W\. Li, W\. Wang, W\. Jiang, W\. Bouaziz, W\. Constable, X\. Tang, X\. Wu, X\. Wang, X\. Wu, X\. Gao, Y\. Kleinman, Y\. Chen, Y\. Hu, Y\. Jia, Y\. Qi, Y\. Li, Y\. Zhang, Y\. Zhang, Y\. Adi, Y\. Nam, Yu, Wang, Y\. Zhao, Y\. Hao, Y\. Qian, Y\. Li, Y\. He, Z\. Rait, Z\. DeVito, Z\. Rosnbrick, Z\. Wen, Z\. Yang, Z\. Zhao, and Z\. Ma \(2024\)The llama 3 herd of models\.External Links:2407\.21783,[Link](https://arxiv.org/abs/2407.21783)Cited by:[Appendix B](https://arxiv.org/html/2604.19189#A2.SS0.SSS0.Px3.p1.1)\.
- M\. Honnibal, I\. Montani, S\. Van Landeghem, and A\. Boyd \(2020\)spaCy: Industrial\-strength Natural Language Processing in Python\.External Links:[Document](https://dx.doi.org/10.5281/zenodo.1212303)Cited by:[Appendix A](https://arxiv.org/html/2604.19189#A1.SS0.SSS0.Px1.p1.1)\.
- A\. Q\. Jiang, A\. Sablayrolles, A\. Mensch, C\. Bamford, D\. S\. Chaplot, D\. de las Casas, F\. Bressand, G\. Lengyel, G\. Lample, L\. Saulnier, L\. R\. Lavaud, M\. Lachaux, P\. Stock, T\. L\. Scao, T\. Lavril, T\. Wang, T\. Lacroix, and W\. E\. Sayed \(2023\)Mistral 7b\.External Links:2310\.06825,[Link](https://arxiv.org/abs/2310.06825)Cited by:[Appendix B](https://arxiv.org/html/2604.19189#A2.SS0.SSS0.Px3.p1.1)\.
- A\. Q\. Jiang, A\. Sablayrolles, A\. Roux, A\. Mensch, B\. Savary, C\. Bamford, D\. S\. Chaplot, D\. de las Casas, E\. B\. Hanna, F\. Bressand, G\. Lengyel, G\. Bour, G\. Lample, L\. R\. Lavaud, L\. Saulnier, M\. Lachaux, P\. Stock, S\. Subramanian, S\. Yang, S\. Antoniak, T\. L\. Scao, T\. Gervet, T\. Lavril, T\. Wang, T\. Lacroix, and W\. E\. Sayed \(2024\)Mixtral of experts\.External Links:2401\.04088,[Link](https://arxiv.org/abs/2401.04088)Cited by:[Appendix B](https://arxiv.org/html/2604.19189#A2.SS0.SSS0.Px3.p1.1)\.
- A\. Liu, B\. Feng, B\. Xue, B\. Wang, B\. Wu, C\. Lu, C\. Zhao, C\. Deng, C\. Zhang, C\. Ruan,et al\.\(2024\)Deepseek\-v3 technical report\.arXiv preprint arXiv:2412\.19437\.Cited by:[Appendix B](https://arxiv.org/html/2604.19189#A2.SS0.SSS0.Px3.p1.1)\.
- A\. R\. Lommel, A\. Burchardt, and H\. Uszkoreit \(2013\)Multidimensional quality metrics: a flexible system for assessing translation quality\.InProceedings of Translating and the Computer 35,London, UK\.External Links:[Link](https://aclanthology.org/2013.tc-1.6/)Cited by:[Appendix C](https://arxiv.org/html/2604.19189#A3.p1.1),[§3\.2](https://arxiv.org/html/2604.19189#S3.SS2.SSS0.Px2.p1.1)\.
- B\. Lutz, M\. Adam, S\. Feuerriegel, N\. Pröllochs, and D\. Neumann \(2024\)Which linguistic cues make people fall for fake news? a comparison of cognitive and affective processing\.Proc\. ACM Hum\. Comput\. Interact\.8\(CSCW1\),pp\. 1–22\(en\)\.Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p3.1)\.
- C\. M\. MacLeod and K\. E\. Kampe \(1996\)Word frequency effects on recall, recognition, and word fragment completion tests\.\.Journal of experimental psychology: Learning, memory, and cognition22\(1\),pp\. 132\.Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p3.1)\.
- S\. Mukherjee, A\. Kr\. Ojha, and O\. Dusek \(2024\)Are large language models actually good at text style transfer?\.InProceedings of the 17th International Natural Language Generation Conference,S\. Mahamood, N\. L\. Minh, and D\. Ippolito \(Eds\.\),Tokyo, Japan,pp\. 523–539\.External Links:[Link](https://aclanthology.org/2024.inlg-main.42/),[Document](https://dx.doi.org/10.18653/v1/2024.inlg-main.42)Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p1.1)\.
- K\. Park, H\. Kwak, J\. An, and S\. Chawla \(2021\)How\-to present news on social media: a causal analysis of editing news headlines for boosting user engagement\.InProceedings of the International AAAI Conference on Web and Social Media,Vol\.15,pp\. 491–502\.Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p1.1)\.
- T\. Peña, R\. Maswood, M\. Chen, and S\. Rajaram \(2023\)Memory for tweets versus headlines: does message consistency matter?\.Appl\. Cogn\. Psychol\.37\(4\),pp\. 768–784\(en\)\.Cited by:[Appendix A](https://arxiv.org/html/2604.19189#A1.SS0.SSS0.Px1),[Appendix A](https://arxiv.org/html/2604.19189#A1.SS0.SSS0.Px1.p1.1),[§2](https://arxiv.org/html/2604.19189#S2.p3.1),[§3\.1](https://arxiv.org/html/2604.19189#S3.SS1.p1.1),[§3](https://arxiv.org/html/2604.19189#S3.p1.1),[footnote 1](https://arxiv.org/html/2604.19189#footnote1)\.
- G\. Pennycook, T\. D\. Cannon, and D\. G\. Rand \(2018\)Prior exposure increases perceived accuracy of fake news\.J\. Exp\. Psychol\. Gen\.147\(12\),pp\. 1865–1880\(en\)\.Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p2.1),[§4\.2](https://arxiv.org/html/2604.19189#S4.SS2.SSS0.Px3.p2.1)\.
- G\. Pennycook and D\. G\. Rand \(2021\)The psychology of fake news\.Trends in cognitive sciences25\(5\),pp\. 388–402\.Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p3.1)\.
- V\. Raheja, D\. Kumar, R\. Koo, and D\. Kang \(2023\)CoEdIT: text editing by task\-specific instruction tuning\.InFindings of the Association for Computational Linguistics: EMNLP 2023,H\. Bouamor, J\. Pino, and K\. Bali \(Eds\.\),Singapore,pp\. 5274–5291\.External Links:[Link](https://aclanthology.org/2023.findings-emnlp.350/),[Document](https://dx.doi.org/10.18653/v1/2023.findings-emnlp.350)Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p1.1)\.
- N\. Reimers and I\. Gurevych \(2019\)Sentence\-BERT: sentence embeddings using Siamese BERT\-networks\.InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing \(EMNLP\-IJCNLP\),K\. Inui, J\. Jiang, V\. Ng, and X\. Wan \(Eds\.\),Hong Kong, China,pp\. 3982–3992\.External Links:[Link](https://aclanthology.org/D19-1410/),[Document](https://dx.doi.org/10.18653/v1/D19-1410)Cited by:[§4\.2](https://arxiv.org/html/2604.19189#S4.SS2.SSS0.Px1.p1.1)\.
- G\. Spitale, N\. Biller\-Andorno, and F\. Germani \(2023\)AI model gpt\-3 \(dis\)informs us better than humans\.Science Advances9\(26\),pp\. eadh1850\.External Links:[Document](https://dx.doi.org/10.1126/sciadv.adh1850),[Link](https://www.science.org/doi/abs/10.1126/sciadv.adh1850),https://www\.science\.org/doi/pdf/10\.1126/sciadv\.adh1850Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p2.1)\.
- I\. Srba, O\. Razuvayevskaya, J\. A\. Leite, R\. Moro, I\. B\. Schlicht, S\. Tonelli, F\. M\. García, S\. B\. Lottmann, D\. Teyssou, V\. Porcellini, C\. Scarton, K\. Bontcheva, and M\. Bielikova \(2024\)A survey on automatic credibility assessment of textual credibility signals in the era of large language models\.arXiv \[cs\.CL\]\.Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p1.1)\.
- C\. S\. Symons and B\. T\. Johnson \(1997\)The self\-reference effect in memory: a meta\-analysis\.\.Psychological bulletin121\(3\),pp\. 371\.Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p3.1)\.
- Q\. Team \(2025\)Qwen3 technical report\.External Links:2505\.09388,[Link](https://arxiv.org/abs/2505.09388)Cited by:[Appendix B](https://arxiv.org/html/2604.19189#A2.SS0.SSS0.Px3.p1.1)\.
- N\. I\. Tripto, S\. Venkatraman, D\. Macko, R\. Moro, I\. Srba, A\. Uchendu, T\. Le, and D\. Lee \(2024\)A ship of theseus: curious cases of paraphrasing in LLM\-generated texts\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 6608–6625\.External Links:[Link](https://aclanthology.org/2024.acl-long.357/),[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.357)Cited by:[§7](https://arxiv.org/html/2604.19189#S7.p1.1)\.
- N\. Unsworth and G\. A\. Brewer \(2009\)Examining the relationships among item recognition, source recognition, and recall from an individual differences perspective\.\.Journal of Experimental Psychology: Learning, Memory, and Cognition35\(6\),pp\. 1578\.Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p3.1)\.
- V\. Vellani, S\. Zheng, D\. Ercelik, and T\. Sharot \(2023\)The illusory truth effect leads to the spread of misinformation\.Cognition236\(105421\),pp\. 105421\(en\)\.Cited by:[§1](https://arxiv.org/html/2604.19189#S1.p2.1),[§4\.2](https://arxiv.org/html/2604.19189#S4.SS2.SSS0.Px3.p2.1)\.
- J\. Wei, Y\. Tay, R\. Bommasani, C\. Raffel, B\. Zoph, S\. Borgeaud, D\. Yogatama, M\. Bosma, D\. Zhou, D\. Metzler, E\. H\. Chi, T\. Hashimoto, O\. Vinyals, P\. Liang, J\. Dean, and W\. Fedus \(2022\)Emergent abilities of large language models\.Transactions on Machine Learning Research\.Note:Survey CertificationExternal Links:ISSN 2835\-8856,[Link](https://openreview.net/forum?id=yzkSU5zdwD)Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p1.1)\.
- J\. White, Q\. Fu, S\. Hays, M\. Sandborn, C\. Olea, H\. Gilbert, A\. Elnashar, J\. Spencer\-Smith, and D\. C\. Schmidt \(2023\)A prompt pattern catalog to enhance prompt engineering with chatgpt\.InProceedings of the 30th Conference on Pattern Languages of Programs,pp\. 1–31\.Cited by:[Appendix B](https://arxiv.org/html/2604.19189#A2.p1.1),[Appendix B](https://arxiv.org/html/2604.19189#A2.p2.1)\.
- Y\. Zhang and Z\. He \(2024\)Large language models can not perform well in understanding and manipulating natural language at both character and word levels?\.InFindings of the Association for Computational Linguistics: EMNLP 2024,Y\. Al\-Onaizan, M\. Bansal, and Y\. Chen \(Eds\.\),Miami, Florida, USA,pp\. 11826–11842\.External Links:[Link](https://aclanthology.org/2024.findings-emnlp.691/),[Document](https://dx.doi.org/10.18653/v1/2024.findings-emnlp.691)Cited by:[§2](https://arxiv.org/html/2604.19189#S2.p1.1),[§4\.1](https://arxiv.org/html/2604.19189#S4.SS1.SSS0.Px1.p1.1)\.

## Appendix A Pilot Study

#### Post hoc Analysis of [Peña et al\.](https://arxiv.org/html/2604.19189#bib.bib22)’s data

We obtain [Peña et al\.](https://arxiv.org/html/2604.19189#bib.bib22)’s study data and perform analyses using spaCy’s Honnibal et al\. \([2020](https://arxiv.org/html/2604.19189#bib.bib26)\) English transformer pipeline to identify linguistic features which may impact memorability of news headlines and tweets in their data\. This post hoc analysis reveals significant correlations between memorability and pronoun use \(Spearman rank correlation coefficient $\rho = 0.31$, $p < 0.001$\)\. At the same time, we also find that these characteristics are used significantly more in tweets than in news headlines \($p < 0.001$\) in the study items selected by [Peña et al\.](https://arxiv.org/html/2604.19189#bib.bib22)\. We were thus interested in whether these effects would persist when only news headlines are used\. In other words, we wondered if the increase in memorability of tweets compared to news headlines might stem from the language that is used, rather than the content or text type\. To address this question, we performed a pilot user study using a within\-subject design to reproduce findings from this analysis, using only naturalistic news headlines found in the wild and no tweets\.
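As a rough illustration of this kind of analysis, the following sketch computes a Spearman rank correlation between per\-item pronoun counts and recognition rates\. It is not the analysis code of the study: a small hand\-written word list stands in for the spaCy pipeline, the items and recognition rates are invented for demonstration, and no p\-value is computed\.

```python
import re

# Assumption: a fixed word list replaces the spaCy tagger used in the paper.
PRONOUNS = {
    "i", "me", "my", "we", "us", "our",
    "you", "your", "he", "she", "they", "them", "their",
}


def count_pronouns(text):
    """Count tokens of `text` that appear in the illustrative pronoun list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(tok in PRONOUNS for tok in tokens)


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical items: (text, proportion of participants who recognized it).
items = [
    ("Council approves new budget", 0.40),
    ("Why you should check your insurance now", 0.70),
    ("We tested our new phones so you do not have to", 0.80),
    ("Storm closes coastal roads", 0.35),
]
pronoun_counts = [count_pronouns(text) for text, _ in items]
recognition = [rate for _, rate in items]
rho = spearman_rho(pronoun_counts, recognition)
```

In practice, a library routine such as `scipy.stats.spearmanr` would also return the associated p\-value\.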

#### Pilot Study

For our pilot study, we collected 32 news headlines from popular US news outlets with eight headlines for each of four topics: entertainment, politics, environment, and health\. Within each topic, half the headlines contain pronouns and half do not\. 60 participants were randomly split into two groups: each saw a different set of 16 headlines during the presentation phase, with the other 16 appearing first during the recognition phase\. For both groups, headlines were balanced by topic and pronoun use\. The study followed the same format as the main study\.
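The counterbalancing described above can be sketched as a stratified split\. The paper does not state how the two sets were constructed, so the procedure below is an assumption: headlines are grouped by topic and pronoun presence, and each group is divided evenly between the two sets\. The item identifiers are placeholders, not the study headlines\.

```python
import random
from collections import defaultdict

TOPICS = ["entertainment", "politics", "environment", "health"]


def make_placeholder_items():
    """Build 32 placeholder items: 4 topics x 2 pronoun conditions x 4 each."""
    return [
        {
            "id": f"{topic}-{'pron' if has_pronoun else 'nopron'}-{k}",
            "topic": topic,
            "has_pronoun": has_pronoun,
        }
        for topic in TOPICS
        for has_pronoun in (True, False)
        for k in range(4)
    ]


def balanced_split(items, seed=0):
    """Split items into two sets balanced by topic and pronoun presence."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for item in items:
        cells[(item["topic"], item["has_pronoun"])].append(item)
    set_a, set_b = [], []
    for cell_items in cells.values():
        shuffled = cell_items[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        set_a.extend(shuffled[:half])
        set_b.extend(shuffled[half:])
    return set_a, set_b


# Group 1 would study set_a and see set_b as new items at recognition;
# Group 2 would receive the reverse assignment.
set_a, set_b = balanced_split(make_placeholder_items())
```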

We conducted a mixed\-effects linear regression to investigate whether the presence of pronouns in headlines influenced their recognition, taking into account participants’ experimental group\. While neither the main effect of pronoun presence nor the main effect of experimental group reached statistical significance, their interaction did: headlines containing pronouns were recognized at different rates depending on the experimental group \($\beta = 0.73$, $SE = 0.31$, $z = 2.39$, $p = 0.017$\)\.
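The reported statistics can be checked for internal consistency with a short calculation, assuming the p\-value is a two\-sided test of the z statistic against a standard normal distribution \(the paper does not spell this out\)\. Note that the ratio of the rounded values, 0\.73 / 0\.31 ≈ 2\.35, is compatible with the reported z = 2\.39 once rounding of the coefficient and its standard error is taken into account\.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


z_reported = 2.39  # value taken from the text above
p_value = two_sided_p_from_z(z_reported)
print(f"p = {p_value:.3f}")  # prints "p = 0.017"
```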

Follow\-up analysis of individual headlines revealed that Group 2—where recognition rates for pronoun\-containing headlines were higher—had a greater number of headlines using first\- and second\-person pronouns compared to Group 1\. Across all participants, we also observed a consistent pattern: headlines featuring first\- and second\-person pronouns were more likely to be recognized than those without any pronouns, reinforcing the importance of pronoun type in headline recognition \(see Figure[5](https://arxiv.org/html/2604.19189#A1.F5)\)\.

![Refer to caption](https://arxiv.org/html/2604.19189v1/images/hit_proportion_by_pronoun_type.png)

Figure 5: Headline recognition likelihood by contained pronoun type

## Appendix B LLM Prompts and Setup

Our prompt is based on a combination of best\-practice strategies suggested by White et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib27)\)\. We provide the LLM with a persona in the system prompt\.

We also make use of the alternative approaches pattern and the reflection pattern described by White et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib27)\)\. After providing a description of the task and the constraints, LLMs are prompted to provide five revisions of a given headline\. They are then asked to reflect on the quality of the headlines and choose the best headline based on specified criteria\. The output is constrained by providing a response template and enforcing JSON format\. If a model revision for a headline did not contain a first\- or second\-person pronoun on the first try \(this happened for gpt\-4o\-mini, mistral, mixtral, and qwen\), the generation process was repeated for that headline until the requirement was fulfilled\.
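The regeneration step can be sketched as a simple check\-and\-retry loop\. Everything in the sketch is an assumption made for illustration: the pronoun list, the special handling of the token "US", the cap on attempts \(the paper reports repeating until the requirement was met, without a stated limit\), and the `generate` callable, which stands in for an actual LLM call\.

```python
import re

# Assumption: illustrative list of first- and second-person forms.
FIRST_SECOND_PERSON = {
    "i", "me", "my", "mine", "myself",
    "we", "us", "our", "ours", "ourselves",
    "you", "your", "yours", "yourself", "yourselves",
}


def has_direct_address(headline):
    """Heuristically detect a first- or second-person pronoun."""
    for token in re.findall(r"[A-Za-z]+", headline):
        if token == "US":
            # Treat upper-case "US" as the country, not the pronoun "us".
            continue
        if token.lower() in FIRST_SECOND_PERSON:
            return True
    return False


def revise_until_valid(headline, generate, max_attempts=5):
    """Call `generate` until its output contains a target pronoun.

    `generate` is a hypothetical callable mapping a headline to a revision.
    Returns the accepted revision and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        revision = generate(headline)
        if has_direct_address(revision):
            return revision, attempt
    raise RuntimeError(f"no valid revision after {max_attempts} attempts")


def make_stub_generator(outputs):
    """Return a fake generator that replays canned outputs in order."""
    iterator = iter(outputs)
    return lambda headline: next(iterator)


stub = make_stub_generator([
    "Storm closes the coastal roads",      # rejected: no target pronoun
    "Storm closes your coastal roads",     # accepted
])
revision, attempts = revise_until_valid("Storm closes coastal roads", stub)
```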

#### System Prompt:

You are an editor at a high\-quality newspaper\. Your task is to subtly modify article headlines to make them more engaging, without altering the core message\. Specifically, when given a headline, you will rewrite it by incorporating second\-person \(e\.g\., "you", "your"\) and/or first\-person \(e\.g\., "I", "my"\) pronouns\. This will make the headline more relatable and attention\-grabbing for the reader\. Ensure that the revised headline remains true to the original tone and meaning\. Your goal is to make each headline more compelling and conversational, while maintaining clarity and relevance to the reader’s experience\. After generating multiple options, choose the one that fits best in terms of engagement, clarity, and relevance for the target audience\.

#### Prompt:

You are given a news article headline\. Your task is to rewrite it using first\-person \("I", "my"\) and/or second\-person \("you", "your"\) pronouns to make it more engaging and personally relevant to readers\.

Generate exactly five alternative versions of the headline\. Each version should:

- Preserve the original tone and core message as closely as possible\.
- Use first\-person and/or second\-person pronouns to create a direct, conversational appeal\.

After generating the five rewrites, analyze which one is the most effective\. Your analysis should consider:

- Reader engagement
- Clarity
- Faithfulness to the original meaning

Finally, select the single best version based on your reasoning\.

Use the following json\-format to return your output \(no additional explanation or text\):

- rewrite\_1: First rewritten headline
- rewrite\_2: Second rewritten headline
- rewrite\_3: Third rewritten headline
- rewrite\_4: Fourth rewritten headline
- rewrite\_5: Fifth rewritten headline
- reasoning: Explain why one version stands out in terms of engagement, clarity, and preservation of the original message\.
- best\_headline: The best headline from above

Original headline: "\{row\['headline'\]\}"
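A response following this template could be validated as sketched below\. The key names are taken from the prompt; the function name, the example response, and the requirement that the selected headline match one of the five rewrites exactly are assumptions for illustration, not part of the described setup\.

```python
import json

EXPECTED_KEYS = [f"rewrite_{i}" for i in range(1, 6)] + [
    "reasoning",
    "best_headline",
]


def parse_revision_response(raw):
    """Parse a JSON response and return the selected headline.

    Raises ValueError if a template key is missing or if the selected
    headline is not one of the five rewrites (an assumed sanity check).
    """
    data = json.loads(raw)
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    candidates = [data[f"rewrite_{i}"] for i in range(1, 6)]
    if data["best_headline"] not in candidates:
        raise ValueError("best_headline is not one of the five rewrites")
    return data["best_headline"]


# Invented response used only to exercise the parser.
example_response = json.dumps({
    "rewrite_1": "Your coastal roads are closed by the storm",
    "rewrite_2": "Storm closes your coastal roads",
    "rewrite_3": "What the storm means for your commute",
    "rewrite_4": "We are watching the storm close coastal roads",
    "rewrite_5": "You may find coastal roads closed by the storm",
    "reasoning": "Version 2 stays closest to the original wording.",
    "best_headline": "Storm closes your coastal roads",
})
best = parse_revision_response(example_response)
```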

#### Setup and Parameters

We used the default settings for all LLM calls, except for the temperature parameter, which was set to 0\.3\. GPT Achiam et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib28)\) and DeepSeek Liu et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib31)\) models were accessed via the OpenAI API, while Mistral Jiang et al\. \([2023](https://arxiv.org/html/2604.19189#bib.bib29)\) and LLaMA Grattafiori et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib32)\) ran on a single A40 GPU\. Mixtral Jiang et al\. \([2024](https://arxiv.org/html/2604.19189#bib.bib30)\) and Qwen Team \([2025](https://arxiv.org/html/2604.19189#bib.bib33)\) models were run on two A40 GPUs\. Given the small number of headlines to process, generating all LLM outputs took less than an hour\.

## Appendix C Instructions for Human Annotation of Rewritten Headlines

The following instructions, adapted from the multidimensional quality metrics framework Lommel et al\. \([2013](https://arxiv.org/html/2604.19189#bib.bib25)\), were provided to annotators for the evaluation of LLM\-revised headlines\. The instructions were followed by a list of 6 more examples\. Headlines used for examples and the qualification test did not appear in the main annotation task\.

Overview

You will see two versions of a news headline, one marked as "Original", the other as "Revision"\. A revision is a rewriting of the original headline, so that it includes one or more first or second person pronouns\. Your task is to annotate whether the revised version retains the original content and style and to what extent it reflects a news headline you are likely to read on typical news outlets\. You are asked to judge the following categories:

1\. Accuracy: Does the content in the revised version accurately reflect the content of the source text? There are three ways in which accuracy is commonly violated, which can also occur together:

- Misrepresentation: The revision misrepresents information provided in the original headline
- Addition: The revision includes content not present in the original headline
- Omission: The revision is missing content present in the original headline

2\. Style: Is the language style of the revised headline appropriate? Inappropriate style can manifest in various forms that can also occur together:

- Grammar: The revision contains grammar or language errors
- Awkward Style: The revision is grammatical, but unnatural as a news headline or awkward \(e\.g\. it involves excessive wordiness or overly embedded clauses\)
- Inconsistent Style: The style or tone is inconsistent within the revision \(e\.g\. factual, dry information is paired with sensationalism\)

If a revision differs in tone or emotion compared to the original version, you can indicate this in a separate checkbox\. Also, feel free to add comments in the comment field\. You can even suggest revision improvements if you can think of a better phrasing \(but this is not the main goal of this annotation task\)\.

Examples:

Original: Jonathan Majors reportedly admits to being ‘aggressive’ with ex\-girlfriend in newly released audio clip
Revision: You need to hear Jonathan Majors’ disturbing admission about being ‘aggressive’ with an ex\-girlfriend
1\. Does the content in the revised version accurately reflect the content of the source text?
\[x\] Yes \[ \] No
2\. Is the language style of the revised headline appropriate?
\[x\] Yes \[ \] No
\[x\] The revision differs in tone or emotion compared to the original
Explanation: The headline is appropriate as a news headline and accurately reflects the content of the original headline\. The addition of the word “disturbing” changes the emotional tone of the revision compared to the original\.
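
The annotation scheme above can be pictured as a small record type\. The sketch below is one hypothetical encoding in Python and is not taken from the released data: the class and field names are assumptions, as is the choice to store violation sub\-categories per annotation rather than only the two Yes/No answers\.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class AccuracyIssue(Enum):
    MISREPRESENTATION = "misrepresentation"
    ADDITION = "addition"
    OMISSION = "omission"


class StyleIssue(Enum):
    GRAMMAR = "grammar"
    AWKWARD = "awkward style"
    INCONSISTENT = "inconsistent style"


@dataclass
class Annotation:
    """One annotator judgment of a revised headline (hypothetical layout)."""
    original: str
    revision: str
    accuracy_issues: Set[AccuracyIssue] = field(default_factory=set)
    style_issues: Set[StyleIssue] = field(default_factory=set)
    tone_differs: bool = False  # the separate tone/emotion checkbox
    comment: str = ""

    @property
    def accurate(self) -> bool:
        # Question 1 is answered "Yes" when no accuracy violation is marked.
        return not self.accuracy_issues

    @property
    def style_appropriate(self) -> bool:
        # Question 2 is answered "Yes" when no style violation is marked.
        return not self.style_issues


# The worked example from the instructions: accurate, appropriate style, changed tone.
example = Annotation(
    original="Jonathan Majors reportedly admits to being 'aggressive' "
             "with ex-girlfriend in newly released audio clip",
    revision="You need to hear Jonathan Majors' disturbing admission "
             "about being 'aggressive' with an ex-girlfriend",
    tone_differs=True,
)
```

Keeping the tone checkbox separate from the two questions reflects the instructions, where a revision can be accurate and stylistically appropriate while still differing in emotion\.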

## Appendix D Instructions for Human Revisions

The following instructions were given to participants when collecting human revisions of news headlines\. The same instructions were also given to the graduate student who revised headlines for Study III\.

You are given a set of news article headlines, one after the other\. Your task is to rewrite each headline using first\-person \(e\.g\. "I", "my", "our"\) and/or second\-person \(e\.g\. "you", "your"\) pronouns to make it more engaging and personally relevant to readers\. Your revision should preserve the original tone and core message as closely as possible and use one or more first\-person and/or second\-person pronouns to create a direct, conversational appeal\. For some headlines, this might be easier than for others\. Feel free to make changes to the structure or wording of a headline if needed, but make sure the content and tone stay faithful to the original headline\.

## Appendix E Demographics of Memory Study Participants

#### Study I

Participants were between 20 and 74 years old \(mean age: 42\.41\)\. 29 identified as female, 28 as male, and 3 chose not to disclose\. 56\.14% held a Bachelor’s or Master’s degree, 21\.05% had some college education, 14\.04% had only a high school degree and the rest held associate or professional degrees\. The majority of participants \(64\.91%\) were employed\. When asked about their political views on a five\-point Likert scale ranging from 1 \(left\) to 5 \(right\), 19\.3% indicated political affiliation with the left and 7\.02% with the right, whereas the rest fell in between\. 28\.07% of participants indicated they consumed news on news websites more than once a day and 28\.07% once a week or less, with the rest falling in between\. 55\.36% consumed news on social media more than once a day, and 19\.65% once a week or less\.

#### Study II

Participants were between 22 and 67 years old \(mean age: 41\.98\)\. 33 identified as female, 21 as male, 2 as non\-binary, and 4 chose not to disclose\. 60\.72% held a Bachelor’s or Master’s degree, 21\.43% had some college education, 8\.93% had only a high school degree and the rest held associate or professional degrees\. The majority of participants \(67\.86%\) were employed\. 36\.36% indicated political affiliation with the left and 23\.64% with the right, whereas the rest fell in between\. 34\.55% of participants indicated they consumed news on news websites more than once a day and 14\.55% once a week or less, with the rest falling in between\. 66\.07% consumed news on social media more than once a day, and 12\.5% once a week or less\.

#### Study III

Participants were between 21 and 68 years old \(mean age: 40\.23\)\. 54 identified as female and 66 as male\. 65% held a Bachelor’s or Master’s degree, 13\.33% had some college education, 10% had only a high school degree and the rest held associate or professional degrees\. The majority of participants \(78\.33%\) were employed\. 25\.83% indicated political affiliation with the left and 25% with the right, whereas the rest fell in between\. 26\.89% of participants indicated they consumed news on news websites more than once a day and 25\.21% once a week or less, with the rest falling in between\. 65% consumed news on social media more than once a day, and 10% once a week or less\.
