Ideology Prediction of German Political Texts

arXiv cs.CL Papers

Summary

The paper proposes a transformer-based model to predict political ideology of German political texts on a continuous left-to-right spectrum. The study compares 13 models and finds DeBERTa-large and Gemma2-2B perform best on different tasks.

arXiv:2605.14352v1 Announce Type: new Abstract: Elections represent a crucial milestone in a nation's ongoing development. To better understand the political rhetoric from various movements, ranging from left to right, we propose a transformer-based model capable of projecting the political orientation of a text on a continuous left-to-right spectrum, represented by a normalized scalar d between -1 and 1. This approach enables analysts to focus on specific segments of the political landscape, such as conservatives, while excluding liberal and far-right movements. Such a task can only be achieved with multiclass classifiers, provided that the desired orientation is incorporated within one of their predefined classes. To determine the most suitable foundation model among 13 candidate transformers for this task, we constructed four distinct corpora. One corpus comprised annotated plenary notes from the German Bundestag, while another was based on an official online decision-making tool, Wahl-O-Mat. The third corpus consisted of articles from 33 newspapers, each identified by its political orientation, and the fourth included 535,200 tweets from 597 members of the 20th and 21st German Bundestag. To mitigate overfitting, we used two distinct corpora for training and two for testing, respectively. For in-domain performance, DeBERTa-large achieved the highest F1 score F1=0.844 as well as for the X (Twitter) out-of-domain test ACC=0.864. Regarding the newspaper out-of-domain test, Gemma2-2B excelled (MAE = 0.172). This study demonstrates that transformer models can recognize political framing in German news at the level of public opinion polls. Our findings suggest that both the model architecture and the availability of domain-specific training data can be as influential as model size for estimating political bias. We discuss methodological limitations and outline directions for improving the robustness of bias measurement.

Cached at: 05/15/26, 06:21 AM

# Ideology Prediction of German Political Texts
Source: [https://arxiv.org/html/2605.14352](https://arxiv.org/html/2605.14352)
###### Abstract

Elections represent a crucial milestone in a nation’s ongoing development\. To better understand the political rhetoric from various movements, ranging from left to right, we propose a transformer\-based model capable of projecting the political orientation of a text on a continuous left\-to\-right spectrum, represented by a normalized scalar, $d \in [-1, 1]$\. This approach enables analysts to focus on specific segments of the political landscape, such as conservatives, while excluding liberal and far\-right movements\. Such a task can only be achieved with multiclass classifiers, provided that the desired orientation is incorporated within one of their predefined classes\. To determine the most suitable foundation model among 13 candidate transformers for this task, we constructed four distinct corpora\. One corpus comprised annotated plenary notes from the German Bundestag, while another was based on an official online decision\-making tool, Wahl\-O\-Mat\. The third corpus consisted of articles from 33 newspapers, each identified by its political orientation, and the fourth included 535,200 tweets from 597 members of the 20th and 21st German Bundestag\. To mitigate overfitting, we used two distinct corpora for training and two for testing, respectively\. For in\-domain performance, DeBERTa\-large achieved the highest F1 score \($F_1 = 0.844$\) as well as for the X \(Twitter\) out\-of\-domain test \($ACC = 0.864$\)\. Regarding the newspaper out\-of\-domain test, Gemma2\-2B excelled \($MAE = 0.172$\)\. This study demonstrates that transformer models can recognize political framing in German news at the level of public opinion polls\. Our findings suggest that both the model architecture and the availability of domain\-specific training data can be as influential as model size for estimating political bias\.
We discuss methodological limitations and outline directions for improving the robustness of bias measurement\.

Code—https://github\.com/SinclairSchneider/german\_ideology\_prediction

Bundestag/Wahl\-O\-Mat Datasets—https://doi\.org/10\.57967/hf/4924

German Media Datasets—https://huggingface\.co/collections/SinclairSchneider/german\-media\-67dcb6c0bf4c007db3999153

## Introduction

In February 2023, investigative journalists from the network “Forbidden Stories” uncovered a disinformation\-as\-a\-service provider, working with social media bot accounts, known as “Team Jorge” \(Andrzejewski [2023](https://arxiv.org/html/2605.14352#bib.bib71)\)\. This entity claims to have manipulated 33 elections, 27 of which were deemed successful\. To demonstrate their capabilities, Team Jorge spread false rumors about a deceased emu \(\#RIP\_Emmanuel\), which ultimately led to real issues at the animal’s farm\. Although this is a particularly negative example, it highlights the considerable influence of social media on politics\.

We believe that the robust tools of social media analysis can play a valuable role in helping political parties better understand the needs and preferences of their constituents, as well as in forecasting the trajectory of political discourse\. To achieve this goal, the political ideology spectrum can be quantified on a continuous scale from \-1 \(left\) to 1 \(right\)\. Assuming such a mapping is found, individuals’ political ideology can be approximated from tweets on X\. A range of $-1 \leq \theta \leq -0.9$ would yield left\-wing topics such as the establishment of a single public healthcare system, the withdrawal of U\.S\. troops from Germany, a focus on social justice and climate protection, and an end to weapons exports\. More centrist positions may be found in a range of $-0.1 \leq \theta \leq 0.1$, including principles against extremism, efforts to combat hate speech and misinformation, democratic values, military modernization, and digital strategies\. Consequently, a range of $0.9 \leq \theta \leq 1$ might reveal right\-wing topics such as the end of weapon supplies to Ukraine, claims of economic destruction linked to voting for the Green Party, viewing climate change as a business model, and the perception of immigration and Islam as threats to Western countries\.

To achieve this, one could implement a topic modeling algorithm such as BERTopic \(Grootendorst [2022](https://arxiv.org/html/2605.14352#bib.bib61)\)\. However, such approaches lack an essential component: the ability to dynamically focus on a specific political direction, which can only be addressed partially by classifiers with predefined categories\. Therefore, this paper introduces a new algorithm that maps political texts onto a continuous scale ranging from \-1 to 1, with a liberal orientation at 0\.

This paper addresses three significant challenges: first, it aims to map text onto a continuous left\-to\-right spectrum rather than simply categorizing it into discrete classes\. Second, it seeks to adapt the generated algorithm to account for local political biases through a semi\-supervised labeling approach\. Third, it focuses on ensuring the algorithm’s effectiveness by testing on distinct, out\-of\-domain datasets\.

#### Approach

The foundation for training a classifier that maps texts to a continuous left\-to\-right spectrum is the association of two\-dimensional normalized vectors with political parties\. An entirely left\-wing party would be represented by a vector pointing to the left \(\-1, 0\), while a right\-wing party would have a vector directed to the right \(1, 0\)\. A centrist party would be indicated by an upward vector towards the center \(0, 1\)\. Intermediate positions are encoded by vectors of unit length at corresponding angles\.

The output of a trained multilabel classifier, indicating the extent to which a party agrees with a given statement, is then multiplied by the corresponding vectors\. At the end, all vectors are added, and the angle of the newly formed vector represents the classification result\. To demonstrate that this approach is effective, it is finally tested on both crawled German newspapers and politicians’ tweets, for which the political leanings are known\. This outlines both the classifier’s accuracy and its out\-of\-domain capabilities\. In order to do so, we trained and tested 13 transformer classifiers\.
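The vector projection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `project`, the dictionary `PARTY_ANGLES`, and the linear mapping from angle to score ($d = 1 - 2\varphi/\pi$) are assumptions made for this example. Only the three anchor vectors (left, center, right) are taken from the text.

```python
import math

# Anchor vectors from the text, expressed as angles on the upper unit semicircle:
# left (-1, 0) -> pi, centre (0, 1) -> pi/2, right (1, 0) -> 0.
# The party names attached to the anchors are used here for illustration only.
PARTY_ANGLES = {
    "Linke": math.pi,
    "FDP": math.pi / 2,
    "AfD": 0.0,
}


def project(scores, angles=PARTY_ANGLES):
    """Map multilabel party scores to a left-right value in [-1, 1].

    Each score scales its party's unit vector; the scaled vectors are summed
    and the angle of the resulting vector is converted to a scalar.
    ASSUMPTION: the angle is mapped linearly, d = 1 - 2 * phi / pi.
    """
    x = sum(s * math.cos(angles[p]) for p, s in scores.items())
    y = sum(s * math.sin(angles[p]) for p, s in scores.items())
    if math.hypot(x, y) < 1e-9:
        raise ValueError("party vectors cancel out; the angle is undefined")
    phi = math.atan2(y, x)  # lies in [0, pi] for non-negative scores
    return 1.0 - 2.0 * phi / math.pi


# Equal agreement with the centre and the right anchor lands halfway between them.
print(project({"FDP": 1.0, "AfD": 1.0}))
```

Under these assumptions, a text that only the left anchor agrees with maps to -1, a purely centrist text to 0, and a purely right-wing text to 1. Intermediate parties would enter the dictionary with angles between these anchors.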

#### Contribution

The main contributions of this paper are the extension of previous approaches that used categorical variables to a continuous left\-right spectrum between \-1 and 1, as well as the demonstration of our classifier’s out\-of\-sample capabilities\. When tested against the 33 newspapers, our best classifier yielded a mean absolute error \(MAE\) of 0\.17 on a scale between \-1 and 1, which is an error of 8\.58% on a survey\-based benchmark dataset\. For the task of predicting the party of origin of tweets, we found that accuracy increases to 0\.864 when 100\+ words are available\. By using plenary speeches from the German Bundestag as one of the training sets, we ensured that our classifier is perfectly aligned with the German left\-right spectrum without introducing the author’s bias\. With a total of four self\-collected datasets, we also evaluated out\-of\-domain accuracy\. By adapting the task of political stance prediction to a German context, we contribute to a more diverse array of training data and models, as this not only requires linguistic adaptation but also considers the unique political environment\.

## Related Work

Political ideology detection is typically done by building classes such as left, center, or right, using a manual annotation approach \(Baly et al\. [2020](https://arxiv.org/html/2605.14352#bib.bib31)\)\.

Different research projects approach the issue of such a limited political scale in various ways\. Some focus solely on detecting \(extreme\) left\-wing or right\-wing opinions \(Kiesel et al\. [2019](https://arxiv.org/html/2605.14352#bib.bib32); Jakob et al\. [2024](https://arxiv.org/html/2605.14352#bib.bib24)\), while others offer a broader spectrum \(AllSides [2025](https://arxiv.org/html/2605.14352#bib.bib29)\)\. These broader approaches include classifications for “lean left” and “lean right”, situated between the center and the two extremes\. Others offer an even more fine\-grained classification of seven or more classes \(Preoţiuc\-Pietro et al\. [2017](https://arxiv.org/html/2605.14352#bib.bib38); Fagni and Cresci [2022](https://arxiv.org/html/2605.14352#bib.bib21)\), for instance, very conservative, conservative, moderately conservative\.

Most foundational research is conducted in English, which often leads to an association with the United States\. However, simply translating existing English\-language datasets is insufficient for their application to German politics, given the diverse political views across countries\. For this reason, researchers have begun to collect and label specific datasets in German, utilizing information from German newspapers \(Aksenov et al\. [2021](https://arxiv.org/html/2605.14352#bib.bib23)\)\.

The global nature of social media platforms, which span across borders and cultures, makes it difficult to develop generalizable models trained on tweets\. For instance, methods that achieve over 90% accuracy on a carefully selected dataset can drop to approximately 65% when applied to different users within the same network \(Cohen and Ruths [2013](https://arxiv.org/html/2605.14352#bib.bib22)\)\. Despite this, social media continues to be a focal point for transformer\-based classification methods, particularly with models tailored for social media like BERTweet \(Nguyen et al\. [2020](https://arxiv.org/html/2605.14352#bib.bib33)\) and PoliBERTweet \(Kawintiranon and Singh [2022](https://arxiv.org/html/2605.14352#bib.bib34)\)\.

Expanding beyond a text\-only approach to ideology classification and incorporating users’ networks opens up new opportunities for classification methods that utilize transformers, as demonstrated in previous research \(Jiang et al\. [2023](https://arxiv.org/html/2605.14352#bib.bib26)\)\.

Exploring publications analyzing German Bundestag speeches leads us to the work of Erhard et al\. \([2025](https://arxiv.org/html/2605.14352#bib.bib25)\), who investigated the rise of populism using these speeches\. They identified four main categories: anti\-elitism, people\-centrism, left\-wing ideology, and right\-wing ideology\. This framework enhances the traditional two\-dimensional political spectrum by incorporating anti\-elitism and people\-centrism, while still relying on hand\-labeled discrete categories\.

Baly et al\. \([2019](https://arxiv.org/html/2605.14352#bib.bib37)\) adopt a similar approach by introducing trustworthiness as a second dimension on a three\-point scale\. Their work demonstrates that political orientation can be a useful factor in detecting misinformation, bias, and propaganda\.

The issue of models trained on specific domains, such as news sites, performing poorly on other domains, like social media, in ideology classification has been noted by [Volf and Simko](https://arxiv.org/html/2605.14352#bib.bib39) \([2025](https://arxiv.org/html/2605.14352#bib.bib39)\)\. They addressed this challenge by mixing datasets from multiple domains for the training process\. Another way to improve the classifier’s output is to build a dataset comprising the same stories told by news outlets with different political biases, providing a direct comparison of the same story across different political perspectives \(Liu et al\. [2022](https://arxiv.org/html/2605.14352#bib.bib36)\)\.

All approaches discussed so far are limited due to their categorical outputs\. Specifically, ordinal scales cannot measure the extent to which left\- or right\-leaning perspectives are present\. As there is no convention regarding the specific categories, model usage is limited to a predefined context\. For instance, the concept of a left\-wing opinion in the US may differ significantly from that in Germany\.

## Methodology

The processing pipeline was structured as follows: First, data from several sources was collected and further enriched to obtain generalizable models\. Second, a binary political classifier and subsequent multi\-label party classifiers were trained, using multiple BERT, Llama, and Gemma LLMs\. Third, the multilabel output was converted to a continuous left\-right spectrum \(\-1 to 1\)\. Finally, in\-domain and out\-of\-domain performance was evaluated using separate test sets, each drawn from an independent dataset\. Furthermore, pre\- and post\-vector\-optimization results are compared\.

### Datasets

Two independent sources \(Bundestag, Wahlomat\) were preprocessed for model training and testing\. Despite artificially enriching and splitting the data \(80:20 train\-test split\), models may overfit\. This is why two additional datasets \(newspapers, tweets\) were used for model evaluation\. For training and evaluation, the data of all datasets were either pre\- or auto\-labeled as explained below\.

#### Bundestag Dataset

All plenary debates of the German Bundestag are recorded in writing by stenographers and published \(Deutscher Bundestag [2025](https://arxiv.org/html/2605.14352#bib.bib27)\)\. Besides the text of the speech, the speaker’s name and party membership are minuted\. This is also true regarding requests \(question, party and name of the questioner\) and all other potential speech interruptions, such as interjections, hissing, applause, etc\. \(type and party, resp\. parties\)\. All protocols were collected and processed for the period from October 2017 to September 2024\. The raw speech data comprises 34,174 speeches\.

##### Labeling

The combination of speeches and interruptions constitutes a robust auto\-labeling approach\. All speeches were filtered for recorded interruptions\. Speeches without any interruptions were discarded\. For the remaining ones, the sentiment was extracted from the comments\. The described extraction process is illustrated in Figure [6](https://arxiv.org/html/2605.14352#A1.F6)\. This procedure yielded a dataset of 32,246 annotated statements \(i\.e\., pro or contra opinions of parties\)\. The association between parties based on the extracted sentiment is depicted in Figure [1](https://arxiv.org/html/2605.14352#Sx3.F1) \(upper triangle\)\.

##### Data Enrichment

In order for a classifier to correctly categorise not only political speeches but also political statements in general, the linguistic variance of the statements was artificially increased\. For this purpose, a Llama 3\.1 model was asked to summarize each text in five different versions: in the words of a child, of a teenager, of an adult, of an eloquent person, or as a social media post \(tweet\)\. The expanded dataset consisted of 449,209 statements\. It was made publicly available \(Schneider [2025b](https://arxiv.org/html/2605.14352#bib.bib30)\) after combining it with the Wahlomat dataset, which is described below\.

#### Wahlomat Dataset

The German multi\-party system makes it difficult for voters to find the party that represents their interests best\. Hence, a digital voters’ guide called Wahl\-O\-Mat is released ahead of every federal and state election by the Bundeszentrale für politische Bildung \(Federal Agency for Civic Education\)\. It consists of several political statements that the user can agree or disagree with \(viz\. Fig\. [5](https://arxiv.org/html/2605.14352#A1.F5) for an example of the federal election in 2025\)\. For this system to function, the respective party positions \(approval, neutral, rejection\) were officially surveyed in advance by the Federal Agency\.

The data used is available online \(Bolte [2025](https://arxiv.org/html/2605.14352#bib.bib28)\), comprising 1,751 unique statements regarding the elections between 1998 and 2021\.

##### Labeling

No annotation was needed as the data already consists of statements and attitudes of all parties\. Attitudes were coded as 1 \(approval\), 0 \(neutral\), or \-1 \(rejection\), respectively\. Based on these values, the association between parties is illustrated in Figure [1](https://arxiv.org/html/2605.14352#Sx3.F1) \(lower triangle\)\.

##### Data Enrichment

The dataset was also synthetically enriched as described above, yielding 87,210 labelled statements\. Table [6](https://arxiv.org/html/2605.14352#A1.T6) presents an example of how the call for introducing a wealth tax could be expressed from various perspectives\. The positions of the various parties regarding the original statement and thus also concerning the generated ones can be found in Table [4](https://arxiv.org/html/2605.14352#A1.T4)\.

To ensure that the enriched sentences maintain similarity to the originals, we utilized the Qwen3\-Embedding\-8B model \(Zhang et al\. [2025](https://arxiv.org/html/2605.14352#bib.bib69)\) to map them into a vector space and calculated the cosine similarity against the original sentences\. In contrast to parliamentary speeches containing substantial extraneous content \(e\.g\., greetings\), the Wahlomat dataset consists exclusively of condensed statements\. Hence, only the latter was used for comparisons\. The overall similarity of the paraphrased examples is 0\.74, while the most similar sentences, paraphrased for a teenage audience, yielded an average cosine similarity of 0\.78\. To determine whether political bias was introduced during data enrichment, the cosine similarity distribution is assessed\. As is common in statistics, the 5th percentile is computed\. Since even this lower\-tail quantile is still 0\.54, we can assume that no fundamental bias has been introduced\.
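The similarity check can be illustrated with a small sketch. It assumes that embeddings are already available as plain lists of floats; the embedding step with Qwen3\-Embedding\-8B is not reproduced here. The helper names `cosine_similarity`, `percentile`, and `similarity_report`, the linear\-interpolation percentile, and the toy vectors are all assumptions for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def similarity_report(pairs):
    """Return (mean similarity, 5th percentile) for (original, paraphrase) pairs."""
    sims = [cosine_similarity(orig, para) for orig, para in pairs]
    return sum(sims) / len(sims), percentile(sims, 5)


# Toy 2-d "embeddings"; real sentence embeddings have far more dimensions.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction
    ([1.0, 0.0], [0.8, 0.6]),  # mild drift
    ([1.0, 0.0], [0.6, 0.8]),  # stronger drift
]
mean_sim, p5 = similarity_report(toy_pairs)
print(mean_sim, p5)
```

Reporting a low percentile alongside the mean shows whether a small tail of paraphrases drifted far from the original, which an average alone would hide.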

The combined training dataset \(Bundestag\+Wahlomat\) consisted of 570,416 samples and is publicly available \(Schneider [2025b](https://arxiv.org/html/2605.14352#bib.bib30)\)\.

![Refer to caption](https://arxiv.org/html/2605.14352v1/x1.png)

Figure 1: Associations between the parties based on Bundestag sentiments \(upper triangle\) and Wahlomat statements \(lower triangle\)\. Profile similarity \(within, viz\. diagonal\) estimated Pearson’s correlation of Phi measures per party between Bundestag and Wahlomat datasets\.

#### Tweet Dataset

To evaluate the performance of classifiers on short social media texts, we curated a dataset consisting of 535,200 tweets from 597 members of the 20th and 21st German Bundestag \(Federal Parliament\)\. Each political party is represented by 89,200 tweets, filtered to include only political content\.

##### Labeling

The labeling is based on the account owners’ affiliation with the respective political party\. Each tweet is assigned to a single political party only\.

#### Newspaper Dataset

Based on the assumption that the German media landscape sufficiently represents the political spectrum \(cf\. Maurer et al\. [2024](https://arxiv.org/html/2605.14352#bib.bib68)\), a dataset of 33 newspapers was examined\. From each source, at least 10,000 articles were collected, resulting in a representative dataset of approximately 10 million articles\. An overview with precise numbers for all media is appended \(cf\. Table [5](https://arxiv.org/html/2605.14352#A1.T5)\)\. Additionally, we retained metadata, such as news categories, to train a binary politics\-non\-politics classifier that serves as a filter later\. The dataset was based on prior political classifications available for 39 newspapers \(see below\)\. Six newspapers were either discontinued or inaccessible due to technical issues\.

##### Labeling

The political stance of the articles was unknown, but several estimates exist at the newspaper level\. The main one used here is based on $n = 1148$ participants who rated $k = 39$ newspapers on a scale from 1 \(extreme left\-wing\) over 4 \(minimal party affiliation\) to 7 \(extreme right\-wing\), with fake news and conspiracy theories falling under both extremes, respectively \(Medienkompass\.org [2025](https://arxiv.org/html/2605.14352#bib.bib19)\)\.

To verify the validity, we compared the ratings with the ones provided by two independent sources: Firstly, a comparable bias\-rating platform that covers various international outlets \(Mediabiasfactcheck\.com [2025](https://arxiv.org/html/2605.14352#bib.bib20)\) and secondly, a scientific report about the German media landscape \(Maurer et al\. [2024](https://arxiv.org/html/2605.14352#bib.bib68)\)\. Regarding both sources, appropriate association measures were computed using all pairwise complete cases to estimate convergent validity\. We also report the respective measures for the subset of our sample\.

Mediabiasfactcheck\.com reports data for $k = 77$ media outlets, but only non\-numeric labels in roughly half of the cases\. The ratings are based on a scale from \-10 \(extreme left\) over 0 \(least biased\) to \+10 \(extreme right\)\. For better comparability, both considered scales were $z$\-transformed\. Note that this does not affect the correlation estimates but makes the scores directly comparable, as reported in Table [5](https://arxiv.org/html/2605.14352#A1.T5) \(mean values of zero with standard deviations of one\)\. Both estimates were very highly correlated with $r = .90$ \(resp\. $r = .91$ regarding the sample\)\. However, this estimate was based on the overlap of $k = 9$ outlets only \($k' = 7$ regarding our sample\)\. To enlarge the intersection, the provided ordinal labels were converted into numerical values \(i\.e\., “left” was assigned to \-2, “left\-center” to \-1, “least biased” to 0, etc\. with positive values for the right\-hand side\)\. Using Spearman’s $\rho$ for ordinal data yielded an even higher correlation of $\rho = .95$ for $k = 19$ pairs \($\rho = .96$ for $k' = 17$ regarding the sample\)\.
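The remark that a $z$\-transformation leaves the correlation unchanged can be checked numerically. The sketch below uses invented ratings on a 1 to 7 scale and a \-10 to \+10 scale; the values and the helper names `z_transform` and `pearson` are assumptions for illustration and are not taken from either rating platform.

```python
import math
import statistics


def z_transform(xs):
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented outlet ratings on the two scales described in the text.
scale_1_to_7 = [2.1, 3.0, 4.2, 5.5, 6.1]
scale_minus10_to_10 = [-7.0, -3.5, 0.5, 4.0, 8.0]

r_raw = pearson(scale_1_to_7, scale_minus10_to_10)
r_z = pearson(z_transform(scale_1_to_7), z_transform(scale_minus10_to_10))
print(r_raw, r_z)  # identical up to floating-point rounding
```

Because the Pearson coefficient is invariant under positive linear rescaling of either variable, standardisation only serves to put both rating scales on a common footing for side\-by\-side reporting.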

Although the correlations are very high, it could be criticized that both ratings come from public platforms\. Accordingly, the ratings from a scientific study were examined \(Maurer et al\. [2024](https://arxiv.org/html/2605.14352#bib.bib68)\), providing data for $k = 47$ media outlets by only $n = 9$ but extensively trained raters\. Here, political ideology was rated using two separate five\-point scales\. As these showed a strong positive correlation \($r = .63$\), both were reduced to a single dimension using principal component analysis \(PCA; default settings, varimax rotation\)\. From the resulting one\-dimensional values, a subset of $k' = 21$ outlets was present at Medienkompass\.org, yielding a very high correlation of $r = .95$ \($r = .94$ for the subset of $k' = 22$ regarding the sample\)\.

Since the ratings were shown to be very highly correlated with two independent sources, the validity of Medienkompass\.org can be considered sufficient\. This is also the case regarding our sample, which had approximately the same correlation coefficients\.

![Refer to caption](https://arxiv.org/html/2605.14352v1/x2.png)

Figure 2: Exemplary comparison of the Green Party \(B’90\) against the right\-wing \(AfD\), liberals \(FDP\), and the left party \(Linke\) in subplot a\. Hachures indicate Wahlomat items on which both compared parties agree or disagree regarding the Brandenburg election in 2024\. The mean overlap of all election results is displayed in subplot b\. Results are mapped onto a left\-right spectrum in subplot c regarding \(dis\-\)similarity distance to the other parties\.

### Models

#### Foundation Models

To effectively classify German political texts, we needed to select appropriate foundation models for this multilabel classification task\. We used smaller encoder\-only models with 0\.21\-2\.1 billion parameters, alongside larger decoder\-only models with 1\.0\-9\.0 billion parameters\.

For the encoder\-only models, we chose DeBERTa Large \(Dada et al\. [2023](https://arxiv.org/html/2605.14352#bib.bib42)\), GottBERT Large \(Scheible et al\. [2024](https://arxiv.org/html/2605.14352#bib.bib43)\), GBERT and GELECTRA Large \(Chan et al\. [2020](https://arxiv.org/html/2605.14352#bib.bib44)\), XLM\-RoBERTa Large \(Conneau et al\. [2020](https://arxiv.org/html/2605.14352#bib.bib45)\) and EuroBERT \(Boizard et al\. [2025](https://arxiv.org/html/2605.14352#bib.bib46)\)\. In contrast to the original DeBERTa model presented by [He et al\.](https://arxiv.org/html/2605.14352#bib.bib47), [Dada et al\.](https://arxiv.org/html/2605.14352#bib.bib42) trained a model on the same architecture but used a diverse German training corpus\. This collection includes online encyclopedias, social media content, legal documents, medical texts, and fiction books, which collectively make the foundation model well\-suited for a wide array of German\-language applications\.

The authors of GottBERT followed a similar approach, except that they employed a RoBERTa BASE architecture \(Zhuang et al\. [2021](https://arxiv.org/html/2605.14352#bib.bib48)\) combined with the OSCAR \(Ortiz Suárez et al\. [2019](https://arxiv.org/html/2605.14352#bib.bib54)\) dataset\.

GBERT and GELECTRA Large are developed by the same authors and use the BERT \(Devlin et al\. [2019](https://arxiv.org/html/2605.14352#bib.bib52)\) and ELECTRA \(Clark et al\. [2020](https://arxiv.org/html/2605.14352#bib.bib53)\) architectures, respectively, to build German foundation models from German text\. The training data for these models is sourced from the OSCAR and OPUS corpora \(Tiedemann [2012](https://arxiv.org/html/2605.14352#bib.bib55)\), as well as Wikipedia and OpenLegalData \(Ostendorff et al\. [2020](https://arxiv.org/html/2605.14352#bib.bib56)\)\.

The EuroBERT series of models is also an encoder\-only architecture trained on 5 trillion tokens across 15 European languages, including German\. The family of decoder\-only models, including the following, is best known for their generative capabilities but can also be used for classification tasks\.

The Gemma 2 models \(Riviere et al\. [2024](https://arxiv.org/html/2605.14352#bib.bib59)\) have primarily been trained using English texts; however, they use a significantly larger tokenizer inherited from the Gemini model \(Gemini Team Google [2025](https://arxiv.org/html/2605.14352#bib.bib57)\), comprising 256,000 entries\. This extensive vocabulary, combined with the multilingual nature of web data, enables the model to comprehend languages beyond English\. A distinctive characteristic of the smaller Gemma models, specifically those with 2 and 9 billion parameters, is that they are trained via knowledge distillation from a larger teacher model\. This methodology gives these models an advantage over others of similar size, enhancing their performance and efficacy in various applications\.

In contrast, the authors of the Llama 3\.2 \(Dubey et al\. [2024](https://arxiv.org/html/2605.14352#bib.bib58)\) models used FastText to categorize the training data into 176 different languages, including German\.

In summary, the smaller encoder\-only models are mostly trained on German texts, whereas the larger decoder\-only models were trained on multilingual data, including a German component\.

#### Political Classifier

Prior to the political orientation classifiers, we require an additional classifier to determine whether a text is political\. This is crucial for assessing a newspaper’s leaning, whether it is more left or right of center\. If we were to classify all texts indiscriminately, we would also include non\-political content, which could skew our average classification results toward the center\. For this purpose, we used the metadata described in the previous newspaper section\. By merging all political newspaper categories into a single political section and grouping other categories, such as entertainment, separately, we created a well\-balanced dataset comprising 234,978 political and non\-political texts \(Schneider [2025c](https://arxiv.org/html/2605.14352#bib.bib40)\)\. This dataset is then used to train a German DeBERTa model \(Dada et al\. [2023](https://arxiv.org/html/2605.14352#bib.bib42)\) that can predict the probability that a text is politically related\. The model is subsequently used with a threshold of 0\.8, as suggested by the authors, ensuring that only political texts are processed further\. The model achieves an F1 score of 0\.99 on the test set, although a slight decline in performance on out\-of\-domain data is anticipated\.
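The effect of the political filter on a newspaper\-level average can be sketched as follows. The function name `newspaper_leaning`, the tuple layout, the inclusive comparison at the threshold, and the example scores are assumptions for illustration; only the threshold value of 0\.8 comes from the text.

```python
def newspaper_leaning(articles, threshold=0.8):
    """Average left-right score over articles classified as political.

    `articles` is a list of (p_political, score) tuples, where p_political is
    the filter's probability that the text is political and score is the
    left-right value in [-1, 1]. Returns None if no article passes the filter.
    ASSUMPTION: an article is kept when p_political >= threshold.
    """
    kept = [score for p_political, score in articles if p_political >= threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Invented example: two political articles leaning left, two sports reports
# that a party classifier would place near the centre.
articles = [(0.95, -0.6), (0.90, -0.4), (0.10, 0.0), (0.05, 0.0)]

print(newspaper_leaning(articles))                 # political articles only
print(newspaper_leaning(articles, threshold=0.0))  # no filtering
```

In this toy case the unfiltered average is pulled halfway toward the center by the non\-political articles, which is the skew the filter is meant to prevent.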

#### Political Party Classifiers

To determine the appropriate political alignment of a text, we have trained 13 classifiers, including DeBERTa\-large, EuroBERT, GBERT, XLM\-RoBERTa, Llama, and Gemma, using a multilabel classification approach\. This approach links an input text to one or more of the six major German political parties\. After training, the best\-performing classifier among the 13 candidates is evaluated using the out\-of\-domain newspaper data\. The training process itself involves feeding lines similar to those in Table [4](https://arxiv.org/html/2605.14352#A1.T4) into the pretrained foundation models and fine\-tuning their weights for four epochs\. For a detailed list of models and used parameter specifications, see Table [1](https://arxiv.org/html/2605.14352#Sx3.T1)\. The six parties Die Linke, Bündnis 90 Die Grünen, SPD, FDP, CDU/CSU, and AfD were selected based on their consistent representation in the German parliament over the past few years\. For the training, we focused on likes, excluding dislikes\.

The training was conducted on various GPU servers, ranging from 4 A6000 Ada to 8 H200 GPUs\. All training files are publicly accessible \(Schneider [2025a](https://arxiv.org/html/2605.14352#bib.bib41)\), and the DeBERTa model was executed multiple times to identify the optimal hyperparameter configuration\. Given that larger models required several days to train, we were unable to conduct a full training run for each configuration\. Instead, we stopped training when the training loss no longer decreased and adjusted the parameters accordingly\.

Table 1: Overview of the used models, parameter sizes, evaluation metrics, training hours, and hardware used for training, i.e., a. 4 A6000 Ada GPUs (4 × 48 GB VRAM); b. 8 H100 GPUs (8 × 80 GB); c. 8 H200 GPUs (8 × 141 GB)

### From Multilabel to a Continuous Scale

At this stage, it is necessary to explain why we have introduced a multilabel classifier while simultaneously representing a continuous output for political direction on a scale from \-1 \(left\-wing\) to 1 \(right\-wing\)\. The key missing element is an adaptation model that translates the outputs of the multilabel classifiers \(which correspond to six political parties\) into the left\-right spectrum\. This adaptation model is based on the premise that each political party can be positioned on a left\-right continuum, with varying degrees of liberalism\. An alternative geometric representation is a semicircle\. In this representation, we position three fixed points: Die Linke, the most left\-wing party, on the far left; the FDP, an economically liberal German party, at the center; and the AfD, a strongly right\-oriented party, at the far right end of the semicircle\.

The remaining task is to determine the positioning of the other three parties\. Based on known political positions of the German parties, we know that the CDU is more conservative than the FDP, so it should be placed somewhere between the fixed points represented by the FDP and the AfD\. Additionally, we recognize that the Grüne and SPD parties are more left\-leaning than the FDP, indicating that they should be positioned between the fixed points of Die Linke and the FDP\.

To begin our analysis with the party Die Grünen, we need to determine whether they align more closely with Die Linke or the FDP\. To do so, we use the Wahlomat dataset described above\. As it contains responses from political parties to the statements, we can compute the overlap across parties\.

Consider the following scoring system for measuring agreement: assign a distance of 0.0 to two parties who provide identical answers, a distance of 0.5 to two parties whose responses differ slightly (one agrees or disagrees with a given statement while the other remains neutral), and a distance of 1.0 to two parties who are in complete disagreement. Figure [2](https://arxiv.org/html/2605.14352#Sx3.F2)a illustrates how the principle operates using the example of a particular election. Whenever there is an overlap in opinions, such as both parties endorsing the same statement, a striped pattern appears. The greater the number of striped boxes, the more similar the two parties are. In Figure [2](https://arxiv.org/html/2605.14352#Sx3.F2)b, we see that the Grüne party has the most overlaps with Die Linke. Meanwhile, Figure [2](https://arxiv.org/html/2605.14352#Sx3.F2)c accurately positions the Grüne party between Die Linke and the FDP, reflecting the calculated distances and angles.
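The scoring rule described in this paragraph can be sketched in a few lines. The numeric encoding of answers (+1 agree, 0 neutral, -1 disagree) is an assumption made for this illustration, and the two answer lists are made-up toy data, not real party responses.

```python
from typing import Sequence

# Assumed encoding of Wahl-O-Mat answers: +1 = agree, 0 = neutral, -1 = disagree.


def statement_distance(a: int, b: int) -> float:
    """0.0 for identical answers, 0.5 if exactly one side is neutral, 1.0 for opposite answers."""
    return abs(a - b) / 2


def party_distance(answers_a: Sequence[int], answers_b: Sequence[int]) -> float:
    """Mean per-statement distance over the statements both parties answered."""
    if len(answers_a) != len(answers_b) or not answers_a:
        raise ValueError("need two non-empty answer lists of equal length")
    total = sum(statement_distance(a, b) for a, b in zip(answers_a, answers_b))
    return total / len(answers_a)


# Toy example with four statements (made-up answers).
party_x = [1, 1, 0, -1]
party_y = [1, 0, 0, 1]
print(party_distance(party_x, party_y))  # 0.375
```

With this encoding, the three cases of the scoring system fall out of a single absolute difference, so no explicit case distinction is needed.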

The following calculation will be used for the sake of illustration. The Green Party and the Left Party collectively addressed 2,111 questions, providing identical responses to $I = 1{,}530$ of them. In $P = 284$ cases, one party took a neutral stance while the other either agreed or disagreed. Furthermore, on $O = 297$ questions, the two parties expressed differing opinions.

The Green Party and the Liberal Party (FDP) answered a total of 2,249 questions together. They fully agreed on $I = 828$ questions, partially agreed on $P = 383$ questions, and disagreed on $O = 1{,}038$ questions.

Let $d_{(a,b)} := (0.5 \cdot P + O)/T$ denote the estimated distance of two parties $a$ and $b$, where $T = I + P + O$. Regarding the example, this yields $d_{B'90,Linke} = 0.208$ and $d_{B'90,FDP} = 0.547$. The relative proximity of a party $a$ to the two reference parties $b$ and $c$ is then mapped onto the interval $[-90^{\circ}, 0^{\circ}]$ for left-wing and $[0^{\circ}, 90^{\circ}]$ for right-wing parties by using $\theta_a = \varphi \cdot \big(d_{(a,b)}\big) / \big(d_{(a,c)} + d_{(a,b)}\big)$. Regarding the example of B’90 - die Grünen being a potentially left party, $\varphi = -90^{\circ}$ is used, leading to $\theta_{B'90} \approx -65.2^{\circ}$. The same reference parties are used for SPD, yielding $\theta_{SPD} \approx -53.9^{\circ}$. However, using $\varphi = 90^{\circ}$ the more right-wing CDU party is compared with FDP and AfD, resulting in $\theta_{CDU} \approx 37.9^{\circ}$. For implementation, the arctan2 function is used to determine the quadrant of the circle. See Table [2](https://arxiv.org/html/2605.14352#Sx3.T2) for an overview of all six party vectors.
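The arithmetic of this example can be retraced with a short sketch. The function and argument names are illustrative. The assignment of the reference parties is an inference from the reported numbers rather than something the text states: the value of about -65.2° is reproduced when $b$ is the center reference (FDP) and $c$ is the outer reference (Die Linke).

```python
def distance(identical: int, partial: int, opposite: int) -> float:
    """d = (0.5 * P + O) / T with T = I + P + O."""
    total = identical + partial + opposite
    return (0.5 * partial + opposite) / total


def party_angle(d_center: float, d_outer: float, phi_deg: float) -> float:
    """theta = phi * d_center / (d_outer + d_center), in degrees.

    Assumption: d_center is the distance to the center reference (b),
    d_outer the distance to the outer reference (c).
    """
    return phi_deg * d_center / (d_outer + d_center)


d_linke = distance(1530, 284, 297)  # B'90 vs. Die Linke
d_fdp = distance(828, 383, 1038)    # B'90 vs. FDP
theta_b90 = party_angle(d_center=d_fdp, d_outer=d_linke, phi_deg=-90.0)

print(round(d_linke, 3), round(d_fdp, 3), round(theta_b90, 1))  # 0.208 0.547 -65.2
```

Under this reading, a party that is far from the FDP and close to the outer reference receives an angle close to $\varphi$, while a party close to the FDP stays near the center.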

Based on these angles $\theta_i$, we calculate unit vectors $\mathbf{v}_i$ for each party $i \in \{\mathrm{Linke}, \mathrm{B'90}, \mathrm{SPD}, \mathrm{FDP}, \mathrm{CDU}, \mathrm{AfD}\}$ as $\mathbf{v}_i := \big(\sin(\theta_i), \cos(\theta_i)\big)$.

Table 2: Party angles $\theta_i$ and unit vectors $\mathbf{v}_i$.

In the final step, each output $p_i$ from the multilabel classifier is multiplied by the corresponding vector $\mathbf{v}_i$, and the resulting vectors are summed. Formally, $\mathbf{v}_{\mathrm{res}} = \sum_i p_i \, \mathbf{v}_i$. Finally, the angle is calculated using $\mathrm{atan2}(\mathbf{v}_{\mathrm{res}})$, applied to the two components of $\mathbf{v}_{\mathrm{res}}$, and divided by $\pi/2$ to convert the resulting angle into the final classification score.
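A minimal sketch of this projection step is shown below. The angles are the rounded values quoted in the text (the three fixed points plus the three computed angles). The contents of Table 2 are not reproduced here, so exact agreement with the paper's worked example is not guaranteed. The function name and the dictionary keys are illustrative.

```python
import math
from typing import Dict

# Party angles in degrees, taken from the rounded values quoted in the text
# (fixed points at -90, 0, 90; the other three from the Wahl-O-Mat distances).
PARTY_ANGLES_DEG: Dict[str, float] = {
    "Linke": -90.0, "B90": -65.2, "SPD": -53.9,
    "FDP": 0.0, "CDU": 37.9, "AfD": 90.0,
}


def ideology_score(
    probs: Dict[str, float],
    angles_deg: Dict[str, float] = PARTY_ANGLES_DEG,
) -> float:
    """Project multilabel party probabilities onto [-1, 1] via the weighted vector sum."""
    x = sum(p * math.sin(math.radians(angles_deg[k])) for k, p in probs.items())
    y = sum(p * math.cos(math.radians(angles_deg[k])) for k, p in probs.items())
    if x == 0.0 and y == 0.0:
        raise ValueError("all probabilities are zero; the angle is undefined")
    theta = math.atan2(x, y)  # angle measured from the vertical (FDP) axis
    return theta / (math.pi / 2)


print(round(ideology_score({"Linke": 1.0}), 3))  # -1.0
print(round(ideology_score({"FDP": 1.0}), 3))    # 0.0
print(round(ideology_score({"AfD": 1.0}), 3))    # 1.0
```

Because each unit vector is defined as $(\sin\theta_i, \cos\theta_i)$, the sine component is passed as the first argument of `atan2`, so a text assigned entirely to one party maps back to that party's own angle.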

### Overall Architecture

In this section, we combine all the building blocks that have been introduced in the previous sections. This is illustrated by processing the sentence: “Familienpolitik soll Wahlfreiheit ermöglichen: gute Kitas, Ganztagsschulen und flexible Arbeitsmodelle.” (Family policy should enable freedom of choice: good daycare centers, all-day schools, and flexible working models). Since we are discussing social benefits, we expect a left-leaning result ($\mathrm{score} < 0$).

#### Political Classifier

First, we check whether the example is political by using the DeBERTa political classifier introduced previously\. This results in a score of 0\.99, indicating that this statement is political\.

#### Political Party Classifier

If a text is classified as political, it is next processed using all 13 trained political party classifiers. For this example, we use gemma2-9b solely, which yields the following probabilities: $P(\mathrm{party} = \text{‘Linke’}) = 0.0307$, $P(\text{‘B'90’}) = 0.2806$, $P(\text{‘SPD’}) = 0.2743$, $P(\text{‘FDP’}) = 0.4508$, $P(\text{‘CDU’}) = 0.0698$, $P(\text{‘AfD’}) = 0.0011$.

#### From Multilabel to a Continuous Scale

We multiply the obtained model probabilities by the given party vectors and calculate the combined vector $\mathbf{v}_{\mathrm{res}} = (-0.159, 0.277)$.

From the combined result vector, we can accurately calculate the angle using the arctangent or atan2 function, which considers all four quadrants of the circle\.

$$\theta_{\mathrm{result}} = \mathrm{atan2}(-0.159,\; 0.277) \approx -0.521\ \mathrm{rad} \approx -29.851^{\circ}$$

To obtain a score $s$, scaled between -1 and 1 instead of -90 and 90, we must divide the result by 90 degrees or by $\pi/2$ when using radians: $\mathrm{score} = \theta_{\mathrm{result}}^{\circ} / 90^{\circ} \approx -0.332$.
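The last two steps can be checked numerically from the rounded vector reported above. Because the vector components are rounded to three decimals, the angle in degrees may differ from the reported value in the last digits, while the radian value and the score agree at the reported precision.

```python
import math

v_res = (-0.159, 0.277)  # combined vector as reported in the text (rounded)

theta_rad = math.atan2(v_res[0], v_res[1])
theta_deg = math.degrees(theta_rad)

score_from_rad = theta_rad / (math.pi / 2)
score_from_deg = theta_deg / 90.0

print(round(theta_rad, 3), round(score_from_rad, 3))  # -0.521 -0.332
```

Dividing the radian value by $\pi/2$ and dividing the degree value by 90 are the same normalization, so both variants yield the same score.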

Our example regarding family politics has resulted in a slightly left\-leaning vector, as assumed previously\.

### Evaluation Using Newspapers

The final step of the evaluation involves using the newspaper ratings from Mediencompass.org to compare its general orientation with the classifier’s results. For instance, the newspaper Bild (Springer [2025](https://arxiv.org/html/2605.14352#bib.bib62)) is rated 5.2 (0.4 on our scale, $[-1, 1]$) by the project, indicating a slightly right-wing orientation. Since our classifier operates at the per-article level, whereas the ground truth is at the per-newspaper level, we need to aggregate our classifier’s results across all political articles and evaluate how closely the computed average aligns with the ground truth. As explained above, a filtering process is essential before assessing the political direction, ensuring that all articles meet a politicalness threshold of 0.8. If the average classification of all political articles in the newspaper Bild were 0.3, the error would be 0.1. The mean error across all 33 newspapers then reflects the quality of the specific trained classification model.

Let $a$ be an article of a newspaper $A$ out of the set of newspapers $\mathcal{A}$.

First, we compute the political leaning for each article $a$ in a newspaper $A$, average these values per newspaper, and compare the result with the expected leaning $L_A$ based on Mediencompass.org (newspaper level). The mean absolute error $\mathrm{MAE}$ of the tested model is then the average, over all newspapers, of the absolute difference between the predicted and the expected leaning.
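The newspaper-level evaluation can be sketched as follows. The function name and data layout are illustrative. The reference value of 0.4 for Bild is taken from the text, while the article scores and the second newspaper are made-up toy data.

```python
from statistics import mean
from typing import Dict, List


def newspaper_mae(
    article_scores: Dict[str, List[float]],
    reference: Dict[str, float],
) -> float:
    """Average |mean article score - reference leaning| over all newspapers."""
    errors = []
    for paper, scores in article_scores.items():
        if not scores:
            continue  # no political articles left after filtering
        errors.append(abs(mean(scores) - reference[paper]))
    if not errors:
        raise ValueError("no newspaper with at least one scored article")
    return mean(errors)


# Toy data: the Bild reference of 0.4 is from the text; everything else is made up.
article_scores = {"Bild": [0.2, 0.3, 0.4], "ToyPaper": [-0.5, -0.3]}
reference = {"Bild": 0.4, "ToyPaper": -0.5}
print(round(newspaper_mae(article_scores, reference), 3))  # 0.1
```

Averaging per newspaper first and only then across newspapers gives every outlet the same weight, regardless of how many of its articles pass the politicalness filter.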

### Final Optimization

![Refer to caption](https://arxiv.org/html/2605.14352v1/x3.png)

Figure 3: Comparison of the party vectors before and after the optimization for Gemma2-2b

In our final step, we aim to refine the vector model to better align with the newspaper data. Specifically, while we optimize for the evaluation data, we impose a constraint that limits the adjustment of each party vector to a maximum of 0.25 in either direction. This new optimization builds on our initial use of the Wahlomat responses, in which we anchored the liberal party, the FDP, at the top of the semicircle. Although this initial method involved several guiding assumptions, it did not guarantee an optimal outcome. By introducing these constraints, we acknowledge the value of our initial approach and seek to prevent the model from overfitting to the evaluation data.

Moreover, we keep Die Linke fixed on the far left and the AfD fixed on the far right, while allowing adjustments to the liberal FDP and the remaining parties. This decision also has a technical basis. If the leftmost and rightmost vectors were moved upward, the original extreme positions could no longer be reached through vector combinations without negative contributions, which is not feasible, as the multilabel classifier only produces outputs ranging from 0 to 1. Thus, adjusting the leftmost and rightmost vectors would restrict the set of reachable vectors.

$$\min_{\{\Delta v_p\}_{p \in \mathcal{P}}} \ \mathrm{MAE}(\tau) = \frac{1}{|\mathcal{A}|} \sum_{A \in \mathcal{A}} \left| \widehat{L}_A(\tau; \{v_p + \Delta v_p\}) - L_A \right|$$

$$\text{subject to} \quad \|\Delta v_p\| \leq \delta_p \quad \forall\, p \in \mathcal{P}$$

$$\text{with} \quad \delta_p = \begin{cases} 0 & \text{if } p \in \{\mathrm{Linke}, \mathrm{AfD}\} \\ 0.25 & \text{else} \end{cases}$$
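The constrained problem above can be illustrated with a small sketch. The text only speaks of a numerical optimization, so the random search used here is a swapped-in technique chosen for brevity, not the authors' optimizer. The party angles are the rounded values quoted in the text, the newspapers and classifier outputs are made-up toy data, and the sketch does not renormalize shifted vectors, which is a further assumption.

```python
import math
import random
from typing import Dict, List, Tuple

Vec = Tuple[float, float]
Article = Dict[str, float]


def score(probs: Article, vectors: Dict[str, Vec]) -> float:
    """Weighted vector sum mapped to a score via atan2, as in the projection step."""
    x = sum(p * vectors[k][0] for k, p in probs.items())
    y = sum(p * vectors[k][1] for k, p in probs.items())
    return math.atan2(x, y) / (math.pi / 2)


def mae(vectors: Dict[str, Vec], papers: Dict[str, List[Article]],
        reference: Dict[str, float]) -> float:
    """Mean absolute error between averaged article scores and reference leanings."""
    errs = [abs(sum(score(a, vectors) for a in arts) / len(arts) - reference[name])
            for name, arts in papers.items()]
    return sum(errs) / len(errs)


def clip(delta: Vec, max_norm: float) -> Vec:
    """Scale a shift back so that its Euclidean norm does not exceed max_norm."""
    norm = math.hypot(delta[0], delta[1])
    if norm <= max_norm:
        return delta
    scale = max_norm / norm
    return (delta[0] * scale, delta[1] * scale)


def optimize(base, papers, reference, max_shift, steps=2000, step_size=0.05, seed=0):
    """Random search over per-party shifts; keeps a proposal only if the MAE improves."""
    rng = random.Random(seed)
    deltas = {k: (0.0, 0.0) for k in base}

    def shifted(d):
        return {k: (base[k][0] + d[k][0], base[k][1] + d[k][1]) for k in base}

    best = mae(shifted(deltas), papers, reference)
    for _ in range(steps):
        party = rng.choice(list(base))
        dx, dy = deltas[party]
        proposal = clip((dx + rng.gauss(0, step_size), dy + rng.gauss(0, step_size)),
                        max_shift[party])
        trial = dict(deltas, **{party: proposal})
        err = mae(shifted(trial), papers, reference)
        if err < best:
            deltas, best = trial, err
    return deltas, best


angles = {"Linke": -90.0, "B90": -65.2, "SPD": -53.9, "FDP": 0.0, "CDU": 37.9, "AfD": 90.0}
base = {k: (math.sin(math.radians(a)), math.cos(math.radians(a))) for k, a in angles.items()}
max_shift = {k: (0.0 if k in ("Linke", "AfD") else 0.25) for k in base}

# Made-up classifier outputs and reference leanings for two toy newspapers.
papers = {
    "ToyLeft": [{"Linke": 0.2, "B90": 0.6, "SPD": 0.5}, {"SPD": 0.7, "FDP": 0.2}],
    "ToyRight": [{"CDU": 0.8, "FDP": 0.3}, {"CDU": 0.5, "AfD": 0.4}],
}
reference = {"ToyLeft": -0.5, "ToyRight": 0.5}

before = mae(base, papers, reference)
deltas, after = optimize(base, papers, reference, max_shift)
print(after <= before)  # True
```

Setting the maximum shift of Die Linke and the AfD to zero mirrors the case distinction for $\delta_p$, and clipping each proposal keeps every accepted shift inside the allowed radius of 0.25.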

## Results

Table 3: Pre- versus post-optimization comparison of the used models by mean squared error (MSE) and mean absolute error (MAE), ordered by post-optimization MSE

### Findings

Upon reviewing the results, we conclude that our transformer models, when combined with the party vectors, effectively identify political stances in German texts. The accuracy of our classifier closely aligns with that of public left/right polls, suggesting that its outcomes are consistent with those of human raters. Additionally, we found that a model’s size does not necessarily guarantee effectiveness across all scenarios, as depicted and further explained in Figure [7](https://arxiv.org/html/2605.14352#A1.F7). For instance, both Llama 3.2 models (with 1B and 3B parameters) performed worse on in- and out-of-domain classification than the significantly smaller DeBERTa-large model (435M). The DeBERTa-large model achieved the highest in-domain performance, with an $F_1$ score of 0.84, while Gemma2-2b demonstrated the best out-of-domain results both before and after optimization, as illustrated in Table [3](https://arxiv.org/html/2605.14352#Sx4.T3).

### In\-Domain Model Performance

To evaluate the models’ in-domain performance, we used the 20% test splits of the Bundestag and Wahlomat datasets. The models that performed best were DeBERTa-large ($F_1 = 0.84$), Gemma-2-9b ($F_1 = 0.79$), and EuroBERT-610m ($F_1 = 0.79$).

### Out\-of\-Domain Model Performance on Tweets

The out-of-domain evaluation was carried out using posts from members of the German Bundestag. It is important to note that our knowledge of each tweet is limited to its author and their associated party; we do not have information about the tweet’s political stance or how it might be received by others. Consequently, our evaluation focused solely on the accuracy of the three top-performing classifiers from the earlier in-domain classification task. We observed a strong correlation between tweet length and the accuracy of our classifier, with $0.96 \leq r \leq 0.97$ depending on the model used. This aspect is crucial, as tweets are often quite brief, and in some cases, the author may require the reader to rely on external context to fully understand a quote.

As illustrated in Figure[4](https://arxiv.org/html/2605.14352#Sx4.F4), the accuracy for shorter tweets ranges between 50% and 65%\. However, this accuracy increases to over 80% when tweets contain 50 or more words\.
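A correlation of this kind can be computed as sketched below. The text does not state whether the coefficient was calculated per tweet or per length bin, so the binning used here is an assumption, and the numbers are made-up illustration values rather than the paper's measurements.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up per-bin values for illustration only (not the paper's measurements).
words_per_tweet = [5, 15, 25, 35, 45, 55]
accuracy = [0.52, 0.58, 0.66, 0.71, 0.78, 0.82]
r = pearson_r(words_per_tweet, accuracy)
print(r > 0.9)  # True
```

Aggregating tweets into length bins before correlating smooths out per-tweet noise, which is one way a coefficient this close to 1 can arise.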

![Refer to caption](https://arxiv.org/html/2605.14352v1/x4.png)

Figure 4: Classifier performance on tweets

### Out-of-Domain Model Performance on Newsmedia

The out-of-domain evaluation on newspapers was conducted using 33 German news outlets, ranging from left-wing publications like Jungle World to right-wing ones, such as Compact. Only news articles with a political score of 0.8 or higher were processed further and scored by the separate party classifier. We report the MAE together with its percentage of the total scale width of 2 (the interval $[-1, 1]$), which provides a clearer perspective. The top-performing models were Gemma2-2b with an MAE of 0.1852 (9.26%), Gemma2-9b with an MAE of 0.1859 (9.29%), and gbert-large with an MAE of 0.1965 (9.82%). Notably, the ranking differs from the in-domain results, suggesting that some models generalize more effectively than others.
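The percentages follow from dividing each MAE by the width of the scale. The helper name below is illustrative, and the MAE values are the ones reported in this paragraph. For the second and third model, the reported 9.29% and 9.82% agree with the computed values up to rounding of the last digit.

```python
SCALE_WIDTH = 2.0  # width of the interval [-1, 1]


def mae_percent(mae: float) -> float:
    """Express an MAE as a percentage of the full scale width."""
    return 100.0 * mae / SCALE_WIDTH


for name, mae in [("Gemma2-2b", 0.1852), ("Gemma2-9b", 0.1859), ("gbert-large", 0.1965)]:
    print(name, f"{mae_percent(mae):.3f}%")
```

Expressed this way, an error of 0.1852 means that the predicted newspaper leaning deviates from the reference by a little over 9% of the full left-to-right range on average.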

### Effect of the Vector Optimization

Following the initial out-of-domain test, we aimed to determine whether a different vector alignment could minimize the mean absolute error. We conducted a numerical optimization that allowed us to adjust each vector by ±0.25, except for the extreme positions held by Die Linke and AfD, which had to remain fixed to ensure the model’s functionality. Our findings revealed that the optimizer successfully minimized the mean absolute error while keeping the adjustments within the specified range. Figure [3](https://arxiv.org/html/2605.14352#Sx3.F3) illustrates the party vectors before and after the optimization process. Notably, the Grüne party shifted further to the left, while the FDP and CDU moved to the right. Post-optimization, each model possessed its own tailored set of vectors for all six parties.

After optimization, the mean absolute error decreased by 0.0239 on average across all models, corresponding to 1.1946% of the scale. EuroBERT-610m demonstrated the most substantial reduction, with a decrease of 0.304 or 5.73%, whereas EuroBERT-2.1B was the only model to perform worse, with its error increasing by 0.0016 or 0.08%.

## Discussion

To train multiple BERT, Llama, and Gemma LLMs, a wide variety of data was used, including minutes from the German Bundestag and statements from a digital voters’ guide (Wahlomat). Unlike the Bundestag data, which captures attitudes implicitly (e.g., via emotions), the Wahlomat dataset is based on explicit information provided directly by the parties. In order to further increase the (linguistic) variance, both datasets were artificially enriched. The models’ multi-label responses were translated into a numerical continuum from -1 (left) to 1 (right). Performance was tested in-domain (test set) and out-of-domain (independent newspaper dataset and tweets).

We have observed that a trained multi\-label transformer model, when combined with the appropriate party vector projection, can recognize political stances at a level comparable to that of polls for political classification\. During our search for the optimal transformer model for our classifier, we discovered that the best\-performing model within the specific domain \(DeBERTa\-large with an F1 score of 0\.84\) does not necessarily translate to being the best for other domains\. In the out\-of\-domain test, DeBERTa\-large was outperformed by Gemma\-2\-2B, indicating that a combination of model size, architecture, and training data is crucial for achieving superior out\-of\-sample accuracy\.

The German-pretrained DeBERTa-large model demonstrated superior performance in in-domain classification, attributable to its extensive training corpus, as detailed in the models section of the methodology chapter. Unlike the conventional approach of relying solely on the OSCAR dataset (Ortiz Suárez et al. [2019](https://arxiv.org/html/2605.14352#bib.bib54)), the developers of the German DeBERTa model (Dada et al. [2023](https://arxiv.org/html/2605.14352#bib.bib42)) leveraged a more diverse dataset encompassing multiple domains. This strategic selection, incorporating formal, informal, legal, medical, and literary texts, enhanced the model’s in-domain efficacy, underscoring the importance of data diversity in model training.

A plausible reason for the outstanding performance of the Gemma2-2b and Gemma2-9b models (Riviere et al. [2024](https://arxiv.org/html/2605.14352#bib.bib59)) in the out-of-domain classification task is their training paradigm. By employing knowledge distillation from larger models, these models function as effectively condensed experts. This approach enables them to generalize significantly better than other models within the same size category.

Additionally, we found that providing the vectors from our initial approach to an optimizer, allowing it some flexibility in adjusting those vectors, can further enhance out\-of\-domain accuracy\. Our method represents a novel approach compared to many existing techniques, which rely on discrete labels\. Despite the evolving political landscape, our proposed methodology eliminates the need for manual labeling\. This enables us to conduct periodic retraining by updating only the training datasets for our models\.

### Practical Implications

It is important to acknowledge that individuals exhibit biases to varying degrees\. However, being aware of these biases can significantly enhance our understanding of the world\. A classifier like the one introduced in this paper can assist in categorizing news outlets, authors, discussion threads, and various conversations, helping to prevent us from becoming trapped in echo chambers and encouraging a broader perspective\. Additionally, a browser plugin could display the bias of every newspaper we visit\.

An implementation could involve tracking a rolling score over a day or week to monitor the trajectory of topics, newspapers, or discussion threads\. Warning signals could be triggered if a news outlet remains too entrenched in one extreme for an extended period\.

Another application could involve conducting targeted searches for discussions aligned with left\- or right\-wing ideologies to analyze the topics with which specific groups engage\. This would facilitate a deeper understanding of discourse patterns and sentiment within these ideological communities\.

Our approach is highly adaptable to different countries and use cases because we avoid manual labeling and extract the political spectrum directly from the data\. For instance, if there is a political shift in Germany, we can easily accommodate this by retraining the model\. In contrast, comparable projects that rely on manual labeling require a significantly larger workforce\. Our primary requirements are political texts from parties and fixed reference points, such as newspapers\. A well\-distributed representation of political parties within the spectrum is particularly beneficial, as it facilitates coverage of various points by combining those vectors\. In contexts with only two major parties, achieving fine\-grained classification can be more challenging because there is no liberal segment between them\.

Our paper can also serve as a foundation for further research on social media\. For instance, if a researcher has collected sufficient social media data and aims to track political shifts before, during, and after a governmental transition, they can use our tool\.

Adapting the tool for other languages and cultural contexts is also possible\. We introduce a new method for effectively classifying texts along the political spectrum, using comments and other indicators as proxies to construct a training corpus\.

### Limitations

The limitations can be categorized as model-related and methodological. In our case, model-related limitations arise from using a classification model rather than a reasoning-based transformer model. At times, when a text quotes individuals, it can be challenging for the model to discern whether the quoted opinion is being critiqued. For example, consider the tweet: “‘Those who want human society must overcome male society.’ Svenja Schulze (February 17, 2022) This was or is also stated in the SPD party program.” Based on the available information, we cannot determine whether the tweet’s author agrees or disagrees with the statement. The classifier will categorize it as left-wing solely on the basis of the quoted content.

Another limitation stemming from the lack of reasoning capabilities lies in the evaluation of foundational statements. For instance, when we input Pierre-Joseph Proudhon’s statement “Property is theft” (Proudhon [1840](https://arxiv.org/html/2605.14352#bib.bib63)) into our gemma-2-9b classifier, it produces a score of 0.78, classifying it as a right-wing assertion. Although Proudhon is typically recognized as a figure of the libertarian left (Honeywell [2021](https://arxiv.org/html/2605.14352#bib.bib66); Levy and Adams [2019](https://arxiv.org/html/2605.14352#bib.bib67)), some scholars have controversially interpreted him as a precursor to fascism (Schapiro [1945](https://arxiv.org/html/2605.14352#bib.bib64); Krier [2009](https://arxiv.org/html/2605.14352#bib.bib65)). In this scenario, it would be intriguing to understand why the classification model categorized the statement as right-wing. However, because we are not using a reasoning model, we cannot examine the rationale for the classification. A possible explanation for the misclassification in this example is that such ideological statements are neither discussed in parliament nor part of the political agenda of mainstream political parties.

Given the increase in accuracy associated with a higher number of words per tweet, we conclude that the models may struggle with very short texts, particularly when background knowledge is required to interpret their meaning\.

From a methodological standpoint, a one-dimensional projection may not carry enough information to accurately map political views in certain instances. This becomes evident when we examine Figure [1](https://arxiv.org/html/2605.14352#Sx3.F1). One might intuitively assume that the left-most party, Die Linke, and the right-most party, AfD, exhibit the greatest distance. However, the negative correlations between the right-wing AfD and the Grüne (-0.32) and SPD (-0.34) parties are significantly stronger than that between the AfD and Die Linke (-0.18).

One possible interpretation for this phenomenon is that both the AfD and Die Linke respond to certain issues in similar ways, albeit for distinct reasons\. For instance, consider the issue of supplying arms to a nation under attack by its neighbor\. A left\-wing party such as Die Linke would oppose these arms deliveries from a pacifist perspective, whereas the right\-wing AfD would object from a nationalist perspective, prioritizing Germany’s interests and its trade relations with the aggressor state\.

Another possible explanation is that both the left\-wing Die Linke and the right\-wing AfD are not part of the governing coalitions and instead belong to the opposition\. In this role, opposition parties commonly critique the governing parties’ policies and actions\. Consequently, this can lead to the formation of similar opinions on both ends of the political spectrum\.

The political shift occurring in numerous societies has been noted, but it also presents limitations\. What may be viewed as slightly left or right today could be regarded as a liberal stance in the near future\. To address this issue, it is recommended that the classifier be retrained with updated data whenever such a shift becomes apparent or at regular intervals\.

A classifier’s applicability is inherently limited to the linguistic and cultural context in which it was trained\. For instance, the fundamentally divergent perspectives on public health care and labor rights observed in Germany and the United States underscore the pitfalls of deploying a classifier across different cultural frameworks without appropriate adaptation\. The cultural context not only establishes limits for the classifier, but the diversity of the input data also plays a significant role\. Positions further to the left of the party Die Linke, as well as those to the right of the party AfD, cannot be distinguished from the positions of these two parties since the scale based on the six parties is bounded by these parties\.

It is essential to note that we employ a classification-based approach, which inherently lacks interpretability and reasoning. The classifier cannot provide insights into why it categorizes a specific text as belonging to the Grüne or SPD parties. Additionally, it does not explicitly model underlying principles of left- or right-wing politics, as it is not designed to serve as a reasoning model. While it effectively categorizes topics in the training data, it may struggle to classify novel concepts, as it lacks the capacity to reason about them. In summary, because the classifier can make errors, it should not be relied upon to block texts without additional verification or similar measures. To minimize the impact of misclassification, it is advisable to apply the classifier to a large volume of texts, such as those in a newspaper, as illustrated in the given example. This way, the significance of any single mistake is diminished.

### Future Work

To address the limitations discussed above, it would be advantageous to develop a classifier capable of explaining why it assigns a text to a specific position on the left\-right spectrum\. This feature would help users understand the rationale for classifying a text as far\-right or far\-left and would also facilitate the identification of potential errors\. Users could contest the classifier’s assessment and dismiss the result in cases of inaccuracies, rather than accepting its conclusions unquestioningly\. Additionally, a reasoning model would enhance its utility by enabling it to apply foundational concepts of left, liberal, and right\-wing politics to new topics not encountered in its training data\. Improved explainability and generalizability could benefit future developments\.

### Social Impact and Misuse

The developed models, once sufficiently advanced for everyday use, could potentially pose risks if individuals rely on them to filter news\. Instead of expanding their perspectives, users might choose to exclude all news that exceeds a certain threshold in a political direction, whether left or right\. As a result, what is intended to be a useful application could also be misused\.

A second potential misuse of the model is the risk of discrimination against individuals based on their political beliefs\. On a broader level, institutions could utilize the model to monitor live social media streams and identify individuals expressing dissenting opinions\.

We firmly oppose all forms of discrimination on the basis of political viewpoints and strongly advocate for free speech\. Additionally, we are committed to transparency by making all training data publicly available, ensuring that our models and their results are clear and accessible\.

## Ethical Statement

All of our training data has been sourced exclusively from publicly available materials and is intended solely for academic use\. We did not bypass any safety mechanisms or use data behind a paywall\. Additionally, we have not collected any personal data, except for political speeches by public figures\.

We emphasize that the introduced classification method is still in its early stages, and errors cannot be discounted\. Furthermore, our classifier should not be used to evaluate or discriminate against individuals on the basis of their political beliefs\. We strongly advocate for a diverse political landscape and uphold the principles of free speech in respectful interactions, free from personal attacks\.

## Acknowledgments

The authors would like to thank the System Sciences Chair for Communication Systems and Network Security under the direction of Prof\. Dr\. Gabi Dreo Rodosek\.

The authors acknowledge the financial support from the Federal Ministry of Education and Research of Germany in the program “Souverän\. Digital\. Vernetzt\.” Joint project 6G\-life, project identification number: 16KISK002\.

## References

- D. Aksenov, P. Bourgonje, K. Zaczynska, M. Ostendorff, J. Moreno-Schneider, and G. Rehm (2021) Fine-grained Classification of Political Bias in German News: A Data Set and Initial Experiments. In Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH), A. Mostafazadeh Davani, D. Kiela, M. Lambert, B. Vidgen, V. Prabhakaran, and Z. Waseem (Eds.), Stroudsburg, Penn., pp. 121–131. External Links: [Link](https://aclanthology.org/2021.woah-1.13/), [Document](https://dx.doi.org/10/gm6kgs). Cited by: [Related Work](https://arxiv.org/html/2605.14352#Sx2.p3.1).
- AllSides (2025) AllSides: unbiased balanced news. Note: https://www.allsides.com/ Accessed 2025-09-10. Cited by: [Related Work](https://arxiv.org/html/2605.14352#Sx2.p2.1).
- C. Andrzejewski (2023) Team Jorge: In the heart of a global disinformation machine. (en-GB). Note: https://forbiddenstories.org/team-jorge-disinformation Accessed 2026-01-15. External Links: [Link](https://forbiddenstories.org/team-jorge-disinformation/). Cited by: [Introduction](https://arxiv.org/html/2605.14352#Sx1.p1.1).
- R. Baly, G. Da San Martino, J. Glass, and P. Nakov (2020) We can detect your bias: predicting the political ideology of news articles. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), B. Webber, T. Cohn, Y. He, and Y. Liu (Eds.), Online, pp. 4982–4991. External Links: [Link](https://aclanthology.org/2020.emnlp-main.404/), [Document](https://dx.doi.org/10/gm3nx6). Cited by: [Related Work](https://arxiv.org/html/2605.14352#Sx2.p1.1).
- R. Baly, G. Karadzhov, A. Saleh, J. Glass, and P. Nakov (2019) Multi-task ordinal regression for jointly predicting the trustworthiness and the leading political ideology of news media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, J. Burstein, C. Doran, and T. Solorio (Eds.), Vol. 1, Minneapolis, Minn., pp. 2109–2116. External Links: [Link](https://aclanthology.org/N19-1216/), [Document](https://dx.doi.org/10.18653/v1/N19-1216). Cited by: [Related Work](https://arxiv.org/html/2605.14352#Sx2.p7.1).
- N. Boizard, H. Gisserot-Boukhlef, D. M. Alves, A. Martins, A. Hammal, C. Corro, C. Hudelot, E. Malherbe, E. Malaboeuf, F. Jourdan, G. Hautreux, J. Alves, K. E. Haddad, M. Faysse, M. Peyrard, N. M. Guerreiro, P. Fernandes, R. Rei, and P. Colombo (2025) EuroBERT: Scaling Multilingual Encoders for European Languages. External Links: [Link](https://openreview.net/forum?id=jdOC24msVq#discussion), 2503.05500. Cited by: [Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p2.1).
- F. Bolte (2025) Qual-o-mat-data. GitHub. Note: https://github.com/gockelhahn/qual-o-mat-data Accessed 2025-09-10. Cited by: [Wahlomat Dataset](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx2.p2.1).
- B. Chan, S. Schweter, and T. Möller (2020) German’s Next Language Model. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, pp. 6788–6796 (en). External Links: [Link](https://www.aclweb.org/anthology/2020.coling-main.598), [Document](https://dx.doi.org/10.18653/v1/2020.coling-main.598). Cited by: [Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p2.1).
- K. Clark, M. Luong, Q. V. Le, and C. D. Manning (2020) ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. External Links: 2003.10555. Cited by: [Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p4.1).
- R\. Cohen and D\. Ruths \(2013\)Classifying Political Orientation on Twitter: It’s Not Easy\!\.InProceedings of the International AAAI Conference on Weblogs and Social Media \(ICWSM\),Vol\.7,Cambridge, Mass\.,pp\. 91–99\(en\)\.External Links:[Link](https://ojs.aaai.org/index.php/ICWSM/article/view/14434),[Document](https://dx.doi.org/10/g9q3dt)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p4.1)\.
- A\. Conneau, K\. Khandelwal, N\. Goyal, V\. Chaudhary, G\. Wenzek, F\. Guzmán, E\. Grave, M\. Ott, L\. Zettlemoyer, and V\. Stoyanov \(2020\)Unsupervised Cross\-lingual Representation Learning at Scale\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics,Online,pp\. 8440–8451\(en\)\.External Links:[Link](https://www.aclweb.org/anthology/2020.acl-main.747),[Document](https://dx.doi.org/10.18653/v1/2020.acl-main.747)Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p2.1)\.
- G\. Cumming \(2013\)Understanding the new statistics: effect sizes, confidence intervals, and meta\-analysis\.Routledge\.External Links:[Document](https://dx.doi.org/10.4324/9780203807002)Cited by:[Figure 7](https://arxiv.org/html/2605.14352#A1.F7)\.
- A\. Dada, A\. Chen, C\. Peng, K\. Smith, A\. Idrissi\-Yaghir, C\. Seibold, J\. Li, L\. Heiliger, C\. Friedrich, D\. Truhn, J\. Egger, J\. Bian, J\. Kleesiek, and Y\. Wu \(2023\)On the Impact of Cross\-Domain Data on German Language Models\.InFindings of the Association for Computational Linguistics: EMNLP 2023,Singapore,pp\. 13801–13813\(en\)\.External Links:[Link](https://aclanthology.org/2023.findings-emnlp.922),[Document](https://dx.doi.org/10.18653/v1/2023.findings-emnlp.922)Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p2.1),[Political Classifier](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx2.p1.1),[Discussion](https://arxiv.org/html/2605.14352#Sx5.p3.1)\.
- Deutscher Bundestag \(2025\)Open data\.Note:https://www\.bundestag\.de/services/opendataAccessed 2025\-09\-10Cited by:[Bundestag Dataset](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx1.p1.1)\.
- J\. Devlin, M\. Chang, K\. Lee, and K\. Toutanova \(2019\)BERT: Pre\-training of Deep Bidirectional Transformers for Language Understanding\.InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,Vol\.1,Minneapolis, Minn\.,pp\. 4171–4186\(en\)\.External Links:[Link](http://aclweb.org/anthology/N19-1423),[Document](https://dx.doi.org/10.18653/v1/N19-1423)Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p4.1)\.
- A\. Dubey, A\. Jauhri, and A\. Pandey \(2024\)The Llama 3 Herd of Models\.External Links:[Link](http://arxiv.org/abs/2407.21783),[Document](https://dx.doi.org/10/ndw6),2407\.21783Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p7.1)\.
- L\. Erhard, S\. Hanke, U\. Remer, A\. Falenska, and R\. H\. Heiberger \(2025\)PopBERT\. detecting populism and its host ideologies in the german bundestag\.Political Analysis33\(1\),pp\. 1–17\.External Links:[Document](https://dx.doi.org/10.1017/pan.2024.12)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p6.1)\.
- T\. Fagni and S\. Cresci \(2022\)Fine\-grained Prediction of Political Leaning on Social Media with Unsupervised Deep Learning\.Journal of Artificial Intelligence Research73,pp\. 633–672\.External Links:ISSN 1076\-9757,[Link](https://dl.acm.org/doi/10.1613/jair.1.13112),[Document](https://dx.doi.org/10/gpk643)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p2.1)\.
- FORCE11 \(2020\)The fair data principles\.Note:https://force11\.org/info/the\-fair\-data\-principles/Cited by:[item 5f](https://arxiv.org/html/2605.14352#Sx8.I1.i5.I1.i6.p1.1)\.
- T\. Gebru, J\. Morgenstern, B\. Vecchione, J\. W\. Vaughan, H\. Wallach, H\. Daumé III, and K\. Crawford \(2021\)Datasheets for datasets\.Communications of the ACM64\(12\),pp\. 86–92\.Cited by:[item 5g](https://arxiv.org/html/2605.14352#Sx8.I1.i5.I1.i7.p1.1)\.
- Gemini Team Google \(2025\)Gemini: A Family of Highly Capable Multimodal Models\.External Links:[Link](http://arxiv.org/abs/2312.11805),[Document](https://dx.doi.org/10/g9bxr8),2312\.11805Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p6.1)\.
- M\. Grootendorst \(2022\)BERTopic: Neural topic modeling with a class\-based TF\-IDF procedure\.External Links:[Link](http://arxiv.org/abs/2203.05794),[Document](https://dx.doi.org/10/gqjjxd),2203\.05794Cited by:[Introduction](https://arxiv.org/html/2605.14352#Sx1.p3.1)\.
- P\. He, X\. Liu, J\. Gao, and W\. Chen \(2020\)DeBERTa: decoding\-enhanced bert with disentangled attention\.\(en\)\.External Links:2006\.03654Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p2.1)\.
- C\. Honeywell \(2021\)Anarchism\.Key Concepts in Political Theory,Polity Press,Cambridge, United Kingdom\.Cited by:[Limitations](https://arxiv.org/html/2605.14352#Sx5.SSx2.p2.1)\.
- C\. Jakob, P\. Wenzel, S\. Mohtaj, and V\. Schmitt \(2024\)Augmented Political Leaning Detection: Leveraging Parliamentary Speeches for Classifying News Articles\.InProceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers,C\. Klamm, G\. Lapesa, S\. P\. Ponzetto, I\. Rehbein, and I\. Sen \(Eds\.\),Vienna, Austria,pp\. 126–133\.External Links:[Link](https://aclanthology.org/2024.cpss-1.11/)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p2.1)\.
- J\. Jiang, X\. Ren, and E\. Ferrara \(2023\)Retweet\-bert: political leaning detection using language features and information diffusion on social networks\.Proceedings of the International AAAI Conference on Web and Social Media17\(1\),pp\. 459–469\.Note:https://doi\.org/10/g93frmExternal Links:[Document](https://dx.doi.org/10/g93frm)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p5.1)\.
- K\. Kawintiranon and L\. Singh \(2022\)PoliBERTweet: a pre\-trained language model for analyzing political content on Twitter\.InProceedings of the Thirteenth Language Resources and Evaluation Conference,N\. Calzolari, F\. Béchet, P\. Blache, K\. Choukri, C\. Cieri, T\. Declerck, S\. Goggi, H\. Isahara, B\. Maegaard, J\. Mariani, H\. Mazo, J\. Odijk, and S\. Piperidis \(Eds\.\),Marseille, France,pp\. 7360–7367\.External Links:[Link](https://aclanthology.org/2022.lrec-1.801/)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p4.1)\.
- J\. Kiesel, M\. Mestre, R\. Shukla, E\. Vincent, P\. Adineh, D\. Corney, B\. Stein, and M\. Potthast \(2019\)SemEval\-2019 task 4: hyperpartisan news detection\.InProceedings of the 13th International Workshop on Semantic Evaluation,J\. May, E\. Shutova, A\. Herbelot, X\. Zhu, M\. Apidianaki, and S\. M\. Mohammad \(Eds\.\),Minneapolis, Minn\.,pp\. 829–839\.External Links:[Link](https://aclanthology.org/S19-2145/),[Document](https://dx.doi.org/10.18653/v1/S19-2145)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p2.1)\.
- F\. Krier \(2009\)Sozialismus für Kleinbürger: Pierre Joseph Proudhon \- Wegbereiter des Dritten Reiches\.Böhlau,Köln\(ger\)\.Cited by:[Limitations](https://arxiv.org/html/2605.14352#Sx5.SSx2.p2.1)\.
- C\. Levy and M\. S\. Adams \(Eds\.\) \(2019\)The Palgrave Handbook of Anarchism\.Springer,Cham\(en\)\.External Links:[Link](https://link.springer.com/10.1007/978-3-319-75620-2),[Document](https://dx.doi.org/10/qwkm)Cited by:[Limitations](https://arxiv.org/html/2605.14352#Sx5.SSx2.p2.1)\.
- Y\. Liu, X\. F\. Zhang, D\. Wegsman, N\. Beauchamp, and L\. Wang \(2022\)POLITICS: pretraining with same\-story article comparison for ideology prediction and stance detection\.InFindings of the Association for Computational Linguistics: NAACL 2022,M\. Carpuat, M\. de Marneffe, and I\. V\. Meza Ruiz \(Eds\.\),Seattle, Wash\.,pp\. 1354–1374\.External Links:[Link](https://aclanthology.org/2022.findings-naacl.101/),[Document](https://dx.doi.org/10.18653/v1/2022.findings-naacl.101)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p8.1)\.
- M\. Maurer, S\. Kruschinski, and P\. Jost \(2024\)Fehlt da was? Perspektivenvielfalt in den öffentlich\-rechtlichen Nachrichtenformaten\.Technical reportJohannes Gutenberg\-Universität Mainz, Institut für Publizistik,Mainz\.External Links:[Link](https://assets.ctfassets.net/mj324dykhxwi/2q2zTid7LQxhqM2gMM9JGc/c786f4bbabe28f1f794fc2f8f285f576/pm_perspektivenvielfalt.pdf)Cited by:[Table 5](https://arxiv.org/html/2605.14352#A1.T5),[Labeling](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx4.Px1.p2.1),[Labeling](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx4.Px1.p4.7),[Newspaper Dataset](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx4.p1.1)\.
- Mediabiasfactcheck\.com \(2025\)Media bias/fact check \- search and learn the bias of news media\.Note:https://mediabiasfactcheck\.com/filtered\-search/?country=DEAccessed: 2026\-01\-04Cited by:[Table 5](https://arxiv.org/html/2605.14352#A1.T5),[Labeling](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx4.Px1.p2.1)\.
- Medienkompass\.org \(2025\)Deutsche medienlandschaft\.Note:https://medienkompass\.org/deutsche\-medienlandschaft/Accessed: 2025\-07\-07Cited by:[Table 5](https://arxiv.org/html/2605.14352#A1.T5),[Labeling](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx4.Px1.p1.2)\.
- D\. Q\. Nguyen, T\. Vu, and A\. Tuan Nguyen \(2020\)BERTweet: a pre\-trained language model for English tweets\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations,Q\. Liu and D\. Schlangen \(Eds\.\),Online,pp\. 9–14\.External Links:[Link](https://aclanthology.org/2020.emnlp-demos.2/),[Document](https://dx.doi.org/10.18653/v1/2020.emnlp-demos.2)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p4.1)\.
- P\. J\. Ortiz Suárez, B\. Sagot, and L\. Romary \(2019\)Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures\.InProceedings of the Workshop on Challenges in the Management of Large Corpora,P\. Bański, A\. Barbaresi, H\. Biber, E\. Breiteneder, S\. Clematide, M\. Kupietz, H\. Lüngen, and C\. Iliadi \(Eds\.\),Cardiff, United Kingdom,pp\. 9–16\(en\)\.External Links:[Link](https://ids-pub.bsz-bw.de/9021),[Document](https://dx.doi.org/10.14618/IDS-PUB-9021)Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p3.1),[Discussion](https://arxiv.org/html/2605.14352#Sx5.p3.1)\.
- M\. Ostendorff, T\. Blume, and S\. Ostendorff \(2020\)Towards an Open Platform for Legal Information\.InProceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020,Online,pp\. 385–388\(en\)\.External Links:[Link](https://dl.acm.org/doi/10.1145/3383583.3398616),[Document](https://dx.doi.org/10.1145/3383583.3398616)Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p4.1)\.
- D\. Preoţiuc\-Pietro, Y\. Liu, D\. Hopkins, and L\. Ungar \(2017\)Beyond binary labels: political ideology prediction of Twitter users\.InProceedings of the 55th Annual Meeting of the Association for Computational Linguistics,R\. Barzilay and M\. Kan \(Eds\.\),Vol\.1,Vancouver, Canada,pp\. 729–740\.External Links:[Link](https://aclanthology.org/P17-1068/),[Document](https://dx.doi.org/10.18653/v1/P17-1068)Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p2.1)\.
- P\. Proudhon \(1840\)Qu’est\-ce que la propriété?, ou, Recherches sur le principe du droit et du gouvernement: Premier mémoire\.Brocard\(fr\)\.Cited by:[Limitations](https://arxiv.org/html/2605.14352#Sx5.SSx2.p2.1)\.
- M\. Riviere, S\. Pathak, P\. G\. Sessa, C\. Hardin, and S\. Bhupatiraju \(2024\)Gemma 2: Improving Open Language Models at a Practical Size\.External Links:[Link](http://arxiv.org/abs/2408.00118),[Document](https://dx.doi.org/10/nd57),2408\.00118Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p6.1),[Discussion](https://arxiv.org/html/2605.14352#Sx5.p4.1)\.
- J\. S\. Schapiro \(1945\)Pierre Joseph Proudhon, Harbinger of Fascism\.The American Historical Review50\(4\),pp\. 714\.External Links:ISSN 00028762,[Link](https://www.jstor.org/stable/10.2307/1842699?origin=crossref),[Document](https://dx.doi.org/10.2307/1842699)Cited by:[Limitations](https://arxiv.org/html/2605.14352#Sx5.SSx2.p2.1)\.
- R\. Scheible, J\. Frei, F\. Thomczyk, H\. He, P\. Tippmann, J\. Knaus, V\. Jaravine, F\. Kramer, and M\. Boeker \(2024\)GottBERT: a pure German Language Model\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,Miami, Fla\.,pp\. 21237–21250\(en\)\.External Links:[Link](https://aclanthology.org/2024.emnlp-main.1183),[Document](https://dx.doi.org/10.18653/v1/2024.emnlp-main.1183)Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p2.1)\.
- S\. Schneider \(2025a\)german\_ideology\_prediction\.Note:https://github\.com/SinclairSchneider/german\_ideology\_predictionAccessed 2025\-09\-15Cited by:[Political Party Classifiers](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx3.p2.1)\.
- S\. Schneider \(2025b\)Trainset\_political\_party\_big\.Hugging Face\.Note:https://doi\.org/10/qvxbRevision 444f2da, Accessed 2025\-09\-10Cited by:[Data Enrichment](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx1.Px2.p1.1),[Data Enrichment](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx2.Px2.p3.1)\.
- S\. Schneider \(2025c\)Trainset\_political\_text\_yes\_no\_german\.Hugging Face\.Note:https://doi\.org/10/qvw9Accessed 2025\-09\-12Cited by:[Political Classifier](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx2.p1.1)\.
- Springer \(2025\)BILD\.de\.\(de\)\.Note:https://www\.bild\.deAccessed 2025\-12\-22Cited by:[Evaluation Using Newspapers](https://arxiv.org/html/2605.14352#Sx3.SSx5.p1.1)\.
- J\. Tiedemann \(2012\)Parallel Data, Tools and Interfaces in OPUS\.InProceedings of the Eighth International Conference on Language Resources and Evaluation \(LREC’12\),N\. Calzolari, K\. Choukri, T\. Declerck, M\. U\. Doğan, B\. Maegaard, J\. Mariani, A\. Moreno, J\. Odijk, and S\. Piperidis \(Eds\.\),Istanbul, Turkey,pp\. 2214–2218\.External Links:[Link](https://aclanthology.org/L12-1246/)Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p4.1)\.
- M\. Volf and J\. Simko \(2025\)Political Leaning and Politicalness Classification of Texts\.External Links:[Document](https://dx.doi.org/10/p5m2),2507\.13913Cited by:[Related Work](https://arxiv.org/html/2605.14352#Sx2.p8.1)\.
- Y\. Zhang, M\. Li, D\. Long, X\. Zhang, H\. Lin, B\. Yang, P\. Xie, A\. Yang, D\. Liu, J\. Lin, F\. Huang, and J\. Zhou \(2025\)Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models\.External Links:[Link](http://arxiv.org/abs/2506.05176),[Document](https://dx.doi.org/10/qwmq),2506\.05176Cited by:[Data Enrichment](https://arxiv.org/html/2605.14352#Sx3.SSx1.SSSx2.Px2.p2.1)\.
- L\. Zhuang, L\. Wayne, S\. Ya, and Z\. Jun \(2021\)A Robustly Optimized BERT Pre\-training Approach with Post\-training\.InProceedings of the 20th Chinese National Conference on Computational Linguistics,S\. Li, M\. Sun, Y\. Liu, H\. Wu, K\. Liu, W\. Che, S\. He, and G\. Rao \(Eds\.\),Huhhot, China,pp\. 1218–1227\(eng\)\.External Links:[Link](https://aclanthology.org/2021.ccl-1.108/)Cited by:[Foundation Models](https://arxiv.org/html/2605.14352#Sx3.SSx2.SSSx1.p3.1)\.

## Paper Checklist

1. For most authors…
   - \(a\) Would answering this research question advance science without violating social contracts, such as violating privacy norms, perpetuating unfair profiling, exacerbating the socio\-economic divide, or implying disrespect to societies or cultures? Yes, see the Ethical Statement
   - \(b\) Do your main claims in the abstract and introduction accurately reflect the paper’s contributions and scope? Yes
   - \(c\) Do you clarify how the proposed methodological approach is appropriate for the claims made? Yes, see the Methods section
   - \(d\) Do you clarify what are possible artifacts in the data used, given population\-specific distributions? Yes, see the Methods, Subsection Dataset
   - \(e\) Did you describe the limitations of your work? Yes, see the Discussion, Subsection Limitations
   - \(f\) Did you discuss any potential negative societal impacts of your work? Yes, see the Discussion, Subsection Practical implications
   - \(g\) Did you discuss any potential misuse of your work? Yes, see the Discussion, Subsection Social impact and misuse
   - \(h\) Did you describe steps taken to prevent or mitigate potential negative outcomes of the research, such as data and model documentation, data anonymization, responsible release, access control, and the reproducibility of findings? Yes, see the Discussion, Subsection Social impact and misuse
   - \(i\) Have you read the ethics review guidelines and ensured that your paper conforms to them? Yes
2. Additionally, if your study involves hypotheses testing…
   - \(a\) Did you clearly state the assumptions underlying all theoretical results? NA
   - \(b\) Have you provided justifications for all theoretical results? NA
   - \(c\) Did you discuss competing hypotheses or theories that might challenge or complement your theoretical results? NA
   - \(d\) Have you considered alternative mechanisms or explanations that might account for the same outcomes observed in your study? NA
   - \(e\) Did you address potential biases or limitations in your theoretical framework? NA
   - \(f\) Have you related your theoretical results to the existing literature in social science? NA
   - \(g\) Did you discuss the implications of your theoretical results for policy, practice, or further research in the social science domain? NA
3. Additionally, if you are including theoretical proofs…
   - \(a\) Did you state the full set of assumptions of all theoretical results? NA
   - \(b\) Did you include complete proofs of all theoretical results? NA
4. Additionally, if you ran machine learning experiments…
   - \(a\) Did you include the code, data, and instructions needed to reproduce the main experimental results \(either in the supplemental material or as a URL\)? Yes
   - \(b\) Did you specify all the training details \(e\.g\., data splits, hyperparameters, how they were chosen\)? Yes, see the linked GitHub repository, folder 05\_train\_new\_model
   - \(c\) Did you report error bars \(e\.g\., with respect to the random seed after running experiments multiple times\)? Yes, see Figure [7](https://arxiv.org/html/2605.14352#A1.F7)
   - \(d\) Did you include the total amount of compute and the type of resources used \(e\.g\., type of GPUs, internal cluster, or cloud provider\)? Yes, see Table [1](https://arxiv.org/html/2605.14352#Sx3.T1)
   - \(e\) Do you justify how the proposed evaluation is sufficient and appropriate to the claims made? Yes, see the Methods
   - \(f\) Do you discuss what is “the cost” of misclassification and fault \(in\)tolerance? Yes, see the Discussion, Subsection Limitations
5. Additionally, if you are using existing assets \(e\.g\., code, data, models\) or curating/releasing new assets, without compromising anonymity…
   - \(a\) If your work uses existing assets, did you cite the creators? Yes, see the References and the Methods, Subsection Dataset
   - \(b\) Did you mention the license of the assets? Yes
   - \(c\) Did you include any new assets in the supplemental material or as a URL? Yes, we included links to the code and datasets
   - \(d\) Did you discuss whether and how consent was obtained from people whose data you’re using/curating? Yes, see the Ethical Statement
   - \(e\) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? Yes, see the Ethical Statement and the Methods, Subsection Datasets
   - \(f\) If you are curating or releasing new datasets, did you discuss how you intend to make your datasets FAIR \(see FORCE11 \([2020](https://arxiv.org/html/2605.14352#bib.bib18)\)\)? Yes
   - \(g\) If you are curating or releasing new datasets, did you create a Datasheet for the Dataset \(see Gebru et al\. \([2021](https://arxiv.org/html/2605.14352#bib.bib14)\)\)? Yes, see the model cards on HuggingFace
6. Additionally, if you used crowdsourcing or conducted research with human subjects, without compromising anonymity…
   - \(a\) Did you include the full text of instructions given to participants and screenshots? NA
   - \(b\) Did you describe any potential participant risks, with mentions of Institutional Review Board \(IRB\) approvals? NA
   - \(c\) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? NA
   - \(d\) Did you discuss how data is stored, shared, and deidentified? NA

## Appendix A Appendices

### Tables and Figures

Table 4: \(Dis\)agreement of various parties regarding three exemplary statements: 1\. A tax is to be reintroduced on high net worth individuals; 2\. Germany should keep the euro as its currency; 3\. A minimum wage should be introduced\.

Table 5: Overview of the German media landscape, including several online versions of newspapers such as Frankfurter Allgemeine Zeitung \(FAZ\) and die tageszeitung \(TAZ\); television channels such as Mitteldeutscher Rundfunk \(MDR\), Norddeutscher Rundfunk \(NDR\), Westdeutscher Rundfunk \(WDR\), and Radio Télévision Luxembourg \(RTL\); and various other online news media formats\. Media bias estimates were collected from three sources: A\. ratings of $k=39$ media outlets from $n=1148$ participants on a seven\-point Likert scale from extreme left \(1\) to extreme right \(7\), provided by Medienkompass\.org \([2025](https://arxiv.org/html/2605.14352#bib.bib19)\); B\. ratings of $k=47$ media outlets from only $n=9$ extensively trained raters on two correlated five\-point scales, provided by Maurer et al\. \([2024](https://arxiv.org/html/2605.14352#bib.bib68)\); and C\. ratings of $k=77$ outlets on a scale from extreme left \(\-10\) to extreme right \(10\), retrieved from Mediabiasfactcheck\.com \([2025](https://arxiv.org/html/2605.14352#bib.bib20)\)\. For source B, the two correlated scales were reduced using principal component analysis \(PCA\), yielding one principal component \(PC\)\. Numeric ratings were z\-transformed for comparability \(standardised, i\.e\. $M=0$, $SD=1$\)\. Regarding source C, numeric scores were missing for various media outlets; ordinal \(ord\.\) scores were estimated from the given labels accordingly\. Appropriate association estimates for pairwise complete cases showed high correlations, indicating convergent validity\. Based on the media list from source A, $k=33$ \(approx\. 85%\) media outlets were scraped, yielding a dataset of approx\. 10M articles; 74K per outlet on average\.

Table 6: Example of paraphrasing an original statement in the words of different personas\.

![Refer to caption](https://arxiv.org/html/2605.14352v1/x5.png)

Figure 5: Exemplary statement 1/38: Germany should continue to provide military support to Ukraine, sourced from the Wahlomat service regarding the German federal elections in 2025 \(www\.wahl\-o\-mat\.de/bundestagswahl2025\)\. Screenshot a shows the user view with response options \(approval, neutral, disapproval\); b depicts the stance of selected parties \(disapproval by the most left\-wing and right\-wing parties, approval by the others\)\.

![Refer to caption](https://arxiv.org/html/2605.14352v1/x6.png)

Figure 6: Flowchart of sentiment extraction\.

![Refer to caption](https://arxiv.org/html/2605.14352v1/x7.png)

Figure 7: Effect of optimization across all 13 models and 33 news media outlets, measured using the mean absolute error \(MAE, lower panel\) and mean squared error \(MSE, upper panel\)\. Error bars represent standard errors \(SE\)\. Values are sorted by model class and, within each class, by parameter size for various Gemma, Llama, and Bert derivatives\. The color contrast highlights the effect of the optimization: values before optimization \(light bars\) are generally higher than those after optimization \(dark bars\)\. This reduction in error metrics is also visible in the dashed horizontal lines, which represent the mean values across the models\. The differences indicate moderately strong effects, which we report as $d_{av}$ according to Cumming \([2013](https://arxiv.org/html/2605.14352#bib.bib72)\) with 95% confidence intervals \(CI\)\. Specifically, the optimization had an estimated effect of $d_{av}=0.37$, $CI_{95\%}\,[0.08, 0.66]$ as measured by the MAE, and an only slightly smaller $d_{av}=0.36$, $CI_{95\%}\,[0.00, 0.73]$ with respect to the MSE\.

These effects were largely consistent across models and metrics, as reflected in high pre\-post correlations \($r_{\mathrm{MAE}}=.91$ and $r_{\mathrm{MSE}}=.88$\)\. Two exceptions stand out: EuroBERT\-610M and Gemma\-3\-1B, for which optimization had a stronger effect regardless of the metric considered\. These are also the models whose initial values were clearly above the average values \(cf\. the upper dashed line in both panels\)\. No clear effect of model size \(in terms of the number of parameters\) on performance is evident; for both metrics and measurement points, size and error correlated only weakly with $r\approx-.25$ \(for all metrics before and after optimization, with only post\-optimization MSE showing a slightly higher correlation of $r=-.27$\)\. In other words, a higher number of parameters tends to produce smaller errors across models, though this does not necessarily hold true for individual models\. For example, the smaller Llama\-3\.2 with 1B parameters consistently yields lower errors than the much larger model with 3B parameters\. The results suggest that model size alone is not a reliable predictor\. Note that these findings are reported purely descriptively; our setup did not have the primary goal of demonstrating an effect of model size but rather aimed to identify the best model\. Here, Gemma2\-2B yielded the lowest errors, regardless of the optimization or metric\. However, the error bars suggest that the performance of much smaller models such as GBERT\-337 or DeBERTa\-425M does not differ significantly\. No pairwise tests were calculated\.
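As a rough illustration of the effect size reported in the Figure 7 caption, the sketch below computes a $d_{av}$\-style standardized difference for paired pre/post error values, i\.e\. the mean difference divided by the average of the two standard deviations, together with the pre\-post Pearson correlation\. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function names `d_av` and `pearson_r` and the input values are hypothetical, and the confidence intervals given in the caption are not reproduced here\.

```python
import math
import statistics


def d_av(pre, post):
    """Standardized mean difference for paired values.

    Divides the mean difference (pre minus post) by the average of the
    two sample standard deviations. Positive values mean the error went
    down after optimization.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post need equal length and at least 2 values")
    mean_diff = statistics.mean(pre) - statistics.mean(post)
    sd_avg = (statistics.stdev(pre) + statistics.stdev(post)) / 2
    return mean_diff / sd_avg


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need equal length and at least 2 values")
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-model error values before and after optimization
# (illustrative numbers only, not taken from the paper).
pre_errors = [2.0, 4.0, 6.0]
post_errors = [1.0, 3.0, 5.0]

effect = d_av(pre_errors, post_errors)
consistency = pearson_r(pre_errors, post_errors)
print(f"d_av = {effect:.2f}, pre-post r = {consistency:.2f}")
```

Averaging the two standard deviations keeps the effect size on the scale of the raw error spread rather than the spread of the differences, so it is not inflated when pre and post values are highly correlated, as they are here\.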
