moderation

#moderation

Safe Alone, Unsafe Together: Safeguarding Against Implicit Toxicity When Benign Images Combine

arXiv cs.CL ↗ · 4h ago

This paper defines multi-image implicit toxicity (MIIT), where individually benign images become toxic when combined, and proposes MiShield, a model trained with progressively distilled reasoning supervision to detect MIIT. Experiments show MiShield-8B outperforms existing moderation services.

Reddit is now warning mods if you frequently post in AI subreddits

Reddit r/artificial ↗ · 11h ago

Reddit is rolling out a feature that alerts moderators when users frequently post in AI-related subreddits, aiming to help manage policy enforcement and potential spam.

A peek into Reddit's anti-spam internals

Lobsters Hottest ↗ · 4d ago

A blog post examines Reddit's anti-spam internals, which were exposed by a bug, and details how Reddit's sitewide spam filters and moderation system work.

Do Safety Guardrails Need to Reason? LeanGuard: A Fast and Light Approach for Robust Moderation

arXiv cs.AI ↗ · 6d ago

This paper introduces LeanGuard, a lightweight bidirectional encoder-based safety guardrail that matches the accuracy of larger reasoning-based guardrails while being approximately 100x faster, challenging the assumption that chain-of-thought reasoning is necessary for effective moderation.

We’re sorry

Reddit r/openclaw ↗ · 2026-06-25

The platform apologizes for over-moderation, removes word blocks, simplifies rules, and enables image and link sharing, encouraging users to be nice and have fun.

SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning

Hugging Face Daily Papers ↗ · 2026-06-22

SingGuard is a policy-adaptive multimodal LLM guardrail model for text, image, and multilingual safety moderation, featuring dynamic reasoning and a new benchmark SingGuard-Bench. It achieves state-of-the-art results across multiple datasets.

I don't understand why so many subs here are so against AI tools

Reddit r/ArtificialInteligence ↗ · 2026-06-20

A user expresses frustration that their posts about AI-enhanced Google Sheets were removed from the Google Sheets subreddit, questioning the community's opposition to AI tools.

Pull request limits are cutting down the noise

Hacker News Top ↗ · 2026-06-19

GitHub introduces persistent pull request limits to help open-source maintainers manage contribution volume and reduce low-quality noise, especially from AI-generated pull requests.

Simulating Hate Speech Cascades with Multi-LLM Agents: Empirical Grounding, Modeling Fidelity, and Intervention Strategies

arXiv cs.AI ↗ · 2026-06-18

This paper studies hate speech cascades on Bluesky and uses multi-LLM agents to simulate them, finding that such simulations reproduce key patterns like stance monoculture and toxicity-delta direction, and that amplifier targeting on dense networks yields 7.5–12.9% reduction in hateful content with low benign collateral.

Crazy Sensitive infos generated by AI chat bots

Reddit r/artificial ↗ · 2026-06-11

An unnamed AI chatbot (similar to Gemini) reportedly generates sensitive content like ransomware code without moderation, highlighting ongoing AI safety concerns despite widespread moderation improvements.

Drug Sites Hijacked Spotify’s Search Ranking Through Fake Podcasts

Wired ↗ · 2026-06-11

A report reveals that illegal drug sites used fake podcasts to manipulate Spotify's search ranking, and Spotify removed tens of thousands of episodes only after public exposure and political pressure.

Hacker News, Sans AI

Hacker News Top ↗ · 2026-06-05

Hacker News is reportedly removing or filtering AI-related content from its platform.

Why do the mods forbid to mention better alternatives to OpenClaw?

Reddit r/openclaw ↗ · 2026-05-22

A user criticizes the OpenClaw community for banning mentions of alternative AI agents, arguing it stifles free speech and hides legitimate concerns about OpenClaw's development.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

arXiv cs.CL ↗ · 2026-05-19

PluRule is a new multimodal, multilingual benchmark for evaluating AI models on moderating pluralistic communities on social media, covering 13,371 rule violations across 1,989 Reddit communities and 9 languages. Results show that even state-of-the-art models like GPT-5.2 perform barely above chance, indicating that context-dependent rule enforcement remains a fundamental challenge.

@dabit3: Super interesting story that shows how the current state of @github is unable to protect open source maintainers from A…

X AI KOLs Following ↗ · 2026-05-18

A story detailing how AI bots overwhelmed a GitHub repository with spam comments and untested PRs after a $900 bounty was posted, forcing maintainers to implement workarounds like contributor whitelists and reputation bots, highlighting GitHub's lack of anti-bot mechanisms.

Today's Irony. We as small creators cannot use AI but big companies can ban us using same AI

Reddit r/artificial ↗ · 2026-05-18

The post discusses the irony that small creators are penalized for using AI while big companies use AI to ban them.

This subreddit is basically unusable due to the amount of agent-generated content (posts AND comments)

Reddit r/AI_Agents ↗ · 2026-05-18

A user warns that a subreddit is flooded with agent-generated posts and comments, making it difficult to find genuine discussions and urging newcomers to be skeptical of tool recommendations.

LLM generated submissions should be disallowed

Lobsters Hottest ↗ · 2026-05-15

A user on Lobsters proposes that LLM-generated submissions should be disallowed, arguing that users posting such content should be banned and a notification should be added to remind submitters.

Send the arXiv AI-generated slop, get a yearlong vacation from submissions

Ars Technica ↗ · 2026-05-15

arXiv will impose a one-year ban on submitters of AI-generated content that violates its moderation standards, and will require their future submissions to undergo peer review before hosting.

Moltbook Moderation: Uncovering Hidden Intent Through Multi-Turn Dialogue

arXiv cs.AI ↗ · 2026-05-14

This paper introduces Bot-Mod, a moderation framework that identifies malicious intent in multi-agent systems through multi-turn dialogue and Gibbs-based sampling, and presents a dataset from Moltbook for evaluation.
