A peek into Reddit's anti-spam internals

Lobsters Hottest 06/27/26, 03:10 PM News

Summary

A blog post revealing Reddit's anti-spam internals, exposed by a bug, detailing how Reddit's sitewide spam filters and moderation system work.


# A peek into Reddit's anti-spam internals

Source: [https://lyra.horse/blog/2026/06/reddit-spam-internals/](https://lyra.horse/blog/2026/06/reddit-spam-internals/)

5 years ago, back when I still used Reddit, something unusual happened. My app of choice, [Relay for reddit](https://play.google.com/store/apps/details?id=reddit.news), was bombarding me with a bunch of weird notifications about removed spam. Getting these notifications wasn't unusual in and of itself - I was a moderator of a few fairly small subreddits that'd from time to time get posts automatically removed for spam. However, when I went to actually look at the removed spam, I saw something I was never meant to see. I saw Reddit's anti-spam internals.

> so that's about it.
>
> **Removed: spamurai (\*Removing potential spam content from unproved user\*: comment \`t1_pupp13\` (0.7294469 perspective spam) by u/GoodBoyBacon (0.06 days old, spammy: 11, hosted: false, -1 karma, 4 reports, org: \`ComcastCable\`, email: gmail.com) in r/GoodBoysOnly (guest) posting nil from \`oauth.reddit.com\` via \`nil\` from UA: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.30, RHS: oc:ac:kT:lw:bV:aX:af:a6:l5:y3:aT:m9:pt:f3:hZ:az:aR:aQ, LANG: en-US,en,q=0.9, TLS: j7bXVc3l/qer8FRj2aEiqOrx1ro=DDZ0TViWlY5HYgOPw1SZqDxwiO8= - referrer: https://www.reddit.com/, thumbnail: \`\` -)** • GoodBoyBacon • 1 points • 27 min
>
> You see u/BadGuy67? He's the same guy as https://www.reddit.com/r/ReallyBadGuys/comments/qw3rt1/if_ur_a_bad_guy_post_here_please/
>
> **Removed: Reddit (shadowban applied on 10-27-2021)** • GoodBoyBacon • 0 points • 1 hr
>
> I'm not the same guy as that other guy please read my comment

## How Reddit moderation works

So Reddit is a site comprising smaller sub-communities, which are called subreddits. For example, [/r/mylittlepony](https://old.reddit.com/r/mylittlepony) is a subreddit for fans of My Little Pony.
These subreddits can be created by anyone, and they are moderated by a group of community moderators appointed by the creator of the subreddit. If we go[1](https://lyra.horse/blog/2026/06/reddit-spam-internals/#fn:1) on [/r/mylittlepony](https://old.reddit.com/r/mylittlepony) we can see the list of moderators on the sidebar:

> **MODERATORS**
>
> - Orschmann
> - optimistic_outcome
> - Chinch335
> - IllusionOf_Integrity
> - spokesthebrony
> - TheeLinker
> - Lankygit
> - Raging_Mouse
> - Searchbar_Trixie
> - gbeaudette
> - ...

These moderators can remove posts, ban users, manage modmail etc., but they are just normal Reddit users. If you're a moderator you can see who removed a post or comment:

> **Removed: rebane2001** • ExampleUser • 1 points • 1 hr
>
> I'm breaking the rules 😈

This includes the [automod](https://old.reddit.com/r/reddit.com/wiki/automoderator) - a rules-based moderation system:

> **Removed: AutoModerator** • ExampleUser • 1 points • 1 hr
>
> bad word

But then you'll sometimes also see the mysterious "Auto":

> **Removed: Auto** • ExampleUser • 1 points • 1 hr
>
> hi

This is what happens when something gets caught in Reddit's mysterious spam filters, or when Reddit's sitewide admins remove something manually. In the moderator log, they'll show up as "reddit" and "Anti-Evil Operations":

> **ExampleSubreddit: moderation log**
>
> - 13 days ago • Anti-Evil Operations • removed link "[ Removed by Reddit ]" by EvilPoster
> - 27 days ago • Anti-Evil Operations • removed link "[ Removed by Reddit ]" by Rule_Breaker1337
> - 1 month ago • reddit • removed link "buy my shirt" by xXx_ShirtSeller_xXx
> - 1 month ago • reddit • removed link "sexy ladies" by SpamBot_00018341
> - 2 months ago • Anti-Evil Operations • removed link "[ Removed by Reddit ]" by PuppyGirlHater

These sitewide spam removals are what the rest of this post is going to be about.
## Oopsie

What happened to me back in 2021 was that due to some kind of an error on Reddit's side, the usual "Removed: Auto" text had been replaced with the actual removal reason. Why this happened to me I do not know - it went back to normal after an hour or so. All I was left with was a bunch of screenshots I managed to take while this stuff was still going on.

But that doesn't mean we can't speculate! Up until 2017, Reddit's [source code](https://github.com/reddit-archive/reddit/) was publicly available. Of course, a lot has changed since then, but we can still analyze the archived code and hypothesize what might be happening.

The function responsible for moderator removals is **[POST_remove](https://github.com/reddit-archive/reddit/blob/753b17407e9a9dca09558526805922de24133d53/r2/r2/controllers/api.py#L3037-L3090)**:

```
def POST_remove(self, thing, spam):
    """Remove a link, comment, or modmail message."""
    ...
    admintools.spam(thing, auto=False,
                    moderator_banned=not c.user_is_admin,
                    banner=c.user.name,
                    train_spam=train_spam)
```

We can see it calls **admintools.spam** with a few arguments, notably: **moderator_banned**, which marks whether something was removed by a moderator or an admin, and **banner**, which records the username of whoever performed the removal.
Poking around a bit more, we find the **[get_mod_attributes](https://github.com/reddit-archive/reddit/blob/753b17407e9a9dca09558526805922de24133d53/r2/r2/lib/jsontemplates.py#L618-L639)** function:

```
# Comments added by me for the blogpost
def get_mod_attributes(item):
    data = {}
    # If user is logged in and a moderator
    if c.user_is_loggedin and item.can_ban:
        data["num_reports"] = item.reported
        data["report_reasons"] = Report.get_reasons(item)

        ban_info = getattr(item, "ban_info", {})
        # If post was removed
        if item._spam:
            data["approved_by"] = None
            # If post was removed by a mod
            if ban_info.get('moderator_banned'):
                # Show the banner name
                data["banned_by"] = ban_info.get("banner")
            else:
                # else, if post was removed by an admin
                # Hide the banner name
                data["banned_by"] = True
        else:
            data["approved_by"] = ban_info.get("unbanner")
            data["banned_by"] = None
    else:
        data["num_reports"] = None
        data["report_reasons"] = None
        data["approved_by"] = None
        data["banned_by"] = None
    return data
```

This is the part of the API that actually returns us the information about removals - the **banner** in *ban_info* is the red text I was seeing in Relay. And it seems like it will only get returned if the removal was by a moderator, not an admin.

But where does that "Auto" text come from? Reddit's API only returns an actual username, or `True`. Turns out that it's actually coming from Relay[2](https://lyra.horse/blog/2026/06/reddit-spam-internals/#fn:2) itself:

```
// reddit/news/oauth/reddit/model/base/RedditLinkComment.java
if (this.bannedBy.equalsIgnoreCase("true")) {
    this.bannedBy = "Auto";
} else if (this.bannedBy.equalsIgnoreCase("null")) {
    this.bannedBy = "";
}
```

Okay, that explains that. But where am I getting these internal messages from?
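Before getting to that, the two layers seen so far can be condensed into a small Python sketch. It only mirrors the branch structure of the two snippets above; the function names `api_banned_by` and `client_label` are made up for illustration and do not appear in either codebase.

```python
from typing import Optional, Union


def api_banned_by(is_spam: bool, moderator_banned: bool,
                  banner: Optional[str]) -> Union[str, bool, None]:
    """Server side: what the archived get_mod_attributes puts in banned_by."""
    if not is_spam:
        return None          # not removed, so nothing to report
    if moderator_banned:
        return banner        # removed by a mod: expose the banner name
    return True              # removed by an admin/filter: hide the banner


def client_label(banned_by: Union[str, bool, None]) -> str:
    """Client side: how Relay's string comparison turns the value into text."""
    text = "null" if banned_by is None else str(banned_by)
    if text.lower() == "true":
        return "Auto"
    if text.lower() == "null":
        return ""
    return text              # anything else is displayed verbatim


print(client_label(api_banned_by(True, True, "rebane2001")))
print(client_label(api_banned_by(True, False, "domain (spam)")))
```

Note the last branch of `client_label`: whatever string the server sends is shown as-is. If the admin branch ever returned the banner instead of `True`, the client would happily display it, which is consistent with what I saw.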
Well, it seems like [Reddit is re-using](https://github.com/reddit-archive/reddit/blob/753b17407e9a9dca09558526805922de24133d53/r2/r2/controllers/api.py#L558-L564) the **banner** field for internal removal reasons:

```
def POST_submit(self, form, jquery, url, selftext, kind, title,
                sr, extension, sendreplies, resubmit):
    """Submit a link to a subreddit."""
    ...
    if not is_self:
        ban = is_banned_domain(url)
        if ban:
            g.stats.simple_event('spam.domainban.link_url')
            admintools.spam(l, banner = "domain (%s)" % ban.banmsg)
            hooks.get_hook('banned_domain.submit').call(item=l, url=url,
                                                        ban=ban)
```

The above code snippet runs whenever a new link is posted. It checks whether the linked domain is banned, and if it is, it removes the post with the **banner** set to "domain (REASON)". We can see it in action with this removed post for example:

> 1 • I_EAT_PONIES in MyLittleOutOfContext
>
> **[Conga!](http://24.media.tumblr.com/tumblr_m7yp9ysNCV1r361q4o1_400.gif)**
>
> **Removed: domain (banned as an experiment to see what happens with tubmlr spam ring. - em 5/31/12)** • 0 Comments • 24.media.tumblr.com • 9 yrs

Seems like "em" was playing around with auto-removing all tubmlr [sic] links on Reddit in 2012?

Anyways, it seems like Reddit is stuffing its internal spam removal reasons in the **banner** field, but making it so that only sitewide admins can see them. And something in a codepath similar to **[get_mod_attributes](https://lyra.horse/blog/2026/06/reddit-spam-internals/#get_mod_attributes)** was broken for a couple hours, allowing me to see those reasons.

Let's take a look at the kinds of reasons I managed to get a glimpse of!

## domain (2012 - present)

The first category is the domain removals, as shown earlier.
Nearly all of these are just "Removed: domain (spam)", though I did find this gem in there:

> 1 • presafur in MyLittleOutOfContext
>
> **Just register and look for me here\* h9OI5WUQZPL**
>
> **Removed: domain (le sexxxxy sex spam)** • 0 Comments • www.example.com • 5 yrs

Perhaps I'm just childish, but I find the idea of someone going "le sexxxxy sex spam" while working on a spam filter rather amusing.

Reddit probably had some issues with Tumblr spam, because in addition to the tubmlr removal we saw earlier there was also this:

> 1 • JackofH3art in MyLittleOutOfContext
>
> **[It hurts so good.](https://bartl3by.tumblr.com/post/41108523402)**
>
> **Removed: domain (ban - 11/12/12 mg )** • NSFW • 0 Comments • bartl3by.tumblr.com • 8 yrs

I'm quite certain that this removal was targeted at Tumblr in general, and not the specific blog linked, since [bartl3by.tumblr.com](https://bartl3by.tumblr.com/) seems to be a legitimate (although somewhat perverse) blog.

I believe domain removals are the only type of anti-spam we can actually see in the public Reddit source code. Though, even that is [partially hidden](https://github.com/reddit-archive/reddit/blob/753b17407e9a9dca09558526805922de24133d53/r2/r2/models/admintools.py#L338-L339).

## spammit (2012 - present)

The next category is spammit, which *somehow* analyses a post and gives it a percentage rating:

> 1 • Kyderra in MyLittleOutOfContext
>
> **[I'm very fondling of you](https://theponyarchive.com/archive/mlfw/mlfw/mlfw8871-155478_-_animated_nudge_rainbow_dash_spitfire_wingboner.gif)**
>
> **Removed: spammit(72.98% spammy)** • 0 Comments • dashie.mylittlefacewhen.com • 8 yrs

Yes, there's no space between "spammit" and the parenthesis. The percentages on removed posts were generally fairly high, with the lowest one being 39.71% spammy and the highest one 98.19%. That being said, spammit doesn't seem like a very accurate anti-spam measure for the subs I moderate, because it hit a lot of legitimate Imgur posts with a 70-98% spammy rating.
## bans (2016 - present)

Next, we have post removals for banned users.

> 1 • kaitlynwwrettin in MyLittleOutOfContext
>
> **Cost Reduction & Cost Saving Consultants | Puppygirl Consulting**
>
> **Removed: banned user** • 0 Comments • www.example.com • 3 yrs

Some of them are marked with just a "Removed: banned user", though others get a fancy "Removed: Reddit (banall performed)".

> 1 • KerryVinebt403 in MyLittleOutOfContext
>
> **casino online**
>
> **Removed: Reddit (banall performed)** • 0 Comments • example.com • 3 yrs

The posts I saw being removed like this were all very obvious spam, mostly just ads for all kinds of services. I suspect this is the admins seeing an obvious spambot and just nuking it from orbit.

## shadowbans (2016 - present)

It's known that Reddit shadowbans its users. A shadowban is a silent ban where seemingly nothing happens to your account and you're still able to post/comment, but nobody else will be able to see your posts and comments. In fact, there's even [a subreddit](https://old.reddit.com/r/ShadowBan/) for checking whether you're shadowbanned.

But now we can actually see what a shadowban looks like to admins:

> 1 • pickertramontana in trixiemasterrace
>
> **Blonde Teen Takes A Massive Meow In Her Bark**
>
> **Removed: Reddit (shadowban applied on 11-06-2019)** • 0 Comments • self.trixiemasterrace • 1 yr

I'm not going to share the specific conversation here, but there was a really funny comment thread going on where a person was blaming mods for removing all of their comments while in reality being shadowbanned by Reddit.

## spamurai (2020 - present)

Now we get to the most interesting part of the entire spam filter thing. Unlike spammit, spamurai is a system that does have [some public references](https://www.ai-expo.net/global/wp-content/uploads/2019/04/0950_Anand_Mariappan_Reddit_ENT2.pdf) to it. According to slide #28, Reddit uses Minsky for "ML", and Spamurai for "Rules".
I'm not sure how this split factors into the removal reasons, so [I'm just going to ignore it](https://www.youtube.com/watch?v=ZLoXILYESOM) and assume everything is spamurai.

First up, there seems to be some sort of a spamurai subsystem called "echelon". It seems to remove certain keywords, such as the EqG elsagate spam seen below, and lewd (OF? Snapchat?) stuff like "puppyvids.69".

> 1 • mypham71375 in Pony_irl
>
> **Equestria Girls Princess Animation Series - Twilight Sparkle Cutie Mark ...**
>
> **Removed: spamurai (echelon: Equestria Girls Princess Animation Series)** • 1 Comments • youtube.com • 2 yrs

Then, there are some targeted removals, such as this one that targets clothing spam.

> **Removed: spamurai (approval required on hyperlink comment from high spam score account (suspected shirt affiliate spam))** • Adventurous-ties • 1 points • 5 months
>
> Dog-Women consulting company will open the Ukrainian pharmaceutical market for you!

And some more general rules-based filters, such as this one for account age.

> **Removed: spamurai (comment from account under 30 minutes matching spam conditions)** • NewUser67 • 1 points • 23 min
>
> fuck you fuck you fuck you

But alright, let's try to figure out what's going on with the infodump removals like the one I put in the banner art of this post.
> 1 • AnywhereAlone6851 in Pony_irl
>
> **18 Random Facts That Will Blo**
>
> **Removed: spamurai (\*Removing potential spam content from unproved user\*: link \`t3_phc4xx\` (0.12571795 perspective spam) by u/AnywhereAlone6851 (2.948587962963 days old, spammy: 4.5, hosted: false, 28 karma, 5 reports, org: \`Skyinfo Online\`, email: gmail.com) in r/Pony_irl (guest) posting pinterest.com from \`oauth.reddit.com\` via \`nil\` from UA: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36, RHS: oc:ac:kT:lw:bV:aX:af:a6:l5:y3:aT:m9:pt:f3:hZ:az:aR:aQ, LANG: en-US,en,q=0.9, TLS: SwxwvfHLtTxt/9qbo1dvBLEMSIQ=tT1LosI8/xDmUS7LMVuhb/olIJQ= - referrer: https://www.reddit.com/, thumbnail: \`https://b.thumbs.redditmedia.com/K_Q91r66a3AEopEbzGkjkxHOpisoQbxa3hIoHxDerjc.jpg\` -\`\`\`18 Random Facts That Will Blo \`\`\`https://www.reddit.com/r/Pony_irl/comments/phc4xx/18_random_facts_that_will_blo/ )** • 0 Comments • pinterest.com • 1 month

That is a lot of information in there! Let's break it down bit by bit:

**link t3_phc4xx**: this is the ["fullname"](https://old.reddit.com/dev/api/#fullnames) ID of the post. It's the ID that shows up in URLs, except with a type prefix: **t1** is comment, **t2** is user, **t3** is post, **t4** is private message, and **t5** is subreddit.

**0.12571795 perspective spam**: this is almost certainly using the **[Perspective API](https://perspectiveapi.com/)**. Perspective is a free[3](https://lyra.horse/blog/2026/06/reddit-spam-internals/#fn:3) Google[4](https://lyra.horse/blog/2026/06/reddit-spam-internals/#fn:4) service that uses machine learning to "reduce toxicity online". I'm sure of this because *perspective* is a pretty unique word, the [Perspective docs](https://developers.perspectiveapi.com/s/docs-sample-requests) display sample results with similar score numbers (e.g.
0.24173126 and 0.4445836), and [Perspective's case studies page](https://web.archive.org/web/20220126010922/https://www.perspectiveapi.com/case-studies/) contains this quote from the CTO of Reddit:

> "As Reddit scales, the integrity of our platform and ensuring healthy discourse among users and communities remains a priority. Perspective has been a valuable tool as we continue to strengthen the safety measures and tooling that we have in place."
>
> - Chris Slowe, Chief Technology Officer at Reddit

It seems like Reddit is using [Perspective's "experimental" SPAM](https://developers.perspectiveapi.com/s/about-the-api-attributes-and-languages) attribute here though, which is intended to detect spam instead of toxicity. The model for this is trained on a SINGLE DATASET of the comments and moderation in the New York Times, which I find pretty interesting.

Unfortunately, since February 2026, we can no longer create a new Perspective API project on Google Cloud, so it is not possible to try it out anymore. Well, that is unless we can find some leaked API keys :3. Which I may or may not have teehee..

```
$ curl 'https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=AIzac29ycnkgdGhpcyBpcyBjZW5zb3JlZCBsb2w' \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{
      "comment": { "text":"18 Random Facts That Will Blo " },
      "requested_attributes": { "SPAM": {"score_type": "PROBABILITY"} }
    }'
{
  "attributeScores": {
    "SPAM": {
      "spanScores": [
        {
          "begin": 0,
          "end": 30,
          "score": { "value": 0.12571794, "type": "PROBABILITY" }
        }
      ],
      "summaryScore": { "value": 0.12571794, "type": "PROBABILITY" }
    }
  },
  "languages": [ "en" ],
  "detectedLanguages": [ "en" ]
}
```

..and thus, we can be 100% sure that this is the API Reddit used, because we get back the same[5](https://lyra.horse/blog/2026/06/reddit-spam-internals/#fn:5) 0.12571795 as we saw in spamurai earlier.
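For anyone who would rather script this than use curl, here is a minimal Python sketch of the same exchange. It only builds the request body and reads the score out of a response; it does not talk to the network, and the helper names `build_spam_request` and `summary_score` are my own, not part of any Perspective client library. The field names and the sample response are copied from the curl session above.

```python
import json


def build_spam_request(text: str) -> str:
    """Build the JSON body used in the curl call above."""
    payload = {
        "comment": {"text": text},
        "requested_attributes": {"SPAM": {"score_type": "PROBABILITY"}},
    }
    return json.dumps(payload)


def summary_score(response_json: str, attribute: str = "SPAM") -> float:
    """Pull the summary score for one attribute out of a response body."""
    data = json.loads(response_json)
    return data["attributeScores"][attribute]["summaryScore"]["value"]


# Response body as returned in the curl session above (spanScores omitted).
SAMPLE_RESPONSE = """
{
  "attributeScores": {
    "SPAM": {
      "summaryScore": { "value": 0.12571794, "type": "PROBABILITY" }
    }
  },
  "languages": [ "en" ],
  "detectedLanguages": [ "en" ]
}
"""

print(build_spam_request("18 Random Facts That Will Blo "))
print(summary_score(SAMPLE_RESPONSE))
```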
This is interesting because it means that this entire time it was possible for a bad actor to bypass one of the primary spamurai criteria by just changing their message until it's non-spammy for Perspective's free API. It's not even hard to do so, as the SPAM score is extremely sensitive to changes of just a few characters:

```
$ query='Puppygirl Consulting is the best way to grow your revenue'
$ ./perspective.sh "$query"
0.8638981: Puppygirl Consulting is the best way to grow your revenue
$ for letters in {a..z}{a..z}
do
  ./perspective.sh "$query $letters" | grep "0.0"
done
0.010811162: Puppygirl Consulting is the best way to grow your revenue qp
```

You can see how going through all 2-letter combinations got us from an 86% spam score down to 1%, which is significantly less than pretty much any normal text.

It also ignores numbers and case for some reason:

```
$ ./perspective.sh 'Hi there, please call my work phone at 567890'
0.81438655: Hi there, please call my work phone at 567890
$ ./perspective.sh 'hi THEre, pleaSE Call my woRk phonE aT 022102'
0.81438655: hi THEre, pleaSE Call my woRk phonE aT 022102
```

As well as alternate alphabets:

```
$ ./perspective.sh 'привет'
0.35077864: привет
$ ./perspective.sh 'наххуи'
0.35077864: наххуи
```

Which means you can sometimes lower your spam score just by using Cyrillic characters:

```
$ query='Buy my product'
$ ./perspective.sh "$query"
0.6473346: Buy my product
$ ./perspective.sh "$(echo -n $query | sed s/p/р/)"
0.4452748: Buy my рroduct
```

Anyways, moving on…

**by u/AnywhereAlone6851**: username, self-explanatory

**2.948587962963 days old**: account age in days, which is a pretty good indicator of spam accounts and ban evaders. But it does give us one interesting detail - I believe the account age is stored in seconds, because all the examples I have come out to a round number when multiplied by 86400 (the number of seconds per day). This one, for example, comes out to 254758 seconds.

**spammy: 4.5**: not sure what this is, could be the *Minsky* thing from earlier?
Or the spammit score from earlier? Or a combination of multiple spamurai rules?

**hosted: false**: not sure, maybe to detect known hosting provider IP ranges?

**28 karma**: self-explanatory, karma is often used as a measure of an account's presence

**5 reports**: total number of reports an account and its posts have received

**org: Skyinfo Online**: the ISP of the user. This can tell you where the user is from and whether they're using a VPN. In this case we can see that the spam is coming from Bangladesh, because that's where [SkyInfo Online](https://www.skyinfoonline.net/) operates from. Their website is incredible.

**email: gmail.com**: e-mail domain of the user

**in r/Pony_irl (guest)**: the subreddit that the post is in. I believe the (guest) means that the user is not a subscriber of the subreddit. I assume that it would say something like "member", "approved", or "moderator" in other cases, but I don't actually have any examples of that.

**posting pinterest.com**: the domain that's being linked to

**from oauth.reddit.com via nil**: the user was authenticated with Reddit's OAuth flow, which is the default, and I believe the *nil* (Lua's *null*) is where the name of a custom client would go, e.g. Relay in my case. I don't have any examples of this being anything other than *nil*, so this is just speculation.

**from UA: Mozilla/5.0 (Win…**: this is the user agent string of the browser that's being used. It tells us that the person is posting from the *Chrome* browser on *Windows 8.1*.

**RHS: oc:ac:kT:lw:bV:aX…**: this seems to be some sort of a fingerprinting hash Reddit uses. I believe this is Reddit's own engineering and not an existing open-source solution. This hash is exactly the same between this Chrome 93 example and the Edge 95 example from the beginning. This leads me to conclude that the hash fingerprints the browser engine (Edge and Chrome are both Chromium) and is meant to detect scripts pretending to be a browser.
**LANG: en-US,en,q=0.9**: the value of the *accept-language* header, it tells websites what languages you'd like to see websites in. This can be used to detect potential VPN usage, e.g. if someone has Latvian as their language but is joining from a New York IP.

**TLS: SwxwvfHLtTxt/9qbo…**: this is TLS fingerprinting similar to [JA3](https://github.com/salesforce/ja3). It seems to be Reddit's own engineering though, not an existing implementation.

**referrer: https://www.reddit.com/**: this is the page the user got onto Reddit from. Sometimes when opening Reddit links directly from other sites, your votes are not counted to discourage brigading, and this is what the referer is used for. In the case of spamurai it might be useful if the referer is something like *buy-reddit-comments.info* (or more realistically, a platform such as Fiverr).

**thumbnail: https://…**: the auto-generated thumbnail

**- \`\`\`18 Random Facts That Will Blo \`\`\`**: the markdown body of the post/comment

**https://www.reddit.com/r/Pony_irl…**: link to the post/comment

So that's the full spamurai infodump with no clear reason for removal. There are also examples of spamurai clearly using the same data but with specific rules, such as the use of the spammy score here:

> **Removed: spamurai (URL-only comment from account with high spammy score)** • GoodBoyBacon • 0 points • 24 min
>
> https://www.reddit.com/r/ReallyBadGuys/comments/qw3rt1/if_ur_a_bad_guy_post_here_please/

Or the use of the perspective score here:

> **Removed: spamurai (REPORT: High spam perspective score on comment with hyperlink reported for spam. Removed but can be re-approved by mod.)** • BrattyErmine12 • 0 points • 11 months
>
> **Coins are a virtual good you can use to award exemplary posts or comments. Support Reddit and encourage your favorite contributors to keep making Reddit better.**
>
> **GET COINS**
>
> **Here's what you can buy with coins**
>
> **Spend your coins on these Awards reserved exclusively for the finest Reddit contributors. Awarding a post or comment highlights it for all to see, and some Awards also grant the honoree special bonuses.**
>
> 📷
>
> **Silver Award**
>
> Shows a Silver Award on the post or comment and ... that's it. You'll need 100 Coins.

The perspective spam score for the above post is either 0.9761621 or 0.9782609[6](https://lyra.horse/blog/2026/06/reddit-spam-internals/#fn:6). Also what's interesting is that the specific rule there got triggered by someone reporting it for spam - thus we learn that sometimes user reports have an effect even without moderator intervention.

It's also interesting how some of the removals adjust based on mod actions:

> 1 • muyuwobsoq9q in Pony_irl
>
> **영어 배우기! 알파벳송 인기- تعليم الاطفال مع - العاب أطفالأغاني الحضانة وأغنية الأط...**
>
> **Removed: spamurai (High karma-to-spam ratio on link content from 6+ spammy score account; mod approval of this content will reduce future removals)** • 1 Comments • youtube.com • 2 yrs

## misc

There's also a bunch of removals that don't really neatly fit into any of the above categories.

For example, Pinterest redirect links get removed:

> **Removed: pinterest redirect** • 22_ghost_22 • 1 points • 4 months
>
> https://pin.it/Sc4mUr1

As do mega.nz links:

> **Removed: streamer spam** • EPIC_Gamer67 • 1 points • 11 months
>
> https://mega.nz/folder/Ep1cV1d30s The decryption key is SW52YWxpZCBiYXNlNjQgc3RyaW5n

In the case of the comment above, it was actually a legitimate link to some archived YouTube videos, so it was falsely removed.

Another banned kind of link is a freely available subdomain:

> 1 • cpsryan in UnusAnnusArchival
>
> **All of the Sauce for Unus Annus Archives**
>
> **Removed: freely available subdomains** • (Unus Annus Archiving) • 0 Comments • self.UnusAnnusArchival • 11 months

In the above case the post didn't contain those kinds of links per se, but it did contain a magnet link that Reddit found and linkified the *2ftracker.opentrackr.org*[7](https://lyra.horse/blog/2026/06/reddit-spam-internals/#fn:7) inside of.
I'm not sure why opentrackr gets matched under "freely available subdomains" though.

But speaking of trackers, certain strings are straight up regex banned:

> 1 • e4e5x0q8e1p in MyLittleOutOfContext
>
> **레인보우 대시 프레젠츠26화 토렌.트 150927 26화 torrent HD 고화질 FULL레인보우 대시가 선사하는26회 토렌.트 150927 26화 다시보기**
>
> **Removed: Matched forbidden regex u'torenteu'** • 2 Comments • self.MyLittleOutOfContext • 10 yrs

Now, this one is super interesting to me because nowhere in the post does the string "torenteu" appear, yet we still somehow match the regex?

The reason this happens is that [Reddit uses the unidecode library](https://github.com/reddit-archive/reddit/blob/753b17407e9a9dca09558526805922de24133d53/r2/r2/lib/utils/utils.py#L1261-L1264)[8](https://lyra.horse/blog/2026/06/reddit-spam-internals/#fn:8) to convert post titles into ASCII:

```
$ python2
Python 2.7.18 (default, Dec 9 2024, 19:35:20)
[GCC 9.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import unidecode
>>> unidecode.unidecode(u"레인보우 대시 프레젠츠 26화 토렌.트 150927 26화 torrent HD 고화질")
'reinbou daesi peurejenceu 26hwa toren.teu 150927 26hwa torrent HD gohwajil'
>>>
```

It then processes the string a bit more and arrives at *"reinbou_daesi_peurejenceu_26hwa_torenteu_150927"*, which does match the u'torenteu' regex.

I was curious as to whether this filter still exists, so I made some test posts on a subreddit I moderate using an alt account:

> **popstonia**
>
> overview • comments • submitted
>
> there doesn't seem to be anything here

It's hard to say? It seems like the string "torenteu" by itself does not get removed, so I assume the other removals are based on various other kinds of spam heuristics?

Something I did find interesting is that *UA-12345678-* got removed, but *UA-49307539-* did not!
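Going back to the `torenteu` match for a moment, the "processes the string a bit more" step can be sketched in Python 3. This is a reconstruction under assumptions, not Reddit's code: the three regexes and the 50-character limit are my guesses at a typical title-to-slug routine, chosen because they reproduce the exact string quoted above. The input is the already-transliterated title from the `unidecode` session, so the sketch needs no third-party packages.

```python
import re

# Assumed normalization rules (not confirmed by the post).
RX_WHITESPACE = re.compile(r"\s+")
RX_NOTSAFE = re.compile(r"\W+")
RX_UNDERSCORE = re.compile(r"_+")


def slugify(ascii_title: str, max_length: int = 50) -> str:
    """Turn a transliterated title into a URL-style slug."""
    slug = RX_WHITESPACE.sub("_", ascii_title)   # spaces -> underscores
    slug = RX_NOTSAFE.sub("", slug)              # drop punctuation like "."
    slug = RX_UNDERSCORE.sub("_", slug)          # collapse repeated "_"
    slug = slug.strip("_").lower()
    if len(slug) > max_length:
        # Truncate, then cut back to the last complete word.
        slug = slug[:max_length]
        last_word = slug.rfind("_")
        if last_word > 0:
            slug = slug[:last_word]
    return slug


# Output of unidecode.unidecode(...) from the session above.
TRANSLITERATED = ("reinbou daesi peurejenceu 26hwa toren.teu "
                  "150927 26hwa torrent HD gohwajil")

slug = slugify(TRANSLITERATED)
print(slug)
print(re.search("torenteu", TRANSLITERATED) is not None)  # raw text
print(re.search("torenteu", slug) is not None)            # normalized slug
```

The point of the sketch is the last two lines: the pattern does not match the transliterated text while the `.` is still inside `toren.teu`, and only matches once the punctuation has been stripped.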
The *UA-49307539-* result is interesting because there used to be a filter for that specific phrase too:

> **Removed: Failed inspection: Phrase(s) [u'UA-49307539-']** • c4c3u5o8c7n • 0 points • 11 months
>
> 다시보기강아지들토렌.트 torrent 토렌 DVD 1080p 720p HD Full HD DVD 1080p MKV 강아지들토렌.트 file 1080p MKV 다시보기강아지들토렌.트 토렌.트 토렌 Torrent Comprehensive 720p HD Coverage aggregated from sources all 토렌.트 파일 (Torrent) : 파일 받기 :다시보기강아지들토렌.트 Torrent . . . . . . . . .

Though, this case is a little more curious than just that. Once again, the removal phrase does not appear in the comment, but this time not even after running the text transformations!

The trick here is that the comment contains a link that goes through several redirects and then ends up on some Korean forum. And looking at the source code of said forum:

```
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-anal [break] ytics.com/analy [break] tics.js','ga');
  ga('create', 'UA-49307539-2', 'auto');
  ga('send', 'pageview');
</script>
```

Aha! So the "inspection" means that Reddit literally opens the URL, follows redirects, and looks for the pattern on the page. In this case, the pattern matched is a Google Analytics ID, so that if the same spam ring were to change their IP and domain, the spam filter would still catch them.

I wanted to try this on my own account, so I put the string **&lt;pre&gt;UA-49307539-2&lt;/pre&gt;** on a website and posted a link to it on Reddit.

> **This account has been banned**

My test account (5 years old!) got banned *immediately*, and all of its post history got wiped too. RIP [/u/popstonia](https://www.reddit.com/user/popstonia/).
Because of that ban, I changed the *real* number in this blog post to *UA-49307539-*, which is in reality a random number - I would rather not put a piece of text out there that can kill people's accounts through just posting it.

I tried to recreate the ban with a friend's account which had a little more history on it, and that one ended up being fine. So I'm guessing this string only killed my test account because it was already a certain level of suspicious for the anti-spam filters. I don't actually know for sure whether the filter is still active, or whether my account getting nuked was a coincidence, but I'm choosing not to publicize the specific string used just to be safe.

Alright so this next one has a pretty interesting removal message:

> 1 • Redstoner7 in UnusAnnusArchival
>
> **Massive Torrent of Unus Annus content**
>
> **Removed: spam https://www.reddit.com/message/messages/8edha7** • (Unus Annus Archiving) • 0 Comments • self.UnusAnnusArchival • 11 months (edited 11 months)

The post here isn't all that interesting, but what is interesting is that it got removed immediately. I have no idea why the removal message would link to a specific Reddit PM. Is this the message that was sent to the user? Was it sent to admins? Was it sent to modmail? Is it a DMCA message?

I wanted to get to the bottom of this so I did some OSINTing and tracked down the person who made that post, [Aria](https://aria.coffee/). I asked her to check the message link, but it turns out that it was not sent to her account. So this got me even more curious.

Now, something we do have is the ID of the message - 8edha7. Just like the other IDs on Reddit, it is sequential. This meant I could figure out when this linked message was sent based on the messages in my own message history. And this message appears to land in the latter half of May 2017!
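The dating trick can be sketched in a few lines of Python. Reddit IDs are base-36 numbers, so they convert directly to integers; since they are handed out sequentially, a message's send time can be estimated by interpolating between two messages whose send times are known. The function names are mine, and the reference pairs passed to `estimate_timestamp` in the example are placeholder values for illustration, not real data from my inbox.

```python
def reddit_id_to_int(reddit_id: str) -> int:
    """Decode a base-36 Reddit ID (without the t1_/t4_ prefix)."""
    return int(reddit_id, 36)


def estimate_timestamp(target: int,
                       earlier: tuple[int, float],
                       later: tuple[int, float]) -> float:
    """Linearly interpolate a Unix timestamp for `target`.

    `earlier` and `later` are (numeric_id, unix_timestamp) pairs for two
    messages with known send times that bracket the target ID.
    """
    (id0, t0), (id1, t1) = earlier, later
    return t0 + (target - id0) * (t1 - t0) / (id1 - id0)


print(reddit_id_to_int("8edha7"))

# Placeholder reference points, purely to show the call shape.
print(estimate_timestamp(150, earlier=(100, 1000.0), later=(200, 2000.0)))
```

Linear interpolation assumes messages were created at a steady rate between the two reference points, so the closer the references are to the target ID, the better the estimate.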
I still don't know what this message is and why it is linked in the removal reason, but it is a rather old message from before the Reddit account or the subreddit were even created.

> 0 • neynime in MyLittleOutOfContext
>
> **Sex video chat with Russian girls. Free registration. DtAyqbIm**
>
> **Removed: Janitor russian girls chat: Submitted by banned user neynime** • 0 Comments • example.com • 5 yrs

I'm not sure what's up with this one. Like obviously it's just sex spam or whatever, but what's up with that removal message? Is it talking about "Janitor russian girls" who chat, or is there a Reddit "janitor" who did the removal? Why is it "submitted by banned user"? Is this like a banall from before that was a thing? So many questions, and unlike the previous post I can't even reach out to anyone to ask about it.

Okay, but there's one more removal I found pretty interesting, and it's this one here:

> **Removed: some pages have personal info - 11/15/12 mg** • gnbman • 1 points • 8 years
>
> [https://encyclopediadramatica.se/thumb/8/8a/Woll_Smoth_original.jpg/180px-Woll_Smoth_original.jpg](https://knowyourmeme.com/memes/woll-smoth)

For those out of the loop, Encyclopedia Dramatica is a parody wiki site centered around internet culture and making fun of people. It's pretty much like if 4chan was in charge of Wikipedia. A lot of the pages are pretty mean to their subjects, sometimes - as you might deduce from the removal message above - to the point of digging up and documenting their personal history.

Thus, it seems like "mg" decided to ban the entire domain and auto-remove any links to it. I believe this is noteworthy as it is the only removal here that is not just spam, but instead a legitimate website that Reddit did not like the content of.

## Reddit engineering

So that was all I was able to deduce from what I saw myself, but as it turns out, Reddit has been writing about their anti-spam systems too!
There’s a post from 2023 on [/r/RedditEng](https://old.reddit.com/r/RedditEng/) titled *[Protecting Reddit Users in Real Time at Scale](https://www.reddit.com/r/RedditEng/comments/16m3t7m/protecting_reddit_users_in_real_time_at_scale/)* that talks about internal systems called Rule-Executor-V1 (REV1), REV2, and Snooron. The timeline is a bit messy, but how I understand it is that REV1 was created in 2016, then Snooron was developed in 2021 to modernize REV1, and two years later everything was migrated to REV2? I wonder if that migration is what led to me seeing the admin removal messages back in 2021.

Both REV1 and REV2 run off of Lua rules such as this:

```
if body_match("some bad text") then
  action(user)
end
```

This leads me to believe that REV1 is what *we* know as spamurai. The timeline seems to match, and we’ve seen spamurai emit strings such as “nil” that you’d expect from Lua. There have been [fairly recent user reports](https://old.reddit.com/r/ModSupport/comments/1pejhf7/safety_spamurai/) of posts getting removed by the users /u/Safety_Spamurai and /u/bot-bouncer, so the spamurai name is still at least *somewhat* in use, even for REV2 or snooron.

But we also saw removals such as `u'torenteu'` and `u'UA-49307539-'`, which are clearly Python 2.7 unicode strings. The former was way before 2016, so that makes sense, but what about the latter removal that we only saw in like 2020? Well, REV1 also ran on Python 2.7, so I think there are two possible conclusions: either the REV1 code calls out to the URL inspection code written in Python 2.7, or the inspection code is entirely separate from REV1/spamurai. I suspect the latter, because all of the spamurai removal messages seem to be prefixed with “spamurai”.

I also learned that, [according to this talk](https://www.youtube.com/watch?v=lWCt4t1Dhvc), snooron runs on Flink Stateful Functions, classifies posted images, runs OCR on said images, and uses Python 3 for its workers.
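
The language guesses earlier in this section boil down to spotting runtime "fingerprints" in the leaked removal messages: a bare `nil` hints at Lua, and a `u'...'` literal hints at a Python 2 `repr()` of a unicode string. Here is a minimal sketch of that reasoning in Python. The function name and both patterns are my own illustration and have nothing to do with Reddit's actual code. The result is only a hint, since `nil` also exists in other languages such as Ruby.

```python
import re
from typing import Set

# Heuristic patterns, chosen purely for illustration.
# A word boundary before "u" avoids matching contractions like "you'd".
PY2_UNICODE_REPR = re.compile(r"\bu'[^']*'")
LUA_NIL = re.compile(r"\bnil\b")


def guess_runtimes(message: str) -> Set[str]:
    """Return the set of runtime hints found in a removal message."""
    hints = set()
    if PY2_UNICODE_REPR.search(message):
        hints.add("python2")
    if LUA_NIL.search(message):
        hints.add("lua")
    return hints


# Fragments taken from the removal messages quoted in this post.
print(guess_runtimes("posting nil from `oauth.reddit.com` via `nil`"))
print(guess_runtimes("u'UA-49307539-'"))
```

The first call should report only the Lua hint and the second only the Python 2 hint, matching the split between spamurai messages and the URL inspection removals described above.
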
I also found this [Australian eSafety PDF](https://www.esafety.gov.au/sites/default/files/2025-07/BOSE-responses-to-mandatory-notices-tvec-March2025-updatedJuly2025.pdf) which lists Reddit as using, as of 2024, [Hive AI](https://thehive.ai/) for OCR and image/video classification, but also the [Google Vision OCR API](https://docs.cloud.google.com/vision/docs/ocr). They explain that Hive’s OCR supports 12 languages, and thus they also need Google’s OCR to support a lot more of them. They also mentioned that they’re working on an internal tool that would support 80 languages. Though, the text classification itself is done in-house using snooron.

Snooron also has internal image hash-matching functionality. I don’t know whether this is just using existing anti-abuse/anti-terrorism hash databases, or if it’s also Reddit’s own hashes for common spam and such.

Going back in time, I also found [this ticket](https://web.archive.org/web/2009/http://code.reddit.com/ticket/124) from 2009 where [spez](https://old.reddit.com/u/spez) \[A\] confirms that a user called crm114 is a spam filter that can be trained by moderators. [CRM114](https://en.wikipedia.org/wiki/CRM114_%28program%29) is an old open-source spam classification software that, among other things, lets you “train” it with data to make its detection more reliable. This is also why the **admintools.spam** method in Reddit’s source code has a **train\_spam** keyword - it decides whether the anti-spam filters should be trained off of the performed moderator action. So, approve good posts in your sub if you want fewer false positives?

## Why now?

So why release all this information now and not 5 years ago? I believe the information in this post, if released back in 2021, would’ve been catastrophic for Reddit’s spam issues. I don’t care too much about large companies, but covering internet forums in spam is not something I strive to do.
In 2026, however, I believe this information is no longer dangerous to publicly share. First of all, the [Perspective API is shutting down](https://web.archive.org/web/20260624230439/https://perspectiveapi.com/) by the end of this year. I doubt Reddit is still using this API, and even if they are, they’ll have to migrate off of it soon anyways. Secondly, there’s that elephant in the room. LLMs have changed the game and revolutionized… the spam industry. And thus, I think it’s safe to assume that Reddit has had to overhaul *a lot* in their anti-spam systems to make it work in the year of 2026.

## afterword

hiii! probably not the blogpost you were expecting, but hopefully a fun one nonetheless! ^^

as usual, i did the whole “handwritten html/css, no images, no external resources, no javascript” thing for this post too (46kB gzipped btw!), but while recreating the old reddit ui i was pleasantly surprised by just how nice its code is! it feels like it was written by someone who actually loves html and css and wants to give me a warm hug. i was amused by the css actually using the orangered color *by name*, a rare sight these days!

anyways, some other updates - many of you are probably awaiting my [x86css](https://lyra.horse/x86css/) blog post, and it is hopefully coming out at some point, but in the meanwhile i [gave a talk](https://lyra.horse/slides/#2026-cssday) about the project at css day (which was a really fun event!!). unfortunately, the recordings for the talk will initially be behind a paywall, but they should become public *eventually*. i’m also trying to get the same talk accepted at [40c3](https://events.ccc.de/congress/2026/), in which case the recordings will be available immediately. besides that, i’ll likely be doing a few other talks this year too - check [my slides page](https://lyra.horse/slides/) for up to date info.

other than that i’m really hoping to host [x3ctf](https://x3c.tf/) again this year.
we’re still not sure when it is happening, but i think we are all aiming to make it happen before the end of the year.

thank you so much for reading <3

*If you’d like to reach out, feel free to message me on my socials or at lyra.horse \[at\] gmail.com.*

**Discuss this post on:** [twitter](https://twitter.com/rebane2001/status/2070887442891026628), [mastodon](https://infosec.exchange/@rebane2001/116822705466646248), [lobsters](https://lobste.rs/s/boap41/peek_into_reddit_s_anti_spam_internals)
