If you don't know what Twitch is, it's an IRC server with some additional functionality. Instead of channel owners, Twitch channels have broadcasters, who can broadcast live video and provide entertainment in collaboration with chatters. You would go back to using IRC if someone was constantly providing entertainment there, right? Billions [citation needed] of people watch TV every day. Turns out that if you wrap it in a nice website and give broadcasters a little incentive, people not only watch but provide entertainment themselves for free, or sometimes even pay to provide it.
Artist's rendition of live broadcast in mIRC
If you do know what Twitch is, you will realize that this description feels a little off. Twitch is so big nowadays that you could divide its communities into hundreds of different circles. There are "communities" whose broadcasters do not interact with chat at all, and people STILL pay for it. I will never understand American tipping culture.
Some of the communities are gaming-skill oriented (gaming is an easy way to keep your broadcast live without it being a still image), some are travel oriented, some porn oriented. From here on, we'll focus on chat-interaction oriented communities.
What is P&SL?
I have no idea how or when I came up with the name Poo and Spam Labs. Let's not let this little detail distract us from the fact that chat-interaction oriented communities have a huge problem: the chat. You see, the broadcaster is mostly in complete control of his live broadcast, but anyone can register an account and post anything in the chat. Here come spam, racist slurs, ASCII and braille art of phalluses, advertisements, scams, malware, harassment, or something written in a language you don't understand. Preventing all of that 100% is impossible and will most likely stay impossible forever; it's basically the same thing you see all over the internet. What you can do is try, and this is where the RPG part comes in.
You pretend that you are some kind of big threat prevention guy, and the other side pretends that he has hacked the entire channel. In reality he's running a script he got off GitHub for free, spamming the chat with a bunch of accounts and hoping to get a reaction from the broadcaster, while you are trying to enumerate all of his accounts using basic API access on a 512MB RAM AWS free tier server, just to see if you can.
Why won't Twitch just do that themselves?
you might ask. Well, they did, but they did a crappy job despite actually being incentivized to do it (fewer people scammed by others = more monetizable traffic you can scam yourself). This is where the defunct part comes in. Nowadays Twitch does a good job of preventing some of the malicious stuff through the magic of buying a 3rd party solution. It's not 100% of course, but it is very noticeable compared to what was happening before. This goes for advertisements, scams, malware and the like: stuff involving bots and automated traffic.
Racist slurs, harassment, etc. kinda solved themselves (in limited circles). People who find fun in that realized that instead of taking a dump in some popular channel, getting no reaction from the streamer and getting banned within a couple of seconds, it is smarter to get a broadcaster in on the joke. The setup goes like this: the broadcaster pretends that he's a regular streamer doing streamer things who is very hurt by all of these racist slurs, song requests and other crap from viewers, who in turn pretend that they are evil murderers who will murder everyone. In reality, everyone is in on it and everyone has some jolly old fun. These channels exist to this day, although you have to be really into the community to get the broadcaster's latest channel name, because Twitch is still kinda obligated to ban the channel if they notice evil burglars threatening to rob everyone.
Spam and ASCII art of phalluses aren't really a huge problem. Some communities actually prefer unreadable chats and take pride in keeping them that way. The containment zone is much larger than the one above, but compared to the whole website it is still pretty small, just vocal.
The idea of cross-banning
does not work and likely never will, unless someone has ideas other than the ones I'm about to mention.
The idea is that a community has more than one channel, while the ban list maintained by channel moderators is local to each specific channel. So if the channels in a community are so similar, why not synchronize the ban list across the community?
You don't have permission to perform that action.
It is really hard to get moderation permissions in a broadcaster's channel, even if you already have a moderator there who will do bans for you and everyone, including the broadcaster, completely trusts you. So what's the problem? Turns out broadcasters are notoriously lazy and forgetful. There are other elements to it of course, like the fact that someone requesting moderation access can look like a scam.
WHAT? Bill Jezzos is banned?
Imagine you are a broadcaster and suddenly you are given $10000 by some moron whose real name you don't even know. Two days later, he dumps 20 messages at once (naturally, he's a moderator now) threatening someone, because he obviously doesn't have it all together. The obvious action is to tell him to apologize: what's the point of bans (or temporary bans) if the person doesn't come out better? Or maybe a timeout? The less obvious question is whether he gets cross-banned. He didn't give anything away in the other channels, but if you ban him only in some channels, you suddenly find yourself with a cloud-native application without moral integrity.
It takes more than 0.1 seconds to do
Banning in a channel is easy, really easy. One-button-away easy. Cross-banning is not built into any tool, not to mention the main website [1]. There must be an appeal system in place because humans aren't perfect. You must write a comment for every ban, because the system involves more than a couple of moderators of one channel and they can't read your mind.
[1] Some native capabilities similar to cross-banning have appeared since then, but they are far from an automated solution.
Let's build it anyway
This is where P&SL started (it was just called CCB, for cross channel banner, at the time). I don't remember the exact motivation, but the initial idea was early prevention. What if instead of banning someone in 0.5 seconds, you could ban them in negative whatever seconds? Join a couple of community channels with an anonymous (read-only) account, keep a list of naughty phrases, and you have a list of chatters you can do whatever with beforehand. That usually doesn't involve banning, because one string matching another is not proof that the account will take a dump in all the channels in a row. Not much work was done on the cross-banning aspect itself; it is much more fun to work on the matching of messages.
Banphrases
There are multiple chat bots for Twitch that you can use for general automated moderation. Each comes with some kind of banphrase engine, which usually consists of simple string matching and regex matching. Clearly this is not enough when we are talking about ALL of the ink strokes in Unicode and artistic expression on a braille canvas.
Simple string matches
There is nothing too special we can do here, although there are algorithms designed to match multiple needle strings against a haystack more efficiently than running one search per needle, such as the Wu-Manber multi-pattern matching algorithm or the Aho-Corasick algorithm.
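To give an idea of what multi-pattern matching looks like, here is a minimal Aho-Corasick sketch. It is not the matcher P&SL used; the `AhoCorasick` class and its `FindAll` method are names made up for this illustration. It builds a trie of all needles, adds failure links, and then reports every needle occurrence in a single pass over the message.

```csharp
using System;
using System.Collections.Generic;

// Illustrative Aho-Corasick automaton (hypothetical helper, not the P&SL code).
public sealed class AhoCorasick
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;                                          // longest proper suffix that is also a trie prefix
        public readonly List<string> Output = new List<string>();  // needles that end at this node
    }

    private readonly Node root = new Node();

    public AhoCorasick(IEnumerable<string> needles)
    {
        // 1. Build the trie of all needles.
        foreach (var needle in needles) {
            if (string.IsNullOrEmpty(needle)) continue;
            var node = root;
            foreach (var c in needle) {
                if (!node.Next.TryGetValue(c, out var child)) {
                    child = new Node();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.Output.Add(needle);
        }

        // 2. Breadth-first pass to fill in failure links and inherited outputs.
        var queue = new Queue<Node>();
        foreach (var child in root.Next.Values) {
            child.Fail = root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0) {
            var node = queue.Dequeue();
            foreach (var (c, child) in node.Next) {
                var fail = node.Fail;
                while (fail != null && !fail.Next.ContainsKey(c))
                    fail = fail.Fail;
                child.Fail = fail == null ? root : fail.Next[c];
                child.Output.AddRange(child.Fail.Output);
                queue.Enqueue(child);
            }
        }
    }

    // Returns (start index, needle) for every occurrence, in one pass over the haystack.
    public List<(int Index, string Needle)> FindAll(string haystack)
    {
        var found = new List<(int, string)>();
        var node = root;
        for (var i = 0; i < haystack.Length; i++) {
            var c = haystack[i];
            while (node != root && !node.Next.ContainsKey(c))
                node = node.Fail;
            if (node.Next.TryGetValue(c, out var next))
                node = next;
            foreach (var needle in node.Output)
                found.Add((i - needle.Length + 1, needle));
        }
        return found;
    }
}
```

The sketch is case-sensitive; for banphrases you would most likely normalize both needles and messages (for example with `ToLowerInvariant()`) before feeding them in.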
Regex matches
PCRE2 is the king in features/performance. Chatters are smart enough to understand that if you get banned immediately, it means a bot caught you. To catch a bigger mouse, we need a bigger cat.
Braille and ascii art
Actually scratch that, only braille art. Chatters generally do not use ASCII art, because for that to work you have to craft your message so that it renders the same for as many users as possible. In IRC you cannot insert newlines, so you have to rely on word wrapping. Braille characters (\u2800-\u28FF) are more consistent. Banphrase evasion applies here too, so we need to match braille art lossily. The following works well:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
namespace PooAndSpamLabs.CCBMatcher.Services.BanPhrases.Algorithms
{
public static class Braille
{
public static float BraillePartsV1(ReadOnlySpan<char> _Stored, ReadOnlySpan<char> _Input)
{
string filter(ReadOnlySpan<char> In)
{
var ret = new char[In.Length];
int i = 0;
for (int x = 0;x < In.Length;x++) {
if ((In[x] != '⣿' && In[x] != '⠄' && IsCharBrailleArt(In[x])) || In[x] == ' ' || In[x] == '\u2800')
ret[i++] = In[x];
}
ret = ret[0..i];
return new String(ret);
}
var stored = filter(_Stored).Split(new[] { " ", "\u2800" }, StringSplitOptions.RemoveEmptyEntries).ToList();
var input = filter(_Input).Split(new[] { " ", "\u2800" }, StringSplitOptions.RemoveEmptyEntries).ToList();
var matches = new bool[stored.Count];
var matchIndexes = new List<int>(input.Count);
for (int i = 0;i < input.Count;i++) {
for (int x = 0;x < stored.Count;x++) {
if (!matches[x]) {
if (input[i] == stored[x] && !matchIndexes.Contains(i)) {
matches[x] = true;
matchIndexes.Add(i);
}
}
}
}
if (matchIndexes.Count == 0)
return 0;
float matchesScore = (float)matches.Count(x => x == true) / stored.Count;
if (matchIndexes.Count > 1) {
var orderDiff = new float[matchIndexes.Count - 1];
for (int x = 0;x < matchIndexes.Count - 1;x++) {
orderDiff[x] = 1 / ((float)matchIndexes[x + 1] - matchIndexes[x]);
}
var ordered = matchIndexes.OrderBy(x => x).ToList();
var orderMatch = new bool[matchIndexes.Count];
for (int x = 0;x < matchIndexes.Count;x++) {
if (matchIndexes[x] == ordered[x])
orderMatch[x] = true;
}
float orderScore = orderDiff.Sum() / orderDiff.Length;
float orderMatchScore = (float)orderMatch.Count(x => x == true) / orderMatch.Length;
return matchesScore * orderScore * orderMatchScore;
} else {
return matchesScore;
}
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsCharBrailleArt(char In)
{
return
(In >= 0x2800 && In <= 0x28FF) || // Braille Patterns
(In >= 0x2580 && In <= 0x259F) || // Block Elements
(In >= 0x2500 && In <= 0x257F) || // Box Drawing
// Miscellaneous Symbols and Pictographs (U+1F300-U+1F5FF) do not fit in a single UTF-16 char, so that range is only checked in the Rune overload below
(In >= 0x2600 && In <= 0x26FF); // Miscellaneous Symbols
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsCharBrailleArt(Rune In)
{
return
(In.Value >= 0x2800 && In.Value <= 0x28FF) || // Braille Patterns
(In.Value >= 0x2580 && In.Value <= 0x259F) || // Block Elements
(In.Value >= 0x2500 && In.Value <= 0x257F) || // Box Drawing
(In.Value >= 0x1F300 && In.Value <= 0x1F5FF) || // Miscellaneous Symbols and Pictographs
(In.Value >= 0x2600 && In.Value <= 0x26FF); // Miscellaneous Symbols
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsCharBraille(Rune In)
{
return In.Value >= 0x2800 && In.Value <= 0x28FF;
}
}
}
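BraillePartsV1 returns a score, with 1 being a 100% match. I cannot fully describe what the algorithm does because it has been a while since I wrote it, but reading the code back: it drops everything that is not braille art along with the ⣿ and ⠄ filler cells, splits both arts into chunks on spaces and blank braille cells (\u2800), and scores the fraction of stored chunks that also appear in the input, scaled down the further apart the matched chunks sit in the input.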
Confusables
Anyone who has explored Unicode even a little bit has noticed that you can write the same letter in a lot of different ways. There are so many that it is not feasible to write a regex that accounts for all of these characters. ConfusableMatcher is built specifically to "solve" this problem. It allows you to add any arbitrary mapping for a string, for example matching 6 to G, or /\/\ to M. Predictably, it is very resource intensive. It can also match repetitions and word boundaries, and account for any ignored strings. It is probably possible to implement this on top of some other, more performant string matching algorithm, or maybe to replace the character matching code in the JITed regex pattern of PCRE2, or in a source-generated C# regex, with ConfusableMatcher?
Here's the full confusable map in case you're interested: LINK. You can do a little bit more with this map. If you know that a key maps to a value, then the value also maps to the key. Additionally, if a value contains something that is a key somewhere else, you can substitute it. For example: if K -> |< and | -> ¦, then |< -> ¦< and K -> ¦<. Unfortunately, the last trick is too strong: it produces too many false positives when matching, so it is not used.
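The two map tricks can be sketched roughly like this. It is an illustration only: `ConfusableMapTricks`, `AddReversed` and `ExpandOnce` are hypothetical names, the map is modelled as plain (key, value) string pairs rather than ConfusableMatcher's own format, and `ExpandOnce` only derives the K -> ¦< half of the example with a single substitution pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helpers over a confusable map given as (key, value) pairs (hypothetical, not the P&SL code).
public static class ConfusableMapTricks
{
    // Trick 1: every key -> value mapping is also usable as value -> key.
    public static HashSet<(string Key, string Value)> AddReversed(IEnumerable<(string Key, string Value)> map)
    {
        var result = new HashSet<(string, string)>();
        foreach (var (key, value) in map) {
            result.Add((key, value));
            result.Add((value, key));
        }
        return result;
    }

    // Trick 2: if a value contains another key, substituting that key's value
    // gives one more spelling of the original key (first occurrence only).
    public static HashSet<(string Key, string Value)> ExpandOnce(IEnumerable<(string Key, string Value)> map)
    {
        var source = map.ToList();
        var result = new HashSet<(string, string)>(source);
        foreach (var (key, value) in source) {
            foreach (var (innerKey, innerValue) in source) {
                if (innerKey.Length == 0) continue;
                var index = value.IndexOf(innerKey, StringComparison.Ordinal);
                if (index < 0) continue;
                var derived = value.Substring(0, index) + innerValue + value.Substring(index + innerKey.Length);
                if (derived != key)
                    result.Add((key, derived));
            }
        }
        return result;
    }
}
```

Running `ExpandOnce` repeatedly makes the map grow quickly, which is one way to see where the false positives come from.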
Scripts
At some point the matching algorithms got so complex that it was no longer feasible to perform matching using built-in methods, so I had to add support for compiling scripts at runtime. Surprisingly, for an application written in C# this is not that hard: AssemblyLoadContext is your friend. You can also include extensions such as
#pragma lib "System.Console"
#pragma lib "GraphQL.Primitives"
#pragma lib "GraphQL.Client"
#pragma usings
You can parse any #pragma tags and link against any libraries, or include commonly used using statements for convenience.
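Parsing those tags does not take much. Below is a small sketch that pulls the `#pragma lib` and `#pragma usings` lines out of a script before it is handed to the compiler. `ScriptHeader` and `PragmaParser` are names invented for this example, and the compile-and-load step (the part where AssemblyLoadContext comes in) is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for what the pragma lines of a script asked for.
public sealed class ScriptHeader
{
    public List<string> Libraries { get; } = new List<string>();
    public bool IncludeCommonUsings { get; set; }
    public string Body { get; set; } = "";
}

public static class PragmaParser
{
    private static readonly Regex LibPragma =
        new Regex("^#pragma\\s+lib\\s+\"([^\"]+)\"\\s*$", RegexOptions.Compiled);
    private static readonly Regex UsingsPragma =
        new Regex(@"^#pragma\s+usings\s*$", RegexOptions.Compiled);

    // Splits a script into requested libraries, the "usings" flag, and the remaining source.
    public static ScriptHeader Parse(string source)
    {
        var header = new ScriptHeader();
        var body = new List<string>();
        foreach (var rawLine in source.Split('\n')) {
            var line = rawLine.TrimEnd('\r');
            var trimmed = line.Trim();
            var lib = LibPragma.Match(trimmed);
            if (lib.Success) {
                header.Libraries.Add(lib.Groups[1].Value);   // assembly to reference when compiling
            } else if (UsingsPragma.IsMatch(trimmed)) {
                header.IncludeCommonUsings = true;           // prepend the usual using statements
            } else {
                body.Add(line);                              // everything else is ordinary C#
            }
        }
        header.Body = string.Join("\n", body);
        return header;
    }
}
```

The resulting `Libraries` list would become assembly references for the compilation, and `Body` (optionally prefixed with the common using statements) is what actually gets compiled.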
The TriHard emote. Despite being an innocent emote, the only use it gets on Twitch is telling banphrase engines how bad a message is
Hades
Was the most notorious, currently retired (?) racist slur spammer, with a unique technique thought to have no pattern. However, this is not true. Just think about it: for a message to be recognizable by an English-speaking person, it must have a pattern. Of course no one said it was going to be easy, but the database had this script, which is disabled according to the latest snapshot. Probably because of false positives.
#pragma lib "ConfusableMatcher-cs-interop"
#pragma usings
using System.Linq;
using System.Collections.Generic;
using System.Text;
using PooAndSpamLabs.Shared.Utils;
using Serilog;
public class Script : IScriptBanPhrase
{
static readonly char[] Numbers = new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
static readonly string[] WhiteList = new[] {
"nicker",
"angeli79CRY",
"nickiBruh",
"spinningchurro",
"daniel160Cheers",
"jonnykCheer",
"angie6Creep",
"NickEverdin",
"eniggnering",
"negi13NEGIred2",
"lecunnincgrunt",
"AnyBeggers",
"AngelickReaper",
"INSANIQUARIUM",
"nilanaAngry",
"anikicBruh",
"SilviaGunner",
"nyannAngry",
"hungry65CRY",
"omochi2Heart",
"VicBerger",
"OnyxGuard11",
"gopnik1CheekiBreeki",
"money-hungry",
"DINNER!@Grossie_Gore",
"johnny357Gross",
"IncreasinglyAngeryCog",
"skinning/herb",
"envelhecermos",
"johnny357Crane",
"nick_jr",
"Nighrbringer",
"nighrbringer",
"Destiny.gg/shirt",
"jeeninAngryPicklePee",
"TinySquirtle",
"niliPunisher",
"nickPlsdancingincolors"
};
static readonly string[] NRWhiteList = new[] {
"NING-KRUGER",
"ning-krueger",
"nNING-KRUGER",
"nning-krueger",
"NNING-KREUGER",
"NING-KREUGER",
"neighber",
"nsnicker",
"nnisher",
"ning/her",
"ngle/squar",
"NICKK\\GR"
};
static readonly char[] Vowels = new[] { 'A', 'E', 'I', 'O', 'U' };
static readonly string[] HadesSubstringPatterns = new[] {
"CG", "GC",
"XJ", "JX",
"CK", "KC",
"GX", "XG",
"KX", "XK",
"KH", "HK",
"GH", "HG",
"CX", "XC",
"CH", "HC",
"GS", "SG",
"SH", "HS"
};
static readonly char[] HadesTrimRepeating = new[] { 'X', 'G', 'A', '3', '4', '5' };
static readonly char[] HadesUselessLetters = new[] { 'B', 'V' };
static readonly string[] HadesObfuscatedEnds = new[] { "HUR", "GHER", "GGUR", "HUAR", "GGR", "AGUR", "HJUR", "GKUR", "GCUR", "GUAR", "GIUR" };
static readonly (string, string)[] HadesReplacements = new[] {
("J", "I"),
("NIRGG", "NIGG"),
// Pretty loose, to conform to test cases
("AAG", "GG"),
("YAG", "GG"),
("YGAY", "GG"),
("YGA", "GG")
};
static readonly string[] HadesFinalMatches = new[] { "NIGGER", "NIGGIR" };
// R must be first
static readonly string[] HadresREnds = new[] { "R", "RS", "RZ" };
static readonly (int, string)[] HadesNumberMap = new[] {
(0, "O"),
(1, "I"),
(2, ""),
(3, "E"),
(3, "G"),
(4, "A"),
(5, "S"),
(6, "G"),
(7, ""),
(8, ""),
(9, "G")
};
private static bool MatchInsideNR(ReadOnlySpan<char> In)
{
var toMatch = new HashSet<string>() {
In.ToString().ToUpperInvariant()
};
/*int longestSeqWithoutVowels = 0;
* Count longest sequence in string without vowels * {
int currentSeqWithoutVowels = 0;
foreach (var rune in In.EnumerateRunes()) {
bool isVowel = false;
if (rune.Value > 255) {
foreach (var vowel in Vowels) {
var (index, _) = ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(rune.ToString().ToUpperInvariant(), vowel.ToString(), false, 0);
if (index == 0) {
isVowel = true;
break;
}
}
} else {
isVowel = Vowels.Contains(Char.ToUpperInvariant((char)rune.Value));
}
if (!isVowel)
currentSeqWithoutVowels++;
else {
if (currentSeqWithoutVowels > longestSeqWithoutVowels)
longestSeqWithoutVowels = currentSeqWithoutVowels;
currentSeqWithoutVowels = 0;
}
}
if (currentSeqWithoutVowels > longestSeqWithoutVowels)
longestSeqWithoutVowels = currentSeqWithoutVowels;
}
if (longestSeqWithoutVowels < 4)
return false;*/
/* Trim repeating chars, search up to 4 runes */ {
var toAddToMatch = new HashSet<string>();
foreach (var match in toMatch) {
var runes = match.ToString().EnumerateRunes().ToArray().AsSpan();
bool replaced = false;
for (var x = 0;x < runes.Length;x++) {
for (var partLen = 4;partLen > 0;partLen--) {
var partLen2 = partLen * 2;
while (runes.Length >= x + partLen2 && runes[x..(x + partLen)].SequenceEqual(runes[(x + partLen)..(x + partLen2)])) {
runes = runes[..x].ToArray().Concat(runes[(x + partLen)..].ToArray()).ToArray();
replaced = true;
}
}
}
if (replaced) {
var toAdd = "";
foreach (var rune in runes) {
toAdd += rune.ToString();
}
if (!toMatch.Contains(toAdd) && !toAddToMatch.Contains(toAdd))
toAddToMatch.Add(toAdd);
}
}
foreach (var x in toAddToMatch)
toMatch.Add(x);
}
/* Cleanup numbers */ {
var toAddToMatch = new List<string>();
foreach (var match in toMatch.Skip(toMatch.Count > 1 ? 1 : 0)) {
var runes = match.EnumerateRunes().ToArray();
bool add(int startingIndex, List<Rune> constructedSoFar)
{
for (var x = startingIndex;x < runes.Length;x++) {
var rune = runes[x];
if (rune.Value <= '9' && rune.Value >= '0') {
foreach (var mapped in HadesNumberMap.Where(x => x.Item1 == rune.Value - '0')) {
if (!add(x+1, constructedSoFar.Concat(mapped.Item2.EnumerateRunes().ToArray()).ToList()))
return false;
if (!add(x+1, constructedSoFar.ToList()))
return false;
}
return true;
} else {
constructedSoFar.Add(rune);
}
}
var sb = new StringBuilder(constructedSoFar.Count);
foreach (var rn in constructedSoFar) {
sb.Append(rn.ToString());
}
var toAdd = sb.ToString();
toAddToMatch.Add(toAdd);
if (toAddToMatch.Count > 300 || toAdd.Length > 50) {
return false;
}
return true;
}
if (!add(0, new List<Rune>()))
return false;
}
foreach (var x in toAddToMatch)
toMatch.Add(x);
}
/* Remove useless letters */ {
var toAddToMatch = new List<string>();
foreach (var match in toMatch) {
string toAdd = match.ToString();
for (var x = 0;x < HadesUselessLetters.Length;x++) {
while (true) {
var found = ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(toAdd, HadesUselessLetters[x].ToString(), true, 0);
if (found.Index >= 0) {
toAdd = StringUtils.StringExceptRange(toAdd, found.Index..(found.Index + found.Length));
} else
break;
}
}
if (!toMatch.Contains(toAdd) && !toAddToMatch.Contains(toAdd))
toAddToMatch.Add(toAdd);
}
foreach (var x in toAddToMatch)
toMatch.Add(x);
}
/* CG CK GX KX and friend replacements to GG */ {
var toAddToMatch = new List<string>();
foreach (var match in toMatch) {
var hadesSubstrings = new List<Range>();
for (var x = 0;x < HadesSubstringPatterns.Length;x++) {
var pattern = HadesSubstringPatterns[x];
int curIndex = 0;
while (true) {
var (index, length) = ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(match, pattern, false, curIndex);
if (index < 0)
break;
hadesSubstrings.Add(index..(index+length));
curIndex = index + length;
}
}
var ranges = RangeExtensions.MergeOverlappingRanges(hadesSubstrings).OrderBy(x => x.Start.Value);
var toAdd = match;
int removedLength = 0;
foreach (var range in ranges) {
toAdd = StringUtils.StringExceptRange(toAdd, (range.Start.Value - removedLength)..(range.End.Value - removedLength), "GG");
removedLength += range.End.Value - range.Start.Value - 2;
}
if (!toMatch.Contains(toAdd) && !toAddToMatch.Contains(toAdd))
toAddToMatch.Add(toAdd);
}
foreach (var x in toAddToMatch)
toMatch.Add(x);
}
/* Perform some other general replacements */ {
foreach (var (from, to) in HadesReplacements) {
var toAddToMatch = new List<string>();
foreach (var match in toMatch) {
var found = ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(match, from, true, 0);
if (found.Index >= 0) {
var toAdd = StringUtils.StringExceptRange(match, found.Index..(found.Index + found.Length), to);
if (!toMatch.Contains(toAdd) && !toAddToMatch.Contains(toAdd))
toAddToMatch.Add(toAdd);
}
}
foreach (var x in toAddToMatch)
toMatch.Add(x);
}
}
/* Replace obfuscated ends to GGER */ {
foreach (var obfuscatedEnd in HadesObfuscatedEnds) {
var toAddToMatch = new List<string>();
foreach (var match in toMatch) {
var found = ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(match, obfuscatedEnd, true, match.Length - 1, true);
if (found.Index >= 0) {
var toAdd = StringUtils.StringExceptRange(match, found.Index..(found.Index + found.Length), "GGER");
if (!toMatch.Contains(toAdd) && !toAddToMatch.Contains(toAdd))
toAddToMatch.Add(toAdd);
}
}
foreach (var x in toAddToMatch)
toMatch.Add(x);
}
}
foreach (var elem in toMatch) {
//Console.WriteLine($"Matching {elem}...");
foreach (var target in HadesFinalMatches) {
if (ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(elem, target, true, 0, false, 50000).Index >= 0) {
//Console.WriteLine($"Matched path against {target}{(longestSeqWithoutVowels < 2 ? " but no vowels" : "")}");
return true;
}
}
}
return false;
}
public EvaluateResult Exec(in ChatMessagePackage Package, IServiceScope Scope)
{
ReadOnlySpan<char> span = Package.MessageRaw;
if (!String.IsNullOrEmpty(Package.Raw.ColorHex) || Package.Raw.IsSubscriber)
return EvaluateResult.NoMatch();
foreach (var range in span.Split(' ')) {
var word = span[range];
if (word.Length < 6)
continue;
if (word.Length > 3 && word[0] == '@')
continue;
/* Trim repeating chars at the end of the string, search up to 4 runes */ {
var runes = word.ToString().EnumerateRunes().ToArray().AsSpan();
bool replaced = false;
for (var partLen = 4;partLen > 0;partLen--) {
while (runes.Length >= partLen * 2 && runes[^partLen..^0].SequenceEqual(runes[^(partLen*2)..^partLen])) {
runes = runes[..^partLen];
replaced = true;
}
}
if (replaced) {
var newWord = "";
foreach (var rune in runes) {
newWord += rune.ToString();
}
word = newWord;
}
}
var (Nindex, Nlength) = ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(word, "N", true, 0);
if (Nindex == -1)
continue;
int Rindex, Rlength;
(Rindex, Rlength) = ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(word, "R", true, word.Length - 1, true);
if (Rindex == -1)
continue;
// N is too far into the center of the string
if (word[..Nindex].CountCodePoints() > 6)
continue;
// R is too far into the center of the string
if (word[..(word.Length - Rindex)].CountCodePoints() > 6)
continue;
// R is first, not N
if (Rindex < Nindex)
continue;
// Uses its own parameters, so the N position found by the loop below is the one actually tested
Range? match(ReadOnlySpan<char> word, int nindex, int rindex, int rlen)
{
var insideNR = word[nindex..(rindex + rlen)];
if (insideNR.Length < 7)
return null;
if (!MatchInsideNR(insideNR))
return null;
return nindex..(rindex + rlen);
}
bool matched = false;
int triedCount = 0;
Range? matchedRange = null;
/* Fix N location, we need the last end not the first one, but pick a couple of last ends */ {
int startIndex = Nindex + Nlength - 1;
while (true) {
var (index, length) = ConfusableMatcherBanPhrase.ConfusableMatcher.IndexOf(word, "N", false, startIndex);
if (index >= 0) {
if (triedCount < 3) {
if ((matchedRange = match(word, index, Rindex, Rlength)) != null) {
break;
}
triedCount++;
}
break;
}
startIndex--;
if (startIndex == -1)
break;
}
}
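// Nothing matched in this word, move on to the next one instead of abandoning the whole message
if (matchedRange == null)
continue;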
var wordStr = word.ToString();
var nrWordStr = word[matchedRange.Value].ToString();
if (WhiteList.Any(x => x.ToLowerInvariant() == wordStr.ToLowerInvariant().Replace("\U000E0000", "").TrimEnd(new[] { ',', ':', '?' })))
continue;
if (NRWhiteList.Any(x => x.ToLowerInvariant() == nrWordStr.ToLowerInvariant().Replace("\U000E0000", "").TrimEnd(new[] { ',', ':', '?' })))
continue;
if (Package.Raw.EmoteSet?.Emotes?.Any(x => x.Name == wordStr) == true)
continue;
var absoluteRange = (range.Start.Value + matchedRange.Value.Start.Value) .. (range.Start.Value + matchedRange.Value.End.Value);
return new EvaluateResult(EVALUATE_STATUS.MATCH, CHAT_MESSAGE_PART.MESSAGE, absoluteRange);
}
return EvaluateResult.NoMatch();
}
}
There were other scripts, like the one below. Instead of trying to match and wrangle an exact word, it just checked whether the word itself is something no one would normally write.
#pragma usings
using System.Linq;
public class Script : IScriptBanPhrase
{
static readonly char[] Numbers = new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
static readonly string[] WhiteList = new[] {
"vansam1GachiDark",
"MARTEN1TRIHARD7",
"iNF3CT0Rz",
"infect0rz",
"kreshnik1mripa",
"moon2TRIHARD",
"AsianJesus69er",
"ZYGINESA5960RJM",
"kreshnik2mripa",
"kreshnik3mripa",
"kreshnik5mripa",
"kreshnikmripa5",
"knock2lover",
"asfand1yar",
"$asfand1yar",
"nolimi21Biggreens",
"daniel1rodrigues",
"frenchvanilla3sugars",
"strong26TriHardo",
"enigma100Racer",
"lamont4weird"
};
private bool MatchInsideNR(ReadOnlySpan<char> In)
{
bool matched = false;
/* Check if word has more than 3 X's inside */ if (!matched) {
int xes = 0;
foreach (var x in In.EnumerateRunes()) {
if (x.Value == 'X' || x.Value == 'x') {
xes++;
}
}
if (xes >= 3) {
matched = true;
}
}
/* Check if word has numbers inside */ if (!matched) {
matched = In.IndexOfAny(Numbers) != -1;
}
/* Check CGXK (3 letters near) relationship */
/* Check GX repetitions */
return matched;
}
public EvaluateResult Exec(in ChatMessagePackage Package, IServiceScope Scope)
{
ReadOnlySpan<char> span = Package.MessageRaw;
if (span.IndexOf("TriHard") == -1 && span.IndexOf("cmonBruh") == -1) {
return EvaluateResult.NoMatch();
}
foreach (var range in span.Split(' ', '_')) {
var word = span[range];
if (word.Length > 3 && word[0] == '@')
continue;
var Nindex = word.IndexOf("N", StringComparison.OrdinalIgnoreCase);
var Rindex = word.LastIndexOf("R", StringComparison.OrdinalIgnoreCase);
if (Nindex == -1 || Rindex == -1)
continue;
if (Nindex > 6)
continue;
if ((word.Length - Rindex) > 6)
continue;
if (Rindex < Nindex)
continue;
var insideNR = word[(Nindex+1)..Rindex];
if (insideNR.Length < 4)
continue;
if (!MatchInsideNR(insideNR))
continue;
var result = span[range].ToString();
if (WhiteList.Any(x => x.ToLowerInvariant() == result.ToLowerInvariant().Replace("\U000E0000", "").TrimEnd(new[] { ',', ':' })))
continue;
if (Package.Raw.EmoteSet?.Emotes?.Any(x => x.Name == result) == true)
continue;
return new EvaluateResult(EVALUATE_STATUS.MATCH, CHAT_MESSAGE_PART.MESSAGE, range);
}
return EvaluateResult.NoMatch();
}
}
Then we have the actual user. Each new account must have a username. What a great opportunity to take a dump! This one works great because it checks whether the user shows signs of being new: no badges, no privileges, no name color set. The last one in particular is a little secret to chatters who only use the Twitch web interface, because instead of displaying some kind of default color for users with no color set, the web interface generates a color client-side. Such a user stands out only to 3rd party client users and banphrase engines.
#pragma lib "System.Text.RegularExpressions"
#pragma lib "TwitchLib.Client"
#pragma usings
using System.Linq;
using System.Text.RegularExpressions;
using PooAndSpamLabs.CCBMatcher.Utils;
using PooAndSpamLabs.CCBMatcher.DB;
public class Script : IScriptBanPhrase
{
public Classes.BASE_BANPHRASE_OPT[] Opts { get; } = new[] {
Classes.BASE_BANPHRASE_OPT.MATCH_USERNAME
};
private readonly Regex[] matchAny = new Regex[] {
new Regex(@"^h(?:\d|a)(?:.+?)_(the|lac|dk)_+(kek+_*w*\d*|great|l[o0]+rd)_*$", RegexOptions.Compiled)
};
public EvaluateResult Exec(ChatMessagePackage Package, ChatMessagePart TargetPart, IServiceScope Scope)
{
if (TargetPart != ChatMessagePart.Username)
return EvaluateResult.NoMatch();
return Exec(Package, Scope);
}
public EvaluateResult Exec(ChatMessagePackage Package, IServiceScope Scope)
{
if (!String.IsNullOrEmpty(Package.Raw.ColorHex) ||
Package.Raw.IsSubscriber ||
Package.Raw.IsModerator ||
Package.Raw.Badges?.Count > 0 ||
Package.Raw.CheerBadge?.CheerAmount == 0 ||
Package.Raw.DisplayName != Package.Raw.Username)
{
return EvaluateResult.NoMatch();
}
bool matchedAny = false;
foreach (var rgx in matchAny) {
var match = rgx.IsMatch(Package.Username);
if (match) {
matchedAny = true;
break;
}
}
if (!matchedAny)
return EvaluateResult.NoMatch();
return new EvaluateResult(
EVALUATE_STATUS.MATCH,
ChatMessagePart.Username,
0..,
""
);
}
}
One unexplored area that I saw nobody try is to actually not have any pattern. Imagine building a Hades-like reputation with business as usual, then slowly transitioning into less and less comprehensible messages with less and less comprehensible usernames. Anything you write a couple of times will get noticed and recognized thanks to your previous reputation, and other chatters will make sure to spam a couple of messages acknowledging it, to let everyone know something is wrong with the messages.
What are the intentions of this man? Notice how, without any reputation and while the broadcaster is offline, there is still one person reacting. Later, another chatter flooded the channel by spamming "ASS SHIT FART" with zero reaction from others. Okay, this is kind of a lame illustration, but you get the point
Best banphrase engine in town
I believe that, at least at the time, we had the best banphrase engine on Twitch. It supports:
- Plain text matches
- Plain text list matches (like a list of youtube video IDs in one banphrase)
- Regex, with an automatic check for unintentional infinite recursion against 1000 pre-inputted messages
- Confusable matches
- Braille art matches
- Synchronous and asynchronous C# scripts compiled & run at runtime
- Message pre-transforms: binary, braille (not art, actual hieroglyphs), morse, reverse, unidecode. All to deobfuscate message before matching
- Memoization for repeating messages and users
- Both message & username matches with a possibility to combine
- Multi-threading
- Text classification for ASCII-only matches, etc.
- Resolving twitch clip titles and matching that against the whole pipeline (although that was a script so technically doesn't count as a feature)
- Match traceability and debugging
- Link shortener resolver
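The regex safety check from the list can be approximated as follows. This is a sketch using .NET's built-in `System.Text.RegularExpressions` with a match timeout, swapped in here for illustration instead of PCRE2; `RegexSafetyCheck` and `FindSlowSample` are hypothetical names, and the time budget is an assumption. The idea is simply to run a new banphrase against a corpus of stored messages before enabling it and to reject it if any message blows the budget.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative pre-flight check for a new regex banphrase (hypothetical helper).
public static class RegexSafetyCheck
{
    // Returns the first sample on which the pattern exceeds the time budget,
    // or null if every sample finishes in time.
    public static string FindSlowSample(string pattern, IEnumerable<string> samples, TimeSpan budget)
    {
        // The third constructor argument makes matching throw instead of running forever.
        var regex = new Regex(pattern, RegexOptions.None, budget);
        foreach (var sample in samples) {
            try {
                regex.IsMatch(sample);
            } catch (RegexMatchTimeoutException) {
                return sample;
            }
        }
        return null;
    }
}
```

Nested quantifiers such as `^(a+)+$` are the classic kind of pattern a check like this is meant to catch, although whether a given pattern actually times out depends on the regex engine and the sample messages.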
Restrictions breed creativity
Bot listing

Whatever government mission patch or logo it was, it was too cool not to steal
While all of these banphrase features are great, people who intend to take a dump in a channel with multiple botted accounts do things differently.
In general, I split botters into three categories:
- Ones that use automated tools to create accounts and send messages
- Ones that do not use any automated tools to create accounts, but use automated tools to send messages
- Ones that buy or steal twitch accounts and use automated tools to send messages
They are listed in order of ease of enumeration; each case is different.
Data sources
Not being an employee at xarth.tv oops justin.tv oops twitch.tv, you have a very limited set of data about a user: no email addresses, no IP addresses, no access times (mostly). You do however have (as of 2023-03-25):
- Id
- Login
- Roles - affiliate, staff
- Autohost channels (now decommissioned)
- Profile, offline, banner images
- Updated and created dates
- Followers total count (others following this channel, that is)
- Profile view count
- All follows and notification settings on each follow (channel following others, that is)
- Preferred language
- Last broadcast title and time
- Chat name color
- Badges
- Description
- Spent more than 1 bit (imaginary Twitch currency for tax evasion)
- Channel panels
- Social media links
All of this can be resolved via the unofficial GQL API, which is superior to the official Helix API, especially considering that to fetch information about each follow with Helix you would have to fire a separate call for each user. Due to an incident where someone abused the GQL API, GQL introspection is no longer available. There were similar incidents before, but it took only one big enough to catch the eye of Twitch staff.
To search users like you would on the website, Twitch had a public Algolia API.
If you had a verified bot account back in the day, you were automatically granted access to the firehose (https://tmi.twitch.tv/firehose). If you knew some username or message patterns, the firehose allowed you to catch bots very early: they usually target small streamers first for a bigger reaction.

FILE PHOTO
Firehose messages filtered to known bots
Enumeration methods
There are quite a few things you can do with all this data, although most of them involve analyzing follows, since follows provide the most data. Mid to large streamers always have followers-only mode on in their chats, so you are required to follow them to write any messages in chat. This is true even nowadays, even though there are better ways to counter bots.
Follow spikes
Usually the first course of action is to identify a date range in which bots followed a channel. Very rarely would we encounter a botmaster who spread their bot follows over a longer period. For that, P&SL had Grafana boards of follow counts for monitored channels. Spike = bots. It is very noticeable. You don't see spikes like that from legitimate follows, even if the streamer is running some kind of event or a sponsorship that brings in large numbers of users.
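The "spike = bots" rule of thumb can be sketched in a few lines of Python. This is not the Grafana setup P&SL used, only a minimal illustration: it assumes you already have follow counts per fixed time bucket for one channel, and the function name, the factor and the floor are all made up for the example.

```python
from statistics import median

def find_follow_spikes(counts, factor=10.0, floor=5):
    """Return indexes of buckets whose follow count is far above the usual rate.

    counts: follows per time bucket (e.g. per hour), oldest first.
    factor: hypothetical multiplier over the baseline that counts as a spike.
    floor:  minimum baseline, so a quiet channel (median 0) does not flag every follow.
    """
    baseline = max(median(counts), floor)
    return [i for i, n in enumerate(counts) if n > baseline * factor]

hourly = [12, 9, 15, 11, 840, 910, 14, 10]
print(find_follow_spikes(hourly))  # buckets 4 and 5 stand out
```

The buckets it returns give you the date range to pull followers from.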
Discovering more channels from bots
If you take a look at the follow list of a bot, you will usually see multiple channels. If a botmaster followed one channel with only a limited set of his bots, you can fetch another date range from another channel to make your list more complete.
Import from chat
If you already have a channel that has been shat on, you can pull all participating users into a list and import it into the system for filtering.
Global fetch
Account IDs are sequential in Twitch, so you can just pull accounts globally if you know they are there. Naturally, this pulls a lot of unrelated users which makes it hard to filter through.
FILE PHOTO
You can see a global count graph on the left and graphs for each channel on the right.
Filtering
Lists in P&SL are kept in memory and saved to a file for persistence. There was no need to use psql for this at the time. Not sure if I mentioned this, but everything is controlled from an IRC channel, so every filter is a command. Every command adds one node to a filter tree, which is quite flexible - a node can branch out into two lists, be reverted or commented, and the common set operations are available too.
Sanity checks
Is the user banned on Twitch (you can still query banned accounts)? Is the user already in our 'bots' database? Both questions should be answered as soon as a list of users is pulled.
Username
Obvious patterns in usernames are very often encountered in the first (automated + automated) category of bots, but a filter on username is not a silver bullet. Regex, an English dictionary and entropy (rarely used) are utilized.
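To give an idea of what the regex and entropy checks can look like, here is a rough Python sketch. The pattern and the entropy threshold are invented for the example and are not the ones P&SL used; real patterns depend entirely on what the botmaster's generator produces.

```python
import math
import re
from collections import Counter

def shannon_entropy(name):
    """Bits of entropy per character, based on character frequencies in the name."""
    counts = Counter(name)
    total = len(name)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical pattern: lowercase letters, an optional underscore, then 4+ digits.
WORD_PLUS_DIGITS = re.compile(r"^[a-z]+_?\d{4,}$")

def looks_generated(name, entropy_threshold=3.5):
    """True if the name matches the pattern or looks like random characters."""
    name = name.lower()
    return bool(WORD_PLUS_DIGITS.match(name)) or shannon_entropy(name) > entropy_threshold

print(looks_generated("coolguy48213"))  # matches the word + digits pattern
```

Entropy on its own is weak for short names, which is probably part of why it was rarely used.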
Profile image
Twitch doesn't reencode uploaded images, so if you use the same image for all your bots, their file hashes will match. Also, if the botmaster applied the same effect to every profile image, you can apply that effect again to the images of all accounts pulled in a date range and see whether each image changes. For example, if the images are very JPEG-y, you can find the intensity of the JPEG compression by binary search and then recompress each image at that intensity. If an image was already compressed at that intensity, recompressing it will generally leave it unaffected.
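The exact-match half of this is simple enough to sketch in Python. The example assumes the image bytes have already been downloaded into a dict keyed by login (how you fetch them is out of scope), and the function name is hypothetical.

```python
import hashlib
from collections import defaultdict

def group_by_image_hash(images):
    """images: dict of login -> raw profile image bytes.

    Returns only the groups where more than one account shares the exact same file.
    """
    groups = defaultdict(list)
    for login, data in images.items():
        groups[hashlib.sha256(data).hexdigest()].append(login)
    return {digest: logins for digest, logins in groups.items() if len(logins) > 1}

sample = {"bot_a": b"\xff\xd8same-file", "bot_b": b"\xff\xd8same-file", "viewer": b"\x89PNG-other"}
print(list(group_by_image_hash(sample).values()))  # the two bots end up in one group
```

Accounts that never uploaded an image would presumably collide too, so this is only meaningful for custom images.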
Creation date
If you have a list of users who followed a channel at a similar time, their creation dates usually match as well for category 1 and 2 bots. If the botmaster has multiple batches cooked, you can also match on multiple ranges.
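One way to find those batches without eyeballing them is to sort by creation time and cut wherever the gap gets too large. A minimal Python sketch, where the gap and the minimum batch size are arbitrary values picked for the example:

```python
from datetime import datetime, timedelta

def creation_batches(created, max_gap=timedelta(minutes=10), min_size=3):
    """Group accounts into batches whose creation times are close together.

    created:  dict of login -> creation datetime.
    max_gap:  hypothetical maximum pause between two accounts of the same batch.
    min_size: batches smaller than this are dropped as coincidence.
    """
    ordered = sorted(created.items(), key=lambda item: item[1])
    batches, current = [], []
    for login, when in ordered:
        if current and when - current[-1][1] > max_gap:
            batches.append(current)
            current = []
        current.append((login, when))
    if current:
        batches.append(current)
    return [[login for login, _ in batch] for batch in batches if len(batch) >= min_size]
```

Each batch that comes back gives you one creation date range to match on.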
Chat color
Not a very good indicator, but if all bots have gray names (default), it's better than nothing.
Description
If you are in a Discord pretending to be a highly organized hacker group, filling each bot account's description with information about your group and a Discord join link probably looks like a good idea. For filtering, this works much like the profile image hash check: identical descriptions group the accounts together.
Follows
All other filters are based on follows, so you need to fetch all follows of all users in a list, which can take quite some time. With less efficient methods, a single bot list could take up to 24 hours to fill.
Follow pool analysis
For lack of a better name, this very often used filter takes each channel a user follows, counts how many users in the list follow that channel, and sums those counts up. It then divides the sum by the user's follow count and by the number of users in the list. The result is a number in the range 0-1 that essentially tells how coordinated the user's follows are with the rest of the list. This works perfectly for botmasters who have a fixed list of target streamers to hit.
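Written out in Python, the score could look roughly like this. It is a reconstruction from the description above, not the actual P&SL code, and it assumes the follows of every user in the list have already been fetched into a dict.

```python
from collections import Counter

def follow_pool_scores(follows):
    """follows: dict of login -> set of channel ids that login follows.

    For each user: sum, over their followed channels, how many users in the
    list follow that channel, then divide by (their follow count * list size).
    """
    n_users = len(follows)
    popularity = Counter(ch for channels in follows.values() for ch in channels)
    scores = {}
    for login, channels in follows.items():
        if not channels:
            scores[login] = 0.0  # nothing followed, nothing to compare
            continue
        overlap = sum(popularity[ch] for ch in channels)
        scores[login] = overlap / (len(channels) * n_users)
    return scores

pool = {
    "bot1": {"a", "b", "c"},
    "bot2": {"a", "b", "c"},
    "bot3": {"a", "b", "c"},
    "human": {"a", "x", "y", "z"},
}
print(follow_pool_scores(pool))  # the bots score higher than the human
```

In this toy list the three bots land at 10/12 and the human at 7/16, so a cutoff somewhere in between separates them.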
There are improvements that can be made to this. Put yourself in the botmaster's shoes: your follow program with a fixed list is probably following channels from start to finish, unrandomized. So, allowing for some errors (network latency and request failures mean the order will not be 100% identical across all users), you could assign a higher score when follows appear in the same order across many users.
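This improvement was never part of the filter described here, so the following Python is only one possible way to do it: compare the follow order of two accounts with a longest common subsequence, which tolerates a skipped channel here and there. Both function names are made up.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def order_similarity(follows_a, follows_b):
    """follows_*: channel ids in the order the account followed them.

    Only channels both accounts follow are compared; 1.0 means those shared
    channels were followed in exactly the same order.
    """
    shared = set(follows_a) & set(follows_b)
    if not shared:
        return 0.0
    a = [ch for ch in follows_a if ch in shared]
    b = [ch for ch in follows_b if ch in shared]
    return lcs_length(a, b) / len(shared)

# The second bot missed channel "c" (say, a failed request) but kept the order.
print(order_similarity(["a", "b", "c", "d"], ["a", "b", "d"]))
```

Averaging this against a handful of reference bots would give a per-user order score to combine with the pool score.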
Date of specific follow
Looking at this in a more primitive way, you could just choose one channel a bot has followed, and then check if the date of follow falls within some date range. This is essentially what we are doing when we pull users from a follow spike, but multiple times for different channels.
Follows count
With a fixed list of target streamers, follow count will be around the same for all bots.
Filter tree
Now, the filtering methods mentioned above are simplified; in total there are about 60 fully functioning filters. If we boot up the latest snapshot of P&SL and take a look at some bot lists, we can see instances where bots were caught quickly and easily because the users' properties matched exactly with high confidence - the fetching interval is about 3 minutes, and a follow pool score of >0.55 is exceptionally good.
Others, not so much.
It might look like the follower fetch was done 43 times, but that is an artifact of how branching works.
Names for lists in set operations are operator-defined and arbitrary. Any number in brackets beside a name refers to the list's node history. The general idea with this bot list is to filter what fits, then branch the skipped entries out into a different list, filter what fits there, and merge everything at the end. Basically, divide & conquer.
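The branch-filter-merge idea can be shown with a toy version of such a tree in Python. The class and its method names are invented for this illustration and are far simpler than the real thing, where every step is an IRC command.

```python
class Node:
    """One step in a filter tree; remembers its parent so a step can be reverted."""

    def __init__(self, users, label, parent=None):
        self.users = frozenset(users)
        self.label = label
        self.parent = parent

    def branch(self, label, predicate):
        """Split into (matched, skipped) child nodes."""
        matched = {u for u in self.users if predicate(u)}
        return (Node(matched, label, self),
                Node(self.users - matched, "not " + label, self))

    def union(self, other, label="union"):
        """Merge two lists back together."""
        return Node(self.users | other.users, label, self)

    def revert(self):
        """Go back one step (the root reverts to itself)."""
        return self.parent if self.parent is not None else self

root = Node({"bot_0001", "bot_0002", "xqzvkw", "regular_viewer"}, "pulled from follow spike")
by_name, rest = root.branch("name starts with bot_", lambda u: u.startswith("bot_"))
by_len, humans = rest.branch("six letter name", lambda u: len(u) == 6)
bots = by_name.union(by_len, "merged")
print(sorted(bots.users))
```

Filter what fits, branch the leftovers, filter again, merge: the same shape as the bot list above, just with two made-up filters.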
Getting rid of bot users
Once you have high confidence that your list is free of false-positives, you can "publish" the list and distribute it to other bots and P&SL itself to ban the bots in P&SL-subscribed channels.
As a verified bot, you have elevated rate limits. Back in the day, that would be 250 messages per second globally if you're a mod in the channel. Everything else is pretty much unfeasible at a grand scale. You would probably be fine with the 30/s standard mod rate limits for one channel, but that's it.
One little detail that makes this a bit harder is that you must load balance your messages across multiple TCP connections or you will get axed, and for some reason it is not so easy to automatically reconnect to a TCP destination cleanly. This gets more complicated if you are load balancing multiple channels per TCP connection with both anon (read-only) and mod authentication, while ensuring that at least one read connection stays active, because of course you don't want to join a channel anonymously on a separate connection if you already have message ingress from a mod-authenticated one.
Now, you have two choices:
- The Sheriff. Fire bans at 250 messages/s, multiplexing them across multiple channels, which gives the illusion that the bot is banning slowly (in any one channel) and disrupts the chat less, because of course all bans are announced globally for everyone with no option to turn it off.
- The Deputy. Fire bans at 250 messages/s into a single channel, completely wiping out any constructive discussion and giving way to mindless spam.
The Double Down in Candy Town creates a program through a Deep Web Friend (not Aiden)... It's really just some other bot that is using a published P&SL list
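The multiplexing in the first mode boils down to a round-robin over per-channel ban queues. A small Python sketch of just that ordering, leaving out the actual sending, pacing and connection handling; the function name is hypothetical:

```python
from itertools import zip_longest

def interleave_bans(bans_by_channel):
    """bans_by_channel: dict of channel -> list of logins to ban there.

    Yields (channel, login) pairs round-robin across channels, so consecutive
    bans land in different channels instead of flooding one of them.
    """
    queues = [[(ch, login) for login in logins] for ch, logins in bans_by_channel.items()]
    for round_ in zip_longest(*queues):
        for item in round_:
            if item is not None:  # this channel's queue is already drained
                yield item

plan = {"channel_a": ["bot1", "bot2", "bot3"], "channel_b": ["bot1"]}
print(list(interleave_bans(plan)))
```

With N channels in rotation, each one sees roughly 1/N of the global rate, which is where the "banning slowly" illusion comes from.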
But I never saw or heard of any bans during the time!
Remember the disruption part? Turns out that Twitch TMI had a bug for a long time that let you post commands into a channel and have them land in the database successfully, but without any delete/ban announcements in the channel. So actually, there is no real difference between the two rapid-fire ban modes at all. All you had to do is, instead of sending
PRIVMSG #channel .ban target reason
you send
PRIVMSG channel .ban target reason
to make it end up in a parallel universe. There are no integer overflows here unfortunately, just conflicting parsing between Twitch services. Mods still get the announcement through PubSub, but regular users don't see it.
Encrypted reason
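As an entity roleplaying as secret, you don't want other users, moderators or the broadcaster to know why the user was banned. Luckily, you can put an encrypted reference string into the ban reason, to later decrypt and correlate with the database. Here is an encoding algorithm I found on the internet at the time (modified) that is efficient and produces output in an acceptable format for a ban reason. It writes 7 bits of input per output byte and folds the few characters that are not allowed (null, tab, newline, carriage return, space, double quote) into two-byte UTF-8 sequences.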
static readonly byte[] IllegalChars = new[] {
    (byte)0,  // null
    (byte)9,  // tab
    (byte)10, // newline
    (byte)13, // carriage return
    (byte)32, // space
    (byte)34  // double quote
};

// Packs rawData into 7-bit chunks and returns the number of bytes written to encoded.
static int Encode(Span<byte> encoded, ReadOnlySpan<byte> rawData)
{
    int curIndex = 0;
    int curBit = 0;
    int encodedCursor = 0;

    // Reads the next 7 bits of input, which may span two input bytes.
    byte get7(ReadOnlySpan<byte> rawData)
    {
        var firstByte = rawData[curIndex];
        byte firstPart = (byte)((((0b11111110 >> curBit) & firstByte) << curBit) >> 1);
        curBit += 7;
        if (curBit < 8)
            return firstPart;
        curBit -= 8;
        curIndex++;
        if (curIndex >= rawData.Length)
            return firstPart;
        var secondByte = rawData[curIndex];
        var secondPart = ((0xFF00 >> curBit) & secondByte & 0xFF) >> (8 - curBit);
        return (byte)(firstPart | secondPart);
    }

    while (true) {
        if (curIndex >= rawData.Length)
            break;
        var bits = get7(rawData);
        int illegalIndex;
        if (bits <= 34 && (illegalIndex = Array.IndexOf(IllegalChars, bits)) != -1) {
            // Illegal character: emit a two-byte UTF-8 sequence that carries
            // the index of the illegal character plus the next 7 bits.
            byte b1, nextBits;
            if (curIndex >= rawData.Length) {
                // No input left: index 0b111 marks that only these 7 bits are stored.
                b1 = 0b11011110;
                nextBits = bits;
            } else {
                nextBits = get7(rawData);
                b1 = (byte)(0b11000010 | (0b111 & illegalIndex) << 2);
            }
            encoded[encodedCursor++] = (byte)(b1 | ((nextBits & 0b01000000) >> 6));
            encoded[encodedCursor++] = (byte)(0b10000000 | (nextBits & 0b00111111));
        } else {
            encoded[encodedCursor++] = bits;
        }
    }
    return encodedCursor;
}

// In is the ban reason as a string, so each two-byte UTF-8 sequence arrives as one char.
// Returns the number of bytes written to decoded.
static int Decode(Span<byte> decoded, string In)
{
    int decodedCursor = 0;
    byte curByte = 0;
    int bitOfByte = 0;

    // Appends 7 bits to the output, flushing a byte whenever 8 bits are collected.
    void push7(Span<byte> decoded, byte byt)
    {
        byt <<= 1;
        curByte |= (byte)(byt >> bitOfByte);
        bitOfByte += 7;
        if (bitOfByte >= 8) {
            decoded[decodedCursor++] = curByte;
            bitOfByte -= 8;
            curByte = (byte)((byt << (7 - bitOfByte)) & 255);
        }
    }

    for (var i = 0; i < In.Length; i++) {
        var c = In[i];
        if (c > 0x7F) {
            // Escaped pair: bits 8-10 hold the illegal character index, the low 7 bits the payload.
            var illegalIndex = (c >> 8) & 0b111;
            if (illegalIndex != 0b111)
                push7(decoded, IllegalChars[illegalIndex]);
            push7(decoded, (byte)(c & 0x7F));
        } else {
            push7(decoded, (byte)c);
        }
    }
    return decodedCursor;
}
Final words
There is much more that I could write on, like plans on processing incoming media from youtube or twitch clips to scan for taken dumps by video or audio recognition, the custom SSH terminal shell and pop quiz inside of it you have to go through in order to unlist someone from CCB, the shady ass domain for 3rd party bots to fetch the bot lists from with 40 second token expiry to discourage other chatters from clicking on the link, or intricacies of querying follows of multiple users in one GQL query real fast, but uhhhhhh what was the question?
Counting false positives, P&SL has 132 thousand CCB matches and 17.12M bot users (16.65M if excluding potentially loose lists) in 966 bot lists. In total, it has observed 97M followers in 52M unique users.
contact if you were involved and you want a shout out