Keeping it Clean: Creating a Profanity Filter with Flex
I was recently tasked with writing a profanity filter for the chat module of an AIR application. I did some research and alas, there were no Flex examples to be found. I thought I’d share my implementation with you.
The filter needed to replace naughty words with asterisks, so profanities such as ‘f— you’ would appear as ‘**** you’. The filter also had to support localized word ‘blacklists’.
I used two key features of Flex to help me: regular expressions and resource bundles. Regular expressions search for naughty words within the input string, and resource bundles store the localized blacklists.
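For orientation, here is a rough sketch of the scaffolding the functions in this post rely on. It is an assumption on my part rather than the original source: the class name ChatFilter, the file path, and the sample words are hypothetical, while the bundle name "resources" and the key "chat_Blacklist" are the ones the filter reads.

```actionscript
// Hypothetical contents of locale/en_US/resources.properties:
//   chat_Blacklist=darn,heck,bleep
package {
    import flash.utils.Dictionary;
    import mx.resources.ResourceManager;
    import mx.utils.StringUtil;

    // Asks the compiler to link the "resources" bundle into the application.
    [ResourceBundle("resources")]
    public class ChatFilter {
        // Blacklist for the current locale, loaded lazily on first use.
        private static var chatBlackList:Array;

        // Cache of word -> replacement string.
        private static var replWordList:Dictionary = new Dictionary();

        // cleanseChatText() and createReplacementWord() would live here.
    }
}
```

Each locale would get its own copy of the properties file with its own word list, so the filter code itself never changes.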
Here’s a look at the core algorithm:
public static function cleanseChatText(inputString:String):String {
    if (chatBlackList == null) {
        chatBlackList = ResourceManager.getInstance().getStringArray("resources", "chat_Blacklist");
    }
    for each (var word:String in chatBlackList) {
        var replStr:String = createReplacementWord(word);
        // check if string is a naughty word
        var regex:RegExp = new RegExp("^" + word + "$", "gism");
        inputString = inputString.replace(regex, replStr);
        // check if string starts with naughty word
        regex = new RegExp("^" + word + "(\\W)", "gism");
        inputString = inputString.replace(regex, replStr + "$1");
        // check if string ends with naughty word
        regex = new RegExp("(\\W)" + word + "$", "gism");
        inputString = inputString.replace(regex, "$1" + replStr);
        // check if naughty word is in string
        regex = new RegExp("(\\W)" + word + "(\\W)", "gism");
        inputString = inputString.replace(regex, "$1" + replStr + "$2");
        // or other words start with naughty word (ignore short stuff)
        if (word.length > 3) {
            regex = new RegExp("(\\W)" + word, "gism");
            inputString = inputString.replace(regex, "$1" + replStr);
            regex = new RegExp("^" + word, "gism");
            inputString = inputString.replace(regex, replStr);
        }
    }
    return StringUtil.trim(inputString);
}
So what’s going on? First, we get the localized blacklist (if we haven’t already cached it). I’m using the getStringArray() function of the ResourceManager; this returns an array of strings from a comma-delimited resource entry. After that, we loop through the word list and look for words to replace, using several regular expressions to match the word at various positions in the input string.
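As an aside, the first four patterns in the loop (whole string, start, end, and middle) can often be collapsed into one by using the zero-width word boundary \b. The sketch below is an alternative under stated assumptions, not the approach used in this post: the class name WordBoundaryFilter and the function maskWholeWord are hypothetical, and it assumes every blacklist word starts and ends with a letter or digit.

```actionscript
package {
    public class WordBoundaryFilter {
        // Replaces every whole-word occurrence of `word` in `input` with `mask`.
        public static function maskWholeWord(input:String, word:String, mask:String):String {
            // \b matches the position between a word character and a
            // non-word character (or the start/end of the string). It
            // consumes no characters, so no capture groups are needed to
            // preserve the surrounding punctuation.
            var regex:RegExp = new RegExp("\\b" + word + "\\b", "gi");
            return input.replace(regex, mask);
        }
    }
}

// WordBoundaryFilter.maskWholeWord("bleep, what the bleep!", "bleep", "*****")
// returns "*****, what the *****!"
```

Because \b consumes nothing, two occurrences separated by a single space are both caught in one pass. This version does not cover the "other words start with naughty word" case, which would still need a pattern without the trailing \b. Either way the word is concatenated into the pattern unescaped, so blacklist entries are assumed to contain no regular expression metacharacters.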
You’ll notice there are “$1” and “$2” in the replacement strings. What are those? They’re ActionScript’s syntax for referring to captured groups. For example, the first “(\\W)” in a pattern (\W matches any non-word character, i.e. anything other than a-z, A-Z, 0-9, or underscore) is represented by “$1” in the replacement string. So the string “what-the-bleep” is replaced with “what-the-*****”, and the non-word characters are retained in the replacement string.
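To see the captured groups in isolation, here is a small sketch with the word hard-coded. The class name BackReferenceDemo and the sample sentence are made up for illustration.

```actionscript
package {
    public class BackReferenceDemo {
        public static function run():String {
            // Each (\W) captures the separator on one side of the word:
            // group 1 is the character before it, group 2 the one after.
            var regex:RegExp = new RegExp("(\\W)bleep(\\W)", "gi");

            // $1 and $2 put the captured separators back around the mask,
            // so the result is "what-the-*****-was-that".
            return "what-the-bleep-was-that".replace(regex, "$1*****$2");
        }
    }
}
```

Without $1 and $2 in the replacement, the hyphens on either side of the word would be swallowed along with it.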
There’s one more handy function, which generates the replacement strings. I use a Dictionary object to cache the replacement strings so I am not constantly rebuilding them. If you don’t want to asterisk out the words, you could modify this function to create a custom replacement word.
private static function createReplacementWord(word:String):String {
    var replStr:String = replWordList[word];
    if (replStr == null) {
        replStr = "";
        for (var i:int = 0; i < word.length; i++) {
            replStr += "*";
        }
        replWordList[word] = replStr;
    }
    return replStr;
}
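As one illustration of the customization mentioned above, here is a hedged sketch of a variant that keeps the first letter of each word and masks the rest. The class name ReplacementWords and the Dictionary declaration are assumptions added so the example stands alone.

```actionscript
package {
    import flash.utils.Dictionary;

    public class ReplacementWords {
        // Cache of word -> replacement string, as in the original function.
        private static var replWordList:Dictionary = new Dictionary();

        public static function createReplacementWord(word:String):String {
            var replStr:String = replWordList[word];
            if (replStr == null) {
                // Keep the first character and mask the remainder.
                replStr = word.charAt(0);
                for (var i:int = 1; i < word.length; i++) {
                    replStr += "*";
                }
                replWordList[word] = replStr;
            }
            return replStr;
        }
    }
}

// ReplacementWords.createReplacementWord("bleep") returns "b****"
```

The replacement stays the same length as the original word, so the filtered text keeps its layout either way.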
So that’s it! Now you have the tools to ‘keep it clean’ using Flex.