Keeping it Clean: Creating a Profanity Filter with Flex


I was recently tasked with writing a profanity filter for the chat module of an AIR application. I did some research and, alas, found no Flex examples, so I thought I’d share my implementation with you.

The filter needed to replace naughty words with asterisks, so that a profanity such as ‘f— you’ would appear as ‘**** you’. It also needed to support localized word ‘blacklists’.

I relied on two key features of Flex: regular expressions, to search for naughty words within the input string, and resource bundles, to store the localized blacklists.
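To make the resource-bundle side concrete, here is a minimal sketch in TypeScript (standing in for ActionScript, which shares its ECMAScript syntax). It assumes each locale’s bundle holds a comma-delimited chat_Blacklist entry. The `bundles` table, its words, and the `parseBlacklist`/`getBlacklist` helpers are hypothetical, and the fallback to a default locale is an assumption made for illustration, not a description of Flex’s ResourceManager.

```typescript
// Hypothetical per-locale "bundles": each holds a comma-delimited
// chat_Blacklist entry, as it might appear in a .properties resource file.
const bundles: Record<string, Record<string, string>> = {
  en_US: { chat_Blacklist: "darn, heck , bleep" },
  fr_FR: { chat_Blacklist: "zut, mince" },
};

// Split a comma-delimited entry into words, trimming whitespace around
// each item and dropping empty items.
function parseBlacklist(entry: string): string[] {
  return entry
    .split(",")
    .map((w) => w.trim())
    .filter((w) => w.length > 0);
}

// Look up the blacklist for a locale, falling back to a default locale
// when the requested one has no entry (an assumed policy).
function getBlacklist(locale: string, fallback = "en_US"): string[] {
  const entry =
    bundles[locale]?.chat_Blacklist ?? bundles[fallback]?.chat_Blacklist ?? "";
  return parseBlacklist(entry);
}

console.log(getBlacklist("fr_FR")); // [ 'zut', 'mince' ]
console.log(getBlacklist("de_DE")); // [ 'darn', 'heck', 'bleep' ]
```

Trimming each item matters because entries are usually written as “word1, word2”; without it, “ heck ” would never match a chat message.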

Here’s a look at the core algorithm:

```actionscript
// Imports for the class this function lives in (e.g. ProfanityFilter.as)
import mx.resources.ResourceManager;
import mx.utils.StringUtil;

// Localized blacklist, loaded lazily and cached
private static var chatBlackList:Array;

public static function cleanseChatText(inputString:String):String {
    if (chatBlackList == null) {
        chatBlackList = ResourceManager.getInstance().getStringArray("resources", "chat_Blacklist");
    }
    for each (var word:String in chatBlackList) {
        var replStr:String = createReplacementWord(word);
        // escape regex metacharacters so a word such as "a$$" is matched literally
        var pattern:String = word.replace(/[.*+?^${}()|\[\]\\]/g, "\\$&");

        // check if string is a naughty word
        var regex:RegExp = new RegExp("^" + pattern + "$", "gism");
        inputString = inputString.replace(regex, replStr);

        // check if string starts with naughty word
        regex = new RegExp("^" + pattern + "(\\W)", "gism");
        inputString = inputString.replace(regex, replStr + "$1");

        // check if string ends with naughty word
        regex = new RegExp("(\\W)" + pattern + "$", "gism");
        inputString = inputString.replace(regex, "$1" + replStr);

        // check if naughty word is in string; each match consumes its trailing
        // separator, so "x bad bad bad y" needs a second pass for the middle word
        regex = new RegExp("(\\W)" + pattern + "(\\W)", "gism");
        inputString = inputString.replace(regex, "$1" + replStr + "$2");
        inputString = inputString.replace(regex, "$1" + replStr + "$2");

        // or other words start with naughty word (ignore short stuff)
        if (word.length > 3) {
            regex = new RegExp("(\\W)" + pattern, "gism");
            inputString = inputString.replace(regex, "$1" + replStr);
            regex = new RegExp("^" + pattern, "gism");
            inputString = inputString.replace(regex, replStr);
        }
    }
    return StringUtil.trim(inputString);
}
```

So what’s going on? First, we get the localized blacklist (unless we have already cached it). I’m using the getStringArray() function of the ResourceManager, which splits a comma-delimited resource entry into an array of strings and trims the whitespace around each item. After that, we loop through the word list and, for each word, use several regular expressions to find it at different positions in the input string: as the entire string, at the start, at the end, between two non-word characters and, for words longer than three characters, at the start of a longer word.
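For readers who want to experiment outside Flex, the loop above can be sketched as a TypeScript port. This is a minimal sketch under stated assumptions, not the original code: the blacklist is passed in as a parameter instead of coming from ResourceManager, String.prototype.trim() replaces StringUtil.trim(), and it adds two small deviations, escaping regex metacharacters in each word and running the “between two non-word characters” replacement twice so back-to-back occurrences are all caught.

```typescript
// Escape regex metacharacters so each blacklisted word is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Cache of word -> replacement, mirroring createReplacementWord().
const replacements = new Map<string, string>();

function createReplacementWord(word: string): string {
  let repl = replacements.get(word);
  if (repl === undefined) {
    repl = "*".repeat(word.length);
    replacements.set(word, repl);
  }
  return repl;
}

// Port of cleanseChatText(); the blacklist is a parameter (assumption).
function cleanseChatText(input: string, blacklist: string[]): string {
  for (const word of blacklist) {
    const repl = createReplacementWord(word);
    const w = escapeRegExp(word);
    // the whole string is the naughty word
    input = input.replace(new RegExp(`^${w}$`, "gim"), repl);
    // the string starts with it
    input = input.replace(new RegExp(`^${w}(\\W)`, "gim"), repl + "$1");
    // the string ends with it
    input = input.replace(new RegExp(`(\\W)${w}$`, "gim"), "$1" + repl);
    // it sits between two non-word characters; a match consumes its trailing
    // separator, so a second pass catches "bad" in "x bad bad bad y"
    const mid = new RegExp(`(\\W)${w}(\\W)`, "gim");
    input = input.replace(mid, "$1" + repl + "$2");
    input = input.replace(mid, "$1" + repl + "$2");
    // longer words: also mask words that merely start with the naughty word
    if (word.length > 3) {
      input = input.replace(new RegExp(`(\\W)${w}`, "gim"), "$1" + repl);
      input = input.replace(new RegExp(`^${w}`, "gim"), repl);
    }
  }
  return input.trim();
}

console.log(cleanseChatText("x bad bad bad y", ["bad"])); // x *** *** *** y
console.log(cleanseChatText("badge", ["bad"])); // badge
console.log(cleanseChatText("DARNit", ["darn"])); // ****it
```

The last two calls show the effect of the length check: a three-letter word only matches on its own, while a longer word also masks words that begin with it.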

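You’ll notice “$1” and “$2” in some of the replacement strings. What are those? They’re the standard ActionScript (ECMAScript) String.replace() syntax for inserting captured groups: “$1” is replaced by whatever the first parenthesized group in the pattern matched, “$2” by the second, and so on. For example, “(\\W)” matches any single non-word character (anything other than a-z, A-Z, 0-9 or the underscore), and the character it matched is re-inserted wherever “$1” appears in the replacement string. So the string “what-the-bleep” is replaced with “what-the-*****”, and the non-word characters are retained.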
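The effect of a captured group is easiest to see side by side. This short TypeScript sketch (JavaScript’s String.replace() uses the same $1/$2 replacement syntax as ActionScript) contrasts a pattern that captures the separator with one that does not; “bleep” is just a placeholder word.

```typescript
// "$1" in the replacement re-inserts whatever the first (\W) group matched,
// so the separator in front of the word survives the substitution.
const sample = "what-the-bleep";
const cleaned = sample.replace(/(\W)bleep/gi, "$1*****");
console.log(cleaned); // what-the-*****

// Without the group, the separator is replaced along with the word.
const lossy = sample.replace(/\Wbleep/gi, "*****");
console.log(lossy); // what-the*****

// With two groups, "$2" also restores the character after the word.
const both = "say bleep, ok".replace(/(\W)bleep(\W)/gi, "$1*****$2");
console.log(both); // say *****, ok
```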

There’s one more handy function, which generates the replacement strings. It uses a Dictionary object to cache them, so they aren’t rebuilt over and over. If you don’t want to asterisk out the words, you could modify this function to create a custom replacement word.


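```actionscript
import flash.utils.Dictionary;

// Cache of word -> replacement string, so each one is built only once
private static var replWordList:Dictionary = new Dictionary();

private static function createReplacementWord(word:String):String {
    var replStr:String = replWordList[word];
    if (replStr == null) {
        // one asterisk per character of the naughty word
        replStr = "";
        for (var i:int = 0; i < word.length; i++) {
            replStr += "*";
        }
        replWordList[word] = replStr;
    }
    return replStr;
}
```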
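One way to follow the suggestion of a custom replacement word while keeping the cache is to make the replacement rule pluggable. This is a hypothetical TypeScript sketch: `Strategy`, `makeReplacer` and the `keepFirstLetter` rule are illustrative names, not part of the original code; a Map plays the role of the Dictionary.

```typescript
// A replacement strategy maps a blacklisted word to the text shown instead.
type Strategy = (word: string) => string;

// Default: one asterisk per character, like createReplacementWord().
const asterisks: Strategy = (word) => "*".repeat(word.length);

// Hypothetical alternative: keep the first letter, mask the rest.
const keepFirstLetter: Strategy = (word) =>
  word.slice(0, 1) + "*".repeat(Math.max(0, word.length - 1));

// Build a cached replacement function for a given strategy, so each
// word's replacement is computed only once.
function makeReplacer(strategy: Strategy): (word: string) => string {
  const cache = new Map<string, string>();
  return (word) => {
    let repl = cache.get(word);
    if (repl === undefined) {
      repl = strategy(word);
      cache.set(word, repl);
    }
    return repl;
  };
}

const mask = makeReplacer(asterisks);
const hint = makeReplacer(keepFirstLetter);
console.log(mask("bleep")); // *****
console.log(hint("bleep")); // b****
```

If a custom replacement can contain a dollar sign, write it as “$$” before passing it to replace(), because “$” introduces group references in replacement strings.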

So that’s it! Now you have the tools to ‘keep it clean’ using Flex.

2 Comments

1. kanu said:

    Can you please add all the files related to the post, as well as the list of profanities?

  2. Doug Kadlecek said:

    Hi Kanu,
    Unfortunately, I cannot post the exact source files, as they were created for a client. That being said, if you put the two functions listed above in an AS class (e.g. ProfanityFilter.as), you should be good to go (you’ll also need one or more resource bundles with the entry: chat_Blacklist=word1, word2, …).
    You can find lists of profane words on the internet. Here’s a link to a good source list: http://www.jivesoftware.com/jivespace/docs/DOC-1906
    Good luck!
    d
