What's the easiest way to filter a string for offensive language?

Hi,

Just wondering what the best / easiest way to filter a string for offensive language is. I have some text that I want to filter based on a list of keywords.

Thanks

Simon




  • Patrick.I

    Hi,

    I have tried this. I don't think it's best practice, but it works:

    using System;

    class Program
    {
        static void Main(string[] args)
        {
            string[] badwords = { "badword1", "badword2", "badword3", "bad4" };

            // ReadLine returns null at end of input, so fall back to an empty string.
            string cleanText = Console.ReadLine() ?? "";

            // Replace on the running result so that every bad word is masked,
            // not only the last one that matched.
            foreach (string badword in badwords)
            {
                // Mask the word with one asterisk per character.
                cleanText = cleanText.Replace(badword, new string('*', badword.Length));
            }

            Console.WriteLine(cleanText);
            Console.ReadLine();
        }
    }

    Hope this helps.



  • Christophe Kung

    Hello again,

    I had a look, but string does not have a method called RemoveAll. However, I did find Regex.Replace (in System.Text.RegularExpressions), which does a similar thing to what you mentioned. So the syntax would be:

    Str = Regex.Replace(Str,WORD,ALTERNATIVE);

    Call it in a loop if you have a list of words.

    Cheers


  • Chellam

    You can simply use String.RemoveAll(word) (I'm not 100% sure of that syntax; it might just be Remove(word)).

    If you start with your string STR and your list of words WORDS[], then repeatedly calling

    STR = STR.RemoveAll(WORDS[index])

    as you step through the words will remove the offending terms. You could also have a secondary list of words that you use to replace offending terms, such as f*ck, fyck, or firetruck. Then it would be a case of

    STR = STR.ReplaceAll(WORDS[index], ALTERNATIVES[index]).

    Hope this helps.
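For reference, .NET's string type has no RemoveAll or ReplaceAll method; String.Replace(oldValue, newValue) already replaces every occurrence, and passing "" as the new value removes the word. Below is a minimal sketch of the word-to-alternative idea under that assumption. The class name WordSwapper, the method Swap, and the word pairs are made up for illustration. Note that String.Replace is case-sensitive and matches inside longer words too.

```csharp
using System;
using System.Collections.Generic;

static class WordSwapper
{
    // Hypothetical word list: each offending term maps to its replacement.
    static readonly Dictionary<string, string> Alternatives = new Dictionary<string, string>
    {
        { "badword1", "firetruck" },
        { "badword2", "fudge" },
    };

    public static string Swap(string text)
    {
        foreach (KeyValuePair<string, string> pair in Alternatives)
        {
            // String.Replace returns a new string with every occurrence replaced.
            text = text.Replace(pair.Key, pair.Value);
        }
        return text;
    }

    static void Main()
    {
        Console.WriteLine(Swap("badword1 and badword2 and badword1"));
        // prints "firetruck and fudge and firetruck"
    }
}
```

A dictionary keeps each word next to its alternative, which avoids keeping two parallel arrays in sync by index.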



  • Alex Lerner

    I checked again, and string also has a Replace method. I'm not sure what the difference is between Regex.Replace and String.Replace; I guess you could use either one.

    Cheers
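On the difference between the two: String.Replace treats the search text literally, while Regex.Replace treats it as a regular expression pattern. The sketch below shows one practical consequence for word filtering. The class and method names (ReplaceComparison, ReplaceLiteral, ReplaceWholeWord) and the sample sentence are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceComparison
{
    // String.Replace treats the word as literal text and matches it anywhere.
    public static string ReplaceLiteral(string text, string word, string mask)
    {
        return text.Replace(word, mask);
    }

    // Regex.Replace treats its second argument as a pattern. \b anchors the match
    // to word boundaries; Regex.Escape keeps characters such as "." literal.
    public static string ReplaceWholeWord(string text, string word, string mask)
    {
        return Regex.Replace(text, @"\b" + Regex.Escape(word) + @"\b", mask);
    }

    static void Main()
    {
        string text = "That bad shot ruined our badminton game.";

        Console.WriteLine(ReplaceLiteral(text, "bad", "***"));
        // prints "That *** shot ruined our ***minton game."

        Console.WriteLine(ReplaceWholeWord(text, "bad", "***"));
        // prints "That *** shot ruined our badminton game."
    }
}
```

So either one works for a plain substring swap, but only the regex version can restrict the match to whole words.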


  • rhenders

    Thanks Aaron, this sounds like exactly what I needed. I will definitely give it a try.
  • Lisa Slater Nicholls

    Thanks footballism, I'll give it a try. Why do you say Regex is the best way to go?
  • EvelynR

    I firmly believe that using regular expressions is the best way to do this. I wrote a simple language filter ages ago; you can download it from here.
    PS: If you have any comments and suggestions, please let me know.

    Sheva


  • Naga Satish Rupenaguntla

    Because finding a word in a string with procedural code is fairly difficult and clumsy: you first have to find the word boundaries, and then find the word itself. With regex, you can achieve the same thing with this simple pattern: \b(badword)\b. It finds badword very easily, and you can replace it with any decent word you want using the Regex.Replace() method.
    Another factor is that a regex can be compiled into IL, which can give you some extra performance benefit.

    Sheva
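As a rough sketch of the approach described above (not the downloadable filter mentioned earlier in the thread, which is not shown here), the word list can be joined into a single \b(?:...)\b pattern and compiled once. The class name LanguageFilter, the method Censor, and the choice to mask with asterisks and ignore case are assumptions; the word list is the placeholder one from the first answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LanguageFilter
{
    // Placeholder word list, borrowed from the first answer in this thread.
    static readonly string[] BadWords = { "badword1", "badword2", "badword3", "bad4" };

    // One pattern of the form \b(?:word1|word2|...)\b, built once and reused.
    // Regex.Escape keeps any regex metacharacters in the words literal.
    // RegexOptions.Compiled compiles the pattern to IL: slower to construct,
    // faster to run repeatedly.
    static readonly Regex Pattern = new Regex(
        @"\b(?:" + string.Join("|", BadWords.Select(w => Regex.Escape(w))) + @")\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Censor(string text)
    {
        // Replace each whole-word match with asterisks of the same length.
        return Pattern.Replace(text, m => new string('*', m.Value.Length));
    }

    static void Main()
    {
        Console.WriteLine(Censor("BadWord1 is masked, but notbad4ever is left alone."));
        // prints "******** is masked, but notbad4ever is left alone."
    }
}
```

A single alternation pattern scans the text once instead of once per word, and the word boundaries avoid masking parts of longer, harmless words.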

