Slow regexp for first time match

Hello,

I have an application with about 3000 regexps, each one compiled like this:

regexp = new Regex(expression, RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

When I try to match a string against the 3000 regexps for the first time, it takes about 2 minutes on a big machine!

The second time, the match is much faster and takes only a few milliseconds.

What is going on?




  • sabitha

    OK, I have removed RegexOptions.Compiled, and you know what?

    I am happy.

    Many thanks.


  • Anna Ahn

    Many thanks to everybody.

    As I understand it, RegexOptions.Compiled is the fastest option I can use.

    That is what I do, but the regexp is not compiled in the Regex constructor; it is only compiled

    on the very first match, and I want a good response time on that first match.

    The solution I found is to make a fake match immediately after building the regexp.

    The code looks like this:

    regexp = new Regex(expression, RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    regexp.Match("###"); // forces the real compilation; very slow for 3000 expressions

    Does anybody know whether the PCRE library is usable from C#?


  • _mubashir

    Again,

    I tried this:

    regexp = new Regex(expression, /* RegexOptions.Compiled | */RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    //regexp.Match("###"); // forces the real compilation; very slow for 3000 expressions

    Removing the Compiled option makes it faster.

    I don't know why this is better than with the compiled expressions,

    but construction is faster and matching is not that slow.


  • Alexander Mossin

    I tried the following benchmark:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    namespace TestRegexp
    {
        class Program
        {
            static void Main(string[] args)
            {
                // First run: interpreted regexps. Second run: RegexOptions.Compiled.
                Test(RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
                Test(RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
                Console.ReadLine();
            }

            private static void Test(RegexOptions opts)
            {
                List<Regex> regexps = new List<Regex>();
                TimeSpan compileTime = new TimeSpan();
                TimeSpan firstMatchTime = new TimeSpan();
                TimeSpan secondMatchTime = new TimeSpan();
                string line;

                // One expression per line; time the construction of each Regex.
                // Note: DateTime.Now has a coarse resolution (typically around 15 ms
                // on Windows), so very short intervals are reported as 0.
                using (TextReader reader = new StreamReader("..\\..\\regexp.txt"))
                {
                    while ((line = reader.ReadLine()) != null)
                    {
                        DateTime begin = DateTime.Now;
                        Regex reg = new Regex(line, opts);
                        DateTime end = DateTime.Now;
                        compileTime += end - begin;
                        regexps.Add(reg);
                    }
                }

                // First match against every regexp.
                foreach (Regex reg in regexps)
                {
                    DateTime begin = DateTime.Now;
                    reg.Match("###");
                    DateTime end = DateTime.Now;
                    firstMatchTime += end - begin;
                }

                // Second match against the same regexps.
                foreach (Regex reg in regexps)
                {
                    DateTime begin = DateTime.Now;
                    reg.Match("###");
                    DateTime end = DateTime.Now;
                    secondMatchTime += end - begin;
                }

                Console.WriteLine(" compile time {0} s first match {1} s second match {2} s for {3} expr",
                    compileTime.TotalSeconds, firstMatchTime.TotalSeconds, secondMatchTime.TotalSeconds, regexps.Count);
            }
        }
    }

    Results (first line without RegexOptions.Compiled, second line with it):

    compile time 0,156253 s first match 0,0156253 s second match 0 s for 3010 expr
    compile time 3,0625588 s first match 135,8463582 s second match 0,0156253 s for 3010 expr

    It looks like I have found a bug.
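
    The benchmark above times with DateTime.Now, which is why some values come out as exactly 0 or 0,0156253. A minimal sketch of the same three measurements using Stopwatch might look like the following; the class name RegexTiming and the method Measure are hypothetical and not part of the original benchmark.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    public static class RegexTiming
    {
        // Hypothetical helper: returns { construction, first match, second match }
        // for one pattern, measured with Stopwatch instead of DateTime.Now.
        public static TimeSpan[] Measure(string pattern, RegexOptions opts, string input)
        {
            Stopwatch sw = Stopwatch.StartNew();
            Regex reg = new Regex(pattern, opts);
            TimeSpan construct = sw.Elapsed;

            sw.Restart();
            reg.Match(input);
            TimeSpan first = sw.Elapsed;

            sw.Restart();
            reg.Match(input);
            TimeSpan second = sw.Elapsed;

            return new[] { construct, first, second };
        }
    }
    ```

    Calling RegexTiming.Measure(line, opts, "###") for each line of regexp.txt and summing the three values would reproduce the totals printed above, with finer resolution.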


  • nidhig83

    You might try pre-compiling the regular expressions into an assembly. Try here.

  • RashiQ

    So, taking that into account: if you're going to do this, you should probably see if you can move the work to a second thread and have all the regexps compiled before you need them. For example, when the application starts, launch another thread that compiles all of them. That way, by the time you go to search, they should already be compiled and the search should be a lot faster. This of course assumes you are fine with the resources not being released until the application exits.
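
    As a rough sketch of that idea, the regexps can be built and warmed up on a thread-pool thread at startup. The class RegexWarmup and the method BuildInBackground are hypothetical names, and the "###" throwaway input is simply the one used earlier in this thread.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;

    public static class RegexWarmup
    {
        // Hypothetical helper: constructs every regexp and runs one throwaway
        // match on a background thread, so the first-match cost is paid
        // before the first real search.
        public static Task<List<Regex>> BuildInBackground(IEnumerable<string> patterns, RegexOptions opts)
        {
            return Task.Run(() =>
            {
                List<Regex> result = new List<Regex>();
                foreach (string pattern in patterns)
                {
                    Regex reg = new Regex(pattern, opts);
                    reg.Match("###"); // throwaway match to trigger the slow first-match work
                    result.Add(reg);
                }
                return result;
            });
        }
    }
    ```

    The caller would start the task when the application launches and read task.Result (or await it) just before the first search; if the warm-up is not finished yet, that read simply waits for it.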

  • AZ_2005

    MarcD is absolutely right. If you reuse the expression throughout your application, then compile it (maybe even in advance); otherwise don't. Just as an addition: if you have frequently changing expressions, e.g. built from user input, never compile them, because of the issue with releasing the resources. Each expression would be a new one, and the compiled resource would not be released until the application domain shuts down.
  • Ahmad_Jafari

    I think people's understanding of this is flawed.
    You compile a regex before you need it if you're going to use it tons and tons of times. If you're only going to use that regex a few times, compiling it is most likely not worth it. I've never compiled a regex. If I have a regex whose pattern stays the same, I create it once and then use that instance over and over. I've seldom noticed a performance problem, and I've run actual benchmarks that had to parse 500 to 1000+ inputs across 20 to 30 very complex regex patterns. I've honestly never used the Compiled option, but I do know that if one did use it, the compilation would be best done before one actually wanted to use the regex.

    Hope that helps.
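
    The "create it once and reuse the instance" approach can be sketched as a small cache keyed by pattern. This is only an illustration: the class RegexCache and its Get method are hypothetical names, and the regexps are deliberately created without RegexOptions.Compiled.

    ```csharp
    using System.Collections.Concurrent;
    using System.Text.RegularExpressions;

    public static class RegexCache
    {
        private static readonly ConcurrentDictionary<string, Regex> Cache =
            new ConcurrentDictionary<string, Regex>();

        // Hypothetical helper: returns the stored interpreted (non-Compiled)
        // Regex for a pattern, creating it the first time it is requested.
        public static Regex Get(string pattern)
        {
            return Cache.GetOrAdd(pattern, p =>
                new Regex(p, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant));
        }
    }
    ```

    Repeated calls such as RegexCache.Get(expression).Match(text) then share one instance per pattern. The cache in this sketch never evicts entries, so it suits a fixed set of patterns better than patterns built from user input.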

  • Alex cai

    The first time you create the regular expression, it is compiled, and that takes a considerable amount of time. After that you have a compiled version of the regex, so the second execution does not have to compile again and runs very fast.

    As far as I can remember from what I read, one drawback of RegexOptions.Compiled is that the compiled resource is not released until the program exits.

    Edit: I found the link to that again: http://msdn2.microsoft.com/en-us/8zbs0h2f.aspx

