MultiThreading Question

To the multithreading gurus:
Given the following code, is it necessary/good practice to use a lock somewhere

Thanks in advance,
zulu

using System;
using System.IO;
using System.Threading;

namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
FileCounter cnt = new FileCounter();
cnt.StartAsync(@"C:\");
while (true)
{
bool exit = !cnt.IsRunning;
Console.Write(string.Format("Files: {0}, Folders :{1}\r", cnt.Files, cnt.Folders));
if (exit)
break;
Thread.Sleep(10);
}
Console.WriteLine("\nFinished.");
Console.ReadKey();
}
}

class FileCounter
{
private delegate void ThreadStarter(string drive);
private long m_files;
private long m_folders;
private bool m_isRunning;

public long Files
{
get { return m_files; }
}

public long Folders
{
get { return m_folders; }
}

public bool IsRunning
{
get { return m_isRunning; }
}

public void StartAsync(string drive)
{
m_files = 0;
m_folders = 0;
m_isRunning = true;
new ThreadStarter(CounterThread).BeginInvoke(drive, null, null);
}

private void CounterThread(string drive)
{
DirectoryInfo curDir = new DirectoryInfo(drive);
CountRecursive(curDir);
m_isRunning = false;
}

private void CountRecursive(DirectoryInfo curDir)
{
m_folders++;

FileInfo[] files = curDir.GetFiles();
foreach(FileInfo file in files)
m_files++;

DirectoryInfo[] dirs = curDir.GetDirectories();
foreach (DirectoryInfo dir in dirs)
CountRecursive(dir);
}
}
}



Answer this question

MultiThreading Question

  • draykula

    _zulu_ wrote:
    Only the main thread will read the properties Files, Folders and IsRunning.
    Or do read accesses need to be synchronized as well

    As long as there is only one thread writing and one thread reading the code is fine.


    Can't i think of it as "either the variable has already been updated by the worker thread if the main thread reads it, or, it has not. anyway, it will be updated when the main thread reads it in the next loop iteration" Or is it possible that the variable is still being updated by the worker thread _while_ the main thread is reading it (see my last question)

    That is correct


    Two other questions bothering me:

    - Why do you lock the write to a bool variable Isn't that write atomic i think of it being a single move instruction or something similar.

    I was only demonstrating a lock(object) pattern. Since you will only have one StartAsync call and therefore one thread at any time writing to that bool you dont need it.

    - Imagine the following situation:
    On a true multiprocessor system, you have 2 threads executing exactly simultanously.
    Thread A executed on processor 1 writes to a memory address (lets say an int variable).
    Thread B executed on processor 2 reads that memory address the very same time.
    Now, is it possible that thread B reads that address while Thread A has not finished writing all 32 bits yet (don't laugh at me! :)

    Not at all! Your questions arent trivial. Thread B can read the address that Thread A is writing to, but if the write isnt complete the dirty read value cannot be guaranteed.



  • Greg Rutherford

  • Doomgrinder

    Peter Ritchie wrote:

    There's no point in only locking on one side; especially when there's only one thread on that side. Nikhil's example of adding a lock statement to the IsRunning property set is pointless. You only have one thread that will be setting IsRunning, so it's really just locking itself out of concurrent writing. One thread cannot do two things at once, and since there's no synchronization on the get, you're not locking out any other threads. Yes, he's made setting IsRunning public, so another thread *could* write to IsRunning while the worker thread does; but, he only added the set to IsRunning to consolidate the lock call.

    I agree with Peter, I only added the setter (Why would any another thread want to set a IsRunning property for another thread ) and the lock to demonstrate a lock. I wasnt trying to optimize your code. [Apologies, if I added to the confusion] The code Peter posted is how it should be done.

    Also I was assuming that you have a requirement to show a running counter in console/ui somewhere, since you were incrementing the m_Files and m_Folders instead of getting the array length off the File/Folder list.

    However,Since we are trying to make it faster, I prefer to use Interlocked.Increment/Decrement to increment counters. In my experience I found that Interlocked was about 50 times faster than lock() (which actually is a design pattern for Monitor.Enter/Exit.)

    If you're protecting a large block of code with a number of operations from collisions by multiple threads, lock is probably the best solution. But if you're simply incrementing, decrementing, or exchanging values and need thread safety as well, then using members of the Interlocked class will be simpler—and faster.



  • Dan Delorey

    Actually, seeing as how you're basically getting incremental results you may want to have a look at the Event-based Asynchronous Pattern in addition to (or instead of) IAsyncResult.

  • kunala

    I'm not a multithreading guru (yet ), but i take it you are refering to the properties you are reading in the main thread while you are writing them in the worker thread.

    I am not sure if m_folders++ or the return m_folders are atomic operations, but i don't think so. If you would increment the field in the main thread as well, you would really need a lock. But since only one of the threads writes to the field, and the other one only reads, I don't think one is needed.


  • marilyn

    Thanks for the fast reply!

    The real world program implementation of StartAsync() reads ..

    public void StartAync(string drive)
    {
    if (m_isRunning)
    throw new InvalidOperationExcepion("Counter is alredy running.");
    ...
    }

    .. so StartAsync() indeed can only be called once. Am I right to assume that i do not need the Interlocked code in this case
    Only the main thread will read the properties Files, Folders and IsRunning.
    Or do read accesses need to be synchronized as well
    Can't i think of it as "either the variable has already been updated by the worker thread if the main thread reads it, or, it has not. anyway, it will be updated when the main thread reads it in the next loop iteration" Or is it possible that the variable is still being updated by the worker thread _while_ the main thread is reading it (see my last question)

    Two other questions bothering me:

    - Why do you lock the write to a bool variable Isn't that write atomic i think of it being a single move instruction or something similar.

    - Imagine the following situation:
    On a true multiprocessor system, you have 2 threads executing exactly simultanously.
    Thread A executed on processor 1 writes to a memory address (lets say an int variable).
    Thread B executed on processor 2 reads that memory address the very same time.
    Now, is it possible that thread B reads that address while Thread A has not finished writing all 32 bits yet (don't laugh at me! :)

    Thanks again,
    zulu

  • Apolodor

    Your code will work fine as long as you only call StartAsync() once, and only start one new Thread. It isnt thread safe.

    Use InterlockedIncrement to increment m_folders and m_files to give a basic level of thread safety. Have a object to lock to change the m_isRunning flag. (Never lock(this)! as other objects may use your object to lock.)

    eg:

    class FileCounter
    {
    private delegate void ThreadStarter(string drive);
    private long m_files;
    private long m_folders;
    private bool m_isRunning;

    private object m_lockObject = new object();

    public long Files
    {
    get { return m_files; }
    }

    public long Folders
    {
    get { return m_folders; }
    }

    public bool IsRunning
    {
    get { return m_isRunning; }

    set {

    lock(m_lockObject){m_isRunning = value;}

    return m_isRunning; }
    }

    public void StartAsync(string drive)
    {
    m_files = 0;
    m_folders = 0;
    IsRunning = true;
    new ThreadStarter(CounterThread).BeginInvoke(drive, null, null);
    }

    private void CounterThread(string drive)
    {
    DirectoryInfo curDir = new DirectoryInfo(drive);
    CountRecursive(curDir);
    IsRunning = false;
    }

    private void CountRecursive(DirectoryInfo curDir)
    {
    Interlocked.Increment(m_folders);

    FileInfo[] files = curDir.GetFiles();
    foreach(FileInfo file in files)
    Interlocked.Increment(m_Files);

    DirectoryInfo[] dirs = curDir.GetDirectories();
    foreach (DirectoryInfo dir in dirs)
    CountRecursive(dir);
    }
    }

    Visit http://msdn.microsoft.com/library/en-us/cpgenref/html/cpconThreadingDesignGuidelines.asp to get threading guidelines and locking best practices/mechanics.



  • kieronlanning

    Josh Williams wrote:
    Why do you call interlocked increment 64 a "so-called 'atomic' operation", it is atomic.
    Internlocked.Increment(Int64) is documented as follows:
    http://msdn2.microsoft.com/en-US/library/zs86dyzy(VS.80).aspx wrote:
    (Increment, Decrement and Add) are atomic with respect to each other, but not with respect to other means of accessing the data.
    Historically, InterlockedIncrement has simply been documented in the Platform SDK as:
    http://msdn.microsoft.com/library/default.asp url=/library/en-us/dllproc/base/interlockedincrement.asp wrote:
    The function prevents more than one thread from using the same variable simultaneously.
    leading the the misconception that the Interlocked methods are actually completely "atomic", which they are not. That's why I called the 64-bit Interlocked methods so-called "atomic" operations.

    Some good references regarding InterlockedIncrement:

    Which may also apply to .NET; but, I haven't checked.



  • CSSMr.Smith

    In terms of locking, what Nikhil suggested is true, but not complete. If you'll notice he's used a new Object instead of simply lock(m_isRunning). Since m_isRunning is a value type, using it with lock() means it will get boxed and the new temporary Object will be used for the lock; meaning you're not locking anyone else out. But, his suggestion is incomplete. For most types you don't want to be writing to a value while someone may be reading it, with non-intrinsic types this could mean random data. So, you usually lock the read too. Intrinsic type reads/writes are only "atomic" on a single-processor (or single-core) CPU, depending on the CPU.

    Since you're using asynchronous delegates you could use IAsyncResult.IsCompleted instead of writing your own m_isRunning variable, which is unreliable as your thread will still (technically) be running when m_isRunning is set to false, meaning you have a potential race condition.

    Another observation: your while(true) loop will consume alot of CPU. Sleep(x) doesn't mean the thread doesn't use CPU. If you really want to relinqish CPU time you should be using WaitOne with a WaitHandle. Non-MT observations: since the average refresh rate of a monitor is 60 times a second, 10 ms is a little fast to be refreshing. 60Hz is about 16 ms, meaning if you refresh any faster than every 16ms, some WriteLines will never show. I would wait 20ms instead to improve performance. Also, enumerating an array to count the files is not very efficient.

    Which results in:


    using System;
    using System.IO;
    using System.Threading;
    namespace ConsoleApplication1
    {
    class Program
    {
    static void Main(string[] args)
    {
    FileCounter cnt = new FileCounter();
    IAsyncResult asyncResult = cnt.BeginCounting(@"C:\");
    DateTime startTime = DateTime.Now;
    while (!asyncResult.AsyncWaitHandle.WaitOne(20, false))
    {
    Console.Write(string.Format("Files: {0}, Folders :{1}\r", cnt.Files, cnt.Folders));
    }
    Console.WriteLine("\nFinished.");
    Console.ReadKey();
    }
    }

    class FileCounter
    {
    private IAsyncResult asyncResult;
    private delegate void ThreadStarter(string drive);
    private long m_files;
    private long m_folders;
    private Object m_filesLock = new Object(), m_foldersLock = new Object();
    public long Files
    {
    get { lock (m_filesLock) { return m_files; } }
    private set { lock (m_filesLock) { m_files = value; } }
    }

    public long Folders
    {
    get { lock (m_foldersLock) { return m_folders; } }
    private set { lock (m_foldersLock) { m_folders = value; } }
    }

    public bool IsRunning
    {
    get { return asyncResult != null && (!asyncResult.IsCompleted); }
    }

    public IAsyncResult BeginCounting(string drive)
    {
    Files = 0;
    Folders = 0;
    asyncResult = new ThreadStarter(CounterThread).BeginInvoke(drive, null, null);
    return asyncResult;
    }

    private void CounterThread(string drive)
    {
    DirectoryInfo curDir = new DirectoryInfo(drive);
    CountRecursive(curDir);
    }

    private void CountRecursive(DirectoryInfo curDir)
    {
    Folders++;

    FileInfo[] files = curDir.GetFiles();
    Files += files.Length;

    DirectoryInfo[] dirs = curDir.GetDirectories();
    foreach (DirectoryInfo dir in dirs)
    CountRecursive(dir);
    }
    }
    }

    Also, I've used VS2005 new disparate property visibility with the Files and Folders properties to allow private setters to consolidate the set/get code with locking.

    If you were dealing with the possiblity of more than one background thread I would suggest smarter locking (e.g. within FileCounter, done use the Files and Folders settings that I added and wrap the entire write or read/write operation in a lock). eg:


    lock(m_filesLock)
    {
    m_files += files.Length;
    }

    Which avoids the possibility of m_files changing between the retrieval of it's value, the addition of files.Length, and the assignment back to m_files;

    --Peter
    ______
    http://www.peterRitchie.com/Blog/SortableValueType.snippet



  • Paps

    <Peter Wrote>
    Since 32-bit operations on properly-aligned variables are already atomic, Interlocked.Increment() is pointless with 32-bit values (it also requires they be properly aligned to guarantee the atomicity), if only one thread is writing. Interlocked.Increment() is really only useful if you're using 64-bit values. But, again, if you're not synchronizing the read (by making use of Interlocked.Read), then you're only synchronizing writes and another thread could still read the 64-bit value in the middle of the so-called "atomic" operation resulting in a read of random, garbage, data.
    </Peter Wrote>

    Peter --

    Why do you call interlocked increment 64 a "so-called 'atomic' operation", it is atomic. In this case the read is not and therefore what is really happening is that part of the read happens before the update and part happens after the update.

    Interlocked operations also have the effect of forcing the result of the operation to be published immediately to all of the other processors in the system (generally via flushing the write buffers on the processor doing the writing and invalidating the cache line containing the write on other processors in the system) which can be important in some scenarios where you are using these types of operations as flags to indicate state change to other threads in motion on other processors, this is one of the reason that interlocked operations are much slower than normal writes.

    Generally however I agree with you that low-lock techniques such interlocked operations should be reserved for extremly simple scenarios or very hot spots where you're using convention common-sense locking techniques and are willing to spend the time to do the analysis to prove correctness.

    -josh

    p.s. with regard to the mention of using interlocked increment/decrement for counters, i believe these fall into the simple scenarios where i would also use interlocked operations, it is note-worthy that for counters where accuracy is not necessary (you just need the magnitude of how often something occurs or what have you) forgoing the interlocked operations completly can be a win... however, as mentioned this is not useful for counts where accuracy matters (ref counts and the like).


  • madda

    There's no point in only locking on one side; especially when there's only one thread on that side. Nikhil's example of adding a lock statement to the IsRunning property set is pointless. You only have one thread that will be setting IsRunning, so it's really just locking itself out of concurrent writing. One thread cannot do two things at once, and since there's no synchronization on the get, you're not locking out any other threads. Yes, he's made setting IsRunning public, so another thread *could* write to IsRunning while the worker thread does; but, he only added the set to IsRunning to consolidate the lock call.

    Personally, I don't think it's safe to be testing the running state of a thread. It's much safer to wait (or be told via a callback) for the thread to finish. In your original example of testing IsRunning then calling Sleep, you're needlessly sleeping when, in fact, the thread has completed. Yes, up to 10ms isn't that big at deal; but, it just shows the design is flawed. Not to mention the overhead and performance implications of calling Sleep [1].

    You're writing to your server properties on one thread about 4000 times a second and reading from them on another at least 50 times a second (at least in the code that I posted--closer to 100 times a second in the original).

    You're much more likely to run into a contentious issue in those circumstances. The BackgroundWorker.isRunning field is realistically only going to be written to once--by only one thread. And, given it's a boolean there isn't going to be the issue that that thread will be half done writing it when another tries to read it; so, you can get by without synchronizing it. Although, adding a lock statement to IsBusy wouldn't affect performance; and would only reinforce proper multi-threaded design--but, we're not supposed to see that :-).

    One axiom of software design is "premature optimization is evil" [2]. Follow common-sense multi-threaded design (synchronize all shared resources, use lock ordering if synchronizations can overlap or depend on one another [3], etc.). If a correct design is then found to have poor performance, optimize from there; taking into account how the methods *are* used, instead of just how they *can* be used. Don't optimize for the sake of optimization, optimize in response to a known bottleneck [4].

    I think you'll find the code that I posted is performant (it should certainly be faster than the original); but, do some performance testing with all the locking, then without to compare.

    Multi-threaded design can get pretty hairy; but, it can also be designed to be fairly simple--especially when you're not dealing with read/write situations on all threads and dealing with a single background worker thread that "returns" a result.

    [1] http://www.flounder.com/badprogram.htm#Sleep
    [2] http://en.wikipedia.org/wiki/Software_optimization
    [3] http://blogs.msdn.com/larryosterman/archive/2005/02/17/375449.aspx
    [4] http://www.flounder.com/optimization.htm



  • SpeedyBoy

    many thanks for your time and efforts. your answers were a really interesting read.
    i'm currently busy with another project, but when i get back to this one i'll apply some of your suggestions, especially the IAsyncResult pattern and locking the counter properties.


  • dribar

    Thanks for the detailed explanation!
    So you recommend i should lock all my of server class properties, even if client threads can only read them
    wouldn't that make the code slow with all the locks that must be taken and released every few milliseconds, and
    wouldnt threads interfere with each other

    I just reviewed the System.ComponentModel.BackgroundWorker class using .NET Reflector and noticed that Microsoft doesn't lock the IsRunning flag there.
    Is this just bad design

  • RichBond007

     Nikhil Rajwade wrote:
    However,Since we are trying to make it faster, I prefer to use Interlocked.Increment/Decrement to increment counters. In my experience I found that Interlocked was about 50 times faster than lock() (which actually is a design pattern for Monitor.Enter/Exit.)

    If you're protecting a large block of code with a number of operations from collisions by multiple threads, lock is probably the best solution. But if you're simply incrementing, decrementing, or exchanging values and need thread safety as well, then using members of the Interlocked class will be simpler—and faster.

    Agreed, Interlocked.Increment() and .Decrement() are faster than using "lock". But, Increment() and Decrement() only lock out writers (i.e. while in Increment(), other calls to Increment(), Decrement(), Add(), etc. will be blocked; but no direct access of the variables passed to Increment() will be blocked, of course).  If only one thread is doing the incrementing then we get back into that situation where nothing can be locked out.  And since Interlocked only offers a method of reading 64-bit values, we're forced to use "lock".

    Since 32-bit operations on properly-aligned variables are already atomic, Interlocked.Increment() is pointless with 32-bit values (it also requires they be properly aligned to guarantee the atomicity), if only one thread is writing.  Interlocked.Increment() is really only useful if you're using 64-bit values.  But, again, if you're not synchronizing the read (by making use of Interlocked.Read), then you're only synchronizing writes and another thread could still read the 64-bit value in the middle of the so-called "atomic" operation resulting in a read of random, garbage, data.

    I view the use of Interlocked.Increment as an optimization.  If you're using common-sense locking techniques and find there's a performance issue; and thorough analysis shows that Interlocked can be used instead of locking, then it can be added at that point.



  • MultiThreading Question