To the multithreading gurus:
Given the following code, is it necessary/good practice to use a lock somewhere
Thanks in advance,
zulu
using System;
using System.IO;
using System.Threading;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
FileCounter cnt = new FileCounter();
cnt.StartAsync(@"C:\");
while (true)
{
bool exit = !cnt.IsRunning;
Console.Write(string.Format("Files: {0}, Folders :{1}\r", cnt.Files, cnt.Folders));
if (exit)
break;
Thread.Sleep(10);
}
Console.WriteLine("\nFinished.");
Console.ReadKey();
}
}
class FileCounter
{
private delegate void ThreadStarter(string drive);
private long m_files;
private long m_folders;
private bool m_isRunning;
public long Files
{
get { return m_files; }
}
public long Folders
{
get { return m_folders; }
}
public bool IsRunning
{
get { return m_isRunning; }
}
public void StartAsync(string drive)
{
m_files = 0;
m_folders = 0;
m_isRunning = true;
new ThreadStarter(CounterThread).BeginInvoke(drive, null, null);
}
private void CounterThread(string drive)
{
DirectoryInfo curDir = new DirectoryInfo(drive);
CountRecursive(curDir);
m_isRunning = false;
}
private void CountRecursive(DirectoryInfo curDir)
{
m_folders++;
FileInfo[] files = curDir.GetFiles();
foreach(FileInfo file in files)
m_files++;
DirectoryInfo[] dirs = curDir.GetDirectories();
foreach (DirectoryInfo dir in dirs)
CountRecursive(dir);
}
}
}

MultiThreading Question
draykula
Greg Rutherford
BTW, some excellent multithreading references:
Doomgrinder
I agree with Peter, I only added the setter (Why would any another thread want to set a IsRunning property for another thread ) and the lock to demonstrate a lock. I wasnt trying to optimize your code. [Apologies, if I added to the confusion] The code Peter posted is how it should be done.
Also I was assuming that you have a requirement to show a running counter in console/ui somewhere, since you were incrementing the m_Files and m_Folders instead of getting the array length off the File/Folder list.
However,Since we are trying to make it faster, I prefer to use Interlocked.Increment/Decrement to increment counters. In my experience I found that Interlocked was about 50 times faster than lock() (which actually is a design pattern for Monitor.Enter/Exit.)
If you're protecting a large block of code with a number of operations from collisions by multiple threads, lock is probably the best solution. But if you're simply incrementing, decrementing, or exchanging values and need thread safety as well, then using members of the Interlocked class will be simpler—and faster.
Dan Delorey
kunala
I'm not a multithreading guru (yet
), but i take it you are refering to the properties you are reading in the main thread while you are writing them in the worker thread.
I am not sure if m_folders++ or the return m_folders are atomic operations, but i don't think so. If you would increment the field in the main thread as well, you would really need a lock. But since only one of the threads writes to the field, and the other one only reads, I don't think one is needed.
marilyn
The real world program implementation of StartAsync() reads ..
public void StartAync(string drive)
{
if (m_isRunning)
throw new InvalidOperationExcepion("Counter is alredy running.");
...
}
.. so StartAsync() indeed can only be called once. Am I right to assume that i do not need the Interlocked code in this case
Only the main thread will read the properties Files, Folders and IsRunning.
Or do read accesses need to be synchronized as well
Can't i think of it as "either the variable has already been updated by the worker thread if the main thread reads it, or, it has not. anyway, it will be updated when the main thread reads it in the next loop iteration" Or is it possible that the variable is still being updated by the worker thread _while_ the main thread is reading it (see my last question)
Two other questions bothering me:
- Why do you lock the write to a bool variable Isn't that write atomic i think of it being a single move instruction or something similar.
- Imagine the following situation:
On a true multiprocessor system, you have 2 threads executing exactly simultanously.
Thread A executed on processor 1 writes to a memory address (lets say an int variable).
Thread B executed on processor 2 reads that memory address the very same time.
Now, is it possible that thread B reads that address while Thread A has not finished writing all 32 bits yet (don't laugh at me! :)
Thanks again,
zulu
Apolodor
Your code will work fine as long as you only call StartAsync() once, and only start one new Thread. It isnt thread safe.
Use InterlockedIncrement to increment m_folders and m_files to give a basic level of thread safety. Have a object to lock to change the m_isRunning flag. (Never lock(this)! as other objects may use your object to lock.)
eg:
class FileCounter
{
private delegate void ThreadStarter(string drive);
private long m_files;
private long m_folders;
private bool m_isRunning;
private object m_lockObject = new object();
public long Files
{
get { return m_files; }
}
public long Folders
{
get { return m_folders; }
}
public bool IsRunning
{
get { return m_isRunning; }
set {
lock(m_lockObject){m_isRunning = value;}
return m_isRunning; }
}
public void StartAsync(string drive)
{
m_files = 0;
m_folders = 0;
IsRunning = true;
new ThreadStarter(CounterThread).BeginInvoke(drive, null, null);
}
private void CounterThread(string drive)
{
DirectoryInfo curDir = new DirectoryInfo(drive);
CountRecursive(curDir);
IsRunning = false;
}
private void CountRecursive(DirectoryInfo curDir)
{
Interlocked.Increment(m_folders);
FileInfo[] files = curDir.GetFiles();
foreach(FileInfo file in files)
Interlocked.Increment(m_Files);
DirectoryInfo[] dirs = curDir.GetDirectories();
foreach (DirectoryInfo dir in dirs)
CountRecursive(dir);
}
}
Visit http://msdn.microsoft.com/library/en-us/cpgenref/html/cpconThreadingDesignGuidelines.asp to get threading guidelines and locking best practices/mechanics.
kieronlanning
Some good references regarding InterlockedIncrement:
Which may also apply to .NET; but, I haven't checked.
CSSMr.Smith
In terms of locking, what Nikhil suggested is true, but not complete. If you'll notice he's used a new Object instead of simply lock(m_isRunning). Since m_isRunning is a value type, using it with lock() means it will get boxed and the new temporary Object will be used for the lock; meaning you're not locking anyone else out. But, his suggestion is incomplete. For most types you don't want to be writing to a value while someone may be reading it, with non-intrinsic types this could mean random data. So, you usually lock the read too. Intrinsic type reads/writes are only "atomic" on a single-processor (or single-core) CPU, depending on the CPU.
Since you're using asynchronous delegates you could use IAsyncResult.IsCompleted instead of writing your own m_isRunning variable, which is unreliable as your thread will still (technically) be running when m_isRunning is set to false, meaning you have a potential race condition.
Another observation: your while(true) loop will consume alot of CPU. Sleep(x) doesn't mean the thread doesn't use CPU. If you really want to relinqish CPU time you should be using WaitOne with a WaitHandle. Non-MT observations: since the average refresh rate of a monitor is 60 times a second, 10 ms is a little fast to be refreshing. 60Hz is about 16 ms, meaning if you refresh any faster than every 16ms, some WriteLines will never show. I would wait 20ms instead to improve performance. Also, enumerating an array to count the files is not very efficient.
Which results in:
using System;
using System.IO;
using System.Threading;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
FileCounter cnt = new FileCounter();
IAsyncResult asyncResult = cnt.BeginCounting(@"C:\");
DateTime startTime = DateTime.Now;
while (!asyncResult.AsyncWaitHandle.WaitOne(20, false))
{
Console.Write(string.Format("Files: {0}, Folders :{1}\r", cnt.Files, cnt.Folders));
}
Console.WriteLine("\nFinished.");
Console.ReadKey();
}
}
class FileCounter
{
private IAsyncResult asyncResult;
private delegate void ThreadStarter(string drive);
private long m_files;
private long m_folders;
private Object m_filesLock = new Object(), m_foldersLock = new Object();
public long Files
{
get { lock (m_filesLock) { return m_files; } }
private set { lock (m_filesLock) { m_files = value; } }
}
public long Folders
{
get { lock (m_foldersLock) { return m_folders; } }
private set { lock (m_foldersLock) { m_folders = value; } }
}
public bool IsRunning
{
get { return asyncResult != null && (!asyncResult.IsCompleted); }
}
public IAsyncResult BeginCounting(string drive)
{
Files = 0;
Folders = 0;
asyncResult = new ThreadStarter(CounterThread).BeginInvoke(drive, null, null);
return asyncResult;
}
private void CounterThread(string drive)
{
DirectoryInfo curDir = new DirectoryInfo(drive);
CountRecursive(curDir);
}
private void CountRecursive(DirectoryInfo curDir)
{
Folders++;
FileInfo[] files = curDir.GetFiles();
Files += files.Length;
DirectoryInfo[] dirs = curDir.GetDirectories();
foreach (DirectoryInfo dir in dirs)
CountRecursive(dir);
}
}
}
If you were dealing with the possiblity of more than one background thread I would suggest smarter locking (e.g. within FileCounter, done use the Files and Folders settings that I added and wrap the entire write or read/write operation in a lock). eg:
lock(m_filesLock)
{
m_files += files.Length;
}
--Peter
______
http://www.peterRitchie.com/Blog/SortableValueType.snippet
Paps
Since 32-bit operations on properly-aligned variables are already atomic, Interlocked.Increment() is pointless with 32-bit values (it also requires they be properly aligned to guarantee the atomicity), if only one thread is writing. Interlocked.Increment() is really only useful if you're using 64-bit values. But, again, if you're not synchronizing the read (by making use of Interlocked.Read), then you're only synchronizing writes and another thread could still read the 64-bit value in the middle of the so-called "atomic" operation resulting in a read of random, garbage, data.
</Peter Wrote>
Peter --
Why do you call interlocked increment 64 a "so-called 'atomic' operation", it is atomic. In this case the read is not and therefore what is really happening is that part of the read happens before the update and part happens after the update.
Interlocked operations also have the effect of forcing the result of the operation to be published immediately to all of the other processors in the system (generally via flushing the write buffers on the processor doing the writing and invalidating the cache line containing the write on other processors in the system) which can be important in some scenarios where you are using these types of operations as flags to indicate state change to other threads in motion on other processors, this is one of the reason that interlocked operations are much slower than normal writes.
Generally however I agree with you that low-lock techniques such interlocked operations should be reserved for extremly simple scenarios or very hot spots where you're using convention common-sense locking techniques and are willing to spend the time to do the analysis to prove correctness.
-josh
p.s. with regard to the mention of using interlocked increment/decrement for counters, i believe these fall into the simple scenarios where i would also use interlocked operations, it is note-worthy that for counters where accuracy is not necessary (you just need the magnitude of how often something occurs or what have you) forgoing the interlocked operations completly can be a win... however, as mentioned this is not useful for counts where accuracy matters (ref counts and the like).
madda
There's no point in only locking on one side; especially when there's only one thread on that side. Nikhil's example of adding a lock statement to the IsRunning property set is pointless. You only have one thread that will be setting IsRunning, so it's really just locking itself out of concurrent writing. One thread cannot do two things at once, and since there's no synchronization on the get, you're not locking out any other threads. Yes, he's made setting IsRunning public, so another thread *could* write to IsRunning while the worker thread does; but, he only added the set to IsRunning to consolidate the lock call.
Personally, I don't think it's safe to be testing the running state of a thread. It's much safer to wait (or be told via a callback) for the thread to finish. In your original example of testing IsRunning then calling Sleep, you're needlessly sleeping when, in fact, the thread has completed. Yes, up to 10ms isn't that big at deal; but, it just shows the design is flawed. Not to mention the overhead and performance implications of calling Sleep [1].
You're writing to your server properties on one thread about 4000 times a second and reading from them on another at least 50 times a second (at least in the code that I posted--closer to 100 times a second in the original).
You're much more likely to run into a contentious issue in those circumstances. The BackgroundWorker.isRunning field is realistically only going to be written to once--by only one thread. And, given it's a boolean there isn't going to be the issue that that thread will be half done writing it when another tries to read it; so, you can get by without synchronizing it. Although, adding a lock statement to IsBusy wouldn't affect performance; and would only reinforce proper multi-threaded design--but, we're not supposed to see that :-).
One axiom of software design is "premature optimization is evil" [2]. Follow common-sense multi-threaded design (synchronize all shared resources, use lock ordering if synchronizations can overlap or depend on one another [3], etc.). If a correct design is then found to have poor performance, optimize from there; taking into account how the methods *are* used, instead of just how they *can* be used. Don't optimize for the sake of optimization, optimize in response to a known bottleneck [4].
I think you'll find the code that I posted is performant (it should certainly be faster than the original); but, do some performance testing with all the locking, then without to compare.
Multi-threaded design can get pretty hairy; but, it can also be designed to be fairly simple--especially when you're not dealing with read/write situations on all threads and dealing with a single background worker thread that "returns" a result.
[1] http://www.flounder.com/badprogram.htm#Sleep
[2] http://en.wikipedia.org/wiki/Software_optimization
[3] http://blogs.msdn.com/larryosterman/archive/2005/02/17/375449.aspx
[4] http://www.flounder.com/optimization.htm
SpeedyBoy
i'm currently busy with another project, but when i get back to this one i'll apply some of your suggestions, especially the IAsyncResult pattern and locking the counter properties.
dribar
So you recommend i should lock all my of server class properties, even if client threads can only read them
wouldn't that make the code slow with all the locks that must be taken and released every few milliseconds, and
wouldnt threads interfere with each other
I just reviewed the System.ComponentModel.BackgroundWorker class using .NET Reflector and noticed that Microsoft doesn't lock the IsRunning flag there.
Is this just bad design
RichBond007
Since 32-bit operations on properly-aligned variables are already atomic, Interlocked.Increment() is pointless with 32-bit values (it also requires they be properly aligned to guarantee the atomicity), if only one thread is writing. Interlocked.Increment() is really only useful if you're using 64-bit values. But, again, if you're not synchronizing the read (by making use of Interlocked.Read), then you're only synchronizing writes and another thread could still read the 64-bit value in the middle of the so-called "atomic" operation resulting in a read of random, garbage, data.
I view the use of Interlocked.Increment as an optimization. If you're using common-sense locking techniques and find there's a performance issue; and thorough analysis shows that Interlocked can be used instead of locking, then it can be added at that point.