Understanding the Requirements of Multithreaded Applications

Article contributed by stippi on Wed, 2007-09-05 08:18

Though I have been programming on BeOS since 1999, only in recent years have I slowly become more comfortable with various multithreading related issues in my programs. So I thought I'd share some of my experiences here for beginning programmers or programmers skeptical about multithreading. I hope to extend this into a series of articles on the benefits and pitfalls of multithreading, all with an emphasis on programming for the Haiku API.

Multithreading in Applications

We are frequently confronted with situations in which an application is forced to wait for some condition. A multimedia player has to wait for the drive to spin up, a network app stalls while waiting for data to arrive - these are just a few examples of when it would be nice to have more multithreading applied to all kinds of programs. It's not nice if you can't even move a window on screen because the interface is blocking on something. Multithreading is the way to go.

Some modern compilers will allow you to mark sections of code and have them automatically executed on all cores of the system. This is nice and useful, but is not the kind of multithreading that I am interested in. Larger applications can benefit greatly from multithreading as a design concept throughout the whole program.

Just to give you an example of the kind of multithreading I am interested in: imagine you are programming a graphics application. Depending on the complexity of the graphics, it might still take some time to render the document, even on modern CPUs. Yet you want your user to be able to manipulate objects in the document smoothly, while the rendering happens at its own pace. For example, the transformation box around an object will follow the mouse movements smoothly, while the object it transforms will update at a different refresh rate, according to how expensive this area of the document is to render. Multithreading sounds like a good solution for this problem.

You may be aware that I am the author of WonderBrush, and if you have ever used it, you might have noticed that the program doesn't work like I am outlining above: the pace at which you can manipulate objects in WonderBrush is painfully locked to the rendering speed. This has always bothered me about WonderBrush, and recently I came up with a new prototype in which all of this is fixed. As always, Ingo Weinhold was a great help in achieving this. What I learned there is also what motivated me to start a series of articles about multithreading. But first things first, so let's start with the basics.

What Is "Locking" and Why Do You Need It?

Multithreading means that the code paths in your application are executed in parallel. You have to be aware that the operating system will interrupt the flow of any one thread at any time, so that another thread gets CPU time. It doesn't matter whether this happens on a single or multi-core CPU. What makes multithreaded programming difficult is that multiple threads in your application might have to run through the same sections of code, accessing the same data structures. This is pretty much unavoidable to a certain degree. I will try to illustrate this. The example might not be the best use of multithreading, but I want to explain a certain problem. Imagine you have a data object which is a list. Thread A in your application might add some items to the list, while Thread B might iterate over the items to do something with them. For the purpose of this exercise, let's assume both may happen at the same time.

Thread A:

void
add_strings(BList* list, const char** strings, int32 stringCount)
{
	for (int32 i = 0; i < stringCount; i++)
		list->AddItem(new BString(strings[i]));
}

Thread B:

void
print_strings(const BList* list)
{
	int32 count = list->CountItems();
	for (int32 i = 0; i < count; i++)
		printf("string %ld: %s\n", i, ((BString*)list->ItemAt(i))->String());
}

One thinks of the code flow like this:

Thread A  |--loop-i=0----loop-i=1----loop-i=2----[...]---------|
Thread B  |--init "count"----loop-i=0----loop-i=1----loop-i=2----[...]-------|

What really happens though is this:

Thread A                              |--loop-i=0----loop-i=1--|                 
Thread B  |--init "count"----loop-i=0-|                        |---loop-i=1---

Thread A                        |--loop-i=2----[...]---------|
Thread B  --loop-i=2----[...]---|                            |----|

Both of these functions are executed in parallel, which means the thread scheduler of the operating system will give each thread some time to run the function. But each thread can be interrupted at any time and then the other thread continues. For example, Thread B could be interrupted by the OS in the middle of executing the loop, and new items might be added to the list in Thread A while Thread B is put on hold. The count variable in the function of Thread B will stay the same though, because when Thread B is allowed to continue, it will pick up the work exactly where it was interrupted - in the middle of executing the loop.

If you can already see how bad the parallel access to the list really is, you can skip to the next paragraph. On the other hand, if you think that failing to print the additional strings is not such a big deal - read on. You might be aware that the BList implementation allocates blocks of memory whenever it needs to grow in order to hold new items. This can mean that the previous memory is copied into a new block and then freed. At some point, the private class member which points to the old array is reassigned to the new array. But right in the middle of Thread A doing that, the operating system could reschedule the threads, and then Thread B will access invalid memory. Worse yet, other class members of the BList object, like the size of the array, might already have been changed. At this point, I hope it is clear that the code, as is, is very broken.

What's needed is synchronization. This is like installing access gates which protect the data and guarantee it stays valid for the time it is needed.

To provide this feature, the operating system has synchronization primitives, for example semaphores. The Haiku API has an easy-to-use object for this, the BLocker, which wraps the efficient use of a semaphore. As the name implies, it locks a section of code: while one thread holds the lock, another thread that wants the lock as well is made to wait. It is allowed to continue when the first thread releases the lock.

Let's change the code accordingly:

Thread A:

void
add_strings(BLocker* lock, BList* list, const char** strings, int32 stringCount)
{
	if (!lock->Lock())
		return;

	for (int32 i = 0; i < stringCount; i++)
		list->AddItem(new BString(strings[i]));

	lock->Unlock();
}


Thread B:

void
print_strings(BLocker* lock, const BList* list)
{
	if (!lock->Lock())
		return;

	int32 count = list->CountItems();
	for (int32 i = 0; i < count; i++)
		printf("string %ld: %s\n", i, ((BString*)list->ItemAt(i))->String());

	lock->Unlock();
}

Now the list data is protected by the lock. (Of course, both functions need to be passed the same BLocker object.) The operating system will enforce that these two threads are no longer executed in parallel while one of them holds the lock. At this point, you might think: "Hey wait, I thought multithreading was all about running in parallel." Yes, of course - but only for the code that can run in parallel. You will frequently be in situations in which two threads need to access some data one after the other. The point is to keep these sections of code as small as possible; avoiding them completely might not be possible. Once you understand more of this, you will know how to design your applications in such a way that the critical sections - the code sections that need to be protected by a lock - are as small as possible, allowing most of the code to really run in parallel.

A Frequent Locking Headache

One problem that one will surely run into is deadlocks. Deadlocks happen when you have not just one lock but multiple locks in your application, and when you do not (or cannot) enforce a certain locking strategy.

An example. Let's construct a frequent application design: assume you have a data model, and the same data model is represented in multiple windows. On Haiku, these windows will each run in their own thread - there is no way around this. If you design your application properly, you have separated the data from the representation of that data. You might also be using a listener/observer interface, which means that each window registers itself as a listener on the data. Don't worry if you have never heard of these concepts; they will be demonstrated in code a little further below. Whenever the data changes, each registered listener (a window in our case) is notified. This is how the windows update on screen when the data changes. On the other hand, you will want to change the data through the user interface of each window. If you want to invalidate (cause to redraw) an interface element in Haiku, you have to lock the window to which this interface element belongs. And here we are at a point where frequent mistakes happen. It all depends on how the notification mechanism is designed, and in which thread the notifications happen. Assume this setup:

  • Data -> protected by Lock A
  • Window 1 -> protected by Lock B
  • Window 2 -> protected by Lock C

Locks B and C are system-provided; there is no way around having these. The data is really independent though, and it should have its own lock, Lock A.

Whenever the windows want to access the data - be it to read it for drawing the on-screen representation, or to manipulate it - they need to lock it via Lock A. Noteworthy is that whenever a window manipulates the data, it will probably do so in its own thread. (That is OK as long as you know that manipulating the data is fast, and that nowhere in your application is the data lock held for a long time.) So assume the user clicked and dragged something in Window 1: Lock B will already be held by the system, because it is processing a message from the Window 1 event queue, and then your application code will acquire Lock A because it wants to manipulate the data.

Because we designed our application accordingly, the manipulation of the data will trigger a notification so that Window 2 will update. And here it is important to understand that the notification has to happen in a certain way: it needs to be asynchronous. Why does it need to be asynchronous, and how is this achieved? To answer this question, let's look at how a synchronous notification could be implemented and why it will cause a deadlock in certain situations.

class Data {
public:

	class Listener {
	public:
		virtual ~Listener() {}
		virtual void DataChanged() = 0;
	};

	...

	bool Lock()
	{
		return fLock.Lock();
	}
	void Unlock()
	{
		fLock.Unlock();
	}

	void ManipulateData()
	{
		...
		_NotifyListeners();
	}

	void AddListener(Listener* listener)
	{
		fListeners.AddItem(listener);
	}
	void RemoveListener(Listener* listener)
	{
		fListeners.RemoveItem(listener);
	}

private:
	void _NotifyListeners()
	{
		int32 count = fListeners.CountItems();

		for (int32 i = 0; i < count; i++) {
			Listener* listener = (Listener*)fListeners.ItemAt(i);
			listener->DataChanged();
		}
	}

	BList	fListeners;
	BLocker	fLock;
};

This is our Data class with a way to manipulate the data and trigger notifications when the data changes via the embedded Listener class. Just derive a class from Data::Listener, attach it to the Data via AddListener() and you are ready to receive notifications. This works because the Data implementation will call the hook function DataChanged(), which you can implement to react in whichever way you see fit. Our windows will contain views to display the data. The classes could be declared like this:

class DataView : public BView, public Data::Listener {
public:
	DataView(Data* data)
		: BView(...),
		  fData(data)
	{
		// data is assumed to be locked
		fData->AddListener(this);
	}

	~DataView()
	{
		// data is assumed to be locked
		fData->RemoveListener(this);
	}

	...

	virtual void DataChanged()
	{
		if (!LockLooper())
			return;

		Invalidate();

		UnlockLooper();
	}

	virtual void Draw(BRect updateRect)
	{
		if (!fData->Lock())
			return;

		// ... render the data

		fData->Unlock();
	}

	virtual void MouseMoved(...)
	{
		if (!fData->Lock())
			return;

		fData->ManipulateData();

		fData->Unlock();
	}

private:
	Data* fData;
};

Notice how the looper (the window which contains the view) has to be locked in DataChanged(). This is because the manipulation of the data could happen in any thread, but the thread of the view's window needs to be locked before we can use the Invalidate() BView method.

And here we have produced a nice setup for a typical deadlock! Just assume that the user is manipulating the data in Window 1, but that Window 2 - for some reason - needs to render the data at the same time and blocks on the data lock (this happens in DataView::Draw()). The thread of Window 2 already holds its window lock (Lock C), since it is reacting to an event which leads to DataView::Draw() being called, and it is now trying to acquire the data lock (Lock A) in the Draw() function. But Window 1 already holds the data lock (it was going to manipulate the data), so Window 2 is blocked at this point. Manipulating the data in the Window 1 thread triggers a notification, which arrives in the DataView of Window 2 and tries to lock Window 2 from the Window 1 thread. So Window 1 blocks on the Window 2 lock (Lock C) while holding the data lock, while Window 2 holds its own lock and blocks on the data lock! Each thread is waiting for the other to release a lock, no thread can continue to run, and no lock is ever released. The application freezes.

There is simply no way to avoid this problem with synchronous notifications. If synchronous notifications have to be implemented in a way that eventually requires obtaining locks which other threads already hold, while those threads in turn block on a lock that the notifying thread holds, then there will eventually be a deadlock.

The way out is asynchronous notifications. These are easily implemented in Haiku via messaging. Here is how the code could be extended to implement asynchronous notifications:

class DataView : public BView, public Data::Listener {
public:
	DataView(Data* data)
		: BView(...),
		  fData(data)
	{
		// data is assumed to be locked
		fData->AddListener(this);
	}

	~DataView()
	{
		// data is assumed to be locked
		fData->RemoveListener(this);
	}

	...

	virtual void DataChanged()
	{
		BLooper* looper = Looper();
		if (looper != NULL)
			looper->PostMessage(MSG_DATA_CHANGED, this);
	}

private:
	void _DataChanged()
	{
		Invalidate();
	}

public:
	virtual void MessageReceived(BMessage* message)
	{
		switch (message->what) {
			case MSG_DATA_CHANGED:
				_DataChanged();
				break;

			...

			default:
				BView::MessageReceived(message);
				break;
		}
	}


	virtual void Draw(BRect updateRect)
	{
		if (!fData->Lock())
			return;

		// ... render the data

		fData->Unlock();
	}

	virtual void MouseMoved(...)
	{
		if (!fData->Lock())
			return;

		fData->ManipulateData();

		fData->Unlock();
	}

private:
	Data* fData;
};

Notice how the DataChanged() hook function does nothing but send a message. Message sending does not require locking the looper (the window thread) - which is the whole point. Later (asynchronously), the window processes the message in its own thread. If the window was already blocked at the time the notification was generated in another thread, that is no problem: the code path for the notification runs unaffected.

By the way, a nice way to analyze deadlocks is to use BDB, which comes with the BeOS developer tools. At any time, if an application freezes and you think it might be a deadlock, you can simply launch BDB. In the list of running teams, double-click the application to look at the stack crawl of each thread. You can see which locks each thread is trying to grab and which ones it already holds.

The Next Steps

The application design I have just outlined has one problem: it only works well when the data lock is held only for short periods of time. It is not a suitable solution when some threads of the application need to make expensive computations with the data and would have to hold the data lock the whole time. My next article on multithreading will therefore focus on the concept of making cheap snapshots of the data in order to keep the times short for which the lock needs to be held. This is the concept I used to implement my new prototype of the asynchronous rendering in the eventual next WonderBrush version.

Next in series: Using Snapshots For Short Locking Times

Comments

Re: Understanding the Requirements of Multithreaded Applications

Very interesting and very clear description, worth a read! I'm looking forward to the next article!

Re: Understanding the Requirements of Multithreaded Applications

Oh, thanks! I was a bit worried whether it was as clear as I hoped. Maybe I already see a way to improve it: I think the introductory section about parallel access to data could be clearer on how bad this really is, since it isn't obvious, for example, that even invalid memory might be accessed. The purpose of the first section is to make it absolutely clear that locking is _really_ needed in certain situations.

Thanks for the positive comment! Don't know when I will get to the next one. I have not started yet.

Re: Understanding the Requirements of Multithreaded Applications

Thanks for the article Stephan, nice to read some technical stuff on the Haiku site, it's a shame those articles seemed to stop after the end of the newsletters. I wonder if it would be possible to show the deadlock situation with the aid of some sort of diagram to show what function calls what function on which thread.

I've written an app with the drawing multithreaded, where each thread has its own copy of all the data needed for drawing. The actual arrays are shared between threads though because the data doesn't change much - when it does the new threads just get a different pointer to the new memory. It seems to work pretty well, and .net memory management takes care of clearing up the old arrays when there's no longer a reference to them (I hope). I'm looking forward to the next article in the series!

Re: Understanding the Requirements of Multithreaded Applications

Stephan, you write really well. It's a joy to read your articles.

Re: Understanding the Requirements of Multithreaded Applications

Very nice article. Explains threads nicely.

I don't code on a daily basis but having clear understanding of such concepts is very important.

I was unsure when I had to decide whether or not to use threads for my application (pypt-offline).
Surprisingly, I found that there are two groups of programmers: those who believe threads are crap, and those who believe threads are wonderful/powerful.

Personally, if you are designing a network based application, I truly believe you need the application to be threaded.

The way you explained the importance of locking was very good.

Re: Understanding the Requirements of Multithreaded Applications

Quote:

Surprisingly, I found that there are two groups of programmers, one who believe threads are crap others believe threads are wonderful/powerful.

Unfortunately the programmers who think threads are crap are dinosaurs with their heads in the sand (to mix a few metaphors) because the future of efficient programming lies in multi-threading. Why? Because Moore's Law has hit a roadblock and the only way to make processors faster with current technology is to add cores. More cores means more threads running truly concurrently. Therefore if you don't multi-thread your applications, they won't run any faster on newer processors. Of course not everything can be made multi-threaded, but a lot can.

Intel is really pushing multi-threading and in fact has released their Thread Building Blocks library as open source (GPL v2 with runtime exception.) One day I plan to investigate porting this to Haiku, as it may better help developers cope with threads.

Re: Understanding the Requirements of Multithreaded Applications

The decision to keep the BLooper locked while a message is being processed seems strange to me. I would only keep the lock to dequeue the message and then release it before performing the action associated with the message. That way, you don't need a recursive mutex for the BLooper.

An invalidate() could be performed asynchronously by enqueuing an invalidate message. The advantage is that the areas of consecutive invalidate messages can be merged. The disadvantage of asynchronous redraws is that efficient scrolling gets problematic.

I would be seriously interested to know why the decision was made to keep the BLooper locked in the Haiku API.

Re: Understanding the Requirements of Multithreaded Applications

Edit: This was meant as a reply to Ryan.

Working with threads is fine if you know exactly what you're doing. But it can get very problematic if several people with different levels of programming experience are working on one multithreaded project.

Therefore, many people are sceptical of working with threads and locks in the classic way. The goal is to find a mechanism that is less error prone. A messaging system combined with copy on write semantics and garbage collection is one possible solution. I think Qt4 is actually going this route.

Re: Understanding the Requirements of Multithreaded Applications

Regarding Intel's open-sourced TBB, I'm very impressed too. I've played with it for a few weeks at work, and it's a very handy library indeed. The days of actually parallel algorithms (not just the usual GUI + worker threads, but threading across many tasks) are coming, and TBB helps greatly.

I've looked for OS dependencies, and besides the lack of condition variables, it should be very easy to add BeOS/Haiku support to TBB. What could be tricky, however, is GCC 2.95.3, as TBB's C++ may (or may not, dunno yet) use some advanced template features that such an old C++ compiler may choke on. We'll see.

----
Philippe Houdoin, occasional OpenGL & Networking team leader ;-)

Re: Understanding the Requirements of Multithreaded Applications

Isn't the OSS version of TBB GPL? That sort of puts a damper on it :(

I see above a mention of "runtime exception" - what does that mean? I'm going to read now...

Update: Ok - interesting... I'm not sure why they didn't just write a new license for that.

Re: Understanding the Requirements of Multithreaded Applications

Quote:

Working with threads is fine if you know exactly what you're doing. But it can get very problematic if several people with different levels of programming experience are working on one multithreaded project.

I don't disagree. Threads are certainly complicated, and even after taking a course on threading in college and working with BeOS and Haiku for a while, I won't claim I am an expert. But regardless, threads and parallel programming are here to stay, and my main argument in my last post was that we should accept threading and learn to cope with it.

We do that by better educating ourselves and others on how to properly use threads, semaphores, and similar concepts, and also by making better frameworks. So we are on the same page in regards to seeking less error-prone methods. That is why I mentioned Thread Building Blocks. It also wouldn't hurt to see what is happening with Qt and other frameworks as well. By the way, the Haiku GUI Layout API was primarily inspired by Qt's layout system, and quite a few Trolltech employees used to work for YellowTab. So Qt is certainly something Haiku developers would be willing to learn from.

But even with higher-level frameworks sometimes you have to drop to the lower level, so I think knowing the basics of mutexes, etc. is a good thing.

Re: Understanding the Requirements of Multithreaded Applications

This is a nice article. Is there a follow up article with more Haiku coding examples?