C++ Concurrency Bugs

One class of bugs that can be particularly hard to find is concurrency bugs. The purpose of this article is to discuss some common types of concurrency bugs, as well as some techniques for finding and fixing them. Additionally, we'll examine how understanding what the CPU is doing behind the scenes can help avoid some subtle concurrency errors. I'll be using C++ throughout, but much of this article applies to other programming languages as well (especially C).

It's worth noting that this article does assume that we're developing in a Linux environment. I think there's a strong argument to be made that the tools available on Linux (and similar operating systems) are much better than those available on Windows for detecting these types of problems.

Common Bugs and Techniques for Finding Them

I would say that there are three main types of concurrency bug: data races, lock ordering problems, and deadlocks. These are all somewhat related, and the techniques for finding and fixing them are fairly similar.

Data Races

Arguably the simplest type of concurrency bug is the data race. This occurs whenever two threads access the same piece of data at the same time without any synchronization. In general, concurrent reads - for example, of a constant variable - are perfectly safe. The real problem arises when one thread writes to the variable while other threads are trying to read from it. Consider the following contrived example program:


#include <iostream>
#include <functional>
#include <mutex>
#include <thread>
#include <chrono>
#include <cstdint>

int main(int, char **)
{
    uint64_t value = 0;

    std::function<void()> fnA = [&]()
    {
        while(true)
        {
            ++value; // unsynchronized write to the shared value
            std::this_thread::sleep_for(
                std::chrono::milliseconds(1));
        }
    };

    std::function<void()> fnB = [&]()
    {
        while(true)
        {
            std::cout << value << "\n"; // unsynchronized read
            std::this_thread::sleep_for(
                std::chrono::seconds(1));
        }
    };

    std::thread threadA(fnA);
    std::thread threadB(fnB);

    threadA.join();
    threadB.join();

    return 0;
}
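
Assuming the source above is saved as race.cpp, it can be built with something along these lines; the -g flag is what provides the file and line numbers in the tool output we'll look at shortly, and -pthread is needed for std::thread:


$ g++ -g -std=c++11 -pthread -o race race.cpp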

This type of bug can be extremely difficult to find with a debugger like GDB. The problem is that, even though there is a data race, the action the program actually takes may still be entirely reasonable. In the example above, the program could probably run forever without any weird behavior. Thus, we turn to Helgrind, which makes solving this type of problem extremely easy (I've removed some portions of the output for brevity):


$ valgrind --tool=helgrind ./race
[...]
==6644== Possible data race during read of size 8 at 0xFFEFFF418 by thread #3
==6644== Locks held: none
==6644==    at 0x400FC1: operator() (race.cpp:26)
[...]
==6644==
==6644== This conflicts with a previous write of size 8 by thread #2
==6644== Locks held: none
==6644==    at 0x400F92: operator() (race.cpp:16)
[...]
==6644==  Address 0xffefff418 is on thread #1's stack
==6644==  in frame #4, created by main (race.cpp:9)
==6644==
[...]

Because we have line numbers for the place where the read was performed as well as the write, it should be fairly obvious where the problem lies. The solution is generally to add a mutex (or, better yet, to make the data atomic if circumstances allow it).
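
In this particular example, since the shared state is just a single integer, the atomic route is the simpler one. Here is a minimal sketch of what the fixed program might look like (with bounded loops so the threads can actually be joined):


#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

int main(int, char **)
{
    // The shared counter is now atomic, so the writer's increment and
    // the reader's load below no longer constitute a data race.
    std::atomic<uint64_t> value(0);

    std::thread writer([&]()
    {
        for(int i = 0; i < 5000; ++i)
        {
            ++value; // atomic read-modify-write
            std::this_thread::sleep_for(
                std::chrono::milliseconds(1));
        }
    });

    std::thread reader([&]()
    {
        for(int i = 0; i < 5; ++i)
        {
            std::cout << value.load() << "\n"; // atomic load
            std::this_thread::sleep_for(
                std::chrono::seconds(1));
        }
    });

    writer.join();
    reader.join();

    return 0;
}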

Lock Ordering Problems

One problem that can be very serious is that of lock ordering. Any time two threads each need to hold the same two (or more) mutexes at once, there is the potential for a deadlock if the locks are not acquired in the same order in each thread. Consider the following example program:


#include <iostream>
#include <functional>
#include <mutex>
#include <thread>
#include <chrono>

int main(int, char **)
{
    std::mutex mutexA;
    std::mutex mutexB;

    std::function<void()> fnA = [&]()
    {
        while(true)
        {
            {
                // The locks are acquired in the order A, then B.
                std::lock_guard<std::mutex> lockA(mutexA);
                std::lock_guard<std::mutex> lockB(mutexB);

                std::cout << "A\n";
            }

            std::this_thread::sleep_for(
                std::chrono::microseconds(10));
        }
    };

    std::function<void()> fnB = [&]()
    {
        while(true)
        {
            // The locks are acquired in the opposite order: B, then A.
            std::lock_guard<std::mutex> lockB(mutexB);
            std::lock_guard<std::mutex> lockA(mutexA);

            std::cout << "B\n";
        }
    };

    std::thread threadA(fnA);
    std::thread threadB(fnB);

    threadA.join();
    threadB.join();

    return 0;
}

Despite being - yet again - contrived, this example illustrates the problem pretty clearly. If you try running this program, it will print out several lines, but eventually the following sequence of events will occur:

  • Thread A acquires a lock on mutex A.
  • Thread B acquires a lock on mutex B.
  • Thread A attempts to acquire a lock on mutex B, and blocks.
  • Thread B attempts to acquire a lock on mutex A, and blocks.

At this point, our two threads are completely deadlocked. The worst part is that this type of problem can be very unpredictable, since it will only occur when the timing happens to work out such that the two threads are acquiring their locks at the same time. So how do we detect this problem easily?

I would yet again recommend Helgrind. The good thing about this tool is that it usually makes the problem very easy to pinpoint and, perhaps more importantly, it will produce an error message even if the deadlock doesn't actually occur during the run. This can be great in real-world situations, since the deadlock may occur only very rarely (depending on how the code is structured). The output from Helgrind makes the problem very clear (again, I've removed some portions of the output):


$ valgrind --tool=helgrind ./lockorder
[...]
==6588== ----------------------------------------------------------------
==6588==
==6588== Thread #3: lock order "0xFFEFFF400 before 0xFFEFFF430" violated
==6588==
==6588== Observed (incorrect) order is: acquisition of lock at 0xFFEFFF430
==6588==    at 0x4C30A36: pthread_mutex_lock (hg_intercepts.c:593)
==6588==    by 0x4010AE: __gthread_mutex_lock (gthr-default.h:748)
==6588==    by 0x4010AE: lock (mutex:134)
==6588==    by 0x4010AE: lock_guard (mutex:414)
==6588==    by 0x4010AE: operator() (lockorder.cpp:32)
[...]
==6588==
==6588==  followed by a later acquisition of lock at 0xFFEFFF400
==6588==    at 0x4C30A36: pthread_mutex_lock (hg_intercepts.c:593)
==6588==    by 0x4010D1: __gthread_mutex_lock (gthr-default.h:748)
==6588==    by 0x4010D1: lock (mutex:134)
==6588==    by 0x4010D1: lock_guard (mutex:414)
==6588==    by 0x4010D1: operator() (lockorder.cpp:33)
[...]
==6588==
==6588== Required order was established by acquisition of lock at 0xFFEFFF400
==6588==    at 0x4C30A36: pthread_mutex_lock (hg_intercepts.c:593)
==6588==    by 0x401161: __gthread_mutex_lock (gthr-default.h:748)
==6588==    by 0x401161: lock (mutex:134)
==6588==    by 0x401161: lock_guard (mutex:414)
==6588==    by 0x401161: operator() (lockorder.cpp:17)
[...]
==6588==
==6588==  followed by a later acquisition of lock at 0xFFEFFF430
==6588==    at 0x4C30A36: pthread_mutex_lock (hg_intercepts.c:593)
==6588==    by 0x40118A: __gthread_mutex_lock (gthr-default.h:748)
==6588==    by 0x40118A: lock (mutex:134)
==6588==    by 0x40118A: lock_guard (mutex:414)
==6588==    by 0x40118A: operator() (lockorder.cpp:18)
[...]

So the problem is that we acquired lock B and then lock A on lines 32 and 33, but this conflicts with the previous acquisition of lock A and then lock B on lines 17 and 18. The solution is simply to re-order the locks in either place so that the ordering matches.
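
Alternatively, when a thread genuinely needs to hold both mutexes at once, std::lock can acquire them together using a deadlock-avoidance algorithm, so the textual order no longer matters. Here is a sketch of what fnB from the example above might look like with this approach (fnA could be converted the same way):


    std::function<void()> fnB = [&]()
    {
        while(true)
        {
            // Lock both mutexes with std::lock, which avoids deadlock
            // regardless of the order in which they are named.
            std::unique_lock<std::mutex> lockB(mutexB, std::defer_lock);
            std::unique_lock<std::mutex> lockA(mutexA, std::defer_lock);
            std::lock(lockA, lockB);

            std::cout << "B\n";
        }
    };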

Another decent way to find this type of problem is to use GDB. More or less, the process in this case is to run the program until the deadlock is encountered, and then to inspect the thread state. So, for example:


$ gdb ./lockorder
GNU gdb (Gentoo 7.7.1 p1) 7.7.1
[...]
Reading symbols from ./lockorder...done.
(gdb) r
Starting program: /home/ajr/temp/lockorder
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6ffe700 (LWP 6597)]
A
[New Thread 0x7ffff67fd700 (LWP 6598)]
B
B
B
B
^C
Program received signal SIGINT, Interrupt.
0x00007ffff73af6af in pthread_join () from /lib64/libpthread.so.0
(gdb) info thread
  Id   Target Id         Frame
  3    Thread 0x7ffff67fd700 (LWP 6598) "lockorder" 0x00007ffff73b4ec4
       in __lll_lock_wait () from /lib64/libpthread.so.0
  2    Thread 0x7ffff6ffe700 (LWP 6597) "lockorder" 0x00007ffff73b4ec4
       in __lll_lock_wait () from /lib64/libpthread.so.0
* 1    Thread 0x7ffff7fbb740 (LWP 6593) "lockorder" 0x00007ffff73af6af
       in pthread_join () from /lib64/libpthread.so.0

The hint we're interested in here is that we have two threads waiting for a lock (in __lll_lock_wait). Whenever the program is deadlocked and two (or more) threads are waiting for a lock, a lock order problem is a decent suspect. We can verify this by switching to each stuck thread in GDB (using the "thread" command) and then inspecting the backtrace (with the "backtrace" command) to see which mutexes they're waiting on. This isn't quite as obvious as what Helgrind told us, but it gets the job done (and it can be useful in cases where Helgrind can't be used - for example, when it slows the program being analyzed down too much).
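
For example, after interrupting the deadlocked process as shown above, inspecting the stuck threads looks something like the following (I won't reproduce the output here, but with debug symbols each backtrace should show the pthread_mutex_lock call along with the line of our code that made it):


(gdb) thread 2
(gdb) backtrace
(gdb) thread 3
(gdb) backtrace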

Deadlocks

The techniques discussed for detecting lock order problems are also useful for finding deadlocks in general (even those resulting from problems other than lock order issues). Helgrind can detect a great number of different types of problems, but sometimes analyzing the program's state with GDB may be the only way to solve a deadlock, depending on the exact cause of the problem.

Signaling Condition Variables Without Locking

One situation where we can run into unexpected trouble is when we are signaling other threads that are waiting on a condition variable. Both the C++ API and the underlying pthreads API allow us to signal a condition variable without holding a lock on the mutex associated with the condition variable. It might seem that this is the proper way to do things, since adding extra locking when it isn't strictly needed only increases complexity and decreases performance. As we'll see, however, we can run into some weird bugs if we don't hold the lock and we aren't very careful.

Waking up the Wrong Thread

The main potential problem we run into when we signal without holding the lock on the mutex is that we might wake up a thread we didn't mean to wake up. If we do hold a lock, then it is guaranteed that any threads waiting on the condition variable will at least be able to contend for the mutex. Without holding the lock, though, we may accidentally wake up a thread that is waiting to acquire the lock, rather than a thread waiting on the condition variable. Consider the following example program:


#include <cstddef>
#include <mutex>
#include <condition_variable>
#include <cstdint>
#include <vector>
#include <memory>
#include <thread>
#include <functional>
#include <chrono>
#include <iostream>

const std::size_t THREAD_COUNT = 8;

int main(int, char **)
{
    std::mutex mutex;
    std::condition_variable condition;
    bool predicate = false;

    uint64_t wakeups = 0;
    uint64_t spuriousWakeups = 0;

    std::vector<std::shared_ptr<std::thread>> threads;

    for(std::size_t i = 0; i < THREAD_COUNT; ++i)
    {
        std::function<void()> threadfn = [&]()
        {
            while(true)
            {
                std::unique_lock<std::mutex> lock(mutex);

                while(!predicate)
                {
                    condition.wait(lock);
                    ++wakeups;

                    if(!predicate)
                        ++spuriousWakeups;
                }

                predicate = false;
            }
        };

        std::shared_ptr<std::thread> t(new std::thread(threadfn));
        threads.push_back(t);
    }

    std::thread signaler([&]()
    {
        while(true)
        {
            {
                std::lock_guard<std::mutex> lock(mutex);
                predicate = true;
            }

            // The notification happens after the lock has been released.
            condition.notify_one();
        }
    });

    std::this_thread::sleep_for(std::chrono::seconds(5));

    std::lock_guard<std::mutex> lock(mutex);

    double percent = (static_cast<double>(spuriousWakeups) /
        static_cast<double>(wakeups)) * 100.0;

    std::cout << "Wakeups:          " << wakeups << "\n";
    std::cout << "Spurious wakeups: " << spuriousWakeups << "\n";
    std::cout << "Percent spurious: " << percent << "\n";

    return 0;
}

The particular situation that might occur here is the following:

  • Thread A is waiting to lock the mutex (in threadfn).
  • Thread B is waiting on the condition variable (in threadfn).
  • Thread C holds a lock on the mutex (in signaler).
  • Thread C releases its lock on the mutex.
  • Thread A acquires a lock on the mutex.
  • Thread A begins executing.

Since thread C signals the condition variable immediately after releasing the lock, we might naively expect thread B - the thread actually waiting on the condition variable - to be the one that gets to run, rather than thread A.

In our example, though, we aren't just waking up the wrong thread. Thread A will see that the predicate is true, and it will reset it to false. Thread C will then still signal the condition variable, and thread B will wake up only to find the predicate false and have to wait again. Although spurious wakeups can happen in this program even if we remove the scoping around the lock in signaler, the program as written will generally encounter more of them than it would otherwise. This can reduce performance, and (depending on what exactly thread A does when it wakes up) other weird bugs can be encountered, up to and including crashes.
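
For comparison, here is a sketch of what signaler looks like when the notification is made while the lock is still held (that is, with the extra scoping around the lock removed). Note that this reduces, but doesn't completely eliminate, wakeups that find the predicate already consumed, since a thread blocked on the mutex can still beat the notified thread to the lock once the signaler releases it:


    std::thread signaler([&]()
    {
        while(true)
        {
            // The predicate update and the notification now happen
            // inside the same critical section.
            std::lock_guard<std::mutex> lock(mutex);
            predicate = true;
            condition.notify_one();
        }
    });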

Unpredictable Crashes

For an example of how a crash might occur, consider the following program. The idea is that each worker thread runs for a little while, and when it's done it decrements the count of threads that are still alive and notifies the shutdown thread. When the shutdown thread wakes up and no other threads seem to be alive, it cleans up some program state and then returns.


#include <cstddef>
#include <vector>
#include <memory>
#include <thread>
#include <cstdint>
#include <mutex>
#include <condition_variable>
#include <functional>
#include <iostream>

const std::size_t THREAD_COUNT = 8;

int main(int, char **)
{
    std::vector<std::shared_ptr<std::thread>> threads;
    uint64_t count = 0;

    while(true)
    {
        std::size_t nthreads = THREAD_COUNT;
        std::mutex mutex;
        std::shared_ptr<std::condition_variable> condition(
            new std::condition_variable());

        std::thread shutdownThread([&]()
        {
            std::unique_lock<std::mutex> lock(mutex);

            while(nthreads > 0)
                condition->wait(lock);

            // All of the workers appear to be done, so destroy the
            // condition variable.
            condition.reset();
        });

        for(std::size_t i = 0; i < THREAD_COUNT; ++i)
        {
            std::function<void()> threadfn = [&]()
            {
                {
                    std::lock_guard<std::mutex> lock(mutex);
                    --nthreads;
                }

                // Notify after releasing the lock; by this point the
                // shutdown thread may already have destroyed the
                // condition variable.
                condition->notify_one();
            };

            std::shared_ptr<std::thread> thread(
                new std::thread(threadfn));
            thread->detach();

            threads.push_back(thread);
        }

        shutdownThread.join();
        threads.clear();

        ++count;
        std::cout << count << "\n";
    }

    return 0;
}


This program, in general, will run for a little while, and then it will eventually encounter a segmentation fault. The reason for this is that, sometimes, the shutdown thread will wake up, see that the number of threads remaining is zero, and destroy the condition variable. After this happens, one of the other threads can attempt to notify the condition variable. Consider the following sequence of events:

  • Worker thread decrements the number of threads, and unlocks the mutex.
  • Shutdown thread wakes up (due to a notification from one of the other workers) and reacquires its lock on the mutex.
  • Shutdown thread sees that the number of threads is now zero (after being decremented), so it exits its wait loop.
  • Shutdown thread destroys the condition variable and exits.
  • Worker thread calls notify_one().
  • A segmentation fault occurs.

Just like in the previous example, this is by no means guaranteed to happen on every iteration of the loop, which actually makes things worse. The fact that this program can work "properly" several thousand times before it fails means that this type of bug could be very difficult to find in a larger, more complex application. In conclusion, it may be worth always holding the lock when signaling a condition variable, since otherwise the behavior of our program can be much less obvious. Clearly it is possible to write bug-free programs without holding the lock, but doing so is arguably more difficult than it would be otherwise.
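
In the crash example above, for instance, simply moving the notify_one() call inside the critical section is enough to prevent the segmentation fault: the shutdown thread can only observe nthreads == 0 (and therefore destroy the condition variable) after the last worker has left its critical section, and by that point every worker has already finished touching the condition variable. A sketch of the modified worker:


            std::function<void()> threadfn = [&]()
            {
                std::lock_guard<std::mutex> lock(mutex);
                --nthreads;

                // Notify while still holding the mutex; the shutdown
                // thread cannot see nthreads == 0 and destroy the
                // condition variable until this critical section ends.
                condition->notify_one();
            };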