Writing a portable lock-free reader/writer lock

05 Oct 2017 in C++ / Lock / Algorithms / Data-structure / Concurrency / Lock-free

A portable C++ reader/writer lock implementation without using locks

It is sort of puzzling for me why C++ standard doesn’t have a implementation of reader/writer lock. So just like every other C++ developer, I’ve decided to roll my own.

Getting started

A quick refresher if you haven’t looked at a reader/writer lock in a while. A reader/writer lock supports:

exclusive write operation - only one writer can modify the data at the same time
concurrent readers - there can be multiple readers reading the data at the same time

It is great for operations where write is infrequent and performance of read operations are more important needs to have minimal overhead.

Design considerations

A typical implementation might be using semaphore / mutex, such as documented in wikipedia. However, this has a major disadvantage that it requires entering a lock mutex every time a reader comes in, and this can be expensive as this usually leads to a kernel transition that are costly and can become a bottle neck when multiple readers are trying to access.

If we think about this a bit more, we really have 3 states:

Free - any reader/writer can come in
One or more readers - no writer can come in
One writer - no one else can come in

We can effectively use one unsigned integer count to represent these states:

Free - 0
Has a writer - 0xffffffff
Has one or more readers - reader count, > 0

The reason we are maintain the reader count (instead a bool whether there are readers), is to make it easier to track whether there are readers - we just need to increase the counter when a reader comes in, and decrease the counter when the reader leaves.

The advatange of this approach is that we only need to atomicly operate on this count, without having to using lock to protect any region of code.

Of course, the trick is to how to make it atomic without using any locks.

Making it atomic

In C++, std::atomic support an operation called compare_exchange_weak/compare_exchange_strong. This is the magic key to lock-free operation on the count. It tries to update a variable atomically if and only if the variable matches the expected value and then to the new value, atomically. This is important because another thread might be updating the value at the same time.

Let’s use increasing the reader count as an example:

int retry = 0;
while (true)
{
    // DO NOT worry about writer access for now - this is for demonstrating atomic operation only

    int prev_readers = _readers;            // current count
    int new_readers = prev_reader + 1;      // new count - note this is using the local value prev_readers 
                                            // in case _readers has changed in between
    if (_readers.compare_exchange_weak(prev_readers, new_readers))
    {
        // we've won the race
        break;
    }

    // we've failed, retry
    retry++;
    if (retry > RETRY_COUNT)
    {
        retry = 0;
        std::this_thread::yield();
    }
}

There are a few things to keep in mind:

We use a local variable to read off the current value and use that to calculate the new value. We can’t use the current value again to calculate the new value because the current value might have changed in between. The point here is to read _readers only once and keep that value for future access.
Once we obtain the current value into expected value and the new value, the goal of the current loop iteration is decided - it’ll update the _readers variable only if the value is expected value and then to the new value. For example, if the current value is 6, the goal of the current iteration is to update 6 to 7. If we use _readers to calculate the new value, it might become 8.
We need to keep retry atomic update in case some other threads is also updating _readers. For example, if we decide to update _readers from 6 to 7, and another thread come in updated 6 to 7 (remember it is also doing the same thing), we need to retry again, and this time, our goal is to update it from 7 to 8. Rinse and repeat.
One important point in the retry: if we’ve failed multiple times, it is often a good idea to tell the OS thread scheduler “hey, I’ve tried many times, perhaps it is time to give other threads a chance to run”. This will both save power (busy loop makes the thread/core busy) and also let other threads in the system to preempt this thread. This is about being a good citizen and laptop users will appreciate it. The exact value of RETRY_COUNT is best found through extensive testing.

One common question that people often ask is: what is the difference between compare_exchange_weak vs compare_exchange_strong? The difference between the two is that compare_exchange_strong avoids spurious failures due to underlying hardware consistency models (not due to race condition), while compare_exchange_weak might, but can have better performance in some platforms. If you are doing a loop, you are already prepared for failures due to race conditions, so you are already protected against failures in general, so in this case using compare_exchange_weak is a reasonable choice to get a bit better performance potentially in some platforms.

The R/W lock without locks

OK. Now that we get the basic out of the way, it’s time to implement the real reader/writer lock. Let’s review the state transition:

For a writer
- Taking a writer lock - from (FREE = 0) to (HAS_WRITERS = 0xffffffff)
- Releasing a writer lock - from (HAS_WRITERS = 0xffffffff) to (FREE = 0)
For a reader
- Taking a reader lock - from (reader_count >= 0) to (reader_count + 1)
- Releasing a reader lock - from (reader_count >= 0) to (reader_count - 1)

Let’s take acquire_writer_lock as an example:

void acquire_writer()
{
    int retry = 0;
    while (true)
    {
        uint32_t prev_readers = _readers;
        if (prev_readers == 0)
        {
            if (_readers.compare_exchange_weak(prev_readers, HAS_WRITER))
            {
                // we've won the race
                return;
            }
        }

        retry++;
        if (retry > RETRY_THRESHOLD)
        {
            // save some cpu cycles
            retry = 0;
            this_thread::yield();
        }
    }
}

This is very similar to what we had earlier, the only difference is the transition from 0 to HAS_WRITER.

void acquire_reader()
{
    int retry = 0;
    while (true)
    {
        uint32_t prev_readers = _readers;
        if (prev_readers != HAS_WRITER)
        {
            uint32_t new_readers = prev_readers + 1;
            if (_readers.compare_exchange_weak(prev_readers, new_readers))
            {
                // we've won the race
                return;
            }
        }

        retry++;
        if (retry > RETRY_THRESHOLD)
        {
            retry = 0;
            this_thread::yield();
        }
    }
}

acquire_reader is also very similiar. The difference is the check for HAS_WRITER - if there is a writer in progress, there is no point increasing the reader count

And to wrap it up, just like any other serious C++ programmer, we also need to add a holder for taking/releasing the lock, following the RAII pattern (assuming the class we have earlier is called rw_spin_lock).

class writer_lock
{
public:
    writer_lock(rw_spin_lock &lock) : _lock(lock)
    {
        _lock.acquire_writer();
    }

    ~writer_lock()
    {
        _lock.release_writer();
    }

private:
    rw_spin_lock &_lock;
};

You can find the entire code in github here

Hope this helps.

Writing a portable lock-free reader/writer lock

Getting started

Design considerations

Making it atomic

The R/W lock without locks

yizhang82's blog

Error

Getting started

Design considerations

Making it atomic

The R/W lock without locks

Templates (for web app):

Error