Basic concurrency in Rust and its API design
State pattern, phantom types, Copy type misconceptions, and the shared memory model
Introduction
Concurrency is a fundamental concept in modern programming that allows multiple tasks to execute concurrently, improving the performance and responsiveness of applications. Rust, a systems programming language known for its focus on safety and performance, provides powerful concurrency primitives that help developers write concurrent code without sacrificing safety.
A little bit about me: I have just over two years of experience with Rust. If I were a Go, Java, or C++ programmer, I would hesitate to talk about concurrency at all. Doing concurrency in C/C++ is like playing with raw fire. We rely on abstractions like the Boost library and hope that clients don't misuse the APIs in ways that break invariants and lead to undefined behavior. As someone without much experience debugging such code, I've never used them.
Now let's explore the concurrency landscape in Rust. This blog post focuses on the code that Rust won't allow, showcasing Rust's strengths compared to other languages. The actual code for each concurrency model is available here. You can copy the source code from each file and paste it into the Rust Playground to run them.
Difficulties of transitioning from single-threaded to multi-threaded code
Complexity arises when we decide to run tasks concurrently, and Rust helps prevent issues even before running your code.
- Local variables and references have bounded lifetimes, whether they're allocated on the stack or the heap. There's no guarantee that references (mutable or immutable) to them will still exist when used across threads. Rust errs on the side of caution and refuses to compile the code below.
- We can't use a variable without initializing it first. Declaring a variable in the main thread and initializing it in a spawned thread is not allowed. Such mistakes won't happen in safe Rust.
use std::collections::HashSet;
use std::thread::spawn;

let stack_local: &str = "Lives in main thread";
let heap_local: HashSet<usize> = HashSet::new();
// Compile error: the closure may outlive the borrowed variables.
spawn(|| {
    println!("{} {:?}", stack_local, heap_local);
})
.join()
.unwrap();
- On the other hand, static values have an unbounded lifetime (as long as the program runs), and sharing them between threads produces race conditions unless there is no mutation. However, Rust's static values are immutable by default, so sharing them doesn't result in race conditions, as no one can mutate them.
- What about mutable static values? Without synchronization, we get undefined results. That's why mutating static variables in Rust is considered unsafe and should only be done within an unsafe block. This is different from most languages where modification can occur whenever and wherever you want.
- Rust has more restrictions compared to other languages to prevent these bugs. The code below won't compile, even though we're only reading the value across the threads. Rust assumes the worst to enforce synchronization or the use of unsafe blocks to handle the responsibility. This is how all the concurrency APIs are designed: to force programmers to follow the rules.
use std::thread::spawn;

static mut MONTHS: u8 = 12;
// Compile error: touching a `static mut` requires an unsafe block,
// even though this code only reads the value.
spawn(|| {
    println!("{}", MONTHS);
})
.join()
.unwrap();
- Channels are used to communicate between different threads. But what happens once we send data to the channel and attempt to use it again in another thread or even in the same thread after it's sent? Most programming languages like Go and others allow this behavior. This is because their type systems cannot express these invariants in code without language extensions to prevent such occurrences.
- In Rust, due to the single ownership rule and the signature of the channel sender, such occurrences are prevented at compile time. Once values are sent to the channel, we no longer have access to them without encountering a compile error.
use std::sync::mpsc::channel;
use std::thread::spawn;

let data = vec![1, 2, 3, 4, 5, 6];
let (sender, rec) = channel();
// Data is moved to the thread closure.
// Enforced by the compiler.
spawn(move || {
    // Data is moved again into the channel, available only at the receiver.
    sender.send(data).unwrap();
    // Compile error: we can't use data even in the same thread after sending it.
    println!("{:?}", data);
})
.join()
.unwrap();
// The data was moved to the above thread due to move semantics.
// Compile error: we're not allowed to use it here either.
println!("{:?}", data);
- Synchronization ensures that memory access, whether from single-threaded or multi-threaded contexts, is correct. This is achieved through atomic operations provided by hardware instructions. As long as the low-level primitives are implemented correctly, the high-level abstraction can't be misused. For example, consider a read-write lock, which grants either multiple readers or a single writer access to shared data. Say our data is initialized to 20. When a thread needs write access to the data, which is visible to multiple threads, it must take the lock. This blocks all other readers and writers, preventing anyone from observing an inconsistent state. Once the writer finishes and releases the lock, the update is visible to every subsequent reader or writer. Synchronization guarantees correct memory access and ordering.
However, achieving this correctly in a programming language can be a nightmare. Forgetting to acquire a lock before touching the data breaks correctness, and forgetting to release a lock after using the data can lead to deadlock, where no writer or reader can access the data at all. Partly because of these challenges, CPython relies on the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time, effectively making Python single-threaded, though there's a proposal within Python internals to remove the GIL.
Rust's synchronization APIs are designed using type states to encode concurrency protocols within their types. This ensures that sharing data with a thread is done by following the API requirements, guaranteeing race-free and synchronized code.
Concurrency Models
Lock-free concurrency involves spawning different threads for different tasks, returning results to the main thread, and aggregating the results. No synchronization is needed beyond joining each spawned thread to communicate with the main thread.
Message Passing involves isolated tasks running independently, communicating through channels, while the actual computation happens in the spawned threads. Threads are independent of each other on the same channel, only waiting at the receiver to collect results from all senders.
Asynchronous programming allows tasks to run concurrently without blocking and without even using multiple threads. In a conventional sequential program, a function returns results as soon as they're available. In asynchronous programming, functions return a Future that's either ready or can be polled to resume later without blocking other tasks. This type of concurrency is suitable for I/O-bound tasks and web servers. We will talk more about asynchronous programming in a coming post, as it's the backbone of Rust backend frameworks like Actix and Axum.
Shared memory models introduce challenges like race conditions, data races, and deadlocks. In this model, data is shared among all threads running on a machine. Without synchronization, reasoning about the outcomes becomes much more difficult.
Rust's concurrency safety guarantees rely on two marker traits: Send and Sync. These are called marker traits because they mark certain types and have no size themselves; they only carry information for the compiler at compile time. Thanks to these traits, types like Rc and RefCell cannot be sent between threads, as they are not Send or Sync. Rust's thread APIs accept types that implement Send or Sync, depending on the usage pattern. Most types are Send by default. Marker traits don't have special capabilities or methods; they state type requirements that let the compiler guarantee correct usage of the concurrency primitives. Refer to the following code,
// Rust's type system can represent empty types
// Marker trait
trait Concurrency {}

// Unit structs: zero-sized, with no fields to initialize.
struct OurType;
struct SomeType;

impl Concurrency for OurType {}

// This function only accepts types that are concurrency-compatible
fn accept<T: Concurrency>(_i: T) {}

fn main() {
    // Compile error: SomeType does not implement Concurrency
    accept(SomeType);
    // Compiles
    accept(OurType);
}
These marker traits provide compile-time guarantees for our types; it's up to us to avoid violating the invariants when implementing them. Zero-sized marker types also enable efficient implementation of the type state pattern without runtime overhead, which is useful for modeling invariants beyond concurrency. Such types are known as phantom types.
Rust supports various concurrency paradigms without imposing the use of specific concurrency models. All data structures in the standard library and built-in types are single-threaded by default. We can choose the appropriate concurrency models without worrying about causing data races. As the Rust creator aptly said:
If the language is only good at one thing, it'll be a failure.
Message Passing
Rust channels are generic; in fact, all the concurrency primitives are generic. This greatly reduces the need to create different primitive types from scratch, and the generality extends to user-defined types. Threads communicate over channels by passing messages to receivers. Channels can be either bounded or unbounded. Bounded channels block the sending thread once the queue reaches the specified capacity.
Bounded channels may use less memory than unbounded ones when we have prior knowledge about the workload, just as specifying an upfront capacity helps for the Vec or String types. With an unbounded channel, we can send as many messages as needed without blocking the sender.
We don't need to worry about race conditions for computations where the execution order doesn't matter, and the result remains the same. Here's an example of such a situation,
use std::sync::{Arc, Mutex};
use std::thread::spawn;

let m = Arc::new(Mutex::new(10));
let clone_1 = m.clone();
spawn(move || {
    *clone_1.lock().unwrap() += 4;
})
.join()
.unwrap();
let clone_2 = m.clone();
spawn(move || {
    *clone_2.lock().unwrap() += 8;
})
.join()
.unwrap();
println!("{:?}", *m.lock().unwrap());
assert_eq!(*m.lock().unwrap(), 22);
If the first thread acquires the lock, the data becomes 14. This is synchronized, so when the second thread acquires the lock, it adds 8 to 14, resulting in 22. The same process occurs if the second thread acquires the lock first, with a different intermediate step, but the end result is the same as long as the additive property holds: 18 + 4 = 14 + 8 = 22. But not all computations fall into this category.
The advantage of message passing through channels and independently spawned threads is that we don't have to worry about race conditions and shared mutation. This is why languages like Haskell, Scala, Erlang, and Go provide first-class support for writing concurrent code in terms of message passing.
Shared Memory Model
A limitation of message passing is that it can only express tasks that are independent of each other, and not all tasks fit that pattern. Most of the dangerous words, deadlock, data races, and race conditions, are associated with the shared memory model. There are further difficulties: forgetting to release the lock after touching the data, or forgetting to acquire the lock before touching it, often leads to undefined results that are hard to reproduce because of the timing-dependent behavior of threads. Some of Rust's rules are bent here while still keeping the safety guarantees: Mutex types and Cell types offer interior mutability rather than inherent mutability.
use std::cell::RefCell;

// Inherent mutability.
// Note the `mut` in front of x.
let mut x = 10;
// We can't reassign y itself, but we can mutate x through y.
let y = &mut x;
*y = 11;

// Interior mutability.
// We can mutate the data inside the RefCell without marking
// the binding `mut`.
let m = RefCell::new(10);
*m.borrow_mut() += 1;
Rust's concurrency tools are built around the type state pattern: operations are only available when the value is in a particular state. Consider how we access the data inside a Mutex. The steps are fixed, so we can't jump from one state to an arbitrary state and cause trouble.
1) The only way to access the data inside the lock is to call the lock method on the Mutex.
2) The return type of lock is Result<MutexGuard<T>>.
3) We must handle the error, either with pattern matching or with the methods defined on Result, i.e. unwrap and friends, to get the inner value.
4) If the Result is the Err variant, Rust panics and we can't continue. If it's the Ok variant, we have a MutexGuard.
5) The MutexGuard wraps the actual type we shared with the thread when we put it inside the Mutex at initialization.
6) MutexGuard implements the Deref and DerefMut traits. Thanks to these traits, we can call the inner type's methods without any manual unwrapping. At this stage, we can do whatever we want with the data.
7) Finally, the Mutex is unlocked when the guard goes out of scope, or when we explicitly drop it earlier, which is necessary in some situations.
The downside of Mutex is that it always blocks: even to read the data we must lock it first, which blocks every other thread from accessing it. The alternative is RwLock (read-write lock), where reading doesn't block other readers. RwLock has two operations, read and write. Holding read access from one thread doesn't block other threads from reading at the same time, because multiple simultaneous reads are fine. Threads holding only a read lock have no permission to write the data, but this is not enforced in most widely used programming languages like Java.
Safety in RwLock:
A read-write lock is a locking mechanism that allows either multiple immutable references without blocking, or a single mutable reference. When we call read on an RwLock, we get the inner type wrapped in a Result, which must be handled before use. Then we can call any number of immutable methods on that type. But calling any method that takes a mutable reference, or one that takes ownership, is a compile error.
What about other languages? This is problematic for mutable data types, where calling a mutator method while we're supposed to be reading breaks invariants and produces undesirable consequences: the data is no longer what we expect. Look at the buggy Java code below, which compiles and runs successfully. The full code is here.
// acquire the write lock
writeLock.lock();
try {
    list.add("This should be allowed");
} finally {
    // release the lock so that reader threads can proceed concurrently
    writeLock.unlock();
}
// acquire the read lock for reading
readLock.lock();
try {
    System.out.println(list); // Still not modified
    // Done on purpose for this blog post, but the same thing
    // can happen by accident:
    list.add("This shouldn't be allowed when reading");
    // Boom: we are supposed to be reading, yet we can call
    // any mutator method here.
    list.clear();
    // In Java, holding the read lock doesn't revoke our
    // permission to write.
} finally {
    // Explicitly unlock, or other threads block forever
    readLock.unlock();
}
// Throws an exception, since we cleared the list (a mutation)
// while we were supposed to have read access.
System.out.println(list.get(0));
The code above shouldn't be allowed, but Java permits it because there is no type-level difference between mutable and immutable methods, and nothing special about a read lock versus a write lock apart from reads not blocking each other. These APIs are easy to misuse without any effort. Normally we avoid such mistakes by not exposing internal details and only providing public APIs to interact with the code, which is achievable through object-oriented encapsulation. In Rust, we often don't need documentation to use an API properly: the signature says it all. The requirements are encoded in the types, and we get compile errors if we misuse them, though this doesn't protect us from every kind of error.
One way to solve this issue in Java is, instead of using generic locks, to write a read-lock-aware wrapper that overrides (with the @Override annotation) the mutator methods so they throw an exception when called while the read lock is held. Now let's look at the same behavior in Rust, except we use a collection of integers, since indexing is not supported for string types.
use std::sync::{Arc, RwLock};
use std::thread::spawn;

let vec = Arc::new(RwLock::new(Vec::new()));
let vec1 = vec.clone();
spawn(move || {
    let mut write = vec1.write().unwrap();
    // Both read and write access are permitted,
    // but we can't move ownership out.
    write.push(12);
})
.join()
.unwrap();
let vec1 = vec.clone();
spawn(move || {
    let read = vec1.read().unwrap();
    // Only reading is allowed: we can't mutate
    // or move ownership out.
    // Compile error: clear() is a mutator method.
    read.clear();
    println!("{} {} ", read.capacity(), read.len());
})
.join()
.unwrap();
// We never get here: the compiler stops at the mutator call above.
println!("{:?}", vec.read().unwrap().get(0));
The code above doesn't compile in the Rust Playground. When we call the write method on an RwLock, we get back a guard (RwLockWriteGuard) that implements both Deref and DerefMut, giving mutable access to the inner type.
When we call the read method, the returned guard (RwLockReadGuard) implements only Deref, an immutable reference, which means we can't call any method that takes a mutable reference. This guarantees that no mutation happens while we hold a read lock, which is exactly what goes wrong for concurrent readers in Java, and perhaps in other languages too.
With write permission we get all the mutators as well as the observers, excluding associated functions and methods that take ownership. This kind of isolation while running concurrently is possible because of Rust's distinction between mutable and immutable references and the clever use of traits like Deref and DerefMut. The type system saves us before the code ever runs. There is no explicit unlock in either thread, since Rust releases the lock when the guard goes out of scope. But explicitly calling drop is sometimes necessary: acquiring the same lock twice in one scope leads to deadlock, so be sure to drop the first guard as soon as the lock is no longer needed, letting other threads acquire it without waiting for the scope to end.
- In the previous blog post we mentioned that clone overrides the meaning of plain cloning by providing different behavior depending on the context. Here the context is multi-threaded: cloning an Arc increments its reference count so we can have multiple owners, because each spawned thread takes ownership of a clone, and at the end of its scope drop is called to decrement the count. The drop behavior is likewise customized to release the lock implicitly: dropping the guard at the end of the scope makes the result visible through the inner Arc. Clone is only implemented where it makes sense. Implementing Clone for Mutex or RwLock wouldn't make sense, and it's an error to do so, since a lock must be unique: only one thread at a time may hold it. If Rust allowed cloning a Mutex, we would have a bug.
Ergonomics In Multithreaded Code
Writing multithreaded code involves a lot of explicitness: acquiring the lock, releasing the lock, dereferencing the underlying inner data, and joining the threads.
- Rust won't call lock implicitly, but it guarantees that we can't touch the data without locking it first, which other languages mitigate weakly with documentation or by hiding the lock under an abstraction
- Locks are released implicitly by default when the guard goes out of scope. "Default" means we can override this by calling drop explicitly. There is an unlock method on Mutex, but it's only available in nightly Rust
- Smart pointers like Arc and Rc implicitly decrease the reference count when the cloned value goes out of scope. We can't send an Rc across thread boundaries, because Rust won't allow it
- Method calls on a guard automatically dereference to the inner type. Sometimes we need more control; in such situations, we can use parentheses to order the evaluation
- Scoped threads are implicitly joined, which matters because forgetting to join them would cause dangling pointers and similar memory errors, which Rust prevents by taking the responsibility on our behalf. The code is here. Plain spawned threads are not implicitly joined, but forgetting to join them doesn't cause memory errors, since the tasks are terminated anyway when the main thread stops running
- You can read this blog post if you want to know where else Rust does implicit conversions for us outside of multi-threaded code.
Some problems specific to Rust's design
There is a synchronization pitfall with Copy types if we are not careful. When we dereference a mutex guard, we get mutable access to the inner type. But when the inner type is Copy and we dereference the guard into a thread-local variable, we get a copy of the data rather than a reference to it. The code is here.
When chaining methods, it's important to know the operator precedence and the return types. Getting this wrong leads to confusing compile errors that provide little context for debugging, and to logic bugs in our code.
*clone.lock().unwrap().push(45);
The code above clones the Arc, calls the lock method on the Mutex, reaches the inner Vec, and calls push on it, which returns the unit type; that's what push does. Finally, we try to dereference the unit type, because in Rust method calls have higher precedence than the dereference operator, hence the compile error. One solution is to store the result of unwrap in a local variable and operate on that variable without a dereference operator, since method calls dereference implicitly to match the method's signature. Another solution is to put parentheses in the appropriate places, though that still produces compile errors if they are not placed correctly around the expression. The code and solution are here.
*clone.lock().unwrap()
The code above tries to move the inner value out of the mutex to the caller. This is a compile error for move-only types, but a silent logic error for Copy types, which hand the caller a copy of the inner value. Changes to that copy are never synchronized back through the Mutex. Avoid this by operating on the guard directly instead of storing a copied value inside the thread's scope. The code and solution are here.
Even though Rust eliminates most concurrency bugs at compile time, it's still possible to deadlock in Rust. The code below calls the lock method twice in the same scope: the second call waits forever for the first guard, which is never dropped.
let mut lock_1 = clone.lock().unwrap();
// first lock
*lock_1 += 1;
// second lock in the same scope: deadlock, since lock_1
// is still alive and the std Mutex is not reentrant
let mut lock_2 = clone.lock().unwrap();
*lock_2 += 1;
The solution is to explicitly drop the first guard before taking the second lock; the second guard is released implicitly at the end of the scope. The code is here.
Deadlocks and memory leaks are most likely to happen in code involving interior mutability, such as Mutex, RwLock, RefCell, and UnsafeCell, where reference cycles can form if we are not careful.
These concurrency primitives are low-level, i.e. they give you the tools to build higher-level abstractions. Unlike in many other languages, you can build abstraction upon abstraction without worrying about indirection or memory overhead, because the abstractions are compiled away at compile time (as long as the types are known at compile time, which is the case for most tasks). Rust follows C++'s zero-cost abstraction principle.
Conclusion:
Rust provides a comprehensive set of tools for working with concurrency, making it possible to write efficient and safe concurrent programs. By leveraging threads, mutexes, and channels, Rust developers can achieve performance gains without sacrificing safety.