Zero-cost and yet memory-safe memory management abstraction in Rust
Explore what ownership and borrowing are in Rust
In the previous post, we saw how ownership and borrowing rules rule out most hard-to-debug memory errors at compile time, but we didn't talk about what ownership actually is. Let's dive in.
Ownership and borrowing are the abstractions Rust uses to manage memory. They come at zero cost: no runtime component, such as a garbage collector, is involved in managing memory. Instead, a set of rules is enforced at compile time to prevent memory-related errors. Just as type abstraction prevents us from performing meaningless operations, the ownership and borrowing abstractions prevent operations that would violate memory safety.
Modern C++ leans heavily on RAII (Resource Acquisition Is Initialization): resources such as files, locks, or raw pointers are encapsulated inside class objects, and when an object goes out of scope, its destructor is called automatically. Cleanup is deterministic, which reduces the attack surface without employing a garbage collector.
Even though modern C++ encourages you to use these APIs, it is still possible to misuse them. Read this blog post about Why modern C++ still causes you a memory error. I'm not here to blame C++ in favor of Rust. I'm just saying that for people like me who came from dynamically typed languages like Python, the Rust compiler is stricter than a C++ compiler. I admit that Rust is not a beginner-friendly language at the start, but the payoff is worth it.
In Rust, every value has exactly one owner; shared-ownership smart pointers such as Rc are the exception. The owner is responsible for cleaning up the memory once it is done with it. The borrow checker, in turn, uses lifetimes to determine how long each reference is valid, and it ensures that no reference to the value is still in use when that memory is cleaned up.
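To make the "owner cleans up" rule concrete, here is a minimal sketch of deterministic cleanup. The `Resource` type and the `cleanup_order` function are hypothetical names invented for this illustration, and the shared `Rc<RefCell<...>>` log is only a recording aid so the order of events can be inspected.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical resource type, used only for illustration.
struct Resource {
    name: &'static str,
    // Shared log so that `drop` can record when it runs.
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Resource {
    // Called automatically when the owner goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Returns the events in the order they happened.
fn cleanup_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Resource { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Resource { name: "inner", log: Rc::clone(&log) };
            log.borrow_mut().push("inner scope ends".to_string());
        } // `_inner` is dropped here
        log.borrow_mut().push("outer scope ends".to_string());
    } // `_outer` is dropped here
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in cleanup_order() {
        println!("{event}");
    }
}
```

Each value is dropped exactly at the closing brace of the scope that owns it, with no explicit call from us.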
Systems programming languages like C, C++, and Rust care about the performance and memory usage of a program, and about control over both. A program can place data in two regions of memory, the stack and the heap (these are memory regions, not the data structures of the same name). The stack is fast but limited: every value on it must have a size known at compile time, and the total stack space is small. The heap is slower to allocate from, but it is bounded only by the memory currently available on the system, and heap data can grow or shrink at run time. Copying stack data is trivial and fast, but copying heap data is not, because it requires asking the allocator for new memory and then copying the entire contents. That's why Rust moves heap-owning values by default instead of cloning them. Stack data can be mutable too, but it cannot grow or shrink, which is what heap memory is for.
Move types are moved when assigned to a new variable, whereas Copy types are copied. How does Rust know whether a type is copied or moved? This is where traits come in. A trait is like Haskell's type classes, Swift's protocols, or Java and C# interfaces. Here we only consider two traits from the standard library, Copy and Drop. If a type implements the Copy trait, assigning it to a new variable implicitly makes a bitwise copy; because Copy types own no heap data, that copy is complete and independent. So we can keep using the original variable after the assignment, and the borrow checker won't complain.
fn main() {
    let integer = 34;
    let int_copy = integer;
    let boolean = true;
    let boolean_copy = boolean;
    let float = 2.1;
    let float_copy = float;
    let character = 'M';
    let character_copy = character;
    let string_literal = "Sanjeevi";
    let str_literal_copy = string_literal;
    let array = [1, 2, 34];
    let array_copy = array;
    // A tuple is Copy only when all the types inside it are themselves Copy
    let tuple = (integer, boolean, float, character);
    let tuple_copy = tuple;
    println!("{}", integer);
    println!("{}", float);
    println!("{}", boolean);
    println!("{}", character);
    println!("{}", string_literal);
    println!("{:?}", array);
    println!("{:?}", tuple);
}
Copy types are exceptions to the ownership restrictions. Because they can be trivially copied, there is no reason to move them and leave the source uninitialized.
When a type implements Drop, or simply does not implement Copy, assignment moves it implicitly. This is why attempting to use the old variable results in a compile error about the use of a moved value. Rust's split between implicit moves and explicit clones is usually for the better. Copying stack data is inexpensive, but deep copying heap-allocated data is not. Therefore, we must deep copy heap data explicitly by cloning it.
User-defined types, such as structs and enums, are moved by default. They become Copy only if we opt in by deriving or implementing the Copy trait, which is allowed only when every field is itself Copy. For a heap-owning type, a move copies only the small stack part (for a Vec or String: the pointer, length, and capacity); the heap data itself stays where it is. That is why moves are efficient.
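A small sketch of the difference for user-defined types follows. `Point` and `Person` are hypothetical types made up for this example: the first opts in to Copy, the second cannot because it owns a String.

```rust
// Hypothetical types, used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug, Clone, PartialEq)]
struct Person {
    name: String, // String is not Copy, so Person cannot be Copy either
    age: u8,
}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // copied: every field is Copy and Point derives Copy
    println!("{p1:?} {p2:?}"); // both are still usable

    let alice = Person { name: String::from("Alice"), age: 30 };
    let moved = alice; // moved: only the stack part is copied
    // println!("{alice:?}"); // compile error: borrow of moved value `alice`
    let cloned = moved.clone(); // explicit deep copy of the heap data
    println!("{moved:?} {cloned:?}");
}
```

Un-commenting the `println!` on `alice` should produce the "moved value" error described above.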
The design choice of Rust
1) In C++, creating a std::string and assigning it to a new variable copies it implicitly, which is a hidden cost. However, we have control over this process: std::move transfers the contents instead of copying them.
2) In Python, assigning to a new variable does not clone the data; it only increments the object's reference count. However, this relies on runtime bookkeeping and a garbage collector to clean up the memory, which is less predictable and gives us less control.
3) Rust can do either of the above. In the first scenario, we explicitly call the clone method to create an independent copy of the heap data. For the Python-like scenario, Rust provides smart pointer types that manage reference counts. Below is the code that emulates both approaches in Rust.
use std::rc::Rc;
fn main() {
    // C++ way: each clone is an independent copy of the heap data
    let string = String::new();
    let clone1 = string.clone();
    let clone2 = string.clone();
    // Python way
    let rc_string = Rc::new(String::new());
    // Cloning an Rc increases the reference count; the heap data is not copied
    let reference_count1 = Rc::clone(&rc_string);
    let reference_count2 = Rc::clone(&rc_string);
}
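Note that Rc (the reference-counted pointer) only hands out shared access, so the data inside it cannot be mutated without another wrapper type such as Cell or RefCell. Allowing mutation through several aliases would break the borrowing rules. Also be aware that Rc values can form reference cycles, and a cycle is never freed; Weak pointers exist to break such cycles.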
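As a rough sketch of what "another wrapper type" looks like in practice, the example below combines Rc with RefCell. The function name `shared_mutation` is an assumption made for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical helper: mutates a string that has two owners.
fn shared_mutation() -> (usize, String) {
    let shared = Rc::new(RefCell::new(String::from("Hello")));
    let alias = Rc::clone(&shared); // bumps the count, no string copy

    // RefCell checks the borrowing rules at run time instead of compile time.
    alias.borrow_mut().push_str(", rustaceans");

    let count = Rc::strong_count(&shared);
    let text = shared.borrow().clone();
    (count, text)
}

fn main() {
    let (count, text) = shared_mutation();
    println!("{count} owners see {text:?}");
}
```

The trade-off is that a conflicting `borrow_mut` is no longer a compile error; it panics at run time instead.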
Ownership and its implications
fn main() {
    let vector = vec!["Homogeneous data", "Hi, rustaceans"];
    let vector_move = vector; // `vector` is moved here
    let string = String::from("Mutable heap allocated string type");
    let string_move = string; // `string` is moved here
    let vector_of_integers = vec![25, 2, 2023];
    // We have to be explicit to get an independent copy.
    let clone_of_vector_of_integers = vector_of_integers.clone();
    // Error: `vector` and `string` were already moved above
    let tuple = (vector, vector_of_integers, string);
    let tuple_move = tuple;
    // Error: everything except the clone was moved to a new variable
    println!(
        "{vector:?}\n{string:?}\n{vector_of_integers:?}\n{clone_of_vector_of_integers:?}\n{tuple:?}"
    );
}
When an owned value is moved, the source becomes uninitialized. Reading it would be a memory-safety bug, so the compiler refuses to let us access it. If you really want to use the source variable after it has been moved, you have to initialize it again, as shown below:
let mut source = Vec::new();
let destination = source;
// `source` is not accessible at this point
//println!("{:?}",source);
source = vec!["initializing again to access after moved"];
println!("{:?}", source);
Indexing makes it trivial to copy elements of Copy types, such as integers, floats, and string literals, out of a vector. However, this doesn't work if the element type is not Copy: we cannot move a single element out of a vector by indexing. Each string is owned, and the vector owns all the strings in it. If you want to take data out of it, use the vector's methods, such as remove, pop, and swap_remove, or std::mem::replace. The same restriction applies to the fields of a user-defined type that we only hold by reference or that implements Drop: we can't move a single field out of it.
let copy_type = vec![1, 4, 5];
let cloned = copy_type[0]; // i32 is Copy, so indexing copies the element out
let cloned1 = copy_type[2];
let move_type = vec![String::from("Can't move out of"), String::from("Vector")];
// Error: cannot move out of index of `Vec<String>`
let not_possible = move_type[0];
let not_possible1 = move_type[1];
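The safe alternatives mentioned above can be sketched as follows. The function name `take_out` and the sample strings are made up for this illustration; `pop`, `swap_remove`, `std::mem::replace`, and `std::mem::take` are the standard library calls.

```rust
use std::mem;

// Hypothetical helper: returns (elements taken out, what is left behind).
fn take_out() -> (Vec<String>, Vec<String>) {
    let mut words = vec![
        String::from("a"),
        String::from("b"),
        String::from("c"),
        String::from("d"),
    ];

    // Removes and returns the last element.
    let last = words.pop().expect("vector is not empty");
    // Removes index 0 and fills the hole with the current last element.
    let first = words.swap_remove(0);
    // Swaps a new value in and hands the old one back.
    let replaced = mem::replace(&mut words[1], String::from("z"));
    // Like replace, but leaves the default value (an empty String) behind.
    let emptied = mem::take(&mut words[0]);

    (vec![last, first, replaced, emptied], words)
}

fn main() {
    let (taken, remaining) = take_out();
    println!("taken: {taken:?}");
    println!("remaining: {remaining:?}");
}
```

Every call leaves the vector in a valid state, which is exactly what a plain move out of `words[0]` could not guarantee.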
Due to the single ownership restriction, we cannot move a value inside a loop body. On the first iteration it is moved, leaving the source uninitialized for every later iteration. We must either borrow it, use iterator-based methods, or use smart pointers. Cloning also works, but it incurs a performance penalty.
let move_type = vec![String::from("Can't move out of"), String::from("Vector")];
for _ in 1..=10 {
    move_type; // Error: value moved here, in previous iteration of loop
    //move_type.clone();
}
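One way around the loop problem, sketched here under the assumption that the loop body only needs to read the data, is to lend the vector out instead of moving it. `total_len` is a hypothetical helper written for this example.

```rust
// Hypothetical helper: borrows the strings instead of taking ownership.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|word| word.len()).sum()
}

fn main() {
    let move_type = vec![String::from("Can't move out of"), String::from("Vector")];
    for round in 1..=3 {
        // `&move_type` lends the vector for this iteration only.
        println!("round {round}: {} bytes", total_len(&move_type));
    }
    // The vector is still owned here, so one final move is fine.
    let moved = move_type;
    println!("{moved:?}");
}
```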
A move can also depend on a condition. Here the value is moved into either the first function or the second, but never both. Either way, the source is considered uninitialized after the if/else.
let move_type = vec![String::from("Can't move out of"), String::from("Vector")];
let conditional_move = false;
if conditional_move {
    move_if_true(move_type);
} else {
    move_if_false(move_type);
}
// Error if un-commented: `move_type` was moved in one of the branches
//move_if_true(move_type);
fn move_if_true(_: Vec<String>) {}
fn move_if_false(_: Vec<String>) {}
Because of the ownership rules, Rust's collections and Option provide methods designed to adhere to them. If the compiler rejects some operation you want to perform, there is most likely a method that does it safely, without resorting to an unsafe block.
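For instance, Option offers `take` and `as_deref` for situations where a plain move would be rejected. The two wrapper functions below are hypothetical and exist only to show the calls in context.

```rust
// Hypothetical helper: moves the value out from behind a mutable reference.
fn take_payload(slot: &mut Option<String>) -> Option<String> {
    // `*slot` cannot simply be moved out of a reference;
    // `take` moves the value out and leaves `None` in its place.
    slot.take()
}

// Hypothetical helper: reads the value without moving it.
fn peek_len(slot: &Option<String>) -> usize {
    // `as_deref` turns `&Option<String>` into `Option<&str>`.
    slot.as_deref().map_or(0, str::len)
}

fn main() {
    let mut slot = Some(String::from("payload"));
    println!("length before: {}", peek_len(&slot));
    let payload = take_payload(&mut slot);
    println!("took {payload:?}, slot is now {slot:?}");
}
```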
But ownership alone is more restrictive than it has to be. Can we use references, like C++ references, but safely? Yes, we can. Enter borrowing: instead of taking ownership, we temporarily borrow the value. There is one restriction: at any given time a value may have either one mutable borrow or any number of immutable borrows, but not both. Like move types, a mutable reference is moved when assigned to a new variable, since it must stay unique. When its scope ends, the borrow simply ends; nothing is cleaned up, because we don't own the data. Like Copy types, an immutable reference is copied on assignment, because multiple shared borrows are permitted as long as no mutable borrow overlaps them. Sounds like a readers-writer lock from concurrent programming, right? This same restriction is what allows Rust to prevent data races at compile time. Yes, at compile time.
Imagine your friend has an iPhone, and you want to take stunning pictures. There are two things you can do: you may buy a new iPhone just for taking photos (expensive), or you can simply ask your friend to lend you his phone (more efficient).
fn main() {
    let mut data = vec![1, 2, 3, 4];
    // Multiple immutable references or aliases are allowed in the same scope
    let immutable_reference_1 = &data; // reference to the whole collection
    let immutable_reference_copy = immutable_reference_1;
    let immutable_reference_2 = &data[0];
    // We cannot mutate data behind immutable references, just as with variables
    // *immutable_reference_2 = 23;
    let mutable_reference = &mut data;
    // The mutable reference is moved here
    let mutable_reference_move = mutable_reference;
    // mutable_reference is no longer usable at this point
    // *mutable_reference = vec![5, 6, 7, 8, 9, 10];
    // Here, we are dereferencing using the * operator, just like in C++
    *mutable_reference_move = vec![5, 6, 7, 8, 9, 10];
    println!("{:?}", data);
}
Try un-commenting the lines and see what your strict friend (the compiler) tells you about them.
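The claim about data races can be illustrated with scoped threads. This is a minimal sketch, assuming a toolchain where `std::thread::scope` is available (Rust 1.63 or later); `parallel_sum` is a hypothetical function written for this example.

```rust
use std::thread;

// Hypothetical helper: two threads read disjoint halves of a borrowed slice.
fn parallel_sum(data: &[i32]) -> i32 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // Both threads hold shared (immutable) borrows only.
        let a = s.spawn(|| left.iter().sum::<i32>());
        let b = s.spawn(|| right.iter().sum::<i32>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let mut data = vec![1, 2, 3, 4];
    println!("{}", parallel_sum(&data));
    // The shared borrows have ended, so mutation is allowed again.
    data.push(5);
    println!("{}", parallel_sum(&data));
}
```

If one of the spawned closures tried to push to the vector while the other was reading it, the borrow checker would reject the program before it ever ran.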
Ownership and borrowing are crucial to understanding the rest of Rust's language features. They apply to local variables, function calls, methods, threads, data structures, and closures.
fn main() {
    let mut vector: Vec<i32> = vec![999, 666];
    mutable_borrowing(&mut vector);
    immutable_borrowing(&vector);
    let new_owner = takes_and_return(vector);
    // `vector` was moved, so we can't pass it to any function anymore
    //immutable_borrowing(&vector);
    //mutable_borrowing(&mut vector);
    moves_and_takes_ownership(new_owner);
}
fn immutable_borrowing(a: &Vec<i32>) {
    println!("{:?}", a);
}
fn mutable_borrowing(a: &mut Vec<i32>) {
    a.push(45);
}
// We can also return ownership to the caller
fn takes_and_return(a: Vec<i32>) -> Vec<i32> {
    a
}
fn moves_and_takes_ownership(_a: Vec<i32>) {
    // `_a` (the caller's `new_owner`) is cleaned up here, when the function returns.
}
The order of the function calls is important. We can alternate between immutable and mutable function calls because their borrows do not overlap with each other. We can call them as many times as we want, as long as the vector has not been moved.
Look at the function signatures carefully. Just by looking at them, we can reason about what they might do. The second function call takes an immutable reference, so we know the function body cannot mutate the vector. The same goes for inherent methods, where self is the first parameter of the method.
&self - Immutable borrow.
&mut self - Mutable borrow, so that we can mutate self's values inside the method body.
self - Takes ownership. After the call, we can no longer call any method on that value, unless the method returns it (or a new value) to the caller.
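The three receiver kinds can be sketched on a small type. `Counter` and its methods are hypothetical names chosen for this example.

```rust
// Hypothetical type, used only for illustration.
struct Counter {
    count: u32,
}

impl Counter {
    // Immutable borrow: can only read.
    fn get(&self) -> u32 {
        self.count
    }

    // Mutable borrow: can modify the value in place.
    fn increment(&mut self) {
        self.count += 1;
    }

    // Takes ownership: the counter is consumed by this call.
    fn finish(self) -> u32 {
        self.count
    }
}

fn main() {
    let mut counter = Counter { count: 0 };
    counter.increment();
    counter.increment();
    println!("current: {}", counter.get());
    let total = counter.finish();
    // counter.get(); // compile error: `counter` was moved by `finish`
    println!("total: {total}");
}
```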
Ownership/borrowing in closures and the move keyword:
A closure is an anonymous function that captures variables from its surrounding environment, borrowing them either immutably or mutably depending on how they are used inside the closure. When the move keyword is used, the captured variables are moved into the closure instead. As usual, the same rules apply: move types are moved into the closure, while Copy types are simply copied.
fn main() {
    let mut vector = vec![1, 2, 3];
    // Implicitly mutably borrowed because push is called inside the closure.
    let mut mutable_closure = || vector.push(56);
    // Calling the closure.
    mutable_closure();
    // Immutably borrowed since we are only reading.
    let immutable_closure = || {
        println!("{:?}", vector);
    };
    immutable_closure();
    // The closure takes ownership explicitly by moving.
    let ownership_moved_closure = move || vector;
    // Use of moved value
    // println!("{:?}", vector);
    println!("{:?}", ownership_moved_closure());
    let copy_type = std::f64::consts::PI;
    // A Copy type is copied into the closure, so the original stays usable.
    let implicit_cloning = move || copy_type;
    println!("{} {}", copy_type, implicit_cloning());
}
Region-based or scope-based resource management, where values are created and destroyed within the lexical scope that owns them, also prevents temporal memory safety issues such as use-after-free. Rust applies the same idea beyond memory: it releases locks, closes files and sockets, and ends borrows implicitly when the owner goes out of scope. Humans are prone to forgetting things, especially as the codebase scales. This implicit handling feels automatic and ergonomic because there is no need to call unlock on a mutex or close on a file or socket. This is truly phenomenal. Rust is good at choosing where to act implicitly, such as cleaning up resources, and where to demand explicit action, such as numeric conversions and cloning heap data.
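The mutex case can be sketched in a few lines. `bump` is a hypothetical function written for this example; note that there is no unlock call anywhere.

```rust
use std::sync::Mutex;

// Hypothetical helper: increments the counter and returns the new value.
fn bump(counter: &Mutex<u32>) -> u32 {
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // `guard` goes out of scope here, which unlocks the mutex

    // Locking again is fine because the previous guard is already gone.
    let value = *counter.lock().unwrap();
    value
}

fn main() {
    let counter = Mutex::new(41);
    println!("{}", bump(&counter));
}
```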
Everything is a resource! Everything has an owner. Everything is born, lives and eventually gets destroyed - Source unknown
Even weirder:
Cyclone is a research programming language, a safe dialect of C, that explored region-based memory management; Rust's lifetimes and borrowing inherit ideas from it. Ownership in Rust is closely related to linear types (more precisely, affine types: a value may be used at most once), and borrowing makes that discipline more expressive by allowing temporary access within a sub-region. Together they avoid spooky action at a distance. That's why the order of function calls is important.
Generics are a way to reduce code duplication and provide better abstraction; in Rust they are resolved by static dispatch, while trait objects offer dynamic dispatch. In C++, templates abstract over types and constant values. In Rust, generics can additionally abstract over lifetimes.
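As a short sketch of the three kinds of generic parameters, the functions below are generic over a lifetime, a type, and a constant. Both `longest` and `last_of` are hypothetical names used only for this illustration.

```rust
// Generic over a lifetime 'a: the result lives as long as both inputs do.
fn longest<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() { left } else { right }
}

// Generic over a lifetime 'a, a type T, and a constant N (the array length).
fn last_of<'a, T, const N: usize>(items: &'a [T; N]) -> Option<&'a T> {
    items.last()
}

fn main() {
    println!("{}", longest("ownership", "move"));
    println!("{:?}", last_of(&[25, 2, 2023]));
}
```

The lifetime parameters here could be elided in some cases; they are spelled out to show where they sit next to type and const parameters.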
Swift also has a proposal for integrating ownership into the language to manage memory efficiently.
Project Verona is a research language that leverages linear types and other concepts to build reliable concurrent applications for the cloud.
References: