When does Rust help you implicitly and when does it require your explicit intent?
Rust is a statically typed systems programming language that incorporates novel concepts like ownership, borrowing, and lifetimes. Rust relies on inference to work out most types and lifetimes so that code isn't cluttered with explicit annotations. However, in certain situations Rust still requires you to be explicit. This post covers some of those situations.
Numerical Conversion
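Casting must be explicit in Rust, as there is no implicit numerical conversion. First, let's look at what implicit conversion means. In Python, many values are implicitly converted to a boolean when used in a condition: empty collections and zero are treated as False, and everything else as True.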
a = "" #Empty string is False and Non-empty string is True
b = [] #Empty list is False and Nonempty list is True
c = () #Same as a list but for tuples
d = {} #Same as above but for Dictionary
e = 0.0
f = 0
g = 1.2343
h = 1
i = "Hello"
j = [2]
k = (4,5,6)
l = { "Language" : ["Python", "rust"] }
# Each value is implicitly converted to a boolean when used as a condition
values = [a, b, c, d, e, f, g, h, i, j, k, l]
for value in values:
    if value:
        # Runs for non-empty and non-zero values, which are truthy
        print("This block is executed only if it's True")
    else:
        # Runs for empty and zero values, which are falsy
        print("This block is executed only if it's False")
Rust doesn't cast empty or non-empty collection types to a boolean value.
let x:i32=i32::MAX;
println!("\n\n{} {:0b}",x.count_ones(),x);
//Maximum value of u64 unsigned
let y:u64 =u64::MAX;
println!("\n\n{} {:0b}",y.count_ones(),y);
//Compile error: Rust won't implicitly convert y to i32
//let z= x + y;
//Compiles, but `as` silently truncates y (u64::MAX) to -1
//let z= x + y as i32;
let m:[i32;0]=[];
//Compile error: an array can't be cast to bool
//println!("{:?}",m as bool);
//Returns the Err variant because u64::MAX doesn't fit in an i32
println!("{:?}",i32::try_from(y));
//Returns the Ok variant because 34 fits in an i32
println!("{:?}",i32::try_from(34i64));
When you look at the binary representation of the maximum value of a signed integer such as i32, you'll find 31 ones. This is because one bit is used to represent the sign. Consequently, a signed integer has a smaller maximum value than its unsigned counterpart of the same width, where all the bits represent magnitude. Reinterpreting a value of one type as another type can therefore lose information or change its meaning. This is why, by default, the arithmetic traits in std::ops are implemented for operands of the same type, such as i8 with i8, and not for mixed types such as i32 with u64. However, we can construct an i32 from i8, i16, u8, u16, and other types that fit using the From and Into traits. It's important to note that not all conversions are bidirectional and infallible. While converting from i32 to i64 is always possible because all i32 values are representable in i64, the reverse isn't true.
As long as we use the From and Into traits, conversions don't lose data. TryFrom and TryInto, on the other hand, are designed for fallible conversions: they return a Result, giving an Err instead of silently losing data when the value doesn't fit. When performing conversions, it's advisable to use the from and try_from methods and their variants rather than as. These trait methods help us identify early in development which operations might fail. It would also be helpful if Rust warned by default about narrowing conversions made with the as keyword; Clippy can flag them through opt-in lints such as cast_possible_truncation.
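As a minimal sketch of the difference between the three approaches, the helper functions widen and narrow below are hypothetical names introduced only for this illustration:

```rust
use std::convert::TryFrom;

/// Widening: every i32 fits in an i64, so `From` is implemented and cannot fail.
fn widen(x: i32) -> i64 {
    i64::from(x)
}

/// Narrowing: only some i64 values fit in an i32, so `TryFrom` returns a `Result`.
/// The error is turned into `None` here to keep the sketch short.
fn narrow(x: i64) -> Option<i32> {
    i32::try_from(x).ok()
}

fn main() {
    println!("{}", widen(i32::MAX));
    println!("{:?}", narrow(34));
    println!("{:?}", narrow(i64::MAX));
    // `as` never fails, but it silently keeps only the low 32 bits.
    println!("{}", i64::MAX as i32);
}
```

The last line is the case worth watching for: the cast compiles and runs without any error, yet the result has nothing to do with the original value.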
Implicit dereference
Even though Rust is strict about numerical conversion and only offers explicit conversions via traits, in other contexts it coerces implicitly when the conversion is unambiguous and appropriate. Let's take a look at the code below:
let mut vec = vec![4, 2, 3, 1];
//sort is defined on slices, not on Vec itself; vec is implicitly
//dereferenced to a mutable slice and sort is called on that.
vec.sort();
//Once the mutable slice borrow ends, we can call vector methods again.
//&vec[..] borrows the whole vector as a slice that views the underlying data.
slice(&vec[..]);
//This compiles: a mutable slice is implicitly coerced to an
//immutable slice, but not the other way around. Going from mutable to
//immutable is always safe, so the implicit coercion is appropriate here.
slice(&mut vec[..]);
let string = String::from("Rust mutable string type");
//&String is implicitly coerced to &str because String implements Deref with Target = str
str_slice(&string);
fn slice(x: &[i32]) {}
fn str_slice(x: &str) {}
When slice methods are called on a Vec or String using the dot operator, the compiler implicitly dereferences the value to the Target type of the Deref trait, using the Deref and DerefMut implementations on Vec and String. This reduces boilerplate and simplifies referencing and dereferencing. Moreover, it lets a significant amount of code written for slices be reused with minimal explicitness.
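The practical payoff is that a function written once against a slice type accepts owned collections too. In this small sketch, largest and shout are hypothetical functions made up for the example:

```rust
/// Accepts any borrowed slice of i32.
fn largest(values: &[i32]) -> Option<i32> {
    values.iter().copied().max()
}

/// Accepts any borrowed string slice.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let mut numbers = vec![4, 2, 3, 1];
    // `sort` is defined on `[i32]`; the call goes through `DerefMut`.
    numbers.sort();
    // `&Vec<i32>` coerces to `&[i32]` through `Deref`.
    println!("{:?}", largest(&numbers));

    let greeting = String::from("hello");
    // `&String` coerces to `&str` through `Deref`.
    println!("{}", shout(&greeting));
}
```

Taking &[i32] and &str as parameters rather than &Vec<i32> and &String is the usual choice, since arrays, string literals, and sub-slices can then be passed as well.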
By default, the println!() macro prints the value behind a reference rather than the address of the pointer. The println! macro doesn't move the data, as it only needs read-only access.
In safe Rust, raw pointers *const T and *mut T can't be dereferenced automatically. Printing them shows the address they hold. To dereference a raw pointer, we must explicitly use an unsafe block, because Rust doesn't guarantee that raw pointers point to valid memory or are properly aligned. The responsibility thus moves from the Rust compiler to the developer.
By default, the println! macro prints values in decimal format. If you wish to print in binary, octal, or hexadecimal, you need to specify this explicitly.
let x = 2023;
let y = &x;
//The reference y is dereferenced automatically.
println!("{} {} ",x,y);
//Explicitly ask the macro to print in binary, octal, and hexadecimal.
println!("{x:b} {x:o} {x:x}");
//Printing an address with {:p} only works for references and
//pointers; for other types it is a compile error.
println!("{:p}",y);
let z = &34 as *const i32;
//Printing a raw pointer shows its address. Dereferencing it outside an unsafe block is a compile error.
println!("{:?}",z);
unsafe{
println!("{:?}",*z);
}
Smart pointers like Box, Arc, and Rc are implicitly dereferenced when a method is called on them, so the methods of the inner type are available directly. This is why we can access the value inside a Box, Arc, or Rc without a special operator like -> in C++. This also improves ergonomics; otherwise, we would have to use a more convoluted syntax to access them.
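To see the auto-dereferencing at work, here is a short sketch in which through_pointers is a hypothetical helper that calls methods of the inner types straight through each pointer:

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Calls methods of the inner types through three kinds of smart pointer.
fn through_pointers() -> (usize, i32, String) {
    let boxed: Box<String> = Box::new(String::from("boxed"));
    let shared: Rc<Vec<i32>> = Rc::new(vec![1, 2, 3]);
    let atomic: Arc<str> = Arc::from("arc");

    // Each call is resolved on the inner type: String, Vec<i32>, and str.
    let length = boxed.len();
    let sum = shared.iter().sum::<i32>();
    let upper = atomic.to_uppercase();

    // An explicit dereference is still available when the value itself is needed.
    let inner: &String = &*boxed;
    assert_eq!(inner.len(), length);

    (length, sum, upper)
}

fn main() {
    println!("{:?}", through_pointers());
}
```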
Explicit type annotation
We need to explicitly specify parameter types when defining a function. However, when calling the function and using the parameters inside its body, the types are inferred. A function without an explicit return type implicitly returns the unit type (), similar to None in Python or void in C/C++. The order of arguments matters. There are no default values for function parameters as in Python, but structs and enums can provide default values through the Default trait, and traits can provide default method implementations. Closures are an alternative to functions where type annotations are often not needed, and they can capture variables from the enclosing scope, which functions can't.
use std::rc::Rc;
use std::cell::RefCell;
fn explicit(x:i32,y:f64,
z:String,
x1:Vec<i32>,
x2:Rc<RefCell<Vec<Result<String,std::io::Error>>>>) {}
The type of parameter x2 is rather long and hard to read. However, nested types like this do show up in Rust code.
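The points about closures and default values can be sketched as follows. The Settings struct and the add_with_closure function are hypothetical and exist only for this example:

```rust
/// Hypothetical settings type; deriving `Default` supplies the fallback values.
#[derive(Debug, Default, PartialEq)]
struct Settings {
    retries: u32,
    verbose: bool,
}

/// Adds `offset` to `x` using a closure whose types are inferred.
fn add_with_closure(x: i32, offset: i32) -> i32 {
    // No annotations on `v`: the parameter and return types are inferred,
    // and `offset` is captured from the enclosing scope.
    let add = |v| v + offset;
    add(x)
}

fn main() {
    // Struct update syntax fills the remaining fields from `Default`.
    let settings = Settings { retries: 3, ..Default::default() };
    println!("{:?}", settings);
    println!("{}", add_with_closure(5, 10));
}
```

A plain nested fn in the same position could not refer to offset at all; it would have to receive it as an explicitly typed parameter.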
Rust iterators abstract over data structures and algorithms to provide unified methods for efficiently manipulating collections of different types. The collect() method on iterators usually requires an explicit collection type, because the FromIterator trait is implemented for Vec, HashMap, String, HashSet, and more. Without context the target is ambiguous, so Rust expects you to specify the kind of collection you want to build.
//Vec of characters.
let vec = vec!['a', 'b', 'c', 'd', '#' , '8','*','{'];
//Here the type is annotated using generic type notation
let collection = vec.iter()
.map(|ch|ch.to_string())
.collect::<String>();
//Here the type is specified when declaring
let collection2:Vec<_> =vec.iter()
.filter(|&ch|ch.is_alphabetic())
.map(|ch|ch.to_string())
.collect();
//The _ placeholder lets Rust infer the element type (String here);
//we could also write Vec<String> explicitly.
println!("{} \n {:?}",collection,collection2);
Both static and const must be annotated with a type and initialized upon definition.
static MONTH:usize = 12;
const π:f64 = std::f64::consts::PI;
println!("{}",π);
The field types of named structs, the data carried by enum variants, and generic type bounds must be specified when defining them. This allows the compiler to check types and operations inside the implementation body without further annotations. It's a trade-off.
//Named struct i.e fields are Named
struct Named<T>{
x:i32,
y:String,
z:Vec<String>,
z1:T
}
//Enum with data
enum Payload{
Integer(i32),
Float(f64),
r#String(String),
}
use std::fmt::Debug;
impl<T:Ord + Debug> Named<T>
{}
The r# prefix marks a raw identifier, which lets us use a keyword as an identifier without confusing the compiler. String is not a keyword, so the prefix is optional here.
The trait bounds on an impl block indicate that a type must implement both the Ord trait for comparison and the Debug trait for printing. Some traits are implemented automatically by the compiler where possible, such as Sized, Send, and Sync. Copy, in contrast, has to be implemented or derived explicitly.
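To show what such bounds buy us inside the impl body, here is a sketch with a hypothetical Pair type and describe_larger method, neither of which comes from the code above:

```rust
use std::fmt::Debug;

/// Hypothetical generic type holding two values of the same type.
struct Pair<T> {
    left: T,
    right: T,
}

// `Ord` allows the comparison and `Debug` allows the `{:?}` formatting below.
impl<T: Ord + Debug> Pair<T> {
    fn describe_larger(&self) -> String {
        let larger = if self.left >= self.right { &self.left } else { &self.right };
        format!("larger: {:?}", larger)
    }
}

fn main() {
    let pair = Pair { left: 3, right: 7 };
    println!("{}", pair.describe_larger());
    // Pair { left: 1.5, right: 2.5 }.describe_larger() would not compile:
    // f64 implements Debug but not Ord.
}
```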
Type Case Sensitive
Rust types must match exactly. If a function parameter is defined as a mutable borrow but the argument passed is an immutable borrow, the code won't compile. There is no implicit coercion when it would be ambiguous or unsound: specifically, an immutable borrow can't be converted to a mutable borrow.
For example, [i32;4] != [i32;5]. Array types with different sizes are considered distinct types, even if they contain the same element type. Take a look at the code snippet below:
fn main() {
let arr1: [i32; 4] = [1, 2, 3, 4];
let arr2: [i32; 5] = [5, 6, 7, 8, 9];
println!("Array 1: {:?}", arr1);
println!("Array 2: {:?}", arr2);
}
In this code, we've defined two arrays, arr1 with a size of 4 and arr2 with a size of 5. The sizes matter: even though both arrays contain i32 elements, their types are distinct due to the difference in size. Similarly, &i32 is not i32.
let x= 10;
let y = &x;
//Compile error if uncommented: an i32 can't be compared with an &i32
//println!("{}",x==y);
//Dereference the y and compares the actual values.
println!("{}",x==*y);
Comparing two references compares the values they point to, not the addresses of the pointers. If you need to compare addresses, you must do so explicitly using std::ptr::eq.
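A brief sketch of the two kinds of comparison follows; compare is a hypothetical helper that returns both results for a pair of references:

```rust
use std::ptr;

/// Returns (values are equal, addresses are equal) for two references.
fn compare(a: &i32, b: &i32) -> (bool, bool) {
    // `==` on references compares the pointed-to values;
    // `ptr::eq` compares the addresses themselves.
    (a == b, ptr::eq(a, b))
}

fn main() {
    let values = [10, 10];
    // Equal values stored in two different array slots.
    println!("{:?}", compare(&values[0], &values[1]));
    // The same slot referenced twice.
    println!("{:?}", compare(&values[0], &values[0]));
}
```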
String != &String != &mut String. Depending on the signature, they behave differently, because Rust distinguishes between owning, immutably borrowing, and mutably borrowing a value. The same applies to instance methods, where the first parameter is self and the form of self (self, &self, or &mut self) matters.
//Immutably borrow the string
fn borrow(x:&String){}
//Mutably borrow the string
fn mut_borrow(x:&mut String){}
//Own the string once passed
fn own_string(x:String){}
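The same three forms appear on methods through self. The Counter type and the append_world function in the sketch below are hypothetical and serve only to show what each signature demands at the call site:

```rust
/// Hypothetical counter type used to show the three forms of `self`.
struct Counter {
    count: u32,
}

impl Counter {
    // `&self`: read-only borrow; the caller keeps the value.
    fn get(&self) -> u32 {
        self.count
    }
    // `&mut self`: exclusive borrow; the caller needs a `mut` binding.
    fn increment(&mut self) {
        self.count += 1;
    }
    // `self`: takes ownership; the value cannot be used afterwards.
    fn into_inner(self) -> u32 {
        self.count
    }
}

fn append_world(text: &mut String) {
    text.push_str(" world");
}

fn main() {
    let mut greeting = String::from("hello");
    // The call site must say `&mut` explicitly; `&greeting` would not compile.
    append_world(&mut greeting);
    println!("{}", greeting);

    let mut counter = Counter { count: 0 };
    counter.increment();
    println!("{}", counter.get());
    let total = counter.into_inner();
    // counter.get(); // compile error: `counter` was moved by `into_inner`
    println!("{}", total);
}
```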
Explicit about your Intent
Types like Arc, Rc, Box, Vec, String, HashMap, and other collections store their data in heap memory. Apart from Box, Vec, and String, which are implicitly imported into every Rust module, they must be imported explicitly.
If we need a deep copy of data stored on the heap, we explicitly call the clone method on these types, assuming the type implements the Clone trait, to obtain independent copies. Heap allocation has overhead, since the allocator may have to make system calls to obtain additional space. As a result, Rust doesn't implicitly clone these types; instead, it moves them. It's important to note that not all clones result in an allocation. Each type defines its own clone behavior: cloning an Rc or Arc, for example, only increments a reference count.
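The difference between a deep clone and a reference-count clone can be observed directly. In this sketch, vec_clone_is_independent and rc_clone_shares are hypothetical helper names:

```rust
use std::rc::Rc;

/// A Vec clone gets its own heap buffer.
fn vec_clone_is_independent() -> bool {
    let original = vec![1, 2, 3];
    let copy = original.clone();
    // Same contents, different allocations.
    original == copy && original.as_ptr() != copy.as_ptr()
}

/// An Rc clone shares the allocation and only bumps the count.
fn rc_clone_shares() -> (bool, usize) {
    let shared = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&shared);
    (Rc::ptr_eq(&shared, &alias), Rc::strong_count(&shared))
}

fn main() {
    println!("{}", vec_clone_is_independent());
    println!("{:?}", rc_clone_shares());
}
```

Writing Rc::clone(&shared) instead of shared.clone() is a common convention that makes it visible at the call site that the clone is the cheap kind.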
use std::collections::HashMap;
let x = 10;
//x is implicitly copied here because i32 is Copy, i.e. x and y are independent.
let y = x;
let mut hashmap = HashMap::new();
//The key and value types of the HashMap are inferred here: HashMap<&str, i32>
hashmap.insert("First", 1);
//Compile error if uncommented: expected integer, found `bool`, because of the first insert
//hashmap.insert("Second", true);
//hashmap moved here.
let z_new = hashmap;
//The type annotation needed here because we didn't insert any data to the hashmap yet
let empty_hashmap: HashMap<&str, usize> = HashMap::new();
//Explicit clone, which creates an independent copy of empty_hashmap.
let empty_hashmap_clone = empty_hashmap.clone();
Custom or user-defined types are always moved by default, even if all their fields are Copy types. Rust takes the cautious approach in this regard. To change this, we must explicitly implement the Copy trait, either by hand or by deriving it with the attribute #[derive(Copy, Clone)] on a struct or enum.
#[derive(Copy,Clone,Debug)]
struct UnitStruct;
If we want to mutate a value, we need to explicitly annotate its binding with the mut keyword, except for types that provide interior mutability, such as Cell, RefCell, and Mutex. We also use the explicit move keyword to move data into closures. This is important in threads, especially when the closure running in a thread uses local variables.
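The three cases (mut bindings, interior mutability, and move closures) can be sketched together. The functions record_hit and count_in_thread are hypothetical helpers for this illustration:

```rust
use std::cell::Cell;
use std::thread;

/// Increments through a shared reference: `Cell` provides interior mutability.
fn record_hit(hits: &Cell<u32>) {
    hits.set(hits.get() + 1);
}

/// Moves `names` into a new thread and returns how many names it saw.
fn count_in_thread(names: Vec<String>) -> usize {
    // Without `move` the closure would only borrow `names`, and
    // `thread::spawn` rejects borrows of locals that may not live long enough.
    let handle = thread::spawn(move || names.len());
    handle.join().unwrap()
}

fn main() {
    // Mutation through a plain binding requires `mut`.
    let mut total = 0;
    total += 5;

    // No `mut` needed on `hits`, even though its content changes.
    let hits = Cell::new(0);
    record_hit(&hits);

    let names = vec![String::from("a"), String::from("b")];
    println!("{} {} {}", total, hits.get(), count_in_thread(names));
}
```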
Scoped Resource Managing
Rust doesn't have a garbage collector to clean up memory when it's no longer needed. Instead, Rust uses ownership and scoping rules, checked at compile time, to automate memory management. What sets this approach apart from a garbage collector is that the same mechanism that frees memory when variables go out of scope also manages other resources beyond memory.
The lifetime of a borrow ends automatically, at the latest when the scope of the borrow ends.
Heap-allocated memory is deallocated implicitly at the end of the owner's scope, where the Drop trait's drop method runs, and Rust prevents us from using that memory afterward. This means that use-after-free scenarios are not possible in safe Rust. If needed, we can also drop a value early by calling the drop function, which moves ownership into its own scope.
Resources including memory, files, sockets, database connections, and locks are cleaned up automatically when the owner goes out of scope. This is less of a burden and of great significance, as mistakes in handling such resources manually can lead to leaks, undefined behavior, or, at worst, security vulnerabilities. For reference-counted smart pointers such as Rc and Arc, dropping decreases the reference count when a scope ends. The memory is only cleaned up when the reference count reaches zero, that is, when the last owner goes out of scope.
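To make the cleanup order visible, here is a sketch with a hypothetical Resource type whose Drop implementation writes to a shared log. Both the type and the release_order function are made up for this example:

```rust
use std::cell::RefCell;

/// Hypothetical resource that records when it is released.
struct Resource<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for Resource<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("released {}", self.name));
    }
}

/// Creates three resources and returns the order in which they were released.
fn release_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _first = Resource { name: "first", log: &log };
        let second = Resource { name: "second", log: &log };
        let _third = Resource { name: "third", log: &log };
        // Explicit early release: `drop` takes ownership of `second`.
        drop(second);
        // `_third` and `_first` are released at the closing brace,
        // in reverse order of declaration.
    }
    log.into_inner()
}

fn main() {
    for line in release_order() {
        println!("{}", line);
    }
}
```

Real resources such as files or sockets follow the same pattern: their Drop implementations close the underlying handle, so no explicit close call is needed.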
In Rust code, you don't often see explicit calls like socket.close(), file.close(), mutex.unlock(), or drop().
use std::sync::Mutex;
let x =Mutex::new(10);
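{
*x.lock().unwrap()+=1;
} //The lock guard is a temporary, so the mutex is already unlocked at the end of the statement above.
println!("{:?}",x); //The Debug output of the Mutex shows data: 11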
The mutex here is used in a single-threaded context. In multi-threaded code, we typically wrap the mutex in an atomically reference-counted pointer (Arc) to share it across scopes and thread boundaries. Each thread gets its own handle through the clone method on the Arc, which increments the reference count instead of copying the mutex.
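A minimal multi-threaded sketch of this pattern follows. The function name count_with_threads and the choice of a simple counter are assumptions made for illustration:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each increment a shared counter once.
fn count_with_threads(workers: usize) -> usize {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();
    for _ in 0..workers {
        // Cloning the Arc bumps the reference count; the Mutex is shared, not copied.
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // The guard is a temporary, so the lock is released
            // at the end of this statement.
            *counter.lock().unwrap() += 1;
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_with_threads(4));
}
```

There is still no unlock call and no manual reference counting anywhere: both are handled when the guard and the Arc handles go out of scope.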
Rust Defaults
Rust source files are UTF-8 encoded, allowing us to use Unicode symbols alongside ASCII characters in the source. Rust's string types, String and str slices, hold Unicode text. By default, strings are validated as UTF-8 when they are constructed from bytes. The unchecked constructors require an unsafe block and can be faster if used correctly. There are alternative types for use cases where String and str don't fit. Examples include PathBuf and Path for platform-agnostic path manipulation, OsString and OsStr for OS-specific strings, and CString and CStr for interoperability with C. We need to explicitly import them to use them.
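A few of these types in action, as a sketch. The helper names decode, extension_of, and c_length are hypothetical, and the sample path is arbitrary:

```rust
use std::ffi::{CString, OsStr};
use std::path::Path;

/// Checked construction: invalid UTF-8 yields `None` instead of a String.
fn decode(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Path works on path components rather than raw characters.
fn extension_of(path: &str) -> Option<&OsStr> {
    Path::new(path).extension()
}

/// CString appends the trailing NUL byte that C APIs expect
/// and rejects input containing an interior NUL.
fn c_length(text: &str) -> Option<usize> {
    CString::new(text).ok().map(|c| c.as_bytes_with_nul().len())
}

fn main() {
    println!("{:?}", decode(vec![0x52, 0x75, 0x73, 0x74]));
    println!("{:?}", decode(vec![0xff, 0xfe]));
    println!("{:?}", extension_of("src/main.rs"));
    println!("{:?}", c_length("hi"));
    println!("{:?}", c_length("a\0b"));
}
```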
In Rust, the default type for integers is i32, and for floats it's f64. Lengths and indices of collections and slices are represented by usize, which means indexing supports only non-negative integers.
Rust is safe by default, unlike C++, where unsafe is the default. All collection types and slices are bounds-checked by default. Iterating directly over a collection eliminates the need to specify start, step, and stop values in loops. Most languages, including C++, provide this feature.
let x = ["array","with","same","type"];
//Nice abstraction
for i in x{
println!("{i}");
}
//If you need to specify the details manually for some reason like not starting from zero.
//But incorrect use leads to panicking
for i in 1..3{ //Prefer the length of x over a hard-coded bound unless you need a specific range
println!("{}",x[i]);
}
//zero indexing, printing the last element
println!("{}",x[3]);
//Borrow the first 3 elements as a slice, using inclusive range syntax.
let y=&x[..=2];
//Panics: index 3 is out of bounds for a slice of length 3
println!("{}",y[3]);
Even though 3 is a valid index for the array, it's not valid for the slice. A slice carries the starting address and the length of the borrowed contents, which in this case is 3. Accessing an index equal to or beyond the length is an out-of-bounds access, which safe Rust forbids, resulting in a panic. There is an unchecked version for performance reasons, which is why I said safe Rust.
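When a panic is not acceptable, the get method offers a non-panicking alternative to indexing. In this sketch, element_or is a hypothetical helper built on top of it:

```rust
/// Returns the element at `index`, or `fallback` if the index is out of bounds.
fn element_or<'a>(items: &[&'a str], index: usize, fallback: &'a str) -> &'a str {
    // `get` returns an Option instead of panicking.
    items.get(index).copied().unwrap_or(fallback)
}

fn main() {
    let words = ["array", "with", "same", "type"];
    let first_three = &words[..=2];

    println!("{}", element_or(first_three, 2, "none"));
    println!("{}", element_or(first_three, 3, "none"));

    // The unchecked variant skips the bounds check, so the caller must
    // guarantee the index is in range; otherwise behavior is undefined.
    let last = unsafe { *first_three.get_unchecked(2) };
    println!("{}", last);
}
```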
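By default, arithmetic overflow results in a panic at runtime in debug mode; when the compiler can detect the overflow at compile time, as in the snippet below, it is reported as a compile error instead. In release mode, overflows wrap around, which is defined behavior at runtime but often not the correct result.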
//i8 ranges from -128 (i8::MIN) to 127 (i8::MAX).
let x = i8::MIN;
//So subtracting one from the minimum or adding one to the maximum overflows.
println!("{}",x-1);
let x = i8::MAX; //Variables with the same name are shadowed by let.
println!("{}",x+1);
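If overflow is a real possibility, the intent can be stated explicitly with the standard integer methods, which behave the same in debug and release builds. A short sketch:

```rust
fn main() {
    let max = i8::MAX;

    // checked_*: returns None on overflow.
    println!("{:?}", max.checked_add(1));
    // wrapping_*: wraps around in two's complement.
    println!("{}", max.wrapping_add(1));
    // saturating_*: clamps to the nearest bound.
    println!("{}", max.saturating_add(1));
    // overflowing_*: returns the wrapped value plus an overflow flag.
    println!("{:?}", max.overflowing_add(1));
}
```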
Without any explicit use of an unsafe block, Rust guarantees memory safety. If you want to use unsafe operations, you have to be explicit about it by wrapping the code inside an unsafe block. It's advisable to avoid unsafe code when possible and, where it is needed, to wrap it in a safe API, which is how the standard library is implemented. Using unsafe in Rust is not akin to writing C/C++: the code is still subject to most of Rust's restrictions and safety checks.