Eliminate Null Pointer Exceptions and Unhandled Errors in Your Code Forever

Discover the tools that Rust provides to make our code more robust against errors.

A robust program must handle errors reliably. Errors arise at different boundaries: some are unrecoverable, such as accessing an element beyond the end of an array, and some are recoverable, such as trying to open a file that does not exist. Every programming language provides basic constructs for handling errors, but most of them use

  • null pointer or null object to represent nothing,
  • Exception handling to handle failure.

But these are not enforced by the compiler, which leaves the responsibility to the programmer. If we forget to check for null, the program may crash or, in C and C++, run into undefined behavior. Error codes in C++ combine badly with implicit conversions: a function that should return an integer may signal failure with the code -1, which is itself a valid integer. The receiver can then treat it as an ordinary value, and the compiler allows arithmetic on it, because the type system can't distinguish a failure from an integer. Rust doesn't do implicit numeric conversions for us, which makes the code more verbose but reduces bugs and ambiguities. It's a trade-off.

Having a null pointer or null object in the type system increases the likelihood of bugs, because a missing null check is easy to overlook in a large code base and the compiler can't help. Facebook uses Java's compiler APIs to integrate a static analysis tool into the build, which mitigates NPEs (Null Pointer Exceptions) in large code bases at compile time.

Functional programming languages handle errors with different mechanisms than null pointers and exceptions. NoRedInk writes its web apps in Elm, which compiles to JavaScript, instead of writing JavaScript or TypeScript directly. Many runtime errors, such as null pointer exceptions and type errors, are caught when compiling the program rather than when running it. Here is a comparison of JavaScript and Elm runtime errors in production.

Runtime Exceptions in Js vs Elm
[Source]

The type system catches a considerable number of bugs before they reach production, so developing web applications in Elm is more robust than developing directly in JavaScript. Each language has its place; there is no one-size-fits-all solution without trade-offs. Reliable software also benefits our users, because crashes are detrimental to the end user.

Now let's look at error handling in Rust. Rust doesn't just provide memory safety without a garbage collector; it also forces us to handle errors through its type system. The Rust type system is designed to catch most kinds of errors at compile time by refusing to compile code that contains them.

Type alias and New Type Pattern:

A type alias gives a convenient name to an existing type and saves us from repeating a long type everywhere. A type alias doesn't provide any type safety: with type UserId = usize, the two names are equal, i.e. indistinguishable to the compiler, and we can pass a UserId wherever a usize is expected. That makes aliases ineffective for representing invariants. For example, say we want a union of two table columns that are both represented using usize, but one holds seconds and the other holds milliseconds. The compiler permits the union, and arithmetic between the two, even though neither makes sense, because the aliases are the same type.

The newtype pattern is different: the wrapper and the wrapped type are distinct, so struct UserId(usize) != usize. We can't use a UserId where a usize is expected, or vice versa. This is a huge benefit of user-defined types over plain types for type-level safety. The standard library's Option type gives the same kind of separation. Look at the code below,

  let option: Option<usize> = Some(78);
  // This won't compile
  // println!("{}", option + 1_usize);
  // Do this instead
  println!("{:?}", option.map(|num| num + 1));

Option<usize> is not equal to a plain usize. We have to handle the Some and None cases to extract the value inside it before doing anything with that value.
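The difference between a type alias and a newtype described earlier can be sketched as follows. The names AliasId, Seconds, Millis, and add_millis are made up for this illustration; they are not from any library.

```rust
// Type alias: just another name for usize, no extra safety.
type AliasId = usize;

// Newtypes: distinct types that wrap the same underlying u64.
// `Seconds` and `Millis` are hypothetical names for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Seconds(u64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Millis(u64);

impl From<Seconds> for Millis {
    // The only way to mix the two units is an explicit conversion.
    fn from(s: Seconds) -> Self {
        Millis(s.0 * 1000)
    }
}

fn add_millis(a: Millis, b: Millis) -> Millis {
    Millis(a.0 + b.0)
}

fn main() {
    let id: AliasId = 7;
    let plain: usize = id + 1; // compiles: the alias *is* usize
    println!("{plain}");

    let timeout = Seconds(2);
    let delay = Millis(500);
    // add_millis(timeout, delay); // would not compile: expected Millis, found Seconds
    let total = add_millis(Millis::from(timeout), delay);
    println!("{total:?}");
}
```

The commented-out call is the point: with aliases the compiler would accept it silently, with newtypes it is rejected until we convert explicitly.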

Algebraic Types:

ADT stands for algebraic data type. We can use one to group different types under a single type. In Rust, a sum type (enum) is represented as a tagged union in which exactly one variant exists at a time, so the number of possible values of a sum type is the sum of its variants' values, whereas for a product type (struct or tuple) it is the product of its fields' values. Sum types reduce the number of invalid states and express intent clearly in the code.

A function or method returns the type produced by its body. Returning an enum is different from returning a tuple or struct: when an enum is returned, only one of its variants exists, not all of them. A tuple or struct can't be partially returned, i.e. we can't return one field and ignore the others. Read this blog post to see how cumbersome it is to return different kinds of errors from a function in Go.

Rules for Enumerated type:

1) We can't call the wrapped value's methods on the enum itself, unlike other languages where a method can be called on a possibly-null object

2) We can't access enum fields directly. This is unlike a struct, where we can access fields by name. The only way to access enum fields is through explicit pattern matching on the variants, or through methods of the enum that hide the pattern matching inside their bodies
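A minimal sketch of these two rules, using a hypothetical Lookup enum invented for this example:

```rust
// `Lookup` is a hypothetical enum for this sketch.
#[derive(Debug)]
enum Lookup {
    Found { id: u32, name: String },
    Missing,
}

impl Lookup {
    // A method that hides the pattern match from callers.
    fn name_or<'a>(&'a self, fallback: &'a str) -> &'a str {
        match self {
            Lookup::Found { name, .. } => name.as_str(),
            Lookup::Missing => fallback,
        }
    }
}

fn main() {
    let hit = Lookup::Found { id: 1, name: String::from("ferris") };
    // println!("{}", hit.name); // would not compile: no field `name` on type `Lookup`

    // Explicit pattern matching is the way in.
    if let Lookup::Found { id, name } = &hit {
        println!("{id} {name}");
    }
    println!("{}", Lookup::Missing.name_or("anonymous"));
}
```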

Why are these relevant to error handling in Rust? This is how Rust encodes error-handling logic as a type rather than as standalone values like nil, null, None, and the like.

If a type is not wrapped in Option or Result, a value of that type is guaranteed to be present; there is no null. A fallible operation in Rust returns either Option or Result. Because both are part of the Rust standard prelude, we don't have to bring them into scope to use them. Without the prelude, we would need an import like this in every module,

use std::{
    option::Option::{self, Some, None},
    result::Result::{self, Ok, Err},
};

Representing nothing is not necessarily implemented using pointers. The same logic can be encoded using the Option sum type.

let null_string: Option<String> = None;
let some_integer: Option<i32> = Some(34);

This conveys the same logic as,

  • the null pointer in C/C++, but without segmentation faults from dereferencing null, because there is no pointer to dereference and the None case can't be forgotten,
  • the null or None object in Java and Python respectively, but without any runtime exception when calling methods on them.

The billion-dollar mistake can't happen in (safe) Rust.

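Here None is a variant that carries no data. For pointer-like types such as references and Box, the compiler represents None with the otherwise unused null bit pattern, so the Option costs no extra space, exactly like a C++ null pointer. There is more than one way to represent nothing or an empty type in Rust, and each has different use cases.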

  1. An empty enum variant, used to represent the absence of something, as in Option, or as the base case of a tree or other recursive type

  2. A unit struct, which can be used to implement a state-like pattern

  3. Marker traits like Send and Sync, which have no runtime overhead but prevent data races and other concurrency bugs at compile time

  4. The unit type (), which can be used as a type and as a value. It serves a similar purpose to void in C++ or None in Python.

  5. The never type !, the type of computations that never return, such as the panic! macro

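// Naming the never type `!` directly is unstable, so this needs a nightly compiler.
#![feature(never_type)]
use std::mem::size_of;
fn main() {
    enum EmptyEnum {} // no variants: can never be constructed
    struct UnitStruct; // no fields
    println!(
        " Size of (): {}\n Size of never type: {}\n Size of empty enum: {}\n Size of unit struct: {}",
        size_of::<()>(),
        size_of::<!>(),
        size_of::<EmptyEnum>(),
        size_of::<UnitStruct>(),
    );
}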

There is no need to use a pointer at all to represent the absence of something. Rust still has null raw pointers (std::ptr::null) for other use cases, such as interfacing with C code.
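To see that Option adds no cost for pointer-like types, we can compare sizes with size_of. This is only a small check of the layout guarantee for references and Box, not a complete picture of how the compiler lays out every Option.

```rust
use std::mem::size_of;

fn main() {
    // References and Box are never null, so None can reuse the null bit pattern.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());

    // A plain integer has no spare bit pattern, so Option<u32> needs extra space.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());

    println!("Option<Box<i32>>: {} bytes", size_of::<Option<Box<i32>>>());
}
```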

Some operations that return the Option type:

  • The next method on iterators,
  • The get/get_mut methods on slices, which are safer versions of accessing elements with the [] operator. Calling unwrap on the None they return panics, though, just as out-of-bounds indexing does.
  • split_first/first/last on slices, and their mutable versions.
  • The take method on slices, and key-lookup methods such as get on HashMap and HashSet.

Rust handles recoverable errors with the Result type instead of a try/except mechanism. This type is like Option, except that its failure variant carries a value describing the error. Result is used when we want context about what went wrong, which is useful for debugging and which Option alone can't convey ergonomically. For example, binary_search on slices returns a Result: Ok with the index of the element if it is found, or Err with the index at which the element could be inserted to keep the slice sorted if it is not.
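Here is a small sketch of how both variants of the binary_search result can be used. The insert_sorted helper is a made-up name for this example.

```rust
// Inserts `value` into an already sorted Vec, keeping it sorted.
// `insert_sorted` is a hypothetical helper for this sketch.
fn insert_sorted(values: &mut Vec<i32>, value: i32) {
    match values.binary_search(&value) {
        // Found: Ok carries the index of an equal element.
        Ok(pos) => values.insert(pos, value),
        // Not found: Err carries the index that keeps the order intact.
        Err(pos) => values.insert(pos, value),
    }
}

fn main() {
    let sorted = [1, 3, 5, 7, 9];
    println!("{:?}", sorted.binary_search(&7)); // element is present
    println!("{:?}", sorted.binary_search(&4)); // element is absent

    let mut values = sorted.to_vec();
    insert_sorted(&mut values, 4);
    println!("{values:?}");
}
```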

let parse_int: i32 = match "475".parse::<i32>() {
    Ok(int) => int,
    Err(_) => 0,
};

Here we parse a string into an integer value. The parse method on a string returns a Result because not every string can be converted to an integer. The value is hard-coded here, but in real code it may come from a client or from command-line arguments. We have to pattern match to access the value. A number that is too large or too small for the target type is also reported as an Err, not a panic; arithmetic on the parsed value can still overflow later, which panics in debug builds. The wildcard pattern is used here to ignore the error and supply a fallback; without an arm for Err, Rust will complain. This is how we are forced to extract the value inside. See the methods described later for a more concise way to write this logic.

Some operations that return the Result type:

  • TryFrom and TryInto - conversion between types may fail,
  • File operations - creating, reading, and writing a file may each fail,
  • bind on a server socket - binding to a port may fail,
  • Threads and mutexes - join on a thread handle and lock on a Mutex,
  • Some iterators yield Result items - for example, the incoming method on TcpListener.

Pattern Matching:

Pattern matching in Rust is like a switch statement in other languages. A match must be exhaustive, i.e. cover all possible values of the type; otherwise the code won't compile. This ensures that we handle both cases of an Option or a Result. Pattern matching works on anything that can be matched, not just enum values. See the patterns chapter of the Rust book and the enums chapter of the Programming Rust book to learn more. Rust has several control-flow constructs for extracting the value out of an enum, depending on the use case and the expressiveness needed.

Rust is an expression-oriented language: an expression evaluates to a value. Pattern matching can be done using,

  1. match - matches anything and must be exhaustive,
  2. if let - matches one pattern once and ignores the other cases,
  3. while let - keeps looping as long as the value matches the pattern we specified. A for loop works in a similar way: it calls into_iter once and then calls next repeatedly until it returns None.

use std::net::TcpListener;
fn main() {
    let listener = TcpListener::bind("127.0.0.1:8080")
        .expect("The port is already in use or no permission to access");
    // `incoming` never yields None, so this loop runs until the process is stopped.
    while let Ok(result) = listener.incoming().next().ok_or("Internal Error") {
        // Handle only successful connections; failed ones are skipped.
        if let Ok(stream) = result {
            println!("{:?}", stream)
        }
    }
    // Only reached if the loop above ever ends.
    if let Some(item) = listener.incoming().next() {
        println!("{:?}", item)
    }
}

while let behaves like a while loop, except that it tests a pattern instead of a boolean value. if let does the same but runs at most once. The item type of the incoming iterator is Result<TcpStream, io::Error>, so its next method returns an Option<Result<TcpStream, io::Error>>. The ok_or method on Option turns that into a Result<Result<TcpStream, io::Error>, &str>, which while let pattern-matches. The match yields the inner Result, which is matched again with if let to obtain the stream. There are many places where errors can occur, and the types keep us from accidentally working with data without knowing whether the operation succeeded or failed.
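The same while let plus if let combination can be tried without a network socket. The sketch below uses a Vec as a stack; sum_valid is a hypothetical function written for this illustration.

```rust
// Drains a stack and sums only the entries that hold a valid number.
fn sum_valid(mut stack: Vec<Result<i32, String>>) -> i32 {
    let mut total = 0;
    // `pop` returns Option<Result<i32, String>>; the loop ends on None.
    while let Some(entry) = stack.pop() {
        // Only the Ok case is handled; Err entries are skipped.
        if let Ok(value) = entry {
            total += value;
        }
    }
    total
}

fn main() {
    let stack = vec![Ok(1), Err(String::from("bad input")), Ok(41)];
    println!("{}", sum_valid(stack));
}
```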

Like a match expression, every branch of an if-else expression must evaluate to the same type as the other branches. But in a situation like out-of-bounds access, there is no meaningful value to produce when None is returned, and a default value of the type may not make sense either. The never type (!) is used in these cases. On stable Rust it can't be named directly, but we use it indirectly through several constructs that Rust provides.

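let array = [1, 5, 7, 0];
let element: &i32 = match array.get(10) {
    Some(element) => element,
    // What can we return here? Nothing meaningful, so we diverge:
    // panic! has type `!`, which fits any expected type.
    None => panic!("index 10 is out of bounds"),
};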

In the above situation, we can stop the program immediately with the panic! macro, or avoid panicking by making both arms evaluate to (), for example by calling println! in each arm.

The constructs below have the type !,

  • panic!(), todo!(), unimplemented!() - macros that terminate the current thread by panicking,
  • continue skips the rest of the current iteration and moves on to the next one,
  • break exits the enclosing loop immediately; the code after the loop then runs as usual. continue can be used inside loops, and break inside loops and labeled blocks.
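As a rough sketch of how these diverging constructs fit into a match whose other arms produce an i32 (sum_until_end is a made-up function for this example):

```rust
// Sums the numeric tokens, skipping blanks and stopping at "end".
fn sum_until_end(tokens: &[&str]) -> i32 {
    let mut total = 0;
    for token in tokens {
        let value: i32 = match *token {
            "" => continue, // type `!`: jump to the next iteration
            "end" => break, // type `!`: leave the loop
            other => match other.parse::<i32>() {
                Ok(n) => n,
                // type `!`: stop the program with a message
                Err(e) => panic!("not a number: {other} ({e})"),
            },
        };
        total += value;
    }
    total
}

fn main() {
    println!("{}", sum_until_end(&["1", "", "2", "end", "100"]));
}
```

Every arm type-checks against i32 even though three of them never produce a value, because ! can stand in for any type.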

In some cases an enum has many variants and may gain more in the future, like ErrorKind in the standard library. For these cases Rust has the #[non_exhaustive] attribute: code in other crates must then include a wildcard arm when matching on the enum, so adding a variant doesn't break them. Keep in mind that a wildcard arm can hide a newly added variant, because the compiler won't complain that it is unhandled.

Using pattern matching, we can safely access the value inside Some or Ok without forgetting the other cases. But it's cumbersome to write a match everywhere we need the value. The Option and Result types have methods that access the inner value just as pattern matching does but are more ergonomic to use.

Option and Result define a rich API. Its methods are safe ways to work with data that may or may not be present without writing a match expression everywhere, although under the hood they use match to extract the value. Many of them take closures and are defined generically with loose trait bounds: most methods work with any type, and the few that need a bound require traits that most types implement.

T != &T != &mut T

  • T: An owned type
  • &T: An immutable and shared borrowed type
  • &mut T: A mutable and unique borrowed type

There are two implementations of copied, one for Option<&T> and one for Option<&mut T>. The same goes for cloned.

Methods that take reference to self either immutably or mutably:

  • is_some & is_ok and is_none & is_err - take an immutable reference. is_some/is_ok return true when the value is Some/Ok, so a following unwrap will not panic; is_none/is_err return true when the value is None/Err.
  • as_ref and as_mut - convert &Option<T> to Option<&T> and &mut Option<T> to Option<&mut T> (and likewise for Result, where the Err value is borrowed as well), so we can work with the inner value without moving it.
  • as_deref - borrows the inner value and dereferences it to its Deref target. A String or Vec<T> derefs to &str or &[T] respectively, and user-defined types deref to their Deref::Target. For an Option or Result holding a Box, Arc, or Rc, it returns a reference to the inner type.
    // Deref on Box returns a reference to the exact inner type
    let box_option = Some(Box::new(String::from("Hello")));
    let after_deref: &String = box_option.as_deref().unwrap();

    // Deref on Vec or String returns their slice type
    let vector: Result<Vec<i32>, String> = Ok(vec![1, 3, 6]);
    let after_deref: &[i32] = vector.as_deref().unwrap();
    let string: Result<String, String> = Ok(String::from("Deref to &str"));
    let str_slice: &str = string.as_deref().unwrap();
  • as_deref_mut - Same as above but returning mutable reference to the inner target type.
  • take - a method on Option, similar to mem::take, that is very handy when we want to move data out from behind a mutable reference. Rust won't let us move data out from behind a reference, since a reference doesn't own the data; take leaves None in its place.
  • take_if - Same as above, but takes the value conditionally: if the predicate returns true, the value is replaced with None. It was nightly-only when this was written and has been stable since Rust 1.80.
  • replace - a method on Option, similar to mem::replace, that stores the provided value and returns the old value to the caller.
       let mut owned_data = Some(String::from("Owned"));
       let mut strr = owned_data.take().unwrap();
       strr.push_str("Data");
       println!("{} {:?}", strr, owned_data);
       let old_data = owned_data.replace(String::from("Previous"));
       println!("{:?} {:?}", owned_data, old_data);
    
    The take and take_if methods are handy when we want None left behind instead of a default value for the type, and they also save us from ownership and borrowing errors.
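To illustrate why take helps with ownership, here is a minimal sketch with a hypothetical Connection struct. Inside a method that only has &mut self, we can't move a field out directly, but take lets us do it by leaving None behind.

```rust
// `Connection` is a hypothetical type for this sketch.
struct Connection {
    session: Option<String>,
}

impl Connection {
    // We only have `&mut self`, so `self.session` can't be moved out directly.
    // `take` moves the value out and leaves None in its place.
    fn close(&mut self) -> Option<String> {
        self.session.take()
    }
}

fn main() {
    let mut conn = Connection { session: Some(String::from("abc123")) };
    println!("{:?}", conn.close());   // the owned session string
    println!("{:?}", conn.session);   // nothing left behind
    println!("{:?}", conn.close());   // closing twice is harmless
}
```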

Methods that move the self when calling on Option or Result:

  • unwrap - Returns the value inside Some/Ok, or panics if the value is None/Err. Avoid reaching for unwrap everywhere; prefer expect so that failures are easier to debug later.
  • expect - Same as above, except that we provide a message that is included in the panic. Both methods stop the program immediately when the value is None or Err, but with expect we at least know the context in which the error originated.
  • unwrap_or - Returns the value inside Some/Ok, or the value we supply when None/Err is present. Use it when the type's default value is not what we want; the program doesn't crash because we have a value for the None or Err case.
  • unwrap_or_else - Same as above, but the fallback is computed by a closure instead of being passed directly, so it is only evaluated when needed. For Result, the closure receives the error value.
  • unwrap_or_default - Returns the value inside Some/Ok, or the default value for that type when the None/Err variant is present. This method only works if the Default trait is implemented for the type inside the Some or Ok variant.
  • inspect - Calls a closure with a reference to the inner value, which is useful for debugging, and returns the Option/Result unchanged. Unlike as_mut, it can't modify the value. It was nightly-only when this was written and has been stable since Rust 1.76. It serves the same purpose as the inspect adapter on iterators.
// The logic mentioned above can be written as
let parse_int: i32 = "78u".parse().unwrap_or_default();
let parse_int: u8 = "256".parse().unwrap_or(25);
let k = 100;
// On Result, the closure receives the error, which we ignore here
let parse_int: u64 = "89".parse().unwrap_or_else(|_| k * k);
let parse_env_to_float = std::env::args()
    // Skip the program name
    .skip(1)
    // Take the first argument even if multiple arguments are passed
    .next()
    // Panic if no argument is passed
    .unwrap()
    // Remove leading and trailing whitespace, including newlines,
    // which often sneak into CLI arguments
    .trim()
    .parse::<f64>()
    // Panic if it can't be parsed to f64
    .unwrap();

If you write code in Rust, you come to see how many operations can fail in ways that other languages don't consider, or ignore completely, only to cause a runtime error later.

  • copied - Sometimes we end up with a reference to the data inside the Ok or Some variant, which is a problem when we want to return or work with an owned value without fighting the borrow checker. In those cases, we can call copied to obtain Some(78) or Ok(('a', 5.6)) from Some(&78) or Ok(&('a', 5.6)) respectively. This method only works if the inner type implements the Copy trait, and it does not allocate, since it performs a bit-by-bit copy

  • cloned - The same as above, except that it works on any type that implements the Clone trait and might allocate, as we clone heap data

  • ok_or - A method on Option that converts Some to Ok and None to Err. That's why it takes a value: it is used as the Err payload when the Option is None

  • ok - A method on Result that converts Ok to Some and Err to None, discarding the error

  • unwrap_err - A method on Result that returns the error inside the Err variant and panics when the Result is Ok. It helps to identify what kind of error was returned. If we don't want to panic, the err method returns the error as an Option instead; the nightly-only into_err never panics, but it is available only when the Ok type can never occur

    println!("{}","".parse::<i8>().unwrap_err());
    println!("{}","holo".parse::<i16>().unwrap_err());
    println!("{}","256".parse::<u8>().unwrap_err());
    println!("{}","-129".parse::<i8>().unwrap_err());
    println!("{}","Holo".parse::<char>().unwrap_err());
    println!("{}","Holo".parse::<f64>().unwrap_err());
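The conversion methods from the list above (copied, cloned, ok_or, and ok) can be sketched together like this. All four function names are hypothetical helpers written for this example.

```rust
fn first_even(values: &[i32]) -> Option<i32> {
    // `find` yields Option<&i32>; `copied` turns it into Option<i32>.
    values.iter().find(|v| **v % 2 == 0).copied()
}

fn first_name(names: &[String]) -> Option<String> {
    // String is not Copy, so `cloned` is needed (this allocates).
    names.first().cloned()
}

fn require_even(values: &[i32]) -> Result<i32, String> {
    // Option -> Result: supply the error used for the None case.
    first_even(values).ok_or(String::from("no even number"))
}

fn parse_port(text: &str) -> Option<u16> {
    // Result -> Option: the error details are discarded.
    text.parse::<u16>().ok()
}

fn main() {
    println!("{:?}", first_even(&[3, 8, 10]));
    println!("{:?}", first_name(&[String::from("ferris")]));
    println!("{:?}", require_even(&[1, 3, 5]));
    println!("{:?}", parse_port("8080"));
}
```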

Use case for map_or method of option type.

fn main() {
    head(None);
    head(Some(10));
    println!();
    head1(None);
    head1(Some(10));
}
//workaround for default values like python
fn head(i: Option<usize>) {
    for j in 0..i.map_or(4, |to| to) {
        println!("Repeat {j}");
    }
}
//Without methods things are
//difficult to understand
fn head1(i: Option<usize>) {
    for j in 0..match i {
        None => 4,
        Some(to) => to,
    } {
        println!("Repeat {j}");
    }
}

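The map_or method is useful for these kinds of tasks: when None is passed we use the default value, otherwise we continue with the supplied value. With an identity closure, as here, it behaves the same as unwrap_or(4); map_or is most useful when the value inside Some also needs to be transformed.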
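A small sketch of map_or where the closure does real work. The line_count function and its cap of 100 are assumptions made up for this example.

```rust
// Number of lines to show: the caller's request capped at 100, or 4 by default.
// `line_count` and the cap are hypothetical, chosen only for illustration.
fn line_count(requested: Option<usize>) -> usize {
    requested.map_or(4, |n| n.min(100))
}

fn main() {
    println!("{}", line_count(None));
    println!("{}", line_count(Some(10)));
    println!("{}", line_count(Some(500)));
}
```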

  • unwrap_unchecked - Skips the None or Err check when we are sure that the value exists. Calling it on None or Err is undefined behavior, which is why it must be wrapped in an unsafe block. It can be more efficient because, instead of checking with is_some or is_ok and then unwrapping, we unwrap the data directly without a check.

Result and Option implement the FromIterator trait, so an iterator of Option or Result items can be collected into a single Option or Result of a collection. The collection is wrapped in Some or Ok only when every element is Some/Ok; otherwise the result is None or the first Err.

    let null_char = Some('a');
    // Can only be called inside an unsafe block
    println!("{}", unsafe { null_char.unwrap_unchecked() });

    let vector1 = "34 56 45 78 2 45 6"
        .split_whitespace()
        .map(|val| val.to_option_i32())
        .collect::<Vec<_>>();
    let vector2 = "34 56 45 78 2 45 6"
        .split_whitespace()
        .map(|val| val.to_result_i32())
        .collect::<Vec<_>>();
    // Collect into Option<Vec<i32>> (stable, via FromIterator)
    println!("{:?}", vector1.into_iter().collect::<Option<Vec<i32>>>());
    // Collect into Result<Vec<i32>, ParseIntError>
    println!("{:?}", vector2.into_iter().collect::<Result<Vec<i32>, _>>());

String slices don't have to_option_i32 or to_result_i32 methods in the standard library. The code for this implementation is available as a GitHub gist here.
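For a version that runs without the gist, the same idea can be sketched using only the standard parse method. The function names parse_all_option and parse_all_result are made up for this example.

```rust
use std::num::ParseIntError;

// Some(vec) only if every token parses; None as soon as one fails.
fn parse_all_option(text: &str) -> Option<Vec<i32>> {
    text.split_whitespace()
        .map(|val| val.parse::<i32>().ok())
        .collect()
}

// Ok(vec) only if every token parses; otherwise the first error.
fn parse_all_result(text: &str) -> Result<Vec<i32>, ParseIntError> {
    text.split_whitespace()
        .map(|val| val.parse::<i32>())
        .collect()
}

fn main() {
    println!("{:?}", parse_all_option("34 56 45 78 2 45 6"));
    println!("{:?}", parse_all_option("34 oops 45"));
    println!("{:?}", parse_all_result("34 56 45"));
    println!("{:?}", parse_all_result("34 oops 45"));
}
```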

Wrapping Types:

We can create a type by wrapping a type inside another type, and so on. It's better not to nest more than four levels deep.

    let wrap_options_of_option: Option<Option<Result<Option<i32>, Option<String>>>> =
        Some(Some(Ok(Some(5))));

    //Don't wrap it like above
    wrap_options_of_option
        .map(|option| option.map(|result| result.map(|option| option.map(|i32_| i32_))));

    let option = Some(Some(Some(5)));
    let removes_two_level_nesting = option.flatten().flatten();
    println!("{:?}", removes_two_level_nesting);

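The flatten method removes exactly one level of nesting per call and is only callable when at least one nested level exists, i.e. on Option<Option<i32>> or Result<Result<String, Error>, Error>. Note that flatten on Result was nightly-only for a long time, so it may require a recent (or nightly) toolchain.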

Convenience in Error Handling:

If a type implements the Error trait, we can display it using {} or debug-print it using {:?}, since the Error trait can only be implemented for types that also implement both Display and Debug. This implies that we can convert the error to a String, because any type that implements Display has a to_string() method.

match "".parse::<i32>() {
    Ok(value) => println!("Parsed value is {value}"),
    Err(error) => println!("The error is {}", error.to_string()),
}

Return Early from Function

Sometimes we don't want to continue the program when an error occurs. We can use the return keyword to return from the function early without continuing further code in the function body.

use std::fs::File;
use std::io::Read;

fn early_return() -> Result<(), std::io::Error> {
    let open = File::open("Hello.txt");
    let mut file = match open {
        Ok(file) => file,
        // Return early, handing the error to the caller
        Err(e) => return Err(e),
    };

    let mut string = String::new();
    match file.read_to_string(&mut string) {
        Ok(_) => Ok(()),
        Err(e) => Err(e),
    }
}

This is the same code as in the Rust book. When Hello.txt does not exist, the function returns early with the error, so the string is never created and the file is never read. But this gets cumbersome when many I/O operations are performed, because most of them return a Result, and it's tedious to write a match everywhere. To solve this, Rust introduced the try! macro as syntactic sugar for the logic above. try! has since been deprecated in favor of the ? operator, which is loosely analogous to optional chaining (?.) in Swift in that it also short-circuits on a missing value.
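As a sketch, the match-based function can be condensed with ? like this. The read_text name is made up, and unlike the original it takes the path as a parameter and returns the file contents, purely to make the example easier to try.

```rust
use std::fs::File;
use std::io::{self, Read};

// A variant of the match-based function, written with `?`.
// `read_text` is a hypothetical name for this sketch.
fn read_text(path: &str) -> Result<String, io::Error> {
    let mut file = File::open(path)?; // returns Err(e) early on failure
    let mut contents = String::new();
    file.read_to_string(&mut contents)?; // same early return here
    Ok(contents)
}

fn main() {
    match read_text("Hello.txt") {
        Ok(text) => println!("{text}"),
        Err(e) => println!("could not read file: {e}"),
    }
}
```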

We can chain methods with ? in between only if the enclosing function returns a compatible type, such as Result when ? is applied to a Result. This is a short-circuit operation: if any operation fails, the subsequent operations are not executed. It's analogous to the short-circuiting logical operators, except that it works with Option and Result rather than with bool as in an if expression.

use std::error::Error;
use std::net::TcpListener;

fn chaining() -> Result<(), Box<dyn Error>> {
    let _local_addr = TcpListener::bind("127.0.0.1:6787")?
        .incoming()
        .nth(1)
        .ok_or("Server Error")??
        .local_addr()?;
    Ok(())
}

The ok_or call is followed by two question marks. The first unwraps the Result produced by ok_or on the Option; the second unwraps the inner Result, which yields the TcpStream on success. Either one returns early from the function on failure.

If you need more control over an error, use the methods on Result to make an appropriate decision instead of using ? everywhere. To use the question mark operator with your own error type, the function must return a Result whose error type implements From for each error being propagated; implementing the Error trait as well lets it work with Box<dyn Error>.
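A minimal sketch of a user-defined error type that works with ?. ConfigError and read_port are hypothetical names invented for this example; the From impl is what performs the conversion when ? propagates a ParseIntError.

```rust
use std::fmt;
use std::num::ParseIntError;

// `ConfigError` is a hypothetical error type for this sketch.
#[derive(Debug)]
enum ConfigError {
    Missing,
    BadNumber(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing => write!(f, "value is missing"),
            ConfigError::BadNumber(e) => write!(f, "bad number: {e}"),
        }
    }
}

impl std::error::Error for ConfigError {}

// This conversion is what lets `?` turn a ParseIntError into a ConfigError.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

fn read_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    let text = raw.ok_or(ConfigError::Missing)?;
    let port = text.trim().parse::<u16>()?; // ParseIntError -> ConfigError via From
    Ok(port)
}

fn main() {
    println!("{:?}", read_port(Some(" 8080 ")));
    println!("{:?}", read_port(Some("eighty")));
    println!("{:?}", read_port(None));
}
```

Compared with Box<dyn Error>, an enum like this keeps the set of possible failures visible in the signature, at the cost of writing the conversions by hand.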

The question mark operator is not just for handling errors; it can be used anywhere a computation may stop midway. The try_for_each and try_fold methods on iterators are examples of short-circuiting. Once the Try trait is stabilized, we will be able to use ? with more types. All of these are related to the Option and Result types.

Multiple Error Types:

A concrete error type works only when every failure in the function is covered by the same struct or enum. For example, parsing an integer from a string can fail in several ways, and all of them are represented by one error type, ParseIntError, whose kind is an enum. The function below can return only that error type. Look at the code below,

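fn diff_error() -> Result<(), std::num::ParseIntError> {
    // Each line fails with a different kind of ParseIntError;
    // the function returns at the first failing `?`.
    let empty = "".parse::<i32>()?;
    let too_large = "257".parse::<u8>()?;
    let too_low = "-129".parse::<i8>()?;
    let invalid = "78u".parse::<usize>()?;
    Ok(())
}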

With a concrete error type, we can't return other kinds of errors early from the function. Fortunately, Rust has a solution for this: we can use a trait object to return multiple error types from one function. The assumption is that every error type we care about implements the std::error::Error trait. Look here to see that most error types implement this trait. Even external crates like Polars, Actix, Rocket, and others implement it, so we can handle their errors in the same unified way without changing the return type.

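use std::{env, fs::File, io::Write};

fn diff_error() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("Hello.txt")?; // io::Error
    file.write_all(b"hello")?; // io::Error
    let parse = "h".parse::<char>()?; // ParseCharError
    let var = env::var("ENV_VAR")?; // VarError
    Ok(())
}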

Without boxing the error type (or defining our own enum that wraps each error), we wouldn't have the flexibility to return different error types from the same function.

Computations that are executed for their side effects and can fail, such as write or write_str, tend to return Result<(), E>. On success there is nothing meaningful to return, but on failure we return the error. If an operation must return a Result but can never actually fail, we can use the Infallible enum, which implements the Error trait, as its error type.
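One place where Infallible shows up in the standard library is parsing a &str into a String, which can't fail. The sketch below uses that to show how such a Result can be unwrapped without any panic path; to_owned_text is a made-up helper name.

```rust
use std::convert::Infallible;

// Parsing a &str into a String cannot fail, so the error type is Infallible.
// `to_owned_text` is a hypothetical helper for this sketch.
fn to_owned_text(text: &str) -> String {
    let parsed: Result<String, Infallible> = text.parse::<String>();
    match parsed {
        Ok(s) => s,
        // Infallible has no variants, so this arm can never run.
        Err(never) => match never {},
    }
}

fn main() {
    println!("{}", to_owned_text("no error possible"));
}
```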
