Exploring Slice Types in Rust: Understanding Dynamic Types, References, and Their Size and the Traits Involved

Exploring Slice Types in Rust: Understanding Dynamic Types, References, and Their Size and the Traits Involved

·

13 min read

Slice is a dynamically sized view into the contiguous sequence i.e. no copy but a direct view into memory whether stored in the heap or stack. We can make reference to a slice as long as the underlying data structure is contiguous which Rust guarantees.

  • Fized [T;N] - Array.
  • Owned Vec - Vector.
  • Owned String - which is a wrapper around Vec.

The above contiguous data structures are homogeneous, i.e., they contain the same type of element in them. Calling the slice method on a non-contiguous structure is a compilation error, and, in fact, we can't use slice methods if the collection or container is not contiguous . For example, in the VecDequedata structure, If we want to perform a slice method on VecDeque, we must explicitly convert them as contiguous by calling make_contiguous() on VecDeque type before calling any slice methods on them.

What is Dynamically Sized Types(DST)?:

The marker-trait involved in DST is Sized. Rust implicitly binds this type for us. If we want to opt out of this, we explicitly bound the type using the ?Sized syntax rather than !Sized, called a negative bound, but this is not allowed for the Sized trait. The syntax ?Sized says the type may or may not be sized, which has a different meaning than a negative bound. A type is unsized if the size is not known when compiling, which results in a compile error because Rust must know the size of the type at compile time, which may or may not be exact. The code below won't even compile.


//unsized slices
 let vector = vec![1,2,3];
 let unsized_str_slice:str;
 let unsized_slice:[i32] = vector[..];

struct BTree<T>{
    leaf : T ,
    left : BTree<T>,
    right : BTree<T>,
}

//The type ```x``` is unsized 
fn unsized<T>(x: T)
where
    T: ?Sized,
{ }

The struct BTree is not a slice, but it showcases the problem of infinite recursion, which doesn't have a size at compile time. If you are defining this in garbage-collected languages, there won't be a problem because the memory is managed by the language runtime by putting them in a box (heap memory). But in Rust, we are more explicit about our intent. The error message says,

error[E0072]: recursive type `BTree` has infinite size
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle

Here, indirection means placing those behind some sort of pointer to have a size at compile time (not the actual size, but just enough information to allocate more space as needed and shrink as needed). Below are some possible solutions, each with its advantages and disadvantages,

    use std::{rc::Rc, sync::Arc};

    struct BTree<'a, T> {
        leaf: T,
        left: &'a mut BTree<'a, T>,
        right: &'a mut BTree<'a, T>,
    }
    struct BTreeBox<T> {
        leaf: T,
        left: Box<BTreeBox<T>>,
        right: Box<BTreeBox<T>>,
    }
    struct BTreeRc<T> {
        leaf: T,
        left: Rc<BTreeRc<T>>,
        right: Rc<BTreeRc<T>>,
    }
    struct BTreeArc<T> {
        leaf: T,
        left: Arc<BTreeArc<T>>,
        right: Arc<BTreeArc<T>>,
    }

A type for which the exact size is not known at compile time can be put behind a pointer to have a size at compile time. But the actual size is taken at runtime.

use std::mem::{size_of,size_of_val};
fn main(){
    let mut m=vec![];
    for i in 1..=100{
        m.push(i);
    }
    let m1=&m[..];
    println!("The total bytes of size is {} bytes ",size_of_val(m1));
    println!("Vector size is at compile time:  {} bytes",size_of::<Vec<i32>>());
    println!("Integer Slice size is {} bytes",size_of::<&[u64]>());
    println!("String Slice size is {} bytes",size_of::<&[&str]>());

    //Plain Primitive or Scalar Types have fixed sizes. 
    println!("Char size is :{} bytes",size_of::<char>());
    println!("Usize size is :{} bytes",size_of::<usize>());
    println!("F64 size is :{} bytes",size_of::<f64>());

    //Reference to single value always take single usize bytes
    println!("Size of Reference to char : {} bytes",size_of::<&char>());
    println!("Size of Reference to Vec of i32 : {} bytes",size_of::<&Vec<i32>>());
}

Slices in Rust are references to the underlying data which are either stored on stack or heap and they don't own the data they point to. Generally, references are not owned i.e. they can be accessed and the lifetime depends on the owner. So there is no Dangling, UAF, in safe rust.

From Programming Rust book

Ordinary references are non-owning pointers to a single value whereas references to slices are non-owning pointers to multiple values.

An ordinary reference can be created in various ways, and it takes up one word size, i.e., 8 bytes in 64-bit architectures, which are more prevalent than 32-bit architectures. References to primitive types such as integers , float and char types are examples of ordinary references.

    use std::mem::size_of;

    let vector = vec![1, 5, 6];
    let ord_reference = &'a';
    //Single usize indexing, equivalent to &i32.
    let ord_referene = &vector[0];
    println!("{}", size_of::<&char>());
    println!("{}", size_of::<&mut i32>());

The immutable and mutable references don't differ in sizes; they only describe the relationship, such as whether the content can be mutable or not, nothing more.

Possible ways to construct the slice.

  • Slices are constructed implicitly when making reference to a Vector or Array. The lifetime is the same as the referent i.e. array or vector. For the String type, the str slices are created.
use std::fmt::Debug;
use std::collections::VecDeque;
fn main() {
    let mut array = [1, 2, 3, 4];
    let mut vector = vec![45, 34, 2, 87];
    let string = String::from("Owned Data");

    //Implcitly converted to slices and borrowing all elements.
    //In-place modification.
    reverse(&mut array);
    reverse(&mut vector);
    drop(vector);
    //compile error
    //reverse(&mut vector);

    //Doesn't modify the string_
    reverse_str(&string);
}
fn reverse_str(t: &str) {
    let temp:Vec<char> = t.chars().rev().collect();
    println!("{:?}", temp);
}

fn reverse<T: Debug>(i: &mut [T]) {
    i.reverse();
    println!("{:?}", i);
}

Slices are analogous to Array except the size is known at runtime by default i.e you can't grow or shrink them but modify them in a way that doesn't cause any allocation or in other words the length of the slice doesn't change because they don't own the data they point to. The method defined on Vec, String, HashMap and others cause allocation and deallocation. But methods on slices don't have that capability. Just looking at the signature of the function we can assure that no method will ever cause allocation even when they accept a mutable reference to a slice.

  • Explicitly putting behind a pointer or converting to slices.
//Not arrays
    let integer_slice = &[1, 2, 3, 4];
    let char_slice = &mut ['a', 'b', 'c', 'd'];
    //string literal.
    let str_slice:&str = "String Slice";

    let mut vecdeque = VecDeque::new();
    for i in 15..=20 {
        vecdeque.push_back(i);
    }
    //Explicitly convert to slices
    reverse(&mut vecdeque.make_contiguous());
    //string slices are slices thus
    //we don't have to put them behind a pointer again,
    reverse_str(str_slice);

Slices are actual views into the underlying data, not independent copies. Just assigning them like below doesn't create a copy; instead, both variables point to the same data.

    let mut vector = vec![1, 5, 6];
    let read_slice = &vector[..];
    println!("{:p}", read_slice);
    let write_slice = &mut vector[0..=2];
    println!("{:p}", write_slice);

If you run the above code, both read_slice and write_slice point to the same underlying data. For each run, the pointer address might change, but pointing to the same address won't change.

In Rust, slices have ergonomic ways to access them also provide safe and unsafe versions for performance when needed,

  1. Single usize indexing, but not available for string slices.
  2. Different range syntax.
  3. Deref and DerefMut traits for automatic coercions.

All the above ways to access slices involve traits, structs, and enums. Therefore, we can use operators or trait methods, structs, and enums to achieve the same thing, depending on the level of expressivity we want.

Indexing starts from zero, as in most other programming languages, but Rust only permits positive integers (usize) for indexing unlike python where negative integers can also be used for indexing. Due to Rust's distinction between mutable and immutable, indexing methods also have mutable and immutable versions, thus participating in borrowing rules. All operators will panic if the specified index is greater than or equal to the length of the slice, as indexing starts from zero. Also, methods will panic if unwrap or expect is called on None.

[usize] or get/get_mut : - Indexing using a single positive value.

Range - [start..end] Start from zero if the starting value is not specified, and the end is equal to the length of contents if not explicitly specified.

RangeFrom - [start..]up to slice length if not specified.

RangeFull - [..] pointing to whole elements.Which is implicit in the above code. We can specify the value if we want.

RangeInclusive - [start..=end] -start from zero if not specified and include the last element which is excluded above.

RangeTo - [..end]-start from zero if not specified.

RangeToInclusive - [..=end].Start from zero if not specified.

(Bound,Bound) - Using bound variants inside the tuple. Same as range but more verbose than the range operator syntax.

There are two ways to perform the same task. One is by using operators, and the other is by using the struct and enum directly.

These are implemented for your type to use the operators on your custom types and get the same convenience as the built-in types. Operator overloading in Rust is done by simply importing the traits from the std::ops module and implementing them on your type.

Indexing is always constant time no matter which syntax we choose to use because contiguous memory regions are supports efficient random access in constant time.

These indexing methods are not defined on Vec or String or Array but on slices.

All the above conveniences are implemented for slices of type [T]. But string slices don't implement usize and Bound for a reason. This is the reason why rust strings can't be indexed using a single usize value or a tuple of values.

Rust String slices and Strings have different methods for accessing them which need a bit more effort than other languages because of the Unicode and how it's represented in memory.

Using indexing operations it's easy to cause panic if we are not careful since they are bound-checked by default, unlike C++. A number of ways to panic through indexing:

    let mut vector = vec![9,8,62,3,65,79];
    //Panics because of out of bound of access
    println!("{}",vector[6]);

    let str_slice = "ரஸ்ட்";// Rust 
    //panics because of invalid char boundary
    println!("{}",&str_slice[0..2]);

These panics happen only using the safe version of rust. Even though index values are valid but they are not valid char boundaries which is okay for ASCII-only characters but triggers panic for other foreign languages where a single character may not represent the full word. More information about strings and string slices can be found here. Another rare way to get panic in debug mode is to use the maximum value of usize when indexing.

No matter how many elements the slice contains, it's always a multiple of two unsigned usize types at compile time. If you look at the size of Vec or String, they are always 24 bytes, comprised of a one-word pointer ptr, one usize for tracking length, and another usize for managing the capacity. The extra usize is for abstracting the allocation and deallocation of data without having to do it manually, whereas slices or references don't free the resources they point to, as they don't own it.

use std::mem::size_of;
fn main() {

    //Reference to Slices takes 16 bytes if the usize is 64-bit.
    println!("Size of i8 Slices {}", size_of::<&[i8]>());
    println!("Size of i16 Slices {}", size_of::<&[i16]>());
    println!("Size of usize Slices {}", size_of::<&[usize]>());
    println!("Size of f64 Slices {}", size_of::<&[f64]>());
    println!("String slices {}", size_of::<&str>());
    println!("\n");

    //Reference to Scalar values takes 8 bytes.
    println!(
        "Reference to Vec of i32 is : {} bytes",
        size_of::<&Vec<i32>>()
    );
    println!("Reference to i32 is : {} bytes", size_of::<&i32>());
    println!("Reference to char is : {} bytes ", size_of::<&char>());
    println!("\n");

    //Anything that put inside a box takes 8 bytes.
    println!(
        "Box of Vec of String :{} bytes ",
        size_of::<Box<Vec<String>>>()
    );
    println!(
        "Box of i32 is : {} bytes , Box of string Slices : {} bytes",
        size_of::<Box<&[i32]>>(),
        size_of::<Box<&[&str]>>()
    );
}

Slices as Type and value:

A type and value differ where they appear and we can't alternate between them.

    let mut slice = &mut [34, 56, 78, 23];
    let slice_ref: &[i32] = &slice[..];
    let slice_mut_ref: &mut [i32] = &mut slice[0..3];

    //when using the function, the arguments are values.
    type_and_value(&["One", "Two"], &[1.2, 4.5], "Value", &mut ['a', 'g', 't']);

    //when defining a function, the parameters are types.
    fn type_and_value<T>(w: &[&str], x: &[f64], y: &str, z: &mut [T]) {}

Slices are fat pointers like trait objects. But the two-pointer address is used for different purposes here.

  • ptr pointer pointing to the starting address of borrowed content which may or may not same as the starting address of Referent.
  • len pointer has information about the length of borrowed content which must be less than or equal to the length of the Referent.

Because of this, slices can be directly used in loops without keeping track of the start and length of the slice ourselves.

fn main(){
    let mut vector = vec![2,3,5,6,8,9];
    //borrow everything
     for i in &vector[..]{
         println!("{i}");
     }
     println!("\n");
     //only borrowing a subset of whole elements.
     //here starting and ending addresses are 
     //different from the original vector.
     for i in &mut vector[2..5]{
         *i+=1;
         println!("{i}");
     }
}

The second loop in the above code only has mutable access to the indices from 2 to 5, and other elements at different indices don't affect it.

Other traits involved in Slices:

SliceIndex.This is not used directly. It's called indirectly by the methods on Slices.

The sliceIndex trait has six methods.

  • get() and the return type is Option<&T>.
  • get_mut() and the return type is Option<&mut T>.
  • get_unchecked() and return type is *const T.
  • get_unchecked_mut() and the return type is *mut T.
  • index() and return type is &T.
  • index_mut() amd the return type is &mut T.

The first two methods are "Don't panic". Instead, return None when the bound is not valid, as long as you're not calling unwrap or expect. The last two methods panic if the bound is equal to or exceeds the length. The other two methods must be enclosed by the unsafe block to be used because they cause undefined behavior if the bound is not valid. These methods are called indirectly by the slice methods and can't be used directly.

Deref and DerefMut.

  • Methods defined on slices can be called directly on Vec<T> or [T; N] because the above traits on Vec<T> types return the slice type.
  • The same two traits on the String returns the str slice as Target type without any explicit conversion.
  • Not all types are coerced into slices. The Deref trait implementation on the Box , Mutex, Rc and Arc type returns the same inner type instead of a different type as above.

Index and IndexMut.

These traits are customizable to user-defined types. There are two ways to index the slice. Either through the method or Operator. Same as range syntax but using single value.

let mut vec = vec![1,3,4,5];
   //both have the same semantics.
    println!("{}",vec.index(1));
    println!("{}",vec[1]);
    let mut_ref = vec.index_mut(0);
    *mut_ref+=1;
    let mut_ref = &mut vec[0];
    *mut_ref+=1;
    println!("{}",vec[0]);

 let mut box_type =Box::new(vec![34,123,2,6,3,89]);
    //First Box is dereferenced to the Vec<i32> then the sort
    //method on the Vec cause the Deref on vec to slice and then call the   //sort method on them.
    box_type.sort();
    println!("{:?}",box_type);

If you know the bound is not exceeding the length of the slice then you can opt out to avoid the Bound check by explicitly using an unsafe block. Now the guarantee is up to you not to cause the undefined behavior.

fn main(){
 let  mut slice = [1,2,4,5];
 //This is not bound-checked but safe.
 unsafe {
     println!("{}",slice.get_unchecked(3));
 }
  //This is undefined behavior.
 unsafe{
    println!("{:?}",slice.get_unchecked(4));
 }
 //modifying elements that do not belong to the slice 
 //data structure.
 unsafe{
     *slice.get_unchecked_mut(5) +=1;
 }
 println!("{:?}",slice);
}

The secondget_unchecked version returns a random value that is not our data and causes undefined behavior.

At compile time we know the slice starting address and len of the slice. We can use this information to prevent out-of-bound access at compile time rather than runtime.

Static Slicing library uses the trait **associated const and const generics to prevent invalid index at compile time. Using this library we can prevent compiling if the index is out of bounds. This is not absolutely necessary for all tasks anyway.

use static_slicing::{StaticIndex, StaticRangeIndex};
fn main() {
    let array = [1,2,3,4,5];
    //compile time panic.
    println!("{}",array[StaticIndex::<6>]);
    //compile time panic.
    println!("{:?}",&array[StaticRangeIndex::<0,6>]);
}

The above code wouldn't even compile without an error.