James Routley

Ferris the crab

Recently I've been learning and building in Rust to the point where I'm feeling productive with it.

Like many others, I've found "The Rust Programming Language" (or "the book", as my rustacean friends like to call it) to be an incredibly helpful resource. But as I've progressed beyond the book, I've found that reading the source code of the standard library and of the crates I use is extremely illuminating, and reveals common motifs in Rust development.

There are a few things I've found that I thought might be especially helpful for those new to Rust, which I've collected here.

`Rc<RefCell<T>>` (or `Arc<RwLock<T>>` or `Arc<Mutex<T>>`)

This one is in the book, but about three-quarters of the way through, and if you're jumping into the language you might worry that data structures that are easy in other languages are impossible in Rust.

Rc<T> and its thread-safe counterpart Arc<T> allow you to reference-count (or in the case of Arc, atomically reference-count) allocated memory.

Calling .clone() on an Rc<T> doesn't clone (and copy) the wrapped value. Instead, it increments a counter of how many owners the Rc has, and the same allocation is effectively owned in multiple places. As these owners are dropped the counter decrements, and when the last one is dropped, the wrapped value is dropped too.

Rc implements Deref (but not DerefMut), which allows you to easily access the value it wraps via "Deref coercion".
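To make the counting and Deref coercion concrete, here's a small sketch using only the standard library. The shout function and the string values are made up for illustration.

```rust
use std::rc::Rc;

// Illustrative helper that only needs a string slice.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let first = Rc::new(String::from("shared"));
    assert_eq!(Rc::strong_count(&first), 1);

    // Cloning the Rc copies a pointer and bumps the count; the String is not copied.
    let second = Rc::clone(&first);
    assert_eq!(Rc::strong_count(&first), 2);
    assert!(Rc::ptr_eq(&first, &second)); // both point at the same allocation

    // Deref coercion: &Rc<String> -> &String -> &str.
    println!("{}", shout(&second)); // prints "SHARED"

    // Dropping one owner decrements the count; the String is freed only
    // when the last owner goes away.
    drop(second);
    assert_eq!(Rc::strong_count(&first), 1);
}
```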

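Meanwhile, RefCell<T> (or RwLock<T> and Mutex<T>) separates the mutability of the value it wraps from the mutability of the struct or reference that holds it. This is called "interior mutability". Anywhere a RefCell, RwLock or Mutex is borrowed immutably, you can still ask to mutably borrow the interior value with .borrow_mut() (or .write() and .lock() for RwLock and Mutex, respectively). The borrowing rules are then checked at runtime instead of compile time: RefCell's .borrow_mut() panics if the value is already borrowed, while .try_borrow_mut() returns an error instead, and RwLock and Mutex block the calling thread until the value is free.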
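Here's a minimal sketch of RefCell's runtime borrow checking (the log variable is purely illustrative). Note that log is never declared mut, yet we can still push to the Vec inside it.

```rust
use std::cell::RefCell;

fn main() {
    // `log` is not `mut`, but RefCell still lets us mutate its contents.
    let log = RefCell::new(Vec::new());
    log.borrow_mut().push("first");

    {
        // Hold a shared borrow for the rest of this block.
        let reader = log.borrow();
        // A mutable borrow now would break the borrowing rules, so RefCell
        // refuses it at runtime. (`borrow_mut()` would panic here instead.)
        assert!(log.try_borrow_mut().is_err());
        println!("entries while reading: {}", reader.len());
    } // `reader` is dropped here, releasing the shared borrow.

    // With no outstanding borrows, a mutable borrow succeeds again.
    log.borrow_mut().push("second");
    println!("{:?}", log.borrow()); // prints ["first", "second"]
}
```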

RefCell<T> and Rc<T> (and their respective thread-safe counterparts) can be combined to create a value that can be shared (with Rc) and mutated from multiple places (through the RefCell, which temporarily hands out a mutable borrow).

Rc<T> can be owned in many places with .clone()
Each clone of Rc points to the same immutable value
You can ask to mutably borrow the value inside the RefCell anywhere the RefCell is immutably borrowed
A RefCell<T> wrapped in Rc can be borrowed immutably anywhere the Rc is owned
You can ask to mutably borrow the value inside the RefCell anywhere the Rc is owned
With this pattern, the handle you use to ask for a mutable borrow of a shared value can have many owners
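As a small illustration of those points, here's a sketch where two owners share one mutable counter. The Widget type and its fields are hypothetical, made up for this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type: several widgets share one click counter.
struct Widget {
    name: &'static str,
    clicks: Rc<RefCell<u32>>,
}

impl Widget {
    fn click(&self) {
        // `&self` is an immutable borrow, but the RefCell lets us mutate the count.
        *self.clicks.borrow_mut() += 1;
        println!("{} clicked", self.name);
    }
}

fn main() {
    let counter = Rc::new(RefCell::new(0));
    let ok = Widget { name: "ok", clicks: Rc::clone(&counter) };
    let cancel = Widget { name: "cancel", clicks: Rc::clone(&counter) };

    ok.click();
    cancel.click();
    ok.click();

    // Every owner sees the same, updated value.
    assert_eq!(*counter.borrow(), 3);
    // Three owners: `counter`, `ok.clicks`, and `cancel.clicks`.
    assert_eq!(Rc::strong_count(&counter), 3);
}
```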

One data structure that might be impossible or needlessly difficult to build in Rust without reference counting and interior mutability is a graph. Below is an example that uses Rc<T> to create nodes, one of which is referenced from multiple other nodes. RefCell<T> lets us mutably borrow a node to mark it as visited, so that we don't visit any node more than once during a depth-first search.

use std::cell::RefCell;
use std::error::Error;
use std::rc::Rc;

fn main() -> Result<(), Box<dyn Error>> {
    graph_demo()
}

#[derive(Debug)]
struct Node {
    name: String,
    value: i64,
    children: Vec<Rc<RefCell<Node>>>,
    visited: bool,
}

impl Node {
    fn new(name: String, value: i64) -> Self {
        Self {
            name,
            value,
            children: vec![],
            visited: false,
        }
    }
}

fn graph_demo() -> Result<(), Box<dyn Error>> {
    let a = Rc::new(RefCell::new(Node::new("A".into(), 5)));
    let b = Rc::new(RefCell::new(Node::new("B".into(), 10)));
    let c = Rc::new(RefCell::new(Node::new("C".into(), 10)));
    // try_borrow_mut() gets a mutable reference to the interior of the
    // RefCell, or returns an error if the value is currently borrowed.
    // b.clone() increments the reference count instead of copying the
    // whole struct.
    a.try_borrow_mut()?.children.push(b.clone());
    b.try_borrow_mut()?.children.push(c.clone());
    a.try_borrow_mut()?.children.push(c.clone());
    c.try_borrow_mut()?.value = 100;

    let mut stack: Vec<_> = vec![a.clone()];
    while let Some(current) = stack.pop() {
        let mut current = current.try_borrow_mut()?;
        if current.visited {
            println!("Already visited {:?}", current);
            continue;
        }
        println!("Visiting {:?}", current);
        current.visited = true;
        for child in current.children.iter() {
            stack.push(child.clone());
        }
    }
    Ok(())
}
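Rc and RefCell only work within a single thread (Rc isn't Send, so the compiler won't let you move it to another thread). Here's a rough sketch of the thread-safe counterpart, Arc<Mutex<T>>, where several threads push into one shared Vec; collect_ids is a hypothetical helper for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `workers` threads that each append their id to a shared Vec.
fn collect_ids(workers: usize) -> Vec<usize> {
    let shared = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            // Each thread gets its own Arc; the Vec itself is not copied.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // lock() blocks until no other thread holds the Mutex.
                shared.lock().unwrap().push(id);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let mut ids = shared.lock().unwrap().clone();
    ids.sort(); // threads may finish in any order
    ids
}

fn main() {
    println!("{:?}", collect_ids(4)); // prints [0, 1, 2, 3]
}
```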

`std::collections`

Rust beginners know about Vec<T>. A Vec is essentially a growable region of contiguous memory. Like in a C array, the memory location of the nth element is just the location of the first element + n * the size of the stored type.

Pushing onto the end of a Vec<T> takes constant time. (...usually. If the Vec has run out of capacity, a new, larger contiguous region is allocated and the existing data is copied over, which takes linear time. Because this happens only occasionally, push is described as taking amortized constant time.)
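A small sketch of what this looks like in practice: Vec::with_capacity reserves space up front, and pushes that fit within the capacity never reallocate. The exact capacity values printed depend on the standard library's growth strategy, which isn't guaranteed.

```rust
fn main() {
    // Reserve room for at least 4 elements up front.
    let mut v: Vec<u32> = Vec::with_capacity(4);
    let start = v.as_ptr();

    // Pushing while len < capacity never reallocates, so the buffer
    // stays at the same address and each push is O(1).
    while v.len() < v.capacity() {
        v.push(v.len() as u32);
    }
    assert_eq!(v.as_ptr(), start);

    // Now the Vec is full: the next push must allocate a larger buffer and
    // move the existing elements into it (the O(n) case).
    let old_capacity = v.capacity();
    v.push(99);
    assert!(v.capacity() > old_capacity);
    println!("len = {}, capacity = {}", v.len(), v.capacity());
}
```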

But if you need to insert or delete values at the beginning of the collection or somewhere in the middle, every element after that position has to be shifted in memory to keep the list in order, which takes linear time.

Luckily, Rust makes other ways of storing collections of values available in the std::collections module. The module's documentation provides a good summary of when to use the different options.

VecDeque (a ring buffer) and LinkedList provide sequences with efficient insertion and deletion at both ends.
HashMap and HashSet provide O(1) average-case lookups for a key-value map and set, respectively, for keys or items that implement the Hash and Eq traits. (You can often derive these with #[derive(Hash, PartialEq, Eq)].)
BTreeMap and BTreeSet provide O(log(n)) lookups for a key-value map and set, respectively, for keys or items that implement the Ord trait. They can also be iterated through in order as determined by the Ord trait.
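A quick tour of a few of these types, as a sketch; the Fruit enum and the data are made up for illustration.

```rust
use std::collections::{BTreeMap, HashMap, HashSet, VecDeque};

// Deriving Hash together with PartialEq and Eq lets a type be a HashMap key.
#[derive(Debug, Hash, PartialEq, Eq)]
enum Fruit {
    Apple,
    Pear,
}

fn main() {
    // VecDeque: cheap pushes and pops at both ends.
    let mut queue: VecDeque<i32> = VecDeque::new();
    queue.push_back(2);
    queue.push_back(3);
    queue.push_front(1);
    assert_eq!(queue.pop_front(), Some(1));
    assert_eq!(queue.pop_back(), Some(3));

    // HashMap: average O(1) lookup by a key implementing Hash + Eq.
    let mut stock: HashMap<Fruit, u32> = HashMap::new();
    *stock.entry(Fruit::Apple).or_insert(0) += 5;
    *stock.entry(Fruit::Pear).or_insert(0) += 2;
    assert_eq!(stock.get(&Fruit::Apple), Some(&5));

    // HashSet: membership tests; duplicates are ignored.
    let seen: HashSet<&str> = ["a", "b", "a"].iter().copied().collect();
    assert_eq!(seen.len(), 2);

    // BTreeMap: O(log n) lookup, and iteration in key order (via Ord).
    let mut scores = BTreeMap::new();
    scores.insert("carol", 7);
    scores.insert("alice", 9);
    scores.insert("bob", 4);
    let names: Vec<_> = scores.keys().copied().collect();
    assert_eq!(names, ["alice", "bob", "carol"]);
    println!("{:?}", scores); // prints {"alice": 9, "bob": 4, "carol": 7}
}
```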

Sea shells 1 by Leonardo Aguiar CC BY 2.0

`.await`ing for events

Developers coming from dynamic languages like JavaScript and Python may be used to registering event handlers for new connections or incoming messages.

For example, if you are setting up a WebSocket server, you attach a handler for incoming connections. That handler sets up other handlers to define what happens when that connection receives an incoming message.

// javascript example

import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function connection(ws) {
  ws.on('error', console.error);

  ws.on('message', function message(data) {
    console.log('received: %s', data);
  });

  ws.send('hello!');
});

Rust tends to use a different pattern: a loop waits on a listener, which blocks the thread (or, in async code, suspends the task at .await) until a connection arrives. The connection can then be moved to a new thread or task to be handled, while the loop keeps accepting more connections.

use futures_util::{SinkExt, StreamExt};
use tokio::net::TcpListener;
use tokio_websockets::{Error, Message, ServerBuilder};

#[tokio::main]
async fn main() -> Result<(), Error> {
    let listener = TcpListener::bind("0.0.0.0:3000").await?;
    while let Ok((stream, _)) = listener.accept().await {
        tokio::spawn(async move {
            let mut ws_stream = ServerBuilder::new().accept(stream).await?;

            ws_stream.send(Message::text("hello!")).await?;

            while let Some(Ok(msg)) = ws_stream.next().await {
                if let Some(txt) = msg.as_text() {
                    println!("received: {}", txt);
                }
            }
            Ok::<_, Error>(())
        });
    }
    Ok(())
}
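The same accept-in-a-loop shape works without async at all. Here's a std-only sketch that hands each connection to its own OS thread; serve and ask are hypothetical helpers, the protocol is a made-up line-based greeting rather than WebSockets, and the connection limit exists only so the example terminates.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

// Hypothetical server: accept up to `max_conns` connections, each on its own thread.
// (A real server would loop forever; the limit just lets this demo finish.)
fn serve(listener: TcpListener, max_conns: usize) {
    let mut handles = Vec::new();
    // `incoming()` blocks the current thread until the next client connects.
    for conn in listener.incoming().take(max_conns) {
        let Ok(mut stream) = conn else { continue };
        // Move the connection to a new thread; the loop goes back to accepting.
        handles.push(thread::spawn(move || {
            let mut line = String::new();
            let mut reader = BufReader::new(stream.try_clone().expect("clone stream"));
            if reader.read_line(&mut line).is_ok() {
                // Reply with a greeting; `line` still ends in '\n'.
                let _ = write!(stream, "hello, {}", line);
            }
        }));
    }
    for handle in handles {
        let _ = handle.join();
    }
}

// Hypothetical client: send a name, read back one line.
fn ask(addr: SocketAddr, name: &str) -> String {
    let mut stream = TcpStream::connect(addr).expect("connect");
    writeln!(stream, "{}", name).expect("write");
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply).expect("read");
    reply
}

fn main() {
    // Port 0 asks the OS for any free port.
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind");
    let addr = listener.local_addr().expect("local addr");
    let server = thread::spawn(move || serve(listener, 2));
    print!("{}", ask(addr, "alice")); // prints "hello, alice"
    print!("{}", ask(addr, "bob")); // prints "hello, bob"
    server.join().expect("server thread");
}
```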

In the tokio example above, it's worth taking a look at SinkExt and StreamExt, extension traits provided by futures_util. These two traits let you asynchronously poll a sink or a stream using .await syntax. ws_stream is both a futures_sink::Sink and a futures_core::Stream. Because a Future is, at its essence, an interface that is polled to see if a result is ready, SinkExt and StreamExt can wrap a similarly pollable Sink or Stream and expose it as a Future through methods like ws_stream.send and ws_stream.next.
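To see what "polled" means, here's a sketch of a hand-written future and a deliberately naive executor, using only the standard library. CountdownFuture, NoopWaker, and block_on are illustrative names; real runtimes like tokio only re-poll a future after its waker is called, instead of spinning in a loop.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical future: reports Pending for its first `remaining` polls, then Ready.
struct CountdownFuture {
    remaining: u32,
}

impl Future for CountdownFuture {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // A real future would stash the waker and call it when, say, a socket
            // becomes readable. Here we ask to be polled again right away.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A waker that does nothing; our executor just keeps polling.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// A deliberately naive executor: poll until Ready, counting the polls.
fn block_on<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (out, polls) = block_on(CountdownFuture { remaining: 3 });
    println!("{} after {} polls", out, polls); // prints "done after 4 polls"
}
```

The same block_on also drives async blocks, since those compile down to futures with a poll method.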

Note that because the stream is owned by the task using it, we can't simply hand it to another handler. If we want to wait on a second source, like a channel, so we can forward its messages to the stream, while also waiting for messages from the stream to send back into the channel, we need something like tokio::select!, which waits on several futures at once, runs the branch for whichever is ready first, and cancels (drops) the others:

use futures_util::{SinkExt, StreamExt};
use std::error::Error;
use tokio::net::TcpListener;
use tokio::sync::watch;
use tokio_websockets::{Message, ServerBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let listener = TcpListener::bind("127.0.0.1:3000").await?;
    let (pubsub_tx, pubsub_rx) = watch::channel("initial value".to_string());

    while let Ok((stream, _)) = listener.accept().await {
        let tx = pubsub_tx.clone();
        let mut rx = pubsub_rx.clone();
        rx.borrow_and_update(); // marks latest value as seen
        tokio::spawn(async move {
            let Ok(mut ws_stream) = ServerBuilder::new().accept(stream).await else {
                return;
            };

            loop {
                tokio::select! {
                    msg = ws_stream.next() => {
                        if let Some(Ok(msg)) = msg {
                            if let Some(txt) = msg.as_text() {
                                println!("received: {}", txt);
                                if tx.send(txt.to_string()).is_err() {
                                    return;
                                }
                            }
                        } else {
                            break;
                        }
                    }
                    result = rx.changed() => {
                        if result.is_err() {
                            return;
                        }
                        let msg = rx.borrow_and_update().clone();
                        if ws_stream.send(Message::text(msg)).await.is_err() {
                            return;
                        }
                    }
                }
            }
        });
    }
    Ok(())
}

Error handling with `thiserror` and `anyhow`

The first time I implemented an error type, I found it required more boilerplate than I was used to from other languages.

thiserror provides a derive macro that implements the Error trait for you. Error types can be created from an enum or a struct:

use thiserror::Error;

#[derive(Error, Debug)]
pub enum ErrorEnum {
    #[error("Not found")]
    NotFound,
    #[error("Permission {0} required")]
    PermissionRequired(String),
}

#[derive(Error, Debug)]
#[error("Remote error: {msg}")]
pub struct RemoteError {
    msg: String,
    // a field named `source` is used as the underlying cause
    source: anyhow::Error,
}
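For comparison, here's roughly the boilerplate that #[derive(Error)] saves you for ErrorEnum, written by hand with only the standard library. This is a sketch of equivalent behavior, not thiserror's actual generated code.

```rust
use std::error::Error;
use std::fmt;

// A hand-written equivalent of the ErrorEnum example, without thiserror.
#[derive(Debug)]
pub enum ErrorEnum {
    NotFound,
    PermissionRequired(String),
}

// Display supplies the human-readable message (what #[error("...")] generates).
impl fmt::Display for ErrorEnum {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorEnum::NotFound => write!(f, "Not found"),
            ErrorEnum::PermissionRequired(p) => write!(f, "Permission {} required", p),
        }
    }
}

// All of Error's methods have default implementations, so an empty impl
// is enough when there is no underlying source error.
impl Error for ErrorEnum {}

fn main() {
    let err: Box<dyn Error> = Box::new(ErrorEnum::PermissionRequired("admin".into()));
    println!("{}", err); // prints "Permission admin required"
}
```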

anyhow works well alongside thiserror (or on its own) and does a few things:

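Provides an anyhow::Error type that is a lot like Box<dyn Error>, but requires the wrapped error to be Send + Sync + 'static (so it can safely cross threads) and can capture a backtrace (when enabled with the RUST_BACKTRACE or RUST_LIB_BACKTRACE environment variable).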
Provides an anyhow::Result<T> type that is a shortcut for Result<T, anyhow::Error>
Provides the Context trait, whose .context() and .with_context() methods attach a description to an error as it propagates. The context is included in the printed output and is available to callers that handle the error.

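use anyhow::{Context, Result};

// Hypothetical function that always fails, for illustration
fn this_will_fail() -> Result<()> {
    anyhow::bail!("something went wrong")
}

fn fail_for_some_reason() -> Result<()> {
    this_will_fail().context("Failed in example fn")?;
    Ok(())
}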

There's more to anyhow than this brief example, and the docs are a good resource.

Any

The Any trait, which "most types implement" (precisely, every type that is 'static), lets you downcast generic values to a concrete type.

In the following example, we can pass any type of Vehicle to the take_public_transit function. Since we are using static dispatch, the compiler generates a separate copy of take_public_transit for each Vehicle type we pass to it (this is called monomorphization). vehicle_any.downcast_ref::<Trolley>() checks whether the TypeId of the value matches the TypeId of Trolley, and if it does, returns Some(&Trolley). In each monomorphized copy of take_public_transit this is an equality comparison between values known at compile time, so in optimized builds the type check and the dead code paths for other types should be optimized away! This is very cool!

use anyhow::{bail, Result};
use std::any::Any;

trait Vehicle {
    fn alert(&self);
}

#[derive(Default)]
struct Car {}

impl Vehicle for Car {
    fn alert(&self) {
        println!("Honk! Honk!");
    }
}

impl Car {
    fn drive(&self) {
        println!("Driving the car!");
    }
}

#[derive(Default)]
struct Trolley {}

impl Vehicle for Trolley {
    fn alert(&self) {
        println!("Ding! Ding!");
    }
}

impl Trolley {
    fn ride(&self) {
        println!("Riding the trolley!");
    }
}

#[derive(Default)]
struct Dog {}

fn watch_out(vehicle: &impl Vehicle) {
    vehicle.alert();
}

fn take_public_transit<V: Any + Vehicle>(vehicle: &V) -> Result<()> {
    let vehicle_any = vehicle as &dyn Any;
    if let Some(trolley) = vehicle_any.downcast_ref::<Trolley>() {
        trolley.ride();
    } else {
        bail!("Not sure how to ride this vehicle!");
    }
    Ok(())
}

fn main() -> Result<()> {
    let car = Car::default();
    let trolley = Trolley::default();
    let rover = Dog::default();
    take_public_transit(&trolley)?;
    car.drive();
    watch_out(&car);
    // take_public_transit(&rover);
    // The above line produces a compiler error
    // since Dog doesn't implement Vehicle
    Ok(())
}
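Here's a small standalone sketch of the comparison behind downcast_ref (the is_trolley helpers and unit structs are hypothetical). For each concrete T, the check boils down to comparing TypeIds that are known at compile time, which an optimizing build can typically fold to a constant.

```rust
use std::any::{Any, TypeId};

// Hypothetical unit structs standing in for the Vehicle types.
struct Trolley;
struct Car;

// Comparing TypeIds directly: for each concrete T, both sides are known at
// compile time, so optimized builds can typically fold this to true or false.
fn is_trolley<T: Any>(_value: &T) -> bool {
    TypeId::of::<T>() == TypeId::of::<Trolley>()
}

// The same check phrased with downcast_ref, as in take_public_transit.
fn is_trolley_via_downcast<T: Any>(value: &T) -> bool {
    (value as &dyn Any).downcast_ref::<Trolley>().is_some()
}

fn main() {
    println!("{} {}", is_trolley(&Trolley), is_trolley(&Car)); // prints "true false"
    println!("{}", is_trolley_via_downcast(&Car)); // prints "false"
}
```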

Any can also downcast trait objects. Trait objects use "dynamic dispatch". Unlike the generic take_public_transit above, which is compiled separately for every type it's used with, a trait object pairs a pointer to the value with a pointer to a "vtable": a table of function pointers for that concrete type's implementation of the trait. When a method is called, the address of the right implementation is looked up in the vtable at runtime. This has a very small performance cost, but it lets us put values of different types side by side in a collection like a Vec<_>. (This is useful by itself; you only need to pair it with Any if you want to downcast.)
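A small sketch of both points, using a hypothetical Shape trait rather than Vehicle: a trait object reference is a "fat" pointer (data pointer plus vtable pointer), and boxed trait objects of different types can share a Vec. The size check reflects how current rustc lays out trait objects, not a documented language guarantee.

```rust
use std::mem::size_of;

// Hypothetical trait and types, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

fn main() {
    // A reference to a concrete type is a single pointer...
    assert_eq!(size_of::<&Square>(), size_of::<usize>());
    // ...while a trait object reference also carries a vtable pointer.
    assert_eq!(size_of::<&dyn Shape>(), 2 * size_of::<usize>());

    // Different concrete types side by side, each called through its vtable.
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Circle(1.0))];
    let total: f64 = shapes.iter().map(|s| s.area()).sum();
    println!("total area = {:.2}", total); // prints "total area = 7.14"
}
```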

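To make downcasting work here, a method that casts the trait object to &dyn Any (.as_any in this example) must be added to the trait. Inside each impl, self has a concrete type, so it can be coerced to &dyn Any there. (Since Rust 1.86, trait upcasting offers an alternative: declare trait Vehicle: Any and coerce a &dyn Vehicle to &dyn Any directly.)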

use anyhow::{bail, Result};
use std::any::Any;

trait Vehicle {
    fn alert(&self);
    fn as_any(&self) -> &dyn Any;
}

#[derive(Default)]
struct Car {}

impl Vehicle for Car {
    fn alert(&self) {
        println!("Honk! Honk!");
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Car {
    fn drive(&self) {
        println!("Driving the car!");
    }
}

#[derive(Default)]
struct Trolley {}

impl Vehicle for Trolley {
    fn alert(&self) {
        println!("Ding! Ding!");
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Trolley {
    fn ride(&self) {
        println!("Riding the trolley!");
    }
}

fn take_public_transit(vehicle: &Box<dyn Vehicle>) -> Result<()> {
    let vehicle_any = vehicle.as_any();
    if let Some(trolley) = vehicle_any.downcast_ref::<Trolley>() {
        trolley.ride();
    } else {
        bail!("Not sure how to ride this vehicle!");
    }
    Ok(())
}

fn main() {
    let vehicles: Vec<Box<dyn Vehicle>> =
        vec![Box::new(Car::default()), Box::new(Trolley::default())];
    for vehicle in vehicles.iter() {
        if take_public_transit(vehicle).is_err() {
            println!("Didn't ride public transit!");
        }
        vehicle.alert();
    }
}

That's all for now

So those are a few cool things about Rust that, once I figured them out, unlocked the ability to write programs that do really useful things. I hope this helps some folks out there climb the Rust learning curve!

Thanks to Jordan for reviewing an early draft of this post!

5 patterns in Rust that are kind of a big deal
