Recently I've been learning and building in Rust to the point where I'm feeling productive with it.
Like many others, I've found "The Rust Programming Language" book (or "the book", as my rustacean friends like to call it) to be an incredibly helpful resource. But as I've progressed beyond the book, I've found that reading the source code of the standard library and of the crates I use is extremely illuminating, and reveals common motifs in Rust development.
There are a few things I've found that I thought might be especially helpful for those new to Rust, which I've collected here.
Rc<RefCell<T>> (or Arc<RwLock<T>> or Arc<Mutex<T>>)
This one is in the book, but it doesn't come up until about three-quarters of the way through, and if you're jumping into the language you might worry that data structures that were easy in other languages are impossible in Rust.
Rc<T> and its thread-safe counterpart Arc<T> allow you to reference-count (or in the case of Arc, atomically reference-count) allocated memory. Calling .clone() on an Rc<T> does not .clone() (and copy) the wrapped value. Instead, a counter representing how many times the Rc is owned is incremented, and the same value is effectively owned in multiple places. As these owned references are dropped the counter decrements, and when the last reference is finally dropped, Rc drops the wrapped value before it is removed itself.
Rc implements Deref (but not DerefMut), which allows you to easily access the value it wraps via "Deref coercion".
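The counting and the Deref behaviour described above can be observed directly. This is a minimal sketch using only the standard library; count_through_clone and shout are hypothetical names chosen for illustration.

```rust
use std::rc::Rc;

/// Returns the strong count before cloning, while the clone is alive,
/// and after the clone has been dropped.
/// (Hypothetical helper, for illustration only.)
fn count_through_clone(value: String) -> (usize, usize, usize) {
    let first = Rc::new(value);
    let before = Rc::strong_count(&first);

    // Cloning the Rc bumps the counter; the String itself is not copied.
    let second = Rc::clone(&first);
    let during = Rc::strong_count(&first);
    // Both handles point at the same allocation.
    assert!(Rc::ptr_eq(&first, &second));

    drop(second);
    let after = Rc::strong_count(&first);
    (before, during, after)
}

/// Takes a plain &str. Deref coercion lets callers pass &Rc<String>.
/// (Hypothetical helper, for illustration only.)
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    println!("{:?}", count_through_clone("hello".to_string()));

    let shared = Rc::new("hello".to_string());
    // &Rc<String> coerces to &String and then to &str.
    println!("{}", shout(&shared));
}
```

Writing Rc::clone(&first) rather than first.clone() is a common convention that makes it obvious at the call site that only the handle is being cloned.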
Meanwhile, RefCell<T> (or RwLock<T> and Mutex<T>) lets you separate the mutability of the value it wraps from the mutability of the struct or reference that holds it. This is called "interior mutability". Anywhere a RefCell, RwLock or Mutex is borrowed as immutable, you can try to mutably borrow the interior value with .borrow_mut() (or .write() and .lock(), respectively).
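As a small illustration of interior mutability, the sketch below mutates a field through a shared reference and shows that a second mutable borrow is refused while the first is still alive. Counter and its methods are hypothetical names, not part of any library.

```rust
use std::cell::RefCell;

/// A counter whose field can change even through a shared `&Counter`.
/// (Hypothetical type, for illustration only.)
struct Counter {
    hits: RefCell<u32>,
}

impl Counter {
    fn new() -> Self {
        Self { hits: RefCell::new(0) }
    }

    /// Note `&self`, not `&mut self`: the mutation happens inside the RefCell.
    fn record(&self) -> u32 {
        let mut hits = self.hits.borrow_mut();
        *hits += 1;
        *hits
    }

    /// Returns true if a second mutable borrow is refused while one is live.
    fn second_borrow_is_refused(&self) -> bool {
        let _first = self.hits.borrow_mut();
        // The borrow rules are checked at runtime, so this returns Err
        // instead of failing to compile.
        self.hits.try_borrow_mut().is_err()
    }
}

fn main() {
    let counter = Counter::new();
    counter.record();
    counter.record();
    println!("hits = {}", counter.hits.borrow());
    println!("second borrow refused: {}", counter.second_borrow_is_refused());
}
```

Calling borrow_mut() instead of try_borrow_mut() in the second case would panic, which is why the try_ variants are handy when a conflicting borrow is a real possibility.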
RefCell<T> and Rc<T> (and their respective thread-safe counterparts) can be combined to create a value that can be shared (with Rc) and mutated in multiple places (inside of a RefCell, which temporarily provides a mutably borrowed value).
- Rc<T> can be owned in many places with .clone()
- Every clone of the Rc points to the same immutable value
- The interior of a RefCell can be mutably borrowed anywhere the RefCell is immutably borrowed
- A RefCell<T> wrapped in Rc can be borrowed immutably anywhere the Rc is owned
- So the interior of the RefCell can be mutably borrowed anywhere the Rc is owned
One data structure that might be impossible or needlessly difficult to create in Rust without reference counting and interior mutability is a graph. Below is an example that uses reference counting with Rc<T> to create nodes, one of which is referenced from multiple other nodes. RefCell<T> is used to borrow a mutable reference to each node so that it can be marked as visited, which lets us avoid visiting a node multiple times during a depth-first search.
use std::cell::RefCell;
use std::error::Error;
use std::rc::Rc;

fn main() -> Result<(), Box<dyn Error>> {
    graph_demo()
}

#[derive(Debug)]
struct Node {
    name: String,
    value: i64,
    children: Vec<Rc<RefCell<Node>>>,
    visited: bool,
}

impl Node {
    fn new(name: String, value: i64) -> Self {
        Self {
            name,
            value,
            children: vec![],
            visited: false,
        }
    }
}

fn graph_demo() -> Result<(), Box<dyn Error>> {
    let a = Rc::new(RefCell::new(Node::new("A".into(), 5)));
    let b = Rc::new(RefCell::new(Node::new("B".into(), 10)));
    let c = Rc::new(RefCell::new(Node::new("C".into(), 10)));

    a.try_borrow_mut()?.children.push(b.clone());
    // try_borrow_mut() gets a mutable reference to the interior of the
    // RefCell, or errors if already currently mutably borrowed.
    // b.clone() increments the reference count, instead of copying the
    // whole struct
    b.try_borrow_mut()?.children.push(c.clone());
    a.try_borrow_mut()?.children.push(c.clone());
    c.try_borrow_mut()?.value = 100;

    let mut stack: Vec<_> = vec![a.clone()];
    while let Some(current) = stack.pop() {
        let mut current = current.try_borrow_mut()?;
        if current.visited {
            println!("Already visited {:?}", current);
            continue;
        }
        println!("Visiting {:?}", current);
        current.visited = true;
        for child in current.children.iter() {
            stack.push(child.clone());
        }
    }
    Ok(())
}
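For the thread-safe counterpart mentioned in the heading, the same sharing-plus-mutation idea can be sketched with Arc<Mutex<T>>. This is a minimal standard-library example; parallel_count is a hypothetical function name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each add 1 to a shared counter
/// `per_worker` times, then returns the final total.
/// (Hypothetical helper, for illustration only.)
fn parallel_count(workers: usize, per_worker: usize) -> usize {
    let total = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Atomically bumps the reference count; each thread owns a handle.
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // lock() blocks until the mutex is free; the guard
                    // unlocks again when it is dropped at the end of the statement.
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("total = {}", parallel_count(4, 1000));
}
```

Where RefCell reports a conflicting borrow as an error or a panic, Mutex makes the second caller wait, which is what makes it usable across threads.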
std::collections
Rust beginners know about Vec<T>. A Vec is essentially a growable region of linear memory. As in a C array, the location in memory of the n-th element is just the memory location of the first element + n * the size of the stored type.
Vec<T> can grow at the end in constant time. (...sometimes -- if new capacity needs to be allocated, a new contiguous region will be allocated and the data copied, which happens in linear time.)
But if you need to insert or delete values at the beginning of the collection or somewhere in the middle, values will have to be shuffled in memory to make sure the list stays in order.
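To make the cost model above a little more concrete, here is a small sketch that counts how often a Vec's capacity changes while pushing, and shows an insert at the front. count_reallocations is a hypothetical helper, and the exact growth strategy is an implementation detail of the standard library, so the code only relies on the count being much smaller than the number of pushes.

```rust
/// Pushes `n` items and returns how many times the capacity changed.
/// (Hypothetical helper, for illustration only.)
fn count_reallocations(n: usize) -> usize {
    let mut v: Vec<usize> = Vec::new();
    let mut reallocations = 0;
    let mut last_capacity = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last_capacity {
            // The Vec had to move to a larger allocation.
            reallocations += 1;
            last_capacity = v.capacity();
        }
    }
    reallocations
}

fn main() {
    // Far fewer reallocations than pushes.
    println!("{} reallocations for 1000 pushes", count_reallocations(1000));

    let mut v = vec![1, 2, 3];
    v.push(4); // appended at the end
    v.insert(0, 0); // every existing element shifts one slot to the right
    println!("{:?}", v);
}
```

If the final size is known in advance, Vec::with_capacity can avoid the intermediate reallocations altogether.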
Luckily, Rust makes other ways of storing collections of values available in the std::collections module. The module's documentation provides a good summary of when to use the different options.
- VecDeque (a ring buffer) and LinkedList provide a structure for sequences with efficient insertion and deletion at both ends of the sequence.
- HashMap and HashSet provide O(1) lookups for a key-value map and set, respectively, for keys or items that implement the Hash trait. (You can often derive Hash with #[derive(Hash)].)
- BTreeMap and BTreeSet provide O(log(n)) lookups for a key-value map and set, respectively, for keys or items that implement the Ord trait. They can also be iterated through in order as determined by the Ord trait.
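A short sketch of three of these collections in use, standard library only. The function names (deque_demo, word_counts, sorted_words) are hypothetical and exist just for this example.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

/// Pushes at both ends of a ring buffer.
fn deque_demo() -> Vec<i32> {
    let mut queue = VecDeque::new();
    queue.push_back(2);
    queue.push_back(3);
    queue.push_front(1); // no shuffling of the existing elements
    queue.into_iter().collect()
}

/// Counts words with a HashMap. Iteration order is unspecified.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

/// Moves the same data into a BTreeMap, whose keys come back sorted by Ord.
fn sorted_words(text: &str) -> Vec<&str> {
    let ordered: BTreeMap<&str, usize> = word_counts(text).into_iter().collect();
    ordered.into_keys().collect()
}

fn main() {
    println!("{:?}", deque_demo());
    println!("{:?}", word_counts("b a b").get("b"));
    println!("{:?}", sorted_words("pear apple fig apple"));
}
```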
Sea shells 1 by Leonardo Aguiar CC BY 2.0
.awaiting for events
Developers using dynamic languages like JavaScript and Python may be used to using event handlers for new connections or incoming messages.
For example, if you are setting up a WebSocket server, you attach a handler to handle incoming connections. That handler sets up other handlers to define what happens when that connection receives an incoming message.
// javascript example
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function connection(ws) {
  ws.on('error', console.error);
  ws.on('message', function message(data) {
    console.log('received: %s', data);
  });
  ws.send('hello!');
});
Rust tends to use a different pattern, where an infinite loop waits on a listener that blocks the thread or future until it has received a connection. This connection can then be moved to a new thread or future to handle that connection, while the loop to accept more connections keeps running.
use futures_util::{SinkExt, StreamExt};
use tokio::net::TcpListener;
use tokio_websockets::{Error, Message, ServerBuilder};

#[tokio::main]
async fn main() -> Result<(), Error> {
    let listener = TcpListener::bind("0.0.0.0:3000").await?;
    while let Ok((stream, _)) = listener.accept().await {
        tokio::spawn(async move {
            let mut ws_stream = ServerBuilder::new().accept(stream).await?;
            ws_stream.send(Message::text("hello!")).await?;
            while let Some(Ok(msg)) = ws_stream.next().await {
                if let Some(txt) = msg.as_text() {
                    println!("received: {}", txt);
                }
            }
            Ok::<_, Error>(())
        });
    }
    Ok(())
}
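The same accept-loop shape also works with blocking threads instead of futures. The sketch below uses only the standard library and plain TCP lines rather than WebSocket frames, so it is an analogy to the example above, not a replacement for it. handle, serve and greeting_round_trip are hypothetical names, and the server is capped at a fixed number of connections so the demo terminates.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Greets the client, then prints each line it sends until it disconnects.
fn handle(stream: TcpStream) -> std::io::Result<()> {
    let mut writer = stream.try_clone()?;
    writer.write_all(b"hello!\n")?;
    for line in BufReader::new(stream).lines() {
        println!("received: {}", line?);
    }
    Ok(())
}

/// Accepts up to `max_connections` clients, moving each one to its own thread.
fn serve(listener: TcpListener, max_connections: usize) {
    // incoming() blocks this thread until the next connection arrives.
    for stream in listener.incoming().take(max_connections) {
        let Ok(stream) = stream else { continue };
        thread::spawn(move || {
            if let Err(err) = handle(stream) {
                eprintln!("connection error: {err}");
            }
        });
    }
}

/// Starts the server on a free port, connects once, and returns the greeting.
fn greeting_round_trip() -> std::io::Result<String> {
    // Port 0 asks the OS for any free port, which keeps the demo self-contained.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || serve(listener, 1));

    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"ping\n")?;
    let mut greeting = String::new();
    BufReader::new(&client).read_line(&mut greeting)?;
    server.join().expect("accept loop panicked");
    Ok(greeting.trim_end().to_string())
}

fn main() -> std::io::Result<()> {
    println!("client got: {}", greeting_round_trip()?);
    Ok(())
}
```

The trade-off is one OS thread per connection, whereas the async version multiplexes many connections onto a small number of threads.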
In this example, it's worth taking a look at SinkExt and StreamExt, extension traits provided by futures_util. These two traits provide an interface for asynchronously polling a sink or a stream with .await syntax. ws_stream in this example is a futures_sink::Sink and a futures_core::Stream.
Because a Future is, at its essence, an interface that is polled to see if a result is ready, it is possible for SinkExt and StreamExt to wrap a similarly pollable Sink or Stream and provide a Future interface with ws_stream.send and ws_stream.next.
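To illustrate what "polled to see if a result is ready" means, here is a hand-written Future driven by a deliberately naive busy loop, using only the standard library. Countdown, NoopWaker and poll_to_completion are hypothetical names. A real executor such as tokio parks the task and relies on the waker instead of spinning.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that reports Pending a fixed number of times before completing.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // Ask to be polled again. A real future would arrange for the
            // waker to be called once its I/O is ready.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing, since the loop below polls unconditionally.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `future` until it is ready; returns its output and the poll count.
fn poll_to_completion<F: Future + Unpin>(mut future: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = Pin::new(&mut future).poll(&mut cx) {
            return (output, polls);
        }
    }
}

fn main() {
    let (output, polls) = poll_to_completion(Countdown { remaining: 3 });
    println!("{output} after {polls} polls");
}
```

Writing .await on a future is, roughly, asking the surrounding async runtime to run a smarter version of this loop on your behalf.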
Note that because the stream is owned by the function or closure using it, if we want to wait for messages from a second source -- like a channel -- so we can forward them to the stream, while also waiting for messages from the stream to send back to the channel, we need to use something like tokio::select!, which will use the result from the first ready Future and cancel the others:
use futures_util::{SinkExt, StreamExt};
use std::error::Error;
use tokio::net::TcpListener;
use tokio::sync::watch;
use tokio_websockets::{Message, ServerBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let listener = TcpListener::bind("127.0.0.1:3000").await?;
    let (pubsub_tx, pubsub_rx) = watch::channel("initial value".to_string());
    while let Ok((stream, _)) = listener.accept().await {
        let tx = pubsub_tx.clone();
        let mut rx = pubsub_rx.clone();
        rx.borrow_and_update(); // marks latest value as seen
        tokio::spawn(async move {
            let Ok(mut ws_stream) = ServerBuilder::new().accept(stream).await else {
                return;
            };
            loop {
                tokio::select! {
                    msg = ws_stream.next() => {
                        if let Some(Ok(msg)) = msg {
                            if let Some(txt) = msg.as_text() {
                                println!("received: {}", txt);
                                if tx.send(txt.to_string()).is_err() {
                                    return;
                                }
                            }
                        } else {
                            break;
                        }
                    }
                    result = rx.changed() => {
                        if result.is_err() {
                            return;
                        }
                        let msg = rx.borrow_and_update().clone();
                        if ws_stream.send(Message::text(msg)).await.is_err() {
                            return;
                        }
                    }
                }
            }
        });
    }
    Ok(())
}
thiserror and anyhow
The first time I implemented an Error, I found that it required more boilerplate than I was used to from other languages. thiserror provides macros to implement the Error trait. Error types can be created with an enum or a struct:
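use thiserror::Error;

#[derive(Error, Debug)]
pub enum ErrorEnum {
    #[error("Not found")]
    NotFound,
    #[error("Permission {0} required")]
    PermissionRequired(String),
}

// The derive needs an #[error(...)] message to generate Display for the
// struct; the message format used here is an illustrative choice.
#[derive(Error, Debug)]
#[error("Remote error: {msg}")]
pub struct RemoteError {
    msg: String,
    // A field named `source` is what Error::source() returns.
    source: anyhow::Error,
}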
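For comparison, this is roughly the boilerplate that the derive saves for ErrorEnum, written by hand with only the standard library. It is a sketch of the idea, not thiserror's actual generated code, and check_access is a hypothetical function added to show the error in use.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub enum ErrorEnum {
    NotFound,
    PermissionRequired(String),
}

// Display supplies the human-readable message for each variant.
impl fmt::Display for ErrorEnum {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorEnum::NotFound => write!(f, "Not found"),
            ErrorEnum::PermissionRequired(name) => {
                write!(f, "Permission {name} required")
            }
        }
    }
}

// Error has no required methods once Debug and Display are implemented.
impl Error for ErrorEnum {}

/// Hypothetical function that fails with the custom error type.
fn check_access(is_admin: bool) -> Result<(), Box<dyn Error>> {
    if is_admin {
        Ok(())
    } else {
        // .into() boxes the error as Box<dyn Error>.
        Err(ErrorEnum::PermissionRequired("admin".into()).into())
    }
}

fn main() {
    if let Err(err) = check_access(false) {
        println!("error: {err}");
    }
}
```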
anyhow works well with thiserror (or separately) and does a couple of things:
- Provides an anyhow::Error type that is a lot like Box<dyn Error>, but with some extra helpful guarantees like thread-safety and backtraces.
- Provides an anyhow::Result<T> type that is a shortcut for Result<T, anyhow::Error>.
- Provides a context method allowing you to add context when an error happens, which is included in the printed output or can assist in handling by calling functions.
use anyhow::{Context, Result};

// Hypothetical helper that always fails, added so the example compiles.
fn this_will_fail() -> Result<()> {
    anyhow::bail!("Something went wrong")
}

fn fail_for_some_reason() -> Result<()> {
    this_will_fail().context("Failed in example fn")?;
    Ok(())
}
There's more to anyhow than this brief example, and the docs are a good resource.
The Any trait -- which "most types implement" -- can allow you to downcast generic types. In the following example, we can pass any type of Vehicle to the take_public_transit function. Since we are using static dispatch, the compiler should compile a separate copy of the take_public_transit function for every type of Vehicle that we pass to it. generic_obj.downcast_ref::<T>() checks whether the TypeId of the passed value is the same as that of the specified type, and if it is, it returns Some(&T). This is an equality comparison of static values in each compiled copy of take_public_transit, and so for optimized builds, the type check and the dead code paths for specific types should be optimized away! This is very cool!
use anyhow::{bail, Result};
use std::any::Any;

trait Vehicle {
    fn alert(&self);
}

#[derive(Default)]
struct Car {}

impl Vehicle for Car {
    fn alert(&self) {
        println!("Honk! Honk!");
    }
}

impl Car {
    fn drive(&self) {
        println!("Driving the car!");
    }
}

#[derive(Default)]
struct Trolley {}

impl Vehicle for Trolley {
    fn alert(&self) {
        println!("Ding! Ding!");
    }
}

impl Trolley {
    fn ride(&self) {
        println!("Riding the trolley!");
    }
}

#[derive(Default)]
struct Dog {}

fn watch_out(vehicle: &impl Vehicle) {
    vehicle.alert();
}

fn take_public_transit<V: Any + Vehicle>(vehicle: &V) -> Result<()> {
    let vehicle_any = vehicle as &dyn Any;
    if let Some(trolley) = vehicle_any.downcast_ref::<Trolley>() {
        trolley.ride();
    } else {
        bail!("Not sure how to ride this vehicle!");
    }
    Ok(())
}

fn main() -> Result<()> {
    let car = Car::default();
    let trolley = Trolley::default();
    let rover = Dog::default();
    take_public_transit(&trolley)?;
    car.drive();
    watch_out(&car);
    // take_public_transit(&rover);
    // The above line produces a compiler error
    // since it doesn't satisfy trait bounds
    Ok(())
}
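The TypeId comparison behind downcast_ref can be seen in isolation with a couple of built-in types. This is a minimal standard-library sketch; is_u32 and describe are hypothetical helpers.

```rust
use std::any::{Any, TypeId};

/// Compares the TypeId of the generic parameter with that of u32.
/// Each compiled copy of this function compares two fixed values.
fn is_u32<T: Any>(_value: &T) -> bool {
    TypeId::of::<T>() == TypeId::of::<u32>()
}

/// Tries a few concrete types in turn with downcast_ref.
fn describe<T: Any>(value: &T) -> String {
    let any = value as &dyn Any;
    if let Some(n) = any.downcast_ref::<u32>() {
        format!("u32: {n}")
    } else if let Some(s) = any.downcast_ref::<String>() {
        format!("string: {s}")
    } else {
        "unknown".to_string()
    }
}

fn main() {
    println!("{}", is_u32(&5u32));
    println!("{}", describe(&7u32));
    println!("{}", describe(&"hi".to_string()));
    println!("{}", describe(&1.5f64));
}
```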
Any can also downcast trait objects. Trait objects use "dynamic dispatch". Unlike in our last example, where take_public_transit is compiled for every type that it supports, a trait object carries a pointer to a "vtable" for its concrete type, which points to that type's implementation of each trait method. At runtime, the memory location of the implementation for the specific type is looked up in this vtable when a method is called. This has a very small performance hit, but it lets us put objects of different types side by side in a collection like a Vec<_>. (It's worth noting here that the ability to do this is useful by itself. It doesn't necessarily need to be paired with Any -- only if you want to be able to downcast.)
To make downcasting work, a function that casts the trait object to Any (.as_any in this example) must be added to the trait.
use anyhow::{bail, Result};
use std::any::Any;

trait Vehicle {
    fn alert(&self);
    fn as_any(&self) -> &dyn Any;
}

#[derive(Default)]
struct Car {}

impl Vehicle for Car {
    fn alert(&self) {
        println!("Honk! Honk!");
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Car {
    fn drive(&self) {
        println!("Driving the car!");
    }
}

#[derive(Default)]
struct Trolley {}

impl Vehicle for Trolley {
    fn alert(&self) {
        println!("Ding! Ding!");
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Trolley {
    fn ride(&self) {
        println!("Riding the trolley!");
    }
}

fn take_public_transit(vehicle: &Box<dyn Vehicle>) -> Result<()> {
    let vehicle_any = vehicle.as_any();
    if let Some(trolley) = vehicle_any.downcast_ref::<Trolley>() {
        trolley.ride();
    } else {
        bail!("Not sure how to ride this vehicle!");
    }
    Ok(())
}

fn main() {
    let vehicles: Vec<Box<dyn Vehicle>> =
        vec![Box::new(Car::default()), Box::new(Trolley::default())];
    for vehicle in vehicles.iter() {
        if take_public_transit(vehicle).is_err() {
            println!("Didn't ride public transit!");
        }
        vehicle.alert();
    }
}
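As noted above, trait objects are useful on their own, without Any. Here is a minimal sketch of a mixed collection that only ever calls trait methods, so no downcasting is needed. Shape, Square, Rect and total_area are hypothetical names for this example.

```rust
/// Hypothetical trait for this example.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Each call to area() goes through the vtable of the concrete type.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}

fn main() {
    // Two different concrete types side by side in one Vec.
    let shapes: Vec<Box<dyn Shape>> =
        vec![Box::new(Square(2.0)), Box::new(Rect(1.0, 3.0))];
    println!("total area = {}", total_area(&shapes));
}
```

If everything you need can be expressed as a trait method like this, reaching for Any is usually unnecessary.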
So those are a few cool things about Rust that, once I figured them out, unlocked the ability to write programs that really do useful things. I hope this helps some folks out there climb the Rust learning curve!
Thanks to Jordan for reviewing an early draft of this post!