Let's use minideb, a small Debian-based Linux distribution:
$ docker pull bitnami/minideb
Using default tag: latest
latest: Pulling from bitnami/minideb
ba49d470d895: Pull complete
Digest: sha256:cbbc1db2617a7e5224f8dc692c990b723e4fe3ef69864544e7c14aa613c0ccb7
Status: Downloaded newer image for bitnami/minideb:latest
docker.io/bitnami/minideb:latest
We can see this new image is available locally with `docker images`:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bitnami/minideb latest c5eecd6244a8 3 days ago 120MB
And we can remove it with `docker image rm <id>`:
$ docker image rm c5eecd6244a8
Untagged: bitnami/minideb:latest
Untagged: bitnami/minideb@sha256:cbbc1db2617a7e5224f8dc692c990b723e4fe3ef69864544e7c14aa613c0ccb7
Deleted: sha256:c5eecd6244a829084e2f788e3f877a5ab8ac63f9c8dc55c3cfff4f1d172fc23c
Deleted: sha256:44b47439f86a658d61565e3a9e86c1c9608b2ee8adb4f6e85005634e6f537f43
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
We could run this image in a new container with `docker run c5eecd6244a8`, but it would almost immediately return to our console. With `docker container ls -a`, we'd see that this container ran and terminated:
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff11c7f3afb8 c5eecd6244a8 "/bin/bash" 29 seconds ago Exited (0) 28 seconds ago clever_franklin
# Delete this terminated container
$ docker container rm ff11c7f3afb8
What we want is to run interactively, so we'll use `docker run -it <id>`:
# In the host:
$ docker run -it c5eecd6244a8
# In the container!
root@25bca2749327:/# uname -a
Linux 25bca2749327 5.15.0-1042-azure #49~20.04.1-Ubuntu SMP Wed Jul 12 12:44:56 UTC 2023 x86_64 GNU/Linux
root@25bca2749327:/# cat /etc/os-release | grep NAME
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_CODENAME=bookworm
root@25bca2749327:/# exit
At this point, we’re back in our host. There’s still a terminated container:
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
25bca2749327 c5eecd6244a8 "/bin/bash" 2 minutes ago Exited (0) 49 seconds ago elegant_saha
$ docker container rm 25bca2749327
To avoid this, use `docker run --rm` (NB, the flag has to come before the image name!):
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ docker run -it --rm c5eecd6244a8
root@21735047c8bb:/# hostname
21735047c8bb
root@21735047c8bb:/# exit
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Let’s make use of the minideb image as the basis for a derived image. We use a Dockerfile to describe the image we’ll create:
# The base image for our new image
FROM bitnami/minideb
# For a simple Rust service, see:
# https://aeshirey.github.io/code/2023/02/25/simple-rust-service-in-docker.html
# COPY <host-filename> <docker-filename>
COPY rust-server my-rust-server
CMD ["./my-rust-server"]
To build this, we can use `docker build <path>`, where `<path>` is the directory in which the Dockerfile lives (eg, `.`). Additionally, we'll use `-t <name>:<tag>` to give our image a name and tag. If the tag is omitted, `latest` is used.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bitnami/minideb latest c5eecd6244a8 3 days ago 120MB
$ docker build . -t my-simple-container
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 141B 0.0s
=> [internal] load metadata for docker.io/bitnami/minideb:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 84B 0.0s
=> [1/2] FROM docker.io/bitnami/minideb:latest 0.0s
=> CACHED [2/2] COPY simple-server/simple-server my-simple-server 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:61a24712801a996b6ceefb378cd9ebccdb9caae8c58ea7acf17eaff0285666bb 0.0s
=> => naming to docker.io/library/my-simple-container 0.0s
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
my-simple-container latest 61a24712801a About a minute ago 131MB
bitnami/minideb latest c5eecd6244a8 3 days ago 120MB
Because our server exposes port 8080, we want our container to also expose it. Maybe we want to use the same port or maybe we want to remap it. Either way, we'll use `-p <host-port>:<container-port>`:
$ docker run --rm --init -p 8123:8080 fd83da080eab
Then we can connect in another shell on our host to communicate with this container:
$ curl 127.0.0.1:8123 -l -w "\n"
home
# Specify 'bash' as the process to run
$ docker run -p 8123:8080 --rm -it 61a24712801a bash
# image id ------^ ^-- command to run
Oops: we can't exit this container with Ctrl-C:
$ docker run --rm fd83da080eab
^C
From another shell:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
260c882a217e fd83da080eab "/my-simple-server" About a minute ago Up About a minute gracious_hawking
# ^------ this is the container we'll want to kill because oopsie
$ docker kill 260c882a217e
260c882a217e
Avoid this by including the `--init` flag next time you `docker run`:
$ docker run --rm --init fd83da080eab
^C$
If you use `docker run --network=host`, then the container will be able to access the host network. For example:
# In the host OS:
$ ./rust-server &
$ curl 127.0.0.1:8080 -w "\n"
home
$ docker run -it --rm --network=host c5eecd6244a8
# Now in the container
root@hostname:/# curl 127.0.0.1:8080 -w "\n"
home
Finally, we can save an image to a file so it can be moved around without a registry:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 61a24712801a 30 minutes ago 131MB
my-simple-container latest fd83da080eab 30 minutes ago 131MB
bitnami/minideb latest c5eecd6244a8 3 days ago 120MB
$ docker save fd83da080eab | gzip > my-simple-container.tar.gz
$ file my-simple-container.tar.gz
my-simple-container.tar.gz: gzip compressed data, from Unix, original size modulo 2^32 135666688 gzip compressed data, reserved method, ASCII, extra field, encrypted, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 135666688
$ ls -lh my-simple-container.tar.gz
-rw-r--r-- 1 root root 41M Feb 17 04:48 my-simple-container.tar.gz
# Later/elsewhere, this can be loaded:
$ docker load < my-simple-container.tar.gz
Loaded image: my-simple-container:latest
I have some `async` code that I really want to call from another project, but I don't want `async`/`await` to infect my entire codebase. At least using `tokio`, there's an easy way to do this. Given some async project:
cargo new --lib my-async-crate
cargo add tokio
which exposes an `async` function:
pub async fn sleep_a_bit(num_seconds: u64) {
println!("Hold please...");
tokio::time::sleep(std::time::Duration::from_secs(num_seconds)).await;
println!("Thanks for waiting!");
}
We then have a project which wants to use our cool `sleep_a_bit` function:
cargo new my-project
fn main() {
my_async_crate::sleep_a_bit(5);
}
This compiles, but it doesn’t do what we want:
warning: unused implementer of `Future` that must be used
--> src/main.rs:2:5
|
2 | my_async_crate::sleep_a_bit(5);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: futures do nothing unless you `.await` or poll them
= note: `#[warn(unused_must_use)]` on by default
If we run this project, it will immediately exit. Thus, we need to include the `.await` call:
fn main() {
// ---- this is not `async`
my_async_crate::sleep_a_bit(5).await;
// ^^^^^^ only allowed inside `async` functions and blocks
}
What we need is to create a tokio `Runtime` that can synchronously block until the inner asynchronous operations complete. To do this, we add tokio with the `rt-multi-thread` feature (which `Runtime::new` requires):
cargo add tokio --features rt-multi-thread
The main function then creates the runtime and creates a `Future`. For this example, we'll just `block_on`:
fn main() {
let rt = tokio::runtime::Runtime::new().unwrap();
rt.block_on(async {
my_async_crate::sleep_a_bit(5).await;
println!("And we're back!");
});
}
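Conceptually, `block_on` just polls the future to completion, parking the thread until the future signals it can make progress. For intuition only, here's a toy stdlib-only version; it only drives futures that don't need a runtime's timer or IO drivers (so it couldn't run `tokio::time::sleep`), and tokio's real runtime does far more:

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// A waker that unparks the blocked thread when the future can make progress.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Toy block_on: poll the future on the current thread, parking until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    // An async block that is immediately ready; the toy executor resolves it.
    let answer = block_on(async { 21 * 2 });
    println!("The answer is {answer}"); // prints "The answer is 42"
}
```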
Suppose we have an input file, input.json, whose structure looks like this:
{
"documents": [
{ "foo": 1 },
{ "baz": true },
{ "bar": null }
],
"journal": { "timestamp": "2023-04-04T08:28:00" }
}
If we assume that each inner 'document' should simply be treated as an arbitrary JSON `Value`, we can model and read our input as:
use serde::Deserialize;
use serde_json::Value;

#[derive(Deserialize, Debug)]
struct MyData {
documents: Vec<Value>,
journal: Value,
}
fn main() {
let json = std::fs::read_to_string("input.json").unwrap();
let mydata: MyData = serde_json::from_str(&json).unwrap();
println!("{mydata:?}");
}
But what if we only need a subset of 'documents' and/or need to process each into something else, and they are exceedingly large? This would cause significant memory overhead that we want to avoid. One possibility is to roll your own string-reading mechanism, trying to figure out when one document starts and ends, then parsing only that string. This becomes a bit cumbersome, but worse still, it may be error-prone when trying to deal with the arbitrary `journal` value: how do we know if we've finished reading the last document and have arrived at the journal? What if a document legitimately contains a `"journal"` key?
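To see how fragile the hand-rolled approach is, consider naively searching the raw text for the `"journal"` key. The input below is hypothetical, chosen so that a document itself contains that key:

```rust
/// Byte offset of the first `"journal"` key in raw JSON text -- the naive
/// way to guess where the documents end and the journal begins.
fn first_journal_offset(json: &str) -> Option<usize> {
    json.find("\"journal\"")
}

fn main() {
    // Hypothetical input: one of the documents legitimately contains a
    // "journal" key of its own.
    let json = r#"{"documents":[{"journal":"mine"}],"journal":{"t":1}}"#;
    let naive = first_journal_offset(json).unwrap();
    let real = json.rfind("\"journal\"").unwrap();
    // The naive split point lands inside a document, not at the real journal:
    assert!(naive < real);
    println!("naive split at byte {naive}; the real journal key is at byte {real}");
}
```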
Fortunately, `serde` has the capability to do custom serialization and deserialization using a visitor pattern. We can use this approach to handle each document in succession. To do so, we'll create a new type that represents our documents:
#[derive(Debug)]
struct Documents(Vec<Value>);
#[derive(Deserialize, Debug)]
struct MyData {
documents: Documents,
journal: Value,
}
Structurally, this is the same as before, but it allows us to insert our own, manual deserialization step – note that `MyData` derives `Deserialize` while `Documents` doesn't; we'll implement it by hand. The deserialization implementation stub looks like this:
impl<'de> Deserialize<'de> for Documents {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
todo!()
}
}
Before implementing this part, we’ll create the visitor. First, the type that will know how to deserialize our documents:
struct DocumentVisitor;
Note that `DocumentVisitor` itself doesn't collect `Value`s – it just knows how to deserialize them. Serde's visitor pattern has an associated type that will be the (collected) result of deserialization. This output is what we will have filtered and/or processed from each raw JSON value from input. Here's the stub for the visitor:
impl<'de> serde::de::Visitor<'de> for DocumentVisitor {
type Value;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
todo!()
}
}
`expecting` is a required method that:
[Formats] a message stating what data this Visitor expects to receive. … The message should complete the sentence “This Visitor expects to receive …”,
Because our visitor expects a list of documents, we’ll say that:
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(formatter, "a list of JSON values")
}
Also because we’re expecting a list (or sequence) of items, we’ll override the visit_seq
method. We also set the required associated type, Value
, indicating what kind of value this visitor will be returning. (Note that here, the associated type Value
is not the same as serde_json::Value
. The former is what we’ll be telling serde that we’ll return, which is a Vec<Value>
. The latter is specific to JSON data.) In visit_seq
, we’ll repeatedly call seq.next_element()
, propagating up any errors that serde gives us. For now, we’ll just push each item onto a vector that we’ll return:
type Value = Vec<Value>;
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: serde::de::SeqAccess<'de>,
{
let mut values = Vec::new();
while let Some(item) = seq.next_element()? {
println!("Read item='{item}'");
values.push(item)
}
Ok(values)
}
That completes the visitor, and we can now implement `Deserialize` for `Documents`. We'll instantiate a visitor, which is passed to the deserializer's `deserialize_seq` method:
impl<'de> Deserialize<'de> for Documents {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let visitor = DocumentVisitor;
let docs = deserializer.deserialize_seq(visitor)?;
Ok(Documents(docs))
}
}
Note that by passing a `DocumentVisitor` to the deserializer, serde knows that it will be returning a `Vec<Value>` (by virtue of the associated type). Thus, that is the type of `docs`. Our `Deserialize` implementation returns a `Documents` object, so we wrap `docs` in that.
Putting it all together:
use serde::Deserialize;
use serde_json::Value;

#[derive(Debug)]
struct Documents(Vec<Value>);
#[derive(Deserialize, Debug)]
struct MyData {
documents: Documents,
journal: Value,
}
impl<'de> Deserialize<'de> for Documents {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let visitor = DocumentVisitor;
let docs = deserializer.deserialize_seq(visitor)?;
Ok(Documents(docs))
}
}
struct DocumentVisitor;
impl<'de> serde::de::Visitor<'de> for DocumentVisitor {
type Value = Vec<Value>;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(formatter, "a list")
}
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: serde::de::SeqAccess<'de>,
{
let mut values = Vec::new();
while let Some(item) = seq.next_element()? {
println!("Read item='{item}'");
values.push(item)
}
Ok(values)
}
}
The above implementation reads and keeps every value of input. The whole idea here, though, was that we could filter/process our values, so let's now update our code to do that. We'll only keep documents that are themselves objects, then we'll take the first key-value pair (ignoring others), skipping those with null values (eg, `{ "bar": null }`). These first key-value pairs will be aggregated into a single object, returned as a vector of one object:
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: serde::de::SeqAccess<'de>,
{
let mut agg_map = serde_json::Map::new();
while let Some(item) = seq.next_element()? {
// If `item` isn't a JSON object, we'll skip it:
let Value::Object(map) = item else { continue };
// Get the first element, assuming we have some
let (k, v) = match map.into_iter().next() {
Some(kv) => kv,
None => continue,
};
// Ignore any null values; aggregate everything into a single map
if v == Value::Null {
continue;
} else {
println!("Keeping {k}={v}");
agg_map.insert(k, v);
}
}
let values = Value::Object(agg_map);
println!("Final value is {values}");
Ok(vec![values])
}
When running this code, the following output is printed to the console:
Keeping foo=1
Keeping baz=true
Final value is {"baz":true,"foo":1}
I started looking at some simple Docker examples, but they all seem to use Node as a starting point. I don’t want to start there and try to work my way back, so instead, I figured I’d start with a simple Rust service and see if I can start from scratch.
As a total Docker newbie, here’s a fairly brief summary of my misadventures.
Let’s start with the service itself. Wanting to keep this incredibly simple (in this case, avoiding Rust async), I found OxHTTP, a very simple synchronous HTTP server. We’ll start with a new project that uses it:
$ cargo new rust-server
$ cd rust-server
$ cargo add oxhttp
The provided example is just about perfect for what we want; I'll just slightly tweak it by wrapping it in a `main` function:
fn main() {
use oxhttp::Server;
use oxhttp::model::{Response, Status};
use std::time::Duration;
// Builds a new server that returns a 404 everywhere except for "/" where it returns the body 'home'
let mut server = Server::new(|request| {
if request.url().path() == "/" {
Response::builder(Status::OK).with_body("home")
} else {
Response::builder(Status::NOT_FOUND).build()
}
});
// Raise a timeout error if the client does not respond after 10s.
server.set_global_timeout(Duration::from_secs(10));
// Listen to localhost:8080
server.listen(("localhost", 8080)).unwrap();
}
Build this with `cargo build --release` and test it out. I've configured my `~/.cargo/config` to specify a common build directory:
[build]
target-dir = "/home/adam/cargo-target"
This means that I can run my built server with `~/cargo-target/release/rust-server`, and when I visit http://localhost:8080 in my browser, I see the HTTP response "home". The server now works, so I copied the binary into the current working directory.
Next, we’ll need to build a Dockerfile. As I said, I know just about nothing about Docker, but I want to avoid the Node route. It seems pretty much everything is built off of Alpine, so I’ll start there:
FROM alpine:latest
COPY rust-server rust-server
CMD ["rust-server"]
Building this is quick and completes without issue:
$ docker build -t my-rust-server:latest .
[+] Building 0.6s (7/7) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 113B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 4.70MB 0.1s
=> CACHED [1/2] FROM docker.io/library/alpine:latest 0.0s
=> [2/2] COPY rust-server rust-server 0.2s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:108dbb6764b4e6c94cc3bde571eb2157dceb57aab1ac3f393577174c1175a282 0.0s
=> => naming to docker.io/library/my-rust-server:latest 0.0s
Then I run it:
$ docker run -t my-rust-server:latest
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "rust-server": executable file not found in $PATH: unknown.
ERRO[0001] error waiting for container: context canceled
It seems that `COPY foo foo` places `foo` into the root directory (ie, `/`), which isn't in `$PATH`, I guess? So let's try putting it into `/bin/`:
FROM alpine:latest
COPY rust-server /bin/rust-server
CMD ["/bin/rust-server"]
$ docker run -t my-rust-server:latest
exec /bin/rust-server: no such file or directory
This is a different error, so something changed. But it’s still not finding it? Let’s inspect the container:
$ docker run -it my-rust-server:latest /bin/sh
/ # ls /bin/rust-server
/bin/rust-server
/ # file /bin/rust-server
/bin/sh: file: not found
The binary is definitely there. I tried `file` to see what the system thinks the binary is, but Alpine doesn't have it. Instead, we can try `ldd` to get details:
/ # ldd /bin/rust-server
/lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
Error loading shared library libgcc_s.so.1: No such file or directory (needed by /bin/rust-server)
librt.so.1 => /lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /bin/rust-server)
Error relocating /bin/rust-server: _Unwind_Resume: symbol not found
Error relocating /bin/rust-server: _Unwind_Backtrace: symbol not found
Ohh, so it's not that my Docker image can't find my binary but that when trying to run my binary, it can't find the dynamically-linked libgcc. A quick search on how to install packages in Alpine (since it's not Debian-based, I can't use `apt`) shows that it uses `apk`, and `libgcc` exists in Alpine's package repository. Adding this to the Dockerfile:
FROM alpine:latest
COPY rust-server /bin/rust-server
# These are new:
RUN apk update
RUN apk add libgcc
CMD ["/bin/rust-server"]
Running this still gives the `no such file or directory` error. So let's inspect with `ldd` again:
/ # ldd /bin/rust-server
/lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7f1e6d2bc000)
librt.so.1 => /lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /bin/rust-server)
Error relocating /bin/rust-server: __res_init: symbol not found
Error relocating /bin/rust-server: gnu_get_libc_version: symbol not found
`libgcc` is no longer a problem, but `ld-linux` still is. And it appears that `ld-linux` is part of `gcompat`. After adding `RUN apk add gcompat`, rebuilding, and rerunning, the message "Error loading shared library ld-linux-x86-64.so.2" goes away, but the "__res_init" and "gnu_get_libc_version" errors remain.
I did some further sleuthing and found a suggestion on Reddit to use this hack to make it work, but instead of continuing down this rabbit hole, I decided to try another approach I saw: `musl`.
Rust can compile to a number of build targets; in my dev environment (Ubuntu in WSL2), the default is:
$ rustc -vV | grep host
host: x86_64-unknown-linux-gnu
We can find supported targets with `rustup target list`. Doing this shows that there's `x86_64-unknown-linux-musl`. Let's install this toolchain and compile the server:
$ rustup target add x86_64-unknown-linux-musl
(...)
$ cargo build --target=x86_64-unknown-linux-musl --release
$ mv rust-server rust-server-old
$ cp ~/cargo-target/x86_64-unknown-linux-musl/release/rust-server .
We can compare the old binary to the new one:
$ file rust-server-old rust-server
rust-server-old: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=523b84a693e7b90bcf8332d2eecd51cc9bfbe45a, with debug_info, not stripped
rust-server: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped
$ ldd rust-server-old
linux-vdso.so.1 (0x00007ffea2be5000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd1fa870000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd1fa668000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd1fa449000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd1fa245000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd1f9e54000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd1fad44000)
$ ldd rust-server
statically linked
(Side note: the old and new binaries are 4.5MB and 4.9MB, respectively, showing the cost of static linking. However, if we `strip` both binaries, their sizes – and the marginal difference – drop: 751KB and 865KB, respectively.)
Since the new binary is statically linked, we don't need to install extra apk packages, so the Dockerfile is now back to:
FROM alpine:latest
COPY rust-server /bin/rust-server
CMD ["/bin/rust-server"]
This builds very quickly, and calling `docker run` now has a service running. Going to http://localhost:8080 should work, right?
$ curl localhost:8080
curl: (7) Failed to connect to localhost port 8080: Connection refused
Ah, but we need to publish the container’s port to the host:
$ docker run -p 8080:8080 -t my-rust-server:latest
# In another terminal (because `docker run` is blocking):
$ curl localhost:8080
curl: (52) Empty reply from server
So it’s connecting but not getting any data?
$ telnet localhost 8080
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.
$ telnet localhost 8081
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
The connection on 8080 is opened but immediately closed; the attempt on 8081 fails, as expected, because there’s nothing on that port – it’s showing that there’s something different about 8080. So Docker is forwarding the port, and something is listening. Surely that’s our server.
Looking at the Rust code, we notice that we're listening on localhost, port 8080:
server.listen(("localhost", 8080)).unwrap();
But wait: it turns out that there’s a difference:
127.0.0.1:xxxx is the normal loopback address, and localhost:xxxx is the hostname for 127.0.0.1:xxxx.
0.0.0.0 is slightly different, it’s an address used to refer to all IP addresses on the same machine. Or no specific IP address.
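The difference is easy to demonstrate with the standard library's `TcpListener`; this is a minimal sketch, independent of OxHTTP or Docker:

```rust
use std::net::TcpListener;

fn main() {
    // Port 0 lets the OS pick any free port; the interesting part is the IP.
    // 127.0.0.1 binds only the loopback interface:
    let loopback = TcpListener::bind(("127.0.0.1", 0)).unwrap();
    println!("loopback bound to {}", loopback.local_addr().unwrap());

    // 0.0.0.0 binds every interface on the machine -- this is what a server
    // inside a container needs so Docker's port forwarding can reach it:
    let all = TcpListener::bind(("0.0.0.0", 0)).unwrap();
    println!("all interfaces bound to {}", all.local_addr().unwrap());
}
```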
Simply changing from "localhost" to "0.0.0.0", recompiling, rebuilding the image, and rerunning does the trick:
$ curl localhost:8080
home
I'm familiar with (but have no experience using) Docker Hub, and I've only briefly played with Azure Container Registry, so I thought I'd first start with the simplest option: saving the image to a file:
$ docker save my-rust-server:latest | gzip > my-rust-server.tar.gz
$ ls -lh my-rust-server.tar.gz
-rw-r--r-- 1 adam adam 4.5M Feb 24 16:30 my-rust-server.tar.gz
Now let’s remove the image from Docker, make sure we can re-load it, and run it again:
$ docker image rm my-rust-server:latest
Untagged: my-rust-server:latest
Deleted: sha256:0ed0fda582a3c568fdb8f4a313a464ce3244442d1f1d36934be9bb29e8b9e4fd
$ docker images | grep my-rust
$ docker load < my-rust-server.tar.gz
Loaded image: my-rust-server:latest
$ docker images | grep my-rust
my-rust-server latest 732da9278f98 18 minutes ago 12.1MB
$ docker run -it my-rust-server:latest /bin/sh
/ # ls /bin/rust-server
/bin/rust-server
rayon provides an incredibly simple work-stealing framework that, in my experience, requires only two lines of code and can dramatically improve processing throughput. To use it, add it to your Cargo.toml with `cargo add rayon`.
Consider some function that does some intensive work:
/// Do some number of iterations of work
fn do_work(worker: usize, iterations: usize) {
println!("Worker {worker} doing work");
if iterations > 0 {
// simulate long-running work with 'sleep'
// we might do different kinds of work depending on the worker,
// eg, open a different file of input.
std::thread::sleep(std::time::Duration::from_secs(1));
do_work(worker, iterations - 1)
}
}
Doing this serially might look like this:
const NUM_WORKERS: usize = 5;
const NUM_ITERATIONS: usize = 4;
fn main() {
let s = std::time::Instant::now();
(1..=NUM_WORKERS).for_each(|worker| do_work(worker, NUM_ITERATIONS));
println!("Work took {:?}", s.elapsed());
}
This produces the very boring output:
Worker 1 doing work
Worker 1 doing work
Worker 1 doing work
Worker 1 doing work
Worker 1 doing work
Worker 2 doing work
Worker 2 doing work
Worker 2 doing work
Worker 2 doing work
Worker 2 doing work
Worker 3 doing work
Worker 3 doing work
Worker 3 doing work
Worker 3 doing work
Worker 3 doing work
Worker 4 doing work
Worker 4 doing work
Worker 4 doing work
Worker 4 doing work
Worker 4 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Work took 20.007646351s
This might be rather inefficient, especially if we have many CPU cores sitting idle. Instead, we can use rayon and one of the `*par_iter` variations:
use rayon::prelude::*; // this is new
fn main() {
let s = std::time::Instant::now();
(1..=NUM_WORKERS)
.into_par_iter() // and this is new
.for_each(|worker| do_work(worker, NUM_ITERATIONS));
println!("Work took {:?}", s.elapsed());
}
This is much faster, as it will parallelize the work according to the number of CPUs available:
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 1 doing work
Worker 5 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Work took 8.011677642s
Two things to note here: first, rayon defaults to one thread per available CPU core (four, in this run), which is why worker 5 had to wait for a free thread; second, you can build your own `ThreadPool` to configure the number of threads:
fn main() {
let s = std::time::Instant::now();
let pool = rayon::ThreadPoolBuilder::new()
.num_threads(NUM_WORKERS) // use one thread per work slice
.build()
.unwrap();
pool.install(|| {
(1..=NUM_WORKERS)
.into_par_iter()
.for_each(|worker| do_work(worker, NUM_ITERATIONS));
});
println!("Work took {:?}", s.elapsed());
}
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 5 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 2 doing work
Worker 5 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 2 doing work
Worker 5 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 2 doing work
Worker 5 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 2 doing work
Worker 5 doing work
Work took 4.002877261s
Alternatively, you may want to limit your parallelism to leave compute available to other tasks:
fn main() {
let s = std::time::Instant::now();
let pool = rayon::ThreadPoolBuilder::new()
.num_threads(2) // use only two threads
.build()
.unwrap();
pool.install(|| {
(1..=NUM_WORKERS)
.into_par_iter()
.for_each(|worker| do_work(worker, NUM_ITERATIONS));
});
println!("Work took {:?}", s.elapsed());
}
Worker 1 doing work
Worker 3 doing work
Worker 3 doing work
Worker 1 doing work
Worker 1 doing work
Worker 3 doing work
Worker 3 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 1 doing work
Worker 2 doing work
Worker 4 doing work
Worker 2 doing work
Worker 4 doing work
Worker 2 doing work
Worker 4 doing work
Worker 2 doing work
Worker 4 doing work
Worker 5 doing work
Worker 2 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Work took 12.004555134s
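For contrast, here's roughly what hand-rolling the first parallel example with `std::thread` looks like: one OS thread per worker, with none of rayon's pooling or work stealing, and the sleep shortened to 10ms so the sketch runs quickly. rayon's two-line change buys all of this plus a CPU-sized thread pool for free.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Same shape as the rayon examples' do_work, with a much shorter sleep.
fn do_work(worker: usize, iterations: usize) {
    println!("Worker {worker} doing work");
    if iterations > 0 {
        thread::sleep(Duration::from_millis(10));
        do_work(worker, iterations - 1)
    }
}

fn main() {
    let s = Instant::now();
    // Spawn one OS thread per worker and wait for them all.
    let handles: Vec<_> = (1..=5)
        .map(|worker| thread::spawn(move || do_work(worker, 4)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("Work took {:?}", s.elapsed());
}
```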
I've updated my `joinable` crate (source code here) and published a new `irisdata` crate (source code) containing the Iris dataset, which is well-known in the data science field.
This update to `joinable` renames the `Joinable` trait to `JoinableGrouped` to reflect that the results (at least of inner- and outer-joins) group the right-hand side. It also adds a new trait with the `Joinable` name that behaves perhaps more intuitively – each left-hand record can be yielded multiple times (as matches are found).
`Joinable` only defines the `inner_join` and `outer_join` methods. `JoinableGrouped` defines `inner_join_grouped`, `outer_join_grouped`, `semi_join`, and `anti_join`.
use std::cmp::Ordering;
use irisdata::{Species, IRIS_DATA};
use joinable::{JoinableGrouped, RHS};
#[derive(Debug)]
struct IrisData {
species: Species,
common_name: &'static str,
average_sepal_length: f32,
average_sepal_width: f32,
average_petal_length: f32,
average_petal_width: f32,
}
fn main() {
let common_names = [
(Species::IrisVersicolor, "blue flag"),
(Species::IrisVersicolor, "harlequin blueflag"),
(Species::IrisVersicolor, "larger blue flag"),
(Species::IrisVersicolor, "northern blue flag"),
(Species::IrisVersicolor, "poison flag"),
(Species::IrisVirginica, "Virginia blueflag"),
(Species::IrisVirginica, "Virginia iris"),
(Species::IrisVirginica, "great blue flag"),
(Species::IrisVirginica, "southern blue flag"),
];
let joined = common_names
.iter()
.inner_join_grouped(RHS::new_unsorted(&IRIS_DATA[..]), |(lhs_species, _), r| {
if *lhs_species == r.species {
Ordering::Equal
} else {
Ordering::Less
}
})
.map(|(lhs, grp)| IrisData {
species: lhs.0,
common_name: lhs.1,
average_sepal_length: grp.iter().map(|i| i.sepal_length).sum::<f32>() / grp.len() as f32,
average_sepal_width: grp.iter().map(|i| i.sepal_width).sum::<f32>() / grp.len() as f32,
average_petal_length: grp.iter().map(|i| i.petal_length).sum::<f32>() / grp.len() as f32,
average_petal_width: grp.iter().map(|i| i.petal_width).sum::<f32>() / grp.len() as f32,
})
.collect::<Vec<_>>();
println!("{joined:#?}");
}
I wasn’t personally involved in the phone call, but I was call-adjacent and aghast at the pen-and-paper approach to figuring out who should be where. Parents expressed their interest in having their kids with this scout but not with that scout. Boys and girls can’t share a tent. Scouts may only share a tent with other scouts within three years of age (ie, no 17 year-old scouts bunking with 12 year-olds). The mental effort and time that went into that work annoyed my inner geek, so I proceeded to spend the next several hours solving the general case. It was a good excuse to play around with SMT again.
Rather than dive into all the details, I’ll simply share the public Gist with my v1 implementation.
The solution is a Python script that generates SMT-LIB code that is evaluated by Z3. After a few configuration settings (such as `NUM_TENTS`, identifying how many tents are available), you specify the set of scouts with their age and gender:
scouts = [
('Abe', 14, 'm'), # 0
('Brian', 13, 'm'), # 1
('Charlie', 14, 'm'), # 2
('Dave', 13, 'm'), # 3
('Eddie', 14, 'm'), # 4
('Lily', 15, 'f'), # 5
('Megan', 14, 'f'), # 6
]
The output is a model that tells you who is in which tent. In this example, tent0 contains scouts 5 (Lily) and 6 (Megan):
(define-fun tent0 ((x!0 Int)) Int
(ite (= x!0 2) 5
(ite (= x!0 3) 6
(- 1))))
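The gender and age rules reduce to pairwise constraints: for every incompatible pair of scouts, require that they be assigned different tents. As a rough sketch of how a generator can emit these (the tent-of function name is hypothetical; the actual Gist encodes tents differently):

```python
# Sketch: emit pairwise "must not share a tent" constraints as SMT-LIB.
# `tent-of` is a hypothetical uninterpreted function mapping scout index -> tent.
scouts = [
    ('Abe', 14, 'm'),
    ('Lily', 15, 'f'),
    ('Megan', 14, 'f'),
]

def incompatible(a, b):
    (_, age_a, sex_a) = a
    (_, age_b, sex_b) = b
    # Mixed genders can't share a tent; neither can scouts more than 3 years apart.
    return sex_a != sex_b or abs(age_a - age_b) > 3

for i in range(len(scouts)):
    for j in range(i + 1, len(scouts)):
        if incompatible(scouts[i], scouts[j]):
            print(f'(assert (distinct (tent-of {i}) (tent-of {j})))')
```

For the three scouts above, this prints constraints separating Abe from Lily and Abe from Megan; Lily and Megan (same gender, one year apart) may share.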
As usual, let’s start with the dependencies. Our webserver should run asynchronously for maximum throughput, so we’ll use tokio
and futures
, and of course the warp
crate:
[dependencies]
tokio = { version = "1", features = ["full"] }
futures = "0.3"
warp = "0.3"
warp uses the concept of composable request Filters: components that match requests, extract data from them (URI components, query parameters, request bodies, etc.), and chain together (and/or).
To start, we’ll make our main
function asynchronous and add a route that matches the root of the server and prints a welcome message:
use warp::Filter;
#[tokio::main]
async fn main() {
let index = warp::path::end().map(|| "Welcome!");
let routes = index;
warp::serve(routes).run(([0, 0, 0, 0], 3000)).await;
}
warp::path::end
is used to identify that the path handling is complete, and since it’s not chained with any previous components, it effectively matches “/”. (Think of it like the regular expression "/$"
.) We then .map
the input request to the &str
output.
In this post, I’m using the convention of serving up a routes
value; in this first example, we’re only serving a single route.
Unsurprisingly, if you run this project and go to http://127.0.0.1:3000/, you will see the welcome text.
This first route isn’t particularly interesting, so let’s handle an input path such as /hello/adam
as a way to say hello to the user; its handler will be added as its own function.
use warp::{path, Filter};
#[tokio::main]
async fn main() {
let index = warp::path::end().map(|| "Welcome!");
let hello = path!("hello" / String).then(handle_hello);
let routes = index.or(hello);
warp::serve(routes).run(([0, 0, 0, 0], 3000)).await;
}
async fn handle_hello(name: String) -> impl warp::Reply {
format!("Hello, {}", name)
}
There are several changes here:
- hello is defined using the warp::path! macro, which adds convenience for declaring URI path components and arguments. path!("hello" / String) declares that we’re handling a path that starts with the literal hello, then some String argument.
- We build routes by handling any request that matches the root (via index) or any request that matches the hello handler. When a request comes in, warp will check these in order. (Requests that match none of these are discussed below.)
- The handle_hello function accepts the provided argument and returns some type that implements Reply. Note that under the hood, this async function ends up returning a Future, but this is transparent to us.
- Because handle_hello is asynchronous, we chain it with .then; unintuitively, if handle_hello were synchronous, we’d use .map instead.
In addition to the welcome message at the root, you can now go to http://127.0.0.1:3000/hello/adam to see "Hello, adam".
The goodbye
handler will be very similar to hello
but with some minor tweaks. First, we might want to either return 200 OK (which is the default) or some alternate status code. The new handler will conditionally return an error code for certain kinds of input:
use std::convert::Infallible;
use warp::{hyper::StatusCode, reply, Reply};
#[tokio::main]
async fn main() {
// ...
let goodbye = warp::path("goodbye")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_goodbye);
let routes = index.or(hello).or(goodbye);
// ...
}
async fn handle_goodbye(name: String) -> Result<impl Reply, Infallible> {
if name == "earl" {
Ok(reply::with_status(
"Earl Grey is a tea".to_string(),
StatusCode::IM_A_TEAPOT,
))
} else {
Ok(reply::with_status(
format!("Goodbye, {}", name),
StatusCode::OK,
))
}
}
Declaring goodbye now uses the individual warp Filters path, param, and end, which together do what the path! macro did for hello
. As is common in Rust, type inference is used to determine the type that param
expects (by virtue of the function we’re calling). Note that both handle_hello
and handle_goodbye
use owned values (ie, String instead of &str), which is required for async functions for reasons outside the scope of this post.
The function handle_goodbye
now returns a Result<_, Infallible>
. This is to say, this function cannot fail (all code paths must return Ok
) but still returns a Result
. There are times in which you must return a Result
(eg, because some trait requires it), and if the function never fails, we can use Infallible
as the error type. Because this function returns a Result, we switch goodbye
from using .then
to .and_then
– also unintuitive, IMO.
Finally, this function uses the reply::with_status
function to return two different replies and statuses depending on some condition (here, the value of name
). But both branches will return the same concrete type (that is, warp::reply::WithStatus
), so we can still use impl Reply
.
All three routes are currently infallible – if an HTTP request matches the path for a route, warp will respond to it. Usually it’s with a 200 OK, but sometimes with 418 IM_A_TEAPOT. And the responses all contain a text body. But what if we want to redirect to another page? Or what if we start handling a request, decide that the designated function isn’t equipped to handle it, and want another function to take over? This is where we make use of Rejection
s. First, let’s set up a login
route:
#[tokio::main]
async fn main() {
// ...
let login = warp::path("login")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_login);
let routes = index.or(hello).or(goodbye).or(login);
// ...
}
(For simplicity, we’re still just using a GET request with path parameters, such as /login/adam
.)
The handling function is now fallible and can reject the request (which is to say that it will allow another handler to potentially pick it up). Let’s assume there are a few users that we don’t want to log in: agent_smith
and neo
:
async fn handle_login(name: String) -> Result<String, warp::Rejection> {
if name == "agent_smith" {
todo!()
} else if name == "neo" {
todo!()
} else {
Ok(format!("You are now logged in as '{}'", name))
}
}
(This function’s happy path returns a String
instead of impl Reply
only to show that it’s possible to declare it that way. String
does implement Reply
, so this is functionally identical.)
What should we do for these users? warp::reject
provides a not_found
function that will reject a request. For Agent Smith, let’s use that:
if name == "agent_smith" {
Err(warp::reject::not_found())
} else if name == "neo" {
todo!()
}
But ‘not found’ isn’t particularly descriptive, and it doesn’t give us much control over how the rejection is subsequently handled. We can create our own type that implements Debug
and Reject
, then we can return this as a custom rejection:
#[derive(Debug)]
struct Neo;
impl warp::reject::Reject for Neo {}
async fn handle_login(name: String) -> Result<String, warp::Rejection> {
if name == "agent_smith" {
Err(warp::reject::not_found())
} else if name == "neo" {
Err(warp::reject::custom(Neo))
} else {
Ok(format!("You are now logged in as '{}'", name))
}
}
Now we have three different types of responses to our three different logins:
"You are now logged in as 'adam'" (200, for ordinary names)
404 Not Found (for agent_smith)
"Unhandled rejection: Neo" (500, for neo)
But how can we make use of these rejected requests?
Rejected requests can be recovered with .recover
. The Rejection
is passed to the recovery function which can then do something with it. (That something might itself be another rejection.)
#[tokio::main]
async fn main() {
// ...
let login = warp::path("login")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_login)
.recover(handle_rejection);
// ...
}
async fn handle_rejection(err: warp::Rejection) -> Result<Box<dyn Reply>, warp::Rejection> {
if err.is_not_found() {
Ok(Box::new(warp::redirect(warp::hyper::Uri::from_static("/"))))
} else if err.find::<Neo>().is_some() {
Ok(Box::new(reply::with_status(
"Follow the white rabbit",
StatusCode::UNAUTHORIZED,
)))
} else {
Ok(Box::new(reply::with_status(
r#"¯\_(ツ)_/¯"#,
StatusCode::INTERNAL_SERVER_ERROR,
)))
}
}
We are rejecting /login/agent_smith
with a not-found rejection, so those can be handled by checking err.is_not_found
. In that case, we’ll redirect to the root of our server. And the Neo
rejection type is handled using err.find::<>
. In that case, we construct a 401 response with the specified message. All other rejections – which shouldn’t be possible right now – are gracefully handled with a 500. Note that this means the request can’t fall through to any other recovery function should we add one later on.
You’ll also note that this function doesn’t return an impl Reply
but a Box<dyn Reply>
. This is because we now no longer have one concrete type being returned but two: the WithStatus
as before but also whatever redirect
returns, which in this case is a warp::reply::WithHeader
. Thus we have to box the return type.
All of the request handlers we’ve set up, regardless of their fallibility, handle specific paths, such as /
, /hello/adam
, and /login/agent_smith
. But our server doesn’t know how to handle other requests such as /about/contact.html
. This can be handled with a final handler that matches all requests:
let fallthrough = warp::any().map(|| "All other requests here");
let routes = index.or(hello).or(goodbye).or(login).or(fallthrough);
warp::any()
matches all requests. Since fallthrough
comes after login
in our route handling, if a login
request is ultimately still rejected, those requests will also be handled with 200 "All other requests here"
.
Here’s the code of this entire sample, combined and with comments:
use std::convert::Infallible;
use warp::{hyper::StatusCode, path, reply, Filter, Reply};
#[tokio::main]
async fn main() {
// The index (/) of our webserver shows a simple message.
let index = warp::path::end().map(|| "Welcome!");
// A simple GET route declaration using the `path!` macro: we can
// declare the route ("hello") and the expected parameter type (String)
let hello = path!("hello" / String).then(handle_hello);
// The same idea as above but with the individual warp components.
// Additionally, `goodbye` can return error codes for 'bad' input.
let goodbye = warp::path("goodbye")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_goodbye);
// `handle_login` might reject some requests, but the subsequent
// `handle_rejection` will take care of (some of) them.
// NB, `.recover` could be added to `routes` instead.
let login = warp::path("login")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_login)
.recover(handle_rejection);
// This handler just catches everything in a rather uninteresting way.
let fallthrough = warp::any().map(|| "All other requests here");
// warp will handle requests in the following order:
let routes = index
.or(hello)
.or(goodbye)
.or(login)
.or(fallthrough);
warp::serve(routes).run(([0, 0, 0, 0], 3000)).await;
}
/// This function is infallible, so we can simply return an impl Reply.
/// To use it, we make use of [warp::Filter::then], which expects a function that returns a Future.
async fn handle_hello(name: String) -> impl warp::Reply {
format!("Hello, {}", name)
}
/// This function is also infallible, but for demonstration purposes, we'll return a `Result<_, Infallible>`.
/// Because of this, we use [warp::Filter::and_then], which is normally for fallible async functions.
///
/// Unrelated to fallibility, this function may return different error codes depending on the input.
/// We rewrite the response with a [warp::hyper::StatusCode], so the impl Reply is a [warp::reply::WithStatus].
/// And because [`impl Trait`](https://doc.rust-lang.org/rust-by-example/trait/impl_trait.html) returns a concrete
/// type, the two branches here must be the same type -- that is, we can't return a String on one
/// side and a WithStatus on the other.
async fn handle_goodbye(name: String) -> Result<impl Reply, Infallible> {
if name == "earl" {
Ok(reply::with_status(
"Earl Grey is a tea".to_string(),
StatusCode::IM_A_TEAPOT,
))
} else {
Ok(reply::with_status(
format!("Goodbye, {}", name),
StatusCode::OK,
))
}
}
#[derive(Debug)]
struct Neo;
impl warp::reject::Reject for Neo {}
/// On login, we might reject certain inputs and allow some other request handler
/// to take over.
///
/// There are two types of rejections here: for `"agent_smith"`, we return a 404,
/// while for `"neo"`, we'll return the custom rejection defined above.
async fn handle_login(name: String) -> Result<impl Reply, warp::Rejection> {
if name == "agent_smith" {
Err(warp::reject::not_found())
} else if name == "neo" {
Err(warp::reject::custom(Neo))
} else {
Ok(format!("You are now logged in as '{}'", name))
}
}
/// When specifying the routes we want to serve, we can `.recover` them with this function.
/// Any Rejection that comes before the recovery will be sent here, and we can handle it
/// or send it back for yet another later recovery.
///
/// Additionally, though not required, this function returns different kinds of replies, so
/// the return type is `Box<dyn Reply>`, and each concrete type is boxed accordingly.
async fn handle_rejection(err: warp::Rejection) -> Result<Box<dyn Reply>, warp::Rejection> {
if err.is_not_found() {
Ok(Box::new(warp::redirect(warp::hyper::Uri::from_static("/"))))
} else if err.find::<Neo>().is_some() {
Ok(Box::new(reply::with_status(
"Follow the white rabbit",
StatusCode::UNAUTHORIZED,
)))
} else {
Ok(Box::new(reply::with_status(
r#"¯\_(ツ)_/¯"#,
StatusCode::INTERNAL_SERVER_ERROR,
)))
}
}
Let’s start with a very simple state transition: stoplights. A stoplight has exactly three states and a simple, circular state diagram: green -> yellow -> red -> green -> ...
, which we can represent as:
#[derive(Debug, PartialEq)]
enum Stoplight {
Red,
Yellow,
Green,
}
impl Stoplight {
pub fn next(&self) -> Stoplight {
match *self {
Stoplight::Green => Stoplight::Yellow,
Stoplight::Yellow => Stoplight::Red,
Stoplight::Red => Stoplight::Green,
}
}
}
fn test_stoplight() {
let mut stoplight = Stoplight::Green;
stoplight = stoplight.next(); // yellow
stoplight = stoplight.next(); // red
stoplight = stoplight.next(); // green
assert_eq!(stoplight, Stoplight::Green);
}
Easy peasy. But we’re not storing any data or doing anything particularly interesting yet.
A better example might be a shopping cart. We might also have three states to a cart: Empty
, InProgress
, and Completed
. (We could have more, such as Shipped
and Delivered
, but I’m keeping three for simplicity.) Again, we could represent this as an enum but this time with data as appropriate:
use std::time::Instant;
#[derive(Debug)]
pub enum ShoppingCart {
Empty,
InProgress {
started: Instant,
products: Vec<String>,
},
Completed {
started: Instant,
completed: Instant,
products: Vec<String>,
total: f32,
},
}
impl ShoppingCart {
// TODO
}
This gives us three possible states. An empty cart has no data. It doesn’t “start” until a product is added, at which point we’ll know the start time and have a non-empty list of products
. (For simplicity, I’m using Instant
instead of something like chrono::DateTime
.) A completed cart has the start time and list of products but also a completed time and the calculated total cost.
What functionality does this need in the impl
? Well, we need to create a new cart:
pub fn new() -> ShoppingCart {
ShoppingCart::Empty
}
And we need to be able to add a product. An Empty
cart needs to change into an InProgress
with the one item; an InProgress
needs to just add the item:
pub fn add(&mut self, product: String) {
match self {
ShoppingCart::Empty => {
*self = ShoppingCart::InProgress {
started: Instant::now(),
products: vec![product],
};
}
ShoppingCart::InProgress { products, .. } => {
products.push(product);
}
ShoppingCart::Completed { .. } => panic!("Cannot add to a completed cart"),
}
}
Here’s where we see problems arise: if a cart has been completed, what should happen when .add()
is called? In this code, we panic. Alternately, we could return a Result
to propagate errors. But let’s continue with the checkout
function, which applies to non-empty carts:
pub fn checkout(self) -> Self {
match self {
ShoppingCart::Empty => panic!("Can't checkout an empty cart"),
ShoppingCart::InProgress { started, products } => {
let total = products
.iter()
.map(|p| match &p[..] {
"Apple" => 1.10,
"Orange" => 0.75,
_ => todo!("Handle other products here"),
})
.sum();
ShoppingCart::Completed {
started,
completed: Instant::now(),
products,
total,
}
}
ShoppingCart::Completed { .. } => panic!("Can't checkout a completed cart"),
}
}
Again, more errors. But we can now test this and everything works:
pub fn test_cart_enum() {
let mut cart = ShoppingCart::new();
cart.add("Apple".to_string());
cart.add("Orange".to_string());
let cart = cart.checkout();
println!("{:?}", cart);
}
An aside: .add()
takes &mut self
and will simply modify the existing value (or, in the case of an empty cart, will replace self
with an updated discriminant) while .checkout()
consumes self
and returns a new value. The reason for this difference is that .checkout()
consuming and returning can use move semantics to maintain the existing list of values. In order to remain &mut self
, it would have to either clone the product list or do some wonky mem::replace
to guarantee that self
is always valid. Probably not a great API design.
The above code isn’t great from a safety/correctness perspective. We can gracefully handle runtime errors with Result
, but wouldn’t it be better to make it impossible to hit them?
Ideally, an empty cart shouldn’t ever be able to call .checkout()
. That is, we’d like to do something like this:
let cart = ShoppingCart::new();
let done = cart.checkout(); // this *should* fail to compile
Let’s start with the states we need but instead of specifying them as an enum, they are unit structs. We’ll also define the shopping cart that can be of some type T
and will contain all the data we need:
struct Empty;
struct InProgress;
struct Complete;
struct ShoppingCart<T> {
started: Instant,
completed: Instant,
products: Vec<String>,
total: f32,
phantom: PhantomData<T>,
}
hic svnt dracones (here be dragons): this may not be best practice, but it works.
What does this mean? We will use the three unit structs to represent the type of ShoppingCart
; that is, a ShoppingCart<Empty>
is a different type than ShoppingCart<InProgress>
, and they can have independent implementations.
But what’s this PhantomData<T>
? It appeases the compiler because we’re not actually storing a T
; PhantomData
makes our ShoppingCart<T>
act like it contains a T
. We could have instead used, say, _state: T
if we wanted to.
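One nice property of PhantomData is that it is zero-sized, so the marker costs nothing at runtime. A quick sketch (with a stripped-down cart, not the full struct above) to confirm:

```rust
use std::marker::PhantomData;

struct Empty;
struct InProgress;

// A stripped-down cart: one real field plus a compile-time-only marker.
struct Cart<T> {
    total: f32,
    phantom: PhantomData<T>,
}

fn main() {
    let a: Cart<Empty> = Cart { total: 0.0, phantom: PhantomData };
    let _b: Cart<InProgress> = Cart { total: a.total, phantom: PhantomData };
    // The marker takes no space: Cart<T> is exactly as big as its f32 field.
    assert_eq!(std::mem::size_of::<Cart<Empty>>(), std::mem::size_of::<f32>());
    assert_eq!(std::mem::size_of::<PhantomData<Empty>>(), 0);
    println!("PhantomData is zero-sized");
}
```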
Interestingly, we can have different impl
blocks for the different states:
impl ShoppingCart<Empty> {
fn new() -> Self {
ShoppingCart {
started: Instant::now(),
completed: Instant::now(),
products: Vec::new(),
total: 0.0,
phantom: PhantomData,
}
}
}
fn test_cart_marker() {
let cart = ShoppingCart::new(); // of type ShoppingCart<Empty>
}
Currently, there exists exactly one implementation of ShoppingCart
for which a new
function exists, so Rust knows that it must be a ShoppingCart<Empty>
.
Even though the cart has neither been started (per our definition above) nor completed, the struct still carries those fields. We could make the fields optional and set them to None
. But we also know the current state is Empty
and can thus set but ignore the values. Again, this may not be best practice.
To add a product to a cart, we provide an .add()
function to both the empty cart:
impl ShoppingCart<Empty> {
fn add(self, product: String) -> ShoppingCart<InProgress> {
ShoppingCart {
// The only fields we really care about:
started: Instant::now(),
products: vec![product],
// Everything else has a valid value that we'll just ignore
completed: self.completed,
total: self.total,
phantom: PhantomData,
}
}
}
And the separate implementation of ShoppingCart<InProgress>
:
impl ShoppingCart<InProgress> {
fn add(mut self, product: String) -> ShoppingCart<InProgress> {
self.products.push(product);
self
}
}
The distinction here is that the empty cart needs to create an in-progress cart while an in-progress cart can keep its state and simply add a product to its already existing list. Now, our test code can be:
fn test_cart_marker() {
let cart = ShoppingCart::new(); // of type ShoppingCart<Empty>
let cart = cart.add("Apple".to_string()); // of type ShoppingCart<InProgress>
let cart = cart.add("Orange".to_string()); // of type ShoppingCart<InProgress>
}
By virtue of calling cart.add()
on an empty cart, we are returned a differently typed value. That is, just as a Vec<u32>
is an altogether different animal than a Vec<bool>
, so too is ShoppingCart<Empty>
different than ShoppingCart<InProgress>
. So it is with an in-progress cart being completed:
impl ShoppingCart<InProgress> {
fn complete(self) -> ShoppingCart<Complete> {
let total = self
.products
.iter()
.map(|p| match &p[..] {
"Apple" => 1.10,
"Orange" => 0.75,
_ => panic!("Unknown product"),
})
.sum();
ShoppingCart {
started: self.started,
completed: Instant::now(),
total,
products: self.products,
phantom: PhantomData,
}
}
}
For now, there’s no implementation of the completed cart. Specifically, we don’t create an .add()
function on it, which means:
fn test_cart_marker() {
let cart = ShoppingCart::new(); // of type ShoppingCart<Empty>
let cart = cart.add("Apple".to_string()); // of type ShoppingCart<InProgress>
let cart = cart.add("Orange".to_string()); // of type ShoppingCart<InProgress>
let cart = cart.complete();
assert_eq!(cart.total, 1.85);
//cart.add("Banana".to_string()); // no method named `add` found for struct `ShoppingCart<Complete>` in the current scope
}
Because the type is changing on the first call to .add()
and on the call to .complete()
, we have to shadow the variables – we can’t make cart mut and overwrite it:
fn test_cart_marker() {
let mut cart = ShoppingCart::new(); // of type ShoppingCart<Empty>
//cart = cart.add("Apple".to_string()); // mismatched types
// expected struct `ShoppingCart<Empty>`
// found struct `ShoppingCart<InProgress>
}
My next post will include more on converting between the generic types.
I am at the very early stages of learning about SAT/SMT, and as a way to help me learn it myself, I’m writing this post - with the additional hope that it might help others. For the purpose of this post, I’m conflating the terms SAT and SMT, and I don’t define them here. The very high-level description of SAT is that we declare constraints over some boolean propositions and let the SMT solver figure it out; this is in contrast to trying to write some complicated or brute-force algorithm to figure it out for us. We declare these constraints using s-expressions comprising the SMT-LIB language.
You can find the final script and the generated input SMT-LIB and output model from this post here.
“Einstein’s Problem” is the kind of logical thinking puzzle I used to do in high school: there are five houses, each of a different color, with an owner of a different nationality, etc. Each such property is unique; that is, exactly one house must be blue, exactly one owner drinks milk, exactly one owner keeps birds, and so on. There are also hints as to who lives in which house – the owner of the blue house also drinks milk; the owner of the red house is neighbors with the beer drinker.
I did a search for Einstein’s Problem and found this website as the first hit. So I copied all of the text of this problem as my input. It looks like this:
Then there are some clues:
To solve this puzzle, I wanted to stick with creating an .smt2 file fed to an SMT solver – specifically, Z3 – instead of using the Python API, for example. I did, however, make use of Python to generate all the s-exprs.
I started by creating functions for each property (blue, green, red, dog, horse, beer, Norwegian, etc.):
(declare-fun blue (Int) Bool)
(declare-fun green (Int) Bool)
(declare-fun red (Int) Bool)
(declare-fun white (Int) Bool)
(declare-fun yellow (Int) Bool)
This is done for all properties. The idea is to be able to call each of these functions with the house number, from 1 to 5 inclusive. For example, clue 9 says that the Norwegian lives in the first house, so we can constrain the solution to require this fact:
(assert (norwegian 1))
I programmatically generated these function declarations in Python by printing the generated s-exprs to stdout:
parameters = {
'colors': ['blue', 'green', 'red', 'white', 'yellow'],
'nationalities': ['brit', 'dane', 'german', 'norwegian', 'swede'],
'beverage': ['beer', 'coffee', 'milk', 'tea', 'water'],
'cigar': ['bluemaster', 'dunhill', 'pallmall', 'prince', 'blend'],
'pet': ['cat', 'bird', 'dog', 'fish', 'horse']
}
for (k, vs) in parameters.items():
print(f'; functions for {k}:')
for v in vs:
print(f'(declare-fun {v} (Int) Bool)')
print()
The definition of the puzzle is that each such parameter (or property) applies to exactly one house. That means that the function blue
must be true for house 1, house 2, house 3, house 4, or house 5. This can be created as the boolean or
of multiple functions:
(assert (or
(blue 1)
(blue 2)
(blue 3)
(blue 4)
(blue 5)
))
This constrains the solution such that blue
must be true for at least one of the houses. But we need to further constrain it so that it’s true for only one house. The way to do this is to generate a lot more constraints that require that it’s not the case that (blue i)
and (blue j)
for each i and j such that i != j:
(assert (not (and (blue 1) (blue 2))))
(assert (not (and (blue 1) (blue 3))))
(assert (not (and (blue 1) (blue 4))))
(assert (not (and (blue 1) (blue 5))))
(assert (not (and (blue 2) (blue 3))))
(assert (not (and (blue 2) (blue 4))))
(assert (not (and (blue 2) (blue 5))))
(assert (not (and (blue 3) (blue 4))))
(assert (not (and (blue 3) (blue 5))))
(assert (not (and (blue 4) (blue 5))))
Note that adding multiple separate constraints (assert
s-exprs) is the same as having a single s-expr as the conjunction (boolean and
) of multiple s-exprs.
There’s a further restriction – this one bit me as I was debugging my nearly complete solution – that once a property holds for a house, all other related properties must not hold. That is, if (blue 4)
is true, then (red 4)
must not be true (which is the same as (not (red 4))
must hold true). I then generated all of these s-exprs in Python:
for (k, vs) in parameters.items():
for v in vs:
print(f'; at least one {v} house')
ors = ' '.join(f'({v} {i})' for i in range(1, 6))
print(f'(assert (or {ors}))')
print(f'; but not more than one {v} house')
for i in range(1, 5):
for j in range(i+1, 6):
print(f'(assert (not (and ({v} {i}) ({v} {j}))))')
print()
print(f'; A house can only match one {k} proposition')
for v1 in vs:
for v2 in vs:
if v1 != v2:
for i in range(1, 6):
print(f"(assert (not (and ({v1} {i}) ({v2} {i}))))")
print()
Running these ~30 lines of Python generates almost 900 lines of SMT-LIB, and this sets up the baseline constraints.
The simplest clues are 8 and 9:
These are written as assertions that the milk
function must hold true when invoked with 3
, the middle position, and that norwegian
must hold when invoked with 1
:
(assert (milk 3))
(assert (norwegian 1))
Why do we not need to assert (not (milk 2))
, (not (brit 1))
, (not (norwegian 4))
, and so on? Because the above constraints already do that for us: we’re constraining such that (milk 3)
, and we already have constraints that require exactly one house holds true for milk
and that no other drink functions hold true for house 3.
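As a sanity check (a pure-Python sketch, no Z3 involved): enumerate every assignment of the milk property over five houses, keep only those satisfying the exactly-one constraints plus (milk 3), and observe that the lone surviving assignment already makes milk false for the other four houses.

```python
from itertools import product

# milk[i] says whether house i+1 has milk; keep assignments with exactly one
# milk house that also satisfy (milk 3), i.e. milk[2] is True.
assignments = [
    milk for milk in product([False, True], repeat=5)
    if sum(milk) == 1 and milk[2]
]

# Exactly one assignment survives, and it already negates milk for houses 1, 2, 4, 5.
print(assignments)  # [(False, False, True, False, False)]
```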
Clue four is also quite easy:
If we knew that the white house was the third one ((white 3)
), then we’d know that (green 2)
must hold. Knowing either of the properties here tells us what we need about the other, so it’s not deducing white from green but constraining that both conditions must hold together. But we don’t know which house it’s in, so we can disjoin the four possibilities:
(assert (or
(and (green 1) (white 2))
(and (green 2) (white 3))
(and (green 3) (white 4))
(and (green 4) (white 5))
))
This combined constraint describes clue four.
The first clue is that the Brit lives in the red house, but we don’t know which one it is. Using bidirectional implication – if and only if, written as iff
– we can simply constrain:
(assert (iff (brit 1) (red 1)))
(assert (iff (brit 2) (red 2)))
(assert (iff (brit 3) (red 3)))
(assert (iff (brit 4) (red 4)))
(assert (iff (brit 5) (red 5)))
Colloquially, this says that if the Brit is in house 1 then house 1 must also be red, and if house 1 is red then the Brit must live there. The two sides of the iff
must match. This is all we need for the first clue, and this type of constraint applies to clues 2, 3, 5, 6, 7, 12, and 13. These are very simply generated in Python:
for i in range(1, 6):
print(f'(assert (iff (brit {i}) (red {i})))')
The four remaining clues relate neighbors, such as
This requires that we relate each house to its previous and next neighbors, where they exist. For example:
(assert (or
(and (blend 1) (cat 2)) ; house 1 has no 'previous' neighbor, check the next
(and (blend 2) (cat 1)) ; check previous...
(and (blend 2) (cat 3)) ; ...and next neighbor
(and (blend 3) (cat 2))
(and (blend 3) (cat 4))
(and (blend 4) (cat 3)) ; etc
(and (blend 4) (cat 5))
(and (blend 5) (cat 4))))
I generate these with a helper function:
def has_neighbor(prop1: str, prop2: str) -> str:
ands = []
for i in range(1, 6):
for j in range(1, 6):
if abs(i-j) == 1:
ands.append(f' (and ({prop1} {i}) ({prop2} {j}))')
return '(assert (or\n' + '\n'.join(ands) + '))\n'
print(has_neighbor('blend', 'cat'))
print(has_neighbor('horse', 'dunhill'))
print(has_neighbor('norwegian', 'blue'))
print(has_neighbor('blend', 'water'))
That’s all we need to constrain the model! The last two s-expressions will check for satisfiability and will return the model:
(check-sat)
(get-model)
I print these out in Python and run the script, generating just over 1000 lines of SMT-LIB code. When I run that through Z3 using z3 einstein-generated.smt2
, I get the following output:
sat
(
(define-fun brit ((x!0 Int)) Bool
(= x!0 3))
(define-fun bluemaster ((x!0 Int)) Bool
(= x!0 5))
(define-fun dunhill ((x!0 Int)) Bool
(= x!0 1))
(define-fun cat ((x!0 Int)) Bool
(= x!0 1))
(define-fun norwegian ((x!0 Int)) Bool
(= x!0 1))
(define-fun coffee ((x!0 Int)) Bool
...
)
The first line, sat
, is the result of (check-sat)
, and it tells us that the constraints we’ve applied are satisfiable. The rest shows us a model that satisfies the constraints. For example, the first s-expr (define-fun brit ((x!0 Int)) Bool (= x!0 3))
tells us that the brit
function (which takes an int argument represented as x!0
and returns a bool) is satisfied when its argument equals 3
; in other words, the (brit 3)
constraint must evaluate to true in order for the model to hold. So the Brit lives in the third house.
Cleaning up the rest of these s-expressions gives the following details:
cat 1
dunhill 1
norwegian 1
water 1
yellow 1
blend 2
blue 2
dane 2
horse 2
tea 2
bird 3
brit 3
milk 3
pallmall 3
red 3
coffee 4
fish 4
german 4
green 4
prince 4
beer 5
bluemaster 5
dog 5
swede 5
white 5
Which can just be represented in a table as:
|        | House 1   | House 2 | House 3   | House 4 | House 5    |
|--------|-----------|---------|-----------|---------|------------|
| Color  | Yellow    | Blue    | Red       | Green   | White      |
| Pet    | Cat       | Horse   | Bird      | Fish    | Dog        |
| Cigar  | Dunhill   | Blend   | Pall Mall | Prince  | Bluemaster |
| Nation | Norwegian | Dane    | Brit      | German  | Swede      |
| Drink  | Water     | Tea     | Milk      | Coffee  | Beer       |
And this is exactly the solution at the bottom of the linked page.
The direct question posed by this problem is, “Who keeps fish?” This table makes it easy to look it up – the fish is at house 4, which is owned by the German. But presumably there’s a way to ask this directly of the SAT solver: declare a variable, owns_fish
, constrain that (fish owns_fish)
must hold (which should bind this value to whatever house owns it). Then assert that owns_fish
should be a value from 1 to 5:
(declare-const owns_fish Int)
(assert (fish owns_fish))
(assert (or
(= 1 owns_fish)
(= 2 owns_fish)
(= 3 owns_fish)
(= 4 owns_fish)
(= 5 owns_fish)
))
(eval owns_fish) ; displays '4' when run
And we can further constrain a declared String to be returned by asserting that natl_owns_fish
is equal to "brit"
if (brit owns_fish)
(which will hold only if the brit
function is true for owns_fish
), and so on:
(declare-const natl_owns_fish String)
(assert (= natl_owns_fish
(ite (brit owns_fish) "brit"
(ite (dane owns_fish) "dane"
(ite (german owns_fish) "german"
(ite (norwegian owns_fish) "norwegian" "swede"))))
))
(eval natl_owns_fish) ; "german"