Let's use minideb, a small Debian-based Linux distribution:
$ docker pull bitnami/minideb
Using default tag: latest
latest: Pulling from bitnami/minideb
ba49d470d895: Pull complete
Digest: sha256:cbbc1db2617a7e5224f8dc692c990b723e4fe3ef69864544e7c14aa613c0ccb7
Status: Downloaded newer image for bitnami/minideb:latest
docker.io/bitnami/minideb:latest
We can see this new image is available locally with `docker images`:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bitnami/minideb latest c5eecd6244a8 3 days ago 120MB
And we can remove it with `docker image rm <id>`:
$ docker image rm c5eecd6244a8
Untagged: bitnami/minideb:latest
Untagged: bitnami/minideb@sha256:cbbc1db2617a7e5224f8dc692c990b723e4fe3ef69864544e7c14aa613c0ccb7
Deleted: sha256:c5eecd6244a829084e2f788e3f877a5ab8ac63f9c8dc55c3cfff4f1d172fc23c
Deleted: sha256:44b47439f86a658d61565e3a9e86c1c9608b2ee8adb4f6e85005634e6f537f43
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
We could run this image in a new container with `docker run c5eecd6244a8`, but it would almost immediately return to our console. With `docker container ls -a`, we'd see that this container ran and terminated:
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff11c7f3afb8 c5eecd6244a8 "/bin/bash" 29 seconds ago Exited (0) 28 seconds ago clever_franklin
# Delete this terminated container
$ docker container rm ff11c7f3afb8
What we want is to run interactively, so we'll use `docker run -it <id>`:
# In the host:
$ docker run -it c5eecd6244a8
# In the container!
root@25bca2749327:/# uname -a
Linux 25bca2749327 5.15.0-1042-azure #49~20.04.1-Ubuntu SMP Wed Jul 12 12:44:56 UTC 2023 x86_64 GNU/Linux
root@25bca2749327:/# cat /etc/os-release | grep NAME
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_CODENAME=bookworm
root@25bca2749327:/# exit
At this point, we’re back in our host. There’s still a terminated container:
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
25bca2749327 c5eecd6244a8 "/bin/bash" 2 minutes ago Exited (0) 49 seconds ago elegant_saha
$ docker container rm 25bca2749327
To avoid this, use `docker run --rm` (NB, the flag has to come before the image name!):
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ docker run -it --rm c5eecd6244a8
root@21735047c8bb:/# hostname
21735047c8bb
root@21735047c8bb:/# exit
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Let’s make use of the minideb image as the basis for a derived image. We use a Dockerfile to describe the image we’ll create:
# The base image for our new image
FROM bitnami/minideb
# For a simple Rust service, see:
# https://aeshirey.github.io/code/2023/02/25/simple-rust-service-in-docker.html
# COPY <host-filename> <docker-filename>
COPY rust-server my-rust-server
CMD ["./my-rust-server"]
To build this, we can use `docker build <path>`, where `<path>` is the directory in which the Dockerfile lives (eg, `.`). Additionally, we'll use `-t <name>:<tag>` to give our image a name and tag. If the tag is omitted, `latest` is used.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bitnami/minideb latest c5eecd6244a8 3 days ago 120MB
$ docker build . -t my-simple-container
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 141B 0.0s
=> [internal] load metadata for docker.io/bitnami/minideb:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 84B 0.0s
=> [1/2] FROM docker.io/bitnami/minideb:latest 0.0s
=> CACHED [2/2] COPY simple-server/simple-server my-simple-server 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:61a24712801a996b6ceefb378cd9ebccdb9caae8c58ea7acf17eaff0285666bb 0.0s
=> => naming to docker.io/library/my-simple-container 0.0s
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
my-simple-container latest 61a24712801a About a minute ago 131MB
bitnami/minideb latest c5eecd6244a8 3 days ago 120MB
Because our server exposes port 8080, we want our container to also expose it. Maybe we want to use the same port or maybe we want to remap it. Either way, we'll use `-p <host-port>:<container-port>`:
$ docker run --rm --init -p 8123:8080 fd83da080eab
Then we can connect in another shell on our host to communicate with this container:
$ curl 127.0.0.1:8123 -l -w "\n"
home
# Specify 'bash' as the process to run
$ docker run -p 8123:8080 --rm -it 61a24712801a bash
# image id ------^ ^-- command to run
Oops: we can't exit this container with Ctrl-C:
$ docker run --rm fd83da080eab
^C
From another shell:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
260c882a217e fd83da080eab "/my-simple-server" About a minute ago Up About a minute gracious_hawking
# ^------ this is the container we'll want to kill because oopsie
$ docker kill 260c882a217e
260c882a217e
Avoid this by including the `--init` flag next time you `docker run`:
$ docker run --rm --init fd83da080eab
^C$
If you use `docker run --network=host`, then the container will be able to access the host network. For example:
# In the host OS:
$ ./rust-server &
$ curl 127.0.0.1:8080 -w "\n"
home
$ docker run -it --rm --network=host c5eecd6244a8
# Now in the container
root@hostname:/# curl 127.0.0.1:8080 -w "\n"
home
Finally, we can save an image to a file so it can be moved around without a registry:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 61a24712801a 30 minutes ago 131MB
my-simple-container latest fd83da080eab 30 minutes ago 131MB
bitnami/minideb latest c5eecd6244a8 3 days ago 120MB
$ docker save fd83da080eab | gzip > my-simple-container.tar.gz
$ file my-simple-container.tar.gz
my-simple-container.tar.gz: gzip compressed data, from Unix, original size modulo 2^32 135666688 gzip compressed data, reserved method, ASCII, extra field, encrypted, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 135666688
$ ls -lh my-simple-container.tar.gz
-rw-r--r-- 1 root root 41M Feb 17 04:48 my-simple-container.tar.gz
# Later/elsewhere, this can be loaded:
$ docker load < my-simple-container.tar.gz
Loaded image: my-simple-container:latest
I have some `async` code that I really want to call from another project, but I don't want `async`/`await` to infect my entire codebase. At least using `tokio`, there's an easy way to do this. Given some async project:
cargo new --lib my-async-crate
cargo add tokio
which exposes an `async` function:
pub async fn sleep_a_bit(num_seconds: u64) {
println!("Hold please...");
tokio::time::sleep(std::time::Duration::from_secs(num_seconds)).await;
println!("Thanks for waiting!");
}
We then have a project which wants to use our cool `sleep_a_bit` function:
cargo new my-project
fn main() {
my_async_crate::sleep_a_bit(5);
}
This compiles, but it doesn’t do what we want:
warning: unused implementer of `Future` that must be used
--> src/main.rs:2:5
|
2 | my_async_crate::sleep_a_bit(5);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: futures do nothing unless you `.await` or poll them
= note: `#[warn(unused_must_use)]` on by default
If we run this project, it will immediately exit. Thus, we need to include the `.await` call:
fn main() {
// ---- this is not `async`
my_async_crate::sleep_a_bit(5).await;
// ^^^^^^ only allowed inside `async` functions and blocks
}
What we need is to create a tokio `Runtime` that can synchronously block until the inner asynchronous operations complete. To do this, we add tokio with the `rt-multi-thread` feature (which `Runtime::new` requires):
cargo add tokio --features rt-multi-thread
The main function then creates the runtime and creates a `Future`. For this example, we'll just `block_on`:
fn main() {
let rt = tokio::runtime::Runtime::new().unwrap();
rt.block_on(async {
my_async_crate::sleep_a_bit(5).await;
println!("And we're back!");
});
}
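Conceptually, `block_on` just polls the future to completion, parking the thread until the future signals it can make progress. For intuition only, here's a toy stdlib-only version; it only drives futures that don't need a runtime's timer or IO drivers (so it couldn't run `tokio::time::sleep`), and tokio's real runtime does far more:

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// A waker that unparks the blocked thread when the future can make progress.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Toy block_on: poll the future on the current thread, parking until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    // An async block that is immediately ready; the toy executor resolves it.
    let answer = block_on(async { 21 * 2 });
    println!("The answer is {answer}"); // prints "The answer is 42"
}
```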
Suppose we have an input file, input.json, whose structure looks like this:
{
"documents": [
{ "foo": 1 },
{ "baz": true },
{ "bar": null }
],
"journal": { "timestamp": "2023-04-04T08:28:00" }
}
If we assume that each inner 'document' should simply be treated as an arbitrary JSON `Value`, we can model and read our input as:
use serde::Deserialize;
use serde_json::Value;

#[derive(Deserialize, Debug)]
struct MyData {
documents: Vec<Value>,
journal: Value,
}
fn main() {
let json = std::fs::read_to_string("input.json").unwrap();
let mydata: MyData = serde_json::from_str(&json).unwrap();
println!("{mydata:?}");
}
But what if we only need a subset of 'documents' and/or need to process each into something else, and they are exceedingly large? This would cause significant memory overhead that we want to avoid. One possibility is to roll your own string-reading mechanism, trying to figure out when one document starts and ends, then parsing only that string. This becomes a bit cumbersome, but worse still, it may be error-prone when trying to deal with the arbitrary `journal` value: how do we know if we've finished reading the last document and have arrived at the journal? What if a document legitimately contains a `"journal"` key?
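To see how fragile the hand-rolled approach is, consider naively searching the raw text for the `"journal"` key. The input below is hypothetical, chosen so that a document itself contains that key:

```rust
/// Byte offset of the first `"journal"` key in raw JSON text -- the naive
/// way to guess where the documents end and the journal begins.
fn first_journal_offset(json: &str) -> Option<usize> {
    json.find("\"journal\"")
}

fn main() {
    // Hypothetical input: one of the documents legitimately contains a
    // "journal" key of its own.
    let json = r#"{"documents":[{"journal":"mine"}],"journal":{"t":1}}"#;
    let naive = first_journal_offset(json).unwrap();
    let real = json.rfind("\"journal\"").unwrap();
    // The naive split point lands inside a document, not at the real journal:
    assert!(naive < real);
    println!("naive split at byte {naive}; the real journal key is at byte {real}");
}
```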
Fortunately, `serde` has the capability to do custom serialization and deserialization using a visitor pattern. We can use this approach to handle each document in succession. To do so, we'll create a new type that represents our documents:
#[derive(Debug)]
struct Documents(Vec<Value>);
#[derive(Deserialize, Debug)]
struct MyData {
documents: Documents,
journal: Value,
}
Structurally, this is the same as before, but it allows us to insert our own, manual deserialization step – note that `MyData` derives `Deserialize` while `Documents` doesn't; we'll implement it by hand. The deserialization implementation stub looks like this:
impl<'de> Deserialize<'de> for Documents {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
todo!()
}
}
Before implementing this part, we’ll create the visitor. First, the type that will know how to deserialize our documents:
struct DocumentVisitor;
Note that `DocumentVisitor` itself doesn't collect `Value`s – it just knows how to deserialize them. Serde's visitor pattern has an associated type that will be the (collected) result of deserialization. This output is what we will have filtered and/or processed from each raw JSON value from input. Here's the stub for the visitor:
impl<'de> serde::de::Visitor<'de> for DocumentVisitor {
type Value;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
todo!()
}
}
`expecting` is a required method that:
[Formats] a message stating what data this Visitor expects to receive. … The message should complete the sentence “This Visitor expects to receive …”,
Because our visitor expects a list of documents, we’ll say that:
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(formatter, "a list of JSON values")
}
Also because we’re expecting a list (or sequence) of items, we’ll override the visit_seq
method. We also set the required associated type, Value
, indicating what kind of value this visitor will be returning. (Note that here, the associated type Value
is not the same as serde_json::Value
. The former is what we’ll be telling serde that we’ll return, which is a Vec<Value>
. The latter is specific to JSON data.) In visit_seq
, we’ll repeatedly call seq.next_element()
, propagating up any errors that serde gives us. For now, we’ll just push each item onto a vector that we’ll return:
type Value = Vec<Value>;
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: serde::de::SeqAccess<'de>,
{
let mut values = Vec::new();
while let Some(item) = seq.next_element()? {
println!("Read item='{item}'");
values.push(item)
}
Ok(values)
}
That completes the visitor, and we can now implement `Deserialize` for `Documents`. We'll instantiate a visitor, which is passed to the deserializer's `deserialize_seq` method:
impl<'de> Deserialize<'de> for Documents {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let visitor = DocumentVisitor;
let docs = deserializer.deserialize_seq(visitor)?;
Ok(Documents(docs))
}
}
Note that by passing a `DocumentVisitor` to the deserializer, serde knows that it will be returning a `Vec<Value>` (by virtue of the associated type). Thus, that is the type of `docs`. Our `Deserialize` implementation returns a `Documents` object, so we wrap `docs` in that.
Putting it all together:
use serde::Deserialize;
use serde_json::Value;

#[derive(Debug)]
struct Documents(Vec<Value>);
#[derive(Deserialize, Debug)]
struct MyData {
documents: Documents,
journal: Value,
}
impl<'de> Deserialize<'de> for Documents {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let visitor = DocumentVisitor;
let docs = deserializer.deserialize_seq(visitor)?;
Ok(Documents(docs))
}
}
struct DocumentVisitor;
impl<'de> serde::de::Visitor<'de> for DocumentVisitor {
type Value = Vec<Value>;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(formatter, "a list")
}
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: serde::de::SeqAccess<'de>,
{
let mut values = Vec::new();
while let Some(item) = seq.next_element()? {
println!("Read item='{item}'");
values.push(item)
}
Ok(values)
}
}
The above implementation reads and keeps every value of input. The whole idea here, though, was that we could filter/process our values, so let's now update our code to do that. We'll only keep documents that are themselves objects, then we'll take the first key-value pair (ignoring others), skipping those with null values (eg, `{ "bar": null }`). These first key-value pairs will be aggregated into a single object, returned as a vector of one object:
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: serde::de::SeqAccess<'de>,
{
let mut agg_map = serde_json::Map::new();
while let Some(item) = seq.next_element()? {
// If `item` isn't a JSON object, we'll skip it:
let Value::Object(map) = item else { continue };
// Get the first element, assuming we have some
let (k, v) = match map.into_iter().next() {
Some(kv) => kv,
None => continue,
};
// Ignore any null values; aggregate everything into a single map
if v == Value::Null {
continue;
} else {
println!("Keeping {k}={v}");
agg_map.insert(k, v);
}
}
let values = Value::Object(agg_map);
println!("Final value is {values}");
Ok(vec![values])
}
When running this code, the following output is printed to the console:
Keeping foo=1
Keeping baz=true
Final value is {"baz":true,"foo":1}
I started looking at some simple Docker examples, but they all seem to use Node as a starting point. I don’t want to start there and try to work my way back, so instead, I figured I’d start with a simple Rust service and see if I can start from scratch.
As a total Docker newbie, here’s a fairly brief summary of my misadventures.
Let’s start with the service itself. Wanting to keep this incredibly simple (in this case, avoiding Rust async), I found OxHTTP, a very simple synchronous HTTP server. We’ll start with a new project that uses it:
$ cargo new rust-server
$ cd rust-server
$ cargo add oxhttp
The provided example is just about perfect for what we want; I'll just slightly tweak it by wrapping it in a `main` function:
fn main() {
use oxhttp::Server;
use oxhttp::model::{Response, Status};
use std::time::Duration;
// Builds a new server that returns a 404 everywhere except for "/" where it returns the body 'home'
let mut server = Server::new(|request| {
if request.url().path() == "/" {
Response::builder(Status::OK).with_body("home")
} else {
Response::builder(Status::NOT_FOUND).build()
}
});
// Raise a timeout error if the client does not respond after 10s.
server.set_global_timeout(Duration::from_secs(10));
// Listen to localhost:8080
server.listen(("localhost", 8080)).unwrap();
}
Build this with `cargo build --release` and test it out. I've configured my `~/.cargo/config` to specify a common build directory:
[build]
target-dir = "/home/adam/cargo-target"
This means that I can run my built server with `~/cargo-target/release/rust-server`, and when I visit http://localhost:8080 in my browser, I see the HTTP response "home". The server now works, so I copied the binary into the current working directory.
Next, we’ll need to build a Dockerfile. As I said, I know just about nothing about Docker, but I want to avoid the Node route. It seems pretty much everything is built off of Alpine, so I’ll start there:
FROM alpine:latest
COPY rust-server rust-server
CMD ["rust-server"]
Building this is quick and completes without issue:
$ docker build -t my-rust-server:latest .
[+] Building 0.6s (7/7) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 113B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 4.70MB 0.1s
=> CACHED [1/2] FROM docker.io/library/alpine:latest 0.0s
=> [2/2] COPY rust-server rust-server 0.2s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:108dbb6764b4e6c94cc3bde571eb2157dceb57aab1ac3f393577174c1175a282 0.0s
=> => naming to docker.io/library/my-rust-server:latest 0.0s
Then I run it:
$ docker run -t my-rust-server:latest
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "rust-server": executable file not found in $PATH: unknown.
ERRO[0001] error waiting for container: context canceled
It seems that `COPY foo foo` places `foo` into the root directory (ie, `/`), which isn't in `$PATH`, I guess? So let's try putting it into `/bin/`:
FROM alpine:latest
COPY rust-server /bin/rust-server
CMD ["/bin/rust-server"]
$ docker run -t my-rust-server:latest
exec /bin/rust-server: no such file or directory
This is a different error, so something changed. But it’s still not finding it? Let’s inspect the container:
$ docker run -it my-rust-server:latest /bin/sh
/ # ls /bin/rust-server
/bin/rust-server
/ # file /bin/rust-server
/bin/sh: file: not found
The binary is definitely there. I tried `file` to see what the system thinks the binary is, but Alpine doesn't have it. Instead, we can try `ldd` to get details:
/ # ldd /bin/rust-server
/lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
Error loading shared library libgcc_s.so.1: No such file or directory (needed by /bin/rust-server)
librt.so.1 => /lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7fa7b4984000)
Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /bin/rust-server)
Error relocating /bin/rust-server: _Unwind_Resume: symbol not found
Error relocating /bin/rust-server: _Unwind_Backtrace: symbol not found
Ohh, so it's not that my Docker image can't find my binary but that when trying to run my binary, it can't find the dynamically-linked libgcc. A quick search on how to install packages in Alpine (since it's not Debian-based, I can't use `apt`) shows that it uses `apk`, and `libgcc` exists in Alpine's package repository. Adding this to the Dockerfile:
FROM alpine:latest
COPY rust-server /bin/rust-server
# These are new:
RUN apk update
RUN apk add libgcc
CMD ["/bin/rust-server"]
Running this still gives the `no such file or directory` error. So let's inspect with `ldd` again:
/ # ldd /bin/rust-server
/lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7f1e6d2bc000)
librt.so.1 => /lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f1e6d596000)
Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /bin/rust-server)
Error relocating /bin/rust-server: __res_init: symbol not found
Error relocating /bin/rust-server: gnu_get_libc_version: symbol not found
`libgcc` is no longer a problem, but `ld-linux` still is. And it appears that `ld-linux` is part of `gcompat`. After adding `RUN apk add gcompat`, rebuilding, and rerunning, the message "Error loading shared library ld-linux-x86-64.so.2" goes away, but the "__res_init" and "gnu_get_libc_version" errors remain.
I did some further sleuthing and found a suggestion on Reddit to use this hack to make it work, but instead of continuing down this rabbit hole, I decided to try another approach I saw: `musl`.
Rust can compile to a number of build targets; in my dev environment (Ubuntu in WSL2), the default is:
$ rustc -vV | grep host
host: x86_64-unknown-linux-gnu
We can find supported targets with `rustup target list`. Doing this shows that there's `x86_64-unknown-linux-musl`. Let's install this toolchain and compile the server:
$ rustup target add x86_64-unknown-linux-musl
(...)
$ cargo build --target=x86_64-unknown-linux-musl --release
$ mv rust-server rust-server-old
$ cp ~/cargo-target/x86_64-unknown-linux-musl/release/rust-server .
We can compare the old binary to the new one:
$ file rust-server-old rust-server
rust-server-old: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=523b84a693e7b90bcf8332d2eecd51cc9bfbe45a, with debug_info, not stripped
rust-server: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped
$ ldd rust-server-old
linux-vdso.so.1 (0x00007ffea2be5000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd1fa870000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd1fa668000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd1fa449000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd1fa245000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd1f9e54000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd1fad44000)
$ ldd rust-server
statically linked
(Side note: the old and new binaries are 4.5MB and 4.9MB, respectively, showing the cost of static linking. However, if we `strip` both binaries, their sizes – and the marginal difference – drop: 751KB and 865KB, respectively.)
Since the new binary is statically linked, we don't need to install extra apk packages, so the Dockerfile is now back to:
FROM alpine:latest
COPY rust-server /bin/rust-server
CMD ["/bin/rust-server"]
This builds very quickly, and calling `docker run` now has a service running. Going to http://localhost:8080 should work, right?
$ curl localhost:8080
curl: (7) Failed to connect to localhost port 8080: Connection refused
Ah, but we need to publish the container’s port to the host:
$ docker run -p 8080:8080 -t my-rust-server:latest
# In another terminal (because `docker run` is blocking):
$ curl localhost:8080
curl: (52) Empty reply from server
So it’s connecting but not getting any data?
$ telnet localhost 8080
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.
$ telnet localhost 8081
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
The connection on 8080 is opened but immediately closed; the attempt on 8081 fails, as expected, because there’s nothing on that port – it’s showing that there’s something different about 8080. So Docker is forwarding the port, and something is listening. Surely that’s our server.
Looking at the Rust code, we notice that we're listening on localhost, port 8080:
server.listen(("localhost", 8080)).unwrap();
But wait: it turns out that there’s a difference:
127.0.0.1:xxxx is the normal loopback address, and localhost:xxxx is the hostname for 127.0.0.1:xxxx.
0.0.0.0 is slightly different, it’s an address used to refer to all IP addresses on the same machine. Or no specific IP address.
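The difference is easy to demonstrate with the standard library's `TcpListener`; this is a minimal sketch, independent of OxHTTP or Docker:

```rust
use std::net::TcpListener;

fn main() {
    // Port 0 lets the OS pick any free port; the interesting part is the IP.
    // 127.0.0.1 binds only the loopback interface:
    let loopback = TcpListener::bind(("127.0.0.1", 0)).unwrap();
    println!("loopback bound to {}", loopback.local_addr().unwrap());

    // 0.0.0.0 binds every interface on the machine -- this is what a server
    // inside a container needs so Docker's port forwarding can reach it:
    let all = TcpListener::bind(("0.0.0.0", 0)).unwrap();
    println!("all interfaces bound to {}", all.local_addr().unwrap());
}
```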
Simply changing from "localhost" to "0.0.0.0", recompiling, rebuilding the image, and rerunning does the trick:
$ curl localhost:8080
home
I'm familiar with (but have no experience using) Docker Hub, and I've only briefly played with Azure Container Registry, so I thought I'd first start with the simplest option: saving the image to a file:
$ docker save my-rust-server:latest | gzip > my-rust-server.tar.gz
$ ls -lh my-rust-server.tar.gz
-rw-r--r-- 1 adam adam 4.5M Feb 24 16:30 my-rust-server.tar.gz
Now let’s remove the image from Docker, make sure we can re-load it, and run it again:
$ docker image rm my-rust-server:latest
Untagged: my-rust-server:latest
Deleted: sha256:0ed0fda582a3c568fdb8f4a313a464ce3244442d1f1d36934be9bb29e8b9e4fd
$ docker images | grep my-rust
$ docker load < my-rust-server.tar.gz
Loaded image: my-rust-server:latest
$ docker images | grep my-rust
my-rust-server latest 732da9278f98 18 minutes ago 12.1MB
$ docker run -it my-rust-server:latest /bin/sh
/ # ls /bin/rust-server
/bin/rust-server
rayon provides an incredibly simple work-stealing framework that, in my experience, requires only two lines of code and can dramatically improve processing throughput. To use it, add it to your Cargo.toml with `cargo add rayon`.
Consider some function that does some intensive work:
/// Do some number of iterations of work
fn do_work(worker: usize, iterations: usize) {
println!("Worker {worker} doing work");
if iterations > 0 {
// simulate long-running work with 'sleep'
// we might do different kinds of work depending on the worker,
// eg, open a different file of input.
std::thread::sleep(std::time::Duration::from_secs(1));
do_work(worker, iterations - 1)
}
}
Doing this serially might look like this:
const NUM_WORKERS: usize = 5;
const NUM_ITERATIONS: usize = 4;
fn main() {
let s = std::time::Instant::now();
(1..=NUM_WORKERS).for_each(|worker| do_work(worker, NUM_ITERATIONS));
println!("Work took {:?}", s.elapsed());
}
This produces the very boring output:
Worker 1 doing work
Worker 1 doing work
Worker 1 doing work
Worker 1 doing work
Worker 1 doing work
Worker 2 doing work
Worker 2 doing work
Worker 2 doing work
Worker 2 doing work
Worker 2 doing work
Worker 3 doing work
Worker 3 doing work
Worker 3 doing work
Worker 3 doing work
Worker 3 doing work
Worker 4 doing work
Worker 4 doing work
Worker 4 doing work
Worker 4 doing work
Worker 4 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Work took 20.007646351s
This might be rather inefficient, especially if we have many CPU cores sitting idle. Instead, we can use rayon and one of the `*par_iter` variations:
use rayon::prelude::*; // this is new
fn main() {
let s = std::time::Instant::now();
(1..=NUM_WORKERS)
.into_par_iter() // and this is new
.for_each(|worker| do_work(worker, NUM_ITERATIONS));
println!("Work took {:?}", s.elapsed());
}
This is much faster, as it will parallelize the work according to the number of CPUs available:
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 1 doing work
Worker 5 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Work took 8.011677642s
Two things to note here: first, rayon defaults to one thread per available CPU core (four, in this run), which is why worker 5 had to wait for a free thread; second, you can build your own `ThreadPool` to configure the number of threads:
fn main() {
let s = std::time::Instant::now();
let pool = rayon::ThreadPoolBuilder::new()
.num_threads(NUM_WORKERS) // use one thread per work slice
.build()
.unwrap();
pool.install(|| {
(1..=NUM_WORKERS)
.into_par_iter()
.for_each(|worker| do_work(worker, NUM_ITERATIONS));
});
println!("Work took {:?}", s.elapsed());
}
Worker 1 doing work
Worker 3 doing work
Worker 2 doing work
Worker 4 doing work
Worker 5 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 2 doing work
Worker 5 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 2 doing work
Worker 5 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 2 doing work
Worker 5 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 2 doing work
Worker 5 doing work
Work took 4.002877261s
Alternatively, you may want to limit your parallelism to leave compute available to other tasks:
fn main() {
let s = std::time::Instant::now();
let pool = rayon::ThreadPoolBuilder::new()
.num_threads(2) // use only two threads
.build()
.unwrap();
pool.install(|| {
(1..=NUM_WORKERS)
.into_par_iter()
.for_each(|worker| do_work(worker, NUM_ITERATIONS));
});
println!("Work took {:?}", s.elapsed());
}
Worker 1 doing work
Worker 3 doing work
Worker 3 doing work
Worker 1 doing work
Worker 1 doing work
Worker 3 doing work
Worker 3 doing work
Worker 1 doing work
Worker 3 doing work
Worker 4 doing work
Worker 1 doing work
Worker 2 doing work
Worker 4 doing work
Worker 2 doing work
Worker 4 doing work
Worker 2 doing work
Worker 4 doing work
Worker 2 doing work
Worker 4 doing work
Worker 5 doing work
Worker 2 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Worker 5 doing work
Work took 12.004555134s
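For contrast, here's roughly what hand-rolling the first parallel example with `std::thread` looks like: one OS thread per worker, with none of rayon's pooling or work stealing, and the sleep shortened to 10ms so the sketch runs quickly. rayon's two-line change buys all of this plus a CPU-sized thread pool for free.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Same shape as the rayon examples' do_work, with a much shorter sleep.
fn do_work(worker: usize, iterations: usize) {
    println!("Worker {worker} doing work");
    if iterations > 0 {
        thread::sleep(Duration::from_millis(10));
        do_work(worker, iterations - 1)
    }
}

fn main() {
    let s = Instant::now();
    // Spawn one OS thread per worker and wait for them all.
    let handles: Vec<_> = (1..=5)
        .map(|worker| thread::spawn(move || do_work(worker, 4)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("Work took {:?}", s.elapsed());
}
```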
I've updated my `joinable` crate (source code here) and published a new `irisdata` crate (source code) containing the Iris dataset, which is well-known in the data science field.
This update to `joinable` renames the `Joinable` trait to `JoinableGrouped` to reflect that the results (at least of inner- and outer-joins) group the right-hand side. It also adds a new trait with the `Joinable` name that behaves perhaps more intuitively – each left-hand record can be yielded multiple times (as matches are found).
`Joinable` only defines the `inner_join` and `outer_join` methods. `JoinableGrouped` defines `inner_join_grouped`, `outer_join_grouped`, `semi_join`, and `anti_join`.
use std::cmp::Ordering;
use irisdata::{Species, IRIS_DATA};
use joinable::{JoinableGrouped, RHS};
#[derive(Debug)]
struct IrisData {
species: Species,
common_name: &'static str,
average_sepal_length: f32,
average_sepal_width: f32,
average_petal_length: f32,
average_petal_width: f32,
}
fn main() {
let common_names = [
(Species::IrisVersicolor, "blue flag"),
(Species::IrisVersicolor, "harlequin blueflag"),
(Species::IrisVersicolor, "larger blue flag"),
(Species::IrisVersicolor, "northern blue flag"),
(Species::IrisVersicolor, "poison flag"),
(Species::IrisVirginica, "Virginia blueflag"),
(Species::IrisVirginica, "Virginia iris"),
(Species::IrisVirginica, "great blue flag"),
(Species::IrisVirginica, "southern blue flag"),
];
let joined = common_names
.iter()
.inner_join_grouped(RHS::new_unsorted(&IRIS_DATA[..]), |(lhs_species, _), r| {
if *lhs_species == r.species {
Ordering::Equal
} else {
Ordering::Less
}
})
.map(|(lhs, grp)| IrisData {
species: lhs.0,
common_name: lhs.1,
average_sepal_length: grp.iter().map(|i| i.sepal_length).sum::<f32>() / grp.len() as f32,
average_sepal_width: grp.iter().map(|i| i.sepal_width).sum::<f32>() / grp.len() as f32,
average_petal_length: grp.iter().map(|i| i.petal_length).sum::<f32>() / grp.len() as f32,
average_petal_width: grp.iter().map(|i| i.petal_width).sum::<f32>() / grp.len() as f32,
})
.collect::<Vec<_>>();
println!("{joined:#?}");
}
I wasn’t personally involved in the phone call, but I was call-adjacent and aghast at the pen-and-paper approach to figuring out who should be where. Parents expressed their interest in having their kids with this scout but not with that scout. Boys and girls can’t share a tent. Scouts may only share a tent with other scouts within three years of age (ie, no 17 year-old scouts bunking with 12 year-olds). The mental effort and time that went into that work annoyed my inner geek, so I proceeded to spend the next several hours solving the general case. It was a good excuse to play around with SMT again.
Rather than dive into all the details, I’ll simply share the public Gist with my v1 implementation.
The solution is a Python script that generates SMT-LIB code that is evaluated by Z3. After a few configuration settings (such as `NUM_TENTS`, identifying how many tents are available), you specify the set of scouts with their age and gender:
scouts = [
('Abe', 14, 'm'), # 0
('Brian', 13, 'm'), # 1
('Charlie', 14, 'm'), # 2
('Dave', 13, 'm'), # 3
('Eddie', 14, 'm'), # 4
('Lily', 15, 'f'), # 5
('Megan', 14, 'f'), # 6
]
The output is a model that tells you who is in which tent. In this example, tent0 contains scouts 5 (Lily) and 6 (Megan):
(define-fun tent0 ((x!0 Int)) Int
(ite (= x!0 2) 5
(ite (= x!0 3) 6
(- 1))))
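The gender and age rules reduce to pairwise constraints: for every incompatible pair of scouts, require that they be assigned different tents. As a rough sketch of how a generator can emit these (the tent-of function name is hypothetical; the actual Gist encodes tents differently):

```python
# Sketch: emit pairwise "must not share a tent" constraints as SMT-LIB.
# `tent-of` is a hypothetical uninterpreted function mapping scout index -> tent.
scouts = [
    ('Abe', 14, 'm'),
    ('Lily', 15, 'f'),
    ('Megan', 14, 'f'),
]

def incompatible(a, b):
    (_, age_a, sex_a) = a
    (_, age_b, sex_b) = b
    # Mixed genders can't share a tent; neither can scouts more than 3 years apart.
    return sex_a != sex_b or abs(age_a - age_b) > 3

for i in range(len(scouts)):
    for j in range(i + 1, len(scouts)):
        if incompatible(scouts[i], scouts[j]):
            print(f'(assert (distinct (tent-of {i}) (tent-of {j})))')
```

For the three scouts above, this prints constraints separating Abe from Lily and Abe from Megan; Lily and Megan (same gender, one year apart) may share.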
As usual, let’s start with the dependencies. Our webserver should run asynchronously for maximum throughput, so we’ll use tokio
and futures
, and of course the warp
crate:
[dependencies]
tokio = { version = "1", features = ["full"] }
futures = "0.3"
warp = "0.3"
warp uses the concept of composable request Filters: components that match requests, extract data from them (URI components, query parameters, request bodies, etc.), and chain together (and/or).
To start, we’ll make our main
function asynchronous and add a route that matches the root of the server and prints a welcome message:
use warp::Filter;
#[tokio::main]
async fn main() {
let index = warp::path::end().map(|| "Welcome!");
let routes = index;
warp::serve(routes).run(([0, 0, 0, 0], 3000)).await;
}
warp::path::end
is used to identify that the path handling is complete, and since it’s not chained with any previous components, it effectively matches “/”. (Think of it like the regular expression "/$"
.) We then .map
the input request to the &str
output.
In this post, I’m using the convention of serving up a routes
value; in this first example, we’re only serving a single route.
Unsurprisingly, if you run this project and go to http://127.0.0.1:3000/, you will see the welcome text.
This first route isn’t particularly interesting, so let’s handle an input path such as /hello/adam
as a way to say hello to the user; its handler will be added as its own function.
use warp::{path, Filter};
#[tokio::main]
async fn main() {
let index = warp::path::end().map(|| "Welcome!");
let hello = path!("hello" / String).then(handle_hello);
let routes = index.or(hello);
warp::serve(routes).run(([0, 0, 0, 0], 3000)).await;
}
async fn handle_hello(name: String) -> impl warp::Reply {
format!("Hello, {}", name)
}
There are several changes here:
- hello is defined using the warp::path! macro, which adds convenience for declaring URI path components and arguments. path!("hello" / String) declares that we’re handling a path that starts with the literal hello, then some String argument.
- We build routes by handling any request that matches the root (via index) or any request that matches the hello handler. When a request comes in, warp will check these in order. (Requests that match none of these are discussed below.)
- The handle_hello function accepts the provided argument and returns some type that implements Reply. Note that under the hood, this async function ends up returning a Future, but this is transparent to us.
- Because handle_hello is asynchronous, we chain it with .then; unintuitively, if handle_hello were synchronous, we’d use .map instead.
In addition to the welcome message at the root, you can now go to http://127.0.0.1:3000/hello/adam to see "Hello, adam".
The goodbye
handler will be very similar to hello
but with some minor tweaks. First, we might want to either return 200 OK (which is the default) or some alternate status code. The new handler will conditionally return an error code for certain kinds of input:
use std::convert::Infallible;
use warp::{hyper::StatusCode, reply, Reply};
#[tokio::main]
async fn main() {
// ...
let goodbye = warp::path("goodbye")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_goodbye);
let routes = index.or(hello).or(goodbye);
// ...
}
async fn handle_goodbye(name: String) -> Result<impl Reply, Infallible> {
if name == "earl" {
Ok(reply::with_status(
"Earl Grey is a tea".to_string(),
StatusCode::IM_A_TEAPOT,
))
} else {
Ok(reply::with_status(
format!("Goodbye, {}", name),
StatusCode::OK,
))
}
}
Declaring goodbye now uses the individual warp Filters path, param, and end, which together do what the path! macro did for hello
. As is common in Rust, type inference is used to determine the type that param
expects (by virtue of the function we’re calling). Note that both handle_hello
and handle_goodbye
use owned values (ie, String instead of &str), which is required for async functions for reasons outside the scope of this post.
The function handle_goodbye
now returns a Result<_, Infallible>
. This is to say, this function cannot fail (all code paths must return Ok
) but still returns a Result
. There are times in which you must return a Result
(eg, because some trait requires it), and if the function never fails, we can use Infallible
as the error type. Because this function returns a Result, we switch goodbye
from using .then
to .and_then
– also unintuitive, IMO.
Finally, this function uses the reply::with_status
function to return two different replies and statuses depending on some condition (here, the value of name
). But both branches will return the same concrete type (that is, warp::reply::WithStatus
), so we can still use impl Reply
.
All three routes are currently infallible – if an HTTP request matches the path for a route, warp will respond to it. Usually it’s with a 200 OK, but sometimes with 418 IM_A_TEAPOT. And the responses all contain a text body. But what if we want to redirect to another page? Or what if we start handling a request, decide that the designated function isn’t equipped to handle it, and want another function to take over? This is where we make use of Rejection
s. First, let’s set up a login
route:
#[tokio::main]
async fn main() {
// ...
let login = warp::path("login")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_login);
let routes = index.or(hello).or(goodbye).or(login);
// ...
}
(For simplicity, we’re still just using a GET request with path parameters, such as /login/adam
.)
The handling function is now fallible and can reject the request (which is to say that it will allow another handler to potentially pick it up). Let’s assume there are a few users that we don’t want to log in: agent_smith
and neo
:
async fn handle_login(name: String) -> Result<String, warp::Rejection> {
if name == "agent_smith" {
todo!()
} else if name == "neo" {
todo!()
} else {
Ok(format!("You are now logged in as '{}'", name))
}
}
(This function’s happy path returns a String
instead of impl Reply
only to show that it’s possible to declare it that way. String
does implement Reply
, so this is functionally identical.)
What should we do for these users? warp::reject
provides a not_found
function that will reject a request. For Agent Smith, let’s use that:
if name == "agent_smith" {
Err(warp::reject::not_found())
} else if name == "neo" {
todo!()
}
But ‘not found’ isn’t particularly descriptive, and it doesn’t give us much control over how the rejection is subsequently handled. We can create our own type that implements Debug
and Reject
, then we can return this as a custom rejection:
#[derive(Debug)]
struct Neo;
impl warp::reject::Reject for Neo {}
async fn handle_login(name: String) -> Result<String, warp::Rejection> {
if name == "agent_smith" {
Err(warp::reject::not_found())
} else if name == "neo" {
Err(warp::reject::custom(Neo))
} else {
Ok(format!("You are now logged in as '{}'", name))
}
}
Now we have three different types of responses to our three different logins:
"You are now logged in as 'adam'" (200, for ordinary names)
404 Not Found (for agent_smith)
"Unhandled rejection: Neo" (500, for neo)
But how can we make use of these rejected requests?
Rejected requests can be recovered with .recover
. The Rejection
is passed to the recovery function which can then do something with it. (That something might itself be another rejection.)
#[tokio::main]
async fn main() {
// ...
let login = warp::path("login")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_login)
.recover(handle_rejection);
// ...
}
async fn handle_rejection(err: warp::Rejection) -> Result<Box<dyn Reply>, warp::Rejection> {
if err.is_not_found() {
Ok(Box::new(warp::redirect(warp::hyper::Uri::from_static("/"))))
} else if err.find::<Neo>().is_some() {
Ok(Box::new(reply::with_status(
"Follow the white rabbit",
StatusCode::UNAUTHORIZED,
)))
} else {
Ok(Box::new(reply::with_status(
r#"¯\_(ツ)_/¯"#,
StatusCode::INTERNAL_SERVER_ERROR,
)))
}
}
We are rejecting /login/agent_smith
with a not-found rejection, so those can be handled by checking err.is_not_found
. In that case, we’ll redirect to the root of our server. And the Neo
rejection type is handled using err.find::<>
. In that case, we construct a 401 response with the specified message. All other rejections – which shouldn’t be possible right now – are gracefully handled with a 500. Note that this means the request can’t fall through to any other recovery function should we add one later on.
You’ll also note that this function doesn’t return an impl Reply
but a Box<dyn Reply>
. This is because we now no longer have one concrete type being returned but two: the WithStatus
as before but also whatever redirect
returns, which in this case is a warp::reply::WithHeader
. Thus we have to box the return type.
All of the request handlers we’ve set up, regardless of their fallibility, handle specific paths, such as /
, /hello/adam
, and /login/agent_smith
. But our server doesn’t know how to handle other requests such as /about/contact.html
. This can be handled with a final handler that matches all requests:
let fallthrough = warp::any().map(|| "All other requests here");
let routes = index.or(hello).or(goodbye).or(login).or(fallthrough);
warp::any()
matches all requests. Since fallthrough
comes after login
in our route handling, if a login
request is ultimately still rejected, those requests will also be handled with 200 "All other requests here"
.
Here’s the code of this entire sample, combined and with comments:
use std::convert::Infallible;
use warp::{hyper::StatusCode, path, reply, Filter, Reply};
#[tokio::main]
async fn main() {
// The index (/) of our webserver shows a simple message.
let index = warp::path::end().map(|| "Welcome!");
// A simple GET route declaration using the `path!` macro: we can
// declare the route ("hello") and the expected parameter type (String)
let hello = path!("hello" / String).then(handle_hello);
// The same idea as above but with the individual warp components.
// Additionally, `goodbye` can return error codes for 'bad' input.
let goodbye = warp::path("goodbye")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_goodbye);
// `handle_login` might reject some requests, but the subsequent
// `handle_rejection` will take care of (some of) them.
// NB, `.recover` could be added to `routes` instead.
let login = warp::path("login")
.and(warp::path::param())
.and(warp::path::end())
.and_then(handle_login)
.recover(handle_rejection);
// This handler just catches everything in a rather uninteresting way.
let fallthrough = warp::any().map(|| "All other requests here");
// warp will handle requests in the following order:
let routes = index
.or(hello)
.or(goodbye)
.or(login)
.or(fallthrough);
warp::serve(routes).run(([0, 0, 0, 0], 3000)).await;
}
/// This function is infallible, so we can simply return an impl Reply.
/// To use it, we make use of [warp::Filter::then], which expects a function that returns a Future.
async fn handle_hello(name: String) -> impl warp::Reply {
format!("Hello, {}", name)
}
/// This function is also infallible, but for demonstration purposes, we'll return a `Result<_, Infallible>`.
/// Because of this, we use [warp::Filter::and_then], which is normally for fallible async functions.
///
/// Unrelated to fallibility, this function may return different error codes depending on the input.
/// We rewrite the response with a [warp::hyper::StatusCode], so the impl Reply is a [warp::reply::WithStatus].
/// And because [`impl Trait`](https://doc.rust-lang.org/rust-by-example/trait/impl_trait.html) returns a concrete
/// type, the two branches here must be the same type -- that is, we can't return a String on one
/// side and a WithStatus on the other.
async fn handle_goodbye(name: String) -> Result<impl Reply, Infallible> {
if name == "earl" {
Ok(reply::with_status(
"Earl Grey is a tea".to_string(),
StatusCode::IM_A_TEAPOT,
))
} else {
Ok(reply::with_status(
format!("Goodbye, {}", name),
StatusCode::OK,
))
}
}
#[derive(Debug)]
struct Neo;
impl warp::reject::Reject for Neo {}
/// On login, we might reject certain inputs and allow some other request handler
/// to take over.
///
/// There are two types of rejections here: for `"agent_smith"`, we return a 404,
/// while for `"neo"`, we'll return the custom rejection defined above.
async fn handle_login(name: String) -> Result<impl Reply, warp::Rejection> {
if name == "agent_smith" {
Err(warp::reject::not_found())
} else if name == "neo" {
Err(warp::reject::custom(Neo))
} else {
Ok(format!("You are now logged in as '{}'", name))
}
}
/// When specifying the routes we want to serve, we can `.recover` them with this function.
/// Any Rejection that comes before the recovery will be sent here, and we can handle it
/// or send it back for yet another later recovery.
///
/// Additionally, though not required, this function returns different kinds of replies, so
/// the return type is `Box<dyn Reply>`, and each concrete type is boxed accordingly.
async fn handle_rejection(err: warp::Rejection) -> Result<Box<dyn Reply>, warp::Rejection> {
if err.is_not_found() {
Ok(Box::new(warp::redirect(warp::hyper::Uri::from_static("/"))))
} else if err.find::<Neo>().is_some() {
Ok(Box::new(reply::with_status(
"Follow the white rabbit",
StatusCode::UNAUTHORIZED,
)))
} else {
Ok(Box::new(reply::with_status(
r#"¯\_(ツ)_/¯"#,
StatusCode::INTERNAL_SERVER_ERROR,
)))
}
}
Let’s start with a very simple state transition: stoplights. A stoplight has exactly three states and a simple, circular state diagram: green -> yellow -> red -> green -> ...
, which we can represent as:
#[derive(Debug, PartialEq)]
enum Stoplight {
Red,
Yellow,
Green,
}
impl Stoplight {
pub fn next(&self) -> Stoplight {
match *self {
Stoplight::Green => Stoplight::Yellow,
Stoplight::Yellow => Stoplight::Red,
Stoplight::Red => Stoplight::Green,
}
}
}
fn test_stoplight() {
let mut stoplight = Stoplight::Green;
stoplight = stoplight.next(); // yellow
stoplight = stoplight.next(); // red
stoplight = stoplight.next(); // green
assert_eq!(stoplight, Stoplight::Green);
}
Easy peasy. But we’re not storing any data or doing anything particularly interesting yet.
A better example might be a shopping cart. We might also have three states to a cart: Empty
, InProgress
, and Completed
. (We could have more, such as Shipped
and Delivered
, but I’m keeping three for simplicity.) Again, we could represent this as an enum but this time with data as appropriate:
use std::time::Instant;
#[derive(Debug)]
pub enum ShoppingCart {
Empty,
InProgress {
started: Instant,
products: Vec<String>,
},
Completed {
started: Instant,
completed: Instant,
products: Vec<String>,
total: f32,
},
}
impl ShoppingCart {
// TODO
}
This gives us three possible states. An empty cart has no data. It doesn’t “start” until a product is added, at which point we’ll know the start time and have a non-empty list of products
. (For simplicity, I’m using Instant
instead of something like chrono::DateTime
.) A completed cart has the start time and list of products but also a completed time and the calculated total cost.
What functionality does this need in the impl
? Well, we need to create a new cart:
pub fn new() -> ShoppingCart {
ShoppingCart::Empty
}
And we need to be able to add a product. An Empty
cart needs to change into an InProgress
with the one item; an InProgress
needs to just add the item:
pub fn add(&mut self, product: String) {
match self {
ShoppingCart::Empty => {
*self = ShoppingCart::InProgress {
started: Instant::now(),
products: vec![product],
};
}
ShoppingCart::InProgress { products, .. } => {
products.push(product);
}
ShoppingCart::Completed { .. } => panic!("Cannot add to a completed cart"),
}
}
Here’s where we see problems arise: if a cart has been completed, what should happen when .add()
is called? In this code, we panic. Alternately, we could return a Result
to propagate errors. But let’s continue with the checkout
function, which applies to non-empty carts:
pub fn checkout(self) -> Self {
match self {
ShoppingCart::Empty => panic!("Can't checkout an empty cart"),
ShoppingCart::InProgress { started, products } => {
let total = products
.iter()
.map(|p| match &p[..] {
"Apple" => 1.10,
"Orange" => 0.75,
_ => todo!("Handle other products here"),
})
.sum();
ShoppingCart::Completed {
started,
completed: Instant::now(),
products,
total,
}
}
ShoppingCart::Completed { .. } => panic!("Can't checkout a completed cart"),
}
}
Again, more errors. But we can now test this and everything works:
pub fn test_cart_enum() {
let mut cart = ShoppingCart::new();
cart.add("Apple".to_string());
cart.add("Orange".to_string());
let cart = cart.checkout();
println!("{:?}", cart);
}
An aside: .add()
takes &mut self
and will simply modify the existing value (or, in the case of an empty cart, will replace self
with an updated discriminant) while .checkout()
consumes self
and returns a new value. The reason for this difference is that .checkout()
consuming and returning can use move semantics to maintain the existing list of values. In order to remain &mut self
, it would have to either clone the product list or do some wonky mem::replace
to guarantee that self
is always valid. Probably not a great API design.
The above code isn’t great from a safety/correctness perspective. We can gracefully handle runtime errors with Result
, but wouldn’t it be better to make it impossible to hit them?
Ideally, an empty cart shouldn’t ever be able to call .checkout()
. That is, we’d like to do something like this:
let cart = ShoppingCart::new();
let done = cart.checkout(); // this *should* fail to compile
Let’s start with the states we need but instead of specifying them as an enum, they are unit structs. We’ll also define the shopping cart that can be of some type T
and will contain all the data we need:
struct Empty;
struct InProgress;
struct Complete;
struct ShoppingCart<T> {
started: Instant,
completed: Instant,
products: Vec<String>,
total: f32,
phantom: PhantomData<T>,
}
hic svnt dracones (here be dragons): this may not be best practice, but it works.
What does this mean? We will use the three unit structs to represent the type of ShoppingCart
; that is, a ShoppingCart<Empty>
is a different type than ShoppingCart<InProgress>
, and they can have independent implementations.
But what’s this PhantomData<T>
? It appeases the compiler because we’re not actually storing a T
; PhantomData
makes our ShoppingCart<T>
act like it contains a T
. We could have instead used, say, _state: T
if we wanted to.
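One nice property of PhantomData is that it is zero-sized, so the marker costs nothing at runtime. A quick sketch (with a stripped-down cart, not the full struct above) to confirm:

```rust
use std::marker::PhantomData;

struct Empty;
struct InProgress;

// A stripped-down cart: one real field plus a compile-time-only marker.
struct Cart<T> {
    total: f32,
    phantom: PhantomData<T>,
}

fn main() {
    let a: Cart<Empty> = Cart { total: 0.0, phantom: PhantomData };
    let _b: Cart<InProgress> = Cart { total: a.total, phantom: PhantomData };
    // The marker takes no space: Cart<T> is exactly as big as its f32 field.
    assert_eq!(std::mem::size_of::<Cart<Empty>>(), std::mem::size_of::<f32>());
    assert_eq!(std::mem::size_of::<PhantomData<Empty>>(), 0);
    println!("PhantomData is zero-sized");
}
```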
Interestingly, we can have different impl
blocks for the different states:
impl ShoppingCart<Empty> {
fn new() -> Self {
ShoppingCart {
started: Instant::now(),
completed: Instant::now(),
products: Vec::new(),
total: 0.0,
phantom: PhantomData,
}
}
}
fn test_cart_marker() {
let cart = ShoppingCart::new(); // of type ShoppingCart<Empty>
}
Currently, there exists exactly one implementation of ShoppingCart
for which a new
function exists, so Rust knows that it must be a ShoppingCart<Empty>
.
Even though the cart has neither been started (per our definition above) nor completed, the struct still carries those fields. We could make the fields optional and set them to None
. But we also know the current state is Empty
and can thus set but ignore the values. Again, this may not be best practice.
To add a product to a cart, we provide an .add()
function to both the empty cart:
impl ShoppingCart<Empty> {
fn add(self, product: String) -> ShoppingCart<InProgress> {
ShoppingCart {
// The only fields we really care about:
started: Instant::now(),
products: vec![product],
// Everything else has a valid value that we'll just ignore
completed: self.completed,
total: self.total,
phantom: PhantomData,
}
}
}
And the separate implementation of ShoppingCart<InProgress>
:
impl ShoppingCart<InProgress> {
fn add(mut self, product: String) -> ShoppingCart<InProgress> {
self.products.push(product);
self
}
}
The distinction here is that the empty cart needs to create an in-progress cart while an in-progress cart can keep its state and simply add a product to its already existing list. Now, our test code can be:
fn test_cart_marker() {
let cart = ShoppingCart::new(); // of type ShoppingCart<Empty>
let cart = cart.add("Apple".to_string()); // of type ShoppingCart<InProgress>
let cart = cart.add("Orange".to_string()); // of type ShoppingCart<InProgress>
}
By virtue of calling cart.add()
on an empty cart, we are returned a differently typed value. That is, just as a Vec<u32>
is an altogether different animal than a Vec<bool>
, so too is ShoppingCart<Empty>
different than ShoppingCart<InProgress>
. So it is with an in-progress cart being completed:
impl ShoppingCart<InProgress> {
fn complete(self) -> ShoppingCart<Complete> {
let total = self
.products
.iter()
.map(|p| match &p[..] {
"Apple" => 1.10,
"Orange" => 0.75,
_ => panic!("Unknown product"),
})
.sum();
ShoppingCart {
started: self.started,
completed: Instant::now(),
total,
products: self.products,
phantom: PhantomData,
}
}
}
For now, there’s no implementation of the completed cart. Specifically, we don’t create an .add()
function on it, which means:
fn test_cart_marker() {
let cart = ShoppingCart::new(); // of type ShoppingCart<Empty>
let cart = cart.add("Apple".to_string()); // of type ShoppingCart<InProgress>
let cart = cart.add("Orange".to_string()); // of type ShoppingCart<InProgress>
let cart = cart.complete();
assert_eq!(cart.total, 1.85);
//cart.add("Banana".to_string()); // no method named `add` found for struct `ShoppingCart<Complete>` in the current scope
}
Because the type is changing on the first call to .add()
and on the call to .complete()
, we have to shadow the variables – we can’t make cart mut and overwrite it:
fn test_cart_marker() {
let mut cart = ShoppingCart::new(); // of type ShoppingCart<Empty>
//cart = cart.add("Apple".to_string()); // mismatched types
// expected struct `ShoppingCart<Empty>`
// found struct `ShoppingCart<InProgress>
}
My next post will include more on converting between the generic types.
I am at the very early stages of learning about SAT/SMT, and as a way to help me learn it myself, I’m writing this post - with the additional hope that it might help others. For the purpose of this post, I’m conflating the terms SAT and SMT, and I don’t define them here. The very high-level description of SAT is that we declare constraints over some boolean propositions and let the SMT solver figure it out; this is in contrast to trying to write some complicated or brute-force algorithm to figure it out for us. We declare these constraints using s-expressions comprising the SMT-LIB language.
You can find the final script and the generated input SMT-LIB and output model from this post here.
“Einstein’s Problem” is the kind of logical thinking puzzle I used to do in high school: there are five houses, each of a different color, with an owner of a different nationality, etc. Each such property is unique; that is, exactly one house must be blue, exactly one owner drinks milk, exactly one owner keeps birds, and so on. There are also hints as to who lives in which house – the owner of the blue house also drinks milk; the owner of the red house is neighbors with the beer drinker.
I did a search for Einstein’s Problem and found this website as the first hit. So I copied all of the text of this problem as my input. It looks like this:
Then there are some clues:
To solve this puzzle, I wanted to stick with creating an .smt2 file fed to an SMT solver – specifically, Z3 – instead of using the Python API, for example. I did, however, make use of Python to generate all the s-exprs.
I started by creating functions for each property (blue, green, red, dog, horse, beer, Norwegian, etc.):
(declare-fun blue (Int) Bool)
(declare-fun green (Int) Bool)
(declare-fun red (Int) Bool)
(declare-fun white (Int) Bool)
(declare-fun yellow (Int) Bool)
This is done for all properties. The idea is to be able to call each of these functions with the house number, from 1 to 5 inclusive. For example, clue 9 says that the Norwegian lives in the first house, so we can constrain the solution to require this fact:
(assert (norwegian 1))
I programmatically generated these function declarations in Python by printing the generated s-exprs to stdout:
parameters = {
'colors': ['blue', 'green', 'red', 'white', 'yellow'],
'nationalities': ['brit', 'dane', 'german', 'norwegian', 'swede'],
'beverage': ['beer', 'coffee', 'milk', 'tea', 'water'],
'cigar': ['bluemaster', 'dunhill', 'pallmall', 'prince', 'blend'],
'pet': ['cat', 'bird', 'dog', 'fish', 'horse']
}
for (k, vs) in parameters.items():
print(f'; functions for {k}:')
for v in vs:
print(f'(declare-fun {v} (Int) Bool)')
print()
The definition of the puzzle is that each such parameter (or property) applies to exactly one house. That means that the function blue
must be true for house 1, house 2, house 3, house 4, or house 5. This can be created as the boolean or
of multiple functions:
(assert (or
(blue 1)
(blue 2)
(blue 3)
(blue 4)
(blue 5)
))
This constrains the solution such that blue
must be true for at least one of the houses. But we need to further constrain it so that it’s true for only one house. The way to do this is to generate a lot more constraints that require that it’s not the case that (blue i)
and (blue j)
for each i and j such that i != j:
(assert (not (and (blue 1) (blue 2))))
(assert (not (and (blue 1) (blue 3))))
(assert (not (and (blue 1) (blue 4))))
(assert (not (and (blue 1) (blue 5))))
(assert (not (and (blue 2) (blue 3))))
(assert (not (and (blue 2) (blue 4))))
(assert (not (and (blue 2) (blue 5))))
(assert (not (and (blue 3) (blue 4))))
(assert (not (and (blue 3) (blue 5))))
(assert (not (and (blue 4) (blue 5))))
Note that adding multiple separate constraints (assert
s-exprs) is the same as having a single s-expr as the conjunction (boolean and
) of multiple s-exprs.
There’s a further restriction – this one bit me as I was debugging my nearly complete solution – that once a property holds for a house, all other related properties must not hold. That is, if (blue 4)
is true, then (red 4)
must not be true (which is the same as (not (red 4))
must hold true). I then generated all of these s-exprs in Python:
for (k, vs) in parameters.items():
for v in vs:
print(f'; at least one {v} house')
ors = ' '.join(f'({v} {i})' for i in range(1, 6))
print(f'(assert (or {ors}))')
print(f'; but not more than one {v} house')
for i in range(1, 5):
for j in range(i+1, 6):
print(f'(assert (not (and ({v} {i}) ({v} {j}))))')
print()
print(f'; A house can only match one {k} proposition')
for v1 in vs:
for v2 in vs:
if v1 != v2:
for i in range(1, 6):
print(f"(assert (not (and ({v1} {i}) ({v2} {i}))))")
print()
Running these ~30 lines of Python generates almost 900 lines of SMT-LIB, and this sets up the baseline constraints.
The simplest clues are 8 and 9:
These are written as assertions that the milk
function must hold true when invoked with 3
, the middle position, and that norwegian
must hold when invoked with 1
:
(assert (milk 3))
(assert (norwegian 1))
Why do we not need to assert (not (milk 2))
, (not (brit 1))
, (not (norwegian 4))
, and so on? Because the above constraints already do that for us: we’re constraining such that (milk 3)
, and we already have constraints that require exactly one house holds true for milk
and that no other drink functions hold true for house 3.
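As a sanity check (a pure-Python sketch, no Z3 involved): enumerate every assignment of the milk property over five houses, keep only those satisfying the exactly-one constraints plus (milk 3), and observe that the lone surviving assignment already makes milk false for the other four houses.

```python
from itertools import product

# milk[i] says whether house i+1 has milk; keep assignments with exactly one
# milk house that also satisfy (milk 3), i.e. milk[2] is True.
assignments = [
    milk for milk in product([False, True], repeat=5)
    if sum(milk) == 1 and milk[2]
]

# Exactly one assignment survives, and it already negates milk for houses 1, 2, 4, 5.
print(assignments)  # [(False, False, True, False, False)]
```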
Clue four is also quite easy:
If we knew that the white house was the third one ((white 3)
), then we’d know that (green 2)
must hold. Knowing either of the properties here tells us what we need about the other, so it’s not deducing white from green but constraining that both conditions must hold together. But we don’t know which house it’s in, so we can disjoin the four possibilities:
(assert (or
(and (green 1) (white 2))
(and (green 2) (white 3))
(and (green 3) (white 4))
(and (green 4) (white 5))
))
This combined constraint describes clue four.
The first clue is that the Brit lives in the red house, but we don’t know which one it is. Using bidirectional implication – if and only if, written as iff
– we can simply constrain:
(assert (iff (brit 1) (red 1)))
(assert (iff (brit 2) (red 2)))
(assert (iff (brit 3) (red 3)))
(assert (iff (brit 4) (red 4)))
(assert (iff (brit 5) (red 5)))
Colloquially, this says that if the Brit is in house 1 then house 1 must also be red, and if house 1 is red then the Brit must live there. The two sides of the iff
must match. This is all we need for the first clue, and this type of constraint applies to clues 2, 3, 5, 6, 7, 12, and 13. These are very simply generated in Python:
for i in range(1, 6):
print(f'(assert (iff (brit {i}) (red {i})))')
The four remaining clues relate neighbors, such as
This requires that we relate each house to its previous and next neighbors, where they exist. For example:
(assert (or
(and (blend 1) (cat 2)) ; house 1 has no 'previous' neighbor, check the next
(and (blend 2) (cat 1)) ; check previous...
(and (blend 2) (cat 3)) ; ...and next neighbor
(and (blend 3) (cat 2))
(and (blend 3) (cat 4))
(and (blend 4) (cat 3)) ; etc
(and (blend 4) (cat 5))
(and (blend 5) (cat 4))))
I generate these with a helper function:
def has_neighbor(prop1: str, prop2: str) -> str:
ands = []
for i in range(1, 6):
for j in range(1, 6):
if abs(i-j) == 1:
ands.append(f' (and ({prop1} {i}) ({prop2} {j}))')
return '(assert (or\n' + '\n'.join(ands) + '))\n'
print(has_neighbor('blend', 'cat'))
print(has_neighbor('horse', 'dunhill'))
print(has_neighbor('norwegian', 'blue'))
print(has_neighbor('blend', 'water'))
That’s all we need to constrain the model! The last two s-expressions will check for satisfiability and will return the model:
(check-sat)
(get-model)
I print these out in Python and run the script, generating just over 1000 lines of SMT-LIB code. When I run that through Z3 using z3 einstein-generated.smt2
, I get the following output:
sat
(
(define-fun brit ((x!0 Int)) Bool
(= x!0 3))
(define-fun bluemaster ((x!0 Int)) Bool
(= x!0 5))
(define-fun dunhill ((x!0 Int)) Bool
(= x!0 1))
(define-fun cat ((x!0 Int)) Bool
(= x!0 1))
(define-fun norwegian ((x!0 Int)) Bool
(= x!0 1))
(define-fun coffee ((x!0 Int)) Bool
...
)
The first line, sat
, is the result of (check-sat)
, and it tells us that the constraints we’ve applied are satisfiable. The rest shows us a model that satisfies the constraints. For example, the first s-expr (define-fun brit ((x!0 Int)) Bool (= x!0 3))
tells us that the brit
function (which takes an int argument represented as x!0
and returns a bool) is satisfied when its argument equals 3
; in other words, the (brit 3)
constraint must evaluate to true in order for the model to hold. So the Brit lives in the third house.
Cleaning up the rest of these s-expressions gives the following details:
cat 1
dunhill 1
norwegian 1
water 1
yellow 1
blend 2
blue 2
dane 2
horse 2
tea 2
bird 3
brit 3
milk 3
pallmall 3
red 3
coffee 4
fish 4
german 4
green 4
prince 4
beer 5
bluemaster 5
dog 5
swede 5
white 5
Which can just be represented in a table as:
|        | House 1   | House 2 | House 3   | House 4 | House 5    |
|--------|-----------|---------|-----------|---------|------------|
| Color  | Yellow    | Blue    | Red       | Green   | White      |
| Pet    | Cat       | Horse   | Bird      | Fish    | Dog        |
| Cigar  | Dunhill   | Blend   | Pall Mall | Prince  | Bluemaster |
| Nation | Norwegian | Dane    | Brit      | German  | Swede      |
| Drink  | Water     | Tea     | Milk      | Coffee  | Beer       |
And this is exactly the solution at the bottom of the linked page.
The direct question posed by this problem is, “Who keeps fish?” This table makes it easy to look it up – the fish is at house 4, which is owned by the German. But presumably there’s a way to ask this directly of the SAT solver: declare a variable, owns_fish
, constrain that (fish owns_fish)
must hold (which should bind this value to whatever house owns it). Then assert that owns_fish
should be a value from 1 to 5:
(declare-const owns_fish Int)
(assert (fish owns_fish))
(assert (or
(= 1 owns_fish)
(= 2 owns_fish)
(= 3 owns_fish)
(= 4 owns_fish)
(= 5 owns_fish)
))
(eval owns_fish) ; displays '4' when run
And we can further constrain a declared String to be returned by asserting that natl_owns_fish
is equal to "brit"
if (brit owns_fish)
(which will hold only if the brit
function is true for owns_fish
), and so on:
(declare-const natl_owns_fish String)
(assert (= natl_owns_fish
(ite (brit owns_fish) "brit"
(ite (dane owns_fish) "dane"
(ite (german owns_fish) "german"
(ite (norwegian owns_fish) "norwegian" "swede"))))
))
(eval natl_owns_fish) ; "german"