Synchronous IO is fast?!
When Cloudflare announced SQLite-backed Durable Objects, they claimed that synchronous IO can actually complete faster than asynchronous IO. This comes as a surprise to most engineers I speak with, so let us substantiate the claim.
To do this, we will run twelve benchmarks covering read-only, write-only, and an 80% read / 20% write workload. Fittingly, approximately 80% of row operations on Cloudflare Durable Objects really are reads, so I am not blindly applying the 80/20 rule. We will consider these workloads with sequential and random operations and, most importantly, with synchronous and asynchronous operations.
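As a quick sanity check on the benchmark matrix, the twelve combinations can be enumerated in a few lines of Python (the labels here are mine, not fio job names):

```python
from itertools import product

# Hypothetical labels for the benchmark matrix described above.
workloads = ["read-only", "write-only", "80% read / 20% write"]
patterns = ["sequential", "random"]
io_modes = ["synchronous", "asynchronous"]

benchmarks = [
    f"{w}, {p}, {m}" for w, p, m in product(workloads, patterns, io_modes)
]
print(len(benchmarks))  # 3 workloads x 2 patterns x 2 IO modes = 12
```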
In the previous post, I walked through how to benchmark storage performance using `fio`. This post will primarily focus on the results of those benchmarks. They run on the same hardware as before, though this matters less for the sake of comparing synchronous and asynchronous IO.
Results part 1
Latency
There is a noticeable reduction in latency, roughly 30-fold, for all types of synchronous operations. Another interesting detail is a slight reduction in latency for synchronous sequential operations compared to synchronous random operations, an advantage that does not carry over to asynchronous operations.
Latency (ns)
| | | Synchronous | Asynchronous |
|---|---|---|---|
| Read | Random | 943 | 31,410 |
| Read | Sequential | 657 | 33,910 |
| Write | Random | 1,968 | 43,230 |
| Write | Sequential | 1,675 | 42,030 |
| 80% Read / 20% Write | Random | 961 / 2,000 | 35,740 / 37,770 |
| 80% Read / 20% Write | Sequential | 680 / 1,785 | 34,100 / 36,870 |
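One way to build intuition for this gap is that each asynchronous operation pays for submission and completion machinery on top of the IO itself. Python has no native asynchronous file IO, so the sketch below models that extra round trip with asyncio's thread-pool executor; the absolute numbers are nothing like fio's, but the ordering is the same.

```python
import asyncio
import os
import tempfile
import time

# A small file that will sit in the page cache after the first read.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)
os.close(fd)

def sync_read(p):
    with open(p, "rb") as f:
        return f.read()

def time_sync(n):
    # Direct, blocking reads: no dispatch overhead.
    start = time.perf_counter()
    for _ in range(n):
        sync_read(path)
    return (time.perf_counter() - start) / n

async def time_async(n):
    # Each read takes a round trip through the event loop and a worker
    # thread, standing in for asynchronous submit/complete machinery.
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    for _ in range(n):
        await loop.run_in_executor(None, sync_read, path)
    return (time.perf_counter() - start) / n

sync_read(path)  # warm the page cache
sync_lat = time_sync(2000)
async_lat = asyncio.run(time_async(2000))
os.unlink(path)
print(f"sync: {sync_lat * 1e9:,.0f} ns/op, async: {async_lat * 1e9:,.0f} ns/op")
```

The per-operation dispatch cost dominates when the data is already in memory, which is exactly the buffered regime these benchmarks ran in.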
Throughput
As with the latencies above, synchronous IO is noticeably faster in terms of throughput: roughly 2 to 4 times that of asynchronous operations. This can be seen in terms of either IOPS or bandwidth, given that the blocks used in this benchmark are all the same size (4 KiB). Both results are shown below. Results for the 80% read / 20% write workload are summed.
IOPS
| | | Synchronous | Asynchronous |
|---|---|---|---|
| Read | Random | 762,000 | 307,000 |
| Read | Sequential | 1,320,000 | 311,000 |
| Write | Random | 424,000 | 237,000 |
| Write | Sequential | 561,000 | 245,000 |
| 80% Read / 20% Write | Random | 644,000 | 290,000 |
| 80% Read / 20% Write | Sequential | 918,000 | 304,000 |
Bandwidth (MiB/s)
| | | Synchronous | Asynchronous |
|---|---|---|---|
| Read | Random | 2,975 | 1,199 |
| Read | Sequential | 5,156 | 1,213 |
| Write | Random | 1,656 | 927 |
| Write | Sequential | 2,190 | 958 |
| 80% Read / 20% Write | Random | 2,514 | 1,133 |
| 80% Read / 20% Write | Sequential | 3,583 | 1,184 |
Results part 2
The previous benchmarks favor synchronous IO partly because operating systems buffer IO operations in memory by default. These benchmarks were not configured to bypass the page cache, so these read and write "IO operations" are largely served by memory rather than the storage device. Importantly, this is not cheating! Most real workloads benefit similarly from buffering.
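To see the page cache at work, compare writes absorbed by the cache with writes forced out to the device. This is a minimal sketch, not fio; the fsync-per-write loop approximates what giving up buffering costs, and `time_writes` is a hypothetical helper of mine.

```python
import os
import tempfile
import time

def time_writes(n, block_size, sync_each):
    """Time n sequential writes, optionally flushing each one to the device."""
    fd, path = tempfile.mkstemp()
    data = b"x" * block_size
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, data)
            if sync_each:
                os.fsync(fd)  # force the write out of the page cache
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)

buffered = time_writes(1000, 4096, sync_each=False)
synced = time_writes(1000, 4096, sync_each=True)
print(f"buffered: {buffered:.4f}s, fsync each write: {synced:.4f}s")
```

On a real disk the gap is dramatic; buffered writes return as soon as the kernel has copied the data into memory.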
However, there is more nuance once we disable buffering, so let us repeat the benchmark with buffering disabled, that is, setting `direct=1` in the `fio` initialization file. Results of this benchmark are shown below in the same format, but this time only for a read-only workload.
Latency
A couple of things are different about the latency results with buffering disabled. First of all, latencies are orders of magnitude higher across the board: while a buffered synchronous sequential IO operation can take several hundred nanoseconds, an unbuffered IO operation takes tens of microseconds. Second, synchronous IO is still faster.
Latency (ns)
| | | Synchronous | Asynchronous |
|---|---|---|---|
| Read | Random | 84,700 | 362,640 |
| Read | Sequential | 26,450 | 144,270 |
Throughput
It turns out that synchronous IO is not better across the board, though. With buffering disabled, asynchronous IO provides better throughput. Sequential synchronous IO operations come close to the throughput of random asynchronous IO operations, but do not quite match it. Disabling buffering also shows the expected performance gap between sequential and random IO operations more clearly.
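The throughput reversal comes from queue depth: with iodepth=32, the asynchronous engine keeps 32 operations in flight, so throughput is governed by parallelism rather than per-operation latency. Here is a toy model, with sleeps standing in for unbuffered IO operations; the specific latencies are made up, chosen only so that the deeper queue has the worse per-operation latency, as in the results above.

```python
import asyncio
import time

async def fake_io(latency):
    # Stand-in for one unbuffered IO operation.
    await asyncio.sleep(latency)

async def run(n, depth, latency):
    """Complete n operations with at most `depth` in flight; return ops/sec."""
    sem = asyncio.Semaphore(depth)

    async def one():
        async with sem:
            await fake_io(latency)

    start = time.perf_counter()
    await asyncio.gather(*(one() for _ in range(n)))
    return n / (time.perf_counter() - start)

# Synchronous-style: one operation in flight, lower per-op latency.
shallow = asyncio.run(run(n=64, depth=1, latency=0.003))
# Asynchronous-style: 32 in flight, each with higher per-op latency.
deep = asyncio.run(run(n=640, depth=32, latency=0.02))
print(f"depth 1: {shallow:.0f} ops/s, depth 32: {deep:.0f} ops/s")
```

Even though each operation takes longer, completing many of them concurrently yields far more operations per second, which is what the IOPS tables below show.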
IOPS
| | | Synchronous | Asynchronous |
|---|---|---|---|
| Read | Random | 12,000 | 44,000 |
| Read | Sequential | 38,000 | 110,000 |
Bandwidth (MiB/s)
| | | Synchronous | Asynchronous |
|---|---|---|---|
| Read | Random | 46 | 172 |
| Read | Sequential | 147 | 430 |
If you have made it this far, hopefully you now believe the claim that synchronous IO can be faster than asynchronous IO. Beyond that, buffered IO is awesome.
Setup
The benchmarks in this post can be recreated using the initialization file below.
```ini
$ cat benchmark.ini
[global]
size=1G        ; File size of 1 GiB per job
direct=0       ; Use buffered IO operations (set direct=1 for the second set of benchmarks)
bs=4k          ; Block size of 4 KiB
numjobs=1      ; Single job to isolate the IO method
runtime=60     ; Run for 60 seconds
time_based     ; Ensure the test runs for the full duration
stonewall      ; Ensure jobs run sequentially

[sync-read]
ioengine=sync      ; Synchronous IO engine
rw=read            ; Sequential reads (replicate with: randread, write, randwrite, rw, and randrw)

[async-read]
ioengine=posixaio  ; POSIX asynchronous IO engine
iodepth=32         ; Queue depth for asynchronous IO
rw=read            ; Sequential reads (replicate with: randread, write, randwrite, rw, and randrw)
```