Synchronous IO is fast?!

Thursday, March 20, 2025

When Cloudflare announced SQLite-backed Durable Objects, they claimed that synchronous IO will actually complete faster than asynchronous IO. This comes as a surprise to most of the engineers I speak with, so let us substantiate the claim.

To do this, we will run twelve benchmarks covering read-only, write-only, and 80% read / 20% write workloads. Coincidentally, approximately 80% of row operations on Cloudflare Durable Objects are reads, so I am not blindly applying the 80/20 rule. We will run each workload with sequential and random operations and, most importantly, with synchronous and asynchronous operations.

In the previous post, I walked through how to benchmark storage performance using fio. This post focuses primarily on the results of those benchmarks. They were run on the same hardware, though that matters less here since we are comparing synchronous and asynchronous IO against each other.
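For reference, each run simply points fio at the job file shown in the Setup section at the end of this post (the file name is just the one used here):

$ fio benchmark.ini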

Results part 1

Latency

Synchronous operations show a noticeable reduction in latency, roughly 30 times lower across all operation types. Another interesting side effect is that synchronous sequential operations have slightly lower latency than synchronous random operations, but this difference does not carry over to asynchronous operations.

Latency (ns)

                                  Synchronous        Asynchronous
Read                  Random              943              31,410
                      Sequential          657              33,910
Write                 Random            1,968              43,230
                      Sequential        1,675              42,030
80% Read / 20% Write  Random      961 / 2,000     35,740 / 37,770
                      Sequential  680 / 1,785     34,100 / 36,870

Throughput

As with the latencies above, synchronous IO is noticeably faster in terms of throughput: synchronous operations achieve roughly two to four times the throughput of asynchronous operations. This can be expressed either as IOPS or as bandwidth, since every operation in this benchmark uses the same 4 KiB block size, and both views are shown below. Results for the 80% read and 20% write workload are summed.
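Because the block size is fixed, the two views are related by simple arithmetic: bandwidth is IOPS multiplied by the block size. Taking the synchronous random read result as an example, 762,000 IOPS × 4 KiB ≈ 2,976 MiB/s, which lines up with the 2,975 MiB/s entry in the bandwidth table below (the small difference is rounding in the reported figures).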

IOPS

                                  Synchronous        Asynchronous
Read                  Random          762,000             307,000
                      Sequential    1,320,000             311,000
Write                 Random          424,000             237,000
                      Sequential      561,000             245,000
80% Read / 20% Write  Random          644,000             290,000
                      Sequential      918,000             304,000

Bandwidth (MiB/s)

                                  Synchronous        Asynchronous
Read                  Random            2,975               1,199
                      Sequential        5,156               1,213
Write                 Random            1,656                 927
                      Sequential        2,190                 958
80% Read / 20% Write  Random            2,514               1,133
                      Sequential        3,583               1,184

Results part 2

Part of the reason the previous benchmarks favor synchronous IO is that operating systems buffer IO operations in memory by default. These benchmarks were not configured to bypass that buffering, so the read and write "IO operations" above are largely served by the page cache rather than the device. Importantly, this is not cheating! Most real workloads benefit from buffering in the same way.

However, there is more nuance once buffering is disabled, so let us repeat the benchmark with buffering turned off, that is, with direct=1 in the fio initialization file. Results are shown below in the same format, but this time only for a read-only workload.
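Concretely, the only change to the job file from the Setup section is the direct flag in the [global] section:

direct=1          ; Use non-buffered IO operations (typically O_DIRECT on Linux)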

Latency

There are a couple of differences in the latency results with buffering disabled. First, latencies are orders of magnitude higher across the board: while buffered synchronous sequential IO operations complete in several hundred nanoseconds, an unbuffered IO operation takes tens of microseconds. Second, synchronous IO is still faster.

Latency (ns)

                                  Synchronous        Asynchronous
Read                  Random           84,700             362,640
                      Sequential       26,450             144,270

Throughput

It turns out that synchronous IO is not better across the board though. With buffering disabled, asynchronous IO provides better throughput: with iodepth=32, the posixaio engine keeps many operations in flight at once, while the sync engine must wait for each 4 KiB operation to complete before issuing the next. Sequential synchronous IO comes close to the throughput of random asynchronous IO, but does not quite match it. Disabling buffering also shows the expected performance gap between sequential and random IO operations more clearly.

IOPS

                                  Synchronous        Asynchronous
Read                  Random           12,000              44,000
                      Sequential       38,000             110,000

Bandwidth (MiB/s)

                                  Synchronous        Asynchronous
Read                  Random               46                 172
                      Sequential          147                 430

If you have made it this far, hopefully you now believe the claim that synchronous IO is faster than asynchronous IO, at least when IO is buffered. Beyond that, buffered IO is awesome.

Setup

The benchmarks in this post can be recreated using the initialization file below.

$ cat benchmark.ini
[global]
size=1G           ; File size of 1 GiB per job
direct=0          ; Use buffered IO operations (toggle this for the second set of benchmarks)
bs=4k             ; Block size of 4 KiB
numjobs=1         ; Single job to isolate IO method
runtime=60        ; Run for 60 seconds
time_based        ; Ensure the test runs for the full duration
stonewall         ; Ensure jobs run sequentially

[sync-read]
ioengine=sync     ; Synchronous IO engine
rw=read           ; Sequential reads (replicate with: randread, write, randwrite, rw, and randrw)

[async-read]
ioengine=posixaio ; POSIX asynchronous IO engine
iodepth=32        ; Queue depth for asynchronous IO
rw=read           ; Sequential reads (replicate with: randread, write, randwrite, rw, and randrw)
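
To reproduce the rest of the matrix, each rw value listed in the comments above gets its own pair of sections. As a sketch, the random read pair used in part 2 could look like this (the section names are just illustrative):

[sync-randread]
ioengine=sync     ; Synchronous IO engine
rw=randread       ; Random reads

[async-randread]
ioengine=posixaio ; POSIX asynchronous IO engine
iodepth=32        ; Queue depth for asynchronous IO
rw=randread       ; Random reads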