kazu.dev

notes from a systems engineer. probably wrong. occasionally useful.
kazuhiro w. — i write databases for a living and complain about them on the weekends. based out of fukuoka, jp. currently nerd-sniped by io_uring and the various ways it can take down a server.
2026-04-28

why io_uring made our tail latency worse

linux io_uring perf postmortem

we replaced a tidy little blocking-thread-per-connection model with io_uring, expecting throughput to go up and latency to come down. throughput did go up. p99.9 doubled. this post is the bisect.

the short version: SQPOLL plus a saturated CPU plus an unfortunate cgroup pinning interacted to produce a multi-millisecond stall on submission that nothing in our metrics caught for two weeks. the rest of this post is the long version.

read it
2026-03-14

debugging a rocksdb write stall with nothing but strace

rocksdb lsm debug

customer reports: writes pause for 4-6 seconds, every 90 minutes, give or take. no panics. no GC. compaction looks healthy. metrics show nothing. it took me an afternoon and an embarrassing amount of strace to figure out it was the WAL file rotation racing with the fsync on the manifest.

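# the smoking gun: timestamp every traced syscall that touches the manifest
# -y prints the path behind each fd so fsync(fd) lines can match; rocksdb names the file MANIFEST-*, hence tolower; strftime needs gawk
strace -f -y -e trace=fsync,rename,openat -p "$(pidof db)" 2>&1 \
  | awk 'tolower($0) ~ /manifest/ {print strftime("%T"), $0}'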
read it
2026-02-02

i finally understand raft joint consensus

distsys raft consensus

i implemented raft in 2019. i implemented joint consensus in 2021. it took until last week, reading dragonboat's source, for the membership-change algorithm to actually click for me. this post is the explanation i wish someone had handed me.
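
a taste of the core rule, as a minimal sketch: while the cluster is in the joint configuration (C_old,new in the raft paper), an entry only commits once a majority of the old members and a majority of the new members have acknowledged it. the function names and the node ids below are made up for illustration; this is not dragonboat's api or its code.

```python
def majority(acks: set, members: set) -> bool:
    """True if more than half of `members` have acknowledged."""
    return len(acks & members) > len(members) // 2


def joint_committed(acks: set, old: set, new: set) -> bool:
    """In C_old,new an entry commits only with a majority of BOTH configs."""
    return majority(acks, old) and majority(acks, new)


# hypothetical membership change: swap {a, b} out for {d, e}
old = {"a", "b", "c"}
new = {"c", "d", "e"}

print(joint_committed({"a", "b"}, old, new))       # False: majority of old only
print(joint_committed({"a", "c", "d"}, old, new))  # True: majority of both
```

the point of requiring both majorities is that neither the old nor the new configuration can make a decision on its own during the transition.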

read it
2025-12-19

a small zig allocator trick for arena-heavy code

zig allocators

short one. if you're using a lot of nested arenas in zig and finding the lifetime accounting annoying, you can wrap an arena in a small helper that exposes both the underlying allocator and a "scoped reset" handle. handy for parser code.

read it