Blocking I/O, Nonblocking I/O, And Epoll
January 10, 2017
In this post I want to explain exactly what happens when you use nonblocking I/O. In particular, I want to explain:
The semantics of setting O_NONBLOCK on a file descriptor using fcntl
How nonblocking I/O is different from asynchronous I/O
Why nonblocking I/O is frequently used in conjunction with I/O multiplexers like select, epoll, and kqueue
How nonblocking mode interacts with edge-triggered polling in epoll
Blocking Mode
By default, all file descriptors on Unix systems start out in "blocking mode". That means that I/O system calls like read, write, or connect can block. An easy way to understand this is to think about what happens when you read data on stdin from a regular TTY-based program. If you call read on stdin, your program will block until data is actually available, for example until the user types a line of input and presses Enter (in the terminal's default line-buffered mode). Specifically, the kernel will put the process into the "sleeping" state until data is available on stdin. This is also the case for other types of file descriptors. For instance, if you try to read from a TCP socket, the read call will block until the other side of the connection actually sends data.
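To make this concrete, here is a small sketch assuming a POSIX system. It uses a pipe in place of stdin or a TCP socket so it can run without a terminal or a network peer; the helpers time_blocking_read and elapsed_ms are hypothetical names chosen for illustration. A child process sleeps and then writes one byte, while the parent sits in a plain (blocking) read call the whole time.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

/* Hypothetical helper: milliseconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end) {
    return (double)(end->tv_sec - start->tv_sec) * 1000.0 +
           (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/*
 * Hypothetical demo: a child process writes one byte into a pipe after
 * sleeping delay_ms; the parent calls read() on the read end, which is in
 * blocking mode by default, so the kernel puts the parent to sleep until
 * that byte arrives. Returns how long the parent's read() took in ms,
 * or -1.0 on error.
 */
double time_blocking_read(long delay_ms) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1.0;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1.0;
    }
    if (pid == 0) {
        /* Child: wait, then write a single byte and exit. */
        close(fds[0]);
        struct timespec delay = {.tv_sec = delay_ms / 1000,
                                 .tv_nsec = (delay_ms % 1000) * 1000000L};
        nanosleep(&delay, NULL);
        char c = 'x';
        ssize_t w = write(fds[1], &c, 1);
        _exit(w == 1 ? 0 : 1);
    }

    /* Parent: no O_NONBLOCK has been set, so this read() sleeps. */
    close(fds[1]);
    char buf;
    ssize_t n = read(fds[0], &buf, 1);
    clock_gettime(CLOCK_MONOTONIC, &end);

    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? elapsed_ms(&start, &end) : -1.0;
}
```

Calling time_blocking_read(100) should report roughly 100 ms or a bit more: the parent's read cannot return before the child has finished sleeping and written its byte, and during that time the parent does no other work.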
Blocking is a problem for programs that need to do several things concurrently, since a blocked process is suspended and can't make progress on anything else until the call returns. There are two different, complementary ways to solve this problem. They are:
Nonblocking mode
I/O multiplexing system calls, such as select and epoll
These two solutions are independent strategies for solving this problem, but they are often used together. In a moment we'll see how they differ and why they're commonly combined.
Nonblocking Mode (O_NONBLOCK)
A file descriptor is put into "nonblocking mode" by adding O_NONBLOCK to the set of fcntl flags on the file descriptor:
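/* set O_NONBLOCK on fd (fcntl and O_NONBLOCK are declared in <fcntl.h>) */
int flags = fcntl(fd, F_GETFL, 0);
if (flags != -1)  /* don't pass -1 | O_NONBLOCK to F_SETFL if F_GETFL failed */
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);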
From this point forward the file descriptor is considered nonblocking. I/O system calls like read and write that would otherwise block instead return -1 immediately, with errno set to EWOULDBLOCK or EAGAIN (POSIX permits either; on Linux the two constants have the same value).
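Here is a slightly fuller sketch of the same idea, assuming a POSIX system and again using pipes so it is self-contained. The helpers set_nonblocking, is_would_block, errno_from_empty_read and bytes_until_write_would_block are hypothetical names for illustration. The first demo shows read returning -1 right away on an empty pipe instead of sleeping; the second shows write doing the same once the pipe's kernel buffer is full.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to fd's file status flags.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Nonzero if err is a "would block" errno value.
 * POSIX allows either EAGAIN or EWOULDBLOCK; on Linux they are equal. */
int is_would_block(int err) {
    return err == EAGAIN || err == EWOULDBLOCK;
}

/* Demo: read() from an empty nonblocking pipe whose write end is still open.
 * Returns the errno value read() reported, or 0 if read() did not fail. */
int errno_from_empty_read(void) {
    int fds[2];
    if (pipe(fds) == -1)
        return errno;
    int result = 0;
    if (set_nonblocking(fds[0]) == -1) {
        result = errno;
    } else {
        char buf[16];
        if (read(fds[0], buf, sizeof buf) == -1)
            result = errno; /* saved before close() can overwrite errno */
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}

/* Demo: write() to a nonblocking pipe (with no reader draining it) until the
 * kernel buffer is full. Returns the number of bytes accepted before write()
 * failed with EAGAIN/EWOULDBLOCK, or -1 if some other error occurred. */
long bytes_until_write_would_block(void) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    long total = -1;
    if (set_nonblocking(fds[1]) == 0) {
        char chunk[4096] = {0};
        total = 0;
        for (;;) {
            ssize_t n = write(fds[1], chunk, sizeof chunk);
            if (n > 0) {
                total += n;
            } else {
                if (n == -1 && !is_would_block(errno))
                    total = -1;
                break;
            }
        }
    }
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

The sketch checks for both EAGAIN and EWOULDBLOCK because portable code can't assume which one a given system returns. The exact number of bytes the pipe accepts depends on the kernel's pipe buffer size, typically 65,536 bytes on Linux.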
This is interesting, but on its own it isn't that useful. With just this primitive there's no efficient way to do I/O on multiple file descriptors. For instance, suppose we have two file descriptors and want to read both of them at once. This could be accomplished by having a loop that checks each file descriptor for data, and then sleeps momentarily before checking again:
struct timespec sleep_interval = {.tv_sec = 0, .tv_nsec = 1000};
ssize_t nbytes;
for (;