Ephemeral port exhaustion is a resource starvation problem in which a machine can no longer open new outgoing TCP connections because it has run out of available connection slots. In our experience, this most often occurs on proxies.
TCP is the bedrock of web services—it forms the basis for all internode communication. Every TCP connection can be represented by a tuple of (source IP, source port, destination IP, destination port). For a given machine communicating with a single upstream host, three of these tuple elements (source IP, destination IP, and destination port) are fixed. This means the number of connections a single machine can make to a single web service is limited to the number of source ports it has available. On Linux, the source port for an outgoing connection is selected by the kernel from the ephemeral range.
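To make the tuple concrete, the following minimal sketch opens a few connections to a single listener on the loopback interface and collects each connection's 4-tuple. The loopback address and the helper name `open_connections` are illustrative assumptions, not part of any real service.

```python
import socket


def open_connections(count):
    """Open `count` loopback connections to one listener; return their 4-tuples."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(count)
    clients, tuples = [], []
    try:
        for _ in range(count):
            client = socket.create_connection(server.getsockname())
            clients.append(client)  # keep it open so its source port stays in use
            src_ip, src_port = client.getsockname()
            dst_ip, dst_port = client.getpeername()
            tuples.append((src_ip, src_port, dst_ip, dst_port))
    finally:
        for client in clients:
            client.close()
        server.close()
    return tuples


if __name__ == "__main__":
    for four_tuple in open_connections(3):
        print(four_tuple)
```

Only the second element, the kernel-chosen source port, differs between the tuples; the other three are identical for every connection.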
This is how you can find the currently configured ephemeral port range:
$ cat /proc/sys/net/ipv4/ip_local_port_range
With a range such as 32768 to 61000 (a long-standing Linux default), this allows about 28,000 simultaneous outgoing connections to a single destination. That sounds like plenty of headroom when designing high-scale web services: when would one ever have 28,000 active connections?
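Reading the range programmatically makes it easy to see how many ports it actually contains. This is a minimal sketch that assumes Linux's /proc interface; the helper names are illustrative, and the fallback value of 32768 to 61000 is an assumption used only when the file is unavailable.

```python
from pathlib import Path

RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text):
    """Parse the two whitespace-separated bounds the kernel reports."""
    low, high = (int(field) for field in text.split())
    return low, high


def port_count(low, high):
    """Number of ports in the range, treating both bounds as usable."""
    return high - low + 1


if __name__ == "__main__":
    # Fall back to an assumed range when not running on Linux.
    text = RANGE_FILE.read_text() if RANGE_FILE.exists() else "32768\t61000\n"
    low, high = parse_port_range(text)
    print(f"ephemeral range {low}-{high}: {port_count(low, high)} ports")
```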
TCP Connection Lifecycle
TCP connections pass through several states during their lifecycle. During the handshake, the initiating side goes SYN_SENT → ESTABLISHED and the accepting side goes SYN_RECV → ESTABLISHED. ESTABLISHED means the handshake has completed and the connection is open. At peak load, one should expect the number of ESTABLISHED TCP connections to be close to the maximum number of concurrent clients a service is able to handle.
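To see how a machine's sockets are distributed across these states, one option is to tally the state column of /proc/net/tcp. The sketch below assumes Linux and covers IPv4 sockets only; the function name `count_states` is illustrative.

```python
from collections import Counter
from pathlib import Path

# State codes used in the "st" column of /proc/net/tcp (hexadecimal).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from the contents of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    path = Path("/proc/net/tcp")  # IPv6 sockets are listed in /proc/net/tcp6
    if path.exists():
        for state, n in count_states(path.read_text()).most_common():
            print(f"{state:12} {n}")
```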
However, when a connection in the ESTABLISHED state is closed, the side that initiates the close holds the socket in TIME_WAIT for 60 seconds (this value is hardcoded in the Linux kernel as TCP_TIMEWAIT_LEN).
The TIME_WAIT state exists so that any delayed, out-of-order, or straggling packets from the old connection are ignored by the networking stack, which is only safe because the port is not handed to a new session in the meantime.
Doing a little math, a web service that opens a new connection per request needs a sustained load of only ~470 requests per second (about 28,000 ports divided by 60 seconds) to be in danger of exhausting the ephemeral ports available for outgoing connections.
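The estimate is simply the number of ephemeral ports divided by how long each closed connection stays in TIME_WAIT. A minimal sketch, assuming a range of 32768 to 61000 and one new connection per request; the function name is illustrative, and the hold time is a parameter so the estimate can be recomputed for whatever TIME_WAIT duration applies.

```python
def max_sustained_rate(port_count, time_wait_seconds):
    """New connections per second to one destination before ports run out.

    Each closed connection pins its source port for `time_wait_seconds`,
    so at a steady rate r there are r * time_wait_seconds ports tied up.
    """
    return port_count / time_wait_seconds


if __name__ == "__main__":
    ports = 61000 - 32768 + 1  # assumed ephemeral range, both bounds usable
    for hold_seconds in (60, 120):  # hold times to compare, in seconds
        rate = max_sustained_rate(ports, hold_seconds)
        print(f"TIME_WAIT {hold_seconds:>3}s: ~{rate:.0f} connections/second")
```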
Why Proxies Are Vulnerable
Proxies are particularly susceptible to ephemeral port exhaustion because all requests (from many clients) funnel through them. A proxy typically has more clients than upstream backends, which worsens the effect.
Consider a simple setup of 4 clients connecting to one proxy, which connects to two backend servers. The clients can collectively open four port ranges' worth of connections to the proxy, but the proxy can open only two port ranges' worth to the backends. You have effectively cut the maximum concurrency of the system in half (assuming the backend can service requests rapidly enough).
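The halving falls out of counting the distinct tuples available on each side of the proxy. The sketch below is a simplified model, not a measurement: it assumes every client, the proxy, and every backend has one IP, that the proxy and each backend listen on one port, and that all machines share the same ephemeral range. The function name is illustrative.

```python
def proxy_concurrency(clients, backends, ports_per_pair):
    """Return (inbound, outbound, limit) connection capacity for one proxy.

    inbound:  connections the clients can open to the proxy
    outbound: connections the proxy can open to the backends
    limit:    the smaller of the two, which caps end-to-end concurrency
    """
    inbound = clients * ports_per_pair
    outbound = backends * ports_per_pair
    return inbound, outbound, min(inbound, outbound)


if __name__ == "__main__":
    ports = 61000 - 32768 + 1  # assumed ephemeral range
    inbound, outbound, limit = proxy_concurrency(4, 2, ports)
    print(f"clients -> proxy:  {inbound}")
    print(f"proxy -> backends: {outbound}")
    print(f"effective limit:   {limit} ({limit / inbound:.0%} of inbound)")
```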
The “safest” solutions to ephemeral port exhaustion involve increasing the effective TCP connection address space. This can be done by increasing the number of destination IPs, binding the web service backend to multiple ports, increasing the ephemeral port range on the client in the kernel, or providing clients with multiple IPs (for example, by bonding multiple virtual NICs to a single physical NIC).
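Giving a client several source IPs only helps if the client actually spreads its connections across them. The following minimal sketch rotates through a list of source addresses; it assumes those addresses are already configured on the host, and the `make_connector` helper and its round-robin policy are illustrative choices rather than a standard API.

```python
import itertools
import socket


def make_connector(source_ips):
    """Return a connect(dest) function that rotates through `source_ips`."""
    rotation = itertools.cycle(source_ips)

    def connect(dest):
        source_ip = next(rotation)
        # Port 0 asks the kernel to pick an ephemeral port for this source IP,
        # so each source IP contributes its own full port range.
        return socket.create_connection(dest, source_address=(source_ip, 0))

    return connect


if __name__ == "__main__":
    # Demonstration against a local listener, using loopback as the only source.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    connect = make_connector(["127.0.0.1"])
    conn = connect(server.getsockname())
    print("connected from", conn.getsockname(), "to", conn.getpeername())
    conn.close()
    server.close()
```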
In practice, the simplest solution is to enable an “exotic” Linux TCP option called tcp_tw_reuse. This option enables the Linux kernel to reclaim a connection slot from a connection in TIME_WAIT state and reallocate it to a new connection.
The TIME_WAIT state of a TCP connection has the most value when network quality is low, the network is congested, or latency is unpredictable. On a private, internal datacenter network, most of these conditions are absent, which means it should be safe to enable tcp_tw_reuse. Enabling this option is higher leverage than increasing the ephemeral port range, because port numbers cannot exceed 65535, so even the widest possible range is only a little more than twice the size of the default.
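The option is exposed as the sysctl net.ipv4.tcp_tw_reuse, which on Linux can be read and written through /proc. The sketch below is illustrative only: the helper names are assumptions, writing the file requires root, and the change does not persist across reboots unless it is also added to the system's sysctl configuration.

```python
from pathlib import Path

TW_REUSE = Path("/proc/sys/net/ipv4/tcp_tw_reuse")

# Values documented for the net.ipv4.tcp_tw_reuse sysctl.
MEANINGS = {0: "disabled", 1: "enabled", 2: "enabled for loopback traffic only"}


def describe_tw_reuse(text):
    """Translate the raw file contents into a human-readable setting."""
    return MEANINGS.get(int(text.strip()), "unknown value")


def enable_tw_reuse(path=TW_REUSE):
    """Enable reuse; same effect as `sysctl -w net.ipv4.tcp_tw_reuse=1`."""
    path.write_text("1\n")  # requires root when pointed at the real /proc file


if __name__ == "__main__":
    if TW_REUSE.exists():
        print("tcp_tw_reuse is", describe_tw_reuse(TW_REUSE.read_text()))
    else:
        print("tcp_tw_reuse is not available on this system")
```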
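There is an even more exotic TCP option in the kernel called tcp_tw_recycle, which causes the kernel to prefer to reclaim connections in TIME_WAIT state over allocating new ones. It is known to break connections from clients behind NAT, and it was removed from the kernel in Linux 4.12.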