A Gotcha When Using ZooKeeper Ephemeral Nodes

Ephemeral nodes in Apache ZooKeeper are great for transient data: these znodes exist only as long as the session that created them is active. When the session ends, the znode is deleted.

One obvious use for ephemeral nodes is service discovery: while your services are running, they publish metadata about their location (hostname, port). That way clients of the service don't need to know the list of potential service addresses and can instead look up which services are available on demand. (This also allows you, if you wish, to remove load balancers that route requests to services in favor of client-side routing.) And when your service stops, its discovery information is automatically removed from ZooKeeper, and clients no longer see the service as available.

In Scala, my registration code looked like this:

try {
  zk.create(path, data, CreateMode.EPHEMERAL)
} catch {
  // The ephemeral node from the previous session hasn't been deleted yet.
  case e: NodeExistsException => zk.set(path, data)
}

Something weird was happening with this code. I would restart the service, and when the second case was triggered (the ephemeral node hadn't been deleted yet), the ZooKeeper node with my data would disappear after a little while, as if my session had expired!

Of course, the reason the node disappeared is obvious in hindsight: if you didn't create the ephemeral node in your session, your session doesn't own the node, and updating its data doesn't transfer ownership. So when the owning session expires, the node is deleted, no matter who wrote to it last. This situation becomes more common when you don't close your ZooKeeper session explicitly when you shut down your service, because you're then relying on the ZooKeeper server quorum itself to expire the session and its ephemeral nodes. If your service can restart faster than the session timeout (often around 30 seconds), you'll run into this situation.
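
One way to confirm who owns a node is to compare its ephemeralOwner stat field with your own session ID. The sketch below assumes the plain ZooKeeper Java client (org.apache.zookeeper.ZooKeeper) rather than whatever wrapper the snippets in this post use. The object and method names are hypothetical, made up for illustration.

```scala
import org.apache.zookeeper.ZooKeeper

// Hypothetical helper, not part of ZooKeeper itself.
object EphemeralOwnership {

  // ZooKeeper reports ephemeralOwner as 0 for non-ephemeral nodes,
  // so 0 never counts as "owned by this session".
  def ownedBy(ephemeralOwner: Long, sessionId: Long): Boolean =
    ephemeralOwner != 0L && ephemeralOwner == sessionId

  // True only if the node exists, is ephemeral, and was created
  // by the session behind `zk`.
  def isOwnedByThisSession(zk: ZooKeeper, path: String): Boolean = {
    val stat = zk.exists(path, false) // null when the node does not exist
    stat != null && ownedBy(stat.getEphemeralOwner, zk.getSessionId)
  }
}
```

In the scenario above, this check should return false right after the restart, because the node still belongs to the previous session.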

To avoid this, when your service starts, first delete the node if it exists (because you don't own it), then create it again so that your session owns it:

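try {
  // Remove the stale node left behind by the previous session, if any.
  zk.delete(path)
} catch {
  case e: NoNodeException => // nothing to delete
}

// Created in this session, so this session owns the node.
zk.create(path, data, CreateMode.EPHEMERAL)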
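
For reference, here is roughly how the same delete-then-create step might look against the plain ZooKeeper Java client, which takes an explicit version and ACL. This is a minimal sketch, not the code I actually ran: the object and method names are hypothetical, and the open ACL is only a placeholder. It also assumes that only one service instance registers under a given path, since the delete and the create are two separate calls.

```scala
import org.apache.zookeeper.{CreateMode, KeeperException, ZooDefs, ZooKeeper}

// Hypothetical helper, not part of ZooKeeper itself.
object EphemeralRegistration {

  // (Re)creates `path` as an ephemeral node owned by the session behind `zk`
  // and returns the path of the created node.
  def register(zk: ZooKeeper, path: String, data: Array[Byte]): String = {
    try {
      // Version -1 matches any version, so a stale node left behind by a
      // previous session is removed unconditionally.
      zk.delete(path, -1)
    } catch {
      case _: KeeperException.NoNodeException => () // nothing to clean up
    }
    // OPEN_ACL_UNSAFE is a placeholder; use ACLs that fit your deployment.
    zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL)
  }
}
```

If another process could create the same path between the two calls, create would still throw NodeExistsException, so callers with more than one writer per path would need to handle that case as well.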