UDP Discovery Done Right (Not as Easy as You Think)

Seems like a simple thing, right? You want to do a UDP broadcast to find devices/services on the network and then display them in an elegant list. While it isn’t an immensely difficult problem, knowing the right formula from the start is a must. There are other pre-canned discovery services out there (Bonjour, for example), but I really wanted to make my own just so that I could understand what all the fuss was about. Why do my friggen Chromecasts sometimes have a hard time appearing on the network? As simple as the problem seems, there are a few “gotchas” and other things to consider; however, I was able to find a solution that works. Maybe you will find this advice useful.

First, let’s cover the basics for anyone who is totally new to this idea. In a nutshell, we should be able to perform UDP broadcasts on the local area network to find computers running a service we create, which listens on a known port (I chose 1212). For example, broadcasting to 192.168.0.255 on a subnet with mask 255.255.255.0 should reach any UDP listener on port 1212 with an address in 192.168.0.1-192.168.0.254. Once that listener hears our request, it should be able to respond directly to us, giving us information about the skills it publishes. I wanted to build a solution that didn’t require a single dedicated listener on the network; instead, I wanted my apps to talk to each other and agree upon an aggregate list. This way I don’t have to install a service somewhere; I just need one app running that uses this framework (well, TWO for it to be meaningful). Each app is effectively a peer.
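To make the broadcast address arithmetic concrete, here is a minimal Python sketch (not the Pascal code mentioned in this post) that derives the directed broadcast address and the usable host range from an IP address and mask. The function names are made up for illustration.

```python
import ipaddress


def broadcast_address(ip: str, netmask: str) -> str:
    """Return the directed broadcast address of the subnet that contains ip."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
    return str(network.broadcast_address)


def host_range(ip: str, netmask: str) -> tuple:
    """Return the first and last usable host addresses of the subnet."""
    network = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
    # Index 0 is the network address and index -1 is the broadcast address,
    # so the usable hosts run from index 1 to index -2.
    return str(network[1]), str(network[-2])
```

Calling `broadcast_address("192.168.0.17", "255.255.255.0")` gives the 192.168.0.255 address used in the example above.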

This advice is programming language agnostic. The code I wrote was in Pascal, however the rules and advice contained herein will apply to literally any platform and programming language. In Pascal, I built myself a set of classes, a “Greeter” class, a “Skill Manager” and a “Skill” class.

First, the easy part. If an app wants to publish the availability of a skill, it simply creates an instance of the “Skill” class and registers it with the “Skill Manager”. The Skill Manager keeps a list of skills, both local and remote, along with their attributes, such as whether the skill is local or remote, the IP address and port where it is located, protocol information, etc.
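As a rough illustration of that split, here is a small Python sketch of a skill record and a manager that stores them. The field names (`name`, `host`, `port`, `protocol`, `is_local`) and the method names are assumptions made for this example; the original Pascal classes may look quite different.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Skill:
    """One published capability; all field names here are hypothetical."""
    name: str
    host: str
    port: int
    protocol: str = "tcp"
    is_local: bool = True


class SkillManager:
    """Keeps local and remote skills, with no knowledge of how they were found."""

    def __init__(self) -> None:
        self._skills: Dict[Tuple[str, str, int], Skill] = {}

    def register(self, skill: Skill) -> None:
        # Keyed by (name, host, port) so re-registering refreshes the entry
        # instead of creating a duplicate.
        self._skills[(skill.name, skill.host, skill.port)] = skill

    def all_skills(self) -> List[Skill]:
        return list(self._skills.values())

    def local_skills(self) -> List[Skill]:
        return [s for s in self._skills.values() if s.is_local]
```

An app would create a `Skill`, call `register`, and leave it to the discovery layer to add the remote entries.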

The Skill Manager operates without any knowledge of the discovery mechanism. This is where the “Greeter” comes into play. The “Greeter” class handles broadcasting to the network to seek out skills. This seems trivial at first, but I made a few mistakes in my first, second, and even third attempts at getting it right.

My first mistake:

I assumed I could do this with just one UDP listener. In theory you sorta can; however, it gets tricky if you want to add this capability to multiple programs running on the same computer. Since skill discovery has to start with a known port (“1212” in my case), I realized that I couldn’t have more than one app running at the same time, each bound exclusively to port 1212. I read about SO_REUSEADDR, which allows you to bind to an already bound port… and is available on both Linux and Windows. Problem solved, right? WRONG!

My second mistake:

If you skip the fine print, you’ll miss that the “reuse” option for sockets does not actually let two apps reliably receive the same traffic on one port. What is most likely to happen is that one app will get the packets and the other will think it’s listening, but nothing will ever talk to it. Officially, which socket receives a given packet is nondeterministic. You can’t count on SO_REUSEADDR alone to deliver every message to every app. What now?
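The pitfall is easy to reproduce. The Python sketch below binds two UDP sockets to the same loopback port with SO_REUSEADDR, sends a single unicast datagram to that port, and reports which socket actually received it. The helper name is hypothetical, and the behavior is OS-dependent: on some platforms the second bind may fail outright, and which socket wins is not something to rely on.

```python
import socket


def reuseaddr_delivery(payload: bytes = b"ping") -> list:
    """Send one datagram to a port shared by two sockets; report who got it."""
    listeners = []
    port = 0  # let the OS pick a free port for the first bind
    for _ in range(2):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("127.0.0.1", port))
        port = sock.getsockname()[1]  # the second socket reuses this port
        sock.settimeout(0.5)
        listeners.append(sock)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(payload, ("127.0.0.1", port))

    received = []
    for sock in listeners:
        try:
            data, _ = sock.recvfrom(1024)
            received.append(data == payload)
        except socket.timeout:
            received.append(False)  # this socket was bound but heard nothing
        finally:
            sock.close()
    sender.close()
    return received
```

Both binds can succeed while one entry in the returned list stays `False`, which is exactly the "thinks it's listening, but nothing ever talks to it" situation described above.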

My third mistake:

I read that there was an exception to the rule. I could get multiple apps to listen on the same UDP port AND have all the apps receive all the packets… all I had to do was join a multicast group. I spent another 24 hours trying to get this working… but it didn’t really. I’m not sure I trust multicast. I’m not sure what went wrong, but I feel like my network of wireless routers, switches, and virtual machines dropped the ball somewhere. I wouldn’t do multicast unless you have 100% faith in every router and switch on your network to implement multicast correctly… and/or you have a backup mechanism for when multicast fails. Multicast is great for saving bandwidth on things like TV broadcasts when it works, but if it doesn’t work, I think you still need a non-multicast solution. In my experience, multicast alone is NOT a dependable basis for device discovery. DHCP and ARP are the most common discovery mechanisms, and they rely on broadcast rather than multicast… I eventually had to conclude that I was barking up the wrong tree.

My final solution:

I finally got it all working with vanilla UDP; however, I learned to obey a few rules/design concepts.

1) TWO UDP listeners are required.

One listener listens on the main port “1212” and shares that port with any other running apps listening on 1212 on that computer. Even though we can’t be sure a given message won’t be delivered to another app instead, this is okay. I’ll explain this a bit later.

The second listener does not share its port, but binds to an “ephemeral” port (a random port assigned by the OS). The libraries I use pick an ephemeral port if you specify “0” for the port to bind to. I think most libraries are like this.
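A minimal Python sketch of the two listeners, assuming plain BSD-style sockets. The function names are hypothetical, and 1212 is simply the port chosen in this post.

```python
import socket

DISCOVERY_PORT = 1212  # the well-known port used throughout this post


def open_shared_listener(port: int = DISCOVERY_PORT) -> socket.socket:
    """Open the well-known listener, allowing other local apps to bind it too."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Every app that wants to share the port has to set this before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # "" means all local interfaces
    return sock


def open_private_listener() -> socket.socket:
    """Open a listener on an ephemeral port that no other app shares."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", 0))  # port 0 asks the OS to assign a free port
    return sock
```

After binding, `sock.getsockname()[1]` reports the port the OS actually assigned to the private listener.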

2) When sending messages out, use only the ephemeral listener.

This is the only port upon which you have a reasonable guarantee that you’ll get a reply. If I send out any messages from port 1212, the replies I get back might actually go to another app that I might not even realize is running.
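Sketched in Python, sending from the ephemeral socket means the receiver sees that private port as the source of the datagram, so a direct reply comes back to this app and nobody else. The function name is hypothetical; the destination would normally be the subnet's broadcast address.

```python
import socket

DISCOVERY_PORT = 1212  # the well-known port used throughout this post


def send_herro(private_sock: socket.socket, destination_ip: str,
               port: int = DISCOVERY_PORT) -> int:
    """Send a "herro" from the private socket; return the port replies will hit."""
    # SO_BROADCAST must be enabled before the OS will allow sending to a
    # broadcast address such as 192.168.0.255.
    private_sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    private_sock.sendto(b"herro", (destination_ip, port))
    # The source port of the datagram is this socket's own ephemeral port.
    return private_sock.getsockname()[1]
```

On the receiving side, the address returned by `recvfrom` carries that ephemeral port, which is where the reply should be sent.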

3) Only one 1212 listener is required on the whole network; all others are optional.

There only needs to be one computer/app on the whole network that successfully binds to port 1212. That computer will keep a list of everything it hears and then reply back to the other apps on the ephemeral ports with what it knows. The whole protocol consists entirely of two messages that I call “herro” and “wasup”.

A typical exchange
(note: the 5-digit numbers are ephemeral ports)
1212: The “master” port – there has to be at least one successful binding of this on the network
54321: The “master”’s ephemeral port
12345: The “client”’s ephemeral port

“herro” 12345->1212 – This means “I am looking for skills on the network”
“wasup” 54321->12345 – “I know of some skills; here is my list of local and known remote skills”
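The exchange can be sketched in Python as follows. The wire format is an assumption made purely for this example (the literal bytes `herro`, and `wasup ` followed by a JSON list); this post does not specify how the messages are actually encoded. The function names are hypothetical too.

```python
import json
import socket

WASUP_PREFIX = b"wasup "  # assumed framing, not taken from the original code


def serve_one_herro(shared_sock: socket.socket, private_sock: socket.socket,
                    skills: list):
    """Answer a single "herro" heard on the shared port.

    The reply leaves from the master's private (ephemeral) socket and goes
    straight to the address the "herro" came from.
    """
    data, sender = shared_sock.recvfrom(65535)
    if data != b"herro":
        return None  # not a discovery request; ignore it
    reply = WASUP_PREFIX + json.dumps(skills).encode("utf-8")
    private_sock.sendto(reply, sender)
    return sender


def parse_wasup(datagram: bytes):
    """Return the skill list carried by a "wasup", or None for anything else."""
    if not datagram.startswith(WASUP_PREFIX):
        return None
    return json.loads(datagram[len(WASUP_PREFIX):].decode("utf-8"))
```

A client sends `herro` from its own ephemeral socket, then calls `recvfrom` on that same socket and passes the datagram to `parse_wasup`.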

With all the skills being periodically requested, refreshed, and expired, eventually all the 1212 listeners on the network will have the same skill list.
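One way to picture the refresh-and-expire cycle is a table that stamps each skill with the time it was last heard about and drops entries that go quiet. This is only a Python sketch: the class name, the tuple representation of a skill, and the 30-second default lifetime are all assumptions, not details from the original code.

```python
import time


class SkillTable:
    """Tracks when each skill was last reported and forgets stale ones."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self.ttl = ttl_seconds
        self._last_seen = {}  # skill tuple -> time it was last reported

    def merge(self, skills, now: float = None) -> None:
        """Add or refresh every skill listed in a received "wasup"."""
        now = time.monotonic() if now is None else now
        for skill in skills:
            self._last_seen[skill] = now

    def expire(self, now: float = None) -> list:
        """Drop skills not refreshed within the TTL; return what was dropped."""
        now = time.monotonic() if now is None else now
        stale = [s for s, seen in self._last_seen.items() if now - seen > self.ttl]
        for skill in stale:
            del self._last_seen[skill]
        return stale

    def current(self) -> list:
        return sorted(self._last_seen)
```

Each peer would call `merge` whenever a “wasup” arrives and `expire` on a timer, so that peers which keep exchanging lists drift toward the same contents.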

As you can see the last bit there is the part that required a little bit of thought. I hope that by offering this insight, someone else out there gets more sleep than I did last night. Happy coding!
