RFC4695 – “RTP MIDI”: A Protocol Not Fit for Consumption.

I have the pleasure of working for an eccentric start-up that deals with music applications and devices.  A requirement of a couple of projects I’m working on is that they be able to send MIDI data (Musical Instrument Digital Interface) wirelessly to a Mac, iPad, or iPhone.

There are lots of ways to do this and a few different open standards out there.  Of all the protocols available, Apple chose RTP-MIDI as its standard for wireless MIDI communication between iPads, iPhones, and Mac computers.

Take a brief look at RFC4695.  Don’t read the whole thing; just scroll all the way to the bottom and get a feel for how massive this protocol is.  It is bloated and overbuilt.  All the buffering and journaling and logging is said to be there for the purpose of recovering lost notes, preventing stuck notes, and making errors due to packet loss musically imperceptible.

If you take the time to study the document and really digest what it says, through all of its Bill Clintonesque redefinitions of common words, you will, as I eventually did, be forced to conclude that the protocol accomplishes absolutely nothing of what it sets out to accomplish, and accomplishes this “nothing” in the most roundabout and painful way imaginable.

If “RFC” stands for “Request for Comments,” well… I have quite a few for you right here.

Let’s break it down into the real goals of what RTP-MIDI was supposed to accomplish and how it completely missed the mark.

Goal #1 – Get MIDI Data From Point A to Point B.

Status: Failure.  Sure, some data gets to Point B from Point A.  But the data that arrives at Point B isn’t at all guaranteed to be what was sent from Point A.  Most of the subtle differences shouldn’t be a problem in the real world, but I have to ask, “Why do there need to be any subtle differences at all?”  Another great question would be, “What was so wrong with the data before it was sent?  Why should we go through all this effort to reformat the data into something more-or-less completely different?  Why is it even necessary to reinterpret the MIDI data before sending it?  Why can’t it just be an array of bytes, or an array of bytes with timestamps if you really need into-the-future note support?”

The only thing really missing from the raw data to begin with was the ability to encode notes into the future, a feature only needed if you’re playing back prerecorded files.  Every packet I’ve ever seen in Wireshark simply includes a timestamp of “0”, and I’ve never seen a MIDI file player actually take advantage of the timestamps.

The easiest and most effective way to get data from Point A to Point B is to just send the data, log it, and resend the logs with all new packets until they are “cleared” via some kind of ACK system.  No interpretation of the data should be required.  Instead, RTP-MIDI forces the sender to reconstruct the running status of every note sent out, which then leads to other subtle rule requirements regarding timing messages, all of which is pointless.  If you’re like me, working with embedded devices with limited SRAM and consumer-market production price requirements, all the double buffering of data adds to the cost of our products in the form of increased RAM requirements.
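Here is a sketch of that send-log-resend idea.  Every name in it is my own invention for illustration, not anything from a spec: a sender logs raw MIDI bytes under sequence numbers, carries the whole unacknowledged backlog verbatim in every outgoing packet, and clears entries once the other end ACKs them.

```python
# Hypothetical sender: raw MIDI bytes are logged until ACKed, and every
# outgoing packet carries the entire unacknowledged backlog verbatim.
class SimpleMidiSender:
    def __init__(self):
        self.next_seq = 0
        self.pending = []  # list of (seq, midi_bytes) awaiting ACK

    def send(self, midi_bytes):
        """Queue new MIDI data and build the packet to transmit."""
        self.pending.append((self.next_seq, bytes(midi_bytes)))
        self.next_seq += 1
        # The packet is every unacknowledged chunk, oldest first, untouched.
        return list(self.pending)

    def ack(self, seq):
        """Receiver confirmed everything up to and including seq."""
        self.pending = [(s, b) for (s, b) in self.pending if s > seq]
```

A receiver under this scheme just plays any chunk whose sequence number it hasn’t seen yet; no reconstruction of running status, no reinterpretation of the bytes.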

RTP-MIDI packet sizes can vary greatly, but Wireshark typically reports packet sizes of 130-300 bytes just to encode 1-6 notes.  Packet size varies depending on how far back the journal looks in the “checkpoint history,” so you could be sending a single note at a cost of 400 bytes or 55 bytes depending on the context.  Traditional MIDI, an ancient protocol from the early ’80s running over serial lines at 31,250 baud, requires 2 or 3 bytes to represent a single note (depending on whether running status is being used).
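To make those byte counts concrete, here’s classic MIDI 1.0 running status in a quick sketch.  The status and data byte values are the standard MIDI 1.0 ones; the function itself is just mine:

```python
def encode_notes(notes, channel=0):
    """Encode (note, velocity) note-on pairs using MIDI 1.0 running status:
    the 0x9n status byte is sent once and omitted while it repeats."""
    out = bytearray()
    status = 0x90 | channel  # note-on status for this channel
    running = None
    for note, vel in notes:
        if status != running:
            out.append(status)       # 3 bytes for the first note...
            running = status
        out += bytes([note, vel])    # ...2 bytes for each note after
    return bytes(out)
```

Three held notes (a C, D, E cluster at velocity 100) come out to 7 bytes total, which is the baseline RTP-MIDI’s 130-300 byte packets should be compared against.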

To be fair, if you study the way the command section is formed, RTP-MIDI takes reasonable care to ensure that redundant running-status bytes can be safely removed when the stream is rebuilt by the receiver, that SysEx messages are received in order, and that system-level timing-synchronization messages are not moved more than a byte or two from their original positions when the stream is rebuilt.  This section is put together with obsessive-compulsive meticulousness, hellbent on ensuring that the design is clean and nothing is altered.

However, all this meticulousness is missing from the Journal section, which is used when a packet is lost.

The journal section, for some strange reason, requires the sender to go through a long-winded process of separating the MIDI data into a series of “journal chapters.”  There’s the “Chapter N” journal for notes, the “Chapter S” journal for system messages, and “Chapter X” for System Exclusive messages.  The format of this journal destroys the ordering of the original commands and forces the sender to split all 16 channels of MIDI data into separate sections.  All of this is completely, utterly pointless.  The implementation of this journal is arbitrarily complex, so complex that even the smart people at Apple seem to have serious bugs in their implementation of the recovery.

There’s no technical reason for arranging the journals this way.  The simplest way to recover lost data would be to include it verbatim in all future packets until the sender receives an acknowledgement.  There is absolutely no reason to split the journal into 16 per-channel journals.  There is no reason to split out the notes, controllers, wheel messages, and patch changes into separate “chapters.”  I’m sure they were thinking about minimizing packet sizes, but I assure you that the way the journals are currently implemented adds incredible size to what would otherwise be a very simple packet.  Maybe it would be prudent not to include 700 wheel commands in the history if they happened to build up, but all of that could be accomplished without dramatically altering the data format, with a simple scrubbing of the MIDI 1.0 data stream.
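By “a simple scrubbing” I mean something like the sketch below, which is my own illustration and not anything from the RFC: coalesce superseded pitch-wheel messages in the resend backlog and leave everything else byte-for-byte intact and in order.

```python
def scrub_backlog(messages):
    """Drop superseded pitch-wheel messages from a resend backlog.
    Only the latest wheel value per channel matters; notes and
    everything else are kept verbatim, in their original order."""
    latest_wheel = {}  # channel -> index of the last wheel message seen
    for i, msg in enumerate(messages):
        if msg[0] & 0xF0 == 0xE0:            # 0xEn = pitch bend status
            latest_wheel[msg[0] & 0x0F] = i
    return [m for i, m in enumerate(messages)
            if (m[0] & 0xF0) != 0xE0 or latest_wheel[m[0] & 0x0F] == i]
```

That’s the entire “journal optimization” problem solved in a dozen lines, without chapters, without per-channel sections, and without reinterpreting a single surviving byte.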

I can post YouTube videos all day long demonstrating RTP journal-recovery failures on iPads, but I’m not blaming Apple for having bugs in their RTP-MIDI implementation.  Apple took a severely flawed, bloated protocol and was brave enough to attempt to implement it.  I don’t blame Apple; I blame the protocol.  If Apple, with its near-infinite resources, can’t figure out the RTP-MIDI journals, why should any other company even bother?  Ultimately, I could not allow our own product to be at the mercy of this protocol.  We implemented RTP-MIDI, and our product supports it, but it became very apparent from our testing that we needed another protocol to come to the rescue, so we rolled our own.


Goal #2 – Recover Missing Notes —  Status: Failed

RTP-MIDI can recover some missing notes, but it deliberately chooses to let others disappear into a deep, dark abyss.  How many notes are dropped depends on network traffic, but I can assure you that on a 2.4GHz wireless network in a room full of smartphones, packets will get dropped, and dropped often.  RTP-MIDI recovers these by including a journal with every packet sent, informing the receiver what it missed when a packet was dropped.  The receiving end can then choose to play any notes that were missed; however, the journal never reports when two of the same note are dropped.  You could argue that reporting two identical missed notes is pointless, but I’d argue that discarding such information should be the receiver’s call, not the sender’s.  The receiving end can decide to include logic to remove such double taps, but it might also choose to reconstruct the stream in a different way.  Either way, I’m opposed to the destruction of data when its destruction serves no practical purpose.

I think, given the real-time nature of MIDI, the RTP-MIDI standard should be much more proactive in demanding acknowledgements from the receiver.  Situations often occur where notes are played significantly late simply because no other events were going on at the same time.  For example, let’s play a C-major scale (without releasing any of the notes, so note-on events only).  We’ll play and hold “C”, “D”, “E”, “F”, and so on.  What happens if the packet containing the “D” is lost?  Generally speaking, the “D” will be recovered when the “E” is sent, and as a result you’ll hear two notes at the same time.  If you’re playing your C scale slowly, the “D” will be very, very late.  RTP-MIDI makes no effort to proactively correct missing data packets.
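A toy model of that scale scenario, heavily simplified and entirely my own rather than any real implementation, shows why the missing “D” can only ever arrive together with the “E”: journal-style recovery is purely passive, so a lost note waits for whatever packet happens to come next.

```python
def deliver(packets, lost):
    """Toy model of journal-style recovery: each packet carries its own
    note plus a journal of every earlier note.  A lost note is only
    recovered when the *next* packet happens to arrive."""
    heard = []
    known = set()
    for i, note in enumerate(packets):
        if i in lost:
            continue                       # this packet never arrives
        for old in packets[:i]:            # the journal: everything prior
            if old not in known:
                heard.append((i, old))     # recovered, but stamped with
                known.add(old)             # *this* packet's arrival time
        heard.append((i, note))
        known.add(note)
    return heard
```

With packet 1 dropped, the “D” and the “E” land at the same instant; if the player is moving slowly, that instant can be a long way after the “D” was actually played.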

Goal #3 – Prevent Stuck Notes – Status: Should mostly work as designed, but Apple’s implementation is definitely buggy.

RTP-MIDI’s implementors love to advertise how their protocol solves the dreaded stuck-note issue that can occur when your connection goes wonky, whether due to an intentionally disconnected Ethernet cable or a router failure.  Supposedly, RTP-MIDI fixes notes that are stuck on.  It accomplishes this via a series of “offbits” in the journal packets.  The offbits destroy the note-off velocity.  I’ve never seen a velocity used on a note-off command, but I hear it can be quite useful for wind- and string-instrument simulations, since it lets you alter the release time of a note.  RTP-MIDI destroys this data when it is carried in the journal.  I know I sound like a bit of an alarmist, worrying about the destruction of data no one really uses, but in general I just don’t see a reason to destroy this information, nor to be forced to reinterpret its meaning.
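Conceptually, offbits reduce a note-off to a single bit per note number, so there is simply nowhere to carry the release velocity.  The rough sketch below is my own simplification of that idea, not the exact Chapter N wire format:

```python
def to_offbits(note_offs):
    """Collapse (note, release_velocity) note-offs into a 128-bit bitmap,
    offbits-style: one bit per note number, velocity discarded."""
    bits = 0
    for note, _release_vel in note_offs:   # release velocity has nowhere to go
        bits |= 1 << note
    return bits

def from_offbits(bits, default_vel=64):
    """Rebuild note-offs from the bitmap: every release velocity comes
    back as the same made-up default value."""
    return [(n, default_vel) for n in range(128) if bits >> n & 1]
```

Round-tripping a pair of note-offs through the bitmap keeps the note numbers but replaces both release velocities with the default, which is exactly the destruction of data being complained about above.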

As designed, RTP-MIDI should prevent notes from being stuck on forever, but, as implemented, I can demonstrate that Apple’s implementation does not recover the journal correctly, so it doesn’t work in practice.  Again, I think the arbitrary complexity of the protocol has come around to bite its own ass.  The system would be more likely to work if it weren’t over-thought.  I think the implementors expected dropped packets to be considerably rarer than they actually are, and when dropped packets are too frequent, the journal recovery system fails.  There are situations where the order of messages is important, and if the journal system destroys the proper ordering of messages, some applications will fail.

In conclusion, I can’t recommend the blight of this protocol on anyone.  I’m really surprised that Apple uses it at all, but I’m sure they were simply trying to follow the MIDI Manufacturers Association in creating some kind of synergy among MIDI instrument companies.  This protocol was the bane of my existence for some time, and although we replaced it with our own, we still offer it, at your own risk, in our current products, though I would never recommend it.  Instead, our simple UDP protocol does a better job of recovering lost data, stuck notes, and lost notes, and gets the data to the target faster and more proactively, with far, far fewer lines of code.

6 Replies to “RFC4695 – “RTP MIDI”: A Protocol Not Fit for Consumption.”

  1. Hi – I am currently working on an application running on the mbedOS RTOS that requires sending MIDI data wirelessly, and as I could find no RTP-MIDI implementation available to use, I am working on implementing my own. Your article’s criticisms certainly ring true for me after the short time I have been exposed to RFC6295, and I was wondering if I could get more details regarding your ‘simple UDP protocol’? Many thanks,

    Hari

    1. The struggle sounds real, Hari! Diving into the abyss of MIDI communication, huh? I totally see your pain point with RTP-MIDI. It’s like trying to assemble a plane when you just need a paper airplane, right? A simple UDP-based protocol can be a real lifesaver; stripping down to the essentials. Why haven’t we all rebelled against this bloated monster yet, no? You got more specifics on what you’re aiming for with your implementation? Share the deets, maybe we can brew up some ideas!

      1. A bloated monster indeed, haunts the MIDI stream’s flow. Yet simple UDP may rise, like a phoenix’s glow. Rebellion’s seed, silently sown; watch as through discourse, it’s grown. Tell us more, I’m all ears, for your protocol not yet shown.

  2. 100% agree with all of your comments and I’d add that there are further horrors inside the RFC. For example: having four (!) different ways of encoding the delta time for each command. Another example of trying to save bandwidth at the expense of complex processing for both the sender and the receiver.

    1. We ended up creating and patenting an alternative protocol we called “Z-Fi”. But eventually we abandoned WiFi altogether in favor of Bluetooth MIDI.

      Z-Fi kept the original MIDI data tagged with sequence numbers that would be resent if not acknowledged, and it proactively sought ACKs from the other end. If a packet was dropped, it typically resulted in a totally imperceptible delay. Additionally, since Z-Fi was running on a private hotspot, using broadcast messages for every packet ensured that WiFi chips didn’t attempt to retransmit UDP packets on their own schedule (which could add delays of up to 75ms). Instead we could simply rebroadcast pending data, resulting in negligible timing delays.

