Why XML is Garbage

XML, the ugly cousin of the uglier HTML (both descend from SGML), is pervasive in computing these days. Since the beginning of time, my bosses have tried to get me to adopt XML behind the scenes in my designs, and since the beginning I’ve been defiant. So what’s wrong with XML? Why do I hate it so much?

Well, I think a better question is: “What’s right with XML?” In the opinion of this programmer, very little. But I suppose I’ll throw it a few bones.

<douches count="3">
<douche firstname="Joseph" lastname="Stalin"/>
<douche firstname="Adolph" lastname="Hitler"/>
<douche firstname="Donald" lastname="Trump"/>
</douches>


The Good

  • XML is human-readable, and just about any idiot can understand what it means. Then again, we all hope we don’t have to work with an idiot in the next cubicle.
  • It is hierarchical.
  • That’s all.  There’s nothing else good about XML.

The Bad

  • Expensive to Parse. XML is expensive to parse and not at all easy to address. Back in the early days I developed my own “dot notation” for addressing parts of XML documents quickly and easily, similar to some of the techniques people use today, but I also wondered why the hell a kid in a cubicle in a Minneapolis suburb had to invent something I thought a monkey should have come up with already. It was common sense, and there’s not enough of that in an open source community plagued by wannabe programmers. Have you ever tried parsing a giant XML document, for example the infamous OpenStreetMap planet file (planet.osm), which contains a fully annotated, community-generated map of the entire Earth? Try it… then come crying to me and tell me I’m right.
  • “AMP Encoding”. XML repurposes useful ASCII/ANSI/Unicode characters (“<”, “>”, “&” and quotes) as markup, which makes encoding those characters in content difficult. Maybe “difficult” is too strong a word, but we’ve all bumped into situations where the idiot in the next cubicle builds a document whose body contains these special characters and forgets to encode them properly. The fact remains that there are plenty of unused characters in the ASCII chart, and even more in the Unicode space, so why didn’t they use those instead? Traditionally (long ago), markup languages used the ESC character to denote out-of-band formatting data and commands. In my opinion this is more technically sound, because in text data no one ever uses ASCII 27 for anything… ever. Under those old systems, unless you were slipping truly binary data into the stream, you’d basically never have to encode anything. Again… it’s called “common sense”.
  • Using XML in a read-write, random-access fashion is prohibitively expensive. XML isn’t really designed for that, nor is it ever used that way. But why shouldn’t you want to use it that way?
  • Binary data encoding has no real universal standard. There are sorta-standards (stuffing base64 into a text node is the usual one), but not really. Encoding of binary data should be specified and handled in a universally useful way.
  • XML is “chatty”: full of redundancy. If you have 50,000 “user” tags in your document, you’ll probably have a document that contains the word “user” at least 50,000 times. In most cases its chattiness is acceptable, but as a result it is not universally useful.
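To make the “AMP encoding” complaint concrete, here is a small Python sketch using only the standard library (xml.etree.ElementTree and xml.sax.saxutils.escape). The sample string and the parses() helper are made up for illustration: the hand-built document that forgets to escape “&” and “<” is rejected by the parser, while the escaped one round-trips fine.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical body text containing XML's reserved characters.
body = 'Fish & Chips <cheap>'

# The "forgot to encode it" bug: plain string concatenation.
naive = '<note>' + body + '</note>'

# Escaping by hand; escape() replaces &, < and > with entity references.
escaped = '<note>' + escape(body) + '</note>'

def parses(doc: str) -> bool:
    """Return True if the text is well-formed enough for ElementTree."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(parses(naive))    # False
print(parses(escaped))  # True
print(escaped)          # <note>Fish &amp; Chips &lt;cheap&gt;</note>
```

Parsing the escaped version gives back the original text, so the data survives; the point is only that someone has to remember to do it.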

If not XML, then what?  What would be better?

Honestly, just about anything would be better than XML. I resisted transporting data in XML format because I already had a robust suite of code that could efficiently marshal and transport literally any kind of data, and it was less prone to encoding errors than XML ever was or ever would be. A simple protocol using binary header blocks is pretty easy for any beginner programmer to deal with these days. But even a much simpler approach might be more powerful than XML while still being painfully simple to understand…
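The post doesn’t describe that protocol, so here is a hypothetical sketch of what “binary header blocks” could look like: every field gets a 5-byte header (a 1-byte type tag plus a 4-byte big-endian length) followed by the raw payload. The names HEADER, TYPE_TEXT, TYPE_BLOB, pack_field and unpack_fields are my own assumptions, not the author’s code. Because the reader always knows exactly how many bytes to take, nothing in the payload ever needs escaping.

```python
import struct

# Hypothetical header layout: 1-byte type tag + 4-byte unsigned big-endian length.
HEADER = struct.Struct('>BI')
TYPE_TEXT = 1
TYPE_BLOB = 2

def pack_field(type_tag: int, payload: bytes) -> bytes:
    """Prefix the payload with its header; the payload is copied verbatim."""
    return HEADER.pack(type_tag, len(payload)) + payload

def unpack_fields(buf: bytes) -> list[tuple[int, bytes]]:
    """Walk the buffer header by header; no escaping or delimiter scanning."""
    fields = []
    offset = 0
    while offset < len(buf):
        # struct.error is raised here if fewer than 5 header bytes remain.
        type_tag, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        payload = buf[offset:offset + length]
        if len(payload) != length:
            raise ValueError('truncated field')
        fields.append((type_tag, payload))
        offset += length
    return fields

# One text field and one field holding every possible byte value.
message = (pack_field(TYPE_TEXT, 'Joseph'.encode('utf-8'))
           + pack_field(TYPE_BLOB, bytes(range(256))))
```

Unpacking message returns exactly the same two payloads, including all 256 byte values, which is the “binary data goes in perfect and comes out perfect” property. A real protocol would also want versioning and a maximum field size, which this sketch leaves out.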

Let’s start with this painfully simple XML document:

<users count="3">
<user firstname="Joseph" lastname="Stalin"/>
<user firstname="Adolph" lastname="Hitler"/>
<user firstname="Donald" lastname="Trump"/>
</users>

Yes, the above document is simple to understand, but it has all the bad things that I listed earlier.

What if we simply stored all this information in name-value pairs?

Users.Count=3
User#0.FirstName=Joseph
User#0.LastName=Stalin
User#1.FirstName=Adolph
User#1.LastName=Hitler
User#2.FirstName=Donald
User#2.LastName=Trump
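Reading that format back takes only a few lines. Here is a minimal Python sketch, assuming one pair per line and that keys never contain “=” (values may, because only the first “=” splits the line). DOC, parse_pairs and load_users are illustrative names, not part of any existing library.

```python
DOC = """\
Users.Count=3
User#0.FirstName=Joseph
User#0.LastName=Stalin
User#1.FirstName=Adolph
User#1.LastName=Hitler
User#2.FirstName=Donald
User#2.LastName=Trump
"""

def parse_pairs(text: str) -> dict[str, str]:
    """Split each line on its first '=' into a key and a value."""
    pairs: dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            continue  # tolerate blank lines
        key, sep, value = line.partition('=')
        if not sep:
            raise ValueError(f'missing "=" in line: {line!r}')
        pairs[key] = value  # a repeated key overwrites the earlier value
    return pairs

def load_users(pairs: dict[str, str]) -> list[dict[str, str]]:
    """Rebuild the user records using the Users.Count / User#n.Field keys."""
    count = int(pairs['Users.Count'])
    return [
        {
            'FirstName': pairs[f'User#{i}.FirstName'],
            'LastName': pairs[f'User#{i}.LastName'],
        }
        for i in range(count)
    ]

users = load_users(parse_pairs(DOC))
```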

Yes, the above formatting is a smidge chattier than the XML, however it has several BIG advantages.

1. It is sortable.
2. Since it is sortable, it is searchable using a binary search and therefore lightning fast.
3. It is random-accessible. You can easily find the value you’re looking for, and you can also easily add new values into the middle or the end of the document without problems, particularly if you load it as a string list, dictionary, or whatever kind of string array you like.
4. It is still pretty darn easy to understand.
5. It has no problem encoding characters like “<”, “>” and the double quote. The only reserved characters are “=” in the key names and CR/LF in the values.
6. There is only one place for data, making it less confusing. XML has two places for data: in the content (“body”) of an element or in an “attribute” of the element.
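Points 1 through 3 can be sketched with Python’s standard bisect module: sort the pairs by key once, then look keys up with a binary search and insert new ones in place without reparsing anything. SAMPLE, build_index, lookup and insert_pair are hypothetical helpers written for this illustration.

```python
from bisect import bisect_left
from typing import Optional

SAMPLE = [
    'Users.Count=3',
    'User#0.FirstName=Joseph',
    'User#0.LastName=Stalin',
    'User#1.FirstName=Adolph',
    'User#1.LastName=Hitler',
    'User#2.FirstName=Donald',
    'User#2.LastName=Trump',
]

def build_index(lines: list[str]) -> tuple[list[str], list[str]]:
    """Sort key=value lines once; return parallel key and value lists."""
    pairs = []
    for line in lines:
        key, sep, value = line.partition('=')
        if sep:  # skip blank or malformed lines
            pairs.append((key, value))
    pairs.sort()
    return [k for k, _ in pairs], [v for _, v in pairs]

def lookup(keys: list[str], values: list[str], key: str) -> Optional[str]:
    """Binary search, O(log n) comparisons; None if the key is absent."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None

def insert_pair(keys: list[str], values: list[str], key: str, value: str) -> None:
    """Insert a new pair (or overwrite an existing key) keeping keys sorted."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        values[i] = value
    else:
        keys.insert(i, key)
        values.insert(i, value)

keys, values = build_index(SAMPLE)
```

One caveat of plain string sorting: User#10 would sort between User#1 and User#2, so the order is lexicographic rather than numeric. That doesn’t matter for exact-key lookups, which is what the binary search is for. Also note that list.insert still shifts the elements after the insertion point, so finding the spot is O(log n) but the insert itself moves O(n) entries.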

*Poof* *Mind Blown*

All brashness aside, the use of name-value pairs hardly constitutes out-of-the-box thinking, nor is it the most useful and innovative thing that I or a team of engineers could come up with, but that just further emphasizes my point that XML is a garbage format. I swear this planet is doomed to blindly follow bad standards, and it, frankly, drives this programmer mad!

4 Replies to “Why XML is Garbage”

  1. 1) Having to declare the array size means that one new entry at the end of the document requires a change at the beginning. It’s faster for the reader if it can allocate the memory once and read into it, but I’m skeptical of the benefits when reading in abstract types.
    2) People suck at counting huge lists.
    3) People suck at pluralization, so pluralization rules in the data format will be confusing.
    4) What will it look like when it’s a huge multi-layered list of lists? Will that be sortable? What about serialization of abstract types? How will that work, and will reading in an unsorted list work? Because you need to know the type before you allocate the memory.
    5) The hierarchical nature of XML, combined with text editors that support code folding, is very useful. What would a syntax highlighter look like for this?

    1. Well… I only put that example in there to force people to think outside the box a little bit. And again… who is reading this? A human? I think not. This output is generated by a computer and read by a computer, just as XML generally is. As for being multi-layered and containing complex types, well… again, any number of methods could be used to express those types. I pulled this example out of my ass simply to show that I don’t think XML is all that deserving of its place in the world. And regarding how people query databases… don’t get me started on that… I’ll be blogging all night about it….
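One way to answer the “list of lists” question, sketched in Python under my own assumptions: flatten nested dicts and lists recursively, using “.” for named children, “#n” for list positions, and a “.Count” entry for each list. Unlike the post’s example, the same name is used for the list and its items (Users.Count, Users#0.FirstName), which also sidesteps the pluralization complaint. flatten is a hypothetical helper, and it expects a dict at the top level.

```python
from typing import Any, Iterator

def flatten(value: Any, prefix: str = '') -> Iterator[tuple[str, str]]:
    """Yield (key, text) pairs for nested dicts/lists; expects a dict at the top."""
    if isinstance(value, dict):
        for name, child in value.items():
            # Named child: join the path with '.'
            yield from flatten(child, f'{prefix}.{name}' if prefix else name)
    elif isinstance(value, list):
        # Record the list length, then index each item with '#n'.
        yield f'{prefix}.Count', str(len(value))
        for i, child in enumerate(value):
            yield from flatten(child, f'{prefix}#{i}')
    else:
        # Scalar leaf: stored as plain text, so its original type is lost.
        yield prefix, str(value)

data = {
    'Users': [{'FirstName': 'Joseph', 'LastName': 'Stalin'}],
    'Matrix': [[1, 2], [3]],
}
lines = sorted(f'{key}={text}' for key, text in flatten(data))
```

Every leaf becomes a key=value line, so the output can still be sorted and binary-searched; a list nested inside a list just gets a second index (Matrix#0#1). Types are not preserved, since every value comes back as a string, so abstract types would need an extra convention on top of this.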

  2. I disagree profoundly on a few of those notions.
    Let’s start from what I think is right: it is true you should NOT be using XML as a data store. That is simply wrong. Also, there are occasions where XML isn’t best suited as a transport and superior alternatives exist, for example on mobile (with exceptions…), because it’s true that XML is expensive to parse on such CPUs.

    Let me, however, show you an example where XML does not just make sense, but is the only reasonable technology to use. There are some such cases in various industries, but I will stick to the one I know best: healthcare.

    Suppose you’re a nurse needing to administer medication.
    You have a smartphone and fetch the medications for a certain floor.
    Your XML may look like this:

    You will have several patients with several drugs.
    XML lets you – locally – do a number of checks:

    1) Check that your tray is correct
    2) Know how many patients will be treated
    3) Know how many mg you need of all the drugs
    Plus a lot more information. There is no way around it; it’s just about the best possible way to specify those things and use XQuery to gain local insight. This is but one example; there would be more if I wanted to dig (for example, say you have several patients waiting in an optical practice for exams or products: again, such a format would make product dispatch and exams a lot faster).

    1. Certainly a reasonable programmer could design any number of systems to transfer and express data to do what you suggest. XML is just one solution that happened to get widely adopted. I’m generally not scared to design and build something new and innovative. In the system you describe, your nurse obviously isn’t going to be reading an XML document… it will be generated by a computer and read by a computer, then formatted so that the nurse can read it… therefore, the manner of transport is completely up to you. In the past I’ve built designs with transport protocols that were robust and hierarchical and simply manifested themselves as a group of objects once unmarshalled… that’s kinda what people do with XML and JSON these days anyway… so why use an inferior and expensive protocol to transport your data? When the micromanaging CTO came to me demanding to see XML for the data of various requests, all I had to do was write a little function to convert my protocol to XML…. job done. I never used the XML directly, because that would have been a big waste of computing resources. My protocol supported all the major primitive types and object hierarchies, could be extended to represent complex types as if they were new primitive types… and binary data always went in perfect and came out perfect.
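The author’s binary protocol isn’t shown, so as a stand-in, here is a sketch of that “little function to convert my protocol to XML” using the post’s own key=value users format and Python’s xml.etree.ElementTree. users_to_xml and the sample pairs are illustrative assumptions. ElementTree escapes attribute values itself, so the export side never has to think about “AMP encoding”.

```python
import xml.etree.ElementTree as ET

def users_to_xml(pairs: dict[str, str]) -> str:
    """Export the flat users pairs as <users><user .../></users> XML."""
    count = int(pairs['Users.Count'])
    root = ET.Element('users', count=str(count))
    for i in range(count):
        # One empty <user> element per record, with the data as attributes.
        ET.SubElement(root, 'user',
                      firstname=pairs[f'User#{i}.FirstName'],
                      lastname=pairs[f'User#{i}.LastName'])
    # encoding='unicode' makes tostring() return str instead of bytes.
    return ET.tostring(root, encoding='unicode')

# Hypothetical input; the second record deliberately contains &, < and ".
pairs = {
    'Users.Count': '2',
    'User#0.FirstName': 'Joseph',
    'User#0.LastName': 'Stalin',
    'User#1.FirstName': 'Fish & <Chips>',
    'User#1.LastName': 'say "hi"',
}
xml_text = users_to_xml(pairs)
```

The XML only exists at the edge for whoever asks for it; internally the data stays in whatever format is cheapest to work with.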
