Don’t use USB 3.0 disks with Windows unless you’re okay with total failure.

USB 3.0 in Windows Server 2012 R2 (and presumably many other flavors of Windows) is utterly broken. It has been broken since USB 3.0 came to market, and despite what the Microsoft Core USB engineering blog will tell you, it is still broken in the worst way possible. If you haven’t read it, check out the very active post on that blog from the Microsoft core team, but ignore the part at the top where they claim that all the crap you have to go through to work around the problem “is no longer required.” I assure you, they still have a long way to go before I’ll certify USB 3.0 as stable on Windows.

Since USB 3.0 is fast enough to compete with SATA, I figured… why not expand my server with external backup drives? It was supposed to be simpler than cracking the case open and putting SATA drives into the drawers: hot-pluggable, with zero downtime.

Well… now that I’ve learned my lesson, I’ll tell you “why not”… and also offer you the only viable workaround I’ve found (which is not the one M$ suggests).

Anyway… long story short, because I’m too exhausted to write a long blog post tonight… here’s what happened after I hooked up my first dual-drive Vantec USB 3.0 disk dock, loaded with a pair of 3 TB hard drives.

On the surface, everything seemed fine… and it stayed that way for a long time. But every once in a while I would run into a glitch. The source wasn’t immediately apparent, so I rebooted or whatever and went about my business. Was it the OS? The RAM? The shitty CPU? Improper cooling? I didn’t know. I even underclocked my CPU for a while because I thought it might be the source of the trouble.

Eventually I wanted to add more disks, because… because. I was working on disk array software, and I foolishly made my Hyper-V host my guinea pig… bad idea. In the end I had a total of 11 disks attached over USB 3.0… and that’s when the frequency and severity of my problems became obvious.

USB 3.0 disks on Windows Server 2012 have terrible power management. Part of this might be Microsoft’s fault, but I also blame the hardware manufacturers… or maybe the chipset makers… truthfully, I’m not sure.

Basically, when a USB disk wakes up from sleep, there’s a good chance every device on the entire bus disappears at the same time and then (if you’re lucky) reappears. So if one of my USB 3.0 disks decided to wake up after going to sleep, all 11 hard disks would drop out at once: handles would become invalid, and my RAID arrays would lock up or even begin to corrupt themselves. I started losing data, including entire VHDs and VMs. It got really ugly. Eventually I narrowed it down and managed to reproduce the problem reliably (as opposed to randomly after 8 hours of intense file copying).
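If you want to check whether you’re seeing the same all-at-once bus drop, a small poller that logs when disk roots vanish and come back can help. This is a hypothetical diagnostic in Python, not how the problem was reproduced here. The 1-second interval and the drive-letter roots are assumptions. Polling can miss a drop shorter than the interval, and open handles can go invalid even after the drive letter returns, so a clean log doesn’t prove the bus is healthy.

```python
import os
import time
from datetime import datetime


def present(roots):
    """Return the set of roots (drive letters or mount points) that currently exist."""
    return {r for r in roots if os.path.exists(r)}


def diff(before, after):
    """Return (vanished, reappeared) as sorted lists between two presence snapshots."""
    return sorted(before - after), sorted(after - before)


def watch(roots, interval_s=1.0, rounds=None):
    """Poll the roots every interval_s seconds and log changes; rounds=None runs forever."""
    before = present(roots)
    n = 0
    while rounds is None or n < rounds:
        time.sleep(interval_s)
        after = present(roots)
        vanished, back = diff(before, after)
        stamp = datetime.now().isoformat(timespec="seconds")
        if vanished:
            # Several disks vanishing in the same poll points at a bus-wide drop,
            # not a single failing drive.
            print(f"{stamp} vanished: {vanished}")
        if back:
            print(f"{stamp} reappeared: {back}")
        before = after
        n += 1
```

You’d run something like `watch(["E:\\", "F:\\", "G:\\"])` while the disks idle and wake. The signature to look for is every USB disk showing up in the same “vanished” line.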

If you’re having this problem, all the suggestions M$ has for you on their blog are bullshit. They may “help”, but my disk docks put the drives to sleep regardless of what I set my power settings to or what the registry says. The only viable option I found was to write myself a service that maintains a “nosleep” file, using unbuffered I/O, on every disk I want to keep alive. Because the I/O is unbuffered, it actually reaches the disk instead of being satisfied from the cache, so the drives never sit idle long enough for the dock to put them to sleep. I then built the service into the iSCSI server I wrote in Delphi… and now my iSCSI server never crashes, my disks never disappear, and everything works peachy.
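Here’s a minimal sketch of the keep-alive idea in Python. It is not the Delphi service from this post. The file name `nosleep.dat`, the 30-second interval, and the drive letters are hypothetical, and the interval has to be shorter than whatever idle timeout your dock uses. The sketch also swaps in a substitute technique. Instead of true Windows unbuffered I/O (opening the file with `FILE_FLAG_NO_BUFFERING`, which requires sector-aligned buffers), it rewrites a small file and calls `os.fsync`, which on Windows flushes the file through to the device. That way the write can’t just sit in the OS cache.

```python
import os
import time
from pathlib import Path

# Hypothetical name: the post calls it a "nosleep" file but doesn't give the exact name.
NOSLEEP_NAME = "nosleep.dat"


def touch_nosleep(root):
    """Rewrite the keep-alive file on one disk and force it out of the OS cache."""
    path = Path(root) / NOSLEEP_NAME
    stamp = f"{time.time():.6f}\n".encode("ascii")
    with open(path, "wb") as f:
        f.write(stamp)
        f.flush()  # push Python's buffer to the OS
        # Substitute for unbuffered I/O: ask the OS to flush the file to the device
        # (_commit on Windows, fsync on POSIX) so the disk actually sees the write.
        os.fsync(f.fileno())
    return path


def keep_alive_once(roots):
    """Touch every disk once; collect failures so one dead disk doesn't stop the rest."""
    written, failed = [], []
    for root in roots:
        try:
            written.append(touch_nosleep(root))
        except OSError as exc:
            failed.append((root, exc))
    return written, failed


def keep_alive(roots, interval_s=30.0, rounds=None):
    """Repeat keep_alive_once every interval_s seconds; rounds=None means run forever."""
    n = 0
    while rounds is None or n < rounds:
        _, failed = keep_alive_once(roots)
        for root, exc in failed:
            print(f"keep-alive failed on {root}: {exc}")
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval_s)
```

On the server you’d run something like `keep_alive(["E:\\", "F:\\"])` in the background. A disk that fails to take the write is reported and skipped rather than ending the loop, so one dropped drive doesn’t stop the others from being kept awake.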

If you want a copy of my service, comment… preferably beg… I’ll see what I can do for you.  For now it is beer-o-clock.