vmdksync helps you escape from VMware
Posted: Sun, 13 May 2012 | permalink | 5 Comments
When I wrote lvmsync late last year, I didn’t realise I was being typecast. Before too long, I realised that the logic that I’d implemented for lvmsync would also help me with a separate migration project I’d been dreading – getting the day job off VMware.
Back in the early days of virtualisation, management made the decision to run VMware, for all the usual reasons (“commercially supported!”, “industry standard!”, and so on). Unsurprisingly (to me, anyway) it didn’t take too long for management to realise that it wasn’t the best choice for us. When you’ve got umpty-billion dollars to spend on hardware, software, and support, VMware might be the right option (although Amazon doesn’t seem to think so). Anchor’s company culture, on the other hand, is built around “smart staff, simple systems” over “dumb staff, smart vendors”, because no vendor is ever going to care about our customers as much as we do. So VMware was never going to work for us.
Unfortunately, as happens all too often, once VMware was in place, there was very little motivation to get rid of it and move those customers onto the chosen replacement (that we were deploying all new customers on). I happen to think this is a terrible attitude in general – one that makes life so much harder in the long term. I believe strongly in retrofitting old systems to keep them up-to-date with the current state of the art, and keeping technical debt under control. But, I wasn’t running the show back when we stopped putting new customers on VMware, so the few VMware servers we had stayed around far longer than they should have.
Recently, though, bad things started to happen. The VMware servers were starting to fall apart. The Windows machine we had to keep around to use the VMware management console started crapping out, and when the choice was between doing unspeakable things to Windows, and just ditching VMware… well, it wasn’t much of a choice. The only remaining question was how to do the migration off VMware with the least amount of downtime to our customers.
I was really quite surprised that nobody out in Internet land appeared to have come up with a simple, robust tool to do this. Sure, some vendors had all-singing, all-dancing toolkits that cost ridiculous amounts of money, required you to install their agent on the machine involved, and promised the earth, but it all smelt of snakeoil and bullshit.
In true hacker style, then, I decided to write something myself. The model I came up with mirrored lvmsync’s quite closely – because that one worked, and it turned out to be surprisingly easy to implement once I managed to reverse-engineer the file format (VMware has a PDF spec of a bunch of its file formats, but whoever wrote it was enough of an evil genius to make it utterly incomprehensible to anyone who doesn’t already know the file format, whilst making perfect sense to anyone who already does).
The result: vmdksync. It is nothing
but 80-odd lines of Ruby whose sole purpose is to take a delta.vmdk file
and write the changes stored in it to a file or block device holding a
copy of the flat.vmdk file. You can take that copy while the VM is
still running (after you’ve made a snapshot, of course). It helped me
provide a painless migration path away from VMware, and I’d be really
pleased if it helped some other people do the same. Share and enjoy!
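To make the idea concrete, here is a minimal Ruby sketch of just the final “write the changes into the copy” step. It is not vmdksync’s code, and it does not parse the delta.vmdk format at all: the `changes` hash (byte offset => data) and the name `apply_changes` are hypothetical stand-ins for whatever a real parser would extract.

```ruby
# Sketch only: `changes` stands in for the (offset => data) pairs that a
# real delta.vmdk parser would produce. This code does not read VMDK files.
def apply_changes(target_path, changes)
  # "r+b" opens for reading and writing in binary mode without truncating,
  # so everything outside the changed regions is left untouched.
  File.open(target_path, "r+b") do |target|
    # Apply in ascending offset order to keep the writes sequential.
    changes.sort.each do |offset, data|
      target.seek(offset)
      target.write(data)
    end
  end
end
```

Something like `apply_changes("/dev/vg0/migrated-vm", changes)` (a made-up device path) would then bring the copy up to date with whatever happened after the snapshot was taken.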
The Other Way...
Posted: Sun, 25 December 2011 | permalink | 6 Comments
The profusion of network cables strung through doorways here demonstrates that two drops per sysadmin isn’t anywhere near enough.
What I actually suspect it demonstrates is that Chris’ company hasn’t learnt about the magic that is VLANs. All of the reasons he cites in the longer, explanatory blog post could be solved with VLANs. The only time you can’t get away with one gigabit drop per office and an 8 port VLAN-capable switch is when you need high capacity, and given how many companies struggle by with wifi, I’m going to guess that sustained gigabit-per-machine is not a common requirement.
So, for Christmas, buy your colleagues a bunch of gigabit VLAN-capable switches, and you can avoid both the nightmare of not having enough network ports, and the more hideous tragedy of having to crawl around the roofspace and recable an entire office.
Rethtool: How I Learned to Stop Worrying and Love the ioctl
Posted: Sat, 17 December 2011 | permalink | 2 Comments
Damn those unshaven yaks…
I’m trying to write a Nagios plugin for work that will comprehensively monitor network interfaces and make sure they’re up, passing traffic, all those sorts of things. Of course, I’m doing it all in Ruby, because that’s how I roll.
So, I need to Know Things about the interface. Everyone does that with ethtool. Right? Sure, if your eyeballs are parsing it. But have you ever tried to machine parse it? To put it as eloquently as possible:
# ethtool eth0
Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                         100baseT/Half 100baseT/Full
                                         1000baseT/Half 1000baseT/Full
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000033 (51)
    Link detected: yes
Parse that, bitch!
Or… perhaps not.
At any rate, I decided that it would be most advantageous if I went
straight to the source and twiddled the ioctl until it did my bidding.
And thus, about 5 hours later, was Rethtool born.
Once I worked out a less-than-entirely-crackful way of dealing with C
structs in Ruby (after a bit of digging around, I went with the
appallingly-undocumented-but-sufficiently-featureful
CStruct), and after I finally worked out I
was passing the wrong damned struct to ioctl(SIOCETHTOOL) (speaking of
appallingly-undocumented: fuck you, ioctl, and all your twisty-passages
children), it was smooth sailing.
So, if you’re one of the eight or so people on earth who will ever need to get at the grubby internals of your network interfaces using Ruby (and can’t do it via some sysfs magic), Rethtool is for you.
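For the simpler cases where sysfs is enough, a rough sketch looks like the following. This is not Rethtool and doesn’t touch the ioctl; it just reads the `operstate`, `carrier`, `speed`, and `duplex` attributes that Linux exposes under `/sys/class/net/<interface>/`. The function name and the `sysfs_root` parameter are my own inventions for illustration.

```ruby
# Sketch: read a few per-interface attributes from sysfs instead of parsing
# ethtool output. `sysfs_root` is a parameter only so that the code can be
# pointed at a test directory; it defaults to the usual location.
def interface_info(iface, sysfs_root = "/sys/class/net")
  dir = File.join(sysfs_root, iface)
  read = lambda do |attr|
    begin
      File.read(File.join(dir, attr)).strip
    rescue SystemCallError
      nil # attribute missing, or unreadable (e.g. while the link is down)
    end
  end
  speed = read.call("speed")
  {
    operstate:  read.call("operstate"),
    carrier:    read.call("carrier") == "1",
    speed_mbps: speed && speed.to_i,
    duplex:     read.call("duplex"),
  }
end
```

Calling `interface_info("eth0")` returns a hash you can feed straight into a monitoring check. Anything sysfs doesn’t expose (advertised link modes, for instance) is where the ioctl route comes in.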
Misleading error messages from blktrace
Posted: Sat, 12 November 2011 | permalink | No comments
If you ever get an error message from the blktrace tool that looks like
this:
BLKTRACESETUP(2) /dev/dm-0 failed: 2/No such file or directory
Thread 3 failed open /sys/kernel/debug/block/(null)/trace3: 2/No such file or directory
Thread 2 failed open /sys/kernel/debug/block/(null)/trace2: 2/No such file or directory
Thread 0 failed open /sys/kernel/debug/block/(null)/trace0: 2/No such file or directory
Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
FAILED to start thread on CPU 0: 1/Operation not permitted
FAILED to start thread on CPU 1: 1/Operation not permitted
FAILED to start thread on CPU 2: 1/Operation not permitted
FAILED to start thread on CPU 3: 1/Operation not permitted
Don’t be alarmed – your disk hasn’t suddenly disappeared out from
underneath you. It actually means quite the opposite of what “No such file
or directory” might imply: there is already a blktrace of that particular
block device in progress, and you’ll need to kill that one off before you
can start another one.
Thank $DEITY for the kernel source code – it was the only hope I had of diagnosing this particular nit before I went completely bananas and smashed my keyboard into small pieces.
rsync for LVM-managed block devices
Posted: Fri, 28 October 2011 | permalink | 12 Comments
If you’ve ever had to migrate a service to a new machine, you’ve probably
found rsync to be a godsend. Its ability to pre-sync most data while the
service is still running, then perform the much quicker “sync the new
changes” action after the service has been taken down is fantastic.
For a long time, I’ve wanted a similar tool for block devices. I’ve managed
ridiculous numbers of VMs in my time, almost all stored in LVM logical
volumes, and migrating them between machines is a downtime hassle. You need
to shut down the VM, do a massive dd | netcat, and then bring the machine
back up. For a large disk, even over a fast local network, this can be
quite an extended period of downtime.
The naive implementation of a tool that was capable of doing a block-device
rsync would be to checksum the contents of the device, possibly in blocks,
and transfer only the blocks that have changed. Unfortunately, as network
speeds approach disk I/O speeds, this becomes a pointless operation.
Scanning 200GB of data and checksumming it still takes a fair amount of time
– in fact, it’s often nearly as quick to just send all the data as it is to
checksum it and then send the differences.[1]
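For reference, the naive approach looks something like this sketch. The block size and the function name are arbitrary choices of mine, and for simplicity it compares two local files; a real tool would checksum the destination on the remote machine and ship only the digests across.

```ruby
require "digest"

BLOCK_SIZE = 4096 # arbitrary block size, chosen for illustration

# Returns the indexes of the blocks whose checksums differ between two
# files (or block devices). Note that it has to read every byte of both
# sides, which is exactly the cost complained about above.
def changed_blocks(src_path, dst_path, block_size = BLOCK_SIZE)
  changed = []
  File.open(src_path, "rb") do |src|
    File.open(dst_path, "rb") do |dst|
      index = 0
      while (src_block = src.read(block_size))
        dst_block = dst.read(block_size)
        # A block missing from the destination counts as changed.
        if dst_block.nil? ||
           Digest::SHA256.digest(src_block) != Digest::SHA256.digest(dst_block)
          changed << index
        end
        index += 1
      end
    end
  end
  changed
end
```

The result is precisely the list we want, but getting it costs a full read of both devices.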
No, a different approach is needed for block devices. We need something that keeps track of the blocks on disk that have changed since our initial sync, so that we can just transfer those changed blocks.
As it turns out, keeping track of changed blocks is exactly what LVM snapshots do. They actually keep a copy of what was in the blocks before they changed, but we’re not interested in that so much. No, what we want is the list of changed blocks, which is stored in a hash table on disk.
All that was missing was a tool that read this hash table to get the list of blocks that had changed, then sent them over a network to another program that was listening for the changes and could write them into the right places on the destination.
That tool now exists, and is called
lvmsync. It is a slightly crufty
chunk of Ruby that, when given a local LV and a remote machine and block
device, reads the snapshot metadata and transfers the changed blocks over an
SSH connection it sets up.
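The send-and-apply half of that can be sketched as below. To be clear, this is not lvmsync’s actual code or wire format: the 12-byte header (64-bit offset, 32-bit length), the chunk size, and both function names are made up for illustration, and the list of changed chunk numbers is assumed to have already been read out of the snapshot metadata.

```ruby
require "stringio"

CHUNK_SIZE = 4096 # illustrative; a real snapshot has its own chunk size

# Sender: for each changed chunk number, emit a 12-byte header
# (64-bit offset, then 32-bit length, both big-endian) followed by the data.
def send_changes(source, chunk_numbers, out, chunk_size = CHUNK_SIZE)
  chunk_numbers.each do |n|
    offset = n * chunk_size
    source.seek(offset)
    data = source.read(chunk_size)
    next if data.nil? # chunk lies beyond the end of the source
    out.write([offset, data.bytesize].pack("Q>N"))
    out.write(data)
  end
end

# Receiver: read header/data pairs until the stream ends, writing each
# chunk at its offset in the destination.
def receive_changes(input, dest)
  while (header = input.read(12))
    offset, length = header.unpack("Q>N")
    dest.seek(offset)
    dest.write(input.read(length))
  end
end
```

Both ends only need IO-like objects, so `out` and `input` could be the two ends of a pipe through SSH, while `source` and `dest` are the opened block devices. For experimenting, `StringIO.new(String.new)` works as a binary in-memory stand-in for the connection.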
Be warned: at present, it’s a pretty raw piece of code. It does nothing but the “send updated blocks over the network”, so you have to deal with the snapshot creation, initial sync, and so on. As time goes on, I’m hoping to polish it and turn it into something Very Awesome. “Patches Accepted”, as the saying goes.
1. rsync avoids a full-disk checksum because it cheats and uses file metadata (the last-modified time, or mtime of a file) to choose which files can be ignored. No such metadata is available for block devices (in the general case).
UPSes in Datacentres
Posted: Tue, 23 August 2011 | permalink | 3 Comments
(This was going to be a comment on this blog post, but it’s a Turdpress site that wants JS and cookies to comment. Bugger that for a game of skittles.)
Rimuhosting’s recent extended outage due to power problems was apparently caused by a transfer switch failure at their colo provider. This has led people to wonder if putting UPSes in individual racks is a wise move. The theory is that in the event of a small outage, the UPS can keep things humming, and in an extended outage you can gracefully shut things down rather than having a hard thump.
I happen to think this theory is bunkum. Your UPS is a newly instituted single point of failure. I’d be willing to bet that the cost of purchasing, installing, and maintaining the UPSes, as well as the cost of the outages that inevitably result from their occasional failure, would be far greater than the cost of the occasional power outage you get in a well-managed facility.
Good facilities don’t have small outages. They don’t have squirrels in the roof cavities, and they don’t have people dropping spanners across busbars. The only outages they have are the big ones, when some piece of overengineered equipment turns out to be not so overengineered – the multi-hour (or multi-day) ones where your UPS isn’t going to stop you from going down. Your SLA credit and customer goodwill are already toast, so all you’re saving is the incremental cost of a little bit more downtime while you get fscks run.
If you want the best possible power reliability, get yourself into a really well engineered facility, and run dual-power on everything. Definitely run the numbers before you go down the UPS road; I’ll bet you find they’re not worth it.
Oh HP, you Bucket of Fail
Posted: Tue, 23 August 2011 | permalink | 9 Comments
I recently got given a new printer, a HP LaserJet “Professional”[1] P1102w. It’s fairly loudly touted on HP’s website that this printer has “Full” support under Linux.
And yet, it won’t work with my Linux-based print server. Why? Because it uses a proprietary driver plugin, and that plugin is only available for x86 and amd64, and my print server is ARM-based. Well done, HP. You’ve managed to revive the old “all the world’s a VAX” philosophy, on an OS that is more than capable of running on practically anything. You got that for free. Why do you insist on screwing with it?
As an added bonus, when I try to “Ask a Question” on the HPLIP website, to politely (ha!) inquire as to the possibility of an ARM binary, I get sent to Launchpad, which does nothing more than tell me that there is an “Invalid OpenID transaction”. That’s the entire content of the page. Useful.
Lies, damned lies, and a double helping of proprietary software fail. My day is complete.
1. I use scarequotes around “Professional” because, as far as I can tell, this is just an entry-level personal laser printer. There is nothing particularly professional about it.
Unintended Consequences: Why Evidence Matters
Posted: Sun, 21 August 2011 | permalink | 1 Comment
If you were trying to get rid of hiring discrimination (on grounds irrelevant to the ability to do the job), you’d think a good way to do it would be to reduce the ability of the hiring manager to discriminate, by restricting their access to irrelevant (but possibly prejudicial) information. It’s certainly what I might come up with as an early idea in a brainstorming session.
I’m not alone: France had this same idea, and gave it a go, by passing a law requiring companies to anonymise resumes before they got to any decision makers.
So far, so average. But rather than just coming up with an idea and inflicting it on everyone by a blanket law, they did what should be done with all new ideas: they trialled it (with 50 large corporations, according to the report) before making it universal, to make sure that the theory matched reality. Then, after giving it a good shake, they examined the evidence, and found that the idea had some unintended consequences:
Applicants with foreign names, or who lived in under privileged areas were found to be less likely to be called in for an interview without the listing of their name and address. Researchers reasoned that this was because employers and recruiters made allowances for subpar presentation or limited French speaking if their performance could be explained by deprivation or foreign birth.
The icing on the cake is that, now the evidence is in, they’re planning on making it “optional” (I’m not sure how that’s different from killing it entirely, but I guess it works out the same in the end).
So we’ve got the quinella of decision-making awesome:
- An idea was had
- A trial was run
- The evidence was examined
- When the evidence didn’t support the idea, the idea was abandoned
Far too often, we get far too attached to our ideas, and don’t let them go when reality doesn’t fit our preconceptions. Kudos to the people involved in this idea for not letting their egos get in the way of good government. Let it be an object lesson for us all.
Stream of Consciousness
Posted: Fri, 19 August 2011 | permalink | No comments
This forum post on requiring formal letters of resignation made me smile:
HR does silly stuff like this all the time. Somebody’s following some policy that was created because somebody verbally resigned nine years ago and then wanted to come back and some executive said where’s their letter and HR said we don’t have one and the exec said that’s not good and we oughta not be doing stuff to help people leave unless they’re really leaving and HR said okay we’ll have a policy and the exec said that’s good.
And the exec’s not there anymore.
I’ll leave everyone to draw their own conclusions as to why I was reading that particular thread.
Using a Local Root Zone with djbdns
Posted: Sun, 7 August 2011 | permalink | No comments
In my continuing war on the effects of craptastic mobile Internet connectivity, I came across a suggestion to host a local copy of the root zone alongside your local DNS resolver. It’s an interesting idea, so I’ve decided to give it a go, despite the potential problems (I’m confident I can manage the risks).
I was surprised to find that nobody had a guide on setting this up using djbdns[1], so… I’ve written one.
If you’re thinking of doing this yourself, heed some words of caution: It is imperative that you keep your local cache up to date. If you set this up, and don’t maintain it, you will have a slow, gradual degradation of Internet service as the live root zone diverges from your local, out-of-date cache.
If you set this up locally, just for yourself, that’s one thing; all you’re
doing is breaking your own machine. If you want to do this for the ISP you
run, though, you’re doing your customers a grave disservice if you don’t
automate the cache update, and set up some means of monitoring that your
cache is kept up to date (a SOA check against the live roots, or at least
a check to make sure that your data.cdb file is no more than a couple of
days old).
The Design
For simplicity, I decided to run a dedicated tinydns instance that only
serves the root zone. This makes it easy to periodically refresh the root
zone that I serve with a script, which I run daily, without needing to
integrate with the database of any other tinydns instances I’ve got
running (I have a couple on my laptop for testing). I’ve set this up on an
arbitrary loopback address (127.53.53.53), so it’s inaccessible from
anywhere other than localhost, and so my local dnscache instance just
forwards root zone requests to it.
Set up the infrastructure
- Install gnupg (make sure you’ve also got the gpgv utility) and the necessary tools to build a minimal C program (such as build-essential on Debian).

- As the user you’re going to run the daily update script as, run the following:

    gpg --primary-keyring ~/.gnupg/trustedkeys.gpg --recv-keys 20E3C425

- Build/install https://github.com/derat/bind-to-tinydns, because the root zone is provided in BIND zonefile format, and… we don’t want that.

    git clone git://github.com/derat/bind-to-tinydns.git btt
    cd btt
    make
    sudo cp bind-to-tinydns /usr/local/bin/

- Set up your local root-zone-only tinydns (these commands assume my local structure for daemontools-using programs; adapt to suit)

    sudo tinydns-conf tinydns tinydns /var/lib/service/tinydns-root 127.53.53.53
    sudo ln -s /var/lib/service/tinydns-root /etc/service/tinydns-root

- Since some root zone records can be too large for a standard DNS UDP packet, you’ll need to have an axfrdns running as well; this is pretty straightforward too:

    sudo axfrdns-conf tinydns tinydns \
        /var/lib/service/axfrdns-root /var/lib/service/tinydns-root 127.53.53.53
    echo 127.0.0.1:allow | sudo tee /var/lib/service/axfrdns-root/tcp
    sudo tcprules /var/lib/service/axfrdns-root/tcp.cdb \
        /var/lib/service/axfrdns-root/tcp.tmp < /var/lib/service/axfrdns-root/tcp
    sudo ln -s /var/lib/service/axfrdns-root /etc/service/axfrdns-root

- Let the user who will be running the daily update script update the data.cdb file:

    sudo touch /etc/service/tinydns-root/root/data.cdb
    sudo chown someuser /etc/service/tinydns-root/root/data.cdb
You’ve now got a minimal tinydns suitable for serving a local cache of the
root zone to anyone on your local machine who asks. But where’s the data?
Script the root zone processing
The following script should do the job nicely. Drop it somewhere useful and
chmod a+x it. If you put your tinydns somewhere else, change the
TINYDNS_DATA variable at the top.
Run it once by hand to “seed” your root cache, then add it to cron for a nightly update.
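#!/bin/sh
set -e

# Where the root-only tinydns instance reads its compiled data from.
TINYDNS_DATA="/etc/service/tinydns-root/root/data.cdb"

###########################################################################

# Work in a throwaway directory that is removed when the script exits.
WORKDIR="$(mktemp -d)"
trap 'rm -rf "$WORKDIR"' EXIT
cd "$WORKDIR"

# Fetch the compressed root zone and its detached signature.
wget -q http://www.internic.net/domain/root.zone.gz
wget -q http://www.internic.net/domain/root.zone.gz.sig

# Verify the signature against the trusted keyring set up earlier.
if ! gpgv root.zone.gz.sig root.zone.gz >/dev/null 2>&1; then
    echo "Root zone signature validation failed -- this is probably really bad" >&2
    exit 1
fi

gzip -d root.zone.gz

# Drop the DNSSEC records, then convert the BIND zonefile into a tinydns
# "data" file.
grep -Ev '[[:space:]]IN[[:space:]]+(RRSIG|DNSKEY|DS|NSEC)[[:space:]]' root.zone \
    | /usr/local/bin/bind-to-tinydns . data btttmp

# Compile data into data.cdb and install it.
tinydns-data
cp data.cdb "${TINYDNS_DATA}"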
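If the filter in the middle of that pipeline looks opaque, here is the same record-stripping step as a small Ruby sketch. It reuses the pattern from the script unchanged; the constant and function names are mine, and the script itself doesn’t need this code.

```ruby
# Same pattern as the script's filter: a line is dropped when its record
# type is one of the DNSSEC types listed here.
DNSSEC_RECORD = /[[:space:]]IN[[:space:]]+(RRSIG|DNSKEY|DS|NSEC)[[:space:]]/

# Takes an array (or any enumerable) of zonefile lines and returns only
# the lines that are not DNSSEC records.
def strip_dnssec(lines)
  lines.reject { |line| line.match?(DNSSEC_RECORD) }
end
```

Feeding it `File.foreach("root.zone")` leaves the delegation and address records (NS, A, AAAA and so on) for bind-to-tinydns to convert.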
Test
The simplest test, to make sure you’ve got everything running, is just to request something from the root zone:
dig @127.53.53.53 com IN NS
If you get something useful (compare against dig com IN NS for a sanity
check) then everything’s probably working well.
Point dnscache to your local root server
echo 127.53.53.53 >/etc/service/dnscache/root/servers/@
svc -k /etc/service/dnscache
And you’re away.
1. For all its oddities, it’s a very tidy piece of software, and takes up so few resources on a modern system that its presence is practically invisible – it uses less memory than init.