VLAN adventures
Posted: Fri, 29 December 2006
Have you ever had a need for a large number of "real" network interfaces on a Linux box? For security reasons, you don't just want to hook everyone up to a single switch and use virtual interfaces (because it's too easy for one of the connected devices to play silly buggers with his neighbours). Up to a certain point, you can stuff more NICs into a box, and when it fills up you can switch to those outrageously expensive 2- and 4-port NICs. But eventually you're going to run out of space[1] or money. At that point, you need something more interesting.
For me, this week, that "more interesting" has been Ethernet VLANs. Linux Journal has a good intro to the technical aspects of how to set up VLANs on Linux.
VLANs are some crazy stuff. They're implemented by wrapping a little number around every Ethernet frame indicating which VLAN the packet is "on", and the switch (you need a higher-end device, appropriately configured) makes sure that packets don't roam onto ports they're not meant to be on. For a single switch, this isn't very interesting, and you don't really need an IEEE standard to make it work -- just a lot of little jumpers. However, VLANs are mostly useful on larger networks, which tend to use more than one switch. So you need a way for your VLANs to roam amongst your switches -- hence the packet tagging. You stick the tag onto the packets and fling them off to another switch, and packets coming in with the tag are corralled into their VLAN.
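To make the "little number" concrete, here is a minimal Python sketch of how an 802.1Q tag sits in a frame: four extra bytes (the marker value 0x8100, then a 16-bit field whose low 12 bits are the VLAN ID) slotted in right after the two MAC addresses. The function names and the sample frame are made up for illustration; this isn't what the switch or the kernel actually runs.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    # 3-bit priority, 1-bit drop-eligible flag (left at 0), 12-bit VLAN ID
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


def read_vlan_id(frame: bytes):
    """Return the VLAN ID of a tagged frame, or None if it is untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF


# A made-up untagged frame: broadcast destination, an example source MAC,
# EtherType 0x0800 (IPv4) and a dummy payload.
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
tagged = add_vlan_tag(untagged, vlan_id=42)
```

The tagged frame is exactly four bytes longer than the original, which matters later on when NICs start getting picky about frame sizes.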
Naturally, you don't want end-user devices whacking their own VLAN tags onto packets (untrusted data, and all that), so you define most of your ports as being "untagged". Packets that come in on these ports are treated as regular Ethernet frames and tagged with the appropriate VLAN ID, and packets that go out have their VLAN IDs stripped before being sent. (This implies that you can tunnel VLANs in VLANs, and I'm pretty sure that is doable in most switches, modulo some nasty problems that can, and probably will, occur.) The ports that aren't designated "untagged" spit out tagged packets -- and this is where my interest lies.
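The untagged/tagged split can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the `Port` class, the `ingress` and `egress` names, and the simplification that an untagged port ignores any tag a device tries to supply. Real switches differ in the details.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A frame is modelled as (vlan_id or None if untagged, payload).
Frame = Tuple[Optional[int], str]


@dataclass
class Port:
    name: str
    # VLAN assigned to an "untagged" port; None means the port is a trunk.
    untagged_vlan: Optional[int] = None


def ingress(port: Port, frame: Frame) -> Frame:
    """Classify a frame arriving at the switch on the given port."""
    vlan, payload = frame
    if port.untagged_vlan is not None:
        # Untagged port: the switch decides the VLAN, not the device.
        return (port.untagged_vlan, payload)
    return (vlan, payload)  # trunk: keep whatever tag came in


def egress(port: Port, frame: Frame) -> Optional[Frame]:
    """Return the frame as it leaves the port, or None if it may not leave."""
    vlan, payload = frame
    if port.untagged_vlan is None:
        return (vlan, payload)  # trunk: send with the tag still attached
    if vlan != port.untagged_vlan:
        return None  # wrong VLAN: never shown to this port
    return (None, payload)  # right VLAN: strip the tag before sending


port_a = Port("port1", untagged_vlan=10)
port_b = Port("port2", untagged_vlan=20)
trunk = Port("trunk")

inside = ingress(port_a, (None, "hello"))
```

In this model a frame from `port_a` leaves the trunk tagged with VLAN 10, leaves `port_a`'s own VLAN untagged, and never reaches `port_b` at all.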
As with most every other computer-related standard, Linux can decipher and work with VLAN tags (the 8021q module is what you're looking for). You just load up the module, hook your NIC into the appropriate port on your very expensive switch, define your VLAN tags with vconfig, and suddenly you've got vlanN interfaces, each of which maps to a real RJ-45 port on the switch. Hang network devices off those ports and you get lots and lots of "real" network interfaces on your Linux router.
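As a rough sketch of that sequence, the Python below only builds the command lines; it doesn't run anything. The parent interface `eth0` and the VLAN IDs 10 and 20 are assumptions, and the `set_name_type` step is there because vconfig otherwise names interfaces after the parent device rather than vlanN.

```python
from typing import List


def vlan_setup_commands(parent: str, vlan_ids: List[int]) -> List[str]:
    """Build the shell commands that create one vlanN interface per VLAN ID."""
    commands = [
        "modprobe 8021q",  # load the kernel's VLAN support
        # Name new interfaces vlan10, vlan20, ... instead of eth0.10 etc.
        "vconfig set_name_type VLAN_PLUS_VID_NO_PAD",
        f"ip link set dev {parent} up",  # the trunk NIC itself must be up
    ]
    for vid in vlan_ids:
        if not 1 <= vid <= 4094:
            raise ValueError(f"invalid VLAN ID: {vid}")
        commands.append(f"vconfig add {parent} {vid}")
        commands.append(f"ip link set dev vlan{vid} up")
    return commands


if __name__ == "__main__":
    # eth0 and the IDs below are placeholders; substitute your own.
    for line in vlan_setup_commands("eth0", [10, 20]):
        print(line)
```

Each vlanN interface can then be given an address like any other. On newer systems the same job is typically done with `ip link add link eth0 name vlan10 type vlan id 10` rather than vconfig.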
It's all so simple, right? Nothing ever is, of course. The trials and tribulations I've had over the past two days have included:
- Not being able to talk to the switch via serial console (so I can configure the damned thing). Turns out the cable I had wasn't a null modem cable, and the switch needed one (the documentation was contradictory on this point, so we started off with the regular kind and a gender-bender -- which should have been my first clue). So Jaycar got ten more of the company's dollars for the proper cable. After that, all was well, and I could find out that...
- The console interface is horrendous on the switch I've got, at least for defining VLANs. I just enabled the (slightly less hideous) web interface, and used that instead. It still sucked, but it sucked less.
- Once everything was configured and the test rig all wired up, I spent about 20 minutes plugging cables into various ports on the switch and watching a distinct lack of packets flowing. It turns out that new VLAN trunks (the technical term for "a port that gets all VLANs, with tags still attached") on this switch get configured as "disabled". Yes, of course that's what I want -- it's overwhelmingly likely that I won't want to use that new trunk I just configured. Sheesh.
- The first NIC I used as the VLAN interface on the server (a Tulip-based home brand thing) didn't play nicely with the VLAN packets. As you'd expect, the extra bytes for the VLAN ID need to be a part of the packet -- and that size is normally limited to 1514 bytes (1500 bytes of payload plus a 14 byte Ethernet header). Adding the VLAN tags increases this by 4 bytes. Although some NICs comprehend the utility of supporting VLANs, and will handle these oversized 1518 byte frames, others don't. This NIC just didn't want to handle it. The symptoms were large pings just disappearing (they didn't show up in tcpdump on the destination interface). As I've had to track down MTU problems in the past, and it's always painful, I wasn't keen on introducing a known one right at the start, so this had to be rectified.
- The next NIC I tried (a Netgear FA311 -- uses the natsemi driver) had a fantastic failure mode. Packets arrived having had their VLAN tag stripped -- so the 8021q module didn't know it was a VLAN packet, and couldn't drop it into the appropriate vlanN interface. I know everything was otherwise alright, because ARP packets were tagged to the VLAN fine, but the ICMP packets that followed weren't tagged. Scratch that one off the list.
- The third NIC was an RTL8139D. The driver didn't even recognise this card, which confused the hell out of me. It turns out that this particular card (PCI ID 1904:8139) isn't a real RTL8139 (despite having RTL8139D and the RealTek crab printed on the chip) but is rather a dodgy knockoff (some might say "as opposed to the dodgy original?"). Why anyone would bother to knock off the world's cheapest Ethernet chipset, I have no idea -- if I were an electronics counterfeiter, I'd go after something expensive, like a 3Com card. Far more profit.
- The fourth NIC, a real RTL8139D, worked like a charm. I know it isn't exactly the most fantastic NIC on the market today, but I don't have any 3c59x or eepro cards around, so this'll have to do for now.
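The frame-size arithmetic from the Tulip episode above, worked through in Python. The sizes leave out the trailing 4-byte frame checksum, as the figures in the list do; the 20-byte IP header assumes IPv4 with no options, and the function names are made up for illustration.

```python
ETH_HEADER = 14   # destination MAC (6) + source MAC (6) + EtherType (2)
VLAN_TAG = 4      # the 802.1Q tag
IP_HEADER = 20    # IPv4 header without options (assumption)
ICMP_HEADER = 8   # ICMP echo header
MTU = 1500        # standard Ethernet payload size


def frame_size(payload_len: int, tagged: bool) -> int:
    """Size of the frame on the wire, not counting the checksum."""
    return ETH_HEADER + (VLAN_TAG if tagged else 0) + payload_len


def max_payload(nic_max_frame: int, tagged: bool) -> int:
    """Largest payload a NIC with the given frame limit can carry."""
    return nic_max_frame - ETH_HEADER - (VLAN_TAG if tagged else 0)


def max_ping_size(nic_max_frame: int, tagged: bool) -> int:
    """Largest value for `ping -s` that still fits in a single frame."""
    return max_payload(nic_max_frame, tagged) - IP_HEADER - ICMP_HEADER


plain = frame_size(MTU, tagged=False)     # 1514
with_tag = frame_size(MTU, tagged=True)   # 1518
```

A NIC that refuses anything over 1514 bytes can only carry 1496 bytes of payload on a VLAN, so a `ping -s 1468` should get through while anything up to the usual limit of 1472 quietly vanishes, which matches the disappearing large pings.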
Thankfully, once all those dramas were done, the whole thing just kind of fell into place, and now I've got a machine which has up to 2048(ish) network interfaces (with huge, glaring caveats on that upper bound, of course). With a bit of luck, the live installation in a couple of days will go smoothly.
You can stop sniggering now.
1. In this case, there wasn't a lot of space to begin with, as it's a 1RU box I'm working on, and the 4-port NIC didn't want to play nice with me for some reason anyway.