The problem discussed below was finally corrected late on December 30 (although the Roadrunner operation is still clueless). Turns out that the problem in UUNet has been affecting all Hawaii ISPs that use UUNet for connectivity since at least November! Details of the resolution of the nightmare are on the last page.
December 27, 2000: For more than a week, my connection to the Internet has been driving me crazy - most sites come in just fine, but there are some that are effectively useless. December 28 - no change. December 29 - now even worse - takes 4 minutes 12 seconds to load the Zonelabs page! December 30 - no change.
This problem affects all Roadrunner West-Hawaii customers - but the affected sites differ from customer to customer. (This is because one of the techniques used in Internet backbone routing uses the IP addresses at each end of the link in a hashing algorithm to select one of multiple paths to route traffic through.) And unlike most Internet slowdowns, this one is constant - 24 hours a day, 7 days a week - unless you get a different IP address. (With dial-up service, a user is likely to get a different IP address on each connection; with a broadband connection, your IP address rarely changes.)
The problem is best described with an analogy: Imagine that whenever you use your telephone to dial your Mother in LA, you get a busy signal. Maybe once in 20 tries the call goes through, but then you get disconnected. Go to your neighbor's house, and you have no trouble! But your neighbor can't call their family in LA from their own phone, yet can from yours!
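The hashing idea can be sketched in a few lines of Python. This is only an illustration, not how any particular router does it: real routers use their own hash functions, so SHA-256 is a stand-in, and the path count, the "bad" path numbers, and the example addresses (taken from the ranges reserved for documentation) are all assumptions.

```python
import hashlib
import ipaddress

NUM_PATHS = 12       # assumption: illustrative number of parallel paths
BAD_PATHS = {3, 7}   # assumption: which two paths are "bad" is arbitrary here


def pick_path(src_ip: str, dst_ip: str, num_paths: int = NUM_PATHS) -> int:
    """Map a (source, destination) address pair to one path index."""
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()  # stand-in for a router's own hash
    return int.from_bytes(digest[:4], "big") % num_paths


def slow_sites(client_ip: str, site_ips: list[str]) -> list[str]:
    """Sites whose traffic back to client_ip hashes onto a bad path."""
    return [s for s in site_ips if pick_path(s, client_ip) in BAD_PATHS]


if __name__ == "__main__":
    # Hypothetical sites and two hypothetical customers one digit apart.
    sites = [f"198.51.100.{n}" for n in range(1, 61)]
    for client in ("192.0.2.147", "192.0.2.148"):
        print(client, slow_sites(client, sites))
```

Because the choice depends only on the two addresses, the same site always lands on the same path for a given customer, while a neighbor with a different address gets a different list of slow sites.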
This all started around December 15 when Roadrunner Hawaii added another DS-3 (45 Mbps) link to the Internet from a Tier-1 provider, and started "balancing" the network. The preferred routing for servers sending data to West Hawaii users was changed from AT&T to UUNet. [While Roadrunner uses many Tier-1 providers, and outbound data may travel via any of these, the data that comes back to you will route independently and is based upon your public IP address. In Hawaii, Roadrunner assigns a large block of users to addresses that have a common preferred inbound route of either AT&T or UUNet, and "balancing" has involved changing which carriers serve different users: think rolling blackouts that last for weeks.] There are 2 out of at least 12 UUNet routers that introduce extreme latency and packet loss 24 hours a day, 7 days a week. The rest work fine. [Unbalanced network.] Unfortunately, the path taken from any particular site to your particular IP will rarely change - so if the route is through a bad router, you'll get less than 28k modem speeds!
For example, right now (27-Dec) it takes 2 minutes and 44 seconds to load www.zonelabs.com !
Look at the trace to the site:
First, ignore the 100% loss and unidentified router - it is normal to encounter routers that do not respond to ICMP packets. Note that the latency at all hops except 22.214.171.124 and the Zonelabs server is excellent.
The problem is really not identified well because the routers used in the return path back to me are not identified in a trace. Packets can take a different path on the way back, and I think this is what is happening here. The problem is more easily seen doing a reverse trace. There is a directory of sites with reverse trace tools at www.traceroute.org.

From: www.netnostics.com/tracert.cgi
To: my IP

Result for a24b165n36client148.hawaii.rr.com:

traceroute to a24b165n36client148.hawaii.rr.com (126.96.36.199), 30 hops max, 40 byte packets
 1 netnostics (188.8.131.52) 1.107 ms 0.960 ms 0.973 ms
 2 Serial1-1-1.GW3.DCA1.ALTER.NET (184.108.40.206) 0.896 ms 3.618 ms 1.640 ms
 3 522.ATM1-0.XR2.DCA1.ALTER.NET (220.127.116.11) 1.703 ms 1.122 ms 1.024 ms
 4 194.at-0-0-0.TR2.DCA8.ALTER.NET (18.104.22.168) 1.546 ms 1.427 ms 1.844 ms
 5 115.at-6-1-0.TR4.SCL1.ALTER.NET (22.214.171.124) 62.654 ms 62.692 ms 62.536 ms
 6 399.ATM6-0.XR2.PAO1.ALTER.NET (126.96.36.199) 66.238 ms 65.140 ms 65.208 ms
 7 188.ATM10-0-0.CR1.PAO1.ALTER.NET (188.8.131.52) 2162.787 ms 2688.173 ms 2532.479 ms
 8 197.Hssi4-0.GW1.HAW2.ALTER.NET (184.108.40.206) 2999.107 ms 2832.314 ms 2715.960 ms
 9 * * *
10 220.127.116.11 (18.104.22.168) 2462.390 ms 2378.096 ms 2597.500 ms
11 22.214.171.124 (126.96.36.199) 2503.834 ms 2464.486 ms 2326.038 ms
12 188.8.131.52 (184.108.40.206) 2118.566 ms 2131.334 ms 2159.567 ms
13 220.127.116.11 (18.104.22.168) 1977.412 ms 2143.371 ms 2075.853 ms
14 22.214.171.124 (126.96.36.199) 2003.156 ms 2252.479 ms 2206.641 ms
In the above reverse trace, notice hop 7 - PAO = Palo Alto, California; ALTER.NET = UUNet/MCI Worldcom. This router shows over 2 seconds of latency, and that delay carries through to every hop beyond it on the way back to me.
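Spotting the hop where latency jumps can also be done mechanically. The following is a rough sketch, not a general traceroute parser: it assumes output in the format shown in the reverse trace, the sample lines are hops 5 through 9 copied from that trace, and the `factor` and `floor_ms` thresholds are arbitrary values chosen for illustration.

```python
import re
from statistics import mean

# Hops 5-9 of the reverse trace, exactly as they appear in the text.
SAMPLE = """
 5 115.at-6-1-0.TR4.SCL1.ALTER.NET (22.214.171.124) 62.654 ms 62.692 ms 62.536 ms
 6 399.ATM6-0.XR2.PAO1.ALTER.NET (126.96.36.199) 66.238 ms 65.140 ms 65.208 ms
 7 188.ATM10-0-0.CR1.PAO1.ALTER.NET (188.8.131.52) 2162.787 ms 2688.173 ms 2532.479 ms
 8 197.Hssi4-0.GW1.HAW2.ALTER.NET (184.108.40.206) 2999.107 ms 2832.314 ms 2715.960 ms
 9 * * *
"""


def parse_hops(text):
    """Return (hop number, host, average ms or None) for each hop line."""
    hops = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header line and anything that is not a hop
        times = [float(t) for t in re.findall(r"([\d.]+) ms", line)]
        host = fields[1] if len(fields) > 1 else "*"
        hops.append((int(fields[0]), host, mean(times) if times else None))
    return hops


def first_jump(hops, factor=10.0, floor_ms=500.0):
    """First hop whose average is >= floor_ms and >= factor x the previous hop."""
    prev = None
    for number, host, avg in hops:
        if avg is None:
            continue  # hop did not answer ("* * *")
        if prev is not None and avg >= floor_ms and avg >= factor * prev:
            return number, host, avg
        prev = avg
    return None


if __name__ == "__main__":
    print(first_jump(parse_hops(SAMPLE)))
```

On the sample lines this points at hop 7, the CR1.PAO1 router, matching what the eye picks out of the trace.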
Now, a traceroute from the same Netnostics server to an IP only 1 digit away from mine - .147 at the end instead of .148:

Result for 188.8.131.52:

traceroute to 184.108.40.206 (220.127.116.11), 30 hops max, 40 byte packets
 1 netnostics (18.104.22.168) 0.902 ms 0.532 ms 0.500 ms
 2 Serial1-1-1.GW3.DCA1.ALTER.NET (22.214.171.124) 0.739 ms 0.762 ms 0.784 ms
 3 522.ATM3-0.XR2.DCA1.ALTER.NET (126.96.36.199) 1.090 ms 1.446 ms 1.854 ms
 4 194.at-0-0-0.TR2.DCA8.ALTER.NET (188.8.131.52) 1.563 ms 1.505 ms 1.285 ms
 5 115.at-6-1-0.TR4.SCL1.ALTER.NET (184.108.40.206) 62.572 ms 63.878 ms 62.547 ms
 6 399.ATM6-0.XR2.PAO1.ALTER.NET (220.127.116.11) 65.206 ms 65.720 ms 65.116 ms
 7 188.ATM10-0-0.CR2.PAO1.ALTER.NET (18.104.22.168) 65.098 ms 63.990 ms 66.468 ms
 8 197.Hssi5-0.GW1.HAW2.ALTER.NET (22.214.171.124) 114.217 ms 114.917 ms 114.970 ms
 9 * * *
10 126.96.36.199 (188.8.131.52) 117.399 ms 120.990 ms 117.616 ms
11 184.108.40.206 (220.127.116.11) 114.320 ms 114.521 ms 114.032 ms
12 18.104.22.168 (22.214.171.124) 123.707 ms 127.555 ms 141.346 ms
13 126.96.36.199 (188.8.131.52) 128.170 ms 125.774 ms 122.968 ms
14 184.108.40.206 (220.127.116.11) 125.154 ms 137.065 ms 127.237 ms
15 * *

There is no problem going to this IP! Notice that hop 7 is a different router with a different IP address (CR2.PAO1 instead of CR1.PAO1).
OK, so I think I've pinpointed the problem - why do I have to do this? And, even though I've located the problem, how can I do anything to get it fixed? (I also identified a second UUNet router at 18.104.22.168 with the same problem.)
Roadrunner/Oceanic Cable-Hawaii [Time Warner] is my ISP. I've got a ticket number. Supposedly "escalated". I have sent them traces and messages, and they haven't sent a single reply. 'All agents are busy - leave a message' on their phone line. I leave messages, but they don't call back.
Roadrunner 'National' Support online chat - Three sessions, and the best I can get out of them:
"...the router which is in trouble belongs to UUNET, you need to contact them at 800-900-0241 about this problem."
I call that number. There's no option that fits my situation, but I try their tech support anyway. The agent tells me I'm "lucky" - he can see a problem - and asks me to e-mail traces to email@example.com, which I do. They even create a ticket number and send me an automated response. 18 hours later, I call back and ask about the ticket number... I'm told that I'm not a UUNet customer, and they can't help me - Roadrunner needs to contact them. Good luck!
Now, I try the technical contact for zonelabs.com. I get a prompt reply - they don't see a problem! Well, of course they don't because they're tracing from a different IP than the zonelabs server. They operate Brainstorm.com, and I can access that just fine:
While the UUNet router in the middle of the trace still shows trouble (because it is using a 'bad' return path to me), it is not slowing down data moving to Brainstorm; the Brainstorm server's data to me gets routed over a good (different) path back to me! Zonelabs' provider, RCN (who purchased Erol's), actually got back to me and looked further into the problem. Yes, they see the problem, but although they use UUNet as their provider, the problem is not between them and UUNet, it's between Roadrunner and UUNet - Roadrunner must address the problem.
Since Oceanic Cable/Roadrunner Hawaii still will not even acknowledge the problem, I call Roadrunner's NOC in Herndon, Virginia. On December 26, I'm told they are aware of a problem - that it affects many Roadrunner systems, not just Hawaii, and that their engineers are meeting and will fix the problem. December 29 - still not fixed and I call back again. After 30 minutes on the phone, I convince the tech there's a problem, and he promises to look into it even though he 'could get in trouble' for talking to me - they are only supposed to deal with local Roadrunner Help Desks.
It boggles my mind that first of all UUNet and Roadrunner cannot acknowledge a problem - the first step in fixing it! It boggles my mind that I'm stuck in the middle and this situation - which already has been ongoing for more than a week - has no end in sight!
If I reboot everything including my cable modem, and if I am assigned a different IP address, Zonelabs.com will come in just fine - but then I'm able to start building a new list of sites that act like Zonelabs, because a new set of sites will choose the bad UUNet path back to me!
For example, when I was assigned IP 22.214.171.124, www.dslreports.com/tweaks was affected:

Ping stability 2289 152 - - - - - - - 154
* Wide variation in ping, 154 to 2289
* Unusually high average ping time

Out of 10 pings, only 3 made it. But, with the IP address I currently lease, no problem:

Ping stability 158 145 153 180 147 155 148 152 164 148
* Quick packet-loss tested ok
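The two ping results can be reduced to loss and latency figures with a short script. This is a minimal sketch that assumes a "-" in the "Ping stability" row marks a ping that never came back (consistent with only 3 of 10 making it) and that the numbers are round-trip times in milliseconds; `ping_summary` is a hypothetical helper, not part of the dslreports tool.

```python
from statistics import mean

# The two "Ping stability" rows quoted in the text.
BAD = "2289 152 - - - - - - - 154"
GOOD = "158 145 153 180 147 155 148 152 164 148"


def ping_summary(samples: str) -> dict:
    """Summarize one row of ping results; '-' is treated as a lost ping."""
    tokens = samples.split()
    replies = [int(t) for t in tokens if t != "-"]
    sent = len(tokens)
    return {
        "sent": sent,
        "received": len(replies),
        "loss_pct": 100.0 * (sent - len(replies)) / sent if sent else 0.0,
        "min_ms": min(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
        "avg_ms": mean(replies) if replies else None,
    }


if __name__ == "__main__":
    for label, row in (("bad IP", BAD), ("good IP", GOOD)):
        print(label, ping_summary(row))
```

For the bad address this works out to 70% loss with an average of 865 ms across the three replies; for the good one, no loss and an average of 155 ms.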