Thanks to all for the suggestions.

While the winmtr traces show a lot of packet loss (which I assume is ICMP being deprioritised or blocked), another tool suggested was mturoute. This tool showed that MTU size was not the problem. What it did show is that the end server stopped responding when I broke the connection, while all other hops were fine.

That doesn’t explain why the issue is replicable with other, completely unrelated sites. I have checked blacklist sites and our IP appears on none of them, so the mystery is … still a mystery.
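
For anyone who would rather script the blacklist check than use the web lookup pages, here is a minimal Python sketch. It assumes a conventional DNS-based blacklist, where you reverse the IPv4 octets, append the list's zone, and treat any A record in the answer as "listed". The zone `dnsbl.example.org` is a placeholder rather than a real list, and the function names are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the query name resolves, False if resolution fails.

    Note: any resolver failure (NXDOMAIN, timeout, no network) counts as
    "not listed" here, so treat a False result with some caution.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Placeholder zone (assumption): substitute the zone of a list you actually use.
    print(dnsbl_query_name("210.48.17.17", "dnsbl.example.org"))
```

Only `dnsbl_query_name` is pure string handling; `is_listed` needs a working resolver and a real zone to say anything useful.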

 

Anyway, I added an outbound NAT rule to our FW so that HTTP traffic is translated to 210.48.17.18, and that has gotten around the problem.

 

Cheers

Julian

 

From: nznog-bounces@list.waikato.ac.nz [mailto:nznog-bounces@list.waikato.ac.nz] On Behalf Of Julian Maxwell
Sent: Friday, 16 December 2011 4:54 p.m.
To: nznog@list.waikato.ac.nz
Subject: [nznog] Something out there hates our IP address

 

Hi Guys,

 

Bit of a lurker – not a poster. Hai!

 

All of this week, our office network here has been experiencing an odd issue of sorts. It goes sort of like this:

 

The NAT’d IP address for our office network is 210.48.17.17. At random times, international destinations stop working, usually for a period of a few minutes.

For example, when I first came across the problem I went to download winmtr to try and help diagnose the issue. I then discovered that winmtr.net was one of the sites affected by this problem, so it became my guinea pig site.

So using winmtr.net as an example site, the following happens:

 

Pings to winmtr.net work fine.

The website works fine.

I go to download the winmtr file… however the download gets interrupted (TCP RST) at between 600kb and 1100kb. This is guaranteed to happen and I have replicated it time and time again.

When the download gets interrupted, the pings stop working and I can’t access the website anymore.

After some minutes, I can access the website again. However, the pings remain unanswered until I stop pinging; a few minutes after that, I can resume them. It is as if they are blocked indefinitely until I stop the requests, and then allowed again after a period. Some sort of flood control?
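
To put a number on where the download dies on each attempt, something like the following Python sketch could be used. It is only a rough probe: it counts the bytes received before the connection fails and reports the exception. `DOWNLOAD_URL` is a placeholder for whatever file is being fetched, and the function names are invented for this example.

```python
import http.client
import urllib.request


def read_chunks(url, chunk_size=16384, timeout=30):
    """Yield the response body of `url` in chunks until it ends."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                return
            yield chunk


def bytes_before_failure(chunks):
    """Consume an iterable of byte chunks.

    Returns (bytes_read, error), where error is None if the stream ended
    cleanly, or the exception that cut it short.
    """
    total = 0
    try:
        for chunk in chunks:
            total += len(chunk)
    except (OSError, http.client.HTTPException) as exc:
        # ConnectionResetError (a TCP RST) and timeouts are OSError subclasses;
        # a body that ends short of Content-Length raises an HTTPException.
        return total, exc
    return total, None


if __name__ == "__main__":
    DOWNLOAD_URL = "http://example.invalid/file.zip"  # placeholder (assumption)
    count, error = bytes_before_failure(read_chunks(DOWNLOAD_URL))
    print(count, repr(error))
```

Running it a few times from the affected address and from a working one should show whether the cut-off point stays in the same range.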

Here’s a screenshot of the TCP dump showing this behaviour. Note that the ICMP echoes have no corresponding reply packets:

 

 

I have ruled out our office firewall as the cause, as the problem also happens when I plug in a laptop on the same address.

If I change the firewall WAN IP to anything but 17.17 (e.g. 17.18), everything works fine!

 

Any other device on this subnet has no issues – it is ONLY 17.17.

 

We do have a NetEnforcer shaper in the path. However, I have ruled that out, as we bypassed it for a while and the issue still existed.

We don’t want to have to change the WAN IP to something else, as there are a whole bunch of inbound connections going to that IP. But at this stage I can’t think of anything else we can do.

I have contacted our upstream peer and they say there is no DoS/flood control or anything of the sort on the circuit.

It’s not just winmtr, but about 70% of international sites we try to access.

 

I have run winmtr (I downloaded it via a VPN, in case you’re wondering) and, interestingly, the path doesn’t change much when I break the connection to winmtr.net. However, the packet loss is about 100% after hop 7 anyway, so it’s not very revealing.

 

So, what say you NZNOGgers that are waiting for 5pm to tick over and drink those Christmas beers?

 

 

Julian Maxwell