
ARP overflow? Help!
magnafix 02-16-2002, 01:03 AM Been working on this a few hours now with no useful results. RedHat 7.2, custom 2.4.14 kernel. Been running trouble free as our primary DNS server for months. Then tonight,
Feb 15 15:13:27 outlaw kernel: Neighbour table overflow.
Feb 15 15:13:32 outlaw kernel: NET: 238 messages suppressed.
Feb 15 15:13:32 outlaw kernel: Neighbour table overflow.
Feb 15 15:13:37 outlaw kernel: NET: 244 messages suppressed.
Feb 15 15:13:37 outlaw kernel: Neighbour table overflow.
A few searches pointed us at the ARP cache. So we raised the ceiling on the ARP cache. 'arp -n' shows hundreds and hundreds of remote machines using the MAC address of our border router. No other machine on the network has remote addresses like this.
'tcpdump arp' shows lots of arp requests (I think) from our nameserver (outlaw) out to nameservers on the net:
21:49:33.079962 arp who-has ns01b.nameservers.net tell outlaw.modwest.com
21:49:33.080944 arp reply ns01b.nameservers.net is-at 0:e0:1e:b4:62:70
21:49:33.210835 arp who-has dns-rl02.proxy.aol.com tell ns1.missoulaweb.com
21:49:33.211809 arp reply dns-rl02.proxy.aol.com is-at 0:e0:1e:b4:62:70
21:49:33.636394 arp who-has resone.univ-rennes1.fr tell outlaw.modwest.com
21:49:33.637334 arp reply resone.univ-rennes1.fr is-at 0:e0:1e:b4:62:70
These same remote addresses are also in our ARP cache now.
Rebooting the box doesn't help.
Possibly related -- had some trouble with named restarting as well.
Thanks for any pointers. Everything seems to be working, but this surely isn't 'normal'.
CagedTornado 02-16-2002, 02:42 AM You're not running a Cisco router are you? We had a VERY strange problem at my work about 2 weeks ago, where our PIX firewall was literally poisoning the ARP cache on some of our boxes by responding to ARP requests incorrectly. Just a thought...
Dan
magnafix 02-16-2002, 04:38 AM Gateway router is a Cisco 3640. How is that significant?
magnafix 02-16-2002, 09:48 PM Can someone else running Linux check the ARP cache ('arp -n') on their primary nameserver and see if it has many remote IPs with the MAC address of the router, just in case this is normal nameserver behavior?
Thanks!
allan 02-16-2002, 10:58 PM Originally posted by magnafix
Can someone else running Linux check the ARP cache ('arp -n') on their primary nameserver and see if it has many remote IPs with the MAC address of the router, just in case this is normal nameserver behavior?
Authoritative name server:
[root@ns1 root]# arp -n
Address HWtype HWaddress Flags Mask Iface
x.x.x.1 ether 00:90:B1:8E:9C:71 C eth0
Caching-only name server, on local network:
[root@test root]# arp -n
Address HWtype HWaddress Flags Mask Iface
192.168.0.2 ether 00:01:02:23:34:80 C eth0
192.168.0.1 ether 00:E0:29:7C:D0:1B C eth0
Hope this helps.
magnafix 02-16-2002, 11:07 PM Thanks, yes.
So the behavior we're seeing is not normal, it appears. Here's a *small* portion of 'arp -n' on our primary nameserver:
209.226.175.237 ether 00:E0:1E:B4:62:70 C eth0
192.5.6.32 ether 00:E0:1E:B4:62:70 C eth0
212.29.65.250 ether 00:E0:1E:B4:62:70 C eth0
63.203.35.55 ether 00:E0:1E:B4:62:70 C eth0
205.188.152.9 ether 00:E0:1E:B4:62:70 C eth0
216.167.107.244 ether 00:E0:1E:B4:62:70 C eth0
209.128.93.236 ether 00:E0:1E:B4:62:70 C eth0
217.172.162.201 ether 00:E0:1E:B4:62:70 C eth0
205.188.152.8 ether 00:E0:1E:B4:62:70 C eth0
64.152.75.114 ether 00:E0:1E:B4:62:70 C eth0
24.217.0.4 ether 00:E0:1E:B4:62:70 C eth0
24.158.0.10 ether 00:E0:1E:B4:62:70 C eth0
207.217.77.12 ether 00:E0:1E:B4:62:70 C eth0
61.220.163.234 ether 00:E0:1E:B4:62:70 C eth0
64.34.86.213 ether 00:E0:1E:B4:62:70 C eth0
209.128.93.237 ether 00:E0:1E:B4:62:70 C eth0
66.62.233.230 ether 00:E0:1E:B4:62:70 C eth0
As far as I can tell, these are all nameservers out on the net. The hardware address is that of our gateway router. :erm:
Anyone else have insight on this?
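A quick way to confirm the pattern described above (many remote IPs sharing the router's MAC) is to tally the cache by hardware address. This is a minimal sketch, not something from the thread: `count_by_mac` is a hypothetical helper name, and it assumes the `arp -n` column layout shown in the posts above (IP, HWtype, HWaddress, ...).

```shell
# Count ARP cache entries per hardware address, busiest MAC first.
# Reads `arp -n` style output on stdin. Assumes column 2 is the
# hardware type ("ether") and column 3 is the MAC, as in the
# listings above. Header and incomplete entries are skipped.
count_by_mac() {
    awk '$2 == "ether" { count[toupper($3)]++ }
         END { for (mac in count) print count[mac], mac }' | sort -rn
}

# Typical use on the affected host:
#   arp -n | count_by_mac | head
```

If one MAC (here the gateway's) accounts for nearly every entry, the host is resolving remote addresses through ARP instead of handing the packets to a gateway.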
allan 02-16-2002, 11:14 PM Originally posted by magnafix
Anyone else have insight on this?
Have you tried clearing the ARP cache on the router? If this just started happening, it may just be a fluke.
As to why it is occurring, is your name server used as a caching name server as well as an authoritative name server?
magnafix 02-17-2002, 03:21 AM We cleared the arp cache in the router with no effect. Tonight (30 hours into this now), we overflowed our new arp count ceiling, with 1025 entries. This has a couple of effects. First, named stops working (while still running):
Feb 16 23:40:39.106 client: client 216.15.164.133#58691: error sending response: not enough free resources
Feb 16 23:41:00.678 client: client 203.109.252.9#1024: error sending response: not enough free resources
We also lost the ability to telnet to the router -- when we tried, we got 'no buffer space available' or something similar.
Then we shut down named. Arp cache a minute or so later had dropped down to 100 entries.
So, while named on the secondary nameserver handled requests, we watched the arp cache on that box. It remained at about a dozen (# of machines on this local network).
Restarting named on the primary nameserver starts the arp cache climbing again. :angry:
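The growth described above can be tracked by counting resolved entries and comparing the count to the cache ceiling. The following is a rough sketch under assumptions: the thread does not say which setting was raised, so the `gc_thresh3` path in the usage comment is only the usual Linux location for the hard limit, and both function names are made up for illustration.

```shell
# Print the number of resolved entries in `arp -n` style output
# read from stdin (lines whose second column is "ether").
arp_entry_count() {
    awk '$2 == "ether" { n++ } END { print n + 0 }'
}

# Compare an entry count against a limit; both are passed as arguments.
check_against_limit() {
    count=$1
    limit=$2
    if [ "$count" -ge "$limit" ]; then
        echo "over: $count entries, limit $limit"
    else
        echo "ok: $count entries, limit $limit"
    fi
}

# Typical use (the procfs path is an assumption, see above):
#   limit=$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh3)
#   check_against_limit "$(arp -n | arp_entry_count)" "$limit"
```

Running this every minute or so while named is up would show whether the count keeps climbing toward the limit, as reported in the post above.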
allan 02-17-2002, 11:33 AM I am not aware of any BIND bugs that cause the problem you are describing, but I have a couple of other questions:
1. Is this a dedicated machine (ie does it do only BIND stuff)?
2. Is it behind a firewall, or are you running IP Tables on the box?
3. Does it do authoritative DNS only, or is it authoritative/caching?
A couple of ideas, and hopefully, someone else will have more ideas than this:
1. It may be some sort of BIND exploit, that generates bad ARP entries (check with the BIND mailing list at isc.org to see if that is possible).
2. If this is a BIND-only device, it is possible that a hardware error is causing this (not likely, but possible), and swapping out the NIC could fix it.
magnafix 02-17-2002, 11:49 AM It's pretty much a dedicated machine -- it also runs our MySQL slave, but that's it.
No firewall in front of it, we did have some port-forwarding on it to allow some really long-time customers to continue checking POP3 mail on its IP (forwarded to dedicated mail server). I believe our sysadmin removed that first as a test.
It's our authoritative nameserver. We do also run 'nscd' on all our boxes to make LDAP authentication lookups go quicker.
I'll check out the BIND list. Arp cache is at 400 this morning. Sysadmin says he's going to do a kernel upgrade today to see if that helps.
Thanks for the help.
magnafix 02-17-2002, 09:15 PM Figured it out.
Default gateway was set wrong on the nameserver, so the box basically couldn't tell which addresses were local and which weren't.
Thanks to all who helped.
Gah.
Ehm ... sorry to open this thread again, but I have a similar problem. John, could you share what you did exactly?
Many thanks,
Reyner
Matt Lightner 07-21-2002, 03:47 AM Originally posted by rey
Ehm ... sorry to open this thread again, but I have a similar problem. John, could you share what you did exactly?
Reyner,
You will want to check to make sure that your server's default gateway is set correctly. If not, then your server may treat remote IP addresses as if they were on the local network and try to cache an ARP entry for every one of them. The router answers those ARP requests, so the MAC address for all of those IPs will be that of the router directly above the server in question (because your server is not directly connected to those IP addresses; it must send packets for those IP addresses to the router). On Linux, you can do something like the following to set your server's default route:
/sbin/route add default gw your_gateway_ip
You will need to replace your_gateway_ip with your actual gateway IP address (you should get this from your host, or whoever assigns your IP address space). Or, if you are unfamiliar with routing and networking, you might want to just have your host's support team fix this for you so that nothing bad happens. In other words... if you've never used the "route" command before, I wouldn't suggest starting now. :)
Hope that helps.
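Before changing anything, it may help to look at what the default route currently is. The sketch below reads `route -n` output and reports the default route's gateway. It is an illustration only: `check_default_route` is a hypothetical name, and the "no gateway" case is just one way a default route can be wrong (the thread does not say exactly what was misconfigured). It assumes the standard `route -n` columns: Destination, Gateway, Genmask, Flags, Metric, Ref, Use, Iface.

```shell
# Inspect `route -n` style output on stdin and describe the default route.
# A default route is a line with Destination 0.0.0.0 and Genmask 0.0.0.0.
# If its Gateway is also 0.0.0.0, the host treats every destination as
# directly reachable and will ARP for remote addresses.
check_default_route() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" {
             found = 1
             if ($2 == "0.0.0.0") {
                 print "default route on " $8 " has no gateway"
             } else {
                 print "default route via " $2 " on " $8
             }
         }
         END { if (!found) print "no default route" }'
}

# Typical use:
#   /sbin/route -n | check_default_route
```

The reported gateway should be the router's address on the local subnet, as supplied by the host or whoever assigns the address space.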
Matt,
Thank you for the explanation and the tip. Your help is greatly appreciated :)
Reyner
magnafix 07-21-2002, 02:07 PM Matt's explanation of both the problem we were having and solution is accurate.