I have received 20,000 hits in the past hour from one particular IP address. These hits include secure areas of my site that should not be available to regular viewers or search engines. The IP address has a web site, which says the owner is a Linux programmer.
What should I do about this guy? I am thinking of sending him an email and banning the IP address in the .htaccess file.
What do you think he is doing? Is it a virus on his server or is he threatening my site?
vSector
10-08-2001, 04:22 PM
Might be a good idea to check your logs, if you can still connect to your server.
Look at the agent type.
It may be a bot or spider which you can block by using your robots.txt.
I checked my logs. The hits are coming from someone using "Wget 1.7". I am guessing that Wget is the program described at http://www.gnu.org/software/wget/wget.html. It is a utility for downloading files from the web. It appears to take a spider-type approach and follow every possible link on each page. That produces an enormous number of hits on the message board, since each post has several options (reply, alert, edit, ...).
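For anyone wanting to run the same kind of log check, here is a minimal Python sketch that tallies hits per IP address and user agent. It assumes the server writes Apache's combined log format; the sample lines, the IP addresses, the paths, and the name count_hits are all made up for illustration.

```python
import re
from collections import Counter

# Combined log format:
# IP identity user [time] "request" status size "referer" "user agent"
# Note: this simple pattern does not handle escaped quotes inside fields.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_hits(lines):
    """Return a Counter keyed by (ip, user agent); unparseable lines are skipped."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            hits[(match.group("ip"), match.group("agent"))] += 1
    return hits

# Hypothetical sample lines (documentation-range IP addresses, invented paths).
sample = [
    '192.0.2.10 - - [08/Oct/2001:16:00:01 -0400] "GET /forum/reply.php?id=1 HTTP/1.0" 200 512 "-" "Wget/1.7"',
    '192.0.2.10 - - [08/Oct/2001:16:00:02 -0400] "GET /forum/edit.php?id=1 HTTP/1.0" 403 128 "-" "Wget/1.7"',
    '198.51.100.7 - - [08/Oct/2001:16:00:03 -0400] "GET /index.html HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
]

# Print the busiest (IP, agent) pairs first.
for (ip, agent), n in count_hits(sample).most_common():
    print(n, ip, agent)
```

To use it on a real log, pass an open file object instead of the sample list. A single IP and agent pair far ahead of the rest is the pattern described in this thread.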
vSector
10-08-2001, 05:50 PM
Well, if you want to block it, create a robots.txt file containing the following and put it in your web root directory:
User-agent: Wget
Disallow: /
Hope that helps.
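Whether a User-agent line takes effect depends on how the client compares it with its own name, so it can be worth testing a rule before relying on it. Below is a rough sketch using Python's standard urllib.robotparser module. This is Python's matcher, not Wget's own, so it only approximates what Wget will do. The user-agent string "Wget/1.7", the example URL, and the helper name is_blocked are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(robots_txt, user_agent, url):
    """Return True if robots_txt disallows url for user_agent, per Python's parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

# Two candidate rules: one naming only the product, one including the version.
rule_product_only = "User-agent: Wget\nDisallow: /\n"
rule_with_version = "User-agent: Wget 1.7\nDisallow: /\n"

# Assumed values: the header the client sends and a made-up forum URL.
agent = "Wget/1.7"
url = "http://www.example.com/forum/index.php"

print("product only:", is_blocked(rule_product_only, agent, url))
print("with version:", is_blocked(rule_with_version, agent, url))
```

Under Python's parser the product-only rule matches and the versioned one does not, which suggests that naming the bare product token is the safer choice. Wget's own matching may differ in detail, so treat this as a sanity check only.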
I added a robots.txt file that excludes Wget 1.7 and restricted his IP address in the .htaccess file, but the hits keep coming (all hits are getting 403 errors). I was getting concerned because it found its way into the forum admin area and was trying to delete threads. I suppose that it will be unable to delete threads as long as it is getting forbidden errors.
multipleimage
10-08-2001, 06:23 PM
I've never seen wget do that in a log before, but it does understand robots.txt, so just disallow it there.
joe52
10-08-2001, 06:46 PM
Even if the agent respects robots.txt, it's not going to check for changes to that file every time it gets a page. It might take it a while to check for a robots.txt file again.
-joe
Chicken
10-08-2001, 08:37 PM
My suggestion...
.htaccess with:
ErrorDocument 403 http://www.go.ryhmeswithluck.yourself
It doesn't have to be .htaccess, of course; do it however you like. The point is to redirect the 403s.
SoftWareRevue
10-08-2001, 08:59 PM
Originally posted by Chicken
My suggestion...
:eek2: :D