Stop HTML File Downloader ???
AvailableHost 08-17-2003, 10:12 PM Hi all,
Someone is using this program, HTML File Downloader, to download my website >>> eating my bandwidth :bawling: >>> Website suspended
Any solution to stop this program?
Thanks,
You can deny the user agent. Find out what software he's using and deny its user-agent string. That would stop a noob. If he's a pro, then I'd say there's not much you can do.
Eric Lim 08-17-2003, 10:32 PM Check the log file, find out the HTTP User-Agent, and determine whether the traffic is coming from a dedicated IP, a dynamic IP, or a whole subnet.
Have .htaccess block it and contact the IP owner. Also try to contact your hosting provider to have it completely blocked from the server.
Get the IP address and go to www.whois.ac to find out who registered the IP. Hopefully it's registered through ARIN; I know APNIC doesn't give a **** (from my experience).
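For the .htaccess part, a minimal sketch might look like the following. It assumes the Apache 1.3/2.0-style access directives used elsewhere in this thread, and 192.0.2.10 is only a placeholder address, not the real offender:

```apache
# Let everyone in except the one address found in the logs.
# 192.0.2.10 is a documentation placeholder; substitute the real IP.
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
```

With "Order Allow,Deny" the Deny lines are evaluated last, so the deny wins over "Allow from all" for that address.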
AvailableHost 08-17-2003, 10:43 PM # Hits User Agent
1 164671 95.61% HTML File Downloader
The IP is: 203.162.166.37
I banned the IP in cPanel. What more can I do?
Thanks
Acroplex 08-17-2003, 11:15 PM Originally posted by dotvietnam
# Hits User Agent
1 164671 95.61% HTML File Downloader
The IP is: 203.162.166.37
I banned the IP in cPanel. What more can I do?
Thanks
What exactly is he downloading? You can also password-protect areas of your website.
AvailableHost 08-17-2003, 11:57 PM He is targeting the / folder and the /forum folder.
Here is a "bad bots" list that I use from time to time. It goes in an .htaccess file and works great. To stop a particular bot, just add its user-agent string to the list in the same manner as the others.
SetEnvIfNoCase User-Agent "^$" bad_bot
SetEnvIfNoCase User-Agent "^Anarchie" bad_bot
SetEnvIfNoCase User-Agent "^BlackWidow" bad_bot
SetEnvIfNoCase User-Agent "^Bloodhound" bad_bot
SetEnvIfNoCase User-Agent "^Bullseye" bad_bot
SetEnvIfNoCase User-Agent "^Bumblebee" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^CIS TE/1.0" bad_bot
SetEnvIfNoCase User-Agent "^DA 4.0" bad_bot
SetEnvIfNoCase User-Agent "^DA 5.0" bad_bot
SetEnvIfNoCase User-Agent "^DA 5.3" bad_bot
SetEnvIfNoCase User-Agent "^DiscoPump" bad_bot
SetEnvIfNoCase User-Agent "^Drip" bad_bot
SetEnvIfNoCase User-Agent "^eCatch" bad_bot
SetEnvIfNoCase User-Agent "^e-collector" bad_bot
SetEnvIfNoCase User-Agent "^EirGrabber" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bad_bot
SetEnvIfNoCase User-Agent "^fastlwspider" bad_bot
SetEnvIfNoCase User-Agent "^FlashGet" bad_bot
SetEnvIfNoCase User-Agent "^GetRight" bad_bot
SetEnvIfNoCase User-Agent "^Getleft" bad_bot
SetEnvIfNoCase User-Agent "^Gets" bad_bot
SetEnvIfNoCase User-Agent "^GetWebPage" bad_bot
SetEnvIfNoCase User-Agent "^GetYou" bad_bot
SetEnvIfNoCase User-Agent "^Go!Zilla" bad_bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bad_bot
SetEnvIfNoCase User-Agent "^Grafula" bad_bot
SetEnvIfNoCase User-Agent "^HTTrack" bad_bot
SetEnvIfNoCase User-Agent "^HTTrack 3.0x" bad_bot
SetEnvIfNoCase User-Agent "^ia_archiver" bad_bot
SetEnvIfNoCase User-Agent "^IBrowse" bad_bot
SetEnvIfNoCase User-Agent "^ImageGrab" bad_bot
SetEnvIfNoCase User-Agent "^InterGET" bad_bot
SetEnvIfNoCase User-Agent "^Internet Ninja" bad_bot
SetEnvIfNoCase User-Agent "^Iria" bad_bot
SetEnvIfNoCase User-Agent "^Java1.1.8" bad_bot
SetEnvIfNoCase User-Agent "^Java1.3.0" bad_bot
SetEnvIfNoCase User-Agent "^JetCar" bad_bot
SetEnvIfNoCase User-Agent "^JustView" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "^Mass Down" bad_bot
SetEnvIfNoCase User-Agent "^MetaProducts Download Express" bad_bot
SetEnvIfNoCase User-Agent "^MIDown tool" bad_bot
SetEnvIfNoCase User-Agent "^minibot" bad_bot
SetEnvIfNoCase User-Agent "^Mister PiX" bad_bot
SetEnvIfNoCase User-Agent "^MemoWeb 1.75" bad_bot
SetEnvIfNoCase User-Agent "^obot" bad_bot
SetEnvIfNoCase User-Agent "^MSIECrawler" bad_bot
SetEnvIfNoCase User-Agent "^NearSite" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^NetMechanic" bad_bot
SetEnvIfNoCase User-Agent "^NetSpider" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Offline" bad_bot
SetEnvIfNoCase User-Agent "^PageGrabber" bad_bot
SetEnvIfNoCase User-Agent "^Papa Foto" bad_bot
SetEnvIfNoCase User-Agent "^Pockey" bad_bot
SetEnvIfNoCase User-Agent "^Prozilla" bad_bot
SetEnvIfNoCase User-Agent "^RealDownload" bad_bot
SetEnvIfNoCase User-Agent "^ReGet" bad_bot
SetEnvIfNoCase User-Agent "^SmartDownload" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^Slurp" bad_bot
SetEnvIfNoCase User-Agent "^SpaceBison" bad_bot
SetEnvIfNoCase User-Agent "^Star Downloader" bad_bot
SetEnvIfNoCase User-Agent "^SuperBot" bad_bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bad_bot
SetEnvIfNoCase User-Agent "^SurfWalker" bad_bot
SetEnvIfNoCase User-Agent "^tAkeOut" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^TurnitinBot" bad_bot
SetEnvIfNoCase User-Agent "^vobsub" bad_bot
SetEnvIfNoCase User-Agent "^WE" bad_bot
SetEnvIfNoCase User-Agent "^WebCapture 2.0" bad_bot
SetEnvIfNoCase User-Agent "^Web Downloader" bad_bot
SetEnvIfNoCase User-Agent "^Web Image" bad_bot
SetEnvIfNoCase User-Agent "^Web Sucker" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebCapture" bad_bot
SetEnvIfNoCase User-Agent "^w3mir" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier v3.0" bad_bot
SetEnvIfNoCase User-Agent "^Webdupe" bad_bot
SetEnvIfNoCase User-Agent "^WebFetch" bad_bot
SetEnvIfNoCase User-Agent "^webfetcher" bad_bot
SetEnvIfNoCase User-Agent "^WebFountain" bad_bot
SetEnvIfNoCase User-Agent "^WebHook" bad_bot
SetEnvIfNoCase User-Agent "^WebMiner" bad_bot
SetEnvIfNoCase User-Agent "^WebMirror" bad_bot
SetEnvIfNoCase User-Agent "^WebReaper" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^Website eXtractor" bad_bot
SetEnvIfNoCase User-Agent "^Webster" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebWhacker" bad_bot
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
<Files ~ "\.(html?|pdf|mp3|zip|rar|exe|gif|jpe?g|png)$">
order allow,deny
allow from all
Deny from env=bad_bot
</Files>
-Bob
wKkaY 08-18-2003, 12:12 PM (OT) IMHO GetRight is more considerate than many other download managers. It confirms with you when you do something anti-social, e.g. open too many connections to a server, download too many files at once, etc., so I won't consider it 'bad' :)
Amish_Geek 08-18-2003, 04:04 PM What? you have Wget listed as a bad bot? :eek:
Wget is your friend! :D
AvailableHost 08-18-2003, 08:57 PM Thanks so much
AvailableHost 08-19-2003, 06:10 AM After following the guidance above, my website kept being downloaded by :angry: "HTML File Downloader" :angry:
I already added this line: SetEnvIfNoCase User-Agent "^HTML File Downloader" bad_bot to TMX's list.
Help me please!!!!!!!!! :bawling:
Originally posted by amish_geek
What? you have Wget listed as a bad bot? :eek:
Wget is your friend! :D
With a list like this, you need to make adjustments as necessary to suit your needs. The particular site I grabbed this copy from disallows anything that can be used to automatically download site content.
OTOH, I maintain a site that provides utilities for Ensim server admins, on which blocking wget would be a Bad Thing.
-Bob
sprintserve 08-19-2003, 08:33 AM Some agents can claim to be a different user agent. The Internet operates on a trust model, so there's not much you can really do about that. Try blocking his IP as suggested above.
Originally posted by dotvietnam
After following the guidance above, my website kept being downloaded by :angry: "HTML File Downloader" :angry:
I already added this line: SetEnvIfNoCase User-Agent "^HTML File Downloader" bad_bot to TMX's list.
Help me please!!!!!!!!! :bawling:
Please post a line from your logs showing the complete user-agent string.
-B
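One thing worth checking in the meantime: every pattern in the list starts with ^, so it only matches when the User-Agent header begins with that text. If the logged string has anything in front of the name, the rule never fires. A hedged sketch that drops the anchor (the actual header value has not been posted in this thread, so this is a guess at the cause, not a confirmed fix):

```apache
# Match "HTML File Downloader" anywhere in the User-Agent header,
# not only at the start (no leading ^).
SetEnvIfNoCase User-Agent "HTML File Downloader" bad_bot
```

Even when the variable is set, the deny only applies to the extensions named in the <Files> line, so requests for other file types would still get through.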
eicklerr 08-19-2003, 10:45 PM You can put this in /etc/hosts.deny:
ALL: 203.162.166.37
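That will prevent them from accessing any service on your server that honors TCP wrappers. Apache itself usually does not check hosts.deny, so keep the web-server block as well.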
Or, try putting this in your <Directory /> container instead of inside the <Files> one:
order allow,deny
allow from all
Deny from env=bad_bot
That should prevent all access to your webserver from those browsers, not just to the specified file types.
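In an .htaccess file, where <Directory> sections are not allowed, the same idea would be to leave the directives at the top level with no <Files> wrapper. A sketch, assuming mod_setenvif is loaded and the host permits these directives through AllowOverride:

```apache
# Flag the client by User-Agent, then deny it for every request
# in this directory and below, whatever the file type.
SetEnvIfNoCase User-Agent "^HTML File Downloader" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Placed in the site's top-level .htaccess, this would cover both / and /forum, since .htaccess settings apply to subdirectories too.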
AvailableHost 08-19-2003, 11:11 PM Hello,
How do I ban an IP range like 203.162.164.xxx in .htaccess?
I did this: deny from 203.162.164.*
Thanks,
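As far as I know, "Deny from" does not understand a trailing * wildcard. The documented ways to cover a whole block are a partial address or a network/CIDR pair. A sketch, assuming the range meant is the full 203.162.164.0 to 203.162.164.255 block:

```apache
Order Allow,Deny
Allow from all
# Partial address: matches every host in 203.162.164.x
Deny from 203.162.164
# Equivalent CIDR form (use one or the other):
# Deny from 203.162.164.0/24
```

Note that the address reported earlier in the thread, 203.162.166.37, sits in 203.162.166.x, so it is not covered by this range and would need its own Deny line.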