Web Hosting Talk







Stop HTML File Downloader ???


AvailableHost
08-17-2003, 10:12 PM
Hi all,

Someone is using a program called HTML File Downloader to download my whole website. It ate all my bandwidth :bawling: and now the website is suspended.

Any solution to stop this program?

Thanks,

TTL
08-17-2003, 10:26 PM
You can deny by user agent. Find out what software he's using and deny that user agent. That would stop a noob. If he were a pro, then I'd say there's not much you can do.

Eric Lim
08-17-2003, 10:32 PM
Check the log file, find the HTTP User-Agent, and determine whether the traffic is coming from a dedicated IP, a dynamic IP, or a whole subnet.

Have .htaccess block it and contact the IP owner. Also try to contact your hosting provider to have it completely blocked from the server.

Once you have the IP address, go to www.whois.ac to find out who registered it. Hopefully it's registered through ARIN; I know APNIC doesn't give a **** (from my experience).
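
A minimal sketch of the .htaccess block described above, assuming the Order/Allow/Deny directives of mod_access that are used later in this thread. The address 192.0.2.10 is a placeholder from the documentation range, not an address from anyone's logs; substitute the one found in the log file.

```apache
# Let everyone in except the one address taken from the access log.
# 192.0.2.10 is a placeholder, not a real offender.
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
```

This only takes effect if the host permits these directives in .htaccess (AllowOverride Limit), which is an assumption about the server setup.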

AvailableHost
08-17-2003, 10:43 PM
# Hits User Agent

1 164671 95.61% HTML File Downloader

The IP is: 203.162.166.37

I banned the IP in cPanel. What more can I do?

Thanks

Acroplex
08-17-2003, 11:15 PM
Originally posted by dotvietnam
# Hits User Agent

1 164671 95.61% HTML File Downloader

The IP is: 203.162.166.37

I banned the IP in cPanel. What more can I do?

Thanks

What exactly is he downloading? You can also password-protect areas of your website.

AvailableHost
08-17-2003, 11:57 PM
He is targeting the / folder and the /forum folder.

TMX
08-18-2003, 11:27 AM
Here is a "bad bots" list that I use from time to time. It goes in an .htaccess file and works great. To stop a particular bot, just add its User-Agent string to the list in the same manner as the others.



SetEnvIfNoCase User-Agent "^$" bad_bot
SetEnvIfNoCase User-Agent "^Anarchie" bad_bot
SetEnvIfNoCase User-Agent "^BlackWidow" bad_bot
SetEnvIfNoCase User-Agent "^Bloodhound" bad_bot
SetEnvIfNoCase User-Agent "^Bullseye" bad_bot
SetEnvIfNoCase User-Agent "^Bumblebee" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^CIS TE/1.0" bad_bot
SetEnvIfNoCase User-Agent "^DA 4.0" bad_bot
SetEnvIfNoCase User-Agent "^DA 5.0" bad_bot
SetEnvIfNoCase User-Agent "^DA 5.3" bad_bot
SetEnvIfNoCase User-Agent "^DiscoPump" bad_bot
SetEnvIfNoCase User-Agent "^Drip" bad_bot
SetEnvIfNoCase User-Agent "^eCatch" bad_bot
SetEnvIfNoCase User-Agent "^e-collector" bad_bot
SetEnvIfNoCase User-Agent "^EirGrabber" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bad_bot
SetEnvIfNoCase User-Agent "^fastlwspider" bad_bot
SetEnvIfNoCase User-Agent "^FlashGet" bad_bot
SetEnvIfNoCase User-Agent "^GetRight" bad_bot
SetEnvIfNoCase User-Agent "^Getleft" bad_bot
SetEnvIfNoCase User-Agent "^Gets" bad_bot
SetEnvIfNoCase User-Agent "^GetWebPage" bad_bot
SetEnvIfNoCase User-Agent "^GetYou" bad_bot
SetEnvIfNoCase User-Agent "^Go!Zilla" bad_bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bad_bot
SetEnvIfNoCase User-Agent "^Grafula" bad_bot
SetEnvIfNoCase User-Agent "^HTTrack" bad_bot
SetEnvIfNoCase User-Agent "^HTTrack 3.0x" bad_bot
SetEnvIfNoCase User-Agent "^ia_archiver" bad_bot
SetEnvIfNoCase User-Agent "^IBrowse" bad_bot
SetEnvIfNoCase User-Agent "^ImageGrab" bad_bot
SetEnvIfNoCase User-Agent "^InterGET" bad_bot
SetEnvIfNoCase User-Agent "^Internet Ninja" bad_bot
SetEnvIfNoCase User-Agent "^Iria" bad_bot
SetEnvIfNoCase User-Agent "^Java1.1.8" bad_bot
SetEnvIfNoCase User-Agent "^Java1.3.0" bad_bot
SetEnvIfNoCase User-Agent "^JetCar" bad_bot
SetEnvIfNoCase User-Agent "^JustView" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "^Mass Down" bad_bot
SetEnvIfNoCase User-Agent "^MetaProducts Download Express" bad_bot
SetEnvIfNoCase User-Agent "^MIDown tool" bad_bot
SetEnvIfNoCase User-Agent "^minibot" bad_bot
SetEnvIfNoCase User-Agent "^Mister PiX" bad_bot
SetEnvIfNoCase User-Agent "^MemoWeb 1.75" bad_bot
SetEnvIfNoCase User-Agent "^obot" bad_bot
SetEnvIfNoCase User-Agent "^MSIECrawler" bad_bot
SetEnvIfNoCase User-Agent "^NearSite" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^NetMechanic" bad_bot
SetEnvIfNoCase User-Agent "^NetSpider" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Offline" bad_bot
SetEnvIfNoCase User-Agent "^PageGrabber" bad_bot
SetEnvIfNoCase User-Agent "^Papa Foto" bad_bot
SetEnvIfNoCase User-Agent "^Pockey" bad_bot
SetEnvIfNoCase User-Agent "^Prozilla" bad_bot
SetEnvIfNoCase User-Agent "^RealDownload" bad_bot
SetEnvIfNoCase User-Agent "^ReGet" bad_bot
SetEnvIfNoCase User-Agent "^SmartDownload" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^Slurp" bad_bot
SetEnvIfNoCase User-Agent "^SpaceBison" bad_bot
SetEnvIfNoCase User-Agent "^Star Downloader" bad_bot
SetEnvIfNoCase User-Agent "^SuperBot" bad_bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bad_bot
SetEnvIfNoCase User-Agent "^SurfWalker" bad_bot
SetEnvIfNoCase User-Agent "^tAkeOut" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^TurnitinBot" bad_bot
SetEnvIfNoCase User-Agent "^vobsub" bad_bot
SetEnvIfNoCase User-Agent "^WE" bad_bot
SetEnvIfNoCase User-Agent "^WebCapture 2.0" bad_bot
SetEnvIfNoCase User-Agent "^Web Downloader" bad_bot
SetEnvIfNoCase User-Agent "^Web Image" bad_bot
SetEnvIfNoCase User-Agent "^Web Sucker" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebCapture" bad_bot
SetEnvIfNoCase User-Agent "^w3mir" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier v3.0" bad_bot
SetEnvIfNoCase User-Agent "^Webdupe" bad_bot
SetEnvIfNoCase User-Agent "^WebFetch" bad_bot
SetEnvIfNoCase User-Agent "^webfetcher" bad_bot
SetEnvIfNoCase User-Agent "^WebFountain" bad_bot
SetEnvIfNoCase User-Agent "^WebHook" bad_bot
SetEnvIfNoCase User-Agent "^WebMiner" bad_bot
SetEnvIfNoCase User-Agent "^WebMirror" bad_bot
SetEnvIfNoCase User-Agent "^WebReaper" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^Website eXtractor" bad_bot
SetEnvIfNoCase User-Agent "^Webster" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebWhacker" bad_bot
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
# Apply the deny rule to these file types; html? covers both .htm and .html
<Files ~ "\.(html?|pdf|mp3|zip|rar|exe|gif|jpe?g|png)$">
order allow,deny
allow from all
Deny from env=bad_bot
</Files>


-Bob

wKkaY
08-18-2003, 12:12 PM
(OT) IMHO GetRight is more considerate than many other download managers. It confirms with you when you do something anti-social, e.g. open too many connections to a server, download too many files at once, etc., so I wouldn't consider it 'bad' :)

Amish_Geek
08-18-2003, 04:04 PM
What? You have Wget listed as a bad bot? :eek:

Wget is your friend! :D

AvailableHost
08-18-2003, 08:57 PM
Thanks so much

AvailableHost
08-19-2003, 06:10 AM
After following the guidance above, my website is still being downloaded by :angry: "HTML File Downloader" :angry:

I already added this line to TMX's list: SetEnvIfNoCase User-Agent "^HTML File Downloader" bad_bot

Help me please!!!!!!!!! :bawling:
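
One possible cause, offered as a guess since the actual log line has not been posted: the ^ anchors the pattern to the very start of the User-Agent header, so the rule never fires if the header begins with anything else (a hypothetical example would be a "Mozilla/4.0 (compatible; ..." prefix). A sketch without the anchor, which matches the name anywhere in the header:

```apache
# No leading ^, so the name may appear anywhere in the User-Agent header.
# Assumes the program really sends this name; check the raw log line.
SetEnvIfNoCase User-Agent "HTML File Downloader" bad_bot
```

The deny rule in the list also sits inside a <Files> block that covers only certain extensions, so requests for other file types (such as .php pages, if the forum uses them) are not affected by bad_bot at all.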

TMX
08-19-2003, 08:33 AM
Originally posted by amish_geek
What? You have Wget listed as a bad bot? :eek:

Wget is your friend! :D

With a list like this, you need to make adjustments as necessary to suit your needs. The particular site I grabbed this copy from disallows anything that can be used to automatically download site content.

OTOH, I maintain a site that provides utilities for Ensim server admins, on which blocking wget would be a Bad Thing.

-Bob

sprintserve
08-19-2003, 08:33 AM
Some of these programs can claim to be a different user agent. The Internet operates on a trust model: the server has no way to verify the User-Agent header a client sends, so there isn't much you can do about that. Try blocking his IP as suggested above.

TMX
08-19-2003, 08:34 AM
Originally posted by dotvietnam
After following the guidance above, my website is still being downloaded by :angry: "HTML File Downloader" :angry:

I already added this line to TMX's list: SetEnvIfNoCase User-Agent "^HTML File Downloader" bad_bot

Help me please!!!!!!!!! :bawling:

Please post a line from your logs showing the complete User-Agent string.

-B

eicklerr
08-19-2003, 10:45 PM
You can put this in /etc/hosts.deny:

ALL: 203.162.166.37

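That will prevent them from accessing every service on your server that honors TCP wrappers (services started through tcpd, and usually sshd). Apache normally does not consult hosts.deny, so on its own this will not stop the web requests.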

Or, try putting this in your <Directory /> container instead of inside the <Files> one:

order allow,deny
allow from all
Deny from env=bad_bot

That should prevent all access to your webserver from those browsers, not just to the specified file types.
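
On shared hosting there is usually no access to httpd.conf, which is where <Directory> containers live. A sketch of the same idea for .htaccess, assuming the host allows these directives there (AllowOverride Limit for Order/Allow/Deny, FileInfo for SetEnvIfNoCase): leave out the <Files> wrapper so the rule covers every request under that directory.

```apache
# Flag the downloader, then deny flagged requests for any file type.
SetEnvIfNoCase User-Agent "HTML File Downloader" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```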

AvailableHost
08-19-2003, 11:11 PM
Hello,

How do I ban a whole IP range like 203.162.164.xxx in .htaccess?

I tried this: deny from 203.162.164.*

Thanks,
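
The mod_access documentation does not list a * wildcard among the accepted address forms. A sketch of the forms it does accept for a whole range, assuming the same Order/Allow/Deny setup used earlier in the thread:

```apache
Order Allow,Deny
Allow from all
# Partial address: matches every host in 203.162.164.x
Deny from 203.162.164
# Equivalent CIDR form (use one or the other)
# Deny from 203.162.164.0/24
```

Note that the address reported earlier, 203.162.166.37, is not inside 203.162.164.x, so it would need its own Deny line or a wider range.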