  1. #1
    Join Date
    Feb 2003
    Posts
    34

    prevent wget downloads

    Hi,

    How can I prevent wget from downloading my website?

    Any help would be appreciated.

    Thank you

  2. #2
    Join Date
    Jul 2004
    Location
    Reporting Live from Marrz
    Posts
    254
    .htaccess:

    Code:
    SetEnvIfNoCase User-Agent "^Wget" bad_bot
    
    <Limit GET POST>
       Order Allow,Deny
       Allow from all
       Deny from env=bad_bot
    </Limit>

  3. #3
    Join Date
    Feb 2003
    Posts
    34
    I have figured it out. Thank you, "SupaDucta".

    SetEnvIfNoCase User-Agent "^Wget" bad_bot
    SetEnvIfNoCase User-Agent "^Wget/1.5.3" bad_bot
    SetEnvIfNoCase User-Agent "^Wget/1.6" bad_bot
    <Files ~ "\.(html|pdf|mp3|zip|rar|exe|gif|jpe?g|png|php|jsp)$">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Files>

  4. #4
    Join Date
    Jul 2004
    Location
    Reporting Live from Marrz
    Posts
    254
    This line:

    Code:
    SetEnvIfNoCase User-Agent "^Wget" bad_bot
    makes the following two lines unnecessary:

    Code:
    SetEnvIfNoCase User-Agent "^Wget/1.5.3" bad_bot
    SetEnvIfNoCase User-Agent "^Wget/1.6" bad_bot
    and if you use this:

    Code:
    <Limit GET POST>
       Order Allow,Deny
       Allow from all
       Deny from env=bad_bot
    </Limit>
    this blocks Wget completely.

    While this:

    Code:
    <Files ~ "\.(html|pdf|mp3|zip|rar|exe|gif|jpe?g|png|php|jsp)$">
       Order Allow,Deny
       Allow from all
       Deny from env=bad_bot
    </Files>
    would block only the related extensions.
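The redundancy and the extension matching described above can be checked with a small Python sketch. It is only an approximation: Apache uses PCRE rather than Python's `re`, the sample User-Agent strings and file names are made up for illustration, and the extension pattern is written with `$` directly after the closing parenthesis.

```python
import re

# The three SetEnvIfNoCase patterns from the thread; the directive is case-insensitive.
RULES = [re.compile(p, re.IGNORECASE) for p in (r"^Wget", r"^Wget/1.5.3", r"^Wget/1.6")]

# The <Files ~ "..."> pattern, applied to the file name.
EXTENSIONS = re.compile(r"\.(html|pdf|mp3|zip|rar|exe|gif|jpe?g|png|php|jsp)$")


def matching_rules(user_agent: str) -> list:
    """Return the indexes of the rules whose pattern matches this User-Agent."""
    return [i for i, rule in enumerate(RULES) if rule.search(user_agent)]


def is_protected(filename: str) -> bool:
    """True if the file name falls under the extension pattern."""
    return EXTENSIONS.search(filename) is not None


if __name__ == "__main__":
    # Hypothetical sample values, not taken from any real log.
    for ua in ["Wget/1.5.3", "Wget/1.6", "wget/1.9", "Mozilla/4.0 (compatible; MSIE 5.5)"]:
        print(f"{ua!r}: matched by rules {matching_rules(ua)}")
    for name in ["index.html", "photo.jpeg", "style.css"]:
        print(f"{name!r}: protected={is_protected(name)}")
```

Every User-Agent caught by the second or third rule is also caught by rule 0, which is why the version-specific lines add nothing.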

  5. #5
    Just remember that you cannot stop a determined person this way.

    With the wget source code, I could probably easily substitute an IE 5.5 string for the user-agent. This would still deter most people, though.

    Also, there is never any need to put php files in your list of non-allowed materials, since nobody can download your PHP source anyway (the server interprets it first and sends only the output).
    "The only difference between a poor person and a rich person is what they do in their spare time."
    "If youth is wasted on the young, then retirement is wasted on the old"

  6. #6
    Join Date
    Nov 2001
    Posts
    551
    You don't need the source code... it is built in:

    From the wget manual

    `-U agent-string'
    `--user-agent=agent-string'
    Identify as agent-string to the HTTP server. The HTTP protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as `Wget/version', version being the current version number of Wget. However, some sites have been known to impose the policy of tailoring the output according to the User-Agent-supplied information. While conceptually this is not such a bad idea, it has been abused by servers denying information to clients other than Mozilla or MS IE. In these cases it may be useful to "fake" the user-agent with this option. In the following example Wget masquerades as Mozilla 4.03 running on Solaris.

    wget -U "Mozilla/4.03 [en] (X11; I; SunOS 5.5.1 sun4u)"

    Use of this option is discouraged, unless you really know what you are doing.
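The effect of this option on the `^Wget` rule from earlier in the thread can be sketched in Python. This is a rough model, not Apache itself: the `is_denied` helper and the header dictionaries are hypothetical, and the version number in the default string is made up.

```python
import re

# Same test as: SetEnvIfNoCase User-Agent "^Wget" bad_bot
BAD_BOT = re.compile(r"^Wget", re.IGNORECASE)


def is_denied(headers: dict) -> bool:
    """Return True if the request's User-Agent would be tagged bad_bot."""
    return BAD_BOT.search(headers.get("User-Agent", "")) is not None


# Default identification is "Wget/version" per the manual; 1.9 is an assumed version.
default_request = {"User-Agent": "Wget/1.9"}
# The manual's own example string, as it would be sent with -U.
spoofed_request = {"User-Agent": "Mozilla/4.03 [en] (X11; I; SunOS 5.5.1 sun4u)"}

if __name__ == "__main__":
    print("default:", is_denied(default_request))
    print("spoofed:", is_denied(spoofed_request))
```

The spoofed request no longer starts with `Wget`, so a User-Agent rule alone will not catch it.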

  7. #7
    Touché

    I suppose I could have taken 2 seconds to read the man pages, heh.

  8. #8

    I am using this, but my bandwidth limit was exceeded.

    My Internal Server Error page has 2,028,069 hits and used 2.79 GB.
    Wget has 360,006 hits.
    Googlebot has 57,130 hits and used 82.91 GB of bandwidth.

    What can I do?
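As a rough way to read these figures, the average transfer per hit can be worked out from the numbers quoted in the post. This Python sketch assumes 1 GB = 1024 MB (the stats tool's unit convention is not stated); no bandwidth figure is given for Wget, so it is left out.

```python
MB_PER_GB = 1024  # assumption: binary units
KB_PER_MB = 1024


def avg_mb_per_hit(total_gb: float, hits: int) -> float:
    """Average megabytes transferred per hit."""
    return total_gb * MB_PER_GB / hits


# Figures copied from the post above.
googlebot_mb = avg_mb_per_hit(82.91, 57_130)
error_page_kb = avg_mb_per_hit(2.79, 2_028_069) * KB_PER_MB

if __name__ == "__main__":
    print(f"Googlebot: about {googlebot_mb:.2f} MB per hit")
    print(f"Internal Server Error page: about {error_page_kb:.2f} KB per hit")
```

On these numbers, Googlebot accounts for far more bandwidth than the error page despite far fewer hits.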
