mirror website
  #1  
07-02-2002, 03:03 PM
infinite
WHT Addict
 
Join Date: Feb 2002
Location: UK
Posts: 120

mirror website


Hi all,

Is wget the best way to download a complete copy of a website? Parts of the site need a username and password; has anyone used wget for that before?

Can lynx do a good job of it too?

Or is there a better Linux utility for the job?

Any help much appreciated,
Infinite

__________________
D'oh!

  #2  
07-02-2002, 03:25 PM
magnafix
Web Hosting Master
 
Join Date: Apr 2001
Location: Montana USA
Posts: 673
wget -m, I think.

wget --help
GNU Wget 1.7, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
-V, --version display the version of Wget and exit.
-h, --help print this help.
-b, --background go to background after startup.
-e, --execute=COMMAND execute a `.wgetrc'-style command.

Logging and input file:
-o, --output-file=FILE log messages to FILE.
-a, --append-output=FILE append messages to FILE.
-d, --debug print debug output.
-q, --quiet quiet (no output).
-v, --verbose be verbose (this is the default).
-nv, --non-verbose turn off verboseness, without being quiet.
-i, --input-file=FILE download URLs found in FILE.
-F, --force-html treat input file as HTML.
-B, --base=URL prepends URL to relative links in -F -i file.
--sslcertfile=FILE optional client certificate.
--sslcertkey=KEYFILE optional keyfile for this certificate.

Download:
--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on local host.
-t, --tries=NUMBER set number of retries to NUMBER (0 unlimits).
-O --output-document=FILE write documents to FILE.
-nc, --no-clobber don't clobber existing files or use .# suffixes.
-c, --continue resume getting a partially-downloaded file.
--dot-style=STYLE set retrieval display style.
-N, --timestamping don't re-retrieve files unless newer than local.
-S, --server-response print server response.
--spider don't download anything.
-T, --timeout=SECONDS set the read timeout to SECONDS.
-w, --wait=SECONDS wait SECONDS between retrievals.
--waitretry=SECONDS wait 1...SECONDS between retries of a retrieval.
-Y, --proxy=on/off turn proxy on or off.
-Q, --quota=NUMBER set retrieval quota to NUMBER.

Directories:
-nd --no-directories don't create directories.
-x, --force-directories force creation of directories.
-nH, --no-host-directories don't create host directories.
-P, --directory-prefix=PREFIX save files to PREFIX/...
--cut-dirs=NUMBER ignore NUMBER remote directory components.

HTTP options:
--http-user=USER set http user to USER.
--http-passwd=PASS set http password to PASS.
-C, --cache=on/off (dis)allow server-cached data (normally allowed).
-E, --html-extension save all text/html documents with .html extension.
--ignore-length ignore `Content-Length' header field.
--header=STRING insert STRING among the headers.
--proxy-user=USER set USER as proxy username.
--proxy-passwd=PASS set PASS as proxy password.
--referer=URL include `Referer: URL' header in HTTP request.
-s, --save-headers save the HTTP headers to file.
-U, --user-agent=AGENT identify as AGENT instead of Wget/VERSION.
--no-http-keep-alive disable HTTP keep-alive (persistent connections).
--cookies=off don't use cookies.
--load-cookies=FILE load cookies from FILE before session.
--save-cookies=FILE save cookies to FILE after session.

FTP options:
-nr, --dont-remove-listing don't remove `.listing' files.
-g, --glob=on/off turn file name globbing on or off.
--passive-ftp use the "passive" transfer mode.
--retr-symlinks when recursing, get linked-to files (not dirs).

Recursive retrieval:
-r, --recursive recursive web-suck -- use with care!
-l, --level=NUMBER maximum recursion depth (inf or 0 for infinite).
--delete-after delete files locally after downloading them.
-k, --convert-links convert non-relative links to relative.
-K, --backup-converted before converting file X, back up as X.orig.
-m, --mirror shortcut option equivalent to -r -N -l inf -nr.
-p, --page-requisites get all images, etc. needed to display HTML page.

Recursive accept/reject:
-A, --accept=LIST comma-separated list of accepted extensions.
-R, --reject=LIST comma-separated list of rejected extensions.
-D, --domains=LIST comma-separated list of accepted domains.
--exclude-domains=LIST comma-separated list of rejected domains.
--follow-ftp follow FTP links from HTML documents.
--follow-tags=LIST comma-separated list of followed HTML tags.
-G, --ignore-tags=LIST comma-separated list of ignored HTML tags.
-H, --span-hosts go to foreign hosts when recursive.
-L, --relative follow relative links only.
-I, --include-directories=LIST list of allowed directories.
-X, --exclude-directories=LIST list of excluded directories.
-nh, --no-host-lookup don't DNS-lookup hosts.
-np, --no-parent don't ascend to the parent directory.

Mail bug reports and suggestions to <bug-wget@gnu.org>.
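
For the password-protected parts of the site, the --http-user and --http-passwd options listed above should handle basic HTTP authentication. A minimal sketch (the hostname and credentials are made up):

# mirror a site, passing basic-auth credentials for the protected areas
# example.com, "alice", and "secret" are placeholders
wget -m --http-user=alice --http-passwd=secret http://example.com/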

__________________
John Masterson
Former Hosting Company Owner

  #3  
07-04-2002, 05:17 AM
infinite
WHT Addict
 
Join Date: Feb 2002
Location: UK
Posts: 120
Thanks magnafix. Has anyone used wget regularly? Would you recommend setting it up in a cron job?

Cheers,
Infinite

__________________
D'oh!

  #4  
07-04-2002, 05:26 AM
MotleyFool
Fool about Town
 
Join Date: Sep 2001
Location: Madras
Posts: 737
I have used (and still use) wget, and it just rocks.

For mirroring, use:

wget -m -nH http://sourcesite.com

(To avoid everything being copied under a directory called sourcesite.com, use -nH, the "no host directories" option.)

I haven't tried password-protected directories, but they should work just as well, since the .htaccess is also copied.
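
On the cron question above: a minimal crontab sketch for a nightly mirror (the schedule, target directory, and log path are all hypothetical):

# run the mirror every night at 3:30 AM
# /var/mirror and the log path are placeholders
30 3 * * * wget -m -nH -P /var/mirror -o /var/log/mirror.log http://sourcesite.com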

Cheers
Balaji

__________________
Offering Managed Servers - for an exclusive clientèle who value uptime, caring support and superior technology.

  #5  
07-04-2002, 05:31 AM
infinite
WHT Addict
 
Join Date: Feb 2002
Location: UK
Posts: 120
Quote:
Originally posted by MotleyFool
I have used / use wget and it just rocks..
Thanks MotleyFool, I'll give it a try!

Cheers,
Infinite

__________________
D'oh!

  #6  
07-04-2002, 03:47 PM
mwatkins
Web Hosting Master
 
Join Date: Nov 2001
Location: Vancouver
Posts: 2,416
Assuming Apache: if you need wget to copy .htaccess files, you might have to alter your httpd.conf and comment out the following section:

<Files ~ "^\.ht">
Order allow,deny
Deny from all
Satisfy All
</Files>

By default most Apache installs prevent web clients (including wget) from viewing .htaccess contents.

I could be wrong - I've used wget to mirror before, but only visible files - so this is just a heads-up on something to check.
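
A quick way to check what your server actually does (just a sketch; --spider probes without saving anything, and example.com is a placeholder):

# probe for .htaccess without downloading it;
# a 403 Forbidden response means Apache is still blocking it
wget --spider http://example.com/.htaccess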

  #7  
07-05-2002, 09:00 AM
infinite
WHT Addict
 
Join Date: Feb 2002
Location: UK
Posts: 120
Thanks mwatkins. I'll make a copy of the .htaccess by hand and make sure the other server keeps the same password.

Cheers,
Infinite

__________________
D'oh!

  #8  
07-05-2002, 10:53 AM
admin0
Web Hosting Master
 
Join Date: Dec 2001
Location: Netherlands
Posts: 768
Hi,

Does wget -m -nH http://site1.com also work on CGIs and PHP pages?

I mean, if I run that on a forum site, will it work too?

e.g.:

wget -m -nH http://webhostingtalk.com

Just curious!


__________________
███ Remote Hands/Cloud Setup @ Europe
███ DevOps for Hosting Companies
███ CloudStack Consultancy and Setups
███ since 1997

  #9  
07-05-2002, 08:48 PM
chuckt101
Web Hosting Master
 
Join Date: Jul 2001
Posts: 889
admin0: it will try, but it won't work for any practical purpose, since WHT is served by PHP scripts and you would only be downloading the rendered HTML output... if that made any sense.. hehe..


  #10  
07-05-2002, 10:27 PM
aquos
Junior Guru Wannabe
 
Join Date: Jun 2002
Posts: 80
Question

What if your site has a MySQL database? Does that get mirrored too?

  #11  
07-06-2002, 08:49 AM
infinite
WHT Addict
 
Join Date: Feb 2002
Location: UK
Posts: 120
aquos, wget fetches HTML files from the web server, just like your web browser does; it won't get any MySQL tables or databases, for the same reason your browser can't pull them either.

admin0, if you wanted to mirror a forum, you would have to do it at the database level; that's where all the info is, including posts, profiles, etc. You could set up a cron job to dump the database, zip it up, and place it in a password-protected area. Then the other server could wget the database file, unzip it, and load it into MySQL (rough sketch below). Someone may have a simpler way, though.
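
A rough sketch of that dump-and-fetch idea (every hostname, credential, and path below is made up):

# on the source server, from cron: dump and compress the database
mysqldump -u dbuser -pdbpass forumdb | gzip > /home/site/protected/forumdb.sql.gz

# on the mirror server: fetch the dump from the protected area,
# unpack it, and load it into MySQL
wget --http-user=alice --http-passwd=secret http://example.com/protected/forumdb.sql.gz
gunzip forumdb.sql.gz
mysql -u dbuser -pdbpass forumdb < forumdb.sql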

wget will only take a copy of the HTML pages (and images, etc.) that it finds on the server; it's similar to a search engine spider.

HTH,
Infinite

__________________
D'oh!

  #12  
07-06-2002, 10:46 AM
admin0
Web Hosting Master
 
Join Date: Dec 2001
Location: Netherlands
Posts: 768
Was just curious!!



__________________
███ Remote Hands/Cloud Setup @ Europe
███ DevOps for Hosting Companies
███ CloudStack Consultancy and Setups
███ since 1997

  #13  
07-06-2002, 11:35 AM
infinite
WHT Addict
 
Join Date: Feb 2002
Location: UK
Posts: 120
Quote:
Originally posted by admin0
was just curious !!
No worries, admin0.

__________________
D'oh!
