Note that filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget can't tell that the local `X.html' file corresponds to remote URL `X' (since it doesn't yet know that the URL produces output of type `text/html'). To prevent this re-downloading, you must use `-k' and `-K' so that the original version of the file will be saved as `X.orig' (see section 2.9 Recursive Retrieval Options).
basic (insecure) or the
digest authentication scheme.
Another way to specify username and password is in the URL itself
(see section 2.1 URL Format). Either method reveals your password to anyone who
bothers to run ps. To prevent the passwords from being seen,
store them in `.wgetrc' or `.netrc', and make sure to protect
those files from other users with chmod. If the passwords are
really important, do not leave them lying in those files either--edit
the files and delete them after Wget has started the download.
For more information about security issues with Wget, see section 9.2 Security Considerations.
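As a rough illustration of the advice above, the following sketch writes a `.netrc' entry into a file that only its owner can read. The host name, login, and password are placeholders, and the file is created in a scratch directory for demonstration; in real use it would be `$HOME/.netrc'.

```shell
# Placeholder location for this demo; in real use this would be "$HOME/.netrc".
netrc_file="$(mktemp -d)/.netrc"

# Make sure the file is private before any secret is written to it.
# (umask affects every file created later in this shell session.)
umask 077
touch "$netrc_file"
chmod 600 "$netrc_file"

# Standard .netrc entry: machine, login, password (all values are made up).
printf 'machine %s login %s password %s\n' \
    'www.example.com' 'myuser' 'mypassword' >> "$netrc_file"
```

Setting the permissions before writing the entry means the password is never present in a file that other users could read, even briefly.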
Caching is allowed by default.
Set-Cookie header, and the client responds with the
same cookie upon further requests. Since cookies allow the server
owners to keep track of visitors and for sites to exchange this
information, some consider them a breach of privacy. The default is to
use cookies; however, storing cookies is not on by default.
You will typically use this option when mirroring sites that require that you be logged in to access some or all of their content. The login process typically works by the web server issuing an HTTP cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity.
Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by `--load-cookies': simply point Wget to the location of the `cookies.txt' file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations.
If you cannot use `--load-cookies', there might still be an alternative. If your browser supports a "cookie manager", you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the "official" cookie support:
wget --cookies=off --header "Cookie: name=value"
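When a site uses more than one cookie, the pairs copied from the cookie manager can be joined into a single header, separated by a semicolon and a space. The helper below is only a sketch: `cookie_header' is a hypothetical name, and the cookie names and values in the usage note are invented.

```shell
# cookie_header: join "name=value" arguments into one Cookie header line.
# (Hypothetical helper, not part of Wget.)
cookie_header() {
    header=''
    for pair in "$@"; do
        # Put "; " between pairs, but not before the first one.
        header="${header:+$header; }$pair"
    done
    printf 'Cookie: %s\n' "$header"
}
```

It might then be used as `wget --cookies=off --header "$(cookie_header session=abc123 lang=hr)"' followed by the URL of the site being mirrored.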
Content-Length headers, which makes Wget
go wild, as it thinks not all the document was retrieved. You can spot
this syndrome if Wget retries getting the same document again and again,
each time claiming that the (otherwise normal) connection has closed on
the very same byte.
With this option, Wget will ignore the Content-Length header--as
if it never existed.
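A minimal sketch of the option in use, assuming it is the long option `--ignore-length' (the option name does not appear in the text above). The wrapper name `fetch_ignoring_length' is hypothetical, and the URL is supplied by the caller.

```shell
# fetch_ignoring_length: retrieve a document from a server that is
# suspected of sending a wrong Content-Length header.
# (Hypothetical wrapper name; $1 is the URL to fetch.)
fetch_ignoring_length() {
    # --ignore-length makes Wget disregard the Content-Length header.
    wget --ignore-length "$1"
}
```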
You may define more than one additional header by specifying `--header' more than once.
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr' \
     http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all previous user-defined headers.
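The clearing rule can be sketched as follows. This reads the sentence above literally: an empty `--header' discards the headers given before it, while headers given after it are kept. The wrapper name `fetch_with_reset_headers' is hypothetical, and the URL is supplied by the caller.

```shell
# fetch_with_reset_headers: show how an empty --header value resets the list.
# (Hypothetical wrapper name; $1 is the URL to fetch.)
fetch_with_reset_headers() {
    # 1. Accept-Language is added to the user-defined headers.
    # 2. The empty value clears every user-defined header seen so far.
    # 3. Accept-Charset is added afterwards, so it should be the only
    #    user-defined header left in the request.
    wget --header='Accept-Language: hr' \
         --header='' \
         --header='Accept-Charset: iso-8859-2' \
         "$1"
}
```

This is mainly useful when an earlier `--header' comes from a script or alias you cannot edit and you want to start from a clean list.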
basic authentication scheme.
Security considerations similar to those with `--http-passwd' pertain here as well.
The HTTP protocol allows the clients to identify themselves using a
User-Agent header field. This enables distinguishing the
WWW software, usually for statistical purposes or for tracing of
protocol violations. Wget normally identifies as
`Wget/version', version being the current version
number of Wget.
However, some sites have been known to impose the policy of tailoring
the output according to the User-Agent-supplied information.
While conceptually this is not such a bad idea, it has been abused by
servers denying information to clients other than Mozilla or
Microsoft Internet Explorer. This option allows you to change
the User-Agent line issued by Wget. Use of this option is
discouraged, unless you really know what you are doing.
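For completeness, here is a sketch of changing the identification string, assuming the option in question is `--user-agent' (short form `-U'). The wrapper name `fetch_as' and the agent string in the usage note are made up for illustration.

```shell
# fetch_as: fetch a URL while sending a caller-chosen User-Agent string.
# (Hypothetical wrapper name; $1 is the agent string, $2 is the URL.)
fetch_as() {
    wget --user-agent="$1" "$2"
}
```

For example, `fetch_as "Mozilla/4.0 (compatible)" http://www.example.com/' would send the quoted string instead of `Wget/version'.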
| Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |