Date: Tue, 13 Oct 2009 21:09:32 +0100
From: Dave Korn
To: cygwin AT cygwin DOT com
Subject: [OT] Re: Want to use tor with wget.

[ We're off-topic here since it's not a cygwin-specific issue anymore, so I've
set a follow-up to the cygwin-talk list in case you have further questions or
replies. ]

hongyi.zhao wrote:
> On Tuesday, October 13, 2009 at 13:44, dave.korn.cygwin wrote:
>> Hongyi Zhao wrote:
> I want to use wget to grab the following web page:
>
> http://www.cybersyndrome.net/pla5.html

  Then, you can tell wget to use your local privoxy as an http proxy, which is
exactly how your browser relates to it.

  export http_proxy=localhost:8118
  wget http://www.cybersyndrome.net/pla5.html

should do the trick, but check the wget manual page about proxy support for
full details.  (I'm assuming here you're running the usual kind of Tor setup
with a supporting co-installation of Privoxy.)
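  For illustration only, here is a minimal sketch of the same idea that sets
the proxy for a single wget invocation rather than exporting it for the whole
shell.  It assumes Privoxy is listening on its default 127.0.0.1:8118; the
names fetch_via_privoxy, PRIVOXY_ADDR and DRY_RUN are made up for this example
and are not part of wget or Privoxy.

```shell
#!/bin/sh
# Assumption: Privoxy listens on its default address and forwards to Tor.
# PRIVOXY_ADDR and DRY_RUN are hypothetical names used only in this sketch.
PRIVOXY_ADDR="${PRIVOXY_ADDR:-127.0.0.1:8118}"

fetch_via_privoxy() {
    url="$1"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it, so nothing touches the network.
        echo "http_proxy=http://$PRIVOXY_ADDR wget $url"
    else
        # The assignment prefix limits http_proxy to this one wget process.
        http_proxy="http://$PRIVOXY_ADDR" wget "$url"
    fi
}
```

  Setting DRY_RUN=1 before calling fetch_via_privoxy with a URL prints the
command it would run, which is a cheap way to check the proxy address before
fetching anything.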
> OTOH, I've also learned that curl support socks4/5 proxy, and I use
> the following command under my cygwin console:
>
> curl --socks5 127.0.0.1:9050 http://www.cybersyndrome.net/pla5.html
>
> But I meet the following error:
>
> -----------------------------
> 302 Found
>
> Found
>
> The document has moved here.
> -----------------------------

  That's interesting.  A real 302 redirect would have an actual 302 status code
and a Location header, not just be a 200 returning an HTML document with the
words "302 Found" and a URL in it.

> Nevertheless, I can use firefox with Tor enabled to access this
> webpage.
>
> What's the reason

  It's something the server is doing deliberately, perhaps a malfunctioning or
misguided anti-bot feature of some sort, based on the request headers sent by
the user agent.

> and how can I grab this webpage just by a
> command-line downloading tool?

  Well, you can use wget!  Or you can tell your curl to pretend it is wget!

> $ curl 'http://www.cybersyndrome.net/pla5.html'
> 302 Found

> Found
>
> The document has moved here.
>
> $ wget 'http://www.cybersyndrome.net/pla5.html'
> --2009-10-13 21:00:36--  http://www.cybersyndrome.net/pla5.html
> Resolving www.cybersyndrome.net... 210.153.118.69
> Connecting to www.cybersyndrome.net|210.153.118.69|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: `pla5.html'
>
>     [ <=> ] 18,151      3.11K/s   in 5.7s
>
> 2009-10-13 21:00:42 (3.11 KB/s) - `pla5.html' saved [18151]
>
> $ curl 'http://www.cybersyndrome.net/pla5.html' -A 'User-Agent: Wget/1.11.4'
> CyberSyndrome : Proxy List / Anonymous
>