Perl web spider using Socket.pm

how do I do user agent specifications? referrers?

         

jeremy goodrich

5:21 am on Oct 23, 2001 (gmt 0)




I've got a spider working that relies on the Socket.pm Perl module instead of libwww. I wanted to try sending HTTP 1.1 requests instead of 1.0, partly for kicks and partly to make it look more like a surfer.

Anyway, the script only sends the request line; I don't have the syntax right for the other parts of the HTTP 1.1 header, such as Accept: text/html, text/plain, image/gif, etc. Does anyone know where I would find such information?
Preferably, the info would also cover how each browser sends the header part of the request.

Brett_Tabke

6:12 am on Oct 23, 2001 (gmt 0)




Download Proxomitron. Set it up and turn on the "web log" feature. It will display all headers your browser sends and receives. You know what to do.

+++GET 1+++
GET /forum5/875.htm HTTP/1.0
User-Agent: Opera/5.5 (Windows 98; U) [en]
Host: www.webmasterworld.com
Accept: text/html, image/png, image/jpeg, image/gif, image/x-xbitmap, */*
Accept-Language: en
Accept-Charset: iso-8859-1,utf-8...
Referer: [webmasterworld.com...]
Cookie: lastvisitinfo=21-1003703433%2613-1003703433%2623-...
Cookie2: $Version="1"

+++RESP 1+++
HTTP/1.1 200 OK
Date: Tue, 23 Oct 2001 06:00:53 GMT
Server: Apache/1.3.4 (Unix)
Connection: close
Content-Type: text/html
+++CLOSE 1+++
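
For anyone who wants to send headers like the ones logged above from Perl without libwww, a minimal sketch might look like the following. It is only an illustration under stated assumptions: the helper names build_request and fetch are hypothetical, www.example.com is a placeholder host, and the User-Agent and Accept values are simply copied from the log above. HTTP/1.1 requires a Host header, and the sketch adds "Connection: close" so the response can be read until the server hangs up; it does not decode chunked responses, which an HTTP/1.1 server is allowed to send.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use Socket qw(inet_aton pack_sockaddr_in PF_INET SOCK_STREAM);

# Build a raw HTTP/1.1 GET request (hypothetical helper).
# Extra headers are passed as a flat name/value list so their order is kept.
# Every line ends in CRLF and an empty line terminates the header block.
sub build_request {
    my ($host, $path, @headers) = @_;
    my $req = "GET $path HTTP/1.1\r\n"
            . "Host: $host\r\n";             # mandatory in HTTP/1.1
    while (my ($name, $value) = splice(@headers, 0, 2)) {
        $req .= "$name: $value\r\n";
    }
    $req .= "Connection: close\r\n\r\n";     # no keep-alive: read until EOF
    return $req;
}

# Send the request over a plain TCP socket and return the raw response
# (status line, headers and body). Hypothetical helper; no chunked decoding.
sub fetch {
    my ($host, $port, $request) = @_;
    my $ip    = inet_aton($host) or die "cannot resolve $host\n";
    my $proto = getprotobyname('tcp');
    socket(my $sock, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
    connect($sock, pack_sockaddr_in($port, $ip))   or die "connect: $!\n";
    $sock->autoflush(1);
    print {$sock} $request;
    local $/;                                # slurp mode: read to end of stream
    my $response = <$sock>;
    close($sock);
    return $response;
}

# Header values below are taken from the browser log in this thread.
my $request = build_request(
    'www.example.com', '/',                  # placeholder host and path
    'User-Agent'      => 'Opera/5.5 (Windows 98; U) [en]',
    'Accept'          => 'text/html, image/png, image/jpeg, image/gif, image/x-xbitmap, */*',
    'Accept-Language' => 'en',
);
print $request;

# Only touch the network when explicitly asked: perl spider.pl --fetch
print fetch('www.example.com', 80, $request)
    if @ARGV && $ARGV[0] eq '--fetch';
```

Run without arguments, the script only prints the request it would send, which makes it easy to compare against a Proxomitron log line by line. A Referer or Cookie header could be added to the list in the same way.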

jeremy goodrich

3:35 pm on Oct 23, 2001 (gmt 0)




He he, thanks for the tip. I noticed after I posted that I had another spider called "get.pl" with some header info in it for imitating browsers more closely. I think I'll still cross-check those listings against the Proxomitron log you mentioned, though. :)