Pita is now called WebVac

name change

         

webvaccrawler

7:09 pm on Aug 12, 2003 (gmt 0)

10+ Year Member



Send any questions to me, or to the email address the spider sends with its requests.

We crawl every couple of months and make the crawl available to researchers across the US and Europe who lack the resources to run their own. That cuts down on overall spider traffic.

Sorry, I could not figure out how to reply to the Pita name thread; the forum told me it was too old.

Gary Wesley
Spider Pilot

jdMorgan

8:07 pm on Aug 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Gary,

Welcome to WebmasterWorld [webmasterworld.com]!

Thanks for posting this info. I note that your crawler is NOT on my disallowed list, so it must have been well-behaved. Maybe you can join in here when we razz other spider authors for their badly-behaved agents! :)

Jim

msgraph

2:47 pm on Aug 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Gary,

What's the exact UA to look out for? Do you have an information page set up?

webvaccrawler

3:06 pm on Aug 13, 2003 (gmt 0)

10+ Year Member



[www-diglib.stanford.edu...]
" HTTP/1.0\r\n"
"Host: %s\r\n"
"User-Agent: WebVac (webmaster@pita.stanford.edu)\r\n"
"From: webmaster@pita.stanford.edu\r\n\r\n",

Gary
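
For anyone wondering how the header fragment above fits together, here is a minimal sketch in C. It is not taken from the WebVac source. The "GET %s" request line is an assumption (the post only shows the format string from " HTTP/1.0" onward), and the function names build_request and is_webvac_agent are hypothetical helpers made up for illustration. The second helper shows one way a webmaster might test a logged User-Agent value for the "WebVac" token.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from the WebVac source: formats an HTTP/1.0
 * request into buf. The "GET %s" request line is an assumption; the
 * remaining lines are the format strings quoted in the post.
 * Returns the number of bytes written, or -1 if buf is too small. */
int build_request(char *buf, size_t size, const char *path, const char *host)
{
    int n = snprintf(buf, size,
                     "GET %s"                  /* assumed request line */
                     " HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "User-Agent: WebVac (webmaster@pita.stanford.edu)\r\n"
                     "From: webmaster@pita.stanford.edu\r\n\r\n",
                     path, host);

    /* snprintf reports the length it needed; treat truncation or an
     * output error as failure. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Hypothetical log-side check: returns nonzero if a User-Agent value
 * begins with the "WebVac" token shown in the post. */
int is_webvac_agent(const char *user_agent)
{
    static const char token[] = "WebVac";
    return strncmp(user_agent, token, sizeof token - 1) == 0;
}
```

Calling build_request with a 512-byte buffer, a path such as "/index.html" and a host such as "example.com" (both placeholders) fills the buffer with a complete request ending in a blank line. Matching only on the leading "WebVac" token, rather than the full string, is a design choice so the check keeps working if the contact address in parentheses ever changes.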

stechert

4:09 pm on Aug 13, 2003 (gmt 0)

10+ Year Member



Where can we go to see the data?

thanks,
andre

webvaccrawler

4:24 pm on Aug 13, 2003 (gmt 0)

10+ Year Member



The website above has instructions for building a client (needed mostly because the data is compressed) and lists the ports to access the data on.
Please let us know what you use it for.

Gary