Details:
2003-02-24 11:59:38 194.74.151.201 - W3SVC48 RASRV02 nnn.nnn.nnn.nn 80 GET /robots.txt - 200 0 450 130 31 HTTP/1.1 YellSpider - -
That IP resolves to:
inetnum: 194.74.151.192 - 194.74.151.207
netname: BT-CUST-983
descr: Yellow Pages
country: GB
admin-c: WG219-RIPE
tech-c: SW239-RIPE
status: ASSIGNED PA
mnt-by: RIPE-NCC-NONE-MNT
changed: Peter.Lee@bt.net 19961217
source: RIPE
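For anyone decoding that log line: it's IIS W3C extended log format. The actual #Fields directive isn't shown in the excerpt, so the field order below is an assumption based on a typical IIS layout that happens to match this entry. A minimal Python sketch:

# Hedged sketch: field order assumed from a typical IIS W3C #Fields line,
# since the real directive isn't included in the excerpt above.
line = ("2003-02-24 11:59:38 194.74.151.201 - W3SVC48 RASRV02 "
        "nnn.nnn.nnn.nn 80 GET /robots.txt - 200 0 450 130 31 "
        "HTTP/1.1 YellSpider - -")

fields = ["date", "time", "c-ip", "cs-username", "s-sitename",
          "s-computername", "s-ip", "s-port", "cs-method", "cs-uri-stem",
          "cs-uri-query", "sc-status", "sc-win32-status", "sc-bytes",
          "cs-bytes", "time-taken", "cs-version", "cs(User-Agent)",
          "cs(Cookie)", "cs(Referer)"]

entry = dict(zip(fields, line.split()))
print(entry["cs(User-Agent)"], entry["cs-uri-stem"], entry["sc-status"])
# -> YellSpider /robots.txt 200

In other words: the YellSpider user agent requested /robots.txt and got a 200 back.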
The robots.txt file (if it exists) tells every spider which parts of the site can or can't be spidered.
A well-behaved spider will automatically look for a robots.txt file before it proceeds any further.
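As an illustration, here's a minimal sketch of what a well-behaved spider does before crawling, using Python's standard urllib.robotparser. The site URLs are placeholders; the user agent string is taken from the log line above:

import urllib.robotparser

# A polite spider fetches and parses robots.txt before crawling anything.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()  # downloads and parses the file

# Only fetch a page if the rules allow it for this user agent.
if rp.can_fetch("YellSpider", "http://www.example.com/some/page.html"):
    print("Allowed to crawl")
else:
    print("Disallowed by robots.txt")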
My point is that they may be spidering all of the content from each of their customers' sites. I've just finished a project for Thomson Local to do exactly this.
You're right - if you wanted to spider the whole site, you would check robots.txt first. But if you were simply Yell checking a single URL held against your customer records, why bother? Surely you would just test that the single URL was valid?
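For comparison, a sketch of that simpler check: one HEAD request against a stored URL, with no robots.txt lookup involved. The URL is a placeholder:

import urllib.error
import urllib.request

# Validate a single customer URL without crawling anything else.
def url_is_valid(url: str) -> bool:
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            # urlopen follows redirects; any 2xx/3xx final status counts as valid.
            return 200 <= resp.status < 400
    except (urllib.error.URLError, ValueError):
        # 4xx/5xx responses raise HTTPError (a URLError subclass),
        # so dead links fall through to False here.
        return False

print(url_is_valid("http://www.example.com/"))

That would explain a bot that only ever hits one page per site - but the log shows it asking for robots.txt, which fits the whole-site-spidering theory better.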