Yahoo Slurp keeps trying to spider pages that don't exist

Yahoo Slurp, 404 pages

johnlim

4:02 pm on Feb 26, 2005 (gmt 0)

Hi,

On my site, Yahoo Slurp keeps trying to spider pages that don't exist.

Yahoo doesn't follow the links; it goes directly to "eat" pages that don't actually exist.

I am wondering how Yahoo Slurp ends up in subdirectories that never existed to "eat" pages?

George_hu

1:20 am on Feb 27, 2005 (gmt 0)

Hi,

I installed an error trap a few weeks ago.
I can confirm: so many requests for pages/files that never existed.

They don't even fit the existing structure or file naming. It looks random, or as if it mixed up sites/files.

Sometimes G. also requests a page that never existed, but that one doesn't change.

Anyone interested in solving this problem @ Yahoo?

johnlim

2:04 am on Feb 27, 2005 (gmt 0)

George,

How many pages of this site are cached in Yahoo? Pls use site:domain.com on yahoo.com to check.

George_hu

2:34 am on Feb 27, 2005 (gmt 0)

1620 (blog)

soapystar

10:02 am on Feb 27, 2005 (gmt 0)

This has been noted sooooooo many times.....

What are the names of the files it's looking for? Does it always look for a directory?

George_hu

11:53 am on Feb 27, 2005 (gmt 0)

These exist, but in a different DIR, and nothing links to them, since they are only called by CRON for weather feeds.
So it is not even checking whether the path is OK, just fetching some file.

/infocenter/radar45.php
/infocenter/radar00s.php
/infocenter/radar45s.php
/infocenter/radar15s.php
/infocenter/radar00.php
/infocenter/radar30s.php

These never existed:

/sogn_og_fjordane/000947.htm
/equipa/000253/agreement.htm
/nisshi.htm
/pd-chika/000666.htm
/gakugei/000636.htm
/committees/000184/tfaq.htm
/bt2115pd.htm

So all of them are files.

George_hu

11:55 am on Feb 27, 2005 (gmt 0)

IF
this has been noted sooooooo many times.....
WHY NOT
reported by anyone to Y.?

OR
do they not listen?
OR
do they have a problem with the spider?

walrus

3:57 pm on Feb 27, 2005 (gmt 0)

Yes, they have big problems with this. Not sure how detrimental it can be. I see 302s where they should get 404s,
kinda odd.
Sent letters.
No reply at all.
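
One way to check for the 302-instead-of-404 behavior mentioned above is to request a path that should not exist and look at the raw status code without following redirects. Below is a minimal sketch in Python using only the standard library. The probe path and both helper names are made up for illustration, and the grouping of status codes is just one reasonable reading, not anything Yahoo has documented.

```python
import http.client


def classify_missing_page_status(status):
    """Describe how a server answered a request for a page that should not exist."""
    if status in (404, 410):
        return "ok"            # a proper "not found" / "gone" answer
    if status in (301, 302, 303, 307, 308):
        return "redirect"      # a redirect instead of an error
    if 200 <= status < 300:
        return "soft-404"      # error page served with a success status
    return "other"


def probe_missing_page(host, path="/this-page-should-not-exist-12345.htm"):
    """Send one HEAD request (http.client does not follow redirects) and classify the reply."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
    finally:
        conn.close()
    return status, classify_missing_page_status(status)
```

Calling `probe_missing_page("www.example.com")` (a placeholder host) returns the status code and its label. A "redirect" or "soft-404" result means a spider asking for a nonsense URL is not being told plainly that the page does not exist.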

garyr_h

8:02 pm on Feb 27, 2005 (gmt 0)

I just bought a domain which hasn't been used for over a year, yet Slurp is still spidering pages which were up in early 2004 and haven't been there since...

Also, 301s don't seem to be doing that good a job of redirecting Mr. Slurp...

johnlim

6:53 am on Feb 28, 2005 (gmt 0)

I also noticed Yahoo Slurp spidering an empty directory of the site:

domain.com/?=d
domain.com/?=h
domain.com/?=d etc

Many of these pages are cached in Yahoo.....

johnlim

2:58 pm on Mar 6, 2005 (gmt 0)

Does anybody have new ideas about this issue?

soapystar

11:16 am on Mar 8, 2005 (gmt 0)

I believe they are looking for sites using an identical site/template and directory structure.....

makuabob

1:23 am on Mar 15, 2005 (gmt 0)

Hi, all.

I have a not-for-profit site that's been on the same server for more than two years now. For the past 6 months, I have had access to the log files for my domain.

Among other problems, Yahoo!'s SLURP is doing just as the previous posters have mentioned: making nonsense requests, but each one has one small piece of info in it that DOES relate to my site.

I HAVE e-mailed Yahoo! about it (KMM75141283V67679.... is their auto-response #, less the last four digits for my privacy) but no reply from them for two weeks now.

I don't really see that the requests are causing a problem, but,... what do I know about the subtleties of web-savvy tech-nerds?

My impression of SLURP is that it is slightly unstable but not worth blocking from the site,... yet.

Peace,.. out,..

makuabob
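
For anyone with access to their log files, the check described in this thread (which non-existent pages is Slurp asking for, and how often) can be sketched as a short Python script. This assumes the server writes the Apache/NCSA "combined" log format and that the crawler's user agent contains the word "slurp". The function name and the `access.log` file name are placeholders.

```python
import re
from collections import Counter

# Assumed Apache/NCSA "combined" log format; adjust the pattern if the server logs differently.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def slurp_not_found(lines, agent_token="slurp"):
    """Count paths requested by a matching user agent that were answered with 404."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if agent_token in match["agent"].lower() and match["status"] == "404":
            counts[match["path"]] += 1
    return counts
```

A possible way to use it:

```python
with open("access.log") as log_file:  # placeholder file name
    for path, hits in slurp_not_found(log_file).most_common(20):
        print(hits, path)
```

Changing the status test from "404" to "302" would list the requests that got a redirect instead of an error.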