Yahoo Crawler having problems indexing my site

It is requesting URLs with ( encoded as %28

DoppyNL

5:07 pm on Jan 11, 2005 (gmt 0)

10+ Year Member



Here is an example of a typical request made by the Yahoo crawler. Lines like this one are all over my logs:

66.196.90.20 - - [10/Jan/2005:20:04:32 +0100] "GET /%28%28pa%28559/ HTTP/1.0" 404 1368 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

As you can see, it is requesting a URL ending with:
/%28%28pa%28559/
while this should be:
/((pa(559/

Since the URL it requests isn't correct, Yahoo gets a 404 back.
Google, MSN and other search engines have no problem with this at all.
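
For reference, %28 is the percent-encoded form of the ( character, so the two paths above differ only in encoding. The following is a minimal Python sketch that pulls the requested path out of the log line quoted above and decodes it. The function names requested_path and normalize_path are hypothetical, and whether a given server or CMS decodes the path before looking it up is an assumption, not something this thread confirms.

```python
import re
from urllib.parse import unquote

# Log line copied from the post above; the user-agent is truncated in the original.
LOG_LINE = (
    '66.196.90.20 - - [10/Jan/2005:20:04:32 +0100] '
    '"GET /%28%28pa%28559/ HTTP/1.0" 404 1368 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]'
)

# Matches the quoted request and the status code that follows it.
REQUEST_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def requested_path(log_line):
    """Return the raw (still percent-encoded) path from one access-log line."""
    match = REQUEST_PATTERN.search(log_line)
    if match is None:
        raise ValueError("no request found in log line")
    return match.group("path")


def normalize_path(raw_path):
    """Decode percent-escapes such as %28 back to their literal characters."""
    return unquote(raw_path)


raw = requested_path(LOG_LINE)
print(raw)                  # the path as the crawler sent it
print(normalize_path(raw))  # the same path with %28 decoded to (
```

If the application that maps URLs to pages compares the raw path instead of the decoded one, a request like this would miss and end in a 404, which would be one possible explanation for the log entries.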

Does anyone know why this is going wrong?

I tried contacting Yahoo, but got no reply.

Thanks

DoppyNL

8:17 am on Jan 12, 2005 (gmt 0)

10+ Year Member



I forgot to add:
The index pages are in Yahoo's index, but they don't involve a ( in the URL, since they are accessed by requesting the URI "/".

theBear

3:16 pm on Jan 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When SLURP parses a page, it screws up at some point and passes back garbage as URLs to follow.

Thus the 404s.

This has been going on for YEARS; it seems they can't find the BUG and FIX it.

I guess the programming staff must still be in DIAPERS or flunked DEBUGGING 101.

Maybe if you sent them a can of Raid ;)

DoppyNL

7:42 am on Jan 16, 2005 (gmt 0)

10+ Year Member



Well, if they are too stupid to fix their crawler, then I don't want to be in their index :P

It isn't a really big problem, as I'm not expecting many visitors from Yahoo. All the sites are in Dutch, and not many Dutch people use Yahoo to search for Dutch sites. It will also be solved for me when I switch to a new CMS that is currently in development, although that will take some time :-|.

What comes to my mind is: I know they haven't fixed this, so what else isn't fixed that we don't know about?

Yahoo_Mike

7:07 am on Jan 19, 2005 (gmt 0)

10+ Year Member



Thanks for bringing this to our attention. Looks like there may be something going on here -- we’ll take a detailed look at this issue. To flag items like this, please send us an email at ystfeedback@yahoo.com

Thanks,

Yahoo! Mike

theBear

12:05 am on Jan 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I did exactly that back in 2002 when I first noticed the problem ... I heard nada .... I got a much better response when I helped another SE track down its incorrect handling of server responses.

I say it once, then I figure that the powers that be don't give a rat's hindquarter. Since they don't care, why should I?

I provide detailed information when I present a possible software problem. Having been in the field for over 35 years, I know exactly how important detailed data is in finding software errors.

Good luck.

DoppyNL

8:55 am on Jan 21, 2005 (gmt 0)

10+ Year Member



"Thanks for bringing this to our attention."

Perhaps you should be looking at the emails you received in late September (let's say, September 26?!?). I mailed this problem back then, complete with a section of my log to show what the problem is.

I got ZERO response (not even a "we received your message" message; not that I like those, of course).

"Looks like there may be something going on here -- we’ll take a detailed look at this issue. To flag items like this, please send us an email at ystfeedback@yahoo.com"

Looks like it? No kidding!
I think the details in the first post are quite clear; requests like that are all over my log! A couple more lines won't help you more.
If you need to know what domain it is, put a dot on the right spot in my username. If you can't find the spot, send me a sticky.

Also, it might be a good idea to place an email address somewhere on the Yahoo search site; I had to use WebmasterWorld in September to find out where I could mail you guys!

Oh, one last thing. Sorry if I sound a little rude. I didn't sleep very well last night, I don't know why, and I've only been awake for 30 minutes (in the Netherlands it is currently 9:56 in the morning).

And people, don't forget to smile today! :-)