Why does googlebot try to get strange URLs?

         

johnlim9988

12:47 am on Oct 31, 2006 (gmt 0)

10+ Year Member



Hi,

Googlebot/2.1 (IP 66.249.72.134) keeps requesting strange URLs like the following on our site, and always gets a 404 error:

GET /sub1/tacmpa.h%20...
GET /sub2/ke-es%20...

The same Googlebot/2.1 also requests some picture URLs in a strange format, such as:

GET /pictures/MA/?C=S;O=A HTTP/1.1 200 584
GET /pictures/PJ/?C=S;O=A HTTP/1.1 200 5706
GET /pictures/IQ/?C=N;O=D HTTP/1.1 200 5904

Here Googlebot gets a 200 status, and we cannot understand two things:

1) Why does it request pictures at such strange URLs? We have never had links like these, external or internal.

2) Why is Googlebot/2.1 fetching images at all? Shouldn't a separate Googlebot spider images?

Thanks.

tedster

5:33 am on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here Googlebot gets a 200 status

The first order of business is fixing that so you serve a 404 Not Found for URLs that don't exist. Otherwise the 200s may sink you!

You may never know exactly where those URLs are coming from. Googlebot may have a coding error, or may be testing your server to see how you respond to bad URLs (that happens). Some competitor might have noticed that you are vulnerable and is now posting those bad links somewhere. Or they may be surfing to those bad URLs with the Toolbar turned on. Someone may be directly submitting those URLs. Or you may have a dynamic script on your site that is misfiring somehow.

So the urgent need you have is to start responding with a 404. Then, if you want, you can be a detective and find the source.
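As a starting point for that detective work, here is a minimal Python sketch that pulls the requested path and status code out of log fragments like the ones quoted above, so you can list which URLs are being answered with a 200. It assumes the request, status and size appear in the order shown in the original post; the `/no-such-page` line is a made-up example, and the function name is only illustrative.

```python
import re

# Matches the request/status/size part of an access-log line, e.g.
#   GET /pictures/MA/?C=S;O=A HTTP/1.1 200 584
# The optional quote allows for logs that wrap the request line in quotes.
LOG_FRAGMENT = re.compile(
    r'(?P<method>[A-Z]+) (?P<path>\S+) HTTP/\d\.\d"? '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def requests_with_status(lines, status):
    """Return the paths of all logged requests answered with `status`."""
    hits = []
    for line in lines:
        match = LOG_FRAGMENT.search(line)
        if match and int(match.group("status")) == status:
            hits.append(match.group("path"))
    return hits

sample = [
    "GET /pictures/MA/?C=S;O=A HTTP/1.1 200 584",
    "GET /pictures/PJ/?C=S;O=A HTTP/1.1 200 5706",
    "GET /no-such-page HTTP/1.1 404 312",  # hypothetical line for contrast
]

for path in requests_with_status(sample, 200):
    print(path)
```

Running the same function over the real log with `status=200` and then checking each path by hand is one way to spot URLs that should have returned a 404.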

shogun_ro

10:20 am on Oct 31, 2006 (gmt 0)

10+ Year Member



Then, if you want, you can be a detective and find the source.

How?
I can't find the page where someone linked to a fake page on my site.

hakukoneoptimointi

11:51 am on Oct 31, 2006 (gmt 0)

10+ Year Member



GET /pictures/MA/?C=S;O=A HTTP/1.1 200 584
GET /pictures/PJ/?C=S;O=A HTTP/1.1 200 5706
GET /pictures/IQ/?C=N;O=D HTTP/1.1 200 5904

You get those URLs when you re-sort a directory listing by Name, Last modified, Size or Description.
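To make that concrete, here is a small Python sketch that decodes the query string of such a URL. It assumes the server is Apache with automatic directory indexes (mod_autoindex), which is where the `C=` (column) and `O=` (order) sort links in this format come from; the thread itself does not say which server is in use, and the function name is just for illustration.

```python
from urllib.parse import urlsplit

# Column and order codes used by Apache-style directory listing sort links.
COLUMNS = {"N": "Name", "M": "Last modified", "S": "Size", "D": "Description"}
ORDERS = {"A": "ascending", "D": "descending"}

def describe_sort(url):
    """Describe the sort requested by a listing URL, or return None."""
    query = urlsplit(url).query
    # The parameters are separated by ';' (older style) or '&'.
    pairs = query.replace("&", ";").split(";")
    params = dict(p.split("=", 1) for p in pairs if "=" in p)
    column = COLUMNS.get(params.get("C"))
    order = ORDERS.get(params.get("O"))
    if column is None or order is None:
        return None
    return f"sorted by {column}, {order}"

print(describe_sort("/pictures/MA/?C=S;O=A"))
print(describe_sort("/pictures/IQ/?C=N;O=D"))
```

If that assumption holds, these are real, crawlable index pages of the pictures/ directory rather than image files, which would explain both the 200 status and why the regular Googlebot fetches them.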

johnlim9988

12:26 pm on Oct 31, 2006 (gmt 0)

10+ Year Member



Actually, we already put the following lines in robots.txt to forbid robots from crawling the pictures/ directory:

User-agent: *
Disallow: /pictures/

We cannot understand why Googlebot can still crawl those pages.
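One quick sanity check is to confirm that the rule itself says what you intend. The sketch below feeds the two lines above to Python's standard `urllib.robotparser` and asks whether a crawler identifying itself as Googlebot may fetch a given path. It only tests the rule text, not how any real crawler behaves, and the `allowed` helper is an illustrative name.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules exactly as posted above.
rules = """\
User-agent: *
Disallow: /pictures/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def allowed(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, path)

print(allowed("/pictures/MA/?C=S;O=A"))  # path from the logged requests
print(allowed("/sub1/"))                 # a path outside /pictures/
```

If the first check reports the path as blocked, the rule is fine and the cause is probably elsewhere: for example, the logged requests may predate the rule, the crawler may still be working from an older copy of robots.txt, or the file may not be reachable at the root of the site.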