Forum Moderators: Robert Charlton & goodroi


What's the point of robots.txt?

Pages still show up in the SERPs.


maccas

7:54 am on Apr 14, 2006 (gmt 0)

10+ Year Member



I have my entire cgi-bin disallowed in robots.txt, so why does Google still list my cgi pages? I just had someone come to my site by typing "script.cgi?something=", which exploits a vulnerability in a particular script that hasn't been patched yet, so mysite.com/cgi-bin/script.cgi?something= gets returned. Grrrr.
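(For reference, a robots.txt that blocks a cgi-bin typically looks like the two-line sketch below. The actual file isn't quoted in this thread, so the path is illustrative.)

    User-agent: *
    Disallow: /cgi-bin/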

LifeinAsia

3:37 pm on Apr 14, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If someone directly typed script.cgi?something=, which is a known vulnerability, don't you think it was perhaps a hacker, and that it had absolutely nothing to do with Google?

Pages can take months to drop out of the SERPs.

maccas

7:25 pm on Apr 14, 2006 (gmt 0)

10+ Year Member



"known vulnerability, don't you think that perhaps it could be a hacker and have absolutely nothing to do with Google"

Yes, it would have been a hacker. And yes, it is Google's fault: they served up a URL that is disallowed in robots.txt and always has been.

"Pages can take months to drop out of the SERPs", I have always have had a robots.txt with my cgi-bin disallowed since day one.

g1smd

11:50 pm on Apr 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If something links to the URL, then Google will list the URL as a URL-only entry in the SERPs. They don't try to index content at the URL by crawling it, but they do list the fact that it supposedly exists.

Yahoo, on the other hand, will attempt to build a fully indexed SERP entry by taking anchor text pointing at the URL and using it as the title in the SERP (unless that anchor text is "click here" or something similar, in which case they will ignore it).

abates

10:55 am on Apr 15, 2006 (gmt 0)

10+ Year Member



robots.txt contains instructions about which URIs bots are not allowed to download from your server.

robots.txt does not give any instructions about which URLs may be listed in a search engine's index. It controls only the bots, not the listings they compile. If Googlebot finds a link to a URL that it is banned from fetching, nothing prevents Google from listing that URL in its index, even though they don't know what's at the other end.
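(As a concrete illustration, and not from this thread: the sketch below shows roughly what a well-behaved crawler does with robots.txt, using Python's standard urllib.robotparser with the disallow rule inlined so it runs offline. Note the only question it answers is whether the bot may fetch the URL; nothing here governs what gets listed.)

    # Sketch of how a robots.txt-aware crawler decides whether it may
    # fetch a URL. The rules are inlined instead of fetched over HTTP.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /cgi-bin/",
    ])

    url = "http://www.example.com/cgi-bin/script.cgi?something="
    print(rp.can_fetch("Googlebot", url))  # False: the bot must not download it
    # "False" only forbids downloading. A search engine that saw a link
    # to this URL elsewhere may still list the bare URL in its index.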

wmuser

3:57 pm on Apr 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why don't you simply patch that script?

abates

12:50 am on Apr 18, 2006 (gmt 0)

10+ Year Member



Having said what I said above, you might be able to log into Google's removal tool and use the "Remove pages, subdirectories or images using a robots.txt file" option to mass-remove pages from the disallowed directory. Disclaimer: I haven't tried this myself, so I don't know whether it will actually do that.

g1smd

12:52 am on Apr 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You cannot "remove" it.

Google merely "hides" it for 90 or 180 days and then adds it back into the public index.