glitterball

msg:1525650 | 11:03 am on Jan 12, 2005 (gmt 0) |
I have noticed that Google is ignoring my robots.txt also. A few months ago I noticed that Google was spidering many instances of a form that is on my site, e.g. Google had form.asp?id=2, form.asp?id=3, etc. My robots.txt now reads:

User-agent: *
Disallow: form.asp

However, more than 6 months later (and Google has been back and re-cached the page several times), Google is still including the page. A clarification from Google would be nice.
|
LowLevel

msg:1525651 | 12:41 am on Jan 15, 2005 (gmt 0) |
"After all this, Google is passing PR and listing my site as backlinks for the recipients."

The purpose of a robots.txt file is to ask robots not to download files. Was your "go.php" page actually downloaded and indexed by Google? If not, Google has followed your robots.txt directives.

glitterball, your rule should read "Disallow: /form.asp". Try checking your robots.txt file with a robots.txt validator.
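To illustrate why the leading slash matters, here is a minimal Python sketch of strict prefix matching, where a Disallow value is compared against the start of the URL path. The function name `is_disallowed` and the host www.example.com are made up for the example, and this is not how Google's crawler is implemented; real robots may be more lenient with a missing slash.

```python
from urllib.parse import urlsplit


def is_disallowed(disallow_rules, url):
    """Return True if the URL's path starts with any Disallow value.

    Simplified sketch: plain prefix matching only, no wildcards,
    no Allow lines, no per-robot records.
    """
    path = urlsplit(url).path or "/"
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


url = "http://www.example.com/form.asp?id=2"  # hypothetical host

print(is_disallowed(["form.asp"], url))   # False: the path is "/form.asp"
print(is_disallowed(["/form.asp"], url))  # True: the prefix now matches
```

Because a URL path always begins with "/", a rule written as "form.asp" can never be a prefix of it under strict matching, which is one plausible reason the original rule had no effect.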
|
glitterball

msg:1525652 | 1:59 pm on Jan 17, 2005 (gmt 0) |
Hi LowLevel. Thanks for your advice regarding the use of "/" before the filename. Actually, many of the "authority" sites on robots.txt do not specify the need for a preceding forward slash, and my robots.txt file seems to validate okay without it. Anyway, I have updated my robots.txt with your suggestion, so hopefully the "/" will do the trick.
|
dedmond29

msg:1525653 | 4:54 pm on Jan 18, 2005 (gmt 0) |
Hello - I am having the same problem, but with a cgi-bin folder whose query results Google is indexing. I did a site check on Google and am certain that it has indexed these, which I do not want it to do. Here is my current robots.txt file:

User-agent: *
Disallow: /cgi-bin/

Should I specify the Googlebot as well? Thanks!
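On the question of naming Googlebot separately: under the robots.txt convention, a "User-agent: *" record applies to any robot that has no more specific record of its own. The sketch below checks this with Python's standard urllib.robotparser module. It shows how that parser reads the rules, not how Google's crawler behaves, and the helper name `allowed` and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "Googlebot" has no record of its own, so the "*" record is used.
print(allowed("Googlebot", "http://www.example.com/cgi-bin/search.cgi"))  # False
print(allowed("Googlebot", "http://www.example.com/index.html"))          # True
```

If the parser reports the cgi-bin URL as blocked, the rules themselves are probably fine, and the listings may be URLs that were picked up before the rule existed or that are shown without having been downloaded.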
|
|