|Google cached my robots.txt file|
This just killed me - I don't know what to think, so I am putting it out there to see what you guys make of it:
When I do a site: search, one of the pages Google has looks like this:
User-agent: * Disallow: /file/ (this is the title)
User-agent: * Disallow: /file/ (this is the description)
www.webmasterworld.com/robots.txt - 1k - Supplemental Result
Am I missing something here? Seems like there must be some very simple explanation for why G cached my robots.txt, making up its own title and all. The original file is fine, and it's on two lines, unlike the cached version.
Maybe I made a mistake, or misspelled a word?
As far as I know, it only gets indexed if someone somewhere links to it.
Google does index text files, you know.
|Google does index text files, you know. |
I didn't think Google looks at robots.txt as just a text file. Google always looks for robots.txt, and if what you say is true, everyone's robots.txt would have been cached.
Maybe not, altrus. I think there has to be at least "one" link to it somewhere for it to show up in the search.
Not true. I can have a page indexed without any links to it.
Sure, if you continually submit it to Google.
But barring pushing the envelope, only URLs with links stay in the index. That's how Google works; and why no site ever needs submitting; just link to it, and Google will, er, follow the links :)
There's never a need to link to a robots.txt file. Google will find that if the domain is indexed.
On the other hand, having that in Google's index really will do no harm (and no good). Best just to be normal, however - too much experimenting can be harmful to your income ;)
[edited by: Quadrille at 2:55 am (utc) on Jan. 30, 2007]
|On the other hand, having that in Google's index really will do no harm (and no good). Best just to be normal, however - too much experimenting can be harmful to your income ;) |
Yeah, but if Google has indexed a robots.txt, doesn't this mean that it looks at it not as a robots.txt but as just a simple text file? Will it obey the disallow?
P.S. or I can submit a page through the sitemaps :)
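On whether the Disallow still applies: being indexed does not change what the file says. As a rough illustration, here is a minimal Python sketch that uses the standard library's urllib.robotparser to read the two-line rules quoted in the first post. The host www.example.com, the sample paths, and the helper name is_allowed are made up for the example. It only shows how one standards-following parser reads the rules; it says nothing about what Googlebot does internally.

```python
from urllib.robotparser import RobotFileParser

# The rules from the first post, on two lines as in the original file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /file/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in (
        "http://www.example.com/file/page.html",
        "http://www.example.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

With this parser, the run-together form shown in the cached title ("User-agent: * Disallow: /file/" on a single line) is read as one odd User-agent value with no rules attached, so nothing ends up disallowed. The line break in the real file matters.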
I doubt that GoogleBot will disregard your robots.txt as a result of also having it indexed. A quick search for the phrase turns up 3 of the largest web sites (popularity-wise) with an indexed robots.txt file - WW, Whitehouse and Google's own. Brett or an administrator could confirm if robots.txt is being disregarded for this site. I will speculate and say 'it ain't so'.
Is your robots.txt listed in your sitemap file? It seems to me that *might* be considered a link.
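A quick way to check that is to scan the sitemap for a robots.txt entry. Below is a small Python sketch using only the standard library; the sample sitemap, the example.com URLs, and the function name sitemap_lists_robots are invented for illustration, and whether Google would treat such an entry as a link is still only a guess.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content, made up for this example.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/robots.txt</loc></url>
</urlset>
"""


def sitemap_lists_robots(sitemap_xml: str) -> bool:
    """Return True if any <loc> entry in the sitemap points at /robots.txt."""
    root = ET.fromstring(sitemap_xml)
    for element in root.iter():
        # Tags look like "{namespace}loc"; compare only the local name so the
        # check works whichever sitemap schema version the file declares.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            if element.text.strip().lower().endswith("/robots.txt"):
                return True
    return False


if __name__ == "__main__":
    print(sitemap_lists_robots(SAMPLE_SITEMAP))
```

To run it against a real file, read your own sitemap into a string and pass that in instead of SAMPLE_SITEMAP.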
We have robots.txt indexed as well, and it happened 5 days before we got hit by the 950 penalty (22 Dec). At the time, it was listed as our number 1 page on a site: search, though this may have been meaningless. Since then it has gone supplemental. I'm not sure if Google stopped obeying it during that time, but previously (and for many years) our disallowed pages were listed as URLs only in a site: search; then they just disappeared, and they have only just returned to normal.
I'm beginning to wonder if all this was, and is, related to the 950 penalty. Many members reported having their supplementals listed above their main pages when first hit by the 950 penalty... I wonder how many have a robots.txt indexed as well?
|P.S. or I can submit a page through the sitemaps |
You surely can; you surely can. But it remains a pointless exercise. The way to effectively have a page indexed is to link to that page. Period. "Forcing" Google, repeatedly, to include a file that shouldn't be there does not sound like an appropriate use of sitemaps or of your time; indeed, getting a robots.txt indexed does not sound particularly useful, either.
Whether Google cares either way, I couldn't know. But if they do care, you can bet that's not 'care' as in 'fond affection'.
If you 'care' about your site, I'd suggest you stop playing games with it - sooner or later, the dragon will stir.
Never forget the Hogwarts motto: "Draco dormiens nunquam titillandus," which means "Never tickle a sleeping dragon." ;)
[edited by: Quadrille at 11:32 am (utc) on Jan. 30, 2007]
I wonder what Disallow: /robots.txt does?
I don't know.
But I'll bet it's not pretty. ;)
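For what it's worth, a literal-minded parser just treats that line as one more path rule. Here is a small Python sketch with the standard library's urllib.robotparser, assuming a hypothetical host www.example.com and a made-up helper name. It shows only how that one parser reads the rule, not how Google handles it. A crawler has to fetch robots.txt before it can read any rule in it, and the later robots.txt standard (RFC 9309) says the /robots.txt URI is implicitly allowed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the line wondered about above, applied to all crawlers.
RULES = """\
User-agent: *
Disallow: /robots.txt
"""


def may_fetch_robots_txt(rules: str, user_agent: str = "*") -> bool:
    """Ask the stdlib parser whether /robots.txt itself may be fetched."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, "http://www.example.com/robots.txt")


if __name__ == "__main__":
    print(may_fetch_robots_txt(RULES))
```

The stdlib parser applies the rule as a plain prefix match, so it reports the file as off limits. That result is mostly academic, since anything that can see the rule has already fetched the file.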