

Google cached my robots.txt file

     

atlrus

1:44 am on Jan 28, 2007 (gmt 0)

10+ Year Member



This just killed me. I don't know what to think, so I'm putting it out there to see what you guys make of it:

When I do a site: search, one of the pages Google has looks like this:


User-agent: * Disallow: /file/ (this is the title)
User-agent: * Disallow: /file/ (this is the description)
www.webmasterworld.com/robots.txt - 1k - Supplemental Result

Am I missing something here? There must be some very simple explanation for why G cached my robots.txt, making up its own title and all. The original file is fine, and it's on two lines, unlike the cached version:

User-agent: *
Disallow: /file/

Maybe I made a mistake, or misspelled a word?
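One way to rule out a typo is to run the file through a robots.txt parser. The sketch below uses Python's standard-library `urllib.robotparser`, which is not Googlebot and may differ from it in edge cases; `www.example.com` and the page paths are placeholders. It parses the two-line file and, for contrast, the same text collapsed onto one line the way the cached copy displays it.

```python
from urllib.robotparser import RobotFileParser

# The file as it actually sits on the server: two lines.
TWO_LINE = "User-agent: *\nDisallow: /file/"

# The same text run together on one line, as the cached copy displays it.
ONE_LINE = "User-agent: * Disallow: /file/"


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


two_line = make_parser(TWO_LINE)
one_line = make_parser(ONE_LINE)

# Placeholder URLs on a hypothetical host.
blocked = "http://www.example.com/file/page.html"
allowed = "http://www.example.com/index.html"

print(two_line.can_fetch("Googlebot", blocked))  # False: /file/ is disallowed
print(two_line.can_fetch("Googlebot", allowed))  # True: no rule matches
# On one line, the whole string after "User-agent:" is read as the agent
# name and no Disallow rule is recorded, so nothing is blocked.
print(one_line.can_fetch("Googlebot", blocked))  # True
```

If the two-line version reports False for a /file/ URL, the syntax is fine; the one-line text in the search result is most likely just the snippet display collapsing the line break, not a change to the real file.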

g1smd

10:10 pm on Jan 29, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



As far as I know, it only gets indexed if someone somewhere links to it.

Google does index text files, you know.

atlrus

12:10 am on Jan 30, 2007 (gmt 0)

10+ Year Member



Google does index text files, you know.

I didn't think Google treated robots.txt as just another text file. Google always looks for robots.txt, so if what you say is true, everyone's robots.txt would have been cached.

MThiessen

2:29 am on Jan 30, 2007 (gmt 0)

10+ Year Member



Maybe not, atlrus. I think there has to be at least "one" link to it somewhere for it to show up in the search.

atlrus

2:44 am on Jan 30, 2007 (gmt 0)

10+ Year Member



Not true. I can have a page indexed without any links to it.

Quadrille

2:54 am on Jan 30, 2007 (gmt 0)

WebmasterWorld Senior Member quadrille is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Sure, if you continually submit it to Google.

But short of pushing the envelope, only URLs with links stay in the index. That's how Google works, and it's why no site ever needs submitting: just link to it, and Google will, er, follow the links :)

There's never a need to link to a robots.txt file. Google will find that if the domain is indexed.
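Crawlers don't need a link because, by the robots exclusion convention, robots.txt always lives at the root of the host, so its address can be derived from any URL on the site. A small Python sketch of that derivation; the function name `robots_url` and the example URLs are illustrative, and this is not a description of how Google implements it:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url.

    robots.txt sits at the root path of the scheme + host (+ port), so the
    path, query and fragment of the page URL are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/file/page.html?id=7#top"))
# http://www.example.com/robots.txt
```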

On the other hand, having that in Google's index really will do no harm (and no good). Best just to be normal, however - too much experimenting can be harmful to your income ;)

[edited by: Quadrille at 2:55 am (utc) on Jan. 30, 2007]

atlrus

5:03 am on Jan 30, 2007 (gmt 0)

10+ Year Member



On the other hand, having that in Google's index really will do no harm (and no good). Best just to be normal, however - too much experimenting can be harmful to your income ;)

Yeah, but if Google has my robots.txt indexed, doesn't that mean it treats it not as a robots.txt but as just a plain text file? Will it still obey the Disallow?

P.S. or I can submit a page through the sitemaps :)

grandpa

8:24 am on Jan 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I doubt that GoogleBot will disregard your robots.txt as a result of also having it indexed. A quick search for the phrase turns up 3 of the largest web sites (popularity-wise) with an indexed robots.txt file - WW, Whitehouse and Google's own. Brett or an administrator could confirm if robots.txt is being disregarded for this site. I will speculate and say 'it ain't so'.

Is your robots.txt listed in your sitemap file? It seems to me that *might* be considered a link.
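To check that, you could scan the sitemap for a robots.txt entry. A minimal Python sketch, assuming a standard XML sitemap in the sitemaps.org 0.9 namespace; the sample sitemap and the function name `robots_entries` are hypothetical:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by sitemaps.org protocol 0.9 sitemap files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def robots_entries(sitemap_xml: str) -> list[str]:
    """Return every <loc> in the sitemap whose path is /robots.txt."""
    root = ET.fromstring(sitemap_xml)
    hits = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        if urlparse(url).path == "/robots.txt":
            hits.append(url)
    return hits


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/robots.txt</loc></url>
</urlset>"""

print(robots_entries(SAMPLE))  # ['http://www.example.com/robots.txt']
```

A non-empty result only tells you the file is listed; whether Google treats a sitemap entry like a link is still speculation.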

MHes

8:38 am on Jan 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We have robots.txt indexed as well, and it happened 5 days before we got hit by the 950 penalty (22 Dec). At the time, it was listed as our number 1 page on a site: search, though this may have been meaningless. Since then it has gone supplemental. I'm not sure if Google stopped obeying it during that time, but previously (and for many years) our disallowed pages were listed as URLs only when doing a site: search. Then they just disappeared, and they have only just returned to normal.

I'm beginning to wonder if all this was and is related to the 950 penalty. Many members reported having their supplementals listed above their main pages when first hit by the 950 penalty... I wonder how many have a robots.txt indexed as well?

Quadrille

11:31 am on Jan 30, 2007 (gmt 0)

WebmasterWorld Senior Member quadrille is a WebmasterWorld Top Contributor of All Time 10+ Year Member



P.S. or I can submit a page through the sitemaps.

You surely can; you surely can. But it remains a pointless exercise. The way to effectively have a page indexed is to link to that page. Period. "Forcing" Google, repeatedly, to include a file that shouldn't be there does not sound like an appropriate use of sitemaps or of your time; indeed, getting a robots.txt indexed does not sound particularly useful, either.

Whether Google cares either way, I couldn't know. But if they do care, you can bet that's not 'care' as in 'fond affection'.

If you 'care' about your site, I'd suggest you stop playing games with it - sooner or later, the dragon will stir.

Never forget the Hogwarts motto: "Draco dormiens nunquam titillandus," which means "Never tickle a sleeping dragon." ;)

[edited by: Quadrille at 11:32 am (utc) on Jan. 30, 2007]

g1smd

4:54 pm on Jan 30, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I wonder what Disallow: /robots.txt does?
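For what it's worth, a generic parser simply treats /robots.txt like any other path. A Python sketch using the standard-library `urllib.robotparser` (again, not Googlebot; `www.example.com` is a placeholder). A crawler has to fetch robots.txt to read the rule at all, so this only shows how the rule would be matched, not what Google actually does with it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows itself.
rules = ["User-agent: *", "Disallow: /robots.txt"]

parser = RobotFileParser()
parser.parse(rules)

robots = "http://www.example.com/robots.txt"
page = "http://www.example.com/index.html"

print(parser.can_fetch("Googlebot", robots))  # False: the path matches the rule
print(parser.can_fetch("Googlebot", page))    # True: no rule matches
```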

Quadrille

5:22 pm on Jan 30, 2007 (gmt 0)

WebmasterWorld Senior Member quadrille is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I don't know.

But I'll bet it's not pretty. ;)

 
