Google's Nonstandard robots.txt is Not Valid

Google's robots.txt does not validate

     
4:33 pm on Aug 6, 2005 (gmt 0)

New User

10+ Year Member

joined:Aug 4, 2005
posts:2
votes: 0


Now, check out Google's robots.txt.
I thought it was pretty clear to everybody that the Allow: directive is not part of the robots.txt standard...
If you try to validate [searchengineworld.com] www.google.com/robots.txt, you simply get an error saying that Google uses the Allow: directive in its robots.txt!

Does this mean that Google's agents actually use the Allow: directive?
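
For what it's worth, some parsers do act on Allow: lines. Here is a minimal sketch using Python's standard urllib.robotparser, which, as far as I know, applies rules in file order so the first matching line wins. The rules, the ExampleBot agent name, the example.com host and the allowed() helper are all made up for illustration; this is not a copy of Google's actual file, and it says nothing about how Googlebot itself evaluates Allow:.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, not Google's real robots.txt.
RULES = """\
User-agent: *
Allow: /search/about
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the hypothetical agent may fetch the given path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/search/about", "/search/results", "/"):
    print(path, allowed(path))
```

A parser that knows Allow: lets /search/about through while still blocking the rest of /search.
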

4:22 pm on Aug 7, 2005 (gmt 0)

Administrator

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 24, 2001
posts:15755
votes: 0


It isn't a supported directive, but that doesn't mean they can't use it.

10:24 pm on Aug 7, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


Well, obviously they can and do use it. However, they risk other bots/spiders getting confused and disregarding all their rules.
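
Whether a bot gets confused depends on how it was written. The original 1994 robots exclusion standard says unrecognised headers are ignored, so a well-behaved bot should skip Allow: rather than throw the whole file away. A rough sketch of such a tolerant parser follows; parse_disallows() and the sample rules are hypothetical, and the record handling is deliberately simplified (it only looks at "User-agent: *" and does not treat blank lines as record separators).

```python
def parse_disallows(text):
    """Collect Disallow prefixes for 'User-agent: *', skipping unknown fields.

    Simplified sketch: unknown fields such as Allow are recorded and
    ignored instead of invalidating the rest of the file.
    """
    disallows = []
    skipped = []
    in_star_record = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            in_star_record = (value == "*")
        elif field == "disallow":
            if in_star_record and value:
                disallows.append(value)
        else:
            skipped.append(field)  # e.g. "allow": ignored, not fatal
    return disallows, skipped


# Hypothetical sample, not Google's real file.
SAMPLE = "User-agent: *\nAllow: /search/about\nDisallow: /search\nDisallow: /groups\n"
rules, ignored = parse_disallows(SAMPLE)
print(rules, ignored)
```

A bot built this way still honours every Disallow: line; it just never learns about the exception that Allow: was meant to carve out.
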
1:35 pm on Aug 17, 2005 (gmt 0)

New User

10+ Year Member

joined:Aug 17, 2005
posts:16
votes: 0


A bit off-topic, but I see /search listed in Google's robots.txt, and if you look at these results: [search.msn.com...] you'll see a Google search page listed there.
Which means search.msn has indexed a disallowed URL. It's funny to see a Google result page listed in a Microsoft one :)

The msnbot has probably indexed this link after finding it on several other websites...

1:54 pm on Aug 17, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


> you'll see a google search page listed there.

It's a link that must have been found on one of the crawled pages elsewhere. robots.txt only regulates which pages should NOT be retrieved, not which URLs other sites may use to link to the site.

1:59 pm on Aug 17, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 6, 2003
posts:2523
votes: 0


> which means search.msn has indexed a disallowed url.

I think your conclusion is wrong: MSN has not indexed the page, only the URL. Google will also show URL-only results, even for pages that are banned via robots.txt.
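
To make the retrieved-versus-listed distinction concrete, here is a toy sketch. It assumes a crawler that checks robots.txt before fetching but still remembers URLs it saw in links. The record_link() helper, the ExampleBot agent, the example.com URLs and the dictionary "index" are all invented for illustration; this is not how MSN or Google actually build their indexes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: /search is disallowed for every agent.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /search"])


def record_link(index, url, agent="ExampleBot"):
    """Add a URL discovered through a link on some other page to a toy index."""
    if robots.can_fetch(agent, url):
        index[url] = "fetched: page content may be indexed"
    else:
        # The page is never retrieved, but the URL itself is still known.
        index[url] = "URL only: known from links, never retrieved"
    return index


index = {}
record_link(index, "http://www.example.com/search/results")
record_link(index, "http://www.example.com/about.html")
for url, status in index.items():
    print(url, "->", status)
```

The disallowed URL still ends up in the toy index, just without any page content behind it.
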

 
