Sitemaps, Meta Data, and robots.txt Forum

Google's Nonstandard robots.txt is Not Valid
Google's robots.txt Does Not Validate
forak
msg:1529421
4:33 pm on Aug 6, 2005 (gmt 0)

Now, check out Google's robots.txt.
I thought it was pretty clear to everybody that the Allow: tag was not part of the robots.txt standard...
If you run www.google.com/robots.txt through the validator at [searchengineworld.com], you simply get an error saying that Google uses the Allow: tag in its robots.txt!

Does this mean that Google's agents actually honor the Allow: tag?
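
The validator error presumably comes from a strict reading of the original 1994 robots exclusion standard, which defined only the User-agent and Disallow fields. Below is a minimal Python sketch of that kind of strict check; `validate_strict` and the sample file are hypothetical, and this is not the searchengineworld.com validator's actual code.

```python
# Fields defined by the original 1994 robots exclusion standard.
STANDARD_FIELDS = {"user-agent", "disallow"}


def validate_strict(robots_txt: str) -> list[str]:
    """Return one error message per line whose field is not in STANDARD_FIELDS."""
    errors = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # blank or comment-only line
        field, sep, _value = line.partition(":")
        if not sep:
            errors.append(f"line {lineno}: missing ':' separator")
        elif field.strip().lower() not in STANDARD_FIELDS:
            errors.append(f"line {lineno}: unknown field '{field.strip()}'")
    return errors


# Hypothetical file: the paths are illustrative, not copied from Google's robots.txt.
sample = """User-agent: *
Allow: /example-allowed/
Disallow: /search
"""
for err in validate_strict(sample):
    print(err)  # line 2: unknown field 'Allow'
```

Note that the 1994 document also says unrecognised headers are ignored, so flagging them as errors is a choice made by a strict validator rather than something every robot does.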

jatar_k
msg:1529422
4:22 pm on Aug 7, 2005 (gmt 0)

It isn't a supported tag, but that doesn't mean they can't use it.

Dijkgraaf
msg:1529423
10:24 pm on Aug 7, 2005 (gmt 0)

Well, obviously they can and have used it; however, they risk other bots/spiders getting confused and disregarding all of their rules.
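
To picture that risk, here is a minimal Python sketch of two hypothetical crawlers reading the same record: one that understands Allow and one that knows only Disallow and skips any line it doesn't recognise. The precedence rule (longest matching path wins, with Allow winning a tie) is an assumption modeled on how Google documents its own matching today; other crawlers may evaluate rules in file order instead. The paths are made up, and wildcards are not handled.

```python
def parse_rules(robots_txt: str, understands_allow: bool) -> list[tuple[str, str]]:
    """Return (directive, path) pairs, keeping only directives this crawler recognises.

    Simplification: every record is treated as applying to this crawler,
    so User-agent lines are skipped along with any other unknown field.
    """
    known = {"disallow", "allow"} if understands_allow else {"disallow"}
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        if sep and field in known:
            rules.append((field, value.strip()))
    return rules


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching prefix wins; on a tie, allow beats disallow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing in this sketch
        if len(prefix) > best_len or (len(prefix) == best_len and directive == "allow"):
            best_len, allowed = len(prefix), directive == "allow"
    return allowed


# Hypothetical record mixing the standard Disallow with the nonstandard Allow.
record = """User-agent: *
Disallow: /catalog
Allow: /catalog/public
"""

path = "/catalog/public/page.html"
print(is_allowed(parse_rules(record, understands_allow=True), path))   # True
print(is_allowed(parse_rules(record, understands_allow=False), path))  # False
```

The crawler that skips Allow ends up more restrictive than the file's author intended. That is a milder failure than rejecting the whole file, but it shows how one nonstandard line can make two bots disagree about the same URL.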

effisk
msg:1529424
1:35 pm on Aug 17, 2005 (gmt 0)

A bit off-topic, but I see /search listed in Google's robots.txt, and if you look at these results: [search.msn.com...] you'll see a Google search page listed there, which means search.msn has indexed a disallowed url. It's funny to see a Google results page listed in a Microsoft one :)

msnbot has probably picked up this link from several other websites...

Lord Majestic
msg:1529425
1:54 pm on Aug 17, 2005 (gmt 0)

> you'll see a Google search page listed there.

It's a link that must have been found on one of the pages crawled elsewhere; robots.txt only regulates which pages should NOT be retrieved, not which URLs other sites may use to link to the site.
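
A minimal sketch of that point using Python's standard `urllib.robotparser` module: robots.txt is consulted when deciding whether to retrieve a URL, not when a link to it is discovered. The single `Disallow: /search` rule stands in for the /search line mentioned above; the `ExampleBot` user-agent and the discovered URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# One rule in the spirit of the /search line discussed in this thread.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /search"])

# Hypothetical URLs that a crawler found as links on other sites' pages.
discovered = [
    "http://www.google.com/search?q=robots.txt",
    "http://www.google.com/intl/en/about.html",
]

seen = []      # every URL the crawler has learned about, whatever robots.txt says
to_fetch = []  # only the URLs it is permitted to retrieve
for url in discovered:
    seen.append(url)  # discovering the link involves no robots.txt check at all
    if robots.can_fetch("ExampleBot", url):
        to_fetch.append(url)

print(len(seen), to_fetch)
```

Both URLs end up in `seen`; only the second is queued for retrieval.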

PatrickDeese
msg:1529426
1:59 pm on Aug 17, 2005 (gmt 0)

> which means search.msn has indexed a disallowed url.

I think your conclusion is wrong: MSN has not indexed the page, only the URL. Google will also show URL-only results, even for pages that are banned via robots.txt.
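
A URL-only listing can be modeled as an index entry that has a URL but no page content. In this minimal Python sketch, which is not how any particular engine is built, a blocked URL still gets an entry because a link pointed to it; only the fetch is skipped. `IndexEntry`, `build_index`, `ExampleBot`, the sample URLs, and the stub `fetch` function (no network access) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional
from urllib.robotparser import RobotFileParser


@dataclass
class IndexEntry:
    url: str
    content: Optional[str] = None  # None: the page itself was never retrieved

    @property
    def url_only(self) -> bool:
        return self.content is None


def build_index(
    discovered: Iterable[str],
    robots: RobotFileParser,
    fetch: Callable[[str], str],
) -> Dict[str, IndexEntry]:
    """Record every discovered URL, fetching content only where robots.txt allows."""
    index = {}
    for url in discovered:
        body = fetch(url) if robots.can_fetch("ExampleBot", url) else None
        index[url] = IndexEntry(url, body)
    return index


robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /search"])

index = build_index(
    ["http://www.google.com/search?q=robots.txt", "http://www.google.com/intl/en/about.html"],
    robots,
    fetch=lambda url: f"<html>stub page for {url}</html>",  # stand-in for a real HTTP GET
)
for entry in index.values():
    print(entry.url, "URL-only" if entry.url_only else "full page")
```

Whether such an entry ever shows up in results is a policy decision for each engine; the sketch only shows that the URL can be known without the page being crawled.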

© Webmaster World 1996-2014 all rights reserved