Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt Content
almo136

5+ Year Member



 
Msg#: 4084471 posted 10:44 pm on Feb 21, 2010 (gmt 0)

I created a robots.txt file and added this content to it:

sitemap: http://example.com/sitemap/sitemap.xml
User-agent: *
Disallow: /enable-cookies
Disallow: /provacy-policy

Does this seem correct? I'm trying to show the search engines where my sitemap is and block those 2 pages from being crawled.

The 2 pages I want to exclude are included in the sitemap (it gets automatically generated by my cms)

Thanks.

 

penders

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4084471 posted 11:22 am on Feb 22, 2010 (gmt 0)

sitemap: http://example.com/sitemap/sitemap.xml
User-agent: *
Disallow: /enable-cookies
Disallow: /provacy-policy


I have only seen 'Sitemap:' (with a capital 'S') - although I'm not sure whether it is case-sensitive or not? (All other directives have a capital first letter)

I would also put the Sitemap: directive last (and separated by a blank line - used to delimit records in robots.txt). I have read that not all robots support the Sitemap: directive, so in order to prevent these bots from prematurely aborting processing of the robots.txt file it should appear last.

The 2 pages I want to exclude are included in the sitemap (it gets automatically generated by my cms)


IMHO, if they are disallowed in robots.txt then the search engine should be prevented from accessing them, regardless of whether they are linked to from elsewhere, or included in your sitemap - but I don't know for sure; just my opinion. Ideally they should not be in your sitemap.
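
For what it's worth, the layout suggested above can be sanity-checked offline with Python's standard-library `urllib.robotparser`. This is only a rough sketch under a few assumptions: the robots.txt text is the one from this thread with the Sitemap line moved to the end, the `sitemap_urls` list is a made-up stand-in for whatever the CMS generates, and Python's parser is not guaranteed to behave the same way as any particular search engine's crawler.

```python
from urllib.robotparser import RobotFileParser

# robots.txt from this thread, with the Sitemap line moved last
# and separated from the record by a blank line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /enable-cookies
Disallow: /provacy-policy

Sitemap: http://example.com/sitemap/sitemap.xml
"""

# Hypothetical sitemap entries (assumption: a stand-in for the CMS output).
sitemap_urls = [
    "http://example.com/",
    "http://example.com/enable-cookies",
    "http://example.com/provacy-policy",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# URLs that appear in the sitemap but that robots.txt disallows for all bots.
blocked = [url for url in sitemap_urls if not parser.can_fetch("*", url)]

print(parser.site_maps())  # sitemap locations found (needs Python 3.8+)
print(blocked)             # candidates to drop from the sitemap
```

Python's parser lowercases directive names before matching them, so it reads `sitemap:` and `Sitemap:` the same way; that says nothing certain about how other bots treat the case.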

winwinmantra



 
Msg#: 4084471 posted 1:54 pm on Feb 23, 2010 (gmt 0)

Why don't you use Google Webmaster Tools (www.google.com/webmasters/tools/) to submit the sitemap? It works more effectively and faster than the robots.txt file.

Hyder
<snip>

[edited by: goodroi at 2:21 pm (utc) on Feb 23, 2010]
[edit reason] Please no signature links [/edit]

almo136

5+ Year Member



 
Msg#: 4084471 posted 2:03 pm on Feb 23, 2010 (gmt 0)

Because that will only submit it to Google. Robots.txt will allow other search engines to find it too.

penders

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4084471 posted 11:50 am on Feb 25, 2010 (gmt 0)

Just a thought... with the sitemap: directive in robots.txt do you still need to resubmit it to the search engines (via HTTP request for example)?

After all, if you submit the sitemap through Google Webmaster Tools in the beginning, you still need to resubmit it when it changes.

almo136

5+ Year Member



 
Msg#: 4084471 posted 11:56 am on Feb 26, 2010 (gmt 0)

My CMS generates the sitemap and submits it automatically to Google and Yahoo.

amythepoet

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4084471 posted 1:33 am on Apr 9, 2010 (gmt 0)

I am missing a robots.txt file on my site.

Can you please tell me if it is necessary, and why? I'm confused.

thank you

phranque

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4084471 posted 8:47 am on Apr 9, 2010 (gmt 0)

by default the search engine bots will crawl your site unless excluded by some technical method.
a missing or empty robots.txt file is equivalent to permission to crawl.
an empty robots.txt is preferable since the frequent requests for that file will return a 200 OK status code response instead of a 404 Not Found.
if you wish to exclude some or all bots from crawling a part of your site, the robots.txt file is one of the methods available.
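
as a rough illustration of the "empty file means permission to crawl" point, here is a small sketch using Python's standard-library `urllib.robotparser`. the bot name `SomeBot` and the example.com URLs are made up for the example, and this only shows how Python's parser reads an empty file, not how any given search engine behaves.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # an empty robots.txt: no records, no rules

# With no rules at all, every user agent may fetch every path.
print(parser.can_fetch("*", "http://example.com/any/page"))
print(parser.can_fetch("SomeBot", "http://example.com/"))  # hypothetical bot name
```

both checks come back True, since there is nothing in the file to exclude anyone.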

amythepoet

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4084471 posted 11:12 am on Apr 9, 2010 (gmt 0)

ok, I got it, thank you
