
Sitemaps, Meta Data, and robots.txt Forum

    
Problem with robots.txt?
andrewc




msg:4647085
 11:41 am on Feb 20, 2014 (gmt 0)

Hi there,

I received this message in my Webmaster Tools account, but I don't actually understand what it's about.

Thanks in advance guys!

Here is the message from the Blocked URLs section:


Blocked URLs

If your site has content you don't want Google or other search engines to access, use a robots.txt file to specify how search engines should crawl your site's content.
Check to see that your robots.txt is working as expected. (Any changes you make to the robots.txt content below will not be saved.)


robots.txt analysis
Value                                       Result
Line 4: Sitemap: http://www.example.com/    Valid Sitemap reference detected
Line 5: Sitemap: http://www.example.com/    Valid Sitemap reference detected
Line 6: Sitemap: http://www.example.com/    Valid Sitemap reference detected
Line 7: Sitemap: http://www.example.com/    Valid Sitemap reference detected
Line 8: Sitemap: http://www.example.com/    Valid Sitemap reference detected
Line 9: Sitemap: http://www.example.com/    Valid Sitemap reference detected
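
For anyone puzzled by what a report like this is listing, here is a minimal Python sketch of how Sitemap lines and their line numbers could be pulled out of a robots.txt file. It is not Google's code. The function name `sitemap_lines`, the sample file, and the sitemap filenames are all made up for illustration (the real filenames are not shown above).

```python
def sitemap_lines(robots_txt):
    """Return (line_number, url) pairs for every Sitemap line (1-indexed)."""
    found = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()        # drop trailing comments
        field, sep, value = line.partition(":")    # split on the first colon only
        if sep and field.strip().lower() == "sitemap":
            found.append((number, value.strip()))
    return found


# Hypothetical robots.txt; the filenames are placeholders, not the poster's.
SAMPLE = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap-a.xml
Sitemap: http://www.example.com/sitemap-b.xml
"""

for number, url in sitemap_lines(SAMPLE):
    print(f"Line {number}: Sitemap: {url}")
```

Each entry in the report is just a Sitemap line that parsed cleanly; by itself it says nothing about URLs being blocked.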

 

phranque




msg:4647156
 4:09 pm on Feb 20, 2014 (gmt 0)

welcome to WebmasterWorld, andrewc!


Here is the message from the Blocked URLs section:


Blocked URLs

that would be normal if you have Disallow directives in your robots.txt file

andrewc




msg:4647167
 4:28 pm on Feb 20, 2014 (gmt 0)

I used to have them, but I have now removed them. Could it be related to the URLs I removed from the Google index in the last two days (around 200)?

lucy24




msg:4647291
 11:04 pm on Feb 20, 2014 (gmt 0)

Well, it's possible they sent you the wrong form letter :) Crawling and indexing are separate activities. It's also possible that if you remove a lot of URLs at once, they take an extra look at robots.txt to see if anything there has changed.

What did happen to those 200 pages? Are they roboted-out, physically removed (404 or 410), or does each one have a noindex meta tag?

tangor




msg:4647294
 11:18 pm on Feb 20, 2014 (gmt 0)

I'm just looking at two terms: robots.txt and Sitemap... Not the same things... What's up?

phranque




msg:4647312
 12:29 am on Feb 21, 2014 (gmt 0)

In Google terminology, blocking means excluding a URL from crawling.
Removing a URL from the index doesn't block it.

andrewc




msg:4647359
 6:49 am on Feb 21, 2014 (gmt 0)

@lucy24 We did a redesign, and some of the URLs are old categories that don't exist anymore. Since last week they have been physically removed (410), but I also removed them from Google's index to speed things up.

lucy24




msg:4647390
 8:40 am on Feb 21, 2014 (gmt 0)

Good. If it doesn't exist, they can't possibly change their minds three months down the line. (Or can they? Has anyone ever found a search engine bringing back pages that it hasn't crawled in months?)

I'm just looking at two terms: robots.txt and Sitemap

I think it means that the robots.txt file includes references to six (!) sitemaps. The Sitemap line isn't part of the original robots.txt standard -- well, nothing is except "User-agent" and "Disallow" -- but it's to Google's advantage to recognize it ;)

Speaking of which: Make double-sure that no sitemap, anywhere, mentions those old pages.
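
A rough way to do that double-check, sketched in Python with only the standard library: parse each sitemap and intersect its `<loc>` entries with the list of removed URLs. The function name `stale_urls`, the sample sitemap, and the example URLs are assumptions for illustration; in practice you would feed in your own sitemap files and your own list of 410'd pages.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_urls(sitemap_xml, removed_urls):
    """Return the removed URLs that are still listed in the given sitemap."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }
    return sorted(listed & set(removed_urls))


# Hypothetical sitemap that still mentions one retired category page.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/old-category/</loc></url>
</urlset>"""

REMOVED = [
    "http://www.example.com/old-category/",
    "http://www.example.com/another-old-category/",
]

print(stale_urls(SAMPLE_SITEMAP, REMOVED))
```

Anything the function returns is a URL that a sitemap is still advertising even though the page is gone.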

andrewc




msg:4648633
 4:48 pm on Feb 24, 2014 (gmt 0)

OK, I cleared the robots.txt.

Now it looks like this:

User-agent: *
Allow: /


The problem is that there are still some blocked URLs, or at least Google says so. How long should it take for those blocked URLs to clear from WMT?

lucy24




msg:4648732
 9:34 pm on Feb 24, 2014 (gmt 0)

It should happen right away if you use "Fetch as Googlebot". But don't say
Allow: /
say
Disallow:
(nothing after "Disallow:")

If you can do something within the strictest confines of the robots.txt standard, do so.
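
To see that the two forms come out the same for a parser that understands both, here is a small sketch using Python's standard `urllib.robotparser`. This is Python's parser, not Googlebot's, so treat it as an illustration rather than proof of what any search engine does. The helper name `allows` and the test URL are made up.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt, url, agent="Googlebot"):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_FORM = "User-agent: *\nAllow: /\n"       # relies on the Allow extension
DISALLOW_FORM = "User-agent: *\nDisallow:\n"   # empty Disallow: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"     # for contrast: everything is blocked

url = "http://www.example.com/some-page.html"  # hypothetical URL
for name, text in [("Allow: /", ALLOW_FORM),
                   ("Disallow:", DISALLOW_FORM),
                   ("Disallow: /", BLOCK_ALL)]:
    print(name, "->", allows(text, url))
```

The empty "Disallow:" gets the same result while using only the original directives, so a crawler that has never heard of "Allow" will still read it correctly.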

andrewc




msg:4649058
 8:09 pm on Feb 25, 2014 (gmt 0)

OK, I just updated the robots.txt. Thanks for the tip, lucy24!

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved