
Forum Moderators: goodroi


Problem with robots.txt?

     

andrewc

11:41 am on Feb 20, 2014 (gmt 0)



Hi there,

I received this message in my Webmaster Tools account, but I don't actually understand what it's about.

Thanks in advance guys!

Here is the message from the Blocked URLs section:


Blocked URLs

If your site has content you don't want Google or other search engines to access, use a robots.txt file to specify how search engines should crawl your site's content.
Check to see that your robots.txt is working as expected. (Any changes you make to the robots.txt content below will not be saved.)


robots.txt analysis
Value | Result
Line 4: Sitemap: http://www.example.com/ | Valid Sitemap reference detected
Line 5: Sitemap: http://www.example.com/ | Valid Sitemap reference detected
Line 6: Sitemap: http://www.example.com/ | Valid Sitemap reference detected
Line 7: Sitemap: http://www.example.com/ | Valid Sitemap reference detected
Line 8: Sitemap: http://www.example.com/ | Valid Sitemap reference detected
Line 9: Sitemap: http://www.example.com/ | Valid Sitemap reference detected

phranque

4:09 pm on Feb 20, 2014 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, andrewc!


Here is the message from the Blocked URLs section:


Blocked URLs

that would be normal if you have Disallow directives in your robots.txt file

andrewc

4:28 pm on Feb 20, 2014 (gmt 0)



I used to have them, but now I've removed them. Could it be related to the URLs I removed from the Google index in the last two days? (around 200)

lucy24

11:04 pm on Feb 20, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Well, it's possible they sent you the wrong form letter :) Crawling and indexing are separate activities. It's also possible that if you remove a lot of URLs at once, they take an extra look at robots.txt to see if anything there has changed.

What did happen to those 200 pages? Are they roboted-out, physically removed (404 or 410), or does each one have a robots noindex meta tag?

tangor

11:18 pm on Feb 20, 2014 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



I'm just looking at two terms: robots.txt and Sitemap... Not the same things... What's up?

phranque

12:29 am on Feb 21, 2014 (gmt 0)




In Google terminology, blocking means excluding from crawling.
Removing a URL from the index doesn't block it.
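One way to see the crawl-blocking half of that distinction is Python's standard-library robots.txt parser. This is a hypothetical ruleset, not andrewc's actual file: a Disallow rule stops a compliant crawler from fetching matching URLs, but says nothing about whether they stay in the index.

```python
from urllib import robotparser

# Hypothetical robots.txt: /private/ is blocked from crawling.
# Note this only affects crawling; it does not remove already
# indexed URLs from search results.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "http://www.example.com/private/page"))  # False
print(rp.can_fetch("Googlebot", "http://www.example.com/public/page"))   # True
```

A URL under /private/ can still appear in results (as a bare link, without a snippet) if Google learned about it from elsewhere; deindexing requires a noindex or a removal request.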

andrewc

6:49 am on Feb 21, 2014 (gmt 0)



@lucy24 we did a redesign and some of the URLs are old categories that don't exist anymore. Since last week they have been physically removed (410), but I also removed them with Google's URL removal tool to speed things up.
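For anyone wanting to do the same, a 410 for a removed category can be served with an Apache .htaccess rule like this (the path here is hypothetical, not necessarily what andrewc used):

```
# Return 410 Gone for anything under the removed category
RedirectMatch gone ^/old-category/
```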

lucy24

8:40 am on Feb 21, 2014 (gmt 0)




Good. If it doesn't exist, they can't possibly change their minds three months down the line. (Or can they? Has anyone ever found a search engine bringing back pages that it hasn't crawled in months?)

I'm just looking at two terms: robots.txt and Sitemap

I think it means that the robots.txt file includes references to six (!) sitemaps. It isn't part of the robots.txt standard-- well, nothing is except "Disallow" --but it's to google's advantage to recognize it ;)

Speaking of which: Make double-sure that no sitemap, anywhere, mentions those old pages.

andrewc

4:48 pm on Feb 24, 2014 (gmt 0)



Ok, I cleared the robots.txt.

Now it looks like this:

User-agent: *
Allow: /


The problem is that there are still some blocked URLs, at least according to Google. How long should it take for those blocked URLs to clear from WMT?

lucy24

9:34 pm on Feb 24, 2014 (gmt 0)




It should happen right away if you "fetch as googlebot". But don't say
Allow: /

say
Disallow:

(nothing after "Disallow:")

If you can do something within the strictest confines of the robots.txt standard, do so.
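The two forms really are equivalent in effect, which can be checked with Python's standard-library parser. A minimal sketch: an empty Disallow value matches nothing, so everything is crawlable, same as `Allow: /` but using only the original standard's vocabulary.

```python
from urllib import robotparser

# The allow-everything robots.txt lucy24 recommends: an empty
# Disallow value matches no URLs, so nothing is blocked.
rules = [
    "User-agent: *",
    "Disallow:",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "http://www.example.com/any/page"))  # True
```

The advantage of `Disallow:` over `Allow: /` is that Disallow is the one directive every robots.txt parser understands, whereas Allow is an extension that older or simpler crawlers may ignore.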

andrewc

8:09 pm on Feb 25, 2014 (gmt 0)



Ok, I just updated the robots.txt. Thanks for the tip, lucy24!