Drew_Black

msg:3469078 | 2:53 pm on Oct 4, 2007 (gmt 0) |
I think there's a bug in GWT (Google Webmaster Tools). I reported it in one of their forums. I've noticed that if there's a 301 redirect from Page A to Page B and Page A is disallowed in robots.txt, then Page B will be reported as "restricted by robots.txt" in GWT.
|
silverbytes

msg:3473194 | 10:02 pm on Oct 9, 2007 (gmt 0) |
I have exactly the same issue and sadly don't know how to fix it. All my sites are OK, but my Blogger blog has 19 errors of this kind:

http://mysite.blogspot.com/search/label/alojamiento URL restricted by robots.txt, Sep 30, 2007

I don't think I have a robots.txt there... How do we fix that?
|
Drew_Black

msg:3473409 | 3:40 am on Oct 10, 2007 (gmt 0) |
I don't think you can fix it if someone is doing a 301 redirect to your site from a site that you don't control. That is the nature of the bug: the destination page appears as disallowed by robots.txt, when it's the source page that issued the 301 that should really be listed.

Example: example.com has an outbound traffic tracking script that records outbound clicks using a page like http://example.com/click.php. click.php is listed under Disallow: in example.com's robots.txt for Googlebot (or for *). When a user clicks the link to http://yoursite.com/yourpage.html, click.php records the click and redirects the user to your site with an HTTP 301. For some reason GWT is reporting the destination URL as blocked by robots.txt.

I have this happening on many hundreds of links.
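To make the scenario concrete, here is a minimal sketch using Python's standard urllib.robotparser. Both robots.txt bodies are assumptions written for illustration (neither site's real file is known), and the check runs entirely in memory. It only shows which URL the rules actually block; it does not reproduce how GWT builds its report.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Assumed robots.txt for example.com: the click-tracking script is disallowed.
EXAMPLE_COM_ROBOTS = """\
User-agent: *
Disallow: /click.php
"""

# Assumed robots.txt for yoursite.com: nothing is disallowed.
YOURSITE_COM_ROBOTS = """\
User-agent: *
Disallow:
"""

source_rules = build_parser(EXAMPLE_COM_ROBOTS)
destination_rules = build_parser(YOURSITE_COM_ROBOTS)

# The redirecting page (source) is the one covered by a Disallow rule.
source_allowed = source_rules.can_fetch(
    "Googlebot", "http://example.com/click.php"
)
# The redirect target (destination) is not blocked by its own robots.txt.
destination_allowed = destination_rules.can_fetch(
    "Googlebot", "http://yoursite.com/yourpage.html"
)

print("click.php crawlable:", source_allowed)
print("yourpage.html crawlable:", destination_allowed)
```

Under these assumed rules only click.php is blocked, which is why a report listing yourpage.html as restricted looks wrong.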
|
adfree

msg:3473553 | 7:30 am on Oct 10, 2007 (gmt 0) |
Thanks Drew, know of any negative impact?
|
Susan Moskwa

msg:3474290 | 11:10 pm on Oct 10, 2007 (gmt 0) |
Hi folks-- I work with the Google webmaster tools team and can clarify some of these issues for you:

All Blogger blogs have a robots.txt file automatically created for them (add "/robots.txt" to the end of your blog's URL and you'll see yours). These files all disallow the /search directory, which is part of the path when you're viewing all of your blog posts that have a particular label. Disallowing crawlers from this section of your blog basically keeps them from crawling and indexing the same blog post in multiple places (on its permalink URL *and* under each of its labels), which reduces potential problems with duplicate content.

adfree and silverbytes, it sounds like this is the cause of the errors you're seeing in webmaster tools. There's no way to get rid of them, since Blogger doesn't let you edit your robots.txt file, but they're not something you need to worry about (since those URLs were disallowed deliberately).

Drew_Black, it sounds like the 301 redirect issue you're talking about may be unrelated to Blogger? If there's a 301 redirect from page A to page B and page *B* is blocked by a robots.txt file, then page *A* will show a "Restricted by robots.txt" error in webmaster tools (there's a blog post from September '06 on the Google Webmaster Central blog with more details about this). But the opposite shouldn't be true (page A is blocked but page B shows the error). I believe I've found your thread in our Webmaster Help Group (a search for [chaosunlimited destination] in our Help Group returns your question, right?); could you post an example URL there so that we can look into the issue further? Thanks!
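For anyone who wants to see the effect of a /search disallow rule locally, here is a rough sketch with Python's standard urllib.robotparser. The robots.txt text below is an assumption modeled on the description above, not a copy of any real Blogger file, and the permalink URL is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, modeled on the description above (not a real Blogger file).
BLOG_ROBOTS = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Label listing page: its path starts with /search, so it is disallowed.
label_url = "http://mysite.blogspot.com/search/label/alojamiento"
# Hypothetical permalink for a single post: not under /search.
permalink_url = "http://mysite.blogspot.com/2007/09/example-post.html"

print("label page crawlable:", is_crawlable(BLOG_ROBOTS, label_url))
print("permalink crawlable:", is_crawlable(BLOG_ROBOTS, permalink_url))
```

With rules like these, the label URL is reported as restricted while the post itself stays crawlable at its permalink, which matches the duplicate-content reasoning above.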
|
tedster

msg:3474301 | 11:24 pm on Oct 10, 2007 (gmt 0) |
Hello Susan. Welcome to the forums and thanks for pitching in on these questions. I especially appreciate the insight into the Blogger robots.txt issues. Good to know that Google has preventative steps in place to avoid those nasty duplicate URL issues. Also thanks for respecting our policies about links and taking the example URL discussion with Drew_Black over to Google's Webmaster Help Group.
|
adfree

msg:3475648 | 7:28 am on Oct 12, 2007 (gmt 0) |
Neat Susan, this is helpful! Thanks.
|