Forum Moderators: Robert Charlton & goodroi


URL blocked in robots.txt reported in WMT as having dup title/desc


aakk9999

7:29 pm on Apr 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a few URLs that are blocked via robots.txt, however, Google still reports duplicate titles and descriptions for these URLs. I would imagine that blocking a URL through robots.txt should avoid these issues?

Likewise, I have two identical pages, one has URL with lang=en parameter and one without (neither blocked by robots.txt).

I have put the canonical tag on the pages with lang=en to point to the URL without the lang=en parameter, however WMT still reports these two pages as having duplicate titles / descriptions.

I would imagine that

a) if a page is blocked in robots.txt, it should not report duplicate title / description in WMT?

b) if a page has canonical tag implemented, it should not come as duplicate title/description with the page the canonical tag points to?

I have verified that the pages ARE blocked in robots.txt and they are also listed under "pages blocked by robots.txt" in WMT.

Or is my understanding wrong?
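(The blocking described above can be checked outside WMT as well. This is a minimal sketch using Python's standard-library robots.txt parser; the rules and URLs are hypothetical stand-ins for the poster's actual site.)

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, assumed for illustration only
rules = """\
User-agent: *
Disallow: /private-page.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The wildcard group applies to Googlebot: the disallowed URL may not
# be fetched, while the lang=en URL (not disallowed here) may be.
print(parser.can_fetch("Googlebot", "https://example.com/private-page.html"))
print(parser.can_fetch("Googlebot", "https://example.com/page?lang=en"))
```

A False on the first line confirms the rule blocks crawling of that URL, which matches what the WMT robots.txt tool reports.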

tedster

9:41 pm on Apr 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only way the title and meta description can be discovered is if the page gets spidered at least once, and that shouldn't happen if it's disallowed in robots.txt. A couple thoughts:

* Was that url always in robots.txt from the time it first went live, or did you add it to robots.txt a bit later?

* Have you validated the syntax of your robots.txt file?

But if the url is now disallowed, then whatever kind of loose information is being reported in the WMT pages, there should be no effect on the ranking of your allowed pages. Yes, you're right about the canonical tag - it's clearly no worry even if WMT reports duplicates across several urls with the same canonical tag.
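(For reference, the canonical tag discussed here is a link element in the document head pointing at the preferred URL. A minimal sketch, with hypothetical URLs:)

```html
<!-- On the duplicate page, e.g. /page?lang=en -->
<head>
  <title>Example Page</title>
  <!-- Tells Google the URL without the lang parameter is the preferred version -->
  <link rel="canonical" href="https://www.example.com/page">
</head>
```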

[edited by: tedster at 9:55 pm (utc) on April 26, 2009]

aakk9999

9:50 pm on Apr 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The page is a new page. So I first uploaded the changed robots.txt, which excluded that new page, and then uploaded the page itself. Mind you, this was all done the same day - maybe I should have waited and checked WMT to see if Google had picked up the new robots.txt before uploading the page itself?

I guess all I can do now is wait and see if duplicate title / descriptions disappear from content analysis after some time has passed.

BTW, I have verified robots.txt (I used the tool in WMT and it confirmed the URL is excluded via robots.txt) and what's more, the page itself appears as an entry under "Pages disallowed by robots.txt" in the WMT overview, so obviously Google has picked up the info that this page should not be crawled!

tedster

9:57 pm on Apr 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google does not check robots.txt before every url request, but it does check frequently. At any rate, with the url now blocked you can request its removal from the index and that should end the issue for you.

[edited by: tedster at 11:44 pm (utc) on April 26, 2009]

g1smd

11:38 pm on Apr 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wait at least a week, maybe two, before expecting WMT to show the correct status for the URL in question.

aakk9999

6:17 am on Apr 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the replies, both of you. I will wait and see what happens.