Forum Moderators: goodroi
A long time ago, in Webmaster Tools, I noticed that a page with query-string variables appended was listed under "Links to your site".
Sometimes the link comes from a spammy site that picks up links and ads from AdWords (I have no clue why), and sometimes from people who write blog content by hand and link to my site for reference, copying the link from an AdWords ad.
Either way, it's duplicate content when index.html, index.html?1, and index.html?2 are all listed.
To fix that, I used robots.txt by simply adding this:
Disallow: /*?
There is even a reference from Google about it:
[google.com...] under Pattern matching.
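To see which URLs a rule like this catches, here is a minimal Python sketch of Google-style pattern matching, where "*" matches any run of characters and a trailing "$" anchors the end of the URL. The function names rule_to_regex and is_disallowed are made up for illustration, and this is only an approximation of what a crawler does, not Google's actual code. Python's built-in urllib.robotparser is not used because it compares paths as plain prefixes and does not expand wildcards.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a wildcard robots.txt path rule into a compiled regex.

    '*' matches any sequence of characters, a trailing '$' anchors the
    end of the URL, and every other character is taken literally.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, rule: str = "/*?") -> bool:
    """Return True if the path (including any query string) matches the rule."""
    # re.match anchors at the start, like a robots.txt path rule.
    return rule_to_regex(rule).match(path_and_query) is not None


if __name__ == "__main__":
    for url in ["/index.html", "/index.html?1", "/index.html?2"]:
        print(url, "blocked" if is_disallowed(url) else "allowed")
```

With the rule /*? only the URLs that contain a question mark match, so index.html itself stays fetchable.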
Now, does just seeing those URLs under Links in Webmaster Tools really mean that the page gets indexed as a separate one? Or does it simply mean "this is how somebody links to you"?
Thanks
When I take a closer look at all the pages under Webmaster Tools, I see that the stuff I block in robots.txt is listed as blocked.
The URLs listed under external links simply show how other sites link to my site.
Finally, when I query Google with site:example.com, I get pages as per my sitemap. Still, if I pick "include omitted results", the URLs with trailing stuff after the question mark show up, and so do other pages banned through robots.txt.
Is that right?
I would think that if something is banned through robots.txt, it should be excluded 100%.
I don't like this behaviour either, but it's in full accordance with the purpose and scope of the Standard for Robot Exclusion, which explicitly states that it is a 'fetch control' mechanism.
If you don't want the page to show up in the "show omitted" search results, then don't Disallow it in robots.txt. Instead, permit Google to fetch it, but only after adding a <meta name="robots" content="noindex,nofollow"> tag to the page.
Jim
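The difference between "fetch control" and "index control" described above can be sketched in Python. This is a simplified model, not how any search engine is actually implemented: the class RobotsMetaParser and the function url_can_be_listed are hypothetical names. The point it illustrates is that a crawler can only obey a noindex tag on a page it is allowed to fetch.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def url_can_be_listed(html: str, fetch_allowed: bool) -> bool:
    """Simplified model: may a known URL still appear in results?"""
    if not fetch_allowed:
        # Disallowed in robots.txt: the page is never fetched, so a
        # noindex tag is never seen and the bare URL can still be listed.
        return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(url_can_be_listed(page, fetch_allowed=False))
print(url_can_be_listed(page, fetch_allowed=True))
```

In this model, the same page can still be listed while it is Disallowed and drops out only once the crawler is permitted to fetch it and read the tag.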
Thanks very much for the thorough explanation, Jim.
But... here I'm coming from the same spot as in the Apache forum, where you replied as well. It is about:
page.html
page.html?v=something
I cannot change the meta tags in those pages. It's the same page, which has to be indexed, just with variables applied (a query string).
Vicious circle...
There is that "parameter exclusion" setting in Webmaster Tools that Tedster mentioned under "Google Search" (yeah, I posted about the same problem there, too), which I hope will help.
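The idea of treating page.html and page.html?v=something as one page can be sketched in Python as follows. This only illustrates ignoring the query string when comparing URLs; it is not a description of how the Webmaster Tools setting works internally, and strip_query is a name made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


urls = [
    "http://example.com/page.html",
    "http://example.com/page.html?v=something",
    "http://example.com/index.html?1",
    "http://example.com/index.html?2",
]

# Variants that differ only in the query string collapse to one entry.
unique_pages = sorted({strip_query(u) for u in urls})
for page_url in unique_pages:
    print(page_url)
```

The four URLs above collapse to two pages, index.html and page.html, which is the result the duplicate listings need.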