|Removing Content Pages From Google Index|
| 7:02 pm on Nov 11, 2007 (gmt 0)|
I'd like to remove some duplicated content pages from Google's index. Is it enough to disallow those pages in robots.txt and add
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
to the pages?
Also, what else can I do to keep some new pages from being indexed by Google and any other search engines?
Thank you in advance for any comments. :)
| 7:19 pm on Nov 11, 2007 (gmt 0)|
To remove pages immediately, sign up for Google Webmaster Tools and use the URL removal link.
To remove pages as you indicated, you need to use only one of those methods... If you disallow in robots.txt, the disallowed page(s) will not be accessed again, so the 'noindex' meta tag will never be seen, making it ineffective.
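To see why the two methods conflict, here is a minimal sketch using Python's standard-library robots.txt parser (the URL and Disallow rule are hypothetical examples): a compliant bot checks the rules before requesting a page, and a disallowed page is never fetched at all, so any meta tag inside it goes unread.

```python
# Sketch: a Disallow rule stops a compliant crawler from fetching the
# page at all, so an on-page 'noindex' meta tag would never be seen.
# The domain, path, and rules below are hypothetical examples.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /duplicates/",
])

# A compliant bot calls can_fetch() before requesting a URL;
# False means the page body (and its meta tags) is never retrieved.
print(parser.can_fetch("Googlebot", "http://example.com/duplicates/page.html"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))            # True
```

So if you want the 'noindex' tag to do the work, the page must stay crawlable (not disallowed); if you want robots.txt to do the work, the tag is redundant.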
If possible, I think the better method is to redirect the duplicated pages to a single set of pages, so you will gain the benefits of any inbound links, but if redirecting is not an option, either method you suggested should be effective, both for removing pages and keeping new pages out of the index.
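For the redirect route, a permanent (301) redirect is the usual choice, since it tells search engines the move is final and passes along inbound link value. A minimal sketch for an Apache .htaccess file (the filenames here are hypothetical examples):

```
# Hypothetical .htaccess sketch (Apache mod_alias):
# permanently redirect a duplicated page to its canonical version.
Redirect 301 /old-duplicate-page.html http://www.example.com/canonical-page.html
```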
| 7:31 pm on Nov 11, 2007 (gmt 0)|
Justin, thank you very much for the reply. Will robots.txt disallowing and the <META NAME...> tag work for other search engines too, or just Google?
Are there any ways to hide just part of a page? Such as some links or part of the content. I know about "nofollow", but are there other ways to hide links/content completely, for all search engines?
| 7:42 pm on Nov 11, 2007 (gmt 0)|
One good approach to hiding part of a web page from indexing is to put that content at a separate URL, one that you disallow in robots.txt. Then you can display that content in an iframe on the original page for your human visitors, but search engines will never see it.
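A minimal sketch of that setup (the paths and dimensions are hypothetical examples): the fragment lives at its own disallowed URL, and the parent page pulls it in with an iframe.

```html
<!-- robots.txt (served at the site root) keeps compliant
     crawlers away from the fragment's directory:

     User-agent: *
     Disallow: /private-fragments/
-->

<!-- On the original page, embed the fragment for human visitors.
     The src path here is a hypothetical example. -->
<iframe src="/private-fragments/links.html"
        width="100%" height="200" frameborder="0">
</iframe>
```

Note that the iframe content is still publicly reachable by anyone who knows the URL; robots.txt only asks well-behaved crawlers to stay out.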
But there is no html mark-up that disallows indexing for just part of a document.
| 7:52 pm on Nov 11, 2007 (gmt 0)|
Disallowing in robots.txt will work for any standards-compliant robot/spider, so it should work for all (or nearly all) commercial search engines. The robots meta tag should also work in the major SEs, but may not be used by some of the smaller ones --- I do not know personally, because I usually focus on the 'big 3'.
Keep in mind any type of 'hiding text' can be considered cloaking and/or spamming, so you really have to make your own determination, and use caution / discretion when implementing any system which shows different information to visitors and search engines.
I would suggest doing quite a bit of research, so you know the risk / reward prior to attempting to hide information... Also, keep in mind the way things are treated today could change tomorrow, and what was 'not seen' today, might be 'seen' as a 'red flag' in the near future.
| 11:45 am on Nov 12, 2007 (gmt 0)|
Thank you very much :)
| 3:28 pm on Nov 12, 2007 (gmt 0)|
We removed content, basically all of our pages (5,000+), using the URL removal tool. Worked great. Then we removed all but 14 pages from our server. Two weeks later, we cancelled the URL removal request, in the hope that just those 14 pages would eventually be reindexed. Much to our dismay, most of the pages, not even on our server anymore, have reappeared in the SERPs. Honestly, I think the URL removal tool should be renamed the URL "hold" tool, as it does not really appear to remove anything.
| 10:00 am on Nov 13, 2007 (gmt 0)|
DannyTweb, thank you for the comments. If you have any other tips for content removal, please post them.
| 1:57 pm on Feb 14, 2008 (gmt 0)|
You cannot control Google. You can ask for a removal (usually unwise, if you want visitors), but doing this as a 'trick' to try to force Google to update its database is doomed.
The 'gone' pages will fall out eventually; meanwhile, just be sure that the 'new' pages are better, and therefore more likely to appear in the SERPs.
The fact that dead pages 'can' be found does not mean that they will be found by the average searcher - try a few keyword searches and you'll see.