
Sitemaps, Meta Data, and robots.txt Forum

    
Duplicate Content
joyjesters



 
Msg#: 4381219 posted 9:34 pm on Oct 29, 2011 (gmt 0)

Shoot. I did a search on Google like this for my site...

site:widgets.com/page

And over 1,000 pages came up indexed on Google! That's so much duplicate content. How do I get rid of it? What's the best way to configure my robots.txt file to block that content?

Alex.

 

phranque




 
Msg#: 4381219 posted 1:18 am on Oct 31, 2011 (gmt 0)

robots.txt will exclude the bot from crawling the content but won't prevent google from including those urls in the index.
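to illustrate the crawl side of that, here is a minimal sketch using Python's standard `urllib.robotparser`. the rules and the `is_crawl_allowed` helper are hypothetical, and `widgets.com/page` is just the placeholder from the question. the check only answers "may this bot fetch the url?"; it says nothing about whether the url ends up in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, using the placeholder path from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /page
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    This models crawling only. A disallowed URL can still be listed
    in a search index if the engine learns about it from links.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_crawl_allowed(ROBOTS_TXT, "Googlebot", "http://widgets.com/page?id=2")
allowed = is_crawl_allowed(ROBOTS_TXT, "Googlebot", "http://widgets.com/about")
```

with these assumed rules, `blocked` is `False` and `allowed` is `True`.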

depending on the details of your situation, the better solution may be one of the following:
- redirect those requests to the canonical urls
- meta robots noindex the documents served from non-canonical urls
- use a link rel canonical element
- use the ignore parameters feature in GWT (Google Webmaster Tools) if appropriate

then you should look for where google discovered those non-canonical urls and if that situation is under your control you should fix it at the source.
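the first three options in the list above can be sketched roughly as follows. this assumes, purely for illustration, that the duplicates differ from the canonical url only by a query string or fragment; the thread does not say what actually causes them. all function names here are hypothetical helpers, not part of any framework.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Meta robots element for documents served from non-canonical URLs.
NOINDEX_META_TAG = '<meta name="robots" content="noindex">'


def canonical_url(url: str) -> str:
    """Assumption: the canonical form drops the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def redirect_response(url: str):
    """Return (status, headers) for a 301 to the canonical URL.

    Returns None when the requested URL is already canonical.
    """
    target = canonical_url(url)
    if target == url:
        return None
    return ("301 Moved Permanently", [("Location", target)])


def canonical_link_tag(url: str) -> str:
    """Build a link rel canonical element for the document's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(redirect_response("http://widgets.com/page?sort=asc"))
print(canonical_link_tag("http://widgets.com/page?sort=asc"))
```

you would normally pick one of these per url rather than combine them: a redirect means the duplicate is never served, while the noindex and canonical elements only apply when the duplicate document is still returned.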

joyjesters



 
Msg#: 4381219 posted 2:59 pm on Oct 31, 2011 (gmt 0)

Thanks. I also did a submission to get rid of them via Webmaster Tools. Looks like they already got rid of them :)

dstiles




 
Msg#: 4381219 posted 11:12 pm on Nov 1, 2011 (gmt 0)

My advice would be to look at the Search Engine Spider and User Agent Identification forum and the Google SEO News and Discussion forum hereabouts. The former has a lot of info about killing scrapers and hackers, the latter will tell you google has basically lost the plot. :)


WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved