
Home / Forums Index / WebmasterWorld / Webmaster General
Forum Library, Charter, Moderators: phranque

Webmaster General Forum

    
How to stop content scrapers?
kokopoko

Msg#: 4675415 posted 2:57 pm on May 28, 2014 (gmt 0)

Our main site allows business owners to post inventory for sale. This inventory, including internal links, is getting scraped and showing up on crappy sites.

How can I block the scrapers or make it impossible for them to scrape our content?

 

not2easy

Msg#: 4675415 posted 6:11 pm on May 28, 2014 (gmt 0)

The way I deal with scrapers is to monitor my access logs, find "visitors" whose activity is automated, look up the IP information, and block that server. It is time-intensive to build up your information, but it gets much easier after a few months. Every IP resolves to a server address, but some ranges belong to ISPs and others to hosting companies. Hosting companies don't send human visitors, only robots.
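The log-monitoring step described above (scan the access log, count requests per IP, and surface the heavy automated "visitors" for a whois lookup) can be sketched roughly like this. The Apache combined log format and the request threshold are assumptions for illustration, not anything specified in the thread:

```python
import re
from collections import Counter

# In the Apache common/combined log formats, the client IP is the first field.
IP_AT_START = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}) ')

def top_requesters(log_lines, threshold=100):
    """Count requests per IP and return (ip, count) pairs at or above
    the threshold -- candidates for a whois lookup and a server block."""
    counts = Counter()
    for line in log_lines:
        m = IP_AT_START.match(line)
        if m:
            counts[m.group(1)] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]
```

A high request count alone doesn't prove automation, which is why the post above pairs it with an IP lookup: a busy IP that resolves to a hosting company is almost certainly a robot, while one that resolves to an ISP may be a shared proxy in front of real users.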

There is a forum here dedicated to search engine spiders and the hosts they come from. At the top of that forum: [webmasterworld.com...] there is information on how to tell robot/automated activity on your site from the activity of human visitors. It is a lot of information and takes time to read, understand, and act on, but it solves the problem better than anything else I know.

kokopoko

Msg#: 4675415 posted 2:51 pm on May 29, 2014 (gmt 0)

We've been blocking IP addresses, but the boss now wants to code the content so that it is visible on the page but not in the page's source code, so it can't be scraped. By hiding content like this, we would be completely screwing ourselves. How can these pages rank if all the content is hidden?

not2easy

Msg#: 4675415 posted 4:22 pm on May 29, 2014 (gmt 0)

Have you thought of using dynamic delivery for those pages, so the actual pages are not static files? That is far better than the hidden-content scheme, which can be a real problem because it goes directly against Google's Webmaster Guidelines on cloaking: [support.google.com...]
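One practical reading of "dynamic delivery" here is that each page is rendered per request, which lets the server apply per-IP throttling before it hands any inventory content to a scraper. A minimal sketch of that idea, with made-up limits and a hypothetical class name (neither comes from the post):

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window per-IP request limiter for dynamically served pages.

    allow() returns False when an IP has already made max_requests
    within the last window_seconds, so the handler can return 429
    instead of rendering the inventory page.
    """
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

Unlike hiding content from the page source, this leaves the full HTML visible to search engines at normal crawl rates; only clients hammering the site faster than any human reader would get refused.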


WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved