Forum Moderators: Robert Charlton & goodroi


Stop site scrapers

         

KaseyM

7:27 pm on Apr 18, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm at my wits' end.

I must have about 6-10 sites at any one time scraping my site. I'm using WordPress.

I send requests to Google and occasionally a site gets knocked out, but then two more seem to pop up.

They're all Indian sites that scrape my content then use some sort of AI to change different words within. Sometimes, they rank above me!

Is there anything I can do?

not2easy

9:06 pm on Apr 18, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It might be time to look at your logs and at the non-human visits your site gets. Our Search Engine Spider and User Agent Identification [webmasterworld.com] forum has information on how to make them stop. No, it isn't a quick fix, but if your content is important to you and you want to protect it, it can be done.

Since you are using WP, there are some plugins that assist in blocking unwanted robots. That beats nothing, but it does not protect and prevent the way your own efforts can.

tangor

1:07 am on Apr 19, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As noted above ^^^ look at your raw logs. It won't take too much effort to identify the real scrapers (or bots) and then deal with them in .htaccess (or whatever your config might be).

This is a never-ending thing, but once you get started it becomes easier to keep the site "clean" over time. Scrapers never go away, and those you nuke are replaced by others.
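As a rough starting point for reading raw logs, here is a minimal Python sketch that counts requests per client IP. It assumes the log is in the common/combined format that Apache and nginx typically write by default; the sample lines and the `top_clients` name are made up for illustration, and the IPs are documentation-only addresses.

```python
import re
from collections import Counter

# Matches the start of a common/combined log format line and captures
# the client IP (the first field). Assumes that log format.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "')

def top_clients(lines, limit=10):
    """Return the `limit` busiest client IPs as (ip, request_count) pairs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # skip lines that are not in the expected format
            counts[match.group(1)] += 1
    return counts.most_common(limit)

# Made-up sample lines; in practice you would pass an open log file instead.
sample = [
    '203.0.113.7 - - [18/Apr/2020:19:27:01 +0000] "GET /feed/ HTTP/1.1" 200 5120 "-" "python-requests/2.23.0"',
    '203.0.113.7 - - [18/Apr/2020:19:27:02 +0000] "GET /post-1/ HTTP/1.1" 200 8812 "-" "python-requests/2.23.0"',
    '198.51.100.4 - - [18/Apr/2020:19:28:10 +0000] "GET / HTTP/1.1" 200 4301 "-" "Mozilla/5.0"',
]
print(top_clients(sample))  # [('203.0.113.7', 2), ('198.51.100.4', 1)]
```

An IP that requests far more pages than any human could read is a candidate for a closer look, not automatically a scraper; legitimate search engine crawlers will show up near the top too.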

JorgeV

7:48 am on Apr 20, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

If you block requests from IP ranges belonging to Web hosts, you will have solved most of the problem.
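To illustrate the idea, here is a small Python sketch that checks whether an address falls inside a list of ranges. The ranges below are documentation-only placeholders, not real hosting company ranges, and `HOSTING_RANGES` and `is_hosting_ip` are hypothetical names; you would have to look up the actual ranges yourself.

```python
import ipaddress

# Hypothetical blocklist: documentation-only ranges standing in for the
# real ranges you would look up for each hosting company.
HOSTING_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_hosting_ip(address):
    """Return True if `address` falls inside any listed range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in HOSTING_RANGES)

print(is_hosting_ip("198.51.100.23"))  # True
print(is_hosting_ip("203.0.113.9"))    # False
```

In practice the same list would more likely end up as deny rules in your server config or firewall; the script is only a way to test addresses from your logs against the list first.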

Secondly, you can set a limit on the number of requests an IP range can make per minute, or per hour, to detect scraping activity.
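A minimal sketch of that kind of limit in Python, assuming IPv4 addresses grouped by /24 and a sliding time window. The class name, the default thresholds and the /24 grouping are all assumptions chosen for illustration, not recommended values.

```python
import ipaddress
from collections import defaultdict, deque

class RangeRateLimiter:
    """Sliding-window request limit shared by all IPs in the same range."""

    def __init__(self, max_requests=60, window_seconds=60, prefix=24):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.prefix = prefix  # assumption: group IPv4 addresses by /24
        self.hits = defaultdict(deque)  # range -> timestamps of recent requests

    def allow(self, address, now):
        """Record a request at time `now` (seconds); return False if over the limit."""
        network = ipaddress.ip_network(f"{address}/{self.prefix}", strict=False)
        recent = self.hits[network]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True

limiter = RangeRateLimiter(max_requests=3, window_seconds=60)
results = [limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3)]
print(results)  # [True, True, True, False]
```

Because the counter is keyed on the range rather than the single address, a scraper that rotates through neighbouring IPs still hits the same limit. The flip side is that real visitors sharing a range are counted together, so the threshold needs some care.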

I think Cloudflare also has some kind of anti-bot system, but I have never tried it.

RedBar

10:19 am on Apr 20, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They're all Indian sites

What do they gain from this? Are they using AdSense / advertising to generate earnings or is it something else?

martinibuster

4:29 pm on Apr 20, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



They're all Indian sites that scrape my content...


More than just Indian sites scraping your content. That's just the very teeny tiny tip of the bot iceberg.

BoredMeteor

6:25 pm on Apr 20, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



I recently noticed a fellow website in my niche scraping my feed and just republishing the posts immediately whenever I updated. Simple solution was to do a reverse IP lookup on their domain, then check my server logs, and I was able to pinpoint that, yes, they were accessing my feed every time I updated with a new post. So I just blocked their IP and moved on. Problem solved.
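For anyone wanting to try the same check, here is a rough Python sketch: resolve the domain to its IP addresses, then look for feed requests from those addresses in the log. It assumes a common/combined format log, the default WordPress feed path `/feed/`, and that the scraper fetches from the same IP that serves its site, which often won't be the case. The function names and sample lines are made up.

```python
import socket

def resolve_ips(domain):
    """Return the set of IP addresses that `domain` currently resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(domain, None)}

def feed_hits(lines, suspect_ips, feed_path="/feed/"):
    """Return log lines where one of `suspect_ips` requested `feed_path`."""
    hits = []
    for line in lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request field, not the expected format
        client_ip = line.split(" ", 1)[0]
        request = parts[1].split()  # e.g. ['GET', '/feed/', 'HTTP/1.1']
        if (client_ip in suspect_ips and len(request) >= 2
                and request[1].startswith(feed_path)):
            hits.append(line)
    return hits

# Made-up sample lines using documentation-only IP addresses.
log_sample = [
    '203.0.113.7 - - [20/Apr/2020:18:25:01 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '198.51.100.4 - - [20/Apr/2020:18:25:09 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '203.0.113.7 - - [20/Apr/2020:18:26:30 +0000] "GET /about/ HTTP/1.1" 200 900',
]
print(len(feed_hits(log_sample, {"203.0.113.7"})))  # 1
```

With a real log you would call `feed_hits(open_log_file, resolve_ips("their-domain.example"))`, where the domain is a placeholder for the site republishing your posts.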

For human rewriting (what was it called, spinning?) you're just going to have to do the usual: Contact webmaster, contact host, DMCA, etc. It's a game of whack-a-mole that I grew bored of long ago.

What gets me these days are some of the larger websites vaguely rewriting and stealing ideas. Don't even get me started on YouTube.

NickMNS

6:33 pm on Apr 20, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I recently noticed a fellow website in my niche scraping my feed and just republishing the posts immediately whenever I updated. Simple solution was to do a reverse IP lookup on their domain, then check my server logs, and I was able to pinpoint that, yes, they were accessing my feed every time I updated with a new post. So I just blocked their IP and moved on. Problem solved.

Well, you are fortunate that your competitor is a complete idiot. But I would suspect that most scrapers are not scraping from the same servers that are serving the scraped web pages.

jmccormac

6:47 pm on Apr 20, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Many may not be Indian sites as such but rather open proxies on Indian PCs or webservers.

Regards...jmcc

not2easy

7:20 pm on Apr 20, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The sites that end up with the copied content (or spun versions of it) are not necessarily the ones who are scraping your site(s). Scraping is an industry with customers who pay for their work. The places where you find it probably never scraped it and just happened to buy your content.

Using WordPress makes it easier to scrape via RSS, though, and there are reputable plugins to help you prevent that.