
Forum Moderators: rogerd & travelin cat


Scrapers Referral

scraping my hidden post

     

bongkph

4:01 am on Jun 25, 2014 (gmt 0)

5+ Year Member



I have done everything I can to hide my recent posts, including hiding the robots.txt and removing feeds. However, one scraper still manages to copy my recent content.

When I checked my logs, one IP stood out. It is the only suspicious IP that visits my recent post. The referrer in the logs says that it came from:

hxxp://www.google.nl/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDMQFjAC&url=http%3A%2F%2Fmydomain.com%2Fmyarticle%2F&ei=eSupU97bJYW0PP-KgPAD&usg=AFQjCNGrZVGBZfYcCpifxzarbSaKEq4P7w

What does it mean? Is he using Google tools to find my recent items?

Please help me.

not2easy

4:14 am on Jun 25, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



You shouldn't be hiding your robots.txt file. That file should be available for every request; it is how you tell robots which files or directories on your site they may or may not crawl.

Unfortunately, the referer is not always accurate. It is one piece of information that is pretty easy to alter to say anything the visitor wants, so you should not rely on it as 100% accurate. Check the IP address of the visitor and look it up with a Whois search.
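
For what it's worth, here is a minimal Python sketch for pulling apart the referer string quoted above so you can see what it claims. The function name decode_referer is only illustrative. The url parameter holds the page the visitor was sent to; cd is commonly read as the position of the clicked search result, but that is an assumption, not something documented. Since the whole string can be forged, treat the output as a hint only.

```python
from urllib.parse import urlsplit, parse_qs


def decode_referer(referer: str) -> dict:
    """Return the query parameters of a referer URL as a plain dict."""
    query = urlsplit(referer).query
    # keep_blank_values=True keeps empty parameters such as "q=".
    params = parse_qs(query, keep_blank_values=True)
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in params.items()}


# The referer exactly as quoted in the first post (scheme left as "hxxp").
REFERER = (
    "hxxp://www.google.nl/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3"
    "&ved=0CDMQFjAC&url=http%3A%2F%2Fmydomain.com%2Fmyarticle%2F"
    "&ei=eSupU97bJYW0PP-KgPAD&usg=AFQjCNGrZVGBZfYcCpifxzarbSaKEq4P7w"
)

if __name__ == "__main__":
    for key, value in decode_referer(REFERER).items():
        print(f"{key} = {value!r}")
```

The q parameter is empty in this string, so the referer does not reveal a search phrase; it only names the page that was clicked.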

bongkph

10:02 am on Jun 25, 2014 (gmt 0)

5+ Year Member



Hi not2easy. Pardon me, the sitemap is hidden, not the robots.txt file.

I have the IP of the offending party. I have blocked it, but they can use other resources to access the site.

Do you know other ways to see hidden posts on my site? As I have said, the sitemap is hidden, there is no post navigation, recent posts are removed, and the upload folder serves a blank page.

lorax

10:50 am on Jun 25, 2014 (gmt 0)

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Are you referring to the results generated by wp_get_recent_posts() [codex.wordpress.org...]? If so, look for the code in your theme and comment it out, though this doesn't hide the recent posts; it disables them. Have you checked to see if your site still generates the recent posts in the RSS feed?

bongkph

11:20 am on Jun 25, 2014 (gmt 0)

5+ Year Member



Hi Lorax,

Yes, the offending party can still see my recent post, even though it is hidden.

I have already removed all recent_post code from the templates. RSS/ATOM feeds are also disabled.

not2easy

7:03 pm on Jun 25, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



When you create the post, is it marked as "Private" so that only members with privileges can see it?

bongkph

12:03 am on Jun 27, 2014 (gmt 0)

5+ Year Member



Hi not2easy,

I am using code embedded in functions.php to hide posts. It works through custom fields. I am not using the Private option since it is an open site and visitors normally arrive via search engines.

The bad thing is, the one who scrapes my posts ranks higher than me.

not2easy

1:31 am on Jun 27, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



You would need to examine the custom coding you use to "hide" it to see why hiding isn't working. If that is the problem, I can't help with that.

Is it only supposed to be hidden under certain conditions? Have you examined your access logs to see what is really happening after you put up a new post? I mean, checking the IPs, not just referer or UA. Is this scraper relying on getting your content from search engines like Google or Bing? There may be better ways to prevent certain visitors from having access, at least on your domain.

Once your post is published, it can be scraped. Finding out what happens after you post can give you better insight into how to deal with the problem.
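
As a rough illustration of that kind of log check, here is a Python sketch that lists every IP that requested one post, with the time, referer and UA of each hit. It assumes the server writes the Apache/nginx "combined" log format; the name hits_for_path and the sample lines are made up for the example, and the IPs come from the reserved documentation ranges.

```python
import re
from collections import defaultdict

# Assumes the "combined" access log format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def hits_for_path(lines, path):
    """Group requests for one path by client IP."""
    hits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == path:
            hits[match.group("ip")].append(
                (match.group("time"), match.group("referer"), match.group("agent"))
            )
    return dict(hits)


# Hypothetical sample lines, not taken from any real log.
SAMPLE = [
    '203.0.113.7 - - [25/Jun/2014:03:58:12 +0000] "GET /myarticle/ HTTP/1.1" '
    '200 5120 "http://www.google.nl/url?sa=t" "Mozilla/5.0"',
    '198.51.100.4 - - [25/Jun/2014:03:59:40 +0000] "GET /about/ HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [25/Jun/2014:04:00:01 +0000] "GET /myarticle/ HTTP/1.1" '
    '200 5120 "-" "ExampleFetcher/1.0"',
]

if __name__ == "__main__":
    for ip, visits in hits_for_path(SAMPLE, "/myarticle/").items():
        print(ip, len(visits), "hit(s)")
        for when, referer, agent in visits:
            print("   ", when, referer, agent)
```

Running something like this against the real log shortly after publishing shows who reaches the new URL first and whether the same IP comes back with a different referer or UA.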

tangor

3:33 am on Jun 27, 2014 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



The bad thing is, the one who scrapes my posts ranks higher than me.

Who owns/runs the site? You or the other person who ranks higher?

Are you the webmaster, or just a poster or moderator/admin to the site?

lorax

11:33 am on Jun 27, 2014 (gmt 0)

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I hate to say it, bongkph, but you aren't likely to win this battle. Unless you make it private, there's no way for you to know who to block, as the scraper can easily switch IPs and change their footprint.

bongkph

1:03 am on Jun 29, 2014 (gmt 0)

5+ Year Member



Not2easy, the custom code only hides the post. From what I found, the post is not hidden in Recent Posts, so I had to remove the recent posts as well. Going by the referrals, the scraper is arriving on my site via Google. I don't know if there are tools on Google to see recent posts on my site. Thanks for the advice. I will investigate this issue thoroughly.

Tangor, I am the webmaster of the site. The one who reproduces my posts ranks higher than me. I believe that is normal with G. But the other site never makes any effort to do its own research. He lives by copying content from sites in the same niche. He's so lucky that G is favoring him.

Lorax, I have read so many forums, and I believe that there is really no way to totally block the offender, especially if he lives by scraping/reproducing someone else's content. He has more ways than I do.

I know that once my post is published in public, many things can be done to expose even a hidden post on my site.

Thanks guys.

tangor

1:23 am on Jun 29, 2014 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Okay, back up a bit. More info needed.

Your post is posted in what... a forum? Web page? Blog? If any of these, what kind?

You are the webmaster and yet you have one above you? Most webmasters would nuke that kind of privilege and get on with life.

You say he has a site scraping your stuff? Why not file a DMCA notice against him or his host? That solves many things mighty quick.

And lastly, why post "hidden"? If you don't want it seen, then don't post it at all. Unless there's another reason which has not been shared.
 
