Home / Forums Index / Code, Content, and Presentation / WordPress
Forum Library, Charter, Moderators: lorax & rogerd

WordPress Forum

    
Scrapers Referral
scraping my hidden post
bongkph




msg:4682575
 4:01 am on Jun 25, 2014 (gmt 0)

I have done everything I can to hide my recent post, including hiding the robots.txt and removing feeds. However, one scraper still manages to copy my recent content.

When I checked my logs, one IP looked suspicious. It is the only suspicious IP that visits my recent post. The referrer in the logs shows that it came from:

hxxp://www.google.nl/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CDMQFjAC&url=http%3A%2F%2Fmydomain.com%2Fmyarticle%2F&ei=eSupU97bJYW0PP-KgPAD&usg=AFQjCNGrZVGBZfYcCpifxzarbSaKEq4P7w

What does it mean? Is he using Google tools to find my recent items?

Please help me.

 

not2easy




msg:4682577
 4:14 am on Jun 25, 2014 (gmt 0)

You shouldn't be hiding your robots.txt file. That file should be available for every request; it is how you tell robots which files or directories on your site they may or may not crawl.
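To illustrate how a well-behaved crawler reads robots.txt, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the `/private-drafts/` path are made up for illustration and are not taken from the poster's site. Keep in mind that a scraper is free to ignore these rules entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are placeholders, not the poster's.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /private-drafts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://mydomain.com/myarticle/",
        "http://mydomain.com/wp-admin/options.php",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

The file only expresses a request. It works for crawlers that choose to honor it, which is why it has to stay reachable.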

Unfortunately the referer is not always accurate. It is one piece of information that is pretty easy to alter to say anything the visitor wants, so don't rely on it as 100% accurate. Check the IP address of the visitor and look it up with a Whois search.
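For what the logged referer itself claims, a small sketch with Python's standard `urllib.parse` can pull the target page out of a Google redirect URL like the one quoted above. The function name is hypothetical, the `hxxp` scheme from the post is written as `http` here, and the result only tells you what the header says, not whether it is genuine.

```python
from urllib.parse import parse_qs, urlsplit


def target_from_google_referrer(referrer: str):
    """Return the page named in a google.*/url redirect referer, or None."""
    parts = urlsplit(referrer)
    # Only handle the redirect form seen in the thread: www.google.<tld>/url?...
    if not parts.netloc.startswith("www.google.") or parts.path != "/url":
        return None
    # parse_qs percent-decodes values and returns a list for each parameter.
    values = parse_qs(parts.query).get("url")
    return values[0] if values else None


if __name__ == "__main__":
    logged = (
        "http://www.google.nl/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3"
        "&ved=0CDMQFjAC&url=http%3A%2F%2Fmydomain.com%2Fmyarticle%2F"
        "&ei=eSupU97bJYW0PP-KgPAD&usg=AFQjCNGrZVGBZfYcCpifxzarbSaKEq4P7w"
    )
    print(target_from_google_referrer(logged))
```

If the header is honest, it means the visitor followed a Google result link to that article. Since any client can send an arbitrary referer, treat it as a hint and cross-check the IP.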

bongkph




msg:4682662
 10:02 am on Jun 25, 2014 (gmt 0)

Hi not2easy. Pardon me, the sitemap is hidden, not the robots.txt file.

I have the IP of the offending party. I have blocked it, but they can use other resources to access the site.

Do you know of other ways someone could see hidden posts on my site? As I said: the sitemap is hidden, there is no post navigation, recent posts are removed, and the upload folder serves a blank page.

lorax




msg:4682670
 10:50 am on Jun 25, 2014 (gmt 0)

Are you referring to the results generated by wp_get_recent_posts() [codex.wordpress.org...]? If so, look for the code in your theme and comment it out, though this doesn't hide the recent posts, it disables them. Have you checked to see if your site still generates the recent posts in the RSS feed?

bongkph




msg:4682676
 11:20 am on Jun 25, 2014 (gmt 0)

Hi Lorax,

Yes, the offending party can still see my recent post, even though it is hidden.

I already removed any recent_post code from the templates. RSS/ATOM feeds are also disabled.

not2easy




msg:4682830
 7:03 pm on Jun 25, 2014 (gmt 0)

When you create the post, is it marked as "Private" so only members with privileges can see it?

bongkph




msg:4683136
 12:03 am on Jun 27, 2014 (gmt 0)

Hi not2easy,

I am using code embedded in functions.php to hide posts. It works through custom fields. I am not using the Private option since it is an open site and visitors normally arrive via search engines.

The bad thing is, the one who scrapes my posts ranks higher than me.

not2easy




msg:4683139
 1:31 am on Jun 27, 2014 (gmt 0)

You would need to examine the custom coding you use to "hide" it to see why hiding isn't working. If that is the problem, I can't help with that.

Is it only supposed to be hidden under certain conditions? Have you examined your access logs to see what is really happening after you put up a new post? I mean, checking the IPs, not just referer or UA. Is this scraper relying on getting your content from search engines like Google or Bing? There may be better ways to prevent certain visitors from having access, at least on your domain.

Once your Post is published, it can be scraped. Finding out what happens after you post can give you better insight into how to deal with the problem.
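As one way to do that kind of check, the sketch below groups access-log hits for a single post by IP address, keeping the timestamp, referer, and user agent of each hit. It assumes the server writes the common "combined" log format; the sample lines, IP addresses, and the function name are invented for illustration and would need adapting to the real log file.

```python
import re
from collections import defaultdict

# Assumed layout: Apache/nginx "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_for_path(log_lines, path):
    """Map each IP to its (time, referer, user agent) hits on one path."""
    hits = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("path") == path:
            hits[match.group("ip")].append(
                (match.group("time"), match.group("referer"), match.group("agent"))
            )
    return dict(hits)


# Invented sample lines using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [25/Jun/2014:03:40:12 +0000] "GET /myarticle/ HTTP/1.1" '
    '200 5120 "http://www.google.nl/url?sa=t" "Mozilla/5.0"',
    '198.51.100.23 - - [25/Jun/2014:03:41:02 +0000] "GET /about/ HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [25/Jun/2014:03:55:47 +0000] "GET /myarticle/ HTTP/1.1" '
    '200 5120 "-" "curl/7.30.0"',
]

if __name__ == "__main__":
    for ip, entries in hits_for_path(SAMPLE_LOG, "/myarticle/").items():
        print(ip, len(entries), "hit(s)")
        for time, referer, agent in entries:
            print("   ", time, referer, agent)
```

An IP that shows up on every new post shortly after publishing, with changing referers or user agents, is a stronger signal than any single referer value.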

tangor




msg:4683153
 3:33 am on Jun 27, 2014 (gmt 0)

The bad thing is, the one who scrapes my posts ranks higher than me.

Who owns/runs the site? You or the other person who ranks higher?

Are you the webmaster, or just a poster or moderator/admin to the site?

lorax




msg:4683217
 11:33 am on Jun 27, 2014 (gmt 0)

I hate to say it, bongkph, but you aren't likely to win this battle. Unless you make it private, there's no way for you to know who to block, as the scraper can easily switch IPs and change their footprint.

bongkph




msg:4683685
 1:03 am on Jun 29, 2014 (gmt 0)

Not2easy, the custom code only hides the post itself. I found that it is not hidden in Recent Posts, so I had to remove the recent posts as well. Going by the referrals, the scraper is arriving on my site via Google. I don't know if there are tools on Google to see recent posts on my site. Thanks for the advice. I will investigate this issue thoroughly.

Tangor, I am the webmaster of the site. The one who reproduces my posts ranks higher than me. I believe this is normal with G. But the other site never makes any effort to do its own research. He lives by copying content from sites in the same niche. He's so lucky that G is favoring him.

Lorax, I have read many forums and I believe there is really no way to totally block the offender, especially if he lives by scraping/reproducing someone else's content. He has more ways than I do.

I know that once my post is published in public, many things can be done to expose even a hidden post on my site.

Thanks guys.

tangor




msg:4683687
 1:23 am on Jun 29, 2014 (gmt 0)

Okay, back up a bit. More info needed.

Your post is posted in what... a forum? Web page? Blog? If any of these, what kind?

You are the webmaster and yet you have one above you? Most webmasters would nuke that kind of privilege and get on with life.

You say he has a site scraping your stuff? Why not file a DMCA notice against him or with his host? That solves many things mighty quick.

And lastly, why post "hidden"? If you don't want it seen then don't post it at all. Unless there's another reason which has not been shared.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved