Forum Moderators: Robert Charlton & goodroi
In my opinion, the best way to stop a site being hijacked is to ensure each page is served from only one URL by dealing with any canonical issues. There is some great advice on this from Matt Cutts:
[mattcutts.com...] Basically, redirect the non-www hostname to www and "/index.html" to "/".
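As a rough illustration of the two rules above (non-www to www, "/index.html" to "/"), here is a minimal Python sketch of the canonicalisation logic. The function names are hypothetical, and it assumes that www is the preferred hostname and that index.html is the directory index; on a real server the same rules would normally live in the redirect configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the one URL a page should be served from (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the site lives on a bare domain and "www." is the preferred form.
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    # Assumption: index.html is the directory index, so it maps to the directory URL.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def needs_redirect(url: str) -> bool:
    """True when the requested URL differs from its canonical form."""
    return canonical_url(url) != url
```

Any request for which needs_redirect() is true would be answered with a permanent (301) redirect to canonical_url().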
What is a hijack?
"A page hijack is a technique exploiting the way search engines interpret certain commands that a web server can send to a visitor. In essence, it allows a hijacking website to replace pages belonging to target websites in the Search Engine Results Pages ("SERPs"). "
from [clsc.net...]
How can I tell if I have been hijacked?
Use a Google sitemap creator to crawl your site and make a note of the number of pages it finds. Now run site:www.yoursite.com on Google. If the number of pages Google reports is vastly higher than the actual number of pages on your site, you "may" have some hijacked content.
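The comparison above is simple arithmetic, sketched here in Python. The function name and the default threshold of 2x are assumptions made for illustration; the post only says "vastly higher", so pick whatever ratio suits your site.

```python
def looks_inflated(crawled_pages: int, indexed_pages: int, ratio: float = 2.0) -> bool:
    """Flag a site: count that is far above the number of pages you actually have.

    crawled_pages: pages found by your own crawl of the site.
    indexed_pages: the count reported for site:www.yoursite.com.
    ratio: assumed threshold for "vastly higher" (not from the original post).
    """
    if crawled_pages <= 0:
        raise ValueError("crawled_pages must be positive")
    return indexed_pages > crawled_pages * ratio
```

A True result is only a hint that something "may" be wrong, not proof of a hijack.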
How can I find who's hijacking me?
You can use your logs to do this. Because a hijack relies on mirroring your pages, the hijacking sites will start coming up as referrers to your site. So have a good look at the sites referring traffic to you and compare them with a few months ago, then check out the new referrers by visiting each one. In my experience, a hijacker site often uses a numeric or gibberish domain name, typically with a .com or .net extension. Also look for adult content sites among the referrers and treat them as suspect.
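The log comparison described above could be sketched roughly as follows. This assumes Apache "combined" format access logs, and the "numeric or gibberish" test is a crude heuristic of my own choosing; all function names are hypothetical. It only produces a list to check by hand, as the post suggests.

```python
import re
from urllib.parse import urlsplit

# Assumption: Apache "combined" log format, i.e.
# ... "request" status bytes "referrer" "user-agent"
REFERRER_RE = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)"')


def referrer_hosts(log_lines):
    """Collect the set of referring hostnames found in the log lines."""
    hosts = set()
    for line in log_lines:
        match = REFERRER_RE.search(line)
        if not match:
            continue
        host = urlsplit(match.group(1)).hostname  # None for "-" (no referrer)
        if host:
            hosts.add(host.lower())
    return hosts


def looks_suspicious(host):
    """Crude heuristic: numeric or vowel-free .com/.net names."""
    name, _, tld = host.rpartition(".")
    if tld not in ("com", "net"):
        return False
    label = name.rsplit(".", 1)[-1]  # e.g. "12345" in www.12345.com
    if not label:
        return False
    digits = sum(c.isdigit() for c in label)
    vowels = sum(c in "aeiou" for c in label)
    return digits * 2 >= len(label) or (len(label) >= 6 and vowels == 0)


def new_referrers(old_lines, new_lines, own_host):
    """Referrers present now but not a few months ago, suspicious ones first."""
    fresh = referrer_hosts(new_lines) - referrer_hosts(old_lines)
    fresh.discard(own_host)
    return sorted(fresh, key=lambda h: (not looks_suspicious(h), h))
```

Feed it the lines of an older log file and a recent one, then visit each host it returns before reporting anything.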
OK, I have my list. What do I do next?
Report them. Explain what tests you have made and list the referrers you find suspicious to [google.com...]
The good news is it's very easy to find and very easy to get rid of.
To find it, search Google for "unique text" from your site, starting with your homepage, and use "Google Alerts" as previously mentioned.
To cure it, block the site's URL and IP in .htaccess as previously described and, most important of all, fill out a spam report on Google.
[google.com...]
They really do take notice.....
OK, New Year's resolution: stop bashing G so much, they're not all bad.
It is important to note that this type of hijack will not show up as a referrer in your logs.
[edited by: Pirates at 1:14 am (utc) on Dec. 19, 2006]
Can you provide a definition of hijacking to understand you better? Also please tell us if your site has been attacked or not.
In your definition of hijack in your initial post you define it as:
"A page hijack is a technique exploiting the way search engines interpret certain commands that a web server can send to a visitor. In essence, it allows a hijacking website to replace pages belonging to target websites in the Search Engine Results Pages ("SERPs"). " from that page..
That page does not speak about scraping (e.g. copying a page from site A and presenting it on the hijacking domain).
It speaks about 302 redirects and the problem Google has telling the original (source) from the copy (the hijacker according to your other posts, the scraper according to what I know).
The 302 issue does not have anything to do with scraping (copying parts of your site or parts of a page on your site).
Basically most attacks on my sites using 302s are coming from v1agra sites and the like... thousands of them... They are all identical and feature "sponsored listings" of various products, only that in some cases instead of linking to the product pages on v1agra sites, they 302 redirect to my "blue widgets" site...
So googlebot visits there, follows the link to the product and it does a 302 to my site... It is trying to fool googlebot into thinking that the hijacker is the original site and that he has moved it TEMPORARILY to my site.
For that you need the base href element in ALL your pages. Then it can no longer hurt you with Google.
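To make the 302 pattern above concrete, here is a small Python sketch of the check you might run on a response fetched from a suspect link: does it answer with a temporary (302) redirect whose Location points at your own host? The function name is hypothetical and the status code and Location header are assumed to have been obtained already, by whatever HTTP client you prefer.

```python
from urllib.parse import urlsplit


def is_302_to_host(status: int, location: str, my_host: str) -> bool:
    """True when a response is a 302 whose Location header targets my_host."""
    if status != 302 or not location:
        return False
    host = urlsplit(location).hostname
    return host is not None and host.lower() == my_host.lower()
```

A permanent (301) redirect or a plain link returns False here, since only the temporary redirect produces the effect described in this post.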
But scraping is a different story. And the most effective way (if you have the time) is to file a DMCA report at Google:
[google.com...]
Pirates, I am a bit confused I must say as to what you are trying to say here... Can you provide a definition of hijacking to understand you better? Also please tell us if your site has been attacked or not.
OK, first: my sites have probably been attacked, but not successfully. I am working on clients' sites and so far have a 99% success rate. It is now looking like 100%, but I am waiting until after Xmas to confirm.
In your definition of hijack in your initial post you define it as: "A page hijack is a technique exploiting the way search engines interpret certain commands that a web server can send to a visitor. In essence, it allows a hijacking website to replace pages belonging to target websites in the Search Engine Results Pages ("SERPs"). " from that page.. That page does not speak about scraping (e.g. copying a page from site A and presenting it on the hijacking domain)
Actually, I was just quoting there. I wanted to do something comprehensive on hijacking, so it expanded from the initial post. To me, scrapers and hijackers come from the same pool of people online, so I thought why not include those feckers in the .htaccess file as well.
It speaks about 302 redirects and the problem Google has telling the original (source) from the copy (the hijacker according to your other posts, the scraper according to what I know). The 302 issue does not have anything to do with scraping (copying parts of your site or parts of a page on your site).
A 302 hijack is just one type of hijack; there are many. A well-known affiliate portal that I phoned this week, but will not name as they are co-operating, admitted that some of their affiliates hijack and scrape content. Here's a post that deserves an answer regarding proxy hijacks:
[webmasterworld.com...]
Basically most attacks on my sites using 302s are coming from v1agra sites and the like... thousands of them... They are all identical and feature "sponsored listings" of various products, only that in some cases instead of linking to the product pages on v1agra sites, they 302 redirect to my "blue widgets" site... So googlebot visits there, follows the link to the product and it does a 302 to my site... It is trying to fool googlebot into thinking that the hijacker is the original site and that he has moved it TEMPORARILY to my site. For that you need the base href element in ALL your pages. Then it can no longer hurt you with Google.
Sounds like you have this under control.
But scraping is a different story. And the most effective way (if you have the time) is to file a DMCA report at Google: http://www.google.com/dmca.html
As said before, you can block the most common methods in .htaccess, and if you know the site doing the scraping, block its IP and URL. As for reporting to [google.com...], that would be the icing on the cake :)
[edited by: Pirates at 3:51 am (utc) on Dec. 19, 2006]
Hey Walkot, the first one is OK - the full path of the web page - but note also that you should include any query-string variables (if there are any):
<base href="http://www.mydomain.com/info/about-widgets.asp?prod=1253&ord=1253">
I can't tell you how to do this in ASP :(
But for those using PHP, you can use:
echo '<base href="http://' . htmlspecialchars($_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'], ENT_QUOTES) . '">';
Google query to find hijacked content for your site:
inurl:www.yoursite.com -site:www.yoursite.com
I've had a look at your site using Hitwise.
For the 4 weeks ending 7th Jan 2006 you had 2114 different search terms used to find your site.
For the 4 Weeks ending 30th Dec 2006 you only had 158 search terms.
However, your site has lost traffic in three stages: first in early March 2006, then again in mid-June. Traffic started to come back slightly until the last week in November, then it started dropping like a stone until the last couple of days.
So, looking at this, is there a possibility that the earlier falls were the result of internal duplicate content, which could be seen to be growing over time?
I have no doubt that external duplicate content could be a major factor, but I think you should also look at internal duplication.