Hijacking - Some Advice for Webmasters - Google Search and SEO forum at WebmasterWorld - WebmasterWorld

Forum Moderators: Robert Charlton & goodroi


Hijacking - Some Advice for Webmasters

Some advice on Google hijacking: how to find it, what to do about it, etc.

Pirates

10:50 am on Dec 11, 2006 (gmt 0)

Protecting yourself against hijacks

The best way, in my opinion, to try to stop a site being hijacked is to ensure pages are only served one way by dealing with any canonical issues. There is some great advice on this from Matt Cutts at [mattcutts.com...]
Basically, redirect non-www to www and "/index.html" to "/".
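
To make the two redirect rules concrete, here is a minimal Python sketch of the decision a server would make before answering with a permanent redirect. The function name canonical_url and the host example.com are hypothetical, chosen only for illustration; on a real site the same rules would normally live in the server configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, bare_host="example.com"):
    """Return the canonical URL to redirect to, or None if url is already canonical.

    bare_host is a placeholder (assumption); the canonical host is "www." + bare_host.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"

    # Rule 1: the non-www host is redirected to the www host.
    if host == bare_host:
        host = "www." + bare_host

    # Rule 2: ".../index.html" is redirected to the directory URL ".../".
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if target == url else target


if __name__ == "__main__":
    for candidate in (
        "http://example.com/index.html",
        "http://www.example.com/widgets/index.html?x=1",
        "http://www.example.com/",
    ):
        print(candidate, "->", canonical_url(candidate))
```

Any URL for which this returns a target should be answered with a 301 (permanent) redirect to that target, so that each page is reachable under exactly one address.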

What is a hijack?

"A page hijack is a technique exploiting the way search engines interpret certain commands that a web server can send to a visitor. In essence, it allows a hijacking website to replace pages belonging to target websites in the Search Engine Results Pages ("SERPs"). "
from [clsc.net...]

How can I tell if I have been hijacked?

Use a Google Sitemap creator to crawl your site, and make a note of the number of pages it finds. Now run site:www.yoursite.com on Google. If the number of pages Google reports is vastly higher than the actual number of pages on your site, you "may" have some hijacked content.
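
The comparison itself is simple arithmetic. A rough sketch follows, with the caveat that the threshold of 2.0 is an arbitrary assumption (the post only says "vastly higher"), that the function names are made up for illustration, and that the count Google reports for a site: query is only an estimate.

```python
def indexed_ratio(crawled_pages, indexed_pages):
    """Ratio of pages the search engine reports to pages found by your own crawl."""
    if crawled_pages <= 0:
        raise ValueError("crawled_pages must be positive")
    return indexed_pages / crawled_pages


def may_be_hijacked(crawled_pages, indexed_pages, threshold=2.0):
    """True when the indexed count is at least `threshold` times the crawled count.

    threshold=2.0 is a placeholder assumption, not a figure from the thread.
    A True result is only a hint to investigate further, never proof.
    """
    return indexed_ratio(crawled_pages, indexed_pages) >= threshold


if __name__ == "__main__":
    print(may_be_hijacked(crawled_pages=120, indexed_pages=900))
    print(may_be_hijacked(crawled_pages=120, indexed_pages=130))
```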

How can I find who's hijacking me?

You can use your logs to do this. Because a hijack relies on mirroring your pages, the hijacking sites will start coming up as referrers to your site. So have a good look at the sites referring traffic to you and compare them to a few months ago. Now check out the new referring sites by visiting each one. In my experience, a hijacker site often uses a numeric or gibberish domain name, typically with a .com or .net extension. Also look for adult content sites among the referrers and treat them as suspect.

OK, I have my list. What do I do next?

Report them. Explain what tests you have made and list the referrers you find suspicious at [google.com...]
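
The log comparison can be sketched in Python roughly as follows. This assumes access logs in the common "combined" format, where the referrer is the quoted field after the status code and byte count. The "numeric or gibberish" test is a crude heuristic invented for this sketch (mostly digits, or five or more letters with no vowels), so treat its output as a list of candidates to visit by hand.

```python
import re
from urllib.parse import urlsplit

# Combined log format: ... "request" status bytes "referrer" "user-agent"
REFERRER_RE = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)"')


def referrer_hosts(log_lines, own_host):
    """Collect the external host names that appear as referrers."""
    hosts = set()
    for line in log_lines:
        match = REFERRER_RE.search(line)
        if not match:
            continue
        host = (urlsplit(match.group(1)).hostname or "").lower()
        if host and host != own_host:
            hosts.add(host)
    return hosts


def looks_suspicious(host):
    """Heuristic (assumption): numeric or vowel-free name under .com or .net."""
    labels = host.split(".")
    if labels[0] == "www":
        labels = labels[1:]
    if len(labels) < 2 or labels[-1] not in ("com", "net"):
        return False
    name = labels[-2]
    if not name:
        return False
    digits = sum(ch.isdigit() for ch in name)
    letters = sum(ch.isalpha() for ch in name)
    vowels = sum(ch in "aeiou" for ch in name)
    mostly_numeric = digits * 2 >= len(name)
    gibberish = letters >= 5 and vowels == 0
    return mostly_numeric or gibberish


def new_suspects(old_lines, new_lines, own_host):
    """Referrers present now but not a few months ago that match the heuristic."""
    fresh = referrer_hosts(new_lines, own_host) - referrer_hosts(old_lines, own_host)
    return sorted(host for host in fresh if looks_suspicious(host))


if __name__ == "__main__":
    old = ['1.2.3.4 - - [10/Sep/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
           '"http://www.example.org/links" "Mozilla/4.0"']
    new = old + [
        '5.6.7.8 - - [10/Dec/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
        '"http://123456.com/page" "Mozilla/4.0"',
        '5.6.7.9 - - [10/Dec/2006:10:01:00 +0000] "GET / HTTP/1.1" 200 512 '
        '"http://xkqzvbn.net/" "Mozilla/4.0"',
    ]
    print(new_suspects(old, new, "www.yoursite.com"))
```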

tedster

5:37 pm on Dec 18, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Showing the page's full URL is the best practice for the base element. It's what you see in the W3C examples and the most aligned to the definition of the base element.

Pirates

1:11 am on Dec 19, 2006 (gmt 0)

Update on hijacking. The main problem at the moment appears to be the proxy hijack. This is one where a permanent connection to your site is made and the hijacking site serves a copy of your page.

The good news is it's very easy to find and very easy to get rid of.

To find it, search Google for "unique text" from your site, starting with your homepage, and use "Google Alerts" as previously mentioned.

To cure it, block the site's URL and IP in .htaccess as previously described and, most important of all, fill out a spam report at Google.

[google.com...]

They really do take notice...
OK, New Year's resolution: stop bashing G so much, they're not all bad.

Important to note: this type of hijack will not show up as a referrer in your logs.

[edited by: Pirates at 1:14 am (utc) on Dec. 19, 2006]

mcskoufis

1:46 am on Dec 19, 2006 (gmt 0)

10+ Year Member

Pirates, I am a bit confused I must say as to what you are trying to say here...

Can you provide a definition of hijacking to understand you better? Also please tell us if your site has been attacked or not.

In your definition of hijack in your initial post you define it as:

"A page hijack is a technique exploiting the way search engines interpret certain commands that a web server can send to a visitor. In essence, it allows a hijacking website to replace pages belonging to target websites in the Search Engine Results Pages ("SERPs"). " from that page..

That page does not speak about scraping (e.g.: copying a site A's page and presenting it on the hijacking domain)

It speaks about 302 redirects and the problem Google has identifying the original (source) from the copy (hijacker from what you mention in your other posts, scraper according to what I know).

The 302 issue does not have anything to do with scraping (copying parts of your site or parts of a page on your site).

Basically most attacks on my sites using 302s are coming from v1agra sites and the like... thousands of them... They are all identical and feature "sponsored listings" of various products, only that in some cases instead of linking to the product pages on v1agra sites, they 302 redirect to my "blue widgets" site...

So googlebot visits there, follows the link to the product and it does a 302 to my site... It is trying to fool googlebot into thinking that the hijacker is the original site and that he has moved it TEMPORARILY to my site.

For that you need the base href element in ALL your pages. Then it can hurt no more with Google.

But scraping is a different story. And the most effective way (if you have the time) is to file a DMCA report at Google:

[google.com...]

Pirates

3:36 am on Dec 19, 2006 (gmt 0)

Pirates, I am a bit confused I must say as to what you are trying to say here... Can you provide a definition of hijacking to understand you better? Also please tell us if your site has been attacked or not.

OK, first: my sites have probably been attacked, but not successfully. I am working on clients' sites and so far have a 99% success rate. It's now looking like 100%, but I'm waiting till after Xmas to confirm.

In your definition of hijack in your initial post you define it as: "A page hijack is a technique exploiting the way search engines interpret certain commands that a web server can send to a visitor. In essence, it allows a hijacking website to replace pages belonging to target websites in the Search Engine Results Pages ("SERPs"). " from that page.. That page does not speak about scraping (e.g.: copying a site A's page and presenting it on the hijacking domain)

Actually, I was just quoting there. I wanted to do something comprehensive on hijacking, so it expanded from the initial post. To me, scrapers and hijackers come from the same pool of people online, so I thought why not include those feckers as well in the .htaccess file.

It speaks about 302 redirects and the problem Google has identifying the original (source) from the copy (hijacker from what you mention in your other posts, scraper according to what I know). The 302 issue does not have anything to do with scraping (copying parts of your site or parts of a page on your site).

The 302 hijack is just one type of hijack. There are many. A well-known affiliate portal that I phoned this week (but will not name, as they are co-operating) admitted that some of their affiliates hijack and scrape content. Here's a post that deserves an answer regarding proxy hijacks:
[webmasterworld.com...]

Basically most attacks on my sites using 302s are coming from v1agra sites and the like... thousands of them... They are all identical and feature "sponsored listings" of various products, only that in some cases instead of linking to the product pages on v1agra sites, they 302 redirect to my "blue widgets" site... So googlebot visits there, follows the link to the product and it does a 302 to my site... It is trying to fool googlebot into thinking that the hijacker is the original site and that he has moved it TEMPORARILY to my site. For that you need the base href element in ALL your pages. Then it can hurt no more with Google.

Sounds like you have this under control.

But scraping is a different story. And the most effective way (if you have the time) is to file a DMCA report at Google: http://www.google.com/dmca.html

As said before, you can block most common methods in .htaccess, and if you know the site doing the scraping, block its IP and domain. As regards reporting to [google.com...], that would be the icing on the cake :)

[edited by: Pirates at 3:51 am (utc) on Dec. 19, 2006]

Pirates

4:13 am on Dec 19, 2006 (gmt 0)

OK, let's grill a hijacker. I have the phone number of a notorious hijacker and would like to grill them and then post an mp3 of the conversation online. Is this possible to do, tedster?

tomda

5:44 am on Dec 19, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Tedster's remark is very important; otherwise you'll get funny surprises using a base href that doesn't equal the full path of the page.

Hey wrkalot, the first one is OK (the full path of the web page), but note that you should also include any variables (if there are any):
<base href="http://www.mydomain.com/info/about-widgets.asp?prod=1253&ord=1253">

I can't tell you how to do this in ASP :(
But for those using PHP, you can use:

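// Escape the request-derived values so they cannot inject markup into the page.
echo '<base href="http://' . htmlspecialchars($_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'], ENT_QUOTES) . '">';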
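
For those not on PHP, the same idea can be sketched in Python. The function name base_tag is hypothetical, and how you obtain the host and request URI depends on your framework, so they are passed in as plain strings here. The value is HTML-escaped because both parts come from the incoming request.

```python
from html import escape


def base_tag(host, request_uri, scheme="http"):
    """Build a <base> element whose href is the full URL of the current page.

    host and request_uri stand in for whatever your server or framework
    exposes for the Host header and the path plus query string.
    """
    url = f"{scheme}://{host}{request_uri}"
    # Escape &, <, > and quotes so request data cannot break out of the attribute.
    return f'<base href="{escape(url, quote=True)}">'


if __name__ == "__main__":
    print(base_tag("www.mydomain.com", "/info/about-widgets.asp?prod=1253&ord=1253"))
```

The query string is kept in the href, so each variation of the page declares its own full URL.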

wrkalot

12:23 pm on Dec 19, 2006 (gmt 0)

10+ Year Member

tomda and tedster: Thanks for the comments. I'll figure out how to grab the variables :)

Pirates

2:07 am on Dec 22, 2006 (gmt 0)

It's looking to me like 302 hijacks are cured on Google, and the hijackers are resorting to other methods of hijacking from way back when ("Hello Thailand"). These are having a temporary effect on Google listings but a permanent effect on regional listings for sites using non-country-specific domains.

Pirates

4:44 am on Jan 1, 2007 (gmt 0)

Update on hijacks...

Query on Google to find hijacked content for your site:

inurl:www.yoursite.com -site:www.yoursite.com

ageingguy

10:57 am on Jan 4, 2007 (gmt 0)

10+ Year Member

RichTC

I've had a look at your site using Hitwise.

For the 4 weeks ending 7th Jan 2006 you had 2114 different search terms used to find your site.

For the 4 weeks ending 30th Dec 2006 you only had 158 search terms.

However, your site has lost traffic in three stages: first in early March 2006, then again in mid-June. Traffic started to come back slightly until the last week in November, then it started dropping like a stone until the last couple of days.

So looking at this, is there a possibility that the earlier falls were the result of internal duplicate content, which could be seen to be getting greater over time?

I have no doubt that external duplicate content could be a major factor. But I think you should also look at internal.

theBear

1:40 pm on Jan 4, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

ageingguy,

You are correct. From what I've seen, the most damaging form of duplicate content is that which is on the site's own domain.

Pirates,

That search you are touting will at best give some candidates. It won't be 100% correct, nor will it be a complete list.

This 41 message thread spans 2 pages: 41
