4 out of my 6 sites had all or part of their pages being hijacked. I agree with everyone that this is a problem that Google needs to fix.
But, I am happy to say that the Google Removal Tool worked great for me.
But, along the way I really came to realize how big this problem is. Someone above said that most people don't even realize their sites are falling victim. I really have to agree with that statement. I had no idea until I started digging into it.
Two issues with the removal tool. First, it can't remove URLs that are now 404 errors, and since these Supplemental listings are bizarrely hanging around for a year or more, this means Google is very inappropriately "remembering" a redirect that is long gone.
Second, a related and fairly extensive problem I see is the number of duplicate Supplemental listings for site.com/directory and site.com/directory/. No-slash versions of pages remain Supplemental or URL-only for a long time, and it seems to me that when these exist it depresses the ranking of the page. Removing site.com/directory with the URL tool removes the real page. I did this anyway as an experiment to see how soon they would reappear after being crawled again and how they would rank, but after 48 hours the pages haven't reappeared yet.
On March 17 I used the removal tool to get rid of a URL that was hijacking my index page. (Read message #1.)
I just wanted to report that my site just came back to its original position in nearly all of its money terms in the SERPs for the first time since Dec/Jan.
It's currently #6 out of 2 million results.
If I shout hurray I'll wake up my family so instead I'll just post here. (holding breath and hoping it stays)
Congrats, Idaho. Well done.
I'm truly happy for you! Here's hoping that it happens to some of us other unfortunates.
Would this be considered hijacking? I did a search to see if anyone had hijacked my site and I found this page that quickly refreshed to my site. This is the link that I found in search:
So basically this blank page except for this url quickly refreshed to my site.
Here is what the source code said:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>Designerz Integrated Information Network Routing [mysite.com...]
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="refresh" content="0;URL=http://www.mysite.com/">
Designerz Integrated Information Network Routing <br> You will be transferred to 'http://www.mysite.com/' in 1 seconds. <br><br><a href='http://www.mysite.com/'>http://www.mysite.com/</a> <a href="http://widgets.com/?login=routdshb"
name="EXim" border="0" height="1" width="1"
var EXlogin='routdshb' // Login
var EXvsrv='s9' // VServer
"l="+escape(EXd.referrer)+"\" height=1 width=1>");//-->
</script><noscript><img height="1" width="1" alt=""
When you find the suspected hijacking link in the SERPs, does it have your page title and does it have exactly your page content in the cache?
"When you find the suspected hijacking link in the SERPs, does it have your page title and does it have exactly your page content in the cache?"
Yes, Atticus, after it refreshes to my page. But it refreshes so fast that I can't catch the cache before it does.
Then I believe that that page has been hijacked. And as it has a meta refresh of 0, you won't be able to remove it by methods discussed in this thread.
I am in a similar situation and I am aware of no solution at this time short of asking the offending site to remove the link.
"Then I believe that that page has been hijacked. And as it has a meta refresh of 0, you won't be able to remove it by methods discussed in this thread.
I am in a similar situation and I am aware of no solution at this time short of asking the offending site to remove the link."
I wonder if that is why I lost my ranking today for some of my top keywords?
Either I read your post too fast or you edited it after I read it. Anyway, if you can't see what's in the cache, I don't know if it's a 'hijack' as defined here. Does sound mighty suspicious, though.
I did edit it, so you must have missed it. I wonder what it is if it is not a hijack?
There's potentially more going on with this stuff than hijacking. As discussed elsewhere by thebear, this could be a case of domain poisoning. Theory is that the 'bad guy site' gets associated with your site in G's opinion and then G thinks that you are part of any 'bad neighborhoods' with which the bad guy is associated.
I have a fast meta refresh situation where a porno site is listed with its correct URL and title, but the snippet comes from my page and includes my domain name. There's a fast meta refresh on the cache, and the actual site does not have my domain name in it, so I have no idea how this is happening.
It is important to make a distinction between hijacking -- when another site is showing up with your title and cached content -- and other possible penalties (such as domain poisoning) which don't have the physical evidence of a Google error which the cache provides.
Ok, this looks promising. I sent an email to google asking them to remove my cached homepage that has been attributed to the hijacking link when I do a site:www.example.com.
They have removed the offending page and say this page will not return as a result for this query after the next crawl. Pretty quick response time too. 24 hours.
And I was able to nuke the other redirect using the Google removal tool. So now all I can do is wait and hope that my site comes back in the SERPs.
To which e-mail address did you send the request? Did you mention nuking the other links and did they say anything about it? Any other information on this as developments occur will be welcomed.
I made the request here Atticus...
so if you have the cache hijack problem this should solve it.
I didn't tell them about nuking the other link but I did try my luck and ask if this cache hijack would cause a duplicate content problem but they didn't respond to that at all.
But anyway, step in the right direction even though this is just treating the symptoms and not the disease itself.
>> Meta redirect (refresh time = 0)
It's important that we do not forget this for all the talk about 302s. The meta redirect can do exactly the same as the 302 redirect, and some hijackers even combine these two methods for one request (as Japanese originally pointed out). When combined, you can't remove the URLs with the remove tool (URL Console).
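For anyone who wants to check a suspect page for this without loading it in a browser, here is a minimal sketch in Python. It only parses HTML that you have already saved; the function names and the sample markup are made up for illustration, and it does not tell you anything about what Google has actually indexed.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class MetaRefreshFinder(HTMLParser):
    """Collects (delay_seconds, target_url) for each <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0;URL=http://www.mysite.com/".
        delay_part, _, rest = attr.get("content", "").partition(";")
        try:
            delay = float(delay_part.strip())
        except ValueError:
            return  # malformed delay, ignore this tag
        url = rest.strip()
        if url.lower().startswith("url="):
            url = url[4:].strip().strip("'\"")
        self.refreshes.append((delay, url))


def find_meta_refreshes(html):
    """Return every meta refresh found in the given HTML text."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes


def points_instantly_at(html, host):
    """True if the page has a zero-second refresh whose target is on `host`."""
    return any(
        delay == 0 and urlsplit(url).hostname == host
        for delay, url in find_meta_refreshes(html)
    )
```

Feed it the source of the page you found in the SERPs (saved with a tool that does not follow the refresh) and your own host name. A True result only says the page is a zero-second refresh to you; whether it is harming you is still a judgement call.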
snowflake- I did a little checking
extemetr*cker - a java powered tracking system which checks browser queries, referrers and several other things. It logs into a secure network and reports the stats. That is what the Java code is all about. Seems harmless enough but their methods may inadvertently be hijacking pages in google because of the META refresh page.
Safaridude - congrats way to go
Clause - with the 0 refresh, even a non-java browser will get 0 seconds to view anything. I found this cool tool called s*msp*de which has a 'one page at a time' 'code view' browser. It is for testing purposes.
It follows no orders, just reveals the source one page at a time. Very handy.
edited to cloak brand names *=a
For the past 7 months I could not determine for the life of me why my website tanked to a PR 0. After reading this forum, I found the culprit: a website offering partnership links and promising increased traffic. I thought of it as a simple link exchange. Truth is, this company uses some sort of CGI/frames link to do an absolute pull of the link partner's URL. It then puts its URL before the real URL and locks the original site "live" into its own webpage with its own title at the top and some advertising boxes. Once Google caches this link, Google assumes the original site content belongs to the scamming website and applies a duplicate penalty to the real site. Now that my site is cached in Google, I've tried all of Google's tools to remove the cached link and I've even tried IP deny. Nothing stops this. Then I clicked on all the partner links and ALL of them are PR0. If anyone has any advice as to what I can do short of starting from scratch, please let me know... Here is what the code looks like when I do an allinurl:mysite.com on Google.
When the link is clicked, here is the source code. Real user info has been changed so I could post this.
<FRAMESET ROWS="70,*" FRAMESPACING=0 FRAMEBORDER="0" BORDER="0">
<FRAME NAME="mem_top" SRC="/webmaster_top.html" FRAMESPACING=0 BORDER="0" MARGINHEIGHT="0" MARGINWIDTH="0" NORESIZE SCROLLING="NO">
<FRAME NAME="mem_body" SRC="http://www.mysite.com">
Thank you for visiting. We recommend using a frames compatible browser,
but you can view this document without frames by clicking
If anyone can give me advice as to how to eliminate this link from the Google cache and from this scamming website, please message me or reply in this forum.
Having (perhaps only temporarily :( ) sorted my 302 problems, I am looking at the meta-refresh problem
Do you think it is worth your while starting a new thread on it, so that it does not get buried in this thread? And we all can explore this problem there
>> this cool tool
- yeah, i've been using it for years. Especially before i learned *nix commands and started messing with other stuff than Windows it was invaluable. There's a built in spider as well ;-)
You can always do a "curl -i [example.com...]" at the command prompt if you run a *nix flavour OS (or have cygwin installed or whatever). Make that an "-I" to see the server headers only.
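If you don't have curl handy, roughly the same check can be done from Python's standard library. This is just a sketch: the function names are my own, it talks plain HTTP only, and like "curl -I" it sends a HEAD request and does not follow the redirect, so you see the status code and Location header exactly as the server sent them.

```python
import http.client


def head(host, path="/"):
    """Send one HEAD request and return (status, Location header or None).

    http.client never follows redirects on its own, which is what we want here.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify(status, location):
    """Describe a response in the terms used in this thread."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}"
    return f"no redirect (status {status})"


# Example use (needs network access, so it is left as a comment):
#   status, location = head("example.com", "/")
#   print(classify(status, location))
```

Note that this only shows server-side redirects. A meta refresh lives in the page body, so a page that answers 200 here can still bounce a browser to your site.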
snowflake: I've seen that same site come up in the top spot for searches on our site name and inurl:mysite.com. However, I haven't observed any evidence of hijacking (e.g. our cache, site:mysite.com, etc.), nor does it seem to rank for any of our keywords other than our site name.
I suspect that site's strength on our site name is due to inurl and keyword density factors, not the meta refresh. Its behavior and ranking on our site name are very similar to those of a well-known site which frames their outgoing links to our site. The well-known site doesn't use a meta refresh, but its usage and density of our domain name is very similar to that of the site you pointed out, and they usually rank close together. I don't see evidence that either is hijacking our listings.
cornwall: it's probably worth another thread. meta refresh that's not coupled with a 302 seems to be a different issue, and certainly the solutions that have been discussed here don't apply.
There is more to this than a simple ad tracking type 302 redirect. IMO, it needs the refresh or a server side directive *and* a third site with the p.r... and that's probably more than I should say about it except that *yes* your actual page content is being spidered by the bot off your server and you see just a regular page code 200 visit from the bot. Last night I started to post what I thought was one way this is being done and then had to <self snip> it out. I snipped it out because right now deep down I think there are not that many webmasters doing this. Granted, the offenders that are doing it are doing so on a grand scale. Because, it takes no real content to do this with. Also, I think even *if* this were to become common knowledge it still could not be addressed by the bots. I don't think the bots believe that it's not a problem to address... just that there is no way around this for them at this time.
Put an absolute link to your index page http: //www.site.com/index.html from some or all of your internal site pages. I did this on directory level pages, with the company name in anchor text, because of the "can't find my company by name" problem, and I *believe* the unforeseen bonus was clearing out these redirect/refresh index mismatch sites.
So, what is your response to msg#205 in this thread?
Great, now when I search www.mydomain.com I get mydomain.com. That's how it looked 3 months ago. I thought things looked a little better, but now it seems that everything is going back again.
zeus: have you configured a 301 redirect from domain.com to www.domain.com?
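In case it helps to see what that redirect amounts to, here is a rough sketch of the decision in Python. It is not server configuration, just the logic; the function name is made up, domain.com stands in for your own domain, and it ignores ports.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, canonical_host="www.domain.com"):
    """Return (301, new_url) if `url` uses the bare host, else None.

    Assumes canonical_host starts with "www." and that the only
    non-canonical variant we care about is the same name without it.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    bare_host = canonical_host[len("www."):]
    if host != bare_host:
        return None  # already canonical, or some other host entirely
    new_url = urlunsplit(
        (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, new_url
```

So a request for http://domain.com/page would be answered with a 301 pointing at http://www.domain.com/page, and a request that already uses www is served as normal.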
I apologize in advance for cross-posting. I noticed that the old thread was closed right after I posted there. Now I'm posting it in what I hope is the correct location.
Yesterday, I cooked up an idea for a web server-based defense against this exploit and posted it to slashdot ([slashdot.org]) where it received no comments. I'm not sure if I should take this as a good sign (nobody found a serious flaw) or a bad one (nobody thought it worth discussing).
I'm considering recommending that my organization implement this, but am airing it out in public first to see if someone can find a flaw in it.
Proposed Defensive Solution
Robots that index pages for search engines may be tricked into believing that content from one site actually belongs to another. The sequence of events looks like this:
- The robot visits [badguy.xyz...]
- The web server at badguy.xyz responds with an HTTP 302 redirect that informs the robot that the content has been temporarily moved to [victim.xyz...]
- The robot dutifully follows the redirect to [victim.xyz...]
- The robot receives content from the web server at www.victim.xyz and indexes it. However, because it believes that the content has been moved only temporarily, it indexes it under the www.badguy.xyz domain instead of the www.victim.xyz domain.
- Some time later, a user hits the robot's search service (google in most examples) and types in some keywords that appear at [victim.xyz....] The search engine finds the keywords which it has indexed under www.badguy.xyz, so it returns a link to [badguy.xyz....]
- The user selects the link and is taken to the [badguy.xyz...] site where badguy has complete control over the content.
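The sequence above can be modelled with a toy crawler. This is only a sketch of the behaviour described in the list, not a claim about how any real robot works; the full URLs, the page text and the dictionary standing in for the web are made up for illustration.

```python
# A pretend web: URL -> (status, payload).
# For 200 the payload is page content; for 301/302 it is the redirect target.
WEB = {
    "http://www.badguy.xyz/": (302, "http://www.victim.xyz/"),
    "http://www.victim.xyz/": (200, "blue widgets for sale"),
}


def crawl(start_url, web, max_hops=5):
    """Return (url_the_content_is_indexed_under, content).

    Models the naive rule from the list above: a 302 is "temporary",
    so the content stays credited to the URL that issued it, while a
    301 is "permanent", so credit moves to the new URL.
    """
    index_url = start_url
    url = start_url
    for _ in range(max_hops):
        status, payload = web[url]
        if status == 200:
            return index_url, payload
        if status == 301:
            index_url = payload  # permanent move: credit the target
        elif status != 302:
            raise ValueError(f"unhandled status {status}")
        url = payload  # follow the redirect either way
    raise RuntimeError("too many redirects")
```

Starting at the badguy URL, crawl() returns victim's content filed under the badguy URL, which is the mismatch described above. Change the 302 in WEB to a 301 and the same content is filed under the victim URL instead.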
To protect against the scenario above, the administrator of victim.xyz can install a filter on her web server which will issue an HTTP 301 redirect back to itself if it thinks that the request might be the result of a malicious/erroneous HTTP 302 redirect.
Here is how it works:
- The robot visits [badguy.xyz...]
- Badguy issues its 302 redirect as above
- The robot follows the redirect to [victim.xyz...]
- The filter at victim.xyz intercepts the request and examines it. The request either contains no referrer header or else the referrer header indicates that the client followed an external link to victim.xyz.
- The filter determines if it has seen this particular web client recently. (This check could be as simple as scanning the last few lines of the Web server's access log.)
- If the filter has not seen this client (the robot) recently, it issues an HTTP 301 ("moved permanently") redirect pointing to [victim.xyz...]
- The robot follows the redirect to [victim.xyz...]
- The filter at victim.xyz intercepts the request. This time, it recognizes that it has seen the robot before and lets the request through normally.
- The robot receives Web content from the server at victim.xyz and indexes it. Because it reached this site from a 301 (moved permanently) rather than a 302 (moved temporarily) redirect, it knows that the content belongs to victim.xyz rather than badguy.xyz and indexes it under victim.xyz. badguy.xyz never gets associated with the content.
Because a robot might be smart enough to recognize that it is being redirected back to the current page, it would probably be a good idea to obfuscate the HTTP 301 redirect by rewriting the URL in a technically insignificant way. For example, "http://www.victim.xyz/" might be rewritten as "http://www.victim.xyz/?"
Exactly how this filter would be implemented depends on the Web server platform and possibly the requirements of the organization. For example, it could be implemented as an Apache httpd module, an IIS ISAPI filter (or whatever the .Net equivalent is. It's been a few years since I've worked with Microsoft products), or a servlet in a J2EE setup. In some cases, it could even be implemented in a more localized scope using globally included PHP or ASP scripts, although I think I'd steer away from this because of the performance penalty.
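To make the proposal concrete, here is a minimal sketch of the filter's decision logic in Python. It is not tied to any server platform. The class name, the 60-second window, the use of a plain string as the client identifier and the in-memory dictionary (instead of scanning the access log) are all assumptions made for illustration, and nothing here shows that a real robot would react to the 301 the way the proposal hopes.

```python
import time
from urllib.parse import urlsplit


class RedirectFilter:
    """Decides whether to serve a request or bounce it with a 301 to itself."""

    def __init__(self, own_host="www.victim.xyz", window_seconds=60.0,
                 clock=time.monotonic):
        self.own_host = own_host
        self.window = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # client id -> time of last request

    def handle(self, client_id, path, referrer=None):
        """Return (301, new_path) or (200, path)."""
        now = self.clock()
        seen_at = self.last_seen.get(client_id)
        self.last_seen[client_id] = now

        # Requests that followed one of our own links are let through.
        internal = (
            referrer is not None
            and urlsplit(referrer).hostname == self.own_host
        )
        recently_seen = seen_at is not None and now - seen_at <= self.window
        if internal or recently_seen:
            return 200, path

        # Unknown client arriving from outside (or with no referrer):
        # redirect to a trivially rewritten form of the same URL.
        separator = "&" if "?" in path else "?"
        return 301, path + separator
```

A first request from a client with no referrer gets (301, "/?"), and the follow-up request from the same client inside the window gets (200, "/?"). One design point to note is that every first-time visitor without an internal referrer pays for one extra round trip, not just robots.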
I'd greatly appreciate feedback.
"The filter determines if it has seen this particular web client recently."
This is where the problem lies. If GBot follows a regular link, and a second later hits it from a 302 redirect, you wouldn't know the difference.
Also, I'd like to thank WebmasterWorld community for making this issue SO PUBLIC, that now any idiot with basic HTML knowledge and a shovel can knock down other people's sites. Hope this doesn't violate TOS ;-) This should've been better off discussed in a Supporters forum.
>> This should've been better off discussed in a Supporters forum. <<
Too late. It is now mentioned on almost every SEO and webdesign forum, as well as on /. too.
>> The filter at victim.xyz intercepts the request and examines it. The request either contains no referrer header or else the referrer header indicates that the client followed an external link to victim.xyz. <<
A web browser visits pages by going from one to the next by clicking on links, and may (it might not, as some people surf with referrers off) leave a referrer in your log (the referrer is the URL of the previous page that it was visiting, if that page linked to you). If someone typed the URL in, clicked on a bookmark, or has referrers off then you will not see a referrer. Don't confuse Referrer with User Agent. The User Agent part of the log entry says which browser and OS was used.
Search engines do not crawl the web going from one page to the next. They spider a page and add all links found on that page into a database. When they finish that page, they ask their own database for the URL of the next page to spider. It might be one on a different site! Multiple bots will be adding to that database, and getting their next job from it, so you can have several bots from the same search engine on your site at the same time. Search engine bots leave User Agent information in your log, but they do NOT leave any referrer information, ever.
accidentalgeek - that is a great idea with only one flaw.
Googlebot goes to badguy.xyz and makes a list of links, noting the 302 redirect.
Another googlebot later visits the links reported by first googlebot indexing the content for badguy.xyz
It does not follow the 302 redirect it only records it, adding victim.xyz as a temporary location to fetch badguys content.
We went through this inside and out (see 700 post thread) there is no way to stop it
Google needs to fix it.
All we can do is site:mysite (or allinurl:) and keep getting rid of them until google fixes it.
anything at all in site:mysite should be removed.
allinurl: you need to make a judgement call whether this link is harming you or not. There are a lot of 302 links that will show up in allinurl: which are completely harmless good backlinks.
adding to that - I really frown on any type of redirect to mysite using a META refresh blank page. I think that might be the key to the hijack problem.
A normal tracking 302 just runs through a script. Someone clicks a link on notbadguy.xyz and the script takes the id# and replaces the url with your url and records the click. Nothing wrong there.
When the script sends the user to a blank page with a META refresh (without a noarchive), that's when Google assigns the victim's content to that blank page.