Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 277 message thread spans 10 pages; this is page 8.
How to Remove Hijacker Page Using Google Removal Tool
8,058,044,651 page indexed (now minus 1)

 6:19 pm on Mar 17, 2005 (gmt 0)

Continued from: [webmasterworld.com...]

With the help of posts from crobb305 and others, I was able to remove a hijacker's page from the Google index.

My site was doing very well in the SERPs. For over 2 years it had been on the first page for a competitive term (1.2 million listings). Then during the first week in January my site disappeared and traffic tanked for no obvious reason.

When searching for "site:www.mydomain.com" I noticed that my index page often wasn't listed, or it appeared on about page 3 or 4 of the results after all my supplemental pages.

A search for "allinurl:mysite.com" often didn't show my index page at all but instead showed somebody else's domain (located in Turkey). When I clicked on this link, my site came up. When I clicked on the cached version of the site, it showed a very old cache of the page. This same site also showed up after all my results when doing a "site:www.mydomain.com"

Using a header checker tool on the site's URL I was able to see it was using a 302 link to my site.
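For anyone without a header checker tool to hand, the same check can be sketched in a few lines of Python. This is only a rough sketch under assumptions: the function names `describe_response` and `check_headers` are made up for illustration, and it sends a single HEAD request without following the redirect, so the status code and Location header of the suspect URL itself are what you see.

```python
import http.client
from urllib.parse import urlsplit

# Redirect status codes and how a client is meant to treat them.
REDIRECT_CODES = {301: "permanent", 302: "temporary", 303: "see other",
                  307: "temporary", 308: "permanent"}

def describe_response(status, location):
    """Summarise one HTTP response the way a header checker would."""
    kind = REDIRECT_CODES.get(status)
    if kind is None:
        return f"{status}: not a redirect"
    return f"{status} ({kind}) redirect to {location or '<no Location header>'}"

def check_headers(url, timeout=10):
    """Send one HEAD request to `url` and report it; redirects are NOT followed."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return describe_response(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

Calling `check_headers()` on the suspect URL copied from the "allinurl" results should report a 302 pointing at your own domain if the page is the kind of redirect described here.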

Last night after reading some posts by crobb305 and others I went to Google.com and clicked on "About Google." Then I clicked on "Webmaster Info." Then I clicked on "I need my site information removed." Then I clicked on "remove individual pages," where I found instructions on how to remove the page.

(Here's the exact page where I ended up. If mod needs to remove then snip away:) [google.com...]

I then clicked on the "urgent" link.

1. I signed up for an account with Google and replied back to them from an email they sent me;
2. I added the "noindex" meta tag according to their instructions and uploaded it to my site;
3. Using the instructions to remove a single page from the Google index, I added the hijacker's URL that was pointing to my site. (copy and paste from the result found on "allinurl" search)

This didn't work the first time because I had to remove a space from the url to get it to work.

4. I got a message back saying that the request would be taken care of within 24 hours. The URL that I entered showed on the upper right hand part of the screen saying "removal of (hijacker's url) pending."
5. I then removed the "noindex" meta tag from my page and re-uploaded it to my site.

This morning the google account still shows the url removal as "pending" but when I do "site:" and "allinurl" searches the offending URL is gone and my index URL is back.

Conclusions and Speculations:
At some point last September, Google cached the hijack page's URL pointing to my site. In January, Google penalized my site for duplicate content because it found both URLs and compared them. Mine got penalized because it was the only page that really existed. The hijacker's page didn't get penalized because it only existed as a redirect to my site.

Because my index page was now penalized, it dropped almost completely from the SERPs. Some of my supplemental pages showed up for obscure searches, but none showed up for my money terms.

Because I haven't been able to get a response from the hijacker's webmaster, the 302 is still in place but it is buried deep in his site and the last Google cache of the page was sometime in September. Therefore with some luck Google won't re-index it any time soon.

Will my site return to the SERPs? I don't know. Any thoughts?



 6:07 am on Mar 24, 2005 (gmt 0)

4 out of my 6 sites had all or part of their pages being hijacked. I agree with everyone that this is a problem that Google needs to fix.

But, I am happy to say that the Google Removal Tool worked great for me.

But, along the way I really came to realize how big this problem is. Someone above said that most people don't even realize their sites are falling victim. I really have to agree with that statement. I had no idea until I started digging into it.


 6:28 am on Mar 24, 2005 (gmt 0)

Two issues with the removal tool. First it can't remove URLs that are now 404 errors, and since these Supplemental Listings are bizarrely hanging around for a year or more this means Google is very inappropriately "remembering" a redirect that is long gone.

Second, a related and fairly extensive problem I see is the number of duplicate Supplemental listings for site.com/directory and site.com/directory/. No-slash versions of pages remain Supplemental or URL-only for a long time, and it seems to me that when these exist it depresses the ranking of the page. Removing site.com/directory with the URL tool removes the real page. I did this anyway as an experiment to see how soon the pages would reappear after being crawled again and how they would rank, but after 48 hours they haven't reappeared yet.


 7:33 am on Mar 24, 2005 (gmt 0)

On March 17 I used the removal tool to get rid of a URL that was hijacking my index page. (Read message #1.)

I just wanted to report that my site just came back to its original position for nearly all of its money terms in the SERPs for the first time since Dec/Jan.

It's currently #6 out of 2 million results.

If I shout hurray I'll wake up my family so instead I'll just post here. (holding breath and hoping it stays)



 7:49 am on Mar 24, 2005 (gmt 0)

Congrats, Idaho. Well done.


 8:07 am on Mar 24, 2005 (gmt 0)

I'm truly happy for you! Here's hoping that it happens to some of us other unfortunates.


 8:30 am on Mar 24, 2005 (gmt 0)

Would this be considered hijacking? I did a search to see if anyone had hijacked my site, and I found this page that quickly refreshed to my site. This is the link that I found in search;

So basically this page, blank except for this URL, quickly refreshed to my site.

Here is what the source code said:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>Designerz Integrated Information Network Routing [mysite.com...]
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="refresh" content="0;URL=http://www.mysite.com/">


Designerz Integrated Information Network Routing <br> You will be transferred to 'http://www.mysite.com/' in 1 seconds. <br><br><a href='http://www.mysite.com/'>http://www.mysite.com/</a> [2]<a href="http://widgets.com/?login=routdshb"
target="_top"><img src="http://t1.widgets.com/i.gif"
name="EXim" border="0" height="1" width="1"
alt="eXTReMe Tracker"></img></a>
<script type="text/javascript" language="javascript1.2"><!--
</script><script type="text/javascript"><!--
var EXlogin='routdshb' // Login
var EXvsrv='s9' // VServer
EXd.write("<img src=\"http://e0.extreme-dm.com",
"l="+escape(EXd.referrer)+"\" height=1 width=1>");//-->
</script><noscript><img height="1" width="1" alt=""
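Since the page refreshes too quickly to read in a browser, one option is to fetch the raw source and scan it for the refresh tag. Below is a minimal sketch using only Python's standard `html.parser`; the class name `MetaRefreshFinder` and the helper `find_meta_refresh` are names invented for this example.

```python
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    """Collect every <meta http-equiv="refresh"> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.refreshes = []  # list of (delay_seconds, target_url)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0;URL=http://www.mysite.com/"
        delay_part, _, rest = attr.get("content", "").partition(";")
        try:
            delay = float(delay_part.strip())
        except ValueError:
            return  # malformed delay, ignore this tag
        rest = rest.strip()
        target = rest[4:].strip() if rest.lower().startswith("url=") else rest
        self.refreshes.append((delay, target))

def find_meta_refresh(html_text):
    """Return a list of (delay, target) pairs for all meta refresh tags."""
    finder = MetaRefreshFinder()
    finder.feed(html_text)
    return finder.refreshes
```

Fed the source quoted above, it would report a delay of 0 and the www.mysite.com target, which is the "metarefresh of 0" case discussed in the replies.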


 8:34 am on Mar 24, 2005 (gmt 0)


When you find the suspected hijacking link in the SERPs, does it have your page title and does it have exactly your page content in the cache?


 8:41 am on Mar 24, 2005 (gmt 0)

"When you find the suspected hijacking link in the SERPs, does it have your page title and does it have exactly your page content in the cache?"

Yes, Atticus, after it refreshes to my page. But it refreshes so fast that I can't catch the cache before it does.


 8:45 am on Mar 24, 2005 (gmt 0)


Then I believe that that page has been hijacked. And as it has a metarefresh of 0, you won't be able to remove it by methods discussed in this thread.

I am in a similar situation and I am aware of no solution at this time short of asking the offending site to remove the link.


 8:50 am on Mar 24, 2005 (gmt 0)

"Then I believe that that page has been hijacked. And as it has a metarefresh of 0, you won't be able to remove it by methods discussed in this thread.

I am in a similar situation and I am aware of no solution at this time short of asking the offending site to remove the link."

I wonder if that is why I lost my ranking today for some of my top keywords?


 8:50 am on Mar 24, 2005 (gmt 0)


Either I read your post too fast or you edited it after I read it. Anyway, if you can't see what's in the cache, I don't know if it's a 'hijack' as defined here. Does sound mighty suspicious, though.


 8:56 am on Mar 24, 2005 (gmt 0)

I did edit it, so you must have missed it. I wonder what it is if it is not a hijack?


 9:08 am on Mar 24, 2005 (gmt 0)


There's potentially more going on with this stuff than hijacking. As discussed elsewhere by thebear, this could be a case of domain poisoning. Theory is that the 'bad guy site' gets associated with your site in G's opinion and then G thinks that you are part of any 'bad neighborhoods' with which the bad guy is associated.

I have a fast meta refresh situation where a porno site is listed with its correct URL and title, but the snippet comes from my page and includes my domain name. There's a fast meta refresh on the cache, and the actual site does not have my domain name in it, so I have no idea how this is happening.

It is important to make a distinction between hijacking (when another site is showing up with your title and cached content) and other possible penalties (such as domain poisoning) which don't have the physical evidence of a Google error which the cache provides.

Good luck.


 9:14 am on Mar 24, 2005 (gmt 0)

Ok, this looks promising. I sent an email to google asking them to remove my cached homepage that has been attributed to the hijacking link when I do a site:www.example.com.

They have removed the offending page and say this page will not return as a result for this query after the next crawl. Pretty quick response time too. 24 hours.

And I was able to nuke the other redirect using the Google removal tool. So now all I can do is wait and hope that my site comes back in the SERPs.


 9:31 am on Mar 24, 2005 (gmt 0)


To which e-mail address did you send the request? Did you mention nuking the other links and did they say anything about it? Any other information on this as developments occur will be welcomed.


 10:09 am on Mar 24, 2005 (gmt 0)

I made the request here Atticus...


so if you have the cache hijack problem this should solve it.

I didn't tell them about nuking the other link but I did try my luck and ask if this cache hijack would cause a duplicate content problem but they didn't respond to that at all.

But anyway, step in the right direction even though this is just treating the symptoms and not the disease itself.


 10:36 am on Mar 24, 2005 (gmt 0)

>> Meta redirect (refresh time = 0)

It's important that we do not forget this amid all the talk about 302s. The meta redirect can do exactly the same as the 302 redirect, and some hijackers even combine these two methods for one request (as Japanese originally pointed out). When combined, you can't remove the URLs with the remove tool (URL Console).

AFAIK, if you turn javascript off in your browser you will be able to see the page with the meta refresh (also the Google cache version). Somebody please correct me if i'm wrong here.


 10:55 am on Mar 24, 2005 (gmt 0)

snowflake- I did a little checking
extemetr*cker - a JavaScript-powered tracking system which checks browser queries, referrers and several other things. It logs into a secure network and reports the stats. That is what the JavaScript code is all about. Seems harmless enough, but their methods may inadvertently be hijacking pages in Google because of the META refresh page.

Safaridude - congrats way to go

Clause - with the 0 refresh even a browser with JavaScript off will get 0 seconds to view anything. I found this cool tool called s*msp*de which has a 'one page at a time' 'code view' browser. It is for testing purposes.
It follows no orders, just reveals the source one page at a time. Very handy.

edited to cloak brand names *=a


 11:02 am on Mar 24, 2005 (gmt 0)
For the past 7 months I could not determine for the life of me why my website tanked to PR 0. After reading this forum, I found the culprit: a website offering partnership links and promising increased traffic. I thought of it as a simple link exchange. The truth is that this company uses some sort of CGI/frames link to do an absolute pull of the link partner's URL. It then puts its own URL before the real URL and locks the original site "live" into its own web page, with its own title at the top and some advertising boxes. Once Google caches this link, Google assumes the original site content belongs to the scamming website and applies a duplicate content penalty to the real site.

Now that my site is cached in Google, I've tried all of Google's tools to remove the cached link and I've even tried IP deny. Nothing stops this. Then I clicked on all the partner links and ALL of them are PR0. If anyone has any advice as to what I can do short of starting from scratch, please let me know. Here is what the code looks like when I do an allinurl:mysite.com on Google.


When the link is clicked. Here is the source code, real user info has been changed so I could post this.

<TITLE>Scamming Site</TITLE>
<FRAME NAME="mem_body" SRC="http://www.mysite.com">
<P ALIGN=center>
Thank you for visiting. We recommend using a frames compatible browser,
but you can view this document without frames by clicking
<A HREF="http://www.MYSITE.com">here</A></P>

If anyone can give me advice as to how to eliminate this link from the Google cache and from this scamming website, please message me or reply in this forum.



 12:05 pm on Mar 24, 2005 (gmt 0)


"AFAIK, if you turn javascript off in your browser you will be able to see the page with the meta refresh (also the Google cache version). Somebody please correct me if i'm wrong here."

Having (perhaps only temporarily :( ) sorted my 302 problems, I am now looking at the meta-refresh problem.

Do you think it is worth starting a new thread on it, so that it does not get buried in this thread? Then we can all explore the problem there.


 2:40 pm on Mar 24, 2005 (gmt 0)

>> this cool tool

- yeah, i've been using it for years. Especially before i learned *nix commands and started messing with other stuff than Windows it was invaluable. There's a built in spider as well ;-)

You can always do a "curl -i [example.com"...] at the command prompt if you run a *nix flavour OS (or have cygwin installed or whatever). Make that an "-I" to see the server headers only.


 2:51 pm on Mar 24, 2005 (gmt 0)

snowflake: I've seen that same site come up in the top spot for searches on our site name and inurl:mysite.com. However, I haven't observed any evidence of hijacking (e.g. our cache, site:mysite.com, etc.), nor does it seem to rank for any of our keywords other than our site name.

I suspect that site's strength on our site name is due to inurl and keyword density factors, not the meta refresh. Its behavior and ranking on our site name are very similar to those of a well-known site which frames their outgoing links to our site. The well-known site doesn't use a meta refresh, but its usage and density of our domain name is very similar to that of the site you pointed out, and they usually rank close together. I don't see evidence that either is hijacking our listings.

cornwall: it's probably worth another thread. meta refresh that's not coupled with a 302 seems to be a different issue, and certainly the solutions that have been discussed here don't apply.


 3:19 pm on Mar 24, 2005 (gmt 0)

There is more to this than a simple ad tracking type 302 redirect. IMO, it needs the refresh or a server side directive *and* a third site with the p.r... and that's probably more than I should say about it except that *yes* your actual page content is being spidered by the bot off your server and you see just a regular page code 200 visit from the bot. Last night I started to post what I thought was one way this is being done and then had to <self snip> it out. I snipped it out because right now deep down I think there are not that many webmasters doing this. Granted, the offenders that are doing it are doing so on a grand scale. Because, it takes no real content to do this with. Also, I think even *if* this were to become common knowledge it still could not be addressed by the bots. I don't think the bots believe that it's not a problem to address... just that there is no way around this for them at this time.

Put an absolute link to your index page http://www.site.com/index.html from some or all of your internal site pages. I did this on directory level pages because of the "can't find my company by name" problem with the company name in anchor text and I *believe* the unforeseen bonus was clearing out these redirect/refresh index mismatch sites.


 5:37 pm on Mar 24, 2005 (gmt 0)

So, what is your response to msg#205 in this thread?


 6:48 pm on Mar 24, 2005 (gmt 0)

Great, now when I search for www.mydomain.com I get mydomain.com, which is how it looked 3 months ago. I thought things looked a little better, but now it seems that everything is going backwards again.


 7:01 pm on Mar 24, 2005 (gmt 0)

zeus: have you configured a 301 redirect from domain.com to www.domain.com?
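For reference, the logic of such a redirect can be sketched as a small Python WSGI wrapper. This is only an illustration under assumptions: on most setups you would configure this in the web server itself, and the function name `canonical_host_middleware` and the default host `www.domain.com` are placeholders.

```python
def canonical_host_middleware(app, canonical_host="www.domain.com"):
    """Wrap a WSGI app so requests for any other host name get a 301.

    Note: a Host header that carries a port (e.g. "domain.com:80") is also
    treated as non-canonical in this simple sketch.
    """
    def wrapped(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        if host and host.lower() != canonical_host:
            scheme = environ.get("wsgi.url_scheme", "http")
            path = environ.get("PATH_INFO", "") or "/"
            query = environ.get("QUERY_STRING", "")
            location = f"{scheme}://{canonical_host}{path}"
            if query:
                location += "?" + query
            # 301 tells clients the move is permanent.
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return wrapped
```

With this in place a request for domain.com/page.html is answered with a 301 to www.domain.com/page.html, while requests that already use the www host pass straight through to the wrapped application.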


 7:17 pm on Mar 24, 2005 (gmt 0)

I apologize in advance for cross-posting. I noticed that the old thread was closed right after I posted there. Now I'm posting it in what I hope is the correct location.

Yesterday, I cooked up an idea for a web server-based defense against this exploit and posted it to Slashdot ([slashdot.org ]), where it received no comments. I'm not sure if I should take this as a good sign (nobody found a serious flaw) or a bad one (nobody thought it worth discussing).

I'm considering recommending that my organization implement this, but am airing it out in public first to see if someone can find a flaw in it.

Proposed Defensive Solution

Problem Statement
Robots that index pages for search engines may be tricked into believing that content from one site actually belongs to another. The sequence of events looks like this:

  1. The robot visits [badguy.xyz...]
  2. The web server at badguy.xyz responds with an HTTP 302 redirect that informs the robot that the content has been temporarily moved to [victim.xyz...]
  3. The robot dutifully follows the redirect to [victim.xyz...]
  4. The robot receives content from the web server at www.victim.xyz and indexes it. However, because it believes that the content has been moved only temporarily, it indexes it under the www.badguy.xyz domain instead of the www.victim.xyz domain.
  5. Some time later, a user hits the robot's search service (google in most examples) and types in some keywords that appear at [victim.xyz....] The search engine finds the keywords which it has indexed under www.badguy.xyz, so it returns a link to [badguy.xyz....]
  6. The user selects the link and is taken to the [badguy.xyz...] site where badguy has complete control over the content.
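The sequence above can be condensed into a toy model. This is a sketch of the behaviour as described in this problem statement, not of how any real search engine is known to work; the function `index_url` and the dictionary format of `web` are invented for the illustration.

```python
def index_url(start_url, web, max_hops=5):
    """Return (url_credited_in_index, content) for a toy robot.

    `web` maps a URL to either ("content", text) or
    ("redirect", status_code, target_url).
    """
    credited = start_url
    url = start_url
    for _ in range(max_hops):
        entry = web[url]
        if entry[0] == "content":
            return credited, entry[1]
        _, status, target = entry
        if status == 301:
            # Permanent move: the destination becomes the credited URL.
            credited = target
        # For a 302 the credited URL stays with whoever issued the redirect.
        url = target
    raise RuntimeError("too many redirects")
```

In this model a 302 from badguy.xyz to victim.xyz leaves the victim's content credited to badguy.xyz, whereas the same redirect issued as a 301 would credit victim.xyz.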

Proposed Defense
To protect against the scenario above, the administrator of victim.xyz can install a filter on her web server which will issue an HTTP 301 redirect back to itself if it thinks that the request might be the result of a malicious/erroneous HTTP 302 redirect.

Here is how it works:

  1. The robot visits [badguy.xyz...]
  2. Badguy issues its 302 redirect as above
  3. The robot follows the redirect to [victim.xyz...]
  4. The filter at victim.xyz intercepts the request and examines it. The request either contains no referrer header or else the referrer header indicates that the client followed an external link to victim.xyz.
  5. The filter determines if it has seen this particular web client recently. (This check could be as simple as scanning the last few lines of the Web server's access log.)
  6. If the filter has not seen this client (the robot) recently, it issues an HTTP 301 ("moved permanently") redirect pointing to [victim.xyz...]
  7. The robot follows the redirect to [victim.xyz...]
  8. The filter at victim.xyz intercepts the request. This time, it recognizes that it has seen the robot before and lets the request through normally.
  9. The robot receives Web content from the server at victim.xyz and indexes it. Because it reached this site from a 301 (moved permanently) rather than a 302 (moved temporarily) redirect, it knows that the content belongs to victim.xyz rather than badguy.xyz and indexes it under victim.xyz. badguy.xyz never gets associated with the content.

Because a robot might be smart enough to recognize that it is being redirected back to the current page, it would probably be a good idea to obfuscate the HTTP 301 redirect by rewriting the URL in a technically insignificant way. For example, "http://www.victim.xyz/" might be rewritten as "http://www.victim.xyz/?"

Exactly how this filter would be implemented depends on the Web server platform and possibly the requirements of the organization. For example, it could be implemented as an Apache httpd module, an IIS ISAPI filter (or whatever the .Net equivalent is. It's been a few years since I've worked with Microsoft products), or a servlet in a J2EE setup. In some cases, it could even be implemented in a more localized scope using globally included PHP or ASP scripts, although I think I'd steer away from this because of the performance penalty.
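To make the idea concrete, here is a rough sketch of the filter as a Python WSGI wrapper. It is an illustration of the proposal only, with several assumptions: "this particular web client" is identified by IP address plus User-Agent, "recently" means within `window` seconds, and the names `self_redirect_filter`, `own_host` and `clock` are invented for the example. The table of clients is kept in memory and never pruned, which a real deployment would need to address.

```python
import time
from urllib.parse import urlsplit

def self_redirect_filter(app, own_host="www.victim.xyz", window=60.0,
                         clock=time.monotonic):
    """Wrap a WSGI app with the proposed 301-back-to-self filter."""
    last_seen = {}  # (client address, user agent) -> time of last request

    def wrapped(environ, start_response):
        client = (environ.get("REMOTE_ADDR", ""),
                  environ.get("HTTP_USER_AGENT", ""))
        # hostname is None when there is no referrer header at all
        referrer_host = urlsplit(environ.get("HTTP_REFERER", "")).hostname
        now = clock()
        seen_recently = (client in last_seen
                         and now - last_seen[client] <= window)
        last_seen[client] = now
        if referrer_host != own_host and not seen_recently:
            # No referrer or an external one, and an unfamiliar client:
            # answer with a 301 pointing back at the same page, with the
            # URL rewritten in an insignificant way (the "?" separator).
            path = environ.get("PATH_INFO", "") or "/"
            query = environ.get("QUERY_STRING", "")
            location = f"http://{own_host}{path}?{query}"
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)

    return wrapped
```

The first request from an unfamiliar client for "/" gets a 301 to "http://www.victim.xyz/?", and the follow-up request from the same client within the window is served normally.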

I'd greatly appreciate feedback.


 7:38 pm on Mar 24, 2005 (gmt 0)

"The filter determines if it has seen this particular web client recently."

This is where the problem lies. If GBot follows a regular link, and a second later hits it from a 302-redirect, you wouldn't know the difference.

Also, I'd like to thank WebmasterWorld community for making this issue SO PUBLIC, that now any idiot with basic HTML knowledge and a shovel can knock down other people's sites. Hope this doesn't violate TOS ;-) This should've been better off discussed in a Supporters forum.


 7:50 pm on Mar 24, 2005 (gmt 0)

>> This should've been better off discussed in a Supporters forum. <<

Too late. It is now mentioned on almost every SEO and webdesign forum, as well as on /. too.


>> The filter at victim.xyz intercepts the request and examines it. The request either contains no referrer header or else the referrer header indicates that the client followed an external link to victim.xyz. <<

A web browser visits pages by going from one to the next by clicking on links, and may (it might not, as some people surf with referrers off) leave a referrer in your log (the referrer is the URL of the previous page that it was visiting, if that page linked to you). If someone typed the URL in, clicked on a bookmark, or has referrers off then you will not see a referrer. Don't confuse Referrer with User Agent. The User Agent part of the log entry says which browser and OS was used.

Search engines do not crawl the web by going from one page to the next. They spider a page and add all links found on that page to a database. When they finish that page, they ask their own database for the URL of the next page to spider. It might be one on a different site! Multiple bots will be adding to that database and getting their next job from it, so you can have several bots from the same search engine on your site at the same time. Search engine bots leave User Agent information in your log, but they do NOT leave any referrer information, ever.
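A toy version of that crawl loop may make the point clearer. It is a simplified sketch, not a description of any particular search engine; `crawl` and `fetch_links` are hypothetical names, and `fetch_links` stands in for whatever downloads a page and extracts its links.

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit pages in the order they come out of a shared URL queue.

    `fetch_links(url)` must return the links found on `url`. Note that it
    receives only the URL: the page that supplied the link is not passed
    along, which is why no referrer would show up in the target's logs.
    """
    seeds = list(seed_urls)
    frontier = deque(seeds)   # the "database" of URLs waiting to be spidered
    queued = set(seeds)       # every URL ever added, to avoid duplicates
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in queued:
                queued.add(link)
                frontier.append(link)
    return visited
```

Because the next URL always comes from the queue rather than from the page just fetched, consecutive requests can land on entirely different sites, and a server-side filter cannot tell from the request which page led the bot there.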


 7:52 pm on Mar 24, 2005 (gmt 0)

accidentalgeek - that is a great idea with only one flaw.

Googlebot goes to badguy.xyz and makes a list of links, noting the 302 redirect.

Another googlebot later visits the links reported by first googlebot indexing the content for badguy.xyz

It does not follow the 302 redirect it only records it, adding victim.xyz as a temporary location to fetch badguys content.

We went through this inside and out (see 700 post thread) there is no way to stop it

Google needs to fix it.

All we can do is site:mysite (or allinurl:) and keep getting rid of them until google fixes it.

anything at all in site:mysite should be removed.
allinurl: you need to make a judgement call whether this link is harming you or not. There are a lot of 302 links that will show up in allinurl: which are completely harmless good backlinks.


 8:05 pm on Mar 24, 2005 (gmt 0)

adding to that - I really frown on any type of redirect to mysite using a META refresh blank page. I think that might be the key to the hijack problem.

A normal tracking 302 just runs through a script. Someone clicks a link on notbadguy.xyz and the script takes the id# and replaces the url with your url and records the click. Nothing wrong there.

When the script sends the user to a blank page with a META refresh (without a noarchive), that's when Google assigns the victim's content to that blank page.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved