Forum Moderators: Robert Charlton & goodroi
My site was doing very well in the SERPs. For over 2 years it had been on the first page for a competitive term (1.2 million listings). Then during the first week in January my site disappeared and traffic tanked for no obvious reason.
When searching for "site:www.mydomain.com" I noticed that my index page often wasn't listed, or it appeared on about page 3 or 4 of the results, after all my supplemental pages.
A search for "allinurl:mysite.com" often didn't show my index page at all but instead showed somebody else's domain (located in Turkey). When I clicked on this link, my site came up. When I clicked on the cached version of the site, it showed a very old cache of the page. This same site also showed up after all my results when doing a "site:www.mydomain.com"
Using a header checker tool on the site's URL I was able to see it was using a 302 link to my site.
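For anyone who wants to read header checker output themselves, here is a minimal Python sketch. The function name `read_redirect` and the sample response text are made up for illustration; they are not the output of any particular tool. It just pulls the status code and Location header out of a raw response head and says whether it is a temporary redirect aimed at your host.

```python
from email.parser import Parser
from urllib.parse import urlsplit


def read_redirect(raw_head, my_host):
    """Summarise the status line and Location header of a raw HTTP response head."""
    text = raw_head.replace("\r\n", "\n")
    status_line, _, header_text = text.partition("\n")
    status = int(status_line.split()[1])          # "HTTP/1.1 302 Found" -> 302
    headers = Parser().parsestr(header_text, headersonly=True)
    location = headers.get("Location")
    target_host = urlsplit(location).hostname if location else None
    return {
        "status": status,
        "location": location,
        # 302, 303 and 307 are the temporary redirect codes; 301 is permanent.
        "temporary": status in (302, 303, 307),
        "points_at_me": target_host == my_host.lower(),
    }


# Hypothetical header checker output for a hijacking URL.
SAMPLE = (
    "HTTP/1.1 302 Found\r\n"
    "Location: http://www.mydomain.com/\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(read_redirect(SAMPLE, "www.mydomain.com"))
```

A result with both `temporary` and `points_at_me` set is the pattern described in this post.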
Last night, after reading some posts by crobb305 and others, I went to Google.com and clicked on "About Google," then "Webmaster Info," then "I need my site information removed," and then "remove individual pages," where I found instructions on how to remove the page.
(Here's the exact page where I ended up. If mod needs to remove then snip away:) [google.com...]
I then clicked on the "urgent" link.
Then:
1. I signed up for an account with Google and replied back to them from an email they sent me;
2. I added the "noindex" meta tag according to their instructions and uploaded it to my site;
3. Using the instructions to remove a single page from the Google index, I added the hijacker's URL that was pointing to my site. (copy and paste from the result found on "allinurl" search)
This didn't work the first time because I had to remove a space from the url to get it to work.
4. I got a message back saying that the request would be taken care of within 24 hours. The URL that I entered showed in the upper right-hand part of the screen, saying "removal of (hijacker's url) pending."
5. I then removed the "noindex" meta tag from my page and re-uploaded it to my site.
This morning the google account still shows the url removal as "pending" but when I do "site:" and "allinurl" searches the offending URL is gone and my index URL is back.
Conclusions and Speculations:
At some point last September, Google cached the hijack page's URL pointing to my site. In January, Google penalized my site for duplicate content because it found both URLs and compared them. Mine got penalized because it was the only page that really existed. The hijacker's page didn't get penalized because it only existed as a redirect to my site.
Because my index page was now penalized, it dropped almost completely from the SERPs. Some of my supplemental pages showed up for obscure searches, but none showed up for my money terms.
Because I haven't been able to get a response from the hijacker's webmaster, the 302 is still in place but it is buried deep in his site and the last Google cache of the page was sometime in September. Therefore with some luck Google won't re-index it any time soon.
Will my site return to the SERPs? I don't know. Any thoughts?
But, I am happy to say that the Google Removal Tool worked great for me.
But, along the way I really came to realize how big this problem is. Someone above said that most people don't even realize their sites are falling victim. I really have to agree with that statement. I had no idea until I started digging into it.
Second, a related and fairly extensive problem I see is the number of duplicate Supplemental listings for site.com/directory and site.com/directory/. No-slash versions of pages remain Supplemental or URL-only for a long time, and it seems to me that when these exist it depresses the ranking of the page. Removing site.com/directory with the URL tool removes the real page. I did this anyway as an experiment to see how soon they would reappear after being crawled again and how they would rank, but after 48 hours the pages haven't reappeared yet.
I just wanted to report that my site just came back to its original position in nearly all of its money terms in the SERPs for the first time since Dec/Jan.
It's currently #6 out of 2 million results.
If I shout hurray I'll wake up my family so instead I'll just post here. (holding breath and hoping it stays)
Idaho
So basically this blank page except for this url quickly refreshed to my site.
Here is what the source code said:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Designerz Integrated Information Network Routing [mysite.com...]
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="refresh" content="0;URL=http://www.mysite.com/">
</head>
<body>
Designerz Integrated Information Network Routing <br> You will be transferred to 'http://www.mysite.com/' in 1 seconds. <br><br><a href='http://www.mysite.com/'>http://www.mysite.com/</a> [2]<a href="http://widgets.com/?login=routdshb"
target="_top"><img src="http://t1.widgets.com/i.gif"
name="EXim" border="0" height="1" width="1"
alt="eXTReMe Tracker"></img></a>
<script type="text/javascript" language="javascript1.2"><!--
EXs=screen;EXw=EXs.width;navigator.appName!="Netscape"?
EXb=EXs.colorDepth:EXb=EXs.pixelDepth;//-->
</script><script type="text/javascript"><!--
var EXlogin='routdshb' // Login
var EXvsrv='s9' // VServer
navigator.javaEnabled()==1?EXjv="y":EXjv="n";
EXd=document;EXw?"":EXw="na";EXb?"":EXb="na";
EXd.write("<img src=\"http://e0.extreme-dm.com",
"/"+EXvsrv+".g?login="+EXlogin+"&",
"jv="+EXjv+"&j=y&srw="+EXw+"&srb="+EXb+"&",
"l="+escape(EXd.referrer)+"\" height=1 width=1>");//-->
</script><noscript><img height="1" width="1" alt=""
src="http://e0.extreme-dm.com/s9.g?login=routdshb&j=n&jv=n"/>
</noscript>
</body>
When you find the suspected hijacking link in the SERPs, does it have your page title and does it have exactly your page content in the cache?
Then I believe that that page has been hijacked. And as it has a metarefresh of 0, you won't be able to remove it by methods discussed in this thread.
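To check a suspect page for this without loading it in a browser, something like the Python sketch below could work. It uses only the standard library `html.parser`; the name `find_meta_refresh` is my own, and it only recognises the simple `delay;URL=target` form seen in the source posted above.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay_seconds, target_url) for every meta refresh tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # attribute names arrive lower-cased
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        delay_text, _, rest = (attr.get("content") or "").partition(";")
        try:
            delay = float(delay_text.strip())
        except ValueError:
            return  # not a form this sketch understands
        rest = rest.strip()
        url = rest[4:].strip() if rest.lower().startswith("url=") else None
        self.refreshes.append((delay, url))


def find_meta_refresh(html_text):
    finder = MetaRefreshFinder()
    finder.feed(html_text)
    return finder.refreshes


if __name__ == "__main__":
    page = '<meta http-equiv="refresh" content="0;URL=http://www.mysite.com/">'
    for delay, url in find_meta_refresh(page):
        print("refresh after", delay, "seconds to", url)
```

A delay of 0 pointing at your own domain matches the situation described here.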
I am in a similar situation and I am aware of no solution at this time short of asking the offending site to remove the link.
"I am in a similar situation and I am aware of no solution at this time short of asking the offending site to remove the link."
I wonder if that is why I lost my ranking today for some of my top keywords?
Either I read your post too fast or you edited it after I read it. Anyway, if you can't see what's in the cache, I don't know if it's a 'hijack' as defined here. Does sound mighty suspicious, though.
There's potentially more going on with this stuff than hijacking. As discussed elsewhere by thebear, this could be a case of domain poisoning. Theory is that the 'bad guy site' gets associated with your site in G's opinion and then G thinks that you are part of any 'bad neighborhoods' with which the bad guy is associated.
I have a fast meta refresh situation where a porno site is listed with its correct URL and title, but the snippet comes from my page and includes my domain name. There's a fast meta refresh on the cache, and the actual site does not have my domain name in it, so I have no idea how this is happening.
It is important to make a distinction between hijacking -- when another site is showing up with your title and cached content -- and other possible penalties (such as domain poisoning) which don't have the physical evidence of a Google error which the cache provides.
Good luck.
They have removed the offending page and say this page will not return as a result for this query after the next crawl. Pretty quick response time too. 24 hours.
And I was able to nuke the other redirect using the Google removal tool. So now all I can do is wait and hope that my site comes back in the SERPs.
To which e-mail address did you send the request? Did you mention nuking the other links and did they say anything about it? Any other information on this as developments occur will be welcomed.
[google.com...]
so if you have the cache hijack problem this should solve it.
I didn't tell them about nuking the other link but I did try my luck and ask if this cache hijack would cause a duplicate content problem but they didn't respond to that at all.
But anyway, step in the right direction even though this is just treating the symptoms and not the disease itself.
It's important that we do not forget this amid all the talk about 302s. The meta redirect can do exactly the same as the 302 redirect, and some hijackers even combine these two methods for one request (as Japanese originally pointed out). When combined, you can't remove the URLs with the remove tool (URL Console).
AFAIK, if you turn javascript off in your browser you will be able to see the page with the meta refresh (also the Google cache version). Somebody please correct me if i'm wrong here.
Safaridude - congrats way to go
Clause - with the 0 refresh even a non-java browser will get 0 seconds to view anything. I found this cool tool called s*msp*de which has a 'one page at a time' 'code view' browser. It is for testing purposes.
It follows no orders, just reveals the source one page at a time. Very handy.
edited to cloak brand names *=a
http://www.SCAMMING_WEB_SITEXXX.com/cgi-bin/list/out.pl?id=latinagirl&url=http://www.MYSITE.com&frame=yes
When the link is clicked. Here is the source code, real user info has been changed so I could post this.
<HTML>
<HEAD>
<TITLE>Scamming Site</TITLE>
</HEAD>
<FRAMESET ROWS="70,*" FRAMESPACING=0 FRAMEBORDER="0" BORDER="0">
<FRAME NAME="mem_top" SRC="/webmaster_top.html" FRAMESPACING=0 BORDER="0" MARGINHEIGHT="0" MARGINWIDTH="0" NORESIZE SCROLLING="NO">
<FRAME NAME="mem_body" SRC="http://www.mysite.com">
<NOFRAMES>
<P> </P>
<P ALIGN=center>
Thank you for visiting. We recommend using a frames compatible browser,
but you can view this document without frames by clicking
<A HREF="http://www.MYSITE.com">here</A></P>
</NOFRAMES>
</FRAMESET>
</HTML>
If anyone can give me advice as to how to eliminate this link from the Google cache and from this scamming website, please message me or reply in this forum.
Best,
Net_Warrior
AFAIK, if you turn javascript off in your browser you will be able to see the page with the meta refresh (also the Google cache version). Somebody please correct me if i'm wrong here.
Having (perhaps only temporarily :( ) sorted my 302 problems, I am looking at the meta-refresh problem
Do you think it is worth your while starting a new thread on it, so that it does not get buried in this thread? And we all can explore this problem there
- yeah, i've been using it for years. Especially before i learned *nix commands and started messing with other stuff than Windows it was invaluable. There's a built in spider as well ;-)
You can always do a "curl -i [example.com...]" at the command prompt if you run a *nix flavour OS (or have cygwin installed or whatever). Make that an "-I" to see the server headers only.
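If you don't have curl handy, roughly the same thing can be done from Python with the standard library. This is only a sketch: `fetch_headers` and `split_url` are names I made up, and the URL in the usage line is a placeholder. `http.client` does not follow redirects on its own, so a 302 shows up as a 302 instead of being silently followed.

```python
import http.client
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into (scheme, host, port, request_target)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.scheme, parts.hostname, parts.port, target


def fetch_headers(url, method="HEAD", timeout=10):
    """Send one request and return (status, reason, headers) without following redirects.

    method="HEAD" is roughly "curl -I"; method="GET" still discards the body here.
    """
    scheme, host, port, target = split_url(url)
    if scheme == "https":
        conn = http.client.HTTPSConnection(host, port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, target)
        resp = conn.getresponse()
        resp.read()  # drain any body so the connection closes cleanly
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder URL; substitute the link you are suspicious of.
    status, reason, headers = fetch_headers("http://www.example.com/")
    print(status, reason)
    for name, value in headers:
        print(f"{name}: {value}")
```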
I suspect that site's strength on our site name is due to inurl and keyword density factors, not the meta refresh. It's behavior and ranking on our site name is very similar to that of a well-known site which frames their outgoing links to our site. The well-known site doesn't use a meta refresh, but its usage and density of our domain name is very similar to that of the site you pointed out, and they usually rank close together. I don't see evidence that either is hijacking our listings.
cornwall: it's probably worth another thread. meta refresh that's not coupled with a 302 seems to be a different issue, and certainly the solutions that have been discussed here don't apply.
Put an absolute link to your index page http: //www.site.com/index.html from some or all of your internal site pages. I did this on directory level pages because of the "can't find my company by name" problem with the company name in anchor text and I *believe* the unforeseen bonus was clearing out these redirect/refresh index mismatch sites.
Yesterday, I cooked up an idea for a web server-based defense against this exploit and posted it to slashdot([slashdot.org ]) where it received no comments. I'm not sure if I should take this as a good sign (nobody found a serious flaw) or a bad one (nobody thought it worth discussing).
I'm considering recommending that my organization implement this, but am airing it out in public first to see if someone can find a flaw in it.
Proposed Defensive Solution
Problem Statement
Robots that index pages for search engines may be tricked into believing that content from one site actually belongs to another. The sequence of events looks like this:
Proposed Defense
To protect against the scenario above, the administrator of victim.xyz can install a filter on her web server which will issue an HTTP 301 redirect back to itself if it thinks that the request might be the result of a malicious/erroneous HTTP 302 redirect.
Here is how it works:
Because a robot might be smart enough to recognize that it is being redirected back to the current page, it would probably be a good idea to obfuscate the HTTP 301 redirect by rewriting the URL in a technically insignificant way. For example, "http://www.victim.xyz/" might be rewritten as "http://www.victim.xyz/?"
Exactly how this filter would be implemented depends on the Web server platform and possibly the requirements of the organization. For example, it could be implemented as an Apache httpd module, an IIS ISAPI filter (or whatever the .Net equivalent is. It's been a few years since I've worked with Microsoft products), or a servlet in a J2EE setup. In some cases, it could even be implemented in a more localized scope using globally included PHP or ASP scripts, although I think I'd steer away from this because of the performance penalty.
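To make the idea concrete, here is a minimal Python sketch of just the decision step, independent of any web server. It is an illustration under assumptions, not a tested defense: the function name `redirect_target` is hypothetical, and treating a trailing "?" (or "&" when a query string already exists) as the "already redirected" marker is my own addition to keep the filter from redirecting forever.

```python
from urllib.parse import urlsplit


def redirect_target(request_target, referer, own_host):
    """Return the Location for a 301, or None to serve the page normally.

    request_target is the raw path-plus-query from the request line, e.g. "/" or "/?".
    referer is the Referer header value, or None if the client sent none.
    """
    # Assumption: a trailing "?" or "&" means we already bounced this request once.
    if request_target.endswith(("?", "&")):
        return None

    # A referrer from our own host means ordinary internal navigation.
    if referer:
        ref_host = urlsplit(referer).hostname or ""
        if ref_host == own_host.lower():
            return None

    # No referrer, or an external one: redirect back to ourselves with a
    # technically insignificant change to the URL.
    marker = "&" if "?" in request_target else "?"
    return f"http://{own_host}{request_target}{marker}"


if __name__ == "__main__":
    print(redirect_target("/", None, "www.victim.xyz"))
    print(redirect_target("/?", None, "www.victim.xyz"))
```

A server module or script would call this for each request and send a 301 with the returned Location whenever the result is not None.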
I'd greatly appreciate feedback.
This is where the problem lies. If GBot follows a regular link, and a second later hits it from a 302-redirect, you wouldn't know the difference.
Also, I'd like to thank WebmasterWorld community for making this issue SO PUBLIC, that now any idiot with basic HTML knowledge and a shovel can knock down other people's sites. Hope this doesn't violate TOS ;-) This should've been better off discussed in a Supporters forum.
Too late. It is now mentioned on almost every SEO and webdesign forum, as well as on /. too.
>> The filter at victim.xyz intercepts the request and examines it. The request either contains no referrer header or else the referrer header indicates that the client followed an external link to victim.xyz. <<
A web browser visits pages by going from one to the next by clicking on links, and may (it might not, as some people surf with referrers off) leave a referrer in your log (the referrer is the URL of the previous page that it was visiting, if that page linked to you). If someone typed the URL in, clicked on a bookmark, or has referrers off then you will not see a referrer. Don't confuse Referrer with User Agent. The User Agent part of the log entry says which browser and OS was used.
Search engines do not crawl the web going from one page to the next. They spider a page and add all links found on that page into a database. When they finish that page, they ask their own database for the URL of the next page to spider. It might be one on a different site! Multiple bots will be adding to that database, and getting their next job from it, so you can have several bots from the same search engine on your site at the same time. Search engine bots leave User Agent information in your log, but they do NOT leave any referrer information, ever.
Googlebot goes to badguy.xyz and makes a list of links, noting the 302 redirect.
Another Googlebot later visits the links reported by the first Googlebot, indexing the content for badguy.xyz.
It does not follow the 302 redirect; it only records it, adding victim.xyz as a temporary location to fetch badguy's content.
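The sequence described here can be modelled in a few lines of Python. This is a toy model of the theory discussed in this thread, not a description of how Google actually works; the `WEB` dictionary, the URLs in it, and the `crawl` function are all invented for illustration.

```python
from collections import deque

# A tiny pretend web: URL -> (status, payload).
# For 200 the payload is the page; for 302 it is the Location header.
WEB = {
    "http://badguy.xyz/": (200, {"text": "badguy links page",
                                 "links": ["http://badguy.xyz/out"]}),
    "http://badguy.xyz/out": (302, "http://victim.xyz/"),
    "http://victim.xyz/": (200, {"text": "victim home page", "links": []}),
}


def crawl(seed_urls, web):
    """Work through a URL queue; no referrer is ever involved."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    index = {}
    while frontier:
        url = frontier.popleft()
        status, payload = web.get(url, (404, None))
        if status == 302:
            # The redirect is recorded as a temporary location: the content is
            # fetched from the target but filed under the redirecting URL.
            target_status, target_page = web.get(payload, (404, None))
            if target_status == 200:
                index[url] = target_page["text"]
        elif status == 200:
            index[url] = payload["text"]
            for link in payload["links"]:
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return index


if __name__ == "__main__":
    for url, text in crawl(["http://badguy.xyz/", "http://victim.xyz/"], WEB).items():
        print(url, "->", text)
```

In this model the same content ends up indexed under both the victim's URL and the bad guy's redirecting URL, which is the duplicate that people in this thread suspect is causing the trouble.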
We went through this inside and out (see the 700 post thread). There is no way to stop it.
Google needs to fix it.
All we can do is site:mysite (or allinurl:) and keep getting rid of them until Google fixes it.
Anything at all in site:mysite should be removed.
For allinurl: you need to make a judgement call whether this link is harming you or not. There are a lot of 302 links that will show up in allinurl: which are completely harmless, good backlinks.