|Dupe content checker - 302's - Page Jacking - Meta Refreshes|
You make the call.
My site, let's call it www.widget.com, has been in Google for over 5 years, steadily growing year by year to about 85,000 pages including forums and archived articles, with a PageRank of 6 and 8287 backlinks in Google. No spam, no funny stuff, no special SEO techniques, nothing.
Normally the site grows at a tempo of 200 to 500 pages a month indexed by Google and others ... but for about a week now I have noticed that my site has been losing about 5,000 to 10,000 pages a week in the Google index.
At first I simply presumed that this was the unpredictable Google flux, until yesterday, when the main index page of www.widget.com disappeared completely out of the Google index.
The index-page was always in the top-3 position for our main topics, aka keywords.
I tried all the techniques to find my index page, such as allinurl:, site:, direct link, etc., but the index page has simply vanished from the Google index.
As a last resort I took a special chunk of text which can only belong to my index page, "company name own name town postcode" (a sentence of 9 words), and searched for it in Google.
My index page did not show up, but instead 2 other pages from other sites showed up as having this information on their page.
Let's call them:
www.foo1.net and www.foo2.net
Wanting to know what my "company text" was doing on those pages I clicked on:
(with mykeyword being my site's main topic)
The page could not load and the message:
"The page cannot be displayed"
was displayed in my browser window.
Still wanting to know what was going on, I clicked "Cached" on the Google SERPs ... AND YES ... there was my index page as fresh as it could be, updated only yesterday by Google itself (I have a daily date on the page).
Thinking that foo was using a 301 or 302 redirect, I used the "Check Headers Tool" from WebmasterWorld, only to get a code 200 for my index page on this other site.
So foo must be using a meta redirect ... I quickly wrote a little robot in Perl using LWP, adding a little code that would recognize any kind of redirect.
Fetched the page, but again got a code 200 with no redirects at all.
Thinking the site of foo was up again, I tried again to load foo's page with IE, Netscape and Opera but always got:
"The page cannot be displayed"
Tried it a couple of times with the same result: LWP can fetch the page but browsers cannot load any of the pages from foo's site.
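For anyone who wants to repeat this kind of check, here is a minimal Python sketch of a redirect detector that looks at both the HTTP status and any meta refresh tag in the HTML. It is not the Perl/LWP robot described above; the names `MetaRefreshFinder` and `describe_redirect` are made up for illustration, and the status, Location header and HTML are assumed to have been fetched already by whatever client you prefer.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the target URL of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/"
        _delay, _sep, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.targets.append(rest[4:].strip("'\" "))


def describe_redirect(status, location, html):
    """Summarise how a fetched page redirects, if it does at all."""
    if status in (301, 302, 303, 307, 308) and location:
        return f"HTTP {status} -> {location}"
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.targets:
        return f"meta refresh -> {finder.targets[0]}"
    return "no redirect found"
```

A plain header check only ever sees the first branch, which is why a page carrying a meta refresh still reports a 200. A result of "no redirect found" does not prove the page is innocent: a server can send different content to different clients.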
Wanting to know more I typed in Google:
to get a huge load of pages listed, all constructed in the same way, such as:
Also I found some more of my own best ranking pages in this list and after checking the Google index all of those pages from my site have disappeared from the Google index.
None of the pages found using "site:www.foo1.net" can be loaded with a browser, but they can all be fetched with LWP, and all of those pages are cached in their original form in the Google cache under the cache link of foo.
I have sent an email to Google about this and am still waiting for a response.
Do you mean that if some of the sub-pages are being redirected, the rest of the pages will be dropped as well in due course?
Thanks for the lynx and wannabrowser suggestions.
I think my case is different. I think the rogue site is cloaking and not hijacking.
If I click the link from Google to the rogue site I get his product page, but if I use wannabrowser I see a totally different page. There are no redirect lines in the HTML, so he must be cloaking / redirecting in the .htaccess file?
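A rough way to test for this kind of cloaking is to fetch the same URL with two different User-Agent headers and compare what comes back. The Python sketch below assumes the site switches on User-Agent only; the two agent strings, the function names and the 0.5 threshold are illustrative choices, not anything confirmed in this thread.

```python
import difflib
import urllib.request

# Illustrative User-Agent strings; substitute whatever you want to compare.
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
CRAWLER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def fetch_as(url, user_agent, timeout=10):
    """Fetch url while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_cloaked(browser_html, crawler_html, threshold=0.5):
    """True when the two responses are less similar than the threshold."""
    matcher = difflib.SequenceMatcher(None, browser_html, crawler_html)
    return matcher.ratio() < threshold
```

Usage would be along the lines of `looks_cloaked(fetch_as(url, BROWSER_UA), fetch_as(url, CRAWLER_UA))`. A site that cloaks by IP address rather than by User-Agent will pass this test, so a negative result settles nothing.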
The "hidden" page is stuffed with keywords and links to not just my site but other leading sites too.
So is this something I need to worry about? He's not copying complete pages - just a piece of keyword rich content from my site.
BTW, I notice that there is a link on the hidden page to a rather tacky seo site. The seo site, to me, looks like your typical "90 day, 100% money back guarantee - join up now for a free e-book worth £197" dodgy outfit.
Marcello, Frank Rizo: if the problem is becoming serious, and it is, then maybe this sounds a bit much, but how about a few webmasters hit by this new kind of piracy, equipped with strong evidence, just take a flight straight to Mountain View, CA and try to get personal contact with the G people? There must be a way.
I really feel stupid, but I don't understand how they benefit from it (besides killing your rankings). If a page is using this redirect thing, it redirects all regular users to your page and you get all the traffic that is meant for you, doesn't it?
I repeat an earlier post
In my case it was a coding error in installing a well known link prog.
It somehow copied the pages of one of our test sites, in real estate, and cached them as pages of the other site. Our pages disappeared from the SERPS and dropped to PR0.
I am convinced that it was not intentional, as the site owner manually removed all the offending pages.
However I was somewhat disappointed that G took no interest in the matter.
Sorry I had to leave this thread this weekend. I had pressing business to attend.
First of all, please no more sticky mails unless you are a mod or senior. I will not answer any more. I am not going to give the URL of the offending sites. As stated before, I need someone with a bit more knowledge than I to take a look at the offending sites to determine if this is a click-through scam or not.
Secondly, and I find this odd, but it might just be the fact that indexing in Google has gotten so slow: the offending site that originally had the meta refresh now has a 302 redirect, yet my site's title remains listed in the SERPs at the same position as it was before, only the link goes to the offending site's home page via a 302 redirect. I was able to remove my link which got rid of the meta refresh, but the 302 remains. The site in question allowed me to remove my link via a login and password and that got rid of the meta refresh, but there has been no change on the 302 as of yet. This link still occupies the position MY SITE used to occupy.
Thirdly, I did not intend this thread to deviate into the legal ramifications of what is happening. I really don't believe that Google intentionally is doing this. I think it is just a bug that needs to be fixed. Quickly.
Fourth, I would just like to get my site back in the SERPs again. As stated previously, the only way I rank higher than the offending site is if I search for my title. I do come up #1 for that search, the offending site comes up #2. No other searches for ANY of my key phrases list my home page. I do get a return on some sub pages of mine in the 500's.
Now to answer some questions posted here...
|also, was wondering if periodically varying the wording of the index page's title might help. |
No, it doesn't help. Varying the title changes the title shown for the offending site also.
|I wish I understood these issues better but unfortunately I don't. The hijacker duplicates a page and then inserts a meta refresh tag to redirect back to the original page, right? |
No. The hijacker doesn't need to do anything except the meta refresh. Some others here have answered this better than I, but the page that gets hijacked does not need to be duplicated at all. It seems the function of the meta refresh gives credit to the site that has the meta refresh. There is no copy of the page on the offending site, just a meta refresh.
If site A has a meta refresh to site B, Site A gets credit for all the backlinks and PR that would normally go to site B. It's that simple. Why, however, PR does not seem to make a difference as to why site A is listed first, I do not know. This is just what happened to me.
So how do you check if your site is hijacked?
Search for your site. If a link in the SERPs that shows your page title (you can check by running your mouse over the returns) goes to another site, you might have been hijacked.
If you click on that link and it goes to your site, you might have been hijacked.
Copy the link and run it through an HTML checker. I am not going to recommend which checker to use, but use one which shows the html of the page you are checking. If the page shows a meta refresh to your site, it has been hijacked.
If you try checking with a header check, it will seem fine because you will just get a 200 page found. That tells you nothing.
If you do a link:siteA.com/metarefreshpage.html, it will show the EXACT same backlinks as your page, even though the only page that exists on the offending site is the meta refresh page.
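The first of these checks (your title showing up on somebody else's URL) is easy to script once you have copied the results out of the SERPs by hand. A minimal Python sketch follows; the function names are hypothetical and no search engine API is assumed, the input is just a list of (title, URL) pairs you collected yourself.

```python
from urllib.parse import urlparse


def normalize_host(url):
    """Lower-cased host name of url, with a leading 'www.' dropped."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def suspect_results(results, my_title, my_url):
    """Return the (title, url) pairs that carry my title on a foreign host."""
    mine = normalize_host(my_url)
    wanted = my_title.strip().lower()
    return [
        (title, url)
        for title, url in results
        if title.strip().lower() == wanted and normalize_host(url) != mine
    ]
```

Anything this returns is only a candidate. You still need to look at the HTML of that URL, as described above, to see whether it carries a meta refresh to your page.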
I have actually given thought to relisting my site back on the offending site again. At least I may increase my traffic because the end result IS my site. But I feel in principle, I need to stick to my guns on this. A meta refresh to my site and any other sites that may be hijacked is producing SERPs which are not accurate, even though the end result is the same. I wish we could roll back the clock to when meta refreshes were considered bad and sites were penalized for them. Obviously something changed in the algo which is allowing this kind of thing. As stated before, these types of threads seem to have started on this forum about the time of the Florida fiasco or soon after, and that may have been the beginning of the problem.
But the real problem is how easy it is to do this, and how rapidly it could spread if not fixed. As I have stated previously, I have 2 other sites that have the same type of problem, however, my pages are still ranked HIGHER than the offending sites' pages. BUT these offending meta refreshes are getting credit for all my back links and PR....
I did not start this thread. Nor did I add the topic.
And yes, maybe part of me thinks that by spreading the word, we might get some action on Google's part to fix it.
My intention was not to spread a way to boot your competitors out of Google, but rather to bring this to the forefront.
In some previous posts, I linked to many threads (and there are many more) of this problem that has dated back to late last year and the beginning of this year. How long will it take? Seems so far the issue has been ignored. And if it happened to your site, I think you would want to bring it the attention of all also.
Sure, I could experiment and try to get competitors "kicked out of Google," but I haven't, nor will I. I just want the problem fixed so we can all go on our merry way.
"Google does not owe us a living"
There's a point I've been wanting to make about that sentiment for a long time.
Should you rely on Google for your living? Of course not, but that is common sense not morality.
But does Google owe us a living? Not individually but jointly, IMHO, yes.
Why doesn't anyone ever point out the fact that if it hadn't been for webmasters *other* than Google, there would be NO Google? Google became a giant by indexing and presenting content *strictly* created by others! Google does not create content at all, it organizes other people's content. The only reason no one complains about Google's "fair use" of other sites' content in the form of SERP snippets is because Google is *sending them commerce*.
Just a thought.
|Google *does* at least have an obligation that webmasters are fairly credited and ranked for the content they create; since it is those very same webmasters and their content that are responsible for Google's very existence. |
Right on, Androidtech.
Imagine if an art museum were to shuffle up the name tags of the artists when displaying works by various artists. You bet if Bubba's Original Tattoo Design were mistakenly labeled as Salvador Dali's work, he would be real ticked off at that museum. Now imagine how he would respond if that museum (Google) refused to correct the mix-up (302 redirects) for almost a year. Could a case be made that if Google is knowingly allowing such mislabeling (possible trademark infringement?) to occur, its refusal/failure to correct it causes material harm to the original creator? Yes? No? Maybe?
My site is unaffected by this issue, so perhaps I am less emotionally involved than many of the posters, but I had a few thoughts I wanted to interject:
First and foremost, I don't think that we can definitively say that the redirecting sites are of higher or lower PR than the redirectees. It's been some months since green was updated, so the PR you see was accurate 3-4 months ago, certainly not so today.
Secondly, I don't think it's accurate to say that the redirector is "stealing" PR. Many have posted theories that the redirecting *site* benefits from the PR, which is simply not true: the redirecting *page* (which does not link to the redirecting site, otherwise the content would not be identical) benefits from the PR, and then the PR is presumably lost (or passed) via the meta refresh or 301 to the redirectee. Perhaps the redirectees are deprived of PageRank. Any attempt by the redirector to misappropriate this PR will by nature ensure that the content is not duplicate and eliminate the association between the two sites, so in a technical sense it is not theft but deprivation. Given the inability to do anything with the stolen sites I don't see anything patently immoral with this, though I have not seen any of the pages or sites mentioned, nor the redirectors implementing this.
IANAL, but there seems to be quite a bit of talk of legal action and DMCA violations and such. The DMCA does not seem to be applicable in this case - the pages are not using your content inappropriately (or quite frankly at all); rather they are saying "get this content from the URI...". At most perhaps one could argue the circumvention of copyright protection ("...or otherwise to avoid, bypass, remove, deactivate, or impair a technological measure, without the authority of the copyright owner"). One would be hard pressed to demonstrate 200s as a technological measure of webpage distribution that protected copyright, and 301s as a device to circumvent this measure with little other legitimate purpose, but at least this argument is plausible. In short I don't believe Google or the redirectors are legally in the wrong, particularly if the redirection is inadvertently achieving this effect. Rather I think it is an issue of ethical and moral ramifications.
I think the issue is a detrimental side effect of Google's algorithm. Something about the way they measure and combine duplicate content has affected these sites in a negative manner. If I were to guess, I think it has stayed in place over a year because it affects other sites in a positive and working manner and has only been noticeably exploited as of late. Given that the issue has been raised to the forefront and is being exploited, I think it will be fixed.
Points taken, however a couple of points of my own.
First of all, the sites that now have a link in google have been REPLACED by my site. They are in the same positions that my site enjoyed before the meta refreshes were put in place. My site has disappeared in the SERPS except for when doing a search for the title, in which my site is #1 and the offending site is #2. So we are not talking about just the throes of page rank, but the end result of what the meta refreshes have caused.
Next is the fact that even after the meta refreshes have been removed, at least in my case, the links seem to stay in place and a 302 redirect now goes to the home page of the offending site. This most assuredly is a problem. The user does not even get to the site they wish to go to! This seems to be in direct conflict with what Google is all about: relevant SERPs.
I could see maybe my site going down a notch or two, but to disappear altogether is a problem wouldn't you think?
Remember, the site is gone, the meta refreshes and the redirects remain. THAT is the problem.
As for DMCA, you are probably right. I don't know if there is a recourse, but I am going to try my darnedest to make the effort. As far as I can tell after reading the DMCA pages and such, posting here is my only recourse. Emailing Google seems to be falling on deaf ears, and quite frankly, I don't know if my emails are even being read. At least in the past, I would get some sort of acknowledgement that my email had been received. But now there is nothing. It's a big black hole.
So here I am posting the problems and hoping someone at the G will read this and take notice. So far as I can tell, I have no other recourse.
Sure I could spam the site back, try to do funky stuff to my site to get back at these offending sites, but that is not the way I do things. I believe this will ultimately be fixed, it's just going to take some time.
And you're right, the offending site may not even know they are doing this, but that still does not make it right, nor should something as simple as adding a meta refresh to a blank page totally take a site out of the SERPs. THAT is what I am talking about.
I guess maybe I am emotional. But come on Google, let's get it fixed!
FWIW, I think mostly this *is* intentional. All that is needed is a click tracking script. Maybe therein lies the reason that the engines can't fix the seemingly easily fixed problem of 301 and 302 redirects. Maybe the fix will cause too much collateral damage to their own click through ad programs. I hate to be a conspiracist, but before the engines became so focused on marketing and selling words and clicks even to the extent of using vacant domains to sell clicks... this was not an issue.
I must post something that a sticky mail brought up. And I must post this to be fair.
The 302 redirects I am talking about have only been in place for a couple of weeks. When I found the meta refreshes, I deleted the sites which now created the 302s.
I am going to have to wait to see if the 302s disappear. What I am worried about is whether or not my site will disappear with them.
|Google does not owe us a living |
No, but they do have a responsibility to not deliver stolen goods, which is what they are doing in this case. Those hijacking sites wouldn't be found by users, (essentially wouldn't exist), without the infrastructure that G and the other SE's created and maintain.
How anyone can say that this nonsense isn't criminal, (with Google and the others merely innocent third-parties), is beyond me.
|getting the hosting company to disable the offending site |
... which is interesting. It demonstrates how "quality control" on the web could be introduced more locally (and more manageably) by hosting providers. And then Google and other reputable search engines could move to index sites only from those hosts who are able to show that they maintain a standard.
I'm not saying this is a good or a bad thing, but that it is a mechanism. Some might see it as a sinister move towards web censorship.
Googlebot only seems to treat a Meta Refresh like a 302 when a page redirects to an external page.
Eg: site1.com/bla.html => site2.com/blabla.html
Has anyone seen an internal Meta Refresh acting like a 302?
Eg: site1.com/bla.html => site1.com/blabla.html
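If you are collecting examples to answer this, it helps to label each meta refresh as internal or external in a consistent way. A small Python sketch, with a hypothetical function name, that compares host names after resolving relative targets:

```python
from urllib.parse import urljoin, urlparse


def refresh_scope(page_url, refresh_target):
    """Classify a meta refresh as 'internal' or 'external' by host name."""
    # Relative targets such as "blabla.html" resolve against the page URL.
    absolute = urljoin(page_url, refresh_target)
    same_host = urlparse(page_url).hostname == urlparse(absolute).hostname
    return "internal" if same_host else "external"
```

Note that this treats site1.com and www.site1.com as different hosts. Whether Googlebot does the same is not something this thread establishes.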
First - we took a chainsaw to this thread as it had many comments that had little to do with the subject at hand.
Second - there is so much more going on here than meets the eye. There is an issue here, but throwing out spam reports doesn't help anyone.
I tell everyone to keep a very skeptical eye on the so-called "facts" here - they are not at all what they seem to be.
... trying to get to the bottom of it. In the mean time, leave the side topic stuff, the post count padding, and the pure off topic messages for foo.
If you have created and own copyright to a page that Google is displaying in its cache but is attributing it to another website, you should file a DMCA with Google.
Google has the legal responsibility to take action on that DMCA claim. When you have filed the DMCA with Google, they will contact the other website owner. If the other website owner claims that the content belongs to them and not you, Google will have little choice but to leave the page in its cache.
If the other website owner does not respond, which I think may well happen in the cases mentioned above, Google will remove these pages from their cache.
After reading this thread my curiosity was piqued, and in fact one company I am (or was) exchanging links with shows up directly under my site in a search for my company name in G (my site in position 1, their site in position 2) with the identical page title and meta. When you click on the link it goes to the site, but if you click on Cached it shows my site's page, jacked to their IP address. I am very familiar with this company and I am not sure at this point what to do.
Sticky me if you have any advice or if you need cached and dated examples for any class actions.
Excuse me for wading into an interesting topic that is a little over my head:
If someone wanted to get Google's attention on this could they hijack Google's homepage using the same tactic?
Just a thought.
What, you mean a few thousand webmasters put up pages with
<meta http-equiv="refresh" content="0; url=http://www.********.com/etc......"> in them? That wouldn't help G get it sorted any quicker and could even slow down the cure. I think they are aware, and the absence of posts from certain representatives on this thread, when they have posted on other subjects in other threads, is the reason why I think this.
In message 45 I reported that I'm using a php redirect script for external links, like so:
$location = 'ht*p://www.somesite.com/';
header('Location: ' . $location);
... though in the script I have maybe 40 links done that way. In the interests of being in a position to act responsibly I would appreciate knowing (from anyone who is technically "au fait" with these things in the light of this thread) whether I am likely doing any damage to the SERPS or SE indexing of somesite.com, or indeed to my page, and if so, what I should do - ideally I would like to retain the php redirect in some form or another.
I understand that Meta Refreshes are treated as 302s which gives the originating URL credit for the destination's content.
This clearly is a bad thing if done maliciously.
But it seems the majority of occurrences that I've seen, and what I've used it for, is tracking click-thrus. Tracking has to be done on the host's website prior to sending the visitor to the destination URL.
website abc.com links to xyz.com/index.htm
The only way to track click-thrus, is by linking through a tracking script, i.e:
The visitor clicks on this link calling the link.php on my site, which increments the number of clicks to xyz.com.
In this legitimate application, if we do away with Refreshes, how can click-thrus be tracked?
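To make the legitimate use concrete, here is a minimal Python sketch of what a tracking script of this kind does: count the click, then hand back a redirect. It is not the link.php mentioned above; `track_click` and the in-memory counter are assumptions for illustration only.

```python
from collections import Counter

# In-memory click counts; a real script would persist these somewhere.
click_counts = Counter()


def track_click(target_url):
    """Record one click for target_url and return the redirect to send."""
    click_counts[target_url] += 1
    # A temporary (302) redirect, which is what scripts like this usually send.
    return 302, [("Location", target_url)]
```

The counting happens entirely on the linking site, before the visitor is sent on. The redirect is only the hand-off, and it is the part that the search engines appear to be mishandling.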
> The only way to track click-thrus, is by linking through a tracking script, i.e:
Here's a thread about one way to do it without using a meta-refresh or redirect: [webmasterworld.com...]
Maybe this is why Yahoo! is a bit more finicky with redirects...
I highly doubt that this is accidental...
>"Whoops I didn't know over the last four months that Googlebot indexed over 937,129 pages...
Seems as if I've heard a similar excuse from another SEO programmer...
If I find any of my content falsely represented and/or copied without my consent, you can bet I'll be generating some Spam reports and DMCA reports.
|In this legitimate application... |
Thanks, and I thought so...
I got mine from WW (more or less):
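switch ($_GET['url']) {
    case 'description1': // ID 1
        $location = 'ht*p://www.site1.com/';
        break; // without this, execution falls through to the next case
    case 'description2': // ID 2
        $location = 'ht*p://www.site2.com/';
        break;
}
header('Location: ' . $location); // PHP sends a 302 by default with Location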
Links are like: ht*p://www.mysite.com/go/?url=description1 which returns a 302.
The issue with the one quoted from the other thread is that it requires JS enabled.
Anyway, as I posted earlier, I'm finding sometimes that where the linked-to page should be in SERPS, my script (go/index.php) replaces it. This is a problem either I or the relevant search engine should be addressing. Who and when though?
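For comparison, the same kind of go/?url=... script can be sketched as a tiny Python WSGI app. This is an illustration under assumptions, not a port of the script above: `go_app`, the `REDIRECTS` table, the plain http:// URLs and the `permanent` switch are all made up here, and unknown keys get a 404 instead of a redirect.

```python
from urllib.parse import parse_qs

# Hypothetical lookup table: link ID -> destination URL.
REDIRECTS = {
    "description1": "http://www.site1.com/",
    "description2": "http://www.site2.com/",
}


def go_app(environ, start_response, permanent=False):
    """WSGI app that redirects /go/?url=<id> to the mapped destination."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    key = query.get("url", [""])[0]
    target = REDIRECTS.get(key)
    if target is None:
        # Never redirect to something that is not in the table.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown link\n"]
    status = "301 Moved Permanently" if permanent else "302 Found"
    start_response(status, [("Location", target), ("Content-Type", "text/plain")])
    return [b""]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, go_app).serve_forever()` serves it on port 8000. Whether a 301 or a 302 is the safer choice for the linked-to site is exactly the open question in this thread, so the switch is there to experiment with, not as a recommendation.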
<<If it's a spidering problem, it likely means the improperly redirected links will need to be respidered, then reindexed. An indexing problem COULD be fixed more quickly if indexing is done independently from spidering I suppose. Google may have a fix in place now and we may be simply waiting for respidering/reindexing corrections to percolate into the publicly available SERPs.>>
I wonder about this. I had been emailing Google back and forth. At first they kept giving me answers such as "your page probably disappeared due to natural fluctuations", your PR probably disappeared as a result of "natural fluctuations". Finally, when my site's title and snippet came up with the other site's URL when doing a search on "www.[my-site].com" I sent one last email to Google. Their response was that "since the other site has removed the link, the problem may resolve itself during future crawls." They also told me they cannot make manual changes for individual sites.
As for my site, the latest is that when you do a search on www.[my-site].com, my snippet and title are gone, and a link with no title and snippet to the other site comes up in the results.
I still have no backlinks and am still coming up as PR0.
Although, if you click on "Find web pages that contain the term "www.[my-site].com" the SERPS list the results for my page, including my index page that was totally gone before.
So, yeah, maybe it will resolve itself in the next crawl, and maybe somebody else will use a meta-refresh to my site again, and I'll be back in the same boat.
<<Each of you needs to report your experience with this problem to Google. This page suggests reporting it to email@example.com. They need solid examples of the problem to analyze so they can implement a fix. Reporting it or complaining about it at WebmasterWorld does little to improve the situation, although it is an opportunity to vent and commiserate with others experiencing the problem... ;)>>
Am I the only person getting responses from Google, still? Anyway, I mentioned this before, but I think it got lost amongst all the other posts.
Originally I had been sending emails to firstname.lastname@example.org and the responses were coming from email@example.com. I was getting nowhere, so I asked them to please, please send my comments to a level 2 or 3 tech (thank you to the person who sent me a sticky mail and gave me that idea). After that, I started getting responses from firstname.lastname@example.org.
I also sent them links to these threads, and I can tell you that at least Google was showing up in my stats, looking at my site and the cached pages from that other site's link to me, after emailing email@example.com, as opposed to getting absolutely nowhere with firstname.lastname@example.org.
Which isn't to imply I really got anywhere with that email address, but at least they eventually admitted they knew what I was talking about.
You can change your header statement to:
header("Location: $URL",true,301);
and then it will send the 301 code instead of the 302 code.
|header("Location: $URL",true,301); |
Thanks for the suggestion, but it doesn't work.
If (as suggested higher up) the issue is a topical one with search engines - in other words it will get fixed - and anyway the later part of this thread seems to suggest "hush - things are not what they seem" (whatever that means), I feel inclined to leave it be and drop the subject.
There has to be a reason and a reward for doing something like this kind of hijacking, same as there is with browser hijacking. It's deliberate, and it has to be more than just a technical exercise. Here's some food for thought