|Dupe content checker - 302s - Page Jacking - Meta Refreshes|
You make the call.
My site, let's call it www.widget.com, has been in Google for over 5 years, growing steadily year by year to about 85,000 pages, including forums and archived articles, with a PageRank of 6 and 8,287 backlinks in Google. No spam, no funny stuff, no special SEO techniques, nothing.
Normally the site grows at a rate of 200 to 500 pages a month indexed by Google and others... but for about a week now I have noticed that my site is losing about 5,000 to 10,000 pages a week from the Google index.
At first I simply assumed this was the usual unpredictable Google flux, until yesterday, when the main index page of www.widget.com disappeared completely from the Google index.
The index page was always in the top 3 positions for our main topics (i.e. keywords).
I tried every technique to find my index page, such as allinurl:, site:, a direct link, etc., but the index page has simply vanished from the Google index.
As a last resort I took a distinctive chunk of text that can only belong to my index page, "company name own name town postcode" (a 9-word sentence), and searched for it in Google.
My index page did not show up; instead, 2 pages from other sites showed up as having this text on their pages.
Let's call them:
www.foo1.net and www.foo2.net
Wanting to know what my "company text" was doing on those pages, I clicked on:
(with mykeyword being my site's main topic)
The page could not load and the message:
"The page cannot be displayed"
was displayed in my browser window
Still wanting to know what was going on, I clicked "Cached" in the Google SERPs... AND YES... there was my index page, as fresh as could be, updated by Google only yesterday (I have a daily date on the page).
Thinking that foo was using a 301 or 302 redirect, I used the "Check Headers" tool from WebmasterWorld, only to get a 200 status code for my index page on this other site.
So foo must be using a meta redirect... I quickly wrote a little robot in Perl using LWP, with a bit of code that would recognize any kind of redirect.
I fetched the page, but again got a 200 status code with no redirects at all.
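A header-only check like the one above can be sketched in Python instead of Perl/LWP. This is a minimal sketch, not the poster's robot: it assumes you already have the status code and response headers from whatever fetcher you use, and `classify_response` is a hypothetical helper name. It illustrates why a header checker reports a plain 200 for a meta-refresh page: the refresh sits in the HTML body, not in the HTTP headers.

```python
# Hypothetical helper: classify an HTTP response by status code and headers only.
# It deliberately ignores the HTML body, so a meta refresh is invisible here.

HTTP_REDIRECTS = {
    301: "permanent redirect (Moved Permanently)",
    302: "temporary redirect (Found / Moved Temporarily)",
    303: "See Other",
    307: "temporary redirect (Temporary Redirect)",
}


def classify_response(status: int, headers: dict[str, str]) -> str:
    """Describe what a header-only checker would conclude about a response."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status in HTTP_REDIRECTS:
        target = lowered.get("location", "<no Location header>")
        return f"{status} {HTTP_REDIRECTS[status]} -> {target}"
    if status == 200:
        # A 200 only says the page was served; a <meta http-equiv="refresh">
        # inside the body can still send browsers elsewhere.
        return "200 OK: no HTTP redirect (check the HTML body for a meta refresh)"
    return f"{status}: not a redirect"


if __name__ == "__main__":
    print(classify_response(302, {"Location": "http://www.widget.com/"}))
    print(classify_response(200, {"Content-Type": "text/html"}))
```

To catch meta refreshes as well, the body has to be fetched and parsed separately.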
Thinking foo's site was back up, I tried again to load that page and foo's main page with IE, Netscape and Opera, but always got:
"The page cannot be displayed"
I tried it a couple of times with the same result: LWP can fetch the page, but browsers cannot load any of the pages on foo's site.
Wanting to know more, I typed into Google:
and got a huge list of pages, all constructed in the same way, such as:
I also found several more of my own best-ranking pages in this list, and after checking, all of those pages from my site have disappeared from the Google index.
None of the pages found using "site:www.foo1.com" can be loaded in a browser, but they can all be fetched with LWP, and all of them are cached in their original form in the Google cache under foo's cache link.
I have sent an email to Google about this and am still waiting for a response.
They should simply fix their software. They haven't done it so far, and they should handle redirects on foreign domains as simple links.
Keep this thread open until GG comes here and tells us what Google will do about it.
OK, I've posted in a few other similar threads. Now I have more questions.
I lost my index page after a site used a 302 redirect to me. I sent several emails to Google and got just a bunch of lame responses.
Today I did a search on my URL, and my index page shows my title, my snippet and the other site's URL. Previously the other site only showed up like that for random keyword searches. Also, now if you click on:
Find web pages that link to www.[my-site].com
Instead of my page it shows:
link:pdVL_A9zumsJ:www.[other-site.com] with no backlinks. Mine are gone, and my index page is now a PR0 in the Google Toolbar.
So, I checked further using a web sniffer somebody else mentioned, and found that the site is also using meta refreshes. I'm not savvy enough to understand all of the information it gives, but there is another site that uses a 302 redirect to a different page within my site, so I was trying to see how each of them was doing this, as the other site hasn't hijacked that page... yet.
The site that hijacked me is using 302 redirects, and the web sniffer says "302 Found". They used a meta refresh, and they removed my link from their site after I asked them to, but if you click on the link in the Google SERPs it takes you to a page on their site with a MySQL error message across the top and content below, which then meta-refreshes to a 404 error page.
Will this disappear from Google or not, given that the link still goes to their site before the 404 error?
Now, on the other site that hasn't hijacked me yet, the web sniffer says "302 Moved Temporarily". Is there a difference between "302 Moved Temporarily" and "302 Found"?
That site doesn't use meta refresh, but it does say:
Is that the same as a meta refresh?
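On the "302 Found" versus "302 Moved Temporarily" question: they are the same status code. HTTP/1.0 (RFC 1945) used the reason phrase "Moved Temporarily", HTTP/1.1 (RFC 2616) renamed it "Found", and clients are expected to act on the number, not the text. A small Python sketch (the `parse_status_line` helper is illustrative, not part of any tool mentioned in this thread) shows the two status lines differ only in the label:

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split an HTTP status line into (version, status code, reason phrase)."""
    # Format: HTTP-Version SP Status-Code SP Reason-Phrase
    version, code, *reason = line.strip().split(" ", 2)
    return version, int(code), reason[0] if reason else ""


old = parse_status_line("HTTP/1.0 302 Moved Temporarily")
new = parse_status_line("HTTP/1.1 302 Found")

# Only the numeric code matters to a client; the reason phrase is just a label.
print(old[1] == new[1])     # True
print(old[2], "|", new[2])  # Moved Temporarily | Found
```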
I appreciate any insight you can give to me!
This is a public invitation to those who have had pages hijacked by a redirect: send me a private message containing the following two pieces of information:
1) the URL of the hijacked page
2) the URL of the page doing the redirect
The reason for this request is to gather enough information to compare against a case I am familiar with. I promise not to bug respondents for more information than they first send me.
The particular line of reasoning I am following is a programming bug arising from a misinterpretation of the RFC for HTTP (I forget the exact number right now), combined with the offending site sending a slightly misleading header. I say "misleading" because the header is compliant with the relevant RFC, although it is inconsistent with the expected meaning.
If this behaviour can be consistently observed across unrelated pairs of URLs as requested above, then a good case can be made that it is a programming bug; that is why the request is being made to readers at large.
At the moment I can pretty well figure out what the bug is, but I can't confirm it without additional observation points.
jdMorgan, that's an interesting idea. I'm making a note of it and will try the 301 if I run into this Google bug with any of my sites. Thanks.
"handle redirects on foreign domains as simple links"
This is the core of the Google problem!
(See message 34 from jdMorgan.)
302 redirects (Moved Temporarily), which also cover the meta refresh, should only be accepted within the "same domain", as I can't see any reason for a page to be "temporarily moved" to another domain,
and/or Google should treat a meta refresh as a 301 and not as a 302 (but that still leaves the door open for hijacking by 302 redirects, so it only solves half of the problem).
Can you imagine the possibilities of using 302s in the phone system, or even worse in the IP/DNS area? You want to phone your friend at number 1234 and, due to 302 hijacking, you reach the local classified-ads newspaper at number 4321.
A 301 redirect (Moved Permanently) to another domain is acceptable, as it means: this page "is no longer valid" and has been replaced by the page it redirects to. The spiders therefore drop the redirecting page and replace it with the redirect target.
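The policy proposed above (same-domain 302s accepted, cross-domain 302s handled as plain links, meta refresh treated as a 301) can be sketched as a decision function. This only illustrates the poster's proposal; it is not how Google or any other engine is known to behave. `url_to_index`, the `Redirect` enum, and the exact-hostname comparison are all assumptions made for the example.

```python
from enum import Enum
from urllib.parse import urlsplit


class Redirect(Enum):
    PERMANENT = 301
    TEMPORARY = 302
    META_REFRESH = "meta"


def host(url: str) -> str:
    # Assumption: "same domain" means an identical hostname; a real engine
    # would need a smarter notion of what counts as the same site.
    return (urlsplit(url).hostname or "").lower()


def url_to_index(source: str, target: str, kind: Redirect) -> tuple[str, str]:
    """Return (URL the target's content is listed under, reason) per the proposal."""
    if kind is Redirect.PERMANENT:
        # 301: the old page is gone, so the target replaces it.
        return target, "301: target replaces source"
    if kind is Redirect.META_REFRESH:
        # Proposal: treat a meta refresh like a 301.
        return target, "meta refresh treated as 301"
    if host(source) == host(target):
        # 302 within one site: keep the source URL, as 302 semantics suggest.
        return source, "same-site 302: keep source URL"
    # Cross-domain 302: handled as a plain link, so the content keeps its own URL.
    return target, "cross-domain 302 treated as a plain link"


print(url_to_index("http://www.foo1.net/x", "http://www.widget.com/", Redirect.TEMPORARY))
```

Under this rule a third-party 302 can never claim the target's content, which is the point the post is making.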
|There is no need for a contract to exist (between two parties for one to sue the other) |
|If some site were allowed to sue Google, then by the same logic I could sue any site that puts up a link to my site and then removes it for whatever reason |
This is slightly off-topic so I shall be brief.
It is in my nature to choose my words carefully - I did not specify the grounds on which Google might be sued, I merely sank the theory that a contract is required between two parties for one to sue the other.
In the examples I have checked, Google's cache attributes the content to the wrong site, so it would be very easy to make a case against Google under copyright law. Such a case would have to be made jointly against Google and the main perps. Both could be cited for breaches of copyright law, and the main perps could probably also be cited for other matters.
The main case against Google would probably have to rest on negligence on their part in not fixing the problem after it was brought to light.
It's also worth noting that if evidence of collusion were brought to light, in the UK at least, this might be classed as conspiracy, and we could then enter the realm of criminal law, with a maximum jail term of 15 years I think - but that's UK law.
"Another person in this post already asked this, but didn't get a response. I would also like to know...
What is the easiest way to find out if your site is a victim of this type of activity?
My company's ecommerce site recently lost a lot of its rankings on Google, and I would like to find out if this could be the cause. "
- - -
That was me who asked, and I will try to answer my own question, even if I don't know how.
The easiest way I can see is to copy small snippets of text from pages you think are hijacked and use them as Google exact-phrase searches. If somebody else has copied your content, it should show up.
IF, in addition, you cannot load those pages, as described in the earliest posts in this thread, then THAT is a second indication.
The rest (the redirect details, etc.) leaves me almost completely confused.
Sorry I took so long to reply.
There is a program that will find every copy of your text.
Not sure if it is appropriate to post it here, but a search for "website plagiarism search" does the trick.
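The snippet check described above can be partly automated. This Python sketch only builds the quoted queries; it does not call any search API, so you still paste them into Google yourself. `exact_phrase_queries` is a hypothetical helper, and the 9-word default simply mirrors the 9-word sentence used at the start of this thread. Distinctive text (company name, address lines) works better than generic sentences.

```python
import re


def exact_phrase_queries(text: str, words_per_snippet: int = 9, limit: int = 5) -> list[str]:
    """Cut page text into word runs and wrap each in quotes for an exact-phrase search."""
    words = re.findall(r"\S+", text)
    queries = []
    # Step through the text in non-overlapping runs of `words_per_snippet` words.
    for start in range(0, len(words) - words_per_snippet + 1, words_per_snippet):
        snippet = " ".join(words[start:start + words_per_snippet])
        queries.append(f'"{snippet}"')
        if len(queries) == limit:
            break
    return queries


print(exact_phrase_queries("Widgets are the new gudgeon clips.", words_per_snippet=6))
# ['"Widgets are the new gudgeon clips."']
```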
|jdMorgan, that's an interesting idea. I'm making a note of it and will try the 301 if I run into this Google bug with any of my sites. Thanks. |
Macro, using a 301 does not work. Google simply follows the 301 and uses that information for the meta refresh page instead (see post #41).
|The main case against Google would probably have to rest on negligence on their part in not fixing the problem after it was brought to light. |
I'm not a lawyer but I think a case of negligence would be hard to bring against Google without the existence of a contract of some sort - or at least a reasonable expectation of something by one party from another, or a duty of care or some such thing - and in this case there isn't much.
|I think a case of negligence would be hard to bring against Google without the existence of a contract of some sort |
Get a clever lawyer and he'll find a case in anything. It is obviously not the case that you need to have a contract with someone before bringing suit. That would bring criminal justice to a complete standstill. How many burglars, murderers or fraudsters get their victims to sign contracts first? ;)
Seriously, just because you're claiming "negligence" does not make a contract mandatory. I can think of numerous recent examples of successful action where there was no contract in place.
Google doesn't owe any of us a living; Google does not owe any of us traffic or ranking. Should they choose to remove us from their listings, that is entirely their prerogative, and you can't sue them for that. However, like any responsible business, G will appreciate that they don't need to intentionally break the law to be named as a party in a copyright infringement or "collusion" accusation. But I don't see that as the quickest or best route to a resolution.
On a related matter, when "criminal" was mentioned earlier in the thread, it referred to site owners' deliberate exploitation of this Google flaw and not to Google's own activities.
If you cannot beat them, can you join them?
Get a throwaway domain and redirect it to the page they hijacked from you, so that their traffic comes to you? Is this workable?
Please, people, do not hijack this thread by changing the topic to "a lawsuit against Google".
This is about spammers who have found a new way to pull traffic to their sites by using 302 and meta redirects, at the expense of popular, well-established websites.
In programming, on the Internet and in any area of software development, there will always be people who put all their energy into trying to reverse-engineer or break into the system.
And the flaw is not only Google's... as of today the same thing has happened in the AltaVista SERPs, where, as of a few hours ago, my index page at:
has been replaced by:
I am still OK in Yahoo, Jeeves and AllTheWeb.
Thanks to all for your concern, and let's work together to point out this threat and redirect flaw to all search engine engineers, and to get it into the news... not about lawsuits, but to warn other webmasters who may not understand why the SERPs are dropping their pages and why they are losing the good rankings they had before.
Thanks for keeping this thread on topic.
|If you cannot beat them, can you join them? |
That's the worst possible thing to do. It causes more damage to the web experience, which damages everybody's sites in a small way and some people's in a bigger way, maybe even yours. Google and other search engines will be aware of the issue raised in this thread and the other related ones; it probably won't be fixed in a week, but the big picture is that something will be done about it. This maybe won't help the people here whose sites have been affected, at least not quickly enough, but responding with the same cynical (or negligent) methods is wrong in principle, and as useless as trying to take Google to court over something that is most likely being worked on in any case.
Do you foresee the issue being solved soon?
The problem has been there for months already. There has been no response or assurance from Google or the other search engines that the issue will be addressed.
The hijackers are thriving on it and may be enjoying the harvest through the coming holiday seasons. They are hoping everyone will be sitting on it and not do anything.
I know it is not the solution, but is there anything else those affected can do instead of waiting? Can someone at Google respond?
I have only a few pages affected, so I'm not too concerned, but I know the range of sites these hijackers target is quite broad.
|and as useless as trying to take Google to court over something that is most likely being worked on in any case |
Taking Google to court is probably not the right action, however, this idea that the problem is being worked on is one I just don't get.
When a security flaw is found in Windows, MS can get a fix out in days (not always, but it does happen). This problem has been known for months, possibly more than a year and still no fix is in sight, nor have Google even conceded that the problem exists.
The possible explanations for this tardiness are as follows :-
1) Google just doesn't care.
2) The code is incomprehensible and the guy that wrote it has left/died, etc.
3) The code is fine but no one is smart enough to understand it.
4) The source code has been lost and no one is smart enough to work out how to patch the binary code.
Of course, the last possibility is interesting. They would probably need someone in his/her fifties or sixties to patch binary code - it's certainly not a skill taught at university.
If this problem's been around so long it's a shame someone didn't press Google on it pre-IPO ;)
In the future, Google is unlikely to comment on this issue/problem(?)/bug(?) (or any other), since any "official" comment could be used in support of legal action, whether frivolous or justified.
In my experience, Kerrin's canned response saying "we'll pass this on to our engineers" (in message 41) is an indication that Google is taking the problem seriously and exploring resolutions.
Google needs to first decide if this is their problem or a shortcoming in the HTTP protocol specs that everyone is expected to conform to. Then they must decide if this is a spider problem or an indexing problem. Then they need to fix the appropriate code and test the fix thoroughly before deploying it.
If it's a spidering problem, it likely means the improperly redirected links will need to be respidered, then reindexed. An indexing problem COULD be fixed more quickly if indexing is done independently from spidering I suppose. Google may have a fix in place now and we may be simply waiting for respidering/reindexing corrections to percolate into the publicly available SERPs.
Each of you needs to report your experience with this problem to Google. This page [google.com] suggests reporting it to firstname.lastname@example.org. They need solid examples of the problem to analyze so they can implement a fix. Reporting it or complaining about it at WebmasterWorld does little to improve the situation, although it is an opportunity to vent and commiserate with others experiencing the problem... ;)
Google will not be motivated to correct this problem until (a) they receive bad press about it in a mainstream publication, or (b) it directly affects one of their top clients.
wink wink nudge nudge. . .
I think this is happening to my site. Here's the scenario:
www.widgets.com - has text on homepage describing widgets and stuff. In particular there is a line: "Widgets are the new gudgeon clips."
Searching Google for that line should find only my widgets site, but also listed is:
The google entry for foo shows my text description but clicking the link goes to the foo site which shows nothing about widgets or gudgeon clips at all.
Is my site being hijacked like marcello's? How can I probe foo.com to see if there is a meta redirect?
From what I gather, when a site is hijacked then google will drop the site and PR0 it.
If that is the case then how quickly does that happen?
Is my site in the early stage of being dropped by google?
Why aren't you all using this link?
|Digital Millennium Copyright Act |
It is our policy to respond to notices of alleged infringement that comply with the Digital Millennium Copyright Act (the text of which can be found at the U.S. Copyright Office Web Site, [lcWeb.loc.gov ]) and other applicable intellectual property laws, which may include removing or disabling access to material claimed to be the subject of infringing activity. If we remove or disable access to comply with the Digital Millennium Copyright Act, we will make a good-faith attempt to contact the owner or administrator of each affected site so that they may make a counter notification pursuant to sections 512(g)(2) and (3) of that Act. It is our policy to document all notices of alleged infringement on which we act. A copy of the notice will be sent to a third party who will make it available to the public.
Last I checked, Google is a private entity. They can make their search engine operate in any manner they choose. The fact that their algorithm operates in a manner that is not to some of your personal likings is a rather stupid basis for a lawsuit.
Google does not have any kind of duty or obligation to index anybody's website. If they decide they only want to show results for domains that begin with the letter 'A' and were registered on a Tuesday, that's their right. If you don't like how they operate, make your own search engine.
That said, I think this is a pretty major flaw in their algorithm and I, for one, would like to see more extensive analyses of the problem and how it could be resolved. And I'd really be interested in seeing something from GoogleGuy explaining why they do things this way. Enough whining about lawsuits, let's get back to discussion of the actual problem itself.
As far as I remember, those were AltaVista's words a few years ago... they also had rare updates, a good amount of advertising and so on... :)
Now AltaVista has been sold. That's a pity, yeah? :)
A lot of people are asking about how to look at the redirects or meta tags people are using to possibly hijack your site, so here's the answer for anyone with access to a *nix account -- at the prompt, type:
lynx -mime_header HIJACKURL
where HIJACKURL is the URL of the potential hijacker, including "http...". Lynx will show you the HTTP headers, so you can tell if it's a 301, 302, or meta refresh.
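For anyone without lynx, a rough Python equivalent is below. `show_headers` is a hypothetical helper name, and the default User-Agent string is arbitrary; you can pass a different one to see whether a site answers differently depending on the agent. The standard library's `http.client` never follows redirects, so a 301 or 302 is shown exactly as sent. The meta-refresh check at the end is a crude substring test, not a real HTML parse.

```python
import http.client
import sys
from urllib.parse import urlsplit


def show_headers(url: str, user_agent: str = "header-check/0.1"):
    """Fetch a URL once, WITHOUT following redirects; return (status, headers, body)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # http.client does not follow redirects, so a 301/302 comes back as-is.
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        # latin-1 decodes any byte sequence, which is fine for eyeballing markup.
        body = resp.read().decode("latin-1")
        return resp.status, resp.getheaders(), body
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    status, headers, body = show_headers(sys.argv[1])
    print("Status:", status)
    for name, value in headers:
        print(f"{name}: {value}")
    lowered = body.lower()
    if "http-equiv" in lowered and "refresh" in lowered:
        print("Body appears to contain a meta refresh")
```

Usage, assuming the script is saved as `show_headers.py`: `python show_headers.py http://www.foo1.net/`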
The DMCA doesn't apply in this case because the websites in question aren't copying content, just linking in a way which causes trouble with the way Google indexes pages.
The DMCA may not apply, but Google are obliged to answer a notice anyway. These are deliberate attempts to pass off another person's work as one's own.
A lawsuit may have no effect on Google, but it should on the offending culprits and their hosts. I would threaten both.
>>Keep this thread open until GG comes here and tells us what Google will do about it.
Ahh, the good old days.
GoogleGuy would show a lot of credibility if he would make a comment.
A New Day ... But the saga continues!
Search Google for "wannabrowser".
under "Agent Selection" use "NetSpider"
UN-Check "Follow Redirects"
Leave "Show HTTP Response Headers" Checked
Yesterday I sent a DMCA complaint to Google and to AltaVista by fax concerning this matter... hopefully I will get an answer.
Today all of my more than 8,000 backlinks to www.widget.com have disappeared.
Entering link:www.widget.com now returns:
"Your search - link:www.widget.com - did not match any documents."
So the hijacking page must now have been fully accepted as "THE PAGE", and, as Frank_Rizzo says in message 81, the next step will be my PR6 becoming PR0, resulting in a complete loss of:
- a 4-year-old site
- over 80,000 pages
- PR6 ranking
- over 30,000 uniques/day
- 200,000 pageviews/day
All of the above is the result of someone adding the following line of code to a not-so-high-ranking page:
<meta http-equiv="refresh" content="0; url=http://www.widget.com/">
Also today (it's morning here), my pages in the Google index have dropped from over 80,000 to fewer than 40,000 (using site:www.widget.com).
Google traffic is also 50% below the normal average of the last 6 months.
I am watching Yahoo like a hawk, as I am still getting a lot of traffic from them; so far the hijacking page is STILL NOT in the Yahoo index, and my www.widget.com page still ranks No. 1 for its main topic (keywords) out of over 3 million results returned.
I still believe in Google and agree with DaveAtIFG (message 79)... I just hope Google knows about the problem so that other webmasters never have this happen to them.
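To check a suspect page for a tag like the meta refresh line quoted in this post, a minimal Python sketch using the standard library's `html.parser` can pull out the delay and the target URL. `MetaRefreshFinder` and `find_meta_refresh` are hypothetical names, and the handling of the `content` value is a simplified assumption that will not cover every variant browsers accept. A delay of 0, as in the line above, means browsers move on immediately.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect (delay_seconds, url) pairs from <meta http-equiv="refresh"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.refreshes: list[tuple[float, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0; url=http://www.widget.com/"
        delay, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            rest = rest[4:].strip().strip("'\"")
        try:
            seconds = float(delay.strip())
        except ValueError:
            seconds = 0.0  # simplification: treat an unreadable delay as immediate
        self.refreshes.append((seconds, rest))


def find_meta_refresh(html: str) -> list[tuple[float, str]]:
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.refreshes


print(find_meta_refresh('<meta http-equiv="refresh" content="0; url=http://www.widget.com/">'))
# [(0.0, 'http://www.widget.com/')]
```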
Do you mean that if some subpages are being redirected, the rest of the pages will be dropped as well in due course?