This 75 message thread spans 3 pages.
|Meta Refresh leads to ...|
... Replacement of the target URL!
Say I have a web catalog (directory). A listing's link actually loads another page from my server that returns status 200 to Googlebot and contains just this code:
<HTML><HEAD><META HTTP-EQUIV="Refresh" CONTENT="0;URL=http://www.webmasterworld.com/"></HEAD></HTML>
If I had a higher PR than WebmasterWorld, and if WebmasterWorld didn't have a Google directory link, my page would make it to #1 at Google for searches that formerly returned the WebmasterWorld index page at #1.
So if I searched for WebmasterWorld, the #1 listing would carry my URL, but the snippet would read:
|WebmasterWorld News and Discussion for the Independent Web ... |
News and Discussion for the Independent Web Professional. WebmasterWorld
Highlighted Posts Mar. 17, 2004. Mar. ... WebmasterWorld Info and Utilities: ...
Doh! That's how google works currently. Found tons of examples. I know, that's nothing new. But it's annoying and ...
I call this broken!
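To make the setup concrete, here is a minimal Python sketch (not anything Google is known to run) that pulls the delay and target URL out of a meta refresh page like the one above. The names `MetaRefreshFinder` and `refresh_target` are made up for illustration.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the (delay, url) pair of the first meta refresh tag seen."""

    def __init__(self):
        super().__init__()
        self.refresh = None  # stays None if no meta refresh with a URL is found

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta" or self.refresh is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # CONTENT looks like "0;URL=http://example.com/"
        delay, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            rest = rest[4:].strip()
        rest = rest.strip("'\"")
        if rest:  # a refresh without a URL only reloads the same page
            self.refresh = (delay.strip(), rest)


def refresh_target(html):
    """Return (delay, url) for a meta refresh page, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refresh


page = ('<HTML><HEAD><META HTTP-EQUIV="Refresh" '
        'CONTENT="0;URL=http://www.webmasterworld.com/"></HEAD></HTML>')
print(refresh_target(page))
```

A crawler that sees a 200 response plus a zero-delay refresh to another host has everything it needs to notice that the page is only a placeholder.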
If the point you are making is that the listed url should be that of the target page, then I agree.
Why not email Google and perhaps they'll change it. HOWEVER, I had to read your post several times to understand it. I imagine that Google techs are too busy to do the same.
Ummm ... I understood Yidaki's post first time around, and if I can, I'm sure the clever people at Google will have understood it first time too.
Seeing the same problem here in the UK. Heavy use of cloaking, so far I've got an average of around 5 listings per 100 for all the search terms I've researched that are cloaked (ie Googlebot is getting something other than the meta refresh that I get when I click on the Google listing).
For most of the keywords researched I'm getting a cloaked result in the top 10. How did this happen?
plumsauce - Strange isn't it, the petty things that people argue over in some threads and then - like you said - deadly silence when Google make this dramatic error.
I call it an error, but lately I've been getting the distinct feeling that Google are messing with SEO's. Probably trying to shake off the spammy ones.
Yes, I lost a site (a very important one for me) because a web directory linked to it in this way.
I have found many other sites with the same problem.
The index pages of these sites are gone and the listing of the linking site is there instead. Yes, my site is out there, but not under my site's URL.
If I change the title of my index page, the title of the faked listing changes too. lol
My visitors went from 10000 to 1000 per day and the site is no longer visited by Googlebot. Only old subpages remain in Google and bring me some visitors, but these pages never get refreshed.
The cache of my site can be found on another domain.
|since yesterday, i thought this thread would |
have ballooned in size.
instead, absolutely stunning silence ....
I've been screaming about this for over six months and have gotten no good response. Yidaki describes the problem well, and we are not alone.
I've had several sites hit by variations of this problem.
It's gotten so bad that I won't let the sites I watch over be indexed in any directory that uses a redirect click counting scheme.
This explanation is conjecture, but I think what's happening comes from Google's way of handling two problems... redirected doorway pages, and redirected splash pages. By assigning the content of the destination page to the originating url, they solve both problems... and the arrangement makes sense, as long as the link doesn't jump across domains. When it does cross domains, though, the arrangement is open to a lot of abuse, both accidental and deliberate. I don't think the Google arrangement anticipated the redirects used on click counting pages in directories.
At this point I've written to GoogleGuy, and I discussed the problem with two high level Google search engineers at SES last August in San Jose. At SES, one of the engineers started by saying, 'yes, that's the way we handle redirects,' but after he heard me out, he said, 'sounds like a GoogleBug.'
I've gotten no feedback since, though, from anyone. I'd had a robots noindex, nofollow put on the redirecting page, but the problem persisted for months... then cleared up... then reappeared... and now it's gone... so I have no feeling of security about this type of directory listing.
Without feedback from the engines, it's hard to know what the status is, because I keep reading post after post about the problem.
I should mention that the problem doesn't occur just on Google... it also happens on Inktomi, and I've seen something similar, but not exactly the same, on Gigablast. All with different pages, but the pattern is the same, involving a high PR source page. Since the Google problem, I've had to yank a great directory listing just to get our money page reindexed on Inktomi. Am not sure what happens on the new Yahoo, but I don't want to try to find out.
Here's my rough archive of threads on this, elaborated a bit since I last posted it. You have to read down through some of the threads, but they all sound like roughly the same situation to me. Some are in the Supporters Forum:
Banner ad redirect-page indexed as mirror site by Google
- my original post on this problem.
Our company Lisiting is being redirected.
- elaboration of the problem. Some discussion of Links SQL as one of the problem systems, which uses both a 302 and an HTML page with a meta refresh.
Is using a redirect to track outward bound links bad?
- thread contains some hints for directory operators to prevent the problem, but no relief for the target site.
Google indexing redirect pages
How can I get google to stop indexing redirects?
- another example of the problem, posted by a content site operator
weird link showing up for my site in Web results
how can i change this?
- short thread... I'm guessing another example of the problem
It would be great to get some feedback from both Google and Yahoo on this. I know GoogleGuy is entertaining his family this weekend, but if one of the mods knows how to get his attention, it would be great to have him look into this eventually.
PS... Edit to correct omission.
[edited by: Robert_Charlton at 8:56 am (utc) on Mar. 20, 2004]
|since yesterday, i thought this thread would |
have ballooned in size.
instead, absolutely stunning silence ....
Most webmasters don't realize the real reason for their lost sites.
They think the sites have been penalized or removed from Google and wonder why this has happened.
Here in a German forum I started a thread like this a week ago and we discovered many pages with the same problem.
People often start threads like 'can you help me, I don't know why my site has vanished'.
In the majority of cases I have found a directory like the one mentioned above. It's always the same; this can't be an accident.
Robert Charlton - you said you've written to GoogleGuy about this, his sticky is turned off, where do you write to GoogleGuy?
I had a thought from the other side of this. I use redirectional scripts on my site for banner adverts at the top of pages. These use a php re-direct from a script in a robots.txt denied folder. I assume that the companies I link to through this script will not have their Google listing replaced with my redirect URL, will they?
If they did, you can be sure that they wouldn't be purchasing advertising from me next month! So far, all I've seen in the results while researching this problem is spammers trying to deceive search engines. I'm sure there must be thousands of non-spam situations where this is occurring, though, so how do you solve it?
>his sticky is turned off
Not always. I have written stickies to Mr. GG too ... and even received replies. :)
What I don't understand: why doesn't Google (and other robots) simply ignore pages with meta refreshes - just ignore the url where the meta refresh is hosted ... dividing content and url and REPLACING the target url is - in my eyes - a serious bug!
|where do you write to GoogleGuy? |
I tried whatever method he was suggesting at the time to reach him with spam reports, flagging the report with "GoogleGuy" and "WebmasterWorld" and my WW username. For various feedback now, I think he's suggesting doing this via email@example.com. Probably the best time to reach him is really late Friday night. ;)
I'm not even sure the message actually reached him. This is one of the dilemmas about this problem. No confirmation about anything. The helpful but obviously beleaguered Google search engineer at SES gave me his card and email address, but also had to suggest I flag the message to refer to our conversation at SES so it wouldn't get filtered out. Who knows whether it got through?
I think Google is trying very hard and is much better than most about responding to feedback, but this topic has fallen through the cracks.
|I had a thought from the other side of this. I use redirectional scripts on my site for banner adverts at the top of pages. These use a php re-direct from a script in a robots.txt denied folder. |
Something similar was suggested in one of the threads I mentioned above. Whether this works may also depend on whether a meta refresh redirect is used as a "backup." Some redirectional scripts do this in case the scripts don't suffice. I'm definitely not an expert in this area.
Also, from what I understand, robots.txt won't prevent Google from indexing a url it hasn't crawled. It will index a link but not the page content, as in the threads below:
Problem with Googlebot and robots.txt?
Google indexing links to blocked urls even though it's not following them
Comment from GoogleGuy:
|If we have evidence that a page is good, we can return that reference even though we haven't crawled the page. |
And check out the thread Jim Morgan references (and read his excellent msg#12) now moved to:
Question about simple robots.txt file
I'm not sure how the robots.txt and Google indexing of links relates to our problem at hand, but the above threads suggest that robots.txt alone might not suffice.
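For anyone who wants to check what a robots.txt actually denies, here is a small Python sketch using the standard library's `urllib.robotparser`. The `/redirect/` folder, the script name `go.php`, and the `example.com` URLs are assumptions for illustration only. Note that this only answers "may the robot fetch this URL?"; as the threads above suggest, it says nothing about whether the bare URL can still show up in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its click-counting
# script in /redirect/ (the folder name is an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /redirect/
"""


def may_crawl(robots_txt, user_agent, url):
    """True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The redirect script is denied, the ordinary page is not.
print(may_crawl(ROBOTS_TXT, "Googlebot",
                "http://www.example.com/redirect/go.php?id=7"))
print(may_crawl(ROBOTS_TXT, "Googlebot",
                "http://www.example.com/index.html"))
```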
In my conversation with the Google engineer, I'd mentioned that I'd requested the redirecting directory to put the noindex,nofollow robots meta tag on the redirect page in an attempt to get it out of the Google index. The engineer said he thought that this should take care of it. It didn't, or it took quite a few months, and the problem came back again before it finally disappeared.
In retrospect, I'm not sure why things eventually returned to normal, whether through Google's intervention, or because of the noindex,nofollow tag on the redirecting page. Anyway, I'd do both. This may not fix the problem, though... and the big problem is that we're vulnerable to how other domains link to us.
|why doesn't Google (and other robots) simply ignore pages with meta refreshes - just ignore the url where the meta refresh is hosted ... dividing content and url and REPLACING the target url is - in my eyes - a serious bug! |
Initially, this makes sense, but I don't think it's that simple. I think there might be some altruistic motives in Google's approach... ie, a lot of sites are built badly, with splash pages or landing pages that redirect, and, by handling meta refreshes the way it does, Google keeps a lot of these sites in the index. My guess is that ignoring meta refresh pages would create a more widespread upset than Florida, but I don't have the statistics... just a hunch.
It might suffice if Google kept things the way they were as long as it was the same domain, but ignored redirects that jumped domains. Even here, a lot of unwitting sites would get hurt. I can't tell you how many sites I've had to fix that used meta refreshes to redirect .net, .com, etc, as well as various ppc landing pages to a main "home" page that was not the default index page of the domain.
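As a rough illustration of the "same domain vs. jumped domains" test, here is a Python sketch. The helper names `host_of` and `jumps_domain` are hypothetical, and comparing hostnames with only a leading "www." stripped is a deliberately crude assumption, not how any engine is known to do it.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Lower-cased hostname with a leading 'www.' dropped."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def jumps_domain(source_url, target_url):
    """True if a redirect from source_url to target_url leaves the host."""
    return host_of(source_url) != host_of(target_url)


# A directory's click counter pointing at someone else's site: a jump.
print(jumps_domain("http://directory.example/out.php?id=1",
                   "http://www.webmasterworld.com/"))
# A splash page redirecting inside its own site: not a jump.
print(jumps_domain("http://example.com/",
                   "http://www.example.com/home.html"))
# The .net version of a site redirecting to its own .com: counted as a jump.
print(jumps_domain("http://example.net/", "http://example.com/"))
```

The last case is exactly the sort of unwitting site that a simple cross-domain rule would hurt, since a hostname comparison can't tell that both domains have the same owner.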
It may be that, while this is a stupid arrangement, it is no less legit than the meta refreshes on the directory counter pages and that the engines have a hard choice to make. I don't know. It would be nice to get some evidence of attention.
|troels nybo nielsen|
|certain enterprises pride themselves in not being just *plain* democratic, but *uniquely* democratic. |
Might I venture to ask you to be more specific? It would really be interesting to know who you are talking about. It would also be interesting to see some kind of documentation.
Before you joined us as a member I took part in one of the most absurd threads that I have read at WebmasterWorld. One member whose name I have happily forgotten stated that Google had described themselves in terms very much the same as those that you refer to. I asked where this was done and two or three people pointed to a page where Google did certainly *not* say anything like that.
Sorry, Yidaki, for bringing the thread off topic. The problem that your original post deals with certainly seems to be serious for Google, but I cannot claim to have any real understanding of the technicalities in it.
There is almost nothing a competitor can do to harm your ranking...
I wonder what Google suggests to avoid being linked by a meta refresh page - and so to avoid having your own url replaced in the SERPs by the linking page?
>In my conversation with the Google engineer
Robert, did the engineer or anybody else from Google and/or Yahink say something about how webmasters can control this? I know, webmasters could email the linking site and ask politely to have the meta refreshing backlink removed. But I fear there are circumstances where it could be too late. And if the linking site is a competitor ... ouch ...
Oh well, an answer from GoogleGuy and Tim wouldn't be too bad - how can a webmaster avoid having his url replaced by another linking url at Google and/or Yahoo? Unfortunately I'm pretty clueless ...
I told the webmaster of the linking site to remove the link to my site.
It took some time and the link vanished from google serps.
Unfortunately there is now a link of another domain instead.
This new domain is using the same directory software and data.
This all is very annoying. :-(
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important." blah, blah, blah
the search was:
"uniquely democratic" brin
to clarify the context of my original comment,
i was being sarcastic.
[edited by: Marcia at 3:22 am (utc) on Mar. 27, 2004]
[edit reason] Linked URL. [/edit]
plumsauce and troels,
could we PLEASE leave this thread on topic - it is far too important - off topic discussions dilute the original question (no matter how interesting they might be).
Meta Refresh leads to replacement of the target URL!
- How can webmasters avoid *in advance* having their URL replaced by meta refresh back links?
- And is this the "almost" in Google's webmaster guidelines language: "There is almost nothing a competitor can do to harm your ranking"?
>there is almost a message in the
deafening lack of silence itself.
is there not?
I really don't think the issue is explained very well in this thread IMO. And without being able to show links in this forum or use specific search terms it is really hard to explain this type of issue in the real world.
What I think you may be talking about is a situation I have seen. If it is not...well, here is another problem.
I had a high rank web site for a certain city and subject (right when all the Google filtering kicked in).
Then I noticed my site disappeared and a few other sites in the index, that had taken it upon themselves to create tons of pages with various search terms, including mine, still showed in the Google results for those search terms.
Imagine a page with Google results in the middle and the keyword phrase sprinkled around the page and in the title. The Google results in the center of the page seemed to be about a month old.
Since Google gives snippets of the most important data on the page it found in the search, the moron web sites that used this technique had all the important keywords and page titles of the original sites, plus they were all on one page, creating a super page for that search phrase.
I have given up all hope for Google. It was a very nice search engine before all the nonsense.
I think Robert_Charlton's approach makes sense - Do not "re-assign" URLs for 302 redirects if the redirect is to a different domain.
As for the quietude of this thread, there may be many like me who are intensely interested, but are waiting for a possible response from the SE representatives here, or from someone who has contacted the SEs directly and has received a reply that can be summarized and paraphrased.
The real problem is that once these guys see the Google hits in their server logs, they will kill the redirect and steal the traffic. Not to mention your content will get penalized for being duplicate. Google could try to detect .js redirects, but a lot of the cloakers obfuscate the .js or use another trick with iframes to show their content over the top of your stolen content. I have been seeing server side cloaks as well. It's a real problem. I think the only way to stop it is to penalize not just the $9 disposable domain that cloaks, but also the affiliate-seeking megasite that pays these guys for the hits. I think you need to make the companies responsible for their affiliate sites' advertising methods, and also make spammers responsible for the disposable domains that cloak to their flagship sites. As it is now, there is nothing to lose for the companies enjoying the traffic from this. Oddly enough, it's kind of like spam email in that regard.
<added> As I think on this, I had posted a thread about a megasite company that had content from one of my sites that I thought was being cloaked server side... it could well be that simply the redirect is now gone and the pages now point to the competitor and this is what happened to this site's traffic. I need to pay closer attention to referrals in the logs from strange URL's. I had seen some and thought it was someone had an open proxy server. Now I think I have some thinking to do to solve this. </added>
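Watching the logs for strange referring URLs can be sketched roughly as below. This is a Python sketch that assumes Apache "combined" format log lines; the function name `outside_referers`, the sample lines, and all host names are made up. Whether a visit arriving through a meta refresh carries a referer at all depends on the browser, so an empty result proves nothing.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the request, status, size, referer and user agent fields at the
# end of an Apache "combined" log line; other formats need another pattern.
LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')


def outside_referers(log_lines, own_hosts, search_hosts=()):
    """Count referring hosts that are neither ours nor a known search engine."""
    counts = Counter()
    for line in log_lines:
        match = LINE.search(line.strip())
        if not match:
            continue
        host = (urlsplit(match.group("referer")).hostname or "").lower()
        if host and host not in own_hosts and host not in search_hosts:
            counts[host] += 1
    return counts


sample = [
    '1.2.3.4 - - [26/Mar/2004:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"http://www.google.com/search?q=widgets" "Mozilla/4.0"',
    '1.2.3.5 - - [26/Mar/2004:10:00:05 +0000] "GET / HTTP/1.1" 200 512 '
    '"http://throwaway.example/go?id=3" "Mozilla/4.0"',
    '1.2.3.6 - - [26/Mar/2004:10:00:09 +0000] "GET /a.html HTTP/1.1" 200 900 '
    '"-" "Mozilla/4.0"',
]
print(outside_referers(sample, {"www.mysite.example"}, {"www.google.com"}))
```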
With a clearer head this a.m. and a little digging... here is how this thing is working at least for my 'jacked pages:
a disposable domain gets links through an obscure music category at dmoz, gets crawled and gets pr. This domain redirects to my site's pages and google indexes them. They have higher pr than me and a dmoz listing which I can't seem to get anytime soon so they get my content and I vanish. They disallow caching, but from the snippet I see my content... Then they redirect after indexing to their flagship site which happens to be a competitor to my site.
How can we prevent this server side from our end... ssi pages? or mod rewrite to alter the URL if the request comes from a redirect? I don't have an answer yet, just more questions. Maybe someone has an idea.
<added> I have an email into webmaster@google and put GoogleGuy in the subject line... I have my fingers crossed anyway. :) </added>
[edited by: idoc at 7:25 pm (utc) on Mar. 26, 2004]
>> are waiting for a possible response from the SE representatives here
- subscribe! Action, even. There have been very illogical errors/flaws in redirect detection for a long time now.
A redirect should not appear in the SERPS by itself as it's not a page, only a placeholder for that page (this sentence originally posted in a related thread (1)) - Google should clearly show the target instead of the redirect.
I can only interpret this as a symptom of way too much emphasis on outgoing links.
(1) Dec 03: [webmasterworld.com...] (#4)
(2) Dec 03: [webmasterworld.com...]
(3) Dec 03: [webmasterworld.com...]
(4) Oct 03: [webmasterworld.com...]
In my post above, I meant to say "Do not 're-assign' URLs for meta-refreshes if the redirect is to a different domain," instead of mentioning 302 redirects as I did. But that got me thinking about something and doing a bit of research.
Interpreting the HTTP/1.0 and HTTP/1.1 [w3.org] specifications for 301, 302, and 307 server response codes:
If an SE finds a 301-Moved Permanently redirect, then it should drop the old URL completely, and list and index the content of the new one.
If an SE finds a 302-Found (called "Moved Temporarily" in HTTP/1.0) or 307-Temporary Redirect, then it should list the old URL but index the content from the new URL.
So, it looks like Google is treating a meta-refresh like a 302 or 307 redirect, and that is what enables this URL-jacking problem. Google may have done this to support pseudo-redirects on sites where the webmaster cannot do proper server-level redirects or generate proper redirect response headers using scripts, and is stuck with plain on-page html (think free-hosting sites).
After looking at a lot of sites, it seems to me that most webmasters (I mean in general, not here at WebmasterWorld) have no idea how to do a proper redirect, or which server response code they should use even if they do set the headers. So, Google is trying to help, but this problem is the unintended result. So, maybe we need a poster campaign that says, "Have you checked your server headers [webmasterworld.com] lately?"
I think Google should do its best to educate webmasters, but ultimately, there may be more harm in taking proactive measures that go beyond the specifications and try to correct errors or omissions by webmasters or the limitations of certain hosting setups. I guess what I'm saying is that they should avoid methods which 'help' novice webmasters, but hurt those who know what they are doing. While trying to help is laudable, I think they should just let each webmaster be responsible for their own site's errors and limitations. Those who have problems will be motivated to take steps to fix their sites, and no unintended side-effects will be introduced on sites that are properly configured.
So, I'd propose treating a meta-refresh just as if it were a plain text hyperlink, and no more.
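The reading of the response codes above, plus the proposal for meta-refreshes, can be written out as a small Python sketch. This is only my interpretation put into code, not a description of what any search engine does; the names `Listing` and `listing_for` and the `"meta-refresh"` marker are invented for the example.

```python
from typing import NamedTuple


class Listing(NamedTuple):
    listed_url: str   # URL shown in the results
    content_url: str  # URL whose content is indexed under it


def listing_for(kind, source_url, target_url):
    """Apply the reading of 301/302/307 given above, plus the proposal
    to treat a meta refresh as a plain link."""
    if kind == 301:
        # Permanent: drop the old URL, list and index the new one.
        return Listing(target_url, target_url)
    if kind in (302, 307):
        # Temporary: keep the old URL, index the new URL's content.
        return Listing(source_url, target_url)
    if kind == "meta-refresh":
        # Proposal: it is just a link, so the source page is indexed as
        # itself and the target keeps its own separate listing.
        return Listing(source_url, source_url)
    raise ValueError(f"unhandled redirect kind: {kind!r}")


src = "http://directory.example/out.php?id=1"
dst = "http://www.example.com/"
for kind in (301, 302, "meta-refresh"):
    print(kind, listing_for(kind, src, dst))
```

Under the proposal, the directory's redirect page can never be listed with the target's content, because the two URLs are never merged into one listing.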
>a disposable domain gets links through an obscure music category at dmoz, gets crawled and gets pr. This domain redirects to my site's pages and google indexes them.
Googlebot provides referer, no?
You can cloak for references to this site from Googlebot, serve some wildly inappropriate page which will ruin their page relevance, and possibly even report THEM for cloaking, and get them banned.
Maybe I could serve them a poison page of the actual page content that they eventually refresh to of the flagship site. That way they would get their own duplicate penalty.
But, I don't want to cloak because this site IS my flagship site for a brick and mortar company and my domain is not expendable. I think I could report all the backlinks to dmoz from the various language music directories and get them killed. Incidentally, the site is a financial services site... nothing to do with music at all. Probably someone involved has editing privileges there and could just as easily put them back up with another domain easier than I could kill the old ones. I could get google to ban their disposable domain, but $9 puts them back in the game and they don't need to build p.r. they have a scheme to do that without any work. The flagship site that is getting the final traffic is an advertising behemoth and I doubt google would ban their domain for the inbound cloaks. To me the only way to stop this is to ban the recipients of the cloaks. Yes, that could also be abused.
This business... at the bottom of the pile it is a dirty business and at the very top it is as well. In the middle lies mere mediocrity... go figure. Makes me miss my old 9600 baud modem and the bbs boards.
Googlebot provides referer, no?
umm, not that i have ever observed in the logs
from crawl activity, as opposed to serp activity.
besides, there is a risk that they unconfuse
themselves while you're not looking.
this is sometimes known as getting caught
with one's pants down.
other times, they call it an algo change.
>cloak ...serve some wildly inappropriate page which will ruin their page relevance
Sounds cool - - if you already know that someone is linking to you using a meta refresh, and who. C'mon - if your site gets replaced in the Google SERPs, it's already too late. Or am I missing a hidden, secret way to avoid this in advance?
|as far as any excuse that this is in conformance |
with any particular rfc, say rfc2616, i would
counter that conformance to any rfc is the lowest
common denominator to interoperability and not
the limiting factor in creating any technical
solution. in other words, don't use the rfc as
Yes, I agree, it's the lowest common denominator, and therefore everyone should follow it. Otherwise, we'll end up with everyone acting like AOL and Microsoft, ignoring standards and demanding special handling to assure compatibility. You can give up the control afforded by the different meanings of 301, and 302 or 307 if you like, but I'd prefer sticking with the specifications any day -- each of those codes means something, and two of them are greatly different.
I don't care who the emperor is or whether he's wearing clothes... I've seen emperors rise and fall before. All I care about is that the current emperor not cause me a bunch of technical grief by changing the terms and conditions under which his robot communicates with my server.
<quote>a disposable domain gets links through an obscure music category at dmoz, gets crawled and gets pr. This domain redirects to my site's pages and google indexes them. They have higher pr than me and a dmoz listing which I can't seem to get anytime soon so they get my content and I vanish. They disallow caching, but from the snippet I see my content... Then they redirect after indexing to their flagship site which happens to be a competitor to my site.
This is even more interesting legally speaking ... I
have noticed this happening as well, but this is
great news for truly technologically advanced spammers ...
they can jack the content of a site
by just redirecting their traffic to the
high quality content site (WHICH IS NOT ILLEGAL).
If they use methods that allow them
to do it only when an SE spider
is getting the page, that's their choice.
Display their own content when a buyer
finds the URL in the search engine,
that's also their choice ...
Good luck proving in a court of law
that they stole your content
since they never placed
it on their $6.85 throw away site.
If the search engines EVENTUALLY ban the site,
so what, they probably already made
the $6.85 back plus a huge profit.
This is hypothetical but would it work?
|This is hypothetical but would it work? |
I think a lot of the hypothetical scenarios being spun out here suffer from the same flaw... they posit the existence of high PageRank throw-away domains.
As I've observed it, the redirect problem occurs when the redirect is from a high-PR page to a lower PR page, as might occur, eg, from a large directory. This is parallel to the situation of the (higher PR) index page of a domain redirecting to an interior page (lower PR).
It doesn't make sense that spammers and page hijackers are going to be risking PR6-7 pages, but I could be wrong.
My concern continues to be the accidental effects of redirects from banner ad counting pages, and from directory redirects where the directory operator is either trying to hoard PR or, more legitimately, to count clicks (or both).
|Robert, did the engineer or anybody else from Google and/or Yahink say something about how webmasters can control this? |
Yidaki - If they had, I certainly would have mentioned it.
Incidentally, your StickyMail has been full for days.
Today (5 minutes ago) i saw this URL in the top 10 (anonymized):
[example.com...] - 16K - Cached - Similar pages
For that URL, the server sends a code 200 OK and the content of that page is:
<meta HTTP-EQUIV="Refresh" CONTENT="0;URL=http://www.some-other-domain.com/page-name.html">
The URL in the SERPS is from one domain and the snippet in the serps is from another domain. The first domain (URL) meta refreshes to the second (snippet and cache).
This is not the right way to do things. That URL should never have been under that snippet. The snippet should show the target URL instead.
The URL that was there should have had its own snippet, which should have been blank, as there is no real content on that URL. Also, a page (URL) like that should have been buried deep down in the serps instead of being mixed up with a legitimate (top 10 in this case) listing.
claus - Thanks for posting that. It illustrates the problem exactly.
I've received several Stickies suggesting that we report specific examples of this problem to Google, and I think it's important that we do.
I'm guessing the firstname.lastname@example.org is the best email address, and I'd reference this thread, GoogleGuy, and WebmasterWorld, hoping that someone will take a serious look at what's going on.