| This 75 message thread spans 3 pages: < < 75 ( 1  3 ) > > || |
|Meta Refresh leads to ...|
... Replacement of the target URL!
Say I have a web directory. A listing's link actually loads another page from my server that returns status 200 to Googlebot and contains just this code:
<HTML><HEAD><META HTTP-EQUIV="Refresh" CONTENT="0;URL=http://www.webmasterworld.com/"></HEAD></HTML>
If I had a higher PR than WebmasterWorld, and if WebmasterWorld didn't have a Google directory link, my page would make it to #1 at Google for searches that formerly returned the WebmasterWorld index page at #1.
So if I searched for WebmasterWorld, the #1 snippet would read:
|WebmasterWorld News and Discussion for the Independent Web ... |
News and Discussion for the Independent Web Professional. WebmasterWorld
Highlighted Posts Mar. 17, 2004. Mar. ... WebmasterWorld Info and Utilities: ...
Doh! That's how google works currently. Found tons of examples. I know, that's nothing new. But it's annoying and ...
I call this broken!
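For anyone who wants to check whether a directory's "link" page is one of these, here is a minimal sketch in Python of pulling the refresh target out of a page like the one above. The class and function names (`MetaRefreshFinder`, `refresh_target`, `is_cross_domain`) are made up for illustration, and this says nothing about how any search engine actually parses such pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class MetaRefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.delay = None
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # CONTENT looks like "0;URL=http://www.example.com/"
        delay, _, rest = attr.get("content", "").partition(";")
        key, _, url = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.delay = delay.strip()
            self.target = url.strip().strip("'\"")


def refresh_target(html, page_url):
    """Return the absolute refresh target of a page, or None if it has none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return urljoin(page_url, finder.target)


def is_cross_domain(page_url, target_url):
    """True when the refresh sends the visitor to a different host."""
    return urlsplit(page_url).hostname != urlsplit(target_url).hostname
```

Feeding it the one-line page quoted above, with any directory URL as `page_url`, should give back the WebmasterWorld address, and `is_cross_domain` would then flag it as leaving the directory's own host.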
it's the lowest common denominator, and therefore everyone should follow it.
Otherwise, we'll end up with everyone acting like AOL and Microsoft,
ignoring standards and demanding special handling to assure compatibility
in the full context, notice that i said lowest
further, since the result is a textual representation
of the crawl, what
might you be contemplating?
|special handling to assure compatibility |
the results presented in the serps are a
transformation/interpretation of the original
communications between the crawler and one or
therefore, i say again, rfc2616 is not the limiting factor
in creating this technical solution.
Furthermore, the alleged propensity of Microsoft to
ignore standards is overstated. For example, in a
widely reported test result, it was alleged that IE
was badly broken with respect to a specific rfc. It
turns out from analysis of the actual network traces
used by the testing organisation that there was not
a single browser that implemented the feature in
the same way over the wire. Indeed, IE was the
only browser that implemented the feature
nearly correctly. The presumption was that
since IIS did not respond to the other browsers
in the expected manner that Microsoft must necessarily
have cheated with their two products. From
examination of the particular anomalies displayed
by the other browsers, it is more likely that
the other browsers were written with the aid
of examination of the apache code base for the
server side of that feature. Thus, the same
errors were made in the client side code.
To this day, it is widely reported that Microsoft
ignored the standard in this matter when the
opposite is the truth. The assertion was made
after what I would consider to be inadequate
testing, and never corrected or retracted.
Per the original topic, I found several sites that
"link" to mine, but in the following way:
VISIBLE on their page is [MYSITE.net...]
(which looks great)
But when I mouse over it, I find this actual URL
Clearly, this jumps to some "object" probably
a file, which calls up my real URL. Questions:
1) Would Google count this as a link to my
site, even a poorly rated one for the sake
of links counts or PR?
2) How about Yahoo, MSN and the rest?
3) Would any search engine go to the trouble
to learn what OBJ=54321 refers to, or just
4) Since my honest full URL shows in TEXT,
(but not as a hyperlink) does _that_ count
as at least some poor kind of link?
I hate it when they don't place a simple
<a href= type link.
Any wisdom appreciated. -Larry
Thought I'd revisit this thread. The problem has been popping up a bunch on Yahoo too. Here's one of the threads...
Big problem with Yahoo
redirecting links cause sites to be dropped
These links go directly to my site but through another domain. Yahoo doesn't seem to recognize these as links and instead thinks of them as the direct URL of the websites. Because I got links from many directories who use this type of linking technique many of my domains are dropped.
You can also check out the links in the Yahoo Q&A post I mention on the above thread, which lead to other Yahoo threads about the problem.
This sounds like a dream come true for pop-up spammers. Find a well-branded internet company with a lower PR than a dummy site, create pages on the dummy site that come up high in the SERPS and then refresh to the well-branded internet company. As the user passes through the redirect, spawn a hundred popups, scumware downloads, etc.
How much liability falls on Google for assisting scumware downloaders to infect innocent users' machines with their disgusting parasites?
It would be better if Google could treat redirects to a different domain as if they had a "noindex" robots meta tag. Why can't it do this?
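To make that suggestion concrete, here is a rough sketch in Python of the rule being proposed: a page whose refresh or redirect leaves its own domain is treated as if it carried "noindex". The function name `should_index`, its parameters, and the crude "strip a leading www." host comparison are all assumptions for illustration; this is not a description of what Google or any other engine does.

```python
from urllib.parse import urlsplit


def _site(url):
    """Crude host normalisation (an assumption): lower-case, drop a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def should_index(page_url, redirect_target=None, has_noindex=False):
    """Decide whether a crawled page should be indexed under its own URL.

    redirect_target is the absolute URL the page refreshes or redirects to,
    or None when the page does not redirect at all.
    """
    if has_noindex:
        return False
    if redirect_target is None:
        return True
    # The proposed rule: a redirect that leaves the site counts as noindex.
    return _site(page_url) == _site(redirect_target)
```

Under this rule the dummy page that refreshes to a well-known site would simply never be listed, while an ordinary same-site redirect (say, from the bare domain to the www host) would be unaffected.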
<<I think a lot of the hypothetical scenarios being spun out here suffer from the same flaw... they posit the existence of high PageRank throw-away domains.>>
In my experience the sole function of the majority of these sites (which are indeed throwaway) is to host on their pages AdWords placed by "affs" for their eventual targets....
They are so fast and easy to set up that if the AdWords get a few thousand clicks per day, "google" wins... the "aff" wins, their customer (depending on conversion rate) wins... only you lose, and "google" pretends that it had no conscious hand in the process... (the proceeds, however, it had both hands in).....
<<VISIBLE on their page is [MYSITE.net...]
(which looks great)
But when I mouse over it, I find this actual URL>>
Worse is finding that the supposed link to you (or any one of the 100 others supposedly on the page) all link to your competitor... but are surrounded by your and everyone else's "snippets".....
<<hoping that someone will take a serious look at what's going on.>>
they know exactly what's going on and it suits them just fine this way especially the adwords division ( and the IPO )......
Why haven't they ever answered on this subject?
Because they couldn't "spin" out an answer ...
They would have to lie ....and they won't come here and outright lie .....
And that is the real explanation for why this happens ...
On all of the search engines where it occurs ........
>they know exactly what's going on and it suits them
>just fine this way especially the adwords division
>( and the IPO )......
It's a long way from the meta refresh and redirect problem to such a conspiracy theory. Sorry, but I can't follow. I have no idea how Google would profit from indexed meta refresh pages or messed-up redirects.
I have no idea why neither Google nor Yahoo cleans up the mess, but I'm sure it has nothing to do with AdWords or the IPO or capitalism... I can't see any relation.
|Maybe have a separate search tab for these sites |
That's a bit radical - making the net easier and better for most of its users? ;-)
I like the idea of that:
Google search 1: Do you want to search original sites and go straight there?
Google search 2: Do you want to search to get directories and then search the directories to get to subdirectories........etc. etc. and then find the site you eventually get to hasn't got any details worth looking at on it.
Check this crap out. The #1 result is the worst case of cloaked spam I have ever seen
[edited by: DaveAtIFG at 4:38 pm (utc) on April 30, 2004]
[edit reason] Sorry, no specifics! [/edit]
I'm confident you have all filed your spam reports at [google.com...] or email@example.com and submitted your complaints to firstname.lastname@example.org
And now you're simply getting impatient to see some results... ;)
"I'm confident you have all filed your spam reports"
The point of this goes way further than spam. The www is chock full of spam. The important point here is that hard built sites are losing their pagerank and listings to redirecting hijackers.
If Google actually did something with these other than 86 them, I would keep on sending them.
But after spending many hours of my time sending them over the past year, every single site that I reported was still in the SERPS when I recently checked them. And these were egregious offenders. What a waste of time. . .
|It's a long way from the meta refresh and redirect problem to such a conspiracy theory. |
Yes... I wish we could keep the conspiracy theories and noise somewhere else. That's not what this thread is about.
It would be nice if there were an acknowledgement from Google and Yahoo that there is a problem; but, as I'm coming to understand, if there is a bug, I don't think either could publicly admit it.
I'm assuming by now that both Yahoo and Google understand that something isn't working right... or that they should reconsider if they think these redirects are OK.
Well if G and Y don't care enough about redirect hijacks to address this... somewhere it will wind up being a "black eye" for them as soon as somebody jacks a site considered important enough to draw widespread attention. I don't really understand the apparent complacency.
It's not only meta redirects - i've seen this happen for every type of redirect, copying, or mirroring, as listed in msg #10 here: [webmasterworld.com...]
Also, it seems as if 301'ed URLs are now treated a bit differently than before. I see known 301 links in the SERPS now; earlier these were gone and only 302 links showed up. You have to search explicitly for them or switch off the dupe filter to see them, though.
Has anyone considered writing to a magazine or two? A little bad publicity will see this problem fixed PDQ.
|Well if G and Y don't care enough about redirect hijacks to address this... |
Tim has acknowledged the directory link problem in msg #52 of the Yahoo thread on this subject:
|They are both different but related issues (meaning the link and redirect) we have known about them both and are working on fixing them.... |
Tim is talking about the directory links/redirect problem, as well as the problem with 301 redirects, both of which were mentioned in the Yahoo thread. I'm guessing that the situation is the same at Google, but the timing may be such that they are a little bit more cautious about acknowledging the problem.
I think that both engines care... and that bad publicity wouldn't be helpful to anybody. In this particular case, I think they could have done a much better job of communication, but the kind of stories spun on this thread would make any engine reluctant to admit imperfection. It's definitely not a conspiracy.
This thread is six weeks old. The right programmer/engineer should have been able to isolate the cause of the problem in no more than a couple of days. The fix would probably require changes to no more than about a dozen lines of code and should take no more than an hour or two. Given the nature of the system, the changes would have to be thoroughly tested - say another few days.
In other words, from the moment the problem landed on someone's in-tray to the moment it could be stamped FIXED, only about a week should have passed.
Bad publicity usually results in problems being stamped URGENT. Since this problem breaks the golden rule that other sites cannot adversely affect yours, it should have been stamped URGENT weeks ago. Clearly that did not happen.
"Since this problem breaks the golden rule that other sites cannot adversely affect yours, it should have been stamped URGENT weeks ago"
That's what I don't understand... this is a *way* bigger issue pointing to a much larger flaw than the "new algo hurting small business" stories that are making the newspapers. With "G", whether they just don't want to openly discuss the issue, or whether they think the few sites affected so far aren't important in the whole picture... I wouldn't know the answer. Maybe the flaw exists because of domain park being included in the mix... who knows outside of the 'plex. I do think that, after all this, the issue is common knowledge to the folks at "G".
>> This thread is six weeks old.
But the problem is older. Like all things Google it's been cooking for a while - i assume it's just being employed full scale now:
September 2003: [webmasterworld.com...]
October 2003: [webmasterworld.com...]
October 2003: [webmasterworld.com...]
October 2003: [webmasterworld.com...]
>> no more than a couple of days
Even though it could be down to 20 lines of code (which i believe is a low number), it is the effect of these 20 lines across 4 billion pages that creates the problem, i believe.
I don't think it's easy to solve, as (imho, fwiw, etc) this unintended behaviour is a byproduct of doing something else that is intended (whatever that may be). So they can't just remove the unintended part without also removing something that is intended; they will have to find another way of doing the "whatever it is" first.
>> this is a *way* bigger issue pointing to a much larger flaw than the "new algo hurting small business"
Which, in turn, might explain the silence on the topic from the SE reps.
Edit: This part of the post had advice for each case of redirects. I have posted it in a separate thread as I don't want to hijack this one.
this is happening to me.... just found this thread
I think the initial topic of this thread has been clearly defined and its specifics outlined, which points to an issue that is going to loom large in the near future (IMHO). As a private entity, G has the option of ignoring or addressing such issues as it sees fit. If it does choose to address a particular issue, G may do so at any time of its choosing.
Once the IPO has taken place, I would expect a number of the more savvy investors will keep an eye on WW to track G's "job performance" on a real-time basis, rather than wait for the stock to move up or down (which is a behind-the-curve position). When and if that happens, a failure to address a serious issue might be viewed as a failure of management to recognize and respond to significant issues. Won't G have to improve response time (and the efficacy of those responses) to shore up investor confidence?
That would be a good thing. (Sorry Martha)
|Once the IPO has taken place, I would expect a number of the more savvy investors will keep an eye on WW to track G's "job performance"...... |
Yes, that's an attractive idea. But I think a means will be required to attract greater publicity to problems. Microsoft fix security holes because they are highly publicised - Google is not subject to the same level of bad publicity and so doesn't bother.
Thank you for discovering this issue. Thank you to the rest of you who contributed to this thread also. It took me a while to fully appreciate the nature of the problem and the effects, but this issue could be what has been affecting one of my sites for some time now.
In essence, what we have here is the duplicate content penalty being misapplied, due to copyright infringement and advanced Google spamming techniques. This has probably deep-sixed a number of good sites. It also probably has a lot of webmasters chasing their tail (hmm... over-optimization?.... cross-linking? .... bad neighborhood?).
It is amazing that Google hasn't been able to correct the problem yet. I can't imagine that GG and other Googlers haven't read this thread... I'm sure they have. I can't imagine that they couldn't find an automated way to fix the problem by now... but perhaps it is more complicated than we think.
I'll bet the next major update will include a fix for this.
I found out a few things...
The reason they do it is simply PR transfer, nothing more. It is the ultimate in savvy spam... they don't actually put your content on their site, they put a refresh page/link to your site.... in other words, nothing illegal... simply a loophole/Google exploit to gain PR. For example, I did the search 'link:the-offending-page.com'... which should've told me what sites link to THAT page, but instead it lists all the sites that link to MY site, with no mention of a link to the page I queried.
In my case, no one at Google responded to my 'spam report' so far. Nor my 'search quality' email. My Adsense advisor hasn't responded either. I posted a new thread here on it, but it never passed administrative review.... I call it 'The Big Quiet':)
Who does something like this? One of you. I talked to him today. The owner of an SEO company... a 'professional'. He's a member here, but I guess he missed this 6 page thread... he had no idea what I was talking about... I guess he'll hear more about it in court.
>>I call it 'The Big Quiet'
and for good reason. Unless Google fixes the bug we are all in trouble once word gets out. There are a few ways of doing this, so detection by the webmaster being harmed can be difficult.
You are lucky to have spotted it but you still face a hard road in court.:(
If you have a well placed site for competitive keywords...and if your site places well because of your content and *not* because of paid for p.r. then this is probably already happening to your site. From what I have seen these guys don't care about p.r. they just buy p.r. What they don't have is the time and imagination and creativity to write compelling but well seo'd content... so they take yours by redirecting.
The way to see the tracks is to really look over your server logs at the 404 error messages, because their scrapers *do* misfire. I found one that had the Apache setup wrong, and I figured out how his spider handler script works... running from a residential DSL line. I have been banning by IP address the servers that scrape my site when they misfire and generate a 404. It seems to be helping; I am becoming visible in Google again for this site for less competitive keywords. This explains a whole lot of unexplainable penalty theories from a lot of good folks. Y seems to have a better handle on this now, at least it seems to me. I don't know why G can't get past this one.
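If you want to try the same log check, a small Python sketch like the following would tally 404s per client address. It assumes your server writes Apache common or combined log format; the regular expression and the function name `count_404s_by_ip` are illustrative, so adjust them to whatever your own logs look like.

```python
import re
from collections import Counter

# Matches the start of an Apache common/combined log line, e.g.
# 203.0.113.9 - - [30/Apr/2004:16:38:00 +0000] "GET /x.html HTTP/1.0" 404 209 ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def count_404s_by_ip(lines):
    """Return a Counter mapping client IP to the number of 404 responses it caused."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            hits[match.group("ip")] += 1
    return hits
```

You could pass it an open log file directly (a file object yields lines) and then look at `most_common(20)` on the result. An address with a burst of 404s for pages that never existed is only a hint of a misfiring scraper, not proof, so check what it requested before banning it.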
ok... talked with 2 intellectual property lawyers... it is not a straightforward copyright issue because nothing is being stolen. HOWEVER, they could go down for various unfair competition practices... and I have a very clear-cut case.
Google also responded. And while it wasn't the 'We nailed them to the wall' response I wanted, it sounded a little promising.
< Standard generic reply from Google stating that the message would be passed on to engineers. >
I responded with the following... too bad I can't make the links work for you.... but hopefully you will see what is going on with my case.... if you can use any of this info for your own case, you are free to use it and tell them the dogboy sends his regards after you get him on the ground with your knees in his chest:)
I found out more information, and it turns out things weren't exactly as they
seemed... they are worse. They are using a php script, in conjunction with a
refresh tag, to confuse your system into believing that my content is
located on their domain. I know that sounds far fetched, but I can show
you one search that will show you exactly how they benefit:
If you type the following into a search box:
... you will be taken to the following URL:
....This page is supposed to be a list of all the links that Google believes to be pointing at this dynamically created page..... but instead, if you click on these results, you
will see that [their url] is not ANYWHERE on those pages ...moreover, you will
also soon realize this is really a list of sites linking to ME! Those links which Google now associates with [their url] were really links to me.
Do you see the nature of the problem now? They are getting PR for their main site (which in this case is a bogus engine/PR siphon). That is the confusing part.... they aren't stealing code, and they aren't stealing traffic..... they are stealing my PR, while at the same time blocking me out of the index. The reason I thought they were mirroring me was because IT WAS ME..... and unfortunately, since they did not actually copy the text, they violate no copyright laws.... in fact, they violate no laws... they are simply exploiting a loophole in your system... and therefore, while I now can prove what happened, etc., my lawyers say it is not a copyright issue and therefore I can't petition you under the millennium act, because they are simply linking to me with a refresh.
But for now, between me and you, it's unfair. I talked to them. The owner is a savvy spammer.... the name of the company is [SUPER EXPLICIT SEO NAME]... they are a self-proclaimed professional SEO company. They are heavily into the [related] industry with lots of other lead generating sites on their servers.... and they are covering their tracks with a bogus engine. The sites on that engine all redirect through a redirect counter... they are seeded with 1.) their own listings, 2.) affiliate links, 3.) sites like mine that are established. Their domain was registered earlier this year, was linked into a large powerful network, gained enough PR to start a vacuum... and the more sites like mine that it links to, the more PR is drained into it.... and they know this... by the end of the discussion it was obvious he was backpedalling, saying that sometimes people asked to have their sites removed because their listings rank higher than their site does... I don't know... if it was anyone else, or a real engine, I'd say maybe... but not an SEO on this level... I maintain this was intentional and malicious. The purpose of the parent company is SEO, not search, and he is benefitting by having me held out of the index (not that I have a right, but I think it is your intent to give all the sites a fair shake and let the good ones come to the top... not a redirect) AND he is benefitting by all my incoming link PR, which he should not be entitled to.
My last thought though, is I would love to hear what your bot is seeing when it goes to their page:
I see only:
<META HTTP-EQUIV='Refresh' Content=0;URL='http://[www.MYSITE.com]'>
... and nothing else, no other tags or anything... I think this is where the PHP script is writing something to cloud the issue... maybe even cloaking and showing us that to throw us off... I can tell you one thing though, my Mac browser doesn't like it and that tells me something isn't right with this link... it's not just a simple refresh IMHO... if it were my guess, I'd say they were cloaking and showing visitors the refresh and covering tracks again... your bots think they are finding a page, indexing the content, and associating that content with the URL it thinks it found it on... if it were finding the content on my site, it would index that content and associate it with my URL, right? It could be simple PHP, making something similar to a framed page. Maybe the refresh comes after that? That way the frame would be indexed and then the user would be forwarded on, landing on my URL, but by that time Googlebot stops crawling, dropping the rest of the site and only leaving my index content on the other domain.... I have no idea... but let me say, [SUPER EXPLICIT SEO NAME] sure [...] does. So please tell me what I can do to get my site back in. I'm not doing anything because I don't know what to do.
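One way to test the cloaking guess from the outside is to request the same URL twice with different User-Agent headers and compare what comes back. The Python sketch below does only that. The two User-Agent strings and the function names are assumptions for illustration, and a script that cloaks by IP address rather than by User-Agent would not be caught this way.

```python
import urllib.request

# Illustrative User-Agent strings (assumptions, pick whatever you want to compare).
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def fetch_body(url, user_agent, timeout=10):
    """Fetch a URL with the given User-Agent and return the raw response body.

    urlopen follows HTTP redirects but does not act on a meta refresh,
    so a refresh page served with status 200 comes back as-is.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def differs_by_user_agent(url, fetch=fetch_body):
    """True when the crawler-style and browser-style requests get different bodies."""
    return fetch(url, GOOGLEBOT_UA) != fetch(url, BROWSER_UA)
```

A difference is only a hint, since pages with rotating ads or timestamps differ on every request anyway, so it is worth saving both bodies and reading them side by side.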
[edited by: ciml at 11:05 am (utc) on July 13, 2004]
[edit reason] Generalised email. [/edit]
Great post dogboy and good luck with it.
you should have seen the one BT axed on submission... it wasn't pretty;)
You may not be able to bring action under copyright laws (personally I'd get new lawyers) but anti-hacking laws might be an alternative. It depends entirely how the legislation is phrased but it is a possibility. And that's criminal not civil law, so the threat of jail time is a possibility.
In addition, if you can prove that a position in Google has value, if someone steals that position, then proving criminal theft is not beyond the realms of possibility. Again there is a threat of jail time.
I don't know how these things work in the US, but in the UK, where a legal precedent may be set (in a test case) the Government will sometimes pick up the lawyers' fees.
Furthermore, if you were to notify Google of your intention to initiate such legal proceedings (against the SEO), I think they'd move heaven and earth to fix the problem before any journalists got hold of the story.