
Google SEO News and Discussion Forum

This 713 message thread spans 24 pages (page 2 shown).
302 Redirects continues to be an issue
japanese




msg:748407
 6:23 pm on Feb 27, 2005 (gmt 0)

recent related threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]



It is now 100% certain that any site can destroy low to midrange pagerank sites by causing googlebot to snap up a 302 redirect via scripts such as php, asp and cgi etc, supported by an unseen randomly generated meta refresh page pointing to an unsuspecting site. The encroaching site in many cases actually writes your website's location URL with a 302 redirect inside its server. This is a flagrant violation of copyright and a manipulation of search engine robots, geared to exploit and destroy websites and to artificially inflate the ranking of the offending sites.

Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies that have your website info within their databases feed your page snippets, without your permission, to vast numbers of the skyscraper sites. A carefully adjusted variant of a php based redirection script, which causes a 302 redirect to your site and includes an affiliate click checker, goes to work. What is very sneaky is the randomly generated meta refresh page that can only be detected with a good header interrogation tool.
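For anyone wondering what a header interrogation tool actually looks at, here is a minimal Python sketch. It assumes you already have the raw response text (status line, headers, body) returned by the suspect link; the function name `inspect_response` and the example hosts are hypothetical, and the meta refresh pattern is a rough one that will not catch every variant.

```python
import re

# Rough pattern for <meta http-equiv="refresh" content="N;url=...">
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*(\d+)\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)

def inspect_response(raw: str) -> dict:
    """Split a raw HTTP response and report the redirect signals in it."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status = int(lines[0].split()[1])  # "HTTP/1.1 302 Found" -> 302
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    match = META_REFRESH.search(body)
    return {
        "status": status,
        "location": headers.get("location"),
        "meta_refresh": match.group(2) if match else None,
    }
```

A 302 status with a Location header naming your site, or a 200 page whose only job is a zero second meta refresh to your site, are the two signals described above.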

Googlebot and MSNBOT follow these php scripts to either an internal sub-domain containing the 302 redirect or serverside and “BANG” down goes your site if it has a pagerank below the offending site. Your index page is crippled because googlebot and msnbot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that google does not reveal all links pointing to your site and takes a couple of months to update, and thus an INURL:YOURSITE.COM search will not be of much help in tracing it for a long time. Note that these scripts mostly apply your URL stripped of the WWW, making detection harder. This also causes googlebot to generate another URL listing for your site that can be seen as duplicate content. A 301 redirect resolves at least the short URL problem, so relieving google from deciding which of the two URLs of your site to index higher, more often the one with the higher linked pagerank.
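The 301 fix for the short URL can be sketched in a few lines of Python. This is only an illustration of the decision a server would make, not a drop-in server config; `canonical_redirect` and the host `www.yoursite.example` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_redirect(url: str, canonical_host: str = "www.yoursite.example"):
    """Return (301, target) when the request used the bare host, else None."""
    parts = urlsplit(url)
    # Derive the short (non-www) host from the canonical one.
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if parts.hostname == bare_host and bare_host != canonical_host:
        target = urlunsplit(
            (parts.scheme, canonical_host, parts.path or "/", parts.query, "")
        )
        return 301, target  # permanent: the www URL is the one to index
    return None  # already on the canonical host, serve normally
```

With this in place a request for the short URL is answered with a permanent redirect to the www version, so only one URL for the page should remain to be indexed.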

Your only hope is that your pagerank is higher than the offending site's. This alone is no guarantee because the offending site would have targeted many higher pagerank sites within its system on the off chance that it strips at least one of the targets. This is further applied by hundreds of other hidden 301 permanent redirects to pagerank 7 or above sites, again in the hope of stripping a high pagerank site. This would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big name affiliates are involved in this scam; they know it is going on, and google adwords is probably the main target of revenue. Though I am sure google does not approve of its adsense program being used in such a manner.

Many such offending sites have no e-mail contact and hidden WHOIS and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links at their site because the feeds are by affiliate databases.

There is no point in contacting GOOGLE or MSN because this problem has been around for at least 9 months, only now it is escalating at an alarming rate. All sites of pagerank 5 or below are susceptible; if your site is 3 or 4 then be very alarmed. A skyscraper site only needs to create child page linking to get pagerank 4 or 5 without the need to strip other sites.

Caution: trying to exclude them via robots.txt will not help because these scripts are able to change almost daily.

Trying to remove a link through google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within this timeline.

I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages including sub-domains within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So in essence a programme in perpetual motion creating millions of 302 redirects so long as it stays on. As every page is a unique URL, the script will hopefully continue to create and bombard a site that generates dynamic pages that possess php, asp, cgi redirecting scripts. A SKYSCRAPER site that is fed can have its server totally occupied by a single efficient spider that requests pages in split seconds continually throughout the day and week.

If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via google's URL removal tool. You only need a few seconds of a 404 or a 403 from the offending site for google’s url console to detect what it needs: either the site or the damaging link.

I hope I have been informative and can help anybody that has a hijacked site whose natural revenue has been unfairly affected. Also note that your site may never regain its rank even after the removal of the offending links. Talking to offending site owners often results in their denial that they are causing problems; they say that they are only counting outbound clicks. And they seem reluctant to remove your links....Yeah, pull the other one.

[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]

 

1milehgh80210




msg:748437
 4:32 am on Mar 9, 2005 (gmt 0)

One thing WILL eventually cure this problem -Human Nature.
Spammers and scammers are just as greedy as the next guy, expect more and more of them to jump on the 302 hi-jack bandwagon. (This technique is becoming less and less of a secret every day.) :)
Eventually it will be impossible to ignore.

Ledfish




msg:748438
 5:35 am on Mar 9, 2005 (gmt 0)

I have a site getting hammered by this, of course I've tried to report it to google with a DMCA violation, however that hasn't done a thing.

Also as someone else said, Yahoo and MSN don't seem to have a problem with this.

The site that is hammering me with this is a directory using a CGI script, has a higher page rank, and has been on the net longer than I have, so they win and I lose. Of course my way to attempt to combat this was to create a new site, which I did back in July; however it is sandboxed, so they win, I lose.

So the end result as far as Google is concerned is they win, I lose, and Google's solution is buy adwords. Again they win and I lose.

MarkHutch




msg:748439
 5:44 am on Mar 9, 2005 (gmt 0)

Ledfish: Try Zoloft. It works wonders on problems like these! :)

Although many say Jack Daniels will do the same job for a lot less.

figment88




msg:748440
 5:58 am on Mar 9, 2005 (gmt 0)

I think the negative publicity generated by webmasters picketing the googleplex is the way to go. We just need to come up with some catchy signs:

Free the 302s

Hijack This!

Amber alert ... missing website

Emmett




msg:748441
 6:07 am on Mar 9, 2005 (gmt 0)

I just noticed something. I was doing a link:site search on msn and it came up with the php redirect to the legit site showing as what it is... A *link* to the site.

Guess what... Do the same thing on yahoo and it comes up as a *link* to the site.

Seems like a simple solution to me. Just treat it as a link but don't pass page rank. When something isn't being used for the purpose it was created for you have to adjust.

Edited: Bad Grammar.

Reid




msg:748442
 6:43 am on Mar 9, 2005 (gmt 0)

from what i understand, google handles 302 this way because of doorway pages.

If index is a doorway then pagerank goes to index from the content it points to. At least that's how it's supposed to work.

This was found out and exploited by hijackers in 2001 and is now called 'googlejacking'
Google thought they could take the 'wait and see' approach and the 'spammers will cause their own downfall' approach.
I seriously thought the Allegra update would deal with this but apparently not.
The reason google handles 302's this way is to deal with the way the web was built (doorway pages) when they came on the scene.
Today the web is being built around google's policies. Maybe it's time to penalize doorway pages.
Sure you will be penalizing a lot of legitimate web pages but you can bet that if google did this, doorway pages would be a thing of the past right quick. At least webmasters would have control of their own destiny.
After all doorway pages used to handle by HTML what scripts could easily do today.
also for pages that are deprecated they could come up with a self-deprecating tag rather than a redirect that steals the fire from the previous location.
Hijackers could never exploit a self-deprecating statement.

soquinn




msg:748443
 6:56 am on Mar 9, 2005 (gmt 0)

Japanese, this is my first time reading about this and must say it is very interesting stuff… is there a way to check if a site has been affected by a 302 hijacking/redirect? We see some directories or pay-per-click engines with links that are like:

something.com/jump.php?path=yoursite.com%2F

are all redirects potentially harmful? We've run many ads where we seem to find those kinds of links as referrals. Also, I thought engines don’t value redirects in their link popularity?
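One way to start checking, sketched in Python: scan the links you find in referrals or in an inurl: search and flag any that sit on someone else's host and carry your domain in the query string. The function name `redirect_target` is hypothetical, and this only spots links shaped like redirect scripts; whether a given one really answers with a 302 can only be told by requesting it and reading the status code.

```python
from urllib.parse import urlsplit, parse_qsl

def redirect_target(link: str, my_domain: str):
    """Return the query part naming my_domain if link looks like a redirect script."""
    parts = urlsplit(link if "://" in link else "http://" + link)
    link_host = (parts.hostname or "").lower()
    if link_host == my_domain or link_host.endswith("." + my_domain):
        return None  # a link on my own site is not a third-party redirect
    # parse_qsl decodes %2F etc.; keep_blank_values also keeps "go.php?http://site" forms
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        for candidate in (value, name):
            if my_domain in candidate.lower():
                return candidate
    return None
```

For the link quoted above, `redirect_target("something.com/jump.php?path=yoursite.com%2F", "yoursite.com")` hands back the decoded target, while an ordinary page URL on the same host gives `None`.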

stargeek




msg:748444
 7:22 am on Mar 9, 2005 (gmt 0)

I think the negative publicity generated by webmasters picketing the googleplex is the way to go. We just need to come up with some catchy signs:

Free the 302s

Hijack This!

Amber alert ... missing website

Picketing is a good start, but that could be ignored, what wouldn't be ignored is 4 or 5 people getting arrested for having a sit in at the googleplex, or at a google headquarter elsewhere.

I for one am willing to spend a few hours locked up for this (as an activist i've been jailed longer for less).

is anyone else willing to start organizing something?

Emmett




msg:748445
 7:38 am on Mar 9, 2005 (gmt 0)

If index is a doorway then pagerank goes to index from the content it points to. At least thats how it's supposed to work.

So the page itself goes to 0 pagerank?

That would explain why one of my sites disappeared when someone redirected to my home page.

Thanks for introducing me to the term "googlejacking". I'm finding lots more info on it now.

If this continues, google search will be nothing more than directory sites and scrapers. There must be some cutoff in the algorithm right now though. Otherwise you surely wouldn't see SERPS listing major sites like yahoo. Maybe they say, if you have more than X PageRank we won't allocate your PageRank/Listing to the referring url.

So, is the solution to get half a million backlinks?

Reid




msg:748446
 8:37 am on Mar 9, 2005 (gmt 0)

actually the problem has grown considerably in recent years. Because of 'shortened url's'
Many websites use 302 redirects to track outgoing clicks, this results in them ranking for the content of the page they are linking to.
Anyone who links to you with a 302 redirect (default scenario) is googlejacking you and stealing your pagerank, I suppose this is why google has put little emphasis on pagerank recently, but they seem to be also taking YOUR position in the SERP as well, here is where the real problem lies.
We have 2 choices here.

1. pray to the google gods and wait for an answer before this happens to us.
(according to japanese guy the scumbags have already automated this exploitation so it's only a matter of time before we all fall victim)

2. automatically check referrals and deny anything coming from a 302.
(like shooting yourself in the foot since this is quite common)

claus




msg:748447
 10:45 am on Mar 9, 2005 (gmt 0)

Just a quick comment for now:

>> If index is a doorway then pagerank goes to index from the content it points to

Reid, i believe you're thinking about the right thing but your wording is unfortunately wrong. It's not "doorway pages" and it's not "PR" - rather, it's "intro pages" (like flash intros, logos and such) and "content". Google would take the URL from the "flash intro" and append the content of the page it redirected to (after the intro) to this URL.

larryhatch




msg:748448
 11:04 am on Mar 9, 2005 (gmt 0)

Reid:

" 2. automatically check referrals and deny anything coming from a 302. "

A very interesting thought. What would the .htaccess code be for this?

What are the upsides and downsides to such a thing. assuming it works?

The vast majority of links to my site are the honest <a href= types.
So, I'd lose a little traffic, but most of that would be via the scumbags.

Would there be negative impact on my page rank, # of incoming links,
or SERPs positioning? - Larry

internet ventures




msg:748449
 11:39 am on Mar 9, 2005 (gmt 0)

This is indeed a massive problem but I don't see how webmasters of sites that use redirects are to blame.

The majority don't use redirects to hijack someone's page for PR and content they use them to track outbound link clicks.

It is Google that has the problem not some innocent webmasters that have been using redirects for years without problems.

larryhatch




msg:748450
 11:49 am on Mar 9, 2005 (gmt 0)

That's fine.

Now how do I ban all those innocent outbound-click counting
webmasters without harming my own site? -Larry

walkman




msg:748451
 12:35 pm on Mar 9, 2005 (gmt 0)

jk3210,
it doesn't matter if Google is doing technically right or not. They have to adapt and see how 302 are being used, and how pages are being affected. This is a nightmare. Notice how GG hasn't made a comment on this since December (on another forum).

Leosghost




msg:748452
 12:44 pm on Mar 9, 2005 (gmt 0)

Whom do you think has the greater authority inside Google ...GG or the "suits" that run adsense and adwords?
..especially since the IPO ..

mitsu




msg:748453
 12:50 pm on Mar 9, 2005 (gmt 0)

goodness me
id consider myself fairly experienced webmaster but all this is very confusing, something the spammers want to stay that way...

appreciate the detail and research into your post japanese but im still at a loss to explain it to someone else...and certainly how to make sure it doesnt happen to my site

if this problem can be summed up so an average internet user/webmaster can understand then maybe that is the first step to awareness, step 2 - outrage!

(without being egotistical) if i find this confusing then most people wouldnt know there was a problem in the first place

this seems to be part of the problem, spammers etc operate at a level that only a minority of ethical webmasters/seos really understand what they are doing, they then try to explain to the slow learners (me) how the rug is being pulled from under us, meanwhile the spammers get richer and the public keep clicking from SERPS in a state of uninformed bliss.

google is obviously aware of pretty much everything that is wrong with their product but how to fix it?
I for one wouldnt want to work there these days. Seems like a lot of leaks in the google oceanliner right now, most of them below deck but some like this "leak" are well below deck and more serious.

Abandon ship?
MSN has just left port, looking pretty much unsinkable in the long run...
of course thats what they said about the titanic

hmm, maybe i should just go back to landscaping...

[edited by: engine at 4:00 pm (utc) on Mar. 10, 2005]

walkman




msg:748454
 1:04 pm on Mar 9, 2005 (gmt 0)

"Whom do you think has the greater authority inside Google ...GG or the "suits" that run adsense and adwords? ..especially since the IPO .. "

who brings home the bacon? If the two issues are substantially connected (302 & revenue), then we're really in trouble.

Leosghost




msg:748455
 1:08 pm on Mar 9, 2005 (gmt 0)

If the two issues are substantially connected (302 & revenue), then we're really in trouble.

Another one walks into the light :)

geekay




msg:748456
 1:40 pm on Mar 9, 2005 (gmt 0)

I'm with Larry. I would consider simply banning all 302 referrers. I'm being kept too busy closing down "innocent" redirect linkings to my sites. New ones seem to come up every now and then. I would like to be able to stop worrying about 302's.

By the way, it looks like allowing my page to be framed by an other site might be a lesser evil than using frame breaking scripts. If one fakes the Googlebot as user agent when requesting a page framed in that way, a 200 response is returned -- but the page content is only "frame enabled browser required" or a similar text.

If a browser makes the same request my full page is returned within that frame. But maybe handling such framed pages (with or without breaking scripts) doesn't cause Google the same indexing problems as the 302's do.
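For those who want to repeat this kind of check, a small Python sketch of building a request that announces itself as Googlebot. The helper name `googlebot_request` and the example URL are hypothetical; the user agent string is the one Googlebot is assumed to send, so adjust it if your logs show something different.

```python
import urllib.request

# Assumed Googlebot user agent string; compare against your own server logs.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"

def googlebot_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the Googlebot user agent."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
```

Passing the result to `urllib.request.urlopen` and then repeating the fetch with a normal browser user agent lets you compare the two responses side by side.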

Import Export




msg:748457
 2:37 pm on Mar 9, 2005 (gmt 0)


I guarantee you can get press over this and put the heat on the major engine to either comment or fix it. But you're probably not going to get press by writing free press releases based on your little sites affected by a 302, and you're probably not going to get it by adding little messages to your site.

Sure there have been attempts at getting press on this, and 2 people did an "ok" job, but no one has been successful...

In your head, name off the sites you have that are victim to these issues. Ok, name off the name of the sites you know that fellow webmasters have that are victim... Now, why do you think nothing has been fixed or commented on?

If "these hijackers" unknowingly picked a couple news authorities (esp. IT news) and setup the same 302 redirects to them, there is no way in 302 hijacking hell that these powerhouses won't unleash the onslaught of bad press you have been unsuccessful in generating. -And you can believe when they do it, it will be heard in the distance.

Frequent




msg:748458
 3:04 pm on Mar 9, 2005 (gmt 0)

Very interesting and informative thread.

I can't believe I read the whole thing...I'm stuffed.

And I absolutely agree that until this happens to one of the "big boys" (a big boy news source would be even better) it will not be addressed.

Unfortunately, if it did happen to some huge corporation Google would simply "fix it" for them rather than fix it for everyone. Just my opinion of course.

FREQ---

diddlydazz




msg:748459
 3:23 pm on Mar 9, 2005 (gmt 0)

wow - this is the thread i've been waiting for ;o)

claus - thanks for the links (was trying to find those)

some great points been made in this thread (and others)

this point made by Emmett

(I seem to remember some press last year about doubling the total pages indexed or something like that) so now that they're indexing the generated php scripts etc it's ballooned the problem

brought to my mind the possible (and maybe obvious) connection between allegra (ie, millions more sites processed into the real index) and the 302 problem, although in Feb it was allegra that made me sit up and pay attention!

(302s weren't on my "to look at" list)

Over the next week i have set aside some time to play with the data centers some more and try and find out if things are still being juggled/incremented (initially it does), if they are then maybe G are trying to deal with this problem as we speak.

maybe they were too busy having holidays and didnt see the potential problem either ;o)

great thread!

dazz

<edit>spelling, etc</edit>

claus




msg:748460
 3:30 pm on Mar 9, 2005 (gmt 0)

The full story of Google and 302s
Fine print: I may want to republish this on my own site later on (usually when i say this i don't even bother), but otherwise it's one of those "you saw it on WebmasterWorld first" posts, so it's not intended for republishing all across the web. Yes, it means: Please don't republish if you didn't write it, which you didn't.

:)
...just clearing up a few misunderstandings first, then you'll get the full lowdown on this stuff.

You can't ban 302 referrers as such

Why? Because your server will never know that a 302 is used for reaching it. This information is never passed to your server, so you can't instruct your server to react to it.

You can't ban a "go.php?someurl" redirect script

Why? Because your server will never know that a "go.php?someurl" redirect script is used for reaching it. This information is never passed to your server, so you can't instruct your server to react to it.

Even if you could, it would have no effect with Google

Why? Because Googlebot does not carry a referrer with it when it spiders, so you don't know where it's been before it visited you. As already mentioned, Googlebot could have seen a link to your page a lot of places, so it can't "just pick one". Visits by Googlebot have no referrers, so you can't tell Googlebot that one link that points to your site is good while another is bad.

You CAN ban clickthrough from the page holding the 302 script - but it's no good

Yes you can - but this will only hit legitimate traffic, meaning that surfers clicking from the redirect URL will not be able to view your page. It also means that you will have to maintain an ever-increasing list of individual pages linking to your site.

For Googlebot (and any other SE spider) those links will still work, as they pass on no referrer.
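To make the limitation concrete, here is a minimal Python sketch of such a referrer ban. The names `allow_request` and `BLOCKED_REFERRER_HOSTS` and the host `directory.example` are hypothetical; a real setup would do the same test in the server config.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, hand-maintained list of hosts whose pages hold the redirect links.
BLOCKED_REFERRER_HOSTS = {"directory.example"}

def allow_request(referer: Optional[str]) -> bool:
    """Return False only when the Referer header names a blocked host."""
    if not referer:
        # Spiders send no referrer, so every spider request passes this test.
        return True
    host = (urlsplit(referer).hostname or "").lower()
    return host not in BLOCKED_REFERRER_HOSTS
```

A surfer clicking through from the listed page is turned away, while a request with no referrer at all, which is what a spider sends, is always let in. That is exactly why the ban only hits legitimate traffic.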


This is what really happens when Gbot meets 302:

Here's the full lowdown. First time i post it all. It's extremely simplified to benefit the non-tech readers among us, and hence not 100% accurate in the finer details, but even though i really have tried to keep it simple you may want to read it twice:

  1. Googlebot visits a page holding eg. a redirect script
  2. Googlebot indexes the content and makes a note of the links
  3. Links are sent to a database for storage until another Googlebot is ready to spider them. At this point the connection breaks between your site and the site with the redirect script, so you (as webmaster) can do nothing about the following:
  4. Some other Googlebot tries one of these links
  5. It receives a "302 Found" status code and goes "yummy, here's a nice new page for me"
  6. It then receives a "Location: www.your-domain.tld" header and hurries to that address to get the content for the new page.
  7. It deliberately chooses to keep the redirect URL, as the redirect script has just told it that the new location (That is: your URL) is just a temporary location for the content. That's what 302 means: Temporary location for content [w3.org].
  8. It heads straight to your page without telling your server on what page it found the link it used to get there (as, obviously, it doesn't know - another Googlebot fetched it)
  9. It has the URL (which is the link it was given, not the page that link was on), so now it indexes your content as belonging to that URL.
  10. Bingo, a brand new page is created (nevermind that it does not exist IRL, to Googlebot it does)
  11. PR for the new page is assigned later in the process. My best bet: This is an initial calculation that is done something like: PR for the page holding the link less one.
  12. Some other Googlebot finds your page at your right URL and indexes it.
  13. When both pages arrive at the reception of the "index" they are spotted by the "duplicate filter" as it is discovered that they are identical.
  14. The "duplicate filter" doesn't know that one of these pages is not a page but just a link. It has two URLs and identical content, so this is a piece of cake: Let the best page win. The other disappears.
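The steps above can be played through in a toy Python model. This is a sketch under loud assumptions: the function `index_with_302` is hypothetical, content is compared as plain strings, and "the best page" is decided by a made-up pagerank number, which is only a guess at how the duplicate filter picks its winner.

```python
def index_with_302(pages, redirects, pagerank):
    """pages: url -> content; redirects: url -> 302 target; pagerank: url -> score."""
    index = {}
    for url in list(pages) + list(redirects):
        if url in redirects:
            # Steps 5-9: keep the redirect URL, file the target's content under it.
            index[url] = pages[redirects[url]]
        else:
            # Step 12: the real URL is indexed with its own content.
            index[url] = pages[url]
    # Steps 13-14: the duplicate filter keeps one URL per identical content.
    winners = {}
    for url, content in index.items():
        best = winners.get(content)
        # ASSUMPTION: higher pagerank wins; ties keep the URL seen first.
        if best is None or pagerank.get(url, 0) > pagerank.get(best, 0):
            winners[content] = url
    return {url: content for content, url in winners.items()}
```

Feed it one real page and one redirect URL pointing at it: if the redirect URL has the higher score, the real URL drops out of the result, and if the real page scores higher it survives.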

So, essentially, by doing the right thing (interpret a 302 as per the RFC [w3.org]) Google allows another webmaster to convince its bot that your website is nothing but a temporary holding place for content.

Further, this leads to creation of pages in the index that are not real pages. And, you can do nothing about it.

[edited by: claus at 3:45 pm (utc) on Mar. 9, 2005]

diddlydazz




msg:748461
 3:37 pm on Mar 9, 2005 (gmt 0)

It receives a "302 Found" status code and goes "yummy, here's a nice new page for me"

lol

nice post claus

idoc




msg:748462
 3:37 pm on Mar 9, 2005 (gmt 0)

Thanks Claus,

You are always spot on. I thought about addressing a portion of that but didn't have the energy to get into it this morning. ;)

japanese




msg:748463
 3:44 pm on Mar 9, 2005 (gmt 0)

soquinn,

One very despicable act of greed and malice against another competitor, webmaster or website is indeed the use of a variant NukeModule based php script. Others like a tuned up CGI, ASP etc can also have a disastrous effect on googlebot. In some cases intentional.

The core of the script was never written to comply with or to be robots friendly. No communication existed between the creators of the script and googlebot engineers; they certainly were not guided by google's engineers that are in control of googlebot's algorithm, so that the script may work in harmony with googlebot's algorithm. Think about it.

The script's blackhat gurus quickly became aware that, using googlebot's ability to cache pages in its crawl, they can get googlebot to cache phantom pages also. It is very easy, just think of the bot as a photo camera.

Here is the process, nice and simple. A 6 Day history.
------------------------------------------
Day 1,
Googles Answer to the Polaroid Instant Camera.
Without a serverside redirect directive, googlebot follows a whitehat pure html link and takes its regular snapshot of the target page that resides at the single URL it was presented with in the pure html whitehat link. In many cases the apache server would also give googlebot a status code depending on whether any update or file size change has occurred since its last visit, though this process may not be exactly what it should be. Don’t forget that googlebot also does not carry any referrer information when it crawls; it could, it simply does not, and a brilliant suggestion about this has been pointed out by “stargeek” on another thread.

DAY 2,
Google, ingratiatingly, presents and declares to the press that its bot can now follow PHP.

Drinks, Champagne, and Bourbon pour out of expensive bottles into the crystal glasses of fat cigar smoking executives in googleplex. The sonorous rapturous applause could be picked up by overhead passing sonic satellites.

Day 3,
The Blackhat Webmasters answer to The Polaroid Instant Camera.
He puts on his page H**P://WWW.BLACKHAT.COM/GO-PHP?=WHITEHAT.COM%I%AM%GONNA%GETCHYA%2F The anchor as Whitehat (Note the absent www)

The whitehat puts on his page H**P://WWW.BLACKHAT.COM/ The anchor is Blackhat. They have negotiated to reciprocate links.

The conniving blackhat webmaster exchanges links with a whitehat webmaster. The blackhat webmaster has 200 links in his links page and the whitehat webmaster invariably follows googles guidelines. More value is passed to the blackhat. The machiavellian blackhat webmaster easily notices the greenness of the whitehat webmaster and in contrast to the whitehats pure html link he is getting, the blackhat places a directive script instead to point to his serverside and not to the URL of the whitehat.

And here is the revolting metamorphosis of the maggot link the blackhat webmaster exchanged for a nice value link. Bear in mind, the anchor link is in html and it looks as if it contains the keyword required by the whitehat. The blackhat's link is actually pointing to a file called a redirector, a very efficient and totally stoic script that has no mercy or compassion; its function is to create a result and a venomous and residual effect against the target page. It is totally harmless against a browser with a human clicking it; it is a time bomb waiting to explode in the face of the whitehat webmaster, and the trip wire is waiting for the ill prepared googlebot and the new boy in town, the clumsy msnbot.

Day 4,
Googlebot detects the pure html link on the whitehat's page and rapidly goes to verify the existence of the domain the link points to, the blackhat webmaster's. Googlebot knocks on the door of the blackhat's apache server with a GET. The blackhat's server opens the door with an appropriate status code, then tells googlebot GO FETCH, “good dog” and don’t forget to follow the links. Googlebot now sees the new maggot link on the page of the blackhat. The link tells googlebot to go to the go-php redirector, and the redirector tells googlebot (the virtual Polaroid camera) that the link has a temporary location and it should take its snapshot at h**p://whitehat.com/. (“Hang on a minute”, the “actual” link is the disgusting php directive H**P://WWW.BLACKHAT.COM/GO-PHP?=WHITEHAT.COM%I%AM%GONNA%GETCHYA%2F and it has a temporary location and it is not in my index, and the location is H**P://WHITEHAT.COM/ , I must have missed it before?) Google’s gullible bot falls for it hook, line and sinker. HOW CAN THE STUPID LINK BE A PAGE. Internet etiquette tells us that a link points to an existing page; here the link has no existing page. The bot was not given the proper URL of the whitehat, it was given the above longer URL with the “I am gonna getchya” syntax via the serverside directive that the bot was designed to obey. Google's gullible bot proceeds to GET the URL without the www, and we all know that that is treated as a completely independent offshoot domain of its legitimate www version. During this process another dastardly trick is played against the unsuspecting whitehat's website. A dynamically generated “zero second” meta refresh page is created that points to the whitehat's index page to complement the obnoxious and unpredictable effects of the 302 temporary redirect directive. Google themselves are not in total control of how their bot behaves when confronted with this unbelievably complex procedure.

The stinking trick against the webmaster here is the residue of the process, a meta refresh. This will reside somewhere in the blackhat's system and a path is created to keep it alive and kicking for the bots. It is unclear how googlebot behaves in this kind of environment of a double instruction to the whitehat's vulnerable index page. But something does happen and it is this unpredictable event that is being exploited. Statistically googlebot will generate a small proportion of new URLs that look like the blackhat's directive in its monthly updates, and it could be a duplicate of your index page.

Googlebot now has enough information to generate a new page with the compliments of the offending php directive, by placing its snapshot of the whitehat's index page as a new entity in its results, or to demote the whitehat's index page as being duplicate content of the LOCATION INSTRUCTION in the php directive of the blackhat's server, H**P://WWW.BLACKHAT.COM/GO-PHP?=WHITEHAT.COM%I%AM%GONNA%GETCHYA%2F. And don't forget that googlebot was told that this PHP link's location is h**p://whitehat.com/ without its www, but we all also know that it will show your index page and not resolve to the legitimate www version unless you 301 the shortened url to resolve to the www version.

This also explains why over the past year hundreds of thousands of shortened URL’s appeared and confused webmasters.

I may have made errors in my explanation of the above, it is too long and too complicated to explain in one post so please accept the general process, make corrections and assumptions yourself. I can assure anybody that reads this, you do not want a link like the above pointing to your site.

Day 5,
The armageddon update.
h**p://www.whitehat.com/ loses its number 1 ranking position and drifts into total oblivion in google's results.

Day 6,
Whitehat webmaster found dead. Committed suicide.

idonen




msg:748464
 3:48 pm on Mar 9, 2005 (gmt 0)

So it seems to me that the easy solution would be for google to look at the Location header in 302 pages, and if it points to a different domain, just don't store that page in their db.
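
A rough sketch of that check in Python, just to make the idea concrete. This is only a guess at what such a filter could look like, not how Google actually does it; the function names are made up, and treating "www.example.com" and "example.com" as the same site is my own simplification.

```python
from urllib.parse import urljoin, urlsplit


def _site(host: str) -> str:
    """Lowercase a hostname and drop a leading 'www.' (a crude 'same site' key)."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def should_store_under_source(source_url: str, status: int, location: str) -> bool:
    """Decide whether a fetched page may be stored under the redirecting URL."""
    if status != 302:
        return True  # this sketch only covers the 302 case
    target = urljoin(source_url, location)  # the Location value may be relative
    src_host = urlsplit(source_url).hostname or ""
    dst_host = urlsplit(target).hostname or ""
    # Cross-domain 302: do not file the target's content under the source URL
    return _site(src_host) == _site(dst_host)
```

With this, a 302 from a go.php script on one domain to an index page on another would not be stored under the script's URL, while a 302 within one site would be handled as before.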

And... it's not all evil or malicious webmasters doing this. I'd say most web developers don't know about the problem, compounded by the ease of doing a Response.Redirect in asp and other server-side systems (as well as the promotion of that functionality in just about all the books). I unwittingly did 302 redirects (for counting the outgoing clicks) for years, up until about a month ago, because I had no idea what was going on. I've since changed it all to direct links, and in the future when I want to count clicks I'll do 301 redirects, which is not what they're supposed to be used for, but at least shouldn't cause any pain.
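
Here is roughly what I mean by counting clicks with a 301, as a small Python sketch rather than ASP. The link table, the ids and the function name are all invented for the example. Looking targets up by id (instead of taking a URL from the query string) is my own choice, so the script can't be pointed at arbitrary sites.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical table of outgoing links, keyed by a short id
LINKS: Dict[str, str] = {"42": "http://www.example.com/"}

click_counts: Counter = Counter()


def outbound_click(link_id: str) -> Tuple[int, Dict[str, str]]:
    """Record one outgoing click and return (status, headers) for the response."""
    target = LINKS.get(link_id)
    if target is None:
        return 404, {}  # unknown id: nothing to count, nothing to redirect to
    click_counts[link_id] += 1
    # Send a 301 explicitly rather than relying on a helper's default 302
    return 301, {"Location": target}
```

One caveat I'm aware of: browsers may cache a 301, so repeat clicks from the same visitor might never reach the counter.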

walkman




msg:748465
 4:30 pm on Mar 9, 2005 (gmt 0)

cont. from [webmasterworld.com...]

Hi GoogleGuy,
I have personally sent several e-mails for my sites and other sites. The problem is not (IMO) one site, two sites or a million sites because 99% of them don't do it to hurt us, they do it to shorten the URL and track clicks. The problem is how 302's are handled by Google.

Although it may be technically correct, it's not working and I suspect many sites are hurt. All Christmas season I ranked high with &filter=0. If only I could convince the tens of millions searching Google to use that.

For example: a link like this: http++++directory*com/go.php?http+++mysite.com on Google has my index page cached. Apparently, Google thinks that directory*com has a page (with http++++directory*com/go.php?http+++mysite.com as its URL) which is IDENTICAL to my index page. If this isn't a "dupe", I don't know what is.

I searched Google for "302 redirect yahoo" and found how Y! is handling it. It makes sense because it deals with the 302 itself, not with the names of scripts etc.

Now I don't know if G ignores the supplemental pages or not, or if it penalizes both sites, but if only one is ignored, this only helps the larger, more established sites. Small sites with very few links or little PR can easily be outranked by these types of links.

GoogleGuy, it's the right thing to do morally, and from a business perspective makes sense since it improves your SERPS, even if it's just 0.26% better ;).

japanese




msg:748466
 4:53 pm on Mar 9, 2005 (gmt 0)

Idonen,

Google neglected to acknowledge the loophole. They thought their prodigious googlebot was indomitable and unconquerable. An omniscient virtual entity that devoured content and links. Its creators awestruck, spellbound and captivated by their superlative achievements in creating the modern virtual Frankenstein.

Well we all now know that the cavernous appetite of googlebot has unlimited capacity and it is going out of control. It has become a monster entity on the internet represented by a multitude of googlebots like army ants on the war path. Stripping everything in its path and laying barren many websites in its wake.

claus




msg:748467
 4:55 pm on Mar 9, 2005 (gmt 0)

>> If "these hijackers" unknowingly picked a couple news authorities (esp. IT news)

Actually i'll admit to doing exactly that. But, of course i didn't have the PR to turn over CNN and sites like that - never managed to highjack a single one of them. Wasn't my intention either.

I do use 302s myself, although i'm a bit more careful than most. I've got them all robots.txt-ed and haven't created new ones for more than a year because of this stuff. As i update very frequently that really makes running one particular website a whole lot more time consuming, and it also limits what features i can offer my users. Finally, it's not very good for the SE spiders because they see a lot of links that they just can't follow due to "robots.txt".
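
For anyone who wants to double-check that their redirect script really is off limits to spiders, a small Python sketch using the standard library's robots.txt parser. The /go.php path and the example.com URLs are placeholders, not anyone's actual setup.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt that keeps all spiders away from the redirect script
ROBOTS_TXT = """\
User-agent: *
Disallow: /go.php
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

Calling is_blocked(ROBOTS_TXT, "Googlebot", "http://www.example.com/go.php?id=42") should come back True, while the same call for an ordinary page on the site should come back False.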

The 302 is really just the most common way to do stuff like this, eg. when you run a site that (a) tracks which links are the most popular, (b) doesn't want its links scraped, (c) has links stored in a database to check for 404s, or (d) whatever. I'm totally convinced that a large portion of highjackers don't even know that they are highjackers. And i'm just as convinced that others do it deliberately.

This is just as large a problem for webmasters wishing to use redirects legitimately on their site [webmasterworld.com] as it is for other webmasters that get highjacked. There are a lot of valid reasons to use a redirect script, it doesn't have to be highjacking related at all.

So, seen from both sides, the way Google treats these 302's creates a lot of problems, and it simply sucks (being RFC compliant or not). These (and meta's etc.) should simply be treated as a plain link, nothing else.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved