302 Redirects continues to be an issue


japanese - 3:44 pm on Mar 9, 2005 (gmt 0)


soquinn,

One very despicable act of greed and malice against a competitor, webmaster or website is indeed the use of a variant of a NukeModule-based PHP script. Others, like a tuned-up CGI or ASP script, can also have a disastrous effect on googlebot, in some cases intentionally.

The core of the script was never written to comply with robots or to be robots-friendly. No communication existed between the creators of the script and google's engineers; they certainly were not guided by the engineers who control googlebot's algorithm so that the script might work in harmony with it. Think about it.

The script's blackhat gurus quickly became aware that, by using googlebot's ability to cache pages during its crawl, they could get googlebot to cache phantom pages as well. It is very easy; just think of the bot as a photo camera.

Here is the process, nice and simple. A 6 Day history.
------------------------------------------
Day 1,
Google's Answer to the Polaroid Instant Camera.
Without a serverside redirect directive, googlebot follows a whitehat pure HTML link and takes its regular snapshot of the target page that resides at the single URL it was presented with in that link. In many cases the Apache server also gives googlebot a status code that depends on whether any update or file size change has occurred since its last visit, though this process may not be exactly what it should be. Don't forget that googlebot also does not carry any referrer information when it crawls; it can, but it simply does not, and a brilliant suggestion on this has been pointed out by “stargeek” on another thread.
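To make the "status code depending on any update" part concrete, here is a minimal Python sketch of a conditional GET decision: 304 Not Modified if the page has not changed since the date the crawler sends in If-Modified-Since, otherwise 200. The function name and the example dates are made up for illustration, and a real Apache server also looks at things like ETags, so treat this as a rough model only.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the page is unchanged since the crawler's last visit, else 200.

    last_modified is a timezone-aware datetime for the page (hypothetical input);
    if_modified_since is the raw HTTP date header the crawler sent, or None.
    """
    if if_modified_since is None:
        return 200  # first visit: no validator, send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat zone-less dates as UTC
    # HTTP dates only have one-second resolution
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


page_changed = datetime(2005, 3, 9, 15, 44, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)
print(conditional_status(page_changed, header))
```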

Day 2,
Google, ingratiatingly, presents and declares to the press that its bot can now follow PHP.

Drinks, Champagne and Bourbon pour out of expensive bottles into the crystal glasses of fat, cigar-smoking executives in the googleplex. The sonorous, rapturous applause could be picked up by sonic satellites passing overhead.

Day 3,
The Blackhat Webmaster's answer to The Polaroid Instant Camera.
He puts on his page H**P://WWW.BLACKHAT.COM/GO-PHP?=WHITEHAT.COM%I%AM%GONNA%GETCHYA%2F with the anchor text Whitehat. (Note the absent www.)

The whitehat puts on his page H**P://WWW.BLACKHAT.COM/ with the anchor text Blackhat. They have negotiated to reciprocate links.

The conniving blackhat webmaster exchanges links with a whitehat webmaster. The blackhat webmaster has 200 links on his links page, while the whitehat webmaster invariably follows google's guidelines, so more value is passed to the blackhat. The machiavellian blackhat webmaster easily notices the greenness of the whitehat webmaster and, in contrast to the pure HTML link he is getting from the whitehat, places a link to a directive script that points to his own server and not to the URL of the whitehat.

And here is the revolting metamorphosis of the maggot link that the blackhat webmaster exchanged for a nice value link. Bear in mind that the anchor is in HTML and looks as if it contains the keyword required by the whitehat. The blackhat's link actually points to a file called a redirector, a very efficient and totally stoic script that has no mercy or compassion; its function is to create a result with a venomous, residual effect against the target page. It is totally powerless against a browser with a human clicking it, since the visitor simply lands on the whitehat's page. It is a time bomb waiting to explode in the face of the whitehat webmaster, and the trip wire is waiting for the ill-prepared googlebot and the new boy in town, the clumsy msnbot.
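For anyone who has never looked at what a redirector actually sends back, here is a rough Python model of the response. The `url` query parameter and the example.com-style hostnames are my own assumptions (the URL above is deliberately mangled), and a real script would be PHP running on the server; this only models the status code and Location header that a browser or bot would receive.

```python
from urllib.parse import parse_qs, urlsplit


def redirector_response(request_url):
    """Model the reply of a hypothetical go.php?url=... redirector.

    Returns (status, headers). The target is taken from the 'url' query
    parameter, which is an assumed name, not the one used in the thread.
    """
    query = parse_qs(urlsplit(request_url).query)
    target = query.get("url", [None])[0]
    if target is None:
        return 400, {}  # no target given: nothing to redirect to
    # 302 tells the client the resource is only temporarily elsewhere
    return 302, {"Location": target}


status, headers = redirector_response(
    "http://www.blackhat.example/go.php?url=http%3A%2F%2Fwhitehat.example%2F"
)
print(status, headers["Location"])
```

A browser follows the Location header and the human sees the whitehat's page as normal; the interesting question is which URL a bot files the fetched page under.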

Day 4,
Googlebot detects the pure HTML link on the whitehat's page and rapidly goes to verify the existence of the domain the link points to, the blackhat webmaster's. Googlebot knocks on the door of the blackhat's Apache server with a GET. The blackhat's server opens the door with an appropriate status code, then tells googlebot GO FETCH, “good dog”, and don't forget to follow the links.

Googlebot now sees the new maggot link on the blackhat's page. The link tells googlebot to go to the go-php redirector, and the redirector tells googlebot (the virtual Polaroid camera) that the link has a temporary location and that it should take its snapshot at h**p://whitehat.com/. (“Hang on a minute”, the “actual” link is the disgusting php directive H**P://WWW.BLACKHAT.COM/GO-PHP?=WHITEHAT.COM%I%AM%GONNA%GETCHYA%2F, it has a temporary location, it is not in my index, and the location is H**P://WHITEHAT.COM/. I must have missed it before?) Google's gullible bot falls for it hook, line and sinker.

HOW CAN THE STUPID LINK BE A PAGE? Internet etiquette tells us that a link points to an existing page; here the link has no existing page. The bot was not given the proper URL of the whitehat; it was given the longer URL above, with the “I am gonna getchya” syntax, via the serverside directive that the bot was designed to obey. Google's gullible bot proceeds to GET the URL without the www, and we all know that this is treated as a completely independent offshoot of the legitimate www version.

During this process another dastardly trick is played against the unsuspecting whitehat's website. A dynamically generated “zero second” meta refresh page is produced that points to the whitehat's index page, to complement the obnoxious and unpredictable effects of the 302 temporary redirect directive. Google themselves are not in total control of how their bot behaves when confronted with this unbelievably complex procedure.
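Here is a toy Python model of the bookkeeping described in this step. It rests on one assumption, stated loudly: the bot files the fetched content under the URL it requested when it gets a 302, and under the target URL only when it gets a 301. That matches the HTTP/1.1 advice that a client should keep using the original request URL after a 302, but how googlebot really handles it is not something I can confirm. All names and hostnames are hypothetical.

```python
def record_fetch(index, requested_url, status, location, body):
    """File a fetched page in a toy index and return the URL it was filed under.

    Assumed behaviour: a 301 credits the content to the redirect target,
    while a 200 or a 302 leaves it under the URL that was requested.
    """
    if status == 301 and location:
        key = location
    else:
        key = requested_url
    index[key] = body
    return key


def duplicate_groups(index):
    """Return lists of URLs in the index that hold identical content."""
    by_body = {}
    for url, body in index.items():
        by_body.setdefault(body, []).append(url)
    return [urls for urls in by_body.values() if len(urls) > 1]


HOME = "<html>whitehat index page</html>"
index = {}
# Ordinary crawl of the whitehat's own URL.
record_fetch(index, "http://www.whitehat.example/", 200, None, HOME)
# Crawl of the redirector link: 302 pointing at the non-www whitehat URL.
record_fetch(
    index,
    "http://www.blackhat.example/go.php?url=http://whitehat.example/",
    302,
    "http://whitehat.example/",
    HOME,
)
print(duplicate_groups(index))
```

Under that assumption the index ends up with two URLs holding the same snapshot, one of them on the blackhat's domain, which is exactly the duplicate-content situation the whitehat does not want.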

The stinking trick against the webmaster here is the residue of the process, a meta refresh. This resides somewhere in the blackhat's system, and a path is created to keep it alive and kicking for the bots. It is unclear how googlebot behaves in this kind of environment, with a double instruction pointing at the whitehat's vulnerable index page. But something does happen, and it is this unpredictable event that is being exploited. Statistically, googlebot will generate a small proportion of new URLs that look like the blackhat's directive in its monthly updates, and one of them could be a duplicate of your index page.

Googlebot now has enough information to generate a new page, with the compliments of the offending php directive, either by placing its snapshot of the whitehat's index page as a new entity in its results or by demoting the whitehat's index page as duplicate content of the LOCATION INSTRUCTION in the php directive on the blackhat's server, H**P://WWW.BLACKHAT.COM/GO-PHP?=WHITEHAT.COM%I%AM%GONNA%GETCHYA%2F. And don't forget that googlebot was told that this PHP link's location is h**p://whitehat.com/ without its www, and we all know that this URL will show your index page without resolving to the legitimate www version unless you 301 the shortened URL to the www version.
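The 301 from the shortened URL to the www version is the one defensive step mentioned here, so a small Python sketch of the rule may help. On a real site you would do this in the web server configuration; the function below only models the decision, and the hostname and function name are placeholders I made up.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.whitehat.example"  # hypothetical canonical hostname


def canonical_redirect(request_url):
    """Return (301, headers) if the request used the bare domain, else None."""
    parts = urlsplit(request_url)
    host = (parts.hostname or "").lower()
    bare_host = CANONICAL_HOST[len("www."):]
    if host != bare_host:
        return None  # already canonical, or not our domain at all
    location = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
    # 301 marks the move as permanent, so the www URL is the one to keep
    return 301, {"Location": location}


print(canonical_redirect("http://whitehat.example/"))
print(canonical_redirect("http://www.whitehat.example/"))
```

With this in place, a bot sent to the non-www URL by somebody else's redirector is told permanently that the www version is the real address.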

This also explains why, over the past year, hundreds of thousands of shortened URLs appeared and confused webmasters.

I may have made errors in my explanation above; it is too long and too complicated to explain in one post, so please accept the general process and make corrections and assumptions yourself. I can assure anybody who reads this: you do not want a link like the above pointing to your site.

Day 5,
The armageddon update.
h**p://www.whitehat.com/ loses its number 1 ranking position and drifts into total oblivion in google's results.

Day 6,
Whitehat webmaster found dead. Committed suicide.


Thread source: http://www.webmasterworld.com/google/28329.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com