

Cloaking Forum

    
Intrigued
HocusPocus




msg:678238
 11:00 am on Jan 12, 2003 (gmt 0)

I do most of my web work in a particular sector. I regularly check the SERPs of a particular phrase to see the movers and shakers. To my dismay, 'xsite.com' has become #1 almost overnight.

The site has a PR 6 with 200 back links, which is good compared to most of the competition, but I don't think it is worthy of #1. Naturally I wanted to find out how they did it.

Looking at xsite's source in a browser there was nothing untoward. Loads of JavaScript and style sheet includes, typical of a large corporation site. In fact there was nothing in the code to suggest that the developers had SEO in mind, or cross-browser compatibility, when developing.

Intrigued, I investigated further and used this site's spider simulator:
[searchengineworld.com...] The spider returned status 200 but picked up nothing in the page.

I then used another spider simulator, this time it returned this code

<html>
<head>
<script language="javascript">
function doRes () {
document.screen.resolution.value = (screen.width + 'x' + screen.height);
document.screen.submit();
}
</script>
</head>
<body onLoad="javascript:doRes();">
<form name="screen" action="/index.asp?" method="post">
<input type="hidden" name="resolution" value="">
</form>
</body>
</html>
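
To illustrate why a simulator that ignores scripts picks up nothing from a page like this, here is a rough sketch that strips script blocks and tags and returns whatever visible text is left. The helper name `visibleText` and the regex-based stripping are assumptions for illustration only, not how any particular spider works.

```javascript
// Hypothetical helper: approximate what a client that ignores scripts "reads".
function visibleText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop script blocks entirely
    .replace(/<[^>]*>/g, " ")                     // drop the remaining tags
    .replace(/\s+/g, " ")                         // collapse whitespace
    .trim();
}

// Abbreviated copy of the sniffer page (script body elided).
const snifferPage = [
  "<html><head>",
  '<script language="javascript">function doRes () { /* ... */ }</script>',
  "</head>",
  '<body onLoad="javascript:doRes();">',
  '<form name="screen" action="/index.asp?" method="post">',
  '<input type="hidden" name="resolution" value="">',
  "</form></body></html>",
].join("\n");

console.log(JSON.stringify(visibleText(snifferPage))); // prints ""
```

The page has no text outside the script and the hidden form field, so a status 200 with nothing to index is what such a client would be left with.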

I figured that this page's only use was to determine the screen res of the viewer (maybe they should have been looking at window size?) to be able to position their absolute <div>s.

Viewing the page with an older browser with JavaScript disabled returned the above code. Refreshing the page gave the correct content (though messy).

I wondered if the screen.width + 'x' + screen.height was a crude method of determining whether the viewer was a spider or a browser. I tried to fool the script by sending it a GET rather than a POST, e.g.
xsite.com/index.asp?resolution=0x0
xsite.com/index.asp?resolution=800x600
but this didn't work.
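
One possible explanation for the GET attempt failing: if the server-side script looks for `resolution` only in the POST body, a value placed in the query string is never seen. The sketch below models that behaviour. The function `readResolution` and the request object shape are hypothetical, since the real index.asp source is not available.

```javascript
// Hypothetical model of a handler that reads `resolution` only from POST data.
function readResolution(request) {
  if (request.method !== "POST") return null;       // query strings are ignored
  const fields = new URLSearchParams(request.body); // form-encoded POST body
  const value = fields.get("resolution") || "";
  return /^\d+x\d+$/.test(value) ? value : null;    // accept only WIDTHxHEIGHT
}

// GET with the value in the URL: this handler finds nothing.
console.log(readResolution({ method: "GET", url: "/index.asp?resolution=800x600", body: "" })); // null
// POST with the value in the body: this handler sees it.
console.log(readResolution({ method: "POST", url: "/index.asp", body: "resolution=800x600" })); // 800x600
```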

My inquisitiveness has got the better of me, hence the post. I'm just wondering whether the developers have used a method that inadvertently gives food for Google. I would really like to see the ASP source to see what's going on, but I realise this is just about impossible. What's bugging me is that if a spider does a GET for this site's content, all that should be returned is the code above. Is this correct? To the best of my knowledge, spiders do not read JavaScript and they also cannot do POSTs. Is this correct?

Any thoughts on these matters appreciated.

 

Brett_Tabke




msg:678239
 4:59 am on Jan 13, 2003 (gmt 0)

Yep, sounds like they are browser sniffing. I'd look at the cached page in Google to see what they have fed Google.
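
For context, the simplest kind of sniffing keys off the User-Agent header. The sketch below is illustrative only: the token list and the name `looksLikeCrawler` are assumptions, and nothing in this thread shows that xsite.com does this.

```javascript
// Illustrative crawler tokens; real lists are longer and change over time.
const CRAWLER_TOKENS = ["googlebot", "slurp", "bot", "spider", "crawler"];

// Returns true when the User-Agent string contains any known crawler token.
function looksLikeCrawler(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  return CRAWLER_TOKENS.some((token) => ua.includes(token));
}

console.log(looksLikeCrawler("Googlebot/2.1 (+http://www.googlebot.com/bot.html)")); // true
console.log(looksLikeCrawler("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")); // false
```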

GeorgeGG




msg:678240
 6:05 am on Jan 13, 2003 (gmt 0)

It looks to me like they are logging the resolution with
the form submit....

I use something like this to log the JavaScript
'document.referrer' with the form submit.

<form method="POST" action="http://mydomain.com/">
<input type="text" name="search" size=80>
<script TYPE="text/javascript">
document.write("<input type=\"hidden\" name=\"REF\" value=\""+document.referrer+"\">");
</script>
<br>
<input type="submit" name="submit1" value="Submit Request">
<input type="reset" value="Reset Request">
</form>

Reason being the server doesn't always get/have the 'HTTP_REFERER'
and this just gives another go at it if JavaScript is enabled...
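
On the receiving end, that fallback might look something like the sketch below: take the Referer header when it is present, otherwise use the REF field filled in by JavaScript. The function name `pickReferrer` and the plain-object inputs are hypothetical, and this is not GeorgeGG's actual server code.

```javascript
// Hypothetical server-side helper: prefer the HTTP Referer header,
// fall back to the REF form field that the page's JavaScript filled in.
function pickReferrer(headers, formFields) {
  const fromHeader = (headers["referer"] || "").trim();
  if (fromHeader) return { value: fromHeader, source: "header" };
  const fromForm = (formFields["REF"] || "").trim();
  if (fromForm) return { value: fromForm, source: "javascript" };
  return { value: "", source: "none" };
}

// Header missing, but JavaScript supplied the referrer through the form.
console.log(pickReferrer({}, { REF: "http://example.com/page" }).source); // javascript
```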

On another page I log the screen resolution using the
JavaScript 'SRC' attribute (JavaScript include file),
along with 'document.URL', 'navigator.userAgent', 'navigator.platform',
'screen.colorDepth' or 'screen.pixelDepth' and 'document.referrer'.
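
One way such an include could carry the values is in the query string of the script's SRC URL, so that the server logs them when the browser fetches the file. This is a guess at the approach, not the poster's code. `buildLogUrl`, the `/log.js` path and the parameter names are all made up for the example.

```javascript
// Hypothetical helper: pack client details into the query string of a
// script include so the server can log them when the file is requested.
function buildLogUrl(baseUrl, details) {
  const query = Object.keys(details)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(details[key])))
    .join("&");
  return baseUrl + "?" + query;
}

// In a browser the values would come from screen, navigator and document.
const url = buildLogUrl("/log.js", { res: "800x600", depth: 16, ref: "http://example.com/?q=a b" });
console.log(url);
// /log.js?res=800x600&depth=16&ref=http%3A%2F%2Fexample.com%2F%3Fq%3Da%20b
```

A client that never runs the script never requests the include, so it leaves no entry in a log built this way.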

None of my JavaScript logs ever show any info from Google
or any se...

GeorgeGG

HocusPocus




msg:678241
 11:12 am on Jan 13, 2003 (gmt 0)

Thanks for the replies.

So to confirm, Google doesn't read Javascript. The site is doing some kind of sniffing.

With reference to the cached versions, the site has two entries in Google SERPs, #1 and #2.

The #1 cached version is the actual site. That is, no optimisation, html code typical of a large corporation. How is Google managing to get to the content of the page? As stated before, viewing the site with JavaScript disabled returns the Screen Resolution Sniffer. Refreshing the same page gave the correct content, incorrectly rendered.

The #2 cached version goes to the normal Google Cached Header then the page redirects to the site. Viewing Source of the cached version shows the Google Cache Header and the Screen Resolution Sniffer.

I just don't think they had SEO in mind when developing, and I doubt the company is spider sniffing. Does Googlebot send two GET requests? The first returning the sniffer and a status 200, then a second GET for the content?

I still find it difficult to understand how Googlebot is managing to index their site, let alone rank it #1. Any thoughts appreciated.

Brett_Tabke




msg:678242
 12:27 pm on Jan 13, 2003 (gmt 0)

It's just that robots can't execute the javascript.

<argh> big typo: can'T</argh>

[edited by: Brett_Tabke at 1:04 pm (utc) on Jan. 15, 2003]

HocusPocus




msg:678243
 12:54 pm on Jan 13, 2003 (gmt 0)

...
Can robots POST forms?

Brett_Tabke




msg:678244
 12:55 pm on Jan 13, 2003 (gmt 0)

No - not unless they've been programmed to, but not search engine bots.

HocusPocus




msg:678245
 2:34 pm on Jan 13, 2003 (gmt 0)

So the bot sends a GET request to index.asp, and the server returns just the JavaScript Screen Size Sniffer. The code is being read but the form is not being posted, so the bot doesn't get to see the content of index.asp.

So, how has Google spidered and cached the page? The only way I can see this working is if the bot sends two GET requests.

First GET, the server returns the Screen Size Sniffer.
Second GET, something like

if referrer = self
{
    if a resolution value was POSTed
        parse the screen size string from the posted data,
        serve content matching that screen size
    else
        (a bot, or a browser with no JavaScript)
        serve default content
}
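
The guess above can be written out as a small runnable sketch. It only mirrors the hypothesised branching. The request fields (`method`, `referrer`, `selfUrl`, `body`) and the returned labels are invented for illustration, and the real index.asp may work quite differently.

```javascript
// Sketch of the hypothesised server logic; all names here are assumptions.
function chooseContent(request) {
  if (request.referrer !== request.selfUrl) {
    return { page: "sniffer" }; // first request: send the screen size sniffer
  }
  // Only a POST carries form data; anything else is treated as empty.
  const posted = new URLSearchParams(request.method === "POST" ? request.body : "");
  const match = /^(\d+)x(\d+)$/.exec(posted.get("resolution") || "");
  if (match) {
    // A script-enabled browser submitted the form: size the layout to fit.
    return { page: "sized", width: Number(match[1]), height: Number(match[2]) };
  }
  return { page: "default" }; // a bot, or a browser with no JavaScript
}

const self = "http://xsite.com/index.asp";
console.log(chooseContent({ method: "GET", referrer: "", selfUrl: self, body: "" }).page);   // sniffer
console.log(chooseContent({ method: "GET", referrer: self, selfUrl: self, body: "" }).page); // default
```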

This will only work if bots send out two GETs to index a site. Is this normal?

On AllTheWeb the same site is #1.
However, the site details are
[anysite.site...] (410 B)
which is the .asp with just the screen sniffer in it.

Excuse my inexperience in this field, but can anybody explain what is going on?

volatilegx




msg:678246
 9:21 pm on Jan 13, 2003 (gmt 0)

Can robots POST forms?

No - not unless they've been programmed to, but not search engine bots.

Bots can and do post forms regularly... Ever heard of the bots that look for formmail scripts? Just a security note...

Brett_Tabke




msg:678247
 2:03 am on Jan 14, 2003 (gmt 0)

Right, a little confusion there. I think we're talking search engine bots, volatilegx. None of the majors will post forms that I know of.

On the other hand, it is quite easy to build a bot to post data.
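
As a rough illustration of how little is involved, the sketch below assembles the form-encoded POST that a script-less bot could send in place of the browser's form submission. It only builds the request description and does not send anything. `buildFormPost` is a hypothetical helper name.

```javascript
// Hypothetical helper: describe the POST a bot could send instead of
// relying on a browser to run the page's JavaScript and submit the form.
function buildFormPost(actionUrl, fields) {
  // URLSearchParams percent-encodes the fields, so the body is plain ASCII
  // and its string length equals its byte length.
  const body = new URLSearchParams(fields).toString();
  return {
    method: "POST",
    url: actionUrl,
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      "Content-Length": String(body.length),
    },
    body,
  };
}

const post = buildFormPost("/index.asp", { resolution: "800x600" });
console.log(post.body); // resolution=800x600
```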


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved