Forum Moderators: open


dynamic websites and cookies


lmgoblue

1:30 pm on May 13, 2003 (gmt 0)

10+ Year Member



Does Google have difficulty indexing dynamic websites? Or sites that require cookies?

We are using a content management system that serves up dynamic pages and our site also requires cookies. Our logfiles show that googlebot is visiting our site but we don't seem to ever appear in their search engine.

chris_f

2:18 pm on May 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld lmgoblue,

Googlebot has gotten better at indexing dynamic sites; however, it does not accept cookies. This could be your problem. Is there no way of viewing your site without a cookie?

Chris

lmgoblue

7:24 pm on May 15, 2003 (gmt 0)

10+ Year Member



Hi Chris,

Thanks for your response. It is currently impossible to view our site without cookies turned on. However, over a year ago, our developers added code to check the HTTP_USER_AGENT and allow anything with "Googlebot" to enter the site.

I'm wondering if maybe this code isn't working correctly.

hyperion

9:35 pm on May 15, 2003 (gmt 0)

10+ Year Member



lmgoblue,

I think the code is working correctly, and that's your problem - serving different content to Googlebot than to a visitor is cloaking, at least from a technical standpoint, and that means you will be dropped for spamming the index.

I did the same thing about a year ago, with non-spammy intentions; I just did not want Googlebot to become entrenched in my session IDs. But, sadly, Googlebot cannot detect good intentions, so I got dropped from the index ;-(.

Having something like if(strstr($HTTP_USER_AGENT, "Googlebot")) anywhere in your website is a grave error, imho.

Try to make your site browsable without a cookie. What do you need the cookie for? If it's sessions, only assign one after the user requests a page/function that actually uses them. In my case, it was a company page with an online shop; the first version assigned a session on every pageview, but now I only assign one when somebody actually puts a product in a cart, so spiders can crawl the pages without problems.
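The lazy-session approach described above might look something like this in PHP. This is only a sketch of the idea, not code from the thread - the function and variable names are made up for illustration, and it uses modern PHP syntax rather than the 2003-era style:

```php
<?php
// Sketch of the "assign a session only when needed" idea: ordinary
// catalog pages never touch the session, so spiders crawling them
// are never handed a session ID or a cookie. A session is started
// only at the first action that genuinely requires one, such as
// putting a product in the cart.
// All names here are illustrative, not from the original site.

function ensure_session()
{
    // Start (or resume) a session only on demand.
    if (session_id() === '') {
        session_start();
    }
}

function add_to_cart($product_id, $quantity = 1)
{
    ensure_session(); // first point where a session becomes necessary
    if (!isset($_SESSION['cart'])) {
        $_SESSION['cart'] = [];
    }
    $_SESSION['cart'][$product_id] =
        ($_SESSION['cart'][$product_id] ?? 0) + $quantity;
}

// Pages that only display content simply never call ensure_session(),
// so crawlers see plain, stable URLs with no session ID attached.
```

The key design point is that no user-agent sniffing is involved: humans and spiders get exactly the same pages, and the session only appears once a visitor does something a spider never does.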

lmgoblue

6:09 pm on May 16, 2003 (gmt 0)

10+ Year Member



Thanks hyperion. Looks like we're going to have to do something about the cookies and make our site browseable without them.

WebGuerrilla

6:16 pm on May 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think the code is working correctly, and that's your problem - serving different content to googlebot than to a visitor is cloaking, at least from the technical standpoint, and that means you will be dropped for spamming the index.

That is not true. Detecting Googlebot in order to serve your content without a session ID or a cookie is not cloaking, and it will not get you banned. If you were doing that and you dropped from the index, it was a coincidence.

Cloaking, in the eyes of Google, is serving spiders different content than users see. The absence of a session ID or a cookie does not create different content.

You can find GoogleGuy's comments on the subject here [webmasterworld.com]
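The distinction WebGuerrilla draws - identical content for everyone, just no session ID appended to URLs when a spider visits - might be implemented along these lines. This is a hedged sketch; the bot list and helper names are my own for illustration, not from anyone's actual site:

```php
<?php
// Sketch: the page content never changes based on the visitor; only
// the session ID appended to internal links is suppressed for
// spiders (and for any visitor who has no session at all). Since
// the content is identical, this is not cloaking in the sense
// described above. Bot names and helpers are illustrative.

function is_spider($user_agent)
{
    // A tiny, illustrative list; real sites maintained longer ones.
    foreach (['Googlebot', 'Slurp', 'msnbot'] as $bot) {
        if (stripos($user_agent, $bot) !== false) {
            return true;
        }
    }
    return false;
}

function link_url($path, $session_id, $user_agent)
{
    // Spiders and session-less visitors get the clean URL; only
    // cookieless human visitors with an active session get the
    // session ID carried in the URL.
    if ($session_id === null || is_spider($user_agent)) {
        return $path;
    }
    return $path . '?sid=' . urlencode($session_id);
}
```

Note the contrast with the situation hyperion warns about below: here the spider and the human receive the same page body, and only the link decoration differs.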

hyperion

9:17 pm on May 16, 2003 (gmt 0)

10+ Year Member



WebGuerrilla, that's what I thought too, but I got dropped from the index in the next update after I changed to not giving Googlebot a session ID (and that was the only change on the site), and I was reincluded immediately after I dropped the Googlebot detection. So empirical evidence suggests, at least in my case, that you may fall through the cloaking filter if you use such a method.

Technically - and that is what I wanted to imply by saying that Googlebot cannot detect good intentions - it is very hard to tell the difference between cloaking and "nice use" of Googlebot detection. I'd say Google sometimes visits sites without the "Googlebot" agent string, and if the content differs from what a visit with the agent string "Googlebot" gets, you look suspicious. Maybe if Google can detect (again, not via a human editor, but via an algorithm) that the difference is okay, because you are using standard session ID names like PHPSESSID or ID (I had a non-standard one), the cloaking filter will not affect you. So it may be okay if you serve the same page without the session ID in the links; but if you serve the page without a session to Googlebot, and something like "Sorry, turn your cookies on, moron" to all other user agents not accepting cookies, I do not see how Googlebot could discern this from cloaking, since not only the URLs in the links are different, but the content is too.

GoogleGuy's comments say that dropping sessions is okay with him, but as far as I can tell, he is not from the programming department, and what does it help if the PR department thinks it is not "against the policy" when the algorithm in Googlebot does not get the difference?

Just my 2 cents.