Forum Moderators: open

Googlebot and dynamic urls (once again)

will google index urls with parameters coded between slashes?

         

cidrolin

4:02 pm on Feb 25, 2003 (gmt 0)

10+ Year Member



I am developing a small php content-management system. I implemented it on a small site that was previously static. At first I generated normal GET-method urls of the kind:
[volavoile.be...]

I soon noticed that GB would no longer index my content. So I found the "slash-separated parameters" solution on this forum, and my urls presently look like:
[volavoile.be...]

Needless to say, the sid is present only if the client refuses cookies, which means it must be inserted at the entry point (so that the script can check whether the client accepts cookies, and not lose the session in either case).
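To make the scheme concrete, here is a minimal sketch of a "slash-separated parameters" URL builder (all names here — build_url, the page/article parameters — are hypothetical; cidrolin's actual script isn't shown in the thread). The sid segment is appended only when the client refused the session cookie:

```php
<?php
// Hypothetical sketch: parameters become path segments instead of a
// query string, so the URL looks "static" to a crawler.
function build_url($page, $article, $sid = "")
{
    // e.g. /index.php/page/news/article/12
    // instead of /index.php?page=news&article=12
    $url = "/index.php/page/" . urlencode($page)
         . "/article/" . urlencode($article);
    if ($sid != "") {
        // only cookie-refusing clients get a sid segment
        $url .= "/sid/" . urlencode($sid);
    }
    return $url;
}
?>
```

On the server side, the script would read the segments back by splitting $_SERVER["PATH_INFO"] on "/".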

On the day I put the modified program online, I recorded *several hundred* GB hits. GB activity has now come down to a more normal rate, but only the sitemap (main entry point) gets indexed...

Any idea?

SebastianX

4:30 pm on Feb 25, 2003 (gmt 0)




>the sid is present only if the client refuses cookies
Robots don't eat cookies. Try something like this:

$userAgent = getenv("HTTP_USER_AGENT");
if (stristr($userAgent, "Googlebot")) {
    $appendSid = false; // robot detected: leave the sid out of generated URLs
}

Googlebot's user agent is
"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

cidrolin

8:14 am on Feb 27, 2003 (gmt 0)




Yes, precisely: since robots don't eat cookies, I need the sid in the url if I want to avoid starting a new session on every hit from a robot (or a cookie-hating visitor).

It's a basic session-management routine, but it is beside the point of why Google won't take the url...

aspdesigner

10:33 am on Feb 27, 2003 (gmt 0)




Because you have the session ID in the URL.

The "slash" URL you provided is even worse! By including the Session ID at the end, you are presenting what would appear to Google as an infinite # of pages!

That is probably why you got a large # of Googlebot hits, and then the poor little robot finally decided your site was a bottomless pit and gave up!

Remove the session ID from your URL (at least for robots), and you should be OK.
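The advice above can be sketched as follows (helper names are hypothetical — the thread doesn't show cidrolin's actual routine). Robots never get a sid, so every crawl sees one stable URL per page instead of a bottomless pit of session variants:

```php
<?php
// Hypothetical sketch: suppress the sid for known robots.
function is_robot($userAgent)
{
    // stristr() is case-insensitive; matches the UA string
    // "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
    return stristr($userAgent, "Googlebot") !== false;
}

function sid_for_client($userAgent, $sid)
{
    // robots get no sid segment; cookie-refusing humans keep theirs
    return is_robot($userAgent) ? "" : $sid;
}
?>
```

A real deployment would check more bot names than just Googlebot, but the principle is the same: the sid belongs only in URLs served to cookie-refusing human visitors.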