Google and SIDs

How do we clear them out?

         

trillianjedi

11:17 am on Jun 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some of you may remember that back in April we had a slight issue with SIDs being issued to Googlebot (I believe someone else here also suffered the same indignity from Google, but I can't remember who).

Those URLs with the SIDs have now got into Google's new index on -fi. I was hoping that it would clear. It means that Freshbot is now coming to the site and following 36,000 URLs, all of which point to about 20 pages because of the SIDs on the end!

Will Google work this out at some point, or is some manual intervention required, do you think?

TJ

Brett_Tabke

11:23 am on Jun 18, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



sid = session id...

jpjones

12:28 pm on Jun 18, 2003 (gmt 0)

10+ Year Member



Manual intervention would be the quickest and surest way of dealing with it.

You could program the site so that it performs some checks before sending out a page, e.g.
A) recognises Googlebot as the user agent
B) checks to see if Googlebot has requested a URL with ?sid= at the end of it
C) issues a 301 redirect to the page WITHOUT the ?sid

This will then tell Googlebot that the page it originally requested has been moved permanently. This should then have the effect of removing the SID URL from the index.
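
A rough sketch of checks A) to C) in Python, just to make the logic concrete. The function name googlebot_redirect and the constant SESSION_PARAM are made up for illustration, and I'm assuming the parameter is literally called "sid" as in the URLs above. The user agent check is a plain substring match, which can be spoofed, so treat it as a starting point only.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed parameter name, taken from the "?sid=" mentioned above.
SESSION_PARAM = "sid"


def googlebot_redirect(user_agent, url):
    """Return the URL to 301-redirect to, or None if no redirect is needed.

    Hypothetical helper: the caller is expected to send the 301 itself.
    """
    # A) only act when the User-Agent header looks like Googlebot
    if "googlebot" not in user_agent.lower():
        return None

    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)

    # B) check whether the requested URL carries a session id
    kept = [(key, value) for key, value in params if key.lower() != SESSION_PARAM]
    if len(kept) == len(params):
        return None  # no sid present, serve the page as normal

    # C) rebuild the same URL without the sid; other parameters are kept
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

If the function returns a URL, the page handler would answer with status 301 and a Location header set to that URL. If it returns None, the page is served as usual, so ordinary visitors keep their sessions.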

Of course, the other way you could clear out the index is to remove your whole site from it by excluding Googlebot in robots.txt, but this isn't generally a good idea ;)

JP

trillianjedi

1:51 pm on Jun 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Many thanks jpjones - just testing that now.

Obviously, we don't really want to block Google, although if it comes to it and we're out of the index for a month we may just have to do so, as this is eating bandwidth like nobody's business.

Thanks for clarifying my acronym Brett....

TJ