
How to fix Googlebot after Session ID in URL mistake

The damage is done -- method to repair?

         

whoisgregg

8:02 pm on Aug 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So, it happened. The development and production servers had different PHP settings, and 5 days after I uploaded an entirely new registration/sign-in scheme to a site, I discovered that *some* users were getting session IDs appended to their URLs. To make matters worse, they were receiving a new session ID on every page load. Now, after a weekend of testing different ini_set() variations, I've fixed the session URL problem.

Googlebot had 5 days to try to crawl 3232 duplicates of my site. The damage? 11 gigs. The session IDs were fixed 4 days ago, yet Googlebot is still visiting pages with the old session URLs. I'm concerned that it will keep consuming my bandwidth at the same prodigious rate. (This is a 3-gig-a-month site; moving to 70 gigs a month is not cool.)
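The jump from 3 to roughly 70 gigs a month comes from extrapolating the crawl rate. A quick sketch of that arithmetic in Python, assuming a 30-day month and that Googlebot keeps up the same pace:

```python
# Extrapolate the crawl damage reported above: 11 GB over 5 days.
GB_CRAWLED = 11
DAYS_OBSERVED = 5
DAYS_PER_MONTH = 30    # assumption: a 30-day month at a constant crawl rate
NORMAL_MONTHLY_GB = 3  # the site's usual monthly bandwidth

daily_gb = GB_CRAWLED / DAYS_OBSERVED   # GB per day during the bad crawl
monthly_gb = daily_gb * DAYS_PER_MONTH  # projected GB per month
ratio = monthly_gb / NORMAL_MONTHLY_GB  # multiple of normal usage

print(f"{daily_gb:.1f} GB/day, ~{monthly_gb:.0f} GB/month, ~{ratio:.0f}x normal")
```

That works out to about 66 GB a month, around 22 times the usual 3 GB, which is in line with the "70 gigs" estimate.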

So, I need to 'fix' Googlebot while minimizing further damage. The main bandwidth offenders are some 5-10 MB files, for which I am temporarily serving a 404 error. (They are .sit and .zip files, so it's surprising Google is crawling them at all.)

I'm so frazzled right now that I'm having trouble working out the best thing to give Google. 404s? 301s? I'd love to hear possible solutions, both theoretical and technical.

whoisgregg

9:12 pm on Aug 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Of course, I've already read the many back threads on how to *not* screw this up in the first place. :)

But, I couldn't locate any advice on what to do after screwing up -- other than standard stress management techniques.

dcrombie

10:21 am on Aug 26, 2005 (gmt 0)



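If you're on Apache, I suggest using mod_rewrite. Something like the following should do it (untested):

RewriteEngine On
# Match Googlebot requests whose query string holds PHP's default session name, PHPSESSID
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{QUERY_STRING} PHPSESSID [NC]
# .htaccess context: 301 to the same path; the trailing "?" drops the entire query string
RewriteRule (.*) /$1? [R=301,L]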

;)
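The rule above 301-redirects only Googlebot, and its trailing "?" throws away the whole query string, including any legitimate parameters. The same decision can be sketched in Python for illustration; the function name canonical_redirect is hypothetical, the sketch assumes PHP's default session name PHPSESSID, and, unlike the rule, it removes only the session parameter and keeps any others (e.g. ?id=7). Like the RewriteCond without [NC], the Googlebot match is case-sensitive, while the session name match is case-insensitive, mirroring [NC] on the query-string condition.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "PHPSESSID"  # assumption: PHP's default session.name


def canonical_redirect(user_agent, url):
    """Return (301, clean_url) for a Googlebot request carrying a session ID, else None."""
    if "Googlebot" not in user_agent:
        return None  # only bots get redirected, as in the RewriteCond
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Drop only the session parameter (case-insensitive), keep everything else
    kept = [(k, v) for k, v in params if k.lower() != SESSION_PARAM.lower()]
    if len(kept) == len(params):
        return None  # no session ID present: serve the page normally
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path,
                        urlencode(kept), parts.fragment))
    return 301, clean


if __name__ == "__main__":
    print(canonical_redirect("Googlebot/2.1", "/page.php?id=7&PHPSESSID=abc"))
```

Returning None for ordinary visitors leaves cookie-less humans alone, which matches the intent of restricting the rule to Googlebot.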

whoisgregg

5:58 pm on Aug 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Works like a charm! Thanks, dcrombie. :)