Forum Moderators: coopster
Googlebot had 5 days to try to crawl 3232 duplicates of my site. The damage? 11 gigs. The session IDs were fixed 4 days ago, yet Googlebot is still visiting pages with the old session URLs. I'm concerned that Googlebot will continue to consume my bandwidth at the same prodigious rate. (This is a 3-gig-a-month site -- moving to 70 gigs a month is not cool.)
So, I need to 'fix' Googlebot while minimizing further damage. For now, the main offenders in my bandwidth consumption are some 5-10 meg files, for which I am now serving a 404 error. (They are .sit and .zip files, so it's surprising Google is crawling them at all.)
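For reference, one way to serve that 404 for the archive files from .htaccess (a sketch, assuming Apache 2.x with mod_rewrite enabled; the extension list is just the two types mentioned above):

```apache
RewriteEngine On
# Return 404 for the large archive downloads (.sit and .zip)
RewriteRule \.(sit|zip)$ - [R=404,L]
```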
I'm so frazzled right now that I'm having trouble working out the best thing to give Google. 404s? 301s? I'd love to hear possible solutions, both theoretical and technical.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{QUERY_STRING} PHPSESSIONID [NC]
# 301 to the same URL; the trailing ? strips the query string
RewriteRule (.*) /$1? [R=301,L]
;)
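In plain terms, that rule 301s Googlebot to the same path with the query string dropped. Purely to illustrate what the redirect target looks like (a Python sketch, not part of the Apache setup; the URL is a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical(url: str) -> str:
    """Drop the query string, mirroring the RewriteRule's /$1? target."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(canonical("http://www.example.com/page.html?PHPSESSIONID=abc123"))
# -> http://www.example.com/page.html
```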