Forum Moderators: Robert Charlton & goodroi
However...
My httpd.conf rewrites URLs for googlebot to strip off session ids, etc.
Now, I see all sorts of URLs WITH sessionids showing up in the SERPs again.
I tested my httpd.conf by including my own IP as a bot - my rewrites work perfectly.
So, the only thing I can think of is that Google SERPs are using bots other than googlebot to gather data.
Is anyone else noticing this? Perhaps Mediapartners-Google is also feeding the SERPs. Maybe Adsbot-Google too? I don't rewrite URLs for these 2 bots, but I guess I have to now...
I cannot figure out any other explanation for how rewritten googlebot URLs are getting into the SERPs in a pre-rewritten format...
P.S. I am a victim of the June 4 ranking debacle, and have been searching for an explanation. Anyone else suffering since June 4 may discover this is a problem for them too.
Perhaps Mediapartners-Google is also feeding the SERPs. Maybe Adsbot-Google too? I don't rewrite URLs for these 2 bots, but I guess I have to now...
Ever since Google built the Big Daddy infrastructure, all Googlebots use a shared crawl cache. Here's a post from Matt Cutts [mattcutts.com] about that.
Now according to that post, when you are crawled by a spider from another Google service, that crawling "doesn’t queue up pages to be include in our main web index." However, Matt's blog post is also over two years old - and things may have changed, or at least become crossed up somewhere.
I don't understand why this isn't a bigger issue. I lost my rankings on June 4th, and have spent 5 weeks trying to figure out why, and fixing things.
Today I tripped over this, and I am now more convinced than I was about any of the earlier possible issues that THIS is the cause. This has a bigger effect than anything else I've looked at. I am unwillingly filling the SERPs with hundreds of thousands of pages of duplicate content. That's bad for Google, though I know they will filter it out of the SERPs anyway, but it sure as hell is awful for me and the people I've had to lay off.
I've fixed it now, a 5 minute fix to add Adsbot-Google and Mediapartners-Google all through my httpd.conf... Now I wait 10-15 days for Google to pick up this fix. But it really should be addressed better by Google in their help topics, and I wish more people here knew of it so it could surface from time to time as a possible problem for other webmasters to consider.
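For anyone who wants to check their own setup, here is a rough sketch of what a rule covering all three Google crawlers might look like. It is not my exact config: the session parameter name `sid` and the choice of a 301 redirect are assumptions for illustration, so substitute whatever your site actually uses. It assumes mod_rewrite is loaded and that the rule sits in the main server or virtual host section of httpd.conf.

```apache
RewriteEngine On

# Match any of the three Google crawlers by User-Agent, case-insensitively.
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Mediapartners-Google|AdsBot-Google) [NC]

# Only act when the query string carries a session id.
# "sid" is a hypothetical parameter name; replace it with your own.
# %1 captures everything before the parameter, %3 everything after it.
RewriteCond %{QUERY_STRING} ^(.*&)?sid=[^&]*(&(.*))?$ [NC]

# Redirect to the same path with the session id removed.
# If sid was the only parameter, the bare trailing "?" clears the query string.
# If sid was the last of several parameters, a harmless trailing "&" remains.
RewriteRule ^(.*)$ $1?%1%3 [R=301,L]
```

The point is simply that the User-Agent condition has to list every Google bot that might feed the shared crawl cache, not just Googlebot.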
I will try to keep this problem alive as a possible thing for other webmasters in trouble to look at.
From where I am standing, this is HUGE!