To get my site crawled more reliably, I want to remove an automatically generated session ID and an additional tracking parameter from my URLs, but only when a bot visits the site.
(e.g. www.example.com/?sessionID=aa162314bDRa53872123&crea=52)
Does anyone know of any good examples or other resources showing mod_rewrite being used for this specific situation? I need something that is pretty clear and straightforward.
seasalt
So, you'll need to modify your script to suppress session IDs when the requesting user-agent is a known search engine robot.
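As a rough illustration of that idea, here is a minimal Python sketch of a link builder that leaves the session parameters off for robots. The token list and the names is_robot and build_link are made up for this example, and a real script would want a much more complete robot list; the parameter names sessionID and crea come from the example URL in the question.

```python
from urllib.parse import urlencode

# Hypothetical, deliberately short list of crawler User-Agent substrings.
# A production list would be longer and maintained over time.
ROBOT_TOKENS = ("googlebot", "bingbot", "slurp")


def is_robot(user_agent):
    """Return True if the User-Agent string contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in ROBOT_TOKENS)


def build_link(path, session_id, crea, user_agent):
    """Append the session and tracking parameters only for non-robot clients."""
    if is_robot(user_agent):
        return path
    return path + "?" + urlencode({"sessionID": session_id, "crea": crea})


# A robot gets the bare path; an ordinary browser gets the full query string.
print(build_link("/dir2/", "aa162314bDRa53872123", 52,
                 "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(build_link("/dir2/", "aa162314bDRa53872123", 52,
                 "Mozilla/5.0 (Windows NT 10.0)"))
```

The point is only that the decision is made where the links are written out, before the page is sent.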
An approach where mod_rewrite is useful is to put query string parameters into URL form. In other words, output URLs to all clients that look like plain static URLs, such as:
http://www.example.com/widgets/blue/parm23/userid10001
and then use mod_rewrite to convert that to a form that your script uses when the plain URL is requested:
http://www.example.com/index.php?product=widgets&color=blue&parm=23&id=10001
...Just a dumb example to illustrate the point. You will still need to suppress the session-related variables when a robot request is detected, even with this method. Otherwise, you'll get duplicate pages listed for each crawl, and that can cause big problems.
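To make the mapping concrete, here is a small Python sketch of the translation described above, from the static-looking path to the query-string form. It is not mod_rewrite itself, only the same pattern-and-substitution idea; the regular expression and the name to_script_url are assumptions built around the example URLs.

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern for paths shaped like /widgets/blue/parm23/userid10001
STATIC_URL = re.compile(r"^/([^/]+)/([^/]+)/parm(\d+)/userid(\d+)/?$")


def to_script_url(path):
    """Return the internal script URL for a static-looking path, or None."""
    match = STATIC_URL.match(path)
    if match is None:
        return None  # not a path this pattern is meant to handle
    product, color, parm, user_id = match.groups()
    query = urlencode({"product": product, "color": color,
                       "parm": parm, "id": user_id})
    return "/index.php?" + query


print(to_script_url("/widgets/blue/parm23/userid10001"))
# prints /index.php?product=widgets&color=blue&parm=23&id=10001
```

A rewrite rule would use the same kind of capture groups, with the server doing the substitution internally so the visitor only ever sees the plain URL.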
Jim
$remaddr = $ENV{'REMOTE_ADDR'};
# ... (code snipped)
# Bypass counter for requests from Google
unless ($remaddr =~ /^216\.239\.45\./)
{
    open(COUNTER,">$ctrpath") || die $!;
    print COUNTER ($count);
    close(COUNTER);
}
I failed to mention that the session ID is not generated on the initial page requested (whether index or interior page), but is added to the links on that page.
Example:
page requested:
www.example.com/dir1/ (and will appear as such to bots)
links on requested page appear as:
www.example.com/?sessionID=aa162314bDRa53872123&crea=52
www.example.com/dir2/?sessionID=aa162314bDRa53872123&crea=52
and so on....
Would mod_rewrite work in this instance? If so, any examples for a situation like that?
Thanks.
seasalt
Unfortunately, mod_rewrite is of no help whatsoever in doing this. It's only good when you want to modify the URL that a browser is asking for, and either point the request somewhere else or change its form (for example, from a static-appearing URL to a dynamic URL to be passed to your script). Mod_rewrite works on the "input" or "request" end of the transaction, not on the "output" or "response" end, and the links on your pages are part of the output.
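Since the links are part of the output, the cleanup has to happen in whatever code writes them. As one possible shape for that, here is a Python sketch of a helper that removes the two parameters from a URL before it goes onto the page. The name strip_session_params is invented for the example, and the parameter names are taken from the URLs you posted.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameter names taken from the example URLs in this thread.
SESSION_PARAMS = {"sessionID", "crea"}


def strip_session_params(url):
    """Return url with the session and tracking parameters removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the session-related ones.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_params(
    "http://www.example.com/dir2/?sessionID=aa162314bDRa53872123&crea=52"))
# prints http://www.example.com/dir2/
```

You would call something like this on each link only when the request has been identified as coming from a robot, so regular visitors keep their sessions.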
Jim