Receptional Andy

msg:3793181 | 6:56 pm on Nov 24, 2008 (gmt 0) |
It sounds like your pages might be inadvertently including a session ID. An easy way to see if that's likely to be happening is to browse the site with cookies disabled, and see whether session IDs are appended. If that's the case, then you can turn off that behaviour.
|
nmjudy

msg:3793235 | 7:57 pm on Nov 24, 2008 (gmt 0) |
I'm 99.99% sure that these PHP session ids are not being generated by my site - so I'm just trying to figure out how to turn off the behavior. Will inserting the following code in my .htaccess file stop session ids from being used if they are present in a link to me from another site?

php_value session.use_only_cookies 1
php_value session.use_trans_sid 0

If so, does it make a difference where exactly in my .htaccess file I put the code?

My current mod_rewrite code for 301-redirecting all index.html pages to the folder root is below. Is there a way to redirect index.html followed by any string to the folder root (with the exception of anchor tags)?

# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1 [R=301,L]
#
|
Receptional Andy

msg:3793253 | 8:28 pm on Nov 24, 2008 (gmt 0) |
| I'm 99.99% sure that these PHP session ids are not being generated by my site |
IMO it's still worth checking, since it's an easy test - just view the site with cookies disabled. Another way would be to view the Google cache of an indexed URL with a session ID in it, and then see whether the other links on the page also have a session ID appended. It seems unlikely that another site has linked to your pages with session IDs appended, although anything's possible ;)

The htaccess code looks fine, and will work if your host supports it. It will not, however, remove URLs from Google's database. Removing pages with session IDs is best achieved by (permanently) redirecting those requests to the same page, but with the session parameter removed. You can see examples like this one [webmasterworld.com] over in the Apache forum.

Note that even if you stop session IDs being generated, and redirect requests, such URLs can hang around for a long time, since Google is unlikely to request them very frequently, so it doesn't discover your redirects very quickly either.
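
As a rough illustration of the "same page, session parameter removed" redirect described above, here is a minimal sketch. It is not taken from the linked Apache forum thread. It assumes the parameter uses PHP's default name, PHPSESSID, that the rule lives in the site's root .htaccess, and that www.example.com is the canonical hostname.

```apache
# Sketch only: permanently redirect any request that carries a PHPSESSID
# parameter to the same path with just that parameter removed.
# Assumption: the session parameter is named PHPSESSID (PHP's default).
#   %1 = everything before PHPSESSID (including the trailing "&"), if any
#   %2 = everything after PHPSESSID, if any
RewriteCond %{QUERY_STRING} ^(.*&)?PHPSESSID=[^&]*&?(.*)$
RewriteRule ^(.*)$ http://www.example.com/$1?%1%2 [R=301,L]
```

Other query parameters are kept. One known wart: when PHPSESSID is the last of several parameters, the redirected URL keeps a trailing "&" (for example ?a=1&), which is harmless but untidy. If the site uses no query strings at all, stripping the whole query string is simpler.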
|
g1smd

msg:3793271 | 8:37 pm on Nov 24, 2008 (gmt 0) |
*** Is there a way to redirect ... *** Yes.

# Redirect anything with a query string, force www, use same path, and remove all the query string parts.
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

This redirect goes just before your non-www to www redirect. On all of your other redirects, add a question mark after the target URL to clear the query string on those redirects too. This is needed to avoid a redirection chain for certain requests.
|
Robert Charlton

msg:3793431 | 12:09 am on Nov 25, 2008 (gmt 0) |
It's worth noting that redirecting anything with a query string may mess up your Adwords tracking, so be sure to set up exclusions for your landing pages if you are using query strings to track your PPC campaigns. Exclusions would be in the form...

RewriteCond $1 !^landingpage\.html$

They'd follow the query string rewrite condition above, and precede the rewrite rule. I'll leave it to someone else to give you the final code.
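
For illustration, the exclusion slotted into the query-string redirect posted earlier in the thread might look like the sketch below. It assumes a single landing page named landingpage.html in the site root (the placeholder name used in the post) and a rule placed in the root .htaccess. A landing page in a subdirectory would need its path in the pattern.

```apache
# Sketch only: strip the query string from every request except the
# landing page, so PPC tracking parameters on that page survive.
# "landingpage.html" is a placeholder; substitute the real file name.
RewriteCond %{QUERY_STRING} .
RewriteCond $1 !^landingpage\.html$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

The $1 in the second condition refers to the path captured by the RewriteRule pattern, which mod_rewrite matches before it evaluates the conditions. Note that any other redirect ending in "?" (such as the non-www to www rule) would still drop the tracking parameters for requests it handles, so those rules may need the same exclusion.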
|
steveb

msg:3793438 | 12:16 am on Nov 25, 2008 (gmt 0) |
If you don't have a redirect set up properly, all it takes to get duplicate content trouble like this is for someone to link to one of your URLs with a query string attached, and then for Google to screw up and list the query string URL instead of the correct one. They usually get it right, but they sometimes make a mistake and list the wrong one.
|
nmjudy

msg:3793439 | 12:16 am on Nov 25, 2008 (gmt 0) |
It looks like the session ids may be being generated from a 3rd party social bookmarking script. I was able to use link:www.example.com/directory/index.html?PHPSESSID=d1df7a5b58659817c692854ed9c14ed6 and found a couple of bookmarking scripts linking to my site through a user login screen.

I've copied and pasted how I interpreted what I'm supposed to change. I'm not sure if I was clear about the first set of redirects. Is this correct? Or do I need the query string code BEFORE the index redirect? Sorry...I'm "mod rewrite challenged".

Options +FollowSymlinks +Includes All -Indexes
RewriteEngine on
#
#
# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1? [R=301,L]
#
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#
# Redirect requests for resources in non-www domains to same resources in www domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
|
nmjudy

msg:3793444 | 12:20 am on Nov 25, 2008 (gmt 0) |
Robert - thank you for the Adwords warning and workaround.
|
g1smd

msg:3793455 | 12:42 am on Nov 25, 2008 (gmt 0) |
The final code above should work, and the order looks to be correct, but I do think the [NC] should be deleted. That will then allow it to redirect for all hosts that are not exactly www.example.com in all lower case, i.e. it will then also redirect requests for an upper-case WWW.EXAMPLE.COM hostname.

There's a slightly more efficient way of redirecting for named index files (as in the first block of code). The newer code also caters for appended port numbers, query strings, and some unwanted trailing punctuation. Check recent posts in the Apache forum for details.

It would be a good idea to upgrade that first block of code, because as it stands the above example will issue a double redirect for any URL request that includes both an index filename and any sort of query string data. The THE_REQUEST pattern in the index rule does not allow for a query string, so the index redirect won't be invoked until the query-string redirect listed after it has stripped the query string off. Chains should be avoided.
|
jdMorgan

msg:3797019 | 5:34 pm on Nov 30, 2008 (gmt 0) |
Coming late to this party, the following changes would seem to be warranted:
# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html(\?[^\ ]*)?\ HTTP/
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1? [R=301,L]
#
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#
# Redirect requests for resources in non-www domains to same resources in www domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Changes:
Modified the RewriteCond pattern in the first rule to match whether or not a query string is appended.
Modified the second RewriteCond in the last rule (non-www to www redirect) to remove the [NC] flag and end-anchor the pattern. The result is that the redirect will occur unless the hostname is *exactly* "www.example.com" or, in the case of HTTP/1.0 requests, blank.

Note that that whole rule can be coded more efficiently in only two lines as:
# Redirect requests for resources in non-www domains to same resources in www domain
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Jim
|
|