www.example.com/directory/
www.example.com/directory/index.html?PHPSESSID=d1df7a5b58659817c692854ed9c14ed6
My site is entirely static, with the exception of one directory that uses Postnuke. That section of the website has been there for several years and never caused a problem. From what I can see, Postnuke does not generate URL strings like this.
In the past, I've had a problem with paid membership websites framing my site for their content. All my pages have the "break out of frames" javascript code in them.
Could this PHPSESSID string be generated from another website? If so, could Google be indexing both versions of the page and dropping my pages for duplicate content? Obviously, Webmaster Tools sees the pages as 2 different pages with duplicate titles.
I'm one of the people whining over lost rankings from Nov 2 and still trying to figure out what caused it. In October, I implemented a sitewide change: links that pointed to /directory/index.html now point to /directory/.
Is this just another buggy Webmaster Tools thing, or can I do a workaround in mod_rewrite to combine these kinds of pages too?
Will inserting the following code in my .htaccess file stop session IDs from being used if they appear in a link to my site from another site?
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0
If so, does it make a difference where exactly in my .htaccess file I put the code?
My current mod_rewrite code for 301-redirecting all index.html pages to the folder root is below. Is there a way to redirect a URL with any string after "index.html" to the folder root as well (with the exception of anchor tags)?
# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1 [R=301,L]
#
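The reason the session-ID URL isn't caught by this rule is that the condition requires " HTTP" to follow "index.html" immediately in the request line. Here's a rough Python check of that pattern against two request lines. It assumes Python's `re` treats this pattern the same way Apache's PCRE does (it should for these constructs), and the request lines are made-up examples of what %{THE_REQUEST} would contain.

```python
import re

# The RewriteCond pattern from the rule above, tested against %{THE_REQUEST}.
# Assumption: Python's re matches this pattern the same way Apache's PCRE does.
INDEX_COND = re.compile(r"^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP")

def index_rule_fires(request_line: str) -> bool:
    """Return True if the index.html redirect condition matches the request line."""
    return INDEX_COND.search(request_line) is not None

plain = "GET /directory/index.html HTTP/1.1"
with_sid = ("GET /directory/index.html"
            "?PHPSESSID=d1df7a5b58659817c692854ed9c14ed6 HTTP/1.1")

print(index_rule_fires(plain))     # True
print(index_rule_fires(with_sid))  # False: "?PHPSESSID=..." sits between "index.html" and " HTTP"
```

So as written, the rule only ever redirects a bare index.html request; anything with a query string passes straight through.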
I'm 99.99% sure that these PHP session IDs are not being generated by my site.
IMO it's still worth checking, since it's an easy test: just view the site with cookies disabled. Another way would be to view the Google cache of an indexed URL with a session ID in it, and then see whether the other links on the page also have a session ID appended.
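For the second test, if you save the cached page as HTML, a small script can list which links carry a PHPSESSID parameter. This is a hypothetical helper (the function name and sample HTML are invented for illustration) built on Python's standard `html.parser` and `urllib.parse`.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

class SessionLinkFinder(HTMLParser):
    """Collect href values from <a> tags whose query string contains PHPSESSID."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # parse_qs turns "PHPSESSID=abc&x=1" into {"PHPSESSID": [...], "x": [...]}
            if name == "href" and value and "PHPSESSID" in parse_qs(urlsplit(value).query):
                self.hits.append(value)

def links_with_session_ids(html: str) -> list:
    """Return every link in the HTML that has a PHPSESSID query parameter."""
    finder = SessionLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hits

sample = ('<a href="/directory/index.html?PHPSESSID=abc123">Home</a> '
          '<a href="/about.html">About</a>')
print(links_with_session_ids(sample))  # ['/directory/index.html?PHPSESSID=abc123']
```

If only the indexed URL has the session ID and none of the links on the page do, that points away from your own site generating them.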
It seems unlikely that another site has linked to your pages with session IDs appended, although anything's possible ;)
The .htaccess code looks fine, and will work if your host supports it (php_value directives in .htaccess only take effect when PHP runs as an Apache module). It will not, however, remove URLs from Google's database.
Removing pages with session IDs is best achieved by (permanently) redirecting those requests to the same page, but with the session parameter removed. You can see examples like this one [webmasterworld.com] over in the Apache forum. Note that even if you stop session IDs being generated, and redirect requests, such URLs can hang around for a long time, since Google is unlikely to request them very frequently - so doesn't discover your redirects very quickly either.
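To make "the same page, but with the session parameter removed" concrete, here's a Python sketch of the mapping such a 301 should perform. It isn't the Apache forum example linked above; the function name and the default parameter name `PHPSESSID` are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_session_id(url: str, param: str = "PHPSESSID") -> str:
    """Return the URL a session-ID request should be 301-redirected to.

    Only the session parameter is dropped; other parameters keep their order.
    If nothing is left, urlunsplit omits the "?" entirely.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(strip_session_id(
    "http://www.example.com/directory/index.html?PHPSESSID=d1df7a5b58659817c692854ed9c14ed6"))
# http://www.example.com/directory/index.html
```

One design note: `urlencode` re-encodes the surviving values, so unusual characters may come back encoded slightly differently from the original request.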
Yes.
# Redirect anything with a query string, force www, use same path, and remove all the query string parts.
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
This redirect goes just before your non-www to www redirect.
On all of your other redirects add a question mark after the target URL to clear the query string on all of those redirects too. This is needed to avoid a redirection chain for certain requests.
Exclusions would be in the form...
RewriteCond $1 !^landingpage\.html$
They'd follow the query string rewrite condition above, and precede the rewrite rule. I'll leave it to someone else to give you the final code.
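Here's a rough Python model of how the query-string rule behaves with one such exclusion added. It assumes the rule runs in .htaccess in the document root, so the path the RewriteRule sees (and `$1` captures) has no leading slash; landingpage.html stands in for whatever pages you want to keep their query strings.

```python
import re
from typing import Optional

CANONICAL = "http://www.example.com/"
# Mirrors: RewriteCond $1 !^landingpage\.html$  (hypothetical exclusion)
EXCLUDED = re.compile(r"^landingpage\.html$")

def query_strip_target(path: str, query: str) -> Optional[str]:
    """Return the 301 target for one request, or None if the rule would not fire.

    path  -- the URL path as the per-directory RewriteRule sees it (no leading slash)
    query -- the raw query string, i.e. %{QUERY_STRING}
    """
    if not re.search(r".", query):   # RewriteCond %{QUERY_STRING} .
        return None
    if EXCLUDED.search(path):        # excluded page keeps its query string
        return None
    return CANONICAL + path          # the trailing "?" in the rule drops the query string

print(query_strip_target("directory/", "PHPSESSID=abc"))   # http://www.example.com/directory/
print(query_strip_target("landingpage.html", "src=ad"))    # None
```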
They usually get it right, but they sometimes make a mistake and list the wrong one.
I've copied and pasted how I interpreted what I'm supposed to change. I'm not sure if I was clear about the first set of redirects. Is this correct? Or do I need the query string code BEFORE the index redirect? Sorry...I'm "mod rewrite challenged".
Options +FollowSymlinks +Includes All -Indexes
RewriteEngine on
#
#
# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1? [R=301,L]
#
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#
# Redirect requests for resources in non-www domains to same resources in www domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
There's a slightly more efficient way of redirecting requests for named index files (as in the first block of code). The new code also caters for appended port numbers, query strings, and some unwanted trailing punctuation. Check recent posts in the Apache forum for details.
It would be a good idea to upgrade that first block of code, because as it stands now the above example will issue a double redirect for any URL request that includes both an index filename and any sort of query string data. The index redirect won't be invoked until the next listed query-string redirect has stripped the query string off. Chains should be avoided.
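The chain is easy to reproduce with a small Python simulation of the three rules in the order posted above: index redirect, query-string redirect, non-www redirect. Each redirect is treated as a fresh browser request. Assumptions: Python's `re` stands in for Apache's PCRE, every request is a GET, and the second pattern is the index condition with an optional `(\?[^\ ]*)?` query-string group added.

```python
import re

ORIGINAL_COND = re.compile(r"^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP")
IMPROVED_COND = re.compile(r"^[A-Z]{3,9}\ /(.+/)?index\.html(\?[^\ ]*)?\ HTTP/")
INDEX_RULE = re.compile(r"^(.+/)?index\.html$")
WWW_COND = re.compile(r"^www\.example\.com", re.IGNORECASE)  # the [NC] host check
CANON_HOST = "www.example.com"

def apply_rules(host, path, query, index_cond):
    """Run the rules in order for one request; return the redirect target
    as (host, path, query), or None if no rule fires.
    path has no leading slash, as a RewriteRule sees it in .htaccess."""
    request_line = "GET /" + path + ("?" + query if query else "") + " HTTP/1.1"
    m = INDEX_RULE.search(path)
    if m and index_cond.search(request_line):   # index.html -> folder root
        return CANON_HOST, m.group(1) or "", ""
    if query:                                   # any query string -> strip it
        return CANON_HOST, path, ""
    if host and not WWW_COND.search(host):      # non-www -> www
        return CANON_HOST, path, ""
    return None

def redirect_chain(host, path, query, index_cond, limit=10):
    """Follow redirects until no rule fires; return the URLs visited."""
    hops, state = [], (host, path, query)
    while len(hops) < limit:
        target = apply_rules(*state, index_cond)
        if target is None:
            break
        hops.append("http://%s/%s" % target[:2])
        state = target
    return hops

sid = "PHPSESSID=d1df7a5b58659817c692854ed9c14ed6"
print(redirect_chain(CANON_HOST, "directory/index.html", sid, ORIGINAL_COND))
# ['http://www.example.com/directory/index.html', 'http://www.example.com/directory/']
print(redirect_chain(CANON_HOST, "directory/index.html", sid, IMPROVED_COND))
# ['http://www.example.com/directory/']
```

With the original condition the query-string rule fires first and the index rule only catches the second request; with the optional query-string group the index rule handles both in one hop.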
# Redirect requests for index.html in any directory to "/" in the same directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html(\?[^\ ]*)?\ HTTP/
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1? [R=301,L]
#
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#
# Redirect requests for resources in non-www domains to same resources in www domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Note that that whole rule can be coded more efficiently in only two lines as:
# Redirect requests for resources in non-www domains to same resources in www domain
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
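The two-line version works because the optional group lets the pattern match either the exact canonical host or an empty Host header, and the rule fires only when the pattern does not match. A quick Python check of that logic, again assuming Python's `re` matches this pattern the way Apache's PCRE does:

```python
import re

# RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# The "!" negates the match, so the redirect fires when this pattern FAILS.
CANONICAL_OR_BLANK = re.compile(r"^(www\.example\.com)?$")

def www_rule_fires(host: str) -> bool:
    """Return True if a request with this Host header would be redirected."""
    return CANONICAL_OR_BLANK.search(host) is None

for host in ["example.com", "www.example.com", "", "www.example.com:80"]:
    print(repr(host), www_rule_fires(host))
# 'example.com' True
# 'www.example.com' False
# '' False
# 'www.example.com:80' True
```

An empty Host header (old HTTP/1.0 clients) is left alone, which is what the separate `RewriteCond %{HTTP_HOST} .` line did in the three-line version, while a host with an appended port no longer slips through.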