Googlebot has found links to /forum//index.php?

Can't find code that works to return 301

         

AndyA

4:25 pm on Sep 26, 2006 (gmt 0)

10+ Year Member



Due to an incorrect config file for my forum, the board path was originally http:/ /mydomain.com/forum//index.php? etc. The double slashes are wrong, and Google is indexing both the /forum// and the /forum/ pages because my server returns a 200 OK for both paths. I've corrected the config file, but I'm certain this is causing yet another duplicate-content problem on my site, and I would like to return a 301 for all requests for /forum// and redirect them to the correct /forum/.

I found some code that jdMorgan posted to correct this:

# Fix double slashes in URL
RewriteCond %{REQUEST_URI} ^(.*)//+(.*)
RewriteRule .* /%1/$2 [R=301,L]

But I can't get it to work on my server. I have code in place to redirect requests for folder/index.html to eliminate the index.html, and to get rid of the www. in the domain. I'm wondering if there's a conflict somehow with these three conditions.

Any help would be appreciated. Googlebot is spending a LOT of time spidering my forum, and I'm certain this is a problem.

jdMorgan

7:05 pm on Sep 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try including the full URL in the redirect:

# Fix double slashes in URL
RewriteCond %{REQUEST_URI} ^(.*)//+(.*)
RewriteRule .* http://www.example.com/%1/$2 [R=301,L]

If that's not it, please describe how you tested, what the results were, and how those results differed from your expectations. Include relevant data from your server error log and access log.

Jim

AndyA

7:28 pm on Sep 26, 2006 (gmt 0)

10+ Year Member



Added full URL, still not working. Since the config file has been fixed, this is only a problem for someone who knows about the /forum//index.php path. (Like Googlebot.) When I type in the URL with the double slash, I would like for the server to redirect to the same URL with just one slash, sending a 301 code.
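
The behavior being asked for here can be pinned down with a small sketch. It is written in Python rather than as Apache configuration, and `wanted_redirect` is a hypothetical helper name, not part of the forum software or the server. It only states which URL-paths should get a 301 and where they should point, assuming the query string is handled separately and left untouched.

```python
import re
from typing import Optional


def wanted_redirect(path: str) -> Optional[str]:
    """Return the single-slash form of `path` if it contains a run of
    two or more slashes, or None if no redirect is needed."""
    collapsed = re.sub(r"/{2,}", "/", path)
    return collapsed if collapsed != path else None


print(wanted_redirect("/forum//index.php"))  # /forum/index.php
print(wanted_redirect("/forum/index.php"))   # None
```

Any working rewrite rule should agree with this mapping for the paths that show up in the access log.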

There are no errors in the log being generated. I did delete the .php? to see if that would generate an error, and it returned a 404, so it would appear that is working properly.

Here's a section of the log:

/forum//index.php?
Http Code: 200 Date: Sep 26 14:10:26 Http Version: HTTP/1.1 Size in Bytes: 9386
Referer: -
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/tile_back.gif
Http Code: 200 Date: Sep 26 14:10:26 Http Version: HTTP/1.1 Size in Bytes: 682
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/logo4.gif
Http Code: 200 Date: Sep 26 14:10:26 Http Version: HTTP/1.1 Size in Bytes: 7754
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/atb_help.gif
Http Code: 200 Date: Sep 26 14:10:26 Http Version: HTTP/1.1 Size in Bytes: 617
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/atb_search.gif
Http Code: 200 Date: Sep 26 14:10:26 Http Version: HTTP/1.1 Size in Bytes: 576
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/atb_members.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 685
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/atb_calendar.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 627
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/nav.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 1072
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/nav_m.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 53
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/tile_sub.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 672
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/bf_nonew.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 1412
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/spacer.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 43
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/lastpost.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 255
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
/forum//style_images/1/br_nonew.gif
Http Code: 200 Date: Sep 26 14:10:27 Http Version: HTTP/1.1 Size in Bytes: 1441
Referer: [mydomain.com...]
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

¦
¦
¦
***THIS IS WHERE I DELETED THE .php? TO SEE IF AN ERROR WOULD BE GENERATED:
/forum//index
Http Code: 404 Date: Sep 26 14:10:56 Http Version: HTTP/1.1 Size in Bytes: 1905
Referer: -
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Once you're in the /forum// path, everything generated from that point on has the double slashes. If you enter without the double slashes, you don't get any files with them.

jdMorgan

10:51 pm on Sep 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you have other working rewrite rules?

If not, then you will need to add the second line below, and possibly also the first line, both preceding any RewriteRule directives:


Options +FollowSymLinks
RewriteEngine on

Jim

jdMorgan

10:57 pm on Sep 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I found something else wrong with the code above, so here's the whole mess. Again, you may or may not need the first line:

Options +FollowSymLinks
RewriteEngine on
# Fix double slashes in URL
RewriteCond %{REQUEST_URI} ^/(.*)//+(.*)
RewriteRule .* http://www.example.com/%1/$2 [R=301,L]

The first line may cause an error if it is present but not needed, or needed but not present. It depends on your server set-up.

Also, there may be some differences between Apache 1.x and Apache 2.x -- What version are you on?

Jim

AndyA

1:20 am on Sep 27, 2006 (gmt 0)

10+ Year Member



Jim,

Apache 1.3.37 is what's on my server.

Options +FollowSymlinks
RewriteEngine on

This is listed in htaccess prior to the redirect conditions.

Made the changes, and still no go. I do have other conditions in htaccess that are working properly, this is the only one that is misbehaving now. I think some of the others were written by you as well. THANK YOU! You have been a huge help.

I noticed that when internal links are clicked, the double slash goes away, but if Googlebot enters the site from a double-slash link and then checks for the same URL with a single slash, or vice versa, it gets the same page. I'm pretty sure this is the cause of my indexing trouble with Google, which is why I feel the server must do a 301 redirect on the double slash regardless of what the entry URL is.

[edited by: AndyA at 1:26 am (utc) on Sep. 27, 2006]

jdMorgan

1:53 am on Sep 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Apache basically ignores double-slashes -- that is, it acts as if they were just a single slash. So the only function required is to redirect the search engine spiders to the correct URL to avoid a (minor) duplicate-content problem.

After thinking about it some more, I changed my mind. The code you first posted will work fine, and is actually better because it will catch a double-slash in the leading position, whereas the "correction" I posted won't.

I don't know why this won't work for you. The only things that come to mind are:

1) Flush your browser cache (Temporary Internet Files) before testing.
2) The code must be located where it will be executed for requests for /forum. If /forum is an aliased directory, then the code may have to go into that directory if it won't work in your Web root directory.
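
The difference between the two patterns can be checked outside the server. The sketch below uses Python's `re` module as a stand-in for Apache's regex engine, on the assumption that both treat these simple patterns the same way. The names `original`, `anchored`, and `captures` are made up for the illustration, and it says nothing about how Apache itself pre-processes the request.

```python
import re

original = re.compile(r"^(.*)//+(.*)")   # pattern from the first post
anchored = re.compile(r"^/(.*)//+(.*)")  # the later "correction"


def captures(pattern, uri):
    """Return the captured groups if `pattern` matches `uri`, else None."""
    m = pattern.match(uri)
    return m.groups() if m else None


# Double slash in the middle of the path: both patterns match.
print(captures(original, "/forum//index.php"))  # ('/forum', 'index.php')
print(captures(anchored, "/forum//index.php"))  # ('forum', 'index.php')

# Double slash in the leading position: only the original pattern matches.
print(captures(original, "//forum/index.php"))  # ('', 'forum/index.php')
print(captures(anchored, "//forum/index.php"))  # None
```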

Jim

AndyA

2:20 am on Sep 27, 2006 (gmt 0)

10+ Year Member



I changed the rewrite back to the original one, and put it in an htaccess file in the forum folder. It is redirecting now, but it goes back to the index page for the domain.

[my...] domain.com//?

That is what is shown in the browser address bar. Progress, at least.

AndyA

2:35 am on Sep 27, 2006 (gmt 0)

10+ Year Member



I think it may be working now. I'm not sure what happened, but I removed the rewrite from the htaccess, and when I tried a link with double slashes, it redirected back to the forum:

http:/ /mydomain.com/forum/?

I guess the ? at the end of the URL doesn't matter. I'll try a few more bad links tomorrow and see what it does. I'll let you know. Thanks again for your help, Jim.

jdMorgan

2:54 am on Sep 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you move the code into /forum/.htaccess, you'll have to modify the substitution URL:

Options +FollowSymLinks
RewriteEngine on
#
# Fix double slashes in URL
RewriteCond %{REQUEST_URI} ^(.*)//+(.*)
RewriteRule .* http://www.example.com/forum/%1/$2 [R=301,L]

Jim

AndyA

3:35 am on Sep 27, 2006 (gmt 0)

10+ Year Member



It's definitely working now. I tried one of the long URL strings generated by a thread, and it redirected and removed the double slashes, with a 301, and then loaded the correct thread with one slash. It just has the domain URL, no /forum/, and it's working for now, so I think I'll leave well enough alone.

I'll watch it for the next few days to make sure it behaves itself.

Thanks again, Jim. You truly are a lot of help and a genius with these redirects!

Peter

1:23 pm on Sep 27, 2006 (gmt 0)

10+ Year Member



There is something I don't understand in this rule, which has carried through from the first post: why are we rewriting to /%1/$2 and not to /%1/%2?

Am I right in thinking that $2 is in fact undefined in this rule?

Peter.

jdMorgan

1:35 pm on Sep 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Peter,

Yes, correct.

It's carried through from the first post because you're the only one who noticed it!


RewriteRule .* http://www.example.com/forum/%1/%2 [R=301,L]

would be correct.

I must have annoyed the typo-deities this week...
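
In mod_rewrite, $N refers to a parenthesized group in the RewriteRule pattern, and %N to a group in the last matched RewriteCond pattern. A rough way to see why $2 has nothing to refer to is to count the groups in each pattern. The sketch uses Python's `re` module as a stand-in for Apache's regex engine, and the variable names are only illustrative.

```python
import re

rule_pattern = re.compile(r".*")            # RewriteRule pattern: no parentheses
cond_pattern = re.compile(r"^(.*)//+(.*)")  # RewriteCond pattern: two groups

uri = "/forum//index.php"

print(rule_pattern.groups)               # 0, so $1 and $2 have nothing to capture
print(cond_pattern.groups)               # 2, so %1 and %2 are available
print(cond_pattern.match(uri).groups())  # ('/forum', 'index.php')
```

The text after the double slash lands in the second RewriteCond group, which is what %2 picks up.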

Jim

AndyA

2:35 pm on Sep 27, 2006 (gmt 0)

10+ Year Member



I made the change Peter pointed out, and it is still working perfectly. Googlebot is going crazy in my forum, spent a lot of time there last night and is back this morning.

Hopefully, this will correct another (albeit minor) issue with internal linking.

It's truly amazing what you can do with rewrites. Thanks again to everyone for the assistance.