Does anyone have experience with the "URLs followed" area? There was nothing in the help section that fit this situation. One thought I had was to remove the redirect and let the six links land on the 404 page, which, as far as I know, Googlebot has not had a problem with yet.
Thanks.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /billy('|\%(25)*27)s-stuff\.ht [NC]
RewriteRule ^billy http://www.example.com/billys-stuff.html [NC,R=301,L]
To explain the pattern:
The requested URL-path starts with "/billy", followed by either an unencoded single quote "'", an encoded single quote "%27", or a multiply-encoded sequence in which the "%" itself has been encoded (as "%25") one or more times, followed by "27", the code for an apostrophe. So this should catch "'", %27 (singly-encoded), %2527 (doubly-encoded), %252527 (triply-encoded), and so on.
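To make that concrete, here is the same rule annotated with sample request lines, using the hypothetical "billy" URLs from above. THE_REQUEST holds the raw request line exactly as the client sent it, before any percent-decoding, which is why the encoded forms are visible to the pattern at all. The list of matching and non-matching lines is worked out by hand from the regex, as a sketch:

```apache
# THE_REQUEST values this condition matches:
#   GET /billy's-stuff.html HTTP/1.1          raw apostrophe
#   GET /billy%27s-stuff.html HTTP/1.1        encoded once
#   GET /billy%2527s-stuff.html HTTP/1.1      encoded twice (%25 is "%")
#   GET /billy%252527s-stuff.html HTTP/1.1    encoded three times
# Not matched, so the redirect target cannot loop back into this rule:
#   GET /billys-stuff.html HTTP/1.1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /billy('|\%(25)*27)s-stuff\.ht [NC]
RewriteRule ^billy http://www.example.com/billys-stuff.html [NC,R=301,L]
```

The last comment is the important design point: because the corrected URL has no apostrophe in any form, it fails the condition, and the 301 cannot redirect to itself.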
Hopefully, that will catch all cases, so you don't have to worry about it.
Important: this forum's software turns solid pipe characters into broken pipes "¦". If you copy a pattern from a post here, replace each "¦" with a solid pipe "|" before use.
I put the start of the URL-path, "billy", into the RewriteRule pattern as well. mod_rewrite tests the RewriteRule pattern before it evaluates that rule's RewriteConds, so this keeps the server from wasting time processing the RewriteCond when the URL doesn't start with a path-part indicating that it might need to be corrected. Whatever your URLs are, put the characters up to the first single quote into the RewriteRule pattern to make it as selective as possible, so your server won't waste time on unnecessary checks of the RewriteCond.
To prevent problems in the future, don't use any characters except a-z, A-Z, 0-9, hyphen, and underscore in URLs. Just don't do it. That's how this trouble gets started, because the URI specification does not give webmasters complete freedom to choose the URL character set. See RFC 2396 (since superseded by RFC 3986) for more information: some characters are allowed in URLs, some are allowed in query strings appended to those URLs, and some are not allowed at all and must be encoded if they are used.
Jim
[bbbbbbb-bbb.bbb...]
[bbbbbbb-bbb.bbb...]
[bbbbbbb-bbb.bbb...]
[bbbbbbb-bbb.bbb...]
[bbbbbbb-bbb.bbb...]
[bbbbbbb-bbb.bbb...]
They redirect to [bbbbbbb-bbb.bbb...]
Below are the original redirects in my .htaccess file. They are there because another search engine used to come looking for these three variations. The last one seems to be the source of the URLs Google came up with:
RewriteRule ^vv/firstname_lastname's_sssssssss_sign_-_backwoods_ggggggg$ /vv/firstname_lastname_sssssssss_signs.htm [R=301,L]
RewriteRule ^vv/firstname_lastname's_sssssssss_sign_-_getaway_&_mmmmmm$ /vv/firstname_lastname_sssssssss_signs.htm [R=301,L]
RewriteRule ^vv/firstname_lastname's_sssssssss_sign_-_great_dddddddd$ /vv/firstname_lastname_sssssssss_signs.htm [R=301,L]
The redirects in the .htaccess file worked well until Google picked up the variations. Is it possible to use what you had without the .htm? I tried removing \.ht, but the requests still just went to the 404 page.
I hope I haven't confused you too much with all the variations. I have no idea how Google came up with them.
Yes, I know I should never have used the underscore in the site's URLs. But they date from 1998, when there was no clear answer as to which was better to use, the hyphen or the underscore. It has never been a problem before, and now the site is too large to change. I've learned that search engines have a long memory: they will still look for URLs that haven't existed for years.
I did change the ¦ characters to solid pipes.
Thank you again for all the help.
Rhoda
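On the question of dropping the \.ht: a minimal sketch of the THE_REQUEST approach adapted to the placeholder URLs above, assuming every variant begins with vv/firstname_lastname followed by an apostrophe (raw or percent-encoded) and then s_sssssssss_sign. The billy-specific parts have to change, not just the \.ht. With no \.ht anchor, the condition is a prefix match, so it covers all three "..._sign_-_..." variants; www.example.com stands in for your own canonical host.

```apache
# Hypothetical adaptation: the vv/firstname_lastname... names are the
# placeholders from the post above, not real URLs.
# No \.ht at the end, so any request starting with this prefix matches.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /vv/firstname_lastname('|\%(25)*27)s_sssssssss_sign [NC]
# Cheap pre-filter: only paths starting with vv/firstname_lastname
# ever reach the condition above
RewriteRule ^vv/firstname_lastname http://www.example.com/vv/firstname_lastname_sssssssss_signs.htm [NC,R=301,L]
```

The corrected URL has an underscore where the apostrophe was, so it fails the condition and cannot loop. This only helps if Google's variants keep that exact prefix; if the leading words differ, a looser pattern is needed.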
Here's an example:
# If requested URL-path is not exactly as desired
RewriteCond %{REQUEST_URI} !^/vv/firstname_lastname_sssssssss_signs\.htm$
# then redirect to correct it
RewriteRule ^vv/firstname.+lastname.+s.+sssssssss.+sign http://www.example.com/vv/firstname_lastname_sssssssss_signs.htm [R=301,L]
Jim
The second suggestion worked great. I found one other combination of file requests that also worked with this solution. A couple of others looked like they would work but didn't, because the first word, which needed to be the same in all the variations, was missing from one of them, or appeared in two but not the third.
The timing was good, because overnight Googlebot came up with another variation using capitals. I have a section in the .htaccess file that removes capitals, so this new variation was also taken care of. But now the question is where to place things. I have the capitals section in the middle of the .htaccess file, but I don't know if that is where it really belongs. Are the rules in the correct order to process quickly? So far, everything in the file works.
Processing speed of the .htaccess file is not an issue yet, but I often wonder whether I've used the right order, or overcomplicated a rule because I based it on an example that worked but might not need as many steps as I've used. I have sections for file redirects, case reduction, your new suggestion, directory changes, www to non-www, fixing double slashes, spaces to underscores, removing characters after .htm, and finally the 404 redirect to the error page.
Thanks again for the help. I'll be watching to see how long it takes googlebot to remove this new error it found.
Rhoda
Following the external redirects, place your internal rewrites, again in order from most-specific to least-specific.
There are many cases where similarly-specific rule patterns are mutually exclusive. In those cases, it doesn't matter what order you put those rules in, so I'd recommend putting the most-likely-to-be-executed rule first.
Finally, a review of the whole thing, with the thought in mind, "Does this order make sense?" is the best protection against unexpected/unexplainable results.
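As a sketch only, a skeleton that follows that ordering. www.example.com is the canonical host, as in the earlier example (reverse the host rule for a non-www site), and old-dir/, new-dir/, widgets/, widget.php and /404.htm are hypothetical names:

```apache
RewriteEngine On

# 1. External redirects, most-specific first: fixes for individual
#    malformed URLs such as the apostrophe variants
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /billy('|\%(25)*27)s-stuff\.ht [NC]
RewriteRule ^billy http://www.example.com/billys-stuff.html [NC,R=301,L]

# 2. Less-specific external redirects, e.g. a renamed directory
#    (old-dir and new-dir are hypothetical)
RewriteRule ^old-dir/(.*)$ http://www.example.com/new-dir/$1 [R=301,L]

# 3. Least-specific external redirect: canonical hostname. The empty
#    alternative avoids a loop for HTTP/1.0 requests with no Host header.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 4. Internal rewrites (no R flag), again most-specific first
#    (widgets/ and widget.php are hypothetical)
RewriteRule ^widgets/([a-z0-9-]+)\.htm$ /widget.php?name=$1 [L]

# ErrorDocument is a core directive, not part of mod_rewrite, so its
# position in the file does not matter. Use a local path: a full http://
# URL makes Apache answer with a redirect instead of a 404 status.
ErrorDocument 404 /404.htm
```

Putting the specific fixes ahead of the hostname rule means a mangled URL requested on the wrong host is corrected in one 301, because each specific target already names the canonical host.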
Jim
Next up come slightly less specific rules; again, force the www at the same time, and so on.
Usually, about this time you'll drop in the rule for index file canonicalisation, and again force the www at the same time for those.
The non-www to www redirect is always the last one (in my experience).
After the redirects, list any internal rewrites. Again, the most specific ones should come first.
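A minimal sketch of that recipe, assuming www.example.com as the canonical host and a hypothetical old-page.htm/new-page.htm pair. Every redirect target names the www host directly, so a request for http://example.com/old-page.htm is fixed in one 301 instead of two chained redirects:

```apache
RewriteEngine On

# Most-specific redirect first; the target already includes the www host
# (old-page.htm and new-page.htm are hypothetical)
RewriteRule ^old-page\.htm$ http://www.example.com/new-page.htm [R=301,L]

# Index file canonicalisation: /dir/index.htm or /dir/index.html -> /dir/
# on the www host. THE_REQUEST is the client's original request line, so
# this does not fire when mod_dir internally serves index.htm for /dir/,
# which would otherwise cause a redirect loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]

# Last redirect: any remaining non-www request goes to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Internal rewrites (no R flag) follow here, most-specific first
```

The pattern index\.html? covers both .htm and .html, which suits a site like the one described above that uses .htm throughout.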
[seems that jd types quicker]