
5 redirects and counting

Need help!

         

Lorel

12:02 am on Oct 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I need advice on redirects for file names that have changed about 5 times over the last 3-4 years and were never properly redirected. This is so confusing that it took me hours to figure out the sequence and write it out. The site doesn't rank for anything but the domain name, and I'm sure that's due to the convoluted redirects and file name changes.

Many years ago I built a small website of about 12 pages with simple URLs, everything located in the root, like this:

ORIGINAL DESIGN:
index.htm
about.htm
authors.htm
products.htm
product1.htm
product2.htm
product3.htm
etc.

(each file name was about 3-4 words long)

Someone talked the owner into redesigning the site in WordPress so they could update it themselves. The designer put all the product pages and all the pages with info about the site into 2 different directories. Only index and about were left in the root.

REDIRECT #1:
They set up a development site that was not NOINDEXed like this:

dev.example.com/all-products/general/name-of-file-using4-8words-with-hyphens/

This redirect wasn't added until after all the other redirects mentioned below, because I asked why they hadn't redirected from the original files. It was also placed below the following redirects instead of above them (and even then they only redirected the products, not the other pages).

REDIRECT #2:
Once the site went live they set up the new URLs like this

example.com/all-productsproducts/name-of-file-using4-8words-with-hyphens/

(notice the "general" folder from the dev site was left out, and the dev URLs were not redirected)

REDIRECT #3:

They later moved all products into a products folder like this:

example.com/products/name-of-file-using4-8words-with-hyphens/

This isn't too bad, but the filenames are unnecessarily long, and there is no need to put these pages in directories when there are only about 12 of them and that number isn't likely to grow. So I want to change all the files back to their original names, without directories.

Notice they also changed all the URLs to extensionless.

The owner asked me to take the site back over once I told her about the faulty redirects, and to move it off WordPress, since they don't use it and want me to load everything from now on. I have it all done and almost ready to load.

I know there should only be one redirect in htaccess from each old URL to the new one.

My question is: since it's been over a year since the site was redesigned and Google has all the new pages indexed, is it safe to redirect from the original filenames to my new filenames? And if so, what do I do where the original and new file names are the same?

It seems to me that if I do that, Google might penalize the site again for duplicate content, because some of the redirects would be left out (the content has remained basically the same through all the redirects).

Or should I redirect from the URLs Google has indexed over the last year to my new (old) file names and locations?

Please help! I really wish I hadn't taken on this project now.

I hope I made sense above.

lucy24

12:32 am on Oct 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google doesn't care what filenames you use. Honestly. It may be one of the 11,000 microfactors in their algorithm, but it's so far down the priority list you shouldn't even think about it. (Careful! That's URLs, not links. Getting to a given place in two steps rather than seven is almost always better.)

Pick the names you yourself like best. If using a CMS, it may be better to use something that will work "out of the box" instead of requiring extra plugins that may or may not continue to work.

Once you've picked your favorite names, figure out all other URLs that each page has ever gone by, and redirect comprehensively. (The following is a rough paraphrase of an actual redirect chain spanning several years.)

RewriteRule ^paintings/(more_rats|LucysRats2|other-rats)\.html http://www.example.com/paintings/morerats/ [R=301,L]

Sometimes it will be less confusing to list each URL on a line of its own. It doesn't especially matter, as long as each one leads to a single redirect.

LucysRats2 >> morerats
+
morerats >> refrats
=
(morerats|LucysRats2) >> refrats
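Applied to the URL structures in the first post, a per-page version might look like the sketch below. It is only a sketch under assumptions: the htaccess sits in the live site's root, the second structure was /all-products/<name>/ (the "all-productsproducts" path quoted above may be garbled), www.example.com is the canonical host, and the page names are hypothetical. Because the long WordPress names don't match the short original names, each page gets its own line instead of a $1 backreference.

```apache
RewriteEngine on
# One line per page: every legacy URL for that page reaches the final URL in one hop.
# Hypothetical names: blue-widget-for-small-gardens was product1.htm before the redesign.
RewriteRule ^(all-products|products)/blue-widget-for-small-gardens/?$ http://www.example.com/product1.htm [R=301,L]
RewriteRule ^(all-products|products)/red-widget-for-large-gardens/?$ http://www.example.com/product2.htm [R=301,L]
```

Whether a long name is listed under one legacy directory or both doesn't matter, as long as each old URL leads to a single redirect.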

In some cases, such as /dev/ directories, it may be better to serve a 410 at the outset to encourage search engines in the belief that this particular URL pattern is gone, gone, gone.
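A sketch of that for the dev host in this thread, assuming dev.example.com is served from the same document root and reads this htaccess (if it has its own root, put the rule there and drop the condition). It would need to come before any hostname canonicalization rule, otherwise the dev host would be redirected instead of getting the 410.

```apache
RewriteEngine on
# Any request arriving on the old development hostname gets "410 Gone".
RewriteCond %{HTTP_HOST} ^dev\.example\.com$ [NC]
RewriteRule ^ - [G]
```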

phranque

2:58 am on Oct 23, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



your redirects should be ordered from most specific to most general, with the last redirect being a hostname canonicalization redirect.

all redirect substitution strings should specify the canonical protocol and hostname.

you should have a section for each of the previous url structures that redirects requests for those legacy urls to the current url structure.

all requests should redirect to the canonical url in one hop.
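A rough skeleton of that ordering, for a site going back to root-level .htm files. The directory names, page names and www.example.com are assumptions for illustration only. Because every target already carries the canonical protocol and hostname, a request like example.com/about/ goes straight to http://www.example.com/about.htm in one hop instead of being canonicalized first and redirected again.

```apache
RewriteEngine on

# 1. Most specific: legacy directory URLs with a known new name (hypothetical)
RewriteRule ^(all-products|products)/?$ http://www.example.com/products.htm [R=301,L]

# 2. Less specific: legacy extensionless URLs whose .htm file exists under the same name
#    ($1 in the RewriteCond is the name captured by the RewriteRule pattern)
RewriteCond %{DOCUMENT_ROOT}/$1.htm -f
RewriteRule ^([^./]+)/?$ http://www.example.com/$1.htm [R=301,L]

# 3. Most general, always last: hostname canonicalization
#    (an empty Host header is left alone; on a local test server this sends you to the live host)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```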

wilderness

12:46 pm on Oct 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The owner asked me to take the site back over once I told her about the faulty redirects, and to remove it from wordpress as they don't use it and want me to load everything from now on. I have it all done, and almost ready to load.


I was recently presented with a dilemma where a very small site had been created with WP. The WP directory structure and file organization templates are designed to be one-size-fits-all, for everything from novice sites to very extensive websites, but they were intended for complex blogs rather than functioning websites. In this day and age, with so many, many blogs, people have become addicted to WordPress.

The easiest solution for me was simply to re-create the complete website (similar to what was required when many folks created websites with the old MS FrontPage).

FWIW, many, many people have become spoiled by the longer file names that newer operating systems allow. We were much better off (at least from an organizational standpoint) when file names were restricted to eight characters for the name and three for the extension (early DOS).

Lorel

5:47 pm on Oct 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



#lucy24

Can you explain what the ( and | stand for? I'm not sure how to write out the redirect chain and I've searched all over the Internet for instructions and can't find it.


RewriteRule ^paintings/(more_rats|LucysRats2|other-rats)\.html http://www.example.com/paintings/morerats/ [R=301,L]

Lorel

5:48 pm on Oct 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



PS. The finished site doesn't use a CMS. It's just plain old .htm, like it was in the beginning.

wilderness

6:10 pm on Oct 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you explain what the ( and | stand for?


The parentheses enclose a string in its own container (a group).
The | means "or".

lucy24

7:51 pm on Oct 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you explain

RewriteRules always use Regular Expressions. There are tutorials all over the place, but you can start with some basics:
-- () parentheses enclose a group. The group is then treated as a unit-- "this bit is optional" or "this sequence may occur more than once"-- and can also be captured for reuse later, so
(blahblah)
in the pattern (the left side of the rule) becomes
$1
in the target (the right side of the rule)
-- | pipes mean options. You can have a single pipe "A or B" or a series of them "one of these options". In practice, piped alternatives will almost always be enclosed in parentheses because there will be non-piped material before or after.
-- certain characters have special meaning, so they have to be "escaped" with preceding \ (backslash) if you mean the literal character. Most common is
. = any one character
vs
\. = a literal period
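As a concrete instance of the capture-and-reuse point, here is a sketch for the simple case where a page kept its name and only gained a directory. The /products/ directory, the .htm target and www.example.com are assumptions; this only helps where the old and new names match, which the long WordPress names in this thread don't.

```apache
RewriteEngine on
# ([^/.]+) captures the page name: one or more characters that are neither "/" nor ".".
# $1 in the target puts that captured name back.
# Hypothetical example: /products/product1/ -> http://www.example.com/product1.htm
RewriteRule ^products/([^/.]+)/?$ http://www.example.com/$1.htm [R=301,L]
```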

When I say
(morerats|LucysPics2|refrats) >> refrats

I really mean something like
RewriteRule ^paintings/(morerats|LucysPics2|refrats)\.html http://www.example.com/paintings/refrats/ [R=301,L]

which in turn means "If the person asks for
paintings/morerats.html
or
paintings/LucysPics2.html
or
paintings/refrats.html
then send them along to
/paintings/refrats/

The form reflects at least three cascading redirects: first from the dopey pagename /LucysPics2.html (yes, I had a series of them, 1 through 7, now variously handled) to the more useful pagename /morerats.html, which in turn was later changed to /refrats.html, and finally it was made into a directory page, /refrats/, instead.

(Aside: None of these renamings were functionally necessary, since the user-visible page and the linking never changed. But I generally prefer to let URLs coincide with physical filenames, which in turn should be something I have a fair chance of remembering. What did I mean by "LucysPics4"? I now have no idea.)

Lorel

3:48 pm on Oct 24, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Lucy and Wilderness.

I'll give that a try.

Lorel

7:20 pm on Oct 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK, I have MAMP installed and working. I loaded my client's website in MAMP/htdocs and can access the pages.

I left the extension on all the files, but for testing I left it off one link on the index page. When I opened the home page and clicked that link, it didn't work.

I tried both of the code blocks listed below (the first from a previous thread, the second jdMorgan's solution to the same issue from 2009), and neither one works (I edited the redirect to use the actual domain name in the original).

I get this error:
The requested URL /example.com/filename was not found on this server.

Can someone tell me what might be wrong?

Here is the top of my htaccess file:

Options +Includes
Options +FollowSymLinks
RewriteEngine on



I commented out the first block so I could try the second:

#
#for extensionless URLs
#RewriteBase /
# check for file not existing
#RewriteCond %{REQUEST_FILENAME} !-f
# check for directory not existing
#RewriteCond %{REQUEST_FILENAME} !-d
# check for filename not ending in .htm to avoid looping
#RewriteCond %{REQUEST_FILENAME} !\.htm$
# if the above conditions are met (no matching file, dir or .htm file)
# then rewrite to a .htm file
#RewriteRule ^(.*)$ /$1.htm [L]


#to get rid of old .htm URLs, to canonicalize new extensionless URLs
# by removing trailing slashes, and to rewrite new extensionless URLs
# to the correct .htm files.
RewriteEngine on
RewriteBase /
#
# Redirect direct client request for old URL with .htm extension
# to new extensionless URL if the .htm file exists
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*[^.\ ]+\.htm\ HTTP/
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(([^/]+/)*[^.]+)\.htm$ http://www.example.com/$1 [R=301,L]
#
# Redirect any request for a URL with a trailing slash to extensionless URL
# without a trailing slash unless it is a request for an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Internally rewrite extensionless URL request
# to .htm file if the .htm file exists
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.htm [L]

phranque

7:44 pm on Oct 28, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



your document is probably linking to a URL that is missing the protocol specification and therefore the browser interpreted the hostname as relative to the root directory.

Lorel

7:55 pm on Oct 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



oppps. I posted the above post in the wrong thread. Please remove if you want.

Lorel

7:57 pm on Oct 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



#phranque are you saying I should use full urls?

phranque

8:02 pm on Oct 28, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



or use the leading double slash if you want a protocol-relative URL

Lorel

8:03 pm on Oct 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I tried the full URL on that link, and while it brought up the page without the extension, the page was empty and showed a not-found message, so it must be a 404. This is odd, as I have a custom 404 error page and this is not it.

Lorel

8:06 pm on Oct 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I tried just the http://filename

and that brought up a different error (server not found).

[edited by: phranque at 9:41 pm (utc) on Oct 28, 2014]
[edit reason] unlinked url [/edit]

lucy24

8:30 pm on Oct 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



First question: Is your htaccess (the real one, with the leading dot) recognized by the MAMP installation? If not, you will have to be brave and edit the config file. Since it's all happening on your own computer, go ahead and say
AllowOverride All
for the highest possible directory (<Directory /some-name-here> section). This may depend on OS; I had one form that didn't work in 10.6 but does in 10.9.
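For reference, the config-file change might look like this in MAMP's httpd.conf. The path is an assumption (it's where MAMP usually puts htdocs, so check yours), and Apache has to be restarted from MAMP for the change to take effect. mod_rewrite must also be loaded; MAMP's httpd.conf normally includes its LoadModule line already.

```apache
# Allow .htaccess files in the MAMP document root to override everything,
# including Options and the RewriteEngine/RewriteRule directives.
<Directory "/Applications/MAMP/htdocs">
    AllowOverride All
</Directory>
```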

are you saying I should use full urls?

That would kinda blow the point of even using MAMP, because you want your local files to be exactly the same as your "live" files. Leading slash always means "stay on the present host", whatever it may be.

I tried the full url on that link and while it brought up the page without the extension on it the page was empty and had a not-found message, so must be a 404. This is odd as I have a custom 404 error page and this is not it.

By "brought up the page" do you simply mean that the browser's address bar ended up showing the intended URL? It sounds as if you got the generic Apache 404 message, which is what MAMP will give you. Possible reasons for your custom 404 page not to show up include:
-- you don't have it in the same location (relative to the root) on your local site as on your live site
-- the htaccess file you're using doesn't include the ErrorDocument directive
-- as discussed earlier, your MAMP installation doesn't recognize htaccess files (assuming there is one, and it contains the ErrorDocument line)
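If the ErrorDocument line turns out to be the missing piece, it's a one-liner in the root htaccess. The name /404.htm is hypothetical. Use a local path with a leading slash rather than a full http:// URL, because with a full URL Apache sends the visitor a redirect instead of a 404 status.

```apache
# Serve the custom not-found page for any 404 on this site.
ErrorDocument 404 /404.htm
```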

The form
http:/ /filename
will never get you to /filename, because your computer is now looking for a website called "filename" -- and, understandably, not finding it. That's why the browser says "server not found". What you need instead is
/filename
with leading slash. This will work if and only if "filename" is side-by-side with your domain's index page.

If the physical file has no extension, browsers won't know what to do with it: treat it as html? serve it as a plain-text file? try to parse it with some language? That's why you installed MAMP in the first place, right? So you could say something like
RewriteRule ^([^.]+)$ /$1.htm [L]

and try it out locally. This should work identically on MAMP and on your live site, always assuming that the htaccess is recognized in the first place.

Yes, of course you can disregard htaccess and proceed directly to the config file now that you've got one. But I don't think it is the most appropriate way to proceed (other thread) in most situations.