Forum Moderators: phranque

mod-rewrite problems: broken images and end slash

         

gcan

3:14 pm on Mar 23, 2008 (gmt 0)

10+ Year Member



This is my .htaccess file:

==================================
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /
RewriteRule ^uid/([0-9]+)$ profile.php?uid=$1 [NC,L]
==================================

There are 2 problems with it:

1) All images appear broken because I am using relative paths like <img src="images/image.jpg">.

When using this rewrite rule, the server wants to show images from this folder:

mydomain.com/uid/images

instead of the correct folder:

mydomain.com/images

Is there any way to fix this problem without changing the paths of all images to absolute?

2) The second problem is the end slash. The rewrite rule works only without an end slash:

[myserver...]

This address:

[myserver...]

produces a "Not Found" error.

Any help will be appreciated.

jdMorgan

4:00 pm on Mar 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remember that it is the browser which resolves relative links. Because the browser thinks the URL is (for example) http://example.com/uid/60, it correctly resolves page-relative links by removing the page-path (60) and appending the page-relative path that you have specified (images/logo.gif), producing the address from which it will request the image: http://example.com/uid/images/logo.gif.

In other words, given your code, this is the expected behaviour.

You'll need to fix all of your image links to point to the correct directory. You could "patch" this using mod_rewrite, but the result would be duplicate content -- both the /uid/images/logo.gif and /images/logo.gif URLs would point to the same image.

Rather than specifying a full canonical URL (including the domain) in every image link, I'd suggest using server-relative links instead of page-relative links. This will fix the problem you describe while still allowing you to use the shorter link format. So, instead of
<img src="images/logo.gif">
or
<img src="http://example.com/images/logo.gif">
you could use:
<img src="/images/logo.gif">
or, if you like:
<img src="../images/logo.gif">

Alternatively, you could detect 'bad' image subdirectory requests and rewrite them, but this can result in a duplicate-content problem.
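
As a hedged illustration of that last option, a patch rule might look like the sketch below. It assumes the stray requests always arrive under /uid/images/ and that the real files live in /images/; adjust both to your own layout.

```apache
# Sketch only: assumes broken image requests look like /uid/images/<file>
# and that the real files live in /images/.
# An internal rewrite serves the same image from either URL, which is the
# duplicate-content case described above.
RewriteRule ^uid/images/(.+)$ /images/$1 [L]
```

Changing the flags to [R=301,L] would turn this into an external redirect, which avoids the duplicate URL at the cost of a second request for every image. Fixing the links themselves remains the cleaner solution.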

---

The other problem is that we as Webmasters are not free to add or remove slashes as we wish.
example.com/uid/60 refers to a file named "60" in the /uid folder.
example.com/uid/60/ refers to a directory -- that is, to the directory index or index page of the /uid/60/ folder.

These are two different URLs pointing to two different directory levels.

There is no way to tell which URL the search engines will list, and the two URLs will compete with each other for ranking. I'd suggest using the non-slashed version for compliance with the intent of the HTTP URL addressing format.

After fixing your links and making a good attempt to get other Webmasters to fix their links to your site, you can add:


RewriteRule ^uid/([0-9]+)/$ http://example.com/uid/$1 [R=301,L]

Put this above your existing internal rewrite code to fix problems with search engine listings and duplicate content. However, this is a final step, and this RewriteRule by itself is not sufficient to prevent problems. As long as "incorrect" links exist, you'll need to leave this redirect in place, and any request for the "incorrect" link will result in two HTTP requests to your server -- first the "incorrect" URL request, and then another request for the correct URL in response to your redirect.
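
Put together with the rules from the original post, the ordering would look roughly like this. The host name example.com is a placeholder; everything else is taken from rules already shown in this thread.

```apache
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /

# 1) External redirect first: strip the end slash from /uid/<number>/
RewriteRule ^uid/([0-9]+)/$ http://example.com/uid/$1 [R=301,L]

# 2) Internal rewrite second: map the slash-less URL to the script
RewriteRule ^uid/([0-9]+)$ profile.php?uid=$1 [NC,L]
```

With this order, a request for /uid/60/ is redirected to /uid/60, and the follow-up request for /uid/60 is then served by profile.php.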

Jim

g1smd

4:08 pm on Mar 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I were going there, I wouldn't start from here.

I much prefer that all image and file references always begin with a leading "/", and quote the full folder and filename path, so that I don't have to work out where something is relative to here.

Domain canonicalisation has already been taken care of elsewhere in the design, so the domain name is not needed on every link.

gcan

4:51 pm on Mar 23, 2008 (gmt 0)

10+ Year Member



jdMorgan and g1smd,

Thank you for your replies. I am going to add a leading "/" to all image references - it solves the problem. I hadn't realized that relative paths could cause problems.

Another question is about files located in the same folder as .htaccess.

This link generates a "Not Found" error:

<link rel="stylesheet" type="text/css" href="style2.css" />

When I add a leading slash, it works fine both with and without rewriting:

<link rel="stylesheet" type="text/css" href="/style2.css" />

All links have the same problem (all PHP files are located in the same folder as .htaccess).

Should I change all links this way:

<a href="/phpfile.php">

I can do this, but I have never noticed people using leading slashes before filenames located in the same folder. Is it safe to do so?

Thanks.

jdMorgan

5:36 pm on Mar 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Adding "/" to the href just means, "go to the root of the domain, and start there."

Browsers recognize all valid canonical (absolute) and relative links, so there's no "danger" of the browser getting confused.

Also, be careful about following "most people," because sadly, "most people" get things wrong. :(

Jim


gcan

3:14 pm on Mar 24, 2008 (gmt 0)

10+ Year Member



Thank you, jdMorgan.

I fixed all the links and added your suggested line to my .htaccess file. Everything works fine, but there are some strange redirects. The problem is that I have 4 "mirror" domains showing the same content. Some domains end with ".eu", some domains end with ".es". I will add more domains which will show the same content (.com and .de).

========================
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /

RewriteRule ^uid/([0-9]+)/$ [domain1.com...] [R=301,L]
RewriteRule ^uid/([0-9]+)$ profile.php?uid=$1 [NC,L]
========================

Now the following happens:

==================
DOMAIN1
==================

[domain1.es...] - (without endslash) working OK
[domain1.es...] - working OK (removes end slash)

[domain1.es...] - (without end slash) working OK

[domain1.es...] - (with endslash) redirects to [www....] domain1.es/uid/60

==================
DOMAIN2
==================

[domain2.eu...] - (without endslash) working OK

[domain2.eu...] - (with endslash) redirects to [www....] domain1.es/uid/60

[domain2.eu...] - (without end slash) working OK

[domain2.eu...] - (with endslash) redirects to [www....] domain1.es/uid/60

I am sorry for asking so many questions. Is there some way to fix the rewrite rules so that they only remove the end slash, without redirecting to a different domain? If a user who is logged in at www.domain1.es is redirected to domain1.es (without "www") or to another domain, that user will no longer be logged in.

jdMorgan

3:40 pm on Mar 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use the HTTP_HOST variable, but make sure it's not blank:

RewriteCond %{HTTP_HOST} .
RewriteRule ^uid/([0-9]+)/$ http://%{HTTP_HOST}/uid/$1 [R=301,L]

Now, a word about your plans: You should NOT attempt to promote multiple domains with the same content. With a few duplicate-content domains, your pages in those domains will compete with each other, and none will rank well in search. With many duplicate-content domains, you risk a real duplicate-content penalty from the search engines.

Depending on your "market area" I'd suggest either getting a single ".eu" or ".com" domain. Otherwise, the only safe way to do what you describe is to have each domain translated to the primary language of the country-code in the domain name, so that the pages are not duplicates. If you do this, the sites still compete against each other in global search, but you have no duplicate-content risk.

Unless you want to support multiple languages, I strongly suggest getting a generic .eu or .com domain, putting the site on that, linking to that domain only, and putting a 301-redirect in place to redirect all of the other domains to that single domain. Make one strong site instead of many weak ones fighting each other.
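
A minimal sketch of such a redirect, assuming www.example.com is the single domain you decide to keep (a placeholder name, not one from this thread) and that all of the other domains are served from the same .htaccess file:

```apache
RewriteEngine On

# Assumption: www.example.com is the one canonical host name.
# Skip requests with a blank Host header, then redirect any other host.
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

This belongs with the other external redirects, ahead of any internal rewrites. The query string, if any, is carried over automatically.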

Jim

gcan

6:10 pm on Mar 24, 2008 (gmt 0)

10+ Year Member



jdMorgan, thank you very much for your help. Now rewriting works perfectly.

No, I am not going to promote multiple domains with the same content. Actually, the content will be the same but in different languages. Depending on the domain, the PHP script will show the appropriate content - English content for the .eu domain, Spanish content for the .es domain, German content for the .de domain.

Thanks for the warning about penalties from the search engines. I had never heard about penalties like this.

jdMorgan

4:48 pm on Mar 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is a lot of "fear" about duplicate-content penalties posted in threads on Webmaster forums. Most of it is unfounded, except in cases where the duplication is massive and seems (to the search engines) to be intentional.

The biggest problem is that multiple pages with the same content will compete with each other for position in the search results, and that the winner is selected by the search engines, and not by the Webmaster.

If you've got multiple languages, then duplicate-content is not so much of a problem. Your only concern should be to make sure that you don't have duplicates like:

http://www.example.com/
[example.com...]
http://example.com/
[example.com...]

http://www.example.com/index.php
http://example.com/index.php
[example.com...]
[example.com...]

http://www.example.com/index.php?name1=val1&name2=val2
http://www.example.com/index.php?name2=val2&name1=val1

http://example.com/index.php?name1=val1&name2=val2
http://example.com/index.php?name2=val2&name1=val1

[example.com...]
[example.com...]

[example.com...]
[example.com...]

You can see that if all of these URLs plus query strings are allowed to resolve to the same page, ranking problems would be likely. It's also obvious that if you use many query parameters, and those name/value pairs are allowed to be published in links in any order, that the duplicate-content "URL space" can quickly grow to enormous size.

So, it is a good idea to adopt a "one page, one URL" rule, and redirect all possible alternate URLs to the single canonical URL for that page.
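
As one hedged example of such a redirect, the sketch below folds the index.php duplicates into the bare "/" URL. It assumes www.example.com is the canonical host and that index.php is the directory index; both are placeholders.

```apache
# Sketch: redirect client requests for /index.php to /
# THE_REQUEST holds the original request line (e.g. "GET /index.php HTTP/1.1"),
# so this does not fire when the server itself maps "/" to index.php internally.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
```

Any query string is passed through unchanged. Re-ordering query parameters is awkward in mod_rewrite, so that part is probably easier to handle in the script that generates the links.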

Jim

g1smd

12:32 am on Mar 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are threads from a few years back where I mentioned fixing up a site with 40 000 genuine "pages" that exposed some 800 000 URLs for indexing. Care is needed to get the best results.

gcan

5:58 pm on Apr 19, 2008 (gmt 0)

10+ Year Member



jdMorgan, can you answer one more question?
I am removing end slashes as you suggested.

Below is a part of my .htaccess file:
==========================

RewriteRule ^directory/$ [%{HTTP_HOST}...] [R=301,L]
RewriteRule ^directory$ directory.php [NC,L]

RewriteRule ^directory/([0-9]+)/$ [%{HTTP_HOST}...] [R=301,L]
RewriteRule ^directory/([0-9]+)$ directory.php?do=cat&cid=$1 [NC,L]

RewriteRule ^directory/cat/([0-9]+)/$ [%{HTTP_HOST}...] [R=301,L]
RewriteRule ^directory/cat/([0-9]+)$ directory.php?do=subcat&cid=$1 [NC,L]

===================

Now my question: is it possible to remove the end slashes with one rewrite rule for all cases which start with "directory"?

directory/1
directory/cat/1
directory/subcat/1
etc

Thanks.

jdMorgan

1:47 pm on Apr 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes,

RewriteRule ^directory(.*)/$ http://%{HTTP_HOST}/directory$1 [R=301,L]

Oops -- This solution should have been fairly obvious, and I strongly recommend a review of the regular-expressions tutorial cited in our forum charter. For the sake of your site's health and ranking, please don't put any code on your server that you do not understand clearly and completely.
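
For illustration, here is how the block from the question might look with the three slash-removing redirects collapsed into that one rule. The internal rewrites are copied from the question, and the RewriteCond is the blank-host check suggested earlier in this thread. This assumes no real folder named "directory" exists, since otherwise the server's own trailing-slash handling for folders could conflict with the redirect.

```apache
# One external redirect: strip the end slash from anything under "directory"
RewriteCond %{HTTP_HOST} .
RewriteRule ^directory(.*)/$ http://%{HTTP_HOST}/directory$1 [R=301,L]

# Internal rewrites for the slash-less URLs (unchanged from the question)
RewriteRule ^directory$ directory.php [NC,L]
RewriteRule ^directory/([0-9]+)$ directory.php?do=cat&cid=$1 [NC,L]
RewriteRule ^directory/cat/([0-9]+)$ directory.php?do=subcat&cid=$1 [NC,L]
```

Here (.*) captures whatever sits between "directory" and the final slash, so /directory/cat/1/ is redirected to /directory/cat/1 and /directory/ to /directory.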

Jim

gcan

9:28 pm on May 1, 2008 (gmt 0)

10+ Year Member



Thank you, jdMorgan.

Does anyone know how much server resources rewriting takes?
I don't want to slow down my server. Is it OK to have about 30-40 rules?

g1smd

10:43 pm on May 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is likely that that many rules can be simplified into fewer, more general ones. Otherwise, let your scripts handle the URLs rather than using .htaccess for all of it.

gcan

9:56 am on May 2, 2008 (gmt 0)

10+ Year Member



g1smd, what do you mean? How can PHP scripts handle rewriting the way .htaccess does?
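
One common way to do what g1smd describes is a "front controller": a single rule hands every request that does not match a real file or folder to one script, and the script decides what to show. A minimal sketch, where router.php is a hypothetical file name:

```apache
RewriteEngine On
RewriteBase /

# Leave requests for real files and folders (images, CSS, etc.) alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else goes to one script; router.php is a hypothetical name.
RewriteRule ^ router.php [L]
```

The script can then read the requested path from $_SERVER['REQUEST_URI'] and pick the page itself, so the 30-40 patterns live in PHP rather than in .htaccess.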