Relative or Absolute - Argument has Escalated

internetheaven

3:19 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, the debate in the office is getting us nowhere and I was wondering if anyone knew where we could find some authority on the subject of Relative over Absolute links. Here are the three options:

1. Have all links on the page like this:

href="/page.html"
href="/folder/page.html"
href="/images/image.gif"

with base tag in the header:

<head>
<base href="http://www.example.com">
</head>

2. Have all links on the page like this:

href="../page.html"
href="../folder/page.html"
href="../images/image.gif"

with no base tag.

3. Have all links on the page like this:

href="http://www.example.com/page.html"
href="http://www.example.com/folder/page.html"
href="http://www.example.com/images/image.gif"

Obviously No. 1 would be best for file size, as some pages have a lot of links/images and the code can get huge on domain names alone. But we are trying to find out what errors each option can cause with search engine spidering and browser bugs, and which is best overall.
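To make the three options concrete, here is a minimal sketch using Python's `urllib.parse.urljoin`, which resolves a reference against a base URL in roughly the way RFC 3986 describes for browsers and crawlers. It assumes the links sit on a page at `http://www.example.com/folder/page.html`; that location is an illustration, not part of the original question. Option 2's `../` links only reach the root because the page is exactly one folder deep.

```python
from urllib.parse import urljoin

# Assumed location of the page containing the links (one folder deep).
PAGE_URL = "http://www.example.com/folder/page.html"

option1 = ["/page.html", "/folder/page.html", "/images/image.gif"]
option2 = ["../page.html", "../folder/page.html", "../images/image.gif"]
option3 = [
    "http://www.example.com/page.html",
    "http://www.example.com/folder/page.html",
    "http://www.example.com/images/image.gif",
]

def resolve_all(base, hrefs):
    """Resolve each href against the base URL, as a user agent would."""
    return [urljoin(base, h) for h in hrefs]

# Option 1: root-relative links, resolved against the <base href> from the post.
r1 = resolve_all("http://www.example.com", option1)
# Option 2: no base element, so the page's own URL acts as the base.
r2 = resolve_all(PAGE_URL, option2)
# Option 3: absolute URLs do not depend on the base at all.
r3 = resolve_all(PAGE_URL, option3)

for row in zip(r1, r2, r3):
    print(row)
```

Each printed row contains three identical URLs: the options differ in how much context the user agent needs to get there, not in the final address.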

Any help would be appreciated.
Thanks

tedster

4:47 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In example #2 -- which is most commonly seen -- you can get into so-called "canonical problems" unless you take extra steps and disciplines. You can avoid many of these problems by also including the base element in the head.

The base element is intended to be the reference from which relative links in the document are calculated. Therefore, in the best practice, the url in the base element should be the full, intended url of the document itself. At a minimum, the base href needs to include the path through the last subdirectory, but as I said, ideally it contains the full, absolute url of the page it appears on.
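A small sketch of how much of the base href matters, again assuming Python's `urllib.parse.urljoin` as a stand-in for a user agent's resolver: everything after the last `/` of the base path is dropped before a relative link is applied, so a base that stops at `folder` without a trailing slash behaves like the site root. The URLs and the `page2.html` link are hypothetical.

```python
from urllib.parse import urljoin

HREF = "page2.html"  # hypothetical document-relative link

# Full URL of the document itself (the recommendation above).
full_url = urljoin("http://www.example.com/folder/page.html", HREF)
# Path through the last subdirectory, with the trailing slash.
dir_with_slash = urljoin("http://www.example.com/folder/", HREF)
# Same directory but no trailing slash: "folder" is treated as a file name.
dir_no_slash = urljoin("http://www.example.com/folder", HREF)

print(full_url)        # resolves inside /folder/
print(dir_with_slash)  # resolves inside /folder/
print(dir_no_slash)    # resolves at the site root
```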

jatar_k

4:53 pm on May 12, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I always use the style in #1 without a base tag. I like my site to be portable, using a root relative url means if pages or sites move they still work.

I have never had any problems.

#3 would be my second choice: everything fully qualified. It is brutal when you have to move something, though.
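The portability argument can be sketched the same way, assuming Python's `urllib.parse.urljoin` models the browser's behaviour. A root-relative link resolves to the same path at any folder depth and follows the page when it moves to another host; the `staging.example.net` address is hypothetical. A fully qualified link (#3) would keep pointing at the old host, which is the "brutal" part of moving.

```python
from urllib.parse import urljoin

HREF = "/images/image.gif"  # root-relative link, as in option 1

# The same page at different depths, and after a move to a hypothetical new host.
locations = [
    "http://www.example.com/page.html",
    "http://www.example.com/folder/sub/page.html",
    "http://staging.example.net/folder/page.html",
]

resolved = [urljoin(loc, HREF) for loc in locations]
for loc, url in zip(locations, resolved):
    print(f"{loc} -> {url}")
```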

pageoneresults

4:56 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I use a combination of 1 and 3. Primary navigation links are full URIs. Internal (on page) links are root relative URIs. I would never use option 2 but that's just me. ;)

Most of what I do is option 3. When people save things to their local systems, I've caught quite a few who forgot to take out the URI references before posting their work. Lots of copycats out there. ;)

internetheaven

5:44 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for your input.

The base element is intended to be the reference from which relative links in the document are calculated.

Please go on. It seems as though I may have misunderstood the base tag. Does the base tag have anything to do with the links on the page?

And do you mean that on the page:

http://www.example.com/folder/page.html

you should have the base tag:

<base href="http://www.example.com/folder/page.html">

tedster

11:29 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And do you mean that on the page:
http://www.example.com/folder/page.html

you should have the base tag:
<base href="http://www.example.com/folder/page.html">

Yes, that's it exactly. See Path information: the BASE element [w3.org] at the W3C website.

The base element has everything to do with how relative links in the document are understood by a user agent. (I avoid using the word "page" because it isn't technically exact - consider iframes, for example. So I would rather use words such as "url" or "document" instead of "page".)
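For readers wondering how a user agent applies the base element, here is a minimal crawler-style sketch using Python's standard `html.parser` and `urllib.parse`. It is an illustration under stated assumptions, not how Google or any other search engine is known to work: it takes the first `<base>` with an `href` (the rule HTML uses), falls back to the fetched URL when there is none, and resolves each `<a href>` against that. `LinkCollector` and `resolve_links` are made-up names for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects <a href> values and the first <base href>, if any."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base_href is None and attrs.get("href"):
            # Only the first base element with an href counts.
            self.base_href = attrs["href"]
        elif tag == "a" and attrs.get("href") is not None:
            self.hrefs.append(attrs["href"])

def resolve_links(html, fetched_url):
    """Resolve every link against <base href> if present, else the fetch URL."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # A relative <base href> would itself be resolved against the fetch URL.
    base = urljoin(fetched_url, parser.base_href) if parser.base_href else fetched_url
    return [urljoin(base, h) for h in parser.hrefs]

DOC = """<html><head>
<base href="http://www.example.com/folder/page.html">
</head><body>
<a href="../page.html">Home</a>
<a href="page2.html">Sibling</a>
</body></html>"""

# Fetched via the non-www host name, yet the links resolve to the www host.
print(resolve_links(DOC, "http://example.com/folder/page.html"))
```

With the sample document fetched from the non-www host, both links come out on `www.example.com`, which is the canonicalising effect described in this thread.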

internetheaven

12:37 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the doc, but from how I read it, the "correct answer" is a combination of Nos. 1 & 2:

2. Have all links on the page like this:

href="../page.html"
href="../folder/page.html"
href="../images/image.gif"

with base tag:

<base href="http://www.example.com/folder/page.html">

So if that's the standard why do no sites use it? And is that the standard that crawlers and browsers adhere to?

pageoneresults

1:03 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



internetheaven, ever read this RFC?

[ietf.org...]

I don't think many people use the base element, because it can be somewhat confusing. Even after reading the spec many times, I still find some things a little confusing, which is why I don't use it.

I use a combination of 1 and 3.

I should have stated that I use a combination of 1 and 3 without the base element.

tedster

1:45 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1. I know of several sites that have sorted their "canonical problems" with Google by using the base tag as described in that link.

2. Check the header for Google's cache pages -- they also state paths for the base element this way.

The RFC that pageoneresults linked to ends this way:

The term "relative URL" implies that there exists some absolute "base
URL" against which the relative reference is applied. Indeed, the
base URL is necessary to define the semantics of any embedded
relative URLs; without it, a relative reference is meaningless. In
order for relative URLs to be usable within a document, the base URL
of that document must be known to the parser.

The main point is that if you use relative URLs and don't explicitly state the base URL, then search engines will assume that the URL they requested actually IS the base. The base tag gives you a chance to change their mind!

With approach #3, you're not using relative urls, so the base element is superfluous. But with approach #1, you're still open to canonical troubles via "www" and "https:", as examples.
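The warning about approach #1 can be illustrated with a short sketch, assuming Python's `urljoin` behaves like a crawler's resolver. A root-relative link inherits the scheme and host of whatever URL was fetched, so the three variant fetch URLs below (hypothetical) yield three different link targets; with a base element, the fetch URL is no longer consulted.

```python
from urllib.parse import urljoin

HREF = "/page.html"  # option 1: root-relative link

# Hypothetical variant URLs under which a crawler might request the same page.
fetch_urls = [
    "http://www.example.com/folder/page.html",
    "http://example.com/folder/page.html",
    "https://www.example.com/folder/page.html",
]

# No base element: scheme and host are taken from whichever URL was fetched.
without_base = sorted({urljoin(u, HREF) for u in fetch_urls})

# Base element present: relative links resolve against it, whatever was fetched.
BASE = "http://www.example.com/folder/page.html"
with_base = urljoin(BASE, HREF)

print(without_base)  # three distinct targets
print(with_base)
```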

jatar_k

1:52 am on May 13, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



You could make it easy and just not use relative URLs, which need workarounds where absolutes don't ;)

Lobo

1:57 am on May 13, 2006 (gmt 0)

10+ Year Member



I've always used site-relative links; it works well.

I use absolute links for email and newsletters.

The harder part is still finding the best structure within that:

divide areas into folders with an index in each, or keep the opening page for each area on the top level?

It's a dilemma!

encyclo

2:31 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree with many of the comments here that it is best to simply avoid relative links completely. Base element or not, errors in relative URL resolution are always a possibility, and managing such links is troublesome as you can't easily transpose files across directories.

I use option 1, usually without a base element (but using mod_rewrite to sort out www/non-www confusion). A site-wide base element is useful if you think your page is going to be ripped, as the internal links on a copy will resolve back to your original site.
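The point about ripped pages, sketched with Python's `urljoin` under the assumption that the scraper copies the markup unchanged (a scraper could just as easily rewrite or strip the base element). The scraper's address is hypothetical.

```python
from urllib.parse import urljoin

HREF = "/folder/page.html"  # a root-relative internal link on the page

# Hypothetical address where a scraped copy of the page ends up.
COPY_URL = "http://scraper.example.net/stolen/page.html"

# Copy without a base element: the link now points at the scraper's site.
without_base = urljoin(COPY_URL, HREF)

# Copy that kept the original site-wide base element: the link points home.
ORIGINAL_BASE = "http://www.example.com/"
with_base = urljoin(ORIGINAL_BASE, HREF)

print(without_base)
print(with_base)
```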

internetheaven

8:55 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



See, now this is my point, this is what is so confusing:

But with approach #1, you're still open to canonical troubles via "www" and "https:", as examples.

But if using the base tag is what you are "supposed" to do, then surely all search engine crawlers should be set up to recognise this format. So using a base tag "SHOULD" be the same as using absolutes throughout the page, right?

I know of several sites that have sorted their "canonical problems" with Google by using the base tag as described in that link.

That is what I would expect, but as you can tell, not everyone is in agreement, which is why our office debate is getting us nowhere. Both docs that I have seen now lead me to believe that using:

<a href="../page.html">

with base tag:

<base href="http://www.example.com/folder/page.html">

is the correct way and I have to assume that crawlers and browser operators have read those exact same documents. So why do people still disagree?

A site-wide base element is useful if you think your page is going to be ripped

I would have thought a ripper would simply rewrite the base tag automatically, though.

vincevincevince

9:05 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I always use the style in #1 without a base tag. I like my site to be portable, using a root relative url means if pages or sites move they still work.

I favour that. Start the link with / so you are clearly showing the route from the web-root.

henry0

10:56 am on May 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is option 1 only doable on a production or test server? I guess it can't be tested on a local machine, or am I missing the whole concept?

tedster

4:51 pm on May 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can test #1 on a local machine only if you are running it from within a local server.
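The reason is that a root-relative path is resolved against the root of whatever URL the page was loaded from. A minimal sketch with Python's `urljoin`, assuming a hypothetical site folder at `/home/me/site` on disk and a local server whose document root is that folder; real browsers may treat `file:` URLs differently (Windows drive letters, for example), so take it as an illustration only.

```python
from urllib.parse import urljoin

HREF = "/images/image.gif"  # option 1: root-relative link

# Hypothetical copy of the site opened straight from disk (no server).
from_disk = urljoin("file:///home/me/site/folder/page.html", HREF)

# The same page served by a hypothetical local server rooted at the site folder.
from_server = urljoin("http://localhost/folder/page.html", HREF)

print(from_disk)    # points at the filesystem root, outside the site folder
print(from_server)  # points at the server's document root, as on the live site
```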

grubs

2:53 am on May 18, 2006 (gmt 0)

10+ Year Member



I have always used

<head>
<base href="http://www.example.com/">
</head>

href="page.html"
href="folder/page.html"
href="images/image.gif"

(slightly different from the OP's #1).

Interesting note: the W3C page with information on the BASE tag (http://www.w3.org/TR/html4/struct/links.html) does NOT HAVE A BASE TAG!

The following (supposedly correct) method seems like double handling to me - "go to this folder, then jump back one and grab the doc from the next folder below you where you started from"

<base href="http://www.example.com/page1.html">
href="../page.html"
href="../folder/page.html"
href="../images/image.gif"
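Both variants can be checked with a quick sketch using Python's `urljoin` (Python 3.5 and later follow RFC 3986 for dot segments). Under those rules a `..` that would climb above the root is simply discarded, so with a root-level base both variants produce identical URLs; the `../` adds a step that changes nothing.

```python
from urllib.parse import urljoin

# Directory base at the site root, document-relative links.
base_a = "http://www.example.com/"
links_a = ["page.html", "folder/page.html", "images/image.gif"]

# The "double handling" method: file-level base at the root, plus ../ links.
base_b = "http://www.example.com/page1.html"
links_b = ["../page.html", "../folder/page.html", "../images/image.gif"]

resolved_a = [urljoin(base_a, h) for h in links_a]
resolved_b = [urljoin(base_b, h) for h in links_b]

for a, b in zip(resolved_a, resolved_b):
    print(a, b, a == b)
```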

internetheaven

3:26 pm on May 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, I've updated my pages to use the actual recommended format, which is:

<a href="../page.html">

for all internal links on the page, with the base tag:

<base href="http://www.example.com/folder/page.html">

in the head. I've just run a link checker on one of the pages and it says that it checked the links:

http://www.example.com/folder/../page.html
http://www.example.com/folder/../page2.html
http://www.example.com/folder/../page3.html

If you go to the URLs shown (including the dots), the page shows up fine. But if I use the WW Sim Crawler, it shows the links on the page to be

http://www.example.com/folder/page.html
http://www.example.com/folder/page2.html
http://www.example.com/folder/page3.html

So now I'm really, really confused. What on earth is going on and why isn't there a standard format? What will search engine crawlers such as Google/Yahoo see when they index the links on my page?
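RFC 3986 (section 5.2) gives a way to check which tool is consistent with the spec. The sketch below implements the RFC's path merge and dot-segment removal in Python, handling paths only (no query or fragment) and assuming the link and base are exactly as quoted. Under those rules, `../page.html` on `http://www.example.com/folder/page.html` resolves to `/page.html`; the link checker's `/folder/../page.html` is the same address before the dots are removed, while the simulator's `/folder/page.html` is a different URL, which suggests that tool resolved the link differently from the RFC.

```python
from urllib.parse import urljoin

def merge(base_path: str, ref_path: str) -> str:
    """RFC 3986 section 5.2.3: append a relative path to the base path's directory.

    Assumes the base URL has an authority (host), as all http URLs do.
    """
    if base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path

def remove_dot_segments(path: str) -> str:
    """RFC 3986 section 5.2.4: remove "." and ".." segments from a path."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            # Go up one level: drop the last output segment (and its "/").
            inp = "/" + inp[4:]
            out = out[: out.rfind("/")] if "/" in out else ""
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment, with its leading "/" if any, to the output.
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            if end == -1:
                end = len(inp)
            out, inp = out + inp[:end], inp[end:]
    return out

BASE_PATH = "/folder/page.html"
HREF = "../page.html"

spec_result = remove_dot_segments(merge(BASE_PATH, HREF))
linkchecker_result = remove_dot_segments("/folder/../page.html")
simulator_result = "/folder/page.html"  # as reported in the post

print(spec_result, linkchecker_result, simulator_result)
# Cross-check against Python's own resolver.
print(urljoin("http://www.example.com" + BASE_PATH, HREF))
```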

tobyink

8:16 am on May 21, 2006 (gmt 0)

10+ Year Member



FWIW, I use:

<a href="/">Home</a>
<a href="/dir/page1.html">Some page</a>
<a href="/page4.html">Another Page</a>

with no BASE element.

mrb_63

6:25 pm on May 21, 2006 (gmt 0)

10+ Year Member



Hello!

Have you been to Webmonkey or About? They have a lot of advanced information that I use myself. It may answer some questions.

Thanks and God Bless,
Much Success To You,
mrb_63

internetheaven

6:45 pm on May 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you been to Webmonkey or About?

I've been all over the place, which is why I have this question. Some things are clear-cut, such as meta tags, code validation, robots.txt, etc., so why is this such a ridiculously debated subject? In this thread everyone has pitched in with "what they do", but there is still no answer to "what is right".

Obviously, unless we can all agree, we are always going to run into problems. Why is that the case? With crawling such a big, big part of all major search engines, why don't they list EXACTLY what they expect to see on their webmaster help pages?

encyclo

2:09 am on May 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is no "right" way, but relative links are always harder (and thus more error-prone) for the user agent to handle than absolute links. This is simply because there are more variables involved when handling relative links, and the base element is just another variable for the UA to consider. Your tests are pretty conclusive: relative URLs can confuse spiders.

The vast majority of sites link from the root URL, i.e.:

<a href="/directory/page.html">link</a>

You may include a base element, but it is not vital. This approach is the best for simplicity and consistency, with a slight decrease in portability.

internetheaven

8:32 am on May 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is no "right" way,

Why not? Aren't there several authoritative bodies that issue "rules"? Isn't this a big one that could use a single rule?

Your tests are pretty conclusive - relative URLs can confuse spiders.

Because there is no rule. If we had a rule, we wouldn't have the problem. Instead of threads discussing what we would like to see in HTML 5, how about we kick up a fuss until someone finally puts this to rest? IT IS A HUGE PROBLEM!

Hester

8:47 am on May 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I use two dots and a slash whenever the document is inside a folder and needs to reference a file in the root.

It is also useful for a file that is included in documents lying in different nested folders and that itself calls other files. Let's say you make a header and put it in the root for all your pages to use. But you have some pages in a folder, and others in a subfolder. To make sure the header always calls the right files, without having to keep a copy of it in each folder, simply add enough dots to reach the root from the deepest subfolder used, e.g.:

2 folders deep = ../../file.txt

I found that if the header is called from the root, it will ignore the dots (as you can't go back beyond the domain itself) and include the file correctly. If it is called from a folder, it will still find the file. It may be a hack, but it works!

The only drawback I have found is when testing the files locally. Because my local server is within subfolders, the links break. Oh well.

Otherwise I have to make copies of the header for each folder, with the links changed to suit, and update each one when I make any changes. My way you only need the one header file.
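This trick relies on the same RFC 3986 rule: `..` segments that would climb above the root are discarded during URL resolution. A sketch with Python's `urljoin`, assuming the header's links are resolved by the browser as URLs (a server-side include that reads files from disk follows filesystem path rules instead, which may not stop at the document root). The local-server path `/dev/site/` is hypothetical and shows why the links break in local testing.

```python
from urllib.parse import urljoin

HREF = "../../file.txt"  # link in the shared header, two levels of "../"

# Pages at three depths on the live site: every one reaches the root file,
# because ".." segments that would go above the root are discarded.
live_pages = [
    "http://www.example.com/index.html",
    "http://www.example.com/folder/page.html",
    "http://www.example.com/folder/sub/page.html",
]
live = [urljoin(page, HREF) for page in live_pages]

# Hypothetical local server where the site lives under /dev/site/: the
# root page now climbs out of the site folder, so the link breaks.
local = urljoin("http://localhost/dev/site/index.html", HREF)

print(live)
print(local)
```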

Romeo

9:06 am on May 22, 2006 (gmt 0)

10+ Year Member




There is no "right" way,

Why not?

... because in real life, sometimes "There's more than one way to do it." (TMTOWTDI, usually pronounced 'Tim Toady'), quoting one of the mottos of Perl here.

As Encyclo already said, there is no "right" way, but some ways may be harder (and thus more error-prone) than others.
It's your choice.

Because there is no rule. If we had a rule, we wouldn't have the problem.

Rulemaking does not solve third-party implementation problems per se. I have no hope that SE spider programmers, who can't get simple rules (like handling a simple 301 or 404) right, would not screw up this one, too ...

Kind regards,
R.

internetheaven

10:30 am on May 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is no "right" way,

Why not?

... because in real life, sometimes "There's more than one way to do it."

Are you serious?! You're going to take the "this is real life" stance? You think I'm out of touch with reality for not understanding why there is a clear robots.txt rule but no URL rule? You can't be serious, surely? No-one could possibly reply "this is real life" to a topic that CAN have a clear-cut rule unless they were some complete ... well, TOS prohibits ...

This is computer programming; there can always be a rule. This isn't lawyer ethics counselling.

Romeo

12:39 pm on May 22, 2006 (gmt 0)

10+ Year Member



This is computer programming, there can always be a rule,

Yes, I know, and the 'Perl' I mentioned is in fact a computer programming language used in the web environment ... which people use "to get something done" (another motto's quote).

Back to your first message in this thread:

... where we could find some authority on the subject of Relative over Absolute links. Here are the three options:

While there may not be THE authoritative rule you are looking for, there are still these 3 options.
Some may not be as clear as they should be (like the 'base' tag); some are. All are valid, and some may be more suitable in a given environment than others.
Combining the options we already have gives us just that TMTOWTDI.
It works, and we get our stuff done.

You think I'm out of touch with reality

No, I don't, I did not say that. I just think that we don't need more formal authoritative rules in a relaxed case like this, where the stuff we already have just works.

IT IS A HUGE PROBLEM!
I don't see it like that, but YMMV.

this isn't lawyer ethics counselling.

Yes, it isn't, and mentioning 'lawyer' and 'ethics' in one sentence looks like a contradiction in terms ... but I digress, sorry.

No puns intended, and I don't want to escalate this further.
HAND and kind regards,
R.

internetheaven

6:39 pm on Jun 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just as an update. I still have no idea which way is the right way. But I just switched to the recommended version (i.e. base tag with ../ paths) and although I don't know how crawlers are dealing with that, it certainly cleared up some javascript issues we were having with one site and Internet Explorer!