Forum Moderators: Robert Charlton & goodroi


Relative URLs in Google

         

tonmaster

10:16 am on May 10, 2005 (gmt 0)

10+ Year Member



When I link to an inside article within my website, I usually link it, for example:
<a href="resource/linkname.html">Link Name</a>

Does the Google spider see this as the full URL? A regular user sees that link only as 'Link Name', but the link itself resolves to the full domain, e.g. www.example.com/resource/linkname.html.

Let me explain my question again.
What I'm trying to figure out is whether what I'm doing has an effect on how Google crawls my HTML files, or whether it is basically the same as putting the full URL inside my 'a href' tag?

[edited by: ciml at 10:21 am (utc) on May 10, 2005]
[edit reason] Examplified [/edit]

mrMister

12:28 pm on May 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, Googlebot (and all other HTML-compliant web crawlers) will follow the URL correctly.

steveb

10:09 pm on May 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's very different from putting the full URL into the href. Google crawls it, but it will crawl the www and non-www loops of your site separately. Many of the problems people post about these days are due to the use of relative links.

g1smd

10:11 pm on May 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I avoid relative URLs like: folder/folder/file.html or ../folder/file.html etc.

I use absolute URLs like: /folder/ and /folder/folder/file.html and http://www.domain.com/folder/ etc.

Always set up a 301 redirect from non-www to www so that Google is not confused as to what the site is actually called.
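On Apache with mod_rewrite enabled, that non-www to www 301 can be set up in an .htaccess file along these lines; this is a minimal sketch, where example.com stands in for your own domain and the exact rules depend on your server setup:

```apache
# Send every request for the bare domain to the www hostname
# with a permanent (301) redirect, preserving the requested path.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```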

tonmaster

8:32 am on May 11, 2005 (gmt 0)

10+ Year Member



Steveb, You said:
"Many of the problems people post about these days are due to the use of relative links. "

I didn't know people were having problems with that issue.
Can you please tell me what problems occur from using relative links instead of absolute ones?

Also, you said "Google crawls it, but it will crawl the www and non-www loops of your site separately"

so what you mean is, every relative link I have,
like /marketting/index.html - Will be crawled for
domain.com/marketting/index.html & www.domain.com/marketting/index.html?

g1smd,

What exactly is the difference between using
/folder/folder/file.html and folder/folder/file.html?
Aren't they both considered relative?

Thank you again for your help and knowledge.

reseller

9:13 am on May 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you use relative URLs, I suggest adding this to your <head></head> section:

<BASE HREF="http://www.yoursite.com/">

I recall either Claus or Reid suggesting that adding the above tag might also help prevent 302 hijacking of your pages.

Marval

12:13 pm on May 11, 2005 (gmt 0)

10+ Year Member



One other problem can come about if you have vanity domains (the .net and .org versions of your domain) with an index page to lead people to your .com site. Starting around June or July of last year, Googlebot would follow the relative links and assign them randomly to the vanity domains, basically splitting your site into pieces. The effect can be seen pretty easily at some of the major sites out there; even Google has the problem. Check the backlinks for the .net version of Google and you will see how it can happen.
This leads to splitting of the PR and links throughout your site, which of course can have drastic effects on your rankings. Note that Google seems to have changed their .net page recently to do a 302 to the .com version:
HTTP/1.0 302 Found =>
Location => [google.com...]
so hopefully they will not lose too much rank for their site :)

mrMister

3:42 pm on May 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



so what you mean is, every relative link I have,
like /marketting/index.html - Will be crawled for
domain.com/marketting/index.html & www.domain.com/marketting/index.html?

No, steveb is only confusing the issue. That would only be the case if you haven't set your site up properly.

If you have two web sites serving the same data set up at [domain.dom...] and [domain.dom...], then that is a problem in itself and should be fixed using 301 redirects. Trying to fix it using absolute URIs that include the domain is a half-hearted fix at best, and it won't work as well as a 301 redirect.

If you were to start putting your whole domain name in every anchor on your site, your page size would soon bloat and your visitors would notice a slowdown.

g1smd

6:31 pm on May 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> What exactly is the difference between using
/folder/folder/file.html and folder/folder/file.html?
Aren't they both considered relative? <<

folder/folder/file.html - is a relative URL from wherever you are now. It points to a file located two folders down the tree from "here". To get to that folder from another part of the site you would construct the link differently. If you moved a page that contained the link, then you would need to edit the link for it to still work.

/folder/folder/file.html - is a relative folder counting from the root - which sort of makes it absolute. You can use that link from any page of the site without worrying about where you are linking from. The link is not related to where you are now, it is absolute from the root. All links to that page from anywhere in the site will always be identical. If you move a page within the site, the link will still work without editing it.
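g1smd's distinction can be reproduced with Python's urllib.parse.urljoin, which resolves a link against the page it appears on using the same rules a browser or crawler applies (the page paths here are just illustrative):

```python
from urllib.parse import urljoin

# A path-relative link resolves against the current page's folder,
# so the same href lands somewhere different from each page.
page_a = "http://www.example.com/folder/page.html"
page_b = "http://www.example.com/other/deep/page.html"
print(urljoin(page_a, "folder/file.html"))
# -> http://www.example.com/folder/folder/file.html
print(urljoin(page_b, "folder/file.html"))
# -> http://www.example.com/other/deep/folder/file.html

# A root-relative link (leading slash) resolves from the domain root,
# so it is identical no matter which page it appears on.
print(urljoin(page_a, "/folder/folder/file.html"))
# -> http://www.example.com/folder/folder/file.html
print(urljoin(page_b, "/folder/folder/file.html"))
# -> http://www.example.com/folder/folder/file.html
```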

Set up a 301 redirect from non-www to www so that all users, bots, and browsers are forced to use the www version of the URL for every page of the site.

Additionally, if you link to an index file within a folder, end the link with the folder name followed by a trailing / on the URL. Do not mention the index file filename at all. Let the server work out what it is called, find it, and serve it. You can then change from HTML to PHP at any time in the future, without having to change any of the links on the site. It will just carry on working, and ranking.

steveb

8:06 pm on May 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Trying to fix it using absolute URIs that include the domain is a half-hearted fix at best, and it won't work as well as a 301 redirect."

This is of course poor advice since obviously it isn't an either/or situation.

Relative URLs, without a 301, sometimes (but not always) lead to duplicate-content problems, split PageRank, URL-only rotting of domain pages, and other problems. Even with a 301 you can still have problems, because Google deals with URLs, NOT pages. If you get a visitor via an external link to your non-www version, since Google seems to care only about the link, the relative links on that page may still be viewed as part of the non-www loop. There is no downside to absolute links and plenty of major positives. If you feel you have to use relative ones for whatever reason, protect yourself with the 301, the base tag, and a self-link from your main page to itself using an absolute link.

mrMister

10:16 pm on May 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Even using a 301 you can still have problems because Google deals with URLs NOT pages.

A URL is a pointer to a page; the two are intertwined, and you can't separate them.

If you get a visitor via an external link to your non-www, since Google seems to only care about the link, then your relative links on that page may still be viewed as in the non-www loop.

By issuing a 301 you are changing the URI of the page. Google sees them, and Google interprets them correctly according to the HTTP RFC. I have never seen it not interpret them correctly; I suspect you've seen badly issued 301s.

There is no downside to absolute [full path] links

No downside! You're adding 300%+ to the time it takes to download each of your anchor tags, and all for absolutely no benefit.

Every extra unnecessary byte you add to your code increases the likelihood of someone giving up before your page has downloaded.

steveb

11:11 pm on May 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"A URL is a pointer to a page the two are intertwined, you can't seperate them."

You need to start reading webmasterworld more. Google not only separates them... lol, nevermind. I can't believe anyone would seriously post that statement these days.

Marval

2:52 am on May 12, 2005 (gmt 0)

10+ Year Member



An even easier way to fix this is to check with your hosting: see how they have your server config set for the ServerName. It is important to set that correctly to www.domain.com instead of domain.com, if that is what you want to show up. If it is the latter, any time you have a directory link, i.e.
href="/newdir"
Apache sees that it is a directory and issues a 301 redirect using the ServerName, which allows Googlebot to also find the other version (non-www).
It can and does happen with the change in the way Google handles redirects, which I mentioned they changed back around June or so of 2004.

mrMister: Google has not interpreted them correctly per the RFC for almost a year. It is fact, and many of us have extremely hard proof of it through testing and live examples.

If you read msg 7 above, I even posted an example that lets you see how it has affected even google.com: their .net version has split itself off because of these changes.

PatrickDeese

3:10 am on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



FWIW, I have 301 redirects on all of my domains to force WWW, *and* have absolute URLs.

I couldn't care less if my page is 14 K instead of 13.2 K. No one is going to notice the difference.

If you have so many links on a page, that it *would* make a difference - well then your page has too many links on it anyway, IMHO.

mrMister

1:10 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




You need to start reading webmasterworld more. [...] I can't believe anyone would seriously post that statement these days.

I'm starting to wonder if I've wandered into a troll's lair.

Google not only separates them... lol, nevermind.

Yeah, nevermind, why bother to post if you can't explain yourself?

I've used 301s on a couple of sites over the last year and have seen no problems with Google not interpreting them correctly. 'nuff said.

mrMister

1:21 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I couldn't care less if my page is 14 K instead of 13.2 K. No one is going to notice the difference.

With thinking like that, it's no wonder your pages have bloated to 14K!

On average, more visitors will wait for a 13.2K site to load than a 14K site.

mrMister

1:33 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's alright saying that a few bytes don't matter, but when you add up all the bytes that "don't matter", they amount to massive bloat.

Actually, I had a look at your site. I took your links page and bandwidth optimised it, reducing all the redundant code. I got the page down from 5273 bytes to 1688.

If that page is representative of the rest of your site, then your pages are taking more than three times longer to download than they need to be.

Are you seriously saying that doesn't matter?

DaveAtIFG

2:23 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If that page is representative of the rest of your site, then your pages are taking more than three times longer to download than they need to be.

Are you seriously saying that doesn't matter?

Of course it matters, a little. But using relative addressing leaves a site vulnerable to page jacking as does not consolidating domain names. In addition, not consolidating domain names often splits PR among the domain name variations.

Either of those can impact rankings significantly. Would you prefer nice trim pages that are buried in the SERPs or a little code bloat and a page that ranks prominently?

And please don't argue that less bloated pages rank higher until you've scraped and analyzed every page Google indexes. ;)

mrMister

2:28 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A high search engine ranking means very little without a good site to go with it and a bloated site is not a good site.

mrMister

2:34 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



but using relative addressing leaves a site vulnerable to page jacking

Anchors are also vulnerable to page jacking. You're putting one small barrier in the way of a page jacker, that's all. You might as well dig a ditch to stop the tide coming in.

One regular expression or Search & Replace and your page is "jacked".

In the meantime, you've got a jacked page and a slow web site.

DaveAtIFG

2:40 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You have much to learn, grasshopper. :)

PatrickDeese

10:19 pm on May 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, I had a look at your site. I took your links page and bandwidth optimised it, reducing all the redundant code. I got the page down from 5273 bytes to 1688.

Yawn.

You mean the site in my profile? The one I made in notepad the night before I went to the Vegas pubcon last year?

Whatever dude, knock yourself out.

Why don't you go and remove all the carriage returns in your code, you'll save an additional 84 bytes.

In the meantime, I'll take the risk that someone in Armenia will be willing to wait an extra 1/8th of a second to see the page.

lawman

1:09 am on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's not forget TOS #14 & 19 [webmasterworld.com].

Reid

1:10 am on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'll take the risk that someone in armenia will be willing to wait an extra 1/8th of a second to see the page.

3 x 1/8

A page load of 3/8 of a second is pretty fast.
My guess would be more like 11 seconds vs. 33 seconds.
Each hit could add 2 seconds for that HTML file size.

I use relative URLs and have no problem. Maybe distributing PageRank through a site is a good thing; PR sure doesn't have anything to do with ranking.

Reid

1:21 am on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you use relative URLs, I suggest adding to your <head></head> section:
<BASE HREF="http://www.yoursite.com/">

I recall either Claus or Reid suggesting that adding above tag might also help preventing 302 hijacking of your pages.


Yes, we both agreed that it 'may' help protect from 302s being misinterpreted.
But I wouldn't use www.mysite.com/ as a base URL.
A base should be absolute.

example

[webmasterworld.com...]

PatrickDeese

1:29 am on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> My guess would be more like 11 seconds vs 33 seconds.

Download Times*
Connection Rate Download Time
14.4K 6.21 seconds
28.8K 3.31 seconds
33.6K 2.89 seconds
56K 1.89 seconds
ISDN 128K 0.86 seconds
T1 1.44Mbps 0.44 seconds

The biggest thing on the page is the huge-ass img that was a dupe of the business cards I made for the conf.

I think 6 seconds is a respectable download time for a page on 14.4 dialup.

*according to some online tool that I found randomly by googling for a page download speed analyzer.

--

I use relative urls and have no problem - maybe distributing pagerank through a site is a good thing

There is no evidence that a relative link distributes PR more favorably than an absolute URL.

In fact, it could be argued that, in theory, using absolute URLs actually *helps* your internal links get better ranking. If every link from outside your site comes in as www.example.com/foo/bar.html, and your internal links are ../foo/bar.html, *maybe* (maybe!) those internal links don't go into the URL's link-pop "bank account".

All I can say is that I changed from relative to absolute URLs about 4 years ago, and not only has no one from Armenia complained, I have had zero sites with the problem of getting both the www and non-www versions indexed.

ciml

8:56 am on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> What Im trying to figure out is if what Im doing has an affect on how google crawls into my html files, or is it basically the same as putting the full url inside my 'a href' tag?

tonmaster, it shouldn't normally give you any problems in Google.

Some people prefer to use URIs (i.e. [example.com...]) in case their site can be accessed in different ways (e.g. [example.com...] or [example.com...]), to avoid Google crawling the whole site.

Personally I prefer to worry about duplicate pages being crawlable and not about relative URLs.

Reid

6:47 pm on May 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



patrick - 6s is a respectable download time.

I also have no idea how PR is distributed within a site regarding relative links.

What I'm trying to figure out is whether what I'm doing has an effect on how Google crawls my HTML files, or whether it is basically the same as putting the full URL inside my 'a href' tag?

The user-agent (a browser or a bot) has a 'base register', it knows where it is.

So it holds within this register the URL path of the page it is viewing.
say the base is ht*p://w*w.example.com/foo/widget.html

and it finds a link "red-widget.html"
the user-agent will use the base to establish the location of red-widget.html
in this case the user-agent will append
ht*p://w*w.example.com/foo/ (the directory it is in)
to red-widget.html
thus getting ht*p://w*w.example.com/foo/red-widget.html

The base tag sets this 'base register' in the user-agent (rather than letting the user-agent keep the base it built itself), so it is not necessary, but it is an added protection in case the user-agent somehow gets confused (wrong info in the base register).
In this case the user-agent would request ht*p://w*w.example.com/foo/red-widget.html and then, upon viewing the page, would update the base using the base tag info on the new page if it exists; otherwise it would just use the URL that it built itself (which is the correct one in this case).
It would then use ht*p://w*w.example.com/foo/red-widget.html as the base for any relative links it finds on this page.

Googlebot does this very well, but if you make an error in your relative linking you could send it to a different page than you intended (or to one that does not exist). This is why the base tag is a good idea for establishing the canonical URL, to avoid problems with redirects (i.e. which end of the redirect gets the PR).

So I would say that Googlebot follows the relative link but 'builds the URL' from the base, so it shouldn't make a difference for ranking, as long as the bot doesn't get lost (base register problems).

Also, I have <base href="ht*p://w*w.example.com/index.html"> on my homepage, and Google lists it as w*w.example.com/ with no trouble at all.
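Reid's 'base register' walkthrough is essentially what Python's urllib.parse.urljoin implements; a small sketch, reusing the hypothetical widget URLs from the post (with plain http, not the forum's obfuscated ht*p):

```python
from urllib.parse import urljoin

# The page being viewed acts as the base for its relative links.
# Here a visitor has reached the page via the non-www hostname.
page = "http://example.com/foo/widget.html"
print(urljoin(page, "red-widget.html"))
# -> http://example.com/foo/red-widget.html  (stays in the non-www loop)

# A <base href="..."> in the page's <head> overrides that register,
# pinning relative links to the canonical www hostname regardless of
# which URL the page was actually fetched from.
declared_base = "http://www.example.com/foo/"
print(urljoin(declared_base, "red-widget.html"))
# -> http://www.example.com/foo/red-widget.html
```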