Does the Google spider see this as the full URL? A regular user sees that link only as 'Link Name', but the link itself
is shown with its full domain, as www.example.com/resource/linkname.html.
Let me explain my question again.
What I'm trying to figure out is whether what I'm doing has an effect on how Google crawls my HTML files, or whether it is basically the same as putting the full URL inside my 'a href' tag.
[edited by: ciml at 10:21 am (utc) on May 10, 2005]
[edit reason] Examplified [/edit]
I use absolute URLs, like /folder/, /folder/folder/file.html and http://www.domain.com/folder/.
Always set up a 301 redirect from non-www to www so that Google is not confused as to what the site is actually called.
I didn't know people were having problems with that issue.
Can you please tell me what problems occur from using relative links instead of absolute ones?
Also, you said "Google crawls it, but it will crawl the www and non-www loops of your site separately"
So what you mean is that every relative link I have,
like /marketting/index.html, will be crawled as both
domain.com/marketting/index.html & www.domain.com/marketting/index.html?
g1smd,
What exactly is the difference between
/folder/folder/file.html and folder/folder/file.html?
Aren't they both considered relative?
Thank you again for your help and knowledge.
So what you mean is that every relative link I have,
like /marketting/index.html, will be crawled as both
domain.com/marketting/index.html & www.domain.com/marketting/index.html?
No, steveb is only confusing the issue. That would only be the case if you haven't set your site up properly.
If you have two web sites serving the same data set up at [domain.dom...] and [domain.dom...] then that is a problem in itself and should be fixed using 301 redirects. Trying to fix it using absolute URIs that include the domain is a half-hearted fix at best, and it won't work as well as a 301 redirect.
If you were to start putting your whole domain name in every anchor on your site, then your page size would soon bloat up and your visitors would notice a slowdown.
folder/folder/file.html - is a relative URL from wherever you are now. It points to a file located two folders down the tree from "here". To get to that folder from another part of the site you would construct the link differently. If you moved a page that contained the link, then you would need to edit the link for it to still work.
/folder/folder/file.html - is a path relative to the root, which sort of makes it absolute. You can use that link from any page of the site without worrying about where you are linking from. The link is not related to where you are now; it is absolute from the root. All links to that page from anywhere in the site will always be identical. If you move a page within the site, the link will still work without editing it.
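To see the two forms side by side, here is a minimal sketch using Python's standard `urllib.parse.urljoin`, which resolves a link against the page it appears on the way a browser or bot would. The page URLs are hypothetical; the third one is the same site reached without the www, to show which hostname a root-relative link ends up with.

```python
from urllib.parse import urljoin

# Hypothetical pages: two depths on the www host, plus the same site reached without www.
pages = [
    "http://www.example.com/index.html",
    "http://www.example.com/products/widgets/list.html",
    "http://example.com/index.html",
]

resolved = {}
for page in pages:
    resolved[page] = (
        # Document-relative: resolved against the directory of the current page.
        urljoin(page, "folder/folder/file.html"),
        # Root-relative: resolved against the root of the current host.
        urljoin(page, "/folder/folder/file.html"),
    )
    print(page)
    print("  folder/folder/file.html  ->", resolved[page][0])
    print("  /folder/folder/file.html ->", resolved[page][1])
```

The root-relative link gives the same path from every page, while the document-relative one changes with the page's depth. Both inherit whatever hostname the page was fetched under, which is why the non-www to www redirect still matters.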
Set up a 301 redirect from non-www to www so that all users, bots, and browsers are forced to use the www version of the URL for every page of the site.
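In practice the redirect is configured in the web server (for example with Apache's mod_rewrite), but the decision it makes is simple. The sketch below shows that logic in Python; `CANONICAL_HOST` and `canonical_redirect` are hypothetical names used only for illustration.

```python
from typing import Optional, Tuple

CANONICAL_HOST = "www.example.com"  # hypothetical: the hostname you want indexed


def canonical_redirect(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) if the request used a non-canonical host, else None."""
    # Host headers can carry a port and mixed case, so normalise before comparing.
    bare_host = host.split(":")[0].lower()
    if bare_host == CANONICAL_HOST:
        return None  # already canonical: serve the page normally
    # Keep the requested path so every non-www page maps to its own www twin,
    # rather than sending everything to the home page.
    return 301, f"http://{CANONICAL_HOST}{path}"


print(canonical_redirect("example.com", "/marketting/index.html"))
print(canonical_redirect("www.example.com", "/marketting/index.html"))
```

Preserving the path is the important design choice: a redirect that drops it would collapse every non-www URL onto the home page instead of onto the matching www page.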
Additionally, if you link to an index file within a folder, end the link with the folder name followed by a trailing / on the URL. Do not mention the index file filename at all. Let the server work out what it is called, find it, and serve it. You can then change from HTML to PHP at any time in the future, without having to change any of the links on the site. It will just carry on working, and ranking.
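The "let the server work out what it is called" step can be sketched as below. On Apache this is roughly what the DirectoryIndex setting controls; the filename list, the function name `resolve_directory`, and the in-memory file list are hypothetical stand-ins for a real server and disk.

```python
from typing import Iterable, Optional

# Hypothetical index filenames, tried in order. The real list and order
# depend entirely on your server configuration.
INDEX_NAMES = ("index.html", "index.php")


def resolve_directory(url_path: str, files_on_disk: Iterable[str]) -> Optional[str]:
    """Map a trailing-slash URL such as /folder/ to whichever index file exists."""
    existing = set(files_on_disk)
    if not url_path.endswith("/"):
        # A link that names the file directly only works while that exact file exists.
        return url_path if url_path in existing else None
    for name in INDEX_NAMES:
        candidate = url_path + name
        if candidate in existing:
            return candidate
    return None


# Before and after switching the folder's index page from HTML to PHP.
print(resolve_directory("/folder/", ["/folder/index.html"]))
print(resolve_directory("/folder/", ["/folder/index.php"]))
print(resolve_directory("/folder/index.html", ["/folder/index.php"]))
```

The link /folder/ keeps working across the switch, while a link written as /folder/index.html stops resolving once only index.php exists.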
This is of course poor advice since obviously it isn't an either/or situation.
Relative URLs, without a 301, lead (sometimes but not always) to duplicate content problems, split pagerank, URL-only rotting of domain pages, and other problems. Even using a 301 you can still have problems because Google deals with URLs NOT pages. If you get a visitor via an external link to your non-www, since Google seems to only care about the link, then your relative links on that page may still be viewed as in the non-www loop. There is no downside to absolute links and plenty of major positives. If you feel like you have to use relative ones for whatever reason, protect yourself with the 301, the base tag, and self-link your main page with an absolute link to itself.
Even using a 301 you can still have problems because Google deals with URLs NOT pages.
A URL is a pointer to a page; the two are intertwined, and you can't separate them.
If you get a visitor via an external link to your non-www, since Google seems to only care about the link, then your relative links on that page may still be viewed as in the non-www loop.
By issuing a 301 you are changing the URI to the new page. Google sees them, and Google interprets them correctly according to the HTTP RFC. I have never seen it not interpret them correctly. I suspect you've seen badly issued 301s.
There is no downside to absolute [full path] links
No downside! You're adding 300%+ to the time it takes to download each of your anchor tags and all for absolutely no benefit.
Every extra unnecessary byte you add to your code increases the likelihood of someone giving up before your page has downloaded.
mrMister - Google has not interpreted them correctly per the RFC for almost a year. It is a fact, and many of us have extremely hard proof of it through testing and live examples.
If you read msg 7 above, I even posted an example that lets you see how it has even affected google.com and their .net version, which has split itself off because of their changes.
I could care less if my page is 14 K instead of 13.2 K. No one is going to notice the difference.
If you have so many links on a page, that it *would* make a difference - well then your page has too many links on it anyway, IMHO.
You need to start reading webmasterworld more. [...] I can't believe anyone would seriously post that statement these days.
I'm starting to wonder if I've wandered into a troll's lair.
Google not only separates them... lol, nevermind.
Yeah, nevermind, why bother to post if you can't explain yourself?
I've used 301s on a couple of sites over the last year and have seen no problems with Google not interpreting them correctly. 'nuff said.
Actually, I had a look at your site. I took your links page and bandwidth optimised it, reducing all the redundant code. I got the page down from 5273 bytes to 1688.
If that page is representative of the rest of your site, then your pages are taking more than three times longer to download than they need to be.
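A quick check of the arithmetic behind "more than three times", using the two byte counts quoted above. The transfer times are idealised (bytes × 8 ÷ line speed) and assume nominal line speeds; they ignore latency, TCP/HTTP overhead, compression and any images on the page, so real numbers will be higher.

```python
# Byte counts quoted in the post above (links page before and after optimisation).
ORIGINAL_BYTES = 5273
OPTIMISED_BYTES = 1688


def transfer_seconds(size_bytes: int, line_bits_per_second: float) -> float:
    """Idealised transfer time: ignores latency, protocol overhead, compression and images."""
    return size_bytes * 8 / line_bits_per_second


ratio = ORIGINAL_BYTES / OPTIMISED_BYTES
print(f"size ratio: {ratio:.2f}x")

# Assumed nominal line speeds in bits per second.
for label, bps in [("14.4K", 14_400), ("56K", 56_000), ("ISDN 128K", 128_000)]:
    before = transfer_seconds(ORIGINAL_BYTES, bps)
    after = transfer_seconds(OPTIMISED_BYTES, bps)
    print(f"{label:>10}: {before:.2f} s vs {after:.2f} s (saves {before - after:.2f} s)")
```

The ratio does come out a little over 3, but in absolute terms the saving is about two seconds on 14.4K dial-up and well under a second on anything faster, which is the crux of the disagreement that follows.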
Are you seriously saying that doesn't matter?
If that page is representative of the rest of your site, then your pages are taking more than three times longer to download than they need to be.
Are you seriously saying that doesn't matter?
Of course it matters, a little. But using relative addressing leaves a site vulnerable to page jacking as does not consolidating domain names. In addition, not consolidating domain names often splits PR among the domain name variations.
Either of those can impact rankings significantly. Would you prefer nice trim pages that are buried in the SERPs or a little code bloat and a page that ranks prominently?
And please don't argue that less bloated pages rank higher until you've scraped and analyzed every page Google indexes. ;)
but using relative addressing leaves a site vulnerable to page jacking
One regular expression or Search & Replace and your page is "jacked".
In the meantime, you've got a jacked page and a slow web site.
Actually, I had a look at your site. I took your links page and bandwidth optimised it, reducing all the redundant code. I got the page down from 5273 bytes to 1688.
Yawn.
You mean the site in my profile? The one I made in notepad the night before I went to the Vegas pubcon last year?
Whatever dude, knock yourself out.
Why don't you go and remove all the carriage returns in your code, you'll save an additional 84 bytes.
In the meantime, I'll take the risk that someone in Armenia will be willing to wait an extra 1/8th of a second to see the page.
I'll take the risk that someone in Armenia will be willing to wait an extra 1/8th of a second to see the page.
A page load of 3/8 of a second is pretty fast.
My guess would be more like 11 seconds vs 33 seconds.
Each hit could add another 2 seconds on top of that HTML file size.
I use relative urls and have no problem - maybe distributing pagerank through a site is a good thing, PR sure doesn't have anything to do with ranking.
If you use relative URLs, I suggest adding this to your <head></head> section:
<BASE HREF="http://www.yoursite.com/">
I recall either Claus or Reid suggesting that adding the above tag might also help prevent 302 hijacking of your pages.
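The effect of that tag can be sketched with Python's standard `html.parser` and `urllib.parse.urljoin`: without a base tag, a relative link on a page fetched from the non-www host resolves to the non-www host; with the tag, it resolves against www.yoursite.com. The helper names (`BaseHrefFinder`, `resolve_link`) and the sample pages are hypothetical, and this only approximates what a real browser or bot does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefFinder(HTMLParser):
    """Collects the href of the first <base> tag in a page, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <BASE HREF=...> matches too.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def resolve_link(fetched_url: str, html: str, link: str) -> str:
    """Resolve a relative link the way a user-agent would, honouring <base href>."""
    finder = BaseHrefFinder()
    finder.feed(html)
    # The base href is itself resolved against the fetched URL; with no <base>
    # tag, the fetched URL is the base.
    base = urljoin(fetched_url, finder.base_href) if finder.base_href else fetched_url
    return urljoin(base, link)


fetched = "http://yoursite.com/foo/page.html"  # reached via the non-www host
without_base = "<html><head></head><body></body></html>"
with_base = '<html><head><BASE HREF="http://www.yoursite.com/"></head><body></body></html>'

print(resolve_link(fetched, without_base, "bar.html"))
print(resolve_link(fetched, with_base, "bar.html"))
```

One side effect worth noting: because this base points at the site root, document-relative links on pages inside /foo/ now resolve from the root rather than from /foo/, so such links need to be written relative to the root.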
example
[webmasterworld.com...]
Download Times*
Connection rate    Download time
14.4K              6.21 seconds
28.8K              3.31 seconds
33.6K              2.89 seconds
56K                1.89 seconds
ISDN 128K          0.86 seconds
T1 1.44Mbps        0.44 seconds
The biggest thing on the page is the huge-ass img that was a dupe of the business cards I made for the conf.
I think 6 seconds is a respectable download time for a page on 14.4 dialup.
*according to some online tool that I found randomly by googling for a page download speed analyzer.
--
I use relative urls and have no problem - maybe distributing pagerank through a site is a good thing
There is no evidence that a relative link is distributing PR more favorably than an absolute URL.
In fact, it could be argued that, in theory, using absolute URLs actually *helps* your internal links get better ranking. If every link from outside your site comes in as www.example.com/foo/bar.html, and your internal links are ../foo/bar.html, *maybe* (maybe!) those internal links don't go into the URL's link pop "bank account".
All I can say is that I have changed from relative to absolute URLs about 4 years ago, and not only has no one from Armenia complained, I have zero sites that had the problem with getting both the www and non-www version indexed.
tonmaster, it shouldn't normally give you any problems in Google.
Some people prefer to use full URIs (i.e. [example.com...]) in case their site can be accessed in different ways (e.g. [example.com...] or [example.com...]), to avoid Google crawling the whole site under each of those addresses.
Personally I prefer to worry about duplicate pages being crawlable and not about relative URLs.
I also have no idea how PR is distributed within a site regarding relative links.
What I'm trying to figure out is whether what I'm doing has an effect on how Google crawls my HTML files, or whether it is basically the same as putting the full URL inside my 'a href' tag.
The user-agent (a browser or a bot) has a 'base register': it knows where it is.
So it holds in this register the URL path of the page it is viewing.
say the base is ht*p://w*w.example.com/foo/widget.html
and it finds a link "red-widget.html"
the user-agent will use the base to establish the location of red-widget.html
in this case the user-agent will prepend
ht*p://w*w.example.com/foo/ (the directory it is in)
to red-widget.html
thus getting ht*p://w*w.example.com/foo/red-widget.html
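The "keep the directory, add the link" step just described can be sketched in a few lines of Python and compared with the standard library's `urllib.parse.urljoin`, which follows the full RFC 3986 resolution rules. `naive_resolve` is a hypothetical helper, and the example URL is the one above with the obfuscation removed.

```python
from urllib.parse import urljoin


def naive_resolve(base_url: str, relative_link: str) -> str:
    """The 'base register' step: keep the base up to its last '/', then add the link.

    Only valid for a plain filename such as 'red-widget.html'; it does not
    handle links starting with '/', '../' segments, queries or fragments.
    """
    directory = base_url[: base_url.rfind("/") + 1]  # e.g. http://www.example.com/foo/
    return directory + relative_link


base = "http://www.example.com/foo/widget.html"

for link in ("red-widget.html", "../bar/red-widget.html"):
    print(link)
    print("  naive  :", naive_resolve(base, link))
    print("  urljoin:", urljoin(base, link))  # standard-library RFC 3986 resolution
```

For a plain filename both give http://www.example.com/foo/red-widget.html; for ../bar/red-widget.html the naive version leaves the '..' in the URL, while urljoin removes it, which is the extra work a real user-agent does when it builds the URL.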
The base tag sets this 'base register' in the user-agent, rather than letting the user-agent keep the base it built itself. So it is not necessary, but it is added protection in case the user-agent somehow gets confused (wrong info in the base register).
In this case the user-agent would then request ht*p://w*w.example.com/foo/red-widget.html and, upon viewing the page, would update the base using the base tag on the new page if it exists, or else it would just use the URL that it built itself (which is the correct one in this case).
It would then use ht*p://w*w.example.com/foo/red-widget.html as a base for any relative links it finds on this page.
Googlebot does this very well. But if you make an error in your relative linking, you could send it to a different page than you intended (or to one that does not exist). This is why the base tag is a good idea as far as establishing a canonical URL, to avoid problems with redirects (which end of the redirect gets the PR).
So I would say that Googlebot follows the relative link but 'builds the URL' out of the base, so it wouldn't make a difference for ranking as long as the bot doesn't get lost (base register problems).
Also, I have <base href="ht*p://w*w.example.com/index.html"> on my homepage, and Google lists it as w*w.example.com/ with no trouble at all.