I'd wait for more than one opinion before changing your internal links. I'm hardly an expert but I've always heard that while Google is certainly able to append a domain name to a link, it doesn't treat relative and absolute links the same. I'm not saying I am sure about this. But one or two voices shouldn't change your mind one way or the other.
I had great trouble getting a bunch of pages linked via relative URLs into the database so I switched them to absolutes and got them in. But that was well over a year ago. Also, I've never found "/" to be treated differently to "/index.html" provided that the file named is the default. These four references are treated equally on 8 of my sites:
domain
domain/index.html
www.domain
www.domain/index.html
same PR, same backlinks, same everything. So I question that assumption as well.
I have always used / for the homepage.
It is a mistake to use /index.html or /index.php or similar, because Google treats www.yoursite.com/ and
www.yoursite.com/index.html DIFFERENTLY.
Google distinguishes between these pages if they have different content. However, Google merges them (i.e., combines backlinks and PR, and shows only one page in the SERPs) if they are identical.
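The "merging" idea above can be sketched as a canonicalization step: a crawler that wants to treat /index.html and / as one page can simply collapse default-document URLs onto the bare directory URL before comparing them. This is only an illustrative sketch; the domain, the helper name, and the list of default filenames are assumptions, not anything Google has published.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of default documents a server might serve for "/".
DEFAULT_DOCS = {"index.html", "index.htm", "index.php", "default.asp"}

def canonicalize(url: str) -> str:
    """Collapse a default-document URL onto its directory URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    segments = path.split("/")
    if segments[-1] in DEFAULT_DOCS:
        segments[-1] = ""          # drop the default filename, keep the slash
        path = "/".join(segments)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(canonicalize("http://www.example.com/index.html"))   # http://www.example.com/
print(canonicalize("http://www.example.com/sub/index.php"))  # http://www.example.com/sub/
```

With a step like this, both URL spellings map to the same key, so their backlinks and PR naturally accumulate on one record.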
Yes I thought this as well.
However, some months after the site was launched and PR was gained (it started around August/September), the / began to show a greybar, although /index was okay.
So I changed all the internal links to / and soon after the grey bar had gone and / was back to showing PR.
I also assumed that / & index would be treated the same, and for a while that certainly was the case until the greybar appeared on the /
When I enter http://www.mydomain.com it spiders the site but the relative links it lists don't work. It lists the links leaving the page as http://directory/filename.htm/ (which naturally wouldn't work).
When I type in http://www.mydomain.com/ the sim spider works perfectly. It lists the links leaving the page as http://www.mydomain.com/directory/filename.htm
I am curious whether Googlebot behaves similarly. I am also curious whether this is standard spider behaviour or something peculiar to my site. I started looking at this because no search engine has checked out my new pages since mid-December. Any ideas?
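For what it's worth, the behaviour described above looks like naive string concatenation rather than proper URL resolution. A resolver that follows RFC 3986 (such as Python's `urljoin`) treats an empty path as the site root, so the missing trailing slash shouldn't matter. A small sketch, using a made-up domain:

```python
from urllib.parse import urljoin

base = "http://www.example.com"   # no trailing slash, as typed into the spider
link = "directory/filename.htm"   # document-relative link found on the page

# A spider that just glues the strings together produces garbage:
print(base + link)                # http://www.example.comdirectory/filename.htm

# A conforming resolver handles the empty path correctly:
print(urljoin(base, link))        # http://www.example.com/directory/filename.htm
print(urljoin(base + "/", link))  # http://www.example.com/directory/filename.htm
```

Whether Googlebot ever had this problem I can't say, but any modern crawler built on a standards-compliant URL library would resolve both forms identically.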
So even though we should handle this correctly, I lean toward using absolute links just to be safe.
Personally, I lean toward absolute links
Yeah, that's what I use. But if you represent Google, why do you discourse like Plato, and hint like Nostradamus?
This is why these boards are losing their influence.
Could we have one, simple, statement of fact?
Or is this a mere PR exercise: admittedly one of the very highest quality?
Absolute or relative - simple question!
I'm with glengara, absolute addresses and a dedicated IP. It wasn't so long ago that Google would occasionally list both www.example.com and example.com in the SERPs. The PR was often split between them.
It sounds to me as if GoogleGuy thinks these sorts of problems are resolved and is awaiting feedback to the contrary. He's also making it very clear how to avoid them.
1. Absolute (http://blah...)
2. Root Relative (/whatever/file.htm)
3. Document Relative (../whatever/file.htm)
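The three formats above resolve differently depending on where the linking page lives. A quick sketch with `urljoin`, using a hypothetical page URL for illustration:

```python
from urllib.parse import urljoin

# Hypothetical page containing the links:
page = "http://www.example.com/sub/dir/page.htm"

# 1. Absolute: used as-is, regardless of where the page lives.
print(urljoin(page, "http://www.example.com/other.htm"))
# http://www.example.com/other.htm

# 2. Root relative: resolved against the site root.
print(urljoin(page, "/whatever/file.htm"))
# http://www.example.com/whatever/file.htm

# 3. Document relative: resolved against the current directory.
print(urljoin(page, "../whatever/file.htm"))
# http://www.example.com/sub/whatever/file.htm
```

Only format 3 changes meaning when a page is moved to a different folder, which is why it is the most fragile of the three.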
It used to be that Google had an issue with no. 2, I believe: if someone viewed the cache of a document whose linked CSS file or images used format no. 2, the browser would try to fetch those files from Google's server. As far as I can tell this has been fixed. Perhaps a BASE HREF tag has been added to their cache recently?
With absolute links - apart from getting an absolutely unambiguous link to a page - you can also move pages from one folder to another without messing up your links. You can also download a page to edit it and the links still work. I use absolutes even on image files for this reason.
The only possible downside is a slight increase in the file size of your pages - but when you think about it, this effect is negligible unless you have an unusually large number of links on a page.
Powdork, can you recommend a sim spider? I'd like to try one on a site I can't seem to get deepcrawled by Google.
SearchEngineWorld has one
[searchengineworld.com...]
(this link won't work, see message #28)
I hope that's ok to give since its linked from the homepage here.
absolute links have less potential for getting messed up. Even though it shouldn't make a difference, I recommend absolute links.

That's an unusually clear statement. How do we know you're the real GoogleGuy? ;)
[edited by: Powdork at 11:00 pm (utc) on Jan. 2, 2004]
Googlebot seems to follow those links to new content faster when I use absolute links.
Of course, the timing may have just been right when I changed to absolute links on my "what's new" page, so who knows.
I now stick with absolute in most cases, just because I *know* that it works.