Forum Moderators: Robert Charlton & goodroi
I recommend absolute links instead of relative links, because there's less chance for a spider (not just Google, but any spider) to get confused. In the same fashion, I would try to be consistent on your internal linking. Once you've picked a root page and decided on www vs. non-www, make sure that all your links follow the same convention and point to the root page that you picked. Also, I would use a 301 redirect or rewrite so that your root page doesn't appear twice. For example, if you select http://www.example.com/ as your root page, then if a spider tries to fetch http://example.com/ (without the www), your web server should do a permanent (301) redirect to your root page at http://www.example.com/
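For those on Apache, the usual way to implement that suggestion is a couple of mod_rewrite lines in a .htaccess file. This is only a sketch, assuming mod_rewrite is enabled on your server and with example.com standing in for your own domain:

```apache
# Redirect any request for example.com (no www) to www.example.com,
# preserving the requested path, with a permanent (301) redirect.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

If you picked the non-www version as your root, just swap the two hostnames around.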
I understand we need to decide on using the www or not. I'm going with www so I write http://www.example.com but what I'm not so sure about is when I refer to a page in the same directory. I usually write something like <A HREF="patterns.htm"> to link to that page in the widgets directory. Here is the part I'm not certain about. If I change the links to <A HREF="/patterns.htm"> with the slash have I made it absolute or do I have to write out the whole http://www.example.com/patterns.htm to be safe?
Also I'm on a shared server with a small company who uses apache servers. How do I write the 301?
[edited by: ciml at 10:56 am (utc) on June 13, 2005]
[edit reason] Examplified [/edit]
On the redirect I had to ask my server company to change it for me. Apparently there wasn't a way to do it myself. It would still be good to get some information on this thread on how to do it though, as I suspect I'm not the only one on this forum who doesn't know all the technical ins and outs.
Thanks to Helleborine there is some good information on how to discover if you have been hijacked at "302 Hijacks for Dummies". If you are concerned that you might have been hijacked just sticky me for the URL. I did find one hijack on my site.
If you have a search/replace function in whatever you use to make webpages, you should be able to replace what you have with the absolute links in not too long a time.
I haven't personally done this!
Front Page has a dialog to set this, but I think you'd have to go to every page, at least in my antiquated version.
Why would you want to use the BASE HREF metatag if all your links are absolutes? I'm confused about that.
If I made BASE HREF="/" for all my pages wherever they are what would my links look like? Or should I make my BASE HREF equal to the current folder of the page?
Does this apply to IMG SRC's too?
It's just one line of HTML you put in your HEAD section I believe, instead of changing hundreds of relative links to absolutes.
Since this documents where your page should have originated, it can help identify HiJacks. Think about it: each page itself does not document its own location or source without this.
As for non-www vs www.yoursite.com, I definitely prefer the with www. Why? Not really sure here.
Non-tech-savvy visitors have come to expect the www, and I feel better providing it.
GG gave what I think is really good advice (see quote above) and I follow it to the letter. -Larry
If I didn't know better I would think Google is asking us to do this as it's having problems identifying which sites certain pages should belong to - e.g. when several domains point to the same hosting account/website.
My examples are going to assume you know how to do "A" tag with "href=" stuff, so we don't mess with the forum rules. I'll give examples of what you can do:
"http://example.com/" or "http://www.example.com/" or "http://foo.example.com" or http://example.com/directory/file.htm" are all absolute URLs. -- they contain a "protocol" ("http://") and a domain name (e.g., "example.com") and possibly a file name. (If there is no file name, most servers know what file to deliver.)
"http://" is not the only possible protocol -- commerce servers use "https://"; "ftp://" and others are used for specialized purposes. But any URL that starts with a protocol is absolute.
If the protocol is missing, then the URL is relative. Its location is based on a BASE, which is normally the directory that the current page resides in.
"page2.htm" is contained in the same directory as the current page.
"subdir/page2.htm" is contained in a subdirectory, "subdir" or the directory containing the current page.
"../page2.htm" is contained in the PARENT directory of the directory containing the current page.
"/page2.htm" is contained in the root directory of your website -- which may be the same as the directory containing the current page, or it may be a parent of a parent of a parent of a parent of it.
The BASE meta tag merely says, "for purposes of relative URLs, don't use the directory this file is in -- use this other directory instead."
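In HTML that is a single BASE tag in the HEAD section. A minimal sketch, with example.com and the directory name as placeholders:

```html
<head>
  <!-- Every relative URL on this page now resolves against this base
       directory, not against the directory the page was fetched from. -->
  <base href="http://www.example.com/widgets/">
</head>
```

This applies to IMG SRC attributes as well as A HREF, since both are resolved the same way.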
I like relative URLs. Googleguy is right to not trust spiders, but ... any spider too stupid to handle relative URLs well won't survive on today's web anyway. And even google.com doesn't use exclusively absolute URLs.
The advantage of relative URLs is that you have a great deal of freedom to move website images around on your local machine or your server, or both, from one directory to another, and links to "neighbor" pages still work, without any fancy relocating tools (which, in my experience, are not sophisticated enough to handle, say, javascript-generated links -- one of my favorite techniques for some links that spiders don't need to follow, or aren't supposed to follow.)
And ... I've worked on too many very large programming projects, where careless absolute "include" directives made it very difficult to pick up someone else's project. Absolute file links are (in my experience) just flat evil, and I only have one brain to use for both C++ and HTML coding.
Note that absolute links are absolutely no protection from page-scrapers for non-javascript-generated links. It is a trivial matter for a perl program to check for a BASE directive, and modify it to point to the scraped mirror. Likewise, hard-coded absolute links can be easily detected and modified to make absolute links to the scraped pages. And just because the scrapers aren't doing this yet, doesn't mean they won't be doing it later this afternoon.
Javascripted links have disadvantages: people who are forced to use IE and are yet concerned about security will have to turn off Javascript, because Microsoft's version is so badly misdesigned for security. BUT: JAVASCRIPTED ABSOLUTE LINKS ARE ALMOST CERTAINLY SAFE FROM ALL LIKELY SCRAPER SCENARIOS.
Link spiders won't be your problem for unscripted links, whether relative or absolute; javascripted links are safe from link spiders and page scrapers, but can be damaged by many website mastering tools.
That's your options. Season to taste.
A couple of comments on your 2 questions:
do I have to write out the whole http://www.example.com/patterns.htm to be safe
steveb already answered this bit, but again, yes, this is absolute.
As an aside, I think good site management should use relative internal linking. However, I understand the need for absolute as it relates to the page jacking issue.
How do I write the 301?
There are a number of 301 examples in various threads here in WebmasterWorld. This is what I've gleaned from WebmasterWorld and am using on my sites:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]
A search for 301 or 301 non should give you some more ideas and if I've made any mistakes above, I'd welcome suggestions.
What I'm using is far more antiquated. Does anyone remember HTML notepad?
Thank goodness you're using HTML notepad and not Frontpage :) I understand you probably don't have the facility with this to do massive global replacement. I use a very old version of Homesite (predecessor of DW I think?) which does global replacements across multiple files and directories very nicely (with regex's if needed).
You may want to search around for another utility to do this for you. I don't know how big your site is, but I don't think I'd want to be doing this by hand. I expect you, like myself, aren't a "unix person"; there are probably a bunch of utilities to do it under unix as well if you can get someone to help you locally.
Hope that is of some assistance.
Regards,
Jim
anyone using relative linking should have a base meta tag on each page. This just confirms to the user-agent where it is now, so that it doesn't get lost. Kind of redundant, since the user-agent had to have the URL already just to get to the page, but safer than letting the user-agent assemble URLs out of thin air.
The base meta tag can also inject the www version of your domain into every request.
whatever method you choose, as googleguy suggested, should be consistent throughout the site.
some virtual hosts won't allow you to 301 redirect non-www to www. They use the non-www with a 302 redirect for tracking. This is where the base meta tag (with the canonical URL) can come in handy. Basically, if any user-agent arrives at the page with a non-canonical URL - i.e. the non-www version - the base meta tag will force it to use the canonical www URL.
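As a sketch of that trick, with example.com standing in for the real domain: the tag in each page's HEAD names the canonical www host, so every relative link on the page resolves against it no matter which hostname the visitor or bot arrived on:

```html
<head>
  <!-- Relative links below resolve against the canonical www host,
       even if this page was fetched via http://example.com/ -->
  <base href="http://www.example.com/">
</head>
```

It doesn't redirect the page itself the way a 301 would, but it stops a spider from assembling a whole parallel set of non-www URLs out of your relative links.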
Thanks for that. It has helped, but it wasn't the post I was recalling. Maybe it wasn't by GG, or maybe even it was on another forum, but I remember something about approx 4 or 5 point plan for making your site less vulnerable to 302 hijacks, which included all those points made by GG in that post.
I've done the 301 redirect from non-www to www. I was looking for a (that) check-list of what else to do.
GG didn't seem so perturbed about the absolute links issue and didn't seem to mention it having an effect on 302 hijacks. So just how does not having absolute links make you more vulnerable to hijacking?
If you have absolute hrefs and no 301 redirect, your site can still be split apart, but it requires more work on someone's part because they can't enlist the search engine's bot to shred the site. It will stop on the page it starts on. It can still be done, though - just a bit more work.
With proper 301s it can't be split in that manner, regardless of the type of hrefs used.
The intent of the splitting is to degrade your site by tripping various Google gotchas like massive duplicate content etc.
I do believe that someone has admitted that only a site that is degrading can have a page hijacked.
Of course, the proper initial setup of your server should only allow one valid server alias to be visible. No split is possible then.
Slightly different number of pages reported for site:www.aaaa.com versus site:aaaa.com, on Yahoo
I still have more pages properly indexed by Yahoo than Google, although before March Google had 100% of my pages indexed for many months, but now that stands at 65% indexed 35% URL only. Don't have a clue why. Many URL only pages are smaller, but some have lots of content, and still are URL only.
My host uses a linux server. What is the best way to implement the 301? Have people had problems after they did the 301 redirect?
Yep, that was exactly the thread I'd been looking for. Thanks. Not only does it have the message I'd been looking for, but I learnt some more from reading all the other messages in that thread. (I'd only just discovered WebmasterWorld the day I'd read that thread and hadn't realised the significance of it. I'd kinda been overwhelmed by the volume of information here... getting a bit more used to it now.)