Another angle on trailing slashes?

Variations among SE results

         

hermanp

8:58 am on Sep 27, 2005 (gmt 0)

10+ Year Member



My website is built using XHTML 1.0 Strict; all pages validate using the online W3C validator as well as with CSE HTML Validator v 6.
It is a static website, no scripting of whatever flavor is used.
The site is hosted on an IIS-based system.

I have been looking closely at the format in which the major search engines (Google, MSN and Yahoo) return search results for my website.

It seems there are small differences in the way they point to a URL where they find a result. This is illustrated by the following summary, showing how the results are formatted for a search term found in the root of the website as well as for a search term found in the index file of a subdirectory.

Google: www.mydomain.xyz/
Google: www.mydomain.xyz/directory/

MSN: www.mydomain.xyz
MSN: www.mydomain.xyz/directory/index.html

Yahoo: www.mydomain.xyz
Yahoo: www.mydomain.xyz/directory

Yahoo adds no trailing slashes where Google does add them.

Google and Yahoo seem to be consistent within their own results; MSN behaves a bit differently in that it adds index.html for results found in the index file of a subdirectory.

Of course all results resolve to the same URL in the end....

I am just wondering now if this is caused by something in my HTML code, or if this is 'by design' of the various engines.

Enjoy!

Herman

us60

9:34 am on Sep 27, 2005 (gmt 0)



Just offhand Herman, I would say that it has something to do with the Microsoft host server setup, as read by the Microsoft search engine server that leaves Microsoft the odd man out of your results.

You didn't happen to use a BGcolor tag somewhere, there Bill G.? :-)

Hmm... No smiley face showing!

Larry

tedster

9:38 am on Sep 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



These URLs do NOT necessarily resolve to the same content. In the great majority of cases they will, but this is not a technical necessity. So any consistency that seems to appear in search results may well be an illusion.

The biggest factor in how these urls appear is how they are actually written in various links. Since all engines need to do some kind of duplicate filtering, which version of a given page ends up being shown may well depend on which version is actually discovered more often in a crawl. Keeping urls as consistent as possible within anchor tags is the best way to get good results.

moltar

10:35 am on Sep 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I always write URLs with a trailing slash and see the same results in Y and G: Google keeps the trailing slash, Yahoo drops it. It's very annoying! I don't know of any way to prevent that.

hermanp

9:11 pm on Sep 27, 2005 (gmt 0)

10+ Year Member



Thanks for the responses so far!

I have an index.html file in every directory on my website. These files contain the introduction and menu for the relevant section. For example the directory www.mydomain.xyz/courses contains an index.html file introducing the courses I offer and a menu which leads the visitor to pages in that directory describing the various courses in detail.

I recall that I read somewhere (don't remember where though....) that it would be better not to refer to the index file in a directory when linking to that directory.
What I do now is link to (for example) courses/index.html, but it might perhaps be better to link to courses/ (without the index.html). Any thoughts on this? Could this perhaps explain why MSN throws in the index.html in results found in subdirectories?

Enjoy!

Herman.

moltar

9:37 pm on Sep 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's better to link to places without being technology specific. For example, you can link to /about/. In that folder there could be an index.html or index.php or index.asp or default.asp - we don't know.

There are several reasons for that:

  • You might want to change the technology later on (say, switch from static HTML to PHP); this way your linking structure does not change.
  • It is a good case of security by obscurity.
  • URLs look cleaner and easier to remember and use.
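
For anyone who wants to automate that cleanup, here is a minimal sketch in Python, using only the standard library. The function name strip_index and the INDEX_NAMES list are hypothetical, made up for illustration; which filenames actually count as the default document depends on how the server is configured, so adjust the list to match your host.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default-document names; the real list depends on the server setup.
INDEX_NAMES = {"index.html", "index.htm", "index.php", "index.asp", "default.asp"}


def strip_index(href: str) -> str:
    """Return href without a trailing index filename, keeping query and fragment."""
    parts = urlsplit(href)
    directory, sep, filename = parts.path.rpartition("/")
    if filename.lower() not in INDEX_NAMES:
        return href  # not a link to an index file, leave it alone
    # Keep a trailing slash so the link still points at the directory.
    # A bare "index.html" has no directory part, so it becomes "./".
    new_path = directory + "/" if sep else "./"
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


print(strip_index("courses/index.html"))   # courses/
print(strip_index("/about/index.php"))     # /about/
print(strip_index("index.html#menu"))      # ./#menu
print(strip_index("/bob/fish.html"))       # /bob/fish.html (unchanged)
```

This only rewrites the link text; it assumes the server really does serve the index file when the bare directory is requested.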

hermanp

9:52 am on Sep 28, 2005 (gmt 0)

10+ Year Member



I just found the reference I referred to in my earlier post. It is on [webtips.dan.info ]
Add this to the reasons given by Moltar and I am about to edit my internal links.

One last question regarding HTML and this linking:
would a link like ./#anchor be valid and resolved by the majority of browsers, or would it be better to use index.html#anchor?
I am asking this because my HTML validator says ./#anchor is broken, but IE6 and Firefox both resolve this link properly.

Enjoy!

Herman.

hermanp

9:55 am on Sep 28, 2005 (gmt 0)

10+ Year Member



Oops!

The link provided should read [webtips.dan.info...]

I am sorry,

Herman

moltar

1:59 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The article is a bit misleading. If you link to "./" then it will open the index page of current directory, not the site root. For example:

You are on page

/bob/fish.html

On that page you have a link to
./

This link will send you to
/bob/

If you want to go to the root of the site, you need to put just a slash by itself, without the dot, i.e.
/

On page links work just fine.
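
The difference between ./ and / described above can be checked without a browser. This is a small sketch using Python's standard urljoin, which applies the usual relative-reference rules; the http:// scheme in front of the example host is an assumption, since the thread only gives the host name.

```python
from urllib.parse import urljoin

# The page the link sits on (scheme assumed for the example).
page = "http://www.mydomain.xyz/bob/fish.html"

# "./" means the current directory, so this is the index of /bob/.
print(urljoin(page, "./"))    # http://www.mydomain.xyz/bob/

# "/" on its own means the root of the site.
print(urljoin(page, "/"))     # http://www.mydomain.xyz/

# "../" goes one directory up, which from /bob/ is also the root.
print(urljoin(page, "../"))   # http://www.mydomain.xyz/
```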

hermanp

2:55 pm on Sep 28, 2005 (gmt 0)

10+ Year Member



For example:
You are on page /bob/fish.html
On that page you have a link to ./
This link will send you to /bob/
--snip--
On page links work just fine.

Thanks for clarifying this.
My question remains though....
In every folder I have an index.html file.
So if I link from /bob/fish.html to ./#target I expect the browser to display /bob/index.html#target

The question is whether the ./#target construct is valid HTML and whether it will be correctly resolved by most (or all) browsers. I know that a link to index.html#target is handled correctly, but I am trying to get rid of the index.html links and replace them by ./ or ../ or / or whatever is appropriate in that particular case.

Enjoy!
Herman.
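
As a rough check of how ./#target resolves, here is a sketch with Python's standard urljoin and urldefrag, which follow the same relative-reference rules browsers use. The http:// scheme is an assumption. One detail it shows: the resolved address is /bob/#target, not /bob/index.html#target. The fragment is handled by the browser, and it is the server that decides to answer a request for /bob/ with the index file.

```python
from urllib.parse import urljoin, urldefrag

# The page the link sits on (scheme assumed for the example).
page = "http://www.mydomain.xyz/bob/fish.html"

# Resolve the relative link the way a browser would.
resolved = urljoin(page, "./#target")
print(resolved)    # http://www.mydomain.xyz/bob/#target

# Split off the fragment: only the first part is requested from the server.
requested, fragment = urldefrag(resolved)
print(requested)   # http://www.mydomain.xyz/bob/
print(fragment)    # target
```

So the construct resolves cleanly as a relative reference; a validator flagging it may simply be unable to map the bare directory to a file when checking links.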

g1smd

3:51 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I never use ../../folder/ type links.

I always start with a / and count from the root: /folder/another/folder/

Always omit the filename if the file is an index file.

Have you tried /#section or /folder/#section (without . in) instead of ./# at all?
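
If you want to convert existing ../ style links into links counted from the root, something along these lines might help. It is only a sketch: to_root_relative is a hypothetical helper name, and the placeholder host exists purely so that Python's standard urljoin has an absolute base to resolve against.

```python
from urllib.parse import urljoin, urlsplit


def to_root_relative(page_path: str, href: str) -> str:
    """Rewrite a relative href found on page_path as a link starting at the root."""
    target = urlsplit(href)
    if target.scheme or target.netloc:
        return href  # external or absolute link, leave it alone
    # Placeholder host, used only to give urljoin an absolute base.
    absolute = urlsplit(urljoin("http://example.invalid" + page_path, href))
    result = absolute.path or "/"
    if absolute.query:
        result += "?" + absolute.query
    if absolute.fragment:
        result += "#" + absolute.fragment
    return result


print(to_root_relative("/folder/another/page.html", "../other/"))  # /folder/other/
print(to_root_relative("/bob/fish.html", "./#section"))            # /bob/#section
```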

hermanp

4:04 pm on Sep 28, 2005 (gmt 0)

10+ Year Member



Please, can we go back to what my original question was?

My starting post said:


MSN: www.mydomain.xyz
MSN: www.mydomain.xyz/directory/index.html

So, I am looking for a way to get rid of the /index.html shown only in MSN results.

I cannot try something on the fly, as MSN has to spider and index my site first.

I am aware there are some alternatives to link to index.html files now.
In this discussion I am also wondering if /#target is a valid construct, regardless of how we arrive at the relevant index file. Perhaps the second question should be in a separate topic though...

Enjoy!
Herman.

g1smd

4:14 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If MSN has "index.html" as the URL then they must have seen "index.html" in a link pointing to that page somewhere on the web.

If you had linked to just /folder/ then there would be no way for them to know what the index file is actually called, and the index filename would not appear in the results.
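
To find the links on your own pages that expose the index filename, a small scanner along these lines could be a starting point. It is a sketch using Python's standard html.parser; the names IndexLinkFinder, find_index_links and INDEX_NAMES are hypothetical, and the filename list is an assumption. It cannot see links on other people's sites, which may also be where a search engine picked up the index.html version.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit

# Assumed default-document names; adjust to match the server configuration.
INDEX_NAMES = {"index.html", "index.htm", "index.php", "index.asp", "default.asp"}


class IndexLinkFinder(HTMLParser):
    """Collect href values of <a> tags that name an index file explicitly."""

    def __init__(self) -> None:
        super().__init__()
        self.hits: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Look only at the last path segment, ignoring query and fragment.
        filename = urlsplit(href).path.rpartition("/")[2]
        if filename.lower() in INDEX_NAMES:
            self.hits.append(href)


def find_index_links(html: str) -> List[str]:
    finder = IndexLinkFinder()
    finder.feed(html)
    return finder.hits


sample = (
    '<p><a href="courses/index.html">Courses</a> '
    '<a href="/about/">About</a> '
    '<a href="index.html#menu">Menu</a></p>'
)
print(find_index_links(sample))  # ['courses/index.html', 'index.html#menu']
```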

moltar

8:38 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My question remains though....

I answered it; I guess I was not very noticeable or clear.

On page links work just fine.

Yes, #blah will work with any kind of linking scheme.

hermanp

7:43 am on Sep 29, 2005 (gmt 0)

10+ Year Member



Thanks everybody.
The picture is clear now, I am going to edit some links in my website.

Enjoy!
Herman.

g1smd

11:37 am on Sep 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Run Xenu LinkSleuth over your site too. You never know what problems it might find...