I have been looking closely at the format in which the major search engines (Google, MSN and Yahoo) return search results for my website.
It seems there are small differences in the way they write the URL of a page where they find a result. The following summary shows how each engine formats the result for a search term found in the root of the website, and for a search term found in the index file of a subdirectory.
Google: www.mydomain.xyz/
Google: www.mydomain.xyz/directory/
MSN: www.mydomain.xyz
MSN: www.mydomain.xyz/directory/index.html
Yahoo: www.mydomain.xyz
Yahoo: www.mydomain.xyz/directory
Yahoo adds no trailing slashes, whereas Google does add them.
Google and Yahoo seem to be consistent within their own results; MSN behaves a bit differently in that it adds index.html for results found in the index file of a subdirectory.
Of course all results resolve to the same URL in the end....
I am just wondering now if this is caused by something in my HTML code, or if this is 'by design' of the various engines.
Enjoy!
Herman
You didn't happen to use a BGcolor tag somewhere, there Bill G.? :-)
Hmm... No smiley face showing!
Larry
The biggest factor in how these URLs appear is how they are actually written in the various links pointing to them. Since all engines need to do some kind of duplicate filtering, which version of a given page ends up being shown may well depend on which version is discovered more often in a crawl. Keeping URLs as consistent as possible within anchor tags is the best way to get good results.
I have an index.html file in every directory on my website. These files contain the introduction and menu for the relevant section. For example the directory www.mydomain.xyz/courses contains an index.html file introducing the courses I offer and a menu which leads the visitor to pages in that directory describing the various courses in detail.
I recall that I read somewhere (don't remember where though....) that it would be better not to refer to the index file in a directory when linking to that directory.
What I do now is link to (for example) courses/index.html, but it might perhaps be better to link to courses/ (without the index.html). Any thoughts on this? Could this perhaps explain why MSN throws in the index.html in results found in subdirectories?
Enjoy!
Herman.
/about/. In that folder it could be an index.html or index.php or index.asp or default.asp - we don't know.
There are several reasons for that:
One last question regarding HTML and this linking:
would a link like ./#anchor be valid and resolved by the majority of browsers, or would it be better to use index.html#anchor?
I am asking this because my HTML validator says ./#anchor is broken, but IE6 and Firefox both resolve this link properly.
Enjoy!
Herman.
You are on page /bob/fish.html
./ points to /bob/
/ points to the site root
On page links work just fine.
For example:
You are on page /bob/fish.html
On that page you have a link to ./
This link will send you to /bob/
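If you want to check this kind of resolution yourself, here is a small sketch using Python's standard urllib.parse. The host name is just the placeholder from the first post, and the assumption is that browsers resolve relative links by the same generic rules that urljoin applies.

```python
from urllib.parse import urljoin

# Placeholder page from the example above; the host name is illustrative only.
BASE = "http://www.mydomain.xyz/bob/fish.html"

def resolve(link, base=BASE):
    """Resolve a relative link against the page it appears on."""
    return urljoin(base, link)

if __name__ == "__main__":
    # "./" is the current folder, "../" the parent folder, "/" the site root.
    for link in ("./", "../", "/", "index.html"):
        print(link, "->", resolve(link))
```

Note that none of the directory-style links produce a filename. Which index file gets served for /bob/ is decided by the server, not by the link.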
--snip--
On page links work just fine.
Thanks for clarifying this.
My question remains though....
In every folder I have an index.html file.
So if I link from /bob/fish.html to ./#target I expect the browser to display /bob/index.html#target
The question is whether the ./#target construct is valid HTML and whether it will be resolved correctly by most (or all) browsers. I know that a link to index.html#target is handled correctly, but I am trying to get rid of the index.html links and replace them with ./ or ../ or / or whatever is appropriate in that particular case.
Enjoy!
Herman.
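For what it is worth, here is a rough sketch of how ./#target resolves under the generic URL rules, again using Python's urllib.parse rather than a real browser, so treat it as an illustration and not as proof of what every browser does. The page URL is the placeholder from this thread.

```python
from urllib.parse import urljoin, urldefrag

# Placeholder page from this thread; the host name is illustrative only.
PAGE = "http://www.mydomain.xyz/bob/fish.html"

def resolve_with_fragment(link, page=PAGE):
    """Return (url_to_request, fragment) for a link found on `page`."""
    absolute = urljoin(page, link)
    # The fragment is split off here because it is not part of the request;
    # it is only used to scroll to the named anchor once the page has loaded.
    url, fragment = urldefrag(absolute)
    return url, fragment

if __name__ == "__main__":
    print(resolve_with_fragment("./#target"))
    print(resolve_with_fragment("index.html#target"))
```

So ./#target ends up as /bob/#target and not /bob/index.html#target. The visitor sees the same page either way, provided the server serves index.html as the default document for that folder.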
My starting post said:
MSN: www.mydomain.xyz
MSN: www.mydomain.xyz/directory/index.html
So, I am looking for a way to get rid of the /index.html shown only in MSN results.
I cannot try something on the fly, as MSN has to spider and index my site first.
I am aware that there are some alternatives to linking to index.html files.
In this discussion I am also wondering whether /#target is a valid construct, regardless of how we arrive at the relevant index file. Perhaps the second question should be in a separate topic though...
Enjoy!
Herman.
If you had linked to just /folder/ then there would be no way for them to know what the index file is actually called, and the index filename would not appear in the results.
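If you want to clean up existing links in bulk, something along these lines could do the rewriting. It is only a sketch: the function name strip_index and the list of index filenames are my own assumptions (taken from the names mentioned earlier in the thread), so adjust them to whatever your server actually uses as its default document.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default-document names; change this set to match your server setup.
INDEX_NAMES = {"index.html", "index.php", "index.asp", "default.asp"}

def strip_index(href):
    """Rewrite a link so it points at the folder instead of its index file."""
    parts = urlsplit(href)
    head, sep, last = parts.path.rpartition("/")
    if last.lower() in INDEX_NAMES:
        # Keep the folder with its trailing slash; a bare filename becomes "./".
        new_path = head + "/" if sep else "./"
        parts = parts._replace(path=new_path)
    return urlunsplit(parts)

if __name__ == "__main__":
    for href in ("courses/index.html", "index.html#anchor",
                 "/directory/index.html", "courses/python.html"):
        print(href, "->", strip_index(href))
```

Query strings and #anchors are left untouched, and links to ordinary pages pass through unchanged.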