
Forum Moderators: mademetop


Blog URL Structure for SEO

     
11:53 pm on May 4, 2007 (gmt 0)

10+ Year Member



I have done a lot of research on URLs for SEO, and I have led many URL-rewrite projects to convert dynamic URLs to static ones.

But now I am working on a blog project, and these blogs already have SEO-friendly URL options. I used to make all files end in .htm, as in

www.mysite.com/products/this-is-my-product.htm

but now, I see a lot of blogs like this:

www.myblog.com/articles/this-is-a-cool-article/

The big question is:
Do you guys think that it still applies that it's better to have all URLs ending in .htm?

Like in
www.myblog.com/articles/this-is-a-cool-article.htm

Thanks so much

6:26 am on May 8, 2007 (gmt 0)

WebmasterWorld Senior Member agerhart is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I have never seen any evidence, nor any theories, that the file extension is a factor in rankings.
5:21 pm on May 8, 2007 (gmt 0)

WebmasterWorld Senior Member caveman is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Nor have I. Though I wish I'd see a few less .pdf's in the SERP's. ;-)

Personally I much prefer:
www.myblog.com/articles/this-is-a-cool-article/

Not for ranking purposes, but in the event that your system changes and you shift from .htm to .php or something yet to come. Makes updating sites painless. I once had to move a site from .asp to something else, when all the .asp URI's were indexed with the .asp ... and if I can help it, I'll never be in that position again.

12:50 pm on May 9, 2007 (gmt 0)

5+ Year Member



It's all the same to search engines whether you write it with or without .htm at the end. Personally I prefer it without the .htm part - it looks much neater to me.
2:15 pm on May 11, 2007 (gmt 0)

5+ Year Member



I am one of those convinced there is no difference in SERPs - do it as you wish.

J.

3:26 am on May 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I once had to move a site from .asp to something else, when all the .asp URI's were indexed with the .asp ... and if I can help it, I'll never be in that position again.

I've been in the same situation and it's horrible. It might be one of those things you can't understand until you experience it, but do it right the first time and you'll be glad you did.

[edited by: caveman at 4:13 am (utc) on May 12, 2007]

12:26 pm on May 18, 2007 (gmt 0)

5+ Year Member



Being a blog, your URLs will be automatically generated, right?
If it's a WordPress blog, then you can change the settings to create filenames like so-and-so.htm, which is more SE-friendly than www.yourname.com/blog/?page2.

Please note that the keywords in the file name are what matter to a crawler. So,

www.abc.com/google-product-free.htm is better than
www.abc.com/blog/page-1-automatic

Cheers!

8:18 am on May 21, 2007 (gmt 0)

5+ Year Member



Being a blog, your URLs will be automatically generated, right?
If it's a WordPress blog, then you can change the settings to create filenames like so-and-so.htm, which is more SE-friendly than www.yourname.com/blog/?page2.

Please note that the keywords in the file name are what matter to a crawler. So,

www.abc.com/google-product-free.htm is better than
www.abc.com/blog/page-1-automatic

Cheers!

Yes, the keywords do matter. Whether the URL has the .htm part or not really doesn't matter much (if at all).

5:55 am on May 28, 2007 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The file extension does not matter.

The question for me is whether to have the trailing slashes... and how to handle all cases of not having them.

Here are two discussions on this that I've followed.

Display URI's in the SERPs
Google vs Yahoo! vs MSN
[webmasterworld.com...]

Rewriting URLs - what does Google like?
Best practices for rewriting unfriendly URLs
[webmasterworld.com...]

In this second thread, encyclo points out the problem of using the trailing slash on all pages....

example.com/ford-mustang-2007/

The above is sub-optimal due to the trailing slash. The issue with the "fake directory name" style (ie. pretending that you have a physical directory called /ford-mustang-2007/ on the server) is that you need more complex rules to account for the trailing slash.

Unanswered for me, though, is how you would resolve the question of the missing trailing slash on a physical directory. I note that in WebmasterWorld...

http://www.webmasterworld.com/google/
...returns the Google forum, as expected.

[webmasterworld.com...]
...does not append the trailing slash, and returns a 404.

As encyclo points out, part of the problem of adding a slash in the above example is that it's a pseudo directory.

But, as I've thought about it, if you had a physical directory in an extensionless environment, you would still have problems with the server adding the slash. If it did, how would it handle a situation where a page-name and a directory-name were the same?

Would you have rules to not do a rewrite in those cases where there was not a 404?

If so, you would still have problems with the Yahoo and MSN display URIs described in the first of the above threads.

3:22 pm on May 28, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The missing trailing slash problem is resolved automatically in Apache by testing a slashless request_uri which is not found (404) to see if it resolves to an existing directory. If so, then Apache mod_dir will append a slash and redirect the client to the corrected URL instead of returning a 404. This behaviour can be defeated if mod_alias or mod_rewrite is used to intercept these requests, or if mod_dir is not available on the server.

This is another one of those things that is a "non-problem" --like URL and domain canonicalization-- if the server is properly configured.

Again referring to Apache, the problem of resolving slashless URLs to files or directories can be addressed by doing checks for 'file exists' and/or 'directory exists' using mod_rewrite's RewriteCond directive, checking the REQUEST_URI and/or REQUEST_FILENAME variables with the "-d" and "-f" flags. However, where both a directory and a file exist, the Webmaster will have to choose which takes priority, and write the code accordingly. The default behaviour (using mod_dir only) is to resolve slashless URLs to a file.
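
To make the decision logic concrete, here is a rough sketch of the same checks in Python rather than in mod_rewrite syntax. It is only a model of the reasoning, not Apache's actual behaviour: the function name resolve_slashless, the prefer parameter, and the assumption that extensionless pages are stored on disk with an .html extension are all hypothetical choices made for illustration.

```python
import os


def resolve_slashless(docroot, url_path, prefer="file"):
    """Decide how to answer a request for a slashless URL path.

    Returns (status, path): 200 with the path to serve as a page,
    301 with the slash-appended path to redirect to, or 404.
    """
    candidate = os.path.join(docroot, url_path.strip("/"))
    # Analogous to a "-d" (directory exists) test.
    is_dir = os.path.isdir(candidate)
    # Analogous to a "-f" (file exists) test. A file and a directory
    # cannot share one name on disk, so the page is ASSUMED to be
    # stored as <name>.html and served without the extension.
    is_file = os.path.isfile(candidate + ".html")

    if is_file and is_dir:
        # Ambiguous case: the webmaster has to pick a winner explicitly.
        if prefer == "file":
            return (200, url_path)
        return (301, url_path + "/")
    if is_file:
        return (200, url_path)
    if is_dir:
        # Directory only: redirect to the slash-appended URL.
        return (301, url_path + "/")
    return (404, url_path)
```

With a docroot containing both a widgets/ directory and a widgets.html file, a call for "/widgets" serves the page under the default prefer="file" and redirects to "/widgets/" under prefer="dir". Whichever priority you choose, the point is that it is written down once rather than left to chance.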

Best practice is to use a trailing slash in links where a directory is referenced, and no slash where an extensionless file is referenced, in accordance with HTTP conventions.

Jim

[edited by: jdMorgan at 3:25 pm (utc) on May 28, 2007]

7:06 pm on May 31, 2007 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Jim - Thanks for your very helpful answer. It resolves a lot of questions.

One more question (I think still on topic with regard to the original post), that your answer brings up...

However, where both a directory and a file exist, the Webmaster will have to choose which takes priority, and write the code accordingly. The default behaviour (using mod_dir only) is to resolve slashless URLs to a file.

In part because of potential problems with Yahoo and MSN dropping trailing slashes, I've been trying to get my head around how to set things up to avoid this potential ambiguity.

Would there be any problems in simply not using default index files... index.htm, index.html, and "index" (or whatever the extensionless equivalent would be) among the pages in your subdirectories?

I'm assuming that the server would be set up to return the subdirectories with the trailing slash if you did use the index file. I'm just thinking of not using them (except in the root, of course), to avoid this problem of ambiguity.

So that (assuming you had no "index" file) a request for, say...

www.example.com/widgets/

...would always give a 403?...

Requests for other pages in the directory...

www.example.com/widgets/pageone
www.example.com/widgets/pagetwo

...would return requested pages?...

And a request for...

www.example.com/widgets

...would return the "widgets" page... with no ambiguity that it might be...

www.example.com/widgets/index

Is this a valid approach, or is there a pitfall I'm not envisioning? I've always used index files in directories with html files, so I'm not sure what would happen if I omitted them (in the subdirectories). Test pages in subdirectories without index files, on servers not necessarily properly set up, have worked fine.

[edited by: Robert_Charlton at 7:37 pm (utc) on May 31, 2007]

8:25 pm on May 31, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The pitfall is loss of traffic due to appended slashes (or lack thereof).

The "Yahoo problem" is a non-problem, in that they're only dropping slashes in the display URL -- the link-text, not the link. Hopefully, they will fix this error but if not, then your only 'exposure' is for people who (for some reason) type in the URL that they see, or copy and paste it as text (rather than copy-link).

Overall, the best practice would be to avoid naming extensionless files and subdirectories which reside in the same directory identically, thus entirely avoiding this issue.
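
One way to check an existing site for this kind of clash is a small audit script. The following is a minimal sketch, assuming pages live on disk with some extension (.html, .php, and so on) and are served without it; the function name find_name_collisions is hypothetical.

```python
import os


def find_name_collisions(docroot):
    """Return (directory, name) pairs where an extensionless page name
    and a subdirectory in the same directory are identical."""
    collisions = []
    for current, subdirs, files in os.walk(docroot):
        # Page names as they would appear in extensionless URLs.
        stems = {os.path.splitext(f)[0] for f in files}
        for name in sorted(stems.intersection(subdirs)):
            # Report the containing directory relative to the docroot;
            # the docroot itself is reported as ".".
            collisions.append((os.path.relpath(current, docroot), name))
    return sorted(collisions)
```

An empty list means no slashless URL on the site can be read as both a page and a directory, so the priority question never comes up.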

Jim

 
