
General Search Engine Marketing Issues Forum

    
Blog URL Structure for SEO
skuba

10+ Year Member



 
Msg#: 3331359 posted 11:53 pm on May 4, 2007 (gmt 0)

I have done a lot of research on URLs for SEO, and I have led many URL rewrite projects to convert dynamic URLs to static ones.

But now I am working on a blog project, and these blogs already have SEO-friendly URL options. I used to give every file an .htm extension, as in

www.mysite.com/products/this-is-my-product.htm

but now, I see a lot of blogs like this:

www.myblog.com/articles/this-is-a-cool-article/

The big question is:
Do you guys think it still holds that it's better to have all URLs end in .htm?

Like in
www.myblog.com/articles/this-is-a-cool-article.htm

Thanks so much

 

agerhart

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3331359 posted 6:26 am on May 8, 2007 (gmt 0)

I have never seen any evidence, nor any theories, that the file extension is a factor in rankings.

caveman

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3331359 posted 5:21 pm on May 8, 2007 (gmt 0)

Nor have I. Though I wish I'd see a few fewer .pdfs in the SERPs. ;-)

Personally I much prefer:
www.myblog.com/articles/this-is-a-cool-article/

Not for ranking purposes, but in the event that your system changes and you shift from .htm to .php or something yet to come. Makes updating sites painless. I once had to move a site from .asp to something else, when all the .asp URIs were indexed with the .asp ... and if I can help it, I'll never be in that position again.
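
For anyone who does end up in that migration position, here is a minimal sketch of the mapping step, assuming the new scheme is the trailing-slash style shown above. The function name `extensionless_target` and the extension list are hypothetical, chosen only for illustration; the result would be used as the target of a 301 redirect from the old URL.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Extensions treated as implementation details (an assumption for this sketch).
LEGACY_EXTENSIONS = (".asp", ".htm", ".html", ".php")


def extensionless_target(url: str) -> Optional[str]:
    """Return the extensionless URL to redirect to, or None if no change is needed."""
    parts = urlsplit(url)
    path = parts.path
    for ext in LEGACY_EXTENSIONS:
        if path.lower().endswith(ext):
            # Drop the extension and use the "directory style" trailing slash.
            new_path = path[: -len(ext)] + "/"
            return urlunsplit(parts._replace(path=new_path))
    return None


if __name__ == "__main__":
    old = "http://www.myblog.com/articles/this-is-a-cool-article.asp"
    print(old, "->", extensionless_target(old))
```

The query string and host are left untouched, so only the path changes. If the URLs never carry an extension in the first place, this mapping is never needed, which is the point being made above.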

Frida

5+ Year Member



 
Msg#: 3331359 posted 12:50 pm on May 9, 2007 (gmt 0)

It's all the same for search engines whether you write it with or without .htm at the end. Personally I prefer it without the .htm part - it looks much neater to me.

John_Blake

5+ Year Member



 
Msg#: 3331359 posted 2:15 pm on May 11, 2007 (gmt 0)

I am one of those convinced there is no difference in SERPs - do it as you wish.

J.

graywolf

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3331359 posted 3:26 am on May 12, 2007 (gmt 0)

"I once had to move a site from .asp to something else, when all the .asp URIs were indexed with the .asp ... and if I can help it, I'll never be in that position again."

I've been in the same situation and it's horrible. It might be one of those things you can't understand until you experience it, but do it right the first time and you'll be glad you did.

[edited by: caveman at 4:13 am (utc) on May 12, 2007]

The_Fox

5+ Year Member



 
Msg#: 3331359 posted 12:26 pm on May 18, 2007 (gmt 0)

Being a blog, your URLs will be automatically generated, right?
If it's a WordPress blog, then you can change the settings to create filenames like so-and-so.htm, which is more SE-friendly than www.yourname.com/blog/?page2.

Please note that the keywords in the file name are what matter to a crawler. So,

www.abc.com/google-product-free.htm is better than
www.abc.com/blog/page-1-automatic

Cheers!

trooper27

5+ Year Member



 
Msg#: 3331359 posted 8:18 am on May 21, 2007 (gmt 0)

"Being a blog, your URLs will be automatically generated, right?
If it's a WordPress blog, then you can change the settings to create filenames like so-and-so.htm, which is more SE-friendly than www.yourname.com/blog/?page2.

Please note that the keywords in the file name are what matter to a crawler. So,

www.abc.com/google-product-free.htm is better than
www.abc.com/blog/page-1-automatic

Cheers!"

Yes, the keywords do matter. Whether the extension has the .htm part or not - doesn't matter so much (at all) really.
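
As a rough illustration of how a blog system might turn a post title into a keyword-bearing URL segment like "this-is-a-cool-article", here is a minimal sketch. The function name `slugify` is hypothetical and this is not how any particular blog platform does it; it only shows the general idea of lowercasing, dropping punctuation, and joining words with hyphens.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL segment."""
    # Decompose accented characters so the base letters survive the ASCII step.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


if __name__ == "__main__":
    print(slugify("This Is a Cool Article!"))
```

Whether ".htm" is appended to the result afterwards is a separate choice, and per the replies in this thread it should make no difference to rankings.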

Robert Charlton

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3331359 posted 5:55 am on May 28, 2007 (gmt 0)

The file extension does not matter.

The question for me is whether to have the trailing slashes... and how to handle all cases of not having them.

Here are two discussions on this that I've followed.

Display URI's in the SERPs
Google vs Yahoo! vs MSN
[webmasterworld.com...]

Rewriting URLs - what does Google like?
Best practices for rewriting unfriendly URLs
[webmasterworld.com...]

In this second thread, encyclo points out the problem of using the trailing slash on all pages....

example.com/ford-mustang-2007/

The above is sub-optimal due to the trailing slash. The issue with the "fake directory name" style (ie. pretending that you have a physical directory called /ford-mustang-2007/ on the server) is that you need more complex rules to account for the trailing slash.

Unanswered for me, though, is how you would resolve the question of the missing trailing slash on a physical directory. I note that in WebmasterWorld...

http://www.webmasterworld.com/google/
...returns the Google forum, as expected.

[webmasterworld.com...]
...does not append the trailing slash, and returns a 404.

As encyclo points out, part of the problem of adding a slash in the above example is that it's a pseudo directory.

But, as I've thought about it, if you had a physical directory in an extensionless environment, you would still have problems with the server adding the slash. If it did, how would it handle a situation where a page-name and a directory-name were the same?

Would you have rules to not do a rewrite in those cases where there was not a 404?

If so, you would still have problems with the Yahoo and MSN display URIs described in the first of the above threads.

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3331359 posted 3:22 pm on May 28, 2007 (gmt 0)

The missing trailing slash problem is resolved automatically in Apache by testing a slashless request_uri which is not found (404) to see if it resolves to an existing directory. If so, then Apache mod_dir will append a slash, and re-start request processing instead of returning a 404. This behaviour can be defeated if mod_alias or mod_rewrite is used to intercept these requests, or if mod_dir is not available on the server.

This is another one of those things that is a "non-problem" --like URL and domain canonicalization-- if the server is properly configured.

Again referring to Apache, the problem of resolving slashless URLs to files or directories can be addressed by checking for 'file exists' and/or 'directory exists' using mod_rewrite's RewriteCond directive, testing the REQUEST_URI and/or REQUEST_FILENAME variables with the "-d" and "-f" flags. However, where both a directory and a file exist, the Webmaster will have to choose which takes priority and write the code accordingly. The default behaviour (using mod_dir only) is to resolve slashless URLs to a file.
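
The decision described above can be sketched outside Apache as plain logic. This is only an illustration of the "file exists" / "directory exists" checks and the priority choice, not Apache configuration and not mod_dir's actual code; the function name `resolve_slashless`, the `prefer` parameter, and the use of sets to stand in for the filesystem are all assumptions made for the example.

```python
from typing import Set, Tuple


def resolve_slashless(
    path: str, files: Set[str], directories: Set[str], prefer: str = "file"
) -> Tuple[str, str]:
    """Decide what to do with a request path that has no trailing slash.

    `files` and `directories` stand in for the "-f" and "-d" existence checks.
    `prefer` is the Webmaster's choice when both exist: "file" or "directory".
    """
    is_file = path in files
    is_dir = path in directories
    if is_file and is_dir:
        # Ambiguous case: the configured priority decides.
        if prefer == "file":
            return ("serve", path)
        return ("redirect", path + "/")
    if is_file:
        return ("serve", path)
    if is_dir:
        # Directory requested without its slash: send the client to the slashed URL.
        return ("redirect", path + "/")
    return ("not_found", path)


if __name__ == "__main__":
    files = {"/widgets", "/widgets/pageone"}
    directories = {"/widgets"}
    print(resolve_slashless("/widgets", files, directories))
    print(resolve_slashless("/widgets", files, directories, prefer="directory"))
```

Making the priority an explicit parameter mirrors the point that the choice has to be written down somewhere; nothing in the request itself says which one the visitor meant.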

Best practice is to use a trailing slash in links where a directory is referenced, and no slash where an extensionless file is referenced, in accordance with HTTP conventions.

Jim

[edited by: jdMorgan at 3:25 pm (utc) on May 28, 2007]

Robert Charlton

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3331359 posted 7:06 pm on May 31, 2007 (gmt 0)

Jim - Thanks for your very helpful answer. It resolves a lot of questions.

One more question (I think still on topic with regard to the original post), that your answer brings up...

"However, where both a directory and a file exist, the Webmaster will have to choose which takes priority and write the code accordingly. The default behaviour (using mod_dir only) is to resolve slashless URLs to a file."

In part because of potential problems with Yahoo and MSN dropping trailing slashes, I've been trying to get my head around how to set things up to avoid this potential ambiguity.

Would there be any problems in simply not using default index files... index.htm, index.html, and "index" (or whatever the extensionless equivalent would be) among the pages in your subdirectories?

I'm assuming that the server would be set up to return the subdirectories with the trailing slash if you did use the index file. I'm just thinking of not using them (except in the root, of course), to avoid this problem of ambiguity.

So that (assuming you had no "index" file) a request for, say...

www.example.com/widgets/

...would always give a 403?...

Requests for other pages in the directory...

www.example.com/widgets/pageone
www.example.com/widgets/pagetwo

...would return requested pages?...

And a request for...

www.example.com/widgets

...would return the "widgets" page... with no ambiguity that it might be...

www.example.com/widgets/index

Is this a valid approach, or is there a pitfall I'm not envisioning? I've always used index files in directories with html files, so I'm not sure what would happen if I omitted them (in the subdirectories). Test pages in subdirectories without index files have worked fine, even on servers that were not necessarily set up properly.

[edited by: Robert_Charlton at 7:37 pm (utc) on May 31, 2007]

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3331359 posted 8:25 pm on May 31, 2007 (gmt 0)

The pitfall is loss of traffic due to appended slashes (or lack thereof).

The "Yahoo problem" is a non-problem, in that they're only dropping slashes in the display URL -- the link-text, not the link. Hopefully, they will fix this error but if not, then your only 'exposure' is for people who (for some reason) type in the URL that they see, or copy and paste it as text (rather than copy-link).

Overall, the best practice would be to avoid naming extensionless files and subdirectories which reside in the same directory identically, thus entirely avoiding this issue.

Jim
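
A quick way to audit a site for the naming clash described above is to compare, within one directory, the names that pages would have once their extensions are dropped against the names of the subdirectories. The sketch below assumes pages are stored with an extension on disk (for example widgets.html served as /widgets); the function names `colliding_names` and `scan` are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, Set


def colliding_names(file_names: Iterable[str], dir_names: Iterable[str]) -> Set[str]:
    """Return names that are both an extensionless page and a subdirectory."""
    # Strip the final extension from each file name, e.g. "widgets.html" -> "widgets".
    page_names = {os.path.splitext(name)[0] for name in file_names}
    return page_names & set(dir_names)


def scan(directory: Path) -> Set[str]:
    """Check a single directory on disk (not recursive) for such clashes."""
    entries = list(directory.iterdir())
    files = [entry.name for entry in entries if entry.is_file()]
    dirs = [entry.name for entry in entries if entry.is_dir()]
    return colliding_names(files, dirs)


if __name__ == "__main__":
    print(colliding_names(["widgets.html", "about.html"], ["widgets", "images"]))
```

An empty result means a slashless request in that directory can only ever mean one thing, so no priority rule is needed.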


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved