Forum Moderators: Robert Charlton & goodroi

Rewriting URLs - what does Google like?

Best practices for rewriting unfriendly URLs

         

Cluttermeleon

12:26 pm on Mar 1, 2007 (gmt 0)

10+ Year Member



I'm going to rewrite my forum URLs so they are friendly for search engines. Is there an acceptable length for rewritten URLs? How long is too long in terms of the number of characters? In addition, how many hyphens are considered unacceptable?

Thanks guys
CM

gendude

3:47 pm on Mar 1, 2007 (gmt 0)

10+ Year Member




This is just my opinion, but as an example, if your site was about cars, I would think that

yoursite.com/ford-mustang-2007/ would be a better URL than

yoursite.com/new-powderpuff-blue-automatic-vinyl-ford-mustang-2007/

Why? Because if you add in too many words, you could end up with people who are searching for vinyl couches, automatic toasters, powderpuff makeup, etc.

I've done some very unscientific testing and I found that the shorter I could keep the URL, the more on-topic the visitors were (based on how they were finding the page). You may want people who are looking for vinyl couches, automatic toasters, etc., but I don't know that they would be your target audience, nor do I think you would get a good click-through rate.

I try to keep my URLs as logical as possible. Even if Google doesn't care, and has no problem with crawling and categorizing the page, I can be a neat freak when it comes to sites :-)

Cluttermeleon

5:05 pm on Mar 1, 2007 (gmt 0)

10+ Year Member



Thanks Gendude

If you use the forum thread in the URL how do you restrict the length of the URL to the main keywords? Sorry if this is a dumb question...

gendude

6:32 pm on Mar 1, 2007 (gmt 0)

10+ Year Member



You'd probably have to look at the support forums for the forum software you are using. How threads are named is going to vary from one forum package to another. It's only within the past year or two that a lot of forum developers have started working on SEO.

If users are creating threads, I don't know that you can do a lot - if it was just you creating the threads, you could probably give it a good, relevant title, and leave it at that.
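
If your forum software lets you hook into how the thread URL is built, one rough way to cap the length is to strip filler words from the title and keep only the first few that remain. The sketch below is only an illustration: `make_slug`, the stop-word list, and the limits are all made up for the example and are not part of any particular forum package.

```python
import re

# Hypothetical stop-word list; tune it for your own forum's subject matter.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "new"}


def make_slug(title: str, max_words: int = 4, max_chars: int = 40) -> str:
    """Turn a thread title into a short, hyphenated URL slug."""
    # Lowercase the title and keep only runs of ASCII letters and digits.
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Drop filler words, but fall back to every word if nothing survives.
    keywords = [w for w in words if w not in STOP_WORDS] or words
    slug_words = []
    for word in keywords[:max_words]:
        candidate = "-".join(slug_words + [word])
        # Stop adding words once the character budget would be exceeded.
        if len(candidate) > max_chars:
            break
        slug_words.append(word)
    return "-".join(slug_words)


print(make_slug("New 2007 Ford Mustangs"))
```

Note that this only keeps the first few surviving words; it has no idea which words are the "main" keywords, so a rambling user-written title still gives a so-so slug. An empty result (for example, a title made only of punctuation) should fall back to the thread ID.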

Forums are tricky, because in addition to titles, you have to deal with sigs - people will put in their sites, or sayings or quotes, or whatever, and in some cases, Google may get confused by what's in the thread.

Using my example about cars and Mustangs - you could start a thread called "New 2007 Ford Mustangs" and the discussion that follows may include people who drive Volvos or perhaps Camaros, and they list the cars they own/like/whatever in their signature, which Google would see at the bottom of their post.

Rather than display ads for Ford Mustangs, it might display ads for Camaros or Volvos instead (this may or may not be a bad thing).

This goes beyond the scope of what you are talking about, but Ford Mustangs are my favorite example to use here and elsewhere, because Mustangs are also horses. It doesn't apply to forums like the one you are talking about, but it shows why you need content that is as relevant as possible: Google could very easily pick up your page about Ford Mustangs and mistakenly run ads for horse ranches or horse supplies if you're not careful.

You're going to find that problem with forums, because Google could get confused.

Forums are very tough for AdSense/Google to index properly. There are tags that you might be able to insert in your forum code that would tell Google to ignore posters' signatures. That's something you really need to look into.

added: Actually, I take that back - if you have a forum about say a line of computers from a specific computer maker, and people are putting the computers they have in their signatures, that could actually help you out a lot.

I know, it can be a pain :-/

g1smd

6:41 pm on Mar 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is more important to exclude all forms of duplicate content from your forum linking structure than it is to include keywords in URLs.

For a typical forum (vBulletin, phpBB, etc.), each thread can be reachable at anywhere from 6 to 20 or so different URLs that all return the same content. You MUST prevent that.
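
As a rough illustration of collapsing those variants, the sketch below picks one canonical form per thread and reports when a request should be 301-redirected to it. The script name `/showthread.php` and the parameter names `t`, `page`, and `highlight` are assumptions made for the example; check what your own forum software actually emits.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

THREAD_SCRIPT = "/showthread.php"  # hypothetical script name


def canonical_thread_url(url: str) -> Optional[str]:
    """Return the single canonical URL for a thread request, or None."""
    parts = urlsplit(url)
    if parts.path != THREAD_SCRIPT:
        return None
    query = parse_qs(parts.query)
    thread_id = query.get("t", [""])[0]
    if not re.fullmatch(r"[0-9]+", thread_id):
        # Variants keyed on a post ID would need a database lookup instead.
        return None
    canonical = f"{THREAD_SCRIPT}?t={int(thread_id)}"
    page = query.get("page", ["1"])[0]
    # Page 1 is the same content as the bare thread URL, so leave it off.
    if re.fullmatch(r"[0-9]+", page) and int(page) > 1:
        canonical += f"&page={int(page)}"
    return canonical


def redirect_target(url: str) -> Optional[str]:
    """Return where to 301 the request, or None if no redirect applies."""
    parts = urlsplit(url)
    requested = parts.path + ("?" + parts.query if parts.query else "")
    canonical = canonical_thread_url(url)
    if canonical is None or canonical == requested:
        return None
    return canonical


print(redirect_target("/showthread.php?t=123&highlight=mustang&page=2"))
```

Anything not listed as part of the canonical form (highlight terms, session IDs, sort order and so on) is dropped, which is the point: one thread page, one URL.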

Cluttermeleon

1:41 pm on Mar 2, 2007 (gmt 0)

10+ Year Member



Thanks guys, appreciate these pearls of wisdom - you've been a great help

gendude

6:58 pm on Mar 2, 2007 (gmt 0)

10+ Year Member



Just to touch on something I mentioned,

you might read this official Google Adsense question/answer:

[google.com...]

>> What is section targeting and how do I implement it?

>> Section targeting allows you to suggest sections of your text and HTML content that you'd like us to emphasize or downplay when matching ads to your site's content. By providing us with your suggestions, you can assist us in improving your ad targeting. We recommend that only those familiar with HTML attempt to implement section targeting.

>> To implement section targeting, you'll need to add a set of special HTML comment tags to your code. These tags will mark the beginning and end of whichever section(s) you'd like to emphasize or de-emphasize for ad targeting.

If you are running AdSense, you might consider using section targeting to have it ignore your posters' signatures. You could add this code somewhere in your forum software, and it would help with ad targeting.
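
A minimal sketch of what that could look like when the forum renders a post. The two comment strings are the section-targeting tags as I recall them from the AdSense help page, so verify them against the official answer linked above before relying on them; `wrap_signature` and `render_post` are hypothetical helper names, not part of any forum package.

```python
# Section-targeting comment tags, quoted from memory (an assumption; verify
# against the official AdSense documentation).
IGNORE_START = "<!-- google_ad_section_start(weight=ignore) -->"
SECTION_END = "<!-- google_ad_section_end -->"


def wrap_signature(signature_html: str) -> str:
    """Mark a signature as a section to de-emphasize for ad targeting."""
    # Leave empty signatures alone so no stray comment tags are emitted.
    if not signature_html.strip():
        return signature_html
    return f"{IGNORE_START}\n{signature_html}\n{SECTION_END}"


def render_post(body_html: str, signature_html: str) -> str:
    """Hypothetical post renderer: body first, wrapped signature after it."""
    return body_html + "\n" + wrap_signature(signature_html)


print(render_post("<p>Love the 2007 Mustang.</p>", "<p>I drive a Volvo</p>"))
```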

trinorthlighting

7:05 pm on Mar 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I like the shortest URL possible, and Google seems to like that too: there is less data for it to handle when crawling. Plus, if you have a huge site that gets a ton of traffic, having fewer characters will speed up your site a bit on the server side.

encyclo

2:52 am on Mar 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



example.com/ford-mustang-2007/

The above is sub-optimal due to the trailing slash. The issue with the "fake directory name" style (i.e. pretending that you have a physical directory called /ford-mustang-2007/ on the server) is that you need more complex rules to account for the trailing slash.

If /ford-mustang-2007/ is a physical directory, Apache will automatically handle requests for /ford-mustang-2007 (without trailing slash) by issuing a 301 to the trailing-slash version. When faking the directory structure with mod_rewrite, Apache will require extra logic in the rewrite rules to emulate the standard file/directory handling.

Yahoo Search (and I think MS Live too) systematically drop the trailing slash on indexed URLs, so if you don't handle the possibility, then you can happily get indexed but your visitors hit a 404 instead of the required page. If you do add the logic into the rewrite, it is still suboptimal because you're regularly redirecting rather than getting hits to the correct URL.

So, you have two better options. The first is to drop the trailing slash entirely, Wikipedia-style:

example.com/ford-mustang-2007

With this style, you get very clean URLs and no redirect problems. You should probably add logic to handle the page being requested with a trailing slash, but it's not the same issue as above.
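
To make that logic concrete, here is a small sketch of the decision for the extensionless style: serve the page at its exact URL, 301 a trailing-slash request to the slashless URL, and 404 everything else. It is written as plain Python rather than as rewrite rules, and `resolve` and `known_pages` are hypothetical names standing in for whatever lookup your site really does.

```python
from typing import Optional, Set, Tuple


def resolve(path: str, known_pages: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (status, redirect_location) for a requested path."""
    # Exact match: serve the page at its one canonical URL.
    if path in known_pages:
        return 200, None
    # Trailing slash on a known page: send a permanent redirect to the
    # slashless URL instead of serving duplicate content or a 404.
    stripped = path.rstrip("/")
    if stripped != path and stripped in known_pages:
        return 301, stripped
    return 404, None


pages = {"/ford-mustang-2007"}
print(resolve("/ford-mustang-2007", pages))
print(resolve("/ford-mustang-2007/", pages))
```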

The second option, and one which can be considered very safe, is to use a file extension; a generic one is usually best:

example.com/ford-mustang-2007.htm

A good example is right here at WebmasterWorld. With an otherwise redundant file extension, you are faking files rather than directories, and this makes your site extremely easy to spider.

For Googlebot and other major search-engine bots, either style (extensionless or with a generic extension) works well. If you are extremely risk-averse, go for the second option of using a file extension.

g1smd

8:53 pm on Mar 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> Yahoo Search (and I think MS Live too) systematically drop the trailing slash on indexed URLs <<

Err. They do drop it from the visible URL printed in the SERPs, but it is retained in the actual clickable link, the URL that gets visited.

encyclo

8:59 pm on Mar 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the correction on that point, g1smd. Yes, the clickable link now retains the trailing slash. However, the underlying issue remains the same: in my opinion it is better to use "fake" files rather than fake directories, because the trailing slash is unnecessary and problematic.

gendude

4:12 pm on Mar 5, 2007 (gmt 0)

10+ Year Member



Sorry, I didn't mean to incite a debate about trailing slashes - I've just noticed that some forum software does append one. I don't know if it's because they may generate an additional part of the name (i.e. each page is 20 posts long, so they would add /page2 or whatever to the name for the 21st post).

I guess it's good that I accidentally brought it up since depending on the software you may run into it :-)

I think the most important thing is to watch the targeting if you are running AdSense, or if you want to keep the site indexed properly.

Robert Charlton

7:41 pm on Mar 5, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>> If /ford-mustang-2007/ is a physical directory, Apache will automatically handle requests for /ford-mustang-2007 (without trailing slash) by issuing a 301 to the trailing-slash version. When faking the directory structure with mod_rewrite, Apache will require extra logic in the rewrite rules to emulate the standard file/directory handling. <<

The subject of trailing slashes, Yahoo and MSN and Google, and best practices was also discussed in this thread...

Display URI's in the SERPs
Google vs Yahoo! vs MSN
[webmasterworld.com...]

I'm still not quite satisfied with the answer I got in that thread, about the way directories in WebmasterWorld are set up...

[webmasterworld.com...]
...returns the Google forum, as expected.

[webmasterworld.com...]
...does not append the trailing slash, and returns a 404.

I'm assuming this may have been done for good reason, but I'm not sure what it is.

encyclo

1:17 am on Mar 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> about the way directories in WebmasterWorld are set up... (...) I'm assuming this may have been done for good reason <<

I don't have any official reply on that, but I think it is simply an inevitable consequence of the current setup. It is an excellent example of the problem I was describing above, in that the "directories" aren't physical ones, and the keyword structure makes it extremely difficult to make an efficient rewrite to account for the 100+ forums. The most important pages, however, all use a file extension for maximum compatibility.