Forum Moderators: Robert Charlton & goodroi

Keywords in URL

Can google separate out words in URL?

         

Onders

11:09 am on Jul 4, 2007 (gmt 0)

10+ Year Member



Hi - I assume (please correct me if I'm wrong) that a URL like this:

www.example.com/redwidgetswhitewidgets

is not going to be as optimised / do as well for "red widgets" and "white widgets" as the following URL

www.example.com/red-widgets-white-widgets

but can Google separate out the words? Do you think Google will actually realise from the first URL that it is a page on widgets (obviously putting all the content aside)?

I'm wondering how much difference these two URLs can make - would be grateful for any input!

Thanks

[edited by: tedster at 6:27 pm (utc) on July 4, 2007]
[edit reason] switch to example.com - it can never be owned [/edit]

tedster

4:41 pm on Jul 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know others who disagree, but I still don't think Google separates out concatenated keywords in the url for ranking purposes. There's too much ambiguity involved (the letters can be combined in different ways), and the keyword-in-url factor is too small a part of the algorithm to warrant a lot of effort.

The bold letters we see in the SERP are a last minute character-match routine, not evidence that the algo necessarily used the words in calculating the rank.

I could be wrong - that definitely happens ;) This is just my gut level sense of things, based on working with a lot of sites.

Onders

3:43 pm on Jul 15, 2007 (gmt 0)

10+ Year Member



I think it's interesting that you suggest that keyword-in-url is such a small part of the algorithm. I'm not disagreeing with you, but just think of the URL as effectively the page title.. it's the first thing that people, and perhaps Google, see about that page (bar off page factors of course) and it should describe what that page is.

Although the URL title is open to abuse, a page on blue widgets should have a title related to blue widgets, so I wonder if this is more important than we think..

What I hadn't considered before is that "the bold letters we see in the SERP are a last minute character-match routine, not evidence that the algo necessarily used the words in calculating the rank."

This is really possible - but makes analysis for webmasters yet harder!

pageoneresults

3:55 pm on Jul 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I know others who disagree, but I still don't think Google separates out concatenated keywords in the url for ranking purposes.

I'll agree.

Keywords in the URI are always of benefit. Forget about the SEs for a moment and think maintenance and manageability after the fact. I find it easier being able to type in a short URI that is based on taxonomy and finding my destination quickly and efficiently. But, that's just me. ;)

If you take those 200+ factors that are considered when ranking a page, I feel strongly that URI naming conventions are of great importance.

Onders

4:11 pm on Jul 15, 2007 (gmt 0)

10+ Year Member



Sorry - you feel that naming conventions are of great importance? It sounded initially like you agreed with Tedster and that they are not that important...

Ok, If you don't consider search engines what is better for the user:

www.example.com/bluewidgets
www.example.com/blue-widgets

I wonder if much research has been done about this... and what users prefer and recognise.

I would definitely say though that with a longer URL, having hyphens in it would make it easier for a user to read / remember / pick out main words..

I'm also not too sure about another thing - even if there are 200 + factors in the algo, with Google's resources I'm sure they don't just say, "this is a minor aspect of the algo - let's not bother analysing it too much". On the contrary, I think they aim for perfection.. so probably do go into huge amounts of detail and analysis into each part..

[edited by: tedster at 5:19 pm (utc) on July 15, 2007]
[edit reason] switch to example.com [/edit]

g1smd

6:09 pm on Jul 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google can spot words separated by hyphens or by dots, but cannot unravel words that areruntogether, or which have underscores within. An underscore is seen as being a part of the single long_word, not a separator.
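As a rough illustration of the rule described above, here is a minimal Python sketch. It assumes hyphens and dots act as separators while underscores and run-together letters do not; `url_words` and `SEPARATORS` are made-up names, and this is not a claim about how Google's own code works.

```python
from __future__ import annotations

import re

# Assumption for illustration only: hyphens and dots separate words,
# underscores are ordinary word characters (as the post describes).
SEPARATORS = re.compile(r"[-.]+")


def url_words(slug: str) -> list[str]:
    """Split one URL path segment into lowercase word tokens."""
    return [part.lower() for part in SEPARATORS.split(slug) if part]


print(url_words("red-widgets-white-widgets"))  # ['red', 'widgets', 'white', 'widgets']
print(url_words("red_widgets"))                # ['red_widgets']
print(url_words("redwidgetswhitewidgets"))     # ['redwidgetswhitewidgets']
```

Under these assumptions only the hyphenated form yields the separate words "red", "white" and "widgets".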

The highlighting of the search term words in the title, snippet, and URL, in the SERPs is a results display function, not a view into the inner workings of the mechanisms of scoring and ranking those documents.
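To show why bolding says little about ranking, here is a hedged sketch of a display-only highlighter. It is a plain character match that runs on text that has already been chosen for display; the `highlight` function is hypothetical and only illustrates the idea, not any search engine's actual routine.

```python
import re


def highlight(text: str, query: str) -> str:
    """Wrap each case-insensitive occurrence of a query word in <b> tags."""
    words = [re.escape(w) for w in query.split() if w]
    if not words:
        return text
    pattern = re.compile("|".join(words), re.IGNORECASE)
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", text)


# The run-together URL still gets bolded, even though nothing here
# knows or cares whether "red" and "widgets" were treated as words.
print(highlight("www.example.com/redwidgets", "red widgets"))
# www.example.com/<b>red</b><b>widgets</b>
```

A substring match like this will happily bold pieces of a concatenated URL, which is why the bold text cannot be read as evidence of word separation during scoring.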

glitterball

10:24 pm on Jul 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



An underscore is seen as being a part of the single long_word, not a separator.

Why is that? Surely Google could read an underscore character as a separator if it wanted to and use that information as an indicator of the content of that page?

tedster

10:34 pm on Jul 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's a decision Google made long ago, because they wanted to empower technical searches. Many technical terms, such as the names of FrontPage extensions and so on, begin with an underscore. So their choice was to treat the underbar as a character, not a separator.

It's still not enough of a factor to go back and change legacy urls, IMO. I've got sites that are just wailing in the SERPs that use the underscore character. I built them before this issue became clear (it was a pretty hot debate for a while), and I would not dream of messing with those urls.

<added>
For an interesting study, do a Google search on the
underscore character, the dash, a space, and a period.
</added>

reseller

10:48 pm on Jul 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wish to recall a 2005 relevant post by Matt Cutts:


I've stylized the conversation quite a bit, but I remember how impressed I was that Google indexed numbers and some punctuation (come to think of it, search engines have come a long way in five years). With underscores, Google's programmer roots are showing. Lots of computer programming languages have stuff like _MAXINT, which may be different than MAXINT. So if you have a url like word1_word2, Google will only return that page if the user searches for word1_word2 (which almost never happens). If you have a url like word1-word2, that page can be returned for the searches word1, word2, and even "word1 word2".

That's why I would always choose dashes instead of underscores. To answer a common question, Google doesn't algorithmically penalize for dashes in the url. Of course I can only speak for Google, not other search engines. And bear in mind that if your domain looks like www.buy-cheap-viagra-online-while-consolidating-your-debt-so-you-can-play-texas-holdem-while-watching-porn.com, that may still attract attention for other reasons. :)

Dashes vs. underscores [mattcutts.com]

And a 2006 relevant post by Vanessa Fox which I consider a "Vanessa-Evergreen" ;-)

And speaking of putting a dash in URLs, hyphens are often better than underscores [Ed. Note: bolded by Matt :) ]. african-elephants.html is seen as two words: "African" and "elephants". african_elephants is seen as one word: african_elephant. It's doubtful many people will be searching for that.

Guest post: Vanessa Fox on Organic Site Review session [mattcutts.com]
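The retrieval behaviour described in the two quotes can be sketched with a toy inverted index. Everything here is an assumption made for illustration: the `tokens`, `build_index` and `search` helpers are hypothetical, and the only rule taken from the quotes is that a dash separates words while an underscore does not.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokens(slug: str) -> list[str]:
    """Split on dashes and dots only; underscores stay inside the token."""
    return [t for t in re.split(r"[-.]+", slug.lower()) if t]


def build_index(slugs: list[str]) -> dict[str, set[str]]:
    """Map each token to the set of slugs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for slug in slugs:
        for tok in tokens(slug):
            index[tok].add(slug)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the slugs that contain every whitespace-separated query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


index = build_index(["african-elephants.html", "african_elephants.html"])
print(search(index, "african elephants"))   # {'african-elephants.html'}
print(search(index, "elephants"))           # {'african-elephants.html'}
print(search(index, "african_elephants"))   # {'african_elephants.html'}
```

In this toy model the underscore version only comes back for the literal query african_elephants, which matches the point made in both quotes.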

[edited by: tedster at 10:58 pm (utc) on July 15, 2007]
[edit reason] clear up charset issues [/edit]

tedster

10:55 pm on Jul 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just to address a confusion that sometimes comes up in this discussion -- there's a difference between multiple dashes in the domain name itself, and multiple dashes in the filepath section of the url.

Onders

8:28 am on Jul 16, 2007 (gmt 0)

10+ Year Member



With this underscore versus hyphen issue - are there any more thoughts on how much weight this carries in the algorithm? My thoughts were that keywords in URL are fairly important in determining the content and relevance of a page, but does Google see it this way?

lavazza

10:14 am on Jul 16, 2007 (gmt 0)

10+ Year Member



Google ... cannot unravel words that areruntogether

Ermmm... no

Google CAN

CamelBacking (as it's called) is how variables, constants and function names are written in all C-based languages like Java, JavaScript, PHP etc so it would be rather surprising if Google's algorithm writers forgot to accommodate something they use themselves

Search for happyBirthday if you need proof... along with approx 734,000 results, you'll be asked:Did you mean: happy Birthday
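For what it's worth, splitting at a capital letter is easy to do mechanically, but it only works when the capitals are there. A small sketch, with a hypothetical `split_camel` helper (no claim that Google does it this way):

```python
from __future__ import annotations

import re

# Zero-width position between a lowercase letter or digit and an uppercase letter.
CASE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def split_camel(identifier: str) -> list[str]:
    """Insert a space at each lower-to-upper boundary, then split on spaces."""
    return CASE_BOUNDARY.sub(" ", identifier).split()


print(split_camel("happyBirthday"))  # ['happy', 'Birthday']
print(split_camel("happybirthday"))  # ['happybirthday']
```

The all-lowercase form, which is how most run-together URLs are written, gives this approach nothing to split on.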

pageoneresults

1:09 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



CamelBacking

After all these years discussing RunTogetherWords, I'm totally excited that a term has surfaced that describes this, thank you lavazza very much.

And no, I didn't mean Happy Birthday. I meant Happy_Birthday. ;)

P.S. It's unfortunate that CamelBacking is popular in other industries too. :(

Onders

1:29 pm on Jul 16, 2007 (gmt 0)

10+ Year Member



Hi Lavazza - thanks for this..

Although with your example Google does suggest "Happy Birthday" if you search for "happybirthday", when you do these two different searches you get a whole different set of results..

Google is obviously at a stage where it can separate words well enough to suggest permutations of the search term, but as it gives a different set of results for each example it's obviously not convinced enough that the user is looking for "happy Birthday" and so is still returning the initial results for the one word...

Do you think this works the same for URL optimisation?

I.e. bluewidgetandredwidget

Google may think the page is about blue widgets and red widgets, but is it going to be sure enough to allow it to count in the algo as positive evidence that the page is about blue widgets and red widgets?

pageoneresults

1:58 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



CamelBacking

No, there are no references to that term within the WebmasterWorld.com domain. Not unless it's a word that has been filtered. ;)

I just have to say thanks again. Maybe I should have known that term, maybe not? I've never, ever seen that used in all my years of reading.

IanTurner

3:08 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



P.S. It's unfortunate that CamelBacking is popular in other industries too. :(

Methinks that may be a case of term-jacking by the urban dictionary. (No doubt term-jacking will now be term-jacked)

pageoneresults

6:03 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



lavazza, is CamelBacking different from Camel Casing? Based on the description below, I would assume it is?

Camel Casing
The first letter of an identifier is lowercase and the first letter of each subsequent concatenated word is capitalized. For example:

backColor

I would also assume that this is language used in Database Naming Conventions?

Marcia

6:22 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Did you mean?" is probably based on query analysis, as would be snippet generation, which I understand is a separate algo that's strictly query dependent, and would be a display function rather than a scoring function.

There isn't just one algo, there are algorithms (plural).

lavazza

8:05 pm on Jul 16, 2007 (gmt 0)

10+ Year Member



If you use "happybirthday" and "Happy Birthday" in 2 different searches you get a whole different set of results simply because there is a whole bunch of sites that actually have "happybirthday" in their keywords, URI and/or content

bluewidgetandredwidget

Google may think the page is about blue widgets and red widgets

The algorithms (yes, plural) don't think - like all software today they're based purely on logic. Artificial Intelligence is still in the embryonic phase - with most conceptions of it being either stillborn or retarded

It's unfortunate that CamelBacking is popular in other industries too

Why 'unfortunate'?

is CamelBacking different from Camel Casing?

I don't think so

Instead, I think it's down to personal preference. I usually use lowerCaseMajorCase - simply because that's how I was introduced to it in C and then Java, where the 'naming convention' is that classes and interfaces names are MajorCaseMajorCaseEtc, methods and variables are minorCaseMajorCaseEtc and constants are MAJOR_CASE_WITH_UNDERSCORES

It actually felt odd using a major case C for Camel... but figured this is a forum and it was the first word of a sentence

I would also assume that this is language used in Database Naming Conventions?

I'm no expert but I think it's pretty much mandatory in many areas where contiguousStringsAreCrucialToPerformance

Any decent browser will modify a white space in a URI to %20

Unfortunately, IE :grr: is both slack and common, hence the abundance of sloppy URIs
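The %20 substitution mentioned above is ordinary percent-encoding. A quick way to see it, using Python's standard `urllib.parse` module (the path itself is just a made-up example):

```python
from urllib.parse import quote, unquote

path = "/blue widgets"

encoded = quote(path)    # "/" is left alone by default, the space is escaped
print(encoded)           # /blue%20widgets
print(unquote(encoded))  # /blue widgets
```

A hyphenated path such as /blue-widgets passes through `quote` unchanged, which is one more practical argument for hyphens over spaces.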

Snippet generation?

Do you mean the (typically) 20 or so words on the 2nd and 3rd lines? Google will use the description meta tag... if present... e.g.

<meta name="description"
content="Words that will appear as a snippet or whatever you call it">

Marcia

8:23 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For snippet generation, which is query dependent, they'll pull from wherever the word or phrase in the user's query actually occurs.

If it's in the meta description, that'll get used - it's also pulled from page text, navigation elements, alt attributes (not link "title" attributes), Hn elements, etc. When nothing is found (or in some cases where there's been egregious title/meta duplication) the ODP data will be used. I've just seen an instance of the latter, BTW.

As an afterthought, take a hypothetical domain like

www.petcatsexchange.com

Is that domain about cat-lover's topics and products - or something else? ;)
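The ambiguity is easy to demonstrate. The sketch below lists every way a run-together string can be cut into words from a tiny hand-picked vocabulary; both the `segmentations` helper and the `VOCAB` word list are assumptions for illustration, and a real system would need a far larger dictionary plus some way of scoring the candidates.

```python
from __future__ import annotations


def segmentations(text: str, vocab: frozenset[str]) -> list[list[str]]:
    """Return every way to split text into a sequence of words from vocab."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            # Keep this word and segment whatever is left after it.
            for rest in segmentations(text[end:], vocab):
                results.append([head] + rest)
    return results


# Hand-picked illustrative vocabulary, not a real dictionary.
VOCAB = frozenset({"pet", "cat", "cats", "exchange", "sex", "change"})

for words in segmentations("petcatsexchange", VOCAB):
    print(" ".join(words))
# pet cat sex change
# pet cats exchange
```

With no separators in the string, both readings are equally valid as far as the letters go, so anything that wants to pick one has to guess.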

[edited by: Marcia at 8:30 pm (utc) on July 16, 2007]

lavazza

8:57 pm on Jul 16, 2007 (gmt 0)

10+ Year Member



All these are legitimate companies that didn't spend quite enough time considering how their online names might appear.

These addresses are not made up.

Check them out yourself.

============================
"Who Represents" is where you can find the name of the agent that represents any celebrity. Their Web site is

w w w . whorepresents . c o m
============================
"Experts Exchange" is a knowledge base where programmers can exchange advice and views at

w w w . expertsexchange . c o m
============================
Looking for a pen? Look no further than "Pen Island" at

w w w . #*$!land . n e t
============================
Need a therapist? Try "Therapist Finder" at

w w w . therapistfinder . c o m
============================
There's the "Italian Power Generator Company" at

w w w . powergenitalia . c o m
============================
And don't forget the "Mole Station Nursery" in New South Wales

w w w . molestationnursery . c o m /
============================
If you're looking for IP computer software, there's always

w w w . ipanywhere . c o m /
============================
The "First Cumming Methodist Church" Web site

w w w . cummingfirst . c o m
============================
The designers at "Speed of Art" await you at their wacky Web site

w w w . speedofart . c o m /

Gibble

10:16 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No matter how many times I read those, I still chuckle.

pageoneresults

4:05 am on Jul 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Casing Styles
The following terms describe different ways to case identifiers.

Pascal Casing
The first letter in the identifier and the first letter of each subsequent concatenated word are capitalized. You can use Pascal case for identifiers of three or more characters. For example:

BackColor

Camel Casing
The first letter of an identifier is lowercase and the first letter of each subsequent concatenated word is capitalized. For example:

backColor

Uppercase
All letters in the identifier are capitalized. For example:

IO

Credit to DrDoc for providing me with the link to the MSDN article...

.NET Framework Capitalization Conventions
[msdn2.microsoft.com...]
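The three casing styles quoted above can be produced from a list of words roughly like this. The helper names are made up for the example and are not part of any .NET or Python API.

```python
from __future__ import annotations


def to_pascal(words: list[str]) -> str:
    """PascalCase: capitalize the first letter of every word."""
    return "".join(w.capitalize() for w in words)


def to_camel(words: list[str]) -> str:
    """camelCase: like PascalCase, but the very first letter is lowercase."""
    pascal = to_pascal(words)
    return pascal[:1].lower() + pascal[1:]


def to_upper(words: list[str]) -> str:
    """Uppercase: every letter capitalized."""
    return "".join(w.upper() for w in words)


print(to_pascal(["back", "color"]))  # BackColor
print(to_camel(["back", "color"]))   # backColor
print(to_upper(["io"]))              # IO
```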

What I see a lot of is Pascal Casing, which is sometimes also referred to as Studley Caps or Camel Caps. When discussing URI structure, I think the trend is more towards Pascal Case rather than CamelBacking.

Interesting discussion nonetheless. Made me go on an information hunt, thanks.

CainIV

7:00 am on Jul 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Search for happyBirthday if you need proof... along with approx 734,000 results, you'll be asked:Did you mean: happy Birthday

This is likely based on extraction and analysis of data that Google has collected from users.

I think a better question and post Title would have been:

Is there any logistical credit to the domain from a ranking perspective if Google can separate the domain?

From my experience there is little to no advantage of a hyphenated domain in Google. With some of the other SE's, however, it's a different story...

Marcia

8:11 am on Jul 19, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>Search for happyBirthday if you need proof... along with approx 734,000 results, you'll be asked:Did you mean: happy Birthday<<<

This is likely based on extraction and analysis of data that Google has collected from users.

Query analysis, and there's plenty of co-occurrence data for it.