Keyword density on Altavista
To check keyword density for a webpage with the keyword freeware for example, do I have to count:
free ware as 1 keyword or not?
free-ware as 1 keyword or not?
free software as 1 keyword or not?
free soft ware as 1 keyword or not?
Hi elie, welcome to the board. Each engine is different. Generally, yes, you would count them as different, but be careful that your secondary keywords don't overload you on your primary keywords. I'd be careful of "free" or "soft" being too high on the page. Some engines do use stemming by default, and you'll get the density of those too high if you are not careful with the compound words.
keyword density checker [searchengineworld.com]
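For anyone who wants to see what a density checker is doing under the hood, here is a minimal sketch in Python. It assumes density means "occurrences of a word divided by total words on the page" and that hyphens and spaces both split words; real engines and the tool linked above may tokenize differently. The function names and the sample page text are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.
    Hyphens and spaces both act as separators (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_density(text):
    """Return {word: occurrences / total words} for every word in the text."""
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}

# Hypothetical page copy mixing the variants asked about in this thread.
page = "Free software and freeware: download free-ware, free ware and free soft ware."
density = keyword_density(page)

for word in ("free", "ware", "soft", "freeware"):
    print(f"{word}: {density[word]:.1%}")
```

Under this tokenization "free-ware" and "free ware" both add to the density of "free" and "ware" but not to "freeware", which is the overload risk described above.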
I have heard that the word 'free' is pretty much considered a stop word on av.com (not, however, on av.co.uk). Also, it seems using the word FREE in your title rather than 'free' works better.
I must have missed this the first time around...
For optimal results on AV you're going to want to take the keyphrase and analyse the density of each word in that phrase separately. This is because AV uses something called IDF (inverse document frequency). What IDF actually does is compare individual words in a phrase against the entire AV database and place more importance on words that are found less often in the database. So for the phrase "free ware", AV would place more importance on the density of "ware", because it's less common in the AV database.
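To make the IDF idea concrete, here is a rough sketch using the textbook formula log(N / df). AV's real formula is not public as far as this thread knows, so treat this as an illustration only. The database size, the document frequencies, and the densities below are invented numbers, and `idf` and `weighted_density` are hypothetical helper names.

```python
import math

def idf(doc_freq, total_docs):
    """Textbook inverse document frequency: rarer words score higher."""
    return math.log(total_docs / doc_freq)

def weighted_density(densities, doc_freqs, total_docs):
    """Scale each word's on-page density by its IDF weight."""
    return {
        word: densities[word] * idf(doc_freqs[word], total_docs)
        for word in densities
    }

# All numbers below are made up for illustration.
total_docs = 1_000_000
doc_freqs = {"free": 500_000, "ware": 10_000}   # "free" is far more common
densities = {"free": 0.02, "ware": 0.02}        # same density on the page

weighted = weighted_density(densities, doc_freqs, total_docs)
for word, score in weighted.items():
    print(f"{word}: {score:.4f}")
```

With equal on-page density, the rarer word ("ware" here) ends up carrying more weight, which is the effect described above.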
Good advice above, let me add...
AV frequently turns a two- or three-word phrase into one "word". IDF still applies.
IMHO, first research must be done to determine if the phrase is in itself considered as one "word".
In this case (note: all were entered without quotes, using AV default):
"free ware" is considered one word by AV, showing 11,819 instances
"freeware" as one word shows 1,976,768 instances
"free software" is considered one word with 852,130 instances, and surprisingly,
using "free soft ware", I found that "free soft"(17,686) was treated as one word, and "ware"(1,841,509) was treated as another.
These numbers show the number of instances found in the AV database, but also show which phrases are considered one word. altameter.x42.com is a great tool for comparing different options. (a little buggy lately - but helpful when it works.)
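For comparing options, the counts quoted above can be ranked by rarity without knowing the size of the AV database. With a log-based IDF (an assumption, AV's actual weighting is not known here), the difference in IDF between two terms depends only on the ratio of their counts, so the unknown total cancels out. This sketch also assumes the "word count" figures can stand in for document frequency; `idf_gap` is a hypothetical helper.

```python
import math

# Word counts as quoted in this thread (AV default search, no quotes).
counts = {
    "free ware": 11_819,
    "free soft": 17_686,
    "free software": 852_130,
    "ware": 1_841_509,
    "freeware": 1_976_768,
}

def idf_gap(rare_count, common_count):
    """IDF(rare) - IDF(common) for log-based IDF.
    log(N / a) - log(N / b) = log(b / a), so the database size N drops out."""
    return math.log(common_count / rare_count)

# Rarest term first, i.e. highest IDF first.
rarest_first = sorted(counts, key=counts.get)
gap = idf_gap(counts["free ware"], counts["freeware"])

for term in rarest_first:
    print(f"{counts[term]:>9,}  {term}")
print(f"IDF gap between 'free ware' and 'freeware': {gap:.2f}")
```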
Keep in mind though, this is just at AV, but I have a feeling that the combining of terms is essential to PageRank and Term vector. We know that users have moved away from one-word searching; might it be a good idea for the engines to do likewise?
Very good point bigjohnt! I completely forgot to mention exact matching. The only thing that I would expand on (for those unfamiliar with exact matching) is that when exact matching is in use, AV is going to be looking for that exact combination of the phrase. They'll no longer count individual uses of words in the phrase.
Edited by: seth_wilde
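A small sketch of the difference between exact matching and per-word counting, for those unfamiliar with it. This only illustrates the counting itself, not how AV implements it; the tokenizer, the function names, and the sample text are assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_count(text, phrase):
    """Count occurrences of the exact word sequence (exact matching)."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)

def word_counts(text, phrase):
    """Count each word of the phrase independently (no exact matching)."""
    words = tokenize(text)
    return {word: words.count(word) for word in tokenize(phrase)}

# Hypothetical page copy.
page = "Free software reviews. Our software is free, and free software is listed daily."

exact = phrase_count(page, "free software")
separate = word_counts(page, "free software")
print(exact, separate)
```

On this sample, "free" and "software" each appear three times on their own, but the exact phrase "free software" appears only twice, so the two ways of counting give different densities.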
AV's weird. I'm doing those searches a couple of hours after you and am getting different numbers (default AV, no quotes around terms). :/
[Of course your point still stands and is well taken!]
Are you looking at the number of pages returned, or the word count at the bottom of the results? Common mistake, just wondering.
Sorry, I ran my message through stupidity-check first but sometimes it doesn't catch everything. ;)
The numbers match better now. I did get 1,841,909 for "ware" (slightly higher) and, strangely, the first two times I ran "freeware" I got a number in the 2,000,000+ range, but the third time the number changed back to 1,976,768. I didn't change anything, the term was exactly the same, but the number changed slightly over several searches (just reloading the page). I wonder what that indicates? Any ideas?
Update: I just tried it again and got a different number.
It reads "word count: freeware: 2003106".
Is there anything interesting to be gleaned from this?
I'm unconvinced that IDF is not a factor even when exact matching occurs.
Could it be that exact matching is there to help diminish server load with popular search terms and is not part of the algo as such?
NFCC, that is a very good possibility. I am sure IDF is still used, such as when a two- or three-term "word" is coupled with another word. Example, the search
"search engine ranking workshops" (again, done without quotes)
"search engine ranking" is considered one word, and using IDF comes up before "workshop", the second word. This is not necessarily exact matching on the full string, but first on s.e.r, and then qualifying the result with the additional workshop.
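One way to picture how a query might get split into "words" like this is a greedy longest-match against a list of known phrases. This is purely a guess at the behaviour seen in this thread, not AV's documented method; the phrase list, `segment`, and `max_len` are all hypothetical.

```python
def segment(query, known_phrases, max_len=3):
    """Split a query into units, preferring the longest known phrase
    starting at each position and falling back to single words."""
    words = query.lower().split()
    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, down to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in known_phrases:
                units.append(candidate)
                i += n
                break
    return units

# Hypothetical phrase list, based on the phrases this thread saw treated as one "word".
known_phrases = {"search engine ranking", "free ware", "free software", "free soft"}

print(segment("search engine ranking workshops", known_phrases))
print(segment("free soft ware", known_phrases))
```

Each resulting unit could then be weighted separately, which fits the idea that IDF still applies once a phrase has been collapsed into one "word".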
Have you guys checked out AV Belgium [altavista.advalvas.be]? This was talked about in a thread a long time ago......but the numerical scores that it gives are great for IDF research....