For optimal results on AV, you're going to want to take the keyphrase and analyse the density of each word in that phrase separately. This is because AV uses something called IDF (inverse document frequency). IDF compares each individual word in a phrase against the entire AV database and places more importance on words that are found less often in it. So for the phrase "free ware", AV would place more importance on the density of "ware", because it's less common in the AV database.
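The per-word weighting described above can be sketched in Python. This is a minimal illustration, assuming the textbook smoothed IDF, log((1 + N) / (1 + df)), over a small made-up corpus. AV's actual formula is not known, and the corpus, page text, and function names (tokenize, document_frequency, idf, word_densities) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


def document_frequency(corpus):
    # Number of documents each word appears in (counted once per document)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return df


def idf(word, df, n_docs):
    # Assumed smoothed IDF: rarer words score higher; +1 keeps it defined for unseen words
    return math.log((1 + n_docs) / (1 + df.get(word, 0)))


def word_densities(page, phrase):
    # Density of each phrase word on the page = occurrences / total words
    tokens = tokenize(page)
    counts = Counter(tokens)
    return {w: counts[w] / len(tokens) for w in tokenize(phrase)}


# Hypothetical toy "database" standing in for the search engine's index
corpus = [
    "free shipping on all orders",
    "free trial for new users",
    "get free stuff here",
    "download free ware and share ware",
]
df = document_frequency(corpus)
page = "free ware downloads: the best free ware site"

# Weight each word's density by its IDF, so rarer words count for more
densities = word_densities(page, "free ware")
weights = {w: d * idf(w, df, len(corpus)) for w, d in densities.items()}
for w in densities:
    print(f"{w}: density={densities[w]:.3f} "
          f"idf={idf(w, df, len(corpus)):.3f} weighted={weights[w]:.3f}")
```

With this toy corpus, "free" appears in every document, so its IDF is zero and its density contributes nothing, while "ware" carries all of the weight even though both words have the same density on the page.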
IMHO, research must first be done to determine whether the phrase is itself treated as one "word".
In this case (note: all were entered without quotes, using the AV default):
"free ware" is considered one word by AV, showing 11,819 instances
"freeware" as one word shows 1,976,768 instances
"free software" is considered one word with 852,130 instances, and surprisingly,
using "free soft ware", I found that "free soft"(17,686) was treated as one word, and "ware"(1,841,509) was treated as another.
These numbers show not only the number of instances found in the AV database but also which phrases are treated as one word. altameter.x42.com is a great tool for comparing different options (a little buggy lately, but helpful when it works).
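If AV used a textbook IDF of log(N / df), the total database size N would be unknown here, but the difference in IDF between two terms does not depend on it: log(N/df_a) - log(N/df_b) = log(df_b / df_a). The sketch below is an illustration under that assumption, not AV's actual scoring. It ranks the counts reported above and expresses each term's IDF relative to "freeware". The helper name idf_gap is hypothetical, and since the counts shift between reloads, the results are only approximate snapshots.

```python
import math

# Instance counts reported in this thread from AV (unquoted queries)
counts = {
    "free ware": 11_819,
    "freeware": 1_976_768,
    "free software": 852_130,
    "free soft": 17_686,
    "ware": 1_841_509,
}


def idf_gap(term_a, term_b, df):
    # idf(a) - idf(b) = log(N/df_a) - log(N/df_b) = log(df_b/df_a); N cancels out
    return math.log(df[term_b] / df[term_a])


# Rank terms from rarest (highest assumed IDF) to most common (lowest)
ranked = sorted(counts, key=counts.get)
for term in ranked:
    gap = idf_gap(term, "freeware", counts)
    print(f"{term!r:18} {counts[term]:>10,}  idf - idf('freeware') = {gap:.2f}")
```

Under this assumption, "free ware" (the rarest variant here) would carry the highest IDF and "freeware" the lowest.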
Keep in mind, though, that this is just at AV; still, I have a feeling that combining terms is essential to PageRank and term vector analysis. We know that users have moved away from one-word searches; might it be a good idea for the engines to do likewise?
Edited by: seth_wilde
Sorry, I ran my message through a stupidity check first, but sometimes it doesn't catch everything. ;)
The numbers match better now. I did get 1,841,909 for "ware" (slightly higher) and, strangely, the first two times I ran "freeware" I got a number in the 2,000,000+ range, but the third time the number changed back to 1,976,768. I didn't change anything (the term was exactly the same), but the number changed slightly over several searches just by reloading the page. I wonder what that indicates. Any ideas?
Update: I just tried it again and got a different number.
It reads "word count: freeware: 2003106".
Is there anything interesting to be gleaned from this?