Find the word bursts in recent blogs/forums and send out the fresh bots to already-indexed pages which themselves have the same words occurring frequently/relevantly.
Though I wonder how much difference there is between word bursts in recent web documents and Zeitgeist-type search query word bursts (search phrases suddenly/abruptly occurring more frequently).
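For anyone who wants to play with the idea, here is a minimal sketch of spotting bursting words by comparing a recent batch of posts against an older baseline. To be clear, this is a plain smoothed frequency ratio, not the model from Kleinberg's paper, and the names (`tokenize`, `burst_scores`, `min_count`) are made up for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def burst_scores(recent_docs, baseline_docs, min_count=2):
    """Return {word: ratio} of recent rate to baseline rate.

    A ratio well above 1 suggests the word is "bursting" in the recent
    batch. Add-one smoothing keeps unseen baseline words from dividing
    by zero. min_count is an arbitrary noise filter (assumption).
    """
    recent = Counter(w for d in recent_docs for w in tokenize(d))
    base = Counter(w for d in baseline_docs for w in tokenize(d))
    recent_total = sum(recent.values())
    base_total = sum(base.values())
    vocab = len(set(recent) | set(base)) or 1
    scores = {}
    for word, count in recent.items():
        if count < min_count:
            continue
        recent_rate = (count + 1) / (recent_total + vocab)
        base_rate = (base[word] + 1) / (base_total + vocab)
        scores[word] = recent_rate / base_rate
    return scores
```

The same function would work on search query logs instead of documents, which is one way to compare the two kinds of burst.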
>Find the word bursts in recent Blogs
Or better still buy a blogger, move them to your servers and watch the web in real time. Hmmmmm :)
This could help provide a better mix in results where a word has multiple meanings and interests. Particularly during some seasonal times.
>>>>Or better still buy a blogger, move them to your servers and watch the web in real time. Hmmmmm :)
hehe, nice little transparent proxy server sitting on the front end, watching the traffic flow in and out and following the leads. Could be a killer app if combined with a search engine capable of updating almost on the fly.
I wonder if anyone has thought of that?
he he, combine that with something else - say, the ultimate in profiling - and you could really have something there...blog, profile, word burst, links, etc.
It's all coming together...hm, very interesting indeed.
>>>hehe, nice little transparent proxy server sitting on the front end, watching the traffic flow in and out and following the leads.
Yes, kinda like having your very own group of inner city youths who can tell you what's hot in fashion and music.
I'll be danged. wordburst.com was registered yesterday. Someone always beats me to them.
i think it sounds potentially great - but the examples given in that article were *so* lame - i'd want to see how it scaled up to deal with the whole of the web.
my feeling is that, as far as searching goes, it is a technique much more suited to small groups of documents with a controlled vocabulary, like intranets or large corporate sites, than trying to apply it to every document published on the web simultaneously.
Funny.. I was thinking of doing a similar thing for my site about a week ago. This thread got me inspired to get started on it and I've now got a 'Hot Words' section on the front page. Although it's a small data set with an obvious bias towards games and PC hardware, the results are still better than I expected :)
|brotherhood of LAN|
isn't this what g and others have called a "driving query"?
you start on a page that has a high frequency of a certain word/phrase combination, and pages are spidered until the frequency is below a certain point. Then another (if on topic) query is started from that page, and so on...
Would hot > topics > get > muffled > if the words are in a directory structure and repeated over loads of pages though... hope the concept doesn't need anything nasty like PR to help it float ;)
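A rough sketch of the crawl described above, assuming a tiny in-memory "web" rather than real HTTP fetching. The `pages` and `links` dictionaries, the `threshold` value and both function names are hypothetical, and the frequency measure is just one simple choice.

```python
from collections import deque


def phrase_frequency(text, phrase):
    """Share of the page's words that belong to occurrences of the phrase."""
    words = text.lower().split()
    if not words:
        return 0.0
    target = phrase.lower().split()
    n = len(target)
    hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == target
    )
    return hits * n / len(words)


def driving_crawl(pages, links, start, phrase, threshold=0.05):
    """Breadth-first crawl that only follows links from on-topic pages.

    pages: {url: text}, links: {url: [url, ...]} (both stand-ins for a
    real fetcher). A page whose phrase frequency is below the threshold
    is not recorded and its outgoing links are not followed.
    """
    seen = {start}
    queue = deque([start])
    on_topic = []
    while queue:
        url = queue.popleft()
        if phrase_frequency(pages.get(url, ""), phrase) < threshold:
            continue  # frequency dropped below the cutoff, stop here
        on_topic.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return on_topic
```

Note that a phrase repeated in navigation across loads of pages would keep this crawl going for a long time, which is exactly the muffling worry raised above.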
Darkness, welcome to WebmasterWorld!
|wordburst.com was registered yesterday. Someone always beats me to them. |
Well, as of 9:48 EST, "jonkleinberg.com" is still available. ;)
Sarah Graham wrote in the article:
|...He [Jon Kleinberg] posits that the new approach could help narrow web searches by better recognizing the time context of a query.... |
The problem is, I don't think word bursts or any similar time-dependent tool would be implemented with too much precision by an SE like Google. There's a disincentive for Google to make it easy for us to search based on precise date ranges: date-stamping a page in the index gives us another datapoint to use in reverse-engineering the ranking algo.
|brotherhood of LAN|
One of his works is here
Interesting to note about halfway down there is a bit about measuring social networks (ala blogs?)
so a word burst from a blog community could be an "out burst" or an endorsement about a given subject.
maybe it's not long until we have a "ranters" algorithm... or a script that could weed out rants in the forums. Very interesting work, but lots of math.
New? I admit i didn't read the paper, but isn't this just the same as an engine counting word frequency?
This would be a fun application to show up at Google Labs. I would love to see this "burst" concept "make sense" of some data sets that I was able to feed into it.
I also think that this concept could have some really interesting "social understanding" possibilities. This would probably be a very good tool for uncovering "bias" in such places as the media or within universities.
I wonder if it also works with synonyms...
ggrot, judging by the brief article, I believe calculating word burst metrics is an attempt to determine keyword trends over time. It's not enough that a set of pages has similar KF or KD for a given term, the pages would also have to be sufficiently near each other in index date.
I don't have a clue about how to express it mathematically, but I'm thinking a word burst query might sound something like: "Show me a graph of the keyword densities of all non-typical words occurring on all pages indexed between October 31 and December 31, 2002, where the keyword density is at least 1%". The important factors here are:
(1) What do you consider a "non-typical word"? The fewer the stop words you use as a filter, the more inundated you get with data.
(2) At what rate does the keyword density of a given word or phrase cease to be "noise" and become meaningful? 1%? 5%?
(3) What's a meaningful period of time to consider? A week? A month?
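To make the three factors above concrete, here is a small sketch of that kind of query over a list of (index date, page text) pairs. Everything here is an assumption for illustration: the tiny `STOP_WORDS` set, the `min_density` cutoff and the function names. It returns the raw series you would graph rather than drawing anything.

```python
from collections import Counter
from datetime import date

# Factor (1): a deliberately tiny stop list (assumption); a real one is longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def keyword_densities(text, stop_words=STOP_WORDS):
    """Return {word: density}, density = occurrences / total words on page."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(w for w in words if w not in stop_words)
    return {w: c / total for w, c in counts.items()}


def burst_query(pages, start, end, min_density=0.01):
    """Collect per-word (index_date, density) points inside [start, end].

    pages: iterable of (index_date, text). min_density is factor (2),
    the date range is factor (3).
    """
    series = {}
    for index_date, text in sorted(pages, key=lambda p: p[0]):
        if not (start <= index_date <= end):
            continue
        for word, density in keyword_densities(text).items():
            if density >= min_density:
                series.setdefault(word, []).append((index_date, density))
    return series
```

Usage would look something like `burst_query(pages, date(2002, 10, 31), date(2002, 12, 31), min_density=0.01)`, and changing the stop list or the cutoff shows quickly how much data you get inundated with.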
Another way I can see word burst statistics used is to start out with a specific keyword, then try to determine the trend for it: "Show me all pages with a keyword density of 1% to 2.5% for the term 'green widgets' where the pages have an index date of November 1, 2002, plus or minus a month or 1%, whichever comes first".
In this case, if the KD trend fell below 1% before the specified range of +/- 1 month, that would mean a short-lived trend, and if it didn't fall below 0.5% for the time period, it would mean a longer-lived trend. Whether that was meaningful or not would depend on what's historically typical for a keyword, though certain keywords probably never fall below a certain density threshold.
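Here is one way the short-lived versus longer-lived distinction could be sketched, given a series of (index date, keyword density) points for a single term. This is a simplification: it only looks forward from the centre date, and `classify_trend`, `window_days` and `floor` are hypothetical names with arbitrary defaults.

```python
from datetime import date, timedelta


def classify_trend(series, center, window_days=30, floor=0.01):
    """Label a keyword trend by how soon its density drops below a floor.

    series: list of (date, density) points for one term, in any order.
    Returns (label, days): "short-lived" with the number of days after
    `center` at which density first fell below `floor`, or
    "longer-lived" with `window_days` if it never did inside the window.
    """
    end = center + timedelta(days=window_days)
    in_window = sorted((d, kd) for d, kd in series if center <= d <= end)
    for d, kd in in_window:
        if kd < floor:
            return "short-lived", (d - center).days
    return "longer-lived", window_days
```

Whether the label means anything still depends on what is historically typical for the keyword, so in practice the floor would probably need to be set per term.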
Analyze the news sites, certain newsgroups, forums and blogs. Apply a PR algo to give increasingly authoritative resources greater weight, apply a freshness algorithm, and look at the word bursts in a given week, day or hour: the hot stories and concepts organise themselves, and where they are mentioned classifies them... I like the concept.
Can't wait to see its implementation on a well-known search engine in the near future.
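A back-of-the-envelope sketch of combining authority weighting with freshness, assuming each mention of a term arrives with a source weight (a stand-in for something PR-like) and a timestamp in hours. The exponential half-life decay and all the names here are my own assumptions, not anything a search engine is known to use.

```python
from collections import defaultdict


def hot_terms(mentions, now_hours, half_life_hours=24.0):
    """Rank terms by authority-weighted, freshness-decayed mention counts.

    mentions: iterable of (term, authority, seen_at_hours).
    Each mention contributes authority * 0.5 ** (age / half_life), so a
    mention one half-life old counts half as much as a brand new one.
    Returns a list of (term, score) sorted from hottest to coldest.
    """
    scores = defaultdict(float)
    for term, authority, seen_at_hours in mentions:
        age = max(0.0, now_hours - seen_at_hours)
        freshness = 0.5 ** (age / half_life_hours)
        scores[term] += authority * freshness
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Shortening the half-life gives the hourly view and lengthening it gives the weekly one.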
Hot search engine stories in the last week:
1: word bursts
2: overture buys altavista
I don't think we should get too psyched about the application of word burst techniques with our favorite "well known search engine", because I don't think Google would ever give us access to sufficient tools to figure out how such techniques were used in indexing and ranking.
A recent reply from GoogleGuy [webmasterworld.com] to a question from me about narrowing down queries by date range seems to imply that Google isn't eager to have its users do precise date-matching. For this reason, I'm skeptical that it would ever implement a precise end-user word burst tool, which might run a similar risk of exposing part of its algorithms. (Mind you, this is assuming they went ahead and started incorporating word burst techniques into their algos in the first place. OK, it's too late, gotta get some sleep!)
If this plan gets worked out properly [webmasterworld.com] then such a taste of the times would make very interesting searching.
Much more here [eurekalert.org] including "The 150 term bursts of highest weight in Presidential State of the Union Addresses, 1790-2002"
Daypop spots a bandwagon and climbs aboard;
NFFC, nice find!
...And just think, they didn't have to buy Pyra to do it! ;)