You'll find information here [ragingsearch.altavista.com]

Raging was meant to provide a clean interface for webmasters searching AV.

Thanx engine...

"Our index contains every word found on more than 350 million unique Web pages"

Hmmm... their index contains words, not pages. Maybe term vectors?

"Text relevance searches every Web page for exactly the words you enter. Many factors enter into text relevance, such as how important the words are on the page, how many times the words appear, where on the page they appear, and how many other pages contain those words."

I smell themes. It's surprising that they give away this much information!

This certainly is interesting. I guess it's a good thing that my pages are doing better there (if this is the way they are moving for the future).

Maybe they test out their advanced technology at Raging Search to see how it works, or to see how users like it based on feedback. Then, possibly they start implementing some of those things on regular AV. Maybe it's a beta version of the future Alta?

Any other speculations?

Metaman

A term vector database is all about words, that's what they'll index and that's how they store your pages -

"Although we use the usual TF-IDF weighting to select terms for vectors, we do not store these weights in vectors. Instead, we store just the term frequency, that is, the number of times the term appears in the page"

"In addition to the term counts themselves, this raw data includes the lengths of pages, both in bytes and in terms."

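For anyone who wants to picture what the two quotes above describe, here is a minimal sketch in Python. It is only an illustration, not AltaVista's actual storage format: the function name `build_term_record`, the field names, and the simple lowercase tokenizer are all assumptions, and it keeps every term rather than selecting terms by TF-IDF as the paper describes.

```python
import re
from collections import Counter

def build_term_record(page_text):
    """Return raw term counts plus the page length in bytes and in terms.

    Hypothetical layout: the quoted paper only says that raw term
    frequencies and both page lengths are stored, not how.
    """
    # Assumed tokenizer: lowercase runs of letters and digits.
    terms = re.findall(r"[a-z0-9]+", page_text.lower())
    return {
        "term_counts": Counter(terms),                   # raw counts, no TF-IDF weights
        "length_bytes": len(page_text.encode("utf-8")),  # page length in bytes
        "length_terms": len(terms),                      # page length in terms
    }

record = build_term_record("Raging search is a clean search interface")
print(record["term_counts"]["search"])  # 2
print(record["length_terms"])           # 7
print(record["length_bytes"])           # 41
```

Because only the raw counts are kept, any weighting can be recomputed later from the counts and the page lengths.
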
Thanks for clearing that up, Seth. I did read that document, but it's amazing how much more understanding I got with a second read.

I came across another thing that's quite puzzling.

Look at the Inverse Document Frequency part of the TF*IDF equation.

log (Number of documents/Number of documents containing keyword)

Assume the keyword is on every page (you would think this was a good thing). When you do the division you get 1. Take the log of 1 and you get zero. Multiply any term frequency by zero and you get zero. Maybe they have some kind of catch for when this happens?
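The arithmetic in this post can be checked with a short Python sketch. The function names are made up for illustration, and base 10 is assumed because that is what a calculator "log" key computes. The `smoothed_idf` variant is one commonly seen workaround, shown purely as an assumption; nothing in the quoted material says AltaVista does this.

```python
import math

def idf(total_docs, docs_with_term):
    """Inverse document frequency as written above: log(N / n)."""
    return math.log10(total_docs / docs_with_term)

def tf_idf(term_count, total_docs, docs_with_term):
    """Term frequency multiplied by inverse document frequency."""
    return term_count * idf(total_docs, docs_with_term)

def smoothed_idf(total_docs, docs_with_term):
    """Hypothetical 'catch': add 1 so the weight never drops to zero."""
    return idf(total_docs, docs_with_term) + 1.0

# Keyword on every one of 1000 pages: 1000/1000 = 1, log(1) = 0.
print(tf_idf(12, 1000, 1000))  # 0.0

# Keyword on only 10 of 1000 pages: log(100) = 2.
print(tf_idf(12, 1000, 10))    # 24.0
```

A zero weight here is arguably the formula working as intended: a word that appears on every page cannot help tell one page from another.
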

"I did read that document but its amazing how much more understanding I got with a second read"

Tell me about it. I think after about the 5th or 6th read I achieved a moment of clarity, and now I have it pretty much figured out.

I think in that case you wouldn't even include the 1, because 1 x log would = log. I think you would need to just multiply the term frequency with log. (I'll email a friend about this to make sure I'm right)

btw - isn't it log2 not log?

Edited by: seth_wilde

Seth,

It is possible that log * 1 = log. That would make sense. It's been a few years since I had any algebra. I just open up my Windows calculator and press "1" then "log". That's all I know.

"Tell me about it, I think after about the 5th or 6th read I achieved a moment of clarity and and now I have it pretty much figured out."

I guess I have a lot more reading to do. I keep my dictionary right next to my desk here (BTW, what...exactly...is a vector?).

"isn't it log2 not log?"

From what I can see it is log

"It's been a few years since I had any algebra"

Well congratulations then, you have now officially moved on to calculus.

"BTW, what...exactly...is a vector?"

vector = A one-dimensional array

""isn't it log2 not log?" From what I can see it is log"

Are you using the formula from the link that James gave in the other thread? If so, I found one derived from Salton (who they mention in the original article) and he uses log2 [instruct.uwo.ca...]

Edited by: seth_wilde
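On the log2 versus log question, a quick sketch suggests the choice of base may matter less than it seems: switching bases multiplies every IDF value by the same constant, so the order of the terms does not change. The document counts and the `rank_terms` helper below are invented for illustration only.

```python
import math

TOTAL_DOCS = 1000
# Hypothetical document frequencies: how many pages contain each term.
DOC_FREQ = {"widget": 10, "blue": 100, "the": 1000}

def idf(docs_with_term, base):
    """IDF with a selectable log base."""
    return math.log(TOTAL_DOCS / docs_with_term, base)

def rank_terms(base):
    """Terms ordered from highest to lowest IDF for the given base."""
    scores = {term: idf(df, base) for term, df in DOC_FREQ.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank_terms(2))   # ['widget', 'blue', 'the']
print(rank_terms(10))  # ['widget', 'blue', 'the']

# Every base-2 value is the base-10 value times log2(10), about 3.32.
print(idf(10, 2) / idf(10, 10))
```

So whichever base the original formula uses, the relative weighting of terms comes out the same.
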

That makes sense James. I was really having trouble with picturing it just by the definition, but now I recall a slide show I went through and saw lines in a graph representing terms. So... that is a term vector.

I will put it all together eventually. After I find time to read those documents a few more times each. There's just not enough hours in a day!

Here is the dictionary definition (Funk & Wagnalls '76) -

Vector: n. A physical quantity that has magnitude and direction in space, as velocity and acceleration.

I guess we are shooting for the most quantity and the greatest magnitude!
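The two definitions in this thread (a one-dimensional array, and a quantity with magnitude and direction) can be tied together with a small sketch. The three-word vocabulary and the helper names are assumptions for illustration; comparing directions with the cosine of the angle is the standard vector-space approach, not something the quoted material confirms AltaVista uses.

```python
import math

# Hypothetical vocabulary: each position in the array is one term.
VOCABULARY = ["search", "engine", "widget"]

def to_vector(term_counts):
    """Turn a {term: count} mapping into a one-dimensional array."""
    return [term_counts.get(term, 0) for term in VOCABULARY]

def magnitude(vec):
    """Length of the vector (the 'magnitude' in the dictionary sense)."""
    return math.sqrt(sum(x * x for x in vec))

def cosine(a, b):
    """Cosine of the angle between two vectors (compares 'direction')."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (magnitude(a) * magnitude(b))

page_a = to_vector({"search": 3, "engine": 4})
page_b = to_vector({"search": 6, "engine": 8})  # same mix of words, twice as long

print(page_a)                  # [3, 4, 0]
print(magnitude(page_a))       # 5.0
print(cosine(page_a, page_b))  # 1.0
```

The second page has twice the magnitude but the same direction, so under this measure a longer page with the same proportions of words is not treated as any more relevant.
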

On a similar thought, is anyone seeing great results from Raging? We have some great rankings under a couple of top-notch keywords and are seeing few (4-5) referrals a day. Alta under the same kw's is producing substantially more than that (add 3 zeros).

Is no one using Raging? It feels like Alta with its head chopped off to me...

One other note on logs - log(2) is not only not the same as log x 2 (actually meaningless, since "log" by itself means nothing), but also differs from log2(2) (that first 2 would be written as a subscript). Fortunately, most calculations use either log10, often simply written as log, or natural logarithms, written as ln or loge (the e would be a subscript). "e" is one of those really significant numbers like pi that mathematicians love, and has a value of 2.71... More than you wanted to know, I'm sure.
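For anyone who wants to see the three kinds of log side by side, here is a tiny Python sketch using the standard `math` module. The variable names are only for illustration.

```python
import math

log10_of_2 = math.log10(2)  # common log, often written simply as "log"
log2_of_2 = math.log2(2)    # base-2 log, the 2 in "log2" being the subscript
ln_of_2 = math.log(2)       # natural log, written ln or loge

print(log10_of_2)  # about 0.301
print(log2_of_2)   # 1.0
print(ln_of_2)     # about 0.693
print(math.e)      # about 2.718

# Any base can be converted to another by dividing two logs of the same base.
print(math.log10(2) / math.log10(math.e))  # about 0.693, same as ln(2)
```
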

