| 8:57 am on Jul 30, 2000 (gmt 0)|
You'll find information here [ragingsearch.altavista.com]
Raging was meant to provide a clean interface for webmasters searching av.
| 4:20 pm on Jul 30, 2000 (gmt 0)|
"Our index contains every word found on more than 350 million unique Web pages"
Hmmm... their index contains words, not pages. Maybe term vectors?
"Text relevance searches every Web page for exactly the words you enter. Many factors enter into text relevance, such as how important the words are on the page, how many times the words appear, where on the page they appear, and how many other pages contain those words."
I smell themes. It's surprising that they give away this much information!
This certainly is interesting. I guess it's a good thing that my pages are doing better there (if this is the way they are moving for the future).
Maybe they test out their advanced technology at Raging Search to see how it works, or to see how users like it based on feedback. Then, possibly they start implementing some of those things on regular AV. Maybe it's a beta version of the future Alta?
Any other speculations?
| 4:24 pm on Jul 30, 2000 (gmt 0)|
Metaman, there is different data in the raging index. It could be a test area where the "dirty washing" is not on show to the majority of surfers.
| 6:42 pm on Jul 31, 2000 (gmt 0)|
A term vector database is all about words, that's what they'll index and that's how they store your pages -
"Although we use the usual TF-IDF weighting to select terms for vectors, we do not store these weights in vectors. Instead, we store just the term frequency, that is, the number of times the term appears in the page"
"In addition to the term counts themselves, this raw data includes the lengths of pages, both in bytes and in terms."
| 7:00 pm on Jul 31, 2000 (gmt 0)|
Thanks for clearing that up, Seth. I did read that document, but it's amazing how much more understanding I got with a second read.
I came across another thing that's quite puzzling.
Look at the Inverse Document Frequency part of the TF*IDF equation.
log (Number of documents/Number of documents containing keyword)
Assume the keyword is on every page (you would think this was a good thing). When you do the division you get 1. Take the log of 1 and you get zero. Multiply any Term Frequency by 0 and you come up with zero. Maybe they have some kind of catch for when this happens?
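The arithmetic in this post can be checked with a minimal sketch. It assumes the plain textbook form TF * log(N / n) with a base-10 log; the function names and the document counts are made up for illustration, and nothing here is known to be what AltaVista actually runs. In the textbook scheme a weight of zero for a word found on every page is expected, since such a word does not help tell one page from another.

```python
import math

def idf(total_docs: int, docs_with_term: int) -> float:
    """Inverse document frequency: log10(N / n). Textbook form, an assumption here."""
    return math.log10(total_docs / docs_with_term)

def tf_idf(term_frequency: int, total_docs: int, docs_with_term: int) -> float:
    """Term frequency multiplied by inverse document frequency."""
    return term_frequency * idf(total_docs, docs_with_term)

# Keyword appears on every one of 1000 pages: N / n = 1, log10(1) = 0.
print(tf_idf(5, 1000, 1000))  # 0.0

# Keyword appears on only 10 of 1000 pages: N / n = 100, log10(100) = 2.
print(tf_idf(5, 1000, 10))    # 10.0
```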
| 7:53 pm on Jul 31, 2000 (gmt 0)|
"I did read that document, but it's amazing how much more understanding I got with a second read"
Tell me about it. I think after about the 5th or 6th read I achieved a moment of clarity, and now I have it pretty much figured out.
I think in that case you wouldn't even include the 1, because 1 x log would = log. I think you would need to just multiply the term frequency with log. (I'll email a friend about this to make sure I'm right)
btw - isn't it log2 not log?
Edited by: seth_wilde
| 11:01 pm on Jul 31, 2000 (gmt 0)|
It is possible that log * 1 = log. That would make sense. It's been a few years since I had any algebra. I just open up my Windows calculator and press "1" then "log". That's all I know.
"Tell me about it. I think after about the 5th or 6th read I achieved a moment of clarity, and now I have it pretty much figured out."
I guess I have a lot more reading to do. I keep my dictionary right next to my desk here (BTW, what...exactly...is a vector?).
"isn't it log2 not log?"
From what I can see it is log
| 11:21 pm on Jul 31, 2000 (gmt 0)|
A vector is a directed line segment. It has to do with graphing the entire WWW and making sense of all the data. They've got everything in a bunch of equations in order to compute relevancy.
| 11:27 pm on Jul 31, 2000 (gmt 0)|
"It's been a few years since I had any algebra"
Well, congratulations then, you have now officially moved on to calculus.
"BTW, what...exactly...is a vector?"
vector = A one-dimensional array
""isn't it log2 not log?" From what I can see it is log"
Are you using the formula from the link that James gave in the other thread? If so, I found one derived from Salton (who they mention in the original article) and he uses log2 [instruct.uwo.ca...]
Edited by: seth_wilde
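On the log versus log2 question, a small sketch may help. Changing the base of the logarithm only multiplies every IDF value by the same constant (log2(x) = log10(x) / log10(2)), so the ordering of terms by IDF comes out the same either way. The document counts below are invented for illustration, and the function name is hypothetical.

```python
import math

def idf(total_docs: int, docs_with_term: int, base: float) -> float:
    """IDF with a selectable log base: log_base(N / n)."""
    return math.log(total_docs / docs_with_term, base)

total_docs = 1000
# Made-up counts of how many documents contain each term.
docs_with_term = {"search": 500, "raging": 10, "vector": 2}

# Rank terms from rarest (highest IDF) to most common, once per base.
rank_base10 = sorted(docs_with_term,
                     key=lambda t: idf(total_docs, docs_with_term[t], 10),
                     reverse=True)
rank_base2 = sorted(docs_with_term,
                    key=lambda t: idf(total_docs, docs_with_term[t], 2),
                    reverse=True)

print(rank_base10)
print(rank_base2)
print(rank_base10 == rank_base2)  # True: the base does not change the ordering

# The two versions differ by a constant factor of log2(10), roughly 3.32.
print(idf(total_docs, 10, 2) / idf(total_docs, 10, 10))
```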
| 11:36 pm on Jul 31, 2000 (gmt 0)|
That makes sense James. I was really having trouble with picturing it just by the definition, but now I recall a slide show I went through and saw lines in a graph representing terms. So... that is a term vector.
I will put it all together eventually. After I find time to read those documents a few more times each. There's just not enough hours in a day!
Here is the dictionary definition (Funk & Wagnalls '76) -
Vector: n. A physical quantity that has magnitude and direction in space, as velocity and acceleration.
I guess we are shooting for the most quantity and the greatest magnitude!
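To tie the "one-dimensional array" definition back to the term vector description quoted earlier in the thread, here is a rough sketch. It stores raw term counts (not TF-IDF weights) in an array, along with the page length in bytes and in terms. The tokenizer, the tiny vocabulary, and the function name are all assumptions made for illustration; the quoted paper selects terms by TF-IDF, which this sketch does not attempt.

```python
from collections import Counter

def build_term_vector(page_text: str, vocabulary: list) -> dict:
    """Hypothetical helper: raw term counts as a 1-D array, plus page lengths."""
    terms = page_text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter(terms)
    return {
        # One slot per vocabulary term, holding how often it appears on the page.
        "vector": [counts[term] for term in vocabulary],
        "length_bytes": len(page_text.encode("utf-8")),
        "length_terms": len(terms),
    }

vocabulary = ["search", "clean", "vector"]  # made-up vocabulary
page = "raging search is a clean search interface"
term_vector = build_term_vector(page, vocabulary)

print(term_vector["vector"])        # [2, 1, 0]
print(term_vector["length_terms"])  # 7
print(term_vector["length_bytes"])  # 41
```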
| 11:38 pm on Jul 31, 2000 (gmt 0)|
Yes Seth, I was using the formula from the document James found.
It is in a cluttered image file, maybe that makes the 2 unreadable.
| 11:42 pm on Jul 31, 2000 (gmt 0)|
Dang it! There is no log 2 button on my windows scientific calculator. What am I to do?
| 3:02 pm on Aug 1, 2000 (gmt 0)|
I believe you push log and then x^2
| 1:34 pm on Sep 14, 2000 (gmt 0)|
On a similar thought, is anyone seeing great results from Raging? We have some great rankings under a couple of top notch keywords and are seeing only a few (4-5) referrals a day. Alta under the same keywords is producing substantially more than that (add 3 zeros).
Is no one using Raging? It feels like Alta with its head chopped off to me...
| 12:30 pm on Sep 14, 2000 (gmt 0)|
I've found raging to be almost worthless for referrals. It doesn't seem to be worth optimizing for.
| 8:12 am on Sep 15, 2000 (gmt 0)|
On the windows calc, the button for log 2 is ln ie. the natural logarithm.
Surely log(1) is not the same as log * 1?
...and no I'm not calling you Shirley ;)
| 5:54 pm on Sep 16, 2000 (gmt 0)|
"Surely log(1) is not the same as log * 1?"
So if log multiplied by 1 doesn't = log, then what does it equal?
| 7:59 am on Sep 18, 2000 (gmt 0)|
Log(1) is not the same mathematically as log * 1. Log is a function applied to the number 1, not a fixed value that can be multiplied by it.
Just the same as sin(x) is not sin * x.
(looks like my 15 or so years of UK funded maths education haven't completely gone to waste :))
| 1:41 pm on Sep 18, 2000 (gmt 0)|
One other note on logs - log(2) is not only not the same as log x 2 (actually meaningless, since "log" by itself means nothing), but also differs from log2(2) (that first 2 would be written as a subscript). Fortunately, most calculations either use log10, often simply written as log, or natural logarithms, written as ln or loge (the e would be a subscript). "e" is one of those really significant numbers like pi that mathematicians love, and has a value of 2.71... More than you wanted to know, I'm sure.
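For anyone who wants to see the different bases side by side, here is a short sketch using Python's standard math module. It also shows one way to get a base-2 log from a calculator that only offers log (base 10) and ln (base e): divide by the log of 2 in the same base. The helper name is made up for illustration.

```python
import math

# The same argument, three different bases.
print(math.log10(2))  # base 10, roughly 0.301
print(math.log(2))    # natural log (base e), roughly 0.693
print(math.log2(2))   # base 2, exactly 1.0

def log2_from_log10(x: float) -> float:
    """Change of base: log2(x) = log10(x) / log10(2). Hypothetical helper name."""
    return math.log10(x) / math.log10(2)

def log2_from_ln(x: float) -> float:
    """The same trick using the natural log: log2(x) = ln(x) / ln(2)."""
    return math.log(x) / math.log(2)

print(log2_from_log10(8))  # close to 3, since 2 * 2 * 2 = 8
print(log2_from_ln(8))     # close to 3 as well

# log is a function applied to its argument; in every base, log(1) is 0.
print(math.log10(1), math.log(1), math.log2(1))
```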
| 2:04 pm on Sep 18, 2000 (gmt 0)|
Logarithmic distances are the way that slide rules work. Does anybody else remember slide rules?
| 11:31 pm on Sep 18, 2000 (gmt 0)|
Tedster what are slide rules??? Ah yes I vaguely remember my history teacher telling me about them...just joking. Well you did ask :o)