You'll find information here [ragingsearch.altavista.com]

Raging was meant to provide a clean interface for webmasters searching AV.

Thanx engine...

"Our index contains every word found on more than 350 million unique Web pages"

Hmmm... their index contains words, not pages. Maybe term vectors?

"Text relevance searches every Web page for exactly the words you enter. Many factors enter into text relevance, such as how important the words are on the page, how many times the words appear, where on the page they appear, and how many other pages contain those words."

I smell themes. It's surprising that they give away this much information!

This certainly is interesting. I guess it's a good thing that my pages are doing better there (if this is the way they are moving for the future).

Maybe they test out their advanced technology at Raging Search to see how it works, or to see how users like it based on feedback. Then, possibly they start implementing some of those things on regular AV. Maybe it's a beta version of the future Alta?

Any other speculations?

Metaman

A term vector database is all about words, that's what they'll index and that's how they store your pages -

"Although we use the usual TF-IDF weighting to select terms for vectors, we do not store these weights in vectors. Instead, we store just the term frequency, that is, the number of times the term appears in the page"

"In addition to the term counts themselves, this raw data includes the lengths of pages, both in bytes and in terms."
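To picture what that storage might look like, here is a minimal Python sketch of a term vector that keeps raw term counts plus the page length in bytes and in terms, as the two quotes describe. The function name `build_term_vector`, the dictionary keys, and the whitespace tokenizer are all assumptions for illustration, not anything AltaVista has published.

```python
from collections import Counter


def build_term_vector(page_text):
    """Hypothetical helper: store raw term frequencies, not TF-IDF weights."""
    # Assumption: a naive lowercase/whitespace tokenizer stands in for
    # whatever term selection the real engine performs.
    terms = page_text.lower().split()
    return {
        "term_counts": Counter(terms),                   # times each term appears
        "length_bytes": len(page_text.encode("utf-8")),  # page length in bytes
        "length_terms": len(terms),                      # page length in terms
    }


vec = build_term_vector("cheap widgets and more cheap widgets")
print(vec["term_counts"]["cheap"])  # 2
print(vec["length_terms"])          # 6
```

Keeping the raw counts and lengths would let any weighting be recomputed later, which may be why the weights themselves are not stored.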

Thanks for clearing that up, Seth. I did read that document, but it's amazing how much more understanding I got with a second read.

I came across another thing that's quite puzzling.

Look at the Inverse Document Frequency part of the TF*IDF equation.

log (Number of documents/Number of documents containing keyword)

Assume the keyword is on every page (you would think this was a good thing). When you do the division you get 1. Take the log of 1 and you get zero. Multiply any Term Frequency by 0 and you come up with zero. Maybe they have some kind of catch for when this happens?
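The arithmetic above can be checked with a short Python sketch. The function names `idf` and `tf_idf` and the sample numbers are made up for illustration, and base 10 is an assumption. The zero result is the same in any base, since the log of 1 is always 0.

```python
import math


def idf(total_docs, docs_with_term):
    # Inverse document frequency as written in the post:
    # log(number of documents / number of documents containing keyword).
    # Assumption: base-10 log.
    return math.log10(total_docs / docs_with_term)


def tf_idf(term_frequency, total_docs, docs_with_term):
    # Term frequency multiplied by inverse document frequency.
    return term_frequency * idf(total_docs, docs_with_term)


everywhere = tf_idf(5, 1000, 1000)  # keyword appears on every page
rare = tf_idf(5, 1000, 10)          # keyword appears on 1% of pages

print(everywhere)  # 0.0
print(rare)
```

A term found on every page scores zero no matter how often it appears, while the rarer term scores well above zero. That fits the idea that a word appearing everywhere does not help tell pages apart.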

"I did read that document but its amazing how much more understanding I got with a second read"

Tell me about it. I think after about the 5th or 6th read I achieved a moment of clarity, and now I have it pretty much figured out.

I think in that case you wouldn't even include the 1, because 1 x log would = log. I think you would need to just multiply the term frequency with log. (I'll email a friend about this to make sure I'm right)

btw - isn't it log2 not log?

Edited by: seth_wilde

Seth,

It is possible that log * 1 = log. That would make sense. It's been a few years since I had any algebra. I just open up my Windows calculator and press "1" then "log". That's all I know.

"Tell me about it, I think after about the 5th or 6th read I achieved a moment of clarity and and now I have it pretty much figured out."

I guess I have a lot more reading to do. I keep my dictionary right next to my desk here (BTW, what...exactly...is a vector?).

"isn't it log2 not log?"

From what I can see it is log

"It's been a few years since I had any algebra"

Well congratulations then, you have now officially moved on to calculus

"BTW, what...exactly...is a vector?"

vector = A one-dimensional array

""isn't it log2 not log?" From what I can see it is log"

Are you using the formula from the link that James gave in the other thread? If so, I found one derived from Salton (who they mention in the original article) and he uses log2 [instruct.uwo.ca...]

Edited by: seth_wilde

That makes sense James. I was really having trouble with picturing it just by the definition, but now I recall a slide show I went through and saw lines in a graph representing terms. So... that is a term vector.

I will put it all together eventually. After I find time to read those documents a few more times each. There's just not enough hours in a day!

Here is the dictionary definition (Funk & Wagnalls '76) -

Vector: n. A physical quantity that has magnitude and direction in space, as velocity and acceleration.

I guess we are shooting for the most quantity and the greatest magnitude!

On a similar thought, is anyone seeing great results from Raging? We have some great rankings under a couple of top-notch keywords and are seeing only a few (4-5) referrals a day. Alta under the same kw's is producing substantially more than that (add 3 zeros).

Is no one using Raging? It feels like Alta with its head chopped off to me...

Log(1) is not the same mathematically as log * 1. Log is a function applied to the number 1, not a fixed value that can be multiplied by it.

Just the same as sin(x) is not sin * x.

(looks like my 15 or so years of UK funded maths education haven't completely gone to waste :))

One other note on logs - log(2) is not only not the same as log x 2 (actually meaningless, since "log" by itself means nothing), but also differs from log2(2) (that first 2 would be written as a subscript). Fortunately, most calculations either use log10, often simply written as log, or natural logarithms, written as ln or loge (the e would be a subscript). "e" is one of those really significant numbers like pi that mathematicians love, and has a value of 2.71... More than you wanted to know, I'm sure.
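For anyone who wants to see the three bases side by side, here is a small Python sketch using the standard `math` module. The variable names are just for illustration. The last two lines use the change-of-base rule, which suggests the log vs. log2 question only scales every IDF value by the same constant.

```python
import math

x = 2

base10 = math.log10(x)   # "log" on most calculators
base2 = math.log2(x)     # log with the subscript 2
natural = math.log(x)    # "ln", i.e. log base e

print(base10)
print(base2)   # 1.0
print(natural)
print(math.e)  # the constant e, 2.71...

# Change of base: log2(n) equals log10(n) / log10(2) for any positive n.
n = 50
converted = math.log10(n) / math.log10(2)
print(abs(math.log2(n) - converted) < 1e-12)  # True
```

Since every IDF value would be scaled by the same factor, the choice of base should not change which page comes out ahead for a given term.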

Tedster, what are slide rules??? Ah yes, I vaguely remember my history teacher telling me about them... just joking. Well, you did ask :o)
