Term Vector?

Can yall 'splain

         

BoneHeadicus

6:50 pm on Jan 5, 2001 (gmt 0)

10+ Year Member



Does anybody really understand what term vector means to us in seo? How could one determine the coefficient of a particular page and how that might fit into the se algo?

littleman

7:47 pm on Jan 5, 2001 (gmt 0)



Isn't the particle essence of 'term vector analysis' for us SEO types boiled down to themes [searchengineworld.com]?

BoneHeadicus

8:25 pm on Jan 5, 2001 (gmt 0)

10+ Year Member



Hey littleman...howz the littlewoman? (couldn't resist)

How much of a factor then is hub and authority analysis playing as far as weighing out the theme keywords...I been readin' white papers, can you tell?

littleman

8:36 pm on Jan 5, 2001 (gmt 0)



>howz the littlewoman?
Just fine. We are going to a party this weekend. Every year the descendants of the actors and actresses who played the dwarfs in the "Wizard of Oz" get together and have a bash. Last year I got drunk and made a fool of myself when I slipped out of my booster seat.

>can you tell?

Truthfully, I do not feel, err ...big enough to answer your questions with confidence. Maybe someone else will step up to the plate? Brett? Seth? Anyone else?

BoneHeadicus

9:14 pm on Jan 5, 2001 (gmt 0)

10+ Year Member



The reason I ask is because I'm in the phase of design where I could still incorporate some new "techniques" in this site, if'n I knowed how much it might help to have a hierarchical keyword-hyperlink to <h1>keyword</h1> structure, with intrasite hubs (sub-directories) and an authority (index.html).

tedster

9:32 pm on Jan 5, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's my current understanding.

A vector is a mathematical quantity with both a numerical value and a direction. In the search engine world, as I understand it (which is roughly, for sure) each page in the db gets rated according to its theme strength and its "direction", i.e. what pages it points to and what their themes are.

The big deal for any page is how many other pages point to it, and with what strength on a given theme. You end up with a multi dimensional space where the "points" are web pages, clustered in various ways according to the vectors associated with each page on any given theme.

How much this term vector analysis affects the ranking algorithm, and how theme strength is determined for any given page are subject to lots and lots of tweaking by the various search engines.

Trying to reverse engineer the specific formulas they apply would be a losing battle -- probably as intense as trying to crack the human genome. And even if you succeeded, the SE might shift the weighting values or change the algorithm next week.

So the best we can do is make observations (in quantity helps a lot) and then take our best guess. Going back to the human genome, even before it was mapped people still had a very good understanding of heredity just by making anecdotal observations and looking at percentages.

BoneHeadicus

9:53 pm on Jan 5, 2001 (gmt 0)

10+ Year Member



Hey Tedster...you got me started into this from your post on WebWord....great reading from that site! I started reading A LOT of white papers on web structure and hyperlinking and all.
The scary thing is I kinda understand it.

It's almost as if they are trying to build a CREDIBLE database of relevant, worthy content by routinely screening out spam techniques in the algo itself (Term Vector), thus making it difficult to rank for irrelevant terms. I think that's a good thing...

My goal here, being the new guy, is to really come to terms (no pun intended) with this method of analysis (Term Vector) so that by utilizing the proper tekneex, I can build webs to go the distance and outshine the .com-petition who may not be aware of the shifting sands.

I believe this is relevant to AV in that they have stated that term vector databases are what they are using. Am I correct in this?

rcjordan

10:19 pm on Jan 5, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>this is relevant to AV

I think it was relevant. In a drive to shake some money out of their traffic, I believe they have abandoned much of their algo and now simply favor Looksmart listings. Whether they can find some sort of "blend" seems to be the question. I wouldn't spend too much time cracking term vectors just for AV.

Brett_Tabke

10:53 pm on Jan 5, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month




keyword set a: cars trucks automobiles tires engine wheels drive shaft oil spark plug

keyword set b: road parkway gravel asphalt ditch footpath sidewalk motorway street interstate freeway potholes concrete bridge

The keyword "airbag" would have a higher "term vector" for set A than for set B. Thus, any page that has the word airbag on it should have a higher search engine ranking under searches on keywords in set A than in set B. I think of a term vector generated db as a synonym base where words are "scored" against one another.
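To make the "scored against one another" idea concrete, here is a rough sketch of one way such a score could be computed. It is only an illustration: the toy documents, the function names, and the choice of cosine similarity over per-document counts are all assumptions, not anything a search engine has confirmed. Each term gets a vector of its counts across the documents, each keyword set gets a vector of how many of its words appear in each document, and the two are compared.

```python
import math

# Toy corpus invented for illustration; a real engine would use its whole index.
DOCS = [
    "cars trucks engine airbag tires wheels",
    "engine oil spark plug airbag cars",
    "road gravel asphalt potholes bridge",
    "freeway interstate street sidewalk road",
    "trucks tires wheels road",
]

# Keyword sets A and B from the post above.
SET_A = {"cars", "trucks", "automobiles", "tires", "engine", "wheels",
         "drive", "shaft", "oil", "spark", "plug"}
SET_B = {"road", "parkway", "gravel", "asphalt", "ditch", "footpath",
         "sidewalk", "motorway", "street", "interstate", "freeway",
         "potholes", "concrete", "bridge"}


def term_vector(term, docs):
    """One entry per document: how often `term` occurs in it."""
    return [doc.split().count(term) for doc in docs]


def set_vector(keywords, docs):
    """One entry per document: how many of its words belong to `keywords`."""
    return [sum(1 for word in doc.split() if word in keywords) for doc in docs]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity(term, keywords, docs):
    """Hypothetical score of how strongly `term` keeps company with a keyword set."""
    return cosine(term_vector(term, docs), set_vector(keywords, docs))


print("airbag vs set A:", round(affinity("airbag", SET_A, DOCS), 3))
print("airbag vs set B:", round(affinity("airbag", SET_B, DOCS), 3))
```

In this toy corpus "airbag" only turns up in documents full of set A words, so it scores higher against set A than against set B, which is the behavior described above.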

Working in themes
How to manipulate a vector based engine is the real question. Term vector is a nice sounding scientific word, but how they actually calculate and implement it in an algo is entirely unknown. However, we know enough about it now from trial and error that there are solid methods that will generate referrals, keep you in good standing with the se's, and survive the long term.

Like Tedster mentioned, we've been putting forth the "theme" idea where you look at all your keywords and "search engine entities" as an entire group. Everything from keywords, to page design, to links as a complete "seo worthy" package.

Ok, but which Keywords?
Deciding on what keywords to put in places of prominence on your site is more critical than ever. Just a little over a year ago, you could pick your industry or business primary keywords, put them on a page, and they would work well. If you were in the car business, you'd go for words similar to set A above.

That isn't the case anymore. With so much competition out there now, selecting keywords is critical to site success. I've been focusing on secondary and third level phrases for about a year now. Those search words will often get overlooked on a lot of engines because people don't think they pull. I'd rather have 100 pages doing 10 obscure searches each per day than 10 pages pulling 100 referrals a day. The broader based approach with more content can survive the whims of the algos much longer and provide your users with more targeted information.

Thinking in Themes
I think in terms of about 20 keywords per site. Most of those are phrases. After that, keep branching out into related phrases. Goto.com's keyword suggestion tool is excellent for this type of research.

It all relates to themes, since each page can be scored for its relation (vector) to the core of a keyword set (like set A or B above). Once your site uses all those related words, it will start to filter to the top in bigger "prime time" keywords. This is most easily experienced on Google and Altavista (currently the only 2 engines known to be using "theme" based algos).

Theme Page Design Considerations
Getting your keywords noticed above the rest of the noise on the page is critical. I assume you understand how a spider/indexer system will look at the text on a page. It is going to look for words in places of prominence to help decide what the page is about. Those places usually break down, in varying degrees of importance, by the page elements one can analyze:

Title
Headings (h1...h6)
bolded words
italicized words
table captions and table <th> headings
font size
links
first sentence of paragraph
url and filename of the page
cross bound inner site link text
text outbound links
filenames/urls of links
any form elements (drop down text)
alt tags
image filenames
font color in relation to background
location of text on page
word/phrase density
meta tags
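For anyone who wants to see what their own page exposes in a few of those places, here is a minimal sketch using Python's standard `html.parser` module. It only covers a handful of the elements listed above (title, headings, bold/italic, table headings and captions, link text, alt text), and the class name, the tag list, and the helper function are made up for illustration. It says nothing about how any engine actually weights these spots.

```python
from html.parser import HTMLParser

# Subset of the "places of prominence" listed above; an assumption, not an engine's list.
PROMINENT_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6",
                  "b", "strong", "i", "em", "th", "caption", "a"}


class ProminentText(HTMLParser):
    """Collects the text found inside prominent tags, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # prominent tags we are currently inside
        self.found = {}       # tag name -> list of text snippets

    def handle_starttag(self, tag, attrs):
        if tag in PROMINENT_TAGS:
            self.open_tags.append(tag)
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.found.setdefault("alt", []).append(alt)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            for tag in self.open_tags:
                self.found.setdefault(tag, []).append(text)


def prominent_text(html):
    """Return a dict of tag name -> text snippets for one HTML page."""
    parser = ProminentText()
    parser.feed(html)
    parser.close()
    return parser.found


page = ('<html><head><title>Tires</title></head><body>'
        '<h1>Cheap <b>Tires</b></h1><p>Plain body text.</p>'
        '<img src="t.gif" alt="tire photo"></body></html>')
for place, snippets in prominent_text(page).items():
    print(place, snippets)
```

Text nested inside two prominent tags, such as a bold word inside a heading, is recorded under both.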

It is a delicate balance of all elements on the page to get your keywords noticed. Trial and error has serious limitations when trying to deduce what an engine is about anymore. If you approach it with caution, by placing your keywords for a page in several of the above element locations, you can pretty much shut your eyes and move on to the next page.

I try to micro focus a page by dealing with no more than one keyword (phrase) per page. I work keywords into title, url, a bold word, a link text, and the content of the page just a few times. It is just as easy to overdo it as underdo it. If your keyword density rises above 10% in relation to the rest of the page, that is the time to think about lowering it a bit.
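A quick way to check a page against that 10% rule of thumb is sketched below. The post does not spell out how density is measured, so this assumes one common definition: the words taken up by the phrase divided by all words on the page. The function name and the tokenizing regex are illustrative choices.

```python
import re


def keyword_density(text, phrase):
    """Fraction (0.0 to 1.0) of the page's words taken up by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase appears as a run of whole words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


page_text = "Spark plug guide. Pick the right spark plug for your engine and gap it before you fit it."
density = keyword_density(page_text, "spark plug")
print(f"{density:.1%}")
if density > 0.10:
    print("Above the 10% rule of thumb; consider lowering it a bit.")
```

Feed it the visible text of the page, not the raw HTML, or the markup will dilute the number.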

The new wildcard : Links
Links consist of the actual url and its filename, the font it is in, and the text that composes the link. The actual url is important because a spider/indexer can deduce whether the link is to another page on your site or to some other site.

When a se indexes an entire site or a large batch of pages on your site, the number of links pointing at some page within your own site is very important. Where would all the "inner site" links point to? For the most part, they point at the important pages. Just about everyone has a menu on a page anymore, and the se's use that fact to deduce which pages are important. It can also help them determine which pages are "orphans". Se's don't want orphan pages since those are most often of low to no value.
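You can run the same kind of count on your own site before a spider does. The sketch below assumes you already have a map of each page to the same-site pages it links to (how you collect that is up to you). The function names and the choice to treat `index.html` as the entry page are illustrative assumptions.

```python
from collections import Counter


def inbound_counts(site_links):
    """Count, for each page, how many other pages on the site link to it.

    `site_links` maps each page to a list of same-site pages it links to.
    """
    counts = Counter({page: 0 for page in site_links})
    for source, targets in site_links.items():
        for target in set(targets):      # count each linking page once
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts


def orphans(site_links, home="index.html"):
    """Pages nothing else on the site links to (the entry page is exempt)."""
    counts = inbound_counts(site_links)
    return sorted(page for page, n in counts.items() if n == 0 and page != home)


site = {
    "index.html": ["tires.html", "plugs.html"],
    "tires.html": ["index.html", "plugs.html"],
    "plugs.html": ["index.html", "tires.html"],
    "old-promo.html": ["index.html"],
}
print(inbound_counts(site).most_common())
print("orphans:", orphans(site))
```

Pages at the top of the count are the ones your own menu structure is telling the engines are important; anything in the orphan list has no internal link pointing at it.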

Outbound links are just as important, because se's can deduce the types of sites you are linking to. If you are linking to Ford, General Motors, and the Acme Car Company, it is a good bet that your site is about cars. Outbound links are important in controlling your "theme" as viewed by the search engines.

The text of inbound, outbound, and crossbound links is also critical. If there is a link to "new cars for 2001", the se's will use that text in helping to determine what the page that the link points to is about. Thus, inbound link text is also critical. You should try to reinforce any keyword pages in your site with crosslinks that include the destination page's keywords. If page B is about tires, then a link on the "spark plug" page pointing at page B should have the word "tires" in it.

How many inbound links? Currently, getting into the top ten on Google under any quality single keyword (top 1000) takes around 2000 inbound links. As you work down the quality of keywords, that number slowly begins to drop. Many two word phrases on Google are in the tens to single digits. Most third and fourth level three word phrases are in the single digits, with many pages having zero inbound links. This is why it is critical to choose keywords that pull, yet are off the beaten path.

How many pages?
If you are building a theme based site, I'd shoot for around 75-100. That means you probably have to break your content down a bit farther than you have been. Once you get that many optimized pages that are "on theme", magic happens. Pages just mysteriously start to pull a few referrals each per day, then tens, then dozens, and finally you have enough traffic that you need concern yourself more with servicing the users you already have than focusing on acquiring more.

BoneHeadicus

12:06 am on Jan 6, 2001 (gmt 0)

10+ Year Member



Wow, Dude...that's a killer response and much appreciated.

I feel good about what I'm doing having heard from some of the best in the field. When the results do start to show up, which they will, I will be sure to share what I find as well. I am in line with everything that you laid out there.
Pr. 11:14...but many advisors make victory sure. :) I believe it!
I spent a solid 2 weeks in Wordtracker learning the words related to my sites...I was amazed at how different they were from what I thought. With that knowledge in hand it makes the rest of it fall together much more naturally because the words are already there...you just gotta sew 'em together.

>I think it was relevant. In a drive to shake some money out of their traffic, I believe they have abandoned much of their algo and now simply favor Looksmart listings. Whether they can find some sort of "blend" seems to be the question. I wouldn't spend too much time cracking term vectors just for AV

Unfortunately money makes the rules. We all gotta eat. Somebody will always NEED to offer some type of search engine service. How to make a living doing it seems to be elusive. Maybe the govt will step in as librarian?

Thanks all.

BoneHeadicus

3:13 am on Jan 6, 2001 (gmt 0)

10+ Year Member



Whoa! I was just reading a paper and it was talking about nearness of documents to each other in terms of SUB-DIRECTORIES...and how "near" was defined as 2 deep! Anybody know if this is in fact a factor?

Brett_Tabke

6:16 am on Jan 17, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It is wise to keep your directories 3 deep or less from root. Very wise. When a url starts stretching into that 50 character range, it is time to think of shorter names.
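Here is a small sketch for checking urls against those two rules of thumb (3 directories deep, about 50 characters). The function name and the returned fields are made up for illustration. It also assumes that a last path segment without a trailing slash is a filename rather than a directory, which will not hold for every site.

```python
from urllib.parse import urlparse


def url_report(url, max_depth=3, max_length=50):
    """Report directory depth from root and total length for one url."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # Assumption: without a trailing slash, the last segment is a file, not a directory.
    if segments and not path.endswith("/"):
        directories = segments[:-1]
    else:
        directories = segments
    return {
        "depth": len(directories),
        "length": len(url),
        "too_deep": len(directories) > max_depth,
        "too_long": len(url) > max_length,
    }


for url in ("http://www.example.com/cars/tires/index.html",
            "http://www.example.com/cars/parts/tires/winter/studded/page.html"):
    print(url, url_report(url))
```

The thresholds are parameters, so they can be loosened or tightened as your own observations suggest.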

BoneHeadicus

5:17 pm on Feb 26, 2001 (gmt 0)

10+ Year Member



Just an update to this thread.

I built my site around this type of structure here. [webmasterworld.com]

After a month of sittin on the porch for bad behavior ( I had a meta refresh in my 404...dumb I know ...now...learn the hard way) I am ranking VERY highly in a broad set of search terms. Now that I'm placing I can start tweaking.

Most all of the terms are on serps 1,2 or 3 on all engines (except AV) and on Fast it's a beautiful sight!

Google rankings are doing very well but need a little tweaking on the main term; subsets are all doing well.

Overall I am pleased with all the work that I did. Being new to this game I started with nothing and learned almost everything I know from WPG to start and then from here directly or indirectly.

Thanks to Brett and little and tedster and rc and NFFC and all the rest and most of all to Air for sending me here in the first place.

This whole seo deal is quite confusing and, to say the least, intimidating if not downright overwhelming to newbies. By persevering and distilling from these forums the right information, one can start from scratch and build a site that will go the distance in search engine placement.

tedster

11:52 am on Mar 26, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a link for anyone struggling to visualize what term vector means. It's a slide show from a college course at UC Berkeley back in 1998. Instead of needing to wade through a pile of words, you get some clearly labelled images, much easier to assimilate. I think it quickly communicates a great sense of what this "term vector" stuff is all about.

Term Weighting and Ranking Algorithms [sims.berkeley.edu]

NFFC

12:12 pm on Mar 26, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is the best textual explanation I have found:

Introduction to Vector-Space Models [cs.utk.edu]

Todd A. Letsche and Michael W. Berry