Forum Moderators: open
I'm new to SEO things. I'm waiting for my first Google update, which should happen in the next few days according to Brett's update table. I just want to say that everyone in this forum is kind and very helpful!
Over the last few weeks I learned a lot in this forum. Here are some notes I made to get better SERPs. Did I understand things right?
PageRank
PageRank is an indication Google uses to measure the importance of a page. Each page that is indexed by Google gets its own PR. A page collects PR through incoming links: each linking page that has PR of its own passes a calculated fraction of it on to the target page. The basic formula was published by Brin and Page, but the exact version Google uses today is known only to Google.
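The simplified formula from the original paper can be sketched in a few lines. The tiny link graph and the damping factor below are purely illustrative; they are not real data.

```python
# Iterative PageRank using the simplified published formula:
#   PR(A) = (1 - d) + d * sum(PR(T) / C(T))
# summed over pages T that link to A, where C(T) is T's outlink count.

def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # initial guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Fraction of PR flowing in from every page that links here.
            incoming = sum(pr[q] / len(links[q])
                           for q in pages if page in links[q])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this toy graph, C ends up with the highest PR because it receives links from both A and B.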
Sergey Brin and Lawrence Page, the inventors of Google, described their ranking system:
"Google maintains much more information about web documents than typical search engines. Every hitlist includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence. First, consider the simplest case -- a single word query. In order to rank a document with a single word query, Google looks at that document’s hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight. The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is
converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. We take the dot product of the vector of count-weights
with the vector of type-weights to compute an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.
For a multi-word search, the situation is more complicated. Now multiple hit lists must be scanned through at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. For every matched set of hits, a proximity is computed. The proximity is based on how far apart the hits are in the document (or anchor) but is classified into 10 different value "bins" ranging from a phrase match to "not even close". Counts are computed not only for every type of hit but for every type and proximity. Every type and proximity pair has a type-prox-weight. The counts are converted into count-weights and we take the dot product of the count-weights and the type-prox-weights to compute an IR score. All of these numbers and matrices can all be displayed with the search results using a special debug mode. These displays have been very helpful in developing the ranking system."
robots.txt
For crawler bots it's good to have a robots.txt, even if it's only an empty text file, especially if you have a custom 404 page set. Otherwise, if your webserver is misconfigured and sends a 301 or 302 instead of a 404 for the missing file, the crawler could end up parsing your 404 page as if it were your robots.txt.
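An empty file already does the job, but a minimal robots.txt that explicitly allows all crawlers looks like this:

```
User-agent: *
Disallow:
```

An empty `Disallow:` line means nothing is blocked; every bot may crawl everything.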
Don't resubmit to Google
Google will find your page if you have incoming links. There's no need to resubmit to Google. If your page is not finished then finish it. Try getting incoming links from other websites and get listed in DMOZ and Yahoo.
Keep your page fresh.
Google loves new content. Fresh pages show a date in the search results, and fresh pages are positioned better than stale ones. If your page has updated content, there is a chance that Googlebot will visit you daily.
cu,
webmaster
The quote is from the paper by the Google inventors Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine".
I have downloaded the PDF file for offline reading. I don't have the link; it's somewhere here in this forum.
There should probably be a very easy to find link to that paper somewhere, maybe in the Library?
As Brett has asked, to avoid confusion I'll sign up for a new username,
like administrator, moderator, supervisor or just user ;-)
This is my last post as webmaster.
If you don't have a robots.txt, your server will end up sending an error page.
That page could be misinterpreted by the robot software.
cu,
ex webmaster