This 39 message thread spans 2 pages.
|Technical Words and Clear Thinking about Google|
|A man may take to drink because he feels himself to be a failure, and then fail |
all the more completely because he drinks. It is rather the same thing that is
happening to the English language. It becomes ugly and inaccurate because our
thoughts are foolish, but the slovenliness of our language makes it easier for us to
have foolish thoughts. The point is that the process is reversible. Modern English,
especially written English, is full of bad habits which spread by imitation and which
can be avoided if one is willing to take the necessary trouble. If one gets rid of these
habits one can think more clearly.
- George Orwell, "Politics and the English Language"
This poignant observation is even more true for technical topics. For clear analysis of search results it is essential, first, to know something precise about technical vocabulary, and second, to be absolutely rigorous when using these words in our thinking and communicating.
With this in mind I thought a thread about commonly misunderstood words and fuzzily understood technical concepts could be helpful.
1. PageRank is not Ranking
Don't know why we can't put this craziness to bed, but it's still around. If anyone is not clear about this, read Google PR - PageRank FAQs [webmasterworld.com]
2. Site has no technical definition
Trust me on this one. There is a definition for "domain" but "site" is a casual word with no technical reality.
3. Page has no technical definition
Google indexes a url, not a page. For example, if the viewport of your computer displays an html document that contains an iframe, then there is content from two different urls being displayed.
4. alt is an attribute, and not a tag
You can look this one up. There is no such thing as an "alt tag".
5. title is either an attribute or an element
The attribute type of title does nothing to speak of for your rankings, right now at least -- although it can help your site's usability quite a bit. But the title element is probably the most important on-page factor there is for well-targeted ranking.
6. spidering and indexing are two different processes
Just because googlebot asks your server for a url does not mean that url is indexed. While we're at it, let's mention "caching" -- it's really a third process.
7. linked to and linked from are very different things
This seems obvious, and yet in technical discussions the fog of chaos often starts to build around which page is doing the linking and which page is receiving the link.
8. rel="nofollow" is an attribute, quite different in effect from a robots meta tag nofollow
This one gets mangled a lot lately. rel="nofollow" just means "I don't vouch for this link - don't send PR, and please don't nail my domain if this happens to point to a bad neighborhood."
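To make items 3, 4, 5 and 8 concrete, here is a minimal sketch using Python's standard html.parser module. The class name VocabularyParser, the SAMPLE markup and the example.com / example.org urls are all made up for illustration. It only shows how the markup distinguishes these things; it says nothing about how Google itself processes them.

```python
from html.parser import HTMLParser


class VocabularyParser(HTMLParser):
    """Hypothetical helper: sorts markup into the categories from the list above."""

    def __init__(self):
        super().__init__()
        self.title_element = ""     # text inside <title>...</title> (an element)
        self.title_attributes = []  # values of title="..." (an attribute)
        self.alt_attributes = []    # values of alt="..." (an attribute, not a tag)
        self.embedded_urls = []     # iframe src: a second url shown in the same viewport
        self.nofollow_links = []    # href of links carrying rel="nofollow"
        self.robots_meta = None     # content of <meta name="robots">, a separate mechanism
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        if "title" in attrs:
            self.title_attributes.append(attrs["title"])
        if "alt" in attrs:
            self.alt_attributes.append(attrs["alt"])
        if tag == "iframe" and "src" in attrs:
            self.embedded_urls.append(attrs["src"])
        # rel can hold several space-separated values, so split before testing
        if tag == "a" and "nofollow" in (attrs.get("rel") or "").lower().split():
            self.nofollow_links.append(attrs.get("href"))
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_element += data


# Made-up document, for illustration only
SAMPLE = """<html><head>
<title>Blue Widgets</title>
<meta name="robots" content="noindex,nofollow">
</head><body>
<img src="/widget.png" alt="A blue widget" title="Hover text">
<a href="http://example.com/" rel="nofollow">unvouched link</a>
<iframe src="http://example.org/frame.html"></iframe>
</body></html>"""

parser = VocabularyParser()
parser.feed(SAMPLE)
parser.close()

print("title element:   ", parser.title_element)
print("title attributes:", parser.title_attributes)
print("alt attributes:  ", parser.alt_attributes)
print("embedded urls:   ", parser.embedded_urls)
print("nofollow links:  ", parser.nofollow_links)
print("robots meta:     ", parser.robots_meta)
```

Note that the rel="nofollow" value is attached to one link, while the robots meta value is attached to the whole document, which is why the sketch keeps them in separate fields.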
|There are in fact four very significant stumbling blocks in the way of grasping |
the truth, which hinder every man however learned, and scarcely allow anyone
to win a clear title to wisdom, namely,
(1) the example of weak and unworthy authority
(2) longstanding custom,
(3) the feeling of the ignorant crowd, and
(4) the hiding of our own ignorance while making a display of our apparent knowledge.
- Roger Bacon
Does anyone have another example of misunderstood technical language that confuses people in their understanding of SEO?
[edited by: tedster at 2:28 am (utc) on Jan. 5, 2007]
Ted, good post and good idea.
If this is going to be a stickied list, I would like to see the datacenter nomenclature added as well, such as algo change, data push, data refresh and data update.
Maybe we can get g1smd to plop in distinctions on supplemental results and how they vary.
|Does anyone have another example of misunderstood technical language that confuses people in their understanding of SEO? |
I often find it somewhat difficult to come up with a simple way to explain to clients the subtle differences between just plain not ranking, being filtered, and actually being penalized.
Good one, netmeg.
I cringe when I hear about site owners not really understanding things like that -- and making wild global changes to their site anyway based on some poorly understood idea. Then more than likely, coming back in a week or two with another round of changes because "the first one didn't work", and then another and another until they've tied a knot so tight that it can take a new domain to fix. They think we're still in the Infoseek era or something.
When we don't understand, clearly the first order of business is to understand, right? When we make a change and see a poor response (or no response) from Google, sometimes the best thing to do is forget about Google rankings for a bit and REALLY focus just on our visitors and our business operations and planning. Sometimes trying to tweak your ranking is like praying for sunny weather during the monsoon.
There are other Google things well worth the study time, too: Sitelinks, OneBox, Google Maps, Local Search, Blog Search, Image Search, geo-targeting -- on and on. There's opportunity in all of that, but it all takes some degree of technical understanding.
Most of all, I think the issue of scale is terribly misunderstood. When scale gets up to the level where Google is wrestling with it (tens of billions of documents, a quarter million servers globally distributed, etc.) we cannot expect Google to act like a flawless desktop application. And yet we may still be tempted to think in just those simplistic terms.
From my little technical corner:
A URL rewrite and a URL redirect are two very different things. I try to make that clear by using the terms "internal rewrite" and "external redirect," or when feeling particularly pedantic, "server-internal rewrite" and "external client redirect."
A rewrite changes the server filepath associated with a requested URL. There is no required or fixed association between a URL and a file, and a rewrite occurring in the URL-to-filename translation phase of server processing can modify this association in almost any way that is desired; almost any valid URL can be associated with almost any valid filepath, subject only to HTTP protocol and server access (security) restrictions. URL rewriting takes place within the context of a single HTTP request/response transaction.
A redirect, on the other hand, is a message sent by the server in response to a client (e.g. browser or robot) request. The message for a 301-Moved Permanently redirect says, "The resource you requested is no longer accessible at the URL you sent. Please update your link database and ask for it again by using this new URL." The sending of this redirect response ends the current HTTP transaction. The server includes the new URL in the redirect response, and it is up to the client to issue a new HTTP request, using the URL provided, to access the desired resource.
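The difference can be sketched in a few lines of Python. This is a toy model, not real server code: the REWRITES and REDIRECTS tables, the Response class, the handle function and the example.com urls are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical configuration, for illustration only
REWRITES = {"/widgets": "/var/www/catalog/widgets.html"}        # url path -> server filepath
REDIRECTS = {"/old-widgets": "http://www.example.com/widgets"}  # url path -> new url


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    served_from: Optional[str] = None  # filepath used; the client never sees this


def handle(url_path: str) -> Response:
    if url_path in REDIRECTS:
        # External redirect: this transaction ends with a 301 and a Location
        # header. The client has to issue a second request for the new url.
        return Response(301, {"Location": REDIRECTS[url_path]})
    if url_path in REWRITES:
        # Internal rewrite: same request, same url as far as the client knows,
        # but the server answers from a different filepath.
        return Response(200, served_from=REWRITES[url_path])
    return Response(404)


rewritten = handle("/widgets")
redirected = handle("/old-widgets")
print(rewritten.status, rewritten.headers)
print(redirected.status, redirected.headers)
```

In the rewrite case the client gets a 200 and no hint that the filepath differs from the url. In the redirect case the client gets a 301 plus the new url, and nothing is served until it asks again.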
An interesting side-note: It was originally envisioned that when your browser received a 301-Moved Permanently redirect, it might pop a dialog box and ask you if you wanted to update your bookmarks accordingly. But the Web grew exponentially, search engines rose, and bookmark use dropped, while sloppily-architected sites proliferated, and this was no longer a workable idea. Now, only search engine robots make significant real-time use of redirect responses to update links.
[edited by: jdMorgan at 5:37 am (utc) on Jan. 5, 2007]
Excellent post, Tedster. I try not to think about such things so as to try to preserve my sanity. As I heard Rush Limbaugh say long ago, "words mean things." (Maybe the only thing he and I agree on).
Off the top of my head, a few terms come to mind:
Website Hits. Technically this means the total number of files downloaded from a web site, with the understanding that each page is made up of multiple files, often dozens or hundreds. People like to brag about the number of hits their web site gets, thinking this means page views or visitors. Oh the glory days of the dot-com boom when I was getting a million hits a day on several of my websites.
Server: Traditionally a server is a piece of hardware running server software. I've heard many people refer to many different things, from their email client to their HTML editor, as a server.
Then there is the whole bunch of muck created every time a Google employee uses one of those $25 words that must be common at the 'plex. I assume that these words were used properly when they were originally applied. The problem is that webmasters take them and run with them as if Google created the words in the first place. Orthogonal and Canonical come to mind. I've read both of these words used as verbs and adjectives recently.
The Mother of All Misnomers, IMHO, is that term that we webmasters like to use when just about anything perceived as negative happens to our listings. This is the term Sandbox. As far back as I can remember, the technical term sandbox meant a non-production test-bed. It was first used by webmasters in relation to Google listings when new domains would not get indexed. Now a website can be ranked and still be considered sandboxed. I guess the confusion over the sandbox term is understandable. I mean, how many pages of WebmasterWorld are filled with arguments over whether the sandbox even exists?
Around my office we've been counting how many times we see the phrase "long tail" these days. I had barely heard that term used until MC used it on his blog a few months ago. Now everyone is going after the long tail, despite not knowing it existed a few months ago. Some of the most non-technical people I know have been throwing around the term "long tail" lately (you know who you are, Mother). I can't believe we are the only ones seeing humor in this.
Oh well, I've learned long ago not to argue semantics.
Along the lines of the Java applet discussion, has anyone seen evidence lately that they are pulling Flash? I heard that initiative was going to be undertaken in 2006 but haven't heard any updates.
Great post, Tedster.
Don't get Brett started on "SPAM". :)
Saying your site uses CSS is so vague as to have no practical meaning. It could be anything from adding one line of code to style your hyperlinks, to removing all presentation logic from your markup.
Great topic, Tedster. It seems to be one of the great pastimes of the modern age to speak in a manner incomprehensible to the uninitiated. But you are correct, people often don't know themselves what they are talking about, from lack of knowledge, effort, clear thinking, or all three. I've skipped over countless questions here and on other sites because the askers did not bother to make their meaning clear. Some people will be downright hostile if you ask for clarification.
I thought hits were the number of requests made to a server, not the number of files downloaded. Is this incorrect? And yes, I do understand that a single "page" might require multiple files to display. The difference here being, for example, whether the request for an <img> is a hit, or if the <img> request has to be processed and delivered before being counted as a hit.
One use of misleading or foggy language that has always bothered me is the use of "keyword" to refer to an entire cluster of words someone types into a search engine.
The term "keyphrase" is available, and seems much less confusing, but I must be the only one this bothers. Many people refer to trying to rank for their "keyword" when they are actually thinking about a phrase like "Home Mortgage Loans" or "Cheap Hotels in San Francisco".
The term "Web 2.0" seems an appropriate addition...
|Jordo needs a drink|
|For clear analysis of search results it is essential, first, to know something precise about technical vocabulary, and second, to be absolutely rigorous when using these words in our thinking and communicating. |
Even though there's no technical definition of "site", I don't believe you can conduct a clear analysis of search results without knowing the terms for the different types of sites themselves.
Authority Site - A site that may or may not have Google's trust (hey, there's another term to add), but does have respect and trust from the peers within its niche (whoops, another one).
Scraper Site - A site that develops little unique content of its own, but rather obtains its content from already published articles and items on other sites.
MFA Site - A "Made for Adsense" site. A site that was created for the sole purpose of getting visitors to click ads. These sites offer very little content, but numerous ads. Because some argue that all sites with Adsense on them are MFAs, the term is used here in its strictest sense.
Directory Site - A site that organizes items (for example, URLs) so that visitors may easily find them.
Thin Site - Usually an affiliate based site. It may contain content, but the content could be from the affiliate itself.
While "site" and "page" may not have technical definitions as far as search engines are concerned, they are still useful terms. You just have to be able to spot when someone else is using them differently.
Netmeg, I would add "banned" to your list. It is amazing how many people consider themselves "penalized" for not being in the top 10 for their keyword that a million other sites are trying for. Google has penalties, but they are very rare. I always default to "not ranking well" instead of assuming penalty.
As for the "long-tail", that term has been around for a few years, and some of us have been arguing for it even before that term was running around.
"Authority site" is just that, a site that is considered an authority by its users. An "authority-like score" is the term that Google uses. It is best not to confuse that with "trust", which is usually related to TrustRank. You can be a trusted site that is not an authority, and you can be an authority that isn't very trusted.
A "Directory" has a similar problem to "authority": it isn't a ranking term. What Google uses is "hub-like score".
Here's another one that blurs many discussions -- url does not mean domain. Moving a "page" to a new url is one thing, moving it to a new domain is a very different deal.
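A quick way to see which kind of move you are dealing with is to compare the host part of the two urls. This is only a sketch using Python's standard urllib.parse; the function name same_host and the example urls are made up. Note that it compares the full hostname, so www.example.com and example.com count as different hosts here.

```python
from urllib.parse import urlsplit


def same_host(old_url: str, new_url: str) -> bool:
    """True when both urls are on the same hostname (hypothetical helper)."""
    return urlsplit(old_url).hostname == urlsplit(new_url).hostname


# Moved to a new url on the same host
moved_url = same_host("http://www.example.com/a.html",
                      "http://www.example.com/archive/a.html")

# Moved to a different domain altogether
moved_domain = same_host("http://www.example.com/a.html",
                         "http://www.example.net/a.html")

print(moved_url, moved_domain)
```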
Anchor Text - the text in the hyperlink. I once had to explain this one to a "not-so-computer-savvy" person, and had a really hard time doing so. I ended up saying that anchor text is whatever is underlined, and that when other people link to his site the most important keywords should be in the anchor text.
So after that he started asking other sites to link to him with links the size of whole sentences, stuffing all his keywords into one huge unreadable string of gibberish. Hence, over-stuffing - when specific words are repeated in a document to try to take advantage of search engine algos. Oh, yeah, algo - an algorithm is the computing procedure that a search engine uses to figure out in which order documents will come up on the results pages -- oh, yeah - SERPs = search engine results pages.
|3. Page has no technical definition |
Google indexes a url, not a page. For example, if the viewport of your computer displays an html document that contains an iframe, then there is content from two different urls being displayed.
So what do you call those things that Google displays when you click "Cached" in a search result (and which Google calls "pages")? :-)
Yes, Google does call them "pages" - but what they store in the cache is the content of the html document found at a specific url at a specific time, right?
The W3C has long wrestled with a technical definition for "page" - especially as the future of html is clearly device independent. In 2005 they did use a tentative definition for "web page" in a working draft. That's as close as they have come to a definition, to my knowledge.
|Web Page |
A collection of information, consisting of one or more resources, intended to be rendered simultaneously, and identified by a single Uniform Resource Identifier.
More specifically, a web page consists of a resource with zero, one, or more embedded resources intended to be rendered as a single unit, and referred to by the URI of the one resource which is not embedded.
This term was developed from the definition of web page in Web Characterization Terminology & Definitions Sheet.
Glossary of Terms for Device Independence [w3.org]
W3C Working Draft 18 January 2005
My point is that to understand Google's indexing process, it can be critical not to have a naive understanding of what a "page" is, no matter how casually you, or Google, or anyone else may use the word in some situations.
A lot of things on major sites are labelled with whatever is most identifiable and simple for the masses; hence all the focus groups and UI studies. Unfortunately, this only fuels the misconceptions that this thread is trying to identify and prevent.
Reciprocal links VS exchanged links. Reciprocal links can and often do happen by accident. I link to a site I like. The site owner notices my traffic, checks out my site, and then links to me without my knowledge.
Exchanged links are a subset of reciprocal links intended to manipulate search engine ranking.
PageRank, PR, and TBPR. Sometimes, someone refers to Toolbar PR as PageRank, or says PR when they really mean TBPR. I only use two terms: TBPR and PageRank. PR is confusing to me because it can also mean Public Relations.
copyright, copywriter, copywritten, copyrighted.
Slightly off topic here, but perhaps related, is the puzzlement I experience that so many people who write slovenly English, with seemingly no ability to distinguish between there and their, or it's and its, or right and write, and who think that an algorithm is an algorhythm (or worse), and that a website is a websight or webcite, are somehow able to be completely fluent in php.
Obviously the attention to detail that you develop when writing code doesn't translate into writing prose.
Maybe it happens because when you misplace a semicolon in php your entire site (okay, I know there's no such thing as a "site") falls down and so you go fix your syntax, whereas even with egregious flaws in your use of English people can still more or less figure out what you mean.
I suppose it would be unkind to reply to an ungrammatical post by writing something like "Parse error: syntax error, unexpected apostrophe in the word 'its' in post #3906666 on line 2." But it might lead to better writing! ;)
I take great care to use the right words for things, but still want to go back and tidy up posts from a year or three back. Great post Ted.
>> Site has no technical definition <<
This is one major headache for editors at the ODP because the ODP treats "a collection of related URLs on a topic, or belonging to an entity (or related entities) as being one site", and then asks for one URL from the entire site to be suggested (not one per domain, nor one per subdomain, nor one per folder; instead just one representative URL from the "site").
That is, a "site" might be:
- a collection of files in a single folder, or several folders "geocities.com/john.doe.the.plumber/",
- or on a single subdomain white-van-man.freehosting-smallville.com/;
- or it may be the entire contents of a single domain www.widget-corp.com/,
- or a collection of multiple domains red-widgets.com/ and green-widgets.com/ and blue-widgets.com/,
- or multiple subdomains, or multiple folders (on the same or on multiple domains).
Yeah. Some headache.
That is the correct use of the term "url". It's not "an url", which would be pronounced "an earl".
|I thought hits were the number of requests made to a server, not the number of files downloaded. |
In my opinion, yes. Graphics, 301s, 404s, etc. are hits too. The confusion might come when someone is tabulating statistics: they may prefer to count files of a specific type, or even sessions, rather than hits.
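Here is a small sketch of that difference in Python. The REQUESTS list is a made-up log, and the rule used for "page view" (a 200 response for a path ending in .html or /) is only one possible assumption; real statistics packages each have their own rules.

```python
# Hypothetical request log: (requested url path, HTTP status code)
REQUESTS = [
    ("/index.html", 200),
    ("/style.css", 200),
    ("/logo.gif", 200),
    ("/old-page.html", 301),
    ("/missing.html", 404),
    ("/about.html", 200),
    ("/logo.gif", 304),
]


def count_hits(requests):
    """Every request made to the server is a hit, whatever the response."""
    return len(requests)


def count_page_views(requests, page_suffixes=(".html", "/")):
    """Assumed rule: only successful responses for html documents count."""
    return sum(
        1
        for path, status in requests
        if status == 200 and path.endswith(page_suffixes)
    )


hits = count_hits(REQUESTS)
page_views = count_page_views(REQUESTS)
print(hits, "hits,", page_views, "page views")
```

With this log the two numbers differ by more than a factor of three, which is why bragging about "hits" says very little about visitors.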
|The term "Web 2.0" seems an appropriate addition |
Yes, and that's something else that has no precise definition, because it's not a concrete technology. It's not as if Web 2.0 [en.wikipedia.org] obsoletes Web 1.0.
For example, I would consider WebmasterWorld a Web 2.0 site, because it consists almost entirely of dynamic user-contributed content. But others may say it's not because it's "just" a forum, and forum technology has been possible since the very beginning of the Web.
URL vs. IRL?
SEF vs SEO came to my mind, with the latter probably being nonexistent these days of ultra-non-understanding Google's hiccups.
g1smd, could you kindly explain what exactly a subdomain is?
1) My primary domain is www.mycompanyname.com
2) In summer I learned that this www-version is a subdomain of mycompanyname.com (non-www), just like I might set up a subdomain myproductline.mycompanyname.com
3) Ever since 2000, when people said to me "keywords in domain-name do matter," I have registered another domain (www.mybranch-mytown.com), which resides under the same IP and on the same server as my primary domain. Is this also a subdomain? (You used the term 'multiple domains'?) Is it a potential source of a duplicate content penalty?
|Then there is the whole bunch of muck created every time a Google employee uses one of those $25 words that must be common at the 'plex. I assume that these words were used properly when they were originally applied. |
Oh yeah, then there's the term algo. An algorithm is of course just a fancy word for what we used to call a formula.
I've read where MC has said that Google's algo hasn't changed for over a year, and somewhere else where Lasnik said that the algo changes constantly, sometimes daily. So which is it? I guess the Google algo means different things to different people, even different people at the plex.</rant> (sorry!)
I noticed that too, but I think it made sense taken in context. When Matt said something like "the algo hasn't changed for over a year" I understood him to mean that no new factors were added. When Adam said something like "the algo changes constantly, sometimes daily" I understood him to mean that the way the established algo factors are weighted changes constantly.
|I noticed that too, but I think it made sense taken in context. When Matt said something like "the algo hasn't changed for over a year" I understood him to mean that no new factors were added. When Adam said something like "the algo changes constantly, sometimes daily" I understood him to mean that the way the established algo factors are weighted changes constantly... |
This is a correct view....
Adding new variables to the algo means that the algo is changing, growing in complexity and size (I suppose one could subtract variables as well...streamlining the algo)
Daily algo changes simply mean the ranking variables are being dialed in differently (all the way down to individual page levels in some sectors)
Think of the daily change as a very large 64 channel recording studio control board...hundreds of knobs...to dial in the quality of each of the variables that make up the overall sound quality (in this case ranking quality)
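The knob analogy can be put into a toy Python sketch. Everything here is hypothetical: the factor names, the numbers, and the idea that a score is a simple weighted sum are assumptions made purely to illustrate "same factors, different weights". None of it describes Google's actual algo.

```python
def score(factors: dict, weights: dict) -> float:
    """Toy ranking score: a weighted sum over a fixed set of factors."""
    return sum(weights[name] * value for name, value in factors.items())


# One url, measured on three made-up factors
page = {"title_match": 1.0, "anchor_text": 0.5, "link_score": 0.2}

# Same set of factors on both days (the "algo" has not changed),
# but the knobs are set differently (the weighting has changed).
monday_weights = {"title_match": 3.0, "anchor_text": 2.0, "link_score": 1.0}
tuesday_weights = {"title_match": 3.0, "anchor_text": 1.0, "link_score": 2.0}

monday_score = score(page, monday_weights)
tuesday_score = score(page, tuesday_weights)
print(monday_score, tuesday_score)
```

Adding a fourth key to the dictionaries would be the other kind of change: a new factor, not just a new knob setting.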
The term Authority Site crops up increasingly often. I suspect this is a vague concept that is not fully understood (by me included). Actually I don't particularly like the idea of 'authority' on the web. It begins to smack of a class system and a tilted playing field. But it sometimes seems to mean "a big site that does better than mine", which is probably inaccurate.