
Penalty for linking to non-related site?

         

zorafex

4:57 pm on Jan 24, 2004 (gmt 0)

10+ Year Member



I have a computer site right now, but I also have a new biking site. Would I be penalized if I linked to the bike site from all of my pages on my computer site?

Small Website Guy

5:15 am on Jan 26, 2004 (gmt 0)

10+ Year Member



Google just uses anchor text in inbound links to figure out what your page is about.

The new algorithm penalizes your site on a search term if it's too SEOed for that search term. How Google accomplishes this part is a mystery, but that's the secret behind the new algorithm.

If other people create natural unsolicited links to your site on widgets, like <a href="www.domain.com">Bob's page of widgets</a>, then Google knows your page is about widgets.

BUT, if you try to SEO the site by following all the advice here, then Google senses that it's SEOed and penalizes it.

Bobby

10:07 am on Jan 26, 2004 (gmt 0)

10+ Year Member



FleaPit and kamikaze I share your thoughts exactly.

Let's not get overly paranoid about what type of link is good and what is not. Sure, stay away from link farms, but web sites have links coming in from all angles for all sorts of reasons. The benefit you might get from the link certainly outweighs any "link spam" penalty Google might dream up.

A while back Pizza Hut gave out free pizzas to people placing a link on their site to Pizza Hut, and now they're number 1 in the SERPs. I can't believe that everybody's site is about pizza!

The new algorithm penalizes your site on a search term if it's too SEOed for that search term

Small Website Guy, you've echoed my feelings here too. Could you expand on what leads you to this conclusion? I have plenty of "search phrases" that seem to have been "overly-optimized" in Google's eyes and now are banned from SERPs, yet the site is quite relevant.

Either Google believes that the site down at number 1000 is truly more relevant for my niche because it happens to have one word of the phrase appear once on a page, with no relationship to the other three words, or there really is a "filter" in place.

No question in my mind.

waynne

11:06 am on Jan 26, 2004 (gmt 0)

10+ Year Member Top Contributors Of The Month



We have already seen themes related to link anchor text - now links with anchor text but no surrounding contextual text are being pretty much ignored in backlinks.

The next phase is page theme (I believe we are seeing this now) - a bike page on a computer shop site linking to a bike shop has suitable thematic elements to help. The computer shop and the bike shop share a common theme - they're both shops - so a bike page is relevant on the computer shop site.

The final step will be site theme, although Google was quoted as preferring a large site with multiple themed areas over a series of smaller single-theme sites.

Looking at link patterns helps Goo Gal decide which keywords form a theme - look at related suggestions of sites to analyse the patterns.

Small Website Guy

11:53 pm on Jan 26, 2004 (gmt 0)

10+ Year Member



Bobby writes: Small Website Guy, you've echoed my feelings here too. Could you expand on what leads you to this conclusion? I have plenty of "search phrases" that seem to have been "overly-optimized" in Google's eyes and now are banned from SERPs, yet the site is quite relevant.

I have a blog. The blog has a name. It's called "The Word1 Word2". Before Austin, if you typed "word1 word2" into Google, my blog home page was the number one result. This is easy for those familiar with SEO to understand. It had a PR of 5, and many of the inbound links had "The Word1 Word2" in the anchor text. It had "The Word1 Word2" in the title and the <h1> tags.

I have to confess that my site has nothing to do with word1 word2, it's just a catchy name I gave it, and there's a graphic logo that matches the name. I thought it was cool that every day I'd get a hundred visitors looking for word1 word2 but finding my blog instead.

But, suddenly, on Austin day, I noticed a big drop in my visitors. At first I thought maybe the hosting company was having problems and the site was down. But no - the Google traffic was missing. I Googled word1 word2 and discovered that instead of holding the #1 slot, I had fallen back to page FIVE. What had happened? I visited this forum, and sure enough I discovered that it was a new Google update.

My blog still shows up in the #1 slot when someone types in my name, "John Doe". The number 2 slot is the page of another "John Doe" and his page by all logic should actually be the #1 slot for "John Doe", because not only does his page also have PR of 5, he has John Doe in the title and John Doe splattered all over his page and he even has inbound anchor text that has "John Doe" in it. On the other hand, my page has my name mentioned only once, in regular text. It's not in the least bit a page about "John Doe".

There is also a "John Doe" who is a semi-famous political figure, and his page also appears on page one of the results, but although he's famous and I'm not, Google thinks I'm more deserving of the top spot.

So the mystery here is why am I number one for John Doe, but not number one for word1 word2? The answer is that I went out of my way to optimize my site for word1 word2 because I thought that it was cool to get the extra traffic. Other websites I have control over have links to my blog with "word1 word2" in the anchor text.

My explanation is that Google is smart enough to figure out that I've created artificial links with "word1 word2" anchor text and has penalized them. On the other hand, the "John Doe" inbound links are natural unsolicited links. I had no desire to be the number one page for my name. In fact, I kind of wish I wasn't. But people chose to link to me using my name. Just as "miserable failure" brings up a site about George W Bush, "John Doe" brings up my blog, which is not in the least bit about John Doe.

I really ought to have the #1 spot for word1 word2 as well. I have even more inbound anchor text for "word1 word2" from many other blogs, and "word1 word2" is not a heavily sought after commercial term. Right now there is only a single adword that comes up when you do a Google search for "word1 word2", and it's some link to eBay (I guess eBay sells everything under the sun). But instead I'm on page five.

Clearly Google's new algo has figured out that I'm trying to tell Google "hey, this is a page about word1 word2" and Google has chosen to PUNISH me by demoting my page down to page five. I think that without any SEO at all, my page would show up as #1 due to the page rank and the many inbound links with "word1 word2" anchor text.

sblake

2:16 am on Jan 27, 2004 (gmt 0)

10+ Year Member



Well, since your site has nothing to do with Word1 Word2, it appears that Google got it right somehow.

caveman

3:19 am on Jan 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Guess I'll scrap my plan to launch a site promoting high tech products to twenty-something bike enthusiasts. Shame.

Does any one know if rock climbers use computers?

AjiNIMC

4:03 am on Jan 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There was surely a penalty for over-optimization (oops, over-optimization penalties) - we saw this after Florida too. But the sites were coming back, and then Austin hit.

Many people tried to analyse the Florida after-effects and came up with many theories, none of which can be proved.

Is the Florida filter empowered by the Austin update? Are people getting penalties for over-optimization?

What is the latest buzz in the SEO world? I was away on holiday and was unaware of this new update.
Let me know in short, if possible.

Thanks
Aji

plumsauce

7:04 am on Jan 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




... because there is so much conjecture based on so little if any fact

BRAVO!

plumsauce

7:07 am on Jan 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Does any one know if rock climbers use computers?

well, they're quite handy for calculating terminal velocity during free fall.

but bootup time can be deadly :)

Small Website Guy

8:12 am on Jan 27, 2004 (gmt 0)

10+ Year Member



Well, since your site has nothing to do with Word1 Word2, it appears that Google got it right somehow.

Neither is my site about John Doe, yet I come up #1.

Interestingly enough, I typed my name into All The Web, and my blog was NOT on the first page of the SERPs. But what I DID find was a page that WAS about me that was on another website. I was totally surprised.

Herenvardo

9:29 am on Jan 27, 2004 (gmt 0)

10+ Year Member



I have a computer site right now, but I also have a new biking site. Would I be penalized if I linked to the bike site from all of my pages on my computer site?

It's a little more dangerous than that... The simplest answer is that G's algo won't penalize you. But if you have some unscrupulous competitors, it's easy for somebody to report you as a link farm, and there might be some risk of penalization for both sites. This depends on how many links you put in, how long the link text is, how much traffic you have, and many other factors.

You could make a page (or more) about bike computers and link that to/from the bike site ... I'd say if you carefully research the thing, you'll likely find some crossings between the two topics.

Completely agree! There are many ways to cross bikes and computers. The easiest would be through bike video games for the computer, but there can be more.

In any case, if the site is not too big, you could put something like:
"Please visit my other website about bikes" in your computers site, and not worry. Even if somebody reports you as a farm, if you have this in your menu, I think G will let it pass: keeping two personal sites and exchanging links between them is more a tradition than an SEO technique!
I'm speaking about personal sites. The more commercial and less personal they are, the more the linking is SEO and the less it is tradition. Once more, I cannot say "Do it!" or "Don't do it!" You must evaluate your site and decide which option will be most worthwhile.

Hoping this is useful,
Herenvardö

BallochBD

9:43 am on Jan 27, 2004 (gmt 0)

10+ Year Member



Would there not be a problem if both the bike site and the computer site are on the same server/using the same host, etc?

Incidentally, I keep seeing people on this forum saying that "Google does this", "what Google does is", "Google does this first then this ..."

How come you guys who make these statements are so sure of yourselves? If you are so sure why don't you write a user manual? You could become very rich.

WyrdoManx

9:47 am on Jan 27, 2004 (gmt 0)



So the mystery here is why am I number one for John Doe, but not number one for word1 word2? The answer is that I went out of my way to optimize my site for word1 word2 because I thought it was cool to get the extra traffic. Other websites I have control over have links to my blog with "word1 word2" in the anchor text.

John and Doe are both in the dictionary. Okay. I know that's (probably) not your real name. If you Google for John Doe you'll see that Google underlines the search terms and links to the dictionary. It might well do that for your real name.

Word1 and Word2 are not in the dictionary.

That's one reason why Google could treat the search differently.

The "blog noise" problem is well known. I think it would be fairly easy for Google to spot names (again via a dictionary or even just examining author, copyright, etc meta tags).

I'd believe that blog results are given a different score depending on whether or not the searcher is looking for a name.

Okay. I'm trading speculation for speculation here. I'm just not quite swayed by the over-SEO'd argument yet.

Half_Empty

2:02 pm on Jan 27, 2004 (gmt 0)



How relevant is the trip computer on my bike? It isn't the latest P4, but it is a computer!

Herenvardo

10:02 am on Jan 28, 2004 (gmt 0)

10+ Year Member



Incidentally, I keep seeing people on this forum saying that "Google does this", "what Google does is", "Google does this first then this ..."

How come you guys who make these statements are so sure of yourselves?

I normally use the forms "I think", "I believe", and "probably" when I post something without being completely sure. But sometimes I can give much more reliable answers. As I have some experience in programming, I can easily deduce what G can do automatically, what is possible to do automatically, and what must surely be done manually.
Thanks for your post; I'll try to give my reasoning when I post, though some people may consider it too basic.

If you are so sure why don't you write a user manual? You could become very rich.

I've written some pieces about programming and they are on my website. But I must say that they are available only in Catalan, not in English. And I won't become very rich, because they are 100% free. I only wish to share what I know.

Would there not be a problem if both the bike site and the computer site are on the same server/using the same host, etc?

Ok, I'll try to give a reasoned answer; here it goes:
Since some sites spread across more than one domain, and some domains hold many sites (Tripod, GeoCities, etc.), the domain is not a valid indicator of which files belong to the same site.
I cannot be sure that G has no way to check automatically which files form a site and which do not, but any programmer will know that it is very hard and would involve applied artificial intelligence, so we can almost assume they have no such system.
So, putting the bike site and the computer one on the same domain would still be easily detectable through a manual check. The question here is: when do G's people decide to do a manual check on a site?
Even when some tricks are not detectable by automatic systems, there can be a robot that flags "suspicious" sites. After collecting a batch of them, they can all be manually reviewed.
The other way is to randomize which sites to check: since this is unpredictable, I won't give more details about it.
If G uses such a robot, it could easily flag two domains as suspicious when all the pages from one link to the other.
In conclusion, you'd still be vulnerable to a manual check, but it is less probable that such a check happens if both sites share a domain.
Oops - this kind of reasoning makes the post a bit longer! :O But I hope everybody is able to understand it.

Greetings,
Herenvardö

Bobby

10:47 am on Jan 28, 2004 (gmt 0)

10+ Year Member



there can be a robot that detects "suspicious" sites

Hi Herenvardo,

I'd like to take advantage of the fact that you are a programmer to get a general idea of what Google can and can't do easily in their quest to "stick it to the spammers" - I hope you don't mind.

I dropped from number 2 for my main keyphrase to just past Pluto on the outer edge of the galaxy and still haven't got a clue as to why.

My first question is this:

In reference to Google having a robot which checks "suspicious sites", what type of system would they need to label a site "suspicious" and consequently subject it to further scrutiny? More precisely, what elements of a site would be easy to spot as possible spam? Exact matches between title tag and H1? Repetitive use of search phrase (with some threshold point triggering the robot)?

Could, and would Google implement this throughout the whole database or only to certain highly competitive commercial sectors like travel (mine) or real estate?

For what reason would a search for "custom designed blue widgets" not appear in the top 1000 SERPs yet the exact same search with +a (custom designed blue widgets +a) comes to the top?

Lastly, would you subscribe to the point of view that Google has a dictionary of terms (possibly taken from top 10,000 searches or other "most common" monthly searches) to which it applies another algorithm?

Sorry for all the questions but I think we will all benefit from sharing knowledge and working out exactly what Google can and cannot do.
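Bobby's candidate signals - an exact title/H1 match plus heavy repetition of the search phrase - are easy to express as a toy filter. The signals and the threshold below are pure guesswork for illustration, not Google's actual rules:

```python
import re

def looks_suspicious(title: str, h1: str, body: str, phrase: str,
                     density_threshold: float = 0.05) -> bool:
    """Toy 'suspicious page' check: flags a page whose title exactly
    matches its H1 AND whose keyword-phrase density exceeds a threshold.
    Both signals and the threshold are invented for illustration."""
    words = re.findall(r"[a-z0-9]+", body.lower())
    phrase_words = phrase.lower().split()
    # Count non-overlapping occurrences of the phrase in the body.
    hits = 0
    i = 0
    while i <= len(words) - len(phrase_words):
        if words[i:i + len(phrase_words)] == phrase_words:
            hits += 1
            i += len(phrase_words)
        else:
            i += 1
    density = hits * len(phrase_words) / max(len(words), 1)
    exact_match = title.strip().lower() == h1.strip().lower()
    return exact_match and density > density_threshold
```

A page that repeats its phrase in every sentence under a matching title/H1 trips the check; an ordinary page does not.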

Just inPrefs

9:43 am on Jan 30, 2004 (gmt 0)

10+ Year Member



The mystery is not such a mystery, in my opinion. If I were Google, it would be perfectly clear to me that the small amount of content in H1, ALT, or HREF attributes is not automatic proof of the site's overall theme. The content is. The body's overall content. If there is no such content in the pages' body, why would I trust the H tags (easily resized in CSS), the TITLE (which may be the same single one sitewide), or HREF anchor text from another "miserable failure" weblog?

So try to relate your content to keyword 1 and keyword 2 when you target those keywords. It's a secure long-term benefit.

giggle

10:30 am on Jan 30, 2004 (gmt 0)

10+ Year Member



Surely it is easy for Google to determine the theme of a site. It would check which category the site sits in within the directory (Google or DMOZ). If your site links back to the same root category as the sites you are linking to, then they have the same theme?

taxpod

3:44 am on Jan 31, 2004 (gmt 0)

10+ Year Member



I've finally got first-hand evidence that links from a page on a site about brass tacks to a new site about juggling are definitely not helpful.

A friend has a site on a completely different topic from my 10 or so sites. But my sites rank pretty well and I figured I'd at least put him on the map with some links. After Florida it seemed as if the anchor text in my links to him was the key. And it was - but with Austin his site disappeared. Today it is back, but it doesn't show up for any relevant searches. If you search his domain name, you get the typical page, but when you click on "similar to" you get almost no hits on his subject. Instead you get almost exclusively pages on my subject. So you really have to be searching for "juggling with brass tacks" in order to find his site.

The theme of this message is that Google seems to indeed be theming. And since little weight seems to be given to the words in the anchor text, I guess that game is about over. Now it seems to be the text on the pages linking to the object page. Or perhaps the links into the linking page? In any event there seems to be theming going on. Others have said this is occurring but now I've seen it for myself.

AjiNIMC

3:55 am on Jan 31, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a doubt.

Does the relevancy of the site matter, or the relevancy of the page? I have seen a guy running a car site get his page to #1 for computers.
car.com/computer.html is #1; if my site is also a computer site, will car.com/computer.html be relevant?

In short, is it site relevancy or page relevancy that is the issue?

Thanks Aji

pavlin

4:47 am on Jan 31, 2004 (gmt 0)

10+ Year Member



Well, it's not clear to me how G would decide which link is on topic.
How would it "understand" that the link "Get a Bike PC" is relevant, but "Get a new PC" is not? And what if the link were "Get a new bike"?
I have a feeling that the current chaos at G is starting to bring some urban (or maybe SEO) legends to life...

AjiNIMC

5:19 am on Jan 31, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What I think is there might be some function (very complex), say relevancy(key phrase).

This will check the relevancy of the page where the link sits. For everything -
Title
URL
Content
Links, etc. - there will be some points for each attribute.
if (relevancy(key phrase) > some value)
{
relevant link;
}
else
{
non-relevant;
}
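AjiNIMC's sketch can be fleshed out into a toy scoring function. The weights, the signal list, and the threshold below are all invented - nobody outside Google knows the real ones:

```python
def relevancy(page: dict, phrase: str) -> float:
    """Toy weighted relevancy score in the spirit of AjiNIMC's sketch.
    Awards invented point values for the phrase appearing in the title,
    URL, body content, and inbound anchor text."""
    phrase = phrase.lower()
    score = 0.0
    if phrase in page.get("title", "").lower():
        score += 3.0
    if phrase.replace(" ", "-") in page.get("url", "").lower():
        score += 2.0
    score += page.get("content", "").lower().count(phrase) * 1.0
    score += sum(phrase in a.lower() for a in page.get("anchors", [])) * 1.5
    return score

THRESHOLD = 4.0  # arbitrary cut-off for "relevant link"

def is_relevant_link(page: dict, phrase: str) -> bool:
    return relevancy(page, phrase) > THRESHOLD
```

A page with the phrase in its title, URL, body, and an inbound anchor clears the threshold comfortably; a page with none of those scores zero.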

pavlin

5:35 am on Jan 31, 2004 (gmt 0)

10+ Year Member



The point is, it takes AI (artificial intelligence) when it comes to semantics. It's a dangerous game, and if they really mean to play it, there should be strict rules set.
I think it takes more than a "very complex" function to deal with the meanings of anchor text.
It seems the non-English searches are not affected by Austin. And there I can find another problem: if G is trying to block spammers, why does it count self-links (from a page to itself) as backward links?

Herenvardo

11:24 am on Jan 31, 2004 (gmt 0)

10+ Year Member



there can be a robot that detects "suspicious" sites

I've quoted myself to make clear that it's only a possibility. Now, I'll try to answer your questions, Bobby.

If I was Google, it would be perfectly clear for me that not the H1, ALT or HREF's little content will be the automatic proof for the site's overall theme. The content will.

If I were G, I'd also think so. But if we were Googlebot, we wouldn't think at all. This is the weakest point of machines and programs (bots are programs): they cannot think. And to check the content of a page, you have to think. If you are not able to think, you can only apply some predefined methods to get statistics from the file. And this is what bots do: they look over the site and generate statistical reports that are compared against some models. Content is checked, of course, but so are the HTML tags you mentioned. Current robots can track the number of times a word appears in the file, how near two words are, etc. They can also check whether the result of that content analysis is coherent with the title and header tags, and so on.
In reference to Google having a robot which checks "suspicious sites", what type of system would they need to label a site "suspicious" and consequently subject it to further scrutiny? More precisely, what elements of a site would be easy to spot as possible spam? Exact matches between title tag and H1? Repetitive use of search phrase (with some threshold point triggering the robot)?

You are almost answering your own question. There are a lot of clues such a robot would search for, and your examples are very good. Even so, we cannot know for sure whether such a robot exists, nor what it would take into account. Anything that can be treated as a number could be considered, and there are a lot of interesting numbers in each file: the proportion of instances of a word over the total number of words; the minimum, maximum, and mean distance between two instances of a word (counted in words or in characters); the proportions among <Hx> and <P> tags (more headers than paragraphs could easily mean spam, for example); a lot of <a href=""> links to the same domain or following some kind of pattern would reveal a link farm; and so on, as far as Google programmers' imagination reaches.
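The numbers described above - keyword proportion, gaps between instances, header-to-paragraph ratio - are cheap to compute. A minimal sketch (the choice of statistics is illustrative; a real crawler would track far more):

```python
import re

def page_stats(html: str, word: str) -> dict:
    """Compute a few simple statistics a crawler could extract from a page:
    the target word's proportion of all words, the min/max gap between its
    instances (measured in words), and the header-to-paragraph tag ratio."""
    text = re.sub(r"<[^>]+>", " ", html).lower()   # strip tags crudely
    words = re.findall(r"[a-z0-9]+", text)
    positions = [i for i, w in enumerate(words) if w == word.lower()]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    headers = len(re.findall(r"<h[1-6]\b", html, re.I))
    paragraphs = len(re.findall(r"<p\b", html, re.I))
    return {
        "proportion": len(positions) / max(len(words), 1),
        "min_gap": min(gaps) if gaps else None,
        "max_gap": max(gaps) if gaps else None,
        "header_para_ratio": headers / max(paragraphs, 1),
    }
```

A page where the keyword makes up half the words, or where headers outnumber paragraphs, would stand out in exactly the way described.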
Could, and would Google implement this throughout the whole database or only to certain highly competitive commercial sectors like travel (mine) or real estate?

Could? Of course. G could even use a different implementation for each sector, depending on the spam techniques most used in each. Would? That is not a question for a programmer. The only one at WW who can answer that is GG.
For what reason would a search for "custom designed blue widgets" not appear in the top 1000 SERPs yet the exact same search with +a (custom designed blue widgets +a) comes to the top?

The easiest explanation would be that the site is not relevant enough to be listed for the first search, but has a lot more a's than the competition. Another explanation would be that the first keyphrase is filtered for some reason and the second is not. Has this happened recently? This might still be Florida!
Lastly, would you subscribe to the point of view that Google has a dictionary of terms (possibly taken from top 10,000 searches or other "most common" monthly searches) to which it applies another algorithm?

G could have a few dozen algorithms and apply each to different searches (or even the same search, at different moments). After the Florida massacre, I think they use at least 2 algorithms.
Surely it is easy for Google to determine the theme of a site. It would check to see which category the site is in in the directory (Google or DMOZ). If your site links back to the same root category as that of the sites you are linking to then they have the same theme?

Such a test is technically possible, even easy. Even so, I think more factors are applied. But I find it very probable that G uses the directory information to improve and refine SERPs.
What I think is there might be some function(very complex) say relevency(key phrase)

Very complex... let us say that with only 300-400 lines of code, such a function could do something accurate. It's only an approximation.
Theoretically, if we made a dictionary in database format, with all definitions clear and unambiguous (machine-readable), a program could use it to emulate thinking, and this could be applied to determine the theme of a site/page. This way of programming is called Knowledge-Based Systems, and it is the current basis of artificial intelligence. Why not? If games use AI engines to get more interesting, it seems logical to me that G uses a little AI engine to improve SERPs (here "little" could mean something around 200-300 MB).
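As a tiny illustration of the dictionary idea: score a page's words against per-theme vocabularies and pick the best match. The themes and word lists below are invented; a real knowledge base would be vastly larger:

```python
# Hand-made category vocabularies (purely illustrative).
THEMES = {
    "bikes": {"bike", "cycling", "pedal", "helmet", "trail"},
    "computers": {"cpu", "software", "keyboard", "monitor", "laptop"},
}

def guess_theme(text: str) -> str:
    """Return the theme whose vocabulary overlaps the text's words most."""
    words = set(text.lower().split())
    scores = {t: len(words & vocab) for t, vocab in THEMES.items()}
    return max(scores, key=scores.get)
```

Even this crude overlap count separates a bike page from a computer page, which is the sort of cheap theme signal a large dictionary would sharpen.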
if G is trying to block spammers, why does it count self-links (from a page to itself) as backward links?
The hardest question! I'm unable to answer, and I feel GG is too! Maybe you have detected a bug ;)

Hoping to be useful,
Herenvardö

PS: Many things said in this post are based on suppositions. Where the verb "can" is used, I mean that it's technically possible.

Just inPrefs

9:14 am on Feb 4, 2004 (gmt 0)

10+ Year Member



Who says life is easy, even for a spider? You actually have very hard tasks being a spider... especially if you are Google's. :-)

tenerifejim

3:28 pm on Feb 4, 2004 (gmt 0)

10+ Year Member



I've got to go with AjiNIMC on this one. There are a million socially relevant links that exist that google may not be able to spot.

A site on knitting may also be relevant to someone searching for pensioner holidays. A site on Manga may well be a good place for a link to rock music.

This social relevance is something that will be very difficult to code, and I cannot believe google would ever penalize for it.

This happens in advertising all the time - how many people watch a commercial about home loans during a TV show about cops?

Herenvardo

8:46 am on Feb 5, 2004 (gmt 0)

10+ Year Member



This social relevance is something that will be very difficult to code

Can you translate social relevance into numeric relations? If you can, you can code it.
I agree: for now, machines are not able to understand our society, so they cannot detect social links.
If somebody finds the numeric relationship among the different aspects of society, please let me know ;)

Greetings,
Herenvardö

BallochBD

9:04 am on Feb 5, 2004 (gmt 0)

10+ Year Member



Herenvardo I like the way you approach a lot of the posts on the forum. Some of us get carried away a bit and start to wildly speculate about what the Google algorithm can or cannot do. The fact that you keep pointing out the capabilities of programming code can often put this in perspective. Keep up the good work!

Herenvardo

10:35 am on Feb 5, 2004 (gmt 0)

10+ Year Member



thanks, BallochBD
I only try to use what I know to help others, since many times I've gotten help from others' knowledge. As I posted before (#45), I try to separate what I truly know from what I suppose and/or deduce.
I've been programming as a hobby for a long time, and I entered SEO by chance. It was the first job I was able to find related to computers, and I'm very happy with all I've learned since last July.
Sharing my knowledge, where it can be useful, is a way to thank those who have helped me these months.
So, thanks to all of you! :) A forum, or a forum list, is simply a set of databases and scripts. But there is a community around these forums - people who make them GOOD forums and work daily moderating, posting, or even just reading.
Maybe this has been a little off-topic, but I wanted to say it.

THANKS,
Herenvardö

ZachFSW

6:01 am on Feb 6, 2004 (gmt 0)

10+ Year Member



You know what's funny about all this - what about forums? My forums talk about sports and politics and the newest movie and who is the hottest whatever.

Like all forums. So how can Google place a site like that in a box? And while it's currently obvious to Google that it's a forum, normally it is not. I'm just waiting for vB 3 Gold, and then I'll redo all my URLs so that they look normal, and unless Google really is a god it's not going to figure out it's a forum from my HTML (mainly because it's all over the place).

Actually, now that I've 301ed my extra 400 domains pointing at my box, I'm doing well on a variety of search terms, all related - and, as always, doing really well on search terms that make me do a double take. Then I see what thread they go to and laugh, because for those few pages it is completely relevant.
