|I am experimenting with outbound links this week. |
Me too - This part of the discussion has definitely got me thinking as one of my sites (old site, changed content) does very well with just one (non reciprocal) out bound to 'authority' sites on each page, when compared to similar sites where I'm hoarding PR.
>> However, I have a redevelopment case, now 3 weeks live, where thousands of urls have gone 404, and we only used 301 for maybe 60 urls that got the heaviest search engine traffic. <<
I can offer a site with very bad duplicate content problems (multiple URLs leading to same content) that was slowly being fixed, when they decided to completely re-organise the site. Several thousand pages were moved to completely new URLs. Just a few dozen pages stayed at the same URL. So far Google seems very slow at picking up the new URLs, but it has only been a month or two.
I think the discussion of "amount of 404s to trigger something" probably revolves around the question: "does this look like a completely new site, after a change of ownership"?
In fact, I guess that most bits of the algorithm basically ask of everything: "what type of spam is that trying hard not to look like?"
>> Anyone -- your competition, for instance -- can create an unlimited number of 404 links to your domain. <<
I would hope that Google ignores a URL that the very first time they spider it, it returns a 404. I would hope they check it again a few times, then forget they saw it.
However, URLs that return 200 and content for a while, and then later go 404 are a completely different matter.
Again, one important property of a URL would be whether only internal links contain that URL, only external links contain that URL, or a mixture. I would hope that URLs synthesised only from outside a site would not carry as much weight as those found internally.
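If Google does treat these two cases differently, the distinction might look something like this toy sketch. All names and the retry policy here are invented for illustration; this is not Google's actual logic, just the rule described in the posts above:

```python
# Hypothetical sketch: weigh a 404 differently depending on whether
# the URL ever actually served content. Invented for illustration.

def classify_404(history):
    """history: ordered list of HTTP status codes seen for one URL."""
    if 200 not in history:
        # Never served content: probably a bogus link, possibly one a
        # competitor synthesised. Retry a few times, then forget it.
        return "ignore_after_retries"
    # The page served content before and is now gone: a real page
    # disappeared, which actually says something about the site.
    return "formerly_live"

print(classify_404([404, 404]))        # a link that never worked
print(classify_404([200, 200, 404]))   # a page that existed, then vanished
```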
|does very well with just one (non reciprocal) out bound to 'authority' sites on each page |
I link liberally to pages on other sites that offer more information and/or are a source for each of my articles. I also link to pages like museum pages that have a picture that illustrates something in the article.
You would think I would be leaking PR like crazy but these articles almost always rank on the first page in Google for the topic and often are number one.
I'm not sure if these outbound links are helping me but it is a possibility. I also think it has helped me in getting links from academic and government sites because I list all my resources for an article and link to them when possible.
|think it has helped me in getting links from academic and government sites |
> Maybe I'm not reading this right but if there really were no links to the new pages then without AdSense there is no way Google would have found the page without finding it through AdSense.
Orphan pages. They can be found, for example, if someone with the Google Toolbar or running the Opera browser looks at the page, as this sends the URL of the page to Google. Normally orphan pages don't stay in the index. This has caused some surprises where people have set up "private webs" which then get discovered and indexed by Google.
> Google will penalize a domain for too many 404s. Opinion-Myth
I'm not saying Google does penalize a domain but it would make some good sense to do so. Google doesn't have infinite resources to spider the web (it just seems that way). If they start getting a large number of 404s on a single domain the robot may just decide to give up assuming the site is very broken - at the very least I would expect much slower indexing in the future until the problem is fixed.
Making a connection just to get a 404 requires quite a lot of TCP/IP overhead, actually getting the data afterwards is almost a breeze.
It is like that duplicate content experiment that someone did a while ago, where the spider checked out a few pages in the directory then decided not to revisit because it had triggered some kind of duplicate threshold.
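The back-off idea a few posts up (slower crawling as 404s pile up on one domain) could be sketched like this. The ratios and multipliers are entirely made up; the point is only the shape of the policy:

```python
# Invented sketch of a per-domain crawl throttle driven by 404 rate.
# Thresholds and multipliers are illustrative, not anything confirmed.

def crawl_delay_multiplier(fetches, errors_404, base=1.0):
    """Return a crawl-delay multiplier for a domain.

    If 404s dominate recent fetches, back off hard: a site that looks
    very broken isn't worth the TCP/IP overhead of hammering it.
    """
    if fetches == 0:
        return base
    ratio = errors_404 / fetches
    if ratio > 0.5:
        return base * 10   # mostly broken: crawl much more slowly
    if ratio > 0.2:
        return base * 3    # somewhat broken: slow down a bit
    return base            # healthy site: normal crawl rate

print(crawl_delay_multiplier(100, 60))  # 10.0 -- heavily throttled
print(crawl_delay_multiplier(100, 5))   # 1.0  -- business as usual
```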
> Confirmed. Infact our hyphenated domain name ranks better than our non-hyphenated version. It's as though the search engines recognise the two words as being separate because of the hyphen.
If search results are an indicator of what Google (along with other search engines) can index then search engines cannot recognise words that are run together so:-
bluewidgets.com looks like bluewidgets dot com
however Google does recognise '-' as a word separator (but not '_') just like a space so
blue-widgets.com looks like blue widgets dot com
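The rule described above can be mimicked with a toy tokenizer. This is purely illustrative of the behaviour the posts describe (circa 2006), not Google's code: '-' acts like a space, '_' does not:

```python
import re

def tokenize_domain(domain):
    """Split a domain label treating '-' as a word separator and
    '_' as an ordinary character, per the behaviour described above."""
    label = domain.split(".")[0]   # drop the tld
    return re.split(r"-", label)   # hyphens split; underscores survive

print(tokenize_domain("blue-widgets.com"))   # ['blue', 'widgets']
print(tokenize_domain("bluewidgets.com"))    # ['bluewidgets']
print(tokenize_domain("blue_widgets.com"))   # ['blue_widgets']
```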
Google presumably doesn't do anything with the tld although certain tlds may confer more trust in the site (.org, .edu)
However it is important not to confuse an effect with its cause. Why should a keyword in the URL have a +ve effect on SERPs? Is it that Google puts a lot of weight on the domain and some weight on the rest of the URL? Or is it that most people are lazy when linking to websites and just use the URL as the anchor, so the hyphenated version as the anchor gives an IBL boost?
Almost certainly a combination of the two.
Okay and now some science:
taken from the Expression Engine forums:-
[quote]> Derek Jones - 23 August 2006 04:03 PM
> the "Favour dashes rather than underscores as some search engines don't recognise these as word seperators" is a myth.
Actually Derek it is a myth that it is a myth and I suggest you try it.
Google, as an example of one search engine, sees underscores as underscores. Try the search with spaces, dashes and underscores and you will see markedly different results. Then find an Expression Engine site that uses "_" as the separator for URLs. Use Google's allinurl: directive to search on just the part of the URL Google has indexed; that way you avoid confusion from obscure search terms where H1 or even body text has enough weight to figure in results. For example:
allinurl:<space separated keywords> - returns nothing
allinurl:<underscore separated keywords> - returns an EE site
That is because Google matches underscores to underscores. Try the same thing with a site using dashes as separators:
allinurl:<space separated keywords>
allinurl:<dash separated keywords>
both searches match the page: <edited>
In fact with Google, putting dashes in the search term is the same as putting the phrase in quotes, at least as far as the URL is concerned. That means the exact phrase must occur somewhere in the URL. So:
but would not match
This is actually not a bad thing from an SEO viewpoint as it makes our pages more specific to what is being searched for.
Obviously when you do a general search there are all sorts of things like stemming going on which can change the results depending on dashes, spaces, word order etc. Other search engines operate differently. Last time I checked msn search treated underscores as word separators. But then virtually no-one uses msn search so who cares?
<Sorry, no specifics.
See Forum Charter [webmasterworld.com]>
[edited by: tedster at 3:53 pm (utc) on Nov. 1, 2006]
I forgot to mention, we submitted both of those pages to google via the url submit.
It shows that Google has crawling priorities: AdSense pages are always first to get crawled and indexed if they meet guidelines.
"adding outbound links to relevant sites makes a BIG difference in SERP results"
Proof: About three times I posted to a blog and edited Wikipedia pages (on topic, of course) that had always been below my site in the SERPs for certain double keywords, and included a link with the keywords.
Now they're on top of me %-)
About the underscore versus dash, this is a direct quote from Matt Cutts:
"So if you have a url like word1_word2, Google will only return that page if the user searches for word1_word2 (which almost never happens). If you have a url like word1-word2, that page can be returned for the searches word1, word2, and even "word1 word2"."
Now that's science.
Slash, hyphen underscore thingie...
That's just on-page info, meaning text, anchor text, alt tags, titles, descriptions and the like... right?
Not URLs. (?)
G recognises _ as word separator in URLs, but not in content...
At least that's my experience.
hoarding PR is so "three years ago"
connect to authority resources in your sector and add value to your site visitor's (human) experience...the engines tend to view this as a positive
|G recognises _ as word separator in URLs, but not in content |
No, the Expression Engine comments above are specifically about the inurl: operator results. The reason behind Google's unexpected treatment of the underscore is that there are many technical keyword searches that perform better when the underscore character is treated as a true character rather than as a word separator. Think of the way FrontPage uses the underbar to begin its dedicated extension folders. There are a multitude of such technical examples.
File size is a directly measured factor in the ranking algorithm Opinion-Myth
I recently have been forced to downgrade my own take on the file size factor. Years back, it seemed like there was definitely a sweet spot for file size, and that big html files were taking a negative hit. But now I see many counter-examples. This includes a recently re-developed site whose new html files are WAY too big for traditional SEO wisdom (120 kb and more, on average) and yet the urls moved UP the SERP immediately after launch.
What I now feel (Opinion) is that file size was never really an algo factor, and Google has greatly improved at ignoring this irrelevant signal and isolating the content for ranking purposes, even on very bloated pages. In other words, the file size phenomenon was only peripherally related to ranking, but I was making a "post hoc ergo propter hoc" error in my logic.
|G recognises _ as word separator in URLs, but not in content |
|No, the Expression Engine comments above are specifically about the inurl: operator results. |
Ah. You're right, i just checked O.o
I still don't get it.
When i do a search on a certain keyword combination (city and district names) i get results from our site that highlight the keywords in the URL, even though it's actually a folder, /cityname_-_districtname/. It does get recognized when doing just a simple search.
Now if i do what you said to do, the inurl: thing...
I could live with it not being visible for the inurl operator search, only normal searches, but this got me thinking for a moment...
What's the deal with this _ then?
Highlighting is just a character string match done over the calculated SERP as a very last step. So it doesn't indicate that each highlighted term was actually used in the algorithm's calculation -- it's just a way of showing the end user that their search terms can be seen here, and here, and here in the search results. Nothing more than that.
[edited by: tedster at 5:07 pm (utc) on Nov. 1, 2006]
Ah, but the highlighting of URLs in the SERPs is simply a display process that highlights occurrences of whatever full or partial character strings you searched for. It is not related to ranking at all. I often see logic failures with it, with some words highlighted and others not, or even just part of a word highlighted.
[Heh, Tedster got there while I was on the phone.]
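The "display process" point above is easy to demonstrate with a toy highlighter. This is a hypothetical sketch, but it shows why a highlighted term proves nothing about ranking: the bolding is just a character-string match run over an already-ranked result as the very last step:

```python
import re

def highlight(snippet, terms):
    """Bold every case-insensitive occurrence of each search term.

    Runs over an already-ranked snippet, so a highlighted word says
    nothing about whether that word influenced the ranking at all.
    """
    out = snippet
    for term in terms:
        out = re.sub(re.escape(term),
                     lambda m: f"<b>{m.group(0)}</b>",
                     out, flags=re.IGNORECASE)
    return out

# Even a partial word inside another word gets marked -- one of the
# "logic failures" mentioned above:
print(highlight("blue widgets for sale", ["widget"]))
# → blue <b>widget</b>s for sale
```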
BTW, the sites that sit on top of this serps are
1. old and
2. are .gov sites or .edu
I would think the trust factor for them is high.
Hmmmm. This sounds like multilevel marketing:
First one's in get all the benefits.
Everyone else has to purchase AdWords.
> Ah, but the highlighting of URLs in the SERPs is simply a display process that highlights occurrences of whatever full or partial character strings you searched for. It is not related to ranking at all.
but it is one of the most pervasive Google myths.
> I recently have been forced to downgrade my own take on the file size factor.
If files are too large or too slow to download, googlebot will not always download the whole of the file. I'm sure somewhere in Googlebot it has both a file size and a time limit, otherwise the process could be tied up indefinitely downloading one file.
In addition I haven't checked recently but there was a limit on how much of a page is indexed. Last time I looked it was around 500kb. Okay not many pages are that big but I suspect that this is actually a googlebot download limit.
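If such a cap exists, the effect is easy to picture with a capped read. The 500 kB figure below is the one mentioned above, not a confirmed limit, and the code is only a sketch of the idea:

```python
import io

MAX_BYTES = 500 * 1024   # the ~500 kB figure mentioned above (assumed)

def read_capped(stream, cap=MAX_BYTES):
    """Read at most `cap` bytes from a response stream, then stop.

    A fetcher with such a cap would only ever hand the indexer the
    head of a huge page; everything past the cap is simply never seen.
    """
    return stream.read(cap)

big_page = io.BytesIO(b"x" * (600 * 1024))   # simulate a 600 kB page
body = read_capped(big_page)
print(len(body))   # 512000 -- the last ~100 kB never reach the index
```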
|> Ah, but the highlighting of URLs in the SERPs is simply a display process that highlights occurrences of whatever full or partial character strings you searched for. It is not related to ranking at all. |
but it is one of the most pervasive Google myths.
I think i learned something new today.
Now all i want to know is whether the co-ops labeling websites will have any feedback to general ( ie. non refined ) queries.
A debate I am having with other webmaster friends:
Google still penalizes the target page of GoDaddy 302 redirects (and also possibly the target pages of other "blackhat" 302 redirects).
My opinion: Myth, based on prior Googlebug 302 fiasco, since supposedly fixed.
1) I have several 302 redirects from my GoDaddy vanity domains to my primary home page, and it is still indexed, and not supplemental.
2) No 302 redirects listed to my site as shown with allinurl or inurl commands, including the GoDaddy redirects I know are in place.
Any other indications I should be looking for in this case?
[edited by: RonnieG at 7:56 pm (utc) on Nov. 1, 2006]
|In the opening post, I mentioned logical fallacies that can corrupt your SEO process, and here's a big one that might be in play on the Sitemaps issue. The fallacy is called post hoc ergo propter hoc, or translated from the Latin, "after this, therefore because of this". |
Sorry I'm not adding more to the discussion other than to say thanks but I just finished re-reading this entire thread and I wanted to specifically thank tedster for starting this discussion and also for pointing out the logical fallacies.
This thread has helped me firm up some of my current thoughts on SEO and allowed me to eliminate some of my "I think it is this way but don't really know" thoughts on the matter.
The discussion of logical fallacies has helped me re-think my position on several current real-life issues, even some as mundane as what was causing my newborn's apparent late-night feeding discomfort.
This thread is the exact example of why anyone serious about SEO/SEM should be on this board soaking up the knowledge.
Thanks all (and specifically you tedster)
|In addition I haven't checked recently but there was a limit on how much of a page is indexed. Last time I looked it was around 500kb. Okay not many pages are that big but I suspect that this is actually a googlebot download limit. |
I have a 70 MByte pdf version of one of my supplier's catalogues on the web and googlebot - after some initial hiccups - has been regularly crawling it for the past six months or so. At least Webmaster Central didn't report any errors.
However: It is not really indexed and has no pagerank. Maybe too many images in it, not enough text. Or maybe there is indeed a limit for the indexing process.
<This message was spliced on to this thread from another location.>
Black hat SEO sites promote the idea that one can boost one's ranking by placing one's link at various locations on the page. They sell links outside of the footer area for more. Upper left corner of the page, middle of the content are the best areas they claim.
What is your view?
I think it is myth.
[edited by: tedster at 7:10 pm (utc) on Nov. 6, 2006]
I've seen evidence (and read articles) that Google assesses different "blocks" on the page in different ways. Rather than Myth, I would rate this idea as Probable - True
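If block analysis does feed into how links are valued, the idea might look roughly like this sketch. The block names and weights here are entirely made up; they just express the claim that a link's page region matters:

```python
# Invented weights expressing the block-analysis idea above: a link
# in body copy counts for more than the same link in a footer.
BLOCK_WEIGHTS = {
    "main_content": 1.0,   # middle of the content: full value
    "sidebar": 0.5,
    "footer": 0.2,         # footer link blocks: heavily discounted
}

def weighted_link_value(base_value, block):
    """Scale a link's base value by the page block it sits in.
    Unknown blocks get a neutral middle weight."""
    return base_value * BLOCK_WEIGHTS.get(block, 0.5)

print(weighted_link_value(10, "main_content"))  # 10.0
print(weighted_link_value(10, "footer"))        # 2.0
```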
Regarding Google I think it's finally time to bury the 302 page hijack - it was a very real thing once, but that was a long time ago.
I have not seen any valid examples of it for several months although people keep writing me. In my experience it's always something else that is causing people's sites to go AWOL in the SERPs these days.
And no I'm not available for consulting at all, in fact I'm not very available at all.