| This 213 message thread spans 8 pages: 213 (  2 3 4 5 6 7 8 ) > > || |
|Pages Dropping Out of Big Daddy Index|
< continued from [webmasterworld.com...] >
Seems to me that Matt's recent message confirms my theory. We're either all a bunch of moaning idiots with low-quality sites with a few inappropriate, spammy links scattered here and there...or...
|The more I think about it the more convinced I am that the missing pages problem is being caused by a Backlink/PR issue (see Msg #15). |
Tying together all of the evidence from my own experience, and that of others gleaned from the forums, erroneous or out-of-date backlinks would explain all of the missing pages.
The erroneous, or simply out-of-date, backlink information (which we cannot see) leads to insufficient PR (which we cannot see) and hence deep pages are not indexed.
We all know that a "link:www.mysite.com" search does not show you the complete picture. But since Big Daddy, it shows just a tiny proportion of backlinks, far fewer than it used to show before Big Daddy. Why? Either because the backlink index hasn't been updated (and now dates back to mid-2005), or because it has been updated but the update process is buggy. Only a small handful of Google employees know which of these two possibilities is the case.
We know that the missing pages problem cannot be due to any kind of duplicate content filter, as some people are suggesting. If this were the case, then affected sites would see a proportion of their pages disappear. Some would lose 10%, some would lose 40%, and some would lose 95%. But that's not what we see. We see sites losing the vast majority of their pages or else losing no pages at all. The reason affected sites lose such high percentages of their pages is the hierarchical nature of a site. The number of pages increases with depth, and the artificially low PRs (based on inaccurate and/or out-of-date backlink data) prevent the deeper content from being indexed.
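To make the depth argument concrete, here is a toy model, not Google's actual algorithm: assume a site shaped like a tree in which every page passes a damped share of its PR evenly to a fixed number of child pages, and assume a page is indexed only if its PR clears some threshold. The function name, the 0.85 damping factor and the threshold value are all illustrative assumptions.

```python
def indexed_fraction(home_pr: float, branching: int, depth: int,
                     threshold: float, damping: float = 0.85) -> float:
    """Fraction of a tree-shaped site's pages whose PR clears `threshold`.

    Toy model (assumption, not Google's algorithm): the home page sits at
    level 0, every page links to `branching` children, and each child
    receives damping * parent_pr / branching.
    """
    total_pages = 0
    indexed_pages = 0
    level_pr = home_pr
    for level in range(depth + 1):
        pages_at_level = branching ** level
        total_pages += pages_at_level
        if level_pr >= threshold:
            indexed_pages += pages_at_level
        # Each level down gets a smaller slice of PR than the level above it.
        level_pr = damping * level_pr / branching
    return indexed_pages / total_pages


if __name__ == "__main__":
    for home_pr in (10.0, 1.0):
        share = indexed_fraction(home_pr, branching=10, depth=3, threshold=0.001)
        print(f"home PR {home_pr}: {share:.1%} of pages indexed")
```

With 10 links per page and 3 levels below the home page, dividing the home page's PR by 10 takes this model from 100% of pages indexed to about 10%, because the bottom level holds 1,000 of the 1,111 pages. Whole levels drop out at once, which fits the all-or-nothing pattern described above better than a gradual percentage loss.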
The fact that Big Daddy was kick-started from an index dating back to the middle of last year not only explains why the backlink data might be stale, but also explains why ancient pages keep popping up on various data centres.
As further evidence: try a "link:www.mysite.com" search and compare it to a search for "www.mysite.com". In my case, the "link:" search shows just 6 results, only one of which is external to my site. The one external backlink probably pre-dates when Big Daddy's index was seeded. The "www.mysite.com" search, on the other hand, finds hundreds of results representing hundreds of internal and external backlinks. Why aren't these showing up in the "link:" search? Is it because "link:" searches are well known for not showing you the complete picture? Or has that well-known fact simply been obscuring the true cause of all of the problems, namely that the backlinks are simply missing from Google's backlink index?
[edited by: tedster at 8:25 pm (utc) on May 17, 2006]
I agree with this faulty backlink/PR data theory. It seems Google is now unable to perform a valid PR update. Not counting the highly flawed attempt 2 months ago, when was the last real update? January?
Google is in desperate trouble.
I sort of agree with you.
I know Google is not giving us webmasters a full picture with the link command. I ran the link command on Yahoo and MSN and noticed that some scraper sites had copied my content and added links to a few of my websites. I have a feeling Google is treating these links as questionable. I am in the process of emailing the webmasters of these scraper sites and getting the links removed, because I did not ask for them to be put there and the sites violated copyright by taking our content.
Since Google crawls better than MSN and Yahoo, will there be a way in the future for us webmasters to see these links? Honestly, right now if a competitor wants to silently tank a website's rankings in Google, all they need to do is drop a bunch of bad links on it. Without Google giving us webmasters the ability to see the links, we may never even know it has happened.
OK guys, I have to give my insights, mainly because of the people who have asked for my help, and I think this is the best topic in which to post my observations.
Basically, what the Matt Cutts blog entry talks about confirms many of the problems webmasters are now facing.
Big Daddy is a new "crawl and indexing" infrastructure - and this is the key. It is clear that this new infrastructure will decide whether to index content, and how deeply, based on links - inbound and external.
What I believe we are seeing is the result of drastically changing the thresholds for indexing and ranking - that blog post says as much.
What has been is irrelevant compared to the new crawling and indexing technology - which is fine, if it works.
Unfortunately it is clear it is not working - why else would Google ask for examples and then state (in Matt's blog) that the threshold was changed to include indexing of the affected sites?
I mean, come on: you have deployed a live infrastructure that by its definition will define the state of your indexing (depth and scope), then asked for sites where it wasn't indexing, and then said you have altered the "threshold".
Well, quite simply, you ain't altered the threshold properly. As a result you are missing millions of pages, some of which had links on them, so inbound links to other sites have been depleted. Couple that with your new crawling and the end site gets all its pages removed because it no longer meets the threshold... blah blah blah. The end result is the chain reaction of the new infrastructure: massive changes in indexing, ranking and site coverage.
Fair enough, but you are getting to the guys that matter - us. And you have taken a fair while to get a considered answer to us, Matt, whilst posting days of crap about stuff that matters least to the people who use Google (your customers), stuff we don't care about, mate:
It's a case of "Nothing to see here" bull - glad to hear that you have hired a spin doctor, you lot need it.
Anyway, my answer is: forget worrying about spam, because spam is looking good in the Big Daddy index.
I know, because I work with more than 8,000 sites for customers and myself. And spam looks good for traffic - link spam is better.
What a load of crap.
OK, I forgot to mention the best way to deal with the new Google and get your site sorted:
1. Get inbound links from quality sites (love to tell you what that quality means, but just stay away from clear spam inbound links)
2. Reduce outbound links full stop
3. Make sure you have no canonical issues (301-redirect the non-www version of your website to the www version, and make sure all internal links to your home page point to the domain, not the page, e.g. not to index.html)
4. Make sure the Description and Title tags are unique for each of your pages
6. Hide affiliate links - use encryption scripts
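For item 3, here is a minimal sketch of the canonical-URL rule as a plain function, assuming the preferred host is the hypothetical www.example.com and that your server (or a script in front of it) sends a 301 with the returned URL as the Location header. It only covers the two cases mentioned: non-www to www, and /index.html or /index.htm to the bare directory.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # hypothetical: your preferred hostname
INDEX_NAMES = ("index.html", "index.htm")


def canonical_redirect(url: str, canonical_host: str = CANONICAL_HOST) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if `url` is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    original_path = parts.path or "/"
    path = original_path
    # Collapse /dir/index.html (or .htm) to /dir/ so only one URL gets linked and indexed.
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    if host == canonical_host and path == original_path:
        return None  # already canonical: serve the page normally
    # Fragments never reach the server, so they are dropped.
    return urlunsplit((parts.scheme or "http", canonical_host, path, parts.query, ""))
```

For example, `canonical_redirect("http://example.com/shop/index.htm?x=1")` gives `http://www.example.com/shop/?x=1`. On Apache the same rules are usually written with mod_rewrite; the function only spells out the logic.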
Key is to get more inbound links, and not just to your home page - to internal pages as well. This will lift you over the indexing threshold. But don't get them too fast - one or two a day is good for a small site.
For a big site do a reinclusion request or get links at a faster pace from higher page rank sites.
All this could help - has worked for me.
Also forgot, cross fingers - and hope your site doesn't go supplemental for no reason at all. When that happens, contact Google as that is the bug in my opinion.
Swanson, that's always been the best approach, long before BD. In fact, if you had those things under control before BD, you had no problems when it rolled in.
|2. Reduce outbound links full stop |
Yes, if the site was a supposed "directory", and was nothing but links. Otherwise, outbounds to pertinent sites are as valuable as ever, as long as it's reasonable (i.e. no more than 10-15 total on a page, and best if it's just 2-3).
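If you want to check your own pages against that 10-15 rule of thumb (it is a forum figure, not a documented Google limit), a rough counter built on Python's standard html.parser is enough. It treats a link as outbound when its host differs from the page's host, ignoring a leading "www."; the function and class names are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


def _site(host: Optional[str]) -> str:
    """Normalise a hostname so www.example.com and example.com count as one site."""
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


class OutboundLinkCounter(HTMLParser):
    """Collects absolute http(s) links that point to a different site than the page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.own_site = _site(urlsplit(page_url).hostname)
        self.outbound = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.page_url, href)  # resolve relative links first
        parts = urlsplit(absolute)
        # mailto:, javascript: and other non-web schemes are skipped.
        if parts.scheme in ("http", "https") and _site(parts.hostname) != self.own_site:
            self.outbound.append(absolute)


def count_outbound_links(html: str, page_url: str) -> int:
    counter = OutboundLinkCounter(page_url)
    counter.feed(html)
    counter.close()
    return len(counter.outbound)
```

Note that a sitewide menu pulled in by an include file counts toward every page's total, since the counter sees the rendered HTML.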
|6. Hide affiliate links - use encryption scripts |
Oh, and I also forgot to mention my own issue with this Google Big Daddy update:
Googleguy started this thread a long time ago and hasn't posted since - Matt has taken the same time to post a technical (and dismissive) view of Big Daddy problems.
Get used to the new Google, this is it. And ask yourself the question why Yahoo and MSN don't have these basic problems.
Answer: I don't know at all - they seem to be able to index websites. Seems to me the only problem they have is getting a better market share.
Anyway, long live Alta Vista - I mean Google.
Swanson, can you explain what you mean by this:
"also use a whitespace removal tool for your HTML page"
Stefan, whether or not it was good practice before Big Daddy, what I am saying is that if you have a problem now, it is essential - not just good housekeeping.
"outbounds to pertinent sites are as valuable as ever"
Valuable in what sense - to the user (yes) - to Google (prove it).
I have a sample size of 8,000 different types of site - in my tests, reducing outbound links increased indexing and index coverage. Please outline your experience in terms of Google indexing, ranking and traffic.
"Hide affiliate links - use encryption scripts" - easy, just check any free script resource. And on these redirect scripts, place "noindex" tags for Google and friends. Works like a charm.
Stefan, it makes me think you are the sort of guy who plays by the rules, and it hasn't gone wrong for you yet, so you don't need to think outside the box?
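The affiliate redirect approach can be sketched as a tiny lookup: the page links to a local URL such as /go/widget, and that URL answers with a redirect to the affiliate link plus a noindex signal. Everything here is an assumption for illustration: the /go/ prefix, the link table, and the choice to send noindex via the X-Robots-Tag HTTP header (a robots meta tag on an interstitial page, or a robots.txt Disallow for /go/, are the other common options). Whether crawlers act on the header for a redirect response is itself an assumption.

```python
from typing import Dict, Tuple

# Hypothetical table of short IDs -> affiliate URLs.
AFFILIATE_LINKS: Dict[str, str] = {
    "widget": "https://merchant.example/product?aff=12345",
}
REDIRECT_PREFIX = "/go/"  # hypothetical path for the redirect script


def affiliate_redirect(path: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request to the hypothetical redirect script."""
    if not path.startswith(REDIRECT_PREFIX):
        return 404, {}
    target = AFFILIATE_LINKS.get(path[len(REDIRECT_PREFIX):])
    headers = {"X-Robots-Tag": "noindex, nofollow"}  # ask crawlers not to index the hop
    if target is None:
        return 404, headers
    # 302 rather than 301, because the merchant URL behind an ID may change later.
    headers["Location"] = target
    return 302, headers
```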
Steph_R, yes no problem - what it does is remove excess spaces and carriage returns from your HTML output.
There are a few free tools out there where you can paste in your HTML - I just can't find one with Google anymore!
I will have a look if you are interested, but if anyone knows of a free tool that would be great as it really does help.
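The whitespace removal Swanson describes can be approximated in a few lines without hunting for a tool. This is a naive sketch, not any particular free tool: it collapses every run of spaces, tabs and line breaks to a single space, and leaves pre, textarea and script blocks untouched because whitespace can matter there. It uses regular expressions rather than a real HTML parser, so treat it as a starting point under that assumption.

```python
import re

# Blocks where whitespace can be significant are copied through unchanged.
_PROTECTED = re.compile(r"<(pre|textarea|script)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_WHITESPACE = re.compile(r"\s+")


def strip_html_whitespace(html: str) -> str:
    """Collapse runs of whitespace to one space outside protected blocks."""
    out = []
    pos = 0
    for match in _PROTECTED.finditer(html):
        out.append(_WHITESPACE.sub(" ", html[pos:match.start()]))
        out.append(match.group(0))  # keep <pre>/<textarea>/<script> verbatim
        pos = match.end()
    out.append(_WHITESPACE.sub(" ", html[pos:]))
    return "".join(out).strip()
```

Collapsing to a single space, rather than deleting whitespace between tags entirely, keeps the spacing between inline elements the same, so the page should render as before; the saving is smaller than an aggressive minifier but safer.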
Yes, I am interested. Let me know if you have one in mind.
|Googleguy started this thread a long time ago and hasn't posted since |
No, he didn't. This thread was split off of an existing one, using a post of GG's as a new start point. That one post was long coming in the previous thread anyway, and coincidentally appeared soon after a post of mine (edited out when the new thread started), that suggested he was too busy keeping track of his stock options to care.
That said, imho, Google is in no way broken, and many of the sites that went missing in BD did so for various very good reasons. Generally, in the fields I follow, authority sites had no problems whatsoever. At the same time, there's still as much dross in the SERPs as ever for many searches, so it's no better post-BD. The main problem is that 99% of the internet is garbage, and G is intent on listing as many pages as possible, so 99% of the results in Google (and Y and MSN) are garbage. Garbage in, garbage out.
Are you saying that using 301 (Moved Permanently) redirects on index.htm pages is bad now?
Fair enough, but bothering about who started what is the reason this whole topic is not helpful to people who are asking.
From my experience, Google has huge problems indexing content, and the knock-on effect is that other pages, and then whole sites, have huge problems too - that has been documented.
Basically Stefan, your sort of comments are the reasons I am leaving webmasterworld - you have no volume experience, so you can't make a value judgement, and it makes me wonder why I bother spending my own time trying to give others some help.
I am tired of getting crap back. I don't care, because the advice I give out works - I just thought it would help.
At the end of the day I make $200k per month through this and my time is valuable, but having to defend my posts - what a nightmare!
Thanks but no thanks - off to do something that matters.
Swanson, don't judge everyone in Webmaster World by one person's response. I, for one, enjoy reading posts from experienced webmasters. Anyway -- would appreciate if you would not give up on us.
|Basically Stefan, your sort of comments are the reasons I am leaving webmasterworld |
Don't let me chase you out - I usually avoid these threads, anyway. I've seen it all come and go so many times. But yes, I don't do volume, and it does seem that those who do are particularly prone to these disappearances whenever a new algo/whatever moves through. My thoughts on it, from the beginning, have been that if you want to sell the same stuff as a million other people, you're in the wrong business.
May I compliment you on your approach to offering a solution to problems. You have made my day by helping me finalize a decision on an issue we have been looking at.
I hope that you have a nice day.
>> Basically Stefan, your sort of comments are the reasons I am leaving webmasterworld
I hate it when people announce this; sounds as if they want attention with a "please don't go."
It's a free world, and no one forces you to post, stay or leave. We all have free will.
Honestly I think Swanson was trying to help people. Kind of refreshing to me.
I have been reading this thread day after day and have not come up with a solution.
What's the next step?
Will that get us fully indexed?
We have unique metas and content on every page. No spamming whatsoever.
One thing we are missing is quality inbound links.
Are quality links going to be the "thing"? Will quality links take our site from not being well listed to good SERPs?
Something's cooking. Record day. Film at 11.
I can tell you one thing, it ain't Google.
I honestly think right now no one can give you hard and fast guidelines to follow, as it's still early days - and a lot of people are still saying G has problems and bugs within BD, so any changes you make could harm your site in the long term.
After my drop in traffic yesterday, today I see no sign of any recovery. In fact, what few rankings I had with G seem to have vanished, and the odd few I can find are for very poor-quality terms; I'll be lucky if they send 10 hits a day. But the crazy thing is that my page count has increased again, now to 226.
So my rankings have vanished (although when I search for my URL, which is a keyword, I still rank number 1 across all DCs), yet my page count increases - madness.
|Yes, if the site was a supposed "directory", and was nothing but links. Otherwise, outbounds to pertinent sites are as valuable as ever, as long as it's reasonable (i.e. no more than 10-15 total on a page, and best if it's just 2-3). |
This is interesting. On my site, I have a left menu that contains the site navigation, top 5 outbound links (traffic trades more than PR trades), then navigation to my products that I review, and below them I have a list of more outbound links, roughly 20 right now.
I wonder if that seems too spammy to G, which would explain why my pages are being dropped, seeing as the same exact menu is on every page via an include file. Perhaps I should drop the bottom links and just add them to one page (index or links page), and keep the top 5 links carried over from page to page (for traffic purposes).
Would that help, or am I just grasping here?
I think this whole discussion can be summed up with one line.
"Google are rebuilding their entire index from scratch."
This includes everything: PR, incoming link calculations, the full monty.
This process only started recently (around mid-March) and may take up to 6 months (if not more) to complete fully. Until then, polish your AdWords skills...you are going to need them. Forget the rest... it is all crap and G PR games.
AdWords department....here they come.
From reading between the lines of MC & GG.
vanillaice: "Would that help, or am I just grasping here?"
I think you are grasping. It makes sense to have a navigational menu on each page. On smaller sites the navigational menu probably won't change, but on larger sites it more than likely will change through the drill-down process. (Not talking about breadcrumb navigation.)
Just make sure there is enough "actual" content in the main portion of your pages to decrease similarity. That goes for meta titles, descriptions, and keywords.
It is just as I said it was: Big Daddy attacked crap backlinks, and therefore if you have fewer backlinks you don't get deep indexed until your site earns it with quality links or site age.
Everyone who has posted on this thread has either bought some links, traded for some links, or sold unrelated links on their site. If not, then the quality sites that do link to you lost some of their PR power because they lost backlinks, and therefore you lost reputation points from them. It is hitting so hard now because of the chain reaction of the death of crap backlinks, either affecting you or a site (or sites) somewhat connected to you.
[edited by: Relevancy at 7:18 am (utc) on May 17, 2006]
We don't purchase links or do all the recip link crap. But there are plenty of junk links to our site. That just occurs naturally when you get ranked and then scraped.
"If not, then the quality sites that do link to you lost some of their PR power because they lost backlinks, and therefore you lost reputation points from them."
I think this may sum it all up right here. Could be the big answer.
I'm still trying to get my head around the comments made regarding affiliate links. Does anyone have any views on this?
That I would like clarification on also.
My view on affiliate links is that if your site has nothing more than affiliate external links and duplicate product content, then you are no more valuable than any other site with the same, and therefore it goes back to backlinks and indexing. The only way for a site like that to rank is to out-PR the other crap, and even then you are in an uphill battle, since your site does not provide anything more.
Simply put, G knows what portion of your site is affiliate crap and what is quality original content, i.e. the ratios of quality vs. crap links and content.
It just takes more to be seen. Some good stuff will get caught in this system, but it will clean up a lot more.
Our job is a hell of a lot harder now.