This 200 message thread spans 7 pages.
|Some big observations on dropped pages|
I have been trying to figure out why my site dropped from 57,000 pages down to only 700. Today I noticed a huge pattern, and barring something major, I believe it is the reason for the dropped pages. First, I noticed that all pages three levels deep and higher are indexed. Any pages indexed lower than that are externally linked in some way.
How I noticed this is that we have a huge directory of content arranged alphabetically, with each letter being a separate page (a.html, for example). From my front page I have a.html linked, and then all the content links on that page. The content that starts with the letter 'a' is all indexed. The pages like b.html and c.html are also indexed, but the individual content pages aren't.
So, what this means is that Google is giving an overall site PR which tells it how many levels down it will index. In my limited research it seems that a site with a front page of PR 5 will get indexed three levels down, and a site of PR 6 will get indexed four levels down. Those below PR 5 I have looked at are barely getting spidered.
When doing this, keep in mind that your front page counts as a level. So if you are only PR 5 and have a huge directory, it seems you shouldn't split it up into sections; just have one huge page with links to it all. This of course totally hoses usability, but you will get spidered.
Also, externally linked pages will get spidered, as a few of the pages listed under the other letters are indexed, as they are linked in blogs and other sites. This is across the board what is happening on my site and the others I have looked at.
Count your levels getting spidered and you will notice how deep they are going. For me, three levels and that is it except for the externally linked individual pages I have seen.
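One way to count levels as described above is a breadth-first walk over a site's internal links, treating the front page as level 1. The following is only a rough sketch: the `links` dictionary and the page names are made up for illustration, and the three-level cutoff is the poster's observation, not a documented Google rule.

```python
from collections import deque

def click_levels(links, home):
    """Return {page: level}, counting the front page as level 1."""
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # first visit is the shortest click path
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels

# Hypothetical site modelled on the a.html / b.html layout described above:
# the front page links to a.html, which links to its own articles and to b.html.
links = {
    "index.html": ["a.html"],
    "a.html": ["apple.html", "b.html"],
    "b.html": ["banana.html"],
}
levels = click_levels(links, "index.html")

# Pages deeper than the observed three-level limit.
too_deep = sorted(page for page, level in levels.items() if level > 3)
```

In this made-up layout the articles under a.html sit at level 3, while those reachable only through b.html land at level 4, which matches the pattern reported in the post.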
[edited by: tedster at 6:16 pm (utc) on May 22, 2006]
[edit reason] formatting [/edit]
I'm going to preface this post by saying that some of the complaints I'm reading since the Big Daddy update really don't have any merit.
I've spot checked at least 30 sites since the BD update. A few of those came from members here at WebmasterWorld who were kind enough to include either their URI or their email address, which led me to sites that have very little ground to stand on.
What have I seen? Sites with very low PR (2-3) who have primary content at levels 2, 3 and even 4+ in some instances. After reviewing the architecture of those sites, I can clearly see why those pages are no longer a priority for crawling.
Sites with every known piece of metadata to mankind. While I personally don't think this would matter, if I were a search quality engineer (SQE), I'd have to take that into consideration when looking at all the different factors involved in determining the quality of a page.
Up until BD and even previous updates, Google would index just about anything. Just because Googlebot is indexing doesn't mean those pages are going to perform. In fact, Googlebot will index for months before pages start to appear for their targeted keyword phrases. That time factor between when a new page goes up and when it starts to pull its weight is relative to the overall PageRank of the site (and other determining factors). Remember, Toolbar PR is for public consumption. There is more to that little green bar than meets the eye. ;)
What I'm seeing are sites that have unruly URI structures that Googlebot was indexing. Why and how it was able to index some of the URI structures I'm seeing remains to be questioned. I personally believe it was all part of developing the largest index of documents first and then going back and continually reindexing and purging based on the crawl criteria.
I'm seeing URI structures with multiple hyphens, spaces, tildes, you name it. There are some creative naming conventions taking place. A site with a URI structure like this can't expect to have a quality indexing, can it?
Especially when it has 1,000+ pages? Where's the hierarchy? Flat sites with a large number of pages may not perform as well as a structured hierarchical site.
Nor would one expect a PR3 site with a URI structure like this to have a quality indexing (note that the primary product pages are at the 4th level)...
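The URI traits mentioned above (spaces, tildes, many hyphens, deep paths) can be flagged with a small script. This is a minimal sketch, not anything Google is known to apply: the function name, the thresholds and the example.com URIs are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, unquote

def uri_warnings(uri, max_hyphens=3, max_depth=3):
    """List the traits of a URI path that the post calls unruly.

    max_hyphens and max_depth are arbitrary illustrative thresholds.
    """
    path = unquote(urlsplit(uri).path)  # decode %20 and similar escapes
    segments = [s for s in path.split("/") if s]
    warnings = []
    if " " in path:
        warnings.append("space")
    if "~" in path:
        warnings.append("tilde")
    if path.count("-") > max_hyphens:
        warnings.append("many hyphens")
    if len(segments) > max_depth:
        warnings.append("deep path")
    return warnings

# Hypothetical URIs, not taken from any site discussed in the thread.
messy = uri_warnings("http://www.example.com/~shop/big-red-widget-sale-now/item%201.html")
clean = uri_warnings("http://www.example.com/products/widgets.html")
deep = uri_warnings("http://www.example.com/shop/cat/sub/item.html")
```

Note that this measures directory depth in the URI, which is not necessarily the same as the number of clicks from the front page.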
How about this for an HTML Validation result. Can anyone explain what may happen with a page that has these errors?
Line 14, character 6:
^Error: element HEAD not allowed here; check which elements this element may be contained within
Line 18, character 7:
^Error: required attribute TYPE not specified
Line 71, character 34:
^Error: element BODY not allowed here; check which elements this element may be contained within
Line 261, character 7:
^Error: HEAD not finished but containing element ended
Line 261, character 7:
^Error: missing a required sub-element of HTML
After reading about this update and the fallout that has occurred, I'd have to say that some of us were lucky those pages were indexed and pulling results and we can be thankful that we enjoyed those positions for whatever time period.
Things have definitely changed. If what I'm seeing is an indication of things to come, we're going to be seeing a daily plethora of topics related to a continuing pattern of deindexing.
Google CEO admits, "We have a huge machine crisis"
Did you catch the above topic posted on 2006/05/03? Google is having a problem with storage. I could only assume that if storage is an issue, then they need to purge some of the data so that they can implement a solution. So, what data should they purge?
Big Daddy's crawl priority changed and this was one of the major effects of the update. It appears that part of that crawl priority entails the purging of documents (either temporarily or permanently) to allow their current systems some breathing room.
I think there is a massive machine crisis. We just dropped from 90k pages two days ago to 15k today and dropping hourly. They seem to be just chucking pages. Luckily however, our competitor that hasn't been hit at all just dropped from 600k pages to 50k today and dropping even more rapidly than us. The things you say pageone might have merit, but things are fluctuating so rapidly I have no idea what is going to happen or what to do. At least my competitor is getting slammed too.
Have you gotten good daily crawls from googlebot in the last few days?
I think there is a storage issue as well. Even though Matt Cutts said on his blog that he does not see an issue, I think he is wrong. Too many pages being dropped....
Yes Robink, and they have crawled the pages that have been dropped too. Just removed from the index.
IMO it is time to stop using this pile of trash and instead use something that works better. MSN and Yahoo are a good choice instead of this rubbish search engine.
Convince your friends and family to do the same. Also .. to all click fraudsters out there ... use anything but Google ads to check out your competitors.
If they can't get it working it should not be used.
We lost about 5 pages today. They were not included in site search, but if I searched for them by keywords they were in the index with a May 28th cache date.
Over the last week or so we are slowly seeing our pages come back without dropping again.
I convinced my family to use search.msn.com
All the pages that have been dropped, what was the page rank for them? 1,2,3,4
Has anyone had a 4 or above drop?
"IMO it is time to stop using this pile of trash and instead use something that works better. MSN and Yahoo are a good choice instead of this rubbish search engine. "
Couldn't agree more. I have been telling people that since the troubles began (around Sept last year).
I live in Spain, however, and it's difficult to convince people to change.
Do you know what they call Google here in Spain?
Madre mia..what are we going to do to convince people?
Hmmm...yes, convince all your friends to use msn. I rank top in my sector there too, so that would be nice.
I've always taken great care to design neat and tidy websites, all linked carefully, and easy to navigate for visitors. No directory is more than 3 levels down.
I've seen pages drop out apart from deep pages in similar positions that are linked to from external sites. Plus - some of my pages do have slightly 'unoriginal' content - (articles etc) so I think it's more of a content match issue.
I'm being penalised more for duplicated content - not directory set-up, I think (if I'm honest). Although I think I present the duplicate content in a more stylish way - easier to read and certainly all VERY on topic with my resource site (of which I'm 1 in Y! + MSN for v. competitive keywords). The majority of my site is unique content but perhaps not unique ideas!
The answer - check your content. I'm sure Google will be back (just get more links from sites, forums, blogs and directories to your content and give google the chance to see you are a useful and ethical publisher).
Again I think 'content is king'. I think everyone might be getting in a tizz over 'a glitch' or slight change in the way Google indexes.
Certainly I've lost 50% of Google traffic - and some AdSense cash - but we've all known about unique (if there is such a thing) content on the web for a while and the importance Google puts on it.
With a PR + backlink update due any day I think Google isn't taking any chances - they've dumped everything they think is unnecessary to apply the latest PR update with the new datacentres (and of course make space).
Still - pretty disappointed about the hundred or so pages that have been dropped, although I am seeing slight recovery.
And they might use some of the AdSense cash they've saved over the last month to pay for their current legal problems with AdWords!
>unique content on the web for a while and the importance Google puts on it.
Sorry, doesn't wash with me. My sites are 90% - 95% unique. All added content is unique. SERPs say that all of this is 100% useless as an investment.
msn recognises new and unique as does yahoo.
Google as a research or up to date info tool is a terrible failure.
|Sorry, doesn't wash with me. My sites are 90% - 95% unique. All added content is unique. |
Same here. I review products, and each review is 400-500 words long, completely written by me, yet I can't get listed. Not only that, the pages I did have listed are dropping.
Yet, spammers own the top 10 for the keyword i'm reviewing.
Btw sorry for the OT post, I did the email notification option earlier in this thread. I would like to remove it now, but I can't seem to edit my old posts like WebmasterWorld suggests I do to 'uncheck' the box. Is there any other way to remove e-mail notifications? Thanks
I am ranking #1 in Google for two keyterms on a 50 page site in the finance sector, with pages I developed last Thursday that were indexed the next day.
Your unique content - is it unique? Can it be found in any other place in Google?
This is just my observation on one site.
For me, the products I review are not unique. I mean for one product if you do a search for it, 1 million results come up.
That being said, the site I use to review with is unique. It's not like I grab an image of the product, throw the price at them, grab a clip from someone else's review and say 'here, buy this'.
Before the BD launch, I *never* had a problem getting listed, and even ranking fairly well (top 50).
I built the site *for* the surfer, but with the SE's in mind. For example, instead of just using a large font, I'll take advantage of it and make sure I use an <h1> tag. I know SE's like that.
If I were searching for this product, my site is exactly the type of site i'd want to find, yet 0 new pages have been added since BD, and it's been crawled multiple times.
That being said, I made 2 changes around that time...
1) I added a few link trades, and put them on my side navigation bar (similar to what you see on blogs). They appeared on every page. I've since removed all but 10, and put the rest on a links page. The 10 I have listed, 5 are for traffic trades, and 5 are all from very relevant sites, and have PR of 3+
Before BD, I don't think I had any outbound links, and probably just as few inbound and still not a problem getting listed.
2) I made the site more of a template format using ASP. Instead of taking a complete page, doing 'save as' and changing things around for the new product, I included a template control. To the surfer each page looks completely different, from a different image header on every page to different text. Scanning the HTML, the pages will look similar, though not far off from how they looked when I just did a 'save as'.
I'm not sure if either of those had an impact, or BD just doesn't like my sites any longer.
Sandpetra, just because your sites aren't affected doesn't mean everyone else is doing something wrong. That kind of thinking doesn't help anyone here and is narrow minded about what effects this update can have on sites. If you have a high PR site and an old site (how old is your URL that ranks high so quickly?) then getting indexed is easy. A good example I have is a domain for a highly searched term that was registered in 1997 but has been dormant for over two years. The site only has two pages yet ranks #5 for this term. Incoming links on MSN and Yahoo only show 12. This page would get indexed very easily just because of age.
We are talking about helping new domains and those still building pr. PR is hard to build when no one can find your site to do "natural linking" back to you. I personally am with a startup that like all startups is not rolling in the dough. We can't afford to just sit around for a year waiting to get indexed. Finding faster solutions to at least alleviate the pain is needed just as much as the same old stuff we get put back to us. I could sum up the advice of most posters I have read here in one paragraph. The underlying attitude is "Hey I am cool and run a big pr content rich site and not affected so you must be a spammy unorganized scraper. Stop complaining and build up your site for a few years." That doesn't help anyone.
I just want someone to explain to me how the drastic changes in pages getting indexed can change so rapidly now and for good sites. Like I said earlier, my competitor went from 650,000 pages indexed to under 50k in one day (they bought exclusive rights to distribute formerly printed materials electronically, they are not duplicates)! We went from 57k to 700 then a week later up to 90k and now range from 10k up to 40k depending on the datacenter. Remember that none of my results return as supplementals. We have never had that problem, they just drop. I have never seen movement like this. The algo is dropping pages and reindexing seemingly at random for those sites not deemed "trusted" which in my experience is determined almost 50% by site age. Which is why I never see old site owners complain. In fact, find a pre 2000-2002 domain that has these problems and I will be amazed.
I have a page rank site 5 after 6-7 months. All internal pages are 3+4+5 - two are even page rank 6, if you can work that out.
Plenty of offsite linking etc.
Sorry - I'm not trying to be smart - I was only offering my opinion on this thread (which is fast turning out to be like a conspiracy theory) - and i believe the thread started about a PR 5 site - not a newbie site.
I make a clear distinction between generating immediate traffic and SEO. If anything can be learned from this thread it's not to rely on Google, especially if you're starting out - like I am as of two months ago.
To help new starts - forget about google for 6 months. optimise pages for msn + yahoo first for the pitiful amount of traffic that'll bring.
Build with CSS, keyword in title, H1 tags, bold and italic once. Natural talk on site. Cross link your important pages and spread that PR. Submit it to directories and visit blogs and forums and take part in discussions about your sector (keep an eye on those high PR pages in forums). Become (or build separately) a resource site (even using free articles) on your product and link it to yours. Use AdSense ads to help see if your pages are properly optimised and targeting your term.
Work late every night, pray to your god, don't neglect your dog and hug your 'bird' every now and then.
Please accept my apologies if I upset you - if not shove it. It's been a long day. (Good luck with the new biz, by the way).
Where's that ciggy....?
PS - I've never built scraper sites - the cheek!
tsm26 Hi, I have 6 sites, 2 of them from the year 2000, and they have dropped like anything. In one site of 300 pages I have 43 indexed and the other one is just gone: no spam, no duplicate content, no directory, but just gone.
I started new domains with different content to the other sites, and from 130 pages I have dropped to 14. I don't see an end to sites and pages dropping. MSN has all my sites listed, and Yahoo is as bad as Google.
Maybe the future of the net is in MSN (Microsoft) hands?
|Finding faster solutions to at least alleviate the pain is needed just as much as the same old stuff we get put back to us. |
The only really "quick solution" is to do PPC/CPC/PFI. Anything else probably comes with a level of risk.
|I just want someone to explain to me how the drastic changes in pages getting indexed can change so rapidly now and for good sites. Like I said earlier, my competitor went from 650,000 pages indexed to under 50k in one day (they bought exclusive rights to distribute formerly printed materials electronically, they are not duplicates)! |
Take a break. Get away from the index for a week. You're too involved with watching the daily grind and it's going to get to you quickly. There's nothing you can do. Complaining doesn't help either. I'd be looking for ways to make up for that lost traffic.
|We went from 57k to 700 then a week later up to 90k and now range from 10k up to 40k depending on the datacenter. |
Datacenter? Can you believe I've never watched any of the datacenters? Really. What purpose does it serve? You have no control over it and it's a waste of your valuable time. Oh wait, I remember checking back when we had www and www2. ;)
|Remember that none of my results return as supplementals. We have never had that problem, they just drop. I have never seen movement like this. |
I can truly understand the panic mode you may be in. But again, I have to reiterate, you have absolutely no control over it. Checking datacenters and watching the SERPs on a daily basis is like watching paint dry and, it never dries. ;)
|The algo is dropping pages and reindexing seemingly at random for those sites not deemed "trusted" which in my experience is determined almost 50% by site age. Which is why I never see old site owners complain. In fact, find a pre 2000-2002 domain that has these problems and I will be amazed. |
You won't find many. I'd go as far as saying if you had a domain pre 2005 June you're probably in great shape right now. And then there are plenty of sites that succeed in shorter time periods.
The algo drops and reindexes pages each and every second of the day. You can't make any real determinations while micromanaging things day in and day out. There is a much bigger picture here and at some point, you've got to step back and take a look. :)
My sites were supplemental until the sitemap problems were fixed. After that traffic returned, pages were indexed and then just yesterday the sites are back in supplemental hell, pages are being dropped again and traffic does a nosedive.
It's pretty ridiculous and laughable at this point.
I'm not glad, i'd rather be getting free SE traffic, but i'm more relieved 95% of my income is coming from paid SE traffic.
I would be crushed right now if I relied on SEO because I have absolutely no clue how to get pages listed right now. I'm completely stumped.
Speaking of paid SE traffic, I do need to start spreading, I don't like that large % from one source either. It hurt me for awhile when adwords made all funky changes.
pageoneresults, we use ppc and other ways of getting traffic to our main site area and it works well. I am not losing sleep over this, just looking at this as a hobby as it interests me. I have friends working on alternatives to Google and things so I have an interest in looking at the serps and seeing what is happening. Looking at indexing trends has helped me in the past when my site was only getting 100-200 pages indexed. I got us solidly indexed up to 700 by making daily changes and fixing problems, so to get these back I am doing the same thing. I also report directly to guys on revenue, and when we go from over 100 dollars a day in ppc revenue to 15 dollars they want to know why ....anyway there you go.
Also, our problem is not getting totally deindexed, just getting the lower levels. We have been blogged about and had columns with links in the wall street journal and other national respected sites, but having a domain registered in April 2005 doesn't help us. I am probably just gonna not expect traffic on these deep articles until we get more links or until Google changes something again.
It can be done! This is what I did to correct the slide. I hope it helps someone. In effect we stopped looking on the forums and took a close look at our sites - then we made subtle changes.
One of our main sites got hit: we dropped from 20k to about 500 pages. We made some structural linking changes, but only minor ones, and I re-wrote the whole index page to make sure it was fresh and unique. Then I added a few more content pages linked off the index page, put a blog in the back end and started the blog and ping thing. I've been using the blog to headline the new pages.
The result? Things seem to have settled, the new pages are all in and we are up about 60 pages, including pages at least 4 levels deep. We are only PR 3 on index but have our PR spread right down to level 4 pages (PR2). I have not done a link trade for at least 12 months. Doing a site:www.mysite.com "my keyword" sometimes reveals 20k in pages, and we are getting the traffic again from our internal pages, which is great. I can't say what we did was 100% the answer but it's worked for us so far. If we tumble over the edge again I will be back ..
Please be so kind as to clarify this "put a blog in the back end". Does it mean that you put it at one of the lower levels? Or what?
Thank you for the time.
I just set it up in its own directory. There are no top level links running into it yet. Then I just added some content and hit Ping-O-Matic. I can't say yet if it has had an impact, it's just one of the things we did.
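For anyone wondering what a blog ping looks like on the wire, it is typically a small XML-RPC call. The sketch below only builds the request body with Python's standard library and sends nothing. It assumes the ping service accepts the common `weblogUpdates.ping` method, and the blog name and URL are placeholders.

```python
import xmlrpc.client

# Placeholder blog details, not the poster's site.
blog_name = "My Blog"
blog_url = "http://www.example.com/blog/"

# Build the XML-RPC request body for a weblogUpdates.ping call.
# Nothing is sent over the network here.
payload = xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")
```

Most blog software does this automatically when a post is published, so building it by hand is mainly useful for seeing what is being sent.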
Well I had it fixed..or so I thought..
Made a 1500 url site map that was crawled and all pages went back to index..Now all is back to normal, missing many pages again..well it's been nice to be back in index for a few days.
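A sitemap of that size can be generated with a few lines of script. This is a minimal sketch assuming the sitemaps.org 0.9 XML format, which may differ from what the poster submitted at the time; the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol (an assumption about the format used).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# 1500 placeholder URLs standing in for the site's real pages.
urls = [f"http://www.example.com/page{n}.html" for n in range(1, 1501)]
sitemap_xml = build_sitemap(urls)
```

As the post suggests, getting a sitemap crawled does not guarantee that the pages stay in the index.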
So pages will be dropped due to lack of PR, if they are too many levels down etc? My PR 5 site with thousands of natural backlinks is losing pages every day in Google - and they are only 2 clicks away.
(I stopped getting upset at Google long ago..)
I have found at another forum a very interesting entry
"Google - the worst search engine. (1 of 22)
In recent search engine comparison tests, Google consistently puts relevant search results low in page rankings, hiding them on page 3 or 4 of the results, while all the other search engines correctly put those same results on page 1. There are a lot of unhappy webmasters out there.
Webmasters are being told that they need to tweak their web sites to conform to Google's requirements (and we can only guess what those might be). Google wants to put the burden and the blame on the webmasters. But, since none of the other search engines has a problem finding our web sites, ... it is clearly not our problem. Only Google has problems finding web sites. So, it is Google's problem. It is simply not finding many perfectly legitimate web sites, ... or it is ranking them so low that they might as well be invisible. There's no sign that Google is trying to change that situation for the better.
I believe it is time to dump Google. Google is only big because we keep voting for it with our keyboards. It has grown too big for its own good.
Absolute power corrupts absolutely. It's time to stop that. I have started advising all my friends and associates to use other search engines. Any other search engine is better! Here's why.
Google has become the worst search engine (only matched by AOL in incompetence). They've completely messed up their search engine. At best, in an overzealous attempt to stop people from abusing the system, they have made their page ranking so strict that they are hurting innocent people. At worst, they have sold out and become EVIL (despite their motto to "not be evil"), giving unfair advantages in page ranking to those who pay them.
If you don't believe Google is the worst, try a few comparative searches of your own. Look for some web site you already know, but look for it not by name but using keywords, such as "Boston hockey club", or "Atlanta art gallery", or something like that. Preferably, look for a small non-profit organization, a social club, or a small business, ... someone who cannot afford to hire computer consultants to constantly keep up with Google's ever-changing whims. You will find that all other search engines (Yahoo, MSN, AltaVista, Lycos, A9, AlltheWeb, SearchSight, etc.) consistently give better results than Google.
So, why stay with Google? Why support a company that provides such crappy results? It's time to vote with your keyboard. Move on to a better search engine.
Posted by: edata@... Date: 05/31/06
|There are a lot of unhappy webmasters out there. |
Couldn't that simply mean that Webmasters and SEOs have less influence on Google's search results than they've had in the past? Remember, too, that search results are a zero-sum game, and the losers are likely to be more vocal than the winners--regardless of the actual quality of the search results (which may be better, worse, or simply different from what they were a few months ago).
pageoneresults - you are on the money as always :) I truly enjoy reading your postings.