|Traffic just took a nose dive to 0|
Results 1 - 21 of about 5,510,000 from site:mydomain.com
I have a site that is about 6 years old. It has over 5 million dynamically generated pages. It was a "build it 'cause I can" kind of site when it was first built, and I didn't really pay attention to it for the first few years. A few years ago I tweaked it a little, and the result was that Google indexed close to 100K pages. The resulting traffic was great.
Sometime within the last few weeks my Google traffic went down to 0, and when I did a site: search on my domain I noticed that only 21 pages were available in the results, even though it now reports over 5 million pages indexed. I liked it much better when I had 100K pages indexed and they were all available in the results.
Question: What is going on here? Should I expect Google to shortly be listing my 5 million pages? Or will the 5 million+ drop down to 21?
Not enough info to know.
But from the sound of it, a bit of housekeeping would do you and the web a favour.
Check your navigation; use robots.txt to avoid the same pages being listed several times, etc.
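To make the robots.txt suggestion concrete, here is a minimal sketch that checks a set of rules before uploading them. The paths (`/search`, `/print/`) and the example.com URLs are assumptions for illustration only, not anything known about the poster's site. It uses Python's standard `urllib.robotparser`, which does plain prefix matching and may not behave exactly like Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: these paths are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /print/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
for url in (
    "http://example.com/us/texas/austin",        # the page you want indexed
    "http://example.com/print/us/texas/austin",  # printer-friendly duplicate
    "http://example.com/search?q=widgets",       # internal search results
):
    print(url, "->", "crawl" if is_crawlable(parser, url) else "blocked")
```

Running a handful of real URLs through a check like this before uploading the file is a cheap way to catch a rule that blocks more than intended.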
A feature of dynamic sites that I've noticed before is that once a filter is triggered, borderline items that never hurt before often start hurting.
If you can find a fault and fix it, great. If not, fix everything that MIGHT need fixing!
BTW - check your server; if there's been several or long downtimes, it may just be that.
You posted a couple of weeks ago that Google contacted you about AdSense violations. Is this the same site you are now talking about?
Maybe a connection?
|Is this the same site you are now talking about? |
No it's not the same site.
|if there's been several or long downtimes |
|If you can find a fault and fix it, great. |
I'm of the philosophy of "if it ain't broke... don't fix it!" However, when it comes to an ever-mutating algo... you might be right. Even if it hasn't been broke for years, G may have changed some variable in the last update. I'm a little scared to mess with it though... what if G isn't done updating? It has been nice to have 100K pages indexed and the traffic that came with that.
UPDATE: yesterday my site was back on G again with 111K pages... this morning it's back down to 21 pages.
>> over 5 million dynamically generated pages.
Without knowing more about the site, I think this is the key problem. Google might not like the "5 million" and the "dynamically generated" part.
I understand the philosophy of the long tail, but
5,000,000 dynamic pages... where does all the content come from?
I was reading a lot of posts a while ago from people talking about their 100,000, 500,000, 1,000,000, etc. page sites, and it always gets me wondering:
Will Google/MSN/Yahoo always tolerate such sites, or are they
going to develop an algo to explicitly exclude them?
Don't get me wrong, I am all for people doing their own thing. I just wonder at the dynamics of it all.
Some sour grapes on my part too :-)
In my sector, a couple of giant sites have so many pages indexed, and all their pages rank so heavily, that my sites can hardly breathe.
So, I am not exactly rooting for these million-pagers :-)
Okay, I surrender. I'm going to learn how to make a 10 billion page site.
|Google might not like the "5 million" and the "dynamically generated" part. |
Dynamically generated means that it is database driven. More info on that: It is a multi-parent-child database. Countries X States X Cities X Institutions in the widget industry X other info X even more info = 5 million plus pages.
I agree, but this site has had between 87K and 100K+ pages indexed for years now. I never thought it would be fully indexed, nor even get 100K pages indexed when I first built it... but my point is that it was for years.
Also the result from "site:mydomain.com" has me a little puzzled -- "Results 1 - 21 of about 5,510,000 from mydomain.com for . (0.25 seconds) " G reports my site having 5,510,000 pages but even with supplemental results past the 21, I can only get to 31 pages. When G reported I had 100K pages, I could get to ALL 100K pages. Why only 31 now?
Assuming that Google hasn't labeled yours as a scraper site: how different are those pages from each other? If all you have is "We have many (products) for (city-county-state)" repeated a few times on the page, I think you have a dupe problem. After all, it's hard to have "unique" info for 5+ million pages.
|how different are those pages from each other? ... it's hard to have "unique" info for 5+ million pages. |
Yes it is hard, and no, I don't have unique info on all pages... YET! As time goes by users register for this free service and add their unique info. Thousands already have, and until now I was getting more new members every month. The growth rate went from awesome to nil.
If the site is dynamic, six years old, and er, a little bloated, there are several Google changes that may very well be having an impact.
In general, if it ain't broke don't fix it is fine - so long as you monitor closely.
But in your case, it is broke (though you need to check on more than one datacenter).
You really do need to reread G's guidelines; you may spot a simple item that can be fixed.
Also, get Matt Cutts on your 'essential reading' list. With a site your size (and I'm assuming it has an income to match) you won't find a better time investment in these changing times.
Good Luck! :)
So out of 5 million pages only "thousands" have content - and the rest are all the same?
"thousands" is less than 100,000.
100,000/5,000,000 = 2%
So you are saying that at most only 2% of your pages might have unique content...
This sounds like the typical "duplicate content" problems that have been going on for a while now. You'll need to do some work on the site itself to guide Google as to what to index and what not to index.
Make sure that every piece of content has one canonical URL used to access it and keep the spiders OUT of all the alternative URLs for that same content. Make sure all your title and meta description tags are unique.
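As a rough illustration of the "one canonical URL" idea, the sketch below collapses several URL variants of the same page into a single form. Everything site-specific here is an assumption: the preferred scheme and host, and the list of parameters that do not change the content, are hypothetical and would have to be replaced with the site's own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only; substitute your own values.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
IGNORED_PARAMS = {"sessionid", "sort", "ref"}  # do not change page content


def canonical_url(url: str) -> str:
    """Map equivalent URL variants to one canonical form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    # Drop content-neutral parameters and put the rest in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((PREFERRED_SCHEME, host, path, urlencode(query), ""))


variants = [
    "HTTP://Example.com/widgets?city=austin&sessionid=abc&state=tx",
    "http://www.example.com/widgets?state=tx&city=austin",
]
for variant in variants:
    print(canonical_url(variant))
```

If every variant of a page maps to the same string, the non-canonical forms can be redirected to it or kept away from spiders.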
Check several "GFE" Google datacentres directly, especially gv and eh and you'll see some very different results I expect.
Forgot to ask.
Is there a "click here to see omitted results" link after the last result?
There may, or may not be. Depending on the answer to that, the cause may be slightly different, and the end result will be very different.
If the link is there, Google probably thinks that you have duplicate content and you need to fix it up.
If the link is not there, then Google is probably de-indexing your entire site; and that may be a spam penalty or something else instead.
|Make sure that every piece of content has one canonical URL |
|Make sure all your title and meta description tags are unique. |
Did both of those years ago, which is what I believe got my first 100K pages indexed.
|Is there a "click here to see omitted results" link after the last result? |
Yes there is. Yesterday I had 21 results and 10 supplemental. Today I have 25 results and 10 supplemental. Today Google also reports that it now has 6,320,000 pages of my site indexed. That's almost a million pages more than yesterday.
At 4 pages more a day in the result, G should have all my pages in the result within 1,580,000 days... give or take. ;)
It's a "duplicate content" issue. You still have something that needs fixing.
The pages are either too similar, or you have the same content at multiple URLs (different formats, different dynamic parameters, different domains, non-www vs. www, http vs. https, etc).
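For the "pages are too similar" case, one way to get a rough feel for it is to compare the visible text of two pages using word shingles and Jaccard similarity. This is only a sketch of a generic technique, not a description of how Google measures duplication. The sample page text is hypothetical, and where to draw the "too similar" line is a judgment call.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty pages are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical template pages that differ only in the city name.
page_a = "We have many widgets for Austin Texas"
page_b = "We have many widgets for Dallas Texas"
print(round(similarity(page_a, page_b), 2))
```

On real pages the shared template text is much longer than one sentence, so pages that differ only in a place name will score far closer to 1.0 than this toy pair does.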
Since this started, it's been a coin toss whether I get a handful of traffic or not on any given day. Some Google servers show my pages and some don't.
I have been working on the duplicate content side of things. I've added a robot.txt to guide Google away from pages that are useless to index. I've also added a page-topical news feed to help distinguish each page as an individual.
Today, Google reports 1.2 million pages and now shows 200 results (209 with omitted results).
Am I moving in the right direction?
Maybe, but it will likely be 2007 before you see any real gains of any sort... will probably take at least one PR and backlink update, as well as a data refresh or two...
Oh, and I hope that your filename was actually robots.txt instead...