Thanks.
The partially indexed URL has another meaning: it is waiting to be updated with newly crawled data.
During the update, Google drops some pages and then puts the new data in.
Obviously some dancing is taking place. My site's indexed page count goes: 238->102->243->238->132->243->299->132.
Every hour it's different.
I saw no change in the number of pages in Google or any changes in the SERPs, and it was all the same IP address.
Then on 11-2-04 there were 2,533 hits by "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", IP 66.249.65.165.
Now just about all of the 834 pages in Google show just the URL, no description... nothing but the URL.
I would like to think that there is something going on, I hope... because I have not seen movement in the SERPs for 3 months. In September I had 1,200 pages indexed, and about 25% of them had full descriptions and were cached. Now I'm down to 834 with no cache or description.
I tried to find out how the bots work in one of the other threads, but I don't remember which. It was my understanding that there is more than one type: one type just checks to make sure the page is there and grabs all the links, and another indexes the page itself.
Can anyone shed any light on this for me?
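One way to see which flavour of Googlebot is hitting a site (the old Googlebot/2.1 deep-crawl bot versus the new Mozilla-compatible one mentioned above) is to tally requests per user-agent string in the server access log. A minimal sketch, assuming an Apache combined log format and a log file named access.log (adjust the path for your own server):

```python
# Tally Googlebot requests per user-agent string from an Apache combined log.
# Assumes the log file is "access.log"; change the path to match your server.
from collections import Counter
import re

# In the combined log format the user agent is the last quoted field on each line.
UA_RE = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = UA_RE.search(line)
        if match and "Googlebot" in match.group(1):
            counts[match.group(1)] += 1

for agent, hits in counts.most_common():
    print(f"{hits:6d}  {agent}")
```

Comparing the count for the plain "Googlebot/2.1" agent against the "Mozilla/5.0 (compatible; Googlebot/2.1; ...)" agent shows which of the two was responsible for a spike like the 2,533 hits mentioned above.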
Results 1 - 100 of about 6,140,000,000 for +the
last month it was
Results 1 - 100 of about 5,370,000,000 for +the
Not sure if it means anything, but the G home page still shows:
©2004 Google - Searching 4,285,199,774 web pages
To me Google's business model is simple: kill the SERPs, boost the advertising revenue. Simple and efficient!
Now, as of today, the index for my site is 6,640 pages plus several thousand 'no-description, no-title' pages.
None of the pages that Googlebot crawled on Nov 1 have appeared in the index, and what's more, it seems that a lot of previously indexed pages have either disappeared or become 'no-description, no-title' pages.
The pages that remained in Google's index have fresh tags on them from Nov 4 and Nov 5.
Did anybody have the same thing happen to them? Even assuming that there is a duplicate content penalty, what happened to the 100,000 pages that were crawled? Why did they never make it into Google's index?
what happened to the 100,000 pages that were crawled? Why did they never make it into Google's index?
As far as I can see, the pages crawled by the new Mozilla Googlebot haven't made it into the current index. I had 50,000 pages crawled by it on Nov 1st; none of them are in the index. Maybe they are saving them for a new version.
On the same day I had about 3,000 pages crawled by the old bot, most of which are now in the index. On Nov 1st, for a one-month-old site, I had 18,000 pages listed; that has now rapidly increased to 65,000, so it's not the case for everybody that the page count is dropping.
Ours is a database-driven ColdFusion directory.
Obviously directories have lots of similar pages. However, we have taken extensive steps to ensure that each page has sufficient 'unique' content (which is quite hard to do in a database-driven format with 50,000-60,000 pages).
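For what it's worth, one of the steps that tends to help on a database-driven site is composing each page's title and description from several record fields plus the record's own text, rather than one fixed boilerplate, so no two generated pages share identical head content. The directory above is ColdFusion, but here is a rough sketch of the idea in Python; the field names (name, category, city, summary) are made up for illustration:

```python
# Sketch: build a per-record page title and meta description from several
# database fields so generated pages do not share identical head text.
# The record fields used here (name, category, city, summary) are hypothetical.

def build_head(record: dict) -> tuple[str, str]:
    title = f"{record['name']} - {record['category']} in {record['city']}"
    # Use the record's own summary text instead of one boilerplate sentence.
    description = record["summary"].strip()
    if len(description) > 155:          # keep within a typical snippet length
        description = description[:152].rstrip() + "..."
    return title, description

title, description = build_head({
    "name": "Acme Widgets",
    "category": "Hardware",
    "city": "Springfield",
    "summary": "Family-run supplier of widgets, fasteners and fittings since 1972.",
})
print(title)        # Acme Widgets - Hardware in Springfield
print(description)
```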
Perhaps the two things are unrelated: the dropping of pages from Google's index and the massive Google spidering (which has not made it into the index).
I also agree: the regular Googlebot, which has spidered only a very few pages of the site since Nov 1, already has all of those pages included in the index.
What really disturbs me about Google is this: if for whatever reason Googlebot does not successfully spider old pages during its deep crawl, they just drop those pages from the index and one must wait several months to get re-crawled and re-indexed.
It seems that in their all-knowing heads, the millionaire PhDs at Google have decided that if their software does not get to the page FOR WHATEVER REASON - the page must not exist!
It's the only search engine to do this. Yahoo tries for the longest time, sometimes 6-12 months, before it will drop previously indexed pages from its index.
Very disappointing - it's a quality site, no doorway stuff or anything like that... :-)
I would be happy if the bot would suck 300,000 pages from my site... come on, bot... get them :-)
Regards
Roger
How do you get a brand new site with 50,000 pages of quality content? Even if you got a new business catalog up, that is a lot of pages. Or are these just rehashes of travel sites or directories that are a dime a dozen out there? Just the time alone to write 50,000 pages of quality, unique content is staggering. That is about 200 pages written per business day for a year. Even a very active forum would take a while to get to that point, and you said this site is less than two weeks old.
That works out to about 100 pages of content a day for 2 years (assuming 250 work days in a year).
Glad itloc is working full time on this one or I would feel bad. I can only pump out about 2 pages of content a day (part time, of course...).
Back to writing!
Is anyone seeing the results of this crawling appearing in the SERPs? It's pushing me toward my bandwidth limit as it is, so I was wondering if it might be worth putting a delay into robots.txt to temper the beast...
However, referral visits to my website have been increasing steadily for the last 2 months by 8-10% each week, 95% of that is Google traffic, and keyword rank is steadily rising in Google. I have never paid for clicks, etc. Some of the client shopping sites I manage are doubling or tripling referral visitors (probably Christmas shoppers for those), but I don't know what's causing the rise in visitors to my own site. I'm enjoying it, though (spending 2-3 hours per day writing estimates).
Lori
Is anyone seeing the results of this crawling appearing in the SERPs?
Nope (although pages have been added via other crawling, just not from the massive crawl that took place a few days ago).
It has been said that Google does not do major updates nowadays; however, when/if all this data does get added to the index, it should cause a big shift!
It's pushing me toward my bandwidth limit as it is, so I was wondering if it might be worth putting a delay into robots.txt to temper the beast...
Googlebot does not honor the crawl-delay parameter -- it's a useful, but non-standard, extension to the robots.txt standard.
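For crawlers that do honor it (Yahoo's Slurp and msnbot are generally reported to), the syntax looks like the sketch below; the 10-second value is just an example. For Googlebot itself, the realistic option is to contact Google and ask them to throttle the crawl, as the next post describes.

```
# Example robots.txt excerpt - Crawl-delay is a non-standard extension.
# Googlebot ignores it; Slurp and msnbot are reported to honor it.
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10
```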
I emailed Google and told them to back off or be banned. I told them the highest acceptable crawl rate for my site.
They replied in a day that they'd adjusted the rate for the site. They are still crawling, but at an acceptable pace.
Odd that they had to tweak their crawler by hand. I thought they preferred to do everything by algorithm.
Well then, I'm sure they made the changes because you said you would ban their bot from your site... that would scare me as well...
Just kidding BTW
GoogleGuy did stop by one of the posts and indicate that they did need to throttle it down a bit... for obvious reasons.
This started the day after the new Googlebot hammered my site for 9 hours.
Frustrating...