From 9 to 11 September, Googlebot crawled nearly 200,000 different pages, but those pages do NOT show in Google results. Since then Googlebot has visited the site only infrequently (a few hits on the index page nearly every day, and one day about a thousand hits).
The site is PR6 (probably PR7 after the next update) with a lot of inbound links (growing daily), most pointing to the index page but some to internal pages (some from high-PR sites). Since then we've noticed Google showing more pages for other sites, and even counting more backlinks.
It's been more than a month and we are still waiting for the pages to show. Has anyone seen something similar? Can we expect Google to show these pages soon?
[edited by: kaijohannkursch at 5:04 pm (utc) on Oct. 12, 2003]
A normal search for your 'double keyword' returns you at position 5.
Everything looks normal to me, it's just that your pages are not keyword specific enough to rank high in keyword searches.
Remember that inbound link text is very important, and I guess everybody is linking to you with your domain name as the link text :(
Everything looks normal to me, it's just that your pages are not keyword specific enough to rank high in keyword searches.
Have you read my first message?
I was not asking about visits or rankings. I was asking about crawled pages not showing in results (around 200,000 pages, for more than a month).
I know which of our pages currently show in results; as I said in my first message, that is not what I was asking about.
A very large site in terms of pages is bbc.co.uk: a Google search for "bbc" on that domain yields 3.1M pages. In three months or so you've published about 6% of that number of pages. Guess: perhaps there is some kind of limit on how many pages will be indexed at once.
Actually, I'm surprised that Googlebot spidered all these pages (that's very deep), given that your site is not very old, but it's probably your high PR that did it.
/claus
Google might be updating continuously now but they're in no hurry to index deep pages on a relatively new site.
No problem if Google takes more time to crawl/index our site, I'm not complaining about that, but why has Google crawled 200,000 pages if it does not show them in results?
I don't understand Googlebot making hundreds of hits per hour for no reason at all. The question is: should we be worried about that? Is it usual for thousands of pages to be crawled and then sit in "limbo" for more than a month?
Not so much as to lose sleep over it. As you have already seen a 33% increase to 8K pages, you should be all right. Googlebot is probably wondering what hit it and trying to get a hold of the situation. Most likely it's still wondering whether it should be worried, sampling a little here and there, trying to make its mind up.
The deep scan was probably retrieving all your links, and as you're running a directory, that would be millions (theoretically max 20M, likely 2-5M). It has to digest those links and that takes a while. IMHO, you have simply moved too fast.
/claus
[edited by: claus at 9:32 pm (utc) on Oct. 12, 2003]
Here comes my story. Last year we launched a travel portal with 2,500 pages of travel content (all pages PR5), an ODP directory of 350,000 pages, and an Amazon XML feed with another 400,000 pages. We have also added a live weather feed for 9,000 cities around the world (all PHP, rewritten with mod_rewrite into plain HTML URLs; Googlebot loves those pages and on average about 30% of them are in the index).
On average Google crawls about 150,000-200,000 pages per month, and we have on average about 100,000 pages in Google. We serve 9 languages, which gives us about 10,000 visitors per day, and the database of 14,500 hotels x 10 languages makes about 145,000 pages.
Last month we reached a critical stage and for the first time hit 150 GB of traffic per month. Our server crashed and we moved to a new provider (DNS was down for almost 7 days because of a heated dispute with our former hosting provider, which just couldn't handle the bandwidth).
We have already moved to a new provider based in Canada, but they somehow had problems with a DNS server, and Google was still crawling the pages but they no longer showed up in the main index (I am sure it was related to the Google cache, but who knows).
A strange pattern I noticed was that the problem started with the shopping mall built on the Amazon feed (and it started before our server crashed).
Googlebot knows that we are an affiliate, and Google seldom puts more than 10,000 XML pages in its index. They have crawled over 200,000 pages from our Amazon feed but they don't show them. It has been going on for the past 5 months and I don't believe they will add more than 10,000 pages. (BTW: we have about 2,500 inbound links from other websites, including many PR5 and PR6 sites.)
I guess you have to build up more quality inbound links from external sites, but it could also be that the site is too new. A DMOZ link is an important factor as well. (We have 9 links in DMOZ, but none to the shopping mall pages. We also have 6 links across all Yahoo domains, but again none to the shopping mall.)
A couple of months ago it was just so easy to get more pages into Google, as they added about 2,000 pages per day, but we haven't had any new pages added to Google for the past 7 weeks.
My feeling is that Google doesn't seem to like shopping mall pages that much, whether they are pulled from Amazon or from any other affiliate site.
I may be wrong, but I still have 14,500 hotel pages in Google, 3,000 weather pages and 5,000 DMOZ directory pages, while the shopping pages seem to have been cut by Google from 25,000 pages to 2,360 pages in a matter of weeks.
What is strange however is that the 2360 pages of the mall I have in google are all linked from different country guides.
Example:
Our travel guides (25 countries) have been online for the past 16 months and have many inbound links from many sites. Every Amazon product page on our server that is linked directly from those travel guides still shows up in Google, but the other pages have been completely removed from level 2 onwards.
So my guess is that, because the site is new, Google will take months before these pages are shown throughout its index, or Google just doesn't like them. Five weeks ago Google had 35 million Amazon pages in its index; today it is about 25 million.
We still get about 3,000-4,000 visitors per day on average and still make about 100 dollars in commission per day, but Amazon revenue went down by 85%, so we are now slowing down the whole shopping operation, as it eats up too much bandwidth for too little revenue.
Please send me your website's URL by sticky mail.
I was talking about the time between when the pages are crawled and when they are displayed in results. I've never seen such a long delay (more than a month), and that's what worries me.
I was talking about the time between when the pages are crawled and when they are displayed in results. I've never seen such a long delay (more than a month), and that's what worries me.
From start to finish, i.e. registering the domain name, setting up DNS records to point to the server hosting the site, setting up hosting of course, producing the HTML pages and making them neat and tidy to conform to the guidelines some SEs like you to abide by:
10 DAYS. I know this because I did it for one of our clients' sites. This occurred about a month after the last Google Dance, so your guess is as good as mine as to why the site was indexed so fast and found in the SERPs for the keywords it was optimised for; it even tops other sites long established in the SE.
Another example, but of a different nature:
After a massive lull/downturn in backlinks showing in Google after the last major re-index, I now see all the backlinks showing up again for a particular site I take care of.
Originally it had 300+ pages indexed, then it went to 200-something, then to 19, and then to nada just a couple of weeks ago.
It now stands at 272 pages for the site.
And I thought the hypothesis that index.html pages would not be counted as backlinks has now gone out the window.
Also, you should consider removing AdSense from your site and adding real content. Many of the categories in your directory have zero entries and only an AdSense banner.
So your post is a little bit confusing. Many of the Googlebot hits you get are actually from the Google AdSense media bot and not from the Googlebot deep crawler, since you serve AdSense on every single page.
Also, as for the PR6 on your main page, I wouldn't count on it to get Google to index all your pages, because Google thinks that blabla.com/?c=82-4 is your main page even if it is not.
What you should do is get some PR5 or PR6 inbound links and set up a link exchange. On my site I have a link exchange with 970 travel websites that point to my site, and we always investigate each site that gets a link back from our site.
Also, you should set up all your links in the form blabla.com/World/ rather than blabla.com/?c=63-1.
But since Google shows that you don't have a single incoming link, my guess is that your pages will be dropped down to Level 2, which means you will have 31 links left in the index.
Good luck!
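As a rough illustration of the URL format suggested above (blabla.com/World/ rather than blabla.com/?c=63-1), here is a minimal Python sketch that maps a query-string category URL to a directory-style one. The `CATEGORY_PATHS` table and the `static_style_url` helper are hypothetical; only the single `63-1` to `World` pair comes from the thread. On a live server this mapping is more commonly done with Apache mod_rewrite so that the clean URL is served by the existing script.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical table: directory category IDs -> readable path segments.
# The thread only shows one pair (?c=63-1 -> /World/); a real site would
# generate this from its category database.
CATEGORY_PATHS = {
    "63-1": "World",
}


def static_style_url(url: str) -> str:
    """Turn http://host/?c=<id> into http://host/<Category>/ when the ID is known.

    URLs without a known ?c= parameter are returned unchanged.
    """
    parts = urlsplit(url)
    category_id = parse_qs(parts.query).get("c", [None])[0]
    segment = CATEGORY_PATHS.get(category_id)
    if segment is None:
        return url
    return f"{parts.scheme}://{parts.netloc}/{segment}/"


print(static_style_url("http://blabla.com/?c=63-1"))  # http://blabla.com/World/
print(static_style_url("http://blabla.com/?c=82-4"))  # unchanged: no mapping
```

Returning unknown URLs unchanged keeps the sketch safe to run over every link on a page: anything it cannot map is simply left as it was.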
My response:
No need to worry. Google has been crawling links on my sites for the past 1 1/2 years and they still don't show up.
The crawler reads the page and follows links. It doesn't mean they will show up at all in the index one day.
We run an ODP (DMOZ) directory as well, with 450,000 pages and about 35,000 pages crawled on average per month. They never show up.
It looks like Google indexed all pages up to level 3. This is actually great. You should work on moving level 4 pages into level 3. This should give you at least another 10,000-20,000 pages indexed.
As a last closing note: don't advertise search engine optimization on your site at all, and don't offer the script at a price of 20 USD. That could improve your rating on Google a lot.
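To see which pages sit at level 4 and what "moving level 4 into level 3" means in practice, here is a minimal Python sketch that computes click depth with a breadth-first search from the home page. It assumes the home page counts as level 1 (the thread does not define the numbering), and the `SITE` link graph is made up for illustration.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the home page; the home page is level 1."""
    depths = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS = shortest click path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph for a small directory.
SITE = {
    "/": ["/World/", "/Travel/"],
    "/World/": ["/World/Europe/"],
    "/World/Europe/": ["/World/Europe/Spain/"],
}

depths = click_depths(SITE, "/")
print(sorted(p for p, d in depths.items() if d > 3))  # ['/World/Europe/Spain/']

# Linking /World/Europe/ from the home page pulls Spain up from level 4 to level 3.
shortcut = {**SITE, "/": SITE["/"] + ["/World/Europe/"]}
print(click_depths(shortcut, "/")["/World/Europe/Spain/"])  # 3
```

BFS is used because the first time it reaches a page is along a shortest path, so the recorded level is the minimum number of clicks from the home page.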
This means not one website links to your site
The directory has PR6 and 1,190 backlinks. Those point to the index page; we also have dozens of backlinks pointing to internal pages.
The links grow daily (several hundred a week), as we offer a free script with links pointing to the index page and to the script itself.
Many of the Googlebot hits you get are actually from the Google AdSense media bot and not from the Googlebot deep crawler, since you serve AdSense on every single page
Not so. We distinguish between Googlebot and the media bot. Googlebot made 200,000 hits on different pages; the media bot makes many more.
Also, as for the PR6 on your main page, I wouldn't count on it to get Google to index all your pages
Google has indexed thousands of pages, and there is no reason for it not to continue doing so. In fact it crawled the 200,000 pages mentioned (that's why I posted: we are waiting for these pages to show).
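One way to make that distinction in your own logs is to tally distinct URLs per crawler. Below is a minimal Python sketch, assuming Apache "combined" format log lines and matching on the user-agent substrings `Googlebot` and `Mediapartners-Google` (the AdSense crawler). The `SAMPLE_LOG` lines are invented for illustration, and user agents can be spoofed, so treat the result as a rough tally rather than verification.

```python
import re
from collections import defaultdict

# Request path and user agent from an Apache "combined" format log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def distinct_urls_by_bot(lines):
    """Count distinct requested paths for each Google crawler seen in the log."""
    seen = defaultdict(set)
    for line in lines:
        m = LOG_RE.search(line)
        if m is None:
            continue
        agent = m.group("agent")
        # Check the AdSense crawler first so it is never lumped in with Googlebot.
        if "Mediapartners-Google" in agent:
            bot = "mediapartners"
        elif "Googlebot" in agent:
            bot = "googlebot"
        else:
            continue
        seen[bot].add(m.group("path"))
    return {bot: len(paths) for bot, paths in seen.items()}


# Invented sample lines (IPs, times and sizes are illustrative only).
SAMPLE_LOG = [
    '66.249.64.1 - - [10/Sep/2003:10:00:00 +0000] "GET /?c=63-1 HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.64.1 - - [10/Sep/2003:10:00:09 +0000] "GET /?c=82-4 HTTP/1.0" 200 4096 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.64.1 - - [11/Sep/2003:08:12:40 +0000] "GET /?c=63-1 HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.65.2 - - [10/Sep/2003:10:00:05 +0000] "GET /?c=63-1 HTTP/1.0" 200 5120 "-" "Mediapartners-Google/2.1"',
]

print(distinct_urls_by_bot(SAMPLE_LOG))  # {'googlebot': 2, 'mediapartners': 1}
```

Counting distinct paths rather than raw hits matters here: a repeat fetch of the same page (the third sample line) does not inflate the "different pages" figure.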
What you should do is get some PR5 or PR6 inbound links
We have links from PR6-7 sites...
"Only" a month to show crawled pages?
I added 5,000 keyword optimized static html pages to my site during the period between June & August. The pages were not spidered until September and are now only beginning to show in searches.
Before Dominic/Esmeralda, pages with a PR of 4 or higher were spidered monthly and updated monthly. Pages with a PR below 4 were spidered and updated quarterly. I know this because I put an SSI date call in the footer of my pages and check the cached copy to see when the page was spidered.
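The SSI date trick can be done with Apache mod_include directives such as `<!--#config timefmt="%Y-%m-%d" -->` followed by `<!--#echo var="DATE_LOCAL" -->` in the footer, so every copy Googlebot fetches carries the date it was served. Below is a minimal Python sketch for reading that stamp back out of a saved cached copy; the `Served:` label and the `days_since_crawl` helper are hypothetical, since the post does not say what the footer looks like.

```python
import re
from datetime import date

# Hypothetical footer text. The SSI source in the page would be something like:
#   Served: <!--#config timefmt="%Y-%m-%d" --><!--#echo var="DATE_LOCAL" -->
# so every copy Googlebot fetches carries the date it was served.
STAMP_RE = re.compile(r"Served: (\d{4})-(\d{2})-(\d{2})")


def days_since_crawl(cached_html: str, today: date):
    """Return days between the stamped serve date and `today`, or None if no stamp."""
    m = STAMP_RE.search(cached_html)
    if m is None:
        return None
    fetched = date(*(int(g) for g in m.groups()))
    return (today - fetched).days


cached_copy = "<p>Directory page</p><p>Served: 2003-09-10</p>"
print(days_since_crawl(cached_copy, date(2003, 10, 12)))  # 32
```

Because the stamp is written at serve time rather than edit time, it reflects when the crawler fetched the page, which is exactly what the cached copy is being checked for.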
I'm not saying that this schedule has changed since the new "rolling" update or that it hasn't.
Google has been very slow to bring in regular and timely updates to backlinks and they are the backbone of PR. Check the PR of your new pages. If the PR is less than 4 you may have to wait until all the backlinks are calculated and updated before your new pages become part of the rolling update schedule.
Some really helpful posts here, but you're knocking them all back!
Sorry, but I did not find the posts helpful until the last few messages (they were not just unhelpful; they were not answering what I asked at all).
But the last messages helped me understand what might be happening.
Thanks to Asinah and especially Arnett.
Regards.
Googlebot 'fetches' your pages. It doesn't know what those pages are yet, but it takes them anyway and stores them in a raw master index, which in turn...
Each URL has to wait its turn to be analyzed, graded, PRed, and ranked accordingly. In short, the actual indexing process.
If these are new URLs, never before in the Google index, then they have to run the whole gamut of the so-called 100 factors, and this could take a long time to process.
Be patient; little by little the processed URLs will be included in the various data centers.
If there is no duplicate content, consider yourself lucky that Google actually crawled 200,000 pages.
Cheers
Pages don't need to be PRed to be indexed. In fact, a lot of pages are indexed without PR. And not only without PR (something Google recalculates approximately each month), but also without their backlinks being counted (in recent months this has happened more frequently than the PR update).
Let's go further. There are even indexed pages whose content Google does not know (those displaying only the URL as the title and no content at all; I'm sure you have noticed them).
As for the 100 factors, they are calculated in REAL TIME for whatever term you search. Obviously Google does not precalculate these factors for each of the millions of different search terms a single page may target (think about all the possible combinations).
If Google needed to do that kind of processing on the crawled pages, it would display them gradually; there would be no need to release them in bulk.
Onedumbear, thanks for your message. It's pleasant to know this does not happen only to us.