Forum Moderators: Robert Charlton & goodroi
I have an issue with Google not indexing some newly added pages, even after several months.
Let me give a bit of context:
The French version of the site has existed for about 6 years now. It is 100% clean, with no strange tricks to rank well.
The site is displaying AdSense ads.
It has several thousand pages of original content.
- A year ago I added a set of pages in English (translations of the French ones, with a link from each French page to its English version). They got indexed within weeks.
- Four months ago I added a new set of English pages, and they have still not been indexed.
We were advertising heavily (at least for our scale) on AdWords for those newly published pages, and stopped when we saw that we were not even entering the regular index.
- I have also been refreshing the French content with a major update. I took the opportunity of this rework to split some of the pages that were far too long for the web. I did it like this:
From:
orig.php
TO:
orig.php
orig-1.php
orig-2.php
keeping the original file name active, now containing just the beginning of the text, with the remaining parts moved to the -1 and -2 files.
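A rough sketch of how split pages like these can be cross-linked, so that crawlers can reach the deeper parts from any entry point (the filenames are just the example ones above):

```html
<!-- At the bottom of orig.php: point to the continuation pages -->
<p>
  <a href="orig-1.php">Part 2</a> |
  <a href="orig-2.php">Part 3</a>
</p>

<!-- At the bottom of orig-1.php: link back and forward -->
<p>
  <a href="orig.php">Part 1</a> |
  <a href="orig-2.php">Part 3</a>
</p>
```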
In some cases Google indexed the change nicely (for the first updates), but the recent ones are not indexed: Google still returns orig.php for a search word that now appears only in orig-1.php.
In the meantime we kept publishing news (about 10 to 15 pages per month) that got, and still get, indexed within a couple of weeks at most.
All pages not indexed by Google do show up in MSN and Yahoo - meaning that they are reachable, I guess.
I have set up a sitemap containing specifically the pages that were not indexed; still no change three weeks later.
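For reference, the sitemap follows the standard sitemaps.org XML format; a minimal entry (the URL and date are made up for the example) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/orig-1.php</loc>
    <lastmod>2006-08-18</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
```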
We were thinking about moving to Google site search for the website, but it makes no sense if we cannot even get fully indexed.
Now, I don't know how to move further. What would you recommend we do?
I just need a good, deep, and exhaustive indexing of the site.
Thanks in advance for your help.
But do check your site navigation, especially if you use any js links.
Xenu is your friend.
Do you mean the pages are not in Google - or just that they aren't doing well?
How do you know?
The only RELIABLE test for whether a page is in Google is to search for a unique, about-ten-word phrase from the page, in quotes. Try it.
Thanks for the feedback.
As for the site navigation, there are no JS links, only plain HTML. And since Yahoo and MSN found their way in, I think the site is crawlable.
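To double-check the crawlability claim, one quick sanity test is to run the site's robots rules through a parser and ask whether Googlebot may fetch the pages in question. A hypothetical sketch in Python (the robots.txt rules and paths shown are invented):

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt rules -- substitute the site's real file.
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot should be allowed to fetch a normal content page...
print(rp.can_fetch("Googlebot", "/orig-1.php"))     # True
# ...but not anything under the disallowed directory.
print(rp.can_fetch("Googlebot", "/private/x.php"))  # False
```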
I mean that the pages are not in the index, not that they are showing up too far down in the search results.
The way I test this is by searching Google for a specific word (the site is full of very specialized words) together with site:mysite.com.
The summary page does show up, with a link to the relevant page, but the relevant page is not there.
I also did the ten-word-phrase-in-quotes search. For the pages the above test shows as not in the index, no luck. For the pages in the index, it returns a unique page: mine.
As for the meta titles, they are all different. And I generate the description to be the same as the title, just with the site name removed. But it is like this for the complete site, for both indexed and non-indexed pages, and it is nothing recent: since the beginning I have had no better description tags than that.
I don't know what to do. I'm starting to wonder whether there is something wrong on Google's side. I read the AdSense blog, where the crowd is promoting the setup of AdSense for site search. How can I rely on Google to provide the search for my site if the site is not even indexed?
Thanks again for your answers!
But I'll give the tool you recommend a try. I'll do it over the weekend.
I'm 99% sure everything is okay. The fact that MSN and Yahoo found their way in more or less confirms this.
Am I missing something obvious?
Thanks.
I have an issue with Google not indexing
It's always hard to tell what people mean when they say that.
Traditionally, the primary factor affecting frequency of crawl has been PageRank. I've not seen any evidence so far that any other variable has become more prominent. If your URL is the one-and-only outbound link from a PR8 page, you'll by gum be indexed pronto, IME.
And as I said, there are no broken links on the site; the pages are linked from other, older pages of the site.
To be more specific, there are some pages summarizing the various sections of the site. One such page was indexed a long time ago, and its cached version is recent enough to show the links to some sections that have not appeared in Google for months!
I'm not yet trying to rank well; I just want my pages to be in the index. I guess that's a reasonable first step.
If Google offers the site search tool for webmasters to use on their sites, I guess it means they expect that sooner or later they'll have all of a site's pages in their index.
I never saw things like this on my site until a year ago, or so.
Am I allowed to show specific URLs here in the forums? I could show you examples of what I'm talking about.
Thanks!
I thought all was lost until I picked up the latest expert ebook entitled 'How to get listed and seen in Google in less than 48 months!'
No rhyme or reason to anything at the moment, other than being able to conclude that there is probably more wrong with the Google algorithm than with the majority of websites.
Am I allowed to show specific URLs here in the forums? I could show you examples of what I'm talking about.
Sorry, no. See the Forum Charter [webmasterworld.com].
Nevertheless, you are describing a familiar effect so we trust that you are describing the situation accurately. A "fully indexed" domain of any substantial size is not an easy thing to find, in my experience.
Click paths to the new pages mean a lot in the current crawling pattern. So if those single pages you split up are only accessible by links from the original url (orig.php), then it may be a long time before the new, deeper pages are indexed.
Are you seeing any googlebot requests for any of your missing pages?
I had a site with a similar problem for several months. Over a thousand pages were missing in Google that were indexed just fine in Yahoo. Suddenly, a few weeks ago, the whole lot finally appeared in Google's index and everything seems normal now, but it took eight months. Ugh.
If you've already checked the issues that people have given good advice about, keep working on normal healthy site development and Google will catch up ... e-v-e-n-t-u-a-l-l-y.
Hang in there!
Are you seeing any googlebot requests for any of your missing pages?
I did a bit of digging in the log files and I found this:
On August 18th, I can see that a given file (thefileImconsidering.php) was fetched by Googlebot:
crawl-66-249-65-48.googlebot.com www.mysite.com - [18/Aug/2006:12:00:18 +0200] "GET /thefileImconsidering.php HTTP/1.1" 200 2667 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
And when I search the most obvious keyword from this page with site:mysite.com, the page doesn't show up.
I would thus say that yes, I do see some googlebot requests for missing pages.
Now, I cannot say for sure whether some missing pages are never accessed.
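For anyone who wants to do the same digging, here is roughly the kind of script that pulls the Googlebot-requested paths out of a combined-format access log (the sample lines below are invented, modeled on the one quoted above):

```python
import re

# Matches the request path in a combined-log-format line.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/1\.[01]"')

def googlebot_paths(log_lines):
    """Return the set of paths that Googlebot requested."""
    paths = set()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        m = REQUEST_RE.search(line)
        if m:
            paths.add(m.group(1))
    return paths

# Invented sample lines in the same format as the one quoted above.
sample = [
    'crawl-66-249-65-48.googlebot.com www.mysite.com - [18/Aug/2006:12:00:18 +0200] '
    '"GET /thefileImconsidering.php HTTP/1.1" 200 2667 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '1.2.3.4 www.mysite.com - [18/Aug/2006:12:01:00 +0200] '
    '"GET /other.php HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Firefox)"',
]
print(googlebot_paths(sample))  # {'/thefileImconsidering.php'}
```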
Coming back to click paths: I thought about this, but since Google offers Sitemaps, I submitted a sitemap with exactly those files missing from the index, thinking that this could be one of the reasons Sitemaps exist. No luck.
I'm resisting the temptation to link those files from the home page, because I would not do it for any reason other than that. Would this help? Am I right to resist?
Thanks again!
Make sure your title and meta description are not the same. If you are too lazy to change them all, just get rid of the meta descriptions.
Missing meta descriptions cause almost the same problems as duplicate meta descriptions.
Take care to add one for every page of the site.
Can anyone confirm that having no description is bad as well? I have hundreds of pages, and adding a description is going to be a pain with no direct added value for users...
I just checked, and a part of the site that does get indexed in a human-compatible time frame (a couple of weeks at most) - I'm referring to the magazine section of the site - does not have any descriptions at all.
Am I taking big risks if I get rid of the description tags, given that they are the same as the title with the site name removed?
Thanks again for your support!
What it looks like is that Google may need a minimum character count for the snippet, and if the meta description is too short, they add the first body text they can find to the end of the meta to form a long enough snippet. In many cases this means the menu or breadcrumb trail -- or perhaps some text in the header/banner area.
At any rate, it's often the same content on every page and even with the short-but-unique meta description tag, the entire snippet seems to enter into dupe land. Rather irritating, especially for a site that has gone to the trouble to dynamically generate some sort of unique meta description (and it surely can be trouble.)
I'm resisting the temptation to link those files from the home page, because I would not do it for any reason other than that.
For many websites, there's nothing unreasonable about having a little section on the home page that refers to the newest and most recently updated pages.
In the particular case where the website contains a News section, this is a very common practice (though it makes perfect sense as a service to visitors for many types of websites).
I'd like to add another phenomenon to your list, g1smd -- one that I just noticed today. Very short meta descriptions (say 3 to 5 words) even if unique, can lead to similar troubles (Omitted Results and sometimes Supplemental Results).
I just checked what happens for me:
The snippets shown by Google are extracted from the body of the page - exactly the way I want them.
I have the following setup:
<title> is "SiteName - Here the title"
<description> is "Here is the title"
Note here that the description is part of the title, just with the site name removed. And it is rather short: between 2 and 5 words, roughly.
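Concretely, the derivation is just stripping the site-name prefix from the title; a minimal sketch (the names are the placeholder ones above):

```python
def description_from_title(title, site_name="SiteName"):
    """Derive the meta description by removing the site name
    and separator from the front of the title."""
    prefix = site_name + " - "
    if title.startswith(prefix):
        return title[len(prefix):]
    return title

print(description_from_title("SiteName - Here the title"))  # Here the title
```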
I think for the time being I'll concentrate on adding some links in the home page to the recently updated pages and to all pages added in English.
Of course, I already have the home page filled with every news item we publish. I just did not use the home page for updates to the core content of the site, but I'll start doing so now.
Thanks.
Until now I have been using Google almost exclusively for my searches (webmaster-related or not). I think I'll try the others more systematically as well, because I obviously can't rely on Google to return reliable results for my searches.
If I'm sitting around spending my time working on unique descriptions for each page that are between X and Y characters long... who is that for? The average user that just loads up a page couldn't care less about the meta description. Sure, I do - I view the source of pages all the time - but I'm probably not your average user.
Does this stance on descriptions and supplemental results signal a change in Google's core philosophy? If so, should it be reflected in their webmaster guidelines? It only seems fair that if a publisher doesn't include meta descriptions, or has duplicate ones, they should be forewarned that their pages are doomed to the supplemental index and its semi-annual crawl schedule.
That will be correct if it is the Supplemental Result that you are looking at.
One type of Supplemental Result shows the content of the page as it was before the last edit.
There are several recent threads here with a lot more information about this. Start with: [webmasterworld.com...]
Taking care of the meta description is "designing for users". Users want to know what is going to be on the page that they are about to visit, before they get there.
I think this best sums it up: [webmasterworld.com...]
Taking care of the meta description is "designing for users". Users want to know what is going to be on the page that they are about to visit, before they get there.
Not true at all when the pages are shown in the regular SERPs. The snippet is usually something figured out by the algo, showing some if not all of the words searched for. I usually only see the description when doing a site: search, which is definitely not normal user behavior.
My point, which you completely missed, is that it's fine for Google to use the descriptions to help describe the page, but they should not be used to decide whether a page is indexed, supplemental, or not indexed. Descriptions, keywords, the price of rice - none of these has anything to do with what the user sees; and Google doesn't want hidden words, redirects, or bought links precisely because they are not natural but manipulated by the webmaster.
This is a followup message for this topic:
[webmasterworld.com...]
Result of the experiment: Nada. Nothing. Rien.
- The pages added (new content) at the end of September are still not indexed - and they are all listed with direct links from the main page, and they are all in a sitemap, cleanly uploaded with no errors, downloaded by Google again an hour ago - with no result.
- The pages I updated 6 months ago are not indexed either, even after I added a list of links to them on the home page, following the recommendations from the above-mentioned thread.
- The English version pages added 8 months ago are not indexed either - and I have had a sitemap for them since early September. This sitemap is retrieved by Google very often; I wonder why Google keeps retrieving it, since it is not doing anything with it :-(
I had a look at the stats for Googlebot: 174 pages crawled daily on average. More than enough to retrieve the newly added pages and to update the changed ones within weeks.
Why crawl the site if it does nothing with the data?
I'm spending a very decent amount of USD (at our scale) advertising (on AdWords) the pages that Google does not bother to add to the index.
And I received an email from AdWords suggesting that if I increased the budget limit for my campaigns, I would get more traffic.
I want my pages to be included in the index first! Then I'll see if I keep or even increase advertising.
Again, all this is new since the end of 2005. The site has been up since 2000, and we never had issues like this before.
I have seen hundreds of messages on [groups.google.com...]
complaining that Google is not indexing the sites. Has anyone seen any type of answers from Google on this?
The whole thing is discouraging.