| This 109 message thread spans 4 pages |
|Google not indexing new sites as it used to|
All my previous sites have been indexed within a couple of days by building backlinks from different older sites, social bookmarking, submitting a sitemap, etc. - you know the drill.
The last site of mine, created two weeks ago, still doesn't come up with any result in Google when doing site:example.com. When I log into Webmaster Tools I see that almost all of my sitemap's URLs (25) have been submitted, but next to indexed URLs is one big 0.
Anyone else experiencing the same thing?
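As an aside, the "submitted" figure in Webmaster Tools should simply mirror what is in the sitemap file itself, so that half of the comparison can be sanity-checked locally. A minimal sketch (the sitemap XML is passed in as a string; fetch it from your own server however you like):

```python
# Count <url> entries in a standard sitemap, to compare against the
# "submitted URLs" figure shown in Webmaster Tools.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def count_sitemap_urls(xml_text):
    """Return the number of <url> entries in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{SITEMAP_NS}url"))
```

If this count doesn't match the "submitted" number, the sitemap Google fetched may be stale or malformed; if it matches, the gap is purely on the "indexed" side.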
SEOPTI: Yes, us too - crawl rates at the moment are about 5-10% of Dec 2009 numbers.
Welcome to the forums, verycheeky.
I never thought of combining the "past 24 hours" search option with the site: operator. I've just tried it and yes, for some sites I do see "0". But for other sites - including webmasterworld.com - I see positive numbers, but they don't make a lot of sense. They're way too low and don't line up with the crawling and indexing I can see IS going on through other means.
There are several threads around here that mention how buggy the site: operator is lately. My guess on this issue is that the results are just not representing what you hope to see. That said, yes crawling and indexing still seems to be slowed down - at least for sites that are not on the hot "fresh" or "social" topics.
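For anyone who wants to repeat that check, the "past 24 hours" option in the left sidebar corresponds to the tbs=qdr:d query parameter, so the combined search can be built directly. A small sketch (parameter names are as they appear in Google's search URLs at the time of writing and could change at any point):

```python
# Build a Google search URL combining the site: operator with the
# time-based filter (tbs=qdr:<period>), e.g. "past 24 hours".
from urllib.parse import urlencode

def recent_site_search_url(domain, period="d"):
    # period: "h" = past hour, "d" = past 24 hours,
    #         "w" = past week, "m" = past month
    params = {"q": f"site:{domain}", "tbs": f"qdr:{period}"}
    return "https://www.google.com/search?" + urlencode(params)
```

Paste the resulting URL into a browser; the result count it shows is subject to the same site: operator bugginess discussed above, so treat it as a rough signal only.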
Could we be looking at a more permanent indexing setting for Google, where "fresh/hot" = index fast, at the expense of all other sites?
Is 4 months enough time to call this a permanent shift, Y/N?
Well, it hasn't been at this low rate for the full 4 months. It has been a steady decline for 4 months. I worry that you might be right... But my brain insists it doesn't make sense and this is just temporary...
p.s. if fresh/hot were getting the lion's share of crawling, and this is the new reality, then the SERPs in the top 10 would be really shuffling around as the latest hot/fresh pages got crawled and indexed. But my observations, which may differ from everyone else's, are that I am seeing increasingly static SERPs, fairly positively correlated to the slowdown in crawling.
|then the SERPs in the top 10 would be really shuffling around |
Not necessarily. Indexing a page (a visit by the bot) is not the same as returning it in the SERPS. All the twitter posts etc are kept out of the ordinary SERPS in many (most?) cases.
You can easily have stale "normal SERPS" at the same time as having very updated "fresh SERPS" (when you toggle this search on in the left side of the Google SERPS)
I think this corresponds nicely with a "multiple purpose Google", or a "multiple index Google" (here, index meaning "database"), or a "multiple algo Google" ...or whatever we should name the beast
|Indexing a page (a visit by the bot) is not the same as returning it in the SERPS. |
Sorry for being a semantic 'nut' on this, but Google's results are the index... If a page is 'indexed' it is in the results. If a page is 'spidered' it is in the underlying data set, but may or may not be in the index (results).
They do not index all the pages they spider, but the results people see are the index... I posted this in another thread, and think it 'sheds some light' on the terminology, so if you want to know why results would be called the index, think database. They 'index' what they return as results. If it's in the index it's in the results somewhere. If it's not in the index it's not shown, but may be in the underlying data from spidering.
|Google not indexing new sites as it used to |
I just built a new site, nothing fancy or anything and google picked it up right away and it is ranking very well. Since it is a work in progress I am adding a page or two weekly on the side and google seems to be way faster at including these pages and getting them to rank in the serps the next day. To me it seems like the days of the old sandbox are gone.
The site is listed in google local so that may have something to do with it as there are only a couple other links pointing to the site.
|Sorry for being a semantic 'nut' on this, but Google's results are the index |
Please excuse me for being blunt, but there is no other option: What you are saying is wrong. That "results = index" is simply not the case.
Google's index is the collection of all pages that have been indexed. The results (IOW / a.k.a. "the SERPs") are that particular subset of the index that are returned as a result of a specific query.
It is not customary on this board, or anywhere else in SEO circles, to confuse the SERPs with the whole index, because that would be misleading.
I thank you for your kind offer to shed some light on terminology, but I must reject it, as I am already very familiar with the way Google operates (and have been for some years). But thank you for the thought.
Is anyone seeing exceptions to the indexing rate, where pages are being indexed fast? It might provide a clue to any new behaviours, I was thinking.
Or could it be continued Caffeine-related issues that are still being tweaked in the background?
You're right, Claus. Thanks for setting me straight.
I guess I was confused by the use of the word index in some of these GoogleGuy posts. (Emphasis Mine)
Personally, I find this first one very confusing or misleading according to your position on the use of the word index.
|Users can always search our full index, but sometimes we can serve up even fresher pages as an extra nicety. :) |
Google listing disappeared after 2 weeks [webmasterworld.com]
|I'd expect that things will be back to their normal level of everflux by New Orleans. But we do have incremental indexing after all, so it's normal to expect a certain amount of change to the index every day or so (aka everflux). |
In fact, everflux is a pretty good analogy. If you go back to summer 2003, update Fritz was the beginning of the transition from a monthly update to an incremental index. It caused a lot of comments, because plenty of people were happy with an index that only changed once a month.
GoogleGuy's posts [webmasterworld.com]
Pages Dropping Out of Big Daddy Index [webmasterworld.com]
Why are they talking about pages dropping out of the index if the index is not what's used to generate the results? Shouldn't the thread be about pages dropping out of the results, not the index, if it has never been understood or accepted by members here that the results are served from the index? I don't get it: if the index is not what people are searching and seeing in the results, then how would someone think their pages are not in the index by searching and seeing only the results?
|I'm happy to confirm it's a new index. |
I've been poking through the new index myself. I found one link from April, but almost all the links I found were newer.
Let's see, what else? I guess this index answers many of the questions about deepbot vs. freshbot.
Google June 2003 : Update Esmeralda [webmasterworld.com]
|The difference between the "deep crawl" and the "fresh crawl" was much more apparent this time last year when we were only pushing a new deep index about once a month. |
Date in Google Search Results [webmasterworld.com]
|SJ index is not old Critter, the SJ index isn't an older index. You can verify that by doing a topical query such as SARS. The results are more fresh in SJ than they are in our regular index. |
googleguy msg #146 May 5
Update Dominic - Part 2 [webmasterworld.com]
Also, would you please clarify for me whether Google uses a heuristic or an algorithm? Most people call it an algorithm, even though I think it's a heuristic, and there is a distinct difference in what each does.
We are getting a bit off-topic here, I think. I hope the mods will excuse us. Otherwise they will probably move this to a new thread, I suppose.
GoogleGuy generally doesn't speak in exact or well defined terms on technical (search) issues. His statements regarding Google technology are often vaguely worded or "fuzzy". IMHO for this reason his statements can often be misleading, even if it is not intentional.
As I interpret the quotes you posted, the term "Index" is used as a term for the collection of indexed pages, although not the entire collection: only that part of Google's entire collection that forms the base from which the results of a search query are pulled. Meaning that "the index" (mentioned in the quotes you posted) is only that part of the "total index" that constitutes "the active/current index" (for lack of better words). The total collection of pages that Google has is larger than the "active index" as it may hold several historical versions of each page as well as duplicate pages and whatnot.
However, if you really want to get down to the nitty-gritty of it, it may be more appropriate to speak of several indexes instead of one. E.g. Google News stories may not be stored in / served from the same database as Twitter posts, which again are probably not stored the same way/place as addresses from Maps, images and movies, or those shopping links that also appear in the SERPs from time to time.
Although there are multiple sources (indexes / databases), information from more than one source is pulled to the result pages every time a query is posted at the Google homepage. So, what the user experiences is one page filled with information (a SERP), but on the back end that information is drawn from several different sources / indexes.
As for "the algo" - IMHO, "the Algo" is a misnomer. It is used as a language construct to cover something like "the way that Google ranks search results, all included". That is a very broad task. And a very diversified task, with several completely different sub-tasks.
Some of these sub-tasks are performed at the time of query, some before that. Also, a query in "Images" or "News" or "YouTube" will have different sub-tasks than a query at "Web Search". So, the visual image that the term "the algo" conveys is wrong. It is not as if it's one single computer program that executes and produces the result. It is a series of "sieves" (for lack of a better word) that over time, and collectively, produce a search result.
AFAIK, some of these "sieves" are heuristics, and some of them are algorithms. Some are even heuristics-based algorithms ;) And all of those are subject to parameters of various kinds that may be tweaked - either automatically, based on rule sets and calculations, or manually. And I do believe that there are options for "manual adjustments" as well: even if they dislike manual methods, you just can't program everything, always.
To the original poster: I'm sorry to divert from the topic this much. I hope that the post may be of some use anyway.
Yeah, sorry for the OT discussion and I'll leave it here too, because I think we're saying essentially the same thing with different words in the two paragraphs quoted below:
|As I interpret the quotes you posted, the term "Index" is used as a term for the collection of indexed pages, although not the entire collection: only that part of Google's entire collection that forms the base from which the results of a search query are pulled. Meaning that "the index" (mentioned in the quotes you posted) is only that part of the "total index" that constitutes "the active/current index" (for lack of better words). The total collection of pages that Google has is larger than the "active index" as it may hold several historical versions of each page as well as duplicate pages and whatnot. |
I think I'm simply drawing a distinction (based on the quotes I read) between the index (GoogleGuy constantly refers to, and very strongly implies the results are directly derived from) and what you are calling (basically) 'the index but not the total index'... I would actually say it's much easier to understand and much less confusing to refer to what GoogleGuy is calling the index and what you are referring to 'as a term for the collection of the indexed pages, although not the entire collection' as the index and the rest of the information they have as 'the underlying data from spidering'.
Personally, I think what you say here: "Meaning that "the index" (mentioned in the quotes you posted) is only that part of the "total index" that constitutes "the active/current index" (for lack of better words)." is much better described here: (I did miss a word in the original though, and added some other 'clarification' here.)
|They do not index all the pages they spider, but the results people see are [in] the index... I posted this in another thread, and think it 'sheds some light' on the terminology, so if you want to know why results would [could] be called the index, think database. They 'index' what they return as results. If it's in the index it's in the results somewhere. If it's not in the index it's not shown, but may be in the underlying data from spidering. |
What you are referring to as 'total index' I'm referring to as 'underlying data from spidering', because I think it's the 'better words' you were lacking as far as a descriptive and understandable picture goes.
And, personally, I have referred to it as the preceding from time to time through two user names and 5+ years here...
Anyway, I think we're saying six of one, half a dozen of the other, and I also hope it will help people with some understanding... And, IMO, when G gets to the 'one right answer' they'll have an algorithm, but as long as they return 'multiple possible answers' they technically have a heuristic, regardless of the number of actions they use to generate the result set... :)
I ran into this once; I got stuck and couldn't get out. Hate this.
More on the topic of this discussion...
I can say they are spidering new sites rapidly... I registered a domain at 1am and it was spidered today. No links to it. I don't have the toolbar installed. Private registration. ccTLD. I don't know how they knew it was even registered, but it was spidered today. I can't comment on whether it's in the index (the one results are derived from) because it has a noindex tag.
One more note on my position re 'index' terminology is the noindex directive, and the fact that pages with noindex can pass link weight, so IMO they are part of the larger underlying dataset, but not the index. Sorry for getting all OT again in the fine print. I do hope the mods leave our little discussion too, though, because I think the terminology clarification (or organizational clarification, anyway) may lend itself to a better understanding of the 'indexing process' and why pages sometimes may not appear in the results as quickly even though they have been spidered.
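For completeness, that spidered-but-not-indexed state is easy to verify on one's own pages, since it usually comes down to a robots noindex directive in the page's HTML. A rough, illustrative check (regex-based for brevity; a real HTML parser would be more robust, and noindex can also arrive via the X-Robots-Tag HTTP header, which this sketch ignores):

```python
# Detect a <meta name="robots" content="...noindex..."> tag in raw HTML.
# A page carrying this directive can be spidered (and can pass link
# weight) without ever appearing in the index/results.
import re

def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    for match in re.finditer(r"<meta\s+[^>]*>", html, re.IGNORECASE):
        tag = match.group(0)
        is_robots = re.search(r'name\s*=\s*["\']?robots', tag, re.IGNORECASE)
        has_directive = re.search(
            r'content\s*=\s*["\'][^"\']*noindex', tag, re.IGNORECASE)
        if is_robots and has_directive:
            return True
    return False
```

So a page failing to show up in results isn't necessarily a crawling problem at all; it can be this directive, or simply the lag between spidering and inclusion discussed above.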
I would be so much better at not posting on a topic again if I could type what I meant... In the preceding, this: 'the index but not the total index' should read: 'total index'.
The distinction is between what GoogleGuy calls the 'index' and what Claus is calling the 'total index'. I was referring to the 'index' as the same thing GoogleGuy does, and what Claus is referring to as the 'total index' I was referring to as 'the larger underlying data from spidering' (or something to that effect.) Sorry for feeling the need to post on this OT topic again.
Somehow I think a clarification of terminology is ON topic for this thread, as the title is "Google not indexing new sites..."
Googlebot is clearly crawling new sites. And doing so fast. However, crawling a page means that this page is also being indexed - meaning "added to the great collection of pages that Google has".
But what the original poster is referring to seems to be the appearance of new pages/sites in the result pages. Which is slow.
So, Google adds pages to the collection pretty fast. Then some time passes (and that seems to be too long time). And after that time the pages may start to appear in the search result pages (SERPS).
So, actually the real problem seems to be not slow indexing, but slow ranking of new pages/sites.
TheMadScientist: Google runs DNS services and is also a domain registrar. So, the moment a domain is purchased or activated Google knows it.
It must be from the DNS, or from the registrar I registered it with sharing info with them... (The registrar possibly sharing information was not something I'd thought of previously; it may simply be coming from their DNS service, but the registrar providing info would not totally surprise me anymore either.) I'm not sure if I can mention the registrar specifically here, but there is a connection. Anyway, G does not offer the TLD I registered through their own service, so I don't see how they would know from only checking their own systems as a registrar.
I actually think it's good we had our discussion too, because regardless of the exact terminology used, at least people may start to draw a distinction in what they are talking about, rather than using 'index' with only contextual meanings - a difference that readers would really have to understand the situation to 'get', and which may escape some of them and leave them scratching their heads. IOW: I think it was good for us to 'have it out a bit' and draw a distinction for some readers... And besides, it was fun trying to figure out who was right when we were basically saying the same thing, only different. lol. :)
< moved from another location >
I have the feeling that an important change in Google since 2010 is the time needed to crawl a site, to index the changes made to it and update the SERPS accordingly.
From my experience, until Dec. 2009 you could start SEO with the on-site optimization (titles, alt tags, headers, keyword density, sitemaps, etc.) and within 2 weeks see the effect of your work in the SERPs. That was the case especially for non-competitive keywords. For competitive ones the time needed could be up to 1 month or 45 days, but definitely not more than that. I'm not saying that the site would always hit top positions, but at least I was sure that Google had indexed the changes and placed the page in the appropriate position.
Nowadays it seems that the 2 weeks for non-competitive keywords has become 3 or 4 months, and for competitive keywords I'm not sure yet :-).
Some facts to support this are:
- Google does not display the right page of the site in the SERPs. I mean, instead of the home page (index.html) or the specific product page (example.com/product.php), it shows the contact.html page or the print version of a page. This lasts for a considerable time and then switches to the expected page (home or product).
- During that time, I can see different pages of the same site ranking in different positions; for example, services.html ranks on the 2nd page and faq.html on the 3rd.
- After 3 months I can still see some of the pages in Google's cache not updated.
- I have seen this happening to more than 1 site.
Does anyone else share the same experience?
[edited by: Robert_Charlton at 3:51 pm (utc) on May 7, 2010]
So many of goodbyedee's questions... at least those regarding the speed of indexing and indexing vs ranking... apply to this discussion that I've moved his post here.
Regarding rankings themselves, I suggest also looking at the current MAYDAY rankings update discussion [webmasterworld.com].
< Also of note is this most recent discussion - it's almost "breaking" news: New googlebot crawlrate is unusually fast [webmasterworld.com] >
[edited by: tedster at 4:25 pm (utc) on May 7, 2010]