Is anyone experiencing a significant drop in pages indexed?
Or is this discussion purely about SERPs and not the quantity of indexed pages?
[edited by: tedster at 8:06 pm (utc) on Nov. 26, 2008]
Did supplementals go away completely or just behind the curtain?
If A, how does this explain the SERPs?
If B, how does this explain the SERPs?
How will a non-two-partition structure affect the SERPs?
How can we use this information to our advantage?
*** This is conjecture, a "think piece" based on recent changes ***
The supplemental index was a database partition that functioned as a kind of junk drawer. URLs ended up in there for a host of reasons - low PR, old copy of a changed page, near-duplicates, insufficient relevance signals, new site (especially interior pages), and so on.
My feeling is that at least some of these criteria have been separated out into dedicated partitions, one for each of the "weak url" criteria. One implication is that a new domain can now begin to rank a bit sooner than in the recent past, when it went into the supplemental junk drawer, which seemed to get cleaned out only sporadically. If your urls have only one marginal signal, then they might be removed from the weak lists a lot faster than if they are in many of those partitions.
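To make the contrast concrete, here's a minimal Python sketch of the idea - entirely hypothetical, with the partition names and signals invented for illustration:

```python
# Hypothetical model: one partition per "weak url" signal, instead of a
# single supplemental junk drawer. A URL stays out of the main results
# while it sits in ANY weak list, and escapes once every list clears it.

weak_partitions = {
    "low_pr": set(),
    "near_duplicate": set(),
    "stale_copy": set(),
    "new_domain": set(),
}

def flag(url, signal):
    """Place a URL into the partition for one weak signal."""
    weak_partitions[signal].add(url)

def clear(url, signal):
    """Remove a URL from one partition once that signal improves."""
    weak_partitions[signal].discard(url)

def is_weak(url):
    return any(url in p for p in weak_partitions.values())

flag("example.com/new-page", "low_pr")
flag("example.com/new-page", "new_domain")
clear("example.com/new-page", "new_domain")  # one signal clears quickly...
print(is_weak("example.com/new-page"))       # True: still held by "low_pr"
```

Under this model, a URL with a single marginal signal escapes as soon as that one list is cleaned, while a URL sitting in several partitions has to clear each of them, which fits the behavior described above.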
Plus it seems there are some positive factor partitions. One big example would be whitenight's "ghost data-set". That apparently includes relatively high-trust domains, at least strong enough to have sitelinks. We now also have the new "sitelinks for interior directories" phenomenon, which may well be related. This all points to some kind of reverse-supplemental indexes, or indexes that work something like the old Inktomi "best-of-the-web" index used to, but at a more complex level.
Many different ghost data-sets would mean that Google polls many internal IPs to roll up any new index for a production release. Some of these data-sets would contain only urls with the most positive signals, and others might be sets of marginal urls that will be culled from any preliminary list, such as those in the old unified supplemental index. The existence of such multiple indexes or partitions would complicate reporting and estimating the number of results - including the site: operator numbers and the WMT reports, all of which have gone quite wonky.
When it comes to positive signals that might make up any given partition, I've recently been wondering whether domains that show a strong number and variety of navigational searches (made by many users using many IP addresses for web connections) might not get a boost. If G is indeed using this as one signal, it could be a lot harder to spoof than it might seem on the surface.
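If that speculation were right, the spoof-resistance would come from needing many distinct users, not many queries. A toy sketch (my own illustration, nothing confirmed):

```python
from collections import defaultdict

# Hypothetical signal: score a domain by the *diversity* of IPs issuing
# navigational queries for it, not the raw query count. Hammering the
# search box from a handful of IPs barely moves the number.

nav_ips = defaultdict(set)  # domain -> distinct client IPs seen

def record_navigational_query(domain, client_ip):
    nav_ips[domain].add(client_ip)

def nav_signal(domain):
    return len(nav_ips[domain])

record_navigational_query("example.com", "203.0.113.5")
record_navigational_query("example.com", "203.0.113.5")  # repeat IP: no gain
record_navigational_query("example.com", "198.51.100.7")
print(nav_signal("example.com"))  # 2
```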
The above is a mish-mash of my private musings and ideas - based to a degree on research and observation but not pinned down - please, let's not start any new mythology. My idea is to stimulate comparative observations.
[edited by: tedster at 9:10 pm (utc) on Nov. 21, 2008]
Might there not be a 'blogs & social media' partition? Kind of an anti-authority partition. I notice that some IBLs give a very strong boost, but for a very short period of time. This contrasts with most links, and particularly authority links, which give you a boost that degrades slowly over time (or at least appears to; I would imagine the effects are diluted, but that's a side point here).
Although, thinking about it now (just before I press the 'submit' button), you could probably do that fine by categorisation rather than partition - if you can geo-target, you can definitely categorise. Although I know very little about information retrieval, I would guess that pulling data from partitions would take more resources than simply setting a tag.
I notice that some IBLs give a very strong boost, but for a very short period of time. This contrasts with most links, and particularly authority links, which give you a boost that degrades slowly over time.
Yes, or even increases over time in some cases. I've noticed over recent weeks that Googlers on Google Groups keep mentioning "your backlinks may not be working the way that they used to," or "we've made some changes in the way backlinks are weighted." It seems to be a current focus.
Although, thinking about it now (just before I press the 'submit' button), you could probably do that fine by categorisation rather than partition - if you can geo-target, you can definitely categorise.
Yes, many methods are possible. I assume that the size of the data-set may come into play, and the ability to be nimble with dial tweaks. We can only see the hazy shadow play created by Google's infrastructure, but the explosive growth of blogs and other social media would certainly be a factor they need to cope with.
If the final results begin by blending various major database partitions, that could give Google a quicker response time for queries - compared to checking a host of tags and fields for each search. Certain critical intelligence could be embedded within the raw data, rather than needing to be calculated on-the-fly. Or so I'm thinking, as I watch the shadow play continue.
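As a toy illustration of that trade-off (all names invented), the tag approach pays a per-result cost at query time, while the pre-blended approach pays it once at index-build time:

```python
# Design A: tag each document, filter the tags on every query.
docs = [
    {"url": "a.example/1", "tags": set()},
    {"url": "b.example/2", "tags": {"near_duplicate"}},
]

def query_with_tags():
    # Per-result tag checks happen on the query's critical path.
    return [d["url"] for d in docs if "near_duplicate" not in d["tags"]]

# Design B: cull marginal partitions while building the index, so each
# query is a single lookup - the "intelligence" is baked into the data.
all_urls = {"a.example/1", "b.example/2"}
marginal = {"b.example/2"}
blended_index = all_urls - marginal  # computed once, offline

def query_with_blend():
    return sorted(blended_index)

print(query_with_tags())   # ['a.example/1']
print(query_with_blend())  # ['a.example/1']
```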
[edited by: tedster at 8:00 pm (utc) on Nov. 26, 2008]
When it comes to positive signals that might make up any given partition, I've recently been wondering whether domains that show a strong number and variety of navigational searches (made by many users using many IP addresses for web connections) might not get a boost. If G is indeed using this as one signal, it could be a lot harder to spoof than it might seem on the surface.
There's still SO MUCH information I need to digest. It's like they opened up the whole hood of the algo in this update. But I'm liking the partition theory, especially when we place the 2nd, 3rd, 4th generation seed sites (ghost datum 2-4) that Cain and I saw into the mix.
Gotta let my subconscious work on all the webby connections I'm seeing and how each node relates to the overall picture.
That particular section was pretty much a disconnect from the rest of the post and not so much about partitions. Instead it was a nearly random thought that navigational searches (putting example.com in the search box) might be a positive signal that a site is acquiring user interest and trust.
I'm seeing supplementals increase and decrease a lot faster than before. Seems like a huge speed up of the system which decides which URLs go supplemental and which not.
I also see the same bouncing/dancing on one site I monitor.
It could be deliberate or incidental (derivative from some other process). It could be based on a redetermination of value, i.e., what is good enough to not be supplemental. My numbers were increasing when the pages got longer.
I also noticed a change in the number of results for the second page of results versus the first. (They were close but it seemed weird there was any disparity.)
I'm also wondering if it's a byproduct of more frequent freshness checks by Google. G's long-term goal is to become more and more accurate (indexing content as soon as it's created). :)
p/g
One of the things I noticed from Google's "bad data import" on October 31 was that the "ghost data set" looked like the ONLY place that Google stored those domain roots. When Google's internal polling of that IP address failed, then the domain roots were COMPLETELY MISSING from the final data set that they use for the SERPs.
Another way to say this is that Google uses a "disjoint set" data structure for their database partitions - that is, the elements of their various lists do not overlap. Only later in the process of creating SERPs is there a union of these disjoint sets.
With the old Supplemental Index structure, Google seemed to use two (or more) "find" operations in sequence, one for the main index and then a second that ran over the Supplemental Index. The second find operation only kicked in when the results of the first were inadequate in some way.
Creating a union of the various data-set partitions would allow a unified find operation, and therefore faster results. This lines up with Google's blog comment when they dropped the Supplemental tags back in August 2007:
Supplemental Results are fresher and more comprehensive than ever. We're also working towards showing more Supplemental Results by ensuring that every query is able to search the supplemental index.
Webmaster Central Blog [googlewebmastercentral.blogspot.com]
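A toy sketch of the two lookup strategies, as I understand them (my illustration, not Google's actual structures):

```python
# Old model: two find operations in sequence; the supplemental index is
# only consulted when the main index comes up short.
main_index = {"blue widgets": ["site-a.example", "site-b.example"]}
supplemental_index = {"blue widgets": ["site-c.example"]}

def old_style_search(query, min_results=3):
    results = list(main_index.get(query, []))
    if len(results) < min_results:  # second find kicks in on shortfall
        results += supplemental_index.get(query, [])
    return results

# New model: disjoint partitions are unioned ahead of time, so a single
# find operation covers every query.
unified_index = {
    q: main_index.get(q, []) + supplemental_index.get(q, [])
    for q in set(main_index) | set(supplemental_index)
}

def new_style_search(query):
    return unified_index.get(query, [])

print(old_style_search("blue widgets"))  # falls through to supplemental
print(new_style_search("blue widgets"))  # same results, one lookup
```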
Some conjecture - Google's changing back end may also be part of their move toward machine learning or AI. The traditional model for any search engine is to measure all manner of data points and then come up with a recipe that combines them for final ranking.
For machine learning, Google could also develop, in parallel to the production algo, a predictive algorithm. It would begin by knowing only the data points and not the current recipe for combining them. It would then attempt to generate or predict how the urls should be ranked. Major discrepancies between the generative algo and the production algo would then be flagged for further investigation.
The new backend structures we see hinted at today may well be a step toward giving those alternate, predictive, or generative algorithms better access to the data points. Or to say it another way, Google may also work to reverse engineer their own SERPs - just to see what the stats that they develop can show them.
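To pin down what I mean, here's a toy version of that feedback loop - the signals, weights, and thresholds are all invented for illustration:

```python
# A predictive model sees only the raw data points, guesses a ranking,
# and URLs where its guess diverges from production get flagged.

production_ranks = {"a.example": 1, "b.example": 2, "c.example": 3}
features = {  # invented signals
    "a.example": {"links": 900, "trust": 0.9},
    "b.example": {"links": 100, "trust": 0.3},
    "c.example": {"links": 800, "trust": 0.8},
}

def predicted_ranks():
    # Stand-in for a learned model: score URLs from data points alone.
    scores = {u: f["links"] * f["trust"] for u, f in features.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {url: pos + 1 for pos, url in enumerate(ordered)}

def flag_discrepancies(threshold=0):
    pred = predicted_ranks()
    return [u for u in production_ranks
            if abs(pred[u] - production_ranks[u]) > threshold]

print(flag_discrepancies())  # ['b.example', 'c.example'] - worth a look
```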
Again, this is think-tank stuff, not a definitive analysis. Some of it seems to contradict the Google patent on database partitions, so I'm not at all convinced I've got it straight.
That could be what's causing the yo-yo. The page in question could equally well be placed in 2 (or potentially more) partitions, so it 'toggles' between sets. Belonging to one partition folds you into the SERPs in one way, and another somewhere else.
Some people have got out of the yo-yo by adding high-PR links, others with high-relevancy links. I would guess that whatever sub-prime partition you straddle would have a different escape vector.
That could be what's causing the yo-yo.
Google may also work to reverse engineer their own SERPs
Can feel my own "Eureka" moment coming for using these 2-4th seed datasets to a distinct advantage.
Still percolating...
...patent on database partitions...
One of my clients dropped off the map on October 31, and hasn't come back. I don't think this is a result of any nefarious or black hat activities, as I have worked with this client for several years and they are very serious about staying on the up-and-up with the search engines. They have been in operation since 1997 and get over 100,000 unique visitors a day.
I have filed three inclusion requests now; in the last one I went into some detail, because while our main property is nowhere to be found, some of our minor properties (that point to sub-directories ourdomain.com/subject) are doing fine. I am concerned that someone examining my request might think we are back.
If this were a penalty, wouldn't everything on this client's domain be out of the SERPs?
Anything to do with Google search changes is definitely ON TOPIC.
However, to your query.
Almost everyone who dropped in the Halloween update is back. G missed out a partition (hence the convo above) as per Matt Cutts (via Tedster, but we'll trust him).
Anyone not in it has underlying issues that need resolving. These would be non-specific issues.
I'm trying to understand your post.
When you say 'properties' are you referring to domains, subdomains or directories?
When you say 'point' do you mean they link, redirect or just reference the main domain?
Example:
Client 1
Widgets - #1
Widget - #130
- But an internal content page we have is ranking #30 for the singular term.
Client 2
Personalized Widgets - #1
Personalized Widget - #150
Thoughts?
Almost everyone who dropped in the Halloween update is back. G missed out a partition (hence the convo above) as per Matt Cutts (via Tedster, but we'll trust him).
I wish I could say the same...
Anyone not in it has underlying issues that need resolving. These would be non-specific issues.
I really appreciate any help you can offer. There certainly could be issues, but at this point I feel we are grasping at straws. We had a similar problem last year around Oct 18(?) that caused us to examine and re-architect a few things. We have tried to nail down any duplicate content problems and looked hard at redirects and any potential IP issues to no avail.
I'm trying to understand your post.
When you say 'properties' are you referring to domains, sub domains or directories?
When you say 'point' do you mean they link, redirect or just reference the main domain?
We had four separate domains that were shopping/price-comparison-type tools for specific categories of products. The traffic to these was minimal, with a large portion of the traffic coming from our main domain. So we consolidated, rolling these price tools into sub-directories on the primary domain. This was all done 4 or so years ago. The old domain names had a fair amount of incoming links, so these still 301 to the appropriate sub-directory.
The SERP traffic to these subdirectories has been unaffected by the update, and searches for those domain names rank no. 1 for those terms. However, we are nowhere to be found when searching for either the name of the primary domain or the title of a recent article within that domain (even when coupled with the domain name).
We have an older similar site that is on a separate IP block and it has also been unaffected. It has a very similar layout, which would seem to rule out architectural issues. I fear we have been flagged as a duplicate site (because a lot of sites scrape our content), or that a competitor has gotten us banned (Over several years I have received a number of Google Alerts, where our site was inexplicably linked from a wide range of dubious properties).
Our website, with sizable traffic and years of online presence, has been hit hard in Google.
Traffic from G is down about 70 percent or more.
We don't know what is causing it. I look at this forum and see many people are in the same situation.
SERPS have probably changed.
But what is it that changed? Where should the webmaster look? What needs to be changed on our end?
Any explanation would be greatly appreciated. Please let's keep this conversation on the topic.
I fear we have been flagged as a duplicate site (because a lot of sites scrape our content), or that a competitor has gotten us banned (Over several years I have received a number of Google Alerts, where our site was inexplicably linked from a wide range of dubious properties).
Also, I'd triple check the 301s to make sure they ARE 301s and they resolve in 1 step. If they only exist to 301 traffic to the main site, I wouldn't expect them to show in SERPs at all.
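A quick way to check both points - a sketch using Python's requests library, with a placeholder URL:

```python
import requests

def check_redirect(url):
    """Print a URL's redirect chain and warn about non-301s or extra hops."""
    resp = requests.get(url, allow_redirects=True, timeout=10)
    for hop in resp.history + [resp]:
        print(hop.status_code, hop.url)
    if not resp.history:
        print("no redirect at all")
    else:
        if resp.history[0].status_code != 301:
            print("WARNING: first hop is not a 301")
        if len(resp.history) > 1:
            print(f"WARNING: {len(resp.history)} hops - should resolve in 1 step")

check_redirect("http://old-domain.example/")  # placeholder
```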
@vetofunk.
Assuming you have a balanced blend of singular and plural instances of widget(s) on your site, it sounds like a filter. Is the anchor text on your backlinks heavily biased towards the SINGULAR? If so, it sounds like an OOP (over-optimization penalty). Also, check for unintentional (or indeed intentional!) keyword stuffing in internal links and/or images.
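One rough way to check that bias, assuming you can export your backlink anchor texts one per line (file name and format are my assumption):

```python
from collections import Counter

def anchor_bias(anchors, singular="widget", plural="widgets"):
    """Tally exact-match singular vs. plural anchors from a backlink export."""
    counts = Counter(
        "singular" if a.strip().lower() == singular else "plural"
        for a in anchors
        if a.strip().lower() in (singular, plural)
    )
    total = sum(counts.values())
    if total:
        print(f"singular share: {counts['singular'] / total:.0%}")
    return counts

# anchors.txt: one anchor text per line (assumed export format)
with open("anchors.txt") as f:
    anchor_bias(f.readlines())
```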
various posts about problems
This may sound harsh, but I'm cranky, so oh well.
The simple answer is:
- Find out who's ranking and why.
- Figure out what they are doing or not doing that you are/aren't
- Decide to test SOMETHING and start testing it.
- See if it works or doesn't.
- Rinse and repeat if it does.
- Pick something ELSE to TEST if it doesn't.
It's been almost a month now. It's done.
It's not changing again until after New Years and even then YOUR site may still be stuck.
So figure it out.
The algo changed but it didn't change THAT much.
If your site got hit, then it was hit just like any algo update with the normal fall out.
So back to step #1
FIGURE OUT WHO'S RANKING AND WHY.
:)
(ps. if "spam" is ranking then figure out WHY instead of following non-existent Google "DON'TS" -
the SERPS are the ONLY thing you should listen to)
The update rolled out pretty much EXACTLY as it was supposed to... otherwise I couldn't have predicted it!
Yes, the GHOST DATACENTER(TM) may have been "slow" in its rollout but the REST of the update went just fine.
Once again, MC came on to do a little CYA and FUD-spreading and now people are thinking there's something WRONG with the update.
There isn't...
Even IF there is, you need to figure out why the algo isn't "properly ranking" your site and FORCE it to rank it where it "should" be.
For WHATEVER reason, Google doesn't like your site anymore.
Send the algo some internet roses, some poems, and woo it back to liking you.
The BEST way to do this is to figure out who Google likes NOW and act, dress, and talk like them...
I could go on a 2-day rant about 301s.
Mainly about not using them, but I won't.
Again, not to sound harsh, but this is an excellent time to create content and/or start promoting on your backup domains while you figure out what's going on with your main site.
(You DO have backup domains, don't you?)
Same rules still apply tho. Gotta know WHY the other sites are ranking so you can duplicate it and then start TESTING.
301s...ugh...i'm not going there.
And if the changes happened weeks ago, why the slap yesterday?
You mean the "now normal" Google Dance shift yesterday?!
Wow, gotta give that some time before freaking out.
To me it looked like normal Google dance fluctuations, but if your site has gone belly up, I would give it a week or two (unlucky with Thanksgiving this week) before making changes.
Just do the normal background checks to make sure links, pages, servers, etc are all in place...
The site is nearly 10 years old, has 100,000 unique visitors and over a million page views a day. Our repeat visit rate is extremely high with many visitors hitting the site three or more times a day, so moving to a new domain isn't really an option.
The traffic (Google organic) virtually ceased on or about October 31. We are talking about less than 10 percent of our traffic, but it is where virtually 100% of our new readers come from. So the challenge is figuring out why: there have been virtually no changes to the site in over a month, but the drop seems to coincide with the October changes.
It seems like, if this were an intentional slap (and not a glitch), these smaller sites (that redirect to folders of the main site) would also have disappeared from the SERPs.
I asked if you ALREADY had domains waiting in the wings FOR SITUATIONS LIKE THIS.
I can't even comprehend why you wouldn't, but that's beside the point now.
If this were an intentional slap (and not a glitch), these smaller sites (that redirect to folders of the main site) would also have disappeared from the SERPs.
<small 301 rant>
It's been known for YEARS that Goog has problems with redirects of any type. Don't use them. Aside from the fact that NOT using 301s has huge SEO benefits if you understand how Goog ranks pages.
<end 301 rant>