Forum Moderators: open
----
I'm starting this thread because another member suggested it would be a good idea: the main Google update thread is cluttered with posts like "OMG, I've been dropped in the new index!" and "Yippee, I'm now #1 on a key SERP". This thread is ONLY for serious, generic discussion of changes that you are observing with the new algo in this update. As in things like "Looks to me like PR is less important this month, and anchor text of inbound links counts more.", etc. How your site is doing has no relevance here unless you can explain why you think so in terms of a general algo update.
Some consistency then....
If anyone can be bothered reading the early mega-threads, it is apparent that two things are happening:
a) They are rolling out a new ALGO... not new DATA (ie: recent crawl results)
b) The data will be applied shortly
I would therefore suggest that this is not a new index as we are accustomed to seeing. It is simply an adjusted algo applied to an older database (and as several people have remarked, probably a pre-April dbase with some additions and perhaps using an older DMOZ for example).
It's a strange approach, as a number of authoritative sites (in the industry sense, not the Google sense) have been lost. Yes, they may well re-appear with the normal re-index in a week or whatever, but it's a strange way of doing things: rolling out new software with testbed data?
The bottom line for webmasters?
1) Expect this lower grade Google to remain in place for at least a few days
2) The missing links MAY then return, and with them an improvement in results and result quality
3) The last crawl results will also be applied shortly.
Why? Anyone's guess. I would say they are in too much of a rush and should have rolled out the data update before the software update. This sort of situation helps no-one.
I had thought some people reported new sites they put up just showing now in -sj? If so, this is not all old data.
At -sj I find at least a few new pages of mine that don't show up at google.com.
IMO it's not all old data. This was confirmed (as far as I understood) by GoogleGuy in one of those pre-update threads yesterday.
That doesn't fully align with
"Algo/filter changes. These are being played out on SJ, and of course: if you are testing a change you use older tried and tested data, which is what Google has done (with a dash of Fresh). Hence all the missing links, strangely dropping sites, etc. "
which was accepted by GG as reasonably fair (see [webmasterworld.com...])
The SJ (propagating) dbase seems too old, with some new stuff and, as you say, plenty of links missing (e.g. as you would get if you used a very old DMOZ, Yahoo, etc.).
The latter is the strange one, as at the very least I would have expected the last DMOZ/etc to be used if testing.
There are quite a few things here that simply do not make sense.
I'm seeing travel destinations bringing up incorrect cities, a search for motorcycle parts brought up a bed and breakfast, a search for a cleansing product brought up a software site...
There are three people here on PCs checking phrases on all 8 datacenters and 3 servers, and we're seeing so many seemingly random phrase changes that it's hard for us to make any sense out of what is going on. A site that had 3k backlinks is showing 300, some that had a few hundred are showing no backlinks, and some sites in the top ten are randomly vanishing from the first one hundred results only to reappear in the top ten in different positions.
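The kind of cross-datacenter spot-checking described above is easy to script once you have the result lists. A minimal sketch (the datacenter names and SERP lists below are invented sample data, not real Google output) that reports where a given URL ranks on each datacenter:

```python
# Hypothetical illustration: compare a site's position across datacenters.
# The result lists are made-up sample data, not real SERPs.

def positions_across_datacenters(url, serps_by_dc):
    """Return {datacenter: 1-based rank or None if absent} for a URL."""
    positions = {}
    for dc, results in serps_by_dc.items():
        positions[dc] = results.index(url) + 1 if url in results else None
    return positions

serps = {
    "www-sj": ["a.com", "b.com", "c.com"],
    "www2":   ["b.com", "c.com", "a.com"],
    "www3":   ["b.com", "d.com", "e.com"],
}

print(positions_across_datacenters("a.com", serps))
# → {'www-sj': 1, 'www2': 3, 'www3': None}
```

Running this across a batch of phrases makes the "randomly vanishing and reappearing" pattern much easier to spot than eyeballing three browsers.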
SJ must mean seriously jacked ... ;), but of course I think it's a bit early to make any serious judgement calls.
Of course "stuff" is happening, and that can be talked about.
But this isn't an update. Update means updated. Any analysis now is not of the update but just of Google's algo on *non-updated* data. Mostly antiquated data, with some non-deepcrawled data. Damnedest thing: a combo of obsolete and unproven.
Nothing fully aligns. There is no arguing that pages that have only been crawled by the mid-April 216.239.246.* deepbot are showing up on -sj. There are also many pages crawled by that same bot that do not show up. A single important missing link can significantly change the SERPs for a search. A number of these can lead to changes webwide.
Also, many banned sites are not currently being filtered out. This means their links (and anchor text) probably are being counted. Many of the previous filters which may have caused dramatic changes (Sept '02, crosslinking late '01) may not be factored in yet. When these are added back in (and the sites removed and reranked), the web's link structure will be significantly altered again. It'll be fun. :)
If you did it through your hosts file, you are pointing to the IP of -sj, 216.239.35.100, right?
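For reference, the hosts-file trick being discussed is just a one-line static mapping (the file lives at /etc/hosts on Unix-like systems or in the Windows system directory; the IP is the -sj address quoted above):

```
# hosts file entry forcing www.google.com to resolve to the -sj datacenter
216.239.35.100    www.google.com
```

Remove the line (and flush your DNS cache if needed) to go back to normal resolution.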
Having switched my hosts file from 216.239.33.100 to 216.239.35.100, I was dismayed to see that numerous deep pages had dropped from PR4 to PR0 white bar. I then remembered that I had renamed all of these pages right before the last deep crawl occurred. Pages that were not renamed retained their PR.
Despite pointing all the old filenames to the new filenames with 301 permanent redirects, Google is no longer using the PR of the old pages, nor is it automatically assigning PR based on the root of the directory. All of these renamed pages now have a PR0, as if they never existed before (as with a new site).
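The redirect setup described above typically amounts to a couple of lines of Apache mod_alias config; a sketch with hypothetical filenames (the actual paths would be whatever was renamed before the crawl):

```apache
# .htaccess — permanently (301) redirect renamed pages to their new filenames
Redirect permanent /old-page.html http://www.example.com/new-page.html
Redirect permanent /old-section/index.html http://www.example.com/new-section/
```

The point of the observation is that even with these 301s in place, the new pages are showing PR0, i.e. the redirects are not (yet) transferring PR.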
So it would appear that the new algo is making an attempt to refine PR calculation of internal pages, instead of relying on a guess based on the root directory PR.
As a second topic, does anyone have any ideas why allinurl: counts have dropped too? Previously, it would show all pages on a site regardless of PR. Now that has changed as dramatically as the change in link:
Ted
[google.com...]
[www-sj.google.com...]
Google drops to #4 for "search" on -sj. But maybe they are just being penalized for trying to spam the index with their Britney Spears page...
[google.com...]
- Lots of recent spam websites have been removed by hand or with "patch" filters (e.g. the expired domain filter, that removed looots of expired domains regardless of spammy or not).
- The new algo also contains more accurate filters that are going to be applied soon. These filters are designed to detect and remove real spam websites/behaviours, based on the data gathered from the websites they hand-removed.
- This may be why we see previously banned/penalized domains back in the index: it is preferable to implement scalable spam filters and algos rather than maintain a "black list" of domains.
The only thing I see is that they've let it all hang out and are using an old database with some fresh results. They've removed all spam filters and backlinks.
In agreement with Nick earlier, the last deepcrawl data isn't here yet, but to be fair to GoogleGuy, fresh data is here.
Anyway, a bit too early for this thread, no? Maybe we could call it the
"What stage are we at in applying
1. backlinks
2. last deepcrawl data
3. spamfilter
to Dominic" thread?
"Stuff is definitely happening. I've yet to see a single update where the results at the end were dramatically different than at the beginning. If this is not true on this update, this will be a first since I started watching updates."
This is a flawed statement for the simple reason that you can't compare this update to the rest: the rest never started this way. This is new, and therefore can't be compared to past updates. Silly thread at this time.
On the old www.google.com, with link:example.com you could only access half the number of links announced in the blue bar,
while on www-sj.google.com there are fewer links, but the number you can actually read through is the number shown in the blue bar.
So a mysterious, and troublesome, feature of the link: query seems to have been removed from the new algorithm; but with so many backlinks missing, it seems too early to know whether this means that all (sufficiently PR-rich) links are now returned, or still only half of them, with the blue-bar number simply reflecting the amount returned.
To me at the moment I see on www3 and www2:
1) Not all backlinks showing
2) Internal backlinks not carrying as much weight
3) Link data (anchor keywords) that seem unbelievable
As mentioned in many posts, it's really hard to work out until the update is complete. I reserve further judgement until then.
Smiley
What SJ is displaying right now is more or less the "atypical topical theme" with level A links. What Google is withholding right now is likely the level B, C, D links, which is why we all see a significant drop in the number of links across the web. If you browse through all the backward links now, there is a good chance you are seeing nearly all of them (level A), which explains the mystery of why Google didn't show the other half of the backward links (C, D) before.
The "atypical topical theme" is unlikely to be the hierarchical style people usually think of. I call it "atypical" because the more content you have, the more endangered you are of being dropped. The more varied, the worse. If you are in a competitive field, that is where you are going to be buried. The reason is simple: that extra content dilutes your theme...
This helps to explain why authoritative sites drop and why smaller sites with less content rise, especially in competitive areas. Of course, these smaller sites may look spammy, but their advantage is a concentrated topical theme, whereas the authority sites suffer from dilution.
(My industry is dog eat dog, so I have several dozen sites: Grade A with super-rich content, Grade B with rich content, Grade C with less content but in the majority... In SJ, only the Grade C sites still survive at the coveted spots on the SERPs for competitive keywords.)
We have to see what the end results will be after Google adds more links and spam snapshots in the next iteration, but I feel the basic equation would be:
Previous: Link Popularity + ....
Upcoming: Atypical Theme + Link Popularity + ....
To conclude, neither links nor theme alone can boost your rank in the future - you need both!
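Read as a scoring sketch, the "equation" above might look something like the following. To be clear, everything here — the theme measure, the multiplication, the figures — is an invented illustration of this poster's hypothesis, not Google's actual formula:

```python
# Hypothetical sketch of the "Atypical Theme + Link Popularity" idea.
# All names, weights and numbers are made up for illustration.

def theme_concentration(page_terms, topic_terms):
    """Fraction of a page's terms that belong to the target topic (0..1)."""
    if not page_terms:
        return 0.0
    on_topic = sum(1 for t in page_terms if t in topic_terms)
    return on_topic / len(page_terms)

def score(link_popularity, page_terms, topic_terms):
    # "Upcoming": theme concentration scales link popularity, so a
    # diluted mega-site can lose to a focused smaller one.
    return theme_concentration(page_terms, topic_terms) * link_popularity

topic = {"widget", "widgets", "gadget"}

# Broad "Grade A" authority site: huge link popularity, diluted theme.
authority = score(1000, ["widget", "news", "travel", "recipes",
                         "forums", "jobs", "maps", "weather"], topic)  # 125.0
# Narrow "Grade C" site: modest link popularity, concentrated theme.
focused = score(200, ["widget", "widgets", "gadget", "widget"], topic)  # 200.0
```

Under this toy model the focused site outscores the authority site, which matches the observation above that only the Grade C sites survive on competitive SERPs; neither factor alone is enough.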
Although this SJ phenomenon would have very little impact on our business and overall ranking right now, I honestly and personally do not like it, because this would only provoke a SPAM WAR! - Not with Google of course, but with fellow competitors.
Hope this helps to explain the wide disarray of the SJ update, and hope the above equation does not turn out to be true...