Now this is interesting.
A site with 40 000 "real" pages, some 80 000 duplicate content pages excluded using robots.txt (it's a forum; see my prior posts about vBulletin), and still some 80 000 duplicate pages that are not yet excluded.
Additionally, some 500 000 non-thread pages are also excluded in robots.txt, and most of those are already delisted. The whole site is listed as www; nothing is listed as non-www at all.
Looking purely at indexed threads:
site:domain.com shows 90 000 www pages, all as normal results, including some duplicate content that will eventually be excluded.
site:domain.com -inurl:www shows 24 000 www pages, all of which are marked as Supplemental Results and all of which also have an old cache date. This search should show zero results. It certainly should not be showing www pages at all; the search was for "-inurl:www". What is going on?
[edited by: tedster at 8:44 pm (utc) on June 13, 2006]
[edit reason] split into new thread [/edit]
Spot on. We don't rank for our main term "blue widgets", but we do rank on the first page for "aqua widgets", which most people would consider a synonym.
I noticed that for the "blue widget" term they are not showing our homepage but instead our contact-us.html page and a random internal page. They do, of course, rank our homepage for "aqua widgets".
Ahh, Google, you crazy little monkey. It's past midnight, so off to bed; maybe some sleepy pixie dust will fix it all.
Anyone else seeing any leaps in page counts?
Site 1 - Jumped from 20k pages to 150k [ nice round figure :) ... kinda looks strange ]
- Keyword in quotes ("widget keyword") ranks in the top spot
- Same keyword without quotes is not in the first 100 results.
- Search on the widget's name shows an old Supplemental page we had. Why not show the new one?
Site 2 - Stuck at 780 pages out of approx 200k
Site 3 - Stuck at 12,800 pages for 4 weeks
I defy anyone to say Google is working properly.
I just did an experiment using the main two-word target term I watch.
On 216.239.59.104, which is my current default .co.uk DC showing Copra results, the rankings do not change at all whether or not the query is in quotes. The site at #3 has an inset listing at #4, and the title of that inset is truncated to a single word when I search for "blue widgets" in quotes, but it shows the full page title when I search without quotes.
On 72.14.203.104, with Turd results, the site mentioned above is now at #2 with an inset at #3. Here too the rankings remain the same whether or not the query is in quote marks, but the title of that inset result also remains the same and is not truncated.
Why would that be?
Sid
I had a hard time determining which was Copra and which was Turd pre-Skata, since the top 5 are more or less set for this industry... Fortunately those nutty, nutty results did move over as you mentioned (at least now I know which one you all are labeling Copra). Needless to say, I had a fitful night.
Intuitively, I couldn't see how a site that was #1 for allinanchor, allintitle, and allintext would go completely MIA on one major term but not another. Thankfully, it must have been some sort of data folding going on... call them rebuilding results?
I don't know how you full-time DC watchers do it; this is maddening. Admittedly, I haven't been this worked up over DC issues since Florida.
Best of luck to any of you still seeing wild swings.
I am still wondering what the heck these results had in common, because they didn’t just come out of the blue without some sort of filtering.
If I remember correctly, they were all pages belonging to very large, established sites, and by large I don’t mean SEOed sites with hundreds of thousands of junk pages (in the sector I watch, those pages are now either in the “Omitted” results or the “Supplemental” index, or they don’t appear anywhere even though some are cached). They were sites that were showing 1000 pages in the main index with the site: command.
Did you notice that?
I'm noticing that the top 5 results in a very competitive sector all seem to be authoritative directories, such as DMOZ, Y and of course wikis, and .edu, .org sites, etc.
A sector I monitor went from 70 million results to 1 billion.
Some keywords are not phases, while others seem to be somewhere in the monster index.
These results are terrible! Google cannot assume that just because it deems a site "authoritative", it is now an authority on everything...
The pain just doesn't end. Those skata results still exist, but are query-based (maybe backing up the rebuild theory?). Just getting into the office this morning, I'm noticing that those same datacenters have us MIA for a phrase where we usually rank anywhere between #2 and #5, positions we have held for about 2 years.
The new top 10 for the skata is pretty bad, with AT&T homepage junk, redirects, Yahoo directory listings, and some sites I thought they had banned a while back.
As before, I can't actually see the data by going directly to the datacenter, even after clearing cookies, so according to McDar's tool, these are the crazy DCs for one of the queries... They seem to be fixed for different queries on the same DCs, though (insert picture of me pulling out hair):
64.233.161.107
64.233.161.147
64.233.161.99
64.233.161.104
216.239.39.107
216.239.39.99
216.239.59.147
Edit: I literally just watched the skata disappear from the 216.* sets and then add itself to 216.239.59.107. I'd better click on Caryl's ads as a thank-you for soaking up the bandwidth.
[edited by: JoeSinkwitz at 1:34 pm (utc) on June 15, 2006]
Jaffstar, this is exactly what I’ve noticed and mentioned in my post on the previous page. Last night’s results were a nightmare, but it is quite obvious even on “kopra”, and it all started (I think) with Jagger. Do you agree? Amazon and Wiki took over where relevant sites were more appropriate.
I now fear this “policy” might propagate; then we will all be dead corpses soon.
Copra is now on only 16 of the 55 on McDar, the rest are Turd.
There seems to be no logic to what is going on.
Do they know that there are two distinct datasets?
Are they trying to manage the propagation of one of those, or are they simply twiddling knobs in the hope that eventually they will have reasonable results?
I used to like trying to predict what was happening on Google; now it just makes my head ache.
Sid