The pages haven't been totally removed; instead, it seems that the pages exist in the Google SERPs but have no title and no snippet, and therefore no longer appear for any searches.
At first I thought this was some sort of penalty / filter to remove some of the controversial search sites from its index, but it seems this applies to other large sites, e.g. dmoz. I would estimate that dmoz has had around 200,000 pages "nuked".
Has anyone noticed this phenomenon on any other sites?
On my site, every page that has a URL only listing, is FILENAME.ASP?value=XXXX
XXXX being the value that controls the page content. It also happens that all the pages with URL-only listings have basically the same content as the page calling them; they are a printable version of the page they are linked from.
I think it's some form of penalty for a variety of factors, including duplicate content.
Otherwise, it's a pretty huge fluke that they were all indexed at one point, and then all my printable version pages were given URL-only listings.
And I have never seen a URL-only listing appear in the results for a standard search, unless the search is for that particular URL or part thereof.
Another interesting point is the total pages found for a query. When I do a site:domain.com on my site, I get 1170 pages total.
However, if I click through the pages, I get to a total of about 300, then the little Gooooooooogle thing down the bottom doesn't increment further or let me see beyond it.
If you ask me, those figures are never really that accurate, and they wouldn't want them to be, because the main people wanting to check actual quantities of pages for searches with any degree of precision would be SEOs and opposition SEs....
I did a query for [a -the] which gives the following result:
***********
Web Results 1 - 10 of about 19,600,000 for a -the. (0.19 seconds)
News results for a -the - View all the latest headlines
NHL Playoff Top Individual Performance - NHL.com - 30 minutes ago
High School Schedule - Hartford Courant (subscription) - 21 hours ago
dmoz.org/cgi-bin/add.cgi?where=
Similar pages
dmoz.org/cgi-bin/add.cgi?where=$cat
Similar pages
[ More results from dmoz.org ]
KidsClick!: Subjects: A
KidsClick!:Subjects: A. Go to these specific subjects: ... Search our 600+
subjects by letter:A B C D E F G H I J K L M N O PQ R S T UV W XZ. ...
sunsite.berkeley.edu/KidsClick!/suba.html - 9k - Cached - Similar pages
<snip the rest>
**********************
look at the first entry. this establishes once and for all that a regular query can include url-only entries.
>On my site, every page that has a URL only listing, is FILENAME.ASP?value=XXXX
google considers each of your filename.asp?value=**** as a separate, distinct page. it doesn't give a hoot whether you meant it as a printable version or not. url-only state of a page is no indication of a duplicate content penalty. a url-only status is simply an indication that google failed to fully index the page for whatever reason. nothing to do with dup content, which google handles differently.
nothing that you've described invalidates the theory that google is simply having index capacity problems.
A site: search for my site reveals that Google thinks domain.com/widget is NOT the same as domain.com/widget/
No matter how you look at it - that is a sad state of affairs for the world's most powerful search engine.
Kaled.
Your server should return a 301 Moved Permanently together with the header field Location: http://domain.com/widget/ . If it doesn't, it's correct to treat both as different and to list both - from a w3c perspective - although the listings could get merged into one. However, if the content changed between the hits to domain.com/widget and domain.com/widget/, it's again correct not to merge them.
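A rough sketch of the redirect rule described above, in Python. The function name, the `directories` set and the `host` default are made up for illustration; a real server (Apache, IIS, etc.) does this internally and its actual logic may differ.

```python
from typing import Dict, Optional, Set, Tuple


def trailing_slash_redirect(
    path: str, directories: Set[str], host: str = "domain.com"
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a 301 if `path` names a known
    directory but lacks the trailing slash; otherwise return None."""
    if not path.endswith("/") and path in directories:
        # Point the client (or crawler) at the canonical URL with the slash.
        return 301, {"Location": "http://" + host + path + "/"}
    # Already canonical, or not a directory: serve the request normally.
    return None


# Hypothetical usage: /widget is a directory on this imaginary site.
print(trailing_slash_redirect("/widget", {"/widget"}))
print(trailing_slash_redirect("/widget/", {"/widget"}))
```

With a rule like this in place, a crawler requesting domain.com/widget is told that domain.com/widget/ is the one true URL, so it has no reason to list both.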
yes, unfortunately google considers domain.com/widget and domain.com/widget/ to be different pages. i've had both cases (unintentionally) indexed and appearing in the serps. it is certainly a waste of google's precious index capacity.
Pimpernel,
>This "capacity issue" is getting ridiculous. Google do not have a capacity issue, or ... You are wasting your time looking for conspiracy theories etc that do not exist
your logic goes like this => I've been able to add new pages therefore google has no capacity problems.
sorry but your logic is faulty and incomplete. the theory does not preclude google adding new pages. in fact the experience (in these forums) is that google adds new sites, and google adds new pages to old sites. it is also a fact that google drops old pages, classifies pages as "url-only" or as supplementals.
so far nothing is inconsistent. the experience with your specific site does not disprove the index capacity problem. in fact it can be explained by the theory.
note: i used to be able to readily add new pages to an old site and rank immediately. lately, i've noticed that this is not true anymore, that the indexing of new pages in old sites has slowed considerably. anybody else seeing this?
I'm sure you're right from a technical perspective, however, considering the efforts Google goes to to eliminate duplicate content and considering that it recognises that domain.com/widget/ is the same as domain.com/widget/index.html, I do not believe that Google's behaviour is likely to be by design, rather, I think it is more likely a bug.
[ADDED]
Just checked - my host is correctly configured as per your 301 description.
[/ADDED]
=======
On the subject of missing snippets, again, I can see no possible reason why such behaviour should be by design - users do not benefit - it is a bug. It should be pointed out that it is not just the snippets that are missing it is the entire page content that is not indexed. Therefore, such pages can only be returned in the SERPS based on off-page factors. (Certainly, that is true of my site - I've tested it, so I assume that is true of others.)
Kaled.
why are you looking for a reason that this is by design? google has already said that the page has been identified by their crawler but has not been indexed yet. of course there is no title or content. IT'S NOT BEEN INDEXED. IT IS NOT IN THE INDEX.
your explanation that it is a bug is wilder and more speculative than my explanation.
the other issue that was discussed is whether "url-only" pages appear in the serps. definitely! and you provided an explanation of how they can possibly be included - through off-page anchors! does it mean all url-only pages can appear in the serps? NO! all we established is that url-only entries do appear in the serps. period.
bug? pfft...lame explanation to me.
I was getting a number of hits from people looking for a particular type of data recovery tool. These people are not being served well by Google dropping the relevant page from my site (yes, it is indexed as a url, but no-one will ever find it).
If it crawls like a bug, bites you on your bum like a bug and leaves a nasty rash like a bug - it's not a duck, it's a bug.
In my particular case, it could be an Everflux issue (if it still exists) since the pages are new, but I don't think that is true for others, and in the past, when pages vanished as a result of Everflux, there was no trace, not even the url (I think) so I am not inclined to believe this.
Kaled.
Why can't they give us a clue what is happening here?
greg
I have a situation where it could be the "sandbox issue", or something else.
Basically my site went live in mid-February - it is a large site for a client that has thousands of product pages, each linked to from a category. All pages have spider-friendly URLs.
The PR of the homepage shows as a "5", but I believe it is a low 5 or high 4 in reality. The main content pages, which are at the 2nd/3rd tiers, are fully indexed, and all relevant text shows in the listings.
The product pages, which are at the 4th tier, are showing ONLY the URLs - the snippet problem. There are about 50 product pages that actually are fully indexed and show all text in the Google listing - but the other 6000 do not, and they are "snippeted".
I had heard a rumor that your homepage's PR value determines how far Google will crawl/index down into your site (how many tiers - OR how many total pages).
My task now is to figure out if it is simply a "sandbox" issue that is causing this (the site was launched in February), or if it is a situation where I need a higher PR on the homepage to get the spider to fully index all of the product pages...
Has anyone had a similar situation, or have any advice on this?
I appreciate your time.
Thanks,
Dan
It kind of looks like there's a process with "steps" when pages are being added or being removed from the index and you can catch changes in how things look and differences in the numbers on different data centers.
I'm watching one site (not any of mine) that's got all duplicate pages, one for one, changing file extension and also switching from using www to without - or a combination. Using a meta-refresh from the old pages to the new.
As pages are being removed, first they get the URL only treatment. The new ones being added are first showing up with only the URL - but then, when you click the link for the omitted results (where they say they've shown the most relevant) - you can see a partial indexing of some of the pages - only the alt text of the top graphic and the first text on the pages which is repeated identically site-wide.
I've been watching this on main Google and 216.239.57.104 - with differing numbers of pages showing. Except when you "force" that same sitewide snippet for some of the pages, it's practically all URL only listings and the site's lost all its rankings.
ENOUGH ALREADY!
That's something I have been speculating about for some time (see my previous posts about title records and possible ranking/filtering processes). It looks like this is the new behaviour of Googlebot for pages that either are updated often or that don't answer "If-Modified-Since" requests with a "304 Not Modified". Every time Googlebot recrawls an already indexed page, it will disappear and reappear after a while. So the referrers may go up and down like a yo-yo. I observe this with my sites - pages that are recrawled often have url-only listings one day, then recover their snippets and titles and then show url-only listings again for a few days ...
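For anyone unsure what the If-Modified-Since / 304 exchange looks like on the server side, here is a minimal sketch in Python. The function name is hypothetical and this only illustrates the generic HTTP conditional GET rule; nothing here is known about how Googlebot actually uses it.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(if_modified_since: Optional[str],
                       last_modified: datetime) -> int:
    """Return 304 if the page is unchanged since the date the client
    sent in If-Modified-Since, otherwise 200 (send the full page).
    `last_modified` is expected to be a timezone-aware datetime."""
    if if_modified_since is None:
        return 200  # plain GET, no condition to evaluate
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


page_changed = datetime(2004, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_status("Sat, 01 May 2004 12:00:00 GMT", page_changed))
print(conditional_status("Fri, 30 Apr 2004 12:00:00 GMT", page_changed))
```

A server that always answers 200 forces the crawler to refetch and reprocess the whole page on every visit, which is the situation described in the post above.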
i have a much simpler speculation. to me, google has a capacity problem which limits the number of pages in the online index. so what google does is to keep a number of pages out of the index. since this appears to be widespread without any rhyme or reason, i speculate that it is done in a random fashion.
:( yes my site has once again faced the same fate. Google Guy, whats the story? 1 was enough is 2wice to be the killer?
18th March
Page totals jumped up to 68100 from 38400 “Site:www.mysite.com –weqweqw”
Almost 909% without DESC or TITLE
9th April
Page totals drop from 58100 to 33700
Links totals also drop from 7210 to 4620.
27th April
Links totals dropped again 4620-2450
Page totals were 57500
PR Stayed the same
SERP’s didn’t change
2nd May
Still No change in the SERP’s
First drop in Googlebot visits. From an average of approx 600 a day to only 100
3rd May
SERP’s still the same
Only 1 visit from Googlebot
4th May
Lost mysite.com Home page in Google’s index.
Page totals dropped from 57500 – 44800
Links still at 2450
Pages still have PR
Dropped out of the SERP’s
7th May – 13th
Page totals dropped from 24600-0
I wrote to Google about this a couple of days ago. They sent a response today. They said that my site was fully indexed according to their records.
I don't know if they didn't understand the question, or if they don't care as long as the site is indexed, but it wasn't the answer I was looking for.
One of them is similar content. We were able to get pages out of the no title / no snippet state by making the pages more different from each other. Be aware that Google does not crawl those pages very often, so it takes some time before those changes take effect.
As I mentioned, not everyone with no title / no snippet pages can solve their problem like this, but if your pages do have very similar content then this is what worked for us.
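If you want a rough idea of how alike two of your own pages are before rewriting them, something like the Python sketch below can help. It compares the visible text word by word with the standard library's difflib. This is only a self-check; how Google measures similarity is not known, and the function name is made up.

```python
from difflib import SequenceMatcher


def page_similarity(text_a: str, text_b: str) -> float:
    """Return a score between 0.0 (nothing in common) and 1.0
    (identical), based on matching runs of words in the two texts."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


# Hypothetical product-page text that differs in a single word.
print(page_similarity("red widget for sale", "blue widget for sale"))
```

Pages that score very close to 1.0 against each other are the ones worth making more distinct first.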
Don't forget that Google has many datacentres, each with a slightly different index and algorithm in use. When you search again, it is likely that the results are coming from a different datacentre to the one that supplied you just a few minutes previously. That usually explains results that change from minute to minute. Search using one consistent Google IP if you really want to see the real changes in one version of the index, rather than a different sample from a random version of the index each time.
I have some pages with no title or description too; just the URL shows. They are pages that I put a robots noindex meta tag on, several months ago. Google refuses to completely forget about them.
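To double-check that a page really is serving the noindex tag, a small Python sketch like this one can scan the HTML. The class and function names are invented for illustration, and it only looks at the robots meta tag, not at robots.txt or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names for us.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="noindex,follow"></head>'))
```

Bear in mind that a crawler has to be able to fetch the page to see the tag at all; if the page is also blocked in robots.txt, the tag never gets read.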