| 3:31 am on May 12, 2004 (gmt 0)|
huh? sorry you lost me.
| 4:07 am on May 12, 2004 (gmt 0)|
i disagree with the theory of it being an inability to store the information.
On my site, every page that has a URL only listing, is FILENAME.ASP?value=XXXX
XXXX being the different value that controls page content. It also happens, that all the pages with URL listings have basically the same content as the page calling them; they are a printable version of the page they are linked from.
I think it's some form of penalty for a variety of factors including duplicate content.
Otherwise, it's a pretty huge fluke that they were all indexed at one point, and then all my printable version pages were given URL only listings.
And, i have never seen a URL only listing appear in the search for a standard search, unless the search is for that particular URL or part thereof.
Another interesting point of mention is with the total pages found for a query. When i do a site:domain.com on my site, i get 1170 pages total.
However, if i click through the pages, i get to a total of about 300, then the little Gooooooooogle thing down the bottom doesn't increment further or let me see beyond it.
You ask me, those figures are never really that accurate, and they wouldn't want them to be, because the main people wanting to check actual quantities of pages for searches with any degree of precision would be SEO'rs and opposition SE's....
| 4:39 am on May 12, 2004 (gmt 0)|
>And, i have never seen a URL only listing appear in the search for a standard search, unless the search is for that particular URL or part thereof.
i did a query for [a -the] which gives the following result:
Web Results 1 - 10 of about 19,600,000 for a -the. (0.19 seconds)
News results for a -the - View all the latest headlines
NHL Playoff Top Individual Performance - NHL.com - 30 minutes ago
High School Schedule - Hartford Courant (subscription) - 21 hours ago
[ More results from dmoz.org ]
KidsClick!: Subjects: A
KidsClick!:Subjects: A. Go to these specific subjects: ... Search our 600+
subjects by letter:A B C D E F G H I J K L M N O PQ R S T UV W XZ. ...
sunsite.berkeley.edu/KidsClick!/suba.html - 9k - Cached - Similar pages
<snip the rest>
look at the first entry. this establishes once and for all that a regular query can include url-only entries.
>On my site, every page that has a URL only listing, is FILENAME.ASP?value=XXXX
google considers each of your filename.asp?value=**** as a separate, distinct page. it doesn't give a hoot whether you meant it as a printable version or not. url-only state of a page is no indication of a duplicate content penalty. a url-only status is simply an indication that google failed to fully index the page for whatever reason. nothing to do with dup content, which google handles differently.
nothing that you've described invalidates the theory that google is simply having index capacity problems.
| 10:28 am on May 12, 2004 (gmt 0)|
Well, this problem may have something to do with capacity or maybe not, but I wonder how many of the 5 billion indexed pages are actual real and unique urls.
A site: search for my site reveals that Google thinks domain.com/widget is NOT the same as domain.com/widget/
No matter how you look at it - that is a sad state of affairs for the world's most powerful search engine.
| 10:37 am on May 12, 2004 (gmt 0)|
This "capacity issue" is getting ridiculous. Google do not have a capacity issue, or at least that is not why snippets are occurring. We have created tens of thousands of pages over the last two months; some of these pages had snippets and then got a full listing, and some got indexed straight away. The general way it has worked is that the better rated the site and the larger it already is, the better chance that the pages get listed quickly. On the other extreme, if you have a one page site and add 100 pages you are almost definitely going to have a snippet problem. Look elsewhere for the problem, not with Google's capacity. You are wasting your time looking for conspiracy theories etc that do not exist.
| 10:58 am on May 12, 2004 (gmt 0)|
>Google thinks domain.com/widget is NOT the same as domain.com/widget/
Your server should return a
301 Moved Permanently together with the field
Location: http://domain.com/widget/ .
If it doesn't, it's correct to treat both as different and to list both - from a w3c perspective - although the listings could get merged to one. However, if the content changed between the hits to domain.com/widget and domain.com/widget/, it's again correct to not merge them.
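To make the redirect described above concrete, here is a minimal sketch in Python of the decision a server could make. The function name `trailing_slash_redirect` and the `directory_paths` set are made up for illustration and are not any real server's API; the point is only that a request for the slashless form is answered with 301 and a Location header pointing at the slashed form.

```python
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url, directory_paths):
    """Return (status, headers) a server could send for `url`.

    `directory_paths` is a hypothetical set of paths that are directories
    on the server, written without the trailing slash (e.g. {"/widget"}).
    """
    parts = urlsplit(url)
    if parts.path in directory_paths:
        # Same URL, with "/" appended to the path component only.
        target = urlunsplit(parts._replace(path=parts.path + "/"))
        return 301, {"Location": target}
    # Anything else is served as-is in this sketch.
    return 200, {}


print(trailing_slash_redirect("http://domain.com/widget", {"/widget"}))
```

With a redirect like this in place, only domain.com/widget/ ever returns content, so a crawler has just one address it can list for the page.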
| 1:29 pm on May 12, 2004 (gmt 0)|
yes, unfortunately google considers domain.com/widget and domain.com/widget/ to be different pages. i've had both cases (unintentionally) indexed and appear in the serps. it is certainly a waste of google's precious index capacity.
>This "capacity issue" is getting ridiculous. Google do not have a capacity issue, or ... You are wasting your time looking for conspiracy theories etc that do not exist
your logic goes like this => I've been able to add new pages therefore google has no capacity problems.
sorry but your logic is faulty and incomplete. the theory does not preclude google adding new pages. in fact the experience (in these forums) is that google adds new sites, and google adds new pages to old sites. it is also a fact that google drops old pages, classifies pages as "url-only" or as supplementals.
so far nothing is inconsistent. the experience with your specific site does not disprove the index capacity problem. in fact it can be explained by the theory.
note: i used to be able to readily add new pages to an old site and rank immediately. lately, i've noticed that this is not true anymore, that the indexing of new pages in old sites has slowed considerably. anybody else seeing this?
| 1:32 pm on May 12, 2004 (gmt 0)|
I'm sure you're right from a technical perspective, however, considering the efforts Google goes to to eliminate duplicate content and considering that it recognises that domain.com/widget/ is the same as domain.com/widget/index.html, I do not believe that Google's behaviour is likely to be by design, rather, I think it is more likely a bug.
Just checked - my host is correctly configured as per your 301 description.
On the subject of missing snippets, again, I can see no possible reason why such behaviour should be by design - users do not benefit - it is a bug. It should be pointed out that it is not just the snippets that are missing it is the entire page content that is not indexed. Therefore, such pages can only be returned in the SERPS based on off-page factors. (Certainly, that is true of my site - I've tested it, so I assume that is true of others.)
| 3:53 pm on May 12, 2004 (gmt 0)|
>On the subject of missing snippets, again, I can see no possible reason why such behaviour should be by design - users do not benefit - it is a bug.
why are you looking for a reason that this is by design? google has already said that the page has been identified by their crawler but has not been indexed yet. of course there is no title or content. IT'S NOT BEEN INDEXED. IT IS NOT IN THE INDEX.
your explanation that it is a bug is wilder and more speculative than my explanation.
the other issue that was discussed is whether "url-only" pages appear in the serps. definitely! and you provided an explanation of how they can possibly be included - through off page anchors! does it mean all url-only pages can appear in the serps? NO! all we established is that url-only entries do appear in the serps. period.
bug? pfft...lame explanation to me.
| 5:07 pm on May 12, 2004 (gmt 0)|
MY PAGES WERE INDEXED AND WERE THEN DROPPED. It is simply not true to say that pages were not indexed - they were and, no doubt will be again, when the problem is fixed.
I was getting a number of hits from people looking for a particular type of data recovery tool. These people are not being served well by Google dropping the relevant page from my site (yes it is indexed as an url, but no-one will ever find it).
If it crawls like a bug, bites you on your bum like a bug and leaves a nasty rash like a bug - it's not a duck, it's a bug.
In my particular case, it could be an Everflux issue (if it still exists) since the pages are new, but I don't think that is true for others, and in the past, when pages vanished as a result of Everflux, there was no trace, not even the url (I think) so I am not inclined to believe this.
| 6:25 pm on May 12, 2004 (gmt 0)|
My pages, which had URL-only listings for a month, have now vanished completely from the index.
When I use the site: command, I get zero entries. The domain seems to have PR5, but when I open the index.php the toolbar says PR0.
It was a perfectly clean site with top rankings for years and has still many hundreds of backlinks but Google kicked it out of the index.
Why can't they give us a glue what is happening here?
| 7:18 am on May 13, 2004 (gmt 0)|
|Why can't they give us a glue what is happening here? |
Stick with it!
(sorry, I couldn't resist it :o)
| 7:16 pm on May 13, 2004 (gmt 0)|
I have a situation where it could be the "sandbox issue", or something else.
Basically my site went live in mid-February - it is a large site for a client that has thousands of product pages, each linked to from a category. All pages have spider friendly URLs.
The PR of the homepage shows as a "5", but I believe it is a low 5 or high 4 in reality. The main content pages, which are at the 2nd/3rd tiers, are fully indexed, and all relevant text shows in the listings.
The product pages, which are at the 4th tier are showing ONLY the URL's - the snippet problem. There are about 50 product pages that actually are fully indexed and show all text in the google listing - but the other 6000 do not, and they are "snippeted".
I had heard a rumor that your homepage's PR value determines how far google will crawl/index down into your site (how many tiers - OR how many total pages).
My task now is to figure out if it is simply a "sandbox" issue that is causing this (the site was launched in February), or if it is a situation where I need a higher PR on the homepage to get the spider to fully get all of the product pages...
Has anyone had a similar situation, or have any advice on this?
I appreciate your time.
| 10:28 am on May 16, 2004 (gmt 0)|
Mine is not a large site but about a dozen new pages suffered this fate. However, all (I think) are now fully indexed again.
| 2:16 pm on May 16, 2004 (gmt 0)|
Check 22.214.171.124 and 126.96.36.199 also.
It kind of looks like there's a process with "steps" when pages are being added or being removed from the index and you can catch changes in how things look and differences in the numbers on different data centers.
I'm watching one site (not any of mine) that's got all duplicate pages, one for one, changing file extension and also switching from using www to without - or a combination. Using a meta-refresh from the old pages to the new.
As pages are being removed, first they get the URL only treatment. The new ones being added are first showing up with only the URL - but then, when you click the link for the omitted results (where they say they've shown the most relevant) - you can see a partial indexing of some of the pages - only the alt text of the top graphic and the first text on the pages which is repeated identically site-wide.
I've been watching this on main Google and 188.8.131.52 - with differing numbers of pages showing. Except when you "force" that same sitewide snippet for some of the pages, it's practically all URL only listings and the site's lost all its rankings.
| 3:13 pm on May 16, 2004 (gmt 0)|
Well I have been on an elevator - everyday for several weeks the index count has been different. The bite is anywhere from 20 to 250 pages. Whole directories disappear and reappear and then disappear again after 2 or 3 days. Nothing comes back to stay. But I do notice that once the count gets up to around 400 it drops within 2 days back down to under 300. I am still looking for over 1200 pages MIA. My traffic is down by 96% and I have been smitten one way or the other by this 'THING' since February 10.
| 6:29 pm on May 17, 2004 (gmt 0)|
I have seen big fluctuations... a few minutes ago i was seeing lots of URLs with no titles or snippets on my site; I just checked again and all the pages had titles and snippets. Go figure. It seems like there is really nothing to be done about this except cross your fingers and wait it out.
| 6:46 pm on May 17, 2004 (gmt 0)|
this is really getting ridiculous. if google is having this much of a problem and it is revealed after the IPO has happened that a problem does exist, do you think they will be liable for stock manipulation? In this age of SEC sensitivity, this can likely happen.
| 6:59 pm on May 17, 2004 (gmt 0)|
>It kind of looks like there's a process with "steps" when
>pages are being added or being removed from the index
That's something i have been speculating about for some time (see my previous posts about title records and possible ranking/filtering processes). It looks like this is the new behaviour of googlebot for pages that either are updated often or that don't answer an "If-Modified-Since" request with a "304 Not Modified" response. Every time googlebot recrawls an already indexed page, it will disappear and reappear after a while. So the referrers may go up and down like a Yo-Yo. I observe this with my sites - pages that are recrawled often have url-only listings one day, then recover their snippets and titles and then show url-only listings again for a few days ...
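For anyone unsure what the conditional request mentioned above involves: a crawler can send an If-Modified-Since header carrying the date of its stored copy, and the server may answer 304 Not Modified instead of sending the page again. Below is a rough Python sketch of that server-side decision. The function name `conditional_get_status` is invented for illustration, and nothing here says how googlebot itself treats the answer.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since, last_modified):
    """Return 304 if the page is unchanged since the client's copy, else 200.

    if_modified_since: raw header value (str) or None if the header is absent.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if if_modified_since is None:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable date: ignore the header and send the full page.
        return 200
    if client_time.tzinfo is None:
        # Treat a date without zone information as UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


page_changed = datetime(2004, 5, 12, 10, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status("Wed, 12 May 2004 10:00:00 GMT", page_changed))
```

A page that always answers 200 with a full body looks "changed" on every visit, which is the situation described in the post above.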
| 7:25 pm on May 17, 2004 (gmt 0)|
i have a much simpler speculation. to me, google has a capacity problem which limits the number of pages in the online index. so what google does is to keep a number of pages out of the index. since this appears to be widespread without any rhyme or reason, i speculate that it is done in a random fashion.
| 11:31 am on May 18, 2004 (gmt 0)|
Did Googleguy say anything about this?
I run a phorum site (PR5) with 20,000 pages in the index, and it looks like 99% have no description, just the url in the index.
Any idea about what I should do?
| 12:42 pm on May 18, 2004 (gmt 0)|
The same with my phorum-site (just a few hundred pages) :-)
no title, no descr. for a few months now
i didn't make any changes and the website is back with title and descr. since yesterday. there is no relevant position yet but I'm sure this will come soon...
| 1:07 pm on May 18, 2004 (gmt 0)|
i am having the same problem! you say it has been a few months now for you with this problem?
| 1:44 pm on May 18, 2004 (gmt 0)|
What kind of phorum are you using?
Maybe we are using the same one and the problem is related to it...
I use phpbb
| 2:35 pm on May 18, 2004 (gmt 0)|
:( yes my site has once again faced the same fate. Google Guy, what's the story? Once was enough; is twice going to be the killer?
Page totals jumped up to 68100 from 38400 "site:www.mysite.com -weqweqw"
Almost 909% without DESC or TITLE
Page totals drop from 58100 to 33700
Links totals also drop from 7210 to 4620.
Links totals dropped again 4620-2450
Page totals were 57500
PR Stayed the same
SERP’s didn’t change
Still No change in the SERP’s
First drop in Googlebot visits. From an average of approx 600 a day to only 100
SERP’s still the same
Only 1 visit from Googlebot
Lost mysite.com Home page in Google’s index.
Page totals dropped from 57500 – 44800
Links still at 2450
Pages still have PR
Dropped out of the SERP’s
7th May – 13th
Page totals dropped from 24600-0
| 3:30 pm on May 18, 2004 (gmt 0)|
I haven't read all the posts on this thread, but what I think it might be is that pages with similar text are less likely to get indexed. If I have 1000 similar pages, it seems to me that Google may not even spider them all. Googlebot may grab a few of them to look, and if too much of the content is similar, it won't even look at the others. Just a guess, but I think that there's some kind of similar content rating that gives the lowest status to duplicate pages and increases the status gradually and gives the highest status to pages with very original content.
| 4:09 pm on May 18, 2004 (gmt 0)|
Nice theory SlowMove, but based on my site and its content, it doesn't hold up, unless you count navigation as similar content.
I wrote to Google about this a couple of days ago. They sent a response today. They said that my site was fully indexed according to their records.
I don't know if they didn't understand the question, or if they don't care as long as the site is indexed, but it wasn't the answer I was looking for.
| 4:18 pm on May 18, 2004 (gmt 0)|
Agree with Yowza - for my sites, all pages are unique content excluding the navigation. If Google is considering navigation items to be 'duplicate content' then they have a serious problem, so I am pretty sure it is not that.
| 6:55 pm on May 18, 2004 (gmt 0)|
no title / no snippet has different causes.
One of them is similar content. We were able to get pages out of the no title / no snippet state by making the pages more different from each other. Be aware that Google is not crawling those pages too often, so it takes some time before those changes take effect.
As I mentioned, not everyone who has pages with no title / snippet can solve their problem like that, but if your pages do have very similar content then this is what worked for us.
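If you want a rough number for how alike two of your own pages are before and after rewriting them, one common textbook measure is Jaccard similarity over word shingles. The Python sketch below is only that: the function names and the shingle size of 3 are arbitrary choices for illustration, and nobody outside Google knows what measure (if any) they actually use.

```python
def shingles(text, size=3):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = text.lower().split()
    if len(words) < size:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Shared shingles divided by all distinct shingles, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


print(jaccard_similarity("red widget for sale today",
                         "red widget for sale now"))
```

Run it on the visible body text of two pages; a value close to 1.0 means they are nearly the same text, and lower values mean they differ more.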
| 10:09 am on May 19, 2004 (gmt 0)|
>> I just checked again and all the pages had titles and snippets. <<
Don't forget that Google has many datacentres, each with a slightly different index and algorithm in use. When you search again, it is likely that the results are coming from a different datacentre to the one that supplied you just a few minutes previously. That usually explains results that change from minute to minute. Search using one consistent Google IP if you really want to see the real changes in one version of the index, rather than a different sample from a random version of the index each time.
I have some pages with no title or description too; just the URL shows. They are pages that I put a robots noindex meta tag on, several months ago. Google refuses to completely forget about them.
| 2:40 pm on May 19, 2004 (gmt 0)|
No title / no snippet in my eyes just means that Google knows about the page but is not considering it at the present time to be in SERPs. There are many reasons for it
- Google is not able to crawl the page anymore (or was not able to crawl it)
- page has content that is very similar to other content
- technical problem of any kind