| 3:15 am on May 21, 2004 (gmt 0)|
>Google is crawling more actively than ever before, with fresh pages appearing every single day.
this is your logic:
-Google is crawling more actively than ever before, therefore Google is not out of index space (Google does not necessarily include everything that it crawls. do you dispute this?)
-fresh pages appearing every single day, therefore Google is not out of space (are you blind to the fact that pages are also disappearing? have you analyzed the net total of fully indexed pages?)
| 3:20 am on May 21, 2004 (gmt 0)|
|Google is crawling more actively than ever before, with fresh pages appearing every single day. |
Yep. As Steveb also said in an earlier post, it doesn't go as deep, so we have a very different situation than the ole deepcrawl that would grab every page every month or so.
Now, main pages get freshed and tagged almost daily, and pages a few clicks down might go 6 weeks between bots. For a site like mine, with about a dozen pages that serve as branches on the tree linking to all other pages, including the new ones, and also a whole whack of old static field note pages that never change, it's a perfect arrangement. We write up notes/articles etc, then link to them from the appropriate main page, and within 1 to 3 days the new page has been found and is in the index.
Imho, pages that are totally static, several clicks away from the index.htm, really don't need to be crawled that often. If it's a trade between deepcrawls, and hyperfreshbots hitting the main pages every day, I'm with the freshbots.
Sorry if I wandered off topic... we only have about 250 indexed pages and none have gone missing.
[edited by: Stefan at 3:27 am (utc) on May 21, 2004]
| 3:20 am on May 21, 2004 (gmt 0)|
its quite funny really.. space is cheaper than ever, and a team of engineers building a company to sell would easily work around any space or indexing constraints...
the duplicate content issue is still a possibility on the larger sites too;
if i cut and paste something off a large site, bet you it will end up with a url only listing (or the original, or someone else's cut and paste, etc..), and it's possible on these large sites, especially ones like Microsoft, that people have cut and pasted or quoted info from their sites on numerous pages round the net...
renee, how are you checking for url only listings? can you sticky me or show me the search string?
actually, just checked and google seems to have gone and added about 200 new url only listings today, but as you will see from my previous post, the number of indexed pages has remained the same. Strange thing is that a lot of the NOINDEX pages have popped back up as URL only again!
| 4:00 am on May 21, 2004 (gmt 0)|
>its quite funny really.. space is cheaper than ever, and a team of engineers building a company to sell would easily work around any space or indexing constraints...
as i said in my previous post, the problem must be a lot more difficult than just disk, memory or time constraints. it could be algorithmic or address limitations - but this is pure speculation.
>the duplicate content issue is still a possibility on the larger sites too
i don't believe duplicate content has anything to do with it. i have direct proof that google does index duplicate pages; it just does not rank or present them in the same serp.
>how are you checking for url only listings?
just do a search: [site:yourdomain -word1 -word2 ...]
where word1, word2, etc. are words that are in your pages but not in your url. try "the" as one of the words; it usually eliminates a large chunk of fully indexed pages.
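The search-string trick above is easy to script. This is just a sketch of building the query text; the function name and word list are my own choices, not anything from Google.

```python
def url_only_query(domain, words):
    """Build a query to surface URL-only listings for a domain.

    site: restricts results to the domain; each -word excludes pages
    whose indexed text contains that word. Since URL-only listings have
    no indexed body text, subtracting common words like "the" leaves
    mostly URL-only entries in the results.
    """
    exclusions = " ".join("-" + w for w in words)
    return "site:" + domain + " " + exclusions

print(url_only_query("example.com", ["the", "and"]))
# site:example.com -the -and
```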
| 9:10 am on May 21, 2004 (gmt 0)|
> pure speculation
Yep, and address limitations have been denied in the past (maybe someone remembers the threads?)
|Big sites suffering no title / no snippet in SERPS |
Is google penalising big sites?
Sometimes a large site is indexed less because it is less well linked (Googlebot won't crawl as deeply without PageRank and deep links from other domains)
Sometimes a large site is indexed less due to a penalty ("slow death").
Sometimes a large site is indexed less due to changes to Google's crawling patterns.
| 9:26 am on May 21, 2004 (gmt 0)|
"Sometimes a large site is indexed less due to a penalty ("slow death"). "
is anybody on this board still able to talk about this penalty which causes slow death?
If bad neighbourhood or dupe content (in Google's case, IMO, dupe = mirroring) were the only causes... innumerable sites would die together.
| 9:53 am on May 21, 2004 (gmt 0)|
Essentially, this is either a bug or it's by design. If it's by design, the only likely reason is some sort of capacity issue. If it's a bug, the possibilities are almost endless.
My vote - a bug. However, I think it's worth noting that few key pages seem to have vanished, so this could be part of a new algo.
A lot of time has been wasted collectively on this - Google know about this by now and either they'll fix it or they won't.
However, if someone wants to spend time investigating another theory - how about this. Perhaps Google have reduced the timeout setting on their robots so that sites have to respond more quickly. There have been comments about increased spider activity so this is plausible (but unlikely).
| 10:08 am on May 21, 2004 (gmt 0)|
"Perhaps Google have reduced the timeout setting on their robots so that sites have to respond more quickly."
- Most of big/performing sites are suffering from this hard luck (lemme call it hard luck)
- It's obvious that big performing (major bread and butter) portals would take special care of their server/host to avoid timeouts, bandwidth problems, or any other server-related issue.
So IMO, this should not be the possible cause.
| 10:29 am on May 21, 2004 (gmt 0)|
|If it's by design, the only likely reason is some sort of capacity issue. |
or some sort of penalty.
| 11:32 am on May 21, 2004 (gmt 0)|
|- Its obvious that big performing (major bread and butter) portals would take a special care of their server/host to avoid Time Outs/ Bandwith problem/ any other server related issue. |
Popular sites aren't always the fastest. If robots timed out after five seconds and only two retries were attempted, a lot of pages would go missing.
OK, that's simplifying things somewhat, but a change of policy in this area might explain this problem. However, I think this explanation is unlikely.
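To make the timeout-and-retry idea concrete, here's a minimal sketch of a fetcher that gives up on slow pages. The five-second timeout and two retries are just the numbers from the post above, not anything known about Google's crawlers.

```python
import urllib.request

def fetch_with_retries(url, timeout=5.0, retries=2):
    """Fetch a URL, returning None after the initial attempt plus
    `retries` extra attempts all fail or time out.

    A crawler configured this way would simply drop pages served by
    slow hosts - they would never make it into the index.
    """
    attempts = 1 + retries
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except OSError:  # URLError and socket timeouts are OSError subclasses
            if attempt == attempts - 1:
                return None  # the page "goes missing" from the crawl
    return None
```

A tighter timeout or fewer retries raises the bar on how responsive a host must be, which is the policy change being speculated about here.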
| 6:04 pm on May 21, 2004 (gmt 0)|
I just read through the last couple of posts and I have to say that it's really annoying that there are people who keep on insisting that it's a bug without having analyzed it.
There are of course some big sites that have missing pages, but that's more natural. Also, they don't have any supplemental results and no grey-toolbar pages.
And there are other sites that have lots of missing pages (more than 20%), whose page counts keep shrinking, and which show supplemental results and grey-toolbar pages a bit after.
We should focus on the second ones I guess because it's a penalty that affects lots of sites from various sectors.
| 6:08 pm on May 21, 2004 (gmt 0)|
yep - this topic is long overdue to be done.