Directory structure, index pages & crawl-ability

Does directory structure stop you getting crawled?

         

Dubya_J

3:47 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



Hi Guys,

My site's got some of its best pages in a folder 3 subdirs down, but they haven't been indexed, even though Google's been really great about crawling the rest of the site. And it's a biggy too!

It hasn't crawled anything from domain.com/sub/sub/thisparticularfolder.

It's crawled other pages in other folders that were 3 tiers down, and we were stumped royally as to why it hadn't crawled these. Especially when there were links to pages in the uncrawled folder on the home page & other key site pages (PR 6s & 7s).

The only thing we could find that was different about this uncrawled subdir was that it didn't have an index page.

Could this be the problem? Is Google crawling hierarchically and not crawling folders unless they can start with the index?

Your thoughts are warmly appreciated.

MeditationMan

3:59 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



I have directories with no index page and they get crawled.

Dubya_J

4:06 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



Is it ASP too?

Can't imagine it makes too much difference, but gotta cover off every variable.

Nick_W

4:08 pm on Dec 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, I do too. Maybe it's just 'one of these things' -- How long have the pages been there?

Nick

MeditationMan

5:30 pm on Dec 4, 2002 (gmt 0)

10+ Year Member



Hi Dubya_J

No, my pages are all *.html. What PR is your index page? Wondering if it's maybe low and Googlebot is rationing itself.

sun818

6:32 pm on Dec 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I tend to believe there is an upper limit [webmasterworld.com] of pages that get crawled based on your PageRank. As I receive more links to my web site from other high ranked sites, the crawling goes deeper with each update.

Dubya_J

12:35 pm on Dec 5, 2002 (gmt 0)

10+ Year Member



The uncrawled pages are linked to from the home page (PR8) and a few of the sub pages (PR 6s, 5s & 4s).

The thing that bugs me is that it's crawled other pages that are 3 tiers down in other folders, which seem to have very similar link relationships with the major pages on the site.

Interestingly enough, it seems to be doing a daily crawl of everything in the first and second subdirs, and that's enormous.

Like I said...stumped...royally!

ruserious

12:56 pm on Dec 5, 2002 (gmt 0)

10+ Year Member



I totally agree with sun818. There was a discussion about this here: [webmasterworld.com...] Unfortunately it is too old to reply to.

My site is all dynamically generated. There was a strong relation between the PR of the site and the number of pages shown in the google-index.

Chronology over the past months:
PR4-5 -> around 4400 pages
PR2-3 -> around 2200 pages
PR5-6 -> around 6600 pages

The interesting thing is, I got a link from a PR7 page mid-month, and a couple of days later the number of pages in the index went up and stayed there until the actual Google index update, when I got the new higher PR myself.

The concrete numbers may well have to do with the structure/link-depth of our pages, so that differently organized sites get different numbers, but I think the general idea that PR relates to the number of indexed pages definitely applies, IMHO.
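
Since link depth keeps coming up in this thread, here is a rough sketch of how one might measure it. It only illustrates that "3 folders down" and "3 clicks from the home page" are different things. The site map, the URLs and the `link_depths` helper are all made up for the example, and nothing here claims to reflect how Googlebot actually schedules its crawl.

```python
from collections import deque


def link_depths(links, start):
    """Breadth-first search: minimum number of clicks from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map (page -> pages it links to).
# These URLs are invented for illustration only.
site = {
    "/": ["/sub/", "/sub/sub/folder/page1.asp"],
    "/sub/": ["/sub/sub/"],
    "/sub/sub/": ["/sub/sub/other/page.asp"],
}

depths = link_depths(site, "/")
for url, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, url)
```

In this made-up map, a page that sits 3 folders down but is linked straight from the home page is only 1 click deep, while its neighbour in another folder is 3 clicks deep. Comparing the two figures for crawled and uncrawled pages is one way to check whether folder depth or link depth is the variable that differs.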