Forum Moderators: open


Too Flat?

Site directory structure too flat


savvy1

4:08 pm on Sep 14, 2002 (gmt 0)

10+ Year Member



Does anyone have an opinion on whether it'd be a Bad Thing to have a site which had say 15,000 pages all in / , ie no subdirectories whatsoever?

Something to the effect of blue-widgets-230.html, blue-widgets-2938.html, red-widgets-2338.html, etc.? It seems that, although the toolbar "guesses" PR based on how far down from / a page is, once true PR is calculated, it's more a factor of the link structure than the directory structure.

Or, is it some of both?..

But..when is TOO flat?

In any case, I think back to 10 years ago, going to a computer novice's house whose computer was "broken" because he had too many files in the root directory. He didn't understand that you could create subdirectories and therefore never did. :) All his files were in / (well, \ ).

I imagine a finicky googlebot who thinks that a site is too messy because it has 15000 files in the root directory instead of nicely sorting them into proper directories. Am I crazy? :)

The performance gain or loss of having that many files in a single directory on a machine is irrelevant to this exercise.

Opinions? What is too flat? Is there such a thing?

ikbenhet1

4:14 pm on Sep 14, 2002 (gmt 0)

10+ Year Member



I know for a fact that Google indexes at least 40 files per directory, probably many more, but I've never put more than that in one directory. So at least 40+.

It would also get a little messy in your root.

[edit] "PageRank is on a page-by-page basis" - so does this mean that every HTML file will be indexed, or not?

Anyway, I would never do this. At the very least I would make easily named subdirectories.

ciml

5:16 pm on Sep 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should ignore the Toolbar guesses, Savvy. PageRank is just about links.

If you have enough PageRank to encourage Googlebot to spider deep through the link structure, then I don't see why you couldn't get indexed.

Does anyone here have experience of thousands of URLs in one directory? (or that might look like they are)

bird

5:28 pm on Sep 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Depending on your OS and file system, having that many files in one directory will result in massive performance problems. In my experience, Google doesn't care about the number of subdirectories. The best ranking page on my site is three levels deep.

tedster

5:38 pm on Sep 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My biggest so far is just under 1,000 and I had no problems - in fact that site had very good Google results.

Flat is very, very good IMO. Short URLs are yummie!

Giacomo

5:45 pm on Sep 14, 2002 (gmt 0)

10+ Year Member Top Contributors Of The Month



But..when is TOO flat?

Never, I guess: I really can't think of any good reason why Googlebot should have any problem with a "flat" site. I don't think a site with many subdirs is bad, either, as long as the link structure is good and the URLs fit in your browser. ;)

In fact,

Internet Explorer has a maximum uniform resource locator (URL) length of 2,083 characters, with a maximum path length of 2,048 characters. This limit applies to both POST and GET request URLs.

However,

RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1 [ietf.org], does not specify any requirement for URL length.

:)
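Those IE limits are easy to check mechanically. Here is a minimal Python sketch; the limit constants come from the Microsoft figures quoted above, and the function name is my own invention:

```python
from urllib.parse import urlsplit

# Limits quoted from the Microsoft documentation cited above.
IE_MAX_URL = 2083   # maximum total URL length
IE_MAX_PATH = 2048  # maximum length of the path portion

def fits_in_ie(url: str) -> bool:
    """Return True if the URL stays within IE's documented length limits."""
    path = urlsplit(url).path
    return len(url) <= IE_MAX_URL and len(path) <= IE_MAX_PATH

print(fits_in_ie("http://example.com/blue-widgets-230.html"))  # True
```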

muesli

6:06 pm on Sep 14, 2002 (gmt 0)

10+ Year Member



Does anyone here have experience of thousands of URLs in one directory? (or that might look like they are)

I have millions of pages in one directory. Google has indexed about 120,000 of them, those that have a relevant amount of inbound links and PR (the rest is a bunch of pages of lesser importance).

No problems whatsoever. I think I have noticed a slight preference of Googlebot for pages with shorter filenames on my site, so all pages in the root directory shouldn't be a disadvantage IMO. However, as ciml points out, how many of your pages get indexed will depend rather on PR issues.

ciml

6:46 pm on Sep 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks muesli, I thought that someone here would have experience of this.

Giacomo

6:58 pm on Sep 14, 2002 (gmt 0)

10+ Year Member Top Contributors Of The Month



millions of pages? That's quite a lot of content! :)

If I may ask, what's your site about? (just curious)

slight preference of googlebot on pages with shorter filenames

Now, that's an interesting find: maybe a spam filter against keyword-stuffed filenames?

savvy1

7:40 pm on Sep 14, 2002 (gmt 0)

10+ Year Member



Great comments everyone..
No problems whatsoever. I think I have noticed a slight preference of Googlebot for pages with shorter filenames on my site, so all pages in the root directory shouldn't be a disadvantage IMO. However, as ciml points out, how many of your pages get indexed will depend rather on PR issues.

When you say filenames, do you mean filenames? Hehe, sorry, I just want to be clear: you mean short filenames contained in that one flat directory, right?

Interesting.

Giacomo

7:49 pm on Sep 14, 2002 (gmt 0)

10+ Year Member Top Contributors Of The Month



savvy1, since muesli's pages all reside in the same dir, I guess he's referring to the filename (not pathname) length.

brotherhood of LAN

8:00 pm on Sep 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Millions is a lot! :)
Someone here owns a healthy fraction of the G index ;)

Reading what GG has to say about dynamic URLs [webmasterworld.com], and thinking of how easy it is to create one page and many dynamic URLs, you can maybe see why they encourage showing dynamic URLs rather than something that looks like a file. :) Well, it's a bit 2+2...

I see the use of folders as good for humans at least; I don't see why Google should bother.

bcc1234

11:29 pm on Sep 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does anyone here have experience of thousands of URLs in one directory? (or that might look like they are)

Yeap.

Besides, there are so many technologies out there that the directory structure does not mean a thing any more.

In a URI, / is no longer necessarily a path separator.

Jack_Straw

1:50 am on Sep 15, 2002 (gmt 0)

10+ Year Member



Google cares nothing about file folders. The site directory structure is irrelevant to Google. What matters to Google is links. Forget about Google or SEO considerations when deciding on directory or folder structure, and let other considerations guide you in those matters. I have much experience with large flat directory structures, and they have no relevance to Google.

The site structure that is important to Google is the site link structure, not the site directory structure.

Of course, a popular way to organize your site is to have your site link structure shadow your site directory structure. If you are doing this, then we are really discussing the same issue.

But the question of how flat a site link structure can be is an important one we have had to deal with. It arises for us when the data we work with is inherently flat in nature (such as a big list of names) and there are a large number of pages to present. In this regard, it comes down to this:

Yes, a site (link) structure can be too flat.

If the number of child pages for a parent page is greater than the number of links that googlebot will successfully read off that page, then the site structure is too flat.

How many links can a page have? Here's how you can get a handle on that. Technically, the maximum page size visible to Google is 100K. So the question is: how many links can you put on the page without the total page size exceeding 100K? If the other page contents are brief, you use relative addressing, your page names are short, and your link descriptions are short, you will be able to maximize this number. My experience shows 1,000-2,000 links are reasonably possible. Extreme measures (no non-link content, a simple page, and short page names and descriptions) could do a lot better.
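The arithmetic above can be sketched quickly. In this Python back-of-the-envelope, the per-link byte cost and page overhead are my own assumptions, not figures from the thread:

```python
PAGE_LIMIT = 100 * 1024   # bytes Googlebot reads per page (the 100K above)
OVERHEAD = 2 * 1024       # assumed bytes for <head>, headings, etc.
BYTES_PER_LINK = 60       # assumed average size of one short anchor tag

# Integer division: how many whole links fit in the remaining budget.
max_links = (PAGE_LIMIT - OVERHEAD) // BYTES_PER_LINK
print(max_links)  # 1672, within the 1,000-2,000 range cited above
```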

So, if this works out to 1,000 links per page for you, then a site of a million pages will require only two levels below the home page: the home page links to 1,000 intermediate pages, and each of them links to 1,000 final pages.
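More generally, the number of levels follows from the fan-out. A minimal Python sketch of that calculation (the function is illustrative only, not anything Google does):

```python
def levels_needed(total_pages: int, links_per_page: int) -> int:
    """Levels of hub pages required below the home page to reach every page."""
    levels, reachable = 0, 1
    while reachable < total_pages:
        reachable *= links_per_page  # each level multiplies reach by fan-out
        levels += 1
    return levels

print(levels_needed(1_000_000, 1000))  # 2: home -> 1,000 hubs -> 1,000,000 pages
print(levels_needed(15_000, 1000))     # 2: two levels also cover a 15,000-page site
```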

[edited by: Jack_Straw at 2:07 am (utc) on Sep. 15, 2002]

savvy1

2:01 am on Sep 15, 2002 (gmt 0)

10+ Year Member



savvy1, since muesli's pages all reside in the same dir, I guess he's referring to the filename (not pathname) length.

Giacomo, I figured that, but I also thought there was a very slim chance he was referring to another site (and that being the reason he chose that model for the "million page" site). Rereading, I'm sure you're right, though.

muesli

12:48 pm on Sep 15, 2002 (gmt 0)

10+ Year Member



If I may ask, what's your site about?
My site is a community, and every registered user gets his own little website.

When you say filenames, do you mean filenames? hehe, sorry. I just want to be clear..you mean short filenames contained in that one flat directory, right?
I mean it in a more general way: I have noticed Googlebot not crawling certain pages. When I shortened the URL, suddenly it did. No real empirical data. The experience is also from other parts of my site, not the million pages; the URLs in question were all dynamic.

savvy1

5:08 am on Sep 16, 2002 (gmt 0)

10+ Year Member



Whoops, guess my first guess was right. :)

Thanks for the update, muesli. :)