How do you guys handle this?

I am new to that many files on my site

         

dsz11

8:02 am on Jan 30, 2005 (gmt 0)

10+ Year Member



I see a lot of posts here where someone has just added 5000, 10000, or 15000 pages to their site. It's never been an issue for me, as I usually add 1 to 5 pages a day. But now I am working on a site that adds a hundred or so pages each day, so at a point not so far ahead I will have to deal with 10000+ pages. How do you guys handle this: do you just add every new page as a simple html page, or do you use a database to store the info for every page and one or a couple of templates (php, for example) to extract the info and build the pages? Would using a database be a FASTER/more reliable solution?
If my question is still unclear, let me add this. If I decide to use a database, I will not use it for any of its features (like searches or queries) but simply to store the same data in one big file and request it, let's say, 5000 times daily, instead of requesting 5000 out of 10000 files daily if I choose to store the same info in 10000 files.
My problem with 10000 files is that I tried making a folder on my computer with 5000 files, and my Win2000 machine definitely has problems opening this folder 2 out of every 5 times, so I suspect that I will probably have the same problem on my site's server. Am I right? What if these 10000 pages become 20K or 50K pages?

DSZ

roldar

9:14 am on Jan 30, 2005 (gmt 0)

10+ Year Member



I would recommend you go the database route. PHP / mySQL is an excellent combination for a database-driven site.

mySQL is extremely fast - you won't notice a decrease in speed if you switch from individual html files to mySQL-driven pages. In fact, it may even be faster, as your server won't have to wade through the 1000's of html files to locate the one you need.

If you're on a Windows server you could use ASP in place of PHP. I would recommend you stick with mySQL regardless of platform, however, as it is capable of holding millions of records per database without the performance hits MS Access would suffer. MS Access also has a limit on concurrent users before it starts throwing up. mySQL is also free, unlike some other comparable databases such as MSSQL or Oracle.

Other reasons to use a scripting language like php or asp:

You can use includes on your pages. If all your pages share a similar header, imagine what would happen if you had to make a change to all of them. You'd have to edit each html file separately. But if you're using PHP you can just include a header file; all you'll have to do is edit that one file and all the pages will change.
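A minimal sketch of the shared-header idea, written in Python rather than PHP purely for illustration. The names `HEADER` and `render_page` and the sample pages are hypothetical, not anything from a real site; the point is only that the header lives in one place.

```python
# Hypothetical stand-in for a PHP include: the header is defined once
# and every page is built by wrapping its own content around it.
HEADER = "<div id='header'>My Widget Site</div>"


def render_page(title: str, body: str) -> str:
    """Return a full HTML page that reuses the single shared header."""
    return (
        f"<html><head><title>{title}</title></head><body>\n"
        f"{HEADER}\n"
        f"{body}\n"
        "</body></html>"
    )


# Two example pages; changing HEADER above changes both of them.
pages = {
    "about": render_page("About", "<p>About us</p>"),
    "contact": render_page("Contact", "<p>Contact us</p>"),
}
```

Editing `HEADER` once is the equivalent of editing the one included header file.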

You can use mod_rewrite [on the Apache webserver] to rewrite URLs on the fly. What this means is you can make it appear as if you have 1000's of html files, when in fact the server is internally mapping a request for something like:

widget.com/title_taken_from_database.html

to:

widget.com/webpage.php?databasequery=135
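The mapping between a friendly filename and a database record can be sketched roughly as follows. This is Python with an in-memory SQLite database standing in for PHP and mySQL; the `pages` table, the `slugify` and `resolve` helpers, and the sample row are all assumptions made up for the example, and the actual URL rewriting on the server would still be done by mod_rewrite.

```python
import re
import sqlite3


def slugify(title: str) -> str:
    """Lowercase the title and turn runs of other characters into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


# Hypothetical table: one row per page, with a unique slug used as the filename.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, title TEXT, body TEXT)"
)
conn.execute(
    "INSERT INTO pages (id, slug, title, body) VALUES (?, ?, ?, ?)",
    (135, slugify("Blue Widgets: A Guide"), "Blue Widgets: A Guide",
     "<p>All about blue widgets.</p>"),
)


def resolve(path: str):
    """Map '/some_slug.html' to its (id, title, body) row, or None if unknown."""
    match = re.fullmatch(r"/([a-z0-9_]+)\.html", path)
    if match is None:
        return None
    return conn.execute(
        "SELECT id, title, body FROM pages WHERE slug = ?", (match.group(1),)
    ).fetchone()
```

Storing the slug alongside the record keeps the lookup a single query on a uniquely indexed column.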

I'm not an expert by any means - in fact, I was asking the same question as you about 6 months ago. I figured out the basics of php and mySQL within a few days, despite not really even knowing what mySQL was or how to use it.

So the moral of the story is that you should switch to a scripting language and database, and that it won't be hard :) The people around here are very helpful, and you can find answers to most scripting/database questions in Google.

dsz11

3:52 pm on Jan 30, 2005 (gmt 0)

10+ Year Member



>You can use includes on your pages. If all your pages share a similar header,
>imagine what would happen if you had to make a change to all of them. You'd have
>to edit each html file separately. But if you're using PHP you can just include a
>header file; all you'll have to do is edit that one file and all the pages will change.

That is really a great point, and I am putting a couple of big ++ in favor of using a database.

But let me get back to my original question. Can anybody positively confirm, and hopefully explain, why the following (1) is faster than (2):

(1)
1. Open page1.php
1.1. page1.php is found almost instantly, as the search is among only a few pages
2. Open PHP
3. Open mySQL
4. Find where record XYZ is in the database
5. Get record XYZ from that location
6. Include XYZ in the data to be sent
7. Send data

(2)
1. Open page1000.html
1.1. Find where the file page1000.html is in the file system
2. Get file
3. Send data

As I see it, point 4 of variant (1) must be significantly faster than point 1.1 of variant (2) to more than compensate for the additional steps required in variant (1). That would mean either that the database is designed to work with substantial amounts of data, or that the file system cannot handle more than a couple of hundred files well.

Is this the case or am I missing something?

DSZ

mincklerstraat

4:05 pm on Jan 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I haven't ever worked on a live site with that many pages (though my test server would have more than that with all the scripts installed), so you can take this with a small grain of salt, but I think that adding a database layer to your site would be more likely to slow it down than speed it up. Apache is very fast, and doesn't have problems returning pages quickly even when there are a few hundred 'sites' on the server with many pages per site.

Databases, however, are also very fast, and the difference in speed is likely to be minimal if you use a fast script that doesn't hit your database too hard, and you aren't on a server that's bogged down with too much traffic or other sites that do use processor-intensive scripts.

The database would help you a great deal in adding lots of information to your site fast. If you don't use any database-like features and it's just for page fetching, you'll have one very, very fast site indeed. Maintenance will be much easier - you won't have to run a search/replace on your template for changes, but just add the stuff you need to one file and it'll be there on all your pages. When you add or delete a page, links will be updated without you having to do anything. You'll have to think about your cache policy, though - if you don't, just try to mimic a caching scenario similar to what your server is already running. A good caching script will allow you to do this if it supports ETags and other cache-relevant headers.
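As a rough illustration of the ETag part, here is a minimal Python sketch of a conditional response. The `make_etag` and `respond` helpers are hypothetical, and the comparison is simplified on purpose: a real If-None-Match header can carry several tags, weak tags, or `*`, none of which is handled here.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the page content (a content hash, as an assumption)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a request with an optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client already holds this exact version: send headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A script that generates pages from a database has to do this itself, whereas a web server serving plain files typically handles such headers on its own.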

larshus

8:44 am on Feb 2, 2005 (gmt 0)



Maybe I'm losing track of what you're asking, but it seems to me that it's a management question. I would never in my right mind try to put 10K files in one folder. I use a database and folders for each case in a law firm. There are maybe 20 to 75 files TOPS in each case folder. I query using a "case number", which is also the folder name, and voilà, I have all the info I need on that case using the same results page for any case.
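When there is no natural key like a case number, one common way to keep any single folder small is to spread files over subfolders chosen from a hash of the filename. The sketch below is only an illustration of that idea in Python; `shard_path`, the `site` root, and the 256-bucket layout are assumptions, not something described in this thread.

```python
import hashlib
import os


def shard_path(root: str, filename: str) -> str:
    """Place a file in one of 256 subfolders chosen from a hash of its name."""
    bucket = hashlib.md5(filename.encode("utf-8")).hexdigest()[:2]
    return os.path.join(root, bucket, filename)


# Count how 10000 hypothetical page names would be spread over the buckets.
counts = {}
for i in range(10000):
    bucket = os.path.basename(os.path.dirname(shard_path("site", f"page{i}.html")))
    counts[bucket] = counts.get(bucket, 0) + 1
```

The same name always lands in the same folder, so a page can be found again without keeping a separate index.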

You may lose time with a database (maybe, I don't know), but if you write a good script to access your database and time it against your current structure, I bet there is only a microsecond of difference. Too small a difference to even notice.
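A timing test along those lines could look something like this. It is a rough Python sketch, with SQLite standing in for mySQL and a hypothetical `compare` function; it measures only the lookup itself on one machine, not PHP startup, the web server, or network time, so treat any numbers as indicative at best.

```python
import os
import random
import sqlite3
import tempfile
import time
from typing import Tuple


def compare(n_pages: int, n_requests: int) -> Tuple[float, float]:
    """Time n_requests random page fetches from flat files and from a database."""
    rng = random.Random(0)  # fixed seed so both variants fetch the same pages
    wanted = [rng.randrange(n_pages) for _ in range(n_requests)]
    with tempfile.TemporaryDirectory() as tmp:
        # Variant (2): one html file per page, all in one folder.
        for i in range(n_pages):
            with open(os.path.join(tmp, f"page{i}.html"), "w", encoding="utf-8") as fh:
                fh.write(f"<p>page {i}</p>")
        # Variant (1): one row per page in a single database file.
        conn = sqlite3.connect(os.path.join(tmp, "pages.db"))
        conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body TEXT)")
        conn.executemany(
            "INSERT INTO pages VALUES (?, ?)",
            [(i, f"<p>page {i}</p>") for i in range(n_pages)],
        )
        conn.commit()

        start = time.perf_counter()
        from_files = []
        for i in wanted:
            with open(os.path.join(tmp, f"page{i}.html"), encoding="utf-8") as fh:
                from_files.append(fh.read())
        file_seconds = time.perf_counter() - start

        start = time.perf_counter()
        from_db = [
            conn.execute("SELECT body FROM pages WHERE id = ?", (i,)).fetchone()[0]
            for i in wanted
        ]
        db_seconds = time.perf_counter() - start
        conn.close()  # close before the temporary folder is removed

    assert from_files == from_db  # both variants must return identical pages
    return file_seconds, db_seconds
```

Calling `compare(10000, 5000)` returns the two durations in seconds for the page counts discussed in this thread; the results will vary with the OS, file system, and how much is already cached.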

Hope that helps

charlier

9:06 am on Feb 2, 2005 (gmt 0)

10+ Year Member



It doesn't really matter unless there is a big problem with the OS. I manage sites on FreeBSD servers that have hundreds of thousands of pages, all of which are static pages generated from a database, AND they all have banner ads on them which are looked up on the fly and served from another database. So you can do both at the same time and still serve 10 pages a second with no noticeable delay. The biggest delay is always the transit time over the internet (50-500 ms or so), not the < 20 ms to grab the page to send. For a really big file system the number of disk accesses could be a problem, but with a large memory (a couple of gigs) most of the highly requested pages will be cached.
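Generating static pages from a database can be sketched in a few lines. This is a Python illustration with SQLite, not the setup described above; the `pages` table, the `demo_db` and `publish` helpers, and the `page<id>.html` naming are all assumptions for the example.

```python
import os
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with two hypothetical pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [(1, "<p>first page</p>"), (2, "<p>second page</p>")],
    )
    return conn


def publish(conn: sqlite3.Connection, out_dir: str) -> int:
    """Write every row out as a static html file; return how many were written."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for page_id, body in conn.execute("SELECT id, body FROM pages ORDER BY id"):
        with open(os.path.join(out_dir, f"page{page_id}.html"), "w", encoding="utf-8") as fh:
            fh.write(body)
        count += 1
    return count
```

With this approach the database is only touched when pages are regenerated, and visitors are served plain files.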