| This 94 message thread spans 4 pages: 94 (  2 3 4 ) > > || |
|The Technical Aspects of running a One Person High Traffic site.|
One person's experience.
Technical Discussion continued from here [webmasterworld.com].
I'm also more interested in the technical aspects of your server performance Markus.
Any threads you wanted to start on that subject I'm sure would be well supported....
[edited by: Woz at 10:46 pm (utc) on Mar. 17, 2006]
[edited by: tedster at 4:39 pm (utc) on May 28, 2007]
|It was a lot harder on the technical side of things. I wrote every line of code on the site, and I custom built every single server. Man, did I ever learn a lot about running high performance sites. |
Strictly from a technical perspective, I think it would be pretty darn interesting to learn about the kind of performance issues that you had to deal with as you ramped up your traffic. I tend to think most of the performance issues would have been caused by tasks that are mostly CPU-bound and not network-centric. I think a discussion on technical aspects of how to scale the infrastructure using home-brewed servers would make very interesting bedtime reading. A separate thread, maybe :)
Congratulations on your success, Markus.
|Markus, it seems that you did great work saving on server costs. If I'm not wrong, you are using mainly good old static html files on your high traffic site, giving up dynamic bells and whistles to reduce server load and increase speed. |
Oh, God, how can you tell ...
Well, here it is: the site is TOTALLY dynamic. He is using a technique known as URL rewriting, and the site runs on Asp.Net, by the way.
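For readers who have not met URL rewriting, here is a minimal sketch of the idea: a static-looking address is mapped to a dynamic handler before the request is processed. It is written in Python for illustration only, not in Asp.Net, and the patterns, page names and the `rewrite` function are all hypothetical; nothing in the thread says what the site's real rules look like.

```python
import re

# Hypothetical rewrite table: static-looking URL pattern -> dynamic handler URL.
# These patterns and target pages are invented for illustration.
REWRITE_RULES = [
    (re.compile(r"^/member(\d+)\.htm$"), r"/viewprofile.aspx?id=\1"),
    (re.compile(r"^/city-([a-z\-]+)\.htm$"), r"/search.aspx?city=\1"),
]


def rewrite(path: str) -> str:
    """Return the internal dynamic URL for a static-looking path.

    Paths that match no rule are returned unchanged.
    """
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path


if __name__ == "__main__":
    for url in ("/member42.htm", "/city-new-york.htm", "/about.aspx"):
        print(url, "->", rewrite(url))
```

The visitor (and the search engine bot) only ever sees the `.htm` address, while the server still builds every page dynamically.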
I have 4 servers.
1. DB server
2. Web server, handles 1 million pageviews an hour at peak. No static pages at all, way too slow. All pages are gzipped on the fly.
3. Mail server. Handles 1 million emails/day and also has a webserver that handles an instant messenger. That translates to 4-5 million polling pageviews/hour at peak.
4. Image server. Like all major sites, it serves images to a massive content distribution system/cache.
5. Outbound traffic is 70 to 100mb/sec. If it were uncompressed it would probably run at 140mb/sec.
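As a rough illustration of the on-the-fly gzip and the bandwidth figures in the list above, here is a minimal sketch. It assumes Python's standard `gzip` module rather than whatever the Asp.Net/IIS stack does internally, and the sample page and the names `gzip_body` and `bandwidth_saving` are made up for the example.

```python
import gzip


def gzip_body(html: str, level: int = 6) -> bytes:
    """Compress a rendered page the way a server would before sending it."""
    return gzip.compress(html.encode("utf-8"), compresslevel=level)


def bandwidth_saving(compressed_rate: float, uncompressed_rate: float) -> float:
    """Fraction of outbound traffic saved by compression."""
    return 1 - compressed_rate / uncompressed_rate


# A made-up, repetitive HTML page; real pages will compress less well.
page = "<html><body>" + "<div class='profile'>member row</div>\n" * 200 + "</body></html>"
compressed = gzip_body(page)
ratio = len(compressed) / len(page.encode("utf-8"))

if __name__ == "__main__":
    print(f"sample page shrinks to {ratio:.1%} of its original size")
    # Figures from the post: 70 to 100 compressed versus an estimated 140 uncompressed.
    print(f"saving at 70 of 140: {bandwidth_saving(70, 140):.0%}")
    print(f"saving at 100 of 140: {bandwidth_saving(100, 140):.0%}")
```

Taking the 140mb/sec estimate at face value, compression would be saving somewhere between roughly 29% and 50% of the outbound traffic.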
I have a sort of AI that I built that handles the site for me. When you've got 2 years of steady growth you can build something super fancy to automatically deal with problems as they come up. I don't have a single competitor with under 200 servers and 200 staff.
Thanks Markus - useful info.
If you don't mind me asking, what is the spec on the box running the DB server, and what backend is the DB?
|Image server, Like all major sites it serves images to a massive content distribution system/cache. |
This is something I've been looking into recently, but decent information about dedicated image servers is thin on the ground. Is it running Apache/Squid, or something else?
I'll risk a guess that Markus is mostly on a Microsoft platform...
I think he is, because he said it was run from his home server then moved to a dedicated server. .NET + home looks like a Microsoft environment.
Still cannot believe MS can handle that many dynamic pages.
Still cannot believe MS can handle that many dynamic pages.
Guy, Asp.Net flies! (the old asp was really bad).
|Markus007 wrote: |
No static pages at all
That's impressive. Usually dynamic pages are a great load for servers.
|Fischermx wrote: |
it is using Asp.Net
Yes, with aspx file format for searches, etc.
Url rewrite seems to be working well to get a good number of Markus' pages indexed by Google (almost a million and a half from his main site).
I thought he would not use it (it is similar, in Asp.Net, to Apache's mod_rewrite), to keep bots from slurping dynamic pages masked as static *.htm too fast. I was wrong, and it's indeed excellent that he can solve all these dynamic performance problems. Well, in fact I'm on Apache/Linux, with no experience of Microsoft's Asp.Net.
I hope Markus can tell us more details about how he optimized and solved the technical issues because, as he said, his site really seems to use fewer resources than usual for that much traffic.
|I ran my site off my home computer, spent 1 grand on adwords to start it off, 2 grand on link buying, and the rest on a server and a few ads on high traffic cheap sites |
This is actually some key advice. The problem with many site owners is that they do not want to spend ANY money. This niche was thought to be saturated back in 2000. Look what pumping a few $ into the site in 2003 did.
Markus, if you discuss the technical side some more, can you say what the site ran on before .Net? Was it ASP?
I would also be interested in knowing what database is being used to support all the traffic.
MySQL could handle that site no problemo, with the right hardware.
I had a hybrid of asp and asp.net for a while. The more I learned asp.net the more pages I ported over, and all new stuff was in .net. Took me a while.
In public filings by match, americansingles, myspace, friendster etc they all have the same pattern. They all need huge numbers of servers to accomplish anything.
Taken from a public filing of matchnet PLC Jan 2003, when they had sub 9 million pageviews/day.
"we currently own approximately 300 web servers, 40 database servers and 12 file servers"
The fact that my site has 10 to 20 times the technical complexity of their sites, due to location based searches on every pageview, and 14 million pageviews a day gives you an idea of what I've accomplished. Not to mention that all the above-mentioned companies have 200 to 600 employees each.
At the end of the day, it's all about algorithms and AI and not about platforms/languages.
Thanks for posting the server list, very enlightening.
But just how fancy are those servers, are we talking hundred thousand dollar machines here?
I ask because I looked up Wikipedia, and it looks like they are running about 7x your output, but it takes them 171 servers. Seems a wiki page request should be less intensive than most of your pages, so either you have some mega expensive servers or your code is magic to run on 4 servers.
WebmasterWorld is a good example: it gets a ton of traffic and is on one server. I know Brett has mentioned what it is before but I can't find it. I did see that he got 10 to 20 million page views a day just in bots back before the changes [webmasterworld.com]. There are forums out there that have less traffic and use many servers. Markus is right, it is all about putting thought into it. A big company does not think like that. They just throw money at everything. They hire programmers who just know what they were taught in school and the propaganda MS spews out. No one teaches efficiency.
I don't doubt the money that is made but the technical claims seem a bit sketchy to me... especially running the whole operation single-handedly for only 1 hour each day...
350 db inserts each second is over 30,000,000 each day, and that doesn't include retrieving the content for the massive numbers of visitors. Caching would no doubt come into play, but that's still a ton of activity... SQL is certainly capable, but it would take a significant amount of hardware and some knowledgeable DBAs for upkeep... but all the servers built by you by hand?
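The load figures quoted so far in the thread are easy to sanity-check. The sketch below just does the arithmetic in Python; the helper names are made up, and the inputs are the numbers as posted (350 inserts per second, 14 million pageviews a day, 1 million pageviews an hour at peak), not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
SECONDS_PER_HOUR = 60 * 60      # 3,600


def per_day(rate_per_second: float) -> float:
    """Convert a steady per-second rate into a daily total."""
    return rate_per_second * SECONDS_PER_DAY


def per_second(count: float, period_seconds: float) -> float:
    """Convert a total over a period into an average per-second rate."""
    return count / period_seconds


inserts_per_day = per_day(350)                              # 30,240,000
avg_pageviews = per_second(14_000_000, SECONDS_PER_DAY)     # about 162 per second
peak_pageviews = per_second(1_000_000, SECONDS_PER_HOUR)    # about 278 per second

if __name__ == "__main__":
    print(f"{inserts_per_day:,.0f} inserts/day at a steady 350/sec")
    print(f"{avg_pageviews:.0f} pageviews/sec on average")
    print(f"{peak_pageviews:.0f} pageviews/sec at peak")
```

So 350 inserts per second held all day does come to just over 30 million, and the quoted peak is under twice the daily average rate.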
If you have found such a great architecture I would say you should drop adsense and make more than $1 million each DAY consulting for google/yahoo/ebay/hotmail/etc. The company I work for (in the top 10 for traffic) would definitely pay top dollar, as the database always causes the most problems and runs on hugely expensive hardware. It would be nice to consolidate it to one box that I could buy from Fry's. :)
Can you sticky me your site?...I would be interested in seeing it.
A couple of questions, if they are OK to ask and you don't mind answering them, please...
What type of database setup do you use (i.e. database name, level of normalization, amount/percentage of "logic" in database, etc.)?
When you mention AI, what kinds of things are you talking about?
What's your background/education in IT? A CS/EE degree, self taught, etc? What sources do you find most useful for learning/keeping up technically?
Again, I'm not sure you have time to or feel comfortable sharing the above, so if you prefer not to, I understand.
It's just SO NICE to know that well applied theory and reason have proved to work so well in this age of fad, fashion and "huddle with the herd" IT infrastructure and practices. You have no idea how great it is to read your descriptions of your setup and approach to the technical side of things!
Thanks again and best wishes to you and yours,
What exactly is an image server?
Just a webserver with a special task?
Or is it another protocol? a webservice?
I know it's hard to believe; whenever I'm exposed to new things it takes me a while to believe :)
My db is a quad dual-core Opteron box with 32GB of RAM. Unlike the other sites I don't have a bunch of 8 way servers. I've got the db CPU & disk bound! (you don't see that often)
My other servers are just ~$3k machines with 2 CPUs. Nothing special. The image server is just a server dedicated to storing and serving images.
All the stuff on optimization I learned on my own. I spent 2 months rewriting a billing system for one of the world's largest companies. They had a 32 CPU db maxed... I got the run time from 45 days down to 2 minutes. I had to deal with impossible fun stuff like select statements causing deadlocks because there was so much traffic.
From 1990 to 1995 some professors used several supercomputers and hundreds of computers to find a string of prime numbers. I spent half a year writing a program that made the search thousands of times faster by storing numbers in multiple dimensions and then scanning all dimensions at the same time via a wheel factorization sieve. I found a bigger string of numbers in under 2 weeks on a single machine. It's all public.
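For anyone unfamiliar with the term, here is a minimal sketch of a wheel factorization sieve. To be clear about the assumptions: this is a textbook-style illustration in Python, not Markus's program, and it does not attempt the "multiple dimensions" storage he describes, which the post does not explain. The wheel size (2 x 3 x 5 = 30) and all function names are choices made for this example.

```python
from itertools import cycle

# Wheel built from the first three primes: 2 * 3 * 5 = 30.
WHEEL_PRIMES = (2, 3, 5)
# Gaps between consecutive integers coprime to 30, starting at 7:
# 7, 11, 13, 17, 19, 23, 29, 31, 37, ...
WHEEL_GAPS = (4, 2, 4, 2, 4, 6, 2, 6)


def wheel_candidates(limit):
    """Yield every integer in [7, limit] that is not divisible by 2, 3 or 5."""
    n = 7
    for gap in cycle(WHEEL_GAPS):
        if n > limit:
            return
        yield n
        n += gap


def primes_up_to(limit):
    """Return all primes <= limit, visiting only the wheel candidates."""
    if limit < 2:
        return []
    primes = [p for p in WHEEL_PRIMES if p <= limit]
    composite = bytearray(limit + 1)  # composite[n] == 1 means n was crossed out
    for n in wheel_candidates(limit):
        if composite[n]:
            continue
        primes.append(n)
        # Cross out the odd multiples of n, starting at n * n.
        for multiple in range(n * n, limit + 1, 2 * n):
            composite[multiple] = 1
    return primes


def is_prime_progression(start, step, length, prime_set):
    """True if start, start + step, ... (length terms) are all in prime_set."""
    return all(start + step * k in prime_set for k in range(length))


if __name__ == "__main__":
    small_primes = set(primes_up_to(100))
    print(sorted(small_primes)[:10])
    # 5, 11, 17, 23, 29 is a five-term arithmetic progression of primes.
    print(is_prime_progression(5, 6, 5, small_primes))
```

The wheel skips 22 of every 30 integers before any sieving is done, which is where the saving comes from; a search for long progressions of primes needs far more than this, but the candidate-skipping idea is the same.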
Those 2 things taught me how to optimize.... I don't want to get into too many tech details because competitors would give anything to know what I'm doing. A few have even asked to license my software.
As for wikipedia & forum serving, those are extremely trivial: you could do 12-60 million pageviews a day off nearly any server as long as you have the drives to handle IO, or some kind of html cache. Also, last I read Webmasterworld had 10 or so servers?
|Also, last I read Webmasterworld had 10 or so servers? |
No, only one (I think it's a quad proc with 8gig RAM).
I know that efficiency is also always top of Brett's agenda.
I appreciate you wouldn't want to get into high detail specifics, but for the average webmaster without your level of experience in software/DB optimisation, what would you advise as the key areas to examine? A list would be incredibly useful ;-)
Is the image server based on something open-source, or did you custom write one?
|Markus007 said [webmasterworld.com]: |
I wrote every line of code on the site, and i custom built every single server.
It's going to be difficult to do something like this. ;)
|From 1990 to 1995 some professors used several supercomputers and hundreds of computers to find a string of prime numbers. I spent half a year writing a program that made the search thousands of times faster by storing numbers in multiple dimensions and then scanning all dimensions at the same time via a wheel factorization sieve. I found a bigger string of numbers in under 2 weeks on a single machine. It's all public. |
Yes, according to mathematical sites, Markus was in the team that discovered the largest known arithmetic progression of primes, a few years ago.
Yes, throwing CPU or extra servers at a busy site is likely to make it slower and more complex (like putting extra managers on a late project).
Definitely if you can find where the bottlenecks are and optimise those away, and re-engineer your algorithms to be:
(3) low O() ie O(n-log-n) is better than O(n^2)
(4) low or stable concurrent/peak memory
then you have a chance to serve lots of users off cheap hardware.
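To make the O() point above concrete, here is a small sketch of the same task done two ways. The task (checking a list for duplicates) and the function names are chosen purely for illustration; they are not taken from any site discussed in this thread.

```python
def has_duplicates_quadratic(items):
    """O(n^2): compare every pair of items."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_sorted(items):
    """O(n log n): sort once, then compare neighbours only."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    usernames = [f"user{i}" for i in range(2000)] + ["user7"]
    # Both give the same answer; the quadratic one does up to ~2 million comparisons here.
    print(has_duplicates_quadratic(usernames), has_duplicates_sorted(usernames))
```

At a couple of thousand items the difference is already noticeable, and it grows quickly from there: doubling the input roughly quadruples the work for the first version but only a little more than doubles it for the second.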
Ideally your system should be, like Markus' DB, nearly CPU and I/O bound, which means you are getting value from all of your hardware. Heavy caching in lots of memory is one way to push towards being CPU and network-I/O bound, for example. You'll generally find that being disc-I/O-bound doesn't take much effort(!), so reduce or batch up your writes, mirror/stripe your discs, spread load out across several filesystems/partitions/spindles, and don't allow your system to start paging, which kills performance.
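Two of the suggestions above, caching in memory and batching up writes, can be sketched in a few lines. This is only an illustration under stated assumptions: the `LRUCache` and `BatchWriter` classes are hypothetical, the `loader` and `flush` callables stand in for whatever database or disc access a real site uses, and a production version would also need locking and time-based flushing.

```python
from collections import OrderedDict


class LRUCache:
    """Keep the most recently used values in memory, up to a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss."""
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value


class BatchWriter:
    """Collect rows in memory and hand them to flush() in groups."""

    def __init__(self, flush, batch_size):
        self._flush = flush
        self.batch_size = batch_size
        self._pending = []

    def write(self, row):
        self._pending.append(row)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any pending rows as one batch."""
        if self._pending:
            self._flush(self._pending)
            self._pending = []


if __name__ == "__main__":
    batches = []
    writer = BatchWriter(batches.append, batch_size=3)
    for row in range(7):
        writer.write(row)
    writer.flush()
    print(len(batches), "disc writes instead of 7")
```

The cache trades memory for fewer reads, and the batcher trades a little latency for fewer, larger writes, which is the direction the post is arguing for.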
My old-and-slow 400MHz Sun T1 can serve several hundred concurrent users if pushed. (A few years ago, while doing some consultancy for a search company, we discovered that Linux/NT would simply collapse under equivalent strain, BTW, so clearly you have to pick an OS that is stable under heavy load. Linux is up to the job these days.)
PS. Looking at the site/app problems from a different perspective can really bring results. Not as impressive as Markus by any means, but my best end-to-end optimisation was 5000:1 for an investment bank so that they could value every instrument of a particular kind in the market every couple of hours rather than one per *day*! A more reasonable speed-up to hope for after a thorough re-analysis is 2:1 to 10:1, which means that you just don't need lots of extra servers and the complexity that goes with them!
>Outbound traffic is 70 to 100mb/sec
I'd be interested to know what kind of routers and NICs you use to cope with that level of throughput. Where are the bottlenecks?
So it seems to me that if anybody else did the same thing you did but had to hire out the db stuff they would have lost money or not made much. It seems like your whole thread about making $1 mil is very misleading. You did not make money with a website you just paid yourself a lot of money for being good with db's. Anyone else would have had to invest a ton of money when the traffic went up. You did something that only a few people in the world could do. Of course a lot of these guys get excited when they make $100 a day.
Thanks for sharing, Markus! I fully understand you not wanting to share more. So please don't, for your sake. No reason to risk your golden goose unless you decide to make sharing your "better way" your new career/avocation. Enjoy the fruits of your labors and thoughts, man.
Just can't say how nice it is to read what you have shared and done; just to know that at least one shining example exists of intelligently and reasonably approaching and solving a large problem of scale -- in contrast to the usual so-called "best practices" that make me shake my head at work.
BTW, a MAJOR tip of the hat to you, Markus for this: "I spent half a year writing a program that made the search thousands of times faster". It says a lot about you. And I'll bet you enjoyed it and found it quite satisfying, too. Congratulations, man.
I'll stop now. It's just SO NICE to "meet" (even online) a fellow soul who uses their BRAIN.
Way to go, Markus!
While I'm going to kind of agree with you in terms of markus being 1 in a million, I'll have to disagree with the misleading aspect of it.
I am 100% positive that you could have come close or met markus' setup spending a very small fraction of his claim of 1 million gross (or net).
With that said, one thing that folks need to understand upfront, and here I agree with ogletree, is that markus is one of a kind.
My web site is 10 years old. I have a rather diverse set of skills and graduate level education to include programming, data mining, networking etc. Even so, even with my background, I recognize that what he figured out is unique. You heard about the guy who invented BitTorrent? This is on that level.
Using painting as an example, while I would be a competent painter, markus is Rembrandt.
Not to take anything away from Markus, but I disagree. You can hire people who are good at this stuff to do all this for you, for relatively small amounts of money. Again, not to take anything away from Markus, but what he is doing is not revolutionary, trust me.
All I'm saying is don't let that hold you back from considering "big" projects. In fact, do not worry about scalability at all when you are starting a project. It's a waste of time and money, because there's a good chance your project will flop. If it takes off you can always make it scale later ... using profits to pay for it rather than risking your own money. Of course, if your revenue is not scaling at least as fast as your user base/resource needs, then you have a bigger problem ...
On top of this, a site doing that many pageviews should be making more money. Period. I can think of 10 ways off the top of my head to significantly increase revenues without disturbing the "anti" culture of the site. There's no rush for Markus to do this, obviously, but had you been doing it from day 1 you'd have more revenue to fund expansion.
His money was made after he did the fancy server stuff. Anyone else who tried the same thing would have hit a roadblock because of the huge expense of getting past the spike in traffic. Most people would not even know what they needed, so they would have to hire somebody to tell them what to do. Most of the time that person makes more money by setting up a complex system. In my experience consultants will not help you do this as efficiently and cheaply as possible. The misleading part is the idea that anybody can do this. That is not true: you can only do this if you have access to money or can code like him.
Even if I get a huge spike and make a lot of money with adsense, I will not see that adsense money for over a month, and almost two months if it starts at the beginning of the month. If you do not have the money right when the traffic spikes, you're dead and you lose that momentum. Most sites have a hard time with the Digg effect, let alone a sustained high level of traffic.
If the traffic comes at a slow, steady pace, nobody would start investing in a big infrastructure until it was too late. I guess if you were real frugal and saved your money as you grew you might be able to catch it in time, but you would be lucky to get half of what he did. It is hard to justify getting ready for that kind of traffic based on your current growth when very few people get to that level. That is what the '90s were about, and most of them went out of business when it never happened.
No doubt, those with only $250 to start a project are at a serious disadvantage no matter what they are doing. And for many of them, as soon as they start making a tiny bit of money they pull it right out to pay their overdue bills, etc. Those who have "it" figure out how to do what needs to be done, and never give up.