| 9:49 pm on Jul 11, 2013 (gmt 0)|
Did a little more searching and I guess the answer for me is "no". I'll probably give caching another try.
My problem is that I have 1.5 million versions of a page that used to take 9 secs to build and server, but I've added almost twice the data and with Bing and Googlebot crawling all those pages night and day, the serving is getting bogged down and doubling the serve time. During lulls in the crawling, the longer pages load as fast as the shorted pages did.
I really don't want the crawlers to slow down (it's been taking Gbot about 3 months to cycle through the site, before Bing joined the party last week, anyway), can't afford to add server capacity. The pages don't change too often. Each page may get viewed once a week and have content changed by users once a month.
| 1:34 am on Jul 12, 2013 (gmt 0)|
Correct about Ajax and caching, 9 secs is a long time, anything you can do to reduce it?
Page load time will effect page ranking and users opinion of the site.
| 1:49 am on Jul 12, 2013 (gmt 0)|
I've already split the 120 million records into 239 tables to get them into 500k records ea. Each page is an average 80 (0-300 range) record subset of the data in one of those tables. I guess caching is my best bet. Getting the intial cache saved while serving both bots and humans is still a big problem, probably take 3 months to get 80% cached. Maybe revert back to old version and run caching scripts at night.
| 5:09 am on Jul 12, 2013 (gmt 0)|
I take it the slow process it selecting the subset of the 120 million to go into each page ?
Wondering if there is scope to change table indexing to make selection faster. Is maintaining direct index between page(userId) and page records possible ?
Have you some profile tool to find bottleneck ?
| 1:03 pm on Jul 12, 2013 (gmt 0)|
Making AJAX Applications Crawlable:
| 3:11 pm on Jul 12, 2013 (gmt 0)|
Yeah, I saw that article, way too complicated for me.
Fortunately, my home-built caching system is working better than it did in earlier tests (or at least as well, back when I was trying to improve 9 sec load times, not 25 secs). The sites bogs down occasionally, but I'm not sure why it's not always bogged down, with GBot and BBot churning away. It should take about 3 months for most of the pages to be added to the cache and I feared that improvement wouldn't be significant for a few weeks at least.
The caching system introduces so much complexity, and I had forgotten to consider that when I view a page (which displays additional admin links/tools not meant for the general public) *my* view of the page would be cached and then available to the public.
I will, however, try putting some of the auxilliary info under Ajax control. These are essentially summary/stat views of the main content, heavy use of lookup tables or at least moderate use of mulitple queries. I might be able to Ajax a tiny portion of each record that *is* in the bulky main content. Most of this content doesn't need to be seen by the crawlers.
| 3:58 pm on Jul 12, 2013 (gmt 0)|
Sounds like you have an indexing issue. What database are you running?
| 4:12 pm on Jul 12, 2013 (gmt 0)|
mySQL. Pretty sure my indexes are ok. One can always tweak, but I'm not aware of any major issues with my indexes.
| 4:32 pm on Jul 12, 2013 (gmt 0)|
|Pretty sure my indexes are ok. |
have you tried using the EXPLAIN SELECT syntax for your queries so you can be "sure" your indexes are okay?
| 5:12 pm on Jul 12, 2013 (gmt 0)|
I have tables with tens of millions of records which I perform advanced calculations on -- none of which take longer than half a second to complete.
9 seconds for 500k records -- something's definitely amiss. The suggestion to run EXPLAIN SELECT may provide useful. That said -- even if the indexes may seem okay it may still be possible to improve them.
| 4:02 am on Jul 16, 2013 (gmt 0)|
Thanks so much for your comment, it made me re-evaluate some assumptions. Turns out the biggest problem was not the size of the table used in the main query, but the additional lookups in a related table performed on each record returned. I have a couple of options on that related data (Ajax it, or de-normalize it and pull it into an array with one query at the beginning of the process, reducing a couple of hundred queries down to one). I've been running with it totally disabled for over an hour now and am hoping the noticable improvement in CTR is directly related to the faster load times.
As for EXPLAIN SELECT, I wasn't familiar with it, and the only thing it turned up was one non-indexed columns in the WHERE, for a 1 char VARCHAR flag. Removing that WHERE reference didn't significantly affect load speed. Anyone have any thoughts about the value of indexing such a critter or not?
| 6:55 pm on Jul 20, 2013 (gmt 0)|
|de-normalize it and pull it into an array with one query at the beginning of the process, reducing a couple of hundred queries down to one |
^^^ Gets my vote... I run a couple DB driven sites and want speed. I couldn't care less about normalization. I can back up more often and just revert if there's ever an "integrity issue" due to using a single big table and all the info right there to grab with a single SELECT.
IMO: Normalization is great for a bank or somewhere like that where it's absolutely critical to "be right on every time", but for serving a website all it did for me when I tried it was turn a single "grab the info and dump it on the page" SELECT into about 10 related SELECTS that bogged things down and gave me a headache.