Our system has been under an extensive rewrite for the past two weeks. I’ve been sucked in 10 to 15 hours at a time (17 last Saturday) and can’t sleep because the entire project is so engrossing. Needless to say, incredible goals have been reached through hard work and severe perseverance; I believe this is the closest I have come to quitting programming and becoming a lumberjack due to the unimaginable strain on my life. L has been wondering where I have gone and why my office door has been shut so much, and poor Biggles has seen the park only three or four days in the past two weeks…one visit during which I was bitten on the arm by a dog and had to go home.
The dog bite inspired the idea of a post because it was a situation of looking at what was done and then fixing the gross number of errors made. It was similar to my current project…the errors were not made with malice or laziness, but out of pure ignorance (mind you, a portion of the 17 hours last Saturday was spent fixing errors of pure stupidity). When it came to this project, and also the dog bite, I learned a lot in hindsight.
Don’t Be an Idiot
If you don’t know what you’re doing, ask someone else for assistance so that you can learn. Don’t presume that your technical abilities are enough to conquer a method that you have never seen before. Reading a book for a few days about a concept before trying to work with it will save you twice as many days implementing it.
Benchmarking and profiling led to the discovery of the largest flaw in our system, which had gone unnoticed in my initial rewrites because the function was so central that I never doubted it was doing anything harmful. It turned out that a previous developer didn’t understand the basics of OOP, and a simple load of the page created multiple instances of the entire site (yes, entire. hence the bold, italics and underline). Sometimes up to 180 instances, requiring inexcusable amounts of memory and processing time. And the entire time I presumed that the problem lay within the database and that I was just an idiot when it came to administering MySQL.
An entire day was spent rewriting the base of the system and by morning the site was performing 10x faster and requiring 75% less memory for each load. I cried. Then slept for 12 hours.
Benchmarking & Profiling
Tracking time was the extent of profiling during the first build of this application, and it wasn’t even a profiling framework but rather echo statements inserted directly in the code. The system we have in place now keeps a thorough stack trace, memory usage by function call and class instantiation, system calls, database query tracker and analyzer, scope of variables and, yes, time tracking (as useless as it is anymore). Each page generation outputs a report that is human readable and separated from the content itself.
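To give a rough idea of the step up from echo statements, here is a minimal sketch of per-function tracking. It is written in Python purely for illustration (our system is PHP and does far more than this), and the names REPORT, profiled, render_report and build_page are all made up for the example. It records call counts, wall time and peak memory per function and keeps the report separate from the page content.

```python
import time
import tracemalloc
from collections import defaultdict
from functools import wraps

# Hypothetical report store: function name -> accumulated stats.
REPORT = defaultdict(lambda: {"calls": 0, "seconds": 0.0, "peak_bytes": 0})


def profiled(func):
    """Record call count, wall time and peak extra memory for func.

    Assumes Python 3.9+ (tracemalloc.reset_peak). Nested profiled calls
    reset each other's peak, so treat the memory figure as approximate.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        tracemalloc.reset_peak()
        before, _ = tracemalloc.get_traced_memory()
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            stats = REPORT[func.__name__]
            stats["calls"] += 1
            stats["seconds"] += elapsed
            stats["peak_bytes"] = max(stats["peak_bytes"], peak - before)
    return wrapper


def render_report():
    """Return a human-readable report, kept apart from the page output."""
    lines = []
    for name, s in sorted(REPORT.items()):
        lines.append(
            f"{name}: calls={s['calls']} "
            f"time={s['seconds']:.6f}s peak={s['peak_bytes']}B"
        )
    return "\n".join(lines)


@profiled
def build_page(n):
    # Stand-in for real page generation work.
    return "".join(str(i) for i in range(n))
```

Calling build_page a few times and then writing render_report() to a log gives one line per function. Query tracking and stack traces would hang off the same wrapper idea.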
It is not hard to get this into your system. Xdebug outputs some sexy stuff that can be read with KCachegrind (or the Windows port WinCacheGrind). After trying it I opted to use the system already built into our framework, with a few minor tweaks to control the output and force analysis on top of the core manipulations we enforce on some of our display functions.
Caching
Caching is amazing.
Our system initially had no cache, then right before being released from the previous developers it was given one point of cache. And all it ended up caching was a database query result, which our MySQL database was already caching. It was an embarrassing reality and very much a problem that I had helped to maintain. During an initial rewrite I then moved the cache object to hold the preprocessed content of the page and let memcache serve it right before display (still with some PHP processing required).
The system has gone through an extreme (EXTREME SCRIPTING!!!) overhaul the last four days. The results are magnificent. Thoughts and ideas on caching by level:
Top Level: Page Views
Files. Use files. I don’t care if you write your own method to do it or use a method that a hundred other people have written, but figure it out. Make sure that the system locks files while being read/written, that it uses an efficient manner for determining expiration of cache objects, and that caching new objects is possible through a front-end user (unless you want to develop a back-end method *drool*). It should be less than 50 lines and quick to write.
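For the curious, here is roughly the shape I mean, as a minimal sketch and not our actual code. It is in Python for illustration only, and CACHE_DIR, cache_set and cache_get are hypothetical names. It assumes a Unix-like server (fcntl.flock) and stores the expiry time in the file’s modification time, so checking freshness is a single stat call with no read.

```python
import fcntl
import hashlib
import os
import tempfile
import time

# Hypothetical cache location; any directory the web server can write works.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "page_cache_sketch")


def _path(key):
    # Hash the key so any URL maps to a safe file name.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest)


def cache_set(key, html, ttl):
    """Store a completed page for ttl seconds."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = _path(key)
    # Append mode: the file is not truncated until the lock is held.
    with open(path, "a+", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            f.truncate()
            f.write(html)
            f.flush()
            # The modification time doubles as the expiry time.
            expires = time.time() + ttl
            os.utime(path, (expires, expires))
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def cache_get(key):
    """Return the cached page, or None if it is missing or expired."""
    path = _path(key)
    try:
        if os.stat(path).st_mtime < time.time():
            return None
        with open(path, "r", encoding="utf-8") as f:
            fcntl.flock(f, fcntl.LOCK_SH)
            try:
                return f.read()
            finally:
                fcntl.flock(f, fcntl.LOCK_UN)
    except FileNotFoundError:
        return None
```

A page handler would call cache_get first and only build the page (then cache_set it) on a miss. Cleaning up expired files is left out here.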
Our system now stores files for each COMPLETED page view. No unprocessed PHP. We have a few locations on our pages that require the content to be generated on page load though, so I developed a quick interceptor for the cache object and inserted it into the caching system we are using in order to process and insert dynamic content. All of this without using any database connections either. The system needs to be as quick and dirty as possible so that your pages come up immediately.
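The interceptor idea can be sketched like this. Again this is Python for illustration, not what we run; the {{dynamic:name}} marker syntax and the names HANDLERS, dynamic and fill_dynamic are invented for the example. The cached file holds finished HTML plus a few markers, and the markers are filled in at serve time without touching the database.

```python
import html
import re

# Hypothetical marker left inside the cached page, e.g. {{dynamic:greeting}}.
PLACEHOLDER = re.compile(r"\{\{dynamic:(\w+)\}\}")

# Marker name -> function that produces the fragment for this request.
HANDLERS = {}


def dynamic(name):
    """Register a function as the generator for one marker."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


def fill_dynamic(cached_html, request):
    """Replace every marker in a cached page with fresh content."""
    def replace(match):
        handler = HANDLERS.get(match.group(1))
        # Unknown markers are dropped instead of leaking into the page.
        return handler(request) if handler else ""
    return PLACEHOLDER.sub(replace, cached_html)


@dynamic("greeting")
def greeting(request):
    # Uses only data already on the request: no database connection.
    return "Hello, " + html.escape(request.get("user", "guest"))
```

Everything outside the markers is served exactly as it sits in the cache file, which is what keeps it quick.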
Data Level: Queries
Memcache. I have grown a love-hate relationship with this software. Due to the latency issues I resolved to continue its use, and probably will in the future, but I would never choose to rely on it as a caching system in full.
Raw database query results are no longer stored in memcache. Instead we let the data be built into the object as necessary, then memcache the finished object with a fairly long life span on the webservers themselves. This avoids, as mentioned before, both the latency of retrieving the data from the DB servers and the processing time of building the objects. It’s easy to install. Easy to configure. Use it. Don’t abuse it. I really don’t want to talk about it anymore.
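A small sketch of the difference, in Python for illustration only. LocalCache is an in-process stand-in for a memcache client, and fetch_rows, build_article and get_article are hypothetical names, not our code. The point is that the cache holds the built object, so a hit skips both the query and the build step.

```python
import time

STATS = {"queries": 0}  # counts simulated trips to the DB servers


class LocalCache:
    """In-process stand-in for a memcache client on the web server."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires, value = entry
        if expires < time.monotonic():
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (time.monotonic() + ttl, value)


def fetch_rows(article_id):
    # Hypothetical query; a real one would go to the database.
    STATS["queries"] += 1
    return [{"id": article_id, "title": "hello", "body": "world"}]


def build_article(rows):
    # Stand-in for the expensive step of shaping rows into an object.
    row = rows[0]
    return {"id": row["id"], "heading": row["title"].title(), "text": row["body"]}


def get_article(cache, article_id, ttl=3600):
    """Return the built object, querying and building only on a miss."""
    key = f"article:{article_id}"
    article = cache.get(key)
    if article is None:
        article = build_article(fetch_rows(article_id))
        cache.set(key, article, ttl)
    return article
```

With a real memcache client the object would be serialized on the way in and out, but the flow is the same.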
Low Level: PHP
PHP is wonderful. I do love it as a scripting language. I do hope to one day look back and laugh as to why I stagnated so long with it. It’s kind of fast, reliable, and for the most part pretty versatile. But there’s one more thing you can do with it to beef it up. I had heard this little gem mentioned in the past but had yet to implement it.
eAccelerator is nuts. The performance increase we saw from it was incredible. On their home page they claim to increase the speed 1-10 times, and they were absolutely correct. Not only does it save the script in compiled form but it stores it in memory and allows multiple instances to access it (in our case we have up to 10 instances per server…multiply that by the number of simultaneous users). AND the damn thing optimizes the code. So far it has been my favorite addition because of the simplicity of install and how absolutely effective it was at speeding things up. On my personal development machine it took a page from 1.2 seconds down to 0.7 seconds execution time, roughly 1.7 times faster (mind you this is on a laptop and the power is nowhere near comparable to a web server, so you won’t get results that outrageous in a “real” development environment).
Final thoughts on caching
Take the time to read what everyone else has done. Use your connections to call and ask fellow developers/sysadmins why they did it, what kept them from using the other methods, and which avenues they regretted. Whether you have a system that needs to scale or not, I highly suggest practicing it now. In case your page gets hn’ed, reddited or dugg…or any other excuse you can think of. It’s an amazing tool and research has shown that responsiveness of a page is incredibly important to the user. For anything else it will make your page lightning-flash-fast.
Read Code
This has been something that has slipped past me from the beginning. I have seen it recommended on blogs, but never thought it to be productive. Read the code other people write. People better than you. Read every language you can. Try and figure it out if you’ve never seen it. Find multiple libraries that do the same thing and take a half hour to analyze them. Learn the tricks people have discovered. Through doing so you’ll watch your code become more elegant immediately, then slowly become more refined. I am still amazed at how a simple half hour to hour a day looking through open source projects related to my task can be so incredibly helpful to the skills I am building.
Breathe
In the past I have made rather poor decisions when faced with stress or challenges I could not (relatively) quickly solve. Stress would mean that it was time to have a drink and come back to it tomorrow. Which led to two more drinks and often a debauchery. Hurdles too high led to frustration which led to the worst thing of all, accepting something below my standards as a solution. My boss has been helping me a lot with the stress, though I’m not sure if he is doing so on purpose. He constantly reminds me to get up, take a deep breath, and find something fun to do for a short while, then reapproach the situation. When your hours start racking up to 60 and 80 a week it is hard to break away from the habit (the keyboard), but it’s incredibly important to do so in order to maintain sanity.
As for standards — I have never been so proud of a system. I believe everyone I know hears me talk about it enough to know it’s my baby. Probably too much. But I’ve also never relied so heavily on a system. It is currently my life line. My first real world situation where it’s make-it or break-it. On and on. I think that has played the largest role in me giving this product as much as I can, at the highest quality possible. Maybe you have to love it to treat it right?
Oh. And alcohol blows for productivity. My liver over the last two weeks has gone through such an awesome recuperation period I think it’s strong as steel now.
Final Thoughts
We post the new system in a week (hopefully we get the bugs out by then). We will see if all this mumbo-jumbo really means a damn thing.
Next blog post: my awesome Linux-based house-wide wireless media center idea, and how far I’ve implemented it (it rocks).




Greg | 04-Sep-08 at 6:47 am | Permalink
Thanks for that caching breakdown. I was wondering what you were doing for that. I’ve been wanting to mess with caching for years, but never had a good reason. You’ve convinced me to just do it anyways.