Monday, January 11, 2010

Hello World! - The latest in a long line of things with this name

Of all the "Hello Worlds" I've cranked out as a programmer, this is perhaps the only one where the title has really been apt. Yes, I finally drank the delicious blogger kool-aid and decided to take a leap into blogdom. I guess this means I'll start using words like blogosphere, blogophilic and blogophobic. If I'm really successful, I'll even coin a new term in this blog. Here, let me try... ...It has to be a term that gets zero results when I google it. Ok, got one:

ablogual
An individual who neither blogs nor reads blogs. Not a member of the blogosphere due to either ignorance, lack of interest, a superiority/inferiority complex, or blogophobia. Sample usage: "That dude is totally ablogual. He's never even heard of Joel Spolsky."

Blogging, the sane approach vs. the Asaph approach

The sane approach to starting a new blog is to use one of the many fine blogging software packages freely available on the web so one can focus on writing insightful, witty drivel without getting bogged down in re-inventing the wheel. Being a programmer, I was of course tempted to embark on the largely pointless task of building yet another blogging engine. I actually started down this yak shave thinking it would be only minimally distracting.

Writing a blog engine is the new Hello World.

-- Jeff Atwood

Blogging software, just a bunch of static HTML pages, right?

Like many programming tasks, writing blogging software is deceptively simple at first glance. Being new to blogging, I thought: "How hard could it be?" The truly lazy would just roll a static HTML page for each blog entry. Right? Being slightly less lazy, I made my blogging software in Java and database driven (Are you impressed?). I got the basics for that up and running relatively quickly. After a few evenings of programming I even had a functional back-end admin console. Then it occured to me that user comments would be essential. The people must be heard! But I would have to prevent comment spam somehow. Being selectively pragmatic, I decided I'd be willing to deal with that manually in the near term. One yak shave lead to another and before I knew it, I was spending time writing a Gravatar URL generator and an HTML source code syntax highlighter.

But those were just the features I could think of. The real kickers were the features I didn't know I needed because I was a blogging noob. Turns out, there is a secret XML undercurrent that moves information around the blogosphere. I don't want to miss that ride. Every blog needs an RSS feed (maybe ATOM too), pings (but not spings), pingbacks, trackbacks, linkbacks, refbacks and smackbacks. Ok, some of those are redundant and at least one is made up. But the point is I underestimated the task. That's actually not surprising at all since as a programmer, I routinely do that. If I ever give you an estimate, go ahead and double it.

After about a month, I came to my senses and decided that I should at least bootstrap my blog with an existing free solution while I roll my own. Honestly, the project is shelved indefinitely. I haven't admitted defeat but I've become keenly aware that it's probably a waste of my precious time.

Why re-invent the wheel when open source wheel libraries already exist?

The obvious choice to start is Wordpress. It's mature, solid, scalable, supported by a thriving community, has a bazillion plugins, and it appears to be the leading open source blogging solution. It's written in PHP, which while not my first choice, is a language I'm very comfortable with. I test drove it on my laptop. The install was a breeze. I simply unpacked the archive, created a MySQL database for it, set some easy config parameters and fired it up. Hitting the site in a browser for the first time prompted me to create the database schema and with that the install was pretty much complete. It worked well right away and I was ready to install it on my production server. Only one thing stood in the way. My server runs Tomcat on port 80 and Wordpress runs on Apache. There is no shortage of instructions online for proxying requests from Apache to Tomcat but almost nothing for doing the reverse. The standard advice used to be to run Apache on port 80 for static content and proxy requests for dynamic content to Tomcat running on port 8080. But since Tomcat now serves content at speeds that compete with Apache, I don't think the setup makes sense for my servers. So I set up Apache on port 8080 and set off on the task of getting Tomcat to proxy requests to Wordpress. After writing a simple proxy servlet, I discovered that Wordpress breaks this scheme by detecting that the page was served from a non-canonical URL and issues a redirect. I started digging into the Wordpress internals only to find a total mess of thousand line long non-object-oriented PHP scripts. I tried tweaking the siteurl and home entries in the options table of Wordpress's MySQL database but I couldn't get them quite right. Going with an open source solution was supposed to save time, so it was at this point that I decided to stop short of tweaking Wordpress's ugly PHP code.

The obvious next step was to look for the Java community's answer to Wordpress. This search lead me to Apache Roller. The install was very similar to Wordpress; Set up a MySQL database, deploy the code, edit a configuration file, and proceed the rest of the way with a browser based install. I didn't like having to stick an Apache Roller properties file in Tomcat's lib folder. I think an application's configuration should be in the app's lib folder, not the webserver's lib folder. but after coming this far, I was willing to live with this annoyance. The other thing I noticed about Roller on my laptop is that it seemed sluggish. The web based install actually took over a minute to complete. I tried to just put it out of my mind. After all, Apache Roller is supposedly robust and scalable enough to run Sun Microsystems' blogs. So I went ahead and installed Roller on my 2 production servers which happen to be a couple CentOS virtual machines with a paltry 256 megs of ram each. One of the servers handles all my live web traffic and the other serves as a database server and failover web server. I immediately noticed the load on the primary web server server higher than before. I hadn't even posted a blog entry yet. A couple of days later, I started seeing OutOfMemory Exceptions in Tomcat's logs and getting error reports from users for one of my high traffic sites hosted on the same VM. I hadn't had this problem before installing Roller so I immediately pulled the plug and uninstalled Roller. The memory issues seemed to be immediately and permanently resolved. I imagine that Roller, which appears to include every Jakarta subproject under the sun in its lib folder, needs a lot more horsepower than my dinky CentOS VM or my MacBook Pro could provide. If anyone has anymore insight, please leave a comment.

So with that, the search for a lightweight java based alternative to Apache Roller and Wordpress began. This lead me to Pebble, an open source java blog server supporting all the basic features I was looking for. After the Wordpress/Roller excercise, I was expecting an uphill battle. Unlike the other 2 blogging engines I tried, Pebble doesn't use a MySQL database for its backend. Instead it's based on Lucene which theoretically should make the search function of my blog perform better than Wordpress or Roller which would presumably have to rely on MySQL's rather weak full text search capabilities. I did end up tweaking my Pebble instance a little bit. While customizing robots.txt, I discovered it was being served with a Content-Type: text/html instead of text/plain. It looks like the bug is due to the fact that .txt files are set to be parsed as jsp files and are getting the default jsp Content-Type. I worked around this by adding <jsp:directive.page contentType="text/plain" /> to the top of robots.txt and it solved the problem. I should probably submit a patch for that fix. So I held my breath and installed Pebble on my production servers. Unlike with Roller, they seemed to be unaffected, which is of course a good thing. Finally, I was ready to start blogging! Maybe... Hopefully... Meh. I'm sure something will go wrong...

So I've started a blog. Now what?

So what can you expect from yet another programmer's blog?

  • code
  • tips & tricks
  • marginally witty commentary about programming
  • humerous stories about code gone bad
  • programming related rants
  • book reviews
  • web site reviews
  • warnings about quirks and strange behavior that I encounter in various APIs or software packages
  • shameless self promotion
  • fawning over my favorite technologies
  • bragging about projects I'm working on
  • miscellaneous

Traditionally, Hello World! is supposed to be just a few lines. It should certainly all fit above the fold. But economy of expression is overrated and verbosity is a programmer's friend. This has been a little long but not a bad start. Ok, now that Hello World is finished, it's time to get some real work done.

4 comments:

  1. Welcome to the blogosphere! I have some bad news though. While you were getting your blog software setup everybody stopped blogging and started tweeting. Then while you were writing your hello world post, we entered 2010 and everyone is over tweeting. I'm not sure what's next. Maybe it's podcasts, don't ask me though I'm pretty out of touch with these things.

    ReplyDelete
  2. muzzafukas gotta blog sometimes .. I get it.

    ReplyDelete
  3. Wow, that was a lot of work for one post in three years!

    ReplyDelete
  4. Hey, you need to post some more post! :)

    Thanks for the homebrew formula, it'll be useful.

    ReplyDelete