I'll be honest here from the get-go. Until I really started digging into the guts of the new monitoring engine in SAP BusinessObjects Business Intelligence 4.0 (BI4), I had never even heard of Apache Derby. I thought it was a place where you went to watch horse racing and sip mint juleps, or a place where you went to smash your jalopy into someone else's jalopy. It turns out that Derby is a database that runs entirely inside of a Java Runtime Environment (basically in-memory) and is widely embraced by developers because of ease of use and no install footprint. So is it strange that I've never heard of it before? Not really, when you consider that I am an administrator of enterprise applications and servers, and not a developer.
Thus begins my argument. Apache Derby does not belong in enterprise software. (Please read all the way through; then, if you are still moved to disagree, let's discuss in the comments.)
First, let me state my case as to why I am concerned. As I said, I did not even know Apache Derby existed before delving into the BI4 monitoring engine, which uses Derby to store the monitoring trend metrics. So I did a little experiment. I logged onto a Windows server running SAP BusinessObjects Enterprise XI 3.1 to see if the Derby files were a part of the previous release of the platform. Lo and behold! They were. But only in two places:
In XI 3.1, Derby is a part of the Business Process BI and dswsbobje web applications. Chalk this up as my learning experience for the day. Apparently these applications haven't overly suffered from the inclusion of Derby (although I could start another argument about the Query as a Web Service app, a.k.a. QaaWS, but not today).
Now to compare, I went onto a Windows server that has SAP BusinessObjects BI4 SP04 installed on it:
WOW! So someone at SAP who thought using Derby was a good idea in XI 3.1 seems to think it was a REALLY good idea. Notable applications that use Derby include the Data Federator Service, Visual Difference engine, Lifecycle Manager, and the Monitoring engine Trending DB. It is clear that Derby is very much a part of the SAP BI4 platform.
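If you want to repeat this little experiment on your own servers, here is a rough sketch in Python. It is not necessarily how the searches above were done. The function name and the `derby*.jar` file-name pattern are assumptions for illustration only: Derby's libraries usually ship as jars with names like derby.jar, but a vendor could repackage them under another name, so an empty result doesn't prove Derby is absent.

```python
import os
from fnmatch import fnmatch


def find_derby_jars(root):
    """Return the sorted paths of files under root whose names look like Derby jars.

    Assumption: Derby is shipped as jar files named derby*.jar
    (for example derby.jar). Repackaged copies will not be found.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare in lower case so DERBY.JAR and derby.jar both match.
            if fnmatch(name.lower(), "derby*.jar"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Point `find_derby_jars` at the root of your own installation directory and print what comes back; comparing the list from an XI 3.1 server with the list from a BI4 server gives you the same before-and-after picture.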
So, SAP BI4 aside, the general question remains: does Derby belong in any enterprise software?
Out to the interwebs I went, looking for any other answers out there to back up my gut hunch. I found three.
Take first, for example, this discussion where Apache Derby performance is characterized as "disappointing".
Second, in this discussion thread, a Derby expert from Oracle defends the use of Derby in enterprise software, but really fails to see he countermanded his own argument.
"> is there any limitations of derby...?
Yes. However, every big enterprise application I know of encounters database system limitations, and deals with them using the standard techniques of big enterprise applications: partition and replicate your data, update it asynchronously, distribute it over multiple machines, etc.
As with all database applications, step 1 is your database design, and the basic principles are both database-independent and well-established over decades of experience. So long as you follow those, Derby works well to surprisingly large scales."
Countermanded his own argument! How can we, as enterprise application administrators, partition, replicate, back up, or tune a database we didn't even know existed?
Third, I found this YouTube video of a presentation given by a Derby expert at some conference somewhere, where he flat-out says that Derby is NOT for enterprise applications because of performance. At 12:52 he starts talking about size, calls it "not an enterprise-caliber DB", and then says plainly that it is only for "Small Business Applications".
Now, I will openly admit that each of these links is several years old. I was hard-pressed to find anything more current.
But does that mean that Derby has matured by leaps and bounds and these same arguments don't hold true? Not in my book. Code bases don't change THAT much over the course of 3 or 4 years. Features get added, sure. But short of a total rewrite, that legacy code is still in there somewhere.
The existence of legacy code explains why we see errors resurface in BI4 that were fixed back in XIr2!
Now, I'm going to have you indulge me while I voice my opinion. Dissenting opinions are welcome in the comments below.
Let me restate my position: Apache Derby does not belong in enterprise software.
1. Developers make super-cool applications and components of enterprise software, but in my experience, application developers make really crummy database architects. Derby is a crutch used by developers to circumvent traditional database design restrictions. Because Derby exists in a local Java Runtime Environment, the app developer can do whatever they want with the database, often without any need to consult someone trained in the art of database design.
2. Derby uses up crucial resources on the enterprise server. Take BI4 as my case in point. As a 64-bit application, part of the benefit is that I can now use as much memory as the operating system can see. But my total memory is being leeched by these multiple Derby instances, which each need memory (and disk for storage) to operate. Making matters worse, I, as the enterprise application administrator, have little or no way to control the memory or disk consumption of those Derby instances. I'm totally a slave to whatever heap and disk settings were put in place back during the development phases at SAP. I don't even get to specify where the Derby files get written. Whacked much?
3. Derby was not designed with high availability in mind. If the lights go out, or the server crashes, what happens to my data? How robust is Derby to be able to handle fail-over and clustering? In-memory it is, but SAP HANA it's not.
4. Why this huge divergence from the tried and true System Database? As the enterprise administrator, I should get to choose the database platform and have a team of qualified DBAs to manage it, back it up, etc.
5. Evil silos of data exist where they should not. This is something that we, as practitioners of analytics, fight on a daily basis. This is the whole argument for having a data warehouse as the single version of the truth. Now we have to fight them from within the application as well as from without. All data is valuable, even the data being generated by my BI4 system. Audits, industry and regulatory requirements are becoming more stringent year after year, and I want the clearest insight into the operations of my analytics system possible. Evil silos of data make that goal a real challenge.
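On point 2, even if you can't control how much disk those Derby instances eat, you can at least measure it. Here is a minimal sketch, again in Python. It assumes that a Derby database directory can be recognized by a `service.properties` file sitting next to a `seg0` data folder, which is the on-disk layout Derby normally uses; the function names are made up for this example, and nothing here is specific to how SAP lays out its files.

```python
import os


def is_derby_database(path):
    """Guess whether path is a Derby database directory.

    Assumption: a Derby database holds a service.properties file
    and a seg0 directory containing the data files.
    """
    return (os.path.isfile(os.path.join(path, "service.properties"))
            and os.path.isdir(os.path.join(path, "seg0")))


def directory_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def derby_footprint(root):
    """Map each Derby database directory found under root to its size in bytes."""
    sizes = {}
    for dirpath, dirnames, _filenames in os.walk(root):
        if is_derby_database(dirpath):
            sizes[dirpath] = directory_size(dirpath)
            # Do not walk into the database itself; it was measured as a whole.
            dirnames[:] = []
    return sizes
```

Run `derby_footprint` against your installation root on a schedule and you have a crude trend line for every one of those little silos. It tells you nothing about memory use, which is the harder half of the problem.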
Time for a little rumination on my part, then I'll stop (I promise). This large increase in the use of Derby in the BI4 platform says a few things to me.
First, it screams that the development teams are not talking to one another enough. When I look at all of these little pockets of data, it seems disjointed to me and not clean and unified as I would expect this release to be. A lot of time was spent making the front end look clean and unified; the same attention should have been paid to the back end. This brings to mind the Steve Jobs biography by Walter Isaacson, where Steve talks about learning design concepts from his dad and how good design goes all the way through, even to the parts that nobody sees.
Second, BI4 is for analytics folks. We call ourselves analytics folks, because, well shucks, we care about data. None of us likes to see data siloed, dirty, and unmanaged. Since I can't see into those Derby databases, I have to assume the worst. I have to have a System DB and an Auditing Data Store anyway. Why not just continue along that model?
Why this sudden splintering into a dozen different little memory-leeching databases scattered across my platform? This makes the internal data from my BI4 system extremely difficult to analyze. As analytics people, we practice bringing disparate data sources together in order to gain insights. Evil data silos get in the way of those insights, big time.
I'm asking all of these questions because I care. I really like BI4. It is such a huge improvement from previous versions and I like working with it. But this is a disturbing trend that makes me nervous.
My challenge to SAP: if whatever version of BI4 goes into ramp-up next year (October 2013) is not 100% Derby-free, let me at least have the option to switch every component that uses it into either the System DB or the Auditing Data Store. Please give me the choice. Derby should not be a part of the BI4 platform. It does not fit in the mix as a sustainable, maintainable part of a high-performing analytics application. It leeches system resources from the server, and creates pockets of isolated, unmanaged data. It is time to go on a Demolition Derby.
OK, I know I've made some strong (but hopefully constructive) criticism here. Let's discuss!
Please vote it up if you agree with me.