Not long ago I had a chance to visit the big data center at 365 Main Street in San Francisco. I was invited by friends to help them install the first servers for their startup, which is still in stealth mode. The data center was enormous, though my friends occupied only a small part of one rack not far from the Oakland Raiders and one floor up from Bebo, the social network bought not long ago by AOL. Bebo is a big hit in the UK and I found it odd that all those British profiles are hosted in San Francisco, eight time zones away.
We installed the servers — three little boxes and one big one — then fired them up. In moments the site was live, though with only a few alpha clients. The application, which I am sworn not to mention yet, is clever, but not particularly resource intensive. It’s just like any other web site only a little different in several good ways. The three little web servers just sat there blinking, doing their jobs. But the bigger box (a 3u, versus the 1u web servers, if you are keeping track) was wailing right from the moment of booting. Inside were three 15,000-rpm drives, a bunch of processor cores, and a ton of RAM — all of it thrashing away despite the small load. What was going on?
What was going on was the twilight of enterprise application development as we know it today. That 3u box was a database server that sits behind the three little web servers and makes this new enterprise work. And work hard, apparently, for all the drive thrashing and lights blinking.
We’re at an interesting point in the development of computer technology. Processors, having been failed somewhat by Moore’s Law in their attempt to become more powerful by widening data paths and raising clock speeds alone, have now resumed or even accelerated their performance growth by replacing one processor core with 2, 4, 8, and eventually hundreds of cores, most of them not really needed.
Processing power isn’t what binds enterprise or Internet applications today. I/O and disk access do that. Servers have one or two gigabit Ethernet connections, each of which could be easily saturated by an old Pentium 4. It’s the pipe that limits us, not our ability to pump bits through that pipe. Thanks to the gamers, I suppose, and to a surreal and not particularly useful competition between Intel and AMD the main server CPUs are barely sweating even though they are running the core business logic of the application. It’s the database server with its disk drives that is working so hard, grabbing data to feed the web servers seemingly just in time. But don’t blame the hardware here or even the disk drives — blame the database.
We’re at the apex of SQL database development. It’s 1890 and we make the best darned database buggy whips on Earth.
There is a better way to handle large volumes of data and that better way has been established, not surprisingly, by Google with its BigTable semi-structured database that essentially caches the entire Internet. HBase from Hadoop is the Open Source version of BigTable and both are rapidly making old SQL databases like Oracle and DB2 obsolete for certain users.
Amazon.com runs on an Oracle database, but one that was extended and optimized at a cost of more than $150 million. Amazon probably represents the most that one can do with SQL in terms of scalability. Anything bigger requires a completely new approach like BigTable.
Or maybe it isn’t so new at all. I recall something very analogous to BigTable during the network operating system wars of the 1980s. Microsoft had a couple dozen OEMs working on network operating systems based on the hierarchical file system of DOS 2.0 (Paul Allen’s last technical contribution to Microsoft). While a hierarchical file system may have made some sense for a workstation it made little to no sense for a server accessed by dozens of workstations in the view of the programmers at Novell, where Netware was being born at the time. Those guys ignored the hierarchy and wrote the entire File Allocation Table for each drive to memory as a single flat file called an Indexed Turbo FAT. Where the DOS-based network operating systems had to search the disk for files, Netware had the entire index loaded in memory and instantly knew where the target data could be found. The system was easily 100 times as fast. BigTable takes this a step further, I suppose, by ignoring the distinction between index and data, dramatically expanding the memory footprint but, at the same time, completely eliminating a retrieval step.
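To make the difference concrete, here is a minimal sketch of a scanned lookup versus an index held entirely in memory. It is not NetWare or BigTable code; the file names, offsets, and function names are all invented for illustration.

```python
# Hypothetical directory: (file name, disk offset) pairs. Names are made up.
records = [(f"file{i:05d}.txt", i * 512) for i in range(10_000)]

def scan_lookup(name):
    """Walk the entries one by one until the name matches (the 'search the disk' approach)."""
    steps = 0
    for entry_name, offset in records:
        steps += 1
        if entry_name == name:
            return offset, steps
    return None, steps

# Load the whole index into memory once, up front.
index = {entry_name: offset for entry_name, offset in records}

def indexed_lookup(name):
    """A single hash lookup; no scanning at request time."""
    return index.get(name)

offset_scan, steps = scan_lookup("file09999.txt")
offset_index = indexed_lookup("file09999.txt")
```

Both calls return the same offset, but the scan touches every entry for the worst-case file while the in-memory index answers in one step. The cost is the memory needed to hold the index.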
An irony of BigTable and Indexed Turbo FATs is that both Google and Novell were pretty upfront about what they were doing and why, yet competitors have remained bound to lower performing technologies because, well, just because.
Which brings us back once again to Oracle buying Sun, a deal that has continued to bug me because it didn’t make sense… UNTIL I thought about it in terms of the scalability of SQL architectures and market positioning.
Right now almost every web application has an Apache server fronting a database box running MySQL or a closed source equivalent like Oracle, DB2, or SQL Server. The data bottleneck in all those applications is the SQL box, which is generally doing a very simple job in a very complex manner that made total sense for minicomputers in 1975 but doesn’t make as much sense today. Five years from now the situation will be very different with HBase running everywhere, the dedicated SQL box eliminated completely, and the database shared across redundant web servers like a micro-Google.
Where does this leave Oracle?
It leaves Oracle bleeding its big stupid corporate customers for another decade but eventually losing both the bottom half of the market and the very top where applications scale to tens of thousands of servers.
Part of the distinction here is between running a mobile phone billing system in one case and Facebook in another. In the mobile phone example you’d better get all those minutes or money will be lost. But in the Facebook example reality is more approximate and if an update propagates slower than expected, well big deal, so you missed Little Johnny’s birthday pictures for an extra 20 seconds. There are even business software cases where this philosophy applies. Progressive Insurance, for example, is always ready to give you a comparison price quote for auto insurance not because they can generate that quote (and the price quotes of their major competitors) on the fly, but because THEY GENERATE A SPECULATIVE PRICE QUOTE FOR EVERY CAR IN AMERICA EVERY NIGHT. They don’t generate a quote when you call, they just access it because it is already done.
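The precompute-then-look-up pattern in the Progressive example can be sketched in a few lines. This is only an illustration: the rating formula, the VINs, and every function name below are invented, and nothing here reflects how Progressive actually does it.

```python
def price_quote(car):
    """Stand-in for an expensive rating calculation (hypothetical formula)."""
    base = 500
    age_penalty = (2009 - car["year"]) * 10
    return base + age_penalty + (150 if car["sporty"] else 0)

# Hypothetical fleet keyed by made-up VINs.
cars = {
    "VIN001": {"year": 2005, "sporty": False},
    "VIN002": {"year": 2008, "sporty": True},
}

def nightly_batch(all_cars):
    """Run once per night: compute every quote ahead of time."""
    return {vin: price_quote(car) for vin, car in all_cars.items()}

quote_table = nightly_batch(cars)

def quote_on_call(vin):
    """At call time we only read the stored answer; nothing is computed here."""
    return quote_table[vin]
```

The trade-off is the one described above: the answer may be up to a day stale, but the lookup at call time is a plain key read.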
So Oracle keeps the mobile phone company as a customer but doesn’t keep Progressive in this example. And in the long run there’s enough data redundancy built into the loosey-goosey HBase model that it becomes just as reliable as the more rigorous SQL model that it is inexorably replacing. That’s when Oracle loses the mobile phone company, too.
Larry Ellison won’t like that.
So what’s to be done? Buy Sun. Get into the database appliance business. Start selling highly-tuned database appliances that achieve the simultaneous goals of vertical integration (making profit on the hardware as well as the software), obfuscation (keeping the customers out of the lower-level code by encasing it in an appliance), and increased overall performance (putting off the inevitable loss of market dominance for another three years through a hardware tour de force).
IBM, as the other big SQL company, doesn’t really share Oracle’s problem, because IBM makes money from the hardware already. If DB2 gives way to something like HBase, IBM will run HBase on its premium iron — a luxury Oracle can’t share without buying Sun.
As hardware gets cheaper we extend performance by distributing software across more and more machines. But that distribution in itself undermines the lucrative software licensing system. So we introduce a new level of abstraction — the database appliance. Prices will go up a little while performance will go up a lot. Customers will think they are getting more for their money and they will be. But the ultimate comparison that has been at least postponed is between paid and free, where free always wins in the end.
And THAT’s why Oracle NEEDS Sun — to extend its current run by another three years, buying Larry time to write an Act II for his company.
Bob — there’s a piece yet to be addressed.
If flat-table HBase appliances are the wave of the future, what will that mean?
If we’re removing the current limiting variable (bottleneck) from the internet, what application or service will that liberate from bandwidth-chokedom?
Sure, it will make it easier for current businesses and models to scale a little better, but what new convenience will this presage?
I’ve been talking to a couple of folks about their experience with Hadoop (HBase/BigTable) and they feel like it’s not appropriate at all unless you have huge data sets. Mobile phone billing and Facebook are two very large data set users. I’m looking at databases and SOA and cloud from an enterprise IT perspective and trying to understand where things fit and what the options are.
If my contacts are right and HBase/BigTable is only useful for very large data sets, then only a small portion of our business would be able to use it. We just don’t have a large volume of data or volume of transactions for any one thing. We have lots of little or mid-size things, though.
I don’t know if Hadoop can do this, but I recall reading somewhere that folks are deploying MySQL and app server instances into the cloud (Amazon? EC2 not bigtable style? not sure), which is a little confusing if it’s a database on top of a (cloud) database. I’ll keep trying to figure it out and do more research, but so far it sounds like there’s always going to be a need for SQL servers for SMBs or Enterprises that don’t have tons of customers or don’t produce tons of things. If my contacts are wrong and Hadoop is more generally applicable, I’ll be looking at it much more closely.
I agree with what you said, Jerry.
Most people have huge amounts of small transactions, not large amounts of large SQL transactions.
There are umpteen companies (Netezza, Quantisense, data warehouse apps) that have solved the problem of large data crunching for the enterprise.
Bob has written about this before. There are other ideas for getting data for apps that are “more appropriate” than a SQL DB, but it’s not going away in the enterprise at all.
And SMBs make the economy go round. They make the Googles and Apples money.
I do not agree with Gerry.
When to use a MapReduce based, massively parallel data management solution like HBase? Any application where realtime querying of Big Data is required.
Small, medium and large enterprises generate Big Data all the time; often it is discarded or retained on short GFS (grandfather-father-son) backup cycles because it is not cost effective to store, analyse, report on or query.
* Hospital records
* Supermarket purchases
* Weather station reports
* Access logs (e.g. corporate web proxy, ISP clickstream, etc.)
* Banking related transactions
Try running an online query for a specific transaction against your local 7-11’s nightly master cash register report, for example.
Bob, Interesting article. Databases are pretty much at the core of everything and as you state, the technology of DB is pretty “old.” It will definitely be interesting to see what develops over the next few years. Will hosting companies continue offering MySQL or go to HBase?
Stan
Found a good presentation on DocStoc (similar to SlideShare) that seems to sum up the things I was wondering about.
https://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS
Looks like at the time of the presentation there were definitely reasons to go one way or another. Some of the other stuff I’ve been googling says that there may be some changes being implemented this year to make it more friendly to additional usage scenarios.
When SQL (and relational databases in general) first got popular, the idea was that they might not be FASTER, but they were easier to work with, less error prone. Three things ruined that:
First, competition between Oracle, IBM and Microsoft (among others) led to enormous feature creep, making the language itself much more complicated, adding procedural language features (looping, branching) on top of the “simple” structured query language and finally leading to major “language” differences in what was supposed to be a standard. Security features alone for the various databases are all over the map.
Second, the problems people were trying to solve got progressively more complex, and this tied in with the feature creep to make longer and longer Select statements, some of which, in production systems can take hours of study to understand, but more importantly, don’t often produce what you thought they would.
Thirdly, add to the above an influx of “database specialists” who are really not up to the task. Give them a simple Select statement of a few hundred characters and a few clauses and they are OK, but people today are working, often with code they inherited from someone now in an asylum, on single SQL statements that would span pages and pages of a book. And the hope is (almost universally untested) that one single do-all SQL statement will always perform faster than some much easier to understand programming logic involving multiple passes at the database (and therefore simpler SQL logic). Worse yet, while these monster SQL statements often pass unit tests, the programmer is never quite sure they produce the desired result in every single case (usually they don’t). Furthermore, once a problem is detected you have what may be an unrepairable mass of old data that is wrong, null fields where there shouldn’t be, etc. and new data that has completely different characteristics (just shut up and let the analysts discover how bad the data is on their own, or hope that they are incompetent too!)
I can safely predict that once feature creep sets into BigTable systems, and once the unwashed hordes start using them, we’ll again be twiddling our thumbs, watching the lights blink, and wondering if we are producing anything useful.
Macbeach,
I think you have pointed out a core problem for any data mapping simplification scheme, like SQL.
The bug reports from the big data mining software companies are usually for complex multi-conditional queries. This raises a question in two ways. For a given complex query, with multiple conditionals:
How does a vendor test for correctness of output for a given data set AND
How does a vendor test for correctness of query parsing and optimisation?
I suspect so far, badly.
Secondly, there seems to be a Gresham’s Law for technology. Cheap barely-adequate (or less) technology drives out good or better technology. Is relational database technology being replaced by merely adequate technology that is only feasible because disk and RAM are now cheap?
A consequence of this, if it is a trend, is that coders are getting less clued about what the data they are working with looks like, so they cannot write optimal code, so things run slower anyway.
Back to square one, get bigger hardware.
SSD drives will allow SQL databases to scale another 10-100x, keeping them unchanged for at least another decade.
There is a basic mismatch between a more-or-less carefully designed and implemented SQL database and the application – the DB designers haven’t a clue what queries the programmers REALLY need to run and there is almost no sensible channel to communicate that information in most design methodologies.
The result is that the initial release has abysmal performance until the DBAs get a clue, capture the actual SQL statements, analyse their use and rebuild the physical schema to support reality. The success of this approach assumes the modern RDBMS still supports this capability, but many easy-to-use products (MySQL, PostgreSQL) that only support filing-system tables effectively limit control to little more than adding indexes while preventing precise control over physical data placement. The physical schema has been kicked under the table because it’s ‘too hard to use’ by yer average DBA.
The problem is that the RDBMS is designed round the myth that a mission critical application needs continuously flexible and adaptable data access: it doesn’t. While a switched-on CIO or CFO may need this to support their data mining queries, the mission-critical application is in practice a relatively slowly changing entity that uses data access paths that any competent system designer can map out before any code is cut.
In other words, forget RDBMS and sexy OODB. Most mission critical systems would perform better if they were based on a DBMS that permitted the system designers to specify the data access paths at the same time as they write code module specifications.
So what database type provides this capability? Hierarchic (IMS) and flat files never made it because they force the design into their mould. Codasyl/IDMS databases are the answer.
These beat the crap out of RDBMS for mission critical systems simply because they let the system designers specify, analyse and optimise data access paths in a way that’s never been possible with RDBMS. Of course, this needs clever system designers, but then systems designed by idiots have always failed and always will: good system design is hard.
Wise comments. Most apps have very poor DB design. And most programmers know nothing about SQL or the best way to use SQL on the DB they will use.
You can’t abstract SQL b/c all DB vendors have strengths and weaknesses.
So usually (no offense to anyone) Java programmers are the worst in this area.
The same stuff you did for an Oracle DB you can’t do for MySQL.
As with ANY technology. The problem is the same.
Poor Communication, Poor Design, No Wisdom of the tech being used. MONEY. MONEY. MONEY.
Laziness.
You can apply this to anything in tech. Nobody REALLY knows what they are doing, and the ones who really know what’s right, nobody is gonna listen to b/c there is no money in that.
“Of course, this needs clever system designers, but then systems designed by idiots have always failed and always will: good system design is hard.”
In The Age Of XP/Agile/SCRUM, “Design” is utterly obsolete, TOTALLY!!! last century, just quaintly archaic.
What we need is more code!!! Today!!! Before close of business!!!
If it does something, terrific! If “Something” is productive, terrific!! Just make sure it’s glitzy, and cool, and in CVS by close of business!!!
dg has half the equation. Bob, you ignorant slut, have none of it. MySql is popular with the Web KoderKiddies because they either never read a database course in school, or didn’t understand it if they did.
Fact: whether Bob or the KoderKiddies like it or not, ACID has to be co-located with the data, or it doesn’t work. That’s why IBM invented CICS *before* Dr. Codd invented the Relational Database. CICS is the ACID provider for file systems. With CICS, all data access has to go through it, with its syntax.
With the KoderKiddies, and MySql, all access has to go through the applications, just as was true for COBOL programs BEFORE CICS. We’ve gone back to the future by 50 years. Dr. Bob/Mark, you should know better.
What dg points out is true: for transactional systems, SSD and 5NF databases don’t need petabytes of data just because they don’t have petabytes of duplicate, redundant data. Progressive, using a SSD DB2/SqlServer/Postgres application, could both generate quotes in real time, and have access to data from any application. That last part is the reason to use RDBMS. If you just want access to a fixed set of data from a single application, then gird your loins, and write bespoke I/O to your files. It will be faster, and you’ll keep all those programmers (which means that their managers will get bigger bonuses for having more staff) busy. If you want true efficiency in a transactional system, the RDBMS is your only hope.
Oracle is making a fool’s gamble: they’re betting that the arc of hardware progress can be ignored and changed. It can’t. Multi-core/processor Intel boxes connected to industrial strength SSD running real RDBMS will be cheaper and more efficient than any flat-file implementation. Whether they are connected locally or over the ‘net doesn’t matter.
The bottleneck exists because KiddieKoders are idiots, worshipping the Emperor’s New Clothes. There will be fortunes made by those who are just a tad smarter.
First knowledgeable response in a sea of incompetence – people who think mathematics should be replaced ’cause it’s old. The mind boggles.
merci’
I would go with HBase any day of the week if I built a webapp today.
Not that I’m a bad coder. On the contrary, I’m quite good.
Great article Bob, I use your market insight every day at work. I will check out HBase thanks to you.
bespoke I/O
What a gorgeous turn of phrase.
If you are around in this industry long enough, you find that apparent trends are really just cycles. For example, the Cloud that everyone is raging about is just a recycled idea of old that has gone around the industry too many times to count.
As each cycle nears its end, new ways of solving the same problem emerge, with elegance, simplicity, and a few engineering trade-offs. Like, do I go for performance? Simplicity? The peace of mind of ACID properties? Or the costs and benefits of free software.
The SQL cycle is one of the longest ones in the industry, and Oracle may sense its death rattle. I love the idea that this may have been their motivation to shake up the whole game and reinvent themselves as a growth engine – bravo!
Bob,
This was an excellent post. I’ve been saying the same thing for a while. There are also a couple other DB’s that are interesting: CouchDB and Tokyo Cabinet.
CouchDB is the most interesting to me right now because it’s written in Erlang. Erlang has picked up speed recently with Facebook using it for their chats (ejabberd) and with SAP researching it (https://www.erlang-factory.com/conference/SFBayAreaErlangFactory2009/speakers/SumeetBajaj). It would be very interesting if SAP moved away from the traditional DBs and moved towards CouchDB.
Recently I’ve asked if we even really need databases. Couldn’t you have a distributed filesystem that just writes documents to a B-tree and indexes all the metadata of its documents for searching? One that has a built-in Map/Reduce that would allow for pre-built searches to be done more quickly and only return the data requested.
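A toy, single-machine sketch of that idea might look like the following. It assumes an in-memory dictionary in place of a distributed B-tree, and the documents, field names, and helper functions are all invented for illustration; it is not how CouchDB or any other product implements it.

```python
from collections import defaultdict

# Hypothetical document store: id -> metadata plus body.
docs = {
    "doc1": {"meta": {"author": "ann", "kind": "invoice"}, "body": "pay 100"},
    "doc2": {"meta": {"author": "bob", "kind": "memo"}, "body": "lunch at noon"},
    "doc3": {"meta": {"author": "ann", "kind": "memo"}, "body": "ship friday"},
}

# Index every metadata field so searches never scan document bodies.
meta_index = defaultdict(set)
for doc_id, doc in docs.items():
    for field, value in doc["meta"].items():
        meta_index[(field, value)].add(doc_id)

def search(field, value):
    """Return the ids of documents whose metadata matches, in sorted order."""
    return sorted(meta_index.get((field, value), set()))

def map_phase(doc):
    """Map step of a 'pre-built search': emit one (kind, 1) pair per document."""
    yield doc["meta"]["kind"], 1

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

counts_by_kind = reduce_phase(
    pair for doc in docs.values() for pair in map_phase(doc)
)
```

Note what is missing: nothing here coordinates concurrent writers or guarantees that the index and the documents change together, which is the gap a transaction manager fills.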
>> Recently I’ve asked if we even really need databases.
Yes. One way or another (in the database, TPM, or your code) there will be ACID for transactional systems. If you don’t care about transactions, then whether you want to access using a SQL syntax, or something else, is up to you. It doesn’t make any difference.
If you do care about transactions, the database engine writers know more about how to do that than any other class of coder. There is no good reason to duplicate that effort. Use your time to add value.
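As a small illustration of what the engine gives you for free, here is a sketch using Python's built-in sqlite3 module. The table, the account names, and the `transfer` helper are invented for the example; the point is only that a failed step undoes the earlier step in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("alice", "bob", 30)
failed = transfer("alice", "bob", 500)  # would overdraw alice, so bob's credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits bob first and only then fails the balance check on alice, yet bob's balance is unchanged afterwards. Getting that behaviour out of hand-written file I/O is the effort being warned against.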
@Robert Young
I don’t think I understand what you are trying to get at.
File systems have file locks. If the file is locked, you can’t update it until someone else is done. If you get an error while updating a file, you merge the changes and then try and save again. If your machine crashes, there are file systems that record the changes in a ledger and pick back up where you left off. So you don’t lose any data. XFS, EXT3/4, ReiserFS all do this. I believe that ZFS, BTRFS and the HAMMER file systems have an MVCC write system so that you never write over data and that you just write a new file and link to the old one.
So that all your “transactions” are all taken care of by the file system. For some file systems, there is a file lock in case you try to edit a file someone else has the lock on, or you are appending to a file and have to do a merge if your file is in conflict. If the system crashes, on a single system, you have a ledger to pull the changes you’ve made or, in a distributed environment, you can use a checksum to see if there is a problem with the file and then resolve it.
But I may be missing what you are trying to get at.
>> So that all your “transactions” are all taken care of by the file system.
If that were true, no form of database engine (starting with IDMS, which became the CODASYL model) would ever have been written.
Wikipedia 101:
Open files and programs are not automatically locked in UNIX. There are different kinds of file locking mechanisms available in different flavours of UNIX and many operating systems support more than one kind for compatibility. The two most common mechanisms are fcntl(2) and flock(2). Although some types of locks can be configured to be mandatory, file locks under UNIX are by default advisory. This means that cooperating processes may use locks to coordinate access to a file between themselves, but programs are also free to ignore locks and access the file in any way they choose to.
In other words: if your program abides by *nix rules, then one process can access the file. There is no provision for multi-user access. This is equivalent to Serializable Isolation level. I suggest Gray & Reuter for transactions.
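The advisory behaviour described in the quoted passage is easy to see for yourself. The sketch below uses Python's `fcntl.flock`, which wraps flock(2), and assumes a Linux or other Unix system; the variable names are invented and the "processes" are simulated with separate opens of the same temporary file.

```python
import fcntl
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Cooperating opener A takes an exclusive advisory lock.
holder = open(path, "w")
fcntl.flock(holder, fcntl.LOCK_EX)

# Cooperating opener B asks for the same lock without blocking and is refused.
contender = open(path, "a")
try:
    fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
    contender_got_lock = True
except OSError:
    contender_got_lock = False

# A non-cooperating writer never asks for the lock, and its write goes through.
with open(path, "a") as rogue:
    rogue.write("written without holding the lock\n")

fcntl.flock(holder, fcntl.LOCK_UN)
holder.close()
contender.close()

with open(path) as f:
    contents = f.read()
os.remove(path)
```

The lock only protects the file from writers that bother to ask for it. Any code path that skips the `flock` call can still modify the file, so the coordination lives in the applications, not in the file system.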
1.) This posting convinced me to do a load of “homework”, as I was unfamiliar with Big Table, Hadoop project, HBase and Hypertable, learn a little more about MySQL, Oracle, db2 than I initially cared, and pore over the Novell User Communities Glossary. I certainly do miss the links you used to provide on the old PBS website.
2.) Very shrewd that Big Table is mainly used in house across a host of Google applications and services. But people are getting a taste of it when they use Google App Engine. Don’t be too surprised when Google is allowed to revamp the antiquated and overtaxed (no pun intended) system used at the IRS, and then implements and executes its system over the rest of the federal and state and municipal governments’ file systems. Many people will cry conspiracy, Illuminati and Big Brother, and there will undoubtedly be some abuses, but something like the re-organization of all government records is inevitable. I’m pretty sure that Google will be the monopoly that gets in first and does it best. Not perfect, but best.
3.) So Sun Microsystems gets to be Oracle’s stop-gap until Larry and his posse can think of an Act II? In ten years, Larry will either be a seventy-five-year-old jet setter or dead. Will he still be sharp, will he still care? I’m betting he’ll still be around AND he’ll move his company into consumer electronics. But why start up a new entertainment electronics company when there are others potentially up for grabs? Sony? Pass. It’s a complete mess anyway. Fixer uppers are for suckers. Get something closer to home, stylish and just works. Buy American, and you can’t get any more stylish and American and successful than Apple, Inc. Don’t be surprised if Larry makes a play in the coming decade, probably with Steve Jobs’s blessing. It’d have to be a merger on equal terms, no engulf, devour and subsume under the Oracle brand.
>Fixer uppers are for suckers. Get something closer to home, stylish and just works. Buy
>American, and you can’t get any more stylish and American and successful than Apple, Inc.
>Don’t be surprised if Larry makes a play in the coming decade, probably with Steve Jobs’s
>blessing.
@Kevin: You do realize that Apple doesn’t make their stuff in the US, right? It’s all made in China, same as ThinkPads for Lenovo, VAIOs from Sony, etc. Apple advertises “Designed in Cupertino” but that is about as American as kids who buy parts from NewEgg for a quarter the price of an Apple to build their workstation claiming it was “Designed in Mom’s basement.”
If you like the OS, buy a Dell Mini 9 and run it there. If you already own OS X it costs you about 20% as much as the Mac Fart. I mean the Mac Air.
Oracle and the other relational data base systems were the culmination of two important events. First was the relational data base approach to organizing data. Until then you used different types of data base approaches for different situations. The relational data base could be used in a broader number of applications, so it became the de facto favorite. This doesn’t mean the relational data base is the best choice for a given application. The second important event was the creation of SQL, structured query language. This provided the world with a consistent way to interact with the database. Before SQL, each database had its own way of interfacing with applications. SQL provided a “standard” way to do things. Once you had a standard data management approach and query language, then all kinds of tools and know-how could be developed to help with application development. Not unlike the original PC, a SQL relational database provided a “standard platform” for an industry to build upon.
For the world to evolve to another type of database management, similar things will have to happen.
>> For the world to evolve to another type of database management, similar things will have to happen.
Were the Google, etc. offerings “another type”, those of us who were around when the RDBMS was being developed from previous types wouldn’t be telling you all that the Emperor has no clothes. Fact is, what is being offered as “new” is rehashed badness from pre-history. The KiddieKoders neither were there nor have read up on the history. It’s as if a physicist today were insisting that the sun and stars revolve around the earth, denying Copernicus.
Dr. Codd had math to prove that the Relational Model worked. The “new” offerings have only back of the envelope scribblings as narrative justification. Such was the creation of both the hierarchical and network databases. XML is hierarchical, again. BigTable and such is just a file. Meh.
Without bullet proof transactional management, and universal access (using a language API, whether COBOL or Java doesn’t count), lots of time and money will be squandered on these “new” databases.
One problem with your mobile vs Facebook example. Web companies like Google have done studies that show that every fraction of a second of added latency can translate into a quantifiable number of lost users. The relationship is essentially linear (or in the linear domain of a sigmoid). Maybe web companies can tolerate greater latency variance, but they care about latency a *lot*, or at least the good ones do.
There is another element that has been introduced. Filemaker Pro can now act as a query front end on SQL databases. This makes SQL easier to access and much easier to work with. That may very well change the dynamics of this whole affair.
I respect Cringely’s opinions, but I think the article over-simplifies the situation.
Just because the new tools are better for some applications, or even if they are better for a lot of applications, doesn’t necessarily mean SQL is obsolete. Adding a saw to my toolbox doesn’t obsolete my hammer.
A lot of smart companies will jump to the new tools because they gain better performance or other advantages from doing so. A lot of other smart companies will remain with SQL because it works, and works very well, for what they need to do. SQL databases provide an excellent framework for expressing relationships among data entities in a rigorous way.
A lot of the applications I build are not performance-critical, but do need very complex data relationships and need to expose the data to ad-hoc queries by moderately-technical (but non-programmer) team members. SQL does that very well, and the apps have response times that are an order of magnitude faster than they need to be. Going from 0.1 second to 0.001 second response time is not significant to the end user. Yes, I realize that if we needed to scale up to a million users, the performance difference would matter. My point, though, is that not all apps are in that category.
Use the right tool, old or new, for the job at hand.
Scott
Bob, why ignore other strikes at the bottleneck, such as post-relational databases like InterSystems Caché?
On another note, Amazon flexed its muscles on caching (pun intended) servers a few years ago. They even tried some stuff from the Big O themselves, but it simply crashed too fast. So they started writing a cache server from scratch; simply nothing could cope with their loads, and oh boy, did they love their Oracle sh*t …
After using a multi-value database product for 25 years in a business, having to turn to SQL for other apps, in particular websites, is no fun. Getting things to talk to each other can be a real pain in the neck. Caché (InterSystems) is taking a stab at simplifying this process but is still fairly young and has growing pains, just as MySQL does.
The multi-value and post-relational databases by themselves run pretty much flawlessly when set up properly on a *nix server. It is usually not the database but a *nix quirk that will send a server crashing. In today’s world I’ve seen *nix servers up and running for years without a burp, although a scheduled shutdown does help every now and then to refresh resources. The problem is that on an active server, a single unplanned shutdown could cost a business much in lost revenue.
They (the MV databases) require fewer resources to run the same number of queries than the SQLs, RDBMSs, and Oracles of the world. Caché (InterSystems) is an implementation of one of the original MV (multi-value) databases, called MUMPS, and runs primarily in the healthcare industry. It has a post-relational back end and a SQL front end. I’ve looked at it and programmed with it, but the one I’m more familiar with is Pick Systems D3 (now called Raining Data).
For complex apps you can’t beat them when they’re designed right from the start. IBM bought one of them, called UniVerse, but they are more interested in DB2 revenues with expensive servers. The biggest problem I see with them is their expense; for a business, though, it’s no biggie to purchase and maintain yearly support contracts, considering their stability.
Ideally I’d love to see an MV player open up its code under a version of the GPL and then run its database against the RDBMSs of the world. The MV system would squash the RDBMS, but then again, who am I to say? Experience only counts for what you know and know well. Build a well-designed database with either and my experience shows my preference would be the MV; someone else may choose the RDBMS. Guess that’s why you have a choice. May the best database win!
You gotta be kidding. The MV model a) is ancient b) solves a problem that doesn’t exist. Nothing about the relational model necessarily forbids what MV calls multiple values in a database.
They’ve been duking it out for decades, MV lost. Condolences, as the Big Lebowski would say. Re-positioning these dinosaurs as ‘post-relational’ may fool a couple benighted developers, but most wouldn’t touch that stuff with a ten foot pole, unless they kind of want to throw their career in the toilet anyway.
I had decided to not waste the keystrokes on such drivel, but since you’ve engaged the heathen… Pick (Dick, and yes, that was his name) was created before Dr. Codd wrote his first paper. “Post-relational” is desperate hype; has been for decades. It was/is just a file system. Bah.
A competitor to HBase is HyperTable, an Open Source, High Performance, Scalable Database written in C++ currently in Alpha. Sponsored by Baidu and Zvents.com, this has a lot of potential. Keep an eye out for it.
For the record, Progressive generates quotes in real time.
Maybe I’m not understanding the whole issue, but to me it seems unwise to change database fundamentals just because you have a lot of data to handle. As I understand it, the big advantage of a system like Bigtable is that it allows the designer to store an arbitrary number of column values for each row, as opposed to the static number of column values you get with a relational database. But this problem can be overcome very easily within the relational framework by adding a layer of abstraction. Rather than store all your values in their own columns, you can just create a table of columns and establish a many-to-many relationship through an intermediary table. Once you have your application set up that way, it allows you to define as many columns as you want and assign them all values for each row without changing the application code or wasting storage space on unused columns.
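For readers who want to see the construct described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`item`, `attribute`, `item_attribute`), the helper functions, and the sample data are all assumptions made up for illustration; they are not from any system mentioned in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    -- the "table of columns": one row per column name ever defined
    CREATE TABLE attribute (attr_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- intermediary table: one row per (item, attribute) pair that has a value
    CREATE TABLE item_attribute (
        item_id INTEGER NOT NULL REFERENCES item(item_id),
        attr_id INTEGER NOT NULL REFERENCES attribute(attr_id),
        value   TEXT,  -- every value is stored as text, so types are not enforced
        PRIMARY KEY (item_id, attr_id)
    );
""")

def set_value(item_id, name, value):
    """Define the attribute on first use, then store its value for this item."""
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (name,))
    attr_id = conn.execute(
        "SELECT attr_id FROM attribute WHERE name = ?", (name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO item_attribute VALUES (?, ?, ?)",
        (item_id, attr_id, value),
    )

def get_values(item_id):
    """Return only the attributes this item actually has, as a dict."""
    rows = conn.execute(
        """SELECT a.name, ia.value
           FROM item_attribute ia JOIN attribute a ON a.attr_id = ia.attr_id
           WHERE ia.item_id = ?""",
        (item_id,),
    )
    return dict(rows.fetchall())

# Hypothetical sample data: two items with completely different "columns".
conn.execute("INSERT INTO item (item_id, label) VALUES (1, 'book')")
conn.execute("INSERT INTO item (item_id, label) VALUES (2, 'lamp')")
set_value(1, "author", "A. Writer")
set_value(1, "pages", "320")
set_value(2, "wattage", "40")
```

New attributes need no schema change and unused ones take no space, which is the appeal. The cost is visible too: values lose their types, and any filter on two attributes at once needs one join per attribute.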
Mark, while what you’re describing is certainly doable (it’s known as a “looping construct”), one of the problems with implementing it is that it winds up ignoring many of the strengths of the RDBMS. Initially it seems cool and clever, then you try to query against it and wind up having to reinvent the RDBMS within your construct within the RDBMS (try dealing with multiple levels of parent/child, or ACID). Hence things like SQL Server’s new “Sparse” columns. I’m not saying Hadoop & co. fix this issue, just pointing out that your idea has its own set of issues.
>> try dealing with multiple levels of parent/child, or ACID
Industrial-strength databases (DB2, Oracle, SQL Server, Postgres as of 8.4) all have common table expression syntax to deal with hierarchy intelligently. No need for IMS or XML. Really.
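As a rough illustration of the common table expression approach, here is a sketch of a recursive query walking a parent/child table. It uses SQLite through Python's built-in `sqlite3` module, not one of the products named above, and the `category` table, its rows, and the `descendants` helper are hypothetical. Exact syntax varies somewhat between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- self-referencing table: each row points at its parent (NULL for the root)
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES category(id),
        name      TEXT NOT NULL
    );
    INSERT INTO category VALUES
        (1, NULL, 'books'),
        (2, 1,    'databases'),
        (3, 2,    'relational'),
        (4, 1,    'fiction');
""")

def descendants(root_id):
    """Return (name, depth) for the root and everything below it, any number of levels."""
    sql = """
        WITH RECURSIVE tree(id, name, depth) AS (
            -- anchor member: the starting row
            SELECT id, name, 0 FROM category WHERE id = ?
            UNION ALL
            -- recursive member: children of rows already in the result
            SELECT c.id, c.name, t.depth + 1
            FROM category c JOIN tree t ON c.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
    """
    return conn.execute(sql, (root_id,)).fetchall()

for name, depth in descendants(1):
    print("  " * depth + name)
```

The point of the sketch is that the hierarchy lives in an ordinary key column and the multi-level walk is a single declarative query, with no navigation code in the application.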
heard it all before… SQL provides an important solution to a common problem. that will never go away. so what if it’s been around a long time? are we ditching our hierarchical file system? news just in: arithmetic now been around for too long, new fuzzy logic to replace it. the people who keep claiming the death of SQL are people who do not understand it or what it’s for…
oh and big table and others like it are nothing new. in fact they’re like SQL but without joins.
the reality is that as our reliance on computing increases, complexity increases, and it takes more skills to do the job right. we’re still in a phase where business tries to cut down costs with cheaper labor that is not qualified, hence the horror stories. but done right, SQL is irreplaceable for the foreseeable future.
Bob,
What is preventing Oracle from having a custom version of HBase, since it’s released under the Apache License?
Larry could call it OBase and it would integrate into the existing Oracle database. Perhaps Oracle 13h?
Alex
Hey Bob, enjoyable article, though it rankled a bit. I always find your stuff very well informed, except when you address my particular area of expertise. I wonder if other readers feel the same way 🙂
I’m a professional SQL database developer, if such a description is not a contradiction in terms. I really can’t judge the effectiveness of the particular SQL implementation you mention in the introduction from the information provided, and perhaps neither can you. I think it was mostly for dramatic effect — the hook of “SQL is dead; long live the cool new thing” seemed punchier than “Here’s a new DB technology.”
But in a pluralistic world, Thing A being good doesn’t mean all other things are bad; it just makes for a better story. Just ask the folks at Fox News. I think relational databases in general and specific commercial SQL Server implementations in particular are still providing good performance and good ROI for their customers.
The biggest reason SQL implementations suck is that they are often built by amateurs. I have seen many startups where the C* development was done by pros and the DB dev was done by the same people, who were not DB pros. C developers (gor bless ’em) see the C code as important and the DB as an afterthought, so they cobble together a DB based on the SQL class they took in college. It’s only later when things don’t scale that DB pros are brought in to shovel out the stables.
So grump moan complain. Have your friends call me if they want the SQL DB to stop thrashing. 🙂
>> The biggest reason SQL implementations suck is that they are often built by amateurs. I have seen many startups where the C* development was done by pros and the DB dev was done by the same people, who were not DB pros. C developers (gor bless ‘em) see the C code as important and the DB as an afterthought, so they cobble together a DB based on the SQL class they took in college.
ding, ding, ding. The only problem is that, more often than not, rather than face the reality that they hadn’t a clue what they were doing when they did what they did, the Coders blame SQL/RDBMS for being “inadequate” and then spend boatloads of time and money building a TPM in their code to manage xml files. Kind of Middle Ages. 🙂
The set up that Bob describes with 3 load sharing Apache http servers and only one database server that is maxed out shows that maybe they are not rdbms or system architecture experts and maybe not so great programmers to boot.
Wow, 3 hard drives. A “bunch of cores”, like what 64, 1024??, no probably 4. A “ton of RAM”? What is that really, 8GB, 16? The hardware sounds really kind of weak for something that has any I/O load. 3 drives?
Any programmer can bring an RDBMS architecture to its knees with a few badly written SQL statements; a single weak-sister server will go down even faster. Maybe it is time to hire someone who knows databases or architecture.
Check XML Topic Maps as that does seem a promising alternative to RDBMS. Being based on Graph theory (unlike RDBMS based on set theory), Topic Maps offer unmatched flexibility and power.
>> Check XML Topic Maps as that does seem a promising alternative to RDBMS. Being based on Graph theory (unlike RDBMS based on set theory), Topic Maps offer unmatched flexibility and power.
Oh my, not again. Graphs are not “superior” to sets. Any graph can be described in sets. Moreover, just as in xml and network (CODASYL), one has to specify the graph structure a priori and navigation has to be coded. With RDBMS, one can extend the relationships simply by defining new ones: the structure is manifest in the keys. There is no more to it than that. Traversing the relations in any direction is a piece of cake.
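To make the "any graph can be described in sets" point concrete, here is a small sketch, again with Python's built-in `sqlite3`. The `node` and `edge` tables, the relationship names, and the sample rows are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- a graph is just a set of (source, destination, kind) tuples
    CREATE TABLE edge (
        src  INTEGER NOT NULL REFERENCES node(id),
        dst  INTEGER NOT NULL REFERENCES node(id),
        kind TEXT NOT NULL,
        PRIMARY KEY (src, dst, kind)
    );
    INSERT INTO node VALUES (1, 'topic_a'), (2, 'topic_b'), (3, 'topic_c');
    INSERT INTO edge VALUES (1, 2, 'mentions'), (3, 2, 'extends');
""")

def outgoing(name):
    """Follow edges forward: what does this node point at?"""
    return conn.execute(
        """SELECT e.kind, d.name
           FROM edge e
           JOIN node s ON s.id = e.src
           JOIN node d ON d.id = e.dst
           WHERE s.name = ? ORDER BY e.kind, d.name""",
        (name,),
    ).fetchall()

def incoming(name):
    """Follow the same edges backward: what points at this node?"""
    return conn.execute(
        """SELECT e.kind, s.name
           FROM edge e
           JOIN node s ON s.id = e.src
           JOIN node d ON d.id = e.dst
           WHERE d.name = ? ORDER BY e.kind, s.name""",
        (name,),
    ).fetchall()

# A brand-new kind of relationship is just another row, with no schema change.
conn.execute("INSERT INTO edge VALUES (1, 3, 'contradicts')")
```

Both directions use the same table and the same keys; only the `WHERE` clause changes, which is roughly what "traversing the relations in any direction" amounts to here.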
All industrial strength RDBMS provide support for navigating graphs, if one wishes. I forget now who it was, but my favorite quote on the subject: “there are more hierarchies in software than there are in the real world”.
The network model, CODASYL, is a graph theory (although there is no known theoretical paper as precursor) implementation. That was first implemented in 1961 as IDS by GE. Not new.
As a long-time reader, I doubt myself when I doubt you, but this assertion seems way off the mark:
> Processors … have now resumed or even accelerated their performance growth
It seems obvious that Moore’s “Law” is at its end! How can a linear growth process (number of processors) hope to match the curve we enjoyed with an exponential one (megahertz speed)?
My Background: I am the ‘Chief Oracle Guy’ for a software company in the Telco space. I’ve been using Oracle for over 20 years and currently work with databases that process billions of records every day.
I think you’ve missed the fundamental advantage of relational databases. I’m not going to dispute that Oracle gets very expensive as volumes go up, or that the technologies you refer to are better / very useful in certain instances. One of the consequences of Oracle’s success is that it is now used in situations where it really shouldn’t be. But I don’t see Larry Ellison crying any time soon.
The reason that the RDBMS model is still dominant is because it handles complexity well. Alternative databases force developers to make decisions about access paths at design time.
Take Amazon as an example. It’s easy enough to envisage a catalog spread over many physical machines, especially as catalog data is essentially static. But what about the inventory of books actually available for shipment now? The logical thing would be to split it the same way the catalog was split, but wouldn’t it be far more convenient for all the stuff I ordered to be on the same machine as my account records? And what about the requirement that just came in for a list of heavily browsed titles by zip code updated every hour?
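The access-path trade-off in this example can be sketched in a few lines of Python. Everything here is an assumption for illustration: the shard count, the modulo placement rule, and the sample orders are made up, and real systems place data in far more sophisticated ways.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(key):
    """Toy placement rule: integer key modulo the number of shards."""
    return key % NUM_SHARDS

# Hypothetical rows: (order_id, customer_id, title_id)
ORDERS = [(1, 100, 7), (2, 100, 12), (3, 101, 7), (4, 100, 9), (5, 102, 14)]
CUSTOMER, TITLE = 1, 2  # column positions within a row

def place(rows, key_column):
    """Distribute rows across shards by the chosen key column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key_column])].append(row)
    return shards

def shards_holding(shards, column, value):
    """Which shards hold at least one row matching column == value?"""
    return sorted(
        shard_id
        for shard_id, rows in shards.items()
        if any(row[column] == value for row in rows)
    )

by_title = place(ORDERS, TITLE)        # split the same way as the catalog
by_customer = place(ORDERS, CUSTOMER)  # split alongside the account records

# Customer 100's orders end up on several machines when sharded by title,
# but on a single machine when sharded by customer.
print(shards_holding(by_title, CUSTOMER, 100))
print(shards_holding(by_customer, CUSTOMER, 100))
```

Whichever key is chosen, queries on the other column have no way to know in advance where matching rows live, so in practice they have to ask every shard or rely on a separately maintained index. That choice is the design-time access-path decision the comment is describing.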
This isn’t a problem if you only have to keep track of one thing such as insurance quotes. As the example above shows the problem arises when you need to share or re-use the data. Sharing happens when the business decides to create new products or integrate systems and re-use happens when a newer version replaces the first generation system. The same architecture decision that allowed the original system to run like greased lightning turns into unanticipated complexity that may well prevent the new applications from working quickly or possibly at all. When a developer tries to ‘simplify’ the model so his application works well what usually happens is that more complexity appears somewhere else, usually in the interface between his model and the rest of the business.
Many real-world applications have a couple of hundred changeable tables with complex interrelationships that cannot easily be split, sharded or shared. The bottom line is that complexity is inherent in the business problems we attempt to solve as well as the outside world we have to interact with. Oracle DBAs like myself have been earning a living for years by managing this complexity and attempting to create databases that reconcile the conflicting needs of different applications. RDBMSs such as Oracle are the best tool we have available, but the real work is done in our heads.
So I’m not the least bit surprised that this new technology appeals to start ups – they have no preexisting investment in data and often have a single clear goal. I’d guess it’ll be a couple of years before they find themselves beating their heads off the wall trying to get their non-relational database to grow with the business. Neither will I be surprised when BigTable finds a use for on line catalogs and directories, but I don’t see my bank using it any time soon…
>> The reason that the RDBMS model is still dominant is because it handles complexity well. Alternative databases force developers to make decisions about access paths at design time.
>> The same architecture decision that allowed the original system to run like greased lightning turns into unanticipated complexity that may well prevent the new applications from working quickly or possibly at all.
>> So I’m not the least bit surprised that this new technology appeals to start ups – they have no preexisting investment in data and often have a single clear goal.
(Nor any experience with anything more complicated than a spreadsheet image of data. Sigh)
Another voice of clarity and sanity. I had been despairing that there were none left. Except Pascal and Celko, not that they get along.