Data, Big Data, and Data Field Day 1

May 29, 2015 John Obeto

Data.

Big Data.

Heck, it’s all the same thing.

Data Field Day One
The advent of the Internet of Things, or IoT, is going to exponentially expand the number of data points from which telemetry is returned.

And it isn’t going to get any easier.

People, this growth will be relentless, and the data deluge is just about to start.

What seems to be lost in the hoopla is one simple fact: without analysis, dissection, and visualization, all that data is largely a useless dump.

So far, at the Gestalt IT Tech Field Day series and at the in-house HP Storage Days, we have seen how hardware companies plan to tackle the storage of data.

Thankfully, several companies are stepping up, and technologies have been invented that are bringing innovation to this issue.

Two weeks ago, I was at Data Field Day One where a select group of companies showed us their attempts to solve data issues.

NOTE: two of the companies that presented at Data Field Day 1 were focused on the software of Big Data, which was my mission to seek out. The other two, HGST and SanDisk FlashSoft, were seemingly more focused on the hardware, despite their software bent. For this article, I will focus on Cloudera and Hedvig.

I intend to spend more time on yet another recap of the Tech Field Day video presentations on HGST and SanDisk to see if I missed their métier.

Many ways to skin a cat

Seriously though, data needs a lot of flexibility, and the attendant hybridization, in analysis and processing.

Why?

Well, data tends to live where it is born.

Huh?

Cloudera

Nice explanation by Cloudera co-founder and CTO Mike Olson: there is data that is born in the data center, such as sales transactions, and data that’s born in the cloud, such as the information returned from the ginormous numbers of connected sensors globally. Consequently, it makes sense to analyze that data where it is, rather than getting into the business of moving huge amounts back and forth between cloud storage repositories and datacenter storage.

And Mike should know. His company Cloudera, which is a pure-play big data firm located in Palo Alto, happens to be the 1,000 lb. gorilla in Big Data.
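To put a rough number on the argument for analyzing data where it lives, here is a back-of-the-envelope sketch in Python. Nothing in it comes from Cloudera: the function name `transfer_hours`, the 1 Gbps link, and the 70% link efficiency are all my own assumptions, picked purely for illustration.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move dataset_tb terabytes over a link.

    Assumptions (hypothetical, for illustration only):
    - 1 TB = 10**12 bytes
    - only `efficiency` of the nominal link speed is usable in practice
    """
    total_bits = dataset_tb * 1e12 * 8
    effective_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / effective_bits_per_second / 3600


if __name__ == "__main__":
    # Compare a few dataset sizes over the same assumed 1 Gbps link.
    for size_tb in (1, 100, 1000):
        hours = transfer_hours(size_tb, link_gbps=1.0)
        print(f"{size_tb:>5} TB over 1 Gbps: {hours:,.1f} hours")
```

The exact figures matter less than the shape: transfer time grows linearly with the size of the dataset, so at some point shipping the analysis to the data becomes cheaper than shipping the data to the analysis.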

Valued at a minimum of about $5 billion thanks to a recent $750 million investment by mighty Intel for a 15% stake, Cloudera is harnessing the best of open source, plus proprietary IP built upon open source foundations, to deliver products that help users make sense of the flood of data hitting them, thereby monetizing their data.

Cloudera, as mentioned above, is largely agnostic in the use of whatever open source tech it needs to deliver that database and data analysis functionality, with names like Hadoop, Pig, and some other hipstery names flowing smoothly off the tongues at DFD1.

They even have a UI for Hadoop!

Oh yes, and most importantly, the use and retention of proprietary IP allows Cloudera to deliver a business plan that is real-worldly, and has a profitability horizon. That, I like, and approve of!

This is what grownups do.

Cloudera has, I believe, the only PCI-DSS certified, compliance-ready secure data store. Their flagship product is used by some of the biggest names in the financial and healthcare services fields, such as FINRA.

“I like to look at all my data”
Commonly said by a lot of people.

Actually, you don’t.

If a typical ERP database contains 15,000 tables, and if the average healthcare EMR system contains 245,000 columns, trust me, you don’t want to “look at all your data”!

For that, Cloudera added Xplain.io to their stable.

Within Cloudera’s Enterprise Data Hub, Xplain.io determines business intent by mining the queries run against a business’s data store, not the actual data.
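The idea of mining the queries rather than the data can be sketched in a few lines of Python. To be clear, this is not how Cloudera's product works internally, which was not shown in that detail; the regex, the `table_usage` helper, and the sample queries are all hypothetical, and a real tool would use a proper SQL parser rather than a regular expression.

```python
import re
from collections import Counter

# Naive pattern: grab the identifier that follows FROM or JOIN.
# Hypothetical simplification; it ignores subqueries, quoting, and CTEs.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(queries):
    """Count how often each table is referenced across a list of SQL strings."""
    counts = Counter()
    for query in queries:
        counts.update(name.lower() for name in TABLE_RE.findall(query))
    return counts


# Made-up query log, for illustration only.
sample_queries = [
    "SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id",
    "select sum(total) from orders where region = 'EMEA'",
    "SELECT name FROM customers",
]

if __name__ == "__main__":
    for table, hits in table_usage(sample_queries).most_common():
        print(f"{table}: referenced in {hits} queries")
```

Even a crude tally like this shows which handful of those 15,000 tables people actually ask questions of, without ever reading a single row.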

Hedvig

Hedvig was another company that caught my eye.

Hedvig is a software-defined storage startup that lives in a VM.

From the demo, it is powerful and fast, and, get this, Hedvig is the first enterprise startup to actually deliver a product that targets both x86 and ARM CPUs. According to Hedvig, though, most of their sales are for x86, a pattern that is currently the norm.

Hedvig’s platform is elastic, is cloud-agnostic, and works fine with commodity servers. Scaling to petabytes, it provides enterprise block, file, and object storage.

I may have misunderstood, but I believe Hedvig is currently VMware-only.

Once they arrive on Hyper-V, I will try to play around with it.

The Data Field Day 1 homepage is here. It has links to the event, videos, and all content generated subsequent to it. It also has links to both delegate and company bios.

Cloudera is here.

Hedvig is here.

The #DFD1 site also has links to upcoming Tech Field Day events, which are livestreamed to everyone in this star system. For free.
