Striving for Innovation
With data now abundant and Open Source technologies for Big Data and Analytics now often the de facto system of record, I see three trends that CXOs care a lot about these days.
First, governance and security are in sharp focus, with projects like Ranger. These are actually leapfrogging proprietary technologies, in the same way that storage and compute technologies like Hadoop and YARN, Hive, and Spark did a few years ago.
This is because Open Source has become more and more key to data management strategies in the enterprise. For a couple of years now the open community has really stepped up its game, to the point where some capabilities, including column-level encryption, role filtering, masking, and tag-based security, are now only available in Open Source projects. That’s a pretty amazing milestone.
The second trend where Open Source is really making a big dent in the enterprise is machine learning, deep learning, and artificial intelligence. Historically, IT in the enterprise has been a post-transaction world. Right now it’s about real-time events and predicting failures or anomalies, getting to a point where you can move to pre-transaction and predictive. In a couple of years it’ll be significantly different with things like autonomous vehicles, which will rely heavily on AI and machine learning.
The third trend is probably cloud. One of the great benefits of using Open Source is the ability to bridge the technology stack. You can make a transition from an infrastructure point of view, and retain flexibility in your capex or opex, without making a transition from a technology standpoint. It’s the same tooling, the same experience, the same governance and security that will work in the cloud.
Open Source also helps if you want to be cloud-agnostic. Increasingly, CIOs are wary of choosing a technology that works with just one cloud, as this really limits their ability to move applications across clouds.
Advice to Peers Implementing Open Source
It’s pretty clear at this point that the data movement has already happened and Open Source has won. So my main advice is, if you haven’t already got on this thing called Open Source Big Data, try to figure out why.
It’s probably a cultural problem, not a technical one, because you can look at any industry and find examples of how people have successfully taken advantage of it. They might not have gotten to Google’s level yet, but the majority of industries have made significant progress and are making headway on their fourth or fifth project.
Even if you find yourself at an organization with a more traditional mindset, it is your job to push through those old dogmas and educate your company on the importance of Open Source technologies. That’s the most important challenge to get over.
The success of things like artificial intelligence and machine learning is critically dependent on how you manage data at scale. So you’ve got to be able to figure out how to manage data consistently and at scale, including the governance and security, across both your on-premises infrastructure and your clouds. That way, as this next wave hits, at least you will have the data to deploy machine learning for historical or predictive analytics.
We see cloud across the enterprise, everywhere from banks to healthcare. There is no question whether the enterprise will use cloud; it’s just to what extent. Like all technologies, this is a journey that will take many years to actually finish; this is why managing data and apps consistently across all on-premises and cloud infrastructure is so important. You have to work out how to manage that on-ramp from your existing infrastructure to the cloud. So getting your organization to be really good at managing data is the precursor to being able to use data in really interesting ways using machine learning, deep learning, or artificial intelligence.