Reality check of PaaS Solutions for distributed systems in IOT and Big Data applications

Manoj K Nair

Manoj K Nair

‘Platform as a Service’ (PaaS) in the distributed systems arena is gaining wide adoption nowadays as the cloud is gaining more customer confidence. The latest IDC forecast states “By 2020, IDC forecasts that public cloud spending will reach $203.4 billion worldwide”. They also predict a fast growth in the PaaS segment, precisely in the next five years, Compound Annual Growth Rate (CAGR) is predicted at 32.2% which is very promising. PaaS Solutions for distributed systems have captured the serious attention of big players, like Amazon (AWS EMR), Google (Google Cloud Platform), Microsoft (HDinsight), Databricks (Unified Analytic platform) etc., and the count is growing by the day. The same is the case for IOT, with platforms from Amazon (AWS IOT), IBM (Bluemix), CISCO (Cloud Connect) etc. being the major ones in the growing list.

The explosive growth of PaaS Solutions is boosted by the complexity of DevOps and administration nightmares encountered in distributed systems; we still remember the Apache Hadoop version upgrades that always led to sleepless nights!

PaaS Solutions absorb a lot of complexities of the distributed systems which allows us to,

1.     Do the evaluation of platforms straight away. You don’t need to wait anymore for cost approvals, deployment completion as in the case of On Premises or IaaS deployments.

2.     IOT enabling becomes as fast as just plugging in an agent in your device.

3.     Automatic version upgrades of opensource distributed platforms like Apache Spark, Apache Hadoop, Apache Kaa etc. becomes just configuration changes.

4.     You can enjoy the additional features like Notebook integration, REST API service support etc. provided by the vendors

All fine! But are there any hidden factors in PaaS Solutions that need to be considered? From my experience of the past few years, it is a big YES! Especially for IoT and Big Data applications.

A ready-made dress may still need alterations!!!

PaaS solutions allow us to remain focused on the application use case by simplifying the spinning up of any platform with few clicks. Moving to another platform configuration is as easy as changing a few parameters and doing a restart. Major configurations and optimizations inside the platform are completely transparent to the user, which is an advantage most of the time.

However complete transparency to the system is not always insightful. You may need to play around with platform configurations to tune your application on top of it – scenarios like trying a few customized or new plugins into the platform which can give extra muscle to your application. As the open source incubations are growing rapidly and lot of new innovative tools in distributed systems are getting released every month, you need to have the flexibility to use them on the platform. Debugging or performance benchmarking of the application running over a totally transparent underlying platform is not good news for system designers. So when the platform is said to be transparent, we should also check the level of control we have over the platform.

For instance, while working with a major US healthcare player for collecting their large data streams for predictive and descriptive analytics, we were using Kafka for data injection and Apache Spark Streaming PaaS for data landing and processing. The initial evaluation and selection of the platform went well with standard architectural considerations and we were happy with the platform choice. Once the development of the application’s functionality was over and alpha tests completed, we started looking to make a few optimizations and tuning as part of the refactoring, for which accessing the platform cluster nodes became essential. We requested the platform vendor for access to the cluster nodes, but their reply was disappointing. Their customer support said “It’s completely transparent to user and we do not recommend any access or modification of the platform configurations”. We were stuck!!!

In another case of a Smart Battery IoT project, we were pushing status info from the smart device to an IOT PaaS platform for self-tuning. The data was being stored internally in the PaaS system. Things were working great and we were able to view the data using their custom tools and REST API based limited query access. However in our project, we made a strategy to create a raw data lake into AWS S3 for future analytics. To our surprise we found that there isn’t an option for data export! Being a very basic yet important feature, we contacted the IoT platform technical support. Their response was “Yeah, it is a simple feature, but it is not in our ‘Business priority list’ of features. So, it may take us some more time to do it”. How much more time was unknown! We were stuck again, and had to review the raw data lake policy.

In both cases, our development plans were seriously impacted and we were forced to skip/postpone major use cases, or start looking out for alternative platform to migrate to, although so late into the project. Let’s closely observe the responses from technical support in these two cases for a few interesting facts.

Case 1: “It’s completely transparent to user and we do not recommend any access or modification of the platform configurations

Transparency of platform complexities is definitely an important motivation to opt for PaaS, as it gives a quick, efficient and cost effective way of building the distributed system. But it is important to have an insight into the platform internals and in few cases some control as well. Being a system designer, we don’t like to swallow things as they are!!! After all, “platform limitations” is definitely not the story we want to tell our customers! In this specific case, we were looking to try out external monitoring tools that need a few agents to be installed into the cluster nodes. Eventually supporting a third-party BI tool took us roughly two months, in coordinating between the technical teams at the PaaS vendor and the BI vendor. This is simply not acceptable to the customer in terms of time or budget.

Case 2: “Yeah, it is a simple feature, but it is not in our ‘Business priority list’ of features. So, it may take us some more time to do it

Not just disappointing, this is alarming!!! Technical interoperability for the customer’s data should not be restricted for the sake of business priority. Unfortunately, the so called “business priority” often loses focus in retaining customers which reminds one of a “My way or the Highway” strategy! No customer wants his data being stuck in a specific platform. We need the flexibility to move it through multiple platforms, as business data has latent insights which could be extracted through different systems today or in the future.

To sum up, apart from traditional architectural considerations for selection between OnPremises or IaaS, PaaS or SaaS, we should be vigilant regarding these hidden factors during the selection of distributed platforms especially for IoT and Big Data applications, where large amounts of data are generated. The hidden factors are tricky in the sense that they may not be visible in the first look.

Some of the architectural considerations that help mitigate these hidden factors are given below.

1.     Create a proper migration plan – This may not be a short-term goal. But it becomes very important because as and when the data grows you may end up in a world of restrictions.

2.     Make sure you have enough control over the platform internals – Although you want to avoid administration overheads as much as possible, you still need good control of the platform for development, refactoring and analysis. Distributed system usage without platform control is painful in the long run. Telnet or SSH access to the cluster nodes, privilege to install custom tools and configuration level flexibility are few items to be verified in general.

3.     Third party integration flexibility – Most of the time, the system that we develop would be part of a pipeline and may need integration with customized systems like monitoring tools, custom logging methods etc. which make the integration hooks critical.

4.     Platform vendor’s willingness to provide functionality on demand – Platform vendors should be able to handle custom functionality requests on demand. We cannot wait indefinitely for the platform to support it in due course. Make sure that their quick and efficient response is covered in your Service License Agreement (SLA).

Distributed Platform as a Service is definitely growing rapidly, and customers will continue to invest heavily for the combo-advantages of reduced Capex cost, reduced time to market and reduced maintenance/administrative complexity. But I hope the quality and competitiveness of PaaS Solutions also matures fast for the benefit of investing customers, like our IoT and Big Data customers at Sequoia AT. Let’s hope a day will come soon when the platform vendors start advertising their respective platforms by throwing an open challenge “Hey, try out our PaaS solution and if you don’t like it, migrate to any other PaaS solution in 24 Hours or 1 Week!!!”

Link to Article on Linkedin

Leave a Reply

Your email address will not be published.