AWS Solution to Build a Real-time Data Processing Application Using Kinesis, Lambda, DynamoDB, and S3

Reading Time: 4 minutes

A Capstone Project by Amit Bajaj and Sathya Guruprasad

Introduction

Cloud computing has become very popular due to the multiple benefits it provides and is being adopted by businesses worldwide. Flexibility to scale up or down as per business needs, faster and more efficient disaster recovery, subscription-based models that reduce the high cost of hardware, and flexible working for employees are some of the benefits of the cloud that attract businesses. Like the cloud, data analytics is another crucial area that businesses are exploring for their growth. The exponential rise in the amount of data available on the internet is a result of the boom in the usage of social media, mobile apps, IoT devices, sensors, and so on. It has become imperative for organisations to analyse this data to gain insights into their businesses and take appropriate action.

AWS provides a reliable platform for solving complex problems, on which cost-effective infrastructure can be built with great ease. AWS offers a wide range of managed services, including computing, storage, networking, database, analytics, application services, and many more.

Problem Statement:

We analysed multiple software solutions that perform analysis on data collected from the market and provide information and suggestions to deliver a better customer experience. Examples include trading applications providing stock prices, taxi companies showing the locations of nearby taxis, journey-planner applications providing live updates on different transport modes, and many more.

We have chosen a “server-less” platform (the serverless computing execution model) to build the real-time data-processing app. The architecture is based on managed services provided by AWS.

What is “Server-less”?

Server-less is a cloud-based execution model in which the cloud provider dynamically allocates and manages the servers. It is a consumption-based model in which pricing is directly proportional to use. AWS takes complete ownership of operational responsibilities, eliminating infrastructure management for the user and providing higher uptime.

Services Consumed:

  1. Kinesis (Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose)
  2. Athena
  3. Lambda
  4. DynamoDB
  5. Amazon S3
  6. AWS CLI

Architecture:

(Architecture diagram: real-time data processing application on AWS)

How can a cloud-based infrastructure receive data from different sources without us building sizable infrastructure of our own?

Amazon Kinesis, a managed service from AWS, makes it easy to collect, process, and analyse real-time streaming data so you can get timely insights and react quickly to new information. Kinesis Data Streams allows users to receive data from a data-generation source. We created an Amazon Kinesis data stream using AWS CLI commands, which consumes data from the data source.

Technical + Functional Flow 

Create Kinesis data streams: 

      1. Create two streams in Kinesis using the AWS Console or AWS CLI commands: one to receive data from the data generator and another to hold the post-processing output. The data generator produces data that is written to the input/source stream; the Kinesis Analytics application processes it and writes the results to the output/destination stream.
      2. We created a program to generate data and transmitted it to Kinesis Data Streams with the help of the AWS SDKs and AWS CLI commands. Data can be generated in various ways:
        1. Using IoT devices
        2. Live trackers
        3. GPS trackers
        4. API
        5. Data generator tools (for analysis and testing)
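As a sketch of the producer side, the snippet below builds Kinesis `PutRecord` payloads and can optionally send them with the AWS SDK for Python (boto3). The stream name, field names, and payload shape are illustrative assumptions, not the project's actual schema (our own producer was written in Go).

```python
import json
import random
import time

def make_record(symbol, price):
    """Build a Kinesis PutRecord payload for one synthetic data point."""
    payload = {"symbol": symbol, "price": round(price, 2), "ts": int(time.time())}
    return {
        "Data": json.dumps(payload).encode("utf-8"),
        # Records sharing a partition key are routed to the same shard.
        "PartitionKey": symbol,
    }

def run_producer(stream_name="input-stream", count=10):
    """Send `count` synthetic records to the stream (requires AWS credentials)."""
    import boto3  # AWS SDK for Python
    kinesis = boto3.client("kinesis")
    for _ in range(count):
        record = make_record("ACME", random.uniform(90, 110))
        kinesis.put_record(StreamName=stream_name, **record)
        time.sleep(1)
```

The stream itself can be created beforehand with `aws kinesis create-stream --stream-name input-stream --shard-count 1`.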

Create a Kinesis Analytics App to Aggregate data

      1. Build a Kinesis Data Analytics application to read from the input/source data stream and write aggregated, formatted output to the output/destination data stream at a specified time interval.
      2. It is very important to stop the application when not in use, to avoid unwanted cost.
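A Kinesis Data Analytics application expresses this aggregation in SQL over tumbling windows. As a language-neutral illustration of what such a window computes, here is a small Python sketch; the record shape (timestamp, key, value) and the 60-second window are assumptions for illustration, not the application's actual query.

```python
from collections import defaultdict

def tumbling_window_avg(records, window_seconds=60):
    """Average the value per key within fixed, non-overlapping time windows,
    mimicking a tumbling-window GROUP BY in a Kinesis Data Analytics query.

    `records` is an iterable of (timestamp, key, value) tuples.
    Returns {(window_start, key): average_value}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (window, key) -> [total, count]
    for ts, key, value in records:
        window_start = ts - (ts % window_seconds)  # floor to window boundary
        bucket = sums[(window_start, key)]
        bucket[0] += value
        bucket[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}
```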

Data Storage and Processing:

      1. Lambda, another managed service from AWS, processes data from the triggering data stream and writes it to DynamoDB.
      2. Lambda functions run on a trigger basis, and the cost model is strictly consumption-driven: no cost is incurred while the function is not running. Data is stored in DynamoDB and can be accessed in standard fashion.
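A minimal sketch of such a trigger-driven handler is shown below, assuming JSON payloads. The DynamoDB table is passed in as a parameter so the decoding logic can be exercised without AWS access; the attribute names are illustrative assumptions.

```python
import base64
import json

def handler(event, context, table=None):
    """Decode records from a Kinesis trigger event and write them to DynamoDB.

    `table` is a boto3 DynamoDB Table resource (e.g. created with
    boto3.resource("dynamodb").Table("my-table")); when None, records are
    only decoded, which keeps the function testable offline.
    """
    items = []
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded inside the event.
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)
        items.append(item)
        if table is not None:
            table.put_item(Item=item)  # one DynamoDB write per record
    return {"processed": len(items), "items": items}
```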

Kinesis Firehose, S3 and Athena:

    1. Kinesis Data Firehose acts as a mediator between the Kinesis data stream and S3: data received from the stream is delivered to a predefined S3 bucket in a specified format.
    2. Amazon Athena is a server-less interactive query service that enables users to query data stored in an S3 bucket with standard SQL for analysis.
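Athena queries the data in place once an external table has been defined over the S3 location. The helper below sketches the CREATE EXTERNAL TABLE DDL for newline-delimited JSON of the kind Firehose delivers; the table, bucket, prefix, and column names are assumptions for illustration.

```python
def athena_json_table_ddl(table, bucket, prefix, columns):
    """Build the DDL that registers Firehose-delivered JSON in S3 as an
    Athena table.

    `columns` maps column name -> Athena type, e.g. {"price": "double"}.
    Uses the OpenX JSON SerDe that Athena supports for JSON data.
    """
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION 's3://{bucket}/{prefix}/';"
    )
```

The resulting statement can be run from the Athena console or submitted programmatically, after which the bucket contents are queryable with ordinary SQL.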

The AWS CLI, AWS CloudFormation, and AWS IAM also play a very important role in building cloud-based infrastructure and ensuring secure connectivity within and outside the AWS cloud.

Conclusion:

Using AWS services, we were able to create a real-time data processing application based on a serverless architecture, capable of accepting data through Kinesis Data Streams, processing it through Kinesis Data Analytics, triggering a Lambda function, and storing the results in DynamoDB. The architecture can be reused for multiple data types from various sources and formats with minor modifications. We relied entirely on managed services provided by AWS, which led to zero infrastructure-management effort.

The capstone project helped us build practical expertise in AWS services such as Kinesis, Lambda, DynamoDB, Athena, S3, and Identity and Access Management, as well as in serverless architecture and managed services. We also learnt the Go programming language to build pseudo data-producer programs. The AWS CLI helped us connect on-premises infrastructure with cloud services.

This project is a part of Great Learning’s post graduate program in Cloud Computing. 

Authors
Amit Bajaj – Project Manager at Cognizant
Sathya Guruprasad – Infrastructure Specialist at IBM Pvt Ltd

Experts Talk Series: Migrating to the cloud

Reading Time: 5 minutes

Episode 1 – Cloud migration

Migrating to the cloud is a buzzword these days. Every enterprise wants to say that they are “100% cloud-enabled”. If you are an enterprise looking to move over to the cloud, how should you go about it?

First off, let’s just clarify that “100% cloud-enabled” is a myth. Most enterprises will have a portion of their business running in their own datacenter, also known as on-premises. Therefore, a better way to quantify cloud enablement would be “100% of all applications that have been found fit for the cloud have been migrated”.

How to decide if you really need to migrate?

To get the process off the ground, the first thing you have to decide is whether the cloud is the right fit for your use-case. If your application landscape consists of legacy code or is highly optimized for the hardware it runs on, it is safe to say the cloud will do more harm than good. But if your application comprises a set of loosely coupled components, each a small, highly specialized, hardware-independent function, it is a ripe candidate for a cloud-based server-less implementation.

There should also be a good reason for this endeavour. Change for change’s sake does not always equal progress. The pros and cons of a cloud-based infrastructure must be weighed, along with factors like cost and manpower requirements and whether they can be met.

So you want to migrate. What’s next?

Have you decided that you want to jump into the cloud? If so, let’s venture together into the labyrinth of choices you will have to make during this journey.

First, you will have to look at various business dimensions while contemplating your cloud implementation. For example, immediate cost benefits will be highest on IaaS implementations, after a lift and shift of on-premises applications to the cloud. Likewise, other dimensions like time to market, functional responsiveness, and scaling have to be taken into consideration and a balance has to be found. This will help you to decide if your implementation will be IaaS, PaaS or SaaS-based. Perhaps a combination may yield the best results.

The next step is app evaluation. As mentioned earlier, it is necessary to check which applications are fit for the cloud. Applications that are low-risk from a business perspective can be safely migrated. However, an enterprise may feel more secure storing trade secrets, proprietary functionality, and security services on local servers. Let it be noted, though, that on-premises servers do not guarantee 100% security any more than cloud providers do. As a matter of fact, cloud providers take security very seriously and take strong measures to ensure that you know exactly where, and by whom, your data is being accessed, and that only authorized users can access it.

You may be on the fence about migrating certain services, like client-server applications and supporting functions. For such cases, an ROI analysis will help you decide. Please note that an on-premises implementation allows the enterprise to take advantage of financial levers like depreciation. In the end, let me emphasize that these decisions are highly case-specific and are not set in stone.

An application in an enterprise is hardly ever standalone. Hence, you will have to go through various levels of integration. The usual options are synchronous and asynchronous integration. The on-premises data centre can be integrated with the cloud to create a hybrid cloud deployment topology. This means the cloud applications can access the on-premises applications directly, though a bit of latency will be at play. Maybe asynchronous or batch-based integration will help hide the latency.

The migration process 

It is a myth that cloud migration is a single-step process. As mentioned earlier, the first step is usually a lift-and-shift approach, where the existing on-premises architecture is cloned onto the cloud. This relieves the enterprise of the burden of maintaining a data centre, but that is all the benefit you will ever get from this approach. After that, some of the functionality can gradually be re-engineered to take advantage of managed cloud services; for example, a self-managed database can be moved to a cloud-provided database service. Then there is the concept of cloud-native applications, where new components or functionality are designed from the get-go to take advantage of platform-specific services built for media, analytics, or content distribution. This way the workload on the enterprise is reduced until you are solely responsible for the business processes while the cloud handles the heavy lifting.

The next step is to choose a cloud provider. Your hired or in-house cloud expert can help you make an informed decision from the myriad choices available to you. Which of these is suitable for you is highly situational, and requires you to take several factors into consideration, like cost, software or platform requirements, compliance requirements, and geographical zone availability. You may also want to take advantage of a specific API or managed service offered by a particular provider. It should be noted that most of the top cloud providers have a nearly similar set of services, so if you don’t have any highly specialised requirements, you cannot go wrong with any of them.

The on-premises setup then has to be restructured to fit the cloud architecture. Your cloud provider will definitely have a list of reference architectures available based on real-life use-cases and a list of best practices to follow, including but not limited to data and application migration tools. They also have an extensive collection of white papers to aid you in this task.

Implementing the migration plan

The above discussion concludes the planning and selection stage of cloud migration. All that is left now is to implement the plan. This should begin with drawing up and implementing a proof of concept. Not only will this allow you to run performance comparisons with your existing application, but it will also highlight unforeseen challenges and complexity levels which may show up during the actual migration process, allowing you to be prepared for the same. This will also give you a good idea of the reliability of the chosen cloud provider and will allow you to evaluate its support system.

While performing the actual migration, you should be careful to minimize the resulting disruption time and service outages. Dry runs should be conducted to identify potential failure points and minimize errors during the process. Every use case will have its own set of steps to follow during the migration, but it generally starts by taking a backup of the databases, followed by deployment of the applications and migration of the database. Also, there will be quite a few application components to manage and set up, like middleware, caching, warehousing, and file systems. All these components must be planned and mapped to the relevant cloud service. Don’t forget to set access roles and policies! Make sure you have a clear idea of who should be able to access your applications and which components they can access, then assign appropriate roles for them. Parallel deployments of the application in the cloud and on-premises must be performed to check performance and detect failures.

Benchmarking tests are a must. This will let you know how your cloud application runs in comparison to your on-premises setup and will allow you to fine-tune your setup and be sure if it is ready for deployment.

Congratulations! You have successfully migrated to the cloud. As mentioned before, cloud migration is not a goal but a journey. Every new application will have to be evaluated whether it is a better fit for cloud or on-premises implementation. If it is destined for the cloud, integration with other applications that may still be on-premises will have to be taken into account. As new services are released by the provider, existing on-premises applications will have to be re-evaluated to see if they can take advantage of those new services. 

As you can see, this journey is not easy, but once it has been completed, just sit back and watch the clouds do their magic! But with regular management and prompting from you of course!

Experts Talk Series is a repository of articles written and published by cloud experts. Here we talk in-depth about cloud concepts, applications, and implementation practices.