An AWS Solution for Building a Real-Time Data Processing Application Using Kinesis, Lambda, DynamoDB, and S3

Reading Time: 4 minutes

A Capstone Project by Amit Bajaj and Sathya Guruprasad

Introduction

Cloud computing has become very popular due to the multiple benefits it provides and is being adopted by businesses worldwide. The flexibility to scale up or down as business needs change, faster and more efficient disaster recovery, subscription-based models that reduce the high cost of hardware, and flexible working for employees are some of the benefits of the cloud that attract businesses. Like the cloud, data analytics is another crucial area that businesses are exploring for their growth. The amount of data available on the internet has risen exponentially as a result of the boom in the usage of social media, mobile apps, IoT devices, sensors, and so on. It has become imperative for organisations to analyse this data to gain insights into their businesses and take appropriate action.

AWS provides a reliable platform for solving complex problems, on which cost-effective infrastructure can be built with great ease. AWS offers a wide range of managed services, including computing, storage, networking, databases, analytics, application services, and many more.

Problem Statement:

We analysed multiple software solutions that analyse data collected from the market and provide information and suggestions to deliver a better customer experience. Examples include trading applications providing stock prices, taxi companies showing the locations of nearby taxis, journey-planning applications providing live updates on different modes of transport, and many more.

We chose a serverless platform (the serverless computing execution model) to build the real-time data-processing app. The architecture is based on managed services provided by AWS.

What is “Serverless”?

Serverless is a cloud-based execution model in which the cloud provider dynamically allocates and runs the servers. It is a consumption-based model where pricing is directly proportional to use. AWS takes ownership of the operational responsibilities, eliminating infrastructure management and providing higher availability and uptime.

Services Consumed:

  1. Kinesis (Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose)
  2. Athena
  3. Lambda
  4. DynamoDB
  5. Amazon S3
  6. AWS CLI

Architecture:

[Architecture diagram: real-time data processing pipeline with Kinesis Data Streams, Kinesis Data Analytics, Lambda, DynamoDB, Kinesis Data Firehose, S3, and Athena]

How can we receive data from different sources in a cloud-based infrastructure without building sizable infrastructure ourselves?

Amazon Kinesis, a managed service by AWS, makes it easy to collect, process, and analyse real-time streaming data so you can get timely insights and react quickly to new information. Kinesis Data Streams allows users to receive data from a data-generation source. We created an Amazon Kinesis data stream using AWS CLI commands; it consumes data from the data source.
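For readers who prefer the SDK route, the snippet below shows what the stream creation looks like with boto3, the AWS SDK for Python. The stream name, shard count, and region are illustrative assumptions, not values from our project.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # region is an assumption

# Equivalent to: aws kinesis create-stream --stream-name input-stream --shard-count 1
kinesis.create_stream(StreamName="input-stream", ShardCount=1)

# Block until the stream becomes ACTIVE before writing to it.
kinesis.get_waiter("stream_exists").wait(StreamName="input-stream")
```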

Technical + Functional Flow 

Create Kinesis data streams: 

      1. Create two streams in Kinesis using the AWS Console or AWS CLI commands: one to receive data from the data generator and another to write post-processing output. The data generator produces the data, which is written to the input/source data stream; the Kinesis Analytics app processes it and writes the results to the output/destination stream.
      2. We created a program to generate data and transmit it to Kinesis Data Streams with the help of the AWS SDKs and AWS CLI commands (a producer sketch follows this list). Data can be generated in various ways:
        1. Using IoT devices
        2. Live trackers
        3. GPS trackers
        4. API
        5. Data generator tools (for analysis and testing)
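The producer we built was written in Go; purely as an illustration, here is the same idea sketched in Python with boto3. The stream name and the payload fields (ticker, price, timestamp) are hypothetical stand-ins for whatever the data source emits.

```python
import json
import random
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def generate_record():
    # Hypothetical payload; a real producer would read from IoT devices,
    # GPS trackers, or an API instead of generating random values.
    return {
        "ticker": random.choice(["AAPL", "AMZN", "GOOG"]),
        "price": round(random.uniform(100, 500), 2),
        "timestamp": int(time.time()),
    }

while True:
    record = generate_record()
    # The partition key determines which shard receives the record.
    kinesis.put_record(
        StreamName="input-stream",
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=record["ticker"],
    )
    time.sleep(1)
```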

Create a Kinesis Analytics App to Aggregate data

      1. Build a Kinesis Data Analytics application that reads from the input/source data stream and writes formatted, aggregated output to the output/destination data stream at a specified time interval.
      2. It is very important to stop the application when not in use to avoid unwanted cost (see the sketch below).
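Stopping and restarting can be scripted so the application never idles. Below is a minimal sketch using boto3's SQL-based Kinesis Data Analytics client; the application name is an assumption.

```python
import boto3

analytics = boto3.client("kinesisanalytics", region_name="us-east-1")
APP = "aggregation-app"  # hypothetical application name

# Stop the application when not in use to avoid unwanted cost.
analytics.stop_application(ApplicationName=APP)

# Restarting later needs the input id reported by describe_application.
detail = analytics.describe_application(ApplicationName=APP)["ApplicationDetail"]
input_id = detail["InputDescriptions"][0]["InputId"]
analytics.start_application(
    ApplicationName=APP,
    InputConfigurations=[{
        "Id": input_id,
        "InputStartingPositionConfiguration": {"InputStartingPosition": "NOW"},
    }],
)
```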

Data Storage and Processing:

      1. Lambda, another managed service by AWS, processes data from the triggering data stream and writes it to DynamoDB (a handler sketch follows this list).
      2. Lambda functions run on a trigger basis, and the cost model is strictly consumption-driven: no cost is incurred while a function is not running. Data is stored in DynamoDB and can be accessed in the standard fashion.
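A minimal sketch of such a handler is shown below, assuming the Lambda is triggered by the stream and that records are JSON; the table name and attribute names are illustrative, not the ones from our project.

```python
import base64
import json

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ProcessedRecords")  # hypothetical table name

def lambda_handler(event, context):
    # Kinesis delivers records base64-encoded inside the trigger event.
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Attribute names below are assumptions; the resource API rejects
        # Python floats, so the numeric value is stored as a string here.
        table.put_item(Item={
            "ticker": payload["ticker"],
            "timestamp": payload["timestamp"],
            "price": str(payload["price"]),
        })
    return {"processed": len(event["Records"])}
```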

Kinesis Firehose, S3 and Athena:

    1. Kinesis Data Firehose acts as a mediator between the Kinesis data stream and S3: data received from the Kinesis data stream is delivered to a predefined S3 bucket in a specified format.
    2. Amazon Athena is a serverless interactive query service that enables users to query data stored in an S3 bucket for analysis (a query sketch follows this list).
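Once Firehose has delivered the data and a table has been defined over the S3 prefix, queries can be issued programmatically as well as from the console. A boto3 sketch follows; the database, table, and result-bucket names are assumptions.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    # Aggregate the records Firehose delivered to S3 (table name is hypothetical).
    QueryString="SELECT ticker, avg(price) AS avg_price FROM stream_data GROUP BY ticker",
    QueryExecutionContext={"Database": "streaming_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
# Results land in the output location; poll get_query_execution for status.
print(response["QueryExecutionId"])
```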

The AWS CLI, AWS CloudFormation, and AWS IAM also play a very important role in building cloud-based infrastructure and ensuring secure connectivity within and outside the AWS cloud.

Conclusion:

Using AWS services, we were able to create a real-time data processing application based on a serverless architecture, capable of accepting data through Kinesis Data Streams, processing it through Kinesis Data Analytics, triggering a Lambda function, and storing the results in DynamoDB. The architecture can be reused for multiple data types from various data sources and formats with minor modifications. We used managed services provided by AWS throughout, which led to zero infrastructure management effort.

The capstone project helped us build practical expertise with AWS services such as Kinesis, Lambda, DynamoDB, Athena, S3, and Identity and Access Management, as well as with serverless architecture and managed services. We also learnt the Go programming language to build pseudo data-producer programs. The AWS CLI helped us connect on-premises infrastructure with cloud services.

This project is a part of Great Learning’s Post Graduate Program in Cloud Computing.

Authors
Amit Bajaj – Project Manager at Cognizant
Sathya Guruprasad – Infrastructure Specialist at IBM Pvt Ltd

Setting up a hospitality business model on AWS

Reading Time: 5 minutes

A capstone project by Sajal Biswas and Shreya Sharma

Use Case: Accommodation options in the travel industry are not limited to hotels and resorts. People often look for homestay options, as this model benefits both parties: tourists can enjoy home-like comfort while owners earn reasonable revenue on the rent.

Introduction:

We have taken the Airbnb business model as a reference and analyzed how to utilize AWS cloud services so that the business need only focus on its model.

We are following a ‘serverless architecture’ for our proposed solution. Serverless architectures help significantly reduce operational cost, complexity, and engineering lead time, at the price of increased reliance on the vendor.

Architecture:

[Architecture diagram]

CICD Architecture:

[CI/CD architecture diagram]

Tech stack used:

– ReactJS for creating the web application using AWS Amplify

– Profile management using AWS Cognito

– Chatbot using AWS Lex and AWS Amplify

– Static website hosting on an S3 bucket

– CloudFront for CDN

– Code repository in CodeCommit

– Backend APIs using Lambda functions (in Python), triggered via API Gateway

– AWS Elasticsearch for efficient search functionality

– DynamoDB database for storing data in key-value pairs

– Static files like images kept in an S3 bucket

– CloudWatch alarms for monitoring purposes

– AWS SES to send emails to customers

– AWS Pinpoint and Athena for analytics purposes

Case Studies:

  1. How can we develop APIs as fast as the business needs to launch in the market, without provisioning infrastructure or load balancing, and at low cost?

For this requirement, serverless architecture is the best choice, so we implemented it; the business need not worry about infrastructure changes and management.

  2. What if the business wants to track email communication with users and process the data based on their replies?

Enterprise solutions not only want to send promotional and service emails but are also interested in tracking user communication and replies. AWS SES is implemented for this feature; although we have integrated only email sending using a Lambda function, the other features can also be explored.

  3. What is the design approach for searching and listing properties on the website?

We expect a large amount of data to be generated, and hence a high transaction volume as well, so we chose DynamoDB. We maintain the property list with a partition key of the form <propertyCode>_<stateCode>_<pinCode> so that we can search easily, and so that heavy request traffic is spread across partitions in a way that avoids the hot-partition-key issue (a sketch follows).
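As an illustration of the key scheme, the sketch below builds the composite partition key and uses it for a write and a point lookup with boto3; the table name, attribute names, and the assumption that the table has no sort key are all hypothetical.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
properties = dynamodb.Table("Properties")  # hypothetical table name

def property_key(property_code, state_code, pin_code):
    # Composite partition key: <propertyCode>_<stateCode>_<pinCode>
    return f"{property_code}_{state_code}_{pin_code}"

# Write a listing under the composite key.
properties.put_item(Item={
    "property_id": property_key("P1001", "MH", "400001"),
    "name": "Sea View Homestay",
    "price_per_night": 2500,
})

# Point lookup by the same key.
item = properties.get_item(
    Key={"property_id": property_key("P1001", "MH", "400001")}
)["Item"]
print(item["name"])
```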

  4. Efficient search functionality using AWS Elasticsearch

We save each record to AWS Elasticsearch alongside DynamoDB. We have also created a Lambda function that collects transaction data from DynamoDB and creates a CSV file in an S3 bucket, which Athena then queries for analytics (a sketch follows).
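A minimal sketch of that export Lambda is below, assuming a small table; the table, bucket, and column names are assumptions, and a production version would paginate the scan.

```python
import csv
import io

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Scan the transaction table (name is an assumption; large tables
    # would need to follow LastEvaluatedKey to paginate).
    items = dynamodb.Table("Transactions").scan()["Items"]

    # Serialise the items to CSV in memory.
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["transaction_id", "user_id", "amount", "date"]
    )
    writer.writeheader()
    for item in items:
        writer.writerow({k: item.get(k) for k in writer.fieldnames})

    # Upload the CSV to S3, where an Athena table can be defined over it.
    s3.put_object(
        Bucket="analytics-exports",  # hypothetical bucket
        Key="transactions/export.csv",
        Body=buffer.getvalue().encode("utf-8"),
    )
```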

  5. Is it possible to increase customer interaction instantly?

We have integrated an AWS Lex chatbot with basic functionalities.

  6. What would be a good approach for user profile management?

The initial thought was to use AWS RDS for this, but we later chose a managed service instead: AWS Cognito.

  7. Analytics from a business perspective

Currently, we use the following services for analytics purposes:

– AWS Pinpoint

– Athena queries

Technical Details:

Website hosting with API integration:

We have developed a static website using ReactJS and AWS Amplify. The website is hosted in an S3 bucket, with CloudFront integrated for caching and CDN.

– User registration, login, password management, logout, and session management using AWS Cognito (a minimal sketch follows this list)

– AWS Lex chatbot for basic functionalities

– Integration with backend APIs deployed on API Gateway; responses use a consistent JSON format, i.e. an array of objects

– AWS Pinpoint for tracking user activity on the website
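In our app, Amplify drives Cognito from the front end; the boto3 sketch below shows the equivalent registration and login calls server-side, purely as an illustration. The app client id and credentials are hypothetical, and the USER_PASSWORD_AUTH flow must be enabled on the app client.

```python
import boto3

cognito = boto3.client("cognito-idp", region_name="us-east-1")
CLIENT_ID = "example-app-client-id"  # hypothetical app client id

# Register a new user (Cognito then sends a confirmation code by email).
cognito.sign_up(
    ClientId=CLIENT_ID,
    Username="traveller@example.com",
    Password="S3cure!Passw0rd",
)

# After the user is confirmed, log in; the response carries the tokens
# used to authorise subsequent API calls.
auth = cognito.initiate_auth(
    ClientId=CLIENT_ID,
    AuthFlow="USER_PASSWORD_AUTH",
    AuthParameters={
        "USERNAME": "traveller@example.com",
        "PASSWORD": "S3cure!Passw0rd",
    },
)
id_token = auth["AuthenticationResult"]["IdToken"]
```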

Deployment:

Repository Management: The website repository is maintained using AWS CodeCommit.

CI/CD: We have used AWS CodePipeline for website deployment.

API deployment: All backend APIs are deployed in API Gateway, integrated with AWS Lambda, and we have created a dev stage environment for them.

Monitoring and Metrics:

We have used CloudWatch Logs and metrics for debugging and monitoring purposes, using various tags.

APIs and Database:

We have created the APIs using AWS Lambda as the backend. All functions are written in Python.

Although neither of us had prior expertise in Python, we learnt it in the PGPCC course.

Library:

We have used the pip package manager to install boto3 for Python.

API Endpoints:

All Lambda functions are exposed through API Gateway as POST requests, wherein we use an “action” field in the request body; based on this field, the API responds accordingly (see the sketch below).
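A minimal sketch of this dispatch pattern follows; the action names and handler bodies are illustrative placeholders, not our actual service code.

```python
import json

def create_product(body):
    return [{"status": "created"}]  # placeholder handler

def get_all_products(body):
    return [{"propertyCode": "P1001"}]  # placeholder handler

def get_products_by_state(body):
    return [{"propertyCode": "P1001", "state": body.get("state")}]  # placeholder

def lambda_handler(event, context):
    # API Gateway delivers the POST body as a JSON string.
    body = json.loads(event.get("body") or "{}")

    # Route to the matching handler based on the "action" field.
    handlers = {
        "create_product": create_product,
        "get_all_products": get_all_products,
        "get_products_by_state": get_products_by_state,
    }
    handler = handlers.get(body.get("action"))
    if handler is None:
        return {"statusCode": 400,
                "body": json.dumps([{"error": "unknown action"}])}

    # Responses keep the consistent format noted earlier: an array of objects.
    return {"statusCode": 200, "body": json.dumps(handler(body))}
```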

Services details: 

We have created the following services:

Product Management Service:

We have created three functions by querying the DynamoDB database or Elasticsearch:

– Create product

– Get all products

– Get all products by state

For the same functionality, an “es_service” flag in the request body decides whether we query DynamoDB or Elasticsearch (a sketch follows).
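The flag check itself is a one-line branch; the sketch below illustrates it, with a hypothetical Elasticsearch endpoint (assumed here to accept unsigned HTTP requests) and a hypothetical table name.

```python
import boto3
import requests

dynamodb = boto3.resource("dynamodb")
ES_ENDPOINT = "https://search-example.us-east-1.es.amazonaws.com"  # hypothetical

def get_all_products(body):
    # The "es_service" flag in the request body selects the backend.
    if body.get("es_service"):
        resp = requests.get(
            f"{ES_ENDPOINT}/products/_search",
            json={"query": {"match_all": {}}},
        )
        return [hit["_source"] for hit in resp.json()["hits"]["hits"]]
    # Default path: read from DynamoDB (table name is an assumption).
    return dynamodb.Table("Products").scan()["Items"]
```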

Transaction Management Service:

We have created three functions by querying DynamoDB:

– Create Transaction

– Get all transactions, and transactions by UserID

– Get transactions by date for a particular UserID

There is also a Transaction Analytics service, which gathers all transaction data and dumps it into S3 as a CSV file so that we can query the data using Athena.

Conclusion:

Serverless computing offers several advantages over traditional cloud-based or server-centric infrastructure. For many developers, serverless architectures offer greater scalability, more flexibility, and quicker time to release, all at a reduced cost. With serverless architectures, developers do not need to worry about purchasing, provisioning, and managing backend servers.

We have observed the following advantages while working on this capstone project:

– No server management is necessary

– Developers are only charged for the server space they use, reducing cost

– Serverless architectures are inherently scalable

– Quick deployments and updates are possible

– Code can run closer to the end-user, decreasing latency

Authors’ Bio:

Shreya Sharma – Shreya is an AWS Certified Solutions Architect currently working as a Senior Software Developer with Hexaware Technologies Pvt Ltd. in Mumbai. She has a particular interest in all things related to the AWS cloud, migration from on-premises to cloud, and backend APIs. She has 8 years of extensive experience in designing and developing full-stack applications, both on the cloud and on-premises.

Sajal Biswas – Sajal is passionate about cloud computing development and architecting cloud migration projects with backend API development. He is an OCA 7 (Java), CSM, and Mule ESB certified professional, currently working with Capgemini as a software consultant in Mule ESB technology. He has 6.7 years of total experience, including extensive experience in API integration.

 

Experts Talk Series: Migrating to the cloud

Reading Time: 5 minutes

Episode 1 – Cloud migration

Migrating to the cloud is a buzzword these days. Every enterprise wants to say that they are “100% cloud-enabled”. If you are an enterprise looking to move over to the cloud, how should you go about it?

First off, let’s just clarify that “100% cloud-enabled” is a myth. Most enterprises will have a portion of their business running in their own datacenter, also known as on-premises. Therefore, a better way to quantify cloud enablement would be “100% of all applications that have been found fit for the cloud have been migrated”.

How to decide if you really need to migrate?

To get the process off the ground, the first thing you have to decide is whether the cloud is the right fit for your use-case. If your application landscape consists of legacy code or is highly optimized for the hardware it runs on, it is safe to say the cloud will do more harm than good. But if your application comprises a set of loosely coupled components, each a small, highly specialized, hardware-independent function, these are ripe candidates for a cloud-based serverless implementation.

There should also be a good reason for this endeavour. Change for change’s sake does not always equal progress. The pros and cons of a cloud-based infrastructure must be taken into account, along with factors like cost and manpower requirements and whether they can be met.

So you want to migrate. What’s next?

Have you decided that you want to jump into the cloud? If so, let’s venture together into the labyrinth of choices you will have to make during this journey.

First, you will have to look at various business dimensions while contemplating your cloud implementation. For example, immediate cost benefits will be highest on IaaS implementations, after a lift and shift of on-premises applications to the cloud. Likewise, other dimensions like time to market, functional responsiveness, and scaling have to be taken into consideration and a balance has to be found. This will help you to decide if your implementation will be IaaS, PaaS or SaaS-based. Perhaps a combination may yield the best results.

The next step is app evaluation. As mentioned earlier, it is necessary to check which applications are fit for the cloud. Low-risk applications from a business perspective can be safely migrated. However, an enterprise may feel more secure storing trade secrets, proprietary functionality, and security services on local servers. Let it be noted, though, that on-premises servers do not guarantee 100% security any more than cloud providers do. As a matter of fact, cloud providers take security very seriously and take strong measures to ensure that you know exactly where, and by whom, your data is being accessed, and that only authorized users can access it.

You may be on the fence about migrating certain services, like client-server applications and supporting functions. For such cases, an ROI analysis will help you decide. Please note that on-premises implementation allows the enterprise to take advantage of financial levers like depreciation. In the end, let me emphasize that these decisions are highly case-specific and are not cast in stone. 

An application in an enterprise is hardly ever standalone. Hence, you will have to go through various levels of integration. The usual options are synchronous and asynchronous integration. The on-premises data centre can be integrated with the cloud to create a hybrid cloud deployment topology. This means the cloud applications can access the on-premises applications directly, though a bit of latency will be at play. Maybe asynchronous or batch-based integration will help hide the latency.

The migration process 

It is a myth that cloud migration is a single-step process. As mentioned earlier, the first step is usually a lift-and-shift approach, where the existing on-premises architecture is cloned onto the cloud. This relieves the enterprise of the burden of maintaining a data centre, but that is about all the benefit this approach brings. After that, some of the functionality can gradually be re-engineered to take advantage of managed cloud services; for example, a database can be moved to a cloud-provided database service. Then there is the concept of cloud-native applications, where new components or functionality are designed from the get-go to take advantage of platform-specific services built for media, analytics, or content distribution. This way, the workload on the enterprise is reduced until you are solely responsible for the business processes while letting the cloud handle the heavy lifting.

The next step is to choose a cloud provider. Your hired or in-house cloud expert can help you make an informed decision from the myriad choices available to you. Which of these is suitable for you is highly situational and requires you to take several factors into consideration, like cost, software or platform requirements, compliance requirements, and geographical zone availability. You may also want to take advantage of a specific API or managed service offered by a service provider. It should be noted that most of the top cloud providers have a nearly similar set of services, so if you don’t have any highly specialised requirements, you cannot go wrong with any of them.

The on-premises setup then has to be restructured to fit the cloud architecture. Your cloud provider will definitely have a list of reference architectures available based on real-life use-cases and a list of best practices to follow, including but not limited to data and application migration tools. They also have an extensive collection of white papers to aid you in this task.

Implementing the migration plan

The above discussion concludes the planning and selection stage of cloud migration. All that is left now is to implement the plan. This should begin with drawing up and implementing a proof of concept. Not only will this allow you to run performance comparisons with your existing application, but it will also highlight unforeseen challenges and complexity levels which may show up during the actual migration process, allowing you to be prepared for the same. This will also give you a good idea of the reliability of the chosen cloud provider and will allow you to evaluate its support system.

While performing the actual migration, you should be careful to minimize the resulting disruption time and service outages. Dry runs should be conducted to identify potential failure points and minimize errors during the process.  Every use case will have its own set of steps to follow during the migration, but it generally starts by taking a backup of the databases, followed by the deployment of applications, and migrating the database. Also, there will be quite a few application components to manage and set up, like middleware, caching, warehousing, and file systems. All these components must be planned and mapped to the relevant cloud service. Don’t forget to set access roles and policies! Make sure you have a clear idea of who should be able to access your applications and which components they can access, then assign appropriate roles for them. Parallel deployments of the application in the cloud and on-premises must be performed to check performance and detect failures.

Benchmarking tests are a must. This will let you know how your cloud application runs in comparison to your on-premises setup and will allow you to fine-tune your setup and be sure if it is ready for deployment.

Congratulations! You have successfully migrated to the cloud. As mentioned before, cloud migration is not a goal but a journey. Every new application will have to be evaluated whether it is a better fit for cloud or on-premises implementation. If it is destined for the cloud, integration with other applications that may still be on-premises will have to be taken into account. As new services are released by the provider, existing on-premises applications will have to be re-evaluated to see if they can take advantage of those new services. 

As you can see, this journey is not easy, but once it has been completed, just sit back and watch the clouds do their magic! But with regular management and prompting from you of course!

Experts Talk Series is a repository of articles written and published by cloud experts. Here we talk in-depth about cloud concepts, applications, and implementation practices.  

This program has elevated my role – Rajesh Kumar, Engagement Lead at Cognizant, UK

Reading Time: 1 minute

Cloud computing is swiftly becoming one of the top skills tech professionals consider when switching careers. Read what Rajesh Kumar has to say about Great Learning’s PG program in Cloud Computing and how it helped him work towards his AWS certifications.

I have recently completed the Post Graduate Program in Cloud Computing with Great Learning and would like to share my gratitude & experience.

First Things First, Kudos to

– The Great Lakes content team, for preparing & delivering a curriculum aligned for managers to renew & scale on cutting-edge technologies like Cloud, Containers, Microservices, Big Data, Business Transformations, etc.

– Experienced mentors & Enter-trainers like Nirmallaya & Shiva, who delivered the content & their experience to students in an exceptional way rather than being monotonous & mediocre

– The empathetic Program Manager, Ekta Singh. Her timely support kept me on track for successful completion. She played a vital role in pushing me to complete the labs and projects, which instilled the confidence to pursue the technology ladder again (though I was a core L4 techie a few years back). I truly appreciate her commitment to the overall success of the program.

This program has enabled me to learn, practice & apply knowledge of technology & business transformation in the digital world. My role has been elevated from Delivery Manager to Engagement Lead, focused on business & technology.

I would definitely give credit to the PGP-CC for the training program, the mentoring sessions, keeping us abreast of technology advancements under the “Industry Focus” sections, and challenging us with labs, projects & the capstone project, which helped me achieve the AWS Cloud Practitioner certification and prepare for the AWS Architect certification.

Upskill with Great Learning’s PG program in Cloud Computing and unlock your dream career.