AWS solution to build Real-time Data processing Application using Kinesis, Lambda, DynamoDB, S3

A Capstone Project by Amit Bajaj and Sathya Guruprasad

Introduction

Cloud computing has become very popular because of the benefits it provides, and businesses worldwide are adopting it. The flexibility to scale up or down with business needs, faster and more efficient disaster recovery, subscription-based models that reduce the high cost of hardware, and flexible working for employees are some of the benefits that attract businesses to the cloud. Like cloud, data analytics is another crucial area that businesses are exploring for their growth. The amount of data available on the internet has risen exponentially as a result of the boom in social media, mobile apps, IoT devices, sensors and so on. It has become imperative for organisations to analyse this data to gain insights into their businesses and take appropriate action.

AWS provides a reliable platform for solving complex problems, on which cost-effective infrastructure can be built with ease. It offers a wide range of managed services, including computing, storage, networking, database, analytics, application services and many more.

Problem Statement:

We analysed several software solutions that process data collected from the market and use it to provide information and suggestions, and so a better customer experience. Examples include trading applications that provide stock prices, taxi companies that show the locations of nearby taxis, and journey-planning applications that give live updates on different modes of transport.

We chose a “server-less” platform (the “Server-less Computing Execution Model”) to build the real-time data-processing app. The architecture is based on managed services provided by AWS.

What is “Server-less”?

Server-less is a cloud-based execution model in which the cloud provider dynamically allocates and runs the servers. It is a consumption-based model where pricing is directly proportional to consumer use. AWS takes complete ownership of operational responsibilities, which removes infrastructure and availability management from the consumer and provides higher uptime.

Services Consumed:

  1. Kinesis
    1. Kinesis Data Streams
    2. Kinesis Data Analytics
    3. Kinesis Firehose
  2. Athena
  3. Lambda
  4. DynamoDB
  5. Amazon S3
  6. AWS CLI

Architecture:

[Figure: architecture of the real-time data processing application]

How can data from different sources be received into cloud-based infrastructure without building a sizable infrastructure?

Amazon Kinesis, a managed service by AWS, makes it easy to collect, process, and analyse real-time streaming data so that you can get timely insights and react quickly to new information. Kinesis Data Streams lets a user receive data from the source that generates it. We created an Amazon Kinesis data stream using AWS CLI commands to consume data from the data source.

Technical + Functional Flow 

Create Kinesis data streams: 

      1. Create two streams in Kinesis using the AWS Console or AWS CLI commands: one to receive data from the data generator and another to hold the processed output. The data generator produces data, which is written to the input/source data stream. The Kinesis Analytics app processes it and writes the results to the output/destination stream.
      2. We created a program to generate data and transmitted it to Kinesis Data Streams with the help of the AWS SDKs and AWS CLI commands. Data can be generated in various ways:
        1. Using IoT devices
        2. Live trackers
        3. GPS trackers
        4. API
        5. Data generator tools (in case of Analysis)
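The data generator step can be sketched as follows. This is a minimal Python illustration, not the authors' producer (which they wrote in Go): the field names `device_id`, `timestamp` and `value` are hypothetical, and `kinesis_client` is assumed to be a boto3 Kinesis client created elsewhere. The request shape follows the Kinesis `PutRecords` operation.

```python
import json
import random


def make_reading(device_id: str, now: float) -> dict:
    """Build one synthetic reading; the field names are hypothetical."""
    return {
        "device_id": device_id,
        "timestamp": int(now),
        "value": round(random.uniform(0.0, 100.0), 2),
    }


def to_kinesis_entry(reading: dict) -> dict:
    """Shape a reading as one entry of a Kinesis PutRecords request."""
    return {
        "Data": json.dumps(reading).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": reading["device_id"],
    }


def send_batch(kinesis_client, stream_name: str, readings: list) -> int:
    """Send readings with PutRecords and return the number that failed."""
    response = kinesis_client.put_records(
        StreamName=stream_name,
        Records=[to_kinesis_entry(reading) for reading in readings],
    )
    return response["FailedRecordCount"]
```

With boto3 installed, passing `boto3.client("kinesis")` as `kinesis_client` together with the name of the input/source stream would send the batch. A non-zero return value means some records were rejected and should be retried.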

Create a Kinesis Analytics App to Aggregate data

      1. Build a Kinesis Data Analytics application that reads from the input/source data stream and writes formatted, aggregated results to the output/destination data stream at a specified time interval.
      2. It is very important to stop the application when it is not in use to avoid unnecessary cost.
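To make the idea of aggregating over a specified time interval concrete, here is a plain-Python sketch of a tumbling (fixed, non-overlapping) window average. It only illustrates the concept: a Kinesis Data Analytics application is normally written in SQL or Apache Flink, and the `timestamp` and `value` fields are hypothetical.

```python
from collections import defaultdict


def tumbling_window_average(records, window_seconds):
    """Group records into fixed, non-overlapping windows and average each one.

    Each record is a dict with hypothetical "timestamp" (epoch seconds) and
    "value" fields. Returns a dict mapping window start time to mean value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        # Align the timestamp to the start of the window it falls in.
        window_start = record["timestamp"] - record["timestamp"] % window_seconds
        sums[window_start] += record["value"]
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

For example, with a 60-second window, readings at seconds 0 and 30 fall into the window starting at 0, and a reading at second 60 starts a new window.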

Data Storage and Processing:

      1. Lambda, another managed service by AWS, is triggered by the data stream, processes the data and writes it to DynamoDB.
      2. A Lambda function runs on a trigger basis, and its cost model is strictly driven by consumption: the user incurs no cost when the function is not running. The data is stored in DynamoDB and can be accessed in the standard fashion.
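A Lambda function triggered by a Kinesis stream might look like the sketch below. It assumes that each record carries a JSON object and that `table` is a boto3 DynamoDB `Table` resource; the authors' actual function is not shown in this article. Kinesis delivers each payload base64-encoded under `record["kinesis"]["data"]`.

```python
import base64
import json
from decimal import Decimal


def decode_kinesis_records(event: dict) -> list:
    """Decode the base64 payload of every record in a Kinesis-triggered event."""
    items = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        # The boto3 DynamoDB resource rejects Python floats, so parse as Decimal.
        items.append(json.loads(payload, parse_float=Decimal))
    return items


def write_items(table, items: list) -> int:
    """Write each item with PutItem and return how many were written."""
    for item in items:
        table.put_item(Item=item)
    return len(items)


def make_handler(table):
    """Return a Lambda-style handler bound to the given DynamoDB table object."""
    def handler(event, context):
        return {"written": write_items(table, decode_kinesis_records(event))}
    return handler
```

In a deployed function, `make_handler(boto3.resource("dynamodb").Table(...))` with the real table name would produce the handler. Passing the table in as a parameter keeps the decoding logic testable without AWS access.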

Kinesis Firehose, S3 and Athena:

    1. Kinesis Firehose acts as a mediator between the Kinesis data stream and S3: data received from the Kinesis data stream is delivered to a predefined S3 bucket in a specified format.
    2. Amazon Athena is a server-less interactive query service that enables the user to query the data stored in the S3 bucket for analysis.
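Querying the S3 data from code could be sketched as below. The helper only builds the arguments for Athena's `StartQueryExecution` operation. The database, table and output location are hypothetical placeholders, and the sketch assumes an Athena table has already been defined over the bucket that Firehose writes to.

```python
def build_athena_request(database: str, table: str, output_location: str,
                         limit: int = 10) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution call."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {limit}',
        "QueryExecutionContext": {"Database": database},
        # Athena writes the query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

With boto3, the request would be submitted as `boto3.client("athena").start_query_execution(**build_athena_request(...))`. The call returns a query execution ID that is then polled for results.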

AWS CLI, AWS CloudFormation and AWS IAM also play a very important role in building cloud-based infrastructure and ensuring secure connectivity within and outside the AWS cloud.

Conclusion:

Using AWS services, we were able to create a real-time data processing application based on a serverless architecture that accepts data through Kinesis data streams, processes it through Kinesis Data Analytics, triggers a Lambda function and stores the results in DynamoDB. The architecture can be reused for multiple data types from various data sources and formats with minor modifications. We used only managed services provided by AWS, which led to zero infrastructure management effort.

The capstone project helped us build practical expertise in AWS services such as Kinesis, Lambda, DynamoDB, Athena, S3, and Identity and Access Management, as well as in serverless architecture and managed services. We also learnt the Go programming language to build pseudo data producer programs. The AWS CLI helped us connect on-premise infrastructure with cloud services.

This project is a part of Great Learning’s post graduate program in Cloud Computing. 

Authors
Amit Bajaj – Project Manager at Cognizant
Sathya Guruprasad – Infrastructure Specialist at IBM Pvt Ltd

Setting up a hospitality business model on AWS

A capstone project by Sajal Biswas and Shreya Sharma

Use Case: Accommodation options in the travel industry are not limited to hotels and resorts. People often look for homestay options, as this model benefits both parties: tourists can enjoy home-like comfort while owners can earn reasonable revenue from the rent.

Introduction:

We have taken the Airbnb business model as a reference and analyzed how to utilize AWS cloud services so that a business only needs to focus on its model.

We are following ‘server-less architecture’ for our proposed solution. Serverless architectures help in significantly reducing operational cost, complexity, and engineering lead time, at the price of increased reliance on the vendor. 

Architecture:

[Figure: architecture diagram]

CICD Architecture:

[Figure: CI/CD architecture diagram]

Tech stack used:

– ReactJs for creating the web application using AWS AMPLIFY

– Profile Management using AWS COGNITO

– ChatBot using AWS LEX and AWS AMPLIFY

– Static website hosting on S3 bucket

– CLOUDFRONT for CDN

– Code repository in CODECOMMIT

– Backend APIs using Lambda functions (in Python), which are triggered via API Gateway

– AWS ElasticSearch for efficient search functionality

– DynamoDB database for storing data in key-value pairs

– Static files like images are kept in an S3 bucket

– CloudWatch Alarms for monitoring

– AWS SES service to send emails to customers

– AWS Pinpoint and Athena for analytics

Case Studies:

  1. How can we develop APIs as fast as the business needs to launch in the market, at low cost and without provisioning infrastructure or load balancing?

For this requirement, a serverless architecture is the best choice. We have implemented it so that the business need not worry about infrastructure changes and management.

  1. What if a business wants to track email communication with users and process the data based on their replies?

Enterprise solutions want the business not only to send promotional emails and contact services, but also to capture user replies and track user communication. We implemented this feature with AWS SES. Although we have integrated only the sending of email through a Lambda function, other features can also be explored.
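Sending an email from a Lambda function could be sketched as follows. The helper builds the arguments for the SES `SendEmail` operation with a plain-text body. The addresses and wording are placeholders, and this is an illustration rather than the authors' function.

```python
def build_email_request(sender: str, recipient: str, subject: str, body: str) -> dict:
    """Build keyword arguments for the SES SendEmail call (plain-text body)."""
    return {
        "Source": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Message": {
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            "Body": {"Text": {"Data": body, "Charset": "UTF-8"}},
        },
    }
```

With boto3, `boto3.client("ses").send_email(**build_email_request(...))` would send the message, provided the sender address has been verified in SES.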

  1. The design approach for Search and Listing Properties on website

We assumed that a large amount of data will be generated and that the number of transactions will be correspondingly large, so we chose DynamoDB. We maintain the property list with a partition key of the form <propertyCode>_<stateCode>_<pinCode> so that we can search easily and so that, when a large volume of requests arrives, the load is spread across partitions and the hot partition key issue does not arise.
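The composite partition key described above can be built and parsed with two small helpers. This is a sketch: the function names and the rule that no component may contain an underscore are our assumptions, added so that a key can be split back unambiguously.

```python
def build_partition_key(property_code: str, state_code: str, pin_code: str) -> str:
    """Join the three components as <propertyCode>_<stateCode>_<pinCode>."""
    parts = [property_code, state_code, pin_code]
    for part in parts:
        # Assumption: components are non-empty and never contain the separator.
        if not part or "_" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return "_".join(parts)


def parse_partition_key(key: str) -> dict:
    """Split a partition key back into its three named components."""
    parts = key.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected three components in key: {key!r}")
    property_code, state_code, pin_code = parts
    return {
        "propertyCode": property_code,
        "stateCode": state_code,
        "pinCode": pin_code,
    }
```

Because the property code, state and pin code all contribute to the key, items for different properties hash to different partitions instead of concentrating on one.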

  1. Efficient Search functionality using AWS ElasticSearch.

We use AWS ElasticSearch to save each record alongside DynamoDB. We have also created a Lambda function that collects transaction data from DynamoDB and creates a CSV file in an S3 bucket, which Athena then uses for analytics.

  1. Is it possible to increase customer interaction, instantly? 

We have integrated LEX ChatBot with basic functionalities.

  1. What would be a good approach for User Profile Management?

Our initial thought was to use the AWS RDS service for this, but we later chose AWS Cognito, a managed service.

  1. Analytics from Business Perspective.

Currently, we use the following services for analytics:

– AWS Pinpoint

– Athena queries

Technical Details:

Website hosting with API integration:

We have developed a static website using React Js and AWS Amplify. This website is hosted on an S3 bucket, and CloudFront is integrated for caching and CDN.

– User Registration, Login, Password Management, Logout and Session management using AWS Cognito.

– LEX Chatbot for basic functionalities

– Integration with backend APIs deployed on API Gateway. The APIs return a consistent JSON response format, i.e. an array of objects

– AWS Pinpoint for tracking user activity on the website

Deployment:

Repository Management: The website repository is maintained using AWS CodeCommit.

CI/CD: We have used AWS CodePipeline for website deployment.

API deployment: All backend APIs are deployed in API Gateway, integrated with AWS Lambda, and we have created a dev stage environment for them.

Monitoring and Metrics:

We have used CloudWatch Logs and Metrics for debugging and monitoring, using various tags.

APIs and Database:

We have created APIs using AWS Lambda as the backend. All functions are written in Python.

Although neither of us has expertise in Python, we learnt about it in the PGPCC course. 

Library:

We have used the pip package manager to install boto3 for Python.

API Endpoints:

All Lambda functions are exposed through API Gateway as POST requests. We use an “action” field in the request body so that the API can respond according to its value.
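Routing on the “action” field could be sketched as below. The action names and their stub implementations are hypothetical, and the sketch assumes the API Gateway Lambda proxy integration, where the request body arrives as a JSON string in `event["body"]`. Every response body is a JSON array of objects.

```python
import json


def create_product(payload: dict) -> list:
    """Hypothetical action; a real one would write to the database."""
    return [{"productId": payload.get("productId"), "status": "created"}]


def get_all_products(payload: dict) -> list:
    """Hypothetical action; a real one would read from the database."""
    return []


# Maps each supported "action" value to the function that serves it.
ACTIONS = {
    "create_product": create_product,
    "get_all_products": get_all_products,
}


def respond(status_code: int, items: list) -> dict:
    """Wrap a list of objects in an API Gateway proxy-style response."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(items),
    }


def lambda_handler(event, context):
    """Route a POST request to the function named by the "action" field."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return respond(400, [{"error": "body is not valid JSON"}])
    if not isinstance(payload, dict):
        return respond(400, [{"error": "body must be a JSON object"}])
    name = payload.get("action")
    action = ACTIONS.get(name) if isinstance(name, str) else None
    if action is None:
        return respond(400, [{"error": "unknown action"}])
    return respond(200, action(payload))
```

A dictionary lookup keeps the handler short as actions are added, and unknown or missing actions produce a 400 response in the same array-of-objects shape as successful calls.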

Services details: 

We have created the following services:

Product Management Service:

We have created 3 functionalities that query the DynamoDB database or ElasticSearch:

– Create product

– Get all products

– Get all products by state

For the same functionality, the “es_service” flag in the body decides whether we call DynamoDB or ElasticSearch.
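The backend switch could be sketched as follows. The sketch assumes `es_service` is a JSON boolean and that the two lookup functions are supplied by the caller; both are our assumptions, since the article does not show the flag's type or the query code.

```python
def choose_backend(payload: dict) -> str:
    """Return "elasticsearch" when the es_service flag is true, else "dynamodb"."""
    # Assumption: the flag is a JSON boolean; a missing flag means DynamoDB.
    return "elasticsearch" if payload.get("es_service") is True else "dynamodb"


def get_products_by_state(payload: dict, dynamodb_lookup, elasticsearch_lookup) -> list:
    """Serve the same request from whichever backend the flag selects."""
    lookups = {
        "dynamodb": dynamodb_lookup,
        "elasticsearch": elasticsearch_lookup,
    }
    return lookups[choose_backend(payload)](payload["state"])
```

Comparing against `True` explicitly means that a string such as "false" does not accidentally select ElasticSearch.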

Transaction Management Service:

We have created 3 functionalities that query DynamoDB:

– Create transaction

– Get all transactions, and transactions by UserId

– Transactions by date for a particular UserId

A Transaction Analytics service gathers all transaction data and dumps it into S3 as a CSV file, where we can query the data using Athena.
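The CSV step of such a service could be sketched as below. The column names are hypothetical, and the transaction items are assumed to have already been read from DynamoDB as a list of dicts.

```python
import csv
import io


def transactions_to_csv(transactions: list, columns: list) -> str:
    """Render transaction dicts as CSV text with a header row.

    Keys not listed in columns are ignored; missing values become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for transaction in transactions:
        writer.writerow(transaction)
    return buffer.getvalue()
```

The resulting text could then be uploaded with the S3 `put_object` call, for example `s3_client.put_object(Bucket=..., Key=..., Body=csv_text.encode("utf-8"))`. A fixed column list keeps the file's schema stable for the Athena table even when individual items carry extra attributes.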

Conclusion:

Serverless computing offers several advantages over traditional cloud-based or server-centric infrastructure. For many developers, serverless architectures offer greater scalability, more flexibility, and quicker time to release, all at a reduced cost. With serverless architectures, developers do not need to worry about purchasing, provisioning, and managing backend servers.

We have observed the following advantages while working on this capstone project:

– No server management is necessary

– Developers are only charged for the server space they use, reducing cost

– Serverless architectures are inherently scalable

– Quick deployments and updates are possible

– Code can run closer to the end-user, decreasing latency

Authors’ Bio:

Shreya Sharma – Shreya is an AWS Certified Solutions Architect and is currently working as a Senior Software Developer with Hexaware Technologies Pvt Ltd. in Mumbai. She has a particular interest in all things related to AWS Cloud, migration from on-premise to cloud, and backend APIs. She has 8 years of extensive work experience in designing and developing full stack applications both on cloud and on-premise.

Sajal Biswas – Sajal is passionate about cloud computing development and architecting cloud migration projects with backend API development. He is an OCA 7(java), CSM, Mule ESB certified professional and is currently working with Capgemini as a software consultant in Mule ESB technology. He has a total experience of 6.7 years including extensive experience in API integration.