Goldin Auctions' main requirement was to deliver the information requested by the front end while analysing the data at the backend. The work started with a single dashboard to serve data requests for multiple parties. Since the data for both parties belongs to a single store, the idea of a single interface for viewing different information arose. The dashboard is for admin use, so a Cognito User Pool is configured for security. The stored data is analysed at the backend, which generates fresh data reports. Keeping the request-response micro-service and the analytics micro-service loosely coupled makes the architecture more scalable.
AWS Partner Story: Goldin Auctions
- Goldin Auctions works with multiple parties to buy and sell collectibles and therefore needed a single dashboard to view transactional data across those parties.
- Since the data delivered and stored is confidential, maintaining a secure architecture was a big challenge.
- As the business grows, it generates a huge amount of data that needs to be properly analysed.
- Information delivery and analysis run on a single infrastructure with the help of loosely coupled micro-services.
- A secured architecture that guards confidential data against access by outsiders.
- Fresh data reports provided to the business intelligence team.
- Low latency while migrating the reports.
Before implementing the solution, we needed to answer the following questions:
- Do we need to set up TTL for DynamoDB items?
- What steps could be taken to reduce latency in the request-response process?
- What kind of data formats would be necessary for processing?
- How should QuickSight be managed for easy and informative visualizations?
We built a loosely coupled architecture containing two micro-services. The first micro-service drives a single dashboard providing multi-party information, and the second provides timely analytical records of the DynamoDB data.
The user requesting the data is authenticated using the Cognito User Pool. Once authentication is done, requests can be made to two different GET endpoints of a single API Gateway. The Lambda functions behind both endpoints access a single DynamoDB table for data retrieval.
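The request path described above can be sketched as a small Lambda handler behind an API Gateway proxy integration. This is a minimal illustration, not the project's actual code: the route names (`/buyers`, `/sellers`), the key attributes (`order_id`, `timestamp`), and the `buyer`/`seller` item fields are all assumptions. Cognito authorization is assumed to happen at API Gateway before the function is invoked, so it does not appear here.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical routes: the source only says there are two GET endpoints.
ROUTES = {"/buyers": "buyer", "/sellers": "seller"}


def _response(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Shape a reply the way an API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        # default=str copes with Decimal values returned by DynamoDB.
        "body": json.dumps(body, default=str),
    }


def make_handler(table: Any) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Bind a handler to a table object exposing get_item(Key=...).

    In AWS this would typically be boto3.resource("dynamodb").Table(name);
    it is injected so the sketch runs without AWS access.
    """

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        field = ROUTES.get(event.get("path", ""))
        if event.get("httpMethod") != "GET" or field is None:
            return _response(404, {"error": "not found"})
        params = event.get("queryStringParameters") or {}
        order_id, timestamp = params.get("order_id"), params.get("timestamp")
        if not order_id or not timestamp:
            return _response(400, {"error": "order_id and timestamp are required"})
        # Both endpoints read the same table; only the returned field differs.
        result = table.get_item(Key={"order_id": order_id, "timestamp": timestamp})
        item = result.get("Item")
        if item is None:
            return _response(404, {"error": "order not found"})
        return _response(200, {"order_id": order_id, "party": field, "details": item.get(field)})

    return handler
```

Serving both endpoints from one handler factory keeps the two views consistent, since they differ only in which part of the shared item they return.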
The DynamoDB table is populated with a large volume of data that needs to be analysed frequently. The DynamoDB stream captures a time-ordered sequence of item-level modifications. Another Lambda function is triggered every hour, listens to the DynamoDB stream, and writes the items to a Kinesis Data Firehose delivery stream.
Kinesis Data Firehose sends the data to Amazon S3. The Glue crawler inspects the S3 files to determine their format and compression type and writes these properties into the Data Catalog. The Data Catalog is fed into Athena, which creates an external table, sets up partitions, and starts querying the data.
We use CloudWatch to manage metadata. It saves pipeline metrics that enable our Lambda functions to make decisions at runtime. To keep our architecture primarily serverless, we used API Gateway, Lambda functions, an S3 bucket, DynamoDB, Glue, Athena, Kinesis Data Firehose, and Cognito. This serverless design enables cost savings and increased scalability without having to worry about infrastructure administration.
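The stream-to-Firehose step might look roughly like the sketch below. It assumes the stream is configured to include new item images, that only a few DynamoDB attribute types are in use, and that the Firehose client is passed in (in AWS it would be `boto3.client("firehose")`). The function names and the `event_name` field are illustrative, not taken from the project.

```python
import json
from typing import Any, Dict, List

MAX_BATCH = 500  # Firehose PutRecordBatch accepts at most 500 records per call.


def from_attr(value: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute value (e.g. {"S": "x"}) to plain Python."""
    (tag, raw), = value.items()
    if tag in ("S", "BOOL"):
        return raw
    if tag == "N":
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: from_attr(v) for k, v in raw.items()}
    if tag == "L":
        return [from_attr(v) for v in raw]
    raise ValueError(f"unsupported attribute type: {tag}")


def to_firehose_records(event: Dict[str, Any]) -> List[Dict[str, bytes]]:
    """Turn a DynamoDB stream event into newline-delimited JSON Firehose records."""
    records = []
    for rec in event.get("Records", []):
        image = rec.get("dynamodb", {}).get("NewImage")
        if image is None:  # e.g. REMOVE events carry no new image
            continue
        doc = {k: from_attr(v) for k, v in image.items()}
        doc["event_name"] = rec.get("eventName")
        records.append({"Data": (json.dumps(doc) + "\n").encode("utf-8")})
    return records


def forward(event: Dict[str, Any], firehose: Any, stream_name: str) -> int:
    """Send the converted records in batches and return how many were submitted."""
    records = to_firehose_records(event)
    for start in range(0, len(records), MAX_BATCH):
        # A production version should also inspect FailedPutCount in the response.
        firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + MAX_BATCH],
        )
    return len(records)
```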
Accessing the Data
In the first pipeline, both Lambda functions access the data from a single DynamoDB table maintained for data storage. The ID of the order (item) and the timestamp collectively serve as the partition key for easy storage and retrieval of the multi-party (mainly buyer and seller) data. In the second pipeline, the other Lambda function captures the DynamoDB table's stream, performs some transformation, and passes the result to Kinesis Data Firehose.
Using S3 bucket to collect raw data
Once the data reaches Kinesis Data Firehose, it transforms DynamoDB's raw streaming data into Parquet format and feeds it into the S3 bucket. One major benefit is that it dynamically partitions streaming data without us building our own processing pipelines.
The information from the source and target tables was automatically cataloged by an AWS Glue crawler as part of our pipeline, and AWS Glue ETL tasks used these catalogs to retrieve data from S3 and publish it to Athena.
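To make the idea of partitioned storage concrete, the sketch below computes the kind of Hive-style S3 key prefix a record could land under. In Firehose the partitioning is configured on the delivery stream rather than written in application code, so this only illustrates the resulting layout. The partition fields (`party`, `year`, `month`, `day`) and the `transactions` base prefix are assumptions, not details from the project.

```python
from datetime import datetime, timezone


def partition_prefix(party: str, epoch_seconds: int, base: str = "transactions") -> str:
    """Return a Hive-style key prefix (key=value path segments) for one record."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{base}/party={party}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


# A record stamped at the Unix epoch would be stored under:
print(partition_prefix("buyer", 0))
# transactions/party=buyer/year=1970/month=01/day=01/
```

Because the prefix encodes column values as `key=value` segments, a crawler can register them as partition columns, and queries that filter on them can skip unrelated files.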
Utilising data for analysis
The next tool is Athena, which is in charge of processing all the data and analysing it to reveal hidden patterns and trends in transactions. This is achieved by querying the data efficiently with Athena's distributed SQL engine.
The results help us understand ongoing trends among groups of buyers and sellers and support important decisions about which parties bring the most profit and which cause losses. Athena uses Apache Hive to run Hive-compliant DDL commands, which makes the analysis easy. Because Athena charges by the amount of data scanned per query, storing the data as partitioned Parquet files reduces what each query scans, which saves cost and improves performance.
In the end, fresh data reports are generated and given to the business excellence team.
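A query of the kind described in this section could be submitted to Athena as sketched below. The table and column names (`party`, `price`, `year`, `month`), the string-typed partition columns, and the output location are assumptions for illustration. The Athena client is injected (in AWS it would be `boto3.client("athena")`) so the sketch runs without credentials.

```python
from typing import Any


def build_party_totals_query(table: str, year: int, month: int) -> str:
    """Build a SQL query totalling order value per party for one month.

    Filtering on partition columns limits the data Athena has to scan.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT party, COUNT(*) AS orders, SUM(price) AS total_value "
        f"FROM {table} "
        f"WHERE year = '{year:04d}' AND month = '{month:02d}' "
        "GROUP BY party ORDER BY total_value DESC"
    )


def run_query(athena: Any, sql: str, database: str, output_location: str) -> str:
    """Start the query and return its execution ID; results land in S3."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

The call returns immediately with an execution ID; a caller would then poll the query status before reading the result file from the output location.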
Handling pipeline failures
The Lambda function listening to the DynamoDB stream is triggered every hour. Handling its execution failures, which could occur for any reason during runtime, was one of the pipeline's challenges. Our solution was to use SQS as a dead-letter queue (DLQ) that receives a message once retries are exhausted.
If a Lambda function fails to execute, it will initially retry three times to get past any dependency problems. If this does not work, a message with all the relevant information, such as the Lambda name, error message, and timestamp, will be produced in the DLQ, which will be configured as a destination on Lambda failure. SNS will be configured to send error notifications later.
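The retry-then-DLQ behaviour can be sketched in-process as follows. In the deployed pipeline the retries and the failure destination are handled by Lambda's own configuration, so this is only an illustration of the flow and of what a failure message might contain. The message field names and the injected SQS client (in AWS, `boto3.client("sqs")`) are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict

MAX_RETRIES = 3  # the text describes three retries after the first failure


def build_failure_message(function_name: str, error: BaseException) -> Dict[str, str]:
    """Collect the details the text lists: Lambda name, error message, timestamp."""
    return {
        "lambda_name": function_name,
        "error_type": type(error).__name__,
        "error_message": str(error),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def run_with_dlq(
    task: Callable[[], Any],
    sqs: Any,
    queue_url: str,
    function_name: str,
    retries: int = MAX_RETRIES,
) -> Any:
    """Run task; after the initial attempt plus `retries` failures, report to the DLQ."""
    last_error: BaseException = RuntimeError("task was never run")
    for _ in range(1 + retries):
        try:
            return task()
        except Exception as exc:  # any runtime failure counts as a failed attempt
            last_error = exc
    message = build_failure_message(function_name, last_error)
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(message))
    return None
```

A consumer of the queue, or a later SNS notification step, can then parse the JSON body to see which function failed and why.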
Deployment and maintenance
For this pipeline, deployment is automated through AWS CodeCommit pipelines, as we manage our code repositories in AWS CodeCommit. For infrastructure deployment, we use AWS CloudFormation as infrastructure as code. We maintain three separate environments: one for development, one for test, and one for production. AWS CodeCommit's automated CI/CD pipeline has triggers on code lookup and deployment.
- AWS Lambda
- AWS DynamoDB
- AWS Cognito
- AWS SNS
- AWS Glue
- AWS S3
- AWS Athena
- AWS Glue data catalog
- DynamoDB Streams
- API Gateway