Systems integration has been a core competency at BlueSoft since the company’s founding in 2002. We work with a variety of ESB middleware solutions, both commercial and open source, and help our customers integrate heterogeneous systems that expose different APIs.
In some cases, however, the traditional layered integration pattern requires too many compute resources. In such situations, we recommend a different, more cost-effective approach.
The Limitations of Traditional Systems Integration
For one of our customers, we integrated on-premises systems and SaaS applications using a hybrid cloud on AWS. An open source ESB formed the integration backbone: two instances deployed in a VPC across separate Availability Zones. The integration flows were decoupled using Message-Oriented Middleware.
Unfortunately, legacy architectures sometimes impose performance limitations on an integration project, so it cannot become fully dynamic and event-driven. For example, some systems can only exchange data as files containing gigabytes of data, and processing those files creates significant performance challenges.
In the above customer use case, one of the integration processes involved the following steps:
- Uploading a file containing a large volume of sales data.
- Importing the file into a relational database to ensure data persistence and make the data easy to query with joins (which is hard to implement on flat files).
- On a successful DB load, invoking an ESB service.
- Using a DB adapter to query the database, join several tables, and aggregate sales data for each sales representative.
- Preparing the output files.
- Finally, notifying the destination system via a REST API that the files were ready to download.
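The join-and-aggregate step above can be illustrated with a minimal, self-contained sketch. The schema below (`sales`, `sales_reps`, the column names) is hypothetical; the customer’s actual tables were more complex.

```python
import sqlite3

# In-memory stand-in for the customer's relational database
# (table and column names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_reps (rep_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (id INTEGER PRIMARY KEY, rep_id INTEGER, amount REAL);
INSERT INTO sales_reps VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# Join the tables and aggregate sales per representative -- the kind of
# query the ESB's DB adapter ran before preparing one output file per rep.
rows = conn.execute("""
    SELECT r.name, SUM(s.amount) AS total
    FROM sales s JOIN sales_reps r ON s.rep_id = r.rep_id
    GROUP BY r.rep_id, r.name
    ORDER BY r.name
""").fetchall()

print(rows)  # [('Alice', 150.0), ('Bob', 75.0)]
```

Each aggregated row then fed one output file per sales representative.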
The output files (one per sales rep) were served via the Nginx HTTP server deployed in the cloud. If an error appeared, an exception handler would publish a message to the Amazon Simple Notification Service (SNS) topic to notify the support team.
Event-Driven Approach – Serverless Integration
Following a Change Request for the above process, we decided to re-architect it using serverless integration with AWS Lambda. Along with a change in the data structure, we expected a significant increase in data volume.
Traditionally, we would simply have changed the underlying DB schema and loaded the data from the file into the database for the ESB. Here’s what we did instead:
- We performed the necessary aggregations with the ETL tool and prepared the output files on the on-premises servers.
- Then we uploaded the output files to Amazon S3, using multipart upload to parallelize the transfer and increase throughput.
- The ETL tool was already installed; we only had to configure it to grab files from the source system and store its output on S3 after processing finished. Because of limited access to the console, we used the AWS SDK for Java and packed the functionality into a jar file (easily executable by the ETL tool).
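Multipart upload works by splitting a large object into parts that can be uploaded in parallel and reassembled by S3 on completion. A minimal sketch of the part arithmetic (part size is illustrative; the SDK normally handles this for you):

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum part size (all parts except the last)

def part_ranges(object_size: int, part_size: int = 100 * 1024 * 1024):
    """Yield (part_number, start_offset, end_offset) tuples for a multipart upload.

    Each part can be uploaded concurrently; S3 stitches them together
    when the upload is completed.
    """
    assert part_size >= MIN_PART_SIZE
    part_number = 1
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size)
        yield part_number, start, end
        part_number += 1

# A 2.5 GiB file split into 100 MiB parts -> 26 parts, uploadable in parallel.
parts = list(part_ranges(int(2.5 * 1024**3)))
print(len(parts))  # 26
```

In practice the AWS SDKs (Transfer Manager in the Java SDK, or `boto3`’s managed transfers in Python) do this splitting and parallelization automatically above a configurable size threshold.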
To reuse the existing support mechanism in case of an error, we reconfigured the ETL tool to publish a message to the existing SNS topic and notify the support team (this functionality, too, was deployed as a jar using the AWS SDK for Java).
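The error-notification path can be sketched as a small helper. The topic ARN and message layout below are assumptions for illustration; the actual publish call went through the AWS SDK (in boto3 terms, `sns.publish`).

```python
import json

# Hypothetical ARN of the existing support SNS topic.
SUPPORT_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:support-alerts"

def build_support_message(process: str, error: Exception) -> dict:
    """Build the payload published to the support team's SNS topic on failure."""
    return {
        "TopicArn": SUPPORT_TOPIC_ARN,
        "Subject": f"ETL process failed: {process}",
        "Message": json.dumps({"process": process, "error": str(error)}),
    }

msg = build_support_message("sales-aggregation", RuntimeError("output file missing"))
# A real implementation would now call: boto3.client("sns").publish(**msg)
print(msg["Subject"])
```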
We used the S3 PUT event to trigger an AWS Lambda function that picked up the files from the bucket. The bucket name and object key were read from the event; we then used these attributes to fetch the user-defined metadata with AmazonS3Client. The metadata carried a sales rep identifier, which we matched with the file when building the request to the destination system.
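The trigger side can be sketched as a Lambda handler. This is a Python illustration (the original jar used AmazonS3Client from the Java SDK), and the metadata key `sales-rep-id` plus the injectable client are assumptions for testability:

```python
def handler(event, context, s3_client=None):
    """Triggered by an S3 PUT event: read bucket/key, then fetch user metadata."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    if s3_client is None:
        # In a real deployment, construct the client lazily inside the handler.
        import boto3
        s3_client = boto3.client("s3")
    # User-defined metadata was attached to the object at upload time;
    # here it carries the sales rep identifier (the metadata key is hypothetical).
    head = s3_client.head_object(Bucket=bucket, Key=key)
    rep_id = head["Metadata"]["sales-rep-id"]
    return {"bucket": bucket, "key": key, "sales_rep": rep_id}

# Structural check with a stubbed client and a trimmed-down S3 PUT event.
class FakeS3:
    def head_object(self, Bucket, Key):
        return {"Metadata": {"sales-rep-id": "42"}}

event = {"Records": [{"s3": {"bucket": {"name": "reports"},
                             "object": {"key": "out/rep-42.csv"}}}]}
result = handler(event, None, s3_client=FakeS3())
print(result)
```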
For error handling, we configured the Lambda function’s Dead Letter Queue to target the SNS topic the support team was already subscribed to.
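In CloudFormation terms, the DLQ wiring looks roughly like the fragment below. Logical names, bucket, and key are placeholders; the support topic itself already existed.

```yaml
# Hypothetical CloudFormation fragment: route failed asynchronous invocations
# of the Lambda function to the existing support SNS topic.
SalesFileProcessor:
  Type: AWS::Lambda::Function
  Properties:
    Handler: handler.handler
    Runtime: python3.12
    Role: !GetAtt ProcessorRole.Arn      # placeholder execution role
    Code:
      S3Bucket: deployment-artifacts     # placeholder
      S3Key: sales-processor.zip         # placeholder
    DeadLetterConfig:
      TargetArn: !Ref SupportTopicArn    # ARN of the existing support SNS topic
```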
Speed, Scaling, and Cost Reduction – Benefits of Serverless Approach
Of course, there are many integration patterns and architectural styles, and there is no one-size-fits-all integration solution. Precisely for this reason, modifying a solution’s architecture can sometimes turn out to be the most beneficial option.
In our case, the enhancements we applied considerably improved the system’s performance and speed:
- We eliminated the network overhead of loading data into the relational database and querying it from the ESB in the cloud.
- The output files became available faster thanks to using on-premises ETL to aggregate the data.
- Uploading data to S3 instead of storing it on EBS and hosting it via Nginx gave us eleven nines (99.999999999%) of durability and high availability out of the box.
- Replacing EC2 with Lambda offloaded the existing ESB instances and the volumes that might otherwise require scaling out under heavy load.
- Thanks to event-triggered function invocations and the DLQ, we don’t need to persist processing state to avoid duplicates.
Getting the right tools for the job is halfway to success. A bigger hammer is useless when what you need is a saw. In the same way, simply scaling up or out sometimes won’t do the trick.
As integrations get more and more complex, you have to combine several tools to improve performance and increase cost-effectiveness. In our case, coupling Amazon S3 with AWS Lambda allowed us to achieve an event-driven, flexible architecture. And our customer can now enjoy greater speed, better scaling, and reduced costs.