Web application performance is critical for success these days. There are numerous performance measurements that affect a customer’s experience, but one thing is for sure: slow response times can cause customers to leave your site. Understanding the importance of this, the engineers at Spokeo decided to dig deeper after a few of their site’s search engine optimization (SEO) pages experienced elevated response times. A performance profiling effort revealed a few underlying issues that needed to be addressed.
Looking at the bigger picture, however, the engineering team realized it needed better visibility into the entire site’s performance, including the ability to monitor trends over time. There was a lot to track: 30 environments, thousands of page categories, and billions of potential URL permutations. Using the Spokeo tech initiative process, the engineers brought forward a proposal to build a Performance Audit tool. (Read our Tech Blog about how Spokeo’s innovative tech initiative process resulted in other engineer-led improvements.)
The team presented the idea to the broader group of tech leads and the chief technology officer (CTO). The project was approved, and the team built the initial version with a focus on auditing SEO pages. The effort provided monitoring capabilities and insight into the performance, SEO, and accessibility scores of monitored pages. As a result, the team improved SEO scores to values over 90 across numerous categories.
Early Success Brings Scaling and Maintenance Challenges
Spokeo’s Performance Audit quickly became a valuable in-house monitoring tool. Performance metrics were stored and analyzed before and after each application release, enabling the team to ensure a responsive user experience for their customers. Performance data going back to 2019 was stored in Amazon Redshift and could be reported on visually using Tableau.
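To make the before-and-after comparison concrete, here is a minimal sketch of one way to compare a metric across a release. It is an illustration only, not Spokeo's implementation: the function names, the choice of the median as the summary statistic, and the sample values in the usage note are all assumptions.

```typescript
// Hypothetical helper: median of a list of samples (for example, response times in ms).
function median(values: number[]): number {
  if (values.length === 0) {
    throw new RangeError("median requires at least one sample");
  }
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: percent change in the median from "before" to "after".
// A positive result means the metric grew (slower, for a response time).
function percentChange(before: number[], after: number[]): number {
  const baseline = median(before);
  if (baseline === 0) {
    throw new RangeError("baseline median must be non-zero");
  }
  return ((median(after) - baseline) / baseline) * 100;
}
```

For example, with made-up samples, `percentChange([800, 1000, 1200], [900, 1100, 1300])` compares a median of 1000 against a median of 1100 and reports an increase of about 10 percent.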
The implementation used Lighthouse, an open-source, browser-based tool in Chrome that runs a series of performance and quality tests against a page and generates a report of the findings. Unfortunately, support for this software in AWS Lambda was dropped, so the performance auditing tool needed to be refactored.
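As a rough illustration of how a Lighthouse report can feed a monitoring tool, the sketch below reads category scores from a report object and lists the categories that fall below a target. Lighthouse's JSON report stores each category score as a value between 0 and 1 (or null when it could not be computed). The interface here is a trimmed-down, hypothetical view of that report, and the function names and the threshold of 90 are assumptions chosen for illustration.

```typescript
// Hypothetical, minimal view of a Lighthouse JSON report: only the fields used here.
interface CategoryResult {
  score: number | null; // 0..1, or null if the category could not be scored
}

interface LighthouseReport {
  categories: Record<string, CategoryResult>;
}

// Convert 0..1 scores to the 0..100 scale shown in Lighthouse's own UI.
function toPercentScores(report: LighthouseReport): Record<string, number | null> {
  const scores: Record<string, number | null> = {};
  for (const [id, category] of Object.entries(report.categories)) {
    scores[id] = category.score === null ? null : Math.round(category.score * 100);
  }
  return scores;
}

// Return the ids of categories scoring below the threshold, sorted alphabetically.
// Categories with a null score are skipped rather than treated as failures.
function categoriesBelow(report: LighthouseReport, threshold: number): string[] {
  return Object.entries(toPercentScores(report))
    .filter(([, score]) => score !== null && score < threshold)
    .map(([id]) => id)
    .sort();
}
```

Given a report with `performance: 0.72`, `seo: 0.93`, and `accessibility: 0.88`, calling `categoriesBelow(report, 90)` would flag `accessibility` and `performance` while leaving `seo` alone.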
The tool also had reliability issues. A few sprints were dedicated to refactoring, but debugging and resolving problems remained a painful process, and the tool’s steep learning curve made it hard to maintain. The team wanted a cloud-based solution that would scale to meet their needs and remove the need for desktop software.
The experience with the initial solution during the last few years clarified the use cases, presenting the engineers with a clear vision of what was needed. It was time for Performance Audit 2.0.
Iterating the Design of Performance Audit 2.0
The team needed a reliable, efficient performance auditing solution that could run in the cloud and provide the required reporting capabilities. The first iteration of the new design was based on running Lighthouse workers on Lambda, a serverless compute environment on AWS.
Although a lift-and-shift approach seemed simple on the surface, the detailed design became overly complex. Amazon Simple Queue Service (SQS) messages would be used to trigger Lambda executions. However, SQS was not part of the current operational footprint and would add to the DevOps burden.
The initial Lighthouse implementation did not translate well to the serverless paradigm because it was partly a browser-based application. Additionally, the Lighthouse binaries could no longer be fully upgraded on AWS Lambda. The team therefore modified the design by moving the Lighthouse component to Amazon Elastic Container Service (ECS). This change gave the engineers more control over the runtime environment so the dependencies could be upgraded.
The original implementation used custom locking to prevent two runs from executing at the same time. The redesign replaced SQS with a Redis-based queue solution called Bull, which improved maintainability by relying on an existing solution to the distributed locking problem. Tests are initiated when a worker reads a message from the queue; messages are either scheduled or initiated by a user. The Lighthouse ECS worker cluster scales based on the backlog of queue messages.
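The idea of scaling workers from the queue backlog can be sketched as a small pure function. This is an assumption-laden illustration, not the team's actual scaling rule: the article does not say how the backlog is measured or how the ECS service is resized, so the `ScalingPolicy` fields, the function name, and the numbers in the usage note are all hypothetical.

```typescript
// Hypothetical scaling policy; none of these names or limits come from the article.
interface ScalingPolicy {
  minWorkers: number;    // never scale below this many workers
  maxWorkers: number;    // never scale above this many workers
  jobsPerWorker: number; // backlog size one worker is expected to absorb
}

// Compute how many workers to run for a given number of waiting queue messages.
function desiredWorkerCount(backlog: number, policy: ScalingPolicy): number {
  if (policy.jobsPerWorker <= 0) {
    throw new RangeError("jobsPerWorker must be positive");
  }
  if (policy.minWorkers > policy.maxWorkers) {
    throw new RangeError("minWorkers must not exceed maxWorkers");
  }
  // Round up so a partial batch of jobs still gets a worker.
  const needed = Math.ceil(Math.max(0, backlog) / policy.jobsPerWorker);
  // Clamp the result into the allowed range.
  return Math.min(policy.maxWorkers, Math.max(policy.minWorkers, needed));
}
```

With a policy of 1 to 10 workers and 25 jobs per worker, a backlog of 60 messages would ask for 3 workers, an empty queue would fall back to the minimum of 1, and a backlog of 1,000 would be capped at 10. Keeping the rule as a pure function makes it easy to unit test separately from whatever mechanism actually reads the queue depth and resizes the cluster.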
The final architecture is shown in the diagram below. A web-based user interface is hosted on EC2, and an RDS database stores application data. Amazon Redshift continues to serve as the data warehouse, with Tableau used for reporting against that data.
Spokeo: A Home for Creative, Passionate Software Developers
Spokeo is a forward-thinking company that values software engineers and the creativity they bring to the table. The evolution of the Performance Audit tool was driven by the engineering team. The design evolved over time to make the best use of existing components in the cloud. It also improved maintainability by providing an ecosystem that developers can extend and improve.
Spokeo is a mid-size company that supports an engineering team of about 60 developers, giving them the security and benefits of a larger company with the feel and energy of a startup. Developers have the opportunity to work on the most important projects and priorities within the company. Engineers also receive guidance on their career paths with access to projects that can lead to skill growth in their areas of interest.
Each software developer at Spokeo has the opportunity to make a huge impact while working on exciting new projects and technologies. If you want to jumpstart your career, check out Spokeo’s software developer jobs now.