
Trellix achieved 35% cost savings and enhanced security with Amazon OpenSearch Service


This is a guest post by Leeneksh Dubey, Cloud Engineer at Trellix, in partnership with AWS.

Trellix, a global leader in cybersecurity solutions, emerged in 2022 from the merger of McAfee Enterprise and FireEye. Serving over 40,000 enterprise customers worldwide, Trellix delivers the industry’s most comprehensive, open, and native AI-powered security platform. Their solution helps organizations build operational resilience against advanced threats through automated detection, investigation, and response capabilities.

Today, security teams face an increasingly complex landscape of cybersecurity threats, while the volume of security and application logs grows exponentially. With limited resources and personnel, teams struggle to analyze all security events, potentially missing emerging threats. Trellix addresses these challenges by unifying security tools across endpoints, networks, cloud, and email into a single, AI-powered platform. By automating threat detection, investigation, and response, it enables security teams to identify and neutralize threats faster while reducing operational complexity.

To address exponential log growth across their multi-tenant, multi-Region infrastructure, Trellix used Amazon OpenSearch Service, Amazon OpenSearch Ingestion, and Amazon Simple Storage Service (Amazon S3) to modernize their log infrastructure. Facing challenges with self-managed Elasticsearch clusters on Amazon Elastic Compute Cloud (Amazon EC2), Trellix migrated to managed OpenSearch Service, which significantly optimized their operations. This strategic implementation enabled them to process terabytes of daily security data across multiple AWS Regions while achieving a 35% reduction in storage costs as of Q3 2024. The shift to managed services saved up to 10 hours of infrastructure maintenance time weekly, helping developers focus more on value-added tasks.

In this post, we share how, by adopting these AWS solutions, Trellix enhanced their system’s performance, availability, and scalability while reducing operational overhead.

Solution overview

Trellix’s log management solution, built on AWS services, addresses the challenges of processing large volumes of security data across multiple Regions. This enterprise-grade architecture demonstrates how organizations can effectively manage security logs at scale while optimizing costs. The solution addresses three critical business challenges: efficient management of long-term log storage, scalable distribution of analytics and alerting capabilities, and optimization of storage costs across their multi-Region infrastructure. The following diagram illustrates the architecture.

The Trellix security log management solution on AWS implements a comprehensive data pipeline that handles log ingestion, processing, storage, and analysis. In the following sections, we explore the six steps of the workflow in more detail.

Step 1: Load data to Amazon S3

The solution begins with a data ingestion process that uses the globally distributed and highly scalable infrastructure of Amazon S3. Raw security and application logs are captured from multiple Regional deployments, helping Trellix maintain both data sovereignty and low-latency access across various jurisdictions. These logs are then processed by the Trellix internal engine, which enriches them using proprietary security logic. The enriched dataset is subsequently stored back in Amazon S3, establishing a secure, scalable foundation for further security analytics and downstream processing.

Step 2: Amazon SNS notification triggered by S3 events

After the enriched data is successfully stored in Amazon S3, the system initiates an event-driven automation sequence. Amazon S3 is configured to emit event notifications to an Amazon Simple Notification Service (Amazon SNS) topic whenever new data is uploaded. Amazon SNS acts as a notification hub, efficiently broadcasting these events to subscribed services or endpoints. This approach helps the architecture remain responsive and decoupled, because it allows various consumers to be alerted in real time as new data becomes available in the system.
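As a minimal sketch of what such a bucket notification could look like, the following Python builds an S3 notification configuration that publishes object-created events to an SNS topic. The topic ARN, account ID, and the `enriched/` prefix are hypothetical placeholders; the source does not describe Trellix's actual bucket layout or event filters.

```python
import json


def build_topic_notification(topic_arn, prefix="", suffix=""):
    """Return an S3 notification configuration that sends object-created events to SNS."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})

    topic_config = {
        "TopicArn": topic_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    # Only attach a key filter when at least one rule was requested.
    if rules:
        topic_config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"TopicConfigurations": [topic_config]}


# Hypothetical ARN and prefix, for illustration only.
config = build_topic_notification(
    "arn:aws:sns:us-east-1:111122223333:enriched-logs-topic",
    prefix="enriched/",
)
print(json.dumps(config, indent=2))
```

A dictionary of this shape can be passed as the `NotificationConfiguration` argument of the boto3 S3 client's `put_bucket_notification_configuration` call. The SNS topic's access policy must also allow Amazon S3 to publish to it.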

Step 3: Message queuing in Amazon SQS

As the next step in the workflow, the SNS notifications are routed to Amazon Simple Queue Service (Amazon SQS), which serves as a durable and scalable queuing layer between producers and consumers. This queue acts as a buffer, facilitating reliable and asynchronous delivery of event metadata to downstream processing components. Amazon SQS provides message persistence and fault tolerance, particularly under high-throughput or failure scenarios, allowing OpenSearch Ingestion to process incoming data in a controlled and resilient manner.
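To make the event metadata concrete, here is a small sketch of how a consumer could read the bucket and object key out of an SQS message body. It assumes the standard S3 event record layout and handles both an SNS-wrapped notification and a raw S3 event (as delivered when raw message delivery is enabled on the subscription). The bucket name and key are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_body):
    """Return (bucket, key) pairs from an SQS message body carrying an S3 event."""
    payload = json.loads(sqs_body)
    # An SNS-delivered message wraps the S3 event as a JSON string in "Message".
    if payload.get("Type") == "Notification" and "Message" in payload:
        payload = json.loads(payload["Message"])

    objects = []
    for record in payload.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical message, shaped like an S3 event relayed through SNS.
s3_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-enriched-logs"},
                "object": {"key": "enriched/2024/09/01/part+0001.json"},
            },
        }
    ]
}
sns_envelope = {"Type": "Notification", "Message": json.dumps(s3_event)}
print(extract_s3_objects(json.dumps(sns_envelope)))
```

In the architecture described here, OpenSearch Ingestion does this parsing itself; the sketch only shows what travels through the queue.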

Step 4: Automated data processing with OpenSearch Ingestion

OpenSearch Ingestion continuously polls the SQS queue for new messages indicating the availability of data in Amazon S3. Upon receiving these messages, it uses its built-in integration capabilities to fetch the corresponding data directly from Amazon S3. After the data is retrieved, the ingestion pipeline performs the necessary transformations before forwarding it to the OpenSearch Service domain. To achieve optimal cost-efficiency and performance, Trellix selected OR1 instance types for their OpenSearch deployment. These instances offer a high memory-to-vCPU ratio and are specifically optimized for intensive indexing and search workloads, making them ideal for handling large-scale log analytics operations.
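The following sketch shows roughly what a pipeline definition for this pattern could contain, written as a Python dictionary that mirrors the YAML layout OpenSearch Ingestion expects (an `s3` source driven by SQS notifications and an `opensearch` sink). The queue URL, domain endpoint, role ARN, Region, index pattern, and the newline codec are all assumptions for illustration; Trellix's actual pipeline and its transformations are not described in the source.

```python
import json


def build_pipeline(queue_url, domain_endpoint, role_arn, region, index):
    """Return a pipeline definition mirroring the YAML layout of an S3-SQS pipeline."""
    aws = {"region": region, "sts_role_arn": role_arn}
    return {
        "version": "2",
        "log-pipeline": {
            "source": {
                "s3": {
                    # Read objects announced through SQS rather than scanning the bucket.
                    "notification_type": "sqs",
                    # Assumes newline-delimited records; other codecs exist.
                    "codec": {"newline": None},
                    "sqs": {"queue_url": queue_url},
                    "aws": dict(aws),
                }
            },
            "sink": [
                {
                    "opensearch": {
                        "hosts": [domain_endpoint],
                        "index": index,
                        "aws": dict(aws),
                    }
                }
            ],
        },
    }


# All values below are hypothetical placeholders.
pipeline = build_pipeline(
    queue_url="https://sqs.us-east-1.amazonaws.com/111122223333/enriched-logs-queue",
    domain_endpoint="https://search-example-domain.us-east-1.es.amazonaws.com",
    role_arn="arn:aws:iam::111122223333:role/example-pipeline-role",
    region="us-east-1",
    index="security-logs-%{yyyy.MM.dd}",
)
print(json.dumps(pipeline, indent=2))
```

The service takes the definition as YAML, so the dictionary would be serialized to YAML before being used as the pipeline configuration body. A date-based index pattern is chosen here because daily indexes pair naturally with age-based lifecycle policies.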

Step 5: Log lifecycle setup using Index State Management

To optimize storage utilization and manage data retention, Trellix has implemented Index State Management (ISM) policies within OpenSearch Service. These policies automate the lifecycle of ingested log data by transitioning it through defined stages based on age and access patterns. Initially, logs reside in the hot tier for up to 24 hours, enabling immediate access for real-time security analysis. As logs age beyond this threshold, they’re automatically transitioned to UltraWarm storage, which provides a more cost-effective storage option while keeping the data queryable. Finally, after the predefined retention period expires, the ISM policy deletes the data from the system. This fully automated lifecycle management approach balances performance, compliance, and cost-efficiency.
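A policy with this hot, warm, and delete progression might look like the sketch below, which builds the ISM policy document in Python. The 24-hour hot threshold comes from the description above; the 90-day retention period, the index pattern, and the template priority are assumptions, because the source does not state them.

```python
import json


def build_ism_policy(index_pattern, hot_age="1d", retention_age="90d"):
    """Return an ISM policy: hot tier, then UltraWarm, then deletion."""
    return {
        "policy": {
            "description": "Hot for one day, then UltraWarm, then delete.",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "warm", "conditions": {"min_index_age": hot_age}}
                    ],
                },
                {
                    "name": "warm",
                    # warm_migration moves the index to UltraWarm storage.
                    "actions": [{"warm_migration": {}}],
                    "transitions": [
                        {"state_name": "delete", "conditions": {"min_index_age": retention_age}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created matching indexes.
            "ism_template": [{"index_patterns": [index_pattern], "priority": 100}],
        }
    }


# Hypothetical index pattern and retention period.
policy = build_ism_policy("security-logs-*")
print(json.dumps(policy, indent=2))
```

The resulting JSON would be sent as the body of a `PUT _plugins/_ism/policies/<policy_id>` request against the domain. Both `min_index_age` conditions are measured from index creation, so the retention value is the total lifetime of the index, not time spent in UltraWarm.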

Step 6: Comprehensive monitoring and visualization

Using the extensive monitoring capabilities of Amazon CloudWatch, complemented by Trellix’s in-house automations that use OpenSearch public APIs for custom monitoring, the solution provides end-to-end visibility through integrated visualization tools. OpenSearch Dashboards provides security teams with powerful log analysis and search capabilities, so they can dive deep into security events and identify potential threats. Additionally, the solution uses Amazon Managed Grafana to create customized dashboards that monitor both the data pipeline health and OpenSearch cluster performance.

This dual visualization approach delivers several benefits: real-time security event monitoring and analysis, comprehensive performance metrics across the infrastructure, automated alerting for rapid threat response, custom dashboard views for different security operations needs, and unified visibility across the multiple Regional deployments. The combined power of these tools creates a robust monitoring framework that helps Trellix maintain a strong security posture while sustaining optimal performance across their global infrastructure.

This six-step implementation demonstrates how AWS services can be combined to create a scalable, cost-efficient security log management solution that processes terabytes of daily security data while maintaining high performance and operational efficiency.

Key benefits

Trellix’s implementation of OpenSearch Service as their logging solution delivered three significant advantages that transformed their security operations.

Simplified log management architecture

Trellix streamlined their security operations by implementing a cohesive log management architecture that avoids the complexity of managing multiple disparate tools. By using OpenSearch Ingestion, a fully managed serverless data pipeline, Trellix simplified their data pipeline for processing real-time security data. The integration with Amazon Managed Grafana provides a unified visualization layer, enabling security teams to focus on threat detection rather than infrastructure management.

Scalability and resilience

The implementation of OpenSearch Service enables Trellix to achieve a high level of scalability and resilience in their security operations. Trellix’s architecture uses an OpenSearch Ingestion pipeline to handle sudden log volume spikes across multiple Regional deployments. OpenSearch Ingestion scales dynamically with automated resource optimization, simplifying capacity management as data volumes grow. This capability helps Trellix maintain consistent performance even during periods of increased security event logging. The solution also implements a robust Multi-AZ deployment strategy to maintain resilience and continuous service availability. During self-healing testing, the architecture demonstrated recovery times under 9 minutes when a node was rebooted, showcasing its ability to maintain business continuity even in case of node failure. The automated failover capabilities minimize disruption to security operations, so Trellix can maintain constant vigilance over their customers’ security posture. Finally, the solution uses automated Amazon S3 backups combined with hourly snapshots for comprehensive point-in-time recovery capabilities. Each Region maintains additional customer data replicas, creating a multi-layered data protection strategy that maintains the integrity and availability of critical security information.

Effortless scalability with optimized cost

Trellix’s exponential growth in security data processing demanded a solution that could scale dynamically while maintaining cost-efficiency. The strategic implementation of Amazon S3 and OpenSearch Service with UltraWarm storage provided the foundation for this scalable architecture. UltraWarm, a fully managed warm storage tier for OpenSearch Service, changed how Trellix manages their extensive security data across multiple Regions. UltraWarm uses Amazon S3 for durable storage while maintaining fast query performance for security analysis. A key advantage of this Amazon S3 backed architecture is the elimination of index replicas, which significantly reduces cluster size and associated costs while maintaining data durability.

The intelligent log prioritization framework forms the backbone of Trellix’s data management strategy, categorizing incoming data based on security importance. This systematic approach enables efficient routing of P2 and P3 log sources, optimized processing paths for different security priorities, reduced load on primary SIEM infrastructure, and customized handling based on customer requirements. The implementation has proven particularly valuable for security log analytics, where historical data analysis is crucial for threat detection and compliance requirements.

The implementation delivered substantial operational and financial benefits for Trellix. By combining priority-based routing and tiered storage management, the team achieved a 35% reduction in storage and compute costs while maintaining high-performance security operations. The solution enables efficient storage and analysis of extensive historical data, supporting Trellix’s commitment to comprehensive security monitoring while optimizing operational costs. This implementation demonstrates how AWS services can help organizations optimize costs without compromising security capabilities or operational efficiency.
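To illustrate why removing replicas from the warm tier reduces stored volume, the following back-of-the-envelope sketch compares an all-hot layout with one replica against a layout that keeps one day hot and the rest in UltraWarm. The daily volume and retention period are invented for illustration and are not Trellix's figures; the calculation also ignores index overhead, compression, and per-tier pricing differences.

```python
def storage_footprint_gb(daily_gb, hot_days, warm_days, hot_replicas=1):
    """Estimate stored volume for a hot tier with replicas and a replica-free warm tier."""
    hot = daily_gb * hot_days * (1 + hot_replicas)
    warm = daily_gb * warm_days  # UltraWarm keeps a single copy backed by Amazon S3
    return hot + warm


daily_gb = 1000      # hypothetical ingest volume per day
retention_days = 30  # hypothetical retention period

all_hot = storage_footprint_gb(daily_gb, hot_days=retention_days, warm_days=0)
tiered = storage_footprint_gb(daily_gb, hot_days=1, warm_days=retention_days - 1)

print(f"all hot: {all_hot} GB, tiered: {tiered} GB")
print(f"reduction in stored volume: {1 - tiered / all_hot:.0%}")
```

Under these assumed inputs the tiered layout stores roughly half the volume, because only the single hot day carries a replica. Actual savings depend on retention length, replica count, and pricing, which is why the result here should not be read as a reproduction of the 35% figure.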

What’s next

The successful implementation of this solution has positioned Trellix to explore additional AWS capabilities and emerging technologies to enhance their security operations:

  • Integration of AWS ML/AI services to analyze petabytes of security log data
  • Implementation of ML-based anomaly detection within OpenSearch Service
  • Use of security analytics plugins for advanced threat detection
  • Implementation of custom configurations and pre-built security rules

Summary

Trellix successfully modernized its log management infrastructure through collaboration with AWS, implementing an architecture that addresses the challenges of processing terabytes of daily security data across multiple Regions. By using OpenSearch Service with UltraWarm nodes and integrating Amazon S3, the solution delivered significant performance improvements, including faster log ingestion and streamlined operational management. The architecture’s tiered storage approach, combined with optimized retention policies, resulted in a 35% reduction in storage costs while maintaining compliance requirements.

This transformation has positioned Trellix to efficiently handle growing data volumes and evolving security challenges, demonstrating how strategic use of cloud services can simultaneously improve performance, reduce costs, and enhance operational efficiency.


About the authors

Leeneksh Dubey

Leeneksh is a Cloud Engineer at Trellix, with expertise in architecting scalable and resilient cloud infrastructure on AWS. He works extensively across data, analytics, and AI workloads, covering end-to-end solution design, deployment automation, and cost optimization. His focus is on building secure, high-performance environments that support the company’s cybersecurity product portfolio.

Harsh Bansal

Harsh is an Analytics and AI Solutions Architect with Amazon Web Services. Bansal collaborates closely with clients, assisting in their migration to cloud platforms and optimizing cluster setups to enhance performance and reduce costs. Before joining AWS, Bansal supported clients in using OpenSearch and Elasticsearch for various search and log analytics requirements.

Prashant Agrawal

Prashant is a Sr. Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.
