This post is written in collaboration with Clarisa Tavolieri, Austin Rappeport, and Samantha Gignac from Zurich Insurance Group.
The volume and number of logging sources have grown exponentially over the last few years, and will continue to grow in the coming years. As a result, customers across all industries are facing multiple challenges such as:
- Balancing storage costs against meeting long-term log retention requirements
- Bandwidth issues when moving logs between the cloud and on premises
- Resource scaling and performance issues when trying to analyze massive amounts of log data
- Keeping pace with the growing storage requirements, while also being able to provide insights from the data
- Aligning license costs for Security Information and Event Management (SIEM) vendors with log processing, storage, and performance requirements. SIEM solutions help you implement real-time reporting by monitoring your environment for security threats and alerting on threats once detected.
Zurich Insurance Group (Zurich) is a leading multi-line insurer providing property, casualty, and life insurance solutions globally. In 2022, Zurich began a multi-year program to accelerate their digital transformation and innovation through the migration of 1,000 applications to AWS, including core insurance and SAP workloads.
The Zurich Cyber Fusion Center management team faced similar challenges, such as balancing ingestion licensing costs against long-term retention requirements for both business application log and security log data within the existing SIEM architecture. Zurich wanted to identify a log management solution to work in conjunction with their existing SIEM solution. The new approach would need to offer the flexibility to integrate new technologies such as machine learning (ML), the scalability to handle long-term retention at forecasted growth levels, and options for cost optimization. In this post, we discuss how Zurich built a hybrid architecture on AWS incorporating AWS services to satisfy their requirements.
Solution overview
Zurich and AWS Professional Services collaborated to build an architecture that addressed decoupling long-term storage of logs, distributing analytics and alerting capabilities, and optimizing storage costs for log data. The solution was based on categorizing and prioritizing log data into priority levels 1–3, and routing logs to different destinations based on priority. The following diagram illustrates the solution architecture.
The workflow steps are as follows:
- All of the logs (P1, P2, and P3) are collected and ingested into an extract, transform, and load (ETL) service, AWS Partner Cribl’s Stream product, in real time. Capturing and streaming of logs is configured per use case based on the capabilities of the source, such as using built-in forwarders, installing agents, using Cribl Streams, and using AWS services like Amazon Data Firehose. This ETL service performs two functions before data reaches the analytics layer:
- Data normalization and aggregation – The raw log data is normalized and aggregated in the required format to perform analytics. The process consists of normalizing log field names, standardizing on JSON, removing unused or duplicate fields, and compressing to reduce storage requirements.
- Routing mechanism – Upon completing data normalization, the ETL service applies the necessary routing mechanisms to ingest log data into the respective downstream systems based on category and priority.
- Priority 1 logs, such as network detection and response (NDR), endpoint detection and response (EDR), and cloud threat detection services (for example, Amazon GuardDuty), are ingested directly into the existing on-premises SIEM solution for real-time analytics and alerting.
- Priority 2 logs, such as operating system security logs, firewall, identity provider (IdP), email metadata, and AWS CloudTrail, are ingested into Amazon OpenSearch Service to enable the following capabilities. Previously, P2 logs were ingested into the SIEM.
- Systematically detect potential threats and react to a system’s state through alerting, and integrate these alerts back into Zurich’s SIEM for larger correlation, reducing the amount of data ingested into Zurich’s SIEM by approximately 85%. Eventually, Zurich plans to use ML plugins such as anomaly detection to enhance analysis.
- Develop log and trace analytics solutions with interactive queries and visualize results with high adaptability and speed.
- Reduce the average time to ingest and average time to search in a way that accommodates the increasing scale of log data.
- In the future, Zurich plans to use OpenSearch’s security analytics plugin, which can help security teams quickly detect potential security threats by using over 2,200 pre-built, publicly available Sigma security rules or by creating custom rules.
- Priority 3 logs, such as logs from enterprise applications and vulnerability scanning tools, are not ingested into the SIEM or OpenSearch Service, but are forwarded to Amazon Simple Storage Service (Amazon S3) for storage. These can be queried as needed using one-time queries.
- Copies of all log data (P1, P2, P3) are sent in real time to Amazon S3 for highly durable, long-term storage to satisfy the following:
- Long-term data retention – S3 Object Lock is used to enforce data retention per Zurich’s compliance and regulatory requirements.
- Cost-optimized storage – Lifecycle policies automatically transition data with less frequent access patterns to lower-cost Amazon S3 storage classes. Zurich also uses lifecycle policies to automatically expire objects after a predefined period. Lifecycle policies provide a mechanism to balance the cost of storing data against meeting retention requirements.
- Historical data analysis – Data stored in Amazon S3 can be queried to satisfy one-time audit or analysis tasks. Eventually, this data could be used to train ML models to support better anomaly detection. Zurich has done testing with Amazon SageMaker and has plans to add this capability in the near future.
- One-time query analysis – Simple audit use cases require historical data to be queried based on different time intervals, which can be performed using Amazon Athena and AWS Glue analytic services. By using Athena and AWS Glue, both serverless services, Zurich can perform simple queries without the heavy lifting of running and maintaining servers. Athena supports a variety of compression formats for reading and writing data. Therefore, Zurich is able to store compressed logs in Amazon S3 to achieve cost-optimized storage while still being able to perform one-time queries on the data.
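The normalization and priority-based routing steps above can be sketched in a few lines of Python. This is a minimal illustration rather than Zurich’s Cribl Stream configuration: the source-type names, the field aliases, the destination labels, and the choice to treat unknown sources as P3 are all assumptions made for the example.

```python
import gzip
import json

# Hypothetical source-type labels, grouped by the priorities named in the post.
SOURCE_PRIORITY = {
    "ndr": 1, "edr": 1, "guardduty": 1,
    "os_security": 2, "firewall": 2, "idp": 2,
    "email_metadata": 2, "cloudtrail": 2,
    "business_app": 3, "vuln_scanner": 3,
}

# Hypothetical aliases used to normalize log field names.
FIELD_ALIASES = {
    "src": "source_ip", "srcip": "source_ip",
    "ts": "timestamp", "time": "timestamp",
}

# Every priority also gets a copy in Amazon S3 for long-term retention.
DESTINATIONS = {
    1: ["siem", "s3"],
    2: ["opensearch", "s3"],
    3: ["s3"],
}


def normalize(record):
    """Lower-case and alias field names, dropping empty and duplicate fields."""
    normalized = {}
    for key, value in record.items():
        if value in (None, ""):
            continue  # unused field
        name = FIELD_ALIASES.get(key.lower(), key.lower())
        normalized.setdefault(name, value)  # first occurrence wins
    return normalized


def route(source_type):
    """Return destination labels; unknown sources default to P3 (assumption)."""
    priority = SOURCE_PRIORITY.get(source_type, 3)
    return DESTINATIONS[priority]


def encode(record):
    """Standardize on JSON and compress to reduce storage requirements."""
    return gzip.compress(json.dumps(record, sort_keys=True).encode("utf-8"))
```

For example, `route("cloudtrail")` returns the OpenSearch and S3 destinations, while `route("business_app")` returns only S3.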
As a future capability, on-demand, complex querying, analysis, and reporting on large historical datasets could be supported using Amazon OpenSearch Serverless. Also, OpenSearch Service supports zero-ETL integration with Amazon S3, where users can query their data stored in Amazon S3 using OpenSearch Service query capabilities.
The solution outlined in this post provides Zurich with an architecture that supports scalability, resilience, cost optimization, and flexibility. We discuss these key benefits in the following sections.
Scalability
Given the volume of data currently being ingested, Zurich needed a solution that could satisfy existing requirements and provide room for growth. In this section, we discuss how Amazon S3 and OpenSearch Service help Zurich achieve scalability.
Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. The total volume of data and number of objects you can store in Amazon S3 are virtually unlimited. Based on its unique architecture, Amazon S3 is designed to exceed 99.999999999% (11 nines) of data durability. Additionally, Amazon S3 stores data redundantly across a minimum of three Availability Zones (AZs) by default, providing built-in resilience against widespread disaster. For example, the S3 Standard storage class is designed for 99.99% availability. For more information, check out the Amazon S3 FAQs.
Zurich uses AWS Partner Cribl’s Stream solution to route copies of all log information to Amazon S3 for long-term storage and retention, enabling Zurich to decouple log storage from their SIEM solution, a common challenge facing SIEM solutions today.
OpenSearch Service is a managed service that makes it simple to run OpenSearch without having to manage the underlying infrastructure. Zurich’s current on-premises SIEM infrastructure comprises more than 100 servers, all of which have to be operated and maintained. Zurich hopes to reduce this infrastructure footprint by 75% by offloading priority 2 and 3 logs from their existing SIEM solution.
To support geographies with restrictions on cross-border data transfer and to meet availability requirements, AWS and Zurich worked together to define an Amazon OpenSearch Service configuration that would support 99.9% availability using multiple AZs in a single Region.
OpenSearch Service supports cross-Region and cross-cluster queries, which helps with distributing analysis and processing of logs without moving data, and provides the ability to aggregate information across clusters. Because Zurich plans to deploy multiple OpenSearch domains in different Regions, they will use cross-cluster search functionality to query data seamlessly across different regional domains without moving data. Zurich also configured a connector for their existing SIEM to query OpenSearch, which further allows distributed processing from on premises, and enables aggregation of data across data sources. As a result, Zurich is able to distribute processing, decouple storage, and publish key information in the form of alerts and queries to their SIEM solution without having to ship log data.
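As a rough illustration of cross-cluster search, a query addresses remote indexes by prefixing the index name with a connection alias, in the form `alias:index`. The sketch below only builds the request path; the alias names and the index pattern are hypothetical, and it assumes the cross-cluster connections have already been set up on the local domain.

```python
def cross_cluster_search_path(index_pattern, remote_aliases, include_local=True):
    """Build a _search path that spans the local domain and remote connections.

    remote_aliases are the connection aliases configured on the local domain
    (hypothetical names in the example below).
    """
    targets = [index_pattern] if include_local else []
    targets += [f"{alias}:{index_pattern}" for alias in remote_aliases]
    if not targets:
        raise ValueError("at least one local or remote target is required")
    return "/" + ",".join(targets) + "/_search"


# Hypothetical aliases for domains deployed in two other Regions.
path = cross_cluster_search_path("p2-logs-*", ["eu-domain", "us-domain"])
```

The resulting path can be used with any HTTP client that signs requests for the local domain; the log data itself stays in each regional domain.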
In addition, many of Zurich’s business units have logging requirements that can be satisfied using the same AWS services (OpenSearch Service, Amazon S3, AWS Glue, and Amazon Athena). As such, the AWS components of the architecture were templatized using Infrastructure as Code (IaC) for consistent, repeatable deployment. These components are already being used across Zurich’s business units.
Cost optimization
In thinking about optimizing costs, Zurich had to consider how they would continue to ingest 5 TB per day of security log information just for their centralized security logs. In addition, lines of business needed similar capabilities to meet requirements, which could include processing 500 GB per day.
With this solution, Zurich can control (by offloading P2 and P3 log sources) the portion of logs that are ingested into their primary SIEM solution. As a result, Zurich has a mechanism to manage licensing costs, as well as improve the efficiency of queries by reducing the amount of information the SIEM needs to parse on search.
Because copies of all log data are going to Amazon S3, Zurich is able to take advantage of the different Amazon S3 storage tiers, such as using S3 Intelligent-Tiering to automatically move data among Infrequent Access and Archive Access tiers, to optimize the cost of retaining multiple years’ worth of log data. When data is moved to the Infrequent Access tier, costs are reduced by up to 40%. Similarly, when data is moved to the Archive Instant Access tier, storage costs are reduced by up to 68%.
Refer to Amazon S3 pricing for current pricing, as well as for information by Region. Moving data to S3 Infrequent Access and Archive Access tiers provides a significant cost savings opportunity while meeting long-term retention requirements.
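To make the lifecycle approach concrete, the following sketch builds a lifecycle configuration that moves objects into S3 Intelligent-Tiering and expires them after a retention period. The dictionary follows the shape that boto3’s `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument; the rule ID, the `logs/` prefix, and the 400-day expiry are placeholder assumptions, not Zurich’s actual settings.

```python
def build_lifecycle_configuration(prefix, expire_after_days):
    """Return a lifecycle configuration for log objects under a prefix.

    Objects transition to S3 Intelligent-Tiering right away and are expired
    after expire_after_days. Both values are illustrative assumptions.
    """
    if expire_after_days <= 0:
        raise ValueError("expire_after_days must be positive")
    return {
        "Rules": [
            {
                "ID": "log-retention",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Placeholder prefix and retention period; real values depend on compliance needs.
config = build_lifecycle_configuration("logs/", 400)

# Applying it would look roughly like this (not executed here):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-log-bucket", LifecycleConfiguration=config)
```

Any expiration period would need to be at least as long as the retention enforced through S3 Object Lock.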
The team at Zurich analyzed priority 2 log sources, and based on historical analytics and query patterns, determined that only the most recent 7 days of logs are typically required. Therefore, OpenSearch Service was right-sized for retaining 7 days of logs in a hot tier. Rather than configuring UltraWarm and cold storage tiers for OpenSearch Service, copies of the remaining logs were simultaneously sent to Amazon S3 for long-term retention, where they could be queried using Athena.
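A back-of-the-envelope estimate shows how a 7-day hot tier translates into storage. The sketch below uses the rule of thumb from AWS sizing guidance for OpenSearch Service (source data × (1 + replicas) × 1.45); the daily P2 volume in the example is a made-up figure, because the post does not state how much of the 5 TB per day is priority 2.

```python
def hot_tier_storage_gb(daily_ingest_gb, retention_days, replicas=1,
                        overhead_factor=1.45):
    """Estimate minimum hot-tier storage for a rolling retention window.

    overhead_factor approximates indexing and system overhead; treat it and
    the replica count as assumptions to validate against a real workload.
    """
    if daily_ingest_gb < 0 or retention_days < 0 or replicas < 0:
        raise ValueError("inputs must be non-negative")
    source_data_gb = daily_ingest_gb * retention_days
    return source_data_gb * (1 + replicas) * overhead_factor


# Hypothetical: 1,000 GB of P2 logs per day, kept hot for 7 days, one replica.
estimate_gb = hot_tier_storage_gb(1000, 7)
```

Shortening the hot window is the main lever here: halving the retention days halves the estimate, while the long tail of data stays in Amazon S3.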
The combination of cost-optimization options is projected to reduce the cost per GB of log data ingested and stored for 13 months by 53% compared to the previous approach.
Flexibility
Another key consideration for the architecture was the flexibility to integrate with existing alerting systems and data pipelines, as well as the ability to incorporate new technology into Zurich’s log management approach. For example, Zurich also configured a connector for their existing SIEM to query OpenSearch, which further allows distributed processing from on premises and enables aggregation of data across data sources.
Within the OpenSearch Service software, there are options to expand log analysis using security analytics with predefined indicators of compromise across common log types. OpenSearch Service also offers the capability to integrate with ML capabilities such as anomaly detection and alert correlation to enhance log analysis.
With the introduction of Amazon Security Lake, there’s another opportunity to expand the solution to more efficiently manage AWS logging sources and add to this architecture. For example, you can use Amazon OpenSearch Ingestion to generate security insights on security data from Amazon Security Lake.
Summary
In this post, we reviewed how Zurich was able to build a log data management architecture that provided the scalability, flexibility, performance, and cost-optimization mechanisms needed to meet their requirements.
To learn more about components of this solution, visit the Centralized Logging with OpenSearch implementation guide, review Querying AWS service logs, or run through the SIEM on Amazon OpenSearch Service workshop.
About the Authors
Clarisa Tavolieri is a Software Engineering graduate with qualifications in Business, Audit, and Strategy Consulting. With an extensive career in the financial and tech industries, she specializes in data management and has been involved in initiatives ranging from reporting to data architecture. She currently serves as the Global Head of Cyber Data Management at Zurich Group. In her role, she leads the data strategy to support the protection of company assets and implements advanced analytics to enhance and monitor cybersecurity tools.
Austin Rappeport is a Computer Engineer who graduated from the University of Illinois Urbana/Champaign in 2011 with a focus in Computer Security. After graduation, he worked for the Federal Energy Regulatory Commission in the Office of Electric Reliability, working with the North American Electric Reliability Corporation’s Critical Infrastructure Protection Standards on both the audit and enforcement side, as well as standards development. Austin currently works for Zurich Insurance as the Global Head of Detection Engineering and Automation, where he leads the team responsible for using Zurich’s security tools to detect suspicious and malicious activity and improve internal processes through automation.
Samantha Gignac is a Global Security Architect at Zurich Insurance. She graduated from Ferris State University in 2014 with a Bachelor’s degree in Computer Systems & Network Engineering. With experience in the insurance, healthcare, and supply chain industries, she has held roles such as Storage Engineer, Risk Management Engineer, Vulnerability Management Engineer, and SOC Engineer. As a Cybersecurity Architect, she designs and implements secure network systems to protect organizational data and infrastructure from cyber threats.
Claire Sheridan is a Principal Solutions Architect with Amazon Web Services working with global financial services customers. She holds a PhD in Informatics and has more than 15 years of industry experience in tech. She loves traveling and visiting art galleries.
Jake Obi is a Principal Security Consultant with Amazon Web Services based in South Carolina, US, with over 20 years’ experience in information technology. He helps financial services customers improve their security posture in the cloud. Prior to joining Amazon, Jake was an Information Assurance Manager for the US Navy, where he worked on a large satellite communications program as well as hosting government websites using the public cloud.
Srikanth Daggumalli is an Analytics Specialist Solutions Architect in AWS. Out of 18 years of experience, he has over a decade of experience in architecting cost-effective, performant, and secure enterprise applications that improve customer reachability and experience, using big data, AI/ML, cloud, and security technologies. He has built high-performing data platforms for major financial institutions, enabling improved customer reach and exceptional experiences. He specializes in services like cross-border transactions and architecting robust analytics platforms.
Freddy Kasprzykowski is a Senior Security Consultant with Amazon Web Services based in Florida, US, with over 20 years’ experience in information technology. He helps customers adopt AWS services securely according to industry best practices, standards, and compliance regulations. He is a member of the Customer Incident Response Team (CIRT), helping customers during security events, a seasoned speaker at AWS re:Invent and AWS re:Inforce conferences, and a contributor to open source projects related to AWS security.