Amazon OpenSearch Service securely unlocks real-time search, monitoring, and analysis of business and operational data for use cases like application monitoring, log analytics, observability, and website search.
In this post, we examine the OR1 instance type, an OpenSearch-optimized instance introduced on November 29, 2023.
OR1 is an instance type for Amazon OpenSearch Service that provides a cost-effective way to store large amounts of data. A domain with OR1 instances uses Amazon Elastic Block Store (Amazon EBS) volumes for primary storage, with data copied synchronously to Amazon Simple Storage Service (Amazon S3) as it arrives. OR1 instances provide increased indexing throughput with high durability.
To learn more about OR1, see the introductory blog post.
While actively writing to an index, we recommend that you keep one replica. However, you can switch to zero replicas after a rollover, once the index is no longer being actively written.
This can be done safely because the data is persisted in Amazon S3 for durability.
Note that in case of a node failure and replacement, your data will be automatically restored from Amazon S3, but may be partially unavailable during the restore operation, so you shouldn't use this approach for cases where searches on non-actively-written indexes require high availability.
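For example, dropping the replica count on a rolled-over backing index is a single settings update (the index name below is a hypothetical placeholder, not one from our test):

```
PUT /.ds-logs-000001/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```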
Purpose
In this blog post, we'll explore how OR1 affects the performance of OpenSearch workloads.
By providing segment replication, OR1 instances save CPU cycles by indexing only on the primary shards. As a result, the nodes can index more data with the same amount of compute, or use fewer resources for indexing and thus have more available for search and other operations.
For this post, we're going to consider an indexing-heavy workload and do some performance testing.
Traditionally, Amazon Elastic Compute Cloud (Amazon EC2) R6g instances are a high-performance choice for indexing-heavy workloads, relying on Amazon EBS storage. Im4gn instances provide local NVMe SSDs for high-throughput, low-latency disk writes.
We will compare OR1 indexing performance against these two instance types, focusing on indexing performance only for the scope of this blog.
Setup
For our performance testing, we set up several components, as shown in the following figure:
For the testing process:
The index mapping, which is part of our initialization step, is as follows:
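The exact mapping isn't reproduced here; as a purely illustrative sketch (the template name, index patterns, and fields are placeholders, not the ones from our test), a data stream index template using `flat_object` could look like this:

```json
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "dynamic": false,
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" },
        "attributes": { "type": "flat_object" }
      }
    }
  }
}
```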
As you possibly can see, we’re utilizing a information stream to simplify the rollover configuration and maintain the utmost major shard measurement below 50 GiB, as per greatest practices.
We optimized the mapping to keep away from any pointless indexing exercise and use the flat_object subject kind to keep away from subject mapping explosion.
For reference, the Index State Management (ISM) policy we used is as follows:
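The original policy isn't shown here; a minimal sketch that rolls over on primary shard size (the policy ID and index patterns are placeholders) might be:

```json
PUT _plugins/_ism/policies/rollover-policy
{
  "policy": {
    "description": "Roll over when a primary shard reaches 50 GiB",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          { "rollover": { "min_primary_shard_size": "50gb" } }
        ],
        "transitions": []
      }
    ],
    "ism_template": [
      { "index_patterns": ["logs-*"], "priority": 100 }
    ]
  }
}
```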
Our average document size is 1.6 KiB and the bulk size is 4,000 documents per bulk request, which makes roughly 6.26 MiB per bulk (uncompressed).
Testing protocol
The protocol parameters are as follows:
- Number of data nodes: 6 or 12
- Job parallelism: 75, 40
- Primary shard count: 12, 48, 96 (for 12 nodes)
- Number of replicas: 1 (total of 2 copies)
- Instance types (each with 16 vCPUs):
  - or1.4xlarge.search
  - r6g.4xlarge.search
  - im4gn.4xlarge.search
| Cluster | Instance type | vCPU | RAM (GiB) | JVM heap (GiB) |
|---|---|---|---|---|
| or1-target | or1.4xlarge.search | 16 | 128 | 32 |
| im4gn-target | im4gn.4xlarge.search | 16 | 64 | 32 |
| r6g-target | r6g.4xlarge.search | 16 | 128 | 32 |
Note that the im4gn cluster has half the memory of the other two, but each environment still has the same JVM heap size of approximately 32 GiB.
Performance testing results
For the performance testing, we started with 75 parallel jobs and 750 batches of 4,000 documents per client (a total of 225 million documents). We then adjusted the number of shards, data nodes, replicas, and jobs.
Configuration 1: 6 data nodes, 12 primary shards, 1 replica
For this configuration, with 6 data nodes, 12 primary shards, and 1 replica, we observed the following performance:
| Cluster | CPU utilization | Time taken | Indexing rate | Throughput |
|---|---|---|---|---|
| or1-target | 65-80% | 24 min | 156 kdoc/s | 243 MiB/s |
| im4gn-target | 89-97% | 34 min | 110 kdoc/s | 172 MiB/s |
| r6g-target | 88-95% | 34 min | 110 kdoc/s | 172 MiB/s |
As highlighted in this table, the im4gn and r6g clusters show very high CPU utilization, triggering admission control, which rejects documents.
The OR1 cluster shows sustained CPU utilization below 80 percent, which is a good target.
Things to keep in mind:
- In production, don't forget to retry indexing with exponential backoff to avoid losing unindexed documents because of intermittent rejections.
- The bulk indexing operation returns 200 OK but can have partial failures. The body of the response must be checked to validate that all the documents were indexed successfully.
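To illustrate both points, here is a minimal Python sketch (not the harness used in this test; `send` stands in for whatever client call performs the HTTP `_bulk` request and returns the parsed JSON body):

```python
import time

def failed_docs(bulk_response):
    """Return the per-document failures from an OpenSearch _bulk response.

    Even when _bulk returns HTTP 200, individual documents may be
    rejected; the body flags them via "errors" and the "items" list.
    """
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response["items"]:
        # Each item is keyed by its action type, e.g. {"index": {...}}.
        result = next(iter(item.values()))
        if result.get("status", 200) >= 300:
            failures.append(result)
    return failures

def bulk_with_backoff(send, docs, max_retries=5, base_delay=1.0):
    """Retry a bulk request with exponential backoff on partial failures.

    Returns the failures of the last attempt ([] on success).
    """
    for attempt in range(max_retries):
        failures = failed_docs(send(docs))
        if not failures:
            return []
        time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        # A production client would resend only the failed documents.
    return failures
```

In production you would also cap the total retry time and add jitter to the delay so that many clients don't retry in lockstep.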
By reducing the number of parallel jobs from 75 to 40, while maintaining 750 batches of 4,000 documents per client (a total of 120 million documents), we get the following:
| Cluster | CPU utilization | Time taken | Indexing rate | Throughput |
|---|---|---|---|---|
| or1-target | 25-60% | 20 min | 100 kdoc/s | 156 MiB/s |
| im4gn-target | 75-93% | 19 min | 105 kdoc/s | 164 MiB/s |
| r6g-target | 77-90% | 20 min | 100 kdoc/s | 156 MiB/s |
The throughput and CPU utilization decreased, but the CPU remains high on Im4gn and R6g, while OR1 shows more CPU capacity to spare.
Configuration 2: 6 data nodes, 48 primary shards, 1 replica
For this configuration, we increased the number of primary shards from 12 to 48, which provides more parallelism for indexing:
| Cluster | CPU utilization | Time taken | Indexing rate | Throughput |
|---|---|---|---|---|
| or1-target | 60-80% | 21 min | 178 kdoc/s | 278 MiB/s |
| im4gn-target | 67-95% | 34 min | 110 kdoc/s | 172 MiB/s |
| r6g-target | 70-88% | 37 min | 101 kdoc/s | 158 MiB/s |
The indexing throughput increased for OR1, but Im4gn and R6g didn't see an improvement because their CPU utilization is still very high.
Reducing the parallel jobs to 40 while keeping 48 primary shards, we can see that OR1 comes under a little more pressure, as its minimum CPU utilization increases compared to the 12-primary-shard run, and the CPU for R6g looks much better. For Im4gn, however, the CPU is still high.
| Cluster | CPU utilization | Time taken | Indexing rate | Throughput |
|---|---|---|---|---|
| or1-target | 40-60% | 16 min | 125 kdoc/s | 195 MiB/s |
| im4gn-target | 80-94% | 18 min | 111 kdoc/s | 173 MiB/s |
| r6g-target | 70-80% | 21 min | 95 kdoc/s | 148 MiB/s |
Configuration 3: 12 data nodes, 96 primary shards, 1 replica
For this configuration, we started from the original configuration and added more compute capacity, moving from 6 nodes to 12 and increasing the number of primary shards to 96.
| Cluster | CPU utilization | Time taken | Indexing rate | Throughput |
|---|---|---|---|---|
| or1-target | 40-60% | 18 min | 208 kdoc/s | 325 MiB/s |
| im4gn-target | 74-90% | 20 min | 187 kdoc/s | 293 MiB/s |
| r6g-target | 60-78% | 24 min | 156 kdoc/s | 244 MiB/s |
OR1 and R6g perform well with CPU utilization below 80 percent, with OR1 delivering 33 percent better performance at 30 percent lower CPU utilization compared to R6g.
Im4gn is still at 90 percent CPU, but its performance is also very good.
Reducing the number of parallel jobs from 75 to 40, we get:
| Cluster | CPU utilization | Time taken | Indexing rate | Throughput |
|---|---|---|---|---|
| or1-target | 40-60% | 11 min | 182 kdoc/s | 284 MiB/s |
| im4gn-target | 70-90% | 11 min | 182 kdoc/s | 284 MiB/s |
| r6g-target | 60-77% | 12 min | 167 kdoc/s | 260 MiB/s |
Reducing the number of parallel jobs from 75 to 40 brought the OR1 and Im4gn instances on par, with R6g very close.
Interpretation
OR1 instances speed up indexing because only the primary shards need to be written, while replicas are produced by copying segments. While being more performant than Im4gn and R6g instances, OR1 also shows lower CPU utilization, which leaves room for additional load (such as search) or for reducing the cluster size.
We can compare a 6-node OR1 cluster with 48 primary shards, indexing at 178 thousand documents per second, to a 12-node Im4gn cluster with 96 primary shards, indexing at 187 thousand documents per second, or to a 12-node R6g cluster with 96 primary shards, indexing at 156 thousand documents per second.
The OR1 cluster performs almost as well as the larger Im4gn cluster, and better than the larger R6g cluster.
How to size when using OR1 instances
As you can see in the results, OR1 instances can process more data at higher throughput rates. However, when increasing the number of primary shards, they don't perform as well because of the remote-backed storage.
To get the best throughput from the OR1 instance type, you can use larger batch sizes than usual and use an Index State Management (ISM) policy to roll over your index based on size, in order to effectively limit the number of primary shards per index. You can also increase the number of connections, because the OR1 instance type can handle more parallelism.
For search, OR1 doesn't directly affect search performance. However, as you can see, CPU utilization is lower on OR1 instances than on Im4gn and R6g instances. That enables either more activity (search and ingest), or the opportunity to reduce the instance size or count, which can result in a cost reduction.
Conclusion and recommendations for OR1
The new OR1 instance type gives you more indexing power than the other instance types. This is important for indexing-heavy workloads, where you index in daily batches or have a high sustained throughput.
The OR1 instance type also enables cost reduction, because its price for performance is 30 percent better than existing instance types. When adding more than one replica, the price for performance improves further, because the CPU is barely impacted on an OR1 instance, whereas other instance types would see their indexing throughput decrease.
Check out the complete instructions for optimizing your workload for indexing in this re:Post article.
About the author
Cédric Pelvet is a Principal AWS Specialist Solutions Architect. He helps customers design scalable solutions for real-time data and search workloads. In his free time, he enjoys learning new languages and practicing the violin.