Today, AWS announced that Amazon Kinesis Data Streams now supports record sizes of up to 10MiB, a tenfold increase from the previous limit. With this launch, you can now publish intermittent larger data payloads to your data streams while continuing to use the existing Kinesis Data Streams APIs in your applications without additional effort. This launch is accompanied by a 2x increase in the maximum PutRecords request size from 5MiB to 10MiB, simplifying data pipelines and reducing operational overhead for IoT analytics, change data capture, and generative AI workloads.
In this post, we explore Amazon Kinesis Data Streams large record support, including key use cases, configuration of the maximum record size, throttling considerations, and best practices for optimal performance.
Real-world use cases
As data volumes grow and use cases evolve, we have seen increasing demand for supporting larger record sizes in streaming workloads. Previously, if you needed to process records larger than 1MiB, you had two options:
- Split large records into multiple smaller records in producer applications and reassemble them in consumer applications
- Store large records in Amazon Simple Storage Service (Amazon S3) and send only metadata through Kinesis Data Streams
Both approaches work, but they add complexity to data pipelines, requiring additional code, increasing operational overhead, and complicating error handling and debugging, particularly when customers need to stream large records intermittently.
This enhancement improves ease of use and reduces operational overhead for customers handling intermittent data payloads across various industries and use cases. In the IoT analytics space, connected vehicles and industrial equipment generate increasing volumes of sensor telemetry data, with the size of individual telemetry records sometimes exceeding the previous 1MiB limit in Kinesis. This required customers to implement complex workarounds, such as splitting large records into multiple smaller ones or storing the large records separately and only sending metadata through Kinesis. Similarly, database change data capture (CDC) pipelines can produce large transaction records, especially during bulk operations or schema changes. In the machine learning and generative AI space, workflows increasingly require the ingestion of larger payloads to support richer feature sets and multi-modal data types like audio and images. The increased Kinesis record size limit from 1MiB to 10MiB reduces the need for these kinds of complex workarounds, simplifying data pipelines and reducing operational overhead for customers in IoT, CDC, and advanced analytics use cases. Customers can now more easily ingest and process these intermittent large data records using the same familiar Kinesis APIs.
How it works
To start processing larger records:
- Update your stream's maximum record size limit (maxRecordSize) through the AWS Console, AWS CLI, or AWS SDKs.
- Continue using the same PutRecord and PutRecords APIs for producers.
- Continue using the same GetRecords or SubscribeToShard APIs for consumers.
Your stream will be in Updating status for a few seconds before it is ready to ingest larger records.
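For example, once the stream's maximum record size has been raised, a producer can publish a payload larger than 1MiB with the same PutRecord call it already uses. The following is a minimal sketch using the AWS SDK for Python (Boto3); the stream name and payload are placeholders.

```python
import boto3

# Hypothetical stream name; assumes the stream's maximum record size
# has already been raised above 1MiB (see "Getting started" below).
STREAM_NAME = "large-record-stream"

kinesis = boto3.client("kinesis")

# A ~2MiB payload, for example a large CDC transaction or a telemetry batch.
payload = b"x" * (2 * 1024 * 1024)

# Same PutRecord API as before; no new parameters are needed on the
# producer side once the stream-level limit is raised.
response = kinesis.put_record(
    StreamName=STREAM_NAME,
    Data=payload,
    PartitionKey="device-42",
)
print(response["ShardId"], response["SequenceNumber"])
```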
Getting started
To start processing larger records with Kinesis Data Streams, you can update the maximum record size by using the AWS Management Console, AWS CLI, or AWS SDK.
On the AWS Management Console:
- Navigate to the Kinesis Data Streams console.
- Choose your stream and select the Configuration tab.
- Choose Edit (next to Maximum record size).
- Set your desired maximum record size (up to 10MiB).
- Save your changes.
Note: This setting only adjusts the maximum record size for this Kinesis data stream. Before increasing this limit, verify that all downstream applications can handle larger records.
Most common consumers, such as the Kinesis Client Library (starting with version 2.x), Amazon Data Firehose delivery to Amazon S3, and AWS Lambda, support processing records larger than 1 MiB. To learn more, refer to the Amazon Kinesis Data Streams documentation for large records.
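As an illustration of the consumer side, an AWS Lambda function subscribed to the stream receives larger records through the same event structure it already uses. The following is a minimal handler sketch, assuming the event source mapping is already configured.

```python
import base64

def handler(event, context):
    """Minimal Kinesis-triggered Lambda handler; record data arrives base64-encoded."""
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        # Process the payload; it may now be larger than 1MiB.
        print(record["kinesis"]["partitionKey"], len(payload))
    # Assumes the ReportBatchItemFailures setting is enabled on the event source mapping.
    return {"batchItemFailures": []}
```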
You can also update this setting using the AWS CLI:
Or using the AWS SDK:
Throttling and best practices for optimal performance
Individual shard throughput limits of 1MiB/s for writes and 2MiB/s for reads remain unchanged with support for larger record sizes. To work with large records, let's understand how throttling works. In a stream, each shard has a throughput capacity of 1 MiB per second. To accommodate large records, each shard briefly bursts up to 10MiB/s, eventually averaging out to 1MiB per second. To help visualize this behavior, think of each shard as having a capacity tank that refills at 1MiB per second. After sending a large record (for example, a 10MiB record), the tank starts refilling immediately, allowing you to send smaller records as capacity becomes available. This capacity to support large records is continuously refilled into the stream. The rate of refilling depends on the size of the large records, the size of the baseline records, the overall traffic pattern, and your chosen partition key strategy. When you process large records, each shard continues to process baseline traffic while leveraging its burst capacity to handle these larger payloads.
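To make the capacity tank analogy concrete, the following sketch models a single shard's write capacity as a simple token bucket that holds 10MiB and refills at 1MiB per second. This is only a simplified illustration of the behavior described above, not the service's actual throttling implementation.

```python
# Simplified token-bucket model of one shard's write capacity.
# Illustration only: the bucket size and refill rate are taken from the
# 10MiB burst / 1MiB-per-second figures described above.
BUCKET_CAPACITY_MIB = 10.0   # burst headroom per shard
REFILL_RATE_MIB_PER_S = 1.0  # steady-state write limit per shard

def simulate(writes, capacity=BUCKET_CAPACITY_MIB, refill=REFILL_RATE_MIB_PER_S):
    """writes: list of (timestamp_seconds, size_mib) in ascending time order.
    Returns the timestamps that would be throttled in this simplified model."""
    level = capacity
    last_t = 0.0
    throttled = []
    for t, size in writes:
        level = min(capacity, level + (t - last_t) * refill)  # refill since last write
        last_t = t
        if size <= level:
            level -= size          # accepted: drain the bucket
        else:
            throttled.append(t)    # rejected: not enough accumulated capacity
    return throttled

# A 10MiB record at t=0 drains the bucket; a 2MiB record at t=1 is throttled,
# but the same 2MiB record at t=2 succeeds because ~2MiB has refilled.
print(simulate([(0, 10), (1, 2)]))  # -> [1.0] wait... -> [1]
print(simulate([(0, 10), (2, 2)]))  # -> []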
To illustrate how Kinesis Data Streams handles different proportions of large records, let's examine the results of a simple test. For our test configuration, we set up a producer that sends data to an on-demand stream (which defaults to four shards) at a rate of 50 records per second. The baseline records are 10KiB in size, while large records are 2MiB each. We ran multiple test cases, progressively increasing the proportion of large records from 1% to 5% of the total stream traffic, along with a baseline case containing no large records. To ensure consistent testing conditions, we distributed the large records uniformly over time; for example, in the 1% scenario, we sent one large record for every 100 baseline records. The following graph shows the results:

In the graph, horizontal annotations indicate throttling occurrence peaks. The baseline scenario, represented by the blue line, shows minimal throttling events. As the proportion of large records increases from 1% to 5%, we observe an increase in the rate at which the stream throttles your data, with a notable acceleration in throttling events between the 2% and 5% scenarios. This test demonstrates how Kinesis Data Streams manages increasing proportions of large records.
We recommend keeping large records at 1-2% of your total record count for optimal performance. In production environments, actual stream behavior varies based on three key factors: the size of baseline records, the size of large records, and the frequency at which large records appear in the stream. We recommend that you test with your own demand pattern to determine the specific behavior.
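As one way to run such a test, the following sketch generates a mixed workload of baseline and large records using Boto3. The stream name, record sizes, and 1% ratio mirror the test described above and are assumptions to adapt to your own workload.

```python
import time
import boto3

# Assumed values mirroring the test above; adjust to your own workload.
STREAM_NAME = "large-record-test-stream"   # hypothetical stream name
BASELINE_SIZE = 10 * 1024                  # 10KiB baseline records
LARGE_SIZE = 2 * 1024 * 1024               # 2MiB large records
LARGE_EVERY_N = 100                        # roughly 1% large records
RECORDS_PER_SECOND = 50

kinesis = boto3.client("kinesis")

def run(duration_seconds=60):
    sent = 0
    start = time.time()
    while time.time() - start < duration_seconds:
        size = LARGE_SIZE if sent % LARGE_EVERY_N == 0 else BASELINE_SIZE
        try:
            kinesis.put_record(
                StreamName=STREAM_NAME,
                Data=b"x" * size,
                PartitionKey=str(sent),   # spread records across shards
            )
        except kinesis.exceptions.ProvisionedThroughputExceededException:
            print(f"throttled at record {sent}")
        sent += 1
        time.sleep(1 / RECORDS_PER_SECOND)

run()
```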
With on-demand streams, when the incoming traffic exceeds 500 KB/s per shard, Kinesis splits the shard within 15 minutes. The parent shard's hash key values are redistributed evenly across the child shards. Kinesis automatically scales the stream to increase the number of shards, enabling distribution of large records across a larger number of shards depending on the partition key strategy employed.
For optimal performance with large records:
- Use a random partition key strategy to distribute large records evenly across shards.
- Implement backoff and retry logic in producer applications (see the sketch after this list).
- Monitor shard-level metrics to identify potential bottlenecks.
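The following sketch combines the first two recommendations, assuming Boto3 and a placeholder stream name: it assigns a random partition key to each record and retries throttled PutRecord calls with exponential backoff and jitter.

```python
import random
import time
import uuid

import boto3

kinesis = boto3.client("kinesis")

def put_with_backoff(stream_name, data, max_attempts=5):
    """Send one record with a random partition key, retrying on throttling."""
    for attempt in range(max_attempts):
        try:
            return kinesis.put_record(
                StreamName=stream_name,
                Data=data,
                PartitionKey=str(uuid.uuid4()),  # random key spreads load across shards
            )
        except kinesis.exceptions.ProvisionedThroughputExceededException:
            # Exponential backoff with jitter before the next attempt.
            time.sleep(min(2 ** attempt, 10) * random.random())
    raise RuntimeError("record still throttled after retries")

# Hypothetical usage with a large payload on a stream whose maximum record size was raised.
put_with_backoff("large-record-stream", b"x" * (5 * 1024 * 1024))
```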
If you still need to continuously stream large records, consider using Amazon S3 to store the payloads and send only metadata references to the stream. Refer to Processing large records with Amazon Kinesis Data Streams for more information.
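For reference, here is a minimal sketch of that pattern using Boto3 (the bucket and stream names are placeholders): the payload is stored in Amazon S3 and only a small JSON pointer is sent through the stream.

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

# Hypothetical bucket and stream names.
BUCKET = "my-large-payload-bucket"
STREAM_NAME = "pointer-stream"

def send_large_payload(payload: bytes, partition_key: str):
    # Store the large payload in S3...
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    # ...and send only a small metadata reference through Kinesis.
    pointer = {"bucket": BUCKET, "key": key, "size": len(payload)}
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(pointer).encode("utf-8"),
        PartitionKey=partition_key,
    )

send_large_payload(b"x" * (50 * 1024 * 1024), "device-42")
```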
Conclusion
Amazon Kinesis Data Streams now supports record sizes of up to 10MiB, a tenfold increase from the previous 1MiB limit. This enhancement simplifies data pipelines for IoT analytics, change data capture, and AI/ML workloads by eliminating the need for complex workarounds. You can continue using existing Kinesis Data Streams APIs without additional code changes and benefit from increased flexibility in handling intermittent large payloads.
- For optimal performance, we recommend keeping large records at 1-2% of the total record count.
- For best results with large records, implement a uniformly distributed partition key strategy to spread records evenly across shards, include backoff and retry logic in producer applications, and monitor shard-level metrics to identify potential bottlenecks.
- Before increasing the maximum record size, verify that all downstream applications and consumers can handle larger records.
We are excited to see how you will leverage this capability to build more powerful and efficient streaming applications. To learn more, visit the Amazon Kinesis Data Streams documentation.
