Passionate sports viewers expect to easily discover and access sports events and their favorite teams, leagues, and players. Providing a robust and intuitive search experience is crucial for the success of Prime Video Sports. With a vast, rapidly growing catalog of live and on-demand sports offerings, a well-designed search architecture allows Prime Video Sports to cater to this engaged audience, streamlining navigation and reducing friction in the user experience. The Prime Video search experience is one of the most clicked-on elements in the global navigation bar. Search enables highly relevant recommendations and drives increased viewership and engagement. By prioritizing a seamless search experience that caters to the needs of sports fans, Prime Video has enhanced the overall customer experience, fostering trust and loyalty that contribute to the platform’s long-term growth and success. In this post, we’ll walk you through how Prime Video used Amazon OpenSearch Service and its AI and machine learning (AI/ML) capabilities to build a more intuitive and enhanced sports search experience.
Challenges
The Prime Video search experience was originally designed to help customers discover trending movies and TV shows that carry strong stats such as ratings and viewership. As Prime Video began to acquire sports rights, it needed to rethink this approach, which was centered primarily on TV shows and movies, to understand customers’ intent and surface the right content. The approach didn’t work as well for live sports because sports content is more temporal and seasonal, which makes every title a cold start. For example, a search for “football live” surfaced documentaries such as “This is Football: Season 1” and “Ronaldo VS Messi – Face Off!” rather than live football matches. While these entertainment options are perfectly fine on their own, they didn’t fulfill the customers’ goal of finding and watching live or upcoming games for their favorite sports. This disconnect between search queries and relevant results created challenges for customers trying to access the sports content they wanted. By surfacing these relevant sports events in search results, Prime Video enhanced the customer experience, helping customers discover the full breadth of sports coverage available on Prime Video and find their favorite sports events. To address these issues and better serve the needs of sports fans, in 2024 Prime Video enhanced its sports-specific search capabilities, incorporating deeper sports understanding and using state-of-the-art search techniques to create an improved, intelligent search system.
Solution overview
In 2024, Prime Video Sports Search delivered the first version of an enhanced sports search functionality, powered by a two-layer solution comprising coarse retrieval using semantic search and binary search relevance classification. Semantic search is a way of searching for information that goes beyond matching keywords. It matches queries to data (sports events in this case) based on vector embeddings, which capture the meaning of words, phrases, and sentences. The vectors can have n dimensions; when mapped into an n-dimensional space, data that is close in semantic meaning (not a direct text match) will be close to each other in the space, as shown in the following diagram of a two-dimensional vector space of sports matches (in yellow) and search queries (in green).
The foundation of using vector search for sports is the creation of vector embeddings for each sports event present in the Prime Video Sports Catalog. As event data is ingested, textual information including title, sport, team names, leagues, and other event details is used to generate a unique vector representation for each sports event. This allows the system to capture the semantic meaning of, and relationships between, different events, including the abbreviations, nicknames, and so on that customers often use to search. When a customer searches for something related to sports, their query is also converted into a vector. The system then performs a k-nearest neighbor (KNN) search, comparing the customer’s query vector to the vectors of the sports events in the catalog. The events with vectors closest to the query vector are identified as the most relevant matches, even if the searched terms weren’t directly indexed. For example, Thursday Night Football events might be indexed without the abbreviation tnf, yet these games will be returned by semantic search if a customer searches using “tnf” as their query.
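To make the nearest-neighbor idea concrete, the following is a minimal sketch of an exact KNN lookup using cosine similarity. The two-dimensional vectors, event titles, and the query embedding are made up for illustration; the production system uses a custom embedding model and an OpenSearch Service KNN index rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_search(query_vector, event_vectors, k):
    """Return the titles of the k events whose vectors are closest to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in event_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]


# Hypothetical toy embeddings; real embeddings have many more dimensions.
EVENT_VECTORS = {
    "Thursday Night Football: Team A vs Team B": [0.9, 0.1],
    "Thursday Night Football: Team C vs Team D": [0.8, 0.2],
    "Tennis Open: Final": [0.1, 0.9],
}

# Pretend this is the embedding the model produces for the query "tnf".
QUERY_VECTOR = [0.85, 0.15]

top_matches = knn_search(QUERY_VECTOR, EVENT_VECTORS, k=2)
print(top_matches)
```

Neither event title contains the string “tnf”; the match comes entirely from the vectors being close together, which is the property the embedding model is trained to provide.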
The following figure shows a high-level indexing and query flow for a KNN vector search.
Finding the nearest vectors isn’t enough. The system also runs each of these potentially relevant events through a custom binary relevance classification machine learning (ML) model, trained in-house. This allows the system to filter out events that might be only tangentially related to the original search, leaving behind a refined list of the most pertinent and relevant results for the customer.
Finally, these highly relevant events are ranked and surfaced to the customer, with factors like the event’s current live status and upcoming schedule playing a key role in determining the optimal order in which to display the results. This combined use of vector semantic search and relevance classification enables Prime Video to provide customers with a sports search experience that accurately surfaces the content they’re looking for, significantly improving their ability to discover and access the live, upcoming, and recently ended games that they’re most interested in.
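The filter-then-rank stage described above can be sketched as follows. This is only an illustration: the `Candidate` type, the 0.5 threshold, and the live, upcoming, ended ordering are assumptions made for the example, not details of the in-house classifier or the production ranking logic.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    relevance: float  # hypothetical classifier probability in [0, 1]
    status: str       # "live", "upcoming", or "ended"


# Assumed display priority: live first, then upcoming, then recently ended.
STATUS_PRIORITY = {"live": 0, "upcoming": 1, "ended": 2}


def filter_and_rank(candidates, threshold=0.5):
    """Drop candidates the classifier scores as irrelevant, then order the rest."""
    relevant = [c for c in candidates if c.relevance >= threshold]
    return sorted(
        relevant,
        key=lambda c: (STATUS_PRIORITY[c.status], -c.relevance),
    )


# Hypothetical output of the coarse KNN retrieval step.
retrieved = [
    Candidate("Team A vs Team B", 0.92, "upcoming"),
    Candidate("Team C vs Team D", 0.81, "live"),
    Candidate("Football documentary", 0.20, "ended"),
    Candidate("Team E vs Team F", 0.75, "ended"),
]

ranked_titles = [c.title for c in filter_and_rank(retrieved)]
print(ranked_titles)
```

Keeping retrieval, classification, and ordering as separate steps means the threshold or the status priority can be tuned without touching the vector index.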
Process
The vector semantic search implementation we developed consists of two main components: a KNN search index and an endpoint to invoke the text embedding model. To host these components, we used AWS services: the custom text embedding model was deployed on Amazon SageMaker, and the KNN index was created using OpenSearch Service and hosted on a managed cluster consisting of more than 50 data nodes.
Both of these components are designed to handle real-time customer traffic at a scale of thousands of requests per second. We simplified our system’s application layer by using ready-to-use features available in AWS. The Amazon OpenSearch Ingestion pipeline enabled a seamless, code-free integration, allowing us to write sports data from an Amazon DynamoDB table directly into the OpenSearch Service index and eliminating the need for traditional extract, transform, and load (ETL) processes. Additionally, we used the Neural Search feature of OpenSearch Service instead of directly integrating our application layer with SageMaker for text-to-vector conversion. This approach performs the text-to-vector transformation inside OpenSearch Service, enabling vector search during both the ingestion and search phases. The Neural Search plugin of OpenSearch Service communicates directly with a text embedding model deployed on SageMaker as a real-time inference endpoint using ML connectors.
This architecture, illustrated in the following figure, enabled us to build a scalable and efficient vector search solution, taking advantage of the strengths of various AWS services to simplify the implementation and improve performance.
OpenSearch Ingestion: No-ETL data transfer from DynamoDB to an OpenSearch Service index
Before the sports data is indexed in OpenSearch Service, it is first stored in a DynamoDB table. This storage layer allows us to maintain a database of all sports events and the metadata required to enable search. It acts as a source of truth for sports data that isn’t impacted by the evolution of customer use cases and their respective implementations.
To seamlessly transfer this data from DynamoDB to the OpenSearch Service index, we used an OpenSearch Ingestion pipeline. This allowed us to set up real-time data transfer with a zero-ETL integration, abstracting the data indexing away from the application layer. The OpenSearch Ingestion pipeline configuration enables us to specify a schema mapping between the DynamoDB table and the expected document schema in OpenSearch Service. This configuration also allows us to perform data formatting operations on specific fields and configure a dead-letter queue (DLQ) if needed. The steps to set up an OpenSearch Ingestion pipeline can be found in this blog post.
Embedding model setup on SageMaker
At the core of our vector search implementation is the text-embedding model, which plays a crucial role in capturing the semantic meaning of sports-related data. The Sports Search Science team developed this text-embedding model and deployed it on SageMaker as a real-time inference endpoint using the AWS Cloud Development Kit (AWS CDK).
The process of creating the SageMaker endpoint requires two key artifacts:
With these two components in place, we used the AWS CDK to programmatically provision the SageMaker endpoint, ensuring a seamless and consistent deployment of the text-embedding model. By using the capabilities of AWS services such as SageMaker, Amazon ECR, and Amazon S3, we were able to build a scalable and efficient text-embedding model infrastructure to power the vector search solution.
ML connectors
To facilitate access to machine learning models hosted on platforms such as SageMaker or Amazon Bedrock, OpenSearch Service provides ML connectors. These connectors enable direct integration between OpenSearch Service and external machine learning models.
In our case, the ML connector allows OpenSearch Service to directly invoke the SageMaker endpoint where our custom text-embedding model is deployed. This built-in integration between OpenSearch Service and the SageMaker-hosted model simplifies the overall architecture and eliminates the need for the application layer to manage the communication between these two components.
By using the ML connectors provided by the OpenSearch Service ML plugin, we were able to seamlessly integrate our text-embedding model, which is hosted on SageMaker, into the OpenSearch-powered vector search solution. This integration streamlines the data ingestion and querying pipeline, making the implementation simpler and more intuitive.
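As a rough sketch of what a connector definition looks like, the following Python builds the JSON bodies for the connector create request (`POST /_plugins/_ml/connectors/_create`) and the model register request (`POST /_plugins/_ml/models/_register`). The connector name, Region, endpoint name, and role ARN are placeholders, and the `request_body` template and default embedding pre- and post-processing functions are assumptions that depend on the input and output format of the deployed model; this is not the configuration used by Prime Video.

```python
import json


def build_sagemaker_connector(region, endpoint_name, role_arn):
    """Build the body for POST /_plugins/_ml/connectors/_create."""
    invoke_url = (
        f"https://runtime.sagemaker.{region}.amazonaws.com"
        f"/endpoints/{endpoint_name}/invocations"
    )
    return {
        "name": "sports-text-embedding-connector",  # hypothetical name
        "description": "Connector to a SageMaker-hosted text embedding model",
        "version": 1,
        "protocol": "aws_sigv4",
        "credential": {"roleArn": role_arn},
        "parameters": {"region": region, "service_name": "sagemaker"},
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                "headers": {"content-type": "application/json"},
                "url": invoke_url,
                # Assumption: the model accepts the raw input list as its payload.
                "request_body": "${parameters.input}",
                "pre_process_function": "connector.pre_process.default.embedding",
                "post_process_function": "connector.post_process.default.embedding",
            }
        ],
    }


def build_register_model(connector_id):
    """Build the body for POST /_plugins/_ml/models/_register."""
    return {
        "name": "sports-text-embedding-model",  # hypothetical name
        "function_name": "remote",
        "description": "Remote text embedding model hosted on SageMaker",
        "connector_id": connector_id,
    }


# Placeholder values for illustration only.
connector_body = build_sagemaker_connector(
    region="us-east-1",
    endpoint_name="example-embedding-endpoint",
    role_arn="arn:aws:iam::123456789012:role/example-opensearch-sagemaker-role",
)
register_body = build_register_model(connector_id="example-connector-id")
print(json.dumps(connector_body, indent=2))
```

Registering and deploying the model returns the model ID that the ingest pipeline and neural queries reference.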
Neural search
To simplify the application layer of our vector search solution, we used the Neural Search capabilities provided by OpenSearch Service. This feature allows us to send only the text data to the index, without the need to explicitly manage vector embedding generation and indexing. Using neural search helped simplify the application layer of the system by abstracting the generation and management of the vectors required to perform a KNN search. During ingestion, neural search transforms document text into vector embeddings and indexes both the text and its vector embeddings in a vector index. When you use a neural query during search, neural search converts the query text into vector embeddings, uses vector search to compare the query and sports event embeddings, and returns the closest results. This removes the need to integrate with SageMaker in the application layer to generate vector embeddings during ingestion and search.
The process of setting up a neural search index with a SageMaker-hosted inference endpoint involves the following steps:
- Create an ML connector and register your model in OpenSearch Service: This step generates a model ID that you’ll need in the subsequent neural index setup.
- Create a neural ingest pipeline: An ingest pipeline is a sequence of processors that are applied to documents as they’re ingested into an index. To enable neural search, you can define the text_embedding processor in the pipeline. This processor converts the text in a document field to vector embeddings, and the field_map configuration determines the input and output fields for this process.
- Create the neural search index: To use the text embedding processor defined in the ingest pipeline, you can create a KNN index and specify the pipeline created in the previous step as the default pipeline.
- Run a neural query: To verify your neural search setup, run a neural query by providing a search text and review the results.
By following these steps, you can set up a neural search index in OpenSearch Service and run a neural query. The neural query performs the KNN vector search internally while requiring only text input during both indexing and querying. This simplifies the application layer and uses the built-in vector embedding generation and indexing capabilities provided by the OpenSearch Service Neural Search feature.
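The ingest pipeline, index, and query steps can be sketched as the following request bodies, built here as Python dictionaries. The pipeline name, field names (`event_text`, `event_embedding`), the dimension of 384, and the model ID are illustrative assumptions; the dimension must match the output size of whichever embedding model is registered.

```python
import json


def build_ingest_pipeline(model_id, text_field, vector_field):
    """Body for PUT /_ingest/pipeline/<pipeline-name>."""
    return {
        "description": "Generate embeddings for sports event text at ingest time",
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    # field_map: input text field -> output vector field
                    "field_map": {text_field: vector_field},
                }
            }
        ],
    }


def build_knn_index(pipeline_name, text_field, vector_field, dimension):
    """Body for PUT /<index-name>, using the pipeline as the default pipeline."""
    return {
        "settings": {"index.knn": True, "default_pipeline": pipeline_name},
        "mappings": {
            "properties": {
                text_field: {"type": "text"},
                vector_field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def build_neural_query(vector_field, query_text, model_id, k):
    """Body for GET /<index-name>/_search; only text is supplied by the caller."""
    return {
        "query": {
            "neural": {
                vector_field: {
                    "query_text": query_text,
                    "model_id": model_id,
                    "k": k,
                }
            }
        }
    }


# Placeholder identifiers for illustration only.
MODEL_ID = "example-model-id"
pipeline_body = build_ingest_pipeline(MODEL_ID, "event_text", "event_embedding")
index_body = build_knn_index("sports-neural-pipeline", "event_text", "event_embedding", 384)
query_body = build_neural_query("event_embedding", "tnf", MODEL_ID, k=10)
print(json.dumps(query_body, indent=2))
```

Note that the application only ever supplies text (`event_text` at ingest, `query_text` at search); the vectors are produced inside OpenSearch Service through the registered model.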
Results
The initial launch of this architecture for sports search had a measurably positive impact on customer experience. We observed a statistically significant increase in search-attributed conversions, including streams, purchases, and subscriptions. Offline analysis of the results delivered to customers indicated an improvement in the precision of search results and a reduction in the irrelevance rate of the content shown.
Additionally, we observed that customers engaged with the search feature more frequently, because it was now surfacing results that aligned much more closely with what they were looking for. This increased engagement led to better discovery of relevant titles on the Prime Video service, including titles that had received little engagement prior to the changes.
Overall, the data clearly demonstrated that by tailoring the search experience to the specific needs of sports fans, we significantly improved their ability to find and access desired content. By creating a smarter search system that better understands sports intent, we’ve driven more meaningful customer activity and increased conversions directly from search interactions.
Conclusion
By using the innovative AI/ML capabilities of Amazon OpenSearch Service, Prime Video was able to create a cutting-edge search experience that effectively addressed the unique challenges presented by highly dynamic, high-volume sports content. In addition, by overcoming the hurdles that come with such large scale, Prime Video Sports Search was able to contribute valuable improvements and enhancements back to the OpenSearch open source community. These contributions help pave the way for other developers to more readily use the advanced AI/ML features that OpenSearch Service offers.
This collaboration between Prime Video Sports Search and OpenSearch Service has resulted in a best-in-class search capability that can seamlessly accommodate the unique requirements of live sports content. It’s a partnership that has allowed the products to grow and innovate in tandem, to the benefit of customers seeking exceptional search and discovery experiences.
If you want to build a search experience that understands user intent beyond keyword matching, try semantic search with OpenSearch Service and its AI/ML capabilities. If you have any questions, leave a comment below.
About the authors
Radhika Chandak is a Software Development Engineer at Amazon Prime Video, where she has been working for the past 3 years. Her focus is on developing high-velocity customer experiences, with a particular emphasis on building state-of-the-art search experiences for sports content. Radhika is passionate about creating solutions that solve customer problems and delight users. Her expertise lies in crafting innovative approaches to enhance the Prime Video Sports platform, ensuring seamless and engaging experiences for sports enthusiasts.
Anna Chalupowicz is a Software Development Manager at Amazon Prime Video Sports, with 6 years of diverse experience within Amazon. For the last 3.5 years, Anna has been working in Prime Video Sports, where she focuses on developing high-scale solutions and architectural approaches that directly benefit customers. With a passion for collaborative learning and knowledge sharing, Anna finds joy in tackling complex technical challenges and using data-driven insights to enhance the customer experience.
Yaliang Wu is a Software Engineering Manager at AWS, focusing on OpenSearch projects, machine learning, and generative AI applications.