Sunday, March 16, 2025

A Code Implementation to Build an AI-Powered PDF Interaction System in Google Colab Using Gemini Flash 1.5, PyMuPDF, and Google Generative AI API


In this tutorial, we demonstrate how to build an AI-powered PDF interaction system in Google Colab using Gemini Flash 1.5, PyMuPDF, and the Google Generative AI API. With these tools, we can upload a PDF, extract its text, and interactively ask questions, receiving responses from Google’s Gemini Flash 1.5 model.

!pip install -q -U google-generativeai PyMuPDF python-dotenv

First, we install the necessary dependencies for building an AI-powered PDF Q&A system in Google Colab. google-generativeai provides access to Gemini Flash 1.5, enabling natural language interactions, while PyMuPDF (imported as fitz) allows efficient text extraction from PDFs. Additionally, python-dotenv helps manage environment variables, such as API keys, securely within the notebook.
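
As a quick sanity check after installation, the sketch below reports which of the three packages cannot be found. The helper name missing_modules is hypothetical and not part of the original notebook; the import names (google.generativeai, fitz, dotenv) are the modules these packages are normally imported as.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A dotted name raises if its parent package is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Import names for google-generativeai, PyMuPDF, and python-dotenv.
required = ["google.generativeai", "fitz", "dotenv"]
print("Missing modules:", missing_modules(required))
```

An empty list means all three imports should succeed in the cells that follow.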

from google.colab import files
uploaded = files.upload()

We upload files from your local machine to Google Colab. When executed, files.upload() opens a file selection dialog, allowing you to choose a file (e.g., a PDF) to upload. The uploaded file is stored in a dictionary-like object (uploaded), where keys represent file names and values contain the file’s binary data. This step is essential for directly processing documents, datasets, or model weights in a Colab environment.
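
Because the keys of uploaded are file names, the PDF's name can be looked up rather than typed by hand. The following is a minimal sketch of that idea; pick_pdf_name is a hypothetical helper, and a plain dictionary stands in for the upload result so the snippet also runs outside Colab.

```python
def pick_pdf_name(uploaded):
    """Return the first uploaded file name ending in .pdf, or None."""
    for name in uploaded:
        if name.lower().endswith(".pdf"):
            return name
    return None


# Stand-in for the dictionary returned by files.upload() (file name -> bytes).
example_upload = {"notes.txt": b"hello", "Paper.pdf": b"%PDF-1.7"}
pdf_name = pick_pdf_name(example_upload)
print(pdf_name)  # Paper.pdf
```

In Colab, uploaded files are typically written to the current working directory (/content by default), so the returned name can be joined to that directory to form the path used later.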

import fitz  # PyMuPDF


def extract_pdf_text(pdf_path):
    """Return the concatenated text of every page in the PDF."""
    doc = fitz.open(pdf_path)
    full_text = ""
    for page in doc:
        full_text += page.get_text()
    return full_text


pdf_file_path = "/content/Paper.pdf"
document_text = extract_pdf_text(pdf_path=pdf_file_path)
print("Document text extracted!")
print(document_text[:1000])  # preview the first 1000 characters

We use PyMuPDF (fitz) to extract text from a PDF file in Google Colab. The function extract_pdf_text(pdf_path) opens the PDF, iterates through its pages, and retrieves the text content of each one. The extracted text is then stored in document_text, and the first 1000 characters are printed to preview the content. This step is essential for text-based analysis and AI-driven question answering over PDFs.
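
PDFs that contain only scanned page images generally yield little or no text through plain text extraction, so it can help to inspect the result before sending it to the model. The sketch below is one way to do that; extraction_stats is a hypothetical helper, and the 200-character threshold is an arbitrary assumption rather than a recommended value.

```python
def extraction_stats(text, min_chars=200):
    """Summarize extracted text and flag results that look empty."""
    stripped = text.strip()
    return {
        "characters": len(stripped),
        "words": len(stripped.split()),
        # Assumption: fewer than min_chars characters suggests a scanned PDF.
        "looks_empty": len(stripped) < min_chars,
    }


# Sample string standing in for document_text.
sample_text = "Attention is all you need. " * 20
print(extraction_stats(sample_text))
```

If looks_empty is true for a real document, the PDF probably needs OCR before question answering will be useful.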

import os
os.environ["GOOGLE_API_KEY"] = 'Use your own API key here'

We set the Google API key as an environment variable in Google Colab. The API key is required to authenticate requests to Google Generative AI, allowing access to Gemini Flash 1.5 for AI-powered text processing. Replace ‘Use your own API key here’ with a valid key; without one, requests to the model will fail.
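
Typing a real key directly into a notebook cell makes it easy to leak when the notebook is shared. As a hedged alternative, the sketch below reads the key from the environment and fails early if it is missing or still the placeholder; get_api_key and PLACEHOLDER are hypothetical names introduced for illustration.

```python
import os

PLACEHOLDER = "Use your own API key here"


def get_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from env, or raise if it is missing or a placeholder."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set {name} to a valid API key before continuing.")
    return key


# A plain dictionary stands in for os.environ in this demonstration.
print(get_api_key({"GOOGLE_API_KEY": "example-key-123"}))  # example-key-123
```

If the key is kept in a .env file, calling load_dotenv() from python-dotenv before get_api_key() is one way to populate os.environ.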

import google.generativeai as genai


genai.configure(api_key=os.environ["GOOGLE_API_KEY"])


model_name = "models/gemini-1.5-flash-001"


def query_gemini_flash(question, context):
    model = genai.GenerativeModel(model_name=model_name)
    # Only the first 20,000 characters of the context are sent to the model.
    prompt = f"""
Context: {context[:20000]}


Question: {question}


Answer:
"""
    response = model.generate_content(prompt)
    return response.text


pdf_text = extract_pdf_text("/content/Paper.pdf")


question = "Summarize the key findings of this document."
answer = query_gemini_flash(question, pdf_text)
print("Gemini Flash Answer:")
print(answer)

Finally, we configure and query Gemini Flash 1.5 using the PDF document as context. The code initializes the genai library with the API key and loads the Gemini Flash 1.5 model (gemini-1.5-flash-001). The query_gemini_flash() function takes a question and the extracted PDF text as input, builds a structured prompt from the first 20,000 characters of that text, and returns the model’s response. This setup enables automated document summarization and Q&A over PDFs.
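
Since the prompt keeps only the first 20,000 characters of the context, anything later in a longer document never reaches the model. One possible workaround, sketched below, is to split the text into overlapping chunks and ask the question once per chunk. This is not part of the original notebook: split_into_chunks, ask_over_chunks, and the 500-character overlap are assumptions, and fake_ask is a stub standing in for query_gemini_flash so the example runs without an API key.

```python
def split_into_chunks(text, size=20000, overlap=500):
    """Split text into pieces of at most size characters that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        # Step back by overlap so sentences cut at a boundary appear twice.
        start += size - overlap
    return chunks


def ask_over_chunks(question, text, ask, size=20000, overlap=500):
    """Call ask(question, chunk) for every chunk and collect the answers."""
    return [ask(question, chunk) for chunk in split_into_chunks(text, size, overlap)]


def fake_ask(question, context):
    # Stub standing in for query_gemini_flash so the example runs offline.
    return f"{len(context)} characters seen"


answers = ask_over_chunks("What is this about?", "x" * 45000, fake_ask)
print(answers)
```

In the notebook, passing query_gemini_flash as ask would produce one answer per chunk; how those answers are merged (for example, with a final summarizing prompt) is a separate design choice.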

In conclusion, by following this tutorial, we have built an interactive PDF question-answering system in Google Colab using Gemini Flash 1.5, PyMuPDF, and the Google Generative AI API. This solution lets users extract information from PDFs and query them interactively. The combination of Google’s AI models and Colab’s cloud-based environment provides an accessible way to process documents without requiring heavy local computational resources.


Here is the Colab Notebook.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
