Way back in March of this year, I took a look at using Google’s Gemini APIs to analyze PDF documents ("Using PDF Content with Google Gemini"). At the time, the Gemini API didn’t support PDF documents, so I made use of our (Adobe) PDF Extract service to get the text content out from the document. This "worked" but was possibly less than ideal as my "glom all the text together" approach didn’t really represent the PDF well. The PDF Extract API returns information about text context (like if it is a header for example), but my method ignored that. I’m happy to share that Gemini now supports PDF files natively. Let’s take a look at how this works.
To begin, you need to provide your PDF to Gemini. This is done via the Files API. I blogged about this a few months ago and it’s a rather simple process. You can upload files of up to 2 gigs each, with a limit of 20 gigs of storage per project. These files are stored temporarily, lasting 48 hours, so you can absolutely upload, do some "stuff", and then either delete them via an API call or let them expire naturally.
That aspect of the code hasn’t changed, but I’ll share the general function here.
    import { GoogleAIFileManager } from "@google/generative-ai/server";

    const fileManager = new GoogleAIFileManager(API_KEY);

    const uploadResponse = await fileManager.uploadFile("adobe_security_properly_ocr.pdf", {
      mimeType: "application/pdf",
      displayName: "Adobe Security PDF",
    });
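If you would rather not wait for the 48 hour expiry, here is a minimal sketch of cleaning up right after you are done. The `withUpload` helper is a hypothetical name, not part of the SDK. It assumes the object you pass in exposes `deleteFile(name)`, as `GoogleAIFileManager` does, and that the upload response carries the file's resource name in `file.name`.

```javascript
// Hypothetical helper (not part of the SDK): run some work against an
// uploaded file, then delete the file whether the work succeeded or threw.
// `manager` is assumed to expose deleteFile(name), like GoogleAIFileManager.
async function withUpload(manager, upload, work) {
  try {
    // `work` receives the upload response and returns whatever you need.
    return await work(upload);
  } finally {
    // upload.file.name is the resource name assigned by the Files API.
    await manager.deleteFile(upload.file.name);
  }
}
```

With the real SDK this could be called as `await withUpload(fileManager, uploadResponse, async (upload) => { /* prompt here */ })`. Skip it if you plan to reuse the same upload across several prompts.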
Once the file is uploaded, you then just include the reference in your prompt. Again, this is no different than what I showed in that earlier post, but here it is in action:
    import { GoogleGenerativeAI } from "@google/generative-ai";

    const genAI = new GoogleGenerativeAI(API_KEY);

    const model = genAI.getGenerativeModel({
      // Choose a Gemini model.
      model: "gemini-1.5-flash",
    });

    // Generate content using text and the URI reference for the uploaded file.
    let result = await model.generateContent([
      {
        fileData: {
          mimeType: uploadResponse.file.mimeType,
          fileUri: uploadResponse.file.uri
        }
      },
      { text: "Can you summarize this document as a bulleted list?" },
    ]);
And that’s literally it. For an incredibly exciting document relating to Adobe’s security policies, I get:
The VSR program is a critical component of Adobe’s information security strategy, ensuring that third-party vendors comply with Adobe’s security standards and protect sensitive data.
Summarizing is just one thing you can do, of course. I also tried a prompt for categorization:
Return a list of categories that define the content of this document. Return your result as a comma-delimited list.
Using the same upload reference, I got this:
This seemed to work well, but I’d be curious to know if you could restrict the returned categories to a certain set. I haven’t tested that yet, and of course, you could keep a ‘sanitized’ list in code and only use results that match.
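As a rough sketch of that "sanitized list in code" idea, the function below takes the comma-delimited text from the model and keeps only the entries found in an allow-list. The `filterCategories` name and the categories in `ALLOWED_CATEGORIES` are made up for illustration; they are not what Gemini returned for this document.

```javascript
// Hypothetical allow-list. Replace with the categories your application supports.
const ALLOWED_CATEGORIES = ["security", "compliance", "vendor management", "privacy"];

// Split the model's comma-delimited answer, normalize each entry, and keep
// only entries on the allow-list. Duplicates are dropped, order is preserved.
function filterCategories(rawText, allowed = ALLOWED_CATEGORIES) {
  const allowedSet = new Set(allowed.map((category) => category.toLowerCase()));
  const seen = new Set();
  const result = [];
  for (const piece of rawText.split(",")) {
    const category = piece.trim().toLowerCase();
    if (allowedSet.has(category) && !seen.has(category)) {
      seen.add(category);
      result.push(category);
    }
  }
  return result;
}
```

You would pass it `result.response.text()` from the categorization prompt. Matching here is exact after trimming and lowercasing, so a near miss like "vendor risk" would be dropped rather than mapped to "vendor management".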
Here’s the entire script for this demo (and I’ll link to the repo at the end):
    import { GoogleAIFileManager } from "@google/generative-ai/server";
    import { GoogleGenerativeAI } from "@google/generative-ai";

    let API_KEY = process.env.GOOGLE_AI_KEY;

    // Initialize GoogleAIFileManager with your API_KEY.
    const fileManager = new GoogleAIFileManager(API_KEY);

    const genAI = new GoogleGenerativeAI(API_KEY);

    const model = genAI.getGenerativeModel({
      // Choose a Gemini model.
      model: "gemini-1.5-flash",
    });

    // Upload the file and specify a display name.
    const uploadResponse = await fileManager.uploadFile("adobe_security_properly_ocr.pdf", {
      mimeType: "application/pdf",
      displayName: "Adobe Security PDF",
    });

    // Generate content using text and the URI reference for the uploaded file.
    let result = await model.generateContent([
      {
        fileData: {
          mimeType: uploadResponse.file.mimeType,
          fileUri: uploadResponse.file.uri
        }
      },
      { text: "Can you summarize this document as a bulleted list?" },
    ]);

    // Output the generated text to the console
    console.log(result.response.text())

    console.log('-'.repeat(80));

    result = await model.generateContent([
      {
        fileData: {
          mimeType: uploadResponse.file.mimeType,
          fileUri: uploadResponse.file.uri
        }
      },
      { text: "Return a list of categories that define the content of this document. Return your result as a comma-delimited list." },
    ]);

    // Output the generated text to the console
    console.log(result.response.text())
Of course, the benefits can get even better if you want to work with multiple documents at once. In order for that to work, you just upload more items, and refer to them in your prompt. For example:
    // Upload the file and specify a display name.
    const uploadResponse = await fileManager.uploadFile("hamlet.pdf", {
      mimeType: "application/pdf",
      displayName: "Hamlet",
    });

    const uploadResponse2 = await fileManager.uploadFile("romeo-and-juliet.pdf", {
      mimeType: "application/pdf",
      displayName: "Romeo and Juliet",
    });
That’s the uploads, and here they are in use in a prompt:
    let result = await model.generateContent([
      {
        fileData: {
          mimeType: uploadResponse.file.mimeType,
          fileUri: uploadResponse.file.uri
        }
      },
      {
        fileData: {
          mimeType: uploadResponse2.file.mimeType,
          fileUri: uploadResponse2.file.uri
        }
      },
      { text: "Compare these two plays and discuss similar themes as well as major differences." },
    ]);
This is what I got comparing Hamlet and Romeo and Juliet:
Similarities:
Differences:
In conclusion, both Hamlet and Romeo and Juliet are powerful and enduring tragedies that explore universal themes of love, death, fate, and the human condition. While they share similarities in themes, their focus on conflict and their approach to characters and language create distinct dramatic experiences. Hamlet delves into the complexities of individual psychology and morality, while Romeo and Juliet explores the destructive power of external forces and societal conflict.
It’s been a while since I’ve read these plays, but it feels like a pretty good comparison. Here’s that script:
    import { GoogleAIFileManager } from "@google/generative-ai/server";
    import { GoogleGenerativeAI } from "@google/generative-ai";

    let API_KEY = process.env.GOOGLE_AI_KEY;

    // Initialize GoogleAIFileManager with your API_KEY.
    const fileManager = new GoogleAIFileManager(API_KEY);

    const genAI = new GoogleGenerativeAI(API_KEY);

    const model = genAI.getGenerativeModel({
      // Choose a Gemini model.
      model: "gemini-1.5-flash",
    });

    // Upload the file and specify a display name.
    const uploadResponse = await fileManager.uploadFile("hamlet.pdf", {
      mimeType: "application/pdf",
      displayName: "Hamlet",
    });

    const uploadResponse2 = await fileManager.uploadFile("romeo-and-juliet.pdf", {
      mimeType: "application/pdf",
      displayName: "Romeo and Juliet",
    });

    console.log('Uploaded both files.');

    // Generate content using text and the URI reference for the uploaded file.
    let result = await model.generateContent([
      {
        fileData: {
          mimeType: uploadResponse.file.mimeType,
          fileUri: uploadResponse.file.uri
        }
      },
      {
        fileData: {
          mimeType: uploadResponse2.file.mimeType,
          fileUri: uploadResponse2.file.uri
        }
      },
      { text: "Compare these two plays and discuss similar themes as well as major differences." },
    ]);

    // Output the generated text to the console
    console.log(result.response.text())

    console.log('-'.repeat(80));
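Once you go past two documents, repeating the `fileData` block by hand gets tedious. Here is a small sketch of a helper that builds the same array from a list of upload responses. `buildParts` is a hypothetical name, and it only assumes each upload has the `file.mimeType` and `file.uri` fields used in the script above.

```javascript
// Hypothetical helper: turn a list of upload responses plus a text prompt
// into the array shape passed to model.generateContent() above.
function buildParts(uploads, prompt) {
  const fileParts = uploads.map((upload) => ({
    fileData: {
      mimeType: upload.file.mimeType,
      fileUri: upload.file.uri,
    },
  }));
  // File references first, then the text prompt, matching the earlier examples.
  return [...fileParts, { text: prompt }];
}
```

The comparison call then becomes `model.generateContent(buildParts([uploadResponse, uploadResponse2], "Compare these two plays and discuss similar themes as well as major differences."))`.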
The power in this comes from automation of course. You could imagine a process that responds to new PDFs being added to a directory, uses Gemini for a summary, and stores that result in a database for use later. And it’s also totally fair to expect that the summary will be off, incomplete, and so forth, and therefore any tool should make it easy for an administrator to tweak.
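To make that process a bit more concrete, here is a minimal sketch of the core loop with everything external injected. All names are hypothetical: `summarizeNewPdfs` is the loop, `store` is a `Map` standing in for a database, and `summarizeFile` is whatever function uploads a PDF and asks Gemini for a summary. The `reviewed` flag is one assumed way to mark summaries that an administrator has not yet checked.

```javascript
// Hypothetical sketch: summarize every PDF that has no stored summary yet.
// - fileNames: names found in the watched directory
// - store: a Map standing in for a database table keyed by file name
// - summarizeFile: async function (name) => summary text
async function summarizeNewPdfs(fileNames, store, summarizeFile) {
  const added = [];
  for (const name of fileNames) {
    // Ignore anything that is not a PDF.
    if (!name.toLowerCase().endsWith(".pdf")) continue;
    // Skip files that were summarized on an earlier run.
    if (store.has(name)) continue;
    const summary = await summarizeFile(name);
    // Mark as unreviewed so a person can check and tweak it later.
    store.set(name, { summary, reviewed: false });
    added.push(name);
  }
  return added;
}
```

In a real tool, `fileNames` could come from `fs.promises.readdir` on a schedule, and `store` would be swapped for actual database calls.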
As a super simple example, here’s a script that will summarize at the command line:
    import { GoogleAIFileManager } from "@google/generative-ai/server";
    import { GoogleGenerativeAI } from "@google/generative-ai";

    let API_KEY = process.env.GOOGLE_AI_KEY;

    // Initialize GoogleAIFileManager with your API_KEY.
    const fileManager = new GoogleAIFileManager(API_KEY);

    const genAI = new GoogleGenerativeAI(API_KEY);

    const model = genAI.getGenerativeModel({
      // Choose a Gemini model.
      model: "gemini-1.5-flash",
    });

    async function uploadFile(path) {
      // assumes /, kinda bad
      let name = path.split('/').pop();

      // Upload the file and specify a display name.
      return await fileManager.uploadFile(path, {
        mimeType: "application/pdf",
        displayName: name,
      });
    }

    async function summarize(upload) {
      return (await model.generateContent([
        {
          fileData: {
            mimeType: upload.file.mimeType,
            fileUri: upload.file.uri
          }
        },
        { text: "Can you summarize this document?" },
      ])).response.text();
    }

    if(process.argv.length < 3) {
      console.log('Pass a path to a PDF file to use this tool.');
      process.exit();
    }

    let path = process.argv[2];

    console.log(`Upload ${path}`);
    let upload = await uploadFile(path);

    console.log('Asking for a summary...');
    let summary = await summarize(upload);

    console.log('-'.repeat(80));
    console.log(summary);
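The "assumes /" shortcut in `uploadFile` breaks on Windows-style paths. One possible fix, sketched below, is to split on both separators. `displayNameFor` is a hypothetical name, and Node's built-in `path.basename` is another option if you only need the current platform's separator.

```javascript
// Hypothetical helper: take the last path segment, treating both "/" and "\"
// as separators, so the same code works for POSIX and Windows style paths.
function displayNameFor(filePath) {
  const parts = filePath.split(/[\\/]+/).filter((part) => part.length > 0);
  // Fall back to the input when there is no usable segment (e.g. empty string).
  return parts.length > 0 ? parts[parts.length - 1] : filePath;
}
```

The trade-off is that a POSIX file name that really contains a backslash would be cut at that character, which seems acceptable for a display name.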
You can find these scripts, and my source PDFs, as well as a few other tests, up in my repo here: https://github.com/cfjedimaster/ai-testingzone/tree/main/pdf_test Let me know what you think!
Raymond Camden