Using the Gemini File API for Prompts with Media

Using media in your prompts (what’s called ‘multimodal’) with the Gemini API is fairly simple in small cases. You can encode your input with base64 and pass it along with your prompt. While this works well, it has limitations you may hit quickly, most notably a file size limit of 20 MB. A few months ago, I shared a demo of using your device’s camera to detect cat breeds. With today’s cameras taking incredibly detailed pictures, I hit that limit right away and had to write some code to shrink the image. Luckily, the Gemini API has a better way of handling that: the File API. The File API: This API provides a separate method... more →
Posted in: JavaScript
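
As a rough illustration of the tradeoff described in the excerpt above, here is a minimal sketch (Node.js) that base64-encodes small media for an inline prompt and flags anything over the limit as a candidate for the File API. The 20 MB figure comes from the post; the helper names, the `inlineData` object shape, and the choice to compare raw (pre-encoding) bytes are assumptions for illustration, not the post's actual code.

```javascript
// Limit taken from the post (20 MB). Comparing raw bytes is a simplification (assumption).
const INLINE_LIMIT_BYTES = 20 * 1024 * 1024;

// Hypothetical helper: wrap a Buffer as a base64 "inline" part.
// The { inlineData: { mimeType, data } } shape is an assumption for this sketch.
function toInlinePart(buffer, mimeType) {
  return { inlineData: { mimeType, data: buffer.toString("base64") } };
}

// Hypothetical helper: decide how media of a given size should be sent.
function chooseUploadStrategy(byteLength) {
  return byteLength <= INLINE_LIMIT_BYTES ? "inline" : "file-api";
}

// Usage: a tiny payload goes inline, a 25 MB photo would not.
const small = Buffer.from("hello");
console.log(chooseUploadStrategy(small.length)); // small payloads stay inline
console.log(chooseUploadStrategy(25 * 1024 * 1024)); // too big for inline
console.log(toInlinePart(small, "text/plain").inlineData.data.length);
```

A check like this would replace the "resize until it fits" workaround: anything too large gets uploaded separately instead of being squeezed into the request.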

Testing Google’s New Gemini Flash Model

I’m currently at Google I/O waiting for the next session to start and decided to take a quick look at the latest Gemini model to be released, Flash 1.5. As the name implies, this is a ‘speedier’ model built to return responses more quickly than other models, with the tradeoff that the results may not be as good. Like most things in life, there are going to be tradeoffs. Gemini’s Pro 1.5 model will definitely be slower but will return better results. When and how you choose is… well that’s a good question, right? I decided to build a tool so I could play with this myself. The idea is to let me enter a prompt and have it run both Flash and Pro models and see both... more →
Posted in: JavaScript

Building a Chat Integration with Google Gemini

It’s been on my queue to investigate how to use Generative AI in a ‘chat’ interface versus ‘one prompt and answer’ mode for some time and today I finally got a chance to check it out. I’ll share my thoughts below, but once again I want to thank Allen Firstenberg for his help while I worked through some issues. As always, take what I’m sharing as the opinion of a developer still very new to this space. Any mistakes are my fault! What is GenAI chat? Specifically, what is chat when it comes to generative AI? Nothing. Seriously. All ‘chat’ does is take your initial prompt, get the result, then take your next prompt and append it. So for... more →
Posted in: JavaScript
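
The idea in the excerpt above, that "chat" is just prior prompts and results carried along with each new prompt, can be sketched in a few lines. This is only an illustration: `fakeModel` is a stand-in for a real (normally asynchronous) model call, and `createChat`, the `role` values, and the message shape are hypothetical names, not the Gemini SDK's API.

```javascript
// Stand-in for a real model call (assumption): reports how many messages it was given.
function fakeModel(history) {
  return `(reply after seeing ${history.length} message(s))`;
}

// Hypothetical helper: a "chat" is just a growing list of turns.
function createChat(model) {
  const history = [];
  return {
    history,
    send(prompt) {
      // Append the new prompt, then hand the *entire* history to the model.
      history.push({ role: "user", text: prompt });
      const reply = model(history);
      // Keep the reply too, so the next turn has the full context.
      history.push({ role: "model", text: reply });
      return reply;
    },
  };
}

const chat = createChat(fakeModel);
console.log(chat.send("Hello"));
console.log(chat.send("Tell me more"));
console.log(chat.history.length);
```

The model itself stays stateless here; the only "memory" is the array that gets resent on every turn.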

JSON Results with Google Gemini Generative AI API Calls

Forgive the somewhat alliterative title there, but today’s post covers something that’s been on my mind since I started playing with Google Gemini: specifically, how to get the results of your API calls in JSON. To be clear, the REST API returns a result in JSON, but I’m talking about the content of the result itself. Before I continue, a quick shout-out to Allen Firstenberg who has been helping me off and on with Google Gemini stuff. Anything I get wrong though is entirely my fault. 😜 Ok, so before I go on, let’s look at a typical result. Take a prompt like so: "What is the nature of light". Pass this to Gemini via the API, and the result you get, once you... more →
Posted in: JavaScript

Using PDF Content with Google Gemini

Back in February, Google announced Gemini 1.5, their latest, most powerful language model, and while access has been open via AI Studio, API access has only been available in the past few days. I thought I’d try out the new model and specifically make use of the larger context window to do prompts on PDF documents. I discussed something similar earlier this year ("Using AI and PDF Services to Automate Document Summaries", https://www.raymondcamden.com/2024/01/08/using-ai-and-pdf-services-to-automate-document-summaries) which made use of Diffbot, so I thought it would be interesting to build a similar experience with the Gemini API. At a high level, it’s not too difficult: Begin... more →
Posted in: JavaScript

Google Gemini 1.5 Announced (and more new features)

In general I don’t tend to blog about stuff that isn’t quite out yet, but as I’ve got early access (and permission to share), and as it’s pretty darn cool, I thought I’d share. Plus, some of the new stuff is available to everyone, so you can try it out as well! Today, Google introduced its newest language model, Gemini 1.5. You can, and probably should, read the marketing/nicely polished intro by Google here, but I thought I’d share some highlights and examples here. I’ve had access to this for a grand total of four hours so please consider this my first impressions. As the title says, this is not yet released, but you can sign up for the waitlist... more →
Posted in: JavaScript

Google Gemini as Your Dungeon Master

So this is absolutely just another example of me playing around too much, but I had to share. As I mentioned in my post yesterday, Google’s AI Studio now supports uploading files and working with them in your prompt. Today I decided to give the Chat interface a try as I hadn’t yet played with it. On a whim, I googled for "dungeons and dragons rules PDF" and… well, you won’t believe what happened next. (Sorry, I couldn’t resist.) First off, the most important thing to note if you want to test with PDFs: ensure that they are OCRed. Right now AI Studio does not handle that well, but it should be corrected in the future. My Google search turned up the PDF here,... more →
Posted in: JavaScript

Google Gemini and AI Studio Launch

While it feels like just yesterday I first blogged about Google’s PaLM APIs and MakerSuite, it was actually over two months ago, and of course, GenAI offerings are iterating and improving at lightning speed. In the past week, Google has announced Gemini, their new generative AI model. Naturally, I was curious about the API aspect of this and took a quick look. MakerSuite rebranded as AI Studio: First off, the web UI (which I reviewed back in my first post) has been renamed to the generic and boring, but probably more enterprise and appropriate, AI Studio. Along with that, when creating new prompts, it will default to use Gemini models. (You can still select PaLM if you want.) Another change…... more →
Posted in: JavaScript