Using Google Gemini’s File API with ColdFusion

I promise, I’m not turning this back into a ColdFusion blog, but as I prepare my presentation next week at Summit and update my Google Gemini code for some ColdFusion demos, I ran into a particularly gnarly bit that I wanted to share in a post. For the most part, I’ve had no issues using Gemini’s REST APIs in ColdFusion, but the File API ended up being more difficult. If you go to the documentation for uploading and use the ‘Shell’ language tab, you can see an example like so:

MIME_TYPE=$(file -b --mime-type "${IMG_PATH_2}")
NUM_BYTES=$(wc -c < "${IMG_PATH_2}")
DISPLAY_NAME=TEXT
tmp_header_file=upload-header.tmp
# Initial resumable request... more →
Posted in: JavaScript

Using PDF Content with Google Gemini – An Update

Way back in March of this year, I took a look at using Google’s Gemini APIs to analyze PDF documents ("Using PDF Content with Google Gemini"). At the time, the Gemini API didn’t support PDF documents, so I made use of our (Adobe) PDF Extract service to get the text content out of the document. This "worked" but was possibly less than ideal, as my "glom all the text together" approach didn’t really represent the PDF well. The PDF Extract API returns information about text context (for example, whether a piece of text is a header), but my method ignored that. I’m happy to share that Gemini now supports PDF files natively. Let’s take a look at how this... more →
Posted in: JavaScript

Caching Input with Google Gemini

A little over a month ago, Google announced multiple updates to their GenAI platform. I made a note of it for research later and finally got time to look at one aspect – context caching. When you send prompts to a GenAI system, your input is tokenized for analysis. While it’s not a "one token per word" relationship, the bigger the input (context), the higher the cost (tokens). The process of converting your input into tokens takes time, especially when dealing with large media such as a video. Google introduced a "Context caching" system that helps improve the performance of your queries. As the docs suggest, this is really suited for cases where you’ve got... more →
Posted in: JavaScript

Creating a Generic Generative Template Language in Google Gemini

I’ve been a fan of ‘random text’ for some time. "Random text" is a bit vague, but to me the idea of using code to generate random stories, or even snippets, is fascinating. Back in April, I blogged about how I created short dragon-based stories. It took a generic string: A #adjective# dragon lives #place#. She #verb# her hoard, which consists of a #number# of #thing#, #number# of #thing#, and #number# of #thing#. She feels #feeling#. And created a story by replacing the pound-wrapped tokens with real words. I used a couple of different tools to build this, but the core one was a cool little Node library named random-word-slugs. It’s a powerful random word library... more →
Posted in: JavaScript
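
As a rough illustration of the pound-wrapped token idea in the excerpt above, here is a minimal sketch in plain JavaScript. It is not the code from the post: the word lists, `pick`, and `fillTemplate` are hypothetical names made up for this example, and the post itself relies on the random-word-slugs library rather than hard-coded arrays.

```javascript
// Hypothetical word lists, hard-coded for illustration only.
// (The original post uses the random-word-slugs library instead.)
const words = {
  adjective: ["sleepy", "fierce", "tiny"],
  place: ["under a mountain", "in a swamp", "above the clouds"],
  verb: ["guards", "polishes", "counts"],
  number: ["handful", "pile", "mountain"],
  thing: ["coins", "spoons", "old maps"],
  feeling: ["content", "restless", "proud"],
};

// Pick one entry from a list; rng is injectable so results can be made repeatable.
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Replace every #token# with a word from the matching list.
// Tokens with no matching list are left untouched.
function fillTemplate(template, wordLists, rng = Math.random) {
  return template.replace(/#(\w+)#/g, (match, key) => {
    const list = wordLists[key];
    return list ? pick(list, rng) : match;
  });
}

const template =
  "A #adjective# dragon lives #place#. She #verb# her hoard, which consists of " +
  "a #number# of #thing#, #number# of #thing#, and #number# of #thing#. " +
  "She feels #feeling#.";

console.log(fillTemplate(template, words));
```

Each token is resolved independently, so repeated tokens such as `#thing#` can come out as different words in the same story.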

Using JSON Schema with Google Gemini

About a month ago, I wrote up a post on how to generate JSON results using Google Gemini, "JSON Results with Google Gemini Generative AI API Calls". While you should read that post first, the process basically boiled down to two steps: (1) setting the response type of the result to JSON (without this, Gemini will return JSON but encoded in Markdown), and (2) using a System Instruction to give directions on the "shape" of the JSON, i.e., use this key and that key. While these techniques work well, recently yet another feature was added that makes this even better: JSON schema support. JSON Schema is an abstract way to define the shape of JSON and can be really useful in validation. The website... more →
Posted in: JavaScript

Building a Chat Integration with Google Gemini

It’s been on my queue for some time to investigate how to use Generative AI in a ‘chat’ interface versus "one prompt and answer" mode, and today I finally got a chance to check it out. I’ll share my thoughts below, but once again I want to thank Allen Firstenberg for his help while I worked through some issues. As always, take what I’m sharing as the opinion of a developer still very new to this space. Any mistakes are my fault!

What is GenAI chat?

Specifically, what is chat when it comes to generative AI? Nothing. Seriously. All ‘chat’ does is take your initial prompt, get the result, then take your next prompt and append it. So for... more →
Posted in: JavaScript
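
To make the "chat is just appending" point from the excerpt above concrete, here is a minimal sketch, not the code from the post. `fakeModel` is a hypothetical stand-in that returns a canned reply instead of calling any real API, and `sendMessage` plus the `role`/`text` shape of each history entry are assumptions chosen for illustration.

```javascript
// Hypothetical stand-in for a real model call. It only reports how much
// history it was given, so the growth of the context is visible.
function fakeModel(history) {
  const last = history[history.length - 1];
  return `(${history.length} messages seen) You said: ${last.text}`;
}

// "Chat" here is nothing more than: append the new prompt, send the whole
// history, then append the reply so the next turn includes it too.
function sendMessage(history, prompt, model = fakeModel) {
  history.push({ role: "user", text: prompt });
  const reply = model(history);
  history.push({ role: "model", text: reply });
  return reply;
}

const history = [];
console.log(sendMessage(history, "What is the nature of light?"));
console.log(sendMessage(history, "Explain that more simply."));
console.log(history.length);
```

The second call sends three messages (the first prompt, the first reply, and the new prompt), which is why long conversations cost more tokens per turn than a single prompt would.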

JSON Results with Google Gemini Generative AI API Calls

Forgive the somewhat alliterative title there, but today’s post covers something that’s been on my mind since I started playing with Google Gemini, specifically, how to get the results of your API calls in JSON. To be clear, the REST API returns a result in JSON, but I’m talking about the content of the result itself. Before I continue, a quick shout-out to Allen Firstenberg, who has been helping me off and on with Google Gemini stuff. Anything I get wrong though is entirely my fault. 😜 Ok, so before I go on, let’s look at a typical result. Take a prompt like so: "What is the nature of light". Pass this to Gemini via the API, and the result you get, once you... more →
Posted in: JavaScript

Using PDF Content with Google Gemini

Back in February Google announced Gemini 1.5, their latest, most powerful language model, and while access has been open via AI Studio, API access has only been available in the past few days. I thought I’d try out the new model and specifically make use of the larger context window to do prompts on PDF documents. I discussed something similar earlier this year ("Using AI and PDF Services to Automate Document Summaries", https://www.raymondcamden.com/2024/01/08/using-ai-and-pdf-services-to-automate-document-summaries) which made use of Diffbot, so I thought it would be interesting to build a similar experience with the Gemini API. At a high level, it’s not too difficult: Begin... more →
Posted in: JavaScript