First Impressions of Google's Gemini 1.5
In general I don't tend to blog about stuff that isn't quite out yet, but as I've got early access (and permission to share), and as it's pretty darn cool, I thought I'd make an exception. Plus, some of the new stuff is available to everyone, so you can try it out as well! Today, Google introduced its newest language model, Gemini 1.5. You can, and probably should, read Google's nicely polished intro here, but I thought I'd share some highlights and examples of my own. I've had access for a grand total of four hours, so please consider these my initial impressions. To be clear, this is not yet released, but you can sign up for the waitlist now. Also, right now this is AI Studio only and not available via API.
While the introductory post covers everything that was added, there are a few changes I’d like to specifically call out.
Probably the biggest change is the ability to handle files and folders. Video support is limited to the new model, so it isn't publicly available yet, but file and folder support can be used by anyone right now. This is available within Studio right at the top:
Note that files and videos cannot be dragged and dropped into the interface; you have to use the buttons at the top. But once you do, you can start writing prompts. I dropped in a PDF copy of Hamlet and asked for a summary as well as the themes:
I haven't read Hamlet since college, but honestly, I liked what it output. By the way, the output was in Markdown, and I've copied it here as-is, so the markup came from the model. Oh, um, spoiler warning for a four-hundred-year-old play.
Act 1:
Act 2:
Act 3:
Act 4:
Act 5:
Last week I blogged about using GenAI as your content assistant. With that in mind, I tried uploading a Markdown file right from my blog (this post) and asked for a summary. Here's a screenshot, just so you can see how files are rendered; note that the token cost of the file is included:
And here’s the summary:
The author:
Overall, the article provides a practical and insightful exploration of temperature settings in generative AI, helping readers understand how to adjust this parameter to achieve their desired level of creativity and variety in generated text.
I dig it, and not because it seemed to praise me (ok, maybe that’s part of it).
Video testing is pretty darn cool, but I did run into some issues with it. One test that worked well was taking the raw video from this YouTube video, uploading it, and asking for a summary:
The result felt pretty spot on:
"This video is about Google’s new large language model, PaLM 2. PaLM 2 is a more advanced version of PaLM, and it excels at advanced reasoning tasks, including code and math. It is also better at understanding the human language, which requires understanding ambiguous and figurative meaning of words, rather than the literal meaning. PaLM 2 was pre-trained on a large quantity of webpage, source code and other datasets. This means that it excels at popular programming languages like Python and JavaScript, but is also capable of generating specialized code in languages like Prolog, Fortran, and Verilog. Combining this with its language capabilities can help teams collaborate across languages."
I then tried with this video which details how to use the Photoshop API with Python. Here’s what Gemini 1.5 said:
"This video shows how to use the Adobe Photoshop API to remove the background from an image. The video starts by showing the Adobe Photoshop API documentation, which explains how to use the API to perform various tasks, including removing the background from an image. The video then shows how to create a Python script that uses the API to remove the background from an image. The script first generates an access token, then creates a remove background job, and finally polls the job until it is complete. The video ends by showing the image with the background removed."
This is… right but wrong. I mean, it does show the remove background part of the Photoshop API. That’s absolutely right. But it doesn’t bring up Python till the third sentence, which feels like it missed the point of the video.
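As an aside, the workflow the model called out (get a token, create a job, poll it) really is the shape of that script. For anyone curious, here's a rough sketch of the pattern. The endpoint URLs and response field names below are placeholders I made up, not the actual Photoshop API contract, so check Adobe's docs for the real thing:

```python
import time
import requests

# Placeholder endpoints - stand-ins for the pattern only, not the
# real Photoshop API URLs. See Adobe's documentation for those.
IMS_TOKEN_URL = "https://example.adobe.test/ims/token"
REMOVE_BG_URL = "https://example.adobe.test/removeBackground"

def get_access_token(client_id, client_secret):
    # Step 1: exchange credentials for a short-lived access token.
    resp = requests.post(IMS_TOKEN_URL, data={
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
    })
    resp.raise_for_status()
    return resp.json()["access_token"]

def remove_background(token, client_id, input_url, output_url):
    # Step 2: kick off an async "remove background" job.
    headers = {"Authorization": f"Bearer {token}", "x-api-key": client_id}
    resp = requests.post(REMOVE_BG_URL, headers=headers, json={
        "input": {"href": input_url},
        "output": {"href": output_url},
    })
    resp.raise_for_status()
    status_url = resp.json()["statusUrl"]  # hypothetical field name

    # Step 3: poll until the job finishes.
    while True:
        job = requests.get(status_url, headers=headers).json()
        if job["status"] in ("succeeded", "failed"):
            return job
        time.sleep(2)
```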
I'm cautiously optimistic that this could be pretty powerful when tweaked a bit further. As I mentioned in my earlier post, I really don't like writing summaries for my content, so I could imagine automating that via (eventual) API access to this feature. It would be even better if Gemini could somehow 'tap' into an unlisted YouTube video so I didn't have to upload it first.
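Just to daydream a bit: once API access lands, I'd imagine the automation looking something like the sketch below. To be clear, this is pure speculation on my part; the `upload_file` call and the `gemini-1.5-pro` model name are hypothetical stand-ins, since, as I said, none of this is exposed via the API yet.

```python
import google.generativeai as genai

genai.configure(api_key="MY_KEY")

# Hypothetical: assumes a future file-upload call and a 1.5 model name.
video = genai.upload_file("raw-video.mp4")
model = genai.GenerativeModel("gemini-1.5-pro")

response = model.generate_content(
    ["Write a two sentence summary of this video for a blog post.", video]
)
print(response.text)
```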
One more big advantage of Gemini 1.5, and this will be important for handling those video files or large sets of folders, is a much larger token limit. How much larger? Gemini 1.0 had a limit of 32,000 tokens. Gemini 1.5 gives you access to 1,000,000. Surely that's meme-worthy, right?
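To put that in perspective, here's some back-of-the-napkin math using the common (and very rough) heuristic of about four characters per token, and assuming a plain-text Hamlet weighs in around 180,000 characters:

```python
# Rough heuristic: ~4 characters per token (varies by tokenizer).
CHARS_PER_TOKEN = 4

hamlet_chars = 180_000  # a plain-text Hamlet is roughly this size

tokens = hamlet_chars / CHARS_PER_TOKEN
print(f"Hamlet: ~{tokens:,.0f} tokens")             # ~45,000 tokens
print(f"Fits in 32,000? {tokens <= 32_000}")        # False - too big for 1.0
print(f"Fits in 1,000,000? {tokens <= 1_000_000}")  # True, ~20x over
```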
AI Studio does a great job reporting this as well. Here’s the token count from a video test:
This is also reported in any file preview, which gives you a way to determine how much each file contributes to the total cost (which includes your text prompt as well):
As I said, I just started playing with the file and video support, but I like what I see so far. Once I get API access to try this I’ll definitely share what that’s like as well. Let me know what you think!
Raymond Camden