Using GenAI to Classify an Image as a Photo, Screenshot, or Meme

File this under the "I wasn’t sure if it would work and it did" category. Recently, a friend on Facebook wondered if there was some way to take a collection of photos and figure out which were ‘real’ photos versus memes. I thought it might be a good exercise for GenAI and decided to take a shot at it. As usual, I opened up Google’s AI Studio and ran a few initial tests:

Screenshot from AI Studio

I then removed that image and pasted in a few more to test. From what I could see, it worked well enough, so I took the source code from AI Studio and began working.

The Code

First, I grabbed eleven pictures from my collection, trying to get a mix of photos, memes, and screenshots. After downloading them, I renamed them so it would be quicker to see whether the results were right. As I mentioned above, AI Studio gave me the code, but I modified it slightly so I could pass a directory of images:

import fs from 'fs/promises';
import 'dotenv/config';
import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';

const MODEL_NAME = "gemini-pro-vision";
const API_KEY = process.env.GOOGLE_AI_KEY;

// Ask the model whether the image at `path` is a photo, a screenshot, or a meme.
async function detectPhoto(path) {
  const genAI = new GoogleGenerativeAI(API_KEY);
  const model = genAI.getGenerativeModel({ model: MODEL_NAME });

  const generationConfig = {
    temperature: 0.4,
    topK: 32,
    topP: 1,
    maxOutputTokens: 4096,
  };

  const safetySettings = [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
  ];

  const parts = [
    {text: "Look at the following photo and tell me if it's a photo, a screenshot, or a meme. Answer with just one word.\n"},
    {
      inlineData: {
        // The MIME type is hard-coded, so every file is sent as a JPEG.
        mimeType: "image/jpeg",
        // The image bytes travel inline as a base64 string.
        data: Buffer.from(await fs.readFile(path)).toString("base64")
      }
    },
    {text: "\n\n"},
  ];

  const result = await model.generateContent({
    contents: [{ role: "user", parts }],
    generationConfig,
    safetySettings,
  });

  const response = result.response;
  return response.text();
}

// Classify every file in the source directory, one at a time.
const root = './source_for_detector/';
let files = await fs.readdir(root);

for(const file of files) {
  console.log(`Check to see if ${file} is a photo, meme, or screenshot...`);
  let result = await detectPhoto(root + file);
  console.log(result);
}
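The script above sends every file with a `mimeType` of `image/jpeg`. If a folder mixes formats, one possible tweak is to pick the type from the file extension and skip anything that is not an image. This is only a sketch: `mimeTypeFor` is a hypothetical helper, not part of the AI Studio export, and the list of types is an assumption you should check against what the model accepts.

```javascript
// Hypothetical lookup table: file extension -> MIME type to send with the image.
// The set of extensions here is an assumption, not a list from the API docs.
const MIME_TYPES = new Map([
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
  ['webp', 'image/webp'],
]);

// Return the MIME type for a file name, or null if it does not look like
// a supported image (no extension, a dotfile, or an unknown extension).
function mimeTypeFor(fileName) {
  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return MIME_TYPES.get(ext) ?? null;
}
```

In the loop, you could call `mimeTypeFor(file)`, skip the file when it returns `null`, and otherwise pass the result through to `detectPhoto` in place of the hard-coded value.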

It worked perfectly!

Terminal output from script
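Reading one-word answers in a terminal is fine for eleven files. To act on them, say to split a larger folder into photos, screenshots, and memes, it may help to normalize each reply first, since a model reply can carry stray whitespace, capitalization, or punctuation. A minimal sketch, assuming the replies look like the ones the prompt asks for; `normalizeLabel` and `groupByLabel` are hypothetical helpers, and the sample replies are made up for illustration:

```javascript
const LABELS = ['photo', 'screenshot', 'meme'];

// Reduce a raw reply such as " Meme.\n" to one of the known labels,
// or 'unknown' when the reply does not match any of them.
function normalizeLabel(reply) {
  const word = reply.trim().toLowerCase().replace(/[^a-z]/g, '');
  return LABELS.includes(word) ? word : 'unknown';
}

// Group file names by normalized label.
// `results` is an array of [fileName, rawReply] pairs.
function groupByLabel(results) {
  const groups = { photo: [], screenshot: [], meme: [], unknown: [] };
  for (const [fileName, reply] of results) {
    groups[normalizeLabel(reply)].push(fileName);
  }
  return groups;
}

// Made-up replies for illustration; these are not output from the real script.
const sample = [
  ['a.jpg', ' Photo'],
  ['b.jpg', 'Meme.\n'],
  ['c.jpg', 'screenshot'],
  ['d.jpg', 'I am not sure'],
];
console.log(groupByLabel(sample));
```

Anything that lands in `unknown` is worth checking by hand rather than guessing at.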

If you want a copy of the source, you can grab it here: https://github.com/cfjedimaster/ai-testingzone/tree/main/detect_meme_ss

The Photos

Ok, technically you can just head over to the GitHub repo to see these, but here are the source images. First, the ‘regular’ photos:

Cat lying on a desk next to a computer mouse

Display case that says 'invisible snake'

Picture from a football game

Two cats on a chair

Next, the screenshots:

Screenshot from Reddit app

Screenshot from walmart.com, Nebulon-B Frigate LEGO

Screenshot from OneNote, a list of shows to watch

And finally, the memes. Enjoy.

Time's Person of the Year - Godzilla

Vote Cobra

Who is Cobra Commander - I mean really...

Brace yourself - winter is coming. The entire thing. All at once. In one weekend.

Raymond Camden

Posted in: JavaScript
