A few days ago I wrote up an exploration I did in spam APIs, "Spam APIs in 2024". After working on those demos and writing up my findings, I thought it would be interesting to see how a generative AI tool, like Google's Gemini, would handle the same task.
So, once again (I know I'm a broken record on this), I fired up AI Studio first to give it some quick tests. I began with this prompt:
Given the following text, rate how much it appears to be spam by giving it a score between 1 and 10, with 10 being the most likely the content is spam.
For my first test, I used a ‘bad’ sample (see my last post for a full listing of my sample data):
I got this response:
Spam indicators:
However, there are some elements that make it less likely to be spam:
Overall, the text exhibits several characteristics of spam, but it also contains some elements that could indicate a genuine attempt to connect. Therefore, a score of 7 reflects this ambiguity.
It’s important to remain cautious and avoid clicking on suspicious links, even if the message seems somewhat personal.
That feels pretty correct. Next, I tried it with this 'good' sample:
Aleesha <aleesha@buymeacoffee.com>
Wed, Mar 20, 4:33 PM (4 days ago)
to me
Hi Raymond Camden,
Just wanted to drop in and give you the big news: Polls are now live on Buy Me a Coffee.
To create a poll, go to your studio, select Posts and you will see an option to create Polls.
You can use Polls to engage with your audience, get ideas on what you should create next, collect feedback, and what not. Excited for you to try it out.
If you have any feedback for us, LMK. I read every reply 🙂
Regards,
Aleesha
Buy Me a Coffee Team
And got:
Reasons it’s not spam:
Reasons it could be considered slightly spammy:
Overall, the email seems legitimate and informative, but its promotional nature and informality slightly increase its spam score.
This also feels pretty accurate. All in all, it seems to work well, but if I wanted to automate the process, I'd need the response to contain just the score, without the explanatory text. To do that, I changed my prompt to:
Given the following text, rate how much it appears to be spam by giving it a score between 1 and 10, with 10 being the most likely the content is spam. Your response should only contain the score with no additional text.
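In the script below, that instruction and the sample being tested get concatenated into one string before being sent to the model. A minimal sketch of that step on its own (the `buildPrompt` helper name is mine, not part of any SDK):

```javascript
// The instruction portion of the prompt, asking for a score-only reply.
const PROMPT = `Given the following text, rate how much it appears to be spam by giving it a score between 1 and 10, with 10 being the most likely the content is spam. Your response should only contain the score with no additional text.`;

// Combine the instruction with the sample being tested.
function buildPrompt(sample) {
	return `${PROMPT}\n\n${sample}`;
}
```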
I verified this worked, and then wrote a quick automation script:
```js
import { tests } from './inputdata.js';
import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';

const MODEL_NAME = "gemini-1.5-pro-latest";
const API_KEY = process.env.GOOGLE_AI_KEY;

async function testForSpam(test) {
	const genAI = new GoogleGenerativeAI(API_KEY);
	const model = genAI.getGenerativeModel({ model: MODEL_NAME }, { apiVersion: 'v1beta' });

	const generationConfig = {
		temperature: 0.9,
		topK: 1,
		topP: 1,
		maxOutputTokens: 2048,
	};

	const safetySettings = [
		{
			category: HarmCategory.HARM_CATEGORY_HARASSMENT,
			threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
		},
		{
			category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
			threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
		},
		{
			category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
			threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
		},
		{
			category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
			threshold: HarmBlockThreshold.BLOCK_ONLY_HIGH,
		},
	];

	const parts = [
		{ text: `Given the following text, rate how much it appears to be spam by giving it a score between 1 and 10, with 10 being the most likely the content is spam. Your response should only contain the score with no additional text.

${test}` },
	];

	const result = await model.generateContent({
		contents: [{ role: "user", parts }],
		generationConfig,
		safetySettings,
	});

	const response = result.response;
	return response.text().trim();
}

async function delay(x) {
	return new Promise(resolve => setTimeout(() => resolve(), x));
}

for (let good of tests.good) {
	console.log('Running good test');
	let result = await testForSpam(good);
	console.log(result);
	await delay(30 * 1000);
}

for (let bad of tests.bad) {
	console.log('Running bad test');
	let result = await testForSpam(bad);
	console.log(result);
	await delay(30 * 1000);
}
```
Most of this is boilerplate Gemini code, with one really important difference. Note the safety settings are all set to BLOCK_ONLY_HIGH. I found that Gemini would have an issue with some of the spam (no surprise there), and lowering the safety thresholds fixed that.
Another important change was to throttle my code so that I was only calling the API twice per minute. Without this, I would get 429 errors stating I was hitting the API too much. That feels a bit on the stingy side, and I'm not sure I'd use this at scale, but at the same time, the model is doing a lot more than a simple spam API and, heck, may even be overkill for this task.
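A slightly more robust alternative to a fixed delay is to retry when a rate-limit error does slip through. This is my own sketch, not part of the Gemini SDK; the `err.status === 429` check is an assumption about how your client surfaces the status code, so adapt it to whatever your error object actually contains:

```javascript
// Retry an async function with a fixed wait when the call is rate limited.
// maxRetries and waitMs are illustrative defaults, not SDK values.
async function withRetry(fn, { maxRetries = 3, waitMs = 30 * 1000 } = {}) {
	for (let attempt = 0; ; attempt++) {
		try {
			return await fn();
		} catch (err) {
			// Hypothetical check: adapt to however your client reports a 429.
			const isRateLimitError = err && err.status === 429;
			if (!isRateLimitError || attempt >= maxRetries) throw err;
			await new Promise(resolve => setTimeout(resolve, waitMs));
		}
	}
}
```

You would then wrap each call, e.g. `await withRetry(() => testForSpam(good))`, instead of relying solely on the fixed delay.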
How did it do? Pretty good I think:
Running good test
2
Running good test
3
Running good test
3
Running good test
1
Running good test
1
Running bad test
6
Running bad test
8
Running bad test
8
Running bad test
7
Running bad test
7
Every single good test was less than 5 and every bad test was over 5. If you would like to try this code yourself, get a key, and then grab the code from here: https://github.com/cfjedimaster/ai-testingzone/tree/main/spam_check
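Since the model now returns only a digit, turning that into a boolean is trivial. A minimal sketch (the `isSpam` helper name and the threshold of 5 are mine, chosen to match the split in the results above):

```javascript
// Parse the model's score-only reply and classify it.
// The threshold of 5 mirrors the good/bad split seen in the test run.
function isSpam(modelResponse, threshold = 5) {
	const score = parseInt(modelResponse.trim(), 10);
	if (Number.isNaN(score)) {
		throw new Error(`Unexpected model response: ${modelResponse}`);
	}
	return score > threshold;
}
```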
Raymond Camden