Using Generative AI to Parse Web Pages into Data
A few months back, I took a look at using JSON-LD to turn a recipe web page into pure data: Scraping Recipes Using Node.js, Pipedream, and JSON-LD. This relied on the recipe page actually including JSON-LD in its head to describe itself, which is pretty common for SEO purposes. Still, I was curious how well generative AI could solve this problem. It could be a good ‘backup’ in cases where a site isn’t using JSON-LD, as well as a general exploration of ‘parsing’ a web page into data. I’ll be using Google Gemini again, but in theory, this demo would work with other services as well. Here’s what I found.
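For context, here is a minimal sketch of the JSON-LD approach that the AI route would back up. It is not the code from the earlier post: `extractRecipe` is a hypothetical helper name, the sample HTML is made up, and the regex-based script matching is a simplification that assumes well-formed markup (a real HTML parser would be more robust).

```javascript
// Hypothetical helper: find a schema.org Recipe in a page's JSON-LD blocks.
function extractRecipe(html) {
  // Simplified matcher for <script type="application/ld+json"> blocks.
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(scriptRe)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip blocks that are not valid JSON
    }
    // A block may hold a single object, an array, or an @graph wrapper.
    const candidates = [].concat(
      Array.isArray(data) ? data : (data && data['@graph']) || data
    );
    for (const item of candidates) {
      if (!item || typeof item !== 'object') continue;
      // @type can be a string or an array of strings.
      const types = [].concat(item['@type'] || []);
      if (types.includes('Recipe')) return item;
    }
  }
  // No JSON-LD recipe found: this is where an AI-based fallback could step in.
  return null;
}

// Made-up sample page for illustration.
const sampleHtml = `
<html><head>
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"Recipe","name":"Pancakes",
 "recipeIngredient":["flour","milk","eggs"]}
</script>
</head><body><h1>Pancakes</h1></body></html>`;

const recipe = extractRecipe(sampleHtml);
console.log(recipe ? recipe.name : 'No JSON-LD recipe found');
```

When this returns `null`, the page has no usable JSON-LD recipe, which is the case where handing the page content to a generative model becomes interesting.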
Converting a Web Page into Structured Data
In order... more →
Posted in: JavaScript