Using Generative AI to Parse Web Pages into Data

A few months back, I looked at using JSON-LD to turn a recipe web page into pure data: Scraping Recipes Using Node.js, Pipedream, and JSON-LD. That approach relied on the recipe page actually including JSON-LD in its header to describe itself, which is pretty common for SEO purposes. Still, I was curious how well generative AI could solve the same problem. In theory, it could be a good ‘backup’ in cases where a site isn’t using JSON-LD, as well as a general exploration of ‘parsing’ a web page into data. I’ll be using Google Gemini again, but this demo should work with other services as well. Here’s what I found.

Converting a Web Page into Structured Data

In order...
Posted in: JavaScript