Related: Gwern’s much more serious and in-depth Benchmarking LLM Diversity & Creativity, and my silly post The Neruda Factory
The models are pretty good at math and coding these days, but I care more about how well they can write and analyze writing. Sadly, the answer tends to be that they’re not that great at it.
Gwern gives a few reasons for this: there are no good benchmarks for creative writing (for reasons of varying validity), and most people have pretty bad taste and don’t mind, or might even prefer, slop.
Also, this seems like a much less profitable area for the labs to focus on, so whatever improvements do happen here are likely to be incidental rather than intentional.
Here are a few prompts that I use to test for creativity. I don’t exactly compare the different outputs between models, but when a new model is released, I run the prompts through it, and I think that gives me a decent sense of whether the model is a C+, a B-, or a B+ English student.
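For concreteness, the workflow is roughly this. Here’s a minimal sketch, assuming the OpenAI Python client; the prompt strings and model name are placeholders, not my actual prompts:

```python
# Minimal sketch of the "eyeball test": run a fixed set of creativity
# prompts through a newly released model and read the outputs yourself.
# PROMPTS and NEW_MODEL are placeholders, not my real prompts.
from openai import OpenAI

client = OpenAI()

PROMPTS = [
    "Write a sonnet about ...",          # placeholder
    "Analyze the opening paragraph of ...",  # placeholder
]

NEW_MODEL = "gpt-4o"  # swap in whatever model just came out

for prompt in PROMPTS:
    response = client.chat.completions.create(
        model=NEW_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    # No automated scoring: the point is to read each completion and
    # form a holistic letter-grade impression (C+, B-, B+, ...).
    print(f"--- {prompt[:50]} ---")
    print(response.choices[0].message.content)
```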
As of December 2024, Sonnet 3.5 and Llama 3.1 405B are leading the pack among the models I have access to. (I don’t have access to o1.)