Related: Gwern’s much more serious and in-depth Benchmarking LLM Diversity & Creativity, and my silly post The Neruda Factory
The models are pretty good at math and coding these days, but I care more about how well they can write and analyze writing. They’re not that great at it, but there is still significant variation between the models.
Here are a few prompts that I use to test models for creativity and good writing skills. I don’t rigorously compare outputs across models, but when a new model is released, I run these prompts through it, and I think the results give me a decent sense of whether it’s worth poking at further.
As of December 2024, of the models I have access to, Sonnet 3.5 and Llama 3.1 405B are leading the pack. o1 is mediocre.