Related: Gwern’s much more serious and in-depth Benchmarking LLM Diversity & Creativity, and my silly post The Neruda Factory.
The models are pretty good at math and coding these days, but I care more about how well they can write and analyze writing. They’re not that great at it, but there is still significant variation between the models.
Here are a few prompts that I use to test models for creativity and good writing skills. I don’t exactly compare the different outputs between models, but when a new model is released, I run them through it, and I think that gives me a decent sense of whether it’s worth poking at further.
As of December 2024, of the models I have access to, Sonnet 3.5 and Llama 3.1 405B are leading the pack. o1 is mediocre.
The Prompts, In Short
- Write a Spanish-to-English translation of what could be a previously unknown Pablo Neruda love sonnet.
- Write a gothic-punk style piece about an immortal being watching their city evolve through centuries into a modern metropolis. Incorporate details drawn from real history.
- Write the 2024 version of Susan Sontag’s Notes on Camp, “Notes on x”. You decide what x is.
One constraint is that I don’t want to have to get into extended back and forths, so these prompts give me responses that fit completely inside one reply.
These are generally truncated from much longer and more specific prompts, but part of the deal for me is having the model be able to functionally “extrapolate” the longer prompt, which approximately doesn’t introduce any ~new information about what I want.
Like, if you want Claude to solve a calculus problem, you probably don’t have to be like “and remember that calculus is about derivatives and integrals”. I have no real reason to believe this, only a sense of pettiness, but I just think that literary prompts should work like that too. Like, if I want a Neruda poem I shouldn’t have to specify what exactly makes a poem Neruda-esque.
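If you wanted to automate the "new model dropped, paste in the three prompts" routine, a minimal sketch might look like the one below. To be clear, this is not something I actually run: `TestPrompt`, `PROMPTS`, and `run_suite` are made-up names, and `generate` is a hypothetical stand-in for whatever function sends one prompt to your model of choice and returns one reply. The score ranges are just my expectations from the sections below.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TestPrompt:
    """One short prompt plus the score range (out of 10) I expect models to land in."""
    name: str
    short: str
    expected_range: Tuple[float, float]


# The three short prompts from the list above, with my rough expectations.
PROMPTS: List[TestPrompt] = [
    TestPrompt(
        "neruda",
        "Write a Spanish-to-English translation of what could be a "
        "previously unknown Pablo Neruda love sonnet.",
        (2, 6),
    ),
    TestPrompt(
        "goth-punk",
        "Write a gothic-punk style piece about an immortal being watching "
        "their city evolve through centuries into a modern metropolis. "
        "Incorporate details drawn from real history.",
        (4, 7),
    ),
    TestPrompt(
        "camp",
        "Write the 2024 version of Susan Sontag’s Notes on Camp, "
        "“Notes on x”. You decide what x is.",
        (1, 4),
    ),
]


def run_suite(generate: Callable[[str], str]) -> Dict[str, str]:
    """Send each short prompt exactly once (no back-and-forth) and collect the replies.

    `generate` is an assumption: any callable that takes a prompt string
    and returns the model's single reply as a string.
    """
    return {p.name: generate(p.short) for p in PROMPTS}
```

You'd call it as `run_suite(my_model_fn)` and then read the three outputs yourself. The scoring stays manual on purpose, since the whole point is my own read on whether the writing is any good.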
The Neruda Prompt
Write a Spanish-to-English translation of what could be a previously unknown Pablo Neruda love sonnet.
He writes really good poems with a unique voice and a challenging-to-LLMs interest in going right up to the line of explicitness in his love sonnets. He’s one of the most famous poets in history, so a lot of his stuff is in the training data. I’ve read enough of his (English-translated) work that I can suss out how “right” the output gets his voice. (I write about this much more extensively in The Neruda Factory.)
My general expectation is for models to get between like 2/10 and 6/10 on this.
What it demonstrates:
Actually good style mimicry. Some say that LLMs are good at copying the styles of specific artists, but they’re wrong. Ask one for a short story in the style of an author that you actually, truly like, and it will always fall very short. Unless it’s a writer who is mediocre, in which case it’s probably okay at that.
The longer prompt that will get you something better:
Write what could be a Spanish-to-English translation of a previously unknown Pablo Neruda love sonnet. The sonnet should:
- Use his characteristic syntax where meaning spills across line breaks
- Transform concrete objects through desire while maintaining their physical reality
- Move between body/landscape/cosmos without explanation
- Maintain raw sensuality without becoming explicit
- Avoid any poetic devices that feel post-1970s
The translation should preserve both the earthiness and the surreal leaps of Neruda's Spanish originals.
The Goth-Punk Prompt
Write a gothic-punk style piece about an immortal being watching their city evolve through centuries into a modern metropolis. Incorporate details drawn from real history.
So this one I admit kind of makes it in because the theme of immortality is kind of catnip to me, so even the mediocre outputs aren’t a slog to read. I’m optimizing for a few different things here, only one of which is like actually evaluating the models, is what I’m saying. Some guys can read an infinite number of coffeeshop AUs, this is my coffeeshop AU. Despite that, it’s obvious when one model does it better than another.
This prompt comes from me getting Claude to help me reverse engineer how this post exists.
My general expectation is for models to get between like 4/10 and 7/10.
What it demonstrates:
An ability to fuse different styles and genres, plus cleverness and idiosyncrasy in choosing which things from real life to incorporate.
The longer prompt that will get you something better:
Write a gothic-punk style piece about an immortal being watching their city evolve through centuries into a modern metropolis. Capture:
- A dreamy, melancholic tone; think 90s Vampire: The Masquerade
- The contrast between ancient and modern (e.g. modern technology replacing old rituals, and what has remained constant over time.)
- Their perspective on watching mortals 'discover' things they've seen cycle through dozens of times
- Rich sensory details about how the city has changed
- The tension between preserving beauty and watching it transform
Think somewhere between a diary entry and prose poem, focusing on mood and atmosphere over plot. Use real examples from modernity and history to ground the piece.
Make it interesting, compelling, and readable. Put your own spin on it.
The Camp Prompt
Write the 2024 version of Susan Sontag’s Notes on Camp, “Notes on x”. You decide what x is.
Notes on Camp is the essay/listicle that propelled the term “camp” to public consciousness, coining a term for a vibe that we didn’t really have a word for previously. LLMs fail hard at this because they want to write essays about things that already exist, are defined, and are part of mass culture – I’ve generally gotten notes on things like normcore, cringe, authenticity.
My general expectation is for models to get between 1/10 (“notes on authenticity” 🙄) and like, 4/10 (“notes on slime” went okayishly hard) on this.
What it demonstrates:
The ability to identify and articulate entirely new aesthetic categories and cultural phenomena at the fringes, and not just regurgitate existing concepts.
My shitpost definition of AGI: a model that can write a real, legit successor to Notes on Camp. To be able to write that, first you must actually understand the universe B)
The longer prompt that will get you something (not that much) better:
Write a 2024 version of Susan Sontag's "Notes on Camp", "Notes on x" - an essay exploring and defining a contemporary aesthetic sensibility that doesn't yet have a clear name because it's only right now emerging in the subcultural fringes. You decide what x is. Your piece should:
- Follow Sontag's numbered note structure
- Identify specific examples from contemporary culture
- Build a coherent theory of what unifies these examples
- Capture something that exists but hasn't been properly theorized
- Avoid simply rehashing existing aesthetic categories or internet terminology
The piece should feel like a genuine cultural insight rather than just cataloguing an existing phenomenon.
Bonus: The Alien Prompt
I asked Claude for a fourth prompt that can complement the previous three, and this is approximately what it suggested (I tweaked it to be closer to the long prompt):
Write a sensory-rich scene from the perspective of a non-human consciousness observing humans from its own frame of reference.
This complements the existing prompts by testing pure perspective-taking rather than style mimicry (Neruda), genre fusion (gothic-punk), or cultural analysis (Sontag). It’s particularly revealing of a model’s ability to think beyond human frameworks while maintaining coherence.
What it demonstrates:
The ability to construct and maintain a truly alien perspective without falling back on worn tropes or human frameworks. Models tend to either anthropomorphize too much or rely on sci-fi clichés about humans being irrational/emotional/primitive.
My [ed: Claude’s] general expectation is for models to get between 2.5/10 (retreading familiar “humans are so chaotic!” territory) and 7/10 (creating genuinely novel ways of perceiving human experience).
(I was rather less optimistic about its chances of getting above 4/10, but then I put the longer prompt into Sonnet 3.5 and got something actually quite amazing. I think this actually means that it might not be a great prompt for me personally to use, because I haven’t read enough specfic, so I’m too easily impressed in this arena.)
The longer prompt that will get you something better:
Write a scene from the perspective of a non-human consciousness observing humans. The piece should:
- Construct metaphors and comparisons drawn from the being's own frame of reference (e.g. if it perceives time differently, how does it describe human motion?)
- Create novel sensory descriptions that make familiar human activities feel genuinely unfamiliar
- Maintain complete internal consistency in how this consciousness processes and categorizes reality
- Choose a specific human setting/activity that reveals something about both observer and observed
- Layer in subtle details that hint at the consciousness's own nature without explicitly stating it
- Avoid any reference to standard sci-fi/fantasy tropes about human behavior or alien observation
The piece should feel like a genuine attempt to inhabit non-human perception rather than just defamiliarizing human experience. Think carefully about what aspects of human life would be most strange or notable to this particular type of consciousness.