Game Characters That Actually Think: How LLMs Are Changing NPC Dialogue

Every player who has spent time in an open-world RPG has experienced the immersion-breaking moment: you ask an NPC something slightly off-script, and they respond with the same pre-authored line they would give to any question in that topic area. The blacksmith who just witnessed a dragon attack will still deliver his forge pricing speech if you click the wrong dialogue option. The guard who knows your name from a previous encounter has forgotten it entirely in a fresh conversation. These are not bugs; they are the inevitable result of scripted dialogue trees, which have defined NPC interaction in video games for decades.

That is changing, and the change is happening faster than most players realize.

What Inworld, Convai, and Ubisoft Are Actually Building

Several companies and studios are now integrating LLMs into game engines to power NPC dialogue. Implementations vary, but the core architecture is similar. Each NPC has a system prompt defining their personality, backstory, knowledge constraints, relationship history, and behavioral goals. Player input is sent to the LLM, which generates a response in character. The response is then filtered for content policy and gameplay consistency before delivery, usually as text passed to a voice synthesis system for spoken dialogue.
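A minimal sketch of that request path follows. Every name in it (`NPCProfile`, `build_system_prompt`, `passes_filters`, `respond`) is hypothetical, the LLM is passed in as a plain callable rather than through any vendor's SDK, and the filter is a toy keyword-and-length check standing in for real content-policy and consistency checks. Voice synthesis is left out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: (system_prompt, player_input) -> reply text
LLM = Callable[[str, str], str]

@dataclass
class NPCProfile:
    name: str
    personality: str
    backstory: str
    known_topics: list[str]
    relationship: str
    goals: list[str]
    fallback_line: str = "I'd rather not talk about that."

def build_system_prompt(npc: NPCProfile) -> str:
    """Assemble the per-NPC system prompt from its profile fields."""
    return "\n".join([
        f"You are {npc.name}. Stay in character at all times.",
        f"Personality: {npc.personality}",
        f"Backstory: {npc.backstory}",
        f"You only know about: {', '.join(npc.known_topics)}.",
        f"Your relationship with the player: {npc.relationship}",
        f"Your goals: {'; '.join(npc.goals)}",
    ])

def passes_filters(reply: str, banned_terms: set[str], max_chars: int = 400) -> bool:
    """Toy stand-in for content-policy and gameplay-consistency filtering."""
    lowered = reply.lower()
    return len(reply) <= max_chars and not any(t in lowered for t in banned_terms)

def respond(npc: NPCProfile, player_input: str, llm: LLM, banned_terms: set[str]) -> str:
    reply = llm(build_system_prompt(npc), player_input)
    # A rejected reply is replaced by a safe, pre-authored line
    return reply if passes_filters(reply, banned_terms) else npc.fallback_line
```

The string returned by `respond` is what would be handed to the voice synthesis step; swapping the toy filter for a real moderation service would not change the shape of the pipeline.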

Inworld AI, which has integrations with Unreal Engine and Unity, has published case studies showing NPCs that maintain conversational coherence across dozens of turns, remember player actions from earlier in a session, and adapt their tone based on the relationship the player has built with them. An NPC who distrusts the player will be guarded; one who has been helped will be warmer. This is not a new mechanic — reputation systems have existed for years — but the expression of that relationship through natural language is qualitatively different from toggling between a "friendly" and "unfriendly" dialogue branch.
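One way to picture that distinction: the reputation number can stay exactly as it was in older systems, and only its expression changes. Instead of selecting a dialogue branch, the game turns the score into a tone directive that is appended to the NPC's prompt. The function name, the assumed -100 to 100 range, and the thresholds below are all hypothetical.

```python
def tone_directive(reputation: int) -> str:
    """Translate a reputation score (assumed range -100..100) into prompt text.

    The thresholds are illustrative; a real game would tune them per character.
    """
    score = max(-100, min(100, reputation))  # clamp out-of-range values
    if score <= -50:
        return "You distrust the player. Be curt and guarded, and share nothing personal."
    if score < 20:
        return "You are neutral toward the player. Be polite but reserved."
    if score < 70:
        return "You like the player. Be friendly and open."
    return "The player has helped you greatly. Be warm and willing to confide in them."
```

Because the model receives an instruction rather than a finished line, the same score can shape replies to questions no writer anticipated.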

Ubisoft's NEO NPC project, demonstrated at GDC 2024 and developed further since, uses LLMs combined with a knowledge graph that represents what each NPC knows about the game world. Characters can answer questions about locations, other characters, and recent events, but only if their character profile gives them access to that information. A tavern keeper knows the town gossip; a forest hermit does not. The knowledge graph keeps NPCs from revealing information their character should not have, a failure mode that unconstrained LLMs fall into reliably.
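The sketch below is not Ubisoft's implementation; it is a simplified illustration of the gating idea under assumed structures. World knowledge is stored as subject-relation-object triples, each tagged with the groups of characters allowed to know it, and only the facts an NPC can access are placed in its prompt. `Fact`, `visible_facts`, `knowledge_block`, and the access tags are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    access: frozenset[str]  # groups of characters allowed to know this fact

WORLD = [
    Fact("mayor", "owes_money_to", "the merchant guild", frozenset({"town_gossip"})),
    Fact("old mill", "is_located", "east of the river", frozenset({"town_gossip", "locals"})),
    Fact("hermit's cave", "is_located", "deep in the north woods", frozenset({"forest_dwellers"})),
]

def visible_facts(npc_access: set[str], world: list[Fact]) -> list[Fact]:
    """Keep only facts whose access tags overlap the NPC's tags."""
    return [f for f in world if f.access & npc_access]

def knowledge_block(npc_access: set[str], world: list[Fact]) -> str:
    """Render the NPC's visible facts as prompt text."""
    lines = [f"- {f.subject} {f.relation.replace('_', ' ')} {f.obj}"
             for f in visible_facts(npc_access, world)]
    if not lines:
        return "You know nothing notable about the wider world."
    return "Facts you know:\n" + "\n".join(lines)
```

Because the filtering happens before the model sees anything, a hermit tagged only `forest_dwellers` has no knowledge of the mayor's debts to leak, however the player phrases the question. A prompt-only instruction such as "do not mention X" offers no such guarantee.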

The Memory Problem

Context windows are the fundamental constraint. A modern LLM context window can hold a substantial conversation history, but not the entirety of a player's relationship with an NPC across dozens of hours of gameplay. Once the history outgrows the window, the oldest turns have to be truncated, and characters begin to forget things they should know.
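The simplest policy, and the one that produces the forgetting described above, is a sliding window: keep the system prompt, then keep the newest turns that still fit the token budget. In this sketch tokens are approximated by whitespace-separated words, an assumption made for brevity; a real system would count with the model's own tokenizer. `fit_history` is a hypothetical name.

```python
def fit_history(system_prompt: str, turns: list[str], budget_tokens: int) -> list[str]:
    """Return the newest conversation turns that fit after the system prompt.

    Token counts are approximated by word counts for illustration only.
    """
    def count(text: str) -> int:
        return len(text.split())

    remaining = budget_tokens - count(system_prompt)
    kept: list[str] = []
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count(turn)
        if cost > remaining:
            break  # this turn and everything older is dropped
        kept.append(turn)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return kept
```

Note what falls off first: the oldest turns, which may hold exactly the events an NPC should remember.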

Several approaches address this. Retrieval-augmented generation (RAG) systems store NPC memories in a vector database and retrieve the ones most relevant to the current conversation. When a player mentions a quest they completed three sessions ago, the RAG system pulls the matching memory and injects it into the prompt. This gives NPCs long-term memory bounded by storage rather than by the context window, though in practice it is only as good as what gets stored and how accurately it is retrieved.
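The retrieve-then-inject loop can be sketched with the standard library alone. Two substitutions keep it self-contained, and both are assumptions: a bag-of-words term-count vector stands in for a learned embedding model, and an in-memory list stands in for the vector database. Ranking by cosine similarity is a common choice; the class and function names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: term counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class MemoryStore:
    """In-memory stand-in for a vector database of one NPC's memories."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, memory: str) -> None:
        self.items.append((memory, embed(memory)))

    def retrieve(self, query: str, k: int = 2, min_score: float = 0.1) -> list[str]:
        """Return up to k memories most similar to the query, above a score floor."""
        q = embed(query)
        scored = sorted(((cosine(q, vec), text) for text, vec in self.items), reverse=True)
        return [text for score, text in scored[:k] if score >= min_score]

def build_prompt(system_prompt: str, store: MemoryStore, player_line: str) -> str:
    """Inject retrieved memories between the system prompt and the player's line."""
    memories = store.retrieve(player_line)
    block = "\n".join(f"- {m}" for m in memories) or "- (no relevant memories)"
    return f"{system_prompt}\n\nRelevant memories:\n{block}\n\nPlayer: {player_line}"
```

The `min_score` floor matters as much as `k`: without it the store always returns something, and an NPC may "remember" an unrelated event simply because it was the least dissimilar one on file.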

Other approaches use structured memory schemas: rather than storing raw conversation text, key events are extracted and stored as structured facts ("Player helped character escape prison on Day 14," "Player has never been rude to character," "Player has not completed the character's quest"). These structured memories are more reliably retrieved and less ambiguous than raw text, at the cost of some nuance.
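A structured memory can be as simple as a table of typed facts keyed by predicate, which is what makes lookups unambiguous: the game asks a yes-or-no question instead of hoping a similarity search surfaces the right sentence. The schema below, including predicate names modeled on the examples above, is a hypothetical illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MemoryFact:
    day: int          # in-game day the fact was last updated
    predicate: str    # e.g. "player_helped_escape_prison"
    value: bool

class StructuredMemory:
    def __init__(self) -> None:
        self.facts: dict[str, MemoryFact] = {}

    def record(self, day: int, predicate: str, value: bool = True) -> None:
        # A newer observation replaces the older one for the same predicate
        self.facts[predicate] = MemoryFact(day, predicate, value)

    def get(self, predicate: str) -> Optional[bool]:
        """True/False if known, None if the NPC has no memory of it."""
        fact = self.facts.get(predicate)
        return fact.value if fact else None

    def to_prompt_lines(self) -> list[str]:
        """Render facts in chronological order for injection into the prompt."""
        ordered = sorted(self.facts.values(), key=lambda f: f.day)
        return [f"Day {f.day}: {f.predicate.replace('_', ' ')}: {'yes' if f.value else 'no'}"
                for f in ordered]
```

The loss of nuance is visible here: a line like "player was rude: no" cannot capture that the player was curt once and then apologized.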

The Voice Problem

Text-based NPC responses are functional but flat. Players in voice-acted games expect spoken dialogue, and generating text in real time is only half the solution. Real-time voice synthesis has improved dramatically: ElevenLabs, PlayHT, and other providers offer low-latency voices that can start returning audio within roughly 200-400 milliseconds of receiving text. The output, however, still lacks the performance nuance of professional voice actors, and generated voices can sound slightly robotic, particularly in emotionally charged moments.

Some studios are exploring hybrid approaches: a library of pre-recorded emotional vocalizations ("surprise," "fear," "joy," "sarcasm") combined with synthesized speech for the content. The emotional coloring comes from the pre-recorded performances; the specific words come from synthesis. Early results suggest this sounds more natural than pure synthesis for highly emotional moments.
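On the engine side, the hybrid approach amounts to a playback plan. The sketch assumes, hypothetically, that the LLM is prompted to prefix each line with an emotion tag such as `[fear]`; the tag selects a pre-recorded clip, and the remaining text goes to the synthesizer. The clip paths, the tag format, and `plan_playback` are invented for illustration.

```python
import re

# Hypothetical library of pre-recorded vocalizations, keyed by emotion tag
VOCALIZATIONS = {
    "surprise": "vo/brenna_gasp_01.wav",
    "fear": "vo/brenna_scream_02.wav",
    "joy": "vo/brenna_laugh_01.wav",
    "sarcasm": "vo/brenna_scoff_01.wav",
}

TAG = re.compile(r"^\s*\[(\w+)\]\s*(.*)$", re.DOTALL)

def plan_playback(llm_line: str) -> list[tuple[str, str]]:
    """Split an emotion-tagged line into (kind, payload) playback steps.

    "clip" steps play a pre-recorded file; "tts" steps send text to the synthesizer.
    """
    steps: list[tuple[str, str]] = []
    match = TAG.match(llm_line)
    if match:
        emotion, text = match.group(1).lower(), match.group(2)
        clip = VOCALIZATIONS.get(emotion)
        if clip:
            steps.append(("clip", clip))
    else:
        text = llm_line
    if text.strip():
        steps.append(("tts", text.strip()))
    return steps
```

An unrecognized tag is dropped rather than spoken aloud, so a model that invents an emotion degrades to plain synthesis.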

What Works and What Does Not

Early experience from shipped and in-development titles points to clear patterns in where LLM NPCs work well and where they fail.

Works well:

Ambient conversation — NPCs discussing lore, town events, their daily lives. Low stakes, high immersion benefit.
Information delivery — NPCs who give directions, explain quest context, or provide world knowledge. LLMs are excellent at synthesizing and presenting information naturally.
Relationship building — NPCs who respond to player tone and history, developing distinct relationships with players who interact with them differently.
Surprise handling — When players do unexpected things, LLM NPCs can respond coherently rather than breaking immersion with a default "I don't understand" response.

Does not work well:

Critical path dialogue — Story beats that must deliver specific information or trigger specific game states. LLMs are probabilistic and can omit key information or deliver it inconsistently.
Combat and real-time interaction — Latency requirements for combat are incompatible with current LLM inference speeds; pre-scripted systems remain necessary.
Fully open-ended characters — Without careful knowledge graph constraints, LLMs will have NPCs reveal information they should not know, break character consistency, or generate responses inconsistent with the game world's internal logic.
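A common mitigation for the critical-path problem, shown here as a hypothetical sketch rather than any shipped game's method, is to let the LLM phrase the beat but verify it in code. If the generated line omits any required piece of information, the authored line is used instead, and the game-state change is applied by game logic regardless of what the model said. The keyword check is deliberately crude, and the function name and fields are illustrative.

```python
def deliver_story_beat(generated: str, required_terms: list[str],
                       scripted_line: str, game_state: dict, next_stage: str) -> str:
    """Return the line to speak for a critical story beat.

    The generated line is used only if it mentions every required term
    (case-insensitive); otherwise the authored fallback is used.
    """
    lowered = generated.lower()
    complete = all(term.lower() in lowered for term in required_terms)
    line = generated if complete else scripted_line
    # The state transition is deterministic and never depends on the model's text
    game_state["quest_stage"] = next_stage
    return line
```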

The Cost Question

LLM inference is not free. A game with 200 named NPCs, each potentially holding thousands of conversations with players, generates significant API costs if it runs on commercial LLM services. Many production teams are therefore exploring smaller, locally run models: 7B-13B parameter models quantized to run on consumer gaming GPUs can reach latency and cost profiles compatible with commercial game deployment, provided they fit alongside the game's own rendering workload. The quality gap versus frontier models is real but narrowing, and for NPCs with well-defined personalities and knowledge constraints, smaller models perform surprisingly well.
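Both sides of that trade-off reduce to back-of-the-envelope arithmetic. Model weights take roughly parameters × bits per weight ÷ 8 bytes, so a 4-bit 7B model needs about 3.5 GB and a 4-bit 13B model about 6.5 GB, before the KV cache and runtime overhead are added. API cost scales with total tokens processed, and because chat APIs are typically stateless, tokens per conversation should include the system prompt and history re-sent on every turn. The traffic and per-token price in the last line are hypothetical placeholders, not quotes from any provider.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Excludes the KV cache, activations, and runtime overhead.
    """
    # (1e9 params * bits / 8 bits per byte) / 1e9 bytes per GB: the 1e9 factors cancel
    return params_billions * bits_per_weight / 8

def api_cost_usd(npcs: int, conversations_per_npc: int,
                 tokens_per_conversation: int, usd_per_million_tokens: float) -> float:
    """Total API cost given average traffic per NPC and a blended token price."""
    total_tokens = npcs * conversations_per_npc * tokens_per_conversation
    return total_tokens / 1_000_000 * usd_per_million_tokens

for size in (7, 13):
    print(f"{size}B at 4-bit: ~{weight_memory_gb(size, 4):.1f} GB of weights")

# Hypothetical traffic and price, for illustration only
print(f"API cost: ${api_cost_usd(200, 5_000, 4_000, 1.0):,.0f}")
```

Every input to `api_cost_usd` scales the bill linearly, while a local model's marginal cost per conversation is mostly the player's own GPU time.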

The games that figure out this cost and quality balance will define the next era of NPC design. Scripted dialogue trees will not disappear — they are still the right tool for critical story moments and resource-constrained titles. But for open-world games where immersion and player agency are the primary value proposition, LLM-powered NPCs represent a step change in what interactive storytelling can feel like. The characters who remember you, respond to your choices, and react naturally to the unexpected are no longer a tech demo curiosity. They are in production pipelines now.