Language Heuristics Part 1

Analysing text outputs for verbosity, language complexity and token performance. One of the simplest yet highly effective methods for checking LLM outputs.

Nov 2025 · 7 min

Key Idea: Analysing text outputs for verbosity, language complexity and token performance.

What is it

This is one of the simplest methods for checking LLM outputs. It is nonetheless highly effective for measuring how the LLM responds to prompt language and context data, analysing word usage, and checking that the output is accessible to the end user.

How does it work

Using Python's built-in string handling and basic natural language processing techniques, the inputs to and outputs from the LLM can be assessed for metrics such as verbosity, readability, and complexity.
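To illustrate the "built-in" end of that spectrum: a basic verbosity check needs nothing beyond standard string methods. A minimal sketch, using an invented sample string:

```python
# Crude verbosity check with only built-in string methods.
text = "LLMs can be verbose. Short prompts help. Clear answers win."

words = text.split()  # whitespace tokenisation, punctuation still attached
sentences = [s for s in text.split(".") if s.strip()]  # naive sentence split

avg_words_per_sentence = len(words) / len(sentences)
print(len(words), len(sentences), avg_words_per_sentence)
```

This deliberately cuts corners (punctuation stays glued to words, abbreviations would break the sentence split), which is exactly why the fuller example below leans on a library for the fiddly parts.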

Code example

In the example below, three LLM outputs are used. The first is a default, generated with no instructions about language or explanation style. The second specifically tells the LLM to use accessible language aimed at younger audiences, and the third tells the LLM to use language aimed at scientific researchers.

import textstat
import string

llm_output_default = '''
The intricate systems that govern our planet are a testament to the complex interplay of biology, chemistry, and physics. Consider the Amazon rainforest, often called the "lungs of the Earth." This vast ecosystem is a crucible of biodiversity, home to millions of species of insects, plants, fish, and mammals, many yet to be discovered. The process of photosynthesis occurs on an unimaginable scale, absorbing immense quantities of carbon dioxide from the atmosphere and releasing the oxygen we depend on. This delicate biological balance is under constant threat from deforestation, which in turn exacerbates global climate change. Understanding this ecosystem requires not just biology, but also advanced remote sensing technology.
'''

llm_more_accessible_output = '''
Our world is full of amazing connections, like a giant puzzle. Think about the Amazon rainforest! People call it the "lungs of the Earth." It's a huge jungle filled with millions of different bugs, plants, fish, and animals. Lots of them haven't even been discovered yet! The plants and trees do something cool called photosynthesis. They breathe in a gas called carbon dioxide (which we breathe out) and breathe out the oxygen we need to live. But this wonderful place is in danger because people are cutting down the trees. This is bad for the forest and also makes the whole planet get warmer. To watch over the forest, scientists use special cameras from space.
'''

llm_less_accessible_output = '''
The convoluted biogeochemical frameworks that modulate our planet serve as a testament to the multifaceted synergistic interactions of biology, chemistry, and physics. Consider the Amazonian rainforest, colloquially designated the "primary terrestrial biogeochemical engine." This expansive biome functions as a nexus of macro-evolutionary diversification, hosting innumerable taxa of insects, flora, fish, and mammals, a significant portion remaining uncatalogued. The process of photosynthetic carbon fixation proceeds at a prodigious magnitude, sequestering substantial volumes of atmospheric carbon dioxide whilst liberating the diatomic oxygen upon which complex life depends. This precarious homeostatic equilibrium exists under perpetual jeopardy from anthropogenic silvicultural clearing, which in turn amplifies global climatological perturbations. Comprehending this biome necessitates not merely biological sciences, but also sophisticated geospatial surveillance methodologies.
'''

def get_cleaned_words(text):
    """Helper function to get a list of words, lowercase and without punctuation."""
    text = text.replace('-', ' ')
    cleaned_text = text.lower().translate(str.maketrans('', '', string.punctuation))
    words = cleaned_text.split()
    return words

def count_words(text):
    """Counts the total number of words in the text."""
    words = get_cleaned_words(text)
    return len(words)

def count_sentences(text):
    """Counts the number of sentences using textstat for robustness."""
    return textstat.sentence_count(text)

def count_characters(text):
    """Counts the total number of characters, including spaces and punctuation."""
    return len(text)

def count_unique_words(text):
    """Counts the number of unique (distinct) words in the text."""
    words = get_cleaned_words(text)
    unique_words = set(words)
    return len(unique_words)

def average_word_length(text):
    """Calculates the average length of words in the text."""
    words = get_cleaned_words(text)
    total_length = sum(len(word) for word in words)
    return total_length / len(words) if words else 0

def calculate_readability_metrics(text):
    """Calculates Flesch-Kincaid Grade Level and Flesch Reading Ease."""
    grade_level = textstat.flesch_kincaid_grade(text)
    reading_ease = textstat.flesch_reading_ease(text)
    return grade_level, reading_ease

# --- Analysis Section ---
grade_level_default, reading_ease_default = calculate_readability_metrics(llm_output_default)
grade_level_more_accessible, reading_ease_more_accessible = calculate_readability_metrics(llm_more_accessible_output)
grade_level_less_accessible, reading_ease_less_accessible = calculate_readability_metrics(llm_less_accessible_output)

print("LLM Output Default:")
print("Word Count:", count_words(llm_output_default))
print("Sentence Count:", count_sentences(llm_output_default))
print("Character Count:", count_characters(llm_output_default))
print("Unique Word Count:", count_unique_words(llm_output_default))
print("Average Word Length:", average_word_length(llm_output_default))
print("Flesch-Kincaid Grade Level:", grade_level_default)
print("Flesch Reading Ease:", reading_ease_default)

print("\nLLM More Accessible Output Analysis:")
print("Word Count:", count_words(llm_more_accessible_output))
print("Sentence Count:", count_sentences(llm_more_accessible_output))
print("Character Count:", count_characters(llm_more_accessible_output))
print("Unique Word Count:", count_unique_words(llm_more_accessible_output))
print("Average Word Length:", average_word_length(llm_more_accessible_output))
print("Flesch-Kincaid Grade Level:", grade_level_more_accessible)
print("Flesch Reading Ease:", reading_ease_more_accessible)

print("\nLLM Less Accessible Output Analysis:")
print("Word Count:", count_words(llm_less_accessible_output))
print("Sentence Count:", count_sentences(llm_less_accessible_output))
print("Character Count:", count_characters(llm_less_accessible_output))
print("Unique Word Count:", count_unique_words(llm_less_accessible_output))
print("Average Word Length:", average_word_length(llm_less_accessible_output))
print("Flesch-Kincaid Grade Level:", grade_level_less_accessible)
print("Flesch Reading Ease:", reading_ease_less_accessible)

The output from the script should look something like this:

LLM Output Default:
Word Count: 109
Sentence Count: 6
Character Count: 734
Unique Word Count: 84
Average Word Length: 5.55
Flesch-Kincaid Grade Level: 14.34
Flesch Reading Ease: 24.63

LLM More Accessible Output Analysis:
Word Count: 115
Sentence Count: 10
Character Count: 667
Unique Word Count: 85
Average Word Length: 4.61
Flesch-Kincaid Grade Level: 5.83
Flesch Reading Ease: 73.78

LLM Less Accessible Output Analysis:
Word Count: 119
Sentence Count: 6
Character Count: 991
Unique Word Count: 99
Average Word Length: 7.16
Flesch-Kincaid Grade Level: 22.38
Flesch Reading Ease: -30.36

What does this mean

As the outputs show, the three texts score very differently for complexity and user friendliness. The default output was generated with no specific instruction to make the text accessible. The more accessible output was generated by asking the LLM to make the text user friendly and suitable for younger audiences, while the least accessible output was deliberately instructed to sound more scientific. Prompt instructions, even when omitted or unintentional, have a real effect on the accessibility of the output.
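One way to act on these numbers is a simple gate that flags outputs pitched at the wrong audience. The sketch below approximates the standard Flesch formulas with stdlib-only code: the crude vowel-group syllable counter stands in for textstat's, so scores will deviate slightly from the values above, and the thresholds are illustrative assumptions, not standards.

```python
import re

def naive_syllables(word):
    # Crude heuristic: count groups of consecutive vowels;
    # every word counts as at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_metrics(text):
    """Approximate Flesch Reading Ease and Flesch-Kincaid Grade Level.

    A naive stand-in for textstat: the sentence split and syllable
    count are rough, so expect small deviations from textstat's numbers.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0, 0.0
    syllables = sum(naive_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # words per sentence
    spw = syllables / len(words)        # syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return round(ease, 2), round(grade, 2)

def readability_gate(text, min_ease=60.0, max_grade=8.0):
    """True if the text is roughly pitched at a general audience.

    The default thresholds are illustrative only; tune per use case.
    """
    ease, grade = flesch_metrics(text)
    return ease >= min_ease and grade <= max_grade
```

A gate like this can run on every LLM response in production, cheaply flagging outputs that drift away from the intended reading level.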

How is this useful

This is useful for monitoring the language used both in the prompts and in the outputs from the LLM. Prompts should be clear and concise to reduce ambiguity and token usage. Outputs should be pitched at the right level of language for the intended user: in this case, a younger audience would not benefit from the same language as a scientific researcher.
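Token usage can be approximated in the same lightweight spirit. Exact counts depend on the model's tokenizer (for example, the tiktoken library for OpenAI models), but a common rule of thumb for English is roughly four characters per token. The helper below is an assumption-laden estimate, not a real tokenizer:

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate using the ~4 characters/token rule of
    thumb for English prose. Real tokenizers will differ."""
    return max(1, round(len(text) / chars_per_token))

# Hypothetical prompts for comparison: padding a prompt with filler
# instructions inflates the estimated token cost.
short_prompt = "Summarise the Amazon rainforest in two sentences."
long_prompt = short_prompt + " Please be extremely thorough and verbose." * 5
print(estimate_tokens(short_prompt), estimate_tokens(long_prompt))
```

Even this crude estimate is enough to trend verbosity over time; swap in the model's actual tokenizer when precise billing or context-window numbers matter.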

Key Points

Pros:

- Simple, fast and cheap to run: no extra model calls are needed and the results are deterministic.
- Easy to automate as a monitoring check on both prompts and outputs.

Cons:

- Surface-level: these metrics say nothing about factual accuracy or meaning.
- Readability formulas such as Flesch-Kincaid are approximations tuned for English prose and can mislead on code, lists or other languages.