Vocabulary Richness

Measure the lexical diversity and vocabulary depth of your writing with type-token ratio (TTR), hapax legomena, lexical density, and a composite richness score.

Vocabulary Score
Basic Standard Rich Very Rich
Metrics
Type-Token Ratio (TTR)
Unique words / total words. Higher = more diverse vocabulary.
Hapax Legomena
Words appearing only once. More hapax = richer vocabulary.
Lexical Density
Content words / total words. Higher = more information-dense text.

Frequently Asked Questions

What is TTR (Type-Token Ratio)?

Type-Token Ratio is the number of unique words (types) divided by the total number of words (tokens) in your text. A TTR of 70% means that a 100-word text contains 70 distinct words. Higher TTR indicates more diverse vocabulary, though it naturally decreases with longer texts as words inevitably repeat, so compare TTR only between texts of similar length.
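As a rough illustration of the ratio described above, here is a minimal Python sketch. The tokenizer (lowercasing, letters plus internal apostrophes) and the function names are assumptions made for this example; the tool's own tokenization rules are not documented here and may differ.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens.

    Assumption: a word is a run of ASCII letters, optionally with
    internal apostrophes (e.g. "don't"). Real tools may tokenize differently.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def type_token_ratio(text: str) -> float:
    """Return unique words (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    # 6 tokens, 5 types ("the" repeats)
    print(f"TTR: {type_token_ratio(sample):.1%}")
```

Because "the" repeats, the six-word sample has five types, which is the kind of repetition that pulls TTR down as texts get longer.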

What are hapax legomena?

Hapax legomena (Greek for "said once"; singular: hapax legomenon) are words that appear exactly once in your text. A higher proportion of hapax words suggests a richer, more varied vocabulary. In typical English prose, about 40-60% of unique words are hapax legomena.
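Counting words that occur exactly once can be sketched in a few lines of Python. This is an illustrative example, not the tool's implementation: the tokenizer and the function name are assumptions, and the share is computed against unique words, matching the 40-60% figure quoted above.

```python
import re
from collections import Counter


def hapax_legomena(text: str) -> tuple[list[str], float]:
    """Return the words occurring exactly once and their share of unique words.

    Assumption: words are runs of ASCII letters with optional internal
    apostrophes, compared case-insensitively.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(tokens)
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    share = len(hapaxes) / len(counts) if counts else 0.0
    return hapaxes, share


if __name__ == "__main__":
    words, share = hapax_legomena("the cat sat on the mat")
    print(words, f"{share:.0%}")
```

A very short sample like this one gives an unusually high share, since few words have had a chance to repeat.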

What is lexical density?

Lexical density is the ratio of content words (nouns, verbs, adjectives, adverbs) to total words. Function words like "the", "is", and "and" count toward the total but not toward the content words. Academic writing typically has higher lexical density (55-65%), while casual speech is lower (40-50%).
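A simple way to approximate this ratio is to treat every word that is not on a function-word list as a content word. The Python sketch below does that. The short FUNCTION_WORDS set is a hypothetical, deliberately incomplete list chosen for this example; an accurate measurement would need part-of-speech tagging or a much fuller list.

```python
import re

# Hypothetical, incomplete list of English function words (illustration only).
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at",
    "to", "for", "with", "by", "from", "is", "are", "was", "were", "be",
    "been", "am", "it", "this", "that", "he", "she", "they", "we", "you",
    "i", "not", "as", "do", "does", "did", "has", "have", "had",
})


def lexical_density(text: str) -> float:
    """Return content words divided by total words.

    Assumption: any token not in FUNCTION_WORDS counts as a content word.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


if __name__ == "__main__":
    sample = "The cat is on the mat and the dog sleeps"
    # content words: cat, mat, dog, sleeps (4 of 10 tokens)
    print(f"Lexical density: {lexical_density(sample):.0%}")
```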

How can I improve vocabulary richness?

Read widely across different genres and subjects. Use a thesaurus to find synonyms for commonly repeated words. Practice writing with word variety in mind. Study domain-specific vocabulary for your topic. Avoid filler words and cliches that add no new meaning.

How is the Vocabulary Score calculated?

The composite vocabulary score (0-100) is a weighted blend of TTR (40%), hapax legomena ratio (30%), and lexical density (30%). The scale accounts for text length, as longer texts naturally have lower TTR. The result is normalized and mapped to four categories: Basic, Standard, Rich, and Very Rich.
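The weighted blend can be illustrated with a short Python sketch. Only the 40/30/30 weights and the four category names come from the description above. The function names, the category cutoffs (25, 50, 75), and the absence of any text-length adjustment are assumptions made for this example, since the exact normalization is not specified.

```python
def vocabulary_score(ttr: float, hapax_ratio: float, lexical_density: float) -> float:
    """Blend three ratios (each between 0 and 1) into a 0-100 score.

    Weights follow the description: TTR 40%, hapax ratio 30%, lexical density 30%.
    Assumption: no text-length adjustment is applied in this sketch.
    """
    for value in (ttr, hapax_ratio, lexical_density):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each ratio must be between 0 and 1")
    return 100 * (0.4 * ttr + 0.3 * hapax_ratio + 0.3 * lexical_density)


def category(score: float) -> str:
    """Map a 0-100 score to a category.

    Assumption: evenly spaced cutoffs at 25, 50, and 75 (hypothetical).
    """
    if score < 25:
        return "Basic"
    if score < 50:
        return "Standard"
    if score < 75:
        return "Rich"
    return "Very Rich"


if __name__ == "__main__":
    score = vocabulary_score(ttr=0.7, hapax_ratio=0.5, lexical_density=0.6)
    print(f"{score:.0f} -> {category(score)}")
```

Keeping the blend and the category mapping in separate functions makes it easy to swap in different cutoffs or add a length correction before the weights are applied.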