Vocabulary Richness Score - Analyze Lexical Diversity

Frequently Asked Questions

What is TTR (Type-Token Ratio)?

Type-Token Ratio (TTR) is the number of unique words (types) divided by the total number of words (tokens) in your text. A TTR of 70% means the text contains 70 distinct words for every 100 words written. A higher TTR indicates a more diverse vocabulary, but TTR naturally falls as a text gets longer because words inevitably repeat, so it is most meaningful when comparing texts of similar length.
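As a rough illustration of the definition above, here is a minimal Python sketch. The `tokenize` helper and its regular expression are assumptions made for this example (lowercasing and keeping only runs of letters and apostrophes); the tool's actual tokenization may differ.

```python
import re


def tokenize(text):
    # Assumption: lowercase the text and treat runs of ASCII letters and
    # apostrophes as words. A real tool may tokenize differently.
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    types = set(tokens)  # unique words
    return len(types) / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
# 8 unique words out of 11 total words
print(round(type_token_ratio(sample), 2))  # prints 0.73
```

In the sample, "the" and "sat" repeat, which is what pulls the ratio below 100%.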

What are hapax legomena?

Hapax legomena (from the Greek for "said once") are words that appear exactly once in your text. A higher proportion of hapax legomena suggests a richer, more varied vocabulary. In typical English prose, about 40-60% of unique words are hapax legomena.
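The ratio described above (words used exactly once, divided by the number of unique words) could be computed along these lines. This is only a sketch: the function name `hapax_ratio` and the simple regex tokenization are assumptions for illustration, not the tool's own code.

```python
import re
from collections import Counter


def hapax_ratio(text):
    # Assumption: simple lowercase tokenization on letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)  # word -> number of occurrences
    if not counts:
        return 0.0
    hapaxes = [word for word, n in counts.items() if n == 1]
    # Share of unique words that occur exactly once
    return len(hapaxes) / len(counts)


sample = "the cat sat on the mat and the dog sat too"
# 6 of the 8 unique words occur only once ("the" and "sat" repeat)
print(hapax_ratio(sample))  # prints 0.75
```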

What is lexical density?

Lexical density is the proportion of content words (nouns, verbs, adjectives, adverbs) among all words in your text. Function words such as "the", "is", and "and" count toward the total but not toward the content words. Academic writing typically has a higher lexical density (55-65%), while casual speech is lower (40-50%).
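A rough way to approximate this in Python is to treat every word that is not in a list of function words as a content word. The short `FUNCTION_WORDS` set below is a hypothetical stand-in chosen for this example; an accurate measurement would need a much fuller list or a part-of-speech tagger, and the tool may well use one.

```python
import re

# Hypothetical, deliberately tiny list of function words for illustration.
FUNCTION_WORDS = {
    "the", "a", "an", "is", "are", "was", "were", "be", "and", "or", "but",
    "of", "in", "on", "at", "to", "for", "with", "it", "this", "that",
}


def lexical_density(text):
    # Assumption: simple lowercase tokenization on letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


sample = "The report is on the desk and it is long"
# Content words: report, desk, long (3 of 10 words)
print(lexical_density(sample))  # prints 0.3
```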

How can I improve vocabulary richness?

Read widely across different genres and subjects. Use a thesaurus to find synonyms for commonly repeated words. Practice writing with word variety in mind. Study domain-specific vocabulary for your topic. Avoid filler words and cliches that add no new meaning.

How is the Vocabulary Score calculated?

The composite vocabulary score (0-100) is a weighted blend of TTR (40%), hapax legomena ratio (30%), and lexical density (30%). The score is adjusted for text length, because longer texts naturally have a lower TTR. The result is then normalized and mapped to one of four categories: Basic, Standard, Rich, or Very Rich.
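The weighting can be sketched as follows. Only the 40/30/30 weights and the four category names come from the description above. The text-length adjustment is not specified, so it is left out here, and the category cutoffs (25, 50, 75) are hypothetical values picked for illustration.

```python
def vocabulary_score(ttr, hapax_ratio, lexical_density):
    # Inputs are ratios between 0 and 1.
    # Weights from the description: TTR 40%, hapax 30%, lexical density 30%.
    # Assumption: no text-length adjustment is applied in this sketch.
    score = 100 * (0.4 * ttr + 0.3 * hapax_ratio + 0.3 * lexical_density)
    return max(0.0, min(100.0, score))  # clamp to the 0-100 scale


def category(score):
    # Hypothetical cutoffs; the real thresholds are not documented.
    if score < 25:
        return "Basic"
    if score < 50:
        return "Standard"
    if score < 75:
        return "Rich"
    return "Very Rich"


score = vocabulary_score(ttr=0.7, hapax_ratio=0.5, lexical_density=0.5)
print(round(score), category(score))  # prints 58 Rich
```

With these assumed cutoffs, a text with a 70% TTR, 50% hapax ratio, and 50% lexical density lands in the Rich band.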
