pommicket's Wiktionary-based word lists
These are various lists of words extracted from Wiktionary data dumps. Some of the code
used to produce them is available here.
You can do whatever you like with them, subject to
Wiktionary's licensing, where applicable.
-
The Big List: word-list.txt.xz (27MB compressed, 120MB uncompressed, 9,878,558 entries).¹
Every English Wikipedia article title and English Wiktionary entry that contains only ASCII a-z, A-Z, and spaces, and is at most 2 words long.
Words labelled offensive on Wiktionary were filtered out (overly aggressively—some totally inoffensive words were removed in the process).
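As a rough illustration of the character and length criteria above, here is a minimal Python sketch of such a filter. It is not the code used to build the list. The function name looks_like_entry is hypothetical, and the sketch assumes that "max 2 words" means one or two runs of ASCII letters separated by a single space. The offensive-word filtering is not covered.

```python
import re

# Assumption: an entry is one or two runs of ASCII letters,
# separated by exactly one space (no leading/trailing spaces).
_ENTRY_RE = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)?")


def looks_like_entry(title: str) -> bool:
    """Return True if title fits the assumed a-z/A-Z/space, max-2-words rule."""
    return _ENTRY_RE.fullmatch(title) is not None
```

Under these assumptions, "ice cream" passes, while "New York City" (three words), "e-mail" (hyphen), and "café" (non-ASCII letter) do not.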
-
English definitions: en-definitions.txt.xz (22MB compressed, 115MB uncompressed, 1,629,682 entries).¹
Every English definition in English Wiktionary. Each line has the format WORD  DEFINITION (note: the delimiter is 2 spaces).
Words can have multiple definitions; they are listed as separate lines.
DEFINITION is in the wikitext format.
It’s possible that there are parsing errors, but I haven’t spotted any yet.
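For anyone who wants to read the definitions file programmatically, the following is a minimal Python sketch of one way to do it. The helper names parse_line and load_definitions are hypothetical. The sketch assumes the file is UTF-8 encoded and that the first run of two spaces on a line is the delimiter. Lines without the delimiter are skipped rather than treated as errors.

```python
import lzma
from collections import defaultdict


def parse_line(line: str):
    """Split one 'WORD  DEFINITION' line at the first two-space delimiter.

    Returns (word, definition), or None if the delimiter is missing.
    """
    word, sep, definition = line.rstrip("\n").partition("  ")
    if not sep:
        return None
    return word, definition


def load_definitions(path: str) -> dict:
    """Map each word to the list of its definitions (raw wikitext).

    Assumes UTF-8 text inside the .xz archive.
    """
    definitions = defaultdict(list)
    with lzma.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            parsed = parse_line(line)
            if parsed is not None:
                word, definition = parsed
                definitions[word].append(definition)
    return dict(definitions)
```

Calling load_definitions("en-definitions.txt.xz") reads the compressed file directly, so there is no need to unpack the 115MB text first. Since words can have several definitions on separate lines, each word maps to a list.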
¹ Derived from enwiktionary-20250701 dump.