From 52f08356aedd7ff2bc5c3fdb9effac98d98b0c63 Mon Sep 17 00:00:00 2001
From: pommicket
Date: Thu, 25 Sep 2025 14:59:07 -0400
Subject: Add translingual defintiiosn

---
 index.html | 41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

(limited to 'index.html')

diff --git a/index.html b/index.html
index 3bd880c..116827f 100644
--- a/index.html
+++ b/index.html
@@ -16,6 +16,7 @@

These are various lists of words extracted from Wiktionary data dumps. Some of the code used to produce them is available here.
+ Of course, all these lists undoubtedly contain errors because Wiktionary contains errors.
You can do whatever you like with them, subject to Wiktionary's licensing, where applicable.

@@ -26,10 +27,44 @@ Words labelled offensive on Wiktionary were filtered out (overly aggressively—some totally inoffensive words were removed in the process).
  •
-   English definitions: en-definitions.txt.xz (22MB compressed, 115MB uncompressed, 1,629,682 entries)
-   Every English definition in English wiktionary. Format is WORD DEFINITION
-   on each line (note: delimiter is 2 spaces).
+   English definitions:
+   en-definitions.txt.xz (23MB compressed, 127MB uncompressed, 1,629,482 entries)
+   and
+   Translingual definitions:
+   trans-definitions.txt.xz (MB compressed, MB uncompressed, entries)
+   Every English/Translingual definition in English wiktionary.
+   Format is WORD PART_OF_SPEECH DEFINITION
+   on each line (note the two spaces between word and part of speech).
    Words can have multiple definitions; they are listed as separate lines.
+   PART_OF_SPEECH is one of the following:
+   DEFINITION is in the wikitext format.
    It’s possible that there are parsing errors, but I haven’t spotted any yet.
  •
--
cgit v1.2.3
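
For readers who want to consume the files this patch describes, here is a minimal Python sketch of reading the new WORD PART_OF_SPEECH DEFINITION format. It is not taken from the repository. It assumes the fields are separated by two spaces throughout (the page only states this for the gap between word and part of speech; the separator before the definition is an assumption), that the files are UTF-8, and that malformed lines can simply be skipped. The names `Entry`, `parse_line`, and `read_entries` are hypothetical, as is any part-of-speech value you might test with.

```python
import lzma
from typing import Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    word: str
    part_of_speech: str
    definition: str  # left as raw wikitext, not rendered


def parse_line(line: str) -> Optional[Entry]:
    """Split one line into (word, part of speech, definition).

    Assumption: both separators are exactly two spaces. maxsplit=2 keeps
    any later double spaces inside the definition intact.
    """
    parts = line.rstrip("\n").split("  ", 2)
    if len(parts) != 3:
        return None  # not in the expected three-field shape
    return Entry(*parts)


def read_entries(path: str) -> Iterator[Entry]:
    """Stream entries from an .xz file without decompressing it to disk."""
    with lzma.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            entry = parse_line(line)
            if entry is not None:
                yield entry
```

Because a word can have several definitions on separate lines, a caller that wants one record per word would group the yielded entries, for example by appending each `Entry` to a `collections.defaultdict(list)` keyed on `entry.word`.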