Diffstat (limited to 'index.html')
-rw-r--r-- index.html 39
1 files changed, 39 insertions, 0 deletions
diff --git a/index.html b/index.html
new file mode 100644
index 0000000..3bd880c
--- /dev/null
+++ b/index.html
@@ -0,0 +1,39 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="utf-8">
+ <title>Wiktionary word lists</title>
+ <meta content="width=device-width,initial-scale=1" name="viewport">
+ <meta property="og:title" content="Wiktionary word lists">
+ <meta property="og:type" content="article">
+ <meta property="og:url" content="https://s.pommicket.com/wiktionary/index.html">
+ <meta property="og:locale" content="en_US">
+ <meta property="og:site_name" content="pommicket.com">
+ <meta property="article:author" content="pommicket">
+</head>
+<body>
+ <h2>pommicket's Wiktionary-based word lists</h2>
+ <p>
+ These are various lists of words extracted from Wiktionary data dumps. Some of the code
+ used to produce them is available <a href="https://github.com/pommicket/wiktionary" target="_blank">here</a>.<br>
+ You can do whatever you like with them, subject to
+ <a href="https://en.wiktionary.org/wiki/Wiktionary:Copyrights" target="_blank">Wiktionary's licensing</a>, where applicable.
+ </p>
+ <ul>
+ <li>
+ The Big List: <a href="/tmt/word-list.txt.xz">word-list.txt.xz (27MB compressed, 120MB uncompressed, 9,878,558 entries)</a>.¹<br>
+ Every English Wikipedia article title &amp; English Wiktionary entry that contains only ASCII a-z/A-Z/space and is at most 2 words long.<br>
+ Words labelled <i>offensive</i> on Wiktionary were filtered out (overly aggressively—some totally inoffensive words were removed in the process).
+ </li>
+ <li>
+ English definitions: <a href="/wiktionary/en-definitions.txt.xz">en-definitions.txt.xz (22MB compressed, 115MB uncompressed, 1,629,682 entries)</a>.¹<br>
+ Every English definition in English Wiktionary. Format is <code style="white-space: pre;">WORD  DEFINITION</code>
+ on each line (note: delimiter is <b>2</b> spaces).<br>
+ Words can have multiple definitions; they are listed as separate lines.<br>
+ <code>DEFINITION</code> is in the wikitext format.<br>
+ It’s possible that there are parsing errors, but I haven’t spotted any yet.
+ </li>
+ </ul>
+ <p>¹ Derived from <a href="https://dumps.wikimedia.org/enwiktionary/20250701/" target="_blank">enwiktionary-20250701</a> dump.</p>
+</body>
+</html>
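
The line format the page describes for en-definitions.txt.xz (one `WORD`, two spaces, then a wikitext `DEFINITION`, with repeated words on separate lines) could be read roughly as below. This is a minimal sketch, not code from the linked repository: the function names `parse_definitions` and `load_definitions` are hypothetical, the file is assumed to be UTF-8, and the word is assumed to end at the first run of two spaces on the line.

```python
import lzma
from collections import defaultdict


def parse_definitions(lines):
    """Group definitions by word from lines shaped like 'WORD  DEFINITION'."""
    entries = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: the word ends at the first double space. Multi-word
        # entries such as "hot dog" contain only single spaces.
        word, sep, definition = line.partition("  ")
        if not sep:
            continue  # no two-space delimiter found; skip the line
        # A word may appear on several lines, one per definition.
        entries[word].append(definition)
    return dict(entries)


def load_definitions(path):
    """Read the .xz file directly; UTF-8 encoding is an assumption."""
    with lzma.open(path, "rt", encoding="utf-8") as f:
        return parse_definitions(f)
```

Calling `load_definitions("en-definitions.txt.xz")` on a downloaded copy would return a dict mapping each word to its list of wikitext definitions. The definitions are left as raw wikitext; stripping templates and links would need a separate step.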