CSpell: prune spelling dictionary using AHK script

Many of these accepted spellings were language names,
which I've since used `// spell-checker:disable` on.
Some of those were also split incorrectly, such as "zyk" from "Język",
due to Unicode handling bugs in CSpell that have since been fixed.
Some of them are simply no longer present in the codebase,
and some of them are now included in the default dictionaries,
such as "allowfullscreen".
It also looks like CSpell now ignores spelling within filenames,
since "mobipaint" doesn't need to be in the word list.

To prune the accepted spellings, I cleared the word list,
then ran the "run-command-on-all-files.ahk" script I wrote for this,
then used set intersection of the list before and after to avoid
adding new spellings to the list.
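
The intersection step can be sketched roughly like this in Python.
This is only an illustration, not the tooling actually used: the
function name and the sample word lists are hypothetical, and it
assumes both lists have already been read out of cspell.json.

```python
def prune_words(before, after):
    """Keep only words that were accepted before AND were re-added
    by a fresh run over the codebase (set intersection).

    Order follows the `before` list. The comparison is case-sensitive,
    so a word re-added with different casing is treated as new.
    """
    re_added = set(after)
    return [word for word in before if word in re_added]


# Hypothetical sample data, for illustration only.
before = ["allowfullscreen", "colorbox", "mobipaint", "zyk"]
after = ["colorbox", "dectree"]

print(prune_words(before, after))  # ['colorbox']
```

Words only in `before` are pruned, and words only in `after`
are not added, since they were never explicitly accepted.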

However, some of the spellings that were added/removed were actually
just changes in letter case, such as "woah", which I manually lowercased
when adding it to the dictionary. This complicated things.
So I ended up generating a patch of the additions, and converting it to
a regexp that would find any of the added words:

    Airbrushbrush|APNG|atombgwht|bigfix|blackwhite|bresenham|browserconfig|colorbar|Colorstr|dectree|dont|Fieldsets|Fullscreen|greyscale|Hilight|ICNS|ifds|Iframes|iife|junkbot|Multiuser|Nevermind|Numpad|onestep|proch|proclabel|relh|reltopics|Repurposable|rgbas|Spirobrush|spritesheet|subh|Subwindows|Tesselator|topich|Tracedata|Tracky|typestyles|Uniquify|VAPORWAVE|Verts|vsns|Woah|XFCONF|العربي

and matched it against the cspell.json file before the removals,
case-insensitively, and noted the original letter casings, and then
edited the patch to use the original letter casings, where applicable:

    sed -e "s/APNG/apng/ ; s/bresenham/Bresenham/ ; s/Fieldsets/fieldsets/ ; s/Fullscreen/fullscreen/ ; s/Hilight/hilight/ ; s/ICNS/icns/ ; s/ifds/IFDs/ ; s/Iframes/iframes/ ; s/iife/IIFE/ ; s/junkbot/Junkbot/ ; s/Multiuser/multiuser/ ; s/Nevermind/nevermind/ ; s/Numpad/numpad/ ; s/Repurposable/repurposable/ ; s/rgbas/RGBAs/ ; s/Subwindows/subwindows/ ; s/Tesselator/tesselator/ ; s/Tracedata/tracedata/ ; s/Tracky/tracky/ ; s/Uniquify/uniquify/ ; s/VAPORWAVE/vaporwave/ ; s/Verts/verts/ ; s/vsns/VSNs/ ; s/Woah/woah/ ; s/XFCONF/xfconf/" -i additions.patch

and then applied the patch,
and then squashed everything, and did the set intersection again to
get the removals, followed by a checkout to get the additions.
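
The case-matching step above could be sketched like this in Python.
It is a hedged illustration rather than what I actually ran: the
function name is hypothetical, and it assumes the added words and the
original word list are available as plain lists.

```python
import re


def restore_casing(added_words, original_words):
    """Return {added_word: original_casing} for every added word that
    exists in the original list with a different letter case."""
    # One alternation regexp over all added words, matched case-insensitively.
    pattern = re.compile(
        "|".join(re.escape(word) for word in added_words),
        re.IGNORECASE,
    )
    original_by_folded = {}
    for word in original_words:
        # fullmatch avoids partial hits such as "APNG" inside "APNGs".
        if pattern.fullmatch(word):
            original_by_folded.setdefault(word.casefold(), word)
    renames = {}
    for word in added_words:
        original = original_by_folded.get(word.casefold())
        if original is not None and original != word:
            renames[word] = original
    return renames


added = ["APNG", "bresenham", "dectree", "Woah"]
original = ["apng", "APNGs", "Bresenham", "Bresenham's", "woah"]

print(restore_casing(added, original))
# {'APNG': 'apng', 'bresenham': 'Bresenham', 'Woah': 'woah'}
```

Each pair in the result corresponds to one `s/new/original/`
substitution in the sed command.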

In retrospect, I should've just hacked a pruning feature into cspell-cli,
in which case I might've even been able to send a pull request.
main
Isaiah Odhner 2024-01-29 01:48:16 -05:00
parent ebe236ae19
commit 12b49631ac
1 changed file with 0 additions and 318 deletions

@@ -15,58 +15,20 @@
"*.wav",
"localization"
],
// @TODO: is there a way to prune this spelling list? I didn't know about the comment directives for a while,
// and accepted a lot of strings for the language names, in languages that I don't understand,
// and even with partial strings when it was buggy with Unicode.
"words": [
"Adlm",
"Æвзаг",
"Afaan",
"Afaraf",
"airb",
"ajeļ",
"Allaire",
"allowfullscreen",
"anypalette",
"apng",
"APNGs",
"appinstalled",
"Aragonés",
"Asụsụ",
"autosave",
"autosaves",
"autoupdating",
"Avañe'ẽ",
"Avaric",
"Ayisyen",
"Aymar",
"Azərbaycan",
"Bahasa",
"Bamanankan",
"Basa",
"beforeinstallprompt",
"bepis",
"Bgau",
"bgcolor",
"Bislama",
"Bizaad",
"BMPs",
"Bokmål",
"Bopo",
"Bosanski",
"Brasileiro",
"Bresenham",
"Bresenham's",
"Brezhoneg",
"Català",
"Český",
"Čeština",
"Chamoru",
"Chewa",
"Chichewa",
"Chinyanja",
"Chuang",
"classid",
"clipart",
"Clippy",
"clsid",
@@ -74,348 +36,160 @@
"colorbox",
"contenteditable",
"Corel",
"Corsa",
"Corsu",
"Cpath",
"Crect",
"Csvg",
"ctype",
"Cue",
"Cuengh",
"Cueŋƅ",
"Cymraeg",
"Cyrl",
"d'Òc",
"Dansk",
"Darude",
"datetime",
"Davvis",
"Davvisámegiella",
"desaturated",
"Deutsch",
"Dhivehi",
"DIALOGEX",
"Dili",
"Divehi",
"divs",
"djvu",
"documentedly",
"Dorerin",
"draggable",
"Dzongkha",
"Eesti",
"egbe",
"ellipticals",
"endonym",
"eqeqeq",
"equivalize",
"ertical",
"esque",
"Euskara",
"Euskera",
"Eʋegbe",
"Excelsi",
"eyedrop",
"Fa'a",
"Faka",
"farbling",
"Faroese",
"fieldset",
"fieldsets",
"firebaseapp",
"firebaseio",
"Fiteny",
"fliph",
"flippable",
"flipv",
"floodfill",
"floodfilling",
"focusables",
"focusring",
"fontbox",
"Føroyskt",
"Fran",
"Français",
"Française",
"frowny",
"Frysk",
"Fsvg",
"fudgedness",
"Fulah",
"Fulfulde",
"fullscreen",
"Fwww",
"Gaeilge",
"Gaelg",
"Gagana",
"Gàidhlig",
"Gailck",
"Galego",
"gazemouse",
"ghostwhite",
"GIFs",
"Gikuyu",
"Glag",
"gons",
"grayscale",
"gridlines",
"Grischun",
"hackily",
"hacky",
"haha",
"Hanb",
"Hant",
"hcanvas",
"hctx",
"headmouse",
"hilight",
"Hiri",
"HKEY",
"homescreen",
"hostnames",
"Hrvatski",
"hslrgb",
"hsrgb",
"humbnail",
"Icci",
"icns",
"iconify",
"idhlig",
"IFDs",
"iframe",
"iframe's",
"iframes",
"IIFE",
"Ikinyarwanda",
"Ikirundi",
"Imgur",
"Inkscape",
"Interlingue",
"Iñupiatun",
"IPFS",
"isaiahodhner",
"isded",
"Íslenska",
"Italiano",
"Jasc",
"Jawa",
"Jazyk",
"Jezik",
"Język",
"jfif",
"jnordberg",
"JSGF",
"jspaint",
"jsperf",
"Junkbot",
"Kajin",
"Kalaallisut",
"Kalaallit",
"Kalba",
"Kanuri",
"Kernewek",
"keyframes",
"keyshortcuts",
"Khoj",
"Kichwa",
"Kieli",
"Kikongo",
"Kiluba",
"Kinyarwanda",
"Kiswahili",
"Kolour",
"Konami",
"Kreyòl",
"Krita",
"Kuanyama",
"Kurdî",
"Kwanyama",
"Latine",
"Latn",
"Latvie",
"Latviešu",
"Lenga",
"lerp",
"Lëtzebuergesch",
"Letzeburgesch",
"libgconf",
"libtess",
"Lietuvių",
"Limburgan",
"Limburgish",
"Limburgs",
"Linb",
"Lingála",
"liveweb",
"llpaper",
"localdomain",
"localforage",
"localizable",
"lookpath",
"lors",
"Lospec",
"lrgb",
"lsid",
"ltres",
"Luba",
"Luciferi",
"Luxembourgish",
"Macromedia",
"Malti",
"mediump",
"megiella",
"Melayu",
"mimg",
"minifig",
"mobipaint",
"monospace",
"Mopaint",
"Motu",
"mouseleave",
"Mousewheel",
"msapplication",
"MSIE",
"mspaint",
"multitools",
"multitouch",
"multiuser",
"murl",
"Naoero",
"Nbat",
"Ndebele",
"Ndonga",
"Nederlands",
"nesw",
"nevermind",
"Nkoo",
"nomine",
"Norsk",
"nostri",
"nowrap",
"numberofcolors",
"numpad",
"Nuosu",
"Nuosuhxop",
"nwse",
"Nyanja",
"Occitan",
"occluder",
"octree",
"Odhner",
"Oʻzbek",
"oleobject",
"onwriteend",
"Optikey",
"Oqaasii",
"orizontal",
"Oromo",
"Oromoo",
"Ossetian",
"Otjiherero",
"ovaloid",
"ovaloids",
"oviforms",
"Owambo",
"paintbucket",
"pako",
"palettized",
"paypal",
"PDFs",
"peggys",
"Phlp",
"Photoshop",
"pixeling",
"PLTE",
"PNGs",
"pointerenter",
"pointerleave",
"pointermove",
"pointerup",
"Polski",
"Polszczyzna",
"Português",
"postimg",
"proxied",
"pseudorandomly",
"psppalette",
"Pulaar",
"Pular",
"Pushto",
"qtres",
"rbaycan",
"redoable",
"reenable",
"Rege",
"reimplement",
"repurposable",
"rerender",
"rerendered",
"resizer",
"retarget",
"retargeted",
"rgba",
"RGBAs",
"rightclick",
"rk",
"Română",
"rotologo",
"roundrect",
"roundrects",
"royskt",
"rrect",
"RTLCSS",
"Rumantsch",
"Runa",
"Rundi",
"Sami",
"sandboxed",
"sandboxing",
"Sango",
"Sardu",
"Satana",
"Satanas",
"Scribus",
"scrollable",
"scrollbars",
"Sesotho",
"Setswana",
"shader's",
"Shft",
"Shona",
"Shqip",
"Simi",
"Sinhala",
"Skencil",
"sketchpalette",
"skeuomorphic",
"slenska",
"Slovenčina",
"Slovenščina",
"Slovenski",
"Slovenský",
"Soomaali",
"Soomaaliga",
"sorthweast",
"Sotho",
"soundcloud",
"spacebar",
"spraycan",
"spraypaint",
"spraypainting",
"STRINGTABLE",
"styl",
"stylable",
"submenu",
"submenus",
"subrepo",
"subwindows",
"Suomen",
"Svenska",
"Swati",
"tabbable",
"tabindex",
"Tahoma",
"tbody",
"tesselator",
"tessy",
"textareas",
@@ -424,21 +198,13 @@
"themeable",
"themepack",
"throwie",
"Tiếng",
"tileable",
"timespan",
"tina",
"titlebar",
"Toçikī",
"togglable",
"tracedata",
"tracky",
"Tshivenḓa",
"Tsonga",
"Türkçe",
"Tvcy",
"tzebuergesch",
"ufeff",
"undecagons",
"undoable",
"undoables",
@@ -448,111 +214,27 @@
"unfocusing",
"uniquify",
"unmaximize",
"unminimize",
"unpremultiplied",
"untrusted",
"upiatun",
"UPNG",
"ustom",
"UTIF",
"Uyghur",
"Uyghurche",
"Vakaviti",
"Valencian",
"Valoda",
"vaporwave",
"Venda",
"Verdana",
"verts",
"Viacam",
"Việt",
"viewports",
"Vlaams",
"Volapük",
"Vosa",
"VSNs",
"Walon",
"Wayback",
"Webamp",
"webglcontextlost",
"webglcontextrestored",
"webp",
"Wikang",
"Winamp",
"",
"woah",
"Wollof",
"xfce",
"xfconf",
"Xitsonga",
"xtras",
"Yângâ",
"youtube",
"zbek",
"Zhōngwén",
"Zhuang",
"zoomable",
"zoomer",
"zyk",
"Ελληνικά",
"Авар",
"Аҧсуа",
"Аҧсшәа",
"Башҡорт",
"Беларуская",
"Български",
"Бызшәа",
"Език",
"Ирон",
"Јазик",
"Језик",
"Коми",
"Кыргыз",
"Кыргызча",
"Қазақ",
"Македонски",
"Мова",
"Монгол",
"Мотт",
"Нохчийн",
"Русский",
"Словѣньскъ",
"Српски",
"Татар",
"Теле",
"Тили",
"Тілі",
"Тоҷикӣ",
"Түркмен",
"Ўзбек",
"Українська",
"Чӑваш",
"Чӗлхи",
"Ѩзыкъ",
"Ӏарул",
"ქართული",
"Հայերեն",
"עברית",
"أۇزبېك",
"ئۇيغۇرچە",
"اردو",
"العربية",
"بهاس",
"پښتو",
"پنجابی",
"تاجیکی",
"سندھی",
"سنڌي",
"فارسی",
"كشميري",
"کوردی",
"ملايو",
"ትግርኛ",
"አማርኛ",
"ພາສາລາວ",
"ꦧꦱꦗꦮ",
"ᐃᓄᒃᑎᑐᑦ",
"ᐊᓂᔑᓈᐯᒧᐎᓐ",
"ᓀᐦᐃᔭᐍᐏᐣ"
]
}