From patchwork Wed Jun 28 11:06:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Mike FABIAN
X-Patchwork-Id: 71778
To: libc-alpha@sourceware.org
Cc: Jens Petersen
Subject: [PATCH] Change collation rules in localedata/locales/th_TH to use copy "iso14651_t1" and agree as much as possible with CLDR
Organization: Red Hat
User-Agent: Gnus/5.13 (Gnus v5.13)
List-Id: Libc-alpha mailing list
From: Mike FABIAN
Reply-To: Mike FABIAN

Several of the locales still do not use copy "iso14651_t1" but create their collation rules from scratch. It is better to first use iso14651_t1 and then apply only the changes necessary. This patch does this for the th_TH locale (Thai).

I made it agree as closely as possible with the rules from CLDR
(see: https://github.com/unicode-org/cldr/blob/main/common/collation/th.xml).

It seems to be impossible to follow these two CLDR rules exactly, though:

    &[before 1]๚<ฯ # should be "variable"
    &๛<ๆ # should be "variable"

These ask for a primary difference between punctuation characters whose primary weight should be "IGNORE". Using a secondary difference instead still sorts the test data correctly, and the previously used collation in th_TH used tertiary differences for these characters.

There was old test data in localedata/th_TH.in, in TIS-620 encoding, which was not used (it was not listed in localedata/Makefile). I converted it to UTF-8, moved it to localedata/th_TH.UTF-8.in, and added it to localedata/Makefile. The existing collation rules in the th_TH locale did not sort that test file completely correctly; I think my new collation rules based on iso14651_t1 are better.

From 12e751450e849ee7d4c8a969a1f7708da589c5ef Mon Sep 17 00:00:00 2001
From: Mike FABIAN
Date: Thu, 1 Jun 2023 17:02:44 +0200
Subject: [PATCH] Adapt collation in th_TH locale to use the iso14651_t1_common file and sync the collation with CLDR

Use the old, previously unused localedata/th_TH.in file: convert it to UTF-8, add it to the Makefile, and modify it a bit. The existing th_TH collation did not sort that file completely correctly.

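As an illustration of why a secondary difference can be enough for characters whose primary weight is IGNORE, here is a toy sketch in Python. It is not glibc code: the table TOY_WEIGHTS, the numeric weights in it, and the helper sort_key are invented for this example and do not reflect the real values in iso14651_t1_common. It only shows the general idea of multi-level comparison, where all primary weights are compared first and later levels act as tie-breakers.

```python
# Toy model of multi-level collation. All weights are hypothetical and
# chosen only for illustration; they are not glibc's real weights.
IGNORE = None

# Each character maps to (primary, secondary, tertiary) weights.
TOY_WEIGHTS = {
    "\u0e2f": (IGNORE, 1, IGNORE),  # PAIYANNOI: ignored at the primary level
    "\u0e5a": (IGNORE, 2, IGNORE),  # ANGKHANKHU: ignored at the primary level
    "a": (10, 0, 0),
    "b": (11, 0, 0),
}


def sort_key(text):
    """Collect the non-ignored weights level by level.

    The resulting tuple compares every primary weight before any
    secondary weight, and every secondary before any tertiary.
    """
    levels = ([], [], [])
    for ch in text:
        for level, weight in zip(levels, TOY_WEIGHTS[ch]):
            if weight is not IGNORE:
                level.append(weight)
    return tuple(tuple(level) for level in levels)


words = ["\u0e5aa", "b", "\u0e2fa", "a"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

In this toy model the two punctuation characters contribute nothing at the primary level, so the strings that contain them tie with plain "a" there and are separated only by the secondary weights. PAIYANNOI still ends up before ANGKHANKHU, which is the relative order the test data expects.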
--- localedata/Makefile | 2 + localedata/locales/th_TH | 828 ++++---------------------------------- localedata/th_TH.UTF-8.in | 163 ++++++++ localedata/th_TH.in | 178 -------- 4 files changed, 252 insertions(+), 919 deletions(-) create mode 100644 localedata/th_TH.UTF-8.in delete mode 100644 localedata/th_TH.in diff --git a/localedata/Makefile b/localedata/Makefile index 3619b6d47e..0fdbaae563 100644 --- a/localedata/Makefile +++ b/localedata/Makefile @@ -111,6 +111,7 @@ test-input := \ syr.UTF-8 \ szl_PL.UTF-8 \ tg_TJ.UTF-8 \ + th_TH.UTF-8 \ tk_TM.UTF-8 \ tr_TR.UTF-8 \ tt_RU.UTF-8 \ @@ -303,6 +304,7 @@ LOCALES := \ syr.UTF-8 \ szl_PL.UTF-8 \ tg_TJ.UTF-8 \ + th_TH.UTF-8 \ tk_TM.UTF-8 \ tr_TR.ISO-8859-9 \ tr_TR.UTF-8 \ diff --git a/localedata/locales/th_TH b/localedata/locales/th_TH index 7a10376e80..f97b6bdcb4 100644 --- a/localedata/locales/th_TH +++ b/localedata/locales/th_TH @@ -62,750 +62,96 @@ END LC_CTYPE LC_COLLATE -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from 
"" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element 
from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element 
from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" -collating-element from "" - -collating-symbol -collating-symbol -collating-symbol -collating-symbol -collating-symbol - -order_start forward;forward;forward;forward - -% definitions of extra collating symbols - - - - - - -UNDEFINED IGNORE;IGNORE;IGNORE;IGNORE - -% punctuation marks, ordered after ISO/IEC 14651 - IGNORE;IGNORE;;IGNORE % SPACE - IGNORE;IGNORE;;IGNORE % LOW LINE - IGNORE;IGNORE;;IGNORE % HYPHEN-MINUS - IGNORE;IGNORE;;IGNORE % COMMA - IGNORE;IGNORE;;IGNORE % SEMICOLON - IGNORE;IGNORE;;IGNORE % COLON - IGNORE;IGNORE;;IGNORE % EXCLAMATION MARK - IGNORE;IGNORE;;IGNORE % QUESTION MARK - IGNORE;IGNORE;;IGNORE % SOLIDUS - IGNORE;IGNORE;;IGNORE % FULL STOP - IGNORE;IGNORE;;IGNORE % THAI CHARACTER PAIYANNOI - IGNORE;IGNORE;;IGNORE % THAI CHARACTER MAIYAMOK - IGNORE;IGNORE;;IGNORE % GRAVE ACCENT - IGNORE;IGNORE;;IGNORE % CIRCUMFLEX - IGNORE;IGNORE;;IGNORE % TILDE - IGNORE;IGNORE;;IGNORE % APOSTROPHE - IGNORE;IGNORE;;IGNORE % QUOTATION MARK - IGNORE;IGNORE;;IGNORE % LEFT PAREN. 
- IGNORE;IGNORE;;IGNORE % LT BRACKET - IGNORE;IGNORE;;IGNORE % LEFT CURLY BRACKET - IGNORE;IGNORE;;IGNORE % RIGHT CURLY BRACKET - IGNORE;IGNORE;;IGNORE % RT BRACKET - IGNORE;IGNORE;;IGNORE % RIGHT PAREN. - IGNORE;IGNORE;;IGNORE % COMMERCIAL AT - IGNORE;IGNORE;;IGNORE % THAI CHARACTER SYMBOL BAHT - IGNORE;IGNORE;;IGNORE % DOLLAR SIGN - IGNORE;IGNORE;;IGNORE % THAI CHARACTER FONGMAN - IGNORE;IGNORE;;IGNORE % THAI CHARACTER ANGKHANKHU - IGNORE;IGNORE;;IGNORE % THAI CHARACTER KHOMUT - IGNORE;IGNORE;;IGNORE % ASTERISK - IGNORE;IGNORE;;IGNORE % BACK SOLIDUS - IGNORE;IGNORE;;IGNORE % AMPERSAND - IGNORE;IGNORE;;IGNORE % NUMBER SIGN - IGNORE;IGNORE;;IGNORE % PERCENT - IGNORE;IGNORE;;IGNORE % PLUS - IGNORE;IGNORE;;IGNORE % LESS THAN - IGNORE;IGNORE;;IGNORE % EQUAL - IGNORE;IGNORE;;IGNORE % GREATER THAN - IGNORE;IGNORE;;IGNORE % VERTICAL LINE - -% Thai tone marks and diacritics - IGNORE;;; % THAI CHARACTER YAMAKKAN - IGNORE;;; % THAI CHARACTER PINTHU - IGNORE;;; % THAI CHARACTER THANTHAKHAT - IGNORE;;; % THAI CHARACTER MAITAIKHU - IGNORE;;; % THAI CHARACTER MAI EK - IGNORE;;; % THAI CHARACTER MAI THO - IGNORE;;; % THAI CHARACTER MAI TRI - IGNORE;;; % THAI CHARACTER MAI CHATTAWA - -% Arabic and Thai decimal digits - ;;; % DIGIT ZERO - ;;; % THAI DIGIT ZERO - ;;; % DIGIT ONE - ;;; % THAI DIGIT ONE - ;;; % DIGIT TWO - ;;; % THAI DIGIT TWO - ;;; % DIGIT THREE - ;;; % THAI DIGIT THREE - ;;; % DIGIT FOUR - ;;; % THAI DIGIT FOUR - ;;; % DIGIT FIVE - ;;; % THAI DIGIT FIVE - ;;; % DIGIT SIX - ;;; % THAI DIGIT SIX - ;;; % DIGIT SEVEN - ;;; % THAI DIGIT SEVEN - ;;; % DIGIT EIGHT - ;;; % THAI DIGIT EIGHT - ;;; % DIGIT NINE - ;;; % THAI DIGIT NINE - -% Latin alphabet - ;;; % A - ;;; % a - ;;; % B - ;;; % b - ;;; % C - ;;; % c - ;;; % D - ;;; % d - ;;; % E - ;;; % e - ;;; % F - ;;; % f - ;;; % G - ;;; % g - ;;; % H - ;;; % h - ;;; % I - ;;; % i - ;;; % J - ;;; % j - ;;; % K - ;;; % k - ;;; % L - ;;; % l - ;;; % M - ;;; % m - ;;; % N - ;;; % n - ;;; % O - ;;; % o - ;;; % P - ;;; % p - ;;; % 
Q - ;;; % q - ;;; % R - ;;; % r - ;;; % S - ;;; % s - ;;; % T - ;;; % t - ;;; % U - ;;; % u - ;;; % V - ;;; % v - ;;; % W - ;;; % w - ;;; % X - ;;; % x - ;;; % Y - ;;; % y - ;;; % Z - ;;; % z +% Copy the template from ISO/IEC 14651 +copy "iso14651_t1" +% CLDR collation rules for Thai: +% (see: https://github.com/unicode-org/cldr/blob/main/common/collation/th.xml) % -% Thai consonants, with leading vowels rearrangement +%[normalization on] +%[alternate shifted] +%[reorder Thai] +% # +% # The following tailoring is an adjustment of the +% # DUCET collation order for PAIYANNOI, MAIYAMOK, +% # NIKHAHIT, LAKKHANGYAO, and PHINTHU. This gives +% # a sort order as defined in the Royal Institute +% # Dictionary 2542 B.E. Edition (1999 A.D.). +% # +% &[before 1]๚<ฯ # should be "variable" % - ;;; % THAI CHARACTER KO KAI - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER KHO KHAI - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER KHO KHUAT - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER KHO KHWAI - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER KHO KHON - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER KHO RAKHANG - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER NGO NGU - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER CHO CHAN - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER CHO CHING - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER CHO CHANG - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER SO SO - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER CHO CHOE - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER YO YING - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER DO CHADA - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER TO PATAK - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER THO THAN - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER 
THO NANGMONTHO - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER THO PHUTHAO - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER NO NEN - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER DO DEK - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER TO TAO - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER THO THUNG - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER THO THAHAN - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER THO THONG - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER NO NU - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER BO BAIMAI - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER PO PLA - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER PHO PHUNG - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER FO FA - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER PHO PHAN - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER FO FAN - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER PHO SAMPHAO - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER MO MA - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER YO YAK - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER RO RUA - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER RU - - ;;; % THAI CHARACTER LO LING - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER LU - - ;;; % THAI CHARACTER WO WAEN - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER SO SALA - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER SO RUSI - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER SO SUA - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER HO HIP - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER LO CHULA - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER O ANG - "";;; - "";;; - "";;; - 
"";;; - "";;; - - ;;; % THAI CHARACTER HO NOKHUK - "";;; - "";;; - "";;; - "";;; - "";;; - - ;;; % THAI CHARACTER NIKHAHIT - -% order of Thai vowels - ;;; % THAI CHARACTER SARA A - ;;; % THAI CHARACTER MAI HAN-AKAT - ;;; % THAI CHARACTER SARA AA - ;;; % THAI CHARACTER LAKKHANGYAO - ;;; % THAI CHARACTER SARA AM - ;;; % THAI CHARACTER SARA I - ;;; % THAI CHARACTER SARA II - ;;; % THAI CHARACTER SARA UE - ;;; % THAI CHARACTER SARA UEE - ;;; % THAI CHARACTER SARA U - ;;; % THAI CHARACTER SARA UU - ;;; % THAI CHARACTER SARA E - ;;; % THAI CHARACTER SARA AE - ;;; % THAI CHARACTER SARA O - ;;; % THAI CHARACTER SARA AI MAIMUAN - ;;; % THAI CHARACTER SARA AI MAIMALAI - -order_end +% &๛<ๆ # should be "variable" +% +% &๎<<์ +% &[before 1]ะ<à¹? +% &า<<<ๅ +% &าà¹?<<<à¹?า<<<ำ +% &ๅà¹?<<<à¹?ๅ +% &ไ<ฺ +% # consider: order pali virama as secondary different from yammacan (another old virama) +% # &๎ +% # <<ฺ +% # + +collating-element from "" +% This is already defined in iso14651_t1: +% collating-element from "" % decomposition of THAI CHARACTER SARA AM + +collating-element from "" % LAKKHANGYAO + NIKHAHIT +collating-element from "" % NIKHAHIT + LAKKHANGYAO +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% Finished defining collating-elements and collating-symbols +% +% One dummy reorder-after statement here to avoid a syntax error +% because the first rule reordering stuff starts without a reorder-after: +collating-symbol +reorder-after % FULL STOP + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +% &[before 1]๚<ฯ # should be "variable" +% ๚ U+0E5A should keep "IGNORE" as the primary weight (as defined in iso14651_t1_common). +% Therefore, I cannot sort ฯ U+0E2F before ๚ U+0E5A as a primary difference. +% Sorting it before as a secondary difference works though. To sort the existing test data +% in the correct order, this seems good enough. 
The previous collation in +% this th_TH locale, which did not use 'copy "iso14651_t1"' had these characters +% as a tertinary difference: +% IGNORE;IGNORE;;IGNORE % THAI CHARACTER PAIYANNOI +% IGNORE;IGNORE;;IGNORE % THAI CHARACTER ANGKHANKHU + IGNORE;"";IGNORE; % ฯ THAI CHARACTER PAIYANNOI + IGNORE;"";IGNORE; % ๚ THAI CHARACTER ANGKHANKHU +% &๛<ๆ # should be "variable" +% ๛ U+0E5B should keep "IGNORE" as the primary weight (as defined in iso14651_t1_common). +% Therefore I cannot sort ๆ U+0E46 after ๛ U+0E5B as a primary difference. +% Sorting it after as a secondary differnce works though and it seems good enough +% to sort the existing test data in the correct order. The previous collation in +% this th_TH locale, which did not use 'copy "iso14651_t1"' had these characters +% as a tertinary difference: +% IGNORE;IGNORE;;IGNORE % THAI CHARACTER MAIYAMOK +% IGNORE;IGNORE;;IGNORE % THAI CHARACTER KHOMUT + IGNORE;"";IGNORE; % ๛ THAI CHARACTER KHOMUT + IGNORE;"";IGNORE; % ๆ THAI CHARACTER MAIYAMOK +% &๎<<์ + IGNORE;;IGNORE; % ๎ THAI CHARACTER YAMAKKAN + IGNORE;;IGNORE; % ์ THAI CHARACTER THANTHAKHAT +% &[before 1]ะ<à¹? + "";;; % à¹? THAI CHARACTER NIKHAHIT + "";;; % ะ THAI CHARACTER SARA A +% &า<<<ๅ + ;;; % า THAI CHARACTER SARA AA + ;;; % ๅ THAI CHARACTER LAKKHANGYAO +% &าà¹?<<<à¹?า<<<ำ + ;;; % าà¹? decomposition of THAI CHARACTER SARA AM + ;;; % à¹?า decomposition of THAI CHARACTER SARA AM + ;;; % ำ THAI CHARACTER SARA AM +% &ๅà¹?<<<à¹?ๅ + ;;; % LAKKHANGYAO + NIKHAHIT + ;;; % NIKHAHIT + LAKKHANGYAO +% &ไ<ฺ +reorder-after + + +reorder-end END LC_COLLATE diff --git a/localedata/th_TH.UTF-8.in b/localedata/th_TH.UTF-8.in new file mode 100644 index 0000000000..06263dda34 --- /dev/null +++ b/localedata/th_TH.UTF-8.in @@ -0,0 +1,163 @@ +* +. +๎ +์ +ฯ +๚ +๛ +ๆ +0 +à¹? +0000 +à¹?à¹?à¹?à¹? +10 +๑à¹? +9 +๙ +9999 +๙๙๙๙ +a +A +๎A +์a +ฯä +๚a +๛ä +ๆa +b +B +à¸?à¸? +à¸?รรม +à¸?รรม์ +à¸?ราบ +à¸?ะเà¸?ณฑ์ +à¸?ัà¸? 
+à¸?้าว +à¸?ำ +à¸?ิน +à¸?ี่ +à¸?ึ๋น +à¸?ุน +à¸?ูด +เà¸?้ง +เà¸?ล้า +เà¸?ลียว +เà¸?้า +เà¸?าะ +เà¸?ี่ยว +เà¸?ี๊ยะ +เà¸?ือà¸? +à¹?à¸?ง +à¹?à¸?ะ +โà¸?น +โà¸?ร๋น +ใà¸?ล้ +ไà¸?่ +ไà¸?ล +ข้น +ขนาบ +ขาง +ข่าง +ข้าง +ข้างๆ +ข้างà¸?ระดาน +ข้างขึ้น +ข้างควาย +ข้างๆ คูๆ +ข้างเงิน +ข้างà¹?รม +ข้างออà¸? +เข็ด +เขน +เข็น +เข่น +à¹?ข็ง +à¹?ข่ง +à¹?ข้ง +à¹?ข้งขวา +à¹?ข็งขัน +à¹?ข่งขัน +à¹?ขน +à¹?ขวะ +ฃวด +ครรภ- +ครรภ์ +ฅอ +งาม +จุมพล +จุà¹?พล +ฉาà¸? +ชาย +ซาบ +à¸?าณ +ฎีà¸?า +à¸?าน +ฑาหะ +เฒ่า +เณร +ดนตรี +ตลาด +ถนน +ทูลเà¸?ล้า +ทูลเà¸?ล้าฯ +ทูลเà¸?ล้าทูลà¸?ระหม่อม +ธนาคาร +น้า +น้ำ +นี้ +บุà¸?à¸?า +บุà¸?หลง +ปา +ป่า +ป้า +ป๊า +ป๋า +ปาน +ป่าน +ป้าน +à¹?ป้ง +ผัด +à¸?า +ฯพณฯ +พณิชย์ +ฟาง +ภาษี +ม้า +ย่อง +รอง +ฤทธิ์ +ฤษี +ฤๅษี +ลลิตา +ฦๅชา +วà¸? +ศาล +ษมา +สà¸?ุล +หริภุà¸?ชัย +หฤทัย +หลง +à¹?หง่ +à¹?ห่ง +à¹?หนม +à¹?หนหวง +à¹?หบ +à¹?หม +อาน +ฮา +ไฮโล +à¹? +à¹?ä +ะ +ะa +า +ๅ +ๅà¹? +à¹?ๅ +ๅa +าä +าà¹? +à¹?า +ำ +ไ +ฺ diff --git a/localedata/th_TH.in b/localedata/th_TH.in deleted file mode 100644 index cc93d1f264..0000000000 --- a/localedata/th_TH.in +++ /dev/null @@ -1,178 +0,0 @@ -@@@@@ -0000 -10 litre -10 litre (10 ÅÔµÃ) -10 litre (ñð ÅÔµÃ) -10 ÅԵà -ñð ÅԵà -10 ÅԵà (10 litre) -ñð ÅԵà (10 litre) -ñð ÅԵà [10 litre] -ñð ÅԵà {10 litre} -9999 -A -a -A- -a- -A. -a. -a' --a -A-1 -AA -aa -A.A. -a.a. -AAA -A.A.A. -AAAA -A.A.A.L. -A.A.A.S. -Aachen -A.A.E. -A.Ae.E. -A.A.E.E. 
-AAES -AAF -A.Agr -aah -Aalborg -aide -air -air@@@ -@@@air -C.A.F -Canon -COOP -coop -CO-OP -co-op -Copenhagen -McArthur -Mc Arthur -Mc Mahon -vice-president -vice versa -vice-versa -¡¡ -¡ÃÃÁ -¡ÃÃÁì --¡ÃÐáÂè§ -¡ÃÒº -¡Ðࡳ±ì -¡Ñ¡ -¡éÒÇ -¡Ó -¡Ô¹ -¡Õè -¡Öë¹ -¡Ø¹ -¡Ù´ -à¡é§ -à¡ÅéÒ -à¡ÅÕÂÇ -à¡éÒ -à¡ÒÐ -à¡ÕèÂÇ -à¡ÕêÂÐ -à¡×Í¡ -ᡧ -á¡Ð -⡹ -â¡Ãë¹ -ã¡Åé -ä¡è -ä¡Å -¢é¹ -¢¹Òº -¢Ò§ -¢èÒ§ -¢éÒ§ -¢éÒ§æ -¢éÒ§¡Ãдҹ -¢éÒ§¢Öé¹ -¢éÒ§¤ÇÒ -¢éÒ§æ ¤Ùæ -¢éÒ§à§Ô¹ -¢éÒ§áÃÁ -¢éÒ§ÍÍ¡ -ࢹ -à¢ç¹ -à¢è¹ -à¢ç´ -á¢ç§ -á¢è§ -á¢é§ -á¢é§¢ÇÒ -á¢ç§¢Ñ¹ -á¢è§¢Ñ¹ -ᢹ -á¢ÇÐ -£Ç´ -¤ÃÃÀ- -¤ÃÃÀì -¥Í -§ÒÁ -¨ØÁ¾Å -¨Øí¾Å -©Ò¡ -ªÒ -«Òº -­Ò³ -®Õ¡Ò -°Ò¹ -±ÒËÐ -à²èÒ -à³Ã -´¹µÃÕ -µÅÒ´ -¶¹¹ -·ÙÅà¡ÅéÒ -·ÙÅà¡ÅéÒÏ -·ÙÅà¡ÅéÒ·ÙÅ¡ÃÐËÁèÍÁ -¸¹Ò¤Òà -¹éÒ -¹éÓ -¹Õé -ºØ­­Ò -ºØ­Ëŧ -ºØ­-Ëŧ -»Ò -»èÒ -»éÒ -»êÒ -»ëÒ -»Ò¹ -»èÒ¹ -»éÒ¹ -á»é§ -¼Ñ´ -½Ò -Ͼ³Ï -¾³ÔªÂì -¿Ò§ -ÀÒÉÕ -ÁéÒ -Âèͧ -Ãͧ -Ä·¸Ôì -ÄÉÕ -ÄåÉÕ -ÅÅÔµÒ -ÆåªÒ -Ç¡ -ÈÒÅ -ÉÁÒ -Ê¡ØÅ -ËÃÔÀØ­ªÑ -ËÄ·Ñ -Ëŧ -á˧è -áËè§ -á˹Á -á˹Ëǧ -á˺ -áËÁ -ÍÒ¹ -ÎÒ -äÎâÅ -- 2.41.0
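
The removed localedata/th_TH.in above is TIS-620 data, which is why its Thai entries show up as Latin-1-looking characters such as "¡¡". The patch does not say which tool was used for the conversion to UTF-8; the following is only a minimal sketch of one way to do it, assuming Python's standard "tis_620" codec. The helper name tis620_to_utf8 and the sample bytes are made up for this example.

```python
def tis620_to_utf8(data: bytes) -> bytes:
    """Decode TIS-620 bytes and re-encode the text as UTF-8."""
    return data.decode("tis_620").encode("utf-8")


# Hypothetical sample: the raw bytes of two short entries, one per line.
sample = b"\xa1\xa1\n\xa1\xc3\xc3\xc1\n"

# Read as Latin-1, the first entry looks like the garbled form in the diff.
garbled_view = sample[:2].decode("latin-1")

converted = tis620_to_utf8(sample)
print(garbled_view)
print(converted.decode("utf-8"))
```

Converting a whole file would be a matter of reading it in binary mode, passing the bytes through the helper, and writing the result to the new file name.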