Cross-Linguistic Transcription and Phonological Representation in the Huìtóngguǎnxì Huáyíyìyǔ

Summary

This paper analyzes the Huitongguanxi Huayiyiyu, a series of multilingual glossaries from the Ming dynasty, as a structured cross-linguistic transcription system that used Chinese characters to represent non-Chinese languages, revealing how Chinese phonological categories were flexibly extended for phonetic approximation.

arXiv:2605.14480v1 Announce Type: new Abstract: Purpose: This study investigates the transcription principles underlying Huìtóngguǎnxì Huáyíyìyǔ (HHY), a series of multilingual glossaries compiled by the Ming government between the fifteenth and sixteenth centuries for interpreter training. The study treats HHY not as a collection of isolated language materials, but as a coherent multilingual transcription system representing spoken forms of non-Chinese languages through Chinese characters. Methods: A substantial portion of HHY was digitized and aligned with Chinese phonological categories. Previous reconstructions of individual language sections were critically reviewed and integrated into a unified comparative database. The analysis focuses on cross-linguistic regularities in Main Transcription (MT) and Supplementary Transcription (ST) across eight language sections. Results: MT generally represents sounds compatible with the Chinese syllable structure of the period, whereas ST mainly encodes phonetic features less compatible with Chinese phonology. The analysis further shows that Chinese phonological categories were used more flexibly in foreign-language transcription than previously assumed. HHY therefore functioned as a relatively systematic method of phonetic approximation rather than a direct projection of Chinese phonology onto non-Chinese languages. Conclusion: HHY can be analyzed as an internally structured transcription system rather than merely as a collection of glossaries. More broadly, the study demonstrates that historical transcription systems can provide valuable evidence for historical phonology, particularly for under-documented Asian languages with limited historical records.

# Huìtóngguǎnxì Huáyíyìyǔ and Premodern Asian Phonetic Transcription: A Cross-Linguistic Reconstruction
Source: [https://arxiv.org/html/2605.14480](https://arxiv.org/html/2605.14480)

Ji\-eun Kim

Department of Korean Language and Literature, Duksung Women’s University, Seoul, South Korea

###### Abstract

Purpose: This study investigates the transcription principles underlying the Huitongguanxi Huayiyiyu \(會同館係 華夷譯語; HHY\), a series of multilingual glossaries compiled by the Ming government between the fifteenth and sixteenth centuries for interpreter training\. Rather than treating HHY as a collection of language\-specific materials, the study approaches it as a cross\-linguistically coherent transcription system designed to represent spoken forms of non\-Chinese languages using Chinese characters\.

Methods: A substantial portion of HHY was digitized and aligned with Chinese phonological categories \(yīnxì 音系\)\. Previous reconstructions of individual language sections were critically reviewed and integrated into a unified comparative database\. The analysis examines cross\-linguistic regularities in Main Transcription \(MT\) and Supplementary Transcription \(ST\) across multiple language sections, including Korean, Japanese, Mongolian, and Jurchen\.

Results: The analysis shows that MT generally represents sounds compatible with the Chinese syllable structure of the period, whereas ST is primarily used to encode phonetic features that could not be readily accommodated within Chinese phonology\. The results further show that Chinese transcription categories were used more flexibly in foreign\-language transcription than has generally been assumed on the basis of reconstructed Chinese phonology\. HHY therefore functioned as a relatively structured system of phonetic approximation rather than as a direct projection of Chinese phonology onto non\-Chinese languages\.

Conclusion: The study demonstrates that HHY can be analyzed as an internally structured transcription system rather than merely as a collection of isolated glossaries\. HHY constitutes an important but underutilized resource for reconstructing under\-documented Asian languages and for understanding how Chinese phonological categories were extended and reinterpreted in cross\-linguistic transcription practices\.

###### Keywords

Huayiyiyu, Huitongguan, historical phonology, Chinese phonology, multilingual glossaries, Asian languages

## 1 Introduction

Historical linguistics has traditionally relied on the comparative method and internal reconstruction \[[35](https://arxiv.org/html/2605.14480#bib.bib35)\], both of which have well\-recognized limitations\. The comparative method is constrained by relative temporal limitations, socio\-historical limitations, linguistic domain limitations, and “delicacy” limitations \[[12](https://arxiv.org/html/2605.14480#bib.bib12)\]\. Internal reconstruction, although applicable to single languages, inherits many of these constraints and faces an additional difficulty: orthographic records rarely constitute direct representations of speech\. Rather, orthographies function as mnemonic systems shaped by writing conventions, leaving a persistent gap between recorded forms and actual phonological structure \[[21](https://arxiv.org/html/2605.14480#bib.bib21)\]\. Despite these well\-known issues, relatively little attention has been paid to developing alternative sources or methods that can complement these dominant approaches\.

Against this background, huìtóngguǎnxì huáyíyìyǔ 會同館係 華夷譯語 \(HHY\) represents an underexplored yet methodologically promising resource\. HHY is a series of multilingual wordlists compiled by the Ming dynasty government between the fifteenth and sixteenth centuries for the training of official interpreters\. Unlike literary translations or conventional lexical borrowings, HHY was designed to record spoken forms of non\-Chinese languages using Chinese characters as transcriptional devices\. As such, it occupies a distinctive position between orthography and phonetic notation\. The corpus documents thirteen Asian languages, including languages that are now extinct or severely under\-documented, such as Jurchen and Cham, alongside languages with relatively rich historical records, such as Korean and Japanese\.

Several features make HHY a particularly valuable resource for the study of Asian historical phonology\. First, the period of compilation is comparatively well constrained and documented, allowing the transcribed forms to be situated within a relatively narrow historical window\. Second, HHY explicitly reflects spoken languages, as it was designed to transcribe contemporaneous pronunciation rather than to record written norms\. Third, the coexistence of multiple languages within a single transcriptional framework makes it possible to examine the internal consistency of the transcription system itself\. Languages with well\-established historical phonologies can serve as calibration points, enabling more informed interpretations of languages with limited independent documentation\. In this sense, HHY is not merely a collection of language\-specific materials, as treated in much of the previous research, but a cross\-linguistically coherent transcription system that can be analyzed in its own right\.

Despite this potential, HHY has not been studied in a comprehensive manner\. Previous research has largely been language\-specific, scattered across publications in Chinese, Korean, and Japanese, and rarely accessible to an international audience\. As a result, reconstructions are often outdated, parallel findings remain unconnected, and questions that have already been addressed in one scholarly tradition are repeatedly raised in another\. In addition, the technical difficulty of the corpus, which requires familiarity with Chinese historical phonology as well as the target languages, further limits broader engagement\. Furthermore, previous studies tended to interpret transcriptions in HHY primarily through the lens of contemporaneous Chinese phonology\. The phonetic values of the Chinese characters used for transcription have often been inferred directly from reconstructed Middle Chinese yīnxì 音系, implicitly assuming that the characters carried the same or at least similar phonetic values when used to transcribe foreign languages\. However, this assumption overlooks a well\-attested property of writing systems: when scripts are used as phonetic notation for foreign languages, the range of sounds they represent may shift beyond their native phonetic values\.

We aim to lower these barriers by analyzing HHY as a systematic corpus of Chinese character\-based phonetic transcription\. To this end, we conducted three related tasks\. First, a substantial portion of the HHY was digitized and systematically aligned with information on Chinese yīnxì 音系, making the data accessible and searchable\. Second, reliable reconstructions for each language section were collected, evaluated, and integrated into a unified database\. Third, this study examines cross\-linguistic distributional patterns in two types of transcriptional elements: Main Transcription \(MT\) and Supplementary Transcription \(ST\)\. Here, MT refers to primary Chinese characters that encode sounds compatible with the Chinese syllable structure of the HHY period, while ST refers to additional characters used to represent sounds that fall outside the Chinese syllable structure of the HHY period\. The concept of ST in this study is based on earlier discussions of cheomgi 添記 in the analysis of hyangchal 鄕札 notation, an early Korean writing system employing Chinese characters both semantically and phonetically \[[18](https://arxiv.org/html/2605.14480#bib.bib18)\], but it is not assumed to be identical in function or scope\. Unlike hyangchal 鄕札, HHY does not involve semantic glossing, namely hundok 訓讀, nor does it aim to represent morphemes or words as abstract linguistic units\. Rather, the ST in HHY operates within a phonetic, rather than morphographic, transcription system based on Chinese phonological categories\.

On this basis, we reconstruct the general transcription principles of HHY and clarify how the phonetic ranges of Chinese yīnxì 音系 were extended when Chinese characters were used to transcribe foreign languages\. Through this approach, we seek not only to facilitate further research on HHY itself, but also to demonstrate how it can serve as a new historical resource for Asian historical phonology, particularly for under\-documented languages with limited or fragmented historical records\.

## 2 Background

### 2\.1 Four classes of huáyíyìyǔ

huáyíyìyǔ 華夷譯語 is a general term for the dictionaries pairing Chinese with the languages of the neighboring regions that were published by the Chinese government from the early Míng 明 dynasty to the mid\-Qīng 淸 dynasty \[[23](https://arxiv.org/html/2605.14480#bib.bib23),[24](https://arxiv.org/html/2605.14480#bib.bib24)\]\. The dictionaries can be classified into four groups, as shown in Table [1](https://arxiv.org/html/2605.14480#S2.T1):

Table 1: Comparison of multilingual glossaries compiled in premodern China

| Publisher | Target language\(s\) | Transcription | Variety | Era |
| --- | --- | --- | --- | --- |
| huáyíyìyǔ, mǎshāyìhēi, etc\. | Mongolian | Chinese characters | Written language | 14th century |
| sìyíguǎn | Mongolian, Jurchen, Tibetan, Sanskrit, Persian, Tai, Uyghur, Burmese, Thai | Chinese characters \+ original script | Written language | 14th century |
| huìtóngguǎn | Mongolian, Jurchen, Tibetan, Persian, Uyghur, Burmese, Thai, Korean, Ryukyuan, Japanese, Vietnamese, Cham, Shan | Chinese characters | Spoken language | 15th–16th century |
| huìtóng sìyíguǎn | \(36 languages\) | Chinese characters | Written language | 17th century |

No officially standardized terminology has yet been established for these four classes\. In the existing literature, however, they have been categorized in two main ways\. First, following the chronological classification proposed by \[[14](https://arxiv.org/html/2605.14480#bib.bib14)\], the four classes are labeled kōshu 甲種, otsushu 乙種, heishu 丙種, and teishu 丁種\. Second, they are named according to their publishing institutions, except for the first class, whose publishers cannot be subsumed under a single designation: zuìgǔběn 最古本, sìyíguǎnxì 四夷館係, huìtóngguǎnxì 會同館係, and huìtóngsìyíguǎnxì 會同四譯館係 \[[23](https://arxiv.org/html/2605.14480#bib.bib23),[20](https://arxiv.org/html/2605.14480#bib.bib20),[17](https://arxiv.org/html/2605.14480#bib.bib17)\]\. As the second approach is more widely adopted in previous studies, this study follows that convention\. The class examined in this study is huìtóngguǎnxì 會同館係, namely HHY\.

We selected HHY because of its strong colloquial orientation, which allows for the analysis of phonetic and phonological properties of the target languages\. As shown in Table [1](https://arxiv.org/html/2605.14480#S2.T1), the other classes primarily document written languages\. This difference reflects their distinct purposes of compilation\. While HHY was compiled for the training of interpreters, the remaining classes were intended for the training of translators\. The colloquial nature of HHY is particularly evident in materials on Mongolian, Uyghur, Persian, Tibetan, and Jurchen, whose spoken forms diverge substantially from their written traditions, which preserve much older stages of the languages\.

### 2\.2 Composition of huìtóngguǎnxì huáyíyìyǔ

According to previous studies, six different versions of HHY have been identified, currently preserved in various locations worldwide, including London, Japan \(Seikadō Bunko 静嘉堂文庫; Nankaidō 阿波国; Inaba 稻葉君山\), Taiwan, and Seoul \(Seoul National University\) \[[23](https://arxiv.org/html/2605.14480#bib.bib23),[15](https://arxiv.org/html/2605.14480#bib.bib15)\]\. Earlier studies also report the existence of two additional versions formerly held in Hanoi and in Japan \(Mito Akira 水戸彰\), although these are no longer extant\.

The remaining versions do not contain identical sets of languages\. Rather, each version presents a different combination of target languages, as summarized in Table [2](https://arxiv.org/html/2605.14480#S2.T2)\. For the Hanoi and Akira versions, the language lists included in the table are limited to those that can be identified from previous studies; it remains unclear whether additional languages were originally included in these versions\.

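Table 2: Coverage of target languages across different editions or repositories

| Language | London | Seikado | Nankaido | Inaba | Taiwan | SNU | Hanoi | Akira |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Korean | Y | N | Y | Y | Y | Y | Y | Y |
| Ryukyuan | Y | N | Y | Y | Y | Y | | |
| Japanese | Y | Y | Y | Y | Y | Y | | |
| Vietnamese | Y | Y | Y | Y | Y | Y | | |
| Cham | Y | Y | Y | N | N | N | | |
| Thai | Y | Y | Y | Y | Y | Y | | |
| Mongolian | N | Y | Y | Y | Y | Y | Y | N |
| Uyghur | Y | Y | Y | Y | Y | Y | | |
| Tibetan | N | Y | Y | N | N | N | N | N |
| Persian | Y | Y | Y | N | N | N | | |
| Malay | Y | Y | Y | N | Y | Y | | |
| Jurchen | N | Y | Y | N | N | N | | |
| Shan | Y | Y | Y | Y | N | N | | |

Note: Y = yes \(present\); N = no \(absent\); blank = not identifiable from previous studies\.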

The Nankaido version appears to be the most comprehensive, containing 13 languages: Mongolian, Jurchen, Tibetan, Persian, Uyghur, Burmese, Thai, Korean, Ryukyuan, Japanese, Vietnamese, Cham, and Shan\. However, in the present study, only eight languages are selected for the reconstruction of the HHY transcription system\. Although the analytical framework developed here can, in principle, be extended to the remaining languages, this limitation is imposed to ensure a solid empirical basis for reconstruction\.

In HHY, each language section constitutes a separate chapter and contains several hundred lexical entries\. These entries are organized into thematic categories based on lexical meaning, as illustrated in Table [4](https://arxiv.org/html/2605.14480#S2.T4)\.

Among these, with regard to shēng 聲 in the shēngsèmén \(聲色門\) section of Cham, \[[9](https://arxiv.org/html/2605.14480#bib.bib9)\] suggested that it might instead be yán 顔\. However, when the contents of this section \(mén 門\) are taken into account and compared with corresponding sections in other yiyu 譯語 that contain similar material, it can be confirmed that the correct reading is not yán 顔 but shēng 聲\. This section includes color\-related vocabulary, such as huáng 黃 \(C\-539\), qīng 靑 \(C\-541\), hóng 紅 \(C\-542\), and bái 白 \(C\-543\)\. Since Edwards and Blagden \[[9](https://arxiv.org/html/2605.14480#bib.bib9), p\. 87\] themselves translated the name of this section \(mén 門\) as “Colours,” there is little doubt that this section introduces color\-related vocabulary\.

Although a set of core themes is shared across languages, certain themes appear only in specific language sections, and some languages exhibit unique thematic categories absent from others\. This variation likely reflects culture\-specific lexical domains\. For example, the categories Diagram names and Sexagenary cycle occur exclusively in the Korean section, suggesting a particularly close cultural and intellectual connection between the Korean and Chinese communities at the time\.

Meanwhile, each entry of HHY has either two or three rows, as seen in Table [3](https://arxiv.org/html/2605.14480#S2.T3)\.

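Table 3: Examples of chapter\-initial header entries

\(a\)

| Row | Entry 1 | Entry 2 | Entry 3 | Entry 4 | Entry 5 |
| --- | --- | --- | --- | --- | --- |
| 1 | 天 | 日 | 月 | 星 | 風 |
| 2 | 哈嫩二 | 害 | 得一 | 別二 | 把論 |
| 3 | 忝 | 忍 | 臥 | 省 | 捧 |

\(b\)

| Row | Entry 1 | Entry 2 | Entry 3 | Entry 4 | Entry 5 |
| --- | --- | --- | --- | --- | --- |
| 1 | 天 | 地 | 日 | 月 | 風 |
| 2 | 難 | 薩 | 你麻 | 老瓦 | 弄 |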

Table [3](https://arxiv.org/html/2605.14480#S2.T3) presents the structure of entries in the Korean section \(a\) and the Tibetan section \(b\)\. Not all entries contain fully populated second or third rows; some are left blank\. The first row provides the semantic gloss, the second row transcribes the pronunciation of the native lexical item using Chinese characters, and the third row transcribes the corresponding Sino lexical form\. In this study, the third row is excluded from analysis, as the phonology of Sino lexical items is heavily shaped by Chinese phonological systems and thus differs fundamentally from that of native vocabulary\.
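The three-row entry structure described above can be pictured as a small record type. The following Python sketch is illustrative only: the class name, field names, and helper function are hypothetical, and the way the Korean example is segmented into per-entry character groups is an assumption made for the illustration rather than something the source states.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """One HHY entry; field names are hypothetical labels for the three rows."""
    gloss: str                   # row 1: semantic gloss
    native: Optional[str]        # row 2: native item transcribed in Chinese characters
    sino: Optional[str] = None   # row 3: Sino lexical form (blank in some entries)


# Korean header entries from Table 3 (a); the per-entry segmentation is assumed.
KOREAN = [
    Entry("天", "哈嫩二", "忝"),
    Entry("日", "害", "忍"),
    Entry("月", "得一", "臥"),
    Entry("星", "別二", "省"),
    Entry("風", "把論", "捧"),
]


def analysis_rows(entries: List[Entry]) -> List[Tuple[str, str]]:
    """Return (gloss, native transcription) pairs, dropping row 3 and blank row 2."""
    return [(e.gloss, e.native) for e in entries if e.native]


for gloss, native in analysis_rows(KOREAN):
    print(gloss, native)
```

Keeping the third row as an optional field, rather than discarding it at load time, leaves the Sino forms available for later comparison while the analysis itself reads only the first two rows.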

Table 4: Chapter titles and sizes across the eight language sections

| Language | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Korean | Astronomy | Geography | Time | Flowers and Trees | Birds and Animals | Buildings | Tools and Utensils |
| Japanese | Astronomy | Geography | Time | Flowers and Trees | Birds and Animals | Buildings | Tools and Utensils |
| Mongolian | Astronomy | Geography | Flowers and Trees | Birds and Animals | Buildings | Tools and Utensils | Humans |
| Tibetan | Astronomy | Geography | Toponymy | Time | Flowers and Trees | Birds and Animals | Buildings |
| Uyghur | Astronomy | Geography | Toponymy | Time | Flowers and Trees | Human matters | National events |
| Persian | Astronomy | Geography | Toponymy | Seasons | Flowers and Trees | Birds and Animals | Buildings |
| Malay | Astronomy | Geography | Time | Flowers and Trees | Birds and Animals | Buildings | Tools and Utensils |
| Cham | Astronomy | Geography | Time | Flowers and Trees | Birds and Animals | Buildings | Tools and Utensils |

| Language | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Korean | Humans | Human matters | Body | Clothes | Colors | Jewels and Valuables | Food and Drink |
| Japanese | Humans | Human matters | Body | Clothes | Food and Drink | Jewels and Valuables | Literature and History |
| Mongolian | Human matters | Body | Clothes | Food and Drink | Jewels and Valuables | Literature and History | Colors |
| Tibetan | Tools and Utensils | Humans | Human matters | Body | Clothes | Food and Drink | Jewels and Valuables |
| Uyghur | Humans | Body | Clothes | Food and Drink | Tools and Utensils | Birds and Animals | Buildings |
| Persian | Tools and Utensils | Humans | Human matters | Body | Clothes | Food and Drink | Jewels and Valuables |
| Malay | Humans | Human matters | Body | Clothes | Food and Drink | Jewels and Valuables | Literature and History |
| Cham | Humans | Human matters | Body | Clothes | Food and Drink | Jewels and Valuables | Literature and History |

| Language | 15 | 16 | 17 | 18 | 19 | Chapters | Entries |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Korean | Literature and History | Numerals | Sexagenary cycle | Diagram names | Miscellaneous | 19 | 596 |
| Japanese | Colors | Numerals | Directions | Miscellaneous | | 18 | 566 |
| Mongolian | Numerals | Miscellaneous | | | | 16 | 716 |
| Tibetan | Literature and History | Colors | Numerals | Miscellaneous | | 18 | 749 |
| Uyghur | Directions | Miscellaneous | Jewels and Valuables | Colors | Numerals | 19 | 839 |
| Persian | Literature and History | Colors | Numerals | Miscellaneous | | 18 | 674 |
| Malay | Colors | Numerals | Miscellaneous | | | 17 | 482 |
| Cham | Colors | Numerals | Miscellaneous | | | 17 | 601 |

Note: The numbered columns indicate the order of chapter titles within each language section\.

### 2\.3 Chinese in Huitongguanxi Huayiyiyu

It is important to examine the diachronic stage and dialectal affiliation of the Chinese used in HHY\. Because HHY does not provide explicit bibliographical information concerning its initial compilation or publication, these issues cannot be resolved through external documentation alone\. Instead, the linguistic character of the Chinese employed in HHY must be inferred from a combination of institutional context, internal linguistic evidence, and comparison with contemporaneous phonological sources\.

The chronological range of HHY can be established by reference to two external anchor points\. The upper limit \(the earliest possible date\) is set at 1408, the year in which the huìtóngguǎn 會同館, the institution responsible for compiling and publishing HHY, was officially established\. Since HHY was produced as a textbook for the institution’s interpreter\-training activities, its compilation must postdate the institutional stabilization of the huìtóngguǎn 會同館\. The lower limit \(the latest possible date\) is provided by the London version of HHY, the latest extant manuscript, which was republished in 1549 \[[44](https://arxiv.org/html/2605.14480#bib.bib44)\]\. Together, these two dates define a general window between 1408 and 1549 for the initial compilation and publication of HHY\.

Within this general range, the possible dates of compilation for individual language chapters can be further refined on the basis of linguistic characteristics and relevant extralinguistic evidence\. The Korean chapter shares the general earliest date of 1408, but its latest possible date can be moved earlier, as the linguistic features attested in the wordlist suggest a compilation date no later than the mid\-fifteenth century \[[23](https://arxiv.org/html/2605.14480#bib.bib23)\]\. The Japanese chapter, by contrast, shares the general latest date of 1549, but its earliest possible date must be moved later, since the Japanese bureau within the huìtóngguǎn 會同館 was established in 1492, making an earlier compilation impossible \[[44](https://arxiv.org/html/2605.14480#bib.bib44)\]\. The Malay chapter likewise shares the general earliest date, but its latest possible date can be moved earlier to 1511, the year in which the Kingdom of Malacca was conquered by Portugal; the linguistic and cultural conditions presupposed by the Malay materials are consistent with the pre\-conquest period \[[8](https://arxiv.org/html/2605.14480#bib.bib8)\]\. In sum, all chapters of HHY were compiled and published between 1408 and 1549, although the effective range can be narrowed depending on the language\.
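The narrowing of the general window by chapter-specific evidence amounts to intersecting date intervals. The short Python sketch below illustrates that reasoning. The names are hypothetical, and representing "mid-fifteenth century" as the year 1450 is an assumption made purely so that the interval has a numeric endpoint.

```python
from typing import Dict, Optional, Tuple

# General window for HHY as a whole: (earliest, latest) possible year.
GENERAL_WINDOW: Tuple[int, int] = (1408, 1549)

# Chapter-specific constraints; None means "no constraint beyond the general window".
CONSTRAINTS: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "Korean": (None, 1450),    # "mid-fifteenth century" approximated as 1450 (assumption)
    "Japanese": (1492, None),  # Japanese bureau established in 1492
    "Malay": (None, 1511),     # Portuguese conquest of Malacca in 1511
}


def chapter_window(language: str) -> Tuple[int, int]:
    """Intersect the general window with any chapter-specific constraint."""
    earliest, latest = GENERAL_WINDOW
    lower, upper = CONSTRAINTS.get(language, (None, None))
    if lower is not None:
        earliest = max(earliest, lower)
    if upper is not None:
        latest = min(latest, upper)
    return earliest, latest


for name in ("Korean", "Japanese", "Malay", "Tibetan"):
    print(name, chapter_window(name))
```

Chapters without a specific constraint, such as Tibetan here, simply inherit the general window.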

Having established the chronological framework, the dialectal affiliation of the Chinese used in HHY can now be addressed\. The Chinese employed in most chapters can be identified as guānhuà 官話, a conclusion supported by both institutional and phonological evidence\. Institutionally, HHY was a government\-sponsored project compiled by and for Ming officials, a context in which guānhuà 官話 functioned as the supra\-regional spoken norm\. Phonologically, the presence of érhuà 兒化 points to a northern variety of Chinese, as érhuà 兒化 is not characteristic of southern dialects\. In addition, the systematic loss of stop codas /\-p/, /\-t/, and /\-k/ further supports a guānhuà 官話 affiliation, since this development is typical of northern Chinese varieties during the relevant period\.

An important exception to this general pattern is found in the Vietnamese chapter\. Unlike the other chapters, it does not exhibit érhuà 兒化 and preserves stop codas /\-p/, /\-t/, and /\-k/\. These features allow a direct alignment between the Chinese transcriptions and corresponding Vietnamese stop codas, indicating that the Chinese used in this chapter reflects a southern variety rather than guānhuà 官話\. This exception demonstrates that HHY did not apply a single Chinese dialect mechanically across all language chapters, but rather adjusted its transcriptional practices to specific linguistic contexts\.
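The two phonological diagnostics used in this section, érhuà and the treatment of stop codas, can be summarized as a simple decision rule. The Python sketch below is a deliberate simplification of the argument: the function name and return labels are hypothetical, and a real classification would weigh more evidence than these two features.

```python
def infer_chinese_variety(has_erhua: bool, preserves_stop_codas: bool) -> str:
    """Classify the transcribing variety from the two diagnostics named in the text."""
    if has_erhua and not preserves_stop_codas:
        # erhua present and /-p/, /-t/, /-k/ lost: the northern pattern
        return "guanhua"
    if not has_erhua and preserves_stop_codas:
        # no erhua and stop codas preserved: the pattern of the Vietnamese chapter
        return "southern variety"
    # mixed evidence is left undecided rather than forced into a class
    return "undetermined"


print(infer_chinese_variety(has_erhua=True, preserves_stop_codas=False))
print(infer_chinese_variety(has_erhua=False, preserves_stop_codas=True))
```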

## 3 Method

### 3\.1 Reconstructing the Chinese Phonological System of Huitongguanxi Huayiyiyu

As argued above, the Chinese used in HHY largely reflects Late Míng 明 guānhuà 官話 \(LMG\)\. A methodological difficulty arises, however, from the absence of phonological resources that directly reconstruct LMG as a unified system\. The available materials instead span two adjacent periods in the history of Chinese, zhōnggǔ 中古 and jìngǔ 近古, necessitating a reconstruction strategy that can accommodate this transitional status\. Although scholars differ on the details, the history of Chinese is usually divided into four periods, as follows:

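Table 5: Periodization of Chinese linguistic history

| Period | Shànggǔ | Zhōnggǔ | Jìngǔ | Xiàndài |
| --- | --- | --- | --- | --- |
| English term | Old | Middle | Early Modern | Modern |

Approximate boundaries: 3C \(Shànggǔ/Zhōnggǔ\), 14C \(Zhōnggǔ/Jìngǔ\), 20C \(Jìngǔ/Xiàndài\)\.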

Although HHY belongs chronologically to the jìngǔ 近古 period, it is very close to the end of zhōnggǔ 中古\. For this reason, phonological sources associated with zhōnggǔ 中古 cannot be excluded from the analysis\. This is particularly relevant because zhōngyuányīnyùn 中原音韻 \(1324\), one of the most comprehensive phonological descriptions of spoken Chinese prior to the Ming period, predates HHY by only several decades and continued to exert influence during the Late Ming\.

The literature referenced in this study can be divided into two groups\. The first consists of historical phonological sources, including zhōngyuányīnyùn 中原音韻, chóngdìng sīmǎwēngōng děngyùntújīng 重訂司馬溫公等韻圖經, sìshēngtōngkǎo 四聲通考, yùnlüèyìtōng 韻略易通, and xīrúěrmùzī 西儒耳目資\. The second group comprises modern reconstructions based on these historical materials, such as \[[10](https://arxiv.org/html/2605.14480#bib.bib10),[28](https://arxiv.org/html/2605.14480#bib.bib28),[42](https://arxiv.org/html/2605.14480#bib.bib42),[20](https://arxiv.org/html/2605.14480#bib.bib20),[25](https://arxiv.org/html/2605.14480#bib.bib25)\]\. Of these, our alignment relied most heavily on two: zhōngyuányīnyùn 中原音韻 and \[[20](https://arxiv.org/html/2605.14480#bib.bib20)\]\.

To reconstruct the Chinese phonological system underlying HHY, this study adopts a two\-step alignment procedure\. In the first step, all Chinese characters used in HHY were aligned with the phonological categories of zhōngyuányīnyùn 中原音韻, which was selected as the initial reference due to its extensive coverage of Chinese characters and its explicit orientation toward spoken forms\. The shēngmǔ 聲母 and yùnmǔ 韻母 systems of zhōngyuányīnyùn 中原音韻 thus serve as the baseline for this initial alignment, with phonetic values reconstructed in IPA following previous studies \[[28](https://arxiv.org/html/2605.14480#bib.bib28),[42](https://arxiv.org/html/2605.14480#bib.bib42),[20](https://arxiv.org/html/2605.14480#bib.bib20),[25](https://arxiv.org/html/2605.14480#bib.bib25)\]\. In the second step, this preliminary alignment was systematically modified in accordance with \[[20](https://arxiv.org/html/2605.14480#bib.bib20)\], which reconstructs Late Ming Guanhua on the basis of multiple historical sources and therefore more closely reflects the linguistic stage represented in HHY\. Because the phonological systems documented in zhōngyuányīnyùn 中原音韻 underwent a series of changes between the fourteenth and seventeenth centuries, many of which were still in progress during the HHY period, these changes were treated not as categorical shifts but as transitional tendencies in the alignment process\.
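One way to picture the two-step procedure is as a baseline lookup followed by an adjustment layer in which completed changes replace the baseline value and in-progress changes add a second candidate. The Python sketch below is an assumption about how such an alignment could be organized, not the study's actual implementation. The dictionary and function names are hypothetical. The three initials use the values printed in Table 6, and the two adjustments paraphrase the changes described for the yí- and wēi-initials in this section.

```python
from typing import Dict, List, Tuple

# Step 1: baseline initials from zhongyuanyinyun (values as printed in Table 6).
ZYYY_INITIALS: Dict[str, str] = {"幫": "p", "疑": "ŋ", "微": "m"}

# Step 2: later changes. "completed" replaces the baseline value;
# "in_progress" keeps the baseline and adds the newer value as a candidate.
LMG_ADJUSTMENTS: Dict[str, Tuple[str, str]] = {
    "疑": ("∅", "completed"),    # yi-initial lost by the mid-fifteenth century
    "微": ("w", "in_progress"),  # wei-initial merging into the zero-initial as [w]
}


def align_initial(category: str) -> List[str]:
    """Return the candidate phonetic values for an initial category in the HHY period."""
    baseline = ZYYY_INITIALS[category]
    if category not in LMG_ADJUSTMENTS:
        return [baseline]
    new_value, status = LMG_ADJUSTMENTS[category]
    if status == "completed":
        return [new_value]
    return [baseline, new_value]


for initial in ("幫", "疑", "微"):
    print(initial, align_initial(initial))
```

Returning a list of candidates rather than a single value is what lets an in-progress change be recorded as a tendency instead of a categorical shift.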

More specifically, Table 6 presents the shēngmǔ 聲母 system of zhōngyuányīnyùn 中原音韻\. Phonetic values are given in IPA in square brackets, reconstructed on the basis of previous studies \[[28](https://arxiv.org/html/2605.14480#bib.bib28),[42](https://arxiv.org/html/2605.14480#bib.bib42),[20](https://arxiv.org/html/2605.14480#bib.bib20),[25](https://arxiv.org/html/2605.14480#bib.bib25)\]\.

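Table 6: The shēngmǔ 聲母 system of zhōngyuányīnyùn 中原音韻

| Initial | Romanization | Phonetic value |
| --- | --- | --- |
| 幫 | bang | \[p\] |
| 滂 | pang | \[pʰ\] |
| 明 | ming | \[m\] |
| 非 | fei | \[f\] |
| 微 | wei | \[m\] |
| 端 | duan | \[t\] |
| 透 | tou | \[tʰ\] |
| 泥 | ni | \[n\] |
| 來 | lai | \[l\] |
| 精 | jing | \[ts\] |
| 清 | qing | \[tsʰ\] |
| 心 | xin | \[s\] |
| 章 | zhang | \[tʃ\] |
| 昌 | chang | \[tʃʰ\] |
| 山 | shan | \[ʃ\] |
| 日 | ri | \[ʒ\] |
| 見 | jian | \[k\] |
| 溪 | xi | \[kʰ\] |
| 疑 | yi | \[ŋ\] |
| 曉 | xiao | \[x\] |
| 云 | yun | \[∅\] |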

The above system underwent four subsequent changes \[[20](https://arxiv.org/html/2605.14480#bib.bib20),[22](https://arxiv.org/html/2605.14480#bib.bib22)\]\. First, the yí\-initial \(yímǔ 疑母\) was lost by the mid\-fifteenth century\. Second, the wēi\-initial \(wēimǔ 微母\) merged into the zero\-initial \(língshēngmǔ 零聲母\), realized as \[w\], by the seventeenth century\. Third, the zhèngchǐ\-series \(zhèngchǐyīn 正齒音\) became retroflex after the sixteenth century\. Fourth, the jiàn\-series \(jiànxì 見系\) and the jīng\-series \(jīngxì 精系\) were palatalized by the eighteenth century at the latest\. With the exception of the loss of the yí\-initial, these changes were not yet complete and were still in progress during the HHY period\.

Next, Table 7 presents the yùnmǔ 韻母 system of zhōngyuányīnyùn 中原音韻\. Phonetic values are given in IPA in square brackets, reconstructed on the basis of previous studies \[[28](https://arxiv.org/html/2605.14480#bib.bib28),[42](https://arxiv.org/html/2605.14480#bib.bib42),[20](https://arxiv.org/html/2605.14480#bib.bib20),[25](https://arxiv.org/html/2605.14480#bib.bib25)\]\.

Table 7: The yùnmǔ system of Zhōngyuán Yīnyùn

| Rhyme | 開口 kaikou | 齊齒 qichi | 合口 hekou | 撮口 cuokou |
| --- | --- | --- | --- | --- |
| 東鐘 dongzhong | | | \[uŋ\] /uŋ/ | \[iuŋ\] /juŋ/ |
| 江陽 jiangyang | \[aŋ\] /aŋ/ | \[iaŋ\] /jaŋ/ | \[uaŋ\] /waŋ/ | |
| 支思 zhisi | \[ɿ\] /ï/ | \[i\] /jəj/ | | |
| 齊微 qiwei | \[əi\] /əj/ | | \[uəi\] /wəj/ | |
| 魚模 yumu | | | \[u\] /u/ | \[iu\] /ju/ |
| 皆來 jielai | \[ai\] /ai/ | \[iai\] /jaj/ | \[uai\] /waj/ | |
| 眞文 zhenwen | \[ən\] /ən/ | \[in\] /jən/ | \[uən\] /wən/ | \[iuən\] /jwən/ |
| 寒山 hanshan | \[an\] /an/ | \[ian\] /jan/ | \[uan\] /wan/ | |
| 先天 xiantian | | \[iɛn\] /jen/ | | \[iuɛn\] /jwen/ |
| 桓歡 huanhuan | \[ɔn\] /on/ | | \[uɔn\] /won/ | |
| 蕭豪 xiaohao | \[au\] /aw/ | \[iau\] /jaw/ | | |
| 歌戈 gege | \[ɔ\] /o/ | \[iɔ\] /jo/ | \[uɔ\] /wo/ | |
| 家麻 jiama | \[a\] /a/ | \[ia\] /ja/ | \[ua\] /wa/ | |
| 車遮 chezhe | | \[iɛ\] /jə/ | | \[iuɛ\] /jwə/ |
| 庚靑 gengqing | \[əŋ\] /əŋ/ | \[iəŋ\] /jəŋ/ | \[uəŋ\] /wəŋ/ | \[iuəŋ\] /jwəŋ/ |
| 尤侯 youhou | \[əu\] /əw/ | \[iəu\] /jəw/ | | |
| 侵尋 qinxun | \[əm\] /əm/ | \[iəm\] /jam/ | | |
| 監咸 jianxian | \[am\] /am/ | \[iam\] /jam/ | | |
| 廉纖 lianxian | | \[iɛm\] /jem/ | | |

The system in Table 7 underwent four subsequent changes \[[20](https://arxiv.org/html/2605.14480#bib.bib20),[22](https://arxiv.org/html/2605.14480#bib.bib22)\]\. First, the coda /\-m/ merged into /\-n/\. This merger had already begun by the time of zhōngyuányīnyùn 中原音韻 and was completed before the sixteenth century\. Second, the huánhuān\-rhyme \(桓歡韻\) /on/ and the xiāntiān\-rhyme \(先天韻\) /en/ lost their distinction during the sixteenth century\. Third, the érhuà\-rhyme \(兒化韻\) developed in northern dialects between the fifteenth and sixteenth centuries, as a result of which r\-initial characters in the zhīsī\-rhyme \(支思韻\), originally pronounced as \[ʐɨ\], began to be realized as \[ər\]\. Fourth, the gēgē\-rhyme \(歌戈韻\) /o/ and the chēzhē\-rhyme \(車遮韻\) /e/ merged into \[ɤ\]\. With the exception of the /\-m/ to /\-n/ merger, these changes were still in progress during the HHY period, although their effects are already observable in the HHY data \[[20](https://arxiv.org/html/2605.14480#bib.bib20)\]\.
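Because the /-m/ to /-n/ merger is the one change described as complete, its effect on the finals can be illustrated mechanically. The Python sketch below rewrites the -m finals listed in Table 7 and checks which results coincide with -n finals already listed there. It is an illustration under simplifying assumptions: the function names are hypothetical, and the merger is modeled as a plain coda substitution with no accompanying vowel change.

```python
from typing import Dict, List

# Phonetic values of the -m finals listed in Table 7.
M_FINALS: List[str] = ["əm", "iəm", "am", "iam", "iɛm"]

# Phonetic values of the -n finals listed in Table 7.
N_FINALS = {"ən", "in", "uən", "iuən", "an", "ian", "uan", "iɛn", "iuɛn", "ɔn", "uɔn"}


def merge_m_coda(final: str) -> str:
    """Rewrite a final -m coda as -n; other finals are returned unchanged."""
    return final[:-1] + "n" if final.endswith("m") else final


def collisions(finals: List[str]) -> Dict[str, str]:
    """Map each -m final to its merged form when that form already exists as an -n final."""
    merged = {f: merge_m_coda(f) for f in finals}
    return {old: new for old, new in merged.items() if new in N_FINALS}


print(collisions(M_FINALS))
```

Under this simplified model, a transcription character drawn from a former -m rhyme cannot by itself be taken as evidence for a final [m] in the target language.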

### 3.2 Selecting reliable secondary resources

Rather than reconstructing the HHY materials anew, this study makes extensive use of previous scholarship, that is, secondary sources. The reconstructions in these sources are treated not merely as reference points but as analytically substantive data that make it possible to examine HHY transcription practices on a broader, cross-linguistic scale. Core secondary references were selected according to three criteria. First, priority was given to studies that reconstructed an entire language section rather than individual or partial lexical items. Second, the study had to provide explicit phonetic reconstructions of the HHY entries, rather than relying solely on orthographic interpretation. Third, the reconstruction needed to demonstrate careful consideration of both the grammatical structure of the target language and the phonological constraints imposed by Chinese transcription practices. On the basis of these criteria, the core references listed in Table [3.2](https://arxiv.org/html/2605.14480#S3.SS2) were selected for each language section.

Table 8: Previous reconstructions and their Chinese phonological references

| Language | Reconstruction | Version | Chinese Phonology |
|---|---|---|---|
| Korean | [[20](https://arxiv.org/html/2605.14480#bib.bib20)] | Inaba | Zhōngyuányīnyùn; Wēngōng Děngyùn Tújīng; Sìshēngtōngkǎo; Yùnlüèyìtōng |
| Japanese | Matsumoto and Ding [[27](https://arxiv.org/html/2605.14480#bib.bib27)] | Nankaido; London; Inaba; Seikado | XīrúĚrmùzī |
| Vietnamese | Chen [[2](https://arxiv.org/html/2605.14480#bib.bib2)] | Nankaido | Zhōngyuányīnyùn; Yùnlüèyìtōng; Zhōnghuáxīnyùn |
| Tibetan | Nishida [[30](https://arxiv.org/html/2605.14480#bib.bib30)]; Saita [[34](https://arxiv.org/html/2605.14480#bib.bib34)] | Seikado | Wēngōng Děngyùn Tújīng |
| Uyghur | Shōgaito [[37](https://arxiv.org/html/2605.14480#bib.bib37)] | Nankaido; Seikado; London | Wēngōng Děngyùn Tújīng |
| Mongolian | Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)] | Seikado; Nankaido; Inaba | Wēngōng Děngyùn Tújīng |
| Persian | Honda [[13](https://arxiv.org/html/2605.14480#bib.bib13)] | London | Wēngōng Děngyùn Tújīng |
| Malay | Edwards and Blagden [[8](https://arxiv.org/html/2605.14480#bib.bib8)] | London | Giles [[10](https://arxiv.org/html/2605.14480#bib.bib10)] |
| Cham | Edwards and Blagden [[9](https://arxiv.org/html/2605.14480#bib.bib9)] | London | Giles [[10](https://arxiv.org/html/2605.14480#bib.bib10)] |

Although each of the selected studies in Table [3.2](https://arxiv.org/html/2605.14480#S3.SS2) provides reconstructions of the phonetic values represented by Chinese characters in HHY, they rely on different reference systems for Chinese phonology. Among these, chóngdìng sīmǎ wēngōng děngyùn tújīng 重訂司馬溫公等韻圖經 (1606), which is cited most frequently across the selected studies, is a traditional Chinese phonological chart (yùntú 韻圖) reflecting northern Guanhua pronunciation of the early seventeenth century. As noted in [[20](https://arxiv.org/html/2605.14480#bib.bib20)], this work serves as a major reference for reconstructing the northern phonological system of the fifteenth and sixteenth centuries, second only to zhōngyuányīnyùn 中原音韻. Meanwhile, yùnlüèyìtōng 韻略易通 (1442), a fifteenth-century source, preserves checked syllable codas (-p, -t, -k) and maintains a split within the yúmú-rhyme (yúmúyùn 魚模韻), features that distinguish it from zhōngyuányīnyùn 中原音韻. While these characteristics limit its applicability to certain aspects of the phonological system, the work was nonetheless consulted as a supplementary reference for issues other than initial consonants. In the case of the Vietnamese section, whose yùnmǔ 韻母 reflects southern phonological characteristics distinct from those of other language sections, such southern-oriented rhyme books could function as primary reference sources.

A. Japanese section rìběnguǎnyìyǔ 日本館譯語

Research on the Japanese section, rìběnguǎnyìyǔ 日本館譯語, has advanced considerably in Japanese linguistics, and Watanabe [[43](https://arxiv.org/html/2605.14480#bib.bib43)] already provided a reliable reconstruction of the entire lexical inventory at an early stage of the field. Among the various sections of the huáyíyìyǔ 華夷譯語 examined in this study, the Japanese section therefore constitutes one of the most dependable points of comparison.

The principal reference adopted here is Matsumoto and Ding [[27](https://arxiv.org/html/2605.14480#bib.bib27)]. Building on earlier reconstructions, Matsumoto and Ding [[27](https://arxiv.org/html/2605.14480#bib.bib27)] differ from previous studies in that they provide phonetic reconstructions using the International Phonetic Alphabet (IPA). This is especially relevant because HHY, unlike other versions of huáyíyìyǔ 華夷譯語, records spoken forms rather than written norms and should therefore be approached from a phonetic perspective. In this respect, Matsumoto and Ding [[27](https://arxiv.org/html/2605.14480#bib.bib27)] offer a particularly useful basis for comparison.

This does not mean, however, that the analysis can rely exclusively on Matsumoto and Ding [[27](https://arxiv.org/html/2605.14480#bib.bib27)]. Scholarly views do not agree on every lexical item in the Japanese section, and several important transcription characters remain disputed. Accordingly, this study takes Matsumoto and Ding [[27](https://arxiv.org/html/2605.14480#bib.bib27)] as its primary point of reference, while also consulting kana-based reconstructions and other previous studies where necessary [[44](https://arxiv.org/html/2605.14480#bib.bib44),[13](https://arxiv.org/html/2605.14480#bib.bib13),[15](https://arxiv.org/html/2605.14480#bib.bib15)]. In addition, relevant research on Middle Japanese is used to further evaluate and cross-check the proposed interpretations.

B. Vietnamese section ānnánguǎnyìyǔ 安南館譯語

Comprehensive research on the Vietnamese section, ānnánguǎnyìyǔ 安南館譯語, was conducted in Japan primarily by Chen Ching-ho in a series of studies [[2](https://arxiv.org/html/2605.14480#bib.bib2),[3](https://arxiv.org/html/2605.14480#bib.bib3),[4](https://arxiv.org/html/2605.14480#bib.bib4),[5](https://arxiv.org/html/2605.14480#bib.bib5),[6](https://arxiv.org/html/2605.14480#bib.bib6),[7](https://arxiv.org/html/2605.14480#bib.bib7)]. These studies examine the Vietnamese section from multiple perspectives, ranging from bibliographical issues to the phonological system reflected in the transcribed Vietnamese forms. Because they provide a detailed and comprehensive treatment of the material, they remain the most authoritative studies on the Vietnamese section to date, and their major conclusions have not been seriously challenged in subsequent scholarship.

One limitation, however, is that the reconstruction of lexical items in the Vietnamese section has not been extensively reexamined by a broad range of scholars\. To compensate for this relative lack of independent verification, the present study additionally consults previous research on the history of Vietnamese phonology\.

Furthermore, the Vietnamese section is used in the present study only as a comparative source for investigating the phonetic values of fifteenth-century Korean initial consonants reflected in the Korean section. According to Chen [[3](https://arxiv.org/html/2605.14480#bib.bib3)], the Vietnamese section differs from several other sections, including the Korean section, in that its use of finals shows a particularly strong influence from southern varieties of Chinese phonology.

C. Tibetan section xīfānguǎnyìyǔ 西番館譯語

Reconstruction of the lexical items in the Tibetan section, xīfānguǎnyìyǔ 西番館譯語, has primarily been carried out by two Japanese scholars. Nishida [[30](https://arxiv.org/html/2605.14480#bib.bib30)] provided the first comprehensive study of the Tibetan section. In addition to bibliographical analysis, Nishida [[30](https://arxiv.org/html/2605.14480#bib.bib30)] reconstructed the literary and colloquial Tibetan forms corresponding to individual entries on the basis of modern Tibetan dialects, written Tibetan, and separate Tibetan glossaries. Through this approach, he clarified both the phonological and lexical characteristics of the Tibetan reflected in the Tibetan section.

One limitation of Nishida [[30](https://arxiv.org/html/2605.14480#bib.bib30)], however, is that the study relied exclusively on the Awa manuscript tradition. Subsequently, Saita [[34](https://arxiv.org/html/2605.14480#bib.bib34)] largely followed Nishida [[30](https://arxiv.org/html/2605.14480#bib.bib30)]’s reconstruction of Tibetan forms while revising portions of the analysis on the basis of the Seikadō manuscript. The present study therefore adopts Nishida [[30](https://arxiv.org/html/2605.14480#bib.bib30)] as its principal reference while incorporating revisions proposed in Saita [[34](https://arxiv.org/html/2605.14480#bib.bib34)] when they are supported by comparison between different manuscripts.

Although the Tibetan section has thus been reconstructed by two scholars, its interpretations cannot yet be regarded as sufficiently verified\. To compensate for this limitation, the present study additionally consults a broader range of scholarship on Tibetan historical phonology\.

D. Uyghur section wèiwúérguǎnyìyǔ 畏兀兒館譯語

Comprehensive research on the Uyghur section, wèiwúérguǎnyìyǔ 畏兀兒館譯語, has primarily been conducted by a single Japanese scholar. Reconstruction of the complete lexical inventory was carried out only in Shōgaito [[37](https://arxiv.org/html/2605.14480#bib.bib37)]. In an earlier study, Shōgaito [[36](https://arxiv.org/html/2605.14480#bib.bib36)] had already discussed the linguistic character of the Turkic language reflected in the Uyghur section, although without reconstructing the complete lexical inventory.

In Shōgaito [[37](https://arxiv.org/html/2605.14480#bib.bib37)], the colloquial Uyghur forms corresponding to individual entries were reconstructed on the basis of modern Uyghur dialects, Mongolic languages, written Mongolian, and other Chinese transcription materials of Uyghur. Through this comparative approach, the study provided a detailed account of the phonological characteristics of the Uyghur reflected in the Uyghur section. The major conclusions of Shōgaito [[37](https://arxiv.org/html/2605.14480#bib.bib37)] have not been seriously challenged in subsequent scholarship.

One limitation, however, is that the reconstructed lexical items in the Uyghur section have not undergone extensive verification by a broad range of scholars\. To compensate for this limitation, the present study additionally consults broader research on Uyghur and Mongolic historical phonology\.

E. Mongolian section dádánguǎnyìyǔ 韃靼館譯語

Comprehensive research on the Mongolian section, dádánguǎnyìyǔ 韃靼館譯語, has primarily been conducted by Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)]. In addition to bibliographical issues such as textual format and relationships among manuscript traditions, Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)] also examined the phonological characteristics of the Mongolian reflected in the text in considerable detail. Rather than relying exclusively on reconstructed Chinese readings, Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)] approached the material with full consideration of the character of the Huitongguan-type Huayiyiyu as a transcription of spoken language. Through detailed comparison with written Mongolian and modern Mongolian dialects, Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)] clarified important aspects of the phonological system reflected in the Mongolian section.

One limitation, however, is that the lexical reconstructions proposed for the Mongolian section have not undergone extensive verification by a broad range of scholars\. The present study therefore additionally consults broader research on Mongolian historical phonology\.

At the same time, the Mongolian section is used in the present study only as a comparative source for investigating the transcription pattern of onsets and codas in HHY. This is because, unlike several other major studies discussed above, Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)] did not provide reconstructions for the complete lexical inventory, but instead systematically identified the Mongolian sounds represented by individual Chinese transcription characters. Had the reconstruction of the transcriptional system in Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)] been based on assumptions identical to those adopted in the present study, the analysis could have been applied more directly. However, Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)]’s reconstruction of the phonological values of the transcription characters relied primarily on the chóngdìng sīmǎ wēngōng děngyùn tújīng 重訂司馬溫公等韻圖經, which reflects seventeenth-century Beijing Mandarin. It is therefore possible that the reconstructed values of the transcription characters differ in some respects from those proposed in the present study.

Relatively simple initials can still be reinterpreted on the basis of Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)]’s examples within the analytical framework adopted here. Finals, however, are considerably more difficult to analyze in this way because of their greater structural complexity. Moreover, since the study did not systematically examine the entire lexical inventory with respect to initials and finals, conclusions drawn from comparison with the Mongolian section should be treated with particular caution.

F. Persian section huíhuíguǎnyìyǔ 回回館譯語

Comprehensive research on the Persian section, huíhuíguǎnyìyǔ 回回館譯語, has primarily been conducted by two Japanese scholars. Substantial work on the Persian section had already begun in a series of studies by Tasaka Kōdō [[38](https://arxiv.org/html/2605.14480#bib.bib38),[39](https://arxiv.org/html/2605.14480#bib.bib39),[40](https://arxiv.org/html/2605.14480#bib.bib40),[41](https://arxiv.org/html/2605.14480#bib.bib41)]. These studies, however, reconstructed only forty-nine entries from the astronomy section. Honda [[13](https://arxiv.org/html/2605.14480#bib.bib13)] later overcame this limitation by examining both the Huitongguanxi and Siyiguanxi Persian sections and reconstructing the complete lexical inventory and example phrases in Persian. The present study refers only to Honda [[13](https://arxiv.org/html/2605.14480#bib.bib13)]’s reconstruction of the Huitongguan-type Persian section.

Although the Persian section has thus been reconstructed by two scholars, its interpretations cannot yet be regarded as sufficiently verified. Moreover, unlike previous studies on several other sections, Honda [[13](https://arxiv.org/html/2605.14480#bib.bib13)] reconstructed the lexical items without providing a systematic analysis of the transcriptional system itself. Another difficulty is that the study presents phonemic rather than phonetic reconstructions, meaning that interpretation of the actual phonetic values requires additional knowledge of historical Persian phonology.

Despite these limitations, the Persian section remains important for the present study. Persian permits word-final consonant clusters and therefore provides important evidence for analyzing the phonetic values and transcription rules of coda consonants. To compensate for the limitations of previous research, the present study additionally consults a broad range of scholarship on Persian historical phonology.

G. Malay and Cham sections mǎnlàjiāguǎnyìyǔ 滿剌加館譯語 and zhānchéngguǎnyìyǔ 占城館譯語

The reconstructions of the Malay section, mǎnlàjiāguǎnyìyǔ 滿剌加館譯語, and the Cham section, zhānchéngguǎnyìyǔ 占城館譯語, were carried out by the same researchers [[8](https://arxiv.org/html/2605.14480#bib.bib8),[9](https://arxiv.org/html/2605.14480#bib.bib9)]. Although a substantial number of transcription characters remained unresolved in these studies, they nevertheless remain the only works to attempt reconstruction of the complete lexical inventories of the Malay and Cham sections.

At the same time, Edwards and Blagden [[8](https://arxiv.org/html/2605.14480#bib.bib8)] and Edwards and Blagden [[9](https://arxiv.org/html/2605.14480#bib.bib9)] are relatively old studies, and many of their reconstructions appear to have been based primarily on modern Malay and Cham without sufficiently detailed consideration of historical Chinese phonology. As a result, conclusions derived solely from comparison with the Malay and Cham sections should be treated with caution.

The present study therefore supplements these earlier reconstructions through extensive consultation of additional scholarship on Malay and Cham, especially diachronic studies\. Through this approach, the study seeks to compensate as far as possible for the limitations of the existing reconstructions\.

Based on the characteristics of each section and the previous scholarship reviewed above, Table [3.2](https://arxiv.org/html/2605.14480#S3.SS2) summarizes which features of each section are employed for comparison with the Korean section in the present study. In Table [3.2](https://arxiv.org/html/2605.14480#S3.SS2), the symbol ‘∘’ indicates relatively high reliability, ‘△’ indicates relatively limited reliability, and ‘×’ indicates that the material is considered too unreliable to be included in the analysis.

Table 9: Reliability of previous reconstructions as references for reconstructing the HHY transcription system in onset, nucleus, and coda

| | Korean | Japan. | Viet. | Tibetan | Uyghur | Mongol. | Persian | Malay | Cham |
|---|---|---|---|---|---|---|---|---|---|
| Onset | ∘ | ∘ | ∘ | ∘ | ∘ | ∘ | ∘ | △ | △ |
| Nucleus | ∘ | ∘ | × | ∘ | ∘ | × | ∘ | △ | △ |
| Coda | ∘ | ∘ | × | ∘ | ∘ | ∘ | ∘ | △ | △ |

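
The reliability ratings in Table 9 lend themselves to a simple lookup when deciding which sections may enter a given comparison. The following is a minimal sketch, not the authors' implementation: the names `SECTIONS`, `RATINGS`, and `usable_sections` are hypothetical, and the one-letter codes are an assumed encoding of ‘∘’ (`o`), ‘△’ (`t`), and ‘×’ (`x`).

```python
SECTIONS = ["Korean", "Japanese", "Vietnamese", "Tibetan", "Uyghur",
            "Mongolian", "Persian", "Malay", "Cham"]

# One character per section, in the order of SECTIONS (assumed encoding):
# o = relatively high reliability, t = relatively limited, x = excluded.
RATINGS = {
    "onset":   "ooooooott",
    "nucleus": "ooxooxott",
    "coda":    "ooxoooott",
}


def usable_sections(position: str, include_limited: bool = True) -> list[str]:
    """Return the sections whose reconstructions may be used for `position`."""
    allowed = {"o", "t"} if include_limited else {"o"}
    return [s for s, r in zip(SECTIONS, RATINGS[position]) if r in allowed]


nucleus_sources = usable_sections("nucleus")
strict_coda_sources = usable_sections("coda", include_limited=False)
```

Keeping the ratings in one place makes it straightforward to rerun a comparison with and without the sections of limited reliability.
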

The selected reconstructions were produced using different transcription conventions, and most of them were not presented in IPA. The exceptions are the Japanese, Tibetan, and Vietnamese sections, whose reconstructions were adopted as presented in the original secondary sources. For the remaining sections, non-IPA symbols were systematically converted into their corresponding IPA symbols on the basis of established phonological studies of each language. Table [3.2](https://arxiv.org/html/2605.14480#S3.SS2) summarizes the correspondences adopted in this conversion:

Table 10: Correspondences adopted in converting non-IPA symbols into IPA across the selected reconstructions

IPAUyghurMongol\.PersianMalayCham\\ipafontjyyyyywu, v\\ipafonttɕ\\ipafontč\\ipafontč\\ipafontčch\\ipafontč\\ipafontdʑ\\ipafontǰ\\ipafontǰ\\ipafontǰj\\ipafontǰ\\ipafontʃ\\ipafontš\\ipafontš\\ipafontš\\ipafontçrrrr\\ipafontʐ\\ipafontž\\ipafontžz\\ipafontẓ\\ipafontŋngng\\ipafontɲñ\\ipafontɛai\\ipafontə\\ipafontĕ\\ipafontĕ\\ipafontɔauøöœo’\\ipafontʰhh:\-\-\-

In some cases, the phonetic value of symbols used in the original reconstructions remains ambiguous. For example, the symbol ‘r’ does not always clearly distinguish between [ɾ] and [r], and the symbol ‘a’ may correspond to either [a] or [ɑ]. Such ambiguities were retained rather than arbitrarily resolved, and their potential impact is considered in the subsequent analysis. The revised and converted reconstruction was then parsed so that it could be aligned with every Chinese character of HHY. In most cases, a single syllable or a single phoneme was aligned with one Chinese character; in rare cases, more than one syllable was aligned with a single character. Finally, the parsed reconstruction was manually digitized and aligned with HHY in a single spreadsheet.
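
The symbol conversion summarized in Table 10 can be sketched as a per-section lookup applied with longest-match scanning, so that a Malay digraph such as ‘ch’ is converted as a unit. This is an illustrative sketch only: the tables below are partial and cover just the three rows of Table 10 that have an entry for all five sections (IPA j, tɕ, dʑ), the function `to_ipa` is hypothetical, and the input strings in the usage lines are made-up test strings rather than attested HHY forms.

```python
import unicodedata

# Partial correspondence tables (assumption: only rows j, tɕ, dʑ of Table 10).
_COMMON = {"y": "j", "č": "tɕ", "ǰ": "dʑ"}
_MALAY = {"y": "j", "ch": "tɕ", "j": "dʑ"}
SYMBOL_TABLES = {
    "Uyghur": _COMMON, "Mongolian": _COMMON, "Persian": _COMMON,
    "Cham": _COMMON, "Malay": _MALAY,
}


def to_ipa(form: str, section: str) -> str:
    """Convert source-notation symbols to IPA; unmapped symbols pass through."""
    # Normalize so that precomposed and combining-caron spellings match.
    table = {unicodedata.normalize("NFC", k): v
             for k, v in SYMBOL_TABLES[section].items()}
    form = unicodedata.normalize("NFC", form)
    keys = sorted(table, key=len, reverse=True)  # try longest symbols first
    out, i = [], 0
    while i < len(form):
        for key in keys:
            if form.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:  # no table entry starts here: keep the character as it is
            out.append(form[i])
            i += 1
    return "".join(out)


uyghur_demo = to_ipa("yačǰ", "Uyghur")   # made-up string
malay_demo = to_ipa("chajy", "Malay")    # made-up string
```

Scanning left to right, rather than chaining string replacements, avoids converting a symbol twice; in the Malay table, for instance, ‘y’ becomes IPA j while source ‘j’ becomes dʑ. Ambiguous symbols such as ‘r’ and ‘a’ are left untouched, in line with the decision to retain such ambiguities.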

### 3.3 Digitization and phonological encoding of Huitongguanxi Huayiyiyu

For the purposes of systematic analysis, HHY was digitized and structured as a searchable dataset\. Only the second row of each HHY entry was manually entered into an Excel spreadsheet\. This decision was motivated by two considerations\. First, the third row is missing in a substantial number of language chapters, whereas the second row is consistently attested across all languages included in HHY\. Second, the second row represents the Chinese transcription of the target language’s lexical items, rather than metalinguistic or auxiliary information, and therefore constitutes the most direct and comparable source for phonological analysis\. When the form or order of characters differed across extant versions, a single representative character was selected on the basis of corrections proposed in previous studies or through comparison between the reconstructed target\-language form and the phonetic value of the candidate Chinese characters\. In all cases, characters were entered following the oldest available version, thereby normalizing variation across different editions\.

When a selected character was not supported by Unicode, the closest available character with an equivalent phonetic value was substituted and enclosed in square brackets\. Each transcription character was entered as a separate row\. Additional columns were used to assign indexical identifiers and to align each character with information on Chinese phonology and its IPA reconstruction\. The indexical identifier encodes both the order of the lexical entry in the original text and the position of the character within the entry \(A: word\-initial, B: word\-medial, C: word\-final\), allowing positional effects to be examined systematically\.
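
The row-per-character layout and the positional codes described above can be sketched as follows. The identifier format (`entry-ordinal` plus position letter) is an assumption made for illustration, since the exact format used in the spreadsheet is not stated; the function name `index_characters` is likewise hypothetical, and treating the only character of a one-character entry as word-initial (A) is an assumed convention.

```python
import re


def index_characters(entry_no: int, transcription: str) -> list[dict]:
    """Split a transcription into one row per character with a positional code.

    A = word-initial, B = word-medial, C = word-final. A bracketed
    substitute such as "[X]" is kept together as a single character.
    """
    tokens = re.findall(r"\[[^\]]+\]|\S", transcription)
    last = len(tokens) - 1
    rows = []
    for i, token in enumerate(tokens):
        if i == 0:
            position = "A"  # also used for one-character entries (assumption)
        elif i == last:
            position = "C"
        else:
            position = "B"
        rows.append({
            "id": f"{entry_no}-{i + 1}{position}",  # hypothetical format
            "char": token,
            "position": position,
        })
    return rows


rows = index_characters(1, "把黑失")
```

Storing the position as its own column, in addition to embedding it in the identifier, makes it easy to filter the dataset by word position when examining positional effects.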

Each transcription character was subsequently aligned with Chinese phonological categories appropriate to the fifteenth and sixteenth centuries. The yīnxì 音系 of all characters was first encoded according to zhōngyuányīnyùn 中原音韻 and then converted to reflect phonological developments of the Late Ming period on the basis of Tōdō [[42](https://arxiv.org/html/2605.14480#bib.bib42)] and [[20](https://arxiv.org/html/2605.14480#bib.bib20)]. shēngmǔ 聲母 categories were encoded using traditional category names, whereas yùnmǔ 韻母 categories were represented in Romanized form. This asymmetry is due to methodological considerations: while the yīnxì 音系 of shēngmǔ 聲母 not explicitly discussed in Tōdō [[42](https://arxiv.org/html/2605.14480#bib.bib42)] can be inferred relatively straightforwardly from Middle Chinese phonology, changes affecting yùnmǔ 韻母 are less regular and require closer scrutiny. Accordingly, yùnmǔ 韻母 values were reconstructed with reference to both Tōdō [[42](https://arxiv.org/html/2605.14480#bib.bib42)] and [[20](https://arxiv.org/html/2605.14480#bib.bib20)] in order to approximate fifteenth-century Chinese as closely as possible. For example, kaiyin and yunwei, represented as /i/ and /u/ in Tōdō [[42](https://arxiv.org/html/2605.14480#bib.bib42)], were encoded as /j/ and /w/, respectively, following [[20](https://arxiv.org/html/2605.14480#bib.bib20)].

## 4 Cross-linguistic Analysis of Huitongguanxi Huayiyiyu Transcription

Exceptional correspondences are not excluded simply because they are exceptional\. They are excluded only when the secondary reconstruction lacks independent historical support, depends heavily on modern forms, or conflicts with established diachronic evidence\.

### 4.1 Distribution of Main Transcription

#### 4.1.1 Shengmu

The distribution of MT in onset position shows a high degree of regularity across the HHY corpus. Across multiple language sections, shēngmǔ 聲母 are used in consistent ways that are not reducible to the phonological system of any single target language. Table [4.1.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1) provides a preliminary working summary of shēngmǔ 聲母 transcription patterns across the eight language sections, prior to selective evaluation. The transcription patterns of alveolar-series shēngmǔ 聲母 in the Vietnamese section are not included in that summary because, as Table [4.1.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1) shows, they are not comparable to those observed in the other language sections.

Table 11: Correspondence patterns of selected shēngmǔ 聲母 categories

| Shengmu | zhào 照 | jīng 精 | rì 日 | xīn 心 | shěn 審 | qīng 淸 | chuān 穿 |
|---|---|---|---|---|---|---|---|
| Sound | ʧj, tʂ, z | ʧj, t, ʂ, z | z, ɖ, ɲ | ʂ, tʰ | ʂ, tʰ | s | ʂ |

Table 12: Working summary of shēngmǔ 聲母 transcription patterns across the eight language sections

[Table 12: cell contents not recoverable. Column headings: chun 脣; she 舌, chi 齒; ya 牙; hou 喉. Row headings, in order: Stop (Voiceless: Unaspirated, Aspirated; Voiced: Unaspirated, Aspirated); Affricate (Voiceless: Unaspirated, Aspirated; Voiced: Unaspirated, Aspirated); Fricative (Voiceless, Voiced); Liquid; Trill; Tap; Lateral fricative; Approximant; Lateral approximant; Nasal.]

The distinct distribution in the Vietnamese section can be attributed to diachronic developments in Vietnamese phonology. In particular, Vietnamese /t/ is derived from /*s/, and the Sino-Vietnamese /čy/ reflects multiple shēngmǔ 聲母, including jīngmǔ 精母, zhàomǔ 照母, and qīngmǔ 淸母. For this reason, the Vietnamese section does not provide a reliable basis for identifying general HHY transcription patterns of alveolar onsets and is therefore not used in establishing phonetic values for the alveolar group.

Table [4.1.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1) is organized as follows. The symbols shown in the upper part of each cell largely follow the IPA conventions; among them, only ‘ȵ’ is a non-standard symbol. In the lower part of each cell, the shēngmǔ 聲母 used for transcription are listed without the suffix mǔ 母. When more than one shēngmǔ 聲母 corresponds to a given phonetic value, all relevant shēngmǔ 聲母 are listed. The post-uvular stop /q/ is also transcribed with jiànmǔ 見母 in the Uyghur section and the Persian section; however, this correspondence is omitted from Table [4.1.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1) for the sake of a more concise presentation. In any case, as discussed in A below, the jiànmǔ 見母 transcriptions in the Uyghur and Persian sections are not retained.

Among the shēngmǔ 聲母, míngmǔ 明母, nímǔ 泥母, láimǔ 來母, xiǎomǔ 曉母, and fēimǔ 非母 show transcription patterns that can be accepted without selective filtering. For each of these shēngmǔ 聲母, the transcriptional domain is continuous, and no evidence has been identified that phonetic values within these domains were transcribed using other shēngmǔ 聲母. Their transcriptional domains are indicated in bold in the table and can be summarized as follows: míngmǔ 明母 is used to transcribe bilabial nasals; nímǔ 泥母 is used to transcribe nasals articulated from the alveolar to the palatal region; láimǔ 來母 is used to transcribe liquid consonants; xiǎomǔ 曉母 is used to transcribe fricatives articulated from the velar to the glottal region; fēimǔ 非母 is used to transcribe voiceless labial fricatives.

For the remaining shēngmǔ 聲母, selective acceptance is required. The primary criterion adopted here is the validity of the evidence presented in previous studies that reconstructed and analyzed each language section, specifically whether those studies provide adequate grounds for reconstructing the relevant phones or phonemes. Detailed decisions regarding acceptance are discussed below.

A. jiànmǔ 見母 – duānmǔ 端母 – bāngmǔ 幇母, xīmǔ 溪母 – tòumǔ 透母 – pāngmǔ 滂母

The distribution of stop-series shēngmǔ 聲母 across the language sections is presented in Table [4.1.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1).

Table 13: Correspondence patterns of stop-series

| Section | jiàn 見 | xī 溪 | duān 端 | tòu 透 | bāng 幇 | pāng 滂 |
|---|---|---|---|---|---|---|
| Korean | k | kʰ | t | tʰ | p | pʰ |
| Japan. | g, k | – | t, d | – | b, p | – |
| Vietnam. | g, k, gʰ | kʰ | ɖ, t | tʰ | b, v | – |
| Tibetan | g, k | kʰ | d, t | tʰ | b, p | pʰ |
| Uyghur | g, q | k | d | t | b | p |
| Mongol. | g | k | d | t | b | b |
| Persian | g, q | k | d | t | b | p |
| Malay | g, k | – | d, t | – | b, p | – |
| Cham | g, k | kʰ | d, t | tʰ | b, p | pʰ |


In the Mongolian section, both bāngmǔ 幇母 and pāngmǔ 滂母 have been analyzed as being used to transcribe /b/. This interpretation in the secondary literature is based on the phonological system of Written Mongolian, which has only /b/ as a bilabial stop. Table [4.1.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1) presents the relevant examples from Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)].

Table 14: Examples from the Mongol section

| Index | 1st row | 2nd row | Mongolian |
|---|---|---|---|
| M–1 | 師傅 | 把黑失 | /baɣši/ (文) baɣši |
| | 三 | 忽兒班 | /ɣurban/ (文) ɣurban |


However, according to a previous reconstruction of the Middle Mongolian phonological system, Middle Mongolian bilabial stops exhibited a voicing contrast [[33](https://arxiv.org/html/2605.14480#bib.bib33), p. 64]. On this basis, the reconstruction of bilabial stops proposed in Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)] requires revision. Although it would be desirable to present a revised analysis here, Ochi [[31](https://arxiv.org/html/2605.14480#bib.bib31)] does not provide a complete list of lexical items, which makes systematic reanalysis difficult. Accordingly, this issue is left for future research.

Across the language sections, the transcriptional distribution of jiànmǔ 見母 – duānmǔ 端母 – bāngmǔ 幇母 and xīmǔ 溪母 – tòumǔ 透母 – pāngmǔ 滂母 falls into two major types. First, when the target language has an aspirated stop series, unaspirated stops are transcribed with jiànmǔ 見母 – duānmǔ 端母 – bāngmǔ 幇母, whereas aspirated stops are transcribed with xīmǔ 溪母 – tòumǔ 透母 – pāngmǔ 滂母. This pattern is observed primarily in the Cham, Tibetan, and Vietnamese sections. Among these, the Vietnamese section is excluded from further consideration, as the Vietnamese reflected in this section lacks voiceless bilabial stops, which limits its usefulness for identifying general stop transcription patterns.

Second, when the target language lacks an aspirated stop series, two subpatterns are observed. In one subpattern, voiced stops are transcribed with jiànmǔ 見母 – duānmǔ 端母 – bāngmǔ 幇母, while voiceless stops are transcribed with xīmǔ 溪母 – tòumǔ 透母 – pāngmǔ 滂母. This pattern is observed in the Mongolian, Uyghur, and Persian sections. In the other subpattern, both voiced and voiceless stops are transcribed with jiànmǔ 見母 – duānmǔ 端母 – bāngmǔ 幇母. This pattern is observed in the Korean, Malay, and Japanese sections.
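
The typology described above can be expressed as a small decision rule over the values in Table 13. The sketch below is a simplified heuristic rather than the procedure used in the study: the function `stop_pattern` and its three labels are hypothetical, and only three rows of Table 13 (Tibetan, Uyghur, Malay) are included as sample data.

```python
def stop_pattern(plain_series: set[str], aspirated_series: set[str]) -> str:
    """Classify how a section uses the two stop series of shēngmǔ.

    plain_series: values transcribed with jiàn / duān / bāng.
    aspirated_series: values transcribed with xī / tòu / pāng.
    """
    if not aspirated_series:
        return "merged"      # xī / tòu / pāng unused; one series covers all stops
    if any("ʰ" in value for value in aspirated_series):
        return "aspiration"  # the second series marks aspirated stops
    return "voicing"         # the second series marks voiceless stops


# Sample rows from Table 13 (stop values only).
TABLE_13_SAMPLE = {
    "Tibetan": ({"g", "k", "d", "t", "b", "p"}, {"kʰ", "tʰ", "pʰ"}),
    "Uyghur": ({"g", "q", "d", "b"}, {"k", "t", "p"}),
    "Malay": ({"g", "k", "d", "t", "b", "p"}, set()),
}

patterns = {section: stop_pattern(plain, aspirated)
            for section, (plain, aspirated) in TABLE_13_SAMPLE.items()}
```

For these three rows the rule reproduces the grouping given in the text: Tibetan follows the aspiration-based type, Uyghur the voicing-based subpattern, and Malay the subpattern in which a single series covers both voiced and voiceless stops.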

B. xīnmǔ 心母 – qīngmǔ 淸母, shěnmǔ 審母 – chuānmǔ 穿母

The distribution of the relevant fricative-series shēngmǔ 聲母 across the language sections is presented in Table [4.1.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1).

Table 15: Correspondence patterns of fricatives and affricates

| Section | xīn 心 | shěn 審 | qīng 淸 | chuān 穿 |
|---|---|---|---|---|
| Korean | s | ʃ | ʦʰ | ʧʰ |
| Japan. | s | ʃ | – | – |
| Tibetan | s, z, ɕ, ʑ | ʂ | ʧ | ʨʰ, tʂʰ |
| Uyghur | s | ʃ | – | ʧ |
| Mongol. | s | ʃ | – | ʧ |
| Persian | s | ʃ | – | ʧ |
| Malay | s | – | – | – |
| Cham | s | s | – | dʑ, dʑʰ |


Three points are observed in the transcription patterns shown in Table [4.1.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1). First, the transcription pattern of xīnmǔ 心母 in the Tibetan section differs from those observed in the other language sections. Second, the transcription patterns of shěnmǔ 審母 and chuānmǔ 穿母 in the Cham section differ from those observed in the other language sections. Third, qīngmǔ 淸母 is rarely used.

First, the basis for Nishida [[30](https://arxiv.org/html/2605.14480#bib.bib30)]’s analysis that xīnmǔ 心母 in the Tibetan section was used to transcribe not only /s/ but also /z/, /ɕ/, and /ʑ/ is unclear. These values were inferred with reference to Written Tibetan. In addition, /z/ has already merged with /s/ in the modern Lhasa dialect, and /ɕ/ and /ʑ/ are not directly attested as such in any modern Tibetan dialect. In this study, priority is therefore given to the transcription patterns observed in the other language sections, and the claim that xīnmǔ 心母 in the Tibetan section was also used to transcribe /z/, /ɕ/, and /ʑ/ is not adopted.

Next, the only source that reconstructs the Cham section, Edwards and Blagden \[[9](https://arxiv.org/html/2605.14480#bib.bib9)\], largely reproduces modern Cham, and its principal reference, Aymonier and Cabaton \(1906\), is a twentieth\-century Cham dictionary\. As a result, the phonetic value of items transcribed as s must be traced back on the basis of Proto\-Chamic and modern dialect evidence\. On this basis, the s transcribed with shěnmǔ 審母 may have been closer to \[ʃ\] in actual pronunciation\. For this reason, the Cham section is excluded from the interpretation of the transcription pattern of shěnmǔ 審母\.

Finally, if the symbols reconstructed in Edwards and Blagden \[[9](https://arxiv.org/html/2605.14480#bib.bib9)\] are taken at face value, chuānmǔ 穿母 in the Cham section would be interpreted as transcribing /j/ and /jh/\. Following Aymonier & Cabaton’s \(1906\) description that /j/ corresponds to the Serbo\-Croatian /đ/ and that /jh/ represents a more strongly aspirated variant of /j/, these symbols are rendered here as /dʑ/ and /dʑʰ/\. However, as noted above, Edwards and Blagden \[[9](https://arxiv.org/html/2605.14480#bib.bib9)\] reproduces modern Cham without a historical\-linguistic framework, and Aymonier & Cabaton \(1906\) is likewise a twentieth\-century dictionary\. Given that the reconstructions /dʑ/ and /dʑʰ/ diverge from the transcription patterns of chuānmǔ 穿母 observed in the other language sections, they are not adopted here\.

C\. jīngmǔ 精母 – zhàomǔ 照母 – rìmǔ 日母

Table 16: Correspondence patterns of the rest

| Section | jīng 精 | zhào 照 | rì 日 |
| --- | --- | --- | --- |
| Korean | ʦ, z | ʧ | ʒ |
| Japanese | ʦ, ʣ | ʧ | ʒ |
| Tibetan | ȶ, ȡ, ʦ, ʣ | tʂ, dʐ, ʨ, ʥ | ʐ |
| Uyghur | z | ʤ | ʒ |
| Mongolian | z | ʤ | – |
| Persian | z | ʤ | – |
| Malay | – | ʧ, ʤ | – |
| Cham | ʨ | dʑ | – |

Two points are observed in the transcription patterns shown in Table [4\.1\.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1)\. First, the transcription patterns of jīngmǔ 精母 and zhàomǔ 照母 in the Tibetan section differ from those observed in the other language sections\. Second, the transcription pattern of jīngmǔ 精母 in the Cham section also differs from those observed in the other language sections\.

First, the analysis in Nishida \[[30](https://arxiv.org/html/2605.14480#bib.bib30)\], which claims that jīngmǔ 精母 in the Tibetan section was used to transcribe /ȶ/ and /ȡ/, and that zhàomǔ 照母 was used to transcribe /ʨ/ and /ʥ/, cannot be adopted\. These values were inferred on the basis of Written Tibetan\. In modern Tibetan dialects, however, such reflexes are largely confined to the Lhasa dialect and are otherwise reflected as distinct phonemes\. While such correspondences might be considered if identical transcription patterns were observed in other language sections, the evidence presented in Nishida \[[30](https://arxiv.org/html/2605.14480#bib.bib30)\] alone is insufficient to establish correspondences between jīngmǔ 精母 and /ȶ, ȡ/, or between zhàomǔ 照母 and /ʨ, ʥ/\.

One might attempt to support a correspondence between zhàomǔ 照母 and /ʥ/ on the basis of the Cham section, but the reconstructions proposed in Edwards and Blagden \[[9](https://arxiv.org/html/2605.14480#bib.bib9)\] are difficult to adopt\. The source on which Edwards and Blagden \[[9](https://arxiv.org/html/2605.14480#bib.bib9)\] relies, Aymonier & Cabaton \(1906\), explains the phonetic values of its symbols by analogy with Serbo\-Croatian rather than by providing explicit phonetic descriptions\. For example, the symbol c in Edwards and Blagden \[[9](https://arxiv.org/html/2605.14480#bib.bib9)\], reconstructed here as /ʨ/, is described only as a highly palatalized prepalatal sound, with a pronunciation intermediate between the ‘ti’ of ‘tiare’ and the ‘qui’ of ‘inquiet’ \(cf\. Serbo\-Croatian /ć/\)\. On this basis, it is not possible to determine whether the intended value was \[ʨ\] or \[ʧ\]\. As with the Tibetan case, such correspondences might be considered if they were independently supported by other language sections; however, the available evidence does not provide sufficient grounds for accepting that jīngmǔ 精母 transcribed /ʨ/ or that zhàomǔ 照母 transcribed /ʥ/\.
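One part of the selective evaluation above, the preference for values that recur across language sections, can be approximated mechanically. The sketch below is an illustration under stated assumptions, not the procedure used in the study: the function name `supported_values` and the threshold of two sections are hypothetical, and the input is the zhào 照 column of Table 16 read as plain strings. It deliberately ignores the source\-reliability arguments made in the text, so the notational variants dʑ and ʥ are treated as different strings.

```python
from collections import defaultdict

# zhào 照 column of Table 16, one list of reconstructed values per section.
ZHAO = {
    "Korean": ["ʧ"],
    "Japanese": ["ʧ"],
    "Tibetan": ["tʂ", "dʐ", "ʨ", "ʥ"],
    "Uyghur": ["ʤ"],
    "Mongolian": ["ʤ"],
    "Persian": ["ʤ"],
    "Malay": ["ʧ", "ʤ"],
    "Cham": ["dʑ"],
}


def supported_values(column, min_sections=2):
    """Keep only values attested in at least `min_sections` sections.

    `min_sections=2` is an assumed threshold chosen for illustration.
    """
    sections_by_value = defaultdict(set)
    for section, values in column.items():
        for value in values:
            sections_by_value[value].add(section)
    return {
        value: sorted(sections)
        for value, sections in sections_by_value.items()
        if len(sections) >= min_sections
    }


retained = supported_values(ZHAO)

if __name__ == "__main__":
    for value, sections in retained.items():
        print(value, sections)
```

With this input only ʧ and ʤ survive, while the values found solely in the Tibetan or Cham section drop out. A purely count\-based filter is cruder than the argument in the text, which also weighs how each reconstruction was obtained.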

The transcription patterns retained after this selection are summarized in Table [4\.1\.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1)\. Relative to the contents of Table [4\.1\.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1), transcription patterns not adopted in this study have been removed, while those adopted have been retained\. Table [4\.1\.1](https://arxiv.org/html/2605.14480#S4.SS1.SSS1) is thus the final summary of shēngmǔ 聲母 transcription patterns across the eight language sections, after selective evaluation\.

Table 17: Working summary of shēngmǔ 聲母 transcription patterns across the eight language sections

chun 脣she 舌, chi 齒ya 牙hou 喉stopVoicelessUnaspiratedAspiratedVoicedUnaspiratedAspiratedAffricateVoicelessUnaspiratedAspiratedVoicedUnaspiratedAspiratedFricativeVoicelessVoicedLiquidTrillTapLateral fricativeApproximantLateral approximantNasal

#### 4\.1\.2 Yunmu

The MT distribution in rime position shows a high degree of regularity across the HHY corpus\. Across multiple language sections, the transcription patterns of yùnmǔ 韻母 are used in consistent ways that are not reducible to the phonological system of any single target language\. Table [4\.1\.2](https://arxiv.org/html/2605.14480#S4.SS1.SSS2) provides a preliminary working summary of yùnmǔ 韻母 transcription patterns across the eight language sections, prior to selective evaluation\.

Table 18: Working summary of yùnmǔ 韻母 transcription patterns across the eight language sections

| Section | /a/ | /ə/ | **/jə/** | /ï/ | **/jï/** | **/wo/** | **/u/** | **/\-n/** | **/\-ŋ/** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Korean | a, ʌ | ʌ, ɨ, ə | i | ɨ | i | o | u | n | ŋ |
| Japanese | a | a, o | i | i, u | i | o | u | n | ŋ |
| Tibetan | a | a, o | i | i | i | o | u | n | ŋ |
| Uyghur | a | ä, a | i | – | i | o | u | n | ŋ |
| Persian | a, aː | a | i | – | i | o | u, uː | n | ŋ |
| Malay | a | ə, o | i | i | i | – | u | n | ŋ |
| Cham | a | ɔw, a | i | – | i, iː | – | u | n | ŋ |

The yùnmǔ 韻母 /jə/, /jï/, /wo/, and /u/, as well as the codas /\-n/ and /\-ŋ/, show stable transcription patterns that do not require selective evaluation\. These items are indicated in bold in the table\. The yùnmǔ 韻母 /jə/ and /jï/ are used to transcribe the front high vowel ‘i\.’ The yùnmǔ 韻母 /wo/ is used to transcribe the back mid vowel ‘o,’ and /u/ is used to transcribe the back high vowel ‘u\.’ Finally, the yùnwěi 韻尾 /\-n/ and /\-ŋ/ are used to transcribe ‘n’ and ‘ŋ,’ respectively\. By contrast, the transcription patterns of the yùnmǔ 韻母 /a/, /ə/, and /ï/ require further examination\. At first glance, the transcription pattern of /a/ appears straightforward; however, a closer look shows that this is not the case\. Across the language sections, vowels transcribed with /a/ are consistently represented as ‘a,’ but their precise phonetic values cannot be determined with certainty\. Based on the secondary sources consulted in this study, it is nevertheless clear that these vowels belonged to the low vowel range\.

A\. yùnmǔ 韻母 /a/ \- /ə/

Table 19: Distribution of vowel correspondences across language sections

Section ə a o u ä ɔw Total; Korean; Japan \-635513\-\-144; Tibetan 859326238; Uyghur 7126181242; Persian 230\-8\-277; Malay 45318787; Cham \-37\-945126

In Table [19](https://arxiv.org/html/2605.14480#S4.T19), Total refers to the number of transcription characters excluding ST and unresolved items\. The table presents only the two most frequent correspondences rather than all transcription patterns\. The case of u is included as an exception because it appears across the transcription patterns of all the Yiyu materials\.

The most noteworthy case in Table [19](https://arxiv.org/html/2605.14480#S4.T19) is that of the Malay section\. The yùnmǔ 韻母 /ə/ was not only used to transcribe the vowel /ə/ in Malay, but also shows a genuine one\-to\-one correspondence with it\. Although it was occasionally used to transcribe /o/ as well, this correspondence is less significant, since /o/ does not appear to have belonged to the phonemic inventory of native Malay vocabulary\. The transcription pattern in the Malay section therefore strongly suggests that the transcriptional value of the yùnmǔ 韻母 /ə/ was close to \[ə\]\.

This perspective also allows us to narrow the transcriptional range of the yùnmǔ 韻母 /a/\. In the Japanese, Tibetan, and Uyghur sections, the vowel a transcribed by the yùnmǔ 韻母 /a/ was also sometimes transcribed with the yùnmǔ 韻母 /ə/\. Moreover, the Chinese rhyme /ə/ was used not only for a, but also for relatively back vowels such as o, u, and ä\. Since the transcriptional value of the yùnmǔ 韻母 /ə/ has been identified as \[ə\], the transcriptional range of the yùnmǔ 韻母 /a/ must have been located further back than \[ə\]\.

Table 20: Distribution of front vowel correspondences across language sections

Section i u Total; Korean; Japan 442873; Tibetan 37\-37; Uyghur \-5; Persian \-3; Malay \-16; Cham \-3

Among the vowel correspondences, the transcription patterns found in the Japanese section are particularly informative\. Although the yùnmǔ 韻母 /ï/ does not show a consistent one\-to\-one correspondence with either i or u, it was used to transcribe both vowels\. This suggests that its transcriptional value was likely located somewhere between the two\.

It should also be noted that many language sections could not be included in the present discussion\. However, this does not mean that the yùnmǔ 韻母 /ï/ was entirely absent from their transcription systems\. The reason these sections are marked only with “–” is that the yùnmǔ 韻母 /ï/ appears exclusively in ST rather than in the main transcriptions\. A similar tendency can be observed in the Tibetan section, where the yùnmǔ 韻母 /ï/ occurs more frequently in ST than in the transcription of i itself\. These ST\-only uses likely reflect meaningful aspects of the transcription system, although a full discussion of their implications is beyond the scope of the present study\.

Based on the findings discussed so far, Figure [1](https://arxiv.org/html/2605.14480#S4.F1) summarizes the phonetic values or phonetic ranges represented by each yùnmǔ 韻母 in the transcription system of huáyíyìyǔ 華夷譯語\.

![Refer to caption](https://arxiv.org/html/2605.14480v1/figure1_yunmu.png)

Figure 1: Approximate phonetic ranges of the Chinese yunmu in HHY transcription

The shaded area in the upper part of Figure 1 represents the transcriptional range of the yùnmǔ 韻母 /ï/, while the shaded area in the lower part represents the transcriptional range of the yùnmǔ 韻母 /a/\. The symbols in parentheses \(i, ɨ, u, e, ə, o, a, ɑ\) are IPA symbols indicating the phonetic values associated with each position\.

### 4\.2 Distribution of Supplementary Transcription

A comprehensive summary of the ST patterns observed across the eight language sections is presented in Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below\. As noted earlier, the Vietnamese section was considered only in relation to MT patterns in shēngmǔ 聲母 transcription\.

Table 21: Summary of shēngmǔ 聲母 transcription patterns

Place\-of\-articulation columns: chun 脣; she 舌, chi 齒; ya 牙; hou 喉 \(Bilabial, Labio\-dental, Dental, Alveolar, Post\-alveolar, Alveo\-palatal, Velar, Postvelar, Glottal\)\.

| Manner | Type | Sound and shēngmǔ 聲母 |
| --- | --- | --- |
| Stop | Voiceless | p 幇滂; t 端透; k 溪見; q 溪見 |
| Stop | Voiced | b 幇; d 端; g 見 |
| Affricate | Voiceless | ʧ 穿照 |
| Affricate | Voiced | ʤ 照日 |
| Fricative | Voiceless | f 非; s 心; ʃ 審; x 曉; h 曉 |
| Fricative | Voiced | z 精; ʒ 日; ɣ 影溪見曉 |
| Liquid | Trill | r 兒來 |
| Liquid | Tap | ɾ 兒來 |
| Lateral approximant | | l 來兒 |
| Nasal | | m 明; n /\-n/ |

The basic structure of Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) follows the same format as the preceding table summarizing shēngmǔ 聲母 transcription patterns\. However, there are several differences\. First, ér 兒 refers not to the transcription character ér 兒 itself, but to the entire category of érhuà\-rhyme \(兒化韻\)\. Second, /\-n/ refers to ST characters containing the yùnwěi 韻尾 /\-n/\. Although the table is based on a complete analysis of all relevant items, isolated forms that appear to reflect scribal or transcriptional errors were excluded in order to present the patterns more clearly\. Meanwhile, HHY appears to have employed a distinct set of transcription characters specifically for ST\. The transcription characters used for ST in each language section are summarized in Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below\.

Table 22: ST characters by shēngmǔ 聲母 category

| Category | ST characters |
| --- | --- |
| bāngmǔ 幇母 | 卜, 補, 不, 白, 䋠 |
| pāngmǔ 滂母 | 批 |
| duānmǔ 端母 | 的, 答, 得, 都 |
| tòumǔ 透母 | 忒, 剔, 惕, 禿, 帖 |
| jiànmǔ 見母 | 格, 革, 吉, 艮, 故, 果 |
| xīmǔ 溪母 | 克, 闊, 乞, 苦 |
| fēimǔ 非母 | 夫, 伏 |
| xīnmǔ 心母 | 思, 習, 糸, 西, 速, 桑 |
| shěnmǔ 審母 | 失 |
| jīngmǔ 精母 | 子, 聚, 則 |
| rìmǔ 日母 | 日 |
| chuānmǔ 穿母 | 赤, 除, 出 |
| zhàomǔ 照母 | 只, 褚 |
| xiǎomǔ 曉母 | 黑, 諕, 哈, 蛤, 吸 |
| yǐngmǔ 影母 | 額, 兒 |
| erhuayun 兒化韻 | 兒, 二 |
| láimǔ 來母 | 力, 勒, 里, 剌, 魯, 綠, 羅, 路, 弄, 利 |
| míngmǔ 明母 | 密, 母, 木 |
| /\-n/ | 音 |

However, not all transcription characters listed in Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) can be regarded as genuine ST transcription characters\. As discussed above, some forms occur only in a very limited number of examples and do not appear to have been recognized as ST by the compilers themselves\. It is therefore necessary to distinguish such cases from transcription characters that were systematically used for ST in HHY\. In the present study, ST transcription characters were identified on the basis of two criteria\.

First, as the strongest criterion, we examined whether a given transcription character was used for ST in more than one language section\. Since each language section was compiled by a different bureau \(guǎn 館\), the sections were likely prepared by different compilers or working groups\. Although no direct records concerning the compilation process of HHY survive, this can be reasonably inferred from the officials associated with each bureau listed in the London version manuscript\. If the same transcription character was independently used for ST across different language sections compiled by different individuals or institutions, it is reasonable to assume that the character functioned as an established ST transcription character at the time\.

Second, if a transcription character appeared in only a single language section, we examined whether it was used in more than one lexical item\. Even if a character occurs in many entries, if all instances involve the transcription of the same lexical morpheme, the pattern is better explained as a consequence of HHY’s tendency toward orthographic consistency rather than as evidence that the character functioned as a dedicated ST transcription character\.
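The two criteria can be read as a simple decision procedure over attestation records. The sketch below is one possible formalization, not the authors' implementation: the `Attestation` record, the `classify` function, and the result labels are hypothetical, and the sample records are placeholders apart from the 西 case, which follows the later discussion of the Uyghur item eski\. The sketch also omits the frequency considerations that the text applies to sparsely attested characters.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Attestation(NamedTuple):
    character: str  # transcription character used for ST
    section: str    # language section in which it occurs
    lexeme: str     # lexical item being transcribed


def classify(attestations: Iterable[Attestation]) -> Dict[str, str]:
    """Apply criterion 1 (several sections), then criterion 2 (several lexemes)."""
    sections = defaultdict(set)
    lexemes = defaultdict(set)
    for a in attestations:
        sections[a.character].add(a.section)
        lexemes[a.character].add(a.lexeme)

    result = {}
    for character in sections:
        if len(sections[character]) > 1:
            result[character] = "criterion 1: multiple sections"
        elif len(lexemes[character]) > 1:
            result[character] = "criterion 2: multiple lexical items"
        else:
            result[character] = "not established"
    return result


# Placeholder records; lexeme labels such as "lexeme-a" are invented.
SAMPLE = [
    Attestation("思", "Persian", "lexeme-a"),
    Attestation("思", "Uyghur", "lexeme-b"),
    Attestation("速", "single-section", "lexeme-c"),
    Attestation("速", "single-section", "lexeme-d"),
    Attestation("西", "Uyghur", "eski"),
    Attestation("西", "Uyghur", "eski"),
    Attestation("西", "Uyghur", "eski"),
]

result = classify(SAMPLE)

if __name__ == "__main__":
    for character, label in result.items():
        print(character, label)
```

Counting distinct lexemes rather than entries is what separates a character that recurs in three entries of one word from a character that is reused across different words.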

A\. jiànmǔ 見母 \- xīmǔ 溪母 \- yǐngmǔ 影母 \- xiǎomǔ 曉母

In this section, we examine jiànmǔ 見母 and xīmǔ 溪母, which were primarily used to represent ST velar stops\. Since jiànmǔ 見母 and xīmǔ 溪母 were also used, together with yǐngmǔ 影母 and xiǎomǔ 曉母, to represent ST \[ɣ\], yǐngmǔ 影母 and xiǎomǔ 曉母 will also be considered here\.

We begin by identifying the transcription characters that functioned as ST characters\. Table[4\.2](https://arxiv.org/html/2605.14480#S4.SS2)below provides a summary of the distributional patterns of these transcription characters, regardless of the specific phonemes or phones they were used to represent\.

Table 23: Distribution of selected ST characters across language sections

| Category | Character | Counts in non\-empty cells \(column order: Persian, Uyghur, Mongolian, Cham\) | Total |
| --- | --- | --- | --- |
| jiàn 見 | 格 | 1 | 1 |
| jiàn 見 | 革 | 27, 1 | 28 |
| jiàn 見 | 吉 | 1 | 1 |
| jiàn 見 | 艮 | 1 | 1 |
| jiàn 見 | 故 | 1 | 1 |
| jiàn 見 | 果 | 1 | 1 |
| xī 溪 | 克 | 22, 42, ⊙ | 64\+⊙ |
| xī 溪 | 闊 | 2 | 2 |
| xī 溪 | 乞 | 1 | 1 |
| xī 溪 | 苦 | 1, 3 | 4 |
| yǐng 影 | 額 | 11 | 11 |
| yǐng 影 | 兒 | 2 | 2 |
| xiǎo 曉 | 黑 | 18, ⊙ | 18\+⊙ |
| xiǎo 曉 | 蛤 | ⊙ | ⊙ |
| xiǎo 曉 | 諕 | 49 | 49 |
| xiǎo 曉 | 哈 | 1 | 1 |
| xiǎo 曉 | 吸 | 1 | 1 |

According to the criteria proposed above, the only jiànmǔ 見母 character that can be identified as an ST transcription character is gé 革\. In the xīmǔ 溪母 group, kè 克 qualifies as an ST character\. However, kǔ 苦 should not be disregarded entirely, since it occurs in more than one language section, albeit in a limited number of examples\. As for ér 兒, it was primarily used in the Uyghur section to represent \[r\] in forms that had already undergone erhua\. Accordingly, the only yǐngmǔ 影母 character that can be regarded as an ST transcription character is é 額\. In the xiǎomǔ 曉母 group, hēi 黑 and xià 諕 qualify as ST transcription characters\. gé 蛤, by contrast, was excluded because the entire wordlist in the Mongolian section cannot be verified, making it difficult to determine whether the character was systematically used across different lexical items\. Moreover, it is not attested in any other language section\.

Taking these observations into account, Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below summarizes the distributional patterns of jiànmǔ 見母, xīmǔ 溪母, yǐngmǔ 影母, and xiǎomǔ 曉母 when used for ST across the language sections\.

Table 24: Correspondence patterns of jiàn 見, xī 溪, yǐng 影, and xiǎo 曉

PersianUyghurMongolianjiàn見母\\ipafontg, q,\\ipafont–xī溪母\\ipafontk\\ipafontk, q,\\ipafontkyǐng影母\\ipafont––xiǎo曉母\\ipafonth, x\\ipafontx\\ipafont
First, it is noteworthy that none of the ST targets shown in Table[4\.2](https://arxiv.org/html/2605.14480#S4.SS2)involve unreleased stops\. The language sections in which the voiceless sounds \[k\] and \[q\] were represented through ST are the Persian and Uyghur sections, and in these languages \[k\] and \[q\] did not undergo unreleased realization even in word\-final positions\.

The fact that \[q\] and \[ɣ\] were represented differently across language sections suggests a phonetic distinction between them\. However, there appears to be little research on the precise phonetic values of these sounds in word\-final position in fifteenth\-century Persian, Uyghur, and Mongolian\. In the case of \[ɣ\], it has been shown that in fifteenth\-century Uyghur and Mongolian its phonemic status was not firmly established even in onset position, and that it occurred only under specific phonological conditions \[[31](https://arxiv.org/html/2605.14480#bib.bib31)\]\. Accordingly, the ST patterns involving \[ɣ\] in the Uyghur and Mongolian sections will not be taken into account here\. In Persian, meanwhile, \[q\] is attested in twentieth\-century Persian as the result of a merger between the native phoneme /ɣ/ and /k₂/, which occurred in Arabic loanwords, but this merger had not yet become phonologized in the thirteenth century \[[32](https://arxiv.org/html/2605.14480#bib.bib32)\]\. This raises the possibility that forms interpreted as ST representations of \[q\] in the Persian section may in fact reflect /ɣ/ or /k₂/\. The ST patterns involving \[q\] in the Persian section will therefore also be excluded from consideration\.

B\. bāngmǔ 幇母 – pāngmǔ 滂母 – fēimǔ 非母

In this section, we examine bāngmǔ 幇母 and pāngmǔ 滂母, which were primarily used to represent ST bilabial stops, together with fēimǔ 非母, which was mainly used to represent ST bilabial fricatives\.

We begin by identifying the transcription characters that functioned as ST characters\. Table[4\.2](https://arxiv.org/html/2605.14480#S4.SS2)below summarizes the distributional patterns of the relevant transcription characters regardless of the particular phonemes or phones they were used to represent\.

Table 25: Distribution of ST characters associated with bāng 幇, pāng 滂, and fēi 非

| Category | Character | Counts in non\-empty cells \(column order: Persian, Uyghur, Tibetan, Mongolian, Cham\) | Total |
| --- | --- | --- | --- |
| bāng 幇 | 卜 | 32, 11, 20 | 63 |
| bāng 幇 | 不 | ⊙, 6 | 6\+⊙ |
| bāng 幇 | 補 | 9, 1 | 10 |
| bāng 幇 | 白 | 1, 3 | 4 |
| bāng 幇 | 䋠 | 1 | 1 |
| pāng 滂 | 批 | 1 | 1 |
| fēi 非 | 夫 | 3 | 3 |
| fēi 非 | 伏 | 25 | 25 |

According to the criteria proposed above, bo 卜 and bù 不 in the bāngmǔ 幇母 group qualify as ST transcription characters\. However, bǔ 補 and bái 白 should not be disregarded entirely, since they occur in more than one language section, albeit in a limited number of examples\. The character 䋠 may be an erroneous form of 補, although this can only be confirmed through direct examination of the base manuscript\. In the case of pāngmǔ 滂母, only a single example of ST involving pi 批 is attested\. It therefore remains unclear whether pāngmǔ 滂母 had a dedicated ST transcription character\. As for fēimǔ 非母, both fū 夫 and fú 伏 qualify as ST transcription characters\. Taking these observations into account, Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below summarizes the distributional patterns of bāngmǔ 幇母, pāngmǔ 滂母, and fēimǔ 非母 when used for ST across the language sections\.

Table 26: Correspondence patterns of bāng 幇 and fēi 非

| | Persian | Uyghur | Tibetan | Mongolian | Cham |
| --- | --- | --- | --- | --- | --- |
| bāng 幇母 | b, p | b, p | b, p | b | b, p |
| fēi 非母 | f | f | – | – | – |

One particularly noteworthy pattern in Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) concerns \[p\]\. As discussed above for \[k\], \[g\], and \[q\], and as will also be seen below for \[d\] and \[t\], ST representations of stop consonants generally show a tendency for voiceless sounds to be represented by aspirated shēngmǔ 聲母 series and voiced sounds by unaspirated series\. However, \[p\], despite being voiceless, was represented by the unaspirated bāngmǔ 幇母 series\. One might argue that, although reconstructed phonologically as /p/, the relevant forms may in fact have been phonetically realized as \[b\] under particular phonological conditions\. However, as shown in Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below, cases can be identified in which \[p\] occurring in environments that do not permit voicing was nevertheless represented through bāngmǔ 幇母\.

Table 27: Representative examples of transcription entries

| Index | 1st row | 2nd row | Target |
| --- | --- | --- | --- |
| U\-34 | 雲起 | 課克科卜 | kök qop |
| U\-703 | 薄 | 與卜哈納 | yupqana |
| T\-6 | 雲 | 卜吝 | /prin/ lit\. sprin |
| P\-1622 | 左 | 徹卜 | čap |

C\. duānmǔ 端母 – tòumǔ 透母

In this section, we examine duānmǔ 端母 and tòumǔ 透母, which were primarily used to represent ST alveolar stops\. We begin by identifying the transcription characters that functioned as ST characters\. Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below summarizes the distributional patterns of the relevant transcription characters regardless of the particular phonemes or phones they were used to represent\.

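Table 28: Distribution of ST characters associated with duān 端 and tòu 透

| Category | Character | Counts in non\-empty cells \(column order: Persian, Uyghur, Malay, Cham, Japanese\) | Total |
| --- | --- | --- | --- |
| duān 端 | 的 | 2, 3, 2, 10 | 17 |
| duān 端 | 得 | 58 | 58 |
| duān 端 | 都 | 1 | 1 |
| duān 端 | 答 | 1 | 1 |
| tòu 透 | 忒 | 26, 27 | 53 |
| tòu 透 | 剔 | 14 | 14 |
| tòu 透 | 惕 | 6 | 6 |
| tòu 透 | 禿 | 1 | 1 |
| tòu 透 | 帖 | 1 | 1 |

According to the criteria proposed above, de 的 and de 得 in the duānmǔ 端母 group qualify as ST transcription characters\. Within tòumǔ 透母, tè 忒 is the most robustly attested ST transcription character, while tī 剔 and tì 惕 also function as valid ST transcription characters\. Taking these observations into account, Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below summarizes the distributional patterns of duānmǔ 端母 and tòumǔ 透母 across the language sections\.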

Table 29: Correspondence patterns of duān 端 and tòu 透

| | Persian | Uyghur | Malay | Cham | Japanese |
| --- | --- | --- | --- | --- | --- |
| duān 端母 | d | d | t, ʧ | t | t |
| tòu 透母 | t | t | – | – | – |

First, the cases of \[t\] and \[ʧ\] represented by duānmǔ 端母 in the Malay section should be excluded from the discussion\. This is because the segments reconstructed as \[ʧ\] and \[t\] in Edwards and Blagden \[[8](https://arxiv.org/html/2605.14480#bib.bib8)\], which presents the forms in Modern Malay, may have corresponded to different phonemes such as /d/ in fifteenth\-century Malay\. According to \[[26](https://arxiv.org/html/2605.14480#bib.bib26), pp\. 55–56\], syllable\-final /t/ and /d/ in Modern Malay are unreleased in coda position\. However, it remains unclear when this neutralization or unreleased realization emerged historically\. Consequently, it is difficult to rule out the possibility that forms reconstructed as \[t\] in Edwards and Blagden \[[8](https://arxiv.org/html/2605.14480#bib.bib8)\] may actually have corresponded to /d/ during the period represented in the Malay section\. Likewise, the case of \[t\] represented by duānmǔ 端母 in the Japanese section should also be excluded from consideration\. Previous studies do not agree on whether de 的 in these items represented ST \[t\]\. Matsumoto and Ding \[[27](https://arxiv.org/html/2605.14480#bib.bib27)\] reconstructed these forms as representing \[t\], whereas Watanabe \[[43](https://arxiv.org/html/2605.14480#bib.bib43), p\. 8\] instead interpreted them as corresponding to チ\.

The Cham section is the only language section in which the unaspirated shēngmǔ 聲母 category duānmǔ 端母 was used to represent the voiceless sound \[t\]\. Nevertheless, this transcription pattern in the Cham section cannot simply be disregarded\. The \[t\] represented by duānmǔ 端母 in the Cham section occurs as the first element of an initial consonant cluster, and Cham initial clusters are generally understood to have arisen through vowel loss\. It is therefore necessary to assume that the phonetic value of this segment was indeed \[t\] in fifteenth\-century Cham as well\.

Even so, two considerations suggest that tòumǔ 透母 should be regarded as the primary ST shēngmǔ 聲母 for \[t\], whereas duānmǔ 端母 should be regarded as the primary ST shēngmǔ 聲母 for \[d\]\. First, in the Cham section, no cases are attested in which \[d\], contrasting with \[t\] in voicing, is represented through ST\. Second, in both the Persian section and the Uyghur section, where both \[t\] and \[d\] are represented through ST, they correspond systematically to tòumǔ 透母 and duānmǔ 端母 respectively\. Therefore, although the correspondence between \[t\] and duānmǔ 端母 inferred from the Cham section should not be ignored, it is better treated as a secondary correspondence\.

D\. xīnmǔ 心母 – shěnmǔ 審母

We begin by identifying the transcription characters that functioned as ST characters\. Table[4\.2](https://arxiv.org/html/2605.14480#S4.SS2)below summarizes the distributional patterns of the relevant transcription characters regardless of the phonetic values they were used to represent\.

Table 30: Distribution of ST characters associated with xīn 心 and shěn 審

| Category | Character | Counts in non\-empty cells \(column order: Persian, Uyghur, Tibetan, Mongolian, Malay\) | Total |
| --- | --- | --- | --- |
| xīn 心 | 思 | 51, 36, 13, ⊙, 14 | 114\+⊙ |
| xīn 心 | 習 | 7, 1 | 8 |
| xīn 心 | 糸 | 3 | 3 |
| xīn 心 | 西 | 3 | 3 |
| xīn 心 | 速 | 2 | 2 |
| xīn 心 | 桑 | 1 | 1 |
| shěn 審 | 失 | 41, 70, ⊙ | 111\+⊙ |

According to the criteria proposed above, the most robustly attested ST transcription character in the xīnmǔ 心母 group is sī 思\. However, xí 習, sī 糸, and sù 速 should not be disregarded, since they were used across different lexical items\. By contrast, xī 西 is excluded from the set of ST transcription characters for xīnmǔ 心母, because all of its occurrences are limited to the representation of \[s\] in the same lexical item eski \(Uyghur\-150, 491, 713\)\. The ST transcription character for shěnmǔ 審母 is shī 失\. Taking these observations into account, Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below summarizes the distributional patterns of xīnmǔ 心母 and shěnmǔ 審母 across the language sections\.

Table 31: Correspondence patterns of xīn 心 and shěn 審

| | Persian | Uyghur | Tibetan | Mongolian | Malay |
| --- | --- | --- | --- | --- | --- |
| xīn 心母 | s | s | s | s | s |
| shěn 審母 | ʃ | ʃ | – | ʃ | – |

The ST patterns associated with xīnmǔ 心母 and shěnmǔ 審母 are highly systematic and require little further discussion\. As shown in the table above, xīnmǔ 心母 was used to represent \[s\], whereas shěnmǔ 審母 was used to represent \[ʃ\]\.

E\. chuānmǔ 穿母 – zhàomǔ 照母, jīngmǔ 精母 – rìmǔ 日母

We begin by identifying the transcription characters that functioned as ST characters\. Table[4\.2](https://arxiv.org/html/2605.14480#S4.SS2)below summarizes the distributional patterns of the relevant transcription characters regardless of the phonetic values they were used to represent\.

Table 32: Distribution of ST characters associated with chuān 穿, zhào 照, rì 日, and jīng 精

| Category | Character | Counts in non\-empty cells \(column order: Persian, Uyghur\) | Total |
| --- | --- | --- | --- |
| chuān 穿 | 赤 | 9 | 9 |
| chuān 穿 | 除 | 5 | 5 |
| chuān 穿 | 出 | 2 | 2 |
| zhào 照 | 只 | 7 | 7 |
| zhào 照 | 褚 | 1 | 1 |
| rì 日 | 日 | 2 | 2 |
| jīng 精 | 子 | 19, 58 | 77 |
| jīng 精 | 聚 | 1 | 1 |
| jīng 精 | 則 | 1 | 1 |

According to the criteria proposed above, the ST transcription character for chuānmǔ 穿母 appears to have been chì 赤\. Although chú 除 is excluded because it was used exclusively to represent the same lexical item üč \(Uyghur\-193, 238, 250, 265, 804\), chu 出 should not be disregarded, since it occurs in different lexical items\. The ST transcription character for zhàomǔ 照母 was zhǐ 只, that for rìmǔ 日母 was rì 日, and that for jīngmǔ 精母 was zǐ 子\. Taking these observations into account, Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below summarizes the distributional patterns of chuānmǔ 穿母, zhàomǔ 照母, jīngmǔ 精母, and rìmǔ 日母 across the language sections\.

Table 33: Correspondence patterns of chuān 穿, zhào 照, jīng 精, and rì 日

| | Persian | Uyghur |
| --- | --- | --- |
| chuān 穿母 | – | ʧ |
| zhào 照母 | ʤ | – |
| jīng 精母 | z | z |
| rì 日母 | ʤ, ʒ | – |

First, the ST patterns associated with chuānmǔ 穿母, zhàomǔ 照母, and jīngmǔ 精母 are highly systematic and require little further discussion\. As shown in the table above, chuānmǔ 穿母 was used to represent \[ʧ\], zhàomǔ 照母 to represent \[ʤ\], and jīngmǔ 精母 to represent \[z\]\.

The case of rìmǔ 日母 is somewhat different, since only two examples are attested and they represent different phonetic values\. However, within the Persian section, \[ʤ\] is consistently represented by zhàomǔ 照母 across multiple lexical items\. This suggests that \[ʒ\], rather than \[ʤ\], should be regarded as the sound corresponding to rìmǔ 日母\.

F\. láimǔ 來母 – érhuà\-rhyme \(兒化韻\)

We begin by identifying the transcription characters that functioned as ST characters\. Table[4\.2](https://arxiv.org/html/2605.14480#S4.SS2)below summarizes the distributional patterns of the relevant transcription characters regardless of the phonetic values they were used to represent\.

Table 34: Distribution of ST characters associated with lái 來 and ér 兒

| Category | Character | Counts in non\-empty cells \(column order: Persian, Uyghur, Tibetan, Mongolian, Malay, Cham\) | Total |
| --- | --- | --- | --- |
| lái 來 | 勒 | 2, 16, ⊙ | 18 |
| lái 來 | 力 | 43, 14, 1 | 58 |
| lái 來 | 里 | 29, ⊙ | 29 |
| lái 來 | 剌 | 1, 3 | 4 |
| lái 來 | 魯 | 2 | 2 |
| lái 來 | 綠 | 1 | 1 |
| lái 來 | 羅 | 1 | 1 |
| lái 來 | 路 | 1 | 1 |
| lái 來 | 弄 | 1 | 1 |
| lái 來 | 利 | 1 | 1 |
| ér 兒 | 兒 | 146, 249, 124, ⊙, 19, 10 | 548\+⊙ |
| ér 兒 | 二 | ⊙ | ⊙ |

According to the criteria proposed above, the most robustly attested ST transcription characters for láimǔ 來母 are lè 勒, lì 力, and lǐ 里\. However, là 剌 should not be disregarded, since it was used for ST in more than one language section, albeit in a limited number of examples\. The ST transcription characters associated with érhuà\-rhyme \(兒化韻\) are ér 兒 and èr 二\. Since èr 二 was primarily used in the Korean section to represent ST /l/, it may reasonably be regarded as an ST transcription character for érhuà\-rhyme \(兒化韻\)\. Taking these observations into account, Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below summarizes the distributional patterns of láimǔ 來母 and érhuà\-rhyme \(兒化韻\) across the language sections\.

Table 35: Correspondence patterns of lái 來 and érhuà 兒化韻

| | Persian | Uyghur | Tibetan | Mongolian | Malay | Cham |
| --- | --- | --- | --- | --- | --- | --- |
| lái 來母 | l, ɾ | l, ɾ | l | l, ɾ | r | – |
| érhuà 兒化韻 | ɾ, l | ɾ, l | r | ɾ, l | r | r |

At first glance, the table above may suggest that láimǔ 來母 and érhuà\-rhyme \(兒化韻\) were used inconsistently to represent \[l\], \[r\], and \[ɾ\]\. However, when the frequency counts summarized in the Appendix are taken into consideration, a clearer tendency emerges: \[l\] shows the strongest correspondence with láimǔ 來母, whereas \[r\] and \[ɾ\] correspond most consistently to érhuà\-rhyme \(兒化韻\)\. At the same time, cases in which \[l\] is represented through érhuà\-rhyme \(兒化韻\) and cases in which \[r\] or \[ɾ\] are represented through láimǔ 來母 are also sufficiently frequent that they should be treated as secondary correspondences rather than dismissed altogether\.
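The distinction between primary and secondary correspondences can be illustrated with a frequency ranking. The sketch below is only a schematic reading of the paragraph above: the function `rank_correspondences`, the 10% cut\-off for a secondary correspondence, and every number in `PLACEHOLDER_COUNTS` are assumptions made for illustration. The actual counts are those of the Appendix and are not reproduced here.

```python
def rank_correspondences(counts, secondary_share=0.1):
    """Split each sound's categories into a primary and secondary ones.

    `counts` maps a sound to {category: frequency}. The most frequent
    category is primary; others reaching `secondary_share` of the sound's
    total are kept as secondary. The 0.1 threshold is an assumption.
    """
    ranked = {}
    for sound, by_category in counts.items():
        total = sum(by_category.values())
        ordered = sorted(by_category.items(), key=lambda kv: kv[1], reverse=True)
        primary = ordered[0][0]
        secondary = [cat for cat, n in ordered[1:] if n / total >= secondary_share]
        ranked[sound] = {"primary": primary, "secondary": secondary}
    return ranked


# Placeholder frequencies, NOT the Appendix data.
PLACEHOLDER_COUNTS = {
    "l": {"來": 80, "兒化韻": 20},
    "r": {"兒化韻": 90, "來": 15},
    "ɾ": {"兒化韻": 70, "來": 12},
}

ranked = rank_correspondences(PLACEHOLDER_COUNTS)

if __name__ == "__main__":
    for sound, info in ranked.items():
        print(sound, info)
```

With these invented inputs, 來 comes out as primary for \[l\] and 兒化韻 as primary for \[r\] and \[ɾ\], with the opposite pairing retained as secondary in each case, which mirrors the shape of the conclusion stated above without claiming its figures.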

G\. míngmǔ 明母, /\-n/

We begin by identifying the transcription characters that functioned as ST characters\. Table[4\.2](https://arxiv.org/html/2605.14480#S4.SS2)below summarizes the distributional patterns of the relevant transcription characters regardless of the phonetic values they were used to represent\.

Table 36: Distribution of ST characters associated with míng 明 and /\-n/

| Category | Character | Counts in non\-empty cells \(column order: Persian, Uyghur\) | Total |
| --- | --- | --- | --- |
| míng 明 | 密 | 11 | 11 |
| míng 明 | 母 | 1 | 1 |
| míng 明 | 木 | 1 | 1 |
| /\-n/ | 音 | 12 | 12 |

According to the criteria proposed above, only mì 密 qualifies as an ST transcription character belonging to míngmǔ 明母\. However, mù 木 deserves special attention, since it was used in Jilin leishi 鷄林類事 to represent the Korean coda /\-m/\. The ST transcription character containing the yùnwěi 韻尾 /\-n/ is yīn 音\. This is reminiscent of the use of 音 as a final consonant marker in Hyangchal transcription\. Taking these observations into account, Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below summarizes the distributional patterns of míngmǔ 明母 characters and transcription characters containing the yùnwěi 韻尾 /\-n/ across the language sections\.

Table 37: Correspondence patterns of míng 明 and /\-n/

| | Persian | Uyghur |
| --- | --- | --- |
| míng 明母 | m | m |
| /\-n/ | n | – |

The ST patterns associated with míngmǔ 明母 characters and transcription characters containing the yùnwěi 韻尾 /\-n/ are highly systematic and require little further discussion\. As shown in the table above, míngmǔ 明母 characters were used to represent \[m\], whereas transcription characters containing the yùnwěi 韻尾 /\-n/ were used to represent \[n\]\.

The observations presented thus far are summarized in Table [4\.2](https://arxiv.org/html/2605.14480#S4.SS2) below\.

Table 38: Summary of shēngmǔ 聲母 transcription patterns

Place columns: chun 脣, she 舌, chi 齒, ya 牙, hou 喉 \(Bilabial, Labio\-dental, Dental, Alveolar, Post\-alveolar, Alveo\-palatal, Velar, Postvelar, Glottal\)\. Within each row, segments are listed from front to back\.

| Manner | Voicing / type | Segment and transcription category |
| --- | --- | --- |
| Stop | Voiceless | p 幇; t 端透; k 溪; q 溪 |
| Stop | Voiced | b 幇; d 端; g 見 |
| Affricate | Voiceless | ʧ 穿 |
| Affricate | Voiced | ʤ 照 |
| Fricative | Voiceless | f 非; s 心; ʃ 審; x 曉; h 曉 |
| Fricative | Voiced | z 精; ʒ 日; ɣ 影見 |
| Liquid | Trill | r 兒來 |
| Liquid | Tap | ɾ 兒來 |
| Liquid | Lateral approximant | l 來兒 |
| Nasal | | m 明; n /\-n/ |

## 5 Major characteristics of transcription in Huitongguanxi Huayiyiyu

Our analysis of the digitized HHY shows that the transcription system is characterized by two constitutive properties: phonetic transcription and consistent notation\. These properties are not limited to a single language section, but are consistently observed across HHY and align with observations made in previous language\-specific studies\.

### 5\.1 Phonetic transcription

Phonetic transcription refers to transcription that records perceived phonetic values without phonological interpretation, in contrast to phonological transcription\. Whereas phonological transcription abstracts away from phonetic variation to represent underlying contrastive units, phonetic transcription aims to capture surface realizations as they are heard\. A familiar illustration can be drawn from English consonants: the phoneme /t/ is written uniformly as t in orthography, but its phonetic realization varies by context, appearing as \[tʰ\] in top, \[t\] in stop, and \[ɾ\] in butter\. In phonetic transcription, these differences are explicitly represented\.

Analysis of digitized HHY data indicates that the transcription system systematically records perceived phonetic values rather than abstract phonological categories\. Instead of neutralizing contextual variation to represent underlying contrastive units, as in phonological transcription, HHY transcription reflects surface realizations as they are heard\. This sensitivity to phonetic detail is evident across language sections and is consistent with previous descriptions of HHY as a resource oriented toward spoken language \([30](https://arxiv.org/html/2605.14480#bib.bib30), p\. 98; [31](https://arxiv.org/html/2605.14480#bib.bib31), p\. 115\), although previous studies focused solely on individual language sections\. In the present study, this characteristic was verified across the digitized HHY corpus\.

At the same time, the transcription characters used in HHY cannot be regarded as equivalent to the IPA\. The IPA is an objective system designed exclusively for phonetic transcription, in which each symbol corresponds to a single speech sound\. The Chinese characters in HHY, by contrast, functioned as a writing system for a natural language and therefore represent phonological categories rather than discrete phonetic segments\. As a result, when Chinese characters are used for phonetic transcription, phonetic variants that belong to a single phoneme in the target language may be differentiated and represented separately, while variants that belong to distinct phonemes in the target language may conversely be conflated into a single yīnxì 音系\.

Furthermore, the phonetic range associated with a given Chinese phonological category is not fixed, but shifts when that category is used to transcribe foreign languages\. Such shifts arise from the interaction between the Chinese phonological system and the phonological systems of the target languages\. This flexible use of Chinese phonological categories constitutes a distinctive characteristic of HHY transcription and is closely related to its purpose of compilation\. As discussed above, HHY differs from the other three classes of huáyíyìyǔ 華夷譯語 in that it is explicitly concerned with spoken language, reflecting its primary function as a resource for training interpreters rather than translators\.

### 5\.2 Consistent notation

Our analysis of the digitized HHY data also reveals a high degree of consistency in notation\. This characteristic, like the phonetic orientation of HHY transcription, has been noted in previous studies on individual language sections \([30](https://arxiv.org/html/2605.14480#bib.bib30), p\. 99\); in our study, it was confirmed across the entire dataset\. Identical morphemes tend to be transcribed in largely the same manner even when they occur as components of larger constructions, such as compound words or phrases\.

One notable exception to this general consistency is the use of ST, a transcriptional device employed to represent phonetic material that cannot be encoded by MT characters\. While the main transcription characters remain stable, the presence or absence of ST characters may vary\. When the lexical identity of an item is sufficiently clear without ST, ST characters tend to be omitted; when it is not, ST characters are retained\. An example from the Korean section is provided in Table [5\.2](https://arxiv.org/html/2605.14480#S5.SS2)\. Only a subset of cases in which the same lexical item occurs more than once is presented there\.

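Table 39: Examples of repeated lexical items with and without supplementary transcription in the Korean section

| Index | 1st row | 2nd row | Index | 1st row | 2nd row |
| --- | --- | --- | --- | --- | --- |
| K\-1, 13–21, 232 | 天 | 哈嫩\(二\) | K\-10, 51 | 雪 | 嫩 |
| K\-2, 22–27 | 日 | 害 | K\-11, 50 | 霧 | 按蓋 |
| K\-3, 28–31 | 月 | 得二 | K\-12, 52–53 | 露 | 以沁 |
| K\-4, 32–35 | 星 | 別二 | K\-14, 124 | 陰 | 黑立大 |
| K\-5, 36–38 | 風 | 把論 | K\-17, 77, 108 | 高 | 那大 |
| K\-6, 39–44 | 雲 | 故論 | K\-18, 86 | 邊 | 格自 |
| K\-7, 49 | 雷 | 別剌 | K\-20, 127 | 晚 | 展根\(格\)大 |
| K\-8, 45–48, 462 | 雨 | 必 | | | |
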
This pattern is clearly illustrated by the item hānèn 哈嫩 together with the ST character èr 二\. The full form hānèn\-èr 哈嫩二 appears only when the item occurs as an independent lexical entry, as in K\-1\. When the same form appears as part of compounds or phrases, as in K\-13–21 and K\-232, the ST character èr 二, which represents the coda /l/, is omitted, and the form appears simply as hānèn 哈嫩\.

By contrast, in the cases of dé\-èr 得二 and bié\-èr 別二, where the same ST character èr 二 is likewise used to represent /l/, the character is never omitted and appears consistently throughout the corpus\. This difference can be explained by the fact that hānèn 哈嫩, even without the ST character, can still be interpreted unambiguously as representing Korean /hanal/ ‘sky’\. In contrast, dé 得 and bié 別 alone do not provide sufficiently stable or transparent representations of Korean /tal/ ‘moon’ and /pjəl/ ‘star’ without the additional ST character èr 二\.
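
The comparison of attestations with and without ST can be sketched in Python\. This is a minimal illustration, not the procedure used in the study: the names `ST_CHARS`, `main_transcription`, and `same_item` are hypothetical, and treating every occurrence of 二 as an ST character is a simplifying assumption that holds only for the forms cited above\.

```python
# Assumption for this sketch only: 二 always functions as an ST character.
ST_CHARS = {"二"}


def main_transcription(form: str) -> str:
    """Drop ST characters and keep only the MT characters of a form."""
    return "".join(ch for ch in form if ch not in ST_CHARS)


def same_item(form_a: str, form_b: str) -> bool:
    """Treat two attestations as the same item if their MT characters match."""
    return main_transcription(form_a) == main_transcription(form_b)


# Forms cited in the text: 哈嫩二 (independent entry) and 哈嫩 (in compounds).
print(same_item("哈嫩二", "哈嫩"))  # True
print(main_transcription("得二"))   # 得
```

Under this view, the MT string is the stable key for an item, and the ST character is an optional annotation whose presence depends on how transparent the MT string is on its own\.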

Although the number of cases is limited, a similar pattern is also observed with grammatical morphemes\. Grammatical morphemes that are transcribed consistently across entries in the Korean section are summarized in Table [5\.2](https://arxiv.org/html/2605.14480#S5.SS2)\.

Table 40: Grammatical morphemes consistently transcribed in the Korean section

| 2nd row | Korean | Index |
| --- | --- | --- |
| 大 | \-ta | K\-13–14, 17, 19–35, 39–40, 47–55, 77–78, 84, 87–88, 91–100, 106–113, 118–119, 124, 127–129, 134–135, 142–143, 176, 188–189, 218–219, 229, 253–254, 264–265, 353–355, 357–358, 381–383, 387, 426–427, 436–437, 526–527 \(88 items in total\) |
| 刺 | \-la | K\-146–147, 190, 341, 342, 345, 347, 349–350, 356, 506, 507 \(12 items in total\) |
| 格 | \-ge~e | K\-20–23, 29–30, 50, 146–147, 189, 341, 345, 347, 349–350 \(15 items in total\) |
| 那 | \-na | K\-190, 342 \(2 items in total\) |

The character dà 大, for example, is used to transcribe two distinct sentence\-final endings in Middle Korean: \-ta, the declarative ending, and \-la, the imperative ending\. The forms \-ke\- and \-e\-, transcribed with gé 格, as well as \-na\-, transcribed with nà 那, function as confirmative prefinal endings and are typically associated with monologic speech or utterances with a strongly one\-sided assertive function \[[19](https://arxiv.org/html/2605.14480#bib.bib19), pp\. 285–286\]\.

This distinction is reflected in the transcription of zhǎngēn\(gé\)dà 展根\(格\)大 in the Korean section\. In the form zhǎngēndà 展根大 ‘to grow dark’, corresponding to Middle Korean cyengkul\-ta, the ST character gé 格, representing the confirmative prefinal ending \-e\-, is omitted\. By contrast, in zhǎngēngédà 展根格大 ‘to have grown dark’, corresponding to Middle Korean cyengkul\-e\-ta, gé 格 is inserted to explicitly mark the presence of the prefinal ending\.

### 5\.3 Analytical Framework: Main and Supplementary Transcription

The distribution of phonetic material between MT characters and ST in HHY exhibits a high degree of regularity across language sections\. In particular, patterns in the use and non\-use of ST reveal systematic constraints on how phonetic information is encoded within the HHY transcription system\. These patterns reflect both the representational limits imposed by the Chinese syllable structure and the phonetic properties of the target languages\.

#### 5\.3\.1 General principle I: syllable\-structure constraints on Main Transcription

Differences in phonological constraints between Chinese and the target languages play a crucial role in shaping HHY transcription practices\. Analysis of the digitized HHY data shows that the first general principle governing HHY transcription is grounded in syllable\-structure constraints\. Accordingly, this principle is best understood with reference to the Chinese syllable structure, conventionally formalized as I/MVET \(Initial = shēngmǔ 聲母; Medial = yùntóu 韻頭; principal Vowel = yùnfù 韻腹; Ending = yùnwěi 韻尾; Tone = shēngdiào 聲調\)\.

Within this framework, the yùntóu 韻頭 and yùnfù 韻腹 do not emerge as decisive factors in determining the distribution of MT\. Instead, our HHY data indicate that constraints on MT are primarily associated with the shēngmǔ 聲母 and the yùnwěi 韻尾\. This distributional asymmetry corresponds closely to specific properties of Chinese syllable structure in the fifteenth and sixteenth centuries, two of which are particularly relevant for understanding the syllable\-structure constraints observed in our data\. \(a\) First, in fifteenth\- and sixteenth\-century Chinese, the only codas permitted were /n/ and /ŋ/\. \(b\) Second, complex initials were not permitted in fifteenth\-century Chinese\. Together, these structural constraints place clear limits on the range of phonetic material that can be captured by a single Chinese character in HHY\.
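
The I/MVET template and constraints \(a\) and \(b\) can be written down as a small data structure\. The following Python sketch is illustrative only: the class `Syllable`, the method `fits_mt`, and the representation of initials and endings as tuples of segments are assumptions introduced here, and the syllable `tan` is a made\-up example \(`tal` corresponds to Korean /tal/ ‘moon’ cited above\)\.

```python
from dataclasses import dataclass

# Codas permitted under constraint (a); the empty tuple is an open syllable.
MT_CODAS = {(), ("n",), ("ŋ",)}


@dataclass(frozen=True)
class Syllable:
    initial: tuple   # shēngmǔ 聲母, as a tuple of consonant segments
    medial: str      # yùntóu 韻頭
    vowel: str       # yùnfù 韻腹
    ending: tuple    # yùnwěi 韻尾, as a tuple of consonant segments
    tone: str = ""   # shēngdiào 聲調 (not used in this sketch)

    def fits_mt(self) -> bool:
        """True if the syllable satisfies constraints (a) and (b)."""
        simple_initial = len(self.initial) <= 1    # constraint (b)
        permitted_coda = self.ending in MT_CODAS   # constraint (a)
        return simple_initial and permitted_coda


tal = Syllable(initial=("t",), medial="", vowel="a", ending=("l",))
tan = Syllable(initial=("t",), medial="", vowel="a", ending=("n",))
print(tal.fits_mt())  # False: coda /l/ falls outside MT
print(tan.fits_mt())  # True
```

Note that the medial and the principal vowel play no role in `fits_mt`, mirroring the observation that only the initial and the ending constrain MT\.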

Under constraints \(a\) and \(b\), a range of target\-language phonological configurations cannot be represented by MT\. First, MT is systematically limited in word\-final position due to constraint \(a\)\. The data show that MT is not possible when a target\-language syllable or word ends in a consonant other than /n/ or /ŋ/\. Consonants such as /k/, /t/, /p/, /h/, /l/, and /r/ therefore consistently fall outside the scope of coda representation by MT characters\. It is noteworthy, however, that the yùnwěi 韻尾 /m/, generally assumed to have disappeared from Chinese by the fifteenth century, appears to be represented by MT in a small number of HHY entries\. At this stage, it remains unclear whether this reflects residual survival of /\-m/ in Chinese or localized transcriptional variation\.

MT is likewise systematically limited in word\-initial position due to constraint \(b\)\. Because complex onsets are excluded from the Chinese syllable template, word\-initial consonant clusters in target languages cannot be fully encoded by MT\. In such cases, the first consonant in the cluster is consistently excluded from MT and instead recovered through ST, while the second consonant may be represented as an initial\.

MT is also limited in word\-medial position due to both constraints \(a\) and \(b\)\. When three or more consonants occur medially, only a subset can be encoded by MT\. If the first consonant is /n/ or /ŋ/, it may be represented as a coda; any following consonants up to the penultimate position are excluded, while the final consonant may be represented as an initial\. Word\-final consonant clusters exhibit comparable limitations: only the first consonant can be encoded as a coda, and only if it is /n/ or /ŋ/, while all remaining consonants fall outside the scope of MT\.

Taken together, these patterns show that phonetic material that cannot be encoded by MT under Chinese syllable\-structure constraints systematically satisfies the first necessary condition for ST\. Crucially, however, not all such material is realized through ST\. Additional phonetic conditions must be met for ST to occur, as discussed in Section 5\.3\.2\. For now, the structural constraints identified here can be summarized schematically as follows:

$$\mathbf{C}_{1}\ \mathbf{C}_{2}\ \mathbf{V}\ \mathbf{C}_{3}\ \mathbf{C}_{4}\ \mathbf{C}_{5}\ \mathbf{C}_{6}\ \mathbf{V}\ \mathbf{C}_{7}\ \mathbf{C}_{8}$$

When the constraints outlined above are applied to this schematic representation, $C_{1}$, $C_{4}$, $C_{5}$, and $C_{8}$ consistently qualify as potential targets for ST regardless of their specific phonetic value\. $C_{2}$ and $C_{6}$ are eligible for MT\. $C_{3}$ and $C_{7}$ are encoded as MT only when they are /n/ or /ŋ/; otherwise, they likewise satisfy the first necessary condition for ST\.
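
General Principle I can be read as a labeling procedure over consonant runs\. The Python sketch below is one possible rendering under stated assumptions: the function `classify`, the label names, and the vowel set `VOWELS` are hypothetical, a single intervocalic consonant is treated as the initial of the following syllable, and the input `stankrtans` is an invented string that instantiates the schema rather than an attested HHY form\.

```python
VOWELS = set("aeiouə")   # hypothetical vowel inventory for this sketch
MT_CODAS = {"n", "ŋ"}    # codas permitted under constraint (a)


def classify(segments):
    """Label each segment 'V', 'MT-initial', 'MT-coda', or 'ST-candidate'."""
    if not any(s in VOWELS for s in segments):
        raise ValueError("the schema assumes at least one vowel")
    labels = ["V" if s in VOWELS else None for s in segments]
    n, i = len(segments), 0
    while i < n:
        if labels[i] == "V":
            i += 1
            continue
        j = i
        while j < n and labels[j] is None:   # extend over the consonant run
            j += 1
        at_start, at_end = i == 0, j == n
        for k in range(i, j):
            labels[k] = "ST-candidate"       # default for material outside MT
        if not at_end:
            labels[j - 1] = "MT-initial"     # last consonant before a vowel
        if not at_start and (at_end or j - i > 1):
            if segments[i] in MT_CODAS:
                labels[i] = "MT-coda"        # first consonant after a vowel
        i = j
    return labels


for seg, label in zip("stankrtans", classify(list("stankrtans"))):
    print(seg, label)
```

For the invented input, the positions corresponding to $C_{1}$, $C_{4}$, $C_{5}$, and $C_{8}$ come out as `ST-candidate`, $C_{2}$ and $C_{6}$ as `MT-initial`, and $C_{3}$ and $C_{7}$ as `MT-coda` because both are /n/\.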

#### 5\.3\.2 General principle II: Phonetic conditions governing Supplementary Transcription

While General Principle I specifies which phonetic material cannot be encoded by MT under Chinese syllable\-structure constraints, the HHY data further show that only a subset of such material is actually realized through ST\. Examination of the digitized HHY corpus reveals that the distribution of ST is not arbitrary, but instead correlates systematically with specific phonetic properties of the target\-language segments\.

Across language sections, segments realized through ST consistently exhibit one or more of the following phonetic characteristics\. First, voiced segments are regularly represented by ST whenever they satisfy the structural conditions identified in General Principle I\. No voiced segment attested in the phoneme inventories of the target languages is systematically excluded from ST under these conditions\. Apparent absences reflect gaps in the relevant inventories rather than restrictions on ST itself\. In fact, previous studies have used the presence of ST in HHY as independent evidence for the voicing of particular segments \(e\.g\. [37](https://arxiv.org/html/2605.14480#bib.bib37); [31](https://arxiv.org/html/2605.14480#bib.bib31)\)\.

Second, voiceless segments may also be realized through ST when they are continuants\. Under the same structural constraints, voiceless fricatives and other continuant segments consistently appear as ST where MT is unavailable\. As with voiced segments, no systematic exclusion of voiceless continuants from ST is observed in the HHY data\. Third, even voiceless stops may be realized through ST when they are phonetically released\. Evidence from the Hunminjeongeum Haeryebon, particularly the Eight Coda Rules \(八終聲法\), suggests that fifteenth\-century Korean permitted distinctions involving unreduced or unreleased codas\. The Persian, Uyghur, and Cham sections also provide clear cases in which released voiceless stops are represented by ST in positions where MT is structurally unavailable\.

Taken together, these observations indicate that, provided the syllable\-structure constraints outlined in General Principle I are met, phonetic material is eligible for ST if it satisfies at least one of the following conditions:

A\. the segment is voiced\.

B\. the segment is voiceless but continuant\.

C\. the segment is a voiceless stop realized with release\.

Although a small number of cases do not conform neatly to these generalizations, such instances are sporadic and do not undermine the overall regularity observed in the HHY transcription system\. The patterns summarized here therefore capture the dominant phonetic conditions governing the distribution of ST across the HHY corpus\.
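
Conditions A to C, together with the structural prerequisite from General Principle I, amount to a simple predicate\. The following Python sketch is an illustration under assumptions: the `Segment` class, its boolean features, and the function `st_eligible` are hypothetical names, and the feature values are supplied by hand rather than derived from HHY data\.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    symbol: str
    voiced: bool
    continuant: bool
    stop: bool
    released: bool = False


def st_eligible(seg: Segment, mt_unavailable: bool) -> bool:
    """Eligibility for ST: Principle I must hold, plus condition A, B, or C."""
    if not mt_unavailable:          # Principle I: MT cannot encode the segment
        return False
    cond_a = seg.voiced                                      # A: voiced
    cond_b = not seg.voiced and seg.continuant               # B: voiceless continuant
    cond_c = not seg.voiced and seg.stop and seg.released    # C: released voiceless stop
    return cond_a or cond_b or cond_c


coda_l = Segment("l", voiced=True, continuant=True, stop=False)
coda_k_unreleased = Segment("k", voiced=False, continuant=False, stop=True)
coda_k_released = Segment("k", voiced=False, continuant=False, stop=True, released=True)

print(st_eligible(coda_l, mt_unavailable=True))             # True
print(st_eligible(coda_k_unreleased, mt_unavailable=True))  # False
print(st_eligible(coda_k_released, mt_unavailable=True))    # True
```

The predicate states eligibility only; as noted above, a small number of attested cases do not conform to these generalizations\.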

## 6 General Discussion and Conclusion

The present study has argued that the transcription system of HHY cannot be interpreted solely through contemporaneous Chinese phonology\. Although the transcriptional categories used in HHY were derived from Chinese phonological categories, their actual transcriptional values were shaped through interaction with the phonological systems of the target languages\. The same Chinese category may correspond to different phonetic values across language sections, while phonetically similar sounds may be represented through different categories depending on the contrastive structure of the target language\. This suggests that HHY did not operate as a direct projection of Chinese phonology onto foreign languages\. Rather, it functioned as a system of phonetic approximation that adapted Chinese phonological resources to the representation of non\-Chinese speech\.

At the same time, the transcription system was not arbitrary\. The distribution of MT and ST across language sections shows that the transcriptional practices of HHY were highly systematic\. Segments that could be accommodated within the Chinese syllable template were generally represented through MT, whereas segments that could not be represented within the structural constraints of Chinese phonology were selectively represented through ST\. Importantly, the occurrence of ST was not random\. The results suggest that ST was conditioned not only by structural incompatibility with Chinese syllable structure, but also by phonetic salience, including voicing, continuancy, and release\. In this sense, HHY reflects neither a purely phonological notation system nor a fully segmental phonetic notation system comparable to the IPA\. Instead, it represents a historically specific transcriptional system in which phonetic perception, phonological categorization, and orthographic convention interacted systematically\.

One of the main implications of the present study is methodological\. Previous studies on HHY have generally been conducted on a language\-by\-language basis, and the interpretation of individual transcription categories has often depended heavily on reconstructions of Chinese phonology alone\. The cross\-linguistic comparison conducted in this study suggests that a broader comparative approach is necessary\. Since the same transcriptional framework was used across multiple language sections, better\-understood languages can function as calibration points for interpreting languages with fewer historical resources\. For example, stable correspondences in the Malay and Japanese sections help narrow the phonetic range associated with particular yùnmǔ 韻母 categories, while recurring ST patterns across the Persian, Uyghur, Tibetan, Mongolian, Malay, and Cham sections make it possible to distinguish systematic transcriptional devices from isolated orthographic variation\.

This approach is especially meaningful for the study of under\-documented or endangered languages\. Many of the languages represented in HHY survive only fragmentarily in historical records, and some have undergone substantial phonological change since the period of compilation\. In such cases, language\-internal reconstruction alone is often insufficient\. The multilingual structure of HHY makes it possible to approach these languages comparatively through the internal regularities of the transcription system itself\. This expands the methodological possibilities of historical phonology beyond traditional comparative reconstruction and typological comparison\. Rather than comparing languages only through inherited cognates or structural typology, the present study demonstrates that multilingual transcription corpora themselves can function as comparative phonological evidence\.

The present study has focused primarily on reconstructing the transcriptional principles underlying HHY and identifying the phonetic ranges associated with major transcription categories\. Much work nevertheless remains to be done\. Several language sections still require more reliable secondary reconstruction, manuscript comparison remains incomplete, and the statistical distribution of transcription categories deserves further quantitative investigation\. Despite these limitations, the results suggest that HHY constitutes one of the most systematic multilingual transcription corpora preserved from premodern Asia\. Its significance lies not only in the historical forms it records, but also in the transcriptional logic through which Chinese characters were adapted to represent a wide range of non\-Chinese languages\.

## References

- Calabrese and Wetzels \[2009\] Calabrese A, Wetzels WL \(2009\) Loan Phonology\. John Benjamins
- Chen \[1966\] Chen Ch \(1966\) Annan yakugo no kenkyū \(1\)\. Shigaku 39\(3\):307–348\. In Japanese
- Chen \[1967a\] Chen Ch \(1967a\) Annan yakugo no kenkyū \(2\)\. Shigaku 39\(4\):481–497\. In Japanese
- Chen \[1967b\] Chen Ch \(1967b\) Annan yakugo no kenkyū \(3\)\. Shigaku 40\(1\):25–85\. In Japanese
- Chen \[1968a\] Chen Ch \(1968a\) Annan yakugo no kenkyū \(4\)\. Shigaku 41\(1\):1–63\. In Japanese
- Chen \[1968b\] Chen Ch \(1968b\) Annan yakugo no kenkyū \(5\)\. Shigaku 41\(2\):205–248\. In Japanese
- Chen \[1968c\] Chen Ch \(1968c\) Annan yakugo no kenkyū \(6\)\. Shigaku 41\(3\):409–459\. In Japanese
- Edwards and Blagden \[1931\] Edwards ED, Blagden CO \(1931\) A Chinese vocabulary of Malacca Malay words and phrases collected between AD 1403 and 1511 \(?\)\. Bulletin of the School of Oriental and African Studies 6\(3\):715–749
- Edwards and Blagden \[1939\] Edwards ED, Blagden CO \(1939\) A Chinese vocabulary of Cham words and phrases\. Bulletin of the School of Oriental Studies, University of London pp 53–91
- Giles \[1898\] Giles HA \(1898\) A Chinese Biographical Dictionary\. Bernard Quaritch
- Guo \[1986\] Guo X \(1986\) Hanzi gu yin shouce\. Beijing Daxue Chubanshe, in Chinese
- Harrison \[2017\] Harrison SP \(2017\) On the limits of the comparative method\. In: The Handbook of Historical Linguistics\. Blackwell Publishing, p 213–243
- Honda \[1963\] Honda S \(1963\) Kai\-kai kan’yakugo ni tsuite\. Hokkaidō Daigaku Bungakubu Kiyō 11:224–250\. In Japanese
- Ishida \[1931\] Ishida M \(1931\) Jurchen\-go kenkyū no shinsiryō\. In: Kuwabara Hakase kanreki kinen Tōyōshi ronshū\. Tokyo: Shigakukai, p 1271–1323, in Japanese
- Jiang \[1998\] Jiang C \(1998\) Rondon daigaku\-bon nihon\-kan yakugo ni mirareru dokujiteki na yojihō o megutte\. Tsukuba Nihongo Kenkyū 3:60–75\. In Japanese
- Kang \[2011\] Kang Y \(2011\) Loanword phonology
- Kim \[2016\] Kim Je \(2016\) A comparative study of transliterate letters in Huayiyiyu: Focused on the estimation of the phonetic value of Chaoxianguanyiyu\. Master’s thesis, Seoul National University, Seoul
- Kim \[1980\] Kim Wj \(1980\) Hyangga haedokbeop yeongu\. SNU Press, Seoul, in Korean
- Ko \[2010\] Ko Yk \(2010\) Pyojun Jungse Gukeo Munbeopron, 3rd edn\. Jipmundang, Seoul, in Korean
- Kwon \[1995\] Kwon Ih \(1995\) A phonological study on the Chaoxianguanyiyu\. PhD thesis, Seoul National University, Seoul
- Lass \[2015\] Lass R \(2015\) Interpreting alphabetic orthographies\. In: The Oxford Handbook of Historical Phonology\. Oxford University Press
- Lee \[2007\] Lee Jd \(2007\) Jungguk\-eo eumunhak\. Hakgobang, in Korean
- Lee \[1957\] Lee Km \(1957\) Joseongwan yeogeo\-ui pyeonchan yeondae\. Munni Daehakbo 5\(1\)\. In Korean
- Li \[2019\] Li Ys \(2019\) The Uighur word materials in a manuscript of Huá\-yí\-yì\-yǔ \(HHY\) in the library of Seoul National University \(V\): tianwenmen ‘the category of astronomy’\. Journal of the Royal Asiatic Society 3:1–62
- Li and Zhou \[1999\] Li Z, Zhou C \(1999\) Hanzi gujin yin biao\. Zhonghua Shuju, in Chinese
- Maris \[1980\] Maris MY \(1980\) The Malay Sound System\. Penerbit Fajar Bakti Sdn\. Bhd\., Kuala Lumpur
- Matsumoto and Ding \[1997\] Matsumoto M, Ding F \(1997\) Nihon\-kan yakugo ni okeru chū–nichi tai\-on kōshaku\. Rōshū \(Komazawa Daigaku\) 45:1–38\. In Japanese
- Minjungseorim \[1966\] Minjungseorim \(ed\) \(1966\) Hanhan dae\-sajeon\. Minjung Seorim, in Korean
- Ning \[1985\] Ning J \(1985\) Zhongyuan yinyun biao gao\. Jilin Wenshi Chubanshe, in Chinese
- Nishida \[1963\] Nishida R \(1963\) Jūrokuseiki ni okeru seikō\-shō chibetto\-go tensen hōgen ni tsuite: Kan–chibetto tango\-shū iwayuru heishu\-bon ‘seiban kan’yakugo’ no kenkyū\. In Japanese
- Ochi \[2004\] Ochi S \(2004\) Kaei yakugo heishu\-bon ‘dattan yakugo’ ni okeru mongoru\-go ni tsuite\. In Japanese
- Pisowicz \[1985\] Pisowicz A \(1985\) Origins of the New and Middle Persian Phonological Systems\. Nakł\. Uniwersytetu Jagiellońskiego
- Rybatzki \[2003\] Rybatzki V \(2003\) Middle Mongol\. In: Janhunen J \(ed\) The Mongolic Languages\. Routledge, p 57–82
- Saita \[1987\] Saita \(1987\) \[shiryō\] heishu\-bon seiban kan’yakugo kōhon \(kō\)\. Kōbe\-shi Gaikokugo Daigaku Gaikoku\-gaku Kenkyū 17:157–215\. In Japanese
- Shields \[2010\] Shields K \(2010\) Linguistic typology and historical linguistics\. In: Song JJ \(ed\) The Oxford Handbook of Linguistic Typology\. Oxford University Press, p 551–567, [10\.1093/oxfordhb/9780199281251\.013\.0026](https://arxiv.org/doi.org/10.1093/oxfordhb/9780199281251.013.0026)
- Shōgaito \[1982\] Shōgaito M \(1982\) Uiur kan’yakugo: Churuku\-go no seikaku ni tsuite\. Kōbe Gaidai Ronsō 33\(5\)\. In Japanese
- Shōgaito \[1984\] Shōgaito M \(1984\) Uiur kan’yakugo no kenkyū: Meidai uiguru kōgo no saikō\. Kōbe\-shi Gaikokugo Daigaku Gaikoku\-gaku Kenkyū 14:51–172\. In Japanese
- Tasaka \[1943a\] Tasaka K \(1943a\) Kai\-kai kan’yakugo goshaku \(1\)\. Tōyō Gakuhō 30:96–133\. In Japanese
- Tasaka \[1943b\] Tasaka K \(1943b\) Kai\-kai kan’yakugo goshaku \(2\)\. Tōyō Gakuhō 30:100–164\. In Japanese
- Tasaka \[1944\] Tasaka K \(1944\) Kai\-kai kan’yakugo goshaku \(3, complete\)\. Tōyō Gakuhō 30\(4\):534–560\. In Japanese
- Tasaka \[1951\] Tasaka K \(1951\) Kai\-kai kan’yakugo goshaku hosei\. Tōyō Gakuhō 33\(3–4\):400–413\. In Japanese
- Tōdō \[1978\] Tōdō A \(ed\) \(1978\) Gakken Kanwa daijiten\. Gakushū Kenkyūsha, in Japanese
- Watanabe \[1961\] Watanabe M \(1961\) Kai yakugo oyobi nihon\-kan yakugo ni tsuite \<shōzen\>\. Komazawa Daigaku Bungakubu Kenkyū Kiyō 19:15\. In Japanese
- Ōtomo \[1968\] Ōtomo M \(1968\) Nihon\-kan yakugo: Honbun to sakuin\. Rakubunsha, in Japanese
