Digital Preservation of the Suzhou Dialect as a Wicked Problem

§ 01

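How far is Suzhounese from you?

You can understand it. But can you speak it?

One day your child will ask you: what is Grandma saying?
You will translate for them.
But the flavor, the tone,
the roundabout turns that only Suzhou people catch:
those cannot be translated.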

§ 02

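Three generations. Thirty years.

The clips below are used for non-profit academic and cultural-preservation purposes. Videos are embedded from their original platforms and are not redistributed. For copyright questions, please use the contact details at the bottom of the page.

Generation 1 · Old-time Suzhounese · Shaw Brothers film 《乾隆下扬州》 · 1978

"The woman in the film is a Pingtan performer by training, but this passage shows the everyday speech of old-time Suzhou people, with the entering tones intact and a natural cadence. This is not only language; it is also the memory of a cultural standing. In the Ming and Qing dynasties the saying went, 'What the people of Suzhou consider elegant, the whole world considers elegant.' The one who mocks Qianlong for not understanding Suzhounese is a waiter in a Yangzhou tavern; the ones jeering and cheering along from the floor are a crowd of local Yangzhou dandies. Across Jiangnan, being able to understand Suzhounese was a mark of status, of taste, of honor."

Generation 2 · Born in the 1980s/1990s · Shi Bin · Western Zhou wedding customs · Suzhou Television

"Suzhou Television host Shi Bin describes Western Zhou wedding customs in Suzhounese. The dialect is still there, but Mandarin vocabulary and rhythm have quietly seeped in. This generation is the transitional layer in which dialect and Mandarin coexist."

Generation 3 · Born in the 2000s · Recorded 2024–2025

"Can understand it and can say a few sentences, but intonation and vocabulary have shrunk sharply. This is not laziness. It is the disappearance of an entire language environment: school, media, and peers are all in Mandarin."

Recorded by the author, who holds the copyright and licenses it for use on this page.

Across three generations, the transmission of the language broke.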

§ 03

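What makes Suzhounese special?

Suzhounese is hard to preserve digitally precisely because it is so intricate linguistically. The three features below are challenges that any preservation plan has to face.

Seven tones

Mandarin has four tones. Suzhounese has seven, including two "entering tones" (入声) that end in a glottal stop, a category that disappeared from Mandarin long ago.

Figure: Mandarin (4 tones) vs. Suzhounese (7 tones)

Tone sandhi

A syllable's tone when spoken alone can differ completely from its tone inside a phrase. The rule runs through the whole sentence, which makes it hard for any speech recognition model to handle.

今 alone: yin-ping

今朝 → 今 shifts; 朝 follows

今朝天气 → the sandhi domain widens and the rules grow more complex

Voiced consonants preserved

Voiced consonants that vanished from Mandarin more than a thousand years ago are still alive in Suzhounese. They are a "living fossil" of Middle Chinese, and one of the hardest features for AI training data to handle.

Present in Suzhounese / absent in Mandarin:

/b/ /d/ /g/ /v/ /z/

These phonemes barely occur in Mandarin training corpora, so models misrecognize them systematically.

A richer vowel system than Mandarin's

Suzhounese has rounded front mid vowels that Mandarin lacks entirely, such as /ø/, which is close to the vowel in French "feu". It is one sign of how Wu has kept features of older Chinese.

The phonological foundation of Kunqu and Pingtan

The Ming-dynasty musician Wei Liangfu created the Kunshan style (昆山腔) on the basis of Suzhou speech, and it became the standard pronunciation of Kunqu. Pingtan must still be performed in Suzhounese today; change the language and the art no longer exists.

Kunqu · Pingtan · Intangible cultural heritage

There is a fourth difficulty: being misidentified. Suzhounese and Shanghainese both belong to Wu Chinese, but their sound systems differ markedly. Because Shanghainese has left far more material online and is structurally closer to Mandarin, AI models "learn" Shanghainese more easily. During data scraping, large amounts of Suzhounese content are mislabeled as Shanghainese, and the resulting "Wu model" is often, in substance, a system dominated by Shanghainese. The already scarce Suzhounese data is systematically contaminated before it even enters a dataset.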

§ 04

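Why can't we just "teach it"?

Faced with a language crisis, people's first reaction is usually "then teach it." Hence school courses, learning apps, and government recording projects. Each has its logic, and each has partly failed. The four cases below show why.

Schools offered dialect classes

The classes teach "standard Suzhounese": accurate pronunciation, regulated intonation. But students say it sounds like a broadcaster, not like a neighbor.

✗ Learned, but not used

A learning app was built

The user numbers look good. But surveys found that people who used the app did not speak the dialect any more often in daily life.

✗ Can recognize it, but cannot speak it

Government recording archives

A valuable speech database was built. But the recordings sit on servers, and nobody uses them to talk with the next generation.

✗ The sound was saved; the language was not

Suzhounese has no unified writing system

Want to type "煞煞好"? Which characters? Everyone writes it differently; dictionaries do not list it and input methods do not recognize it. Without writing there are no teaching materials; without teaching materials, what would digital tools be built on? The writing problem is not a side issue: it is one of the structural foundations of the whole problem.

✗ Can be said, but not written down

Every approach solves one problem and creates a new one. That is why this is hard.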

§ 05

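Why is this a "wicked problem"?

Researchers call problems that grow more complex the more you try to solve them "wicked problems." Preserving Suzhounese is one of them. The four dimensions below show why.

Problem complexity

No single cause, and no single fix

Language policy, economic logic, population migration, digital media: these forces stack on one another, and pulling on one moves all the rest.

Language policy · Economic logic · Population migration · Digital media · Education system

Solutions that fail

Every intervention changes the problem itself

Good intentions can bring unintended consequences:

Dialect classes → a "formal Suzhounese" emerges that does not resemble real everyday speech
Digital archives → the sound is preserved, but nobody uses it to communicate
Standardization → what is preserved is a constructed version, not the living dialect

Conflicts of interest

Everyone wants preservation, but their ideas differ sharply

Disagreement over "how to preserve" often runs deeper than disagreement over "whether to preserve":

Linguists vs. tech companies: natural variation vs. standardized data

Central government vs. local communities: promoting Putonghua vs. cultural autonomy for the dialect

Older generation vs. young people: linguistic purity vs. natural evolution

Locals vs. new Suzhou residents: dialect as identity vs. a sense of exclusion

Shared concerns

Amid the disagreements, there are five things everyone accepts

These are the starting point for every conversation:

Nobody sincerely wants Suzhounese to disappear
The next generation matters to everyone
Cultural identity is worth protecting
Existing recordings are precious
The time left for action is shrinking

What can you do now?

倷好 (nǎi hǎo): hello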

§ 01

Three generations.
Thirty years of change.

Before the data and the arguments, start with a direct impression. The three clips below are all in Suzhounese, the same dialect, recorded across a span of roughly forty years. You don't need to understand the words. Listen for the rhythm and the texture of the sounds, then notice what happens to them across generations. English subtitles are available in the video player.

Generation 1 · Suzhounese as it was spoken · Shaw Brothers · 乾隆下扬州 · 1978

The woman in this clip is a Pingtan performer by trade, but what you are hearing is not a staged performance. It is how Suzhou people simply talked. For centuries, Suzhou set the cultural standard for the entire Yangtze Delta: the saying went, "What the people of Suzhou consider elegant, the whole world considers elegant." In this clip, Emperor Qianlong visits Yangzhou and cannot understand the woman. He is mocked for it by a tavern waiter, while the local Yangzhou dandies watching from the floor cheer and jeer along. The joke lands because understanding Suzhounese was a point of pride across the whole Jiangnan region, not just for Suzhou locals but for anyone who considered themselves cultured. The mockery turns on a pun: "A Shandong man eating 麦冬: doesn't understand a thing." (麦冬/mài dōng ≈ 没懂, "understood nothing.")

Generation 2 · Millennial speaker · Shi Bin · Suzhou Television · 2020s

Shi Bin (施斌) is a host at Suzhou Television whose shows are entirely in Suzhounese, a deliberate and increasingly rare choice. This clip, originally broadcast by Suzhou TV and later uploaded to YouTube, covers Western Zhou dynasty wedding customs. Listen for the same tonal richness as the first clip, but also notice Mandarin vocabulary and rhythms creeping in. This generation grew up bilingual; the dialect survived, but it changed.

Generation 3 · Post-2000s speaker · Recorded 2024–2025

The author's own voice — part of the generation that grew up hearing Suzhounese from grandparents but schooled entirely in Mandarin. Can understand it, can say some phrases, but the full vocabulary and tonal instinct are gone. Not from lack of interest. From lack of environment.

Across three generations, the thread of transmission broke.

§ 02

Three forces working against preservation

The difficulty of preserving Suzhounese isn't a single problem. It is at least three, operating at the same time and reinforcing each other.

§ 03

The recordings keep growing.
The speakers don't.

Speech recognition accuracy on Wu dialect has improved measurably in the past five years — driven partly by the release of the WenetSpeech-Wu dataset, a large collection of recorded Wu speech (a corpus) assembled for training AI models. At the same time, the population capable of producing natural, unscripted Wu speech has declined. The tools are getting better at preserving an artifact of something that is ceasing to exist as a living system.^[5]
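Recognition accuracy of this kind is usually reported as character error rate (CER): the number of character edits needed to turn the system's output into the reference transcript, divided by the reference length. The sketch below is a minimal, dependency-free version of that calculation. The two example strings are hypothetical and are not taken from any dataset mentioned on this page.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a recognizer replaces the dialect word with a Mandarin one.
reference = "今朝天气好"
hypothesis = "今天天气好"
print(f"CER = {cer(reference, hypothesis):.0%}")  # one substitution in five characters
```

A low CER only says the transcript matches the reference characters. It says nothing about whether the recording was natural, unscripted speech, which is the distinction the paragraph above draws.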

01 · Corpus completeness failure
Age, geographic, and register bias in recordings

02 · Social context collapse
Isolated phoneme data without pragmatic/conversational structure

03 · Transmission failure
No mechanism connecting preserved data to speaker communities

04 · Definition failure
Conflating documentation with revitalization

05 · Label contamination

Suzhounese and Shanghainese are both Wu dialects, but Shanghainese has far greater online presence and sits structurally closer to Mandarin — making it easier for Mandarin-trained models to handle. AI scrapers routinely mislabel Suzhounese content as Shanghainese. The result: a model trained on "Wu dialect" data may be learning mostly Shanghainese, with Suzhounese either diluted into the noise or systematically misclassified from the start.
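The effect of mislabeling on a dataset's composition can be made concrete with simple arithmetic. The sketch below assumes a hypothetical corpus and a hypothetical mislabel rate; the hours and the 40% figure are invented for illustration and do not describe any real dataset. `Corpus` and `effective_composition` are names introduced here, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    shanghainese_hours: float
    suzhounese_hours: float


def effective_composition(true: Corpus, mislabel_rate: float) -> dict:
    """Labeled shares after a fraction of Suzhounese audio is tagged as Shanghainese."""
    if not 0.0 <= mislabel_rate <= 1.0:
        raise ValueError("mislabel_rate must be between 0 and 1")
    moved = true.suzhounese_hours * mislabel_rate
    labeled_sh = true.shanghainese_hours + moved
    labeled_sz = true.suzhounese_hours - moved
    total = labeled_sh + labeled_sz
    return {
        # Share of the corpus that the labels say is Suzhounese
        "labeled_suzhounese_share": labeled_sz / total,
        # Share that actually is Suzhounese
        "true_suzhounese_share": true.suzhounese_hours / total,
        # Fraction of "Shanghainese"-labeled audio that really is Shanghainese
        "shanghainese_label_purity": true.shanghainese_hours / labeled_sh,
    }


# Hypothetical numbers: 900 h Shanghainese, 100 h Suzhounese, 40% mislabeled.
stats = effective_composition(Corpus(900.0, 100.0), mislabel_rate=0.4)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed numbers the labels understate the Suzhounese share, and the "Shanghainese" label is no longer clean either, so both dialects' training signals are affected.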

Youth fluency data: Jiangsu TV survey (2020, 2.2%); China News Service / Pingjiang survey (2013, ~5%); Sohu/media survey (2008, 8%). ASR baseline from WenetSpeech-Wu paper (arXiv:2601.11027, Jan 2026): Paraformer on Wu ~67% CER (2019), Tencent Cloud ~25% CER (2024), Whisper-medium-Wu ~11% CER (2025). Early-period ASR estimates extrapolated from published Chinese dialect ASR literature. Projected 2025 fluency (~1.5%) based on trendline.
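The note above does not say which trendline produced the projected 2025 figure. As one possible reading, the sketch below fits an exponential decay (a straight line through the logarithm of the three survey percentages) and extrapolates to 2025. The choice of a log-linear fit is an assumption, and the three surveys come from different sources with different methods, so the result is only a rough indication.

```python
import math

# Survey figures quoted in the data note: (year, % of youth reporting fluency)
surveys = [(2008, 8.0), (2013, 5.0), (2020, 2.2)]


def fit_log_linear(points):
    """Ordinary least squares on (year, ln(percent)); returns (slope, intercept)."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(points, year):
    """Extrapolate the fitted exponential trend to the given year (in percent)."""
    slope, intercept = fit_log_linear(points)
    return math.exp(intercept + slope * year)


slope, _ = fit_log_linear(surveys)
print(f"fitted annual change: {math.expm1(slope):.1%}")
print(f"projected 2025 fluency: {project(surveys, 2025):.2f}%")
```

With these three points the fit yields a 2025 value between 1% and 2%, the same range as the projection quoted in the note. A straight-line fit on the raw percentages would instead cross zero before 2025, which is one reason a decay model is the more plausible choice here.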

§ 04

Why speech software struggles with Suzhounese

Before the social problem, there is a purely technical one. Three features of Suzhounese make it resistant to standard speech AI pipelines — and understanding them clarifies why having more recordings alone does not solve the problem.

7-tone system vs. Mandarin's 4

Suzhounese preserves five unchecked tones and two checked tones (入声, 'entering tones') that end in a glottal stop, a category standard Mandarin lost centuries ago. Mandarin-trained acoustic models have no representation for these tone categories.

Figure: tone inventories compared, Mandarin (4 tones) vs. Suzhounese (7 tones)

Tones shift depending on neighboring syllables

In Suzhounese, the tone of a syllable changes based on what surrounds it, a phenomenon called tone sandhi. Specifically, the first syllable of a phrase sets a "tone key" that overwrites the tones of all syllables to its right. Speech systems built for Mandarin, where a syllable's tone is largely fixed, do not model this: to decode any individual syllable correctly, a system has to take the whole phrase into account.

今 alone: tone 1 (yin-ping)

今朝 → 今 shifts; 朝 follows

今朝天气好 → full domain spread; all 5 syllables affected
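The left-to-right spreading described above can be sketched as a lookup on the first syllable only. This is a toy model: the tone labels `T1` and `T2`, the pitch symbols (H = high, L = low), and the contour templates are placeholders chosen for illustration, not measured Suzhounese values, and real sandhi rules are more intricate.

```python
# Hypothetical phrase-level contours, keyed by the citation tone of the FIRST syllable.
# The pitch values are illustrative placeholders, not phonetic data.
PHRASE_TEMPLATES = {
    "T1": ["H", "L"],  # assumed: starts high, then stays low
    "T2": ["L", "H"],  # assumed: starts low, then stays high
}


def apply_left_dominant_sandhi(citation_tones):
    """Return surface pitches for a phrase, given each syllable's citation tone.

    Only the first syllable's tone selects the contour; the citation tones of
    later syllables are overwritten, and the contour's last pitch is extended
    over any remaining syllables.
    """
    if not citation_tones:
        return []
    template = PHRASE_TEMPLATES[citation_tones[0]]
    last = len(template) - 1
    return [template[min(i, last)] for i in range(len(citation_tones))]


# Two phrases that differ only in their non-initial tones surface identically.
print(apply_left_dominant_sandhi(["T1", "T3", "T5"]))
print(apply_left_dominant_sandhi(["T1", "T2", "T2"]))
# Changing the first syllable's tone changes the whole phrase.
print(apply_left_dominant_sandhi(["T2", "T3", "T5"]))
```

The point for speech recognition is visible in the function signature: the surface pitch of any syllable cannot be computed from that syllable alone, because it depends on where the phrase starts and what tone the first syllable carries.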

Sounds that Mandarin erased — still alive here

Suzhounese preserves a set of consonants, /b, d, g, v, z/, produced with the vocal cords vibrating (called voiced obstruents). Mandarin lost these roughly a thousand years ago. A speech model trained only on Mandarin has no category for these sounds, so it maps them to the nearest Mandarin consonant and gets them wrong. With almost no examples in existing training data, error rates spike.

Present in Suzhounese / absent in Mandarin:

/b/ /d/ /g/ /v/ /z/

Plus: 4 competing romanization systems with no official standard — annotation inconsistency is structural.
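One way to picture the gap is as a set difference between phoneme inventories. In the sketch below, only the five voiced obstruents come from the text above; the Mandarin set is a deliberately partial list of obstruents used for illustration, and the example transcript is invented.

```python
# Partial, illustrative inventories (obstruents only). Not a full phonology.
MANDARIN_OBSTRUENTS = {"p", "pʰ", "t", "tʰ", "k", "kʰ", "f", "s"}
SUZHOUNESE_OBSTRUENTS = MANDARIN_OBSTRUENTS | {"b", "d", "g", "v", "z"}


def unseen_phonemes(target: set, training: set) -> set:
    """Phonemes the target variety uses that the training variety never provides."""
    return target - training


def flag_out_of_inventory(transcript: list, training: set) -> list:
    """Return positions in a phoneme transcript that the training inventory cannot label."""
    return [i for i, phone in enumerate(transcript) if phone not in training]


gap = unseen_phonemes(SUZHOUNESE_OBSTRUENTS, MANDARIN_OBSTRUENTS)
print(sorted(gap))

# Hypothetical consonant sequence from a Suzhounese utterance.
example = ["d", "k", "z", "pʰ", "b"]
print(flag_out_of_inventory(example, MANDARIN_OBSTRUENTS))
```

A model whose output layer only covers the training inventory has to assign every flagged position to some existing label, which is where the systematic substitutions come from.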

Richer Vowels than Mandarin

Suzhounese includes rounded front mid vowels like /ø/ — sounds absent from Mandarin entirely. These are the same vowel sounds found in French 'feu' or German 'schön'.

Foundation of Two Heritage Arts

Kunqu opera (UNESCO 2001) and Pingtan storytelling were built on Suzhou phonology. Wei Liangfu explicitly chose the Suzhou accent as Kunqu's phonological standard in the Ming dynasty.

Kunqu 昆曲 (UNESCO 2001) · Pingtan 评弹

§ 05

You probably cannot solve this problem.

You can only make better or worse interventions.

In 1973, Rittel and Webber introduced the concept of the "wicked problem" — a class of challenges that resist rational-technical solutions because the problem itself keeps changing shape as you try to address it.^[2] Language preservation is one.

Property	In Suzhou Dialect Preservation
No definitive formulation	Archive? Revitalize? Heritage performance? Speech tech?
"Preservation" means archiving to a linguist, community vitality to a parent, clean training data to an engineer, and cultural identity to a 25-year-old in Shanghai. These are not variants of the same goal.
No stopping rule	How many speakers = "preserved"?
There is no agreed threshold. The PASSCo framework (2024) implicitly acknowledges this by treating preservation as perpetual institutional coordination rather than a target to reach.
Solutions change the problem	School programs create formal register
Teaching Suzhounese in schools produces speakers who "sound like news anchors, not neighbors." The formal register created by pedagogy didn't exist before — the intervention changed what was being preserved.
One-shot operation	Native speaker generation can't be reconstructed
With approximately 2.2% of Suzhou teenagers reporting fluency, and the figure still declining, the cohort of native speakers capable of natural first-language transmission is aging out. Each year of delay forecloses options that cannot be reopened.
No right or wrong, only better/worse	Standardization vs. regional variation
Standardizing the dialect for NLP pipelines marginalizes natural variation. Any canonical version is a constructed artifact. The model trained on "Suzhounese" may be encoding something no living speaker fully recognizes.
No writing system — a structural gap	You can say it. You cannot reliably write it.
Chinese characters primarily encode morphemes rather than a dialect's sounds. Suzhounese has words for which no standard character exists, so speakers improvise, each differently. "煞煞好" (sàsa hǎo) appears in no official dictionary; its characters vary from writer to writer. Without a stable orthography there can be no consistent text corpus; without a text corpus there can be no language model; without a language model there can be no input method. The absence of a writing system is not just a cultural gap. It is a hard dependency that blocks every downstream digital tool. Few languages with this many speakers face the problem so completely.
No enumerable solution set	School curricula, media production, ASR systems, legislation, diaspora networks — the intervention space has no ceiling.
Unlike engineering problems with discrete solution spaces, Suzhou dialect preservation has an effectively infinite set of potential interventions: curriculum reform, community centers, TTS development, tourism, WeChat campaigns, documentary film, legal protection, artist residencies, diaspora outreach, and countless combinations. No principled method exists for choosing among them or knowing when enough has been tried.
Essential uniqueness	What worked for Cantonese, Hokkien, or Gaelic may fail here. The context is not transferable.
Suzhou dialect lacks Cantonese's institutional semi-autonomy in Hong Kong, Hokkien's diaspora networks and Taiwan curriculum mandate, Gaelic's EU-framework protections, and Ainu's ethnic-minority legal standing. Every apparently analogous success case differs on the very dimensions that determined its outcome. Imported solutions require such extensive modification that they may no longer be solutions at all.
Multiple interconnected causes	Government policy, economic incentives, urbanization, demographic shift, and parental choice all drive decline simultaneously.
Six causal streams operate in parallel: (1) Putonghua mandates remove institutional space for dialect; (2) labor markets reward Mandarin and English; (3) migrant demographic influx means dialect speakers are now a minority in Suzhou; (4) social attitudes equate dialect with lower status; (5) parents who speak the dialect actively choose Mandarin for their children; (6) digital platforms are structurally Mandarin-first. Each cause is sufficient to drive decline alone; they are mutually reinforcing.
Stakeholder disagreement	Linguists, community members, government officials, engineers, and educators hold mutually incompatible but internally coherent views.
Linguists prioritize phonological completeness; communities prioritize living use; government must balance dialect promotion against Putonghua policy; tech companies optimize for data and engagement; educators face curriculum constraints; artists want creative freedom. Each stakeholder's framing of the problem generates a different solution that conflicts with others'. There is no neutral vantage point from which to adjudicate.
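The "No writing system" row in the table describes a chain of hard dependencies: orthography, then text corpus, then language model, then input method. The sketch below encodes that chain as a small graph and computes what is blocked when one link is missing. The graph contains only the links named in the table; `blocked_by` is a helper written for this example.

```python
# Dependency chain as described in the "No writing system" row.
DEPENDS_ON = {
    "text corpus": ["stable orthography"],
    "language model": ["text corpus"],
    "input method": ["language model"],
}


def blocked_by(missing: str, depends_on: dict) -> set:
    """Return every tool that directly or transitively depends on `missing`."""
    blocked = set()
    changed = True
    while changed:  # repeat until no new blocked tool is found
        changed = False
        for tool, deps in depends_on.items():
            if tool in blocked:
                continue
            if any(dep == missing or dep in blocked for dep in deps):
                blocked.add(tool)
                changed = True
    return blocked


print(sorted(blocked_by("stable orthography", DEPENDS_ON)))
print(sorted(blocked_by("language model", DEPENDS_ON)))
```

Removing the first link blocks everything downstream, while removing a later link blocks only what follows it. That asymmetry is why the table treats the writing system as a structural gap instead of one issue among many.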

§ 06

Why the usual solutions keep failing

Rittel & Webber^[2] identified ten properties of wicked problems. Four are especially visible here — including one that is often overlooked: despite deep conflict, stakeholders share more than they acknowledge.

Problem Complexity

No single cause, no single lever

The dialect's decline is driven by mutually reinforcing forces. Addressing one does not neutralize the others — each is itself a complex system.

Language Policy · Economic Incentives · Urbanization · Digital Homogenization · Migration

Resistance to Solution

Every intervention reshapes the problem

Well-intentioned strategies produce second-order effects that create new problems:

School programs → formal register that didn't previously exist ("sounds like a news anchor")
Digital archives → preserved recordings nobody in the community consults
Standardization → a constructed artifact no living speaker fully recognizes

Stakeholder Conflict

All want preservation; none agree on how

The disagreements are not superficial — they reflect genuinely incompatible values:

Linguists vs Tech companies: natural variation vs. clean pipeline data

Government vs Communities: Putonghua unity vs. cultural autonomy

Older speakers vs Youth: purism vs. natural Mandarin-influenced evolution

Insiders vs Migrants: dialect as identity vs. dialect as exclusion

Shared Interests

Five things every stakeholder actually agrees on

These are the only stable foundations for multi-stakeholder dialogue:

Nobody genuinely wants the dialect to disappear
The next generation is what matters most
Cultural identity is worth protecting
Existing recordings are irreplaceable
The window for action is narrowing

The Speaker Population, by Cohort

Each row represents one age cohort. Filled dots are estimated fluent speakers; hollow dots are the non-fluent remainder of the same cohort. The collapse between the 60+ generation and those under 20 is not gradual — it is a cliff.

Each dot ≈ 10,000 estimated fluent speakers. Data sources are imprecise and contested — this reflects trend direction, not exact counts.
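The dot encoding described above can be reproduced in a few lines. The sketch below takes the dot value from the caption (10,000 speakers per dot) and draws one row per cohort. The cohort sizes and the 60+ fluency rate are hypothetical placeholders; only the 2.2% youth figure echoes a number quoted elsewhere on this page.

```python
DOT_VALUE = 10_000  # speakers per dot, as stated in the caption


def cohort_row(label: str, population: int, fluency_rate: float,
               dot_value: int = DOT_VALUE) -> str:
    """Render one cohort as filled (fluent) and hollow (non-fluent) dots."""
    total_dots = round(population / dot_value)
    filled = min(round(population * fluency_rate / dot_value), total_dots)
    return f"{label:>6} " + "●" * filled + "○" * (total_dots - filled)


# Hypothetical cohorts: (label, population, share fluent). Not the page's data.
cohorts = [
    ("60+", 200_000, 0.90),
    ("<20", 200_000, 0.022),
]
for label, population, rate in cohorts:
    print(cohort_row(label, population, rate))
```

Because each dot stands for 10,000 people, a cohort with only a few thousand fluent speakers rounds down to zero filled dots. That rounding is worth keeping in mind when reading the youngest rows of a chart like this.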

§ 06

Ethical Challenges & Knowledge Gaps

Preserving an endangered dialect is not a neutral act. Two ethical problems are embedded in the research itself — and two significant knowledge gaps remain unresolved.

Ethical Challenge 1

Documentation changes what it documents

Any systematic preservation effort requires standardization — choosing which speakers to record, which register to canonize, which characters to assign to unwritten sounds. But Suzhounese has no single authoritative form. A researcher who builds a corpus is not capturing the dialect; they are constructing one version of it. This poses a direct ethical problem: the act of preservation can marginalize the natural variation it claims to protect, producing an artifact that sounds more like a dialect exhibit than a living language.

Ethical Challenge 2

Whose consent governs contributed data?

Community-sourced platforms invite speakers to submit recordings, stories, and reflections. But the downstream uses of that data — training speech models, building phoneme databases, informing policy — are rarely made explicit at the point of contribution. Speakers from older generations may not fully understand what NLP pipelines do with their voices. This gap between what contributors expect and what institutions do with their data is an ethical problem that existing language preservation frameworks have not adequately resolved.

Knowledge Gap 1

No validated fluency threshold for "preservation"

There is currently no agreed metric for what counts as a successfully preserved dialect. How many active speakers constitute viability? What ratio of passive to active speakers is sufficient? What counts as "natural transmission" versus "instructed performance"? Without this, every preservation intervention lacks a meaningful success criterion — which is itself a defining property of wicked problems. Further research establishing measurable thresholds is essential before resources can be allocated responsibly.

Knowledge Gap 2

Intergenerational transmission mechanisms are poorly understood

The existing literature documents the rate of decline but offers limited explanation of which specific household and social conditions predict whether a child acquires a minority dialect. Is it frequency of grandparent contact? Parental attitudes toward dialect identity? Presence of same-age peers? Without this causal understanding, preservation programs cannot be effectively targeted. This is the most consequential gap for practitioners designing interventions.

A city of 12 million people.
A language spoken by almost no one under 30.

Why This Should Concern You

If you simply followed a link and aren't sure why you're here

If you grew up hearing Suzhounese at home

If your grandparents spoke a dialect you never quite learned

If you study endangered languages or language documentation

If you build speech or language AI tools, or work on speech recognition or low-resource NLP

If you work in intangible heritage or language policy


Seventy years of attempts. What has been tried?

Technology and policy cannot do the most important thing for you.

The technical problem is real.
It is not the whole problem.

Leave your voice behind.

Share a voice. Any voice counts.

What would you like to say?

Thoughts, questions, stories.

References

Revision Statement
