text-correction

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

arXiv cs.CL · 2d ago

CSRP proposes a three-stage framework that combines continual pre-training, chain-of-thought supervised fine-tuning, and reinforcement learning with an efficiency-aware reward to address over-correction in Chinese grammatical error correction. It achieves state-of-the-art results on the NACGEC benchmark.
