Yichen (Zach) Wang

Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models
Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Chen Liu, Yu Lan, and Chao Shen
🆕 NIPS 2024
We conduct a pilot study on prompt optimization generalization and find two co-relation rules with LM's attention weight distributions. We then offer a new objective, concentration, representing the strength and stability of lookback attention to the prompt. Adapting it to popular soft and hard prompt optimization methods shows good improvement.
Citation
```
@article{li2024concentrate,
    title={Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models},
    author={Li, Chengzhengxu and Liu, Xiaoming and Zhang, Zhaohan and Wang, Yichen and Liu, Chen and Lan, Yu and Shen, Chao},
    journal={arXiv preprint arXiv:2406.10584},
    year={2024}
}
```
Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks
Yichen Wang, Shangbin Feng, Abe Bohan Hou, Xiao Pu, Chao Shen, Xiaoming Liu, Yulia Tsvetkov, and Tianxing He
ACL 2024 🌟 best paper nomination 🌟 meta score = 5/5
We comprehensively study the robustness of popular machine-generated text detectors under attacks from diverse categories: editing, paraphrasing, prompting, and co-generating. Our experiments reveal that all detectors exhibit different loopholes. Further, we investigate the reasons behind these defects and propose initial out-of-the-box patches.
Citation // Code // Dataset // Poster
```
@article{wang2024stumbling,
    title={Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks},
    author={Wang, Yichen and Feng, Shangbin and Hou, Abe Bohan and Pu, Xiao and Shen, Chao and Liu, Xiaoming and Tsvetkov, Yulia and He, Tianxing},
    journal={arXiv preprint arXiv:2402.11638},
    year={2024}
}
```
k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text
Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He
ACL 2024 Findings
We propose k-SemStamp, a simple yet effective enhancement of SemStamp, utilizing k-means clustering as an alternative of LSH to partition the embedding space with awareness of inherent semantic structure.
Citation
```
@article{hou2024k,
    title={k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text},
    author={Hou, Abe Bohan and Zhang, Jingyu and Wang, Yichen and Khashabi, Daniel and He, Tianxing},
    journal={arXiv preprint arXiv:2402.11399},
    year={2024}
}
```
Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better
Shengchao Liu, Xiaoming Liu, Yichen Wang, Zehua Cheng, Chengzhengxu Li, Zhaohan Zhang, Yu Lan, and Chao Shen
ACL 2024
We propose a novel fine-tuned machine-generated text detector, Pecola, bridging metric-based and fine-tuned methods by contrastive learning on selective perturbation further than DetectGPT.
Citation
```
@article{liu2024does,
    title={Does$\backslash$textsc $\{$DetectGPT$\}$ Fully Utilize Perturbation? Selective Perturbation on Model-Based Contrastive Learning Detector would be Better},
    author={Liu, Shengchao and Liu, Xiaoming and Wang, Yichen and Cheng, Zehua and Li, Chengzhengxu and Zhang, Zhaohan and Lan, Yu and Shen, Chao},
    journal={arXiv preprint arXiv:2402.00263},
    year={2024}
}
```
SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation
Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov
NAACL 2024
Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences.
Citation
```
@article{hou2023semstamp,
    title={Semstamp: A semantic watermark with paraphrastic robustness for text generation},
    author={Hou, Abe Bohan and Zhang, Jingyu and He, Tianxing and Wang, Yichen and Chuang, Yung-Sung and Wang, Hongwei and Shen, Lingfeng and Van Durme, Benjamin and Khashabi, Daniel and Tsvetkov, Yulia},
    journal={arXiv preprint arXiv:2310.03991},
    year={2023}
}
```
Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt Optimization for Few-shot Learning
Chengzhengxu Li, Xiaoming Liu, Yichen Wang, Duyi Li, Yu Lan, and Chao Shen
AAAI 2024
We propose a dialogue-comprised policy-gradient-based discrete prompt optimization (DP2O) method with dialogue prompt alignment and reinforcement learning to efficiently and effectively generate prompt demonstrations.
Citation
```
@inproceedings{li2024dialogue,
    title={Dialogue for Prompting: A Policy-Gradient-Based Discrete Prompt Generation for Few-Shot Learning},
    author={Li, Chengzhengxu and Liu, Xiaoming and Wang, Yichen and Li, Duyi and Lan, Yu and Shen, Chao},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={38},
    number={16},
    pages={18481--18489},
    year={2024}
}
```
Improving Pacing in Long-Form Story Planning
Yichen Wang, Kevin Yang, Xiaoming Liu, and Dan Klein
EMNLP 2023 Findings
Existing LLM-based systems for writing long-form stories or story outlines frequently suffer from unnatural pacing, resulting in a jarring experience for the reader. We propose a Concrete Outline Control (CONCOCT) system to improve pacing when automatically generating story outlines. Compared to a baseline hierarchical outline generator, humans judge CONCOCT’s pacing to be more consistent over 57% of the time across multiple outline lengths, and the gains also translate to downstream stories.
Citation // Code // Dataset // Poster
```
@inproceedings{wang2023improving,
    title={Improving Pacing in Long-Form Story Planning},
    author={Wang, Yichen and Yang, Kevin and Liu, Xiaoming and Klein, Dan},
    booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
    pages={10788--10845},
    year={2023}
}
```
CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning
Xiaoming Liu⁼, Zhaohan Zhang⁼, Yichen Wang⁼, Hang Pu, Yu Lan, and Chao Shen
EMNLP 2023
We present a coherence-based contrastive learning model named CoCo to detect the possible machine-generated texts (MGTs) under the low-resource scenario. We encode coherence information in the form of graph into the text representation and employ an improved contrastive learning framework. Our approach outperforms the state-of-the-art methods at least 1.23%. Also, we surprisingly find that MGTs originated from up-to-date language models could be easier to detect than these from previous models, in our experiments, and we propose some preliminary explanations.
Citation // Code // Dataset // Poster
```
@inproceedings{liu2023coco,
    title={Coco: Coherence-enhanced machine-generated text detection under low resource with contrastive learning},
    author={Liu, Xiaoming and Zhang, Zhaohan and Wang, Yichen and Pu, Hang and Lan, Yu and Shen, Chao},
    booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
    pages={16167--16188},
    year={2023}
}
```

Yichen (Zach) Wang 王奕辰

yichenzw at uchicago dot edu

news

publications

competition