Zikang Shan (单子康)

Email: shanzikang [at] stu.pku.edu [dot] cn

I am a first-year Ph.D. student at Peking University, advised by Prof. Liwei Wang. Currently, I am also a research intern at Microsoft Research Asia. Before that, I received my Bachelor's degree from Peking University. During my undergraduate years, I was honored to be advised by Prof. Liwei Wang and Prof. He Wang.

Please feel free to contact me if you want to discuss or collaborate!

Google Scholar  |  GitHub

News

May 1, 2025 Paper accepted at ICML (Spotlight). See you in Vancouver!

Research

I am interested in reinforcement learning, and I believe its potential remains underexplored, both as a general learning paradigm and as a local behavior optimizer. Recently, I have focused on applying RL to large language models, viewing it as a critical direction for scaling model capabilities with compute as high-quality data grows increasingly scarce.

DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong*, Zikang Shan*, Guhao Feng*, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
ICML, 2025, Spotlight
Paper  |  Code
Based on theoretical insights, we propose an alignment algorithm that is both sample-efficient and effective.
UniDexGrasp++: Improving Universal Dexterous Grasping via Geometry-aware Curriculum Learning and Iterative Generalist-Specialist Learning
Weikang Wan*, Haoran Geng*, Yun Liu, Zikang Shan, Li Yi, Yaodong Yang, and He Wang
ICCV, 2023, Oral presentation with all top rankings, best paper finalist
Paper  |  Website  |  Code
We improve our previous method, making it object-agnostic and much more effective.
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
Yinzhen Xu*, Weikang Wan*, Jialiang Zhang*, Haoran Liu*, Zikang Shan, Hao Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, Tengyu Liu, Li Yi, and He Wang
CVPR, 2023
Paper  |  Website  |  Code
We propose a method for learning dexterous grasping policies that can handle diverse objects based on realistic observations.

This website is adapted from Jon Barron's website and deployed on GitHub Pages. Last updated on May 1, 2025.