Zikang Shan
Email: shanzikang [at] stu.pku.edu [dot] cn
I am a first-year Ph.D. student at Peking University, advised by Prof. Liwei Wang.
Currently, I am also a research intern at Microsoft Research Asia.
Before that, I received my Bachelor's degree from Peking University.
During my undergraduate years, I was honored to be advised by Prof. Liwei Wang and Prof. He Wang.
Please feel free to contact me if you want to discuss or collaborate!
Google Scholar |
GitHub
Research
I am interested in reinforcement learning and believe its potential remains largely underexplored, both as a general learning paradigm and as a behavior optimizer.
I focus in particular on applying reinforcement learning to large language model post-training, which I view as a critical direction for scaling models with compute as high-quality data becomes increasingly scarce.
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong*, Zikang Shan*, Guhao Feng*, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
Under review
Paper |
Code
Based on theoretical insights, we propose an alignment algorithm that is sample efficient and effective.
UniDexGrasp++: Improving Universal Dexterous Grasping via Geometry-aware Curriculum Learning and Iterative Generalist-Specialist Learning
Weikang Wan*, Haoran Geng*, Yun Liu, Zikang Shan, Li Yi, Yaodong Yang, and He Wang
ICCV, 2023, Oral presentation with all top rankings, best paper finalist
Paper |
Website |
Code
We improve our previous method, making it object-agnostic and much more effective.
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
Yinzhen Xu*, Weikang Wan*, Jialiang Zhang*, Haoran Liu*, Zikang Shan, Hao Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, Tengyu Liu, Li Yi, and He Wang
CVPR, 2023
Paper |
Website |
Code
We propose a method to learn dexterous grasping policies that can handle diverse objects based on realistic observations.