yk7333/README.md
  • 👋 Hi, I'm Kai Yang (杨恺).
  • 👀 Research Focus: Large Language Models (LLM) & Reinforcement Learning (RL).
  • 🎓 M.S. in Artificial Intelligence, Tsinghua University.
  • 💼 Tencent Hunyuan X Team | Research Engineer, specializing in RL for LLMs.
  • 📫 Contact: yangkaisigsrl@gmail.com

Pinned

  1. d3po (Public)

    [CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

    Python · 242 stars · 18 forks

  2. RoboEden/Luxai-s2-Baseline (Public)

    Python · 11 stars · 1 fork

  3. Graduation-project-design (Public)

    An immune algorithm optimized with a simulated annealing strategy, applied to the UAV cooperative allocation problem.

    MATLAB · 22 stars

  4. DRND (Public)

    [ICML 2024] Exploration and Anti-exploration with Distributional Random Network Distillation

    Python · 15 stars

  5. TaskAllocation (Public)

    [EAAI] A two-stage reinforcement learning-based approach for multi-entity task allocation.

    Python · 26 stars · 6 forks