5.24 (Saturday)

Opening 9:00 – 9:15
Keynotes 9:15 – 10:35
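Lightweight and Massive: Rethinking Memory and Storage Optimization
9:15 – 9:55
Memory and storage optimization is one of the core problems in computer systems. This talk presents our latest work on optimizing memory and storage systems for lightweight embedded scenarios and for massive big-data applications.
邵子立
The Chinese University of Hong Kong
Professor in the Department of Computer Science and Engineering at The Chinese University of Hong Kong. Research interests include embedded software and systems, storage systems, and related industrial applications.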
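Demystifying Programming Language Research
9:55 – 10:35
Programming language research has long had a reputation for being theoretically rigorous and out of reach. In fact, many important results in computer systems rest on foundations from programming language research. This talk shares what programming language researchers actually do and how their work can inspire research in computer systems.
熊英飞
Peking University
Tenured associate professor under the new faculty system at Peking University. Research interests are programming languages and software engineering, especially program synthesis, repair, analysis, and verification.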
Break 10:35 – 10:45
Keynote 10:45 – 11:25
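High-Performance, Highly Reliable Asynchronous Communication for Multicore Systems
10:45 – 11:25
Asynchronous communication plays a key role in multithreaded software such as operating systems, databases, networking stacks, and language runtimes, where it enables data transfer, task dispatch, and component decoupling. This talk presents the design of high-performance, reliable asynchronous communication components for heterogeneous multicore hardware in the post-Moore era, along with related industrial practice.
王佳玮
Huawei Technologies Co., Ltd.
王佳玮 received a Ph.D. from Technische Universität Dresden, Germany. The doctoral research focused on system scalability, parallel processing, and concurrent data structures, combining formal verification techniques and carrying the results into real industrial settings. Current work focuses on building highly concurrent, low-latency, non-blocking system components for HarmonyOS.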
Infrastructure (1) - Chair: 蒋炎岩 (Nanjing University) 11:25 – 12:30
Asterinas (星绽): An Open-Source General-Purpose OS Kernel Balancing Safety and Performance
田洪亮
Ant Group
Ananke: Fast, Transparent Recovery for File System Microkernels
刘璟
Microsoft Research Asia
Topology-Aware NPU Virtualization
冯二虎
Shanghai Jiao Tong University
Toward High-Performance Secure OLAP Systems: Cache-Driven Parallel Homomorphic Comparison
胡起
The University of Hong Kong
SimAI: A High-Precision Simulator for Large-Scale AI Clusters
李庆旭
Alibaba Cloud
Mini Panel
Lunch 12:30 – 13:30
Industry - Chair: 李明煜 (Institute of Software, Chinese Academy of Sciences) 13:30 – 14:50
A High-Performance RISC-V Hardware Platform Supporting System Virtualization and the AI Software Stack
夏鸣远
超睿科技(上海)有限公司
Cross-Platform Development Frameworks in the HarmonyOS Ecosystem: Practice and Future Plans
谢国
Huawei Technologies Co., Ltd.
毕方 AI Native IDE: Technical Practice in the HarmonyOS Ecosystem
邓成瑞
Huawei Technologies Co., Ltd.
Key Technologies and Application Practice of a Heterogeneous Converged OS
林飞龙
Huawei Technologies Co., Ltd.
Building the Next-Generation Database for AI Applications Through Architectural Innovation
张霖涛
EloqData
Converos: A Practical Model Checking Approach for Concurrent Modules in Rust OS Kernels
王明华
Ant Research
Mini Panel
Machine Learning and Acceleration (1) - Chair: 张明喆 (Institute of Information Engineering, Chinese Academy of Sciences) 14:50 – 16:10
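Write Once, Run Anywhere: Neural-Symbolic Transcompilation of Tensor Programs for Deep Learning Systems
董守杨
University of Science and Technology of China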
BlitzScale: Efficient Elastic LLM Inference via Ultra-Fast Online Model Scaling
张鼎言
Shanghai Jiao Tong University
Flexible Tiling for Sparse Accelerators Using a Hybrid Static-Dynamic Approach
李欣桐
Tsinghua University
KTransformers: A Flexible Framework for Experiencing Cutting-Edge LLM Inference Optimizations
张博鑫
Tsinghua University
SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators
蔡经纬
Tsinghua University
Mini Panel
Break & Poster 16:10 – 16:40
Storage and Data Management - Chair: 邹翔宇 (Harbin Institute of Technology, Shenzhen) 16:40 – 18:00
GC Is More Than Garbage Collection: Piggybacked Defragmentation for Deduplicated Backup Storage
邹翔宇
Harbin Institute of Technology, Shenzhen
A High-Performance Deduplicating File System Based on Metadata Fusion
潘延麒
Harbin Institute of Technology, Shenzhen
A Stripeless Erasure Coding Scheme for Distributed In-Memory Storage
高健
Tsinghua University
A NOR Flash File System for Memory-Constrained Embedded Scenarios
黄浩
Harbin Institute of Technology, Shenzhen
A Low-Latency, Large-Scale Vector Search System
郭昊
Tsinghua University
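A High-Throughput, Low-Latency Large-Scale Vector Search System Based on CPU/GPU Cooperation
田冰
Huazhong University of Science and Technology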
Mini Panel
Poster 18:00 – 18:30
Banquet 18:30

5.25 (Sunday)

Keynotes 9:00 – 10:20
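A Multi-Dimensional Parallel Training System for Large Models
9:00 – 9:40
Because large-scale deep neural networks have complex and highly dynamic model structures, training them often exceeds the compute and memory limits of modern GPUs and must be accelerated through multi-dimensional parallelism. This talk introduces a new generation of multi-dimensional parallel training systems that overcome the limitations of existing systems such as GPipe, PipeDream, and Megatron in load balancing, GPU utilization, and dynamic model training. Fold3D has become a mainstream thousand-card-scale parallel training system on the MindSpore platform.
崔鹤鸣
The University of Hong Kong
Dr. 崔鹤鸣 received bachelor's and master's degrees from Tsinghua University and a Ph.D. from Columbia University. Research focuses on parallel and distributed systems, trusted execution environments, and AI training infrastructure. Honors include best paper awards at ICSE 2025 and ACSAC 2017, and Dr. 崔 has led multiple national and regional research projects. The large-model training systems developed under Dr. 崔's leadership, including Fold3D, NASPipe, and vPipe, are widely used on the PyTorch and MindSpore platforms.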
Engineering Practice of Distributed Inference Systems for Large Models
9:40 – 10:20
The speaker is a core systems R&D engineer at a company.
Break & Poster 10:20 – 10:50
Infrastructure (2) - Chair: 徐尔茨 (Shanghai Jiao Tong University) 10:50 – 12:10
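Hetu v2 (河图 v2): An Efficient Distributed Deep Learning System Based on Hierarchical and Heterogeneous Data Annotations
符芳诚
Peking University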
Mooncake: A KV-Cache-Centric LLM Inference Architecture That Trades Storage for Compute
秦若愚
Tsinghua University
A Study of Pin-Free Memory Under I/O Device Virtualization
汪沄
Shanghai Jiao Tong University
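Beehive: A Disaggregated-Memory Runtime Exploiting the Asynchrony of Multithreaded Programs
李权熹
Institute of Computing Technology, Chinese Academy of Sciences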
Parallelizing OS Rendering Services with Out-of-Order Execution and In-Order Commit
吴元培
Shanghai Jiao Tong University
Mini Panel
Lunch 12:10 – 13:30
Cloud and Datacenter - Chair: 刘海坤 (Huazhong University of Science and Technology) 13:30 – 14:50
AC-Cache: An Efficient In-Memory Cache for Small Objects Based on Access Correlation
南福麟
Xiamen University
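A Network Topology with Flexible Non-Blocking Regions for Large-Scale AI and High-Performance Computing Systems
王梓宇
National University of Defense Technology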
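MemSeer: Exploring Differences in Memory Failures Across Hyperscale Heterogeneous x86/ARM Clusters and Predicting Them at Multiple Granularities
谷云飞
Shanghai Jiao Tong University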
Boosting JBOF Storage Throughput via NVMe-oF Bypass Offloading
孙迅
Tsinghua University
A Cross-Region Distributed Transaction System Optimized with High-Precision Clocks
宋昊泽
The University of Hong Kong
Mini Panel
Break 14:50 – 15:00
Machine Learning and Acceleration (2) 15:00 – 16:00
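Mosaic: Exploiting Instruction-Level Parallelism in Deep Learning Accelerators via Instruction Tessellation
许健行
University of Science and Technology of China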
FlashGEMM: Optimizing Sequences of Matrix Multiplications on CPUs by Exploiting Data Reuse
章俊文
National University of Defense Technology
AccelES: Accelerating Top-K SpMV for Embedding Similarity via Low-Bit Pruning
翟嘉琪
Huazhong University of Science and Technology
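AutoCCL: An Automatic Collective Communication Tuner for Accelerating Distributed DNN Training
许冠斌
University of Science and Technology of China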
Mini Panel
Closing 16:00 – 16:10

Posters

ACRS: Adjacent Computation Resource Sharing among Partitioned GPU Sub-Cores
Penghao Song (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Chongxi Wang (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Chenji Han (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Haoyu Zhao (Loongson Technology Co. Ltd.), Tingting Zhang (Loongson Technology Co. Ltd.; Institute of Computing Technology, CAS), Tianyi Liu (The University of Texas at San Antonio), Jian Wang (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences)
ALLMod: Exploring Area-Efficiency of LUT-based Large Number Modular Reduction via Hybrid Workloads
Fangxin Liu (Shanghai Jiao Tong University), Haomin Li (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University), Haibing Guan (Shanghai Jiao Tong University)
BLOOM: Bit-Slice Framework for DNN Acceleration with Mixed-Precision
Fangxin Liu (Shanghai Jiao Tong University), Ning Yang (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University), Haibing Guan (Shanghai Jiao Tong University)
Comprehensive Deadlock Prevention for GPU Collective Communication
Lichen Pan (Peking University), Juncheng Liu (OneFlow Inc.), Yongquan Fu (Science and Technology Laboratory of Parallel and Distributed Processing; College of Computer, National University of Defense Technology), Jinhui Yuan (OneFlow Inc.), Rongkai Zhang (Peking University), Pengze Li (Peking University), Zhen Xiao (Peking University)
CROSS: Compiler-Driven Optimization of Sparse DNNs Using Sparse/Dense Computation Kernels
Fangxin Liu (Shanghai Jiao Tong University), Shiyuan Huang (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University)
CXL-ECC: an Efficient LRC-based on-CXL-Memory-eXpander-Controller ECC to Enhance Reliability and Performance of DRAM Error Correction
Yixuan Liu (Shanghai Jiao Tong University), Yunfei Gu (Shanghai Jiao Tong University), Junhao Dai (Shanghai Jiao Tong University), Xinyuan Wu (Shanghai Jiao Tong University), Chentao Wu (Shanghai Jiao Tong University), Xinfei Guo (Shanghai Jiao Tong University), Jieru Zhao (Shanghai Jiao Tong University), Jie Li (Shanghai Jiao Tong University), Minyi Guo (Shanghai Jiao Tong University)
Embracing Imbalance: Dynamic Load Shifting among Microservice Containers in Shared Clusters
Shutian Luo (University of Virginia), Jianxiong Liao (Sun Yat-sen University), Chenyu Lin (University of Macau, Macau SAR, China), Huanle Xu (University of Macau), Zhi Zhou (Sun Yat-sen University), ChengZhong Xu (University of Macau)
Espresso: Exploiting the Sparsity Property in Event Sensors with Spatiotemporal Ordering
Leshan Li (Tsinghua University), Hongyi Li (Tsinghua University), Qingyuan Yang (Tsinghua University), Mingtao Ou (Tsinghua University), Rong Zhao (Tsinghua University), Xinglong Ji (Tsinghua University)
FAST: An FHE Accelerator for Scalable-parallelism with Tunable-bit
Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences), Xianglong Deng (Institute of Information Engineering, Chinese Academy of Sciences), Liang Kong (Ant Group), Guiming Shi (Tsinghua University), Guang Fan (Ant Group), Dan Meng (Institute of Information Engineering, CAS), Rui Hou (Institute of Information Engineering, CAS), Mingzhe Zhang (Ant Group)
FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism
Yujie Wang (Peking University), Shiju Wang (Beihang University), Shenhan Zhu (Peking University), Fangcheng Fu (Peking University), Xinyi Liu (Peking University), Xuefeng Xiao (ByteDance), Li Huixia (ByteDance), Jiashi Li (ByteDance), Faming Wu (ByteDance), Bin Cui (Peking University)
GenCNN: A Partition-Aware Multi-Objective Mapping Framework for CNN Accelerators Based on Genetic Algorithm
Yudong Mu (Institute of Computing Technology, CAS), Zhihua Fan (Institute of Computing Technology, CAS), Wenming Li (Institute of Computing Technology, CAS), Zhiyuan Zhang (Institute of Computing Technology, CAS), Xuejun An (Institute of Computing Technology, CAS), Dongrui Fan (Institute of Computing Technology, CAS), Xiaochun Ye (Institute of Computing Technology, CAS)
Grad: Intelligent Microservice Scaling by Harnessing Resource Fungibility
Liao Chen (University of Macau, Macau SAR, China), Chenyu Lin (University of Macau, Macau SAR, China), Shutian Luo (Yale University), Huanle Xu (University of Macau), ChengZhong Xu (University of Macau)
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Shuzhang Zhong (Peking University), Yanfan Sun (Beihang University), Ling Liang (Peking University), Runsheng Wang (Peking University), Meng Li (Peking University)
Hydra: Scale-out FHE Accelerator Architecture for Secure Deep Learning on FPGA
Yinghao Yang (Institute of Computing Technology, Chinese Academy of Sciences), Xicheng Xu (Institute of Computing Technology, Chinese Academy of Sciences), Hang Lu (Institute of Computing Technology, Chinese Academy of Sciences), Xiaowei Li (Institute of Computing Technology, CAS, China)
L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Qingyuan Liu (Shanghai Jiao Tong University), Liyan Chen (Shanghai Jiao Tong University), Yanning Yang (Shanghai Jiao Tong University), Haocheng Wang (Shanghai Jiao Tong University), Dong Du (Shanghai Jiao Tong University), Zhigang Mao (Shanghai Jiao Tong University), Naifeng Jing (Shanghai Jiao Tong University), Yubin Xia (Shanghai Jiao Tong University), Haibo Chen (Shanghai Jiao Tong University)
LitTLS: Lightweight Thread-Level Speculation on Little Cores
Xin Cheng (State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China), Jinpeng Ye (State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China), Haoyu Deng (State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China), Tingting Zhang (Loongson Technology Co. Ltd., Beijing, China), Tianyi Liu (Computer Science, The University of Texas at San Antonio, San Antonio, United States), Jian Wang (State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China)
MeHyper: Accelerating Hypergraph Neural Networks by Exploring Implicit Dataflows
Wenju Zhao (Huazhong University of Science and Technology), Pengcheng Yao (Huazhong University of Science and Technology), Dan Chen (National University of Singapore), Long Zheng (Huazhong University of Science and Technology), Xiaofei Liao (Huazhong University of Science and Technology), Qinggang Wang (Huazhong University of Science and Technology), Shaobo Ma (Huazhong University of Science and Technology), Yu Li (Huazhong University of Science and Technology), Haifeng Liu (Huazhong University of Science and Technology), Wenjing Xiao (Guangxi University), Yufei Sun (Huazhong University of Science and Technology), Bing Zhu (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology), Jingling Xue (University of New South Wales)
Microns: Connection Subsetting for Microservices in Shared Clusters
Jianxiong Liao (Sun Yat-sen University), Juntao Li (Sun Yat-sen University), Zhi Zhou (Sun Yat-sen University), Fei Xu (East China Normal University), Fangming Liu (Peng Cheng Laboratory, and Huazhong University of Science and Technology), Xu Chen (Sun Yat-sen University)
MILLION: MasterIng Long-Context LLM Inference Via Outlier-Immunized KV Product QuaNtization
Fangxin Liu (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University)
Multiplexing Dynamic Deep Learning Workloads with SLO-awareness in GPU Clusters
Wenyan Chen (University of Macau; Shenzhen Institutes of Advanced Technology, CAS), Chengzhi Lu (University of Macau), Huanle Xu (University of Macau), Kejiang Ye (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), ChengZhong Xu (University of Macau)
Neo: Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core
Dian Jiao (Institute of Information Engineering, Chinese Academy of Sciences), Xianglong Deng (Institute of Information Engineering, Chinese Academy of Sciences), Zhiwei Wang (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS), Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences), Yi Chen (Huazhong University of Science and Technology), Rui Hou (Institute of Information Engineering, CAS), Dan Meng (Institute of Information Engineering, CAS), Mingzhe Zhang (Ant Research)
OMeGa: Boosting Large-scale Graph Embeddings with Heterogeneous Memory Processing
Peng Fang (Huazhong University of Science and Technology), Siqiang Luo (Nanyang Technological University), Fang Wang (Huazhong University of Science and Technology), Bolong Zheng (Huazhong University of Science and Technology), Hong Jiang (UT Arlington), Dan Feng (Huazhong University of Science and Technology), Hechang Pan (Huazhong University of Science and Technology), Xingyu Wan (Huazhong University of Science and Technology)
Overcoming the Last Mile between Log-Structured File Systems and Persistent Memory via Scatter Logging
Yifeng Zhang (Harbin Institute of Technology, Shenzhen), Yanqi Pan (Harbin Institute of Technology, Shenzhen), Hao Huang (Harbin Institute of Technology, Shenzhen), Yuchen Shan (Harbin Institute of Technology, Shenzhen), Wen Xia (Harbin Institute of Technology, Shenzhen)
PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type
Ning Yang (Shanghai Jiao Tong University), Fangxin Liu (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University)
PTPS: Precision-Aware Task Partitioning and Scheduling for SpMV on CPU-FPGA Heterogeneous Platforms
Jianhua Gao (Beijing Normal University), Zhi Zhou (Beijing Institute of Technology), Xingze Huang (Beijing Institute of Technology), Juan Wang (Beijing Institute of Technology), Yizhuo Wang (Beijing Institute of Technology), Weixing Ji (Beijing Normal University)
Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing
Chenning Tao (Zhejiang University), Liqiang Lu (Zhejiang University)
RADAR: A Skew-Resistant and Hotness-Aware Ordered Index Design for Processing-in-Memory Systems
Yifan Hua (Shanghai Jiao Tong University), Shengan Zheng (Shanghai Jiao Tong University), Weihan Kong (Shanghai Jiao Tong University), Cong Zhou (Shanghai Jiao Tong University), Kaixin Huang (Shanghai Jiao Tong University), Ruoyan Ma (Shanghai Jiao Tong University), Linpeng Huang (Shanghai Jiao Tong University)
Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling
Yujie Wang (Peking University), Shenhan Zhu (Peking University), Fangcheng Fu (Peking University), Xupeng Miao (Purdue University), Jie Zhang (Alibaba Group), Juan Zhu (Alibaba Group), Fan Hong (Alibaba Group), Yong Li (Alibaba Group), Bin Cui (Peking University)
Tropical: Enhancing SLO Attainment in Disaggregated LLM Serving via SLO-Aware Multiplexing
Jinming Ma (Shanghai Artificial Intelligence Laboratory), Jiefei Chen (Shanghai Artificial Intelligence Laboratory & Fudan University), Xiuhong Li (Peking University), Jiangfei Duan (The Chinese University of Hong Kong), Haojie Duanmu (Shanghai Artificial Intelligence Laboratory & Shanghai Jiao Tong University), Xingcheng Zhang (Shanghai Artificial Intelligence Laboratory), Chao Yang (Peking University), Dahua Lin (The Chinese University of Hong Kong & SenseTime Research)
TSAJS: Efficient Multi-Server Joint Task Scheduling Scheme for Mobile Edge Computing
Chaoqun Li (Shandong University), Si Wu (Shandong University), Qike Cao (Henan Institute of Science and Technology), Jinyao Liu (Shandong University), Xiuzhen Cheng (Shandong University), Pengfei Hu (Shandong University)
TSN Cache: Exploiting Data Localities in Graph Computing Applications
Chaoyang Jia (National University of Defense Technology), Jingyu Liu (National University of Defense Technology), Shi Chen (National University of Defense Technology), Kai Lu (National University of Defense Technology), Li Shen (National University of Defense Technology)
Under the Dome: Automated Generation of eBPF Programs for Monitoring Vulnerability with AEGIS
Tianze Zhang (National University of Defense Technology), Teng Wang (National Innovation Institute of Defense Technology), Zhouyang Jia (National University of Defense Technology), Yuanliang Zhang (National University of Defense Technology), Shengbin Xu (National University of Defense Technology), Hongyan Wu (National University of Defense Technology), Xiaofan Sun (National University of Defense Technology), Lei Wang (National Innovation Institute of Defense Technology), Shanshan Li (National University of Defense Technology)