ACRS: Adjacent Computation Resource Sharing among Partitioned GPU Sub-Cores
Penghao Song (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Chongxi Wang (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Chenji Han (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Haoyu Zhao (Loongson Technology Co. Ltd.), Tingting Zhang (Loongson Technology Co. Ltd.; Institute of Computing Technology, CAS), Tianyi Liu (The University of Texas at San Antonio), Jian Wang (State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences)
ALLMod: Exploring Area-Efficiency of LUT-based Large Number Modular Reduction via Hybrid Workloads
Fangxin Liu (Shanghai Jiao Tong University), Haomin Li (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University), Haibing Guan (Shanghai Jiao Tong University)
BLOOM: Bit-Slice Framework for DNN Acceleration with Mixed-Precision
Fangxin Liu (Shanghai Jiao Tong University), Ning Yang (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University), Haibing Guan (Shanghai Jiao Tong University)
Comprehensive Deadlock Prevention for GPU Collective Communication
Lichen Pan (Peking University), Juncheng Liu (OneFlow Inc.), Yongquan Fu (Science and Technology Laboratory of Parallel and Distributed Processing; College of Computer, National University of Defense Technology), Jinhui Yuan (OneFlow Inc.), Rongkai Zhang (Peking University), Pengze Li (Peking University), Zhen Xiao (Peking University)
CROSS: Compiler-Driven Optimization of Sparse DNNs Using Sparse/Dense Computation Kernels
Fangxin Liu (Shanghai Jiao Tong University), Shiyuan Huang (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University)
CXL-ECC: an Efficient LRC-based on-CXL-Memory-eXpander-Controller ECC to Enhance Reliability and Performance of DRAM Error Correction
Yixuan Liu (Shanghai Jiao Tong University), Yunfei Gu (Shanghai Jiao Tong University), Junhao Dai (Shanghai Jiao Tong University), Xinyuan Wu (Shanghai Jiao Tong University), Chentao Wu (Shanghai Jiao Tong University), Xinfei Guo (Shanghai Jiao Tong University), Jieru Zhao (Shanghai Jiao Tong University), Jie Li (Shanghai Jiao Tong University), Minyi Guo (Shanghai Jiao Tong University)
Embracing Imbalance: Dynamic Load Shifting among Microservice Containers in Shared Clusters
Shutian Luo (University of Virginia), Jianxiong Liao (Sun Yat-sen University), Chenyu Lin (University of Macau, Macau SAR, China), Huanle Xu (University of Macau), Zhi Zhou (Sun Yat-sen University), ChengZhong Xu (University of Macau)
Espresso: Exploiting the Sparsity Property in Event Sensors with Spatiotemporal Ordering
Leshan Li (Tsinghua University), Hongyi Li (Tsinghua University), Qingyuan Yang (Tsinghua University), Mingtao Ou (Tsinghua University), Rong Zhao (Tsinghua University), Xinglong Ji (Tsinghua University)
FAST: An FHE Accelerator for Scalable-parallelism with Tunable-bit
Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences), Xianglong Deng (Institute of Information Engineering, Chinese Academy of Sciences), Liang Kong (Ant Group), Guiming Shi (Tsinghua University), Guang Fan (Ant Group), Dan Meng (Institute of Information Engineering, CAS), Rui Hou (Institute of Information Engineering, CAS), Mingzhe Zhang (Ant Group)
FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism
Yujie Wang (Peking University), Shiju Wang (Beihang University), Shenhan Zhu (Peking University), Fangcheng Fu (Peking University), Xinyi Liu (Peking University), Xuefeng Xiao (ByteDance), Li Huixia (ByteDance), Jiashi Li (ByteDance), Faming Wu (ByteDance), Bin Cui (Peking University)
GenCNN: A Partition-Aware Multi-Objective Mapping Framework for CNN Accelerators Based on Genetic Algorithm
Yudong Mu (Institute of Computing Technology, CAS), Zhihua Fan (Institute of Computing Technology, CAS), Wenming Li (Institute of Computing Technology, CAS), Zhiyuan Zhang (Institute of Computing Technology, CAS), Xuejun An (Institute of Computing Technology, CAS), Dongrui Fan (Institute of Computing Technology, CAS), Xiaochun Ye (Institute of Computing Technology, CAS)
Grad: Intelligent Microservice Scaling by Harnessing Resource Fungibility
Liao Chen (University of Macau, Macau SAR, China), Chenyu Lin (University of Macau, Macau SAR, China), Shutian Luo (Yale University), Huanle Xu (University of Macau), ChengZhong Xu (University of Macau)
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Shuzhang Zhong (Peking University), Yanfan Sun (Beihang University), Ling Liang (Peking University), Runsheng Wang (Peking University), Meng Li (Peking University)
Hydra: Scale-out FHE Accelerator Architecture for Secure Deep Learning on FPGA
Yinghao Yang (Institute of Computing Technology, Chinese Academy of Sciences), Xicheng Xu (Institute of Computing Technology, Chinese Academy of Sciences), Hang Lu (Institute of Computing Technology, Chinese Academy of Sciences), Xiaowei Li (Institute of Computing Technology, CAS, China)
L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Qingyuan Liu (Shanghai Jiao Tong University), Liyan Chen (Shanghai Jiao Tong University), Yanning Yang (Shanghai Jiao Tong University), Haocheng Wang (Shanghai Jiao Tong University), Dong Du (Shanghai Jiao Tong University), Zhigang Mao (Shanghai Jiao Tong University), Naifeng Jing (Shanghai Jiao Tong University), Yubin Xia (Shanghai Jiao Tong University), Haibo Chen (Shanghai Jiao Tong University)
LitTLS: Lightweight Thread-Level Speculation on Little Cores
Xin Cheng (State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China), Jinpeng Ye (State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China), Haoyu Deng (State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China), Tingting Zhang (Loongson Technology Co. Ltd., Beijing, China), Tianyi Liu (Computer Science, The University of Texas at San Antonio, San Antonio, United States), Jian Wang (State Key Lab of Processors, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China)
MeHyper: Accelerating Hypergraph Neural Networks by Exploring Implicit Dataflows
Wenju Zhao (Huazhong University of Science and Technology), Pengcheng Yao (Huazhong University of Science and Technology), Dan Chen (National University of Singapore), Long Zheng (Huazhong University of Science and Technology), Xiaofei Liao (Huazhong University of Science and Technology), Qinggang Wang (Huazhong University of Science and Technology), Shaobo Ma (Huazhong University of Science and Technology), Yu Li (Huazhong University of Science and Technology), Haifeng Liu (Huazhong University of Science and Technology), Wenjing Xiao (Guangxi University), Yufei Sun (Huazhong University of Science and Technology), Bing Zhu (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology), Jingling Xue (University of New South Wales)
Microns: Connection Subsetting for Microservices in Shared Clusters
Jianxiong Liao (Sun Yat-sen University), Juntao Li (Sun Yat-sen University), Zhi Zhou (Sun Yat-sen University), Fei Xu (East China Normal University), Fangming Liu (Peng Cheng Laboratory, and Huazhong University of Science and Technology), Xu Chen (Sun Yat-sen University)
MILLION: MasterIng Long-Context LLM Inference Via Outlier-Immunized KV Product QuaNtization
Fangxin Liu (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University)
Multiplexing Dynamic Deep Learning Workloads with SLO-awareness in GPU Clusters
Wenyan Chen (University of Macau; Shenzhen Institutes of Advanced Technology, CAS), Chengzhi Lu (University of Macau), Huanle Xu (University of Macau), Kejiang Ye (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), ChengZhong Xu (University of Macau)
Neo: Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core
Dian Jiao (Institute of Information Engineering, Chinese Academy of Sciences), Xianglong Deng (Institute of Information Engineering, Chinese Academy of Sciences), Zhiwei Wang (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS), Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences), Yi Chen (Huazhong University of Science and Technology), Rui Hou (Institute of Information Engineering, CAS), Dan Meng (Institute of Information Engineering, CAS), Mingzhe Zhang (Ant Research)
OMeGa: Boosting Large-scale Graph Embeddings with Heterogeneous Memory Processing
Peng Fang (Huazhong University of Science and Technology), Siqiang Luo (Nanyang Technological University), Wang FangWang (Huazhong University of Science and Technology), Bolong Zheng (Huazhong University of Science and Technology), Hong Jiang (UT Arlington), Dan Feng (Huazhong University of Science and Technology), Hechang Pan (Huazhong University of Science and Technology), Xingyu Wan (Huazhong University of Science and Technology)
Overcoming the Last Mile between Log-Structured File Systems and Persistent Memory via Scatter Logging
Yifeng Zhang (Harbin Institute of Technology, Shenzhen), Yanqi Pan (Harbin Institute of Technology, Shenzhen), Hao Huang (Harbin Institute of Technology, Shenzhen), Yuchen Shan (Harbin Institute of Technology, Shenzhen), Wen Xia (Harbin Institute of Technology, Shenzhen)
PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type
Ning Yang (Shanghai Jiao Tong University), Fangxin Liu (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University)
PTPS: Precision-Aware Task Partitioning and Scheduling for SpMV on CPU-FPGA Heterogeneous Platforms
Jianhua Gao (Beijing Normal University), Zhi Zhou (Beijing Institute of Technology), Xingze Huang (Beijing Institute of Technology), Juan Wang (Beijing Institute of Technology), Yizhuo Wang (Beijing Institute of Technology), Weixing Ji (Beijing Normal University)
Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing
Chenning Tao (Zhejiang University), Liqiang Lu (Zhejiang University)
RADAR: A Skew-Resistant and Hotness-Aware Ordered Index Design for Processing-in-Memory Systems
Yifan Hua (Shanghai Jiao Tong University), Shengan Zheng (Shanghai Jiao Tong University), Weihan Kong (Shanghai Jiao Tong University), Cong Zhou (Shanghai Jiao Tong University), Kaixin Huang (Shanghai Jiao Tong University), Ruoyan Ma (Shanghai Jiao Tong University), Linpeng Huang (Shanghai Jiao Tong University)
Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling
Yujie Wang (Peking University), Shenhan Zhu (Peking University), Fangcheng Fu (Peking University), Xupeng Miao (Purdue University), Jie Zhang (Alibaba Group), Juan Zhu (Alibaba Group), Fan Hong (Alibaba Group), Yong Li (Alibaba Group), Bin Cui (Peking University)
Tropical: Enhancing SLO Attainment in Disaggregated LLM Serving via SLO-Aware Multiplexing
Jinming Ma (Shanghai Artificial Intelligence Laboratory), Jiefei Chen (Shanghai Artificial Intelligence Laboratory & Fudan University), Xiuhong Li (Peking University), Jiangfei Duan (CUHK), Haojie Duanmu (Shanghai Artificial Intelligence Laboratory & Shanghai Jiao Tong University), Xingcheng Zhang (Shanghai Artificial Intelligence Laboratory), Chao Yang (Peking University), Dahua Lin (The Chinese University of Hong Kong & SenseTime Research)
TSAJS: Efficient Multi-Server Joint Task Scheduling Scheme for Mobile Edge Computing
Chaoqun Li (Shandong University), Si Wu (Shandong University), Qike Cao (Henan Institute of Science and Technology), Jinyao Liu (Shandong University), Xiuzhen Cheng (Shandong University), Pengfei Hu (Shandong University)
TSN Cache: Exploiting Data Localities in Graph Computing Applications
Chaoyang Jia (National University of Defense Technology), Jingyu Liu (National University of Defense Technology), Shi Chen (National University of Defense Technology), Kai Lu (National University of Defense Technology), Li Shen (National University of Defense Technology)
Under the Dome: Automated Generation of eBPF Programs for Monitoring Vulnerability with AEGIS
Tianze Zhang (National University of Defense Technology), Teng Wang (National Innovation Institute of Defense Technology), Zhouyang Jia (National University of Defense Technology), Yuanliang Zhang (National University of Defense Technology), Shengbin Xu (National University of Defense Technology), Hongyan Wu (National University of Defense Technology), Xiaofan Sun (National University of Defense Technology), Lei Wang (National Innovation Institute of Defense Technology), Shanshan Li (National University of Defense Technology)