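Time: Tuesday, October 21, 2025, 15:00 - 16:00
Venue: Room A1314, Science Building, Zhongbei Campus
Speaker: Wenjia Wang, Assistant Professor, The Hong Kong University of Science and Technology (Guangzhou)
Host: Yaping Wang, Professor, East China Normal University
Abstract: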
Personalized services are fundamental to today's digital economy, and their online decision-making is often framed as a contextual bandit problem. Modern applications pose two significant challenges for this framework: high-dimensional covariates, and the need for nonparametric models that accurately capture the complex relationship between rewards and covariates. We propose a new contextual bandit algorithm based on a sparse additive reward model that addresses both challenges via (i) a double penalization method for nonparametric estimation of the reward functions, and (ii) an epoch-based structure that effectively balances exploration and exploitation. We prove that the cumulative regret of our algorithm is sublinear in the time horizon $T$ and grows linearly in $\log(d)$, the logarithm of the covariate dimension $d$. Extensive numerical experiments show that our algorithm outperforms existing algorithms in high-dimensional settings.
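To make the epoch-based explore/exploit idea concrete, here is a minimal sketch, not the speaker's algorithm. Several choices are assumptions made only for illustration: epochs double in length; each epoch starts with about $\sqrt{\text{length}}$ uniformly random (exploration) rounds and then acts greedily; and each arm's reward model is refit at the end of every epoch. In place of the double-penalized sparse additive estimator, it uses a plain linear Lasso (solved by coordinate descent) as a stand-in for a sparse fit in high dimensions. The names `lasso_cd`, `epoch_bandit`, `explore`, `lam` and `theta` are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Linear Lasso via cyclic coordinate descent (stand-in for the paper's
    double-penalized additive estimator). Minimizes
    (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0:
                continue
            # correlation of column j with the partial residual (b_j added back)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            # soft-thresholding gives the exact coordinate-wise minimizer
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * (new - b[j])
            b[j] = new
    return b

def epoch_bandit(T, d, K, theta, lam=0.05, noise=0.1, seed=0):
    """Hypothetical epoch-based contextual bandit on a simulated linear model.
    theta has shape (K, d): true (sparse) coefficients of each arm."""
    rng = np.random.default_rng(seed)
    X_hist = [[] for _ in range(K)]
    y_hist = [[] for _ in range(K)]
    beta = np.zeros((K, d))  # current per-arm estimates
    regret, t, k = 0.0, 0, 0
    while t < T:
        length = min(2 ** k, T - t)  # assumption: doubling epochs
        explore = max(1, int(np.ceil(np.sqrt(length))))  # assumption: sqrt budget
        for s in range(length):
            x = rng.standard_normal(d)
            means = theta @ x  # true expected reward of each arm
            if s < explore:
                a = int(rng.integers(K))  # explore: uniform random arm
            else:
                a = int(np.argmax(beta @ x))  # exploit: greedy on estimates
            X_hist[a].append(x)
            y_hist[a].append(means[a] + noise * rng.standard_normal())
            regret += means.max() - means[a]
            t += 1
        # refit each arm's sparse model only at the end of the epoch
        for a in range(K):
            if y_hist[a]:
                beta[a] = lasso_cd(np.array(X_hist[a]), np.array(y_hist[a]), lam)
        k += 1
    return regret, beta
```

For example, `epoch_bandit(T=300, d=10, K=2, theta=theta)` with a `theta` that has one nonzero coefficient per arm returns the cumulative regret and the fitted coefficients. Refitting only at epoch boundaries keeps estimation cost low and, in analyses of this style, makes the data used within an epoch independent of that epoch's decisions. The actual method's epoch schedule and estimator may differ.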
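Speaker Bio:
Wenjia Wang is an Assistant Professor in the Data Science and Analytics Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou). He received his Ph.D. from the Department of Industrial Engineering at the Georgia Institute of Technology in August 2018. His research interests include uncertainty quantification, stochastic simulation, machine learning, nonparametric statistics, and computer experiments. He has published dozens of papers in top journals and conferences in statistics, machine learning, and management, including the Journal of the American Statistical Association, Journal of Machine Learning Research, Management Science, Technometrics, NeurIPS, ICLR, and ICML.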