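Time: Saturday, May 9, 2026, 16:00-17:00
Speaker: Xu Liu, Professor, Shanghai University of Finance and Economics
Venue: Room A1514, Science Building, Putuo Campus
Host: Xingjie Shi, Associate Professor, East China Normal University
Abstract: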
To address data scarcity in statistical modeling, we propose a methodology for augmenting datasets using synthetic data from pretrained generative models. We introduce a procedure for tabular data suitable for generalized linear models (GLMs), which employs a reversible transformation to an image representation. This enables the use of pretrained diffusion models for generation, followed by an inverse mapping back to the original data domain. A core component of our methodology is a principled filtering pipeline designed to select high-utility synthetic samples and mitigate negative transfer. This procedure uses a data-partitioning scheme for independent evaluation and incorporates transfer learning principles. For high-dimensional settings, it is enhanced with a p-value criterion. The framework is also adapted for image data augmentation. Empirical results from simulations and real-data applications demonstrate systematic improvements in predictive accuracy. Our findings also reveal a practical limit to these performance gains, highlighting the finite nature of the transferable information in the generative model.
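The filtering idea can be illustrated with a minimal sketch. This is not the speaker's procedure: it assumes a Gaussian GLM (ordinary least squares), a single random split of the real data into a training part and a held-out evaluation part, and a greedy rule that accepts a synthetic batch only if it lowers held-out mean squared error. That rule is one simple way to guard against negative transfer. The helper names (`fit_ols`, `mse`, `filter_synthetic_batches`) are hypothetical, and the transfer-learning and p-value components mentioned in the abstract are omitted.

```python
import numpy as np


def fit_ols(X, y):
    # Gaussian GLM with identity link: reduces to ordinary least squares with an intercept.
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta


def mse(beta, X, y):
    # Mean squared prediction error of coefficients `beta` on (X, y).
    Xd = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((Xd @ beta - y) ** 2))


def filter_synthetic_batches(X_real, y_real, batches, val_frac=0.5, seed=0):
    """Hypothetical greedy filter (an assumption, not the speaker's exact pipeline).

    The real data are split once: one part is used for fitting, the other is kept
    purely for evaluation, so synthetic batches are never judged on data they were
    fitted with. A batch is kept only if adding it lowers the held-out loss.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y_real))
    n_val = int(len(y_real) * val_frac)
    val, tr = idx[:n_val], idx[n_val:]
    X_tr, y_tr = X_real[tr], y_real[tr]
    X_val, y_val = X_real[val], y_real[val]

    # Baseline: model fitted on real training data only.
    best = mse(fit_ols(X_tr, y_tr), X_val, y_val)
    kept = []
    for i, (Xs, ys) in enumerate(batches):
        X_try = np.vstack([X_tr, Xs])
        y_try = np.concatenate([y_tr, ys])
        loss = mse(fit_ols(X_try, y_try), X_val, y_val)
        if loss < best:  # accept only batches that help on held-out real data
            best = loss
            X_tr, y_tr = X_try, y_try
            kept.append(i)
    return kept, best
```

A batch drawn from a relationship that contradicts the real data (for example, a flipped slope) raises the held-out error and is therefore rejected, while the returned loss can never exceed the real-data-only baseline.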
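Speaker Bio:
Dr. Xu Liu is a tenured professor at the School of Statistics and Management, Shanghai University of Finance and Economics. From 2011 to 2016, he conducted postdoctoral research at Northwestern University and Michigan State University in the United States. His recent research interests include generative learning, transfer learning, and high-dimensional data analysis. He has published more than 30 papers in leading international journals, including JASA, Biometrika, JoE, and JMLR. He currently serves as an associate editor of the International Journal of Organizational and Collective Intelligence (IJOCI) and the Journal of Statistical Theory and Applications (JSTA). He has led two General Program projects of the National Natural Science Foundation of China (NSFC) and participated in a subproject of an NSFC Key Program project. He received the Second Prize of the 16th Shanghai Award for Outstanding Achievements in Philosophy and Social Sciences.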