November 3 | Chen Xu: Distributed Hard Screening for Massive Data

Time: 10:00-11:00, November 3, 2023

Venue: Room A1514, Science Building, Putuo Campus

Speaker: Chen Xu (徐晨), Professor, Xi'an Jiaotong University

Host: Yukun Liu (刘玉坤), Professor, East China Normal University

Abstract:

Feature screening is a powerful tool for modeling high-dimensional data. It aims to reduce the dimensionality by removing most irrelevant features before a more elaborate analysis. When a dataset is massive in both sample size N and dimensionality p, classic screening methods become inefficient or even infeasible because of the high computational burden. In this paper, we propose a distributed screening method for the large-N-large-p setup. The new method is built upon an ADMM updating procedure for L0-constrained consensus regression, in which the data are processed in m manageable segments by multiple local computers. In the procedure, the local computers improve the screening results iteratively by communicating with each other via a global computer. The joint effects between features are also accounted for naturally in the screening process. The method thus provides a computationally viable and reliable route for screening features with big data. Under mild conditions, we show that the proposed updating procedure is convergent and leads to accurate screening even when m = o(N). Moreover, with a proper starting value, the procedure enjoys the sure screening property within a finite number of iterations. The promising performance of the method is supported by extensive numerical studies.
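For readers unfamiliar with the setup, the kind of procedure the abstract describes can be illustrated with a generic consensus ADMM for sparsity-constrained least squares. This is only a minimal sketch under stated assumptions, not the speaker's algorithm: the function names, the penalty parameter `rho`, the screening size `s`, the zero starting value, and the simulated data are all hypothetical choices made for illustration. Each of the m segments solves a ridge-type local update, a central step averages the local results and keeps the s largest coefficients, and dual variables carry the disagreement forward to the next round.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -s)[-s:]
    out[keep] = v[keep]
    return out


def distributed_l0_screening(X_parts, y_parts, s, rho=1.0, n_iter=100):
    """Generic consensus ADMM sketch (hypothetical helper, scaled dual form).

    Local step : b_k = argmin (1/(2 n_k)) ||y_k - X_k b||^2 + (rho/2) ||b - z + u_k||^2
    Global step: z   = hard_threshold(mean_k(b_k + u_k), s)
    Dual step  : u_k = u_k + b_k - z
    """
    m = len(X_parts)
    p = X_parts[0].shape[1]

    # Each local machine precomputes its own solver once; raw data never leave it.
    local = []
    for X, y in zip(X_parts, y_parts):
        n = X.shape[0]
        A_inv = np.linalg.inv(X.T @ X / n + rho * np.eye(p))
        local.append((A_inv, X.T @ y / n))

    z = np.zeros(p)        # global (consensus) coefficient vector
    B = np.zeros((m, p))   # local coefficient estimates
    U = np.zeros((m, p))   # scaled dual variables

    for _ in range(n_iter):
        for k, (A_inv, c) in enumerate(local):
            B[k] = A_inv @ (c + rho * (z - U[k]))          # local update
        z = hard_threshold((B + U).mean(axis=0), s)         # central update
        U += B - z                                          # dual update
    return np.flatnonzero(z), z


# Simulated example (assumed setup): 3 relevant features out of p = 50.
rng = np.random.default_rng(0)
N, p, m = 2000, 50, 4
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 2.5]
X = rng.standard_normal((N, p))
y = X @ beta + 0.5 * rng.standard_normal(N)

selected, z_hat = distributed_l0_screening(
    np.array_split(X, m), np.array_split(y, m), s=5
)
print("retained feature indices:", selected)
```

In this sketch only p-dimensional vectors travel between the local machines and the central one, which is what makes a large-N-large-p problem manageable in segments. Because the L0 constraint is nonconvex, the behaviour of such an iteration generally depends on `rho` and the starting value; the convergence and sure screening guarantees mentioned in the abstract belong to the speaker's method and should not be read into this illustration.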

About the speaker:

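Professor Chen Xu graduated from the Department of Statistics at the University of British Columbia, Canada, under the supervision of Jiahua Chen (陈嘉骅), an internationally renowned statistician and Fellow of the Royal Society of Canada. After graduating, Professor Xu carried out postdoctoral research at Pennsylvania State University in the United States. Professor Xu is currently a Distinguished Professor at Xi'an Jiaotong University and a tenured Associate Professor at the University of Ottawa, Canada. Professor Xu has long worked on the foundational theory and methods of statistical machine learning for big data, with systematic and original contributions in areas including feature screening and dimension reduction for big data, resampling theory and methods, and distributed statistical analysis. Professor Xu has published more than 40 research papers in leading international journals, including the top statistics journal Journal of the American Statistical Association, the top machine learning journals Journal of Machine Learning Research and IEEE Transactions on Pattern Analysis & Machine Intelligence, and the top multidisciplinary journal National Science Review. Professor Xu has led a Canadian natural sciences Discovery Grant and a project under China's National Key R&D Program, and has participated in a Major Program project of the National Natural Science Foundation of China and a major research project of Peng Cheng Laboratory. Honors include the Best Student Paper Award of the Statistical Society of Canada (2010), the Outstanding Postdoctoral Supervisor Award of the Canadian Statistical Sciences Institute (2021), and first place in the inaugural Guangdong-Hong Kong-Macao Greater Bay Area International Algorithm Case Competition (2022). Professor Xu currently serves as an associate editor of the leading statistics journals JASA and EJS, and has previously served as an editorial board member or guest editor for international journals including CJS, Neurocomputing, and Survey Sampling.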


Posted by: Ying Zhang (张瑛) | Posted on: 2023-10-20 | Views: 310