Speaker: Associate Professor Feng Yang, Columbia University
Title 1: Neyman-Pearson classification algorithms and NP receiver operating characteristics
Abstract: In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below the desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized or properly implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective, because the resulting classifiers are likely to have type I errors much larger than alpha. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular ROC curves. NP-ROC bands will help choose alpha in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
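The abstract does not spell out how the threshold is chosen. As a rough illustration only, the following Python sketch shows one order-statistic rule for thresholding the scores of any scoring-type classifier so that the population type I error exceeds alpha with probability at most delta. The tolerance delta, the function names np_threshold_rank and np_threshold, and the use of held-out class 0 scores are assumptions made for this sketch; they are not taken from the abstract and are not the API of the nproc package.

```python
from math import comb


def np_threshold_rank(n, alpha=0.05, delta=0.05):
    """Return the smallest 1-indexed rank k such that thresholding at the
    k-th smallest of n held-out class 0 scores keeps
    P(type I error > alpha) <= delta, or None if no rank qualifies.

    Assumption of this sketch: the violation probability for rank k is
    bounded by P(Binomial(n, 1 - alpha) >= k).
    """
    tail = 0.0
    best = None
    # Accumulate the binomial upper tail from k = n downward; the tail only
    # grows as k decreases, so we can stop at the first violation.
    for k in range(n, 0, -1):
        tail += comb(n, k) * (1 - alpha) ** k * alpha ** (n - k)
        if tail > delta:
            break
        best = k
    return best


def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Pick a score threshold from held-out class 0 scores.

    An observation would then be predicted as class 1 when its score is
    strictly greater than the returned threshold.
    """
    k = np_threshold_rank(len(class0_scores), alpha, delta)
    if k is None:
        raise ValueError("too few class 0 scores for this alpha and delta")
    return sorted(class0_scores)[k - 1]


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    scores = [i / 100 for i in range(100)]
    print(np_threshold_rank(len(scores)), np_threshold(scores))
```

Under this rule with alpha = delta = 0.05, at least 59 held-out class 0 scores are needed, because 0.95^58 is still above 0.05 while 0.95^59 is below it. Simply taking the empirical (1 - alpha) quantile of the scores would not give this kind of high-probability guarantee, which is the gap the abstract points to.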
Title 2: Are there any community structures in a hypergraph?
Abstract: Many complex networks in the real world can be formulated as hypergraphs, where community detection has been widely used. However, the fundamental question of whether communities exist in an observed hypergraph remains unanswered. The aim of this work is to tackle this important problem. Specifically, we systematically study when a hypergraph with community structure can be successfully distinguished from its Erdős-Rényi counterpart, and propose concrete test statistics based on hypergraph cycles when the models are distinguishable. For uniform hypergraphs, we show that the success of hypergraph testing depends strongly on the order of the average degree as well as the signal-to-noise ratio. In addition, we obtain asymptotic distributions of the proposed test statistics and analyze their power. Our results are further extended to nonuniform hypergraphs, for which a new test involving both edge and hyperedge information is proposed. The novel aspect of our new test is that it is provably more powerful than the classic test involving only edge information. Simulation and real data analysis support our theoretical findings.
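Associate Professor Feng Yang received a bachelor's degree from the Special Class for the Gifted Young at the University of Science and Technology of China and a Ph.D. from Princeton University, served as an assistant professor in the Department of Statistics at Columbia University from 2010 to 2016, and has been an associate professor in the same department since 2016. Feng Yang has published more than forty papers in journals including the top statistics journals Annals of Statistics, Journal of the American Statistical Association, and Journal of the Royal Statistical Society, Series B.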