Publications
Full list: [Google Scholar] | [DBLP]
2024
- ICML: Learning Latent Space Hierarchical EBM Diffusion Models. Jiali Cui and Tian Han. In International Conference on Machine Learning (ICML), 2024.
This work studies the learning problem of the energy-based prior model and the multi-layer generator model. The multi-layer generator model, which contains multiple layers of latent variables organized in a top-down hierarchical structure, typically assumes a Gaussian prior model. Such a prior model can be limited in modelling expressivity, which results in a gap between the generator posterior and the prior model, known as the prior hole problem. Recent works have explored learning the energy-based (EBM) prior model as a second-stage, complementary model to bridge the gap. However, the EBM defined on a multi-layer latent space can be highly multi-modal, which makes sampling from such a marginal EBM prior challenging in practice, resulting in an ineffectively learned EBM. To tackle this challenge, we propose to leverage the diffusion probabilistic scheme to mitigate the burden of EBM sampling and thus facilitate EBM learning. Our extensive experiments demonstrate the superior performance of our diffusion-learned EBM prior on various challenging tasks.
- ICML: Improving Adversarial Energy-Based Model via Diffusion Process. Cong Geng, Tian Han, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Søren Hauberg, and Bo Li. In International Conference on Machine Learning (ICML), 2024.
Generative models have shown strong generation ability, while efficient likelihood estimation is less explored. Energy-based models (EBMs) define a flexible energy function to parameterize unnormalized densities efficiently but are notorious for being difficult to train. Adversarial EBMs introduce a generator to form a minimax training game that avoids the expensive MCMC sampling used in traditional EBMs, but a noticeable gap between adversarial EBMs and other strong generative models still exists. Inspired by diffusion-based models, we embed EBMs into each denoising step to split a long generation process into several smaller steps. Besides, we employ a symmetric Jeffrey divergence and introduce a variational posterior distribution for the generator’s training to address the main challenges that exist in adversarial EBMs. Our experiments show significant improvement in generation compared to existing adversarial EBMs, while also providing a useful energy function for efficient density estimation.
- ICML: Layerwise Change of Knowledge in Neural Networks. Xu Cheng, Lei Cheng, Zhaoran Peng, Yang Xu, Tian Han, and Quanshi Zhang. In International Conference on Machine Learning (ICML), 2024.
This paper aims to explain how a deep neural network (DNN) gradually extracts new knowledge and forgets noisy features through its layers during forward propagation. Although the definition of the knowledge encoded by a DNN has not yet reached a consensus, previous studies have derived a series of mathematical evidence for taking interactions as symbolic primitive inference patterns encoded by a DNN. We extend the definition of interactions and, for the first time, extract interactions encoded by intermediate layers. We quantify and track the newly emerged interactions and the forgotten interactions in each layer during forward propagation, which sheds new light on the learning behavior of DNNs. The layer-wise change of interactions also reveals the change of the generalization capacity and the instability of feature representations of a DNN.
- CVPRW: Learning Multimodal Latent Space with EBM Prior and MCMC Inference. Shiyu Yuan, Carlo Lipizzi, and Tian Han. In Generative Models in Computer Vision (GCV) Workshop at CVPR, 2024.
Multimodal generative models are crucial for various applications. We propose an approach that combines an expressive energy-based model (EBM) prior with Markov Chain Monte Carlo (MCMC) inference in the latent space for multimodal generation. The EBM prior acts as an informative guide, while MCMC inference, specifically through short-run Langevin dynamics, brings the posterior distribution closer to its true form. This method not only provides an expressive prior to better capture the complexity of multimodality but also improves the learning of shared latent variables for more coherent generation across modalities. Our proposed method is supported by empirical experiments, underscoring the effectiveness of our EBM prior with MCMC inference in enhancing cross-modal and joint generative tasks in multimodal contexts.
- WACV: Enforcing Sparsity on Latent Space for Robust and Explainable Representations. Hanao Li and Tian Han. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024.
Recently, dense latent variable models have shown promising results, but their distributed and potentially redundant codes make them less interpretable and less robust to noise. On the other hand, sparse representations are more parsimonious, providing better explainability and noise robustness, but it is difficult to enforce sparsity due to the complexity and computational cost involved. In this paper, we propose a novel unsupervised learning approach to enforce sparsity on the latent space of the generator model, utilizing a gradually sparsified spike-and-slab distribution as our prior. Our model is composed of a top-down generator network that maps the latent variables to the observations. We use maximum likelihood sampling to infer latent variables in the generator’s posterior direction, and spike-and-slab regularization in the inference stage can induce sparsity by pushing non-informative latent dimensions toward zero. Our experiments show that the learned sparse latent representations preserve the majority of the information, and our model can learn disentangled semantics, increase the explainability of the latent codes, and enhance robustness on classification and denoising tasks.
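As a rough illustration of the prior used above, a spike-and-slab density on each latent dimension can be written as a two-component Gaussian mixture whose log density is added to the inference objective. The concrete parameterization below (Gaussian spike and slab, mixture weight `rho`, and sparsification by raising `rho` or shrinking `sigma_spike`) is an assumption made for illustration, not necessarily the exact form used in the paper.

```python
import torch

def spike_slab_log_prior(z, rho=0.5, sigma_spike=0.05, sigma_slab=1.0):
    """Log density of a spike-and-slab prior per latent dimension (sketch).

    A narrow Gaussian 'spike' at zero (weight rho) mixed with a broad Gaussian
    'slab' (weight 1 - rho); raising rho or shrinking sigma_spike over training
    gradually pushes non-informative latent dimensions toward zero.
    """
    def log_gauss(x, sigma):
        return (-0.5 * (x / sigma) ** 2
                - torch.log(torch.tensor(sigma))
                - 0.5 * torch.log(torch.tensor(2.0 * torch.pi)))
    log_spike = torch.log(torch.tensor(rho)) + log_gauss(z, sigma_spike)
    log_slab = torch.log(torch.tensor(1.0 - rho)) + log_gauss(z, sigma_slab)
    # Log of the mixture density, summed over latent dimensions per example.
    return torch.logsumexp(torch.stack([log_spike, log_slab]), dim=0).sum(dim=-1)
```

In this sketch, the term would be added to the reconstruction log-likelihood while running the latent inference updates, so the regularization acts during inference as described above.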
2023
- NeurIPS: Learning Energy-based Model via Dual-MCMC Teaching. Jiali Cui and Tian Han. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
This paper studies the fundamental learning problem of the energy-based model (EBM). Learning the EBM can be achieved using maximum likelihood estimation (MLE), which typically involves Markov Chain Monte Carlo (MCMC) sampling such as Langevin dynamics. However, noise-initialized Langevin dynamics can be challenging in practice and slow to mix. This motivates the exploration of joint training with a generator model, where the generator serves as a complementary model to bypass MCMC sampling. However, such a method can be less accurate than MCMC and result in biased EBM learning. While the generator can also serve as an initializer model for better MCMC sampling, its learning can be biased since it only matches the EBM and has no access to empirical training examples. Such biased generator learning may limit the potential of learning the EBM. To address this issue, we present a joint learning framework that interweaves the maximum likelihood learning algorithms for both the EBM and the complementary generator model. In particular, the generator model is learned by MLE to match both the EBM and the empirical data distribution, making it a more informative initializer for MCMC sampling of the EBM. Learning the generator with observed examples typically requires inference of the generator posterior. To ensure accurate and efficient inference, we adopt MCMC posterior sampling and introduce a complementary inference model to initialize such latent MCMC sampling. We show that the three separate models can be seamlessly integrated into our joint framework through two (dual) MCMC teaching schemes, enabling effective and efficient EBM learning.
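A heavily simplified sketch of one half of this recipe follows: initialize the EBM's Langevin chains from the generator instead of noise, then "teach" the generator with the revised samples so it becomes a better initializer. The full dual-MCMC framework also matches the generator to the empirical data via posterior MCMC and teaches an inference model, both omitted here; all names, step counts, and losses below are illustrative assumptions.

```python
import torch

def langevin_revise(energy, x, n_steps=30, step=0.01):
    """Short-run Langevin that revises initial samples toward the EBM (sketch)."""
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - 0.5 * step ** 2 * grad + step * torch.randn_like(x)
    return x.detach()

def mcmc_teaching_step(energy, G, x_data, opt_E, opt_G, z_dim=100):
    """EBM update with generator-initialized MCMC plus one 'teaching' step
    that distils the MCMC revision back into the generator (simplified)."""
    z = torch.randn(x_data.size(0), z_dim)
    x_init = G(z)                                # generator proposes initial samples
    x_revised = langevin_revise(energy, x_init)  # MCMC revises them under the EBM
    # EBM: maximum-likelihood contrast between observed data and revised samples.
    opt_E.zero_grad()
    (energy(x_data).mean() - energy(x_revised).mean()).backward()
    opt_E.step()
    # Generator: absorb the revision so it becomes a better MCMC initializer.
    opt_G.zero_grad()
    ((G(z) - x_revised) ** 2).mean().backward()
    opt_G.step()
```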
- ICCV: Learning Hierarchical Features with Joint Latent Space Energy-Based Prior. Jiali Cui, Ying Nian Wu, and Tian Han. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
This paper studies the fundamental problem of multi-layer generator models in learning hierarchical representations. The multi-layer generator model, which consists of multiple layers of latent variables organized in a top-down architecture, tends to learn multiple levels of data abstraction. However, such multi-layer latent variables are typically parameterized to be Gaussian, which can be less informative in capturing complex abstractions, resulting in limited success in hierarchical representation learning. On the other hand, the energy-based (EBM) prior is known to be expressive in capturing data regularities, but it often lacks the hierarchical structure needed to capture different levels of representation. In this paper, we propose a joint latent space EBM prior model with multi-layer latent variables for effective hierarchical representation learning. We develop a variational joint learning scheme that seamlessly integrates an inference model for efficient inference. Our experiments demonstrate that the proposed joint EBM prior is effective and expressive in capturing hierarchical representations and modelling the data distribution.
- CVPR: Learning Joint Latent Space EBM Prior Model for Multi-layer Generator. Jiali Cui, Ying Nian Wu, and Tian Han. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
This paper studies the fundamental problem of learning multi-layer generator models. The multi-layer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distribution and hierarchical representations. However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming non-informative (conditional) Gaussian distributions, which can be limited in model expressivity. To tackle this issue and learn more expressive prior models, we propose an energy-based model (EBM) on the joint latent space over all layers of latent variables with the multi-layer generator as its backbone. Such joint latent space EBM prior model captures the intra-layer contextual relations at each layer through layer-wise energy terms, and latent variables across different layers are jointly corrected. We develop a joint training scheme via maximum likelihood estimation (MLE), which involves Markov Chain Monte Carlo (MCMC) sampling for both prior and posterior distributions of the latent variables from different layers. To ensure efficient inference and learning, we further propose a variational training scheme where an inference model is used to amortize the costly posterior MCMC sampling. Our experiments demonstrate that the learned model can be expressive in generating high-quality images and capturing hierarchical features for better outlier detection.
- UAI: Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting. Deqian Kong, Bo Pang, Tian Han, and Ying Nian Wu. In Uncertainty in Artificial Intelligence (UAI), 2023.
Generation of molecules with desired chemical and biological properties, such as high drug-likeness and high binding affinity to target proteins, is critical for drug discovery. In this paper, we propose a probabilistic generative model to capture the joint distribution of molecules and their properties. Our model assumes an energy-based model (EBM) in the latent space. Conditional on the latent vector, the molecule and its properties are modeled by a molecule generation model and a property regression model, respectively. To search for molecules with desired properties, we propose a sampling with gradual distribution shifting (SGDS) algorithm, so that after learning the model initially on the training data of existing molecules and their properties, the proposed algorithm gradually shifts the model distribution towards the region supported by molecules with the desired property values. Our experiments show that our method achieves very strong performance on various molecule design tasks.
2022
- NeurIPS: Adaptive Multi-stage Density Ratio Estimation for Learning Latent Space Energy-based Model. Zhisheng Xiao and Tian Han. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
This paper studies the fundamental problem of learning an energy-based model (EBM) in the latent space of the generator model. Learning such a prior model typically requires running costly Markov Chain Monte Carlo (MCMC) sampling. Instead, we propose to use noise contrastive estimation (NCE) to discriminatively learn the EBM through density ratio estimation between the latent prior density and the latent posterior density. However, NCE typically fails to accurately estimate such a density ratio given a large gap between the two densities. To effectively tackle this issue and learn more expressive prior models, we develop an adaptive multi-stage density ratio estimation scheme, which breaks the estimation into multiple stages and learns the density ratios of the different stages sequentially and adaptively. The latent prior model can be gradually learned using the ratio estimated in the previous stage, so that the final latent space EBM prior is naturally formed as the product of the ratios across stages. The proposed method enables an informative and much sharper prior than existing baselines and can be trained efficiently. Our experiments demonstrate strong performance in image generation and reconstruction as well as anomaly detection.
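A single stage of the density-ratio estimation described above amounts to logistic regression between prior samples and aggregate-posterior samples, whose optimal logit equals the log density ratio. The sketch below shows one such stage with illustrative function and variable names; in the multi-stage scheme the estimated ratio of each stage multiplies into the prior used by the next stage.

```python
import torch
import torch.nn.functional as F

def nce_stage_loss(log_ratio_net, z_posterior, z_prior):
    """Single-stage noise-contrastive density-ratio estimation (sketch).

    log_ratio_net: network whose output estimates log [posterior / prior] at z
    z_posterior:   latent codes inferred from data (aggregate posterior samples)
    z_prior:       samples from the current latent prior model
    """
    s_post = log_ratio_net(z_posterior).squeeze(-1)
    s_prior = log_ratio_net(z_prior).squeeze(-1)
    logits = torch.cat([s_post, s_prior], dim=0)
    # Label posterior samples 1 and prior samples 0; at the optimum the
    # classifier logit equals the log density ratio between the two densities.
    labels = torch.cat([torch.ones_like(s_post), torch.zeros_like(s_prior)], dim=0)
    return F.binary_cross_entropy_with_logits(logits, labels)
```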
- AAAI: Learning from the Tangram to Solve Mini Visual Tasks. Yizhou Zhao, Liang Qiu, Pan Lu, Feng Shi, Tian Han, and Song-Chun Zhu. In the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2022.
Current pre-training methods in computer vision focus on natural images in the daily-life context. However, abstract diagrams such as icons and symbols are common and important in the real world. This work is inspired by Tangram, a game that requires replicating an abstract pattern from seven dissected shapes. By recording human experience in solving tangram puzzles, we present the Tangram dataset and show that a pre-trained neural model on the Tangram helps solve some mini visual tasks based on low-resolution vision. Extensive experiments demonstrate that our proposed method generates intelligent solutions for aesthetic tasks such as folding clothes and evaluating room layouts. The pre-trained feature extractor can facilitate the convergence of few-shot learning tasks on human handwriting and improve the accuracy in identifying icons by their contours.
- AAAI: Context-Aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs. Chang Lu, Tian Han, and Yue Ning. In the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2022.
With the wide application of electronic health records (EHR) in healthcare facilities, health event prediction with deep learning has gained more and more attention. A common feature of EHR data used for deep-learning-based predictions is historical diagnoses. Existing work mainly regards a diagnosis as an independent disease and does not consider clinical relations among diseases in a visit. Many machine learning approaches assume disease representations are static across different visits of a patient. However, in real practice, multiple diseases that are frequently diagnosed at the same time reflect hidden patterns that are conducive to prognosis. Moreover, the development of a disease is not static, since some diseases can emerge or disappear and show various symptoms in different visits of a patient. To effectively utilize this combinational disease information and explore the dynamics of diseases, we propose a novel context-aware learning framework using transition functions on dynamic disease graphs. Specifically, we construct a global disease co-occurrence graph with multiple node properties for disease combinations. We design dynamic subgraphs for each patient’s visit to leverage global and local contexts. We further define three diagnosis roles in each visit based on the variation of node properties to model disease transition processes. Experimental results on two real-world EHR datasets show that the proposed model outperforms the state of the art in predicting health events.
2021
- EACL: Generative Text Modeling through Short Run Inference. Bo Pang, Erik Nijkamp, Tian Han, and Ying Nian Wu. In the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021.
Latent variable models for text, when trained successfully, accurately model the data distribution and capture global semantic and syntactic features of sentences. The prominent approach to training such models is the variational autoencoder (VAE). It is nevertheless challenging to train and often results in a trivial local optimum where the latent variable is ignored and its posterior collapses to the prior, an issue known as posterior collapse. Various techniques have been proposed to mitigate this issue, most of which focus on improving the inference model to yield latent codes of higher quality. The present work instead proposes a short-run dynamics for inference: it is initialized from the prior distribution of the latent variable and then runs a small number (e.g., 20) of Langevin dynamics steps guided by its posterior distribution. The major advantage of our method is that it does not require a separate inference model or assume simple geometry of the posterior distribution, thus rendering an automatic, natural, and flexible inference engine. We show that models trained with short-run dynamics model the data more accurately than strong language model and VAE baselines, and exhibit no sign of posterior collapse. Analyses of the latent space show that interpolation in the latent space generates coherent sentences with smooth transitions, and that latent features from unsupervised pretraining improve classification over strong baselines. These results together expose a well-structured latent space of our generative model.
2020
- NeurIPS: Learning Latent Space Energy-Based Prior Model. Bo Pang, Tian Han, Erik Nijkamp, Song-Chun Zhu, and Ying Nian Wu. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
We propose to learn an energy-based model (EBM) in the latent space of a generator model, so that the EBM serves as a prior model that sits on top of the top-down network of the generator model. Both the latent space EBM and the top-down network can be learned jointly by maximum likelihood, which involves short-run MCMC sampling from both the prior and posterior distributions of the latent vector. Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well. We show that the learned model exhibits strong performance in terms of image and text generation and anomaly detection.
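A minimal sketch of the learning loop described above, with short-run Langevin samplers for both the prior and the posterior of the latent vector, looks roughly as follows. The function names, step counts, Gaussian observation model with scale `sigma`, and the assumption that `f` returns a per-example scalar energy head are all illustrative choices, not the authors' released implementation.

```python
import torch

def short_run_langevin(log_density, z, n_steps=20, step=0.1):
    """Short-run Langevin dynamics initialized at z (illustrative sketch)."""
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_density(z).sum(), z)[0]
        z = z + 0.5 * step ** 2 * grad + step * torch.randn_like(z)
    return z.detach()

def joint_update(f, G, x, opt_f, opt_G, sigma=0.3, z_dim=64):
    """One joint update of the latent EBM prior f and generator G (sketch)."""
    z0 = torch.randn(x.size(0), z_dim)
    # Unnormalized log prior: energy head f(z) plus a standard Gaussian base.
    log_prior = lambda z: f(z) - 0.5 * (z ** 2).sum(dim=1)
    # Unnormalized log posterior: add the Gaussian reconstruction term.
    log_post = lambda z: log_prior(z) - ((x - G(z)) ** 2).flatten(1).sum(dim=1) / (2 * sigma ** 2)
    z_prior = short_run_langevin(log_prior, z0.clone())
    z_post = short_run_langevin(log_post, z0.clone())
    # EBM prior: raise f on posterior samples, lower it on prior samples.
    opt_f.zero_grad()
    (f(z_prior).mean() - f(z_post).mean()).backward()
    opt_f.step()
    # Generator: reconstruct x from posterior samples.
    opt_G.zero_grad()
    (((x - G(z_post)) ** 2).sum() / (2 * sigma ** 2)).backward()
    opt_G.step()
```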
- AAAI: On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models. Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, and Ying Nian Wu. In the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
This study investigates the effects of Markov chain Monte Carlo (MCMC) sampling in unsupervised maximum likelihood (ML) learning. Our attention is restricted to the family of unnormalized probability densities for which the negative log density (or energy function) is a ConvNet. We find that many of the techniques used to stabilize training in previous studies are not necessary. ML learning with a ConvNet potential requires only a few hyper-parameters and no regularization. Using this minimal framework, we identify a variety of ML learning outcomes that depend solely on the implementation of MCMC sampling. On one hand, we show that it is easy to train an energy-based model which can sample realistic images with short-run Langevin. ML can be effective and stable even when MCMC samples have much higher energy than true steady-state samples throughout training. Based on this insight, we introduce an ML method with purely noise-initialized MCMC, high-quality short-run synthesis, and the same budget as ML with informative MCMC initialization such as contrastive divergence (CD) or persistent contrastive divergence (PCD). Unlike previous models, our energy model can obtain realistic high-diversity samples from a noise signal after training. On the other hand, ConvNet potentials learned with non-convergent MCMC do not have a valid steady-state and cannot be considered approximate unnormalized densities of the training data because long-run MCMC samples differ greatly from observed images. We show that it is much harder to train a ConvNet potential to learn a steady-state over realistic images. To our knowledge, long-run MCMC samples of all previous models lose the realism of short-run samples. With correct tuning of Langevin noise, we train the first ConvNet potentials for which long-run and steady-state MCMC samples are realistic images.
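For concreteness, the "purely noise-initialized MCMC" recipe mentioned above can be sketched as follows; the uniform-noise initialization, step counts, and step sizes are placeholders rather than the tuned values from the paper.

```python
import torch

def noise_init_langevin(energy, x, n_steps=100, step=0.02):
    """Noise-initialized short-run Langevin sampling in image space (sketch)."""
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - 0.5 * step ** 2 * grad + step * torch.randn_like(x)
    return x.detach()

def ml_update(energy, x_data, opt):
    """One maximum-likelihood update of a ConvNet potential (sketch)."""
    x_init = torch.rand_like(x_data)               # chains start from pure noise
    x_model = noise_init_langevin(energy, x_init)  # short-run negative samples
    opt.zero_grad()
    # ML gradient: lower the energy of data, raise the energy of model samples.
    loss = energy(x_data).mean() - energy(x_model).mean()
    loss.backward()
    opt.step()
    return loss.item()
```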
- CVPR: Joint Training of Variational Auto-Encoder and Latent Energy-Based Model. Tian Han, Erik Nijkamp, Linqi Zhou, Bo Pang, Song-Chun Zhu, and Ying Nian Wu. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
This paper proposes a joint training method to learn both the variational auto-encoder (VAE) and the latent energy-based model (EBM). The joint training of the VAE and the latent EBM is based on an objective function consisting of three Kullback-Leibler divergences between three joint distributions on the latent vector and the image, and the objective function has an elegant symmetric and anti-symmetric divergence-triangle form that seamlessly integrates variational and adversarial learning. In this joint training scheme, the latent EBM serves as a critic of the generator model, while the generator model and the inference model in the VAE serve as the approximate synthesis sampler and inference sampler of the latent EBM. Our experiments show that the joint training greatly improves the synthesis quality of the VAE. It also enables learning of an energy function that is capable of detecting out-of-sample examples for anomaly detection.
- ECCV: Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference. Erik Nijkamp, Bo Pang, Tian Han, Linqi Zhou, Song-Chun Zhu, and Ying Nian Wu. In the 16th European Conference on Computer Vision (ECCV), 2020.
This paper studies the fundamental problem of learning deep generative models that consist of multiple layers of latent variables organized in top-down architectures. Such models have high expressivity and allow for learning hierarchical representations. Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables. The inference typically requires Markov chain Monte Carlo (MCMC), which can be time-consuming. In this paper, we propose to use noise-initialized, non-persistent, short-run MCMC, such as finite-step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short-run MCMC and the posterior distribution. Our experiments show that the proposed method outperforms the variational auto-encoder (VAE) in terms of reconstruction error and synthesis quality. The advantage of the proposed method is that it is simple and automatic, without the need to design an inference model.
2019
- CVPR: Divergence Triangle for Joint Training of Generator Model, Energy-Based Model, and Inferential Model. Tian Han, Erik Nijkamp, Xiaolin Fang, Mitch Hill, Song-Chun Zhu, and Ying Nian Wu. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
This paper proposes the divergence triangle as a framework for joint training of the generator model, energy-based model, and inference model. The divergence triangle is a compact and symmetric (anti-symmetric) objective function that seamlessly integrates variational learning, adversarial learning, the wake-sleep algorithm, and contrastive divergence in a unified probabilistic formulation. This unification makes the processes of sampling, inference, and energy evaluation readily available without the need for costly Markov chain Monte Carlo methods. Our experiments demonstrate that the divergence triangle is capable of learning (1) an energy-based model with a well-formed energy landscape, (2) direct sampling in the form of a generator network, and (3) feed-forward inference that faithfully reconstructs observed as well as synthesized data. The divergence triangle is a robust training method that can learn from incomplete data.
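Written out, the objective takes the compact three-KL form below, where Q, P, and Pi denote the joint distributions over image and latent vector defined by the data plus inference model, the generator, and the energy-based model respectively; the notation is an assumed paraphrase of the paper's formulation rather than a verbatim quotation.

```latex
\Delta(Q, P, \Pi) \;=\; \mathrm{KL}(Q \,\Vert\, P) \;+\; \mathrm{KL}(P \,\Vert\, \Pi) \;-\; \mathrm{KL}(Q \,\Vert\, \Pi)
```

Roughly speaking, the KL(Q || P) term recovers the variational (VAE-style) part of the objective, while the opposing pair of KL terms involving Pi supplies the adversarial, contrastive-divergence-like signal for the energy-based model.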
- CVPR: Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network. Xianglei Xing, Tian Han, Ruiqi Gao, Song-Chun Zhu, and Ying Nian Wu. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
We present a deformable generator model to disentangle the appearance and geometric information of images in a purely unsupervised manner. The appearance generator models the appearance-related information of an image, including color, illumination, identity, or category, while the geometric generator performs geometry-related warping, such as rotation and stretching, by generating a displacement for the coordinates of each pixel to obtain the final image. The two generators act upon independent latent factors to extract disentangled appearance and geometric information from images. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that the appearance and geometric information can be well disentangled, and the learned geometric generator can be conveniently transferred to other image datasets to facilitate knowledge-transfer tasks.
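The appearance-plus-warping composition described above can be sketched with a differentiable grid sampler. Whether the displacement field is predicted directly in normalized coordinates, and the network interfaces below, are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def deformable_generate(app_net, geo_net, z_app, z_geo):
    """Compose an appearance generator and a geometric generator (sketch).

    app_net(z_app) -> canonical image of shape (N, C, H, W)
    geo_net(z_geo) -> per-pixel displacement field of shape (N, H, W, 2)
    The displacement warps the sampling grid, so geometric transformations
    (rotation, stretching, ...) act on top of the generated appearance.
    """
    canvas = app_net(z_app)
    n, _, h, w = canvas.shape
    # Identity sampling grid with coordinates in [-1, 1] (x first, then y).
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base_grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    grid = base_grid + geo_net(z_geo)        # displace each pixel's coordinate
    return F.grid_sample(canvas, grid, align_corners=True)
```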
- WACV: Learning Generator Networks for Dynamic Patterns. Tian Han, Yang Lu, Jiawen Wu, Xianglei Xing, and Ying Nian Wu. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2019.
We address the problem of learning dynamic patterns from unlabeled video sequences, either in the form of generating new video sequences or recovering incomplete ones. This problem is challenging because the appearances and motions in video sequences can be very complex. We propose to use the alternating back-propagation algorithm to learn a generator network with a spatial-temporal convolutional architecture. The proposed method is efficient and flexible: it can not only generate realistic video sequences but also recover incomplete video sequences in the testing stage or even in the learning stage. The algorithm can be further improved by using a learned initialization, which is useful for recovery tasks. Furthermore, the proposed algorithm naturally helps to learn a shared representation between different modalities. Our experiments show that our method is competitive with existing state-of-the-art methods both qualitatively and quantitatively.
2018
- IJCAI: Replicating Active Appearance Model by Generator Network. Tian Han, Jiawen Wu, and Ying Nian Wu. In the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), 2018.
A recent Cell paper [Chang and Tsao, 2017] reports an interesting discovery. For the face stimuli generated by a pre-trained active appearance model (AAM), the responses of neurons in the areas of the primate brain responsible for face recognition exhibit a strong linear relationship with the shape and appearance variables of the AAM that generates the face stimuli. In this paper, we show that this behavior can be replicated by a deep generative model called the generator network, which assumes that the observed signals are generated by latent random variables via a top-down convolutional neural network. Specifically, we learn the generator network from the face images generated by a pre-trained AAM model using a variational auto-encoder, and we show that the inferred latent variables of the learned generator network have a strong linear relationship with the shape and appearance variables of the AAM model that generates the face images. Unlike the AAM model, which has an explicit shape model where the shape variables generate the control points or landmarks, the generator network has no such shape model or shape variables. Yet the generator network can learn the shape knowledge, in the sense that some of the latent variables of the learned generator network capture the shape variations in the face images generated by the AAM.
2017
- AAAI: Alternating Back-Propagation for Generator Network. Tian Han, Yang Lu, Song-Chun Zhu, and Ying Nian Wu. In the Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017.
This paper proposes an alternating back-propagation algorithm for learning the generator network model. The model is a non-linear generalization of factor analysis. In this model, the mapping from the continuous latent factors to the observed signal is parametrized by a convolutional neural network. The alternating back-propagation algorithm iterates the following two steps: (1) inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent; (2) learning back-propagation, which updates the parameters given the inferred latent factors by gradient descent. The gradient computations in both steps are powered by back-propagation and share most of their code. We show that the alternating back-propagation algorithm can learn realistic generator models of natural images, video sequences, and sounds. Moreover, it can also be used to learn from incomplete or indirect training data.
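As a rough illustration of the two alternating steps, the following PyTorch-style sketch runs a few Langevin updates on the latent factors and then one parameter update of the generator. All names, step sizes, and the Gaussian noise level `sigma` are illustrative assumptions rather than the paper's settings.

```python
import torch

def alternating_backprop_step(G, x, z, opt, sigma=0.3, infer_steps=20, step_size=0.1):
    """One iteration of alternating back-propagation (illustrative sketch).

    G: top-down generator network mapping latent factors z to signals x
    x: a batch of observed signals; z: current latent factors for that batch
    """
    # (1) Inferential back-propagation: Langevin / gradient updates of z
    #     under log p(z) + log p(x | z) with a Gaussian observation model.
    z = z.clone().detach().requires_grad_(True)
    for _ in range(infer_steps):
        log_joint = -0.5 * (z ** 2).sum() - ((x - G(z)) ** 2).sum() / (2 * sigma ** 2)
        grad = torch.autograd.grad(log_joint, z)[0]
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z = z.detach().requires_grad_(True)
    # (2) Learning back-propagation: update generator parameters given inferred z.
    opt.zero_grad()
    loss = ((x - G(z.detach())) ** 2).sum() / (2 * sigma ** 2)
    loss.backward()
    opt.step()
    return z.detach(), loss.item()
```

Dropping the noise term in step (1) recovers the plain gradient-descent variant of inferential back-propagation mentioned in the abstract.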