Xiaolin Hu's homepage


Publications

Preprint

Jianjin Xu, Zheyang Xiong, Xiaolin Hu, “Frame difference-based temporal loss for video stylization,” arXiv:2102.05822, 2021. A simple loss that does not need the time-consuming estimation of optical flow. Source codes
Xiao Li, Jianmin Li, Ting Dai, Jie Shi, Jun Zhu, Xiaolin Hu, “Rethinking Natural Adversarial Examples for Classification Models,” arXiv:2102.1173, Feb 2021. How should we define the natural adversarial examples? We propose the ImageNet-A-Plus dataset, which is modified from ImageNet-A.
Xiao Li, Zhuhong Li, Qiongxiu Li, Bingze Lee, Jinghao Cui, Xiaolin Hu, “Faster-GCG: efficient discrete optimization jailbreak attacks against aligned large language models,” arXiv:2410.15362. A faster and more effective version of the GCG attack proposed by Zou et al. (2023).
Xiaopei Zhu, Siyuan Huang, Zhanhao Hu, Jianmin Li, Jun Zhu, Xiaolin Hu, “Physical adversarial examples for person detectors in thermal images based on 3D modeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence. DOI 10.1109/TPAMI.2025.3582334 Another “invisibility cloak” for infrared person detectors.

2025

Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu, “PBCAT: patch-based composite adversarial training against physically realizable attacks on object detection,” Proc. of the International Conference on Computer Vision (ICCV), Honolulu, Hawaii, Oct 19th-23rd, 2025. This is an efficient method for adversarial training of object detectors. Source codes
Xiao Li, Hang Chen, Xiaolin Hu, “On the Importance of Backbone to the Adversarial Robustness of Object Detectors,” IEEE Transactions on Information Forensics and Security, vol. 20, pp. 2387-2398. We found that adversarially pre-trained backbone networks were essential for enhancing the adversarial robustness of object detectors. Source codes
Han Liu, Peng Cui, Bingning Wang, Weipeng Chen, Yupeng Zhang, Jun Zhu, Xiaolin Hu, “Improving accuracy and calibration via differentiated deep mutual learning,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville TN, June 11th-15th, 2025. Modern DNNs tend to exhibit overconfidence, especially on ambiguous samples. The paper presents a new method to address this. Source codes
Tong Wang, Ting Liu, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Xiaolin Hu, “GlyphMastero: a glyph encoder for high-fidelity scene text editing,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville TN, June 11th-15th, 2025. An interesting scene text editing method. It can change text in scenes while keeping the style of the original text.
Jun Huang, Ting Liu, Yihang Wu, Xiaochao Qu, Luoqi Liu, Xiaolin Hu, “MTADiffusion: mask text alignment diffusion model for object inpainting,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville TN, June 11th-15th, 2025. In addition to a new object inpainting method, we present a new dataset comprising 5 million images and 25 million mask-text pairs.
Chongkai Yu, Ting Liu, Li Anqi, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Xiaolin Hu, “SAM-REF: introducing image-prompt synergy during interaction for detail enhancement in the segment anything model,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville TN, June 11th-15th, 2025. We improve the Segment Anything Model (SAM).
Gang Zhang, Ziyi Li, Chufeng Tang, Jianmin Li, Xiaolin Hu, “CEDNet: A cascade encoder-decoder network for dense prediction,” Pattern Recognition, vol. 128, Feb. 2025. Previous neural networks for object detection and image segmentation are usually built upon backbones such as ResNet that were originally designed for classification, which does not need high-resolution features. This is not a good strategy. Different backbones for dense prediction are needed. arXiv version Source codes
Kai Li, Wendi Sang, Chang Zeng, Runxuan Yang, Guo Chen, Xiaolin Hu, “SonicSim: a customizable simulation platform for speech processing in moving sound source scenarios,” Proc. of the 13th International Conference on Learning Representations (ICLR), Singapore, Apr 24th-28th, 2025.
Mohan Xu#, Kai Li#, Guo Chen, Xiaolin Hu, “TIGER: time-frequency interleaved gain extraction and reconstruction for efficient speech separation”, Proc. of the 13th International Conference on Learning Representations (ICLR), Singapore, Apr 24th-28th, 2025. The model has only 0.8M parameters!
Hang Chen, Chufeng Tang, Xiao Li, Xiaolin Hu, “Efficient neuron segmentation in electron microscopy by affinity-guided queries,” Proc. of the 13th International Conference on Learning Representations (ICLR), Singapore, Apr 24th-28th, 2025. A query-based method for EM image segmentation.
Xiao Li#, Wenxuan Sun#, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, Xiaolin Hu, “ADBM: adversarial diffusion bridge model for reliable adversarial purification,” Proc. of the 13th International Conference on Learning Representations (ICLR), Singapore, Apr 24th-28th, 2025.

2024

Zhi Cheng, Zhanhao Hu, Yuqiu Liu, Hang Su, Xiaolin Hu, “Full-distance evasion of pedestrian detectors in the physical world,” Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Dec 10-15, 2024. Previous physical adversarial attacks can only function at short distances. We propose a method to remedy this. Key techniques include simulating atmospheric perspective and the camera imaging process.
Hang Chen, Chufeng Tang, Xiaolin Hu, “DHS-DETR: Efficient DETRs with dynamic head switching,” Computer Vision and Image Understanding, vol. 248, article 104106, Nov 2024.
Ziqin Wang, Jiawei Gao, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang, “CooHOI: learning cooperative human-object interaction with manipulated object dynamics,” Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Dec 10-15, 2024. (Spotlight)
Haoran He, Peilin Wu, Chenjia Bai, Hang Lai, Lingxiao Wang, Ling Pan, Xiaolin Hu, Weinan Zhang, “Bridging the sim-to-real gap from the information bottleneck perspective,” Conference on Robot Learning. (Oral)
Jinchao Liu, Margarita Osadchy, Yan Wang, Yingying Wu, Enyi Li, Xiaolin Hu, and Yongchun Fang, “Vibrational spectroscopy can be vulnerable to adversarial attacks,” Analytical Chemistry, vol. 96, pp. 16570-16580, Oct. 2024. We show that vibrational spectroscopy (Raman and infrared), which is widely used in security applications such as airport security checks, can be vulnerable to adversarial attacks. This poses serious security threats to society.
Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu, “An audio-visual speech separation model inspired by cortico-thalamo-cortical circuits,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 10, pp. 6637-6651, Oct 2024. A brain-inspired model for audio-visual speech separation. The state-of-the-art model on this task. Supplementary video and source codes arXiv version
Xiaopei Zhu, Peiyang Xu, Guanning Zeng, Yingpeng Dong, Xiaolin Hu, “Natural language induced adversarial images,” ACM Multimedia, Melbourne, Australia, Oct 28-Nov 1, 2024.
Xianghao Kong, Jinyu Chen, Wenguan Wang, Hang Su, Xiaolin Hu, Yi Yang, Si Liu, “Controllable navigation instruction generation with chain of thought prompting,” The 18th European Conference on Computer Vision (ECCV), MiCo Milano, Italy, Sep 29th-Oct 4th, 2024.
Xiao Li, Yining Liu, Na Dong, Sitian Qin, Xiaolin Hu, “PartImageNet++ dataset: scaling up part-based models for robust recognition,” The 18th European Conference on Computer Vision (ECCV), MiCo Milano, Italy, Sep 29th-Oct 4th, 2024. We propose a new dataset called PartImageNet++, providing high-quality part segmentation annotations for all categories of ImageNet-1K. Dataset and source codes
Kai Li, Runxuan Yang, Fuchun Sun, Xiaolin Hu, “IIANet: an intra- and inter-modality attention network for audio-visual speech separation,” The 41st International Conference on Machine Learning (ICML), Vienna, Austria, July 21-27, 2024. Inspired by the cross-modal processing mechanism in the brain, we design intra- and inter-attention modules to integrate auditory and visual information for efficient speech separation. The model simulates audio-visual fusion at different levels of sensory cortical areas as well as in higher association areas such as the parietal cortex. Demo and source codes
Xiao Li, Qiongxiu Li, Zhanhao Hu, and Xiaolin Hu, “On the privacy effect of data enhancement via the lens of memorization,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 4686-4699, 2024. We investigated several nonintuitive and seemingly contradictory conclusions about privacy, data augmentation and adversarial robustness. Source codes
Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu, “SAFDNet: A simple and effective network for fully sparse 3D object detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 17-21, 2024. (Oral, 90 out of about 11500) Source codes
Xiaopei Zhu, Yuqiu Liu, Zhanhao Hu, Jianmin Li, Xiaolin Hu, “Infrared adversarial car stickers”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 17-21, 2024. We hide real cars against infrared car detectors.
Xiao Li, Wei Zhang, Yining Liu, Zhanhao Hu, Bo Zhang, Xiaolin Hu, “Language-driven anchors for zero-shot adversarial robustness,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 17-21, 2024. Source codes
Xiaopei Zhu, Xiao Li, Jianmin Li, Zheyao Wang, Xiaolin Hu, “Hiding from thermal imaging pedestrian detectors in the physical world,” Neurocomputing, vol. 564, article 126963, 2024. Extension of our AAAI 2021 paper (which uses small bulbs).
Samuel Pegg, Kai Li, Xiaolin Hu, “RTFS-Net: recurrent time-frequency modelling for efficient audio-visual speech separation,” Proc. of the 12th International Conference on Learning Representations (ICLR), Vienna, Austria, May 7-11, 2024. The first time-frequency domain audio-visual speech separation method that outperforms all contemporary time-domain counterparts. It uses only 1/100 parameters of VisualVoice, one of the previous SOTA methods. Source codes
Zhongfu Shen, Jiajun Yang, Qiangqiang Zhang, Kuiyu Wang, Xiaohui Lv, Xiaolin Hu, Jian Ma, Song-Hai Shi, “How variable progenitor clones construct a largely invariant neocortex,” National Science Review, vol. 11, no. 1, January 2024, nwad247.

2023

Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu, Zheyao Wang, “Hiding from infrared detectors in real world with adversarial clothes,” Applied Intelligence, vol. 53, pp. 29537-29555, 2023. An infrared adversarial attack method based on carbon fiber heaters. A physical attack.
Jiawei Shan, Gang Zhang, Chufeng Tang, Hujie Pan, Qiankun Yu, Guanhao Wu, and Xiaolin Hu, “Focal distillation from high-resolution data to low-resolution data for 3D object detection,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 12, pp. 14064-14075, 2023. A method to utilize 64-channel LiDAR data to train an object detector that works on 16-channel LiDAR data.
Samuel Pegg, Kai Li, Xiaolin Hu, “TDFNet: an efficient audio-visual speech separation model with top-down fusion,” Proceedings of the 13th International Conference on Information Science and Technology (ICIST), Cairo, Egypt, December 8-14, 2023. We combine our TDANet and CTCNet for efficient audio-visual speech separation. Source codes
Runxuan Yang, Yuyang Peng and Xiaolin Hu, “A fast high-fidelity source-filter vocoder with lightweight neural modules,” IEEE Transactions on Audio, Speech and Language Processing, vol. 31, pp. 3362-3373, 2023. Singing voice synthesis. Demo and source codes
Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Xiaolin Hu, “HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection in Point Clouds,” Advances in Neural Information Processing Systems (NeurIPS), New Orleans, Dec 10-16, 2023. We use encoder-decoder blocks to capture long-range dependencies among features in the spatial space. HEDNet was 50% faster than DSVT. Source codes
Xinyi Li, Yanan Zhong, Hang Chen, Jianshi Tang, Xiaojian Zheng, Wen Sun, Yang Li, Dong Wu, Bin Gao, Xiaolin Hu, He Qian, Huaqiang Wu, “Memristors-based dendritic neuron for high-efficiency spatial-temporal information processing,” Advanced Materials, vol. 35, 2203684, 2023. A brain-inspired device based on memristors for simulating the dynamic behavior of dendrites of biological neurons. Its power consumption is 1000X lower than that of a GPU running the same network.
Jianjin Xu, Zhaoxiang Zhang, Xiaolin Hu, “Extracting semantic knowledge from GANs with unsupervised learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 8, pp. 9654-9668, 2023. This method has an interesting application: you can change the segmentation mask to generate desired images. Demo and source codes
Xiao Li, Ziqi Wang, Bo Zhang, Fuchun Sun, Xiaolin Hu, “Recognizing object by components with human prior knowledge enhances adversarial robustness of deep neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 7, pp. 8861-8873, July 2023. arXiv version This method is inspired by a well-known theory in cognitive psychology – recognition-by-components.
Hector Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann, “Audio-visual speech separation in noisy environments with a lightweight iterative model,” Proceedings of the INTERSPEECH, Dublin, Ireland, August 20-24, 2023. Demo Source codes
Chufeng Tang, Lingxi Xie, Xiaopeng Zhang, Xiaolin Hu, Qi Tian, “Visual recognition by request,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June 18-22, 2023. Source codes
Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, Xiaolin Hu, “Physically realizable natural-looking clothing textures evade person detectors via 3D modeling,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June 18-22, 2023. If you wear our designed camouflage clothing, the AI behind cameras may not detect you. (^_^) Demo
Kai Li, Runxuan Yang, Xiaolin Hu, “An efficient encoder-decoder architecture with top-down attention for speech separation,” Proc. of the 11th International Conference on Learning Representations (ICLR), Kigali, Rwanda, May 1-5, 2023. Top-down neural projections are ubiquitous in the brain. We found that this kind of projection is very useful for solving the Cocktail Party Problem. Speech separation demo and music separation demo Source codes

2022

Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen, “Adjacency constraint for efficient hierarchical reinforcement learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4152-4166, 2022. arXiv version Extended version of our previous NeurIPS 2020 paper.
Hang Chen, Chufeng Tang, Xiaolin Hu. "Dense contrastive loss for instance segmentation." Proc. of the British Machine Vision Conference (BMVC), London, UK, Nov. 21-24, 2022. Source codes
Kai Li, Xiaolin Hu, Yi Luo, “On the use of deep mask estimation module for neural source separation systems,” Proceedings of the InterSpeech, Incheon, Korea, Sept. 18-22, 2022.
Haoran Chen, Jianmin Li, Simone Frintrop, and Xiaolin Hu, “The MSR-Video to Text dataset with clean annotations,” Computer Vision and Image Understanding, vol. 225, article no. 103581, 2022. arXiv version After cleaning the annotations, the performance of existing models increases. The cleaned dataset will be made available on request. Source codes
Ting-Yu Kuo, Yuanda Liao, Kai Li, Bo Hong, Xiaolin Hu, “Inferring mechanisms of auditory attentional modulation with deep neural networks,” Neural Computation, vol. 34, no. 11, pp. 2205-2231, 2022. With the help of DNNs, we suggest that the projection of top-down attention signals to lower stages within the auditory pathway of the human brain plays a more significant role than the higher stages in solving the "cocktail party problem". Source codes
Shangqi Guo, Qi Yan, Xin Su, Xiaolin Hu, Feng Chen, “State-temporal compression in reinforcement learning with the reward-restricted geodesic metric,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5572-5589, 2022.
Xiaolin Hu, Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, “Improving image segmentation with boundary patch refinement,” International Journal of Computer Vision, vol. 130, pp. 2571-2589, 2022. A simple yet effective post-processing method to refine the results of image segmentation (semantic segmentation, instance segmentation and panoptic segmentation) models. Extension of our CVPR 2021 work. Source codes
Chufeng Tang, Lingxi Xie, Gang Zhang, Xiaopeng Zhang, Qi Tian, Xiaolin Hu, “Active pointly-supervised instance segmentation,” Proc. of European Conference on Computer Vision (ECCV), Tel-Aviv, Israel, Oct. 23-27, 2022. arXiv version We present an economic active learning setting, APIS, for instance segmentation, which saves annotation cost dramatically. Source codes
Zhanhao Hu, Jun Zhu, Bo Zhang, Xiaolin Hu, “Amplification trojan network: attack deep neural networks by amplifying their inherent weakness,” Neurocomputing, vol. 505, pp. 142-153, 2022. A new trojan network for attacking DNNs. Source codes
Jianfeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou, “NP-match: when neural processes meet semi-supervised learning,” Proc. of the 39th International Conference on Machine Learning (ICML), Baltimore, Maryland, USA, July 17-23, 2022.
Jianfeng Wang, Xiaolin Hu, “Convolutional neural networks with gated recurrent connections,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3421-3425, 2022. Extension of a previous work. We demonstrate the good performance of the Gated RCNN on image classification and object detection. Source codes
Zhanhao Hu, Siyuan Huang, Xiaopei Zhu, Fuchun Sun, Bo Zhang, Xiaolin Hu, “Adversarial texture for fooling person detectors in the physical world”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, June 19-24, 2022. (Oral) Supplementary materials including supplementary video arXiv version Source codes This paper tells you how to make an “invisibility cloak”!
Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu, “Infrared invisible clothing: hiding from infrared detectors at multiple angles in real world”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, June 19-24, 2022. (Oral) Supplementary materials including supplementary video 1 and supplementary video 2 arXiv version This paper tells you how to make an “invisibility cloak” for infrared cameras!
Xiaolin Hu, Zhigang Zeng, “Bridging the functional and wiring properties of V1 neurons through sparse coding,” Neural Computation, vol. 34, no. 1, pp. 104-137, 2022. A standard excitatory-inhibitory neural network shows numerous functional and wiring properties of neurons in layer 2/3 of V1 after unsupervised learning on natural images. Many properties are predictions yet to be verified in biological experiments. One interesting property is the small-worldness. Source codes

2021

Xiaolin Hu, Kai Li, Weiyi Zhang, Yi Luo, Jean-Marie Lemercier, Timo Gerkmann, “Speech separation using an asynchronous fully recurrent convolutional neural network,” Advances in Neural Information Processing Systems (NeurIPS), Virtual, Dec 6-14, 2021. A brain-inspired model for speech separation. Demo and Source codes
Hang Chen, Xiao Li, Zefan Wang, Xiaolin Hu, “Robust logo detection in E-commerce images by data augmentation,” Proc. of the 29th ACM International Conference on Multimedia Workshop, pp. 4789-4793, Chengdu, China, Oct 20-24, 2021. Ranked 5/36489 in ACM MM2021 Robust Logo Detection Grand Challenge. Source codes
Jiaheng Liu, Yudong Wu, Yichao Wu, Chuming Li, Xiaolin Hu, Ding Liang, Mengyu Wang, “DAM: Discrepancy Alignment Metric for Face Recognition,” Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3814-3823, Virtual, Oct 11-17, 2021.
Ge Gao, Mikko Lauri, Xiaolin Hu, Jianwei Zhang, Simone Frintrop, “CloudAAE: learning 6D object pose regression with on-line data synthesis on point clouds,” Proc. of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, May 30-June 5, 2021. arXiv version Source codes
Gang Zhang, Xin Lu, Jingru Tan, Jianmin Li, Zhaoxiang Zhang, Quanquan Li, Xiaolin Hu, “RefineMask: Towards high-quality instance segmentation with fine-grained features,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 19-25, 2021. arXiv version A coarse-to-fine strategy. Source codes
Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, Xiaolin Hu, “Look closer to segment better: boundary patch refinement for instance segmentation,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 19-25, 2021. arXiv version A post-processing model applicable to any instance segmentation method. We ranked 1st on the Cityscapes leaderboard as of the CVPR 2021 submission deadline. Source codes
Jianfeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu, “RSG: A simple yet effective module for learning imbalanced datasets,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 19-25, 2021. Source codes
Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang, “Generalized Focal Loss V2: learning reliable localization quality estimation for dense object detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 19-25, 2021. arXiv version An extension of the GFL in our NeurIPS 2020 paper. Source codes
Weiyi Zhang, Shuning Zhao, Le Liu, Jianmin Li, Xingliang Cheng, Thomas Fang Zheng, Xiaolin Hu, “Attack on practical speaker verification system using universal adversarial perturbations,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, June 6-11, 2021. A physical attack on speaker verification systems. Source codes
Xiaopei Zhu, Xiao Li, Jianmin Li, Zheyao Wang, Xiaolin Hu, “Fooling thermal infrared pedestrian detectors in real world using small bulbs,” The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), Virtual, Feb 2-9, 2021. Supplementary document and Supplementary video arXiv version If you hold a cardboard embedded with small bulbs designed by us, you would not be detected by YOLOv3.
Han Liu, Shifeng Zhang, Ke Lin, Jing Wen, Jianmin Li, Xiaolin Hu, “Vocabulary-wide credit assignment for training image captioning models,” IEEE Transactions on Image Processing, vol. 30, pp. 2450-2460, 2021. At each generation step, we assign a reward to every word in the vocabulary. Source codes
Zi Yin, Valentin Yiu, Xiaolin Hu, Liang Tang, “End-to-end face parsing via interlinked convolutional neural networks,” Cognitive Neurodynamics, vol. 15, pp. 169-179, 2021. Extension of a previous work for face parsing. Source codes

2020

Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen, “Generating adjacency-constrained subgoals in hierarchical reinforcement learning,” Advances in Neural Information Processing Systems (NeurIPS), Dec 6-12, 2020. (Spotlight) A method for reducing the high-level action space for hierarchical reinforcement learning. Supplementary Materials Source codes
Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang, “Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection,” Advances in Neural Information Processing Systems (NeurIPS), Dec 6-12, 2020. We propose a joint representation of localization quality and classification for object detection methods. Source codes
Haoran Chen, Ke Lin, Alexander Maye, Jianmin Li, and Xiaolin Hu, “A semantics-assisted video captioning model trained with scheduled sampling,” Frontiers in Robotics and AI, September 30, 2020. Source codes
Weilun Chen, Zhaoxiang Zhang, Xiaolin Hu, Baoyuan Wu, “Boosting decision-based black-box adversarial attacks with random sign flip,” European Conference on Computer Vision, pp. 276-293. Springer, Cham, 2020.
Jian Wu, Xiaoguang Liu, Xiaolin Hu, Jun Zhu, “PopMNet: generating structured pop music melodies using neural networks,” Artificial Intelligence, vol. 286, article 103303, 2020. Generate the structure of a song first, then generate the melody. Project page Source codes
Yulong Wang, Hang Su, Bo Zhang, Xiaolin Hu, “Learning reliable visual saliency for model explanations,” IEEE Transactions on Multimedia, vol. 22, no. 7, pp. 1796-1807, 2020. When you input an image of a dog into a deep neural network and use existing methods to highlight the region of the dog by setting the output label to "dog", the result looks fine. But if you set the output label to "cat", you will find some weird results.
Yulong Wang, Hang Su, Bo Zhang, Xiaolin Hu, “Interpret neural networks by extracting critical subnetworks,” IEEE Transactions on Image Processing, vol. 29, pp. 6707-6720, 2020. Extension of (Wang et al. CVPR 2018). We extend the idea of critical routes for individual image samples to image categories.
Jian Wu, Changran Hu, Yulong Wang, Xiaolin Hu, Jun Zhu, “A hierarchical recurrent neural network for symbolic melody generation,” IEEE Transactions on Cybernetics, vol. 50, no. 6, pp. 2749-2757, 2020. arXiv:1712.05274 Automatic melody generation. All melodies used in experiments are available.
Jianqiao Guo, Yajun Yin, Xiaolin Hu, Gexue Ren, “Self-similar network model for fractional-order neuronal spiking: implications of dendritic spine functions,” Nonlinear Dynamics, vol. 100, pp. 921-935, 2020.
Haoran Chen, Jianmin Li, and Xiaolin Hu, “Delving deeper into the decoder for video captioning,” The 24th European Conference on Artificial Intelligence (ECAI), Santiago de Compostela, Spain, August 29-September 2, 2020. With a few techniques we boost the state-of-the-art results on video captioning benchmark datasets. Source codes
Ge Gao, Mikko Lauri, Yulong Wang, Xiaolin Hu, Jianwei Zhang, Simone Frintrop, “6D object pose regression via supervised learning on point clouds,” IEEE International Conference on Robotics and Automation (ICRA), Paris, France, May 31 to June 4, 2020. Source codes
Qiushan Guo, Xinjiang Wang, Yichao Wu, Zhipeng Yu, Ding Liang, Xiaolin Hu and Ping Luo, “Online knowledge distillation via collaborative learning,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 16-18, 2020.
Yudong Wu, Yichao Wu, Ruihao Gong, Yuanhao Lv, Ken Chen, Ding Liang, Xiaolin Hu, Xianglong Liu and Junjie Yan, “Rotation consistent margin loss for efficient low-bit face recognition”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 16-18, 2020.
Yulong Wang, Xiaolu Zhang, Xiaolin Hu, Bo Zhang, Hang Su, “Dynamic network pruning with interpretable layerwise channel selection,” The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, USA, Feb 7-12, 2020. Source codes
Yulong Wang, Xiaolu Zhang, Lingxi Xie, Jun Zhou, Hang Su, Bo Zhang, Xiaolin Hu, “Pruning from scratch,” The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, USA, Feb 7-12, 2020. Supplementary material arXiv:1909.12579v1 We find that pre-training an over-parameterized model is not necessary for obtaining the target pruned structure. One can prune the model with its random initial weights. Source codes
Xiang Li, Jun Li, Xiaolin Hu, Jian Yang, “Line-CNN: end-to-end traffic line detection with line proposal unit,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 1, pp. 248-258, 2020. An end-to-end model to detect traffic lines at a speed of 30 f/s on a Titan X GPU. It's potentially useful for autonomous driving systems.

2019

Fangzhou Liao, Ming Liang, Zhe Li, Xiaolin Hu, Sen Song, “Evaluate the malignancy of pulmonary nodules using the 3-D deep leaky noisy-or network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3484-3495, 2019. arXiv:1711.08324 The winning solution to the Kaggle Data Science Bowl 2017. A 500,000 US dollar solution! Source codes
Chufeng Tang, Lu Sheng, Zhaoxiang Zhang, Xiaolin Hu, “Improving pedestrian attribute recognition with weakly-supervised multi-scale attribute-specific localization,” Proc. of IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, Oct 27–Nov 2, 2019. pp. 4997-5006. Supplementary Materials Source codes
Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu, “Knowledge distillation via route constrained optimization,” Proc. of IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, Oct 27–Nov 2, 2019. pp. 1345-1354. (Oral) A new knowledge distillation method for training a small neural network.
Xiang Li, Shuo Chen, Xiaolin Hu, Jian Yang, “Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, June 15–21, 2019. This paper explains why the combination of Dropout and Batch Normalization (BN) often leads to worse performance in many modern neural networks.
Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang, “Selective Kernel Networks,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, June 15–21, 2019. A neural network that performs better than ResNet, ResNeXt, SENet etc. for image classification. Source codes
Niange Yu, Xiaolin Hu, Binheng Song, Jian Yang, Jianwei Zhang, “Topic-oriented image captioning based on order-embedding,” IEEE Transactions on Image Processing, vol. 28, no. 6, pp. 2743-2754, 2019. Generate captions for images from different perspectives. Source codes
Shangqi Guo, Zhaofei Yu, Fei Deng, Xiaolin Hu, Feng Chen, “Hierarchical Bayesian inference and learning in spiking neural networks,” IEEE Transactions on Cybernetics, vol. 49, no. 1, pp. 133-145, 2019. Spiking neural networks for Bayesian inference.
Fangzhou Liao, Xi Chen, Xiaolin Hu, Sen Song, “Estimation of the volume of the left ventricle from MRI images using deep neural networks,” IEEE Transactions on Cybernetics, vol. 49, no. 2, pp. 495-504, 2019. This algorithm got the 4th place in the Kaggle Data Science Bowl 2016. Source codes
Qingtian Zhang, Xiaolin Hu, Bo Hong, Bo Zhang, “A hierarchical sparse coding model predicts acoustic feature encoding in both auditory midbrain and cortex,” PLOS Computational Biology, 15(2): e1006766, 2019. We used a hierarchical sparse coding model to reveal acoustic feature encoding mechanism in the auditory system. For example, interestingly, the artificial neurons in top layers exhibited phonetic feature encoding property. We found an important role of response sparseness for these properties to emerge. Source codes
Wei Feng, Wentao Liu, Tong Li, Jing Peng, Chen Qian, Xiaolin Hu, “Turbo learning framework for human-object interactions recognition and human pose estimation,” The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, Hawaii, USA, Jan 27-Feb 1, 2019. Learn two tasks simultaneously, which help each other iteratively.

2018

Yi Zhang, Weichao Qiu, Qi Chen, Xiaolin Hu, Alan Yuille, “UnrealStereo: controlling hazardous factors to analyze stereo vision”, Proc. of the International Conference on 3D Vision, Verona, Italy, September 5-8, 2018. A synthetic image generation tool that enables control of hazardous factors, such as making objects more specular or transparent, for developing 3D vision algorithms.
Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, Jun Zhu, “Defense against adversarial attacks using high-level representation guided denoiser,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018. Winning solution of the NIPS 2017 Competition on Adversarial Attacks and Defenses organized by Google Brain. Source codes1 Source codes2
Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, Jianguo Li, “Boosting adversarial attacks with momentum,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018. (Spotlight) Winning solution of the NIPS 2017 Competition on Adversarial Attacks and Defenses organized by Google Brain. Source codes for non-targeted attack Source codes for targeted attack
Yulong Wang, Hang Su, Bo Zhang, Xiaolin Hu, “Interpret neural networks by identifying critical data routing paths,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018. We found that images with similar semantic meaning have similar critical routes in deep CNNs. Source codes
Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, Xiaolin Hu, “High Performance Visual Tracking with Siamese Region Proposal Network,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018.
Wentao Liu, Jie Chen, Cheng Li, Chen Qian, Xiao Chu, Xiaolin Hu, “A cascaded inception of inception network with attention modulated feature fusion for human pose estimation,” The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), New Orleans, USA, Feb 2-7, 2018. Erratum Three techniques for human pose estimation: 1. inception of inception block, 2. attention to individual levels, 3. cascaded network.

2017

Chengxu Zhuang, Yulong Wang, Daniel Yamins, Xiaolin Hu, “Deep learning predicts correlation between a functional signature of higher visual areas and sparse firing of neurons,” Frontiers in Computational Neuroscience, 2017. doi: 10.3389/fncom.2017.00100 Study the visual system using deep learning models. Dataset used in the paper
Jianfeng Wang, Xiaolin Hu, “Gated recurrent convolution neural network for OCR,” Advances in Neural Information Processing Systems (NIPS), Long Beach, USA, Dec. 4-9, 2017. A modified version of our RCNN proposed in 2015. Source codes
Zekun Hao, Yu Liu, Hongwei Qin, Junjie Yan, Xiu Li, Xiaolin Hu, “Scale-aware face detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, July 21–26, 2017. Prior to face detection, use a CNN to predict the scale distribution of the faces.
Tiancheng Sun, Yulong Wang, Jian Yang, Xiaolin Hu, “Convolution neural networks with two pathways for image style recognition,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4102-4113, 2017. The Gram matrix technique proposed by Gatys et al. is used to classify image styles. Experiments are conducted on three benchmark datasets: WikiPaintings, Flickr Style and AVA Style. Source codes
J. Wu, L. Ma, X. Hu, “Delving deeper into convolutional neural networks for camera relocalization,” Proc. of IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 29-June 3, 2017. We present three techniques for enhancing the performance of convolutional neural networks for camera relocalization.
F. Liao, X. Hu, S. Song, “Emergence of V1 recurrent connectivity pattern in artificial neural network,” Computational and Systems Neuroscience (Cosyne), Salt Lake City, Feb. 23-26, 2017.
Y. Zhao, X. Jin, X. Hu, “Recurrent convolutional neural network for speech processing,” Proc. of the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, March 5-9, 2017. Applications of recurrent CNN to speech processing. Source codes

2016

Q. Zhang, X. Hu, H. Luo, J. Li, X. Zhang, B. Zhang, “Deciphering phonemes from syllables in blood oxygenation level-dependent signals in human superior temporal gyrus,” European Journal of Neuroscience, vol. 43, no. 6, pp. 773-781, 2016.

This is a "mind reading" work. We managed to decode the phoneme information from functional magnetic resonance imaging (fMRI) signals of subjects when they listened to nine syllables. The results indicated that phonemes have unique representations in the superior temporal gyrus (STG). We also revealed certain response patterns of the phonemes in STG.

H. Qin, J. Yan, X. Li, X. Hu, “Joint Training of Cascaded CNN for Face Detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 26-July 1, 2016, pp. 3456-3465.

S. Wang, Y. Yang, X. Hu, J. Li, B. Xu, “Solving the K-shortest paths problem in timetable-based public transportation systems,” Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, vol. 20, no. 5, pp. 413-427, 2016.

An extended version of the IMECS 2012 paper.

2015

Z. Cheng, Z. Deng, X. Hu, B. Zhang, T. Yang, “Efficient reinforcement learning of a reservoir network model of parametric working memory achieved with a cluster population winner-take-all readout mechanism,” Journal of Neurophysiology, vol. 114, no. 6, pp. 3296-3305, 2015. Learning of a reservoir network for working memory of monkey brain.
X. Li, S. Qian, F. Peng, J. Yang, X. Hu, and R. Xia, "Deep convolutional neural network and multi-view stacking ensemble in Ali mobile recommendation algorithm competition," The First International Workshop on Mobile Data Mining & Human Mobility Computing (ICDM 2015). The team won the Ali competition, ranking 1st among 7186 teams.
M. Liang, X. Hu, B. Zhang, “Convolutional neural networks with intra-layer recurrent connections for scene labeling,” Advances in Neural Information Processing Systems (NIPS), Montréal, Canada, Dec. 7-12, 2015. caffe configs An application of the recurrent CNN. It achieves excellent performance on the Stanford Background and SIFT Flow datasets.
Y. Zhou, X. Hu, B. Zhang, “Interlinked convolutional neural networks for face parsing,” International Symposium on Neural Networks (ISNN), Jeju, Korea, Oct. 15-18, 2015, pp. 222-231. A two-stage pipeline is proposed for face parsing and both stages use iCNN, which is a set of CNNs with interlinkage in the convolutional layers. Source codes
M. Liang, X. Hu, “Recurrent convolutional neural network for object recognition,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, USA, June 7-12, 2015, pp. 3367-3375. cuda-convnet2 configs (used in the paper) caffe configs torch version pytorch version (by Xiao Li) Typical deep learning models for object recognition, including HMAX and CNN, have feedforward architectures. This is a crude approximation of the visual pathway in the brain since there are abundant recurrent connections in the visual cortex. We show that adding recurrent connections to CNN improves its performance in object recognition.
X. Zhang, Q. Zhang, X. Hu, B. Zhang, “Neural representation of three-dimensional acoustic space in the human temporal lobe,” Frontiers in Human Neuroscience, vol. 9, article 203, 2015. doi: 10.3389/fnhum.2015.00203 Humans are able to localize the sounds in the environment. How the locations are encoded in the cortex remains elusive. Using fMRI and machine learning techniques, we investigated how the temporal cortex of humans encodes the 3D acoustic space.
M. Liang, X. Hu, “Predicting eye fixations with higher-level visual features,” IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 1178-1189, 2015. codes There is a debate about whether low-level features or high-level features are more important for predicting eye fixations. Through experiments, we show that mid-level features and object-level features are indeed more effective for this task. We obtained state-of-the-art results on several benchmark datasets including Toronto, MIT, Kootstra and ASCMN at the time of submission.
M. Liang, X. Hu, “Feature selection in supervised saliency prediction,” IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 900-912, 2015. (Download the computed saliency maps here) There is a trend for incorporating more and more features for supervised learning of visual saliency on natural images. We find much redundancy among these features by showing that a small subset of features leads to excellent performance on several benchmark datasets. In addition, these features are robust across different datasets.
Q. Zhang, X. Hu, B. Zhang, “Comparison of L1-Norm SVR and Sparse Coding Algorithms for Linear Regression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 8, pp. 1828-1833, 2015. MATLAB codes The close connection between the L1-norm support vector regression (SVR) and sparse coding (SC) is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the L1-SVR algorithms in efficiency. The SC algorithms are then used to design RBF networks, which are more efficient than the well-known orthogonal least squares algorithm.

2014

T. Shi, M. Liang, X. Hu, “A reverse hierarchy model for predicting eye fixations,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, USA, June 24-27, 2014, pp. 2822-2829. We present a novel approach for saliency detection in natural images. The idea is from a theory in cognitive neuroscience, called reverse hierarchy theory, which proposes that attention propagates from the top level of the visual hierarchy to the bottom level.
X. Hu, J. Zhang, J. Li, B. Zhang, “Sparsity-regularized HMAX for visual recognition,” PLOS ONE, vol. 9, no. 1, e81813, 2014. MATLAB codes We show that a deep learning model with alternating sparse coding/ICA and local max pooling can learn higher-level features on images without labels. After training on a dataset with 1500 images, in which there were 150 unaligned faces, 6 units on the top layer became face detectors. This took a few hours on a laptop computer with 2 cores, in contrast to Google's 16,000 cores in a similar project.
X. Hu, J. Zhang, P. Qi, B. Zhang, “Modeling response properties of V2 neurons using a hierarchical K-means model,” Neurocomputing, vol. 134, pp. 198-205, 2014. We show that the simple data clustering algorithm K-means can be used to model some properties of V2 neurons when stacked into a hierarchical structure. It is more biologically feasible than the sparse DBN for doing the same thing because it can be realized by competitive Hebbian learning. This is an extended version of our ICONIP'12 paper.
P. Qi, X. Hu, “Learning nonlinear statistical regularities in natural images by modeling the outer product of image intensities,” Neural Computation, vol. 26, no. 4, pp. 693–711, 2014. MATLAB codes This is a hierarchical model aimed at modeling the properties of complex cells in the primary visual cortex (V1). It can be regarded as a simplified version of Karklin and Lewicki's model published in 2009.

2013

P. Qi, S. Su, X. Hu, “Modeling outer products of features for image classification,” Proc. of the 6th International Conference on Advanced Computational Intelligence (ICACI), Hangzhou, China, Oct. 19-21, 2013, pp.334-338.

The method described in our 2014 Neural Computation paper was applied to SIFT features for image classification (in the SPM framework), achieving higher accuracy on two datasets than traditional sparse coding.

M. Liang, M. Yuan, X. Hu, J. Li and H. Liu, “Traffic sign detection by ROI extraction and histogram features-based recognition,” Proc. of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, USA, Aug. 4-9, 2013, pp. 739-746.

The paper describes our method used for the IJCNN 2013 German Traffic Sign Detection Competition. This method achieved 100% accuracy on the Prohibitory signs!

Y. Wu, Y. Liu, J. Li, H. Liu, X. Hu, “Traffic sign detection based on convolutional neural networks,” Proc. of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, USA, Aug. 4-9, 2013, pp. 747-753.

The paper describes another method used for the IJCNN 2013 German Traffic Sign Detection Competition. This method ranked 2nd and 4th on the Mandatory and Danger signs, respectively!

2012

Y. Yang, Q. He, X. Hu, “A compact neural network for training support vector machines,” Neurocomputing, vol. 86, pp. 193-198, 2012. A simple analog circuit is proposed for training SVMs. It takes advantage of the nonlinear properties of operational amplifiers.
X. Hu and J. Wang, “Solving the assignment problem using continuous-time and discrete-time improved dual networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 5, pp. 821-827, 2012. Assign n entities to n slots, where each assignment has a cost, so that the total cost is minimized.
X. Hu, P. Qi, B. Zhang, “Hierarchical K-means algorithm for modeling visual area V2 neurons,” Proc. of 19th International Conference on Neural Information Processing (ICONIP), Doha, Qatar, Nov. 12-15, 2012, pp. 373-381. An extended version is in our 2014 Neurocomputing paper.
Y. Yang, S. Wang, X. Hu, J. Li, B. Xu, “A modified k-shortest paths algorithm for solving the earliest arrival problem on the time-dependent model of transportation systems,” Proc. of International MultiConference of Engineers and Computer Scientists (IMECS), Hong Kong, March 14-16, 2012, pp. 1560-1567. If one wants to go from city A to city B by train and arrive at B as early as possible, could you provide some "good" itineraries? Here is a fast solution. It gives the K best solutions for any cities A and B in mainland China within 30 ms on a small server when K<100.

2007

X. Hu and J. Wang, “Solving the k-winners-take-all problem and the oligopoly Cournot-Nash equilibrium problem using the general projection neural networks,” Proc. of 14th International Conference on Neural Information Processing (ICONIP), Kitakyushu, Japan, Nov. 13-16, 2007, pp. 703-712.

S. Liu, X. Hu and J. Wang, “Obstacle Avoidance for Kinematically Redundant Manipulators Based on an Improved Problem Formulation and the Simplified Dual Neural Network”, Proc. of IEEE Three-Rivers Workshop on Soft Computing in Industrial Applications, Passau, Bavaria, Germany, August 1-3, 2007, pp. 67-72.

X. Hu and J. Wang, “Convergence of a recurrent neural network for nonconvex optimization based on an augmented Lagrangian function,” Proc. of 4th International Symposium on Neural Networks, Part III, Nanjing, China, June 3-7, 2007.

2006

X. Hu and J. Wang, “Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network,” IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1487-1499, 2006.

X. Hu and J. Wang, “Solving extended linear programming problems using a class of recurrent neural networks,” Proc. of 13th International Conference on Neural Information Processing, Part II, Hong Kong, Oct. 3-6, 2006.

Publications

Preprint

2025

Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu, “PBCAT: patch-based composite adversarial training against physically realizable attacks on object detection,” Proc. of the International Conference on Computer Vision (ICCV), Honolulu, Hawaii, Oct 19th-23rd, 2025.

2024

Haoran He, Peilin Wu, Chenjia Bai, Hang Lai, Lingxiao Wang, Ling Pan, Xiaolin Hu, Weinan Zhang, “Bridging the sim-to-real gap from the information bottleneck perspective,” Conference on Robot Learning. (Oral)

Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu, “SAFDNet: A simple and effective network for fully sparse 3D object detection,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, June 17-21, 2024. (Oral, 90 out of about 11500)

2023

Xiao Li, Ziqi Wang, Bo Zhang, Fuchun Sun, Xiaolin Hu, “Recognizing object by components with human prior knowledge enhances adversarial robustness of deep neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 7, pp. 8861-8873, July 2023.

2022

Zhanhao Hu, Siyuan Huang, Xiaopei Zhu, Fuchun Sun, Bo Zhang, Xiaolin Hu, “Adversarial texture for fooling person detectors in the physical world”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, June 19-24, 2022. (Oral)

Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu, “Infrared invisible clothing: hiding from infrared detectors at multiple angles in real world”, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, June 19-24, 2022. (Oral)

2021

2020

Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen, “Generating adjacency-constrained subgoals in hierarchical reinforcement learning,” Advances in Neural Information Processing Systems (NeurIPS), Dec 6-12, 2020. (Spotlight)

2019

Chufeng Tang, Lu Sheng, Zhaoxiang Zhang, Xiaolin Hu, “Improving pedestrian attribute recognition with weakly-supervised multi-scale attribute-specific localization,” Proc. of IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, Oct 27–Nov 2, 2019. pp. 4997-5006.

Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu, “Knowledge distillation via route constrained optimization,” Proc. of IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, Oct 27–Nov 2, 2019. pp. 1345-1354. (Oral)

2018

Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, Jianguo Li, “Boosting adversarial attacks with momentum,” Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 18-22, 2018. (Spotlight)

2017

J. Wu, L. Ma, X. Hu, “Delving deeper into convolutional neural networks for camera relocalization,” Proc. of IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 29-June 3, 2017.

F. Liao, X. Hu, S. Song, “Emergence of V1 recurrent connectivity pattern in artificial neural network,” Computational and Systems Neuroscience (Cosyne), Salt Lake City, Feb. 23-26, 2017.

2016

2015

X. Li, S. Qian, F. Peng, J. Yang, X. Hu, and R. Xia, "Deep convolutional neural network and multi-view stacking ensemble in Ali mobile recommendation algorithm competition," The First International Workshop on Mobile Data Mining & Human Mobility Computing (ICDM 2015).

2014

2013

2012

2011

2010