Lulu Tang
I am a researcher at Beijing Academy of Artificial Intelligence (BAAI).
I completed my postdoctoral research at BAAI and Tsinghua University under the supervision of Prof. Tiejun Huang and Prof. Jiwen Lu. Prior to that, I obtained my Ph.D. from the University of Macau, advised by Prof. Zhi-Xin Yang, in frequent collaboration with Prof. Kui Jia.
My research interests lie in 3D computer vision, 3D generative models, and vision-language foundation models.
Email | GitHub | Google Scholar
Selected Publications
(*: Equal Contribution, ♣: Corresponding Author)
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Baorui Ma*,
Huachen Gao*,
Haoge Deng*,
Zhengxiong Luo,
Tiejun Huang,
Lulu Tang♣,
Xinlong Wang♣
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 (Highlight)
[arXiv]
[Project page]
[Code]
[Dataset]
[Post]
See3D is a scalable visual-conditional multi-view diffusion (MVD) model for open-world 3D creation that can be trained on web-scale video collections without camera pose annotations.
PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization
Aofan Liu,
Lulu Tang♣,
Ting Pan,
Yuguo Yin,
Bin Wang,
Ao Yang
IEEE International Conference on Multimedia & Expo (ICME), 2025
[arXiv]
PiCo is a jailbreaking framework that bypasses advanced MLLM defenses by using token-level typographic attacks to evade input filters and embedding malicious intent in programming instructions to avoid runtime monitoring.
TAP: Tokenize Anything via Prompting
Ting Pan*,
Lulu Tang*,
Xinlong Wang♣,
Shiguang Shan
European Conference on Computer Vision (ECCV), 2024
[arXiv]
[Code]
[Demo]
TAP is a unified, promptable model that simultaneously segments, recognizes, and captions arbitrary regions, driven by flexible visual prompts (point, box, and sketch).
Canonical correlation analysis regularization: An effective deep multiview learning baseline for RGB-D object recognition
Lulu Tang*,
Zhi-Xin Yang♣,
Kui Jia♣
IEEE Transactions on Cognitive and Developmental Systems, 2019
[Paper]