Sihan Chen
Title
Cited by
Year
CPTR: Full transformer network for image captioning
W Liu, S Chen, L Guo, X Zhu, J Liu
arXiv preprint arXiv:2101.10804, 2021
Cited by: 155; Year: 2021
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
S Chen, X He, L Guo, X Zhu, W Wang, J Tang, J Liu
arXiv preprint arXiv:2304.08345, 2023
Cited by: 54; Year: 2023
VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset
S Chen, H Li, Q Wang, Z Zhao, M Sun, X Zhu, J Liu
Advances in Neural Information Processing Systems 36, 2024
Cited by: 32; Year: 2024
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Z Zhao, L Guo, T Yue, S Chen, S Shao, X Zhu, Z Yuan, J Liu
arXiv preprint arXiv:2305.16103, 2023
Cited by: 29; Year: 2023
Global-local propagation network for RGB-D semantic segmentation
S Chen, X Zhu, W Liu, X He, J Liu
arXiv preprint arXiv:2101.10801, 2021
Cited by: 18; Year: 2021
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
X He, S Chen, F Ma, Z Huang, X Jin, Z Liu, D Fu, Y Yang, J Liu, J Feng
arXiv preprint arXiv:2305.13167, 2023
Cited by: 16; Year: 2023
MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques
S Chen, X Zhu, D Hao, W Liu, J Liu, Z Zhao, L Guo, J Liu
Proceedings of the 29th ACM International Conference on Multimedia, 4853-4857, 2021
Cited by: 5; Year: 2021
VL-Mamba: Exploring State Space Models for Multimodal Learning
Y Qiao, Z Yu, L Guo, S Chen, Z Zhao, M Sun, Q Wu, J Liu
arXiv preprint arXiv:2403.13600, 2024
Cited by: 3; Year: 2024
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
S Chen, X He, H Li, X Jin, J Feng, J Liu
arXiv preprint arXiv:2306.09085, 2023
Cited by: 3; Year: 2023
Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
J Liu, W Wang, S Chen, X Zhu, J Liu
IEEE Transactions on Multimedia, 2023
Cited by: 3; Year: 2023
GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER
M Sun, W Wang, Z Qin, J Sun, S Chen, J Liu
Advances in Neural Information Processing Systems 36, 2024
Cited by: 1; Year: 2024
EAVL: Explicitly Align Vision and Language for Referring Image Segmentation
Y Yan, X He, W Wang, S Chen, J Liu
arXiv preprint arXiv:2308.09779, 2023
Year: 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Z Liu, S Chen, L Guo, H Li, X He, J Liu
arXiv preprint arXiv:2305.11769, 2023
Year: 2023