
Contrastive Learning

Sungyeon Kim 2024. 6. 28. 00:10

1. Contrastive Learning

1) Early models learned only image representations, for image-only tasks.

2) These were then extended to language and multimodal tasks via contrastive learning.

** The key is pairing each image with semantically related text (a minimal loss sketch follows below).
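To make the idea concrete, here is a minimal sketch of an InfoNCE-style contrastive loss in PyTorch (the function name and temperature value are illustrative choices, not taken from a specific paper): each anchor is pulled toward its matched positive, while every other item in the batch serves as a negative.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchors, positives, temperature=0.07):
    """In-batch contrastive (InfoNCE) loss: the i-th anchor should be
    most similar to the i-th positive; all other rows act as negatives."""
    anchors = F.normalize(anchors, dim=-1)      # L2-normalize so dot products
    positives = F.normalize(positives, dim=-1)  # become cosine similarities
    logits = anchors @ positives.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(anchors.size(0), device=anchors.device)  # diagonal = true pairs
    return F.cross_entropy(logits, targets)
```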

 

2. CLIP (Contrastive Language-Image Pre-training)

1) separate encoders for images and text

2) project them into a shared latent space

3) the model is trained on image-text pairs

4) aligns only image-text pairs; there is no intra-modal objective (see the training-loss sketch below)
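Below is a minimal sketch of CLIP's symmetric image-text training objective, assuming precomputed per-modality features; the linear projections are stand-ins for CLIP's actual ViT/ResNet image encoder and Transformer text encoder, and the dimensions are placeholder values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLIPStyleLoss(nn.Module):
    """Toy CLIP-style setup: separate per-modality features projected into
    a shared latent space, trained with a symmetric contrastive loss."""
    def __init__(self, image_dim=2048, text_dim=768, embed_dim=512):
        super().__init__()
        # Stand-ins for the projection heads of the two separate encoders
        self.image_proj = nn.Linear(image_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        # Learnable temperature, stored in log space as in the CLIP paper
        self.logit_scale = nn.Parameter(torch.log(torch.tensor(1 / 0.07)))

    def forward(self, image_feats, text_feats):
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)
        # (batch, batch) similarities; diagonal entries are the true pairs
        logits = self.logit_scale.exp() * (img @ txt.t())
        targets = torch.arange(img.size(0), device=img.device)
        # Symmetric loss: classify text per image, and image per text
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2
```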

 

3. Triple Contrastive Learning (TCL)

1) incorporate both cross-modal and intra-modal objectives

2) not only aligns image-text pairs but also ensures that similar inputs within the same modality are close in the representation space (see the sketch below)
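A hedged sketch of the idea behind TCL: add intra-modal contrastive terms (image-to-image and text-to-text, computed against a second view of each input) on top of the cross-modal term. The `lam` weight and the use of augmented views are illustrative assumptions here, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive(a, b, temperature=0.07):
    """Generic in-batch contrastive loss between two aligned embedding sets."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def triple_contrastive_loss(img, txt, img_view2, txt_view2, lam=1.0):
    """Cross-modal alignment plus intra-modal terms. img_view2 / txt_view2
    are embeddings of a second view of each input (e.g. an augmentation);
    `lam` weights the intra-modal part -- both are illustrative choices."""
    cross = (contrastive(img, txt) + contrastive(txt, img)) / 2            # image <-> text
    intra = (contrastive(img, img_view2) + contrastive(txt, txt_view2)) / 2  # within modality
    return cross + lam * intra
```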