In recent years, pre-trained models built with contrastive learning on large-scale unlabeled data have been widely adopted in applications such as lane detection and face recognition. However, the security and privacy issues of contrastive learning models have increasingly attracted the attention of researchers. This paper focuses on poisoning attacks against multimodal contrastive learning models. A poisoning attack injects carefully crafted data into the training set to change the behavior of the victim model. Existing attacks primarily target either the text encoder or the image encoder individually and fail to fully exploit information from the other modality; to address this, this paper proposes a targeted poisoning attack that poisons both the text and image encoders simultaneously. First, a generator based on the Beta distribution produces opacity values, which are used to automatically watermark the images. Next, the number of instances to be collected is determined from the Euclidean distance between the watermarked instance and the target instance. After watermarking, the instances are optimized to produce the final poisoning instances. Compared with state-of-the-art attacks, the proposed method achieves a lower poisoning rate and better model accuracy.
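
For illustration, the following is a minimal sketch of the Beta-opacity watermarking and distance-based budgeting steps described above. The Beta parameters, the alpha-blending formula, the `poisoning_budget` scaling rule, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def sample_opacity(alpha: float = 2.0, beta: float = 5.0) -> float:
    """Draw a watermark opacity in (0, 1) from a Beta distribution.

    The (alpha, beta) parameters are assumptions; the paper only states that
    a Beta-distribution generator produces the opacity values.
    """
    return float(np.random.beta(alpha, beta))


def apply_watermark(image: np.ndarray, watermark: np.ndarray, opacity: float) -> np.ndarray:
    """Alpha-blend a watermark onto an image (both HxWxC float arrays in [0, 1])."""
    blended = (1.0 - opacity) * image + opacity * watermark
    return np.clip(blended, 0.0, 1.0)


def poisoning_budget(watermarked_emb: np.ndarray, target_emb: np.ndarray,
                     base_count: int = 50) -> int:
    """Scale the number of instances to collect by the Euclidean distance
    between the watermarked instance and the target instance (assumed rule)."""
    dist = float(np.linalg.norm(watermarked_emb - target_emb))
    return max(1, int(round(base_count * dist)))


# Example usage with random arrays standing in for real images and embeddings.
rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
wm = rng.random((224, 224, 3))
poisoned_img = apply_watermark(img, wm, sample_opacity())
n_instances = poisoning_budget(rng.random(512), rng.random(512))
```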