New Delhi: Meta, formerly known as Facebook, has launched a new artificial intelligence (AI) model called “CM3leon” (pronounced like “chameleon”). The multimodal model can perform both text-to-image and image-to-text generation.
In a blog post, Meta stated that CM3leon is a multimodal model trained with a recipe adapted from text-only language models: a large-scale retrieval-augmented pre-training stage, followed by a second multitask supervised fine-tuning (SFT) stage.
With CM3leon, Meta asserts that its image generation tools can now produce more coherent and visually pleasing imagery that aligns more closely with input prompts. Notably, CM3leon achieves this with five times less compute and a smaller training dataset than previous transformer-based methods.
Extensive evaluation against the widely used image generation benchmark, zero-shot MS-COCO, revealed that CM3leon attained an outstanding FID (Fréchet Inception Distance) score of 4.88. This establishes a new state of the art in text-to-image generation, surpassing Google’s text-to-image model, Parti. Furthermore, Meta highlights that CM3leon shows exceptional performance across various vision-language tasks, including visual question answering and long-form captioning.
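For context, FID measures how close the feature statistics of generated images are to those of real images, modeling both as Gaussians (lower is better). A minimal sketch of the metric itself, assuming the mean vectors and covariance matrices have already been extracted from an Inception network:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet Inception Distance between two Gaussian feature
    distributions: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2*sqrt(S1*S2))."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; take the real part
    # to discard tiny imaginary components from numerical error.
    covmean = sqrtm(sigma1 @ sigma2).real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical distributions give an FID of 0; shifting one mean by a
# unit vector (with identity covariances) gives an FID of 1.
mu, sigma = np.zeros(2), np.eye(2)
print(fid(mu, sigma, mu, sigma))                     # → 0.0 (up to rounding)
print(fid(mu, sigma, np.array([1.0, 0.0]), sigma))   # → 1.0 (up to rounding)
```

In benchmark practice, the statistics come from Inception-v3 activations over tens of thousands of real and generated images; the toy Gaussians above are only to show the arithmetic.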
Impressively, despite being trained on a relatively small dataset of only three billion text tokens, CM3leon achieves competitive zero-shot performance compared to larger models trained on more extensive datasets. Meta believes that CM3leon’s strong performance across diverse tasks marks a significant step toward higher-fidelity image generation and understanding.
Meta envisions that models like CM3leon will ultimately enhance creativity and facilitate improved applications in the metaverse. The company looks forward to pushing the boundaries of multimodal language models and releasing additional models in the future.