Elon Musk Launches Grok AI 1.5 Vision with Advanced Image Skills; to Compete with ChatGPT 4 Vision

By Reetika Bhatt - April 16, 2024
The announcement regarding the upgrade came via the official X (formerly Twitter) account of xAI. The company shared a blog post describing the new AI model alongside a few benchmark results.

Grok 1.5 Vision is a new offering from Elon Musk's artificial intelligence (AI) company, xAI. With the debut of Grok-1.5V, xAI's first multimodal AI model, the AI race intensifies. By adding image processing, the latest version of Grok builds on the text capabilities of its predecessor and can now interpret documents, charts, diagrams, screenshots, and photos. The model is not yet available to the public, but early testers and existing Grok users will soon be able to access Grok-1.5V through xAI. This closed beta phase suggests that xAI is still fine-tuning the model ahead of a broader release. Interestingly, the development came just a few days after OpenAI unveiled its own vision-capable GPT-4 model. Meanwhile, here's all you need to know about the new Grok 1.5 Vision.

Grok 1.5 Vision: Key Details

The headline feature of Grok-1.5V, also known as Grok-1.5 Vision, is its capacity to process both visual and textual data. This multimodal approach positions Grok as a rival to well-known models such as OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude. xAI claims that Grok-1.5V is competitive with existing frontier models in several areas, including multidisciplinary reasoning and complex visual understanding. Notably, the model performs better than its contemporaries on RealWorldQA, a benchmark newly created by xAI.

Most of the specifications carry over from the previously released Grok 1.5 model, with vision capabilities added on top. Both the overall benchmark scores and the context window of 128,000 tokens are expected to stay the same.



RealWorldQA Assessment

Additionally, xAI released Grok 1.5 Vision's scores on a benchmark of its own, which the firm has named RealWorldQA. RealWorldQA evaluates a model's comprehension of fundamental spatial concepts in the real world, an area where advanced AI models frequently struggle with tasks that appear straightforward to humans. The initial dataset consists of more than 700 images from real-world environments, including anonymised images taken from vehicles, each paired with a question and a verifiable answer. The model was also tested on several other benchmarks, including MMMU, MathVista, and ChartQA. Grok performed better on RealWorldQA than OpenAI's GPT-4 with Vision and Gemini 1.5 Pro, but it scored lower on the MMMU and ChartQA assessments.
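To make the idea of a question-and-answer benchmark concrete, here is a minimal Python sketch of how a score on such a dataset could be computed. It is an illustration only, not xAI's evaluation code: the `Example` class, the `normalize` and `accuracy` helpers, the exact-match scoring rule, and the three toy items are all assumptions made up for this example and are not drawn from RealWorldQA.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One benchmark item (hypothetical structure, not the real dataset schema)."""
    image_id: str   # made-up identifier standing in for an image
    question: str   # question asked about the image
    answer: str     # reference answer that can be checked


def normalize(text: str) -> str:
    """Trim whitespace, lowercase, and drop trailing periods before comparing."""
    return text.strip().lower().rstrip(".")


def accuracy(examples: list[Example], predictions: dict[str, str]) -> float:
    """Fraction of items whose predicted answer matches the reference exactly.

    Items with no prediction count as wrong. Exact matching is an assumed
    scoring rule chosen for simplicity.
    """
    if not examples:
        return 0.0
    correct = sum(
        normalize(predictions.get(ex.image_id, "")) == normalize(ex.answer)
        for ex in examples
    )
    return correct / len(examples)


# Toy placeholder items, invented for illustration only.
toy_examples = [
    Example("img-1", "Is the cyclist to the left or right of the car?", "Left"),
    Example("img-2", "How many traffic lights are visible?", "two"),
    Example("img-3", "Is there room to pass the parked van?", "yes"),
]

# Hypothetical model outputs keyed by image identifier.
toy_predictions = {"img-1": "left", "img-2": "Two.", "img-3": "no"}

score = accuracy(toy_examples, toy_predictions)
print(f"Accuracy: {score:.1%}")
```

In this sketch, two of the three toy answers match their references after normalisation, so the reported accuracy is two-thirds. Published benchmark scores may rely on different matching rules, so the numbers from a sketch like this would not be directly comparable.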

Computer Vision

For those who are unaware, computer vision is the area of computer science that focuses on giving computers, including artificial intelligence (AI) models, the capacity to recognise and comprehend objects in the real world using pictures and videos. This helps computers perceive and interpret visual signals much as humans do. With the growth of multimodal AI models, many companies are increasingly concentrating on models with a vision-focused approach. OpenAI's GPT-4 with Vision and Google's Gemini 1.5 Pro share this capability.

Grok-1.5V, according to xAI, is a major step toward creating "beneficial AGI" (Artificial General Intelligence), or an AI with a thorough grasp of the outside world. In the coming months, the firm intends to significantly improve Grok's ability to interpret audio and video data in addition to photos. With this release, Elon Musk's xAI is now vying for the top spot in the AI market.
