Artificial Analysis

Technology, Information and Internet

Independent analysis of AI models and hosting providers: https://artificialanalysis.ai/

About us

Leading independent analysis of AI. Understand the AI landscape to choose the best AI technologies for your use case. Backed by Nat Friedman, Daniel Gross and Andrew Ng.

Website
https://artificialanalysis.ai/
Industry
Technology, Information and Internet
Company size
11-50 employees
Type
Privately Held

Updates

  • Thanks for the support, Andrew Ng! We completely agree: faster token generation will become increasingly important as a greater proportion of output tokens is consumed by models, such as in multi-step agentic workflows, rather than being read by people.

    Andrew Ng

    Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of LandingAI

    Shoutout to the team that built https://lnkd.in/g3Y-Zj3W . Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM that focus more on the quality of the outputs. I hope benchmarks like this encourage more providers to work on fast token generation, which is critical for agentic workflows!

  • Wan2.2 A14B is the new leading open weights video model! Veo 2 and Kling 2.0-level Text to Video generation quality is now accessible to the open weights community.

    Alibaba's latest release features 27B total parameters with 14B active, under the Apache 2.0 license. The model excels in Text to Video (7th place) but struggles more with Image to Video (14th), likely due to its 16fps limitation compared to competitors' 24fps support.

    Alibaba remains one of the few leaders in open weights video models, as the vast majority of releases are proprietary endpoints. Open weights models like Wan2.2 are crucial for developers seeking fine-tuning capabilities and customization that proprietary APIs don't offer. While it leads open source options, the performance gap with SOTA models like Veo 3 and Seedance 1.0 remains substantial.

  • Cerebras has been demonstrating its ability to host large MoEs at very high speeds this week, launching Qwen3 235B 2507 and Qwen3 Coder 480B endpoints at >1,500 output tokens/s.

    ➤ Cerebras Systems now offers endpoints for both Qwen3 235B 2507 Reasoning & Non-reasoning. Both models have 235B total parameters with 22B active.
    ➤ Qwen3 235B 2507 Reasoning offers intelligence comparable to o4-mini (high) & DeepSeek R1 0528. The Non-reasoning variant offers intelligence comparable to Kimi K2 and well above GPT-4.1 and Llama 4 Maverick.
    ➤ Qwen3 Coder 480B has 480B total parameters with 35B active. This model is particularly strong for agentic coding and can be used in a variety of coding agent tools, including the Qwen3-Coder CLI.
    ➤ One of the most impressive aspects of these large models on Cerebras is the end-to-end response time achievable on Qwen3 235B 2507 (Reasoning): the model can get through input processing, reasoning and output for our standard 1K token test query in <2 seconds.

    Cerebras' launches represent the first time this level of intelligence has been accessible at these output speeds, and they have the potential to unlock new use cases - like using a reasoning model for each step of an agent without having to wait minutes.

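    As a rough sanity check on those speed figures, generation time scales linearly with output tokens. The sketch below uses illustrative token counts, not Artificial Analysis' actual test parameters:

```python
# Back-of-the-envelope: time spent emitting tokens at a steady output rate.
# Token counts are illustrative assumptions, not the actual benchmark setup.
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds spent generating `tokens` at a constant output speed."""
    return tokens / tokens_per_second

# At >1,500 output tokens/s, even 1,000 reasoning + answer tokens take
# well under a second to generate, leaving headroom for input processing
# inside a <2 s end-to-end budget.
print(generation_seconds(1_000, 1_500))  # ~0.67 s
```

    At typical API speeds of 50-100 output tokens/s, the same 1,000 tokens would take 10-20 seconds, which is why this level of speed matters for multi-step agents.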
  • Kolors 2.1 debuts at #5 in the Artificial Analysis Image Arena! Kuaishou Kling AI, best known for their video models, delivers a strong showing in image generation.

    Kuaishou, primarily recognized for their Kling video generation models, has released Kolors 2.1, a frontier-quality image model. Kolors 2.1 produces high-quality images with particularly strong text rendering capabilities, placing 3rd in the Text and Typography category. The model supports image generation at up to 2K resolution.

    The model is also priced competitively at $14/1k images, compared to Seedream 3.0 at $30/1k images, Imagen 4 Preview 0606 at $40/1k images, or GPT-4o at ~$167/1k images. Kolors 2.1 is currently accessible via the KlingAI API, as well as in the Kling app.

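    The per-1k pricing quoted above converts directly to per-image cost, which makes the gap easier to see (a trivial conversion using only the figures from the post):

```python
# Convert per-1,000-image pricing (USD, figures quoted in the post)
# into per-image cost for a side-by-side comparison.
prices_per_1k = {
    "Kolors 2.1": 14,
    "Seedream 3.0": 30,
    "Imagen 4 Preview 0606": 40,
    "GPT-4o": 167,  # approximate
}
per_image = {model: usd / 1_000 for model, usd in prices_per_1k.items()}
# Kolors 2.1 works out to $0.014 per image vs ~$0.167 for GPT-4o,
# roughly a 12x price difference.
```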
  • 🇰🇷 LG recently launched EXAONE 4.0 32B - it scores 64 on the Artificial Analysis Intelligence Index, the highest score yet for a 32B model.

    LG AI Research's EXAONE 4.0 is released in two variants: the 32B hybrid reasoning model we're reporting benchmarking results for here, and a smaller 1.2B model designed for on-device applications that we have not benchmarked yet. Alongside Upstage's recent Solar Pro 2 release, it's exciting to see Korean labs join the US and China near the top of the intelligence charts.

    Key results:
    ➤ 🧠 EXAONE 4.0 32B (Reasoning): In reasoning mode, EXAONE 4.0 scores 64 on the Artificial Analysis Intelligence Index. This matches Claude 4 Opus and the new Llama Nemotron Super 49B v1.5 from NVIDIA, and sits only 1 point behind Gemini 2.5 Flash.
    ➤ ⚡ EXAONE 4.0 32B (Non-Reasoning): In non-reasoning mode, EXAONE 4.0 scores 51 on the Artificial Analysis Intelligence Index. It matches Llama 4 Maverick in intelligence despite having only ~1/4 the total parameters (although ~2x the active parameters).
    ➤ ⚙️ Output tokens and verbosity: In reasoning mode, EXAONE 4.0 used 100M output tokens for the Artificial Analysis Intelligence Index. This is higher than some other frontier models, but aligns with the recent trend of reasoning models using more output tokens to 'think more' - similar to Llama Nemotron Super 49B v1.5, Grok 4, and Qwen3 235B 2507 Reasoning. In non-reasoning mode, EXAONE 4.0 used 15M tokens - high for a non-reasoner, but not as high as Kimi K2's 30M.

    Key details:
    ➤ Hybrid reasoning: The model offers optionality between 'reasoning' mode and 'non-reasoning' mode
    ➤ Availability: Currently hosted by FriendliAI, and competitively priced (especially compared to proprietary options) at $1 per 1M input and output tokens
    ➤ Open weights: EXAONE 4.0 is an open weights model available under the EXAONE AI Model License Agreement 1.2. The license limits commercial use.
    ➤ Multimodality: Text-only input and output
    ➤ Context window: 131k tokens
    ➤ Parameters: 32B active and total parameters, available in 16-bit and 8-bit precision (meaning the model can be run on a single H100 chip in full precision)
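    The single-H100 claim follows from simple weight-memory arithmetic - a rough sketch that counts model weights only, ignoring KV cache and activation memory, which need additional headroom:

```python
# Rough weight-memory estimate: why a 32B-parameter model at 16-bit
# precision fits on a single 80 GB H100 (weights only; KV cache and
# activations require extra headroom on top of this).
def weight_gb(params_billion: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

print(weight_gb(32, 2))  # 64.0 GB at FP16/BF16 -> fits in 80 GB
print(weight_gb(32, 1))  # 32.0 GB at 8-bit precision
```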

  • Announcing the Artificial Analysis Music Arena Leaderboard: with >5k votes, Suno v4.5 is the leading music generation model, followed by Riffusion's FUZZ-1.1 Pro. Google's Lyria 2 places third in our Instrumental leaderboard, and Udio's v1.5 Allegro places third in our Vocals leaderboard.

    The Instrumental leaderboard is as follows:
    🥇 Suno v4.5
    🥈 Riffusion FUZZ-1.1 Pro
    🥉 Google DeepMind Lyria 2
    Udio v1.5 Allegro
    Stability AI Stable Audio 2.0
    Meta MusicGen

    In the Vocals leaderboard, fewer models offer vocals generation support: 🥇 Suno and 🥈 Riffusion maintain their rankings, with Udio taking the 🥉.

    Rankings are based on community votes across a diverse range of genres and prompts. Want to see your prompt featured? You can submit prompts in the arena today. Link to participate and link to the leaderboard in the comments below 👇

  • Google has quietly upgraded Imagen 4! Imagen 4 Ultra now ranks #3 in the Artificial Analysis Image Arena, rivaling GPT-4o and Seedream 3.0 as one of the world's best image models!

    This substantial update brings Imagen 4 much closer to the leading models in our arena. We continue to observe that Imagen 4 Ultra and Standard often produce very similar outputs, though the difference is more pronounced than in previous versions.

    Key details:
    ➤ Imagen 4 remains more affordable than GPT-4o at $40/1k images (Standard) and $60/1k images (Ultra), compared to GPT-4o's ~$167/1k images, while being slightly above Seedream 3.0's $30/1k images
    ➤ Imagen 4 Ultra generates in ~9.5s, vs GPT-4o's ~53s and Seedream 3.0's ~4.5s
    ➤ You can access Imagen 4 via the Gemini app, Vertex AI, fal, and Replicate

  • Artificial Analysis reposted this

    NVIDIA AI

    We're excited to share that 🥇 Llama Nemotron Super 49B v1.5 -- our latest open reasoning model -- is now #1 in the 70B open model category on the Artificial Analysis Intelligence Index, a leaderboard that spans advanced math, science, and agentic tasks. Llama Nemotron Super 49B v1.5 is trained with high-quality synthetic reasoning data generated from models like Qwen3-235B and DeepSeek R1. It delivers state-of-the-art accuracy and throughput, running on a single H100.

    Key features:
    🎯 Leading accuracy on multi-step reasoning, math, coding, and function-calling
    🏗️ Post-trained using RPO, DPO, and RLVR across 26M+ synthetic examples
    📊 Fully transparent training data and techniques

    If you're building AI agents and want a high-accuracy, fully open, and transparent reasoning model that you can deploy anywhere, try Super v1.5 on build.nvidia.com or download it from Hugging Face 🤗 https://nvda.ws/4ojtlVF
    ➡️ Leaderboard: https://nvda.ws/4odapYu
    📝 Tech Blog: https://nvda.ws/474GQ55

  • NVIDIA has released the latest member of its Nemotron language model family, Llama Nemotron Super (49B) v1.5, reaching a score of 64 on the Artificial Analysis Intelligence Index. The model is an evolution of Super 49B v1 from earlier this year, with advances from post-training on new reasoning datasets generating a 13-point increase in the Intelligence Index. This puts NVIDIA AI's latest Super 49B release ahead of their previous Ultra 253B parameter model, despite having less than 1/4 the parameters.

    Leading dense model performance: with this latest iteration, Nemotron Super 49B v1.5 is the only dense model in the top 5 open weights models, competitive with much larger recent MoEs from Alibaba, DeepSeek and MiniMax.

    Key model details:
    ➤ Retains the same 131k context window as Nemotron Super v1
    ➤ Supports reasoning or non-reasoning modes via the '/no_think' setting in the system prompt
    ➤ Released under the NVIDIA Open Model License, as with previous Nemotron models

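    The '/no_think' toggle mentioned above can be sketched as a request payload. This is a minimal illustration only: it assumes an OpenAI-compatible chat completions endpoint, and the model identifier is a hypothetical placeholder - check NVIDIA's model card for the exact convention:

```python
# Sketch: toggling Nemotron's non-reasoning mode via the system prompt.
# The '/no_think' setting comes from the post above; the payload shape
# assumes an OpenAI-compatible chat endpoint, and the model name below
# is an illustrative placeholder, not a verified identifier.
def build_payload(question: str, reasoning: bool) -> dict:
    system = "" if reasoning else "/no_think"
    return {
        "model": "llama-nemotron-super-49b-v1.5",  # hypothetical name
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    }

payload = build_payload("Summarize this log file.", reasoning=False)
```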
  • Wan2.2 A14B is the latest open weights video model from Alibaba, and is now in the Artificial Analysis Video Arena.

    Wan2.2 is a video generation model released today by Alibaba with a novel Mixture of Experts architecture. It is available in both T2V and I2V variants with 27B total parameters and 14B active parameters per inference step. A smaller 5B variant is also available. Compared to Wan2.1, Alibaba has shared that it has been trained on 65.6% more images and 83.2% more videos.

    Wan2.2 can generate 5s videos at 720p and 16fps, which means videos are less smooth than those from comparable models that support generation at 24fps. Wan2.2 is open weights under the Apache 2.0 license.

    See how Wan2.2 A14B compares to other leading video models like Seedance 1.0, Hailuo 02, Veo 3, and Kling 2.1 in our Video Arena!
