Explore

Fine-tune FLUX fast

Customize FLUX.1 [dev] with the fast FLUX trainer on Replicate

Train FLUX.1 [dev] to recognize and generate new concepts (specific styles, characters, or objects) from a small set of example images. It's fast (under 2 minutes), cheap (under $2), and gives you a warm, runnable model plus LoRA weights to download.
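A training like this can be kicked off programmatically with the Replicate Python client. The sketch below is illustrative only: the trainer slug, version id, destination model, and input field names are assumptions, so copy the real values from the trainer's page on Replicate before running it.

```python
# Sketch: starting a fast FLUX fine-tune via the Replicate Python client.
# The trainer slug, version id, destination, and input fields below are
# placeholders; check the trainer's page on Replicate for the real values.
import os

training_input = {
    # Zip archive of your example images (a small set is enough)
    "input_images": "https://example.com/my-style-images.zip",
    # Token the model learns to associate with your new concept
    "trigger_word": "TOK",
}

def start_training():
    # Requires `pip install replicate` and REPLICATE_API_TOKEN in the environment.
    import replicate

    return replicate.trainings.create(
        version="replicate/fast-flux-trainer:<version-id>",  # placeholder version
        input=training_input,
        destination="your-username/my-flux-fine-tune",  # model that receives the LoRA
    )

# RUN_FLUX_TRAINING is a hypothetical opt-in flag so importing this sketch
# never makes a network call by accident.
if __name__ == "__main__" and os.environ.get("RUN_FLUX_TRAINING"):
    training = start_training()
    print(training.status)
```

When the training finishes, the destination model is warm and runnable like any other Replicate model, and the LoRA weights are attached to it for download.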

Official models

Official models are always on, maintained, and have predictable pricing.

View all official models

I want to…

Upscale images

Upscaling models that create high-quality images from low-quality images

Restore images

Models that improve or restore images by deblurring, colorizing, and removing noise

Enhance videos

Upscaling models that create high-quality videos from low-quality videos

Make videos with Wan

Generate videos with Wan, the fastest and highest-quality open-source video generation model.

Use Kontext fine-tunes

Browse the diverse range of Kontext fine-tunes the community has trained on Replicate

Make 3D stuff

Models that generate 3D objects, scenes, radiance fields, textures and multi-views.

Control image generation

Guide image generation with more than just text. Use edge detection, depth maps, and sketches to get the results you want.

Try for free

Get started with these models without adding a credit card. Whether you're making videos, generating images, or upscaling photos, these are great starting points.

Detect objects

Models that detect or segment objects in images and videos.

Use FLUX fine-tunes

Browse the diverse range of FLUX fine-tunes the community has trained on Replicate

Latest models

Hailuo 02 is a text-to-video and image-to-video model that can make 6s or 10s videos at 768p (standard) or 1080p (pro). It excels at real-world physics.

22K runs

A fast, low-cost version of Hailuo 02. Generates 6s and 10s videos at 512p.

446 runs

Turns your audio/video/images into professional-quality animated videos

534 runs

A faster and cheaper version of Google’s Veo 3 video model, with audio

14.6K runs

Sound on: Google’s flagship Veo 3 text to video model, with audio

129.4K runs

Use Kontext to turn any image into an emoji, using a LoRA by starsfriday

A very fast and cheap PrunaAI optimized version of Wan 2.2 A14B text-to-video

5.2K runs

A very fast and cheap PrunaAI optimized version of Wan 2.2 A14B image-to-video

9.3K runs

PartCrafter is a structured 3D mesh generation model that creates multiple parts and objects from a single RGB image.

18 runs

The image generation model tailored for local development and personal use

57 runs

MediaPipe Blendshape Labeler: predicts the blend shapes of an image.

210 runs

wan-video/wan-2.2 (all variants) + topazlabs/video-upscale + zsxkib/smart-thinksound

Wan 2.2 A14B image-to-video with MMaudio

Text-guided image editing model that preserves original details while making targeted modifications like lighting changes, object removal, and style conversion

64.3K runs

Official CLIP models: generate CLIP (clip-vit-large-patch14) text and image embeddings

90 runs

Granite-speech-3.3-8b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST).

733 runs

Automatically generates expert ThinkSound prompts by analyzing your video with Claude 4, so you don't have to struggle with complex audio descriptions

An opinionated text-to-image model from Black Forest Labs in collaboration with Krea that excels in photorealism. Creates images that avoid the oversaturated "AI look".

13.8K runs

Voxtral Small is an enhancement of Mistral Small 3 that incorporates state-of-the-art audio input capabilities and excels at speech transcription, translation and audio understanding.

22 runs

Make a realistic-looking real-world AI video via FLUX 1.1 Pro and Wan 2.2 i2v

Seed-X-PPO-7B by ByteDance-Seed, a powerful series of open-source multilingual translation language models

16 runs

Image-to-video at 720p and 480p with Wan 2.2 A14B

2.9K runs

Generate 6s videos with prompts or images. (Also known as Hailuo). Use a subject reference to make a video with a character and the S2V-01 model.

554.9K runs

Granite-3.3-8B-Instruct is an 8-billion-parameter, 128K-context-length language model fine-tuned for improved reasoning and instruction-following capabilities.

857K runs

Granite-vision-3.3-2b is a compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.

5.3K runs

Higgs Audio v2, a powerful text-to-speech audio foundation model that excels in expressive audio generation

693 runs

Run any ComfyUI workflow. Guide: https://github.com/replicate/cog-comfyui

6.3M runs

🎤The best open-source speech-to-text model as of Jul 2025, transcribing audio with a record 5.63% WER and enabling AI tasks like summarization directly from speech✨

43 runs

InScene is a LoRA by Peter O’Malley (POM) that's designed to generate images that maintain scene consistency with a source image. It is trained on top of Flux.1-Kontext.dev.

A video generation model that offers text-to-video and image-to-video support for 5s or 10s videos, at 480p and 720p resolution

237.6K runs

A pro version of Seedance that offers text-to-video and image-to-video support for 5s or 10s videos, at 480p and 1080p resolution

168.2K runs

Realistic inpainting with ControlNet (M-LSD + SEG)

556K runs

Edit an image with a prompt. This is the hidream-e1.1 model accelerated with the Pruna optimisation engine.

23.1K runs

MOSS-TTSD (text to spoken dialogue) is an open-source bilingual spoken dialogue synthesis model that supports both Chinese and English. It can transform dialogue scripts between two speakers into natural, expressive conversational speech.

51 runs

Generate an image using the previously generated image as the input, with a recursive prompt.

Turn satellite imagery into professional-quality aerial shots

"Zoom out" with this FLUX Kontext LoRA

Overlay one image over another to merge them

Accelerated variant of Photon prioritizing speed while maintaining quality

128.3K runs