    Repositories list

    • Updated Aug 18, 2025
    • Python
      Updated Aug 14, 2025
    • A state-of-the-art model that segments and labels audio recordings by accurately distinguishing different speakers. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
      Python
      Updated Aug 14, 2025
    • facebook-bart-cnn

      Public template
      A variant of the BART model designed specifically for natural language summarization. It was pre-trained on a large corpus of English text and later fine-tuned on the CNN/Daily Mail dataset. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
      Python
      Updated Aug 14, 2025
    • Updated Aug 12, 2025
    • A 30.5B-parameter mixture-of-experts (MoE) model purpose-tuned for code generation and agentic tool use. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      Updated Aug 12, 2025
    • gpt-oss-20b

      Public template
      A 21B open‑weight language model (with ~3.6 billion active parameters per token) developed by OpenAI for reasoning, tool integration, and low‑latency usage. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      Updated Aug 6, 2025
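    Mixture-of-experts models like the two above keep total parameter count high while activating only a few experts per token, which is why a 21B model can run with ~3.6B active parameters. A toy top-k router sketches the idea — sizes and weights here are made up for illustration and are not these models' actual architecture:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    n_experts, top_k, d = 8, 2, 16  # toy sizes, far smaller than the real models
    router = rng.standard_normal((d, n_experts))      # router projection
    experts = rng.standard_normal((n_experts, d, d))  # one weight matrix per expert

    def moe_forward(x):
        """Route a single token vector x through its top-k experts only."""
        logits = x @ router
        top = np.argsort(logits)[-top_k:]  # indices of the k highest-scoring experts
        weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the selected
        # Only top_k of n_experts weight matrices are touched per token:
        # this is the "active" vs. total parameter distinction.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    y = moe_forward(rng.standard_normal(d))
    print(y.shape)  # (16,)
    ```

    Per token, only `top_k / n_experts` of the expert weights do any work, so compute scales with active parameters while capacity scales with the total.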
    • voxtral-mini-3b

      Public template
      A 3B-parameter audio-language model with speech transcription, translation, and audio-understanding capabilities. <metadata> gpu: A10 | collections: ["HF_Transformers"] </metadata>
      Python
      Updated Jul 30, 2025
    • kyutai-tts-1.6b

      Public template
      A 1.6B-parameter text-to-speech model that supports real-time streaming text input with ultra-low latency and voice-conditioning capabilities. <metadata> gpu: A10 | collections: ["HF_Transformers"] </metadata>
      Python
      Updated Jul 30, 2025
    • An 8B-parameter, instruction-tuned variant of Meta's Llama-3.1 model, optimized in GGUF format for efficient inference. <metadata> gpu: A100 | collections: ["llama.cpp"] </metadata>
      Python
      Updated Jul 17, 2025
    • A fast, optimized diffusion model that generates high-quality images from text prompts, ideal for creative visual content. <metadata> gpu: A100 | collections: ["Diffusers"] </metadata>
      Python
      Updated Jul 17, 2025
    • jina-embeddings-v4

      Public template
      A 3.8B-parameter multimodal, multilingual embedding model that unifies text and image understanding in a single late-interaction space and delivers both dense and multi-vector outputs. <metadata> gpu: A10 | collections: ["HF_Transformers"] </metadata>
      Python
      Updated Jul 13, 2025
    • flux-1-kontext-dev

      Public template
      A 12B-parameter model from Black Forest Labs for in-context image editing with character and style consistency, supporting iterative, instruction-guided edits. <metadata> gpu: A100 | collections: ["HF_Transformers"] </metadata>
      Python
      Updated Jul 13, 2025
    • gemma-3n-e4b-it

      Public template
      An 8B-parameter variant of the lightweight Gemma 3n series that operates with a 4B-parameter memory footprint, enabling full multimodal inference (text, image, audio, and video) on resource-constrained hardware. <metadata> gpu: A100 | collections: ["HF_Transformers"] </metadata>
      Python
      Updated Jul 13, 2025
    • qwen3-embedding-0.6b

      Public template
      A 600M-parameter embedding model covering 100 languages that turns inputs of up to 32k tokens into instruction-aware vectors. <metadata> gpu: A10 | collections: ["HF_Transformers"] </metadata>
      Python
      Updated Jun 23, 2025
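    Embedding models like the two above return dense vectors that are typically compared by cosine similarity. A minimal sketch with made-up 4-dimensional vectors (real models emit hundreds to thousands of dimensions):

    ```python
    import math

    def cosine_similarity(a, b):
        """Cosine of the angle between two equal-length vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Hypothetical embeddings for a query and two candidate passages.
    query = [0.1, 0.3, -0.2, 0.4]
    doc_a = [0.1, 0.28, -0.19, 0.41]  # near-duplicate of the query
    doc_b = [-0.4, 0.05, 0.3, -0.1]   # unrelated content

    print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # True
    ```

    Ranking candidates by this score is the core of retrieval with any of the embedding repos listed here, regardless of which model produced the vectors.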
    • devstral-small

      Public template
      An agentic LLM for software-engineering tasks that excels at using tools to explore codebases, edit multiple files, and power software-engineering agents. <metadata> gpu: A100 | collections: ["HF_Transformers"] </metadata>
      Python
      Updated Jun 23, 2025
    • deepseek-r1-qwen3-8b

      Public template
      A distilled 8B-parameter reasoning model that leverages deep chain-of-thought from DeepSeek R1-0528, delivering state-of-the-art open-source performance. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      Updated Jun 23, 2025
    • nanonets-ocr-s

      Public template
      Nanonets-OCR-s turns images or PDFs into structured Markdown, capturing tables, LaTeX, captions, and tags for fast, powerful, human-readable OCR. <metadata> gpu: A10 | collections: ["HF_Transformers"] </metadata>
      Python
      Updated Jun 23, 2025
    • Python
      Updated Jun 11, 2025
    • Python
      Updated May 20, 2025
    • kokoro

      Public template
      An 82M-parameter lightweight text-to-speech (TTS) model that delivers high-quality voice synthesis. <metadata> gpu: T4 | collections: ["SSE Events"] </metadata>
      Python
      Updated May 19, 2025
    • qwen3-14b

      Public template
      A 14B-parameter model with a hybrid approach to problem-solving, offering two distinct modes: a "thinking" mode that enables step-by-step reasoning and a "non-thinking" mode designed for rapid, general-purpose responses. <metadata> gpu: A100 | collections: ["vLLM"] </metadata>
      Python
      Updated May 15, 2025
    • qwen2.5-omni-7b

      Public template
      An advanced end-to-end multimodal model that processes text, image, audio, and video inputs, generating real-time text and natural-speech responses. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      Updated May 12, 2025
    • qwen3-8b

      Public template
      Qwen3-8B is a language model that supports seamless switching between a "thinking" mode for advanced math, coding, and logical inference, and a "non-thinking" mode for fast, natural conversation. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      Updated May 12, 2025
    • Python
      Updated Apr 30, 2025
    • A state-of-the-art multimodal foundation model from Microsoft Research that seamlessly fuses robust language understanding with advanced visual and audio analysis. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      Updated Apr 27, 2025
    • An 8B-parameter model that excels at producing high-quality, detailed images at up to 1-megapixel resolution. <metadata> gpu: A100 | collections: ["Diffusers"] </metadata>
      Python
      Updated Apr 21, 2025
    • phi-4-GGUF

      Public template
      A 14B model optimized in GGUF format for efficient inference, designed to excel in complex reasoning tasks. <metadata> gpu: A100 | collections: ["llama.cpp","GGUF"] </metadata>
      Python
      Updated Apr 19, 2025
    • A chat model fine-tuned from TinyLlama, a compact 1.1B-parameter Llama-style model pretrained on 3 trillion tokens. <metadata> gpu: T4 | collections: ["vLLM"] </metadata>
      Python
      Updated Apr 18, 2025
    • llama-2-13b-chat-hf

      Public template
      A 13B model fine-tuned with reinforcement learning from human feedback, part of Meta’s Llama 2 family for dialogue tasks. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>
      Python
      Updated Apr 18, 2025