ExLlama
ExLlama is a memory-efficient rewrite of the Hugging Face transformers implementation of LLaMA, built for use with quantized weights. It targets fast inference on modern GPUs with low memory usage, across a range of hardware configurations.
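To give a rough sense of why quantized weights reduce memory usage, here is a minimal back-of-the-envelope sketch. It is not part of ExLlama and does not use its API. The function name, the 7-billion-parameter model size, and the 16-bit and 4-bit widths are assumptions chosen for illustration. The estimate covers weights only and ignores quantization metadata, activations, and the key/value cache.

```python
def weight_memory_gib(n_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB.

    Hypothetical helper for illustration; ignores quantization metadata
    (scales, zero points), activations, and the key/value cache.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    n_params = 7_000_000_000  # assumed model size, not taken from the source
    for bits in (16, 4):      # assumed bit widths, for comparison only
        print(f"{bits}-bit weights: {weight_memory_gib(n_params, bits):.2f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint to a quarter. Real-world usage will be somewhat higher once the ignored terms are included.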