Available models
Text generation models
Models in this category specialize in generating coherent and contextually relevant text across a wide variety of topics and formats. They are designed to understand and produce written language, making them ideal for applications ranging from content creation to conversation simulation.
- GPT-3.5 Turbo: Powered by OpenAI, this is a faster variant of the original GPT-3.5 model, designed for quick text generation while maintaining quality across diverse domains.
- GPT-4: The fourth iteration of the Generative Pre-trained Transformer, known for its deep comprehension and versatile text generation across numerous contexts.
- GPT-4 Turbo: A turbocharged variant of GPT-4 designed for high-speed text generation, combining GPT-4's advanced understanding with improved processing speed.
- Gemini Pro: A scalable, general-purpose AI model developed by Google, designed to handle a wide variety of tasks. It outperforms other models of similar size on research benchmarks and offers function calling, embeddings, semantic retrieval, custom knowledge grounding, and chat functionality.
- PaLM 2 (legacy): Google's advanced large language model that contributed to generative AI capabilities across various Google technologies.
- Llama-2: A next-generation large language model from Meta, focused on generative AI applications.
- Claude v2.1: Developed by Anthropic, this advanced language model features a 200K token context window, reduced hallucination rates, and heightened accuracy. It's adept at processing extensive content for a range of applications, including business and legal document analysis.
- Mistral 7B Instruct: A 7.3 billion parameter language model by Mistral AI, fine-tuned for conversational and question-answering tasks. Designed for real-time applications, it delivers adaptability and efficiency across a variety of tasks.
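As a rough sketch of how a text generation model is typically called, the snippet below uses the OpenAI Python SDK against an OpenAI-compatible chat completions endpoint. The base URL, API key, and exact model identifier are placeholders, so substitute the values from your provider:

```python
from openai import OpenAI

# Placeholder base URL and key; substitute your provider's actual values.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or another text generation model from the list above
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a context window is in one paragraph."},
    ],
)
print(response.choices[0].message.content)
```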
Image generation models
These models are capable of creating detailed and diverse visual content from textual descriptions. They combine elements of creativity and artificial intelligence to produce images that match specified prompts, useful in design, art, and visual content generation.
- DALL-E 2: An AI model by OpenAI capable of generating complex images from textual descriptions, known for its creativity and high fidelity outputs.
- DALL-E 3: The successor to DALL-E 2, improving upon its predecessor with enhanced image quality and generation capabilities.
- OpenJourney: A versatile image generation model designed to create detailed and diverse visual content based on textual prompts.
- Stable Diffusion 2.1: A model that excels in generating high-quality images from textual descriptions, known for its speed and versatility.
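A minimal sketch of text-to-image generation through an OpenAI-style images endpoint; the endpoint, key, and model identifier below are illustrative placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Generate one image from a text prompt and print its URL.
result = client.images.generate(
    model="dall-e-3",  # or another image generation model from the list above
    prompt="A watercolor painting of a lighthouse at dawn",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)
```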
Voice to text models
Voice to text models are engineered to accurately transcribe spoken language into written text. They are adept at understanding various languages, accents, and dialects, making them essential for dictation, accessibility features, and voice-controlled applications.
- Whisper: An advanced speech recognition model from OpenAI, designed to accurately transcribe spoken language into text, with strong performance across multiple languages and dialects.
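A minimal transcription sketch, assuming the provider exposes Whisper through the standard OpenAI audio transcriptions endpoint; the file name and credentials are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Transcribe a local audio file into text.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```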
Text to voice models
Models in this category convert written text into natural-sounding spoken language. They are utilized in applications that require speech synthesis, such as virtual assistants, audiobooks, and spoken content for the visually impaired.
- TTS: An OpenAI API that generates high-quality spoken audio from text, featuring six preset voices and variants optimized for real-time use or enhanced quality. It offers streaming capabilities and supports inputs of up to 4096 characters.
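A speech synthesis sketch along the same lines, assuming the OpenAI-style audio speech endpoint; the voice name and output path are illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Synthesize spoken audio from text (inputs are capped at 4096 characters).
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # one of the six preset voices
    input="Welcome! This audio clip was generated from text.",
)
with open("welcome.mp3", "wb") as f:
    f.write(speech.content)  # raw audio bytes (MP3 by default)
```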
Image recognition models
These models analyze and interpret visual information from images or video streams. They can identify objects, scenes, and activities, making them useful for tasks such as image captioning, surveillance, and automated content categorization.
- GPT2 image captioning: A basic AI model for generating textual descriptions of uploaded images, pairing a visual encoder with GPT-2's language generation capabilities.
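For reference, a common open-source realization of GPT-2-based captioning pairs a ViT image encoder with a GPT-2 decoder. The Hugging Face checkpoint below is one such implementation and is an assumption, not necessarily the exact model served here:

```python
from transformers import pipeline

# ViT encoder + GPT-2 decoder checkpoint; an assumed example, not necessarily
# the exact model behind this service.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

result = captioner("photo.jpg")  # accepts a local path or an image URL
print(result[0]["generated_text"])
```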
Multimodal models
Multimodal models can process and generate content that spans multiple types of input and output, such as text and images. Their versatility allows them to understand and create rich multimedia content, serving applications that require a combination of visual and textual comprehension.
- GPT-4 Turbo Vision: Combines the text generation prowess of GPT-4 Turbo with visual understanding, enabling it to process and generate content based on both text and images.
- Gemini Pro Vision: A Google-developed multimodal AI model capable of processing text, images, video, audio, and code, optimized for a wide range of multimodal tasks.
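A sketch of a multimodal request, assuming OpenAI's vision message format (text plus an image URL in one user message); the model identifier and image URL are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Send text and an image together in a single user message.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder ID for a vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```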
Assistants
Models in this category are packaged as ready-made virtual assistants, holding multi-turn conversations and helping users complete tasks.
- GPT-4 Turbo Assistant: A specialized version of GPT-4 Turbo designed to act as a virtual assistant, providing high-speed responses to queries across a wide range of topics.