TritonGPT Large Language Models (LLMs)

Last Updated: June 16, 2025 4:00:25 PM PDT

As commercial generative AI services continue to evolve, users are increasingly familiar with large language models (LLMs) from providers such as OpenAI. In response, the TritonGPT platform at UC San Diego now offers access to multiple LLMs, empowering users to choose the model best suited to their specific needs.

To maintain the highest standards of data protection, all models are deployed through secure, institutionally governed infrastructure that ensures compliance, privacy, and robust security controls.

Llama

The default model is Meta's Llama, a dialogue-optimized, open-source LLM hosted securely on-premises at the San Diego Supercomputer Center (SDSC).

By hosting the Llama model within SDSC's secure environment, UC San Diego retains full control over data usage and sharing. All processing occurs entirely within the university's infrastructure, ensuring institutional oversight and protection.

OpenAI Models

The OpenAI models are made available through Microsoft Azure. Data shared with the Azure OpenAI Service remains private and secure. Microsoft operates this service entirely within the Azure AI Foundry environment, ensuring that user inputs and outputs are not shared with OpenAI or any other external entities. Additionally, this data is never used to train or enhance any models. The Azure OpenAI Service operates independently and does not interact with OpenAI-operated services.

Models accessed via Azure AI Foundry are configured to maintain full data residency within the United States.

Commitment to Privacy and Compliance

By leveraging Azure's secure infrastructure for cloud models and SDSC's on-premises hosting for open-source models, UC San Diego ensures that all data privacy, security, and compliance requirements are fully met.

Model Summaries

Model	Description	Designed For
Llama Scout	A fast, lightweight model hosted within TritonGPT’s on-premises architecture. It provides low-latency responses and answers questions efficiently.	General-purpose chat, experimentation, and scenarios requiring minimal compute overhead.
GPT-4o (Omni)	GPT-4o enables multimodal interaction with strong performance across text, vision, and conversational workflows. Ideal for versatile tasks that benefit from rich dialogue and visual input.	Natural language interaction tuned for content generation, image-based reasoning, and broad administrative or creative tasks.
GPT-4.1	GPT-4.1 is optimized for complex tasks, long-context comprehension, and coding/software development.	Dialogue, code generation, and data-rich problem solving.