arize-phoenix-evals

LLM Evaluations


Keywords
Explainability, Monitoring, Observability, ai-monitoring, ai-observability, ai-roi, aiengineering, datasets, hacktoberfest, llm-eval, llmops, ml-observability, mlops, model-observability
License
Elastic-2.0
Install
pip install arize-phoenix-evals==0.13.0

Documentation

Phoenix is an open-source AI observability platform designed for experimentation, evaluation, and troubleshooting. It provides:

  • Tracing - Trace your LLM application's runtime using OpenTelemetry-based instrumentation.
  • Evaluation - Leverage LLMs to benchmark your application's performance using response and retrieval evals.
  • Datasets - Create versioned datasets of examples for experimentation, evaluation, and fine-tuning.
  • Experiments - Track and evaluate changes to prompts, LLMs, and retrieval.
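
To make the Evaluation bullet more concrete, here is a minimal sketch of the general LLM-as-judge pattern: fill a prompt template per row, ask a judge model, and snap the free-form reply onto a fixed label set. It is an illustration under stated assumptions, not the Phoenix evals API. The names `TEMPLATE`, `RAILS`, `snap_to_rail`, `classify`, and `stub_judge` are hypothetical, and the judge is a keyword-matching stub standing in for a real LLM call so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template and label set ("rails"); not taken from Phoenix.
TEMPLATE = (
    "Question: {question}\n"
    "Document: {document}\n"
    "Is the document relevant to the question? Answer 'relevant' or 'unrelated'."
)
RAILS = ["relevant", "unrelated"]


def snap_to_rail(raw: str, rails: List[str], default: str = "NOT_PARSABLE") -> str:
    """Map free-form judge output onto exactly one allowed label."""
    text = raw.strip().lower()
    matches = [rail for rail in rails if rail in text]
    # Zero labels or several labels in the reply means the answer is ambiguous.
    return matches[0] if len(matches) == 1 else default


def classify(rows: List[Dict[str, str]], judge: Callable[[str], str]) -> List[str]:
    """Fill the template for each row, ask the judge, and snap the reply to a rail."""
    return [snap_to_rail(judge(TEMPLATE.format(**row)), RAILS) for row in rows]


def stub_judge(prompt: str) -> str:
    """Stand-in for an LLM call: a word-overlap check on the first two prompt lines."""
    lines = prompt.splitlines()
    question_words = set(lines[0].lower().replace("?", "").split()[1:])
    document_words = set(lines[1].lower().replace(".", "").split()[1:])
    overlap = (question_words & document_words) - {"the", "a", "is", "of", "what"}
    return "Relevant." if overlap else "Unrelated."


rows = [
    {"question": "What is Phoenix tracing?", "document": "Phoenix tracing uses OpenTelemetry."},
    {"question": "What is Phoenix tracing?", "document": "Bananas are yellow."},
]
labels = classify(rows, stub_judge)
print(labels)
```

Restricting the output to a small label set keeps results comparable across rows and makes unparsable replies explicit instead of silently miscounted. In practice the stub would be replaced by a call to an actual model.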

Phoenix is vendor and language agnostic with out-of-the-box support for popular frameworks (🦙LlamaIndex, 🦜⛓LangChain, Haystack, 🧩DSPy) and LLM providers (OpenAI, Bedrock, MistralAI, VertexAI, LiteLLM, and more). For details on auto-instrumentation, check out the OpenInference project.

Phoenix runs practically anywhere, including your Jupyter notebook, local machine, containerized deployment, or in the cloud.

Installation

Install Phoenix via pip or conda. For example, with pip:

pip install arize-phoenix

Phoenix container images are available via Docker Hub and can be deployed using Docker or Kubernetes.

Features

Key Features              Availability
Tracing                   ✅
Evaluation                ✅
Retrieval (RAG) Analysis  ✅
Datasets                  ✅
Fine-Tuning Export        ✅
Annotations               ✅
Human Feedback            ✅
Experiments               ✅
Embeddings Analysis       ✅
Data Export               ✅
REST API                  ✅
GraphQL API               ✅
Data Retention            Customizable
Authentication            ✅
Social Login              ✅
RBAC                      ✅
Projects                  ✅
Self-Hosting              ✅
Jupyter Notebooks         ✅
Sessions                  In Progress 🚧
Prompt Playground         In Progress 🚧
Prompt Management         Coming soon ⏱️

Tracing Integrations

Phoenix is built on top of OpenTelemetry and is vendor, language, and framework agnostic.

Python

Integration   Package
OpenAI        openinference-instrumentation-openai
LlamaIndex    openinference-instrumentation-llama-index
DSPy          openinference-instrumentation-dspy
AWS Bedrock   openinference-instrumentation-bedrock
LangChain     openinference-instrumentation-langchain
MistralAI     openinference-instrumentation-mistralai
Guardrails    openinference-instrumentation-guardrails
VertexAI      openinference-instrumentation-vertexai
CrewAI        openinference-instrumentation-crewai
Haystack      openinference-instrumentation-haystack
LiteLLM       openinference-instrumentation-litellm
Groq          openinference-instrumentation-groq
Instructor    openinference-instrumentation-instructor
Anthropic     openinference-instrumentation-anthropic
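
As a rough illustration of how one of the Python packages above is wired up, the sketch below enables tracing for the OpenAI client. It assumes that `arize-phoenix-otel` (which provides `phoenix.otel.register`) and `openinference-instrumentation-openai` are installed and that a Phoenix instance is reachable at its default local endpoint. The wrapper `enable_openai_tracing` and the project name `my-llm-app` are hypothetical, and the import guard is only there so the snippet degrades gracefully when the optional packages are absent.

```python
def enable_openai_tracing(project_name: str = "my-llm-app") -> bool:
    """Send OpenAI client calls to Phoenix as OpenTelemetry spans.

    Returns False instead of raising when the optional packages are missing.
    """
    try:
        from phoenix.otel import register
        from openinference.instrumentation.openai import OpenAIInstrumentor
    except ImportError:
        return False

    # register() builds an OpenTelemetry TracerProvider that exports to Phoenix.
    tracer_provider = register(project_name=project_name)
    # instrument() patches the OpenAI client so each request emits a span.
    OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
    return True


tracing_enabled = enable_openai_tracing()
print("tracing enabled:", tracing_enabled)
```

The other integrations in the table are expected to follow the same shape, with a different instrumentor class per package; check each package's documentation for the exact import path.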

JavaScript

Integration    Package
OpenAI         @arizeai/openinference-instrumentation-openai
LangChain.js   @arizeai/openinference-instrumentation-langchain
Vercel AI SDK  @arizeai/openinference-vercel

For details about tracing integrations and example applications, see the OpenInference project.

Community

Join our community to connect with thousands of AI builders.

  • 🌍 Join our Slack community.
  • 📚 Read our documentation.
  • 💡 Ask questions and provide feedback in the #phoenix-support channel.
  • 🌟 Leave a star on our GitHub.
  • 🐞 Report bugs with GitHub Issues.
  • 𝕏 Follow us on 𝕏.
  • 💌 Sign up for our mailing list.
  • 🗺️ Check out our roadmap to see where we're heading next.

Breaking Changes

See the migration guide for a list of breaking changes.

Copyright, Patent, and License

Copyright 2024 Arize AI, Inc. All Rights Reserved.

Portions of this code are patent protected by one or more U.S. Patents. See IP_NOTICE.

This software is licensed under the terms of the Elastic License 2.0 (ELv2). See LICENSE.