Enhance LLM Prompt Quality and Results with MLflow Integration

I remain mostly an AI user, often without fully understanding the subtleties of LLMs. That does not prevent me from seeking to improve the results of my prompts.

AI: Moving From POC To Scaling

The title above summarizes the state I am in! The time of discovering AI and building POCs is over, but how do we enter this phase of rationalization and industrialization? Once the exploratory phase is done, what remains is to structure the approach and to measure the quality of the results produced by the prompts.

For this post too, you can find all the files and prompts on my GitHub account. See https://github.com/bflaven/ia_usages/tree/main/ia_using_mlflow

Just for the record, in true corporate fashion, I asked ChatGPT to explain the meaning more precisely. Here is its explanation:

The title “AI: Moving From POC To Scaling” describes a phase in AI project development where an AI solution that has successfully passed the Proof of Concept (POC) stage is now being prepared for scaling, meaning it will be expanded and deployed on a larger scale for widespread use. The transition requires careful planning and execution to ensure that the AI system can operate efficiently at a larger scale. This transition from POC to scaling is often seen as a critical milestone in AI development, where a project moves from testing feasibility to real-world implementation and growth.

Concretely, this doesn’t say much about what should be done… But I tried to understand how to take advantage of the latest “Prompt Engineering UI” feature of MLflow.

Why? Because this MLflow feature allows you to track LLM responses across different settings of model temperature, max output tokens and, of course, prompts. This makes prompt engineering and parameter tuning transparent… And above all, you can also leverage self-hosted LLMs served by Ollama!

Mostly, I use Ollama to operate open-source LLMs, in order to keep extensive control over the confidentiality of the content sent to the LLMs and to drastically reduce expenses.

I also ended up finding a video and plenty of documentation from MLflow.

Concretely, to make MLflow work with LLMs, you need to fill in MLflow’s “Served LLM model” list, so that you can test open-source or paid LLMs indifferently by declaring them within MLflow.

The main obstacle was adding items to MLflow’s “Served LLM model” dropdown list.

Screen captures from Daniel Liden (djliden) show how to leverage MLflow’s “Prompt Engineering UI” feature. Check https://github.com/djliden/llmops-examples/tree/main


# mlflow_prompt_eng_ui_assets/config.yaml
# https://github.com/djliden/llmops-examples/blob/00b42c7ec0f7e5914bf77966e84ddbfe02230e18/mlflow_prompt_eng_ui_assets/config.yaml
# https://mlflow.org/docs/latest/llms/gateway/migration.html
# https://github.com/minkj1992/llama3-langchain-mlflow/blob/main/mlflow/config.yaml

endpoints:  # Renamed to "endpoints"
  - name: chat
    endpoint_type: llm/v1/chat  # Renamed to "endpoint_type"
    model:
      provider: openai
      name: gpt-3.5-turbo
      config:
        openai_api_key: $OPENAI_API_KEY
  - name: ollama
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: llama3
      config:
        openai_api_key: ""
        # https://ollama.com/blog/openai-compatibility
        openai_api_base: http://host.docker.internal:11434/v1
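
Once the config.yaml is in place, the declared endpoints have to be served so that they appear in the “Served LLM model” dropdown. Below is a minimal sketch of the workflow as I understand it from the MLflow documentation; the port numbers and the “ollama” endpoint name are assumptions taken from the config above, and the exact commands may differ depending on your MLflow version.

# Minimal sketch, assuming MLflow 2.x with the deployments server and Ollama running locally.
#
# 1. Start the deployments server that reads config.yaml:
#      mlflow deployments start-server --config-path config.yaml --port 7000
# 2. Point the MLflow UI at it so the endpoints show up in "Served LLM model":
#      export MLFLOW_DEPLOYMENTS_TARGET=http://localhost:7000
#      mlflow server --port 5000
#
# 3. The same endpoints can also be queried directly from Python:
from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://localhost:7000")

response = client.predict(
    endpoint="ollama",  # the endpoint name declared in config.yaml
    inputs={
        "messages": [
            {"role": "user", "content": "Summarize this article in one sentence."}
        ],
        "temperature": 0.1,
        "max_tokens": 200,
    },
)
print(response)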

At the same time, it was necessary to become familiar with how MLflow works. Indeed, given MLflow’s ability to create many experiments and runs, it is better to think a priori about tagging, searching and classification, in order to take full advantage of the organization that MLflow provides.

Searching By Params

params.batch_size = "2"
params.model LIKE "GPT%"
params.model ILIKE "gPt%"
params.model LIKE "GPT%" AND params.batch_size = "2"

Searching By Tags

tags."environment" = "notebook"
tags.environment = "notebook"
tags.task = "Classification"
tags.task ILIKE "classif%"
params.model_route LIKE "%mistral-ollama%"
params.model_route LIKE "%gpt4o-azure%"
params.model_route LIKE "%openhermes-ollama%"
tags.nid LIKE "%MZ344252%"

Source: https://mlflow.org/docs/latest/search-runs.html
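
These filters can also be used programmatically. Here is a minimal sketch with the MLflow Python API; the experiment name “prompt-engineering” is just a placeholder to replace with your own.

# Minimal sketch: run the same kind of search from Python.
import mlflow

runs = mlflow.search_runs(
    experiment_names=["prompt-engineering"],  # hypothetical experiment name
    filter_string='params.model_route LIKE "%mistral-ollama%" AND tags.task = "Classification"',
)
# search_runs returns a pandas DataFrame, one row per matching run
print(runs[["run_id", "params.model_route", "tags.task"]])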

A good introduction to MLflow features

If you are looking for a complete and easy introduction to how MLflow works in general, here is a practical introduction to its essential concepts. Unfortunately, the “Prompt Engineering UI” feature is not covered by this video series.

This 32-video playlist, made by Manuel Gil, illustrates the main concepts of MLflow.

https://www.youtube.com/playlist?list=PLQqR_3C2fhUUkoXAcomOxcvfPwRn90U-g

# A. INSTALL MLFLOW 
# 1. create an anaconda env named using_mlflow

# Conda Environment
conda create --name using_mlflow python=3.9.13
conda info --envs
source activate using_mlflow
conda deactivate
source activate using_mlflow

# if needed to remove
conda env remove -n [NAME_OF_THE_CONDA_ENVIRONMENT]
conda env remove -n using_mlflow

# 2. Install MLflow from PyPI using pip:
pip install mlflow

# test the install
mlflow --version

# B. CREATING EXPERIMENTS IN MLFLOW
# launch the UI
mlflow ui

# Check http://127.0.0.1:5000
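
To have something to search and tag, here is a minimal sketch of creating an experiment and logging a run with the kind of params and tags used in the search examples above; the experiment name, params and metric are placeholders of my own.

# Minimal sketch: create an experiment and log a run against the local UI.
import mlflow

mlflow.set_tracking_uri("http://127.0.0.1:5000")  # the UI launched above
mlflow.set_experiment("prompt-engineering")       # created if it does not exist yet

with mlflow.start_run(run_name="llama3-summary-test"):
    mlflow.log_param("model_route", "mistral-ollama")
    mlflow.log_param("temperature", 0.1)
    mlflow.set_tag("task", "Classification")
    mlflow.set_tag("environment", "notebook")
    mlflow.log_metric("quality_score", 0.8)  # e.g. a manual rating of the answer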

Collateral discoveries

1. Using Pydantic to control LLM output
Along the way, another area of improvement is the use of Pydantic to validate the JSON output format of generative AI responses. Indeed, if you want to integrate such a JSON response into an API or a webapp, it is better to ensure its validity and consistency.

Source: https://medium.com/@mattchinnock/controlling-large-language-model-output-with-pydantic-74b2af5e79d1
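
To give an idea, here is a minimal sketch assuming Pydantic v2 and a prompt that asks the model to answer with a title, a summary and a list of keywords; with Pydantic v1 you would use parse_raw instead of model_validate_json.

# Minimal sketch: validate an LLM JSON answer against an expected schema.
from pydantic import BaseModel, ValidationError


class ArticleSummary(BaseModel):
    title: str
    summary: str
    keywords: list[str]


# imagine this string is the raw answer returned by the LLM
llm_output = '{"title": "MLflow and LLMs", "summary": "Tracking prompts.", "keywords": ["mlflow", "llm"]}'

try:
    parsed = ArticleSummary.model_validate_json(llm_output)
    print(parsed.title, parsed.keywords)
except ValidationError as err:
    # the answer does not match the expected schema: retry, repair or reject it
    print(err)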

2. Using CrewAI
During this exploration, I discovered CrewAI and also connected it to my Ollama-operated LLMs (see the sketch below).

Source: https://docs.crewai.com/

# for the examples available on GitHub, it was required to downgrade crewai to 0.10.0
# From crewai 0.51.1 to crewai 0.10.0
pip install crewai==0.10.0
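
As a minimal sketch, this is roughly how a CrewAI agent can be pointed at a local Ollama model through LangChain; the role, goal and the “openhermes” model are placeholders of mine, and the exact API may differ between CrewAI versions.

# Minimal sketch, assuming crewai==0.10.0, langchain_community installed
# and Ollama serving the "openhermes" model locally.
from langchain_community.llms import Ollama
from crewai import Agent, Task, Crew

ollama_llm = Ollama(model="openhermes", base_url="http://localhost:11434")

editor = Agent(
    role="Editor",
    goal="Summarize articles in three bullet points",
    backstory="You are a concise technical editor.",
    llm=ollama_llm,  # the self-hosted model instead of the default OpenAI one
    verbose=True,
)

task = Task(
    description="Summarize the benefits of tracking prompts with MLflow.",
    agent=editor,
)

crew = Crew(agents=[editor], tasks=[task])
print(crew.kickoff())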

3. AnythingLLM
It is worth mentioning the project “AnythingLLM”, which presents itself as an “all-in-one AI application that can do RAG, AI Agents, and much more with no code or infrastructure headaches”.

It can be installed in three different ways: in Desktop mode, via Docker, or via Homebrew.

Once installed in Desktop mode, for example, which nevertheless requires 5 GB of storage, you can connect it locally to Ollama via the AnythingLLM settings and leverage the self-hosted LLMs provided by Ollama, e.g. phi3.5, openhermes, mistral-openorca, zephyr, orca-mini, etc.

Source: https://docs.anythingllm.com/setup/llm-configuration/local/ollama

Source: https://anythingllm.com/

For the available models, check https://anythingllm.com/

More info

MLflow

AnythingLLM

crewAI

Pydantic

Other