Enhance LLM Prompt Quality and Results with MLflow Integration
I remain mostly an AI user, and one who does not always grasp the subtleties of LLMs. That does not prevent me from trying to improve the results of my prompts.
AI: Moving From POC To Scaling
The title above sums up the state I am in! Indeed, the time of discovering AI and building POCs is over, but how do we enter this phase of rationalization and industrialization? Once the exploratory phase is done, what remains is to structure and rationalize the approach, and to measure the quality of the results produced by the prompts.
As with previous posts, you can find all files and prompts on my GitHub account. See https://github.com/bflaven/ia_usages/tree/main/ia_using_mlflow
Just for the record, I asked ChatGPT, in a corporate manner, to explain the meaning more precisely. Here is the explanation:
“The title ‘AI: Moving From POC To Scaling’ describes a phase in AI project development where an AI solution that has successfully passed the Proof of Concept (POC) stage is now being prepared for scaling, meaning it will be expanded and deployed on a larger scale for widespread use. The transition requires careful planning and execution to ensure that the AI system can operate efficiently at a larger scale. This transition from POC to scaling is often seen as a critical milestone in AI development, where a project moves from testing feasibility to real-world implementation and growth.”
Concretely, this doesn’t say much about what should be done… But I tried to understand how to take advantage of MLflow’s latest feature, the Prompt Engineering UI.
Why? Because this MLflow feature allows you to track LLM responses across different settings of model temperature, maximum output tokens and, of course, prompts. This makes prompt engineering and parameter tuning transparent… And above all, you can also leverage self-hosted LLMs on Ollama!
Mostly, I use Ollama to operate open-source LLMs, both to keep extensive control over the confidentiality of the content sent to the LLMs and to drastically reduce expenses.
I also ended up finding a video and plenty of documentation from MLflow.
- A very educational notebook that fully describes the process of “Comparing LLMs with MLFlow”. As the author puts it: “This notebook demonstrates how to use MLFlow to compare different text generation models from Hugging Face and compare different generation configurations for those models.”
- MLflow: serving LLMs and prompt engineering
- MLflow’s Support for LLMs, which is nothing less than an effort that “aims to alleviate these challenges by introducing a suite of features and tools designed with the end-user in mind”
Concretely, to make MLflow work with LLMs, you have to populate MLflow’s “Served LLM model” dropdown, so that you can test open-source or paid LLMs interchangeably by declaring them within MLflow.
The main obstacle was to add items to that “Served LLM model” dropdown list in MLflow.
Screen captures from Daniel Liden (djliden) show how to leverage MLflow’s Prompt Engineering UI feature. Check https://github.com/djliden/llmops-examples/tree/main
# mlflow_prompt_eng_ui_assets/config.yaml
# https://github.com/djliden/llmops-examples/blob/00b42c7ec0f7e5914bf77966e84ddbfe02230e18/mlflow_prompt_eng_ui_assets/config.yaml
# https://mlflow.org/docs/latest/llms/gateway/migration.html
# https://github.com/minkj1992/llama3-langchain-mlflow/blob/main/mlflow/config.yaml
endpoints: # Renamed to "endpoints"
  - name: chat
    endpoint_type: llm/v1/chat # Renamed to "endpoint_type"
    model:
      provider: openai
      name: gpt-3.5-turbo
      config:
        openai_api_key: $OPENAI_API_KEY
  - name: ollama
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: llama3
      config:
        openai_api_key: ""
        # https://ollama.com/blog/openai-compatibility
        openai_api_base: http://host.docker.internal:11434/v1
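To check that the endpoints declared above actually answer, here is a minimal sketch in Python, assuming MLflow 2.x with the deployments server started via “mlflow deployments start-server --config-path config.yaml --port 7000” (the port and the prompt are only illustrative):

# Minimal smoke test for an endpoint declared in config.yaml
from mlflow.deployments import get_deploy_client

# Point the client at the running MLflow deployments server (port is an assumption)
client = get_deploy_client("http://localhost:7000")

# "ollama" is one of the endpoint names declared in config.yaml above
response = client.predict(
    endpoint="ollama",
    inputs={"messages": [{"role": "user", "content": "Say hello in one word."}]},
)
print(response)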
At the same time, it was necessary to become familiar with how MLflow works. Indeed, given MLflow’s ability to create many experiments and runs, it is better to think ahead about its tagging and search capabilities, and therefore about how to classify things, in order to take full advantage of the organization that MLflow induces.
Searching By Params
params.batch_size = "2"
params.model LIKE "GPT%"
params.model ILIKE "gPt%"
params.model LIKE "GPT%" AND params.batch_size = "2"
Searching By Tags
tags."environment" = "notebook" tags.environment = "notebook" tags.task = "Classification" tags.task ILIKE "classif%"
params.model_route LIKE "%mistral-ollama%"
params.model_route LIKE "%gpt4o-azure%"
params.model_route LIKE "%openhermes-ollama%"
tags.nid LIKE "%MZ344252%"
Source: https://mlflow.org/docs/latest/search-runs.html
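The same filter strings can be used programmatically. A minimal sketch, assuming the MLflow 2.x Python API and a hypothetical experiment name:

import mlflow

# search_runs returns a pandas DataFrame of matching runs
# ("prompt-engineering" is an illustrative experiment name)
runs = mlflow.search_runs(
    experiment_names=["prompt-engineering"],
    filter_string='params.model_route LIKE "%mistral-ollama%" AND tags.nid LIKE "%MZ344252%"',
)
# The params.* columns appear only if those params were logged on matching runs
print(runs[["run_id", "params.model_route"]])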
A good introduction to MLflow features
If you are looking for a complete and easy introduction to how MLflow works in general, here is a practical one covering its essential concepts. Unfortunately, the Prompt Engineering UI feature is not covered by this video series.
This 32-video playlist, made by Manuel Gil, illustrates the main concepts of MLflow:
https://www.youtube.com/playlist?list=PLQqR_3C2fhUUkoXAcomOxcvfPwRn90U-g
# A. INSTALL MLFLOW
# 1. Create an Anaconda env named using_mlflow
conda create --name using_mlflow python=3.9.13
conda info --envs
source activate using_mlflow
conda deactivate
source activate using_mlflow

# If needed, remove the conda env
# conda env remove -n [NAME_OF_THE_CONDA_ENVIRONMENT]
conda env remove -n using_mlflow

# 2. Install MLflow from PyPI using pip
pip install mlflow
# Test the install
mlflow --version

# B. CREATING EXPERIMENTS IN MLFLOW
# Launch the UI
mlflow ui
# Check http://127.0.0.1:5000
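Once the UI is running, runs can be logged with the params and tags that the search queries shown earlier rely on. A minimal sketch, assuming the MLflow 2.x Python API (the experiment name, params and metric are purely illustrative):

import mlflow

mlflow.set_experiment("prompt-engineering")  # hypothetical experiment name

with mlflow.start_run():
    # Tags and params chosen to match the search examples shown earlier
    mlflow.set_tag("task", "Classification")
    mlflow.log_param("model_route", "mistral-ollama")
    mlflow.log_param("temperature", 0.7)
    mlflow.log_metric("quality_score", 0.85)  # illustrative metric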
Collateral discoveries
1. Using Pydantic to control LLM output
Along the way, another area of improvement is the use of Pydantic to validate the generative AI JSON output format. Indeed, if you want to integrate this JSON response into an API or a webapp, it is better to ensure the validity and consistency of this response.
Source: https://medium.com/@mattchinnock/controlling-large-language-model-output-with-pydantic-74b2af5e79d1
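A minimal sketch of the idea, assuming Pydantic v2 (the schema and the sample LLM output are purely illustrative):

from pydantic import BaseModel, ValidationError

# Illustrative schema for the JSON we expect the LLM to return
class ArticleSummary(BaseModel):
    title: str
    summary: str
    keywords: list[str]

llm_output = '{"title": "MLflow", "summary": "Tracking LLM prompts.", "keywords": ["mlflow", "llm"]}'

try:
    parsed = ArticleSummary.model_validate_json(llm_output)
    print(parsed.title)
except ValidationError as err:
    # Reject the response (or retry the generation) when the JSON
    # does not match the schema
    print(err)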
2. Using CrewAI
During this exploration, I discovered CrewAI and also connected it to the LLMs operated by my Ollama setup.
Source: https://docs.crewai.com/
# For the examples available on GitHub, it was required to downgrade crewai to 0.10.0
# From crewai 0.51.1 to crewai 0.10.0
pip install crewai==0.10.0
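Here is a minimal sketch of a single-agent crew backed by Ollama, assuming the crewai 0.10.0 API together with LangChain’s community Ollama wrapper (the role, goal, task and model name are only illustrative):

from crewai import Agent, Task, Crew
from langchain_community.llms import Ollama  # pip install langchain-community

# Point the agent at a self-hosted model served by Ollama (model name is illustrative)
llm = Ollama(model="openhermes")

researcher = Agent(
    role="Researcher",
    goal="Summarize a topic in three bullet points",
    backstory="You are a concise technical researcher.",
    llm=llm,
    verbose=True,
)

task = Task(
    description="Summarize what MLflow's Prompt Engineering UI does.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
print(crew.kickoff())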
3. AnythingLLM
It is worth mentioning the project “AnythingLLM”, which presents itself as an “all-in-one AI application that can do RAG, AI Agents, and much more with no code or infrastructure headaches”.
It can be installed in three different ways: in Desktop mode, via Docker, or via Homebrew.
Once installed in Desktop mode, for example, which nevertheless requires 5 GB of storage, you can connect locally to Ollama via the AnythingLLM settings and leverage the self-hosted LLMs provided by Ollama, e.g. phi3.5, openhermes, mistral-openorca, zephyr, orca-mini, etc.
Source: https://docs.anythingllm.com/setup/llm-configuration/local/ollama
Source: https://anythingllm.com/
For the models available, check https://anythingllm.com/
More info
MLflow
- MLFlow: A Quickstart Guide – YouTube
  https://www.youtube.com/watch?v=cjeCAoW83_U
- 01. Introduction To MLflow | Track Your Machine Learning Experiments | MLOps – YouTube
  https://www.youtube.com/watch?v=ksYIVDue8ak
- MLflow for Machine Learning Development – YouTube
  https://www.youtube.com/playlist?list=PLQqR_3C2fhUUkoXAcomOxcvfPwRn90U-g
- GitHub – manuelgilm/mlflow_for_ml_dev: Repository with code examples of mlflow
  https://github.com/manuelgilm/mlflow_for_ml_dev
- Advancements in Open Source LLM Tooling, Including MLflow – YouTube
  https://www.youtube.com/watch?v=WpudXKAZQNI
- MLflow LLM Evaluate
  https://mlflow.org/docs/latest/llms/llm-evaluate/index.html
- mlflow/examples/evaluation at master · mlflow/mlflow · GitHub
  https://github.com/mlflow/mlflow/tree/master/examples/evaluation
- Announcing MLflow 2.4 for LLMOps | Databricks Blog
  https://www.databricks.com/blog/announcing-mlflow-24-llmops-tools-robust-model-evaluation
- MLflow | LangChain
  https://python.langchain.com/v0.2/docs/integrations/providers/mlflow_tracking/
- GitHub – Netflix/metaflow: Build and manage real-life ML, AI, and data science projects with ease!
  https://github.com/Netflix/metaflow
- llmops-examples/compare-openai-transformers.ipynb at main · djliden/llmops-examples · GitHub
  https://github.com/djliden/llmops-examples/blob/main/compare-openai-transformers.ipynb
- llmops-examples/mlflow-prompt-eng-ui.ipynb at main · djliden/llmops-examples · GitHub
  https://github.com/djliden/llmops-examples/blob/main/mlflow-prompt-eng-ui.ipynb
- LLM_Notebooks/mlflow/Deployment_Server/mlflow_Serve.ipynb · olonok69/LLM_Notebooks · GitHub
  https://github.com/olonok69/LLM_Notebooks/blob/66d4b3a6d9d08813bb94ea653fd59275e66c91cc/mlflow/Deployment_Server/mlflow_Serve.ipynb#L23
- GitHub – djliden/llmops-examples: Example code and notebooks related to mlflow, llmops, etc.
  https://github.com/djliden/llmops-examples/tree/main
- Comparing LLMs with MLFlow | Medium
  https://medium.com/@dliden/comparing-llms-with-mlflow-1c69553718df
- LangChain within MLflow (Experimental)
  https://mlflow.org/docs/latest/llms/langchain/guide/index.html
- Evaluate a Hugging Face LLM with mlflow.evaluate()
  https://mlflow.org/docs/latest/llms/llm-evaluate/notebooks/huggingface-evaluation.html
- Prompt Engineering UI (Experimental)
  https://mlflow.org/docs/latest/llms/prompt-engineering/index.html
- Practical-Deep-Learning-at-Scale-with-MLFlow/chapter01 at main · PacktPublishing/Practical-Deep-Learning-at-Scale-with-MLFlow · GitHub
  https://github.com/PacktPublishing/Practical-Deep-Learning-at-Scale-with-MLFlow/tree/main/chapter01
- How to create a deep learning inference pipeline model using MLflow in three steps | by Yong Liu | Medium
  https://medium.com/@yong.liu_60428/how-to-create-a-deep-learning-inference-pipeline-model-using-mlflow-in-three-steps-a567c534d751
- MLflow Deployments for LLMs | LangChain
  https://python.langchain.com/v0.2/docs/integrations/providers/mlflow/
- MLflow: Streamline Machine Learning Workflow | DataCamp
  https://www.datacamp.com/tutorial/mlflow-streamline-machine-learning-workflow
- Exploring MLflow experiments with a powerful UI | by Gor Arakelyan | AimStack | Medium
  https://medium.com/aimstack/exploring-mlflow-experiments-with-a-powerful-ui-238fa2acf89e
- Quickstart: Install MLflow, instrument code & view results in minutes — MLflow 2.7.0 documentation
  https://mlflow.org/docs/2.7.0/quickstart.html
- Quickstart: Compare runs, choose a model, and deploy it to a REST API — MLflow 2.7.0 documentation
  https://mlflow.org/docs/2.7.0/quickstart_mlops.html#quickstart-mlops
- Tutorials and Examples — MLflow 2.7.0 documentation
  https://mlflow.org/docs/2.7.0/tutorials-and-examples/index.html#tutorials-and-examples
- Streamlining Text Classification Models with MLflow: A Comprehensive Guide | by Vasista Reddy | ScrapeHero | Medium
  https://medium.com/scrapehero/streamlining-text-classification-models-with-mlflow-a-comprehensive-guide-6cc3ce71ed90
- GitHub – adamksiezyk/data-science-workbench at mlflow-local-llm
  https://github.com/adamksiezyk/data-science-workbench/tree/mlflow-local-llm
- Evaluating & Tracking LLMs using MLflow Model Evaluation & Phoenix – part 2 | by M K Pavan Kumar | Medium
  https://medium.com/aimonks/evaluating-tracking-llms-using-mlflow-model-evaluation-phoenix-part-2-1830b3177abe
- Model Tracking with MLFlow & Deployment with FastAPI | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-4-tracking-with-mlflow-deployment-with-fastapi-61614115436
AnythingLLM
- AnythingLLM | The all-in-one AI application for everyone
  https://anythingllm.com/
crewAI
- crewAI
  https://docs.crewai.com/
- crewAI – Platform for Multi AI Agents Systems
  https://www.crewai.com/
- GitHub – brooklynb7/lang-ollama at 1c459019ab49414107a9f820cf1dd53750c3fa76
  https://github.com/brooklynb7/lang-ollama/tree/1c459019ab49414107a9f820cf1dd53750c3fa76
- CrewAI Tutorial – Next Generation AI Agent Teams (Fully Local) – YouTube
  https://www.youtube.com/watch?v=tnejrr-0a94
- GitHub – crewAIInc/crewAI: Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
  https://github.com/crewAIInc/crewAI
- Search for “from crewai import Agent, Task, Crew ollama” on GitHub
  https://github.com/search?q=from+crewai+import+Agent%2C+Task%2C+Crew+ollama&type=code
Pydantic
- Minimize LLM Hallucinations with Pydantic Validators | Pydantic
  https://pydantic.dev/articles/llm-validation
- Enforce and Validate LLM Output with Pydantic | Timo’s Blog
  https://timotk.github.io/posts/enforce-validate-llm-output-pydantic/
- Pydantic and Prompt Engineering: The Essentials for Validating Large Language Model Outputs | by Aziz Ben Othman | Medium
  https://medium.com/@azizbenothman76/pydantic-and-prompt-engineering-the-essentials-for-validating-language-model-outputs-e48553eb4a3b
- How to return structured data from a model | LangChain
  https://python.langchain.com/v0.2/docs/how_to/structured_output/
- Tutorial Overview — MLflow documentation
  https://mlflow.org/docs/latest/getting-started/logging-first-model/index.html
- A Gentle Introduction to MLOps | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-1-a-gentle-introduction-to-mlops-1b184d2c32a8
- Data & Model Management with DVC | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-2-data-model-management-with-dvc-6be2ad284ec4
- ML Experimentation using PyCaret | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-3-ml-experimentation-using-pycaret-747f14e4c28d
- Model Tracking with MLFlow & Deployment with FastAPI | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-4-tracking-with-mlflow-deployment-with-fastapi-61614115436
Other
- GitHub – cric96/langchain-examples: Basic examples of prompt engineering leveraging langchain: https://www.langchain.com/
  https://github.com/cric96/langchain-examples
- Deploy a local LLM | RAGFlow
  https://ragflow.io/docs/dev/deploy_local_llm
- teknium/OpenHermes-2.5-Mistral-7B · Hugging Face
  https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B