Enhance LLM Prompt Quality and Results with MLflow Integration
I remain mostly an AI user, and one who does not always grasp the subtleties of LLMs. That does not prevent me from trying to improve the results of my prompts.
AI: Moving From POC To Scaling
The title above sums up the state I am in! Indeed, the time of discovering AI and building POCs is over, but how do we enter this phase of rationalization and industrialization? Once the exploratory phase is done, what remains is to structure and rationalize the approach, and to measure the quality of the results produced by the prompts.
As with previous posts, you can find all files and prompts on my GitHub account. See https://github.com/bflaven/ia_usages/tree/main/ia_using_mlflow
Just for the record, I asked ChatGPT, in a corporate manner, to explain the meaning more precisely. Here is the explanation:
“The title ‘AI: Moving From POC To Scaling’ describes a phase in AI project development where an AI solution that has successfully passed the Proof of Concept (POC) stage is now being prepared for scaling, meaning it will be expanded and deployed on a larger scale for widespread use. The transition requires careful planning and execution to ensure that the AI system can operate efficiently at a larger scale. This transition from POC to scaling is often seen as a critical milestone in AI development, where a project moves from testing feasibility to real-world implementation and growth.”
Concretely, this doesn’t say much about what should be done… But I tried to understand how to take advantage of MLflow’s latest feature, the Prompt Engineering UI.
Why? Because this MLflow feature allows you to track LLM responses across different settings of model temperature, maximum output tokens and, of course, prompts. This makes prompt engineering and parameter tuning transparent… And above all, you can also leverage self-hosted LLMs on Ollama!
Mostly, I use Ollama to operate open-source LLMs, both to keep extensive control over the confidentiality of the content sent to the LLMs and to drastically reduce expenses.
I also ended up finding a video and plenty of documentation from MLflow.
- A very educational notebook that fully describes the process of “Comparing LLMs with MLFlow”. As the author puts it: “This notebook demonstrates how to use MLFlow to compare different text generation models from Hugging Face and compare different generation configurations for those models.”
- MLflow: serving LLMs and prompt engineering
- MLflow’s Support for LLMs, which is nothing less than an effort that “aims to alleviate these challenges by introducing a suite of features and tools designed with the end-user in mind”
Concretely, to make MLflow work with LLMs, you have to populate MLflow’s “Served LLM model” dropdown, so that you can test open-source or paid LLMs interchangeably by declaring them within MLflow.
The main obstacle was to add items to that “Served LLM model” dropdown list in MLflow.
Screen captures from Daniel Liden (djliden) show how to leverage MLflow’s Prompt Engineering UI feature. Check https://github.com/djliden/llmops-examples/tree/main
# mlflow_prompt_eng_ui_assets/config.yaml
# https://github.com/djliden/llmops-examples/blob/00b42c7ec0f7e5914bf77966e84ddbfe02230e18/mlflow_prompt_eng_ui_assets/config.yaml
# https://mlflow.org/docs/latest/llms/gateway/migration.html
# https://github.com/minkj1992/llama3-langchain-mlflow/blob/main/mlflow/config.yaml
endpoints: # Renamed to "endpoints"
  - name: chat
    endpoint_type: llm/v1/chat # Renamed to "endpoint_type"
    model:
      provider: openai
      name: gpt-3.5-turbo
      config:
        openai_api_key: $OPENAI_API_KEY
  - name: ollama
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: llama3
      config:
        openai_api_key: ""
        # https://ollama.com/blog/openai-compatibility
        openai_api_base: http://host.docker.internal:11434/v1
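To check that the endpoints declared above actually answer, here is a minimal sketch in Python, assuming MLflow 2.x with the deployments server started via “mlflow deployments start-server --config-path config.yaml --port 7000” (the port and the prompt are only illustrative):

# Minimal smoke test for an endpoint declared in config.yaml
from mlflow.deployments import get_deploy_client

# Point the client at the running MLflow deployments server (port is an assumption)
client = get_deploy_client("http://localhost:7000")

# "ollama" is one of the endpoint names declared in config.yaml above
response = client.predict(
    endpoint="ollama",
    inputs={"messages": [{"role": "user", "content": "Say hello in one word."}]},
)
print(response)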
At the same time, it was necessary to become familiar with how MLflow works. Indeed, given MLflow’s ability to create many experiments and runs, it is better to think ahead about its tagging and search capabilities, and therefore about how to classify things, in order to take full advantage of the organization that MLflow induces.
Searching By Params
params.batch_size = "2"
params.model LIKE "GPT%"
params.model ILIKE "gPt%"
params.model LIKE "GPT%" AND params.batch_size = "2"
Searching By Tags
tags."environment" = "notebook" tags.environment = "notebook" tags.task = "Classification" tags.task ILIKE "classif%"
params.model_route LIKE "%mistral-ollama%"
params.model_route LIKE "%gpt4o-azure%"
params.model_route LIKE "%openhermes-ollama%"
tags.nid LIKE "%MZ344252%"
Source: https://mlflow.org/docs/latest/search-runs.html
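The same filter strings can be used programmatically. A minimal sketch, assuming the MLflow 2.x Python API and a hypothetical experiment name:

import mlflow

# search_runs returns a pandas DataFrame of matching runs
# ("prompt-engineering" is an illustrative experiment name)
runs = mlflow.search_runs(
    experiment_names=["prompt-engineering"],
    filter_string='params.model_route LIKE "%mistral-ollama%" AND tags.nid LIKE "%MZ344252%"',
)
# The params.* columns appear only if those params were logged on matching runs
print(runs[["run_id", "params.model_route"]])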
A good introduction to MLflow features
If you are looking for a complete and easy introduction to how MLflow works in general, here is a practical one covering its essential concepts. Unfortunately, the Prompt Engineering UI feature is not covered by this video series.
This 32-video playlist, made by Manuel Gil, illustrates the main concepts of MLflow:
https://www.youtube.com/playlist?list=PLQqR_3C2fhUUkoXAcomOxcvfPwRn90U-g
# A. INSTALL MLFLOW
# 1. Create an Anaconda env named using_mlflow
conda create --name using_mlflow python=3.9.13
conda info --envs
source activate using_mlflow
conda deactivate
source activate using_mlflow

# If needed, remove the conda env
# conda env remove -n [NAME_OF_THE_CONDA_ENVIRONMENT]
conda env remove -n using_mlflow

# 2. Install MLflow from PyPI using pip
pip install mlflow
# Test the install
mlflow --version

# B. CREATING EXPERIMENTS IN MLFLOW
# Launch the UI
mlflow ui
# Check http://127.0.0.1:5000
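Once the UI is running, runs can be logged with the params and tags that the search queries shown earlier rely on. A minimal sketch, assuming the MLflow 2.x Python API (the experiment name, params and metric are purely illustrative):

import mlflow

mlflow.set_experiment("prompt-engineering")  # hypothetical experiment name

with mlflow.start_run():
    # Tags and params chosen to match the search examples shown earlier
    mlflow.set_tag("task", "Classification")
    mlflow.log_param("model_route", "mistral-ollama")
    mlflow.log_param("temperature", 0.7)
    mlflow.log_metric("quality_score", 0.85)  # illustrative metric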
Collateral discoveries
1. Using Pydantic to control LLM output
Along the way, another area of improvement is the use of Pydantic to validate the generative AI JSON output format. Indeed, if you want to integrate this JSON response into an API or a webapp, it is better to ensure the validity and consistency of this response.
Source: https://medium.com/@mattchinnock/controlling-large-language-model-output-with-pydantic-74b2af5e79d1
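A minimal sketch of the idea, assuming Pydantic v2 (the schema and the sample LLM output are purely illustrative):

from pydantic import BaseModel, ValidationError

# Illustrative schema for the JSON we expect the LLM to return
class ArticleSummary(BaseModel):
    title: str
    summary: str
    keywords: list[str]

llm_output = '{"title": "MLflow", "summary": "Tracking LLM prompts.", "keywords": ["mlflow", "llm"]}'

try:
    parsed = ArticleSummary.model_validate_json(llm_output)
    print(parsed.title)
except ValidationError as err:
    # Reject the response (or retry the generation) when the JSON
    # does not match the schema
    print(err)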
2. Using CrewAI
During this exploration, I discovered CrewAI and also connected it to the LLMs operated by my Ollama setup.
Source: https://docs.crewai.com/
# For the examples available on GitHub, it was required to downgrade crewai to 0.10.0
# From crewai 0.51.1 to crewai 0.10.0
pip install crewai==0.10.0
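Here is a minimal sketch of a single-agent crew backed by Ollama, assuming the crewai 0.10.0 API together with LangChain’s community Ollama wrapper (the role, goal, task and model name are only illustrative):

from crewai import Agent, Task, Crew
from langchain_community.llms import Ollama  # pip install langchain-community

# Point the agent at a self-hosted model served by Ollama (model name is illustrative)
llm = Ollama(model="openhermes")

researcher = Agent(
    role="Researcher",
    goal="Summarize a topic in three bullet points",
    backstory="You are a concise technical researcher.",
    llm=llm,
    verbose=True,
)

task = Task(
    description="Summarize what MLflow's Prompt Engineering UI does.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
print(crew.kickoff())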
3. AnythingLLM
It is worth mentioning the project “AnythingLLM”, which presents itself as an “all-in-one AI application that can do RAG, AI Agents, and much more with no code or infrastructure headaches”.
It can be installed in three different ways: in Desktop mode, via Docker, or via Homebrew.
Once installed in Desktop mode, for example, which nevertheless requires 5 GB of storage, you can connect locally to Ollama via the AnythingLLM settings and leverage the self-hosted LLMs provided by Ollama, e.g. phi3.5, openhermes, mistral-openorca, zephyr, orca-mini, etc.
Source: https://docs.anythingllm.com/setup/llm-configuration/local/ollama
Source: https://anythingllm.com/
For the models available, check https://anythingllm.com/
More info
MLflow
- MLFlow: A Quickstart Guide – YouTube
  https://www.youtube.com/watch?v=cjeCAoW83_U
- 01. Introduction To MLflow | Track Your Machine Learning Experiments | MLOps – YouTube
  https://www.youtube.com/watch?v=ksYIVDue8ak
- MLflow for Machine Learning Development – YouTube
  https://www.youtube.com/playlist?list=PLQqR_3C2fhUUkoXAcomOxcvfPwRn90U-g
- GitHub – manuelgilm/mlflow_for_ml_dev: Repository with code examples of mlflow
  https://github.com/manuelgilm/mlflow_for_ml_dev
- Advancements in Open Source LLM Tooling, Including MLflow – YouTube
  https://www.youtube.com/watch?v=WpudXKAZQNI
- MLflow LLM Evaluate
  https://mlflow.org/docs/latest/llms/llm-evaluate/index.html
- mlflow/examples/evaluation at master · mlflow/mlflow · GitHub
  https://github.com/mlflow/mlflow/tree/master/examples/evaluation
- Announcing MLflow 2.4 for LLMOps | Databricks Blog
  https://www.databricks.com/blog/announcing-mlflow-24-llmops-tools-robust-model-evaluation
- MLflow | LangChain
  https://python.langchain.com/v0.2/docs/integrations/providers/mlflow_tracking/
- GitHub – Netflix/metaflow: Build and manage real-life ML, AI, and data science projects with ease!
  https://github.com/Netflix/metaflow
- llmops-examples/compare-openai-transformers.ipynb at main · djliden/llmops-examples · GitHub
  https://github.com/djliden/llmops-examples/blob/main/compare-openai-transformers.ipynb
- llmops-examples/mlflow-prompt-eng-ui.ipynb at main · djliden/llmops-examples · GitHub
  https://github.com/djliden/llmops-examples/blob/main/mlflow-prompt-eng-ui.ipynb
- LLM_Notebooks/mlflow/Deployment_Server/mlflow_Serve.ipynb · olonok69/LLM_Notebooks · GitHub
  https://github.com/olonok69/LLM_Notebooks/blob/66d4b3a6d9d08813bb94ea653fd59275e66c91cc/mlflow/Deployment_Server/mlflow_Serve.ipynb#L23
- GitHub – djliden/llmops-examples: Example code and notebooks related to mlflow, llmops, etc.
  https://github.com/djliden/llmops-examples/tree/main
- Comparing LLMs with MLFlow | Medium
  https://medium.com/@dliden/comparing-llms-with-mlflow-1c69553718df
- LangChain within MLflow (Experimental)
  https://mlflow.org/docs/latest/llms/langchain/guide/index.html
- Evaluate a Hugging Face LLM with mlflow.evaluate()
  https://mlflow.org/docs/latest/llms/llm-evaluate/notebooks/huggingface-evaluation.html
- Prompt Engineering UI (Experimental)
  https://mlflow.org/docs/latest/llms/prompt-engineering/index.html
- Practical-Deep-Learning-at-Scale-with-MLFlow/chapter01 at main · PacktPublishing/Practical-Deep-Learning-at-Scale-with-MLFlow · GitHub
  https://github.com/PacktPublishing/Practical-Deep-Learning-at-Scale-with-MLFlow/tree/main/chapter01
- How to create a deep learning inference pipeline model using MLflow in three steps | by Yong Liu | Medium
  https://medium.com/@yong.liu_60428/how-to-create-a-deep-learning-inference-pipeline-model-using-mlflow-in-three-steps-a567c534d751
- MLflow Deployments for LLMs | LangChain
  https://python.langchain.com/v0.2/docs/integrations/providers/mlflow/
- MLflow: Streamline Machine Learning Workflow | DataCamp
  https://www.datacamp.com/tutorial/mlflow-streamline-machine-learning-workflow
- Exploring MLflow experiments with a powerful UI | by Gor Arakelyan | AimStack | Medium
  https://medium.com/aimstack/exploring-mlflow-experiments-with-a-powerful-ui-238fa2acf89e
- Quickstart: Install MLflow, instrument code & view results in minutes — MLflow 2.7.0 documentation
  https://mlflow.org/docs/2.7.0/quickstart.html
- Quickstart: Compare runs, choose a model, and deploy it to a REST API — MLflow 2.7.0 documentation
  https://mlflow.org/docs/2.7.0/quickstart_mlops.html#quickstart-mlops
- Tutorials and Examples — MLflow 2.7.0 documentation
  https://mlflow.org/docs/2.7.0/tutorials-and-examples/index.html#tutorials-and-examples
- Streamlining Text Classification Models with MLflow: A Comprehensive Guide | by Vasista Reddy | ScrapeHero | Medium
  https://medium.com/scrapehero/streamlining-text-classification-models-with-mlflow-a-comprehensive-guide-6cc3ce71ed90
- GitHub – adamksiezyk/data-science-workbench at mlflow-local-llm
  https://github.com/adamksiezyk/data-science-workbench/tree/mlflow-local-llm
- Evaluating & Tracking LLMs using MLflow Model Evaluation & Phoenix – part 2 | by M K Pavan Kumar | Medium
  https://medium.com/aimonks/evaluating-tracking-llms-using-mlflow-model-evaluation-phoenix-part-2-1830b3177abe
- Model Tracking with MLFlow & Deployment with FastAPI | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-4-tracking-with-mlflow-deployment-with-fastapi-61614115436
AnythingLLM
- AnythingLLM | The all-in-one AI application for everyone
  https://anythingllm.com/
crewAI
- crewAI
  https://docs.crewai.com/
- crewAI – Platform for Multi AI Agents Systems
  https://www.crewai.com/
- GitHub – brooklynb7/lang-ollama at 1c459019ab49414107a9f820cf1dd53750c3fa76
  https://github.com/brooklynb7/lang-ollama/tree/1c459019ab49414107a9f820cf1dd53750c3fa76
- CrewAI Tutorial – Next Generation AI Agent Teams (Fully Local) – YouTube
  https://www.youtube.com/watch?v=tnejrr-0a94
- GitHub – crewAIInc/crewAI: Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
  https://github.com/crewAIInc/crewAI
- Search for “from crewai import Agent, Task, Crew ollama” on GitHub
  https://github.com/search?q=from+crewai+import+Agent%2C+Task%2C+Crew+ollama&type=code
Pydantic
- Minimize LLM Hallucinations with Pydantic Validators | Pydantic
  https://pydantic.dev/articles/llm-validation
- Enforce and Validate LLM Output with Pydantic | Timo’s Blog
  https://timotk.github.io/posts/enforce-validate-llm-output-pydantic/
- Pydantic and Prompt Engineering: The Essentials for Validating Large Language Model Outputs | by Aziz Ben Othman | Medium
  https://medium.com/@azizbenothman76/pydantic-and-prompt-engineering-the-essentials-for-validating-language-model-outputs-e48553eb4a3b
- How to return structured data from a model | LangChain
  https://python.langchain.com/v0.2/docs/how_to/structured_output/
- Tutorial Overview — MLflow documentation
  https://mlflow.org/docs/latest/getting-started/logging-first-model/index.html
- A Gentle Introduction to MLOps | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-1-a-gentle-introduction-to-mlops-1b184d2c32a8
- Data & Model Management with DVC | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-2-data-model-management-with-dvc-6be2ad284ec4
- ML Experimentation using PyCaret | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-3-ml-experimentation-using-pycaret-747f14e4c28d
- Model Tracking with MLFlow & Deployment with FastAPI | Analytics Vidhya
  https://medium.com/analytics-vidhya/fundamentals-of-mlops-part-4-tracking-with-mlflow-deployment-with-fastapi-61614115436
Other
- GitHub – cric96/langchain-examples: Basic examples of prompt engineering leveraging langchain: https://www.langchain.com/
  https://github.com/cric96/langchain-examples
- Deploy a local LLM | RAGFlow
  https://ragflow.io/docs/dev/deploy_local_llm
- teknium/OpenHermes-2.5-Mistral-7B · Hugging Face
  https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B