Transform Your Ideas into Reality: Develop an Advanced LLM AI App with Mistral and ChatGPT’s Expert Guidance and Comprehensive Prompts

What should you do once you have identified the possible uses offered by AI? As a Product Owner, the risk is, once again, to pay lip service: describing needs but never really satisfying them!

For this post, you can find all files, mostly prompts, on my GitHub account. See https://github.com/bflaven/ia_usages_code_depot/tree/main/prompts/prompts_webapp_api_fmm_ia

Other elements, e.g. an extract in French from the project's README, are also available. See https://github.com/bflaven/ia_usages/tree/main/ia_building_llm_api_web_apps_start_finish

There is a French expression that summarizes this fate: “se payer de mots”, which could be translated as “to indulge in words”. You just talk, you never act. That is probably the meaning behind this quote from Laozi that shaped Chinese wisdom:

He who knows does not speak; he who speaks does not know. Laozi

So, I decided to move my a… but then, how to proceed?

  • First, what better way than using AI to understand and make AI?
  • Second, the real added value resides in the prompts.

For the moment, I have decided not to release the code, as I do not know its status regarding my professional situation, but with the help of the prompts it is like having an open book anyway. So, you will find in the repository the list of 50 prompts that it took me to code a “Webapp + API + LLM” device from scratch.

The Advanced LLM AI App relies on these elements: Mistral is used as the LLM, LangChain and Ollama as the frameworks to query the LLM, FastAPI as the framework to create the API, and Streamlit as the framework to create the web application.

The challenge was being able to query Mistral as an LLM in a secure, confidential and free way, while waiting to decide whether or not to implement a paid API key for Mistral or ChatGPT.
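To make this concrete, here is a minimal sketch, assuming Ollama is installed and running locally with the mistral model already pulled (ollama pull mistral), of what querying Mistral for free and confidentially looks like with LangChain; the prompt is purely illustrative.

# Minimal sketch: query a local Mistral through Ollama with LangChain
# Assumes `ollama serve` is running and `ollama pull mistral` has been done
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")  # local, free and confidential: nothing leaves the machine
answer = llm.invoke("Explain in one sentence what a Product Owner does.")
print(answer)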

Again, some Chinese wisdom, as Confucius said:

Give a man a fish and you feed him for a day, teach a man to fish and you feed him for a lifetime. Confucius

Instead of giving you a fish (coding), I teach you how to fish (prompting) 🙂

For security concerns, I do not give the code, as I risk putting myself at odds with my employer. On the other hand, I constantly need to refer to this knowledge base, so what is better than keeping the prompts and sharing them with myself and the rest of the world?

At the end of this post, I give some screen captures of the final result.

Workplan

So, humbly and within my means, I allowed myself to build a system to tackle the subject of AI in order to demystify its uses and its technicality.

As demonstrated by the previous posts, you must already be able to simply state the objective pursued based on what you already know and what you are trying to do.

Objective: the idea is to facilitate the exploration of AI uses via user interfaces, in order to obtain user feedback on the level of quality and acceptance of this or that AI functionality: transcription, entity extraction, translation, summary…

It requires creating interfaces that the user can grasp as quickly as possible. It means removing all obstacles upstream, in order to connect these screens to artificial intelligence as quickly as possible.

After some research, the idea that emerged was to set up, as quickly as possible, a complete system including the following building blocks: LLM + API + WEBAPP.

This is the meaning of the graph below, which I introduce into the equation in point 1.

[Diagram: the “LLM + API + WEBAPP” architecture]

What is amazing is that, with the sole help of AI, designing, coding… in short, prototyping everything took about one intense week, only one week, where it would probably have taken 6 months or even more.

1. The basic architecture

Well, this is the quickest and easiest way to apprehend AI.

Element | Framework | Functional Description
LLM | Ollama + Mistral | Leverages Ollama and Mistral for its functionality.
API | FastAPI | The API is built using FastAPI and delivers various AI features such as summary generation with BART, entity recognition with spaCy, transcription with Whisper, translation with NLLB, etc.
WEBAPP | Streamlit | The WEBAPP is built using Streamlit and provides a human interface for accessing the endpoints of the API.
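To illustrate how these three blocks talk to each other, here is a minimal, hypothetical sketch: a FastAPI endpoint that forwards a text to the local Mistral via Ollama, and the few Streamlit lines that call it. The endpoint name, port and prompt are illustrative assumptions, not the actual project code.

# api.py -- the API block (FastAPI) calling the LLM block (Ollama + Mistral)
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_community.llms import Ollama

app = FastAPI()
llm = Ollama(model="mistral")

class TextIn(BaseModel):
    text: str

@app.post("/llm_summary")
def llm_summary(payload: TextIn):
    # Forward the text to the local Mistral model and return its summary
    return {"summary": llm.invoke(f"Summarize this text in two sentences:\n{payload.text}")}

# webapp.py -- the WEBAPP block (Streamlit) calling the API
# import requests
# import streamlit as st
#
# text = st.text_area("Text to summarize")
# if st.button("Summarize"):
#     r = requests.post("http://localhost:8000/llm_summary", json={"text": text})
#     st.write(r.json()["summary"])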

2. And then? Introducing the usages

What functionalities should be included in this architecture, and how can we go beyond basic uses while, of course, always relying on the fundamentals according to the DRY principle?

I asked ChatGPT and it gave me a quick roadmap, along with a reminder of the best practices to stick to in order to avoid pitfalls; you can check below the regular stuff to do!

In this architecture, the functionalities to consider can revolve around enhancing user experience, optimizing performance, extending data analysis and processing capabilities, as well as ensuring security. Here are some suggestions for each element of the architecture and how to go beyond basic usage while adhering to the DRY (Don’t Repeat Yourself) principle:

LLM:

  • Improvement of accuracy and reliability: Integrate advanced machine learning techniques to enhance prediction quality.
  • Customization: Allow model customization to adapt to specific user needs.
  • Interoperability: Ensure seamless integration with other systems and external services.

API:

  • Scalability: Design the API to effectively handle growing workloads.
  • Comprehensive documentation: Provide detailed API documentation to facilitate its use by third-party developers.
  • Versioning: Implement an API versioning system to allow evolution while maintaining backward compatibility.

WEBAPP:

  • User-friendly interface: Design an intuitive and aesthetically pleasing user interface to enhance user experience.
  • Performance optimization: Optimize queries and loading times to ensure maximum responsiveness of the application.
  • Real-time analysis: Integrate real-time analysis features to enable users to visualize and understand data more quickly.

Going beyond basic usage while adhering to the DRY principle:

  • Modularity: Design each element of the architecture in a modular way to promote code reuse and easy maintenance.
  • Use of microservices: Decompose functionalities into independent microservices to enable scalable development and efficient scaling.
  • Automation: Implement continuous integration and continuous deployment pipelines to automate development, testing, and deployment processes.
  • Monitoring and logging: Integrate monitoring and logging tools to track system performance, detect errors, and proactively improve code quality.

By following these principles and integrating these functionalities, the architecture can be extended beyond basic usage while remaining compliant with the DRY principle, enabling the development of a robust, scalable, and easily maintainable system.

So with these principles, I stuffed my architecture with my use cases which mechanically drew a roadmap.

  • POC_1: A simple way to explore and interact with AI features through an API (FastAPI) connected to a WEBAPP (Streamlit).
  • POC_2: Some explorations on Chatbot building with 2 main objectives: 1. Ask a local LLM (Mistral) via a ChatBot, therefore for free and confidentially via Ollama. Of course, the system must be transposable to a paid model (ChatGPT, Mistral) and to the cloud (Azure); 2. Explore the notion of RAG (Retrieval-Augmented Generation). Functionally, this means building a knowledge base from heterogeneous content (.txt, .doc, .docx, .pdf… etc.) and then relying on an LLM like Mistral in order to query this knowledge base through a WEBAPP (Streamlit) that provides a ChatBot.
  • POC_3: Turning all these POCs into a reliable docker-compose that supports a backend (API made with FastAPI) and a frontend (a WEBAPP made with Streamlit).

For me, the highest step was obviously the addition of an LLM in this device, in order to be able to prompt in a confidential and secure manner. In case of failure, I would have opted for Buy instead of Build.

3. The RAG pack (POC_2)

Just a few words on the POC_2, on the RAG. I had never heard of this thing but, incidentally, once the basic architecture is set up, you can easily throw grappling hooks towards other uses, such as RAG for example.

For those who do not know what a RAG is, here is the most straightforward explanation that I found on the web, one that mostly focuses on user benefits.

The source is this post
“Retrieval-Augmented Generation (RAG): From Theory to LangChain Implementation”: https://towardsdatascience.com/retrieval-augmented-generation-rag-from-theory-to-langchain-implementation-4e9bd5f6a4f2

Retrieval-Augmented Generation (RAG) is the concept to provide LLMs with additional information from an external knowledge source. This allows them to generate more accurate and contextual answers while reducing hallucinations.

In simple terms, RAG is to LLMs what an open-book exam is to humans. In an open-book exam, students are allowed to bring reference materials, such as textbooks or notes, which they can use to look up relevant information to answer a question. The idea behind an open-book exam is that the test focuses on the students’ reasoning skills rather than their ability to memorize specific information.

On the RAG, you can find various prompts and code at https://github.com/bflaven/ia_usages_code_depot/tree/main/prompts/prompts_serie_conversational_chat
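To give an idea of what this looks like in code, here is a minimal, hypothetical RAG sketch with LangChain and Ollama: a local document is split into chunks, embedded, stored in a Chroma vector store, and the most relevant chunks are stuffed into the prompt sent to Mistral. The file path, chunk sizes and question are illustrative assumptions, not the project's actual settings.

# Minimal RAG sketch: index a local document, then ask Mistral with the retrieved context
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama

# 1. Build the knowledge base (a single .txt file here, the path is illustrative)
docs = TextLoader("knowledge_base/editorial_guidelines.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# 2. Embed the chunks locally and store them in a vector store
vectorstore = Chroma.from_documents(chunks, OllamaEmbeddings(model="mistral"))

# 3. Retrieve the chunks relevant to the question and pass them to Mistral as context
question = "What is the policy on quoting anonymous sources?"
context = "\n".join(doc.page_content for doc in vectorstore.similarity_search(question, k=3))

llm = Ollama(model="mistral")
print(llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}"))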

4. A small extra: using FLOWISEAI

During this exploration, I discovered FLOWISEAI and the great videos from @leonvanzyl.
FLOWISEAI is a very intuitive piece of software that can be used for many purposes, such as RAG for instance. It is UX-oriented and user-centric, enabling anyone to get to grips with AI concepts such as Prompt, RAG, Chatbot… FLOWISEAI is as great as Streamlit.

HOW TO INSTALL FLOWISEAI

# go to dir
cd /Users/brunoflaven/Documents/01_work/blog_articles/ia_using_flowiseai/

# create dir
mkdir flowiseai

# go to the dir flowiseai
cd flowiseai

# install flowise
npm install flowise

# install flowise globally
npm install -g flowise

# uninstall flowise globally (if you need to remove it later)
npm uninstall -g flowise

# start flowise
npx flowise start

# check the site
http://localhost:3000/

# Other documentation
# https://www.langchain.com/

5. Screen captures for “Webapp + API + LLM”

CONSOLE_SCREEN_1 for the API (fastapi)

CONSOLE_SCREEN_2 for the WEBAPP (streamlit)

CONSOLE_SCREEN_3 for the LLM (ollama + Mistral)

Easy-to-create documentation (Swagger) for this API (FastAPI), covering the various AI endpoints

Transcription using Whisper from OpenAI

Spelling correction of a text in FR. It is not AI, just an integration; 12 languages are supported. See the online spelling, style and grammar checker LanguageTool

Extraction of entities from a text in English with spaCy. There are 18 types of extracted entities.

Social and SEO: repackaging of content, with prompts, in a specific format for social networks. Based on a text, generation of a 140-character “SMO friendly” tweet with a proposal of 5 hashtags from the text; or, based on a text, generation of three editorial proposals for “SEO friendly” titles with a proposal of 5 keywords from the text. Functionality available via an LLM (Mistral).

Social and SEO: Ditto! The result is exported with the help of Pandas in CSV format

Summary: repackaging of content, with prompts, in a “Summary” format: creation of a long summary from a text in FR or EN, with selection of 5 significant keywords from the text, using an LLM (Ollama + Mistral). There are also three summary endpoints: one using BART, another using the BART model with ktrain…

List of the endpoints available through the API (FastAPI). You then just have to build the interface in the WEBAPP (Streamlit) and, of course, provide the AI feature behind each endpoint.

# tags_metadata
tags_metadata = [
    {
        'name': 'healthcheck',
        'description': 'TRUE. It basically sends a GET request to the route & hopes to get a "200"'
    },
    {
        'name': 'write',
        'description': 'TRUE. Write to the DB. Post inside table "source_content_posts"'
    },
    {
        'name': 'read',
        'description': 'TRUE. Get all content from the DB. Read from table "source_content_posts"'
    },
    {
        'name': 'spelling',
        'description': 'TRUE. Able to launch a spell checking on text. Read from field "text" from table "source_content_posts"'
    },
    {
        'name': 'entities',
        'description': 'TRUE. Extract entities from a text, only in ENGLISH, with spaCy (NER).'
    },
    {
        'name': 'translate',
        'description': 'Text to present translation with nllb-200-distilled-600M'
    },
    {
        'name': 'translate_languages',
        'description': 'Text to present languages for nllb-200-distilled-600M'
    },
    {
        'name': 'summary_bart',
        'description': 'TRUE. This endpoint generates a summary of the given text input using BART model.'
    },
    {
        'name': 'summary_bart_conditional_generation',
        'description': 'TRUE. This endpoint generates a summary of the given text input using BART model.'
    },
    {
        'name': 'summary_ktrain_transformer_summarizer',
        'description': 'TRUE. This endpoint accepts a text document and returns a summary generated by the BART model using ktrain.'
    },
    {
        'name': 'llm_generate',
        'description': 'TRUE. This endpoint accepts a text document and returns different elements depending on the prompt selected: an English SEO-friendly title with keywords, a French SEO-friendly title with keywords, a Twitter SMO-friendly post with hashtags. It leverages Mistral.'
    },
    {
        'name': 'llm_operate',
        'description': 'TRUE. This endpoint accepts a text document and proceeds to different operations, such as a summary of the text content. It leverages Mistral.'
    },
    {
        'name': 'audio',
        'description': 'This is the audio transcription with Whisper. It works in 70 languages.'
    },
    {
        'name': 'audio_express',
        'description': 'This is the audio transcription with faster_whisper and WhisperModel. It works in 70 languages.'
    },
    {
        'name': 'video',
        'description': 'This is the video transcription with Whisper. It works in 70 languages.'
    },
    {
        'name': 'video_express',
        'description': 'This is the video transcription with faster_whisper and WhisperModel. It works in 70 languages.'
    },
]
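
For context, a tags_metadata list like this one is typically handed to FastAPI through the openapi_tags argument, so that the Swagger documentation groups the endpoints by tag. Here is a minimal sketch of that wiring; the healthcheck route below is illustrative, not the project's actual code.

from fastapi import FastAPI

# The tags_metadata list above is passed to FastAPI so that Swagger
# groups the endpoints under each declared tag
app = FastAPI(title="Webapp + API + LLM", openapi_tags=tags_metadata)

@app.get("/healthcheck", tags=["healthcheck"])
def healthcheck():
    # Simple liveness probe: a GET on this route should return a 200
    return {"status": True}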

More info