Unraveling the Cost of AI: The Hidden Expenses of API Keys and Pay-as-You-Go Pricing in AI-Based Products
“After all, we are not Communists,” says Emilio “The Wolf” Barzini in The Godfather*, and indeed, despite the virtuous storytelling, AI is here to make money, and to make a lot of it…
For this post, you can find all files, mostly prompts, on my GitHub account. See https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm
* I watched Coppola’s The Godfather once again; in addition to being a great film about the Italian-American mafia, it is a great lesson on capitalism and team management!
More seriously, as the PO of an AI-based product, the question of price quickly arises. I am not talking about development costs but rather about the price of an API key and its use via prompts. Indeed, AI companies are not philanthropists, and their economic models are based on addiction. If you do not pay attention, the pay-as-you-go pricing system can quickly become “poisonous”: the more you outsource tasks to the AI, the more you pay.
For the last week, I have been working with ChatGPT and Mistral API keys, so I was forced to scrutinize the pricing pages for input and output according to the models. Here are the resources:
- The precise prices of the ChatGPT API: https://openai.com/api/pricing
- The precise prices of the Mistral API (see the “Pay as you Go” section): https://mistral.ai/fr/technology/#models
First, I suck at Excel, so I would rather go with Python. Because I am lazy, I quickly searched for explanations of the ChatGPT and Mistral pricing. I found good resources, especially this post: “Reduce Your OpenAI API Costs by 70%” at https://levelup.gitconnected.com/reduce-your-openai-api-costs-by-70-a9f123ce55a6
This post introduces a fruitful correlation between cost efficiency and prompt design patterns!
Some tips on Cost efficiency & Prompt Design Patterns
Here are the notions, quickly summarized, that I kept from this post. The very first thing you need to know is that, when you connect to an LLM via an API key, you are billed when sending content to the LLM (input) and when receiving content from the LLM (output).
1. Number of tokens for a prompt
For input, even if you send a very short prompt, additional tokens will be sent within the request.
Notice that prompt_tokens, which refers to the input tokens, is 8 even though the message itself is a single token. This is because every time you send text to the API, an additional 7 tokens are automatically added by the chat message formatting.
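To see these numbers yourself, here is a minimal sketch, assuming the official openai Python client (v1+) and an OPENAI_API_KEY set in the environment:

# Send a very short prompt and inspect the billed token usage.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)

# "hello" is one token, but the chat format wraps each message in
# special tokens, hence the extra input tokens billed on top.
print(response.usage.prompt_tokens)      # e.g. 8
print(response.usage.completion_tokens)  # tokens generated by the model
print(response.usage.total_tokens)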
A good way to count tokens for ChatGPT is to use this tool, which calculates the number of tokens in a text: https://www.tokencounter.io/
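If you prefer to count tokens locally, without paying for an API call, OpenAI’s tiktoken library (listed in the resources below) does the same job offline:

# Count tokens locally with tiktoken; nothing is sent to the API.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "Unraveling the cost of AI, one token at a time."
print(len(encoding.encode(text)))  # number of input tokens this text costs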
For an LLM, the key element is the token, so the first thing you need to know is what a token is: the basic unit that is “input” and “output” by the LLM. Here are some rough equivalences for English text:
# some examples
1 token ~= 4 chars in English
1 token ~= ¾ words
100 tokens ~= 75 words
1-2 sentence(s) ~= 30 tokens
1 paragraph ~= 100 tokens
1,500 words ~= 2048 tokens
2,000 words ~= 2730 tokens
Simple Token definition
You can think of tokens as pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.
Source: https://platform.openai.com/tokenizer
2. Number of tokens for completion
You will also pay for what is produced by the LLM, in the form of output tokens.
The completion_tokens value is what we expected: it represents the tokens in the response generated by the model.
See 001_reduce_api_costs.py, 002_reduce_api_costs.py, 003_reduce_api_costs.py at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/reduce_api_costs
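To make the input/output billing concrete, here is a small estimator sketch. The per-million-token prices below are placeholders for illustration only; take the real values from the OpenAI and Mistral pricing pages linked above.

# Hypothetical prices (dollars per million tokens), for illustration
# only; check https://openai.com/api/pricing for current values.
PRICES_PER_MILLION = {
    "cheap-model": {"input": 0.50, "output": 1.50},
    "premium-model": {"input": 10.00, "output": 30.00},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost in dollars of a single API call."""
    price = PRICES_PER_MILLION[model]
    return (prompt_tokens * price["input"]
            + completion_tokens * price["output"]) / 1_000_000

# A call sending 8 tokens and receiving 200 tokens back:
print(f"${estimate_cost('cheap-model', 8, 200):.6f}")        # $0.000304
# Shakespeare's ~1.2M tokens as pure input on the cheap model:
print(f"${estimate_cost('cheap-model', 1_200_000, 0):.2f}")  # $0.60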
3. Clustering using OpenAI API
Clustering is a great idea that must be explored if you intend to leverage an LLM API key.
Here is a quick definition of the user need.
Imagine you have a huge list of news headlines and you want to cluster them. While using embeddings is one option, let’s say you want to use OpenAI language models API, as it can capture meaning in a human-like way. Before we create the prompt template to minimize costs, let’s look at what our news headlines list looks like. I’ve represented the news headlines in a shorter form, like s0, s1, s2, and so on. The reason for this is that when the language model clusters them, it can simply use these short abbreviations (e.g., s35 for the 35th news headline) instead of writing out the entire headline in each cluster.
Here is also a simple explanation of the expectation.
Next, I defined my prompt template. This template specifies the format of the answer I want the language model to provide, along with some additional information for clarity. The key here is that we’re not asking the model to write out the full headlines, but rather to just use the short abbreviations. All we need to do is pass this prompt template to the function we created earlier and see how it performs in terms of pricing and response quality.
See 003_reduce_api_costs.py at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/reduce_api_costs
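To illustrate the abbreviation trick, here is my own reconstruction of such a prompt template (the actual template in 003_reduce_api_costs.py may differ):

# Build the s0, s1, ... ids so the model never has to repeat the
# full headlines in its (billed) output.
headlines = [
    "Stock markets rally after surprise rate cut",
    "New exoplanet discovered by space telescope",
    "Central bank hints at further easing",
]

numbered = "\n".join(f"s{i}: {h}" for i, h in enumerate(headlines))

prompt = f"""Cluster the following news headlines by topic.
Refer to each headline ONLY by its short id (s0, s1, ...).
Answer format: one line per cluster, e.g. topic_name: s0, s2

{numbered}"""

print(prompt)  # pass this as the user message to the chat API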
4. SpellCheck using OpenAI API
Prompts enable you to do anything from summarizing a text to coding a script, so why not use an LLM to correct spelling mistakes? The only drawback: the experiment was made in English only, and I am not sure that the performance in correcting spelling mistakes is equivalent in every language.
Let’s say you have a lengthy text document and you want to build a grammar correction tool as a small web app. While there are many NLP techniques available for this task, language models, particularly those from OpenAI, have been trained on vast amounts of data, making them a potentially better choice. Again, the key is to be strategic with our prompt template. We want the API response to highlight incorrect words and suggest their correct spellings, rather than providing the entire corrected text as output.
See 005_reduce_api_costs.py, 006_reduce_api_costs.py at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/reduce_api_costs
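In the same cost-aware spirit, here is a sketch of such a spell-check prompt (my reconstruction; see 005_reduce_api_costs.py and 006_reduce_api_costs.py for the originals):

# Ask only for the faulty words and their corrections: the short
# answer keeps the (billed) output tokens to a minimum.
text = "Ths is a smaple text with severl spelling mistaks."

prompt = f"""Find the misspelled words in the text below.
Answer ONLY with one "wrong -> correct" pair per line, nothing else.
If there is no mistake, answer exactly: OK

Text: {text}"""

print(prompt)  # send as the user message to the chat API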
Exploring the pricing of the NVIDIA platform
There is also an uncomplicated way to explore different types of LLMs, a bit like huggingface.co: rely on the NVIDIA platform, which provides a turnkey package of LLMs and a handful of credits to start your tests. Once you have overcome the complexity of the site, you have access to code samples and a set of models: llama3-70b, phi-3-mini, codegemma-7b, mistral-large…
https://build.nvidia.com/explore/discover#llama3-70b
https://build.nvidia.com/explore/discover#phi-3-mini
https://build.nvidia.com/explore/discover#codegemma-7b
https://build.nvidia.com/mistralai/mistral-large
etc.
See files at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/using_nvidia_api
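The NVIDIA endpoints are OpenAI-compatible, so the same Python client can be reused. A minimal sketch, assuming an API key generated on build.nvidia.com and the base URL and model id shown in NVIDIA’s own code samples:

# Call a hosted model on the NVIDIA build platform through its
# OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="nvapi-...",  # replace with your NVIDIA API key
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # one of the hosted models
    messages=[{"role": "user", "content": "Say hello in French."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)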
Exploring basiclingua-LLM-Based-NLP
The author of the post “Reduce Your OpenAI API Costs by 70%” has developed an NLP library that uses LLM APIs to perform various tasks. It includes over thirty features, many of which rely on cost-optimization strategies similar to those described in this post.
The code available in the GitHub repository contains numerous prompt resources that provide great samples and models to learn prompting.
See files at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/using_basiclingua
Tasks resolved by BasicLINGUA
Entity Extraction, Text Summarization, Text Classification, Text Sentiment Analysis, Text Coreference Resolution, Text Intent Recognition, Text OCR, Text Anomaly Detection, Text Sense Disambiguation, Text Spellcheck
############## EXTRACT PATTERNS ##############

# Generate the prompt template
prompt_template = f'''Given the input text:
user input: {user_input}
extract following patterns from it: {patterns}
output must be a python dictionary with keys as patterns and values as list of extracted patterns'''

############## NER EXTRACTION ##############

# check if parameters are of correct type
if not isinstance(user_input, str):
    raise TypeError("user_input must be of type str")
if not isinstance(ner_tags, str):
    raise TypeError("ner_tags must be of type str")

# check if parameters are not empty
if not user_input:
    raise ValueError("user_input cannot be empty")

# user ner tags
if ner_tags != "":
    user_ner_tags = f'''NER TAGS: {ner_tags}'''
else:
    user_ner_tags = f'''NER TAGS: FAC, CARDINAL, NUMBER, DEMONYM, QUANTITY, TITLE, PHONE_NUMBER, NATIONAL, JOB, PERSON, LOC, NORP, TIME, CITY, EMAIL, GPE, LANGUAGE, PRODUCT, ZIP_CODE, ADDRESS, MONEY, ORDINAL, DATE, EVENT, CRIMINAL_CHARGE, STATE_OR_PROVINCE, RELIGION, DURATION, URL, WORK_OF_ART, PERCENT, CAUSE_OF_DEATH, COUNTRY, ORG, LAW, NAME, COUNTRY, RELIGION, TIME'''

# Generate the prompt template
prompt_template = f'''Given the input text:
user input: {user_input}
perform NER detection on it.
{user_ner_tags}
answer must be in the format tag:value'''
Source: https://github.com/FareedKhan-dev/basiclingua-LLM-Based-NLP
A quick schematic process of validation of each AI feature
As a reminder, here is a detailed process to validate an AI feature. The validation process is the same for each use case.
- Phase_1: R&D phase with POCs
- Phase_2: User feedback to gauge quality, artisanal validation, progress on the quality level.
- Phase_3: Pilot phase: integration into a business tool such as the Backoffice. This pilot phase makes it possible to extend the validation to a larger sample of content and uses. The integration is done via feature flipping (the AI feature is only available to a limited number of users; see the sketch after this list), which makes it possible to test and validate with real usage feedback on potential problems (performance, connectivity, refining usage based on production content, etc.). This is a fine-tuning phase.
- Phase_4: Production release: the feature is opened to all users of the business tool, e.g. the Backoffice. The feature nevertheless continues to evolve, through improvement of the existing model or by changing the model.
For each new integration into a business tool, phase_3 must be carried out again in the same way.
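As a minimal sketch of the feature flipping mentioned in phase_3 (all names are illustrative, not the actual Backoffice implementation):

# The AI feature is only flipped on for an allow-list of pilot users.
PILOT_USERS = {"alice", "bob"}

def ai_feature_enabled(username: str) -> bool:
    """True if the AI feature is visible for this user."""
    return username in PILOT_USERS

def get_ai_summary(article: str, username: str) -> str:
    if not ai_feature_enabled(username):
        return "(AI summary not available yet)"
    # placeholder for the real LLM call validated in phase_1/phase_2
    return f"[LLM summary of a {len(article)}-character article]"

print(get_ai_summary("Some long article...", "alice"))  # pilot user
print(get_ai_summary("Some long article...", "carol"))  # not in pilot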
More info
- Reduce Your OpenAI API Costs by 70% | by Fareed Khan | Mar, 2024 | Level Up Coding
https://levelup.gitconnected.com/reduce-your-openai-api-costs-by-70-a9f123ce55a6
- Pricing ChatGPT
https://openai.com/api/pricing
- Technologie | Mistral AI | Frontier AI in your hands
https://mistral.ai/fr/technology/#models
- GitHub – FareedKhan-dev/basiclingua-LLM-Based-NLP: LLM Based NLP Library
https://github.com/FareedKhan-dev/basiclingua-LLM-Based-NLP
- Long document content extraction | OpenAI Cookbook
https://cookbook.openai.com/examples/entity_extraction_for_long_documents
- Understanding OpenAI API Cost In-Depth Using a Real Example – WordBot
https://blog.wordbot.io/ai-artificial-intelligence/understanding-gpt3-cost-in-depth-using-a-real-example/
- TokenCounter: tokenize and estimate your LLM costs
https://www.tokencounter.io/
- Create Your Azure Free Account Today | Microsoft Azure
https://azure.microsoft.com/en-us/free/ai-services/
- Mistral Large now available on Azure
https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/mistral-large-mistral-ai-s-flagship-llm-debuts-on-azure-ai/ba-p/4066996
- GitHub – openai/tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI’s models
https://github.com/openai/tiktoken
- GitHub – Promptly-Technologies-LLC/llm_cost_estimation: A simple Python library for estimating what the cost of an API call will be
https://github.com/Promptly-Technologies-LLC/llm_cost_estimation
- GitHub – microsoft/LLMLingua: To speed up LLMs’ inference and enhance LLM’s perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss
https://github.com/microsoft/LLMLingua
- GitHub – magdalenakuhn17/awesome-cheap-llms: Cost reduction tools and techniques for LLM based systems
https://github.com/magdalenakuhn17/awesome-cheap-llms
- GitHub – AnthusAI/LLM-Price-Comparison: A comparison of the price per million tokens and benchmark scores of various large language models
https://github.com/AnthusAI/LLM-Price-Comparison
- Build a Token Counter and Cost Estimator with Streamlit and OpenAI | by Tony Esposito | Medium
https://medium.com/@fbanespo/build-a-token-counter-and-cost-estimator-with-streamlit-and-openai-2181e603f7cb
- The Ultimate Pricing Cheat-Sheet for Large Language Models
https://www.newtuple.com/post/the-ultimate-pricing-cheat-sheet-for-large-language-models
- Cost Analysis of deploying LLMs: A comparative Study between Cloud Managed, Self-Hosted and 3rd Party LLMs | by Hugo Debes | Artefact Engineering and Data Science | Medium
https://medium.com/artefact-engineering-and-data-science/llms-deployment-a-practical-cost-analysis-e0c1b8eb08ca
- How to Evaluate LLMs: A Complete Metric Framework | Microsoft Research
https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/articles/how-to-evaluate-llms-a-complete-metric-framework/
- llm-cost-estimation — llm-cost-estimator documentation
https://llm-cost-estimator.readthedocs.io/en/latest/index.html
- GitHub – egordm/RougLLy: Quick and Realistic Cost Estimation for LLMs
https://github.com/egordm/RougLLy
- Paul Simmering – LLM Price Comparison
https://simmering.dev/blog/llm-price-performance/
- Understanding the cost of Large Language Models (LLMs)
https://www.tensorops.ai/post/understanding-the-cost-of-large-language-models-llms
- Advanced Prompt Engineering – Practical Examples
https://www.tensorops.ai/post/prompt-engineering-techniques-practical-guide
- GitHub – TensorOpsAI/LLMstudio: Framework to bring LLM applications to production
https://github.com/TensorOpsAI/LLMStudio
- LLM Price Calculator
https://www.llmcalc.com/
- Compare LLM API Pricing Instantly – Get the Best Deals at LLM Price Check
https://llmpricecheck.com/
- LLM Pricing Calculator – LLM Price Check
https://llmpricecheck.com/calculator
- llm_cost_estimation · PyPI
https://pypi.org/project/llm_cost_estimation/
- LLM Pricing – Compare Large Language Model Costs and Pricing
https://llm-price.com/
- GitHub – g-simmons/llm-cost-estimator: A simple cost estimator for batch text generation with OpenAI LLMs
https://github.com/g-simmons/llm-cost-estimator
- GitHub – AgentOps-AI/tokencost: Easy token price estimates for LLMs
https://github.com/AgentOps-AI/tokencost
- 50+ Open-Source Options for Running LLMs Locally – Vince Lam
https://vinlam.com/posts/local-llm-options/
- GitHub – VidhyaVarshanyJS/EnsembleX: EnsembleX utilizes the Knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization
https://github.com/VidhyaVarshanyJS/EnsembleX
- Tokenizer | OpenAI Platform
https://platform.openai.com/tokenizer
- Jira ticket IA-98
https://francemm.atlassian.net/browse/IA-98
- Usage tiers ChatGPT
https://platform.openai.com/docs/guides/rate-limits/usage-tiers
- Pay As You Go—Buy Directly | Microsoft Azure
https://azure.microsoft.com/en-us/pricing/purchase-options/pay-as-you-go
- Azure AI | Mistral AI Large Language Models
https://docs.mistral.ai/deployment/cloud/azure/