Unraveling the Cost of AI: The Hidden Expenses of API Keys and Pay-as-You-Go Pricing in AI-Based Products

“After all, we are not Communists,” says Emilio “The Wolf” Barzini in The Godfather*, and indeed, despite the virtuous storytelling, AI is here to make money, and to make a lot of it…

For this post, you can find all files, mostly prompts, on my GitHub account. See https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm

* I watched Coppola’s The Godfather once again which is, in addition to being a great film about the Italian American mafia, a great lesson on capitalism and team management!

More seriously, as the PO of an AI-based product, the question of price quickly arises. I am not talking about development costs but rather the price of an API key and its use via prompts. Indeed, AI companies are not philanthropists, and their economic models are based on addiction. Without paying attention, the pay-as-you-go pricing system can quickly become “poisonous”: the more you outsource tasks to AI, the more you pay.

Over the last week, I decided to go with ChatGPT and Mistral API keys, so I was forced to scrutinize the pricing pages for input and output according to the models. Here are the resources:

  1. The precise prices of the ChatGPT API: https://openai.com/api/pricing

  2. The precise prices of the Mistral API see the Pay as you Go section: https://mistral.ai/fr/technology/#models

First, I suck at Excel, so I'd rather go with Python. Being lazy, I quickly searched for explanations of ChatGPT and Mistral pricing. I found good resources, especially this post: “Reduce Your OpenAI API Costs by 70%” at https://levelup.gitconnected.com/reduce-your-openai-api-costs-by-70-a9f123ce55a6

This post introduces a fruitful correlation between cost efficiency and prompt design patterns!

Some tips on Cost efficiency & Prompt Design Patterns

Here are the notions, quickly summarized, that I kept from this post. The very first thing you need to know is that, when you connect to an LLM via an API key, you are billed both when sending content to the LLM (input) and when receiving content from the LLM (output).

1. Number of tokens for a prompt

For input, even though you’re sending a very short prompt, there will be additional tokens sent within the prompt.

Notice that prompt_tokens, which refers to the input tokens, is 8. This is because every time you send text to the API, an additional 7 tokens are automatically added.
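To see these numbers yourself, here is a minimal sketch with the OpenAI Python SDK (I assume the v1.x client here; adapt it to your version):

# Minimal sketch: inspect the token usage returned by the OpenAI API.
# Assumes the openai package (v1.x) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)

# You are billed on both sides of the exchange
print(response.usage.prompt_tokens)      # input tokens, e.g. 8 for a short "Hello!"
print(response.usage.completion_tokens)  # output tokens
print(response.usage.total_tokens)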

A good way to count tokens for ChatGPT is to use this tool, which calculates the number of tokens in a text: https://www.tokencounter.io/

For an LLM, the key element is the token, so what you need to know is what a token is: the basic unit that is “input” and “output” by the LLM.

For example, for an English text, 1 token corresponds to approximately 4 characters or 0.75 words.

# some examples

1 token ~= 4 chars in English
1 token ~= ¾ words
100 tokens ~= 75 words

1-2 sentence(s) ~= 30 tokens
1 paragraph ~= 100 tokens
1,500 words ~= 2048 tokens
2,000 words ~= 2730 tokens

Simple Token definition

You can think of tokens as pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.

Source: https://platform.openai.com/tokenizer
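If you prefer to count tokens programmatically rather than via a web page, here is a minimal sketch with OpenAI's tiktoken library (the model name is just an example):

# Count tokens locally with tiktoken, so you can estimate the size
# of a prompt before sending it to the API.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

text = "Unraveling the cost of AI: the hidden expenses of API keys."
tokens = encoding.encode(text)
print(len(tokens))  # number of tokens this text will consume as input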

2. Number of tokens for completion
You will also pay for what the LLM produces, in the form of tokens.

The completion_tokens value is what we expected: it represents the tokens in the response generated by the model.

See 001_reduce_api_costs.py, 002_reduce_api_costs.py, 003_reduce_api_costs.py at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/reduce_api_costs
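Since input and output are billed separately, and at different rates, a minimal cost estimator can be sketched like this (the rates below are placeholders, not real prices; check the pricing pages above):

# Estimate the cost of a single API call from its token usage.
INPUT_PRICE_PER_1M = 0.50   # USD per 1M input tokens (illustrative value)
OUTPUT_PRICE_PER_1M = 1.50  # USD per 1M output tokens (illustrative value)

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Input and output tokens are billed at different rates."""
    return (prompt_tokens * INPUT_PRICE_PER_1M
            + completion_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000

# e.g. a call with 8 input tokens and 100 output tokens
print(f"{estimate_cost(8, 100):.6f} USD")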

3. Clustering using OpenAI API

Clustering is a great idea that must be explored if you intend to leverage an LLM API key.

Here is a quick definition of the user need.

Imagine you have a huge list of news headlines and you want to cluster them. While using embeddings is one option, let’s say you want to use OpenAI language models API, as it can capture meaning in a human-like way. Before we create the prompt template to minimize costs, let’s look at what our news headlines list looks like. I’ve represented the news headlines in a shorter form, like s0, s1, s2, and so on. The reason for this is that when the language model clusters them, it can simply use these short abbreviations (e.g., s35 for the 35th news headline) instead of writing out the entire headline in each cluster.

Here is also a simple explanation of the expectation.

Next, I defined my prompt template. This template specifies the format of the answer I want the language model to provide, along with some additional information for clarity. The key here is that we’re not asking the model to write out the full headlines, but rather to just use the short abbreviations. All we need to do is pass this prompt template to the function we created earlier and see how it performs in terms of pricing and response quality.

See 003_reduce_api_costs.py at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/reduce_api_costs
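To make the abbreviation trick concrete, here is a minimal sketch of such a prompt (the wording and variable names are mine, not the exact code from the repository):

# Build a clustering prompt where the model answers with short ids (s0, s1, ...)
# instead of rewriting the full headlines, which keeps output tokens (and cost) low.
headlines = [
    "Stock markets rally after rate cut",
    "New vaccine shows promise in trials",
    "Central bank hints at further easing",
]

# Map each headline to a short id
numbered = "\n".join(f"s{i}: {h}" for i, h in enumerate(headlines))

prompt_template = f"""Cluster the following news headlines by topic.
{numbered}

Answer ONLY with a cluster name per line followed by the short ids
(e.g. finance: s0, s2), never the full headlines."""

print(prompt_template)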

4. SpellCheck using OpenAI API
Prompts enable you to do anything from summarizing a text to coding a script, so why not use an LLM to correct spelling mistakes? The only drawback: the experiment was made in English only, and I am not sure the performance in correcting spelling mistakes is equivalent in every language.

Let’s say you have a lengthy text document and you want to build a grammar correction tool as a small web app. While there are many NLP techniques available for this task, language models, particularly those from OpenAI, have been trained on vast amounts of data, making them a potentially better choice. Again, the key is to be strategic with our prompt template. We want the API response to highlight incorrect words and suggest their correct spellings, rather than providing the entire corrected text as output.

See 005_reduce_api_costs.py, 006_reduce_api_costs.py at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/reduce_api_costs
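Here is a minimal sketch of such a cost-aware spellcheck prompt (again, the wording is mine, not the exact template from the repository):

# Ask only for the misspelled words and their corrections, not the whole
# corrected text, to keep the number of output tokens low.
text = "The weathr was beutiful and we enjoied the walk."

prompt_template = f"""Given the input text:
{text}

List ONLY the misspelled words and their corrections, one per line,
in the format: wrong_word -> correct_word
Do not rewrite the full text."""

print(prompt_template)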

Exploring the pricing of the NVIDIA platform

There is also an uncomplicated way to explore different types of LLMs, a bit like huggingface.co, which is to rely on the NVIDIA platform. It provides a turnkey package of LLMs and a handful of credits to start your tests. Once you have overcome the complexity of the site, you have access to code and a set of models: llama3-70b, phi-3-mini, codegemma-7b, mistral-large…


https://build.nvidia.com/explore/discover#llama3-70b
https://build.nvidia.com/explore/discover#phi-3-mini
https://build.nvidia.com/explore/discover#codegemma-7b
https://build.nvidia.com/mistralai/mistral-large
... etc

See files at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/using_nvidia_api
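The NVIDIA endpoints are OpenAI-compatible, so the same Python client can be pointed at them. A minimal sketch (base URL and model name as shown on the platform at the time of writing; double-check them on the site):

# Call a model hosted on build.nvidia.com through the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="nvapi-...",  # your NVIDIA API key
)

response = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "One sentence on LLM pricing."}],
)
print(response.choices[0].message.content)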

Exploring basiclingua-LLM-Based-NLP

The author of the post “Reduce Your OpenAI API Costs by 70%” has developed an NLP library that uses LLM APIs to perform various tasks. It includes over thirty features, many of which rely on the same cost-optimization strategies as described in this post.

The code available in the GitHub repository contains numerous prompt resources that can provide great samples and models to learn prompting.

See files at https://github.com/bflaven/ia_usages/tree/main/ai_pricing_llm/using_basiclingua

Tasks resolved by BasicLINGUA
Entity Extraction, Text Summarization, Text Classification, Text Sentiment Analysis, Text Coreference Resolution, Text Intent Recognition, Text OCR, Text Anomaly Detection, Text Sense Disambiguation, Text Spellcheck

############## EXTRACT PATTERNS ##############

# Generate the prompt template
prompt_template = f'''
Given the input text:
user input: {user_input}

extract following patterns from it: {patterns}

output must be a python dictionary with keys as patterns and values as list of extracted patterns
'''

############## NER EXTRACTION ##############

# check if parameters are of correct type
if not isinstance(user_input, str):
    raise TypeError("user_input must be of type str")
if not isinstance(ner_tags, str):
    raise TypeError("ner_tags must be of type str")

# check if parameters are not empty
if not user_input:
    raise ValueError("user_input cannot be empty")

# user ner tags: use the caller's tags if given, otherwise the default list
if ner_tags != "":
    user_ner_tags = f'''NER TAGS: {ner_tags}'''
else:
    user_ner_tags = '''NER TAGS: FAC, CARDINAL, NUMBER, DEMONYM, QUANTITY, TITLE, PHONE_NUMBER, NATIONAL, JOB, PERSON, LOC, NORP, TIME, CITY, EMAIL, GPE, LANGUAGE, PRODUCT, ZIP_CODE, ADDRESS, MONEY, ORDINAL, DATE, EVENT, CRIMINAL_CHARGE, STATE_OR_PROVINCE, RELIGION, DURATION, URL, WORK_OF_ART, PERCENT, CAUSE_OF_DEATH, COUNTRY, ORG, LAW, NAME, COUNTRY, RELIGION, TIME'''

# Generate the prompt template
prompt_template = f'''Given the input text:
user input: {user_input}

perform NER detection on it.
{user_ner_tags}
answer must be in the format
tag:value
'''

Source: https://github.com/FareedKhan-dev/basiclingua-LLM-Based-NLP

A quick schematic process of validation of each AI feature

As a reminder, here is a detailed process to validate an AI feature. The validation process will be the same for each use case.

  • Phase_1: R&D phase with POCs
  • Phase_2: User feedback to gauge quality, artisanal validation, progress on the quality level.
  • Phase_3: Pilot phase, integration into a business tool such as the Backoffice. This pilot phase makes it possible to extend the validation to a larger sample of content and uses. The integration is done via feature flipping: the AI feature is only available to a limited number of users (a minimal sketch of such a flag check is given below). This makes it possible to test and validate with usage feedback on potential problems (performance, connectivity, refining usage based on production content, etc.). This is a fine-tuning phase.
  • Phase_4: Put into production, opening the feature to all users of the business tool, e.g. the Backoffice. The feature nevertheless continues to evolve, either through improvement of the existing model or by changing the model.

As soon as the integration is done in a new business tool, phase_3 must be carried out in the same way.
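As an illustration of the feature flipping mentioned in phase_3, here is a minimal, purely hypothetical sketch (names are illustrative, not a real API):

# Expose the AI feature only to a small allowlist of pilot users.
AI_FEATURE_ALLOWLIST = {"alice", "bob"}  # pilot users

def ai_feature_enabled(username: str) -> bool:
    """Return True if the AI feature should be shown to this user."""
    return username in AI_FEATURE_ALLOWLIST

if ai_feature_enabled("alice"):
    print("Show the AI feature in the Backoffice")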

More info