One of the most notable technologies in the field of artificial intelligence in recent years is the chatbot based on natural language processing (NLP), most prominently OpenAI’s ChatGPT. In this post, we will examine ChatGPT’s pricing policies and API usage costs since the release of GPT-4 and explore ways to minimize costs for efficient use.
Under the paid policy, the browser Chat and the API are billed differently. Let’s check how much each costs and how to achieve optimal cost efficiency.
ChatGPT Pricing Policy
Free Policy
- The Chat provided on the OpenAI ChatGPT website offers an environment where general users can use or test the model.
- You can use the GPT-3.5 model (175 billion parameters) for free.
- There may be delays during peak hours.
Paid Policy
- You must subscribe to ChatGPT Plus ($20 per month; $22 including VAT) to use it.
- You can use the more performant GPT-4 model.
(However, you can only ask 40 questions within 3 hours in the browser Chat.)
- Priority access during peak hours and faster response times are available.
- New features (DALL-E, Browsing, Advanced Data Analysis, Plugins, Vision, Voice) such as image generation and web search can be used.
- You can use ChatGPT in your own software via the Application Programming Interface (API).
(API usage is billed separately from ChatGPT Plus.)
ChatGPT API Pricing Policy
Language Model Pricing
GPT-3.5-Turbo
- gpt-3.5-turbo-0125 Model: Input $0.5 / 1M tokens, Output $1.5 / 1M tokens
- gpt-3.5-turbo-instruct Model: Input $1.5 / 1M tokens, Output $2.0 / 1M tokens
GPT-4-Turbo
- gpt-4-turbo All Models: Input $10.0 / 1M tokens, Output $30.0 / 1M tokens
(gpt-4-0125-preview, gpt-4-1106-preview, gpt-4-1106-vision-preview)
- Vision pricing (per 1024 px × 1024 px image): $0.00765
GPT-4
- gpt-4 Model: Input $30.0 / 1M tokens, Output $60.0 / 1M tokens
- gpt-4-32k Model: Input $60.0 / 1M tokens, Output $120.0 / 1M tokens
Assistants API Price
- Code interpreter: $0.03 / session
- Retrieval: $0.20 / GB / assistant / day (free until 04/01/2024)
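The tables above can be folded into a small cost helper. A minimal sketch in Python; the per-1M-token prices are copied from the lists above and will drift as OpenAI updates its pricing, and `request_cost` is an illustrative helper, not part of any library:

```python
# Per-1M-token prices in USD, copied from the tables above
PRICES = {
    "gpt-3.5-turbo-0125":     {"input": 0.5,  "output": 1.5},
    "gpt-3.5-turbo-instruct": {"input": 1.5,  "output": 2.0},
    "gpt-4-turbo":            {"input": 10.0, "output": 30.0},
    "gpt-4":                  {"input": 30.0, "output": 60.0},
    "gpt-4-32k":              {"input": 60.0, "output": 120.0},
}

def request_cost(model, input_tokens, output_tokens):
    """USD cost of one request: token counts times the per-1M-token prices."""
    price = PRICES[model]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000

# The same request is dramatically cheaper on GPT-3.5-Turbo than on GPT-4
print(request_cost("gpt-4", 500, 1000))               # 0.075
print(request_cost("gpt-3.5-turbo-0125", 500, 1000))  # 0.00175
```

Running the comparison for a few candidate models before going to production makes the cost gap between model tiers concrete.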
What is a Token?
A token is a piece of a word used in natural language processing. In English text, 1 token corresponds to approximately 4 characters or 0.75 words, and all of Shakespeare’s works correspond to about 900,000 words or 1.2M tokens.
Let’s check the number of tokens ourselves.
“ChatGPT” corresponds to 7 letters and 3 tokens,
and the sentence “You can do various activities using ChatGPT.” corresponds to 44 letters and 10 tokens.
Now, let’s compare the input and output tokens for a question to GPT-4.
For the English question “Where is the highest mountain in the world?” the GPT-4 answer is “The highest mountain in the world is Mount Everest, located on the border between Nepal and the Tibet Autonomous Region of China. Its official elevation is 8,848.86 meters (29,031.7 feet) above sea level, as determined by a 2019 survey.”
The question is composed of 43 letters and 9 tokens, and the answer is composed of 236 letters and 56 tokens.
You can check the number of tokens at OpenAI – Tokenizer.
Calculating the cost based on tokens (based on the GPT-4 32K context)
Input: 9 tokens, Output: 56 tokens
Cost: ($60.0 × 9 + $120.0 × 56) / 1M = $0.00726
Differences in Context Size
The 4K, 8K, 16K, and 32K labels represent the “context size” of the model, i.e., the number of tokens the model can process at once. A 4K model can process up to about 4,000 tokens in one request, and a 32K model up to about 32,000 tokens.
- Processing Capacity: The 32K context model can process more data at once than the 8K context model. Therefore, it is useful when dealing with longer texts or conversations.
- Cost: Generally, models with larger context sizes cost more. Based on the price information above, the 32K context costs more than the 8K context.
- Response Time: The 32K context model may have longer response times because it processes more data.
- Accuracy and Performance: Models with larger context sizes may be more advantageous in understanding more complex contexts. Therefore, they can provide more accurate responses while maintaining the context of long texts or conversations.
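One way to act on the trade-offs above is to pick the smallest context that fits the request. A minimal sketch; `pick_context` is a hypothetical helper, and the limits are treated here as exactly 4,000/8,000/16,000/32,000 tokens for illustration:

```python
def pick_context(prompt_tokens, reply_budget, sizes=(4_000, 8_000, 16_000, 32_000)):
    """Return the smallest context size (in tokens) that fits the prompt
    plus the expected reply; larger contexts generally cost more."""
    needed = prompt_tokens + reply_budget
    for size in sizes:
        if needed <= size:
            return size
    raise ValueError("request exceeds the largest available context")

print(pick_context(3_000, 500))    # 4000 -> a 4K model is enough
print(pick_context(7_000, 2_000))  # 16000 -> 8K is too small for 9,000 tokens
```

Reserving a reply budget matters because the context window covers input and output tokens together.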
Strategies for Token and Cost Reduction
- Query Optimization: You can minimize the number of required tokens by using concise and clear queries.
- Language Choice: Requesting in English rather than in other languages minimizes the number of tokens used.
- Model Selection: Choose the appropriate model based on the complexity of the task (Model and Context Size Selection).
- Token Limit: Manage costs by limiting the maximum number of tokens in API requests
(optimize the max_tokens, n, and best_of parameters).
The maximum number of generated tokens is max_tokens × max(n, best_of); if n and best_of are both 1 (the default), at most max_tokens tokens are generated.
(n: number of returned answers, best_of: number of candidate answers considered)
- Monitoring: Regularly check usage and adjust the budget as needed.
- Including stop sequences: Add stop sequences to prevent unnecessary token generation.
For example, when creating a numbered list, you can use a stop sequence.
If “11.” is used as the stop sequence, the completion stops as soon as “11.” is generated, so only 10 items appear in the list.
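The token-limit and stop-sequence strategies above can be sketched in Python. `max_generated_tokens` is a hypothetical helper that just restates the formula from the list, and `request_params` holds illustrative parameter values rather than a live API call:

```python
def max_generated_tokens(max_tokens, n=1, best_of=1):
    """Worst-case number of billed completion tokens for one request:
    max_tokens * max(n, best_of)."""
    return max_tokens * max(n, best_of)

print(max_generated_tokens(100))       # 100: defaults n = best_of = 1
print(max_generated_tokens(100, n=3))  # 300: extra answers multiply output cost

# Illustrative request parameters combining the strategies above
request_params = {
    "model": "gpt-3.5-turbo-0125",  # prefer the cheaper model for simple tasks
    "max_tokens": 100,              # hard cap on generated (billed) tokens
    "n": 1,                         # one answer keeps output cost minimal
    "stop": ["11."],                # stop sequence: the list ends after item 10
}
```

Capping max_tokens bounds the worst-case bill per request even if a prompt provokes a long answer.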
Which model should be used?
- GPT-3.5-Turbo: A model optimized for conversation
- GPT-4: A model for more complex tasks and extensive knowledge
Tips(Ways to Minimize Costs)
- If you use ChatGPT Plus, pre-evaluate GPT-3.5 and GPT-4 through the browser Chat rather than through the API.
- Prefer using the GPT-3.5-Turbo model over the GPT-4 model.
- Implement your service to work in English rather than other languages (English uses fewer tokens).
- If an answer in another language is needed:
When using the GPT-3.5-Turbo model: request the answer directly in the desired language.
When using the GPT-4 model: request the answer in English with GPT-4, then have the GPT-3.5-Turbo model translate it into the desired language.
- Make good use of stop sequences.
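The two-step translation tip can be sketched as a small routine. `ask` is a hypothetical wrapper for a single chat-completion call, passed in as a callable so the flow stays model-agnostic and testable; the model names follow the tips above:

```python
def answer_in_language(ask, question, language):
    """ask(model, prompt) -> str wraps one chat-completion request.
    Answer in English with GPT-4, then translate with the cheaper model."""
    english = ask("gpt-4", f"Answer in English: {question}")
    if language.lower() == "english":
        return english
    # The translation step does not need GPT-4-level reasoning,
    # so it goes to GPT-3.5-Turbo at a fraction of the token price
    return ask("gpt-3.5-turbo", f"Translate into {language}: {english}")
```

When only GPT-3.5-Turbo is in play, the tips above say to skip this and request the desired language directly.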
Let’s create a ChatGPT service while minimizing costs in this way.