Having explored various AI applications and how to evaluate features, let’s now look ‘under the hood’ at a core technology driving many of these advancements: Large Language Models, often referred to as LLMs. This section focuses on helping you understand these foundational models better.

1.4.1 Understanding Popular AI Language Models for Business Users

If you’re considering creating your own AI-driven solutions or integrating powerful language capabilities directly into your workflows, you’ll likely encounter well-known foundational LLMs such as OpenAI’s ChatGPT series, Anthropic’s Claude, or Google’s Gemini. Choosing which model or provider might work best for your specific needs can seem like a task reserved for AI experts, given the rapid developments and technical nuances. However, you can approach this at an introductory level to gain clarity on the key factors to consider for an initial assessment.

(While this provides a starting point, making strategic, long-term decisions about adopting or building on specific LLMs often requires deeper technical analysis and due diligence beyond this introductory overview.)

Initial Considerations When Choosing an LLM

Before diving deep into specific model comparisons, start by clarifying your own requirements. Thinking through these points will help you narrow down options and ask vendors the right questions:

  1. Identify Your Needs: What specific task(s) do you primarily want the AI to perform? (e.g., drafting marketing copy, summarizing reports, answering customer service queries, generating code, analyzing data?). Does your task require expertise or tuning specific to your industry?
  2. Ease of Use and Technical Resources: How easily can the model be implemented? Are you looking for ready-to-use interfaces (like the web versions of ChatGPT, Claude, Gemini), or do you need API access for integration? Consider platforms with strong vendor support or extensive documentation if your internal technical resources are limited.
  3. Customization and Fine-Tuning: Do you need the AI to learn your specific company voice, terminology, or knowledge base? Assess whether you need the capability to fine-tune a model on your own data, and what that would entail.
  4. Integration Capabilities: How easily can the model’s API or platform integrate with the software and systems you already use (e.g., CRM, email marketing tools, internal databases)? Check for compatibility and available integrations.
  5. Cost Structure: LLM costs can vary significantly. Understand the pricing model (e.g., subscription tiers, pay-per-use based on tokens, flat fees). Factor in potential setup costs, integration expenses, and ongoing operational costs, not just the base price.
  6. Data Privacy and Compliance: This is critical. Ensure any considered model or platform complies with data protection laws relevant to your business and customers (like GDPR). Understand clearly how your input data is used (e.g., for training the vendor’s model?), stored, and protected. Look for vendors transparent about their security and compliance measures.
  7. Scalability and Future Needs: Choose a model and provider that can potentially grow with your business needs. Consider how often the models are updated and improved. Is there a clear future roadmap that aligns with your potential long-term usage?
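
One lightweight way to turn these seven considerations into a first-pass shortlist is a simple weighted scoring matrix. The sketch below is purely illustrative: the criteria weights and the per-provider scores are hypothetical placeholders you would replace with your own assessments after talking to vendors.

```python
# Illustrative weighted scoring matrix for an initial LLM shortlist.
# All weights and scores below are hypothetical placeholders.

CRITERIA_WEIGHTS = {
    "fit_for_task": 0.25,       # 1. Identify Your Needs
    "ease_of_use": 0.15,        # 2. Ease of Use and Technical Resources
    "customization": 0.10,      # 3. Customization and Fine-Tuning
    "integration": 0.15,        # 4. Integration Capabilities
    "cost": 0.15,               # 5. Cost Structure
    "privacy_compliance": 0.15, # 6. Data Privacy and Compliance
    "scalability": 0.05,        # 7. Scalability and Future Needs
}

# Scores from 1 (poor) to 5 (excellent) -- made-up numbers for illustration.
candidate_scores = {
    "Provider A": {"fit_for_task": 4, "ease_of_use": 5, "customization": 3,
                   "integration": 4, "cost": 3, "privacy_compliance": 4,
                   "scalability": 4},
    "Provider B": {"fit_for_task": 5, "ease_of_use": 3, "customization": 4,
                   "integration": 3, "cost": 2, "privacy_compliance": 5,
                   "scalability": 4},
}

def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores into one weighted total."""
    return sum(CRITERIA_WEIGHTS[c] * s for c, s in scores.items())

if __name__ == "__main__":
    # Rank candidates from highest to lowest weighted score.
    for name, scores in sorted(candidate_scores.items(),
                               key=lambda kv: weighted_score(kv[1]),
                               reverse=True):
        print(f"{name}: {weighted_score(scores):.2f}")
```

A matrix like this is not a substitute for due diligence, but it forces you to make your priorities explicit (here, task fit and compliance are weighted heaviest) before vendor demos sway you.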

Comparing Specific Models

Understanding the fundamental differences between major LLMs is helpful. While the absolute “best” model changes rapidly with new releases, the types of differences often persist. Some models might excel in creative writing (like certain versions of ChatGPT), others prioritize safety and nuanced reasoning (often associated with Claude), while some lead in multimodal capabilities (handling text, images, audio like Gemini) or deep integration with specific ecosystems.

The following comparison presents an overview of some established LLM providers often discussed in a business context: OpenAI (ChatGPT models), Anthropic (Claude models), and Google (Gemini models), as of May 2025.

Important Note: The AI landscape changes extremely fast! Specific model versions (e.g., GPT-4 vs. GPT-4o, different Claude 3 variants like Haiku/Sonnet/Opus, various Gemini versions like Pro/Ultra) have distinct capabilities, performance levels, context window sizes, and pricing. Always consult the providers’ current documentation for the latest, most accurate information before making any decisions. This comparison is intended only to illustrate the kinds of strategic differences and typical strengths associated with these major players, helping you understand what factors to compare.

General Focus / Strength

  • ChatGPT (OpenAI models): High versatility, strong creativity, conversational ease, broad ecosystem.
  • Claude (Anthropic models): Emphasis on safety, ethics, nuanced reasoning, and handling very long context.
  • Gemini (Google models): Strong multimodality (text, image, audio, video), deep Google ecosystem integration, speed.

Key Characteristics

  • ChatGPT: Excels at generating diverse creative text formats. Large ecosystem with integrations and custom GPTs. Often seen as a capable “all-around” performer. Can access external information (depending on version/interface).
  • Claude: Designed with “Constitutional AI” for safety and ethical alignment. Outstanding ability to process and reason over very large amounts of text (long documents, codebases). Strong analytical capabilities. Can be more cautious or refuse prompts deemed potentially problematic.
  • Gemini: Natively built to understand and combine multiple data types (text, images, etc.). Seamless integration potential with Google Workspace (Docs, Sheets, Gmail). Often cited for fast response times (low latency). Strong performance reported in logic, math, and coding tasks.

Potential SME Uses

  • ChatGPT: Marketing and content creation (drafting emails, blog posts, social media content, product descriptions); productivity and automation (summarizing meetings or long texts, generating first drafts of reports, automating simple text-based tasks); customer service (powering chatbots for initial query handling, drafting standard replies); innovation (brainstorming ideas, exploring creative concepts).
  • Claude: Compliance and HR (analyzing dense legal documents or regulations, drafting internal policies, summarizing compliance materials); data analysis (extracting insights from long customer feedback reports, detailed market research analysis); knowledge management (creating reliable Q&A systems based on extensive internal documentation); customer service in regulated or sensitive contexts (providing careful, measured responses).
  • Gemini: Marketing and e-commerce (generating descriptions based on product images, analyzing visual trends, creating multimodal ad content); productivity within Google Workspace (summarizing emails in Gmail, generating text in Docs, analyzing data in Sheets); operations (analyzing reports containing both text and charts/images); development and IT (assisting with coding tasks, explaining code visually).

Key Considerations / Trade-offs

  • ChatGPT: May require more prompt engineering for highly nuanced or safety-critical tasks. Data usage policies for training require careful review (though business versions offer more privacy). Deepest integration is often within its own ecosystem.
  • Claude: Conservatism might limit certain creative or exploratory uses. The ecosystem of integrations might be less extensive than OpenAI’s. Historically positioned with premium pricing for top models (check current tiers).
  • Gemini: Maximum benefit is often realized within the Google Workspace ecosystem. Advanced multimodal features might be unnecessary if your needs are purely text-based. As with all providers, rapid model updates require staying current.

As you’ve likely gathered, LLMs are not one-size-fits-all! Choosing an LLM (or perhaps even more than one, for different needs) is fundamentally connected to your intended use case, as each model is typically optimized for a different range of tasks. Some models excel at creative generation, while others are engineered for more rigorous analysis or strict adherence to safety guidelines, and these represent just a few of the dimensions to consider!

1.4.2 Calculating Usage Costs for LLMs

When using Large Language Models (LLMs), especially when accessed via Application Programming Interfaces (APIs) from providers like OpenAI, Google, Anthropic, or through platforms like Microsoft Azure AI, understanding the cost structure is essential. Usage costs are typically calculated based on the amount of text processed, measured in units called “tokens.”

Token-Based Billing Explained

What is a Token?
In the context of AI language models, a token isn’t necessarily a whole word. It’s the basic unit of text the model processes. Depending on the model’s specific “tokenizer” (the tool that breaks down text), a token could be a whole word, a part of a word (subword), a single character, or even punctuation. The process of breaking text down this way is called tokenization.

Estimation for English
While the exact count varies, a common rule of thumb for English text is that one token represents roughly 4 characters or about 0.75 words. Conversely, one word is approximately 1.33 tokens. (Note: This ratio can differ significantly for other languages and for text containing lots of punctuation or code. Most providers offer online tools to calculate the precise token count for your specific text based on their tokenizer.)
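
These rules of thumb are easy to encode. The helpers below are rough estimators based on the ~4-characters-per-token and ~1.33-tokens-per-word heuristics above; they are not a real tokenizer, so for billing-accurate counts you would still use your provider's own tokenizer tool.

```python
# Rough token estimators based on the English rules of thumb above:
#   1 token ≈ 4 characters, 1 word ≈ 1.33 tokens.
# Heuristics only -- actual billing uses the provider's tokenizer.

def tokens_from_chars(text: str) -> int:
    """Estimate tokens as characters / 4, rounded to the nearest token."""
    return round(len(text) / 4)

def tokens_from_words(word_count: int) -> int:
    """Estimate tokens as words * 1.33, rounded to the nearest token."""
    return round(word_count * 1.33)

if __name__ == "__main__":
    sample = "Large Language Models are billed per token, not per word."
    print(tokens_from_chars(sample))  # character-based estimate
    print(tokens_from_words(15))      # 15 words ≈ 20 tokens
```

Expect the two heuristics to disagree slightly for any given text; either is fine for the back-of-envelope budgeting shown in the examples that follow.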

Input vs. Output Tokens
Costs are calculated based on both the text you send to the model (your input tokens or “prompt”) and the text the model generates for you (the output tokens or “completion”/”response”).

Pricing Structure
Crucially, providers often charge different rates per token for input versus output. Typically, output tokens are more expensive because they reflect the computational effort of the model generating new content. Furthermore, costs vary significantly depending on the specific LLM used (more advanced models generally cost more per token). Some providers may offer volume discounts or tiered pricing based on usage levels, potentially allowing for lower rates if you commit to a certain minimum usage. Always check the specific provider’s terms.

Calculating Costs

The general formula to calculate the cost for a specific API call, considering potentially different rates, is:

Total Cost = (Input Tokens / 1000 * Cost per 1k Input Tokens) + (Output Tokens / 1000 * Cost per 1k Output Tokens)

(Note: Pricing is typically quoted per 1,000 tokens, sometimes abbreviated as “/1k tokens” or “/kT”; some providers quote per 1,000,000 tokens instead, so check the unit carefully.)
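
The formula translates directly into code. The function below is a generic sketch: the rate arguments are whatever per-1k-token input and output prices your chosen provider currently publishes, and the rates shown in the usage line are illustrative only.

```python
# Generic per-call cost calculator implementing the formula above.
# Rate arguments are inputs -- always plug in the provider's current prices.

def llm_call_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_1k: float,
                  output_rate_per_1k: float) -> float:
    """Sum input and output tokens, each billed at its own per-1k rate."""
    input_cost = input_tokens / 1000 * input_rate_per_1k
    output_cost = output_tokens / 1000 * output_rate_per_1k
    return input_cost + output_cost

if __name__ == "__main__":
    # Illustrative rates: $0.03 per 1k input tokens, $0.06 per 1k output.
    print(f"${llm_call_cost(3_325, 665, 0.03, 0.06):.5f}")
```

Note that passing the same rate for both arguments reproduces the simpler single-rate case used in some of the examples below.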

IMPORTANT PRICING DISCLAIMER: The token costs mentioned in the following examples are purely illustrative and used only to demonstrate the calculation method. They do not reflect current market rates, or any specific provider’s pricing. LLM pricing changes frequently and varies widely between providers (OpenAI, Anthropic, Google, Microsoft Azure AI, etc.) and different model versions (e.g., basic vs. advanced capabilities). You MUST always consult the official, current pricing pages of the specific LLM provider and model version you intend to use for accurate cost estimation.

Practical Examples:

Let’s apply the formula with illustrative pricing:

Example 1: Simple Chatbot Conversation

  • User Input: 15 words
  • AI Response: 25 words
  • Calculation Steps:
    1. Estimate Tokens (using 1 word ≈ 1.33 tokens):
      • Input Tokens: 15 words × 1.33 ≈ 20 tokens
      • Output Tokens: 25 words × 1.33 ≈ 33 tokens
    2. Calculate Cost (using a single hypothetical rate of $0.002 per 1k tokens for simplicity here):
      • Total Tokens = 20 + 33 = 53 tokens
      • Total Cost = (53 / 1000) * $0.002 = $0.000106
    3. Result: Cost per conversation ≈ $0.0001 (less than a cent).

Example 2: Document Summarization

  • Input Document: 2,500 words (approx. 5 pages)
  • Desired Summary: 500 words (approx. 1 page)
  • Calculation Steps:
    1. Estimate Tokens:
      • Input Tokens: 2,500 words × 1.33 ≈ 3,325 tokens
      • Output Tokens: 500 words × 1.33 ≈ 665 tokens
    2. Calculate Cost (using different hypothetical rates: Input @ $0.03/1k tokens, Output @ $0.06/1k tokens):
      • Input Cost = (3,325 / 1000) * $0.03 = $0.09975
      • Output Cost = (665 / 1000) * $0.06 = $0.0399
      • Total Cost = $0.09975 + $0.0399 = $0.13965
    3. Result: Cost to summarize ≈ $0.14.

Example 3: Content Generation

  • Task: Generate a 2,000-word article
  • Input Prompt: 100 words
  • Calculation Steps:
    1. Estimate Tokens:
      • Input Tokens: 100 words × 1.33 ≈ 133 tokens
      • Output Tokens: 2,000 words × 1.33 ≈ 2,660 tokens
    2. Calculate Cost (using same hypothetical rates as Ex 2: Input @ $0.03/1k, Output @ $0.06/1k):
      • Input Cost = (133 / 1000) * $0.03 = $0.00399
      • Output Cost = (2,660 / 1000) * $0.06 = $0.1596
      • Total Cost = $0.00399 + $0.1596 = $0.16359
    3. Result: Cost to generate article ≈ $0.16.

Example 4: Translation of a Document

  • Document Length: 5,000 words (approx. 10 pages)
  • Assumption: Output word count is roughly similar to input.
  • Calculation Steps:
    1. Estimate Tokens:
      • Input Tokens: 5,000 words × 1.33 ≈ 6,650 tokens
      • Output Tokens: Assume approx. 6,650 tokens
    2. Calculate Cost (using a single hypothetical rate of $0.002 per 1k tokens for simplicity):
      • Total Tokens = 6,650 + 6,650 = 13,300 tokens
      • Total Cost = (13,300 / 1000) * $0.002 = $0.0266
    3. Result: Cost to translate ≈ $0.027.
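
All four examples follow the same two-step recipe (estimate tokens from word counts, then apply the per-1k rates), so they can be reproduced with a short script. Everything here, including the rates, mirrors the illustrative numbers above and carries the same pricing disclaimer.

```python
# Reproduces the four illustrative examples: word counts -> token
# estimates -> cost, using the same hypothetical rates as the text.

def words_to_tokens(words: int) -> int:
    """Rule-of-thumb conversion: 1 word ≈ 1.33 tokens, rounded."""
    return round(words * 1.33)

def call_cost(input_tokens: int, output_tokens: int,
              in_rate: float, out_rate: float) -> float:
    """Per-call cost with separate per-1k input/output rates."""
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

if __name__ == "__main__":
    # Example 1: chatbot turn, flat $0.002/1k both ways.
    print(call_cost(words_to_tokens(15), words_to_tokens(25), 0.002, 0.002))
    # Example 2: summarization, $0.03/1k input, $0.06/1k output.
    print(call_cost(words_to_tokens(2500), words_to_tokens(500), 0.03, 0.06))
    # Example 3: article generation, same rates as Example 2.
    print(call_cost(words_to_tokens(100), words_to_tokens(2000), 0.03, 0.06))
    # Example 4: translation, flat $0.002/1k, output ≈ input length.
    t = words_to_tokens(5000)
    print(call_cost(t, t, 0.002, 0.002))
```

Running a script like this against your own expected traffic (calls per day × tokens per call) is an easy way to turn per-call costs into a monthly budget estimate.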

Beyond Per-Token Costs

While token-based pricing for API calls is common, keep in mind other potential costs when budgeting for AI integration:

  • Fine-tuning: If you need to adapt a pre-trained model to your specific data or task, there are typically costs associated with the training process itself, and potentially higher ongoing costs for hosting the customized model.
  • Dedicated Capacity / Provisioned Throughput: For applications requiring guaranteed performance levels or very high volume, some providers offer dedicated instances of models, often billed hourly or monthly with committed usage, rather than strictly per token.
  • Platform Fees: If accessing LLMs through a third-party platform or software that bundles AI features, there might be subscription fees separate from or in addition to the underlying token usage costs.
  • Integration & Development: Don’t forget the internal or external costs associated with the initial development work required to integrate the LLM API into your existing software or workflows.

Understanding these cost components allows for more realistic budgeting and ROI calculation when considering LLM adoption for your SME.

Activity: “Applying the AI Evaluation Framework”

Introduction: You've just reviewed a comprehensive set of questions (in section 2.2.1) to ask when evaluating AI features or solutions for your business. But how do these questions apply in practice? Theory is one thing, but real-world scenarios bring the evaluation process to life.

This activity will walk you through two common situations where an SME might consider adopting AI. For each scenario, take a moment to think about which evaluation questions would be most critical before reading the expert commentary. This will help solidify your understanding of how to apply the framework effectively. Let's look at the first scenario.

Scenario 1: The E-commerce Retailer and the AI Chatbot

Business Context:

Imagine you run a small but growing online shop selling specialized goods (e.g., artisanal products, hobby supplies). Your small team handles all aspects, including customer service via email and website chat.

Challenge:

You're receiving a high volume of repetitive customer inquiries about order status, shipping details, product specifications, and return policies. Answering these takes up significant time, delaying responses to more complex questions or sales opportunities. You also lose potential customers who ask questions outside of your limited business hours.

Proposed AI Solution:

Your e-commerce platform provider is promoting a new, integrated AI chatbot feature. They claim it uses Natural Language Processing (NLP) to understand customer questions and can instantly answer over 70% of common inquiries 24/7, pulling information directly from your product catalog and order system. They offer it as a monthly add-on subscription.

Applying the Framework - Your Turn to Think:

Review the evaluation questions listed in section 2.2.1. For this specific scenario, which 3–4 categories of questions do you think are the most critical to investigate thoroughly before subscribing to this AI chatbot? Why are those particular areas so important here?

(Consider your answer, then click Next to see the expert commentary.)

Expert Commentary for Scenario 1

This is a very common situation for growing online businesses! While an AI chatbot promises significant time savings and improved customer service availability, a thorough evaluation is key before committing. Based on the framework in section 2.2.1, here are the areas experts would typically prioritize investigating for this type of integrated chatbot:

  1. Accuracy and Performance: This is paramount. Don't just rely on the vendor's "70% success" claim. Ask for specific performance metrics (like actual correct answer rates, escalation rates). How was this measured? Request validation data or ideally, a trial. An inaccurate chatbot can frustrate customers more than a delayed human response.
  2. Data Privacy and Security / GDPR Compliance: The chatbot will likely access customer names, order history, and potentially contact details (personal data). It's crucial to understand exactly what data it uses, how it's stored, who has access, and how it complies with GDPR. Ask about data minimization, encryption, and how data subject rights are handled. Failure here carries significant legal and reputational risk.
  3. Integration and Implementation: How truly "integrated" is it? Does it seamlessly pull real-time order status and product info? What technical effort is needed on your side? Will it conflict with other website plugins? A clunky integration can create more problems than it solves.
  4. Transparency and Explainability: While deep explainability might be complex, you need some level of understanding. If the bot provides incorrect information (e.g., wrong shipping cost), can the vendor explain why? More importantly, can you easily override or correct the bot's knowledge base? Lack of control or transparency can be problematic.
  5. Support and Maintenance: What happens when the bot fails or encounters questions it wasn't trained on? What level of support does the vendor provide for troubleshooting and updating the bot's capabilities or knowledge? Ensure there's a clear process for handling issues and ongoing improvements.

While other areas like cost (Licensing) are obviously important, focusing on these five ensures the core functionality is effective, secure, compliant, and manageable for your SME.

Scenario 2: The Food Distributor and the AI Forecasting Tool

Business Context:

Picture yourself running a local distribution business supplying fresh produce or specialty food items to restaurants and cafes. Managing inventory effectively is crucial due to the perishable nature of your goods.

Challenge:

You constantly struggle with balancing inventory levels. Sometimes you run out of popular items, leading to lost sales and unhappy clients. Other times, you overstock, resulting in costly spoilage and waste. Your current forecasting method relies heavily on past orders and gut feeling, which isn't proving reliable enough as your business and product range grow.

Proposed AI Solution:

You're evaluating a specialized software solution for food distributors that includes an AI-powered demand forecasting module. The vendor claims the AI analyzes your historical sales data, accounts for seasonality, and potentially even local events or weather patterns (if data is available) to provide significantly more accurate weekly or daily stock level recommendations for each item, helping you optimize ordering.

Applying the Framework - Your Turn to Think:

Look back at the evaluation questions in section 2.2.1. In this situation, trying to forecast demand for perishable goods, which 3-4 categories of questions would be your top priority to discuss with this vendor? What specific information would you need to feel confident in this AI tool?

(Consider your answer, then click Next to see the expert commentary.)

Expert Commentary for Scenario 2

Using AI for demand forecasting, especially with perishables, offers huge potential but also carries risks if not implemented correctly. For the food distributor considering this AI module, here are the critical evaluation areas experts would focus on:

  1. Accuracy and Performance Metrics: This is non-negotiable. Vague claims of "more accurate" aren't enough. Ask exactly how accuracy is measured (e.g., Mean Absolute Percentage Error - MAPE, bias). What specific metrics demonstrate improvement over your current methods? Request case studies with quantifiable results (e.g., reduction in spoilage %, reduction in stockout %). Even small forecasting errors can be costly with perishables.
  2. Data Quality and Requirements: The AI model is entirely dependent on the quality and relevance of the data fed into it. Critically evaluate if your historical sales data is clean, complete, granular enough (e.g., daily sales per item), and sufficient in volume for the AI to learn meaningful patterns. Ask the vendor detailed questions about the specific data requirements. Remember: garbage data in, garbage forecast out.
  3. Integration and Implementation: How smoothly does this module connect with your existing Point-of-Sale (POS), inventory management, and ordering systems? Will data flow automatically, or does it require manual export/import? Seamless integration is vital for efficiency and accuracy.
  4. Vendor Experience and References: Has this vendor successfully implemented this specific AI forecasting tool for other businesses in the food distribution sector, particularly those dealing with perishables? Ask for references you can talk to. Generic forecasting models might not capture the nuances of your industry (short shelf lives, specific seasonal peaks, impact of promotions).
  5. Customization and Control: Can the AI model be customized or fine-tuned based on factors unique to your business (e.g., specific supplier lead times, known local events impacting demand, different perishability rates for various products)? Do you have any control over key assumptions or parameters used by the AI, or is it a complete "black box"?

Other factors like Data Ownership, Scalability, and Support are also important, but ensuring the tool is genuinely accurate for your specific context, uses good data, integrates well, and comes from a vendor with relevant experience are the absolute priorities here.

Applying Your Knowledge

These scenarios illustrate why a structured evaluation approach, like the one outlined in section 2.2.1, is so important. The critical questions will vary depending on the specific AI application and your business context, but thinking through these areas systematically helps you move beyond vendor claims and make informed decisions.

Remember, successful AI adoption isn't just about the technology; it's about choosing the right technology and ensuring it aligns with your data, processes, compliance needs, and strategic goals.