Which AI tool (aka LLM) should I pick for my next task?

In the growing world of AI models, there’s an exciting variety of tools and platforms that are changing the way we interact with technology. OpenAI’s GPT series, including ChatGPT, has been making waves with its uncanny ability to generate human-like text across various tasks. Models like PaLM 2, developed by Google, have turned everyone’s heads, not just for improving on English but for nailing performance across a wide range of languages. And let’s not forget contenders like Gemini, which are bringing fresh ideas to the natural language processing game.

ChatGPT, with its sleek architecture and remarkable performance, promises to push the boundaries of what’s possible in AI-generated content. Meanwhile, LLaMA 3.1’s bold advancements in understanding and generating text prove just how many different approaches are thriving in this space. The pace of innovation is dizzying, and with the release of GPT-4o and GPT-4o mini, as well as their competitor LLaMA 3.1 405B, the horizon of possibilities expands even further.

If you’ve checked out our previous post on the basics of prompt engineering, consider this blog as an extension of it, because to truly master this skill, you need to get hands-on with LLMs, testing and refining your prompts as you go.

In this blog, we will break down what LLMs are, how they work, the different types of LLMs out there, and which ones are best for specific tasks.

LLM = a librarian? 

To understand prompt engineering, first ask yourself: what exactly are these Large Language Models we keep mentioning?

Let us guide you through it with an example. Imagine you’ve got a librarian. Not just any librarian, but one who’s memorized every single book in the library. You ask for information, and the librarian can pull out the specific fact or passage you need. But here’s the catch: this librarian has never left the library. They have no real-world experience and no personal insights, and can only provide information based on what they’ve read in the books. This example paints a picture of how LLMs (Large Language Models) work.

LLMs like ChatGPT are trained on massive amounts of text data. They don’t “know” things in the same sense that people do, but they’re really effective at replicating human-like language based on all the patterns they’ve seen. So, if you ask an LLM to write something that reads like a scientific study, and it has been trained on a large amount of academic material, it will produce something that appears credible. But don’t be fooled: LLMs, like the librarian who only knows what’s in the books, lack real-life experience and perspectives. They are not “thinking” the way humans do; they are simply very good at predicting which word comes next based on what they have learned. Simple as that.
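To make “predicting what comes next” a bit more concrete, here is a toy sketch in Python. It is nowhere near a real LLM (those use neural networks trained on billions of words), and the tiny corpus and function names are made up purely for illustration. It just counts which word tends to follow which, then guesses the most frequent follower.

```python
from collections import Counter, defaultdict

def train_bigram_counts(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    candidates = followers.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# A tiny made-up "library", for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept ."
model = train_bigram_counts(corpus)

print(predict_next(model, "the"))     # cat
print(predict_next(model, "dragon"))  # None
```

Notice that the toy model has nothing to say about “dragon”: like the librarian, it can only work with what was in its books.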

The variety of LLMs and their unique features

Now that we’ve uncovered what LLMs are, let’s go deeper and explore the wide range of models out there. Large Language Models are taking over the AI game, and these days the competition between them is wild. Everyone from OpenAI to Google, Meta and X is flexing their AI muscles to create the best LLM for different tasks. But the question remains: which model should I use, and for what?

So, whether you’re diving into advanced coding or just trying to keep up with the AI arms race, it helps to know where each model’s strengths lie. Let us break down the battle between the most popular LLMs of 2024: ChatGPT 4 by OpenAI, Claude 3 by Anthropic, PaLM 2 by Google, Gemini by Google DeepMind, and LLaMA 3.1 by Meta.

ChatGPT 4 aka “the heavyweight champion”

ChatGPT 4, with a context window of 8k tokens in its base version (the context window is how much text the model can “read” at one time), is the gold standard for versatility and creative problem-solving. It’s designed for deep, complex tasks like long-form content generation and technical reasoning.

ChatGPT 4 is hard to beat when it comes to handling a variety of tasks, from natural language understanding to coding. But hey, quality has its price: ChatGPT 4 is one of the most expensive LLMs out there. And when it comes to speed, it is not the fastest in the race. It’s designed more for quality than quick turnarounds, and it’s slower than models like Claude 3 and LLaMA 3.1. Bottom line: if you’re after the best of the best in depth and reasoning, ChatGPT 4 is your go-to, but be ready to pay for it.

Claude 3 aka “the ethical AI wizard”

Anthropic, the company behind Claude, emphasizes safety and developed Claude 3 with a strong focus on ethical AI principles. Its 200k token context window is like reading an entire book in one go, which makes it perfect for analyzing long documents or massive datasets. The Claude 3 Sonnet model, in particular, is optimized for speed and can process tasks faster than both its predecessor Claude 2 and, in some cases, ChatGPT 4. However, it’s important to note that while it’s great at keeping things safe and speedy, its accuracy in certain tasks (especially those involving complex reasoning) may still lag behind ChatGPT 4. To wrap things up, if speed, safety, and ethical AI are your top priorities, Claude 3 is your best bet, but if you’re looking for a deep dive into complex reasoning, ChatGPT 4 might still have the upper hand.
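What does the gap between an 8k and a 200k context window mean in practice? Here is a rough sketch that checks whether a piece of text would fit. The window sizes are the ones quoted in this post; the “about 4 characters per token” rule is only a common rule of thumb for English text, and the function names and the 1,000 tokens reserved for the answer are our own assumptions. For real work, use the tokenizer that ships with the model you pick.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text (an assumption)

# Context window sizes as quoted in this post.
CONTEXT_WINDOWS = {
    "ChatGPT 4": 8_000,
    "Claude 3": 200_000,
}

def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def models_that_fit(text, reserved_for_answer=1_000):
    """List the models whose window holds the text plus room for a reply."""
    needed = estimate_tokens(text) + reserved_for_answer
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]

short_note = "word " * 500      # about 625 estimated tokens
long_report = "word " * 50_000  # about 62,500 estimated tokens

print(models_that_fit(short_note))   # ['ChatGPT 4', 'Claude 3']
print(models_that_fit(long_report))  # ['Claude 3']
```

The short note fits everywhere, while the long report only fits the larger window. That is exactly the kind of task where context size should drive your choice.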

PaLM 2 aka “the multilingual professor”

If your work crosses linguistic barriers, this model is for you. PaLM 2 excels at understanding and generating text in multiple languages, making it a go-to for projects that need strong translation capabilities or deal with non-English content. LLaMA 3.1 might flex its 405 billion parameters, but PaLM 2 steals the show in multilingual tasks, supporting over 100 languages and crushing it in cross-linguistic benchmarks. Plus, it’s more affordable than heavyweights like ChatGPT 4, making it a great option for those who need high-quality performance without breaking the bank. It may not lead the way in deep reasoning or coding, but PaLM 2 is your budget-friendly, multilingual AI buddy that gets the job done.

LLaMA 3.1 aka “the open-source diamond”

For those who want control, flexibility, and transparency, LLaMA 3.1 is an open-source gem. With models scaling up to 405 billion parameters, it's perfect for research and development environments where customization is key. While it doesn't quite reach the creative heights of ChatGPT 4 or the speed of Claude 3, LLaMA 3.1 is a solid performer, particularly in coding tasks and general reasoning. Plus, Meta is constantly improving LLaMA, with a multimodal version in the works, potentially making it a strong player in future creative and visual tasks.

Gemini aka “the multimodal guy”

Finally, we’ve got Gemini, Google DeepMind’s latest creation, which blends text and image processing like a pro. It’s multimodal, meaning it can handle text, images, video, and even audio, which opens up some seriously cool possibilities for both creative and technical projects. It excels in raw processing power, outperforming ChatGPT 4 in several benchmarks, and its integration with Google’s tools makes it ideal for enterprise tasks, particularly those that require web-based searches or multimodal reasoning. Plus, with all the customization options at your fingertips, Gemini is a versatile pick for complex, multi-faceted projects.

Final thoughts

So, which LLM should you pick? Well, that depends on what you value and need most. ChatGPT 4 is your best bet for quality and depth, but be ready to pay up. Claude 3 is your ethical, speedy solution that won’t break the bank, while PaLM 2 shines in multilingual tasks and gives you fast results without cutting corners on quality. LLaMA 3.1 is a great, budget-friendly research tool. And Gemini? That’s your creative and multimodal guy, ready to tackle tasks from coding to arithmetic and general reasoning.
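If you like your cheat sheets in code, the summary above can be sketched as a simple lookup. The priority labels and the function name are invented for this example, and the mapping only mirrors the opinions in this post; it is not a benchmark result.

```python
# Priority -> model, mirroring the summary in this post (labels are made up).
RECOMMENDATIONS = {
    "depth": "ChatGPT 4",
    "speed_and_safety": "Claude 3",
    "multilingual": "PaLM 2",
    "open_source": "LLaMA 3.1",
    "multimodal": "Gemini",
}

def pick_model(priority):
    """Return this post's suggested model for a given top priority."""
    try:
        return RECOMMENDATIONS[priority]
    except KeyError:
        options = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(
            f"Unknown priority {priority!r}; choose one of: {options}"
        ) from None

print(pick_model("multilingual"))  # PaLM 2
print(pick_model("multimodal"))    # Gemini
```

Real projects usually juggle several priorities at once (say, cost and context size), so treat a table like this as a starting point rather than a verdict.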

To sum it up, although we have covered just a few LLMs in this post, the main point is this: there’s no “one-size-fits-all” LLM. It’s all about finding the right model for the job. Once you’ve chosen yours, don’t forget to check out our blog on prompt engineering to get the best efficiency and results from it.