In this episode of the podcast, members of the InfoQ editorial staff and friends of InfoQ will be discussing the current trends in the domain of AI, ML and Data Engineering.
One of InfoQ's regular features is its trends reports, each of which focuses on a different aspect of software development. These reports give InfoQ readers and listeners a high-level overview of the topics to pay attention to this year.
The InfoQ AI, ML, and Data Engineering editorial team met with external guests to discuss the trends in the AI and ML areas and what to watch for over the next 12 months. This podcast is a recording of that discussion, in which the panelists explore how innovative AI technologies are disrupting the industry.
Easily build, operate, and secure distributed applications that are resilient to failure, highly efficient, and able to operate at any scale with Akka by Lightbend.
Minimize expenses, improve operational efficiency, and achieve smooth integration from cloud to edge by leveraging Akka! Discover more at Lightbend.com/akka.
Srini Penchikala: Welcome, everyone, to the 2024 AI and ML Trends Report podcast. Warm greetings from InfoQ's AI, ML, and Data Engineering team. In today’s episode, we’re joined by two remarkable guests to discuss this year’s trends in AI and ML technologies. I’m Srini Penchikala, the lead editor for AI, ML, and Data Engineering at InfoQ, and I’ll be moderating our discussion. Let’s start by welcoming our guests. First, we have Namee Oberst. Hello Namee, we appreciate you being here. Could you please introduce yourself and share what you’ve been focusing on recently?
Namee Oberst: Hello, thank you for the warm welcome. It’s a pleasure to join you today. I’m Namee Oberst, founder of LLMware, an open-source library dedicated to developing LLM-based applications for RAG and AI agents, utilizing small, specialized language models. Our platform also includes over 50 fine-tuned models available on Hugging Face.
Srini Penchikala: Thank you, Namee. Mandy, it’s great to have you with us too. Could you introduce yourself to our listeners?
Mandy Gu: Hi, thanks so much for having me. I’m super excited. My name is Mandy Gu, and I lead machine learning engineering and data engineering at Wealthsimple. Wealthsimple is a Canadian FinTech company helping over 3 million Canadians achieve their version of financial independence through our unified app.
Srini Penchikala: Next up, Roland.
Roland Meertens: Hey, I’m Roland, leading the datasets team at Wayve. We make self-driving cars.
Srini Penchikala: Anthony, how about you?
Anthony Alford: Hi, I’m Anthony Alford. I work as a director of software development at Genesis Cloud Services.
Srini Penchikala: And Daniel?
Daniel Dominguez: Hello, I’m Daniel Dominguez. I manage an offshore company focusing on cloud computing within the AWS Partner Network. Additionally, I am an AWS Community Builder, and I also work in machine learning.
Srini Penchikala: Thank you everyone. Let’s begin. Today, we’ll delve into the AI and ML landscape, discussing our current position and, crucially, our future direction, particularly given the rapid evolution of AI technology since last year’s trends report. Before we kick off today’s podcast topics, just a bit of housekeeping for our listeners. There are two primary aspects to these reports. The first is this podcast, which offers a chance to hear from a panel of experts about how cutting-edge AI technologies are revolutionizing the industry. The ensuing part is a detailed article available on the InfoQ website. It will present a trends graph indicating various stages of technology adoption and elaborate on the technologies that have been introduced or revised since the previous year’s report.
I recommend everyone to definitely check out the article as well when it’s published later this month. Now back to the podcast discussion. It all starts with ChatGPT, right? ChatGPT was introduced about a year and a half ago, at the end of 2022. Since that time, the fields of generative AI and large language model technologies have been advancing rapidly, showing no signs of slowing down. Consequently, all major tech players have been actively rolling out their AI solutions. For instance, at this year’s Google I/O Conference, Google unveiled significant updates including Google Gemini and enhancements to AI-driven search functionalities, expected to revolutionize traditional search methods. Meanwhile, OpenAI launched GPT-4o around the same time, a versatile omni model capable of processing audio, vision, and text simultaneously—a truly multi-modal system.
Simultaneously, Meta introduced Llama 3, followed by the more recent Llama version 3.1, equipped with 405 billion parameters—a staggering increase reflecting the rapid growth in model complexities. Open-source projects such as Ollama are also gaining significant traction. The spotlight remains on large language models (LLMs), trained on extensive datasets, enabling them to comprehend and generate human-like text and support diverse tasks — a perfect subject to start this year’s trend report discussions. Anthony, having tracked LLM advancements closely, could you share insights on the current state of generative AI and LLMs, including key developments our listeners should be aware of?
Anthony Alford: Absolutely. If I had to encapsulate the essence of LLMs in a single term, it might be “more”, or perhaps “scale”. Presently, we’re in an era dominated by LLMs and foundation models, with OpenAI likely at the forefront. Other significant contributors play crucial roles as well, such as Google with its Gemini models and Anthropic with Claude, though those models are generally proprietary. By contrast, Meta appears to be pushing the landscape toward more openness. Notably, Mark Zuckerberg recently declared, “The future of AI is open.” Hence, Meta and Mistral have ventured to make their models’ weights accessible. Although OpenAI doesn’t share its weights, it has published extensive technical details in the past. For instance, the original GPT-3 had 175 billion parameters; specific details for its successor, GPT-4, remain undisclosed, but the prevailing trend suggests a considerable escalation in parameters, dataset size, and compute budgets.
Another persistent trend likely to continue is the initial, vast pre-training, essentially using the entire web as a dataset, followed by precise fine-tuning, a breakthrough originally popularized by ChatGPT. This training paradigm, known as instruction tuning, is now widely adopted. Transitioning to another significant area of development, the context length, meaning the amount of data a model can take into account when producing a response, continues to expand. This contrasts with newer state space models (SSMs) like Mamba, which are potentially unaffected by context-length limitations. Mandy, any input on this?
Mandy Gu: Yes, I think that’s definitely a trend we’re seeing with longer context windows. When ChatGPT and LLMs first got popularized, this was a shortcoming a lot of people brought up: it was harder to use LLMs at scale, or to get “more” out of them as you put it, when we had restrictions on how much information we could pass through them. Earlier this year, Gemini, one of Google’s foundational models on GCP, introduced the one-million-plus token context window, and that was a game changer because we’d never had anything close to it before. I think it has sparked a trend where other providers are trying to create similarly long or longer context windows. One of the second-order effects we’re seeing is around accessibility: it has made complex tasks such as information retrieval a lot simpler. Whereas in the past we would need a multi-stage retrieval system like RAG, now it’s easier, although not necessarily better, to just pass all that context into the one-million-plus token window. So that’s been an interesting development over the past few months.
Anthony Alford: Namee, did you have anything to add there?
Namee Oberst: Well, we specialize in using small language models. I understand the value of longer context windows, but we’ve actually performed internal studies, and there have been various experiments by popular folks on YouTube too, where you take even a 2,000-token passage, pass it to a lot of the larger language models, and they’re really not that good at finding targeted information; they run into the lost-in-the-middle problem with long passages. So if you really want to do targeted information search, I feel the longer context windows are sometimes a little misleading to users, because they make you feel like you can dump in everything and find information with precision and accuracy. I don’t think that’s the case at this point. So I think a really well-crafted RAG workflow is still the answer.
And for all intents and purposes, even if it’s a million-token context length, or 10 million, if you look at the scale of the number of documents that an enterprise has in an enterprise use case, it probably still doesn’t move the needle. But for a consumer use case, yes, a longer context window for very quick and easy information retrieval is probably very helpful.
Anthony Alford: So, you’re suggesting there might be diminishing returns in some cases?
Namee Oberst: Absolutely, it greatly depends on the scenario. Take the situation where you have to analyze thousands of documents. In cases like these, an extensive context window doesn’t really add value. Numerous studies have illustrated that LLMs are not really efficient search tools for pinpointing exact details. Personally, I would not typically recommend using longer context LLMs over something like RAG. There are better methods available for information retrieval. However, the advantage of a longer context window becomes apparent when dealing with large documents that exceed the scope of a smaller context window. For instance, transforming a lengthy Medium article into a detailed white paper is something I find highly effective and a strong use case for LLMs.
Anthony Alford: Speaking of RAG, or retrieval augmented generation, could we delve deeper into its benefits? It appears to tackle the issue of context length effectively.
Namee Oberst: Definitely, I strongly support using it. Looking at the options Hugging Face offers, along with benchmark tests, the performance is outstanding. The rate of innovation in open-source models is equally remarkable. Taking into account something like GPT-4o’s inference speed and its broad utility, it’s clear it has significant potential to serve diverse needs effectively.
But if you’re looking at an enterprise use case where you have very specific workflows and you’re looking to solve a very targeted problem, such as automating a specific workflow or automating report generation, or enriching information retrieval within a set framework of documents, open source models or smaller, specialized language models can be tailored and utilized effectively. These models can be operated with enhanced privacy and security on enterprise private clouds and can be deployed on edge devices as well. The potential of smaller, specialized models for specific tasks is significant.
Srini Penchikala: Yes, I experimented with an open source solution called Ollama a few months back for a specific application and found that self-hosted models, which do not require sending data to the cloud, are very effective. Using self-hosted models with Retrieval-Augmented Generation (RAG) techniques, which are particularly useful for managing proprietary information, is becoming increasingly popular in the corporate sector. This approach allows companies to leverage powerful tools without compromising data security.
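For listeners who want a concrete picture of the self-hosted setup described above, here is a minimal sketch that queries a locally running Ollama server over its default REST API. The model name, prompt, and port are placeholder assumptions; it presumes the model has already been pulled with Ollama.

```python
# Minimal sketch: query a self-hosted model served by Ollama on its default port.
# Assumes Ollama is running locally and a model (e.g. `ollama pull llama3`) is available;
# the model name below is illustrative.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_local_model("Summarize retrieval-augmented generation in one sentence."))
```

Because nothing leaves the machine, this pattern pairs naturally with the proprietary-data RAG use cases mentioned above.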
Roland Meertens: I believe that currently many companies begin by implementing solutions from OpenAI to demonstrate their business utility. Afterwards, they explore how to integrate these technologies into their applications. It’s advantageous that one can start swiftly with existing solutions and later build a tailored infrastructure to better support their specific needs.
Srini Penchikala: Exactly, Roland. It’s about scaling up efficiently, right? Identifying and utilizing the most effective scale-up model is crucial for success.
Roland Meertens: Yes.
Srini Penchikala: Yes. Let’s continue the LLM discussion, right? Another area is multi-modal LLMs: the GPT-4o model, the omni model. I think it definitely takes LLMs to the next level. It’s not just about text anymore; we can use audio or video or any of the other formats. So does anyone have any comments on GPT-4o or multi-modal LLMs in general?
Namee Oberst: In preparation for today’s podcast, I actually did an experiment. I have a subscription to GPT-4o, so I put in a couple of prompts this morning out of curiosity; we’re very text-based, so I don’t actually use that feature much. I asked it to generate a new logo for LLMware using the word itself, and it failed three times; it botched the word “LLMware” every single time. Having said that, I know it’s really incredible and they’re making fast advances, but I was trying to see where they are today, and it wasn’t great for me this morning. Of course they’re probably still better than anything else out there, before anybody comes for me.
Roland Meertens: In terms of generating images, I must say I was super intrigued last year by how good Midjourney was and how fast they were improving, especially given the small size of the company. That a small company can just beat out the bigger players by having better models is fantastic to see.
Mandy Gu: I think that goes back to the theme Namee was touching on, where big companies like OpenAI are very good at generalization and very good at getting new people into the space, but as you get deeper, you find that, as we always say in AI and machine learning, there’s no free lunch. You explore, you test, you learn, and then you find what works for you, which isn’t always one of these big players. For us, where we benefited the most internally from the multi-modal models is not image generation but the OCR capabilities. One very common use case is just passing in images or files and then being able to converse with the LLM about the images in particular. That has been the biggest value proposition for us, and it’s really popular with our developers, because a lot of the time when we’re helping our end users or our internal teams debug, they’ll send us a screenshot of the stack trace or a screenshot of the problem, and being able to just throw that into the LLM, as opposed to deciphering the message ourselves, has been a really valuable time saver.
So not so much image generation, but from the OCR capabilities, we’ve been able to get a lot of value.
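As a rough illustration of the screenshot-to-LLM workflow Mandy describes, the sketch below sends a base64-encoded image to a vision-capable chat model with the OpenAI Python client. The file name, question, and model choice are placeholder assumptions rather than Wealthsimple’s actual setup.

```python
# Minimal sketch: ask a multi-modal model about a screenshot (e.g. a stack trace).
# Assumes the `openai` package and an OPENAI_API_KEY in the environment;
# "stack_trace.png" is a placeholder file name.
import base64
from openai import OpenAI

client = OpenAI()

with open("stack_trace.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What error is shown here, and what is the likely cause?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```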
Srini Penchikala: That makes sense. When you take these technologies, OpenAI or anyone else, it’s not a one-size-fits-all when you introduce the company use cases. So everybody has unique use cases.
Daniel Dominguez: I think it’s interesting; we mentioned the Hugging Face libraries and models, and looking at Hugging Face right now, there are more than 800,000 models. So it will definitely be interesting to see how many new models are out there next year. Right now the trending ones are, as we mentioned, Llama, Google’s Gemma, the Mistral models, and the Stability models. So in one year, how many new models are going to be out there, not only for text but also for images and video? It would be interesting to know how many models there were last year, but the number of new models in this space next year will be an interesting one to watch.
Srini Penchikala: Yes, good point, Daniel. Just as application servers emerged about 20 years ago, I believe many current systems will undergo consolidation, leaving only a few to stand the test of time. Now, let’s discuss RAG, which you mentioned. It appears to be a prime opportunity for businesses to integrate their data, whether stored locally or in the cloud, to leverage LLM models for insights. Can you envision any real-world applications of RAG that might interest our listeners?
Mandy Gu: Indeed, I see RAG as one of the most practical applications of LLMs at scale. Its effectiveness hinges on how the retrieval system is designed, allowing for a variety of implementations. At our company, we extensively use RAG through a proprietary tool that integrates our self-managed LLMs with all corporate knowledge bases. We utilize Notion for documentation, GitHub for code, and combine these with public records from our help center and other platforms.
We’ve established a retrieval-augmented generation system that aggregates data from these sources into our vector database daily. This setup allows staff to query or command the system through a web application, significantly enhancing accuracy and relevance over traditional methods, like relying on the context window alone in models such as the Gemini 1.5 series. Primarily, this technology has significantly boosted employee efficiency, and it has given us several compelling use cases for RAG.
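A heavily simplified sketch of that ingest-then-retrieve pattern follows, using sentence-transformers for embeddings and an in-memory index standing in for a real vector database. The documents, model name, and prompt format are illustrative assumptions, not Wealthsimple’s actual stack.

```python
# Minimal RAG sketch: embed documents once, retrieve the top-k chunks per question,
# and pass only those chunks to the language model as context.
# A production system would use a real vector database and a scheduled ingestion job.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # a small, commonly used embedder

documents = [
    "Expense reports are submitted through the finance portal by the 5th of each month.",
    "The on-call rotation is defined in the engineering runbook under 'Operations'.",
    "New laptops are requested via the IT help desk page in Notion.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q                 # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

question = "How do I file an expense report?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to the self-hosted LLM
```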
Namee Oberst: Mandy, that is a fantastic example of a well-implemented enterprise project, harnessing the full potential of LLMs. You mentioned hosting these LLMs yourselves—did you use an open-source model? You need not delve into specifics, but such information would certainly highlight this exemplary application of generative AI.
Mandy Gu: Yes, indeed. We adopted open-source models, primarily from Hugging Face, when building our LLM platform to offer a secure and accessible way for our employees to engage with this technology. Initially we relied on OpenAI and implemented a PII redaction system to safeguard our sensitive data. However, feedback revealed that this redaction model hindered potent generative AI use cases involving sensitive information essential for everyday tasks. That feedback prompted us to shift focus from merely preventing sensitive information from being shared with external providers to enabling it to be shared safely with self-hosted LLMs.
Namee Oberst: That approach is truly remarkable, Mandy. It resonates with our objectives at LLMware, where we employ small language models at the back-end for efficient and secure inferencing. You mentioned Ollama, but we use Llama.cpp in our platform to facilitate easy handling of quantized models. I also envision a future where LLMs are downscaled to fit into laptops and private clouds, streamlining deployment and enhancing security. Your forward-thinking strategy is impressive and seems to set a benchmark in the industry.
Mandy Gu: I appreciate your insights, Namee. Discussing Llama.cpp, I find it fascinating how effectively a relatively small team has managed to implement this framework at scale. While the switch to more quantized models might lose some precision, the gains in speed and reduced latency are substantial, proving beneficial for rapid experimentation and deployment.
Namee Oberst: The achievements with Llama.cpp are indeed inspiring. It is optimized for Mac Metal and NVIDIA CUDA, and we aim to extend its utility across other runtimes like Microsoft’s ONNX and Intel’s OpenVINO, reaching beyond the predominantly Mac-dominated environments. Such advancements promise a future where deploying these technologies across varied platforms and maximizing GPU capabilities becomes more streamlined and impactful.
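For readers who want to try the quantized-model path discussed here, the following is a minimal sketch using the llama-cpp-python bindings. The GGUF file path, quantization level, and generation settings are placeholder assumptions.

```python
# Minimal sketch: run a quantized GGUF model locally via llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a downloaded GGUF file; the path below
# is a placeholder for whichever quantized model you use.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-instruct-q4.gguf",  # placeholder path
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "List three benefits of quantized models."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

The trade-off Mandy mentions is visible in the file you choose: more aggressive quantization shrinks the model and speeds up inference at some cost in precision.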
Srini Penchikala: Yes, there are plenty of interesting developments in this area. Both of you touched upon smaller language models and edge computing. Let’s delve deeper into that topic. While large language models are frequently discussed, I’m curious about your insights on smaller language models. Namee, you mentioned your company, LLMWare, is building RAG workflows designed specifically around smaller language models. Could you elaborate on this emerging field? I understand that companies like Microsoft are experimenting with what they call the Phi-3 model. How do these differ from other models? What should our listeners know to stay current with developments in small language models?
Namee Oberst: Absolutely, we’ve been at the forefront with small language models, dedicating significant time to them for over a year now—almost arriving too early to this trend. The reality is, RAG is not a brand-new concept; it has been part of the machine learning discourse for a few years now. Our journey with RAG commenced quite early at the company, leading us to innovate with smaller parameter models. The results were astonishing—these models are not only powerful but also provide substantial performance advantages while ensuring data safety and security, which is crucial. This is especially relevant for heavily regulated sectors, which was always at the forefront of my thinking, stemming from my experience as a corporate attorney and general counsel for a public insurance brokerage firm.
Thus, employing smaller models in such industries becomes quite straightforward, as they also offer notable cost benefits. The need for hefty models is diminishing as smaller models reduce both operational scale and expense. The latest developments, like Microsoft’s Phi-3 and the RAG-tuned models available on Hugging Face, are exemplary. We’ve rigorously compared these models using uniform datasets to fine-tune over twenty models, and surprisingly, Phi-3 has surpassed even 8 billion parameter models in our tests, leading in accuracy—an incredible feat.
The pace at which these small language models are being refined and made freely available on platforms like Hugging Face is truly stunning. They are evolving rapidly, and I firmly believe that they will soon become as ubiquitous as standard software solutions, perfectly suited for deployment on edge devices. This is an exhilarating prospect and underpins my earlier comment on the swift and significant progress in this domain.
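As a small illustration of how accessible these models have become, here is a minimal sketch that runs a small instruct model locally with the Hugging Face transformers pipeline. The model id microsoft/Phi-3-mini-4k-instruct, the prompt, and the generation settings are assumptions; any other small model on the Hub could be swapped in.

```python
# Minimal sketch: run a small language model locally with Hugging Face transformers.
# Assumes `pip install transformers torch accelerate`; the model id is an assumption
# and downloads on first use. Runs on CPU if no GPU is present.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Question: Is a contract valid without consideration? Answer yes or no, then explain briefly.\nAnswer:"
result = generator(prompt, max_new_tokens=120)
print(result[0]["generated_text"])
```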
Srini Penchikala: Yes, definitely. A lot of the use cases involve a combination of offline large language model processing and online, real-time analysis on the device, closer to the edge. That’s where small language models can help. Roland, Daniel, or Anthony, do you have any comments on small language models? What are you seeing in this space?
Anthony Alford: Yes, exactly. Microsoft’s Phi, or is it “fee”? I think first we need to figure out which one it is, but it’s definitely been making headlines. The other thing, and we have this on our agenda, Namee, you mentioned that they’re getting better. The question is: how do we know how good they are? How good is good enough? There are a lot of benchmarks. There are things like MMLU, there’s HELM, there’s the Chatbot Arena, there are lots of leaderboards, there are a lot of metrics. I don’t want to say people are gaming the metrics, but it’s like p-hacking, right? You publish a paper that says you’ve beaten some other baseline on this metric, but that doesn’t always translate into, say, business value. So I think that’s a problem that still needs to be solved.
Namee Oberst: Yes, no, I fully agree. Anthony, your skepticism around the public…
Anthony Alford: I’m not skeptical.
Namee Oberst: No, actually, I’m not crazy about them either. We’ve developed our own internal benchmarking tests that ask common-sense business-type questions and legal questions, just fact-based questions, because our platform is really for the enterprise. In an enterprise, you care less about creativity in this instance and more about how well these models can answer fact-based questions and basic logic, basic math, yes-or-no questions. So we created our own benchmark testing, and the Phi-3 result is from that, because I’m skeptical of some of the published results. I mean, have you actually looked through some of those questions, like on HellaSwag? I can’t answer some of them, and I’m not an uneducated person; I don’t always know what the right or wrong answer is either. So we decided to create our own testing, and the Phi-3 results we’ve been talking about are based on what we developed. And I’m not sponsored by Microsoft; I wish I were, but I’m not.
Srini Penchikala: Definitely. I want to get into LLM evaluation shortly, but before we go there, any other thoughts on small language models?
Roland Meertens: One thing which I think is cool about Phi is that they trained it using higher-quality data and also by generating their own data. For example, for the coding, they asked it to write instructions for a student and then trained on that data. So I really like seeing that if you have higher-quality data and you select your data better, you also get better models.
Anthony Alford: “Textbooks Are All You Need”, right? Was that the paper?
Roland Meertens: “Textbooks Are All You Need” is indeed the title of the publication, but there are also various contributions from Hugging Face affiliates on “SantaCoder: don’t reach for the stars!”. The focus now is on the type of data fed into these models, a somewhat neglected aspect of machine learning.
Srini Penchikala: Apart from Phi, which I believe is the right pronunciation, I recall Daniel touching upon TinyLlama. Could you elaborate more on these smaller-scale language models?
Daniel Dominguez: Absolutely. As Namee highlighted, a plethora of these smaller language models are operational on Hugging Face, presenting several new discoveries. Another intriguing aspect at Hugging Face is the distinction between “GPU-poor” and “GPU-rich” setups displayed on the leaderboard. Regardless of your system’s GPU strength, these models are functional. Notably, thanks to advancements from companies such as NVIDIA, these models efficiently operate not only in the cloud but also on less powerful local systems. This makes it feasible to run small language models on basic devices like an Arduino.
Srini Penchikala: Absolutely. Before we conclude our discussion on language models, I’d like to touch on something important—beyond the traditional benchmarks, which we sometimes take with skepticism, are there effective real-world practices for evaluating these models? With so many options available, how can newcomers to this field make informed decisions about which language models to use or discard?
Mandy Gu: It’s crucial to consider the business relevance when evaluating these models, as Anthony suggested earlier. General benchmark tests often don’t provide all the answers. Instead, evaluating a language model should involve testing its performance on specific applications. For example, if the task involves summarizing a research paper or simplifying complex terminology, the evaluation should focus on the model’s effectiveness in these particular areas. This approach aligns with the concept that no single model excels at every task. Through deliberate trial and error, we can identify the most suitable models or techniques for specific tasks, ultimately basing our judgments on the outcome’s success against the predefined criteria.
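A bare-bones sketch of the task-specific evaluation Namee and Mandy describe might look like the following: a small set of fact-based questions with known answers is run against any model-calling function and scored on exact-match accuracy. The test cases and the ask_model callable are illustrative placeholders, not anyone’s production benchmark.

```python
# Minimal sketch: a task-specific evaluation harness with lenient exact-match scoring.
# `ask_model` is any callable that takes a prompt and returns the model's text;
# the test cases stand in for a real, domain-specific question set.
from typing import Callable

TEST_CASES = [
    ("Is interest income generally taxable? Answer yes or no.", "yes"),
    ("What is 15% of 200? Answer with a number only.", "30"),
    ("Can a signed contract be amended without both parties' consent? Answer yes or no.", "no"),
]

def evaluate(ask_model: Callable[[str], str]) -> float:
    correct = 0
    for prompt, expected in TEST_CASES:
        answer = ask_model(prompt).strip().lower()
        if expected in answer:   # lenient check: the expected token appears in the answer
            correct += 1
    return correct / len(TEST_CASES)

# Example: score a trivial stand-in "model"; replace the lambda with a real model call.
print(f"accuracy = {evaluate(lambda p: 'yes'):.2f}")
```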
Srini Penchikala: We will include links to relevant benchmarks and leaderboards in our transcript. Now, let’s pivot to discussing AI agents, including developments in AI-powered coding assistants. Roland, can you share your observations in this area, particularly your experience with tools like Copilot?
Roland Meertens: Last year, I speculated that AI agents would become a prominent trend, but the progress hasn’t been overwhelming. Although OpenAI recently launched the GPT Store allowing the creation of customized agents, I haven’t seen widespread enthusiasm for any particular agent. However, there are emerging tools like Devin, an AI software engineer, which integrates a terminal, code editor, and browser to autonomously tackle assigned tasks. Despite a modest success rate, Devin represents a step forward in autonomous software engineering developments.
The other thing is that you’ve got some places like AgentGPT. I tried it out; I asked it to create an outline for the Trends in AI podcast and it was like, “Oh, we can talk about trends like CNNs and RNNs.” I don’t think those are the trends anymore, but it’s good that it’s excited about it. But yes, overall I think there’s still massive potential for “you want to do something, it gets done completely automatically.” Instead of me figuring out which email to send using ChatGPT, sending it, and the other person summarizing it and writing a reply with ChatGPT, why not take us out of the loop and have the emails fly automatically?
Anthony Alford: My question is, what makes something an agent?
Roland Meertens: Yes, that’s a good question. So I think what I saw so far in terms of agents is something which can combine multiple tasks.
Anthony Alford: When I was in grad school, I was studying intelligent agents and essentially we talk about someone having agency. It’s essentially autonomy. So I think that’s probably the thing that the AI safety people are worried about is giving these things autonomy. And regardless of where you stand on AI doom, it’s a very valid point. Probably ChatGPT is not ready for autonomy.
Roland Meertens: It depends.
Anthony Alford: Yes, but very limited scope of autonomy.
Roland Meertens: It depends on what you want to do and where you are willing to give your autonomy away. I am not yet willing to put an autonomous Roland agent into my workspace; I don’t think I would come across as very smart. Maybe it would be an improvement over my normal self, but I see people doing this for dating apps, for example, automating that part. Apparently they’re willing to take that risk.
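To make the agent idea in this exchange concrete, here is a stripped-down sketch of the usual tool-use loop: the model picks a tool, the program executes it, and the observation is fed back until the model produces an answer. The call_llm stub, the tools, and the message format are all placeholder assumptions rather than any specific framework’s API.

```python
# Minimal agent-loop sketch: the model chooses a tool, the program runs it, and the
# observation is appended to the history until the model returns a final answer.
def call_llm(history: list[str]) -> str:
    # Placeholder: a real implementation would send `history` to an LLM and expect
    # either "TOOL:<name>:<argument>" or "ANSWER:<text>" back.
    return "ANSWER:42" if any("observation" in h for h in history) else "TOOL:calculator:6*7"

TOOLS = {
    "calculator": lambda arg: str(eval(arg, {"__builtins__": {}})),  # toy only; never eval untrusted input
    "search": lambda arg: f"(pretend search results for '{arg}')",
}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        decision = call_llm(history)
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:")
        _, name, arg = decision.split(":", 2)
        history.append(f"observation from {name}: {TOOLS[name](arg)}")
    return "gave up"

print(run_agent("What is six times seven?"))
```

The autonomy question the panel raises lives in that loop: the max_steps cap and the fixed tool list are exactly the kind of guardrails that keep such an agent’s scope limited.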
Daniel Dominguez: As Roland said, they’re not yet the big wave, but definitely something is going to happen with them. For example, I saw recently that Meta and Zuckerberg said the new Meta AI agents for small businesses are going to help small business owners automate a lot of things in their own spaces. Hugging Chat also has a lot of AI agents for daily workflows. I know that Slack, for example, now has a lot of AI agents to help summarize conversations, tasks, or daily workflows.
As we continue to evolve the AI landscape, it’s becoming clear that AI agents will increasingly become a natural fixture in small business environments. These agents, integrated within daily workflows, will significantly streamline various tasks. Companies will start to include these specialty agents on their platforms, and soon services like Google might begin incorporating AI functionalities into applications like Gmail to assist with daily operations.
Roland Meertens: What makes these AI agents powerful is how easily they can integrate and automate functions. If you hit a certain point in the workflow, the AI can take one route; if not, it diverts to another. It’s about using all these tools together seamlessly.
Mandy Gu: There’s also a lot of convenience in embedding these AI agents into the workspaces where people already spend their time, like Google Workspace for Gmail. That integration means users don’t have to switch between applications to manage tasks, which reduces effort and increases efficiency. Reducing the switches between platforms can dramatically cut down on the work involved, which is a key driver for wider adoption of such technologies.
Srini Penchikala: Beyond managing tasks, these AI agents could also advise on communication methods, like choosing between sending an email or making a phone call, potentially enhancing productivity further.
Roland Meertens: Reflecting on trends, it seems that last year marked the era where every company declared themselves AI-driven, particularly emphasizing the adoption of chatbots. Anecdotes abound, including one from a colleague who let ChatGPT craft a detailed argument for them, spanning three pages. To me, the appeal of personalized arguments wanes; I’d much rather browse a website directly. This leads me to ponder the future dynamics: will every digital interaction be mediated by chatbot interfaces, or will there remain straightforward ways to, say, check a book’s price without conversational intermediaries?
Srini Penchikala: Surely, we must avoid too much reliance on automated agents in our applications, right?
Roland Meertens: Indeed, prudent advice would be not to let automated agents overrun your daily interactions.
Srini Penchikala: Absolutely, let’s proceed further. Anthony, your insights on AI safety were intriguing. Diving deeper into this subject, Namee and Mandy, you’re both immersed in numerous practical projects. How do you balance the pursuit of innovative solutions while ensuring that these technologies are secure and respect user privacy and data?
Mandy Gu: With the advent of generative AI, the realm of security is witnessing numerous indirect consequences, such as concerns about data privacy escalating due to fourth party data sharing. Many SaaS providers incorporate AI functionalities, often transferring data to entities like OpenAI without explicit clarity. The sensitivity of this data necessitates vigilance. Key considerations include establishing a detailed understanding of data flows, which is increasingly complex due to AI integrations, and ensuring that we pave the easiest path for our employees to follow stringent data privacy protocols.
Referring back to a previous discussion, implementing an overly stringent PII redaction system could deter usage, pushing individuals towards solutions like ChatGPT. Instead, providing appealing alternatives and incentives can foster easier compliance and promote a culture of robust data privacy practices among internal users.
Namee Oberst: Absolutely, Mandy. The model you outlined highlights how critical design is in enterprise generative AI workflows for ensuring data security. It’s imperative to consider when PII redaction should be triggered, the possibility of vendors inadvertently sharing sensitive data with providers like OpenAI, and the necessity of tracing data lineage. Ensuring that workflows are auditable, so that all data interactions can be monitored and the AI’s behavior explained, is crucial. Also essential is recognizing potential vulnerabilities, such as prompt injection, though smaller language models might be less susceptible. Enterprises must thoroughly evaluate these aspects when deploying AI, as these factors significantly impact data security and safety, reinforcing many of the points you raised, Mandy.
Mandy Gu: I appreciate your focus on attack surfaces, which can indeed become unmanageable. The analogy of generative AI to the cable versus streaming dynamic effectively illustrates the issue with multiple AI integrations. It’s akin to subscribing to Netflix, Hulu, and other services simultaneously, which is not just costly but also broadens the attack surfaces. This highlights the necessity for a thoughtful approach within the build vs. buy strategy, paying special attention to data management and acquisition costs.
Increasing awareness about these challenges is becoming evident. Vendors and SaaS providers are beginning to accommodate security concerns, indicating a shift towards hosting on private client platforms like AWS or GCP’s VPCs. This protects data by keeping it within a client’s cloud environment, showing a promising trend towards enhanced security awareness.
Namee Oberst: Absolutely.
Srini Penchikala: Security aside, managing LLMs and AI technologies in production is another crucial topic. Could we briefly discuss LangOps or LLMOps? Mandy, perhaps you could start the discussion on how we should be supporting LLMs in production and share some insights from your experiences?
Mandy Gu: Absolutely. At Wealthsimple, we approach our LLM strategy through three distinct avenues. The first focuses on augmenting employee productivity. The second involves refining our operations to better serve our clients. The third pillar is foundational, which we refer to as LLMOps, or the LLM platform work that supports the other two areas. Our philosophy fosters enablement, emphasizing security, accessibility, and optionality. This approach helps us avoid common pitfalls, such as using LLMs inappropriately, by providing versatile, reusable platform elements. This, in turn, encourages more organic adoption and adaptation of generative AI technologies.
Our journey has evolved with experience. Initially, we created an LLM gateway featuring an audit trail and a system for PII redaction, allowing safe interactions with OpenAI and similar services. Feedback indicated that the PII redaction limited many practical applications, prompting us to support self-hosted models. This shift enabled us to integrate open-source models into our platform for broader inferencing capabilities. We also developed a reusable API for retrieval processes and enhanced the accessibility surrounding our vector database. As we continued to enhance our platform components, our end users, including scientists and developers, began exploring and integrating LLMs into their workflows, which we then helped scale and deploy.
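A highly simplified sketch of the gateway pattern Mandy outlines: every request passes through one function that redacts obvious PII before it leaves the company, records an audit entry, and routes to either a self-hosted or an external model. The regex patterns, logging setup, and send_to_model stub are placeholders, not Wealthsimple’s implementation.

```python
# Minimal LLM-gateway sketch: redact obvious PII for external calls, keep an audit
# trail, and route the request to a self-hosted or external model. All patterns and
# routing details are illustrative.
import logging
import re

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("llm_gateway")

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "id_number": re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

def send_to_model(prompt: str, self_hosted: bool) -> str:
    # Placeholder: call the self-hosted endpoint or the external provider here.
    return f"(response from {'self-hosted' if self_hosted else 'external'} model)"

def gateway(user: str, prompt: str, self_hosted: bool = True) -> str:
    safe_prompt = prompt if self_hosted else redact(prompt)  # redact only for external calls
    audit_log.info("user=%s self_hosted=%s chars=%d", user, self_hosted, len(safe_prompt))
    return send_to_model(safe_prompt, self_hosted)

print(gateway("analyst", "Customer jane.doe@example.com asked about her statement.", self_hosted=False))
```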
Srini Penchikala: Thank you, Mandy. This conversation has been enlightening. As we conclude, I’d like to hear your thoughts on future AI developments. What predictions do you have for the AI landscape over the next year?
Mandy Gu: The current enthusiasm surrounding LLMs is likely to become more tempered. Although the recent period has seen explosive growth, many organizations still view LLMs as speculative investments. Over the next year, I anticipate a shift towards setting more realistic expectations and a broader integration of this technology into existing workflows, leading to less hype but more substantial, practical implementations.
Srini Penchikala: Daniel, how about you?
Daniel Dominguez: I believe that with the surge of data produced by artificial intelligence, we’ll see its integration with technologies like blockchain. Many blockchain projects already incorporate AI for data processing. The combination of artificial intelligence and blockchain is still emerging, but it promises significant advancements particularly in how data is managed and utilized in databases. It’s an area where we’re just scratching the surface, but I expect to see substantial developments in the fusion of AI and blockchain technology.
Srini Penchikala: What about you Roland?
Roland Meertens: My focus is on robotics, which is now evolving into what we call embodied AI. This shift has been unfolding since last year. Embodied AI, unlike traditional AI, interacts with the physical world. Imagine AI agents not just running software but operating robots to perform physical tasks like fetching items. This practical application of AI in robotics, to enhance general behavior and task execution, is what I foresee as the next significant leap in technology.
Srini Penchikala: So these robots will then be your hired programmers, correct?
Roland Meertens: Actually, not exactly. There will be agents acting as your co-programmers, while robots assist in other aspects of your life. Another point of interest is the use of proprietary data by companies. With the wealth of data they possess, I wonder whether companies will refine their models with this data and market them, or whether they’ll continue using existing frameworks like RAG. Consider a gardener, for instance, who has accumulated extensive photos and notes on garden care over the years. There are numerous small businesses with valuable data like this—how will they leverage it to develop their tools, such as chatbots or automation processes using AI?
Srini Penchikala: Anthony, what about your insights?
Anthony Alford: Regarding the AI winter, Mandy touched on the possibility of reduced hype, which could be seen as a preliminary phase of an AI winter. More critically, there have been discussions, supported by a study reported in Nature, about the deterioration in quality when AI is trained on content produced by other AI. This raises concerns about the potential “pollution” of the internet with AI-generated content. It’s a scenario I’m not confident about, but it’s one where I would gladly be proven wrong.
Srini Penchikala: No, it is very possible, right? And how about you Namee? What do you see as a prediction in the next 12 months?
Namee Oberst: So I foresee a little bit of what Anthony and Mandy described, but actually then moving on very, very quickly to the much more valuable, realistic and tangible use cases, probably involving more automated workflows and the agent work processes and then moving into more edge devices like laptops and even phones. So that’s what I’m foreseeing. So we shall see. It’ll be interesting.
Srini Penchikala: Yes, it’ll be interesting. Yes, that’s what I’m seeing as well. So it’ll be more unified, end-to-end, holistic, AI-powered solutions with these small language models, RAG, the AI-powered hardware. So I think a lot of good things are happening. I think hopefully, Anthony, the AI winter won’t last for too long. That’s the podcast for today. Anybody have any concluding remarks?
Namee Oberst: It was so fun to be on this podcast. Thank you so much for having me. I really enjoyed the experience here.
Anthony Alford: Ditto. Loved it.
Mandy Gu: Yes.
Roland Meertens: I especially like seeing our podcast over the years. If you go back to, I think we started this in 2021, maybe. It’s always fun to see how our predictions change over the years and how our topics change over the years.
Srini Penchikala: I want to thank all the panelists for joining and participating in this discussion of the 2024 AI and ML Trends Report and what to look forward to in this space for the remainder of this year and into next year. To our audience, we hope you enjoyed this podcast and that this discussion has offered a good roundup of the emerging trends and technologies in the AI and ML space. Please visit infoq.com and download the trends report, which will be available along with this podcast recording; it includes an updated version of the adoption graph showing which trends, technologies, and topics are becoming more mature in terms of adoption and which are still in the emerging phase.
I hope you join us again soon for another episode of the InfoQ Podcast. Explore our past discussions on fascinating subjects such as architecture and design, cloud, DevOps, and AI, ML, and Data Engineering in the podcast section of the infoq.com website. Thank you everyone, and have a great one. Until next time.
Mentioned
Communication Patterns for Architects and Engineers with Jacqui Read
Justin Sheehy on Being a Responsible Developer in the Age of AI Hype
Decentralizing Decision-Making with Shawna Martell & Dan Fike
Meryem Arik on LLM Deployment, State-of-the-Art RAG Apps, and Inference Architecture Stack