From the course: RAG and Fine-Tuning Explained

Knowledge graphs

- Okay, looking at this embeddings example, you may have spotted an inherent weakness, and it becomes more obvious if we add in a larger data set that includes things that are not related to cooking. In many cases, especially in the enterprise, domain data is expansive and covers many different subject matters, disciplines, and contexts. This poses a significant challenge for embeddings because our language is semantically, syntactically, and lexically ambiguous. For example, the same word can mean different things in different contexts.

Let me show you. Let's say we have one embedding for "use tomato paste" and another embedding for "use tooth paste." These embeddings look nothing alike, but they share two words: use and paste. Now let's say the user puts in a very vague prompt: "What paste do I use to make pasta?" Without further context, the embedding system will zero in on the words use and paste and find any embeddings that match. In return, we get tomato paste and toothpaste. I can think of exactly zero scenarios where both of these responses are correct, but the AI has neither knowledge nor understanding of the data or the query, so it returns both entities, and the RAG-based completion becomes some nonsense about how to use toothpaste to add peppermint flavor to your pasta dish or something.

The problem here is that while an embedding system can help us identify close matches in language, it has no understanding of the semantic meaning of the connections. Both toothpaste and tomato paste are pastes, but they have very different uses. To address this problem, we can lean on an older AI technology called knowledge graphs. Originally invented for search, knowledge graphs add direction and semantic meaning to the vectors connecting the dots in an embeddings map. So rather than having a vector that just says, "This word and this word are connected," the graph says maybe, "This word is part of this word," or, "This word is used in this context."

Remapping our examples using knowledge graphs, we add semantic context by connecting, for example, tomato and paste to cooking with an "is used in" relationship, and connecting tooth and paste to hygiene with an "is part of" relationship. Now, when the system creates a knowledge graph of the query, "What paste do I use to make pasta?" it contains the added relationship capturing the context "is used in cooking," and we get only tomato paste as a relevant match.

Seeing this, you may rightfully conclude that we should just use knowledge graphs for RAG all the time, but in practical reality, that's not always possible. In real-world environments, AI systems using RAG draw from a variety of sources, including traditional databases, APIs, embeddings, and knowledge graphs. The best option for a particular use case comes down to many variables, including what type of data it is, how much work is involved in transforming the data into, for example, vector embeddings, how often the data updates, et cetera. The typical recommendation is to start with the most basic option, a straight-up database with an API, and explore more advanced options if the results are not satisfactory.
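To make the idea concrete, here is a minimal sketch in Python. It is a toy illustration only: the documents, triples, and the word-overlap function are made-up stand-ins (word overlap substitutes for real embedding similarity, and a small list of triples substitutes for a real graph database). It shows how a pure similarity match returns both pastes, while a knowledge-graph context filter narrows the result to the one that fits the query.

```python
# Toy sketch: similarity-only retrieval vs. knowledge-graph-filtered retrieval.
# All data and function names are illustrative assumptions, not a real library.

# Toy "documents" the retriever can return.
documents = ["use tomato paste", "use tooth paste"]

# Knowledge-graph triples: (subject, relation, object).
# These encode the semantic context that raw similarity lacks.
triples = [
    ("tomato paste", "is used in", "cooking"),
    ("tooth paste", "is part of", "hygiene"),
    ("pasta", "is used in", "cooking"),
]

def word_overlap(query: str, doc: str) -> int:
    """Crude stand-in for embedding similarity: count shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve_by_similarity(query: str) -> list[str]:
    """Return every document that shares at least one word with the query."""
    return [d for d in documents if word_overlap(query, d) > 0]

def query_contexts(query: str) -> set[str]:
    """Collect the contexts the query's entities are connected to in the graph."""
    text = query.lower()
    return {obj for subj, rel, obj in triples if subj in text}

def retrieve_with_graph(query: str) -> list[str]:
    """Keep only similarity matches whose entities share a context with the query."""
    contexts = query_contexts(query)
    results = []
    for doc in retrieve_by_similarity(query):
        doc_contexts = {obj for subj, rel, obj in triples if subj in doc}
        if doc_contexts & contexts:
            results.append(doc)
    return results

query = "What paste do I use to make pasta?"
print(retrieve_by_similarity(query))  # ['use tomato paste', 'use tooth paste']
print(retrieve_with_graph(query))     # ['use tomato paste']
```

In a real system, the similarity step would be a vector search over actual embeddings and the triples would live in a graph database, but the filtering idea is the same: the graph supplies the "is used in cooking" context that the similarity match alone cannot.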
