Graph of Thought Prompting in LLMs


Understanding the Graph of Thoughts in Large Language Models

Hey there! In this article, we'll dive into the concept of Graph of Thoughts, a method that builds on the Chain of Thought (CoT) approach to improve reasoning in large language models (LLMs). 



What is Graph of Thoughts?


Graph of Thoughts (GoT) is a method that leverages graph data structures to mimic the human thinking process. Unlike traditional sequential or random approaches, human thought is often non-linear and interconnected, making graphs a more suitable representation.
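For intuition, here's a tiny Python sketch of the idea, purely illustrative and not taken from any paper's implementation: intermediate thoughts become nodes in a directed graph, and a conclusion can draw on more than one parent thought.

```python
# Minimal sketch: intermediate thoughts as nodes in a directed graph (illustrative only).
thoughts = {
    "t1": "The train leaves at 9:00 and the trip takes 2.5 hours.",
    "t2": "2.5 hours after 9:00 is 11:30.",
    "t3": "So the arrival time is 11:30.",
}

# Each edge points from a supporting thought to the thought it supports.
edges = [("t1", "t2"), ("t2", "t3"), ("t1", "t3")]

# Unlike a chain, a node may have several parents, so independent lines of
# reasoning can converge on the same conclusion.
parents = {node: [src for src, dst in edges if dst == node] for node in thoughts}
print(parents)  # {'t1': [], 't2': ['t1'], 't3': ['t2', 't1']}
```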



Improvement Over Chain of Thought


Chain of Thought (CoT) is a technique where a complex reasoning task is broken down into smaller, sequential steps. Each step generates a rationale that is appended to the original query, and the process is repeated in an auto-regressive manner. Google's PaLM model, which has 540 billion parameters, used CoT and saw significant improvements in reasoning tasks.
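As a rough Python sketch of that loop, where `generate` is a hypothetical stand-in for whatever LLM you call rather than a real API:

```python
def chain_of_thought(question, generate, steps=3):
    """Auto-regressive CoT: each generated rationale is appended back onto the prompt."""
    prompt = question + "\nLet's think step by step.\n"
    for _ in range(steps):
        rationale = generate(prompt)   # hypothetical LLM call
        prompt += rationale + "\n"     # the rationale becomes part of the next query
    return generate(prompt + "Therefore, the answer is:")

# Toy stand-in for an LLM, just to show the control flow.
answer = chain_of_thought("What is 17 * 24?", lambda p: "(model output)")
```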

However, GoT takes this a step further by using graph data structures to encode information. This approach has led to a 3.41% to 5.08% improvement over the earlier CoT baselines on specific benchmarks.



How Graph of Thoughts Works

Rationale Generation and Answer Generation

In a GoT setting, the process involves two main steps:


1. **Rationale Generation**: During each forward pass, the different modalities of data (text, images, graphs) are encoded into vector representations. An attention mechanism, similar to the one in transformers, is adapted to the graph and text encodings, followed by a cross-attention step in which parameters are shared across modalities (a sketch of this and the fusion step follows this list).


2. **Answer Generation**: The generated sequence of vectors is sent to a fusion layer, which combines the different modalities of data. The fused data is then processed by a transformer decoder to generate a rationale. This rationale is appended to the input text and sent to the GoT constructor, which generates a logical graph. This process is repeated multiple times, typically 5-6 times, depending on the model's requirements.
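To make steps 1 and 2 more concrete, here is a minimal numpy sketch of the two pieces described above: text tokens cross-attending over graph-node embeddings, and a learned gate fusing the two modalities before decoding. The shapes, random initialisation, and gate form are assumptions for illustration, not the paper's exact layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_emb, node_emb, W_q, W_k, W_v):
    """Rationale generation: text tokens (queries) attend over graph nodes (keys/values)."""
    Q, K, V = text_emb @ W_q, node_emb @ W_k, node_emb @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # (num_tokens, num_nodes)
    return softmax(scores) @ V                     # graph-aware text features

def gated_fusion(text_feat, graph_feat, W_gate, b_gate):
    """Fusion layer: a learned gate in [0, 1] blends the two modalities."""
    z = np.concatenate([text_feat, graph_feat], axis=-1) @ W_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-z))
    return gate * text_feat + (1.0 - gate) * graph_feat   # sent on to the decoder

rng = np.random.default_rng(0)
d = 16
text_emb = rng.normal(size=(5, d))                 # 5 text tokens
node_emb = rng.normal(size=(8, d))                 # 8 graph nodes
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
W_gate, b_gate = rng.normal(size=(2 * d, d)) * 0.1, np.zeros(d)

graph_aware_text = cross_attention(text_emb, node_emb, W_q, W_k, W_v)
fused = gated_fusion(text_emb, graph_aware_text, W_gate, b_gate)
print(fused.shape)  # (5, 16), ready for a transformer decoder
```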



Example: Classifying Prominent Locations

Let's consider a hypothetical example using Uber data. Suppose we have a dataset of 1,000 rides with pickup and drop-off locations. We want to classify the most prominent locations in the city to optimize ride scheduling.


1. **Graph Construction**: We generate a graph using the latitude and longitude of the locations.


2. **Graph Encoding**: The graph is sent to a transformer-based graph encoder, which uses the adjacency matrix to mask attention so that each node attends only to its neighbours.

3. **Classification**: The encoder classifies the most prominent locations. For instance, the CN Tower might have a weight of 0.91, while less prominent locations have lower weights.

This is a simplified example of how graph attention works. There are other formulations of graph attention, such as spectral-domain attention, which operate on the graph's spectral (Laplacian eigenvalue) representation rather than directly on the adjacency matrix.
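To make the three steps above concrete, here is a toy numpy sketch of adjacency-masked attention over location nodes. The coordinates, distance threshold, feature vectors, and scoring head are all invented for illustration; this is not real ride data or the paper's model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy pickup/drop-off locations as (lat, lon) -- purely illustrative values.
coords = np.array([
    [43.6426, -79.3871],   # "CN Tower"
    [43.6452, -79.3806],   # "Union Station"
    [43.6629, -79.3957],   # "University"
    [43.7280, -79.3390],   # "Suburban stop"
])

# 1. Graph construction: connect locations closer than an assumed distance threshold.
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
adj = (dists < 0.05).astype(float)            # adjacency matrix (self-loops included)

# 2. Graph encoding: self-attention in which non-edges are masked out.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))               # stand-in node features
scores = feats @ feats.T / np.sqrt(feats.shape[-1])
scores = np.where(adj > 0, scores, -1e9)      # nodes attend only to their neighbours
node_repr = softmax(scores) @ feats

# 3. Classification: a stand-in scoring head turning node encodings into prominence weights.
w = rng.normal(size=(8,))
prominence = softmax(node_repr @ w)
print(prominence.round(2))
```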



Results and Benchmarks

Performance on Benchmarks

The GoT model has shown significant improvements over previous multimodal CoT models. Despite having fewer than a billion parameters, it outperforms models such as GPT-4 and ChatGPT, which have hundreds of billions of parameters, on the reported benchmarks.



Generalization to Unseen Data

The GoT model is more generalizable to unseen data compared to multimodal CoT models. This is because the additional modality (graph) allows the model to capture and represent information more effectively, leading to better performance on new tasks.



Limitations

Computational Cost

One limitation of GoT is the additional computational cost and the slightly slower training that comes with it. This is due to the graph reconstruction that occurs during each rationale generation step: because the graph is represented as a dense adjacency matrix, the reconstruction has a complexity of \(O(n^2)\) in the number of nodes. However, this increased cost is a trade-off for better reasoning and generalization.
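For a rough sense of scale, the snippet below shows how a dense adjacency matrix grows with the number of nodes, assuming 32-bit entries:

```python
import numpy as np

# A dense adjacency matrix stores n * n entries, so memory grows as O(n^2).
for n in (100, 1_000, 10_000):
    adj = np.zeros((n, n), dtype=np.float32)
    print(f"n = {n:>6,}: {adj.nbytes:,} bytes")
# Prints roughly: 40,000 / 4,000,000 / 400,000,000 bytes.
```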

Future Potential

Integration with Knowledge Graphs

There is significant potential for LLMs to be enhanced by knowledge graphs and graph data structures. Future research is likely to focus on improving the efficiency of graph operations and exploring the spectral domain of graphs. These advancements could lead to even more powerful and efficient LLMs.



Spectral Domain Research

Currently, there is limited research into applying the spectral domain of graphs in this setting. However, as more studies are conducted, we can expect to see further improvements in how LLMs handle and process graph data, which could result in better reasoning capabilities and more efficient models.
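As a small taste of what the spectral domain means here, the sketch below builds a toy graph Laplacian and takes its eigendecomposition; the eigenvectors form the graph Fourier basis that spectral methods operate in. The graph itself is made up for illustration.

```python
import numpy as np

# Toy undirected graph with 4 nodes, stored as an adjacency matrix.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Graph Laplacian L = D - A, where D is the diagonal degree matrix.
deg = np.diag(adj.sum(axis=1))
laplacian = deg - adj

# Its eigenvectors act as a "graph Fourier basis"; spectral attention operates
# on signals expressed in this basis rather than directly on the edges.
eigvals, eigvecs = np.linalg.eigh(laplacian)
print(eigvals.round(2))   # the graph's spectrum (smallest eigenvalue is 0)
```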

Conclusion

Graph of Thoughts represents a promising advancement in the field of LLMs. By leveraging graph data structures, it better mimics the non-linear and interconnected nature of human thought, leading to improved reasoning and generalization. While there are computational challenges, the potential benefits make it a hot topic in AI research. Expect to see more developments in this area in the coming years, as LLMs continue to evolve and become more sophisticated.

Thank you for reading! If you're interested in the technical details, I encourage you to dive into the original research papers.







