Contrastive Decoding Improves Reasoning in Large Language Models
Abstract Commentary & Rating
Published on Sep 16
Authors: Sean O'Brien, Mike Lewis
Abstract
We demonstrate that Contrastive Decoding -- a simple, computationally light, and training-free text generation method proposed by Li et al. (2022) -- achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted difference in likelihood between strong and weak models. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA 2, GPT-3.5 and PaLM-540B on the GSM8K math word reasoning benchmark, in addition to improvements on a collection of other tasks. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors, as well as by avoiding simpler modes such as copying sections of the input during chain-of-thought. Overall, Contrastive Decoding outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general-purpose method for generating text from language models.
Commentary
The paper titled "Contrastive Decoding Improves Reasoning in Large Language Models" presents an approach to improving text generation quality and reasoning capabilities in large language models.
Key Insights:
Contrastive Decoding: This method leverages the difference in likelihood between strong and weak models to generate text. Originally designed for improving long-form text generation, the authors demonstrate its value for reasoning tasks as well.
Benchmark Gains: Contrastive Decoding enables LLaMA-65B to surpass several state-of-the-art models (LLaMA 2, GPT-3.5, and PaLM variants) on specific reasoning benchmarks, including HellaSwag (commonsense reasoning) and GSM8K (math word problems).
Avoiding Errors: The analysis indicates that the method prevents some abstract reasoning errors and reduces degenerate behaviors such as verbatim copying of input sections during chain-of-thought generation.
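The core scoring rule behind these insights — prefer the token where the strong (expert) model's log-likelihood most exceeds the weak (amateur) model's, restricted to tokens the expert itself finds plausible — can be sketched in a few lines. This is a minimal, hypothetical illustration on toy distributions; the function name, the `alpha` cutoff parameter, and the greedy single-step framing are my own simplifications of the formulation in Li et al. (2022) and this paper, not the authors' code.

```python
import numpy as np

def contrastive_decode_step(expert_logprobs, amateur_logprobs, alpha=0.1):
    """Pick one next token by contrastive decoding (simplified sketch).

    Plausibility constraint: tokens whose expert probability falls below
    alpha * (max expert probability) are masked out. Among the remaining
    tokens, choose the one maximizing the expert-minus-amateur log-likelihood
    gap, which down-weights generic continuations the weak model also favors.
    """
    expert_probs = np.exp(expert_logprobs)
    plausible = expert_probs >= alpha * expert_probs.max()
    scores = np.where(plausible, expert_logprobs - amateur_logprobs, -np.inf)
    return int(np.argmax(scores))

# Toy 4-token vocabulary: greedy decoding on the expert alone would pick
# token 0, but the contrastive score prefers token 1, where the strong
# model is much more confident than the weak one.
expert = np.log(np.array([0.50, 0.30, 0.15, 0.05]))
amateur = np.log(np.array([0.60, 0.10, 0.15, 0.15]))
print(contrastive_decode_step(expert, amateur))  # → 1
```

The plausibility mask is what keeps the method from rewarding tokens that are merely improbable under the weak model, which is why it avoids the degenerate outputs a raw likelihood ratio would produce.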
Potential Real-World Impact:
Enhanced Text Generation: The method promises to improve the quality of text generated by large language models, making outputs more coherent, relevant, and reasoned.
Improved Reasoning: A better performance on reasoning tasks can have numerous applications ranging from more intelligent chatbots to tools that can assist professionals in various analytical tasks.
Wider Applicability: As a training-free method, Contrastive Decoding can be applied to existing models without any fine-tuning or additional training compute.
Versatility: The approach seems versatile, showing improvements across both long-form generation and specific reasoning tasks.
Challenges:
Dependence on Weak Models: The effectiveness of Contrastive Decoding requires access to both a strong and a weak model, which may not always be available, and results may vary with the relative strength of the chosen pair.
Given the novel approach to improving text generation and reasoning, as well as its demonstrated efficacy:
I'd rate the real-world impact of this paper as an 8.5 out of 10.
The method appears to offer a powerful, general-purpose technique for generating text from language models. If it can be broadly applied to a range of tasks and settings, its real-world impact could be considerable, especially in applications where reasoning capabilities of models are crucial.