Self-Supervised Prompt Optimization

https://www.semanticscholar.org/paper/Self-Supervised-Prompt-Optimization-Xiang-Zhang/9bbe3995bfe3740d5be29d56778d7434c572cd78

Well-designed prompts are crucial for enhancing large language models' (LLMs) reasoning capabilities while aligning their outputs with task requirements across diverse domains. However, manually designed prompts require expertise and iterative experimentation. While existing prompt optimization methods aim to automate this process, they rely heavily on external references such as ground truth or human evaluation, limiting their applicability in real-world scenarios where such data is unavailable or costly to obtain. To address this, we propose Self-Supervised Prompt Optimization (SPO), a cost-efficient framework that discovers effective prompts for both closed and open-ended tasks without requiring external references. Motivated by the observations that prompt quality manifests directly in LLM outputs and that LLMs can effectively assess adherence to task requirements, we derive evaluation and optimization signals purely from output comparisons. Specifically, SPO selects superior prompts through pairwise output comparisons evaluated by an LLM evaluator, followed by an LLM optimizer that aligns outputs with task requirements. Extensive experiments demonstrate that SPO outperforms state-of-the-art prompt optimization methods, achieving comparable or superior results with significantly lower costs (e.g., 1.1% to 5.6% of existing methods) and fewer samples (e.g., three samples). The code is available at this https URL.
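For a concrete picture of the evaluate-then-optimize loop the abstract describes, here is a minimal Python sketch. It is only an illustrative reading of the abstract, not the authors' code: the `llm(prompt) -> str` helper, the prompt templates, and the majority-vote rule are all assumptions.

```python
# Minimal sketch of one SPO-style optimization round (illustrative only).
# `llm` stands in for a generic completion call; the prompt templates and the
# majority-vote rule below are assumptions, not the authors' implementation.
from typing import Callable, List


def spo_round(
    llm: Callable[[str], str],
    current_prompt: str,
    task_samples: List[str],
    task_requirements: str,
) -> str:
    """Propose a revised prompt, compare outputs pairwise, keep the winner."""
    # 1. LLM optimizer: rewrite the current prompt toward the task requirements.
    candidate_prompt = llm(
        f"Task requirements: {task_requirements}\n"
        f"Current prompt: {current_prompt}\n"
        "Rewrite the prompt so its outputs better satisfy the requirements. "
        "Return only the new prompt."
    )

    wins = 0
    for sample in task_samples:  # the paper reports needing as few as three samples
        out_current = llm(f"{current_prompt}\n\n{sample}")
        out_candidate = llm(f"{candidate_prompt}\n\n{sample}")

        # 2. LLM evaluator: pairwise comparison of the two outputs, judged only
        #    against the task requirements -- no ground truth is consulted.
        verdict = llm(
            f"Task requirements: {task_requirements}\n"
            f"Output A: {out_current}\n"
            f"Output B: {out_candidate}\n"
            "Which output satisfies the requirements better? Answer A or B."
        )
        if verdict.strip().upper().startswith("B"):
            wins += 1

    # 3. Keep the candidate prompt only if it wins a majority of comparisons.
    return candidate_prompt if wins > len(task_samples) / 2 else current_prompt
```

In the full framework this round would presumably be iterated, carrying the best prompt found so far into the next round.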

Thanks for the recommendation.

Thanks for the recommendation.

Came in to take a look.

Finished reading it; honestly it feels a bit thin.
The "self-supervised optimization" here is just one model generating and another model acting as the judge. That setup has long been used to optimize the generations themselves; the twist is that this time they use it to optimize the prompt.
This approach implicitly assumes the model itself is fine and the task is more or less fixed, so only the prompt needs optimizing. But in reality the models are still evolving, the tasks are still being refined, and most people don't even have a well-defined task yet.
(A simple example: if the model happens to be very fond of Taylor Swift, it will score prompts whose outputs involve her more highly. That is not a genuine strength of the model; it just means the prompt has been aligned with the model's own bias.)

Here to learn; let me see what this is about.