Well-designed prompts are crucial for enhancing Large Language Models' (LLMs) reasoning capabilities while aligning their outputs with task requirements across diverse domains. However, manually designing prompts requires expertise and iterative experimentation. While existing prompt optimization methods aim to automate this process, they rely heavily on external references such as ground truth or human feedback, limiting their applicability in real-world scenarios where such data is unavailable or costly to obtain. To address this, we propose Self-Supervised Prompt Optimization (SPO), a cost-efficient framework that discovers effective prompts for both closed and open-ended tasks without requiring external references. Motivated by the observations that prompt quality manifests directly in LLM outputs and that LLMs can effectively assess adherence to task requirements, we derive evaluation and optimization signals purely from output comparisons. Specifically, SPO selects superior prompts through pairwise output comparisons judged by an LLM evaluator, followed by an LLM optimizer that aligns outputs with task requirements. Extensive experiments demonstrate that SPO outperforms state-of-the-art prompt optimization methods, achieving comparable or superior results at significantly lower cost (e.g., 1.1% to 5.6% of existing methods) and with fewer samples (e.g., three). The code is available at this https URL.
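To make the described evaluate-optimize loop concrete, below is a minimal sketch of the idea, not the authors' implementation: an LLM optimizer proposes a revised prompt, an LLM evaluator compares the two prompts' outputs pairwise against the task requirements (no ground truth involved), and the winner is kept. The function name `spo_optimize`, the `llm` callable, and the prompt templates are illustrative assumptions.

```python
from typing import Callable, List

def spo_optimize(
    llm: Callable[[str], str],   # hypothetical text-completion callable: prompt -> response
    task_requirements: str,      # natural-language description of what outputs should satisfy
    initial_prompt: str,
    samples: List[str],          # a handful of unlabeled task inputs (e.g., three)
    iterations: int = 5,
) -> str:
    """Reference-free prompt optimization via pairwise output comparison (sketch)."""
    best_prompt = initial_prompt
    best_outputs = [llm(f"{best_prompt}\n\n{s}") for s in samples]

    for _ in range(iterations):
        # LLM optimizer: propose a revised prompt intended to better align
        # outputs with the task requirements.
        candidate_prompt = llm(
            "Improve the following prompt so its outputs better satisfy the task requirements.\n"
            f"Requirements: {task_requirements}\n"
            f"Current prompt: {best_prompt}\n"
            f"Example outputs: {best_outputs}\n"
            "Return only the revised prompt."
        )
        candidate_outputs = [llm(f"{candidate_prompt}\n\n{s}") for s in samples]

        # LLM evaluator: pairwise comparison of the two output sets against the
        # task requirements; no external reference is consulted.
        verdict = llm(
            "Given the task requirements below, answer 'A' if output set A adheres "
            "to them better than output set B, otherwise answer 'B'.\n"
            f"Requirements: {task_requirements}\n"
            f"A: {candidate_outputs}\nB: {best_outputs}"
        )
        if verdict.strip().upper().startswith("A"):
            best_prompt, best_outputs = candidate_prompt, candidate_outputs

    return best_prompt
```

Because both the evaluation and the optimization signals come from comparing model outputs, the loop needs only the task requirements and a few unlabeled samples, which is what keeps the cost low relative to reference-based methods.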