Shihuang's new toys: I slapped together a toy with 4o + TTS + Whisper

pipiwei · July 31, 2024 14:02


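Open Pandora's box and stick it to OpenAI once more: whisper + tts

The genius Shihuang has once again put OpenAI's services behind a proxy, so of course I had to go and play with it right away.

Following a tip from a group member, open this API testing site:
https://hoppscotch.io/

You can switch it to Chinese in the settings.

Select POST, with the URL: https://api.oaifree.com/v1/audio/speech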

Fill in the request body:
{
  "model": "tts-1-hd",
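  "input": "It may not be anything complicated, and it may get copied soon enough, but so what? Opening Pandora's box once more and sticking it to OpenAI once more is enough!",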
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1
}
For Authorization, select the Bearer type.

For the token, fill in the one shared by a generous member:
https://linux.do/t/topic/49662

Have fun.
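
For anyone who would rather script this than click through Hoppscotch, the same request can be sketched in Python. This is a minimal sketch using only the standard library; `build_tts_request` and the `YOUR_ACCESS_TOKEN` placeholder are names made up for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request

TTS_URL = "https://api.oaifree.com/v1/audio/speech"  # endpoint from the post


def build_tts_request(text, token, voice="alloy"):
    """Build (but do not send) the POST request described above.

    `build_tts_request` is a hypothetical helper, not part of any library.
    """
    body = {
        "model": "tts-1-hd",
        "input": text,
        "voice": voice,
        "response_format": "mp3",
        "speed": 1,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("Hello from the toy.", "YOUR_ACCESS_TOKEN")
    # Sending needs a valid token; the response body is the MP3 audio:
    # with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
    #     f.write(resp.read())
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to inspect the headers and body before spending a real call on it.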


Code

import PySimpleGUI as sg
import requests
import json
import tempfile
import os
import sounddevice as sd
import numpy as np
import wave
import pygame
import threading

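class AudioRecorder:
    """Collects microphone input into memory between start() and stop()."""

    def __init__(self, sample_rate=44100, channels=1):
        self.sample_rate = sample_rate
        self.channels = channels
        self.recording = False
        self.frames = []

    def callback(self, indata, frames, time_info, status):
        # Called by sounddevice for every audio block; copy because the
        # buffer is reused after the callback returns
        if self.recording:
            self.frames.append(indata.copy())

    def start(self):
        self.recording = True
        self.frames = []
        self.stream = sd.InputStream(callback=self.callback, channels=self.channels, samplerate=self.sample_rate)
        self.stream.start()

    def stop(self):
        self.recording = False
        self.stream.stop()
        self.stream.close()
        if not self.frames:
            # Nothing was captured; return an empty array instead of raising
            return np.zeros((0, self.channels), dtype=np.float32)
        return np.concatenate(self.frames, axis=0)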

def text_to_speech(text, voice, auth_token):
    """Request speech for `text` and play it; blocks until playback ends."""
    url = "https://api.oaifree.com/v1/audio/speech"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {auth_token}"
    }
    data = {
        "model": "tts-1-hd",
        "input": text,
        "voice": voice,
        "response_format": "mp3",
        "speed": 1
    }

    temp_file_path = None
    try:
        response = requests.post(url, headers=headers, json=data)
        response.raise_for_status()

        # Save the MP3 to disk so pygame can load it
        with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as temp_file:
            temp_file.write(response.content)
            temp_file_path = temp_file.name

        pygame.mixer.init()
        pygame.mixer.music.load(temp_file_path)
        pygame.mixer.music.play()
        clock = pygame.time.Clock()
        while pygame.mixer.music.get_busy():
            clock.tick(10)
    except Exception as e:
        print(f"Text-to-speech error: {e}")
    finally:
        # Release the audio device and file before deleting the temp file
        pygame.mixer.quit()
        if temp_file_path and os.path.exists(temp_file_path):
            os.unlink(temp_file_path)

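def speech_to_text(audio_file_path, auth_token):
    """Upload a recording for transcription and return the text, or None."""
    url = "https://api.oaifree.com/v1/audio/transcriptions"
    headers = {
        "Authorization": f"Bearer {auth_token}"
    }

    try:
        # Open the file in a with-block so the handle is closed before the
        # caller deletes the file (an open handle blocks deletion on Windows)
        with open(audio_file_path, "rb") as audio_file:
            files = {
                "file": audio_file,
                "model": (None, "whisper-1")
            }
            response = requests.post(url, headers=headers, files=files)
        response.raise_for_status()
        return response.json()["text"]
    except (requests.exceptions.RequestException, KeyError, ValueError) as e:
        print(f"Speech-to-text error: {e}")
        return None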

def chat_with_ai(messages, auth_token):
    """Send the conversation history and return the assistant's reply, or None."""
    url = "https://api.oaifree.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {auth_token}",
        "Content-Type": "application/json"
    }
    data = {
        "messages": messages,
        "stream": False,
        "model": "gpt-4o",
        "temperature": 0.5,
        "max_tokens": 8000,
        "top_p": 1
    }

    try:
        response = requests.post(url, headers=headers, json=data)
        response.raise_for_status()
        response_data = response.json()

        if 'choices' in response_data and len(response_data['choices']) > 0:
            if 'message' in response_data['choices'][0] and 'content' in response_data['choices'][0]['message']:
                return response_data['choices'][0]['message']['content']
            else:
                print("Unexpected response structure:")
                print(json.dumps(response_data, indent=2))
                return None
        else:
            print("No valid response content found:")
            print(json.dumps(response_data, indent=2))
            return None
    except requests.exceptions.RequestException as e:
        print(f"AI chat error: {e}")
        return None

def main():
    recorder = AudioRecorder()
    messages = []  # conversation history
    speaking = threading.Event()  # set while the AI reply is being played
    playback_started = False  # True from launching playback until "Ready" is restored

    def speak(text, voice, token):
        # Worker-thread wrapper: clear the flag once playback ends or fails
        try:
            text_to_speech(text, voice, token)
        finally:
            speaking.clear()

    voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]

    layout = [
        [sg.Text("Voice Assistant", justification='center', font=('Any', 20))],
        [sg.Text("AUTH_TOKEN:"), sg.Input(key="-AUTH-", password_char='*')],
        [sg.Text("Status: Ready", key="-STATUS-", justification='center')],
        [sg.Text("Select voice:"), sg.Combo(voices, default_value="echo", key="-VOICE-")],
        [sg.Button("Start recording", key="-RECORD-"), sg.Button("Stop recording", key="-STOP-", disabled=True)],
        [sg.Text("Your question:", justification='left')],
        [sg.Multiline(size=(50, 5), key="-USER-INPUT-", disabled=True)],
        [sg.Text("AI answer:", justification='left')],
        [sg.Multiline(size=(50, 10), key="-AI-RESPONSE-", disabled=True)],
        [sg.Button("Clear conversation history", key="-CLEAR-")]
    ]

    window = sg.Window("Voice Assistant", layout, finalize=True, element_justification='center')

    recording = False

    while True:
        event, values = window.read(timeout=100)

        if event == sg.WINDOW_CLOSED:
            break

        if event == "-RECORD-" and not recording:
            recording = True
            playback_started = False
            window["-STATUS-"].update("Recording...")
            window["-RECORD-"].update(disabled=True)
            window["-STOP-"].update(disabled=False)
            recorder.start()

        elif event == "-STOP-" and recording:
            recording = False
            window["-STATUS-"].update("Processing...")
            window["-RECORD-"].update(disabled=True)
            window["-STOP-"].update(disabled=True)
            window.refresh()  # show the status before the blocking calls below

            audio_data = recorder.stop()

            # Reserve a temp file name, close the handle, then write the WAV
            with tempfile.NamedTemporaryFile(delete=False, suffix='.wav') as temp_file:
                temp_file_path = temp_file.name
            with wave.open(temp_file_path, 'wb') as wf:
                wf.setnchannels(recorder.channels)
                wf.setsampwidth(2)  # 16-bit PCM
                wf.setframerate(recorder.sample_rate)
                # Clip to [-1, 1] so the int16 conversion cannot overflow
                wf.writeframes((np.clip(audio_data, -1.0, 1.0) * 32767).astype(np.int16).tobytes())

            auth_token = values["-AUTH-"]
            user_input = speech_to_text(temp_file_path, auth_token)
            os.unlink(temp_file_path)

            if user_input:
                window["-USER-INPUT-"].update(user_input)
                messages.append({"role": "user", "content": user_input})
                window["-STATUS-"].update("AI is thinking...")
                window.refresh()
                ai_response = chat_with_ai(messages, auth_token)
                if ai_response:
                    messages.append({"role": "assistant", "content": ai_response})
                    window["-AI-RESPONSE-"].update(ai_response)
                    window["-STATUS-"].update("AI is speaking...")
                    selected_voice = values["-VOICE-"]
                    # Set the flag before the thread starts so the loop never
                    # sees a gap between launch and playback
                    speaking.set()
                    playback_started = True
                    threading.Thread(target=speak, args=(ai_response, selected_voice, auth_token)).start()
                else:
                    window["-STATUS-"].update("No reply from the AI. Please try again.")
            else:
                window["-STATUS-"].update("Could not understand. Please try again.")

            window["-RECORD-"].update(disabled=False)
            window["-STOP-"].update(disabled=True)

        elif event == "-CLEAR-":
            messages.clear()
            window["-USER-INPUT-"].update("")
            window["-AI-RESPONSE-"].update("")
            window["-STATUS-"].update("Conversation history cleared")

        # Restore "Ready" only once, after playback has actually finished
        if playback_started and not speaking.is_set():
            playback_started = False
            window["-STATUS-"].update("Ready")

    window.close()

if __name__ == "__main__":
    main()
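
The recording path converts float samples to 16-bit PCM before writing the WAV file. Below is a minimal standard-library sketch of that conversion, with samples clipped to [-1.0, 1.0] so out-of-range values cannot overflow the 16-bit range. `float_to_wav_bytes` is a hypothetical helper name, and the 44100 Hz mono defaults simply mirror the recorder's settings.

```python
import array
import io
import sys
import wave


def float_to_wav_bytes(samples, sample_rate=44100, channels=1):
    """Encode float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip instead of overflowing
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return buf.getvalue()


if __name__ == "__main__":
    data = float_to_wav_bytes([0.0, 0.5, -0.5, 2.0])
    print(len(data), "bytes of WAV data")
```

Writing into a `BytesIO` buffer also means the audio could be uploaded straight from memory, without a temporary file on disk.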

You can also just use the packaged exe.
[123pan download link](https://www.123pan.com/s/pTVtjv-DyQWd.html) (old version, no longer valid)

Fill in your access token and you are ready to play.

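Of course, this needs a Plus account.

Everyone seems to think this UI is not very friendly (and it really isn't).

I'll pull a late shift to improve the interaction.

Download and source links:
Source package: https://www.123pan.com/s/pTVtjv-ZyQWd.html
Prebuilt package: https://www.123pan.com/s/pTVtjv-cyQWd.html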

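neo · July 31, 2024 14:04

This UI looks rather dated.

syclove · July 31, 2024 14:06

Nice, you have my support.

civil · July 31, 2024 14:06

Thanks for sharing!

pipiwei · July 31, 2024 14:06

Hahaha, it's a Python library, no front end involved.
I just threw it together; waiting for someone more skilled to make it pretty.

pipiwei · July 31, 2024 14:07

I'm freeloading too, hehe. Shame that regular (non-Plus) accounts can't use it.

zhong_little · July 31, 2024 14:09

Only just found out it's open. Time to try it out.

Qfeng · July 31, 2024 14:12

PySimpleGUI is a bit old, and you have to register before you can use it.

pipiwei · July 31, 2024 14:13

I'm on the 30-day trial too, haha. The UI is waiting for someone better to redo it.

Qfeng · July 31, 2024 14:37

Just have GPT rewrite it in PyQt5.

Haven · July 31, 2024 14:48

Excellent work, mate.

pipiwei · July 31, 2024 15:52

Hahaha, openly provoking OAI.

PLA81 · July 31, 2024 16:30

Thanks.

sanbo_x · July 31, 2024 16:36

666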

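pipiwei · July 31, 2024 21:37

Switched the text response to streaming.
Added some simple UI polish.

Usage:
Recording starts once you start the conversation.
At the end of each of your turns, click "I'm done speaking" (我说完了) manually; the recording is then sent automatically and the app waits for the AI's reply.
End the conversation.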
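
The streaming version is not pasted in the thread, so the following is not the author's code. It is a minimal sketch of how a streamed chat response is typically consumed, assuming the proxy relays the standard OpenAI server-sent-events format (`data: {...}` lines, terminated by `data: [DONE]`). `iter_stream_text` is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from chat-completions SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        # Each chunk carries an incremental "delta"; content may be absent
        piece = (choices[0].get("delta") or {}).get("content")
        if piece:
            yield piece


if __name__ == "__main__":
    sample = [
        b'data: {"choices":[{"delta":{"role":"assistant"}}]}',
        b"",
        b'data: {"choices":[{"delta":{"content":"Hel"}}]}',
        b'data: {"choices":[{"delta":{"content":"lo"}}]}',
        b"data: [DONE]",
    ]
    print("".join(iter_stream_text(sample)))
```

With `requests`, this would be fed from `response.iter_lines()` after posting with `"stream": True` in the body and `stream=True` on the call, so the UI can show the reply as it arrives instead of waiting for the whole answer.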

one1bug · July 31, 2024 21:40

I don't have Plus.

pipiwei · July 31, 2024 21:41

Ah, well. Let's see if some kind soul can help.

handsome · August 1, 2024 01:02

Impressive!

boyang · August 1, 2024 01:03

Nice.

906051999 · August 1, 2024 01:30

Only played with it once.
