Deploying Hugging Face's free API on Cloudflare, compatible with oneapi/newapi, for free use of models such as Qwen2.5 72B

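wangdefa · October 12, 2024, 07:45

Thanks to the original author for open-sourcing this:
Self-hosting Hugging Face's free API and using it in newapi, with Qwen2.5 70B and others for free - Development & Tuning - LINUX DO

I also packaged a Docker image, for anyone who needs it: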

docker pull oozzbb/hg2api:latest

A reference run command:

HUGGINGFACE_API_KEY: obtain one from Hugging Face
API_KEY: the key used when connecting from one-api/new-api

docker run --name hg2api --restart always -p 5023:5000 -e HUGGINGFACE_API_KEY=hf_xxx -e API_KEY=sk-1234567890 oozzbb/hg2api:latest
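Once the container is running, a client request could look roughly like the sketch below. It assumes the image exposes the same OpenAI-style `/v1/chat/completions` route as the Worker code later in this post, that port 5023 is mapped as in the command above, and that a full Hugging Face model name is accepted. `buildChatRequest` is a hypothetical helper name, not part of the image.

```javascript
// Hypothetical helper: builds the URL and fetch options for an
// OpenAI-style chat completion request against the local container.
// Assumption: the image serves /v1/chat/completions like the Worker code.
function buildChatRequest(baseUrl, apiKey, model, userText) {
    return {
        // Strip trailing slashes so the path is not doubled up
        url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
        options: {
            method: "POST",
            headers: {
                "Authorization": `Bearer ${apiKey}`,
                "Content-Type": "application/json"
            },
            body: JSON.stringify({
                model: model,
                stream: false,
                messages: [{ role: "user", content: userText }]
            })
        }
    };
}

const req = buildChatRequest("http://localhost:5023", "sk-1234567890",
    "Qwen/Qwen2.5-72B-Instruct", "Hello");
// To actually send it (Node 18+ or a browser):
// fetch(req.url, req.options).then(r => r.json()).then(console.log);
```

The same base URL and API_KEY are what you would enter as a channel in one-api/new-api.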

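Main post
Deployments hosted in mainland China cannot reach Hugging Face, so I adapted the original author's code so that it can be deployed to Cloudflare Workers.

Preparation
1. Register a Cloudflare account
2. Register a Hugging Face account and apply for an API key (API key application page)
3. Copy the code below and deploy it to Cloudflare Workers
4. In oneapi/newapi, clicking "Fetch model list" adds the available models in one step

Hidden code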

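// Key used when connecting from one-api/new-api
const API_KEY = "sk-1234567890";

// Your Hugging Face API key; apply for one at Hugging Face
const HUGGINGFACE_API_KEY = "hf_xxxxxxxxxxx";

// Models found to work so far. If the requested model is not in this list, the requested model name is used as-is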
const CUSTOMER_MODEL_MAP = {
    "qwen2.5-72b-instruct": "Qwen/Qwen2.5-72B-Instruct",
    "gemma2-2b-it": "google/gemma-2-2b-it",
    "gemma2-27b-it": "google/gemma-2-27b-it",
    "llama-3-8b-instruct": "meta-llama/Meta-Llama-3-8B-Instruct",
    "llama-3.2-1b-instruct": "meta-llama/Llama-3.2-1B-Instruct",
    "llama-3.2-3b-instruct": "meta-llama/Llama-3.2-3B-Instruct",
    "phi-3.5": "microsoft/Phi-3.5-mini-instruct"
};

async function handleRequest(request) {
    try {
        if (request.method === "OPTIONS") {
            return getResponse("", 204);
        }

        const authHeader = request.headers.get("Authorization");
        if (!authHeader || !authHeader.startsWith("Bearer ") || authHeader.split(" ")[1] !== API_KEY) {
            return getResponse("Unauthorized", 401);
        }

        if (request.url.endsWith("/v1/models")) {
            // Build the model list from the keys of CUSTOMER_MODEL_MAP
            const arrs = Object.keys(CUSTOMER_MODEL_MAP).map(element => ({ id: element, object: "model" }));
            const response = {
                data: arrs,
                success: true
            };

            return getResponse(JSON.stringify(response), 200);
        }

        if (request.method !== "POST") {
            return getResponse("Only POST requests are allowed", 405);
        }

        if (!request.url.endsWith("/v1/chat/completions")) {
            return getResponse("Not Found", 404);
        }

        const data = await request.json();
        const messages = data.messages || [];
        const model = CUSTOMER_MODEL_MAP[data.model] || data.model;
        const temperature = data.temperature || 0.7;
        const max_tokens = data.max_tokens || 8196;
        const top_p = Math.min(Math.max(data.top_p || 0.9, 0.0001), 0.9999);
        const stream = data.stream || false;

        const requestBody = {
            model: model,
            stream: stream,
            temperature: temperature,
            max_tokens: max_tokens,
            top_p: top_p,
            messages: messages
        };

        const apiUrl = `https://api-inference.huggingface.co/models/${model}/v1/chat/completions`;
        const response = await fetch(apiUrl, {
            method: 'POST',
            headers: {
                'Authorization': `Bearer ${HUGGINGFACE_API_KEY}`,
                'Content-Type': 'application/json'
            },
            body: JSON.stringify(requestBody)
        });

        if (!response.ok) {
            const errorText = await response.text();
            return getResponse(`Error from API: ${response.statusText} - ${errorText}`, response.status);
        }

        const newResponse = new Response(response.body, {
            status: response.status,
            headers: {
                ...Object.fromEntries(response.headers),
                'Access-Control-Allow-Origin': '*',
                'Access-Control-Allow-Methods': '*',
                'Access-Control-Allow-Headers': '*'
            }
        });

        return newResponse;
    } catch (error) {
        return getResponse(JSON.stringify({
            error: `Failed to process request: ${error.message}`
        }), 500);
    }
}

function getResponse(resp, status) {
    return new Response(resp, {
        status: status,
        headers: {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Methods": "*",
            "Access-Control-Allow-Headers": "*"
        }
    });
}

addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
})
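The script above hard-codes both keys. As a hedged alternative, the sketch below reads them from Worker environment bindings using the ES module handler form (`fetch(request, env)`). The binding names `API_KEY` and `HUGGINGFACE_API_KEY` are assumed to be configured as secrets in the Worker settings; `isAuthorized` is a hypothetical helper, and the forwarding to Hugging Face is elided.

```javascript
// Hypothetical helper: true only for a header of the form "Bearer <expectedKey>".
function isAuthorized(authHeader, expectedKey) {
    if (!authHeader || !authHeader.startsWith("Bearer ")) return false;
    return authHeader.slice("Bearer ".length) === expectedKey;
}

// Module-style handler. In a real Worker file this object would be the
// default export: `export default worker;`
const worker = {
    async fetch(request, env) {
        // Assumption: env.API_KEY and env.HUGGINGFACE_API_KEY are secrets
        // bound to the Worker, replacing the hard-coded constants.
        if (!isAuthorized(request.headers.get("Authorization"), env.API_KEY)) {
            return new Response("Unauthorized", { status: 401 });
        }
        // ...forward to Hugging Face with env.HUGGINGFACE_API_KEY here...
        return new Response("OK", { status: 200 });
    }
};
```

Keeping the keys out of the script means the code can be shared or committed without leaking them.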

handsome · October 12, 2024, 07:46

Amazing! Thank you!

PLA81 · October 12, 2024, 07:52

Thanks for sharing

velsa · October 12, 2024, 08:29

Thanks for sharing, it's up and running for me

opennex · October 12, 2024, 08:30

Much respect!!

WyInnovate · October 12, 2024, 10:18

Impressive!!!

awz707 · October 12, 2024, 10:33

Thanks for sharing, friend

Kuailiaojie1 · October 12, 2024, 10:37

Impressive work

Atomesh · October 12, 2024, 11:40

wangdefa:

const CUSTOMER_MODEL_MAP = {
    "qwen2.5-72b-instruct": "Qwen/Qwen2.5-72B-Instruct",
    "gemma2-2b-it": "google/gemma-2-2b-it",
    "gemma2-27b-it": "google/gemma-2-27b-it",
    "llama-3-8b-instruct": "meta-llama/Meta-Llama-3-8B-Instruct",
    "llama-3.2-1b-instruct": "meta-llama/Llama-3.2-1B-Instruct",
    "llama-3.2-3b-instruct": "meta-llama/Llama-3.2-3B-Instruct",
    "phi-3.5": "microsoft/Phi-3.5-mini-instruct"
};

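Thanks. I deployed using the image you packaged. One question: where does this list of available models come from? I assumed they were the Hugging Face Chat models, but it turns out they are not.

Edit: after reading the code I found that they are listed under Supported Models. Sorry to bother you, and thanks.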

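xjfkkk · October 12, 2024, 11:53

Can't use it all up, simply can't use it all up

moemoe · October 12, 2024, 12:00

Deployed to CF. Thanks, Huaji!

wangdefa · October 12, 2024, 13:00

Many of the models are paid. I only tried some of them; there are many I haven't tried.

jdzw · October 12, 2024, 13:06

By the way, how good is the Qwen2.5 model?

wangdefa · October 12, 2024, 13:11

Average. When the prompt contains English, it sometimes replies in English.

Zu_Andy · October 12, 2024, 13:24

Thanks for sharing, friend

Arvin_Xu · October 12, 2024, 15:33

Isn't mistralai/Mistral-7B-Instruct-v0.2 the only free model? The others should all be paid models…

I just added HuggingFace support to LobeChat today:

It might be better to just use OpenRouter's free models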

zhong_little · October 12, 2024, 16:28

This part could probably be improved:

const url = new URL(request.url);
const pathname = url.pathname;
if (pathname !== "/v1/chat/completions") {
    return getResponse("Not Found", 404);
}

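When I use NextChat's built-in endpoint, the frontend request path becomes https://xxxx/v1/chat/completions?path=v1&path=chat&path=completions, which results in a 404. After some digging I found that the path check here is strict.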
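A small sketch of why the pathname comparison helps: `new URL(...).pathname` excludes the query string, so a NextChat-style URL still matches. `routeOf` is a hypothetical helper name used only for illustration, and `example.com` stands in for the Worker's hostname.

```javascript
// Hypothetical helper: maps a request URL to a route name, ignoring the
// query string by comparing only the parsed pathname.
function routeOf(requestUrl) {
    const { pathname } = new URL(requestUrl);
    if (pathname === "/v1/models") return "models";
    if (pathname === "/v1/chat/completions") return "chat";
    return null;
}

const nextchatUrl =
    "https://example.com/v1/chat/completions?path=v1&path=chat&path=completions";
console.log(nextchatUrl.endsWith("/v1/chat/completions")); // false
console.log(routeOf(nextchatUrl));                          // chat
```

Note that strict equality also rejects prefixed paths such as `/proxy/v1/chat/completions`, which the original `endsWith` check accepted.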

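Atomesh · October 12, 2024, 16:29

I suspect Llama-3.1-70B doesn't work because it is included in the Pro plan.

I tried the models listed at https://huggingface.co/models?inference=warm&sort=trending and the two I picked at random both worked (I don't know why 3-8b works even though it is in the Pro plan).

To find out which models work, try running them in the Inference API widget on the right side of each model's HF page.
For example, meta-llama/Llama-3.1-70B-Instruct prompts for a Pro subscription,

while meta-llama/Llama-3.2-11B-Vision-Instruct works.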
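The manual check described above could also be scripted. The sketch below sends a one-token chat request per model to the same endpoint pattern the Worker in this thread uses and reports the HTTP status. It does not assume any particular status code for Pro-only models, so the results still need to be interpreted by hand. `probeRequest` and `probeModels` are hypothetical names, and `fetchImpl` is injectable so the logic can be exercised without network access.

```javascript
// Hypothetical helper: builds a minimal chat request for one model, using
// the same endpoint pattern as the Worker in this thread.
function probeRequest(model, hfKey) {
    return {
        url: `https://api-inference.huggingface.co/models/${model}/v1/chat/completions`,
        init: {
            method: "POST",
            headers: {
                "Authorization": `Bearer ${hfKey}`,
                "Content-Type": "application/json"
            },
            body: JSON.stringify({
                model: model,
                max_tokens: 1,
                messages: [{ role: "user", content: "hi" }]
            })
        }
    };
}

// Probes each model in turn and records whether the response was 2xx.
async function probeModels(models, hfKey, fetchImpl = fetch) {
    const results = [];
    for (const model of models) {
        const { url, init } = probeRequest(model, hfKey);
        const res = await fetchImpl(url, init);
        results.push({ model, ok: res.ok, status: res.status });
    }
    return results;
}
```

For example, `probeModels(["meta-llama/Llama-3.1-70B-Instruct", "meta-llama/Llama-3.2-11B-Vision-Instruct"], "hf_xxxxxxxxxxx").then(console.table)` would print one row per model.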

truetrue · October 12, 2024, 18:06

Bookmarking: cloudflare worker huggingface api. Thanks!

shaoyou · October 12, 2024, 21:09

Much respect, bookmarking this first
