Python question: how do you explain asyncio being slower than a ThreadPool? With a deepl-pro benchmark

"""Translate using deeplpro's dl_session cookie."""
# pylint: disable=broad-exception-caught

import asyncio
import sys
from time import monotonic
from random import randrange
from typing import Union
from concurrent.futures import ThreadPoolExecutor
import httpx
from loadtext import loadtext

COOKIE = "dl_session=81f94b98-497f-4f18-ab79-1ef25076f4c9"

HEADERS = {
    "Content-Type": "application/json",
    "Cookie": COOKIE,
}
URL = "https://api.deepl.com/jsonrpc"


def deepl_tr(
    text: str, source_lang: str = "auto", target_lang: str = ""
) -> Union[dict, Exception]:
    """Translate using deeplpro's dl_session cookie."""
    if not source_lang.strip():
        source_lang = "auto"

    if not target_lang.strip():
        target_lang = "zh"

    data = {
        "jsonrpc": "2.0",
        "method": "LMT_handle_texts",
        "id": randrange(sys.maxsize),
        "params": {
            "splitting": "newlines",
            "lang": {
                "source_lang_user_selected": source_lang,
                "target_lang": target_lang,
            },
            "texts": [
                {
                    "text": text,
                    "requestAlternatives": 3,
                }
            ],
        },
    }
    try:
        resp = httpx.post(URL, json=data, headers=HEADERS)
    except Exception as exc:
        return exc

    try:
        jdata = resp.json()
    except Exception as exc:
        return exc

    return jdata


async def deepl_tr_async(
    text: str, source_lang: str = "auto", target_lang: str = ""
) -> Union[dict, Exception]:
    """Translate using deeplpro's dl_session cookie."""
    if not source_lang.strip():
        source_lang = "auto"

    if not target_lang.strip():
        target_lang = "zh"

    data = {
        "jsonrpc": "2.0",
        "method": "LMT_handle_texts",
        "id": randrange(sys.maxsize),
        "params": {
            "splitting": "newlines",
            "lang": {
                "source_lang_user_selected": source_lang,
                "target_lang": target_lang,
            },
            "texts": [
                {
                    "text": text,
                    "requestAlternatives": 3,
                }
            ],
        },
    }
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.post(URL, json=data, headers=HEADERS)
        except Exception as exc:
            return exc
    try:
        jdata = resp.json()
    except Exception as exc:
        return exc

    return jdata


def main():
    """Run."""
    texts = loadtext(r"2024-08-20.txt")

    then = monotonic()

    # default workers = min(32, (os.cpu_count() or 1) + 4)
    # with ThreadPoolExecutor(len(texts)) as executor:
    with ThreadPoolExecutor() as executor:
        # executor.map returns a lazy iterator; materialize it while timing
        results = list(executor.map(deepl_tr, texts))
    time_el = monotonic() - then
    print(*results)
    print(f"{len(texts)}, {time_el:.2f} {time_el / len(texts):.2f}")


async def main_a():
    """Run async."""
    texts = loadtext(r"2024-08-20.txt")

    then = monotonic()
    coros = [deepl_tr_async(text) for text in texts]
    results = await asyncio.gather(*coros)
    time_el = monotonic() - then
    print(results)
    print(f"{len(texts)}, {time_el:.2f} {time_el / len(texts):.2f}")


if __name__ == "__main__":
    asyncio.run(main_a())

    main()

Tested with a 39-paragraph English text; the timings were:

Async:
[screenshot: async timing results]

ThreadPoolExecutor, workers=8:
[screenshot: thread-pool timing results]

I also ran ThreadPoolExecutor with 39 workers; the timing barely changed.

Can any Python folks explain this?

(If you want to try it yourself, just point the loadtext call at any txt file with roughly 40 English paragraphs.)

5 likes

Async generally isn't faster.

What async solves is slow operations (like I/O) blocking the thread, right?

4 likes
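The point above can be sketched with stdlib-only fake I/O (all names here are illustrative, not the OP's code): awaiting does not make a single operation faster; async only wins when the waits overlap.

```python
import asyncio
from time import monotonic


async def fake_request(delay: float = 0.05) -> str:
    # Stand-in for a network call: awaiting does not speed up the
    # operation itself, it only yields control while "waiting".
    await asyncio.sleep(delay)
    return "ok"


async def sequential(n: int) -> float:
    """Await the requests one after another: takes ~n * delay."""
    then = monotonic()
    for _ in range(n):
        await fake_request()
    return monotonic() - then


async def overlapped(n: int) -> float:
    """Overlap the waits with gather: takes ~delay total."""
    then = monotonic()
    await asyncio.gather(*(fake_request() for _ in range(n)))
    return monotonic() - then
```

With 8 requests of 50 ms each, `sequential(8)` takes about 0.4 s while `overlapped(8)` takes about 0.05 s, which is the only sense in which async is "faster".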

Did you ever figure this out? Why does it happen? Shouldn't async be faster?

1 like

The core advantage of async coroutines is that they avoid thread switching, have low resource overhead, and leave scheduling under the user's control.

1 like

Not solved; ChatGPT's explanation was equivocal too.

Is this AIGC? AIGC posts need a screenshot.

1 like

I think so. Async really just schedules tasks more sensibly; it doesn't speed anything up, whereas a thread pool genuinely throws CPU resources at the problem. If the I/O wait times aren't very spread out, async doesn't buy you much.

1 like

My personal understanding is that async is just suspending operations; there's no reason for it to perform better.

Typical high-performance scenarios generally use C/C++/Java, none of which lead with async.

Honestly, JS, which does lead with async, isn't exactly known for performance.

1 like

What can a web-request benchmark show? Whatever the approach, 99% of the time is spent on I/O.

2 likes

Why does your sync version just fire requests directly,

while the async version constructs 39 AsyncClient() instances, each sending its own request?

Try making each thread in the sync version construct its own Client() before sending, too.

1 like

Normally a single (Async)Client() is all you need; send every request through it.

If each of your threads also constructs its own Client() and it's still faster than async,

could building the request / parsing the response be the expensive part (is your text very large)?

Can httpx parse in multiple threads at once, unconstrained by the GIL, faster than async parsing one by one on a single thread?

1 like
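The one-client-per-request cost this comment points at can be modeled with a toy sketch (everything here is hypothetical; the blocking sleep stands in for CPU-bound TCP/TLS handshake work, which a single-threaded event loop cannot overlap the way it overlaps awaited socket waits):

```python
import asyncio
import time
from time import monotonic

HANDSHAKE = 0.01  # pretend connection setup costs 10 ms of CPU-bound work


class FakeClient:
    """Toy stand-in for httpx.AsyncClient, for illustration only."""

    async def __aenter__(self):
        # Blocking sleep models CPU-bound handshake work: the event loop
        # cannot run other coroutines while it happens.
        time.sleep(HANDSHAKE)
        return self

    async def __aexit__(self, *exc):
        return False

    async def post(self, url: str) -> str:
        await asyncio.sleep(0.01)  # the request wait itself overlaps freely
        return "ok"


async def per_request_clients(n: int) -> float:
    """One client per request, as in the OP's deepl_tr_async: setups serialize."""
    async def one():
        async with FakeClient() as client:
            return await client.post("https://api.deepl.com/jsonrpc")
    then = monotonic()
    await asyncio.gather(*(one() for _ in range(n)))
    return monotonic() - then


async def shared_client(n: int) -> float:
    """One shared client for all requests: a single setup cost."""
    then = monotonic()
    async with FakeClient() as client:
        await asyncio.gather(
            *(client.post("https://api.deepl.com/jsonrpc") for _ in range(n))
        )
    return monotonic() - then
```

Under this model, 20 requests pay 20 handshakes in the per-request version but only one in the shared version, which is the comment's point about sending everything through one client.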

I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading

I/O Bound, Slow I/O, Many connections => Asyncio

3 likes
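The rule of thumb above can be poked at with a stdlib-only toy benchmark, where sleeps stand in for the network (a sketch under those assumptions, not the OP's deepl code): with fast fake I/O and a bounded number of connections, both approaches finish in roughly one request's wait time.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

N, DELAY = 32, 0.05  # 32 fake requests, each "waiting on the network" 50 ms


def blocking_io(_: int) -> str:
    time.sleep(DELAY)  # releases the GIL, like a real socket wait
    return "ok"


async def async_io() -> str:
    await asyncio.sleep(DELAY)
    return "ok"


def run_threads() -> float:
    """All N waits overlap across threads: ~DELAY total plus thread overhead."""
    then = time.monotonic()
    with ThreadPoolExecutor(max_workers=N) as pool:
        list(pool.map(blocking_io, range(N)))
    return time.monotonic() - then


async def _gather() -> None:
    await asyncio.gather(*(async_io() for _ in range(N)))


def run_asyncio() -> float:
    """All N waits overlap on one event loop: also ~DELAY total."""
    then = time.monotonic()
    asyncio.run(_gather())
    return time.monotonic() - then
```

With only 32 fast connections, both finish far below the 1.6 s a sequential run would take; the asyncio advantage in the heuristic only shows up at connection counts where spawning a thread per request stops being cheap.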

I'm getting async faster!!! Maybe your network jitter is high?

Multi-threaded CPU scheduling is quick too.

[screenshots: timing results]

1 like

I/O performance can be handled with async; if you want raw speed, optimize the code itself, e.g. with better algorithms.

1 like

Isn't concurrent also an async implementation?
The official docs describe it as: The concurrent.futures module provides a high-level interface for asynchronously executing callables.

So is your intent to compare the cost of two asynchronous styles, or did you think concurrent was a synchronous implementation and that you were comparing sync vs async?

2 likes

The network is the classic slow I/O.

1 like

That result is within expectations. Did you write separate code, or modify the code in the first post? My machine may just have been busy; I'll take another look.

Both of these methods are asynchronous.
If it were synchronous, constructing multiple clients certainly wouldn't be slower than the network I/O; it would just waste resources.

1 like

Leaving the network aside, the time difference between the two is only the CPU scheduling overhead. The answer is in the docs quoted above:

The concurrent.futures module provides a high-level interface for asynchronously executing callables.
The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor. Both implement the same interface, which is defined by the abstract Executor class.

1 like
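The doc passage quoted above can be seen directly in a minimal sketch (not the benchmark code): submit() hands back Future objects immediately, and the callables execute asynchronously in worker threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x: int) -> int:
    time.sleep(0.1)  # stand-in for blocking I/O; releases the GIL
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    start = time.monotonic()
    # submit() returns immediately; the work runs in background threads.
    futures = [pool.submit(slow_square, i) for i in range(4)]
    # result() blocks until each Future completes.
    results = [f.result() for f in futures]
    elapsed = time.monotonic() - start

print(results)  # [0, 1, 4, 9]
# elapsed is ~0.1 s, not ~0.4 s: the four sleeps overlapped
```

So both sides of the OP's benchmark are asynchronous execution in the documentation's sense; they differ only in mechanism (threads vs an event loop).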

It's probably an issue with httpx's async implementation; switch to aiohttp and the two are neck and neck.

  • aiohttp
    [screenshot: aiohttp timing]

  • httpx
    [screenshot: httpx timing]
async def deepl_tr_async_aiohttp(
    text: str, source_lang: str = "auto", target_lang: str = ""
) -> Union[dict, Exception]:
    """Translate using deeplpro's dl_session cookie (requires `import aiohttp`)."""
    if not source_lang.strip():
        source_lang = "auto"

    if not target_lang.strip():
        target_lang = "zh"

    data = {
        "jsonrpc": "2.0",
        "method": "LMT_handle_texts",
        "id": randrange(sys.maxsize),
        "params": {
            "splitting": "newlines",
            "lang": {
                "source_lang_user_selected": source_lang,
                "target_lang": target_lang,
            },
            "texts": [
                {
                    "text": text,
                    "requestAlternatives": 3,
                }
            ],
        },
    }

    async with aiohttp.ClientSession() as session:
        try:
            async with session.post(URL, json=data, headers=HEADERS) as response:
                response.raise_for_status()
                jdata = await response.json()
                return jdata
        except Exception as exc:
            return exc
1 like