Python question: how do you explain asyncio being slower than a ThreadPool? With a deepl-pro benchmark

"""Translate using deeplpro's dl_session cookie."""
# pylint: disable=broad-exception-caught

import asyncio
import sys
from time import monotonic
from random import randrange
from typing import Union
from concurrent.futures import ThreadPoolExecutor
import httpx
from loadtext import loadtext

COOKIE = "dl_session=81f94b98-497f-4f18-ab79-1ef25076f4c9"

HEADERS = {
    "Content-Type": "application/json",
    "Cookie": COOKIE,
}
URL = "https://api.deepl.com/jsonrpc"


def deepl_tr(
    text: str, source_lang: str = "auto", target_lang: str = ""
) -> Union[dict, Exception]:
    """Translate using deeplpro's dl_session cookie."""
    if not source_lang.strip():
        source_lang = "auto"

    if not target_lang.strip():
        target_lang = "zh"

    data = {
        "jsonrpc": "2.0",
        "method": "LMT_handle_texts",
        "id": randrange(sys.maxsize),
        "params": {
            "splitting": "newlines",
            "lang": {
                "source_lang_user_selected": source_lang,
                "target_lang": target_lang,
            },
            "texts": [
                {
                    "text": text,
                    "requestAlternatives": 3,
                }
            ],
        },
    }
    try:
        resp = httpx.post(URL, json=data, headers=HEADERS)
    except Exception as exc:
        return exc

    try:
        jdata = resp.json()
    except Exception as exc:
        return exc

    return jdata


async def deepl_tr_async(
    text: str, source_lang: str = "auto", target_lang: str = ""
) -> Union[dict, Exception]:
    """Translate using deeplpro's dl_session cookie."""
    if not source_lang.strip():
        source_lang = "auto"

    if not target_lang.strip():
        target_lang = "zh"

    data = {
        "jsonrpc": "2.0",
        "method": "LMT_handle_texts",
        "id": randrange(sys.maxsize),
        "params": {
            "splitting": "newlines",
            "lang": {
                "source_lang_user_selected": source_lang,
                "target_lang": target_lang,
            },
            "texts": [
                {
                    "text": text,
                    "requestAlternatives": 3,
                }
            ],
        },
    }
    async with httpx.AsyncClient() as client:
        try:
            resp = await client.post(URL, json=data, headers=HEADERS)
        except Exception as exc:
            return exc
    try:
        jdata = resp.json()
    except Exception as exc:
        return exc

    return jdata


def main():
    """Run."""
    texts = loadtext(r"2024-08-20.txt")

    then = monotonic()

    # default workers = min(32, (os.cpu_count() or 1) + 4)
    # with ThreadPoolExecutor(len(texts)) as executor:
    with ThreadPoolExecutor() as executor:
        # executor.map returns a lazy iterator; materialize it while timing
        results = list(executor.map(deepl_tr, texts))
    time_el = monotonic() - then
    print(*results)
    print(f"{len(texts)}, {time_el:.2f} {time_el / len(texts):.2f}")


async def main_a():
    """Run async."""
    texts = loadtext(r"2024-08-20.txt")

    then = monotonic()
    coros = [deepl_tr_async(text) for text in texts]
    results = await asyncio.gather(*coros)
    time_el = monotonic() - then
    print(results)
    print(f"{len(texts)}, {time_el:.2f} {time_el / len(texts):.2f}")


if __name__ == "__main__":
    asyncio.run(main_a())

    main()

Tested with a 39-paragraph English text; the timings were:

Async:
[screenshot: async timing results]

ThreadPoolExecutor, workers=8:
[screenshot: thread-pool timing results]

I also ran ThreadPoolExecutor with 39 workers; the timing barely changed.

Can any Python folks explain this?

(If you want to try it yourself, just point the loadtext call at any txt file with roughly 40 English paragraphs.)

5 likes

Async generally isn't faster.

What async solves is slow operations (like I/O) blocking the thread, right?

4 likes
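The point above can be sketched with stdlib-only fake I/O (all names here are illustrative, not the OP's code): awaiting does not make a single operation faster; async only wins when the waits overlap.

```python
import asyncio
from time import monotonic


async def fake_request(delay: float = 0.05) -> str:
    # Stand-in for a network call: awaiting does not speed up the
    # operation itself, it only yields control while "waiting".
    await asyncio.sleep(delay)
    return "ok"


async def sequential(n: int) -> float:
    """Await the requests one after another: takes ~n * delay."""
    then = monotonic()
    for _ in range(n):
        await fake_request()
    return monotonic() - then


async def overlapped(n: int) -> float:
    """Overlap the waits with gather: takes ~delay total."""
    then = monotonic()
    await asyncio.gather(*(fake_request() for _ in range(n)))
    return monotonic() - then
```

With 8 requests of 50 ms each, `sequential(8)` takes about 0.4 s while `overlapped(8)` takes about 0.05 s, which is the only sense in which async is "faster".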

Did you ever figure this out? Why does it happen? Shouldn't async be faster?

1 like

The core advantage of async coroutines is that they avoid thread switching, have low resource overhead, and leave scheduling under the user's control.

1 like

Not solved; ChatGPT's explanation was equivocal too.

Is this AIGC? AIGC posts need a screenshot.

1 like

I think so. Async really just schedules tasks more sensibly; it doesn't speed anything up, whereas a thread pool genuinely throws CPU resources at the problem. If the I/O wait times aren't very spread out, async doesn't buy you much.

1 like

My personal understanding is that async is just suspending operations; there's no reason for it to perform better.

Typical high-performance scenarios generally use C/C++/Java, none of which lead with async.

Honestly, JS, which does lead with async, isn't exactly known for performance.

1 like

What can a web-request benchmark show? Whatever the approach, 99% of the time is spent on I/O.

2 likes

Why does your sync version just fire requests directly,

while the async version constructs 39 AsyncClient() instances, each sending its own request?

Try making each thread in the sync version construct its own Client() before sending, too.

1 like

Normally a single (Async)Client() is all you need; send every request through it.

If each of your threads also constructs its own Client() and it's still faster than async,

could building the request / parsing the response be the expensive part (is your text very large)?

Can httpx parse in multiple threads at once, unconstrained by the GIL, faster than async parsing one by one on a single thread?

1 like
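The one-client-per-request cost this comment points at can be modeled with a toy sketch (everything here is hypothetical; the blocking sleep stands in for CPU-bound TCP/TLS handshake work, which a single-threaded event loop cannot overlap the way it overlaps awaited socket waits):

```python
import asyncio
import time
from time import monotonic

HANDSHAKE = 0.01  # pretend connection setup costs 10 ms of CPU-bound work


class FakeClient:
    """Toy stand-in for httpx.AsyncClient, for illustration only."""

    async def __aenter__(self):
        # Blocking sleep models CPU-bound handshake work: the event loop
        # cannot run other coroutines while it happens.
        time.sleep(HANDSHAKE)
        return self

    async def __aexit__(self, *exc):
        return False

    async def post(self, url: str) -> str:
        await asyncio.sleep(0.01)  # the request wait itself overlaps freely
        return "ok"


async def per_request_clients(n: int) -> float:
    """One client per request, as in the OP's deepl_tr_async: setups serialize."""
    async def one():
        async with FakeClient() as client:
            return await client.post("https://api.deepl.com/jsonrpc")
    then = monotonic()
    await asyncio.gather(*(one() for _ in range(n)))
    return monotonic() - then


async def shared_client(n: int) -> float:
    """One shared client for all requests: a single setup cost."""
    then = monotonic()
    async with FakeClient() as client:
        await asyncio.gather(
            *(client.post("https://api.deepl.com/jsonrpc") for _ in range(n))
        )
    return monotonic() - then
```

Under this model, 20 requests pay 20 handshakes in the per-request version but only one in the shared version, which is the comment's point about sending everything through one client.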

I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading

I/O Bound, Slow I/O, Many connections => Asyncio

3 likes
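The rule of thumb above can be poked at with a stdlib-only toy benchmark, where sleeps stand in for the network (a sketch under those assumptions, not the OP's deepl code): with fast fake I/O and a bounded number of connections, both approaches finish in roughly one request's wait time.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

N, DELAY = 32, 0.05  # 32 fake requests, each "waiting on the network" 50 ms


def blocking_io(_: int) -> str:
    time.sleep(DELAY)  # releases the GIL, like a real socket wait
    return "ok"


async def async_io() -> str:
    await asyncio.sleep(DELAY)
    return "ok"


def run_threads() -> float:
    """All N waits overlap across threads: ~DELAY total plus thread overhead."""
    then = time.monotonic()
    with ThreadPoolExecutor(max_workers=N) as pool:
        list(pool.map(blocking_io, range(N)))
    return time.monotonic() - then


async def _gather() -> None:
    await asyncio.gather(*(async_io() for _ in range(N)))


def run_asyncio() -> float:
    """All N waits overlap on one event loop: also ~DELAY total."""
    then = time.monotonic()
    asyncio.run(_gather())
    return time.monotonic() - then
```

With only 32 fast connections, both finish far below the 1.6 s a sequential run would take; the asyncio advantage in the heuristic only shows up at connection counts where spawning a thread per request stops being cheap.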

I'm getting async faster!!! Maybe your network jitter is high?

Multi-threaded CPU scheduling is quick too.

[screenshots: timing results]

1 like

I/O performance can be handled with async; if you want raw speed, optimize the code itself, e.g. with better algorithms.

1 like

Isn't concurrent also an async implementation?
The official docs describe it as: The concurrent.futures module provides a high-level interface for asynchronously executing callables.

So is your intent to compare the cost of two asynchronous styles, or did you think concurrent was a synchronous implementation and that you were comparing sync vs async?

2 likes

The network is the classic slow I/O.

1 like

That result is within expectations. Did you write separate code, or modify the code in the first post? My machine may just have been busy; I'll take another look.

Both of these methods are asynchronous.
If it were synchronous, constructing multiple clients certainly wouldn't be slower than the network I/O; it would just waste resources.

1 like

Leaving the network aside, the time difference between the two is only the CPU scheduling overhead. The answer is in the docs quoted above:

The concurrent.futures module provides a high-level interface for asynchronously executing callables.
The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor. Both implement the same interface, which is defined by the abstract Executor class.

1 like
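The doc passage quoted above can be seen directly in a minimal sketch (not the benchmark code): submit() hands back Future objects immediately, and the callables execute asynchronously in worker threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x: int) -> int:
    time.sleep(0.1)  # stand-in for blocking I/O; releases the GIL
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    start = time.monotonic()
    # submit() returns immediately; the work runs in background threads.
    futures = [pool.submit(slow_square, i) for i in range(4)]
    # result() blocks until each Future completes.
    results = [f.result() for f in futures]
    elapsed = time.monotonic() - start

print(results)  # [0, 1, 4, 9]
# elapsed is ~0.1 s, not ~0.4 s: the four sleeps overlapped
```

So both sides of the OP's benchmark are asynchronous execution in the documentation's sense; they differ only in mechanism (threads vs an event loop).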

It's probably an issue with httpx's async implementation; switch to aiohttp and the two are neck and neck.

  • aiohttp
    [screenshot: aiohttp timing]

  • httpx
    [screenshot: httpx timing]
async def deepl_tr_async_aiohttp(
    text: str, source_lang: str = "auto", target_lang: str = ""
) -> Union[dict, Exception]:
    """Translate using deeplpro's dl_session cookie (requires `import aiohttp`)."""
    if not source_lang.strip():
        source_lang = "auto"

    if not target_lang.strip():
        target_lang = "zh"

    data = {
        "jsonrpc": "2.0",
        "method": "LMT_handle_texts",
        "id": randrange(sys.maxsize),
        "params": {
            "splitting": "newlines",
            "lang": {
                "source_lang_user_selected": source_lang,
                "target_lang": target_lang,
            },
            "texts": [
                {
                    "text": text,
                    "requestAlternatives": 3,
                }
            ],
        },
    }

    async with aiohttp.ClientSession() as session:
        try:
            async with session.post(URL, json=data, headers=HEADERS) as response:
                response.raise_for_status()
                jdata = await response.json()
                return jdata
        except Exception as exc:
            return exc
1 like