Asynchronous Coroutine Development Guide: Building a High-Performance Recommendation System

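With the rapid growth of the internet and mobile internet, data volumes have exploded, and processing that data efficiently has become a major problem for R&D teams at large companies. Recommendation systems are one of the key application areas and are widely used across the industry. Asynchronous coroutines are an important technique for high-performance data processing under high concurrency. This article shows how to use asynchronous coroutines to build a high-performance recommendation system and provides concrete code examples.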

1. What is an asynchronous coroutine?

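An asynchronous coroutine is an efficient concurrent programming model. The idea is not tied to one language: related models appear in many ecosystems, for example Python's asyncio, Go's goroutines, and Swift's SwiftNIO event-loop framework. Asynchronous coroutines switch between tasks at the coroutine level rather than the thread level, which lets a single thread drive many concurrent asynchronous I/O operations.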

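Compared with multithreading, asynchronous coroutines have the following advantages:

More efficient: coroutines are far lighter than OS threads, and switching between them costs very little.
More flexible: switching between coroutines is controlled by the program rather than the kernel, so the number of coroutines and the way they are scheduled can be controlled more freely.
Easier to use: because scheduling is cooperative, a coroutine only yields at explicit points, which avoids much of the locking that multithreaded code needs and keeps the code simpler.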
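
As a rough illustration of the last two points, the sketch below starts many coroutines on a single thread and lets them update a shared counter without a lock. The names worker, run_workers, and state are made up for this example, and asyncio.sleep() stands in for real I/O.

```python
import asyncio

async def worker(state):
    # The only point where this coroutine can be suspended is the await below.
    await asyncio.sleep(0.01)   # simulated I/O wait; control returns to the event loop
    # No await occurs inside this statement, so no other coroutine can
    # interleave with the read-modify-write and no lock is needed.
    state["count"] += 1

async def run_workers(n):
    state = {"count": 0}
    # Schedule n coroutines on one thread and wait for all of them.
    await asyncio.gather(*(worker(state) for _ in range(n)))
    return state["count"]

if __name__ == "__main__":
    print(asyncio.run(run_workers(10_000)))
```

This only holds for code that runs on one event loop. As soon as work is pushed to threads or processes, the usual synchronization rules apply again.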

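2. Where asynchronous coroutines fit in a recommendation system

A recommendation system has to process large amounts of data, such as user behavior logs and item attribute information, and asynchronous coroutines help keep that processing fast. The following scenarios are good candidates for asynchronous coroutines:

User interest feature extraction: user behavior logs are read and processed asynchronously, and user interest features are extracted from them to support personalized recommendation.
Item information aggregation: item attribute information is read and processed asynchronously, and the different pieces of information are aggregated to support comprehensive item recommendation.
Recommendation result ranking: recommendation results are sorted and filtered inside coroutines, which helps the system keep throughput high and latency low.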

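3. Asynchronous coroutine development guide

This section covers three aspects of asynchronous coroutine development: the coroutine development workflow, the scheduling mechanism, and asynchronous I/O operations.

Coroutine development workflow

Asynchronous coroutines rely on a runtime or library to create, switch, and schedule coroutines. Widely used options include Python's asyncio, Go's goroutines (built into the language), and Swift's SwiftNIO.

The following simple program uses Python's asyncio as an example: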

import asyncio

async def foo():
    # Suspend this coroutine for 1 second to simulate an asynchronous I/O wait.
    await asyncio.sleep(1)
    print('Hello World!')

# asyncio.run() creates an event loop, runs the coroutine to completion,
# and closes the loop afterwards (Python 3.7+).
asyncio.run(foo())

In the program above, asyncio.sleep(1) suspends the current coroutine for 1 second to simulate an asynchronous I/O operation, and a function declared with async def is a coroutine function. asyncio.run() runs the coroutine, and the program prints Hello World!. On Python versions before 3.7, the equivalent is to obtain a loop with asyncio.get_event_loop() and call loop.run_until_complete(foo()).
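
One detail that is easy to miss: calling an async def function does not run its body. It returns a coroutine object that only executes once it is awaited or handed to the event loop. The short sketch below makes that visible; greet and demo are illustrative names, not part of the program above.

```python
import asyncio

async def greet(name):
    await asyncio.sleep(0)          # yield to the event loop once
    return f"Hello {name}!"

def demo():
    coro = greet("World")           # nothing inside greet() has run yet
    is_coro = asyncio.iscoroutine(coro)
    message = asyncio.run(coro)     # the body executes here
    return is_coro, message

if __name__ == "__main__":
    print(demo())
```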

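Scheduling mechanism

Scheduling is a central part of working with asynchronous coroutines. Because scheduling is cooperative, the program can control how many coroutines run and in what order, and tune both for performance.

In asyncio, asyncio.gather() runs several coroutines concurrently, for example: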

import asyncio

async def foo():
    await asyncio.sleep(1)
    print('foo')

async def bar():
    await asyncio.sleep(2)
    print('bar')

async def main():
    # Run both coroutines concurrently and wait until both have finished.
    await asyncio.gather(foo(), bar())

asyncio.run(main())

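In the program above, asyncio.gather() runs both coroutines concurrently. foo sleeps for 1 second and bar for 2 seconds, so foo is printed first and bar second, and the whole program takes about 2 seconds rather than 3.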
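
asyncio.gather() also collects return values, and it returns them in the order the awaitables were passed in, not in the order they finished. The sketch below shows this with shortened delays; fetch and main are illustrative names and the delays are arbitrary.

```python
import asyncio
import time

async def fetch(name, delay):
    await asyncio.sleep(delay)      # simulated I/O latency
    return name

async def main():
    start = time.perf_counter()
    # "slow" is listed first but finishes last.
    results = await asyncio.gather(fetch("slow", 0.3), fetch("fast", 0.1))
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```

The elapsed time is close to the longest single delay, because the two waits overlap.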

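Asynchronous I/O operations

A recommendation system uses asynchronous I/O to process large volumes of data such as user behavior logs and item attribute information. Using asynchronous I/O inside coroutines can greatly improve the efficiency of reading and processing that data.

asyncio itself has no file API (there is no asyncio.open()). The third-party aiofiles library provides aiofiles.open() for reading files asynchronously, for example: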

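import asyncio

import aiofiles  # third-party package: pip install aiofiles

async def read_file():
    # Open the file without blocking the event loop.
    async with aiofiles.open('data.log', 'r') as f:
        # Read the file line by line, yielding to the event loop between reads.
        async for line in f:
            print(line.strip())

asyncio.run(read_file())

In the program above, async with aiofiles.open() opens the file asynchronously and async for line in f reads it one line at a time without blocking the event loop. asyncio.run() runs the coroutine.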
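
If adding a third-party dependency is not an option, a similar effect can be approximated with the standard library alone by pushing the blocking read onto a worker thread. This is a minimal sketch, assuming Python 3.9 or later (for asyncio.to_thread) and a file small enough to hold in memory; _read_lines and the path parameter are illustrative.

```python
import asyncio

def _read_lines(path):
    # Ordinary blocking file I/O; runs in a worker thread, not on the event loop.
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f]

async def read_file(path):
    # asyncio.to_thread() runs the blocking function in a thread and
    # lets other coroutines proceed while it is waiting.
    return await asyncio.to_thread(_read_lines, path)

if __name__ == "__main__":
    for line in asyncio.run(read_file("data.log")):
        print(line)
```

Unlike the line-by-line loop above, this variant returns all lines at once, so it trades memory for simplicity.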

User interest feature extraction

import asyncio
import json

import aiofiles  # third-party package: pip install aiofiles

async def extract_feature(data):
    # Count click and expose events per user within one batch of log records.
    result = {}
    for item in data:
        uid = item.get('uid')
        if uid not in result:
            result[uid] = {'click': 0, 'expose': 0}
        if item.get('type') == 'click':
            result[uid]['click'] += 1
        elif item.get('type') == 'expose':
            result[uid]['expose'] += 1
    return result

async def read_file():
    # Each line of data.log is expected to be one JSON object.
    async with aiofiles.open('data.log', 'r') as f:
        data = []
        async for line in f:
            data.append(json.loads(line))
            # Process the records in batches of 1000 to bound memory use.
            if len(data) >= 1000:
                result = await extract_feature(data)
                print(result)
                data = []

        # Process the final, partially filled batch.
        if len(data) > 0:
            result = await extract_feature(data)
            print(result)

asyncio.run(read_file())

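In the program above, extract_feature() extracts user interest features (per-user click and expose counts) from the behavior log, and read_file() reads the log and calls extract_feature() on it. The check if len(data) >= 1000 processes the records in batches of 1,000, and the final if len(data) > 0 handles whatever is left at the end of the file.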
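
Because each batch is processed on its own, the same user can show up in several per-batch results. One possible way to combine them into overall counts is sketched below; merge_features is a hypothetical helper, and it assumes every batch has the {uid: {'click': n, 'expose': m}} shape produced by extract_feature().

```python
from collections import Counter, defaultdict

def merge_features(batches):
    # Sum the per-user counters from every batch into one total per user.
    totals = defaultdict(Counter)
    for batch in batches:
        for uid, counts in batch.items():
            totals[uid].update(counts)   # Counter.update() adds counts
    return {uid: dict(counter) for uid, counter in totals.items()}

if __name__ == "__main__":
    batch_1 = {"u1": {"click": 1, "expose": 2}}
    batch_2 = {"u1": {"click": 2, "expose": 0}, "u2": {"click": 0, "expose": 1}}
    print(merge_features([batch_1, batch_2]))
```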

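Item information aggregation

Aggregating item information is a necessary step for comprehensive item recommendation. Item attribute information is one of the key data sets in a recommendation system, so it is read and processed with asynchronous I/O.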

import asyncio
import json

import aiofiles  # third-party package: pip install aiofiles

async def aggregate_info(data):
    # Group the values of one batch of records by their key.
    result = {}
    for item in data:
        key = item.get('key')
        if key not in result:
            result[key] = []
        result[key].append(item.get('value'))
    return result

async def read_file():
    # Each line of data.log is expected to be one JSON object.
    async with aiofiles.open('data.log', 'r') as f:
        data = []
        async for line in f:
            data.append(json.loads(line))
            # Process the records in batches of 1000 to bound memory use.
            if len(data) >= 1000:
                result = await aggregate_info(data)
                print(result)
                data = []

        # Process the final, partially filled batch.
        if len(data) > 0:
            result = await aggregate_info(data)
            print(result)

asyncio.run(read_file())

In the program above, aggregate_info() groups item attribute values by key, and read_file() reads the item attribute data and calls aggregate_info() on it. As before, the check if len(data) >= 1000 processes the records in batches of 1,000.
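
Item attributes often come from more than one source. The sketch below loads several sources concurrently, caps the number of loads in flight with asyncio.Semaphore, and merges everything by key. It is only an illustration: load_source, aggregate_sources, and max_concurrency are invented names, and the loader uses asyncio.sleep() in place of a real file or network read.

```python
import asyncio

async def load_source(records, limit):
    async with limit:                   # at most max_concurrency loads at once
        await asyncio.sleep(0.01)       # stand-in for real asynchronous I/O
        return records

async def aggregate_sources(sources, max_concurrency=2):
    limit = asyncio.Semaphore(max_concurrency)
    # gather() returns the batches in the same order as the sources.
    batches = await asyncio.gather(
        *(load_source(records, limit) for records in sources.values())
    )
    result = {}
    for batch in batches:
        for item in batch:
            result.setdefault(item["key"], []).append(item["value"])
    return result

if __name__ == "__main__":
    sources = {
        "colors": [{"key": "item1", "value": "red"}],
        "sizes": [{"key": "item1", "value": "L"}, {"key": "item2", "value": "M"}],
    }
    print(asyncio.run(aggregate_sources(sources)))
```

Bounding concurrency this way is one concrete form of the "control the number of coroutines" idea from the scheduling section.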

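Recommendation result ranking

Ranking the recommendation results is a key step for high throughput and low latency. Sorting and filtering the results inside a coroutine keeps this step in the same asynchronous pipeline as the I/O stages, which can help the overall performance of the recommendation system.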

import asyncio

async def sort_and_filter(data):
    # Sort the tuples in descending order (compared element by element),
    # keep those whose second element is positive, and return at most 10.
    ranked = sorted(data, reverse=True)
    return [item for item in ranked if item[1] > 0][:10]

async def recommend():
    data = [(1, 2), (3, 4), (2, 5), (7, 0), (5, -1), (6, 3), (9, 8)]
    result = await sort_and_filter(data)
    print(result)

asyncio.run(recommend())
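
The listing above sorts whole tuples, so the first element decides the order. If the tuples are read as (item_id, score), which is an assumption and not something the article states, ranking by score is probably what is wanted. A possible variant using heapq.nlargest, which avoids sorting the full list when only the top few results are needed, might look like this (top_k is a hypothetical name):

```python
import heapq

def top_k(candidates, k=10):
    # Assumption: each candidate is an (item_id, score) tuple.
    positive = (c for c in candidates if c[1] > 0)   # drop non-positive scores
    # Return the k candidates with the highest score, best first.
    return heapq.nlargest(k, positive, key=lambda c: c[1])

if __name__ == "__main__":
    data = [(1, 2), (3, 4), (2, 5), (7, 0), (5, -1), (6, 3), (9, 8)]
    print(top_k(data, k=3))
```

This step is CPU-bound, so wrapping it in a coroutine does not make it faster by itself. For large candidate lists it could be moved off the event loop, for example with asyncio.to_thread().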