Please explain how bytearray and memoryview are used in Python and which scenarios they suit

Question

    x = bytearray(b'abcde')
    y = memoryview(x)
    y[1:3] = b'yz'
    x[1:3] = b'ab'
    y[3] = ord(b'e')
    x[3] = ord(b'f')

    x = bytearray(b'abcde')
    while len(x) > 0:
        x = x[1:]

    x = bytearray(b'abcde')
    y = memoryview(x)
    whi

高洛峰 · Answer

I just recently used memoryview to answer this question.

bytearray is a mutable sequence of bytes. It is the mutable counterpart of str in Python 2, where str is an immutable byte string.
In Python 3, str is Unicode text by default, so raw bytes are handled through bytes (immutable) or bytearray (mutable).
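
As a small illustration of the mutability difference described above, here is a minimal Python 3 sketch. The variable names (`data`, `frozen`, `bytes_is_mutable`) are made up for this example.

```python
# bytearray supports in-place modification
data = bytearray(b'abcde')
data[0] = ord('z')        # item assignment takes an integer 0..255
data[1:3] = b'YZ'         # slice assignment takes a bytes-like object
data.extend(b'!')         # a bytearray can also grow in place

# bytes is the immutable counterpart: item assignment raises TypeError
frozen = bytes(data)
try:
    frozen[0] = 0
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

print(data)               # bytearray(b'zYZde!')
print(bytes_is_mutable)   # False
```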

memoryview provides byte-level access to the memory of any object that supports the buffer protocol [1,2]. Its advantage is that it does not copy memory.
In Python 2, str and bytearray support the buffer protocol by default; in Python 3, bytes and bytearray do, while str does not.
To put it simply, slicing a str or bytearray creates a new object and copies the data, whereas slicing a memoryview does not copy anything.
Compare the following two behaviors:

Without memoryview

    >>> a = 'aaaaaa'
    >>> b = a[:2]      # creates a new string

    >>> a = bytearray(b'aaaaaa')
    >>> b = a[:2]      # creates a new bytearray
    >>> b[:2] = b'bb'  # changing b does not affect a
    >>> a
    bytearray(b'aaaaaa')
    >>> b
    bytearray(b'bb')

With memoryview

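    >>> # Python 3 session; in Python 2 the b prefix is optional and tobytes() prints 'bb'
    >>> a = b'aaaaaa'
    >>> ma = memoryview(a)
    >>> ma.readonly      # read-only memoryview
    True
    >>> mb = ma[:2]      # does not create a new bytes object

    >>> a = bytearray(b'aaaaaa')
    >>> ma = memoryview(a)
    >>> ma.readonly      # writable memoryview
    False
    >>> mb = ma[:2]      # does not create a new bytearray
    >>> mb[:2] = b'bb'   # changing mb also changes ma (and a)
    >>> mb.tobytes()
    b'bb'
    >>> ma.tobytes()
    b'bbaaaa'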
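
The `while len(x) > 0: x = x[1:]` loop from the question is where this difference matters most. The following Python 3 sketch contrasts the two approaches; the function names `consume_by_copy` and `consume_by_view` and the chunk size are invented for illustration and are not from the original post.

```python
def consume_by_copy(buf, chunk):
    """Walk through buf by re-slicing the bytearray itself."""
    chunks = []
    while len(buf) > 0:
        chunks.append(bytes(buf[:chunk]))
        buf = buf[chunk:]        # new bytearray: copies all remaining bytes
    return chunks


def consume_by_view(buf, chunk):
    """Walk through buf via memoryview slices over the same memory."""
    view = memoryview(buf)
    chunks = []
    while len(view) > 0:
        chunks.append(view[:chunk].tobytes())  # copies only this chunk
        view = view[chunk:]      # new view object, no data copied
    return chunks


payload = bytearray(b'abcdefgh')
copied = consume_by_copy(payload, 3)
viewed = consume_by_view(payload, 3)
print(copied)   # [b'abc', b'def', b'gh']
print(viewed)   # [b'abc', b'def', b'gh']
```

Both functions return the same chunks. The first one copies the whole remaining tail on every iteration, so the total work grows quadratically with the buffer size, while the second only creates lightweight view objects.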

My use case is receiving data from a socket and parsing it in network programs.

Before using memoryview, the socket receiving code looked like this (simplified):

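    def read(size):
        ret = b''
        remain = size
        while True:
            data = sock.recv(remain)
            if not data:    # peer closed the connection; avoid looping forever
                raise EOFError('connection closed')
            ret += data     # every += creates a new bytes object
            if len(data) == remain:
                break
            remain -= len(data)
        return ret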

With memoryview, we avoid the repeated string concatenation and the new objects it creates:
```
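def read(size):
    # Allocate the whole buffer once and fill it in place
    ret = memoryview(bytearray(size))
    remain = size
    while True:
        data = sock.recv(remain)
        if not data:    # peer closed the connection; avoid looping forever
            raise EOFError('connection closed')
        length = len(data)
        offset = size - remain
        ret[offset:offset + length] = data  # write into the existing buffer
        if length == remain:
            break
        remain -= length
    return ret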
```
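
A possible variation, not part of the code above: the standard library's `socket.recv_into` can write straight into a memoryview slice, which also avoids the temporary object that `recv` returns. This is a minimal Python 3 sketch; the name `read_exact` and the `socketpair` demonstration are assumptions added for illustration.

```python
import socket


def read_exact(sock, size):
    """Read exactly `size` bytes from `sock` into one preallocated buffer."""
    view = memoryview(bytearray(size))
    received = 0
    while received < size:
        # recv_into fills the given slice and returns the number of bytes written
        n = sock.recv_into(view[received:], size - received)
        if n == 0:
            raise EOFError('connection closed after %d of %d bytes' % (received, size))
        received += n
    return view


# Local demonstration using a connected pair of sockets
left, right = socket.socketpair()
try:
    left.sendall(b'hello world')
    result = read_exact(right, 11).tobytes()
finally:
    left.close()
    right.close()

print(result)   # b'hello world'
```
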
Returning a memoryview has another advantage. struct.unpack accepts a memoryview directly, so parsing is very efficient: slicing the view does not copy data, which avoids the many copies made when a large str is parsed piece by piece.

For example:

    mv = memoryview(b'\x00\x01\x02\x00\x00\xff...')
    msg_type, msg_len = struct.unpack('!BI', mv[:5])  # avoid shadowing the builtins type and len
    ...
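
To make the parsing idea concrete, here is a minimal Python 3 sketch that walks a buffer with `struct.Struct.unpack_from` and an offset. The framing (a `'!BI'` header of type plus length, followed by a body of that length) and the name `parse_messages` are assumptions for illustration; the original only shows the header format.

```python
import struct

# Assumed header layout: 1-byte type + 4-byte big-endian length (5 bytes)
HEADER = struct.Struct('!BI')


def parse_messages(buf):
    """Split buf into (type, body) pairs without copying the buffer itself."""
    view = memoryview(buf)
    offset = 0
    messages = []
    while offset + HEADER.size <= len(view):
        msg_type, msg_len = HEADER.unpack_from(view, offset)
        start = offset + HEADER.size
        end = start + msg_len
        if end > len(view):
            break                     # incomplete message, stop here
        body = view[start:end]        # zero-copy slice of the body
        messages.append((msg_type, body.tobytes()))
        offset = end
    return messages


raw = struct.pack('!BI', 1, 3) + b'abc' + struct.pack('!BI', 2, 2) + b'hi'
messages = parse_messages(raw)
print(messages)   # [(1, b'abc'), (2, b'hi')]
```

Only the final `tobytes()` call copies data, and only for the body being kept; the header fields are decoded in place.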

[1] https://jakevdp.github.io/blo...
[2] http://legacy.python.org/dev/...