Understanding Python Through Its Built-in Objects (Part 6)

2021-11-03

`bytearray` and `memoryview`: byte interfaces

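A bytearray is similar to bytes, except that it is mutable. That is where its value lies:

In low-level work on bytes and bits, a bytearray is more efficient when individual bytes need to change, because it is modified in place. The kind of bit trickery involved looks like this (note that these tricks only make sense for ASCII letters: the space, whose code is 32, turns into '\x00' under upper and toggle):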

>>> def upper(s):
...     return ''.join(chr(ord(c) & 223) for c in s)
...
>>> def toggle(s): return ''.join(chr(ord(c) ^ 32) for c in s)
...
>>> def lower(s): return ''.join(chr(ord(c) | 32) for c in s)
...
>>> upper("Lao Qi")
'LAO\x00QI'
>>> toggle("Lao Qi")
'lAO\x00qI'
>>> lower("Lao Qi")
'lao qi'
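The '\x00' in the output above comes from the space character: its code is 32, so clearing or flipping that bit leaves 0. As a minimal sketch (the helper name upper_ascii is made up for illustration and is not part of the original examples), the trick can be limited to ASCII lowercase letters so that everything else passes through unchanged:

```python
def upper_ascii(s):
    """Uppercase ASCII letters by clearing bit 5 (value 32); leave the rest alone."""
    return ''.join(
        chr(ord(c) & ~32) if 'a' <= c <= 'z' else c
        for c in s
    )


print(upper_ascii("Lao Qi"))   # the space is no longer turned into '\x00'
print(32 & 223, 32 ^ 32)       # what upper() and toggle() did to the space
```

In everyday code str.upper, str.lower and str.swapcase are the right tools; the bit version is only useful for seeing how ASCII case is encoded.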

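A byte has a fixed size, whereas the number of bytes a string occupies depends on how it is encoded. For example, under UTF-8, the most common encoding of Unicode: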

>>> x = 'I♥🐍'
>>> len(x)
3
>>> x.encode()
b'I\xe2\x99\xa5\xf0\x9f\x90\x8d'
>>> len(x.encode())
8
>>> x[2]
'🐍'
>>> x[2].encode()
b'\xf0\x9f\x90\x8d'
>>> len(x[2].encode())
4

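The string I♥🐍 referenced by x consists of three characters but takes up 8 bytes in total, and the snake emoji 🐍 alone is 4 bytes long. As shown below, reading the emoji one byte at a time always gives a value between 0 and 255: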

>>> x[2]
'🐍'
>>> b = x[2].encode()
>>> b
b'\xf0\x9f\x90\x8d'  # 4 bytes
>>> b[:1]
b'\xf0'
>>> b[1:2]
b'\x9f'
>>> b[2:3]
b'\x90'
>>> b[3:4]
b'\x8d'
>>> b[0]  # indexing a bytes object gives an integer
240
>>> b[3]
141
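To connect this with bytearray, here is a small sketch of the difference in mutability. The variable names are arbitrary; the point is only that bytes rejects item assignment while bytearray accepts it:

```python
data = 'I♥🐍'.encode()        # an immutable bytes object, 8 bytes long

try:
    data[0] = ord('i')         # bytes does not support item assignment
except TypeError as error:
    print('bytes:', error)

buffer = bytearray(data)       # a mutable copy of the same 8 bytes
buffer[0] = ord('i')           # overwrite the single byte that encodes 'I'
print(buffer.decode())         # decode the modified bytes back into a str
```

Overwriting one byte is only safe here because 'I' is a single-byte character in UTF-8; changing one byte in the middle of ♥ or 🐍 would leave invalid UTF-8.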

Here is an example of bitwise operations on bytes:

def alternate_case(string):
    """Turns a string into alternating uppercase and lowercase characters."""
    array = bytearray(string.encode())
    for index, byte in enumerate(array):
        # Skip everything except ASCII letters: 'A'-'Z' is 65-90, 'a'-'z' is 97-122
        if not ((65 <= byte <= 90) or (97 <= byte <= 122)):
            continue

        if index % 2 == 0:
            array[index] = byte | 32
        else:
            array[index] = byte & ~32

    return array.decode()

>>> alternate_case('Hello WORLD?')
'hElLo wOrLd?'

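This is not a great example, so it is not worth explaining in detail, but it does work, and it is more efficient than creating a new bytes object for every character that changes.

The built-in memoryview is closely related to bytearray, but instead of making a copy of its own it refers to an existing object, or to a slice of one. This lets you pass around a reference to a section of bytes in memory and edit it in place: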

>>> array = bytearray(range(256))
>>> array
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08...
>>> len(array)
256
>>> array_slice = array[65:91]  # Bytes 65 to 90 are uppercase english characters
>>> array_slice
bytearray(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
>>> view = memoryview(array)[65:91]  # Does the same thing,
>>> view
<memory at 0x7f438cefe040>  # but doesn't generate a new bytearray by default
>>> bytearray(view)
bytearray(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ')  # It can still be converted, though.
>>> view[0]  # 'A'
65
>>> view[0] += 32  # Turns it lowercase
>>> bytearray(view)
bytearray(b'aBCDEFGHIJKLMNOPQRSTUVWXYZ')  # 'A' is now lowercase.
>>> bytearray(view[10:15])
bytearray(b'KLMNO')
>>> view[10:15] = bytearray(view[10:15]).lower()
>>> bytearray(view)
bytearray(b'aBCDEFGHIJklmnoPQRSTUVWXYZ')  # Modified 'KLMNO' in-place.
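The session above edits the view, but the claim that matters is that the original array changes too. A short sketch, with made-up variable names, that contrasts a sliced copy with a sliced view:

```python
array = bytearray(b'ABCDEF')

copied = array[0:3]              # slicing a bytearray makes a new bytearray
copied[0] = ord('x')             # changes only the copy

view = memoryview(array)[0:3]    # slicing a memoryview shares the same memory
view[1] = ord('y')               # writes through to `array`

print(copied)                    # the copy carries the 'x'
print(array)                     # the original carries the 'y', but not the 'x'
```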

`bin`, `hex`, `oct`, `ord`, `chr` and `ascii`: basic conversions

The three built-ins bin, hex and oct perform basic number base conversions:

>>> bin(42)
'0b101010'
>>> hex(42)
'0x2a'
>>> oct(42)
'0o52'
>>> 0b101010
42
>>> 0x2a
42
>>> 0o52
42

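This makes it easy to convert between decimal integers and their binary, octal and hexadecimal forms. Literals written in any of these bases are ordinary int objects: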

>>> type(0x20)
<class 'int'>
>>> type(0b101010)
<class 'int'>
>>> 0o100 == 64
True
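bin, hex and oct go from an integer to a string. For the opposite direction, from a string back to an integer, the built-in int accepts a base as its second argument. A brief sketch:

```python
# Parse strings written in a given base back into integers.
print(int('101010', 2))      # binary digits
print(int('2a', 16))         # hexadecimal digits
print(int('52', 8))          # octal digits

# The prefixes produced by bin/hex/oct are accepted when the base matches,
# and base 0 tells int() to infer the base from the prefix.
assert int(bin(42), 2) == int(hex(42), 16) == int(oct(42), 8) == 42
assert int('0x2a', 0) == 42
```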

Decimal is the easiest to read, but sometimes another base is the better choice, for example:

>>> bytes([255, 254])
b'\xff\xfe'              # Not very easy to comprehend
>>> # This can be written as:
>>> bytes([0xff, 0xfe])
b'\xff\xfe'              # An exact one-to-one translation

In the next example, the mode argument, which holds the file permission bits, is written in octal:

>>> import os
>>> os.open('file.txt', os.O_RDWR, mode=384)    # ??? what's 384
>>> # This can be written as:
>>> os.open('file.txt', os.O_RDWR, mode=0o600)  # mode is 600 -> read-write
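Octal fits permission bits well because each octal digit covers exactly one group of three bits (owner, group, others). As a small sketch using the standard stat module, the value 0o600 can be built from named constants and rendered the way a directory listing would show it:

```python
import stat

# 0o600: read and write for the owner, nothing for group and others.
owner_read_write = stat.S_IRUSR | stat.S_IWUSR

print(oct(owner_read_write))
print(oct(384))                               # the mysterious decimal value
print(stat.filemode(stat.S_IFREG | 0o600))    # permission string for a regular file
```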

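Note that bin returns a string in the form of a Python integer literal, including the 0b prefix. If you want only the binary digits, it is better to use Python's string formatting:

>>> f'{42:b}'
'101010'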
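The same format specification also controls the prefix and the padding. A few variations, shown as a sketch:

```python
n = 42

print(f'{n:b}')      # bare binary digits
print(f'{n:#b}')     # '#' adds the 0b prefix, matching bin(n)
print(f'{n:08b}')    # zero-padded to a width of 8
print(f'{n:#06x}')   # hexadecimal with prefix, padded to a total width of 6
```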

The built-ins ord and chr convert between characters (ASCII or any other Unicode character) and their code points:

>>> ord('x')
120
>>> chr(120)
'x'
>>> ord('🐍')
128013
>>> hex(ord('🐍'))
'0x1f40d'
>>> chr(0x1f40d)
'🐍'
>>> '\U0001f40d'  # The same value, as a unicode escape inside a string
'🐍'
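Two small uses of these functions, as a sketch. The last lines also show ascii, which is named in the heading of this section: it works like repr but escapes every non-ASCII character.

```python
# Build the lowercase alphabet from code points.
alphabet = ''.join(chr(code) for code in range(ord('a'), ord('z') + 1))
print(alphabet)

# ord and chr are inverses of each other.
assert all(chr(ord(ch)) == ch for ch in 'I♥🐍')

# ascii() is like repr(), but escapes every non-ASCII character.
print(ascii('I♥🐍'))
```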

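`format`: text formatting

The built-in format(value, spec) is another way of writing '{:spec}'.format(value): it formats a single value according to a format specification. It can turn values into strings in many ways, for example: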

>>> format(42, 'c')             # int to ascii
'*'
>>> format(604, 'f')            # int to float
'604.000000'
>>> format(357/18, '.2f')       # specify decimal precision
'19.83'
>>> format(604, 'x')            # int to hex
'25c'
>>> format(604, 'b')            # int to binary
'1001011100'
>>> format(604, '0>16b')        # binary with zero-padding
'0000001001011100'
>>> format('Python!', '🐍^15')  # centered aligned text
'🐍🐍🐍🐍Python!🐍🐍🐍🐍'
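The relationship between format, f-strings and str.format can be checked directly. In the sketch below, the Celsius class is hypothetical and exists only to show that format delegates to the object's __format__ method:

```python
value = 604

# Three spellings of the same formatting request.
assert format(value, 'x') == f'{value:x}' == '{:x}'.format(value) == '25c'

# format() delegates to the value's __format__ method.
assert value.__format__('0>16b') == format(value, '0>16b')


class Celsius:
    """Hypothetical class, used only to show a custom __format__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Reuse float formatting for the number, then append the unit.
        return format(self.degrees, spec) + ' °C'


print(format(Celsius(21.456), '.1f'))
```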

Formatted string output is covered in detail in the book 《Python 大学实用教程》, and a forthcoming book devotes a section to it. See '字符串格式化输出' (Formatted String Output), or visit http://www.itdiffer.com/self-learning.html.

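`any` and `all`

These two functions are very Pythonic. Used well, they make code shorter and more readable, and they capture the spirit of Python. For example:

Suppose you are writing an API that validates requests. It receives JSON data and must check that every item contains an id field, that the field is a string, and that it is exactly 20 characters long. A common way to write this is: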

def validate_responses(responses):
    for response in responses:
        # Make sure that `id` exists
        if 'id' not in response:
            return False
        # Make sure it is a string
        if not isinstance(response['id'], str):
            return False
        # Make sure it is 20 characters
        if len(response['id']) != 20:
            return False

    # If everything was True so far for every
    # response, then we can return True.
    return True

Rewritten with all, it becomes:

def validate_responses(responses):
    return all(
        'id' in response
        and isinstance(response['id'], str)
        and len(response['id']) == 20
        for response in responses
    )

The argument of all is an iterable whose items are tested for truthiness. If any item is false, all returns False; otherwise it returns True.
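A few edge cases make this rule concrete. The sketch below shows that the items need not be booleans and what happens with an empty iterable:

```python
# Items do not have to be booleans; only their truthiness matters.
print(all([1, 'a', [0]]))   # every item is truthy (a non-empty list is truthy)
print(all([1, '', [0]]))    # the empty string is falsy
print(any([0, '', None]))   # no item is truthy

# Empty iterables: all() is vacuously True, any() is False.
print(all([]), any([]))
```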

Here is another example, this time checking for palindromes:

def contains_palindrome(words):
    for word in words:
        if word == ''.join(reversed(word)):
            return True

    # Found no palindromes in the end
    return False

Compare that with:

def contains_palindrome(words):
    return any(word == ''.join(reversed(word)) for word in words)

Side note: list comprehensions inside any and all

Code that uses any or all can be written with a list comprehension:

>>> any([num == 0 for num in nums])

instead of a generator expression:

>>> any(num == 0 for num in nums)

The two differ significantly:

>>> any(num == 10 for num in range(100_000_000))
True
>>> any([num == 10 for num in range(100_000_000)])
True

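The second statement, which uses a list comprehension, not only stores 100 million values in a list for no reason before any even starts, it also takes more than 10 seconds on my machine. The first statement is a generator expression: it produces the comparison results for 0, 1, 2 and so on one at a time and hands them to any, and as soon as the count reaches 10, any stops iterating and returns True almost immediately. This also means that in this case it is roughly ten million times faster, since it evaluates 11 comparisons instead of 100 million.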
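The short-circuit can be observed without waiting ten seconds. In this sketch, counted is a made-up helper that records every item it hands out, and the range is shrunk to 1000 so that both versions finish instantly:

```python
def counted(iterable, seen):
    """Yield items from `iterable`, recording each one in the list `seen`."""
    for item in iterable:
        seen.append(item)
        yield item


seen_lazy = []
result_lazy = any(num == 10 for num in counted(range(1000), seen_lazy))

seen_eager = []
result_eager = any([num == 10 for num in counted(range(1000), seen_eager)])

print(result_lazy, len(seen_lazy))    # stops right after reaching 10
print(result_eager, len(seen_eager))  # the list is built in full first
```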

So, use generator expressions.

For more about generators, see 《Python 大学实用教程》 (电子工业出版社).

(End of side note)

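`abs`, `divmod`, `pow` and `round`: math basics

These four math functions are so common in programming that they are available directly as built-ins instead of living in the math module.

They are very simple:

abs returns the absolute value of a number, for example: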

>>> abs(42)
42
>>> abs(-3.14)
3.14
>>> abs(3-4j)
5.0

divmod returns the quotient and the remainder of a division:

>>> divmod(7, 2)
(3, 1)
>>> quotient, remainder = divmod(5327, 100)
>>> quotient
53
>>> remainder
27
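A typical use of divmod is splitting a quantity into units. The helper format_duration below is hypothetical and serves only as an illustration:

```python
def format_duration(total_seconds):
    """Hypothetical helper: split a number of seconds into h:mm:ss."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f'{hours}:{minutes:02}:{seconds:02}'


print(format_duration(3725))

# divmod(a, b) always agrees with (a // b, a % b), including for negatives,
# where the quotient is rounded down (floored).
assert divmod(-7, 2) == (-7 // 2, -7 % 2) == (-4, 1)
```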

pow returns a value raised to a power:

>>> pow(100, 3)
1000000
>>> pow(2, 10)
1024
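pow also accepts negative exponents and an optional third argument for modular exponentiation. A brief sketch (the modulus 1_000_007 is an arbitrary number picked for the example):

```python
# pow(base, exp) is the same as base ** exp.
assert pow(2, 10) == 2 ** 10 == 1024

# A negative exponent on integers gives a float.
print(pow(2, -2))

# With a third argument, pow computes (base ** exp) % mod efficiently,
# without building the huge intermediate number.
print(pow(2, 1000, 1_000_007))
assert pow(2, 1000, 1_000_007) == (2 ** 1000) % 1_000_007
```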

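round returns a number rounded to the given precision. Note that Python 3 rounds exact ties to the nearest even digit (so round(2.5) is 2) rather than always rounding halves up: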

>>> import math
>>> math.pi
3.141592653589793
>>> round(math.pi)
3
>>> round(math.pi, 4)
3.1416
>>> round(1728, -2)
1700
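Ties are where round can surprise. The sketch below shows the tie behaviour of Python 3, one well-known float representation effect, and one possible way to get "round half up" with the decimal module:

```python
from decimal import Decimal, ROUND_HALF_UP

# Exact ties go to the nearest even integer.
print(round(0.5), round(1.5), round(2.5), round(3.5))

# 2.675 cannot be stored exactly as a binary float; the stored value is
# slightly below 2.675, so it rounds down.
print(round(2.675, 2))

# One way to get schoolbook "round half up" is the decimal module.
print(Decimal('2.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP))
```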

`isinstance` and `issubclass`: type checking

The built-in type can be used to check the type of an object, like this:

def print_stuff(stuff):
    if type(stuff) is list:
        for item in stuff:
            print(item)
    else:
        print(stuff)

This function checks whether its argument is exactly of type list:

>>> print_stuff('foo')
foo
>>> print_stuff(123)
123
>>> print_stuff(['spam', 'eggs', 'steak'])
spam
eggs
steak

So far it seems to work, but there is actually a problem. Here is an example:

>>> class MyList(list):
...     pass
...
>>> items = MyList(['spam', 'eggs', 'steak'])
>>> items
['spam', 'eggs', 'steak']
>>> print_stuff(items)
['spam', 'eggs', 'steak']

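Of course, items is still a list, but print_stuff no longer recognizes it. The reason is simple: type(items) returns MyList, not list.

In other words, type does not take inheritance into account. isinstance does: it checks not only whether an object is an instance of a class, but also whether it is an instance of one of that class's subclasses: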

>>> class MyList(list):
...     pass
...
>>> items = ['spam', 'eggs', 'steak']
>>> type(items) is list
True
>>> isinstance(items, list)
True   # Both of these do the same thing

>>> items = MyList(['spam', 'eggs', 'steak'])
>>> type(items) is list
False  # And while `type` doesn't work,
>>> isinstance(items, list)
True   # `isinstance` works with subclasses too.

Similarly, issubclass checks whether one class is a subclass of another. The first argument of isinstance is an object, whereas the first argument of issubclass is a class:

>>> issubclass(MyList, list)
True
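A few more properties of the two functions, as a sketch. MyList is redefined here so that the block runs on its own:

```python
class MyList(list):
    pass


# A class counts as a subclass of itself.
assert issubclass(list, list)

# The relationship is not symmetric.
assert issubclass(MyList, list) and not issubclass(list, MyList)

# Both functions accept a tuple of classes and succeed if any one matches.
assert issubclass(MyList, (dict, list))
assert isinstance(MyList(), (tuple, list))

# bool is a subclass of int, so True passes an isinstance check against int.
assert issubclass(bool, int) and isinstance(True, int)
```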

So the type check in print_stuff should be replaced with isinstance. With that change made, let's keep testing:

>>> items = ('spam', 'eggs', 'steak')
>>> print_stuff(items)
('spam', 'eggs', 'steak')

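When the argument is not a list, as with the tuple here, the function prints the object as a whole instead of item by item. One fix is a multi-branch if statement that tests for each type in turn. With only built-in types that is still manageable, but with so many branches the code gets ugly.

For this purpose, Python provides abstract base classes, in collections.abc, that recognize the built-in types. They let you test for a particular behavior of a class rather than for the class itself. In our example the behavior is being a container of other objects, which is called Container: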

>>> from collections.abc import Container
>>> items = ('spam', 'eggs', 'steak')
>>> isinstance(items, tuple)
True
>>> isinstance(items, list)
False
>>> isinstance(items, Container)
True  # This works!

Every container type returns True when checked against the Container base class, and issubclass works too:

>>> from collections.abc import Container
>>> issubclass(list, Container)
True
>>> issubclass(tuple, Container)
True
>>> issubclass(set, Container)
True
>>> issubclass(dict, Container)
True

Adding it to the code gives:

from collections.abc import Container


def print_stuff(stuff):
    if isinstance(stuff, Container):
        for item in stuff:
            print(item)
    else:
        print(stuff)
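One thing to watch for with the Container check: str is a Container as well, so print_stuff('foo') would now print one character per line, and Container by itself only promises support for the in operator, not for iteration. The variant below is a sketch of one possible adjustment, not part of the original article. It checks Iterable instead and treats str and bytes as single values; the helper name items_to_print is made up for illustration.

```python
from collections.abc import Container, Iterable

# str is a Container too, so the Container-based version would print 'foo'
# one character at a time.
assert isinstance('foo', Container)


def items_to_print(stuff):
    """Return the pieces of `stuff` that should each get their own line."""
    # Iterable guarantees that a for loop works; str and bytes are
    # excluded so that text is still printed as a whole.
    if isinstance(stuff, Iterable) and not isinstance(stuff, (str, bytes)):
        return list(stuff)
    return [stuff]


def print_stuff(stuff):
    for item in items_to_print(stuff):
        print(item)


print_stuff('foo')
print_stuff(('spam', 'eggs', 'steak'))
```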

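One final caveat: in real-world programming, checking the types of arguments is discouraged. For the reasons, see the discussion of polymorphism in 《Python 大学实用教程》 (电子工业出版社).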
