Understanding Python Through Its Built-in Objects (Part 7)

2021-11-03

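`sorted` and `reversed`: working with sequences

Sorting and reversing a sequence of data are probably the most commonly used algorithmic operations in any programming language, and Python's built-in functions sorted and reversed provide both.

sorted sorts the incoming data and returns a new, sorted list object.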

>>> items = (3, 4, 1, 2)
>>> sorted(items)
[1, 2, 3, 4]

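It uses the "TimSort" algorithm, invented by Tim Peters, one of the earliest Python wizards.

sorted takes two other parameters: reverse and key. With reverse=True, the data is sorted in descending order. key accepts a function that is called on every element, so the data is sorted by a custom property of each item. Take a look at the following code: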

>>> items = [
...   {'value': 3},
...   {'value': 1},
...   {'value': 2},
... ]
>>> sorted(items, key=lambda d: d['value'])
[{'value': 1}, {'value': 2}, {'value': 3}]
>>> names = ['James', 'Kyle', 'Max']
>>> sorted(names, key=len)  # Sorts by name length
['Max', 'Kyle', 'James']

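Also note that, although list.sort() is already a way to sort a list, the .sort() method exists only on lists (and sorts them in place), whereas sorted accepts any iterable and returns a new list.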
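
As a quick sketch of the reverse flag, and of the fact that sorted accepts any iterable, the snippet below uses made-up sample data; the names scores, ranking and so on are purely illustrative.

```python
# Sample data invented for illustration.
scores = {'alice': 82, 'bob': 95, 'carol': 78}

# reverse=True sorts in descending order.
descending = sorted(scores.values(), reverse=True)

# key and reverse can be combined: names ordered by score, highest first.
ranking = sorted(scores, key=lambda name: scores[name], reverse=True)

# sorted() accepts any iterable, e.g. a set or a generator expression,
# and always returns a new list.
unique_sorted = sorted({3, 1, 2})
squares_sorted = sorted(n * n for n in (-3, 1, -2))

print(descending)      # [95, 82, 78]
print(ranking)         # ['bob', 'alice', 'carol']
print(unique_sorted)   # [1, 2, 3]
print(squares_sorted)  # [1, 4, 9]
```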

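reversed takes any sequence type and returns an iterator that yields the members of the original object in reverse order.

Returning an iterator is nice, because it means that reversing certain objects, such as a range or a list, takes no extra memory at all: the reversed values can be produced one at a time.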

>>> items = [1, 2, 3]
>>> x = reversed(items)
>>> x
<list_reverseiterator object at 0x7f1c3ebe07f0>
>>> next(x)
3
>>> next(x)
2
>>> next(x)
1
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> for i in reversed(items):
...     print(i)
...
3
2
1
>>> list(reversed(items))
[3, 2, 1]
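
To make the memory point a little more concrete, here is a small sketch. It assumes nothing beyond the standard built-ins: a large range is reversed without ever being materialised, and a plain generator is shown to be non-reversible because it is not a sequence.

```python
# reversed() on a range produces values one at a time,
# without building the full sequence in memory first.
countdown = reversed(range(1, 1_000_000))
first_three = [next(countdown) for _ in range(3)]
print(first_three)  # [999999, 999998, 999997]

# reversed() needs a sequence (or an object defining __reversed__);
# a plain generator cannot be reversed.
try:
    reversed(n for n in range(3))
    error = None
except TypeError as exc:
    error = type(exc).__name__
print(error)  # TypeError
```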

`map` and `filter`

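In Python everything may be an object, but that doesn't necessarily mean Python code has to be object-oriented. In fact, you can write very readable functional code in Python.

If you don't know what a functional language or functional code is, the idea is this: all functionality is provided through functions. There is no formal notion of classes, objects, inheritance and the like. In essence, every program just manipulates pieces of data by passing them to functions and getting the modified values returned to you.

Two very common concepts in functional programming are map and filter, and Python provides built-in functions for both:

map is a "higher-order function", that is, a function that takes another function as an argument. For example: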

>>> def square(x):
...     return x * x
...
>>> numbers = [8, 4, 6, 5]
>>> list(map(square, numbers))
[64, 16, 36, 25]
>>> for squared in map(square, numbers):
...     print(squared)
...
64
16
36
25

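map takes two arguments: a function and an iterable. It runs the function with each element as input and produces the outputs in order. Here, list(map(square, numbers)) takes each number and returns a list of squares.

Note that I had to write list(map(square, numbers)), because map itself returns a lazy iterator (a map object), not a list. The values are mapped one at a time as they are requested: if you loop over a map object, it runs the mapping function on each item of the sequence one by one. This means map never stores a complete list of mapped values and never wastes time computing values that aren't needed.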
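
The lazy behaviour can be observed directly. In the sketch below, the calls list is a hypothetical addition used only to record when square actually runs.

```python
calls = []

def square(x):
    calls.append(x)  # record each time the function actually runs
    return x * x

mapped = map(square, [8, 4, 6, 5])
print(calls)         # [] -- nothing has been computed yet

first = next(mapped)
print(first, calls)  # 64 [8]

rest = list(mapped)  # consumes the remaining items
print(rest, calls)   # [16, 36, 25] [8, 4, 6, 5]
```

Once a map object has been consumed it stays empty, so a second list(mapped) would return [].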

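filter is very similar to map, except that instead of mapping every value to a new value, it filters a sequence of values based on a condition.

This means the output of filter contains the same items as the input, except that some may be dropped. The example below filters out the odd numbers.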

>>> items = [13, 10, 25, 8]  
>>> evens = list(filter(lambda num: num % 2 == 0, items))  
>>> evens  
[10, 8]

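Some of you may have realised that these functions do essentially the same thing as list comprehensions, and you'd be right!

List comprehensions can do basically all of the above, and they tend to be more readable.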

>>> def square(x):
...     return x * x
...
>>> numbers = [8, 4, 6, 5]
>>> [square(num) for num in numbers]
[64, 16, 36, 25]

>>> items = [13, 10, 25, 8]
>>> evens = [num for num in items if num % 2 == 0]
>>> evens
[10, 8]
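
When both steps are needed at once, the difference in readability becomes more visible. The following is a sketch with illustrative variable names; the generator expression at the end is one way to keep the lazy behaviour of map and filter.

```python
numbers = [13, 10, 25, 8]

# filter + map chained: keep the even numbers, then square them.
with_builtins = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))

# The same thing as a single list comprehension.
with_comprehension = [n * n for n in numbers if n % 2 == 0]

# A generator expression is evaluated lazily, like map and filter.
lazy = (n * n for n in numbers if n % 2 == 0)

print(with_builtins)       # [100, 64]
print(with_comprehension)  # [100, 64]
print(next(lazy))          # 100
```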

Which one to use in practice is up to you.

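`len`, `max`, `min` and `sum`: aggregate values

Python has a few aggregate functions, that is, functions that combine a collection of values into a single result. They are very simple to use, so one example will do: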

>>> numbers = [30, 10, 20, 40]
>>> len(numbers)
4
>>> max(numbers)
40
>>> min(numbers)
10
>>> sum(numbers)
100

Three of these functions (len, max and min) accept any container type, such as sets, dictionaries and even strings:

>>> author = 'guidovanrossum'
>>> len(author)
14
>>> max(author)
'v'
>>> min(author)
'a'
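
max and min also accept the key and default keyword arguments, which is worth a short sketch. The sample words below are made up for illustration.

```python
words = ['guido', 'van', 'rossum']

longest = max(words, key=len)    # compare by length, not alphabetically
shortest = min(words, key=len)

# default= avoids a ValueError when the iterable is empty.
largest_or_none = max([], default=None)

# len() also works on dictionaries and sets (counting keys / members).
sizes = (len({'a': 1, 'b': 2}), len({1, 2, 2, 3}))

print(longest, shortest)  # rossum van
print(largest_or_none)    # None
print(sizes)              # (2, 3)
```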

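sum, on the other hand, needs a container whose members are numbers, which means the following works:

>>> sum(b'guidovanrossum')
1542

I'll leave it to you to figure out what is going on here.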

`iter` and `next`: iteration

iter and next define the mechanism by which a for loop works.

A for loop looks like this:

for item in mylist:
    print(item)

Under the hood, it actually does this:

mylist_iterable = iter(mylist)
while True:   
    try:        
        item = next(mylist_iterable)       
        print(item)    
    except StopIteration:      
        break

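A for loop in Python is a cleverly disguised while loop. When you iterate over a list, or any other data type that supports iteration, it simply means that the object understands the iter function and returns an "iterator" object.

Iterator objects in Python do two things:

They yield a new value each time they are passed to next.
They raise the built-in StopIteration exception when they run out of values.

That is how for loops work.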
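
To see both halves of the protocol in one place, here is a minimal sketch of a hand-written iterator. The Countdown class is hypothetical and exists only for this illustration.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from iter().
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the values have run out
        value = self.current
        self.current -= 1
        return value


collected = []
for number in Countdown(3):
    collected.append(number)

print(collected)  # [3, 2, 1]
```

The for loop never sees the StopIteration: it catches the exception internally and simply ends.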

By the way, generators follow the iterator protocol too:

>>> gen = (x**2 for x in range(1, 4))
>>> next(gen)
1
>>> next(gen)
4
>>> next(gen)
9
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

`range`, `enumerate` and `zip`: simpler iteration

range takes up to 3 arguments and returns an iterable of integer values:

>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(3, 8))
[3, 4, 5, 6, 7]
>>> list(range(1, 10, 2))
[1, 3, 5, 7, 9]
>>> list(range(10, 1, -2))
[10, 8, 6, 4, 2]
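
A range object is more than a plain iterable: it computes its values on demand and also supports len, indexing, slicing and membership tests. A small sketch, with an arbitrarily chosen large range:

```python
big = range(0, 10**9, 5)   # no huge list is created here

print(len(big))            # 200000000
print(big[3])              # 15
print(big[-1])             # 999999995
print(10**6 in big)        # True
print(list(big[:3]))       # [0, 5, 10]
```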

enumerate is very useful when you need both the index and the value of the elements in a list.

Don't do this:

>>> menu = ['eggs', 'spam', 'bacon']
>>> for i in range(len(menu)):
...     print(f'{i+1}: {menu[i]}')
...
1: eggs
2: spam
3: bacon

Instead, you can do this:

>>> menu = ['eggs', 'spam', 'bacon']
>>> for index, item in enumerate(menu, start=1):
...     print(f'{index}: {item}')
...
1: eggs
2: spam
3: bacon

Similarly, zip is used to get values at matching indices from multiple iterables.

Don't do this:

>>> students = ['Jared', 'Brock', 'Jack']
>>> marks = [65, 74, 81]
>>> for i in range(len(students)):
...     print(f'{students[i]} got {marks[i]} marks')
...
Jared got 65 marks
Brock got 74 marks
Jack got 81 marks

You can do this instead:

>>> students = ['Jared', 'Brock', 'Jack']
>>> marks = [65, 74, 81]
>>> for student, mark in zip(students, marks):
...     print(f'{student} got {mark} marks')
...
Jared got 65 marks
Brock got 74 marks
Jack got 81 marks

Both functions can greatly simplify iteration code.
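
Two details are easy to miss: zip stops at the shortest input, and enumerate and zip can be combined. The sketch below deliberately leaves out one mark to show the first point, and uses itertools.zip_longest from the standard library as one possible alternative.

```python
from itertools import zip_longest

students = ['Jared', 'Brock', 'Jack']
marks = [65, 74]  # one mark is missing on purpose

# zip() stops at the shortest input, so 'Jack' is silently dropped.
pairs = list(zip(students, marks))

# zip_longest() pads the shorter input instead.
padded = list(zip_longest(students, marks, fillvalue=None))

# enumerate() and zip() compose naturally.
numbered = [f'{i}. {s}: {m}' for i, (s, m) in enumerate(zip(students, marks), start=1)]

print(pairs)     # [('Jared', 65), ('Brock', 74)]
print(padded)    # [('Jared', 65), ('Brock', 74), ('Jack', None)]
print(numbered)  # ['1. Jared: 65', '2. Brock: 74']
```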

`slice`

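When you slice a Python sequence, a slice object is what gets used behind the scenes.

For example, in my_list[1:3], the special part is not [1:3], only 1:3. The square brackets are still just indexing the list! The 1:3 inside those brackets is what actually creates a slice object.

That is why my_list[1:3] is actually equivalent to my_list[slice(1, 3)]: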

>>> my_list = [10, 20, 30, 40]
>>> my_list[1:3]
[20, 30]
>>> my_list[slice(1, 3)]
[20, 30]
>>> nums = list(range(10))
>>> nums
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> nums[1::2]
[1, 3, 5, 7, 9]
>>> s = slice(1, None, 2)  # Equivalent to `[1::2]`
>>> s
slice(1, None, 2)
>>> nums[s]
[1, 3, 5, 7, 9]
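
One way to watch the slice object being created is to index an object whose __getitem__ simply returns its argument. The Inspector class below is a hypothetical helper written for this sketch.

```python
class Inspector:
    """Hypothetical helper that returns whatever key it is indexed with."""

    def __getitem__(self, key):
        return key


probe = Inspector()
print(probe[1:3])    # slice(1, 3, None)
print(probe[::2])    # slice(None, None, 2)
print(probe[5])      # 5

# slice.indices(length) resolves None and negative values for a given length.
print(slice(1, None, 2).indices(10))  # (1, 10, 2)
print(slice(-3, None).indices(10))    # (7, 10, 1)
```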

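`breakpoint`: built-in debugging

breakpoint is a built-in function that first appeared in Python 3.7. As the name suggests, it sets a breakpoint, and it is a simple way to start debugging. In essence, by default it calls set_trace() from the pdb module, Python's built-in debugger module.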
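
Because breakpoint() goes through sys.breakpointhook(), the hook can be swapped out. The sketch below replaces it with a hypothetical recording_hook so that the example runs without opening an interactive debugger; in normal use you would simply call breakpoint() and land in pdb.

```python
import sys

hits = []

def recording_hook(*args, **kwargs):
    # Stand-in for the debugger: just note that breakpoint() was reached.
    hits.append('breakpoint reached')

original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint()          # calls sys.breakpointhook()
finally:
    sys.breakpointhook = original_hook  # restore the previous hook

print(hits)  # ['breakpoint reached']
```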

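For more detail on debugging, see the relevant chapters at http://www.itdiffer.com/self-learning.html.

`open`: reading and writing files

open is the function for reading and writing files. It is simple to use, so I won't go into detail here; see the relevant chapters at http://www.itdiffer.com/self-learning.html.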
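
For readers who want a quick reminder anyway, here is a minimal sketch of the three most common modes. The file name notes.txt is made up, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

# Write to a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')

    # 'w' creates (or truncates) the file; the with-block closes it.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\n')

    # 'a' appends to the end of an existing file.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('second line\n')

    # 'r' (the default mode) reads; iterating a file yields lines.
    with open(path, encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)  # ['first line', 'second line']
```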

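`repr`: for the developer's convenience

repr creates a useful string representation of an object, one that ideally describes the object and its current state concisely. The goal is to be able to debug simple problems just by looking at an object's repr, without having to probe its attributes at every step.

Here is a good example: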

>>> class Vector:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> v = Vector(3, 5)
>>> v
<__main__.Vector object at 0x7f27dff5a1f0>

The default repr is not helpful at all. You'd have to manually check its attributes:

>>> dir(v)
['__class__', ... , 'x', 'y']
>>> v.x
3
>>> v.y
5

But if you implement __repr__ in the class:

>>> class Vector:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...     def __repr__(self):
...         return f'Vector(x={self.x}, y={self.y})'
...
>>> v = Vector(3, 5)
>>> v
Vector(x=3, y=5)

Now there is no need to wonder what this object contains. It's right in front of you!
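
A __repr__ is picked up in more places than the interactive prompt. The sketch below is one possible variant of the Vector class; the !r conversion inside the f-string is my addition, chosen so that string attributes would keep their quotes.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # !r applies repr() to each attribute, so strings keep their quotes.
        return f'Vector(x={self.x!r}, y={self.y!r})'


v = Vector(3, 5)

print(repr(v))        # Vector(x=3, y=5)
print(str(v))         # Vector(x=3, y=5)  (str falls back to __repr__)
print(f'{v!r}')       # Vector(x=3, y=5)
print([v, v])         # [Vector(x=3, y=5), Vector(x=3, y=5)]
print(repr(Vector('3', 5)))  # Vector(x='3', y=5)
```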

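`help`, `exit` and `quit`: built-ins from the site module

Now, these built-ins aren't "real" built-ins. That is, they aren't defined in the builtins module. They are defined in the site module and injected into builtins when the site module runs.

The site module runs automatically by default when Python starts. It is responsible for setting up a few useful things, including making packages installed with pip available to import statements, and setting up tab completion in interactive mode.

Another thing it does is set up these useful "built-in functions":

help looks up documentation for modules and objects. Roughly equivalent to calling pydoc.doc().
exit and quit exit the Python process. Roughly equivalent to calling sys.exit().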
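
A rough way to check this for yourself is sketched below. It assumes the interpreter was started normally; under python -S the site module is skipped and the three names would be missing, which is why getattr is given a fallback. The exact type names printed may vary between environments, so no output is shown for them.

```python
import builtins
import sys

# With the site module loaded (the default), these names exist in builtins.
# Under `python -S` they would be missing, hence the getattr fallback.
for name in ('help', 'exit', 'quit'):
    obj = getattr(builtins, name, None)
    print(name, type(obj).__name__)

# sys.exit() works by raising SystemExit, which can be observed directly.
try:
    sys.exit(3)
except SystemExit as exc:
    exit_code = exc.code

print(exit_code)  # 3
```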

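`copyright`, `credits`, `license`: important texts

The site module also defines these three objects for displaying text. Entering them in interactive mode prints the corresponding text, for example: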

>>> license()
A. HISTORY OF THE SOFTWARE
==========================

Python was created in the early 1990s by Guido van Rossum at Stichting
Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands
as a successor of a language called ABC.  Guido remains Python's
principal author, although it includes many contributions from others.

In 1995, Guido continued his work on Python at the Corporation for
National Research Initiatives (CNRI, see http://www.cnri.reston.va.us)
in Reston, Virginia where he released several versions of the
software.

In May 2000, Guido and the Python core development team moved to
BeOpen.com to form the BeOpen PythonLabs team.  In October of the same
year, the PythonLabs team moved to Digital Creations, which became
Zope Corporation.  In 2001, the Python Software Foundation (PSF, see
https://www.python.org/psf/) was formed, a non-profit organization
created specifically to own Python-related Intellectual Property.
Zope Corporation was a sponsoring member of the PSF.

All Python releases are Open Source (see http://www.opensource.org for
the Open Source Definition).  Historically, most, but not all, Python
Hit Return for more, or q (and Return) to quit: q

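Closing remarks

That wraps up this article. Understanding the fundamentals of Python is the key to learning and using the language. This material is meant as further reading for those who already know the basics. You could say the fundamentals aren't all that basic, because the principles behind them are what Python is really built on.

The www.itdiffer.com site has many more articles on advancing your skills, for readers' reference.

References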

https://sadh.life/post/builtins/
