This article looks at how Python handles slice subscripts that fall outside the bounds of a sequence. It works through example code and the CPython 2.7 source, which should be a useful reference for anyone studying the topic.
Preface
In Python, slicing is a frequently used syntax. Whether the object is a tuple, a list or a string, the general syntax is:
sequence[ilow:ihigh:step]  # ihigh and step may be omitted
For the sake of brevity and ease of understanding, the use of step is left out of the discussion for now.
Let's briefly demonstrate the usage:
sequence = [1,2,3,4,5]
sequence[ilow:ihigh]  # from ilow up to ihigh-1
sequence[ilow:]       # from ilow to the end
sequence[:ihigh]      # from the start up to ihigh-1
sequence[:]           # copy the whole list
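With concrete numbers in place of ilow and ihigh, the four forms above behave as follows. This is a small illustrative sketch: the values 1 and 3 are arbitrary choices that do not come from the original, and print() is used so that it also runs on Python 3.

```python
sequence = [1, 2, 3, 4, 5]

middle = sequence[1:3]   # indices 1 and 2
tail = sequence[1:]      # index 1 through the end
head = sequence[:3]      # indices 0, 1 and 2
copy = sequence[:]       # shallow copy of the whole list

print(middle)            # [2, 3]
print(tail)              # [2, 3, 4, 5]
print(head)              # [1, 2, 3]
print(copy)              # [1, 2, 3, 4, 5]
print(copy is sequence)  # False: slicing builds a new list object
```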
The syntax is concise and easy to understand, and it is simple to use in everyday code. Still, when we use slicing we tend to follow a couple of rules out of habit:
ilow and ihigh are both smaller than the length of the sequence
ilow < ihigh
In most cases, following these rules is the only way to get the result we expect. But what if we don't follow them? What happens to the slice?
Whether we are using a tuple, a list or a string, we fetch a single element with the following syntax:
sequence = [1,2,3,4,5]
print sequence[1]  # prints 2
print sequence[2]  # prints 3
Let's call the 1 and 2 above subscripts. Whether the object is a tuple, a list or a string, we can use a subscript to get the corresponding value, but if the subscript is beyond the length of the object, an index exception (IndexError) is raised:
sequence = [1,2,3,4,5]
print sequence[15]

### Output ###
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    print sequence[15]
IndexError: list index out of range
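The IndexError above can be caught like any other exception. The sketch below assumes a helper named safe_get, which is hypothetical and not part of Python or of the original article, and is written so that it also runs on Python 3.

```python
sequence = [1, 2, 3, 4, 5]

def safe_get(seq, index, default=None):
    """Hypothetical helper: return seq[index], or default if the index is out of range."""
    try:
        return seq[index]
    except IndexError:
        return default

in_range = safe_get(sequence, 1)
out_of_range = safe_get(sequence, 15)

print(in_range)      # 2
print(out_of_range)  # None
```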
So what about slicing? The two syntaxes look very similar. Suppose ilow and ihigh are 10 and 20 respectively: what is the result?
Reproducing the scenario
# version: python2.7
a = [1, 2, 3, 5]
print a[10:20]  # Will this raise an exception?

>>> a = [1, 2, 3, 5]
>>> print a[10:20]
[]

>>> s = '23123123123'
>>> s[400:2000]
''
>>> t = (1, 2, 3, 4)
>>> print t[200:1000]
()
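The same observation can be checked in one loop. This is a minimal sketch written for Python 3 (the article's session is Python 2.7); the names samples and results are chosen here for illustration only.

```python
samples = {
    "list": [1, 2, 3, 5],
    "str": "23123123123",
    "tuple": (1, 2, 3, 4),
}

results = {}
for name, seq in samples.items():
    # Both subscripts are far beyond the length of every sample.
    out_of_range = seq[400:2000]
    results[name] = out_of_range
    print(name, repr(out_of_range), type(out_of_range).__name__)
```

In each case the slice comes back as an empty object of the same type as the original sequence, and no exception is raised.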
Principle Analysis
Before revealing the answer, we first need to figure out how Python handles this slice. The dis module can help:
############# Slicing ################
[root@iZ23pynfq19Z ~]# cat test.py
a = [11,2,3,4]
print a[20:30]

# Result:
[root@iZ23pynfq19Z ~]# python -m dis test.py
  1           0 LOAD_CONST               0 (11)
              3 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (3)
              9 LOAD_CONST               3 (4)
             12 BUILD_LIST               4
             15 STORE_NAME               0 (a)

  2          18 LOAD_NAME                0 (a)
             21 LOAD_CONST               4 (20)
             24 LOAD_CONST               5 (30)
             27 SLICE+3
             28 PRINT_ITEM
             29 PRINT_NEWLINE
             30 LOAD_CONST               6 (None)
             33 RETURN_VALUE

############# Single-subscript lookup ################
[root@gitlab ~]# cat test2.py
a = [11,2,3,4]
print a[20]

# Result:
[root@gitlab ~]# python -m dis test2.py
  1           0 LOAD_CONST               0 (11)
              3 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (3)
              9 LOAD_CONST               3 (4)
             12 BUILD_LIST               4
             15 STORE_NAME               0 (a)

  2          18 LOAD_NAME                0 (a)
             21 LOAD_CONST               4 (20)
             24 BINARY_SUBSCR
             25 PRINT_ITEM
             26 PRINT_NEWLINE
             27 LOAD_CONST               5 (None)
             30 RETURN_VALUE
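Python compiles source code to bytecode before executing it, and the dis module displays that bytecode in a readable form, allowing us to see the execution process. The output columns of dis are, from left to right: the source line number, the bytecode offset of the instruction, the opcode name, the opcode's argument, and (in parentheses) the value that argument resolves to.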
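A listing of this kind can also be produced from inside Python rather than with python -m dis. The sketch below is written for Python 3, where the opcode names differ from the 2.7 listings (the SLICE+N opcodes exist only in Python 2), so it makes no claim about which opcodes appear. The helper name disassemble is hypothetical.

```python
import dis
import io

def disassemble(source):
    """Hypothetical helper: compile source and return its dis listing as a string."""
    code = compile(source, "<example>", "exec")
    buf = io.StringIO()
    dis.dis(code, file=buf)
    return buf.getvalue()

slice_listing = disassemble("a = [11, 2, 3, 4]\nb = a[20:30]\n")
index_listing = disassemble("a = [11, 2, 3, 4]\nb = a[20]\n")

print(slice_listing)
print(index_listing)
```

The exact instructions printed depend on the interpreter version, which is why the article pins its analysis to python2.7.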
Comparing the dis output for test.py and test2.py, the main difference is this: the slice in test.py is implemented by the bytecode SLICE+3, while the single-subscript lookup in test2.py is implemented by the bytecode BINARY_SUBSCR. As we guessed, the similar-looking syntax compiles to completely different code. Because our topic is slicing (SLICE+3), we will not expand on BINARY_SUBSCR; interested readers can check the relevant source code for its implementation. Location: Python/ceval.c. Next, let's discuss SLICE+3.
/* Taken from: python2.7 Python/ceval.c */

// Step 1:
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
    .... // n lines omitted
    TARGET_WITH_IMPL_NOARG(SLICE, _slice)
    TARGET_WITH_IMPL_NOARG(SLICE_1, _slice)
    TARGET_WITH_IMPL_NOARG(SLICE_2, _slice)
    TARGET_WITH_IMPL_NOARG(SLICE_3, _slice)
    _slice:
    {
        if ((opcode-SLICE) & 2)
            w = POP();
        else
            w = NULL;
        if ((opcode-SLICE) & 1)
            v = POP();
        else
            v = NULL;
        u = TOP();
        x = apply_slice(u, v, w); // pop v (ilow) and w (ihigh), then call apply_slice
        Py_DECREF(u);
        Py_XDECREF(v);
        Py_XDECREF(w);
        SET_TOP(x);
        if (x != NULL) DISPATCH();
        break;
    }
    .... // n lines omitted
}

// Step 2:
apply_slice(PyObject *u, PyObject *v, PyObject *w) /* return u[v:w] */
{
    PyTypeObject *tp = u->ob_type;
    PySequenceMethods *sq = tp->tp_as_sequence;

    if (sq && sq->sq_slice && ISINDEX(v) && ISINDEX(w)) { // type check on v and w: they must be int/long objects
        Py_ssize_t ilow = 0, ihigh = PY_SSIZE_T_MAX;
        if (!_PyEval_SliceIndex(v, &ilow)) // check v again and store its converted value in ilow
            return NULL;
        if (!_PyEval_SliceIndex(w, &ihigh)) // same for w and ihigh
            return NULL;
        return PySequence_GetSlice(u, ilow, ihigh); // dispatch to the slice function of object u
    }
    else {
        PyObject *slice = PySlice_New(v, w, NULL);
        if (slice != NULL) {
            PyObject *res = PyObject_GetItem(u, slice);
            Py_DECREF(slice);
            return res;
        }
        else
            return NULL;
    }
}

// Step 3 (Objects/abstract.c):
PySequence_GetSlice(PyObject *s, Py_ssize_t i1, Py_ssize_t i2)
{
    PySequenceMethods *m;
    PyMappingMethods *mp;

    if (!s) return null_error();

    m = s->ob_type->tp_as_sequence;
    if (m && m->sq_slice) {
        if (i1 < 0 || i2 < 0) {
            if (m->sq_length) {
                // Simple normalization first: if the left or right subscript
                // is negative, add the length of the sequence to it
                Py_ssize_t l = (*m->sq_length)(s);
                if (l < 0)
                    return NULL;
                if (i1 < 0)
                    i1 += l;
                if (i2 < 0)
                    i2 += l;
            }
        }
        // This is where the object's sq_slice function is actually called
        // to perform the slice
        return m->sq_slice(s, i1, i2);
    } else if ((mp = s->ob_type->tp_as_mapping) && mp->mp_subscript) {
        PyObject *res;
        PyObject *slice = _PySlice_FromIndices(i1, i2);
        if (!slice)
            return NULL;
        res = mp->mp_subscript(s, slice);
        Py_DECREF(slice);
        return res;
    }

    return type_error("'%.200s' object is unsliceable", s);
}
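The negative-subscript step in PySequence_GetSlice can be modelled in a few lines of Python. This is only a sketch of the C logic quoted above, not CPython's own code; the function name adjust_negative_indices is hypothetical.

```python
def adjust_negative_indices(i1, i2, length):
    """Hypothetical model of the negative-subscript step in PySequence_GetSlice."""
    if i1 < 0:
        i1 += length   # e.g. -3 with length 5 becomes 2
    if i2 < 0:
        i2 += length
    return i1, i2

print(adjust_negative_indices(-3, -1, 5))   # (2, 4)
print(adjust_negative_indices(-10, 2, 4))   # (-6, 2)
print(adjust_negative_indices(1, 30, 4))    # (1, 30)
```

Note that this step does not clamp anything: a subscript that is still negative after the adjustment, or one that is larger than the length, is passed on unchanged to sq_slice.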
The slice is ultimately performed by sq_slice, but sq_slice is a bit special, because different object types supply different functions for it. The corresponding functions are:
// String object
StringObject.c: (ssizessizeargfunc)string_slice, /*sq_slice*/
// List object
ListObject.c: (ssizessizeargfunc)list_slice, /* sq_slice */
// Tuple object
TupleObject.c: (ssizessizeargfunc)tupleslice, /* sq_slice */
/* Taken from ListObject.c */
static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
    PyListObject *np;
    PyObject **src, **dest;
    Py_ssize_t i, len;

    if (ilow < 0)
        ilow = 0;
    else if (ilow > Py_SIZE(a)) // if ilow is greater than the length of a, reset it to the length of a
        ilow = Py_SIZE(a);
    if (ihigh < ilow)
        ihigh = ilow;
    else if (ihigh > Py_SIZE(a)) // if ihigh is greater than the length of a, reset it to the length of a
        ihigh = Py_SIZE(a);

    len = ihigh - ilow;
    np = (PyListObject *) PyList_New(len); // create a new list object of length ihigh - ilow
    if (np == NULL)
        return NULL;

    src = a->ob_item + ilow;
    dest = np->ob_item;
    for (i = 0; i < len; i++) { // copy the members of a in that range into the new list object
        PyObject *v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    return (PyObject *)np;
}
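The clamping in list_slice translates almost line for line into Python. The sketch below is a pure-Python model of the C function above, not CPython's own code, and list_slice_model is a hypothetical name. It is cross-checked against built-in slicing for non-negative subscripts only, since negative ones are adjusted earlier, in PySequence_GetSlice.

```python
def list_slice_model(a, ilow, ihigh):
    """Hypothetical pure-Python model of the clamping done by list_slice."""
    size = len(a)
    if ilow < 0:
        ilow = 0
    elif ilow > size:
        ilow = size        # left subscript past the end: clamp to the length
    if ihigh < ilow:
        ihigh = ilow       # right subscript before the left one: empty range
    elif ihigh > size:
        ihigh = size       # right subscript past the end: clamp to the length
    # Copy the elements in [ilow, ihigh) into a new list.
    return [a[i] for i in range(ilow, ihigh)]

a = [1, 2, 3, 5]
print(list_slice_model(a, 10, 20))  # []
print(list_slice_model(a, 2, 20))   # [3, 5]
print(list_slice_model(a, 3, 1))    # []

# Cross-check against built-in slicing for non-negative subscripts.
for lo in range(0, 8):
    for hi in range(0, 8):
        assert list_slice_model(a, lo, hi) == a[lo:hi]
```

The third call also shows what happens when the habit ilow < ihigh is broken: ihigh is pulled up to ilow, so the result is an empty list rather than an error.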
Conclusion
As the slicing function behind sq_slice shows, when the left or right subscript of a slice is greater than the length of the sequence, it is reset to the length of the sequence. So our initial slice:
print a[10:20]
actually runs as:
print a[4:4]
After this analysis, a slice whose subscripts are greater than the length of the object should no longer be confusing.
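The same clamping can be observed from Python itself with the built-in slice.indices method, which returns the start, stop and step that a slice resolves to for a sequence of a given length. This is a small check written for Python 3, not part of the original analysis.

```python
a = [1, 2, 3, 5]

# Resolve the slice 10:20 against a sequence of length 4.
start, stop, step = slice(10, 20).indices(len(a))

print(start, stop, step)          # 4 4 1
print(a[10:20] == a[start:stop])  # True
```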