Python has a module called collections, which provides specialized container data types. One of its most frequently used members is collections.defaultdict, and that is what this post is mainly about.
Overview:
defaultdict(default_factory) constructs a dictionary-like object. You still assign keys and values yourself, but a missing key gets a default value produced by calling default_factory. For example, defaultdict(int) creates a dictionary-like object in which d[key] has a value even for a key that does not exist yet. That value is whatever int() returns, which is 0.
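A minimal sketch of that behaviour, using only the standard library (the key name "missing" is just an illustration):

```python
from collections import defaultdict

d = defaultdict(int)      # int is the default_factory
value = d["missing"]      # key does not exist yet, so int() supplies 0
print(value)              # 0
print(dict(d))            # {'missing': 0}: the key was inserted as a side effect
```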
defaultdict
dict subclass that calls a factory function to supply missing values.
That is the short explanation:
defaultdict is a subclass of the built-in dict class, and it calls a factory function to supply missing values.
Confusing. What is a factory function?
Here is the explanation from Core Python Programming:
Python 2.2 unified types and classes, and all built-in types are now classes. On this basis, the original so-called built-in conversion functions like int(), type(), list(), etc. are now factory functions. That is to say, although they look a bit like functions, they are actually classes. When you call them, you actually generate an instance of that type, just like a factory producing goods.
The following familiar factory functions were called built-in functions in older Python versions:
int(), long(), float(), complex()
str(), unicode(), basestring()
list(), tuple()
type()
Other types that did not have factory functions before now also have factory functions. In addition, corresponding factory functions have also been added for new data types that support new-style classes. These factory functions are listed below:
dict()
bool()
set(), frozenset()
object()
classmethod()
staticmethod()
super()
property()
file()
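The claim that these built-ins are really classes can be checked in an interpreter. The sketch below assumes Python 3 and only uses names that still exist there (long, unicode, basestring and file from the lists above are Python 2 names):

```python
# Each of these built-ins is a class, so it is an instance of type.
factories = [int, float, str, list, dict, set]
all_classes = all(isinstance(f, type) for f in factories)
print(all_classes)        # True

# Calling a class with no arguments builds an "empty" instance,
# which is exactly what defaultdict does with its factory.
empty_values = [f() for f in factories]
print(empty_values)       # [0, 0.0, '', [], {}, set()]
```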
Let’s look at its use again:
import collections

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

# Group the values by key; a missing key starts out as an empty list
d = collections.defaultdict(list)
for k, v in s:
    d[k].append(v)

list(d.items())
It's starting to make sense here. It turns out that defaultdict accepts the built-in list as its argument. list() looks like a built-in function, but since types and classes were unified, list is really a class, and calling list() creates a new instance of that class (an empty list).
Still not entirely clear, so let's look at the help text for defaultdict again:
class collections.defaultdict([default_factory[, ...]])
Returns a new dictionary-like object. defaultdict is a subclass of the built-in dict class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for the dict class and is not documented here.
First of all, collections.defaultdict returns a dictionary-like object. Note that it is dict-like, not a plain dict. The defaultdict class is almost the same as the dict class, except that it overrides one method and adds one writable instance variable. (What a writable instance variable is, I still don't get at this point.)
The first argument provides the initial value for the default_factory attribute; it defaults to None. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.
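A small sketch of how the constructor arguments are split, assuming nothing beyond the standard library (the keys "a", "b" and "c" are made up for illustration):

```python
from collections import defaultdict

# First argument: default_factory. Everything else is handled like dict(...).
d = defaultdict(list, {"a": [1]}, b=[2])
print(d["a"], d["b"])         # [1] [2]: entries passed to the constructor
print(d["c"])                 # []: missing key, so list() is called

# With no arguments, default_factory is None.
plain = defaultdict()
print(plain.default_factory)  # None
```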
defaultdict objects support the following method in addition to the standard dict operations:
__missing__(key)
If the default_factory attribute is None, this raises a KeyError exception with the key as argument.
If default_factory is not None, it is called without arguments to provide a default value for the given key, this value is inserted in the dictionary for the key, and returned.
The key point is this: if default_factory is not None, it is called with no arguments to produce a default value for the key passed to __missing__. That value is inserted into the dictionary as the value for that key, and then returned.
A bit dizzying. So there is a __missing__ method, and it is a method that collections.defaultdict itself defines.
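To make the mechanism less abstract, here is a rough imitation written as a dict subclass. MyDefaultDict is a hypothetical name for illustration only; it is a simplified sketch, not how the real defaultdict is implemented:

```python
class MyDefaultDict(dict):
    """Hypothetical, simplified imitation of defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict's d[key] lookup calls this hook only when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()   # called with no arguments
        self[key] = value                # stored as the value for the key
        return value


m = MyDefaultDict(list)
m["x"].append(1)     # "x" is missing, so __missing__ inserts [] first
print(m)             # {'x': [1]}
```

The real class behaves the same way from the outside: the factory only runs when d[key] fails to find the key.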
If calling default_factory raises an exception this exception is propagated unchanged.
This method is called by the __getitem__() method of the dict class when the requested key is not found; whatever it returns or raises is then returned or raised by __getitem__().
Note that __missing__() is not called for any operations besides __getitem__(). This means that get() will, like normal dictionaries, return None as a default rather than using default_factory.
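The difference between d[key] and get() is easy to miss, so here is a short sketch of it (the key "a" is arbitrary):

```python
from collections import defaultdict

d = defaultdict(list)
got = d.get("a")
print(got)           # None: get() does not use default_factory
print("a" in d)      # False: nothing was inserted
print(d["a"])        # []: only d[key] triggers __missing__
print("a" in d)      # True: the lookup above inserted the key
```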
defaultdict objects support the following instance variable:
default_factory
This attribute is used by the __missing__() method; it is initialized from the first argument to the constructor, if present, or to None, if absent.
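"Writable" here simply means the attribute can be reassigned after the object is created. A brief sketch, with arbitrary keys:

```python
from collections import defaultdict

d = defaultdict(list)
d["a"].append(1)

d.default_factory = int       # swap the factory on an existing object
print(d["b"])                 # 0

d.default_factory = None      # turn defaults off again
try:
    d["c"]
except KeyError as exc:
    print("KeyError:", exc)   # KeyError: 'c'

print(dict(d))                # {'a': [1], 'b': 0}
```

Setting default_factory to None is one way to stop a defaultdict from silently creating new keys once it has been filled.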
This documentation is hard to follow. Let's go straight to an example:
import collections

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

# defaultdict
d = collections.defaultdict(list)
for k, v in s:
    d[k].append(v)

# Use dict and setdefault
g = {}
for k, v in s:
    g.setdefault(k, []).append(v)

# Use dict
e = {}
for k, v in s:
    e[k] = v

##list(d.items())
##list(g.items())
##list(e.items())
Look at the results
>>> list(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
>>> list(g.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
>>> list(e.items())
[('blue', 4), ('red', 1), ('yellow', 3)]
>>> d
defaultdict(<class 'list'>, {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]})
>>> g
{'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}
>>> e
{'blue': 4, 'red': 1, 'yellow': 3}
>>> d.items()
dict_items([('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])])
>>> d["blue"]
[2, 4]
>>> d.keys()
dict_keys(['blue', 'red', 'yellow'])
>>> d.default_factory
<class 'list'>
>>> d.values()
dict_values([[2, 4], [1], [1, 3]])
Using collections.defaultdict(list) is similar to using dict.setdefault().
The Python help says so too:
When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the default_factory function which returns an empty list. The list.append() operation then attaches the value to the new list. When keys are encountered again, the look-up proceeds normally (returning the list for that key) and the list.append() operation adds another value to the list. This technique is simpler and faster than an equivalent technique using dict.setdefault():
It says this approach is equivalent to using dict.setdefault(), but faster.
So it is worth taking a look at dict.setdefault():
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
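In other words: if the key is already stored in the dictionary, its value is returned. If the key does not exist, the key is inserted with the default value, and that default is returned. If no default is given, it is None.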
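Both branches of setdefault() can be seen in a few lines. This is only an illustrative sketch with made-up keys:

```python
g = {}

first = g.setdefault("blue", [])    # key missing: inserts [] and returns it
first.append(2)

second = g.setdefault("blue", [])   # key present: returns the stored list
second.append(4)

print(g)                    # {'blue': [2, 4]}
print(first is second)      # True: both names refer to the same stored list

result = g.setdefault("red")        # no default given, so None is stored
print(result)               # None
```

Note that the second call still builds a new empty list as its argument, even though it is thrown away; a defaultdict only calls its factory when the key is actually missing.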
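Note, however, that defaultdict is equivalent to dict.setdefault(), and both differ from the plain assignment e[k] = v in the example. As the results show, plain assignment overwrites the earlier value.
Judging by d.values() and d["blue"] at the end, a defaultdict is then used exactly like a dict; the only difference is how missing keys are initialized. defaultdict uses the factory function to give a new key a default value.
That default might be an empty list [] with defaultdict(list), or 0 with defaultdict(int).
Now look at the following example.
With defaultdict(int), d is a dictionary whose keys default to 0. You can think of it as d[key] = int(), where the int factory function returns 0 by default.
So d[k] can be read straight away: d["m"] += 1 reads d["m"], which is the default value 0, and stores 0 + 1 = 1.
The rest works the same way.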
>>> from collections import defaultdict
>>> s = 'mississippi'
>>> d = defaultdict(int)
>>> for k in s:
...     d[k] += 1
...
>>> list(d.items())
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]
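For contrast, here is a sketch of what happens when the same loop is run against a plain dict, next to the defaultdict version. The variable names plain and counts are made up for this comparison, and the result is sorted so the output does not depend on dictionary ordering:

```python
from collections import defaultdict

s = "mississippi"

plain = {}
try:
    for k in s:
        plain[k] += 1              # fails on the first letter: no default value
except KeyError as exc:
    print("KeyError:", exc)        # KeyError: 'm'

counts = defaultdict(int)
for k in s:
    counts[k] += 1                 # a missing key starts from int() == 0

print(sorted(counts.items()))      # [('i', 4), ('m', 1), ('p', 2), ('s', 4)]
```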