python - How to sort a sequence of data by one field of a tuple or by a certain key of a dictionary?

Question

Suppose that after analyzing a large amount of raw data, we obtain data like this: [(id,node,val),(id,node,val)...], a list of tuples of user id, server, and value. We then need to split the data by server, sort each group by val, and write the result to Excel. Or produce [{...

过去多啦不再A梦 · Answer

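from collections import defaultdict

d = defaultdict(list)
# Placeholder: a list of (id, node, val) tuples
data = [(id, node, val), (id, node, val), ...]

# Group the records by node
for x in data:
    d[x[1]].append(x)

# Write each group to Excel in turn
for _, v in d.items():  # on Python 2, use d.iteritems()
    # Sort by val (index 2 of the tuple); set reverse=False for ascending order
    tmp = sorted(v, key=lambda x: x[2], reverse=True)
    # Write to Excel; write_to_excel is a placeholder for your own function
    write_to_excel(tmp)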
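
For reference, here is a minimal runnable sketch of the same group-then-sort idea. The sample records and the names groups and sorted_groups are made up for illustration, and printing stands in for the Excel step, which is not shown here.

```python
from collections import defaultdict
from operator import itemgetter

# Made-up sample records: (id, node, val)
data = [
    (1, "server-a", 30),
    (2, "server-b", 10),
    (3, "server-a", 50),
    (4, "server-b", 20),
]

# Group the records by node (index 1 of each tuple)
groups = defaultdict(list)
for record in data:
    groups[record[1]].append(record)

# Sort each group by val (index 2), largest first
sorted_groups = {
    node: sorted(records, key=itemgetter(2), reverse=True)
    for node, records in groups.items()
}

# Printing stands in for writing each group to Excel
for node, records in sorted_groups.items():
    print(node, records)
```

itemgetter(2) does the same job as lambda x: x[2]; either works as the sort key.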

Alternatively, you can write all the data to a CSV file with the columns id, node, val.
A shell script built on Linux command-line tools such as awk, uniq, and sort can then do the grouping and sorting, and it is also very fast.
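
If you would rather stay in Python for the CSV route, something like the following might work. It is only a sketch: the sample records and the helper name write_csv are assumptions, and an in-memory buffer stands in for a real file. Negating val in the sort key assumes val is numeric.

```python
import csv
import io

# Made-up sample records: (id, node, val)
data = [
    (1, "server-a", 30),
    (2, "server-b", 10),
    (3, "server-a", 50),
    (4, "server-b", 20),
]

def write_csv(rows, out):
    """Write rows to `out` ordered by node, largest val first within each node."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "node", "val"])
    # Sort by node ascending, then by val descending (negation assumes numeric val)
    for row in sorted(rows, key=lambda r: (r[1], -r[2])):
        writer.writerow(row)

# The buffer stands in for a real file, e.g. open("data.csv", "w", newline="")
buffer = io.StringIO()
write_csv(data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Excel can open the resulting CSV file directly, so this may be enough if a native workbook is not strictly required.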

Also, it is not clear how large your data set really is, or what order of magnitude it is. The Python code above keeps every record in memory, so if the data is very large, it may run out of memory. You will need to estimate this yourself.

我想大声告诉你 · Answer

If I understand your needs correctly, you can use a dictionary. The key of the dictionary is the name of the node, and the value of the dictionary is a list composed of items:

data = [{"id":xxx,"node":xxx,"val":xxx},{"id":xxx,"node":xxx,"val":xxx}...]

result = {}
for data_item in data:
    node_name = data_item["node"]
    if node_name in result:
        result[node_name].append(data_item)
    else:
        result[node_name] = [data_item]

Then take each value out of the dictionary (that is, the list of items for one server) by its key (the server name) and sort it, passing a lambda as the sort key so that the list is ordered by the chosen field of each item.
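
The sorting step described above could look roughly like this. The sample records are made up for illustration, and setdefault is used as a shorter equivalent of the if/else grouping; sorting by "val" in descending order is an assumption about what you want.

```python
# Made-up sample records in the dictionary form
data = [
    {"id": 1, "node": "server-a", "val": 30},
    {"id": 2, "node": "server-b", "val": 10},
    {"id": 3, "node": "server-a", "val": 50},
    {"id": 4, "node": "server-b", "val": 20},
]

# Group the items by node name
result = {}
for data_item in data:
    result.setdefault(data_item["node"], []).append(data_item)

# Sort each server's list in place by "val", largest first
for node_name, items in result.items():
    items.sort(key=lambda item: item["val"], reverse=True)

for node_name, items in result.items():
    print(node_name, [item["id"] for item in items])
```

Drop reverse=True for ascending order, or change the key in the lambda to sort by a different field.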