Suppose I have data obtained from post-analysis of a massive raw dataset:
[(id, node, val), (id, node, val), ...]
Each tuple holds a user id, a server (node), and a value, in that order. I need to split the records by server, sort each group by the size of val, and then write the results to Excel.
The data may also be generated as [{"id": xxx, "node": xxx, "val": xxx}, {"id": xxx, "node": xxx, "val": xxx}, ...]
If there were only one set of key-value pairs, I could simply sort it with sorted(), but the node names are not known in advance, and the server names may change from day to day. After I obtain such data, how do I split and sort it by server name?
The main problem is that the node names themselves are not fixed. For example, you could create n lists up front and put each node's data into its own list, but you don't know how many lists to create. And when the processed data is later written to Excel, a loop is unavoidable.
That means a loop within a loop, and the names of the new data groups are not determined, either after the data is classified or after it is sorted. Even using exec() cannot meet this need.
In addition, you could simply write all the data to a CSV file with id, node, val columns.
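For example, here is a minimal sketch using Python's standard csv module; the records list and the data.csv filename are made up for illustration:

import csv

# hypothetical sample records; in practice these would be your (id, node, val) tuples
records = [(1, "server-a", 42), (2, "server-b", 7), (3, "server-a", 19)]

with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "node", "val"])  # header row
    writer.writerows(records)               # one row per (id, node, val) tuple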
Processing it with a shell script built from Linux command-line tools such as awk, uniq, and sort is also very fast.
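For instance, assuming the data.csv file from the sketch above, something like this groups the rows by node and sorts each group by val in descending order; the column numbers are an assumption based on the id, node, val layout:

# skip the header, sort by node (column 2), then by val (column 3, numeric, descending)
tail -n +2 data.csv | sort -t, -k2,2 -k3,3nr > sorted.csv

# list the distinct node names
tail -n +2 data.csv | awk -F, '{print $2}' | sort | uniq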
Also, you haven't said how large this massive data actually is, or what order of magnitude it is. If the volume is really large, the in-memory Python code above may run out of memory; you will need to estimate that yourself.
If I understand your needs correctly, you can use a dictionary: the keys of the dictionary are the node names, and each value is a list of that node's items.
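Here is a minimal sketch, assuming the input is a list of (id, node, val) tuples; collections.defaultdict builds one list per node without needing to know the node names or their count in advance:

from collections import defaultdict

# made-up sample data; the real input would be your (id, node, val) tuples
data = [(1, "server-a", 42), (2, "server-b", 7), (3, "server-a", 19)]

groups = defaultdict(list)        # key: node name, value: list of that node's items
for item in data:
    groups[item[1]].append(item)  # item[1] is the node/server name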
Then take each value (the data list) out of the dictionary by its key (the server name) and sort it with sorted(), passing a lambda as the key function so it orders by the desired field of each item.
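Continuing the sketch above, each group can be sorted and handled in a single loop; that val is the third tuple field is an assumption taken from the format described in the question:

# sort each node's list by val (the third field), largest first
for node, items in groups.items():
    items.sort(key=lambda item: item[2], reverse=True)
    print(node, items)  # or write each sorted group to its own Excel sheet here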