


A classic algorithm question caused by a Linux command commonly used in projects
小时候家里定了《读者》的月刊,里面记录一个故事:说有有个偏僻的乡村一日突然来了一个美女,她携着万贯家财子女在当地安家落户,成了当地的乡绅。她让她的子女世世代代的保守这个秘密,直到这个秘密不会再对家族带来灾难。她就是陈圆圆。当年吴三桂领清兵入关,冲冠一怒为红颜,改写了中国的历史,自己却能全身而退的那个人。
周五例行公事的查看一下离线数据推送项目的数据和log。将log用awk分段之后,我想知道实时数据前10个被重复发送的数据ID都被重复发送了几次,从而找到进一步优化的入手点,天知道我对这个项目已经进行了多少次优化了。于是linux命令就是
cat transmission.log |grep 'IncrementAlbumService.java:146'|awk '{print $6}'|awk -F ',' '{print $1}'| sort |uniq -c| sort -nr |head
I feel very guilty for the results I got
(Data security, the ID rule part of our project is not displayed)
Although this is related to their operation, I originally It is time to send the data when it detects the change, but for such a large resend rate. Regardless of the interface of the update service or the offline service, there are still points that can be optimized. Girl, my thinking has always been different from that of those male idols who use hair dryers and artificial watering cans to create a picture when they appear. In addition to this result, I am also thinking about another classic algorithm problem: there is a text file with about 10,000 lines, one word in each line, and it is required to count the top ten most frequently occurring words.
For this algorithm problem, the above Linux command is sort|uniq -c |sort -nr | head. The time complexity is the largest of the following:
1> First do a sort,
Direct insertion sort: continuously insert elements into the ordered list, the worst time is complex The degree is O(n2)
Shell sorting: insertion sorting with reduced increment, unstable, depends on the selection of the increment factor sequence, the worst time complexity is O(n 2)
Simple selection sorting: Select the smallest or largest number among the numbers to be sorted and exchange it with the first unsorted position. The worst time complexity is O(n2 )
Binary selection sort: Each simple selection sort determines two elements, which can reduce the cycle by half.
Heap sort: tree selection sort, large root heap, small root heap. The worst time complexity is O(N*logN)
Bubble sorting: Each time two adjacent numbers are compared and exchanged, the worst time complexity is O(n2 )
Quick sorting: select the base element and divide the elements to be sorted each time. The worst time complexity is O(n2)
Merge sorting: divide the two elements An ordered list is synthesized into a new ordered list. The worst-case time complexity is O(N*logN)
. Bucket sorting: an algorithm that trades space for time. The complexity is close to O(n)
Radix sorting: Allocate and collect according to hundreds of thousands of digits, the time complexity is O(dn)
2>uniq time complexity is O(n)
3> ;Sort time service degree is the same as 1>
4>The time complexity after sorting is O(1)
The algorithm used is also related to the size of the file. If the file is too large , if there is too much data, the files need to be split, sorted separately and then merged in multiple ways. So the number of words is mentioned here.
Without linux commands, the classic solution is to first use a dictionary tree to count word frequencies, and then use a big root heap. Let’s first introduce the dictionary tree, also called the tire tree. Because search engines often use this to make text word frequency statistics, and word segmentation algorithms also use this as a basic data structure, so I know a little bit about it. Its advantages are: minimizing unnecessary string comparisons, and query efficiency is higher than hash tables. The core idea is to exchange space for time and use public prefixes to reduce query time overhead. So when talking about statistics, the first thing that comes to mind is the dictionary tree. If you maintain an array of the top ten maximum word frequencies when counting word frequencies, the time complexity will be 10 times higher when compared in loop processing. Therefore, it is more appropriate to make statistics first and then select the top 10 in terms of time efficiency.
Actually, I don’t know much about algorithms, I just know how to use them. A former colleague of mine read an article I wrote on WeChat and asked me: "Is feed streaming a very technical job?" His question reminded me of Li Xiaoyao in "Sword of Immortals" who insisted on pretending to be tall, rich and handsome in the restaurant. When he said he wanted to order the most expensive dish: "Fried beef with vegetables", everyone laughed. Ling'er asked Li Xiaoyao: "Brother Xiaoyao, is fried beef with vegetables a very expensive dish?". Although my colleague was sincerely asking my opinion because he was in JD.com and was considering whether to go to Momo, I felt like the Li Xiaoyao who had never seen the world. The business logic of feed flow can be done in any way. Whether it has technical content depends on how it is done. I have written a patent to introduce a method of assembling feed streams. The process has not been completed, so I will not disclose the calculation method until then. But if you think hard, there are still many optimization points. The year before last, when I liked playing Moments, I would often find that the Moments I had deleted reappeared, or all the recent data in my or other people's Moments suddenly disappeared, leaving only very old data, such as two months ago a year ago. Data from a year ago will be automatically restored after one day. It's all a matter of strategy. There are many problems with WeChat Moments. Since one of our products, mm, is a family member of the WeChat architect, I won’t complain too much.
Although today is Sunday, you can use your imagination a little bit, but you also have to have a theme. The previous example has a classic top K problem. Because search engines often need to count the most popular query strings, the top K question is the basis. TopK problems use small root heaps. Maintain a small root heap of K size, traverse the elements to be compared, and compare them with the following elements respectively. If it is smaller than the root element, it means that it will definitely not enter the top K and will be eliminated. If it is greater than the root element, eliminate the root element. Then adjust the tree to the minimum heap and continue the comparison.
The minimum heap is a complete binary tree, and the value of each non-leaf node is not greater than the value of its child node. If this rule is broken, adjustments must be made from the first non-leaf node to the root node in a bottom-up order.
I decided to interview on Hulu next week, but I haven’t done so yet, so I probably won’t. Two years ago, my former colleague recommended Amazon, but I was not asked to go for an interview. To comfort myself, I guess they were not hiring at that time. I have never been to an interview with a foreign company like this before, so I don’t know what the routine is. If we start preparing now, we will probably be able to pass it after the National Day. I think it would be very disadvantageous for me to go to the interview by myself. It’s not going to be bad at all, it’s going to be very unstable. Friends who have read my articles may think that my articles are very messy and complicated. This is indeed the case for me in life. I have a wide range of knowledge, I am very whimsical, and I have no scruples. On the one hand, this lays the foundation for my creativity, but on the other hand, it is not conducive to my ability to express myself on the spot. The brain is like a computer. I have many parallel programs, the memory is not large enough, and there is a lot of data. Memory paging causes constant disk swapping. Time-sensitive actions like interviews can easily lead to timeout returns. I have so many technical invention patents, and now I think, I can’t even remember what I invented. I just took the bus. Because there were very few people, the driver asked me where to get off. He meant that it would stop wherever no one got off. It took me a long time to remember. My brain runs more in asynchronous non-blocking mode. In fact, synchronous blocking would be better for things like interviews. However, there is a solution to everything. If you can't find a solution, you just don't have enough ability. There's nothing wrong with that. However, the interview is to examine comprehensive abilities, such as teamwork, conversational skills, etc. I believe that no one in our department will have any objection to the sentence "Xiaojing is very smart." I also believe that colleagues with whom I collaborate in the department or work will not think that I am a difficult person to communicate with or get along with. But I tend to forget how to speak during interviews. But if I fail an interview because of this problem, I have no complaints. Because the interviewer is your future colleague and leader, if you are not in tune with the interviewer, you may not be able to use your abilities in the future. If you don’t do well in interviews and still feel that your abilities are sufficient, it’s likely that you are not high-level enough and have never seen what truly outstanding people look like. However, I am the kind of person who continues to do things even when I am determined to hit a wall. If I decide to give up on something, it’s because it’s not worth doing.
I like working. My goal is to still have a creative job when I am 60 years old. So I am afraid that domestic Internet companies will let me retire at the age of 40. There is one more important thing: I want to make my own search engine middleware. Domestic Internet companies mainly use it, so I am afraid it will be difficult for me to have the energy to do this. Of course, if you can’t go to Hulu, the search engine still has to do it. It’s just a matter of how to allocate your time.
I actually like to hit the wall, probably because I don’t want to grow up so quickly. If you act mature and elegant every day, you need to hide some things that you are not good at or things that may go wrong. As a result, I will be happy every day, but I may stay like this for the rest of my life. There are many famous figures in history who were originally playboys, but later became great men after their family fell into decline. In the book, there are two types of turning points in life: meeting noble people and encountering setbacks. When you are young and open-minded, you can have an epiphany when you meet a noble person and open your mind. As experience increases, people will be more selective in receiving the information around them. At this time, they may need to encounter great setbacks before they can rethink life. If I can see a better future, I am willing to go it alone and burn the boat. It’s better to have ups and downs than one day at a time. If you want to live, live a wonderful life~~
The above is the detailed content of A classic algorithm question caused by a Linux command commonly used in projects. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



The key differences between CentOS and Ubuntu are: origin (CentOS originates from Red Hat, for enterprises; Ubuntu originates from Debian, for individuals), package management (CentOS uses yum, focusing on stability; Ubuntu uses apt, for high update frequency), support cycle (CentOS provides 10 years of support, Ubuntu provides 5 years of LTS support), community support (CentOS focuses on stability, Ubuntu provides a wide range of tutorials and documents), uses (CentOS is biased towards servers, Ubuntu is suitable for servers and desktops), other differences include installation simplicity (CentOS is thin)

CentOS installation steps: Download the ISO image and burn bootable media; boot and select the installation source; select the language and keyboard layout; configure the network; partition the hard disk; set the system clock; create the root user; select the software package; start the installation; restart and boot from the hard disk after the installation is completed.

Backup and Recovery Policy of GitLab under CentOS System In order to ensure data security and recoverability, GitLab on CentOS provides a variety of backup methods. This article will introduce several common backup methods, configuration parameters and recovery processes in detail to help you establish a complete GitLab backup and recovery strategy. 1. Manual backup Use the gitlab-rakegitlab:backup:create command to execute manual backup. This command backs up key information such as GitLab repository, database, users, user groups, keys, and permissions. The default backup file is stored in the /var/opt/gitlab/backups directory. You can modify /etc/gitlab

CentOS will be shut down in 2024 because its upstream distribution, RHEL 8, has been shut down. This shutdown will affect the CentOS 8 system, preventing it from continuing to receive updates. Users should plan for migration, and recommended options include CentOS Stream, AlmaLinux, and Rocky Linux to keep the system safe and stable.

Docker uses Linux kernel features to provide an efficient and isolated application running environment. Its working principle is as follows: 1. The mirror is used as a read-only template, which contains everything you need to run the application; 2. The Union File System (UnionFS) stacks multiple file systems, only storing the differences, saving space and speeding up; 3. The daemon manages the mirrors and containers, and the client uses them for interaction; 4. Namespaces and cgroups implement container isolation and resource limitations; 5. Multiple network modes support container interconnection. Only by understanding these core concepts can you better utilize Docker.

CentOS hard disk mount is divided into the following steps: determine the hard disk device name (/dev/sdX); create a mount point (it is recommended to use /mnt/newdisk); execute the mount command (mount /dev/sdX1 /mnt/newdisk); edit the /etc/fstab file to add a permanent mount configuration; use the umount command to uninstall the device to ensure that no process uses the device.

How to use Docker Desktop? Docker Desktop is a tool for running Docker containers on local machines. The steps to use include: 1. Install Docker Desktop; 2. Start Docker Desktop; 3. Create Docker image (using Dockerfile); 4. Build Docker image (using docker build); 5. Run Docker container (using docker run).

After CentOS is stopped, users can take the following measures to deal with it: Select a compatible distribution: such as AlmaLinux, Rocky Linux, and CentOS Stream. Migrate to commercial distributions: such as Red Hat Enterprise Linux, Oracle Linux. Upgrade to CentOS 9 Stream: Rolling distribution, providing the latest technology. Select other Linux distributions: such as Ubuntu, Debian. Evaluate other options such as containers, virtual machines, or cloud platforms.
