Initial review
Yesterday, I saw a very interesting method of deleting a large number of files in a directory. This method comes from Zhenyu Lee in http://www.quora.com/How-can-someone-rapidly-delete-400-000-files.
He did not use find or xargs. He creatively took advantage of the powerful function of rsync and used rsync –delete to replace the target folder with an empty folder. Afterwards, I did an experiment to compare the various methods. To my surprise, Lee's method was much faster than the others. Below is my review.
Environment:
CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
MEM: 4G
HD: ST3250318AS: 250G/7200RPM
Method
# Of Files
Deletion Time
rsync -a –delete empty/ s1/ 1000000 6m50.638s
find s2/ -type f -delete 1000000 87m38.826s
find s3/ -type f | xargs -L 100 rm 1000000 83m36.851s
find s4/ -type f | –exclude, you can selectively delete files that meet the conditions. Another point is that this method is perfect when you need to retain this directory for other uses.
Re-evaluation
A few days ago, Keith-Winstein replied to this post on Quora and said that my previous evaluation could not be copied because the operation lasted too long. Just to clarify, these numbers are too large, probably because my computer has been doing so much over the past few years, and there may have been some file system errors in the review. But I'm not sure those are the reasons. Well now, I spent a day with a newer computer and did the review again. This time I used /usr/bin/time, which provides more detailed information. Below are the new results.
(Every time there are 1,000,000 files. The volume of each file is 0.)
Command
Elapsed
System Time
%CPU
cs (Vol/Invol)
rsync -a –delete empty/ a 10.60 1.31 95 106/22
find b/ -type f -delete 28.51 14.46 52 14849/11
find c/ -type f | xargs -L 100 rm 41.69 20.60 54 37048/15074
find d/ -type f |
Original output
# method 1 ~/test $ /usr/bin/time -v rsync -a --delete empty/ a/ Command being timed: "rsync -a --delete empty/ a/" User time (seconds): 1.31 System time (seconds): 10.60 Percent of CPU this job got: 95% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.42 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 0 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 24378 Voluntary context switches: 106 Involuntary context switches: 22 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 # method 2 Command being timed: "find b/ -type f -delete" User time (seconds): 0.41 System time (seconds): 14.46 Percent of CPU this job got: 52% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.51 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 0 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 11749 Voluntary context switches: 14849 Involuntary context switches: 11 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 # method 3 find c/ -type f | xargs -L 100 rm ~/test $ /usr/bin/time -v ./delete.sh Command being timed: "./delete.sh" User time (seconds): 2.06 System time (seconds): 20.60 Percent of CPU this job got: 54% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:41.69 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 0 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 1764225 Voluntary context switches: 37048 Involuntary context switches: 15074 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 # method 4 find d/ -type f | xargs -L 100 -P 100 rm ~/test $ /usr/bin/time -v ./delete.sh Command being timed: "./delete.sh" User time (seconds): 2.86 System time (seconds): 27.82 Percent of CPU this job got: 89% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:34.32 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 0 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 1764278 Voluntary context switches: 929897 Involuntary context switches: 21720 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 # method 5 ~/test $ /usr/bin/time -v rm -rf f Command being timed: "rm -rf f" User time (seconds): 0.20 System time (seconds): 14.80 Percent of CPU this job got: 47% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.29 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 0 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 176 Voluntary context switches: 15134 Involuntary context switches: 11 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0