Python method to traverse all files in a directory-Python Tutorial-php.cn

os.walk generator
os.walk(PATH), PATH is a folder path, of course you can use it. Or.../this way.
What is returned is a list of triplet elements, each element represents the content of a folder. The first one is the content of the current folder.
The returned triple represents (the working folder, the list of folders under this folder, the list of files under this folder).
So,
Get all subfolders, that is (d represents this triplet):

os.path.join(d[0],d[1]);

Copy after login

Get all sub-files, that is:

 os.path.join(d[0],d[2]);

Copy after login

The following example uses two sets of loops. After traversing, a list of all file names is obtained and then all files are looped:

result = [os.path.join(dp, f) for dp, dn, fs in os.walk("_pages") for f in fs if os.path.splitext(f)[1] == '.html']
for fname in result:
 #do something

Copy after login

actually equals

result=[]
for dp, dn, fs in os.walk("_pages"):
 for f in fs:
 if (os.path.splitext(f)[1] == '.html'):
  result.append(os.path.join(dp, f))
for fname in result:
 #do something

Copy after login

Finally determine whether the html suffix is used to obtain the file name. You can also use glob:

result = [y for x in os.walk(PATH) for y in glob.glob(os.path.join(x[0], '*.txt'))]

Copy after login

You can also use iterator methods:

from itertools import chain
import glob
result = (chain.from_iterable(glob.iglob(os.path.join(x[0], '*.txt')) for x in os.walk('.')))

Copy after login

Advanced
The standard file number traversal generator os.walk is both powerful and flexible. However, os.walk still lacks some detailed processing capabilities required by applications, such as selecting files according to a certain pattern and performing operations on all files (or directories). Sorting, or only traversing the current directory without entering its subdirectories, so the interface needs to be encapsulated.

import os, fnmatch 
 
def filter_files(dirname, patterns='*', single_level=False, yield_folders=False): 
  patterns = patterns.split(';') 
  allfiles = [] 
  for rootdir, subdirname, files in os.walk(dirname): 
    print subdirname 
    allfiles.extend(files) 
    if yield_folders: 
      allfiles.extend(dubdirname) 
    if single_level: 
      break 
  allfiles.sort() 
  for eachpattern in patterns: 
    for eachfile in fnmatch.filter(allfiles, eachpattern): 
        print os.path.normpath(eachfile)

Copy after login

Description:
1.The difference between extend and append
Lists are implemented as classes. "Creating" a list actually instantiates a class. Therefore, lists can be manipulated in multiple ways. Lists can contain elements of any data type, and the elements in a single list do not need to be all of the same type. The append() method adds a new element to the end of the list. Accepting only one parameter, the extend() method only accepts a list as a parameter and adds each element of the parameter to the original list.
2. fnmatch module
The fnmatch module uses patterns to match file names. The pattern syntax is the same as that used in Unix shells. An asterisk (*) matches zero or more characters, and a question mark (?) matches a single character. You can also use square brackets to specify a character range, for example [0-9] represents a number, and all other characters match themselves.
1) fnmatch.fnmatch(name, pattern) method: tests whether name matches pattern and returns true/false
2) fnmatch.filter(names, pat) implements filtering or filtering of special characters in the list and returns a list of characters that match the matching pattern. Of course, names represents the list

Related labels：

python document Traverse

Previous article：Sharing code examples of Python simulating logging into Taobao and counting Taobao consumption Next article：Simply master the usage of the counter structure in Python's Collections module

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn