The directory /home/ghost/workspace/Other/ has the following structure:

    ├── git
    ├── input
    │   ├── csv
    │   │   ├── test_file_1.csv
    │   │   └── test_file_2.csv
    │   ├── test.csv
    │   ├── test_file_1.txt
    │   └── test_file_2.txt
    ├── input-archive
    └── temp

We will now use Python to traverse and filter the files in it.
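The top-level entries are as follows:

    ['input', 'temp', 'input-archive', 'git']

In the code below, root is the folder currently being visited, dirs lists the folders directly inside it, and files lists the files directly inside it.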
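    import os

    work_folder = '/home/ghost/workspace/Other'
    exclude = ['git', 'temp']  # folders to skip during traversal

    for root, dirs, files in os.walk(work_folder):
        for ex in exclude:
            if ex in dirs:
                dirs.remove(ex)  # remove in place so os.walk does not descend into it
        print(root, dirs, files)

The result is as follows: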
    /home/ghost/workspace/Other ['input', 'input-archive'] []
    /home/ghost/workspace/Other/input ['csv'] ['test_file_1.txt', 'test_file_2.txt', 'test.csv']
    /home/ghost/workspace/Other/input/csv [] ['test_file_2.csv', 'test_file_1.csv']
    /home/ghost/workspace/Other/input-archive [] []

Next, find all CSV files under a directory, including its subdirectories. This uses the glob module: with recursive=True, the ** pattern matches zero or more levels of subdirectories, so the search descends recursively.
    import glob

    pat = '/home/ghost/workspace/Other/input/**/*.csv'
    for csv_path in glob.glob(pat, recursive=True):
        print(csv_path)

The result is as follows:
    /home/ghost/workspace/Other/input/test.csv
    /home/ghost/workspace/Other/input/csv/test_file_2.csv
    /home/ghost/workspace/Other/input/csv/test_file_1.csv
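
The two techniques in this post can be combined. A glob pattern has no way to skip named folders, so one possible approach is to prune folders during os.walk and filter by extension in the same pass. The following is only a minimal sketch: the function name find_csv_files and its exclude parameter are hypothetical and not part of the original post, and it assumes that matching the .csv suffix case-insensitively is acceptable.

```python
import os


def find_csv_files(base, exclude=()):
    """Return sorted paths of .csv files under base, skipping excluded folder names.

    Hypothetical helper written for illustration only.
    """
    matches = []
    for root, dirs, files in os.walk(base):
        # Slice assignment changes the very list os.walk holds,
        # so excluded folders are never descended into.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            if name.lower().endswith('.csv'):
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == '__main__':
    # Same base folder and exclusions as in the examples above.
    for path in find_csv_files('/home/ghost/workspace/Other', exclude=('git', 'temp')):
        print(path)
```

The slice assignment dirs[:] = ... does the same job as calling dirs.remove() for each excluded name. Rebinding the name with a plain dirs = ... would not prune anything, because os.walk would still hold the original list.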