
Essential Office Tools (Practical Office Utilities)

wptr33 2025-06-15 19:46

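Anyone who uses a computer saves files constantly, and sooner or later the same thing gets saved more than once. Duplicate folders and duplicate file names pile up over time and make files harder to manage. We can write a tool that finds identical files or compares two folders, then deletes the redundant copies.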

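  1. Accurate duplicate detection. Content comparison: identifies files with identical content, even when their names differ, by computing MD5/SHA-1 hashes. Fast scanning: recursively scans a chosen folder and automatically skips empty files. Broad type support: covers documents, images, audio, archives, and other common formats.
  2. Smart comparison engine. Folder difference analysis: highlights the files added, modified, or deleted between two folders (modeled on the BCompare algorithm). Similar-image detection: uses image feature matching to find similar images that differ in resolution, cropping, or watermarking.
  3. Safe cleanup. Visual preview: shows duplicate groups side by side, sortable by size or modification time. One-click deduplication: keeps the newest version or a user-specified file and removes the unneeded copies. Recycle-bin protection: backs files up to the recycle bin before deletion to guard against mistakes.
  4. Implementation. The listing below covers the core of the design: an MD5-based duplicate search within one folder and across two folders, with checkbox selection, deletion, and export: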

    import hashlib
    import os
    import shutil
    import tkinter as tk
    from collections import defaultdict
    from tkinter import filedialog, messagebox, ttk


    class ScrolledFrame(tk.Frame):
        """A frame with a vertical scrollbar; put child widgets in `self.interior`."""

        def __init__(self, parent, *args, **kw):
            tk.Frame.__init__(self, parent, *args, **kw)

            vscrollbar = tk.Scrollbar(self, orient=tk.VERTICAL)
            vscrollbar.pack(fill=tk.Y, side=tk.RIGHT, expand=tk.FALSE)
            canvas = tk.Canvas(self, bd=0, highlightthickness=0,
                               yscrollcommand=vscrollbar.set)
            canvas.pack(side=tk.LEFT, fill=tk.BOTH, expand=tk.TRUE)
            vscrollbar.config(command=canvas.yview)

            self.interior = interior = tk.Frame(canvas)
            interior_id = canvas.create_window(0, 0, window=interior,
                                               anchor=tk.NW)

            def _configure_interior(event):
                # Keep the scroll region in sync with the interior's requested size.
                size = (interior.winfo_reqwidth(), interior.winfo_reqheight())
                canvas.config(scrollregion="0 0 %s %s" % size)

            interior.bind('<Configure>', _configure_interior)

            # Enable mouse wheel scrolling.
            # Note: bind_all is application-wide, so when several ScrolledFrame
            # instances exist, the most recently created one receives the events.
            def _on_mousewheel(event):
                canvas.yview_scroll(int(-1 * (event.delta / 120)), "units")

            canvas.bind_all("<MouseWheel>", _on_mousewheel)

            # Set a fixed width so the window does not expand too much
            self.fixed_width = 600
            self.config(width=self.fixed_width)
            self.update_idletasks()

            # Ensure the interior width matches the fixed width of the frame
            def _configure_canvas(event):
                if interior.winfo_reqwidth() != self.fixed_width:
                    canvas.itemconfigure(interior_id, width=self.fixed_width)

            canvas.bind('<Configure>', _configure_canvas)


    class DuplicateFileFinder:
        """Tkinter GUI that finds duplicate files by MD5 hash, in one folder or across two."""

        def __init__(self, root):
            self.root = root
            self.root.title("Duplicate File Finder and Remover")
            self.files_dict = defaultdict(list)  # hash -> paths of duplicate files
            self.total_files = 0
            self.processed_files = 0
            self.compare_total_files = 0
            self.compare_processed_files = 0
            self.check_vars = {}  # path -> BooleanVar (checked = marked for deletion)
            self.single_folder_mode = True
            self.folder1 = None
            self.folder2 = None
            self.create_widgets()

        def create_widgets(self):
            # Create a notebook (tabbed interface)
            notebook = ttk.Notebook(self.root)
            notebook.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)

            # Tab 1: Single Folder Mode
            tab1 = tk.Frame(notebook)
            notebook.add(tab1, text="Single Folder")

            # Tab 2: Folder Comparison Mode
            tab2 = tk.Frame(notebook)
            notebook.add(tab2, text="Folder Comparison")

            self.create_single_folder_widgets(tab1)
            self.create_folder_comparison_widgets(tab2)

        def create_single_folder_widgets(self, parent):
            padx = 10
            pady = 5

            main_frame = tk.Frame(parent)
            main_frame.pack(fill=tk.BOTH, expand=True)

            top_frame = tk.Frame(main_frame)
            top_frame.grid(row=0, column=0, sticky='ew', padx=padx, pady=pady)

            self.path_label = tk.Label(top_frame, text="Path:")
            self.path_label.grid(row=0, column=0, sticky='w')
            self.path_entry = tk.Entry(top_frame, width=50)
            self.path_entry.grid(row=0, column=1, padx=(0, padx), pady=pady)

            button_frame = tk.Frame(top_frame)
            button_frame.grid(row=1, column=0, columnspan=2, pady=pady)
            self.browse_button = tk.Button(button_frame, text="Browse",
                                           command=self.browse_directory)
            self.browse_button.grid(row=0, column=0, padx=padx, pady=pady)
            self.find_button = tk.Button(button_frame, text="Find Duplicates",
                                         command=self.start_find_duplicates)
            self.find_button.grid(row=0, column=1, padx=padx, pady=pady)

            self.progress_bar = ttk.Progressbar(main_frame, orient='horizontal',
                                                length=300, mode='determinate')
            self.progress_bar.grid(row=1, column=0, sticky='ew', padx=padx, pady=pady)

            self.results_frame = ScrolledFrame(main_frame)
            self.results_frame.grid(row=2, column=0, sticky='nsew', padx=padx, pady=pady)
            main_frame.grid_columnconfigure(0, weight=1)
            main_frame.grid_rowconfigure(2, weight=1)

            bottom_frame = tk.Frame(main_frame)
            bottom_frame.grid(row=3, column=0, sticky='ew', padx=padx, pady=pady)
            self.delete_button = tk.Button(bottom_frame, text="Delete Selected",
                                           command=self.delete_selected_files)
            self.delete_button.pack(side=tk.LEFT, padx=padx, pady=pady, expand=True)
            self.save_button = tk.Button(bottom_frame, text="Save Kept Files",
                                         command=self.save_unique_files)
            self.save_button.pack(side=tk.LEFT, padx=padx, pady=pady, expand=True)
            self.export_button = tk.Button(bottom_frame, text="Export Duplicate List",
                                           command=self.export_duplicate_files)
            self.export_button.pack(side=tk.LEFT, padx=padx, pady=pady, expand=True)
            self.quit_button = tk.Button(bottom_frame, text="Quit", command=self.root.quit)
            self.quit_button.pack(side=tk.RIGHT, padx=padx, pady=pady, expand=True)

        def create_folder_comparison_widgets(self, parent):
            padx = 10
            pady = 5

            main_frame = tk.Frame(parent)
            main_frame.pack(fill=tk.BOTH, expand=True)

            # Folder selection frame
            folder_frame = tk.Frame(main_frame)
            folder_frame.grid(row=0, column=0, sticky='ew', padx=padx, pady=pady)

            # Folder 1
            folder1_frame = tk.Frame(folder_frame)
            folder1_frame.grid(row=0, column=0, sticky='ew', padx=padx, pady=pady)
            self.folder1_label = tk.Label(folder1_frame, text="First folder:")
            self.folder1_label.grid(row=0, column=0, sticky='w')
            self.folder1_entry = tk.Entry(folder1_frame, width=50)
            self.folder1_entry.grid(row=0, column=1, padx=(0, padx), pady=pady)
            self.folder1_browse_button = tk.Button(
                folder1_frame, text="Browse", command=lambda: self.browse_directory(1))
            self.folder1_browse_button.grid(row=0, column=2, padx=padx, pady=pady)

            # Folder 2
            folder2_frame = tk.Frame(folder_frame)
            folder2_frame.grid(row=1, column=0, sticky='ew', padx=padx, pady=pady)
            self.folder2_label = tk.Label(folder2_frame, text="Second folder:")
            self.folder2_label.grid(row=0, column=0, sticky='w')
            self.folder2_entry = tk.Entry(folder2_frame, width=50)
            self.folder2_entry.grid(row=0, column=1, padx=(0, padx), pady=pady)
            self.folder2_browse_button = tk.Button(
                folder2_frame, text="Browse", command=lambda: self.browse_directory(2))
            self.folder2_browse_button.grid(row=0, column=2, padx=padx, pady=pady)

            # Compare button and progress bar
            compare_and_progress_frame = tk.Frame(folder_frame)
            compare_and_progress_frame.grid(row=2, column=0, columnspan=3, pady=pady)
            self.compare_progress_bar = ttk.Progressbar(
                compare_and_progress_frame, orient='horizontal', length=300,
                mode='determinate')
            self.compare_progress_bar.pack(fill=tk.X, padx=padx, pady=(0, pady))
            self.compare_button = tk.Button(
                compare_and_progress_frame, text="Compare Folders",
                command=self.start_compare_folders)
            self.compare_button.pack(pady=(pady, 0))

            # Results frame
            self.compare_results_frame = ScrolledFrame(main_frame)
            self.compare_results_frame.grid(row=1, column=0, sticky='nsew',
                                            padx=padx, pady=pady)

            # Bottom buttons
            bottom_frame = tk.Frame(main_frame)
            bottom_frame.grid(row=2, column=0, sticky='ew', padx=padx, pady=pady)
            self.compare_delete_button = tk.Button(
                bottom_frame, text="Delete Selected", command=self.delete_selected_files)
            self.compare_delete_button.pack(side=tk.LEFT, padx=padx, pady=pady, expand=True)
            self.compare_save_button = tk.Button(
                bottom_frame, text="Save Kept Files", command=self.save_unique_files)
            self.compare_save_button.pack(side=tk.LEFT, padx=padx, pady=pady, expand=True)
            self.compare_export_button = tk.Button(
                bottom_frame, text="Export Duplicate List",
                command=self.export_duplicate_files)
            self.compare_export_button.pack(side=tk.LEFT, padx=padx, pady=pady, expand=True)
            self.compare_quit_button = tk.Button(bottom_frame, text="Quit",
                                                 command=self.root.quit)
            self.compare_quit_button.pack(side=tk.RIGHT, padx=padx, pady=pady, expand=True)

            main_frame.grid_columnconfigure(0, weight=1)
            main_frame.grid_rowconfigure(1, weight=1)

        def browse_directory(self, folder_num=None):
            """Ask for a directory and store it in the entry for the given mode."""
            directory = filedialog.askdirectory()
            if directory:
                if folder_num == 1:
                    self.folder1 = directory
                    self.folder1_entry.delete(0, tk.END)
                    self.folder1_entry.insert(0, directory)
                elif folder_num == 2:
                    self.folder2 = directory
                    self.folder2_entry.delete(0, tk.END)
                    self.folder2_entry.insert(0, directory)
                else:
                    self.path_entry.delete(0, tk.END)
                    self.path_entry.insert(0, directory)

        def hashfile(self, path, blocksize=65536):
            """Return the MD5 hex digest of a file, read in blocks to bound memory use."""
            hasher = hashlib.md5()
            with open(path, 'rb') as afile:
                buf = afile.read(blocksize)
                while len(buf) > 0:
                    hasher.update(buf)
                    buf = afile.read(blocksize)
            return hasher.hexdigest()

        def start_find_duplicates(self):
            self.single_folder_mode = True
            self.files_dict.clear()

            start_path = self.path_entry.get()
            if not os.path.isdir(start_path):
                messagebox.showerror("Error", "Invalid directory")
                return

            self.total_files = sum(len(files) for _, _, files in os.walk(start_path))
            self.processed_files = 0
            self.progress_bar['maximum'] = self.total_files
            self.progress_bar['value'] = 0

            self.find_button.config(state=tk.DISABLED)
            self.find_duplicates(start_path)
            self.display_results()
            self.find_button.config(state=tk.NORMAL)

        def start_compare_folders(self):
            self.single_folder_mode = False
            self.files_dict.clear()
            self.compare_processed_files = 0

            # Read the entry fields so that typed or pasted paths are honored too.
            self.folder1 = self.folder1_entry.get().strip() or None
            self.folder2 = self.folder2_entry.get().strip() or None

            if not self.folder1 or not self.folder2:
                messagebox.showerror("Error", "Please select two folders")
                return
            if not os.path.isdir(self.folder1) or not os.path.isdir(self.folder2):
                messagebox.showerror("Error", "Invalid folder")
                return

            # Calculate total files for progress bar
            self.compare_total_files = sum(
                len(files) for _, _, files in os.walk(self.folder1))
            self.compare_total_files += sum(
                len(files) for _, _, files in os.walk(self.folder2))
            self.compare_progress_bar['maximum'] = self.compare_total_files
            self.compare_progress_bar['value'] = 0

            self.compare_button.config(state=tk.DISABLED)
            # Hash every file in each folder, then compare the two hash tables
            files_in_folder1 = self.find_files_in_folder(self.folder1)
            files_in_folder2 = self.find_files_in_folder(self.folder2)
            self.compare_folders(files_in_folder1, files_in_folder2)
            self.display_comparison_results()
            self.compare_button.config(state=tk.NORMAL)

        def find_files_in_folder(self, folder_path):
            """Return a mapping of content hash -> paths for every file under folder_path."""
            files_in_folder = defaultdict(list)
            for dirpath, _, filenames in os.walk(folder_path):
                for filename in filenames:
                    file_path = os.path.join(dirpath, filename)
                    try:
                        file_hash = self.hashfile(file_path)
                        files_in_folder[file_hash].append(file_path)
                        self.compare_processed_files += 1
                        self.compare_progress_bar['value'] = self.compare_processed_files
                        self.root.update_idletasks()
                    except Exception as e:
                        print(f"Could not process file {file_path}: {e}")
            return files_in_folder

        def compare_folders(self, files1, files2):
            # Hashes present in both folders
            common_hashes = set(files1.keys()) & set(files2.keys())

            for hash_val in common_hashes:
                # Combine all files with this hash from both folders
                all_files = files1[hash_val] + files2[hash_val]

                # Group files by filename
                filename_groups = defaultdict(list)
                for file_path in all_files:
                    filename = os.path.basename(file_path)
                    filename_groups[filename].append(file_path)

                # Keep only groups where one name occurs more than once
                for filename, paths in filename_groups.items():
                    if len(paths) > 1:
                        self.files_dict[hash_val].extend(paths)

        def find_duplicates(self, start_path):
            # filename -> hash -> paths
            name_to_paths_and_hashes = defaultdict(lambda: defaultdict(list))
            for dirpath, _, filenames in os.walk(start_path):
                for filename in filenames:
                    file_path = os.path.join(dirpath, filename)
                    try:
                        file_hash = self.hashfile(file_path)
                        name_to_paths_and_hashes[filename][file_hash].append(file_path)
                    except Exception as e:
                        print(f"Could not process file {file_path}: {e}")
                    finally:
                        self.processed_files += 1
                        self.progress_bar['value'] = self.processed_files
                        self.root.update_idletasks()

            # Keep only groups that are duplicates by both name and content.
            # extend() rather than assignment, so that two differently named
            # groups sharing one hash do not overwrite each other.
            for filename, hash_dict in name_to_paths_and_hashes.items():
                for file_hash, paths in hash_dict.items():
                    if len(paths) > 1:
                        self.files_dict[file_hash].extend(paths)

        def display_results(self):
            if self.single_folder_mode:
                self.display_single_folder_results()
            else:
                self.display_comparison_results()

        def _add_file_group(self, parent, hash_val, files):
            """Add a header label plus one checkbox per file for one duplicate group."""
            group_label = tk.Label(parent, text=f"Duplicate group (hash: {hash_val}):")
            group_label.pack(anchor=tk.W, padx=10, pady=(5, 0))
            for i, file in enumerate(files):
                var = tk.BooleanVar()
                chk = tk.Checkbutton(parent, text=file, variable=var, anchor=tk.W)
                chk.pack(anchor=tk.W, padx=20, pady=(0, 5))
                if i == 0:  # Always keep the first file: unchecked and locked
                    chk.config(state=tk.DISABLED)
                    var.set(False)
                else:  # Select all other files by default
                    var.set(True)
                self.check_vars[file] = var

        def display_single_folder_results(self):
            interior = self.results_frame.interior
            for widget in interior.winfo_children():
                widget.destroy()
            self.check_vars.clear()

            if not self.files_dict:
                no_result_label = tk.Label(interior, text="No duplicate files found.")
                no_result_label.pack(anchor=tk.W, padx=10, pady=5)
                return

            for hash_val, files in self.files_dict.items():
                if len(files) > 1:
                    self._add_file_group(interior, hash_val, files)
                    separator = tk.Label(interior, text="-" * 80)
                    separator.pack(anchor=tk.W, padx=10, pady=(0, 5))

        def display_comparison_results(self):
            interior = self.compare_results_frame.interior
            for widget in interior.winfo_children():
                widget.destroy()
            self.check_vars.clear()

            if not self.files_dict:
                no_result_label = tk.Label(interior, text="No duplicate files found.")
                no_result_label.pack(anchor=tk.W, padx=10, pady=5)
                return

            # Split groups by whether every file in the group has the same name
            same_name_duplicates = defaultdict(list)
            different_name_duplicates = defaultdict(list)
            for hash_val, files in self.files_dict.items():
                filenames = [os.path.basename(file) for file in files]
                if len(set(filenames)) == 1:
                    same_name_duplicates[hash_val].extend(files)
                else:
                    different_name_duplicates[hash_val].extend(files)

            # Display same name duplicates
            same_name_label = tk.Label(interior, text="Duplicates with the same file name:")
            same_name_label.pack(anchor=tk.W, padx=10, pady=(5, 0))
            for hash_val, files in same_name_duplicates.items():
                self._add_file_group(interior, hash_val, files)

            # Display same content but different name duplicates
            different_name_label = tk.Label(
                interior, text="Duplicates with the same content but different file names:")
            different_name_label.pack(anchor=tk.W, padx=10, pady=(5, 0))
            for hash_val, files in different_name_duplicates.items():
                self._add_file_group(interior, hash_val, files)

        def delete_selected_files(self):
            selected_files = [file for file, var in self.check_vars.items() if var.get()]
            if not selected_files:
                messagebox.showwarning("Warning", "No files selected for deletion")
                return

            if messagebox.askyesno("Confirm deletion",
                                   f"Delete {len(selected_files)} file(s)?"):
                for file in selected_files:
                    try:
                        # os.remove deletes permanently; it does not use the recycle bin.
                        os.remove(file)
                        print(f"Deleted file: {file}")
                    except Exception as e:
                        print(f"Could not delete file {file}: {e}")

        def save_unique_files(self):
            """Copy the kept (unchecked) file of each duplicate group to a new folder."""
            if not self.check_vars:
                messagebox.showwarning("Warning", "No duplicate files found")
                return

            save_dir = filedialog.askdirectory(title="Choose where to save the kept files")
            if not save_dir:
                return

            # Create a new folder for the kept files
            unique_dir = os.path.join(save_dir, "unique_files")
            os.makedirs(unique_dir, exist_ok=True)

            if self.single_folder_mode:
                original_folder = self.path_entry.get()
                for file_path, var in self.check_vars.items():
                    if not var.get():  # Not selected for deletion
                        # Recreate the file's relative directory under unique_dir
                        rel_path = os.path.relpath(file_path, original_folder)
                        dest_dir = os.path.join(unique_dir, os.path.dirname(rel_path))
                        os.makedirs(dest_dir, exist_ok=True)
                        try:
                            shutil.copy2(file_path, dest_dir)
                        except Exception as e:
                            print(f"Could not copy file {file_path}: {e}")
            else:
                for file_path, var in self.check_vars.items():
                    if not var.get():  # Not selected for deletion
                        # Determine which folder the file belongs to
                        if file_path.startswith(self.folder1):
                            rel_path = os.path.relpath(file_path, self.folder1)
                            dest_dir = os.path.join(unique_dir, "Folder1",
                                                    os.path.dirname(rel_path))
                        elif file_path.startswith(self.folder2):
                            rel_path = os.path.relpath(file_path, self.folder2)
                            dest_dir = os.path.join(unique_dir, "Folder2",
                                                    os.path.dirname(rel_path))
                        else:
                            continue
                        os.makedirs(dest_dir, exist_ok=True)
                        try:
                            shutil.copy2(file_path, dest_dir)
                        except Exception as e:
                            print(f"Could not copy file {file_path}: {e}")

            messagebox.showinfo("Done", f"Kept files saved to: {unique_dir}")

        def export_duplicate_files(self):
            if not self.files_dict:
                messagebox.showwarning("Warning", "No duplicate files found")
                return

            file_path = filedialog.asksaveasfilename(
                defaultextension=".txt",
                filetypes=[("Text files", "*.txt"), ("All files", "*.*")],
                title="Export duplicate file list"
            )
            if not file_path:
                return

            try:
                with open(file_path, 'w', encoding='utf-8') as f:
                    for hash_val, files in self.files_dict.items():
                        if len(files) > 1:
                            f.write(f"Duplicate group (hash: {hash_val}):\n")
                            for file in files:
                                f.write(f"{file}\n")
                            f.write("=" * 80 + "\n")
                messagebox.showinfo("Done", f"Duplicate file list exported to: {file_path}")
            except Exception as e:
                messagebox.showerror("Error", f"Export failed: {e}")


    if __name__ == "__main__":
        root = tk.Tk()
        # The original set a window icon here, but the icon-loading code (and the
        # icon_path variable it referenced) is missing from the source, so that
        # step is omitted.
        app = DuplicateFileFinder(root)
        root.mainloop()
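Item 1 of the feature list mentions SHA-1 hashing and skipping empty files, while the listing hashes every file with MD5. The sketch below is one possible way to cover those two points without the GUI. The names `file_digest` and `find_duplicate_groups` are hypothetical, and the size pre-filter is an added assumption rather than part of the original tool: files of different sizes cannot have identical content, so only files that share a size need to be hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha1", blocksize=65536):
    """Hash a file block by block so large files are never loaded whole."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(blocksize), b""):
            hasher.update(block)
    return hasher.hexdigest()


def find_duplicate_groups(root_dir, algorithm="sha1"):
    """Return sorted lists of paths under root_dir that share identical content."""
    # Pass 1: bucket by size and drop empty files.
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file
            if size > 0:
                by_size[size].append(path)

    # Pass 2: hash only the files that share their size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[file_digest(path, algorithm)].append(path)
            except OSError:
                continue
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicate_groups(some_folder)` returns one list per set of identical files, regardless of file name. Passing `algorithm="md5"` reproduces the hash used in the listing.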

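Item 2 of the feature list describes reporting the files added, modified, or deleted between two folders, whereas the listing only reports files common to both. A minimal sketch of such a report, using the standard-library `filecmp` module, might look like the following. The function name `diff_folders` and the shape of its result are assumptions, and this is not the BCompare algorithm the list refers to.

```python
import filecmp
import os


def diff_folders(left, right, rel=""):
    """Recursively classify entries as only-left, only-right, or changed.

    Paths in the result are relative to the two folder roots. Entries in
    only_left / only_right may be files or whole subdirectories.
    """
    result = {"only_left": [], "only_right": [], "changed": [], "errors": []}
    comparison = filecmp.dircmp(left, right)

    result["only_left"] += [os.path.join(rel, n) for n in comparison.left_only]
    result["only_right"] += [os.path.join(rel, n) for n in comparison.right_only]

    # shallow=False forces a byte-by-byte comparison instead of trusting
    # matching size and modification time.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, comparison.common_files, shallow=False)
    result["changed"] += [os.path.join(rel, n) for n in mismatch]
    result["errors"] += [os.path.join(rel, n) for n in errors]

    # Recurse into subdirectories that exist on both sides.
    for name in comparison.common_dirs:
        sub = diff_folders(os.path.join(left, name),
                           os.path.join(right, name),
                           os.path.join(rel, name))
        for key in result:
            result[key] += sub[key]
    return result
```

Treating the left folder as the older copy, `only_left` corresponds to deleted files, `only_right` to added files, and `changed` to modified ones.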

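Item 3 of the feature list promises keeping the newest copy and backing files up before deletion, but the listing calls `os.remove`, which deletes permanently. The Python standard library has no recycle-bin API, so the sketch below substitutes a plain "quarantine" folder as a stand-in for the recycle bin. The helper names `split_keep_newest` and `quarantine_file` are hypothetical.

```python
import os
import shutil


def split_keep_newest(paths):
    """Return (path_to_keep, other_paths), keeping the most recently modified file."""
    keep = max(paths, key=os.path.getmtime)
    return keep, [p for p in paths if p != keep]


def quarantine_file(path, quarantine_dir):
    """Move a file into quarantine_dir instead of deleting it; return its new path."""
    os.makedirs(quarantine_dir, exist_ok=True)
    base = os.path.basename(path)
    stem, ext = os.path.splitext(base)
    target = os.path.join(quarantine_dir, base)
    counter = 1
    # Avoid overwriting an earlier quarantined file that has the same name.
    while os.path.exists(target):
        target = os.path.join(quarantine_dir, f"{stem}_{counter}{ext}")
        counter += 1
    shutil.move(path, target)
    return target
```

In the GUI, `delete_selected_files` could call `quarantine_file(file, some_folder)` where it currently calls `os.remove(file)`, so a mistaken deletion can be undone by moving the file back.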