Muhammad Manamil on November 13, 2025
Hey everyone! As a developer, I spend my life seeking efficiency. We all know Python is the Swiss Army knife of scripting, but sometimes the "12 scripts to rename your downloads" lists just don't cut it. We need real power. We need automations that tackle the tedious, the complex, and the truly time-consuming tasks that crop up daily, especially in a professional setting.
I've put together a list of 12 highly practical, copy-paste-ready Python scripts that I actually use to shave hours off my workflow every single week. These aren't just file organizers; these are scripts that dive into system resources, handle asynchronous data, and manipulate files at a deep level.
Ready to level up your productivity? Let's dive in. 🚀
How often do you copy a URL only to realize you need to strip the tracking parameters? Or maybe you copy a Python function name and need it in snake_case for your documentation? Manual reformatting is a massive time sink.
This script runs in the background, monitors your clipboard, and applies a set of user-defined Regular Expression (RegEx) rules. If a rule matches the copied text, it automatically replaces it with the transformed output. Think of it as IFTTT for your clipboard.
import pyperclip
import re
import time

RULES = [
    # 1. Strip common tracking parameters (utm_*, aff_*, ref) from copied URLs
    (re.compile(r'(\?|&)(utm_[^=&]*|aff_[^=&]*|ref)=[^&]*'), ''),
    # 2. Convert PascalCase to snake_case
    (re.compile(r'(.)([A-Z][a-z]+)'), r'\1_\2'),
    (re.compile(r'([a-z0-9])([A-Z])'), r'\1_\2')
]

def apply_rules(text):
    case_converted = False
    for pattern, replacement in RULES:
        new_text = pattern.sub(replacement, text)
        # A non-empty replacement means a snake_case rule fired
        if replacement and new_text != text:
            case_converted = True
        text = new_text
    # Lowercase only when the snake_case rules actually changed something,
    # so cleaned URLs keep their original casing
    return text.lower() if case_converted else text

if __name__ == "__main__":
    recent_value = pyperclip.paste()
    print("Clipboard Engine running... Press Ctrl+C to stop.")
    while True:
        try:
            current_value = pyperclip.paste()
            if current_value != recent_value:
                transformed_value = apply_rules(current_value)
                if transformed_value != current_value:
                    print(f"Clipboard updated. Original: '{current_value[:30]}...' -> New: '{transformed_value[:30]}...'")
                    pyperclip.copy(transformed_value)
                recent_value = pyperclip.paste()  # Update with the potentially new value
            time.sleep(0.5)
        except KeyboardInterrupt:
            break
This script uses the awesome pyperclip library (you might need pip install pyperclip) to access the system clipboard. It runs an infinite loop, checking the clipboard every 0.5 seconds.
The RULES list holds pairs of (RegEx_Pattern, Replacement_String). The magic happens when we find a new value: we apply the transformations, and if the output is different, we write it back to the clipboard. The snake_case rules also lowercase the result, but only when they actually fire, so cleaned URLs keep their original casing. I use the URL cleaning rule daily when sharing links; it keeps URLs short and clean!
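Before letting it loose on your clipboard, you can sanity-check the rules by calling apply_rules directly (say, temporarily at the bottom of the script). The URL and class name below are just made-up examples:

# Made-up examples run through apply_rules:
print(apply_rules("https://shop.example.com/item?id=42&utm_source=newsletter"))
# -> https://shop.example.com/item?id=42
print(apply_rules("DataPipelineManager"))
# -> data_pipeline_manager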
Copying a very large file (e.g., a 10GB dataset or log file) using standard Python I/O (.read()/.write()) is slow because the data gets copied twice: first from disk into the OS kernel buffer, and then from the kernel buffer into the Python user-space buffer.
The mmap module allows us to map a file directly into the process's memory space. This drastically reduces the overhead, effectively creating a "zero-copy" operation for copying data streams, which is a huge win for massive files.
import mmap
import os
import shutil

def zero_copy_transfer(source_path, dest_path):
    """Copies a file using memory mapping for speed."""
    if not os.path.exists(source_path):
        print(f"Error: Source file not found at {source_path}")
        return

    # Use shutil.copyfile if the file is small (mmap overhead isn't worth it)
    file_size = os.path.getsize(source_path)
    if file_size < 1024 * 1024 * 5:  # 5 MB threshold
        shutil.copyfile(source_path, dest_path)
        print(f"Standard copy used for small file: {source_path}")
        return

    try:
        with open(source_path, 'rb') as f_in, open(dest_path, 'wb') as f_out:
            # Resize the output file to the same size as the input file
            f_out.truncate(file_size)
            # Map the output file into memory
            with mmap.mmap(f_out.fileno(), file_size, access=mmap.ACCESS_WRITE) as m:
                # Stream the source into the memory-mapped destination in chunks
                # so we never hold the whole file in a Python buffer
                chunk_size = 64 * 1024 * 1024  # 64 MB
                offset = 0
                while offset < file_size:
                    chunk = f_in.read(chunk_size)
                    if not chunk:
                        break
                    m[offset:offset + len(chunk)] = chunk
                    offset += len(chunk)
                m.flush()
        print(f"Memory-mapped transfer complete: {source_path} -> {dest_path}")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    # Replace these paths with your large file test
    SOURCE = 'large_log_file.txt'
    DEST = 'large_log_file_copy.txt'
    # NOTE: You need a large file at SOURCE path to test this effectively
    # You might want to create a dummy 100MB file first!
    # zero_copy_transfer(SOURCE, DEST)
    print("Script ready. Replace file names and uncomment 'zero_copy_transfer' to run.")
We open both source and destination files. The key step is f_out.truncate(file_size), which pre-allocates the space for the destination file. Then mmap.mmap(f_out.fileno(), file_size, access=mmap.ACCESS_WRITE) maps the output file into our process's virtual memory. Instead of reading the whole source into one giant Python buffer, we stream it in 64 MB chunks straight into the mapped region and let the OS page the data out to disk behind the scenes. That keeps memory usage flat and stays fast even for huge files. This is essential for moving large database dumps or video assets.
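If you don't have a big file lying around to test with, a quick way to generate one is to write random bytes to disk. This little helper is just a throwaway sketch; the default path matches the SOURCE placeholder above and the size is arbitrary:

import os

# Hypothetical helper: create a ~100 MB dummy file for testing the transfer
def make_dummy_file(path='large_log_file.txt', size_mb=100):
    with open(path, 'wb') as f:
        for _ in range(size_mb):
            f.write(os.urandom(1024 * 1024))  # 1 MB of random bytes per iteration

# make_dummy_file()  # Uncomment to generate the test file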
Traditional backups waste disk space by creating a full copy every time. But if you only use simple syncing (like rsync), you lose the version history. We need the space efficiency of syncing and the safety of versioning.
This script creates daily snapshots using hardlinks. A hardlink is a second entry in the file system for the same file data. When the script runs, it creates a new snapshot folder. It hardlinks the files that haven't changed since the last backup (zero space used!) and physically copies only the files that have changed.
import os
import datetime
import shutil

SOURCE_DIR = '/path/to/your/important/data'   # <-- CHANGE THIS
BACKUP_ROOT = '/path/to/your/backup/drive'    # <-- CHANGE THIS

def get_last_snapshot(root_dir):
    """Finds the path to the most recent daily snapshot."""
    snapshots = [d for d in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, d))]
    if not snapshots:
        return None
    snapshots.sort(reverse=True)
    return os.path.join(root_dir, snapshots[0])

def run_hardlink_backup():
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S")
    new_snapshot_dir = os.path.join(BACKUP_ROOT, timestamp)
    last_snapshot_dir = get_last_snapshot(BACKUP_ROOT)
    print(f"Creating new snapshot: {new_snapshot_dir}")
    os.makedirs(new_snapshot_dir, exist_ok=True)

    # 1. Hardlink all unchanged files from the last snapshot
    if last_snapshot_dir:
        for root, _, files in os.walk(last_snapshot_dir):
            relative_path = os.path.relpath(root, last_snapshot_dir)
            target_root = os.path.join(new_snapshot_dir, relative_path)
            os.makedirs(target_root, exist_ok=True)
            for file in files:
                src_file = os.path.join(root, file)
                dest_file = os.path.join(target_root, file)
                try:
                    # Skip if the source file doesn't exist anymore
                    if os.path.exists(os.path.join(SOURCE_DIR, relative_path, file)):
                        os.link(src_file, dest_file)
                except Exception:
                    pass  # Ignore errors for now

    # 2. Copy/Overwrite changed files from the SOURCE_DIR
    for root, dirs, files in os.walk(SOURCE_DIR):
        relative_path = os.path.relpath(root, SOURCE_DIR)
        target_root = os.path.join(new_snapshot_dir, relative_path)
        os.makedirs(target_root, exist_ok=True)
        for file in files:
            src_file = os.path.join(root, file)
            dest_file = os.path.join(target_root, file)
            # Check if the file has changed (simple check: modification time comparison)
            if not last_snapshot_dir or \
               not os.path.exists(os.path.join(last_snapshot_dir, relative_path, file)) or \
               os.path.getmtime(src_file) > os.path.getmtime(os.path.join(last_snapshot_dir, relative_path, file)):
                print(f"Copying (changed/new): {file}")
                # Remove the hardlink first so we don't write through the shared
                # inode and silently modify the previous snapshot
                if os.path.exists(dest_file):
                    os.remove(dest_file)
                shutil.copy2(src_file, dest_file)  # copy2 preserves metadata
            # Else: file already hardlinked from step 1

    print("Backup finished.")

if __name__ == "__main__":
    # IMPORTANT: Update SOURCE_DIR and BACKUP_ROOT paths
    # run_hardlink_backup()
    print("Script ready. Please update paths and uncomment 'run_hardlink_backup' to run.")
This script relies on os.link(), which creates a hardlink. Hardlinks only work within the same disk/partition. The core logic is:
Find the path of the last backup (get_last_snapshot).
Create the new snapshot folder.
Walk the last snapshot and hardlink everything to the new snapshot. (This is fast and uses no extra space.)
Walk the source data. If a file is new or has a newer modification time (os.path.getmtime), remove the hardlink in the new snapshot and copy in a fresh version (unlinking first means the previous snapshot's data is never touched).
This is a fantastic automation to run via Cron or Windows Task Scheduler for true, space-efficient, versioned local backups.
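If you want to convince yourself that unchanged files really do share storage, compare inode numbers for the same file across two snapshots. The paths below follow the BACKUP_ROOT placeholder above and are purely illustrative:

import os

# Illustrative paths: the same file inside two different snapshot folders
a = os.stat('/path/to/your/backup/drive/2025-11-12_020000/notes.txt')
b = os.stat('/path/to/your/backup/drive/2025-11-13_020000/notes.txt')

# Hardlinked copies share one inode, so the data only occupies disk space once
print("Same inode:", a.st_ino == b.st_ino)
print("Link count:", a.st_nlink)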
If you manage a documentation site, a large README, or a local knowledge base built with Markdown, broken internal or external links are a constant headache. Checking them by hand is impossible.
This script scans all Markdown files in a directory, extracts all [text](link) patterns, and then checks the validity of each link. It uses standard file checks for internal links and the requests library for external URLs.
import os
import re
import requests

DOCS_DIR = './docs'   # <-- CHANGE THIS to your docs folder

def check_link(link, base_path):
    """Checks if a link is valid (local file or external URL)."""
    if link.startswith('http'):
        # External link check
        try:
            response = requests.head(link, timeout=5, allow_redirects=True)
            if response.status_code >= 400:
                return f"External link broken: Status {response.status_code}"
            return "OK"
        except requests.RequestException as e:
            return f"External link failed: {e.__class__.__name__}"
    elif link.startswith('#'):
        # Anchor links are tricky to validate robustly, skipping for simplicity
        return "Anchor (SKIPPED)"
    else:
        # Local file check (drop any #section fragment first)
        link = link.split('#')[0]
        full_path = os.path.join(os.path.dirname(base_path), link)
        if not os.path.exists(full_path):
            # Check if it's an absolute path from the DOCS_DIR root
            if not os.path.exists(os.path.join(DOCS_DIR, link.lstrip('/'))):
                return f"Local file missing: {full_path}"
        return "OK"

def find_and_check_links():
    # RegEx to find Markdown links: [text](link)
    LINK_REGEX = re.compile(r'\[.*?\]\((.*?)\)')
    print(f"Scanning for links in {DOCS_DIR}...")
    broken_links = 0
    for root, _, files in os.walk(DOCS_DIR):
        for file_name in files:
            if file_name.endswith('.md'):
                file_path = os.path.join(root, file_name)
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                for match in LINK_REGEX.finditer(content):
                    target = match.group(1).strip()
                    if not target:
                        continue  # Empty link: []()
                    link = target.split()[0]  # Take the link, ignore an optional title
                    result = check_link(link, file_path)
                    if result != "OK" and "SKIPPED" not in result:
                        broken_links += 1
                        print(f"[🚨 BROKEN] File: {file_path}, Link: {link}, Reason: {result}")
    if broken_links == 0:
        print("\n✅ All links seem fine!")
    else:
        print(f"\n❌ Found {broken_links} broken links.")

if __name__ == "__main__":
    # You need the 'requests' library: pip install requests
    # find_and_check_links()
    print("Script ready. Update DOCS_DIR and uncomment 'find_and_check_links' to run.")
This script requires the requests library. We use os.walk to find all .md files. The RegEx r'\[.*?\]\((.*?)\)' grabs the URL within the parentheses. The check_link function distinguishes between local file paths (using os.path.exists) and external URLs (using requests.head to check the status code without downloading the entire page). Running this before every documentation deployment is a massive quality-of-life improvement.
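As a quick sanity check of the pattern the script uses, here is what the capture group returns for a throwaway Markdown string, including a link with an optional title:

import re

LINK_REGEX = re.compile(r'\[.*?\]\((.*?)\)')

sample = 'See the [setup guide](docs/setup.md "Install steps") and [home](https://example.com).'
for match in LINK_REGEX.finditer(sample):
    raw = match.group(1)     # e.g. 'docs/setup.md "Install steps"'
    link = raw.split()[0]    # keep only the URL part
    print(link)
# Prints: docs/setup.md  then  https://example.com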
Polling your mailbox every few minutes to check for urgent emails is inefficient and slow. For true real-time email triage without constantly checking a browser tab, we need a better method.
The IMAP IDLE command allows a client (our Python script) to instruct the server (like Gmail) to notify it immediately when a change (like a new email) occurs in a monitored folder. This is a low-latency, low-resource way to check for highly specific, urgent emails (e.g., "Deployment Failed").
import imaplib
import re

# --- CONFIGURATION ---
IMAP_SERVER = 'imap.gmail.com'
EMAIL_ADDRESS = 'your_email@gmail.com'   # <-- CHANGE THIS
PASSWORD = 'your_app_password'           # <-- CHANGE THIS (Use an App Password for Gmail)
MONITOR_FOLDER = 'INBOX'
URGENT_REGEX = re.compile(r'(failed|alert|critical|deploy|error)', re.IGNORECASE)
IDLE_TAG = b'A001'
# ---------------------

def get_subject(mail, msg_id):
    """Fetches just the Subject header of a message."""
    status, data = mail.fetch(msg_id, '(BODY.PEEK[HEADER.FIELDS (SUBJECT)])')
    if status == 'OK' and data and isinstance(data[0], tuple):
        header = data[0][1].decode('utf-8', errors='replace')
        match = re.search(r'Subject:\s*(.*)', header, re.IGNORECASE)
        if match:
            return match.group(1).strip()
    return "[Subject Not Found]"

def imap_idle_triage():
    mail = None
    try:
        # Connect and log in
        mail = imaplib.IMAP4_SSL(IMAP_SERVER)
        mail.login(EMAIL_ADDRESS, PASSWORD)
        mail.select(MONITOR_FOLDER)
        print(f"Logged in and monitoring {MONITOR_FOLDER} using IDLE...")
        while True:
            # imaplib has no native IDLE support, so we send the raw command ourselves.
            # (Servers drop idle sessions after ~29 minutes, so production code should
            # re-issue IDLE periodically.)
            mail.send(IDLE_TAG + b' IDLE\r\n')
            response = mail.readline()   # Expect the continuation: '+ idling'
            if not response.startswith(b'+'):
                print(f"Server refused IDLE: {response}")
                break

            # Block here until the server pushes an untagged update, e.g. '* 23 EXISTS'
            update = mail.readline()

            # End the IDLE session and consume lines up to the tagged completion
            mail.send(b'DONE\r\n')
            while True:
                line = mail.readline()
                if not line or line.startswith(IDLE_TAG):
                    break

            if b'EXISTS' in update or b'RECENT' in update:
                print("\nNew mail activity detected!")
                status, messages = mail.search(None, 'UNSEEN')
                if status == 'OK':
                    for msg_id in messages[0].split():
                        subject = get_subject(mail, msg_id)
                        if URGENT_REGEX.search(subject):
                            print(f"🚨 URGENT EMAIL: {subject}")
                            # Add code here to trigger a system notification or a text message
                        else:
                            print(f"New email: {subject}")
    except Exception as e:
        print(f"IMAP IDLE error: {e}")
    finally:
        try:
            if mail is not None:
                mail.logout()
        except Exception:
            pass

if __name__ == "__main__":
    # IMPORTANT: Update the configuration and use an App Password for secure login
    # imap_idle_triage()
    print("Script ready. Update config and uncomment 'imap_idle_triage' to run.")
This is a more advanced script. You must use an App Password (not your regular account password) for services like Gmail. Since imaplib has no built-in IDLE support, we send the raw IDLE command ourselves: the server first answers with a "+ idling" continuation, and the script then blocks on mail.readline() until the server pushes an untagged update such as "* 23 EXISTS". When that happens, we end the IDLE session with DONE, search for UNSEEN messages, fetch just the Subject header of each, and check it against URGENT_REGEX. This is priceless for DevOps engineers or anyone needing immediate alerts on server failures or critical system health checks.
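The "trigger a system notification" comment is where you plug in whatever alerting you like. As one hedged example, assuming a Linux desktop where the notify-send command is available (swap in your own tool on macOS or Windows), a popup could look like this:

import subprocess

def desktop_alert(subject):
    # Assumes notify-send is installed (standard on most Linux desktops);
    # replace with your own alerting mechanism on other platforms
    subprocess.run(["notify-send", "-u", "critical", "Urgent email", subject], check=False)

# Inside the urgent branch of imap_idle_triage():
# desktop_alert(subject)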
Screenshots contain valuable text (error messages, code snippets, meeting notes) that is inaccessible to your system's search bar. Finding that one old screenshot is a nightmare.
This script processes a directory of screenshots, uses an Optical Character Recognition (OCR) tool (like Tesseract, via the pytesseract library) to extract all text, and stores the text in a simple SQLite database alongside the file path. You can then query the database for any text found in any of your images.
import os
import sqlite3
from PIL import Image
import pytesseract  # pip install pytesseract, and install the Tesseract-OCR engine separately!

SCREENSHOTS_DIR = './screenshots'   # <-- CHANGE THIS
DB_PATH = 'ocr_index.db'
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')

def create_db(conn):
    cursor = conn.cursor()
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS ocr_data (
            filepath TEXT PRIMARY KEY,
            ocr_text TEXT,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()

def process_screenshots():
    conn = sqlite3.connect(DB_PATH)
    create_db(conn)
    cursor = conn.cursor()
    for root, _, files in os.walk(SCREENSHOTS_DIR):
        for file_name in files:
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                file_path = os.path.join(root, file_name)
                # Check if already indexed
                cursor.execute("SELECT filepath FROM ocr_data WHERE filepath = ?", (file_path,))
                if cursor.fetchone():
                    continue
                try:
                    # OCR process
                    text = pytesseract.image_to_string(Image.open(file_path))
                    # Store data
                    cursor.execute("INSERT INTO ocr_data (filepath, ocr_text) VALUES (?, ?)",
                                   (file_path, text))
                    conn.commit()
                    print(f"Indexed: {file_name}")
                except Exception as e:
                    print(f"Could not process {file_name}: {e}")
    conn.close()

def search_screenshots(query):
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    # Use LIKE for simple full-text search
    cursor.execute("SELECT filepath, ocr_text FROM ocr_data WHERE ocr_text LIKE ?", (f'%{query}%',))
    results = cursor.fetchall()
    conn.close()
    if results:
        print(f"\n--- Found {len(results)} matches for '{query}' ---")
        for filepath, text in results:
            print(f"Match in: {filepath}")
            # print(f"Snippet: {text[:200]}...")  # Optional: print a snippet
    else:
        print(f"\nNo matches found for '{query}'.")

if __name__ == "__main__":
    # You need pytesseract and the Tesseract engine installed
    # process_screenshots()
    # search_screenshots("your search term here")
    print("Script ready. Install pytesseract/Tesseract, update config, then run the process/search functions.")
This script needs pytesseract and the underlying Tesseract OCR engine (must be installed separately on your OS). We use sqlite3 to create a lightweight, local database. The process_screenshots function iterates over the image files, runs pytesseract.image_to_string(), and inserts the result into the DB. The search_screenshots function then allows you to query the indexed text. This is a life-saver for finding old error traces or configuration details hidden in images.
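One gotcha: if Tesseract isn't on your PATH after installing it (common on Windows), pytesseract lets you point it at the binary explicitly. The path below is only an example; adjust it to wherever your installer put tesseract:

import pytesseract

# Example path only: point pytesseract at the Tesseract binary if it isn't on PATH
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'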
When reviewing legacy code or cleaning up a massive Python file, you often wonder which imported modules are actually used by the logic. Commenting out imports until the code breaks is frustrating.
The ast (Abstract Syntax Tree) module allows Python to inspect its own structure. This script reads a Python file, parses its AST, collects every imported name (including from X import Y), and reports the ones that are never referenced anywhere in the code body.
import ast
import sys
from collections import defaultdict

def analyze_python_file(filepath):
    """Analyzes a Python file to track imported modules and their usage."""
    with open(filepath, 'r') as f:
        tree = ast.parse(f.read())

    imported_names = {}   # {alias: module_name or name}
    used_names = set()

    # 1. Collect all imported names
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # For 'import os.path', the usable name is 'os'
                imported_names[alias.asname or alias.name.split('.')[0]] = alias.name
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                for alias in node.names:
                    imported_names[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    # 2. Find all names used in the code
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, (ast.Load, ast.Store, ast.Del)):
            used_names.add(node.id)

    # 3. Compare and report
    unused_imports = {}
    for alias, module_name in imported_names.items():
        # Check if the alias (or the name itself) was ever used
        if alias not in used_names:
            unused_imports[alias] = module_name

    print(f"--- Unused Imports in {filepath} ---")
    if unused_imports:
        for alias, module_name in unused_imports.items():
            print(f"Potential unused import: {module_name} (as {alias})")
    else:
        print("✅ Clean: No immediately obvious unused imports found.")
    return unused_imports

if __name__ == "__main__":
    # Example usage: pass a Python file path
    # If you run this script on itself, it will find its own unused imports (if any)
    if len(sys.argv) < 2:
        print(f"Usage: python {sys.argv[0]} <path/to/file.py>")
    else:
        filepath = sys.argv[1]
        # analyze_python_file(filepath)
        print("Script ready. Pass a Python file path as an argument and uncomment 'analyze_python_file' to run.")
This is a sophisticated static analysis tool. The ast module creates a tree representation of the code. We first walk the tree and populate imported_names with every alias and module. We then walk the tree again to find every ast.Name node that is being Loaded (i.e., used). By comparing the two sets (imported_names vs. used_names), we can identify modules that were imported but whose names never appeared in the code body. This is a powerful, dependency-free way to clean up large files and improve cold-start times.
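To see the two-pass idea in isolation, here is a tiny throwaway demonstration on an inline code string (the snippet being analyzed is made up purely for illustration):

import ast

code = """
import os
import json          # never used below
from pathlib import Path

print(os.getcwd(), Path('.').resolve())
"""

tree = ast.parse(code)
imported = {a.asname or a.name.split('.')[0]
            for node in ast.walk(tree)
            if isinstance(node, (ast.Import, ast.ImportFrom))
            for a in node.names}
used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
print("Unused:", imported - used)   # -> {'json'}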
Sometimes you need to analyze when you (or your team) are most productive, or spot when commit activity drops off abruptly. The standard git log output is far too verbose for this.
This script uses the subprocess module to run git log, extracts the commit timestamps, and generates a simple, text-based heatmap showing commit activity broken down by hour of the day and day of the week.
import subprocess
import datetime
from collections import defaultdict

def create_commit_heatmap(repo_path='.'):
    """Generates a text-based heatmap of commit times."""
    # Run git log and format the output as raw timestamps
    try:
        command = [
            'git', 'log', '--all',
            '--pretty=format:%at'   # %at is the author timestamp (seconds since epoch)
        ]
        result = subprocess.run(
            command,
            cwd=repo_path,
            capture_output=True,
            text=True,
            check=True
        )
        timestamps = result.stdout.strip().split('\n')
    except subprocess.CalledProcessError as e:
        print(f"Error running git: {e.stderr.strip()}")
        return
    except FileNotFoundError:
        print("Error: Git executable not found.")
        return

    # Initialize heatmap: [Day of Week][Hour of Day]
    heatmap = defaultdict(lambda: defaultdict(int))
    for ts in timestamps:
        try:
            timestamp = int(ts)
            dt_object = datetime.datetime.fromtimestamp(timestamp)
            day_of_week = dt_object.weekday()   # Monday is 0, Sunday is 6
            hour_of_day = dt_object.hour
            heatmap[day_of_week][hour_of_day] += 1
        except ValueError:
            continue

    # Visualization: each hour gets a 3-character column so the header lines up
    DAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    print("\n--- Git Commit Activity Heatmap (Commits Per Hour/Day) ---")
    print("      " + "".join(f" {h:02d}" for h in range(24)))
    print("      " + "-" * (3 * 24))
    for d_index, day in enumerate(DAYS):
        row = f"{day} | "
        for h_index in range(24):
            count = heatmap[d_index][h_index]
            # Use simple characters for visualization
            if count == 0:
                char = '.'
            elif count < 5:
                char = '░'
            elif count < 15:
                char = '▒'
            else:
                char = '▓'
            row += f"  {char}"
        print(row)

if __name__ == "__main__":
    # create_commit_heatmap('.')   # Runs on the current directory if it's a Git repo
    print("Script ready. Uncomment 'create_commit_heatmap' in a Git repository to run.")
We use subprocess to execute git log --pretty=format:%at, which is the fastest way to get raw timestamps. The output is parsed into a datetime object. We then use a defaultdict to count commits by day (weekday() 0-6) and hour (hour 0-23). The final print loop iterates through the structure, using simple Unicode block characters (░, ▒, ▓) to create a visual heatmap. I love using this to quickly diagnose burnout or spot patterns in team collaboration.
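If you only want your own commits, or just recent history, git log accepts the usual filters, so you can extend the command list inside create_commit_heatmap before running it. Something like this, where the email and date are placeholders:

command = [
    'git', 'log', '--all',
    '--author=you@example.com',   # placeholder: filter to one author
    '--since=3 months ago',       # placeholder: limit the time window
    '--pretty=format:%at',
]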
Daily logs, meeting notes, or research journals grow quickly into thousands of lines. Finding the key takeaways or action items from last week's notes is a huge effort.
This script uses the gensim library (specifically the TextRank algorithm) for unsupervised text summarization. It takes a collection of text files (or a single large log file) and generates a concise, fixed-length summary of the content.
import os
# NOTE: gensim's summarization module was removed in gensim 4.0,
# so this import requires gensim 3.x (pip install "gensim<4.0")
from gensim.summarization import summarize

LOG_DIR = './daily_logs'   # <-- CHANGE THIS
SUMMARY_LENGTH = 0.3       # 30% of the original text length

def summarize_logs():
    all_text = []
    # Load all text files in the log directory
    for root, _, files in os.walk(LOG_DIR):
        for file_name in files:
            if file_name.endswith(('.txt', '.log')):
                file_path = os.path.join(root, file_name)
                try:
                    with open(file_path, 'r', encoding='utf-8') as f:
                        all_text.append(f.read())
                except Exception:
                    continue

    # Combine text for holistic summarization
    combined_text = "\n\n".join(all_text)
    if not combined_text.strip():
        print("No content found in logs to summarize.")
        return

    try:
        # Use the TextRank algorithm for summarization
        summary = summarize(combined_text, ratio=SUMMARY_LENGTH)
        print("\n--- Daily Log Summary ---")
        print(summary)
        print("-" * 30)
        # Optional: save the summary
        with open('daily_summary.txt', 'w', encoding='utf-8') as f:
            f.write(summary)
        print("Summary saved to daily_summary.txt")
    except Exception as e:
        print(f"Summarization failed (check gensim/TextRank requirements): {e}")

if __name__ == "__main__":
    # You need gensim 3.x: pip install "gensim<4.0"
    # summarize_logs()
    print("Script ready. Install gensim 3.x, update LOG_DIR, and uncomment 'summarize_logs' to run.")
This relies on gensim's summarization module, which was removed in gensim 4.0, so make sure you install a 3.x release (pip install "gensim<4.0"). The summarize function implements the TextRank algorithm, which identifies the most important sentences based on graph theory (sentences that share many common words are ranked higher). By setting ratio=0.3, we ask the script to return a summary that is 30% of the original text's length. I use this every Friday to quickly review my week's work notes and compile my status report.
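If you'd rather not pin an old gensim, a rough, dependency-free stand-in is naive word-frequency scoring. This is not TextRank, just a quick sketch I'm offering as an alternative, and the file path in the comment is a placeholder:

import re
from collections import Counter

def naive_summary(text, ratio=0.3):
    # Split into sentences and score each by the frequency of its words
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+|\n+', text) if s.strip()]
    words = re.findall(r'[a-zA-Z]{3,}', text.lower())
    freq = Counter(words)
    scored = [(sum(freq[w] for w in re.findall(r'[a-zA-Z]{3,}', s.lower())), i, s)
              for i, s in enumerate(sentences)]
    keep = max(1, int(len(sentences) * ratio))
    # Keep the highest-scoring sentences, then restore their original order
    top = sorted(sorted(scored, reverse=True)[:keep], key=lambda t: t[1])
    return "\n".join(s for _, _, s in top)

# Example (placeholder path): print(naive_summary(open('notes.txt').read()))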
Splitting a 500-page bank statement or a large technical manual into individual chapters is a manual drag. Merging several small reports into one indexed PDF is also a chore.
The PyPDF2 library (or the newer pypdf) is a workhorse for PDF manipulation. This script provides functions to split a PDF based on page ranges and merge multiple PDFs into one, automatically adding a Table of Contents (TOC) using bookmarks if required.
from PyPDF2 import PdfReader, PdfWriter   # pip install PyPDF2
import os

def pdf_split(input_path, output_dir, page_ranges):
    """Splits a PDF based on a dictionary of {title: [start_page, end_page]}."""
    reader = PdfReader(input_path)
    os.makedirs(output_dir, exist_ok=True)
    for title, (start, end) in page_ranges.items():
        writer = PdfWriter()
        # Pages are 0-indexed, so we use start-1 to end
        for page_num in range(start - 1, end):
            writer.add_page(reader.pages[page_num])
        output_path = os.path.join(output_dir, f"{title}.pdf")
        with open(output_path, "wb") as output_file:
            writer.write(output_file)
        print(f"Created: {output_path}")

def pdf_merge(file_list, output_path):
    """Merges a list of PDFs and adds bookmarks for indexing."""
    merger = PdfWriter()
    for path in file_list:
        title = os.path.basename(path).replace(".pdf", "")
        # Get the page number where this file starts
        page_start = len(merger.pages)
        merger.append(path)
        # Add a bookmark pointing to the start of the merged file
        merger.add_outline_item(title, page_start)
    with open(output_path, "wb") as output_file:
        merger.write(output_file)
    print(f"Merged and indexed PDF created: {output_path}")

if __name__ == "__main__":
    # You need PyPDF2: pip install PyPDF2

    # --- SPLIT EXAMPLE ---
    # INPUT_PDF = 'large_manual.pdf'   # <-- CHANGE THIS
    # RANGES = {
    #     "Chapter 1 - Intro": [1, 10],
    #     "Chapter 2 - Setup": [11, 25]
    # }
    # pdf_split(INPUT_PDF, './split_output', RANGES)

    # --- MERGE EXAMPLE ---
    # MERGE_LIST = ['./report_a.pdf', './report_b.pdf', './report_c.pdf']   # <-- CHANGE THIS
    # pdf_merge(MERGE_LIST, 'final_combined_report.pdf')

    print("Script ready. Install PyPDF2, update file paths, and uncomment the desired function to run.")
The PdfReader and PdfWriter objects from PyPDF2 do the heavy lifting. In pdf_split, we loop through the page indices and add them to a new PdfWriter. In pdf_merge, we use merger.append(path) to add content, and the key trick is using merger.add_outline_item(title, page_start) which creates a bookmark (TOC entry) in the resulting PDF at the page number where the new file began. This is invaluable for finance, legal, or documentation teams.
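Since PyPDF2 has effectively been superseded by the newer pypdf package, you may only have one of the two installed. A small import fallback keeps the script working either way, as both expose the same PdfReader/PdfWriter names:

try:
    from pypdf import PdfReader, PdfWriter     # newer package name
except ImportError:
    from PyPDF2 import PdfReader, PdfWriter    # older package name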
Checking CPU load, memory usage, disk activity, or network bandwidth often requires opening three different system monitors, which distracts you from your main work.
The psutil library provides an excellent cross-platform way to access system metrics. This script runs a simple, real-time dashboard in the terminal, giving you key metrics at a glance.
import psutil   # pip install psutil
import time
import os
import sys

def clear_screen():
    """Clears the terminal screen."""
    os.system('cls' if os.name == 'nt' else 'clear')

def get_sys_metrics():
    """Fetches core system metrics using psutil."""
    cpu_percent = psutil.cpu_percent(interval=None)   # Non-blocking call
    mem_info = psutil.virtual_memory()
    disk_usage = psutil.disk_usage('/')   # Use '/' for Linux/macOS, or 'C:\\' for Windows
    net_io = psutil.net_io_counters()
    return {
        'CPU': f"{cpu_percent:0.1f}%",
        'Memory': f"{mem_info.percent:0.1f}% ({mem_info.used / (1024**3):.2f} GB used)",
        'Disk': f"{disk_usage.percent}% ({disk_usage.free / (1024**3):.2f} GB free)",
        'Net_Sent': f"{net_io.bytes_sent / (1024**2):.2f} MB",
        'Net_Recv': f"{net_io.bytes_recv / (1024**2):.2f} MB",
    }

def run_dashboard(interval=1.0):
    clear_screen()
    print("System Resource Dashboard (Press Ctrl+C to stop)")
    print("-" * 40)
    # Prime the CPU counter so the first reading in the loop is meaningful
    psutil.cpu_percent(interval=None)
    lines_printed = 0
    try:
        while True:
            metrics = get_sys_metrics()
            # Move the cursor back up over the previous readings and clear them,
            # so the dashboard refreshes in place (works in ANSI-capable terminals)
            sys.stdout.write('\033[F\033[K' * lines_printed)
            print(f"Time:      {time.strftime('%H:%M:%S')}")
            print(f"CPU Usage: {metrics['CPU']}")
            print(f"Memory:    {metrics['Memory']}")
            print(f"Disk Root: {metrics['Disk']}")
            print(f"Net Sent:  {metrics['Net_Sent']}")
            print(f"Net Recv:  {metrics['Net_Recv']}")
            lines_printed = 6
            time.sleep(interval)
    except KeyboardInterrupt:
        print("\nDashboard stopped.")

if __name__ == "__main__":
    # You need psutil: pip install psutil
    # run_dashboard(interval=2)
    print("Script ready. Install psutil and uncomment 'run_dashboard' to run the real-time terminal monitor.")
psutil abstracts away the OS-specific commands. The key is calling psutil.cpu_percent(interval=None) once before the loop to prime it, and then inside the loop, calling it again. It uses the difference between the two calls to get an accurate CPU usage percentage for that interval. The sys.stdout.write('\033[F') command is a neat trick to refresh the output in place in the terminal, creating a true real-time dashboard effect. I keep this running on my secondary monitor during build processes or stress tests.
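If you want to know who is eating the RAM rather than just how much is gone, psutil can also walk the process table. A quick sketch that lists the top five processes by resident memory:

import psutil

# Collect (rss, pid, name) for every accessible process
procs = []
for p in psutil.process_iter(['pid', 'name', 'memory_info']):
    mem = p.info['memory_info']
    if mem is not None:
        procs.append((mem.rss, p.info['pid'], p.info['name']))

# Print the five biggest memory consumers
for rss, pid, name in sorted(procs, reverse=True)[:5]:
    print(f"{name} (pid {pid}): {rss / (1024**2):.1f} MB")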
You want to run a simple Python script (like "Clean Temp Files" or "Trigger Backup") from any device on your local network (e.g., your phone or a separate machine), or trigger it easily via a desktop shortcut.
We can use a super lightweight web framework like Flask to expose our simple automation functions as a local API endpoint. When you visit http://localhost:5000/trigger_cleanup, the Python function runs.
from flask import Flask, jsonify   # pip install flask
import os
import time

app = Flask(__name__)

# --- Simple Automation Function ---
def cleanup_temp_files():
    """A placeholder for your actual cleanup automation."""
    # Example: delete files in a temp folder older than 1 day
    TEMP_DIR = './temp_storage'
    if not os.path.exists(TEMP_DIR):
        os.makedirs(TEMP_DIR)
        return 0
    deleted_count = 0
    now = time.time()
    for filename in os.listdir(TEMP_DIR):
        filepath = os.path.join(TEMP_DIR, filename)
        if os.path.isfile(filepath) and now - os.stat(filepath).st_mtime > (24 * 3600):
            os.remove(filepath)
            deleted_count += 1
    return deleted_count

# --- API Endpoint ---
@app.route('/trigger_cleanup', methods=['POST', 'GET'])
def trigger_cleanup():
    deleted = cleanup_temp_files()
    if deleted > 0:
        message = f"Cleanup successful. Deleted {deleted} old temporary files."
    else:
        message = "Cleanup ran. No old files found or the temp directory was empty."
    # Return a JSON response
    return jsonify({"status": "success", "message": message, "files_deleted": deleted})

@app.route('/', methods=['GET'])
def index():
    return "Automation Hub Running. Visit /trigger_cleanup to run the task."

if __name__ == "__main__":
    # Run the server on the local network IP (0.0.0.0) so other devices can access it.
    # Set host='127.0.0.1' if you only want local access.
    # You need Flask: pip install Flask
    # app.run(host='0.0.0.0', port=5000, debug=False)
    print("Script ready. Install Flask, uncomment 'app.run', and access http://127.0.0.1:5000 in your browser.")
This script requires Flask. We define a function cleanup_temp_files (where you'd put any short automation logic). We then use the @app.route decorator to link this function to a web URL /trigger_cleanup. When someone hits this URL, the function executes, and the server returns a clear JSON response. This is my favorite "lazy button" for tasks. I have this running on a headless Raspberry Pi, and I simply hit the URL from my phone to run a full backup.
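Triggering it from another machine is just an HTTP request. For instance, from any device on the LAN (the IP below is a placeholder for whichever host runs the hub):

import requests

# Placeholder address: replace 192.168.1.50 with the machine running the Flask hub
resp = requests.post("http://192.168.1.50:5000/trigger_cleanup", timeout=10)
print(resp.json())   # e.g. {'status': 'success', 'message': '...', 'files_deleted': 3}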
These scripts are more than just Python code; they are little pockets of focused time-saving. By moving beyond the basics and tackling advanced, real-world annoyances like IMAP IDLE and AST parsing, you truly harness the power Python offers to us as developers.
The best automation isn't one you run once—it's the one you forget about because it's running silently in the background, making your workday smoother.
Now, take one of these, customize the configuration paths, and set it up on a scheduler (Cron, Task Scheduler, or even our Flask API). I guarantee you'll feel that satisfying click as you eliminate a repetitive task forever. Happy scripting!