Module 8 - File I/O

This module covers file input/output in Python. You'll learn how to read and write files, use the with statement, work with different file formats, and use the modern pathlib module.


1. Opening and Closing Files

1.1 Basic File Operations

# Open file
file = open('example.txt', 'r')

# Read content
content = file.read()
print(content)

# Close file (important!)
file.close()

1.2 File Modes

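| Mode | Description | Creates if Not Exists |
|------|-------------|-----------------------|
| 'r' | Read (default) | No (error) |
| 'w' | Write (overwrites) | Yes |
| 'a' | Append | Yes |
| 'x' | Exclusive create | Yes (error if file exists) |
| 'r+' | Read and write | No (error) |
| 'w+' | Read and write (overwrites) | Yes |
| 'a+' | Read and append | Yes |
| 'rb' | Read binary | No (error) |
| 'wb' | Write binary (overwrites) | Yes |
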
# Read mode
file = open('data.txt', 'r')

# Write mode (overwrites)
file = open('output.txt', 'w')

# Append mode
file = open('log.txt', 'a')

# Binary mode
file = open('image.jpg', 'rb')

Always Close Files

An unclosed file keeps its operating-system handle open, and buffered writes may not reach the disk until the file is closed or flushed. Use the with statement so files are closed automatically.
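
The behaviour of the modes listed in 1.2 can be checked with a short experiment. The following is a minimal sketch, assuming a throwaway directory and a hypothetical file name (demo.txt); it shows that 'a' keeps existing content, 'w' truncates it, and 'x' refuses to open a file that already exists.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    with open(path, 'w') as f:   # 'w' creates the file
        f.write('first\n')

    with open(path, 'a') as f:   # 'a' keeps the existing content
        f.write('second\n')

    with open(path, 'r') as f:
        after_append = f.read()

    with open(path, 'w') as f:   # 'w' truncates before writing
        f.write('replaced\n')

    with open(path, 'r') as f:
        after_overwrite = f.read()

    try:
        open(path, 'x')          # 'x' fails because the file exists
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_overwrite))  # 'replaced\n'
print(exclusive_failed)       # True
```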


2. The with Statement

2.1 Context Manager

# Automatically closes file
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
# File is automatically closed here

# Multiple files
with open('input.txt', 'r') as infile, \
     open('output.txt', 'w') as outfile:
    content = infile.read()
    outfile.write(content.upper())

2.2 Why Use with?

# process() is a placeholder for your own function

# Without with (manual cleanup, easy to get wrong)
file = open('data.txt', 'r')
try:
    data = file.read()
    process(data)
finally:
    file.close()  # Must remember to close

# With with (safe)
with open('data.txt', 'r') as file:
    data = file.read()
    process(data)
# Automatically closed, even if an error occurs

Best Practice

Use the with statement whenever you call open(). It closes the file when the block exits, whether normally or through an exception.
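
To see that the file really is closed after an error, the sketch below raises an exception inside a with block and then inspects the file object's closed attribute. The scratch file and the simulated RuntimeError are assumptions made only for illustration.

```python
import os
import tempfile

# Create a small scratch file to read from (hypothetical content).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('some data\n')

handle = None
try:
    with open(path, 'r') as file:
        handle = file  # keep a reference so we can inspect it later
        raise RuntimeError('simulated failure while processing')
except RuntimeError:
    pass

closed_after_error = handle.closed
print(closed_after_error)  # True: the with block closed the file

os.remove(path)  # clean up the scratch file
```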


3. Reading Files

3.1 Reading Methods

# read() - Read entire file
with open('file.txt', 'r') as file:
    content = file.read()
    print(content)

# readline() - Read one line
with open('file.txt', 'r') as file:
    line1 = file.readline()
    line2 = file.readline()
    print(line1, line2)

# readlines() - Read all lines into list
with open('file.txt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        print(line.strip())

# Iterate over file (memory efficient)
with open('file.txt', 'r') as file:
    for line in file:
        print(line.strip())

3.2 Reading with Size Limit

# Read first 100 characters
with open('large_file.txt', 'r') as file:
    content = file.read(100)
    print(content)

# Read in chunks
with open('large_file.txt', 'r') as file:
    while True:
        chunk = file.read(1024)  # Up to 1024 characters per chunk
        if not chunk:
            break
        process(chunk)

3.3 Handling Encodings

# Specify encoding
with open('file.txt', 'r', encoding='utf-8') as file:
    content = file.read()

# Handle encoding errors
with open('file.txt', 'r', encoding='utf-8', errors='ignore') as file:
    content = file.read()

# Common encodings: utf-8, ascii, latin-1, cp1252
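
The errors argument changes what happens when a byte cannot be decoded. The following sketch, which assumes a scratch file containing Latin-1 bytes, compares errors='ignore' with errors='replace' and with opening the file in its actual encoding. Note that 'ignore' drops data silently, so it is usually a last resort.

```python
import os
import tempfile

# Write bytes that are valid Latin-1 but not valid UTF-8
# (0xE9 is "e with acute accent" in Latin-1).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'caf\xe9 au lait')

with open(path, 'r', encoding='utf-8', errors='ignore') as f:
    ignored = f.read()   # the undecodable byte is dropped

with open(path, 'r', encoding='utf-8', errors='replace') as f:
    replaced = f.read()  # the undecodable byte becomes U+FFFD

with open(path, 'r', encoding='latin-1') as f:
    correct = f.read()   # the right encoding recovers the text

os.remove(path)

# ascii() keeps the output readable on any console.
print(ascii(ignored))   # 'caf au lait'
print(ascii(replaced))  # 'caf\ufffd au lait'
print(ascii(correct))   # 'caf\xe9 au lait'
```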

4. Writing Files

4.1 Writing Methods

# write() - Write string
with open('output.txt', 'w') as file:
    file.write('Hello, World!\n')
    file.write('Second line\n')

# writelines() - Write list of strings (no newlines are added for you)
lines = ['Line 1\n', 'Line 2\n', 'Line 3\n']
with open('output.txt', 'w') as file:
    file.writelines(lines)

# Using print()
with open('output.txt', 'w') as file:
    print('Hello, World!', file=file)
    print('Second line', file=file)

4.2 Append Mode

# Append to existing file
with open('log.txt', 'a') as file:
    file.write('New log entry\n')

# Append with timestamp
from datetime import datetime

with open('log.txt', 'a') as file:
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    file.write(f'[{timestamp}] Event occurred\n')

4.3 Writing Formatted Data

# Format strings
data = [
    ('Alice', 25, 'NYC'),
    ('Bob', 30, 'LA'),
    ('Charlie', 35, 'Chicago')
]

with open('people.txt', 'w') as file:
    for name, age, city in data:
        file.write(f'{name:15} {age:3} {city:15}\n')

# CSV-like format
with open('data.csv', 'w') as file:
    file.write('Name,Age,City\n')
    for name, age, city in data:
        file.write(f'{name},{age},{city}\n')

5. Working with Paths (pathlib)

5.1 Path Objects

from pathlib import Path

# Create path object
path = Path('data/file.txt')
path = Path.home() / 'documents' / 'file.txt'

# Current directory
current = Path.cwd()
print(current)

# Home directory
home = Path.home()
print(home)

# Absolute path
absolute = path.resolve()
print(absolute)

5.2 Path Properties

from pathlib import Path

path = Path('/home/user/documents/report.txt')

print(path.name) # 'report.txt'
print(path.stem) # 'report'
print(path.suffix) # '.txt'
print(path.parent) # '/home/user/documents'
print(path.parts) # ('/', 'home', 'user', 'documents', 'report.txt')
print(path.is_absolute()) # True

# Multiple extensions
path = Path('archive.tar.gz')
print(path.suffixes) # ['.tar', '.gz']

5.3 Path Operations

from pathlib import Path

# Check existence
path = Path('file.txt')
print(path.exists())
print(path.is_file())
print(path.is_dir())

# Create directory
directory = Path('new_folder')
directory.mkdir(exist_ok=True)  # Don't error if it already exists

# Create nested directories, including any missing parents
nested = Path('parent_folder') / 'child_folder'
nested.mkdir(parents=True, exist_ok=True)

# List directory contents
directory = Path('.')
for item in directory.iterdir():
    print(item)

# Glob patterns
for txt_file in directory.glob('*.txt'):
    print(txt_file)

# Recursive glob
for py_file in directory.rglob('*.py'):
    print(py_file)

5.4 Reading/Writing with pathlib

from pathlib import Path

path = Path('data.txt')

# Read text
content = path.read_text()
print(content)

# Write text
path.write_text('Hello, World!')

# Read bytes
binary_data = path.read_bytes()

# Write bytes
path.write_bytes(b'Binary data')

# Read lines
lines = path.read_text().splitlines()

pathlib vs os.path

pathlib is the modern, object-oriented way to work with paths. Prefer it over os.path for new code.
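
A side-by-side comparison makes the difference concrete. The sketch below performs the same path manipulations with os.path and with pathlib; the path reports/2024/summary.tar.gz is hypothetical and does not need to exist.

```python
import os.path
from pathlib import Path

# Hypothetical path used only for illustration.
raw = os.path.join('reports', '2024', 'summary.tar.gz')
path = Path('reports') / '2024' / 'summary.tar.gz'

# File name
print(os.path.basename(raw))  # summary.tar.gz
print(path.name)              # summary.tar.gz

# Last extension
print(os.path.splitext(raw)[1])  # .gz
print(path.suffix)               # .gz

# Name of the parent directory
print(os.path.basename(os.path.dirname(raw)))  # 2024
print(path.parent.name)                        # 2024

# Swap the last extension
swapped_os = os.path.basename(os.path.splitext(raw)[0] + '.zip')
swapped_pathlib = path.with_suffix('.zip').name
print(swapped_os)       # summary.tar.zip
print(swapped_pathlib)  # summary.tar.zip
```

The os.path functions take and return plain strings, so operations nest inside one another; Path methods and attributes chain from left to right.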


6. Working with CSV Files

6.1 Reading CSV

import csv

# Read CSV (newline='' lets the csv module handle line endings itself)
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    header = next(reader)  # Skip header
    for row in reader:
        print(row)

# Read as dictionary (keys come from the header row)
with open('data.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row['name'], row['age'])

# Example data.csv:
# name,age,city
# Alice,25,NYC
# Bob,30,LA

6.2 Writing CSV

import csv

# Write CSV
data = [
    ['Name', 'Age', 'City'],
    ['Alice', 25, 'NYC'],
    ['Bob', 30, 'LA']
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

# Write from dictionary
data = [
    {'name': 'Alice', 'age': 25, 'city': 'NYC'},
    {'name': 'Bob', 'age': 30, 'city': 'LA'}
]

with open('output.csv', 'w', newline='') as file:
    fieldnames = ['name', 'age', 'city']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)

6.3 CSV Options

import csv

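# Custom delimiter (tab-separated values)
with open('data.tsv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        print(row)

# Custom quoting (quote every field)
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file, quoting=csv.QUOTE_ALL)
    writer.writerow(['Value with, comma', 'Normal value'])

# Explicit dialect ('excel' is the default; 'excel-tab' and 'unix' are also built in)
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file, dialect='excel')
    writer.writerow(['Value with, comma', 'Normal value'])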
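
Quoting is what lets a field contain the delimiter or a quote character. The following sketch, using an in-memory io.StringIO buffer instead of a real file and some made-up rows, writes such fields with the default settings and reads them back unchanged.

```python
import csv
import io

rows = [
    ['name', 'note'],
    ['Alice', 'Likes tea, coffee'],  # contains the delimiter
    ['Bob', 'Said "hi"'],            # contains the quote character
]

# newline='' stops the buffer from translating the writer's line endings.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original fields.
parsed = list(csv.reader(io.StringIO(text, newline='')))
print(parsed == rows)  # True
```

The writer wraps the first problem field in quotes and doubles the embedded quotes in the second, which is why building CSV lines by hand with string formatting breaks on data like this.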

7. Working with JSON

7.1 Reading JSON

import json

# Read JSON file
with open('data.json', 'r') as file:
    data = json.load(file)
    print(data)

# Parse JSON string
json_string = '{"name": "Alice", "age": 25}'
data = json.loads(json_string)
print(data['name'])

# Example data.json:
# {
#     "users": [
#         {"name": "Alice", "age": 25},
#         {"name": "Bob", "age": 30}
#     ]
# }

7.2 Writing JSON

import json

# Write to JSON file
data = {
    "name": "Alice",
    "age": 25,
    "city": "NYC",
    "hobbies": ["reading", "coding"]
}

with open('output.json', 'w') as file:
    json.dump(data, file, indent=4)

# Convert to JSON string
json_string = json.dumps(data, indent=2)
print(json_string)

# Pretty print
print(json.dumps(data, indent=4, sort_keys=True))

7.3 Custom JSON Encoding

import json
from datetime import datetime

# Custom encoder
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

data = {
    "event": "Meeting",
    "timestamp": datetime.now()
}

json_string = json.dumps(data, cls=DateTimeEncoder)
print(json_string)
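
For a one-off case, json.dumps also accepts a default= function, which avoids defining an encoder subclass. The sketch below is one possible alternative, not a replacement for the class above; the helper name encode_extra and the fixed date are assumptions for illustration. It also shows that the timestamp comes back as a plain string and has to be converted by hand.

```python
import json
from datetime import datetime

def encode_extra(obj):
    """Called by json.dumps for objects it cannot serialize itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f'Cannot serialize {type(obj).__name__}')

data = {'event': 'Meeting', 'timestamp': datetime(2024, 1, 15, 9, 30)}

json_string = json.dumps(data, default=encode_extra)
print(json_string)  # {"event": "Meeting", "timestamp": "2024-01-15T09:30:00"}

# JSON has no datetime type, so decoding yields a string.
restored = json.loads(json_string)
restored['timestamp'] = datetime.fromisoformat(restored['timestamp'])
print(restored['timestamp'])  # 2024-01-15 09:30:00
```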

8. File System Operations

8.1 Using os Module

import os

# Current directory
print(os.getcwd())

# Change directory
os.chdir('/path/to/directory')

# List directory
files = os.listdir('.')
print(files)

# Create directory
os.mkdir('new_folder')
os.makedirs('path/to/folder', exist_ok=True)

# Remove file
os.remove('file.txt')

# Remove directory
os.rmdir('empty_folder')

# Remove directory tree
import shutil
shutil.rmtree('folder_with_contents')

8.2 File Information

import os
from pathlib import Path

# File size
size = os.path.getsize('file.txt')
print(f'Size: {size} bytes')

# Using pathlib
path = Path('file.txt')
size = path.stat().st_size
print(f'Size: {size} bytes')

# File timestamps
mtime = path.stat().st_mtime  # Modification time
ctime = path.stat().st_ctime  # Metadata change time on Unix; creation time on Windows

from datetime import datetime
mod_time = datetime.fromtimestamp(mtime)
print(f'Modified: {mod_time}')

# Check permissions
print(f'Readable: {os.access("file.txt", os.R_OK)}')
print(f'Writable: {os.access("file.txt", os.W_OK)}')
print(f'Executable: {os.access("file.txt", os.X_OK)}')

8.3 Copy and Move Files

import shutil

# Copy file
shutil.copy('source.txt', 'destination.txt')
shutil.copy2('source.txt', 'dest.txt') # Preserve metadata

# Copy directory
shutil.copytree('source_dir', 'dest_dir')

# Move file/directory
shutil.move('source.txt', 'destination.txt')

# Using pathlib
from pathlib import Path
source = Path('source.txt')
destination = Path('destination.txt')
source.rename(destination) # Move/rename

9. Temporary Files

9.1 Working with Temporary Files

import tempfile

# Temporary file (auto-deleted)
with tempfile.TemporaryFile(mode='w+t') as temp:
    temp.write('Temporary data')
    temp.seek(0)
    print(temp.read())
# File deleted automatically

# Named temporary file
# delete=False keeps the file after it is closed; remove it yourself when done
with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as temp:
    temp.write('Temporary data')
    temp_name = temp.name
print(f'Temp file: {temp_name}')

# Temporary directory
with tempfile.TemporaryDirectory() as temp_dir:
    print(f'Temp directory: {temp_dir}')
    # Use directory
# Directory deleted automatically
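
A temporary directory combines naturally with pathlib. The following is a minimal sketch, assuming a hypothetical scratch file name, that writes and reads a file inside the directory and then confirms that everything is gone once the with block ends.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir:
    scratch = Path(temp_dir) / 'scratch.txt'  # hypothetical file name
    scratch.write_text('intermediate result', encoding='utf-8')
    existed_inside = scratch.exists()
    content = scratch.read_text(encoding='utf-8')

# The directory and everything in it were removed on exit.
existed_after = scratch.exists()

print(content)                        # intermediate result
print(existed_inside, existed_after)  # True False
```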

10. Error Handling

10.1 File Exceptions

# FileNotFoundError
try:
    with open('nonexistent.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print('File not found!')

# PermissionError
try:
    with open('/root/protected.txt', 'w') as file:
        file.write('data')
except PermissionError:
    print('Permission denied!')

# IsADirectoryError
try:
    with open('some_directory', 'r') as file:
        pass
except IsADirectoryError:
    print('This is a directory, not a file!')

# Multiple exceptions
try:
    with open('file.txt', 'r') as file:
        content = file.read()
except (FileNotFoundError, PermissionError) as e:
    print(f'Error: {e}')

10.2 Safe File Operations

import shutil
from pathlib import Path

def safe_read(filename):
    """Safely read file with error handling."""
    path = Path(filename)

    if not path.exists():
        return None

    if not path.is_file():
        raise ValueError(f'{filename} is not a file')

    try:
        return path.read_text()
    except UnicodeDecodeError:
        # Fall back to latin-1, which can decode any byte sequence
        return path.read_text(encoding='latin-1')
    except Exception as e:
        print(f'Error reading file: {e}')
        return None

def safe_write(filename, content):
    """Safely write file with backup."""
    path = Path(filename)

    # Create backup if file exists (data.txt -> data.bak)
    if path.exists():
        backup = path.with_suffix('.bak')
        shutil.copy2(path, backup)

    try:
        path.write_text(content)
        return True
    except Exception as e:
        print(f'Error writing file: {e}')
        return False

11. Best Practices

11.1 File Handling Guidelines

# ✅ Good: Use with statement
with open('file.txt', 'r') as file:
    content = file.read()

# ❌ Bad: Manual file closing
file = open('file.txt', 'r')
content = file.read()
file.close()

# ✅ Good: Use pathlib
from pathlib import Path
path = Path('data') / 'file.txt'

# ❌ Bad: String concatenation
path = 'data' + '/' + 'file.txt'

# ✅ Good: Iterate over file
with open('large_file.txt', 'r') as file:
    for line in file:
        process(line)

# ❌ Bad: Read entire file
with open('large_file.txt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        process(line)

11.2 Performance Tips

# Files are buffered by default; buffering=N (N > 1) sets the buffer size in bytes
with open('file.txt', 'r', buffering=8192) as file:
    content = file.read()

# Process large files in chunks
def process_large_file(filename, chunk_size=1024*1024):
    with open(filename, 'rb') as file:
        while True:
            chunk = file.read(chunk_size)  # 1 MB per chunk by default
            if not chunk:
                break
            process(chunk)

# Use generators for memory efficiency
def read_lines(filename):
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

for line in read_lines('large_file.txt'):
    process(line)

12. Summary

| Concept | Description | Example |
|---------|-------------|---------|
| open() | Open file | open('file.txt', 'r') |
| with | Context manager | with open(...) as f: |
| read() | Read entire file | file.read() |
| readline() | Read one line | file.readline() |
| write() | Write to file | file.write('text') |
| Path | Path object | Path('file.txt') |
| csv | CSV operations | csv.reader(file) |
| json | JSON operations | json.load(file) |

Key Takeaways
  • Always use the with statement for file operations
  • Use pathlib for modern path handling
  • Handle file exceptions appropriately
  • Use appropriate encodings (UTF-8 is standard)
  • Process large files in chunks
  • Use CSV/JSON modules for structured data

13. What's Next?

In Module 9 - Exception Handling, you'll learn:

  • Try-except-finally blocks
  • Raising exceptions
  • Creating custom exceptions
  • Exception hierarchies
  • Best practices for error handling

14. Practice Exercises

Exercise 1: File Statistics

Create a program that analyzes a text file and reports:

  • Number of lines
  • Number of words
  • Number of characters
  • Most common words

Exercise 2: Log File Parser

Parse a log file and extract error messages into a separate file.

Exercise 3: CSV to JSON Converter

Convert a CSV file to JSON format.

Exercise 4: File Organizer

Create a script that organizes files in a directory by extension.

Exercise 5: Configuration File Handler

Create a module that reads and writes configuration files in JSON format.

Solutions

Try solving these exercises on your own first. Solutions will be provided in the practice section.