Module 19 - JSON & Serialization
Serialization is the process of converting Python objects into a format that can be stored or transmitted; deserialization is the reverse step. Python's standard library provides the json module for JSON data and the pickle module for Python-specific serialization.
1. JSON (JavaScript Object Notation)
JSON is a lightweight data-interchange format that's easy for humans to read and write, and easy for machines to parse and generate.
JSON Data Types
| JSON Type | Python Type |
|---|---|
| object | dict |
| array | list |
| string | str |
| number (int) | int |
| number (real) | float |
| true/false | True/False |
| null | None |
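The mapping is not perfectly symmetric: some Python values have no direct JSON counterpart and are coerced when written. The following is a minimal sketch of a round trip using only the standard json module; the sample dictionary is made up for illustration.

```python
import json

original = {
    "point": (1, 2),     # tuples have no JSON equivalent and are written as arrays
    1: "integer key",    # non-string keys are converted to strings
    "missing": None,     # None is written as null
}

restored = json.loads(json.dumps(original))

print(restored["point"])     # [1, 2]
print(list(restored))        # ['point', '1', 'missing']
print(restored == original)  # False
```

A round trip through JSON therefore does not always give back an equal object, which is worth keeping in mind when comparing saved and loaded data.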
2. The json Module
2.1 json.dumps() - Python to JSON String
import json
# Dictionary to JSON
data = {
"name": "John Doe",
"age": 30,
"city": "New York",
"is_employee": True,
"skills": ["Python", "JavaScript", "SQL"]
}
json_string = json.dumps(data)
print(json_string)
# {"name": "John Doe", "age": 30, "city": "New York", ...}
# Pretty print with indentation
json_string = json.dumps(data, indent=4)
print(json_string)
"""
{
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "is_employee": true,
    "skills": [
        "Python",
        "JavaScript",
        "SQL"
    ]
}
"""
# Sort keys
json_string = json.dumps(data, indent=2, sort_keys=True)
print(json_string)
2.2 json.dump() - Python to JSON File
import json
data = {
"users": [
{"id": 1, "name": "Alice", "email": "alice@example.com"},
{"id": 2, "name": "Bob", "email": "bob@example.com"}
]
}
# Write to file
with open('data.json', 'w') as file:
    json.dump(data, file, indent=4)
print("Data written to data.json")
2.3 json.loads() - JSON String to Python
import json
json_string = '{"name": "Alice", "age": 25, "city": "Boston"}'
# Parse JSON string
data = json.loads(json_string)
print(data) # {'name': 'Alice', 'age': 25, 'city': 'Boston'}
print(type(data)) # <class 'dict'>
print(data['name']) # Alice
# Parse nested JSON
json_string = '''
{
"person": {
"name": "John",
"contacts": {
"email": "john@example.com",
"phone": "555-1234"
}
}
}
'''
data = json.loads(json_string)
print(data['person']['contacts']['email']) # john@example.com
2.4 json.load() - JSON File to Python
import json
# Read from file
with open('data.json', 'r') as file:
    data = json.load(file)
print(data)
print(f"Found {len(data['users'])} users")
3. Working with Complex Objects
3.1 Handling Non-Serializable Objects
import json
from datetime import datetime
# ❌ This fails - datetime is not JSON serializable
data = {
"event": "Meeting",
"timestamp": datetime.now()
}
# This raises TypeError
# json.dumps(data)
# ✅ Solution 1: Convert to string
data = {
"event": "Meeting",
"timestamp": datetime.now().isoformat()
}
json_string = json.dumps(data)
print(json_string)
# ✅ Solution 2: Custom encoder
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)
data = {
"event": "Meeting",
"timestamp": datetime.now()
}
json_string = json.dumps(data, cls=DateTimeEncoder)
print(json_string)
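Both solutions write the timestamp as an ISO 8601 string, so loading the JSON gives back a str rather than a datetime. One possible way to restore it is the object_hook parameter of json.loads(). This is a sketch that assumes the field is always called "timestamp"; decode_timestamps is a hypothetical helper name.

```python
import json
from datetime import datetime

def decode_timestamps(obj):
    """object_hook: called with every decoded JSON object (a dict)."""
    value = obj.get("timestamp")
    if isinstance(value, str):
        try:
            obj["timestamp"] = datetime.fromisoformat(value)
        except ValueError:
            pass  # not an ISO 8601 string; leave the value unchanged
    return obj

json_string = '{"event": "Meeting", "timestamp": "2024-01-15T10:30:00"}'
data = json.loads(json_string, object_hook=decode_timestamps)
print(type(data["timestamp"]))  # <class 'datetime.datetime'>
```

Matching on a known key keeps ordinary strings from being converted by accident.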
3.2 Custom Object Serialization
import json
class Person:
    def __init__(self, name, age, email):
        self.name = name
        self.age = age
        self.email = email

    def to_dict(self):
        """Convert to dictionary"""
        return {
            "name": self.name,
            "age": self.age,
            "email": self.email
        }

    @classmethod
    def from_dict(cls, data):
        """Create from dictionary"""
        return cls(data['name'], data['age'], data['email'])
# Serialize
person = Person("Alice", 25, "alice@example.com")
json_string = json.dumps(person.to_dict(), indent=2)
print(json_string)
# Deserialize
data = json.loads(json_string)
person_restored = Person.from_dict(data)
print(f"{person_restored.name}, {person_restored.age}")
3.3 Using __dict__
import json
class Book:
    def __init__(self, title, author, year):
        self.title = title
        self.author = author
        self.year = year
book = Book("1984", "George Orwell", 1949)
# Serialize using __dict__
json_string = json.dumps(book.__dict__, indent=2)
print(json_string)
# Deserialize
data = json.loads(json_string)
book_restored = Book(**data)
print(f"{book_restored.title} by {book_restored.author}")
4. JSON Options and Formatting
import json
data = {
"name": "John",
"score": 95.5,
"passed": True,
"notes": None
}
# Compact output (no whitespace)
compact = json.dumps(data, separators=(',', ':'))
print(compact) # {"name":"John","score":95.5,"passed":true,"notes":null}
# Pretty print
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)
# Ensure ASCII (escape non-ASCII characters)
data_unicode = {"name": "José", "city": "São Paulo"}
ascii_json = json.dumps(data_unicode, ensure_ascii=True)
print(ascii_json) # {"name": "Jos\u00e9", "city": "S\u00e3o Paulo"}
# Keep Unicode characters
unicode_json = json.dumps(data_unicode, ensure_ascii=False)
print(unicode_json) # {"name": "José", "city": "São Paulo"}
5. The pickle Module
pickle serializes Python objects into a binary format. Unlike JSON, it can serialize almost any Python object.
Never unpickle data from untrusted sources! Pickle can execute arbitrary code during deserialization.
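To make the warning concrete, the sketch below builds a harmless payload whose __reduce__ method tells pickle to call eval() while loading. It then shows the restricted-unpickler pattern of overriding Unpickler.find_class() to allow only a short list of builtins. EvalOnLoad, SAFE_BUILTINS, and restricted_loads are illustrative names, and restricting globals reduces the risk without making untrusted pickles safe.

```python
import builtins
import io
import pickle

class EvalOnLoad:
    """Harmless stand-in for a malicious payload."""
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('1 + 1')"
        return (eval, ("1 + 1",))

payload = pickle.dumps(EvalOnLoad())
print(pickle.loads(payload))  # 2, the expression ran during loading

# Only these builtins may be referenced by a pickle (an assumed allow-list)
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream refers to a global by name
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but rejects globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, range(3)])))
try:
    restricted_loads(payload)
except pickle.UnpicklingError as error:
    print(f"Rejected: {error}")
```

A real attacker would replace the eval expression with something destructive, which is why data from untrusted sources is better exchanged as JSON.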
5.1 Basic Pickling
import pickle
# Python objects
data = {
"numbers": [1, 2, 3, 4, 5],
"text": "Hello, World!",
"nested": {"key": "value"}
}
# Serialize to bytes
pickled = pickle.dumps(data)
print(type(pickled)) # <class 'bytes'>
# Deserialize
unpickled = pickle.loads(pickled)
print(unpickled) # Original data restored
5.2 Pickle to/from File
import pickle
# Save to file
data = {"name": "Alice", "scores": [95, 87, 92]}
with open('data.pkl', 'wb') as file:
    pickle.dump(data, file)
# Load from file
with open('data.pkl', 'rb') as file:
    loaded_data = pickle.load(file)
print(loaded_data)
5.3 Pickling Complex Objects
import pickle
class Student:
    def __init__(self, name, grades):
        self.name = name
        self.grades = grades

    def average(self):
        return sum(self.grades) / len(self.grades)
# Create object
student = Student("Bob", [85, 90, 88, 92])
# Pickle the object
with open('student.pkl', 'wb') as file:
    pickle.dump(student, file)
# Unpickle the object
with open('student.pkl', 'rb') as file:
    student_loaded = pickle.load(file)
print(f"{student_loaded.name}: {student_loaded.average()}")
5.4 Pickling Functions and Lambdas
import pickle
# Regular function: pickled by reference (only its module and name are stored),
# so it must be defined at module level and importable when unpickling
def greet(name):
    return f"Hello, {name}!"
# Pickle function
pickled_func = pickle.dumps(greet)
# Unpickle and use
unpickled_func = pickle.loads(pickled_func)
print(unpickled_func("Alice")) # Hello, Alice!
# ❌ Lambda functions generally can't be pickled
# lambda_func = lambda x: x * 2
# pickle.dumps(lambda_func) # Raises pickle.PicklingError
6. JSON vs Pickle Comparison
| Feature | JSON | Pickle |
|---|---|---|
| Format | Text (human-readable) | Binary |
| Language | Language-independent | Python-specific |
| Security | Safe | Unsafe (can execute code) |
| Speed | Often slower | Often faster |
| Size | Often larger | Often smaller |
| Data Types | Limited (basic types) | Almost all Python objects |
| Use Case | API, config files, data exchange | Python-to-Python, caching |
import json
import pickle
data = {
"numbers": list(range(1000)),
"text": "Lorem ipsum " * 100
}
# Compare sizes
json_size = len(json.dumps(data))
pickle_size = len(pickle.dumps(data))
print(f"JSON size: {json_size} bytes")
print(f"Pickle size: {pickle_size} bytes")
print(f"Pickle is {json_size / pickle_size:.1f}x smaller")
- JSON: API responses, configuration files, data exchange between languages
- Pickle: Caching Python objects, saving ML models, temporary storage
7. Practical Examples
7.1 Configuration File
import json
# Save configuration
config = {
"database": {
"host": "localhost",
"port": 5432,
"name": "myapp_db"
},
"logging": {
"level": "INFO",
"file": "app.log"
}
}
with open('config.json', 'w') as file:
    json.dump(config, file, indent=4)
# Load configuration
with open('config.json', 'r') as file:
    config = json.load(file)
print(f"Database host: {config['database']['host']}")
7.2 API Response Handling
import json
# Simulate API response
api_response = '''
{
"status": "success",
"data": {
"users": [
{"id": 1, "name": "Alice", "active": true},
{"id": 2, "name": "Bob", "active": false}
]
},
"timestamp": "2024-01-15T10:30:00Z"
}
'''
# Parse response
response = json.loads(api_response)
# Extract data
if response['status'] == 'success':
    users = response['data']['users']
    active_users = [u for u in users if u['active']]
    print(f"Active users: {[u['name'] for u in active_users]}")
7.3 Data Export/Import
import json
import csv
# Export CSV to JSON
def csv_to_json(csv_file, json_file):
    data = []
    # newline='' lets the csv module handle line endings itself
    with open(csv_file, 'r', newline='') as file:
        csv_reader = csv.DictReader(file)
        for row in csv_reader:
            data.append(row)
    with open(json_file, 'w') as file:
        json.dump(data, file, indent=2)
# Import JSON to CSV
def json_to_csv(json_file, csv_file):
    with open(json_file, 'r') as file:
        data = json.load(file)
    with open(csv_file, 'w', newline='') as file:
        if data:
            writer = csv.DictWriter(file, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)
7.4 Caching with Pickle
import pickle
import time
from functools import wraps
def cache_result(filename):
    """Decorator to cache function results"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                # Try to load cached result
                with open(filename, 'rb') as file:
                    print("Loading from cache...")
                    return pickle.load(file)
            except FileNotFoundError:
                # Compute and cache result
                print("Computing result...")
                result = func(*args, **kwargs)
                with open(filename, 'wb') as file:
                    pickle.dump(result, file)
                return result
        return wrapper
    return decorator

@cache_result('expensive_calc.pkl')
def expensive_calculation():
    """Simulate expensive operation"""
    time.sleep(2)
    return sum(i**2 for i in range(1000000))
# First call - computes and caches
result1 = expensive_calculation()
print(result1)
# Second call - loads from cache (instant)
result2 = expensive_calculation()
print(result2)
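The decorator above stores one result per file name and ignores the function's arguments, which is fine for a function that takes none. For functions with parameters, one possible variation derives the cache file name from a hash of the arguments. This is only a sketch: cache_by_args, the temporary cache directory, and the square function are hypothetical names, and it assumes the arguments can be pickled.

```python
import hashlib
import os
import pickle
import tempfile
from functools import wraps

def cache_by_args(directory):
    """Decorator that keeps one pickle file per distinct set of arguments."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Build a stable key from the function name and its arguments
            key_bytes = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha256(key_bytes).hexdigest()
            path = os.path.join(directory, f"{key}.pkl")
            try:
                with open(path, "rb") as file:
                    return pickle.load(file)
            except FileNotFoundError:
                result = func(*args, **kwargs)
                with open(path, "wb") as file:
                    pickle.dump(result, file)
                return result
        return wrapper
    return decorator

cache_dir = tempfile.mkdtemp()  # temporary directory used only for this demo
calls = []                      # records which inputs were actually computed

@cache_by_args(cache_dir)
def square(n):
    calls.append(n)
    return n * n

print(square(3), square(3), square(4))  # 9 9 16
print(calls)                            # [3, 4]
```

The second call with 3 is served from disk, so the function body runs only once per distinct argument.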
8. Best Practices
Validate JSON Data
import json
def load_json_safe(filename):
    """Safely load JSON with error handling"""
    try:
        with open(filename, 'r') as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"Error: {filename} not found")
        return None
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON - {e}")
        return None

data = load_json_safe('data.json')
# Compare with None: an empty object or array is valid JSON but falsy
if data is not None:
    print("Data loaded successfully")
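Catching JSONDecodeError only confirms that the text is well-formed JSON; it says nothing about whether the expected fields are present. A small hand-written check is one way to cover that. In the sketch below, validate_user is a hypothetical helper, and the required keys are assumed from the user records shown earlier in this module.

```python
import json

def validate_user(record):
    """Return a list of problems found in one user record (empty if none)."""
    if not isinstance(record, dict):
        return ["record is not an object"]
    problems = []
    for key, expected_type in (("id", int), ("name", str), ("email", str)):
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems

good = json.loads('{"id": 1, "name": "Alice", "email": "alice@example.com"}')
bad = json.loads('{"id": "1", "name": "Bob"}')

print(validate_user(good))  # []
print(validate_user(bad))   # ['id should be int', 'missing key: email']
```

Returning a list of problems instead of raising on the first one makes it easier to report everything that is wrong with a record at once.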
Handle Missing Keys
import json
json_string = '{"name": "Alice", "age": 25}'
data = json.loads(json_string)
# ❌ May raise KeyError
# email = data['email']
# ✅ Safe access with get()
email = data.get('email', 'not_provided@example.com')
print(email)
# ✅ Check key existence
if 'email' in data:
    email = data['email']
else:
    email = 'not_provided@example.com'
9. Advanced Techniques
Streaming Large JSON Files
import json
# A regular JSON document has to be parsed as a whole. Line-by-line streaming
# works for JSON Lines files, where every line is a complete JSON value.
def stream_large_json(filename):
    """Yield one parsed object per line of a JSON Lines file"""
    with open(filename, 'r') as file:
        for line in file:
            try:
                obj = json.loads(line)
                yield obj
            except json.JSONDecodeError:
                # Skip blank or malformed lines
                continue
# Usage
# for item in stream_large_json('large_data.jsonl'):
#     process(item)
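Reading line by line only works when each line holds one complete JSON value, a layout commonly called JSON Lines. The sketch below writes such a file and reads it back; the file name records.jsonl and the read_json_lines helper are illustrative, and a temporary directory is used so the example leaves no files in the working directory.

```python
import json
import os
import tempfile

records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "records.jsonl")

# Write one compact JSON document per line
with open(path, "w", encoding="utf-8") as file:
    for record in records:
        file.write(json.dumps(record) + "\n")

def read_json_lines(filename):
    """Yield one parsed object per non-blank line."""
    with open(filename, "r", encoding="utf-8") as file:
        for line in file:
            if line.strip():
                yield json.loads(line)

loaded = list(read_json_lines(path))
print(loaded == records)  # True
```

Because json.dumps() without indent never emits a newline inside a document, each record is guaranteed to stay on a single line.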
Pretty Print JSON
import json
data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
# Pretty print to console
print(json.dumps(data, indent=2, sort_keys=True))
# Or use pprint for complex structures (prints Python syntax, not JSON)
from pprint import pprint
pprint(data)
Summary
✅ JSON is language-independent, human-readable, and ideal for data exchange
✅ Use json.dumps() / json.dump() to serialize Python to JSON
✅ Use json.loads() / json.load() to deserialize JSON to Python
✅ Pickle serializes Python objects to binary format
✅ Never unpickle untrusted data - security risk
✅ Prefer JSON for APIs and config; pickle for Python-specific needs
Next Steps
In Module 20, you'll learn:
- Virtual environments with venv
- Package management with pip
- Creating and managing requirements.txt
- Isolating project dependencies
Practice Exercises
- Create a JSON file storing student records and write functions to add, update, and delete students
- Build a simple REST API response parser that handles nested JSON
- Implement a configuration manager that loads settings from JSON
- Create a data export tool that converts between JSON, CSV, and pickle formats
- Build a cache system using pickle with expiration timestamps
Create a data serialization library that:
- Automatically detects the best format (JSON or pickle) based on data type
- Handles custom objects with to_dict()/from_dict() methods
- Provides compression for large data
- Includes validation and error recovery
- Supports versioning for backward compatibility