Module 19 - JSON & Serialization

Serialization is the process of converting Python objects into a format that can be stored or transmitted. Python provides json for JSON data and pickle for Python-specific serialization.


1. JSON (JavaScript Object Notation)

JSON is a lightweight data-interchange format that's easy for humans to read and write, and easy for machines to parse and generate.

JSON Data Types

JSON Type      | Python Type
object         | dict
array          | list
string         | str
number (int)   | int
number (real)  | float
true/false     | True/False
null           | None
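
The mapping above is not fully symmetric, which a short round trip can make visible. The following is a minimal sketch (the sample dictionary is made up for illustration): tuples are written as JSON arrays and come back as lists, and non-string dictionary keys are converted to strings.

```python
import json

original = {
    "point": (1, 2),        # tuple: encoded as a JSON array
    "ratio": 0.5,           # float: JSON number (real)
    "count": 3,             # int: JSON number (int)
    "active": True,         # bool: JSON true
    "missing": None,        # None: JSON null
    1: "integer key",       # non-string key: coerced to the string "1"
}

encoded = json.dumps(original)
restored = json.loads(encoded)

print(encoded)
print(restored["point"])    # the tuple comes back as a list
print(list(restored)[-1])   # the key 1 comes back as "1"
```

Because of these conversions, `restored` is not equal to `original` here, so code that depends on tuples or integer keys needs to convert them back after loading.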

2. The json Module

2.1 json.dumps() - Python to JSON String

import json

# Dictionary to JSON
data = {
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "is_employee": True,
    "skills": ["Python", "JavaScript", "SQL"]
}

json_string = json.dumps(data)
print(json_string)
# {"name": "John Doe", "age": 30, "city": "New York", ...}

# Pretty print with indentation
json_string = json.dumps(data, indent=4)
print(json_string)
"""
{
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "is_employee": true,
    "skills": [
        "Python",
        "JavaScript",
        "SQL"
    ]
}
"""

# Sort keys
json_string = json.dumps(data, indent=2, sort_keys=True)
print(json_string)

2.2 json.dump() - Python to JSON File

import json

data = {
    "users": [
        {"id": 1, "name": "Alice", "email": "alice@example.com"},
        {"id": 2, "name": "Bob", "email": "bob@example.com"}
    ]
}

# Write to file
with open('data.json', 'w') as file:
    json.dump(data, file, indent=4)

print("Data written to data.json")

2.3 json.loads() - JSON String to Python

import json

json_string = '{"name": "Alice", "age": 25, "city": "Boston"}'

# Parse JSON string
data = json.loads(json_string)
print(data) # {'name': 'Alice', 'age': 25, 'city': 'Boston'}
print(type(data)) # <class 'dict'>
print(data['name']) # Alice

# Parse nested JSON
json_string = '''
{
    "person": {
        "name": "John",
        "contacts": {
            "email": "john@example.com",
            "phone": "555-1234"
        }
    }
}
'''

data = json.loads(json_string)
print(data['person']['contacts']['email']) # john@example.com

2.4 json.load() - JSON File to Python

import json

# Read from file
with open('data.json', 'r') as file:
    data = json.load(file)

print(data)
print(f"Found {len(data['users'])} users")

3. Working with Complex Objects

3.1 Handling Non-Serializable Objects

import json
from datetime import datetime

# ❌ This fails - datetime is not JSON serializable
data = {
    "event": "Meeting",
    "timestamp": datetime.now()
}

# This raises TypeError
# json.dumps(data)

# ✅ Solution 1: Convert to string
data = {
    "event": "Meeting",
    "timestamp": datetime.now().isoformat()
}
json_string = json.dumps(data)
print(json_string)

# ✅ Solution 2: Custom encoder
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

data = {
    "event": "Meeting",
    "timestamp": datetime.now()
}
json_string = json.dumps(data, cls=DateTimeEncoder)
print(json_string)
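
Both solutions above write the timestamp as a plain string, so loading the JSON gives back a str rather than a datetime. One possible way to restore the original type is sketched below, using the `default` parameter of `json.dumps()` and the `object_hook` parameter of `json.loads()`. The helper names `encode_extra` and `decode_extra` and the `"__datetime__"` tag are assumptions made for this example, not part of the json module.

```python
import json
from datetime import datetime

def encode_extra(obj):
    """Fallback called by json.dumps for objects it cannot serialize."""
    if isinstance(obj, datetime):
        # Tag the value so the decoder can recognize it later
        return {"__datetime__": obj.isoformat()}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_extra(dct):
    """object_hook: called for every JSON object (dict) that is decoded."""
    if "__datetime__" in dct:
        return datetime.fromisoformat(dct["__datetime__"])
    return dct

event = {"event": "Meeting", "timestamp": datetime(2024, 1, 15, 10, 30)}

json_string = json.dumps(event, default=encode_extra)
restored = json.loads(json_string, object_hook=decode_extra)

print(json_string)
print(restored["timestamp"] == event["timestamp"])  # True
```

The tag key is only a convention between your own encoder and decoder; other programs reading the file will see an ordinary nested object.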

3.2 Custom Object Serialization

import json

class Person:
    def __init__(self, name, age, email):
        self.name = name
        self.age = age
        self.email = email

    def to_dict(self):
        """Convert to dictionary"""
        return {
            "name": self.name,
            "age": self.age,
            "email": self.email
        }

    @classmethod
    def from_dict(cls, data):
        """Create from dictionary"""
        return cls(data['name'], data['age'], data['email'])

# Serialize
person = Person("Alice", 25, "alice@example.com")
json_string = json.dumps(person.to_dict(), indent=2)
print(json_string)

# Deserialize
data = json.loads(json_string)
person_restored = Person.from_dict(data)
print(f"{person_restored.name}, {person_restored.age}")

3.3 Using __dict__

import json

class Book:
    def __init__(self, title, author, year):
        self.title = title
        self.author = author
        self.year = year

book = Book("1984", "George Orwell", 1949)

# Serialize using __dict__
json_string = json.dumps(book.__dict__, indent=2)
print(json_string)

# Deserialize
data = json.loads(json_string)
book_restored = Book(**data)
print(f"{book_restored.title} by {book_restored.author}")
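
The `__dict__` shortcut only converts one level: if an attribute holds another custom object, `json.dumps()` still raises TypeError. The sketch below shows one alternative, assuming you are free to define the classes as dataclasses (the `Author` class is invented for this example). `dataclasses.asdict()` converts nested dataclasses recursively.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Author:
    name: str
    born: int

@dataclass
class Book:
    title: str
    author: Author
    year: int

book = Book("1984", Author("George Orwell", 1903), 1949)

# book.__dict__ still contains an Author instance, which json.dumps rejects
try:
    json.dumps(book.__dict__)
    dict_ok = True
except TypeError:
    dict_ok = False

# asdict() converts nested dataclasses to plain dicts recursively
json_string = json.dumps(asdict(book))

# Rebuilding still needs the nested object restored by hand
data = json.loads(json_string)
book_restored = Book(data["title"], Author(**data["author"]), data["year"])
print(book_restored == book)  # True
```

Serialization is automatic here, but deserialization is not: the loader has to know which fields hold nested objects.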

4. JSON Options and Formatting

import json

data = {
    "name": "John",
    "score": 95.5,
    "passed": True,
    "notes": None
}

# Compact output (no whitespace)
compact = json.dumps(data, separators=(',', ':'))
print(compact) # {"name":"John","score":95.5,"passed":true,"notes":null}

# Pretty print
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)

# Ensure ASCII (escape non-ASCII characters)
data_unicode = {"name": "José", "city": "São Paulo"}
ascii_json = json.dumps(data_unicode, ensure_ascii=True)
print(ascii_json) # {"name": "Jos\u00e9", "city": "S\u00e3o Paulo"}

# Keep Unicode characters
unicode_json = json.dumps(data_unicode, ensure_ascii=False)
print(unicode_json) # {"name": "José", "city": "São Paulo"}

5. The pickle Module

pickle serializes Python objects into a binary format. Unlike JSON, it can serialize almost any Python object.

Security

Never unpickle data from untrusted sources! Pickle can execute arbitrary code during deserialization.
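
To make the warning concrete, here is a harmless sketch of the mechanism. The `Payload` class is a hypothetical stand-in for attacker-controlled data: its `__reduce__` method tells pickle which callable to run when the data is loaded. The second half follows the "restricting globals" pattern from the pickle documentation by overriding `Unpickler.find_class()`; the names `RestrictedUnpickler` and `restricted_loads` are chosen for this example. Treat it as a way to reduce risk, not as a guarantee of safety.

```python
import io
import pickle

class Payload:
    """Stand-in for attacker-controlled data (hypothetical, harmless)."""
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call list('abc')".
        # A real attack would name something like os.system instead.
        return (list, ("abc",))

malicious = pickle.dumps(Payload())

# pickle.loads calls the callable chosen by the data, not by your code
print(pickle.loads(malicious))  # ['a', 'b', 'c']

class RestrictedUnpickler(pickle.Unpickler):
    """Refuses to look up any global, so only plain built-in data loads."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers need no globals, so they still load
print(restricted_loads(pickle.dumps({"scores": [95, 87]})))

try:
    restricted_loads(malicious)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print(blocked)  # True
```

If the data only ever contains dicts, lists, strings, and numbers, JSON avoids the problem entirely.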

5.1 Basic Pickling

import pickle

# Python objects
data = {
    "numbers": [1, 2, 3, 4, 5],
    "text": "Hello, World!",
    "nested": {"key": "value"}
}

# Serialize to bytes
pickled = pickle.dumps(data)
print(type(pickled)) # <class 'bytes'>

# Deserialize
unpickled = pickle.loads(pickled)
print(unpickled) # Original data restored

5.2 Pickle to/from File

import pickle

# Save to file
data = {"name": "Alice", "scores": [95, 87, 92]}

with open('data.pkl', 'wb') as file:
    pickle.dump(data, file)

# Load from file
with open('data.pkl', 'rb') as file:
    loaded_data = pickle.load(file)

print(loaded_data)

5.3 Pickling Complex Objects

import pickle

class Student:
    def __init__(self, name, grades):
        self.name = name
        self.grades = grades

    def average(self):
        return sum(self.grades) / len(self.grades)

# Create object
student = Student("Bob", [85, 90, 88, 92])

# Pickle the object
with open('student.pkl', 'wb') as file:
    pickle.dump(student, file)

# Unpickle the object
with open('student.pkl', 'rb') as file:
    student_loaded = pickle.load(file)

print(f"{student_loaded.name}: {student_loaded.average()}")

5.4 Pickling Functions and Lambdas

import pickle

# Regular function (can be pickled if defined at module level)
def greet(name):
    return f"Hello, {name}!"

# Pickle function
pickled_func = pickle.dumps(greet)

# Unpickle and use
unpickled_func = pickle.loads(pickled_func)
print(unpickled_func("Alice")) # Hello, Alice!

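# ❌ Lambdas can't be pickled by the standard pickle module: functions are
# pickled by name, and a lambda has no importable name to look up
# lambda_func = lambda x: x * 2
# pickle.dumps(lambda_func)  # Raises pickle.PicklingError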

6. JSON vs Pickle Comparison

Feature     | JSON                              | Pickle
Format      | Text (human-readable)             | Binary
Language    | Language-independent              | Python-specific
Security    | Safe to parse                     | Unsafe (loading can execute code)
Speed       | Often slower                      | Often faster
Size        | Often larger                      | Often smaller
Data Types  | Limited (basic types)             | Almost all Python objects
Use Case    | APIs, config files, data exchange | Python-to-Python, caching

import json
import pickle

data = {
    "numbers": list(range(1000)),
    "text": "Lorem ipsum " * 100
}

# Compare sizes in bytes (encode the JSON text so both are byte counts)
json_size = len(json.dumps(data).encode('utf-8'))
pickle_size = len(pickle.dumps(data))

print(f"JSON size: {json_size} bytes")
print(f"Pickle size: {pickle_size} bytes")
# The ratio depends on the data; measure with your own payloads
print(f"Pickle is {json_size / pickle_size:.1f}x smaller")

When to Use Which
  • JSON: API responses, configuration files, data exchange between languages
  • Pickle: Caching Python objects, saving ML models, temporary storage
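
The "Data Types" row of the comparison is often the deciding factor. As a small illustration (the `record` dictionary is made up for this example), pickle round-trips a set, a date, and a tuple with their types intact, while `json.dumps()` rejects the same structure.

```python
import json
import pickle
from datetime import date

record = {
    "tags": {"python", "json"},      # set
    "released": date(2024, 1, 15),   # date
    "version": (3, 12),              # tuple
}

# pickle round-trips the structure with types intact
restored = pickle.loads(pickle.dumps(record))
print(restored == record)  # True

# json rejects the set (and would reject the date as well)
try:
    json.dumps(record)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print(f"json failed: {exc}")
```

To send such data as JSON you would first convert it to basic types, for example a sorted list for the set and an ISO string for the date.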

7. Practical Examples

7.1 Configuration File

import json

# Save configuration
config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "myapp_db"
    },
    "logging": {
        "level": "INFO",
        "file": "app.log"
    }
}

with open('config.json', 'w') as file:
    json.dump(config, file, indent=4)

# Load configuration
with open('config.json', 'r') as file:
    config = json.load(file)

print(f"Database host: {config['database']['host']}")

7.2 API Response Handling

import json

# Simulate API response
api_response = '''
{
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "name": "Alice", "active": true},
            {"id": 2, "name": "Bob", "active": false}
        ]
    },
    "timestamp": "2024-01-15T10:30:00Z"
}
'''

# Parse response
response = json.loads(api_response)

# Extract data
if response['status'] == 'success':
    users = response['data']['users']
    active_users = [u for u in users if u['active']]
    print(f"Active users: {[u['name'] for u in active_users]}")

7.3 Data Export/Import

import json
import csv

# Export CSV to JSON
def csv_to_json(csv_file, json_file):
    data = []
    # newline='' lets the csv module handle line endings itself
    with open(csv_file, 'r', newline='') as file:
        csv_reader = csv.DictReader(file)
        for row in csv_reader:
            data.append(row)  # every value is read as a string

    with open(json_file, 'w') as file:
        json.dump(data, file, indent=2)

# Import JSON to CSV (expects a list of flat dicts with the same keys)
def json_to_csv(json_file, csv_file):
    with open(json_file, 'r') as file:
        data = json.load(file)

    with open(csv_file, 'w', newline='') as file:
        if data:
            writer = csv.DictWriter(file, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)

7.4 Caching with Pickle

import pickle
import time
from functools import wraps

def cache_result(filename):
    """Decorator to cache a function's result in a pickle file.

    Note: the cache is keyed by filename only, so arguments are ignored.
    Use it for functions whose result does not depend on their arguments.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                # Try to load cached result
                with open(filename, 'rb') as file:
                    print("Loading from cache...")
                    return pickle.load(file)
            except FileNotFoundError:
                # Compute and cache result
                print("Computing result...")
                result = func(*args, **kwargs)
                with open(filename, 'wb') as file:
                    pickle.dump(result, file)
                return result
        return wrapper
    return decorator

@cache_result('expensive_calc.pkl')
def expensive_calculation():
    """Simulate expensive operation"""
    time.sleep(2)
    return sum(i**2 for i in range(1000000))

# First call - computes and caches (if expensive_calc.pkl does not exist yet)
result1 = expensive_calculation()
print(result1)

# Second call - loads from cache (instant)
result2 = expensive_calculation()
print(result2)

8. Best Practices

Validate JSON Data

import json

def load_json_safe(filename):
    """Safely load JSON with error handling"""
    try:
        with open(filename, 'r') as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"Error: {filename} not found")
        return None
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON - {e}")
        return None

data = load_json_safe('data.json')
# Compare with None: valid JSON such as {} or [] is falsy
if data is not None:
    print("Data loaded successfully")
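
Catching JSONDecodeError only confirms that the file is syntactically valid JSON; it says nothing about whether the expected fields are present. A minimal sketch of a structural check follows. The `validate_user` function and its schema (`id`, `name`, `email`) are assumptions for illustration, modeled on the user records used earlier in this module.

```python
import json

def validate_user(record):
    """Return a list of problems found in one user record (empty if valid)."""
    # Hypothetical schema: field name -> expected Python type
    required = {"id": int, "name": str, "email": str}
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    for field, expected in required.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems

good = json.loads('{"id": 1, "name": "Alice", "email": "alice@example.com"}')
bad = json.loads('{"id": "1", "name": "Bob"}')

print(validate_user(good))  # []
print(validate_user(bad))   # ['id should be int', 'missing field: email']
```

Returning a list of problems instead of raising on the first one makes it easier to report everything wrong with a record at once.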

Handle Missing Keys

import json

json_string = '{"name": "Alice", "age": 25}'
data = json.loads(json_string)

# ❌ May raise KeyError
# email = data['email']

# ✅ Safe access with get()
email = data.get('email', 'not_provided@example.com')
print(email)

# ✅ Check key existence
if 'email' in data:
    email = data['email']
else:
    email = 'not_provided@example.com'

9. Advanced Techniques

Streaming Large JSON Files

import json

# This approach works for JSON Lines files (one JSON value per line),
# not for a single large JSON document spread over many lines
def stream_large_json(filename):
    """Yield one parsed object per line of a JSON Lines file"""
    with open(filename, 'r') as file:
        for line in file:
            try:
                obj = json.loads(line)
                yield obj
            except json.JSONDecodeError:
                # Skip blank or malformed lines
                continue

# Usage
# for item in stream_large_json('large_data.jsonl'):
#     process(item)
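
When JSON values are concatenated without a reliable one-per-line layout, splitting on newlines is not enough. One possible approach, sketched here, uses `json.JSONDecoder.raw_decode()`, which parses a single value starting at a given index and returns the position where it ended. The function name `iter_json_values` is chosen for this example, and the sketch works on a string already in memory, so it is not a substitute for a true incremental parser on very large files.

```python
import json

def iter_json_values(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

chunk = '{"id": 1} {"id": 2}\n[3, 4]'
print(list(iter_json_values(chunk)))  # [{'id': 1}, {'id': 2}, [3, 4]]
```

Unlike the line-based reader, this version raises JSONDecodeError on malformed input instead of skipping it, which is usually what you want when the boundaries between values are not known in advance.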

Pretty Print JSON

import json

data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

# Pretty print to console
print(json.dumps(data, indent=2, sort_keys=True))

# Or use pprint for complex structures
from pprint import pprint
pprint(data)

Summary

✅ JSON is language-independent, human-readable, and ideal for data exchange
✅ Use json.dumps() / json.dump() to serialize Python to JSON
✅ Use json.loads() / json.load() to deserialize JSON to Python
✅ Pickle serializes Python objects to binary format
⚠️ Never unpickle untrusted data: loading a pickle can execute arbitrary code
✅ Prefer JSON for APIs and config; pickle for Python-specific needs


Next Steps

In Module 20, you'll learn:

  • Virtual environments with venv
  • Package management with pip
  • Creating and managing requirements.txt
  • Isolating project dependencies

Practice Exercises

  1. Create a JSON file storing student records and write functions to add, update, and delete students
  2. Build a simple REST API response parser that handles nested JSON
  3. Implement a configuration manager that loads settings from JSON
  4. Create a data export tool that converts between JSON, CSV, and pickle formats
  5. Build a cache system using pickle with expiration timestamps

Challenge

Create a data serialization library that:

  • Automatically detects the best format (JSON or pickle) based on data type
  • Handles custom objects with to_dict() / from_dict() methods
  • Provides compression for large data
  • Includes validation and error recovery
  • Supports versioning for backward compatibility