Making data outlive the program
Everything your program holds in variables and collections lives in RAM — and RAM is wiped the moment the program exits. File handling is how programs persist data to storage and read it back: the simplest form of memory that survives.
It's also the gateway drug to data work: logs, configs, CSV exports, JSON APIs — a huge share of real engineering is reading one file shape and writing another. (And when files stop being enough, you'll know exactly why databases exist.)
Reading and writing text
# Python — write (creates or OVERWRITES the file)
with open("notes.txt", "w") as f:
f.write("First line\n")
f.write("Second line\n")
# read it all back
with open("notes.txt", "r") as f:
content = f.read()
# or the memory-safe way: line by line
with open("notes.txt", "r") as f:
for line in f:
print(line.strip()) # strip() removes the trailing newline
# append — add to the end instead of overwriting
with open("notes.txt", "a") as f:
f.write("Third line\n")
// Java — try-with-resources is Java's 'with'
try (var writer = Files.newBufferedWriter(Path.of("notes.txt"))) {
writer.write("First line\n");
}
List<String> lines = Files.readAllLines(Path.of("notes.txt"));
// C++ — RAII closes the stream when it goes out of scope
#include <fstream>
std::ofstream out("notes.txt");
out << "First line\n";
std::ifstream in("notes.txt");
std::string line;
while (std::getline(in, line)) {
std::cout << line << "\n";
}
The pieces, named:
- Mode —
"r"read,"w"write (⚠️ truncates: an existing file is erased the instant you open it),"a"append. - The file handle (
f) — your connection to the file, with a moving cursor: reading advances it, which is why a secondf.read()returns nothing. with/ try-with-resources / RAII — guarantees the file closes even if the code inside throws. This is the exceptions page cleanup rule applied to its most common case. Unclosed files = buffered writes that never hit disk, locked files on Windows, resource leaks under load. Alwayswith.
Files can be missing, unreadable, or full-disk — file code is exception country:
try:
with open("config.txt") as f:
config = f.read()
except FileNotFoundError:
config = DEFAULT_CONFIG # a default is a fine recovery here
Structured files: CSV and JSON
Raw text is for humans; programs exchange structured data. Two formats dominate, and both map perfectly onto collections you already know:
CSV — tables as text
CSV (Comma-Separated Values) is the spreadsheet/export format: one row per line, commas between columns.
name,city,score
Asha,Mumbai,91
Rahul,Delhi,84
import csv
with open("scores.csv") as f:
for row in csv.DictReader(f): # each row becomes a dict!
print(row["name"], int(row["score"]))
with open("out.csv", "w", newline="") as f:
writer = csv.DictWriter(f, fieldnames=["name", "score"])
writer.writeheader()
writer.writerow({"name": "Meera", "score": 95})
Use the csv module, not line.split(",") — real CSVs contain quoted
commas ("Mumbai, India"), and hand-splitting them is a classic bug.
JSON — nested data as text
You met JSON in Level 0 as the language of APIs. It's also the standard file format for configs and data dumps — and it round-trips exactly to dicts and lists:
import json
settings = {"theme": "dark", "fontSize": 14, "recent": ["a.txt", "b.txt"]}
with open("settings.json", "w") as f:
json.dump(settings, f, indent=2) # dict → file
with open("settings.json") as f:
settings = json.load(f) # file → dict
print(settings["recent"][0]) # "a.txt"
dump/load (and Java's Jackson, C++'s nlohmann/json) is called
serialization — turning in-memory structures into bytes and back. The
concept scales from this 4-line config straight up to every API response in
Level 7.
Rule of thumb: CSV for flat tables (rows × columns, opens in Excel), JSON for nested anything (configs, records with lists inside).
Paths — where files actually are
open("notes.txt") means "in the current working directory" — which is
wherever the program was launched from, not where the script file lives.
This single fact causes a generation of "works in my editor, fails in the
terminal" confusion.
from pathlib import Path
p = Path("data") / "2026" / "scores.csv" # builds data/2026/scores.csv — / joins safely
p.parent.mkdir(parents=True, exist_ok=True) # create folders if missing
if p.exists():
rows = p.read_text().splitlines()
Use pathlib (Python) / Path (Java) / std::filesystem::path (C++)
instead of gluing strings with + — they handle the Windows-\ vs
Unix-/ separator difference and edge cases for you.
One more word you'll meet: encoding — how text becomes bytes on disk.
The answer is UTF-8, everywhere, always (open(..., encoding="utf-8")).
If you ever see Ashä instead of Asha, an encoding was mismatched.
Industry perspective
- Logs are files. Every server you'll ever run appends diagnostic lines to log files; reading/parsing them is daily backend life (Level 7), and log pipelines are a whole system-design topic (ingestion case study).
- Configs are files. JSON/YAML files configure every real deployment — Level 10 (Docker, Kubernetes, Terraform) is largely files describing infrastructure.
- Data engineering starts here. "Read this 10 GB CSV, clean it, write
JSON" is a real job interview task — the line-by-line iteration pattern
(not
f.read()!) is what makes it possible without 10 GB of RAM. - The file→database moment: the instant two users write the same file concurrently, or you need to search it fast, you've rediscovered why databases exist. Files are single-writer, scan-to-search; that's the boundary.
Common beginner mistakes
- Opening with
"w"to read-modify-write."w"truncates immediately — your data is gone before you read it. Read first, then write. - No
with— file never closed; writes may sit in a buffer and never reach disk if the program crashes. f.read()on huge files — loads everything into RAM. Iterate lines.- Building paths with string concatenation —
folder + "/" + namebreaks across OSes and edge cases; use path libraries. - Hand-parsing CSV/JSON with
splitand slicing — quoted commas, nested braces, escapes… use the standard parsers; they exist because everyone who hand-rolled one regretted it. - Storing secrets in code instead of files/env — passwords belong in config files excluded from git or environment variables (security), never hardcoded.
Interview perspective
Practice
Beginner
- Write a program that appends one diary line per run (with a timestamp), and another that prints all entries numbered. Run them several times.
- Save your gradebook dict from the Collections page to JSON; load it back; verify a student's average survives the round trip.
Intermediate
- Word frequency, for real: read any long text file, lowercase it, count
words with a dict, write the top 20 to
top_words.csv(word, count). Streams + collections + CSV in one program. - Make the diary robust: what happens on the very first run if the file
doesn't exist? Handle it cleanly with the
exceptions toolkit — once with try/except,
once with
Path.exists(). Which reads better here, and why?
Advanced
- Build a tiny key-value store:
kv.py set name Ashawrites to a JSON file,kv.py get namereads it back (command-line args:sys.argv). Then break it on purpose: what happens if two copies runsetat the same instant? Write one paragraph connecting what you saw to transactions.
Level 1 complete. You can now write real programs: variables, control flow, functions, objects, errors, collections and persistence. Next stop is Level 2 — Data Structures, where the question flips from "does it work?" to "how fast, and why?"