File Handling · PrepDeck

Reading and writing files — text, CSV and JSON — plus paths, encodings, and why 'with' is non-negotiable.

Making data outlive the program

Everything your program holds in variables and collections lives in RAM — and RAM is wiped the moment the program exits. File handling is how programs persist data to storage and read it back: the simplest form of memory that survives.

It's also the gateway drug to data work: logs, configs, CSV exports, JSON APIs — a huge share of real engineering is reading one file shape and writing another. (And when files stop being enough, you'll know exactly why databases exist.)

Reading and writing text

Python

# Python — write (creates or OVERWRITES the file)
with open("notes.txt", "w") as f:
    f.write("First line\n")
    f.write("Second line\n")

# read it all back
with open("notes.txt", "r") as f:
    content = f.read()

# or the memory-safe way: line by line
with open("notes.txt", "r") as f:
    for line in f:
        print(line.strip())      # strip() removes surrounding whitespace, including the trailing newline

# append — add to the end instead of overwriting
with open("notes.txt", "a") as f:
    f.write("Third line\n")

Java

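// Java: try-with-resources is Java's 'with'
// Needs: import java.nio.file.*; import java.util.List;
// Both calls throw IOException, so the enclosing method must handle or declare it.
try (var writer = Files.newBufferedWriter(Path.of("notes.txt"))) {
    writer.write("First line\n");
}   // writer is closed here, even if write() throws
List<String> lines = Files.readAllLines(Path.of("notes.txt"));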

C++

// C++: RAII closes the stream when it goes out of scope
#include <fstream>
#include <iostream>
#include <string>

int main() {
    {
        std::ofstream out("notes.txt");
        out << "First line\n";
    }   // out is destroyed here: buffer flushed, file closed

    std::ifstream in("notes.txt");
    std::string line;
    while (std::getline(in, line)) {
        std::cout << line << "\n";
    }
}

The pieces, named:

Mode: "r" read, "w" write (⚠️ truncates: an existing file is erased the instant you open it), "a" append.
The file handle (f): your connection to the file, with a moving cursor. Reading advances it, which is why a second f.read() returns an empty string.
with / try-with-resources / RAII: guarantees the file closes even if the code inside throws. This is the cleanup rule from the exceptions page applied to its most common case. Unclosed files mean buffered writes that never hit disk, locked files on Windows, and resource leaks under load. Always with.
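The cursor and the closing guarantee can both be observed directly. The sketch below is only an illustration: the scratch path and the simulated failure are assumptions made for the demo, not part of the examples above.

```python
import os
import tempfile

# Hypothetical scratch file so the demo does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("First line\nSecond line\n")

with open(path, "r", encoding="utf-8") as f:
    first = f.read()     # cursor moves to the end of the file
    second = f.read()    # nothing left to read: empty string
    f.seek(0)            # rewind the cursor to the start
    third = f.read()     # the full content again

# 'with' closes the file even when the body raises.
try:
    with open(path, "r", encoding="utf-8") as f:
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

closed_after_error = f.closed   # the handle still exists, but it is closed
```

seek(0) is rarely needed in everyday code; it appears here only to show that the empty second read comes from the cursor position, not from the file being empty.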

Files can be missing or unreadable, and the disk can be full, so file code is exception country:

Python

DEFAULT_CONFIG = ""                  # hypothetical fallback; use whatever default fits your program

try:
    with open("config.txt") as f:
        config = f.read()
except FileNotFoundError:
    config = DEFAULT_CONFIG          # a default is a fine recovery here

Structured files: CSV and JSON

Raw text is for humans; programs exchange structured data. Two formats dominate, and both map perfectly onto collections you already know:

CSV — tables as text

CSV (Comma-Separated Values) is the spreadsheet/export format: one row per line, commas between columns.

name,city,score
Asha,Mumbai,91
Rahul,Delhi,84

Python

import csv

with open("scores.csv", newline="") as f:  # newline="" lets the csv module handle line endings itself
    for row in csv.DictReader(f):          # each row becomes a dict!
        print(row["name"], int(row["score"]))

with open("out.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerow({"name": "Meera", "score": 95})

Use the csv module, not line.split(",") — real CSVs contain quoted commas ("Mumbai, India"), and hand-splitting them is a classic bug.
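A small comparison makes the quoted-comma problem concrete. This is a sketch on a made-up two-line sample (the data is an assumption for illustration), parsed from a string with io.StringIO so no file is needed.

```python
import csv
import io

# Hypothetical sample: the city field contains a comma, so it is quoted.
raw = 'name,city,score\nAsha,"Mumbai, India",91\n'

data_line = raw.splitlines()[1]
naive = data_line.split(",")               # the quoted field is cut in two

rows = list(csv.reader(io.StringIO(raw)))  # the csv module understands the quotes
parsed = rows[1]

print(naive)    # ['Asha', '"Mumbai', ' India"', '91']
print(parsed)   # ['Asha', 'Mumbai, India', '91']
```

The naive split yields four pieces for a three-column row, so every column after the city shifts by one. That kind of error tends to surface far from where it was introduced.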

JSON — nested data as text

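You met JSON in Level 0 as the language of APIs. It's also the standard file format for configs and data dumps, and it maps directly onto dicts and lists: strings, numbers, booleans, lists and string-keyed dicts come back unchanged, while tuples come back as lists and non-string keys come back as strings.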

Python

import json

settings = {"theme": "dark", "fontSize": 14, "recent": ["a.txt", "b.txt"]}

with open("settings.json", "w") as f:
    json.dump(settings, f, indent=2)       # dict → file

with open("settings.json") as f:
    settings = json.load(f)                # file → dict
print(settings["recent"][0])               # "a.txt"

dump/load (and Java's Jackson, C++'s nlohmann/json) is called serialization — turning in-memory structures into bytes and back. The concept scales from this 4-line config straight up to every API response in Level 7.
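Serialization is easiest to see on a string rather than a file. The sketch below uses json.dumps and json.loads, the string counterparts of dump and load; the record is a made-up example chosen to show the two places where the round trip is not exact.

```python
import json

# Hypothetical record with a tuple value and an integer key.
record = {"name": "Asha", "scores": (91, 84), 1: "first"}

text = json.dumps(record)      # in-memory structure -> JSON text
restored = json.loads(text)    # JSON text -> in-memory structure

print(text)                    # {"name": "Asha", "scores": [91, 84], "1": "first"}
print(restored["scores"])      # [91, 84]  (the tuple came back as a list)
print("1" in restored)         # True      (the integer key came back as a string)

# Plain dicts of strings, numbers and lists survive unchanged.
plain = {"theme": "dark", "fontSize": 14, "recent": ["a.txt", "b.txt"]}
unchanged = json.loads(json.dumps(plain)) == plain
```

For config files and API payloads this rarely matters, since they already consist of strings, numbers, lists and string-keyed objects.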

Rule of thumb: CSV for flat tables (rows × columns, opens in Excel), JSON for nested anything (configs, records with lists inside).

Paths — where files actually are

open("notes.txt") means "in the current working directory", which is wherever the program was launched from, not where the script file lives. This single fact is behind most "works in my editor, fails in the terminal" confusion.
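One common way around this is to build paths relative to the script itself rather than the working directory. The helper below is a hypothetical sketch (beside_script is not a standard function); it takes the script's path as an argument so the idea can be tried with any path.

```python
from pathlib import Path


def beside_script(script_file: str, name: str) -> Path:
    """Return the path of `name` in the same folder as the given script."""
    return Path(script_file).resolve().parent / name


# Where the program was launched from: this is what a bare open("notes.txt") uses.
launch_dir = Path.cwd()

# Independent of the launch directory: always next to the script.
example = beside_script("/tmp/app/main.py", "notes.txt")
```

Inside a real script the call would be beside_script(__file__, "notes.txt"), since Python sets __file__ to the script's own path when it runs from a file.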

Python

from pathlib import Path

p = Path("data") / "2026" / "scores.csv"   # builds data/2026/scores.csv — / joins safely
p.parent.mkdir(parents=True, exist_ok=True)  # create folders if missing
if p.exists():
    rows = p.read_text().splitlines()

Use pathlib (Python) / Path (Java) / std::filesystem::path (C++) instead of gluing strings with + — they handle the Windows-\ vs Unix-/ separator difference and edge cases for you.

One more word you'll meet: encoding, meaning how text becomes bytes on disk. The safe default is UTF-8: pass it explicitly (open(..., encoding="utf-8")), because Python's default can depend on the platform. If you ever see AshÃ¤ instead of Ashä, the text was written in one encoding and read in another.
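An encoding mismatch can be reproduced entirely in memory, without touching a file. In this sketch the name is an illustrative value, and Latin-1 stands in for whatever wrong encoding a reader might assume.

```python
original = "Ashä"

on_disk = original.encode("utf-8")     # b'Ash\xc3\xa4': the ä takes two bytes
garbled = on_disk.decode("latin-1")    # wrong decoder: each byte becomes its own character
correct = on_disk.decode("utf-8")      # matching decoder: the text is recovered

print(garbled)   # AshÃ¤
print(correct)   # Ashä
```

The bytes on disk are identical in both cases; only the reader's assumption differs. That is why the usual fix is to state the encoding explicitly on both the write and the read.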

Industry perspective

Logs are files. Every server you'll ever run appends diagnostic lines to log files; reading/parsing them is daily backend life (Level 7), and log pipelines are a whole system-design topic (ingestion case study).
Configs are files. JSON/YAML files configure every real deployment — Level 10 (Docker, Kubernetes, Terraform) is largely files describing infrastructure.
Data engineering starts here. "Read this 10 GB CSV, clean it, write JSON" is a real job interview task — the line-by-line iteration pattern (not f.read()!) is what makes it possible without 10 GB of RAM.
The file→database moment: the instant two users write the same file concurrently, or you need to search it fast, you've rediscovered why databases exist. Files are single-writer, scan-to-search; that's the boundary.

Common beginner mistakes

Opening with "w" to read-modify-write. "w" truncates immediately — your data is gone before you read it. Read first, then write.
No with — file never closed; writes may sit in a buffer and never reach disk if the program crashes.
f.read() on huge files — loads everything into RAM. Iterate lines.
Building paths with string concatenation — folder + "/" + name breaks across OSes and edge cases; use path libraries.
Hand-parsing CSV/JSON with split and slicing — quoted commas, nested braces, escapes… use the standard parsers; they exist because everyone who hand-rolled one regretted it.
Storing secrets in code instead of files/env — passwords belong in config files excluded from git or environment variables (security), never hardcoded.
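The first mistake in the list has a simple safe ordering: read, modify in memory, then write. The sketch below assumes a file small enough to fit in RAM and uses a hypothetical scratch path; the uppercase step is just a stand-in for any modification.

```python
import os
import tempfile

# Hypothetical scratch file with some starting content.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# Step 1: read everything while the file is still intact.
with open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()

# Step 2: modify in memory.
lines = [line.upper() for line in lines]

# Step 3: only now open with "w", which truncates, and write the result.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(lines)

with open(path, "r", encoding="utf-8") as f:
    result = f.read()    # "FIRST LINE\nSECOND LINE\n"
```

If the program crashes between truncation and the end of step 3, the file is left incomplete. That gap is one more reason files stop being enough once the data really matters.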

Think it through

The interview reflex this page builds is "stream, don't slurp." Reason through the canonical big-file task before revealing — the fix is the same accumulator pattern from Control Flow, now over a file too big for RAM.

Think it through: Scan a 50 GB log on 8 GB of RAM
Beginner: streaming + accumulators

PROBLEM: A 50 GB server log won't fit in memory. In one pass, count how many lines contain 'ERROR' and report the most frequent trailing error code.

1. Spot the trap: “What's the obvious-but-fatal first instinct here?”
2. Stream + accumulate: “If I can't keep the file, what CAN I keep?”
3. Stay safe with 'with': “What wraps the loop, and why?”
4. Code it: “with + a streaming loop + two accumulators.”
5. Cost & when to preprocess: “Cost, and when does one-pass streaming stop being enough?”
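Once you have reasoned through the stages, one possible shape for the "code it" step is sketched below. It is not the page's official solution. It assumes the trailing error code is the last whitespace-separated token on each ERROR line, and scan_log is a hypothetical name; it accepts any iterable of lines, so an open file handle works without loading the file.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def scan_log(lines: Iterable[str]) -> Tuple[int, Optional[str]]:
    """Count ERROR lines and find the most frequent trailing code in one pass."""
    error_count = 0          # accumulator 1: a single integer
    codes = Counter()        # accumulator 2: one entry per distinct code
    for line in lines:       # only one line is held in memory at a time
        if "ERROR" in line:
            error_count += 1
            codes[line.split()[-1]] += 1   # assumed format: code is the last token
    top = codes.most_common(1)
    return error_count, (top[0][0] if top else None)


# Made-up sample lines standing in for the real log.
sample = [
    "2026-01-01 INFO service started\n",
    "2026-01-01 ERROR disk failure E42\n",
    "2026-01-01 ERROR network timeout E17\n",
    "2026-01-02 ERROR disk failure E42\n",
]
summary = scan_log(sample)   # (3, 'E42')
```

On the real file the call would sit inside a with block, for example with open("server.log", encoding="utf-8") as f: followed by scan_log(f), where server.log is a placeholder name. Memory use then grows with the number of distinct codes, not with the size of the file.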

Check yourself

1. Why is `with open(...) as f:` strongly preferred over a bare open()?

2. You need to read-modify-write a file. Why is opening it with mode 'w' first a bug?

3. Processing a 50 GB file on 8 GB of RAM — the key technique is:

4. CSV vs JSON — which rule of thumb is correct?

Interview perspective

Practice

Beginner

Write a program that appends one diary line per run (with a timestamp), and another that prints all entries numbered. Run them several times.
Save your gradebook dict from the Collections page to JSON; load it back; verify a student's average survives the round trip.

Intermediate

Word frequency, for real: read any long text file, lowercase it, count words with a dict, write the top 20 to top_words.csv (word, count). Streams + collections + CSV in one program.
Make the diary robust: what happens on the very first run if the file doesn't exist? Handle it cleanly with the exceptions toolkit — once with try/except, once with Path.exists(). Which reads better here, and why?

Advanced

Build a tiny key-value store: kv.py set name Asha writes to a JSON file, kv.py get name reads it back (command-line args: sys.argv). Then break it on purpose: what happens if two copies run set at the same instant? Write one paragraph connecting what you saw to transactions.

Level 1 complete. You can now write real programs: variables, control flow, functions, objects, errors, collections and persistence. Next stop is Level 2 — Data Structures, where the question flips from "does it work?" to "how fast, and why?"