The OSI Model: Your Mental Framework for Network Troubleshooting

Introduction: When All Else Fails, Layer by Layer

This week, I found myself knee-deep in troubleshooting Grafana Labs telemetry containers (Loki, Mimir, Prometheus, Tempo, the OpenTelemetry Collector, Spring Boot services, and so on) that refused to start properly in an OpenShift cluster. The workloads looked good and the containers were running, but the telemetry data simply wouldn’t flow. Sound familiar?

After hours of fruitless searching, and after AI assistance hit a dead end too (yes, even Claude couldn’t crack it), I realized I needed to fall back on principles I learned long ago. That’s when I decided to document the mental framework I have developed over years of debugging network issues.

This article presents my Network Troubleshooting Matrix—a systematic method based on the OSI model that takes you from symptoms to root cause, layer by layer. Feel free to use it as a reference, or create your own.

My Network Troubleshooting Matrix

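Here’s the systematic approach I use for diagnosing network connectivity issues, organized by OSI layer. The table lists the layers from Layer 1 up to Layer 7, although I work through it in the opposite direction, starting at the application layer: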

| OSI Layer | Step | Common Issues | Tools | Example Command | Focus Area |
| --- | --- | --- | --- | --- | --- |
| Layer 1 – Physical | Physical Layer | Cable unplugged, bad NIC, interface down, link negotiation failure, hardware error | ethtool, ip | `ethtool eth0` | Hardware / Interface |
| Layer 2-3 – Data Link / Network | Basic Connectivity | IP conflict, wrong gateway, no link, DHCP failure, interface down, no default route | ip, ping, netstat | `ip addr show` | Local Connectivity / IP Configuration |
| Layer 2-4 – Data Link / Network / Transport | Packet Analysis | Packet loss, TCP retransmissions, protocol issues, connection resets, latency spikes | tcpdump, netstat | `tcpdump -i eth0 port 80` | Packet Inspection / Protocol Analysis |
| Layer 3-4 – Network / Transport | Container Networking | Pod communication failure, service discovery issues, CNI problems, overlay network | kubectl, docker, nc, ping | `kubectl get pods -o wide` | Container Orchestration |
| Layer 3-4 – Network / Transport | Routing & Port Reachability | Firewall block, server unreachable, port closed, routing loop, no route to host | traceroute, nc, nmap, ping, ip | `traceroute example.com` | Routing / Firewall / Connectivity |
| Layer 4-7 – Transport / Application | Load Balancer / Proxy | Health check failures, upstream timeouts, SSL termination issues, 502/504 errors | curl, wget, openssl | `curl -I https://example.com` | Load Balancing / Reverse Proxy |
| Layer 7 – Application | Identify Problem | Misconfiguration, app changes, deployment errors, unclear symptoms | Review logs and symptoms | Check recent deployments and changes | Problem Definition / Operations |
| Layer 7 – Application | Application Layer | Wrong endpoint, auth failure, invalid SSL cert, API timeout, HTTP errors | curl, wget, openssl | `curl -v https://api.example.com` | Application / API Communication |
| Layer 7 – Application | DNS Resolution | DNS lookup failure, misconfigured nameserver, stale cache, NXDOMAIN | dig, nslookup, host | `dig example.com` | DNS / Name Resolution |
| Layer 7 – Application | Server / Service Health | Service stopped, port not listening, connections refused, service binding issues | netstat, nc, nmap | `netstat -tuln` | Server / Infrastructure |
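
If you would rather script part of the matrix than type each command by hand, here is a minimal Python sketch of two of its steps, run in order and stopping at the first failure. It is only an illustration, not a replacement for the tools in the table: the function names (`check_dns`, `check_port`, `first_failing_step`) are my own, it covers just the "DNS Resolution" and "Routing & Port Reachability" rows, and it assumes a plain TCP service.

```python
import socket
from typing import Callable, List, Optional, Tuple


def check_dns(host: str) -> bool:
    """DNS Resolution step: can the name be resolved at all?"""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Routing & Port Reachability step: does a TCP connect succeed?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "no route to host".
        return False


def first_failing_step(host: str, port: int) -> Optional[str]:
    """Run the checks in order and return the first matrix step that fails.

    Returns None when every check in this (deliberately short) list passes.
    """
    checks: List[Tuple[str, Callable[[], bool]]] = [
        ("DNS Resolution", lambda: check_dns(host)),
        ("Routing & Port Reachability", lambda: check_port(host, port)),
    ]
    for step, check in checks:
        if not check():
            return step
    return None
```

Calling `first_failing_step("loki.example.internal", 3100)` (a made-up host and port) returns the name of the row to look at first, or `None` if both checks pass, in which case the problem probably sits in one of the rows this sketch does not cover.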

Why do I start from Layer 7? The higher you go in the OSI stack, the more semantic and human-readable the information becomes. An error like “API request returned 403 Forbidden” is much easier to interpret than something like “dropped packets on eth0.” Starting from the application layer gives you immediate clues about intent, configuration, and context before you dive into the lower-level network details.
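
To make that concrete, here is a rough Python sketch of how the error a client sees at the application layer already hints at which row of the matrix to open first. The mapping is my own heuristic and an assumption, not a rule: a timeout, for example, often points to a firewall or routing problem, but it can just as well be an overloaded service. The function name `likely_step` is hypothetical.

```python
import socket
import ssl
import urllib.error


def likely_step(exc: BaseException) -> str:
    """Guess which matrix step to check first, given a client-side error.

    Order matters: most of these exception types are subclasses of OSError,
    and HTTPError is a subclass of URLError, so the specific ones come first.
    """
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, so the network path works; read the status code.
        return f"Application Layer (HTTP {exc.code})"
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        # urllib wraps the underlying socket error; unwrap and classify it.
        return likely_step(exc.reason)
    if isinstance(exc, socket.gaierror):
        return "DNS Resolution"
    if isinstance(exc, ssl.SSLError):
        return "Application Layer (TLS / certificate)"
    if isinstance(exc, ConnectionRefusedError):
        # The host was reached, but nothing is listening on that port.
        return "Server / Service Health"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        # Heuristic: silently dropped packets usually surface as timeouts.
        return "Routing & Port Reachability"
    return "Identify Problem"
```

You could call it from the `except` block around a failing `urllib.request.urlopen(...)` call to get a first pointer into the matrix before reaching for `tcpdump`.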
