Paul Butler – What does it mean to listen on a port?
This story explores some concepts in computer networking, inspired by Michael Nielsen’s idea of discovery fiction[1]. Code samples can also be found in this repo[2]. Excerpts use openbsd-flavoured netcat on Debian Linux; behaviour and IPv6 support may vary by version.
In the corner of the student union building there is a coffee shop, and in the corner of the coffee shop are two students. Liz taps away at the keyboard of the battered hand-me-down MacBook her brother gave her when she moved away to college. To her left on the bench seat, Tim scrawls equations on a coil-bound notebook. Between them is a half-empty cup of room temperature coffee that Liz sporadically sips from to stay awake.
Across the room, the barista looks up from his phone to glance around the shop. He has one headphone in, the other dangling, while his phone plays the assigned viewing for his film class. It’s an unwritten rule at the student-run shop that employees on the overnight shift can use the long gaps between customers to catch up on homework. Besides Tim and Liz, two other male students sit alone, staring intently at their laptops, as they have for hours. Otherwise, the shop is empty.
Tim stops writing in the middle of a pencil stroke, rips the sheet out of his notebook, crumples it, and puts it next to a small collection of other crumpled up sheets.
“Shit, what time is it?” he asks.
Liz peers at the clock on her laptop. “Just after two”
Tim yawns and starts scrawling again at the top of a new page, but Liz interrupts him.
“Tim”
“What?”, replies Tim, exaggerating his irritation at being interrupted just as he was starting to write.
“What does it mean to listen on a port?”
“Huh”
“I have to write this web server thing for net”, she says, abbreviating Computer Networks 201, a class that Tim had taken the prior semester.
“Yeah, I remember that one”
“So I listen for connections on a port”
“Port 80”, Tim confidently replies, hoping to cut the conversation short by preempting her question.
“Actually, we’re supposed to listen on 8080 so it can run without root, but that’s not the point.”
“Oh right. Then what is?”
“Well, what does it mean to listen on a port?”
“It means that other processes can connect to it on that port.” Tim looks confused at the question.
“Yeah, I know that, but how?”
Tim considers it for a few seconds before replying.
“I guess the operating system has a big table of ports and the processes listening on them. When you bind to a port, it puts a pointer to your socket in that table.”
“Yeah, I guess.” says Liz, with a hesitant, dissatisfied tone.
The pair return to their independent work. After some time in silence, Tim mutters a triumphant “yes!” under his breath and crosses off a number on a printed piece of paper. He had finally found a proof he’d been struggling with for his calculus assignment.
Liz takes the opportunity to get his attention again.
“Hey Tim, look, I’m running two processes bound to the same port at the same time.”
She resizes two windows containing Python code:
    # server1.py
    import socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('127.0.0.1', 8080))
    sock.listen()
    print(sock.accept())
And next to it:
    # server2.py
    import socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('127.0.0.1', 8080))
    print(sock.recv(1024))
Then she shows him both programs running in their own terminal windows, through a shell connection to the university’s cslab3 Debian server.
Tim pivots the laptop towards himself. He opens a third terminal, pauses for a moment to search his tired brain, and types netcat 127.0.0.1 8080.
netcat runs and immediately exits. In another terminal window, the running python server1.py program exits, printing:
(<socket.socket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 8080), raddr=('127.0.0.1', 59558)>, ('127.0.0.1', 59558))
He studies the server1.py code, thinking aloud.
“Ok, the server binds to a port, accepts the first socket to connect to it, and then exits. I see, so the tuple it printed was the result of the accept call, and then it immediately exits. But now…”, moving the mouse cursor over the editor displaying server2.py, “…is this one even listening?”
He runs netcat 127.0.0.1 8080 -v in the same terminal as before, and it prints out:
netcat: connect to 127.0.0.1 port 8080 (tcp) failed: Connection refused
“See”, he says, “there’s a bug in your code. server2 is still running, but you never call listen. It’s not actually doing anything with port 8080.”
“Sure it is, watch”, Liz says, snatching back her laptop.
She adds a -u at the end of the netcat command and hits enter. This time, it doesn’t give an error or exit immediately, but waits for keyboard input. Annoyed that Tim had been so quick to assume her code was buggy, she taps out timmy, knowing the nickname bugs him.
The netcat session ends silently, and simultaneously, the python server2.py program exits, printing:
b'timmy\n'
Tim recognizes Liz’s attempt to antagonize him but ignores it, not wanting to give her the satisfaction of getting a rise out of him. He gestures towards the keyboard. Liz twists the laptop in his direction and he types out man netcat to bring up the manual[3] for netcat, which describes the tool as the “TCP/IP swiss army knife”. He scrolls down to the -u flag, which is tersely described by the documentation as “UDP mode”.
“Ah”, he says as a flash of recollection hits him. “I get it. server1 is listening over TCP, and server2 is listening over UDP. That must be what SOCK_DGRAM means. So they’re different protocols. I guess the operating system has a separate table of ports for each one. I didn’t think net covered UDP until later.”
“Yeah. I read ahead.”
“Natch. How is it that you have time to read ahead but not have time to get these assignments done before the morning they’re due?”
“I could ask you the same about Counter Strike”, Liz counters.
Tim grumbles.
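Tim’s guess that TCP and UDP each get their own set of ports can be checked from a single process. The following is a minimal sketch, not one of Liz’s programs: the helper name bind_tcp_and_udp is made up for illustration, and it lets the operating system pick a free TCP port rather than assuming 8080 is available.

```python
import socket


def bind_tcp_and_udp(host="127.0.0.1"):
    """Bind one TCP and one UDP socket to the same port number on `host`."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.bind((host, 0))  # port 0 asks the OS for any free TCP port
    tcp.listen()
    port = tcp.getsockname()[1]

    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Same number, different protocol. This can still fail in the rare
    # case that some other program already holds this UDP port.
    udp.bind((host, port))
    return tcp, udp


tcp, udp = bind_tcp_and_udp()
print("TCP bound to", tcp.getsockname())
print("UDP bound to", udp.getsockname())
tcp.close()
udp.close()
```

Both lines should report the same port number, which is consistent with the two protocols keeping separate bookkeeping.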
They go back to working in silence for several more minutes before Liz breaks the silence.
“Hey Tim, look at this. I can listen on the same port with two processes, even if they’re both TCP.”
Tim looks up from his work. This time Liz has just one Python program on the screen, and is running it in two terminals:
    # server3.py
    import socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(('127.0.0.1', 8080))
    sock.listen()
    print(sock.accept())
Liz explains, “See, this command shows what process is listening on a port.” She types out lsof -i:8080 and hits return.
The program prints:
    > lsof -i:8080
    COMMAND    PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
    python3 174265  liz   3u   IPv4 23850797      0t0  TCP localhost:http-alt (LISTEN)
    python3 174337  liz   3u   IPv4 23853188      0t0  TCP localhost:http-alt (LISTEN)
“What happens when you connect to it?”, asks Tim, this time with a bit of genuine curiosity in his voice.
“Watch.”
Liz runs netcat localhost 8080 once, and one of the server processes exits while the other stays running. Then she runs it again, and the other process exits.
Tim’s attention turns to the code, and he puts his finger near the screen to read over it. Liz, who hates a smudged screen, says “easy there!” and pushes his hand back. “I wouldn’t touch it”, he protests. Making an exaggerated show of keeping his hand a safe distance back, he points to the setsockopt line and asks “Hey, what is this sorcery?”
“That’s setting a socket option to allow the port to be reused”
“Huh, that’s in the textbook?”
“Donno, I found it on Stack Overflow.”
“I didn’t know you could reuse a port like that.”
“I didn’t either” She pauses to consider. “So the operating system can’t just have a table of ports to sockets, it has to be a table of ports to a list of sockets. And then a second one for UDP. Maybe more for other protocols.”
“Yeah, that sounds right”, Tim agrees.
“Hmm”, says Liz, suddenly sounding less sure.
“What?”
“Uh, never mind”, she says, as she starts tapping away intently.
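The model Liz has just described, a table per protocol that maps each port to a list of sockets, might be sketched as a toy Python class. Everything here is hypothetical: ToyListenTable, its methods, and especially the rule for picking a listener by client port are invented to mimic what the experiments showed, and none of it is how a real kernel is written.

```python
from collections import defaultdict


class ToyListenTable:
    """Toy model: (protocol, port) -> list of (name, reuseport) listeners."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def bind(self, protocol, port, name, reuseport=False):
        existing = self._listeners[(protocol, port)]
        # Sharing is allowed only if every socket involved opted in.
        if existing and not (reuseport and all(r for _, r in existing)):
            raise OSError(98, "Address already in use")
        existing.append((name, reuseport))

    def close(self, protocol, port, name):
        key = (protocol, port)
        self._listeners[key] = [(n, r) for n, r in self._listeners[key] if n != name]

    def pick(self, protocol, port, client_port):
        """Choose a listener for a new client (made-up policy: port modulo count)."""
        listeners = self._listeners.get((protocol, port))
        if not listeners:
            raise ConnectionRefusedError(111, "Connection refused")
        return listeners[client_port % len(listeners)][0]


table = ToyListenTable()
table.bind("tcp", 8080, "first", reuseport=True)
table.bind("tcp", 8080, "second", reuseport=True)
table.bind("udp", 8080, "datagram")  # separate protocol, so no conflict

print(table.pick("tcp", 8080, client_port=59558))  # first
table.close("tcp", 8080, "first")  # like server3.py exiting after accept()
print(table.pick("tcp", 8080, client_port=59560))  # second
```

Keying the table on the protocol as well as the port is what lets the UDP bind succeed without any socket option.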
Tim returns to his assignment, and after a few minutes pass, he crosses another question off. He is getting close to done, and relaxes his stance a bit. Liz tilts her laptop towards him and says “check this out”. She shows him two programs.
    # server4.py
    import socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('127.0.0.2', 8080))
    sock.listen()
    print(sock.accept())
and next to it,
    # server5.py
    import socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('127.0.0.3', 8080))
    sock.listen()
    print(sock.accept())
“Aren’t these the same”, Tim asks, studying them.
“Look at the bind IP”
“Oh, so you’re listening on the same port but two different IPs. And that works?”
“Seems to. And I can connect to both of them.”
Liz runs netcat 127.0.0.2 8080 and then netcat 127.0.0.3 8080 to show him.
Tim ponders. “So let’s see. The operating system must have a table from each port and IP combination, to a socket. Well actually, two: one for TCP and another for UDP.”
“Yeah”, Liz nods. “And instead of just one socket, it can be multiple. But watch this.” She changes the IP in the server code to 0.0.0.0.
    # server6.py
    import socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('0.0.0.0', 8080))
    sock.listen()
    print(sock.accept())
“Now, when I run the server that binds to 127.0.0.2, I get this”, she continues.
    Traceback (most recent call last):
      File "server4.py", line 4, in <module>
        sock.bind(('127.0.0.2', 8080))
    OSError: [Errno 98] Address already in use
“But”, she concludes, “if I run netcat 127.0.0.2 8080, it connects to the server on 0.0.0.0”, and shows him.
“Right, 0.0.0.0 means ‘bind to all local IPs’, didn’t lecture cover that? And addresses that start with 127. are local loopback IPs, so it makes sense that they’d be bound by it.”
“Yeah, but how does it work? There are about 16 million IPs that start with 127. It’s not making a big table with all of them, right?”
“I guess not.” He doesn’t have an answer, and changes the subject. “So anyway, how’s the HTTP server going?” It’s rhetorical; he knows she hasn’t written a line of actual assignment code.
“Yeah, yeah” she replies, already diving into another experiment.
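Liz’s worry about 16 million loopback entries has at least one plausible way out: keep a single wildcard entry and consult it only when no exact match exists. The sketch below is an assumption about how such a lookup could work, not a description of what Linux actually does. The names find_listener and WILDCARD and the table contents are invented for illustration, and the sketch does not check that the destination IP is really a local one.

```python
WILDCARD = "0.0.0.0"


def find_listener(table, dst_ip, dst_port):
    """Look up (ip, port) exactly, then fall back to the wildcard entry."""
    for key in ((dst_ip, dst_port), (WILDCARD, dst_port)):
        if key in table:
            return table[key]
    return None  # nothing listening; a real stack would refuse the connection


# One entry stands in for every local IPv4 address on port 8080.
table = {(WILDCARD, 8080): "server6"}

print(find_listener(table, "127.0.0.2", 8080))  # server6
print(find_listener(table, "127.0.0.3", 8080))  # server6
print(find_listener(table, "127.0.0.2", 9090))  # None
```

With this shape, a wildcard bind costs one table entry and each lookup costs at most two probes, however many addresses the wildcard covers.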
More time passes. Tim, having just completed his assignment, idly checks the time on his phone. He considers going home to his lumpy dorm mattress. He assesses that the bench is just about as comfortable, and tilts his head back against its tall cushion seat-back.
He is staring at the ceiling, half-asleep, when Liz pokes him saying “Tim, look at this”
She shows him another program:
    # server7.py
    import socket
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.bind(('::', 8080))
    sock.listen()
    print(sock.accept())
“Check this out. It’s an IPv6 server”
Tim yawns and leans in. By now, the morning sun is starting to appear through the window behind the bench they sit on. The two other students had quietly left in the early hours of the morning, and the shop’s first customer of the day has arrived and is waiting for her to-go coffee.
“What are the colons, again?” Tim asks.
“It’s a short form for eight groups of zeros in IPv6, it has the same meaning as 0.0.0.0 in IPv4”
“So this is saying to listen on all local IPv6 IPs? Is that how IPv6 works?”
“Yeah, basically.”
She types netcat "::1" 8080 -v, explaining “::1 is a loopback address in IPv6. It’s like ‘home’.”
“So like 127.0.0.1 in, uh, regular IPs”
“IPv4. Yeah, exactly. But watch this. According to lsof, I’m only listening on IPv6, see?” Liz runs lsof -i :8080, which prints one row.
    COMMAND    PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
    python3 455017  liz   3u   IPv6 25152485      0t0  TCP *:http-alt (LISTEN)
“But”, Liz continues, “I can connect to it through an IPv4 IP.”
netcat 127.0.0.1 8080 -v
“Huh”, Tim murmurs. “What about the other way? Can you connect to an IPv4 server from an IPv6 IP?”
“No, watch this.”
She runs python3 server6.py, and then netcat "::1" 8080 -v, which prints
netcat: connect to ::1 port 8080 (tcp) failed: Connection refused
Tim asks “What happens if you try to start listening on 8080 on IPv6 when that IPv4 server is still running?”
Liz shows him, running python server7.py.
    Traceback (most recent call last):
      File "server7.py", line 4, in <module>
        sock.bind(('::', 8080))
    OSError: [Errno 98] Address already in use
“But look at this”, she says, pulling up another code listing.
    # server8.py
    import socket
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
    sock.bind(('::', 8080))
    sock.listen()
    print(sock.accept())
She points to the setsockopt line, explaining “When I add this, I can listen on IPv6 and IPv4 on the same port, from different processes.”
She runs python server8.py, and then lsof -i :8080.
    COMMAND    PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
    python3 460409  liz   3u   IPv6 25188010      0t0  TCP *:http-alt (LISTEN)
    python3 460813  liz   3u   IPv4 25191765      0t0  TCP *:http-alt (LISTEN)
Tim takes inventory of what Liz has shown him. “So when you listen on a port, you’re really listening on a combination of a port, an IP, a protocol, and an IP version?”
“Yeah, unless you listen on all local IPs. And if you listen on all IPv6 IPs, you also listen on all IPv4 IPs, unless you specifically ask not to before you call bind.”
“Right. So the operating system must have, like, a hash map from a port and IP pair to a socket, for each combination of TCP or UDP, IPv4 or IPv6.”
“To a list of sockets”, Liz corrects. “Remember how I could listen on more than one?”
“Oh yeah.”
“But it also has to handle listening on all ‘home’ IPs, and to be able to find a socket listening on IPv6 from an IPv4 IP.”
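One standard piece of that last puzzle is the IPv4-mapped IPv6 address: a dual-stack IPv6 socket sees an IPv4 peer such as 127.0.0.1 as ::ffff:127.0.0.1. The sketch below uses Python’s ipaddress module to show the mapping in both directions. The helper to_ipv4_mapped is hypothetical, and nothing here claims to show how the kernel performs its lookup.

```python
import ipaddress


def to_ipv4_mapped(addr):
    """Return the IPv4-mapped IPv6 form (::ffff:a.b.c.d) of an IPv4 address."""
    v4 = ipaddress.IPv4Address(addr)
    # 80 zero bits, then 16 one bits, then the 32-bit IPv4 address.
    return ipaddress.IPv6Address(b"\x00" * 10 + b"\xff\xff" + v4.packed)


mapped = to_ipv4_mapped("127.0.0.1")
print(mapped == ipaddress.IPv6Address("::ffff:127.0.0.1"))  # True
print(mapped.ipv4_mapped)  # 127.0.0.1

# An ordinary IPv6 address has no IPv4 counterpart.
print(ipaddress.IPv6Address("::1").ipv4_mapped)  # None
```

Because the mapping only runs from IPv4 into IPv6, it fits what Liz saw: the IPv6 server could accept an IPv4 client, but the IPv4 server could not accept a connection to ::1.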
“Anyway, I have to hand this in”, Tim says, gesturing to the loose collection of papers in his hand. “Are you going to finish that HTTP server before it’s due?”
Liz shrugs “I have a spare late day to use.”
Tim shakes his head in mock paternal disapproval.
Liz rolls her eyes and says “run along, Tim”
“Same time next week?”
“Yeah.”