A Short Introduction to Troubleshooting Docker Networks
I recently built an unRAID rig, which is now deployed to my DMZ. It’s great, but periodically my FileZilla Docker container is unable to connect to my FTP server, erroring out with “ENETUNREACH – Network unreachable”. Large file moves (like moving whole directories) seem to make it worse.
I started by researching the error message, but there isn’t a whole lot written about “ENETUNREACH – Network unreachable”, presumably because “Network unreachable” is self-explanatory. (ENETUNREACH is one of the standard POSIX error codes; the operating system hands it to an application when the destination network can’t be reached.) There are a lot of forum posts about FileZilla being blocked by an antivirus suite’s software firewall, however, so I feel comfortable taking the error description at face value. There are no subtleties to the message: it simply means that FileZilla can’t reach the network (i.e. it can’t dial out).
Docker Networking Overview:
Let’s begin with a brief introduction to Docker networking. At a high level, Docker containers are very similar to virtual machines (VMs) but without the overhead of running a duplicate OS. The diagram below shows our Docker containers, which run inside of our Host:
The above diagram shows a basic schematic of a typical Docker network. My containers are set to bridge mode, which means that, at least for networking purposes, the Host acts like a glorified router for the Container. Put another way, when I want to communicate with the Container over the network, I just put in the IP address of my Host and a port mapped to that Container. The Host then takes that traffic and forwards it to the internal IP address of the Container at whatever port it’s listening on. Outbound traffic gets the same treatment in reverse: the Host rewrites the Container’s internal source address to its own before sending the traffic on. In essence, bridge mode on Docker is network address translation (NAT), something you’re undoubtedly familiar with if you have a homelab or are self-hosted like I am.
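To see this mapping on your own system, the Docker CLI can report both halves of the translation. The sketch below is a minimal example, assuming the docker command is available on the Host and that the container is named FileZilla (substitute your own). The two function names are made up for illustration; only docker inspect and docker port are real CLI calls.

```shell
# Hypothetical helper names wrapping real Docker CLI commands.

# container_ip NAME: print the Container's internal IP address on its Docker network
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# published_ports NAME: print which Host ports are forwarded to which Container ports
published_ports() {
    docker port "$1"
}

# Usage (run on the Host):
#   container_ip FileZilla
#   published_ports FileZilla
```

The first command shows the internal address the Host forwards to, and the second shows the Host ports that lead there.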
Back to our problem. We know FileZilla can’t dial out. Referring back to the diagram, we see that there are three checkpoints where traffic could be failing: (1) the interface between the Docker container and Host, (2) the interface between the Host and the router, and (3) the interface between the router and the WAN/internet.
Troubleshooting the Connection:
Since we know our problem is in the outbound direction, let’s focus in that direction. What’s the simplest way to check for a connection? Ping it!
Testing each network interface we identified above in turn:
1) Ping the Host from the Container:
We can do this by executing a command inside the Container. In unRAID, click on the Container and select Console. If you aren’t running unRAID, this is the same as running <docker exec -it [container name here; FileZilla for me] sh>. Now you can simply ping the Host using <ping [insert IP address of Host on LAN here]>. Pinging the Host in my case greeted me with a response, so we know that connection isn’t the culprit:
Note: You can also ping an address on the internet. When I did that here, I did not receive a response, confirming that the outbound connection is indeed broken somewhere along the chain:
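If you’d rather not open an interactive console, the same check can be run in one shot from the Host. This is a minimal sketch, assuming the container image ships a ping binary (not all images do); the function name is made up for illustration.

```shell
# container_ping NAME TARGET: send three pings to TARGET from inside container NAME.
# The exit status is ping's own: 0 if replies came back, non-zero otherwise.
container_ping() {
    docker exec "$1" ping -c 3 "$2"
}

# Usage (run on the Host; substitute your own addresses):
#   container_ping FileZilla [IP address of Host on LAN]
#   container_ping FileZilla [IP address on the internet]
```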
2) Ping the Router from the Host:
This is straightforward enough. In unRAID you just open your console/terminal on the Host (“Tower”) and ping the router’s IP address. I received a response here too, so we know that we have a connection to the router. We probably already knew this, though, since we could connect to the Host over the network in the first place.
Pinging an address on the internet from the Host also got no response, meaning my server was not able to access the internet. This confirmed that the chain is broken further upstream.
3) Ping an address on the WAN/internet from the Router:
This can be a little bit trickier. Thankfully I run dd-wrt on my routers, so I can easily initiate a terminal session on the router with <telnet [insert router local IP address here]> (or ssh, if you have it enabled). Pinging an address on the internet resulted in no response here as well! Well, we know this guy is the last interface in the chain, so we know the problem is with him.
[There’s a little bit more to know here, which is specific to my network’s “unique” architecture. My server resides in a physically separate part of my home, away from my core network, and my home isn’t wired for ethernet. To adapt and overcome, I have dd-wrt set up as a client bridge (the router mentioned above is actually the client router in that bridge). For those of you who don’t know what this is, it means that all clients connected to my secondary client router behave as if they’re physically connected to the primary router. At least that’s the way it’s supposed to work in theory, and most of the time it does. Unfortunately, client bridge mode is notoriously unreliable, as was the case here. Had the DMZ’s main router been at fault, as would be the case on your home network, a big clue would’ve been that I couldn’t access the internet from my computer.]
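The three ping checks above can also be strung together into one pass that works outward and stops at the first address that doesn’t answer. This is a variation on what I did by hand, not the exact procedure: it runs from a single machine (the Host, say), so it tells you how far traffic gets but can’t replace logging in to the router itself. A minimal sketch, assuming a Linux ping that supports -c and -W; the function name is made up for illustration.

```shell
# first_failing_hop TARGET...
# Pings each target in the order given (nearest first) and prints the first
# one that does not answer. Prints "none" if every target answers.
first_failing_hop() {
    for target in "$@"; do
        # -c 3: send three echo requests; -W 2: wait up to 2 seconds for a reply
        if ! ping -c 3 -W 2 "$target" >/dev/null 2>&1; then
            echo "$target"
            return 1
        fi
    done
    echo "none"
}

# Usage (run on the Host; substitute your own addresses):
#   first_failing_hop [router IP address] [IP address on the internet]
```

If the router answers but the internet address is printed, the break is at the router or beyond it, which is where I ended up.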
Note: I also want to point out another potential cause here. If, at any point, pinging an external site by name gets no response, try pinging an IP address you know is good (as opposed to the website address). If that works, it suggests a problem with your DNS lookup, and you should begin your investigation there.
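That name-versus-address comparison can be sketched as a small helper. It assumes a Linux ping that supports -c and -W, and the function name and its three result words are made up for illustration. Keep in mind that some sites simply don’t answer pings, so a “dns” result is a hint to look at name resolution, not proof.

```shell
# dns_or_link NAME IP
# Prints "ok" if NAME answers pings, "dns" if only the bare IP answers
# (the name is the problem), or "link" if neither answers.
dns_or_link() {
    if ping -c 3 -W 2 "$1" >/dev/null 2>&1; then
        echo "ok"
    elif ping -c 3 -W 2 "$2" >/dev/null 2>&1; then
        echo "dns"
    else
        echo "link"
    fi
}

# Usage (substitute a site and an address you know respond to ping):
#   dns_or_link [website address] [known-good IP address]
```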
Rebooting the router fixed the issue and allowed FileZilla to proceed without error. Honestly, the root cause here is that I am using client bridge mode but getting rid of it really isn’t an option for me at this point. I could try upgrading the router to better hardware but I need to let my wallet recover from this server build first. 🙂
Anyway, I thought this made for an interesting case study that demonstrates how breaking a system down into its simple parts allows for effective troubleshooting. With a methodical process and understanding, every problem can be overcome.
Improvise. Adapt. Overcome.