January 16

Set Up an NGINX Reverse Proxy on a Raspberry Pi (or any other Debian OS)

If you’re running a web server out of your homelab (and you should), you really should consider putting your servers behind an NGINX reverse proxy. Honestly, this should be one of the first things you build in your homelab. It doesn’t take much to set up; NGINX is efficient enough to run on something as modest as a Raspberry Pi, and it pays dividends once you’ve got it up and running.

What Does a Reverse Proxy Do?

A reverse proxy acts as a sort of dispatcher: it serves as the central point of contact for clients. Based on what each client requests, it routes the request to the appropriate backend server and makes sure that server’s response gets back to the right client.

Prototypical NGINX reverse proxy diagram

What are these dividends you speak of?

A reverse proxy can give you additional flexibility, security, and even a performance bump. It can also greatly simplify your deployment:

  1. Flexibility: An NGINX reverse proxy allows you to host multiple sites/domains with only one IP address. It accomplishes this by listening on a port (usually port 80 for HTTP traffic) and parsing the Host header of each HTTP request. Based on the host specified in the header, NGINX routes the request to the proper backend server (in a reverse proxy, this is also known as an upstream server); see the sketch after this list.
  2. Security: By standing between the client and the backend server, the reverse proxy provides a degree of separation.
  3. Improved performance: NGINX can cache static content, which not only returns content to the client faster, but often means the upstream server doesn’t need to be contacted at all, taking a lot of the load off your backend servers.
  4. Simplified deployment: If you’re hosting multiple sites, an NGINX reverse proxy gives you a single point to manage your traffic. You only have to set up port forwarding once, and whenever you create a new site, all you have to do is add another configuration file to NGINX. When you implement HTTPS (and you should), instead of having to configure it on every individual web server you have set up, you can handle it all on your NGINX reverse proxy.
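To make the flexibility point concrete, here’s a minimal sketch of name-based routing with two sites (the hostnames and internal IPs are placeholders; the full walkthrough is below):

server {
	listen 80;
	server_name blog.example.com;

	location / {
		proxy_pass http://192.x.x.2;
	}
}

server {
	listen 80;
	server_name wiki.example.com;

	location / {
		proxy_pass http://192.x.x.3;
	}
}

Both server blocks listen on the same port; NGINX compares each request’s Host header against server_name to decide which upstream server gets the request.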

Installing NGINX:

Now that I’ve hopefully convinced you to implement an NGINX reverse proxy, let’s get started. I recommend using a dedicated device for this (again, it need not be expensive; even a Raspberry Pi will do) because it helps keep everything clean and compartmentalized. With a clean install of Ubuntu (if using an x86-64/AMD64 device) or Raspbian (if on the RPi), do the following:

1. Update your package list and make sure your device is updated:

sudo apt-get update
sudo apt-get upgrade

2. Depending on which Linux distro you’ve picked, it might come with Apache already installed. We don’t want that, so uninstall it with:

sudo apt-get remove apache2

3. Install NGINX:

sudo apt-get install nginx

4. NGINX should start automatically, but just in case, you can start it manually with:

sudo systemctl start nginx

5. Confirm that NGINX is up and running by opening your browser and visiting your IP address. You should see the default NGINX welcome page by visiting your loopback IP address (127.0.0.1) from the device itself, or the actual IP assigned to your device (available by running the command ‘hostname -I’ in the terminal).
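If you prefer the terminal, a quick header check should come back with a 200 OK and an nginx Server header:

curl -I http://127.0.0.1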

6. Good! Now we have NGINX up and running! If you don’t have a backend web server running yet, then we’re done for now, since there’s nowhere to send traffic; come back to this point when you do. But if you do have a web server for us to proxy traffic to/from, continue on!

Configuring the Reverse Proxy:

So you’ve made it this far and you now have an NGINX server running. Let’s set up the reverse proxy part to make this an NGINX reverse proxy and not just a simple NGINX web server:

1. Go to the NGINX sites-available directory:

cd /etc/nginx/sites-available/

2. Create the configuration file. You’ll eventually accumulate a lot of these, so I recommend naming it based on the site that you’re reverse proxying so you can easily find it again:

sudo nano example.com.conf

3. In nano, add the following:

server {
	listen 80;
	server_name example.com;

	location / {
		proxy_pass http://192.x.x.2;
	}
}

server_name holds the domain name of the website clients are going to request. proxy_pass is the local (internal) IP address of the web server that you’re forwarding traffic to. You can also specify a particular port if your web server is running on a non-standard one (example: proxy_pass http://192.x.x.2:8080).
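One optional refinement worth knowing about: by default, your backend sees the proxy, not the original client. If your upstream application cares about the requested hostname or the client’s real IP address, you can forward them with the standard proxy_set_header directives (a sketch, not required for a basic setup):

	location / {
		proxy_pass http://192.x.x.2;
		# Forward the original Host header and client address to the backend
		proxy_set_header Host $host;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
	}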

4. For NGINX to actually serve your site with your new configuration, you need to link it to /sites-enabled/ with:

sudo ln -s /etc/nginx/sites-available/example.com.conf /etc/nginx/sites-enabled/example.com.conf

5. Test your configuration to make sure you aren’t getting any errors:

sudo nginx -t

6. Reload NGINX to tell it that the configuration has been updated:

sudo systemctl reload nginx

That’s all there is to it! In future posts, I’ll cover how to set up that upstream web server, how to configure DNS for your domain, and how to set up port forwarding so that you can access your sites from the internet.

As always, feel free to comment if you run into any problems or need help!

January 15

SQL Lessons to Remember

I had to write a few SQL queries today so I thought I would toss up a few notes as a sort of SQL FAQ. I’ll try to keep these notes updated.

  • Inner joins fail on null matches
    • When you’re joining tables, an inner join silently drops any row whose foreign key (i.e. the column you’re matching the second table on) is NULL, because NULL never equals anything. If you want to keep those rows, you need a left outer join (see the sketch after this list).
  • Inner joins on one-to-many
    • When joining one-to-many, an inner join gives you a row for each match in the second table; e.g. if one row in table A matches two rows in table B, you get two rows, one for each match.
  • What table should I start with when writing a query?
    • When writing a query, the best place to start is to figure out what the output should look like. This is important because it helps you determine the granularity of your report. Once you’ve done that, when you’re faced with a myriad of tables, start with the table that matches the granularity of the report you wish to write.
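Here’s a minimal sketch of the null-match behavior, using hypothetical employees and departments tables (an employee with a NULL dept_id vanishes from the first query but survives the second):

-- Inner join: employees with a NULL dept_id are dropped,
-- because NULL never matches anything.
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;

-- Left outer join: every employee is kept;
-- dept_name comes back as NULL where there is no match.
SELECT e.name, d.dept_name
FROM employees e
LEFT OUTER JOIN departments d ON e.dept_id = d.id;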

Order Matters

Queries are written in the following order:

  • SELECT
  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • ORDER BY

However, they are processed in this order:

  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • SELECT
  • ORDER BY

This isn’t just an academic distinction; it can help you troubleshoot your reports when you get either an error or results you don’t expect.
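A classic example of the processing order biting you: aliases defined in SELECT don’t exist yet when WHERE runs. Assuming a hypothetical orders table:

-- Fails in most databases: WHERE is processed before SELECT,
-- so the alias "total" doesn't exist yet.
SELECT price * quantity AS total
FROM orders
WHERE total > 100;

-- Works: repeat the expression in WHERE
-- (or use HAVING when filtering on an aggregate).
SELECT price * quantity AS total
FROM orders
WHERE price * quantity > 100;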

January 11

Troubleshooting Slow Response Times

After creating this WordPress site and finally getting it hooked up to the NGINX reverse proxy, I was left with a horrible page load time (something on the order of 10-12 s). In this post, I’ll walk through a beginner’s guide to troubleshooting slow website response times.

1. Benchmark Your Site

The first thing you should do is quantify your site’s slowness. If you don’t know how fast (or slow) your site was to begin with, how can you know when you’ve fixed it? One of the best options is your browser’s developer tools, but you can also use a service such as the Pingdom Website Speed Test or WebPageTest.

Chrome Developer Tools showing excessive Time to First Byte (TTFB, aka “waiting”)

The above screenshot is from using the developer tools in Chrome (Ctrl+Shift+I) and recording the initial page request. I wouldn’t wait that long for any web page to load, and this one is my own!

2. Identify Potential Bottlenecks (Divide and Conquer)

Now on to identifying the source of the slow response time. I find diagrams always help me think more clearly:

The diagram above gives a high-level overview of how this website is served to you. You send a request to my NGINX reverse proxy, which then requests the page from Apache (my WordPress server). The Apache server responds with the requested page to the NGINX reverse proxy, and the reverse proxy returns it to your browser. Normally, this all happens very quickly, and in fact it does on my other servers, but something was obviously wrong with my new build.

The real value of drawing a diagram is that it helps you identify all of the moving parts of your system which in turn helps you theorize about how that system could break. In the diagram above, you can see that the problem could be in the:

  1. The client
  2. The interaction between the client and the reverse proxy:
    • Download speed (or even ping) of the client
    • Upload speed of the reverse proxy
  3. The NGINX reverse proxy server itself
  4. The interaction between the reverse proxy and the backend (upstream) Apache WordPress server
  5. The Apache/WordPress server itself

3. Check Your Logs

Logs can be an excellent resource for identifying your bottleneck. To figure out which piece in my above diagram was the culprit, I decided to check the log files on both my NGINX reverse proxy and my backend Apache server running WordPress. I had a TTFB of over 10 seconds- that time had to be going somewhere! As Sherlock Holmes says, “Data, data, data! I cannot make bricks without clay.”

Now, here’s a subtlety you should be familiar with: NGINX access logs are written at the end of a request [source], while Apache logs are written at the beginning of the request [source].

In my case, the NGINX access log showed a request-fulfilled time a full 10+ seconds after the request-received time shown in the Apache log! This means the slow response was coming from Apache!

Further confirmation came from including %D in the Apache access log format, a directive that records the time elapsed between when the Apache server receives the HTTP request and when it finishes serving the response (in microseconds). Lo and behold, my Apache access logs showed a 10 s response time.
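For reference, here’s roughly what that looks like in the Apache configuration (a sketch: the format name is arbitrary, and the rest of the line is Apache’s standard combined log format with %D appended):

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" combined_timed
CustomLog ${APACHE_LOG_DIR}/access.log combined_timed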

4. Iterate Until Done

So, now I had confirmed that the source of my slow response time was on my Apache server, but now what? What was causing that?

The answer is to go back to step 2 above and repeat. The difference this time is that instead of looking at the whole system, we can narrow the list of potential problems to ones specific to the WordPress server. In my case, since this was running on a Raspberry Pi, I was suspicious of the hardware itself. I ran sar while visiting my site, and sure enough, iowait spiked to 80-90% for several seconds without full RAM utilization. This is a notorious problem on the Raspberry Pi and points to the slow read/write speeds of running everything off a microSD card. This was even with other optimizations in place, such as proxy_cache on the reverse proxy and fastcgi_cache on the backend.
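If you want to run the same check, sar comes from the sysstat package; something like this reports CPU utilization (including %iowait) once a second, ten times:

sudo apt-get install sysstat
sar -u 1 10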

So ultimately the problem was the hardware of the server itself. I installed a LEMP stack on a LIVA mini PC instead and then configured WordPress there. I suppose I could’ve also tried a higher-end microSD card like a Samsung; maybe next time. Regardless, this brought the load time down to between 0.9 and 1.4 s and the TTFB to a much more acceptable 150-250 ms.

January 8

Getting Off the Ground

Recently, I decided I wanted to take better notes on my own learning and hopefully help others in the process. Enter this blog.

I’ve been doing development for a while; I even have my own homelab running everything from Raspberry Pis to 24-thread Xeons. It’s handy because whenever I want to test out a new project, I can rapidly set up a server for it. Since most of my projects are low-demand, I often just deploy them on a Raspberry Pi. There’s a catch, though: while this usually works well, sometimes you run into one of two problems:

  1. You need short, “burst” power. On a Raspberry Pi, you redline quickly (either from the lack of processing power or because you only have 1 GB of RAM).
  2. On the other end of the spectrum, a dedicated device for each individual service is usually overkill. If your blog only gets a hundred visitors a month, there’s no need for an entire dedicated host that sits idle 99.9% of the time.

An excellent solution to this is Docker, which you can think of as a sort of “VM Lite”. With virtual machines, each VM gets its own dedicated resources and carries its own overhead (namely, a full installed OS). In my setup, though, I almost exclusively use Debian-based OSes (Ubuntu for most of the “heavy” servers and Raspbian on the Pis), so it’s silly to duplicate an entire OS per machine just to run one service.

This is where Docker comes in. With Docker, instead of an individual virtual machine, you create a container: its own isolated “box” that shares the host’s kernel. In a lot of ways, it’s actually better, at least for my purposes.
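As a taste of how lightweight this is, spinning up a disposable NGINX container is a one-liner (assuming Docker is already installed; the container name and port mapping here are arbitrary):

docker run -d --name test-nginx -p 8080:80 nginx

This pulls the official nginx image if needed and serves its default page on port 8080 of the host, with no separate OS to install.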

So for this blog, I attempted to deploy WordPress as a Docker container. What an absolute time sink. I sank two days into it before ultimately making a strategic retreat.

Now admittedly, I think my biggest problem was that I broke my own rule and failed to stage my learning. I was not familiar with WordPress, and I was not familiar with Docker. I wanted to learn both, but instead of learning WordPress by deploying it with a traditional install and learning Docker by deploying a simple container, I went straight for the moon shot and tried to deploy WordPress as a Docker container. For those of you uninitiated with WordPress, as I was, it turns out to be an…eccentric application, to say the least. It does things like store absolute URLs instead of relative ones, so if you don’t change the site address from the internal IP your server sits on to the domain name you’re actually using, it will pass your internal private IP back to the client, which probably isn’t sitting on your network.

Despite all that, I probably could’ve still pulled it off, but my setup is further complicated by the fact that my WordPress deployment sits behind a reverse proxy. Moral of the story: when learning something new, do it in stages.