Over the Bridge

Screaming into the Void

Posts

2020-05-26

Small Nginx Module

So I've added in a search bar to the site.

This is a bit more involved than it sounds. One of the things I really want is to keep this site static, without any javascript. Or rather, as static as possible: self-contained, with a really simple, cheap webserver (nginx) serving up the content.

The issue with this is that you can't really do anything dynamic (based on something the user types in, for example); you're limited purely to static content, unless you want to write some C and build yourself an nginx module.

So I did. And here's the GitHub repo that hosts the code I'm using to run the search: https://github.com/adamharrison/nginx-xapian.
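
I won't go through the module's internals here, but the core of any module like this is an nginx content handler: a function that takes the request, does its work, and writes a response back out through nginx's buffer chains. Here's a rough sketch of what that shape looks like. This is not the actual nginx-xapian code; the registration boilerplate is omitted, and search_results() is a made-up stand-in for the part that would actually query the index.

    // A rough sketch of the shape of an nginx content handler in C++.
    // Not the actual nginx-xapian code: module/command registration is
    // omitted, and search_results() is a hypothetical stand-in.
    extern "C" {
    #include <ngx_config.h>
    #include <ngx_core.h>
    #include <ngx_http.h>
    }

    #include <string>

    // Hypothetical helper: the real module would run the query against its
    // Xapian index here and render the matches as HTML.
    static std::string search_results(const std::string& query) {
        return "<html><body><p>results for: " + query + "</p></body></html>";
    }

    static ngx_int_t search_handler(ngx_http_request_t *r) {
        // Only the query string (r->args) matters; no request body needed.
        ngx_int_t rc = ngx_http_discard_request_body(r);
        if (rc != NGX_OK)
            return rc;

        std::string query;
        if (r->args.len > 0)
            query.assign((const char *) r->args.data, r->args.len);
        std::string body = search_results(query);

        // Send the response headers.
        r->headers_out.status = NGX_HTTP_OK;
        r->headers_out.content_length_n = body.size();
        r->headers_out.content_type_len = sizeof("text/html") - 1;
        ngx_str_set(&r->headers_out.content_type, "text/html");
        rc = ngx_http_send_header(r);
        if (rc == NGX_ERROR || rc > NGX_OK || r->header_only)
            return rc;

        // Copy the rendered results into a buffer from the request pool and
        // hand it to the output filter chain.
        ngx_buf_t *b = (ngx_buf_t *) ngx_calloc_buf(r->pool);
        u_char *data = (u_char *) ngx_pnalloc(r->pool, body.size());
        if (b == NULL || data == NULL)
            return NGX_HTTP_INTERNAL_SERVER_ERROR;
        ngx_memcpy(data, body.data(), body.size());

        b->pos = data;
        b->last = data + body.size();
        b->memory = 1;    // buffer contents live in (read-only) memory
        b->last_buf = 1;  // this is the last (and only) buffer of the response

        ngx_chain_t out;
        out.buf = b;
        out.next = NULL;
        return ngx_http_output_filter(r, &out);
    }

The part left out above is most of what you actually end up typing: an ngx_command_t array for the config directives, the ngx_http_module_t context, and the ngx_module_t definition that ties it all together, plus installing the handler on a location.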

I know, I know: it's currently riddled with bugs and is, frankly, unsafe. I shouldn't be using it on this live site. The risk is low because this machine doesn't really contain anything of value; it's just a VM in Amazon's cloud that I can simply kill at any time without losing anything. But still.

That's what I've been working on for the past week or two in my spare time: an hour here, an hour there, in between writing. In the end, I'm actually pretty pleased with it. It's pretty quick, and while it could be a lot more efficient, and there's still a lot of work to do in terms of making it update its indices dynamically as files change, it seems to get the job mostly done.
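
For the curious, the search itself is built on Xapian, and the general shape of indexing and querying with it looks something like the following. This is a minimal standalone sketch, not the module's actual code; the index name and paths are made up.

    #include <xapian.h>
    #include <iostream>
    #include <string>

    // Index one document's text under a path so results can link back to it.
    // Using a unique ID term makes re-indexing an update rather than a duplicate.
    static void index_document(Xapian::WritableDatabase& db,
                               const std::string& path,
                               const std::string& text) {
        Xapian::Document doc;
        Xapian::TermGenerator generator;
        generator.set_stemmer(Xapian::Stem("en"));
        generator.set_document(doc);
        generator.index_text(text);
        doc.set_data(path);                 // stored verbatim, used when rendering results
        doc.add_boolean_term("Q" + path);   // unique ID term for this file
        db.replace_document("Q" + path, doc);
    }

    // Run a free-text query and print the best matches.
    static void search(Xapian::Database& db, const std::string& querystring) {
        Xapian::QueryParser parser;
        parser.set_stemmer(Xapian::Stem("en"));
        parser.set_stemming_strategy(Xapian::QueryParser::STEM_SOME);
        Xapian::Query query = parser.parse_query(querystring);

        Xapian::Enquire enquire(db);
        enquire.set_query(query);
        Xapian::MSet matches = enquire.get_mset(0, 10);
        for (Xapian::MSetIterator m = matches.begin(); m != matches.end(); ++m)
            std::cout << m.get_document().get_data() << "\n";
    }

    int main() {
        Xapian::WritableDatabase db("search-index", Xapian::DB_CREATE_OR_OPEN);
        index_document(db, "/posts/2020-05-26.html",
                       "small nginx module search xapian");
        db.commit();
        search(db, "nginx search");
    }

That compiles against libxapian (g++ sketch.cpp -lxapian), and the "update indices dynamically" work mentioned above is essentially re-running replace_document whenever a file changes.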

So, search is up, with no javascript, and no script-based back-end. Good stuff.

2020-05-11

Projects Up!

I've added a projects section up above to showcase some of the stuff I'm working on and its current progress. So far, I've just got my writing stuff in there: some details about my first book, The First Estate (probable title), and a small stub about my current project, Eldar's End (tentative title). Currently it's just a static page, but I'll probably keep working on it going forward to automatically pull in some of my stats.

For example, I have the blog system set up in such a way that I can easily add little snippets of script if I need to (without pulling a whole framework into the project); I'm already using that to pull in the "What I'm Reading" and "What I'm Playing" sections from a remote site that tracks my literature/media consumption.

I'm thinking I can include either a progress bar or a simple word count. A progress bar probably makes more sense, because I already have a target number of words I want to hit for each book I write (100,000), which is a nice blend of "long enough" and "not too long". All in all, that works out to about a 350-400 page book, which is about standard for a novel. I was able to hit that target within 10% for my first book (I ended up writing 110,000 words to wrap everything up), and I'm going to try to hit it for my second book as well. So a little progress bar would probably be nice.

I'll probably have it up by next week.

2020-05-04

Gauging Performance

In general, you'd think gauging the performance of a piece of code is easy, right?

Well, I was under that same impression. Even at work, whenever I had to figure out whether something was running faster, my first instinct was to test it a couple of times on a few input sets, and if it sort of felt right, then, well, that was good enough.

I saw a fascinating talk over the weekend, "Performance Matters" by Emery Berger (linked below).

Basically, the gist of it is that there are a number of things which can randomize the performance of native applications (to say nothing of interpreted ones), and that it's relatively hard to determine whether any particular change you make (beyond extremely obvious ones that make things an order of magnitude faster or slower) is actually affecting the software in a positive or negative way. Small gains in performance can actually be due to things like memory layout, rather than any particular change you've made.

Essentially, Berger and his team have developed a tool (Stabilizer) that randomizes the memory layout of the compiled application (amongst other things), which allows you to perform a bunch of runs and get a reasonably accurate distribution of performance measurements, helping you figure out whether or not you've actually improved your software.

One of the most important points he drives home is that, even though you now have data that accounts for memory layout changes, you can't simply look at it, do a simple linear comparison against previous results, and declare it faster. He refers to this as "eyeball statistics". Basically, he makes the point that you want to take a statistical approach to this situation and ask questions like "what is the probability that the speedup we're seeing here is just due to chance?", demanding something like 95% confidence before believing it. He does assert that the runs are normally distributed (he says there are reasons for this, but to my memory doesn't go into them), and thus we can treat any run data we have as a normal distribution and perform some easy stats on it to get our answer to that question.
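
To make that concrete, here's roughly what that kind of check looks like in code. This is a toy sketch of my own, not anything from the talk or its tooling; the run times are made up, and for samples this small you'd really want a proper t-test rather than the normal approximation I'm leaning on.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Mean and sample variance of a set of run times.
    static double mean(const std::vector<double>& v) {
        double s = 0;
        for (double x : v) s += x;
        return s / v.size();
    }

    static double variance(const std::vector<double>& v, double m) {
        double s = 0;
        for (double x : v) s += (x - m) * (x - m);
        return s / (v.size() - 1);
    }

    int main() {
        // Hypothetical run times (seconds) before and after an "optimization",
        // gathered across many randomized layouts.
        std::vector<double> before = {10.2, 10.5, 9.9, 10.8, 10.1, 10.4, 10.3, 10.6};
        std::vector<double> after  = {10.0, 10.3, 9.8, 10.5, 10.0, 10.2, 10.1, 10.4};

        double m1 = mean(before), m2 = mean(after);
        double v1 = variance(before, m1), v2 = variance(after, m2);

        // Standard error of the difference of means, then a z-like statistic.
        // Treating this as normal leans on the talk's claim that the runs
        // themselves are roughly normally distributed; Welch's t-test would be
        // the more careful choice for small samples.
        double se = std::sqrt(v1 / before.size() + v2 / after.size());
        double z = (m1 - m2) / se;

        // Two-sided p-value under the normal approximation:
        // P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
        double p = std::erfc(std::fabs(z) / std::sqrt(2.0));

        std::printf("mean before: %.3f s, mean after: %.3f s\n", m1, m2);
        std::printf("z = %.3f, p = %.3f\n", z, p);
        std::printf(p < 0.05 ? "difference looks significant at 95%%\n"
                             : "could easily be noise\n");
    }

The point is just that you compare the two distributions and ask how likely the gap is to be noise, rather than eyeballing two single numbers.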

Anyway, it's far and away the most interesting talk I've seen in the past couple of years. Give it a listen if you're interested in optimizing any sort of native application.