4 s -> 13 ms

I recently had the urge to learn Rust. However, as usual, I struggled to find a suitable basic project to start with. Eventually, I decided to remake nerdfetch in Rust, but with a twist. I built it for Windows. As we will see later on, this proved to be quite problematic. However, that is also why it was a fun thing to work on.

Building out the basic utility was fairly easy. Actually, let me rephrase that. Building out the basic utility was fairly easy using a lot of crates. But that was just the system metadata. Fetching the package count was a whole another story. Obviously, there were no crates to do this, so I had to do this manually. In the first release, I fetched the package count by running the list commands and counting the lines in their output. However, this proved to unbearably slow, and resulted in execution times of above 4s. This tool was written in Rust and had a execution time of 4s! I myself laughed at this.

I was determined to bring it down to ~10ms, I knew Rust was capable of that. Or well, at least I could try. And, so I got to work. First and foremost, was getting rid of executing commands in the package CLIs. This was not particularly easy, as it meant manipulating a bunch of paths and directories and environment variables. This took me a lot of time, effort, and research.

Figuring out `scoop`

In v1 I was executing scoop list and counting the number of lines in the output barring the header lines. This was however, extremely slow because of two reasons. Firstly, I had to execute the commands through cmd /c because of Windows shell limitations. Secondly, scoop itself is very slow. To get rid of this command execution, I had to find a way to do directly the way scoop does. Although, I originally planned on going through the scoop source, by a turn of events, I discord the path at which scoop itself is located. Running which scoop tells me this path is C:\Users\[username]\scoop\shims\scoop.ps1. However more digging showed that it is possible to install scoop in locations other than C:\Users\[username]\scoop, so I could not hardcode this path into the code. Instead, I used a crate which and located the scoop binary, and then proceeded to get the parent directory from there. Then, it was just a matter of counting the files in the directory. This reduced the execution time by a significant amount to around 2 seconds.

This was however, still quite slow, and noticeable.

Figuring out Chocolatey

Just like scoop, originally, the choco list command, which was also slow and had to be executed via cmd /c. Replacing this command, was comparatively easy since from their documentation it seems that chocolatey always stores this data in %PROGRAMDATA%\chocolatey. This could be hardcoded however, I would have to resolve the PROGRAMDATA environment variable as it could be(although highly unlikely) different on different machines.

And, with this check, and change, I saw an average execution speed of ~50ms. I was content for the moment but not for long. Soon after publishing the v1.1.0 release, I again got to work, to speed it up even more.

Finetuning and speeding up even further

I achieved average speed of 50ms by getting rid of the two of the slowest pieces of code. Speeding it up now, was rather more tricky since there were no slow parts. I thought that perhaps getting rid of as many dependencies as I could. This proved to more difficult than I had imagined. Most of the crates I had been using, under the hood were using the windows or winapi(Windows SDK for Rust) crate to directly communicate with Windows internals. Having never worked with the Windows SDK in my life, I decided to go to good old Google and StackOverflow. After a ton of searching, and researching, I decided to use the winapi crate instead of the windows since it seemed simpler.

However before rewriting the core utilities using the winapi crate I decided to get rid of the colored crate. This was as easy as anything. I simply replaced the "string".color() functions with ANSI escape codes and there was no problem whatsoever.

Few things were easy with the Windows SDK even. Like getting the system uptime. The only code I had to change was from uptime_lib::get().unwrap().as_secs() to GetTickCount64() / 1000. GetTickCount here is a windows SDK function from winapi::um::sysinfoapi, and we divide it by 1000(/ 1000) because the value returned is in milliseconds.

In the earlier releases, I had been using the sysinfo crate to get the total and used memory information. However this was one of the more “minute” very slow parts, since the crate is not just about memory. It lets you get all types system info, and you have to manually refresh that data, probably making it slower(it is still very fast, just not as fast as I want it). Rewriting the memory logic using the Windows SDK was also not too different or complex.

It might sound odd but the complex ones were actually the username and device or hostnames. I had to scour through a lot of source code and Google to get them just right. However, it also made the device name a little more accurate. The username and hostname were retrieved in the exact same way, so I could easily just reuse the same code, but figuring out either one was a hell of a job.

Username and Hostnames

Originally I thought this would be an easy task, as I found GetUserNameW and GetComputerNameW in the Windows SDK, but that was hardly it. The actual process to get the username is rather a lot different.

Here’s the complete code:

src/main.rs

// Username
let mut username_size = 0;
unsafe { GetUserNameW(null_mut(), &mut username_size) };
let mut username_buf = Vec::with_capacity(username_size.try_into().unwrap_or(MAX));
unsafe {
	GetUserNameW(username_buf.as_mut_ptr(), &mut username_size);
	username_buf.set_len(username_size.try_into().unwrap_or(MAX));
}
let username = OsString::from_wide(&username_buf)
	.into_string()
	.unwrap_or(String::from("Unknown"));

This snippet might look weird provided it is just to fetch the username. Like why are we calling the function twice? Why are we using OsString? What the hell is null_mut()

Well, Windows was originally written in C/C++, and well still is. That means the underlying functions which this Rust counterpart calls are after all to that underlying API. C/C++ has to be completely typesafe and avoid memory leaks, and such. Hence, it uses pointers and a bunch of other “low-level” methods to be perfectly fast and secure. But as good as that is, it meant more work for me.

Basically, with this API, I had to first blank buffer for the actual username and a mutable integer to fetch the username length. Using this length, we have to create a buffer(with that length) and finally call the method again but this time with the buffer we created to store the result in. This is still not enough to be displayed on the screen however. The result has to be converted into an OsString and from that into a general rust String to be displayed. This pretty much same code has to be repeated to fetch the computer’s hostname.

And finally, after all these tiny but effective tweaks and tuning, I ran the executable. I saw it. A speed of 13ms. 13ms! I was happy. For now at least. And now, even with the time it takes to discover the executable on the PATH, the average runtime speed is 22ms. It’s hard to tell the difference between such minute speeds by looking at it really, but on paper, it is a lot.

Things I am still unhappy about

Well for starters, I still have to use a crate to get the Windows version. It sounds crazy probably but yeah, getting the version is much difficult than it seems. I can probably do it myself probably but it will have to wait for another release. The process is a little unconventional too, like the crate seems to be getting the version from some file present in the core windows kernel.

I also feel like I can make the package counting faster and better, so I will try my luck there, but I am doubtful of how much success I can get really. But that’s it for this time, be back for the next one!

4 s -> 13 ms

Figuring out scoop §

Figuring out Chocolatey §

Finetuning and speeding up even further §

Username and Hostnames §

Things I am still unhappy about §

Figuring out `scoop`

Figuring out Chocolatey

Finetuning and speeding up even further

Username and Hostnames

Things I am still unhappy about