# Hacking MySQL #1 - Overview, Building, and Testing

An overview of MySQL, as well as how to obtain source, build and run tests

## Overview

MySQL is one of the most widely used open-source relational databases, deployed at many companies such as Amazon, Facebook, Google, and Alibaba. At my current job we deploy MySQL widely within the company: we have our own MySQL 5.6 fork, and we are moving towards MySQL 8.0, currently in a branch. We also have a "new" storage engine built on top of RocksDB, unsurprisingly called MyRocks, which lives under the storage/rocksdb folder in the MySQL 5.6 fork.

From a 10,000-foot view, the architecture of the MySQL server looks like this:

1. Connection Management / Authentication
2. Table/Schema Management/Caching
3. SQL Parser
4. SQL Optimizer and Query Executor
5. Execution Engine
6. Replication and logging

If you dive deeper, the execution engine itself includes the following pieces:

1. Transaction / MVCC / Locking / Snapshot support
2. In-memory core database data structures (B+ tree / LSM tree / etc.) and record operations (insert/delete/update)
3. Indexing data structures, and searching/updating
4. Logging, checkpointing & recovery
5. Database storage persistence
6. Caching (disk blocks/pages, etc.)

One of the most amazing features of MySQL is the ability to swap the underlying storage engine while keeping the same upper layers - this way you can have different in-memory and on-disk representations of the database for different workloads, while keeping the same SQL execution functionality, so the client wouldn't even know the underlying storage engine has changed. The default storage engine is InnoDB, a B+ tree based storage engine; the one we use is MyRocks, which is built on top of RocksDB, an LSM-tree based storage engine. There is an API layer called handler that storage engines need to implement/override. You can go to Comparison of MySQL database engines to see a list of common storage engines in MySQL.
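To make the idea concrete, here is a toy sketch of how an abstract engine interface lets the upper layers stay oblivious to the engine underneath. This is in no way the real MySQL handler API (which lives in sql/handler.h and has dozens of methods like write_row and rnd_next); the names below are all made up for illustration.

```cpp
#include <map>
#include <string>

// Toy illustration only -- NOT the real MySQL handler API. The point is
// the shape: the SQL layer programs against an abstract interface, and
// concrete engines are swapped underneath it.
class StorageEngine {
 public:
  virtual ~StorageEngine() = default;
  virtual void insert(const std::string &key, const std::string &val) = 0;
  virtual bool lookup(const std::string &key, std::string *val) const = 0;
  virtual std::string name() const = 0;
};

// One "engine": a plain ordered map (think B+ tree, like InnoDB).
class MapEngine : public StorageEngine {
 public:
  void insert(const std::string &key, const std::string &val) override {
    rows_[key] = val;
  }
  bool lookup(const std::string &key, std::string *val) const override {
    auto it = rows_.find(key);
    if (it == rows_.end()) return false;
    *val = it->second;
    return true;
  }
  std::string name() const override { return "map"; }

 private:
  std::map<std::string, std::string> rows_;
};

// The "SQL layer" only ever sees StorageEngine, so it cannot tell which
// engine is underneath - exactly the property described above.
bool run_query(StorageEngine &engine, const std::string &key,
               std::string *out) {
  return engine.lookup(key, out);
}
```

An LSM-based engine could implement the same interface with a completely different in-memory and on-disk layout, and run_query would not change at all.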

Of course, the claim that clients wouldn't know the storage engine has changed is not entirely accurate. There are specific configurations you might need to tune for your storage engine, and different storage engines have different performance / behavior / features / capabilities, so it's not completely transparent.

## Building

You can obtain the source code from the MySQL website, but most folks would probably prefer the GitHub mirror:

git clone https://github.com/mysql/mysql-server


This contains the latest MySQL 8.0.16.

On a typical Ubuntu system, you need to install the following dependencies:

sudo apt install libssl-dev libzstd-dev libncurses5-dev libreadline-dev bison pkg-config


All my instructions below were tested on an Azure Ubuntu 18.04 VM and on a MacBook Pro 2018. They may vary slightly depending on your configuration/distribution if you are on a unix/linux system. Getting it to work on Windows requires installing OpenSSL binaries and GNU Bison. If you are using the latest Visual Studio 2019, you may also need to apply a fix to boost 1.69.0 for an outdated VC workaround (a workaround for a workaround, essentially). Fortunately, in most cases MySQL is pretty good about telling you exactly what is missing and where to download it.

Now let’s create a debug directory to store all our build files, and start the debug build:

mkdir debug
cd debug
cmake .. -DWITH_DEBUG=1 -DDOWNLOAD_BOOST=1 -DWITH_BOOST=~/boost_1_69_0
make

1. WITH_DEBUG=1 requests a debug build, which makes debugging easier
2. DOWNLOAD_BOOST=1 WITH_BOOST=~/boost_1_69_0 downloads boost 1.69.0 (the version MySQL asks for) to ~/boost_1_69_0, and skips the download if it is already there

Once the build is done, you can find everything under debug/bin.

Don't rename the build directory after the build is done. The directory name is remembered, and changing it requires a rebuild.

## Running a test

To validate that we indeed have a working MySQL build, let’s try running a quick test called select_all.

To run any test, use the script mysql-test-run.pl located under the mysql-test directory inside the build directory; it takes a test name in the form of <testname> or <testsuite>.<testname>:

cd debug/mysql-test
./mysql-test-run.pl select_all


This runs the test in mysql-test/t/select_all.test against the baseline mysql-test/r/select_all.result. The test file is written in a simple test language containing test directives/commands and SQL statements, and the framework compares the output with the baseline. If the output diverges from the baseline the test fails; otherwise it passes. Simple enough, right?
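The record/compare idea can be sketched in a few lines. This is a toy illustration only - the real mysql-test-run.pl is a sizable Perl program with its own test language, server management, and result normalization.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy sketch of the compare-against-baseline idea behind
// mysql-test-run.pl: split both outputs into lines; the test passes
// iff every line matches exactly.
std::vector<std::string> split_lines(const std::string &text) {
  std::vector<std::string> lines;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) lines.push_back(line);
  return lines;
}

bool matches_baseline(const std::string &actual,
                      const std::string &expected) {
  return split_lines(actual) == split_lines(expected);
}
```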

Actually, not quite. Testing MySQL can get quite complicated when it involves multiple connections / servers communicating with each other. And stabilizing the results so that they are not affected by the external environment / code changes can also be a headache.

Here is what you should see:

[~/local/github/mysql-server/debug/mysql-test, 8.0, 51s, SUDO]: ./mysql-test-run.pl select_all
Logging: /home/yzha/local/github/mysql-server/mysql-test/mysql-test-run.pl  select_all
MySQL Version 8.0.16
Checking supported features
- Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/yzha/local/github/mysql-server/debug/mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[100%] main.select_all                           [ pass ]  36259
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 36.259 of 70 seconds executing testcases


## Launching and connecting

Running a test seems straightforward enough. If you want to launch the mysql server and run some SQL commands against it, it takes a bit more work.

First we need to have mysqld initialize a blank data directory:

cd debug/bin
./mysqld --initialize


On Windows you'll need to add --console to write error output to the screen. Otherwise it's only available in the .err log file.

You should see:

2019-06-05T05:16:31.376510Z 0 [System] [MY-013169] [Server] /datadrive/github/mysql-server/debug/runtime_output_directory/mysqld (mysqld 8.0.16-debug) initializing of server in progress as process 70030
2019-06-05T05:16:44.066787Z 5 [Note] [MY-010454] [Server] A temporary password is generated for [email protected]: <.....>
2019-06-05T05:16:53.317610Z 0 [System] [MY-013170] [Server] /datadrive/github/mysql-server/debug/runtime_output_directory/mysqld (mysqld 8.0.16-debug) initializing of server has completed


Note the temporary password generated in the second line. You’ll need it later.

This means that mysqld has successfully initialized the debug/data directory:

'#innodb_temp'   auto.cnf   ca-key.pem   ca.pem   client-cert.pem   client-key.pem   ib_buffer_pool   ib_logfile0   ib_logfile1   ibdata1   mysql   mysql.ibd   performance_schema   private_key.pem   public_key.pem   server-cert.pem   server-key.pem   sys   undo_001   undo_002


Now we can finally start the server:

cd debug/bin
./mysqld --debug


On Windows you'll need to add --console to write error output to the screen. Otherwise it's only available in the .err log file.

The --debug switch means we start the mysql server in debug mode.

Now launch another terminal / TMUX window / whatever, and connect to the mysql server:

cd debug/bin
./mysql -uroot --socket=/tmp/mysql.sock -p


You should see:

Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 28
Server version: 8.0.16-debug Source distribution

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


However, we are not quite done yet. MySQL will ask us to change the password - you can do that as follows:

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY "<newpassword>";


From now on you can log in using the new password you just set.

Finally we can run some SQL commands!

mysql> SELECT @@version;
+--------------+
| @@version    |
+--------------+
| 8.0.16-debug |
+--------------+
1 row in set (0.00 sec)


Before I leave, let me address a question that is absolutely going to be asked - how do I terminate the server gracefully? CTRL+C doesn’t work anymore.

The right way is to use mysqladmin:

cd debug/bin
./mysqladmin -u root -p shutdown


You’ll see the server waving goodbye:

2019-06-05T05:54:09.028071Z 30 [System] [MY-013172] [Server] Received SHUTDOWN from user root. Shutting down mysqld (Version: 8.0.16-debug).
2019-06-05T05:54:10.670124Z 0 [System] [MY-010910] [Server] /datadrive/github/mysql-server/debug/runtime_output_directory/mysqld: Shutdown complete (mysqld 8.0.16-debug)  Source distribution


## Debugging

In Linux/Mac debugging is relatively straightforward and isn’t any different from other applications. My personal recommendation is to use Visual Studio Code and setup lldb/gdb debugging there.

For Windows, the obvious choice is Visual Studio (2019 Community Edition is the one I've tested). However, mysqld launches another mysqld instance that does the real work, so F5-debugging mysqld from Visual Studio requires the Child Process Debugging Power Tool in order for your breakpoints to hit, since they need to be set in the child mysqld process, not the parent. Of course, attaching to the correct mysqld directly always works, without needing the Child Process Debugging Power Tool.

## What’s next

I’m planning a series of articles that will go through many interesting aspects of MySQL:

1. A quick tour of the source code and important concepts in the MySQL source
2. How parsing and AST generation are done for complex statements
3. How statements are executed in MySQL
4. How the MySQL optimizer / query execution works
5. How plugins / storage engines work
6. How system variables work
7. How replication works
8. How the SHOW command works
9. How the binlog works

I'm also planning to write about MyRocks, as well as RocksDB / LevelDB / InnoDB, but I'll prioritize the MySQL articles first as they lay down a nice foundation for the rest, and also serve as documentation for when people get lost in the vast amount of MySQL source code.

Let me know what you think about the article and/or if you run into issues. Feel free to suggest topics as well. But I probably can't help much with your DBA questions…

# Getting GDB to work on Mac OS X Mojave

How to get GDB to work on MacOSX Mojave

Starting from Mac OS X 10.5 (Leopard), Apple started to lock down the system further, and debuggers like GDB now have to be code signed. There is a great article describing the steps to get it to work. However, there is a lot of conflicting information on the web and people are having trouble with some of those instructions, myself included. So I'd like to document what I did to get it to work, and highlight the issues I ran into:

On a high-level you need to perform these steps:

1. Create a certificate in the System keychain that is self-signed and always trusted for code signing
2. Sign the GDB binary with the certificate. Include the proper entitlements if you are on 10.14+.
3. Reboot

The article linked above has detailed instructions for these steps, so I'm not going to repeat them.

A few gotchas that I ran into myself:

1. If you see this error complaining about code signing even though you had signed the GDB executable:
Starting program: /Users/yzha/github/mysql-server/debug/runtime_output_directory/mysqld
Unable to find Mach task port for process-id 55009: (os/kern) failure (0x5).


Double check that you had the proper entitlements in an XML file and passed it to codesign when signing GDB. Many articles on the web are in fact missing the entitlement step, as it is likely a new requirement in 10.14+.

2. If you are seeing this error even though you had signed with proper entitlements:
During startup program terminated with signal ?, Unknown signal.


or this:

/Users/yizhang/github/leveldb/debug/db_test": not in executable format: file format not recognized


Make sure you stay off GDB 8.2! Upgrade to 8.3 (already available in Homebrew) or downgrade to 8.0.

# Sorting structured data using memcmp-friendly encoding part 2 - floats

Sorting structured data using memcmp-friendly encoding part 2 - sorting floats

In the last post we discussed converting integers and strings into a memcmp / byte-comparable format for faster comparison (at the trade-off of encoding on writes and decoding on reads). In this post let's take a look at how we do the same for floating point numbers.
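As a preview, one common bit-flipping trick for IEEE-754 floats can be sketched as follows. This is an illustration of the general technique, not necessarily the exact code any particular engine uses: flip the bits so that unsigned integer comparison matches float ordering, then write the result in big-endian byte order so that plain memcmp sorts correctly.

```cpp
#include <cstdint>
#include <cstring>

// Encode a float as a uint32 whose unsigned ordering matches the
// float's numeric ordering:
//   - negative floats: flip ALL bits. This makes them smaller than any
//     positive encoding and reverses their order, so more-negative
//     values compare smaller.
//   - positive floats (and +0): flip just the sign bit, so they sort
//     above every negative value.
// To get memcmp ordering on a byte string, the caller would then store
// the uint32 in big-endian byte order.
uint32_t encode_float(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // type-pun safely
  if (bits & 0x80000000u)
    return ~bits;            // negative: flip everything
  return bits | 0x80000000u; // positive: flip sign bit only
}
```

(NaNs need special handling in a real encoder; the sketch ignores them.)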

# Get cherry-pick to work across file renames

Making cherry-pick work across file renames

Recently I needed to port over some changes using cherry-pick. That usually works fine (except for the occasional conflict), but this time the file foo.cc had been renamed to bar.cc. In this case git cherry-pick gives up and simply tells you the old file you are changing has been deleted. As far as I can tell there isn't a good way to resolve the conflict directly.

There are a couple of ways to address this issue, but the easiest way I found is to rename the file back to the original name the change was made against, in order to make git happy. Once that's done, cherry-picking works fine as usual. Then rename the file back to the 'new' name and squash the commits.

This can be illustrated in the following example - assuming:

1. The commit you want to cherry-pick modifies foo.cc
2. The target branch (that you want to cherry-pick into) has renamed foo.cc to bar.cc

# Create the target branch as usual
git checkout -b your-target-branch

# Rename bar.cc back to foo.cc to make git cherry-pick happy
git mv bar.cc foo.cc
git commit -m "Make git happy"

# Cherry-pick as usual
git cherry-pick -x <commit>

# Rename it back
git mv foo.cc bar.cc
git commit -m "Rename back"

# Squash the 3 commits into one
git rebase -i HEAD~3


In the rebase file, you’ll see:

pick 95be80db682 Make git happy
pick 3d74c6c9e13 Cherry-pick commit blah
pick 238e3c51354 Rename back


Change to:

pick 95be80db682 Make git happy
s 3d74c6c9e13 Cherry-pick commit blah
s 238e3c51354 Rename back


Here s means squash with previous commit.

Just remember to delete the unrelated messages from the first and third commits in the final commit message.

And now you are all set!

# Repeatable reads in InnoDB comes with a catch

A few days ago I was looking into a deadlock issue caused by a behavioral difference between MySQL storage engines under repeatable read. This led me to dig deeper into repeatable read behavior in InnoDB, and what I found is quite interesting:

## The basics

Before we dig deeper, let's revisit some basics of database isolation levels. You can refer to my earlier post for a more detailed explanation / comparison. A database isolation level defines the behavior of data read/write operations within transactions, and it can have a significant impact on the data integrity of your application. Repeatable read guarantees that you will always observe the same value once you read it, and that it will never change unless you've made the change yourself, giving you the illusion that it is exclusively owned by you and there is no one else. Of course, this isn't true in practice, and pessimistic locking and optimistic locking define the behavior when a write conflict occurs.
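The snapshot idea behind repeatable reads can be sketched with a toy versioned store. This illustrates the concept only - it is not InnoDB's actual implementation, and all names below are made up:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy MVCC sketch: every write appends a new version tagged with a
// monotonically increasing commit number; a transaction remembers the
// highest commit number that existed when it started (its snapshot) and
// only ever sees versions at or below that snapshot. Reads are thus
// repeatable no matter what commits afterwards.
struct Version {
  uint64_t commit_no;
  std::string value;
};

class VersionedStore {
 public:
  void write(const std::string &key, const std::string &value) {
    versions_[key].push_back({++commit_no_, value});
  }
  uint64_t begin_snapshot() const { return commit_no_; }

  // Read the newest version visible at the given snapshot.
  bool read(const std::string &key, uint64_t snapshot,
            std::string *out) const {
    auto it = versions_.find(key);
    if (it == versions_.end()) return false;
    bool found = false;
    for (const auto &v : it->second) {
      if (v.commit_no <= snapshot) {
        *out = v.value;
        found = true;
      }
    }
    return found;
  }

 private:
  uint64_t commit_no_ = 0;
  std::map<std::string, std::vector<Version>> versions_;
};
```

A transaction that captured its snapshot before a concurrent write keeps seeing the old value; a new transaction sees the new one.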

# Diagnosing interesting MySQL client connection error in localhost through the source code

The art of argument parsing and policy transparency

When working with MySQL, often the most frustrating part is getting strange connection errors. I wasted two hours trying to connect to a MySQL server over a TCP port (unix domain sockets work fine), and I'll talk about why it didn't work - as usual, we'll dive into the code to understand exactly why.

To simplify the problem, let's say I have a MySQL server at port 13010, bound to localhost, with user name root and an empty password (don't do that in production):

[~/mysql]: mysql -p 13010 -h localhost -u root
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)


This is a typical error many people run into, and you can find many similar posts that discuss the problem, but few ever get to the bottom of it. Let's jump right in.

## -p and -P

Obviously when I wrote -p 13010 I meant to tell the mysql client to connect to the server using port 13010, but that's not quite right:

[~/mysql]: mysql --help
-P, --port=#        Port number to use for connection or 0 for default


So I actually told mysql the password is 13010 instead. Supporting both -p and -P is apparently a very bad idea.

Linux tools often have an excessive number of short options, like this one from the man page for ls:

ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]

Personally I think they should go easy on these and only include the most common ones, rather than using up the entire alphabet.

However, the mystery is not yet solved. Note that we were still asked to enter a password, which explains why most people never suspect that -p actually means password. Put another way - if -p means password, why is this command still asking for a password?

The answer lies in the source code:

my_getopt.cc

for (optend= cur_arg; *optend; optend++)
{
  opt_found= 0;
  for (optp= longopts; optp->name; optp++)
  {
    if (optp->id && optp->id == (int) (uchar) *optend)
    {
      /* Option recognized. Find next what to do with it */
      opt_found= 1;
      if (optp->arg_type == REQUIRED_ARG ||
          optp->arg_type == OPT_ARG)
      {
        if (*(optend + 1))
        {
          /* The rest of the option is option argument */
          argument= optend + 1;
          /* This is in effect a jump out of the outer loop */
          optend= (char*) " ";
        }
        else
        {
          if (optp->arg_type == OPT_ARG)
          {
            if (optp->var_type == GET_BOOL)
              *((my_bool*) optp->value)= (my_bool) 1;
            if (get_one_option && get_one_option(optp->id, optp, argument))
              return EXIT_UNSPECIFIED_ERROR;
            continue;
          }
          /* Check if there are more arguments after this one */
          argument= *++pos;
          (*argc)--;


The *(optend + 1) check is the most interesting part. When a short-form option is recognized, any text immediately following the short option is treated as its argument:

if (*(optend + 1))
{
  /* The rest of the option is option argument */
  argument= optend + 1;
  /* This is in effect a jump out of the outer loop */
  optend= (char*) " ";


Given that we passed -p 13010 rather than -p13010, the 13010 part is not consumed by -p at all.

But wait, why does -h localhost work fine?

Just keep looking:

if (optp->arg_type == OPT_ARG)
{
  if (optp->var_type == GET_BOOL)
    *((my_bool*) optp->value)= (my_bool) 1;
  if (get_one_option && get_one_option(optp->id, optp, argument))
    return EXIT_UNSPECIFIED_ERROR;
  continue;
}
/* Check if there are more arguments after this one */
if (!pos[1])
{
  return EXIT_ARGUMENT_REQUIRED;
}
argument= *++pos;
(*argc)--;


So if the option takes an optional argument (OPT_ARG), it gives up and only checks for an immediately attached argument. Otherwise, for REQUIRED_ARG, it assumes the next command-line word is the argument.
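The rule can be distilled into a tiny sketch (a hypothetical helper capturing the logic above, not the real my_getopt code):

```cpp
#include <string>

// Simplified model of the my_getopt short-option rule:
//   - if text immediately follows the letter ("-p13010"), that text is
//     the argument;
//   - otherwise, an OPT_ARG option takes NO argument (mysql will prompt
//     for the password later), while a REQUIRED_ARG option consumes the
//     next command-line word ("-h localhost").
enum ArgType { OPT_ARG, REQUIRED_ARG };

// Returns the argument the option ends up with, or "" if none.
std::string parse_short_opt(const std::string &opt, ArgType type,
                            const std::string &next_word) {
  std::string rest = opt.substr(2);  // text after "-p" / "-h"
  if (!rest.empty()) return rest;    // "-p13010" -> "13010"
  if (type == REQUIRED_ARG) return next_word;  // "-h localhost"
  return "";  // OPT_ARG with nothing attached: no argument consumed
}
```

With this model, "-p 13010" yields no password argument (so mysql prompts, and 13010 is left over as a positional argument), while "-h localhost" happily consumes localhost.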

Let’s take a look at where they are defined:

{"password", 'p',
 "Password to use when connecting to server. If password is not given it's asked from the tty.",
 0, 0, 0, GET_PASSWORD, OPT_ARG, 0, 0, 0, 0, 0, 0},
{"host", 'h', "Connect to host.", &current_host,
 &current_host, 0, GET_STR_ALLOC, REQUIRED_ARG, 0, 0, 0, 0, 0, 0},


As expected, password is optional and host is required.

Also, note how it never checks for '='. So the syntax -p=abc doesn't work as expected either - hilariously, =abc becomes the password. For arguments with a bit more error checking, like port, the error message is a bit better:

[~/mysql]: mysql -P=13010
mysql: [ERROR] Unknown suffix '=' used for variable 'port' (value '=13010')
mysql: [ERROR] mysql: Error while setting value '=13010' to 'port'


Note the ‘=13010’ part?

## Default protocol

OK. Let’s try again:

[~/mysql/mysql-fork]: mysql -P 13010 -h localhost -u root
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)


Still doesn't work. We know it's not the parsing of -P this time, because port is REQUIRED_ARG:

  {"port", 'P', "Port number to use for connection or 0 for default to, in "
"order of preference, my.cnf, \$MYSQL_TCP_PORT, "
#if MYSQL_PORT_DEFAULT == 0
"/etc/services, "
#endif
"built-in default (" STRINGIFY_ARG(MYSQL_PORT) ").",
&opt_mysql_port,
&opt_mysql_port, 0, GET_UINT, REQUIRED_ARG, 0, 0, 0, 0, 0,  0},


Note the socket path in the error message: '/var/lib/mysql/mysql.sock'. This is a unix domain socket.

To confirm this is the issue, let’s search for the actual error message:

const char *client_errors[]=
{
"Unknown MySQL error",
"Can't create UNIX socket (%d)",
"Can't connect to local MySQL server through socket '%-.100s' (%d)",


The client_errors are looked up from error codes:

#define ER(X) (((X) >= CR_ERROR_FIRST && (X) <= CR_ERROR_LAST)? \
client_errors[(X)-CR_ERROR_FIRST]: client_errors[CR_UNKNOWN_ERROR])


And error 2002, the third entry, is CR_CONNECTION_ERROR, right after CR_SOCKET_CREATE_ERROR:

#define CR_ERROR_FIRST  	2000 /*Copy first error nr.*/
#define CR_UNKNOWN_ERROR	2000
#define CR_SOCKET_CREATE_ERROR	2001
#define CR_CONNECTION_ERROR	2002


Searching for that leads us back to client.cc:

if (!net->vio &&
    (!mysql->options.protocol ||
     mysql->options.protocol == MYSQL_PROTOCOL_SOCKET) &&
    (unix_socket || mysql_unix_port) &&
    (!host || !strcmp(host,LOCAL_HOST)))
{
  my_socket sock= socket(AF_UNIX, SOCK_STREAM, 0);
  DBUG_PRINT("info", ("Using socket"));
  if (sock == SOCKET_ERROR)
  {
    set_mysql_extended_error(mysql, CR_SOCKET_CREATE_ERROR,
                             unknown_sqlstate,
                             ER(CR_SOCKET_CREATE_ERROR),
                             socket_errno);
    DBUG_RETURN(STATE_MACHINE_FAILED);
  }


So this means that by default we connect using a unix domain socket - but only if the host is not specified or is localhost!
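The decision boils down to something like this sketch (a hypothetical helper capturing the condition above, not the actual client.cc code):

```cpp
#include <string>

// Simplified model of the protocol choice in the client.cc excerpt
// above: with no explicit protocol, a missing host or "localhost"
// routes the connection to the unix domain socket; anything else (for
// example "127.0.0.1") falls through to TCP.
enum Protocol { PROTOCOL_DEFAULT, PROTOCOL_TCP, PROTOCOL_SOCKET };

bool uses_unix_socket(Protocol protocol, const std::string &host) {
  if (protocol == PROTOCOL_TCP) return false;  // --protocol tcp forces TCP
  return host.empty() || host == "localhost";
}
```

This is exactly why both workarounds below do the trick: 127.0.0.1 fails the host check, and --protocol tcp fails the protocol check.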

Programs should be transparent about their policies and give information about what they are doing. If that ends up being too verbose, add a verbose option. I'll write a separate post about this, because I've been bitten too many times by similar issues, and now my favorite pastime is to add print/printf.

So there are two ways to fix this:

1. Instead of localhost, use 127.0.0.1. This fails the unix socket check and falls back to TCP.
2. Use --protocol tcp to force using TCP.

So the right command would be:

mysql -P 13010 -h localhost -u root --protocol tcp


or

mysql -P 13010 -h 127.0.0.1 -u root


## Summary

These two problems could easily have been avoided by adding more messages to the mysql client, such as:

Trying to connect to UNIX domain socket localhost...
Connecting to database 13010.


These would have avoided collectively god knows how much wasted time. Maybe I should submit a patch when I get a chance.

The gotchas:

1. mysql short options with optional arguments only accept arguments that immediately follow the option, such as '-pmypassword'. If you write '-p blah', blah will be interpreted as the current database. Short options with required arguments don't have this problem.

2. When no protocol is specified, mysql will try to connect over a unix domain socket if connecting to localhost or if the host isn't specified. To work around it, use an IP address instead of localhost, or specify the protocol explicitly using --protocol.

# Byebye Windows - going full linux

Going linux full time

At my new job, no one cares about Windows.

Every single developer (with a few exceptions) uses a MacBook Pro and connects to a linux VM to get work done. Some people have the trash-can MacPro. You get the idea. Having been at Microsoft for ~12 years, this is admittedly an interesting adventure. Even though several Microsoft projects I worked on in the past had linux versions (CoreCLR, Service Fabric, etc.), most development was still done on Windows and then ported to Linux/Mac. Whenever you occasionally wander into the no-man's land in Linux where the project tooling / infrastructure falls significantly behind, you want to pull your hair out. Not Linux's fault - it's a matter of priority. In some extreme cases you'd wonder how anyone managed to put out a linux version at all.

Not anymore. Now linux (or Mac, if you count that in) is the full time job.

After a few weeks of research and practice, I've been happily chugging along with TMUX + VIM + MOSH with my custom key bindings. In this article I'll talk a bit about my experience making the transition.

## I miss Visual Studio

Let's get this one out of the way first. There is no replacement for Visual Studio. Period. The code completion (or IntelliSense) and debugging are simply unmatched by anything else on the market. VS Code is awesome for browsing code and doing some occasional debugging, but for writing code it is just OK, as the IntelliSense (forgive my Microsoft VS jargon) can be hit or miss. Vim is good for text editing, and with plugins you can get some basic stuff to work, but again it's nowhere near the quality of experience of Visual Studio. Usually it's a love/hate relationship with Visual Studio - it's kinda slow and sometimes buggy, but you can't live without it. Well, you can, but you don't want to.

Nowadays I use vim or VS Code / Atom for writing code, and gdb for debugging.

## Debugging using GDB is fine

Being a reasonably experienced WinDbg user, I found GDB's command line takes a bit of getting used to, but that's about it. GDB also supports a TUI mode that shows integrated text windows for source/registers/etc. and a command window. It's not great, as many simple key bindings stop working in that mode (taken over by the TUI component), but as long as I can see a source code "window" over SSH I'm happy.

## TMUX is awesome

TMUX is a terminal multiplexer. With TMUX you won't lose your working state - even if you disconnect from SSH, just 'tmux attach' and you'll resume where you left off. In this sense it is equivalent to a Windows Remote Desktop session.

The most powerful part is that it also allows you to break the terminal into multiple panes and windows; this way you don't have to leave the terminal and can easily switch between many different tasks with quick shortcuts. No more need to manage windows - everything is within the terminal. It's like a virtual desktop for terminals. It's built in such a way that you barely have to touch the mouse anymore. Well, until you move to the browser, that is.

## VIM ftw

At my Microsoft job I used vim for simple editing purposes, and I like the vim way of thinking so much that I put all my editors into vim mode / vim plugins / vim key bindings. These days I find myself spending even more time in vim over SSH, so I invested more time finding better VIM configurations and plugins.

I use junegunn/vim-plug as my VIM plugin manager. It’s pretty minimal and gets the job done.

This is the list of plugins I use:

• Command-T - blazing fast fuzzy file finder
• delimitMate - automatically inserts matching delimiters such as (), [], etc.
• ack - text search tool
• vim-gitgutter - shows in the leftmost column where the git changes are, using +/-/~
• vim-fugitive - great git command wrappers
• vim-easytags - automated tag generation and syntax highlighting. I found the syntax highlighting can cause performance issues in large files, so I turned it off.
• vim-tmux-navigator - navigate between vim and tmux like they are integrated
• a - switch between header and source. Enough said.
• tcomment_vim - toggle comment/uncomment for lines
• vim-surround - easily change/add surrounding characters like (), [], {}
• nerdtree - navigate the file/directory tree
• vim-nerdtree-tabs - makes nerdtree work like an integrated panel
• vim-better-whitespace - highlights trailing whitespace characters. They are annoying for sure, and lint warns about them
• lightline - a light and configurable status line for vim
• goyo - distraction-free writing. Best for writing docs

## SSH is the old Remote Desktop

At my old job I usually "remoted" into my development machine at the office - and "remote" meant Windows Remote Desktop. Over a reasonable connection it is actually quite nice - there is little lag and you almost feel like you are working on a local machine, with all the graphical UI - it's really amazing.

With linux, you fall back to the good old text-based SSH. It's kind of amazing in its own way that you can have a text-based remote protocol for complicated full-screen programs like vim. You don't get a graphical UI this way - but for the most part you don't need one, and it's usually blazing fast.

Mosh improves over SSH in that it is async (it doesn't wait for the server response), so it feels even more responsive. The trade-off is that it can get a bit jarring when you type something and it doesn't react correctly at first.

## Shell matters

The Windows Command Prompt is fine. It works. I still remember learning my first DOS commands on a 33MHz 386DX. But it hasn't changed much since then. ConEmu is a popular terminal, and some people (especially admins) use PowerShell as well. But none of those match the flexibility of linux shells - they just have so much more to offer. You can switch between different shells, add customizations, even plugins.

For now I’m using ZSH with oh-my-zsh. It has fantastic themes and plugins. My favorite features are:

• Plugins that show me all kinds of status, such as git status, any pending background process, how long the last command took, etc.
• Auto-suggestions. It automatically suggests the full command based on the best match, graying out the part of the command that you haven't typed yet. It's simple but almost feels like magic when you see it in action for the first time.
• Syntax highlighting. Enough said.
• VIM editing. Yes, you can now use VIM commands to edit your shell commands. Just imagine being able to navigate with all the muscle memory you have from vim. This should be mandatory in everything that deals with text editing.

With all these, plus a few custom key bindings, the plain shell / Windows command prompt just seems so boring.

## You need to work on your configurations

However, tweaking these tools so that they work for you takes time. I found myself spending quite a bit of time tweaking the configurations to make them work better for me - and the time spent paid off. All the different configuration options are quite overwhelming if you're starting from scratch, so I used the Awesome dotfiles project as my starting point and forked my own version, yizhang82/dotfiles. There is a lot that I like about the way things are set up:

• One script deploys everything - TMUX/ZSH and the entire github repo containing the dotfiles - and backs up the old files
• The dotfiles are configured to include the settings/scripts from the repo at ~/dotfiles - this way things are automatically synchronized through a git pull. This is actually quite brilliant.
• It automatically pulls the github repo every time ZSH starts - so it's always up to date

Of course, many of the configurations there are already pretty good and serve as a perfect starting point for my own.

The repo contains all my TMUX, ZSH, and VIM configurations, and by simply cloning it and running a script, everything goes onto a new machine effortlessly. Most of this was done by the original author; I'm simply tweaking it to my needs.

## I like it

It did take a bit of getting used to, but I'm happy to report that I now feel roughly as productive as when I was working on Windows (if not more so). I do miss having a fully integrated Visual Studio experience, but the command line experience (with TMUX, etc.) in Linux is so much better that it more than makes up for it. Of course, at the end of the day, what matters is getting the job done - just use the right tool for the job. In a future post I'll go into a bit more detail on my experience with these tools and share some of my learnings/tips.

P.S. I still use Windows at home. I have a custom-built (by myself) PC with an i7 4770K, 32G RAM, and an Nvidia RTX 2080, mostly for gaming. I think Windows has mostly lost the mindshare of developers these days, but it's still the OS for gamers, and will be for quite some time.

# Fun C++ bug - transactional objects should have move semantics

Objects with transactional semantics need move support

OK. I must admit this probably isn’t the best title out there.

Let’s imagine I have an object that represents a file in a transaction. Recall that a transaction needs to be all-or-nothing - if the transaction completes, the files can be kept around / moved to their final destination; otherwise they need to be deleted.

The more or less obvious idea that comes to mind is to represent this with a TxnFile class (TransactionalFile). This is one of the parts I love most about C++, BTW - very clean scoped construction/destruction semantics.

#include <cstdio>   // std::remove
#include <string>

class TxnFile {
public:
  TxnFile(const std::string &file)
    : m_file(file), m_committed(false) {
  }

  ~TxnFile() {
    if (!m_committed) {
      std::remove(m_file.c_str());
    }
  }

  const std::string &get_file() { return m_file; }

  void commit() {
    m_committed = true;
  }

private:
  std::string m_file;
  bool m_committed;
};



OK. So far so good. Let’s actually implement that business logic:


std::vector<TxnFile> txn_files;

// Collect the files
for (auto &file : some_files) {
  txn_files.emplace_back(file);
}

// Do something with them. If exception is thrown we'll remove the files
for (auto txn_file : txn_files) {
  do_some_work(txn_file.get_file());
}

// If all is well, commit
for (auto txn_file : txn_files) {
  txn_file.commit();
}



Looks rather straightforward, right? If you try this out yourself, you’ll soon realize something is off - the files are being deleted for no reason at all!

The problem itself is obvious-ish: it really should’ve been auto & in both loops - otherwise we are constructing copies of TxnFile, and each copy removes the file upon destruction!


// Do something with them. If exception is thrown we'll remove the files
for (auto &txn_file : txn_files) {
  do_some_work(txn_file.get_file());
}

// If all is well, commit
for (auto &txn_file : txn_files) {
  txn_file.commit();
}



However, we are not done yet. The problem is still happening - and in some cases, the files are even removed before we actually do work!

The problem, perhaps not that surprisingly, lies with the std::vector class. When expanding the size of a std::vector, the STL allocates a new block of memory and copies/moves the elements over to it. If the class doesn’t have a move constructor, it defaults to copying, and then destroys the old elements - which isn’t unlike the auto txn_file problem we discussed earlier, though a bit more subtle to catch.
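To make the copy-and-destroy behavior visible, here's a small probe class of my own (not from the original code) that counts copy constructions and destructions during vector growth:

```cpp
#include <cassert>
#include <vector>

// Toy type that counts copies and destructions. It deliberately has no
// move constructor, so std::vector has to copy elements when it reallocates.
struct Probe {
  static int copies;
  static int destroys;
  Probe() {}
  Probe(const Probe &) { ++copies; }
  ~Probe() { ++destroys; }
};

int Probe::copies = 0;
int Probe::destroys = 0;
```

Push two elements into a std::vector&lt;Probe&gt; with capacity 1 and you'll observe at least one copy and one destruction coming from the reallocation alone - exactly what bites TxnFile, where that "destruction" also deletes the file.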

Let’s try fixing it:

class TxnFile {
public:
  TxnFile(const std::string &file)
    : m_file(file), m_committed(false) {
  }

  TxnFile(const TxnFile &&file) {
    m_file = std::move(file.m_file);
    m_committed = file.m_committed;
  }

  ~TxnFile() {
    if (!m_committed) {
      std::remove(m_file.c_str());
    }
  }

  const std::string &get_file() { return m_file; }

  void commit() {
    m_committed = true;
  }

private:
  std::string m_file;
  bool m_committed;
};



Looks reasonable, right? Actually the problem is still there! The issue is that you now have a const r-value reference, const TxnFile &&. This means that even though you have an r-value reference, you can’t modify what it refers to - std::move on a const member silently degrades into a copy - and what’s the point of that if you want move semantics? The right way is to use a regular r-value reference, TxnFile &&. Keep in mind that declaring a move constructor disables the implicitly generated copy constructor, so you shouldn’t run into this problem again. But to better clarify the intent, it’s good practice to delete the copy constructor explicitly.

class TxnFile {
public:
  TxnFile(const std::string &file)
    : m_file(file), m_committed(false) {
  }

  TxnFile(const TxnFile &) = delete;
  TxnFile &operator =(const TxnFile &) = delete;

  TxnFile(TxnFile &&rhs)
    : m_file(std::move(rhs.m_file)), m_committed(rhs.m_committed) {
    // Make sure the moved-from object won't remove the file on destruction
    rhs.m_committed = true;
  }

  TxnFile &operator =(TxnFile &&rhs) {
    reset();

    m_file = std::move(rhs.m_file);
    m_committed = rhs.m_committed;
    rhs.m_committed = true;
    return *this;
  }

  ~TxnFile() {
    reset();
  }

  void reset() {
    if (!m_committed) {
      std::remove(m_file.c_str());
    }
  }

  const std::string &get_file() { return m_file; }

  void commit() {
    m_committed = true;
  }

private:
  std::string m_file;
  bool m_committed;
};



## A bit of rant

OK. We are finally done. Personally I think C++ has grown to the point where it is too complicated for most people, and the interactions between different features can lead to really surprising behaviors. Even if you just want to write a simple class, there are already too many things to consider (copy/move semantics). And don’t get me started on template metaprogramming. However, if you stick to a relatively sane subset of C++, maybe you’ll be fine. Just maybe. I’ve been working in large C++ codebases professionally for 14+ years and I still make stupid mistakes.

# CoreCLR's environment is not your environment

CoreCLR maintains its own private copy of environment variables

This came up when I was helping another colleague at a previous job (where I still wrote C# code probably 50% of the time), diagnosing a library load failure inside a Linux container. Internally there is a library that loads different implementations (mock and real) of another library based on an environment variable USE_MAGIC_TEST_LIB, and the .NET code calling that library calls SetEnvironmentVariable to set it conditionally as part of a testing framework:

if (useTestFramework)
{
    Environment.SetEnvironmentVariable("USE_MAGIC_TEST_LIB", "1");
}



This looks reasonable, except that it didn’t work at all. It loaded the wrong library and things quickly went downhill after that.

We were scratching our heads for a while, until we decided to add a trace to see whether the environment change was actually taking effect in the native code. Interestingly, the native code didn’t see it at all. It’s like they don’t know about each other’s environment!

Actually that observation is more or less what’s going on. Before we dig in a bit deeper, here is a bit of history of CoreCLR’s cross-platform implementation. Not surprisingly, .NET started out Windows-centric and all the OS calls were strictly Windows API. When folks decided to port it to Linux/Mac as part of Rotor (later Silverlight), there were two options:

1. Design a new platform abstraction from scratch and move it to that
2. Align the API design to Windows and implement Windows API on top of Linux API

2 is obviously the cheaper solution, and has the advantage that the Windows code would be untouched and therefore wouldn’t regress, which is super important. The caveat is that implementing Windows APIs on top of Linux APIs can get tricky, but that’s a risk people were willing to take. So a new PAL (Platform Abstraction Layer) was introduced, with “new” APIs that look exactly like Windows APIs but are implemented using Linux APIs.

In the case of SetEnvironmentVariable, it is implemented in PAL/environ.cpp:

BOOL
PALAPI
SetEnvironmentVariableA(
            IN LPCSTR lpName,
            IN LPCSTR lpValue)
{
    // ...

    // All the conditions are met. Set the variable.
    int iLen = strlen(lpName) + strlen(lpValue) + 2;
    LPSTR string = (LPSTR) PAL_malloc(iLen);
    if (string == nullptr)
    {
        bRet = FALSE;
        ERROR("Unable to allocate memory\n");
        SetLastError(ERROR_NOT_ENOUGH_MEMORY);
        goto done;
    }

    sprintf_s(string, iLen, "%s=%s", lpName, lpValue);
    nResult = EnvironPutenv(string, FALSE) ? 0 : -1;

    PAL_free(string);
    string = nullptr;

    // If EnvironPutenv returns FALSE, it almost certainly failed to allocate memory.
    if (nResult == -1)
    {
        bRet = FALSE;
        ERROR("Unable to allocate memory\n");
        SetLastError(ERROR_NOT_ENOUGH_MEMORY);
        goto done;
    }

    // ...
}



This looks a bit fishy. It’s allocating its own buffer and calls into EnvironPutenv, which basically does this:

for (i = 0; palEnvironment[i] != nullptr; i++)
{
    const char *existingEquals = strchr(palEnvironment[i], '=');
    if (existingEquals - palEnvironment[i] == nameLength)
    {
        if (memcmp(entry, palEnvironment[i], nameLength) == 0)
        {
            free(palEnvironment[i]);
            palEnvironment[i] = copy;

            result = TRUE;
            break;
        }
    }
}

if (palEnvironment[i] == nullptr)
{
    _ASSERTE(i < palEnvironmentCapacity);
    if (i == (palEnvironmentCapacity - 1))
    {
        // We found the first null, but it's the last element in our environment
        // block. We need more space in our environment, so let's double its size.
        int resizeRet = ResizeEnvironment(palEnvironmentCapacity * 2);
        if (resizeRet != TRUE)
        {
            free(copy);
            goto done;
        }
    }

    _ASSERTE(copy != nullptr);
    palEnvironment[i] = copy;
    palEnvironment[i + 1] = nullptr;
    palEnvironmentCount++;

    result = TRUE;
}


So it’s basically managing the environment in its own private palEnvironment array! No wonder things don’t work.

But why go through all this trouble when Linux has its own getenv/setenv?

http://rachelbythebay.com/w/2017/01/30/env/

Modifications of environment variables are not allowed in multi-threaded programs. – the glibc manual

https://github.com/dotnet/coreclr/issues/635

From looking at the code, I suspect that the cached environment was attempt to fix thread safety or consistency problems between Environment.GetEnvironmentVariables and Environment.SetEnvironmentVariable.

The enumeration of the environment starts by reading the environ global variable, without any locks. Consider what may happen if somebody calls setenv while the enumeration is in progress.

It’s because setenv/getenv isn’t particularly thread-safe - you can crash when reading the environment while it is being modified by another thread, and two threads modifying the environment at the same time can lead to leaks.

In this case, one can see a few options:

1. Do nothing - the issues are Linux-specific and you should take care when calling these functions, just as you would when calling them directly on Linux.
2. Throw PlatformNotSupportedException - getenv/setenv just isn’t safe.
3. Add a critical section around getenv/setenv - making them safe to call from multiple threads.
4. Implement your own safe environment helpers - as a result, the rest of the native code won’t observe changes through getenv/setenv.

1 isn’t really acceptable because .NET developers need the code to be portable - they don’t want to handle platform-specific oddities to make their code portable. They expect the .NET platform library to be safe and reliable.

2 isn’t great either, for the same reason - and it would also break a ton of code when ported to Linux.

3 makes the .NET code safe, but it wouldn’t protect against native code racing with getenv/setenv calls from within .NET code, so race conditions would still occur and the .NET developer has little control.

4 is safe, but can lead to subtle breaking changes.

Unfortunately there isn’t a great option here - each of these has its downsides. At the end of the day, it comes down to compatibility/portability vs surprising behavior, and the .NET team favors compatibility and portability. While option 4 can lead to subtle breaking changes, fortunately the breakage is consistent and therefore easier to diagnose. In our case, being in a container launched by another system made the whole thing much harder to diagnose, but that’s more a problem of the container launcher itself. Even though we were bitten by this very problem, I agree it is most likely the better choice.

https://github.com/dotnet/coreclr/blob/master/src/pal/src/misc/environ.cpp#L607

And here is a simple code snippet to demonstrate the issue. I’ve tested this on my MBP.

using System;
using System.Runtime.InteropServices;

namespace set_env
{
    class Program
    {
        [DllImport("/usr/lib/system/libsystem_c.dylib")]
        static extern IntPtr getenv(string name);

        // Note: POSIX setenv takes a third "overwrite" parameter
        [DllImport("/usr/lib/system/libsystem_c.dylib")]
        static extern int setenv(string name, string value, int overwrite);

        static void Main(string[] args)
        {
            string envName = "MY_ENV";

            Console.WriteLine("MY_ENV={0}", Environment.GetEnvironmentVariable(envName));

            Environment.SetEnvironmentVariable(envName, "~/path");
            Console.WriteLine("Setting it to ~/path");

            Console.WriteLine("MY_ENV={0}", Environment.GetEnvironmentVariable(envName));

            IntPtr env = getenv(envName);
            string envStr = Marshal.PtrToStringAnsi(env);
            Console.WriteLine("getenv(MY_ENV)={0}", envStr);

            Console.WriteLine("Setting it using setenv");
            setenv(envName, "~/path", 1);

            env = getenv(envName);
            envStr = Marshal.PtrToStringAnsi(env);
            Console.WriteLine("getenv(MY_ENV)={0}", envStr);
        }
    }
}


Now if you are interested in digging a bit deeper, here are some bonus questions for your consideration:

1. Does the same problem happen on Windows? Why/why not?
2. Why does the code above use Marshal.PtrToStringAnsi instead of just having getenv return a string?

That’s all for now. Thanks for reading!

# Sorting structured data in a unstructured way using memcmp-friendly encoding Part 1 - sorting integers and strings

Sorting structured data in an unstructured way using memcmp-friendly encoding, part 1

In many interesting data storage applications, one often needs to sort the data in order to search it efficiently - it is such a fundamental operation. The data is often structured, as in databases (or anything using a database under the hood) - the application knows exactly what the data is: for example, a record/row with an integer field, a string field, a datetime field, etc. In those cases you can easily sort the data by interpreting it as what it is, and then comparing field by field. This is usually achieved by having a base class with virtual functions, and several derived classes implementing the comparison function as well as determining the length to advance to the next field:

class Data {
public:
  virtual int Compare(void *left, void *right) = 0;
  virtual int GetLength(void *data) = 0;
};

class UInt32_Data : public Data {
public:
  virtual int Compare(void *left, void *right) {
    auto left_int = reinterpret_cast<uint32_t *>(left);
    auto right_int = reinterpret_cast<uint32_t *>(right);
    if (*left_int < *right_int) {
      return -1;
    } else if (*left_int == *right_int) {
      return 0;
    } else {
      return 1;
    }
  }

  virtual int GetLength(void *data) {
    // No need to read data - uint32_t is always fixed size
    return sizeof(uint32_t);
  }
};



You could merge these two functions into one for better efficiency; for clarity I’m keeping them separate.

Besides virtual functions, you can also implement this with a jump table that points to a list of functions, or even a switch/case, etc. They are not fundamentally different - all of them involve a table of addresses to call/jump to, and a memory lookup to determine the target address.
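As a sketch of the switch/case flavor (the type tag enum and function name here are my own, not from any real schema), the dispatch might look like:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical type tags - a real schema would have many more of these.
enum class FieldType : uint8_t { UInt32, Int64 };

// Dispatch on the tag with a switch instead of a virtual call; the
// compiler typically compiles this down to a jump table as well.
int compare_field(FieldType type, const void *left, const void *right) {
  switch (type) {
    case FieldType::UInt32: {
      uint32_t l, r;
      memcpy(&l, left, sizeof(l));
      memcpy(&r, right, sizeof(r));
      return l < r ? -1 : (l == r ? 0 : 1);
    }
    case FieldType::Int64: {
      int64_t l, r;
      memcpy(&l, left, sizeof(l));
      memcpy(&r, right, sizeof(r));
      return l < r ? -1 : (l == r ? 0 : 1);
    }
  }
  return 0;
}
```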

However, the additional cost of invoking the right comparison function isn’t zero - as a matter of fact, in the case of a virtual function call it is quite significant compared to the actual comparison itself, since the call involves putting arguments into registers/stack, pushing the return address onto the stack, setting up the frame pointer, etc.

If you are an experienced systems programmer, you might know tricks to optimize this further. For example, given this is the exact same problem interpreters face, and people like interpreters to be fast, VMs like Dalvik employed advanced techniques like writing the dispatch in assembly, using threaded execution (which is a fancy way of saying the end of the interpreter loop decodes the next instruction instead of jumping back to the beginning), and using computed addresses instead of a jump table. These are interesting topics that I might talk about at some point in the future. Anyway, those are not easy to implement and maintain.

But are there other ways to get around this? Is there a way to compare this without understanding what the data is?

The most straightforward comparison is a byte comparison, or memcmp - the most universal way to compare two byte sequences. Many key/value stores (like LevelDB/RocksDB) only support byte comparison by default and allow you to plug in a custom comparator. But before you go ahead and implement that custom comparator, let’s give one idea a try: what if we could represent the data in a byte-comparison-friendly format?

The challenges are twofold:

1. Encoding the data so that byte order is the correct sorting order
2. Support variable length data properly so that you don’t accidentally compare unrelated data

## Unsigned 32-bit integer

Let’s start with the most straightforward case: a 32-bit unsigned integer. Assuming you are working on an Intel machine just like everybody else (and not something esoteric like SPARC), those unsigned 32-bit integers are represented as little-endian, which means the least significant byte comes in memory before the more significant ones. So 0x12345678 will be represented in memory as:

0x78, 0x56, 0x34, 0x12


Obviously this isn’t what we want - we need to compare the most significant byte first, which is exactly big-endian:

0x12, 0x34, 0x56, 0x78


Now it’s safe to memcmp them - the bytes are in most-significant to least-significant order, and the length is a fixed 4 bytes.

Now those SPARC CPUs look pretty sweet, right?
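As a minimal sketch (the helper name encode_u32 is mine), converting to big-endian before storing is all it takes for byte order to match numeric order:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode a uint32_t as 4 big-endian bytes, so that byte-wise comparison
// of the encoded form matches numeric comparison of the values.
std::string encode_u32(uint32_t value) {
  std::string out(4, '\0');
  out[0] = static_cast<char>((value >> 24) & 0xff);
  out[1] = static_cast<char>((value >> 16) & 0xff);
  out[2] = static_cast<char>((value >> 8) & 0xff);
  out[3] = static_cast<char>(value & 0xff);
  return out;
}
```

Note that std::string comparison is effectively a byte comparison here (char_traits compares as unsigned char), so comparing encoded keys behaves like memcmp.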

## Signed 32-bit

Let’s make this a bit more interesting. What if the integer is signed?

For signed 32-bit, the range is -2147483648 to +2147483647. There are two cases:

1. Negative: -2147483648 (0x80000000) to -1 (0xffffffff)
2. Non-Negative: 0 (0x00000000) to +2147483647 (0x7fffffff)

The non-negative case looks alright - just like the unsigned 32-bit integer case, as long as the values are converted to big-endian.

For the negative case, the order is already correct: -2147483648 (0x80000000), -2147483647 (0x80000001), …, -1 (0xffffffff) - except the most significant bit is always one, which makes negative numbers compare bigger than the non-negative ones. It is not hard to come up with a fix - just flip the sign bit. Now it becomes:

1. Negative: -2147483648 (0x00000000) to -1 (0x7fffffff)
2. Non-Negative: 0 (0x80000000) to +2147483647 (0xffffffff)

Now this looks really nice - the two ranges are in the right order, and -1 (0x7fffffff) + 1 = 0 (0x80000000). The universe is balanced.
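As a sketch (the helper name encode_i32 is mine), flipping the sign bit and then storing big-endian gives signed values the right byte order:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode an int32_t so that byte order matches numeric order:
// flip the sign bit (so negatives sort first), then store big-endian.
std::string encode_i32(int32_t value) {
  uint32_t u = static_cast<uint32_t>(value) ^ 0x80000000u;
  std::string out(4, '\0');
  out[0] = static_cast<char>((u >> 24) & 0xff);
  out[1] = static_cast<char>((u >> 16) & 0xff);
  out[2] = static_cast<char>((u >> 8) & 0xff);
  out[3] = static_cast<char>(u & 0xff);
  return out;
}
```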

## 8-bit ASCII Strings

For strings, let’s again start with the easy case - 8-bit ASCII strings (we’ll refer to them as ASCII strings from now on, for simplicity). For fixed-length ASCII strings it’s really easy - memcmp just works. But what about variable-length ASCII?

In such cases, the real problem happens when you have strings A and B, and A is a prefix of B:

A: A, A, A, A, A, A
B: A, A, A, A, A, A, B, B, B


What if there is other data just after A:

A: A, A, A, A, A, A, 0x43
B: A, A, A, A, A, A, B, B


In this case, 0x43 = ‘C’, which is bigger than ‘B’, so A compares bigger even though the string A is smaller than B. Oops.

The key to the problem is that you have to compare the two strings by themselves - you can’t compare other unrelated data by accident (which is challenge #2 from earlier, if you were paying attention). You could pad the strings so that they are equal length, if you know the maximum length ahead of time (for example, in SQL VARCHAR has a max length), but that can be a waste of space.

If you dig deeper, one interesting insight is that if you could have a magic special character that is guaranteed to be smaller than any valid content in the other string before it ends, then it would just work. In many cases we have no such luxury, as strings may have embedded NULs. But that does provide a hint: what if we artificially inject such a marker into the strings, such that the longer one gets a bigger marker byte?

A: A, A, A, A, A, A, 0x0
B: A, A, A, A, A, A, 0x1, B, B, B, 0x0


In the above case, A ends with 0x0, while B injects 0x1 as the 7th character, making sure B is bigger than A at the position where A ends. Note that 0x1 in this case means there is more data afterwards, so the encoder/decoder needs to take that into account. This looks nice, but we do need to make sure the markers are always at the same place. In order to do that, we can pad/break the strings into predictable fixed-length segments with a marker at the last byte. Let’s say we break every 6 characters - it’ll look exactly like this:

A: A, A, A, A, A, A, 0x0
B: A, A, A, A, A, A, 0x1, B, B, B, 0x0, 0x0, 0x0, 0x0


Note the four 0x0 paddings in between, making sure we break the strings every 6 characters. Now, any experienced programmer will tell you that you should always align things at a power of 2 (so that it works better with caches, alignment, etc.), so 8/16/32/… would obviously be a better choice. For now let’s go with 8 just to make it easier:

A: A, A, A, A, A, A, 0x0, 0x0
B: A, A, A, A, A, A,   A, 0x1, B, B, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0


A bit wasteful, but much better than padding the entire string to max length. Also, keep in mind that this encoding supports storing NUL characters, since only the byte at every 8th position has special meaning.

But we are not done yet. Do you see there is one more problem?

We are padding the strings with 0x0, and now the padding can’t be distinguished from actual NUL characters in the string. Fortunately we still have plenty of room left in the encoding - we can put 1~8 in the marker to indicate the number of real characters (not the padding):

A: A, A, A, A, A, A, 0x0, 0x0
B: A, A, A, A, A, A,   A, 0x2, B, B, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0


But this isn’t quite right yet - it can easily break (thanks to Thief for pointing it out), as the markers themselves get into the comparison:

A: A, A, A, A, A, A, A, 0x3, A, A,   A, 0x0, 0x0, 0x0, 0x0, 0x0
B: A, A, A, A, A, A, A, 0x2, B, B, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0


To fix this, instead of signaling the number of characters in the next segment, the marker can represent the number of characters in the current segment:

A: A, A, A, A, A, A, A, 0x7, A, A, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2
B: A, A, A, A, A, A, A, 0x7, A, A,   B, 0x0, 0x0, 0x0, 0x0, 0x3


For non-NUL characters it’ll just work, as any other character is bigger than 0x0. For embedded NUL characters, either the last non-NUL character would help:

A: A, A, A, A, A, A, A, 0x7, A, A, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2
B: A, A, A, A, A, A, A, 0x7, A, A, 0x0,   A, 0x0, 0x0, 0x0, 0x4


Or, in the pure NUL-padding case, the last markers 0x2/0x4 will disambiguate any difference.

A: A, A, A, A, A, A, A, 0x7, A, A, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2
B: A, A, A, A, A, A, A, 0x7, A, A, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4


This is still not quite perfect, though. If a string happens to end exactly at an N-character boundary:

A, A, A, A, A, A, A, 0x7, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0


The final N bytes are wasted just to provide the terminating indicator. The fix is simple: give the marker two meanings when the segment is full:

1. N - 1 : the segment is full and there are no more characters
2. N : the segment is full and there are more characters

To illustrate this idea:

A, A, A, A, A,   A,   A, 0x8, B, B, B, 0x0, 0x0, 0x0, 0x0, 0x3
A, A, A, A, A,   A,   A, 0x7
A, A, A, A, A, 0x0, 0x0, 0x5


In summary, we break the string into chunks of 8/16/32/… characters, using every Nth byte as a special marker that indicates:

1. 1 ~ N - 1 : the number of characters in the current segment; the rest is 0x0 padding.
2. N : the current segment is full and there are more characters.
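The scheme above can be sketched as follows (the function name encode_key is mine; N = 8, so each group is 7 payload bytes plus one marker byte):

```cpp
#include <cassert>
#include <string>

// Encode a string into groups of 8 bytes: 7 payload bytes + 1 marker.
// Marker semantics (N = 8): 1..7 = number of real payload bytes in this
// final group (the rest is 0x0 padding); 8 = group is full and more follow.
std::string encode_key(const std::string &s) {
  std::string out;
  size_t pos = 0;
  while (true) {
    size_t remaining = s.size() - pos;
    size_t take = remaining < 7 ? remaining : 7;
    out.append(s, pos, take);
    out.append(7 - take, '\0');            // pad the group out to 7 bytes
    pos += take;
    if (pos < s.size()) {
      out.push_back(8);                    // full group, more characters follow
    } else {
      out.push_back(static_cast<char>(take));
      break;
    }
  }
  return out;
}
```

With this encoding, the prefix problem from earlier goes away: encode_key("AAAAAA") sorts before encode_key("AAAAAABBB") regardless of what bytes happen to follow either key in memory.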

For other character encodings, we can often convert to a case we already know how to handle - for example, converting the string to UTF-8, which byte-compares correctly even with multi-byte characters. If you haven’t looked at UTF-8 yet, you should. It’s brilliant, and everyone should be talking in UTF-8 (I’m looking at you, Windows). Just go to https://en.wikipedia.org/wiki/UTF-8.

## What about sort ordering and collation?

Those cases can get really complicated depending on the encoding + collation, so I won’t get into them. But the idea is always the same: transform the bytes into the correct sorting order as dictated by the character set/encoding. If the encoding’s byte order happens to be the right sorting order (UTF-8, for example), all the better.

If you are interested you might want to give your favorite encoding a try.

## We are not done yet

In this post I’ve discussed approaches to encode your data in a sort-friendly way. In read/seek/lookup-heavy cases it can make a huge difference in lookup performance, but in write-heavy environments it may not be the right trade-off, as the overhead of the encoding becomes bigger and the caching that offsets the decoding cost becomes less effective. At the end of the day, there is no silver bullet, and you need to pick the right solution for the scenario at hand.

However, we are not quite done yet. There is one more interesting scenario we can look at: floats. This is a non-trivial topic as we need to dive into the binary format of floats/doubles. Hope I’ll see you next time!

EDIT NOTE: This article has been updated to address some bugs in the examples and in the encoding. Thanks everyone for the suggestion/corrections!