Sorting structured data using memcmp-friendly encoding part 2 - floats

Sorting structured data using memcmp-friendly encoding part 2 - sorting floats

In the last post we discussed converting integers and strings into a memcmp / byte-comparable format for faster comparison (at the cost of encoding on write and decoding on read). In this post let’s take a look at how to do the same for floating-point numbers.
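One commonly used way to make IEEE 754 doubles byte-comparable is to flip the sign bit of non-negative values, flip every bit of negative values, and store the result big-endian. The sketch below assumes that approach (it is not necessarily the one this series uses), ignores NaN, and uses the hypothetical names encode_double and encoded_less.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Encode a double so that memcmp on the result matches numeric order.
// Assumption: IEEE 754 binary64 doubles; NaN is not handled.
std::array<unsigned char, 8> encode_double(double value) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  const std::uint64_t sign = 1ULL << 63;
  if (bits & sign) {
    bits = ~bits;   // negative: flip all bits so larger magnitudes sort first
  } else {
    bits |= sign;   // non-negative: set the sign bit so it sorts after negatives
  }
  std::array<unsigned char, 8> out{};
  for (int i = 0; i < 8; ++i) {
    // Big-endian: most significant byte first, which is what memcmp compares first.
    out[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));
  }
  return out;
}

// Compare two doubles using only their encoded bytes.
bool encoded_less(double a, double b) {
  const auto ea = encode_double(a);
  const auto eb = encode_double(b);
  return std::memcmp(ea.data(), eb.data(), ea.size()) < 0;
}
```

With this encoding, encoded_less(-2.5, -1.0) and encoded_less(0.0, 1.5) both hold without ever decoding the bytes back into doubles.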

Get cherry-pick to work across file renames

Making cherry-pick work across file renames

Recently I needed to port over some changes using cherry-pick. That usually works fine (apart from occasional conflicts), but this time the file foo.cc had been renamed to bar.cc in the target branch. In such a case git cherry-pick simply gives up and tells you that the old file you are changing has been deleted. As far as I can tell there isn’t a good way to resolve the conflict.

There are a couple of ways to address this issue, but the easiest way I found is to rename the file back to its original name (the one your change was made against) in order to make git happy. Once that’s done, cherry-picking works as usual. Then rename the file back to the ‘new’ name and squash the commits.

This can be illustrated with the following example, assuming:

  1. Your commit modifies foo.cc
  2. The target branch (the one you want to cherry-pick into) has renamed foo.cc to bar.cc

# Create the target branch as usual
git checkout -b your-target-branch

# Rename bar.cc back to foo.cc to make git cherry-pick happy
git mv bar.cc foo.cc 
git commit -m "Make git happy"

# Cherry-pick as usual
git cherry-pick -x <commit>

# Rename it back
git mv foo.cc bar.cc 
git commit -m "Rename back"

# Squash the 3 commits into one
git rebase -i HEAD~3

In the rebase file, you’ll see:

pick 95be80db682 Make git happy
pick 3d74c6c9e13 Cherry-pick commit blah
pick 238e3c51354 Rename back

Change to:

pick 95be80db682 Make git happy
s 3d74c6c9e13 Cherry-pick commit blah
s 238e3c51354 Rename back

Here s means squash into the previous commit.

Just remember to delete the first and third (unrelated) commit messages from the combined commit message.

And now you are all set!

Repeatable reads in InnoDB comes with a catch

A few days ago I was looking into a deadlock issue that was caused by a difference in how MySQL storage engines handle transactions under repeatable read. This led me to dig deeper into repeatable read behavior in InnoDB, and what I found is quite interesting:

The basics

Before we dig deeper, let’s revisit some of the basics of database isolation levels. You can refer to my earlier post for a more detailed explanation / comparison. The database isolation level defines the behavior of data read/write operations within transactions, and it can have a significant impact on protecting the data integrity of your application. Repeatable read guarantees that once you have read a value you will always observe the same value, and it never changes unless you change it yourself, giving you the illusion that the data is exclusively owned by you and no one else is using it. Of course, this isn’t true in practice, as pessimistic and optimistic locking define what happens when a write conflict occurs.

Diagnosing interesting MySQL client connection error in localhost through the source code

The art of argument parsing and policy transparency

When working with MySQL, the most frustrating part is often getting strange connection errors. I wasted two hours trying to connect to a MySQL server over a TCP port (Unix domain sockets worked fine). I’ll talk about why it didn’t work, and as usual we’ll dive into the code to understand exactly why.

To simplify the problem, let’s say I have a MySQL server listening on port 13010 and bound to localhost, with user name root and an empty password (don’t do that in production):

[~/mysql]: mysql -p 13010 -h localhost -u root
Enter password:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

This is a typical error that many people run into. You can find many similar posts that discuss the problem, but few ever get to the bottom of it. Let’s jump right in.

-p and -P

Obviously, when I wrote -p 13010 I meant to tell the mysql client to connect to the server on port 13010, but that’s not what it means:

[~/mysql]: mysql --help
  -p, --password[=name]
                      Password to use when connecting to server. If password is
  -P, --port=#        Port number to use for connection or 0 for default

So I actually told mysql that the password is 13010 instead. Supporting both -p and -P is apparently a very bad idea.

Command-line tools often have an excessive number of short options, like this one from the man page for ls:

ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]

Personally I think they should go easy and only include the most common ones rather than using the entire alphabet.

However, the mystery is not yet solved. Note that we were asked to enter the password, which explains why most people never suspect that -p actually means password. Put another way - if -p means password, why is this command still asking for a password?

The answer lies in the source code:

my_getopt.cc

  for (optend= cur_arg; *optend; optend++)
  {
    opt_found= 0;
    for (optp= longopts; optp->name; optp++)
    {
      if (optp->id && optp->id == (int) (uchar) *optend)
      {
        /* Option recognized. Find next what to do with it */
        opt_found= 1;
        if (optp->arg_type == REQUIRED_ARG ||
            optp->arg_type == OPT_ARG)
        {
          if (*(optend + 1))
          {
            /* The rest of the option is option argument */
            argument= optend + 1;
            /* This is in effect a jump out of the outer loop */
            optend= (char*) " ";
          }
          else
          {
            if (optp->arg_type == OPT_ARG)
            {
              if (optp->var_type == GET_BOOL)
                *((my_bool*) optp->value)= (my_bool) 1;
              if (get_one_option && get_one_option(optp->id, optp, argument))
                return EXIT_UNSPECIFIED_ERROR;
              continue;
            }
            /* Check if there are more arguments after this one */
            argument= *++pos;
            (*argc)--;

The *(optend + 1) check is the most interesting part. When a short option is recognized, whatever immediately follows the option letter in the same argument is treated as its value:

          if (*(optend + 1))
          {
            /* The rest of the option is option argument */
            argument= optend + 1;
            /* This is in effect a jump out of the outer loop */
            optend= (char*) " ";

Since we passed -p 13010 rather than -p13010, the 13010 part is not treated as the password.

But wait, why does -h localhost work fine?

Just keep looking:

            if (optp->arg_type == OPT_ARG)
            {
              if (optp->var_type == GET_BOOL)
                *((my_bool*) optp->value)= (my_bool) 1;
              if (get_one_option && get_one_option(optp->id, optp, argument))
                return EXIT_UNSPECIFIED_ERROR;
              continue;
            }
            /* Check if there are more arguments after this one */
            if (!pos[1])
            {
              return EXIT_ARGUMENT_REQUIRED;
            }
            argument= *++pos;
            (*argc)--;

So if the option takes an optional argument (OPT_ARG), the parser only accepts a value attached directly to the option and otherwise moves on without one. For REQUIRED_ARG, it takes the next command-line argument as the value.

Let’s take a look at where they are defined:

  {"password", 'p',
   "Password to use when connecting to server. If password is not given it's asked from the tty.",
   0, 0, 0, GET_PASSWORD, OPT_ARG, 0, 0, 0, 0, 0, 0},
  {"host", 'h', "Connect to host.", &current_host,
   &current_host, 0, GET_STR_ALLOC, REQUIRED_ARG, 0, 0, 0, 0, 0, 0},

As expected, password is optional and host is required.

Also, note how it never checks for ‘=’. So the syntax -p=abc doesn’t work as expected either, and hilariously =abc becomes the password. For options with a bit more error checking, like port, the error message is a bit better:

[~/mysql]: mysql -P=13010 
mysql: [ERROR] Unknown suffix '=' used for variable 'port' (value '=13010')
mysql: [ERROR] mysql: Error while setting value '=13010' to 'port'

Note the ‘=13010’ part?
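The argument-binding rules above can be modeled in a few lines. This is a simplified sketch rather than the actual my_getopt.cc logic: parse_short, Parsed and ArgType are hypothetical names, and the model only looks at a single short option per token.

```cpp
#include <string>

// Mirrors the REQUIRED_ARG / OPT_ARG distinction; the names are hypothetical.
enum class ArgType { Required, Optional };

struct Parsed {
  bool has_value;     // whether a value was bound to the option
  std::string value;  // the bound value (empty when has_value is false)
  int consumed;       // how many argv entries were used
};

// token is the current argv entry (e.g. "-p13010"); next is the following
// argv entry, or nullptr when there is none.
Parsed parse_short(const std::string& token, const char* next, ArgType type) {
  if (token.size() > 2) {
    // Anything glued to the option letter is the value, '=' included.
    return {true, token.substr(2), 1};
  }
  if (type == ArgType::Optional) {
    // Optional argument: the next argv entry is never consumed.
    return {false, "", 1};
  }
  if (next == nullptr) {
    // Required argument missing (the real parser returns EXIT_ARGUMENT_REQUIRED).
    return {false, "", 1};
  }
  // Required argument: take the next argv entry as the value.
  return {true, next, 2};
}
```

In this model parse_short("-p", "13010", ArgType::Optional) binds nothing and leaves 13010 for later processing, while parse_short("-h", "localhost", ArgType::Required) consumes both entries, and parse_short("-P=13010", nullptr, ArgType::Required) yields the value =13010.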

Default protocol

OK. Let’s try again:

[~/mysql/mysql-fork]: mysql -P 13010 -h localhost -u root
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

Still doesn’t work. We know it’s not the parsing of -P because port is REQUIRED_ARG:

  {"port", 'P', "Port number to use for connection or 0 for default to, in "
   "order of preference, my.cnf, $MYSQL_TCP_PORT, "
#if MYSQL_PORT_DEFAULT == 0
   "/etc/services, "
#endif
   "built-in default (" STRINGIFY_ARG(MYSQL_PORT) ").",
   &opt_mysql_port,
   &opt_mysql_port, 0, GET_UINT, REQUIRED_ARG, 0, 0, 0, 0, 0,  0},

Note the socket '/var/lib/mysql/mysql.sock' part of the error message. That is the path of a Unix domain socket.

To confirm this is the issue, let’s search for the actual error message:

const char *client_errors[]=
{
  "Unknown MySQL error",
  "Can't create UNIX socket (%d)",
  "Can't connect to local MySQL server through socket '%-.100s' (%d)",

The client_errors are looked up from error codes:

#define ER(X) (((X) >= CR_ERROR_FIRST && (X) <= CR_ERROR_LAST)? \
               client_errors[(X)-CR_ERROR_FIRST]: client_errors[CR_UNKNOWN_ERROR])
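As a rough illustration of that lookup, here is a minimal sketch that uses only the three messages quoted above. The names kClientErrors, kErrorFirst, kErrorLast and client_error_message are hypothetical, the first error code is taken to be 2000 (as the definitions that follow show), and out-of-range codes simply fall back to the first entry.

```cpp
// First three entries of client_errors, copied from the listing above.
static const char* const kClientErrors[] = {
    "Unknown MySQL error",
    "Can't create UNIX socket (%d)",
    "Can't connect to local MySQL server through socket '%-.100s' (%d)",
};

// Hypothetical constants: the real table is much longer, so the upper bound
// here only covers the three entries in this sketch.
constexpr int kErrorFirst = 2000;
constexpr int kErrorLast =
    kErrorFirst +
    static_cast<int>(sizeof(kClientErrors) / sizeof(kClientErrors[0])) - 1;

// Same shape as ER(X): in-range codes index the table by (code - first),
// anything else falls back to the "unknown" entry.
const char* client_error_message(int code) {
  if (code >= kErrorFirst && code <= kErrorLast) {
    return kClientErrors[code - kErrorFirst];
  }
  return kClientErrors[0];
}
```

For the ERROR 2002 in our output, 2002 - 2000 = 2, so client_error_message(2002) returns the third entry, the "Can't connect to local MySQL server through socket" message.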

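The message in our output is the third entry (index 2), which corresponds to error code 2002. In errmsg.h that code is CR_CONNECTION_ERROR, defined right after CR_SOCKET_CREATE_ERROR:

#define CR_ERROR_FIRST  	2000 /*Copy first error nr.*/
#define CR_UNKNOWN_ERROR	2000
#define CR_SOCKET_CREATE_ERROR	2001
#define CR_CONNECTION_ERROR	2002

Searching for these socket error codes leads us back to client.cc: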

  if (!net->vio &&
      (!mysql->options.protocol ||
       mysql->options.protocol == MYSQL_PROTOCOL_SOCKET) &&
      (unix_socket || mysql_unix_port) &&
      (!host || !strcmp(host,LOCAL_HOST)))
  {
    my_socket sock= socket(AF_UNIX, SOCK_STREAM, 0);
    DBUG_PRINT("info", ("Using socket"));
    if (sock == SOCKET_ERROR)
    {
      set_mysql_extended_error(mysql, CR_SOCKET_CREATE_ERROR,
                               unknown_sqlstate,
                               ER(CR_SOCKET_CREATE_ERROR),
                               socket_errno);
      DBUG_RETURN(STATE_MACHINE_FAILED);
    }

So by default the client connects through a Unix domain socket whenever the host is not specified or is localhost!

Programs should be transparent about their policies and give information about what they are doing. If that ends up being too verbose, add a verbose option. I’ll write a separate post about this because I’ve been bitten too many times by similar issues, and now my favorite pastime is adding print/printf.

So there are two ways to fix this:

  1. Instead of localhost, use 127.0.0.1. This fails the UNIX socket check and falls back to TCP.
  2. Use --protocol tcp to force using TCP.

So the right command would be:

mysql -P 13010 -h localhost -u root --protocol tcp 

or

mysql -P 13010 -h 127.0.0.1 -u root
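To see why both commands avoid the socket path, here is a small model of the condition from client.cc. It is only a sketch: Protocol and uses_unix_socket are hypothetical names, LOCAL_HOST is assumed to be the string "localhost", and the socket_path parameter stands in for the unix_socket || mysql_unix_port check.

```cpp
#include <cstring>

// Hypothetical stand-in for mysql->options.protocol.
enum class Protocol { Default, Tcp, Socket };

// Returns true when the client would take the Unix domain socket branch.
// host may be nullptr (no -h given); socket_path may be nullptr (no socket known).
bool uses_unix_socket(Protocol protocol, const char* host, const char* socket_path) {
  const bool protocol_allows =
      protocol == Protocol::Default || protocol == Protocol::Socket;
  const bool has_socket_path = socket_path != nullptr;
  // Assumption: LOCAL_HOST is "localhost", so 127.0.0.1 does not match.
  const bool host_is_local = host == nullptr || std::strcmp(host, "localhost") == 0;
  return protocol_allows && has_socket_path && host_is_local;
}
```

With a socket path configured, uses_unix_socket(Protocol::Default, "localhost", path) is true, while passing "127.0.0.1" as the host or Protocol::Tcp as the protocol makes it false, which matches the two fixes above.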

Summary

These two problems can be easily avoided by adding more messages to the mysql client, such as:

Trying to connect to UNIX domain socket localhost...
Connecting to database `13010`.

These would’ve saved god knows how much time collectively. Maybe I should submit a patch when I get a chance.

The gotchas:

  1. mysql short options with optional arguments only accept the argument when it immediately follows the option, such as ‘-pmypassword’. If you write ‘-p blah’, blah is interpreted as the database to use. Short options with required arguments don’t have this problem.

  2. When no protocol is specified, mysql tries to connect through a UNIX domain socket if the host is localhost or isn’t specified. To work around it, use an IP address instead of localhost, or specify the protocol explicitly using --protocol.

Byebye Windows - going full linux

Going linux full time

In my new job, no one cares about Windows.

Every single developer (with a few exceptions) uses a MacBook Pro and connects to their Linux VM to get work done. Some people have the trash can Mac Pro. You get the idea. Having been at Microsoft for ~12 years, this is admittedly an interesting adventure. Even though several Microsoft projects I worked on in the past had Linux versions (CoreCLR, Service Fabric, etc), most development was still done on Windows and then ported to Linux/Mac. Whenever you occasionally wander into the no-man’s land on Linux where the project tooling / infrastructure is falling significantly behind, you want to pull your hair out. Not Linux’s fault - but a matter of priority. In some extreme cases you’d wonder how anyone could put out a Linux version at all.

Not anymore. Now Linux (or Mac, if you count that in) is the full time job.

After a few weeks of research and practice, I’ve been happily chugging along with TMUX + VIM + MOSH with my custom key bindings. In this article I’ll talk a bit about my experience of making the transition.

I miss Visual Studio

Let’s get this one out of the way first. There is no replacement for Visual Studio. Period. The code completion (or IntelliSense) and debugging are simply unmatched by anything else on the market. VS Code is awesome for just browsing code and doing some occasional debugging, but for writing code it is just OK, as the “IntelliSense” (forgive my Microsoft VS jargon) can be hit or miss. Vim is good for text editing, and with plugins you can get some basic stuff to work, but again it’s nowhere near the quality of experience of Visual Studio. Usually it’s a love/hate relationship with Visual Studio - it’s kinda slow and sometimes buggy, but you can’t live without it. Well, you can, but you don’t want to.

Nowadays I use vim or VS Code / Atom for writing code, and gdb for debugging.

Debugging using GDB is fine

As a reasonably experienced WinDbg user, I found GDB’s command line takes a bit of getting used to, but that’s about it. GDB also supports a TUI mode that shows an integrated text window for source/registers/etc and a command window. It’s not great, as many simple key bindings stop working in that mode (taken over by the TUI component), but as long as I can see a source code “window” over SSH I’m happy.

TMUX is awesome

TMUX is a terminal multiplexer. With TMUX you won’t lose your working state - even if you disconnect from SSH, just run ‘tmux attach’ and you’ll resume where you left off. In this sense it is equivalent to a Windows Remote Desktop session.

The most powerful part is that it also allows you to break the terminal into multiple panes and windows, so you don’t have to leave the terminal and can easily switch between many different tasks with quick shortcuts. No more need to manage windows - everything is within the terminal. It’s like a virtual desktop for terminals. It’s built in such a way that you barely have to touch the mouse anymore. Well, until you move to the browser, that is.

VIM ftw

In my Microsoft job I used vim for simple editing purposes, and I like the vim way of thinking so much that I put all my editors into vim mode / vim plugin / vim key bindings. These days I find myself spending even more time in vim over SSH, so I invested more time in finding better VIM configurations and plugins.

I use junegunn/vim-plug as my VIM plugin manager. It’s pretty minimal and gets the job done.

This is the list of plugins I use:

  • Command-T - blazing fast fuzzy file finder
  • delimitMate - automatically inserts closing delimiters such as (), [], etc
  • ack - text search tool
  • vim-gitgutter - shows git changes in the leftmost column using +/-/~
  • vim-fugitive - great git command wrappers
  • vim-easytags - automated tag generation and syntax highlighting. I found the syntax highlighting can cause performance issues in large files so I turned it off.
  • vim-tmux-navigator - navigate between vim and tmux like they are integrated
  • a - switch between header and source. Enough said.
  • tcomment_vim - toggle comment/uncomment for lines
  • vim-surround - easy change/add surround characters like (), [], {}
  • nerdtree - navigate file/directory tree
  • vim-nerdtree-tabs - makes NERDTree feel like an integrated panel
  • vim-better-whitespace - highlight trailing whitespace characters. They are annoying for sure and lint warns about them
  • lightline - a light and configurable status line for vim
  • goyo - distraction free writing. Best for writing docs

SSH is the old Remote Desktop

In my old job I usually “remoted” into my development machines at the office - and “remote” means “Windows Remote Desktop”. On a reasonable connection it is actually quite nice - there is little lag and you almost feel you are working on a local machine, with all the graphical UI - it’s really amazing.

With Linux, you fall back to the good old text-based SSH. It’s kinda amazing in its own way that you can have a text-based remote protocol for complicated full screen programs like vim. You don’t get a graphical UI this way - but for the most part you don’t need one, and it’s usually blazing fast.

Mosh improves on SSH in that it is async (it doesn’t wait for the server response), so it feels even more responsive. The trade-off is that it can get a bit jarring when you type something and it doesn’t react correctly at first.

Shell matters

Windows Command Prompt is fine. It works. I still remember learning my first DOS commands on a 33MHz 386DX. But it hasn’t changed much since then. ConEmu is a popular terminal and some people (especially admins) use PowerShell as well. But none of those match the flexibility of Linux shells - they just have so much more to offer. You can switch between different shells and add customizations, even plugins.

For now I’m using ZSH with oh-my-zsh. It has fantastic themes and plugins. My favorite features are:

  • Plugins that show me all kinds of status, such as git status, any pending background process, how long the last command took, etc.
  • Auto-suggestion. It automatically suggests the full command based on the best match and grays out the part of the command that you haven’t typed. It’s simple but almost feels like magic when you see it in action for the first time.
  • Syntax highlighting. Enough said.
  • VIM editing. Yes, you can now use VIM commands to edit your shell commands. Just think that you can easily navigate with all the muscle memory you have from vim. This should be mandatory in everything that deals with text editing.

With all these, and throw in a few custom key bindings, the plain shell / windows command prompt just seems so boring.

You need to work on your configurations

However, tweaking these tools so that they work for you takes time. I find myself spending quite a bit of time tweaking the configurations to make them work better for me - and the time spent has paid off. All the different configuration options are indeed quite overwhelming if you start from scratch, so I used the Awesome dotfiles project as my starting point and forked my own version, yizhang82/dotfiles. There are a lot of things that I like about the way things are set up:

  • One script to deploy everything - TMUX/ZSH, the entire github repo containing dotfiles, and back them up
  • Dotfiles are configured to include the settings/scripts from the repo at ~/dotfiles - this way things can be automatically synchronized through a git pull. This is actually quite brilliant.
  • Automatically pulls the github repo every time ZSH starts - so it’s always up to date

Of course, many of the configurations there are already pretty good and are a perfect starting point for my own configurations.

It contains all my TMUX, ZSH, and VIM configurations, and by simply cloning it and running a script it goes onto a new machine effortlessly. Most of this was done by the original author and I’m simply tweaking it to my needs.

I like it

It did take a bit of getting used to, but I’m happy to report that I now feel roughly as productive as I was on Windows (if not more). I do miss having a fully integrated Visual Studio experience, but the command line experience (with TMUX, etc) on Linux is so much better that it more than makes up for that. Of course, at the end of the day, what matters is getting the job done - just use the right tool for the job. In a future post I can go into a bit more detail about my experience with these tools and share some of my learnings/tips.

P.S. I still use Windows at home. I have a PC that I built myself with an i7 4770K, 32G RAM, and an nVidia 2080 RTX, mostly for gaming. I think Windows has mostly lost the mindshare of developers these days, but it’s still the OS for gamers, and will be for quite some time.
