How to avoid collisions using UUIDs

There’s a simple way to avoid collisions when using UUIDs as primary keys: Use UUIDs as primary keys.

No, that’s not a typo. The idea that Get ( UUID ) will create a duplicate ID in a single table, in a single file, or even all tables in all FileMaker files you ever create is, simply, false.

I can hear someone out there (likely here) saying: “Whaaat? Just because it’s very unlikely, doesn’t mean it can’t happen”.

And my answer is: “Actually, it does mean that. That’s exactly what it means.” Having a collision of a properly created, randomly generated UUID is so unlikely that when we start slicing words like “can’t” or “impossible” to include such an unlikely event, we are presenting a false view of the world.

The likelihood of a collision is so absurdly rare, that it is impossible as far as human brains can conceive of it. With the levels of risk we’re talking about, saying, “it is impossible” to have a collision is more accurate than saying “it is possible”. It’s like describing the statement, “The sun will rise tomorrow” as being false, because “it’s possible” an alien civilization has launched a near-lightspeed moon-sized object at Earth that will hit tonight destroying the planet and preventing any more sunrises. After all, it’s possible.

This point is about properly created UUIDs. It is non-trivial to create a unique sequence of numbers and letters. While the process has been standardized, some people don’t follow the standards. According to David McKee, Senior Software Engineer at FMI, the Get ( UUID ) function uses the operating systems’ native UUID function. On macOS it is CFUUIDCreate() and on Windows UuidCreate(). FM engineers are smartly delegating the UUID creation to Apple and Microsoft.

It seems completely reasonable that FM’s Get ( UUID ) is rock solid.

The UUID RFC standard used by FM is “version 4, variant 1”, where the UUID is randomly constructed (not based on the time or the computer ID). Pulling from Wikipedia’s page on UUIDs, the UUID is composed of 32 hexadecimal digits (using the base 16 system of 0 through 9 and the letters A through F). The standard fixes six of those 128 bits to mark the version and variant and, in the end, there are 122 bits of random information, which produces 2^122, or roughly 5.3 undecillion, possible IDs. “Undecillion” is a cool word, but really, the human brain can’t grasp its enormity. If you follow the prefixes for 2, 3, 4 in billion, trillion, quadrillion, eventually you get to undecillion, which starts with the Latin prefixes for 1 and for 10, also known as eleven.
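
The count of possible IDs can be reproduced with a short calculation. This is only a sketch: the variable names are made up for illustration, and it assumes the standard version 4 layout, in which 4 bits mark the version and 2 bits mark the variant.

Let ( [
//32 hexadecimal digits at 4 bits each gives 128 bits
hexDigits = 32 ;
bitsPerDigit = 4 ;

//assumed: 4 version bits plus 2 variant bits are fixed by the standard
fixedBits = 6 ;

//bits that are actually random
randomBits = hexDigits * bitsPerDigit - fixedBits ;

//number of distinct random UUIDs
possibleIDs = 2 ^ randomBits
] ;
possibleIDs )

The result should be a 37-digit number, in line with the roughly 5.3 undecillion quoted above.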

That number can’t be understood because of its immensity. Let’s try to reduce it to human-relatable terms. First, few people are concerned about one particular UUID being duplicated; developers care if any duplicate occurs, which is an example of the Birthday Paradox. And then I’ll introduce real life variables to get a sense of the number.

Probability, like hugely large numbers, can be hard to wrap our heads around. We can’t just say, “What are the chances there’s a duplicate UUID?” We need to specify the risk we’re looking at. I think a one in a billion chance is a pretty safe risk.

How many UUIDs need to be created for there to be a one in a billion chance of a collision among any of them? The answer: 103 trillion.

Let ( [
//odds to test for: a 1 in 1,000,000,000 chance of any collision
probability = 1000000000 ;

//these calcs do the prep work (birthday problem approximation)
//p is the target chance expressed as a fraction
p = 1 / probability ;
x = Ln ( 1 / ( 1 - p ) ) ;
//2^122 is the number of possible random UUIDs
y = 2 * 2^122 * x ;

//number of UUIDs needed
z = Sqrt ( y ) ;

//use r.factor to round result to leading 3 digits
r.factor = -1 * Length ( Int ( z ) ) + 3 ;

result = Round ( z ; r.factor )
] ;
result )

There we are. To get a one in a billion chance of a duplicate, we’d have to create 103 trillion UUIDs. 103 trillion. That number is so big, there’s no real way to describe it, except to say it’s impossible.

Some humans, like writers and journalists, try to make Very Large Numbers accessible. “If you stacked 10 trillion dollar bills you’d reach the moon and back.” Which doesn’t actually help; it’s still incomprehensible because who can actually relate to moon travel, except for Neil Armstrong and friends? (Side note, Neil Armstrong used to tell bad moon jokes and when no one laughed, he’d mumble, “Guess you had to be there….”).

Imagine there are 100,000 FileMaker developers (we and FMI wish!). Each developer creates 10 files a year. Each file has 100 tables. Each table has 10 million records. Each record uses a UUID.

In 10 years, that adds up to 10 quadrillion UUIDs (10^16), and there’s still only about a 1 in 100,000 chance that there will be any duplicate UUIDs among all UUIDs created by all developers in the world. There is drastically less likelihood that such a duplicate would have a real world impact.
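
The scenario can be run through the same birthday-problem math used earlier. This is a rough sketch: the variable names are made up for illustration, the inputs are the hypothetical figures from the scenario, and it uses the small-probability shortcut p ≈ n^2 / ( 2 * 2^122 ) rather than the exact formula.

Let ( [
//hypothetical inputs taken from the scenario
developers = 100000 ;
filesPerYear = 10 ;
tablesPerFile = 100 ;
recordsPerTable = 10000000 ;
years = 10 ;

//total UUIDs created over the whole period
n = developers * filesPerYear * tablesPerFile * recordsPerTable * years ;

//number of possible random UUIDs
space = 2^122 ;

//shortcut: chance of any collision is about n^2 / ( 2 * space ),
//so the odds are "1 in" the inverse of that
oneIn = ( 2 * space ) / n^2 ;

result = Round ( oneIn ; 0 )
] ;
result )

The result reads as “a 1 in result chance” of any duplicate. Changing any input shows how quickly (or slowly) the odds move.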

The human brain isn’t great at understanding very large numbers or probability, but math doesn’t lie. For any reasonable definition of impossible, UUID collision is impossible.

 
