I ran into this issue at work several weeks back and figured I’d drop a quick blog post about it, as it’s kinda nifty.
Some string matching code in our backend was reporting mismatches on text that looked identical. I found an example of the text, played around with it in the console, and was able to isolate it to the comparison below.
This was a bit of a headscratcher. If you don’t believe me, copy and paste this code into your console and try it yourself:
'food' == 'food'
So… can you figure out what’s going on here?
Go ahead. I’ll wait…
It took me a hot minute to suss it out, but for me, the thing that clued me in was using the arrow keys to step through the string. You’ll notice that it takes an extra keypress to navigate between the letter d in the first “food” and the closing apostrophe.
It turns out there was a Unicode character there called the zero width space (U+200B). This character is really part of the string and counts when doing string comparisons, but it is effectively invisible on the page.
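If the invisible character doesn’t survive your copy and paste, here’s a rough sketch that builds the same situation with an explicit escape, so you can see exactly what is in the string. The names sane, haunted, codePoints, and stripZeroWidth are just ones I made up for illustration, and the set of characters the helper strips is my own pick, not an exhaustive list:

```javascript
// U+200B is the zero width space, written as an escape so it shows up in source.
const sane = 'food';
const haunted = 'food\u200B';

console.log(sane == haunted);              // false
console.log(sane.length, haunted.length);  // 4 5

// List every code point to see what is really inside a string.
const codePoints = (s) =>
  Array.from(s, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(codePoints(haunted).join(' '));
// U+0066 U+006F U+006F U+0064 U+200B

// Hypothetical cleanup helper: strips a few common zero width characters
// (zero width space, non-joiner, joiner, and the byte order mark).
const stripZeroWidth = (s) => s.replace(/[\u200B\u200C\u200D\uFEFF]/g, '');

console.log(stripZeroWidth(haunted) == sane);  // true
```

Whether stripping is the right fix depends on where the text comes from. The zero width joiner, for instance, is meaningful in some scripts and in emoji sequences, so treat this as a debugging aid more than a blanket rule.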
This got me thinking… I know there are quite a few characters that appear identical but have different code points. For example, ê can be stored as the single code point U+00EA or as a plain e followed by the combining circumflex U+0302, and the Latin a (U+0061) looks just like the Cyrillic а (U+0430). They render the same, but they are technically different sequences of Unicode code points, so a plain string comparison treats them as different.
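Here’s a small sketch of two lookalike cases, again using escapes so nothing gets lost in transit. The variable names are mine. The first pair is the same letter encoded two ways, which the built-in String.prototype.normalize can reconcile. The second pair is two genuinely different letters that just happen to look alike, and normalization does nothing for those:

```javascript
// Same letter, two encodings.
const precomposed = '\u00EA';  // ê as a single code point
const combining = 'e\u0302';   // plain e followed by a combining circumflex

console.log(precomposed == combining);  // false
console.log(precomposed.normalize('NFC') == combining.normalize('NFC'));  // true

// Different letters that merely look alike.
const latinA = '\u0061';     // Latin small letter a
const cyrillicA = '\u0430';  // Cyrillic small letter a

console.log(latinA == cyrillicA);  // false
console.log(latinA.normalize('NFKC') == cyrillicA.normalize('NFKC'));  // still false
```

So normalizing both sides before comparing takes care of the first kind of mismatch, but lookalike letters from different scripts need some other kind of check.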
So, if we can have two words that look the same but are different, can we have two words that look different but are the same?
This was the best I could come up with:
There is a Unicode character called the right-to-left override (U+202E) which basically opens a portal to Hell by making all the text that comes after it render backwards. Go ahead and paste this into your console and play around with it a little. But make sure to step away before you shoot yourself:
'food' == 'food'
It turns out this character has some interesting security implications, as it can be used to manipulate URLs to make certain phishing attacks easier.
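If you’d rather poke at the override without trusting an invisible character to survive the clipboard, here’s a minimal sketch that spells it out as an escape, plus a rough check for this family of characters. The names RLO, BIDI_CONTROLS, and hasBidiControls are made up for illustration, and the range in the regex covers only the explicit directional embedding, override, and isolate controls, which is an assumption about what you’d want to flag:

```javascript
// U+202E is the right-to-left override.
const RLO = '\u202E';

const normal = 'food';
const cursed = RLO + 'food';

console.log(normal == cursed);  // false
console.log(cursed.length);     // 5, the override counts as a character

// Explicit bidi controls: U+202A to U+202E (embeddings and overrides)
// and U+2066 to U+2069 (isolates).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/;

// Hypothetical helper: flags strings that contain any of those controls.
const hasBidiControls = (s) => BIDI_CONTROLS.test(s);

console.log(hasBidiControls(cursed));  // true
console.log(hasBidiControls(normal));  // false
```

A check like this is only a starting point. Text in right-to-left languages can use these controls legitimately, so flagging them for review is probably safer than stripping them outright.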