How To Get The Correct Element From A Unicode String?

I want to get specific letters from a Unicode string by index. However, it doesn't work as expected. The length doesn't work as expected either, whereas normal.length gives the correct value.

Solution 1:

In JavaScript, a string is a sequence of 16-bit code units (UTF-16). Since these characters lie above the Basic Multilingual Plane, each of them is represented by a pair of code units, also known as a surrogate pair. Indexing and .length count code units, not characters.

The code point of 𝖆 is U+1D586, and 0x1D586 is greater than 0xFFFF (2^16 - 1), the largest value that fits in 16 bits. So 𝖆 is represented by a pair of code units, also known as a surrogate pair:

console.log("𝖆".length) // 2
console.log("𝖆" === "\uD835\uDD86") // true
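
To see where \uD835\uDD86 comes from, here is a small sketch of the UTF-16 arithmetic. The helper name toSurrogatePair is made up for this example; it is not a built-in, and it assumes the input is a code point above U+FFFF.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its
// UTF-16 surrogate pair (high surrogate, low surrogate).
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1D586);
console.log(high.toString(16), low.toString(16));            // d835 dd86
console.log(String.fromCharCode(high, low) === "\u{1D586}"); // true
console.log("\u{1D586}".codePointAt(0).toString(16));        // 1d586
```

The built-in codePointAt() does the reverse: given the index of a high surrogate, it reads both code units and returns the full code point.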

One way is to create an array of characters using the spread syntax or Array.from(), both of which iterate the string by code point, and then take the index you need:

var handwriting = `𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅1234567890`
console.log([...handwriting][3]) // 𝖉
console.log(Array.from(handwriting)[3]) // 𝖉
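
If you only need one character and don't want to build the whole array, a for...of loop works too, since it also iterates by code point. This is just a sketch; codePointAtIndex is a hypothetical helper name, and the sample string uses \u{...} escapes for the first four fraktur letters.

```javascript
// Hypothetical helper: return the character at a code point index,
// walking the string instead of building an array first.
function codePointAtIndex(str, index) {
  let i = 0;
  for (const ch of str) { // for...of yields whole code points
    if (i === index) return ch;
    i++;
  }
  return undefined; // index out of range
}

const sample = "\u{1D586}\u{1D587}\u{1D588}\u{1D589}1"; // 𝖆𝖇𝖈𝖉1
console.log(sample[3] === "\uDD87");                      // true: half of the second letter
console.log(codePointAtIndex(sample, 3) === "\u{1D589}"); // true: the fourth letter
console.log(codePointAtIndex(sample, 4));                 // 1
```

Plain indexing with sample[3] lands in the middle of a surrogate pair, which is why the question's original approach returned a broken character.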

Solution 2:

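Internally, a Unicode character is stored as one or more 16-bit code units: 'é' is the single unit '\u00E9', while characters like yours take two. So it is normal for .length to be larger than the number of characters you see. To get the real length of a Unicode string, you have to convert it to an array: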

let charArray = [...handwriting]
console.log(charArray.length) //=62

Each item of the array is one character of your string, so charArray[3] returns the character '𝖉'.
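
A quick way to compare the two lengths, as a rough sketch. countCodePoints is a hypothetical helper name, and the sample text is made up for the example (two fraktur letters followed by two digits).

```javascript
// Hypothetical helper: count code points without allocating an array.
function countCodePoints(str) {
  let n = 0;
  for (const _ of str) n++; // one iteration per code point
  return n;
}

const text = "\u{1D586}\u{1D56C}12"; // 𝖆𝕬12
console.log(text.length);           // 6 (code units)
console.log(countCodePoints(text)); // 4 (code points)
console.log([...text].length);      // 4
console.log("\u00E9".length);       // 1 (é fits in one code unit)
```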
