Friday, February 10, 2012

Fridaygram: Unicode, ancient lake, very ancient sound

By Scott Knaster, Google Code Blog Editor

Unicode was created with the ambitious goal of representing every human language, with room left over for a whole bunch of symbols, too. More than 20 years after Unicode was started, over 60% of the pages on the web are now encoded in Unicode. That’s pretty good growth when you consider that Unicode’s coverage was less than 5% of the web in 2005. Having a standard like Unicode is important because, as Mark Davis writes, "The more documents that are in Unicode, the less likely you will see mangled characters (what Japanese call mojibake) when you're surfing the web."
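A quick way to see mojibake for yourself is to encode text as UTF-8 and then decode those bytes with the wrong encoding. The Python sketch below is not part of the original post. It uses ISO-8859-1 as the mismatched encoding, and the helper names `show_mojibake` and `fits_in` are illustrative, not from any library. It also checks which sample strings an older 8-bit encoding like ISO-8859-1 can represent at all, compared with UTF-8.

```python
def show_mojibake(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# French, Russian, and Japanese samples ("文字化け" is "mojibake" in Japanese).
samples = ["café crème", "Привет", "文字化け"]
for s in samples:
    # UTF-8 can encode all of them; ISO-8859-1 covers only Western European text.
    print(f"{s}: utf-8={fits_in(s, 'utf-8')}, iso-8859-1={fits_in(s, 'iso-8859-1')}")

garbled = show_mojibake("café crème")
print(garbled)  # cafÃ© crÃ¨me

# ISO-8859-1 maps each byte to exactly one character, so reversing the
# mistaken step recovers the original text.
print(garbled.encode("iso-8859-1").decode("utf-8"))  # café crème
```

Because ISO-8859-1 assigns a character to every byte value, the wrong decode raises no error and the text is silently mangled. A stricter mismatch, such as decoding the same bytes as ASCII, would raise `UnicodeDecodeError` instead.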

In news of older stuff, a Russian expedition that has been working for 10 years has finally drilled through Antarctic ice and reached Lake Vostok, a huge freshwater lake more than 12,000 feet below the surface. The ice has covered this lake for at least 15 million years, which is well before the work on Unicode began. Eventually the team will take samples of the lake water, looking for signs of life and other ancient treasures.

Finally, you can go back even further in time and listen to the song of a cricket that was around during the Jurassic period, 165 million years ago. That cricket really sounds great for its age.

On Fridays we take a break and do a Fridaygram post just for fun. Each Fridaygram item must pass only one test: it has to be interesting to us nerds.


  1. Unicode: how can we check that our application handles Unicode properly all the way from the database, through the application framework, to the HTML code? Should we store a string with Japanese, French, or Russian characters and see whether the client browser displays it properly? Do you have any pointers to articles that would help? The web part is documented here: but I wonder if you know of an article or blog post that explains it all. Thanks.

    1. I wrote an article on how to use UTF-8:
      Get, store, manipulate and show data… but it's in French:

  2. Well, I hope Unicode becomes even more widespread, but lately my hopes have faded, because at our university most people miss the point of encodings or ignore them. You see a lot of ISO-x examples in newer scripts; no one really uses UTF-8 there. I think that's really problematic, because people should be more aware of encodings these days.

  3. It's a great idea. Don't forget to incorporate ideas from music, radio communication, and picture art, the original universal languages!