Convert escaped Unicode to HTML entities

January 7th, 2012

I’m posting this here just because I spent far too long looking for the answer to this problem, before eventually finding a solution at http://ho.runcode.us and then realising the information had been on Stackoverflow all along – the search terms I was using must have just been missing it.

So all of the following comes from:
http://stackoverflow.com/questions/3480074/how-do-i-convert-unicode-codepoints-to-hexadecimal-html-entities and is thanks to Stackoverflow users Joey and Artefacto.

The problem

In short:

Data that I want to display on a Web page contains escaped Unicode code units (like \u0096) that I want to convert into HTML entities (like &#x96;) so that they will show up correctly on the Web page.

In a little more detail:

I have a Web page with a table control that uses JSON. I have a PHP script that takes some user input (search query terms), goes and searches a database, forms the search results into an array (called $data) with the correct format to pass to json_encode():

$str = json_encode($data);

However, the value you pass to json_encode() must be UTF-8 encoded data, and mine isn’t, so I need to pass it through utf8_encode() first, before I pass it to json_encode(). For example, for the contents of the name field:

utf8_encode($matchedRecord['name'])

The result of this is that any slightly unusual characters, like the en dash (–), come back transformed into one or more escaped Unicode code units (e.g. the en dash becomes \u0096). When this is included in an HTML page and sent to a browser it’s displayed as a strange little symbol by Firefox, or just not displayed at all by Internet Explorer.
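To see where \u0096 comes from, here is a minimal sketch. It assumes the database text is Windows-1252, where the en dash is the single byte 0x96; that encoding is an assumption, not something stated above. utf8_encode() treats its input as ISO-8859-1, so it would turn that byte into the UTF-8 bytes for the control character U+0096 rather than the en dash U+2013. The byte strings are written out literally so the sketch does not depend on utf8_encode() itself.

```php
<?php
// Assumption: the source data is Windows-1252, where "\x96" is the en dash.
// utf8_encode("\x96") would return the UTF-8 bytes for U+0096: "\xC2\x96".
$fromUtf8Encode = "\xC2\x96";

// For comparison, the correct UTF-8 encoding of the en dash (U+2013).
$realEnDash = "\xE2\x80\x93";

echo json_encode($fromUtf8Encode), "\n"; // "\u0096"
echo json_encode($realEnDash), "\n";     // "\u2013"
```

Either way json_encode() escapes the character as \uXXXX, so the conversion to HTML entities described below is still needed.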

So I need to take the string from json_encode() and pass it through something that will convert the escaped Unicode code units (like \u0096) into HTML entities (like &#x96;) so that they will show up correctly in the browser.

The solution

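// Match exactly four hex digits: json_encode() always emits four, and a longer match could swallow hex-looking text that follows the escape.
$str = preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x\1;', $str);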

This did the trick for me. However, Artefacto supplies a function that also handles UTF-16 surrogate pairs (i.e. characters outside the Basic Multilingual Plane, which JSON escapes as two 16-bit code units):

$str = unenc_utf16_code_units($str);
   
function unenc_utf16_code_units($string) {
    /* go for possible surrogate pairs first */
    $string = preg_replace_callback(
        '/\\\\U(D[89ab][0-9a-f]{2})\\\\U(D[c-f][0-9a-f]{2})/i',
        function ($matches) {
            $hi_surr = hexdec($matches[1]);
            $lo_surr = hexdec($matches[2]);
            $scalar = (0x10000 + (($hi_surr & 0x3FF) << 10) |
                ($lo_surr & 0x3FF));
            return "&#x" . dechex($scalar) . ";";
        }, $string);
    /* now the rest */
    $string = preg_replace_callback('/\\\\U([0-9a-f]{4})/i',
        function ($matches) {
            //just to remove leading zeros
            return "&#x" . dechex(hexdec($matches[1])) . ";";
        }, $string);
    return $string;
}   
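The surrogate-pair arithmetic in that function can be checked on its own. This is only a sketch: surrogate_pair_to_scalar() is a hypothetical helper name, not part of Artefacto's answer, and the example character (U+1F600, which JSON escapes as \ud83d\ude00) is chosen purely for illustration.

```php
<?php
// Hypothetical helper, for illustration only: combines a UTF-16 surrogate
// pair into the Unicode scalar value it represents.
function surrogate_pair_to_scalar(int $hi, int $lo): int
{
    // Each surrogate carries 10 payload bits; together they encode
    // (scalar - 0x10000), so the offset is added back here.
    return 0x10000 + (($hi & 0x3FF) << 10) + ($lo & 0x3FF);
}

// 0xD83D & 0x3FF = 0x03D, shifted left 10 bits = 0xF400
// 0xDE00 & 0x3FF = 0x200
// 0x10000 + 0xF400 + 0x200 = 0x1F600
$scalar = surrogate_pair_to_scalar(0xD83D, 0xDE00);
echo '&#x' . dechex($scalar) . ';', "\n"; // &#x1f600;
```

Converting each half separately would produce &#xd83d;&#xde00;, which are not valid character references. That is why the function deals with pairs first.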


How to: Get a white screen

June 28th, 2011

Firstly, why would you want to blank your computer screen?

Answer: If you take screenshots of windows with the Windows Aero theme (i.e. semi-transparent window chrome) then it’s a good idea to always use a plain white background to avoid the distraction of other text or bits of underlying windows showing through.

To quickly display a white screen:

  1. Copy and paste the following into the address bar of your browser:

    javascript:document.write(), document.close(); void 0;

  2. Press Enter.

  3. Press F11 to get rid of the browser chrome (and F11 again to get out of full-screen mode).

Bookmark it

For a quick blank screen at any time (without having to remember the JavaScript), save it as a bookmark by either right-clicking the following link and choosing Bookmark This Link (in Firefox) or Add to Favorites (IE), or – if you’re using Chrome or Safari – display the Bookmarks Bar and drag the following link into it:

Blank Page

Alternative solution: Save a blank page

Alternatively, you can make a blank web page and save it to your desktop (or anywhere else) for quick access whenever you need a white background:

  1. Create a text file containing the following:

    <html><body></body></html>

  2. Save this to your desktop (or wherever) with an HTML file name extension – e.g. WhiteScreen.html.

Now you can double-click the file to display a white web page. Again, press F11 for full-screen mode. You can bookmark the page if you prefer using this method rather than the JavaScript method.




YOURLS: Your Own URL Shortener

October 24th, 2010

Shorteners in brief

If you use Twitter you'll be familiar with the concept of URL shortening. You want to tweet about that video where the dog thinks its own leg is trying to steal the bone, but you've only got 140 characters to say what the video is and include the link to YouTube. URL shorteners allow you to change:

http://www.youtube.com/watch?v=tJgMueh-zLM&feature=youtu.be
to:
http://bit.ly/dfzFE6

Even if you don't use Twitter, URL shorteners can come in handy. For example, at the beginning and end of the ITauthor podcast I use some music by Amplifico and I like to put a link to their page on musicalley.com in the MP3 description that you can read on your iPod when you're listening to the podcast. It's much nicer to give the URL http://tinyurl.com/amplifico, rather than http://www.musicalley.com/music/listeners/artistdetails.php?BandHash=cdef1ecef0d12844ed816b922fcada5d.

Some popular URL shorteners

  • tinyurl – This was the first URL shortener most of us will have come across, way back before Twitter appeared and ramped up demand for short URLs, leading to a proliferation of shortening services.
  • bit.ly – Twitter supported use of bit.ly, which made it a popular service. Recently there have been doubts raised about the wisdom of using a Libyan-registered domain (.ly), as the Libyan government have said they will take down domains that contain immoral content.
  • j.mp – This is just bit.ly but with 2 fewer characters. If you already have a bit.ly URL you can use the same shortened path, stick it on the end of the j.mp domain and save yourself those 2 precious characters. For example, the dog video gets shortened to http://j.mp/dfzFE6.
  • goo.gl – Google are one of the many big companies that have now got into the URL shortening business.
  • is.gd – just a nice simple Web page that produces nice short URLs.
  • ... I could go on, but there's not a whole lot of difference between these services.

Your very own URL shortener

Shortening URLs isn't difficult to do and there is a selection of free URL shorteners that allow you to produce your own short URLs. All you need is your own Web site and your own domain name. So, for example, I own the domain name itauthor.com, so I can produce short URLs like http://itauthor.com/1 or (more descriptively) http://itauthor.com/podcast36.

The solution I'm using is called yourls. It's a series of PHP scripts with a MySQL database behind it. So if you're already running a Web site based on PHP and MySQL (for example, a WordPress blog) then you've already got everything you need. Just upload it and browse to the admin page. The yourls documentation contains all the instructions you need.

The only problem I had was the result of some changes not getting written to the .htaccess file in my root Web directory. I had to go and manually add the following at the start of the .htaccess file:

# BEGIN YOURLS
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP_HOST} !^itauthor\.com$
RewriteRule . - [S=3]
RewriteRule ^([0-9A-Za-z]+)/?$ /yourls-go.php?id=$1 [L]
RewriteRule ^([0-9A-Za-z]+)\+/?$ /yourls-infos.php?id=$1 [L]
RewriteRule ^([0-9A-Za-z]+)\+all/?$ /yourls-infos.php?id=$1&all=1 [L]
</IfModule>
# END YOURLS

You don't need the two lines highlighted in red in the original post (the RewriteCond %{HTTP_HOST} line and the RewriteRule . - [S=3] line that follows it) if you're not running WordPress, or anything similar that relies on being able to rewrite URLs. The yourls documentation says that, if you are running something like WordPress, you need to put all the yourls files and directories in a subdirectory of your root Web directory (e.g. in a directory called "u"). However, this means that you need to include the subdirectory in the YOURLS_SITE configuration setting and it'll then be part of the shortened URL (e.g. http://itauthor.com/u/123), which kind of defeats the purpose. So the two red lines get around this by diverting URLs without "www" to yourls.

The first of the red lines says "only apply the following rule if the URL doesn't begin http://itauthor.com". The second red line says "if the previous condition resolved as true then skip the following three rules".

This seems a bit like a double negative but it's necessary because RewriteCond only applies to the RewriteRule that immediately follows it, so we need the skip rule. The result is that, on my site, the three RewriteRules that divert page requests to the yourls PHP scripts are only applied to URLs beginning http://itauthor.com. The "[L]" means "last" - in other words, if this RewriteRule is applied don't go any further, so we never reach the rules that WordPress uses, which are further down the .htaccess file. If a URL begins http://www.itauthor.com then the yourls rules are skipped and the URL is processed using the WordPress rules.

This means that http://itauthor.com/2 is sent to yourls to retrieve the original, long URL from its database, whereas http://www.itauthor.com/podcasts is sent to WordPress to create a Web page using content from its database.
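The routing decision described above can be sketched in PHP. This is an illustration only, not part of yourls or WordPress: route_request() is a hypothetical function, and the sketch assumes the request does not name an existing file or directory, so it ignores the two REQUEST_FILENAME conditions. The patterns are copied from the rewrite rules, and the path is given without its leading slash, as it is in a root .htaccess file.

```php
<?php
// Hypothetical helper, for illustration only: mirrors the .htaccess logic.
// Returns the yourls script a request is rewritten to, or 'wordpress' when
// the request falls through to the WordPress rules further down the file.
function route_request(string $host, string $path): string
{
    // Any host other than the bare domain skips the three yourls rules.
    if ($host !== 'itauthor.com') {
        return 'wordpress';
    }
    if (preg_match('/^([0-9A-Za-z]+)\/?$/', $path, $m)) {
        return "/yourls-go.php?id={$m[1]}";
    }
    if (preg_match('/^([0-9A-Za-z]+)\+\/?$/', $path, $m)) {
        return "/yourls-infos.php?id={$m[1]}";
    }
    if (preg_match('/^([0-9A-Za-z]+)\+all\/?$/', $path, $m)) {
        return "/yourls-infos.php?id={$m[1]}&all=1";
    }
    // No yourls rule matched, so the later rules get their turn.
    return 'wordpress';
}

echo route_request('itauthor.com', '2'), "\n";            // /yourls-go.php?id=2
echo route_request('www.itauthor.com', 'podcasts'), "\n"; // wordpress
```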

What's the point?

Well, okay, there's really no point other than a bit of personal domain name vanity. Why have your tweets full of bit.ly or goo.gl URLs when you could have your own domain name showing up – even if clicking the link doesn't take your tweet readers to your Web site?

And to finish, just because I find it very funny, here's that video of the back leg bone thief:




ITauthor podcast #35 – On Crammond Island, thinking about technical writing

September 26th, 2010

Took the dog for a walk and talked to myself about documentation. Crazy? Or just missing technical writing?

Didn't have my camera with me, but I snapped these with my Blackberry:



ITauthor.com/podcasts – the technical writing podcast




WordPress for Blackberry

July 8th, 2009

I'm just testing posting to this blog from my phone, using the new WordPress for the BlackBerry application - now available, in beta: http://bit.ly/6lXcM
