Paul's Java Blog

By Paul Leahy, About.com Guide to Java

Answer to Monday's Code Point Question

Sunday November 23, 2008

The code point programming question was to write a Java program that takes a range of Unicode code points and displays the characters they represent. The range in question was U+0021 to U+007F, which corresponds to the following characters:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

The last code point, U+007F, is the DELETE control character, so it doesn't display anything at all.
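One way to handle DELETE and other non-printing characters is to detect them before printing. The sketch below is a hypothetical helper (the class and method names are my own, not part of the original program). It relies on Character.isISOControl, which returns true for U+0000 to U+001F and U+007F to U+009F, and prints the U+XXXX form instead of the invisible character.

```java
public class ControlCharCheck {

  // Returns the character itself, or its U+XXXX form when the code point
  // is an ISO control character such as DELETE (U+007F).
  static String describe(int codePoint) {
    if (Character.isISOControl(codePoint)) {
      return String.format("U+%04X (control)", codePoint);
    }
    return new String(Character.toChars(codePoint));
  }

  public static void main(String[] args) {
    System.out.println(describe(0x0041)); // A
    System.out.println(describe(0x007F)); // U+007F (control)
  }
}
```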

I chose this code point range for a specific reason. It covers the basic Latin letters, digits and punctuation used in English, and these values are the same in most common character encodings (any encoding built on ASCII). That matters because of how the characters get displayed on the screen. In simple console programs like this one, it's normal to use the System.out.print method to print output to the screen. However, this method encodes its output using the platform's default character encoding, which is determined by the operating system.
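If you don't want to depend on the platform default, you can wrap the output stream in a writer with an explicit encoding. This is a minimal sketch assuming you want UTF-8 output; the class and the utf8Writer helper are hypothetical names. Note that newer JDKs changed how the default charset and System.out's encoding are chosen, so the value printed by Charset.defaultCharset() may differ from what a 2008-era JVM reported.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {

  // Wraps a byte stream in a writer that always encodes characters as UTF-8,
  // independent of the platform's default charset. Auto-flush is on for println.
  static PrintWriter utf8Writer(OutputStream out) {
    return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
  }

  public static void main(String[] args) {
    // The charset the JVM picked up from the platform
    System.out.println("Default charset: " + Charset.defaultCharset());

    PrintWriter out = utf8Writer(System.out);
    out.println("\u00E9 \u20AC"); // é and €, written as UTF-8 bytes
  }
}
```

The bytes are only half the story: the terminal window also has to be set to decode UTF-8, or the characters will still look wrong.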

For example, I wrote this program on a MacBook running OS X 10.5 Leopard, which uses MacRoman as its character encoding. MacRoman has no equivalent for most code points above U+007F, so they are likely to print as a "?", the universal symbol for "I can't display the character you want".
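You can see the "?" substitution without a Mac by encoding text into a charset that can't represent it. This sketch uses US-ASCII as a stand-in for MacRoman, since US-ASCII is guaranteed to be available on every JVM while MacRoman support is not. String.getBytes(Charset) replaces any unmappable character with the charset's replacement byte, which for US-ASCII is '?'. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Unmappable {

  // Encodes text as US-ASCII and decodes it back to show what survives.
  // Characters US-ASCII can't represent come back as '?'.
  static String roundTripAscii(String text) {
    byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
    return new String(bytes, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    System.out.println(roundTripAscii("caf\u00E9 \u03A9")); // caf? ?
  }
}
```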

Here's my version of the program:

import java.util.Scanner;

public class CodePoints {

  public static void main(String[] args) {

    Scanner input = new Scanner(System.in);
    System.out.println("Enter the first Unicode code point (e.g., U+0021): ");
    String startCodePoint = input.nextLine().trim();
    System.out.println("Enter the last Unicode code point (e.g., U+007F): ");
    String lastCodePoint = input.nextLine().trim();

    //The substring is the hexadecimal number (e.g., U+0021 without the "U+").
    //It can then be converted to an int by using a radix of 16
    int hexStartCodePoint = Integer.parseInt(startCodePoint.substring(2), 16);
    int hexLastCodePoint = Integer.parseInt(lastCodePoint.substring(2), 16);

    //Loop up to and including the last code point
    for (int hexCodePoint = hexStartCodePoint; hexCodePoint <= hexLastCodePoint; hexCodePoint++)
    {
      //The toChars method converts a Unicode code point to its UTF-16
      //representation: one char, or a surrogate pair above U+FFFF
      System.out.print(Character.toChars(hexCodePoint));
    }
    System.out.println();
  }
}
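The program above assumes the user always types the "U+" prefix and never checks that the number is a real code point. Here is a sketch of a more forgiving version of those two steps. Accepting input without the prefix and rejecting values outside U+0000 to U+10FFFF are my own assumptions, and the class and method names are hypothetical. It uses StringBuilder.appendCodePoint, which, like Character.toChars, writes a surrogate pair for code points above U+FFFF.

```java
public class CodePointRange {

  // Parses "U+0021", "u+0021" or plain "0021" into an int code point.
  static int parseCodePoint(String text) {
    String hex = text.trim();
    if (hex.startsWith("U+") || hex.startsWith("u+")) {
      hex = hex.substring(2);
    }
    int codePoint = Integer.parseInt(hex, 16);
    // Reject negatives and anything above U+10FFFF
    if (!Character.isValidCodePoint(codePoint)) {
      throw new IllegalArgumentException("Not a Unicode code point: " + text);
    }
    return codePoint;
  }

  // Collects every code point from first to last, inclusive, into one string.
  static String rangeToString(int first, int last) {
    StringBuilder sb = new StringBuilder();
    for (int cp = first; cp <= last; cp++) {
      sb.appendCodePoint(cp);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    int first = parseCodePoint("U+0041");
    int last = parseCodePoint("005A");
    System.out.println(rangeToString(first, last)); // ABCDEFGHIJKLMNOPQRSTUVWXYZ
    // A code point outside the Basic Multilingual Plane needs two UTF-16 chars
    System.out.println(Character.toChars(0x1F600).length); // 2
  }
}
```

Building the whole string first and printing it once also makes it easy to pass the result to a writer with an explicit encoding instead of System.out.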
Comments
November 27, 2008 at 1:58 pm
(1) andres says:

No hexa conversion? why? Where do you use hexLastCodePoint? Regards.

November 27, 2008 at 3:49 pm
(2) Paul Leahy says:

Apologies – for some reason WordPress was truncating the for loop declaration. That’s where I’m using hexLastCodePoint. It’s there now.

©2009 About.com, a part of The New York Times Company.

All rights reserved.