Thursday, June 22, 2017

Dealing with Data Types

With all of the Java installation and configuration out of the way, it's time to look at some of the fundamental building blocks of the Java language. A core feature of every programming language is the ability to manage data. This could range from something simple, such as text entered by the user, to complex, structured data, such as thousands of parts in a warehouse and the corresponding supplier data. At the end of the day though, all data can be broken down into simple values, such as numbers and text entries. For example, a part could be represented as a part/serial number and a name. The supplier could be represented by an ID, name, and address, and the address further breaks down into street address, city, and zip code. The most basic types of data supported by Java are called primitive types, which will be covered in this section. Later on, we'll see how to combine simple data entries into objects.

Java has 8 primitive types, which cover numerical, character, and true/false data. Numerical data can be further broken down into whole number (integer) types and fractional (floating point) types. The types are byte, short, int, long, float, double, boolean, and char. That may seem like a lot to take in, but a few of these types don't show up very much in actual programs. Typically, int, boolean, and double see the most usage, and the other types have more situational uses. Let's take a more detailed look at each type.

Byte

A byte is the smallest integer type in Java. It is used to represent whole numbers from -128 to 127. Remember that all data is represented as bits (0s or 1s) on the computer. By using 8 bits, a byte can hold 2^8 = 256 possible values. Half of these values (128) are negative, and the other half covers 0 and the positive numbers up to 127, giving the range of -128 to 127. In practice, most values can't be restricted to this range. For example, the number of days in a year could not be represented using a byte. Instead, bytes are most commonly used to manage binary data, such as data coming from the network or a file. We'll see examples of this much later on.
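To make the arithmetic above concrete, here is a small sketch that prints the number of values a byte can hold along with its limits. The class name ByteRange and the helper distinctValues are just names made up for this illustration; Byte.SIZE, Byte.MIN_VALUE, and Byte.MAX_VALUE are constants that come with Java.

```java
public class ByteRange {
    // Hypothetical helper: how many distinct values fit in the given number of bits (2^bits).
    static long distinctValues(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        System.out.println(distinctValues(Byte.SIZE)); // 256
        System.out.println(Byte.MIN_VALUE);            // -128
        System.out.println(Byte.MAX_VALUE);            // 127

        byte b = Byte.MAX_VALUE;
        b++; // going past the largest byte wraps around to the smallest one
        System.out.println(b);                         // -128
    }
}
```

The last few lines show what happens when a value doesn't fit: Java doesn't report an error here, the number simply wraps around to the other end of the range.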

Short

Shorts (and I don't mean the clothing) are used to represent whole numbers from -32,768 to 32,767. Shorts are composed of 16 bits, which gives 2^16 = 65,536 possible values. As before, half of the values are negative and the other half covers 0 and the positive numbers. In practice, lots of values still don't fit into this range, such as the number of items in my local grocery store. For this reason, the short data type doesn't see much use.

Int

Ints are 32-bit whole numbers, which gives a total of 2^32 = 4,294,967,296 possible values. The range of an int is -2,147,483,648 to 2,147,483,647. This is large enough for most programs. Ints are commonly used in control statements and loops, which will be explained in future posts.
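As a rough illustration of the int range, the sketch below prints the limits and shows what happens when a calculation goes past them. The class IntRange and its two helper methods are made up for this example; Integer.MIN_VALUE, Integer.MAX_VALUE, and Math.addExact (available in Java 8 and later) are part of Java itself.

```java
public class IntRange {
    // Plain int addition: silently wraps around if the result doesn't fit in 32 bits.
    static int addWrapping(int a, int b) {
        return a + b;
    }

    // Hypothetical helper: reports whether a + b would fall outside the int range.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b); // throws ArithmeticException on overflow
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE);                  // -2147483648
        System.out.println(Integer.MAX_VALUE);                  // 2147483647
        System.out.println(addWrapping(Integer.MAX_VALUE, 1));  // -2147483648
        System.out.println(overflows(Integer.MAX_VALUE, 1));    // true
    }
}
```

Don't worry about the try/catch syntax yet. The takeaway is that ordinary int math wraps around without warning, so it pays to know roughly how large your numbers can get.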

Long

Longs are 64-bit whole numbers ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (approximately 9 quintillion). Since ints are usually large enough, you only need longs when dealing with very large numbers. If longs still aren't big enough (for example, when measuring astronomical distances), Java provides a BigInteger class, which is not a primitive type and incurs more overhead. However, most applications are unlikely to need BigInteger.
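Here is a minimal sketch of stepping past the largest long with BigInteger. The class name BeyondLong and the method onePastLongMax are invented for this example; BigInteger.valueOf, add, and BigInteger.ONE are part of the java.math package.

```java
import java.math.BigInteger;

public class BeyondLong {
    // Hypothetical helper: the first whole number that no longer fits in a long.
    static BigInteger onePastLongMax() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(Long.MAX_VALUE);   // 9223372036854775807
        System.out.println(onePastLongMax()); // 9223372036854775808
    }
}
```

Notice that BigInteger uses method calls such as add instead of the + operator, which is part of the extra overhead mentioned above.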

Float

Floats are used to represent decimal numbers using 32 bits. Their magnitude ranges from about 1.4e-45 to 3.4e+38, and values can be positive or negative. Although that seems like a very large range, keep in mind that there are only 2^32 = 4,294,967,296 possible values that can be represented using 32 bits. This means that most numbers in that range cannot be stored exactly, so there is a loss of precision when storing numbers in floating point format. For example, due to how binary numbers work, it is impossible to represent the number .1 using a finite number of bits. This means that .1 is actually stored as the nearest representable value, which is roughly .1000000015 for a float. In general, floats have about 7 significant digits of precision, so decimal numbers with more than 7 digits may have roundoff errors. This can be a big deal in certain applications, such as banking systems, which require a high degree of accuracy when dealing with numbers.
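To see the loss of precision for yourself, the sketch below prints the exact value that is stored for 0.1f and shows a whole number that a float cannot hold. The class FloatPrecision and its helpers are made up for this illustration; it relies on the BigDecimal(double) constructor, which exposes the exact binary value of a floating point number.

```java
import java.math.BigDecimal;

public class FloatPrecision {
    // Hypothetical helper: the exact decimal expansion of the value stored in a float.
    static String exactValue(float f) {
        return new BigDecimal(f).toPlainString();
    }

    // Hypothetical helper: convert an int to a float and back again.
    static int roundTrip(int n) {
        return (int) (float) n;
    }

    public static void main(String[] args) {
        System.out.println(exactValue(0.1f));       // 0.100000001490116119384765625
        System.out.println(roundTrip(16_777_217));  // 16777216
    }
}
```

The second line of output shows that 16,777,217 comes back as 16,777,216: with only about 7 digits of precision, a float can't tell the two apart.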

Double

Doubles are basically the same as floats, except they use 64 bits instead of 32. This increases the precision from about 7 digits to about 15. This makes double a better choice than float most of the time. Surprisingly, some computers can perform mathematical functions faster on doubles than floats. This is because some processors deal directly with doubles and simulate floats by converting them to doubles. This isn't always true though, and it almost never makes a noticeable difference in speed. I recommend sticking with doubles over floats for the increased precision (at least until you really understand how floats and doubles work). Similar to BigInteger, there is a BigDecimal class to represent decimal numbers that need more precision than a double provides. Again, BigInteger and BigDecimal are not primitive (built-in) types; they are classes supplied by Java's standard library. BigDecimal usually isn't needed, but it comes in handy for financial calculations, where exact decimal values matter.
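The following sketch compares adding 0.1 and 0.2 as doubles against doing the same with BigDecimal. The class DoubleVsBigDecimal and its methods are names invented for this example. Note that the BigDecimal values are created from text ("0.1") rather than from a double, so that they start out exact.

```java
import java.math.BigDecimal;

public class DoubleVsBigDecimal {
    // Adds 0.1 and 0.2 using ordinary double arithmetic.
    static double doubleSum() {
        return 0.1 + 0.2;
    }

    // Adds 0.1 and 0.2 using exact decimal arithmetic.
    static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println(doubleSum());  // 0.30000000000000004
        System.out.println(decimalSum()); // 0.3
    }
}
```

The double result is off by a tiny amount, which is harmless in most programs but adds up quickly when counting money.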

Chars

Chars are used to store a single character using 2 bytes of memory. Characters are usually simple letters or digits (A-Z, a-z, 0-9, etc.), but could also be symbols from other writing systems, such as Chinese or Japanese characters. Chars are put inside of single quotes, such as 'a' or '源'. Since a byte is 8 bits, 2 bytes give 2^16 = 65,536 possible values for a char, which is enough to cover all of the English letters and most symbols from other languages. Older languages, such as C, used 8 bits to store a character, which caused issues when working with other languages. There are a few edge cases, such as certain language symbols or emojis, that take more than 2 bytes to store. They can still be used in Java, but each one takes up two chars instead of one.
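Here is a short sketch of how chars map to numbers, and of the emoji edge case. The class CharBasics and its helper methods are made up for this example, and it uses a String (text), which we haven't covered yet, to hold the emoji. The emoji is written with \u escapes so the code doesn't depend on how the file is saved.

```java
public class CharBasics {
    // Hypothetical helper: how many chars (2-byte units) the text occupies.
    static int codeUnits(String text) {
        return text.length();
    }

    // Hypothetical helper: how many actual characters the text contains.
    static int characters(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        char letter = 'a';
        System.out.println((int) letter);               // 97, the number stored for 'a'
        System.out.println((int) Character.MAX_VALUE);  // 65535, the largest char value

        String emoji = "\uD83D\uDE00"; // the grinning face emoji, U+1F600
        System.out.println(codeUnits(emoji));           // 2
        System.out.println(characters(emoji));          // 1
    }
}
```

The emoji counts as one character but needs two chars of storage, which is the edge case described above.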

Boolean

A boolean is a value that is either true or false. It may seem like a boolean should only take one bit to store, since it can only have 2 possible values. However, the smallest unit of memory that can be addressed (looked up) in most computers is a byte (8 bits), so that is typically the size of a boolean, although the exact size is left up to the Java virtual machine. There are ways to represent boolean values using only 1 bit of memory, but with memory being so plentiful on modern computers, it's unlikely you'll need to do this. Booleans have their own set of operators. Integers and doubles can be added, multiplied, etc. Booleans can be combined with other booleans using and, or, not, and a few other operators to create more complicated boolean logic.
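As a quick preview, here is a sketch of the and, or, and not operators, which Java writes as &&, ||, and !. The class BooleanLogic and the canGoToBeach method are made-up names for this illustration.

```java
public class BooleanLogic {
    // Hypothetical rule: the beach is an option when it is sunny AND it is NOT a workday.
    static boolean canGoToBeach(boolean isSunny, boolean isWorkday) {
        return isSunny && !isWorkday;
    }

    public static void main(String[] args) {
        System.out.println(true && false);             // false, "and" needs both to be true
        System.out.println(true || false);             // true, "or" needs at least one to be true
        System.out.println(!true);                     // false, "not" flips the value
        System.out.println(canGoToBeach(true, false)); // true
        System.out.println(canGoToBeach(true, true));  // false
    }
}
```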

Declaring Variables


Now that we have a basic understanding of the primitive data types in Java, it's time to look at declaring variables. A variable is simply a named reference that contains a data value. Variables are stored in memory and can be referenced later to retrieve their value. It's similar to using the memory button on a calculator to store the value of a previous computation. Let's look at some examples.

To declare a variable, use the format data-type variable-name;, where data-type is the type of variable (int, long, boolean, etc) and variable-name is the name you give to reference the variable later on. Variable names should start with a letter and can contain only letters, digits, the underscore character _, or the dollar sign $. Variable names can technically also start with a $ or _, but that is considered bad practice. A variable name can never start with a digit. Here are some examples of variable declarations:

int numPages; //valid
double age; //valid
boolean isSunday; //valid
float $amountEarned; //valid, but using $ is discouraged
char 1LetterWord; //error, variable name cannot start with a number
integer numPages; //error, integer is not a valid data type, use int instead
short if; //error, if is a reserved word
int numPages //error, semicolon is required at the end

As you can see in the examples, it is common to start the variable name with a lower case letter and put the first letter of subsequent words in upper case. This is called camel casing. Since variable names cannot contain a space, this convention makes it easy to identify where a word starts and ends in a variable name. Variable names should not start with an uppercase letter, as that convention is used for class names, which will be discussed in a later post. Also, variable names cannot be a keyword (also called a reserved word). These are words reserved by the Java language for fundamental programming constructs, such as data types, conditionals, and loop control words (int, if, for, etc). Also, notice that the ; is needed at the end of each variable declaration. Semicolons are used to terminate a statement, which is a complete unit of execution. You can think of a statement as a sentence, with the ; acting as the period that indicates where one sentence ends and another starts. Not every line of Java code is a statement that ends with a ;. Knowing when the ; is needed mostly comes with experience writing programs.

Defining Variables


Declaring variables isn't too useful by itself, because the variables haven't been given a meaningful value. In some cases (for example, class fields) variables are automatically given a default value (0 for numbers and char, false for boolean). In other cases (method level variables) there is no default value given to a declared variable, and it is an error to use it without providing a value first. Definitions solve this problem by giving the variable name and value at the same time. The format for defining a variable is data-type variable-name = value;. Let's look at some examples:

int numPages = 300; //valid
double dollarToPeso = 18.15; //valid
char letter = 'x'; //valid
boolean isSunday = false; //valid
boolean isSunday = 0; //error, 0 is a number, not a boolean value
int numPages = 310.2; //error, 310.2 is not an integer
char letter = "x"; //error, characters must be in single quotes, not double quotes
short daysInOneHundredYears = 36500; //error, 36500 is larger than 32767, the largest possible short
float dollarToPeso = 18.15; //error, 18.15 is a double, float values need the letter f at the end
float dollarToPeso = 18.15f; //valid
int numPages = 2,000; //error, numbers must not contain a comma
int numPages = 2000 //error, ; is required at the end

Note that it is an error to assign a value that is incompatible with the type of variable. In the example above, numPages was an integer, but was given a value of 310.2, which is a double. Also, it is an error to assign a value that is outside the allowable range for that variable type. The daysInOneHundredYears variable was given a value of 36500, which is larger than the biggest short (32,767) and results in an error. Finally, notice that Java does not allow numbers to have commas. Instead, it is legal to use underscores _ to separate groups of digits in newer versions of Java (7+):

int numPagesInLibrary = 20_000_000; //legal in Java 7+
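Going back to the default values mentioned at the start of this section, here is a small sketch of the difference between class fields and method level variables. The class DefaultValues and its field names are made up for this example, and the static keyword is only there so that main can read the fields directly. We'll get to what it means when we cover classes.

```java
public class DefaultValues {
    // Class fields: declared without a value, so Java fills in the defaults.
    static int count;
    static double price;
    static boolean ready;
    static char letter;

    public static void main(String[] args) {
        System.out.println(count);         // 0
        System.out.println(price);         // 0.0
        System.out.println(ready);         // false
        System.out.println((int) letter);  // 0, the default char is the character numbered 0

        int local; // method level variable: no default value
        // System.out.println(local); // error, local has not been given a value yet
        local = 5;
        System.out.println(local);         // 5
    }
}
```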

For now, this is a good starting point to using data types and variables in Java. Let's take a look at putting these concepts together in a sample application. I named the class VariableDeclarations. If you already created the HelloWorld class in Eclipse from the previous tutorial, you can right click on the class and select Refactor -> Rename, and then give the new name of the class. You can also create a new class inside the project if you'd like to keep the existing HelloWorld class as-is. Remember that for now, we're going to put everything inside the main method until we learn how to use classes and methods:

public class VariableDeclarations {
    public static void main(String[] args) {
        int numPages = 300; //valid
        double dollarToPeso = 18.15; //valid
        char letter = 'x'; //valid
        boolean isSunday = false; //valid
        float dollarToYen = 112.42f; //valid
        // valid, but using $ in the variable name is discouraged; the L at the end is required
        // because 4_000_000_000 is larger than the biggest int
        long $companyEarnings = 4_000_000_000L;
        // float dollarToPesoFloat = 18.15; //error, 18.15 is a double, float values need the letter f at the end
        // Int numHolidays = 10; //error, Java is case sensitive and data types must be lower case (int instead of Int)
        // boolean isSunday = true; //error, isSunday was already declared above, it cannot be declared again
        // boolean isFriday = 0; //error, 0 is a number, not a boolean value
        // int numBooks = 310.2; //error, 310.2 is not an integer
        // char anotherLetter = "x"; //error, characters must be in single quotes, not double quotes
        // short daysInOneHundredYears = 36500; //error, 36500 is larger than 32767, the largest possible short
        // int bookPublishedYear = 2,000; //error, numbers must not contain a comma
        // int numAuthors = 3 //error, ; is required at the end
        System.out.println(numPages);
        System.out.println(dollarToPeso);
        System.out.println(letter);
        System.out.println(isSunday);
        System.out.println(dollarToYen);
        System.out.println($companyEarnings);
    }
}

Try removing the // at the beginning of the lines that are marked with error, and see what error messages are displayed from Eclipse. Notice that System.out.println can take different data types and output them to the console as Strings (text). We'll learn how that works in a later post. The output of the program is what we'd expect:

300
18.15
x
false
112.42
4000000000


There's more to understand about variables, such as literals, operators, and expressions, which will be covered in the next section.
