Type Systems in Programming Languages

A programming language's type system manages its types. It determines how we declare, use, and combine types in that language. This tutorial gives a high-level overview of type systems in programming languages.

What is a type?

A type, or data type, is an attribute of data that tells us what kind of value the data holds. It defines the set of values the data can take and the operations we can perform on those values.

For example, the int type in most programming languages represents integers. In Java, an int can store values from -2147483648 to 2147483647, and operators such as +, -, *, and / are valid operations on it.
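TypeScript, like JavaScript, has no separate int type: its number type is a 64-bit floating-point value. As a rough sketch of what a fixed 32-bit range means, the example below stores values in an Int32Array, whose elements are 32-bit signed integers with the same range as Java's int. The variable name slots is illustrative.

```typescript
// Each element of an Int32Array is a 32-bit signed integer, like Java's int.
const slots = new Int32Array(3);

slots[0] = 2147483647;     // largest 32-bit value: 2^31 - 1
slots[1] = -2147483648;    // smallest 32-bit value: -(2^31)
slots[2] = 2147483647 + 1; // out of range: the stored value wraps around

console.log(slots[0], slots[1], slots[2]); // 2147483647 -2147483648 -2147483648
```

Storing 2147483647 + 1 wraps around to the minimum value, which is also what Java's int arithmetic does on overflow.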

Most programming languages provide a number of built-in types, such as integer, decimal, string, and boolean. They also let programmers add new types to the system, including complex types built from built-in types and other complex types.

A type determines:

  1. the operations that you can perform on it.
  2. the storage space that a variable of this type requires.
  3. the range of values that you can store in it.
  4. how it is inherited, that is, what its base type is.
  5. the interfaces that it implements.
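A minimal TypeScript sketch of a user-defined type that touches points 4 and 5: Dog names Animal as its base type and implements an interface. The names Describable, Animal, Dog, and rex are hypothetical.

```typescript
// A hypothetical interface: any type implementing it must provide describe().
interface Describable {
  describe(): string;
}

// Base type: defines the data an Animal stores.
class Animal {
  name: string;
  constructor(name: string) {
    this.name = name;
  }
}

// Dog inherits from Animal (its base type) and implements Describable.
class Dog extends Animal implements Describable {
  describe(): string {
    return `${this.name} is a dog`;
  }
}

const rex = new Dog("Rex");
console.log(rex.describe());        // Rex is a dog
console.log(rex instanceof Animal); // true
```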

What is a Type System

According to Wikipedia, a type system is a logical system consisting of a collection of rules that assign properties called types to the various structures of a computer program, such as variables, expressions, functions, and modules.

The type system manages types. It holds a collection of rules for each type and ensures that we use types correctly, in line with those rules.

Almost every programming language has a type system, built into its compiler, its interpreter, or its runtime. How the type system works varies from one programming language to another.

Need for the Type System

The type system hides some of the intricacies of handling types. We need not worry about how the computer stores data, how the underlying bits are interpreted, or whether each value is used according to the rules of its type. It also makes our lives easier by detecting type errors at compile time or at run time.

Abstraction of implementation details

A type system lets programmers think about data at a high level of abstraction rather than in bits and bytes.

A computer stores everything in binary, where each digit is either 0 or 1. The number 65 is represented as 1000001. Characters are encoded as numbers using a code such as ASCII, and the ASCII code for the character A is also 65. Hence the number 65 and the character A have the same representation in memory: 1000001.

When we read the value 1000001 from memory, how do we determine whether it represents the character A or the number 65? We can only tell from its type: either the type is stored along with the value at run time, or the compiler keeps track of it. The type system handles this for us. Without a type system, we would need to keep track of the type of every piece of data ourselves.
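The sketch below, in TypeScript, reads the same bit pattern once as a number and once as a character. It assumes ASCII-range characters, for which JavaScript's UTF-16 character codes coincide with ASCII codes; the variable names are illustrative.

```typescript
// The same bit pattern, read two different ways.
const bits = "1000001";
const asNumber = parseInt(bits, 2);           // interpret the bits as an integer
const asChar = String.fromCharCode(asNumber); // interpret the same value as a character code

console.log(asNumber);                      // 65
console.log(asChar);                        // A
console.log("A".charCodeAt(0).toString(2)); // 1000001
```

Nothing in the bits themselves says "number" or "character"; the program's choice of type decides.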

Another example is the + operator, which we use both to add numbers and to join strings. These are in fact two different operations. The compiler (or interpreter) uses the type system to determine the operand types and picks the correct operation: if both operands are numbers, it adds them; otherwise, it joins them as strings.
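To make the idea concrete, here is a simplified, hand-written model of how a runtime might choose between addition and concatenation based on operand types. The function plus is hypothetical and does not reproduce every rule of JavaScript's real + operator.

```typescript
// A simplified model: inspect the operand types, then pick the operation.
function plus(a: number | string, b: number | string): number | string {
  if (typeof a === "number" && typeof b === "number") {
    return a + b; // both numbers: numeric addition
  }
  return String(a) + String(b); // otherwise: string concatenation
}

console.log(plus(2, 3));       // 5
console.log(plus("2", "3"));   // 23
console.log(plus("Item ", 3)); // Item 3
```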

Detecting Errors

A type error arises when code performs an operation that is invalid for the types involved. A sound type system can detect a variety of logical errors, known as type errors or type mismatches, either at compile time or at run time. For example, an invalid operation such as adding two boolean values should result in a compile-time or run-time error.
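A small TypeScript sketch of the boolean example: the compiler rejects adding two booleans, but if we opt out of checking with any, plain JavaScript behavior silently produces a number instead of an error. The variable names are illustrative.

```typescript
const flagA = true;
const flagB = true;

// TypeScript rejects this line at compile time (operator + is not allowed on two booleans):
// const sum = flagA + flagB;

// Opting out of type checking with `any` hides the error, and JavaScript
// silently converts true to 1, so the "sum" is 2 rather than an error.
const unchecked: any = flagA;
const silentSum = unchecked + flagB;
console.log(silentSum); // 2
```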

Detecting type errors at compile time is especially valuable, because we can fix them quickly, before the program runs.

Documentation

The types attached to variables provide valuable information about the code and act as a form of documentation. Comments, on the other hand, must be updated whenever the code changes, and they lose their usefulness once they fall out of date. Types stay accurate because the type checker enforces them. Looking at the types of data and how they are used makes it easier to reason about a code's purpose.

Type Checking

Type checking is the process of verifying and enforcing the rules of types. For example, a division operation may be checked to ensure that both operands are numbers and that the divisor is not zero.
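A run-time check of this kind could look like the following TypeScript sketch. checkedDivide is a hypothetical helper, not a built-in; it throws a TypeError for non-number operands and a RangeError for a zero divisor.

```typescript
// Hypothetical run-time checker for division.
function checkedDivide(dividend: unknown, divisor: unknown): number {
  // Type check: both operands must be numbers.
  if (typeof dividend !== "number" || typeof divisor !== "number") {
    throw new TypeError("both operands of / must be numbers");
  }
  // Value check: the divisor must not be zero.
  if (divisor === 0) {
    throw new RangeError("division by zero");
  }
  return dividend / divisor;
}

console.log(checkedDivide(10, 4)); // 2.5
```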

Type checking can occur at compile time or at run time. Hence, we classify type systems as statically typed (type checking happens at compile time) or dynamically typed (type checking happens at run time).

Static Type Checking

Static type checking occurs at compile time. It gives us early feedback on type errors, allowing us to fix them quickly. With static type checking, the type of every variable must be known at compile time. We can specify it explicitly (explicit typing) or let the type system infer it from usage (implicit typing). Since the type information is available at compile time, the compiler can generate optimized code. It also removes the need for most run-time type checks, which can make the executable run faster.

Examples of languages with static type checking are C, C++, Java, C#, Scala, Haskell, Rust, Kotlin, Go, and TypeScript.

Dynamic Type Checking

Dynamic type checking happens at run time: types are checked when values are actually used. This makes the code flexible, because you can assign values of different types to the same variable.

However, dynamic type checking can make the code slower, and we detect type errors only when we run the code. This is troublesome, as we cannot test every possible scenario. You may have to write many unit tests to eliminate most type errors.
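The TypeScript sketch below uses any to switch off static checking, so it behaves like a dynamically typed language: a variable can hold values of different types, and a type bug on a rarely taken path surfaces only when that path runs. The function describeValue is hypothetical.

```typescript
// `any` switches off static checking, so this behaves like plain JavaScript.
let value: any = 42; // holds a number...
value = "forty-two"; // ...and now a string: allowed, checks happen at run time

// Hypothetical function with a type bug on a rarely taken path.
function describeValue(input: any): string {
  if (input > 100) {
    return input.toUpperCase(); // bug: numbers have no toUpperCase method
  }
  return "small: " + input;
}

console.log(typeof value, describeValue(5)); // string small: 5
// describeValue(500) compiles, but throws a TypeError only when this path actually runs.
```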

Examples of dynamically typed languages are JavaScript, Ruby, Python, Perl, PHP, Lisp, Clojure, R, and Bash.

Type Declaration

In statically checked type systems, the type of a variable must be known upfront. There are two ways in which a variable can get its type: explicitly or implicitly.

Explicitly Typed

In an explicitly typed system, we need to specify the type when declaring a variable.

For example, in languages like C# and Java we need to specify the data type upfront: the statement int num; declares the variable num as of type int.
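TypeScript also supports explicit typing through type annotations. A minimal sketch, where num and square are illustrative names; the commented-out line is the kind of assignment the compiler rejects:

```typescript
// The annotation `: number` explicitly declares the type of num,
// much like `int num` does in C# or Java.
let num: number = 10;
num = num * 2;     // fine: still a number
// num = "twenty"; // compile-time error: a string is not assignable to type number

// Parameters and return values can be annotated explicitly too.
function square(x: number): number {
  return x * x;
}

console.log(num, square(num)); // 20 400
```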

Implicitly Typed

In an implicitly typed system, we do not need to specify the type when declaring a variable. The type system infers the type from how the variable is used. Explicit declarations are needed only when automatic inference fails.

TypeScript supports implicit typing through type inference. For example, let num = 10; declares num without any type annotation, and TypeScript infers the type number from the assigned value. Note that we can also specify the type explicitly in TypeScript.
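A minimal TypeScript sketch of inference: neither num nor the return type of label is annotated, yet the compiler tracks both. Note that typeof in the last line reports the run-time JavaScript type, which here matches the inferred static type. The names are illustrative.

```typescript
// No annotation: TypeScript infers the type number from the initial value.
let num = 10;
num = num + 5;      // fine: still a number
// num = "fifteen"; // compile-time error: a string is not assignable to type number

// Return types are inferred as well; label is inferred to return string.
function label(count: number) {
  return `count: ${count}`;
}

console.log(typeof num, label(num)); // number count: 15
```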

Type Safety

Type safety is the ability of a type system to prevent type errors and unsafe behaviors. For example, if our data is of type X and X does not support operation y, the language will not let us apply y to it.

Accessing memory that you should not access and performing “impossible” operations, such as division by zero, are examples of unsafe behaviors.

C++ is a very unsafe language. It lets us dereference a null pointer, access an array out of bounds, use uninitialized variables, and so on, and the result is undefined behavior rather than a reported error.

A type system is safe if it rejects unsafe behaviors at compile time or traps them at run time, for example by throwing an exception. C#, for instance, has a reasonably safe type system: it throws a NullReferenceException or an IndexOutOfRangeException at run time, and its compiler does not allow the use of uninitialized local variables.
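For comparison, here is how the JavaScript runtime that executes compiled TypeScript handles two of the unsafe operations listed for C++. This is a sketch, and items and maybeObj are illustrative names. Unlike C#, JavaScript does not throw on an out-of-bounds read; it returns undefined, but it never reads memory outside the array.

```typescript
// Reading past the end of an array never touches memory outside the array:
// JavaScript returns undefined instead of whatever bytes happen to be there.
const items: number[] = [1, 2, 3];
const outOfRange: number | undefined = items[10];
console.log(outOfRange); // undefined

// Dereferencing null is trapped at run time with a TypeError,
// rather than being undefined behavior as in C++.
const maybeObj: any = null;
let trapped = false;
try {
  maybeObj.value;
} catch (e) {
  trapped = e instanceof TypeError;
}
console.log(trapped); // true
```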

To be type-safe, a language must manage its values “from the cradle to the grave”. It should:

  1. Create and initialize objects in a type-safe way.
  2. Ensure that the program does not corrupt objects during their lifetime.
  3. Ensure that each value is used in accordance with the rules of its type.
  4. Destroy objects and reclaim their memory in a type-safe way.

Programming languages can be type-safe to varying degrees. For example, C and C++ are very unsafe, while ML, Python, Java, and C# are type-safe.

Strongly Typed & Weakly Typed

Type systems are also classified as either strongly typed or weakly typed (also known as loosely typed). However, there is no universally accepted definition of what makes a type system strong or weak.

The generally accepted behaviors of a strongly typed type system are:

  1. Variables have a type, and you cannot change it. The type of a variable is fixed upfront (either explicitly or implicitly).
  2. The language does not implicitly change the type of any value. All type conversions must be explicit.
  3. You cannot use a variable as an operand in an operation if its type is not compatible. Even if a number is stored as a string, we cannot use it as an operand in an addition; we must first explicitly convert the string to a number.
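A minimal TypeScript sketch of rules 2 and 3 for a typed parameter; addNumbers and fromInput are hypothetical names. TypeScript does not follow these rules everywhere (it still lets + join a string and a number, for instance), but it does reject a string where a number parameter is expected.

```typescript
// A function that only accepts numbers.
function addNumbers(a: number, b: number): number {
  return a + b;
}

const fromInput = "42"; // a number stored as a string

// addNumbers(fromInput, 1); // compile-time error: a string is not assignable to a number parameter

// Convert explicitly first, then use the value as an operand.
const sum = addNumbers(Number(fromInput), 1);
console.log(sum); // 43
```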

These rules make code more verbose, but they also make it easier to understand, because there is little or no hidden behavior.

Strongly typed languages with static type checking include Java, Pascal, Ada, and C#. Python, on the other hand, combines dynamic type checking with strong typing.

In a weakly typed (or loosely typed) language, a value can be treated as a different type depending on how we use it, and the runtime may convert it implicitly when required. For example, you can pass a string as an operand to a division operation, and the runtime will convert the string to a number implicitly. This may cause an error, or worse, an invalid result, if the string's contents cannot be converted to a valid number.

JavaScript is an example of a weakly typed language. Dividing a number by a string, as in 10 / "abc", does not throw an error: JavaScript converts the string to a number, and because "abc" is not a valid number, the result is NaN (not a number).
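The division described above can be reproduced in TypeScript only by opting out of type checking with any, because the TypeScript compiler rejects dividing a number by a string. A sketch with illustrative variable names:

```typescript
// TypeScript rejects `10 / "2"` at compile time, so `any` is used here
// to get plain JavaScript behavior.
const numericText: any = "2";
const wordText: any = "abc";

console.log(10 / numericText); // 5   ("2" is implicitly converted to the number 2)
console.log(10 / wordText);    // NaN ("abc" cannot be converted to a valid number)
```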

Type Compatibility

Type compatibility determines whether a value of one type can be used where another type is expected. Type systems are commonly classified as structurally typed, nominally typed, or duck-typed, depending on how they compare types for compatibility.

Consider the primitive data types integer and decimal. We can convert an integer to a decimal value without losing any precision, so we can say that integer is compatible with decimal. The opposite is not true: converting a decimal to an integer discards the fractional part and loses precision. Hence, decimal is not compatible with integer, and many languages require an explicit conversion for it.
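As a rough TypeScript illustration, the typed arrays below stand in for a decimal type (Float64Array) and an integer type (Int32Array); the variable names are illustrative. Typed arrays truncate silently, whereas languages such as Java and C# require an explicit cast like (int) 3.75 for this narrowing conversion.

```typescript
// Float64Array holds 64-bit floating-point values; Int32Array holds 32-bit integers.
const decimals = new Float64Array(1);
const integers = new Int32Array(1);

decimals[0] = 7;    // integer -> decimal: stored exactly as 7
integers[0] = 3.75; // decimal -> integer: the fractional part is dropped

console.log(decimals[0], integers[0]); // 7 3
```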

Nominally typed

In a nominal type system, the compatibility of types is determined by explicit declarations and/or the names of the types. Each type is unique, so even if two types have the same data and shape, we cannot assign values of one to the other.

For example, suppose the types Dog and Cat each contain a single property, name. Their structures are the same, yet in a nominal system the objects created from them (cat and dog) are not compatible, because they are created from different types.
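TypeScript is structurally typed by default, but it compares classes that declare private members nominally: such a class is compatible only with itself and its subclasses. The sketch below relies on that rule to show nominal behavior with Dog and Cat; the commented-out assignment is the kind the compiler rejects, and getName is an illustrative method.

```typescript
// A private member makes each class compatible only with itself (and subclasses).
class Dog {
  private name: string;
  constructor(name: string) {
    this.name = name;
  }
  getName(): string {
    return this.name;
  }
}

class Cat {
  private name: string;
  constructor(name: string) {
    this.name = name;
  }
  getName(): string {
    return this.name;
  }
}

const dog: Dog = new Dog("Rex");
const cat: Cat = new Cat("Tom");
// const wrong: Dog = cat; // compile-time error: separate declarations of a private property 'name'

// At run time, instanceof also checks which class created the object, not its shape.
console.log(dog instanceof Dog, cat instanceof Dog); // true false
```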

Structurally Typed

In structural typing, two types are considered compatible if they have the same shape.

In the Dog and Cat example, cat and dog are compatible under structural typing because they have the same shape, so we can use an instance of Cat wherever the program expects an instance of Dog.
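A minimal TypeScript sketch of structural typing with separately declared Dog and Cat interfaces; greetDog is a hypothetical function.

```typescript
// Two separately declared types with the same shape.
interface Dog {
  name: string;
}
interface Cat {
  name: string;
}

function greetDog(dog: Dog): string {
  return `Hello, ${dog.name}`;
}

const cat: Cat = { name: "Tom" };
const dog: Dog = cat;                         // allowed: Cat has the same shape as Dog
console.log(greetDog(cat), dog === cat);      // Hello, Tom true
```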

Note that if type A is compatible with type B, it does not automatically follow that B is compatible with A. In the Dog and Cat example, if Dog gains an additional property, a Dog is still compatible with Cat, but a Cat is no longer compatible with Dog. It is like how all oranges are fruits, but not all fruits are oranges.
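A TypeScript sketch of this one-way compatibility, with illustrative names. The Dog is assigned through a variable because TypeScript applies an extra check (excess property checking) to object literals assigned directly to a narrower type.

```typescript
interface Cat {
  name: string;
}
// Dog now has an additional property.
interface Dog {
  name: string;
  breed: string;
}

const rex: Dog = { name: "Rex", breed: "Beagle" };
const tom: Cat = { name: "Tom" };

const catFromDog: Cat = rex; // allowed: a Dog has every property a Cat needs
// const dogFromCat: Dog = tom; // compile-time error: property 'breed' is missing in Cat

console.log(catFromDog.name, tom.name); // Rex Tom
```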

Duck Typing

Duck typing cares about neither the name nor the structure of a type. A value only needs the methods or properties that the operation actually uses. Duck typing is found mainly in dynamically typed languages.

Consider a JavaScript program with two objects, person and bankAccount. They neither share a type nor have the same structure, but both have a method called someFn. A function invokeSomeFn that calls someFn on its argument accepts both objects without any issue. In fact, you can pass anything to invokeSomeFn as long as it has a someFn method.
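In TypeScript, the same duck-typed behavior requires opting out of static checking with any. The sketch below keeps the names person, bankAccount, someFn, and invokeSomeFn from the text; the object contents are illustrative.

```typescript
// `any` turns off static checking, reproducing plain JavaScript behavior.
function invokeSomeFn(obj: any): string {
  return obj.someFn(); // works for any value that happens to have a someFn method
}

const person = { firstName: "Alice", someFn: () => "called on person" };
const bankAccount = { balance: 100, someFn: () => "called on bankAccount" };

console.log(invokeSomeFn(person));      // called on person
console.log(invokeSomeFn(bankAccount)); // called on bankAccount
// invokeSomeFn({}) compiles but throws a TypeError at run time: there is no someFn to call.
```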

Changing Type

Type conversion refers to converting a value of one type to another type. There are two kinds of type conversion: implicit conversion (also known as type coercion) and explicit conversion (also known as type casting).

Implicit Conversion (type coercion)

Implicit conversion is performed automatically by the language. For example, when JavaScript evaluates "Total: " + 5, which adds a number to a string, it does not raise any error or warning. It simply converts the number to its string representation and joins the strings.
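A short TypeScript sketch of implicit conversion with +; TypeScript allows mixing strings and numbers here, just as JavaScript does. The last two lines show that + is evaluated left to right, so where the string appears changes the result. The variable names are illustrative.

```typescript
const count = 5;
const message = "Total: " + count; // the number 5 is implicitly converted to "5"
console.log(message);              // Total: 5

const leftToRight = "3" + 4 + 5; // "34" + 5 -> "345"
const addThenJoin = 3 + 4 + "5"; // 7 + "5" -> "75"
console.log(leftToRight, addThenJoin); // 345 75
```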

The same expression also works in C# and Java, where + with a string operand converts the other operand to a string. Beyond that, these languages allow implicit conversions mainly where no data loss is expected, such as widening an int to a long.

The same code in Python fails with a TypeError, because Python does not implicitly convert a number to a string.

The rules of implicit conversion differ from language to language. Some languages allow many implicit conversions between data types, while others restrict them. Each language has its own rules about which types can be implicitly converted and how.

Explicit Conversion (type casting)

Explicit conversion is requested by the programmer in code. For example, in Python we can convert a number to a string using the str function. This is known as type casting.
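TypeScript has no str function; a rough equivalent sketch uses the built-in String, Number, and parseInt functions. It also shows a type assertion (as), which is easy to mistake for a cast but changes only what the compiler believes, not the value itself. The variable names are illustrative.

```typescript
// Explicit conversions requested in code.
const asText = String(42);           // number -> string: "42"
const asNumber = Number("3.5");      // string -> number: 3.5
const parsed = parseInt("42px", 10); // parse a leading integer: 42
const invalid = Number("abc");       // explicit, but can still fail: NaN

// A type assertion is NOT a conversion: only the compiler's view changes,
// and the value at run time is still the string "42".
const asserted = "42" as unknown as number;

console.log(asText, asNumber, parsed, invalid, typeof asserted); // 42 3.5 42 NaN string
```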

References

  1. Type Systems in Software Explained
  2. Functional Programming Type Systems
  3. Type System
