Data types

If we were to have the same ancestor with horses instead of apes, we would have been able to invent computers much earlier.
-- Yuhao Zhu, Gate of Heaven

Data types are the foundation of programming languages. They define how the ones and zeros in memory are interpreted as human-readable values. Mojo's data types are similar to Python's, but with some differences due to Mojo's statically compiled nature.

In this chapter, we will discuss the most common data types in Mojo. They fall into several categories: numeric types (integers, floats), composite types (list, tuple), and other types (boolean). The string type and the Mojo-specific SIMD type are discussed further in the separate chapters String and SIMD. Most of these types have a direct counterpart in Python. The following table summarizes them:

| Python type | Default Mojo type | Be careful that |
| --- | --- | --- |
| int | Int | Integers in Mojo have ranges. Be careful of overflow. |
| float | Float64 | Almost the same behavior. You can safely use it. |
| bool | Bool | Same. |
| list | List | Elements in a List in Mojo must be of the same data type. |
| tuple | Tuple | Very similar, but you cannot iterate over a Tuple in Mojo. |
| set | collections.Set | Elements in a Set in Mojo must be of the same data type. |
| dict | collections.Dict | Keys and values in a Dict in Mojo must be of the same data type, respectively. |
| str | String | Similar behavior. Note that String in Mojo is rapidly evolving. |

Integer

In Mojo, the most common integer type is Int, which is either a 32-bit or 64-bit signed integer depending on your system. It is guaranteed to cover the range of addresses on your system. It is similar to the numpy.intp type in Python and the isize type in Rust. Note that it is different from the int type in Python, which is an arbitrary-precision integer type.
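The contrast with Python's int can be seen from the Python side. The following is a minimal sketch using only the standard library; it assumes that the pointer size reported by struct.calcsize("P") is a reasonable stand-in for the width that Mojo's Int follows on the same machine.

```python
import struct

# Width of a pointer on this machine, in bits (32 or 64 on common systems).
pointer_bits = struct.calcsize("P") * 8

# Largest value a signed integer of that width can hold.
max_fixed = 2 ** (pointer_bits - 1) - 1

# Python's own int is arbitrary-precision, so stepping past that limit is fine.
beyond = max_fixed + 1

print(pointer_bits)
print(beyond > max_fixed)  # True: Python's int does not overflow
```

A fixed-width integer of the same size could not hold the value stored in beyond, which is why overflow is a concern in Mojo but not in plain Python.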

Mojo also has other integer types with different sizes in bits, such as Int8, Int16, Int32, Int64, Int128, Int256 and their unsigned counterparts UInt8, UInt16, UInt32, UInt64, UInt128, UInt256. An instance of each type is stored on the stack and occupies exactly the number of bits given in the name of the type.

The table below summarizes the integer types in Mojo and corresponding integer types in Python:

| Mojo Type | Python Type | Description |
| --- | --- | --- |
| Int | numpy.intp | 32-bit or 64-bit signed integer, depending on the system. |
| Int8 | numpy.int8 | 8-bit signed integer. Range: -128 to 127. |
| Int16 | numpy.int16 | 16-bit signed integer. Range: -32768 to 32767. |
| Int32 | numpy.int32 | 32-bit signed integer. Range: -2147483648 to 2147483647. |
| Int64 | numpy.int64 | 64-bit signed integer. Range: -2^63 to 2^63-1. |
| Int128 | decimojo.Int128 | 128-bit signed integer. Range: -2^127 to 2^127-1. |
| Int256 | decimojo.Int256 | 256-bit signed integer. Range: -2^255 to 2^255-1. |
| UInt8 | numpy.uint8 | 8-bit unsigned integer. Range: 0 to 255. |
| UInt16 | numpy.uint16 | 16-bit unsigned integer. Range: 0 to 65535. |
| UInt32 | numpy.uint32 | 32-bit unsigned integer. Range: 0 to 4294967295. |
| UInt64 | numpy.uint64 | 64-bit unsigned integer. Range: 0 to 2^64-1. |
| UInt128 | decimojo.UInt128 | 128-bit unsigned integer. Range: 0 to 2^128-1. |
| UInt256 | decimojo.UInt256 | 256-bit unsigned integer. Range: 0 to 2^256-1. |
| SIMD[DType.index, 1] | numpy.intp | 32-bit or 64-bit signed integer, depending on the system. |
| decimojo.BigInt | int | Arbitrary-precision. 10^9-based internal representation. |
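The ranges of the fixed-width types follow one rule: an n-bit signed integer spans -2^(n-1) to 2^(n-1)-1, and an n-bit unsigned integer spans 0 to 2^n-1. The Python sketch below computes them; int_range is a hypothetical helper written for this illustration, not part of Mojo or of any library.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return (min, max) for a fixed-width integer with the given number of bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

for bits in (8, 16, 32, 64):
    print(f"Int{bits}:", int_range(bits, signed=True))
    print(f"UInt{bits}:", int_range(bits, signed=False))
```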

You can create an integer variable in decimal, binary, hexadecimal, or octal format. If we do not explicitly specify the type of the integer literal, the compiler will infer it as Int by default. If we explicitly specify the type of the integer, or we use the constructor of the integer type, the compiler will use that type. The following example shows how to create integer variables in different formats:

mojo
def main():
    var a = 0x1F2D         # Hexadecimal
    var b = 0b1010         # Binary
    var c = -0o17          # Octal
    var d = 1234567890     # Decimal
    var e: UInt32 = 184    # 32-bit unsigned Integer
    var f = Int128(12345)  # 128-bit Integer from constructor
    var g: Int8 = Int8(12) # 8-bit Integer from constructor and with type annotation
    var h = SIMD[DType.index, 1](10)  # Integer with index type
    print(a, b, c, d, e, f, g, h)
# Output: 7981 10 -15 1234567890 184 12345 12 10
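Python accepts the same literal prefixes (0x, 0b, 0o), so the decimal values of the first four literals can be cross-checked there. This small sketch covers only the untyped literals; the typed variables e to h have no plain-Python equivalent.

```python
a = 0x1F2D      # hexadecimal
b = 0b1010      # binary
c = -0o17       # octal
d = 1234567890  # decimal

print(a, b, c, d)  # 7981 10 -15 1234567890
```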

If the value assigned to an integer variable exceeds the range of the specified type, you may encounter either an error or an overflow. For example, if you try to assign the value 256 to a variable of type UInt8, you will encounter an overflow.

mojo
def main():
    var overflow: UInt8 = 256
    print(overflow)
# Output: 0

Note that there is no error message printed in this case. This is because Mojo does not perform runtime checks for integer overflows by default. We need to be very careful when using integer types in Mojo compared to Python.
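The wraparound can be modelled in Python by keeping only the lowest 8 bits of the value. This is a sketch of the arithmetic only, under the assumption that the overflow simply discards the higher bits; wrap_unsigned is a hypothetical helper, not a Mojo or Python built-in.

```python
def wrap_unsigned(value: int, bits: int) -> int:
    """Keep only the lowest `bits` bits, as a fixed-width unsigned integer would."""
    return value & ((1 << bits) - 1)

print(wrap_unsigned(255, 8))  # 255: still fits in 8 bits
print(wrap_unsigned(256, 8))  # 0: the ninth bit is discarded
print(wrap_unsigned(257, 8))  # 1
```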

If you really need to work with integers that are larger than the capacity of Int, you can consider using the BigInt type in the decimojo package, which has similar functionality to the int type in Python.

Exercise

Now we want to calculate 12345^5 (12345 raised to the power of 5). Since the result is big (around 20 digits), we use Int128 to avoid overflow. Try to run the following code in Mojo and see what happens. Explain why the result is unexpected and how to fix it.

mojo
def main():
    var a: Int128 = 12345 ** 5
    print(a)
Answer

The result is -8429566654717360231. It is a negative number, which is unexpected. The correct answer should be 286718338524635465625. Usually, when we see a negative value, we know that it is probably due to an overflow.

The reason is that, when we write 12345 ** 5 on the right-hand side of the assignment, we do not explicitly specify the type of the values. As mentioned above, if we do not explicitly specify the type of an integer literal, the compiler will infer it as Int by default. Thus, 12345 and 5 are both stored as Int (a 64-bit signed integer on a 64-bit system).

Since the result of 12345 ** 5 exceeds the maximum value of the Int type (2^63 - 1), an overflow occurs, and the value of 12345 ** 5 becomes -8429566654717360231. This wrong value is then assigned to the variable a of type Int128.

To fix this, we need to explicitly specify the type of the integer literals in the right-hand side of the assignment. We can do this by using the Int128 constructor, like this:

mojo
def main():
    var a = Int128(12345) ** Int128(5)
    print(a)

Now the result will be 286718338524635465625, which is the correct answer.
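Both numbers in this answer can be reproduced in Python, whose int never overflows. The sketch below reinterprets the exact result as a two's-complement integer of a given width; wrap_signed is a hypothetical helper written for this illustration.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret the lowest `bits` bits of value as a two's-complement integer."""
    low = value & ((1 << bits) - 1)
    # If the sign bit is set, the bit pattern stands for a negative number.
    if low >= 1 << (bits - 1):
        low -= 1 << bits
    return low

exact = 12345 ** 5
print(exact)                    # 286718338524635465625
print(wrap_signed(exact, 64))   # -8429566654717360231
print(wrap_signed(exact, 128))  # 286718338524635465625
```

With 64 bits the value wraps to the negative number seen in the exercise, while 128 bits are enough to hold it exactly.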

Float

Compared to integer types, floating-point numbers in Mojo share more similarities with Python. The table below summarizes the floating-point types in Mojo and corresponding types in Python:

| Mojo Type | Python Type | Description |
| --- | --- | --- |
| Float64 | float | 64-bit double-precision floating-point number. Default type for floats in Mojo. |
| Float32 | numpy.float32 | 32-bit single-precision floating-point number. |
| Float16 | numpy.float16 | 16-bit half-precision floating-point number. |

There are three ways to construct a floating-point number in Mojo:

  1. Simply assign a floating-point literal to a variable without type annotations. A floating-point literal is a number with a decimal point or in scientific notation, e.g., 3.14 or 1e-10. Mojo will automatically use Float64 as the default type.
  2. Use type annotations to specify the type of the floating-point number, e.g., var a: Float32 = 3.14.
  3. Use the corresponding constructor, e.g., Float64(3.14), Float32(2.718), or Float16(1.414).

See the following examples.

mojo
def main():
    var a = 3.14                # Float64 by default
    var b: Float32 = 2.718      # Float32 with type annotation
    var c = Float16(1.414)      # Float16 with constructor
    print(a, b, c)
# Output: 3.14 2.718 1.4140625

Floating-point values are inexact

You may notice that the output of Float16(1.414) is 1.4140625, which is not exactly 1.414. This is because floating-point numbers are inexact representations of real numbers: they are stored internally in binary format but printed in decimal format, and not all decimal numbers can be represented exactly in binary format. Thus, when you create a floating-point number, e.g., Float16(1.414), it is stored as the closest representable value in binary format, which is 1.4140625 in this case.
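The same rounding can be observed in Python, whose struct module can pack a value into the IEEE 754 half-precision format (format code "e"). This is a sketch, assuming that Mojo's Float16 follows the same format; to_float16 is a hypothetical helper.

```python
import struct

def to_float16(x: float) -> float:
    """Round x to the nearest half-precision value and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

stored = to_float16(1.414)
print(stored)                    # 1.4140625
print(stored == 1 + 424 / 1024)  # True: the value is 1 plus a multiple of 2^-10
```

Between 1 and 2, half-precision values are spaced 2^-10 apart, so 1.414 is rounded to the nearest multiple, 1 + 424/1024.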

This is a common issue in many programming languages, including Python. There are two ways to avoid or mitigate it:

  1. Use higher-precision floating-point types, such as Float64 or Float32, which carry more significant digits and reduce the error. Note that the values are still inexact, but the error can be negligible.
  2. Use the Decimal type, which is internally represented in base 10. It can represent decimal numbers exactly, but it is slower than floating-point types. You can find more information about the Decimal type in the decimojo package.
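The difference between binary and base-10 representations can be tried out in Python. The sketch below uses Python's standard decimal module as a stand-in; it does not show the decimojo API, which may differ.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 are both stored inexactly.
print(0.1 + 0.2 == 0.3)  # False

# Base-10 representation: the same sum is exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Constructing from a string keeps the decimal digits as written.
print(Decimal("1.414"))  # 1.414
```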

Boolean

The boolean type is a simple data type that can only have two possible states: true and false (or, yes and no, 1 and 0...). These two states are mutually exclusive and exhaustive, meaning that a boolean value can only be either true or false, and there are no other possible values.

Mojo's boolean type is named Bool. Its two states are True and False. It is comparable to the bool type in Python, but with the first letter capitalized.

The Bool type is saved as a single byte in the memory.

Bool and Int

Just like in Python, Boolean values can be implicitly converted to integers. True is equivalent to 1 and False is equivalent to 0. Thus, the following code will work in Mojo:

mojo
def main():
    print(True + False)  
# Output: 1
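For comparison, this is the Python behavior that the text refers to. The sketch is plain Python; the variable flags is only an illustration.

```python
flags = [True, False, True, True]

print(True + False)           # 1
print(sum(flags))             # 3: each True counts as 1
print(isinstance(True, int))  # True: in Python, bool is a subclass of int
```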

List

In Mojo, a List is a mutable, variable-length sequence that can hold a collection of elements of the same type. It is similar to Rust's Vec type, but it is different from Python's list type that can hold objects of any type. Here are some key differences between Python's list and Mojo's List:

| Functionality | Mojo List | Python list |
| --- | --- | --- |
| Type of elements | Homogeneous type | Heterogeneous types |
| Mutability | Mutable | Mutable |
| Initialization | List[Type]() | list() or [] |
| Indexing | Use brackets [] | Use brackets [] |
| Slicing | Use brackets [a:b:c] | Use brackets [a:b:c] |
| Extending by items | Use append() | Use append() |
| Concatenation | Use + operator | Use + operator |
| Printing | Not supported | Use print() |
| Iterating | Use for loop and de-reference | Use for loop |
| Memory layout | Metadata -> Elements | Pointer -> metadata -> Pointers -> Elements |

Construct a list

To construct a List in Mojo, you have to use the list constructor. For example, to create a list of Int numbers, you can use the following code:

mojo
def main():
    my_list_of_integers = List[Int](1, 2, 3, 4, 5)
    var my_list_of_floats = List[Float64](0.125, 12.0, 12.625, -2.0, -12.0)
    var my_list_of_strings: List[String] = List[String]("Mojo", "is", "awesome")

Index or slice a list

You can retrieve the elements of a List in Mojo using indexing, just like in Python. For example, you can access the first element of my_list_of_integers with my_list_of_integers[0].

You can create another List by slicing an existing List, just like in Python. For example, you can create a new list that contains the first three elements of my_list_of_integers with my_list_of_integers[0:3].

mojo
def main():
    my_list_of_integers = List[Int](1, 2, 3, 4, 5)
    first_element = my_list_of_integers[0]  # Accessing the first element
    sliced_list = my_list_of_integers[0:3]  # Slicing the first three elements

Extend or concat a list

You can append elements to the end of a List in Mojo using the append() method, just like in Python. For example,

mojo
def main():
    my_list_of_integers = List[Int](1, 2, 3, 4, 5)
    my_list_of_integers.append(6)  # Appending a new element
# my_list_of_integers = [1, 2, 3, 4, 5, 6]

You can use the + operator to concatenate two List objects, just like in Python. For example:

mojo
def main():
    first_list = List[Int](1, 2, 3)
    second_list = List[Int](4, 5, 6)
    concatenated_list = first_list + second_list  # Concatenating two lists
# concatenated_list = [1, 2, 3, 4, 5, 6]

You cannot print the List object directly in Mojo (at least at the moment). This is because the List type does not implement the Writable trait, which is required for printing. To print a List, you have to write your own auxiliary function.

mojo
def print_list_of_floats(array: List[Float64]):
    print("[", end="")
    for i in range(len(array)):
        if i < len(array) - 1:
            print(array[i], end=", ")
        else:
            print(array[i], end="]\n")

def print_list_of_strings(array: List[String]):
    print("[", end="")
    for i in range(len(array)):
        if i < len(array) - 1:
            print(array[i], end=", ")
        else:
            print(array[i], end="]\n")

def main():
    var my_list_of_floats = List[Float64](0.125, 12.0, 12.625, -2.0, -12.0)
    var my_list_of_strings = List[String]("Mojo", "is", "awesome")
    print_list_of_floats(my_list_of_floats)
    print_list_of_strings(my_list_of_strings)

print_lists() function

We have already seen this auxiliary function in Chapter Convert Python code into Mojo. We will use this kind of function to print lists in the following chapters as well.

Iterate over a list

We can iterate over a List in Mojo using the for ... in keywords. This is similar to how we iterate over a list in Python. But one thing is different:

In Mojo, each item you get from the iteration is a pointer to the element, that is, it holds the element's address. You have to de-reference it to get the actual value of the element. The de-referencing is done by using the [] operator. See the following example:

mojo
def main():
    my_list = List[Int](1, 2, 3, 4, 5)
    for i in my_list:
        print(i[], end=" ")  # De-referencing the element to get its value
# Output: 1 2 3 4 5

If you forget the [] operator, you will get an error message because you are trying to print the pointer to the element instead of the element itself.

console
error: invalid call to 'print': could not deduce parameter 'Ts' of callee 'print'
        print(i,  end=" ")
        ~~~~~^~~~~~~~~~~~~

address vs value

You may find this a bit cumbersome, but it is actually a good design. It makes Mojo's List more memory-efficient.

Imagine that you want to read a list of books in a library. You can either:

  • Ask the administrator to copy these books and give you the copies.
  • Ask the administrator to give you the locations of these books, and you go to the corresponding shelves to read them.

In the first case, you will have to pay for the cost of copying the books and you have to wait for the copies to be made. In the second case, you can read the books directly without any extra cost.

This is similar to how Mojo's List works: the iterator only returns the address of the element. You go to the address and read the value directly. It does not create a copy of the element, so there is no extra memory cost.

List in memory

A Mojo List is actually a structure that contains three fields:

  • A pointer field data that points to a contiguous block of memory on the heap, where the elements of the list are stored one after another.
  • An integer field _len that stores the number of elements in the list.
  • An integer field capacity that represents the maximum number of elements that can be stored in the list without reallocating memory. When capacity is larger than _len, it means that memory has been allocated but is not fully used. This enables you to append a few new elements to the list without reallocating memory. If you append more elements than the current capacity allows, the list will request another block of memory on the heap with a larger capacity, copy the existing elements to the new block, and then append the new elements.
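The interplay between _len and capacity can be sketched as a toy model in Python. This is not Mojo's actual implementation: the class GrowableArray, the reallocations counter, and in particular the doubling growth policy are assumptions made for illustration.

```python
class GrowableArray:
    """Toy model of a list that tracks its length and capacity separately."""

    def __init__(self, capacity: int = 4):
        self._len = 0
        self.capacity = capacity
        self.data = [None] * capacity  # stands in for the block on the heap
        self.reallocations = 0

    def append(self, value):
        if self._len == self.capacity:
            # Assumed policy: request a block twice as large (at least 1 slot).
            new_capacity = max(1, self.capacity * 2)
            new_data = [None] * new_capacity
            new_data[: self._len] = self.data[: self._len]  # copy existing elements
            self.data = new_data
            self.capacity = new_capacity
            self.reallocations += 1
        self.data[self._len] = value
        self._len += 1

arr = GrowableArray(capacity=2)
for x in range(5):
    arr.append(x)
    print(arr._len, arr.capacity)
```

Starting from a capacity of 2, five appends trigger two reallocations in this model; the appends in between reuse memory that was already allocated.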

Let's take a closer look at how a Mojo List is stored in the memory with a simple example: The code below creates a List of UInt8 numbers representing the ASCII code of 5 letters. We can use the chr() function to convert them into characters and print them out to see what they mean.

mojo
def main():
    var me = List[UInt8](89, 117, 104, 97, 111)
    print(me.capacity)
    for i in me:
        print(chr(Int(i[])), end="")
# Output:
# 5
# Yuhao

When you create a List with List[UInt8](89, 117, 104, 97, 111), Mojo will first allocate a contiguous block of memory on the stack to store the three fields (data: Pointer, _len: Int and capacity: Int), each of which is 8 bytes long on a 64-bit system. Because we passed 5 elements to the List constructor, the _len field will be set to 5, and the capacity field will also be set to 5 (default setting, capacity = _len).

Then Mojo will allocate a contiguous block of memory on the heap to store the actual values of the elements of the list, which takes 1 byte (8 bits) for each UInt8 element, or 5 bytes in total for 5 elements. The data field will then store the address of the first byte in this block of memory.

The following figure illustrates how the List is stored in the memory. You can see that a contiguous block of memory on the heap (from the address 17ca81f8 to 17ca81fc) stores the actual values of the elements of the list. Each element is a UInt8 value and is thus 1 byte long. The data field on the stack stores the address of the first byte of the block of memory on the heap, which is 17ca81f8.

console
# Mojo Miji - Data types - List in memory

        local variable `me = List[UInt8](89, 117, 104, 97, 111)`
            ↓  (meta data on stack)
        ┌────────────────┬────────────┬────────────┐
Field   │ data           │ _len       │ capacity   │
        ├────────────────┼────────────┼────────────┤
Type    │ Pointer[UInt8] │  Int       │     Int    │
        ├────────────────┼────────────┼────────────┤
Value   │   17ca81f8     │     5      │     5      │
        ├────────────────┼────────────┼────────────┤
Address │   26c6a89a     │  26c6a8a2  │  26c6a8aa  │
        └────────────────┴────────────┴────────────┘

            ↓ (points to a continuous memory block on heap that stores the list elements)
        ┌────────┬────────┬────────┬────────┬────────┐
Element │  89    │  117   │  104   │  97    │  111   │
        ├────────┼────────┼────────┼────────┼────────┤
Type    │ UInt8  │ UInt8  │ UInt8  │ UInt8  │ UInt8  │
        ├────────┼────────┼────────┼────────┼────────┤
Value   │01011001│01110101│01101000│01100001│01101111│
        ├────────┼────────┼────────┼────────┼────────┤
Address │17ca81f8│17ca81f9│17ca81fa│17ca81fb│17ca81fc│
        └────────┴────────┴────────┴────────┴────────┘

Now we try to see what happens when we use list indexing to get a specific element from the list, for example, me[0]. Mojo will first check the _len field to see if the index is valid (i.e., 0 <= index < me._len). If it is valid, Mojo will then calculate the address of the element by adding the index to the address stored in the data field. In this case, it will return the address of the first byte of the block of memory on the heap, which is 17ca81f8. Then Mojo will de-reference this address to get the value of the element, which is 89 in this case.

If we try me[2], Mojo will calculate the address by adding 2 to the address stored in the data field, which is 17ca81f8 + 2 = 17ca81fa. Then Mojo will de-reference this address to get the value of the element, which is 104 in this case.
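The two steps described above, the bounds check and the address calculation, can be written out as a short Python sketch. element_address is a hypothetical helper, and the element_size parameter is an added assumption: it is 1 byte for UInt8, and for wider element types the index would be scaled by the size of one element.

```python
def element_address(base: int, index: int, length: int, element_size: int = 1) -> int:
    """Return the address of element `index` in a block that starts at `base`."""
    # Step 1: check the index against the number of elements (_len).
    if not 0 <= index < length:
        raise IndexError("index out of range")
    # Step 2: offset from the address of the first element.
    return base + index * element_size

base = 0x17CA81F8  # value of the data field in the example
print(hex(element_address(base, 0, 5)))  # 0x17ca81f8
print(hex(element_address(base, 2, 5)))  # 0x17ca81fa
```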

Index or offset?

You may find that the index starting from 0 in Python or Mojo is a little bit strange. But it will be intuitive if you look at the example above: The index of an element in a list is actually the offset from the address of the first element. When you think of the index as an offset, it will make more sense. Thus, in this Miji, I will sometimes use the term "offset" to refer to the index within the brackets [].

Memory layout of a list in Python and Mojo

In Mojo, the values of the elements of a list are stored consecutively on the heap. In Python, the pointers to the elements of a list are stored consecutively on the heap, while the actual values of the elements are stored in separate memory locations. This means that a Mojo list is more memory-efficient than a Python list, and reading an element does not require an additional dereference.
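Python itself offers a container with the contiguous layout described here: the standard array module stores raw values back to back instead of pointers to objects. The sketch below uses it as a rough analogue of List[UInt8]; it illustrates the layout only, not Mojo's implementation.

```python
from array import array

values = [89, 117, 104, 97, 111]
packed = array("B", values)  # type code "B": unsigned 8-bit integers

print(packed.itemsize)                   # 1 byte per element
print(len(packed.tobytes()))             # 5 bytes for 5 elements
print(packed.tobytes().decode("ascii"))  # Yuhao
```

The regular Python list in values holds five references to separate int objects, whereas packed holds the five bytes themselves.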

If you are interested in the difference between the memory layouts of a list in Python and Mojo, you can refer to Chapter Memory Layout of Mojo objects for more details, where I use abstract diagrams to compare the two.

Mojo Miji - A Guide to Mojo Programming Language from A Pythonista's Perspective · 魔咒秘籍 - Pythonista 視角下的 Mojo 編程語言指南