How to create an array with certain data types, manipulate arrays, select elements from arrays, and load a dataset into an array.

import numpy as np
import math

Array creation

a = np.array([1,2,3])
print(a)
print(a.ndim)  # Shows the number of dimensions of the array
[1 2 3]
1

We can make a multi-dimensional array by passing a list of lists:

b = np.array([[1,2,3], [4,5,6]])
print(b)
print(b.ndim)
[[1 2 3]
 [4 5 6]]
2

We can get the length of each dimension (the size of the matrix) using the shape attribute, which returns a tuple:

b.shape
(2, 3)

We can check the types of items in the array using dtype:

b.dtype
dtype('int64')
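
The introduction mentions creating arrays with a chosen data type, so here is a minimal sketch of one way to do it. The variable names are made up for illustration. Note that the default integer type shown above can depend on the platform and NumPy version.

```python
import numpy as np

# Request a specific element type at creation time with the dtype argument.
ints_as_floats = np.array([1, 2, 3], dtype=np.float32)
print(ints_as_floats.dtype)   # float32

# astype returns a converted copy; the original array is left unchanged.
back_to_int = ints_as_floats.astype(np.int64)
print(back_to_int.dtype)      # int64
```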

Arrays can also hold floats or strings:

c = np.array([3.2, 4.5,7.7,5])      # 5 will be converted into a float
c.dtype.name
print(c)
[3.2 4.5 7.7 5. ]
w = np.array(['one', 'two', 'three'])
print(w)
w.dtype.name
['one' 'two' 'three']
'str160'

If we know the shape of the array we need but not yet which numbers to put in, we can ask NumPy to fill it with 0s or 1s, and even with random numbers!

d = np.zeros((2,3))
print(d)
d.dtype
[[0. 0. 0.]
 [0. 0. 0.]]
dtype('float64')
e = np.ones((2,2))
print(e)
[[1. 1.]
 [1. 1.]]
np.random.rand(2,3)
array([[0.35359388, 0.29081185, 0.43913489],
       [0.99846144, 0.10830342, 0.88768546]])
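
The random values above change on every run. If repeatable numbers are needed, one option is a seeded generator. This is a sketch using np.random.default_rng rather than the np.random.rand call used above; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator produces the same sequence every time it is created.
rng = np.random.default_rng(seed=42)   # the seed value is an arbitrary choice
sample = rng.random((2, 3))            # uniform floats in [0, 1), shape (2, 3)
print(sample.shape)                    # (2, 3)

# Re-creating the generator with the same seed reproduces the same values.
repeat = np.random.default_rng(seed=42).random((2, 3))
print(np.array_equal(sample, repeat))  # True
```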

We can create a sequence of evenly spaced integers with the arange function: arange(start, stop, step). The start value is included, the stop value is excluded, and step is the difference between consecutive numbers.

f = np.arange(10,20,2)
print(f)
[10 12 14 16 18]

We can also generate a sequence of evenly spaced floats with the linspace function: linspace(start, stop, number of values). Unlike arange, linspace includes the stop value by default.

np.linspace(0, 1, 10)
array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])
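
The two functions treat the end of the range differently, which is easy to mix up. A small sketch of the difference (the variable names are only for illustration):

```python
import numpy as np

evens = np.arange(10, 20, 2)      # stop value 20 is excluded
print(evens)                      # [10 12 14 16 18]

closed = np.linspace(0, 1, 5)     # stop value 1 is included: 0, 0.25, 0.5, 0.75, 1
print(closed)

# endpoint=False makes linspace behave like arange and drop the stop value.
half_open = np.linspace(0, 1, 5, endpoint=False)   # 0, 0.2, 0.4, 0.6, 0.8
print(half_open)
```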

Array operations

We can do many mathematical operations on arrays (addition, subtraction, squares, exponents) and build boolean arrays from comparisons. Linear algebra is also possible: matrix products, transposes, inverses, etc.

Arithmetic operators on arrays apply elementwise.

a = np.array([10,20,30,40])
b = np.array([1,2,3,4])
c = a-b
d = a*b
print(c)
print(d)
[ 9 18 27 36]
[ 10  40  90 160]

Example: I need to make 4 kinds of muffins, and the recipes give sugar in teaspoons instead of grams. I want to know how many grams of sugar each muffin will have. Let's create an array with the teaspoons of sugar in my 4 recipes:

teaspoons = np.array([15,10,6,21])

Google says that 1 teaspoon equals 4 grams of sugar. Each recipe makes 6 muffins, so the conversion will be:

grams = (teaspoons * 4)/6
print(grams)
[10.          6.66666667  4.         14.        ]

Now I know how many grams of sugar my muffins will have. I have a special diet, and I can't eat food containing more than 5 g of sugar. I can use a boolean array to show which muffins have less than 5 g of sugar:

grams < 5
array([False, False,  True, False])

So I can eat one type of muffin, that's a relief! We can also check which muffins have an even number of grams of sugar (very important!):

grams%2 == 0
array([ True, False,  True,  True])
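
A boolean array can also be summarized instead of read by eye. The sketch below rebuilds the muffin data and counts the matches; under_limit is a name introduced here for illustration.

```python
import numpy as np

teaspoons = np.array([15, 10, 6, 21])
grams = (teaspoons * 4) / 6

under_limit = grams < 5

# True counts as 1 and False as 0, so sum() gives the number of matches.
print(under_limit.sum())             # 1

# flatnonzero returns the positions where the condition is True.
print(np.flatnonzero(under_limit))   # [2]
```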

Two ways to do array multiplication: elementwise (use *) or a matrix product (use @)

A = np.array([[1,1],[0,1]])
B = np.array([[2,0],[3,4]])
print(A*B)   # elementwise multiplication
[[2 0]
 [0 4]]
print(A@B)
[[5 4]
 [3 4]]
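
The transpose and inverse mentioned earlier are not demonstrated in these notes, so here is a short sketch using the same matrix A. It assumes A is invertible; np.linalg.inv raises an error for a singular matrix.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])

# Transpose: rows and columns are swapped.
print(A.T)

# Inverse: raises numpy.linalg.LinAlgError if A is singular.
A_inv = np.linalg.inv(A)
print(A_inv)

# Multiplying a matrix by its inverse should give the identity matrix.
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```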

If we work with arrays of different types, the resulting array's type will be the more general of the two types. This is called "upcasting". Example:

array1 = np.array([[1,2,3,4],[3,4,5,6]])
array2 = np.array([[1.2,3.2,4.3,5.2],[3.5,6.3,2.4,5.4]])
print (array1.dtype)
print(array2.dtype)
int64
float64
array3 = array1 + array2
print(array3)
print(array3.dtype)
[[ 2.2  5.2  7.3  9.2]
 [ 6.5 10.3  7.4 11.4]]
float64
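
To check which type an operation will produce without building the arrays, np.result_type can be used. A small sketch, with ints as an illustrative name:

```python
import numpy as np

# result_type reports the type that combining the two inputs would give.
print(np.result_type(np.int64, np.float64))   # float64

ints = np.array([1, 2, 3])

# True division returns floats even when both operands are integers.
print((ints / 2).dtype)    # float64

# Floor division keeps the integer type of the array.
print((ints // 2).dtype == ints.dtype)   # True
```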

NumPy has functions to give the min, max, sum and mean:

print(array3.min())
print(array3.max())
print(array3.sum())
print(array3.mean())
2.2
11.4
59.5
7.4375

For two-dimensional arrays, we can apply these operations to each row or column by passing the axis argument. Let's create a 3x4 array with 12 elements:

b = np.arange(1, 13).reshape(3, 4)
print(b)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
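
A sketch of the per-row and per-column operations on this array. The axis argument names the dimension that gets collapsed: axis=0 runs down the rows and gives one result per column, while axis=1 gives one result per row.

```python
import numpy as np

b = np.arange(1, 13).reshape(3, 4)

col_sums = b.sum(axis=0)     # one value per column: 15, 18, 21, 24
row_sums = b.sum(axis=1)     # one value per row: 10, 26, 42
row_means = b.mean(axis=1)   # one value per row: 2.5, 6.5, 10.5
col_max = b.max(axis=0)      # one value per column: 9, 10, 11, 12

print(col_sums)
print(row_sums)
print(row_means)
print(col_max)
```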

Indexing, Slicing and iterating

Indexing

For a one-dimensional array, we simply use the offset index:

a = np.array([5,6,7,8,9])
a[2]
7

For multidimensional arrays, we enter an index with 2 numbers: the first representing the row, the second the column:

b = np.array([[1,2],[3,4],[5,6]])
b
array([[1, 2],
       [3, 4],
       [5, 6]])
b[2,1]
6

To get multiple numbers from the array, we can index each element separately and collect the results in a new array:

np.array([b[0,0],b[1,0], b[2,1]])
array([1, 3, 6])

We can also do this by "zipping" the two lists of indexes together:

print(b[[0,1,2],[0,0,1]])
[1 3 6]
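
To make the "zipping" explicit, the sketch below compares the index-list form with a plain Python loop over zip. The names rows, cols, picked and paired are introduced here for illustration.

```python
import numpy as np

b = np.array([[1, 2], [3, 4], [5, 6]])
rows = [0, 1, 2]
cols = [0, 0, 1]

# The two lists are paired up position by position: (0,0), (1,0), (2,1).
picked = b[rows, cols]

# The same selection written out with zip.
paired = np.array([b[r, c] for r, c in zip(rows, cols)])

print(picked)                           # [1 3 6]
print(np.array_equal(picked, paired))   # True
```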

Boolean indexing

Boolean indexing lets us select elements that satisfy a condition. Let's look at the elements that are greater than 3 in the matrix b:

print(b>3)
[[False False]
 [False  True]
 [ True  True]]

We can then use this as a mask to obtain a one-dimensional array with only the values that satisfy the condition:

print(b[b>3])
[4 5 6]
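
Masks can also be combined and used for assignment. A minimal sketch, assuming the same matrix b; note that & and | need parentheses around each comparison, and that capped is a copy made here so b itself is not changed.

```python
import numpy as np

b = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); each comparison needs parentheses.
between = b[(b > 2) & (b < 6)]
print(between)            # [3 4 5]

# A mask on the left-hand side assigns to every matching element.
capped = b.copy()
capped[capped > 3] = 3
print(capped)
```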

This functionality is essential in the Pandas toolkit!

Slicing

Slicing is a way to create a sub-array from an original array. For one-dimensional arrays, it works like a list: we use the : sign. For example, let's make a sub-array with the first three values of an original array:

a = np.array([1,2,3,4,5,6,7,8])
print(a[:3]) # Returns values from index 0 to index 2; the stop index 3 is excluded
[1 2 3]

To return a subset of the array, we use two indexes:

print(a[2:5]) #Return values from index 2 to index 4
[3 4 5]

It works similarly for multi-dimensional arrays.

a = np.array([[1,2,3,4],[5,6,7,8],[3,5,7,2]])
print(a)
[[1 2 3 4]
 [5 6 7 8]
 [3 5 7 2]]

A single slice or index selects rows:

print(a[:2])
[[1 2 3 4]
 [5 6 7 8]]
print(a[-1])  # negative indexes count from the end, so -1 is the last row
[3 5 7 2]

To select rows and columns, we pass two slices separated by a comma. The first selects the rows, the second selects the columns:

print(a[:2, 1:3])
[[2 3]
 [6 7]]
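
A few more slice patterns that follow the same rows-then-columns rule, shown as a sketch on the same array. A bare : keeps everything along that dimension.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [3, 5, 7, 2]])

first_column = a[:, 0]        # all rows, column 0 -> 1, 5, 3
last_two_columns = a[:, -2:]  # all rows, last two columns
every_other_row = a[::2]      # rows 0 and 2 (step of 2)

print(first_column)
print(last_two_columns)
print(every_other_row)
```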

A slice of the array is a view of the data. If we modify the sub-array, it will also modify the original array!!! For example, if I change the element at [0,0]:

sub_array = a[:2, 1:3]
print("sub_array index [0,0] before change:", sub_array[0,0])
sub_array[0,0]=30
print("sub_array index [0,0] after change:", sub_array[0,0])
print("array index [0,1] after change:", a[0,1])
sub_array index [0,0] before change: 2
sub_array index [0,0] after change: 30
array index [0,1] after change: 30
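
When the original array must stay untouched, one option is to take an explicit copy of the slice. The sketch below uses np.shares_memory to check whether two arrays share the same data; independent is a name chosen here for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [3, 5, 7, 2]])

# copy() gives the sub-array its own data, separate from a.
independent = a[:2, 1:3].copy()
independent[0, 0] = 30

print(a[0, 1])                             # 2, the original is unchanged
print(np.shares_memory(a, independent))    # False
print(np.shares_memory(a, a[:2, 1:3]))     # True, a plain slice is a view
```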

To load a dataset in NumPy, we can use the function np.genfromtxt(), where we specify the file name, the delimiter (optional), and the number of rows to skip in case of headers. Example from class:

wines = np.genfromtxt("datasets/winequality-red.csv", delimiter=";", skip_header=1)
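
To try np.genfromtxt without the class file, the sketch below reads from an in-memory text buffer instead. The column names and values are made up for illustration and are not taken from the wine dataset.

```python
import io
import numpy as np

# Hypothetical stand-in for a semicolon-delimited file with one header row.
csv_text = io.StringIO(
    "acidity;sugar;quality\n"
    "7.4;1.9;5\n"
    "7.8;2.6;5\n"
    "11.2;1.9;6\n"
)

# skip_header=1 drops the line of column names before parsing the numbers.
data = np.genfromtxt(csv_text, delimiter=";", skip_header=1)

print(data.shape)          # (3, 3)
print(data[:, 2].mean())   # mean of the last column
```

By default every value is read as a float, so the result can be sliced and summarized like any other two-dimensional array.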