NumPy Beginner's Guide: From None to One!

Introduction to NumPy:

NumPy (short for "Numerical Python") is an open-source Python library that provides a powerful framework for array and matrix computing. It is designed to work efficiently with large arrays and matrices of numerical data, making it a key tool for scientific computing, data analysis, and machine learning.

NumPy is built around the ndarray, or N-dimensional array, which is a collection of elements of the same data type. NumPy provides a wide range of operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, and statistical operations. NumPy also provides efficient algorithms for linear algebra, Fourier transforms, and random number generation.
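As a quick illustration of what "N-dimensional array of a single data type" means in practice, here is a minimal sketch. The variable names a and b are just for illustration, and the exact integer type printed depends on your platform.

```python
import numpy as np

# A 2D array: every element shares one data type (dtype)
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.ndim)   # number of dimensions: 2
print(a.shape)  # size along each dimension: (2, 3)
print(a.size)   # total number of elements: 6
print(a.dtype)  # an integer type such as int64 (platform dependent)

# Mixing integers and floats upcasts everything to a single float type
b = np.array([1, 2.5, 3])
print(b.dtype)  # float64
```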

The key features of NumPy include:

  1. Efficient array computing: NumPy provides fast, memory-efficient arrays that allow you to perform complex mathematical operations on large datasets.

  2. Broadcasting: NumPy can broadcast arrays of different shapes and sizes to perform operations on them, making it easier to write code that works with arrays of different dimensions.

  3. Powerful indexing: NumPy provides powerful indexing and slicing capabilities that make it easy to extract specific elements or sections of an array.

  4. Linear algebra operations: NumPy provides a comprehensive set of linear algebra functions, including matrix multiplication, eigenvalue calculations, and singular value decomposition.

  5. Fourier transforms: NumPy provides efficient implementations of Fourier transforms, which are used for signal processing and image analysis.

  6. Random number generation: NumPy provides several functions for generating random numbers and random arrays.

NumPy is an essential library for anyone working with numerical data in Python. Its efficient array computing, flexible indexing and slicing, and extensive collection of mathematical and statistical functions make it a core tool for scientific computing, data analysis, and machine learning.

Installing NumPy:

Here's a step-by-step guide on how to install NumPy in Python:

  1. Open a command prompt or terminal window on your computer.

  2. Make sure you have pip installed. If you don't have pip installed, you can install it by following the instructions on the official pip website.

  3. Once you have pip installed, run the following command to install NumPy:

     pip install numpy
    
  4. After running the command, pip will download and install the latest version of NumPy from the Python Package Index (PyPI). Depending on your internet speed and computer performance, this may take a few minutes.

  5. Once the installation is complete, you can verify that NumPy is installed correctly by opening a Python shell and running the following command:

     import numpy as np
    

    If you don't see any error messages, NumPy is installed correctly.

That's it! You have successfully installed NumPy in Python.
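If you want a slightly stronger check than a bare import, a small sketch like the following prints the installed version and runs a tiny computation. The version string you see will depend on what pip installed.

```python
import numpy as np

# The version string confirms which NumPy release was imported
print(np.__version__)

# A tiny computation confirms the library actually works
total = np.arange(5).sum()
print(total)  # 0 + 1 + 2 + 3 + 4 = 10
```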

Array Indexing and Slicing:

In NumPy, array indexing and slicing are powerful tools for accessing and manipulating specific elements or sections of a NumPy array. Here's how to use them:

Accessing individual elements

To access an individual element of a NumPy array, you can use square brackets [] with the index of the element you want to access. For example, to access the element in the second row and third column of a 2D array, you would use the following code:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a[1, 2])  # Output: 6

Slicing arrays

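To extract a portion of an array, you can use slicing. A slice is written as start:end, with a colon between the start and end indices. The start index is included and the end index is excluded, so a[1:4] returns the elements at indices 1, 2, and 3.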

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a[1:4])  # Output: [2 3 4]
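Slices can also leave out the start or end, use negative indices, or take a step. The following sketch shows a few common forms on the same array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

print(a[-1])    # last element: 5
print(a[:2])    # from the start up to (not including) index 2: [1 2]
print(a[2:])    # from index 2 to the end: [3 4 5]
print(a[::2])   # every second element: [1 3 5]
print(a[::-1])  # reversed: [5 4 3 2 1]
```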

Slicing can also be used to extract rows and columns from 2D arrays. For example, to extract the second row of a 2D array, you would use the following code:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a[1, :])  # Output: [4 5 6]

To extract a column, you would use the following code:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a[:, 2])  # Output: [3 6 9]

Modifying arrays

You can also use indexing and slicing to modify specific elements or sections of a NumPy array. For example, to change the value of an individual element, you would use the following code:

import numpy as np

a = np.array([1, 2, 3, 4, 5])
a[2] = 10
print(a)  # Output: [ 1  2 10  4  5]

To modify a section of an array, you would use slicing to extract the section you want to modify, and then assign new values to it. For example, to change the values in the second and third columns of a 2D array, you would use the following code:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a[:, 1:3] = 0
print(a)
"""
Output:
[[1 0 0]
 [4 0 0]
 [7 0 0]]
"""

In summary, NumPy array indexing and slicing are powerful tools for accessing and manipulating specific elements or sections of a NumPy array. By using indexing and slicing, you can extract and modify specific elements, rows, and columns of an array with ease.
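One detail that often surprises beginners: a basic slice is a view onto the original data, not a copy, so writing through the slice changes the original array. The sketch below shows this, along with boolean indexing as another way to select elements. The names view and independent are just for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# A basic slice is a view: it shares memory with the original array
view = a[1:4]
view[0] = 99
print(a)  # [ 1 99  3  4  5]

# .copy() gives independent data, so the original is left alone
b = np.array([1, 2, 3, 4, 5])
independent = b[1:4].copy()
independent[0] = 99
print(b)  # [1 2 3 4 5]

# Boolean indexing selects the elements where a condition holds
print(b[b > 2])  # [3 4 5]
```

If you need to keep the original untouched, call .copy() on the slice before modifying it.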

Array Manipulation:

Array manipulation is a crucial aspect of working with NumPy. It involves changing the shape, size, and structure of arrays. NumPy provides several functions to perform these operations. In this section, we will discuss some of the most commonly used array manipulation functions.

Reshaping Arrays

Reshaping changes the shape of an array without changing its data or its total number of elements. NumPy provides the reshape() function for this purpose. It takes the new shape as its argument and returns an array with that shape; whenever possible, the result is a view that shares memory with the original rather than a copy.

import numpy as np

# Create a 1D array of length 6
a = np.array([1, 2, 3, 4, 5, 6])

# Reshape the array to a 2D array with 2 rows and 3 columns
b = a.reshape(2, 3)

print(b)
# Output:
# [[1 2 3]
#  [4 5 6]]

The reshape() function can also be used to create arrays with higher dimensions:

# Create a 1D array of length 8
c = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Reshape the array to a 3D array of shape (2, 2, 2): 2 blocks, each with 2 rows and 2 columns
d = c.reshape(2, 2, 2)

print(d)
# Output:
# [[[1 2]
#   [3 4]]
#
#  [[5 6]
#   [7 8]]]
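Two related conveniences are worth knowing. Passing -1 for one dimension lets NumPy work out its size, and asking for a shape whose element count does not match raises an error. A minimal sketch:

```python
import numpy as np

a = np.arange(1, 13)  # the integers 1 through 12

# -1 asks NumPy to infer that dimension from the total size
b = a.reshape(3, -1)
print(b.shape)  # (3, 4)

# For a contiguous array like this one, reshape returns a view
print(np.shares_memory(a, b))  # True

# The element count must match, otherwise reshape raises ValueError
try:
    a.reshape(5, 3)
except ValueError as err:
    print("Cannot reshape:", err)

# ravel() flattens back to 1D
print(b.ravel().shape)  # (12,)
```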

Concatenating Arrays

Concatenation refers to combining two or more arrays into a single array. NumPy provides the concatenate() function for this purpose. The concatenate() function takes a sequence of arrays and an axis parameter as input and returns a new array that is the concatenation of the input arrays along the specified axis.

# Create two 2D arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Concatenate the arrays along the rows (axis 0)
c = np.concatenate((a, b), axis=0)

print(c)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

# Concatenate the arrays along the columns (axis 1)
d = np.concatenate((a, b), axis=1)

print(d)
# Output:
# [[1 2 5 6]
#  [3 4 7 8]]
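For 2D arrays, np.vstack() and np.hstack() are common shorthands for the two concatenations above. The sketch below also shows what happens when shapes do not line up; the array c is a made-up example with the wrong number of columns.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# vstack stacks row-wise, like concatenate along axis 0
v = np.vstack((a, b))
print(np.array_equal(v, np.concatenate((a, b), axis=0)))  # True

# hstack stacks column-wise, like concatenate along axis 1 for 2D arrays
h = np.hstack((a, b))
print(np.array_equal(h, np.concatenate((a, b), axis=1)))  # True

# Shapes must match on every axis except the one being joined
c = np.array([[9, 10, 11]])  # shape (1, 3): 3 columns instead of 2
try:
    np.concatenate((a, c), axis=0)
except ValueError as err:
    print("Cannot concatenate:", err)
```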

Stacking Arrays

Stacking refers to creating a new array by stacking two or more arrays along a new axis. NumPy provides the stack() function for this purpose. The stack() function takes a sequence of arrays and an axis parameter as input and returns a new array that is the stacking of the input arrays along the specified axis.

# Create two 1D arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stack the arrays vertically
c = np.stack((a, b), axis=0)

print(c)
# Output:
# [[1 2 3]
#  [4 5 6]]

# Stack the arrays along a new second axis (axis 1), so each input becomes a column
d = np.stack((a, b), axis=1)

print(d)
# Output:
# [[1 4]
#  [2 5]
#  [3 6]]
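The difference between stack() and concatenate() is easiest to see by comparing result shapes: concatenate() joins along an axis that already exists, while stack() creates a new one. A short sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# concatenate joins along an existing axis: the result is still 1D
joined = np.concatenate((a, b))
print(joined.shape)  # (6,)

# stack adds a new axis: the result is 2D
print(np.stack((a, b), axis=0).shape)  # (2, 3)
print(np.stack((a, b), axis=1).shape)  # (3, 2)

# For 1D inputs, column_stack gives the same result as stack with axis=1
same = np.array_equal(np.column_stack((a, b)), np.stack((a, b), axis=1))
print(same)  # True
```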

Mathematical Operations:

Here's an overview of some of the mathematical operations you can perform with NumPy arrays:

  1. Basic arithmetic:

    NumPy arrays support standard arithmetic operations such as addition, subtraction, multiplication, and division. For example:

     import numpy as np
    
     x = np.array([1, 2, 3])
     y = np.array([4, 5, 6])
    
     z = x + y  # array([5, 7, 9])
     z = x - y  # array([-3, -3, -3])
     z = x * y  # array([4, 10, 18])
     z = x / y  # array([0.25, 0.4, 0.5])
    
  2. Trigonometric functions:

    NumPy provides many trigonometric functions, such as sin, cos, and tan, which can be applied to arrays element-wise. For example:

     import numpy as np
    
     x = np.array([0, np.pi/2, np.pi])
    
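     y = np.sin(x)  # approximately [0, 1, 0]; sin(pi) is about 1.2e-16, not exactly 0
     y = np.cos(x)  # approximately [1, 0, -1]; cos(pi/2) is about 6.1e-17
     y = np.tan(x)  # approximately [0, 1.6e16, 0]; tan(pi/2) is huge but finite, not inf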
    
  3. Linear algebra operations:

    NumPy provides a range of linear algebra functions, such as dot product, matrix multiplication, and determinant calculation. For example:

     import numpy as np
    
     x = np.array([[1, 2], [3, 4]])
     y = np.array([[5, 6], [7, 8]])
    
     z = np.dot(x, y)         # array([[19, 22], [43, 50]])
     z = np.matmul(x, y)      # array([[19, 22], [43, 50]])
     z = np.linalg.det(x)     # approximately -2.0 (floating-point result)
     z = np.linalg.inv(x)     # approximately array([[-2. ,  1. ], [ 1.5, -0.5]])
    
  4. Statistical functions:

    NumPy provides many statistical functions for arrays, such as mean, median, standard deviation, and variance. For example:

     import numpy as np
    
     x = np.array([1, 2, 3, 4, 5])
    
     y = np.mean(x)           # 3.0
     y = np.median(x)         # 3.0
     y = np.std(x)            # 1.4142135623730951
     y = np.var(x)            # 2.0
    
  5. Broadcasting:

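    NumPy can perform arithmetic between arrays of different shapes, as long as their dimensions are compatible. This is known as broadcasting. Adding a scalar to an array is the simplest case; adding two arrays of the same shape is plain element-wise arithmetic and needs no broadcasting. For example:

     import numpy as np
    
     x = np.array([1, 2, 3])
     y = np.array([4, 5, 6])
     col = np.array([[10], [20]])
    
     a = x + 10      # scalar broadcast: array([11, 12, 13])
     b = x + y       # same shape, element-wise: array([5, 7, 9])
     c = x + col     # shapes (3,) and (2, 1) broadcast to (2, 3): array([[11, 12, 13], [21, 22, 23]])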
    

These are just a few examples of the many mathematical operations you can perform with NumPy arrays.
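Because trigonometric and linear algebra functions return floating-point results, values that should be exactly 0 or 1 on paper can come out slightly off. A hedged sketch of how to compare such results with a tolerance, using np.allclose(), and of the @ operator as shorthand for matrix multiplication:

```python
import numpy as np

x = np.array([0, np.pi / 2, np.pi])

# sin(pi) is a tiny nonzero number in floating point, so == 0 fails
print(np.sin(x)[2] == 0)  # False
# allclose compares within a tolerance instead
print(np.allclose(np.sin(x), [0, 1, 0]))  # True

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])

# The @ operator is matrix multiplication, same as np.matmul
print(np.array_equal(m @ n, np.matmul(m, n)))  # True

# A matrix times its inverse is the identity, up to rounding error
print(np.allclose(m @ np.linalg.inv(m), np.eye(2)))  # True
```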

Statistical Operations:

NumPy provides various functions for performing statistical operations on arrays. Here are some of the commonly used ones:

  1. np.mean(): Calculates the arithmetic mean (average) of elements in the array.

  2. np.median(): Calculates the median of elements in the array.

  3. np.std(): Calculates the standard deviation of elements in the array.

  4. np.var(): Calculates the variance of elements in the array.

  5. np.min(): Returns the minimum value of elements in the array.

  6. np.max(): Returns the maximum value of elements in the array.

  7. np.percentile(): Calculates the nth percentile of elements in the array.

  8. np.corrcoef(): Calculates the matrix of Pearson correlation coefficients between variables (for example, between two arrays).

  9. np.histogram(): Calculates the histogram of the elements in the array.

Here's an example that demonstrates how to use some of these functions:

import numpy as np

# Create an array
arr = np.array([10, 20, 30, 40, 50])

# Calculate the mean
mean = np.mean(arr)
print("Mean:", mean)

# Calculate the median
median = np.median(arr)
print("Median:", median)

# Calculate the standard deviation
std = np.std(arr)
print("Standard deviation:", std)

# Calculate the variance
var = np.var(arr)
print("Variance:", var)

# Calculate the minimum and maximum values
min_val = np.min(arr)
max_val = np.max(arr)
print("Minimum value:", min_val)
print("Maximum value:", max_val)

# Calculate the 75th percentile
percentile = np.percentile(arr, 75)
print("75th percentile:", percentile)

# Calculate the correlation coefficient
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([10, 20, 30, 40, 50])
corrcoef = np.corrcoef(arr1, arr2)
print("Correlation coefficient:", corrcoef)

# Calculate the histogram
hist, bins = np.histogram(arr, bins=[0, 20, 40, 60])
print("Histogram:", hist)
print("Bins:", bins)

Output:

Mean: 30.0
Median: 30.0
Standard deviation: 14.142135623730951
Variance: 200.0
Minimum value: 10
Maximum value: 50
75th percentile: 40.0
Correlation coefficient: [[1. 1.]
 [1. 1.]]
Histogram: [1 2 2]
Bins: [ 0 20 40 60]

This is just a basic introduction to statistical operations in NumPy. There are many more functions available for performing various statistical operations.
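Two options come up constantly with these functions: the axis argument, which computes a statistic per row or per column of a 2D array, and ddof, which switches between population and sample variance. The data below is made up purely for illustration.

```python
import numpy as np

# Hypothetical data: 2 rows by 3 columns
data = np.array([[10, 20, 30],
                 [40, 50, 60]])

print(np.mean(data))          # mean of all elements: 35.0
print(np.mean(data, axis=0))  # mean of each column: [25. 35. 45.]
print(np.mean(data, axis=1))  # mean of each row: [20. 50.]

# np.var and np.std divide by N by default (population statistics);
# ddof=1 divides by N - 1 instead (sample statistics)
arr = np.array([10, 20, 30, 40, 50])
print(np.var(arr))          # 200.0
print(np.var(arr, ddof=1))  # 250.0
```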

Random Number Generation:

NumPy provides several random number generators that can be used to create arrays of random numbers or random samples from different probability distributions. The random module in NumPy can be accessed using the following import statement:

import numpy as np

# Accessing the random module
np.random

Here are some of the ways to generate random numbers and arrays using NumPy:

Generating random numbers between 0 and 1

The rand() function generates random numbers drawn uniformly from the half-open interval [0, 1), so 0 is possible but 1 is not.

import numpy as np

# Generating 5 random numbers between 0 and 1
x = np.random.rand(5)
print(x)

Output:

[0.31922828 0.39734217 0.7167664  0.74559779 0.49852028]

Generating random numbers from a normal distribution

The normal() function generates random numbers from a normal distribution with a given mean and standard deviation.

import numpy as np

# Generating 5 random numbers from a normal distribution with mean 0 and standard deviation 1
x = np.random.normal(0, 1, 5)
print(x)

Output:

[-0.68829688 -0.7656165  -1.43216348 -0.43020713  1.48816038]

Generating random integers

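The randint() function generates random integers from a given range. The lower bound is included and the upper bound is excluded, so randint(1, 10, 5) returns five integers from 1 to 9.

import numpy as np

# Generating 5 random integers from 1 to 9 (the upper bound 10 is excluded)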
x = np.random.randint(1, 10, 5)
print(x)

Output:

[3 7 3 2 9]

Generating random samples from an array

The choice() function draws random samples from a given array. By default it samples with replacement, so the same element can appear more than once.

import numpy as np

# Generating 5 random samples from the given array
x = np.random.choice([1, 2, 3, 4, 5], size=5)
print(x)

Output:

[4 2 2 2 4]

Seeding the random number generator

NumPy's random number generator is pseudorandom: a seed value determines the sequence of numbers it produces. By default, the generator is seeded with fresh, unpredictable entropy from the operating system, so the sequence differs on each run. Setting a fixed seed with the seed() function makes the same sequence appear every time the program runs, which is useful for reproducible experiments and tests.

import numpy as np

# Setting the seed value to generate the same sequence of random numbers
np.random.seed(123)

# Generating 5 random numbers between 0 and 1
x = np.random.rand(5)
print(x)

Output:

[0.69646919 0.28613933 0.22685145 0.55131477 0.71946897]

Using NumPy's random number generators can be very useful in generating data for simulations or experiments, or in generating test data for machine learning algorithms.
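The functions above belong to NumPy's older random interface. Since NumPy 1.17 there is also a Generator interface, created with np.random.default_rng(), which the NumPy documentation recommends for new code. Here is a rough sketch of the same tasks using it; the seed 123 is an arbitrary choice, and the values drawn will not match the legacy outputs shown earlier.

```python
import numpy as np

# A Generator with a fixed seed (the seed value 123 is arbitrary)
rng = np.random.default_rng(123)

print(rng.random(5))        # 5 floats in [0, 1)
print(rng.normal(0, 1, 5))  # 5 draws, mean 0, standard deviation 1

ints = rng.integers(1, 10, size=5)  # 5 integers from 1 to 9 (10 is excluded)
print(ints)

picked = rng.choice([1, 2, 3, 4, 5], size=3, replace=False)  # 3 distinct items
print(picked)

# Two generators built from the same seed produce the same sequence
first = np.random.default_rng(123).random(3)
second = np.random.default_rng(123).random(3)
print(np.array_equal(first, second))  # True
```

Keeping the generator in a variable such as rng, instead of relying on global state set by np.random.seed(), makes it easier to see which parts of a program share a random stream.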

Broadcasting:

Broadcasting is a powerful feature in NumPy that allows element-wise operations between arrays with different shapes. When we perform arithmetic or other operations on arrays with the same shape, it's straightforward. However, when the shapes of the arrays are different, broadcasting comes in handy, allowing us to avoid unnecessary memory copies and loops.

The broadcasting rules in NumPy are straightforward and can be summarized as follows:

  1. If the two arrays have the same shape, then the operation is performed element-wise.

  2. If the two arrays have different shapes, then NumPy tries to broadcast them to the same shape. Broadcasting is possible when the arrays' dimensions are compatible, i.e., one of the following conditions is met:

    a. The arrays have the same number of dimensions, and each dimension has the same size.

    b. One of the arrays has fewer dimensions than the other, but its shape can be extended to match the other array's shape. This is done by adding dimensions of size 1 to the left of the array's shape.

    c. One of the arrays has a size of 1 in a particular dimension, and that dimension is broadcasted to match the other array's size in that dimension.
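One way to check these rules without doing any arithmetic is to ask NumPy what shape two arrays would broadcast to. The sketch below uses np.broadcast() for that, with placeholder arrays of ones, and shows the error you get when shapes are not compatible.

```python
import numpy as np

a = np.ones((3, 1))
b = np.ones((4,))

# Shapes are aligned from the right: (3, 1) with (4,) gives (3, 4)
print(np.broadcast(a, b).shape)  # (3, 4)
print((a + b).shape)             # (3, 4)

# (2, 3) with (4,): the trailing sizes 3 and 4 differ and neither is 1
c = np.ones((2, 3))
try:
    c + b
except ValueError as err:
    print("Not broadcastable:", err)
```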

For example, let's consider the following two arrays:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

We can perform element-wise operations on these two arrays:

c = a + b
print(c)  # Output: [11 22 33]

Now let's consider a different example where the two arrays have different shapes:

a = np.array([1, 2, 3])
b = np.array([[10], [20], [30]])

Here, the shape of a is (3,) and the shape of b is (3, 1). The shapes differ, but they are compatible: NumPy first treats a as having shape (1, 3) by adding a dimension of size 1 on the left, and then stretches every size-1 dimension to match the other array. The result has shape (3, 3):

c = a + b
print(c)  # Output: [[11 12 13] [21 22 23] [31 32 33]]

In this example, NumPy conceptually repeats a along the first dimension (as three identical rows) and b along the second dimension (as three identical columns), without actually copying the data. Then, the element-wise operation is performed between the two (3, 3) arrays.

Broadcasting is a powerful tool that enables us to write concise and efficient code in NumPy. However, it's important to use it judiciously and understand how it works to avoid errors and unexpected results.
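A typical everyday use of broadcasting is subtracting a per-column or per-row statistic from a 2D array. The following sketch centers a small made-up dataset; keepdims=True is what keeps the row means in a column shape so they broadcast the right way.

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])  # shape (3, 2)

# One mean per column, shape (2,)
col_means = data.mean(axis=0)

# (3, 2) minus (2,) broadcasts the means across all rows
centered = data - col_means
print(centered.tolist())  # [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]

# To subtract one value per row, keep the row means as a column of shape (3, 1)
row_means = data.mean(axis=1, keepdims=True)
print(row_means.shape)           # (3, 1)
print((data - row_means).shape)  # (3, 2)
```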

Working with Missing Values:

NumPy provides several ways to handle missing values, which are commonly represented as NaN (Not a Number), a special floating-point value. Missing values can occur for various reasons, such as incomplete data or errors in data collection.

One way to handle missing values in NumPy is by using masking. A mask is a Boolean array that indicates which values are valid and which are missing. The np.isnan() function can be used to create a mask for missing values.

For example, suppose we have an array with missing values:

import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan, 6])

We can create a mask for the missing values using the np.isnan() function:

mask = np.isnan(arr)

This will give us a Boolean array where True indicates a missing value:

[False False  True False  True False]

We can then use this mask to perform operations on the non-missing values. For example, we can calculate the mean of the non-missing values using the np.mean() function:

mean = np.mean(arr[~mask])

The ~ operator is used to invert the mask, so we select the non-missing values.

Another way to handle missing values is by using the np.ma module, which provides support for masked arrays. A masked array is an array with a mask that indicates which values are valid and which are missing.

For example, we can create a masked array from the original array using the np.ma.array() function:

arr_masked = np.ma.array(arr, mask=mask)

We can then perform operations on the masked array, and the missing values will be automatically ignored:

mean = np.ma.mean(arr_masked)

This will give us the mean of the non-missing values.

In summary, NumPy provides several methods for handling missing values, including masking and masked arrays. These methods allow us to perform operations on arrays with missing values while ignoring the missing values.
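For the common case of simply skipping NaN entries, NumPy also ships NaN-aware versions of many reductions, such as np.nanmean() and np.nansum(). A short sketch using the same array as above:

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan, 6])

# A plain mean is poisoned by NaN
print(np.mean(arr))     # nan

# The nan-aware variants skip NaN entries
print(np.nanmean(arr))  # (1 + 2 + 4 + 6) / 4 = 3.25
print(np.nansum(arr))   # 13.0
print(np.nanmax(arr))   # 6.0

# Count the missing values
print(np.isnan(arr).sum())  # 2
```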

Case Study:

Here's a brief example project that uses NumPy to analyze some data:

Project Title: Analyzing Student Exam Scores using NumPy

Project Description:

In this project, we will use NumPy to analyze a dataset of student exam scores. We will load the data, manipulate it using NumPy, and perform some basic analysis to gain insights into the students' performance.

Dataset Description:

The dataset contains scores from 100 students who took an exam. There are four columns: student ID, math score, reading score, and writing score.

Steps:

  1. Import the necessary libraries:

     import numpy as np
     import pandas as pd

  2. Load the dataset:

     df = pd.read_csv('exam_scores.csv')

  3. Convert the DataFrame to a NumPy array:

     scores = df.to_numpy()

  4. Calculate the mean score for each subject:

     math_mean = np.mean(scores[:, 1])
     reading_mean = np.mean(scores[:, 2])
     writing_mean = np.mean(scores[:, 3])

  5. Calculate the standard deviation for each subject:

     math_std = np.std(scores[:, 1])
     reading_std = np.std(scores[:, 2])
     writing_std = np.std(scores[:, 3])

  6. Calculate the overall mean score:

     overall_mean = np.mean(scores[:, 1:4])

  7. Find the student with the highest math score:

     highest_math_score = np.max(scores[:, 1])
     highest_math_score_student = scores[np.argmax(scores[:, 1]), 0]

  8. Find the students who scored above the overall mean:

     # Boolean mask: True for students whose own average exceeds the overall mean
     above_mean_students = scores[np.mean(scores[:, 1:4], axis=1) > overall_mean, 0]

  9. Print out the results:

print("Math mean score:", math_mean)
print("Reading mean score:", reading_mean)
print("Writing mean score:", writing_mean)

print("Math standard deviation:", math_std)
print("Reading standard deviation:", reading_std)
print("Writing standard deviation:", writing_std)

print("Overall mean score:", overall_mean)

print("Student with highest math score:", highest_math_score_student)

print("Students who scored above the overall mean:", above_mean_students)

This project is just a brief example of how NumPy can be used for data analysis. With NumPy, you can perform a wide range of operations on arrays of data, making it a powerful tool for scientific computing.
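Since exam_scores.csv is not included with this article, here is a self-contained variant you can run as-is. It is only a sketch: the scores are randomly generated stand-ins (not real data), and the column layout (student ID, math, reading, writing) is assumed from the dataset description above.

```python
import numpy as np

# Synthetic stand-in for exam_scores.csv: columns are
# student ID, math, reading, writing (values are randomly generated)
rng = np.random.default_rng(0)
ids = np.arange(1, 101)
subject_scores = rng.integers(40, 101, size=(100, 3))  # scores from 40 to 100
scores = np.column_stack((ids, subject_scores))

# Per-subject mean and standard deviation in one call each
subject_means = scores[:, 1:4].mean(axis=0)
subject_stds = scores[:, 1:4].std(axis=0)
print("Subject means:", subject_means)
print("Subject standard deviations:", subject_stds)

# Overall mean across every subject score
overall_mean = scores[:, 1:4].mean()
print("Overall mean score:", overall_mean)

# ID of the student with the highest math score
top_math_id = scores[np.argmax(scores[:, 1]), 0]
print("Student with highest math score:", top_math_id)

# IDs of students whose own average beats the overall mean
student_means = scores[:, 1:4].mean(axis=1)
above_mean_ids = scores[student_means > overall_mean, 0]
print("Number of students above the overall mean:", above_mean_ids.size)
```

Using axis=0 computes all three subject statistics at once, which avoids repeating the same line for each column.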

Conclusion:

NumPy is a powerful and essential library for scientific computing in Python. It provides a wide range of tools for creating, manipulating, and performing operations on arrays of data. In this article, we have covered the key features of NumPy, including array indexing and slicing, array manipulation, mathematical and statistical operations, random number generation, broadcasting, and working with missing values. With this knowledge, you should be well-equipped to use NumPy in your own data analysis and scientific computing projects.

I hope this helps!
