Skip to main content

Command Palette

Search for a command to run...

The NumPy Survival Guide

A complete walkthrough of creating, manipulating, and cleaning arrays

Updated
6 min read
The NumPy Survival Guide

To build Machine Learning models, I need to master the ndarray (N-dimensional array). It’s not just about storing numbers; it’s about generating patterns, reshaping matrices, and cleaning data without writing a single loop.

I’ve compiled every function I learned today into this guide.

1. The Basics: ndarray & dtype

A NumPy array is a grid of values, all of the same type.

Homogeneous: It contains elements of only one data type (usually float64 or int64).

Contiguous: It is stored in a continuous block of memory, allowing the CPU to process it efficiently.

Key Attributes

  • Rank: Number of dimensions.

  • Shape: The size (rows, columns).

  • dtype: The specific data type (e.g., int64, float32).

import numpy as np

# Creating a basic array
arr = np.array([1, 2, 3])
print(f"Array: {arr}")
print(f"Shape: {arr.shape}")

The dtype Object

NumPy is strict. If you mix types, it upcasts them (e.g., integers become floats). You can also define structured data using fromrecords (like a mini-database).

# Structured Array (Advanced!)
# Creating a record of (Name, Age, Salary)
core_data = [('Saachi', 21, 5000), ('Jaden', 25, 6000)]
rec_arr = np.core.records.fromrecords(core_data, names='name, age, salary')

print(rec_arr.name) # Output: ['Saachi' 'Jaden']

2. Creation Toolkit: Generating Data

Sometimes we don't have data yet; we need to generate it. NumPy has a function for every pattern.

The "Blank Canvas" Functions

Function Description
np.zeros((2,2)) Creates a matrix of pure 0s.
np.ones((2,2)) Creates a matrix of pure 1s.
np.full((2,2), 7) Fills the matrix with a specific number (e.g., 7).
np.eye(3) Creates an Identity Matrix (1s on diagonal, 0s elsewhere).
np.empty((2,2)) Creates an uninitialized array (fast, but full of garbage memory values).

The "Sequence" Functions

arange vs linspace: This confused me at first.

  • arange: "I want a step size of 2." (0, 2, 4...)

  • linspace: "I want 5 numbers evenly spaced between 0 and 10."

# ARANGE: Start, Stop, Step
print(np.arange(0, 10, 2))  
# Output: [0 2 4 6 8]

# LINSPACE: Start, Stop, Number of points
print(np.linspace(0, 10, 5)) 
# Output: [ 0.   2.5  5.   7.5 10. ]

The "Random" Functions

Creating a Gaussian (Normal) distribution is essential for initializing weights in Neural Networks.

# 2D Gaussian Array (Mean=0, Std=0.1, Shape=(3,3))
gaussian_arr = np.random.normal(0, 0.1, (3, 3))

3. The Silent Bug: Copy vs. View

This is the most dangerous trap in NumPy. If you assign b = a, Python just points to the same memory address. Changing b changes a.

a = np.array([1, 2, 3])
b = a          # This is a VIEW
b[0] = 99
print(a)       # Output: [99, 2, 3] -> Original is modified!

# The Fix:
b = np.copy(a) # This is a COPY (Safe)

4. The Shape Shifter: Reshaping & Axes

Data rarely comes in the shape we want. We have to bend it.

Reshaping

reshape() creates a new view of the data without changing the underlying information.

  • The -1 Trick: Passing -1 lets NumPy calculate the missing dimension automatically.
arr = np.arange(1, 10) # 9 items
grid = arr.reshape(3, 3) # 3x3 matrix
col_vector = arr.reshape(-1, 1) # 9x1 column vector

Inserting a New Axis

Sometimes a model expects a 2D input (3, 1) but you have a 1D vector (3,).

a = np.array([1, 2, 3])

# Add a new axis to make it a column vector
b = a[:, np.newaxis] 
print(b.shape) # Output: (3, 1)

Swapping Columns

We can reorder data just by rearranging indices.

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Swap column 0 and column 2
# Format: arr[:, [new_order]]
arr[:, [0, 2]] = arr[:, [2, 0]]

5.Vectorization & Broadcasting

This is why we use NumPy: Speed.

Vectorization

Applying an operation to the entire array at once, removing loops.

arr = np.array([1, 2, 3])
print(arr * 10) # Output: [10, 20, 30] 

Broadcasting

How NumPy handles math between arrays of different shapes. It "stretches" the smaller array to fit.

The Dot Product (np.dot)

The fundamental operation of Neural Networks.

$$\text{output} = \text{inputs} \cdot \text{weights} + \text{bias}$$

inputs = np.array([2, 4])
weights = np.array([0.5, 0.25])
output = np.dot(inputs, weights) # Result: 2.0

6. The Joining Station: Stacking & Concatenating

Merging datasets is a daily task.

hstack vs vstack vs dstack

  • hstack (Horizontal): Stacks side-by-side (adds columns).

  • vstack (Vertical): Stacks on top (adds rows).

  • dstack (Depth): Stacks along the 3rd axis (3D).

a = np.array([1, 2])
b = np.array([3, 4])

# Vertical Stack
print(np.vstack((a, b)))
# Output:
# [[1, 2],
#  [3, 4]]

# Appending values to the end
np.append(a, [99]) # Output: [1, 2, 99]

concatenate

The generic version of stacking. You specify the axis.

# Join along axis 0 (rows)
np.concatenate((a.reshape(1,2), b.reshape(1,2)), axis=0)

5. Splitting

The opposite of joining. Useful for splitting data into "Training" and "Test" sets.

x = np.arange(9) # [0, 1, 2... 8]

# Split into 3 equal arrays
parts = np.split(x, 3)
print(parts[0]) # Output: [0, 1, 2]

6. Logic Data Cleaning

Real-world data is dirty.How do we find unique items or compare arrays?

Unique & Union

arr_duplicates = np.array([1, 1, 2, 2, 3])

# Find unique values
print(np.unique(arr_duplicates)) # Output: [1 2 3]

# Union of two arrays
arr1 = [10, 20]
arr2 = [20, 30]
print(np.union1d(arr1, arr2)) # Output: [10 20 30]

Comparing Arrays

You can check if two arrays are exactly equal.

a = np.array([1, 2])
b = np.array([1, 2])
print(np.array_equal(a, b)) # Output: True

Trimming Zeros

Useful for signal processing or cleaning sparse data.

arr = np.array([0, 0, 1, 2, 3, 0])
print(np.trim_zeros(arr)) # Output: [1, 2, 3]

7. Math Operations

Finally, the engine itself. NumPy contains highly optimized implementation of standard statistical and algebraic functions.

Statistics

We use these to understand the distribution of our dataset.

  • np.mean(): The average.

  • np.median(): The middle value (robust to outliers).

  • np.std(): The Standard Deviation (how spread out the data is).

The Dot Product (np.dot)

This is the fundamental operation of Neural Networks. It multiplies input vectors by weight vectors and sums the results.

inputs = np.array([2, 4])
weights = np.array([0.5, 0.25])

# (2 * 0.5) + (4 * 0.25) = 1.0 + 1.0 = 2.0
output = np.dot(inputs, weights)
print(output) # 2.0

Steps:

  1. Create the grid (arange, zeros).

  2. Reshape it (newaxis, T).

  3. Combine or Split it (vstack, split).

  4. Clean it (unique, trim_zeros).