The NumPy Survival Guide
A complete walkthrough of creating, manipulating, and cleaning arrays

To build Machine Learning models, I need to master the ndarray (N-dimensional array). It’s not just about storing numbers; it’s about generating patterns, reshaping matrices, and cleaning data without writing a single loop.
I’ve compiled every function I learned today into this guide.
1. The Basics: ndarray & dtype
A NumPy array is a grid of values, all of the same type.
Homogeneous: It contains elements of only one data type (usually float64 or int64).
Contiguous: It is stored in a continuous block of memory, allowing the CPU to process it efficiently.
Key Attributes
Rank: Number of dimensions.
Shape: The size (rows, columns).
dtype: The specific data type (e.g.,
int64,float32).
import numpy as np
# Creating a basic array
arr = np.array([1, 2, 3])
print(f"Array: {arr}")
print(f"Shape: {arr.shape}")
The dtype Object
NumPy is strict. If you mix types, it upcasts them (e.g., integers become floats). You can also define structured data using fromrecords (like a mini-database).
# Structured Array (Advanced!)
# Creating a record of (Name, Age, Salary)
core_data = [('Saachi', 21, 5000), ('Jaden', 25, 6000)]
rec_arr = np.core.records.fromrecords(core_data, names='name, age, salary')
print(rec_arr.name) # Output: ['Saachi' 'Jaden']
2. Creation Toolkit: Generating Data
Sometimes we don't have data yet; we need to generate it. NumPy has a function for every pattern.
The "Blank Canvas" Functions
| Function | Description |
|---|---|
np.zeros((2,2)) |
Creates a matrix of pure 0s. |
np.ones((2,2)) |
Creates a matrix of pure 1s. |
np.full((2,2), 7) |
Fills the matrix with a specific number (e.g., 7). |
np.eye(3) |
Creates an Identity Matrix (1s on diagonal, 0s elsewhere). |
np.empty((2,2)) |
Creates an uninitialized array (fast, but full of garbage memory values). |
The "Sequence" Functions
arange vs linspace: This confused me at first.
arange: "I want a step size of 2." (0, 2, 4...)linspace: "I want 5 numbers evenly spaced between 0 and 10."
# ARANGE: Start, Stop, Step
print(np.arange(0, 10, 2))
# Output: [0 2 4 6 8]
# LINSPACE: Start, Stop, Number of points
print(np.linspace(0, 10, 5))
# Output: [ 0. 2.5 5. 7.5 10. ]
The "Random" Functions
Creating a Gaussian (Normal) distribution is essential for initializing weights in Neural Networks.
# 2D Gaussian Array (Mean=0, Std=0.1, Shape=(3,3))
gaussian_arr = np.random.normal(0, 0.1, (3, 3))
3. The Silent Bug: Copy vs. View
This is the most dangerous trap in NumPy. If you assign b = a, Python just points to the same memory address. Changing b changes a.
a = np.array([1, 2, 3])
b = a # This is a VIEW
b[0] = 99
print(a) # Output: [99, 2, 3] -> Original is modified!
# The Fix:
b = np.copy(a) # This is a COPY (Safe)
4. The Shape Shifter: Reshaping & Axes
Data rarely comes in the shape we want. We have to bend it.
Reshaping
reshape() creates a new view of the data without changing the underlying information.
- The
-1Trick: Passing-1lets NumPy calculate the missing dimension automatically.
arr = np.arange(1, 10) # 9 items
grid = arr.reshape(3, 3) # 3x3 matrix
col_vector = arr.reshape(-1, 1) # 9x1 column vector
Inserting a New Axis
Sometimes a model expects a 2D input (3, 1) but you have a 1D vector (3,).
a = np.array([1, 2, 3])
# Add a new axis to make it a column vector
b = a[:, np.newaxis]
print(b.shape) # Output: (3, 1)
Swapping Columns
We can reorder data just by rearranging indices.
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Swap column 0 and column 2
# Format: arr[:, [new_order]]
arr[:, [0, 2]] = arr[:, [2, 0]]
5.Vectorization & Broadcasting
This is why we use NumPy: Speed.
Vectorization
Applying an operation to the entire array at once, removing loops.
arr = np.array([1, 2, 3])
print(arr * 10) # Output: [10, 20, 30]
Broadcasting
How NumPy handles math between arrays of different shapes. It "stretches" the smaller array to fit.
The Dot Product (np.dot)
The fundamental operation of Neural Networks.
$$\text{output} = \text{inputs} \cdot \text{weights} + \text{bias}$$
inputs = np.array([2, 4])
weights = np.array([0.5, 0.25])
output = np.dot(inputs, weights) # Result: 2.0
6. The Joining Station: Stacking & Concatenating
Merging datasets is a daily task.
hstack vs vstack vs dstack
hstack(Horizontal): Stacks side-by-side (adds columns).vstack(Vertical): Stacks on top (adds rows).dstack(Depth): Stacks along the 3rd axis (3D).
a = np.array([1, 2])
b = np.array([3, 4])
# Vertical Stack
print(np.vstack((a, b)))
# Output:
# [[1, 2],
# [3, 4]]
# Appending values to the end
np.append(a, [99]) # Output: [1, 2, 99]
concatenate
The generic version of stacking. You specify the axis.
# Join along axis 0 (rows)
np.concatenate((a.reshape(1,2), b.reshape(1,2)), axis=0)
5. Splitting
The opposite of joining. Useful for splitting data into "Training" and "Test" sets.
x = np.arange(9) # [0, 1, 2... 8]
# Split into 3 equal arrays
parts = np.split(x, 3)
print(parts[0]) # Output: [0, 1, 2]
6. Logic Data Cleaning
Real-world data is dirty.How do we find unique items or compare arrays?
Unique & Union
arr_duplicates = np.array([1, 1, 2, 2, 3])
# Find unique values
print(np.unique(arr_duplicates)) # Output: [1 2 3]
# Union of two arrays
arr1 = [10, 20]
arr2 = [20, 30]
print(np.union1d(arr1, arr2)) # Output: [10 20 30]
Comparing Arrays
You can check if two arrays are exactly equal.
a = np.array([1, 2])
b = np.array([1, 2])
print(np.array_equal(a, b)) # Output: True
Trimming Zeros
Useful for signal processing or cleaning sparse data.
arr = np.array([0, 0, 1, 2, 3, 0])
print(np.trim_zeros(arr)) # Output: [1, 2, 3]
7. Math Operations
Finally, the engine itself. NumPy contains highly optimized implementation of standard statistical and algebraic functions.
Statistics
We use these to understand the distribution of our dataset.
np.mean(): The average.np.median(): The middle value (robust to outliers).np.std(): The Standard Deviation (how spread out the data is).
The Dot Product (np.dot)
This is the fundamental operation of Neural Networks. It multiplies input vectors by weight vectors and sums the results.
inputs = np.array([2, 4])
weights = np.array([0.5, 0.25])
# (2 * 0.5) + (4 * 0.25) = 1.0 + 1.0 = 2.0
output = np.dot(inputs, weights)
print(output) # 2.0
Steps:
Create the grid (
arange,zeros).Reshape it (
newaxis,T).Combine or Split it (
vstack,split).Clean it (
unique,trim_zeros).



