Median of 2 sorted arrays

Dec 18, 2024 • 4 mins read

Dhaval Trivedi

Co-founder, Airtribe

Problem Overview

The problem, often referred to as "Median of Two Sorted Arrays," is a classic LeetCode problem which asks us to find the median of two sorted arrays. This problem requires an efficient solution because the straightforward merging approach isn't optimal for large datasets.

Below is the simplified problem statement:

Given two sorted arrays, nums1 and nums2, of size m and n respectively, return the median of the two sorted arrays.

Example:

Input:
nums1 = [1, 3]
nums2 = [2]

Output:
2.0

The median is 2.0 because in the sorted array [1, 2, 3], 2 is the middle element.

Understanding the Problem

The median is a statistical measure that separates the higher half from the lower half of a data sample. If the combined sorted array length is odd, the median is the middle element. If even, it is the average of the two middle elements. Finding the median in O((m + n) log (m + n)) time is straightforward by simply merging the arrays and computing the median. However, the task becomes finding this in O(log(min(m, n))), which demands a more sophisticated approach.

Key DSA Concepts and Theory

Binary Search

Binary Search is a search algorithm that finds the position of a target value within a sorted array. It significantly reduces the time complexity from O(n) to O(log n) by repeatedly dividing the search interval in half.

Median

The median is a value that separates a dataset into two equal halves. Finding the median has different approaches based on whether the number of data points is odd or even.

Divide and Conquer

The solution involves a "divide and conquer" approach where we partition the arrays to find a median efficiently. This is an advanced application of Binary Search in practice to identify the correct partition without merging the arrays explicitly.

Solution Approach

To solve this problem efficiently, we'll use a binary search approach on one of the arrays. The algorithm follows these steps:

Ensure nums1 is the smaller array. If not, swap them.
Perform binary search on smaller array. We partition nums1 into parts where a certain number of elements are in each partition. The rest of the elements in nums2 will complement this partition so that the left part has the same number of elements as the right part.
Check partitioning conditions:
- Find partition indices in both arrays.
- Ensure that elements in the left partition are less than or equal to elements in the right partition.
Calculate median based on partitions.

Here are implementations in both C++ and Java:

C++ Code

#include <vector>
#include <algorithm>
#include <iostream>
#include <limits.h>

using namespace std;

double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
    if (nums1.size() > nums2.size()) {
        return findMedianSortedArrays(nums2, nums1);
    }

    int x = nums1.size();
    int y = nums2.size();
    int low = 0, high = x;
    
    while (low <= high) {
        int partitionX = (low + high) / 2;
        int partitionY = (x + y + 1) / 2 - partitionX;
        
        int maxX = (partitionX == 0) ? INT_MIN : nums1[partitionX - 1];
        int minX = (partitionX == x) ? INT_MAX : nums1[partitionX];
        
        int maxY = (partitionY == 0) ? INT_MIN : nums2[partitionY - 1];
        int minY = (partitionY == y) ? INT_MAX : nums2[partitionY];
        
        if (maxX <= minY && maxY <= minX) {
            if ((x + y) % 2 == 0) {
                return ((double)max(maxX, maxY) + min(minX, minY)) / 2;
            } else {
                return (double)max(maxX, maxY);
            }
        } else if (maxX > minY) {
            high = partitionX - 1;
        } else {
            low = partitionX + 1;
        }
    }
    
    throw invalid_argument("Input arrays are not sorted");
}

Java Code

public class Solution {
    public double findMedianSortedArrays(int[] nums1, int[] nums2) {
        if (nums1.length > nums2.length) {
            return findMedianSortedArrays(nums2, nums1);
        }
        
        int x = nums1.length;
        int y = nums2.length;
        int low = 0, high = x;
        
        while (low <= high) {
            int partitionX = (low + high) / 2;
            int partitionY = (x + y + 1) / 2 - partitionX;
            
            int maxX = (partitionX == 0) ? Integer.MIN_VALUE : nums1[partitionX - 1];
            int minX = (partitionX == x) ? Integer.MAX_VALUE : nums1[partitionX];
            
            int maxY = (partitionY == 0) ? Integer.MIN_VALUE : nums2[partitionY - 1];
            int minY = (partitionY == y) ? Integer.MAX_VALUE : nums2[partitionY];
            
            if (maxX <= minY && maxY <= minX) {
                if ((x + y) % 2 == 0) {
                    return ((double)Math.max(maxX, maxY) + Math.min(minX, minY)) / 2;
                } else {
                    return (double)Math.max(maxX, maxY);
                }
            } else if (maxX > minY) {
                high = partitionX - 1;
            } else {
                low = partitionX + 1;
            }
        }
        
        throw new IllegalArgumentException("Input arrays are not sorted");
    }
}

Time and Space Complexity Analysis

Time Complexity: The time complexity of this approach is O(log(min(m, n))). This is because we perform a binary search on the smaller array.
Space Complexity: The space complexity is O(1) because no additional space proportional to the input size is used.

Common Mistakes to Avoid

Not swapping arrays if needed: Ensure you always perform the binary search on the smaller array. This aids in minimizing the complexity.
Boundary Conditions: Ensure your partition correctly handles all boundaries (checking for INT_MIN/INT_MAX) to avoid index out-of-bound errors.