Find Median from Data Stream

Hero Image

Problem Overview

The problem "Find Median from Data Stream" is categorized under LeetCode's heap problems. It involves designing a data structure that efficiently handles a dynamic set of numbers to allow retrieval of its median value at any time. This problem spans across maintaining an ongoing median calculation as new elements are continually added, presenting a classic case for utilizing heaps.

URL: Find Median from Data Stream

Understanding the Problem

In basic terms, a median is the middle value of a dataset. In a static array, this is straightforward: sort the data and find the middle element (or the average of two middle elements for even-numbered datasets). However, in this problem, we must manage a continuous data stream, making the typical sorting approach inefficient.

The task requires constructing a MedianFinder class that:

  • Supports adding numbers from a stream.
  • Retrieves the median value with efficient time complexity.

Here's an outline of the required operations:

  • addNum(int num): Incorporate a new number from the stream.
  • findMedian(): Retrieve the median of all numbers received so far.

Key DSA Concepts and Theory

Heaps:

A heap is a specialized tree-based data structure that satisfies the heap property:

  • In a max heap, for any node i, the value of i is greater than or equal to the values of its children.
  • In a min heap, for any node i, the value of i is less than or equal to the values of its children.

For this problem:

  • We use two heaps: a max heap to store the smaller half of the numbers, and a min-heap for the larger half.

Median Calculation:

  • If the dataset is odd, the median is the middle element.
  • If even, it's the average of the two middle elements.

Using heaps efficiently allows maintaining the middle elements without sorting the entire dataset repeatedly.

Solution Approach

The core idea is to employ two heaps:

  • Max Heap (lower_half): To keep track of the smaller half of the numbers.
  • Min Heap (upper_half): To maintain the larger half.

Steps:

  1. Insertion:

    • If the new number is less than or equal to the maximum number in lower_half (max-heap), add it to lower_half.
    • Otherwise, add it to upper_half (min-heap).
  2. Balancing:

    • Ensure both heaps are balanced (difference in size is at most one).
    • If lower_half has more elements, move the root of lower_half to upper_half.
    • Vice versa if upper_half has more elements.
  3. Finding the Median:

    • If the heaps are of equal length, the median is the average of their roots.
    • If one heap has more elements, the root of that heap is the median.

Code Implementation

C++

#include <queue>
using namespace std;

class MedianFinder {
private:
    priority_queue<int> maxHeap; 
    priority_queue<int, vector<int>, greater<int>> minHeap; 

public:
    MedianFinder() {}

    void addNum(int num) {
        if (maxHeap.empty() || num <= maxHeap.top()) {
            maxHeap.push(num);
        } else {
            minHeap.push(num);
        }
        
        // Balancing the heaps
        if (maxHeap.size() > minHeap.size() + 1) {
            minHeap.push(maxHeap.top());
            maxHeap.pop();
        } else if (minHeap.size() > maxHeap.size()) {
            maxHeap.push(minHeap.top());
            minHeap.pop();
        }
    }
    
    double findMedian() {
        if (maxHeap.size() == minHeap.size()) {
            return (maxHeap.top() + minHeap.top()) / 2.0;
        } else {
            return maxHeap.top();
        }
    }
};

Java

import java.util.PriorityQueue;

class MedianFinder {
    private PriorityQueue<Integer> maxHeap;
    private PriorityQueue<Integer> minHeap;

    public MedianFinder() {
        maxHeap = new PriorityQueue<>((a, b) -> b - a);
        minHeap = new PriorityQueue<>();
    }

    public void addNum(int num) {
        if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
            maxHeap.offer(num);
        } else {
            minHeap.offer(num);
        }
        
        // Balancing the heaps
        if (maxHeap.size() > minHeap.size() + 1) {
            minHeap.offer(maxHeap.poll());
        } else if (minHeap.size() > maxHeap.size()) {
            maxHeap.offer(minHeap.poll());
        }
    }
    
    public double findMedian() {
        if (maxHeap.size() == minHeap.size()) {
            return (maxHeap.peek() + minHeap.peek()) / 2.0;
        } else {
            return maxHeap.peek();
        }
    }
}

Time and Space Complexity Analysis

Time Complexity:

  • addNum(int num): O(log n) due to heap insertions.
  • findMedian(): O(1) because it simply reads the top elements of the heaps.

Space Complexity:

  • O(n), where n is the number of elements, since each element will eventually be stored in one of the two heaps.

Common Mistakes to Avoid

  1. Neglecting the balance: Always ensure that the heaps are balanced correctly after every insertion.
  2. Incorrect Median Computation: Failures in using the right roots when the heap sizes differ.

Similar Problems on LeetCode

  1. Sliding Window Median (Hard) - LeetCode 480
  2. Kth Largest Element in a Stream (Easy) - LeetCode 703
  3. Merge k Sorted Lists (Hard) - LeetCode 23

Additional Resources and References