Find Median from Data Stream

Problem Overview
The problem "Find Median from Data Stream" is categorized under LeetCode's heap problems. It involves designing a data structure that efficiently handles a dynamic set of numbers to allow retrieval of its median value at any time. This problem spans across maintaining an ongoing median calculation as new elements are continually added, presenting a classic case for utilizing heaps.
URL: Find Median from Data Stream
Understanding the Problem
In basic terms, a median is the middle value of a dataset. In a static array, this is straightforward: sort the data and find the middle element (or the average of two middle elements for even-numbered datasets). However, in this problem, we must manage a continuous data stream, making the typical sorting approach inefficient.
The task requires constructing a MedianFinder
class that:
- Supports adding numbers from a stream.
- Retrieves the median value with efficient time complexity.
Here's an outline of the required operations:
addNum(int num)
: Incorporate a new number from the stream.findMedian()
: Retrieve the median of all numbers received so far.
Key DSA Concepts and Theory
Heaps:
A heap is a specialized tree-based data structure that satisfies the heap property:
- In a max heap, for any node
i
, the value ofi
is greater than or equal to the values of its children. - In a min heap, for any node
i
, the value ofi
is less than or equal to the values of its children.
For this problem:
- We use two heaps: a max heap to store the smaller half of the numbers, and a min-heap for the larger half.
Median Calculation:
- If the dataset is odd, the median is the middle element.
- If even, it's the average of the two middle elements.
Using heaps efficiently allows maintaining the middle elements without sorting the entire dataset repeatedly.
Solution Approach
The core idea is to employ two heaps:
- Max Heap (
lower_half
): To keep track of the smaller half of the numbers. - Min Heap (
upper_half
): To maintain the larger half.
Steps:
Insertion:
- If the new number is less than or equal to the maximum number in
lower_half
(max-heap), add it tolower_half
. - Otherwise, add it to
upper_half
(min-heap).
- If the new number is less than or equal to the maximum number in
Balancing:
- Ensure both heaps are balanced (difference in size is at most one).
- If
lower_half
has more elements, move the root oflower_half
toupper_half
. - Vice versa if
upper_half
has more elements.
Finding the Median:
- If the heaps are of equal length, the median is the average of their roots.
- If one heap has more elements, the root of that heap is the median.
Code Implementation
C++
#include <queue>
using namespace std;
class MedianFinder {
private:
priority_queue<int> maxHeap;
priority_queue<int, vector<int>, greater<int>> minHeap;
public:
MedianFinder() {}
void addNum(int num) {
if (maxHeap.empty() || num <= maxHeap.top()) {
maxHeap.push(num);
} else {
minHeap.push(num);
}
// Balancing the heaps
if (maxHeap.size() > minHeap.size() + 1) {
minHeap.push(maxHeap.top());
maxHeap.pop();
} else if (minHeap.size() > maxHeap.size()) {
maxHeap.push(minHeap.top());
minHeap.pop();
}
}
double findMedian() {
if (maxHeap.size() == minHeap.size()) {
return (maxHeap.top() + minHeap.top()) / 2.0;
} else {
return maxHeap.top();
}
}
};
Java
import java.util.PriorityQueue;
class MedianFinder {
private PriorityQueue<Integer> maxHeap;
private PriorityQueue<Integer> minHeap;
public MedianFinder() {
maxHeap = new PriorityQueue<>((a, b) -> b - a);
minHeap = new PriorityQueue<>();
}
public void addNum(int num) {
if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
maxHeap.offer(num);
} else {
minHeap.offer(num);
}
// Balancing the heaps
if (maxHeap.size() > minHeap.size() + 1) {
minHeap.offer(maxHeap.poll());
} else if (minHeap.size() > maxHeap.size()) {
maxHeap.offer(minHeap.poll());
}
}
public double findMedian() {
if (maxHeap.size() == minHeap.size()) {
return (maxHeap.peek() + minHeap.peek()) / 2.0;
} else {
return maxHeap.peek();
}
}
}
Time and Space Complexity Analysis
Time Complexity:
addNum(int num)
:O(log n)
due to heap insertions.findMedian()
:O(1)
because it simply reads the top elements of the heaps.
Space Complexity:
O(n)
, wheren
is the number of elements, since each element will eventually be stored in one of the two heaps.
Common Mistakes to Avoid
- Neglecting the balance: Always ensure that the heaps are balanced correctly after every insertion.
- Incorrect Median Computation: Failures in using the right roots when the heap sizes differ.
Similar Problems on LeetCode
- Sliding Window Median (Hard) - LeetCode 480
- Kth Largest Element in a Stream (Easy) - LeetCode 703
- Merge k Sorted Lists (Hard) - LeetCode 23
Additional Resources and References
- For a more comprehensive understanding of heaps, consider reviewing this detailed guide on heap data structures.
- GeeksforGeeks Heap Data Structure: A resource to go through heap operations.
- Consider the Coursera Algorithms Specialization for practical courses on algorithmic strategies.
- Video tutorial on heaps from Abdul Bari, providing a thorough introduction to the concept of heaps.