Max number of non overlapping substrings – Strongly Connected Component (using Kosaraju algorithm)

Problem Overview
The problem "Maximum Number of Non-Overlapping Substrings" (LeetCode #1520) requires us to find the maximum number of non-overlapping substrings from a given string. A substring is considered valid if for every character in the substring all instances of that character are contained within it. The goal is then to find the maximum number of non-overlapping such substrings.
Understanding the Problem
Given a string s
, identify the maximum set of non-overlapping substrings that are valid. The intuitive way to approach this is by first determining the smallest possible substring for each character while containing all its occurrences, and then figuring out how many such substrings we can create without overlappings.
To clarify, a substring s[i:j]
is valid if for every character present between indices i
and j
, all occurrences of that character in the string s
lie between i
and j
.
Key DSA Concepts and Theory
Graph Theory and Intervals
Although the problem explicitly deals with strings and substrings, the solution revolves around interval management and slight graph theory concepts:
- Intervals: We treat each character's first and last occurrence as an interval. A valid substring for that character must include all characters within its interval.
- Greedy Algorithm: This approach ensures that the selected substrings are intrinsically maximal yet non-overlapping.
Greedy Approach
A greedy strategy is ideal for this problem:
- Calculate the minimum interval for each character enclosing all its occurrences.
- Sort these intervals and select those that result in a maximal number of non-overlapping substrings.
Solution Approach
Step-by-Step Explanation
Identify Character Limits:
Compute the start and end indices for every character in the string. The character limits should capture every occurrence of the character in the string.Determine Valid Intervals:
For each character, expand its interval to ensure it encompasses all occurrences of intermediate characters within its range.Select Non-Overlapping Intervals: Sort these intervals by their end indices and use a greedy approach to select the maximum number of non-overlapping intervals.
Iterate and Construct Substrings: Iterate over the sorted valid intervals and construct substrings, ensuring that no two substrings overlap.
C++ Code
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
using namespace std;
vector<string> maxNumOfSubstrings(string s) {
int n = s.length();
vector<int> start(26, -1), end(26, -1);
for (int i = 0; i < n; ++i) {
if (start[s[i] - 'a'] == -1) start[s[i] - 'a'] = i;
end[s[i] - 'a'] = i;
}
vector<pair<int, int>> intervals;
for (int i = 0; i < n; ++i) {
if (i == start[s[i] - 'a']) {
int e = end[s[i] - 'a'];
for (int j = i; j <= e; ++j) {
if (start[s[j] - 'a'] < i)
e = max(e, end[s[j] - 'a']);
}
intervals.push_back({i, e});
}
}
sort(intervals.begin(), intervals.end(), [&](pair<int, int> a, pair<int, int> b) {
return a.second < b.second;
});
vector<string> result;
int prev_end = -1;
for (auto& [i, e] : intervals) {
if (i > prev_end) {
result.push_back(s.substr(i, e - i + 1));
prev_end = e;
}
}
return result;
}
int main() {
string s = "adefaddaccc";
vector<string> result = maxNumOfSubstrings(s);
for (string str : result) {
cout << str << " ";
}
return 0;
}
Java Code
import java.util.*;
public class MaxNonOverlappingSubstrings {
public List<String> maxNumOfSubstrings(String s) {
int n = s.length();
int[] start = new int[26], end = new int[26];
Arrays.fill(start, -1);
for (int i = 0; i < n; ++i) {
if (start[s.charAt(i) - 'a'] == -1) start[s.charAt(i) - 'a'] = i;
end[s.charAt(i) - 'a'] = i;
}
List<int[]> intervals = new ArrayList<>();
for (int i = 0; i < n; ++i) {
if (i == start[s.charAt(i) - 'a']) {
int e = end[s.charAt(i) - 'a'];
for (int j = i; j <= e; ++j) {
if (start[s.charAt(j) - 'a'] < i)
e = Math.max(e, end[s.charAt(j) - 'a']);
}
intervals.add(new int[]{i, e});
}
}
intervals.sort((a, b) -> a[1] - b[1]);
List<String> result = new ArrayList<>();
int prev_end = -1;
for (int[] inter : intervals) {
if (inter[0] > prev_end) {
result.add(s.substring(inter[0], inter[1] + 1));
prev_end = inter[1];
}
}
return result;
}
public static void main(String[] args) {
MaxNonOverlappingSubstrings solver = new MaxNonOverlappingSubstrings();
String s = "adefaddaccc";
List<String> result = solver.maxNumOfSubstrings(s);
for (String str : result) {
System.out.print(str + " ");
}
}
}
Time and Space Complexity Analysis
Time Complexity: O(n + m log m), where n is the length of the string, and m is the number of characters. The time complexity arises from scanning the string, calculating intervals, and sorting the interval list.
Space Complexity: O(m), used for maintaining the start and end index arrays for each character.
Common Mistakes to Avoid
Missing Intermediate Characters: Ensure each interval fully encapsulates any intermediate characters between its start and end, which might extend the interval unexpectedly.
Incorrect Interval Merging: During interval expansion, it is crucial to correctly adjust the end of the interval to contain all influenced occurrences.
Similar Problems on LeetCode
- Meeting Rooms (#252)
- Merge Intervals (#56)
- Interval List Intersections (#986)
Additional Resources and References
- Greedy Algorithms - GeeksforGeeks
- Understanding Interval Merging - LeetCode Discuss
- Graph Theory for Beginners - Tutorialspoint
By carefully understanding and analyzing the character limits and implementing a greedy strategy, it's possible to effectively solve this problem, maximizing the number of non-overlapping substrings in an efficient manner.