Given a non-empty array of integers, return the k most frequent elements.
For example,Given [1,1,1,2,2,3] and k = 2, return [1,2].
Note:
You may assume k is always valid, 1 ≤ k ≤ number of unique elements.Your algorithm's time complexity must be better than O(n log n), where n is the array's size.
This solution is very straight forward, it stores all the elements in a HashMap that maps elements to their frequencies, then inserts all the map entry to a max heap so as to get the ones with the highest frequency. The time complexity is O(nlogn) and space complexity is O(n).
public List<Integer> topKFrequent(int[] nums, int k) { Map<Integer, Integer> map = new HashMap<>(); for(int n : nums){ map.put(n, map.getOrDefault(n,0) + 1); } PriorityQueue<Map.Entry<Integer, Integer>> maxHeap = new PriorityQueue<>((a,b) -> (b.getValue() - a.getValue())); for(Map.Entry<Integer,Integer> entry : map.entrySet()){ maxHeap.add(entry); } List<Integer> res = new ArrayList<>(); while(res.size()<k){ Map.Entry<Integer, Integer> entry = maxHeap.poll(); res.add(entry.getKey()); } return res; }
Since sorting takes O(nlogn), can we accelerate the sorting process by using extra space? The answer is yes, bucket sort.
bucketSort(arr[], n) 1) Create n empty buckets (Or lists). 2) Do following for every array element arr[i]. .......a) Insert arr[i] into bucket[n*array[i]] 3) Sort individual buckets using insertion sort. 4) Concatenate all sorted buckets.
public List<Integer> topKFrequent(int[] nums, int k) { List<Integer>[] bucket = new List[nums.length + 1]; Map<Integer, Integer> frequencyMap = new HashMap<>(); for (int n : nums) { frequencyMap.put(n, frequencyMap.getOrDefault(n, 0) + 1); } for (int key : frequencyMap.keySet()) { int frequency = frequencyMap.get(key); if (bucket[frequency] == null) { bucket[frequency] = new ArrayList<>(); } bucket[frequency].add(key); } List<Integer> res = new ArrayList<>(); for (int pos = bucket.length - 1; pos >= 0 && res.size() < k; pos--) { if (bucket[pos] != null) { res.addAll(bucket[pos]); } } return res; }
If we replace the input array with a data stream, it becomes harder for a single machine to store everything in a single HashMap, it’s also impossible to perform sorting for such large data set. Let’s see what can we do if we can use multi machines to handle this work.
We can split the data into n shardings and move a record to a instance respectively. Instance0 only handles the data with its hash code equals hash(item) % n == 0, Instance1 only handles the data with hash(item) % n == 1, etc.
For each instance, we have a HashMap and Heap to store the items as in the first solution for the leetcode question.
In order to retrieve the global top k elements, we only need to gather the heaps from all the instances and run a merge sort, it’s very similar to the leetcode question: 23. Merge k Sorted Lists.
Even though we distribute the work load to multiple machines to reduce the data size to 1/n, it’s still not the optimal way to handle large data set in daily work.
See more details at:
https://zpjiang.me/2017/11/13/top-k-elementes-system-design/
reference: https://zpjiang.me/2017/11/13/top-k-elementes-system-design/
转载于:https://www.cnblogs.com/hygeia/p/5759512.html
相关资源:JAVA上百实例源码以及开源项目