*Top K Frequent Elements

mac2022-06-30 91

Given a non-empty array of integers, return the k most frequent elements.

For example,Given [1,1,1,2,2,3] and k = 2, return [1,2].

Note:

You may assume k is always valid, 1 ≤ k ≤ number of unique elements.Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

Solution 1: HashMap + Heap

This solution is very straight forward, it stores all the elements in a HashMap that maps elements to their frequencies, then inserts all the map entry to a max heap so as to get the ones with the highest frequency. The time complexity is O(nlogn) and space complexity is O(n).

public List<Integer> topKFrequent(int[] nums, int k) { Map<Integer, Integer> map = new HashMap<>(); for(int n : nums){ map.put(n, map.getOrDefault(n,0) + 1); } PriorityQueue<Map.Entry<Integer, Integer>> maxHeap = new PriorityQueue<>((a,b) -> (b.getValue() - a.getValue())); for(Map.Entry<Integer,Integer> entry : map.entrySet()){ maxHeap.add(entry); } List<Integer> res = new ArrayList<>(); while(res.size()<k){ Map.Entry<Integer, Integer> entry = maxHeap.poll(); res.add(entry.getKey()); } return res; }

Solution 2: Bucket Sort

Since sorting takes O(nlogn), can we accelerate the sorting process by using extra space? The answer is yes, bucket sort.

bucketSort(arr[], n) 1) Create n empty buckets (Or lists). 2) Do following for every array element arr[i]. .......a) Insert arr[i] into bucket[n*array[i]] 3) Sort individual buckets using insertion sort. 4) Concatenate all sorted buckets.

public List<Integer> topKFrequent(int[] nums, int k) { List<Integer>[] bucket = new List[nums.length + 1]; Map<Integer, Integer> frequencyMap = new HashMap<>(); for (int n : nums) { frequencyMap.put(n, frequencyMap.getOrDefault(n, 0) + 1); } for (int key : frequencyMap.keySet()) { int frequency = frequencyMap.get(key); if (bucket[frequency] == null) { bucket[frequency] = new ArrayList<>(); } bucket[frequency].add(key); } List<Integer> res = new ArrayList<>(); for (int pos = bucket.length - 1; pos >= 0 && res.size() < k; pos--) { if (bucket[pos] != null) { res.addAll(bucket[pos]); } } return res; }

Top K Frequent Elements in a Data Stream (System Design)

If we replace the input array with a data stream, it becomes harder for a single machine to store everything in a single HashMap, it’s also impossible to perform sorting for such large data set. Let’s see what can we do if we can use multi machines to handle this work.

Solution 1: Multi HashMap + heap

We can split the data into n shardings and move a record to a instance respectively. Instance0 only handles the data with its hash code equals hash(item) % n == 0, Instance1 only handles the data with hash(item) % n == 1, etc.

For each instance, we have a HashMap and Heap to store the items as in the first solution for the leetcode question.

In order to retrieve the global top k elements, we only need to gather the heaps from all the instances and run a merge sort, it’s very similar to the leetcode question: 23. Merge k Sorted Lists.

Even though we distribute the work load to multiple machines to reduce the data size to 1/n, it’s still not the optimal way to handle large data set in daily work.

See more details at:

https://zpjiang.me/2017/11/13/top-k-elementes-system-design/

reference: https://zpjiang.me/2017/11/13/top-k-elementes-system-design/

转载于:https://www.cnblogs.com/hygeia/p/5759512.html

相关资源：JAVA上百实例源码以及开源项目

最新回复(0)