**Question: **There is a huge file containing billions of integers. Find maximum 20 integers form that file.

(1st Question form Amazon 2nd telephonic)

**Answer: **Always remember that when you are asked about these types of question where you have to find max N elements or so, best DS to use id HEAP. Here also, I took some time in thinking but finally decide to use heap.

A solution for this problem may be to divide the data in some sets of 1000 elements (let’s say 1000), make a heap of them, and take 20 elements from each heap one by one. Finally heap sort all the sets of 20 elements and take top 20 among those. But the problem in this approach is where to store 20 elements from each heap. That may require a large amount of memory as we have billions of numbers.

Reuse top 20 elements from earlier heap in subsequent elements can solve this problem. I meant to take first block of 1000 elements and subsequent blocks of 980 elements each. Initially heapsort first set of 1000 numbers, took max 20 elements and mix them with 980 elements of 2^{nd} set. Again heapsort these 1000 numbers (20 from first set and 980 from 2^{nd} set), take 20 max element and mix those with 980 elements of 3^{rd} set. Repeat the same exercise till last set of 980 (or less) elements and take max 20 elements from final heap. Those 20 elements will be your answer.

**Complexity:** O(n) = n/1000*(complexity of heapsort 1000 elements)

Since complexity of heap sorting 1000 elements will be a constant so the **O(n) = N i.e. linear complexity **:)

**PS:** Meanwhile this problem, interviewer asked me to give him an algorithm to create a heap, removing and element from heap and heapify procedure. So, I again insist to have sound knowledge of any data structure you use.

**Subscribe**- To get an automatic feed of all future posts subscribe here, or to receive them via email go here and enter your email address in the box. You can also like us on facebook and follow me on Twitter @akashag1001.
SohilSince complexity of heap sorting 1000 elements will be a constant so the O(n) = N i.e. linear complexity :)

Above needs a correction

The heap sort is nlogn and NOT n complexity.

secondly you need not sort entire arry, just heapify it 20 times

so complexity is 20*nlogn at each step

and over all will be

m/1000*20*nlogn

where n=1000 and m = no of elem

ie over all cmpx --

m/1000 *20 *1000log 1000

SohilThere is another way to solve the prob.

Maintain a MIN heap of 20 elements and iterate through given data

each time a no is greater then min in heap replace min with that number

Worst case compl --

m*log20