17

Jul

2014

### Finding the Nth ordered element in a large array

A common task when working with large arrays is to find the Nth array value in the ordered array. This can be useful for finding the Nth smallest or largest pixel value, as well as for statistical analysis of floating point data samples, (i.e. find the 95% percentile or similar). The shortest IDL code for finding the Nth value in the ordered sequence is only 2 lines of actual code. Here is a short function that accomplishes this:

;+

; Returns the Nth number in theordered sequence

;-

**function** **ordinal_1**, array, N

**compile_opt** idl2,logical_predicate

s = **sort**(array)

**return**, array[s[N]]

**end**

However, because sort is an expensive computation, it runs fairly slow, especially, when the array gets larger. I did the following time test.

IDL> a = **total**(**read_image**(**filepath**('ohare.jpg',subdir=['examples','data'])),**1**)

IDL> **tic** & x = **ordinal_1**(a, **123456**) & **toc** & **print**, x

% Time elapsed: 3.7200000 seconds.

150.000

In my case it took 3.72 seconds to find the 123456th smallest array element. The MEDIAN function in IDL, returns the central element in the ordered sequence without doing a full sorting. It is much faster than sorting, because it doesn't have to keep track of all elements and their ordered positions. It only cares about element N/2 in the ordered array. In the following example, repeated calls to MEDIAN and reducing the array size in half every iteration, is used to find the Nth element in the ordered sequence. The code is much longer than the code above, but it does end up running faster:

;+

; Returns the Nth number in theordered sequence.

;

; Uses repeated median.

;-

**function** **ordinal_2**, array, N

**compile_opt** idl2,logical_predicate

na = **n_elements**(array)

type =**size**(array, /type)

target_index = N

tmp = **arg_present**(array) ? array : **temporary**(array)

ntmp = na

**while** ntmp **ne** target_index **do** **begin**

ntmp = **n_elements**(tmp)

val = **fix**(**median**(tmp), type=type)

**if** target_index **gt** ntmp/**2** **then** **begin**

tmp = tmp[**where**(tmp **gt** val, count)]

target_index -= ntmp-count

**endif** **else** **if** target_index **lt** ntmp+**1**/**2** **then** **begin**

tmp = tmp[**where**(tmp **lt** val, count)]

**endif** **else** **break**

**if** target_index **lt** **0** **then** **break**

**if** target_index **ge** count **then** **break**

**if** target_index **eq** **0** **then** **begin**

val = **min**(tmp)

**break**

**endif**

**if** target_index **eq** count-**1** **then** **begin**

val = **max**(tmp)

**break**

**endif**

**endwhile**

**return**, val

**end**

This is the same time test as with the short code:

IDL> **tic** & x = **ordinal_2**(a, **123456**) & **toc** & **print**, x

% Time elapsed: 0.57999992 seconds.

150.000

As can be seen here, the time saving is significant, it goes from 3.72 to 0.58 seconds, and as the array grows larger, the savings can get more significant. This function works for numeric data types such as floating point and integer arrays.

Categories: IDL Data Point