Sunday, 16 July 2017

Data Structures and Algorithms | Complexity Analysis | Linear and Binary Search







When we write code, it often happens that the functionality is met, but the way the code runs still doesn’t please the perfectionist developer inside us.

Those buddies who have dealt with difficult stakeholders, clients or even interviewers may know the pain when someone says “Your code works fine, but it still doesn’t look efficient…”.

To avoid such encounters, it is crucial to pay attention to the complexity analysis of algorithms from the very early stages of a career. But as we always say, it's never too late to start a good practice.

So this time, we will use this post to go through, or revisit, the concepts of ‘Complexity Analysis’.

And it is promised that the following posts will dig deeper into the ‘Data Structures and Algorithms’ arena.




How is complexity analysed?

There are N number of ways to do so, but the two heroes that beat us most often while writing code are -
  1. Space Complexity
  2. Time Complexity

Disclaimer: The deeper you go into these topics, the more you will drift towards the Maths you saw during Engineering :P

But we will stick to the useful take-aways that will help us in day-to-day coding and facing challenging problems.




Complexity here considers the memory the code is consuming and the time it is taking to process.




Space Complexity

With respect to JAVA, we can figure out how memory will be used while we are writing the code. What to consider here is -
1. Fixed consumption
The variables and constants that are always allocated, whichever code path is taken.
2. Variable consumption
The variables that are allocated only on some code paths, for conditional assignments and calculations.




I would like to illustrate the above with a small example. Consider the following snippet -

int a = 2;   // fixed 4 bytes
int b = 3;   // fixed 4 bytes

if (a > b) {
    return a;   // conditional, 0 extra bytes
} else {
    // conditional:
    // allocates additional memory (4 bytes) to store c
    // and do the calculations
    int c = a*2 + b*3;
    return c;
}


Cool Tip:
For more info, we can spend some time going through computer science books and learn how internal memory allocation works.





Time Complexity

This champ will be in the limelight if you are aiming at cracking interviews at IT giants, or want to mark your presence in various coding competitions.

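It covers the time the code takes, from compiling to processing, and helps us make sure that the code works within the expected time limits.

So here the fixed factor is the ‘Compile Time’, which is paid once, and the variable one is the ‘Process Time’, which grows with the size of the input and is what the analysis really focuses on.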

I would like to share 3 approaches we can follow. I read them in a very good article long back and have consolidated them briefly here, but unfortunately I forgot the source.




1. Note the time before the operation starts and after it ends. The difference gives you an idea of the time taken.
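
A minimal sketch of approach 1 in Java could look like this. The class TimingDemo and the workload sumUpTo are hypothetical names made up for this illustration; only System.nanoTime() is a real JDK call.

```java
class TimingDemo {
    // Hypothetical workload (an assumption for this demo): sums the integers 1..n.
    static long sumUpTo(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Notes the time before and after the workload and returns the difference in nanoseconds.
    static long timeSumUpTo(int n) {
        long start = System.nanoTime();
        long result = sumUpTo(n);
        long end = System.nanoTime();
        // Printing the result keeps the work from being optimized away.
        System.out.println("sum = " + result);
        return end - start;
    }

    public static void main(String[] args) {
        long elapsedNanos = timeSumUpTo(1_000_000);
        System.out.println("Elapsed: " + (elapsedNanos / 1_000) + " microseconds");
    }
}
```

Keep in mind that a single measurement like this is noisy (JVM warm-up, other processes), so treat it as a rough idea and not an exact figure.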

2. Do a dry run, i.e. grab a pen and paper and execute the code step by step. This way you will walk through all the loops and iterations and get an idea of how effective your code is, whether there is any way to make it run better, whether there are any unnecessary iterations, and so on.
A very helpful outcome of the point#2 strategy is that you can answer confidently when someone asks how many iterations the code takes for N elements.

For example,
When you are iterating over an N x N matrix, you are clearly using 2 nested for loops.
Outer loop iteration 1 -> inner full iterations - N
Outer loop iteration 2 -> inner full iterations - N
Outer loop iteration 3 -> inner full iterations - N
.
. and so on...
.
Outer loop iteration N -> inner full iterations - N

Hence it takes N^2 steps.
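
The same dry run can be handed over to the machine. Here is a small sketch, where MatrixSteps and countSteps are hypothetical names chosen for this demo, that simply counts how many times the inner loop body runs:

```java
class MatrixSteps {
    // Counts the inner-loop executions while walking an n x n grid.
    static long countSteps(int n) {
        long steps = 0;
        for (int row = 0; row < n; row++) {        // outer loop: n iterations
            for (int col = 0; col < n; col++) {    // inner loop: n iterations per outer iteration
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 10, 100}) {
            System.out.println("N = " + n + " -> steps = " + countSteps(n));
        }
    }
}
```

For N = 3 the counter ends at 9 and for N = 100 at 10000, which matches the N^2 from the pen-and-paper run.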

But imagine the number of iterations growing to millions or billions; you will be clever enough not to start a dry run with numbers that big. This is where point#3 comes to the rescue.

3. Asymptotic Analysis.

Where your maths and calculus skills will be at stake. (Kidding, just work smart here and learn what exactly it tries to say!)
This analysis aims at giving an estimate of the code’s behavior when the number of iterations is too large to track by hand, and it comes in 3 flavors -

A. Big O Notation
Upper Bound - the running time grows no faster than g(n).
It is the notation usually quoted for the Worst Case - when you get the result in the last step.
denoted as

                    f(n) = O(g(n))

if and only if there exist constants k > 0 and n0 such that,

                    |f(n)| ≤ k·|g(n)|  ∀ n > n0

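B. Big Theta Notation
Tight Bound - the running time grows at the same rate as g(n), bounded from above and from below.
It is often loosely associated with the Average Case - what it generally takes on an average to get the result.
denoted as

                    f(n) = θ(g(n))

if and only if there exist constants j > 0, k > 0 and n0 such that,

                    j·|g(n)| ≤ |f(n)| ≤ k·|g(n)|  ∀ n > n0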

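C. Big Omega Notation
Lower Bound - the running time grows at least as fast as g(n).
It is often loosely associated with the Best Case - when you get the result in the first step.
denoted as

                    f(n) = Ω(g(n))

if and only if there exist constants k > 0 and n0 such that,

                    |f(n)| ≥ k·|g(n)|  ∀ n > n0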
Thanks to this answer on Quora for sharing formulas, and making my life easier. LOL.
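
To make the three definitions a bit more concrete, here is a small worked example. The function f(n) = 3n + 2 and the constants picked below are assumptions chosen purely for illustration (they are not from the Quora answer), and n_0 stands for the threshold beyond which the inequality has to hold.

```latex
\begin{align*}
f(n) &= 3n + 2, \qquad g(n) = n \\
\text{Upper bound:}\quad 3n + 2 &\le 4n \ \text{ for all } n > 2
  &&\Rightarrow\ f(n) = O(n) \ \text{ with } k = 4,\ n_0 = 2 \\
\text{Lower bound:}\quad 3n + 2 &\ge 3n \ \text{ for all } n > 0
  &&\Rightarrow\ f(n) = \Omega(n) \ \text{ with } k = 3,\ n_0 = 0 \\
\text{Both together:}\quad 3n \le 3n + 2 &\le 4n \ \text{ for all } n > 2
  &&\Rightarrow\ f(n) = \Theta(n) \ \text{ with } j = 3,\ k = 4,\ n_0 = 2
\end{align*}
```

So the constant 2 and the factor 3 do not matter in the end; only the growth rate n survives.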




If you have time, check this out to see how these are actually understood with the help of graphs.

So you can take away the following for now, if you do not have enough time to explore -

1. Use Big O in interviews and technical discussions, because no one will hand you the best case scenario. Always prepare for the worst case scenario and hope for the best ;)
2. Big Theta is helpful to generalize an algo, using the average cases. Hence do some head banging on it whenever time allows.
3. Big Omega is a dreamy concept, hence you can leave it for the last.




Moving on, as is the custom here on TCB, we believe in demos and practicals more than theory, so here we go.


Kicking off with the first step in Data Structures - Searching an element.


So broadly speaking we have 2 types of searches -

Unordered and Ordered.


Unordered -
That means no order is being followed in the elements’ placement. In the very worst case, you can assume that the element to be searched is at the end while you are scanning from the beginning.
So if there are N elements, it requires N iterations and the operation would be of order O(N).
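
As a minimal sketch, a scan over an unordered array could look like this in Java. LinearSearchDemo and linearSearch are hypothetical names used only for this demo.

```java
class LinearSearchDemo {
    // Scans from the beginning and returns the index of target, or -1 if it is absent.
    // Worst case: target is the last element (or missing), so all N elements are visited.
    static int linearSearch(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {7, 3, 9, 1, 5};
        System.out.println(linearSearch(data, 5));   // last element: all 5 slots are visited
        System.out.println(linearSearch(data, 42));  // not present
    }
}
```

Searching for 5 here is exactly the worst case described above: the element sits at the end, so the loop runs N times.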


Ordered -
That means there is a logical order in which the elements are placed, either natural (numeric) or alphabetical. 2 generally used searches on ordered elements - Linear and Binary.

For Linear search, efficiency depends on the average number of steps taken, i.e. if there are N elements, it takes about N/2 steps on average. Constant factors are dropped in Big O, so the operation is still of order O(N).

For Binary search also, the efficiency depends on the number of steps taken, but since each iteration leaves only half of the elements still to be searched, the overall behavior follows the mathematical log function.
Hence it reflects as O(log2(N)).
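
Here is a rough sketch of the halving idea in Java, assuming the array is already sorted in ascending order. BinarySearchDemo and binarySearch are hypothetical names for this demo.

```java
class BinarySearchDemo {
    // Returns the index of target in an ascending sorted array, or -1 if it is absent.
    // Each pass through the loop discards half of the remaining range.
    static int binarySearch(int[] sorted, int target) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;  // written this way to avoid int overflow
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                low = mid + 1;   // target can only be in the right half
            } else {
                high = mid - 1;  // target can only be in the left half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(binarySearch(sorted, 7));
        System.out.println(binarySearch(sorted, 4));
    }
}
```

In day-to-day code, the JDK's own java.util.Arrays.binarySearch does the same job for sorted arrays; the hand-written version is only here to show the halving.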


To see how the search algorithms work, feel free to drop by some previously created blogs, just for you -







Now, let’s take a deep dive into the analysis of the time taken.

Say the time taken is directly proportional to the order of processing; then for Binary search it can be shown in terms of an equation as:
T = K * log2(N), where K is a constant.

To make it clearer how the steps can be worked out, have a look at this example:

For 10 elements →
The number of steps taken would be log2(10) ~= 3.322, rounded up to 4
Shortcut Tip:
Simply find the nearest power of 2, i.e. 2^3 = 8, but it’s less than 10, and we want to cover the entire 10 elements. Hence take 2^4 = 16 > 10, so 4 steps to cover 10 elements.

Similarly, for 100 elements → number of steps = log2(100) = 2 * log2(10) = 2 * 3.322 = 6.644, rounded up to 7 steps
Shortcut Tip:
Simply 2^6 = 64 < 100, hence take 2^7 = 128 > 100, so 7 steps to cover 100 elements... and so on...
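
The shortcut can also be sketched in a few lines of Java: keep doubling a power of 2 until it becomes greater than N, and count the doublings. BinarySearchSteps and maxSteps are hypothetical names for this demo.

```java
class BinarySearchSteps {
    // Returns the smallest s such that 2^s > n, i.e. the number of steps
    // needed to cover n elements with repeated halving.
    static int maxSteps(int n) {
        int steps = 0;
        long power = 1;          // 2^0; long so the doubling cannot overflow for any int n
        while (power <= n) {
            power *= 2;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1_000_000}) {
            System.out.println("N = " + n + " -> steps = " + maxSteps(n));
        }
    }
}
```

This gives 4 steps for 10 elements and 7 steps for 100 elements, the same numbers as the shortcut tips.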

Practice Tip:
Do some mathematical analysis on finding the number of steps taken. Once you are done with this, it won’t look so difficult.

Before leaving, let’s have a quick look at some takeaways -

Big O Notation - a rough measure of complexity
T:   time taken
N:  number of elements


1. Insert an element into an unordered array -

T = K (constant), since it does not depend on N. Complexity ~ O(1)

2. Performing a Linear Search

T = K * N/2. Complexity ~ O(N)

3. Performing Binary Search

T = K * log2(N). Complexity ~ O(log2(N))

4. Insert and delete in an ordered array, and delete in an unordered array

Complexity ~ O(N)
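
To see why the unordered insert is O(1) while the ordered insert is O(N), here is a small sketch. ArrayInsertDemo and its two methods are hypothetical names, and the code assumes the array still has at least one free slot at the end.

```java
class ArrayInsertDemo {
    // Unordered insert: drop the value into the next free slot.
    // No existing element moves, so the cost does not depend on N: O(1).
    static int insertUnordered(int[] data, int size, int value) {
        data[size] = value;
        return size + 1;  // new number of stored elements
    }

    // Ordered insert: shift every larger element one slot to the right,
    // then place the value in the gap. Up to N elements move: O(N).
    static int insertOrdered(int[] data, int size, int value) {
        int i = size - 1;
        while (i >= 0 && data[i] > value) {
            data[i + 1] = data[i];
            i--;
        }
        data[i + 1] = value;
        return size + 1;  // new number of stored elements
    }

    public static void main(String[] args) {
        int[] ordered = new int[5];
        ordered[0] = 1; ordered[1] = 3; ordered[2] = 7;
        int size = insertOrdered(ordered, 3, 4);
        for (int i = 0; i < size; i++) {
            System.out.print(ordered[i] + " ");
        }
        System.out.println();
    }
}
```

Deletion follows the same logic in reverse: the elements after the removed one have to be shifted left to close the gap, which is why it is O(N) even for an unordered array.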

And the rest is up to you. :)

You can dig as much as you want, and do share your thoughts and feedback.
Those are always welcome and looked forward to, for making this place a better one for the coders.




The next series of blogs will cover a few more Data Structures concepts.




Cheers!!

Happy Coding.





















