Question: You are given a 2D array like below:

x o o o o x
x x x o o x
o o x o x o
o o x o o o

A blob is defined as 1 or more adjacent (8 adjacent) Xs. Find the number of blobs in the most efficient manner.

Example: Below 2D array has 2 blobs. One blob consists of cells which have red 'x' and another consists of cells which have blue 'x'.


Extension: This question is also asked as to count the number of islands in an ocean where ocean is represented as a grid. Cells, having a part of any island, are marked as 'x' otherwise 'o'.


Answer: There is a recursive solution to check 'x' cells every time you encounter 'x' in any cell. But this solution is very inefficient coz you will be checking 8 cells for every 'x' cell and will result in a number of recursive loops. Also keeping count of blobs in this case is a bit difficult.

An efficient solution will be to mark all the cells in a single blob as used and increase the count by 1 for every blob. For this we need a data structure, which we can use to find cells in a single blob. This can be achieved by using a queue as described below.

While traversing the array in row/column major order, as soon as you encounter an unused 'x' cell, push it in queue. Also increment the count by 1 and mark this cell as used. Now pop front element from queue and check for its 8 neighbors, if they contain 'x' and not used, put them in queue. Repeat this process until queue is empty.

Using above method, you will be able to mark all cells of a blob as used. Now traverse array further and repeat the same process if you encounter any unused 'x' cell.

Algorithm:
count = 0
while row < total rows
while column < total columns
if cell[row][column] is unused
put in queue
increase count by 1
while queue is not empty
tmp = front element of queue
pop from queue
push unused 'x' neighbors of tmp in queue
end while
end if
end while
end while
return count


Code:

I have used MSB of every cell to use as a flag for used and unused cell. You can always use a different method.
#define MAX 100
#define USED 0x80

struct point
{
int x,y;
};

void push_in_queue(int x, int y, queue<struct point> *q, char mat[MAX][MAX], int nrow, int ncol)
{
if(x < 0 || x >= nrow || y < 0 || y >= ncol)
return;
struct point pnt;
if (mat[x][y] == 'X' && !(USED & mat[x][y]))
{
mat[x][y] = USED | mat[x][y];
pnt.x = x;
pnt.y = y;
q->push(pnt);
}
}

int count_blob(char mat[MAX][MAX], int nrow, int ncol)
{
queue<struct point> que;
int cnt = 0;
int x, y;
point pnt, tmp;
if (nrow == 0 && ncol == 0)
return 0;

for(x = 0; x < nrow; x++)
{
for(y = 0; y < ncol; y++)
{
if (USED & mat[x][y])
continue;

if (mat[x][y] == 'X')
{
cnt += 1;
mat[x][y] = USED | mat[x][y];
pnt.x = x;
pnt.y = y;
que.push(pnt);

while( !que.empty() )
{
tmp = que.front();
que.pop();
push_in_queue(tmp.x - 1, tmp.y, &que, mat, nrow, ncol);
push_in_queue(tmp.x - 1, tmp.y - 1, &que, mat, nrow, ncol);
push_in_queue(tmp.x - 1, tmp.y + 1, &que, mat, nrow, ncol);
push_in_queue(tmp.x + 1, tmp.y, &que, mat, nrow, ncol);
push_in_queue(tmp.x + 1, tmp.y - 1, &que, mat, nrow, ncol);
push_in_queue(tmp.x + 1, tmp.y + 1, &que, mat, nrow, ncol);
push_in_queue(tmp.x, tmp.y - 1, &que, mat, nrow, ncol);
push_in_queue(tmp.x, tmp.y + 1, &que, mat, nrow, ncol);
}
}
}
}
return cnt;
}

Complexity: Since every element of the matrix is processed at the most twice, once while putting in queue and once while normal 2D array traversal, time complexity is O(n) where n is total number of cells. You may argue that we are checking a few element more than 2 times while pushing in the queue, in that case also complexity will be O(n).

Space complexity: Extra space is occupied by queue. Please notice that queue size will never be more than 'n'. You can also deduce that queue size will always be much lesser than 'n' in worst case as well. So space complexity is of order 'n' = O(n).

Subscribe - To get an automatic feed of all future posts subscribe here, or to receive them via email go here and enter your email address in the box. You can also like us on facebook and follow me on Twitter @akashag1001.