AI : Search Algorithms – part 1 (BFS, DFS and UFS)
In this blog I am going to share my findings about AI search Algorithms. This will be a big summary about the different kind of search algorithms that are out there and what their respective advantages and disadvantages are.
As a resource, I have read a blog post of Brandon Skerritt (https://skerritt.blog/a-primer-on-search-algorithms/) and a book called Artificial Intelligence : A Modern Approach by Stuart Russell and Peter Norvig. I have also visited multiple website, but there are simply too many in order to list them all.
So to get started, what are search algorithms and when would you use them? By determining what a problem and a solution are, the idea of a search algorithm becomes clearer.
Whats a “problem”?
So we need to determine what the exact definition of a “problem” is. A problem can be defined by the following five components:
-
- The initial state, so where are you located at the beginning.
- A description of all possible actions that can be taken, so given a particular state what actions can be executed.
- A description of actions’ results, so after executing a certain action what happens and what is the current state? The term successor is commonly used to describe the state before executing the action.
- A goal test, this test is used to determine if a certain state is the “goal” state. It is possible for the goal to be specified by an abstract property instead of an explicitly set of states.
- A path cost function, this function assigns a cost to every action that can be taken.
When considering all 5 components that define a problem and when the components are part of a data structure definition, the information can be passed as a input to a problem solving algorithm in order to get solved.
How to solve the problem?
Considering the components initial state, the possible action, and results from the problem definition, the components together form the set of all states reachable from the initial state (root of the graph/search tree). The states are represented by the different nodes in the search tree and the links between the nodes are the different actions that can be taken. By expanding the current state and applying a legal action to the current state, generates a new set of states. Each nodes has its parent and child nodes, unless the node is either the root or a leaf.
First, it is important to understand what a parent and a child node are. So let’s take a random node, which is connected to nodes above it and to nodes below it. The nodes that are a level higher are called the parent nodes, by executing an action in the parent node, it is possible to arrive at the current node. The same applies to child nodes, as by executing an action it is possible to arrive at one of the child nodes of the current node. A leaf node is nothing more than a node without any child nodes. The same applies to the root node, but instead of not having any child nodes, the root node does not have any parent nodes.
Of course, it is possible that nodes are connected and create a loop. This way a state can be repeated as executed actions can return the current state back to a state that was previously visited. This path is called an loopy path.
Example
So to give a visual example of what is meant with starting a initial state and moving towards a goal state by expanding to child nodes, consider the following directed graph.
First of all, the initial state needs to be determined. Looking at the directed graph it should be obvious that the node with number 8 in it is the root of the graph. This is because it does not have any directed links to it and this means that it does not have any parent nodes. Let’s call a set of states S. S now contains 8 and can be written as S = { 8 }
Second of all, determine what the child nodes are. In this case the child nodes of 8 are 4 and 12 as a directed link goes from 8 to both 4 and 12. This is actually the expansion of the initial state. From this point, an action can be executed, which result in the expansion of one of the child nodes. The set of states in this case could be S = { 8, 4}
There are different ways of “walking” through a graph. This “walking” through a graph is exactly what a search algorithm and will be discussed in the next paragraph.
What’s Search Algorithm?
A search algorithm is, like previously stated, nothing more than moving through a graph. However, there are different kind of strategies that can be applied in the search of a goal state. There are two kind of search algorithms, the uniformed and the informed search algorithms. In this blogpost only Bread first search, depth first search, and uniform cost search will be discussed.
To make a distinction between search algorithms and to determine the advantages and disadvantages, the performance of the algorithms need to be determined. This can be done by checking the following points:
One of the important things to check is the Completeness of the search. Meaning, is there a guarantee that the algorithm will find the solution? Another question that needs to be asked about find the solution is, is it the Optimal solution? Having a solution does not always quiet do it. It needs to be right one! A factor that needs to be checked during the search of solution is the duration of the search. So what is the Time Complexity of the search algorithm? Because the algorithm will keep track of the nodes that it has explored (explored set), and the nodes it is still needs to explore (frontier) at a certain state, a certain amount of memory is confiscated. This measurement is called the Space complexity.
Uninformed Search Algorithms
The uninformed search algorithms, also called the blind search algorithms, only have limited information about states beyond what is provided in the problem definition. All the different search strategies can be distinguished by the order in which the nodes are expanded.
Breadth first search:
With the search strategy Bread First, abbreviation is BFS, the root node is expanded first. After that the child nodes of the root are expanded, then the child nodes of the child nodes are expanded and so on. In short, all the nodes at a certain depth (the letter d will be used for the depth level) of the search tree are expanded before any of child nodes are expanded. See the BFS representation below :
By using a FIFO queue, the BFS search can be realized. The new nodes are placed at the end of the queue so the older ones, which are at the beginning of the queue, will be expanded first.
Advantages:
Considering that the branching factor b (number of child nodes at each node) in finite and the goal node is a finite depth d, then it can be concluded that BFS is complete. It will definitely find the goal node with expanding from level to level. Depending on the path cost, BFS is optimal if all the actions have the same costs.
Disadvantage:
However, the time and space complexity are not that good considering BFS. Imagine a search tree where every state has b successors. The root of the tree generates b nodes at the first level, each of these generates b more nodes on the second level and so on. When writing this as a math formula, it results in:
Imagine if the goal node is located at the deepest depth, all the nodes of the levels above are expanded before the goal node and this means the duration will be long. Also, all the nodes at the different levels are added to the explored set which will become bigger and bigger. This means that BFS may need a lot of memory.
Depth first search:
With the search strategy Bread First, abbreviation is BFS, the root node is expanded first.
With the search strategy Depth First, abbreviation is DFS, the deepest node in the current frontier of the search tree is expanded. The search proceeds directly to the deepest level of the search tree until a leaf node (no child nodes) is reached. The deepest node is then dropped from the frontier, and the search “backs up” and expands to the other deepest node until it reaches another leaf or the goal node.
Where BFS is using a FIFO queue, DFS uses a LIFO queue. A LIFO queue means that the most recent node is chose for expansion.
Advantages:
The space complexity is the only advantage of the depth first search. Once the search has expanded on to a branch and has not found the goal node in that branch, the path and nodes that were explored and expanded can be deleted. Which can save a lot of memory for a big search tree.
Disadvantage:
The depth first search is not complete considering that a graph can have a loop. This means that the search can get stuck in the loop and it will never find the goal node. Depth first can however be modified that it does not check on states that it already has checked on. This will avoid the infinite loops in finite spaces but does not avoid the generation of redundant paths. This can be seen in the example above, if node 5 is the goal node, DFS will still explore the entire left subbranch before finding the goal node. Considering these arguments, it can be concluded that DFS is neither is optimal nor does it have a good time complexity.
Uniform Cost Search:
If all the cost for the different actions are the same, BFS is the most optimal search algorithm as it always expands to the child nodes at one level of depth. The BFS search algorithm can, however, be extended with a step cost function. So instead of expanding to the closest (child) node, the Uniform Cost Search (UFS) expands to the node n with the lowest path cost g(n). This achieved by ordering the frontier by g.
There are, however, two significant differences compared to BFS. At first, the goal test is applied to the node that is selected for expansion instead. Secondly, a test is added in case a better path is found for a node that is still on the frontier. The test calculates the total cost amount and compares the paths to see which one is better. When a better path is found, the old path is discarded. It is of no interest of many steps a path has, the search algorithm is only interested in the total cost of a certain path.
Advantages:
Because UFS discards the lesser path for a better one, it is easy to conclude that UFS finds the most optimal solution. If there is a path with an infinite sequence of zero cost actions, the search algorithm will get stuck into a infinite loop. So completeness is only guaranteed if the cost of an action is greater than a small positive constant.
Disadvantage:
UFS is directed by the cost of a path and not by depth, so the complexity is not easily to be characterized in terms of b and d. Alternatively, the optimal solution is characterized as C* and it assumed that every action costs at least e. The algorithm’s worst case time and space complexity is:
which is definitely a greater value than :
The time and space complexity formula gives a much greater value, because UFS can explore a big tree with smaller steps before exploring any of the bigger useful steps. If all costs are the same,
and this means that UFS is similar to BFS. Only difference is that BFS stops as soon as it finds the goal state, whereas UFS examines all the nodes around the goal node to see if there is a lower cost. There can be concluded that UFS can do some unnecessary work.
There are many more search algorithms, but I will discuss them in part 2 of the AI search algorithms post.