
Fuzzy Search: Fuzzy Set Theory and its Application to Internet Search Engines

Table of Contents

1 Content

2 Introduction
2.1 Reference to the seminar: knowledge management
2.2 Narrowing the subject
2.3 Motivation

3 Fuzzy
3.1 What is fuzzy?
3.2 The term "fuzziness"
3.3 Mathematical basics
3.3.1 The membership function
3.3.2 Set theory
3.3.3 Standardization
3.3.4 Logical operators
3.3.5 Differentiation from probability theory

4 Search with fuzzy
4.1 Previous keyword searches
4.2 Combined search with a fuzzy term
4.3 Combined search with several fuzzy terms

5 Involvement of the user

6 Evaluation of the fuzzy search
6.1 Advantages
6.2 Disadvantages
6.3 Symbiosis

7 Conclusion

Bibliography

1 Content

This seminar paper deals with the topic of fuzzy search.

Restricting the topic to Internet search makes it possible to set up a concrete scenario. The scenario shows that conventional search engines, which do not use fuzzy logic, can return problematic results. Building on it, the paper explains how fuzzy theory can help to bring the problem under control. To this end, the mathematical foundations of fuzzy theory are discussed first and then applied to the search problem of the scenario. Next, the paper discusses the demands this places on the user and the advantages and disadvantages that result. Finally, the results are assessed.

2 Introduction

2.1 Reference to the seminar: knowledge management

The seminar deals with knowledge management, for which finding knowledge is a central task. Knowledge is usually stored somewhere in a particular form. The aim is therefore to locate the knowledge in that form and to gain access to it.

2.2 Narrowing the subject

The focus of this paper is on searching for sources, especially web pages on the Internet. The difficulties described below and the proposed extensions to existing search engines also apply to searches in databases or other data collections. Internet search nevertheless provides a clear example, since most readers, if not all of them, have already run into difficulties when searching with Google & Co.

2.3 Motivation

Let's imagine the following scenario based on [Cho03]:

We are in San Francisco in the USA. We want to visit a famous national park in one day, but we do not know which one. So what could be more obvious than using the Internet to find out about the location, opening times and so on?

So we enter the following into the search field at www.google.com: popular national parks in the USA. We get approx. 174,000 results. The search has therefore not given us what we actually wanted, because it would take far too long to go through all the suggested websites for the information we are looking for. For the sake of completeness, it should be noted that this result is not specific to Google: the Yahoo search engine finds 176,000 pages for the same query.

The problem with Internet search services such as Google, Yahoo, Altavista and others is that they are keyword-based. They look for the query keywords on each page and then rank the pages found by the number of hits. The question that arises is: how can we design a search engine that gives us the results we want?
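The keyword-based ranking described above can be sketched roughly as follows. This is only a minimal illustration of the idea of counting hits, not how any real search engine works; the function names, the example pages and their URLs are hypothetical.

```python
import re
from collections import Counter


def keyword_hits(text: str, keywords: list[str]) -> int:
    """Count how often the query keywords occur in the text."""
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(words[keyword.lower()] for keyword in keywords)


def rank_pages(pages: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Return pages with at least one hit, highest hit count first."""
    keywords = query.split()
    scored = [(url, keyword_hits(text, keywords)) for url, text in pages.items()]
    hits = [(url, count) for url, count in scored if count > 0]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical pages, invented for illustration only.
pages = {
    "parks.example/yosemite": "Yosemite is one of the popular national parks in California.",
    "shop.example/posters": "Posters of national parks. Popular parks, popular prints, popular gifts.",
    "news.example/weather": "Weather forecast for San Francisco.",
}

ranking = rank_pages(pages, "popular national parks")
for url, count in ranking:
    print(url, count)
```

In this toy example the poster shop ranks above the page about an actual park simply because it repeats the keywords more often, which is the weakness the scenario points to.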

3 Fuzzy

3.1 What is fuzzy?

We first consider a very general definition from [Bro99]:

"Fuzzy logic is, in the broader sense, an extension of classical logic and set theory, [...] so that the representation and processing of imprecise information (such as: [...], quite hot, slow down hard) is possible."

Here we can already see the connection to the search problem from the scenario described above. We want to visit a popular national park. But what is a popular national park? The term popular is not exact. It is fuzzy, i.e. not precisely delimited, or imprecise in the sense of the definition above. To clarify what a popular national park is, we have to deal with the term fuzziness on the one hand and with the mathematical basis of fuzzy theory on the other.

3.2 The term "fuzziness"

People often use fuzzy terms, especially when a situation is uncertain. According to [Zim93], there are three reasons for this uncertainty:

1. The initial situation is too complex to be fully understood by humans.
2. Perhaps the person in a situation does not even know exactly what he wants.
3. There can be things that humans cannot even know because of future events.

Looking at our scenario again, we see that point 2 is the reason for the fuzziness of the term popular.

However, fuzzy terms do not prevent people from making decisions. An example from [Zim93] illustrates this. Imagine helping someone to park. Our instructions would be something like “go a little further back” or “take a closer look”. The driver will know how to interpret the instructions and will then decide how to maneuver the car.

A conventional computer cannot do anything with such commands. It needs exact information; to stay with the example, something like “13.5 mm back” or “the steering wheel 12 degrees 14 minutes to the right”.

Because of the reasons for uncertainty mentioned above, our language is full of fuzzy concepts. Fuzzy sets are meant to capture these formally and make them available for processing by computers [Dud93].

3.3 Mathematical basics

The mathematical foundations go back to L. Zadeh, who developed fuzzy set theory (the theory of fuzzy sets) in 1965. The theory is a generalization of both classical set theory and two-valued logic [Zim99].

Central to the theory is the concept of the membership function.

3.3.1 The membership function

The membership function assigns each value of the set X a degree of membership in the concept F by mapping X into the real interval [0; 1] [Dud93]:

µ: X → [0; 1]

Here, X is a quantity that we take as an indicator of the fuzzy concept. The function µ assigns each element of this set a value between 0 and 1. The value 1 stands for full membership in the concept F and 0 for no membership at all.

Applying this to our scenario, we can choose a quantity as an indicator of the fuzzy term popular, such as visitors in millions per year. The concept F is popularity. Let us assume that a national park counts as popular from 5 million visitors upward (membership: 1) and as unpopular up to 1 million visitors (membership: 0). For visitor numbers between 1 and 5 million, a corresponding intermediate membership value is assigned. For example, a national park with 4 million visitors per year has a membership of 0.8 in the popular national parks. The function is plotted in Figure 1.

Figure 1: Membership function according to the fuzzy logic
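A membership function of this kind might be sketched as follows. The thresholds of 1 and 5 million visitors come from the text; the straight-line ramp between them is an assumption, since Figure 1 is not reproduced here. With a linear ramp, 4 million visitors yield 0.75 rather than the 0.8 quoted above, so the author's curve presumably has a somewhat different shape. The function name `popularity` is hypothetical.

```python
def popularity(visitors_millions: float,
               lower: float = 1.0,
               upper: float = 5.0) -> float:
    """Degree of membership in the fuzzy concept "popular" (0.0 to 1.0)."""
    if visitors_millions <= lower:
        return 0.0  # no membership at all
    if visitors_millions >= upper:
        return 1.0  # full membership
    # Assumption: linear interpolation between the two thresholds.
    return (visitors_millions - lower) / (upper - lower)


for v in (0.5, 1.0, 3.0, 4.0, 5.0, 8.0):
    print(f"{v:>4} million visitors -> membership {popularity(v):.2f}")
```

Any other monotone curve between the two thresholds would fit the description equally well; only the end values 0 and 1 are fixed by the text.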

For comparison, consider classical two-valued logic:

Here we would set a limit, for example 3 million visitors per year. National parks with more visitors would be considered popular and those with fewer visitors unpopular. It is immediately apparent how limited the perspective of Boolean logic is here: a national park with 2,999,999 visitors is not considered popular, while a park with just 2 more visitors per year is. Figure 2 illustrates this.

The difference that becomes clear here is that fuzzy theory also allows values between 0 and 1 (a mapping into the real interval), and thus more states than just true or false. This makes expressions such as not very popular or very popular possible.
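The hard boundary of the two-valued view can be made concrete with a small sketch. The limit of 3 million visitors is taken from the text; the function name is hypothetical, and treating exactly 3 million as not popular is an assumption, since the text only speaks of more and fewer visitors.

```python
def is_popular_crisp(visitors: int, threshold: int = 3_000_000) -> int:
    """Two-valued membership: 1 above the threshold, otherwise 0.

    Assumption: a park with exactly `threshold` visitors counts as
    not popular; the text leaves this case open.
    """
    return 1 if visitors > threshold else 0


print(is_popular_crisp(2_999_999))  # just below the limit
print(is_popular_crisp(3_000_001))  # two visitors more
```

The two calls differ by only two visitors, yet the result jumps from 0 to 1 with nothing in between.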

3.3.2 Set theory

We can understand fuzzy theory as an extension of classical logic as well as of set theory. In classical set theory, an x ∈ X either is an element of a set F (x ∈ F) or is not (x ∉ F). Fuzzy theory also allows membership values other than just 0 or 1, symbolized in the following by diagrams.

Figure not included in this excerpt

Figure 2: Membership function according to the Boolean logic

The design of these diagrams follows [Tra94]. They are shown in Figures 3 and 4.
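The contrast between classical and fuzzy set membership can also be sketched in code. This is only one possible representation, assumed here for illustration: a classical set as a Python set, a fuzzy set as a mapping from elements to degrees. The park names and membership degrees are hypothetical, not real data.

```python
def crisp_membership(element: str, members: set[str]) -> int:
    """Classical set: an element is either in the set (1) or not (0)."""
    return 1 if element in members else 0


def fuzzy_membership(element: str, degrees: dict[str, float]) -> float:
    """Fuzzy set: every element has a degree in [0, 1]; unlisted means 0."""
    return degrees.get(element, 0.0)


def as_fuzzy(members: set[str]) -> dict[str, float]:
    """A classical set is the special case where every degree is 1."""
    return {element: 1.0 for element in members}


# Hypothetical example values.
crisp_popular = {"Park A", "Park B"}
fuzzy_popular = {"Park A": 1.0, "Park B": 0.8, "Park C": 0.3}

for park in ("Park A", "Park B", "Park C", "Park D"):
    print(park,
          crisp_membership(park, crisp_popular),
          fuzzy_membership(park, fuzzy_popular))
```

In the classical set, Park C is simply excluded, whereas the fuzzy set can record that it is somewhat popular.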

Figure not included in this excerpt

Figure 3: Venn diagram for fuzzy sets

We can of course also describe this by the membership functions, because the following applies:

Figure not included in this excerpt

[...]

End of the excerpt; the full paper has 17 pages.