Computability Logic
(CoL)


Section 8

CoL-based knowledgebase and resourcebase systems

  

The reason for the failure of  x(p(x) p(x)) as a computability-theoretic principle is that the problem represented by this formula may have no effective solution --- that is, the predicate p(x) may be undecidable. The reason why this principle would fail in the context of knowledgebase systems, however, is much simpler. A knowledgebase system may fail to solve the problem x(Married(x) Married(x)) not because the latter has no effective solution (of course it has one), but because the system simply lacks sufficient knowledge to tell any given person’s marital status.  On the other hand, any system would be able to “solve” the problem x(Married(x) Married(x)) as this is an automatically won elementary game so that there is nothing to solve at all. Similarly, while yxFather(x,y) is an automatically solved elementary problem expressing the trivial knowledge that every person has a father, ability to solve the problem yxFather(x,y) implies the nontrivial knowledge of everyone’s actual father. Obviously the knowledge expressed by AB or xA(x) is generally stronger than the knowledge expressed by AB or xA(x), yet the formalism of classical logic fails to capture this difference --- the difference whose relevance hardly requires any explanation. The traditional approaches to knowledgebase systems try to mend this gap by augmenting the language of classical logic with special epistemic constructs, such as the modal “know that” operator , after which probably  AB would be suggested as a translation for AB and yxA(x,y) for yxA(x,y). Leaving it for the philosophers to argue whether, say, yxA(x,y)  really expresses the constructive meaning of yxA(x,y), and forgetting that epistemic constructs often yield unnecessary and unpleasant  complications such as messiness and non-semidecidability of the resulting logics, some of the major issues still do not seem to be taken care of. Most of the actual knowledgebase and information systems are interactive, and what we really need is a logic of interaction rather than just a logic of knowledge. Furthermore, a knowledgebase logic needs to be resource-conscious. The informational resource expressed by  x(Pregnant(x) Pregnant(x))  is not as strong as the one expressed by x(Pregnant(x) Pregnant(x))  x(Pregnant(x) Pregnant(x)): the former implies the resource provider’s commitment to tell only one (even though an arbitrary one) woman’s pregnancy status,  while the latter is about doing the same for any two Women.  The former is the informational resource provided by a single disposable pregnancy test device, while the latter is the resource provided by two such devices. Neither classical logic nor its standard epistemic extensions have the ability to account for such differences. But CoL promises to be adequate. It is a logic of interaction, it is resource-conscious, and it does capture the relevant differences between truth and actual ability to find/compute/know truth. 

When CoL is used as a logic of knowledgebases, its formulas represent interactive queries. A formula whose main operator is  or  can be understood as a question asked by the user, and a formula whose main operator is  or  as a question asked by the system. Consider the problem xyHas(x,y), where Has(x,y) means “patient x has disease  y” (with Healthy counting as one of the possible “diseases”). This formula is the following question asked by the system: “Who do you want me to diagnose?”  The user's response can be “Laura”. This move brings the game down to yHas(Laura,y). This is now a question asked by the user: “What does Laura have?”. The system’s response can be “flu”, taking us to the terminal position Has(Laura,Flu). The system has been successful iff Laura really has the flu.

Successfully solving the above problem xyHas(x,y)  requires having all relevant medical information for each possible patient, which in a real diagnostic system would hardly be the case. Most likely, such a system, after receiving a request to diagnose x, would make counterqueries regarding x’s symptoms, blood pressure, test results, age, gender, etc., so that the query that the system will be solving would have a higher degree of interactivity than the two-step query xyHas(x,y) does, with questions and counterquestions interspersed in some complex fashion. Here is when other CoL operations come into play. Negation turns queries into counterqueries; parallel operations generate combined queries; recurrence operations allow repeated queries; implications and rimplications act as various sorts of query reduction operations; etc. Here we are expanding our example. Let Sympt(x,s) mean “patient x has (set of) symptoms s”, and Pos(x,t) mean “patient x tests positive for test t”. Imagine a diagnostic system which can diagnose any particular patient x, but needs some additional information. Specifically, it needs to know x’s symptoms; plus, the system may require to have x taken a test t which it selects dynamically in the course of a dialogue with the user depending on what responses it received. The interactive task/query that such a system is performing/solving can then be expressed by the following formula, which we shall call Diagn for further references: 

Diagn    x(sSympt(x,s)  t(Pos(x,t) Pos(x,t))   yHas(x,y)).

A possible scenario of playing the above game is the following. At the beginning, the system waits until the user specifies a patient x to be diagnosed. We can think of this stage as systems’s requesting the user to select a particular (value of) x, remembering that the presence of  x automatically implies such a request. After a patient x --- say x=X --- is selected, the system requests to specify X’s symptoms. Notice that our game rules make the system successful if the user fails to provide this information, i.e. specify a (the true) value for s in sSympt(X,s). Once a response --- say, s=S --- is received, the system selects a test t=T and asks the user to perform it on X, i.e. to choose the true disjunct of   Pos(X,T) Pos(X,T). Finally, provided that the user gave correct answers to all counterqueries (and if not, the user has lost), the system makes a diagnostic decision, i.e. specifies a value Y for y in yHas(X,y) for which Has(X,Y) is true.

The presence of a single “copy” of t(Pos(x,t) Pos(x,t)) in the antecedent of Diagn means that the system may request testing a given patient only once. If up to n tests were potentially needed instead, this would be expressed by taking the -conjunction of n identical conjuncts t(Pos(x,t) Pos(x,t)). And if the system potentially needed an unbounded number of tests, then we would write t(Pos(x,t) Pos(x,t)), thus further weakening Diagnoser: a system that performs this weakened task is not as good as the one performing the original Diagn, as it requires stronger external (user-provided) informational resources. Replacing the main quantifier x by x, on the other hand, would strengthen Diagn, signifying the system’s ability to diagnose a patent purely on the basis of his/her symptoms and test result without knowing who the patient really is. However, if in its diagnostic decisions the system uses some additional information on patients such their medical histories stored in its knowledgebase and hence needs to know the patient’s identity, x cannot be upgraded to x. Replacing x by x would be a yet another way to strengthen Diagn, signifying the system’s ability to diagnose all patients rather than any particular one; obviously effects of at least the same strength would be achieved by just prefixing Diagn with  or .

As we’ve just mentioned system’s knowledgebase, let us make clear what it means. Formally, this is a finite -conjunction KB of formulas, which can also be thought of as the (multi)set of its conjuncts. We call the elements of this set the internal informational resources of the system. Intuitively, KB represents all of the nonlogical knowledge available to the system, so that (with a fixed built-in logic in mind) the strength of the former determines the query-solving power of the latter. Conceptually, however, we do not think of KB as a part of the system properly. The latter is just “pure”, logic-based problem-solving software of universal utility that initially comes to the user without any nonlogical knowledge whatsoever. Indeed, built-in nonlogical knowledge would make it no longer universally applicable: Laura can have the flu in the world of one potential user while be healthy in the world of another user, and xy(xy = yx) can be false to a user who understands  as Cartesian rather than number-theoretic product. It is the user who selects and maintains KB for the system, putting into it all informational resources that (s)he believes are relevant, correct and maintainable. Think of the formalism of CoL as a highly declarative programming language, and the process of creating KB as programming in it.  

The knowledgebase KB of the system may include atomic elementary formulas expressing factual knowledge, such as Married(Laura), or non-atomic elementary formulas expressing general knowledge, such as x(yFather(x,y) M ale(x)) and xy(x(y+1) = (xy)+x); it can also include nonclassical formulas such as x(Female(x) Male(x)), expressing potential knowledge of everyone's gender, or xy(x2=y), expressing ability to repeatedly compute the square function, or something more complex and more interactive, such as the earlier seen formula Diagn. With each resource R?#381;KB is associated (if not physically, at least conceptually) its provider --- an agent that solves the query R for the system, i.e. plays the game R against the system. Physically the provider could be a computer program allocated to the system, or a network server having the system as a client, or another knowledgebase system to which the system has querying access, or even human personnel servicing the system. E.g., the provider for as xyBloodpressure(x,y) would probably be a team of nurses repeatedly performing the task of measuring the blood pressure of a patient specified by the system and reporting the outcome back to the system. Again, we do not think of providers as a part of the system itself. The latter only sees what resources are available to it, without knowing or caring about how the corresponding providers do their job; furthermore, the system does not even care whether the providers really do their job right. The system's responsibility is only to correctly solve queries for the user as long as none of the providers fails to do their job.  Indeed, if the system misdiagnoses a patient because a nurse-provider gave it wrong information about that patient’s blood pressure, the hospital (ultimate user) is unlikely to  fire the system and demand refund from its vendor; more likely, it would fire the nurse. Of course, when R is elementary, the provider has nothing to do, and its successfully playing R against the system simply means that R is true. Note that in the picture that we have just presented, the system plays each game RKB in the role of , so that, from the system’s perspective, the game that it plays against the provider of R is R rather than R.

The most typical internal informational resources, such as factual knowledge or queries solved by computer programs, can be reused an arbitrary number of times and with unlimited branching capabilities, i.e. in the strong sense captured by , and thus they would be prefixed with a  as we did with x(Female(x) Male(x)). There was no point in -prefixing Married(Laura), x(yFather(x,y) Male(x)) or xy(x(y+1) = (xy)+x)  because every elementary game A is equivalent to A and hence remains “recyclable” even without recurrence operators. As noted in Section 3.8 for the corresponding rimplications, there is no difference between  and  as long as “simple” resources are concerned such as such xy(x2=y). However, in some cases --- say, when a resource with a high degree of interactivity is supported by an unlimited number of independent providers each of which however allows to run only one single “session” --- the weaker operator  will have to be used instead of . Yet, some of the internal informational resources could be essentially non-reusable. A provider possessing a single piece of litmus paper would be able to support the resource  x(Acid(x) Acid(x)) but not x(Acid(x) Acid(x)) and not even x(Acid(x) Acid(x))  x(Acid(x) Acid(x)). Most users, however, would try to refrain from including this sort of a resource into KB, and rather make it a part (antecedent) of possible queries. Indeed, knowledgebases with non-recyclable resources would tend to weaken from query to query and require more careful maintainance and updates. Whether recyclable or not, all of the resources of KB can be used independently and in parallel. This is exactly what allows us to identify KB with the -conjunction of its elements.

Assume KB = R1 ... Rn, and let us now try to visualize a system solving a query F for the user. The designer would probably select an interface where the user only sees the moves made by the system in F, and hence gets the illusion that the system is just playing F. But in fact the game that the system is really playing is KBF, i.e.  R1...RnF. Indeed, the system is not only interacting with the user in F, but --- in parallel --- also with its resource providers against whom, as we already know, it plays R1,...,Rn. As long as those providers do not fail to do their job, the system loses each of the games R1,...,Rn.  Then our semantics for  implies that the system wins its play over the “big game” R1...RnF if and only if it wins it in the F component, i.e. successfully solves the query F.

Thus, the system’s ability to solve a query F reduces to its ability to generate a solution for KBF, i.e. generate a reduction of F to KB. What would give the system such an ability is built-in knowledge of CoL --- in particular, a uniform-constructively sound axiomatization of it. According to the uniform-constructive soundness property, it would be sufficient for the system to find a proof of KBF, which would allow it to automatically construct a machine M and then run it on KBF with a guaranteed success.

Notice that it is uniform-constructive soundness rather than simple soundness of the built-in (axiomatization of the) logic that allows the knowledgebase system to function. By simple soundness here we mean that every provable formula is nonlogically valid. This is not sufficient for two reasons.

One reason is that nonlogical validity of a formula E only implies that, for every interpretation *, a solution for the problem E* exists. It may be the case, however, that  different interpretations require different solutions, so that choosing the right solution requires knowledge of the actual interpretation, i.e. the meaning, of the atoms of E. Our assumption is that the system has no nonlogical knowledge, which, in more precise terms, means nothing but   that it has no knowledge of the interpretation *. Thus, a solution that the system generates for E* should be successful for any possible interpretation *. In other words, it should be a uniform solution for E. This is where uniform-constructive soundness of the underlying logic becomes critical, by which every provable formula is not only nonlogically, but also logically valid.

The other reason why simple soundness of the built-in logic would not be sufficient for a knowledgebase system to function --- even if every provable formula was known to be logically valid --- is the following. With simple soundness, after finding a proof of E, even though the system would know that a solution for E* exists, it might have no way to actually find such a solution. On the other hand, uniform-constructive soundness guarantees that a (uniform) solution for every provable formula not only exists, but can be effectively extracted from a proof.

As for completeness of the built-in logic, unlike uniform-constructive soundness, it is a desirable but not necessary condition. So far complete axiomatizations have been found only for various limited fragments of the language of CoL. We hope that the future will bring completeness results for more and more expressive fragments as well. But even if not, we can still certainly succeed in finding ever stronger axiomatizations that are uniform-constructively sound even if not necessarily complete. It should be remembered that, when it comes to practical applications in the proper sense, the logic that will be used is likely to be far from complete anyway. For example, the popular classical-logic-based systems and programming languages are incomplete, and the reason is not that a complete axiomatization for classical logic is not known, but rather the unfortunate fact of life that often efficiency only comes at the expense of completeness.

But even CL4, despite the absence of recurrence operators in it, is already very powerful. Why don’t we see a simple example to get the taste of it as a query-solving logic. Let, as before, Acid(x) mean “solution x contains acid”, and Red(x) mean “litmus paper turns red in solution x”. Assume that the knowledgebase KB of a CL4-based system contains x(Red(x) Acid(x)) and x(Red(x) Red(x)), accounting for knowledge of the fact that a solution contains acid iff the litmus paper turns red in it, and for availability of a provider who possesses a piece of litmus paper that it can dip into any solution and report the paper’s color to the system. Then the system can solve the acidity query x(Acid(x) Acid(x)). This follows from the fact, left as an exercise for the reader to verify, that CL4 proves KB x(Acid(x) Acid(x)).

The context of knowledgebase systems can be further extended to systems for planning and action. Roughly, the formal semantics in such applications remains the same, and what changes is only the underlying philosophical assumption that the truth values of predicates and propositions are fixed or predetermined. Rather, those values in CoL-based planning systems will be viewed as something that interacting agents may be able to manage. That is, predicates or propositions there stand for tasks rather than facts. E.g., Pregnant(Laura) --- or, perhaps, Impregnate(Laura) instead --- can be seen as having no predetermined truth value, with Laura or her mate being in control of whether to make it true or not. And the nonelementary formula xHit(x) describes the task of hitting any one target x selected by the environment/commander/user. Note how naturally resource-consciousness arises here: while  xHit(x) is a task accomplishable with one ballistic missile, the stronger task xHit(x) xHit(x) would require two missiles instead. All of the other operators of CoL, too, have natural interpretations as operations on physical (as opposed to just informational) tasks, with  acting as a task reduction operation. To get a feel of this, let us look at the task

Give me a wooden stake  Give me a silver bullet Destroy the vampire  Kill the werewolf.

This is a task accomplishable by an agent who has a mallet and a gun as well as sufficient time, energy and bravery to chase and eliminate any one (but not both) of the two monsters, and only needs  a wooden stake and/or a silver bullet to complete his noble mission. Then the story told by the legal run 1.1, 0.1 of the above game is that the environment asked the agent to kill the werewolf, to which the agent replied by the counterrequest to give him a silver bullet. The task will be considered eventually accomplished by the agent iff he indeed killed the werewolf as long as a silver bullet was indeed given to him.