SCHEMA USERS GUIDE

henry.kautz@gmail.com

Gitlab Repository

Schema is a language for specifying logical theories using finite-domain typed first-order logic syntax. Because domains are finite, the language is a compact representation for propositional logic. The Schema interpreter produces propositional CNF (conjunctive normal form) which can be input to any satisfiability testing program.

The Schema interpreter is written in Common Lisp, but it is not necessary to know how to program in Lisp in order to use Schema.

As of October 15, 2024, the following language features are not fully implemented:

Examples of Schema

Common Lisp API

Invoke any implementation of Common Lisp, and load the file "schema.el". The following Lisp functions are available:

(parse '(SCHEMA ...) '(OBSERVATION ...))
Parse a list of schemas (see BNF syntax below) and return a list of clauses in symbolic form. Each OBSERVATION is a ground atom that is asserted to be true. When the schemas are expanded, they are simplified by replacing observed atoms by true and all non-observed atoms that employ the same predicates by false.

(instantiate "test.wff" "test.scnf" &optional "test.obs")
Reads the Schema file "test.wff", instantiates it, and saves the result in symbolic conjunctive normal form in the file "test.scnf". The optional file "test.obs" contains a sequence of observed ground atoms.

(propositionalize "test.scnf" "test.cnf" "test.map")
Reads the symbolic conjunctive normal form file "test.scnf" and creates a DIMACS format CNF file3 "test.cnf". In DIMACS format (the standard input language for all modern SAT solvers), propositions are represented by positive and negative integers. The mapping from symbolic ground atoms to integers is written to the file "test.map". The file "test.cnf" may then be sent to a SAT solver.

(satisfy "test.cnf" "test.out") The solver named by the variable sat-solver-program (default "kissat") is called on "test.cnf" and the output of the solver is captured in the file "test.out". Satisfy returns 'SAT, 'UNSAT, or nil if the solver fails or its output contains neither the strings SAT nor UNSAT.

(interpret "test.out" "test.map" "test.lits" &optional sort-by-time)
Reads in the output of a SAT solver "test.out" and a mapping file "test.map", and creates a file "test.lits" containing the positive literals in the satisfying assignment in symbolic form. The file "test.out" specifies a solution by a sequence of positive and negative integers. The format of the file can be flexible; it can simply be a sequence of integers; or be in official DIMACS solution format where lines containing the integers begin with the letter "v"; or free-form text where lines containing only integers are assumed to be the solution. If for some integer, neither the integer nor its complement appears, then it is assumed to be false (negative) for the assignment. The results are sorted alphabetically unless sort-by-time is set to t, in which case the results are sorted by the last argument to each predicate, which is often used to specify a time index.

(make-instance SchemaParser '(SCHEMA ...) '(OBSERVATION ...))) See the section below on Answer Extraction for Refutation Proofs.

(solve "test.wff" "test.solution" &optional "test.obs") Reads in the Schema file "test.wff" and an optional "test.obs" observation file, solves it using the sat-solver-program and writes the results in symbolic form to "test.solution". If "test.wff" contains no prove formula, the sat solver will be called a single time. If it does contain prove, then the sat solver may be invoked several times as described in the section below on Answer Extraction for Refutation Proofs. If "test.wff" contains a prove formula, then the solution will be PROVEN or UNPROVEN; otherwise it will be SAT or UNSAT. The format of "test.solution" will be one of:

Language

Schema is a language for specifying logical theories using finite-domain typed first-order logic syntax. Because domains are finite, the Schema interpreter compiles its input into propositional logic for solution by any SAT solver. A Schema program consists of a sequence of options, type declarations, and formulas. Options control certain details of the intrepreter. Type declarations bind type names to sets of Herbrand terms. Types may share elements. No type declarations are associated with predicates; every predicate may accept terms of any type as arguments. It is also permissible for different instances of predicates to take different number of arguments.

Formulas are composed, as in first-order logic, of predicates, variables, constants, function symbols, logical connections, and quantifiers. The basic function of the Schema interpreter is to instantiate the variables in each formula and convert the result to CNF.

Formulas and terms are specified in prefix (LISP) notation. The quantifiers, all and exists, iterate over sets of Herbrand terms. Terms are integers, constants, or complex terms built using uninterpreted function symbols. A quantified formula is represented by a list containing the quantifier, a variable, a set of terms, a test (integer valued) expression, and the subformula to which the quantification is applied. The subformula is instantiated only for bindings of the variable for which the test is true. For example,

can be read, "for all x in the range 1 through 10, such that x is even, assert (p x)".

Propositions are expressed in Schema as either atomic symbols or complex propositions specified by a list beginning with a predicate followed by zero or more terms. The special proposition "true" and "false" have the expected meaning. Terms can be built from interpreted functions such as + and uninterpreted function symbols. For example, the literal expression (winner john (round (* 3 8))) is instantiated as

where "winner" is a predicate, "john" is a simple term, "round" is an uninterpreted function symbol, and "(round 24)" is a complex term.

The integer values 1 and 0 are used to represent true and false respectively in intger expressions. The special constants "true" and "false" are equivalent to 1 and 0 respectively when they appear in integer expressions. Integer expressions may include arithmetic functions (+, -, *, div, rem, mod), comparison functions (<, <=, =, >=, >, member, eq, neq, alldiff), set composition functions (enumerated sets, ranges of integers, union, intersection, setdiff), logical functions (and, or, not), and observed predicates. Non-observed predicates may not appear in an integer expression. Note that logical functions in integer expressions are evaluated by the Schema interpreter and do not appear in the final CNF, unlike the logical operators that have the same names.

Comments can appear in the input. They begin with ;; (double semicolon) and extend to the end of the line.

Functions and Equality

Schema includes both interpreted an uninterpreted functions. Interpreted functions include mathematical operations and set operations. A term that does begin with an interpreted function is taken to be an uninterpreted function. Thus, the formula using the interpreted function + and the uninterpreted function symbol node

is expanded to

The for operator can be used in a set expression (such as a type declaration) to generate a set of ground uninterpreted functions. For example,

is expanded to

As in logic programming, ground terms refer to themselves, or in other words, formulas are interpreted over a Herbrand universe. The predicates eq and neq check for syntactic equality at the time that formulas are instantiated. The mathematical comparison predicates cited above check for numeric equality at instantiation time. There is no semantic equality operator that would allow one to assert that two different Herbrand terms refer to the same entity.

Observed Predicates

Observed predicates are useful for describing fixed relationships in a problem instance. The true ground literals for such predicates are specified in a list provided to the Schema interpreter. The interpreter will then assume that all other literals for the predicates that appear in that list are asserted to be false.

For example, consider representing problems about a graph. The observations would specify edges in the graph, for example:

Making "connected" an observed predicate has several advantages:

A predicate can be declared to be observed in two ways. The Schema program can use the declaration observed with the name of the predicate. The observed declaration is followed by true literals for the predicate. These should appear before any other formulas are asserted. For example:

An alternative way to declare observed predicates and their true literals is to include the list of true literals in an optional arguement in the LISP API for Schema. In this case, no explicit observed declaration is used.

Types

A type declaration defines a type name as a set of of ground terms. Terms can appear in more than one type. Types are used to expand quantified all and exists forms, but predicates themselves do not have type constraints on their arguments. Examples of type declarations:

The for operator is used to compactly create a set of non-atomic ground terms. Consider a problem where we wish to define a type Node that contains 100 terms. Instead of listing the names of the terms individually as in the previous section, we can write:

This defines Node as a set containing the terms (n 1), (n 2), and so on up to (n 100).

Care needs to be taken in translating problems stated in English. Consider the problem:

Some cars are Fords. Some cars are reliable. Are Fords reliable?

Translating this as

This formula is unsatisfiable, and so one concludes that Fords are reliable. The second line in the input expands to (reliable Ford) because Ford is the only known member of the type Cars. A better translation of the problem would include some other anonymous member of type Car which might be the reliable brand; for example,

This formula is satisfiable, so the unwanted conclusion does not hold.

Constraint Satisfaction

Discrete constraint satisfaction problems (CSPs) can easily be represented in Schema. The answer can be read off from the symbolic form of the SAT solution generated by the interpret function.

Multi-valued variables in CSPs can be encoded as follows. Suppose the CSP has a variable named Color whose values can be Red, Blue, or Green. In Schema, we create a proposition named Color, a type ColorValues containing the constants Red, Blue, and Green, and assert that Color holds for exactly one ColorValue.

Deduction

Satisfiability testing can be used for deduction by negating the conclusion to be drawn from a set of assumptions. For example, suppose that Bob is shorter than Alice, Alice is shorter than Charlie, and shorter is transitive. Can you conclude that there is someone who is shorter than two other people? This problem could be encoded in Schema as follows:

Schema provides an alternative way of encoding a deduction problem by using the prove construct. In this case, the last line above would be replaced by:

Note that the formula to be deduced is not negated. Use of prove makes makes the goal of the Schema problem clearer to a user.

Answer Extraction for Deduction

Suppose we want to also derive the constant for person who is shorter than two other people. Schema provides the operator "prove" to support answer extraction from proofs of unsatisfiability. A single find operation may appear as the last schema in the list of input schemas. The last schema in previous example would be changed to:

Note that the conclusion inside the prove is unnegated. Prove can also be used to extract the bindings for several variables by specifying a series of variables and types in the operator. For example, suppose the problem involves people and jobs, states that all mechanics are also drivers and Alice is a mechanic. We wish to find a person with two jobs and the names of those jobs.

Common Lisp API for Answer Extraction

Instead of calling the parse function directly, answer extraction begins by creating a SchemaParser object. Its initialization takes a list of schemas and a list of observations. Its method GetCNF takes a LastResult argument and returns three arguments:

LastResult is one of SAT or UNSAT. Status is one of:

An algorithm for answer extraction using SchemaParser follows:

Note that answer extraction can fail even if the assumptions together with the negation of the conclusion is unsatisfiable. Consider, for example:

Answer extraction will end in FAIL because it cannot prove that Alice is happy and it cannot prove that Bob is happy.

A SchemaParser performs binary search on each answer variable to find the answer bindings. Suppose the first variable is t1. The parser makes t1 existentially quantified over half of its domain and variables t2,t3,... existentially quantified over their full domains. If this formula is satisfable, it repeats the process but making t1 existentially quantified over a quarter of its domain. If the formula is unsatisfiable, then the process is repeated with t1 existentially quantified over the other half of its domain. Eventually the process will fail or result in an answer binding for t1. The parser then continues on to search for a binding for t1,t2, etc. The maximum number of wffs returned by GetCNF before it returns FAIL or DONE, and thus the maximum number of calls to a SAT solver, is log|Ti| where Ti is the type of answer variable i. Note that is this is an improvement over a naive implementation of answer extraction which would be |Ti|.

Common Binary Relationship Patterns

Suppose R is a binary relation. Properties of R can be asserted as follows.

R is a strict order

Suppose R is a relation over pairs of type E

R is a strict total order

R is functional

We say that a relationship over types E and V is functional if for every E there is exactly one V such that R holds. Functional relations are often used when E is a set of entities and V is a set of possible values of some property of the entities.

R is a bijection

We say that a relationship over types E and V is a mapping if (1) R is functional (2) R is onto, meaning for every V there is some E related to it by R, and (3) R is one-to-one, meaning no two E are related to the same V. Bijections are often used in representing matching problems where a set of entities must be matched to a set of unique values.

Schema BNF

Prompting a Large Language Model to Use Schema

A large language model can be prompted to use Schema as follows:

  1. Make a copy of this user guide.

  2. Eliminate the sections:

    1. Common Lisp API

    2. Common Lisp API for Answer Extraction

    3. Compact Encodings

    4. Options

  3. Eliminate everything in this section that comes before the following paragraph:

In order to improve your ability to reason logically, I am giving you the ability to call an external tool that uses a language named Schema. When you are asked to solve a logical reasoning problem, instead of solving it yourself, you should instead translate the problem into Schema and output lines of the following form:

where is a formula in the Schema language. If the formulas do not include a prove statement, then Schema will consider it to be a constraint satisfaction problem and test the formula for satisfiability. If the formulas are satisfiable, you will then read lines of the following form:

where is a positive ground literal in a satisfying assignment for the formulas. If the formulas are unsatisfiable, then you will read

If the formulas contain a prove statement, then Schema will consider it to be a deduction problem. If the formula in the prove statement deductively follows from the other formulas, then you will read lines of the following form:

If the formula in the prove form does not deductively follow, then you will read

where the literals describe a model that is a counterexample to the theorem.

For example, suppose you output

then you will next read

Compact Encodings

The input formulas need not be in conjunctive normal form. Converting a formula to CNF using only the user-defined propositions can cause its size to increase exponentially. By creating new propositions, the Schema interpreter can guarantee the size of the output CNF formula is only exponential in the nesting of quantifiers. Specifically, where

M = number of input formulas
L = length of the longest input formula
D = size of the largest set appearing in a quantification statement
N = deepest nesting of quantifiers in a formula

the size of the output CNF is O(MLDN).

When new propositions are introduced in this manner, the relationship between the input and output formulas is that the output formula entails the input formula and any model of the input formula can be extended to a model of the output formula.

Options

The input to Schema may include the following options, which should appear before any formulas.