Proper Treatment 正當作法/ blog/ posts/ Self-quoting fix-points: Quines, in Scheme and MetaOCaml
oleg at
標籤 Tags:
2008-08-17 19:19

Quines are normally defined as “self-reproducing programs”, or programs that output the copy of themselves:

If we define an interpreter IN as a function that takes an encoding ET of a term t to t’s value, then quine q is a term such that its value is its encoding

    q = EQ


    IN EQ = EQ

from which follows that a quine is a fix-point of the interpreter.

A quine is a fix-point, usually not the least fix-point. Indeed, in Scheme or other CBV languages, the least fix-point of eval is bottom – which is not that interesting.

Let’s consider Scheme first [footnote: our eval here is (define (our-eval term) (eval term r5rs-environment)). Many Scheme systems, e.g., Petite, offer such a single-argument eval]. The shortest fix-point of eval is 1 (the numeral one). Indeed, (eval 1) is 1. The same holds for other numbers, as well as characters, strings, and booleans. These are often called self-evaluating data – the very name self-evaluating suggests they are the fix-points of eval.

Can we find a fix-point of eval that actually “does something”? The answer comes from the expression for the core of the fix-point combinator (so-called “the omega term”)

    ((lambda (f) (f f)) (lambda (f) (f f)))

Clearly one step of reducing this expression gives the same expression back, so the evaluation never finishes. We do wish the evaluation finish however, to give us a value to pass to eval (whose result can again be passed to eval, etc). We just need to place quotes in the right places:

    (define ff ((lambda (f) `(,f ',f)) '(lambda (f) `(,f ',f))))


    > ff
    ((lambda (f) `(,f ',f)) '(lambda (f) `(,f ',f)))
    > (eval ff)
    ((lambda (f) `(,f ',f)) '(lambda (f) `(,f ',f)))
    > (equal? ff (eval ff))

The repeated re-evaluations of ff are similar to the stepping through the evaluation of the omega term.

We can re-write ff using only quasi-quote. That is not entirely trivial because now we have to mind the levels of quotation and quasi-quotation. But that expression will be more useful in other staged languages, which only have quasi-quotation and which insists on properly matching unquotes with quasi-quotes.

    (define ff1
       ((lambda (g) `(,g (,`quasiquote ,g))) 
          `(lambda (g) `(,g (,`quasiquote ,g)))))

    > ff1
      ((lambda (g) `(,g (,`quasiquote ,g)))
        `(lambda (g) `(,g (,`quasiquote ,g))))
    > (eval ff1)
      ((lambda (g) `(,g (,`quasiquote ,g)))
       `(lambda (g) `(,g (,`quasiquote ,g))))
    > (equal? ff1 (eval ff1))

Let us now turn to MetaOCaml, where we immediately run into problem with types. In MetaOCaml, the evaluator is an operation .! of the type ('a,'b) code -> 'b. Its fix-point should have 'b = ('a,'b) code, which impossible without equi-recursive types. There is the second problem in the above equation: The evaluator .! insists that 'a be a free type variable. We need first-class polymorphism (to “generate” these free type variables, informally speaking). We arrive at the first approach. We define an iso-recursive type with first-class polymorphism

    type rc = {rc : 'c. ('c, rc) code};;

and a custom run function that does appropriate record unwrapping for us:

    let runrc t = .! (t.rc);;
      val runrc : rc -> rc = <fun>

Now the type of runrc is such that it may indeed have a fix-point.

Our first approach to the fix-point of runrc uses the value recursion and cross-stage persistent values (CSP). It is cheating as we shall see – yet the expression is so simple it is hard to pass on.

    let rec q1 () = {rc = .<q1 ()>.};;
      val q1 : unit -> rc = <fun>


    let tq1 = q1 ();;
     val tq1 : rc =
       {rc = .<(((* cross-stage persistent value (as id: q1) *)) ())>.}

    let tq2 = runrc tq1;;
       val tq2 : rc =
         {rc = .<(((* cross-stage persistent value (as id: q1) *)) ())>.}


The result shows why using CSP is cheating: the encoding, tq1, does not make sense in the empty environment. Our run-time system should have the custom binding of q1. We cannot “pack” tq1 and transmit to another MetaOCaml system running on a separate host. That system will have no idea about what q1 is being associated with.

One may informally say that q1 () creates an non-founded code value: code value that refers to itself. That is reminiscent of an old trick of writing loops in Lisp interpreter by creating such cyclic code (e.g., in macros). This technique was popularized by Olin Shivers

Stylish Lisp programming techniques
Technique n: passing circular list structures to the evaluator

We may attempt to port the ff1 function of Scheme to MetaOCaml. At first, porting seems easy: MetaOCaml’s brackets are the equivalent of a quasi-quote, and MetaOCaml’s escapes are like unquote. But we run across two difficulties: we can’t (quasi-) quote quasiquote in MetaOCaml. Brackets always “take effect” and can’t be quoted. We can attempt to emulate Scheme’s quasi-quote by defining a form lift:

    let lift body = .<body ()>.;;

But that will only introduce CSP (variable body).

This points out the major difficulty with writing quines in MetaOCaml. Many (most?) approaches to quines rely on being able to “inline” a value inside quotation marks. For example, if s:string is the encoding of some term, we can insert that s replacing $x within the string "here $x" and "here \"$x\"". In fact, it is this approach that is used when writing quines in C and other such languages.

Likewise, the first Scheme solution relied on inlining f at two places within `(,f ',f), the latter is under the quotation. But in MetaOCaml, we can’t inline under the brackets without resorting to CSP. If g is a code value in MetaOCaml, we can inline it here

    (fun g -> .<begin .~g; () end>.) .<1::[]>.;;

but not under brackets here

    (fun g -> .<begin .<.~g>.; () end>.) .<1::[]>.;;

(one level of brackets disable escape, giving us a CSP g) or here

    (fun g -> .<begin .<.~.~g>.; () end>.) .<1::[]>.;;

(type error)

Indeed, the common approach to quine construction relies on the conversion of a value of the type ('v code) into the value of the type (('v code) code) – or, to persist a value from one stage onto the latter stage. But the this is the very definition of CSP.

There might be one way to avoid CSP: that is, writing a code that takes an interpreter. By supplying different interpreters, we can generate different quoted versions of the same code. Here’s a simple illustration:

    (* a sample term *)
    let t1 = (fun (l,a) -> l (fun f -> a f));;

    (* an immediate interpreter *)
    let i0 = ((fun x -> x), (fun f -> begin f; f; () end));;

    t1 i0;;
      - : '_a -> unit = <fun>

    (* a lifting interpreter *)
    let i1 = ((fun f -> .<fun x -> .~(f .<x>.)>.), 
              (fun f -> .<begin .~f; .~f; () end>.));;

    t1 i1;;
      - : ('a, '_b -> unit) code = .<fun x_1 -> x_1; x_1; ()>.

    (* a twice lifting interpreter *)
    let i2 = ((fun f -> .<.<fun x -> .~.~(f .<.<x>.>.)>.>.), 
              (fun f -> .<.<begin .~.~f; .~.~f; () end>.>.));;

    t1 i2;;
     - : ('a, ('b, '_c -> unit) code) code =
       .<.<fun x_1 -> .~(.<begin .~(.<x_1>.); .~(.<x_1>.); () end>.)>.>.

    .! (t1 i2);; (* is same as t1 i1 *)

To use this approach in writing quines, we need to be generating functions that take interpreters – poly-interpreted terms, so to speak. This approach may indeed work out, but I haven’t pursued it in detail.

We finish with a more straightforward encoding of ff1 in MetaOCaml, which does, alas, use CSP (as explained above) – albeit theoretically avoidable CSP:

    type af = {af : 'c. ('c, af -> rc) code};;

    let t1 =
       (fun f -> {rc = .<.~( f>.}) {af = .<(fun f -> {rc = .<.~( f>.})>.};;
     val t1 : rc =
      {rc =
        .<((fun f_1 -> {rc = .<((.~( f_1)>.})
           (* cross-stage persistent value (as id: f) *))>.}

    let t2 = runrc t1;;
      This expression has type ('a, rc) code but is here used with type
      ('b, rc) code
      Exception: Trx.TypeCheckingError.

We get a type-checking error at run-time! Something is odd, which might be indicative of an issue with MetaOCaml. One can see however that manually running t1 indeed gives the same code as t1

The cross-stage persistent value af is the closed code, which is serializable and so can be “copied” rather than made available by reference. Mutable values or closures can’t generally be copied and so it is their reference that cross-stage persists. For values of primitive or code types and their tuples and records – values that can be serialized and migrated – one may persist either the reference or the value itself. In the latter case, we inline the cross-staged persistent value into the host AST. The current version of MetaOCaml inlines only integers, booleans, characters and doubles – but not tuples of integers, for example. Therefore, the appearance of the CSP in t1 above is an artefact of the MetaOCaml implementation. That CSP is theoretically avoidable.