Here’s another bit of spontaneous Facebook philosophy, responding to a post by my friend Carl Sachs. The following thoughts are largely inspired by Samson Abramsky’s majestic ‘Information, Processes and Games’.
So, say I’ve got a biological organism and a computer system, and both are physically embedded in an ambient environment. From the perspective of thermodynamics, they are both dissipative systems (i.e., their continued functioning depends on energy exchange with their environment). From the perspective of Shannon-Weaver information theory, they are both exchanging information with their environment through dedicated channels (i.e., sensation/behaviour or input/output). The parallel between these two descriptions should not surprise us, insofar as thermodynamics and information theory are both built on the concept of entropy.
The question is where we might locate the difference you want to posit between these two systems: how can we say that sensation/behaviour is somehow more energetic than I/O? And how can we say that I/O is somehow more structured than sensation/behaviour?
On the one hand, I think that you might want to say something about cybernetics and control systems. For instance, that the energetic strength of some sensory signal might be channelled by a control structure in a way that produces a behaviour with energetic strength that is correlated in a similar way (e.g., a greater concentration of pheromones producing a more pronounced interest in an object/area/etc.). However, we can build these sorts of control systems using computer systems of the sort I’ve described, precisely insofar as the strength of an input can simply be seen as part of the structure of the input data, and the same goes for the output.
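As a minimal sketch of that last point, here is a proportional controller in Python in which the strength of the input is just a number in the input data, and the strength of the output is just a number in the output data. The function name `interest`, the `gain` parameter and the sample readings are all hypothetical, chosen only for illustration.

```python
def interest(pheromone_concentration: float, gain: float = 0.5) -> float:
    """Map the strength of an input signal to the strength of a response.

    The response grows with the input and saturates at 1.0.
    """
    if pheromone_concentration < 0:
        raise ValueError("concentration must be non-negative")
    return min(1.0, gain * pheromone_concentration)


# Hypothetical sensor readings, from no pheromone to a strong concentration.
readings = [0.0, 0.5, 1.0, 4.0]
responses = [interest(c) for c in readings]
```

Nothing here is energetic in any sense that is not also structural: a stronger signal is simply a larger value of type `float`.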
On the other hand, I think that you might want to say something about data structures and the programs that operate on them. For instance, that the input data comes in units of a specified type that are transformed by certain abstract functions (which are multiply realisable) into output units of a distinct type (e.g., a function from a fixed-resolution bitmap from a camera to a Boolean value that controls the opening or closing of a gate). However, the way one designs and reasons about such functional programs effectively abstracts away from the fact that the domain and codomain of the function are I/O coming from an environment, and therefore that the executed program is a process that involves both internal computational state and external environmental state. Functional programming languages like Haskell have ways of getting around this (using monads), but the actual information dynamics are stuffed under the hood of the interpreter/compiler.
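The gap between the abstract function and the executed process can be sketched in Python rather than Haskell. Everything named here is an assumption made for illustration: the `Bitmap` type, the brightness threshold, and the `read_frame`/`set_gate` callbacks standing in for a camera and a gate.

```python
from typing import Callable, List

Bitmap = List[List[int]]  # fixed-resolution greyscale image, values 0..255


def gate_should_open(frame: Bitmap, threshold: int = 128) -> bool:
    """The abstract function: Bitmap -> bool, with no mention of any environment."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels) > threshold


def run_gate(read_frame: Callable[[], Bitmap],
             set_gate: Callable[[bool], None],
             steps: int) -> None:
    """The executed process: reads from and writes to an environment at each step."""
    for _ in range(steps):
        set_gate(gate_should_open(read_frame()))


# A simulated environment: two canned frames in, a log of gate commands out.
frames = iter([[[0, 0], [0, 0]], [[255, 255], [255, 200]]])
gate_log: List[bool] = []
run_gate(lambda: next(frames), gate_log.append, steps=2)
```

All of the reasoning one would normally do concerns `gate_should_open`; the interaction with the environment lives entirely in `run_gate`, which is the part that tends to get hidden.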
Here’s a really important distinction in computer science that most people don’t really appreciate, despite the fact that it increasingly governs our lives: there is a difference between what used to be called batch processing and online systems. A batch process takes a finite input and produces a finite output in a way that can be reasoned about fairly neatly (e.g., renaming a series of files). An online process takes indefinite input and produces indefinite output, and the input it receives can depend upon the output it produces, in a cybernetic fashion (e.g., an operating system that is always ready to interact with its user, in such a way that its response depends on user input and the user input depends on its output). There is a hard mathematical distinction between these two things that is often glossed over in practice (e.g., by modelling streams as infinite lists in Haskell). It essentially corresponds to the difference between finite and infinite (captured by algebra/coalgebra), and that between space and time (captured by parallelism/concurrency).
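To make the contrast concrete, here is a rough Python sketch. The batch process is an ordinary function on a finite list; the online process is a generator that never returns and whose next input may depend on its previous output. The names `batch_rename` and `echo_counter` and the toy command format are hypothetical.

```python
from typing import Generator, List


def batch_rename(names: List[str]) -> List[str]:
    """Batch: finite input in, finite output out, and then the process is over."""
    return [f"{i:03d}_{name}" for i, name in enumerate(names)]


def echo_counter() -> Generator[str, str, None]:
    """Online: always ready for more input; each reply depends on the history so far."""
    count = 0
    reply = "ready"
    while True:
        command = yield reply  # emit output, then wait for the next input
        count += 1
        reply = f"{count}: {command.upper()}"


renamed = batch_rename(["a.txt", "b.txt"])

session = echo_counter()
prompt = next(session)                 # the system announces it is ready
first = session.send("ls")             # the user responds to the prompt
second = session.send(first.lower())   # the next input depends on the last output
```

The batch function can be understood entirely by its input/output table. The generator cannot, because there is no point at which its input is complete.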
We should now be able to see that the choice of deep neural networks as your model for AI is problematic in two ways:
1) A neural network is a batch process, both in training and execution. It can be iterated, but the difference between a batch process and an online process that iterates batch processes as part of its control flow is non-trivial. In particular, one has to stop talking about data (and inductive types), and start talking about co-data (and co-inductive types), and that’s just in order to make sense of I/O. No one has as yet come up with a good conceptual model of online training, precisely because it would require something like a ‘co-data set’. Still, even if we restrict ourselves to the pre-trained case, we are thinking about it by analogy with an abstract function rather than something handling I/O proper. This is what it means to say that ‘its environment consists of code’.
2) The whole point of using neural networks rather than strictly typed programs is precisely that we begin with unstructured data. There is a trivial sense in which the training data is structured in a certain way (e.g., a set of bitmaps of the same resolution), but the whole point is that by labelling this data and training an NN on it, the NN will learn structural patterns in the data (i.e., ‘representations’) that we’re not aware of in advance. We cannot reason about these representations as if they were data types (syntax/semantics), without collapsing the difference between machine learning and program design (pragmatics). A trained NN is a black box function from input to output that we need not understand as long as it seems to work. We could theoretically train up an NN on discrete inputs modelled on animal sensory systems and then use it to control a robot navigating the same sorts of environment as the animal. This would mean that any sense in which the animal’s sensation/behaviour was structured would apply to the robot’s I/O, and vice versa.
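Returning to the first point, the difference between a batch function and an online process that iterates it can be sketched in Python. A stream is treated as co-data: it is defined by how one observes it (an output plus a next state), not by how it is built up from finite pieces. The helper `unfold`, the stand-in `classify` (a fixed threshold in place of a pre-trained network) and the one-dimensional toy environment are all assumptions made for illustration.

```python
from itertools import islice
from typing import Callable, Iterator, Tuple, TypeVar

S = TypeVar("S")
O = TypeVar("O")


def unfold(step: Callable[[S], Tuple[O, S]], state: S) -> Iterator[O]:
    """Co-data: an unending stream given by an observation function and a seed state."""
    while True:
        out, state = step(state)
        yield out


def classify(x: float) -> int:
    """Stand-in for a pre-trained network: a fixed, stateless batch function."""
    return 1 if x > 0.0 else 0


def controller_step(position: float) -> Tuple[int, float]:
    """One turn of the loop: the action changes the position that is sensed next."""
    action = classify(position)
    # Toy environment: action 1 pushes the position down, action 0 pushes it up.
    next_position = position - 1.0 if action == 1 else position + 1.0
    return action, next_position


# We can only ever observe a finite prefix of the stream.
actions = list(islice(unfold(controller_step, 1.5), 5))
```

`classify` on its own is a batch process. The online process is the loop around it, in which the state of the environment and the output of the function feed into one another; for brevity this sketch folds the environment into `controller_step`, which a real system could not do.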
Here’s the moral that I draw from this. On the one hand, there are two concepts of control: the signal-based concept of control from engineering and cybernetics (e.g., feedback), and the state-based concept of control flow from computer science (e.g., exception handling). On the other, there are two concepts of information: the Shannon-Weaver model of syntactic information (e.g., encryption/compression) and a bundle of models of semantic information (e.g., Curry-Howard correspondences for typed term-rewriting calculi, algebraic/coalgebraic models of inductive/coinductive types, state-transition machine models with dynamic logic, and Scott’s domain-theoretic semantics for untyped lambda calculus). I think unifying the concept of control depends on unifying the concept of information, and vice versa. They are the same problem, and it’s essentially the problem of how to realise the promise of cybernetics.
However, it is also essentially the problem we began with: how do we develop a theory of information processing systems as dissipative systems embedded in an ambient environment? The answer can only be to move beyond the concept of entropy that ties together energy and information from the outset.
Friston et al. have a nice piece on this very topic:
https://www.frontiersin.org/articles/10.3389/frobt.2018.00021/full