Description
Circcash's mining algorithm is designed to accelerate the development of reversible computing hardware. AI systems running on reversible computing hardware will vastly outperform any AI systems without it. AI technologies are inherently risky, especially if they include energy-efficient reversible hardware. We believe that the benefits of reversible computing technologies outweigh the risks, but Circcash is committed to mitigating those risks. Circcash is in a good position to do so, since the AI systems that have been developed to investigate Circcash's mining algorithm appear to be inherently safer than other AI systems for several reasons. Circcash is committed to developing AI systems that are safer, more understandable, more interpretable, more predictable, and more mathematical, yet just as powerful as other AI systems. Circcash is also committed to open communication about its AI research and to making as much of that research open source as possible. At the moment, Circcash is non-alarmist in the sense that we do not believe AI progress should be halted, as we do not see any clear threat from AI. We simply want to fulfill our basic responsibility of addressing any risks that may arise from reversible computing technologies.
Comparing the efficiency of current computation to reversible computation
Our current computing hardware is very inefficient compared to future reversible computing hardware. The paper Mechanical Computing Systems Using Only Links and Rotary Joints concludes that it may be feasible (though probably exceedingly difficult) to build necessarily reversible computing hardware that computes "10^21 FLOPS in a sugar cube using 1 watt of power with a 100 MHz clock (10 ns)." For comparison, an NVIDIA GeForce RTX 4090 uses 450 W of power while producing about 10^14 FLOPS. Training the 175 billion parameter version of GPT-3 used about 3*10^23 floating point operations, which would translate to roughly 300 joules of energy on such reversible computing hardware. I am not an expert in nanotechnology, nor am I an engineer, so I cannot comment much on the feasibility of using mechanical molecular computing systems for computation. But there are plenty of other ideas for reversible computing hardware that are more efficient than the hardware we have today.
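As a back-of-the-envelope check of that figure: hardware delivering 10^21 FLOPS per watt performs $10^{21}$ floating point operations per joule, so

$$E\approx\frac{3\times 10^{23}\ \text{FLOP}}{10^{21}\ \text{FLOP}/\text{J}}=300\ \text{J}.$$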
Some principles of AI safety
- Low entropy random variables: The entropy of a discrete random variable $\mathcal{X}$ on the set $A$ is the value $H(\mathcal{X})=-\sum_{a\in A}P(\mathcal{X}=a)\cdot\log(P(\mathcal{X}=a))$. Suppose that $D$ is our data set, and let $\mathcal{X}_D$ denote the machine learning model trained with the data set $D$. Then a lower value of $H(\mathcal{X}_D)$ should be preferred over a higher value of $H(\mathcal{X}_D)$. The reason for this is that if $H(\mathcal{X}_D)$ is high, then the trained model $\mathcal{X}_D$ contains a large amount of random information that is not part of the training data $D$, and this random information makes the machine learning model less understandable and interpretable.
- Smoothness: The function $D\mapsto\mathcal{X}_D$ should be smooth.
- Strong convexity around optima: If $A$ is a non-zero positive semidefinite matrix with eigenvalues $\lambda_1\geq\dots\geq\lambda_n$, then let $c(A)=\lambda_n/\lambda_1$. Suppose now that $F_D:X\rightarrow\mathbb{R}$ is the loss function for the training set $D$ and that $X$ is given a Riemannian metric so that there is a notion of a Hessian for the function $F_D$. Let $x_0$ be a point where $F_D$ is locally minimized. Then the ratio $c(H(F_D)(x_0))$ of the smallest to the largest eigenvalue of the Hessian $H(F_D)(x_0)$ should be large. A large value of $c(H(F_D)(x_0))$ means that the local minimum $x_0$ is more robust to changes in the training data $D$ and to other changes in the loss function $F_D$. It is preferable if most of the eigenvalues of $H(F_D)(x_0)$ are close together.
- Large basin of attraction: Suppose that $U$ is an open subset of Euclidean space, that $F_D:U\rightarrow\mathbb{R}$ is a loss function, and that $F_D(x_0)$ is a local minimum. Let $r$ be the maximum value such that $\langle\nabla F_D(y),y-x_0\rangle\geq 0$ whenever $y\in B_r(x_0)$. Then the value $r$ should be large in order to ensure that the gradient descent process converges to the same local minimum $x_0$ from many starting points; a numerical sketch of these quantities follows this list.
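Here is a minimal numerical sketch of three of the quantities above: the entropy of a discrete distribution, the Hessian eigenvalue ratio at a minimum, and a probe of the descent condition defining the basin of attraction. The toy loss function, the finite-difference derivatives, and the probe scales are illustrative assumptions, not Circcash code.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum p*log(p) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def loss(x):
    """Toy loss with its minimum at the origin (illustrative only)."""
    return x[0] ** 2 + 10.0 * x[1] ** 2 + 0.1 * x[0] ** 4

def grad(f, x, h=1e-5):
    """Central-difference gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# A peaked distribution over trained models has lower entropy than a diffuse one.
print("entropy, peaked model distribution: ", entropy([0.97, 0.01, 0.01, 0.01]))
print("entropy, diffuse model distribution:", entropy([0.25, 0.25, 0.25, 0.25]))

# Eigenvalue ratio c = lambda_min/lambda_max of the Hessian at the minimum.
x0 = np.zeros(2)
eigs = np.linalg.eigvalsh(hessian(loss, x0))
print("Hessian eigenvalue ratio c:", eigs[0] / eigs[-1])

# Probe the basin of attraction: <grad F(y), y - x0> >= 0 should hold for
# every y near x0 if gradient descent is to flow back to x0.
rng = np.random.default_rng(0)
for scale in (0.5, 2.0, 5.0):
    ys = x0 + scale * rng.standard_normal((500, 2))
    ok = all(np.dot(grad(loss, y), y - x0) >= 0 for y in ys)
    print(f"descent condition holds on all probes at scale {scale}: {ok}")
```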
Spectral and spectral-like algorithms
Suppose that $A_1,\dots,A_r$ are $n\times n$ complex matrices. If $1\leq d\leq n$, then we say that $d\times d$ complex matrices $X_1,\dots,X_r$ form an $L_{2,d}$-spectral radius dimensionality reduction (LSRDR) of $A_1,\dots,A_r$ if the fitness

$$\frac{\rho(A_1\otimes\overline{X_1}+\dots+A_r\otimes\overline{X_r})}{\rho(X_1\otimes\overline{X_1}+\dots+X_r\otimes\overline{X_r})^{1/2}}$$

is locally maximized, where $\rho$ denotes the spectral radius and $\overline{X}$ denotes the entrywise complex conjugate of $X$.
LSRDRs and similar constructions appear to be capable of performing machine learning tasks such as constructing matrix-valued word embeddings and graph embeddings and evaluating the security of block ciphers, but LSRDRs do not seem to be capable of replacing neural networks. More research is needed to explore the full potential of LSRDRs.
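The following is a rough numerical sketch of the optimization problem that defines an LSRDR, using the fitness ratio given above. The toy matrix sizes and the use of SciPy's generic Nelder-Mead optimizer are our own illustrative choices; a serious implementation would use a purpose-built iteration rather than a black-box optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def rho(M):
    """Spectral radius: largest absolute value of an eigenvalue of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def lsrdr_fitness(Xs, As):
    """rho(sum_i A_i (x) conj(X_i)) / rho(sum_i X_i (x) conj(X_i))^(1/2)."""
    num = rho(sum(np.kron(A, X.conj()) for A, X in zip(As, Xs)))
    den = rho(sum(np.kron(X, X.conj()) for X in Xs)) ** 0.5
    if den < 1e-9:
        return 0.0  # penalize degenerate X's
    return num / den

n, r, d = 4, 2, 2                        # toy sizes: reduce 4x4 matrices to 2x2
rng = np.random.default_rng(0)
As = [rng.standard_normal((n, n)) for _ in range(r)]  # real matrices, so conj() is a no-op

def objective(flat):
    Xs = list(flat.reshape(r, d, d))
    return -lsrdr_fitness(Xs, As)        # minimize the negated fitness

res = minimize(objective, rng.standard_normal(r * d * d), method="Nelder-Mead",
               options={"maxiter": 50000, "maxfev": 50000})
print("locally maximized LSRDR fitness:", -res.fun)
```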
Matrix-valued natural language processing
In natural language processing, tokens are typically represented as vectors. This may be problematic: a vector is well suited for representing a single meaning of a token, but not for handling tokens that may have multiple meanings. It is better for a token to be represented as a sort of superposition of vectors that captures the variety of meanings the token could have. In other words, it is better for tokens to be represented by matrices rather than simply by vectors.
Suppose that
Suppose that
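As a minimal sketch of the superposition idea above: a rank-2 sum of outer products stores two hypothetical senses of the token "bank" in one matrix, and the matrix's spectrum recovers them. The token, its senses, and the rank-2 construction are illustrative assumptions, not a construction taken from Circcash's research.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Two hypothetical senses of the token "bank", each a random unit vector.
bank_river = rng.standard_normal(d)
bank_river /= np.linalg.norm(bank_river)
bank_money = rng.standard_normal(d)
bank_money /= np.linalg.norm(bank_money)

# A vector embedding must blend both senses into a single direction.
vector_embedding = (bank_river + bank_money) / 2

# A matrix embedding can keep the senses as separate directions: a sum of
# outer products is a superposition whose dominant eigenvectors recover them.
matrix_embedding = (np.outer(bank_river, bank_river)
                    + np.outer(bank_money, bank_money))
eigenvalues = np.linalg.eigvalsh(matrix_embedding)
print("two dominant eigenvalues, one per sense:", eigenvalues[-2:])
```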