[Completed] Logic Qualified Data Types for Sophia – master thesis

radrow.chain · April 13, 2021, 4:13pm

As some of you may already know, I am working on my master thesis right now. I have decided to combine my education with my work and passion to compiler construction — therefore my research is focused here at aeternity, on our brilliant Sophia language. My plan is to enrich its amazing type system with an advanced feature of simplified version of dependent types called LiQuiD types.

Application Status

Status: Completed on the 14.10.2021, Approved on the 21.04.2021
Last updated: 15.04.2021
Submitted by Radosław Rowicki, [email protected] on the 13.04.2021
Team: @radrow
Approved Budget (in h): –
Used Budget (in h): –
Planned Delivery: 30th of September

Specify the funding category

Research, Education, Open Source

Application Title

Logic Qualified Data Types for Sophia

Applicant

Radosław Rowicki – core developer with 2 year experience at aeternity. Focused on Sophia and FATE.

Value Application

Sophia is a novel language in the world of smart contract languages. It offers static and strong type system with powerful inference algorithm, so despite being strict on the rules, the syntax is clearly readable and convenient to use. My purpose is to extend its awesomeness even further by enhancing it with liquid types which will allow more precise control over the processed data and apply assertions not only to the domains of the variables, but also to their values.
The extension will allow the compiler to perform additional reasoning on the data flow by disallowing constructions that would normally result in runtime errors – like negative spends or out-of-bounds list queries.

The project is going to be the base of my master thesis in computer science at the University of Warsaw. The currently declared title is Formal Verification of Smart Contracts, but I will probably specialize it to something more concrete.

Example of usage:

contract C =
  entrypoint withdraw(x : int) =
    Chain.spend(Call.caller, x)

should result in error:

Refined type error at line 3, column 5:
  Could not solve promise `0 =< x` in context:
    x : int

Refined type error at line 3, column 5:
  Could not solve promise `x =< Contract.balance` in context:
    x : int

This, however, will pass:

contract C =
  entrypoint withdraw(x : int) =
    require(x > 0, "non-positive amount!")
    if(x > Contract.balance) Chain.spend(Contract.balance)
    else Chain.spend(x)

The trick is that Chain.spend will have type {value : int | 0 =< value && value =< Contract.balance} instead of just boring int. This will limit the space spanned by int datatype only to these integers that fit in the valid range. The system will automatically infer and solve the constraints to be met in a path-sensitive way – meaning that it will take into account if expressions and require statements.

Definition of Terms

liquid type – logically qualified data type. A type that carries additional logical assertions on the contained values.

Status Quo

The work on this has been started. Explaining the details is fairly technical, but I am close to having some minimal viable product that will handle some simple Sophia expressions. The progress can be tracked on my fork of the Sophia compiler.

Required Work

Implement new types and syntax in the Sophia language

Implement path sensitive constraint generation

Implement constraint solver

Integrate nicely with the compiler (readable error messages included)

Extend the checker to handle more complex Sophia features

Estimate

This project can be made as complex as one wishes, but I need to stick to my university deadline which should be somewhere around end of August, which gives approximately 4 months.

Known Limitations

Sophia supports some non-standard constructions known from basic ML lanugages, like records, contracts, named arguments, etc. While this is not something that makes the task much harder, it still needs to be addressed properly which makes it longer to finish.

The proposed system may be sometimes hard to convince that the code is correct (i.e. the solver complains on some things that seem “obvious”). Also, the readability of error messages may be something non-trivial.

Outlook

What happens after this project is completed?

aeternity will have even cooler contract language

Less bugs in smart contracts, more reliable standard library

PR value – afaik this will be the first scientific paper that refers aeternity as a subject of research.

Do you plan to populate your project?

Definitely. There is nothing more satisfying than seeing people making cool use of your creations.

Publishment

Technically, since it is a master thesis, the University of Warsaw will have the rights to its text. Although I am sure it won’t be a problem to publish it anyway. In case it is not true, I am happy to write separate articles about the results and show the use cases.

Maintainance

I declare to provide maintenance of the project on request for at least one year after delivery.

crypto_user · April 14, 2021, 6:48pm

how do u defend your thesis? by proving your added feature is working?

radrow.chain · April 15, 2021, 7:51am

It is done as a presentation in front of the commission. I will talk about my research, techniques and results - including showoff of the program. After that I will need to answer some questions related (directly or not) to my subject. All of these things count to my final grade.

lydia · April 21, 2021, 2:25pm

Dear @radrow.chain,
Your Application has been approved by the AF Board. Thank you very much for improving Sophia!

crypto_user · April 21, 2021, 4:22pm

congrats! Incase you have problem similar as mine : Tim Urban: Inside the mind of a master procrastinator | TED - YouTube

radrow.chain · May 26, 2021, 4:21pm

Update 26.05.2021

This is the first update. The refiner is now in a state that is not very ready for production use, but most crucial features and algorithms are already doing their job. Some chain-agnostic and monomorphic subset of Sophia is being processed correctly and the system infers interesting information about written programs.

Example: `max` function

contract Test =
  entrypoint max(a : int, b : int) =
    if(a >= b) a
    else b

The refiner returns:

contract Test =
  entrypoint max : {a:int} {b:int} => {int | $nu >= b && $nu >= a}

(note: $nu is a tag for the value of described type)
One may read it as “function max takes two arguments, a : int and b : int and returns an integer that is greater or equal to both of them”. This is true indeed and the desired expectation of a typical max function.

Example: factorial

contract Test =
  entrypoint fac(n) =
    if(n =< 1) 1
    else n * fac(n - 1)

The refiner returns:

contract Test =
  entrypoint fac : {n:int} => {int | $nu > 0 && $nu >= n}

This shows how does the recursion work. The termination property is not being checked yet, but theoretically the refiner could be upgraded to assert it (reminder that this problem is generally undecidable, i.e. it just impossible to solve). However, good thoughts:

factorial is greater or equal to its argument
factorial is greater than zero

Example: negative spend

contract Test =
  stateful entrypoint f(addr, x : int) =
    Chain.spend(addr, x)

The refiner returns:

Could not prove the promise created at test.aes 3:35
arising from an application of "Chain.spend" to its 2nd argument:
  $nu != 0
from the assumption
  $nu == x

However, this goes absolutely fine:

contract Test =
  stateful entrypoint f(addr, x : int) =
    require(x > 0, "Invalid amount!")
    Chain.spend(addr, x)

The refiner returns:

contract Test =
  stateful entrypoint f : {addr:address} {x:int} => unit

Example: pattern matching

contract Test =
  stateful entrypoint f(x : int) =
    switch((x, x))
      (1, 2) => 1
      (a, 2) => 3

The refiner will complain because

Found dead code at test.aes 5:9
by proving that
  (x : int) == 1 && (x : int) == 2
never holds.

Indeed, x cannot be simultanousely 1 and 2, so the first case is impossible to trigger.

If we try to fix it by turning 2 to 1:

contract Test =
  stateful entrypoint f(x : int) =
    switch((x, x))
      (1, 1) => 1
      (a, 2) => 3

We will get another error:

Could not ensure safety of the control flow at test.aes 4:12
by proving that
  ! ((x : int) == 2) && ! ((x : int) == 1)
never holds.

This means that the refiner tried to assert that the pattern matching will never exhaust its alternatives, but failed in doing so. It wasn’t said explicitly, but this code is unsafe for x == 3.

We can fix this code by filling up the holes:

contract Test =
  stateful entrypoint f(x : int) =
    switch((x, x))
      (1, 1) => 1
      (a, 2) => 3
      (a, b) => 123

So the refiner returns

contract Test =
  stateful entrypoint f : () {x:int} => {int | $nu > 0}

This is everything for today. Now I am working on providing full tuple support along with user defined types.

Next steps should include:

lists
contract state
chain state (eg. balances)
user definable refined types (syntax support)
polymorphism support
weak arguments on non-entrypoint functions, so instead of complaining on improper code, the refiner may provide restrictions on the arguments

The project also needs a better structure, in between be separated from the compiler and have much more sensible SMT handling.

radrow.chain · May 27, 2021, 12:07pm

Nothing spectacular, but I love the way my refiner expresses the fact that the function just doesn’t return any results:

entrypoint loop() : int = loop()

The refiner returns:

entrypoint loop : () => {int | $nu == 0 && $nu == 1}

"The results are integers such that they are simultaneously equal to 0 and equal to 1" – which is kinda true. Assuming that this function ever terminates gives you false statements. This way the following of-course-wrong function

entrypoint h(x) = loop() + (1 / 0)

Refines correctly, as the division by zero assertion gets “proven” by a false assumption that g() == 1 && g() == 0.

radrow.chain · June 21, 2021, 9:17am

Update 21.06.2021

The things are going very well and I am about to finish the main parts. Things that are working so far:

Pattern matching
Numerous predicates on integers
Algebraic datatypes
Constructor assertion
Refined types in record fields and constructor parameters
Custom dependent type creation
State management and balance assertions!
Lists keep information about their lengths
Parameters of lists can carry logical qualifiers
Syntax support for dependent types
Precise error locations
Numerous optimizations

Later on I will publish syntax and use instructions. But generally speaking the refiner doesn’t require any action from the user. It just lints the code and provides warnings.

radrow.chain · July 7, 2021, 9:04am

Update 07.07.2021

I consider the core part of the project finished. It is not very much ready for the production yet, but provides a working proof of concept that can serve as a foundation for a decent full-blown type refinement system. It is capable of catching many common errors related to integer boundaries at this moment: up to this point it managed to point out even my own mistakes and typos while writing the tests. For example it could easily tell that there is something wrong in the following code:

function 
  length : {l_ : list('a)} => {len : int | l_== len}
  length([]) = 0
  length(_::t) = length(t)

which indeed does not compute the length of the list. Though It counts the reasons for not using strictly typed programing languages.

The code above was a legitimate test I wrote and I simply forgot to increase the counter at the recursive step. After a while of debugging why it couldn’t compile I found out that the system was actually right and it was me who screwed up the code.

Performance

isn’t satisfying, sadly. There are several issues I identified so far which would drastically improve the efficiency of the algorithm:

Reduce the Herbrand’s base for fresh constraints. Now I create many, many logical qualifications for zero-knowledge variables and most of them are redundant or just absurd. Implementing a smarter strategy for initializing the liquid type variables would take much load off the solver.
Take advantage of the incremental solving. Z3 offers pushing and popping contexts which could be used to handle extended environments. Right now I recreate the whole context for every constraint from scratch, even if the envs don’t change at all or differ just by a single variable introduction. This would require some additional work before the solving and changing the environment management as it would need to store the constraints in a topologically sorted tree and the environments should contain only the diffs from the previous states. I believe that would have a huge impact on the performance.
Make use of the processes. This is Erlang.

TODO features

Some of the interesting functionalities could be added:

State management, while works now, could be improved to cover more exotic cases (like higher order functions and more detailed reasoning about record fields).
Tuples and records could have mutually recursive constraints. Now they are independent which could be limiting if one wanted to have eg a pair of ints where the right is greater than the left hand (useful for integer ranges)
Maps are not supported at all.
Remote contracts could be easily added once maps are working. One could reason about their balances and maybe even states.
Remote calls are not supported (not a very big deal though)

Quo

As for now the project focuses on the arithmetic functionalities of Sophia. This essentially proves that the language is well suited for more sophisticated type systems with value-dependent reasoning which could be later extended to cover more blokchain specific areas. One of the already implemented ones is the balance management – the system will forbid negative spends and spends that are not checked to be affordable by the calling contract.

Right now I am going to focus on the thesis writeup that will report my research, implementations, experiments, general thoughts, possible improvements and alternative approaches.

radrow.chain · August 2, 2021, 9:58am

Update 01.08.2021

The thesis title has been changed to more precise “Liquid types for verification of smart contracts”.

The writeup already has the main body and abstract, but they will remain latent until they are properly reviewed and accepted. As for now I can tell that the document has over 60 pages

Table of contents if anyone is interested:

Introduction
1.1 Blockchain
1.2 Smart contracts
1.3 aeternity
The Sophia Language
2.1 Language overview
2.2 Syntax
2.3 Type system
2.4 Potential issues
Liquid types
3.1 Motivation
3.2 History and overview
3.3 Definition
3.4 Syntax
3.5 Application
Implementation
4.1 Inspiration and overview
4.2 Used technology
4.3 Preprocessing
4.4 The gamma-star environment
4.5 Constraint generation
4.6 Solving
4.7 Postprocessing
Outcome
5.1 Summary
5.2 Value and examples of use
Conclusions
6.1 Future work
6.2 Alternative approaches

radrow.chain · September 30, 2021, 4:46pm

Finished!

The thesis was defended today and received the maximal grade
The reviews are highly enthusiastic and encouraging for publication, so if you are interested in the lecture — here you are (945.1 KB).

The implementation is in the related pull-request.

Abstract

In recent years smart contracts have been developed within the blockchain technology as a trustless tool for managing digital assets, many of which are of considerable value. Due to the nature of blockchains, smart contracts are vulnerable to bugs, because once deployed, they cannot be fixed. This can have wide-ranging financial consequences, as cryptocurrencies and NFTs rapidly gain popularity. Therefore, a great need for meticulous audits emerges, so the vulnerabilities can be found and mitigated before they cause harm.

This work presents liquid types as a method of computer-aided verification of smart contracts. The analysed implementation named Hagia is an extension of the type system of the Sophia language, which has been designed for smart contract development on the æternity blockchain. Liquid types have already found application in many projects written in functional languages such as OCaml or Haskell. The presented algorithms and discussions provide a framework for understanding the key concepts of the liquid inference, use cases and the benefits of the proposed solution in the context of smart contract development.

Acknowledgements

(please refer to the document for the full text)

This research could not come to its shape without my supervisor, Ulf Norell. I am thankful for all his support, suggestions and discussions we had.

I would also like to thank the Aeternity Crypto Foundation for supporting my research.

crypto_user · October 2, 2021, 2:59pm

typo at page 41

erik.chain · October 5, 2021, 8:52am

Wow congrats @radrow.chain!! So glad to see the results of your research

[Completed] Logic Qualified Data Types for Sophia – master thesis

Application Status

Specify the funding category

Application Title

Applicant

Value Application

Definition of Terms

Status Quo

Required Work

Estimate

Known Limitations

Outlook

What happens after this project is completed?

Do you plan to populate your project?

Publishment

Maintainance

Update 26.05.2021

Example: max function

Example: factorial

Example: negative spend

Example: pattern matching

Update 21.06.2021

Update 07.07.2021

Performance

TODO features

Quo

Update 01.08.2021

Finished!

Abstract

Acknowledgements

Example: `max` function