Understanding Solidity Part 3: Virtual Machine

Understanding Solidity Part 3: Virtual Machine


Part 1 and Part 2:

In the previous part we’ve written a Solidity grammar, or at least we’ve written enough to parse our example contract. Today we’ll go one step further and make it actually run by writing our own stack-based virtual machine. If you somehow missed the previous tutorial - you can download the source code from here. This part will involve a lot of Rust, but I’ll gradually introduce new Rust concepts for those who are new to the language so that anyone reading it can follow along. Rust Book

Stack

EVM is a stack based virtual machine. It means that all operations occur on the stack data structure. While queues are FIFO (first in first out), stacks are LIFO (last in first out). If you don’t know what a queue is, you’ve probably been in one.

In essence EVM stack is just an array of unsigned 256bit integers sometimes referred to as words. The stack is limited to 1024 words in total with 32 bytes each, so that’s a limit of 32 MB. Since Rust doesn’t natively support 256bit integers, we’ll use the ethnum crate. Let’s add it to our dependency:

[dependencies]
ethnum = "1.3.2"

Create a new module tinyvm.rs inside our src/ folder and import the U256 type as well as types from our grammar:

use ethnum::U256;
use crate::solidity::grammar::*;

And our newly created module to our main.rs:

pub mod solidity;
pub mod tinyvm; // <--- Add this
use crate::tinyvm::*; // <--- Add this

Now go back to our newly created tinyvm.rs module and declare a struct that will hold our stack:

pub struct Stack {
    stackarr: [U256; 1024],
    top: usize,
}

All we need now is to implement some of the most common stack operations, like push, pop and swap, as all stacks share those in common. EVM in particular has multiple push operations, 32 in fact, but since we’re dealing with booleans we won’t need the entire range of opcodes and just need push1 and push for now.

impl Stack {
    pub fn new() -> Self {
        Self {
            stackarr: [U256::ZERO; 1024],
            top: 0,
        }
    }

    pub fn push32(&mut self, value: U256) {
        if self.top < 1024 {
            self.stackarr[self.top] = value;
            self.top += 1;
        }
    }

    pub fn push1(&mut self, value: u8) {
        self.push32(U256::from(value));
    }

    pub fn pop(&mut self) -> Option<U256> {
        if self.top == 0 {
            None
        } else {
            self.top -= 1;
            Some(self.stackarr[self.top])
        }   //no semicolon in Rust means this expression is returned, 
            //and it will return either None or Some() depending on the condition
    }

    pub fn swap(&mut self) {
        self.stackarr.swap(self.top - 1, self.top - 2);
    }
}

Note Rust doesn’t have inheritance or prototypes like in other object oriented programming languages. Instead it has traits that allow us to extend functionality of any type, including built-in ones. Also note the lack of semicolons and lack of return keyword, as in Rust you don’t need to explicitly return a value from a function.

Both push and pop are pretty self explanatory. push adds one element on top of the stack while pop removes the top most element of the stack. swap swaps the two top most elements.

EVM has over 140 opcodes, but in order to run our little contract we’d only need a small subset of those. Let’s also declare some of the OPcodes we’re actually going to use:

#[derive(Debug, Clone)]
pub enum OP {
    PUSH32(U256),
    PUSH1(u8),
    POP,
    DUP1,
    SWAP1,
    SLOAD,
    SSTORE,
    ISZERO,
    RETURN,
}

Storage

In order to store state variables we need contract storage. It’s simply an array of unsigned 256bit integers where each state variable is stored in a separate slot, so our state variable would be stored at slot 0 (or index 0).

#[derive(Debug, Clone, Default)]
pub struct ContractStorage {
    slots: Vec<U256>,
}

SLOAD and SSTORE OPcodes will be used to read and write to this storage.

Tiny Virtual Machine

A virtual machine is something that steps through each OPcode and evaluates them. Our VM just needs a stack, a program to execute, a program counter and calldata. Calldata is an immutable storage type that contains the function name and arguments passed to that function as raw bytes, where the first 4 bytes are the function signature. For now we’re going to assume that our calldata only contains function signature.

pub struct VM<'a> {
    pub stack: Stack,
    program: Vec<OP>,
    pc: usize,
    calldata: &'a [u8],
}

It’s implementation looks like this:

impl<'a> VM<'a> {
    pub fn new(program: Vec<OP>, calldata: &'a [u8]) -> Self {
        Self {
            stack: Stack::new(),
            program,
            pc: 0,
            calldata: calldata,
        }
    }

    pub fn run(&mut self, storage: ContractStorage) -> ContractStorage {
        let mut storage = storage;
        while self.pc < self.program.len() {
            match self.program[self.pc] {
                OP::PUSH32(word) => todo!(),
                OP::PUSH1(value) => todo!(),
                OP::POP => todo!(),
                OP::DUP1 => todo!(),
                OP::SWAP1 => todo!(),
                OP::SLOAD => todo!(),
                OP::SSTORE => todo!(),
                OP::RETURN => todo!(),
                OP::ISZERO => todo!(),
            }
        };
        storage
    }
}

Note You’re probably wondering what the heck are <'a>, &'a and <'_>. That’s a lifetime with arbitrary name a that we created for our calldata byte array, we’re basically telling the Rust compiler that this data should live as long as our VM instance lives. No need to manually allocate/deallocate memory. todo!() is a macro that will result in our program panicing when called. It’s useful for prototyping as we’ll know if we forgot to implement something while still allowing us to compile and run.

Here we iterate over the instruction set and use Rust’s pattern matching syntax to match each instruction. Kinda like switch case expression in other languages but way more powerful. For every new VM instance we pass it a set of instructions to execute as an argument, and when we run it we pass it a ContractStorage to execute on, returning modified contract storage.

Now let’s implement the OPcodes:

                OP::PUSH32(word) => {
                    self.stack.push32(word);
                    self.pc += 1;
                },
                OP::PUSH1(value) => {
                    self.stack.push1(value);
                    self.pc += 1;
                },
                OP::POP => {
                    self.stack.pop();
                    self.pc += 1;
                },
                OP::SWAP1 => {
                    self.stack.swap();
                    self.pc += 1;
                },
                OP::DUP1 => {
                    let top = self.stack.pop().unwrap();
                    self.stack.push32(top);
                    self.stack.push32(top);
                    self.pc += 1;
                },
                OP::SLOAD => {
                    let key = self.stack.pop().unwrap();
                    let val = storage.slots[key.as_usize()];
                    self.stack.push32(val);
                    self.pc += 1;
                },
                OP::SSTORE => {
                    let key = self.stack.pop().unwrap();
                    let val = self.stack.pop().unwrap();
                    storage.slots[key.as_usize()] = val;
                    self.pc += 1;
                },

SLOAD loads a state variable from storage with top most element as key. SSTORE stores a state variable to storage with top most element as key and second top most element as value.

                OP::RETURN => {
                    self.pc += 1;
                    break;
                },

RETURN OPCode stops the program execution and breaks the loop.

                OP::ISZERO => {
                    let top = self.stack.pop().unwrap();

                    if top == U256::ZERO {
                        self.stack.push32(U256::ONE);
                    } else {
                        self.stack.push32(U256::ZERO);
                    }
                    self.pc += 1;
                },

ISZERO OPCode looks at the top most element of the stack and replaces it with either 1 or 0 depending on whether it’s zero or not, so it flips the value.

Now what we’re missing is an instruction set to run and a way to interact with our VM.

Putting it all together

Start by adding these two structs and these two enums to our tinyvm.rs module:

#[derive(Debug, Default, Clone)]
pub struct Contract {
    pub name: String,
    pub functions: HashMap<String, Function>,
    pub variable_map: HashMap<String, usize>,
    pub storage: ContractStorage,
}

impl Contract {
    pub fn new(name: String) -> Self {
        Self {
            name,
            ..Contract::default()
        }
    }
}

#[derive(Debug, Clone, Default)]
pub struct Function {
    program: Vec<OP>,
    pub visibility: FuncVisibility,
    pub mutability: FuncMutability,
    pub returns: Vec<Parameter>,
}

#[derive(Debug, Clone, Default)]
pub enum FuncVisibility {
    Public,
    Private,
    #[default]
    Internal,
    External,
}
#[derive(Debug, Clone, Default)]
pub enum FuncMutability {
    Constant,
    #[default]
    NonPayable,
    Payable,
    View,
    Pure,
}

Note Enum variants annotated with #[default] will become default values unless explicitly set.

Here we’ve created a Contract struct to hold our contract data, and a Function struct to hold our instructions as well as function attributes. You might also want to import HashMap at the top of the file:

use std::collections::HashMap;

Note HashMap is a simple KV key-value store allowing us to map strings to data.

Remember when in the previous part we’ve written a parser that generates an AST tree? It’s about time we used it for something, so we’ll use that parsed data to walk the tree and search for relevant contract data. Start by implementing a public function create_contracts:

pub fn create_contracts(source_unit: SourceUnit) -> Vec<Contract> {
    handle_source_unit(source_unit)
}

fn handle_source_unit(source_unit: SourceUnit) -> Vec<Contract> {
    //TODO: Iterate over source_unit.parts
}

Your programming instincts are probably telling you to use a for loop here, but no.

fn handle_source_unit(source_unit: SourceUnit) -> Vec<Contract> {
    source_unit.parts.iter().map(|part| handle_source_unit_part(part.clone())).flatten().collect::<Vec<Contract>>()
}

fn handle_source_unit_part(part: SourceUnitPart) -> Option<Contract> {
    todo!()
}

Rust encourages a different approach to iteration “the functional programming way” with the help of combinators. Let’s break it down:

  • .iter() returns an iterator over the elements of a collection
  • .map() transforms each element of the iterator. Map takes in a closure (kinda like a lambda function), for example |num| num * 2 returns value doubled.
  • .flatten() flattens the iterator of iterators into a single iterator, so instead of ending up with Vec<Vec<Contract>> we’ll end up with Vec<Contract>
  • .collect() collects the iterator into a Vec<Contract> collection

So you simply chain them together to transform your data the way you want it.

You can even skip the .map().flatten() and reduce amount of calls by just using .flat_map() instead:

fn handle_source_unit(source_unit: SourceUnit) -> Vec<Contract> {
    source_unit.parts.iter().flat_map(|part| handle_source_unit_part(part.clone())).collect::<Vec<Contract>>()
}

Note We have to clone() the part because we’re passing it by value. Cloning is a rather cheap operation in Rust (at least for small structs) and compiler can optimize it away if it’s not needed, so clone away!

Now we need to handle SourceUnitPart that holds our contract definition. If we find one - we instantiate a new Contract, handle functions and state variables inside the contract and return the contract instance.

fn handle_source_unit_part(part: SourceUnitPart) -> Option<Contract> {
    match part {
        SourceUnitPart::ContractDefinition(_, name, _, parts, _) => {
            let mut contract = Contract::new(name);
            let _ = parts.iter().map(|part| handle_contract_part(part.clone(), &mut contract)).collect::<Vec<_>>();
            Some(contract)
        },
        _ => None,
    }
}

fn handle_contract_part(part: ContractPart, contract: &mut Contract) {
    match part {
        ContractPart::FunctionDefinition(_, name, params, attr_list, ret_params, _, statement, _) => {
            todo!()
        },
        ContractPart::VariableDefinition(ty, visibility, name, _) => {
            todo!()
        },
        ContractPart::ConstructorDefinition(_, params, attr_list, _, statement, _) => {
            //TODO
        }
    }
}

Note &mut allows us to pass a mutable reference to a function, so that we can modify contract instance inside the function.

Let’s first take care of state variables since they’re the low hanging fruit. All we do is insert the state variable into contract’s variable map where we keep track which variable belongs to which slot.

        ContractPart::VariableDefinition(ty, visibility, name, _) => {
            contract.variable_map.insert(name, contract.variable_map.len());
            contract.storage.slots.push(U256::ZERO);
        },

Next we have functions.

fn handle_contract_part(part: ContractPart, contract: &mut Contract) {
    match part {
        ContractPart::FunctionDefinition(_, name, params, attr_list, ret_params, _, statement, _) => {
            if let Some(statement) = statement {
                //TODO: handle function arguments

                let program = handle_statement(statement, contract);
                
                let (visibility, mutability) = handle_attrs(attr_list.clone());
                
                let mut returns = vec![];
                if let Some(FunctionReturnParams::ParameterList(_, ParameterList::Params(_, Some(ret_param), _))) = ret_params.clone() {
                    returns = ret_param.params;
                }

                contract.functions.insert(
                    find_function_signature(name.clone(), params.clone()),
                    Function {
                        program: program,
                        visibility: visibility,
                        mutability: mutability,
                        returns: returns,
                        ..Function::default()
                    }
                );
            }
        },
        ContractPart::VariableDef...
    }
}

fn handle_attrs(attr_list: Vec<Option<FunctionAttribute>>) -> (FuncVisibility, FuncMutability) {
    todo!()
}

fn handle_statement(statement: Statement, contract: &mut Contract) -> Vec<OP> {
    vec![] //TODO: handle statements and expressions next
}

fn find_function_signature(name: String, params: ParameterList) -> String {
    if let ParameterList::Params((), Some(p), ()) = params {
        let params_string = p.params.iter().map(|param| {
            match param.ty {
                Expression::Type(Type::Bool(_)) => Some("bool"),
                _ => None,
            }
        })
        .collect::<Option<Vec<&str>>>().map(|v| v.join(",")).unwrap_or_default();

        format!("{}({})", name, params_str)
        //get_func_sig(format!("{}({})", name, params_string))
    } else {
        format!("{}()", name)
        //get_func_sig(format!("{}()", name))
    }
}

pub fn get_func_sig(in_str: String) -> String {
    in_str
}

Note The "if let" is a handy Rust syntactic sugar to quickly match on a single match arm.

Now to handle function attributes we simply iterate over the list of attributes and check whether they’re private, view, etc.

fn handle_attrs(attr_list: Vec<Option<FunctionAttribute>>) -> (FuncVisibility, FuncMutability) {
    let mut visibility = FuncVisibility::default();
    let mut mutability = FuncMutability::default();

    attr_list.iter().for_each(|attr| {
        if let Some(attr) = attr {
            match attr {
                FunctionAttribute::Visibility(v) => {
                    visibility = match v {
                        Visibility::Public(_) => FuncVisibility::Public,
                        Visibility::Private(_) => FuncVisibility::Private,
                        Visibility::Internal(_) => FuncVisibility::Internal,
                        Visibility::External(_) => FuncVisibility::External,
                    }
                },
                FunctionAttribute::Mutability(m) => {
                    mutability = match m {
                        Mutability::Constant(_) => FuncMutability::Constant,
                        Mutability::Payable(_) => FuncMutability::Payable,
                        Mutability::View(_) => FuncMutability::View,
                        Mutability::Pure(_) => FuncMutability::Pure,
                    }
                },
            }
        }
    });
    (visibility, mutability)
}

Calling functions

So far we’ve built each individual piece of our VM, but we still need a way to interact with it. We do so by giving it calldata.

Let’s implement a call function on Contract that takes calldata as hex and returns a a tuple containing a new modified contract and return the top most value from the stack:

impl Contract {
    pub fn new(name: String) -> Self {
        Self {
            name,
            ..Contract::default()
        }
    }
    
    pub fn call(&self, calldata: &str) -> (Contract, Vec<Expression>) {
        match self.functions.get(&calldata.to_string()) {
            Some(function) => {
                let mut vm = VM::new(function.program.clone(), calldata.as_bytes());
                let new_storage = vm.run(self.storage.clone());
        
                //Read return values from stack
                let mut ret: Vec<Expression> = vec![];
                function.returns.iter().for_each(|param| {
                    if let Some(r) = vm.stack.pop() {
                        match param {
                            Parameter { ty: Expression::Type(Type::Bool(_)), .. } => {
                                    ret.push(Expression::BoolLiteral(r == U256::ONE));
                            },
                            _ => {},
                        }
                    }
                });
        
                (Contract {
                    storage: if let FuncMutability::View | FuncMutability::Pure = function.mutability { self.storage.clone() } else { new_storage },
                    ..self.clone()
                }, ret)
            }
            None => {
                return (self.clone(), vec![]);
            }
        }
    }
}

Calling functions by hash

In EVM you have to call functions by their signature hash. Add a new dependency to our Cargo.toml:

keccak-hash = "0.10.0"

Now import the hash function at the top of tinyvm.rs:

use keccak_hash::{keccak};
fn find_function_signature(name: String, params: ParameterList) -> String {
    ...
        //format!("{}({})", name, params_str)
        get_func_sig(format!("{}({})", name, params_string))   // <-- Add this
    } else {
        //format!("{}()", name)
        get_func_sig(format!("{}()", name))  // <-- Add this
    }
}

pub fn get_func_sig(in_str: String) -> String {
    keccak(in_str.as_bytes())[..4].to_vec().iter().map(|b| format!("{:02x}", b)).collect::<String>() // <-- Add this
}

Don’t relax just yet, as our contract functions do not contain any OPcodes still. To fix that we have to handle statements and expressions found inside function body.

Statements and Expressions

To make our functions actually do anything we need to handle statements and expressions.

fn handle_statement(statement: Statement, contract: &mut Contract) -> Vec<OP> {
    match statement {
        Statement::Expression(expr, _) => {
            handle_expression(expr, contract)
        },
        Statement::Return(_, expr, _) => {
            match expr {
                Some(expr) => [handle_expression(expr, contract), vec![OP::RETURN]].concat(),
                None => vec![OP::RETURN],
            }
        },
    }
}

fn handle_expression(expr: Expression, contract: &mut Contract) -> Vec<OP> {
    todo!()
}

The idea is pretty simple, if we encounter a return statement we add a RETURN OPCode, if we encounter an expression we pass it to another handler.

Now for the final piece of the puzzle, handling of expressions:

fn handle_expression(expr: Expression, contract: &mut Contract) -> Vec<OP> {
    match expr {
        Expression::BoolLiteral(val) => {
            vec![]
        },
        Expression::Variable(identifier) => {
            let mut slot = 0;
            if let Some(found) = contract.variable_map.get(&identifier.name.clone()) {
                slot = *found;
            }

            vec![
                OP::PUSH1(slot as u8),
                OP::SLOAD
            ]
        },
        Expression::Assign(left, _, right) => {
            if let Expression::Variable(identifier) = *left {
                let mut slot = 0;
                if let Some(found) = contract.variable_map.get(&identifier.name.clone()) {
                    slot = *found;
                }
                [handle_expression(*right, contract),
                vec![OP::PUSH1(slot as u8), OP::SSTORE]].concat()
            } else {
                vec![]
            }
        },
        Expression::Not(_, expr) => {
            [handle_expression(*expr, contract), vec![OP::ISZERO]].concat()
        },
        Expression::Type(ty) => {
            match ty {
                Type::Bool(_) => vec![], //TODO
                _ => vec![],
            }
        },
    }
}

Assuming you have been following the steps correctly, you should now have a working VM that can toggle our state variable. Let’s test whether we’re able to toggle the state variable by creating a new unit test:

    #[test]
    fn test_call_flipper_contract_flip() {
        let code = std::fs::read_to_string("./contracts/flipper.sol").expect("Unable to read source file");
        let parsed = parse(code.as_str());
        println!("{:#?}", parsed);
        assert!(parsed.is_ok());

        match parsed {
            Ok(source_unit) => {
                let contracts = create_contracts(source_unit);
                let contract = contracts.first();
                assert!(contract.is_some());

                let flip_func_sig = get_func_sig("flip()".to_string());
                let mutated_contract = contract.unwrap().call(flip_func_sig.as_str()).0;

                let get_func_sig = get_func_sig("get()".to_string());
                let ret = mutated_contract.call(get_func_sig.as_str()).1;
                match ret.as_slice() {
                    [solidity::grammar::Expression::BoolLiteral(val)] => {
                        assert_eq!(*val, true);
                    }
                    _ => assert!(false, "Unexpected return value: {:?}", ret)
                }
                println!("Return value: {:?}", ret);
                
            },
            Err(e) => assert!(false, "Error: {:?}", e)
        }
    }

Run this command to confirm that our contract call indeed returns true:

cargo test test_call_flipper_contract_flip -- --nocapture

The output should be:

Return value: [BoolLiteral(true)]
test tests::test_call_flipper_contract_flip ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 5 filtered out; finished in 0.02s

And that is it for now! I hope this tutorial was educational and you have actually learned a thing or two about Solidity, EVM or Rust itself. Next time we’re going to revisit function arguments, calldata, constructors, getters and more. You can also download the entire project here Until next time!

overlay