A compiler plugin to support casting

 
Download the simple compiler and virtual machine in Php with plugins to support casting:
 
 
 

1. Pluggability

As argued in previous blog posts, programming is primarily the search for an efficient extension mechanism for the program that you are creating. The only result that matters is the emergence of an appropriate extension mechanism. Everything else can be replaced later on. If programming is "difficult", it means that designing this extension mechanism is difficult. The extension mechanism design may also simply be wrong. The extension mechanism is faulty when there is a serious extension impedance mismatch. Programming results in the appearance of artifacts such as files, folders, tables, columns, and similar objects. There is an extension impedance mismatch when adding or removing features leads to modifying existing artifacts. When the extension mechanism works smoothly, adding and removing features leads only to the addition and removal of artifacts.
 
 

2. Pluggability as the solution to merge conflicts

It is also the extension impedance mismatch that creates the versioning problem. The solution to the versioning problem is not to use source control but to avoid the extension impedance mismatch problem altogether. The appropriate engineering strategy consists in re-architecting and re-factoring the program until adding features no longer leads to modifying existing artifacts but to adding them. Consequently, correctly-designed programs are built from the ground up from extensions.
 
Imagine we have a program P0 and the features f1 and f2. We want to add features f1 and f2 to the program. Adding feature f1 amounts to modifying artifacts in program P0, leading to program P1. Adding feature f2 to program P0 amounts to modifying different or even the same artifacts, leading to program P2. From there, there is absolutely no guarantee that we can add feature f2 to program P1 without incurring a merge conflict.
 
If adding feature f1 does not lead to modifying artifacts in program P0, but only to adding artifacts, the new program P1 will contain a verbatim copy of P0. If the addition of feature f2 was originally tested against P0, there can be no physical merge conflict when adding it to P1. We have effectively avoided the extension impedance mismatch and removed the need for version control.
 
Obviously, this procedure does nothing to avoid semantic merge conflicts. But then again, avoiding physical merge conflicts is in itself already a major victory.
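
As a minimal sketch of what "adding artifacts instead of modifying them" can look like in PHP, the following assumes a hypothetical registry in which every feature registers itself under its own name. The names FeatureRegistry, register, run, and names are assumptions made for this illustration; they are not part of the compiler prototype.

```php
<?php
// Hypothetical registry: the core program (P0) stays untouched when features are added.
final class FeatureRegistry
{
    /** @var array<string, callable> */
    private array $features = [];

    public function register(string $name, callable $feature): void
    {
        // Refusing duplicates keeps two features from silently overwriting each other.
        if (isset($this->features[$name])) {
            throw new InvalidArgumentException("Feature already registered: $name");
        }
        $this->features[$name] = $feature;
    }

    public function run(string $name, ...$args)
    {
        if (!isset($this->features[$name])) {
            throw new InvalidArgumentException("Unknown feature: $name");
        }
        return ($this->features[$name])(...$args);
    }

    public function names(): array
    {
        return array_keys($this->features);
    }
}

$registry = new FeatureRegistry();
// In a real program, f1 and f2 would each live in their own file (a new artifact).
$registry->register('f1', fn(int $x): int => $x + 1);
$registry->register('f2', fn(int $x): int => $x * 2);

echo implode(',', $registry->names()), "\n";
```

Adding f2 after f1 touches neither the registry nor f1, which is the property the argument above relies on. Whether f1 and f2 still make sense together remains a semantic question.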
 
 

3. Extending the compiler to support casting

In this blog post, I will add casting support to the compiler prototype.
 
A variable is essentially a memory address. At this memory address, there is a particular memory area, e.g. four bytes, which contains a value, i.e., the memory value. At this level, there is no difference between statically- and dynamically-typed languages. When we have a closer look at the memory value, however, we can see a difference. In dynamically-typed languages, the memory value contains an indication of what type of memory value it is. In statically-typed languages, the interpretation of a memory value is attached to the variable pointing to the memory value.
 
 
language style | memory value
static typing  | 5
dynamic typing | int; 5
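
The difference can be sketched in PHP itself. The TaggedValue class below is a hypothetical model of a dynamically-typed value, written only for this illustration; gettype() is PHP's own function and shows that the type belongs to the value, not to the variable.

```php
<?php
// Hypothetical model of a dynamically-typed value: the type tag travels with the payload.
final class TaggedValue
{
    public string $type;
    public $payload;

    public function __construct(string $type, $payload)
    {
        $this->type = $type;
        $this->payload = $payload;
    }
}

// "int; 5" from the table above. In a statically-typed language only the 5
// would be stored, and "int" would belong to the variable declaration.
$dynamic = new TaggedValue('int', 5);
echo $dynamic->type, '; ', $dynamic->payload, "\n";

// PHP works the same way: the same variable can hold values of different types.
$x = 5;
$before = gettype($x);
$x = 'five';
$after = gettype($x);
echo $before, ' -> ', $after, "\n";
```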
 
 
An example of a cast:
 


 
long a=3;					
int b=(int)a;
 
 
Casting amounts to looking at the same memory location but now in a different way. In dynamically-typed languages this is not possible. You would have to change the type information embedded in the value itself. Casting in dynamically-typed languages would look like this:
 


 
a->value=3;
a->type=TYPE_LONG;
b=a;
 
 
This operation is potentially unsafe just like in statically-typed languages:
 


 
char *a = "hello";
int *b=(int *)a;
 
 
Even though casting is often necessary in statically-typed languages, it is questionable whether there are truly valid use cases for it in dynamically-typed languages. But then again, we can still implement the same casting syntax in scripting in order to convert a value from one type to another, instead of re-interpreting it:
 


 
a=3;
b=(long)a;
 
 

4. Types in scripting languages

A value in scripting will typically be of one of the following types:
 
Primitive | bool | Can be true or false |
          | int | Usually 32-bit integers | stack overflow
          | long | Usually 64-bit integers | This definition does not seem to change with the move from 32-bit to 64-bit architectures
          | float, double | Usually 8-byte real numbers |
          | string | Usually null-terminated utf-8 characters |
          | native, resource | A pointer to a native data structure defined and managed in detail outside the scripting engine |
Complex   | list | Usually a pointer to an array list; sometimes to a linked list |
          | hash table | Usually a pointer to an efficient key-value collection; the workhorse data type in most scripting languages |
          | object | Usually a pointer to a hash table "blessed" with a few added fields; allows for OOP syntactic sugar | Javascript and Lua do not even implement a separate syntax for this. That practice indeed ends up creating more problems than it solves.
 
 

5. Converting between types

Not all types can be converted. Converting a native type to anything else is too dangerous to contemplate. It also does not make sense to look at a list, hash table, or object as anything other than what they are. Conversion of complex types to string (__toString()) can be done in so many different ways that it is not necessarily a good idea to impose a standard way of doing this.
 
Converting to and from booleans is also prone to misinterpretation. For example, converting from int to bool can be absolutely counterintuitive. It is probably better to leave it to the programmer to decide if -1 really means true while 0 and all other numbers including 1 mean false. If converting to and from floats is done with maximum precision the process could be relatively unambiguous. Automatic conversion is therefore only meaningful between the int, float, and string types. Note that we do not just change the type information embedded inside the value. We really convert the entire content itself, i.e. both value and type indication.
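
To see how easily boolean conversion rules surprise, here is a small sketch that prints what PHP's own (bool) cast does with a few values. The helper describeBoolConversions is a hypothetical name used only here; the results are PHP's built-in behaviour, which is just one of several defensible choices.

```php
<?php
// Hypothetical helper: lists each sample next to the result of PHP's (bool) cast.
function describeBoolConversions(array $samples): array
{
    $lines = [];
    foreach ($samples as $sample) {
        $lines[] = var_export($sample, true) . ' => ' . var_export((bool)$sample, true);
    }
    return $lines;
}

// In PHP, -1 is true, the string '0' is false, yet the string '0.0' is true.
$lines = describeBoolConversions([-1, 0, 1, 0.0, '', '0', '0.0', 'false']);
echo implode("\n", $lines), "\n";
```

The string 'false' converting to true is exactly the kind of outcome that argues for leaving boolean conversion to the programmer.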
 
from/to | int | float | string
int     | -   | yes   | yes
float   | yes | -     | yes
string  | yes | yes   | -
 
The PHP script engine already supplies all of the logic needed to implement the conversion. First, the value must pass one of the following type checks: is_float($x), is_int($x), is_string($x). Next, the scripting engine needs to pick the appropriate conversion function from the following list:
 
  • int2float: return floatval($x);
  • int2string: return strval($x);
  • float2int: return intval($x);
  • float2string: return strval($x);
  • string2int: return intval($x);
  • string2float: return floatval($x);
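
The two steps above (detect the source type, then pick a conversion function) can be sketched as follows. The function names typeNameOf and convertValue are assumptions made for this illustration; only is_int, is_float, is_string, intval, floatval, and strval are PHP's own.

```php
<?php
// Step 1: determine the source type; anything else is not convertible here.
function typeNameOf($value): ?string
{
    if (is_int($value)) return 'int';
    if (is_float($value)) return 'float';
    if (is_string($value)) return 'string';
    return null; // bool, lists, hash tables, objects, resources
}

// Step 2: pick the conversion function by target type.
function convertValue($value, string $to)
{
    $converters = [
        'int' => 'intval',
        'float' => 'floatval',
        'string' => 'strval',
    ];
    $from = typeNameOf($value);
    if ($from === null) {
        throw new InvalidArgumentException('Cannot convert from type ' . gettype($value));
    }
    if (!isset($converters[$to])) {
        throw new InvalidArgumentException("Cannot convert to type $to");
    }
    // Same type on both sides: hand the value back unchanged.
    return $from === $to ? $value : $converters[$to]($value);
}

var_dump(convertValue('50', 'int'));
var_dump(convertValue(5.3, 'string'));
```

Because the target type alone determines which PHP function is called, the six entries in the list collapse into three converters in this sketch.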

 

6. Casting syntax support

The casting syntax could conveniently be reused for conversion but it may confuse people. It is not the same operation as in C where it amounts to a re-interpretation of the same memory value.
 
It is also not the same casting as in Java or C#. Most casting in Java/C# amounts to looking at the same object but through the eyes of a parent or a child class. This is another issue that would never occur in scripting languages, because the act of invoking a method on an object is always validated at run time. Especially in highly configurable languages such as Javascript, where you can add and remove methods on the fly, it is generally not possible to say at compile time whether a method will exist for an object. This flexibility is "a good thing". Without it, the user interfaces in web pages would be as inflexible as on the desktop.
 
In fact, casting does not really exist in scripting languages. What exists is conversion. In this blog post, I implement casting just as an exercise, and not because it would be tremendously useful.
 
 

7. Lexer support



 
a = (int)b;
 
 
All symbols are already supported in the existing prototype. We do not need to extend the lexer for casting.
 
 

8. Parser support

The traditional casting notation is actually ambiguous. The following expression would be a perfectly valid arithmetic expression:
 
 


 
a = (x);
 
 
However, we do not know if the x symbol is a type name or an identifier at the point of lexing. It is a well-known problem. In order to know, the lexer would need the entire list of all possible types before lexing this statement. In languages that define new types on the fly, the lexer must use data produced earlier by the parser to distinguish between type names and identifiers. We do not need to put in that effort, since the grammar only accepts a fixed set of type names: bool, int, float, and string (of which the virtual machine plugin only converts between int, float, and string). So, for us, the allowable type names are known beforehand. We just define these type names as new keywords.
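
A minimal sketch of the keyword approach: the function classifyWords below is a hypothetical stand-in, not the prototype's lexer, and it only looks at words. It shows that once the type names are fixed keywords, `(int)` and `(x)` can be told apart without any help from the parser. The token names follow the ones used by the plugin.

```php
<?php
// Hypothetical mini-lexer: classifies each word as a type keyword or an identifier.
function classifyWords(string $source): array
{
    $typeKeywords = [
        'bool' => 'TYPE_BOOL',
        'int' => 'TYPE_INT',
        'float' => 'TYPE_FLOAT',
        'string' => 'TYPE_STRING',
    ];
    preg_match_all('/[A-Za-z_][A-Za-z0-9_]*/', $source, $matches);
    $tokens = [];
    foreach ($matches[0] as $word) {
        // A word that is not a known type name can only be an identifier.
        $tokens[] = [$typeKeywords[$word] ?? 'IDENTIFIER', $word];
    }
    return $tokens;
}

print_r(classifyWords('a = (int)b;'));
print_r(classifyWords('a = (x);'));
```

The price of this shortcut is that bool, int, float, and string can no longer be used as variable names in the scripting language.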
 
 

9. Example program



 
//These are examples of CAST conversions
// (1) int2string
a=(string)-100;
println('a='.a);
 
// (2) string2int
a=(int)"50";
println('a='.a);
 
// (3) int2float
a=(float)5;
println('a='.a);
 
// (4) float2int
a=(int)5.3;
println('a='.a);
 
// (5) float2string
a=(string)5.3;
println('a='.a);
 
// (6) string2float
a=(float)"5.3";
println('a='.a);
 
 
The program output is:


 
$ ./tl examples/example5.tl
a=-100
a=50
a=5
a=5
a=5.3
a=5.3
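
For comparison, the same six conversions can be written directly in PHP with the functions listed in section 5. This is only a sketch to cross-check the output above; it does not involve the compiler or the virtual machine.

```php
<?php
// Plain-PHP counterparts of the six conversions in the example program.
$results = [
    strval(-100),      // (1) int2string
    intval("50"),      // (2) string2int
    floatval(5),       // (3) int2float
    intval(5.3),       // (4) float2int
    strval(5.3),       // (5) float2string
    floatval("5.3"),   // (6) string2float
];

foreach ($results as $a) {
    // String concatenation prints the float 5.0 as "5", as in the listing above.
    echo 'a=' . $a . "\n";
}
```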
 
 
 

10. Compiler plugin

 
A semantic compiler plugin is sufficient to support the casting syntax in the scripting engine. There is no need for other plugins:


 
<?php
class _Cast implements ICompilerSemanticPlugin
{
        // Declare the type-name tokens and the grammar rules for the cast expression.
        function onGeneratingGrammar($generator)
        {
                $generator->addToken('TYPE_BOOL');
                $generator->addToken('TYPE_INT');
                $generator->addToken('TYPE_FLOAT');
                $generator->addToken('TYPE_STRING');
                $generator->addPriority('CAST','left',PARSER_PRIORITY_CAST);
                $generator->addGrammarRule('expression: cast expression %prec CAST');
                $generator->addGrammarRule('cast: BRACKET_LEFT type_name BRACKET_RIGHT');
                $generator->addGrammarRule('type_name: TYPE_BOOL');
                $generator->addGrammarRule('type_name: TYPE_INT');
                $generator->addGrammarRule('type_name: TYPE_FLOAT');
                $generator->addGrammarRule('type_name: TYPE_STRING');
        }

        // Make the lexer recognize the type names as keywords.
        function beforeLexing($compiler)
        {
                $compiler->addLexerKeywordPatternRule(LEXER_PRIORITY_KEYWORD,'TYPE_BOOL','bool');
                $compiler->addLexerKeywordPatternRule(LEXER_PRIORITY_KEYWORD,'TYPE_INT','int');
                $compiler->addLexerKeywordPatternRule(LEXER_PRIORITY_KEYWORD,'TYPE_FLOAT','float');
                $compiler->addLexerKeywordPatternRule(LEXER_PRIORITY_KEYWORD,'TYPE_STRING','string');
        }

        function beforeParsing($compiler)
        {
                // On reducing 'cast: BRACKET_LEFT type_name BRACKET_RIGHT':
                // emit the type name as an operand with symbol TYPE.
                $compiler->associateRuleNumberForLHSSymbol('CAST','cast');
                $compiler->registerNamedReduceCallback(
                        'CAST',
                        function($me,$stackStates)
                        {
                                $embeddedStackStates=$stackStates[1]->token->value;
                                $embeddedStackStates[0]->token->symbol='TYPE';
                                emitOperand($me,$embeddedStackStates);
                        }
                );

                // On reducing 'expression: cast expression':
                // emit the CAST operator, which takes 2 arguments (type name and value).
                $compiler->associateRuleNumberForRHSSymbolAndNumberOfTerms('CAST_OPERATOR','cast',2);
                $compiler->registerNamedReduceCallback(
                        'CAST_OPERATOR',
                        function($me,$stackStates)
                        {
                                $embeddedStackStates=$stackStates[0]->token->value;
                                $stackState=$embeddedStackStates[0];
                                $stackState->token->symbol='CAST';
                                emitOperator($me,$stackState,2);
                        }
                );
        }
}
?>
 
 
The compiler generates the following bytecode:
 


 
----          |----|------        |--- |---|---|---    |-----
code          |args|symbol        |pos |len|lin|col    |value 
----          |----|------        |--- |---|---|---    |----- 
PUSH_OPERAND  |    |IDENTIFIER    |60  |1  |4  |1      |a 
PUSH_OPERAND  |    |TYPE          |63  |6  |4  |4      |string 
PUSH_OPERAND  |    |NUMBER        |71  |3  |4  |12     |100 
EXEC_OPERATOR |1   |UNARY_MINUS   |70  |1  |4  |11     |- 
EXEC_OPERATOR |2   |CAST          |62  |1  |4  |3      |( 
EXEC_OPERATOR |2   |ASSIGN        |61  |1  |4  |2      |= 
RESET_STACK   |    |SEMICOLON     |74  |1  |4  |15     |; 
PUSH_OPERAND  |    |STRING        |84  |4  |5  |9      |a= 
PUSH_OPERAND  |    |IDENTIFIER    |89  |1  |5  |14     |a 
EXEC_OPERATOR |2   |CONCAT        |88  |1  |5  |13     |. 
PUSH_OPERAND  |    |IDENTIFIER    |76  |7  |5  |1      |println 
EXEC_OPERATOR |2   |FUNCTION_CALL |76  |0  |5  |1      | 
RESET_STACK   |    |SEMICOLON     |91  |1  |5  |16     |; 
PUSH_OPERAND  |    |IDENTIFIER    |112 |1  |8  |1      |a 
PUSH_OPERAND  |    |TYPE          |115 |3  |8  |4      |int 
PUSH_OPERAND  |    |STRING        |119 |4  |8  |8      |50 
EXEC_OPERATOR |2   |CAST          |114 |1  |8  |3      |( 
EXEC_OPERATOR |2   |ASSIGN        |113 |1  |8  |2      |= 
RESET_STACK   |    |SEMICOLON     |123 |1  |8  |12     |; 
PUSH_OPERAND  |    |STRING        |133 |4  |9  |9      |a= 
PUSH_OPERAND  |    |IDENTIFIER    |138 |1  |9  |14     |a 
EXEC_OPERATOR |2   |CONCAT        |137 |1  |9  |13     |. 
PUSH_OPERAND  |    |IDENTIFIER    |125 |7  |9  |1      |println 
EXEC_OPERATOR |2   |FUNCTION_CALL |125 |0  |9  |1      | 
RESET_STACK   |    |SEMICOLON     |140 |1  |9  |16     |; 
PUSH_OPERAND  |    |IDENTIFIER    |159 |1  |12 |1      |a 
PUSH_OPERAND  |    |TYPE          |162 |4  |12 |4      |bool 
PUSH_OPERAND  |    |NUMBER        |167 |1  |12 |9      |5 
EXEC_OPERATOR |2   |CAST          |161 |1  |12 |3      |( 
EXEC_OPERATOR |2   |ASSIGN        |160 |1  |12 |2      |= 
RESET_STACK   |    |SEMICOLON     |168 |1  |12 |10     |; 
PUSH_OPERAND  |    |STRING        |178 |4  |13 |9      |a= 
PUSH_OPERAND  |    |IDENTIFIER    |183 |1  |13 |14     |a 
EXEC_OPERATOR |2   |CONCAT        |182 |1  |13 |13     |. 
PUSH_OPERAND  |    |IDENTIFIER    |170 |7  |13 |1      |println 
EXEC_OPERATOR |2   |FUNCTION_CALL |170 |0  |13 |1      | 
RESET_STACK   |    |SEMICOLON     |185 |1  |13 |16     |; 
PUSH_OPERAND  |    |IDENTIFIER    |205 |1  |16 |1      |a 
PUSH_OPERAND  |    |TYPE          |208 |5  |16 |4      |float 
PUSH_OPERAND  |    |NUMBER        |214 |1  |16 |10     |5 
EXEC_OPERATOR |2   |CAST          |207 |1  |16 |3      |( 
EXEC_OPERATOR |2   |ASSIGN        |206 |1  |16 |2      |= 
RESET_STACK   |    |SEMICOLON     |215 |1  |16 |11     |; 
PUSH_OPERAND  |    |STRING        |225 |4  |17 |9      |a= 
PUSH_OPERAND  |    |IDENTIFIER    |230 |1  |17 |14     |a 
EXEC_OPERATOR |2   |CONCAT        |229 |1  |17 |13     |. 
PUSH_OPERAND  |    |IDENTIFIER    |217 |7  |17 |1      |println 
EXEC_OPERATOR |2   |FUNCTION_CALL |217 |0  |17 |1      | 
RESET_STACK   |    |SEMICOLON     |232 |1  |17 |16     |; 
PUSH_OPERAND  |    |IDENTIFIER    |252 |1  |20 |1      |a 
PUSH_OPERAND  |    |TYPE          |255 |3  |20 |4      |int 
PUSH_OPERAND  |    |NUMBER        |259 |3  |20 |8      |5.3 
EXEC_OPERATOR |2   |CAST          |254 |1  |20 |3      |( 
EXEC_OPERATOR |2   |ASSIGN        |253 |1  |20 |2      |= 
RESET_STACK   |    |SEMICOLON     |262 |1  |20 |11     |; 
PUSH_OPERAND  |    |STRING        |272 |4  |21 |9      |a= 
PUSH_OPERAND  |    |IDENTIFIER    |277 |1  |21 |14     |a 
EXEC_OPERATOR |2   |CONCAT        |276 |1  |21 |13     |. 
PUSH_OPERAND  |    |IDENTIFIER    |264 |7  |21 |1      |println 
EXEC_OPERATOR |2   |FUNCTION_CALL |264 |0  |21 |1      | 
RESET_STACK   |    |SEMICOLON     |279 |1  |21 |16     |; 
PUSH_OPERAND  |    |IDENTIFIER    |302 |1  |24 |1      |a 
PUSH_OPERAND  |    |TYPE          |305 |6  |24 |4      |string 
PUSH_OPERAND  |    |NUMBER        |312 |3  |24 |11     |5.3 
EXEC_OPERATOR |2   |CAST          |304 |1  |24 |3      |( 
EXEC_OPERATOR |2   |ASSIGN        |303 |1  |24 |2      |= 
RESET_STACK   |    |SEMICOLON     |315 |1  |24 |14     |; 
PUSH_OPERAND  |    |STRING        |325 |4  |25 |9      |a= 
PUSH_OPERAND  |    |IDENTIFIER    |330 |1  |25 |14     |a 
EXEC_OPERATOR |2   |CONCAT        |329 |1  |25 |13     |. 
PUSH_OPERAND  |    |IDENTIFIER    |317 |7  |25 |1      |println 
EXEC_OPERATOR |2   |FUNCTION_CALL |317 |0  |25 |1      | 
RESET_STACK   |    |SEMICOLON     |332 |1  |25 |16     |; 
PUSH_OPERAND  |    |IDENTIFIER    |355 |1  |28 |1      |a 
PUSH_OPERAND  |    |TYPE          |358 |5  |28 |4      |float 
PUSH_OPERAND  |    |STRING        |364 |5  |28 |10     |5.3 
EXEC_OPERATOR |2   |CAST          |357 |1  |28 |3      |( 
EXEC_OPERATOR |2   |ASSIGN        |356 |1  |28 |2      |= 
RESET_STACK   |    |SEMICOLON     |369 |1  |28 |15     |; 
PUSH_OPERAND  |    |STRING        |379 |4  |29 |9      |a= 
PUSH_OPERAND  |    |IDENTIFIER    |384 |1  |29 |14     |a 
EXEC_OPERATOR |2   |CONCAT        |383 |1  |29 |13     |. 
PUSH_OPERAND  |    |IDENTIFIER    |371 |7  |29 |1      |println 
EXEC_OPERATOR |2   |FUNCTION_CALL |371 |0  |29 |1      | 
RESET_STACK   |    |SEMICOLON     |386 |1  |29 |16     |; 
 
 
The compiler generates a new kind of operator for the virtual machine instruction EXEC_OPERATOR: CAST.
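
To make the listing easier to follow, here is a heavily simplified, hypothetical stack machine that executes the instructions for `a=(int)"50";`. The instruction and symbol names follow the listing above; the function runBytecode, the instruction layout, and the use of PHP's settype() are assumptions made for this sketch and are not the prototype's virtual machine.

```php
<?php
// Hypothetical mini stack machine; each instruction is [code, symbol, value].
function runBytecode(array $program): array
{
    $stack = [];
    $variables = [];
    foreach ($program as [$code, $symbol, $value]) {
        if ($code === 'PUSH_OPERAND') {
            $stack[] = $value;
            continue;
        }
        if ($code !== 'EXEC_OPERATOR') {
            throw new RuntimeException("Unknown instruction: $code");
        }
        // Both operators used here are binary: the right operand is on top of the stack.
        $right = array_pop($stack);
        $left = array_pop($stack);
        if ($symbol === 'CAST') {
            // $left is the type name, $right the value to convert.
            settype($right, $left);
            $stack[] = $right;
        } elseif ($symbol === 'ASSIGN') {
            // $left is the variable name, $right the value to store.
            $variables[$left] = $right;
        } else {
            throw new RuntimeException("Unknown operator: $symbol");
        }
    }
    return $variables;
}

$variables = runBytecode([
    ['PUSH_OPERAND', 'IDENTIFIER', 'a'],
    ['PUSH_OPERAND', 'TYPE', 'int'],
    ['PUSH_OPERAND', 'STRING', '50'],
    ['EXEC_OPERATOR', 'CAST', '('],
    ['EXEC_OPERATOR', 'ASSIGN', '='],
]);
var_dump($variables['a']);
```

The order of the instructions shows why CAST is a binary operator: the type name is pushed first, the value second, and the operator consumes both.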
 
 
 

11. Virtual machine plugin

We implement the virtual machine plugin as follows:


 
<?php 
/* 
        Virtual Machine operator   
        Convention: vmOperator+operator name 
        For binary operators, the VM already pops 2 elements from the stack 
        and sends them through the function call 
 
*/ 
 
function vmTypeName($value) 
{ 
        if(is_int($value)) return 'int'; 
        else if(is_float($value)) return 'float'; 
        else if(is_string($value)) return 'string'; 
        else return null; 
} 
 
function vmSupportedCastTypeNames($typeName) 
{ 
        switch($typeName) 
        { 
                case 'int': return true; 
                case 'float': return true; 
                case 'string': return true; 
                default: return false; 
        } 
} 
 
function vmOperatorCast($vm,$vmInstruction,$value1,$value2) 
{ 
        $typeNameFrom=vmTypeName($value2); 
        if(!vmSupportedCastTypeNames($typeNameFrom)) 
                        vmInstructionError($vm,$vmInstruction, 
                                        "Cannot cast from type ".gettype($value2)); 
        $typeNameTo=$value1; 
        if(!vmSupportedCastTypeNames($typeNameTo)) 
                        vmInstructionError($vm,$vmInstruction, 
                                        "Cannot cast to type ".$typeNameTo); 
 
        if($typeNameFrom==$typeNameTo) 
        { 
                //nothing to do 
                return; 
        } 
        $conversionFunctionName=$typeNameFrom.'2'.ucfirst($typeNameTo); 
        vmPushResult($vm,$value2); 
        vmPushResult($vm,$conversionFunctionName); 
        vmOperatorFunction_Call($vm,$vmInstruction); 
} 
?> 
 
 
 
The cast instruction is just a binary operator taking a type name and the result of an expression. It looks up the conversion function between the actual type of the value and the desired one, and then calls that function. We simply add these conversion functions to the collection of built-in functions:
 
  • float2Int.php
  • float2String.php
  • int2Float.php
  • int2String.php
  • string2Float.php
  • string2Int.php
 
The implementation of such a conversion function is rather simple. For example, float2Int.php:
 


 
<?php  
/*  
            Virtual Machine function plugin  
            Convention: vmFunction+function name  
            Will be called with a reference to the vm: $vm  
            a reference to the instruction that is executing the function call: $vmInstruction  
            the arguments to the function: $args  
 
*/  
function vmFunctionfloat2Int($vm,$vmInstruction,$args)  
{  
            vmAssertNumberOfFunctionArgs($vm,$vmInstruction,'float2Int',$args,1);  
            return intval($args[0]);  
}  
?>  
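
The other conversion functions presumably follow the same pattern. As a sketch, string2Float.php could look like the code below. The stand-in for vmAssertNumberOfFunctionArgs is an assumption, added only so that the example runs on its own; the prototype's real helper is not shown in this post and may behave differently.

```php
<?php
// Stand-in for the prototype's helper (assumed behaviour: reject a wrong argument count).
if (!function_exists('vmAssertNumberOfFunctionArgs')) {
    function vmAssertNumberOfFunctionArgs($vm, $vmInstruction, $functionName, $args, $expected)
    {
        if (count($args) !== $expected) {
            throw new InvalidArgumentException(
                "$functionName expects $expected argument(s), got " . count($args)
            );
        }
    }
}

// Same convention as float2Int: vmFunction + function name.
function vmFunctionstring2Float($vm, $vmInstruction, $args)
{
    vmAssertNumberOfFunctionArgs($vm, $vmInstruction, 'string2Float', $args, 1);
    return floatval($args[0]);
}

// The sketch does not need a real VM, so null is passed for $vm and $vmInstruction.
var_dump(vmFunctionstring2Float(null, null, ['5.3']));
```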
 
 
 

12. Conclusion

 
In dynamically-typed languages, casting is at best just syntactic sugar for type conversion.