A compiler plugin to support casting
- Published on Sunday, 01 April 2012 07:15
Download the simple compiler and virtual machine in Php with plugins to support casting:
1. Pluggability
As argued in previous blog posts, programming is primarily the search for an efficient extension mechanism for the program that you are creating. The only result that matters is the emergence of an appropriate extension mechanism; everything else can be replaced later on. If programming is "difficult", it means that designing this extension mechanism is difficult. The design of the extension mechanism may also simply be wrong: the extension mechanism is faulty when there is a serious extension impedance mismatch. Programming produces artifacts such as files, folders, tables, columns, and similar objects. There is an extension impedance mismatch when adding or removing features forces you to modify existing artifacts. When the extension mechanism works smoothly, adding and removing features leads only to the addition and removal of artifacts.
2. Pluggability as the solution to merge conflicts
It is also the extension impedance mismatch that creates the versioning problem. The solution to the versioning problem is not to use source control, but to avoid the extension impedance mismatch altogether. The appropriate engineering strategy consists of re-architecting and refactoring the program until adding features no longer leads to modifying existing artifacts, but only to adding new ones. Consequently, correctly-designed programs are built entirely out of extensions, from the ground up.
Imagine we have a program P0 and the features f1 and f2. We want to add features f1 and f2 to the program. Adding feature f1 amounts to modifying artifacts in program P0, which leads to program P1. Adding feature f2 to program P0 amounts to modifying different or even the same artifacts, which leads to program P2. From there, there is absolutely no guarantee that we can add feature f2 to program P1 without incurring a merge conflict.
If adding feature f1 does not lead to modifying artifacts in program P0, but only to adding artifacts, the new program P1 will contain a verbatim copy of P0. If the addition of feature f2 was originally tested against P0, there can be no physical merge conflict when adding it to P1. We have effectively avoided the extension impedance mismatch and removed the need for version control.
Obviously, this procedure does nothing to avoid semantic merge conflicts. But then again, avoiding physical merge conflicts is in itself already a major victory.
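The argument can be made concrete with a small Php sketch, loosely modelled on the ICompilerSemanticPlugin interface that the compiler plugins in this post implement. The names IFeature, FeatureHost, GreetFeature and FarewellFeature are hypothetical. The host discovers its features at run time, so adding feature f1 or f2 means adding a class, never editing the host: exactly the situation described above, in which P1 still contains a verbatim copy of P0.

```php
<?php
// Hypothetical extension point: every feature is its own class.
interface IFeature
{
    public function name(): string;
    public function run(array $input): array;
}

// The host (program P0) only knows the interface, never the concrete features.
final class FeatureHost
{
    private $features = [];

    // Register every loaded class that implements IFeature. In a real system the
    // feature classes would first be included from a plugin folder (assumption).
    public static function discover(): FeatureHost
    {
        $host = new FeatureHost();
        foreach (get_declared_classes() as $class) {
            if (in_array('IFeature', class_implements($class), true)) {
                $feature = new $class();
                $host->features[$feature->name()] = $feature;
            }
        }
        return $host;
    }

    // Run all features, ordered by name so the result does not depend on load order.
    public function run(array $input): array
    {
        ksort($this->features);
        foreach ($this->features as $feature) {
            $input = $feature->run($input);
        }
        return $input;
    }
}

// Feature f1: a new artifact; the host above stays unchanged.
final class GreetFeature implements IFeature
{
    public function name(): string { return 'greet'; }
    public function run(array $input): array { $input[] = 'hello'; return $input; }
}

// Feature f2: developed against the same host, independently of f1.
final class FarewellFeature implements IFeature
{
    public function name(): string { return 'farewell'; }
    public function run(array $input): array { $input[] = 'goodbye'; return $input; }
}

print_r(FeatureHost::discover()->run([]));
```

Removing a feature is the mirror image: delete its class, and the host again needs no edit.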
3. Extending the compiler to support casting
In this blog post, I will add casting support to the compiler prototype.
A variable is essentially a memory address. At this memory address there is a particular memory area, e.g. four bytes, which contains a value, i.e., the memory value. At this level, there is no difference between statically- and dynamically-typed languages. When we take a closer look at the memory value, however, we can see a difference. In dynamically-typed languages, the memory value contains an indication of its own type. In statically-typed languages, the interpretation of a memory value is attached to the variable pointing to it.
| language style | memory value |
| static typing | 5 |
| dynamic typing | int; 5 |
An example of a cast:
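In C, a cast that re-interprets memory is typically written as a pointer cast such as `*(float *)&i`. Php has no pointers, but the effect can be mimicked in a minimal sketch: `pack()` writes the 32-bit integer into four raw bytes, and `unpack()` reads the very same bytes back as a float. This assumes the usual 4-byte IEEE 754 float for the `f` format; the helper names `reinterpretIntAsFloat()` and `convertIntToFloat()` are hypothetical.

```php
<?php
// Reinterpret the raw bits of a 32-bit integer as a 32-bit float, mimicking
// the C idiom *(float *)&i: the bytes stay the same, only their reading changes.
function reinterpretIntAsFloat(int $i): float
{
    $bytes = pack('l', $i);          // 4 bytes, machine byte order
    return unpack('f', $bytes)[1];   // the same 4 bytes read as a float
}

// For contrast: a conversion preserves the numeric value instead.
function convertIntToFloat(int $i): float
{
    return (float) $i;
}

var_dump(reinterpretIntAsFloat(5)); // a tiny denormal number, not 5.0
var_dump(convertIntToFloat(5));     // float(5)
```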
In statically-typed languages, a cast (in C, typically a pointer cast) amounts to looking at the same memory location, but now in a different way. In dynamically-typed languages this is not possible, because the interpretation travels with the value: you would have to change the type information embedded in the value itself. Casting in a dynamically-typed language would therefore amount to overwriting that embedded type information.
This operation is potentially unsafe, just as it is in statically-typed languages.
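To make the tag-rewriting idea concrete, here is a minimal sketch that models a dynamically-typed value as a type tag plus four raw bytes. This representation and the helpers `makeInt()`, `retag()` and `readValue()` are assumptions for illustration, not how any particular engine stores values. The "cast" only rewrites the tag; because the bytes stay the same, reading them under the new tag produces garbage, which is exactly what makes the operation unsafe.

```php
<?php
// A dynamically-typed value modelled as a type tag plus 4 raw bytes (assumption).
function makeInt(int $i): array
{
    return ['type' => 'int', 'bytes' => pack('l', $i)];
}

// A C-style "cast": rewrite the tag, leave the bytes untouched.
function retag(array $value, string $newType): array
{
    $value['type'] = $newType;
    return $value;
}

// Decode the raw bytes according to whatever the tag claims.
function readValue(array $value)
{
    switch ($value['type']) {
        case 'int':
            return unpack('l', $value['bytes'])[1];
        case 'float':
            return unpack('f', $value['bytes'])[1];
        default:
            throw new InvalidArgumentException('Unknown type tag ' . $value['type']);
    }
}

$five = makeInt(5);
var_dump(readValue($five));                 // int(5)
var_dump(readValue(retag($five, 'float'))); // not 5.0 but a tiny denormal float
```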
Even though casting is often necessary in statically-typed languages, it is questionable whether there are truly valid use cases for it in dynamically-typed languages. But then again, we can still implement the same casting syntax in a scripting language in order to perform a conversion, instead of a re-interpretation, of a value from one type to another.
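Php itself already follows this approach: it borrows C's `(type)` syntax, but each cast converts the value together with its type. The snippet below uses only Php's built-in casts and mirrors the six conversions used later in this post:

```php
<?php
// Php reuses C's cast syntax, but every cast below converts the value
// (and its type) rather than re-interpreting the underlying bytes.
$results = [
    'string2int'   => (int) "50",
    'int2float'    => (float) 5,
    'float2int'    => (int) 5.3,      // truncates toward zero
    'float2string' => (string) 5.3,
    'string2float' => (float) "5.3",
    'int2string'   => (string) -100,
];
var_dump($results);
```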
4. Types in scripting languages
A value in a scripting language will typically be of one of the following types:
| category | type | description | notes |
| Primitive | bool | Can be true or false | |
| | int | Usually 32-bit integers | Stack Overflow |
| | long | Usually 64-bit integers | This definition does not seem to change with the move from 32-bit to 64-bit architectures |
| | float, double | Usually 8-byte real numbers | |
| | string | Usually a length-counted sequence of bytes, often UTF-8 encoded | |
| | native, resource | A pointer to a native data structure defined and managed in detail outside the scripting engine | |
| Complex | list | Usually a pointer to an array list; sometimes to a linked list | |
| | hash table | Usually a pointer to an efficient key-value collection; the workhorse data type in most scripting languages | |
| | object | Usually a pointer to a hash table "blessed" with a few added fields; allows for OOP syntactic sugar | Javascript and Lua do not even implement a separate syntax for this; implementing one indeed tends to create more problems than it solves. |
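For Php, the table can be made concrete with a small, hypothetical helper `scriptTypeOf()` that maps a value onto the table's categories using only the built-in `is_*` predicates. Php uses a single array type for both lists and hash tables; the list/hash split below (keys 0..n-1 in order) is an assumption made only for illustration.

```php
<?php
// Map a Php value onto the categories of the table above (hypothetical helper).
function scriptTypeOf($value): string
{
    if (is_bool($value))     return 'bool';
    if (is_int($value))      return 'int';     // PHP_INT_SIZE bytes wide
    if (is_float($value))    return 'float';   // typically an IEEE 754 double
    if (is_string($value))   return 'string';
    if (is_resource($value)) return 'native';
    if (is_object($value))   return 'object';
    if (is_array($value)) {
        // One array type serves both roles; keys 0..n-1 in order count as a list.
        $isList = $value === [] || array_keys($value) === range(0, count($value) - 1);
        return $isList ? 'list' : 'hash table';
    }
    return 'unknown';                          // e.g. null
}

echo scriptTypeOf([1, 2, 3]), PHP_EOL;   // list
echo scriptTypeOf(['a' => 1]), PHP_EOL;  // hash table
```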
5. Converting between types
Not all types can be converted. Converting a native type to anything else is too dangerous to contemplate. It also does not make sense to look at a list, hash table, or object as anything other than what it is. Conversion of complex types to string (__toString()) can be done in so many different ways that it is not necessarily a good idea to impose a standard way of doing it.
Converting to and from booleans is also prone to misinterpretation. For example, converting from int to bool can be absolutely counterintuitive. It is probably better to leave it to the programmer to decide whether -1 really means true while 0 and all other numbers, including 1, mean false. If converting to and from floats is done with maximum precision, the process can be relatively unambiguous. Automatic conversion is therefore only meaningful between the int, float, and string types. Note that we do not just change the type information embedded inside the value: we really convert the entire content, i.e. both the value and its type indication.
| from/to | int | float | string |
| int | - | yes | yes |
| float | yes | - | yes |
| string | yes | yes | - |
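Php's own conversions to bool illustrate why that direction is best left to the programmer: every non-zero number converts to true, and among strings only "" and "0" convert to false, so "0.0" and "false" both become true. The snippet uses only built-in casts:

```php
<?php
// Php's built-in conversions to bool; several results are easy to misread.
$cases = [
    'int -1'         => (bool) -1,       // true: every non-zero number
    'int 0'          => (bool) 0,        // false
    'float 0.0'      => (bool) 0.0,      // false
    'string "0"'     => (bool) "0",      // false: "0" is special-cased
    'string "0.0"'   => (bool) "0.0",    // true, although (float) "0.0" is 0.0
    'string "false"' => (bool) "false",  // true: any other non-empty string
];
foreach ($cases as $label => $result) {
    echo str_pad($label, 16), ' => ', var_export($result, true), PHP_EOL;
}
```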
The Php script engine already supplies all of the logic needed to implement the conversion. First, the value must satisfy one of the following predicates: is_float($x), is_int($x), is_string($x). Next, the scripting engine needs to pick the appropriate conversion function from the following list:
- int2float: return floatval($x);
- int2string: return strval($x);
- float2int: return intval($x);
- float2string: return strval($x);
- string2int: return intval($x);
- string2float: return floatval($x);
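The two steps above (classify the value with is_int(), is_float() or is_string(), then pick one of the six conversions) can be combined into one function. This is a minimal sketch: the name `convertScalar()` and the use of InvalidArgumentException for unsupported types are assumptions, not part of the prototype.

```php
<?php
// Convert a scalar between int, float and string, following the conversion
// matrix above. Any other source or target type is rejected.
function convertScalar($value, string $to)
{
    if (is_int($value)) {
        $from = 'int';
    } elseif (is_float($value)) {
        $from = 'float';
    } elseif (is_string($value)) {
        $from = 'string';
    } else {
        throw new InvalidArgumentException('Cannot convert from ' . gettype($value));
    }

    // The six conversions listed above, keyed "from2to".
    $conversions = [
        'int2float'    => 'floatval',
        'int2string'   => 'strval',
        'float2int'    => 'intval',
        'float2string' => 'strval',
        'string2int'   => 'intval',
        'string2float' => 'floatval',
    ];

    if ($from === $to) {
        return $value;                      // nothing to convert
    }
    $key = $from . '2' . $to;
    if (!isset($conversions[$key])) {
        throw new InvalidArgumentException('Cannot convert to ' . $to);
    }
    return $conversions[$key]($value);      // call the built-in by name
}

var_dump(convertScalar("5.3", 'float'));   // float(5.3)
```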
6. Casting syntax support
The casting syntax could conveniently be reused for conversion, but it may confuse people. It is not the same operation as in C, where a cast (in particular a pointer cast) can amount to a re-interpretation of the same memory value.
It is also not the same as casting in Java or C#. Most casting in Java/C# amounts to looking at the same object, but through the eyes of a parent or a child class. This issue would never occur in scripting languages, because invoking a method on an object is always validated at run time. Especially in highly configurable languages such as Javascript, where you can add and remove methods on the fly, it is generally not possible to say at compile time whether a method will exist for an object. This flexibility is "a good thing". Without it, the user interfaces in web pages would be as inflexible as on the desktop.
In fact, casting does not really exist in scripting languages. What exists is conversion. In this blog post, I implement casting just as an exercise, and not because it would be tremendously useful.
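In a scripting language there is nothing to up- or down-cast: a method is simply looked up on the object when the call executes. The following Php sketch (the classes `Duck` and `Robot` and the helper `callIfPresent()` are hypothetical) calls the same method on unrelated classes without any cast, and checks for the method at run time with `method_exists()`:

```php
<?php
// Two unrelated classes; there is no common parent to cast to.
class Duck
{
    public function speak(): string { return 'quack'; }
}

class Robot
{
    public function speak(): string { return 'beep'; }
}

// The method is looked up at run time, so no cast is ever needed.
function callIfPresent($object, string $method)
{
    if (!method_exists($object, $method)) {
        return null;   // the method does not exist on this object
    }
    return $object->$method();
}

foreach ([new Duck(), new Robot(), new stdClass()] as $thing) {
    var_dump(callIfPresent($thing, 'speak'));
}
```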
7. Lexer support
All symbols used by the casting notation are already supported in the existing prototype. We do not need a separate lexer extension for casting; the type-name keywords are registered by the semantic compiler plugin itself.
8. Parser support
The traditional casting notation is actually ambiguous. The following expression would be a perfectly valid arithmetic expression:
However, at the point of lexing we do not know whether the x symbol is a type name or an identifier. This is a well-known problem; in C compilers the usual workaround is known as the "lexer hack". To decide, the lexer would need the complete list of all possible types before lexing this statement. In languages that define new types on the fly, the lexer must use data produced earlier by the parser to distinguish between type names and identifiers. We do not need to put in that effort, since we only support the type names bool, int, float, and string. For us, the allowable type names are known beforehand, so we simply define them as new keywords.
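Treating the four type names as keywords resolves the ambiguity at lexing time. A minimal sketch of such a lexer using `preg_match_all()` follows. The token names TYPE_*, BRACKET_LEFT, BRACKET_RIGHT, IDENTIFIER and NUMBER follow the plugin code in this post; MINUS and the function `tokenize()` are hypothetical, and the prototype registers its keywords through addLexerKeywordPatternRule() instead.

```php
<?php
// Tokenize a tiny expression language in which bool, int, float and string
// are reserved keywords rather than identifiers.
function tokenize(string $source): array
{
    $keywords = [
        'bool'   => 'TYPE_BOOL',
        'int'    => 'TYPE_INT',
        'float'  => 'TYPE_FLOAT',
        'string' => 'TYPE_STRING',
    ];
    preg_match_all('/[A-Za-z_]\w*|\d+(?:\.\d+)?|[()\-]/', $source, $matches);

    $tokens = [];
    foreach ($matches[0] as $lexeme) {
        if (isset($keywords[$lexeme])) {
            $tokens[] = $keywords[$lexeme];   // keyword wins over identifier
        } elseif (is_numeric($lexeme)) {
            $tokens[] = 'NUMBER';
        } elseif ($lexeme === '(') {
            $tokens[] = 'BRACKET_LEFT';
        } elseif ($lexeme === ')') {
            $tokens[] = 'BRACKET_RIGHT';
        } elseif ($lexeme === '-') {
            $tokens[] = 'MINUS';
        } else {
            $tokens[] = 'IDENTIFIER';
        }
    }
    return $tokens;
}

echo implode(' ', tokenize('(int)-5')), PHP_EOL;
// BRACKET_LEFT TYPE_INT BRACKET_RIGHT MINUS NUMBER
echo implode(' ', tokenize('(x)-5')), PHP_EOL;
// BRACKET_LEFT IDENTIFIER BRACKET_RIGHT MINUS NUMBER
```

With IDENTIFIER in the second position, the parser can only read `(x)-5` as a subtraction; with TYPE_INT it must be a cast of -5, which is the distinction the grammar rule `cast: BRACKET_LEFT type_name BRACKET_RIGHT` relies on.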
9. Example program
```
//These are examples of CAST conversions

// (1) int2string
a=(string)-100;
println('a='.a);

// (2) string2int
a=(int)"50";
println('a='.a);

// (3) int2float
a=(float)5;
println('a='.a);

// (4) float2int
a=(int)5.3;
println('a='.a);

// (5) float2string
a=(string)5.3;
println('a='.a);

// (6) string2float
a=(float)"5.3";
println('a='.a);
```
The program output is:
10. Compiler plugin
A semantic compiler plugin is sufficient to support the casting syntax in the scripting engine. There is no need for other plugins:
```php
<?php
// Semantic compiler plugin that adds C-style casting syntax: (type) expression
class _Cast implements ICompilerSemanticPlugin
{
    // Extend the grammar with the cast rules and the four type-name tokens.
    function onGeneratingGrammar($generator)
    {
        $generator->addToken('TYPE_BOOL');
        $generator->addToken('TYPE_INT');
        $generator->addToken('TYPE_FLOAT');
        $generator->addToken('TYPE_STRING');
        $generator->addPriority('CAST', 'left', PARSER_PRIORITY_CAST);
        $generator->addGrammarRule('expression: cast expression %prec CAST');
        $generator->addGrammarRule('cast: BRACKET_LEFT type_name BRACKET_RIGHT');
        $generator->addGrammarRule('type_name: TYPE_BOOL');
        $generator->addGrammarRule('type_name: TYPE_INT');
        $generator->addGrammarRule('type_name: TYPE_FLOAT');
        $generator->addGrammarRule('type_name: TYPE_STRING');
    }

    // Register the type names as keywords, so the lexer never treats them as identifiers.
    function beforeLexing($compiler)
    {
        $compiler->addLexerKeywordPatternRule(LEXER_PRIORITY_KEYWORD, 'TYPE_BOOL', 'bool');
        $compiler->addLexerKeywordPatternRule(LEXER_PRIORITY_KEYWORD, 'TYPE_INT', 'int');
        $compiler->addLexerKeywordPatternRule(LEXER_PRIORITY_KEYWORD, 'TYPE_FLOAT', 'float');
        $compiler->addLexerKeywordPatternRule(LEXER_PRIORITY_KEYWORD, 'TYPE_STRING', 'string');
    }

    function beforeParsing($compiler)
    {
        // On reducing "cast: ( type_name )", emit the type name as an operand,
        // relabelled TYPE, so it is pushed before the expression being cast.
        $compiler->associateRuleNumberForLHSSymbol('CAST', 'cast');
        $compiler->registerNamedReduceCallback(
            'CAST',
            function ($me, $stackStates) {
                $embeddedStackStates = $stackStates[1]->token->value;
                $embeddedStackStates[0]->token->symbol = 'TYPE';
                emitOperand($me, $embeddedStackStates);
            }
        );

        // On reducing "expression: cast expression", emit the binary CAST operator,
        // reusing the opening bracket token (and its position) for it.
        $compiler->associateRuleNumberForRHSSymbolAndNumberOfTerms('CAST_OPERATOR', 'cast', 2);
        $compiler->registerNamedReduceCallback(
            'CAST_OPERATOR',
            function ($me, $stackStates) {
                $embeddedStackStates = $stackStates[0]->token->value;
                $stackState = $embeddedStackStates[0];
                $stackState->token->symbol = 'CAST';
                emitOperator($me, $stackState, 2);
            }
        );
    }
}
?>
```
The compiler generates the following bytecode:
```
--------------|----|--------------|----|---|---|----|-----
code          |args|symbol        |pos |len|lin|col |value
--------------|----|--------------|----|---|---|----|-----
PUSH_OPERAND  |    |IDENTIFIER    |60  |1  |4  |1   |a
PUSH_OPERAND  |    |TYPE          |63  |6  |4  |4   |string
PUSH_OPERAND  |    |NUMBER        |71  |3  |4  |12  |100
EXEC_OPERATOR |1   |UNARY_MINUS   |70  |1  |4  |11  |-
EXEC_OPERATOR |2   |CAST          |62  |1  |4  |3   |(
EXEC_OPERATOR |2   |ASSIGN        |61  |1  |4  |2   |=
RESET_STACK   |    |SEMICOLON     |74  |1  |4  |15  |;
PUSH_OPERAND  |    |STRING        |84  |4  |5  |9   |a=
PUSH_OPERAND  |    |IDENTIFIER    |89  |1  |5  |14  |a
EXEC_OPERATOR |2   |CONCAT        |88  |1  |5  |13  |.
PUSH_OPERAND  |    |IDENTIFIER    |76  |7  |5  |1   |println
EXEC_OPERATOR |2   |FUNCTION_CALL |76  |0  |5  |1   |
RESET_STACK   |    |SEMICOLON     |91  |1  |5  |16  |;
PUSH_OPERAND  |    |IDENTIFIER    |112 |1  |8  |1   |a
PUSH_OPERAND  |    |TYPE          |115 |3  |8  |4   |int
PUSH_OPERAND  |    |STRING        |119 |4  |8  |8   |50
EXEC_OPERATOR |2   |CAST          |114 |1  |8  |3   |(
EXEC_OPERATOR |2   |ASSIGN        |113 |1  |8  |2   |=
RESET_STACK   |    |SEMICOLON     |123 |1  |8  |12  |;
PUSH_OPERAND  |    |STRING        |133 |4  |9  |9   |a=
PUSH_OPERAND  |    |IDENTIFIER    |138 |1  |9  |14  |a
EXEC_OPERATOR |2   |CONCAT        |137 |1  |9  |13  |.
PUSH_OPERAND  |    |IDENTIFIER    |125 |7  |9  |1   |println
EXEC_OPERATOR |2   |FUNCTION_CALL |125 |0  |9  |1   |
RESET_STACK   |    |SEMICOLON     |140 |1  |9  |16  |;
PUSH_OPERAND  |    |IDENTIFIER    |159 |1  |12 |1   |a
PUSH_OPERAND  |    |TYPE          |162 |4  |12 |4   |bool
PUSH_OPERAND  |    |NUMBER        |167 |1  |12 |9   |5
EXEC_OPERATOR |2   |CAST          |161 |1  |12 |3   |(
EXEC_OPERATOR |2   |ASSIGN        |160 |1  |12 |2   |=
RESET_STACK   |    |SEMICOLON     |168 |1  |12 |10  |;
PUSH_OPERAND  |    |STRING        |178 |4  |13 |9   |a=
PUSH_OPERAND  |    |IDENTIFIER    |183 |1  |13 |14  |a
EXEC_OPERATOR |2   |CONCAT        |182 |1  |13 |13  |.
PUSH_OPERAND  |    |IDENTIFIER    |170 |7  |13 |1   |println
EXEC_OPERATOR |2   |FUNCTION_CALL |170 |0  |13 |1   |
RESET_STACK   |    |SEMICOLON     |185 |1  |13 |16  |;
PUSH_OPERAND  |    |IDENTIFIER    |205 |1  |16 |1   |a
PUSH_OPERAND  |    |TYPE          |208 |5  |16 |4   |float
PUSH_OPERAND  |    |NUMBER        |214 |1  |16 |10  |5
EXEC_OPERATOR |2   |CAST          |207 |1  |16 |3   |(
EXEC_OPERATOR |2   |ASSIGN        |206 |1  |16 |2   |=
RESET_STACK   |    |SEMICOLON     |215 |1  |16 |11  |;
PUSH_OPERAND  |    |STRING        |225 |4  |17 |9   |a=
PUSH_OPERAND  |    |IDENTIFIER    |230 |1  |17 |14  |a
EXEC_OPERATOR |2   |CONCAT        |229 |1  |17 |13  |.
PUSH_OPERAND  |    |IDENTIFIER    |217 |7  |17 |1   |println
EXEC_OPERATOR |2   |FUNCTION_CALL |217 |0  |17 |1   |
RESET_STACK   |    |SEMICOLON     |232 |1  |17 |16  |;
PUSH_OPERAND  |    |IDENTIFIER    |252 |1  |20 |1   |a
PUSH_OPERAND  |    |TYPE          |255 |3  |20 |4   |int
PUSH_OPERAND  |    |NUMBER        |259 |3  |20 |8   |5.3
EXEC_OPERATOR |2   |CAST          |254 |1  |20 |3   |(
EXEC_OPERATOR |2   |ASSIGN        |253 |1  |20 |2   |=
RESET_STACK   |    |SEMICOLON     |262 |1  |20 |11  |;
PUSH_OPERAND  |    |STRING        |272 |4  |21 |9   |a=
PUSH_OPERAND  |    |IDENTIFIER    |277 |1  |21 |14  |a
EXEC_OPERATOR |2   |CONCAT        |276 |1  |21 |13  |.
PUSH_OPERAND  |    |IDENTIFIER    |264 |7  |21 |1   |println
EXEC_OPERATOR |2   |FUNCTION_CALL |264 |0  |21 |1   |
RESET_STACK   |    |SEMICOLON     |279 |1  |21 |16  |;
PUSH_OPERAND  |    |IDENTIFIER    |302 |1  |24 |1   |a
PUSH_OPERAND  |    |TYPE          |305 |6  |24 |4   |string
PUSH_OPERAND  |    |NUMBER        |312 |3  |24 |11  |5.3
EXEC_OPERATOR |2   |CAST          |304 |1  |24 |3   |(
EXEC_OPERATOR |2   |ASSIGN        |303 |1  |24 |2   |=
RESET_STACK   |    |SEMICOLON     |315 |1  |24 |14  |;
PUSH_OPERAND  |    |STRING        |325 |4  |25 |9   |a=
PUSH_OPERAND  |    |IDENTIFIER    |330 |1  |25 |14  |a
EXEC_OPERATOR |2   |CONCAT        |329 |1  |25 |13  |.
PUSH_OPERAND  |    |IDENTIFIER    |317 |7  |25 |1   |println
EXEC_OPERATOR |2   |FUNCTION_CALL |317 |0  |25 |1   |
RESET_STACK   |    |SEMICOLON     |332 |1  |25 |16  |;
PUSH_OPERAND  |    |IDENTIFIER    |355 |1  |28 |1   |a
PUSH_OPERAND  |    |TYPE          |358 |5  |28 |4   |float
PUSH_OPERAND  |    |STRING        |364 |5  |28 |10  |5.3
EXEC_OPERATOR |2   |CAST          |357 |1  |28 |3   |(
EXEC_OPERATOR |2   |ASSIGN        |356 |1  |28 |2   |=
RESET_STACK   |    |SEMICOLON     |369 |1  |28 |15  |;
PUSH_OPERAND  |    |STRING        |379 |4  |29 |9   |a=
PUSH_OPERAND  |    |IDENTIFIER    |384 |1  |29 |14  |a
EXEC_OPERATOR |2   |CONCAT        |383 |1  |29 |13  |.
PUSH_OPERAND  |    |IDENTIFIER    |371 |7  |29 |1   |println
EXEC_OPERATOR |2   |FUNCTION_CALL |371 |0  |29 |1   |
RESET_STACK   |    |SEMICOLON     |386 |1  |29 |16  |;
```
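The compiler generates a new kind of operator for the virtual machine instruction EXEC_OPERATOR: CAST. Note that this listing was produced from a slightly longer version of the example program: at line 12 it contains an additional (bool)5 cast, which the virtual machine plugin below would reject with the error "Cannot cast to type bool", since it only supports int, float, and string.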
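To see how the CAST operator fits into the stack machine, here is a deliberately tiny, hypothetical interpreter for the first statement of the listing, `a=(string)-100;`. It is not the prototype's virtual machine: the instruction format, the `run()` function and the handling of assignment are assumptions, only the operators needed here are implemented, and Php's `settype()` stands in for the prototype's conversion builtins. As in the listing, the type name is pushed before the value, so the binary CAST operator receives the type name first and the value second.

```php
<?php
// Execute a list of [code, argCount, symbol, value] instructions on a stack.
function run(array $program, array &$variables): void
{
    $stack = [];
    foreach ($program as [$code, $args, $symbol, $value]) {
        if ($code === 'PUSH_OPERAND') {
            $stack[] = $symbol === 'NUMBER' ? $value + 0 : $value;
            continue;
        }
        if ($code === 'RESET_STACK') {
            $stack = [];
            continue;
        }
        // EXEC_OPERATOR: take the top $args entries, in the order they were pushed.
        $operands = array_splice($stack, -$args);
        switch ($symbol) {
            case 'UNARY_MINUS':
                $stack[] = -$operands[0];
                break;
            case 'CAST':
                [$typeName, $operand] = $operands;
                settype($operand, $typeName);   // conversion, not reinterpretation
                $stack[] = $operand;
                break;
            case 'ASSIGN':
                [$name, $result] = $operands;
                $variables[$name] = $result;
                $stack[] = $result;
                break;
            default:
                throw new RuntimeException("Unsupported operator $symbol");
        }
    }
}

$program = [
    ['PUSH_OPERAND',  null, 'IDENTIFIER',  'a'],
    ['PUSH_OPERAND',  null, 'TYPE',        'string'],
    ['PUSH_OPERAND',  null, 'NUMBER',      '100'],
    ['EXEC_OPERATOR', 1,    'UNARY_MINUS', '-'],
    ['EXEC_OPERATOR', 2,    'CAST',        '('],
    ['EXEC_OPERATOR', 2,    'ASSIGN',      '='],
    ['RESET_STACK',   null, 'SEMICOLON',   ';'],
];
$variables = [];
run($program, $variables);
var_dump($variables['a']); // string(4) "-100"
```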
11. Virtual machine plugin
We implement the virtual machine plugin as follows:
```php
<?php
/*
    Virtual Machine operator
    Convention: vmOperator+operator name
    For binary operators, the VM already pops 2 elements from the stack
    and sends them through the function call
*/

// Map a Php value onto the type names used by the cast syntax.
function vmTypeName($value)
{
    if (is_int($value)) return 'int';
    else if (is_float($value)) return 'float';
    else if (is_string($value)) return 'string';
    else return null;
}

// Only int, float and string take part in casts.
function vmSupportedCastTypeNames($typeName)
{
    switch ($typeName) {
        case 'int': return true;
        case 'float': return true;
        case 'string': return true;
        default: return false;
    }
}

// $value1 is the target type name, $value2 is the value being cast.
function vmOperatorCast($vm, $vmInstruction, $value1, $value2)
{
    $typeNameFrom = vmTypeName($value2);
    if (!vmSupportedCastTypeNames($typeNameFrom))
        vmInstructionError($vm, $vmInstruction,
            "Cannot cast from type " . gettype($value2));
    $typeNameTo = $value1;
    if (!vmSupportedCastTypeNames($typeNameTo))
        vmInstructionError($vm, $vmInstruction,
            "Cannot cast to type " . $typeNameTo);
    if ($typeNameFrom == $typeNameTo) {
        // Nothing to convert, but both operands were already popped:
        // push the value back so the enclosing expression still has a result.
        vmPushResult($vm, $value2);
        return;
    }
    // E.g. 'float' . '2' . 'Int' names the builtin function float2Int.
    $conversionFunctionName = $typeNameFrom . '2' . ucfirst($typeNameTo);
    vmPushResult($vm, $value2);
    vmPushResult($vm, $conversionFunctionName);
    vmOperatorFunction_Call($vm, $vmInstruction);
}
?>
```
The cast instruction is just a binary operator taking a type name and the result of an expression. It looks up the conversion function between the actual type of the value and the desired type, and then calls that function. We simply add these conversion functions to the collection of builtin functions:
- float2Int.php
- float2String.php
- int2Float.php
- int2String.php
- string2Float.php
- string2Int.php
The implementation of such a conversion function is rather simple. For example, float2Int.php:
```php
<?php
/*
    Virtual Machine function plugin
    Convention: vmFunction+function name
    Will be called with
        a reference to the vm: $vm
        a reference to the instruction that is executing the function call: $vmInstruction
        the arguments to the function: $args
*/
function vmFunctionfloat2Int($vm, $vmInstruction, $args)
{
    vmAssertNumberOfFunctionArgs($vm, $vmInstruction, 'float2Int', $args, 1);
    return intval($args[0]);
}
?>
```
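Assuming the other five files in the list follow the same pattern as float2Int.php, they could look as follows. The version of `vmAssertNumberOfFunctionArgs()` below is a hypothetical stand-in, since the prototype's helper is not shown in this post; here it simply throws when the argument count is wrong, so that the snippet runs on its own. The final lines assume a builtin is resolved by prefixing its name with `vmFunction`, as the naming convention suggests.

```php
<?php
// Hypothetical stand-in for the prototype's argument-count check.
function vmAssertNumberOfFunctionArgs($vm, $vmInstruction, $functionName, $args, $expected)
{
    if (count($args) !== $expected) {
        throw new RuntimeException("$functionName expects $expected argument(s)");
    }
}

// float2String.php
function vmFunctionfloat2String($vm, $vmInstruction, $args)
{
    vmAssertNumberOfFunctionArgs($vm, $vmInstruction, 'float2String', $args, 1);
    return strval($args[0]);
}

// int2Float.php
function vmFunctionint2Float($vm, $vmInstruction, $args)
{
    vmAssertNumberOfFunctionArgs($vm, $vmInstruction, 'int2Float', $args, 1);
    return floatval($args[0]);
}

// int2String.php
function vmFunctionint2String($vm, $vmInstruction, $args)
{
    vmAssertNumberOfFunctionArgs($vm, $vmInstruction, 'int2String', $args, 1);
    return strval($args[0]);
}

// string2Float.php
function vmFunctionstring2Float($vm, $vmInstruction, $args)
{
    vmAssertNumberOfFunctionArgs($vm, $vmInstruction, 'string2Float', $args, 1);
    return floatval($args[0]);
}

// string2Int.php
function vmFunctionstring2Int($vm, $vmInstruction, $args)
{
    vmAssertNumberOfFunctionArgs($vm, $vmInstruction, 'string2Int', $args, 1);
    return intval($args[0]);
}

// Resolve a builtin by name, following the vmFunction+function name convention.
$name = 'vmFunction' . 'string2Int';
var_dump($name(null, null, ['50'])); // int(50)
```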
12. Conclusion
In dynamically-typed languages casting is at best just syntactic sugar for type conversion.



