Before anyone asks me “why a new interpreter from scratch why not Parrot”, well if I am honest i didn't hear about parrot until about a year ago! And I am getting involved with those guys, and really I think for languages and interpreter/compiler/VM 's to develop and extend we need more implementations of different types since there don't seem to be many new implementations of languages from scratch these days. Allison Randal said in a talk, current languages currently all share ideas from languages from the 70's that even includes mine, you rarely will ever see a truly new language, and one of the main reasons for that there really aren't that many tools to build and prototype ideas for languages cheaply and quickly, unlike web development for example there are thousands. I always say in my papers and talks, “If I have been able to see further than others, it is because I have stood on the shoulders of giants.” which is what we need more of since we are in danger of moving at a snails pace to improve programming languages to become more accessible or intelligent or expressive to bring ideas together without sacrificing common-sense to keep them usable for serious computation.
So lets move onto hacking, the most important thing to get across here is that I have the vision of Crules being a total 'common sense' and no 'non-sense' infrastructure. I personally believe having a clean interpreter with a well written runtime and efficient garbage collection will beat any byte-compiled (with a JIT) dynamic language like Java or Ruby; With the exception for the 1.5 release I want to start to work towards having a JIT compiler being incorporated or some designs thought of, to truely see how much a JIT actually helps, by either building our own or using LibJit or GNU lightening or LLVM; most likely candidate is LLVM with libjit a close second. I want to see how dynamic a JIT can be with hopefully minimum impact to the running of the interpreter before the running code is JIT'd.
So if you want to see how Crules works internally I could go on and explain high level aims and ideas but really the most useful way is to get stuck in turn on debug and try programming and read the output of what's going on its simple enough there isn't any crazy cool ideas going on yet its designed to be clean and simple for speed for now! Take a look at this debug section for more detailed instructions…
If you come into bugs please:
./configure --with-debug=yes make clean make ./src/crules # reproduce the bug and mail all the stdout/stderr see http://crules.org # of where to post and more info on bugs
Even if the bug is well known or obvious please let me know since any info helps!
Lets take a look at the output it may give for a program like ``examples/goto.crl'':
list = { 1,2,3,4,5 };
defun function {
print "$list";
list = 5;
}
function();
print "$list";
Without Debug:
redbrain ~/workspace/crules-dev/imp/crules $ ./src/crules examples/goto.crl
{ 1, 2, 3, 4, 5 }
5
With Debug:
redbrain@omicron:~/workspace/crules/imp/crules$ ./src/crules examples/goto.crl
debug: <hash_table.c:init_tables:476> -> Initilizing...
debug: <hash_table.c:init_tables:495> -> System tables initilized!
debug: <main.c:crules:60> -> trying to open 'examples/goto.crl'!
debug: <runtime.c:toplevel_exec_symbol:1215> -> exec of pass symbol type: '0xF021'
debug: <runtime.c:expression_assign:624> -> expression assign!
debug: <runtime.c:evaluate_expression:666> -> evaluation expression!
debug: <runtime.c:evaluate_expression:669> -> expression is a singleton item!
debug: <runtime.c:format_list:50> -> formating list...
debug: <runtime.c:function_branch_exec_symbol:1100> -> exec of pass symbol type: '0xF01F'
debug: <runtime.c:variable_assignment:153> -> variable assignment!
debug: <hash_table.c:lookup_context_table:258> -> looking up identifier 'list' - hash 0xCFB5881
debug: <hash_table.c:lookup_context_table:261> -> stack frame size :: '1'!
debug: <runtime.c:symbol_clone:516> -> symbol cloning!
debug: <runtime.c:symbol_clone:516> -> symbol cloning!
debug: <runtime.c:symbol_clone:516> -> symbol cloning!
debug: <runtime.c:symbol_clone:516> -> symbol cloning!
debug: <runtime.c:symbol_clone:516> -> symbol cloning!
debug: <runtime.c:symbol_clone:516> -> symbol cloning!
debug: <runtime.c:function_branch_exec_symbol:1203> -> symbol pass fully executed!
debug: <runtime.c:toplevel_exec_symbol:1309> -> symbol pass fully executed!
debug: <garbage.c:free_hash_table:160> -> freeing hash table!
debug: <garbage.c:free_hash_table_symbols:170> -> free_hash_table_symbols!
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '3'
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '1'
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '2'
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '0'
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '4'
debug: <parser.y:yyparse:582> -> anon method definition!
debug: <parser.y:yyparse:154> -> pushing method definition!
debug: <parser.y:yyparse:713> -> function call without args!
debug: <runtime.c:toplevel_exec_symbol:1215> -> exec of pass symbol type: '0xF033'
debug: <runtime.c:function_branch_exec_symbol:1100> -> exec of pass symbol type: '0xF035'
debug: <hash_table.c:lookup_context_table:258> -> looking up identifier 'list' - hash 0xCFB5881
debug: <hash_table.c:lookup_context_table:261> -> stack frame size :: '2'!
{ 1, 2, 3, 4, 5 }
debug: <runtime.c:function_branch_exec_symbol:1203> -> symbol pass fully executed!
debug: <runtime.c:function_branch_exec_symbol:1100> -> exec of pass symbol type: '0xF021'
debug: <runtime.c:expression_assign:624> -> expression assign!
debug: <runtime.c:evaluate_expression:666> -> evaluation expression!
debug: <runtime.c:evaluate_expression:669> -> expression is a singleton item!
debug: <runtime.c:function_branch_exec_symbol:1100> -> exec of pass symbol type: '0xF01F'
debug: <runtime.c:variable_assignment:153> -> variable assignment!
debug: <hash_table.c:lookup_context_table:258> -> looking up identifier 'list' - hash 0xCFB5881
debug: <hash_table.c:lookup_context_table:261> -> stack frame size :: '2'!
debug: <runtime.c:symbol_clone:516> -> symbol cloning!
debug: <garbage.c:free_hash_table:160> -> freeing hash table!
debug: <garbage.c:free_hash_table_symbols:170> -> free_hash_table_symbols!
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '3'
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '1'
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '2'
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '0'
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier '4'
debug: <runtime.c:function_branch_exec_symbol:1203> -> symbol pass fully executed!
debug: <runtime.c:function_branch_exec_symbol:1203> -> symbol pass fully executed!
debug: <runtime.c:exec_branch:1340> -> freeing local stack frame!
debug: <garbage.c:free_hash_table:160> -> freeing hash table!
debug: <garbage.c:free_hash_table_symbols:170> -> free_hash_table_symbols!
debug: <runtime.c:exec_branch:1343> -> done freeing stack frame!
debug: <runtime.c:toplevel_exec_symbol:1309> -> symbol pass fully executed!
debug: <runtime.c:toplevel_exec_symbol:1215> -> exec of pass symbol type: '0xF035'
debug: <hash_table.c:lookup_context_table:258> -> looking up identifier 'list' - hash 0xCFB5881
debug: <hash_table.c:lookup_context_table:261> -> stack frame size :: '1'!
5
debug: <runtime.c:toplevel_exec_symbol:1309> -> symbol pass fully executed!
debug: <garbage.c:cleanup:92> -> cleanup.......
log: <util.c:print_stats:160> -> --- Crules Stats ---
log: <util.c:print_stats:161> -> * heap size 0xF0
log: <util.c:print_stats:162> -> * heap limit 0x1000000
log: <util.c:print_stats:163> -> * number malloc 35
log: <util.c:print_stats:164> -> * number calloc 7
log: <util.c:print_stats:165> -> * number realloc 0
log: <util.c:print_stats:166> -> * number free 47
log: <util.c:print_stats:170> -> * function table size 1
log: <util.c:print_stats:179> -> * rule table size 0
log: <util.c:print_stats:185> -> --- --- --- --- ---
debug: <garbage.c:free_hash_table:160> -> freeing hash table!
debug: <garbage.c:free_hash_table_symbols:170> -> free_hash_table_symbols!
debug: <garbage.c:free_hash_table:160> -> freeing hash table!
debug: <garbage.c:free_hash_table_symbols:170> -> free_hash_table_symbols!
debug: <garbage.c:free_symbol:121> -> garbage symbol identifier 'function'
debug: <garbage.c:free_context_table:215> -> stack frame size :: '1'!
debug: <garbage.c:free_hash_table:160> -> freeing hash table!
debug: <garbage.c:free_hash_table_symbols:170> -> free_hash_table_symbols!
log: <util.c:print_stats:160> -> --- Crules Stats ---
log: <util.c:print_stats:161> -> * heap size 0xF
log: <util.c:print_stats:162> -> * heap limit 0x1000000
log: <util.c:print_stats:163> -> * number malloc 35
log: <util.c:print_stats:164> -> * number calloc 7
log: <util.c:print_stats:165> -> * number realloc 0
log: <util.c:print_stats:166> -> * number free 62
log: <util.c:print_stats:174> -> * function table size nill
log: <util.c:print_stats:183> -> * rule table size nill
log: <util.c:print_stats:185> -> --- --- --- --- ---
So you can easily see there is a lot behind the scenes!
Personally I find Valgrind much more useful than GDB as most errors in C/C++ programs is generally due to memory related problems and this is what valgrind is brilliant at finding for you!
% aptitude install valgrind % ./configure --with-debug=yes % make clean && make % valgrind ./src/crules #For interactive session #% valgrind ./src/crules <bla.crl> #To run against a test # Useful options for valgrind may include: valgrind -v --tool=memcheck --track-fds=yes ./src/crules
You may find this link to be usefull Valgrind Tutorial
Personally i have not really found GDB useful, since i know this code so well since its all written from scratch by me i think i generally know what's going on pretty well, though new people to projects find GDB useful to trace programs to get better understanding of the logic and ideas in a software project. So lets do a basic GDB setup from command line:
% ./configure --with-debug=yes
% make clean && make
% gdb --args ./src/crules <test.crl> # It is best to use a source code file for a test since readline will
# mess-up your gdb interaction
%% start
%% next...
%% breakpoint ...
A much better tutorial lies on google or try this one is good: GDB Tutorial, though if your using Linux and more specificity Gnome try this app out: Nemiver, the Lead Developer was very nice and let us use his gitlog2gnucl ruby script to generate a proper Changelog! Its much easier to use!
This is a very broad topic so i'll just talk about it all in general here!
Please see the Coding Guidlines before submitting code since it keeps the project tidy and easy to read.
This is the most important part of Crules in its design, i spent a very long time getting this IR(Intermediate Representation) correct. Basically i needed something to represent language semantics at any point within the language. So for example lets take a look at how expressions are represented then lets see how blocks are represented and functions should give you a good idea.
(( x + y ) - ( ( x + y ) * ( x - y ) ) ) + ( ( x + y ) * ( x - y ) )
This is a very long expression but how does a Compiler or VM or Interpreter evaluate this! So my parser is precedence sensitive parser. And it will construct a symbol table or 3-address code that looks like this:
So that's ok but how is it represented, well this is the interesting part about crules I have this one IR which can represent language semantics of any type in the language.
/** * 3 adress code style symtab **/ struct symtab { char *identifier; //fixed width because pre-defined size known unsigned short sym_type; unsigned short op_a_t; unsigned short op_b_t; union { int integer; long int lint; double dbl; unsigned char ch; bool bl; const char *str; struct symtab *syms; struct hash_table *list; struct stack_t *stack; } op_a; union { int integer; long int lint; double dbl; unsigned char ch; bool bl; const char *str; struct symtab *syms; struct hash_table *list; struct stack_t *stack; } op_b; struct symtab *next; } ;
So what is this? Well early on when i was getting into compiler construction when doing semantic analysis you will need to find a way of representing these semantics for your language, and you will find there is always 3 things you need to know a DESTINATION OPERAND A and an OPERAND B. This is what's called 3-address code. This is what sets compilers apart and can make or break them! Its what gives implementations flexibility and the ability to 'control the flow' of execution! Any good compiler book will give you more detailed discussion on this topic its my main interest is these intermediate languages which compilers need! So lets give it a syntax it so happens LISP is a similar'ish syntax so lets keep it something similar.
( ( IDENTIFIER ( TYPE ) ) => ( OPERAND A ) => ( OPERAND B ) );
Since the parser is precedence aware it will construct the tree as shown in a symbol language which looks like this if it was output ( outputting the IR like this is a future TODO ); It may look a little like GIMPLE but this IR I have as I will demonstrate more examples can encompass much much more.. ( this is a 3-address code by the way )
( ( NIL ( OP_ADD ) ) => ( ( ( NIL ( OP_SUBTRACT ) ) => ( ( ( NIL ( OP_ADD ) ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'x' ) => ( NIL ) ); ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'y' ) => ( NIL ) ); ) ) ; ) => ( ( ( NIL ( OP_MULTIPLY ) ) => ( ( ( NIL ( OP_ADD ) ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'x' ) => ( NIL ) ); ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'y' ) => ( NIL ) ); ) ); ) => ( ( ( NIL ( OP_SUBTRACT ) ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'x' ) => ( NIL ) ); ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'y' ) => ( NIL ) ); ) ); ) ); ) ); ) => ( ( ( NIL ( OP_MULTIPLY ) ) => ( ( ( NIL ( OP_ADD ) ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'x' ) => ( NIL ) ); ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'y' ) => ( NIL ) ); ) ); ) => ( ( ( NIL ( OP_SUBTRACT ) ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'x' ) => ( NIL ) ); ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'y' ) => ( NIL ) ); ) ); ) ); ) );
So that goes on and on so we evaluate fully recursively, that was kind of a bad example since it was so deep and it was my first example, so what about an assignment like:
retval = ( 5 + x ) - 2;
( ( NIL ( OP_ASSIGN_EVAL ) ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'retval' ) => ( NIL ) ); ) => ( ( ( NIL ( OP_SUBTRACT ) ) => ( ( ( NIL ( OP_ADD ) ) => ( ( ( NIL ( SYMBOL_ITEM ) ) => ( 5 ) => ( NIL ) ); ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'x' ) => ( NIL ) ); ) ); ) => ( ( ( NIL ( SYMBOL_ITEM ) ) => ( 2 ) => ( NIL ) ); ) ); ) );
What about a function? Well its the same idea but we make use of 'next' which i didn't show so far lets look at the function:
# A very pointless computation but just needed a quick idea to show what i want!
defun functor( x , y , z ) {
x = 5; y = 3;
z = x * y ;
}
So in this example i use a '–>' syntax which means 'next' so when you define a function it has an IDENTIFIER, PARAMETERS, and a BLOCK of symbols to execute and evaluate, so we need this block to have some kind of next to address each expression one after the other in a procedural way! So we can loop like :
while ( sym ) { execute_symbol( sym ); sym= sym->next ; }
( ( 'functor' ( FUNCTION_DEFINITION ) ) => ( ( ( NIL ( OP_ASSIGN_EVAL ) ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'x' ) => ( NIL ) ); ) => ( ( ( NIL ( SYMBOL_ITEM ) ) => ( 5 ) => ( NIL ) ); ) --> ( ( ( NIL ( OP_ASSIGN_EVAL ) ) => ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'y' ) => ( NIL ) ); ) => ( ( ( NIL ( SYMBOL_ITEM ) ) => ( 3 ) => ( NIL ) ); ) --> ( ( NIL ( OP_ASSIGN_EVAL ) ) => ( ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'z' ) => ( NIL ) ) ); ) => ( ( ( NIL ( OP_MULTIPLY ) ) => ( ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'x' ) => ( NIL ) ); ) ) => ( ( ( ( NIL ( SYMBOL_ACCESS ) ) => ( 'y' ) => ( NIL ) ); ) ) ); ) ); => ( $PARAMETERS ( 'z' , 'y' , 'x' ) ) # yes its represented as a stack );