<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/1.5.1.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
>

<channel>
	<title>Jaen's blog</title>
	<link>http://jaen.saul.ee</link>
	<description>Life, coding and everything</description>
	<pubDate>Tue, 06 Dec 2005 21:23:36 +0000</pubDate>
	<generator>http://wordpress.org/?v=1.5.1.3</generator>
	<language>en</language>

		<item>
		<title>Ruby.NET status</title>
		<link>http://jaen.saul.ee/index.php/2005/12/07/rubynet-status/</link>
		<comments>http://jaen.saul.ee/index.php/2005/12/07/rubynet-status/#comments</comments>
		<pubDate>Tue, 06 Dec 2005 21:22:39 +0000</pubDate>
		<dc:creator>Jaen</dc:creator>
		
	<category>Coding</category>
	<category>Ruby</category>
	<category>.NET</category>
		<guid>http://jaen.saul.ee/index.php/2005/12/07/rubynet-status/</guid>
		<description><![CDATA[	A pretty unusable version of Ruby.NET is available from Mono SVN, the URL is on the information page. It can do the basic types, flow control, blocks and method calls. I have not managed to write a single line of (any) code in the last couple of months, must be coder&#8217;s block. Hopefully that will [...]]]></description>
			<content:encoded><![CDATA[	<p>A pretty unusable version of Ruby.NET is available from Mono SVN; the URL is on the <a href="http://jaen.saul.ee/rubynet/">information page</a>. It can do the basic types, flow control, blocks and method calls. I have not managed to write a single line of (any) code in the last couple of months, which must be coder&#8217;s block. Hopefully that will change. The code is pretty ugly and needs to be overhauled, though. </p>
	<p>The parser and other parts are modified versions of arton&#8217;s netruby, and the age is beginning to show: the parser has never been fully tested and is not Ruby 1.8 compatible. The produced abstract syntax tree is not very suitable for compiling either, since it is meant for straight interpretation.</p>
	<p>The code generator is basically all right but incomplete, and the code it produces is bloated. A more &#8220;microcode&#8221;-like approach, where the generated code would mostly be calls to static helper methods, would be better.</p>
	<p>The Ruby/.NET interface is a hopeless hack and needs to be replaced with annotations and dynamic stub generation instead of reflection. John Lam has been working on an <a href="http://www.iunknown.com/articles/2005/12/05/refining-the-ruby-cil-dsl">interesting interface between native Ruby and .NET</a>. I like the idea, and it should be usable in a slightly different form.</p>
	<p>The biggest problem is getting all the current Ruby libraries/extensions to work, which is practically impossible. In the short term, this makes Ruby.NET only suitable for new code. Some people are working on rewriting the Ruby standard library in Ruby, but there are no visible results so far. I need to check if we can collaborate.</p>
	<p>A general Ruby code suite needs to be assembled that can be used for conformance, performance and coverage testing, and to find out which parser productions, library methods, extensions etc. are actually used.</p>
	<p>Nothing else comes to mind now, but a message from the bottom of the abyss is still better than silence.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://jaen.saul.ee/index.php/2005/12/07/rubynet-status/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Summer of Code</title>
		<link>http://jaen.saul.ee/index.php/2005/07/01/summer-of-code/</link>
		<comments>http://jaen.saul.ee/index.php/2005/07/01/summer-of-code/#comments</comments>
		<pubDate>Fri, 01 Jul 2005 17:53:16 +0000</pubDate>
		<dc:creator>Jaen</dc:creator>
		
	<category>Uncategorized</category>
		<guid>http://jaen.saul.ee/index.php/2005/07/01/summer-of-code/</guid>
		<description><![CDATA[	My proposal (next post) has been accepted for Google&#8217;s Summer of Code! My project, a Ruby compiler for .NET, is one of the 16 projects Mono will be mentoring during the summer. The rewards for a successful project are a grant of $4500 and a Google T-shirt.
]]></description>
			<content:encoded><![CDATA[	<p>My proposal (next post) has been accepted for Google&#8217;s <a href='http://code.google.com/summerofcode.html'>Summer of Code</a>! My project, a Ruby compiler for .NET, is one of the <a href='http://www.mono-project.com/Summer2005'>16 projects</a> <a href='http://www.mono-project.com/'>Mono</a> will be mentoring during the summer. The rewards for a successful project are a grant of $4500 and a Google T-shirt.</p>
]]></content:encoded>
			<wfw:commentRSS>http://jaen.saul.ee/index.php/2005/07/01/summer-of-code/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Ruby.NET proposal</title>
		<link>http://jaen.saul.ee/index.php/2005/07/01/rubynet-proposal/</link>
		<comments>http://jaen.saul.ee/index.php/2005/07/01/rubynet-proposal/#comments</comments>
		<pubDate>Fri, 01 Jul 2005 17:28:21 +0000</pubDate>
		<dc:creator>Jaen</dc:creator>
		
	<category>Uncategorized</category>
	<category>Ruby</category>
	<category>.NET</category>
		<guid>http://jaen.saul.ee/index.php/2005/07/01/rubynet-proposal/</guid>
		<description><![CDATA[	This is a slightly revised version of the proposal I submitted to Google for the Summer of Code. It started out as a simple checklist and as I did not have much time to delve deeper into things there will be errors and omissions. Feel free to comment!
	Abstract
	The .NET platform has been a successful target [...]]]></description>
			<content:encoded><![CDATA[	<p>This is a slightly revised version of the proposal I submitted to Google for the <a href='http://code.google.com/summerofcode.html'>Summer of Code</a>. It started out as a simple checklist and as I did not have much time to delve deeper into things there will be errors and omissions. Feel free to comment!</p>
	<h3>Abstract</h3>
	<p>The .NET platform has been a successful target for many languages, but few implementations of truly dynamic languages such as Ruby exist. Attempts have been made to create compilers, but the source code for those has not been released. Due to the mismatch between dynamic languages and the regular .NET model, a compiler is somewhat difficult to create, but the results are worth it. A Ruby to CIL compiler might outperform the existing Ruby implementation, enable compilation of Ruby code to binaries, and provide very easy access to the rich framework .NET provides, covering all of the scripting needs for the platform. The proposal takes a look at the problems of implementing the compiler, learning from previous projects and highlighting parts that need consideration.</p>
	<h3>Goals</h3>
	<ul>
	<li>a Ruby to CIL compiler for the .NET CLR/Mono</li>
	<li>a basic Ruby 1.8 compatible standard library implementation</li>
	<li>a basic Ruby/CLS interoperability framework</li>
	<li>to explore possible optimizations</li>
	</ul>
	<p>The finished compiler will be able to run the Ruby 1.8.2 self-test in sample/test.rb.</p>
	<h3>Previous projects</h3>
	<p>There are various efforts underway to create alternative Ruby implementations, from which ideas could be used in the compiler and the standard library.</p>
	<p>So far, the most successful strategy of running Ruby on other platforms has been to translate the C interpreter into another language.</p>
	<p><a href='http://www.geocities.co.jp/SiliconValley-PaloAlto/9251/ruby/nrb.html'>NETRuby</a> by arton is a .NET implementation of the Ruby interpreter under a liberal license. Unfortunately it has not been updated for 3 years. Still, it includes a Ruby 1.6 parser in C# and a standard library, which most likely can not be used without modifications since it is coupled to the interpreter.</p>
	<p><a href='http://jruby.sourceforge.net/'>JRuby</a> is a fairly complete implementation with a CPL/GPL/LGPL tri-license. A small part of its Ruby standard library is written in Ruby.</p>
	<p><a href='http://www.pronovomundo.com/htu/theses2004/'>RubySharp</a> is a thesis project for a Ruby .NET compiler. No source code has been released but the paper gives a nice overview of the compiler. The project took 10 weeks, giving a rough example of how long writing a prototype Ruby compiler would take.</p>
	<p><a href='http://www.asakawa.net/ruby/rubynet_memo.html'>Hiroki Asakawa&#8217;s Ruby.NET</a> (in Japanese) is another Ruby compiler for .NET, but there is not much information about it except the web page. The web page deals with the mostly static subset of Ruby, lacking details about dynamic features. No source released.</p>
	<p><a href='http://rubyforge.org/projects/ruby2c/'>Ruby2C</a> translates a subset of Ruby into C, using type inference to get rid of dynamic types. It would be possible to retarget the backend to C#.</p>
	<p><a href='http://rubyforge.org/projects/rubydium/'>Rubydium</a> is an optimizing compiler for Ruby in Ruby targeting NanoVM. Rubydium is in early stages of development.</p>
	<p><a href='http://www.atdot.net/yarv/'>YARV</a> will probably be the successor to the current Ruby interpreter. It is a bytecode VM developed on top of the current Ruby implementation, scheduled to be merged into Ruby mainline at the end of this year.</p>
	<p><a href='http://rubytests.rubyforge.org/'>Rubicon</a> is a test suite for Ruby interpreters, currently used to test the main Ruby interpreter and JRuby. It requires a working Test::Unit to run, but the tests can be used separately with some work.</p>
	<p><a href='http://artengine.ca/matju/MetaRuby/'>MetaRuby</a> was one of the first projects that tried to create a Ruby implementation written in Ruby. It contains unfinished source code for the Ruby built-ins.</p>
	<p>Other projects not related to Ruby that are interesting include:</p>
	<p><a href='http://www.ironpython.com/'>IronPython</a>, a .NET Python compiler, which has a lot of ideas that are applicable to Ruby compilation. It has a shared source license that is not OSI recognized yet so source code from it is probably not usable.</p>
	<p><a href='http://www.refactory.com/Software/SharpSmalltalk/'>#Smalltalk</a> is a Smalltalk compiler, with an object model similar to that of Ruby. Due to the lack of open classes it gets away with a pretty standard mapping of classes and methods to CLS/.NET. Similar methods could be used for a subset of Ruby for optimization.</p>
	<p><a href='http://www-sop.inria.fr/mimosa/fp/Bigloo/'>Bigloo</a> is a mature, highly optimizing Scheme compiler with backends for C, Java and .NET. A lot of good ideas, some discussed below.</p>
	<p><a href='http://www.php-compiler.net/'>Phalanger</a> compiles PHP, a weakly typed language. The resulting code is also somewhat faster than the original PHP implementation, which is about as fast as the current Ruby one.</p>
	<p><a href='http://research.sun.com/self/language.html'>Self</a>, the grandfather of all highly optimizing compilers for dynamic languages. Lovely papers, with ideas that are still usable today.</p>
	<p><a href='http://www.cs.ucsb.edu/projects/strongtalk/pages/index.html'>Strongtalk</a> is a softly typed compiler with type-feedback for Smalltalk. Adapting this approach for Ruby promises the highest performance.</p>
	<p><a href='http://boo.codehaus.org/'>Boo</a>, a newcomer for .NET inspired by Python, with mostly static typing and type inference. Extensible compiler infrastructure that could also be used for compiling other languages.</p>
	<h3>Ruby features</h3>
	<p>While both Ruby and CIL are Turing complete, it is not that straightforward to efficiently translate some of the features. *wink*</p>
	<p>An overview of some of the details that have to be considered follows, in no particular order.</p>
	<h4>Code (control structures)</h4>
	<p>Blocks are essentially anonymous first class functions with closures (with a bit more information than your average closure). One way to translate them would be to use a class with an invoke method per function, but a lot of unnecessary metadata would be added. For example, it did not work for <a href='http://www-sop.inria.fr/mimosa/Manuel.Serrano/publi/jot04/jot04.html'>Bigloo.NET</a>, which had more than 4k closure types in the standard library. That was solved by compiling several functions into one class, storing a function index in an instance variable and dispatching on it with a switch statement<a href='http://www-sop.inria.fr/mimosa/Manuel.Serrano/publi/jot04/jot04.html'>*</a>. In .NET 2.0, System.Reflection.Emit.DynamicMethod might be usable.</p>
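	<p>As a rough illustration of the Bigloo.NET workaround described above, here is a minimal sketch written in Ruby rather than CIL. The names ClosureGroup and call, the env hash and the two sample bodies are made up for this example; the only point is that one class with a function index and a case statement can stand in for many one-method closure classes.</p>

```ruby
# Sketch only: ClosureGroup and its two bodies are hypothetical.
class ClosureGroup
  def initialize(index, env)
    @index = index   # which function body this instance stands for
    @env = env       # captured variables of the closure
  end

  # A single dispatch method replaces one class per closure.
  def call(arg)
    case @index
    when 0 then @env[:base] + arg      # body of the first block
    when 1 then @env[:factor] * arg    # body of the second block
    else raise ArgumentError, "unknown function index #{@index}"
    end
  end
end

adder = ClosureGroup.new(0, { base: 10 })
scaler = ClosureGroup.new(1, { factor: 3 })
```

	<p>Both adder and scaler are instances of the same class, so only one type ends up in the metadata no matter how many blocks are grouped this way.</p>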
	<p>Methods can be compiled using a similar strategy in the beginning, storing the function object in a hash table.</p>
	<p>Non-local returns allow a block to return from the function that creates the block. The catch and throw method pair can be used for dynamic non-local returns. Block returns can be implemented using .NET exceptions, with a unique exception per block that is caught in the function that should return. For throw/catch, catch would rethrow the exception if it is not of the expected type. <a href='http://www.nemerle.org/'>Nemerle</a> and #Smalltalk use a similar technique. The performance is not that good but it has been improving. Non-local returns interact with lambda: wrapped in a lambda, a &#8220;return&#8221; in a block will do a standard return, not a non-local one.</p>
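	<p>The exception-based scheme can be sketched in plain Ruby. This is an assumption-laden illustration, not the planned code generator output: NonLocalReturn, each_item and find_first are hypothetical names, and instead of one exception type per block the sketch uses a single exception class carrying a tag that is unique per activation.</p>

```ruby
# Sketch only: all names here are hypothetical.
class NonLocalReturn < StandardError
  attr_reader :tag, :value

  def initialize(tag, value)
    super("non-local return")
    @tag = tag
    @value = value
  end
end

# A helper that knows nothing about the return mechanism.
def each_item(items, body)
  items.each { |item| body.call(item) }
end

def find_first(items, predicate)
  tag = Object.new   # unique per activation of find_first
  body = lambda do |item|
    # stands in for "return item" written inside a block
    raise NonLocalReturn.new(tag, item) if predicate.call(item)
  end
  begin
    each_item(items, body)
    nil
  rescue NonLocalReturn => e
    raise unless e.tag.equal?(tag)   # not ours: rethrow to an outer frame
    e.value
  end
end
```

	<p>The tag comparison is what lets nested activations coexist: a frame that catches a foreign tag simply rethrows.</p>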
	<p>Ruby statements are executed in various scopes, methods are defined in the current class, executed in self etc. For a compiler, as much of the scope as possible should be resolved statically and the rest passed on through the method invocations. A shadow stack could also be used but it would harm interoperability.</p>
	<p>Ruby exceptions would be mapped to a CLR exception, with a check and rethrow if not handled. Ruby also allows retrying (rerunning) the block that caused the exception.</p>
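	<p>To make the retry behaviour concrete, here is a small Ruby example followed by one possible lowering into a loop. The method names are invented for the illustration, and the lowering is only a guess at what a compiler could emit, not a description of the actual design.</p>

```ruby
# Source form: retry reruns the begin block from the top.
def fetch_with_retry
  attempts = 0
  begin
    attempts += 1
    raise IOError, "flaky read" if attempts < 3
    attempts
  rescue IOError
    retry
  end
end

# Hypothetical lowering: the protected region becomes the body of a loop.
def fetch_lowered
  attempts = 0
  loop do
    begin
      attempts += 1
      raise IOError, "flaky read" if attempts < 3
      return attempts
    rescue IOError
      next   # plays the role of retry
    end
  end
end
```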
	<p>Continuations could be implemented by doing a CPS (Continuation Passing Style) transform and using tailcalls, somewhat similar to <a href='http://www.call-with-current-continuation.org/'>Chicken Scheme</a> and the <a href='http://home.pipeline.com/~hbaker1/CheneyMTA.html'>Cheney on the M.T.A.</a> approach. Unfortunately, it would probably harm performance a lot and is not in the scope of this project.</p>
	<p>Safe mode disables executing certain operations according to the current safe level. Should be handled in code, although there is the possibility of using the underlying .NET security framework for some parts.</p>
	<p>Ruby has an extensive debugging API, a lot of which unfortunately implies a performance penalty or is difficult to implement. set_trace_func allows a function to be called on method invocations, after every line etc. Could be enabled using a global flag, but should be disabled by default since it affects performance gravely. caller returns the call stack, can be implemented if enough information can be extracted out of the .NET stack. trace_var traces global variables. Since globals will probably not be compiled, it is easily implemented. local_variables returns the names of the local variables in the current scope. Not likely to be implemented, maybe with a static hack.</p>
	<p>Multiple assignment is used to assign to multiple variables in parallel. Compilable to single assignments with temporary variables. Ruby also allows for arrays to be splatted and unsplatted, converting an array into values (variables) and vice versa. This is used in various ways, e.g. for variable arguments and multiple return values.</p>
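	<p>A short Ruby example of what this means in practice. The temporaries tmp1 and tmp2 are hypothetical names for what a simplifier pass might introduce; the rest is ordinary Ruby.</p>

```ruby
# Parallel assignment as written by the programmer.
a = 1
b = 2
a, b = b, a

# The same swap expanded into single assignments with temporaries.
x = 1
y = 2
tmp1 = y   # the whole right-hand side is evaluated first
tmp2 = x
x = tmp1
y = tmp2

# Splatting: an array is spread over several variables.
first, *rest = [1, 2, 3]

# Multiple return values are an array unpacked by the caller.
def min_max(values)
  return values.min, values.max
end
lo, hi = min_max([3, 1, 2])
```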
	<p>Arguments can have default values. Keyword arguments are passed as a hash object in the last argument.</p>
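	<p>For illustration, the convention looks like this in Ruby. The method connect and its options are invented for the example; later Ruby versions also have real keyword parameters, which this sketch deliberately does not use.</p>

```ruby
# port has a default value; options collects the keyword-style arguments.
def connect(host, port = 80, options = {})
  timeout = options.fetch(:timeout, 30)
  "#{host}:#{port} timeout=#{timeout}"
end

plain = connect("example.org")
tuned = connect("example.org", 8080, timeout: 5)
```

	<p>At the call site the compiler only sees an extra trailing argument that happens to be a hash, so no special calling convention is required.</p>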
	<p>Akin to functional languages, expressions such as if, while, etc. return values in Ruby. The value returned by a group of statements is usually the value of the last expression in that group.</p>
	<p>Case/when statements can be converted to nested ifs. For certain types, they may be optimized to switches.</p>
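	<p>The conversion relies on the fact that each when clause is a call to the === method of the clause value. A sketch, with classify and classify_lowered as made-up names:</p>

```ruby
def classify(value)
  case value
  when Integer then :integer
  when 'a'..'z' then :lowercase
  when /\A\d+\z/ then :digits
  else :other
  end
end

# The same logic as nested ifs, calling === explicitly.
def classify_lowered(value)
  if Integer === value
    :integer
  elsif ('a'..'z') === value
    :lowercase
  elsif /\A\d+\z/ === value
    :digits
  else
    :other
  end
end
```

	<p>A switch-based optimization would only be valid where === is known not to be redefined, for instance when all clauses are small integer literals.</p>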
	<p>Threads in the current implementation of Ruby are green threads that are managed by the interpreter itself. Doing it that way makes a lot of problems disappear. A compiler using native threads would have to insert appropriate synchronization primitives to places where the internal structures are manipulated.</p>
	<p>The defined? expression allows checking for the existence of variables, methods, constants etc. in the current scope. For locals it would have to be compiled mostly statically, while for other types a special method will be called.</p>
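	<p>A quick look at what defined? returns for the different kinds of names; the method probe and the names inside it exist only for this example.</p>

```ruby
def probe
  local = 1
  {
    local: defined?(local),            # known statically from the parse
    constant: defined?(String),
    method: defined?(puts),
    ivar: defined?(@never_set),
    unknown: defined?(no_such_name)
  }
end

kinds = probe
```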
	<p>Eval (and Bindings) allow for the evaluation of code in the current scope, an instance&#8217;s scope etc. The usual cases where eval is used are not very problematic, but special cases involving dynamic locals will require more effort.</p>
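	<p>The following Ruby snippet shows why locals are the hard part: a Binding keeps a whole local scope alive and lets strings of code read and write it. The names counter_scope and Account are invented for the example.</p>

```ruby
def counter_scope
  count = 41
  binding   # captures the local scope, including count
end

scope = counter_scope
answer = eval("count + 1", scope)   # reads a local through the Binding
eval("count = 100", scope)          # writes go through it as well
updated = scope.local_variable_get(:count)

class Account
  def initialize
    @balance = 10
  end
end

# Evaluation in the scope of an instance.
balance = Account.new.instance_eval { @balance }
```

	<p>A compiler that keeps count in a CIL local would have to move it to the heap as soon as binding or eval can reach it.</p>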
	<h4>Data (object model)</h4>
	<p>In the C Ruby interpreter, objects are represented by tagged pointers: a type tag is stored in the low-order bits of the pointer. Certain types like Fixnums use the pointer bits themselves to store the value of the object. Other languages like Lua use a discriminated union (struct) that has a type tag and the value of the object. On .NET, those models are probably not usable for managed code, so objects would have to be fully boxed (stored as references to the heap) like in the Python C implementation.</p>
	<p>Built-in types: Singleton objects such as true, false, nil are converted to static members of a .NET class. Fixnum should use a cache to reduce heap allocation. Since Strings in Ruby are mutable, the implementation can not be based on System.String. Currently a char array is used, although there are plans for built-in Unicode strings.</p>
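	<p>The Fixnum cache could look roughly like the sketch below. BoxedInt, its box method and the cached range of -128 to 127 are all assumptions made for the illustration; the proposal does not fix any of them.</p>

```ruby
# Sketch only: BoxedInt and the cache bounds are hypothetical.
class BoxedInt
  attr_reader :value

  CACHE_RANGE = (-128..127)

  def initialize(value)
    @value = value
  end

  # Preallocated boxes for small values.
  CACHE = CACHE_RANGE.map { |v| new(v).freeze }.freeze

  def self.box(value)
    return CACHE[value - CACHE_RANGE.first] if CACHE_RANGE.cover?(value)
    new(value)   # outside the cache: allocate on the heap
  end
end
```

	<p>Boxing a small value twice then yields the same object, so loops over small integers allocate nothing.</p>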
	<p>Ruby objects, modules and classes would be plain .NET objects with a hash table to store the instance variables and method references. Because a class of an object can be changed at runtime by adding singleton classes or mixins, a simple conversion to CLR classes is not possible. The same applies for instance variables. Constant, instance variable lookups and message sends (method lookups) are converted to a CIL method call. Since Ruby has private and public methods, sends to (static) self are handled differently.</p>
	<p>The special variables $1-9, $&amp;&#8230; are used for regular expression matches.</p>
	<p>In case a message send fails, method_missing is invoked if it exists, and the method name and arguments passed to it. Should be handled in the message send primitive.</p>
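	<p>Putting the hash-based object model and the method_missing fallback together, a message send primitive might be sketched as follows. RClass, RObject, method_table and rb_send are hypothetical names, and singleton classes, mixins and visibility are left out on purpose.</p>

```ruby
# Sketch only: a toy object model, not the actual runtime.
class RClass
  attr_reader :method_table, :superclass

  def initialize(superclass = nil)
    @superclass = superclass
    @method_table = {}   # selector -> callable
  end

  # Walk up the class chain until the selector is found.
  def lookup(selector)
    klass = self
    while klass
      found = klass.method_table[selector]
      return found if found
      klass = klass.superclass
    end
    nil
  end
end

class RObject
  attr_reader :klass, :ivars

  def initialize(klass)
    @klass = klass
    @ivars = {}          # instance variable name -> value
  end
end

# The send primitive: normal lookup first, then method_missing.
def rb_send(receiver, selector, *args)
  impl = receiver.klass.lookup(selector)
  return impl.call(receiver, *args) if impl
  fallback = receiver.klass.lookup(:method_missing)
  raise NoMethodError, "undefined method #{selector}" unless fallback
  fallback.call(receiver, selector, *args)
end

base = RClass.new
base.method_table[:method_missing] = lambda { |recv, name, *args| [:missing, name, args] }
point_class = RClass.new(base)
point_class.method_table[:x] = lambda { |recv| recv.ivars[:@x] }
point = RObject.new(point_class)
point.ivars[:@x] = 3
```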
	<p>An object has various flags that specify its state. Objects can be frozen (to forbid modifications) and tainted to mark that they come from an insecure source.</p>
	<p>ObjectSpace is used for reflection of the heap. It can be used to register finalizers (used for weak references), convert ids to objects and iterate through all live objects. Most of that is not possible on the standard CLR without doing your own memory allocation, so those methods cannot be implemented. Weak references can use the underlying .NET library; having a finalizer on all Ruby objects is not really a good idea.</p>
	<h4>Built-ins</h4>
	<p>Ruby has a lot of built-ins that a functional compiler should implement.</p>
	<p>Platform independent classes can be implemented in Ruby and reused. These include:</p>
	<ul>
	<li>parts of Array, Hash, String, FalseClass, TrueClass, NilClass, Numeric, Integer, Fixnum, Bignum[*], Range</li>
	<li>Comparable, Enumerable</li>
	<li>Time[*]</li>
	<li>Struct</li>
	<li>Regexp[*], MatchData[*]</li>
	<li>some of Kernel</li>
	</ul>
	<p>[*] Can be implemented using .NET libraries</p>
	<p>Interpreter specific classes depend on the compiler/interpreter and would have to be written specifically for the compiler:</p>
	<ul>
	<li>Fixnum, Float, String, Symbol, Array, Hash, Range, false, true, nil base</li>
	<li>Math</li>
	<li>Binding, Proc, Method, UnboundMethod, Exception, Continuation, Object, Module, Class, GC, ObjectSpace</li>
	<li>Marshal</li>
	<li>some of Kernel</li>
	</ul>
	<p>Platform specific classes depend on the underlying platform and would have to be implemented for .NET:</p>
	<ul>
	<li>Thread, ThreadGroup, IO, File, FileTest, Errno, Dir, Process, Signal</li>
	</ul>
	<h4>Extensions</h4>
	<p>Written in C and have to be rewritten in managed code:</p>
	<p>Socket, StringScanner, dl, zlib, openssl, digest, stringio, &#8230;</p>
	<p>Written in Ruby:</p>
	<p>A compliant Ruby compiler should be able to use all of the Ruby standard library without considerable modifications.</p>
	<h3>Compiler structure</h3>
	<pre>
Lexer   -> Parser -> Simplifier -> Emitter  -> Runtime
Strings -> Tokens -> AST        -> Core AST -> CIL
</pre>
	<p>The lexer and parser convert the character stream to a node tree (AST). In the beginning, the NETRuby parser can be used, and if it has difficulties parsing recent Ruby code, a port of the more up-to-date JRuby parser is necessary.</p>
	<p>The simplifier would convert the Ruby AST to a form that is conveniently translatable to CIL (using System.Reflection.Emit). Responsibilities include resolving scopes, expanding multiple assignments, array splatting, conversion to primitive method calls etc. Somewhat similar to the Ruby2C rewriter.</p>
	<p>Built-ins/standard libraries would have to be written in parallel with the compiler; the development is probably best done test-driven.</p>
	<h3>Optimizations</h3>
	<p>The compiler would also enable exploring various optimizations. A couple of ideas follow.</p>
	<p>Open coding is inlining of certain method sends such as #+, #-, #*, #/ etc. for types such as Fixnums. Ruby allows one to redefine those methods.</p>
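	<p>Because those methods can be redefined, inlined code needs a guard. The sketch below shows the shape of such a guard in Ruby; the OpenCoding module and its flag are hypothetical, and how the runtime would notice a redefinition (and flip the flag) is left as an assumption.</p>

```ruby
# Sketch only: OpenCoding and its flag are hypothetical.
module OpenCoding
  @integer_plus_intact = true

  def self.integer_plus_intact?
    @integer_plus_intact
  end

  # Assumed to be called by the runtime when Integer#+ is redefined.
  def self.invalidate_integer_plus
    @integer_plus_intact = false
  end

  # What a compiler might emit for "a + b".
  def self.add(a, b)
    return a.public_send(:+, b) unless integer_plus_intact?
    return a.public_send(:+, b) unless [a, b].all?(Integer)
    a + b   # stands in for a single CIL add instruction
  end
end
```

	<p>The fast path costs two cheap checks; everything else falls back to the generic message send.</p>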
	<p>Linearization would translate Ruby method hashes to CIL virtual method tables and Ruby instance variable hashes to CIL instance variables. A simple approach would be to do it only for certain selectors, but the choice of selectors needs more research. Also, including too many selectors would unnecessarily bloat the vtables: the Object class, for example, responds to hundreds of messages due to the Kernel mix-in. Self-dispatch is also optimizable if several methods are compiled into one class. If the CLR implements S.R.E.MethodRental, new implementations could be swapped in for methods later.</p>
	<p>Type inference can be done to discover Fixnums so they can remain unboxed, or objects with a certain interface.</p>
	<p>Specialization is the generation/inlining of code with respect to the dynamic type or the value of an expression. Block inlining is a special case of this.</p>
	<p>Polymorphic inline caches store a selector and a type pair so further calls to the same type would not have to invoke a method (hash) lookup. The same approach could be used globally and the current Ruby implementation does it, but the resulting performance increase would not be that great.</p>
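	<p>A per-call-site cache can be sketched in a few lines of Ruby. InlineCache and its counters are made up for the example, and the sketch ignores invalidation when a method is redefined, which a real implementation would have to handle.</p>

```ruby
# Sketch only: one InlineCache instance represents one call site.
class InlineCache
  attr_reader :hits, :misses

  def initialize(selector)
    @selector = selector
    @entries = {}   # receiver class -> resolved method
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    target = @entries[klass]
    if target
      @hits += 1
    else
      @misses += 1
      target = klass.instance_method(@selector)   # the slow lookup
      @entries[klass] = target
    end
    target.bind(receiver).call(*args)
  end
end

site = InlineCache.new(:to_s)
results = [1, 2, "a", 3].map { |v| site.call(v) }
```

	<p>After the first Integer and the first String receiver, every further call at this site is served from the cache.</p>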
	<p>Perfect or just open addressed hashes could be used instead of the normal Hashtable for a slight performance increase.</p>
	<p>Generics and value types would also save some indirections.</p>
	<p>A restricted form of quasiquotation or partial evaluation can be used for most situations where eval is used, so the full parser would not have to be loaded.</p>
	<p><i>The End</i>. Comments?
</p>
]]></content:encoded>
			<wfw:commentRSS>http://jaen.saul.ee/index.php/2005/07/01/rubynet-proposal/feed/</wfw:commentRSS>
	</item>
	</channel>
</rss>
