-
I was unfamiliar with Smalltalk and SELF, and didn't fully understand the motivation behind prototype-based programming before reading this paper. The ability to change the behavior of an object at runtime is useful, and I now understand why SELF is a prototype-based language.

Discussion Question:
-
I thought the discussion of the approaches to optimization in Smalltalk versus SELF at the end of Section 4 raised good questions about the tradeoffs of having more or less information available to the compiler statically. The authors point to the special handling of common cases in Smalltalk-80 systems as violating the "extensible and flexible spirit" of the language. In this context it was interesting to consider how the compilation of SELF attempts to sidestep some of these issues. The crux of the approach seems to be dynamically inferring a small amount of information that enables optimization (the type of the receiver) while preserving the ability to fall back to the default slow, but flexible, handling. Since the customized compilation approach involves storing multiple versions of the compiled code, I think it would have been helpful to have more evaluation/analysis of the performance impact of incremental recompilation.

Discussion question: SELF uses maps to efficiently represent objects that belong to the same clone family, which, as the authors point out, look similar to classes but are transparent at the language level. In what situations does this transparency benefit the programmer? What is an example of concrete functionality enabled by this transparency?
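To make the map idea concrete, here is a minimal sketch (my own TypeScript illustration, not the paper's implementation) of how members of a clone family can share a single immutable descriptor while each object keeps only its assignable slot values:

```typescript
// Illustrative sketch of a shared "map": all clones of a prototype point at the
// same descriptor, so per-object storage holds only the slot values.
interface CloneMap {
  slotNames: string[];            // shared layout: which slots exist and at what offset
}

interface Obj {
  map: CloneMap;                  // shared, immutable descriptor for the whole clone family
  slots: unknown[];               // per-object storage: only the assignable slot values
}

// Cloning reuses the prototype's map, so N clones share one descriptor.
function clone(proto: Obj): Obj {
  return { map: proto.map, slots: [...proto.slots] };
}

// Slot access goes through the map to find the slot's offset.
function getSlot(obj: Obj, name: string): unknown {
  return obj.slots[obj.map.slotNames.indexOf(name)];
}

// Example: a point prototype and a clone sharing one map.
const pointMap: CloneMap = { slotNames: ["x", "y"] };
const point: Obj = { map: pointMap, slots: [0, 0] };
const p1 = clone(point);
console.log(getSlot(p1, "y"));    // 0
```

This is essentially the same trick that modern JavaScript engines call "hidden classes" or "shapes", and it is invisible at the language level, which is what the transparency question above is getting at.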
-
I found the discussion on Customized Compilation (§5.1) interesting. The way the SELF compiler generates different machine code for each possible type a method can be applied to reminds me of monomorphization in languages with parametric polymorphism (e.g., Haskell, Rust). In monomorphization, given a polymorphic function, the compiler instantiates type variables with concrete types and produces a specialized version of the function. The type prediction behavior discussed in §5.5 seems particularly intriguing, as it demonstrates how the compiler exploits the fact that certain methods are more likely to be used with specific types to generate optimized run-time tests. How did the authors determine a priori which types are more likely to be used with certain methods — did they do this in an ad hoc manner by inspecting existing Smalltalk code?

Question: When performing customized compilation / "monomorphization" as discussed above, how do we handle the tradeoff between performance and code size? (Monomorphization results in code duplication, since a copy of the same function has to be generated for different types.) The evaluation section in the paper only discusses performance and not code size — does this mean compiler designers prioritize performance over code size in practice?
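For comparison, here's a hedged sketch of the caching structure customized compilation implies: one specialized compiled version per (method, receiver map) pair, reused on later sends. The names (`compileFor`, `codeCache`) are illustrative, not from the paper:

```typescript
// Hedged sketch of customized compilation: compile one specialized version of a
// method per receiver map, and cache it so later sends with the same map reuse it.
type CompiledCode = (receiver: any, ...args: any[]) => any;

// Cache keyed by (method, receiver map); hypothetical structure, not the paper's.
const codeCache = new Map<string, CompiledCode>();

// Stand-in for the real compiler: in SELF this emits specialized machine code
// in which slot offsets and inlined primitives are baked in for that map.
function compileFor(method: string, receiverMap: string): CompiledCode {
  return (receiver, ...args) => receiver[method](...args);
}

function lookupOrCompile(method: string, receiverMap: string): CompiledCode {
  const key = `${method}@${receiverMap}`;
  let code = codeCache.get(key);
  if (code === undefined) {
    code = compileFor(method, receiverMap);  // specialize once per receiver map
    codeCache.set(key, code);                // trades code space for faster later sends
  }
  return code;
}
```

The cache makes the code-size cost in my question visible: every distinct receiver map seen at a send site can add another compiled copy of the method.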
-
My reading of this paper was handicapped by my lack of previous exposure to Smalltalk and SELF. As an example, I never quite wrapped my head around blocks: what are they for, and in what sense do they allow SELF programmers to "define their own control structures"? The authors compare blocks to closures, so maybe they are just closures. I also found the paper a little hard to read because it seemed to describe many different kinds of optimizations: some just for prototype-based languages, some for all dynamically-typed OO languages, and some for all OO languages, dynamic or static. So I was constantly re-orienting myself, trying to understand what class of languages a given optimization might apply to. All that being said, I liked the paper and I thought it offered some nice compiler optimizations. I particularly liked the first section on dynamically constructing what is essentially the class hierarchy of the running program. It made a compelling argument that all of the type/class information that programmers provide statically is basically superfluous to the runtime system.

Questions:
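On the blocks question: they are indeed closure-like, and the "define your own control structures" point just means that loops and conditionals can be ordinary methods that take blocks as arguments, rather than built-in syntax. A rough TypeScript analogue (my own illustration, not SELF syntax):

```typescript
// A user-defined "whileTrue" control structure: just a function applied to two
// closure-like blocks, much like SELF/Smalltalk blocks passed in a message send.
function whileTrue(condition: () => boolean, body: () => void): void {
  while (condition()) {   // the host loop stands in for SELF's recursive sends
    body();
  }
}

// Usage: the loop is library code, not language syntax.
let i = 0;
whileTrue(() => i < 3, () => {
  console.log(i);
  i++;
});
```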
-
I found this paper interesting mainly because JavaScript is a prototype-based language, and having dealt with all the quirks of JS for many years now, it was interesting to see the arguments for prototype-based languages and the thought going into optimizing them. I do think it is worth looking into optimizing them (simply because of how ubiquitous JavaScript is), but I have to say that as a whole I am not the biggest fan of prototype systems in general. My main critique is that the paper seems to be solving a very specific problem (optimizing the SELF compiler) and I am not sure that most of the lessons learned here are that applicable to more widely used languages. Many of the optimizations mentioned (like JIT compilation of functions for specific types dynamically) also seemed not so revolutionary, though I think that is mainly because the paper is from 1989, so I'll cut it some slack. As a whole I was not particularly surprised or excited by this paper, though I'm not sure how much of that is my distaste for prototype languages and how much is just that the paper is old and many of the things discussed are now fairly ubiquitous (maybe a sign of its success?).

Discussion Question: Prototype-based languages provide a very high level of flexibility to users, allowing not only dynamic type systems but dynamic data/struct layouts and more. This lack of structure forces compiler engineers to work with less information, which naturally prevents many opportunities for optimization. At the same time, increasing the dynamic runtime capabilities available to the programmer also correlates heavily with a lack of application security (prototype pollution, etc.), since fewer constraints on program behavior leave developers with fewer invariants to rely on. Where should language developers draw the line between flexibility and performance/security? If (as the paper states) it is extremely rare to create multiple clone families, then why even allow the developer to modify object layouts at runtime? What are some ways to minimize restrictions and load on developers while maintaining performance guarantees in dynamically-typed scripting languages?
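Since prototype pollution comes up here (and again later in the thread), here is a minimal TypeScript sketch of the attack, assuming a naive recursive merge helper (`naiveMerge` is hypothetical, not from any particular library):

```typescript
// A recursive merge that does not guard against the "__proto__" key ends up
// mutating Object.prototype, so attacker-controlled input changes the behavior
// of completely unrelated objects.
function naiveMerge(target: any, source: any): any {
  for (const key of Object.keys(source)) {
    if (typeof source[key] === "object" && source[key] !== null) {
      if (target[key] === undefined) target[key] = {};
      naiveMerge(target[key], source[key]);   // no check for "__proto__" here
    } else {
      target[key] = source[key];
    }
  }
  return target;
}

const payload = JSON.parse('{"__proto__": {"isAdmin": true}}');
naiveMerge({}, payload);
console.log(({} as any).isAdmin);  // true: every plain object now inherits the flag
```

The freedom to rewrite an object's parent chain at runtime is exactly what makes this class of bug possible, which is the flexibility-versus-security tension in the question above.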
-
Critique: Like many others in this thread, I found this paper a little bit hard to get through this week; to be honest, I was mostly interested in it as a relic of an earlier era of CS research. I'm not familiar with Smalltalk, which made it a bit hard to situate myself in the specific improvements of SELF over Smalltalk, but it was interesting to see how new the field of dynamically typed languages was (I gather this from the way they were talking about it), to the point that this analysis of optimizations, on a language that is not really used anymore today, was so important at the time.

Discussion Question: In the conclusion, the authors mention: "Our techniques are not restricted to SELF; they apply to other dynamically-typed object-oriented languages like Smalltalk, Flavors, and CLOS. Many of our techniques could even be applied to statically-typed object-oriented languages like C++ and Trellis/Owl." I wonder whether the techniques were indeed applied to other languages (perhaps some still in modern use) based on this paper? This might help people see the worth behind this paper (I've seen some uncertainty around this topic in the thread) and answer the question, "why should we care?"
-
I think this was probably the hardest paper to understand that we've read so far. I'm familiar with JavaScript, another prototype-based programming language, but it was still quite hard to wrap my head around the features of SELF described in the paper and map them to JavaScript concepts. The descriptions of how messages are sent to objects actually tripped me up a bit, because I was thinking of messages as in distributed systems, rather than the method calls on objects that SELF messages are actually much closer to. A lot of the paper seemed very specific to the SELF compiler, and I wasn't immediately seeing how these optimizations could be applicable to prototype-based languages in general, although maybe this was because I was trying to find a JavaScript equivalent for everything I didn't understand in the paper.

My discussion question comes from the section of the paper outlining how the SELF compiler supports source-level debugging. From reading that section, and from experience with C/C++, it seems clear that there is an inherent trade-off between code optimization and ease of producing useful debugging information, at least to the level of adding significant complexity to debuggers operating on optimized code. However, is the utility gained from a debugger being able to work on JIT-optimized code worth the added complexity/reduced functionality versus a debugger that just works by interpreting the code line by line? In what cases would the former be more useful than the latter?
-
The wiki page about SELF was very useful and helped put the paper in some context. For me, it is often the case with older papers that I simply do not understand what the authors are talking about. It was very validating to see others describe the paper as 'hard to read' and 'hard to relate to'; I feel the same about it. It surprised me that the language is still alive and had its latest release in 2024. Lately, I have been thinking that I do not like untyped languages. It makes it harder for me to write or read code if I cannot easily tell what the type of the arguments or returned objects is. I wonder how others feel about it. When is it preferable to use typed vs. untyped languages?
-
Overall, it was nice to see how the ideas in the paper came to be widely adopted in modern JIT compilers for dynamic languages such as JavaScript, a prototype-based language. The compiler optimizations were interesting, particularly how the SELF compiler minimizes the space usage of clones derived from the same prototype by using clone families. The authors use maps as an implementation technique to efficiently represent members of a clone family, and their reasoning for this is pretty clear: maps allow multiple objects to share a single metadata structure, and their immutability further enhances their utility by ensuring that changes to one object do not inadvertently affect others in the clone family. The authors also discuss message inlining and splitting as part of their compiler optimizations. As they mention, message inlining is helpful because it eliminates the overhead in SELF of using message passing to access variables. While message inlining improves performance by reducing runtime overhead, it relies heavily on accurate type information and can lead to substantial code growth at compile time. I would suppose that this trade-off between speed and memory usage may create challenges in memory-constrained environments, but maybe there are already workarounds to this that I am unaware of.

Discussion questions:
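To illustrate the message-inlining point above, here is a small hedged sketch (my own TypeScript illustration with made-up names like `POINT_MAP`; SELF does this at the machine-code level) of turning a variable-access send into a map-checked direct load with a fallback to the general lookup:

```typescript
interface Obj { mapId: number; slots: number[]; }

const POINT_MAP = 1;     // hypothetical map identifier for the point clone family
const X_OFFSET = 0;      // offset of the 'x' slot in that map

// General path: full dynamic lookup on every send (always correct, but slow).
function dynamicLookup(obj: Obj, slot: string): number {
  return slot === "x" ? obj.slots[X_OFFSET] : NaN;  // stand-in for real lookup machinery
}

// What the compiler might emit once it knows (or predicts) the receiver is a point:
function sendX_inlined(obj: Obj): number {
  if (obj.mapId === POINT_MAP) {
    return obj.slots[X_OFFSET];     // the send becomes a map check plus a direct load
  }
  return dynamicLookup(obj, "x");   // fall back to the general, flexible path
}

console.log(sendX_inlined({ mapId: POINT_MAP, slots: [3, 4] }));  // 3
```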
-
I appreciate the background the authors provide on SELF. I didn't understand some parts of it, but I think I understood enough to get why it's a good candidate for the kind of dynamic compilation the authors implement. I'm not really that familiar with JavaScript, so I didn't know about prototypes, which I guess made it harder to understand the language. But basically I was thinking of it vaguely as an object-oriented programming model that doesn't use classes, but has to do a similar dynamic dispatch/lookup thing? I did think it was pretty crazy that they doubled the compiler's performance over the existing one -- I can't imagine a paper coming out today that suddenly makes a mainstream-ish language twice as fast as it was just before. It also reminded me of Proebsting's law from the first lecture, lol. I also thought the authors raised an interesting point when comparing their compiler to a C compiler. They attribute a lot of the slowdown in their compiler to the fact that they have poorer implementations of standard compiler operations like register allocation and peephole optimizations. In this way, it seems pretty unfair to compare their compiler, developed by three people, to a mature C compiler that is faster at least partly because of the sheer person-hours that have gone into it. They attribute some of the slowdown also to the difference in semantics between SELF and C; I kind of wonder what the point of this apples-to-oranges comparison is.

This paper also made me wonder about the relationship between designing and implementing a language. Allegedly SELF has all this funky message-passing business going on because it makes the language simpler (is that true?); even if it were true, when might it be worth it to trade language performance for expressiveness? When do we want a language that might be easy to program in, but is slower? Where might we draw the line here? Should we design language semantics with an implementation in mind?
-
The paper seems to downplay the benefits of static typing. It feels like there is a trade-off here regarding language expressiveness. These techniques claim to achieve efficiency without compromising the language, but I am a bit skeptical. It is hard to ignore cases where the design architecture calls for object structures that may change at runtime. How could this trade-off between implementation flexibility and development difficulty be measured? Is there a point where the overhead becomes too large to be worthwhile?
-
This was a fun read! As others have mentioned, it took me a while to get through the paper, and in particular Section 5; it's been a while since I last saw Smalltalk and pure message-passing-style code, and it took me a while to get familiar with the examples presented in the paper. Personally, I love types (and while I'm not an enthusiastic fan of statically typed programming languages, I'm more of the perspective that they provide good enough benefits that I can deal with their quirks). I know that they are sometimes annoying to work with, and the compiler sometimes gets in the way of things just working (like writing code that I know will not throw a null pointer exception and still having to fight my way with the Kotlin compiler to let it know that I promise this field won't be null, and if it is, it's okay to throw an exception). But they also enable us to model systems in a way that makes reasoning about the behavior of a system easier, and they can help you catch silly mistakes quickly without having to actually run the program. From the paper I got the sense that there is an argument that static type systems are mostly for the benefit of programmers and that the runtime doesn't need much type information to be efficient. And while I understand where this perspective comes from, I believe that types can also be pretty useful for optimization purposes, like we saw with TBAA and in the class discussion about mutable vs. immutable types and the optimizations they enable. I think that types are useful for humans, but I also think that they can help optimize programs.

Discussion question: Static variable scoping has become the norm for most programming languages, as it can help to write programs that are easier to reason about, and it is now uncommon to find languages that use dynamic variable scoping; but the same is not true for statically typed vs. dynamically typed programming languages. It seems that we've been able to build large, complex systems in both of them, and there are even some languages that are mainly statically typed but allow dynamic types in some areas (like C#, which helps avoid visitor patterns by using dynamic types). It also seems that we can build performant systems in both paradigms. Is that how we expect type systems to continue evolving, with a mix of both static and dynamic typing? Or is there a future in which one of them tips the balance (like static scoping did)? Perhaps there's a middle ground where better type-inference systems help move the needle one way or another?
-
Critique: Like others mentioned, some of the concepts and references in this paper were a bit difficult to follow, which made it hard to fully appreciate some of the points and contributions being made. The background on SELF was helpful though, and there were some cool ideas, like the tagged pointer representation, that seemed pretty clever. I think it was cool to read about how the SELF compiler tackles challenges arising from the lack of static types and from the message-passing structure for accessing variables, like using message inlining to avoid the overhead of sending certain messages. Some of the sections, like Section 5.4 on message splitting and Section 5.5 on type prediction, felt like simple extensions of existing compilation strategies (optimizing the hot path), unless these ideas were not as common when this paper was written.

Discussion Question: The paper discusses many interesting compilation techniques and clever workarounds in a context that is specific to SELF and its compiler. However, some ideas, like message inlining, seem valuable for other languages with dynamic dispatch, like virtual method calls. What are some other ideas from this paper that could inspire optimizations in modern dynamic language runtimes, even if they don't follow the SELF or prototype model strictly?
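On the tagged representation: a tiny sketch of the general idea (my own TypeScript illustration; the actual SELF scheme works on raw machine words with low-order tag bits):

```typescript
// Minimal sketch of low-bit tagging: the low two bits say what a word holds,
// so checking "is this an integer?" is a single mask-and-compare.
const TAG_BITS = 2;
const TAG_MASK = 0b11;
const INT_TAG = 0b00;     // small integers: value shifted left, tag 00
const REF_TAG = 0b01;     // heap references would carry a nonzero tag

const tagInt = (n: number): number => (n << TAG_BITS) | INT_TAG;
const isInt = (w: number): boolean => (w & TAG_MASK) === INT_TAG;
const untagInt = (w: number): number => w >> TAG_BITS;

// A generic '+' can take the fast path when both operands are tagged integers;
// with tag 00, the tagged words can even be added directly without untagging.
function genericAdd(a: number, b: number): number {
  if (isInt(a) && isInt(b)) return a + b;   // (x<<2) + (y<<2) === (x+y)<<2, tag stays 00
  throw new Error("slow path: dispatch the '+' message dynamically");
}

console.log(untagInt(genericAdd(tagInt(3), tagInt(4))));  // 7
```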
-
Like others, I wasn’t familiar with SELF or prototype-based languages, so this paper was difficult to understand. One of their arguments I’m not sure I fully agree with is the claim that “Researchers seeking to improve performance should improve their compilers instead of compromising their languages.” While the idea of preserving flexible languages is appealing, I think the downside is that it places a huge burden on compiler infrastructure, which can become overly complex and hard to maintain. Not every project has the resources to build or rely on advanced compilers, and pushing all optimization responsibility into tooling seems like a bottleneck. I think small compromises in the language, like restricting certain dynamic features, can make it easier to write correct and fast code, rather than relying on the compiler to figure everything out.

The paper highlights SELF’s dynamic inheritance (the ability for objects to change their parents at runtime) as a useful object-oriented feature. At the same time, this flexibility allows for security vulnerabilities like prototype pollution, so is the flexibility of dynamic inheritance worth the risk of enabling such bugs?
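For readers unfamiliar with dynamic inheritance, a small sketch of what "changing parents at runtime" looks like in a JavaScript-style prototype model (my own hypothetical example in TypeScript, not SELF syntax):

```typescript
// The same object switches which parent it delegates to at runtime, changing the
// behavior it inherits without being reallocated or copied.
const dormantState = {
  describe(this: { name: string }) { return `${this.name} is dormant`; },
};
const activeState = {
  describe(this: { name: string }) { return `${this.name} is active`; },
};

const machine = Object.create(dormantState) as { name: string; describe(): string };
machine.name = "pump-1";
console.log(machine.describe());          // "pump-1 is dormant"

// Reparenting: the same object now delegates to a different parent.
Object.setPrototypeOf(machine, activeState);
console.log(machine.describe());          // "pump-1 is active"
```

This is also the capability that prototype pollution abuses, which is what makes the flexibility-versus-risk question above a real design tension.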
-
Critique: This paper was among the earliest efforts to efficiently implement a dynamically-typed OOP language. I appreciate its impact on later JIT design in the Java VM and in other prototype-based languages like JavaScript. Their emphasis on simplicity of design and on message passing somehow reminds me of microkernels, which also adopt IPC as the main mechanism behind system calls. Simple designs do make things easier to reason about, but their level of abstraction may put an extra burden on the user of the tool. The concept of maps, which they introduced to reduce memory usage per object, reminds me of the key-sharing dictionaries of PEP 412 for CPython, which were proposed to share the common key table across instance dictionaries of the same class.

Discussion Question: What's the difference between prototype-based and class-based design? How does each solve common OOP problems like inheritance and polymorphism?
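To make the discussion question concrete, a rough TypeScript sketch (my own, not from the paper) of the same inheritance and polymorphism expressed both ways:

```typescript
// Class-based: behavior lives in a class; instances are created from it.
class Animal {
  speak(): string { return "..."; }
}
class Dog extends Animal {
  speak(): string { return "woof"; }   // polymorphism via overriding
}
console.log(new Dog().speak());        // "woof"

// Prototype-based: behavior lives in ordinary objects; "inheritance" is
// delegation to a parent object, and "subclassing" is cloning plus overriding.
const animal = { speak(): string { return "..."; } };
const dog = Object.create(animal);     // dog delegates unknown lookups to animal
dog.speak = () => "woof";              // override by adding a slot on the clone
console.log(dog.speak());              // "woof"
```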
-
Discussion thread for An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes
Discussion leads: @ananyagoenka @smd21