Skip to content

Generics

This is an extension of the TypeSystem3 documentation focusing on the support of generics.

A generic type is a type which contains type parameters. For example, List<T> is a generic type with the name List and one type parameter T. A generic type cannot be used in a model directly! Instead, it needs to be instantiated by providing type arguments. List<int> is an instantiation of the generic type List<T>, and as such can be used in a model as, e.g., the type of a variable. In other words, generic types are type-level functions, that map a non-empty list of types to a new type.

For MCTypes, the type arguments have to be provided explicitly, as with List<int>. That is not the case for expressions; [] is an expression that returns an empty List. The type argument is implicitly inferred by using information of the context the expression is used in. An Example:

List<Person> persons = [];
List<Car> cars = [];

For persons a List<Person> is expected. List<Person> is the target type of the expression []. Therefore, (in languages that have side effects) the type argument must be Person, thus resulting in the type List<Person> for []. The same goes for cars; Here, the type argument is required to be Car. As can be seen, the same expression can have different types depending on the context the expression is used in.

The implicit type arguments are inferred for

  • checking the type-correctness of the expression
  • providing additional information for potential CoCos
  • providing additional information for code generators

To this end, MontiCore provides a type inference algorithm.

Given infrastructure in MontiCore wrt. Type Inference

  • TypeParameterRelations (relations over SymTypeExpressions wrt. generics, e.g., hasWildcards)
  • Constraint (statements over SymTypeExpression that have to hold for inference to succeed. Leads to Bounds.)
  • Bound (limits the range of inference variables, intermediate result of inference)
    • SubTypingBound (e.g., FV#0 <: Person: represents a variable to be a subtype/supertype of a SymTypeExpression)
    • TypeEqualityBound (e.g., FV#0 = int: represents a variable being a SymTypeExpression)
    • CaptureBound (e.g., List<CAP#1> = capture(List<FV#0>): represents that capture conversion needs to be done later)
    • UnsatisfiableBound (unsatisfiable: represents that no instantiation can be found)
  • Bounds and Constraints:
  • InferenceContext4Ast (maps ASTNodes to SymTypeExpressions, filled by the TypeVisitors)
  • CompileTimeTypeCalculator (handles type inference for most visitors. Requires the inference problem to be representable as a function call.)
  • CTTIVisitors are TypeVisitors extended with type inference, they only exist for expressions

Extended SymTypeExpression World

When adding generics support, one can encounter new SymTypeExpressions; First of all, there is SymTypeOfGenerics , which directly corresponds to SymTypeOfObject and represents an object type with type arguments, e.g., List<int> (s. above),

SymTypeOfFunction does not have a corresponding, e.g., SymTypeOfGenericFunction; Instead, type arguments are calculated by comparing the SymTypeOfFunction with the function defined by the FunctionSymbol. S. SymtypeOfFunction::getTypeArguments().

There are further types that can be encountered. Types printed with a # are types that are used internally during type calculations but cannot be explicitly used in a model. As such, a modeler may only ever encounter them in error messages.

"FV#{number}" (and CAP#{number}, etc.) represent inference variables (free type variables). They are used in place of the actual type arguments, until their instantiations are calculated using bounds.

#TOP (commonly written as ⊤) represents the top type, the type that is the supertype of all types. It is used internally to represent that there is no upper bound that limits the set of types an inference variable can be instantiated to. Example: <T> T f() here, the type variable T is has no upper bound and a corresponding inference variable would use #TOP to represent this.

Inference variables can be instantiated to #TOP in specific circumstances; The return type of f in the statement f(); has no bounds, it can be any type. As such, #TOP is used to represent this. More often than not, a type of #TOP suggests that a type/function has not been used "to their full potential":

  • some information goes unused (for f it is the return value), or
  • e.g., in the comparison of an empty List and Set [] == {} the potential to store values in the collections goes unused.

#BOTTOM (commonly written as ⊥) represents the bottom type, the type that is the subtype of all types. Analogue to #TOP, #BOTTOM represents the lack of a more specific lower bound. #BOTTOM has no values. Inference variables are never instantiated to #BOTTOM.

How to get the type of an ASTNode with generics?

Note: This is an extension of the general documentation, which you should read first.

When calculating types in a language with support for generics, additional steps have to be taken, due to the inference algorithm

  • using different visitors (not necessarely, but that is usually the case)
  • needing to store additional information for ASTNodes,
  • and requiring access to the target types of expressions,

The additional information will be stored in the InferenceContext4Ast that is shared between all visitors.

The visitors supporting type inference, so far they exist, are usually called CTTIVisitors, for "(extended) compile-time type inference".

Example:

MyLangTraverser traverser = MyLang.traverser();
Type4Ast type4Ast = new Type4Ast();
InferenceContext4Ast ctx4Ast = new InferenceContext4Ast();

// one of many type visitors
SetExpressionsCTTIVisitor visSetExpressions = new SetExpressionsCTTIVisitor();
visSetExpressions.setType4Ast(type4Ast);
visSetExpressions.setContext4Ast(ctx4Ast);
traverser.add4SetExpressions(visSetExpressions);
// due to more complex traversal during type inference,
// most CTTIVisitors need to control the order of traversal
traverser.setSetExpressionsHandler(visSetExpressions);

// create the TypeCheck3 delegate
new MapBasedTypeCheck3(traverser, type4Ast, ctx4Ast)
    .setThisAsDelegate();

Furthermore, the set of typing rules is extended, thus, further static delegates exist which can be extended if required;

Again, after initializing the TypeCheck3 delegates, TypeCheck3 can be used to query SymTypeExpressions of expressions TypeCheck3.typeOf(expr), as well as MCTypes TypeCheck3.symTypeFromAST(mcType). However, for the expressions, the target types are required, as far as they exist. To pass this information to TypeCheck3, the function TypeCheck3.typeOf(expr, targetType) can be used. This, in addition to calculating the type of expr, stores the target type information; Any subsequent call to TypeCheck3.typeOf(expr) will have the type according to the target type information stored by the first call. This data is stored, as there are circumstances there the target type information is not available (e.g., inside some CoCos).

It holds that

  • Each ASTExpression has exactly one target type
  • Each ASTExpression has exactly one type. In the case of a language with variability, special care has to be taken to
    • either reset the value cache accordingly,
    • or clone the ASTExpression
  • The call to TypeCheck3.typeOf with the target type has to be the first. It is recommended to split the post-symbol-table-creation CoCos into two groups
    1. The group of CoCos that provide the target type information
    2. The group of CoCos that cannot provide the information, they must run after the CoCos of the first group