System Specification of TSPHP

Introduction

This page describes the system specification of the compiler. Please have a look at Motivation behind TSPHP, Vision of TSPHP and Goals of TSPHP to get a better understanding of the global context and goal. More information can also be found in Rough Concept of the bachelor project, but be aware that it refers to the bachelor project, which is already finished (much of the information still applies, though). In addition, have a look at the Customer Requirements of TSPHP if you are not familiar with them.

Very briefly, the compiler is used to compile TSPHP: it checks whether the code is type safe and translates it to PHP 5.4. The compiler is designed so that further steps such as optimisation, code weaving or code obfuscation can be added easily. Furthermore, it is designed to support multiple output formats.
The following chapters describe further requirements for the compiler.

Software Requirements and Design Decisions

The following software requirements refine the Customer Requirements of TSPHP, and the corresponding design decisions have been made to support them. For a better overview, the requirements have been split into different sub-chapters.
The following key applies to all tables in all chapters:

Key:
Affects: States which customer requirement is refined.

Explicit Design Decisions

The following design decisions describe what has to be done to support the affected customer requirement rather than refine it.

ID | Name | Description | Affects | Benefit
SysD1 | ANTLR v3.5 Grammar | An ANTLR v3.5 based grammar for TSPHP must be created which generates an AST. | All requirements in chapter Language Specification, NF6 - Independence | An ANTLR v3.5 based grammar improves the portability to another language such as C, C++, C# etc. Furthermore, the AST
SysD2 | ANTLR v3.5 Tree Grammar | An ANTLR v3.5 based tree grammar for TSPHP must be created. | All requirements in chapter Language Specification, NF6 - Independence | Like the normal grammar, an ANTLR v3 based tree grammar improves the portability to another language such as C, C++, C# etc. Furthermore, ANTLR can generate a tree walker out of a tree grammar. The type checker, the translator (and, in the future, the code optimiser) should use a tree grammar.
SysD3 | Independent parser, type checker, translator | All components should be independent of the compiler and of each other (excluding tests, where integration tests might use other components). However, dependencies on the common component and ANTLR are fine. | NF6 - Independence, NF7 - Testability | Independence allows using the parser, the type checker as well as the translator component in contexts other than TSPHP. For instance, the type checker and the translator could be used in an AOP implementation for TSPHP.
SysD4 | Output format components | The different output formats shall be implemented as independent components. | O5 - Output format, O6 - Output format PHP 5.4, NF3 - More output formats, NF6 - Independence, NF7 - Testability | This way the design of the compiler is forced to support different output formats, so a well-defined interface has to be specified. This interface could be implemented by another output format component, and this component could be used instead of (or alongside) the PHP 5.4 output format component.
SysD5 | Output - directory - structure | Each output format component (SysD4) decides how the structure of its output directory looks. A good practice may be to adopt the global configuration (see O3 - Output - directory - structure) and to make it configurable (NF8), so that it is possible to overwrite the global configuration. | O3 - Output - directory - structure | Since producing a hierarchical structure may not be applicable for every output component, it is better that each output format component decides on its own whether this part of the component is configurable or not.
SysD6 | Output - directory - structure strategy components | Each output directory strategy shall be implemented as an independent component. | O3 - Output - directory - structure, O4 - Output - directory - structure strategies, NF6 - Independence, NF7 - Testability | This way the output strategy can be reused by different output format components.
SysD7 | Omit comments | Each output format component (SysD4) decides how it behaves in terms of omitting comments. A good practice may be to adopt the global configuration (see O7 - Omit comments) and to make it configurable (NF8), so that it is possible to overwrite the global configuration. | O7 - Omit comments | For one output format component (SysD4) it could be essential that comments are not omitted, while another output format component cannot deal with comments etc. This way we have a more flexible solution.
SysD8 | AST and IWalker interface | The parser of the compiler will generate a TSPHPAst (TSPHP's abstract syntax tree) and there will be a simple interface IWalker with the method TSPHPAst walk(TSPHPAst ast). | NF11 - Additional steps | The well-defined and simple interface IWalker will enable additional steps such as code optimisation, code obfuscation, code weaving etc., which should facilitate contributions to TSPHP.
SysD9 | Exchange ANTLR | The compiler shall encapsulate the access to generated code from ANTLR v3. | NF1 - Exchange ANTLR v3.5 | Future changes to the ANTLR interface will not result in a big change of the compiler. Only the class/component which encapsulates the access has to be changed. Furthermore, it would be easier to exchange ANTLR with another compiler generator.
SysD10 | AST and ANTLR | The class CommonTree from ANTLR will be used as base class for our TSPHPAst representation. The downside of this decision is that every extension/add-on of the compiler (such as a code optimiser, code obfuscator etc.) will depend on ANTLR. This goes against NF1 - Exchange ANTLR v3.5 but seems to be the better choice (see 14.01.2013 - Review meeting architecture). | NF1 - Exchange ANTLR v3.5 | Lower maintenance. If our own implementation were used, we would need to fix bugs on our own whenever ANTLR finds a bug in their implementation. Furthermore, we would need to change our implementation every time ANTLR changes something. This way ANTLR makes the change for us and we just need to ensure that the tests still pass when we upgrade to a new ANTLR version.
SysD11 | Split type checking in several phases | Split the type checking phase into a definition, a reference and a checking phase. | | The code is easier to understand and thus easier to maintain, and contributions to the project should be facilitated as well.
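
To illustrate why SysD11 separates definition, reference and checking, here is a minimal sketch on a toy model of function definitions and calls. It is not TSPHP's type checker: the classes Stmt, Kind and TypeCheckPhasesDemo and the error messages are hypothetical and only show how the first phase makes forward references resolvable in the second, so that the third can compare types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of the TSPHP code base.
class TypeCheckPhasesDemo {

    static List<String> check(List<Stmt> program) {
        List<String> errors = new ArrayList<>();

        // Definition phase: register every function with its return type.
        Map<String, String> functions = new HashMap<>();
        for (Stmt s : program) {
            if (s.kind == Kind.DEFINE) {
                functions.put(s.name, s.type);
            }
        }

        // Reference phase: bind every call to its definition, wherever it appears.
        Map<Stmt, String> resolved = new LinkedHashMap<>();
        for (Stmt s : program) {
            if (s.kind != Kind.CALL) {
                continue;
            }
            String returnType = functions.get(s.name);
            if (returnType == null) {
                errors.add("undefined function: " + s.name);
            } else {
                resolved.put(s, returnType);
            }
        }

        // Checking phase: compare the resolved type with the expected one.
        for (Map.Entry<Stmt, String> entry : resolved.entrySet()) {
            Stmt call = entry.getKey();
            if (!entry.getValue().equals(call.type)) {
                errors.add("type mismatch in call to " + call.name
                        + ": expected " + call.type + " but got " + entry.getValue());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Stmt> program = new ArrayList<>();
        program.add(new Stmt(Kind.CALL, "foo", "int"));      // used before its definition
        program.add(new Stmt(Kind.DEFINE, "foo", "string"));
        program.add(new Stmt(Kind.CALL, "bar", "int"));      // never defined
        for (String error : check(program)) {
            System.out.println(error);
        }
    }
}

enum Kind { DEFINE, CALL }

// For DEFINE, type is the return type; for CALL, it is the type the caller expects.
class Stmt {
    final Kind kind;
    final String name;
    final String type;

    Stmt(Kind kind, String name, String type) {
        this.kind = kind;
        this.name = name;
        this.type = type;
    }
}
```

Because each phase is a separate loop over the same input, a contributor can read or change one of them without having to understand the others.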

Misc

ID | Name | Description | Type | Prio. | Comp. | Risk | Affects | Benefit
SysM1 | Property file | The compiler shall be configurable through a property file. | m | 1 | 1 | 2 | NF8 - Configurable | The developer can configure the compiler without recompiling it.
SysM2 | Libraries | The compiler shall support libraries. A library contains one or more pre-compiled classes. | d | 2 | 2 | 1 | NF9 - Distribution of TSPHP | This would simplify the distribution of pre-compiled classes and also the configuration management.

Input

ID | Name | Description | Type | Prio. | Comp. | Risk | Affects | Benefit
SysI1 | Property file as parameter | The compiler shall provide an interface* which allows passing a path to a property file. All properties in this file shall overwrite or extend the configuration made in the compiler's property file (SysM1). | m | 1 | 1 | 1 | I1 - Overwrite Standard Config, NF8 - Configurable | The developer has the possibility to use different configurations without rewriting or replacing the property file each time.
SysI2 | Ant Task | Create an Ant task for the compiler which uses a file set as input. | d | 2 | 2 | 1 | NF5 - Automation, I5 - Input - file set; indirectly also I3 - Input - file and I4 - Input - directory | Automation. Ant is a commonly known way to automate development processes.
SysI3 | Classpath | The compiler shall provide an interface* which allows 'linking' pre-compiled classes and libraries. | d | 1 | 2 | 1 | NF9 - Distribution, NF8 - Configurable | Re-usability and performance.
SysI4 | No more InputStreams | The compiler shall provide a method which declares that no more input streams will be added. | m | 1 | 1 | 2 | I2 - Input stream, I3 - Input - file, I4 - Input - directory, I5 - Input - file set | When the console interface is not used (where every system call equals one compilation), it is necessary to be able to tell the compiler that all desired input streams have been added. Otherwise, the compiler cannot know when it can actually start type checking the parsed code.
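
The overwrite-or-extend behaviour required by SysI1 can be sketched with java.util.Properties. This is only an illustration under assumptions: the class ConfigOverlayDemo, its methods and all property keys are hypothetical, and the configuration is parsed from strings instead of files to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; not part of the TSPHP code base.
class ConfigOverlayDemo {

    // Parses properties-file syntax from a string (stands in for reading a file).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Starts from the compiler's own configuration (SysM1) and lets the passed
    // file overwrite existing keys or add new ones (SysI1).
    static Properties overlay(Properties compilerConfig, Properties passedConfig) {
        Properties effective = new Properties();
        effective.putAll(compilerConfig);
        effective.putAll(passedConfig);
        return effective;
    }

    public static void main(String[] args) {
        // All key names below are made up for this example.
        Properties base = parse("output.formats=php5.4\nomit.comments=false\n");
        Properties passed = parse("omit.comments=true\noutput.directory=build/out\n");
        Properties effective = overlay(base, passed);
        for (String key : effective.stringPropertyNames()) {
            System.out.println(key + "=" + effective.getProperty(key));
        }
    }
}
```

Copying both sets into a fresh Properties object leaves the compiler's own configuration untouched, so it can be reused for the next compilation with a different override file.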

Output

ID | Name | Description | Type | Prio. | Comp. | Risk | Affects | Benefit
SysO1 | Multiple output formats | It shall be possible to compile code and translate it to multiple output formats in one run. | d | 3 | 2 | 1 | O5 - Output format, NF3 - More output formats, NF12 - Performance | Better performance. It is not necessary to run the compiler several times for different output formats.
SysO2 | Output format PHP 5.4 - property file | The output format component (SysD4) for PHP 5.4 shall be configurable through a property file. | m | 2 | 1 | 1 | O6 - Output format PHP 5.4 | Customisation. The developer can configure the component without recompiling.
SysO3 | Output - directory - path for PHP 5.4 | The output format component (SysD4) for PHP 5.4 shall be configurable (NF8) concerning the name of a sub directory which will hold the output of this component. The default should be php5.4. | d | 1 | 2 | 1 | O2 - Output - directory - path | Since multiple output formats can be used in one run, each output format component needs its own output directory within the globally defined directory, and its name has to be configurable to resolve conflicts.
SysO4 | Output - directory - structure strategies for PHP 5.4 | The output format component (SysD4) for PHP 5.4 shall be configurable (NF8) concerning the output directory structure. The default should be the global configuration (see O3 - Output - directory - structure). | d | 1 | 3 | 1 | O4 - Output - directory - structure strategies | Customisation. The developer can choose between different output structures.
SysO5 | Output format(s) as parameter | The compiler shall provide an interface* which allows setting the output format(s) explicitly. This definition has precedence over the configured output format. | d | 2 | 1 | 1 | O5 - Output format, NF8 - Configurable | The output format could also be changed through Property file as parameter (SysI1). However, if the output format is the only property in that file, this is quite a lot of overhead. An additional argument better fulfils the need of the developer.
SysO6 | Output format PHP 5.4 | The compiler shall comprise an output format component for PHP 5.4. | m | 2 | 3 | 3 | O6 - Output format PHP 5.4 | The aim is to translate to PHP 5.4.
SysO7 | Omit comments PHP 5.4 | The output format component (SysD4) for PHP 5.4 shall be configurable (NF8) concerning whether comments are omitted or not. The default should be the global configuration. | d | 3 | 1 | 1 | O7 - Omit comments | Smaller PHP 5.4 files. However, comments can be useful if the PHP files are used in other projects which are not using TSPHP or if classes, methods etc. are annotated.
SysO8 | Only pre-compile | It shall be configurable (NF8) whether the full translation process shall be conducted or only the pre-compilation. This only makes sense if pre-compiled input is stored in files (SysO9) or the pre-compiled stream is retrieved by get pre-compilation (SysO10). | d | 1 | 2 | 1 | NF9 - Distribution of TSPHP | Performance. For instance, if only the pre-compiled files are used, it is not necessary to do the whole translation step.
SysO9 | Pre-compiled input is stored in files | It shall be configurable (NF8) whether the pre-compiled input shall be stored in files or not. | d | 1 | 3 | 1 | NF9 - Distribution of TSPHP | Re-usability and performance. The pre-compiled files can be used in other projects and do not have to be fully compiled again.
SysO10 | Get pre-compilation | The compiler shall provide an interface to retrieve the pre-compilation streams including their identifiers. | d | 2 | 3 | 1 | NF9 - Distribution of TSPHP | Useful if the pre-compiled streams are not stored in files but in another medium (for instance in a database).
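
Two of the rules above lend themselves to a short sketch: the precedence of explicitly passed output formats over configured ones (SysO5) and the per-format sub directory below the global output directory (SysO3). The class OutputSelectionDemo and its method names are hypothetical, and formats are represented as plain strings purely for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical demo class; not part of the TSPHP code base.
class OutputSelectionDemo {

    // SysO5: formats passed explicitly take precedence over the configured ones.
    static List<String> effectiveFormats(List<String> explicit, List<String> configured) {
        return (explicit == null || explicit.isEmpty()) ? configured : explicit;
    }

    // SysO3: each output format component writes into its own sub directory
    // below the globally defined output directory.
    static Path outputDirFor(Path globalOutputDir, String subDirName) {
        return globalOutputDir.resolve(subDirName);
    }

    public static void main(String[] args) {
        List<String> formats = effectiveFormats(List.of(), List.of("php5.4"));
        for (String format : formats) {
            // Here the sub directory name simply equals the format name.
            System.out.println(format + " -> " + outputDirFor(Paths.get("out"), format));
        }
    }
}
```

Keeping the sub directory name as a separate parameter reflects that it is configurable and does not have to match the format name.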

Further Constraints

Constraints have already been defined in Customer Requirements and Rough Concept of the bachelor project > Scope.
The following list describes further design decisions which result in constraints compared to normal PHP.

Name | Description | Affects | Benefit
More consistent code | These statements are described in Valid in PHP but not in TSPHP | All requirements in chapter Language Specification | Possible errors can be avoided and code is more readable.

Architecture

Unfortunately this chapter is out of date (see task TSPHP-534 - update chapter architecture of the system specification on wiki). Please have a look at the Bachelor Thesis - Type-Safe PHP: A compile time approach, appendix D3 in the meantime.

This chapter describes the four views of the compiler, namely the contextual, the block, the runtime and the distribution view.

Contextual View

The following figure shows a context diagram. The most common context for a compiler might be its usage in IDEs. However, build servers and other automation tools could use it as well, and finally a user could use it from the shell (command prompt).
The data flow is more or less the same for every actor. The actor defines which files shall be translated, and the compiler translates them and saves the result in new files. Optionally, a Java system could use the interface to translate streams and retrieve the result as streams.

In reality it is a little more complicated than that. Not only files and streams can be used to define what data the compiler should compile; have a look at the requirements for input above. The output is not necessarily only translated code either. It is also possible to retrieve pre-compiled code (see SysO8, SysO9 and SysO10).

Block View

The following figure shows the block view of the compiler.

The compiler provides two interfaces: one for system calls and another one for Java applications. The controller is the heart of the compiler. It controls the compilation process as shown in the runtime view. Have a look at the runtime view to get a better understanding of how the classes work together.
The controller depends on an IConfig implementation, the parser component and several classes which implement the IWalker or the IChecker interface. The IConfig implementation is responsible for loading the necessary modifiers, checkers, optimisers and translators. In the figure above, the PropertiesConfig class, which loads the configuration from properties files, implements the IConfig interface. Properties files do not have to be used to store and retrieve the configuration, as long as the corresponding class implements the IConfig interface.

Separation of the different responsibilities (presentation, business logic, data) by layers has been set aside on purpose, since performance is a key success factor for a compiler and layers would unnecessarily slow it down. Nevertheless, separation of concerns and easy extensibility are still incorporated in the compiler. You can see two classes with dotted borders: PHP53Translator and Optimiser. Those two classes will not be implemented in this project, but serve as an illustration of how easy it is to extend the compiler (have a look at the requirement NF11 - Additional steps).
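
As a rough illustration of this extension point, the following sketch chains several IWalker steps. Only IWalker and its method TSPHPAst walk(TSPHPAst ast) come from this specification (SysD8); the stand-in TSPHPAst class, TaggingWalker, WalkerPipelineDemo and runAll are hypothetical and exist only to make the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the TSPHP code base.
class WalkerPipelineDemo {

    // Applies each walker in order; the tree returned by one step feeds the next.
    static TSPHPAst runAll(List<IWalker> walkers, TSPHPAst ast) {
        TSPHPAst current = ast;
        for (IWalker walker : walkers) {
            current = walker.walk(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<IWalker> steps = new ArrayList<>();
        steps.add(new TaggingWalker("typecheck"));
        steps.add(new TaggingWalker("optimise")); // an additional step is just one more entry
        TSPHPAst result = runAll(steps, new TSPHPAst("root"));
        System.out.println(result.visitedBy);
    }
}

// Stand-in for the real TSPHPAst, which SysD10 bases on ANTLR's CommonTree.
class TSPHPAst {
    final String text;
    final List<String> visitedBy = new ArrayList<>();

    TSPHPAst(String text) {
        this.text = text;
    }
}

// The interface named in SysD8.
interface IWalker {
    TSPHPAst walk(TSPHPAst ast);
}

// Hypothetical walker that only records its name on the tree.
class TaggingWalker implements IWalker {
    private final String name;

    TaggingWalker(String name) {
        this.name = name;
    }

    @Override
    public TSPHPAst walk(TSPHPAst ast) {
        ast.visitedBy.add(name);
        return ast;
    }
}
```

Since every step takes and returns a TSPHPAst, the controller does not need to know what a step does, only in which order the steps run.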

The following figure shows the parser component in detail.

The parser depends heavily on ANTLR v3; TSPHPLexer and TSPHPParser are generated from the corresponding grammar file (which can be found in the version control system). The parser is implemented as its own component so that it can be reused in contexts other than TSPHP (see also SysD1).

Runtime View

The following figure shows an activity diagram of one compilation.

The compilation process comprises five steps:

  1. First of all, an actor (here an IDE) has to add InputStreams to the compiler. To be more efficient, parsing already starts with the first added InputStream. Once all InputStreams have been added, the corresponding method is called on the compiler to indicate this. When this has happened and all InputStreams have been parsed, the next step can take place.
  2. After the code has been parsed it will be validated against type safety.
  3. The fourth step is also optional and is meant for optimisers (be aware, that modifications of optimisers must not break type safety).
  4. The last step involves the translation to an output format (in this project PHP 5.4). The translations will be done in parallel if multiple output formats exist.
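
The flow described in the steps above can be sketched with an ExecutorService. This is a minimal sketch under assumptions: SketchCompiler, addInput, noMoreInputs and shutdown are hypothetical names, inputs are plain strings instead of InputStreams, and parsing, type checking and translation are replaced by placeholders. It only shows parsing starting on each added input, the explicit "no more inputs" signal, and one translation task per output format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; not part of the TSPHP code base.
class CompilationFlowDemo {
    public static void main(String[] args) {
        SketchCompiler compiler = new SketchCompiler();
        compiler.addInput("<?php int $a = 1; ?>");
        compiler.addInput("<?php string $b = 'x'; ?>");
        List<String> outputs = compiler.noMoreInputs(List.of("php5.4"));
        compiler.shutdown();
        System.out.println(outputs);
    }
}

class SketchCompiler {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final List<Future<String>> parsed = new ArrayList<>();
    private boolean closed = false;

    // Parsing starts as soon as an input is added (placeholder "parser").
    void addInput(String source) {
        if (closed) {
            throw new IllegalStateException("no more inputs may be added");
        }
        parsed.add(pool.submit(() -> "ast(" + source.length() + ")"));
    }

    // Signals that all inputs have been added, waits for the pending parse
    // tasks and then runs the remaining steps.
    List<String> noMoreInputs(List<String> formats) {
        closed = true;
        List<String> asts = new ArrayList<>();
        for (Future<String> task : parsed) {
            asts.add(await(task));
        }
        // Type checking (and optional optimisers) would run here on all ASTs.
        // Translation: one task per output format, executed in parallel.
        List<Future<String>> translations = new ArrayList<>();
        for (String format : formats) {
            translations.add(pool.submit(() -> format + ":" + asts.size() + " unit(s)"));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> task : translations) {
            results.add(await(task));
        }
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }

    private static String await(Future<String> task) {
        try {
            return task.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Without the explicit signal the compiler could not tell whether another input is still to come, which is exactly the situation SysI4 addresses.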

Distribution View

TSPHP will be distributed using an archive (for instance zip). The extracted directory structure would look as follows:

  • config
    • tsphp.properties
    • php5.4-translator.properties
  • lib
    • antlr-3.4-complete.jar
    • typechecker-x.x.x.jar
    • tsphp-parser-x.x.x.jar
    • tsphp-interface-x.x.x.jar
      • IWalker
      • IChecker
  • tsphp.x.x.x.jar
  • tsphp.bat
  • README
  • LICENCE

This structure could also be used by extensions/add-ons. For instance, the output format component for PHP 5.3 could be an archive as well, with the following structure:

  • config
    • php5.3-translator.properties
  • lib
    • php5.3-translator-x.x.x.jar
    • tsphp-interface-x.x.x.jar

A developer would then only need to extract the archive to the tsphp folder and modify the tsphp.properties file to activate the new output format component.

Internal interfaces

The internal interfaces which allow integrating the different components are located in the common component.
The following interfaces are used:

  • IParser – for the parser component
  • ITypeChecker – for the type checker component
  • ITranslator – for the different output components

External interfaces

Unfortunately this chapter is out of date (see task TSPHP-534 - update chapter architecture of the system specification on wiki). Please have a look at the Bachelor Thesis - Type-Safe PHP: A compile time approach, appendix D5 in the meantime.

There are three external interfaces. The first two are shown in Block View.

  • The first one is provided for system calls. As an example, it can be used by a developer to execute the compiler through the console. The Main class implements this interface.
  • The second interface is provided for other Java applications. A facade implements the interface and delegates the calls to the corresponding classes.
  • The third interface is provided for extension developers and is represented by the tsphp-interface-x.x.x.jar (have a look at Distribution View).

Scoping

Unfortunately this chapter does not yet exist (see task TSPHP-534 - update chapter architecture of the system specification on wiki). Please have a look at the Bachelor Thesis - Type-Safe PHP: A compile time approach, appendix D6 in the meantime.

Type Hierarchy

Unfortunately this chapter does not yet exist (see task TSPHP-534 - update chapter architecture of the system specification on wiki). Please have a look at the Bachelor Thesis - Type-Safe PHP: A compile time approach, appendix D7 in the meantime.

Type Checking

Unfortunately this chapter does not yet exist (see task TSPHP-534 - update chapter architecture of the system specification on wiki). Please have a look at the Bachelor Thesis - Type-Safe PHP: A compile time approach, appendix D8 in the meantime.