This page describes the system specification of the compiler. Please have a look at Motivation behind TSPHP, Vision of TSPHP and Goals of TSPHP to get a better understanding of the global context and goal. More information can also be found in Rough Concept of the bachelor project, but be aware that it refers to the bachelor project, which is already finished (much of the information still applies, though). In addition, have a look at the Customer Requirements of TSPHP if you are not familiar with them.
Very briefly, the compiler will be used to compile TSPHP, check whether the code is type safe and translate it to PHP 5.4. The compiler is designed so that further steps such as optimisation, code weaving, code obfuscation etc. can be added easily. Furthermore, it is designed to support multiple output formats.
The following chapters describe further requirements for the compiler.
Software Requirements and Design Decisions
The following software requirements refine the Customer Requirements of TSPHP, and the corresponding design decisions have been made to support them. For a better overview, the requirements have been split into different sub chapters.
The following key applies to all tables in all chapters:
Affects: Names the customer requirement which is refined.
Explicit Design Decisions
The following design decisions describe what has to be done to support the affected customer requirement rather than refine it.
|SysD1||ANTLR v3.5 Grammar||An ANTLR v3.5 based grammar for TSPHP must be created which generates an AST||All requirements in chapter Language Specification, NF6 - Independence||An ANTLR v3.5 based grammar improves the portability to another language such as C, C++, C# etc. Furthermore, the AST|
|SysD2||ANTLR v3.5 Tree Grammar||An ANTLR v3.5 based tree grammar for TSPHP must be created.||All requirements in chapter Language Specification, NF6 - Independence||Like the normal grammar, an ANTLR v3 based tree grammar improves the portability to another language such as C, C++, C# etc. Furthermore, ANTLR can generate a tree walker out of a tree grammar. The type checker, the translator (and the code optimiser in the future) should use a tree grammar.|
|SysD3||Independent parser, typechecker, translator||All components should be independent from the compiler and from each other (excluding tests, where integration tests might use other components). However, dependencies on the common component and ANTLR are fine.||NF6 - Independence, NF7 - Testability||Independence allows using the parser, the type checker as well as the translator component in a context other than TSPHP. For instance, the type checker and the translator could be used in an AOP implementation for TSPHP.|
|SysD4||Output format components||The different output formats shall be implemented as independent components||This way the design of the compiler is forced to support different output formats, thus a well-defined interface has to be defined. This interface could be implemented by another output format component, and this component could be used instead of the PHP 5.4 output format component (or alongside it).|
|SysD5||Output - directory - structure||O3 - Output - directory - structure||Since it may not be applicable for every output component to produce a hierarchical structure, it is better that each output format component decides on its own whether this part of the component is configurable or not.|
|SysD6||Output - directory - structure strategy components||Each output directory strategy shall be implemented as an independent component.||This way it is possible to reuse an output strategy across different output format components.|
|SysD7||Omit comments||O7 - Omit comments||It could be that for one output format component (SysD4) it is essential that comments are not omitted and another output format component cannot deal with comments etc. This way we have a more flexible solution.|
|SysD8||AST and IWalker interface||The parser of the compiler will generate a TSPHPAst (TSPHP's abstract syntax tree) and there will be a simple interface IWalker with the method TSPHPAst walk(TSPHPAst ast).||NF11 - Additional steps||The well-defined and simple interface IWalker will enable additional steps such as code optimisation, code obfuscation, code weaving etc., which should facilitate contributions to TSPHP|
|SysD9||Exchange ANTLR||The compiler shall encapsulate the access to generated code from ANTLR v3||NF1 - Exchange ANTLR v3.5||Future changes to the ANTLR interface will not result in a big change to the compiler. Only the class/component which encapsulates the access has to be changed. Furthermore, it would be easier to exchange ANTLR for another compiler generator|
|SysD10||AST and ANTLR||The class CommonTree from ANTLR will be used as the base class for our TSPHPAst representation. The downside of this decision is that every extension/add-on of the compiler (such as code optimiser, code obfuscator etc.) will depend on ANTLR. This goes against NF1 - Exchange ANTLR v3.5 but seems to be the better choice (see 14.01.2013 - Review meeting architecture).||NF1 - Exchange ANTLR v3.5||Lower maintenance. If we used our own implementation, we would need to fix bugs ourselves whenever ANTLR found a bug in their implementation. Furthermore, we would need to change our implementation every time ANTLR changed something. This way ANTLR makes the change for us and we just need to ensure that the tests still pass when we upgrade to a new ANTLR version.|
Split type checking into several phases
Split the type checking phase into a definition, a reference and a checking phase.
The code is easier to understand and thus easier to maintain, and contributing to the project should be facilitated as well.
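The IWalker interface from SysD8 can be illustrated with a minimal sketch. Only the interface name and the signature TSPHPAst walk(TSPHPAst ast) come from the table above. The TSPHPAst class shown here is a simplified stand-in (per SysD10 the real class is based on ANTLR's CommonTree), and CommentStripper and WalkerChain are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for TSPHPAst (assumption: the real class extends ANTLR's CommonTree).
class TSPHPAst {
    final String text;
    final List<TSPHPAst> children = new ArrayList<>();

    TSPHPAst(String text) {
        this.text = text;
    }

    TSPHPAst addChild(TSPHPAst child) {
        children.add(child);
        return this;
    }
}

// The interface named in SysD8: a step takes an AST and returns an AST.
interface IWalker {
    TSPHPAst walk(TSPHPAst ast);
}

// Hypothetical additional step: returns a copy of the tree without "comment" nodes.
class CommentStripper implements IWalker {
    @Override
    public TSPHPAst walk(TSPHPAst ast) {
        TSPHPAst copy = new TSPHPAst(ast.text);
        for (TSPHPAst child : ast.children) {
            if (!"comment".equals(child.text)) {
                copy.addChild(walk(child));
            }
        }
        return copy;
    }
}

// Hypothetical helper: applies the walkers in order, feeding each result to the next.
final class WalkerChain {
    static TSPHPAst apply(TSPHPAst ast, List<IWalker> walkers) {
        TSPHPAst current = ast;
        for (IWalker walker : walkers) {
            current = walker.walk(current);
        }
        return current;
    }
}
```

Because every step has the same input and output type, a new step can be added to the list without touching the existing ones, which is the kind of extensibility NF11 asks for.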
|SysM1||Property file||The compiler shall be configurable through a property file||m||1||1||2||NF8 - Configurable||The developer can configure the compiler without recompiling the compiler|
|SysM2||Libraries||The compiler shall support libraries. A library contains one or more pre-compiled classes.||d||2||2||1||NF9 - Distribution of TSPHP||Would simplify the distribution of pre-compiled classes and also the configuration management.|
|SysI1||Property file as parameter||The compiler shall provide an interface* which allows passing a path to a property file. All properties in this file shall overwrite or extend the configuration made in the compiler's property file (SysM1).||m||1||1||1||I1 - Overwrite Standard Config, NF8 - Configurable||The developer has the possibility to use different configurations without rewriting or replacing the property file each time.|
|SysI2||Ant Task||Create an Ant task for the compiler which uses a file set as input||d||2||2||1||NF5 - Automation, I5 - Input - file set, indirectly also I3 - Input - file and I4 - Input - directory||Automation. Ant is a commonly known way to automate development processes.|
|SysI3||Classpath||The compiler shall provide an interface* which allows 'linking' pre-compiled classes and libraries||d||1||2||1||NF9 - Distribution, NF8 - Configurable||Re-usability and performance.|
|SysI4||No more InputStreams||The compiler shall provide a method which declares that no more input streams will be added.||m||1||1||2||If the console interface is not used (there, every system call equals one compilation), it must be possible to tell the compiler that all desired input streams have been added. Otherwise the compiler cannot know when it can actually start type checking the parsed code.|
|SysO1||multiple output formats||It shall be possible to compile code and translate it to multiple output formats in one run||d||3||2||1||Better performance. It is not necessary to run the compiler several times for different output formats.|
|SysO2||Output format PHP 5.4 - property file||The output format component (SysD4) for PHP 5.4 shall be configurable through a property file.||m||2||1||1||O6 - Output format PHP 5.4||Customisation. The developer can configure the component without recompiling.|
|SysO3||Output - directory - path for PHP 5.4||The output format component (SysD4) for PHP 5.4 shall be configurable (NF8) concerning the name of a sub directory which will hold the output of this component. Standard should be php5.4||d||1||2||1||O2 - Output - directory - path||Since multiple output formats can be used in one run, it is necessary that each output format component has its own output directory within the globally defined directory and that the name is configurable to resolve conflicts.|
|SysO4||Output - directory - structure strategies for PHP 5.4||The output format component (SysD4) for PHP 5.4 shall be configurable (NF8) concerning the output directory structure. Standard should be the global configuration (see O3 - Output - directory - structure)||d||1||3||1||O4 - Output - directory - structure strategies||Customisation. The developer can choose between different output structures.|
|SysO5||Output format(s) as parameter||The compiler shall provide an interface* which allows setting the output format(s) explicitly. This definition takes precedence over the configured output format.||d||2||1||1||The output format could also be changed through Property file as parameter. However, if the output format is the only property in that file, this is quite a lot of overhead. An additional argument better fulfils the need of the developer.|
|SysO6||Output format PHP 5.4||The compiler shall comprise an output format component for PHP 5.4||m||2||3||3||O6 - Output format PHP 5.4||The aim is to translate to PHP 5.4.|
|SysO7||Omit comments PHP 5.4||The output format component (SysD4) for PHP 5.4 shall be configurable (NF8) concerning omitting comments or not. Standard should be the global configuration||d||3||1||1||O7 - Omit comments||Smaller PHP 5.4 files. However, comments can be useful if the PHP files are used in other projects which are not using TSPHP or if classes, methods etc. are annotated.|
|SysO8||Only pre-compile||It shall be configurable (NF8) whether the full translation process shall be conducted or only the pre-compilation. This only makes sense if the pre-compiled input is stored in files (SysO9) or the pre-compiled stream is retrieved by get pre-compilation (SysO10)||d||1||2||1||NF9 - Distribution of TSPHP||Performance. For instance, if only the pre-compiled files are used, it is not necessary to do the whole translation step.|
|SysO9||It shall be configurable (NF8) whether the pre-compiled input shall be stored in files or not.||d||1||3||1||NF9 - Distribution of TSPHP||Re-usability and performance. The pre-compiled files can be used in other projects and do not have to be fully compiled again.|
|SysO10||Get pre-compilation||The compiler shall provide an interface to retrieve the pre-compilation streams including their identifiers||d||2||3||1||NF9 - Distribution of TSPHP||Useful if the pre-compiled streams are not stored in files but in another medium (for instance in a database).|
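The precedence rules in SysM1, SysI1 and SysO5 can be sketched with java.util.Properties, whose constructor accepts a set of defaults. This is only an illustration under assumptions: the class name LayeredConfig, the key output.formats and the use of strings instead of files are all hypothetical and not taken from the compiler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class LayeredConfig {
    // Hypothetical key for the configured output format(s).
    static final String OUTPUT_FORMATS = "output.formats";

    // compilerProperties: content of the compiler's own property file (SysM1)
    // passedProperties:   content of the file passed as parameter (SysI1), may be null
    // explicitFormats:    output format(s) set explicitly (SysO5), may be null
    static Properties load(String compilerProperties, String passedProperties, String explicitFormats) {
        try {
            Properties base = new Properties();
            base.load(new StringReader(compilerProperties));

            // Keys missing in 'effective' fall back to 'base'; keys present override it.
            Properties effective = new Properties(base);
            if (passedProperties != null) {
                effective.load(new StringReader(passedProperties));
            }
            // The explicit argument has precedence over any configured output format.
            if (explicitFormats != null) {
                effective.setProperty(OUTPUT_FORMATS, explicitFormats);
            }
            return effective;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With real files, the StringReader instances would be replaced by readers on the respective property files; the layering itself stays the same.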
Constraints have already been defined in Customer Requirements and Rough Concept of the bachelor project > Scope.
The following list describes further design decisions which result in constraints compared to normal PHP.
|More consistent code|
These statements are described in Valid in PHP but not in TSPHP
|All requirements in chapter Language Specification|
Possible errors can be avoided and code is more readable.
This chapter describes the four views of the compiler, namely the contextual, the block, the runtime and the distribution view.
The following figure shows a context diagram. A common context for a compiler is its usage in IDEs. However, build servers and other automation tools could also use it, and finally a user could use it via the shell (command prompt).
The data flow is more or less the same for every actor. The actor defines which files shall be translated, and the compiler translates them and saves the result in new files. Optionally, a Java system could use the interface to translate streams and retrieve the result as streams.
In reality, it is a little more complicated than that. Not only files and streams can be used to define what data the compiler should compile; have a look at the requirements for input above. Also, the output is not necessarily only translated code. It is also possible to retrieve pre-compiled code (see SysO8, SysO9 and SysO10).
The following figure shows the block view of the compiler.
The compiler provides two interfaces: one for system calls and another one for Java applications. The controller is the heart of the compiler. It controls the compilation process as shown in the runtime view. Have a look at the runtime view to get a better understanding of how the classes work together.
The controller depends on an IConfig implementation, the parser component and several classes which implement the IWalker interface or the IChecker interface. The IConfig implementation is responsible for loading the necessary modifiers, checkers, optimisers and translators. In the figure above, the PropertiesConfig class, which loads the configuration from properties files, implements the IConfig interface. It is not necessary to use properties files to store and retrieve the configuration, as long as the corresponding class implements the IConfig interface.
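The following sketch illustrates why the source of the configuration does not matter to the controller. Only the names IConfig and PropertiesConfig come from the block view. The methods, the property keys, the use of plain component names instead of component instances, and the classes FixedConfig and ControllerSketch are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical shape of IConfig: it tells the controller which components to use.
interface IConfig {
    List<String> getOptimisers();
    List<String> getTranslators();
}

// Reads comma-separated component names from properties (the keys are assumptions).
final class PropertiesConfig implements IConfig {
    private final Properties properties;

    PropertiesConfig(Properties properties) {
        this.properties = properties;
    }

    @Override
    public List<String> getOptimisers() {
        return split(properties.getProperty("optimisers", ""));
    }

    @Override
    public List<String> getTranslators() {
        return split(properties.getProperty("translators", ""));
    }

    private static List<String> split(String value) {
        List<String> names = new ArrayList<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}

// Alternative configuration source: fixed values, no properties file involved.
final class FixedConfig implements IConfig {
    @Override
    public List<String> getOptimisers() {
        return new ArrayList<>();
    }

    @Override
    public List<String> getTranslators() {
        return Arrays.asList("php5.4");
    }
}

// The controller only sees IConfig, so either implementation can be plugged in.
final class ControllerSketch {
    private final IConfig config;

    ControllerSketch(IConfig config) {
        this.config = config;
    }

    int numberOfConfiguredSteps() {
        return config.getOptimisers().size() + config.getTranslators().size();
    }
}
```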
Separation of the different responsibilities (presentation, business logic, data) into layers has been set aside on purpose, since performance is a key success factor for a compiler and layers would slow it down unnecessarily. Nevertheless, separation of concerns and easy extensibility are still incorporated in the compiler. You can see two classes with dotted borders: PHP53Translator and Optimiser. Those two classes will not be implemented in this project but serve as an illustration of how easy it is to extend the compiler (have a look at the requirement NF11 - Additional steps).
The following figure shows the parser component in detail.
The parser heavily depends on ANTLR v3. TSPHPLexer as well as TSPHPParser are generated from the corresponding grammar file (which can be found in the version control system). The parser is implemented as its own component so that it can be reused in contexts other than TSPHP (see also SysD1).
The following figure shows an activity diagram of one compilation.
The compilation process comprises five steps:
- First of all, an actor (here an IDE) has to add InputStreams to the compiler. To be more efficient, parsing already starts with the first added InputStream. Once all InputStreams have been added, the corresponding method is called on the compiler to indicate this. Once this has happened and all InputStreams have been parsed, the next step can take place.
- After the code has been parsed, it is checked for type safety.
- The fourth step is also optional and is meant for optimisers (be aware that modifications by optimisers must not break type safety).
- The last step involves the translation to an output format (in this project PHP 5.4). If multiple output formats exist, the translations are done in parallel.
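The steps listed above can be sketched roughly as follows. This is not the compiler's implementation: the class and method names are hypothetical, plain strings stand in for InputStreams and ASTs, the parse, check and translate bodies are placeholders, and the optional steps are left out. The sketch only shows the ordering: parsing starts as soon as an input is added, type checking waits until no more input is expected, and the translations run in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CompilationSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final List<Future<String>> parsed = new ArrayList<>();
    private boolean noMoreInput = false;

    // Parsing is submitted immediately, so it overlaps with adding further inputs.
    synchronized void addInput(final String source) {
        if (noMoreInput) {
            throw new IllegalStateException("no more input was already declared");
        }
        parsed.add(pool.submit(() -> "ast(" + source + ")")); // placeholder for parsing
    }

    // Declares that no more input follows (compare SysI4) and runs the remaining steps.
    List<String> compile(List<String> outputFormats) {
        synchronized (this) {
            noMoreInput = true;
        }
        try {
            // Wait until every input has been parsed.
            final List<String> asts = new ArrayList<>();
            for (Future<String> future : parsed) {
                asts.add(future.get());
            }
            // Placeholder for the type check.
            for (String ast : asts) {
                if (ast.isEmpty()) {
                    throw new IllegalStateException("type check failed");
                }
            }
            // One translation task per output format, executed in parallel.
            List<Future<String>> translations = new ArrayList<>();
            for (final String format : outputFormats) {
                translations.add(pool.submit(() -> format + ":" + String.join("|", asts)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> translation : translations) {
                results.add(translation.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the explicit "no more input" signal, compile could not know whether the list of parse results is complete, which is the situation SysI4 describes.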
TSPHP will be distributed using an archive (for instance zip). The extracted directory structure would look as follows:
This structure could also be used by extensions/add-ons. For instance, the output format component for PHP 5.3 could be an archive as well, with the following structure:
A developer would then only need to extract the archive to the tsphp folder and modify the tsphp.properties file to activate the new output format component.
The internal interfaces which allow integrating the different components are located in the
The following interfaces are used:
- IParser – for the parser component
- ITypeChecker – for the type checker component
- ITranslator – for the different output components
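As an illustration of how such interfaces decouple the components, consider the following sketch. Only the three interface names come from the list above. The method signatures, the AstStub class and the ComponentWiring class are assumptions, and the lambda bodies are placeholders rather than real parsing, checking or translation logic.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical stand-in for the AST passed between the components.
final class AstStub {
    final String text;

    AstStub(String text) {
        this.text = text;
    }
}

// Interface names from the list above; the method signatures are assumptions.
interface IParser {
    AstStub parse(InputStream input);
}

interface ITypeChecker {
    boolean check(AstStub ast);
}

interface ITranslator {
    String translate(AstStub ast);
}

final class ComponentWiring {
    // Depends only on the interfaces, so each component can be exchanged independently.
    static String run(IParser parser, ITypeChecker checker, ITranslator translator, InputStream input) {
        AstStub ast = parser.parse(input);
        if (!checker.check(ast)) {
            throw new IllegalStateException("type check failed");
        }
        return translator.translate(ast);
    }

    // Wires placeholder implementations together for demonstration purposes.
    static String demo(String source) {
        IParser parser = input -> {
            Scanner scanner = new Scanner(input, "UTF-8").useDelimiter("\\A");
            return new AstStub(scanner.hasNext() ? scanner.next() : "");
        };
        ITypeChecker checker = ast -> !ast.text.isEmpty();
        ITranslator translator = ast -> "<?php " + ast.text;
        InputStream input = new ByteArrayInputStream(source.getBytes(StandardCharsets.UTF_8));
        return run(parser, checker, translator, input);
    }
}
```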
There are three external interfaces. The first two are shown in Block View.
- The first one is provided for system calls. For example, a developer can use it to execute the compiler from the console. The Main class implements this interface.
- The second interface is provided for other Java applications. A facade implements the interface and delegates the calls to the corresponding classes.
- The third interface is provided for extension developers and is represented by the tsphp-interface-x.x.x.jar (have a look at Distribution View).