
This project was started in 2012 in conjunction with Robert Stoll's bachelor thesis. This page reflects the rough concept created for the bachelor project. Most points still apply to the continuing project (for the current status). However, the aim and objectives now go further and the project tries to extend the scope. Furthermore, the requirements and the architecture presented here are just rough ideas (drafted at the beginning of the bachelor project). See Customer Requirements of TSPHP and System Specification of TSPHP for more precise and accurate information about the requirements and the architecture.

Background Information

PHP was created in 1994 by Rasmus Lerdorf and is now maintained and further developed by several core developers. It is published under the PHP License v3.01, a permissive, BSD-style open source license. PHP started as a procedural programming language and became an object oriented programming language starting with PHP 5. Zend Technologies Ltd stated in their research ZEND DEVELOPER PULSE (2012: 2) that the PHP community consists of 5 million developers. Given that famous websites such as Facebook, Wikipedia and many more use PHP, PHP has become one of the leading programming languages for the web over the years. This is supported by Usage Statistics and Market Share of Server-side Programming Languages for Websites, November 2012 (w3techs 2012), which states that 78.3% of the top 1 million websites use PHP, and by the TIOBE Programming Community Index for November 2012 (TIOBE 2012), which ranks PHP as the 5th most popular programming language, right before C# and just after C, Java, Objective-C and C++. However, these other languages are seldom used for web development, if at all.

Initial Situation

The current version of PHP, PHP 5.5, has extremely basic support for type safety (namely only for parameters in methods/functions). PHP 6 aspired to address this issue by adding type safety for return values of methods/functions, but the development of PHP 6 as it was once planned has been suspended, and its goals would probably change accordingly.

Type safety for PHP would bring about several desirable enhancements and opportunities: not only better understanding of source code (since variables, return values of methods and method parameters can only have a single data type), but also more secure web applications (as input can be matched to a safe type to avoid malicious input) and better support for programmers using an Integrated Development Environment (IDE). For instance, without type safety it is not possible for IDEs to fully support refactoring. As an example, IDEs support 'Rename Method' only by showing recommended changes to the user. Those recommendations come without any guarantee and are thus a source of potential bugs. Better IDE support would also enable more sophisticated code completion and result in faster development.

Despite the benefits of type safety, the language developers have not yet introduced it. One reason, which was mentioned several times by core developers, is that PHP should remain a programming language which is easy to learn and thus well suited for beginners. This project does not intend to replace PHP. PHP shall remain as it is, and TSPHP shall be an alternative for those developers who want type safety in PHP. Thus changing the PHP core is currently not an option, and neither is changing the language specification of PHP 5.5.

Aim and Objectives

Aim:

Define a language specification for TSPHP followed by an implementation of a suitable compiler/translator (in Java). In addition, the author of this project will explore and review the strengths and weaknesses of TSPHP and investigate how interested the community would be in developing it further.

Hypothesis:

Investigate whether a static type-safe PHP is possible by translating from a meta-PHP language to PHP 5.4 (i.e. without changing the underlying language specification of PHP 5.4 or its interpreter), and whether it can deliver value to the PHP community.

Objectives:

  1. Explore and review the strengths, weaknesses, opportunities and threats (SWOT) of TSPHP through primary research (performance tests) and identifying relevant case studies (secondary data).
  2. Conduct a survey to find the PHP community's position regarding type safety (primary research).
  3. Review techniques for ensuring type-safety in the design of computer programming languages (literature review).
  4. Research and write a language specification for TSPHP, including a grammar of TSPHP for Antlr v.3 and the corresponding transformations to PHP 5.4. (Focus on classical procedural programming and classical object oriented programming to be made type-safe – proof of concept).
  5. Gain an understanding of the architecture and design of compilers and how compilers work (algorithms) to be able to analyse and develop a requirement specification for TSPHP and the corresponding compiler.
  6. Develop a compiler (in Java) which transforms/translates TSPHP into PHP 5.4.
  7. Set up a project environment which consists of a project management system (Jira), an issue and bug tracking system (Jira), a wiki (Confluence) for documentation, specifications etc., and a build server (Jenkins). Bitbucket will be used as the version control system.
  8. Use good project management practices and quality assurance by developing project standards (Java code conventions, test philosophy, version control philosophy) and achieving a code coverage (lines and branches) >= 80%.
  9. Draw conclusions against the hypothesis of the project with clear recommendations for the progress and approach for any future development of TSPHP.
  10. Make a critical evaluation of the project and approach.

Scope

PHP 5.4 supports a variety of concepts (e.g. procedural programming, object oriented programming, higher-order functions, traits etc.) and provides nearly 5000 functions and 50 classes/interfaces to support the developer. It will not be possible to cover everything during the bachelor project. The bachelor project shall be a proof of concept and a good basis for developing TSPHP further. It will focus on classical procedural programming and classical object oriented programming.

The language specification of TSPHP will especially not contain:

And the compiler will not know:

The page Customer Requirements of TSPHP will define whether some of the 5000 functions or 50 classes/interfaces will be used as part of the project, and the page will also define further restrictions.

Criterion for abandoning the project

There is no criterion for abandoning the project, since this is a final year project and not an industry project. If this had been an industry project, the following criterion for abandoning the project would have been defined:

An online survey will be conducted. The answers to the question 'Further development?' will be analysed on 26.02.2013 (end of survey: 25.02.2013). If the specified criteria for proceeding with the development are not met, the project will be abandoned.

However, even though it is a final year project, there is a criterion for abandoning certain development tasks. If the specified criteria for proceeding with the development are not met, all tasks which would add new features to the compiler (e.g. TSPHP-197 - time for some fancy stuff 1) will be abandoned.

Requirements

This chapter roughly describes the main requirements of the language specification and the compiler. A more detailed list can be found in Customer Requirements of TSPHP.

  • The language specification shall cover the concepts of classical procedural programming and the concepts of classical object oriented programming (see chapter Scope above)
  • It shall be possible to use the compiler from the shell (command prompt)
  • The compiler shall be designed for extensibility considering several output formats (different PHP versions).
  • The compiler shall be designed for automation (in all probability it will be used within an IDE)

The following figure shows the main use cases. Further use cases and their descriptions can be found in Use cases.

Outcomes/Deliverables

  1. Analysis of the results of the survey on the position which the PHP community take regarding type safety.
  2. SWOT analysis of TSPHP.
  3. Online project management environment (Jira, Confluence and Jenkins - source code in a Bitbucket repository).
  4. Requirements on TSPHP and the compiler (e.g. which features/functions of PHP 5.4 should TSPHP support, which new features should TSPHP have, what is not implemented during the final year project).
  5. Language specification of TSPHP and the corresponding transformations to PHP 5.4. (Focus on classical procedural programming and classical object oriented programming to be made type-safe – proof of concept).
  6. Grammar specification of TSPHP for Antlr v.3.
  7. System specification of the compiler.
  8. Executable compiler which translates TSPHP into PHP 5.4. (Java).
  9. Source code of the compiler and the corresponding unit tests (Java source code).
  10. Java code convention for the compiler.
  11. Code coverage report in HTML (JaCoCo report).
  12. Report of the final year project (will include other project results mentioned above).

Business Case

The overall benefits of TSPHP can roughly be divided into three components:

  • Better understanding of source code and better support for the developer within Integrated Development Environments (IDEs)
    • Variables, return values of methods and method parameters can only have a single data type.
    • It is clear which type each variable has at every point in the source code.
      • IDEs always know which methods and attributes belong to an object and thus can make superior and faster code completion recommendations (no more vague recommendations).
      • It enables IDEs to provide better refactoring methods. For instance, it should be possible to do a "Rename Method" completely automatically (no more vague recommendations).
      • IDEs can better support other functionalities such as 'Goto Source', 'Find Usages', 'Call Hierarchy' etc.
      • It enables IDEs to find type discrepancies before runtime.
      • It enables IDEs to suggest code improvements such as 'This method could be made private', 'This method is never called', 'The type of the variable could be more abstract' etc.

  • Support of new (language) features
    • Some concepts are not known in PHP, such as Generics, Operator Overloading, Properties as in C#, Object Initialisation as in C# and many more. Those concepts could be introduced in TSPHP, which would result in more sophisticated and faster development.
    • Concepts of PHP 5.4 which are not available in PHP 5.3 (PHP 5.2, PHP 5.1 etc.), such as Traits, Short Array Syntax etc., could be made available through a corresponding PHP 5.3 translator.
    • The compiling step enables new possibilities such as:
      • Code optimisation
      • Code obfuscation
      • Smaller PHP code since comments and documentation can be removed, ergo smaller releases.
      • Aspect Oriented Programming (AOP)

  • More robust and more secure web applications
    • Type discrepancies are detected at compile time (if the IDE does not support this feature) rather than only at runtime.
    • Input can be matched to a safe data type to avoid malicious input (of course, it is still the responsibility of the developer to do that). For instance, a variable of type int cannot contain an SQL injection.
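
The last point can be illustrated with a minimal sketch. It is written in Java (the language chosen for the compiler) rather than in TSPHP, and the class `SafeInput` and the method `parseId` are hypothetical names used only for this illustration. The idea is that raw input is only accepted if it can be converted to an int, so text carrying an SQL fragment never reaches a query.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of TSPHP: matches raw input to the safe type int.
final class SafeInput {
    private SafeInput() {}

    /** Returns the input as an int, or empty if it is not a plain integer. */
    static OptionalInt parseId(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            // Anything that is not an integer literal is rejected.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseId("42").isPresent());
        System.out.println(parseId("42; DROP TABLE users").isPresent());
    }
}
```

As the bullet above notes, the developer still has to perform this matching; the type only guarantees that a value which has passed it is an integer.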

A benefit calculation is omitted here since this project is of general interest and any estimate would be more than vague.

Rough Planning

This project will use HTAgil, an iterative-incremental process model. HTAgil splits the process of a project into 4 phases: Preparation, Elaboration, Construction and Transition. These phases are then broken down into several iterations, and each iteration contains several tasks.
The following figure shows a rough planning of the project; only phases and iterations are shown to keep it clear. The names underneath the phases define the main deliverables during the corresponding iterations. Each iteration corresponds to a version in the online issue tracking system and is represented here in the column "Ver.". Each task is assigned to a version, and the version number indicates the milestone the task contributes to (2nd digit) and the iteration within that milestone the task belongs to (3rd digit). For instance, a task assigned to version 0.2.1 contributes to milestone 2 and belongs to iteration 1 within milestone 2. The column "Dur." stands for duration, where w stands for weeks (7 days per week) and d for day(s). Weekends, holidays or days off are not considered. For instance, the Elaboration phase takes place from 26.11.2012 to 28.12.2012, during which I take holidays from 13.12.2012 to 21.12.2012 and a day off on 24.12.2012. Nevertheless, the duration is still 4w 4d.
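
The version scheme and the duration notation can be made concrete with a small sketch. The class `PlanningSketch` and its methods are hypothetical and exist only for this illustration. The duration is assumed to be the plain difference in calendar days between the two dates; under that assumption the Elaboration example from the text comes out as 4w 4d.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration; not part of the project tooling.
final class PlanningSketch {
    private PlanningSketch() {}

    /** Milestone a task contributes to: the 2nd digit of a version such as "0.2.1". */
    static int milestone(String version) {
        return Integer.parseInt(parts(version)[1]);
    }

    /** Iteration within the milestone: the 3rd digit of a version such as "0.2.1". */
    static int iteration(String version) {
        return Integer.parseInt(parts(version)[2]);
    }

    private static String[] parts(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected x.y.z but got: " + version);
        }
        return parts;
    }

    /** Assumption: duration = calendar days between the dates, 7 days per week. */
    static String duration(LocalDate start, LocalDate end) {
        long days = ChronoUnit.DAYS.between(start, end);
        return (days / 7) + "w " + (days % 7) + "d";
    }

    public static void main(String[] args) {
        System.out.println(milestone("0.2.1") + " / " + iteration("0.2.1"));
        // Elaboration phase: 26.11.2012 to 28.12.2012
        System.out.println(duration(LocalDate.of(2012, 11, 26), LocalDate.of(2012, 12, 28)));
    }
}
```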

 

Since I use rolling wave planning, this figure is not binding and will change slightly during the project. Project controlling is conducted after every iteration and replanning is done if necessary.

The following lists show roughly which tasks have to be done in each iteration. The level of detail diminishes with later iterations; the corresponding tasks become bigger and have to be broken down during the project. Iterations after 0.2.0 are not listed since each would contain just one single task, identical to the deliverable in the figure above. This list will not be updated during the project. A more detailed view, which will be kept up to date throughout the project, can be found in the online issue tracking system.

Iteration 0.0.1
  • Develop the project idea
  • Search for suitable secondary data for background research on compilers
  • Clarify whether the project can be developed and published under a permissive open source license (Apache License 2.0)
  • Write the project proposal and post it on Moodle
  • Attend the lectures of 300COM
Iteration 0.0.2
  • Set up the online project environment
  • Attend a workshop about the CU Harvard Reference Style
  • Start with the rough concept - especially rough planning
  • Write a first draft of the detailed project proposal
  • Background research on compilers
  • Attend the lectures of 300COM
Iteration 0.1.0
  • Write the final version of the detailed project proposal and post it on Moodle
  • Fill in the online ethics
  • Complete the rough concept
  • Application integration in the project environment
  • Write the Java code convention
  • Write the test philosophy
  • Background research on compilers
  • Attend the lectures of 300COM
Iteration 0.1.1
  • Background research on compilers
  • Background research about type safety
  • Set up online survey
  • Grammar of a simple calculator for Antlr v.3
  • Parser for the simple calculator
  • Start development of the language specification for TSPHP
Iteration 0.1.2
  • Background research about type safety
  • Background research about PHP 6
  • Performance tests
  • First draft of SWOT analysis
  • Further development of the language specification for TSPHP
Iteration 0.2.0
  • Finalise language specification for TSPHP (see Scope above)
Iteration 0.2.1

...

Iteration 0.5.0
  • Print the report and hand it in

Approach

This project adopts a static safety paradigm and introduces Type-Safe PHP (TSPHP) on a high level in order to avoid altering the current specification of PHP 5.4 or the current interpreter. TSPHP will be a mild extension of PHP syntax and will be translated to PHP 5.4 (and hence can be run using the current interpreter). The focus will be on static type safety, which addresses type safety at compile time only, i.e. while translating TSPHP to PHP 5.4.

Context Diagram

The following figure shows a context diagram. The most common context for a compiler might be its usage in IDEs. However, build servers and other automation tools could also use it, and finally a user could use it from the shell (command prompt).
The data flow is more or less the same for every actor. The actor defines which files shall be translated, and the compiler translates them and saves the results in new files. Optionally, a Java system could use the interface to translate streams and retrieve the result as streams.
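
A possible shape of such a stream-based interface is sketched below. All names (`StreamTranslator`, `IdentityTranslator`, `StreamTranslatorDemo`) are hypothetical and do not describe the actual TSPHP API. The placeholder implementation simply copies its input unchanged and stands in for the real translation, so that the sketch shows only how a Java system would hand over a stream and retrieve a stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical stream-based entry point; an assumption, not the TSPHP API. */
interface StreamTranslator {
    void translate(Reader source, Writer target) throws IOException;
}

/** Placeholder that copies the input unchanged in place of a real translation. */
final class IdentityTranslator implements StreamTranslator {
    @Override
    public void translate(Reader source, Writer target) throws IOException {
        char[] buffer = new char[4096];
        int read;
        while ((read = source.read(buffer)) != -1) {
            target.write(buffer, 0, read);
        }
    }
}

final class StreamTranslatorDemo {
    private StreamTranslatorDemo() {}

    /** Convenience wrapper: feeds a string in as a stream and collects the result. */
    static String translateString(StreamTranslator translator, String source) {
        StringWriter out = new StringWriter();
        try (Reader in = new StringReader(source)) {
            translator.translate(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translateString(new IdentityTranslator(), "<?php echo 1; ?>"));
    }
}
```

A file-based actor (IDE, build server or shell user) could be served by the same interface by wrapping file readers and writers around it.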

Decomposition

The following figure shows the rough decomposition of the compiler. The decomposition will be refined in the System Specification of TSPHP.

Please be aware that the information below is obsolete in some parts and is kept here for traceability purposes (development process). Have a look at the System Specification of TSPHP.

 


The compiler will contain at least five sub components: input, lexer, parser, tree parser and output.

The input component is in charge of retrieving data from an actor (see Context Diagram above). The lexer then performs lexical analysis of the input and returns tokens as output. The parser parses the tokens, conducts type checking etc. and returns an abstract syntax tree (AST). The tree parser uses the AST to translate the tree to PHP 5.4. Finally, the output component returns the translated data to the actor (see Context Diagram above). The data flow is shown in the next figure.
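
The data flow through these components can be sketched on a toy example. This is an illustration under stated assumptions, not the TSPHP compiler: the class `PipelineSketch`, its methods and the tiny addition language (`number ('+' number)*`) are hypothetical, the lexer and parser are written by hand instead of being generated from an Antlr grammar, and type checking is omitted. Input and output are reduced to a plain string going in and a plain string coming out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy pipeline (hypothetical): input -> lexer -> parser -> tree parser -> output.
final class PipelineSketch {
    private PipelineSketch() {}

    /** AST node: a number leaf (no children) or an addition with two children. */
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    /** Lexer: turns the character input into tokens (numbers and "+"). */
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '+') {
                tokens.add("+");
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    /** Parser: builds a left-associative AST from the token list. */
    static Node parse(List<String> tokens) {
        if (tokens.isEmpty() || tokens.get(0).equals("+")) {
            throw new IllegalArgumentException("expected a number");
        }
        Node tree = new Node(tokens.get(0), null, null);
        for (int i = 1; i < tokens.size(); i += 2) {
            boolean wellFormed = tokens.get(i).equals("+")
                    && i + 1 < tokens.size()
                    && !tokens.get(i + 1).equals("+");
            if (!wellFormed) {
                throw new IllegalArgumentException("expected '+ number' at token " + i);
            }
            tree = new Node("+", tree, new Node(tokens.get(i + 1), null, null));
        }
        return tree;
    }

    /** Tree parser: walks the AST and emits the target text. */
    static String emit(Node node) {
        if (node.left == null) {
            return node.value;
        }
        return "add(" + emit(node.left) + ", " + emit(node.right) + ")";
    }

    /** Whole pipeline: source text in, translated text out. */
    static String compile(String input) {
        return emit(parse(lex(input)));
    }

    public static void main(String[] args) {
        System.out.println(compile("1 + 2 + 3")); // prints add(add(1, 2), 3)
    }
}
```

Keeping each stage as a separate function with a narrow input and output mirrors the decomposition described above: a different output format would only require a different `emit` stage.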

Deployment

The following two figures show roughly how the compiler will be deployed. The view is very simplified and will be refined in the System Specification of TSPHP. The distribution shown in the left figure will be used in most cases. The distribution shown in the right figure will include the Java 1.7 JRE so that the compiler can be run in a non-Java environment.

                                 

Project Organisation

The project organisation is very simple at the moment (see the following figure). This is due to the fact that the project is currently a one-man show and there is no client involved. Robert Stoll leads the project. There is neither an advisory board nor a project office. However, there is a kind of steering committee, which consists of two supervisors (UK and CH) and one 2nd marker (UK).

Reference List

TIOBE (2012) TIOBE Programming Community Index for November 2012 [online] available from <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html> [04 Nov 2012]

w3techs (2012) Usage Statistics and Market Share of Server-side Programming Languages for Websites, November 2012 [online] available from <http://w3techs.com/technologies/overview/programming_language/all> [14 Nov 2012]

Zend Technologies Ltd (2012) ZEND DEVELOPER PULSE: Taking the Pulse of the Developer Community [online] available from <http://static.zend.com/topics/zend-developer-pulse-survey-report-Q2-2012-0612-EN.pdf> [31 Oct 2012]