Coding Style

A brief description of the things we strive for, more or less unsuccessfully.

1. Packaging

We use the classical GNU Autoconf/Automake approach; for a tutorial, see e.g. Learning the GNU development tools or the AutoBook.

2. Modules

Invenio started as a set of fairly independent modules developed by independent people with independent styles. This was made even more pronounced by the original use of many different languages (e.g. Python, PHP, Perl). The Invenio code base now strives to use Python everywhere, except in speed-critical parts, where a compiled language such as Common Lisp may come to the rescue in the near future.

When modifying an existing module, we propose to strictly continue using whatever coding style the module was originally written in. When writing new modules, we propose to stick to the standards described below.

The code integration across modules is happening, but is slow. Therefore, don't be surprised to see that there is a lot of room to refactor.

3. Python

We aim to follow the recommendations of PEP 8, although the existing code surely does not fulfil them here and there. Code is indented with spaces only; please do not use tabs. One tab counts as four spaces. Emacs users can look into our Emacs Tips wiki page for inspiration.

All the Python code should be extensively documented via docstrings, so you can always run pydoc file.py to peruse the file's documentation in one simple go. We follow the epytext docstring markup, which generates nice HTML source code documentation.
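As an illustration of the docstring style, here is a minimal sketch of a function documented with epytext fields. The function count_words and its parameters are hypothetical and only serve the example; the fields shown (@param, @type, @return, @rtype) are standard epytext markup.

```python
def count_words(text, min_length=1):
    """
    Count the words in a text that are at least min_length characters long.

    @param text: the text to analyse
    @type text: string
    @param min_length: minimum length a word must have to be counted
    @type min_length: int
    @return: number of words that are long enough
    @rtype: int
    """
    # Split on whitespace and keep only the words that are long enough.
    return len([word for word in text.split() if len(word) >= min_length])
```

Running pydoc on a file containing this function would display the docstring as written, field lines included.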

Do not forget to run pylint on your code to check for errors like uninitialized variables and to improve its quality and conformance to the coding standard. If you develop in Emacs, run M-x pylint RET on your buffers frequently. Read and implement pylint suggestions. (Note that using lambda and friends may lead to false pylint warnings. You can switch them off by putting block comments of the form ``# pylint: disable=C0301''.)

Do not forget to run pychecker on your code either. It is another source code checker that catches some situations better and some situations worse than pylint. If you develop in Emacs, run C-c C-w (M-x py-pychecker-run RET) on your buffers frequently.

You can check the kwalitee of your code by running ``python modules/miscutil/lib/kwalitee.py --check-all *.py'' on your files. This runs some basic error, warning, and indentation checks, and also checks compliance with PEP 8. You can also check the code kwalitee stats across all the modules by running ``make kwalitee-check'' in the main source directory.

Do not hardcode magic constants in your code. Every magic string or number should be put into the accompanying file_config.py, with a symbol name beginning with cfg_modulename_*.
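A minimal sketch of this rule, assuming a hypothetical module called bibindex: the constant names and values below are invented for illustration, and both parts are shown in one listing only so that the example runs on its own. In practice the constants would sit in the accompanying config file and be imported by the library code.

```python
# Hypothetical contents of an accompanying bibindex_config.py.
cfg_bibindex_min_word_length = 3
cfg_bibindex_stopwords = ("the", "and", "of")


# Library code; it refers to the named constants instead of repeating
# the literal 3 or the stopword strings inline.
def get_indexable_words(text):
    """Return the words of text that are worth indexing."""
    return [word for word in text.lower().split()
            if len(word) >= cfg_bibindex_min_word_length
            and word not in cfg_bibindex_stopwords]
```

Changing the minimum word length then means editing one line in the config file rather than hunting for a literal across the module.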

Clearly separate interfaces from implementation. Document your interfaces. Do not expose to other modules anything that does not have to be exposed. Apply the principle of least information.
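One common way to mark this separation in Python is the leading-underscore convention described in PEP 8; this guide does not prescribe it, so take the following as a sketch only. The function names are hypothetical.

```python
def get_word_frequencies(text):
    """Public interface: return a dict mapping each word of text to its count."""
    frequencies = {}
    for word in _split_into_words(text):
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies


def _split_into_words(text):
    """Implementation detail: not meant to be called from other modules."""
    return text.lower().split()
```

Other modules would call only get_word_frequencies, so the way words are split can change later without affecting them.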

Create as few new library files as possible. Do not create many nested files in nested modules; rather put all the lib files in one dir with bibindex_foo and bibindex_bar names.

Use the imperative/functional paradigm rather than OO. If you do use OO, then stick to as simple a class hierarchy as possible. Recall that method calls and exception handling in Python are quite expensive.

Prefer the good old foo_bar naming convention for symbols (both variable and function names) over the fooBar CaMelCaSe convention. (The exception is class names, where UppercaseSymbolNames are to be used.)

Pay special attention to naming your symbols descriptively. Your code is going to be read and worked with by others, and its symbols should be self-understandable without any comments and without studying other parts of the code. For example, use proper English words, not abbreviations that can be misspelled in many a way; use words that go in pairs (e.g. create/destroy, start/stop; never create/stop); use self-understandable symbol names (e.g. list_of_file_extensions rather than list2); never misname symbols (e.g. score_list should hold the list of scores and nothing else - if in the course of development you change the semantics of what the symbol holds, then change the symbol name too). Do not be afraid to use long descriptive names; good editors such as Emacs can tab-complete symbols for you.
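The naming advice can be sketched as follows. The function is hypothetical; what matters is that every symbol uses the foo_bar convention and says what it holds.

```python
import os


def get_list_of_file_extensions(list_of_file_names):
    """Return the sorted list of distinct extensions found in the file names."""
    set_of_file_extensions = set()
    for file_name in list_of_file_names:
        # os.path.splitext returns (root, extension); the extension is
        # empty for names without a dot, such as README.
        file_extension = os.path.splitext(file_name)[1]
        if file_extension:
            set_of_file_extensions.add(file_extension)
    return sorted(set_of_file_extensions)
```

Compare this with a version that calls its argument list2 and its result tmp: the logic would be identical, but a reader would have to study the body to learn what goes in and what comes out.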

When hacking on module A, take care to follow the existing coding convention in A, even if it is legacy-weird and even if we use a different technique elsewhere. (Unless the whole module A is going to be refactored, of course.)

Speed-critical parts should be profiled with pyprof or our built-in web profiler (&profile=t).
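For a quick local measurement outside the tools named above, the standard library's cProfile module can be used instead. The following is a sketch under that assumption, written for a current Python 3 interpreter; sum_of_squares and profile_call are hypothetical names, and this is not the project's built-in profiler.

```python
import cProfile
import io
import pstats


def sum_of_squares(limit):
    """Deliberately simple workload to profile."""
    return sum(number * number for number in range(limit))


def profile_call(function, *args):
    """Run function(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(function, *args)
    report = io.StringIO()
    # Sort by cumulative time and print only the five most expensive entries.
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
    return result, report.getvalue()


result, report = profile_call(sum_of_squares, 1000)
```

Printing report shows where the time went, which is usually enough to decide whether a part is speed-critical at all.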

The code should be well tested before being committed. Testing is an integral part of the development process. Test as you program. The testing process should be automated via our unit test and regression test suite infrastructures. Please read the test suite strategy to know more.
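As a generic illustration of an automated unit test, here is a minimal sketch using the standard unittest module. It does not show the project's own test suite infrastructure; the function under test and the test names are hypothetical.

```python
import unittest


def normalise_whitespace(text):
    """Collapse runs of whitespace in text into single spaces."""
    return " ".join(text.split())


class NormaliseWhitespaceTest(unittest.TestCase):
    """Unit tests for normalise_whitespace()."""

    def test_collapses_inner_whitespace(self):
        self.assertEqual(normalise_whitespace("a  b\t c"), "a b c")

    def test_strips_outer_whitespace(self):
        self.assertEqual(normalise_whitespace("  a b \n"), "a b")


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(NormaliseWhitespaceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() after each change to normalise_whitespace gives immediate feedback, which is the point of testing as you program.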

Python promotes writing clear, readable, easily maintainable code. Write it as such. Recall Albert Einstein's ``Everything should be made as simple as possible, but not simpler''. Things should be neither overengineered nor oversimplified.

Recall the principles Unix is built upon. As summarized by Eric S. Raymond's TAOUP:

or the golden rule that says it all: ``keep it simple''.

Think of security and robustness from the start. Follow secure programming guidelines.

For more hints, thoughts, and other ruminations on programming, see our CDS Invenio wiki, notably Git Workflow and Invenio QA.

4. MySQL

Table naming policy is, roughly and briefly:

- end of file -