Multivariate Pattern Analysis in Python
PyMVPA is a Python module intended to ease pattern classification analysis of large datasets. It provides high-level abstraction of typical processing steps and a number of implementations of some popular algorithms. While it is not limited to neuroimaging data it is eminently suited for such datasets. PyMVPA is truly free software (in every respect) and additionally requires nothing but free software to run. Theoretically PyMVPA should run on anything that can run a Python interpreter, although the proof is yet to come.
PyMVPA stands for Multivariate Pattern Analysis in Python.
This manual does not attempt to be a comprehensive introduction to machine learning theory or pattern recognition techniques. There is a wealth of high-quality textbooks available on this field. A very good example is Pattern Recognition and Machine Learning by Christopher M. Bishop.
Good starting points for learning about the application of machine learning algorithms to (f)MRI data are two recent reviews by Norman et al. [1] and Haynes and Rees [2].
This manual also does not describe every bit and piece of the PyMVPA package. For more information, please have a look at the API documentation, which is a comprehensive and up-to-date description of the whole package.
More examples and usage patterns extending the ones described here can be taken from the examples shipped with the PyMVPA source distribution (doc/examples/) or even the unit test battery, also part of the source distribution (in the tests/ directory).
[1] Norman, K.A., Polyn, S.M., Detre, G.J. & Haxby, J.V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10, 424–430.
[2] Haynes, J.D. & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7, 523–534.
The roots of PyMVPA date back to early 2005. At that time it was a C++ library (no Python yet) developed by Michael Hanke and Sebastian Krüger, intended to make it easy to apply artificial neural networks to pattern recognition problems.
During a visit to Princeton University in spring 2005, Michael Hanke was introduced to the MVPA toolbox for Matlab, which had several advantages over a C++ library. Most importantly, it was easier to use. While a user of a C++ library is forced to write a significant amount of front-end code, users of the MVPA toolbox could simply load their data and start analyzing it, because the toolbox provides a common interface to functions drawn from a variety of libraries.
However, there are some disadvantages to writing a toolbox in Matlab. While users in general benefit from the power of Matlab, they are at the same time bound to the goodwill of a commercial company. That this is a real problem becomes obvious when one considers the time when the vendor of Matlab was not willing to support the Mac platform. Therefore, even though the MVPA toolbox is GPL-licensed, it cannot fully benefit from the enormous advantages of the free software development model (free as in free speech, not only as in free beer).
For these reasons, Michael thought that a successor to the C++ library should remain truly free software, remain fully object-oriented (in contrast to the MVPA toolbox), but should be at least as easy to use and extensible as the MVPA toolbox.
After evaluating some possibilities, Michael decided that Python was the most promising candidate, fully capable of fulfilling the intended development goals. Python is a very powerful language that combines the possibility of writing really fast code with a simplicity that allows one to learn the basic concepts within a few days.
One of the major advantages of Python is the availability of a huge number of so-called modules. Modules can include extensions written in a compiled language like C (or even FORTRAN) and therefore allow one to incorporate high-performance code without having to leave the Python environment. Additionally, some Python modules provide links to other toolkits. For example, RPy allows one to use the full functionality of R from inside Python. Even Matlab can be used via some Python modules (see PyMatlab for an example).
After the decision for Python was made, Michael started development with a simple k-Nearest-Neighbour classifier and a cross-validation class. Using the NumPy package made it easy to support data of any dimensionality. Therefore PyMVPA can easily be used with 4d fMRI datasets, but equally well with EEG/MEG data (3d) or even non-neuroimaging datasets.
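To illustrate the point about dimensionality, here is a minimal sketch in plain NumPy (it does not use PyMVPA's own dataset API) of how samples of any shape can be flattened into a two-dimensional samples-by-features matrix. The helper name `flatten_samples` and the array shapes are made up for this example, and the sample axis is assumed to come first.

```python
import numpy as np


def flatten_samples(data):
    """Reshape (n_samples, d1, d2, ...) into (n_samples, n_features)."""
    data = np.asarray(data)
    # Keep the first axis (samples) and merge all remaining axes.
    return data.reshape(data.shape[0], -1)


# Hypothetical fMRI run: 10 volumes of 4 x 5 x 3 voxels (4d in total).
fmri = np.zeros((10, 4, 5, 3))
# Hypothetical EEG recording: 20 trials, 8 channels, 50 time points (3d).
eeg = np.zeros((20, 8, 50))

print(flatten_samples(fmri).shape)  # (10, 60)
print(flatten_samples(eeg).shape)   # (20, 400)
```

Once every sample is a flat feature vector, the same classifier code can be applied regardless of where the data came from.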
By September 2007 PyMVPA included support for reading and writing datasets from and to the NIfTI format, kNN and Support Vector Machine classifiers, as well as several analysis algorithms (e.g. searchlight and incremental feature search).
During another visit to Princeton in October 2007, Michael met with Yaroslav Halchenko and Per B. Sederberg. That meeting and the subsequent discussions and hacking sessions of Michael and Yaroslav led to a major refactoring of the PyMVPA codebase, making it much more flexible and extensible, faster, and easier to use than it had ever been before.
Like every other Python module PyMVPA requires at least a basic knowledge of the Python language. However, if one has no prior experience with Python one can benefit from the simplicity of the Python language and acquire this knowledge within a few days by studying some of the many tutorials available on the web.
As PyMVPA is about pattern recognition, a basic understanding of machine learning principles is necessary to apply its methods correctly and to ensure that the results are interpretable.
While most parts of PyMVPA will work without any additional software, some functionality makes use of additional software packages. It is strongly recommended to install these packages as well.
- SciPy: linear algebra, standard distributions
- SciPy is mainly used by the statistical testing and the logistic regression classifier code. However, in the long run SciPy might be used a lot more and could become a required dependency of PyMVPA.
- PyNIfTI: access to NIfTI files
- PyMVPA provides a convenient wrapper for datasets stored in the NIfTI format. If you don’t need that, PyNIfTI is not necessary, but otherwise it makes it really easy to read from and write to NIfTI images.
- Shogun: various classifiers
- PyMVPA currently can make use of several SVM implementations of the Shogun toolbox. It requires the modular python interface of Shogun to be installed. Any version from 0.6 on should work.
- R and RPy: more classifiers
- Currently PyMVPA provides a wrapper around the LARS library.
The following list of software is not required by PyMVPA, but it might make life a lot easier and leads to more efficiency when using PyMVPA.
- IPython: frontend
- If you want to use PyMVPA interactively, it is strongly recommended to use IPython. If you think “Oh no, not another one, I already have to learn about PyMVPA”, please invest a tiny bit of time to watch the Five Minutes with IPython screencasts at showmedo.com, so that at least you know what you are missing.
- FSL: preprocessing and analysis of (f)MRI data
- PyMVPA provides some simple bindings to FSL output and filetypes (e.g. EV files and MELODIC output directories). This makes it fairly easy to e.g. use FSL’s implementation of ICA for data reduction and proceed with analyzing the estimated ICs in PyMVPA.
- AFNI: preprocessing and analysis of (f)MRI data
- Similar to FSL, AFNI is a free package for processing (f)MRI data. Though its primary data file format is BRIK, it can read and write NIfTI files, which integrate easily with PyMVPA.
- LIBSVM: fast SVM classifier
- Only the C library is required and none of the Python bindings that are available on the upstream website. PyMVPA provides its own Python wrapper for LIBSVM which is a fork based on the one included in the LIBSVM package. Additionally the upstream LIBSVM distribution causes flooding of the console with a huge amount of debugging messages. Please see the Building from Source section for information on how to build an alternative version that does not have this problem.
- matplotlib: Matlab-style plotting library for Python
- This is a very powerful plotting library that can export to a large variety of raster and vector formats, and is thus ideal for producing publication-quality figures.
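A quick way to see which of the Python packages listed above are present on a system is to try importing them. The following is a minimal sketch, not part of PyMVPA. The import names `mvpa` and `nifti` are the ones used elsewhere in this manual; `shogun` and `rpy` are assumptions about how those packages are exposed to Python and may differ between versions. FSL, AFNI and the LIBSVM C library are not Python modules and are therefore not covered.

```python
def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    available = {}
    for name in names:
        try:
            __import__(name)
            available[name] = True
        except Exception:
            # Any failure to import counts as "not usable" here.
            available[name] = False
    return available


# Import names are assumptions, except 'mvpa' and 'nifti' (see lead-in).
MODULES = ["numpy", "mvpa", "scipy", "nifti", "shogun", "rpy",
           "IPython", "matplotlib"]

if __name__ == "__main__":
    for name, ok in sorted(check_modules(MODULES).items()):
        print("%-12s %s" % (name, "found" if ok else "missing"))
```

A module reported as missing only matters if you need the functionality it backs; of the packages above, only NumPy is needed for a minimal installation.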
The easiest way to obtain PyMVPA is to use pre-built binary packages. Currently we provide such packages or installers for the Debian/Ubuntu family and 32-bit Windows (see below). Since version 0.2.2 there is also an initial version of an RPM package for OpenSUSE 10.3. If there are no binary packages for your operating system or platform yet, you can build PyMVPA from source. Please refer to Building from Source for more information.
PyMVPA is available as an official Debian package (python-mvpa; since lenny). The documentation is provided by the optional python-mvpa-doc package. To install PyMVPA simply do:
sudo aptitude install python-mvpa
Backports for the current Debian stable release and binary packages for recent Ubuntu releases are available from a repository at the University of Magdeburg. Please read the package repository instructions to learn about how to obtain them. Otherwise install as you would do with any other Debian package.
There are a few Python distributions for Windows. In theory all of them should work equally well. However, we only tested the standard Python distribution from www.python.org (with version 2.5.2).
First you need to download and install Python. Use the Python installer for this job. You do not need to install the Python test suite and utility scripts. From now on we will assume that Python was installed in C:\Python25 and that this directory has been added to the PATH environment variable.
For a minimal installation of PyMVPA the only thing you need in addition is NumPy. Download a matching NumPy Windows installer for your Python version (in this case 2.5) from the SciPy download page and install it.
Now you can use the PyMVPA Windows installer to install PyMVPA on your system. When done, verify that everything went fine by opening a command prompt and starting Python by typing python and hitting enter. You should now see the Python prompt. Import the mvpa module, which should cause no error messages.
>>> import mvpa
>>>
Although you have a working installation already, most likely you want to install some additional software. First and foremost install SciPy – download from the same page where you also got the NumPy installer.
If you want to use PyMVPA to analyze fMRI datasets, you probably also want to install PyNIfTI. Download the corresponding installer from the website of the NIfTI libraries and install it. PyNIfTI does not come with the required zlib library, so you also need to download and install it. A binary installer is available from the GnuWin32 project. Install it in some arbitrary folder (just the binaries, nothing else), find the zlib1.dll file in the bin subdirectory and move it into the Windows system32 directory. Verify that it works by importing the nifti module in Python.
>>> import nifti
>>>
Another piece of software you might want to install is matplotlib. The project website offers a binary installer for Windows. If you are using the standard Python distribution and matplotlib complains about a missing msvcp71.dll, be sure to follow the installation instructions for Windows on the matplotlib website.
With this set of packages you should be able to run most of the PyMVPA examples which are shipped with the source code in the doc/examples directory.
To install the provided RPM package for OpenSUSE, simply download it, open a console and invoke (the example command refers to PyMVPA 0.2.2 and OpenSUSE 10.3):
rpm -i pymvpa-0.2.2-1suse10_3.i586.rpm
Please refer to the section about building on OpenSUSE for notes about the installation of the dependencies.
If a binary package for your platform and operating system is provided, you do not have to build the packages on your own; use the corresponding pre-built packages instead. However, if there are no binary packages for your system, or you want to try a new (unreleased) version of PyMVPA, you can easily build PyMVPA on your own. Any recent Linux distribution should be capable of doing it (e.g. RedHat). Additionally, building PyMVPA also works on Mac OS X and Windows systems.
The first step is obtaining the sources. The source code tarballs of all PyMVPA releases are available from the PyMVPA project website. Alternatively, one can also download a tarball of the latest development snapshot (i.e. the current state of the master branch of the PyMVPA source code repository).
If you want to have access to both, the full PyMVPA history and the latest development code, you can use the PyMVPA Git repository, which is publicly available. To view the repository, please point your web browser to gitweb:
http://git.debian.org/?p=pkg-exppsy/pymvpa.git
The gitweb browser also allows you to download arbitrary development snapshots of PyMVPA. For a full clone (aka checkout) of the PyMVPA repository simply do:
git clone git://git.debian.org/git/pkg-exppsy/pymvpa.git
After a short while you will have a pymvpa directory below your current working directory that contains the PyMVPA repository.
In general you can build PyMVPA like any other Python module (using the Python distutils). This general method will be outlined first. However, in some situations or on some platforms alternative ways of building PyMVPA might be more convenient; these alternative approaches are listed at the end of this section.
To build PyMVPA from source simply enter the root of the source tree (obtained by either extracting the source package or cloning the repository) and run:
python setup.py build_ext
If you are using a Python version older than 2.5, you need to have python-ctypes (>= 1.0.1) installed to be able to do this.
Now, you are ready to install the package. Do this by invoking:
python setup.py install
Most likely you need superuser privileges for this step. If you want to install in a non-standard location, please take a look at the --prefix option. You also might want to consider --optimize.
Now you should be ready to use PyMVPA on your system.
From the 0.2 release of PyMVPA on, the LIBSVM classifier extension is not built by default anymore. However, it is still shipped with PyMVPA and can be enabled at build time. To be able to do this you need to have SWIG installed on your system.
PyMVPA needs a patched LIBSVM version, as the original distribution generates a huge amount of debugging messages and therefore makes the console and PyMVPA output almost unusable. Debian (since lenny: 2.84.0-1) and Ubuntu (since gutsy) already include the patched version. For all other systems a minimal copy of the patched sources is included in the PyMVPA source package (3rd/libsvm).
If you do not have a proper LIBSVM package, you can build the library from the copy of the code that is shipped with PyMVPA. To do this, simply invoke:
make 3rd
Now build PyMVPA as described above. The build script will automatically detect that LIBSVM is available and build the LIBSVM wrapper module for you.
If your system provides an appropriate LIBSVM version, you need to have the development files (headers and library) installed. Depending on where you installed them, it might be necessary to specify the full path to that location with the --include-dirs, --library-dirs and --swig options. Now add the ‘--with-libsvm’ flag when building PyMVPA:
python setup.py build_ext --with-libsvm \
[ -I<LIBSVM_INCLUDEDIR> -L<LIBSVM_LIBDIR> ]
The installation procedure is equivalent to the build setup without LIBSVM, except that the ‘--with-libsvm’ flag also has to be set when installing:
python setup.py install --with-libsvm
Alternatively, if you are doing development in PyMVPA, or if you simply do not want to install PyMVPA system-wide (or do not have sufficient permissions to do so), you can simply call make (same as make build) in the top-level directory of the source tree to build PyMVPA. Then extend or define the environment variable PYTHONPATH so that it points to the root of the PyMVPA sources (i.e. where you invoked all previous commands from):
export PYTHONPATH=$PWD
However, please note that this procedure also always builds the LIBSVM extension and therefore also requires the patched LIBSVM version and SWIG to be available.
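A similar effect to setting PYTHONPATH can be achieved from within a running Python session by extending sys.path. The sketch below is only an illustration: the helper name `use_source_tree` and the checkout location are hypothetical and need to be adapted to where the PyMVPA sources actually live.

```python
import os
import sys


def use_source_tree(root):
    """Put `root` at the front of the module search path (only once)."""
    root = os.path.abspath(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Hypothetical location of the PyMVPA source tree; adjust to your checkout.
use_source_tree(os.path.join(os.path.expanduser("~"), "src", "pymvpa"))
# A subsequent `import mvpa` would now look in that directory first.
```

Unlike the environment variable, this change only lasts for the current interpreter session, which can be handy for trying out a development version without touching the shell configuration.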
On Windows the whole situation is a little more tricky, as the system doesn’t come with a compiler by default. Nevertheless, it is easily possible to build PyMVPA from source. One could use the Microsoft compiler that comes with Visual Studio, but as this is commercial software and not everybody has access to it, we will outline a way that exclusively involves free and open source software.
First one needs to install the packages required to run PyMVPA as explained above.
Next we need to obtain and install the MinGW compiler collection. Download the Automated MinGW Installer from the MinGW project website. Now run it and choose to install the current package. You will need the MinGW base tools, the g++ compiler and MinGW Make. For the remaining parts of this section, we will assume that MinGW was installed in C:\MinGW and that the directory C:\MinGW\bin has been added to the PATH environment variable, so that all MinGW tools are easily accessible. Note that it is not necessary to install MSYS to build PyMVPA, but it might be handy to have it.
If you want to build the LIBSVM wrapper for PyMVPA, you also need to download SWIG (actually swigwin, the distribution for Windows). SWIG does not have to be installed, just unzip the file you downloaded and add the root directory of the extracted sources to the PATH environment variable (make sure that this directory contains swig.exe, if not, you haven’t downloaded swigwin).
PyMVPA comes with a specific build setup configuration for Windows: setup.cfg.win in the root of the source tarball. Please rename this file to setup.cfg (overwriting the existing one). This is only necessary if you have not configured your Python distutils installation to always use MinGW instead of the Microsoft compilers.
Now we are ready to build PyMVPA. The easiest way to do this is to make use of the Makefile.win that is shipped with PyMVPA to build a binary installer package (.exe). Make sure that the settings at the top of Makefile.win (the file is located in the root directory of the source distribution) correspond to your Python installation; if not, adjust them accordingly before you proceed. When everything is set, do:
mingw32-make -f Makefile.win installer
Upon success you can find the installer in the dist subdirectory. Install it as described above.
Building PyMVPA on OpenSUSE involves the following steps (tested with 10.3): First add the OpenSUSE science repository, which contains most of the required packages (e.g. NumPy, SciPy, matplotlib), to the YaST configuration. The URL for OpenSUSE 10.3 is:
http://download.opensuse.org/repositories/science/openSUSE_10.3/
Now, install the following required packages:
- a recent C and C++ compiler (e.g. GCC 4.1)
- python-devel (Python development package)
- python-numpy (NumPy)
- swig (SWIG is only necessary, if you want to make use of LIBSVM)
Now you can simply compile and install PyMVPA, as outlined above, in the general build instructions (or alternatively using the method with LIBSVM).
If you have problems compiling the NIfTI libraries and PyNIfTI on OpenSUSE, try the following: Download the nifticlib source tarball, extract it and run make in the top-level source directory. Be sure to install the zlib-devel package beforehand. Now download the pynifti source tarball, extract it, and edit setup.py. Change the line:
libraries = [ 'niftiio' ],
to:
libraries = [ 'niftiio', 'znz', 'z' ],
as mentioned in the PyNIfTI installation instructions. This is necessary because the above approach only generates static NIfTI libraries, which are not properly linked with all dependencies. Now, compile PyNIfTI with:
python setup.py build_ext -I <path_to_nifti>/include \
-L <path_to_nifti>/lib --swig-opts="-I<path_to_nifti>/include"
where <path_to_nifti> is the directory that contains the extracted nifticlib sources. Finally, install PyNIfTI with:
sudo python setup.py install
If you want to run the PyMVPA examples, including the ones that make use of the plotting capabilities of matplotlib, you need to install a few more packages (mostly due to broken dependencies in the corresponding OpenSUSE packages):
- python-scipy
- python-gobject2
- python-gtk
The PyMVPA toolbox was first presented with a poster at the annual meeting of the German Society for Psychophysiology and its Application in Magdeburg, 2008. This is currently the preferred way to cite PyMVPA. However, we have submitted a paper introducing the toolbox, which should replace the poster soon.
(needs some more words, for now just a list)
- NumPy, SciPy
- LIBSVM
- Shogun
- IPython
- Debian (for hosting, environment, ...)
- FOSS community
- Credits to individual labs if they officially donate time ;-)