[f2py] The long promised notes, and a question about the Intel compiler

Fernando Perez fperez@colorado.edu
Fri, 18 Apr 2003 17:19:06 -0600


This is a multi-part message in MIME format.
--------------060300010007070901050903
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Hi all,

I'd promised a while back that, in return for all the help here, I'd 
contribute a set of notes on f2py's usage.  Well, since I keep on asking 
questions, I better hold my end of the bargain :)  Here are the notes, free 
for anyone to use/include in other documents/websites.

Now that I've paid my dues, here is my question:  I've been struggling for the 
past few days trying to build an extension using the Intel 7.0 compiler 
(incidentally, the setup.py included in the attached notes is the one used to 
build this extension).

I made sure that _all_ the libraries this thing needed 	were hand-built using 
the Intel compiler.  However, when I try to import the final module (which 
builds just fine), I get an ImportError with undefined symbols s_wsfe and the 
like.  A fair bit of googling unearthed indications that these are internal IO 
routines from the system's libraries, which the Intel compiler apparently has 
problems with.

My best guess is that they are being triggered when the Intel-compiled code is 
linked with the gcc-compiled C code generated by f2py.  Is there a way out of 
this mess?  Is it possible to instruct f2py to use the intel _C_ compiler 
instead of gcc for the final linking of the shared module?  Or is there any 
way to get the Intel compiler to behave sanely with these libraries?

In the meantime I switched over to using the Lahey compiler, for which 
fortunately Pierre Schnizer had done most of the work.  This gets me off the 
hook for now, as I can concentrate on other issues.  But in the future I'll 
probably need to go back to using the Intel compiler, so I'd like to know if 
there is a solution for this.

Before you say 'use g77', this is Fortran90 code, so I just can't use the Gnu 
compilers, period.  And until I can convince the guy who wrote the code to 
rewrite it in F77, I'm stuck :(

One last question: is there a way to specify the compiler to use either at the 
command line or inside the setup.py file?  I know how to do it if calling f2py 
directly, but I haven't been able to find the proper trick when the extension 
is built via distutils.

Thanks in advance for any help, and I hope the included notes are useful to 
someone.

Best,

Fernando.

--------------060300010007070901050903
Content-Type: text/plain;
 name="f2py_notes"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="f2py_notes"

F2PY
====

Author: Fernando Perez <fperez@colorado.edu>.  Please send all
comments/corrections to this address.

This is a rough set of notes on how to use f2py.  It does NOT substitute the
official manual, but is rather meant to be used alongside with it.  

For any non-trivial poject involving f2py, one should also keep at hand Pierre
Schnizer's excellent 'A short introduction to F2PY', available from
http://fubphpc.tu-graz.ac.at/~pierre/f2py_tutorial.tar.gz


Usage for the impatient
-----------------------

Start by building a scratch signature file automatically from your Fortran
sources (in this case all, you can choose only those .f files you need):

f2py -m MODULENAME -h MODULENAME.pyf *.f

This writes the file MODULENAME.pyf, making the best guesses it can from the
Fortran sources.  It builds an interface for the module to be accessed as
'import adap1d' from python.

You will then edit the .pyf file to fine-tune the python interface exhibited
by the resulting extension.  This means for example making unnecessary scratch
areas or array dimensions hidden, or making certain parameters be optional and
take a default value.

Then, write your setup.py file using distutils, and list the .pyf file along
with the Fortran sources it is meant to wrap.  f2py will build the module for
you automatically, respecting all the interface specifications you made in the
.pyf file.

This approach is ultimately far easier than trying to get all the declarations
(especially dependencies) right through Cf2py directives in the Fortran
sources.  While that may seem appealing at first, experience seems to show
that it's ultimately far more time-consuming and prone to subtle errors.
Using this approach, the first f2py pass can do the bulk of the interface
writing and only fine-tuning needs to be done manually.  I would only
recommend embedded Cf2py directives for very simple problems (where it works
very well).

The only drawback of this approach is that the interface and the original
Fortran source lie in different files, which need to be kept in sync.  This
increases a bit the chances of forgetting to update the .pyf file if the
Fortran interface changes (adding a parameter, for example).  However, the
benefit of having explicit, clear control over f2py's behavior far outweighs
this concern.


Choosing a default compiler 
---------------------------

Set the FC_VENDOR environment variable.  This will then prevent f2py from
testing all the compilers it knows about.


Using Cf2py directives
----------------------

For simpler cases you may choose to go the route of Cf2py directives. Below
are some tips and examples for this approach.

Here's the signature of a simple Fortran routine:

	subroutine phipol(j,mm,nodes,wei,nn,x,phi,wrk)
	
	implicit real *8 (a-h, o-z)
	real *8 nodes(*),wei(*),x(*),wrk(*),phi(*)
	real *8 sum, one, two, half

The above is correctly handled by f2py, but it can't know what is meant to be
input/output and what the relations between the various variables are (such as
integers which are array dimensions).  If we add the following f2py
directives, the generated python interface is a lot nicer:

	subroutine phipol(j,mm,nodes,wei,nn,x,phi,wrk)
c
c       Lines with Cf2py in them are directives for f2py to generate a better 
c	python interface.  These must come _before_ the Fortran variable 
c       declarations so we can control the dimension of the arrays in Python.
c
c       Inputs:
Cf2py   integer check(0<=j && j<mm),depend(mm) :: j
Cf2py   real *8 dimension(mm),intent(in) :: nodes
Cf2py   real *8 dimension(mm),intent(in) :: wei
Cf2py   real *8 dimension(nn),intent(in) :: x
c
c       Outputs:
Cf2py   real *8 dimension(nn),intent(out),depend(nn) :: phi
c
c       Hidden args:
c       - scratch areas can be auto-generated by python
Cf2py   real *8 dimension(2*mm+2),intent(hide,cache),depend(mm) :: wrk
c       - array sizes can be auto-determined
Cf2py   integer intent(hide),depend(x):: nn=len(x)
Cf2py   integer intent(hide),depend(nodes) :: mm = len(nodes)
c
	implicit real *8 (a-h, o-z)
	real *8 nodes(*),wei(*),x(*),wrk(*),phi(*)
	real *8 sum, one, two, half
c

Some comments on the above:

- The f2py directives should come immediately after the 'subroutine' line and
before the Fortran variable lines. This allows the f2py dimension directives
to override the Fortran var(*) directives.

- If the Fortran code uses var(N) instead of var(*), the f2py directives can
be placed after the Fortran declarations.  This mode is preferred, as there is
less redundancy overall.  The result is much simpler:

	subroutine phipol(j,mm,nodes,wei,nn,x,phi,wrk)
c
	implicit real *8 (a-h, o-z)
	real *8 nodes(mm),wei(mm),x(nn),wrk(2*mm),phi(nn)
	real *8 sum, one, two, half
c
c       The Cf2py lines allow f2py to generate a better Python interface.
c
c       Inputs:
Cf2py   integer check(0<=j && j<mm),depend(mm) :: j
Cf2py   intent(in) :: nodes
Cf2py   intent(in) :: wei
Cf2py   intent(in) :: x
c
c       Outputs:
Cf2py   intent(out) :: phi
c
c       Hidden args:
c       - scratch areas can be auto-generated by python
Cf2py   intent(hide,cache) :: wrk
c       - array sizes can be auto-determined
Cf2py   integer intent(hide),depend(x):: nn=len(x)
Cf2py   integer intent(hide),depend(nodes) :: mm = len(nodes)
c


- Since python can automatically manage memory, it is possible to hide the
need for manually passed 'work' areas.  The C/python wrapper to the underlying
fortran routine will allocate the memory for the needed work areas on the
fly.  This is done by specifying intent(hide,cache).  'hide' tells f2py to
remove the variable from the argument list and 'cache' tells it to
auto-generate it.

In cases where the allocation cost becomes a performance problem, one can
remove the 'hide' part and make it an optional argument.  In this case it will
only be generated if not given.  For this, the line above should be changed
to:

Cf2py   real *8 dimension(2*mm+2),intent(cache),optional,depend(mm) :: wrk

Note that this should only be done after _proving_ that the scratch areas are
causing a performance problem.  The 'cache' directive causes f2py to keep
cached copies of the scratch areas, so no unnecessary mallocs should be
triggered.

- Since f2py relies on Numeric arrays, all dimensions can be determined from
the arrays themselves and it is not necessary to pass them explicitly.


With all this, the resulting f2py-generated docstring becomes:

phipol - Function signature:
  phi = phipol(j,nodes,wei,x)
Required arguments:
  j : input int
  nodes : input rank-1 array('d') with bounds (mm)
  wei : input rank-1 array('d') with bounds (mm)
  x : input rank-1 array('d') with bounds (nn)
Return objects:
  phi : rank-1 array('d') with bounds (nn)


Debugging
---------

For debugging, use the --debug-capi option to f2py.  This causes the extension
modules to print detailed information while in operation.  In distutils, this
must be passed as an option in the f2py_options to the Extension constructor.


Wrapping C codes with f2py
--------------------------

Below is Pearu Peterson's (the f2py author) response to a question about using
f2py to wrap existing C codes.  While SWIG provides similar functionality and
weave is perfect for inlining C, f2py seems to be an incredibly simple and
convenient tool for wrapping C libraries.

Pearu's response follows:

First, one cannot scan C sources with f2py as it lacks C parser.
Therefore, when wrapping C functions you must manually write the
corresponding signature file where the signatures of C functions are
represented by the signatures of equivalent Fortran functions.

For example, consider the following C file:

/* foo.c */
double foo(double *x, int n) {
  int i;
  double r = 0;
  for (i=0;i<n;++i)
    r += x[i];
  return r;
}
/* EOF foo.c */

To wrap the C function foo() with f2py, create the following signature
file bar.pyf:

! -*- F90 -*-
python module bar
  interface
    real*8 function foo(x,n)
      intent(c) foo
      real*8 dimension(n),intent(in) :: x
      integer intent(c,hide),depend(x) :: n = len(x)
    end function foo
  end interface
end python module bar
! EOF bar.pyf

(see usersguide for more info about intent(c)) and run

  f2py -c bar.pyf foo.c

Finally, in Python:

>>> import bar
>>> bar.foo([1,2,3])
6.0

The f2py mailing list archives contain more examples about wrapping C
functions with f2py.


Passing offset arrays to Fortran routines
-----------------------------------------

It is possible to pass offset arrays (like pointers to the middle of other
arrays) by using Numeric's slice notation.

The print_dvec function below simply prints its argument as "print*,'x',x".
We show some examples of how it behaves with both 1 and 2-d arrays:

In [3]: x
Out[3]: array([ 2.8,  3.4,  4.1])

In [4]: tf.print_dvec(x)
 n 3
 x  2.8  3.4  4.1

In [5]: tf.print_dvec ?
Type:           fortran
String Form:    <fortran object at 0x8306fe8>
Namespace:      Currently not defined in user session.
Docstring:
    print_dvec - Function signature:
      print_dvec(x,[n])
    Required arguments:
      x : input rank-1 array('d') with bounds (n)
    Optional arguments:
      n := len(x) input int


In [6]: tf.print_dvec (x[1])
 n 1
 x  3.4

In [7]: tf.print_dvec (x[1:])
 n 2
 x  3.4  4.1

In [8]: A
Out[8]:
array([[ 3.5,  5.6,  8.2],
       [ 2.1,  4.5,  1.2],
       [ 6.3,  3.4,  3.1]])

In [9]: tf.print_dvec(A)
 n 9
 x  3.5  5.6  8.2  2.1  4.5  1.2  6.3  3.4  3.1

In [10]: A
Out[10]:
array([[ 3.5,  5.6,  8.2],
       [ 2.1,  4.5,  1.2],
       [ 6.3,  3.4,  3.1]])

In [11]: tf.print_dvec(A[1:])
 n 6
 x  2.1  4.5  1.2  6.3  3.4  3.1

In [12]: A[1:]
Out[12]:
array([[ 2.1,  4.5,  1.2],
       [ 6.3,  3.4,  3.1]])

In [13]: A[1:,1:]
Out[13]:
array([[ 4.5,  1.2],
       [ 3.4,  3.1]])

In [14]: tf.print_dvec(A[1:,1:])
 n 4
 x  4.5  1.2  3.4  3.1


On matrix ordering and in-memory copies
---------------------------------------

Numeric (which f2py relies on) is C-based, and therefore its arrays are
stored in row-major order.  Fortran stores its arrays in column-major order.
This means that copying issues must be dealt with.  Below we reproduce some
comments from Pearu on this topic given in the f2py mailing list in
June/2002:

> So if I use intent(in,out) and I want to avoid any copying etc. when I
> hand a matrix to a Fortran subroutine then I have to declare the matrix
> with the correct (Fortran) dimensions 
                             ^^^^^^^^^^ - wrong 
Should be 'correct (Fortran) data ordering', dimensions are the same both
in Fortran and Python.

> and typecode in python and then pass the (lazy) transpose of this
> matrix to the wrapper since that checks whether the matrix is
> contiguous after another lazy transpose (the first
> set_transpose_strides in the <<< marked region). Is that correct?

No. To avoid copying, you should create array that has internally Fortran
data ordering. This is achived, for example, by reading/creating your data
in Fortran ordering to Numeric array and then doing Numeric.transpose on
that. Every f2py generated extension module provides also function 

  has_column_major_storage

to check if an array is Fortran contiguous or not. If
has_column_major_storage(arr) returns true then there will be no copying
for the array arr if passed to f2py generated functions (assuming that
the types are proper, of cource).

Also note that copying done by f2py generated interface is carried out in
C on the raw data and therefore it is extremely fast compared to if you
would make a copy in Python, even when using Numeric. Tests with say
1000x1000 matrices show that there is no noticable performance hit when
copying is carried out, in fact, sometimes making a copy may speed up
things a bit -- I was quite surprised about that myself.

So, I think, you should worry about copying only if the sizes of matrices
are really large, say, larger than 5000x5000 and efficient memory usage is
relevant. The time spent for copying is negligible even for large arrays
provided that your computer has plenty of memory (>=256MB).


Distutils
---------

Below is an example setup.py file which generates a Python extension module
from Fortran90 sources and a .pyf interface file generated by f2py and later
fine tuned.

>---------------------------------------------------------------------------<
#!/usr/bin/env python
"""Setup script for f2py-processed, Fortran based extension modules.

A typical call is:

% ./setup.py install --home=~/usr

This will build and install the generated modules in ~/usr/lib/python.

If called with no args, the script defaults to the above call form (it
automatically adds the 'install --home=~/usr' options)."""

# Global variables for this extension:
name         = "tree_dim3"  # name of the generated python extension (.so)
description  = "F2PY-wrapped MultiWavelet Tree decomposition/reconstruction"
author       = "Fast Algorithms Group - CU Boulder"
author_email = "fperez@colorado.edu"

# Necessary sources, _including_ the .pyf interface file
sources = """
ctree3dwp.f90 fv2cf3d.f90 evlchildp3d.f90 fvalsbox3d.f90
tree_dim3.pyf
""".split()

# Additional libraries required by our extension modules
libraries = ['treetoolbox','m','io','test']

# Additional directories for libraries (besides the compiler's defaults)
library_dirs = ["~/usr/lib"]

# Set to true (1) to turn on Fortran/C API debugging (very verbose)
debug_capi = 0

#***************************************************************************
# Do not modify the code below unless you know what you are doing.

# Required modules
import sys
from os.path import expanduser,expandvars
from scipy_distutils.core import setup,Extension

expand_sh = lambda path: expanduser(expandvars(path))

# Modify default arguments (if none are supplied) to install in ~/usr
if len(sys.argv)==1:
    default_args = 'install --home=~/usr'
    print '*** Adding default arguments to setup:',default_args
    sys.argv += default_args.split()  # it must be a list

# Additional options specific to f2py:
f2py_options = []
if debug_capi:
    f2py_options.append('--debug-capi')

# Define the extension module(s)
extension = Extension(name = name,
                      sources = sources,
                      libraries = libraries,
                      library_dirs=map(expand_sh,library_dirs),
                      f2py_options = f2py_options,
                      )

# Call the actual building/installation routine, in usual distutils form.
setup(name = name,
      description  = description,
      author       = author,
      author_email = author_email,
      ext_modules  = [extension],
      )
>---------------------------------------------------------------------------<


Makefile
--------

Below is an example of a makefile for f2py.  In the long term, it's probably
better to rely on distutils instead.  Kept here for reference purposes.

>---------------------------------------------------------------------------<
# Building python extension modules with f2py
#
# Note that this has been superceded by using setup.py, which relies 
# on distutils and gets all the proper python things done.
#
F2PY_BUILDDIR = .
F2PY_FLAGS = --fcompiler=Gnu
F2PY = f2py -c --build-dir $(F2PY_BUILDDIR) $(F2PY_FLAGS)

# Additional libraries needed for the f2py extension modules
F2PY_LIBS = -lio

gauss.so: $(SOURCES)
	$(F2PY) $^ $(F2PY_LIBS) -m $(basename $@)
	mv $@ $(PY_DIRLIB)

phipol.so: phipol.f legepols.f
	$(F2PY) $^ $(F2PY_LIBS) -m $(basename $@)
	mv $@ $(PY_DIRLIB)

f2py_clean:
	rm -f $(F2PY_BUILDDIR)/*.o $(F2PY_BUILDDIR)/*module.c \
	$(F2PY_BUILDDIR)/*-f2pywrappers.f *~
>---------------------------------------------------------------------------<


ToDo
----

These are things I either need to find out about or which aren't yet possible
with f2py.

- Adding documentation to a variable's signature entry in the auto-generated
docstring. Also adding a global note to the docstring about the function
itself.  (From Pearu's response, this isn't implemented in f2py yet).

- Overriding the auto-generated compiler flags?

- Multiple entry points?

- Benchmark the cost of:

  * having checks (size checks esp.)
  * building work areas hidden from the user
  * building return values as python objects (Numeric arrays) instead of
working in-place.

--------------060300010007070901050903--