The `Python` web site is the best place to start. Python is easy to use and fast for developing projects. I know people have used this network package and it came highly recommended in a blog about Python packages.

Some places to look

- Python documentation has most links.
- For Imperial College users, try the physics first year lab pages (but access is ridiculously limited, so you may have to ask for access).
- Tutorialspoint has some nice summaries e.g. on string formatting.

- The preferred way seems now to be to use the `.format` method of strings.

      print ' Integer {0:d} Integer {0:5d} Float {1:.3g} Float {1:9.3g} String {2:s}'.format(239,12.356789,'abc')

      numberlines=0
      try:
          print "--- opening edgelist file ",fullfilename
          f=open(fullfilename,"w")
          for edge in edgelist:
              f.write('{0}\t{1}\n'.format(edge[0],edge[1]))
              numberlines=numberlines+1
          print "--- finished writing edgelist file ",fullfilename,numberlines," lines written"
      except:
          print "*** Failed to finish edgelist file ",fullfilename,numberlines," lines written"
      f.close()

To write a line of numbers and strings to a file, use the C-like `%` format

    print('%i \t %i \t %i \t %f \t %f \t %f \t %f \n'%(index,x,y,xcircle,ycircle,xrand,yrand))
    with open(filename,"w") as f:
        f.write('%i \t %i \t %i \t %f \t %f \t %f \t %f \n'%(index,x,y,xcircle,ycircle,xrand,yrand))
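As a rough check of what these format codes produce, and of the same tab-separated write done inside a `with` block (so the file is closed even if a write fails), here is a small sketch. The edge list and the temporary file are made up for illustration, and the code sticks to single-argument `print(...)` so it should run on Python 3 as well as 2.7.

```python
import os
import tempfile

# Width, precision and type codes behave the same way in both styles.
new_style = '{0:5d}|{1:.3g}|{1:9.3g}|{2:s}'.format(239, 12.356789, 'abc')
old_style = '%5d|%.3g|%9.3g|%s' % (239, 12.356789, 12.356789, 'abc')
print(new_style)
print(new_style == old_style)

# Illustrative edge list; any sequence of pairs would do.
edgelist = [(0, 1), (1, 2), (2, 0)]
fullfilename = os.path.join(tempfile.mkdtemp(), 'edgelist.txt')

numberlines = 0
# The with block closes the file even if a write raises an exception.
with open(fullfilename, 'w') as f:
    for edge in edgelist:
        f.write('{0}\t{1}\n'.format(edge[0], edge[1]))
        numberlines = numberlines + 1
print('--- finished writing {0}, {1} lines written'.format(fullfilename, numberlines))
```

The `|` separators are only there to make the padding from the width codes visible.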

- A cheap way to deal with unicode and other non-ascii strings but to remain in an ascii environment is to use

      ustring=u"unicode mess"
      ustring.encode('ascii','replace')

  Note the u in front of the quote to indicate a unicode string.
- For regular expressions use the raw string option of python (r in front of double quotes) and the re package

      import re
      wikilinkregex=r"\[\[.*?[\]\|]"
      text=" abc [[link1|text]] xyz [[link2]]"
      re.findall(wikilinkregex,text)  # returns ['[[link1|', '[[link2]']

- To deal with file names try the `os.path` module.
- To find the version of a module at runtime try this

      import pkg_resources
      pkg_resources.get_distribution("moduleofinterest").version

- When a python file is run (and also when it is imported), all the code will be executed, that is, methods are defined and code outside a method is executed in order. To run a `main` method only when the file is run directly, use the following

      if __name__ == "__main__":
          main()

- There is a standard way to document python code. Look up docstrings; in particular, the docstring conventions are in PEP-257.
- When you show a plot in `matplotlib` python will normally stop executing until you kill the display. To change this behaviour you need to be in interactive mode so you can say

      import matplotlib.pyplot as plt
      plt.ion() # this turns on interactive mode

  then when you do `plt.show()` it will not block the rest of the execution.
- Useful ways to change settings for file locations etc

      import getpass
      username = getpass.getuser()
      import socket
      hostname = socket.gethostname()
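For the file name and file location tips in the list above, here is a minimal sketch of building and taking apart a path with `os.path`. The helper `tagged_filename` and the directory and file names are invented for illustration; tagging output files with the host name is just one possible use of `socket.gethostname()`.

```python
import os.path
import socket

def tagged_filename(directory, filenameroot, ext, tag):
    """Return directory/filenameroot_tag.ext built in a portable way."""
    return os.path.join(directory, '{0}_{1}.{2}'.format(filenameroot, tag, ext))

# Tag output files with the machine name so runs on different machines do not clash.
fullfilename = tagged_filename('output', 'edgelist', 'txt', socket.gethostname())

# Split the full name back into its directory, root and extension.
directory, filename = os.path.split(fullfilename)
root, ext = os.path.splitext(filename)
print(directory)   # the directory part
print(ext)         # the extension, including the dot
```

Using `os.path.join` rather than gluing strings together with `/` or `\` means the same code works on Windows and Unix.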

Google's Python Class has some useful tips on how to set up editors. They suggest that you want the tab key to insert two spaces rather than a tab character. They also suggest that files are saved with the Unix end-of-line convention; if you see an "Unknown option: -" error, the file may have the wrong line ending. For Notepad++ they suggest the following changes

- Tabs: Settings > Preferences > Edit Components > Tab settings, and Settings > Preferences > MISC for auto-indent.
- Line endings: Format > Convert, set to Unix.

I use IDLE which is part of Python. There is a page on IDLE by Anne Dawson which helped me to get going. Note that the command history is obtained using `alt+P`. It's useful when developing code as it is interactive. I often try things out interactively (simple examples to check command syntax or to see if a command behaves as I think it should) then write them into my full programme in Eclipse.

To change directory use the following

    >>> import os
    >>> os.getcwd()
    '/home/user'
    >>> os.chdir("/tmp/")
    >>> os.getcwd()
    '/tmp'

For most work I actually prefer to use a proper IDE (integrated development environment). For Python I use the Eclipse IDE (also useful for Java) with the pydev package added through the Eclipse system of updates; try this help on eclipse and pydev. I have seen many recommendations for this.

Like all IDEs there is a learning curve and a large amount of non-python overhead to learn, always similar but different from IDE to IDE. It is not worth it for the odd project, and not necessary for larger projects. However I do think for any long term work it will repay the investment handsomely. I found the tutorial by Lars Vogel on using eclipse with python a good place to start.

(31/07/13) In fact I tried adding a new project, picked python and eclipse prompted me to go to another window to install pydev.

(27/07/2013) Changed access to python directory to see if it helped adding libraries. Perhaps best to run as administrator when running executable installation routines.

(25/6/14) I cannot get Eclipse to run with the Enthought Canopy python distribution. I can set the path to the correct python location but then Eclipse cannot find the libraries. I tried to set up a parallel cPython installation (64bit Windows version) to use with Eclipse but then some of the Windows install packages for Numpy and so forth only find the Canopy distribution and won't let you change this. This seems to be because the packages are only compiled for 32 bit Windows due to compiler licence restrictions. Currently stuck on this. It's Canopy or Eclipse and I want to stick to an IDE I can use for other things too. Now trying the WinPython distribution with Eclipse.

WinPython and Eclipse. I found some pretty good instructions on how to link Eclipse and pydev to WinPython. I placed my WinPython in `C:\WinPython` so I needed to point pydev to `C:\WinPython\python-2.7.6.amd64\python.exe`. I'm pretty sure the autoconfig will work and that the key here is to **restart eclipse** after making these changes. Perhaps that is true for other reconfigurations. I didn't mess around with grammars and setting explicit version of Python though that could be useful if I need a standard 32bit cPython installation for something later.

I have been running into trouble with different versions of packages being accessed and it is hard to see from python what version of a package your system has found. It is much easier to switch to the Enthought Python package which installs the scientific packages by default and maintains them to the latest versions.
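One quick way to see which copy of a package Python has actually picked up is to import it and look at where it was loaded from. The sketch below assumes the package follows the common (but not guaranteed) convention of setting a `__version__` attribute; the helper name `describe_module` is made up, and `json` is used only because it is always available.

```python
import importlib

def describe_module(module_name):
    """Return (file location, version string or None) for an importable module."""
    module = importlib.import_module(module_name)
    location = getattr(module, '__file__', None)      # where the module was loaded from
    version = getattr(module, '__version__', None)    # None if the package does not set it
    return location, version

location, version = describe_module('json')
print(location)
print(version)
```

Calling `describe_module('networkx')` or `describe_module('numpy')` in the same way shows whether the copy being imported is the one you meant to install.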

The best way is to use the `easy_install` command mentioned on many web pages. What they fail to tell you is that this is part of another package that you have to install first. So first try to install setuptools from the Python package index. This worked easily enough. Note that you download a script for a python programme so you need python installed to run it (either double click on the file `ez_setup.py` or run it via a command window `python ez_setup.py`). Before you use `easy_install` you may have to `import setuptools` inside Python.

(27/07/2013) I could not get `easy_install` to work inside python and could not find any Windows executables to download.
However running the `ez_setup.py` python file creates an `exe` file in the `Scripts` subdirectory of the main python directory. Then I ran this from a Windows command line to install packages. For instance I used

    easy_install networkx

followed by

    easy_install --upgrade networkx

from the command line to install networkx (for some reason it first installed an old version).

Alternatively, most major packages come with a Windows installer; just make sure you pick the one for the correct version of Python (it's obvious from the file names). The main Python package worked fine and I have a version 2.7 with the IDLE GUI interface. Just make sure that `c:\Python27` and its `Scripts` subdirectory are on the PATH environment variable.

Finally some suggest using pip. However again this is not a python package or command but something set up outside python. You get errors containing lines such as `NameError: name 'pip' is not defined` if you try this. I accessed this from the command window in Windows and this was in `C:\python27\Scripts`. So I changed to this directory and then typed `pip install networkx`. In case you have already been messing around then you might need to upgrade so use `pip install networkx --upgrade`.

There is a useful blog about Python packages.
Scientific libraries of interest include (watch the order you add them, one may depend on the other so best use `easy_install`).

- `numpy` numerical arrays, often used in other packages. I found that there was an easy Windows executable but I would probably now use `easy_install`. Numpy documentation is alright but I have found some things hard to follow. My Scipy tips are below.
- `scipy` general package for maths, science and engineering. I found that there was an easy Windows executable but I would probably now use `easy_install`. Numpy and Scipy documentation is OK but I have found some things hard to follow so see my Scipy tips below.
- `Matplotlib` is a 2D plotting library producing publication quality figures in many formats.
- NetworkX

  Describes itself as "High productivity software for complex networks". It's a free package for Python. Lots of information on the web. If you have a problem, google it and you will probably find something relevant.

  I installed by downloading the source code, following instructions on the web site. The `easy_install` suggested on the web site would be easier and better.

  The problem I found is that there is no immediate visualisation here. You seem to have to install other packages to link through to Graphviz drawing programmes or matlab graph drawing.
- `powerlaw` has a python package which needs several of the packages above plus `mpmath`.
- `pydot`.
- `Pycluster`
- `RPy`
- `scikit-learn` an easy to use machine learning library recommended in Kaggle.

The `loadtxt` command is an easy way to read in text data

    import numpy as np
    tweets, authors = np.loadtxt('c:/data/textdata.txt', float, skiprows=1, usecols=(0,1), unpack=True)

This will give two arrays, tweets and authors, with tweets being the first column (numbered 0 by loadtxt) and authors being the second column (numbered 1 internally). Both will be of the float data type, as set by the second argument. The skiprows option defaults to 0 but if there is a row of column headings then you need to set skiprows=1. If you have missing entries then the more general routine `genfromtxt` is needed

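    import numpy as np
    missingCode = 'missing'
    # skip_header and missing_values are the current argument names (older Numpy used skiprows and missing)
    jid1, rating1 = np.genfromtxt('c:/data/textdata.txt', np.str_, skip_header=1, usecols=(0,1), unpack=True, missing_values=missingCode, invalid_raise=False)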
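To try the two readers without needing a file on disk, the sketch below feeds them made-up text through `io.StringIO`. The column names, the numbers and the `'missing'` marker are all invented for illustration, and the data is read as floats so the missing entry can be filled with `nan`.

```python
import io
import numpy as np

# Made-up two-column data with a header row; the second data row has a missing entry.
text = u"tweets authors\n10 3\nmissing 7\n40 12\n"

# genfromtxt replaces the entry marked by missing_values with filling_values.
tweets, authors = np.genfromtxt(io.StringIO(text), dtype=float, skip_header=1,
                                usecols=(0, 1), unpack=True,
                                missing_values='missing', filling_values=np.nan)
print(tweets)
print(authors)

# loadtxt is enough when every row is complete.
complete = u"tweets authors\n10 3\n40 12\n"
t2, a2 = np.loadtxt(io.StringIO(complete), dtype=float, skiprows=1,
                    usecols=(0, 1), unpack=True)
print(t2)
print(a2)
```

Note the different spelling of the header-skipping argument: `skiprows` for `loadtxt` but `skip_header` for `genfromtxt`.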

Statistical distributions are a bit odd at first so read the introduction on Scipy statistical distributions. All statistical distributions have a `name` (such as `uniform`, `norm` and `lognorm`) and various functions giving the pdf, cdf or a sequence of random numbers drawn from the distribution, e.g. in Scipy these would be called `name.pdf`, `name.cdf` and `name.rvs`. You need to import them to use them

    >>> from scipy import stats
    >>> from scipy.stats import name

where `name` stands for the distribution you want, e.g. `norm`.

All statistical functions take two special arguments, the shifting (`loc`) and scaling (`scale`) parameters. Suppose our random variable is `x` and `name.pdf` gives the function p(x), that is with probability p(x)dx we will get a value between x and x+dx. Then the standard python command to get the value of `p(x)` is `name.pdf(x)`, which is shorthand for `name.pdf(x,loc=0,scale=1)`. Now suppose `p(x)` has a fixed mean of *mu* and a standard deviation of *sigma* and the python `name` routines have no additional parameters to change this. What you need to do is use the shift and scale to get a distribution of the same shape but with a different position and width. That is, we need to use something like

name.pdf(X,loc=L,scale=S)

*Warning* this is *not* the same as `name.pdf((X-L)/S,loc=0,scale=1)`.
When working with the pdf of a continuous distribution there is a subtle difference, as the two expressions are densities with respect to different variables. Write `x` for the standard variable and `X=L+S*x` for the shifted and scaled one, so that `x=(X-L)/S`. The second form, `name.pdf((X-L)/S,loc=0,scale=1)`, is just the original standard pdf function `p` evaluated at a different position, `x=(X-L)/S`. It is still a density per unit `x`, so the `dx` in `p(x)dx` is exactly the same as in the standard `loc=0`, `scale=1` function. The first form, `name.pdf(X,loc=L,scale=S)`, is a distribution `q(X)` defined in terms of the new variable `X`. To define a distribution in terms of a new variable yet keep the probabilities the same in corresponding intervals we must demand that `q(X)dX=p(x)dx`. That means `name.pdf(X,loc=L,scale=S)` is the function `q(X)=p((X-L)/S)/S`, which automatically includes the factor `dx/dX=1/S` needed to convert from the form `p(x)` in one variable to the form in the new variable. This form, `name.pdf(X,loc=L,scale=S)`, is probably the version you want. Note that for other distribution functions, such as the cdf and rvs, the relation between the two forms can be simpler, but it is best to stick to the `name.pdf(X,loc=L,scale=S)` form in all cases.

One way to see this is to realise that if `p(x)` has mean `mu` and standard deviation `sigma`, then `q(X) = name.pdf(X,loc=L,scale=S)` has mean `L+S*mu` and standard deviation `S*sigma`.

Another way to see this is to look at the uniform distribution. The standard Scipy form is `p(x)=1` if `0 < x < 1` and is zero otherwise. With `x=(X-L)/S` we find that the associated pdf in the `X` variable is `q(X)=1/S=dx/dX` if `L < X < (L+S)` and is zero otherwise. Indeed we find that

    >>> from scipy import stats
    >>> from scipy.stats import uniform
    >>> uniform.pdf(0.5)
    1.0
    >>> uniform.pdf(0.5,loc=0.25,scale=2)
    0.5
    >>> uniform.pdf(0.125)
    1.0

The last two are evaluated at the same standardised value (0.5-0.25)/2=0.125 but, as we have said, represent distributions in different spaces.
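The relation between the two forms can also be checked numerically: the pdf with `loc` and `scale` should equal the standard pdf evaluated at `(X-L)/S` divided by `S`, while the cdf needs no extra factor. The sketch below uses `norm`; the values of `L`, `S` and `X` are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

L, S = 0.25, 2.0                     # illustrative shift and scale
X = np.linspace(-3.0, 4.0, 8)        # points in the shifted and scaled variable

shifted = norm.pdf(X, loc=L, scale=S)      # density per unit X
standard = norm.pdf((X - L) / S)           # standard density evaluated at (X-L)/S

# The two pdf forms differ by the Jacobian factor 1/S ...
pdf_matches = np.allclose(shifted, standard / S)
# ... while the cdf needs no such factor.
cdf_matches = np.allclose(norm.cdf(X, loc=L, scale=S), norm.cdf((X - L) / S))
print(pdf_matches)
print(cdf_matches)

# Mean and standard deviation become L + S*mu and S*sigma (mu=0, sigma=1 for norm).
print(norm.mean(loc=L, scale=S))
print(norm.std(loc=L, scale=S))
```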

In general the distributions in python represent a family of distributions and accessing different members of the family is done by what are called *shape* arguments. This can be confusing as you might think that `loc` and `scale` change the shape too; certainly plots of `name.pdf(x,loc=0,scale=1)` and `name.pdf(x,loc=L,scale=S)` will not be the same. However Scipy wants to regard these as straightforward changes of variable, not as fundamentally different shapes, so you have to get used to this division of parameters. Take the gamma distribution, which on Wikipedia is defined in different ways in terms of different sets of parameters. One parameter is always Scipy's shape parameter. The other parameter discussed in the definitions in the literature is achieved by Scipy's scale parameter (and is called scale in many cases for the gamma distribution). However in all cases, this gamma distribution is defined as zero for `x<0`. So should one need a gamma distribution shape but starting at `x=Z` you would need to call `gamma.pdf(x-Z,n)=gamma.pdf(x,n,loc=Z,scale=1)` where `n` is the sole shape parameter needed by the Scipy gamma distribution routine. What is this shape parameter doing? You have to look at the documentation where it is written out. In general if there is more than one shape parameter, as there can be for more complicated distributions, then the shape parameters are passed one after another as extra positional arguments after `x`, e.g. `beta.pdf(x,a,b)`.

**Example: lognormal**
The manual gives this as a function of one variable x and one shape value, `s`, where

    lognorm.pdf(x, s) = 1 / (s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2)

so in general we have that

    lognorm.pdf(X, sigma, loc=L, scale=S) = 1 / (sigma*(X-L)*sqrt(2*pi)) * exp(-1/2*(log((X-L)/S)/sigma)**2)

Note the following

- The second form is in terms of the variable `y=(X-L)/S`, which explains why there is no factor of S with the `(X-L)` outside the exponential: it is cancelled by the `dy/dX` factor.
- Looking at the first form, the standard Scipy lognormal `lognorm.pdf(x, s)`, it looks like a normal distribution in terms of a variable `ln(x)` with mean m=0 and standard deviation s. In fact lognormals are usually defined with a mu-like parameter m as

      1 / (s*x*sqrt(2*pi)) * exp(-1/2*( (log(x)-m)/s)**2)

  That is, m and s are the mean and standard deviation of the variable z=ln(x). However the mean and standard deviation of the x variable are not these parameters. For instance the mean of x is `exp(m+(s*s/2))` not m (zero for the Scipy form) and the variance is `[exp(s*s)-1]exp(2m+(s*s))` not s*s.
- From this we see that the general form of the lognormal is obtained in Scipy by setting the loc parameter to zero and the scale parameter equal to exp(m), where again m is not the mean of the lognormal (it's the mean of log(x)) but it is the standard parameter used when describing lognormals

      1 / (s*x*sqrt(2*pi)) * exp(-1/2*( (log(x)-m)/s)**2) = lognorm.pdf(x, s, loc=0, scale=exp(m) ) = 1 / (s*x*sqrt(2*pi)) * exp(-1/2*(log(x/scale)/s)**2)

    >>> from scipy.stats import lognorm
    >>> lognorm.mean(1,loc=0,scale=1)
    1.6487212707001282
    >>> lognorm.std(1,loc=0,scale=1)
    2.1611974158950877
    >>> from numpy import exp
    >>> exp(0.5)
    1.6487212707001282
    >>> from numpy import sqrt
    >>> sqrt((exp(1)-1)*exp(1))
    2.1611974158950877
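The same check can be made for a lognormal with a non-zero `m` by passing `scale=exp(m)`. This is only a sketch: the values of `m`, `s` and the sample points `x` are arbitrary, and the hand-written density is the textbook formula quoted above.

```python
import numpy as np
from scipy.stats import lognorm

m, s = 0.7, 0.4          # illustrative mean and standard deviation of log(x)
x = np.array([0.5, 1.0, 2.0, 5.0])

# Textbook lognormal density written out by hand.
by_hand = np.exp(-0.5 * ((np.log(x) - m) / s) ** 2) / (s * x * np.sqrt(2 * np.pi))
# The same density from Scipy: shape s, loc=0, scale=exp(m).
from_scipy = lognorm.pdf(x, s, loc=0, scale=np.exp(m))
print(np.allclose(by_hand, from_scipy))

# Mean and variance of x itself, compared with the closed forms.
mean_ok = np.isclose(lognorm.mean(s, loc=0, scale=np.exp(m)), np.exp(m + s * s / 2))
var_ok = np.isclose(lognorm.var(s, loc=0, scale=np.exp(m)),
                    (np.exp(s * s) - 1) * np.exp(2 * m + s * s))
print(mean_ok)
print(var_ok)
```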

I like to save a plot in several file formats so I don't need to rerun the code: `pdf` is my current vector format of choice for LaTeX documents, `svg` so I can edit the file in a vector package like `Inkscape`, `jpg` form for presentations and quick discussions, and so forth. Each is specified in `matplotlib` by the standard extension used for that format. So to this routine you pass a list of strings containing the desired file types, something I usually set as a global at the start of my programme. The routine I use is below but you may want to adapt it so you can pass or set other parameters e.g. dpi for bitmap formats.

    def saveFigure(plt,filenameroot,extlist=['pdf'],messageString='Plot'):
        """Save figures as files in different formats

        Inputs
        plt - a plot
        filenameroot - the full name of the file to be used but without the extension
        extlist=['pdf'] - a list of strings, each string is the extension of an allowed graphics type
        messageString='Plot' - string to print before printing name of file being created.
                               Note empty string will produce no message at all.

        Output
        For every string, ext, in the extlist, it will produce the plot from plt
        in the format specified by the extension, ext, in the file in filenameroot.ext
        """
        for ext in extlist:
            if filenameroot.endswith('.'):
                plotfilename=filenameroot+ext
            else:
                plotfilename=filenameroot+'.'+ext
            if len(messageString)>0:
                print messageString+' file '+plotfilename
            plt.savefig(plotfilename)

The following is an outline of how I use this.

    import matplotlib.pyplot as plt
    extlist=['pdf','jpg']
    screenOn=False
    if screenOn:
        print '--- plots shown on screen'
    else:
        print '--- plots not shown on screen'
    # http://matplotlib.org/examples/color/named_colors.html
    colourList = ['red', 'green', 'blue', 'cyan', 'magenta', 'brown', 'black', 'pink', 'purple', 'yellow']
    # (stuff)
    fig, ax = plt.subplots(figsize=(15, 15)) # set size
    ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
    ax.plot(X, Y, linestyle='', mec='none',c=colourList, ms=sizeList)
    ax.set_aspect('auto')
    if len(extlist)>0:
        saveFigure(plt,fullfilenameroot,extlist)
    if screenOn:
        plt.show()
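One possible adaptation of the routine, along the lines suggested above, is to pass the figure object itself and a `dpi` value. The sketch below is an assumption-laden variant, not the routine I actually use: the name `save_figure`, the default `dpi=150`, the toy data and the temporary directory are all made up, and it selects the non-interactive `Agg` backend so it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')            # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

def save_figure(fig, filenameroot, extlist=('pdf',), dpi=150):
    """Save fig once per extension in extlist and return the file names written."""
    written = []
    for ext in extlist:
        plotfilename = filenameroot.rstrip('.') + '.' + ext
        fig.savefig(plotfilename, dpi=dpi)   # dpi only matters for bitmap formats
        written.append(plotfilename)
    return written

# A toy plot standing in for the real data.
fig, ax = plt.subplots(figsize=(4, 4))
ax.plot([0, 1, 2], [0, 1, 4], linestyle='', marker='o')
ax.margins(0.05)

fullfilenameroot = os.path.join(tempfile.mkdtemp(), 'example')
files = save_figure(fig, fullfilenameroot, extlist=['pdf', 'png', 'svg'])
plt.close(fig)
print(files)
```

Returning the list of file names makes it easy to log or check what was written.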