Showing posts with the label python. Show all posts

Tuesday, 17 June 2014

MacPorts, Homebrew (and easy_install)

MacPorts:
Recently I tried to install the HDF and netCDF software on Mac OS X 10.9.2.
1) After installing the dependency HDF5-1.8 manually, I tried to install h5py manually as well. But I ran into a problem that seems related to the use of Clang (with the gcc/cc compilers) and some long compilation arguments.

2) I decided to use MacPorts instead.

3) I installed the Xcode command line tools (Xcode was already installed).

4) Then I installed MacPorts following the recommendations on the MacPorts web site.

5) Several port commands are useful:
port search: lists the software available for installation
port info software_full_name_in_mac_repertory: gives useful information about the software
port deps software_full_name_in_mac_repertory: lists the dependencies needed by this software
sudo port install software_full_name_in_mac_repertory: installs the software
port contents software_full_name_in_mac_repertory: lists where the files of the software are installed

Note:
-- MacPorts installs software in /opt/local/bin and /opt/local/sbin, which should be included in the PATH set in the bash profile
-- MacPorts installs Python packages in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages, which should be included in the PYTHONPATH set in the bash profile
-- to check where the site-packages directories are, use the following Python command:
import site; site.getsitepackages()
6) h5py was installed successfully with MacPorts!
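
The PATH note above can be checked from Python. This is only a sketch: the helper name missing_from_path is made up for illustration, and the two directories are the MacPorts defaults mentioned in the note.

```python
import os

# Default MacPorts binary directories, as listed in the note above.
MACPORTS_DIRS = ["/opt/local/bin", "/opt/local/sbin"]

def missing_from_path(path_value, required=MACPORTS_DIRS):
    """Return the required directories that are absent from a PATH string."""
    entries = [p.rstrip("/") for p in path_value.split(os.pathsep) if p]
    return [d for d in required if d not in entries]

# Check the PATH of the current environment; an empty list means both are present.
print(missing_from_path(os.environ.get("PATH", "")))
```

If the list is not empty, the missing directories still need to be added to PATH in the bash profile.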


--------------
easy_install:
It is a tool for installing Python packages on Linux and Mac. The path where the packages are installed can be set in the file ~/.pydistutils.cfg.

For example, I used the following path:
[install]
install_lib = /Users/username/work/local/lib/python2.7/site-packages
#~/Library/Python/$py_version_short/site-packages

install_scripts = ~/work/local/bin


----
Homebrew:
It is similar to MacPorts and works well for installing and managing software on Mac OS X. It is also easy to install once Xcode and the Xcode command line tools are installed. The default installation path is /usr/local, which is different from MacPorts.






Tuesday, 4 February 2014

deal with very small values in python


sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.2204460492503131e-16, radix=2, rounds=1)

1+sys.float_info.epsilon  -----> 1.0000000000000002

1+sys.float_info.epsilon/10  -----> 1.0


To use and compute with very small values in python:

1) use the decimal module:
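The default precision of decimal is 28 significant digits, and we can change it. For example, to define the Planck constant, 6.62606957e-34, we can do:
import decimal
import numpy as np

decimal.getcontext().prec = 46
h = decimal.Decimal('6.62606957e-34')    # Planck constant (J s)
# kb and c are not defined in the original note; the values below are assumed
kb = decimal.Decimal('1.3806488e-23')    # Boltzmann constant (J/K)
c = decimal.Decimal('299792458')         # speed of light (m/s)

wavelength = np.arange(0.0001, 100)
# numpy works with floats, so convert c for the array division
frequency = float(c)/wavelength
Iv = 5
tb = []
for v in frequency:
    # convert each numpy float to Decimal before mixing it with h, kb and c
    temp_v = decimal.Decimal(str(v))
    temp_tb = h*temp_v/kb/(1+2*h*temp_v**3/Iv/c**2)
    #temp_tb_float = float(temp_tb)
    tb.append(temp_tb)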




Attention:
i) a Decimal cannot be combined in arithmetic with a float, for example a numpy.float64 value; both operands have to be converted to the same type first.
ii) the result of a decimal computation is a Decimal object, not a float; it can be converted directly with float().
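
Point i) can be seen in a few lines. This is a minimal sketch with a made-up frequency value; a plain float stands in for a numpy.float64 here.

```python
from decimal import Decimal

h = Decimal("6.62606957e-34")
v = 2.5e14  # a plain float, standing in for a numpy.float64 frequency

try:
    h * v  # Decimal and float cannot be mixed in arithmetic
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Convert the float through str() so that both operands are Decimal
energy = h * Decimal(str(v))
# Convert back to float once the high-precision part is done
energy_float = float(energy)
print(mixed_ok, energy, energy_float)
```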


2) use the C99 concept of nextafter to compute an epsilon appropriate for the magnitude of the value. In Python, either numpy (numpy.nextafter) or the Decimal class (next_plus, next_minus) can be used for this.
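
A small sketch of the nextafter idea, assuming Python 3.9 or later, where math.nextafter is available in the standard library (numpy.nextafter can be used the same way). The helper name spacing_at is made up for illustration.

```python
import math
import sys
from decimal import Decimal

def spacing_at(x):
    """Gap between the float x and the next representable float above it."""
    return math.nextafter(x, math.inf) - x

eps_at_one = spacing_at(1.0)             # the same value as sys.float_info.epsilon
eps_at_h = spacing_at(6.62606957e-34)    # far smaller: the spacing scales with the value

# Decimal offers next_plus() for the same idea at the current context precision
next_decimal = Decimal(1).next_plus()
print(eps_at_one, eps_at_h, next_decimal)
```

So sys.float_info.epsilon is only the right tolerance near 1; for a value as small as the Planck constant, the spacing at that value is the relevant one.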



Thursday, 12 September 2013

Plot one curve in different colors using Python


In order to plot a curve in different colors, with the colors varying with some parameter, it seems that the best way is to use the LineCollection class of matplotlib.

In my problem I have an array residuals_resume_utdates_geoindex: the first column is dates (x), the second are residuals (y), and the third is the parameter to condition the residuals. I want to plot x, y in different colors when the values in the third column vary. In fact the values are between 0 and 5 (integer) in the third column. If the value in this column is 0, then the corresponding residuals should be plotted in one color; if the value is 1, a different color should be used for the corresponding residuals.

Here is my code:


import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import ListedColormap, BoundaryNorm


ydata = residuals_resume_utdates_geoindex[:,1]
xdata = residuals_resume_utdates_geoindex[:,0]
# put x and y together as an array of N*1*2, where N is the number of data in ydata
points = np.array([xdata, ydata]).T.reshape(-1, 1, 2)
# create a line segment array in order to color them individually
# the segment array has the shape (N-1)*2*2 for line collection
# [ [[x0,y0],[x1,y1]],[[x1,y1],[x2,y2]],[].....]
segments = np.concatenate([points[:-1], points[1:]], axis=1)


# plot y with a different color for each of the 6 indices (5,4,3,2,1,0)
# 0: black, 1: yellow, 2: cyan, 3: magenta, 4: green, 5: red, bigger than 5: blue
cmap = ListedColormap(['k','y','c','m','g', 'r', 'b'])
norm = BoundaryNorm([-1, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6], cmap.N)
lc = LineCollection(segments, cmap=cmap, norm=norm)
# define the color for each segment which depends on the value of the third column
color_seg = residuals_resume_utdates_geoindex[:,-1]
# set the linecollection property: color, linestyle
lc.set_array(color_seg)
lc.set_linewidth(3)
lc.set_linestyle('dashdot')

#plot the figure with linecollection
fig =  plt.figure()
plt.gca().add_collection(lc)
plt.xlim(np.min(xdata), np.max(xdata))
plt.ylim(0, 1)

Monday, 17 December 2012

display colorbar with python


I use pcolor, map.contourf, or map.imshow to draw 2-D arrays in color. To show the color bar, I defined a small function cbar() as below. Whenever I need to display the color bar, I call cbar() directly.


import pylab

def cbar():
    """
    Displays a colorbar.
    """
    fig = pylab.gcf()
    if len(fig.axes) > 1:
        # reuse the second axes of the figure for the colorbar
        fig.axes[1].clear()
        cax = fig.axes[1]
    else:
        # left, bottom, width, height, in normalized (0, 1) figure units
        #cax = pylab.axes([0.875, 0.1, 0.05, 0.75])
        cax = pylab.axes([0.92, 0.1, 0.025, 0.75])

    pylab.colorbar(cax=cax)

Tuesday, 26 June 2012

use variables in the os.system in python: combine linux and python

In Python we can use os.system to call a Linux command. For example,
import os
cmd = 'echo Good'
os.system(cmd)

it will print "Good" on the terminal.

If we would like to use variables (already defined in the Python code) in a Linux command, we need to insert them into the command string with the % string-formatting operator.
For example, we would like to call /home/username/bin/xyz2flh/xyzflh to convert positions from XYZ to latitude/longitude/height. This command asks for 4 inputs: the option for the reference ellipsoid (followed by the Enter key), then three numbers for the X, Y and Z positions. The option for the reference ellipsoid is 2.

Suppose the positions are defined in sx, sy and sz in the Python code; we then define a cmd to be used by os.system:

import os, sys
sx = -5246416.97098
sy= -3077275.36744 
sz = -1913808.08861


cmd = ('printf "%s\n%s %s %s" | /home/username/bin/xyz2flh/xyzflh' %(2, sx, sy, sz))

os.system(cmd)

The output of this Python code looks like:

Ellipsoide de reference:
     WGS84 (ae = 6378137.00 ; 1/f = 298.257222) : tapez 1
  ITRF2005 (ae = 6378136.46 ; 1/f = 298.257650) : tapez 2
     autre..................................... : tapez 3
  entrez x y z en metres ou latitude(deg) longitude(deg) hauteur(m)
  latitude (deg):   -17.5767433707736  
  longitude (deg):    210.393669153517  
  hauteur (m):    96.8535648984835  


Attention: the format of the string needs to be correct, with the inputs separated by a newline (the Enter key) and spaces, exactly as the program expects them.
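
The same idea can also be written with the subprocess module, which passes the text to the program's standard input without going through printf and a shell pipe. This is a sketch, not the method used above: it assumes Python 3.7 or later, the helper names build_input and run_with_input are made up, and a child Python process that echoes its input stands in for xyzflh.

```python
import subprocess
import sys

def build_input(option, x, y, z):
    """Text typed at the prompts: the option, Enter, then the three coordinates."""
    return "%s\n%s %s %s\n" % (option, x, y, z)

def run_with_input(command, text):
    """Run command (a list of arguments) and feed text to its standard input."""
    result = subprocess.run(command, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout

# Stand-in for /home/username/bin/xyz2flh/xyzflh: a child Python that echoes its input.
echo_command = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read())"]
sx, sy, sz = -5246416.97098, -3077275.36744, -1913808.08861
output = run_with_input(echo_command, build_input(2, sx, sy, sz))
print(output)
```

With the real program, echo_command would be replaced by ['/home/username/bin/xyz2flh/xyzflh'], and output would hold the latitude, longitude and height text for further parsing.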



Thursday, 5 April 2012

basemap for plotting contour maps in python

1) the page with extensive explanations of basemap is:
http://matplotlib.sourceforge.net/basemap/doc/html/api/basemap_api.html

2) a page with useful examples:
http://matplotlib.github.com/basemap/users/examples.html

3) In the python code, these lines are needed to call the basemap

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

Thursday, 22 March 2012

change the tick labels in python

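from pylab import figure
from matplotlib.ticker import MaxNLocator

# data is the sequence being plotted; its length sets the maximum number of tick intervals
temp_figure = figure()
ax1 = temp_figure.gca()
ax1.xaxis.set_major_locator(MaxNLocator(len(data)))


# style the tick labels
for templabel in ax1.xaxis.get_ticklabels():
    templabel.set_fontsize(8)
    templabel.set_color('red')
    templabel.set_rotation(45)

# style the tick marks
for line in ax1.xaxis.get_ticklines():
    line.set_color('red')
    line.set_markersize(25)
    line.set_markeredgewidth(3)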

NOTE: MaxNLocator sets the maximum number of *intervals*, so the maximum number of ticks is the maximum number of intervals plus one.

Friday, 30 September 2011

data types in python

the built-in numeric data types in Python 2 are: int, long, float, complex.
1) to find the precision of floating point numbers:
import sys
sys.float_info.dig
this returns 15, which means that a float can faithfully represent up to 15 significant decimal digits; beyond this, the digits are not reliable.

http://docs.python.org/library/stdtypes.html for detailed info
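
A short sketch of what the 15 digits mean in practice. The sample numbers are made up for illustration.

```python
import sys

digits = sys.float_info.dig   # number of decimal digits a float can carry faithfully

# A string with 15 significant digits survives the trip through a float unchanged
s15 = "0.123456789012345"
round_trip_ok = format(float(s15), ".15g") == s15

# 2**53 + 1 needs 16 significant digits and cannot be stored exactly:
# it collapses onto the neighbouring float 2**53
collapsed = float(2**53 + 1) == float(2**53)
print(digits, round_trip_ok, collapsed)
```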

Friday, 15 October 2010

An example of processing a large volume of text data

I have some data saved in text format with the dimensions 300*7*100*150, which represent time*components*latitudes*longitudes. I want to find the data point in the domain at a specific time and location.

Luckily I can find the indices of the point with other calculations, i.e., time_ind, component_ind, latitude_ind and longitude_ind. The question is how to find the value of this point efficiently. The old way was to read in the whole dataset, then find the value. Since we have several large arrays like this, this method takes a huge amount of memory.

The more efficient way is (for example in Python):
-- First compute the index of the data point in the flattened dataset, which is time_ind*components*latitudes*longitudes + component_ind*latitudes*longitudes + latitude_ind*longitudes + longitude_ind, where components, latitudes and longitudes are the sizes of those dimensions (7, 100 and 150 here)

-- f = open('file.txt','r') --- create the file object
-- temp = f.next() --- read the next piece of data (in Python 3, use next(f))
-- set up a loop that reads the data file with the next() method while a counter 'count' increases. When 'count' equals the index of the data point, the loop stops.
-- use the value of temp as the output result.

With this method the machine reads only a small portion of the text data at a time (the length is determined by the text data format) and uses very little memory; f.next() is also fast. In the 'top' output, only 1% of memory is used, compared to possibly 30% with the first method. A test with the 'time' command shows that the run time is much reduced as well.
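
The steps above can be sketched as follows. This assumes the file stores one value per line, which the post does not state; the function names flat_index and read_value_at are made up for illustration, and itertools.islice takes the place of the manual counter loop.

```python
from itertools import islice

def flat_index(time_ind, component_ind, latitude_ind, longitude_ind, shape):
    """Index of a point in the flattened (time, component, latitude, longitude) data."""
    _, components, latitudes, longitudes = shape
    return ((time_ind * components + component_ind) * latitudes
            + latitude_ind) * longitudes + longitude_ind

def read_value_at(path, index):
    """Return the text of line number index (0-based) without loading the whole file."""
    with open(path, "r") as f:
        # islice skips the first index lines lazily, one line at a time
        for line in islice(f, index, index + 1):
            return line.strip()
    raise IndexError("index %d is beyond the end of %s" % (index, path))
```

For the dataset described here, the call would look like read_value_at('file.txt', flat_index(time_ind, component_ind, latitude_ind, longitude_ind, (300, 7, 100, 150))).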

A tip: for a large range used in a loop, 'xrange' uses less memory than 'range' and is slightly faster (this applies to Python 2; in Python 3, range itself is lazy).

Tuesday, 24 August 2010

using clf() to reduce memory use, matplotlib

I was processing a large number of files with Python, using a for loop that reads, calculates and plots at each iteration and then saves the figures to my computer. I run the code in IPython, with matplotlib in interactive mode and without displaying the figures (i.e., without show() in the code). The default backend is GTKAgg.

A simple problem that I ran into: the computer used more and more memory as the iteration number increased. This risks crashing the system if there are a lot of files to process. A Python profiler showed that the built-in method "write_png" took a larger share of the time than any other function. I realised that I did not clear the figures in each iteration, and this caused the memory problem! So I added a line at the end of each iteration, i.e., clf(), to clear the figure (note: clf() is a better choice here than close(), because we do not use show() to display the figures and there is no reason to close a figure which is not displayed).

This indeed solved the memory problem and the code runs quickly; at the end of the run we can exit IPython gracefully. The same code also runs without problems using plain python.

Friday, 23 July 2010

update python dictionary in a for loop

I have a piece of text data, indexed by blocks, i.e., organised as [block time obs1 obs2].
There are 50 blocks, and each block has 100 lines of data, so the text data has the size 5000 * 4. I would like to select some blocks of data and establish a dictionary for the selected data, i.e., the selected block indices are the keys, and for each key the arrays [time obs1 obs2] are the values.

I first defined the data array, data, with shape (5000, 4), and the selected keys, for example, allkeys = ['0', '1', '2']

Then I try to create a dictionary for the selected data:
data_blks = {}
for i in range(len(allkeys)):
    temp_key = allkeys[i]
    data_blks[temp_key] = data[where(data[:,1] == int(temp_key))]

This produces the keys and values correctly in most cases. But it does not work in some cases, for example if allkeys is defined as ['40', '41']. In this case, the output data_blks has the keys ['41','40'], in the wrong order! I tested the output of each line and it seemed that the line
data_blks[temp_key] = data[where(data[:,1] == int(temp_key))]
twisted the key order. At first I found no reason why it would do so.

To solve this problem, I needed a way to insert each key and its values at a chosen position. Before writing one myself, I found a useful tool: http://www.voidspace.org.uk/python/odict.html#downloading. It has a method called "insert", which allows me to insert the index, the key and its values in one simple line: data_blks.insert(index, key, values).
I imported this module into my code and redefined the dictionary as
data_blks = OrderedDict([])
Then in the for loop, I just insert the key and values for each index; it works beautifully and the previous problem is gone!

The reason the classic method appears to fail is that a plain dict in this version of Python makes no promise about key order: the values are stored under the right keys, but iterating over the dictionary can return the keys in any order. Plain dicts only keep insertion order from Python 3.7 onwards.
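
The standard library has offered collections.OrderedDict since Python 2.7, so the same result is possible without a third-party module. The sketch below is not the code used in this post: it uses a handful of made-up rows in the layout [block, time, obs1, obs2] instead of the real 5000-row array.

```python
from collections import OrderedDict

# Made-up rows in the layout [block, time, obs1, obs2]
rows = [
    (40, 0.0, 1.1, 2.1),
    (41, 0.0, 1.2, 2.2),
    (40, 1.0, 1.3, 2.3),
    (41, 1.0, 1.4, 2.4),
    (42, 0.0, 1.5, 2.5),
]
allkeys = ['40', '41']

data_blks = OrderedDict()
for temp_key in allkeys:
    # keep [time, obs1, obs2] of every row that belongs to this block
    data_blks[temp_key] = [row[1:] for row in rows if row[0] == int(temp_key)]

# The keys come back in the order in which they were inserted
print(list(data_blks.keys()))
```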

Thursday, 22 July 2010

matplotlib show() freezes the python shell

I use python, pylab and matplotlib for plotting. One disturbing thing is: when I start ipython and then run some plotting code, e.g.,

x=[1,2,3]
plot(x)
show()

The figure is shown on the screen, but the Python shell is frozen and I cannot type anything in the command window. I looked for the reason and found that it is a common GUI issue with threads.

So I added this to my bashrc script:
alias ipython='ipython -pylab'

The problem is fixed !

Thursday, 15 July 2010

C code with Python

Notes: if we have C source code that we want to execute from Python,
we can either:
a) Build a library and create a Python wrapper so it looks like a module, e.g., with SWIG
b) Build an executable file and call it using subprocess/popen etc.
http://docs.python.org/library/subprocess.html#module-subprocess
c) If it's one of the common C libraries or a Windows DLL, we can
probably use an existing framework to call it. For example, ctypes
will access a lot of stuff.

Monday, 24 May 2010

some notes about python functions and class

I had some Python code and had put it together in one file as a module. It includes a lot of functions, each with a list of input variables. The whole code is not very clean, with long lists of variables in the functions.

I decided to rewrite the code. What I did was to define a new class whose attributes (set in 'def __init__') are the commonly used input variables.

Then each function of the class takes an instance of the class (self) as its input variable. This really makes the code cleaner!

Attention1:
A class can include functions. In my design, each such function is either a method that sets attribute values of the object, or a method that returns calculation results obtained from those attributes. I avoid returning a new variable that has nothing to do with the attributes defined in the class initialization.

If a function needs to return such a new variable, I define it outside the class.
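
A minimal sketch of this layout. The class name Experiment, its attributes and the function describe are all made up for illustration; they are not the names used in my module.

```python
class Experiment:
    """Bundle the input variables that many functions used to repeat."""

    def __init__(self, latitude, longitude, altitude_km):
        self.latitude = latitude
        self.longitude = longitude
        self.altitude_km = altitude_km

    def set_altitude(self, altitude_km):
        """A method that sets an attribute value."""
        self.altitude_km = altitude_km

    def altitude_m(self):
        """A method that returns a result computed from the attributes."""
        return self.altitude_km * 1000.0

def describe(exp):
    """A function kept outside the class that returns something new."""
    return "lat=%s lon=%s alt=%s km" % (exp.latitude, exp.longitude, exp.altitude_km)

exp = Experiment(-17.58, 210.39, 0.1)
exp.set_altitude(0.25)
print(exp.altitude_m(), describe(exp))
```

Each function now takes one object instead of a long list of variables.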

Friday, 21 May 2010

building functions with *args and **kwargs

I was reading some stuff about *args and **kwargs in function definitions. They are used to pass a variable number of arguments to a function. The *args is used to pass a non-keyworded, variable-length argument list, and the double asterisk form is used to pass a keyworded, variable-length argument list. Here is an example of how to use the non-keyworded form. This example passes one formal (positional) argument, and two more variable length arguments.

def test_args(farg, *args):
    print("farg: %s" % farg)
    for arg in args:
        print("other arg: %s" % arg)

test_args(1, "two", 3)

Results:
farg: 1
other arg: two
other arg: 3


Another example of how to use the keyworded form. Again, one formal argument and two keyworded variable arguments are passed.

def test_kwargs(farg, **kwargs):
    print("farg: %s" % farg)
    for key in kwargs:
        print("another keyword arg: %s: %s" % (key, kwargs[key]))

test_kwargs(farg=1, myarg2="two", myarg3=3)

Results:

farg: 1
another keyword arg: myarg2: two
another keyword arg: myarg3: 3

This helps me to build Python functions more clearly.

empty new class

Sometimes it is useful to have a data type bundling together a few named data items. An empty class definition will do nicely:

# define an empty class
class Employee:
    pass

# create an instance of the class
john = Employee() # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000

Perhaps this way makes the code more extensible?

Inheritance in python classes

Today I learned a few more things about functions, classes and modules in Python.
Here are some notes about inheritance in Python classes. A good tutorial is
http://docs.python.org/tutorial/classes.html.

1) simple inheritance
class DerivedClassName(BaseClassName):
    statement-1
    statement-N

or,
class DerivedClassName(modname.BaseClassName):

For example,
from math import floor, sqrt

class cartesian:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
    def distanceToOrigin(self):
        return floor(sqrt(self.x**2 + self.y**2))
class manhattan:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
    def distanceToOrigin(self):
        # absolute values, so that negative coordinates also give a distance
        return abs(self.x) + abs(self.y)

can be modified as:
class point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

# both classes inherit __init__ from point and only define the distance
class cartesian(point):
    def distanceToOrigin(self):
        return floor(sqrt(self.x**2 + self.y**2))
class manhattan(point):
    def distanceToOrigin(self):
        return abs(self.x) + abs(self.y)


2) multiple inheritance
Python supports a limited form of multiple inheritance as well. A class definition with multiple base classes looks like this:
class DerivedClassName(Base1, Base2, Base3):
    statement-1
    statement-N

Monday, 10 May 2010

python module and package

Recently I have been trying to develop some Python modules and put them together as a package. This package aims to serve 4 purposes:

1) Read and process the satellite data.
The satellite data are formatted binary files with a huge amount of information in each file. The satellite observes a lot of parameters, organised by time, location and orbit. I wrote this module, which contains the classes and functions, to read the data properly according to the data format and organize them in an efficient way.

2) Download and process ground radar observations.
The radar observations are well organized on their website. They have a lot of parameters organized by time, location, altitude, measurement angle, solar condition, geomagnetic condition, experiment etc. This module finds the parameters of interest according to specific criteria, then downloads them and writes the results to formatted text files on my computer.

3) Compare satellite and radar data.
Because the satellite and radar data both carry complicated information, a key is first found to make the problem easier, i.e., functions to further filter the data. This module also contains functions with estimation and extrapolation methods for different conditions.

4) Models.
3 models have been used in this study. This part contains basic functions to process the results from each model.

It is also a process of improving my python programming skills and I enjoy it.

plot with several sets of tick values with matplotlib

Sometimes we are interested in plotting one curve with several sets of tick values. For example, the curve represents observations at different times and locations. How can we add two sets of tick values to the figure, one for the time and the other for the locations? Here is one example using Python and matplotlib, for tick values on the x axis.

from pylab import subplot, plot, twiny, xticks

ax1 = subplot(111)
plot(data)
# set the tick position
ax1.xaxis.tick_bottom()
# define the ticks
xticks(tick_range1, xtickval1, position=(0,0), fontsize=10)

# make a second axes that overlays ax1 and shares its y axis
ax2 = twiny()
# put the tick values at the bottom
ax2.xaxis.tick_bottom()
# define the new ticks. The position of the tick values is now different from that in ax1:
# this puts the second set of tick values lower than the first set.
xticks(tick_range2, xtickval2, position=(0,-0.03), fontsize=10)

Wednesday, 5 May 2010

read binary data files in python

Sometimes data products are stored in binary format. Python has a great tool to read binary files properly, i.e., the "struct" module, which should be imported before we start. This module interprets strings as packed binary data.

For examples,

import os
from struct import unpack, calcsize
# open the file in binary mode
f = open(filename, 'rb')
# to find the total size of your file
file_size = os.path.getsize(filename)

# We can read the file piece by piece: first the datetime information.
# The format '>7H' means 7 big-endian unsigned shorts, i.e. 14 bytes (not 8),
# so the number of bytes read has to match the format.
buf = f.read(calcsize('>7H'))
# convert it to readable data with the "unpack" method
date = unpack('>7H', buf)
# read the next 2 bytes for the orbit number
orb = f.read(2)
orb = unpack('>H', orb)   # unpack always returns a tuple

f.close()

We can use f.tell() and f.seek() to query and move the reading position.
Yes, we need to know a great deal about the data format before coding!
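
A self-contained sketch of the same reading pattern, using an in-memory file so that it runs without a real data product. The record layout (7 unsigned shorts for the date, then 1 for the orbit number) and all the values are assumptions made for this example.

```python
import io
import struct

DATE_FMT = '>7H'   # assumed layout: 7 big-endian unsigned shorts
ORBIT_FMT = '>H'   # assumed layout: 1 big-endian unsigned short

# Build a small in-memory "file" with made-up values
raw = struct.pack(DATE_FMT, 2010, 5, 5, 12, 30, 15, 250) + struct.pack(ORBIT_FMT, 4321)
f = io.BytesIO(raw)

# calcsize gives the number of bytes that a format occupies
date = struct.unpack(DATE_FMT, f.read(struct.calcsize(DATE_FMT)))
position_after_date = f.tell()           # bytes consumed so far
(orbit,) = struct.unpack(ORBIT_FMT, f.read(struct.calcsize(ORBIT_FMT)))

f.seek(0)                                # jump back to the start of the data
year = struct.unpack('>H', f.read(2))[0]
print(date, position_after_date, orbit, year)
```

Using calcsize for every read keeps the byte count and the format string in step when the layout changes.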

For the format characters used by the struct module, check out http://docs.python.org/library/struct.html.