Wednesday, February 22, 2012

Running programs in parallel using Parallel Python

Parallel Python is a Python module that provides a mechanism for parallel execution of Python code on SMP (systems with multiple processors or cores) and clusters (computers connected via a network).

Parallel Python has some good features, as you can see below:
  • Parallel execution of python code on SMP and clusters.
  • Easy to understand and implement job-based parallelization technique (easy to convert serial application in parallel).
  • Automatic detection of the optimal configuration (by default the number of worker processes is set to the number of effective processors).
  • Dynamic processor allocation (number of worker processes can be changed at runtime).
  • Low overhead for subsequent jobs with the same function (transparent caching is implemented to decrease the overhead).
  • Dynamic load balancing (jobs are distributed between processors at runtime).
  • Fault-tolerance (if one of the nodes fails tasks are rescheduled on others).
  • Auto-discovery of computational resources.
  • Dynamic allocation of computational resources (consequence of auto-discovery and fault-tolerance).
  • SHA-based authentication for network connections.
  • Cross-platform portability and interoperability (Windows, Linux, Unix, Mac OS X).
  • Cross-architecture portability and interoperability (x86, x86-64, etc.).
  • Open source.
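
To give an idea of what the job-based style looks like, here is a minimal sketch modeled on the idea behind sum_primes.py (it is not the original file). It assumes the module is importable as `pp` and uses `pp.Server()`, `submit()` and `print_stats()`; the helper name `run_parallel` and the input values are only for illustration.

```python
import math


def isprime(n):
    """Return True if n is a prime number (trial division)."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    limit = int(math.sqrt(n)) + 1
    for i in range(3, limit, 2):
        if n % i == 0:
            return False
    return True


def sum_primes(n):
    """Sum all primes strictly below n."""
    return sum(x for x in range(2, n) if isprime(x))


def run_parallel(inputs):
    """Hypothetical helper: one pp job per input, results collected in order."""
    import pp  # imported here so the functions above work without pp

    # With no arguments the server starts one worker per detected CPU.
    job_server = pp.Server()
    # submit(function, args, dependent functions, modules the function needs)
    jobs = [(n, job_server.submit(sum_primes, (n,), (isprime,), ("math",)))
            for n in inputs]
    # Calling a job blocks until its result is available.
    results = [(n, job()) for n, job in jobs]
    job_server.print_stats()
    return results


if __name__ == "__main__":
    try:
        for n, total in run_parallel([100000, 100100, 100200, 100300]):
            print("Sum of primes below %d is %d" % (n, total))
    except ImportError:
        print("pp is not installed; install the module first")
```

The serial functions stay exactly as they would be in a normal script; only the `submit()` call and the later `job()` call are specific to Parallel Python, which is why converting a serial program is fairly painless.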

The first thing we need to do is download the Parallel Python module. I leave a link below where you can find it; download whichever version you want.

Decompress it, open the directory it creates, and then type the following:

sudo python setup.py install

We tested an example called sum_primes.py using two netbooks with Atom processors and a MacBook Pro with an i5 processor. Before looking at the combined results, here are some screenshots using just the MacBook Pro.

And now the results after adding the two netbooks.

As you can see, the second run takes considerably less time than the first one. Another thing we noticed is the warning saying that the statistics provided above are not accurate due to job rescheduling; it may occur because the netbooks worked for only a small percentage of the time.

I took screenshots of my Activity Monitor to show that the processor was running at full load.
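
For reference, adding remote machines like the netbooks could look roughly like the sketch below. It is an assumption-laden example, not the exact setup we used: the node addresses come from the command line, the secret `change-me` is a placeholder, and `chunk_ranges`, `count_primes` and `run_on_cluster` are hypothetical helpers. As far as I know each node has to be running the `ppserver.py` script that ships with Parallel Python, started with the same secret, and listens on port 60000 unless told otherwise.

```python
import sys


def chunk_ranges(start, stop, parts):
    """Split [start, stop) into `parts` contiguous (lo, hi) ranges."""
    step, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        hi = lo + step + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division (self-contained on purpose)."""
    count = 0
    for n in range(max(lo, 2), hi):
        d = 2
        while d * d <= n:
            if n % d == 0:
                break
            d += 1
        else:
            count += 1
    return count


def run_on_cluster(ppservers, secret, start, stop, parts):
    """Hypothetical helper: spread the chunks over local CPUs and remote nodes."""
    import pp  # imported here so the helpers above work without pp

    # ppservers is a tuple such as ("netbook1:60000", "netbook2:60000").
    job_server = pp.Server(ppservers=ppservers, secret=secret)
    jobs = [job_server.submit(count_primes, (lo, hi))
            for lo, hi in chunk_ranges(start, stop, parts)]
    total = sum(job() for job in jobs)
    # Prints per-node job counts and times.
    job_server.print_stats()
    return total


if __name__ == "__main__":
    nodes = tuple(sys.argv[1:])  # node addresses given on the command line
    if nodes:
        total = run_on_cluster(nodes, "change-me", 0, 50000, 16)
    else:
        # No nodes given: compute the same thing serially.
        total = sum(count_primes(lo, hi)
                    for lo, hi in chunk_ranges(0, 50000, 16))
    print("Primes below 50000: %d" % total)
```

Splitting the work into more chunks than there are processors gives the load balancer something to work with, so the faster machine can pick up extra chunks while the slower ones are still busy.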


