Archive for the ‘Programming’ Category

Demystifying Unladen Swallow

Tuesday, October 13th, 2009

Unladen Swallow is “an optimization branch of CPython, intended to be fully compatible and significantly faster”. Interesting. In order to understand Unladen Swallow, we first needed to understand the CPython compiler. It’s  written in C and fairly easy to understand, very unlike the self-hosting complex beasts Steel Bank Common Lisp and Glasgow Haskell Compiler. The design of the compiler is explained fully in PEP 339, and I’m just including a small warm-up discussion.

Start off at the top level: the evaluator (Python/cevel.c). It executes Python bytecode. From the Python program you write to the Python bytecode that is produced, there are several stages with optimizations at every stage. It goes like this:

  1. Using the SPARK parser generator, generate the Parse Tree from the Python code  (Parser/pgen.c)
  2. Convert the Parse Tree into an Abstract Syntax Tree (Python/ast.c)
  3. Transform the AST into a Control Flow Graph. Several compiler optimizations like removing unreachable code are implemented here (Python/compile.c)
  4. Post-order DFS the CFG and flatten it to emit the final Python bytecode. The bytecode is then subjected to various peephole optimizations (Python/compile.c)
  5. Instead of having to go through this this long procedure everytime we want to run the same program, the Python bytecode produced is stored in a .pyc file.

Enough warming up. The Unladen Swallow developers want to optimize CPython very smoothly, preferably without breaking any existing code. Their Project Plan is laid out on the Wiki page; I’ll mainly discuss 2009Q2.

If they work at any stage higher than the Python bytecode, they’ll have to re-implement the fantastic peephole optimizations written by the community over the years. The plan is simple: instead of executing the high-level bytecode in the eval loop (which can be painfully slow), compile atleast some part of it into a lower representation that can be executed faster. Why not compile all of it to x86 assembly? Firstly, we’d lose the interactive shell, a very major part of Python. Secondly, we’d lose out on Python’s high portability, that arises because it’s not tied down to any single architecture’s assembly language (I even run Python on my phone using the PyS60 package). Thirdly, it’s terribly difficult to do, makes Python uglier and unmaintainable, and it’s utterly pointless- we’re trading off too much for execution speed.

Just-in-time (JIT) compilation is the solution. Compile the code only when needed. Then again, don’t compile everything; compile when the new execution time is less than the old execution time by an extent enough to justify the compile time. Otherwise, the startup delay will become unbearable and the interactive interpreter won’t really be interactive anymore. The perfect low-level platform independent representation to JIT to is LLVM (Low-level virtual machine). So, the Unladen Swallow developers first wrote a compiler from Python bytecode to LLVM IR from scratch (Python/llvm_compile.cc). The LLVM IR (Intermediate Representation) looks a lot like assembly, except that it’s architecture-independent,  and can be subject to a nice set of optimizations. Then they modified the eval loop to execute LLVM IR, while keeping the existing Python bytecode evaluator (file renamed: Python/eval.cc).

Q3 and Q4 aren’t updated on the Wiki page, but it should be easy enough to find out the work in progress from the IRC channel, mailing list, and repository commits.

Designing an introductory computer science course: Part 1

Wednesday, September 30th, 2009

This course attempts to teach programming methodology from an engineering viewpoint, as well give an insight into theoretical computer science.

Guidelines:

  1. Teach programming, and not the programming language. Discuss abstract ideas in class, and ask students to write implementations in a language of their choice. Avoid building language-specific patterns (for example, the double-for loop for generating prime numbers in C got stuck in my head). Discuss various different implementations AFTER the students have tried it themselves.
  2. Discussions should span several programming paradigms. Initially, cast problem statements to strongly indicate a paradigm. For example “Go over numbers from 1 to 10 and print those that are even” versus “Define an even number and print even numbers <= 10″. Later, let students figure it out themselves.
  3. Develop a hacker mindset- give students incomplete solutions written by other students and ask them to complete it.
  4. For routine assignments, prepare a huge problem set of ascending difficulty to randomly pick from, along the lines of Project Euler. Problems should cover algorithms, pure math, computational geometry, and dynamic programming.
  5. Conduct collaborative programming sessions where students learn to implement from each other. No single instructor will be able to provide such a vast variety of implementations.
  6. Prepare a reading list for software engineering, including articles on editing (introduction to specialized editors such as Vim and Emacs), debugging, versioning, testing, and profiling. It’s important to make students write large programs collaboratively in steps, so that they will be able to appreciate the importance of these tools.
  7. Prepare another reading list for theoretical computer science. Also give students sneak peeks into the latest work in every field. The purpose of this exercise is to get students excited, so that they can work independently to eventually discover their research interests.
  8. Finally, it all boils down to grading. If you’re going to ask students to evaluate things like (i++) + (++i) in the exam, it’s clear that you’re expecting students to know programming language intricacies, and not programming methodology.

lecture1.pdf (130 KB)

Why and how I switched from Vim to Emacs

Monday, May 18th, 2009

NOTE: I have no intentions of starting another editor war. Opinions expressed here are my own.

Over five years ago, I realized that using editors like nano and Gedit wouldn’t suffice. The need to learn a real editor arose as I wrote more and more code. By nature, I’m a minimalist. I hate clicky flashy bloated things like GNOME, KDE, Compiz, Beryl etc. Emacs is large, graphical and consumes eons of memory compared to Vim. I had also just learnt touch typing and was eager to put that knowledge to use. Emacs didn’t appeal to me at all with its complex META and CONTROL keybindings and its complicated GUI. Vim, on the other hand was small, simple, ran off my terminal and had very simple keybindings. I took to Vim immediately and progressed to become very good at it. Vim uses muscle memory heavily- once you’re that good at Vim, you don’t have to conciosly edit anything; fingers move on their own and it feels like Vim is reading your thoughts. It’s the principle concept as touch typing. Which is what makes it so hard to switch. It struck me that there was something about Emacs worth learning over a year ago, but I was never really able to use it.

I wanted to do so much more with Vim when I realized that it was crippled. Everyone seemed to be talking about those features being present in Emacs. Consider a simple task- editing a file over the network. In the Vim world, I’d ssh into the remote machine and copy my ~/.vimrc and ~/.vim over, praying that the same version of Vim is present on the remote machine. Then using screen (see GNU screen) sessions, I could resume work. In the development front, I learnt that Vim had lots of features hardcoded for C/ C++, but not other languages. In Python, I had to use indentation-based folding to select codeblocks. I wished that Vim behaved more intelligently and do context-specific tasks. A :make in C or C++ is not the same as the :make in Python or :make in LaTeX. Vim didn’t understand this.

Writing and debugging were also separated- I couldn’t really debug anything from within Vim. This wasn’t critical until I ventured out to learn a Lisp. In Common Lisp, writing code and debugging it are so closely related that it was impossible to keep switching between applications to do it. Everyone recommended SLIME (Superior Lisp Interaction Mode for Emacs). I tried to switch retaining the keybindings using tricks like Viper mode for Emacs, but to be honest, nothing really worked out. Purists encouraged me to use Emacs as-it-is without these cheap tricks. I kept looking for Vim equivalents for several tasks and kept failing.

Few people understand the real difference between the two editors. They go on and on comparing the number of keystrokes required to do equivalent operations. I did the same for some time until it struck me- Vim is a dumb text editor, Emacs feels like some sort of intelligent organism. No real equivalents like the Vim “vawy…p” existed in Emacs- why? because Emacs is more intelligent in that. Vim doesn’t understand what a “function” or “codeblock” is. It simply understands characters and lines. Vim is a superb dumb text editor, but nothing more. Emacs, on the other hand, seems to understand what you write and assists you more intelligently. It doesn’t require the basic text editing functions that Vim has at all. I’ll give you an example. Recently, M-z or zap-to-char function was introduced into Emacs. Initially, I was very excited about it, because majority of my commands in Vim were dt, df. Now, I rarely use the command. There’s little doubt about it- In my Vim time, I could have beaten any Emacs expert at basic text editing. Vim does it faster. Emacs doesn’t even have many of those keybindings.

Then why? Because I came to realize that programmers do a lot more than just basic text editing. Vim and Emacs assist the programmer in different ways. Vim helps the programmer save time on basic text editing functions while deserting him when he wants to do his compiling or debugging or even interact with an external application. Emacs, on the other hand, seems to understand the code and helps the programmer operate on code entities, rather than characters and lines. SLIME is easily the best development environment I have seen. Emacs interactions with external applications nicely. Even documentation is readily available within Emacs.

What gives Emacs its so-called intelligence? A major part is contributed by it minor modes. When I open a certain file, some specific minor modes get activated and keybindings invoke context-specific functions. I loved the context-specific help. In latex-mode, I could preview the document I was writing with the same keybinding that I used to compile lines in elisp-mode. The same keybindings mean different (but comprehensible and expected) things in different minor modes.

As I started using it more and more, I understood the true nature of Emacs. Emacs is actually just an elisp environment bundled with several snippets of code written in elisp. Every keybinding invokes an elisp function (ok, some are so basic that the functions are compiled in C. But there are few immutable functions like this). This allows it to be infinitely extensible. Today, there are several brilliant elisp programs like emacs-w3m, emms, eterm, rcirc etc. which allow users to do email, chat, browse the internet and do a lot more, right from within emacs. SLIME is also just an elisp application. In Vim, everything’s written in C and keybindings are hardcoded, which is what makes it rigid.

At the end of the day, I have realized that average programmers can be equally productive on both Emacs and Vim. The question is simple- do you want your editor to assist you with basic text editing or higher tasks? And as programmers become better and better at what they do, I personally feel that they require assistance with higher tasks and not basic text editing.

If programming languages were metal bands

Thursday, December 25th, 2008

C – Black Sabbath.
Rough, unpolished and what most people consider too old for this day
and age. Yet some people worship Sabbath and claim they’re the
greatest thing in existance.
 
C++ – Dream Theater
A little complex when you start out with it, doesen’t make perfect
sense unless you understand it. You consider it brilliance if you do.
 
Java – Metallica
Was a great concept once upon a time. Now everyone seems to hate it,
though people still cling to it, who are rather disliked by others in
the programming world.
 
C# – Megadeth
Competes with Java, though you can’t deny some things about it are
very impressive. Also, the frontman/controlling company is considered
a real prick.
 
Python – Iron Maiden
Everybody who tries it falls in love the first time. You’ll never meet
a single chap who’s tried python/Iron Maiden and not liked it. You
don’t understand why they do some of the things in the latest releases
though.
 
Perl – Dragonforce
You have no idea what’s going on and is incredibly complicated. If you
don’t pay attention to it, it sounds kindof nice.
 
PHP – Opeth
Pretty inconsistent, a little confusing, though fun to try once in a
while. *way* too overrated.
 
LOLCODE – Spinal Tap
‘Nuff said.
 
Ruby – Nine Inch Nails
Everyone keeps talking about it, but you can’t really make out what
the fuss all about.
 
Haskell – Dethklok
Brutally fast and extremely popular with specific people, with almost
cult status. Any haskell/dethklok fans will bond almost instantly.
 
Visual Basic – Linkin Park
Only 14 year olds think that’s a programming language/metal band.

Posted via email from Ramkumar’s posterous

Python workshop

Tuesday, August 26th, 2008
Python workhop poster

Python workshop poster

I decided to conduct a Python workshop in the institute as too many people don’t even know such a thing exists. Since I don’t do poster design, I had to ask another Entrepreneurship Cell member, Anirban Pal to make it. He’s a very talented designer and has been doing practically all the design work of Ecell. All that’s left now is to prepare for the workshop.

Updates: The workshop is over. Resources for the workshop can be found here.