Category Archives: programming

Knowledge and intelligence

LTU links to a lecture by Alan Kay on (ostensibly) the history and current state of programming. But he actually talks about confusing our beliefs with reality, the origin of ideas, “tinkering” as opposed to proper science and engineering, the abstraction-concretization-abstraction cycle etc etc.

Something he says about half way though the lecture reminded me in some ways of Rand’s “intellectual pyramid.” He considers knowledge and outlook to be superior, in most cases, to pure brain power, IQ. And he shows this by comparing Leonardo da Vinci to Henry Ford. “[Leonardo's] IQ could not transcend his time,” whereas Ford had Newton, and those who built upon his ideas, to thank for his success. Kay says: “One of the wonderful things about the way knowledge works [is that] if you can get a supreme genius to invent calculus, those of us with more normal IQs can learn it. So we are not shut out from what the genius does. We just can’t invent calculus by ourself but once one of these guys turns things around, the knowledge of the era changes completely.”

A provocative quote, to end this: “If you take the word engineering seriously, and I do … we haven’t got it yet in software. We really don’t know how to do it.”

The idea behind flacFS and virtualFLAC

[I am not providing the complete source code for what I have described here right now. It is not fit for release. If and when it will be, I will GPL it.]

A very substantial part of my audio library consists of cds ripped in the [single flac image + embedded cue sheet] format. All metadata is stored in .apl files—one for each track—which foobar2000 can generate based on the cue sheet. While this is a neat arrangement, the problem with it is that foobar is one of the very few players which can deal with the situation. If I want to access the collection on linux, I have to run foobar under wine. To my knowledge, no native application is capable of playing the tracks.

The problem
There are three main parts to the problem. One, the apl files. These files are “link” files which specify a particular sample range within an audio image and any player that supports Monkey’s Audio should be able to handle them. But from what I have read, while the MA decoder is free and may even be termed as open source, the license agreement is not compatible with GPL. Hence many players on linux (I have tried three or four) don’t use the SDK to support either ape files or apl ones. Those which do use gstreamer don’t play apl files. Two, cue sheets. Not many players support cue sheets either. Three, metadata. The metadata is locked in the apl files along with the track locations and timings. Thus if a player doesn’t support apl, but does support cue sheets, you lose access to metadata. If it doesn’t support cue sheets either, you are left with a huge flac file with no way to identify the contained tracks.

Further, even if some player does meet all these expectations, there will always be “something” that it will miss. For example (and this is only an example—I haven’t thought about how this might work), I want the ability to create “virtual albums.” Not play lists, but albums. Many times, older albums are not available, but some songs from them are available in special collections. In other cases, cds come with bonus tracks from other albums, or worse, advertisement tracks! (I have one such cd.) Virtual albums will allow one to use the same track in multiple albums, or use only a part of an album. The same track could have different metadata for different albums. And so on.

One more thing. The fact that the metadata is presently stored in apl files doesn’t mean it has to be stored there for eternity. If there were some platform agnostic way to manage it, the same could also be stored in a database—sqlite, firebird, anything—and be applied to the track when it is played, and carried back when it is changed. Thus, the single most important idea would be that the audio file and metadata management should be independent of any audio/ media player. Note that these are mere musings. As I said I haven’t given much thought to how all this might be implemented. Now to the specific problem. Getting the apl + flac combination playing on linux.

The theory
The problem, one should note again, lies at the player end. Players don’t understand that the single image actually has multiple tracks within it. There are three ways I can think of to attempt a solution. Somehow write a decoder/ plugin for the player of your choice. Or stream the tracks to the player. Or use FUSE. I don’t know if I can do the first (apart from the fact that the solution is player specific), and I don’t even know where to begin as far as the second is concerned. But FUSE has promise. If you are not aware of it, FUSE stands for Filesystem in Userspace. (Read the wiki article, and also try the one on IFS for more information.) In short, while filesystems—FAT, NTFS etc—are generally implemented in the kernel space, as drivers, FUSE allows you to create a fully functional filesystem within a regular application/ service. ntfs-3g on linux, which allows access to NTFS volumes, is a great example. pytagsfs and mp3fs too are fuse-based applications.

Deciding on FUSE is the easy part. FUSE the software, with bindings for many languages including .net (see Mono-Fuse) is available on linux. FUSE the concept also has a few Windows implementations, including one called Dokan. One can thus show the media player files which do not really exist, and it will play them. The question is how to show one flac file as many. Its here that I hit a roadblock. I tried on-the-fly decoding using the flac reference decoder. And the application went crazy with multiple decoding processes starting every time a directory was accessed. Maybe I missed something and it can still be implemented in this way, but I dropped the idea quite soon. The only alternative was creating the file, on-the-fly, as it was being read.

Its here that the need to understand the flac format specification arose. The overview, and the specification on the flac site were helpful when it came to understanding the file structure. But that still didn’t answer the question about the contents of the file. I have no experience when it comes to the technicalities of digital audio. So I didn’t know why some things were happening the way they were. Its here that Brian Langenberger’s “audio formats reference” was extremely useful. And jFLAC which I could observe under a debugger when I faced some problems, and from which I borrowed some utility functions.

FLAC
I will only provide a brief description of the format and then move on to what can be done to tie all of it together. Anyone who is interested should refer to the many links I have provided above. Once you understand the specification, a lot of what follows will make sense. A FLAC file is made up of a header (which is a 4 byte signature), one or more metadata blocks, and one or more frames. The first metadata block is a streaminfo one which contains information on the flac stream. Other blocks follow. The last block is identified by a flag set in the block header (every block has one). And the frames follow after that. Each frame consists of one or more subframes.

A note on audio before I proceed. Consider cd-audio. It is described as being 16-bit stereo audio at 44.1kHz. What does that mean? What it means is that the audio has two channels left and right, and every channel is sampled 44,100 times every second. Further, every sample is stored using 16 bits, or 2 bytes. Thus one second of uncompressed audio of the above description will require 44,100 * 2 (channels) * 2 (bytes) = 176,400 bytes of space. The 44.1 is called the sample rate, and the 16-bit is referred to as the sample size.

Back to flac. Each frame has a “block size” which specifies the number of samples it contains. And each frame has as many subframes as the number of channels. Thus, in the above example, a frame with a blocksize of 4096 will store 4096 samples for channel 0 and another 4096 for channel 1. The encoded data lies in the subframes. A lot of unary encoding takes place and the only way to know the byte size of the frame is by seeking through it to the end.

This is the end of the specification in general. But how does it help us in locating tracks? Every frame has a header (all data structures have one) which contains, among other things, three important pieces of data. First is the blocking strategy. In theory every frame can be of a different size (meaning they can contain a variable number of samples) – a variable strategy can be used. But in practice, all frames except the last one have the same size. The residual data is put in the last frame. This is called a fixed strategy. But make note of the possibility that variable blocking is legal. The second is the frame/ sample number. If the strategy is fixed this field contains a zero-based frame number and is stored as a UTF-8 encoded 32-bit integer; if the strategy followed is variable, the field contains the number of the first sample in the frame, and is stored as a UTF-8 encoded 64-bit integer. The last piece of information is the block size, whatever it is. After all we need to know how many samples exist in the frame.

The idea is this. Given the starting and ending sample numbers of a track, we can locate the first and last frames of the track in the stream. The track will start either at the beginning of a frame (like the first track of a cd) or somewhere in the middle of a frame, and will end similarly (except in case of the last track of the cd where the last sample of the cd is the last sample of the track). A legal track can be extracted from a flac image by writing, at the very least, the stream header, and the streaminfo block updated to the values of the track in question, and a variable first frame which will only contain the samples relevant to the track, and the complete samples of all the frames except the last one, and a variable last frame. Because the specification states that a stream can either be variable or fixed, that is, a variable frame, and a fixed frame cannot legally exist in the same stream, we have to set the “variable” flag on every frame even when the block size is fixed. Further the header of every frame will have to be updated to record the starting sample number of the frame instead of the frame number. Another very important step. The subframes, two of them at least, the fixed one and the lpc one, contain encoded samples. This is no problem when we grab the complete frames. But when we need a frame where the sample we are interested in is located in the middle, that frame will have to be decoded, the data will have to be shifted to the front, and the frame will have to be re-encoded. This process will only have to be done at most once per track. At the end of all this we end up with a perfectly legal track. And we hardly did any encoding or decoding.

In my solution, all of the above is taken care of by the flac module. Before calling its decode function, one must subscribe to the “metadata read” and “before frame read” events. The subscriber can hand in a writable flacstream to any flac structure and the structure will write itself to the stream. It is the subscriber’s responsibility to set the correct values of the variables I mentioned above based on values already read from the input stream, and the samples targeted.

virtualFLAC
I wrote three test functions to verify the accuracy of the flac module. The first one analyzed a supplied flac file, calculating various checksums along the way and checking them against those in the stream. The second one copied a supplied file to another one – one flacstream did the reading, the other did the writing. The third function did what was my intention all the time—extract a track if the starting and ending sample numbers, and a flac image were provided.

But track extraction is a partial solution. If a FUSE application is implemented, we have to deal in offsets, not in samples. The audio player may be seeking to a particular sample, but it is actually issuing a read call to the underlying file system which has to be trapped and the necessary data provided. This is the function of the virtualFlac module. It contains two of the most complex classes in the whole suite. I think I have complicated things more than is necessary. If this part of the process can be implemented more elegantly, I would definitely do it. But I will describe what I have done anyway.

The first class is the VirtualFlacCreator. Given a flac file and an array of starting and ending sample numbers (parsed from apl files in this case), it will create a .virtualflac file, an xml file, which contains general information on every track – its name, size (byte size calculated while the stream is being read) and information required to patch the streaminfo block of a flacstream. Further, each track will contain a list of all the frames over which the track is spread. What this means is that nearly every time, a frame will be listed as the last one of a previous track as well as the first one of the current track. For each frame, its starting offset in the image, its virtual offset when it is part of a single track, and a few other values are recorded. To save space and processing power (I hope), only the track information is stored in human readable form. The frame information is stored in base64-encoded form. The .virtualflac generated at the end of this process can be used in conjunction with its companion flac file to generate one flac file for every track. And that, precisely, is the function of the second class—VirtualFlacTrack (call it VFT).

VirtualFlacTrack takes two arguments—the path to the flac image, and a reference to a VirtualTrack object. I know the names are confusing—I gave them the first names that came to mind— but VirtualTrack is the representation of the track node in the .virtualflac file, basically a dumb structure, whereas VirtualFlacTrack uses the information available in the structure to perform on-the-fly substitutions. VFT exposes exactly one function, a standard filesystem read call which takes a preallocated buffer, an offset and a count as arguments and returns the number of bytes read.

This is the end of the virtualFlac module. While the virtual file system can simply call a VFT object and get data as if it already exists on the disk, as I mentioned above, VFT can also perform another function. If you have xyz.flac and xyz..virtualflac, the following function

private static void WriteTracks(string virtualFlacFile)
{
	VirtualTrack [] tracks = VirtualTrack.GetTracks(virtualFlacFile);
	foreach(VirtualTrack vt in tracks)
	{
		string dir = Directory.GetParent(virtualFlacFile).FullName;
		string flacImage = Path.Combine(dir, vt.ImageFile);
		VirtualFlacTrack ft = new VirtualFlacTrack(flacImage, vt);

		FileStream fs = new FileStream(Path.Combine(dir, vt.FileNameWithExt), FileMode.Create);

		const int BUFFER_SIZE = 1024 * 128;
		byte[] data = new byte[BUFFER_SIZE];
		int offset = 0;
		while (offset < vt.Size)
		{
			int read = BUFFER_SIZE;
			ft.Read(data, offset, read);
			offset += read;
			fs.Write(data, 0, read);
			fs.Flush();
		}
		fs.Close();
	}
}

will dump all the tracks in the flac image to the same directory.

Putting it all together – flacFS
I said at the beginning that the FUSE application is the easy part. Because the whole project only uses what can be termed as the lowest common denominator when it comes to .net class usage, and is a console application to boot, it is platform agnostic. It references both Dokan as well as Mono-Fuse. At runtime, it detects the environment and uses the fuse implementation applicable to the platform. To avoid writing the same code in both classes—flacDokan and flacFuse—the whole filesystem is implemented in a third class and the Dokan and Fuse classes simply talk to this third class—flacFs, which is implemented like so. (It is a .cs file with a .docx extension.)

One just has to provide the directory to be mounted and a mountpoint (drive letter in the case of Dokan) and a new volume will appear with virtual flac files on them.

This is how a .virtualflac file looks.
virtualflac

These are the application files involved.
apps

And this one shows the application at work.
explode

A lot remains to be done. For one, metadata support is missing though it should be easy to add, at least the “readonly” part. A vorbis comments metadata block is defined by the flac specification. The application will simply translate between vorbis comments and ape tags or wherever else the metadata is stored. It might even be possible to store it in the virtualflac container. Further, the flac module is brittle because of assumptions regarding the size of the file. And the second Rice partition method is not yet implemented. As for performance, I am not sure how it will work with a few thousand tracks. But I will leave these for another day. The fact that this works is, the pun is very much intended, music to my ears.

Tag clouds, part 8 – The Python Way

wpTags is what I call the python script that is supposed to manage the tag cloud on your wordpress.com blog. Download it from box.net.

The script is released under under GNU GPL v3. To use it, you need to have Python installed on your system. It is available by default on nearly every Linux pc. For others, get it from the Python website. Some part of the code is based on my unreleased .net 2.0 application and some more parts are based on a couple of functions from the wordpress.org php code.

Using the script

  • Open the script in a text editor and read the NOTICE at the beginning of the file.
  • Before running the script, you need to create a special page on your wordpress.com blog with a little bit of text so that you can save it. Whatever you name it, note its page id (the number after the ?p= part – you can find it in the dashboard under Manage Page when you hover the mouse cursor over the page title). If you provide the wrong id, you might end up with a MAJOR HEADACHE. Once this is done, things are somewhat easy.
  • Put the script in a separate folder, run it, and provide the information it asks for. And wait as it downloads ALL your posts and updates the cloud.
  • If you want to change some settings, delete the settings.wpbx file in the script folder, and you can start with the slate clean. I know that this sounds stupid, but this is the first version and I was in a hurry to get it working. Will provide a cleaner method when I next update it.

I haven’t tested it much, but it seems to work fine as of now. I don’t recommend this for serious use, but if you want to try it out just like that, you are welcome to do so.

Python

Some months back, I wrote a software that creates the tag cloud [Edit: Nov. '09; I don't find the separate page to be of much use. The wordpress tag cloud widget is sufficient for most needs] for this blog. It was written in C# 2.0, and I had plans of releasing the source code and binaries under an open source license. But things started sliding (feature creep, gold plating) and finally I realized that since the code depended completely on how certain functions in the wordpress.org codebase worked (slug generation, primarily), there would never be a public release of the software.

I continued using the software on this blog till a few days back, when a particular tag broke the code. Pulling the source code from my subversion repository, making the necessary changes, recompiling it and then trying it out was not something I wanted to do then. I let things lie for a while, and then decided to write the most essential part of the software using python. I downloaded the necessary software a couple of days back, and began working on the same. To my surprise, I had a working version running inside 24 hours. While its not yet ready for a public release, I hope to release the script soon. That I am restricting myself to the English alphabets and dropping things like cryptography and multiple blog support surely helped.

Python is a very interesting language, gives the write-compile-run cycle the go by and is available on multiple operating systems. In spite of having a general idea about it for some years now, I never really did try it – till now that is. Well, better late than never.

Minimax, Negamax and Tic Tac Toe

Last time, I wrote about how I decided to revisit Tic Tac Toe. Like most programmers, I too wrote a program some years back that could play the game without losing. But that version was based simply on a set of rules. This time, I moved further and used implementations of Minimax and Negamax algorithms in addition to rule based play.

Algorithms
I had to do some reading on these algorithms and started with wikipedia. While the entry for minimax provided accurate pseudo code, the one for negamax was plain wrong. I have accumulated quite a few links on these algorithms and their variations (mostly minimax and alpha-beta pruning). However, from among them, An Introduction to Game Tree Algorithms by Hamed Ahmadi Nejad is the best one. It explains minimax, negamax and alpha-beta pruning in a clear and concise manner. Minimax Explained is another good source of information on these algorithms. But it does not cover negamax. In any case, negamax is nothing but minimax in disguise (in fact, it is a more elegant version of the same algorithm).

Source Code
I am releasing the source code and binaries (the program has been written in C# 2.0 and requires the .net 2.0 framework to be installed) for my version of Tic Tac Toe under GNU GPL v3. Download it from box.net.

The source contains two projects – Tic Tac Toe and Tic Tac Toe Stats. While the first project is the game, the second one constructs a game tree and generates stats like how many legal games are there, who wins how many times, how many truly unique games are there if you exclude board rotations, and symmetrical positions etc etc.

Game Play
The program uses one of the following strategies when the computer is playing -

  • rule based play.
  • a modified version of minimax that takes advantage of the static nature of the game.
  • standard minimax.
  • ply-restricted minimax search.
  • standard negamax.

The best thing is that these strategies can be changed from a menu during game play to see how they perform comparatively. I have not implemented alpha-beta pruning based strategies of minimax and negamax because the optimisation is not as simple as it appears. For ABP to work, the depth first search should have discovered a leaf node with a value which makes searching the current node irrelevant. If that does not happen, in the worst case, time wise, ABP will give results equivalent to minimax or negamax.

Ply-restricted search and the evaluation function
If you have downloaded and tried the application, you would have noticed a ‘Minimax Play With Four Ply Look Ahead’ menu item in the Strategy menu. Tic Tac Toe is not a complex game and hence the entire game tree for the 255,168 legal games can be generated within a couple of seconds. But other games like Chess, Draughts and Reversi/ Othello are not that simple and generating the complete game tree is out of the question. So, on each move, a tree is generated with the current board state as the root and a depth cut-off of say 6 or 12 plies being provided, a ply being a move made by one player.

The four-ply strategy demonstrates the effect of a depth cut-off on tic tac toe if we limit the computer’s knowledge to only four moves from the current board state. The following four board positions reflect the progress of the game after moves 3, 4, 5 and 6 with the human player moving first (x) and the computer second (o).
ttt3ttt4ttt5ttt6

The reader may ask why the computer did not prevent x from completing column three and instead capture the centre square? The real question that needs to be asked is why did the computer capture the square on row one-column two on its second move? For, that is where it lost the plot.

The computer is destined to lose the game because the evaluation function that it depends on to make the right moves which works wonderfully well when standard minimax or negamax is used, fails horribly when it comes to the ply restricted version of the algorithm. The function scans the board position at the leaf node and returns +1 if x is winning, -1 if o is winning and 0 if there is a draw or tie. While this behaviour is fine when it scans the ‘real’ leaf nodes in the full game tree, it hardly helps when it is doing it on a game tree only four plies deep. Since it may not find any one winning at that level, the function always returns 0, assuming that the game is ending in a draw.

And so, the computer looks at the values for each of the next moves, finds that all moves lead to a draw, and so picks the first move that it finds. It is only after x’s third move does it have access to the real terminal positions of the game and knows that it has erred badly. From here on, whatever it does, as long as x plays a flawless game, x is bound to win. So, again, since it hardly matters what move it makes, it picks the first move it finds, which is the one that captures the centre square. This shows why an evaluation function is super critical in minimax based algorithms, more so when there is a depth cut off involved.

Now that Tic Tac Toe is done to my satisfaction, I plan to return to Pallankuzhi. Or I might take a look at Draughts or Reversi.

Follow

Get every new post delivered to your Inbox.