Milktrader

Iterating Until Convergence

Tuesday, July 13, 2010

How To Learn Strange Things

My decision to focus on the C programming language as my first real attempt to learn programming has some rational thought behind it. I'd like to create functions that can be used to manipulate, twist, distort and skew data. The R programming language and statistical platform is actually well-suited for data mining expeditions, except that I overheard some quote-unquote programmers complain that it has some problems with memory and speed. Please, who cares. It's always something.

Well, though I like to gratuitously ignore complainers, I listen with one ear just in case. There may be something there, but there is also a solution. Pass the function duties to a C program, because everyone knows that C is the fastest in the universe when it comes to running programs. C is compiled (that means it can get scrunchy) and it has a very close relationship to assembly language, which is what actually talks to silicon chips.

Okay, we know what must be done. Create a C program to do the heavy lifting of an R session. To do this, you need to create an R package (and you thought C pointers were confusing). Once you create an R package, you put your turbo-charged C program into a pre-defined directory, compile it, and then that's it. Ask a professional "programmer" about this process and you'll get the "read the manual" response because their time is so important they can't be bothered. Plus, they're really smart too. Left to your own devices, you must learn on your own. This is my method.

I've posted links below to various PDF and text files I found through Google searches. I printed the files (anywhere from 3 to 16 pages each) and then drove down to the local mail/fax/print shop to have them bound for about $3.00. Now I have my own book about how to learn some strange things.


Using .Call in R


Calling C functions from R using .C and .Call

Calling C Functions From R

Calling C and Fortran from R

An Introduction to the R package Mechanism


R Functions: Writing, Using and Documenting


An Introduction to the .C Interface to R


Calling C Code from R
this one is a text file
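
For a taste of what the .C interface in these documents asks of you, here is a minimal sketch of the C side. The function name sma, the file name sma.c, and the choice of a simple moving average are my own assumptions for illustration and do not come from any of the linked documents. The one real rule it shows is the .C contract: the function returns void and takes every argument as a pointer, with R numeric vectors arriving as double* and R integers as int*.

```c
/* sma.c: hypothetical example of a function callable through R's .C interface.
 *
 * x   : input values (R numeric vector)
 * n   : length of x (R integer, passed as a pointer)
 * k   : window size (R integer, passed as a pointer)
 * out : result buffer of length n - k + 1, allocated by the caller
 */
void sma(double *x, int *n, int *k, double *out)
{
    int len = *n;
    int win = *k;
    double sum = 0.0;

    /* Nothing sensible to compute: leave out untouched. */
    if (win < 1 || win > len)
        return;

    for (int i = 0; i < len; i++) {
        sum += x[i];                 /* add the newest value to the window */
        if (i >= win)
            sum -= x[i - win];       /* drop the value that slid out */
        if (i >= win - 1)
            out[i - win + 1] = sum / win;  /* window is full: store its mean */
    }
}
```

For a quick experiment outside a package, something along these lines should work: compile with R CMD SHLIB sma.c, load the result in R with dyn.load("sma.so") (the extension varies by platform), and call .C("sma", as.double(x), as.integer(length(x)), as.integer(k), out = double(length(x) - k + 1))$out. Treat that as a sketch to check against the documents above, not as the official recipe.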

Due to the sausage-making nature of this learning method, you will likely find quite a bit of repetition in these articles. But that is also the point of iterating until convergence. You keep returning to your 6:00 am wakeup call and repeat the day until you get it right.

4 comments:

  1. A quick tip: focus on the .Call interface if you want your code to be as fast as possible. It's more nuanced, but the speed can't be beat in R.
  2. Thanks Joshua.

    That is also the view of the first referenced article on the process written by Brian Caffo. Once the general logic behind R packages falls into place, it's just a matter of following best practices in expressing functions.

    I'm starting very simple, and will eventually look into Rcpp. Do you use C code in packages mostly, or C++? I thought that I saw quantmod and TTR used mostly C code, no?
  3. I mostly use C because it's the language R is written in and it was faster than C++ when I looked into it.

    That said, Dirk Eddelbuettel and Romain Francois have done a lot of work on Rcpp recently, and the speed difference between calling C and C++ from R may now be negligible.

    Other than personal preference, I'm not sure why you would want to code in C++ over C when calling from R... but I'm just a few steps ahead of you in my journey of learning programming. ;-)
  4. It sounds like you are on your way to your short-term goal!

    A few idle comments: C isn't the fastest language for all applications, mainly because of pointers, actually. At times C can be "too powerful": when two pointers might refer to the same memory, an optimizing compiler cannot always be sure whether certain optimizations are allowed.

    C is definitely one of the fastest, especially with recent GCC compilers (with SIMD support) and the Intel compiler. Fortran is also a contender for certain types of programs. Possibly the most highly optimizing compiler available is Stalin, which compiles Scheme and is very fast for many types of programs that involve too many levels of function calls for existing C compilers to optimize fully.

    There is a translator that turns R into Common Lisp, which reportedly speeds programs up by something like 1000x:
    http://dan.corlan.net/bench.html
    http://dan.corlan.net/R_to_common_lisp_translator/

    Right now I am playing with J, which is related to K and APL. These languages have apparently been very widely used in finance.