For over a year, I patiently waited for the new Fermi architecture to hit the shelves so that I could get the latest CUDA parallel processing power at an incredible performance-per-watt and price-per-performance level. I have been looking forward to writing some parallel-processing-optimized code, and these cards were exactly what I needed. I ended up buying a Quadro 600 to start with, since it has 96 processing cores and uses only 40 watts of power! So far, I have been quite impressed by this card's abilities, quiet operation, and price (well under $200). I can now perform "real time" graphical operations that were impossible before. It is like having a 5- or 10-year-old multi-million-dollar supercomputer on my desktop for under $200.
Personal supercomputing for the masses!
Nvidia (NASDAQ:NVDA) has moved to a modern 40nm fabrication process for these new GPUs, which has allowed them to be much more power-efficient while cranking out tons of graphics horsepower for gaming and/or professional applications. Those applications can make use of the stream processors (aka "CUDA cores") on the graphics card for high-performance computing (HPC) via massively parallel algorithms.
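To give a feel for what those CUDA cores actually do, here is a minimal sketch of a massively parallel kernel in CUDA C: each GPU thread handles one pair of array elements, so the loop a CPU would run sequentially is spread across all of the card's stream processors. (The kernel name, array size, and block size here are my own illustrative choices, not anything prescribed by the toolkit.)

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard against the last partial block
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;          // one million elements
    size_t bytes = n * sizeof(float);

    // Host-side buffers with some known values.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device-side buffers, plus copies of the inputs.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Compile it with `nvcc vecadd.cu -o vecadd`. On a Quadro 600, all 96 cores chew through those million additions in a handful of microseconds, which is the whole point of moving this kind of work onto the GPU.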
Get an Nvidia Fermi-based Graphics Card
First, get hold of a new Fermi-based Nvidia CUDA Graphics card to develop and run your new CUDA applications on.
Now you can start putting some new CUDA abilities to work using the latest Nvidia CUDA Toolkit 3.2 release, which includes features specific to the new Fermi cards and architecture that you may want to check into...
Nvidia CUDA Toolkit 3.2 Release Highlights
New and Improved CUDA Libraries
- CUBLAS performance improved 50% to 300% on Fermi architecture GPUs, for matrix multiplication of all datatypes and transpose variations
- CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs, now 2x to 10x faster than MKL
- New CUSPARSE library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations delivers 5x to 30x faster performance than MKL
- New CURAND library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random routines at 10x to 20x faster than similar routines in MKL
- H.264 encode/decode libraries now included in the CUDA Toolkit
- Support for new 6GB Quadro and Tesla products
- New support for enabling high performance Tesla Compute Cluster (TCC) mode on Tesla GPUs in Windows desktop workstations
- Multi-GPU debugging support for both cuda-gdb and Parallel Nsight
- Expanded cuda-memcheck support for all Fermi architecture GPUs
- NVCC support for Intel C Compiler (ICC) v11.1 on 64-bit Linux distros
- Support for debugging GPUs with more than 4GB device memory
- Support for memory management using malloc() and free() in CUDA C compute kernels
- New NVIDIA System Management Interface (nvidia-smi) support for reporting % GPU busy, and several GPU performance counters
- Several code samples demonstrating how to use the new CURAND library, including MonteCarloCURAND, EstimatePiInlineP, EstimatePiInlineQ, EstimatePiP, EstimatePiQ, SingleAsianOptionP, and randomFog
- Conjugate Gradient Solver, demonstrating the use of CUBLAS and CUSPARSE in the same application
- Function Pointers, a sample that shows how to use function pointers to implement the Sobel Edge Detection filter for 8-bit monochrome images
- Interval Computing, demonstrating the use of interval arithmetic operators using C++ templates and recursion
- Simple Printf, demonstrating best practices for using both printf and cuprintf in compute kernels
- Bilateral Filter, an edge-preserving non-linear smoothing filter for image recovery and denoising implemented in CUDA C with OpenGL rendering
- SLI with Direct3D Texture, a simple example demonstrating the use of SLI and Direct3D interoperability with CUDA C
- cudaEncode, showing how to use the NVIDIA H.264 Encoding Library using YUV frames as input
- Vflocking Direct3D/CUDA, which simulates and visualizes the flocking behavior of birds in flight
- simpleSurfaceWrite, demonstrating how CUDA kernels can write to 2D surfaces on Fermi GPUs
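As a taste of the new CURAND library mentioned above, here is a hedged sketch of its host API: create an XORWOW pseudo-random generator, seed it, and fill a device buffer with uniform random floats. (The buffer size and seed are arbitrary choices of mine; the function names are from the CURAND host API.)

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <curand.h>

int main(void)
{
    const size_t n = 1024;                 // arbitrary sample count
    float *d_data, h_first;
    cudaMalloc(&d_data, n * sizeof(float));

    // Create one of the new CURAND engines (XORWOW pseudo-random),
    // seed it, and generate n uniform floats directly in device memory.
    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_XORWOW);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
    curandGenerateUniform(gen, d_data, n);

    // Copy one value back just to show the data landed on the device.
    cudaMemcpy(&h_first, d_data, sizeof(float), cudaMemcpyDeviceToHost);
    printf("first sample: %f\n", h_first);

    curandDestroyGenerator(gen);
    cudaFree(d_data);
    return 0;
}
```

Link with `-lcurand`. The numbers never leave the GPU unless you ask for them, which is exactly what you want when the next step is a Monte Carlo kernel like the MonteCarloCURAND samples above.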
Nvidia (NASDAQ:NVDA) stock?
Since this blog also focuses on stock-market and investing opportunities, I have to point out that in my August 13th, 2010 blog entry about the Nvidia Toolkit 3.1 news, I contemplated whether the new Nvidia Fermi cards were going to drive substantial revenue and profit gains for Nvidia Corporation.
When I wrote that blog entry in August, Nvidia stock was $9.39; today, as I write this article, it is $14.77. If you jumped in on this one, you have already made 57% in a mere 4 months! (update: May-2011; NVDA at $19.00+) The current trend lines on the stock look good, as it is staying ahead of its moving-average trend lines on a technical basis, so it may well have some decent upside remaining. Plus, we are rather early in the Fermi-chip-based GPU series from Nvidia. A 50+% return on your NVDA stock may have you wanting to take some profits. The choice is yours... this stock has a long history of being rather volatile, and it will likely have some up/down cycles in the future. I plan to maintain at least some of my position in NVDA, as I still think they have the technology to beat when it comes to GPU computing and supercomputing. Intel's "Sandy Bridge" products coming out in Q1-2011 may have a slight impact on Nvidia (since the new Intel CPUs will include an integrated and allegedly rather capable GPU onboard, which will perhaps suffice for mainstream users).
WHO is going to use these cards?
The thing I see happening with discrete graphics cards like these Fermi-based, CUDA-capable Nvidia cards is simple: if you are a business and you want to compete, you had best learn how to leverage the power of these cards. Period.
Wall Street already knows this (and I do not mean just in the price of NVDA stock)... they are using this technology to perform lightning-fast calculations for algorithmic trading, options pricing, and much more. ANY application that can be significantly enhanced (i.e., made faster and more robust) through parallel processing will be made so by the companies that are leading in any field. They WILL use these GPUs from Nvidia to accomplish that feat.
I am not investing in Nvidia because home "gamers" and the like enjoy their super-potent "GeForce" cards... I am in this because businesses are going to use TONS of these cards/GPUs in their "desktop supercomputers" for analyzing all sorts of things throughout their domain. Mark my words: companies that miss this opportunity (to leverage CUDA and parallel processing) are going to find themselves looking like Blockbuster next to Netflix.
The margins on Nvidia's "Quadro" business-oriented line of cards are likely higher than those on the consumer "GeForce" line, and certainly their Tesla dedicated desktop-supercomputing devices are going to be money makers as businesses figure out how to use them (i.e., find talented developers to help them write some seriously cool parallel-enabled algorithms and application software). This may take a while yet, but I would say that within 3-5 years, MOST serious business applications will make some use of CUDA and/or GPUs for heavy analytical processing.
Bottom line: NVIDIA HAS SOME AWESOME GRAPHICS CARDS TO CONSIDER, and some nicely updated tools to go with them!