Fast Fourier Transform in C++ using KFR

12 September 2016

In this article I’ll show you how to use Fast Fourier Transform in Digital Signal Processing and how to apply forward and inverse FFT on complex and real data using the KFR framework.


Fast Fourier Transform (FFT) can be used to perform:

  • Convolution (including convolution reverberation)
  • Cross-correlation and auto-correlation
  • Applying large FIR filters
  • Sample rate conversion
  • Spectrum visualization
  • Large integer multiplication
  • and many other algorithms

Often FFT is the most efficient way to perform each of these algorithms.

Moreover, KFR has one of the most efficient FFT implementation, so you can get a great performance boost of your DSP applications using KFR’s FFT for all of these algorithms.

Differences from other implementations

KFR implementation of the FFT:

  • is fully optimized for X86, X86-64, ARM and AARCH64 processors
  • uses vector intrinsics (if available for cpu)
  • supports both single- and double precision
  • can cache internal data between calls to speed up plan creation
  • can do forward and inverse FFT without a need to create two plans
  • can be used for complex-to-complex, real-to-complex and complex-to-real transforms
  • doesn’t require computing FFT at runtime and measuring their execution time to find an optimal configuration
  • has special implementations for FFT sizes up to 256
  • has no external dependencies
  • is thread-safe
  • is written in modern C++
  • is header-only
  • is open source (GPL license)

Getting started

For the list of supported compilers and OSes, see the latest README:

  1. Download the latest version of KFR:

  2. Add the include folder to your include path.

  3. Insert the following line at the beginning of your project’s source code:

#include <kfr/dft.hpp>

KFR is header-only, so no need to compile any source files, create static library and pass it to linker. The only thing you must to do is to unsure that kfr folder from include directory is available for inclusion.

Quick example

Complex input

Let’s apply the FFT to the complex input data:

using namespace kfr;
// prepare data (0, 1, 2, ..., 255)
univector<complex<double>, 256> data = cexp(
        linspace(0, c_pi<double, 2>, 256) * make_complex(0, 1));
univector<complex<double>, 256> freq;

// do the forward FFT
freq = dft(data);

To perform the inverse FFT do the following:

data = idft(freq);

But as a result, we will get an input scaled by size (256 in this example): 0, 256, 512, ... This is a property of FFT.

To get a desired result we have to divide result by size:

data = idft(freq) / data.size();

Real input

Frequency data are stored in CCS format as stated in the table below.

Note, that second half of an output is not stored because of symmetry.

index real imaginary
0 frequency[0].real() (DC offset) always 0 because of real input
1 frequency[1].real() frequency[1].imag()
2 frequency[2].real() frequency[2].imag()
N/2-1 frequency[N/2-1].real() frequency[N/2-1].imag()
N/2 frequency[N/2].real() (Nyquist frequency) always 0 because of real input

The size of the output data is equal to size/2+1

using namespace kfr;
// prepare data (0, 1, 2, ..., 255)
univector<double, 256> data = counter();
univector<complex<double>, 129> freq; // data.size() / 2 + 1

// do the forward FFT
freq = realdft(data);

For the inverse FFT, you have to prepare frequency data in the CSS format as well.

To be sure that you will get pure real output, you must ensure that freq[0] and freq[N/2] are real numbers.

using namespace kfr;
data = irealdft(freq);
// or to get the properly scaled result:
data = irealdft(freq) / data.size();

Creating FFT plan

Implementation of FFT requires twiddle coefficients to be prepared before actual processing occurs. If FFT will be performed more than once, then it makes sense to store the coefficients and reuse it every time.

FFT Plan caching

If you are using dft, idft, realdft or irealdft functions, all plans will be kept in memory, so the next call to these functions will reuse the saved data.

You can manually get the plan from the cache (or create a new if it doesn’t exist in the cache):

dft_plan_ptr<T> dft = dft_cache::instance().get(ctype<T>, size);

All functions related to the FFT caching are protected with mutex and are thread safe

ctype is a special template variable intended just to pass a type to the function.

dft_plan_ptr is an alias for std::shared_ptr<const dft_plan<T>>

To clear all saved plans and free memory, call the clear function:


As of version 1.3.0, functions convolve and correlate use the cache internally.

Examples using dft_plan

You can create a plan manually without using the cache to get more control over it.

dft_plan class is intended to store all data needed for FFT.

using namespace kfr;
template <typename T> // data type, float or double
struct dft_plan
    dft_plan(size_t size);
    void execute(complex<T>* out, 
                 const complex<T>* in, 
                 u8* temp, 
                 bool inverse = false) const;
    void execute(univector<complex<T>, N>& out, 
                 const univector<const complex<T>, N>& in, 
                 univector<u8, N2>& temp, 
                 bool inverse = false) const;

dft_plan has constructor that takes FFT size as an argument.

Input and output arguments for calls to the execute function must have types complex<T>* or univector<complex<T>, N>.

You can create a plan and use it as many times as needed. After creating a plan, all access to it is thread-safe, because executing the FFT doesn’t modify internal data, so you can share this plan across threads in your application without protecting the plan with locks and mutexes.

using namespace kfr;
dft_plan<double> plan(1024);
univector<complex<double>, 1024> in;
univector<complex<double>, 1024> out;
// here fill `in` array with our data (samples)
univector<u8> temp(plan.temp_size);
plan.execute(out, in, temp, false); // direct FFT
// `out` now contains frequencies which have to be processed
plan.execute(in, out, temp, true);  // inverse FFT
// `in` now contains processed data (samples)
// process new data
plan.execute(out, in, temp, false); // direct FFT

Real to complex and complex to real transforms using dft_plan_real

using namespace kfr;
dft_plan_real<double> plan(1024); // dft_plan_real for real transform
univector<double, 1024> in;
univector<complex<double>, 1024> out;
// here fill `in` array with our data (samples)
univector<u8> temp(plan.temp_size);
plan.execute(out, in, temp); // direct FFT
// `out` now contains frequencies which have to be processed
plan.execute(in, out, temp); // inverse FFT
// `in` now contains processed data (samples)


As you can see, using the Fast Fourier Transform is not so hard thing with KFR. Forward and inverse, complex and real transforms can be performed in one line of code with great performance and flexibility.

Moreover, the speed of KFR implementation of the FFT is in many cases higher than the speed of the most known and used libraries written in C, C++ or any other language.