In this article I’ll show you how to use Fast Fourier Transform in Digital Signal Processing and how to apply forward and inverse FFT on complex and real data using the KFR framework.

## Introduction

Fast Fourier Transform (FFT) can be used to perform:

- Convolution (including convolution reverberation)
- Cross-correlation and auto-correlation
- Applying large FIR filters
- Sample rate conversion
- Spectrum visualization
- Large integer multiplication
- and many other algorithms

Often FFT is the most efficient way to perform each of these algorithms.

Moreover, KFR has one of the most efficient FFT implementation, so you can get a great performance boost of your DSP applications using KFR’s FFT for all of these algorithms.

## Differences from other implementations

KFR implementation of the FFT:

- is fully optimized for X86, X86-64, ARM and AARCH64 processors
- uses vector intrinsics (if available for cpu)
- supports both single- and double precision
- can cache internal data between calls to speed up plan creation
- can do forward and inverse FFT without a need to create two plans
- can be used for complex-to-complex, real-to-complex and complex-to-real transforms
- doesn’t require computing FFT at runtime and measuring their execution time to find an optimal configuration
- has special implementations for FFT sizes up to 256
- has no external dependencies
- is thread-safe
- is written in modern C++
- is header-only
- is open source (GPL license)

## Getting started

For the list of supported compilers and OSes, see the latest README:

https://github.com/kfrlib/kfr/blob/master/README.md

- Download the latest version of KFR:
- Add the
`include`

folder to your include path. - Insert the following line at the beginning of your project’s source code:

1 2 |
#include <kfr/dft.hpp> |

KFR is **header-only**, so no need to compile any source files, create static library and pass it to linker. The only thing you must to do is to unsure that `kfr`

folder from `include`

directory is available for inclusion.

## Quick example

### Complex input

Let’s apply the FFT to the complex input data:

1 2 3 4 5 6 7 8 9 |
using namespace kfr; // prepare data (0, 1, 2, ..., 255) univector<complex<double>, 256> data = cexp( linspace(0, c_pi<double, 2>, 256) * make_complex(0, 1)); univector<complex<double>, 256> freq; // do the forward FFT freq = dft(data); |

To perform the inverse FFT do the following:

1 2 |
data = idft(freq); |

But as a result, we will get an input scaled by size (256 in this example): `0, 256, 512, ...`

This is a property of FFT.

To get a desired result we have to divide result by size:

1 2 |
data = idft(freq) / data.size(); |

### Real input

Frequency data are stored in CCS format as stated in the table below.

Note, that second half of an output is not stored because of symmetry.

index | real | imaginary |
---|---|---|

0 | frequency[0].real() (DC offset) |
always 0 because of real input |

1 | frequency[1].real() |
frequency[1].imag() |

2 | frequency[2].real() |
frequency[2].imag() |

… | … | … |

N/2-1 | frequency[N/2-1].real() |
frequency[N/2-1].imag() |

N/2 | frequency[N/2].real() (Nyquist frequency) |
always 0 because of real input |

The size of the output data is equal to size/2+1

1 2 3 4 5 6 7 8 |
using namespace kfr; // prepare data (0, 1, 2, ..., 255) univector<double, 256> data = counter(); univector<complex<double>, 129> freq; // data.size() / 2 + 1 // do the forward FFT freq = realdft(data); |

For the inverse FFT, you have to prepare frequency data in the CSS format as well.

To be sure that you will get pure real output, you must ensure that freq[0] and freq[N/2] are real numbers.

1 2 3 4 5 |
using namespace kfr; data = irealdft(freq); // or to get the properly scaled result: data = irealdft(freq) / data.size(); |

## Creating FFT plan

Implementation of FFT requires twiddle coefficients to be prepared before actual processing occurs. If FFT will be performed more than once, then it makes sense to store the coefficients and reuse it every time.

### FFT Plan caching

If you are using `dft`

, `idft`

, `realdft`

or `irealdft`

functions, all plans will be kept in memory, so the next call to these functions will reuse the saved data.

You can manually get the plan from the cache (or create a new if it doesn’t exist in the cache):

1 2 |
dft_plan_ptr<T> dft = dft_cache::instance().get(ctype<T>, size); |

All functions related to the FFT caching are protected with mutex and are

thread safe

`ctype`

is a special template variable intended just to pass a type to the function.

`dft_plan_ptr`

is an alias for `std::shared_ptr<const dft_plan<T>>`

To clear all saved plans and free memory, call the `clear`

function:

1 2 |
dft_cache::instance().clear(); |

As of version 1.3.0, functions

`convolve`

and`correlate`

use the cache internally.

### Examples using `dft_plan`

You can create a plan manually without using the cache to get more control over it.

`dft_plan`

class is intended to store all data needed for FFT.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
using namespace kfr; template <typename T> // data type, float or double struct dft_plan { dft_plan(size_t size); void execute(complex<T>* out, const complex<T>* in, u8* temp, bool inverse = false) const; void execute(univector<complex<T>, N>& out, const univector<const complex<T>, N>& in, univector<u8, N2>& temp, bool inverse = false) const; ... }; |

`dft_plan`

has constructor that takes FFT size as an argument.

Input and output arguments for calls to the `execute`

function must have types `complex<T>*`

or `univector<complex<T>, N>`

.

You can create a plan and use it as many times as needed. After creating a plan, all access to it is **thread-safe**, because executing the FFT doesn’t modify internal data,

so you can share this plan across threads in your application without protecting the plan with locks and mutexes.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
using namespace kfr; dft_plan<double> plan(1024); univector<complex<double>, 1024> in; univector<complex<double>, 1024> out; // here fill `in` array with our data (samples) univector<u8> temp(plan.temp_size); plan.execute(out, in, temp, false); // direct FFT // `out` now contains frequencies which have to be processed plan.execute(in, out, temp, true); // inverse FFT // `in` now contains processed data (samples) ... // process new data plan.execute(out, in, temp, false); // direct FFT |

### Real to complex and complex to real transforms using `dft_plan_real`

1 2 3 4 5 6 7 8 9 10 11 |
using namespace kfr; dft_plan_real<double> plan(1024); // dft_plan_real for real transform univector<double, 1024> in; univector<complex<double>, 1024> out; // here fill `in` array with our data (samples) univector<u8> temp(plan.temp_size); plan.execute(out, in, temp); // direct FFT // `out` now contains frequencies which have to be processed plan.execute(in, out, temp); // inverse FFT // `in` now contains processed data (samples) |

## Conclusion

As you can see, using the Fast Fourier Transform is not so hard thing with KFR. Forward and inverse, complex and real transforms can be performed in one line of code with great performance and flexibility.

Moreover, the speed of KFR implementation of the FFT is in many cases higher than the speed of the most known and used libraries written in C, C++ or any other language.

In the next part of the article I will show you how to perform various DSP algorithms using the FFT and the KFR framework.