

FPGA Programming Fundamentals for Every LabVIEW Developer

Introduction
Learning to program, let alone optimize, an FPGA application can take years, if not decades, of digital engineering experience. While the technology continues to evolve, certain fundamental concepts remain essential across all FPGA platforms.
Timing paradigms and clocking
Data types and structures
Memory and data transfer
Compilation and debugging
While this article does not intend to outline all of the considerations and benefits of FPGA-based application development, it does seek to provide context for these concepts, both in general and with particular attention to FPGA application development in LabVIEW. Hopefully you’ll walk away with insight into common programming tips and tradeoffs so that you can get your application up and running and refine it over time.
Section 1: Why Use an FPGA?
When developing on an FPGA, you are doing more than providing instructions to be executed on a pre-defined chip; you are actually customizing the FPGA chip itself. The benefit of this is that the application can achieve very high loop rates, parallelism, and responsiveness to I/O and logic states with minimal software overhead. Depending on your experience, this may be review, so you may want to skip down to some of the other sections covering pipelining, parallel loop operations, or host synchronization. But if you want the full story, read on.
FPGAs, or Field Programmable Gate Arrays, are digital circuits that can be modified time and again for different applications or updates to existing applications. For test and measurement applications, FPGAs can be programmed to perform tasks more traditionally handled by a processor – anywhere from a simple microcontroller through a multicore CPU. While not all tasks make sense to execute on an FPGA, applications requiring inline signal processing for minimal latency, high hardware-timed reliability, and/or high-speed, deterministic control are good candidates for FPGA integration.

Below are some of the most common use cases for FPGAs in control, monitoring, and test applications.
Reduce data processing time → sub-µs loop rates
Customizable algorithms and filtering → hardware-timed execution
Lower level memory control → DMA streaming with low overhead
Custom protocol support and decoding → digital communications and integration
Complex and deterministic triggers → customize responsiveness
Now, if you know you need any of the system or performance benefits listed above, an alternative option to using an FPGA is to use an ASIC. An ASIC, or Application-Specific Integrated Circuit, is a chip that has a predefined set of functionality that can be programmatically accessed through an API.
Depending on the application requirements and planned deployment volume, selecting or building an ASIC for some set of functionality may be the right design decision, but the core limitation is that once it is built, the underlying hardware cannot be modified. From the moment the ASIC is fabricated, you take it as it is.
And here is the beauty of FPGA-based development. You can not only modify the application software running on top of the FPGA, but also modify the hardware circuit implemented on the FPGA itself. This gives development teams massive flexibility to adapt to changing application requirements, bugs, and new features over time.
Section 2: FPGA Programming Basics
HDLs, or Hardware Description Languages, are used to program an FPGA chip. To oversimplify what programming a circuit means: an HDL program is compiled and pushed to the FPGA target, which takes in that compiled program and configures the logic on the chip. The result is a temporarily static instantiation of a “personality” on the FPGA that incorporates:

Memory blocks – data storage in user-defined RAM
Logic blocks – Logic, arithmetic, DSP algorithms, etc.
I/O blocks – connections between external circuits (e.g., sensors, processors, other FPGAs) and logic blocks
Interconnections – programmable routing between logic and I/O blocks, common in any integrated hardware application
If the developer wants to make a tweak to some block or repurpose the FPGA entirely, they must modify the HDL source code, re-compile, and re-push the compiled code to the FPGA. Pretty neat, right?
Yes, very neat, though there are some caveats:
When compared to an ASIC, FPGAs are typically more power hungry, less performant, and higher cost for post-design deployment, though the application design cycles are typically far shorter, thereby lowering total engineering cost.
In terms of abstraction, HDLs resemble assembly languages, which are quite low in the compute hierarchy. To put it another way, they are not easy to program.
While there are some early AI HDL copilot tools out there, they seem to be far from maturity in truly expediting the digital software design process.
While there are numerous HDLs available, the two most common are VHDL and Verilog. While there are some savants out there, if you haven’t been using these languages for some time or don’t have a library of existing IP at your disposal, the learning curve on these languages is both steep and long. Don’t forget your oxygen tank.
This is where LabVIEW FPGA comes in. It provides programming access points at a higher level in the abstraction hierarchy, targeting developers who see the benefit of using an FPGA but don’t have expertise in HDL development and validation. While LabVIEW is a full-featured graphical programming language with full IDE support, the FPGA Module extends some of its core tenets to FPGA application development, making high-performance, low-latency FPGA-based systems more accessible to a wider swath of engineering teams. Again, if you’re an experienced HDL programmer, more power to you.

The remainder of this article intends to provide additional details on how FPGA-based application development can be simplified in LabVIEW and some tips to use along the way.
Want to see some application examples where LabVIEW FPGA shines?
Section 3: FPGA Clocking
If you’re developing on an FPGA, you almost certainly have some critical design considerations around closed-loop control and timing. While FPGAs typically run at lower clock rates than CPUs and GPUs, the level of control and parallelism they provide can enable very complex orchestration of processes. LabVIEW FPGA provides a number of tools to control timing in your application, but here we’ll cover Single-Cycle Timed Loops and derived (divided) clocks.
The Single-Cycle Timed Loop is a While Loop intended to execute all of the functionality inside it within one clock cycle of the FPGA. In this code example, all of the functionality added to the loop would need to execute within 5 ns (a 200 MHz clock). If you try to compile and all of the functionality cannot be executed on the FPGA within that interval, LabVIEW will throw a timing violation, suggesting that you optimize or remove functionality, or lower the clock rate.

Derived clocks enable developers to easily create loops in different clock domains within the same application. This gives the developer flexibility over which functionality gets executed at which rate, empowering them to parallelize processes, reserve FPGA resources for higher-speed tasks, and avoid timing violations. The clock configuration utility in LabVIEW FPGA lets you derive new clocks from integer multipliers and divisors of a base clock.
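To make the timing budget concrete, here is a quick back-of-the-envelope sketch in C (plain host-side code, not something that runs on the FPGA). It assumes a 40 MHz base clock and a handful of illustrative multiplier/divisor pairs; the per-iteration budget of a Single-Cycle Timed Loop is simply the period of the clock driving it.

#include <stdio.h>

/* Conceptual sketch: the time budget available inside a Single-Cycle Timed
 * Loop is the period of the clock driving it. The 40 MHz base clock and the
 * multiplier/divisor pairs below are illustrative examples only. */
int main(void)
{
    const double base_hz = 40e6;              /* assumed base clock */
    const struct { int mult, div; } derived[] = {
        {1, 1},   /*  40 MHz ->  25 ns per cycle */
        {2, 1},   /*  80 MHz ->  12.5 ns per cycle */
        {5, 1},   /* 200 MHz ->   5 ns per cycle (the example above) */
        {1, 4},   /*  10 MHz -> 100 ns per cycle */
    };

    for (size_t i = 0; i < sizeof derived / sizeof derived[0]; ++i) {
        double hz = base_hz * derived[i].mult / derived[i].div;
        printf("%6.1f MHz clock -> %7.2f ns budget per SCTL iteration\n",
               hz / 1e6, 1e9 / hz);
    }
    return 0;
}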

Section 4: Numeric Data Types
In LabVIEW FPGA applications, there are three main numeric data types: integers, fixed point, and single-precision floating point. It is non-trivial to decide which data type to use for different scenarios, so here’s a simplifying outline of the choices.
Integers – Our understanding of integers dates back to grade school and was bolstered when we first learned to program (ANSI C for me). You can use integers when there is no need for precision beyond the decimal point, but the story goes deeper than that. Integers can be a good choice for numeric representation if you have the following requirements:
Bit manipulations, such as masking, shifting, or inverting.
Packing of multiple 8- or 16-bit integers into 32- or 64-bit words. This can be helpful with data sharing as it minimizes the overhead associated with each numeric read/write.
Choosing between calibrated fixed-point or uncalibrated integer I/O node outputs
Fixed-point – As opposed to floating-point numbers, which have relative precision (the decimal placement can “float”), fixed-point numbers have absolute precision (the decimal point is set). Fixed point is a good choice when any of the following apply:
Resource-efficient arithmetic
You’re using High Throughput Math functions (covered in Section 6)
Default datatype with C Series analog I/O
Watch out for data saturation and LSB (least significant bit) underflow errors.
Single-precision floating point – This numeric datatype in LabVIEW FPGA provides 24 bits of precision with a variable position of the decimal point. Naturally, arithmetic operations on floating-point numbers are more resource intensive than on integers or fixed-point numbers, though there are a couple of interesting use cases:
High dynamic range data paths. Many analog I/O channels have multiple ranges, where the best precision and accuracy is provided by the range whose maximum value is as close to the measurement or setpoint as possible. Oftentimes changing ranges means changing digits of precision, implying your FPGA design needs to be flexible across that set of ranges.
Prototyping algorithms and designs quickly without losing precision, worrying about resource optimization later.
Oh, and don’t forget to watch out for data type tradeoffs. The extra resources for arithmetic can scale quickly when performing operations on large arrays and other data structures. Sometimes precision is worth the extra FPGA resources and sometimes it is not. Only you, as the system developer, can determine that.
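To make the fixed-point idea concrete, here is a minimal C sketch (a software analogy, not LabVIEW FPGA code) of how a fixed-point value is just an integer with an implied scale factor, which is why its arithmetic stays cheap. The 16-bit word with 8 fractional bits is an arbitrary example format, not a LabVIEW default.

#include <stdint.h>
#include <stdio.h>

/* Conceptual fixed-point model: a signed 16-bit word with 8 fractional bits
 * (an arbitrary example format). Precision is absolute: the LSB is always
 * 1/256, regardless of the value's magnitude. */
#define FRAC_BITS 8

typedef int16_t fxp_t;

static fxp_t  to_fxp(double x) { return (fxp_t)(x * (1 << FRAC_BITS)); }
static double to_dbl(fxp_t x)  { return (double)x / (1 << FRAC_BITS); }

static fxp_t fxp_mul(fxp_t a, fxp_t b)
{
    /* Widen, multiply, shift back. In hardware this maps to an integer
     * multiplier, which is far cheaper than a floating-point unit.
     * Rounding and saturation are omitted for brevity; an arithmetic
     * right shift is assumed for negative values. */
    return (fxp_t)(((int32_t)a * (int32_t)b) >> FRAC_BITS);
}

int main(void)
{
    fxp_t gain   = to_fxp(1.5);
    fxp_t sample = to_fxp(-0.25);
    printf("1.5 * -0.25 = %f\n", to_dbl(fxp_mul(gain, sample)));  /* -0.375 */
    return 0;
}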
Section 5: Pipelining
Pipelining is an extremely powerful paradigm applicable across computing architectures. Pipelining allows operations (or instructions at the chip level) to execute in parallel, thereby increasing throughput and achievable clock frequency for a given amount of resource utilization. This means that, assuming there is FPGA fabric available, transforming a non-pipelined design into a pipelined one lets you increase the clock rate for a given set of operations. If you don’t have a need to execute faster or increase throughput, pipelining may not be worth your trouble.
Implementing a pipelined design is aided by feedback nodes in LabVIEW FPGA. Feedback nodes incorporate a data register under the hood such that data can be shared between loop iterations at each step along the process.
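Since a block diagram can’t be shown in text here, the following C sketch is a rough software model of what those feedback nodes accomplish: the stage registers carry each stage’s intermediate result into the next iteration, so on every cycle all stages work concurrently on different samples. The three stages and their operations are purely illustrative.

#include <stdio.h>

/* Conceptual model of a 3-stage pipeline. The stage registers s1 and s2
 * play the role of LabVIEW FPGA feedback nodes: they carry a stage's result
 * into the next iteration, so each "clock cycle" every stage processes a
 * different sample in parallel. Throughput is one result per cycle once the
 * pipeline fills, at the cost of two cycles of latency. */
int main(void)
{
    int input[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int s1 = 0, s2 = 0;                         /* pipeline registers        */

    for (int cycle = 0; cycle < 8 + 2; ++cycle) {   /* +2 cycles to flush    */
        int out     = s2 * 2;                       /* stage 3: scale        */
        int next_s2 = s1 + 10;                      /* stage 2: offset       */
        int next_s1 = (cycle < 8) ? input[cycle] : 0; /* stage 1: acquire    */

        s1 = next_s1;                               /* registers update      */
        s2 = next_s2;                               /* "on the clock edge"   */

        if (cycle >= 2)                             /* valid after latency   */
            printf("cycle %2d: result = %d\n", cycle, out);
    }
    return 0;
}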

Section 6: High Throughput Math Functions
These specialized functions in the LabVIEW FPGA API are implemented with pipelining under the hood, thereby saving significant development and debugging time compared to custom-designed functions. While they may not be as performant as alternatives composed in lower-level HDLs, they tend to work pretty well for out-of-the-box functionality.
The API includes trigonometric, exponential, logarithmic, and polar operations, in addition to (no pun intended) basic arithmetic operations.
These functions operate on fixed-point numbers.
Some of these functions have Throughput controls that strive to meet the data throughput level you specify.

Section 7: Parallel Loop Operations
As previously described, you may require multiple loops running on your FPGA, such as the need to have different processes running at different loop rates to optimize FPGA resource utilization. There are a number of mechanisms available in LabVIEW FPGA used to communicate between these various loops.
Local variables, global variables, register items – These mechanisms are used for communicating the latest value without buffering. Because of this, they are subject to pesky race conditions that arise when you have multiple writers and one or more readers of that memory space. Also, because there is no buffering, they are generally not good for data streaming, which typically requires a lossless communication mode.
Local variables – scoped to a single VI
Global variables – scoped across multiple VIs
Register items – Because you can generate a reference to a register, you can re-use subVIs that access different registers (through the provided reference) given different calling conditions. This makes them more flexible in practice than local and global variables.
FIFOs and handshake items – These “first in, first out” data structures are the bread and butter of lossless data communication because they have allocated memory to buffer data. This is useful when the producer of the data and the consumer of the data do not always run at the same rate. Because all memory blocks in the FPGA are allocated at compile time, it is possible for these data structures to run out of memory. For FIFOs, you can have multiple writers and multiple readers accessing the same data buffer, whereas handshake items are single writer / single reader. (A small sketch contrasting register- and FIFO-style communication follows at the end of this section.)
Memory items – Block memory and lookup tables (LUTs) provide mechanisms for re-writeable data storage that can be accessed across your FPGA application. They are lossy and therefore a poor choice for data streaming, though quite flexible otherwise.
We’re starting to get into some rather non-trivial concepts here with various caveats and implications, meaning care and caution must be exercised when choosing which data communication mechanism should be used for different pieces of functionality across complex FPGA applications.
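As a rough software analogy (C again, not LabVIEW), the sketch below contrasts the two extremes: a register-style tag that only ever holds the latest value, and a fixed-depth FIFO that buffers everything until the consumer catches up. The depth of 8 and the 3:1 producer/consumer rate mismatch are arbitrary choices for illustration.

#include <stdio.h>

#define FIFO_DEPTH 8   /* like FPGA FIFOs, the buffer size is fixed up front */

/* Register-style communication: one slot, newest value wins (lossy). */
static int reg_value;

/* FIFO-style communication: fixed-depth ring buffer (lossless until full). */
static int fifo_buf[FIFO_DEPTH];
static int fifo_head, fifo_count;

static int fifo_write(int v)
{
    if (fifo_count == FIFO_DEPTH)
        return 0;                       /* full: caller must wait or drop */
    fifo_buf[(fifo_head + fifo_count++) % FIFO_DEPTH] = v;
    return 1;
}

static int fifo_read(int *v)
{
    if (fifo_count == 0)
        return 0;                       /* empty: nothing to consume */
    *v = fifo_buf[fifo_head];
    fifo_head = (fifo_head + 1) % FIFO_DEPTH;
    fifo_count--;
    return 1;
}

int main(void)
{
    /* Producer runs 3x faster than the consumer. */
    for (int t = 0; t < 12; ++t) {
        reg_value = t;                  /* register: each value overwrites */
        fifo_write(t);                  /* FIFO: every value is retained   */

        if (t % 3 == 2) {               /* consumer wakes up every 3rd step */
            int v;
            printf("register sees %d; FIFO yields:", reg_value);
            while (fifo_read(&v))
                printf(" %d", v);
            printf("\n");
        }
    }
    return 0;
}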
Section 8: Host Synchronization
Depending on the application, you may need to send data between an FPGA and an external processor, such as a Real-Time processor or a Windows host PC, for datalogging, further processing, or visibility in a UI. One tool to help with this data sharing is a Direct Memory Access (DMA) FIFO. These data structures provide an efficient mechanism for data streaming which minimizes the processor overhead for lossless data fetching.

In this code snippet, the FPGA is taking accelerometer data, converting that data to single-precision floating point, and writing it to a lossless DMA FIFO. What is not shown here is the corresponding FIFO Read function call that would be made asynchronously from code running external to the FPGA.
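For reference, the host side of that transfer might look something like the C sketch below, using the NI FPGA Interface C API (reading the FIFO from a LabVIEW host VI is the more common route). The bitfile header name, the "RIO0" resource, the AccelData FIFO constant, and the block sizes are all placeholders; in practice these come from the header generated for your bitfile, and the read function must match the FIFO’s configured data type.

#include <stdio.h>
#include "NiFpga_MyFpga.h"   /* hypothetical generated header for the bitfile */

int main(void)
{
    NiFpga_Session session;
    NiFpga_Status status = NiFpga_Initialize();

    /* Open the (hypothetical) bitfile on the RIO target and run the FPGA VI. */
    NiFpga_MergeStatus(&status,
        NiFpga_Open(NiFpga_MyFpga_Bitfile, NiFpga_MyFpga_Signature,
                    "RIO0", NiFpga_OpenAttribute_NoRun, &session));
    NiFpga_MergeStatus(&status, NiFpga_Run(session, 0));

    /* Pull blocks of samples out of the DMA FIFO. The FIFO constant and
     * element type come from the generated header; a FIFO configured for
     * another data type would use the matching read function. */
    uint32_t data[1024];
    size_t remaining;
    for (int block = 0; block < 10 && NiFpga_IsNotError(status); ++block) {
        NiFpga_MergeStatus(&status,
            NiFpga_ReadFifoU32(session,
                               NiFpga_MyFpga_TargetToHostFifoU32_AccelData,
                               data, 1024, 5000 /* ms timeout */, &remaining));
        printf("read 1024 elements, %zu still buffered\n", remaining);
    }

    NiFpga_MergeStatus(&status, NiFpga_Close(session, 0));
    NiFpga_MergeStatus(&status, NiFpga_Finalize());
    return NiFpga_IsError(status);
}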
Section 9: Compilation
There exists a world of information and complexity on this topic, but here we’ll highlight some core points to help you take your design implemented in LabVIEW FPGA and get it running on real hardware.
While LabVIEW code intended to run on Windows does not necessarily require a compilation step, LabVIEW FPGA code always does. The output of a LabVIEW FPGA compilation is a bitfile, which can be referenced from calling programs and re-used across similar FPGA targets. When you choose to build your code, LabVIEW presents you with a number of configuration and optimization options which must be selected (or implicitly accepted in common click-through practice) before kicking off the compilation. From there, LabVIEW generates intermediate files which are then passed to a Xilinx compiler (don’t worry, you don’t need to install these tools separately).
Lastly, you have a few different options of where the code is actually compiled. You can do so locally on the development machine, on a networked server, or on an NI-provided cloud service. If you’re just getting started with LabVIEW FPGA, compiling locally is probably the easiest option, but once you get up and running with the tool chain, the cloud compile service is pretty neat and offloads a ton of work from your machine.

Section 10: Debugging
Ever write perfect code the first time around? Me neither. Debugging is a fact of life, but you don’t need me to tell you that.
Because FPGA code can take a long time to compile, you don’t necessarily want to go through that step every time you make a small tweak to an algorithm or want to test out a new subVI. Given this, LabVIEW FPGA offers a few different execution modes which can be very helpful in debugging. I’ve only ever used Simulation (Simulated I/O) when not going through a full compilation.

With that said, relying solely on simulated I/O and memory to verify full system functionality is generally a bad idea.
Host visibility: Use indicators and FIFOs to pass data from key areas up to the host to get quicker top-level visibility into different pieces of lower level functionality in subVIs that are otherwise difficult to access. In final deployment applications, be sure to remove unnecessary indicators and data structures as they consume limited resources.
Performance benchmarking: Use sequence structures and tick counters to benchmark timing in critical sections of code (a minimal sketch of the tick-count pattern follows this list). From experience, this is often an iterative process where you ratchet up loop rates until timing violations occur or you identify areas of the code that can be further optimized through pipelining or other tactics.
Xilinx Toolkit: Use the Xilinx ChipScope toolkit to probe, trigger, and view internal FPGA signals on FlexRIO targets.
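For the benchmarking tip above, the only subtle part is handling counter rollover. Here is a small C model of the pattern (in LabVIEW FPGA you would wrap the code under test between two Tick Count nodes inside a sequence structure); the read_tick_counter() helper and the 40 MHz clock figure are stand-ins for illustration.

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the FPGA's tick counter: a free-running 32-bit count of clock
 * cycles that eventually rolls over. Here it is simply simulated. */
static uint32_t fake_ticks = 0xFFFFFFF0u;     /* start near rollover on purpose */
static uint32_t read_tick_counter(void) { return fake_ticks; }

int main(void)
{
    uint32_t start = read_tick_counter();

    /* ... code under test would run here; pretend it took 0x20 cycles ... */
    fake_ticks += 0x20;

    uint32_t stop = read_tick_counter();

    /* Unsigned subtraction wraps correctly, so the measurement survives a
     * counter rollover between the two reads. */
    uint32_t elapsed = stop - start;
    printf("elapsed: %u ticks (%.1f ns assuming a 40 MHz clock)\n",
           (unsigned)elapsed, elapsed * 25.0);
    return 0;
}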
Conclusion
Whether you’re assessing FPGA technology fit, trying to choose a future-proofed platform, or developing a system, the concepts outlined in this article are intended to bolster your knowledge so you can make the best decisions possible for your application. While specifically focused on fundamentals applied through LabVIEW FPGA, the concepts discussed span chipsets, programming languages, and application requirements:
Timing paradigms and clocking
Data types and structures
Memory and data transfer
Compilation and debugging
If you’re interested in learning more about design patterns associated with common processes, such as analog data streaming, custom triggers, and serial protocol decoding, you can review this article.
Ready to enhance your FPGA development skills or want help selecting a technology?