Tableau is a powerful tool in the Business Intelligence (BI) market. It is akin to having the user serve as the Director of Analytics for a room full of developers. The user issues a quick line of script and a few drags of the mouse to orchestrate an elaborate series of number crunching steps reminiscent of manually programmed layers found in traditional software.
Developers may not consider this option because it’s an alternative and ready-made solution to the art of analytics visualization development. Typically, they use high-level languages, such as C# or Java, and a multitude of support frameworks, databases, and other tools in the back-end to accomplish similar goals.
However, developers should take advantage of BI tools more often. This article presents an in-depth examination of some available operations in Tableau, and compares them against a standard implementation in a typical software stack.
Visualizations and Features
Let’s start with a few example visualizations created using Tableau. Each took five minutes or less to create. They all use the Microsoft AdventureWorks database as a data source.
The following graph shows the total gross sales in dollars for each year.
Visualizations in Tableau are also called sheets, which you can combine into dashboards or stories. Data fields in Tableau are known as pills, and the places where you put them are shelves. You design sheets by dragging fields to the column and row shelves, like so:
The first three blue pill fields on the Rows shelf are dimensions, which are like the set of columns in a SQL GROUP BY statement.
Max Weight on the Rows shelf is a measure and has been set as a discrete field, which causes it to be rendered as text. The green pill in the Columns shelf has been set as a continuous field, so it will be rendered as a Mark — a bar chart in this case. The AGG that surrounds the Max Weight field is a function that tells the sheet the parameter is already aggregated to the level of the sheet.
The Filters card, shown below, contains the sheet’s filters. The Marks Bar uses the Max Weight for its color-coding. A Country filter drop-down list at the bottom allows you to make a selection.
It takes a little more work to accomplish this manually. The fields and filters above are equivalent to the following SQL:
This sheet configuration creates the following visualization:
While it would take just a short time for a seasoned SQL developer to write this query, its readability, maintainability, and the integration with calling code remains a challenge. It is difficult to quickly glance at the SQL query and know what it returns, whereas having a clean-cut set of pills is much simpler. And, the same goes for the where clause in SQL. The Filters card in Tableau is enough to give you the gist of what is filtered out. Furthermore, adding SQL parameterization to the SQL to pass filter values is a moot point in Tableau, because you can easily choose to show a filter for any of the filter pills and choose one of their drop down box types to render the selections.
Tableau offers many styles of visualization with lots of useful features. Below is the same sales data displayed as a line graph with a month level of detail (LOD). Users can examine the trend line’s details by hovering over it with the mouse.
Tableau offers many features for performing statistical analysis. Here’s a pair of bar graphs that shows each state’s gross sales and a comparison to the country’s average. The coloration, formatting, axis ranges, and other settings are easily adjustable. The bars on the right side use a LOD expression to perform the calculations, which we’ll get to shortly.
Calculated Field Scripting Environment
Tableau provides a scripting environment for authoring calculated fields. A calculated field lets you derive new fields that can use data source fields or other calculated fields. The text editor provides IntelliSense and a documentation panel:
As mentioned previously, LOD expressions are a type of calculated field. These fields let you redefine the set of dimensional fields used by an aggregation operator (like Average, Sum, Min, and so on), instead of relying solely on the host sheet’s set of dimensions. In contrast, a non-LOD field uses the sheet’s explicit dimensions for dimensional scope.
An LOD expression has a simple three-part syntax:
- LOD type name (Fixed, Include, or Exclude)
- Dimensions applied to LOD (any combination from the data source)
- Aggregation expression (the same syntax as non-LOD fields)
Here is an example LOD field that calculates average Total Due by Store ID and Customer ID.
In essence, LOD expressions are a shorthand for SQL subqueries. The host sheet uses these subqueries through an inner join and uses the LOD aggregate operator’s result like any other measure or dimension. Each LOD type corresponds to a strategy for building the effective list of dimensions for the sub-query.
The following sheet includes all three types of LODs—FIXED, INCLUDE, and EXCLUDE. In the last text column, the sheet shows the maximum weight of the sales orders by Country, State, and day of week as Max Weight, along with a corresponding color-coded bar graph. The LODs are displayed in the other weight-related columns.
Let’s look a little more closely at the three types of LOD expressions.
The FIXED LOD expression creates a separate result set, like an SQL subquery that’s joined to the results of the sheet. This type of LOD determines the full dimensionality (or set of GROUP BY columns) of the subquery. Here’s an example:
The results are shown in yellow:
This is equivalent to the following SQL:
The EXCLUDE LOD expression also creates a separate result set that’s joined to the query results of the sheet. The difference is that it removes its specified dimensionality from that of the host sheet to form a subquery. The following expression creates the column called Max Weight Exclude Day.
The column data is shown below:
Notice how it’s equivalent to the previous FIXED LOD type expression. This is because:
Country + State == Country + State + WeekDay – WeekDay
INCLUDE LODs are similar to the FIXED type, but they simply append their specified list of dimensions to those of the sheet to create a subquery. Typically, if more dimensions are required to perform a calculation, but you don’t want them to play a role in the visualization, such as affecting the data-driven row or column counts, then an INCLUDE expression is the right choice.
Here’s an example. The INCLUDE LOD expression below calculates the minimum weight across the View (Country, State, WeekDay), plus an extra dimension: the [product] Category. This expression results in a subquery with one more level of detail than that of the sheet. The sheet calculates the average of those minimums and names it Avg. Min Weight Include Category:
Here’s the sheet’s row shelf where the above LOD expression is wrapped in the average aggregate operator:
The results of this average are shown in yellow:
These two simple-looking expressions are equivalent to the following large block of SQL:
The SQL approach involves a lot of effort — or at least a lot of code, and the accompanying opportunity for hard-to-debug errors — just to get the same results that using only five terms and fields in Tableau achieves.
Tableau offers table calculations that allow the user to override the values of a measure field inside a sheet. These calculations take the field’s entire set of values from the sheet as input and then map each of those inputs to a new value. Here’s an example table calculation that calculates the running total of the Total Weight column:
The calculation is configured to use all of the available dimensions in the sheet to track the current total. This can be accomplished by setting Compute Using to either Table (down) or Specific Dimensions and selecting all of the dimensions as shown below.
In addition, the running total can be restarted each time a new year is encountered in the sheet by specifying this in the Restarting every setting, shown in yellow.
Here’s the resulting sheet after making this selection:
Tableau offers many kinds of table calculations.
The Moving Average calculation offers nifty features shown in yellow below, which include setting the size of the look-ahead and look-back windows and whether to include the current row value in the average calculation for each row. One use case for this calculation could be to generalize the shipping effort for future years, or any other case that requires noise filtering.
The results of applying this moving average to the Total Weight is shown below.
It might be easy to take the power of such features found in the GUI for granted, but consider that, if this were instead implemented using a traditional software stack, the calculations would likely occur in the client application, since SQL isn’t equipped to perform them. A C# language implementation of the same moving average calculation can put this into perspective:
This implementation is O(n) for any relative look-ahead or look-behind window size compared to the input size. Moreover, this method is actually hard to come by in the big-name web and desktop visual component makers, and it’s not even available in the Math.NET library (which does offer a different flavor of the moving average with only a single window size).
Tableau makes it easy to design your workbook’s data source queries with a tool that has all the standard features, plus a lot more. You can search for tables using a text box and then, after dropping them onto the designer, override their automatically discovered relationships as needed and alias the tables.
The graphical depiction of inner and outer joins is clear and concise. Plus, the designer automatically lays out entities in a clean left-to-right hierarchical fashion. It even shows unions of tables, a feature that’s not available in the SQL Server Management Studio’s (SSMS) view designer.
The Tableau designer is shown below, depicting part of the data source query used in earlier visualization examples.
The display only includes table names. However, you can interrogate the full listing of field names and query data in the lower details pane (not pictured). You can also bring up details of the joins by clicking on the relationship graphic, as shown here.
In contrast, SSMS (version 18.4) defaults to a table displaying the columns, and doesn’t automatically lay out the tables — though, with some clicking and dragging, an equivalent view can be achieved.
SSMS also offers a similar join menu, shown here.
The Tableau data query SQL is accessible through the Data > Convert to Custom SQL command. If you’re unable to complete your query design in the Tableau designer, this command offers a fallback option.
After running the convert command, you can copy the SQL and then create a view object in your relational database. Here’s an example:
Live and Extract Modes
Tableau lets you choose either a real-time data adapter or a super-fast and highly compressed proprietary file plus runtime engine combination for its data access layer (DAL). These two modes are called Live and Extract. The GUI selector is shown here.
The Live mode (data adapter) causes Tableau to delegate much of the analytic processing (minus table calculations, of course) to the underlying data sources. This means Tableau generates large amounts of SQL in the case of a relational database, which it tailors to the specifics of each platform.
You can examine the SQL executed by Live mode if needed. Here’s a quick peek using Microsoft SQL Profiler while executing the Refresh Data Source command:
In contrast, Extract mode causes Tableau to create a .hyper file that contains a columnar-structured copy of the data required by the workbook. Tableau creates multiple memory-mapped file (MMF) views in memory of this .hyper file in response user activity in Tableau. This provides performance gains over either paging through a regular file on disk using file I/O or swallowing the entire dataset into memory.
Tableau’s automatic memory management of the .hyper file through MMFs is analogous to that of the .NET Framework or a JVM garbage collection scheme, which it tailors to the needs of the user’s workbook.
In the end, Tableau is just another coding environment. It improves analytics development by providing a number of high-level capabilities and design-time user interface integration. Traditional approaches using SQL, Java, C#, and so on may be a developer’s first choice because they are un-opinionated, familiar, and flexible. However, Tableau has a lot of advantages worth exploring.
Additionally, developers can automate and extend Tableau’s native features to support large-scale and specialized analytics operations by using Tableau’s APIs and SDKs — in other words, going beyond the desktop. They exist in different forms including in-process libraries, HTTP/REST/GraphQL servers, and web UI components.
If we consider a larger strategic perspective, it is possible that the emerging Command Query Responsibility Segregation (CQRS) design pattern for microservices architecture may even factor out some aggregation type details of the Read responsibility found in the Create Read Update Delete (CRUD) mantra of large-scale systems, leaving them to a tool like Tableau.
Developers have opportunities to apply their expertise given any level of Tableau adoption. After all, developers should play a role whenever and wherever code is written.