top of page

How Do You Join Data Sets In Tableau?

Updated: 4 days ago

Sometimes a data set doesn't have everything you need in it for a certain visualization. You will have to perform extra steps to get the data in the correct format.

Do you want to see the below steps in a video? I created a YouTube video here if you want to see the steps!

Let's look at a calendar example using the Sample Superstore dataset. The below calendar is showing December Sales for 2021. Notice it is missing Sales for December 12th and December 31st. How can I get a value of zero for these dates? Well there isn't a row for these dates, so Tableau cannot fill the holes in with this information.

If I right-clicked on the Sales measure and selected Format, I could try to enter a 0 in the Special Values (eg NULL) field but nothing happens. The visualization remains the same. There are no rows with NULL values for those dates, because the dates do not exist in the data set.

How can we correct this issue? You can correct this by adding rows with those dates in the data set. This process is called Date Data Scaffolding. I will be using the method Ken Flerlage describes in his blog article here. I have to perform a Join to fill these holes.

First I extract all the Customers and the Customer Numbers from the Orders table in the Sample Superstore data set. This will provide me with a unique customer data.

Then I need to create a data set with just the Dates in it. I will start with January 1st of the first year of my data set. For this data set, that is 2018. Then I will add all the dates down to the very last day of the year for 2021. If you had a ETL tool like Alteryx, you can use the Generate Rows tool for this.

Now in Tableau, I bring in the Customer data source first. I drag the sheet from this data source containing the customer names to where it says drag tables here. Then I double-click on the sheet to see its physical table.

Tableau will put in a grey outline and add a blue line on the left of this sheet.

Next I needed to create a Cross Database Join and add the spreadsheet with just the Dates. I clicked on the Add button in blue under Connections. Then I navigated to where I had the Dates spreadsheet and added it.

Then I dragged the sheet from this data source onto the Table pane. Tableau prompted me for a join clause.

Unfortunately these two data sources do not have a primary key or column in common.

I had to create a join clause and created a Cartesian Join or Cross Join. This method is adding everything to everything. So for every customer name you will add a date from 01/01/2018 - 12/31/2021.

I clicked on the Create Join Calculation from the Data source drop-down. Then on the Join Calculation pop-up menu I entered the value of 1. Then I clicked OK.

Then under the second sheet, I had to do the same steps.

So every customer has a row with 01/01/2018 and 01/02/2018 and so on for all of the dates.

The first data source (Customer table) is outlined in blue and the second data source (Dates table) is in Orange.

Now we need to add the Orders table to this as a third data set. I clicked on Add under Connections and brought in the Sample - Superstore dataset. I dragged the Orders table to the Tables pane. I selected Left join since it automatically selected inner. It will join on Customer ID since that is the column in common for both datasets. I added a Secondary Join by joining Date to Order Date in the second row.

*Be careful with Cross Database Joins. Refer to this Tableau Help article on how to speed up processing for this here.

After this process you will have rows for the missing data. The Left Join selection enabled any customers that have sales on dates will appear as shown below;

Now I rebuilt that calendar example using the Date column and not the Order Date column. As you can see below in this calendar example, I now have days for the missing dates.

And if I right-click Format on the Sales measure, I can add zero in the Special Values (eg. NULL) field. It will clean up the visualization and add zero on those dates.

This is just one use case for joining data. There are many others!

203 views0 comments


bottom of page