Data Blending - Is it Good or Bad?

Suppose you have two data sources that you can't pull in with a join or a union. How do you bring in these two data sources?


Tableau has an ability to blend data. Meaning, if you have one primary data source selected and you have another on the server, you can bring data from both sources into one worksheet. Tableau will then select a primary key to blend the data together.


Limitations

Data blending is the equivalent of a left outer join, sort of... You will receive an asterisk(*) for any data that joins on multiple values with the primary key or field you are using to blend the two data sets. See the example below from the Help Tableau site here;


See how Branch has an asterisk? It's because there are two Branches listed for District 2. It aggregated (or summed the data together) for both branches. Any data brought in from a secondary data source will automatically be aggregated.



The final data appears as below;



Notice there are null values for User ID 4. In a Left Join, any rows that are not in the left data table will not be brought in. The data for User ID 4 would be omitted if this was a Left Join.


The other downside to blending is that they cannot be published as a unit. Each data source will be published separately. This can add space on the server.


If the two data sources have multiple columns, they will be added together. Notice how the final data set brought in one column Type. If there were other columns, they would bring those in as well. If you have too many blends, (such as blending 3 or more data sources) you will have too many columns. Tableau does not like a lot of columns in your data source. It prefers rows. You may also notice the size of your workbook will go up when you blend data.

You will also need to aggregate any data from the secondary data source in any calculated fields you create.


How do you Blend Data?

In the example below, I have two data sources from the Sample - Superstore Data set. I brought in one data set from the Orders sheet and I brought in the Returns sheet separately.



The primary data source will be identified automatically from any metrics you put on the worksheet first. So if you drag Category from the Orders data source, it will automatically mark Orders as the primary data source. This will be identified with a blue checkmark. The secondary data source will have an orange checkmark on it.


In the example below, from the Orders data source, I put Category, Order ID on the Rows shelf.

I then selected the Returns data source and placed Returned onto the Rows shelf. I then placed the Returns(Count) onto the Text Marks card.

As you can see, I have null values for the Order ID's that were not returned.


Tableau automatically blended the two data sources on the Order ID field. I can see this by the linked indicator next to the field if I click on the Returns data source.


I can also adjust this by going to the Data menu and selecting Edit Blend Relationships, if I needed to link on other fields. I would click the Add button to manually add those other relationships.



Keep the limitations in mind when you blend data sources. Refer to the Help Tableau link for more information on Data Blending.


121 views0 comments