Sample Tool vs Random Sample Tool

Do you need to pull a sample of data from a large dataset to validate formulas? Alteryx has two tools for this: the Sample Tool, which picks rows mostly by position, and the Random Sample Tool, which picks rows at random.


Sample Tool



There are six options to pick from for the Sample Tool with N being the number you enter in the field.

  • First N rows - This option returns the first N rows of your dataset. If you specify 10, it will return the first 10 rows.

  • Last N rows - This option is the opposite of the one above: it returns the last N rows of your dataset. If you specify 20, it will return the last 20 rows.

  • Skip 1st N rows - This option skips the first N rows and returns everything after them. If you enter 5, the output starts at the 6th row. Some users rely on this to skip extra header rows.

  • 1 of every N rows - If you enter 10 in this field, it will return the first row, the 11th row, the 21st row and so on.

  • 1 in N chance to include each row - With this option, each row is given a 1 in N chance of being included, so the rows returned change every time the workflow is run and the row count is not fixed. If you enter 10 in this field with a dataset of 90 rows, you would expect about 9 rows, but you may receive 8 or 11.

  • First N% of rows - If you enter 10 in this field, it will return the first 10% of the rows in your dataset. The number of rows returned therefore depends on the size of your dataset.
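
To make the six options concrete, here is a rough Python sketch that mimics them on a plain list of 90 row numbers. It is not Alteryx code: the function name sample_rows and the mode names are made up for illustration, and rounding down for First N% is an assumption, since the exact rounding Alteryx uses is not covered here.

```python
import random

def sample_rows(rows, mode, n, rng=None):
    """Illustrative stand-in for the six Sample Tool options (hypothetical helper)."""
    rng = rng or random.Random()
    if mode == "first":
        return rows[:n]                      # First N rows
    if mode == "last":
        return rows[-n:] if n > 0 else []    # Last N rows
    if mode == "skip":
        return rows[n:]                      # Skip 1st N rows
    if mode == "one_of_every":
        return rows[::n]                     # rows 1, N+1, 2N+1, ...
    if mode == "one_in_n_chance":
        # Each row is kept independently with probability 1/N.
        return [r for r in rows if rng.randrange(n) == 0]
    if mode == "first_percent":
        # Assumption: the row count is rounded down.
        return rows[: len(rows) * n // 100]
    raise ValueError(f"unknown mode: {mode}")

rows = list(range(1, 91))  # 90 rows, numbered 1 to 90

print(sample_rows(rows, "first", 10))
print(sample_rows(rows, "skip", 5)[:3])
print(sample_rows(rows, "one_of_every", 10))
print(len(sample_rows(rows, "one_in_n_chance", 10)))  # varies from run to run
```

Only the "1 in N chance" option involves randomness here; the other five give the same answer on every run.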

You can also select a Group by column, which applies the chosen option to each group separately instead of to the dataset as a whole. For example, if I select First N rows, enter 10, and choose City as my Group by column, the tool returns 10 rows for each city. If a city has fewer than 10 rows, it returns all of that city's rows.
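
The Group by behavior can be sketched in Python as well. This is only an illustration of the idea, not how Alteryx implements it; the helper first_n_per_group and the sample City data are invented for the example.

```python
from collections import defaultdict

def first_n_per_group(rows, key, n):
    """Keep the first n rows seen for each distinct value of `key` (hypothetical helper)."""
    counts = defaultdict(int)
    kept = []
    for row in rows:
        group = row[key]
        if counts[group] < n:
            kept.append(row)
            counts[group] += 1
    return kept

# Made-up data: 12 rows for Austin, 3 rows for Boise.
rows = [{"City": "Austin", "Sale": i} for i in range(12)]
rows += [{"City": "Boise", "Sale": i} for i in range(3)]

sample = first_n_per_group(rows, "City", 10)
austin = [r for r in sample if r["City"] == "Austin"]
boise = [r for r in sample if r["City"] == "Boise"]
print(len(austin), len(boise))
```

Austin is capped at 10 rows, while Boise keeps all 3 of its rows because it has fewer than 10.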



For more information about this tool, you can refer to the Sample Tool page on the Alteryx Help website.


Random Sample Tool



Please note: Every time you run this tool, different records will be returned, unless you use the Deterministic Output option described below.


For Random Sample, there are three options to choose from with N being the number you enter.

  • Random N Records - You can specify the total number of records Alteryx returns. If you enter 100, Alteryx will return 100 random records from your dataset.

  • Random N% of Records - Here you can specify the percentage of records to return. If you specify 10%, Alteryx will randomly pick 10% of the records in your dataset.

  • Deterministic Output - If you specify a number in the Random Seed field, Alteryx will return the same set of records on every run.
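
The same three ideas can be sketched with Python's standard random module. This is an analogy, not Alteryx's own algorithm: the function names random_n and random_percent are hypothetical, and rounding the percentage to the nearest whole record is an assumption.

```python
import random

def random_n(rows, n, seed=None):
    """Return n randomly chosen rows; a fixed seed makes the pick repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

def random_percent(rows, pct, seed=None):
    """Return roughly pct% of the rows, chosen at random."""
    rng = random.Random(seed)
    k = round(len(rows) * pct / 100)  # assumption: round to the nearest record
    return rng.sample(rows, k)

rows = list(range(1000))  # stand-in dataset of 1,000 records

print(len(random_n(rows, 100)))        # 100 records, different ones each run
print(len(random_percent(rows, 10)))   # 10% of 1,000 records

# With a seed, two separate runs select the same records.
print(random_n(rows, 100, seed=42) == random_n(rows, 100, seed=42))
```

Leaving the seed out gives a fresh sample on every run, which matches the default behavior described above; supplying one is the equivalent of filling in the Random Seed field.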


For more information about this tool, you can refer to the Random Sample Tool page on the Alteryx Help website.


You can open example workflows for both of these tools in your Alteryx Application by clicking on the Tool in your Tool Palette and then clicking on the Open Example hyperlink. Click Run on these sample workflows to explore the data.


