Tagging Tutorial
Hybrik allows you to create up to 250 Computing Groups. Each Computing Group has its own configuration, which sets the AWS region, machine instance type, minimum and maximum number of machines, and other parameters that control how machines are launched in your AWS account. When a job is submitted, Hybrik reviews the settings of the enabled Computing Groups to determine what type of machine to launch. Computing Groups can have different priorities, and Hybrik launches machines in higher-priority Groups first.
But how do you route a particular job or task to a specific Computing Group? This is done with tagging. Simply put, if a job or task includes a particular tag, Hybrik will look for a Computing Group with the same tag and launch a machine from it. Tags can be any user-defined string. For example, if a job is given the tag EUROPEAN_PRIORITY, then every task of that job will run on a machine from a Computing Group that has the tag EUROPEAN_PRIORITY.
If you have a simple operation that runs only one type of machine in a single AWS region, you may never need tagging. But tagging is a convenient way to manage multiple Computing Groups, letting you route jobs by region, priority, customer, processing demands, and so on. Because Computing Groups can have different AWS billing prefixes, some Hybrik customers also use tagging to help provide billing information for different users.
Computing Group Tags
You associate a tag with a Computing Group in the Computing Group Editor dialog. When you create or edit a Computing Group, this dialog allows you to configure the settings associated with a Computing Group, such as AWS region, machine type, number of machines, etc. Near the bottom of this dialog you will see the Task Tags section.
There are two types of tags that you can use here: Mandatory Tags and Provided Tags. In this example, we put the tag uhd_transcode in the Mandatory Tags section. This means that this Computing Group will only ever execute tasks that have the tag uhd_transcode. You can add multiple tags to a Computing Group by separating them with commas. If, for example, we entered uhd_transcode, URGENT into the dialog, this Computing Group would only execute tasks that include BOTH of these tags.
The Provided Tags allow you to specify additional tags that the Computing Group can accept. The Computing Group will execute tasks that have all of its Mandatory Tags AND zero or more of its Provided Tags. So, if we put uhd_transcode into the Mandatory Tags and URGENT and PRIORITY into the Provided Tags, the Computing Group would execute tasks tagged with any of the following:
- [uhd_transcode]
- [uhd_transcode, URGENT]
- [uhd_transcode, PRIORITY]
- [uhd_transcode, PRIORITY, URGENT]
A task that lacks uhd_transcode, or that carries a tag found in neither list, would not run on this Computing Group.
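The acceptance rule above boils down to two set checks: the task must carry every Mandatory Tag, and every tag on the task must appear in either the Mandatory or the Provided list. The Python sketch below is our own illustration of that rule, not Hybrik code; the function names group_accepts and accepted_tag_sets are hypothetical. It also enumerates every tag combination the example Computing Group would accept.

```python
from itertools import combinations


def group_accepts(mandatory: set[str], provided: set[str], task_tags: set[str]) -> bool:
    """Illustrative rule (not a Hybrik API): would this Computing Group run the task?

    The task must carry every Mandatory tag, and each of its tags must be
    either Mandatory or Provided.
    """
    return mandatory <= task_tags and task_tags <= mandatory | provided


def accepted_tag_sets(mandatory: set[str], provided: set[str]) -> list[set[str]]:
    """List every accepted task tag set: all Mandatory tags plus any subset of Provided tags."""
    extras = sorted(provided)
    result = []
    for k in range(len(extras) + 1):
        for combo in combinations(extras, k):
            result.append(mandatory | set(combo))
    return result


mandatory = {"uhd_transcode"}
provided = {"URGENT", "PRIORITY"}

for tags in accepted_tag_sets(mandatory, provided):
    print(sorted(tags))

print(group_accepts(mandatory, provided, {"URGENT"}))                   # False: lacks uhd_transcode
print(group_accepts(mandatory, provided, {"uhd_transcode", "EUROPE"}))  # False: EUROPE is not offered
```

With one Mandatory tag and two Provided tags this prints four combinations, matching the list above.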
Job and Task Tags
Job Tags
It is important to remember that Hybrik jobs are composed of one or more tasks, and tasks are assigned to machines. A job with ten tasks may therefore have those tasks executed on ten different machines. You can assign tags to a job or to a specific task. If you assign a tag to a job, every task in that job receives the same tag and will be assigned to a machine from a Computing Group with a matching tag. You can also specify a tag for an individual task, and every task in a job can even be given a different tag. There is no requirement that every task be tagged: it is perfectly acceptable to tag one task and leave the others untagged, in which case the untagged tasks use the default Computing Group.
A job tag looks like this:
{
  "name": "Hybrik Tagging Example",
  "task_tags": [
    "uhd_transcode"
  ],
  "payload": {
    "elements": [
      {
        "uid": "source_file",
        "kind": "source",
        ...
      },
      ...
    ]
  }
}
This would assign the uhd_transcode tag to every task in this job, so every task would execute on a Computing Group with the matching tag. This parameter is called task_tags, and it is an array because you can assign multiple tags to the tasks. A separate parameter, user_tags, lets you add any number of user-specified tags for your internal tracking purposes. Hybrik does not take any action based on user_tags.
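If you generate job JSON from a script, the job-level task_tags array is simply a top-level list of strings. The sketch below is a minimal Python illustration; build_tagged_job is a hypothetical helper, and the source element is a placeholder that omits the fields a real source element needs, just as the excerpt above does.

```python
import json


def build_tagged_job(name: str, elements: list[dict], task_tags: list[str]) -> dict:
    """Hypothetical helper: assemble a job dict whose task_tags apply to every task.

    Only the fields shown in this tutorial are set; user_tags is deliberately
    left out because Hybrik does not route on it.
    """
    return {
        "name": name,
        "task_tags": list(task_tags),  # copy so later changes to the caller's list don't leak in
        "payload": {"elements": elements},
    }


job = build_tagged_job(
    "Hybrik Tagging Example",
    # Placeholder element: a real source element needs more than uid and kind.
    [{"uid": "source_file", "kind": "source"}],
    ["uhd_transcode"],
)
print(json.dumps(job, indent=2))
```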
Read more about user_tags on our API Page under ‘create-job’. If you need to insert additional internal tracking data into a job, see our User Data Tutorial.
Task Tags
Suppose we only wanted to route the transcode task to this Computing Group while letting all the other tasks run on any available machine. We could tag just the transcode task like this:
{
  "uid": "transcode_task",
  "kind": "transcode",
  "task": {
    "retry_method": "fail",
    "tags": [
      "uhd_transcode"
    ]
  },
  ...
}
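When jobs are built programmatically, tagging a single task amounts to adding a tags list inside that element's task object. The Python sketch below is a hypothetical helper (tag_elements is our name, not part of any Hybrik SDK) that tags every element of a given kind and leaves the others untouched; it works on a copy so the original job dict is not modified.

```python
import copy


def tag_elements(job: dict, kind: str, tags: list[str]) -> dict:
    """Hypothetical helper: return a copy of job where elements of `kind` carry task-level tags.

    Creates element["task"] and element["task"]["tags"] when missing and skips
    tags that are already present. Elements of other kinds stay untagged.
    """
    tagged = copy.deepcopy(job)
    for element in tagged["payload"]["elements"]:
        if element.get("kind") == kind:
            task = element.setdefault("task", {})
            existing = task.setdefault("tags", [])
            for tag in tags:
                if tag not in existing:
                    existing.append(tag)
    return tagged


job = {
    "name": "Hybrik Tagging Example",
    "payload": {
        "elements": [
            {"uid": "source_file", "kind": "source"},
            {"uid": "transcode_task", "kind": "transcode", "task": {"retry_method": "fail"}},
        ]
    },
}
tagged = tag_elements(job, "transcode", ["uhd_transcode"])
print(tagged["payload"]["elements"][1]["task"])  # {'retry_method': 'fail', 'tags': ['uhd_transcode']}
```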
Combining Job and Task Tags
If you include tags in both the job AND the task sections of your JSON, the two lists are concatenated, so the resulting tasks require both sets of tags. For instance, if you assign the tag CUSTOMER_A at the job level and the tag PRIORITY in a transcode task, then every task in the job requires a Computing Group with the tag CUSTOMER_A, and the transcode task additionally requires a Computing Group that has both CUSTOMER_A and PRIORITY in some combination of Mandatory and Provided Tags.
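The CUSTOMER_A / PRIORITY example can be checked with a short Python sketch. It assumes the concatenation described above and reuses the Mandatory/Provided rule from the Computing Group section; effective_tags, group_accepts, and the three group names are hypothetical. Dropping duplicate tags is our own cosmetic choice, since matching treats tags as a set anyway.

```python
def effective_tags(job_tags: list[str], task_tags: list[str]) -> list[str]:
    """Concatenate job-level and task-level tags, dropping duplicates but keeping order."""
    return list(dict.fromkeys(job_tags + task_tags))


def group_accepts(mandatory: set[str], provided: set[str], tags: list[str]) -> bool:
    """Illustrative rule: the task has every Mandatory tag, and every task tag is offered."""
    task = set(tags)
    return mandatory <= task and task <= mandatory | provided


job_tags = ["CUSTOMER_A"]
transcode_tags = effective_tags(job_tags, ["PRIORITY"])  # ['CUSTOMER_A', 'PRIORITY']
other_tags = effective_tags(job_tags, [])                # ['CUSTOMER_A']

# Hypothetical groups: name -> (Mandatory Tags, Provided Tags)
groups = {
    "customer_a_general": ({"CUSTOMER_A"}, set()),
    "customer_a_priority": ({"CUSTOMER_A"}, {"PRIORITY"}),
    "customer_a_priority_only": ({"CUSTOMER_A", "PRIORITY"}, set()),
}
for name, (mandatory, provided) in groups.items():
    print(
        name,
        "transcode:", group_accepts(mandatory, provided, transcode_tags),
        "other tasks:", group_accepts(mandatory, provided, other_tags),
    )
```

Only customer_a_priority can run both the transcode task and the untagged-at-task-level tasks; customer_a_general rejects the transcode task, and customer_a_priority_only rejects the other tasks because they lack PRIORITY.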
Summarizing Tag Behavior
- From the task perspective:
  - I will only run on a Computing Group that has all of my tags (in any combination of Mandatory and Provided Tags).
- From the Computing Group perspective:
  - I will only execute tasks that have all of my Mandatory Tags and zero or more of my Provided Tags.
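Both perspectives can be written as separate checks and combined to find every Computing Group a task may run on. The Python sketch below is a hypothetical model: the ComputingGroup dataclass is not a Hybrik type, the tag-free "general" group stands in for the default Computing Group that untagged tasks use, and Hybrik's choice among several eligible groups by priority is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingGroup:
    """Hypothetical model of a Computing Group's tag settings (not a Hybrik API type)."""
    name: str
    mandatory: set[str] = field(default_factory=set)
    provided: set[str] = field(default_factory=set)


def task_perspective(task_tags: set[str], group: ComputingGroup) -> bool:
    # "I will only run on a Computing Group that has all of my tags."
    return task_tags <= group.mandatory | group.provided


def group_perspective(group: ComputingGroup, task_tags: set[str]) -> bool:
    # "I will only execute tasks that have all of my Mandatory tags
    #  and zero or more of my Provided tags."
    return group.mandatory <= task_tags and (task_tags - group.mandatory) <= group.provided


def eligible_groups(task_tags: set[str], groups: list[ComputingGroup]) -> list[str]:
    """Names of groups that satisfy both perspectives, in the order given."""
    return [
        g.name
        for g in groups
        if task_perspective(task_tags, g) and group_perspective(g, task_tags)
    ]


groups = [
    ComputingGroup("general"),
    ComputingGroup("uhd", mandatory={"uhd_transcode"}, provided={"URGENT", "PRIORITY"}),
    ComputingGroup("europe", mandatory={"EUROPEAN_PRIORITY"}),
]
print(eligible_groups(set(), groups))                        # ['general']
print(eligible_groups({"uhd_transcode", "URGENT"}, groups))  # ['uhd']
print(eligible_groups({"EUROPEAN_PRIORITY"}, groups))        # ['europe']
```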