Watch Folder Tutorial
Watch folders provide a simple way to trigger Hybrik workflows when a piece of content is uploaded to a cloud storage location. The idea behind watch folders is very simple – a watch folder is a directory residing in your cloud storage that is regularly polled for new content. When new content is detected in the watch folder, a new job is triggered for each item. The job can be a simple transcode or any other series of tasks.
A watch folder job JSON is similar to a regular Hybrik job, except that the source file is not specified – instead, only the cloud storage location to poll is specified. Note that watch folders are not compatible with complex assets or workflows that require supplying multiple source files, such as sidecar media or metadata.
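The polling idea can be sketched in a few lines. This is only a conceptual illustration, not Hybrik's implementation: the function name `find_new_items`, the in-memory `seen` set, and the file names are all hypothetical, and a real watch folder would read the listing from cloud storage.

```python
from typing import Iterable, List, Set


def find_new_items(listing: Iterable[str], seen: Set[str]) -> List[str]:
    """Return items in `listing` that were not seen before, recording them in `seen`."""
    new_items = []
    for path in listing:
        if path not in seen:
            seen.add(path)
            new_items.append(path)
    return new_items


# Simulate two polling passes over a hypothetical folder listing.
seen: Set[str] = set()
first_pass = find_new_items(["a.mp4", "b.mov"], seen)
second_pass = find_new_items(["a.mp4", "b.mov", "c.mxf"], seen)

print(first_pass)   # ['a.mp4', 'b.mov']
print(second_pass)  # ['c.mxf']
```

Each item returned by a pass would trigger one job, which is why a file already reported in an earlier pass does not trigger a second job.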
Summary
Watch folder tasks are created with JSON that specifies the folder to watch, and then a transcode task or any other tasks to perform on the files arriving in the watch folder. A watch folder can reside on any cloud storage location that Hybrik can normally access for file retrieval. In this tutorial, we’ll create a watch folder in an S3 bucket.
You can create as many watch folder jobs as needed for different workflows. watch_folder tasks run continuously until you disable or delete them (this includes the compute instance that the task uses). A watch folder task simply watches for new content; the actual job triggered by the watch folder generally runs on another machine group. Because it only checks the contents of your watch folder, the watch folder task does not need as powerful a machine as a transcode task. We recommend a t2.micro instance, which is capable of managing several watch folder tasks. A t2.micro costs approximately $0.08 per 24 hours on the spot market.
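To put the quoted spot rate in perspective, here is a small arithmetic sketch. It assumes the approximate $0.08 per 24 hours figure holds constant, which spot prices generally do not, and the helper name `spot_cost` is hypothetical.

```python
def spot_cost(daily_rate_usd: float, days: int) -> float:
    """Estimated cost of keeping one instance running for `days` days."""
    return round(daily_rate_usd * days, 2)


# Approximate figure from the text above; actual spot prices fluctuate.
monthly = spot_cost(0.08, 30)
print(f"~${monthly:.2f} for 30 days")  # ~$2.40 for 30 days
```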
Once you create a watch_folder task, you’ll monitor it in the Watch Folders tab in the Hybrik Web Console. This is where you can Delete, Edit, Clone, reset priority, and enable/disable the watch folder.
Initial Configuration
- Log into your Hybrik account.
- Navigate to your Machines/Configuration section.
- At this point you can either use an existing machine group or create a new one. For the purposes of this tutorial, we’ll create a New Computing Group.
- Click the New Computing Group option.
- In the window that pops up, configure a new group as per the instructions found in our Computing Group tutorial; we recommend choosing t2.micro as the instance type as a starting point for watch folder jobs.
- In the Mandatory Tags field, in the Task Tags section, enter WATCH_FOLDER; you can set this to any value as long as your watch folder job is tagged to match.
- Click Save.
- Back in the main Machine Configuration page, find the small gear icon towards the top right of the page and click it.
- In the Columns Configuration window, make sure Mandatory Tags is checked; optionally also check Provided Tags. This will let you see the tags used in the Machine Configuration view.
- Click Save.
Watch Folder Task JSON
Here’s an example Hybrik job JSON that defines a watch folder job. The full example is linked at the bottom of the page.
{
  "definitions": {
    "watchfolder": "s3://hybrik/watch_folder/",
    "destination": "s3://hybrik/watch_folder_output/"
  },
  "name": "Watchfolder Example",
  "payload": {
    "elements": [
      {
        "uid": "watchfolder_source",
        "kind": "watchfolder",
        "task": {
          "retry_method": "retry",
          "retry": {
            "count": -1,
            "delay_sec": 2
          },
          "tags": [
            "WATCH_FOLDER"
          ]
        },
        "payload": {
          "source": {
            "storage_provider": "s3",
            "path": "{{watchfolder}}"
          },
          "settings": {
            "key": "watch_folder_1",
            "interval_sec": 120,
            "pattern_matching": "wildcard",
            "wildcard": "*",
            "recursive": true,
            "process_existing_files": false
          }
        }
      },
It is important to set the retry count to -1, as shown above. This makes the task retry indefinitely if spot machines are taken away. If you don’t set this, your watch folder task will be disabled after it retries the default number of times (3).
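Because a missing or wrong retry count is easy to overlook, it can help to check a job JSON before submitting it. The following is a minimal sketch, assuming the job structure from the example in this tutorial; the function name `watchfolders_without_infinite_retry` is hypothetical and is not part of any Hybrik tooling.

```python
import json
from typing import List


def watchfolders_without_infinite_retry(job: dict) -> List[str]:
    """Return the uid of every watchfolder element whose retry count is not -1."""
    problems = []
    for element in job.get("payload", {}).get("elements", []):
        if element.get("kind") != "watchfolder":
            continue  # only watch folder elements need the infinite retry
        count = element.get("task", {}).get("retry", {}).get("count")
        if count != -1:
            problems.append(element.get("uid", "<no uid>"))
    return problems


# A trimmed-down job containing only the fields this check reads.
job = json.loads("""
{
  "name": "Watchfolder Example",
  "payload": {
    "elements": [
      {
        "uid": "watchfolder_source",
        "kind": "watchfolder",
        "task": {"retry_method": "retry", "retry": {"count": -1, "delay_sec": 2}}
      }
    ]
  }
}
""")

print(watchfolders_without_infinite_retry(job))  # []
```

An empty list means every watch folder element in the job has its retry count set to -1.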
Our API Docs explain all of the parameters for Watch Folders. Here are brief explanations of some of the options.
- tags: Insert a tag here that matches your watch folder computing group (generally a separate group with a small instance type).
- key: A unique key that identifies this watch folder for tracking processed source files. If you have multiple watch folders, using a different key for each one ensures that each watch folder tracks processed files independently, so each one will process the same files.
- interval_sec: The interval, in seconds, at which the watch folder is checked for new files. Can be from 1 to 3600 seconds.
- pattern_matching: Either wildcard or regex. Selects the type of expression used to define which files will trigger the watch folder.
- wildcard: Defines the search expression. The default is *.
- regex: A regular expression may be used to match only certain file names for processing.
- recursive: true or false. True searches through any subfolders for new files; false limits the search to the listed folder.
- process_existing_files: true or false. Determines whether Hybrik processes any files already existing in the folder when the watch folder job is initially created.
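To make the interaction between pattern_matching, wildcard, regex, and recursive more concrete, here is a rough sketch of how such settings could select files. It is an illustration under stated assumptions, not Hybrik's actual matching logic: it assumes the pattern is applied to the file name only, uses Python's `fnmatch` and `re` semantics, and the function name `select_files` and the sample paths are hypothetical.

```python
import fnmatch
import re
from typing import Iterable, List


def select_files(relative_paths: Iterable[str], settings: dict) -> List[str]:
    """Return the paths (relative to the watch folder) that the settings would match."""
    if not 1 <= settings["interval_sec"] <= 3600:
        raise ValueError("interval_sec must be between 1 and 3600")

    mode = settings["pattern_matching"]
    selected = []
    for path in relative_paths:
        # Non-recursive: ignore anything that lives in a subfolder.
        if not settings["recursive"] and "/" in path:
            continue
        name = path.rsplit("/", 1)[-1]  # assumption: match on the file name only
        if mode == "wildcard":
            hit = fnmatch.fnmatchcase(name, settings.get("wildcard", "*"))
        elif mode == "regex":
            hit = re.search(settings["regex"], name) is not None
        else:
            raise ValueError(f"unknown pattern_matching: {mode}")
        if hit:
            selected.append(path)
    return selected


paths = ["a.mp4", "notes.txt", "sub/b.mp4"]
wildcard_settings = {
    "interval_sec": 120,
    "pattern_matching": "wildcard",
    "wildcard": "*.mp4",
    "recursive": True,
}
print(select_files(paths, wildcard_settings))  # ['a.mp4', 'sub/b.mp4']

flat_settings = dict(wildcard_settings, recursive=False)
print(select_files(paths, flat_settings))  # ['a.mp4']
```

Switching pattern_matching to regex with an expression such as `\.mp4$` would select the same files in this sketch.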
Tagging and Watch Folders
Our example uses the inexpensive t2.micro instance size for the watch_folder task because it simply spins up and checks the storage bucket for new items; it does not need to be a large instance. Your watch_folder task will then likely connect to a transcode task or other processing tasks that require more resources. You can either tag those tasks separately for a specific computing group, or omit the task tag to use your default computing group. Do NOT use tags specified at the job level with watch_folder tasks, because job-level tags and task-level tags are combined. This will likely lead to the job requiring a computing group that does not exist. For more information on job and task tagging, see our Tagging Tutorial.
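The effect of combining tags can be shown with a toy model. This sketch assumes a simplified matching rule (a group can run a task only if it carries every tag the task ends up with); Hybrik's real rules for mandatory and provided tags are covered in the Tagging Tutorial and may differ. The group names and the TRANSCODE tag are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def effective_tags(job_tags: Iterable[str], task_tags: Iterable[str]) -> Set[str]:
    """Job-level and task-level tags are combined into one requirement set."""
    return set(job_tags) | set(task_tags)


def eligible_groups(required: Set[str], groups: Dict[str, Set[str]]) -> List[str]:
    """Groups whose tags cover every required tag (assumed matching rule)."""
    return [name for name, tags in groups.items() if required <= tags]


# Hypothetical computing groups and their tags.
groups = {"watch": {"WATCH_FOLDER"}, "transcode": {"TRANSCODE"}}

task_only = effective_tags([], ["WATCH_FOLDER"])
print(eligible_groups(task_only, groups))  # ['watch']

with_job_tag = effective_tags(["TRANSCODE"], ["WATCH_FOLDER"])
print(eligible_groups(with_job_tag, groups))  # []
```

In the second case no group carries both tags, which mirrors the situation described above where the job ends up requiring a computing group that does not exist.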
Starting the Watch Folder Process:
- Navigate to the Watch folder tab in your Hybrik account.
- Click on the Submit Job JSON button and navigate to your watch folder job JSON. Click Open.
- There will be a short delay between the watch folder being queued and becoming active. You can check on the status of machine activity by navigating to the Machines -> Activity tab.
- Once the watch folder is active, any file that is placed in the watch folder’s source location will be processed according to the task parameters contained in the watch folder job JSON.
- To monitor the progress of jobs triggered by the watch folder, navigate to the Jobs -> Active Jobs tab.
- If you want to stop triggering jobs from the watch folder, you can disable it by navigating back to the Watch Folder tab, selecting the watch folder process you wish to disable, and clicking the Disable button.