Watch Folder Tutorial

Watch folders provide a simple way to trigger Hybrik workflows when a piece of content is uploaded to a cloud storage location. A watch folder is a directory in your cloud storage that is polled at a regular interval for new content. When new content is detected in the watch folder, a new job is triggered for each item. The job can be a simple transcode or any other series of tasks.
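The polling idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Hybrik’s implementation: the poll_once function and the in-memory seen set are hypothetical names, and the folder listings are hard-coded rather than read from cloud storage.

```python
def poll_once(listing, seen):
    """Return the items in `listing` that have not been seen before.

    `listing` stands in for the current contents of the watch folder;
    `seen` is the set of items that have already triggered a job.
    """
    new_items = [item for item in listing if item not in seen]
    seen.update(new_items)
    return new_items


seen = set()
# First poll: both files are new, so each one would trigger its own job.
first = poll_once(["a.mov", "b.mov"], seen)
# Second poll: only the newly uploaded file is reported.
second = poll_once(["a.mov", "b.mov", "c.mov"], seen)
print(first)
print(second)
```

Each item returned by a poll corresponds to one triggered job, which is why a file that stays in the folder is not processed again on later polls.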

A watch folder job JSON is similar to a regular Hybrik job, except that the source file is not specified – instead, only the cloud storage location to poll is specified. Note that watch folders are not compatible with complex assets or workflows that require supplying multiple source files, such as sidecar media or metadata.

Summary

Watch folder tasks are created with JSON that specifies the folder to watch, and then a transcode task or any other tasks to perform on the files arriving in the watch folder. A watch folder can reside on any cloud storage location that Hybrik can normally access for file retrieval. In this tutorial, we’ll create a watch folder in an S3 bucket.

You can create as many watch folder jobs as needed for different workflows.

watch_folder tasks run continuously until you disable or delete them, and so does the compute instance that each task uses. A watch folder task only watches for new content; the jobs it triggers generally run on another machine group. Because it only checks the contents of your watch folder, the watch folder task does not need as powerful a machine as a transcode task. We recommend a t2.micro instance, which is capable of managing several watch folder tasks. A t2.micro costs approximately $0.08 per 24 hours on the spot market.
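Because the instance runs continuously, it can help to project the daily figure over a longer period. The sketch below simply multiplies the approximate $0.08 daily rate quoted above; spot prices vary by region and over time, so treat the result as a rough estimate. The estimated_cost helper is a hypothetical name used only for this illustration.

```python
def estimated_cost(daily_rate_usd, days):
    """Project a flat daily rate over a number of days."""
    return daily_rate_usd * days


# Approximate spot rate for a t2.micro quoted in this tutorial.
monthly = estimated_cost(0.08, 30)
print(f"Roughly ${monthly:.2f} for 30 days of continuous polling")
```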

Once you create a watch_folder task, you’ll monitor it in the Watch Folders tab in the Hybrik Web Console. This is where you can Delete, Edit, Clone, reset priority, and enable/disable the watch folder.

Initial Configuration

Log into your Hybrik account.
Navigate to your Machines/Configuration section.
At this point you can either use an existing machine group or create a new one. For the purposes of this tutorial, we’ll create a New Computing Group.
Click the New Computing Group option.
In the window that pops up, configure a new group as per the instructions found in our Computing Group tutorial; we recommend choosing t2.micro as the instance type as a starting point for watch folder jobs.
In the Mandatory Tags field, in the Task Tags section, enter WATCH_FOLDER. You can use any value here, as long as the tag in your watch folder job matches it.
Click Save.
Back in the main Machine Configuration page, find the small gear icon towards the top right of the page and click it.
In the Columns Configuration window, make sure Mandatory Tags is checked; optionally also check Provided Tags. This will let you see the tags used in the Machine Configuration view.
Click Save.

Watch Folder Task JSON

Here’s the first part of an example Hybrik job JSON that defines a watch folder job. Only the watch folder element is shown; the full example is linked at the bottom of the page.

{ 
  "definitions": {
    "watchfolder": "s3://hybrik/watch_folder/",
    "destination": "s3://hybrik/watch_folder_output/"
  },
  "name": "Watchfolder Example",
  "payload": {
    "elements": [
      {
        "uid": "watchfolder_source",
        "kind": "watchfolder",
        "task": {
          "retry_method": "retry",
          "retry": {
            "count": -1,
            "delay_sec": 2
          },
          "tags": [
            "WATCH_FOLDER"
          ]
        },
        "payload": {
          "source": {
            "storage_provider": "s3",
            "path": "{{watchfolder}}"
          },
          "settings": {
            "key": "watch_folder_1",
            "interval_sec": 120,
            "pattern_matching": "wildcard",
            "wildcard": "*",
            "recursive": true,
            "process_existing_files": false
          }
        }
      }, 

It is important to set the retry count to -1, as shown above. With this setting, the task retries indefinitely if spot machines are taken away. If you don’t set this, your watch folder task will be disabled after the default number of retries (3).
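If you generate watch folder jobs from a script, a small check can catch a missing retry setting before the job is submitted. The following is a minimal sketch that rebuilds the watch folder element from the example above as a Python dictionary. The watchfolder_element and retries_forever helpers are hypothetical names, and the path is passed in directly instead of using the {{watchfolder}} placeholder.

```python
import json


def watchfolder_element(path, key, tag="WATCH_FOLDER", interval_sec=120):
    """Build the watch folder element, mirroring the example job JSON."""
    return {
        "uid": "watchfolder_source",
        "kind": "watchfolder",
        "task": {
            "retry_method": "retry",
            # -1 keeps the task retrying if spot machines are taken away.
            "retry": {"count": -1, "delay_sec": 2},
            "tags": [tag],
        },
        "payload": {
            "source": {"storage_provider": "s3", "path": path},
            "settings": {
                "key": key,
                "interval_sec": interval_sec,
                "pattern_matching": "wildcard",
                "wildcard": "*",
                "recursive": True,
                "process_existing_files": False,
            },
        },
    }


def retries_forever(element):
    """Return True if the element's task sets the retry count to -1."""
    return element.get("task", {}).get("retry", {}).get("count") == -1


element = watchfolder_element("s3://hybrik/watch_folder/", "watch_folder_1")
print(json.dumps(element, indent=2))
print("Infinite retry configured:", retries_forever(element))
```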

Our API Docs explain all of the parameters for Watch Folders. Here are brief explanations of some of the options.

tags
- Insert a tag here that matches your watch folder computing group (generally a separate group with a small instance type)
key
- A unique key that identifies this watch folder for tracking processed source files. If you have multiple watch folders, giving each one a different key means processed files are tracked separately for each, so each watch folder processes the same files independently of the others
interval_sec
- The interval, in seconds, at which the watch folder is checked for new files. Can be from 1 to 3600 seconds.
pattern_matching
- You can use wildcard or regex expressions to define which files will trigger the watch folder.
wildcard
- Defines the search expression. The default is *
regex
- A regular expression may be used to match only certain file names for processing.
recursive
- true or false
- True searches through any subfolders for any new files, false limits the search to the listed folder.
process_existing_files
- true or false
- Determines whether Hybrik processes any files already existing in the folder when the watch folder job is initially created.
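To get a feel for the difference between the two pattern_matching modes, the sketch below filters file names with Python’s fnmatch and re modules. These are stand-ins: Hybrik’s exact matching rules (for example, whether a regex must match the whole name or only part of it) are not described here, so check the API Docs before relying on a pattern. The matches function and the sample patterns are assumptions made for this illustration.

```python
import fnmatch
import re


def matches(filename, settings):
    """Decide whether a file name passes the watch folder's filter.

    Uses Python's own wildcard and regex engines as an approximation.
    """
    mode = settings.get("pattern_matching", "wildcard")
    if mode == "wildcard":
        # The wildcard defaults to "*", which accepts every file.
        return fnmatch.fnmatchcase(filename, settings.get("wildcard", "*"))
    if mode == "regex":
        return re.search(settings["regex"], filename) is not None
    raise ValueError(f"unknown pattern_matching mode: {mode}")


wildcard_settings = {"pattern_matching": "wildcard", "wildcard": "*.mov"}
regex_settings = {"pattern_matching": "regex", "regex": r"\.(mov|mp4)$"}

for name in ["clip.mov", "clip.mp4", "notes.txt"]:
    print(name, matches(name, wildcard_settings), matches(name, regex_settings))
```

A wildcard is usually enough to filter on a single extension; a regex is useful when several extensions or naming schemes should trigger the same watch folder.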

Tagging and Watch Folders

Our example uses the inexpensive t2.micro instance type for the watch_folder task because the task only spins up and checks the storage bucket for new items; it does not need a large instance. Your watch_folder task will then likely connect to a transcode task or other processing tasks that require more resources. You can either tag those tasks separately for a specific computing group, or omit the task tag to use your default computing group. Do NOT use tags specified at the job level with watch_folder tasks, because job-level tags and task-level tags are combined. The combined tags will likely require a computing group that does not exist. For more information on job and task tagging, see our Tagging Tutorial.
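The effect of combining tags can be illustrated with a simplified model. The sketch below assumes that a task can only run on a computing group whose tags cover every tag the task carries; this is an assumption made for illustration, not a description of Hybrik’s actual scheduler. The TRANSCODE tag, the group names, and both helper functions are hypothetical.

```python
def effective_tags(job_tags, task_tags):
    """Combine job-level and task-level tags, as described above."""
    return set(job_tags) | set(task_tags)


def group_can_run(group_tags, required_tags):
    """Simplified rule: the group must cover every required tag."""
    return set(required_tags) <= set(group_tags)


# Hypothetical computing groups and the tags they are configured with.
groups = {"watch": {"WATCH_FOLDER"}, "transcode": {"TRANSCODE"}}

# Task-level tag only: the small watch folder group qualifies.
task_only = effective_tags([], ["WATCH_FOLDER"])
eligible_task_only = [n for n, t in groups.items() if group_can_run(t, task_only)]

# Job-level tag added: the combined set matches no existing group.
combined = effective_tags(["TRANSCODE"], ["WATCH_FOLDER"])
eligible_combined = [n for n, t in groups.items() if group_can_run(t, combined)]

print(eligible_task_only)
print(eligible_combined)
```

Under this model, the watch folder task with a job-level tag would need a group that carries both tags, which is the situation the paragraph above warns about.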

Starting the Watch Folder Process

Navigate to the Watch Folders tab in your Hybrik account.
Click on the Submit Job JSON button and navigate to your watch folder job JSON. Click Open.
There will be a short delay between the watch folder being queued and becoming active. You can check on the status of machine activity by navigating to the Machines -> Activity tab.
Once the watch folder is active, any file that is placed in the watch folder’s source location will be processed according to the task parameters contained in the watch folder job JSON.
To monitor the progress of jobs triggered by the watch folder, navigate to the Jobs -> Active Jobs tab.
If you want to stop triggering jobs from the watch folder, you can disable it by navigating back to the Watch Folders tab, selecting the watch folder you wish to disable, and clicking the Disable button.

Example Jobs

Watch folder to mp4 transcode example