The main concepts to be aware of when digging into dfTimewolf’s codebase are:
- Modules are individual Python objects that interact with specific platforms, depending on attributes passed through the command line or on AttributeContainer objects created by a previous module’s execution.
- Recipes are instructions that define how modules are chained, essentially defining which module’s output becomes another module’s input. Inputs and outputs are all stored in a State object that is attached to each module.
Modules all extend the same base class and implement the SetUp and Process functions.
SetUp is called with the recipe’s arguments for the module, after any placeholders have been replaced. It should only do things that have low overhead and complete without noticeable delay, like checking for API permissions or verifying that a file exists. The idea is to detect working conditions and “fail early” if the module can’t run correctly.
Process is where the actual work happens, and where you’ll want to parallelize things as much as possible (copying a disk, running plaso, etc.). You’ll be reading containers pushed by previous modules (e.g. processed plaso files) and adding your own for future modules to process. Containers are accessed through the GetContainers and StoreContainer functions of the State object.
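The split between SetUp and Process can be sketched as follows. This is a minimal illustration and does not use dfTimewolf's real classes: `FileContainer`, `FakeState`, and `SuffixModule` are hypothetical stand-ins, and only the method names SetUp, Process, GetContainers, and StoreContainer come from the text above.

```python
class FileContainer:
    """Hypothetical container describing a file produced by a module."""

    def __init__(self, path):
        self.path = path


class FakeState:
    """Stand-in for the State object; it only stores and returns containers."""

    def __init__(self):
        self._containers = []

    def StoreContainer(self, container):
        self._containers.append(container)

    def GetContainers(self, container_class):
        # Returns a new list, so callers may store containers while iterating.
        return [c for c in self._containers if isinstance(c, container_class)]


class SuffixModule:
    """Hypothetical module: SetUp validates cheaply, Process does the work."""

    def __init__(self, state):
        self.state = state
        self.suffix = None

    def SetUp(self, suffix):
        # Cheap check only: fail early if the arguments are unusable.
        if not suffix:
            raise ValueError('suffix must not be empty')
        self.suffix = suffix

    def Process(self):
        # Read containers left by earlier modules and store new ones
        # for later modules to pick up.
        for container in self.state.GetContainers(FileContainer):
            new_path = container.path + self.suffix
            self.state.StoreContainer(FileContainer(new_path))
```

In this sketch a bad argument is rejected in SetUp, before any expensive work in Process has started.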
Modules can log messages to make the execution flow clearer for the user. This is done through the module’s logger, which uses the standard Python logging module, so you can use functions like info, warning, and debug.
Modules can also report errors using their ModuleError function. Errors added this way are reported at the end of the run. Semantically, they mean that the recipe flow didn’t go as expected and should be examined.
ModuleError also takes a critical parameter that raises an exception and interrupts the flow of the recipe. Use it for errors that dfTimewolf can’t recover from (e.g. if a binary run by one of the modules can’t be found on disk).
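The difference between a reported error and a critical one might look like the sketch below. It is not dfTimewolf's implementation: the `SketchModule` class, the `ModuleRunError` exception, the `logger` and `errors` attribute names, and the exact signature of ModuleError are all assumptions made for illustration. Only the standard `logging` calls are real APIs.

```python
import logging


class ModuleRunError(Exception):
    """Hypothetical exception raised for unrecoverable module errors."""


class SketchModule:
    """Stand-in module showing logging plus recoverable and critical errors."""

    def __init__(self):
        # Assumed attribute name; backed by the standard logging module.
        self.logger = logging.getLogger('sketch_module')
        # Errors collected here would be reported at the end of the run.
        self.errors = []

    def ModuleError(self, message, critical=False):
        self.errors.append(message)
        self.logger.error(message)
        if critical:
            # Interrupts the recipe flow.
            raise ModuleRunError(message)

    def Process(self, binary_found):
        self.logger.info('Starting processing')
        if not binary_found:
            self.ModuleError('Required binary not found on disk', critical=True)
        self.ModuleError('One input could not be read, continuing')
```

A non-critical error lets Process carry on and only surfaces in the final report, while a critical one stops the run at the point of failure.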
Recipes are JSON files that describe how Modules are chained, and which parameters can be ingested from the command line. A recipe JSON object follows a specific format:
- name: The name with which the recipe will be invoked.
- description: A longer description of what the recipe does. It shows up in the help message when invoking dftimewolf recipe_name -h.
- short_description: A short summary of the recipe. It shows up in dfTimewolf’s top-level help message.
- modules: An array of JSON objects describing modules and their corresponding arguments.
  - wants: What other modules this module should wait for before calling its Process function.
  - name: The name of the module class that will be instantiated.
  - args: A list of (argument_name, argument) tuples that will be passed on to the module’s SetUp function. If argument starts with an @, it will be replaced with its corresponding value from the command line or from the configuration file.
- args: Recipes need to describe the way arguments are handled in a global args variable. This variable is a list of (switch, help_message, default_value) tuples that will be passed to the argparse.add_argument function for later parsing.
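To make the recipe format concrete, here is a hedged sketch that writes a recipe as a Python dictionary (serializable to JSON with `json.dumps`), builds an argparse parser from its global args, and resolves the `@` placeholders. The recipe contents, the module name `ExampleCollector`, and the helper functions `build_parser` and `resolve_module_args` are hypothetical. The sketch follows the tuple layout described above, and dfTimewolf's own parsing code may differ.

```python
import argparse
import json

# Hypothetical recipe; every name and value here is illustrative.
recipe = {
    'name': 'example_recipe',
    'description': 'Collects a path and hands it to a hypothetical module.',
    'short_description': 'Example collection recipe.',
    'modules': [{
        'wants': [],
        'name': 'ExampleCollector',
        'args': [['path', '@source_path'], ['recursive', True]],
    }],
    'args': [
        ['source_path', 'Path to collect', None],
        ['--reason', 'Reason for the collection', 'unspecified'],
    ],
}


def build_parser(recipe_args):
    """Feeds each (switch, help_message, default_value) entry to add_argument."""
    parser = argparse.ArgumentParser(prog='example')
    for switch, help_message, default_value in recipe_args:
        parser.add_argument(switch, help=help_message, default=default_value)
    return parser


def resolve_module_args(module_args, command_line_values):
    """Replaces values starting with '@' by the matching command-line value."""
    resolved = {}
    for argument_name, argument in module_args:
        if isinstance(argument, str) and argument.startswith('@'):
            argument = command_line_values[argument[1:]]
        resolved[argument_name] = argument
    return resolved


# Parse an explicit argument list instead of sys.argv to keep this runnable.
values = vars(build_parser(recipe['args']).parse_args(['/evidence']))
setup_args = resolve_module_args(recipe['modules'][0]['args'], values)
print(json.dumps(setup_args))
```

Here `@source_path` is swapped for the parsed positional value, while `recursive` is passed through unchanged because it is not a placeholder.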
State and AttributeContainers
The State object is an instance of the DFTimewolfState class. It has a few useful functions:
- StoreContainer: Store your containers to make them available to future modules.
- GetContainers: Retrieve the containers stored using StoreContainer. It takes a container_class parameter that selects which containers you’re interested in.
- StreamContainer: Push a container onto the streaming queue; any registered streaming callbacks will be called on the container. Containers streamed this way are not persistent (e.g. they can’t be accessed with GetContainers).
- RegisterStreamingCallback: Register a function that will be called on a container as it is streamed in real time.
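The contrast between stored and streamed containers can be sketched with a small stand-in class. `SketchState` and `Report` are hypothetical, and the real DFTimewolfState signatures may take additional parameters. Only the four function names are taken from the list above.

```python
class Report:
    """Hypothetical container type."""

    def __init__(self, text):
        self.text = text


class SketchState:
    """Stand-in for DFTimewolfState, reduced to the four functions above."""

    def __init__(self):
        self._stored = []
        self._callbacks = []

    def StoreContainer(self, container):
        # Stored containers persist and can be retrieved later.
        self._stored.append(container)

    def GetContainers(self, container_class):
        return [c for c in self._stored if isinstance(c, container_class)]

    def RegisterStreamingCallback(self, callback):
        self._callbacks.append(callback)

    def StreamContainer(self, container):
        # Streamed containers are handed to callbacks and not kept.
        for callback in self._callbacks:
            callback(container)


state = SketchState()
seen = []
state.RegisterStreamingCallback(lambda container: seen.append(container.text))

state.StoreContainer(Report('stored'))
state.StreamContainer(Report('streamed'))
```

After this runs, `seen` holds only the streamed report's text, and GetContainers returns only the stored report.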
Life of a dfTimewolf run
The dfTimewolf cycle is as follows:
- The recipe JSON is parsed and all requested modules are instantiated, as well as the semaphores that will schedule the execution of each module’s Process function.
- Command-line arguments are taken into account and passed to each module’s SetUp function. This occurs in parallel for all modules, regardless of the semaphores they declared in the recipe.
- The modules with no blocking semaphores start running their Process function. At the end of their run, they free their semaphore, signalling other modules that they can proceed with their own Process function.
- This cycle repeats until all modules have called their Process function.
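The cycle above can be approximated with standard-library threads. This sketch is an assumption-laden illustration, not dfTimewolf's scheduler: it uses `threading.Event` objects in place of the semaphores mentioned in the text, and the `run_recipe` function and the dictionary layout of each module are hypothetical.

```python
import threading


def run_recipe(modules):
    """Runs SetUp for all modules in parallel, then Process in 'wants' order.

    Each module is a dict with 'name', 'wants', 'setup', and 'process' keys.
    Returns the names of the modules in the order their Process finished.
    """
    # Phase 1: SetUp runs in parallel, ignoring dependencies.
    setup_threads = [threading.Thread(target=m['setup']) for m in modules]
    for thread in setup_threads:
        thread.start()
    for thread in setup_threads:
        thread.join()

    # Phase 2: each module waits for the modules it wants, then processes.
    done = {m['name']: threading.Event() for m in modules}
    finished = []
    lock = threading.Lock()

    def worker(module):
        for dependency in module['wants']:
            done[dependency].wait()
        module['process']()
        with lock:
            finished.append(module['name'])
        # Signal waiting modules that they can proceed.
        done[module['name']].set()

    threads = [threading.Thread(target=worker, args=(m,)) for m in modules]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return finished


def noop():
    pass


example_modules = [
    {'name': 'Exporter', 'wants': ['Processor'], 'setup': noop, 'process': noop},
    {'name': 'Collector', 'wants': [], 'setup': noop, 'process': noop},
    {'name': 'Processor', 'wants': ['Collector'], 'setup': noop, 'process': noop},
]
order = run_recipe(example_modules)
```

Because each module only signals completion after its Process call returns, the chain Collector, Processor, Exporter finishes in that order no matter how the modules are listed in the recipe.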