# Architecture

The main concepts you need to be aware of when digging into dfTimewolf’s codebase are:

- Modules
- Recipes
- The `state` object
Modules are individual Python objects that interact with specific platforms,
depending on attributes passed through the command line or on
`AttributeContainer` objects created by a previous module’s execution.
Recipes are instructions that define how modules are chained, essentially
defining which module’s output becomes another module’s input. Input and output
are all stored in a State object that is attached to each module.
## Modules

Modules all extend the `BaseModule` class and implement the `SetUp` and
`Process` functions.

`SetUp` is called with the recipe’s modified arguments. Actions here should
have low overhead and complete quickly - checking for API permissions,
verifying that a file exists, etc. The idea is to detect working conditions
and “fail early” if the module can’t run correctly.

`Process` is where all the magic happens - here is where you’ll want to
parallelize things as much as possible (copying a disk, running plaso, etc.).
You’ll be reading from containers pushed by previous modules (e.g. processed
plaso files) and adding your own for future modules to process. Accessing
containers is done through the `GetContainers` and `StoreContainer` functions
of the `state` object.
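The `SetUp`/`Process` split can be sketched as follows. Everything below - the container classes, the state class, and `FileSizeModule` itself - is a minimal stand-in for illustration, not dfTimewolf’s actual API:

```python
import os


class FileContainer:
    """Stand-in container holding a path pushed by a previous module."""
    def __init__(self, path):
        self.path = path


class FileSizeContainer:
    """Stand-in container holding this module's output."""
    def __init__(self, path, size):
        self.path = path
        self.size = size


class SketchState:
    """Stand-in for the state object's container storage."""
    def __init__(self):
        self._containers = []

    def StoreContainer(self, container):
        self._containers.append(container)

    def GetContainers(self, container_class):
        return [c for c in self._containers if isinstance(c, container_class)]


class FileSizeModule:
    """Hypothetical module measuring files collected by earlier modules."""

    def __init__(self, state):
        self.state = state

    def SetUp(self, path):
        # Fail early: cheap checks only, no heavy work.
        if not os.path.exists(path):
            raise RuntimeError(f'{path} does not exist')
        self.state.StoreContainer(FileContainer(path))

    def Process(self):
        # Heavy lifting: read containers pushed earlier, push results
        # for future modules to consume.
        for container in self.state.GetContainers(FileContainer):
            size = os.path.getsize(container.path)
            self.state.StoreContainer(FileSizeContainer(container.path, size))
```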
## Logging

Modules can log messages to make the execution flow clearer for the user. This
is done through the module’s `logger` attribute: `self.logger.info('message')`.
This uses the standard Python `logging` module, so you can use functions like
`info`, `warning`, and `debug`.
## Error reporting

Modules can also report errors using their `ModuleError` function. Errors added
this way are reported at the end of the run. Semantically, they mean that the
recipe flow didn’t go as expected and should be examined.

`ModuleError` also takes a `critical` parameter that will raise an exception
and interrupt the flow of the recipe. This should be used for errors that
dfTimewolf can’t recover from (e.g. if a binary run by one of the modules can’t
be found on disk).
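The two error-reporting modes can be sketched with a stand-in class; the exception type and module below are illustrative, not dfTimewolf’s actual implementation:

```python
import shutil


class CriticalError(Exception):
    """Stand-in for the exception raised on critical errors."""


class SketchModule:
    """Hypothetical module showing non-critical vs. critical errors."""

    def __init__(self):
        self.errors = []  # non-critical errors, reported at the end of the run

    def ModuleError(self, message, critical=False):
        # Mirrors the behaviour described above: critical errors interrupt
        # the recipe, others are collected and reported at the end.
        self.errors.append(message)
        if critical:
            raise CriticalError(message)

    def Process(self):
        # A missing binary is unrecoverable: abort the whole recipe.
        if shutil.which('log2timeline.py') is None:
            self.ModuleError('log2timeline.py not found on disk', critical=True)
```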
## Recipes

Recipes are JSON files that describe how Modules are chained, and which
parameters can be ingested from the command line. A recipe JSON object follows
a specific format:

- `name`: This is the name with which the recipe will be invoked (e.g.
  `local_plaso`).
- `description`: This is a longer description of what the recipe does. It will
  show up in the help message when invoking `dftimewolf recipe_name -h`.
- `short_description`: This is what will show up in the help message when
  invoking `dftimewolf -h`.
- `modules`: An array of JSON objects describing modules and their
  corresponding arguments.
  - `wants`: What other modules this module should wait for before calling its
    `Process` function.
  - `name`: The name of the module class that will be instantiated.
  - `args`: A list of `(argument_name, argument)` tuples that will be passed on
    to the module’s `SetUp()` function. If `argument` starts with an `@`, it
    will be replaced with its corresponding value from the command line or the
    `~/.dftimewolfrc` file.
- `args`: Recipes need to describe the way arguments are handled in a global
  `args` variable. This variable is a list of
  `(switch, help_message, default_value)` tuples that will be passed to the
  `argparse.add_argument` function for later parsing.
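Putting these fields together, a recipe might look like the sketch below. The module names, switches, and defaults are illustrative, not a recipe that ships with dfTimewolf:

```json
{
  "name": "local_plaso_sketch",
  "short_description": "Processes local paths with plaso.",
  "description": "Collects the given local paths and runs plaso on them.",
  "modules": [
    {
      "wants": [],
      "name": "FilesystemCollector",
      "args": [
        ["paths", "@paths"]
      ]
    },
    {
      "wants": ["FilesystemCollector"],
      "name": "LocalPlasoProcessor",
      "args": [
        ["timezone", "@timezone"]
      ]
    }
  ],
  "args": [
    ["paths", "Comma-separated list of paths to process.", null],
    ["timezone", "Timezone to use when parsing.", null]
  ]
}
```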
## State and AttributeContainers

The State object is an instance of the DFTimewolfState class. It has a couple
of useful functions and attributes:

- `StoreContainer`: Store your containers to make them available to future
  modules.
- `GetContainers`: Retrieve the containers stored using `StoreContainer`. It
  takes a `container_class` parameter to select which containers you’re
  interested in.
- `StreamContainer`: Push a container onto the streaming queue; any registered
  streaming callbacks will be called on it. Containers streamed this way are
  not persistent (i.e. they can’t be accessed with `GetContainers` later on).
- `RegisterStreamingCallback`: Register a function that will be called on each
  container as it is streamed, in real time.
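The difference between stored and streamed containers can be sketched with a minimal stand-in class; `StreamingState` below is illustrative, not dfTimewolf’s actual DFTimewolfState:

```python
class StreamingState:
    """Stand-in showing the stored vs. streamed container split."""

    def __init__(self):
        self._store = []      # persistent containers
        self._callbacks = []  # registered streaming callbacks

    def StoreContainer(self, container):
        # Persistent: visible to GetContainers in later modules.
        self._store.append(container)

    def GetContainers(self, container_class):
        return [c for c in self._store if isinstance(c, container_class)]

    def RegisterStreamingCallback(self, callback):
        self._callbacks.append(callback)

    def StreamContainer(self, container):
        # Streamed containers are handed to callbacks but never stored,
        # so GetContainers won't see them later.
        for callback in self._callbacks:
            callback(container)
```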
## Life of a dfTimewolf run

The dfTimewolf cycle is as follows:

1. The recipe JSON is parsed, and all requested modules are instantiated, as
   well as the semaphores that will schedule the execution of the modules’
   `Process` functions.
2. Command-line arguments are taken into account and passed to each module’s
   `SetUp` function. This occurs in parallel for all modules, regardless of
   the semaphores they declared in the recipe.
3. The modules with no blocking semaphores start running their `Process`
   function. At the end of their run, they free their semaphore, signalling
   other modules that they can proceed with their own `Process` function.
4. This cycle repeats until all modules have called their `Process` function.
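The scheduling described above can be sketched with one event per module; this is an illustrative model using `threading.Event`, not dfTimewolf’s actual scheduler:

```python
import threading


def run_recipe(modules, wants):
    """Run each module's Process in a thread, honouring `wants` ordering.

    `modules` maps a module name to its Process callable; `wants` maps a
    module name to the names it must wait for (both hypothetical inputs).
    """
    done = {name: threading.Event() for name in modules}
    order = []
    lock = threading.Lock()

    def runner(name):
        # Block until every dependency has finished its Process function.
        for dep in wants.get(name, []):
            done[dep].wait()
        modules[name]()
        with lock:
            order.append(name)
        # Free our "semaphore", signalling dependents they can proceed.
        done[name].set()

    threads = [threading.Thread(target=runner, args=(n,)) for n in modules]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return order
```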