package documentation

Package related to data tainting.

Data tainting lets you follow the flow of some data throughout the execution trace. This package exposes the taint algorithm available in REVEN v2 through an experimental, simplified API.

Starting with the Taint API

The entry point of the API is the Tainter.simple_taint method, which returns a Taint object parameterized by the data you wish to taint and the range of the taint.

>>> # Spawning a taint with Tainter.simple_taint
>>> import reven2.preview.taint
>>> tainter = reven2.preview.taint.Tainter(trace)
>>> taint = tainter.simple_taint(tag0="[ds:0xffff88007cb95800 ; 27]", from_context=trace.context_before(0),
...                              to_context=trace.context_after(2968406), is_forward=False)
>>> taint
Taint Object (from_context=Context before #0, to_context=Context before #2968407, direction=Backward,
granularity=Instruction)
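
The tag0 argument above is a string describing a memory range. As a minimal sketch, the helper below builds such a string from its parts. The name format_memory_target is hypothetical and not part of reven2, and it assumes that the number after the semicolon is a size in bytes, which the example above suggests but does not state.

```python
def format_memory_target(segment: str, address: int, size: int) -> str:
    """Build a string such as "[ds:0xffff88007cb95800 ; 27]".

    Hypothetical helper: the layout is copied from the example above,
    and `size` is assumed to be a number of bytes.
    """
    if size <= 0:
        raise ValueError("size must be a positive number of bytes")
    # {address:#x} renders the address in lowercase hexadecimal with a 0x prefix.
    return f"[{segment}:{address:#x} ; {size}]"


print(format_memory_target("ds", 0xFFFF88007CB95800, 27))
```

The resulting string could then be passed as tag0 to Tainter.simple_taint, as in the example above.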

Once you have a Taint object, you can query the results produced by the taint using a variety of TaintResultView objects.

>>> # Getting changes and states from a taint
>>> print(next(taint.accesses(changes_only=True).all()))
Taint access at #2960185 rep movsb byte ptr [rdi], byte ptr [rsi]
Tainted memories:
[phy:0x77bcccb8; 3] : gained[tag0,]
[phy:0x7cb95818; 3] : lost[tag0,]
>>> print(taint.state_at(trace.context_before(0)))
Taint state at Context before #0
Tainted memories:
[phy:0x779fe80e; 19] : [tag0,]
[phy:0x779fe828; 1] : [tag0,]
[phy:0x779fe83b; 7] : [tag0,]

The taint produces three types of results:

  • TaintState: the state of the taint at some Context, that indicates which data is currently tainted.
  • TaintAccess: an access to the tainted data at some Transition. If the access changes the state of the taint, it indicates which data is newly tainted and which data just lost the taint.
  • TaintWarning: a list of warning messages associated with a range of Transitions, indicating that something may be incorrect in the taint propagation. Users are encouraged to manually check the correctness of this range of Transitions, especially if it contains one or more TaintAccess, in which case the problem is likely to have affected the taint.
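
To decide which warnings deserve a manual check first, one option is to look for warning ranges that contain at least one access. The sketch below works on plain integers: warning ranges are given as (first, last) transition id pairs and accesses as transition ids. How to extract a range from a TaintWarning is not shown on this page, so that step is left out, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

IdRange = Tuple[int, int]  # (first transition id, last transition id), inclusive


def ranges_with_accesses(warning_ranges: Iterable[IdRange],
                         access_ids: Iterable[int]) -> List[Tuple[IdRange, List[int]]]:
    """Return each warning range that contains accesses, with the ids it contains."""
    ids = sorted(access_ids)
    flagged = []
    for first, last in warning_ranges:
        hits = [i for i in ids if first <= i <= last]
        if hits:
            flagged.append(((first, last), hits))
    return flagged


# Made-up ranges, combined with the access ids used as examples on this page.
print(ranges_with_accesses([(10, 30), (60, 80), (100, 101)], [21, 27, 52, 87, 101]))
```

Ranges that come back from this function are the ones where a propagation problem is most likely to have changed the taint results.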

Please refer to the documentation of Taint.accesses, Taint.states, Taint.warnings and TaintResultView for more information on querying results.

What does it mean for some data to be tainted?

When data is tainted, it is marked with a taint marker. During forward taint propagation, if a piece of data that has a taint marker influences another piece of data, it transmits the taint marker to that other piece of data. In backward propagation, the relationship is reversed: the marker is transmitted to the data that the tainted data depends on.
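
The propagation rule can be illustrated with a toy model. This is not the REVEN algorithm, only a sketch under simplifying assumptions: the trace is a list of (destination, sources) steps, data is identified by a name, and there is a single marker.

```python
from typing import Iterable, List, Set, Tuple

# Each step reads as "destination is computed from sources".
Step = Tuple[str, Tuple[str, ...]]


def forward_taint(steps: List[Step], initially_tainted: Iterable[str]) -> Set[str]:
    """Follow what the tainted data influences, from the first step to the last."""
    tainted = set(initially_tainted)
    for dst, srcs in steps:
        if any(src in tainted for src in srcs):
            tainted.add(dst)      # dst is influenced by tainted data
        else:
            tainted.discard(dst)  # dst is overwritten with untainted data
    return tainted


def backward_taint(steps: List[Step], finally_tainted: Iterable[str]) -> Set[str]:
    """Follow what the tainted data depends on, from the last step to the first."""
    tainted = set(finally_tainted)
    for dst, srcs in reversed(steps):
        if dst in tainted:
            tainted.discard(dst)  # before this step, dst did not hold the value yet
            tainted.update(srcs)  # the value depends on these sources
    return tainted


steps = [("rbx", ("rax",)), ("rcx", ("rbx", "rdx")), ("rax", ("rsi",))]
print(sorted(forward_taint(steps, {"rax"})))   # ['rbx', 'rcx']
print(sorted(backward_taint(steps, {"rcx"})))  # ['rax', 'rdx']
```

In the forward run, rax loses the taint at the last step because it is overwritten from untainted data, which mirrors the "gained" and "lost" entries reported by a TaintAccess.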

The taint algorithm in REVEN v2 can operate with multiple taint markers. This means each piece of data can be marked with zero, one or several taint markers. A marker is defined by its name and by an integer handle (for efficiency reasons when manipulating multiple markers).

In the simplified API, only two predefined taint markers, "tag0" and "tag1", are available in taints, but we plan to allow taints with an arbitrary number of taint markers in the future.
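
To see why integer handles help when manipulating several markers, here is a toy sketch. It assumes that a handle is used as a bit position, so that a set of markers fits in one integer. This encoding is an assumption made for illustration: the page only states that markers have a name and an integer handle, and ToyMarkers is not the MarkerManager class of this package.

```python
from typing import Dict, Iterable, List


class ToyMarkers:
    """Toy name-to-handle registry, where a handle is a bit position."""

    def __init__(self, names: Iterable[str]) -> None:
        self._handles: Dict[str, int] = {name: i for i, name in enumerate(names)}

    def handle(self, name: str) -> int:
        return self._handles[name]

    def mask(self, *names: str) -> int:
        """Encode a set of marker names as a single integer."""
        result = 0
        for name in names:
            result |= 1 << self._handles[name]
        return result

    def names(self, mask: int) -> List[str]:
        """Decode an integer back into marker names."""
        return [n for n, h in self._handles.items() if mask & (1 << h)]


markers = ToyMarkers(["tag0", "tag1"])
rax = markers.mask("tag0")
rbx = markers.mask("tag1")
rcx = rax | rbx  # data influenced by both rax and rbx carries both markers
print(markers.names(rcx))  # ['tag0', 'tag1']
```

With this encoding, merging the markers of two pieces of data is a single bitwise OR, whatever the number of markers involved.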

Specificities of the Taint API

Compared to the other objects available in the API, the Taint object is special in several ways:

  • The quantity of results it produces is mutable, that is, the quantity of produced results may change over time.
  • The taint is a server-side background process, and as of today, this comes with a few limitations.

The next two sections cover these points in detail.

Mutable Results

While the taint can be very fast (it may traverse billions of instructions in a few seconds), the taint process can sometimes last a long time, especially when there are many changes to the taint state. It would be possible to wait for the taint to finish before accessing the results, but your script would not be able to advance in the meantime. Instead, the current taint makes its results available as soon as they are computed. This means that if you query the results of a taint that is not finished, you will not get all the results of the taint, only those that have been computed at the time of the query.

The API exposes various handles to address this problem:

  • The Progress class gives you the current status of the taint (whether it is still running or not), and the last taint state that was computed.
    >>> # Fetching the progress of a Taint
    >>> taint.progress()
    TaintProgress(last_known_state_id=0, status=Finished)
    
  • When querying results, the API returns subclasses of the TaintResultView class. These build "views" of the queried results: you can either fetch results incrementally by repeatedly polling with the TaintResultView.available method, or wait for all results to be produced with the TaintResultView.all method.
    >>> # Fetching results
    >>> accesses = taint.accesses()
    >>> for access in accesses.available():
    ...     print(access.transition.id)
    21
    27
    52
    >>> for access in accesses.available():
    ...     print(access.transition.id)
    87
    101
    >>> for access in taint.accesses().all():
    ...     print(access.transition.id)
    21
    27
    52
    87
    101
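
The polling pattern shown above can be wrapped in a small generator that lets a script do other work between batches. This is a sketch under assumptions: poll_results and its parameters are hypothetical names, the only method it relies on is available(), and the stop condition is supplied by the caller because the attributes of the progress object are not detailed here. FakeView is a stand-in so that the sketch runs without a server.

```python
import time
from typing import Callable, Iterator, List


def poll_results(view, is_finished: Callable[[], bool],
                 interval: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> Iterator:
    """Yield results as they become available, until is_finished() is true."""
    while True:
        # Read the status first, so that the last pass still drains the
        # results produced just before the taint finished.
        finished = is_finished()
        for result in view.available():
            yield result
        if finished:
            return
        sleep(interval)


class FakeView:
    """Stand-in for a result view: each call to available() returns one batch."""

    def __init__(self, batches: List[list]) -> None:
        self.batches = list(batches)

    def available(self) -> list:
        return self.batches.pop(0) if self.batches else []


view = FakeView([[21, 27, 52], [87, 101]])
ids = list(poll_results(view, is_finished=lambda: not view.batches,
                        sleep=lambda _: None))
print(ids)  # [21, 27, 52, 87, 101]
```

With a real taint, view would be the object returned by taint.accesses(), and is_finished would be built on the value returned by taint.progress().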
    

Server-side background process

As of today, the main limitation of the taint being a server-side background process is that each server can only process one taint at a time. Starting a second taint, be it in Axion or through the API by using Tainter.simple_taint, immediately cancels the first taint and discards its results.

NOTE: Currently, starting a new taint can confuse Axion and Python clients that were in the middle of requesting results from a previous taint, resulting in the clients mixing results from the previous and the new taint. To avoid this issue, always make sure that no client is in the middle of requesting results from a taint before starting another one.

Module helpers Undocumented

From the __init__.py module:

Class ChangeMarkers Models gained and lost markers for a TaintAccess entry.
Class MarkerIterator Models an iterator over taint markers.
Class MarkerManager Manages the taint markers defined in a taint.
Class Progress Represents the status of the taint at the point where it was requested.
Class ProgressStatus No summary
Class PropagationDirection Enum describing the taint direction: Forward (forward taint) or Backward (backward taint).
Class ResultGranularity No summary
Class Taint Represents a started taint in the backend and gives access to the taint results.
Class TaintAccess Contains information about the accessed data.
Class TaintAccessView A view of the TaintAccesses that occurred in a taint.
Class TaintResultStatus Enum describing the various statuses of a TaintResultView.
Class TaintResultView The abstract class from which any result view is derived.
Class TaintState A taint state lists all data that is currently tainted at a given Context.
Class TaintStateView A view of the TaintStates that occurred in a taint.
Class TaintWarning A taint warning is a collection of warning messages that occurred during the taint.
Class TaintWarningView A view of the TaintWarnings that occurred in a taint.
Class TaintedData Undocumented
Class TaintedMemories Models a range of memory
Class TaintedRegisterSlice Models a slice of an architecture register, e.g. rax[0..4]
Class Tainter Entry point object for tainting data.
Class _TaintData Undocumented
Variable _from_ll_propagation_direction Undocumented
Variable _from_ll_result_granularity Undocumented
Variable _progress_status_to_str Undocumented
Variable _progress_to_status Undocumented
Variable _taint_result_status_to_str Undocumented
Variable _to_ll_propagation_direction Undocumented
Variable _to_ll_result_granularity Undocumented

API Documentation for reven2, generated by pydoctor 21.2.2 at 2021-10-01 07:18:12.