SMB Documentation

The SMB Work Flow of pcapFS

With this wiki entry, we want to document how pcapFS handles SMB traffic internally. We give a detailed explanation of the whole SMB "work flow" of pcapFS and highlight the assumptions made along the way as well as SMB scenarios in which pcapFS has weaknesses.

Overview

The SMB functionality of pcapFS currently includes the creation of SMB control files (one SMB control file per underlying TCP connection), which contain information about all transferred SMB messages. SMB control files can be obtained with the option --show-metadata. PcapFS also creates so-called SMB server files. These are the server-side files that are accessed during a captured SMB connection. A file is included either because it is accessed directly via an SMB2_CREATE message or because its existence is known from context, e.g. from search queries (SMB2_QUERY_DIRECTORY and SMB2_QUERY_INFO messages). Based on these files, pcapFS reconstructs the known parts of the server-side directory hierarchy as far as possible. For now, only file metadata (file name, timestamps, file size) is set for SMB server files.

Details about how everything works are given in the following sections. Let's start with a quick overview of the SMB-related source files and their respective responsibilities:

smb.cpp/smb.h is the entry point for everything regarding SMB. It is also responsible for creating the SMB control files, which log all transferred SMB messages.
smb_packet.cpp/smb_packet.h are responsible for parsing SMB packets. One SMB packet consists of one SMB header and one SMB message body.
smb_messages.h parses the SMB message body of an SMB packet w.r.t. the message type.
smb_structs.h and smb_constants.h define structs, enums, response codes etc. needed for parsing and memorizing relevant SMB-related information.
smb_utils.cpp/smb_utils.h contain functions frequently needed for handling SMB traffic.
serverfile.cpp/serverfile.h define a super-class for virtual files representing server-side files which are accessed via protocols like SMB, NFS, etc.
smb_serverfile.cpp/smb_serverfile.h inherit from the serverfile super-class and represent server-side files ("real" files and directories) accessed via SMB.
smb_manager.cpp/smb_manager.h is the connecting piece between the parts responsible for parsing SMB traffic and the SMB server files. Per SMB server endpoint, it memorizes and manages all server files as well as all SMB-related mappings that need to be kept.

SMB Parsing

Here, we give a detailed explanation of the SMB parsing process. The parser is designed on the basis of Microsoft's documentation of SMB versions 2 and 3 ([MS-SMB2]), so some familiarity with it is assumed.

There are multiple ways in which SMB traffic can be embedded in network packets: SMB can run on top of raw TCP, NetBIOS over TCP, QUIC and RDMA. PcapFS currently only supports SMB over TCP and SMB over NetBIOS over TCP. When such communication is detected, smb.cpp initiates the parsing of the "SMB packets" contained in the respective TCP payload. One SMB packet consists of one SMB header and one SMB message body.

Depending on the SMB version, different SMB headers are used, which can be distinguished by their magic numbers. PcapFS focuses on SMB packets containing the SMB2 header, which is the standard header for SMB versions 2 and 3. When the SMB2 header is detected, the whole SMB message body is parsed in detail. The main SMB-related functionality of pcapFS, especially regarding SMB server files, is built upon information extracted from packets containing this header. For all other header types, the only information extracted is, if possible, the message type (which is then documented in the respective control file). In the following, we assume packets with an SMB2 header.
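To illustrate the framing and the header distinction described above, here is a minimal C++ sketch. It assumes the 4-byte framing header used by SMB over TCP and by NBSS session messages (type byte 0x00 followed by a big-endian length) and the protocol IDs defined in [MS-SMB2] (0xFE 'SMB' for the SMB2 header, 0xFF 'SMB' for SMB1, 0xFD and 0xFC 'SMB' for the transform and compression headers). The function names are hypothetical and do not correspond to pcapFS's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Header types distinguished by the 4-byte protocol ID ("magic number").
enum class SmbHeaderType { Smb1, Smb2, Smb2Transform, Smb2Compression, Unknown };

// Hypothetical helper: classify the SMB packet starting at `offset`.
SmbHeaderType classifyHeader(const std::vector<uint8_t>& buf, size_t offset) {
    if (buf.size() < offset + 4 || buf[offset + 1] != 'S' || buf[offset + 2] != 'M' ||
        buf[offset + 3] != 'B')
        return SmbHeaderType::Unknown;
    switch (buf[offset]) {
        case 0xFF: return SmbHeaderType::Smb1;
        case 0xFE: return SmbHeaderType::Smb2;             // the header pcapFS parses in detail
        case 0xFD: return SmbHeaderType::Smb2Transform;
        case 0xFC: return SmbHeaderType::Smb2Compression;
        default:   return SmbHeaderType::Unknown;
    }
}

// Hypothetical helper: read the length from the 4-byte NBSS / direct TCP header.
// Byte 0 is the message type (0x00 = session message), bytes 1..3 a big-endian length.
std::optional<uint32_t> framedLength(const std::vector<uint8_t>& buf, size_t offset) {
    if (buf.size() < offset + 4 || buf[offset] != 0x00) return std::nullopt;
    return (uint32_t(buf[offset + 1]) << 16) | (uint32_t(buf[offset + 2]) << 8) |
           uint32_t(buf[offset + 3]);
}
```

For an unchained packet, the value returned by framedLength is the size of the whole SMB packet (header plus body) that follows the 4-byte framing header.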

SMB packets can be chained together, so multiple SMB packets can be embedded in the payload of one TCP packet. Offsets into the virtual TCP files differ for chained and unchained packets. Therefore, chained packets need careful handling, and the last packet of a chain must be distinguished from the other chained packets. This is done by taking into account the chainOffset field (NextCommand in [MS-SMB2]) and the related operations flag (SMB2_FLAGS_RELATED_OPERATIONS) of the SMB2 header.
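A minimal sketch of how a chain can be split into its individual packets, assuming a buffer that holds one framed SMB2 payload and the sync SMB2 header layout from [MS-SMB2] (Flags at byte 16 and NextCommand, i.e. chainOffset, at byte 20, both little-endian). The helper names are hypothetical; pcapFS's own implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr size_t kSmb2HeaderSize = 64;
constexpr size_t kFlagsOffset = 16;
constexpr size_t kNextCommandOffset = 20;                // NextCommand ("chainOffset")
constexpr uint32_t kFlagsRelatedOperations = 0x00000004; // SMB2_FLAGS_RELATED_OPERATIONS

uint32_t readLe32(const std::vector<uint8_t>& b, size_t off) {
    return uint32_t(b[off]) | (uint32_t(b[off + 1]) << 8) | (uint32_t(b[off + 2]) << 16) |
           (uint32_t(b[off + 3]) << 24);
}

// Hypothetical helper: split the framed payload [start, start + len) into
// (offset, length) pairs, one per chained SMB2 packet. A NextCommand of 0 marks
// the last packet of the chain, whose length is the remainder of the payload.
std::vector<std::pair<size_t, size_t>> splitChain(const std::vector<uint8_t>& b,
                                                  size_t start, size_t len) {
    std::vector<std::pair<size_t, size_t>> packets;
    if (b.size() < start + len) return packets;  // truncated capture
    const size_t end = start + len;
    size_t off = start;
    while (off + kSmb2HeaderSize <= end) {
        const uint32_t next = readLe32(b, off + kNextCommandOffset);
        if (next == 0 || off + next >= end) {    // last (or malformed) chain element
            packets.emplace_back(off, end - off);
            break;
        }
        packets.emplace_back(off, next);
        off += next;
    }
    return packets;
}

// Hypothetical helper: does this packet refer to the previous one in the chain?
bool isRelated(const std::vector<uint8_t>& b, size_t packetOffset) {
    return (readLe32(b, packetOffset + kFlagsOffset) & kFlagsRelatedOperations) != 0;
}
```

Unchained packets simply yield a single (offset, length) pair covering the whole framed payload.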

For parsing the message bodies of SMB packets, pcapFS provides a dedicated class for almost every message type. These classes represent the respective messages and provide easy access to the information contained in the message, which is needed both for describing the message in the control file and for creating and updating SMB server files. Parsing may fail if the message or field sizes are not correct, if the structureSize field does not equal the value mandated by the specification, or if the SMB message type is unknown. In that case, a generic SmbMessage class is instantiated and no further information is extracted. One special message type is the Error Response, which is sent by an SMB server when an error occurs while handling the client's request. PcapFS identifies an Error Response by a structureSize value of 9 together with a non-zero status code in the SMB2 packet header.
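The Error Response check can be sketched as follows, assuming a buffer holding one complete SMB2 packet whose 64-byte sync header (Status at byte 8) is directly followed by the message body, whose first two bytes are the StructureSize field. The function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kSmb2StatusOffset = 8;   // Status field of the SMB2 header
constexpr size_t kSmb2BodyOffset = 64;    // the message body follows the 64-byte header
constexpr uint16_t kErrorResponseStructureSize = 9;

uint16_t readLe16(const std::vector<uint8_t>& b, size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

uint32_t readLe32(const std::vector<uint8_t>& b, size_t off) {
    return uint32_t(b[off]) | (uint32_t(b[off + 1]) << 8) | (uint32_t(b[off + 2]) << 16) |
           (uint32_t(b[off + 3]) << 24);
}

// Hypothetical check: a packet is treated as an Error Response when the header carries
// a non-zero status and the body's StructureSize equals 9.
bool isErrorResponse(const std::vector<uint8_t>& packet) {
    if (packet.size() < kSmb2BodyOffset + 2) return false;
    const uint32_t status = readLe32(packet, kSmb2StatusOffset);
    const uint16_t structureSize = readLe16(packet, kSmb2BodyOffset);
    return status != 0 && structureSize == kErrorResponseStructureSize;
}
```

Checking both fields avoids misclassifying a regular response whose body happens to start with the value 9 while the status is STATUS_SUCCESS.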

The message length of each SMB message body is calculated. This is necessary for chained SMB packets in order to compute the correct offsets. For unchained packets, the length field of the NBSS header can be used instead.

When an SMB2_CREATE, SMB2_QUERY_DIRECTORY or SMB2_QUERY_INFO message is detected, the SMB manager is invoked to update the state of the SMB server files according to the message. This is explained in the corresponding section below.

But before that, we need to take a closer look at the information, especially the mappings, that has to be memorized. Some of it is managed per SMB connection, the rest globally for all SMB connections to the same SMB server.

What is managed per SMB connection?

While parsing all SMB packets (including the contained SMB messages), pcapFS manages one smbContext struct per SMB connection. smbContext holds the information that needs to be memorized over the course of one SMB connection. This includes, among other things:

a reference to the underlying virtual TCP file
information which needs to be remembered between request and response of SMB2_CREATE, SMB2_QUERY_DIRECTORY, SMB2_QUERY_INFO and SMB2_TREE_CONNECT messages (file name, fileId, fileInfoClass, ...)
treeId-treename mapping: Each SMB server can have multiple separate directory trees, which are identified by the treeId field of the SMB2 header. The treeId thus indicates which directory tree an SMB message refers to. The name of a tree corresponds to the name of its root directory (which is also the root directory of the tree that pcapFS derives from the known SMB server files located there). The mapping is only unique within one SMB connection: when the same SMB server is accessed over multiple TCP connections at the same time, the same treeId may refer to different trees in different connections. Thus, this mapping needs to be managed per connection. (Other mappings, e.g. fileId-filename, are the same across multiple simultaneous connections to the same SMB server and must therefore be memorized globally.) The treeId-treename mapping is extracted from SMB2_TREE_CONNECT requests and responses. Consequently, the tree name for a given treeId cannot be determined if the corresponding SMB2_TREE_CONNECT request and response were not captured. If an SMB message with such a treeId accesses a file for which pcapFS creates an SMB server file, it is unknown to which tree the file belongs, i.e., in which derived directory tree to put it. In this case, pcapFS creates a derived tree with the generic tree name "treeid_x", x being the treeId number, and inserts the accessed file there. When the same SMB server is accessed over multiple TCP connections at the same time, this approach can lead to redundancies, i.e., the same files being saved in different derived directory trees (with different tree names according to their treeIds).
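A minimal sketch of the per-connection treeId-treename mapping with the generic fallback name. The class and method names are hypothetical, and we assume that requests and responses are matched via the MessageId of the SMB2 header and that the tree name is the last component of the UNC path sent in the SMB2_TREE_CONNECT request.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the treeId-treename part of smbContext (one per connection).
class TreeNameMap {
public:
    // SMB2_TREE_CONNECT request: remember the requested share until the response arrives.
    void onTreeConnectRequest(uint64_t messageId, const std::string& uncPath) {
        // Keep only the last path component, e.g. the share name of a UNC path.
        pending_[messageId] = uncPath.substr(uncPath.rfind('\\') + 1);
    }

    // SMB2_TREE_CONNECT response: the server assigns the treeId in the SMB2 header.
    void onTreeConnectResponse(uint64_t messageId, uint32_t treeId) {
        auto it = pending_.find(messageId);
        if (it == pending_.end()) return;  // request was not captured
        names_[treeId] = it->second;
        pending_.erase(it);
    }

    // Fall back to a generic name when the TREE_CONNECT exchange was not captured.
    std::string treeName(uint32_t treeId) const {
        auto it = names_.find(treeId);
        return it != names_.end() ? it->second : "treeid_" + std::to_string(treeId);
    }

private:
    std::map<uint64_t, std::string> pending_;  // MessageId -> requested tree name
    std::map<uint32_t, std::string> names_;    // treeId -> tree name
};
```

Because one such object exists per connection, the same treeId can map to different names in different connections, which is exactly why the mapping cannot be kept globally.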

What is globally managed?

Besides managing SMB server files, the SMB manager is also responsible for two mappings that need to be maintained globally because they pertain to all connections to the same SMB server or, more precisely, to all connections to the same tree of the same SMB server. Such a tree is identified by the ServerEndpointTree struct, which consists of the SMB server's IP address and port as well as the respective tree name. Per ServerEndpointTree, the SMB manager memorizes the following mappings:

fileId-filename mapping: Each server-side file is addressed by its fileId instead of its file name. Usually, the client first opens the file by sending an SMB2_CREATE request containing the file name. The server responds with a newly created fileId (file handle) for the file, which remains valid until the file is closed with an SMB2_CLOSE message. Using the fileId, the client can then do whatever it wants with the file (obtain metadata, read, write, ...). The fileId-filename mapping is memorized globally because the same fileId can be used across different connections to the same ServerEndpointTree: the fileId may be obtained by an SMB2_CREATE response in one connection and used in another, simultaneous connection to the same tree. For pcapFS, this behavior may lead to undetected SMB server files in the following scenario: two SMB connections to the same SMB server overlap in time, and the fileId is obtained by an SMB2_CREATE response in the connection that started later. When the earlier connection then uses this file handle, pcapFS does not know which file name it belongs to. This is because pcapFS parses the capture file connection by connection rather than chronologically, so the SMB connection that uses the fileId is parsed before the connection from which the corresponding fileId-filename mapping can be derived.
filename-FilePtr mapping: This mapping manages the derived server-side files (SMB server files). The FilePtr is a pointer to a virtual SMBServerFile (which can also be a directory) that later becomes a real file in the server-side directory hierarchy derived by pcapFS. More on that in the next section.
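The following C++ sketch shows one possible layout of the two globally managed mappings, keyed by ServerEndpointTree. Apart from the names ServerEndpointTree and FilePtr taken from the text, all type and member names are hypothetical, and FileId is assumed to be the 16-byte SMB2 file handle (8-byte persistent plus 8-byte volatile part).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <tuple>

// Identifies one tree of one SMB server (IP address, port, tree name).
struct ServerEndpointTree {
    std::string ip;
    uint16_t port;
    std::string treeName;
    bool operator<(const ServerEndpointTree& o) const {
        return std::tie(ip, port, treeName) < std::tie(o.ip, o.port, o.treeName);
    }
};

using FileId = std::array<uint8_t, 16>;             // persistent + volatile handle
struct ServerFileSketch { std::string name; };       // hypothetical placeholder file
using FilePtr = std::shared_ptr<ServerFileSketch>;

// Hypothetical SMB manager holding both mappings per ServerEndpointTree.
class SmbManagerSketch {
public:
    void addFileIdMapping(const ServerEndpointTree& t, const FileId& id, const std::string& name) {
        fileIdToName_[t][id] = name;
    }

    std::optional<std::string> fileName(const ServerEndpointTree& t, const FileId& id) const {
        auto tree = fileIdToName_.find(t);
        if (tree == fileIdToName_.end()) return std::nullopt;
        auto it = tree->second.find(id);
        if (it == tree->second.end()) return std::nullopt;  // mapping not (yet) known
        return it->second;
    }

    // Returns the existing SMB server file for `name` or creates a new one.
    FilePtr getOrCreateFile(const ServerEndpointTree& t, const std::string& name) {
        FilePtr& slot = files_[t][name];
        if (!slot) slot = std::make_shared<ServerFileSketch>(ServerFileSketch{name});
        return slot;
    }

private:
    std::map<ServerEndpointTree, std::map<FileId, std::string>> fileIdToName_;
    std::map<ServerEndpointTree, std::map<std::string, FilePtr>> files_;
};
```

In this sketch, a call to fileName made while parsing the earlier connection returns no value if the SMB2_CREATE response establishing the mapping belongs to a connection that has not been parsed yet, which is the situation in which no SMB server file can be created.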

Management of SMB Server Files

Now that we roughly know how pcapFS handles SMB parsing and which information needs to be memorized at which abstraction layer, it remains to explain how SMB server files are extracted and how the server-side directory hierarchies are derived. All of this is done by the SMB manager. Currently, SMB server files are created or updated via three SMB message types: SMB2_CREATE responses, SMB2_QUERY_INFO responses and SMB2_QUERY_DIRECTORY responses. When one of these message types is detected, the SMB manager takes over right after the message has been parsed. Depending on the file metadata contained in the message, different file properties can be set or updated. When the SMB manager encounters a message regarding a file that was previously unknown for the respective tree, a new SMBServerFile is created and its metadata is set according to the information contained in the message. SMBServerFile is a specialized ServerFile for SMB. A ServerFile differs from other virtual files in that it contains additional timestamps and a pointer to its parent directory, which is also a ServerFile. Starting from an SMBServerFile, the chain of parent directory pointers can thus be followed up to the root directory of the corresponding tree (whose name is obtained from the treeId-treename mapping). This allows pcapFS to easily build up the respective directory hierarchy at the mount point. For each newly created SMBServerFile, its parent directories can be determined because the file name of the SMBServerFile (as derived from the fileId-filename mapping) always includes its absolute path, beginning with the first subdirectory of the corresponding tree (there is one exception when handling SMB2_QUERY_DIRECTORY responses, see below).
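To make the chain of parent pointers concrete, here is a minimal sketch assuming a hypothetical ServerFileNode type that stores one path component per node and a null parent for the tree's root directory. Following the parent pointers up to the root yields the file's absolute path.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a ServerFile: one path component plus a parent pointer.
struct ServerFileNode {
    std::string name;                         // single path component
    bool isDirectory = false;
    std::shared_ptr<ServerFileNode> parent;   // nullptr for the tree's root directory
};

// Walk the parent pointers up to the root and join the components with '/'.
std::string absolutePath(const std::shared_ptr<ServerFileNode>& file) {
    std::string path;
    for (auto cur = file; cur; cur = cur->parent)
        path = path.empty() ? cur->name : cur->name + "/" + path;
    return path;
}
```

Children own their parents here (and not vice versa), so no reference cycles arise between the shared pointers.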

Let's look closer at what pcapFS does for each of the three mentioned message types.

`SMB2_CREATE` response

First, the fileId-filename mapping of the file requested via the SMB2_CREATE message is updated for the tree it belongs to. If the file is not yet present as an SMB server file in the filename-FilePtr mapping, a new SMBServerFile is created and its metadata is initialized with the file information contained in the SMB2_CREATE response (namely timestamps, file size and whether it is a directory). To set the pointer to the file's parent directory, pcapFS iterates backwards through the file's absolute path and recursively creates SMB server files for all parent directories that are not yet represented by an SMBServerFile instance. If the requested file is already present as an SMB server file in the filename-FilePtr mapping, its metadata is updated if the lastChangeTime contained in the message body of the SMB2_CREATE response is newer.
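The SMB2_CREATE handling described above can be sketched as follows, assuming backslash-separated paths relative to the tree root and a hypothetical TreeFiles container standing in for the filename-FilePtr mapping of one tree. Missing parent directories are created recursively, and existing entries are only updated when the new lastChangeTime is newer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical SMB server file with the metadata taken from an SMB2_CREATE response.
struct SmbFile {
    std::string path;                 // relative to the tree root, backslash-separated
    bool isDirectory = false;
    uint64_t lastChangeTime = 0;      // FILETIME: 100-ns intervals since 1601-01-01
    uint64_t fileSize = 0;
    std::shared_ptr<SmbFile> parent;  // nullptr: located directly in the tree root
};
using FilePtr = std::shared_ptr<SmbFile>;

// Hypothetical filename-FilePtr mapping of one tree.
class TreeFiles {
public:
    FilePtr onCreateResponse(const std::string& path, bool isDir, uint64_t changeTime,
                             uint64_t size) {
        auto it = files_.find(path);
        if (it != files_.end()) {
            if (changeTime > it->second->lastChangeTime) {  // only newer metadata wins
                it->second->isDirectory = isDir;
                it->second->lastChangeTime = changeTime;
                it->second->fileSize = size;
            }
            return it->second;
        }
        auto file = std::make_shared<SmbFile>();
        file->path = path;
        file->isDirectory = isDir;
        file->lastChangeTime = changeTime;
        file->fileSize = size;
        file->parent = ensureDirectory(parentPath(path));
        files_[path] = file;
        return file;
    }

    size_t size() const { return files_.size(); }

private:
    static std::string parentPath(const std::string& p) {
        auto pos = p.rfind('\\');
        return pos == std::string::npos ? "" : p.substr(0, pos);
    }

    // Recursively create unknown parent directories; "" denotes the tree root.
    FilePtr ensureDirectory(const std::string& dir) {
        if (dir.empty()) return nullptr;
        auto it = files_.find(dir);
        if (it != files_.end()) return it->second;
        auto d = std::make_shared<SmbFile>();
        d->path = dir;
        d->isDirectory = true;
        d->parent = ensureDirectory(parentPath(dir));
        files_[dir] = d;
        return d;
    }

    std::map<std::string, FilePtr> files_;
};
```

Directories created only as parents carry no timestamps yet; a later SMB2_CREATE response for such a directory fills them in because any real lastChangeTime is newer than the initial 0.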

`SMB2_QUERY_DIRECTORY` response

In contrast to the other two message types, SMB2_QUERY_DIRECTORY responses can contain information about multiple files, i.e., all files in the queried directory that match the search pattern specified in the SMB2_QUERY_DIRECTORY request. The file info classes relevant for pcapFS are:

FileDirectoryInformation,
FileFullDirectoryInformation,
FileIdFullDirectoryInformation,
FileBothDirectoryInformation,
FileIdBothDirectoryInformation,
FileIdExtdDirectoryInformation.

They contain all timestamps, the file name and the relevant file attributes of every matching file in the requested directory. The directory is addressed by its fileId, and to create or update the SMB server files corresponding to the files listed in the SMB2_QUERY_DIRECTORY response, we need to know the absolute path of the directory. This is no issue when the fileId-filename mapping for that directory is already known. However, pcapFS also has to handle the case in which the name for the fileId is unknown. This case might not occur very often because, before an SMB2_QUERY_DIRECTORY request, the corresponding directory has to be opened via SMB2_CREATE, and the fileId-filename mapping for that directory is memorized with every SMB2_CREATE response. However, the SMB2_CREATE request and the SMB2_QUERY_DIRECTORY request for the same directory may be chained together. In that case, the client specifies the fileId fffff...f in the SMB2_QUERY_DIRECTORY request, indicating that it refers to the directory opened by the preceding SMB2_CREATE request. Looking only at the SMB2_QUERY_DIRECTORY messages, pcapFS therefore does not know the name of the requested directory (since the fileId fffff...f does not resolve to a known file name). Thus, pcapFS takes the name specified in the last SMB2_CREATE request as the directory name. With it, pcapFS can assemble the absolute path of every file listed in the SMB2_QUERY_DIRECTORY response and create or update the corresponding SMBServerFile instances. If the name specified in the last SMB2_CREATE request is not available (i.e., it is empty), pcapFS puts the respective SMB server files into the root directory of the current tree. This is done because SMB2_CREATE messages commonly carry an empty file name when they refer to the root directory, so in this case pcapFS puts the SMB server files into the correct directory.
However, this may also be wrong, e.g. when the SMB2_CREATE messages for the directory requested via the SMB2_QUERY_DIRECTORY messages were not captured and the directory is not the root directory, or when the SMB2_CREATE messages are transmitted in a simultaneous connection to the same tree that is parsed later (this corresponds to the issue described above for the fileId-filename mapping). In such cases, SMB server files could mistakenly be put into the tree's root directory.
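A minimal sketch of the directory resolution for SMB2_QUERY_DIRECTORY responses, assuming a 16-byte FileId in which the all-0xFF value marks a request chained to the preceding SMB2_CREATE. The function names are hypothetical; an empty result stands for the tree's root directory.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using FileId = std::array<uint8_t, 16>;

// FileId used in a request chained to the preceding SMB2_CREATE (all bytes 0xFF).
bool isChainedPlaceholder(const FileId& id) {
    for (uint8_t b : id)
        if (b != 0xFF) return false;
    return true;
}

// Hypothetical helper: directory (relative to the tree root) whose listing the
// SMB2_QUERY_DIRECTORY response contains. An empty string means "tree root".
std::string resolveQueryDirectory(const FileId& id,
                                  const std::map<FileId, std::string>& fileIdToName,
                                  const std::string& lastCreateName) {
    if (!isChainedPlaceholder(id)) {
        auto it = fileIdToName.find(id);
        if (it != fileIdToName.end()) return it->second;  // regular case: mapping known
    }
    // Placeholder or unknown fileId: fall back to the name of the last SMB2_CREATE
    // request, which is empty (tree root) if the client opened the root directory.
    return lastCreateName;
}

// Hypothetical helper: absolute path of a listed entry.
std::string joinPath(const std::string& dir, const std::string& entry) {
    return dir.empty() ? entry : dir + "\\" + entry;
}
```

The last branch is also where the misplacement described above originates: an empty lastCreateName caused by missing SMB2_CREATE messages is indistinguishable from a genuine access to the root directory.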

`SMB2_QUERY_INFO` response

An SMB2_QUERY_INFO response triggers the creation or update of SMB server files only if the query info type is SMB2_0_INFO_FILE and the file info class is FileAllInformation, FileBasicInformation or FileNetworkOpenInformation. All other info types and classes do not contain (enough of) the needed file information. Like SMB2_QUERY_DIRECTORY messages, SMB2_QUERY_INFO messages address files by their fileIds. Hence, the fileId-filename mapping for the requested file needs to be known in advance for all file info classes except FileAllInformation, which also contains the file name. So, when the fileId-filename mapping for the requested file's fileId is unknown and the info class is FileAllInformation, pcapFS can still establish that mapping from the file name contained in FileAllInformation. When the SMB manager handles SMB2_QUERY_INFO responses, the same issue as for SMB2_QUERY_DIRECTORY responses can occur: when the name for the fileId is unknown (because the SMB2_CREATE messages for that file were not captured or were transferred in another SMB connection that is parsed later) and the file name from the last SMB2_CREATE of the current connection is empty, the file is assumed to be in the tree's root directory. This is correct in many cases, but not in all.
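The filtering and name resolution for SMB2_QUERY_INFO responses could look roughly like this. The numeric values are the info type and file information class codes from [MS-SMB2] and [MS-FSCC] (SMB2_0_INFO_FILE = 0x01, FileBasicInformation = 0x04, FileAllInformation = 0x12, FileNetworkOpenInformation = 0x22); the enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

constexpr uint8_t SMB2_0_INFO_FILE = 0x01;
constexpr uint8_t FileBasicInformation = 0x04;
constexpr uint8_t FileAllInformation = 0x12;
constexpr uint8_t FileNetworkOpenInformation = 0x22;

enum class QueryInfoAction { Ignore, NeedsFileIdMapping, CarriesFileName };

// Hypothetical filter: which responses can create or update SMB server files?
QueryInfoAction classifyQueryInfo(uint8_t infoType, uint8_t fileInfoClass) {
    if (infoType != SMB2_0_INFO_FILE) return QueryInfoAction::Ignore;
    switch (fileInfoClass) {
        case FileAllInformation:
            return QueryInfoAction::CarriesFileName;  // contains the file name itself
        case FileBasicInformation:
        case FileNetworkOpenInformation:
            return QueryInfoAction::NeedsFileIdMapping;
        default:
            return QueryInfoAction::Ignore;           // not enough file information
    }
}

// Hypothetical name resolution in the order described in the text.
std::string resolveQueryInfoName(QueryInfoAction action,
                                 const std::optional<std::string>& mappedName,
                                 const std::string& nameFromFileAllInformation,
                                 const std::string& lastCreateName) {
    if (mappedName) return *mappedName;
    if (action == QueryInfoAction::CarriesFileName) return nameFromFileAllInformation;
    return lastCreateName;  // empty: file is assumed to be in the tree root
}
```

In the FileAllInformation case, the resolved name would also be used to add the missing fileId-filename mapping.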

Creation of derived server-side directory hierarchies

Now that we know the absolute path of every SMB server file through its chain of parent directory pointers, it is straightforward to derive the resulting directory hierarchies for all SMB server endpoint trees and to incorporate them into the directory layout. For that, pcapFS inverts the chain of parent directory pointers of every SMB server file and then inserts each resulting tree into the subdirectory of the mount point that matches the property of the underlying SMB connection (e.g. the protocol or the source port, depending on the sortby argument).
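Inverting the parent pointers can be sketched as follows, assuming a hypothetical Node type whose null parent means "directly below the tree root". The path prefix passed to render (e.g. smb/<treeName>) is only an example of a property directory at the mount point.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical SMB server file: one path component plus a parent pointer.
struct Node {
    std::string name;
    std::shared_ptr<Node> parent;  // nullptr: directly below the tree root
};

using ChildMap = std::map<const Node*, std::vector<const Node*>>;

// Invert parent pointers into child lists; the key nullptr collects the root's children.
ChildMap buildChildren(const std::vector<std::shared_ptr<Node>>& files) {
    ChildMap children;
    for (const auto& f : files) children[f->parent.get()].push_back(f.get());
    return children;
}

// Emit the derived hierarchy (depth-first) below `prefix`.
void render(const ChildMap& children, const Node* dir, const std::string& prefix,
            std::vector<std::string>& out) {
    auto it = children.find(dir);
    if (it == children.end()) return;  // regular file or empty directory
    for (const Node* child : it->second) {
        const std::string path = prefix + "/" + child->name;
        out.push_back(path);
        render(children, child, path, out);
    }
}
```

Every file is visited once when building the child lists, so the inversion is linear in the number of SMB server files (up to the logarithmic map lookups).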

Memorizing SMB server file hierarchies in the index file

PcapFS typically writes all important metadata about each derived virtual file (especially offsets into and identifiers of the underlying (virtual) files) into an index file. The advantage is that the next time pcapFS is executed with the same capture file(s), the index file can be passed to pcapFS so that the capture file(s) need not be parsed again. Instead, the virtual directory hierarchy can be constructed directly from the information saved in the index file.

In order to reconstruct SMB server files from an index file, more information needs to be saved than for other virtual files. The additional timestamps and whether the SMB server file is a directory can be put into the index file directly. To memorize the reference to the parent directory, which is also an SMB server file, each parent directory has a unique Id. Since a FilePtr object cannot be saved in the index file, the parent directory is saved as its Id instead. This way, the parent directory file pointer of each SMB server file can be reconstructed correctly after all virtual files have been created from the index file.
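A minimal sketch of this two-pass reconstruction, assuming a hypothetical IndexRecord layout in which the parent is stored as an Id (with -1 meaning "no parent"). Creating all files first and resolving the parent Ids in a second pass makes the result independent of the order of the records in the index file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical index entry: the parent pointer is serialized as the parent's Id.
struct IndexRecord {
    uint64_t id;
    std::string name;
    bool isDirectory;
    int64_t parentId;  // -1: no parent (directly below the tree root)
};

// Hypothetical in-memory SMB server file restored from the index.
struct RestoredFile {
    uint64_t id;
    std::string name;
    bool isDirectory;
    std::shared_ptr<RestoredFile> parent;
};

std::vector<std::shared_ptr<RestoredFile>> restore(const std::vector<IndexRecord>& records) {
    std::map<uint64_t, std::shared_ptr<RestoredFile>> byId;
    std::vector<std::shared_ptr<RestoredFile>> files;
    // Pass 1: create every file without parent pointers.
    for (const auto& r : records) {
        auto f = std::make_shared<RestoredFile>(RestoredFile{r.id, r.name, r.isDirectory, nullptr});
        byId[r.id] = f;
        files.push_back(f);
    }
    // Pass 2: turn the stored parent Ids back into pointers.
    for (size_t i = 0; i < records.size(); ++i) {
        if (records[i].parentId < 0) continue;
        auto it = byId.find(static_cast<uint64_t>(records[i].parentId));
        if (it != byId.end()) files[i]->parent = it->second;
    }
    return files;
}
```

A parent Id that does not occur in the index leaves the parent pointer empty, so the file would end up directly below the tree root.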

Summary of SMB-related issues

When SMB server files are located in a tree and the SMB2_TREE_CONNECT request/response for that tree is not captured, the respective tree name (aka the tree's root directory name) is set to "treeId_x" where x is the treeId number. For multiple captured connections to the same tree, this can lead to redundancies of the same files being saved in different derived directory trees (with different tree names according to their treeIds).
Among two connections to the same SMB server, the same treeId may also be used for different trees. If the corresponding SMB2_TREE_CONNECT requests/responses are not captured for either connection, the SMB server files of the different trees are merged into the same derived tree directory hierarchy when both connections end up in the same property directory. This is, e.g., the case when the complete virtual directory built by pcapFS is sorted by the property "protocol": then both connections land in the same directory "smb". If instead the sortby argument distinguishes different TCP connections, e.g. via the argument "srcPort", the virtual files corresponding to the two SMB connections are saved in different directories.
The order in which fileId-filename mappings are derived can matter. Assume two SMB connections to the same SMB server overlap in time, and the fileId is obtained by an SMB2_CREATE response in the connection that started later. When the earlier connection uses this file handle, pcapFS does not know which file name it belongs to, because pcapFS parses the capture file connection by connection rather than chronologically. Thus, the SMB connection that uses the fileId is parsed before the connection from which the corresponding fileId-filename mapping can be derived. As a result, the respective SMB server file may not be created.
Similarly, some metadata saved in SMB control files may not be fully reconstructable when reading from an index file in the following scenario: again, consider two simultaneous SMB connections to the same server. The connection that starts earlier obtains the fileId for a file via SMB2_CREATE messages, and the other connection accesses that file. When pcapFS is executed with the capture file for the first time, the connection from which the fileId-filename mapping is extracted is parsed before the connection in which the file is accessed, so everything is fine. But when pcapFS is executed again and the index file is passed, the content of the SMB control file for the second connection (where the file is accessed) may be read before that of the first connection. Then, as above, the fileId-filename mapping is derived too late, and in the SMB control file of the connection that accesses the affected file, the message corresponding to the file access lacks the information which file it refers to. On the other hand, the problem from above, that the corresponding SMB server file might not be created, does not occur here, since all necessary metadata of the SMB server file is saved in the index file.
When an SMB2_QUERY_DIRECTORY/SMB2_QUERY_INFO message refers to a fileId whose file name is unknown and the name specified in the last SMB2_CREATE request is not available (i.e., it is empty), pcapFS puts the SMB server file(s) derived from the SMB2_QUERY_DIRECTORY/SMB2_QUERY_INFO response into the root directory of the current tree. This is done because SMB2_CREATE messages commonly have an empty file name field when they refer to the root directory, in which case pcapFS puts the SMB server files into the correct directory. However, this may be wrong, e.g. when the SMB2_CREATE messages for the file/directory requested via the SMB2_QUERY_DIRECTORY/SMB2_QUERY_INFO messages were not captured and the directory is not the root directory, or when the SMB2_CREATE messages are transmitted in a different, later-parsed connection to the same tree.

TODOs

Parse SMB2_READ and SMB2_WRITE messages in a way so that file content can be written into the corresponding SMB server files.
Create multiple versions of SMB server files for each point in time where the file content/metadata is changed.
Consider the case that SMB server files are renamed via SMB2_SET_INFO messages.
Investigate ASYNC SMB2 packet headers and find out whether they might interfere with the way SMB packets are parsed and SMB server files are managed.
