
SMB Documentation

Axel Mahr edited this page Oct 22, 2024 · 11 revisions

The SMB Work Flow of pcapFS

With this wiki entry, we want to document how pcapFS handles SMB traffic internally. We give a detailed explanation of the whole SMB "work flow" of pcapFS and highlight assumptions made on the way as well as SMB scenarios where pcapFS has its weaknesses.

Overview

The functionality of pcapFS regarding SMB currently includes the creation of SMB control files (one SMB control file per underlying TCP connection), which contain information about all transferred SMB messages. SMB control files are shown when the option --show-metadata is set. PcapFS also creates so-called SMB files. These are the server-side files which are accessed during a captured SMB connection. These files are either accessed directly by SMB2_CREATE/SMB2_READ/SMB2_WRITE messages, or we know from the context that the respective files exist, e.g. through conducted search queries (SMB2_QUERY_DIRECTORY and SMB2_QUERY_INFO messages). Knowing these files, pcapFS reconstructs, as far as possible, the known parts of the server-side directory hierarchy. For files accessed via Read/Write, the created SMB files are populated with the content read/written while also considering different file versions. Files for which only metadata (file name, timestamps, file size) is known are created as empty SMB files and are displayed when the option --show-metadata is set.

Details about how everything works are given in the following sections. Let's start with a quick overview of the SMB-related source code files and their respective responsibilities:

  • smbcontrol.cpp/smbcontrol.h are responsible for creating SMB control files which log all transferred SMB messages.
  • smb_packet.cpp/smb_packet.h are responsible for parsing SMB packets. One SMB packet consists of one SMB header and one SMB message body.
  • smb_messages.h parses the SMB message body of an SMB packet w.r.t. the message type.
  • smb_structs.h and smb_constants.h define structs, enums, response codes etc. needed for parsing and memorizing relevant SMB-related information.
  • smb_utils.cpp/smb_utils.h contain functions frequently needed for handling SMB traffic.
  • serverfile.cpp/serverfile.h define a super-class for virtual files representing server-side files which are accessed via protocols like SMB, NFS, etc.
  • smb.cpp/smb.h inherit from the ServerFile super-class and represent server-side files ("real" files and directories) accessed via SMB.
  • smb_manager.cpp/smb_manager.h are the connecting piece between the parts responsible for parsing SMB traffic and the SMB files. The SMB manager memorizes and manages, per SMB server endpoint, all server files as well as all SMB-related mappings that need to be kept track of.

SMB Parsing

Here, we give a detailed explanation of the SMB parsing process. The parsing is designed on the basis of Microsoft's documentation of SMB version 2 and 3, so you need to be somewhat familiar with that.

There are multiple ways of how SMB traffic is embedded in network packets. SMB can be realized on top of raw TCP, NetBIOS over TCP, QUIC and RDMA. PcapFS currently only supports SMB over TCP and SMB over NetBIOS over TCP. When such communication is detected, smb.cpp initiates the parsing of the "SMB packets" contained in the respective TCP payload. One SMB packet consists of one SMB header and one SMB message body.

Depending on the SMB version, different SMB headers are used which can be distinguished by different magic numbers. PcapFS focuses on SMB packets containing the SMB2 Header which is the standard header for SMB version 2 and 3. When the SMB2 header is detected, the whole SMB message body is parsed in a detailed manner. The main SMB-related functionality of pcapFS - especially regarding SMB files - is built upon information extracted from packets containing this header. For all other header types, the only information extracted from them is, if possible, the message type (which is then documented in the resp. control file). In the further course, we assume to have packets with an SMB2 header.

SMB packets can be chained together. By this, multiple SMB packets can be embedded in the payload of one TCP packet. Offsets into the virtual TCP files differ for chained and unchained packets. Therefore, we need to carefully consider chained packets and distinguish chained packets which are the last part of a chain from other chained packets. This is done by taking the chainOffset field and related operations flag of the SMB2 header into account.
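The chain handling described above can be sketched as follows. This is a minimal illustration, not the pcapFS implementation: the names PacketSpan, readLe32 and splitChain are hypothetical. The sketch assumes that the chainOffset field corresponds to the NextCommand field of the SMB2 header (offset 20 within the 64-byte header, per Microsoft's documentation), and it neither validates the magic number nor evaluates the related operations flag.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed SMB2 header size and offset of its NextCommand field (Microsoft SMB2 documentation).
constexpr std::size_t kSmb2HeaderSize = 64;
constexpr std::size_t kNextCommandOffset = 20;

// Hypothetical record describing where one SMB packet sits inside the payload.
struct PacketSpan {
    std::size_t offset;  // start of the SMB2 header within the payload
    std::size_t length;  // header plus message body
    bool chained;        // packet belongs to a chain of two or more packets
    bool lastInChain;    // no further packet follows (also true for unchained packets)
};

// Reads a little-endian 32-bit value; the caller guarantees four readable bytes.
inline std::uint32_t readLe32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Splits one SMB payload (transport header already stripped) into its packets.
std::vector<PacketSpan> splitChain(const std::vector<std::uint8_t>& payload) {
    std::vector<PacketSpan> spans;
    std::size_t pos = 0;
    while (pos + kSmb2HeaderSize <= payload.size()) {
        const std::uint32_t next = readLe32(payload.data() + pos + kNextCommandOffset);
        // A follow-up packet exists only if another full header fits behind the offset.
        const bool hasNext =
            next >= kSmb2HeaderSize && pos + next + kSmb2HeaderSize <= payload.size();
        const std::size_t length = hasNext ? next : payload.size() - pos;
        spans.push_back({pos, length, hasNext || pos != 0, !hasNext});
        if (!hasNext) break;
        pos += next;
    }
    return spans;
}
```

Recording both flags per packet reflects the distinction made in the text: a packet in the middle of a chain gets its length from the chain offset, whereas the last packet of a chain extends to the end of the payload.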

For parsing the message bodies of SMB packets, pcapFS provides one dedicated class for almost every message type. These classes represent the respective messages and provide easy access to the information contained in the message. This information is needed for describing the message in more detail in the control file as well as for creating and updating SMB files. Parsing may fail if the message or field sizes are not correct, the structureSize field does not equal the value required by the documentation, or the SMB message type is unknown. In that case, a generic SmbMessage class is instantiated and no further information is extracted. One special message type is the Error Response, which is sent by an SMB server when an error occurs while handling the client's request message. PcapFS identifies an Error Response message by detecting a structureSize value of 9 and a non-zero status code in the SMB2 packet header.
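The Error Response rule stated above boils down to a two-part check. The following sketch only restates that rule; the function and constant names are hypothetical and not taken from the pcapFS sources.

```cpp
#include <cstdint>

// structureSize value of the SMB2 Error Response body, as stated in the text.
constexpr std::uint16_t kErrorResponseStructureSize = 9;
// A status code of zero means success.
constexpr std::uint32_t kStatusSuccess = 0;

// Returns true if the message body should be treated as an Error Response:
// the body reports a structureSize of 9 and the header status is non-zero.
bool isErrorResponse(std::uint32_t headerStatus, std::uint16_t bodyStructureSize) {
    return bodyStructureSize == kErrorResponseStructureSize && headerStatus != kStatusSuccess;
}
```

Both conditions are needed because several regular SMB2 response bodies also use a structureSize of 9, so the size alone does not identify an error.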

For each SMB message body, its message length is calculated. This is necessary for chained SMB packets in order to calculate the correct offsets. When packets are not chained, the size field from the NBSS header can be used instead.

When detecting an SMB2_CREATE response, SMB2_READ response, SMB2_WRITE request, SMB2_QUERY_DIRECTORY response, SMB2_QUERY_INFO response or SMB2_SET_INFO request, the SMB manager is invoked to update the state of the SMB files according to the message. This is explained in the corresponding section below.

But before that, we need to look closer at the relevant information, especially the mappings, that needs to be kept track of. Some of it has to be managed per SMB connection, the rest globally for all SMB connections to the same SMB server.

What is managed per SMB connection?

While parsing all SMB packets (including the contained SMB messages), pcapFS manages one smbContext struct per SMB connection. smbContext holds information which needs to be memorized along one SMB connection. This includes amongst other things:

  • a reference to the underlying virtual TCP file
  • information which needs to be remembered between request and response of SMB2_CREATE, SMB2_READ, SMB2_QUERY_DIRECTORY, SMB2_QUERY_INFO and SMB2_TREE_CONNECT messages (file name, fileId, fileInfoClass, ...)

What is globally managed?

The SMB manager is, besides managing SMB files, also responsible for managing mappings which need to be maintained globally because they pertain to all connections to the same SMB server, or to all connections to the same tree of the same SMB server. Such a tree is identified by the ServerEndpointTree struct, which consists of the SMB server's IP address and port as well as the respective tree name. Per ServerEndpoint (tree-independent), the following mapping is memorized:

  • treeId-treename mapping: Each SMB server can have multiple separate directory trees which are identified by the treeId field of the SMB2 header. So, the treeId indicates which directory tree the respective SMB message refers to. The name of a tree corresponds to the name of its root directory (this is also the root directory of the directory tree derived by pcapFS, which contains the known SMB files located there). The treeId-treename mapping is extracted from SMB2_TREE_CONNECT requests and responses. Because of that, the tree name cannot be determined for a given treeId if the corresponding SMB2_TREE_CONNECT request and response were not captured. When this is the case and the corresponding SMB message accesses a file which will be created as an SMB file by pcapFS, we cannot determine to which tree it belongs, i.e., in which derived directory tree to put it. This results in the creation of a derived tree with the generic tree name "treeid_x" with x being the treeId number. The respective accessed file is then inserted there. The treeId-treename mappings are resolved even before the actual detailed SMB parsing is done. For resolving, all freshly created TCP files are skimmed right at the beginning. This needs to be done because the SMB connections are later parsed connection-wise, and in scenarios with simultaneous accesses to the same tree some mappings might otherwise be determined too late.
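The fallback for unknown tree names can be illustrated with a small lookup. This is a sketch under assumptions: TreeNames and resolveTreeName are hypothetical names, and the map is assumed to have been filled beforehand from the captured SMB2_TREE_CONNECT messages of one server endpoint.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// treeId -> tree name, filled from captured SMB2_TREE_CONNECT requests/responses.
using TreeNames = std::unordered_map<std::uint32_t, std::string>;

// Returns the known tree name, or the generic "treeid_x" name when the
// SMB2_TREE_CONNECT exchange for this treeId was not captured.
std::string resolveTreeName(const TreeNames& names, std::uint32_t treeId) {
    const auto it = names.find(treeId);
    if (it != names.end()) {
        return it->second;
    }
    return "treeid_" + std::to_string(treeId);
}
```

Because the lookup is only meaningful once all mappings are known, filling the map in a separate pass over all TCP files before the detailed parsing matches the ordering described in the text.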

Per ServerEndpointTree the following mappings are memorized by the SMB manager:

  • fileId-filename mapping: Each server-side file is addressed using its fileId instead of its file name. The usual procedure for the client to interact with a server-side file is that the file needs to be opened first. For that, the client sends an SMB2_CREATE request containing the file name. The server responds with a newly created fileId (file handle) corresponding to the file, which is valid until the file is closed with the SMB2_CLOSE message. Having the fileId, the client can do whatever it wants with the file (obtain metadata information, read, write, ...). The fileId-filename mapping is memorized globally because the same fileId can be used over different connections to the same ServerEndpointTree, i.e., it is possible that in one connection, the fileId is obtained by an SMB2_CREATE response and in another simultaneous connection to the same tree, this fileId is used. Similarly to the treeId-treename mapping, the fileId-filename mappings are obtained before the actual SMB connections are parsed in detail. For that, pcapFS skims through all TCP files in advance.
  • filename-FilePtr mapping: By this mapping, the derived server-side files (SMB files) are managed. The FilePtr is a pointer to a virtual SMBServerFile (this can also be a directory) which later becomes a real file in the resulting server-side directory hierarchy derived by pcapFS. More on that in the subsequent section.
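A per-tree fileId-filename mapping could be organized as sketched below. The field layout of ServerEndpointTree follows the description above (IP address, port, tree name), but the concrete types, the 16-byte FileId alias and the FileIdRegistry class are assumptions made for illustration only.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// An SMB2 fileId is 16 bytes long; it is modelled here as a plain byte array.
using FileId = std::array<std::uint8_t, 16>;

// Identifies one tree of one SMB server (field types are assumptions).
struct ServerEndpointTree {
    std::string ip;
    std::uint16_t port;
    std::string treeName;

    bool operator<(const ServerEndpointTree& other) const {
        return std::tie(ip, port, treeName) < std::tie(other.ip, other.port, other.treeName);
    }
};

// Hypothetical registry holding the fileId-filename mapping per tree,
// shared by all connections to that tree.
class FileIdRegistry {
public:
    void remember(const ServerEndpointTree& tree, const FileId& id, const std::string& name) {
        names_[tree][id] = name;
    }

    // Returns nullptr when no name is known for this fileId in this tree.
    const std::string* lookup(const ServerEndpointTree& tree, const FileId& id) const {
        const auto treeIt = names_.find(tree);
        if (treeIt == names_.end()) return nullptr;
        const auto fileIt = treeIt->second.find(id);
        return fileIt == treeIt->second.end() ? nullptr : &fileIt->second;
    }

private:
    std::map<ServerEndpointTree, std::map<FileId, std::string>> names_;
};
```

Keying the outer map by the tree rather than by the TCP connection is what allows a fileId learned in one connection to be resolved in another simultaneous connection to the same tree.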

Management of SMB Files

Now that we roughly know how pcapFS handles SMB parsing and which information needs to be memorized at which abstraction layer, it remains to be explained how SMB files are extracted and how the server-side directory hierarchies are derived. All of this is done by the SMB manager. Currently, SMB files are created/updated via six different SMB message types: SMB2_CREATE response, SMB2_READ response, SMB2_WRITE request, SMB2_QUERY_INFO response, SMB2_QUERY_DIRECTORY response and SMB2_SET_INFO request. When one of these message types is detected, the SMB manager takes over right after the message is parsed. Depending on the file content/metadata contained in the message, different file properties can be set or updated. When the SMB manager encounters a message regarding a file which was previously unknown for the respective tree, a new SmbFile is created and its metadata is set according to the information contained in the respective SMB message. SmbFile is a specialized ServerFile for SMB. A ServerFile differs from other virtual files in that it contains more timestamps and a pointer to its parent directory, which is also a ServerFile. So, starting from an SmbFile, a cascade of parent directory pointers can be followed until the root directory of the corresponding tree (whose name is obtained from the treeId-treename mapping) is reached. By that, pcapFS can easily build up the respective directory hierarchy at the mount point. For each newly created SmbFile, its parent directories can be determined because the file name of the SmbFile (as derived from the fileId-filename mapping) always includes its absolute path, beginning with the first subdirectory of the corresponding tree (there is one exception when handling SMB2_QUERY_DIRECTORY responses, see below for further information).

Let's look closer at what pcapFS does for each of the six mentioned message types.

SMB2_CREATE response

First of all, the fileId-filename mapping of the file requested via the SMB2_CREATE message is updated for the tree it belongs to. When the file is not yet present as an SMB file in the filename-FilePtr mapping, a new SmbFile is created and its metadata is initialized with the file information contained in the SMB2_CREATE response (namely timestamps, file size and the information whether it is a directory or not). In order to set the pointer to the file's parent directory, pcapFS iterates backwards through the file's absolute path and recursively creates SMB files for all parent directories which are not yet represented by an SmbFile instance. If the file requested via the SMB2_CREATE message is already present as an SMB file in the filename-FilePtr mapping, its metadata is updated if the file's lastAccessTime contained in the SMB2_CREATE response's message body is newer.
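The backwards walk through the absolute path and the resulting cascade of parent pointers can be sketched like this. The SmbFile struct here is a stripped-down stand-in for the real class, getOrCreate and pathFromRoot are hypothetical helpers, and the sketch assumes backslash-separated paths with the empty path denoting the tree's root directory.

```cpp
#include <map>
#include <memory>
#include <string>

// Simplified stand-in for an SMB file; the real class carries timestamps etc.
struct SmbFile {
    std::string name;                 // last path component
    bool isDirectory = false;
    std::shared_ptr<SmbFile> parent;  // nullptr for the tree's root directory
};
using FilePtr = std::shared_ptr<SmbFile>;
using FileMap = std::map<std::string, FilePtr>;  // filename-FilePtr mapping of one tree

// Returns the file for `path`, creating it and all missing parent directories.
FilePtr getOrCreate(FileMap& files, const std::string& path, bool isDirectory) {
    const auto known = files.find(path);
    if (known != files.end()) return known->second;

    auto file = std::make_shared<SmbFile>();
    file->isDirectory = isDirectory;
    const auto sep = path.find_last_of('\\');
    if (sep == std::string::npos) {
        // Top-level entry: its parent is the root, and the root has no parent.
        file->name = path;
        file->parent = path.empty() ? nullptr : getOrCreate(files, "", true);
    } else {
        file->name = path.substr(sep + 1);
        file->parent = getOrCreate(files, path.substr(0, sep), true);
    }
    files.emplace(path, file);
    return file;
}

// Follows the parent pointers upwards to rebuild the path relative to the root.
std::string pathFromRoot(const SmbFile& file) {
    std::string result = file.name;
    for (auto dir = file.parent; dir && dir->parent; dir = dir->parent) {
        result = dir->name + "\\" + result;
    }
    return result;
}
```

Looking up each path before creating it means that a directory shared by many files is only instantiated once, which mirrors the "not yet represented by an SmbFile instance" condition in the text.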

SMB2_READ response and SMB2_WRITE request

These two message types are responsible for reading from server-side files or writing to them. Through them, we are able to actually fill our SMB files with content. Both message types are handled similarly: each time a new read/write with a read/write offset of zero is encountered for a file, a new file version is created. For that, the old respective SMB file version is cloned and equipped with a file version tag. The current (new) version is set according to the data body of the read/write message. Redundant successive file versions containing the same content are deduplicated later on. When the read/write offset equals the current file size, i.e. new data is appended to the file, no new file version is created. Instead, the current file version is updated. One important remark is that, when handling SMB2_WRITE requests, the respective SMB files are updated with the provided data regardless of whether the write operation eventually succeeds or not.
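The versioning rule can be modelled in a few lines. This is an illustrative sketch and not the pcapFS data model: VersionedFile, applyTransfer and deduplicate are hypothetical names, file content is held in memory as strings, and offsets other than zero or the current size are simply ignored because the text does not describe them.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory model of one SMB file with its version history.
struct VersionedFile {
    std::vector<std::string> olderVersions;  // oldest first
    std::string current;
    bool hasContent = false;
};

// Applies the data of one SMB2_READ response or SMB2_WRITE request.
void applyTransfer(VersionedFile& file, std::uint64_t offset, const std::string& data) {
    if (offset == 0) {
        // A transfer starting at offset zero opens a new version.
        if (file.hasContent) file.olderVersions.push_back(file.current);
        file.current = data;
        file.hasContent = true;
    } else if (offset == file.current.size()) {
        // Data appended at the end extends the current version.
        file.current += data;
    }
    // Other offsets are not covered by this sketch.
}

// Drops successive versions with identical content.
void deduplicate(VersionedFile& file) {
    auto& versions = file.olderVersions;
    versions.erase(std::unique(versions.begin(), versions.end()), versions.end());
    if (!versions.empty() && versions.back() == file.current) versions.pop_back();
}
```

For example, reading a file twice in full yields two identical versions after the second read, which the later deduplication pass collapses into one.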

SMB2_QUERY_DIRECTORY response

In contrast to the message types discussed above, SMB2_QUERY_DIRECTORY responses can contain information for multiple files, i.e., all files in the current directory which match the search pattern specified in the SMB2_QUERY_DIRECTORY request. The file info classes which are relevant for pcapFS are:

  • FileDirectoryInformation,
  • FileFullDirectoryInformation,
  • FileIdFullDirectoryInformation,
  • FileBothDirectoryInformation,
  • FileIdBothDirectoryInformation,
  • FileIdExtdDirectoryInformation.

They contain all timestamps, the file name and the relevant file attributes for every matching file of the requested directory. The directory is addressed using its fileId and, for creating/updating the SMB files corresponding to the files listed in the SMB2_QUERY_DIRECTORY response, we need to know the directory name (more precisely, its absolute path). This is no issue when the fileId-filename mapping for that directory is already known. However, pcapFS also needs to handle the case that the name for the fileId is not known. This case should not occur very often because, before an SMB2_QUERY_DIRECTORY request, the corresponding directory has to be opened via SMB2_CREATE and, with every SMB2_CREATE response, the respective fileId-filename mapping for that directory gets memorized. However, the SMB2_CREATE request and the SMB2_QUERY_DIRECTORY request for the same directory can be chained together. In that case, the client specifies the fileId fffff...f in the SMB2_QUERY_DIRECTORY request, indicating that it refers to the directory opened via the SMB2_CREATE request right before. Looking only at the SMB2_QUERY_DIRECTORY messages, pcapFS then does not know the name of the requested directory (since the fileId fffff...f doesn't resolve to a known file name). Thus, pcapFS takes the name specified in the last SMB2_CREATE request as the directory name. With that, pcapFS is able to assemble the absolute path for every file listed in the SMB2_QUERY_DIRECTORY response and can create/update the corresponding SmbFile instances. If the name specified in the last SMB2_CREATE request is not available (i.e., it is empty), pcapFS puts the respective SMB files into the root directory of the current tree. This is done because SMB2_CREATE messages commonly carry an empty file name when they refer to the root directory.
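The directory name resolution described above can be sketched as a lookup with a fallback. The helper names resolveDirectory and childPath are hypothetical, the 16-byte FileId alias is an assumption, and the empty string stands for the root directory of the current tree.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <string>

using FileId = std::array<std::uint8_t, 16>;

// Determines the directory a SMB2_QUERY_DIRECTORY response refers to.
// `lastCreateName` is the name from the most recent SMB2_CREATE request of the
// connection; it is used whenever the fileId does not resolve to a known name,
// e.g. for the all-0xff placeholder used in chained requests.
std::string resolveDirectory(const std::map<FileId, std::string>& fileNames,
                             const FileId& fileId,
                             const std::string& lastCreateName) {
    const auto it = fileNames.find(fileId);
    if (it != fileNames.end()) return it->second;
    return lastCreateName;  // an empty name denotes the root directory of the tree
}

// Builds the absolute path of one listed entry below the resolved directory.
std::string childPath(const std::string& directory, const std::string& entryName) {
    return directory.empty() ? entryName : directory + "\\" + entryName;
}
```

Treating an empty directory name as the root keeps the second fallback from the text (no usable name in the last SMB2_CREATE request) free of any special casing: the listed files simply end up directly below the tree's root.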

SMB2_QUERY_INFO response

The detection of an SMB2_QUERY_INFO response initiates SMB file creations/changes only if the underlying query info type is SMB2_0_INFO_FILE and the file info class is FileAllInformation, FileBasicInformation or FileNetworkOpenInformation. All other info types/classes don't contain (enough) needed file information. Like SMB2_QUERY_DIRECTORY, SMB2_QUERY_INFO messages address files via their fileIds. Hence, the fileId-filename mapping for the requested file needs to be known in advance for all file info classes except for FileAllInformation which also contains the file name. So, when the fileId-filename mapping is not known for the requested file's fileId and we have a FileAllInformation info class, pcapFS is still able to establish that mapping through the file name contained in FileAllInformation.

SMB2_SET_INFO request

PcapFS updates an SMB file's timestamps if the corresponding Set Info Request contains FileBasicInformation.

Creation of derived server-side directory hierarchies

Now that we know the absolute path for every SMB file through the cascade of parent directory pointers, it is pretty easy to derive the resulting directory hierarchies for all SMB server endpoint trees and incorporate them into the directory layout. For that, pcapFS flips the cascade of parent directory pointers for every SMB file and then inserts each resulting tree at the mount point's subdirectory where the underlying SMB connection satisfies the property corresponding to the directory.

Memorizing SMB file hierarchies in the index file

PcapFS typically writes all important metadata about each derived virtual file (especially offsets into and the identifier of the underlying (virtual) file) into an index file. This has the advantage that the next time pcapFS is executed with the same capture file(s), the index file can be passed to pcapFS so that the capture file(s) don't need to be parsed once more. Instead, the virtual directory hierarchy can be constructed directly from the information saved in the index file.

In order to reconstruct SMB files from an index file, more information needs to be saved than for other virtual files. The additional timestamps and the information whether the SMB file is a directory can be put directly into the index file. For memorizing a reference to the parent directory, which is also an SMB file, each parent directory has a unique representative Id. Since a FilePtr object cannot be meaningfully serialized into the index file, the parent directory is saved as its Id instead. By that, the parent directory file pointer can be reconstructed correctly for each SMB file after all virtual files have been created from the index file.
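Restoring the parent pointers from saved Ids naturally takes two passes, as sketched below. The IndexEntry layout, the use of 0 as "no parent" and the function restoreFromIndex are assumptions for illustration; the actual index file format of pcapFS is not shown here.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for a restored SMB file.
struct RestoredFile {
    std::string name;
    std::shared_ptr<RestoredFile> parent;
};
using FilePtr = std::shared_ptr<RestoredFile>;

// Assumed marker for "this entry is a root directory".
constexpr std::uint64_t kNoParent = 0;

// Hypothetical index record: the parent is stored as an Id, not as a pointer.
struct IndexEntry {
    std::uint64_t id;
    std::uint64_t parentId;  // kNoParent for a tree's root directory
    std::string name;
};

std::unordered_map<std::uint64_t, FilePtr> restoreFromIndex(
        const std::vector<IndexEntry>& entries) {
    std::unordered_map<std::uint64_t, FilePtr> files;
    // Pass 1: create every file so that all Ids can be resolved afterwards.
    for (const auto& entry : entries) {
        auto file = std::make_shared<RestoredFile>();
        file->name = entry.name;
        files[entry.id] = file;
    }
    // Pass 2: turn the saved parent Ids back into pointers.
    for (const auto& entry : entries) {
        const auto parent = files.find(entry.parentId);
        if (entry.parentId != kNoParent && parent != files.end()) {
            files[entry.id]->parent = parent->second;
        }
    }
    return files;
}
```

The second pass only runs once every file exists, so the order of entries in the index does not matter; this corresponds to the remark that the pointers are reconstructed after all virtual files are created.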

Summary of SMB-related issues

  • When SMB files are located in a tree and the SMB2_TREE_CONNECT request/response for that tree is not captured, the respective tree name (aka the tree's root directory name) is set to "treeId_x" where x is the treeId number. For multiple captured connections to the same tree, this can lead to redundancies of the same files being saved in different derived directory trees (with different tree names according to their treeIds).

  • When sorting the virtual directory hierarchy w.r.t. the source port of TCP connections (--sortby=srcPort), it can happen that not all of the SMB files which are accessed in a connection are displayed in the corresponding port folder. This can happen when multiple simultaneous connections access the same files. For example, with two such connections, the respective SMB files are displayed in only one of the corresponding port folders, not both.
