SMB Documentation
With this wiki entry, we want to document how pcapFS handles SMB traffic internally. We give a detailed explanation of the whole SMB "work flow" of pcapFS and highlight assumptions made on the way as well as SMB scenarios where pcapFS has its weaknesses.
The functionality of pcapFS regarding SMB currently includes the creation of SMB control files (one SMB control file per underlying TCP connection), which contain information about all transferred SMB messages. SMB control files are obtainable via the option `--show-metadata`. PcapFS also creates so-called SMB server files. These are the server-side files which are accessed during a captured SMB connection. These files are either accessed directly by `SMB2_CREATE` messages, or we know from the context that the respective files exist, e.g. through conducted search queries (`SMB2_QUERY_DIRECTORY` and `SMB2_QUERY_INFO` messages). Knowing these files, pcapFS reconstructs, as far as possible, the known parts of the server-side directory hierarchy. For now, only file metadata (file name, timestamps, file size) is set for SMB server files.
Details about how everything works are given below. Let's start with a quick overview of the SMB-related source code files and their respective responsibilities:
- smb.cpp/smb.h are the entry point for everything regarding SMB. They are also responsible for creating the SMB control files, which log all transferred SMB messages.
- smb_packet.cpp/smb_packet.h are responsible for parsing SMB packets. One SMB packet consists of one SMB header and one SMB message body.
- smb_messages.h parses the SMB message body of an SMB packet w.r.t. the message type.
- smb_structs.h and smb_constants.h define structs, enums, response codes etc. needed for parsing and memorizing relevant SMB-related information.
- smb_utils.cpp/smb_utils.h contain functions frequently needed for handling SMB traffic.
- serverfile.cpp/serverfile.h define a super-class for virtual files representing server-side files which are accessed via protocols like SMB, NFS, etc.
- smb_serverfile.cpp/smb_serverfile.h inherit from the `serverfile` super-class and represent server-side files ("real" files and directories) accessed via SMB.
- smb_manager.cpp/smb_manager.h are the connecting piece between the parts responsible for parsing SMB traffic and the SMB server files. The SMB manager memorizes and manages, per SMB server endpoint, all server files as well as all SMB-related mappings that need to be kept track of.
Here, we give a detailed explanation of the SMB parsing process. The parsing is designed on the basis of Microsoft's documentation of SMB versions 2 and 3, so some familiarity with that documentation is assumed.
There are multiple ways of how SMB traffic is embedded in network packets. SMB can be realized on top of raw TCP, NetBIOS over TCP, QUIC and RDMA. PcapFS currently only supports SMB over TCP and SMB over NetBIOS over TCP. When such communication is detected, smb.cpp initiates the parsing of the "SMB packets" contained in the respective TCP payload. One SMB packet consists of one SMB header and one SMB message body.
Depending on the SMB version, different SMB headers are used, which can be distinguished by their magic numbers. PcapFS focuses on SMB packets containing the SMB2 header, which is the standard header for SMB versions 2 and 3. When the SMB2 header is detected, the whole SMB message body is parsed in detail. The main SMB-related functionality of pcapFS, especially regarding SMB server files, is built upon information extracted from packets containing this header. For all other header types, only the message type is extracted, if possible (it is then documented in the respective control file). In the following, we assume packets with an SMB2 header.
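The distinction by magic number can be sketched as follows. This is a minimal illustration, not the actual pcapFS code: the enum and function names are hypothetical, while the four protocol identifiers (a leading byte of 0xFF, 0xFE, 0xFD or 0xFC followed by "SMB") are the ones defined by the SMB specifications.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; pcapFS's own types may differ.
enum class SmbHeaderType { Smb1, Smb2, Smb2Transform, Smb2Compression, Unknown };

// Classifies an SMB header by its 4-byte protocol identifier.
inline SmbHeaderType classifyHeader(const uint8_t* data, std::size_t len) {
    if (len < 4 || data[1] != 'S' || data[2] != 'M' || data[3] != 'B') {
        return SmbHeaderType::Unknown;
    }
    switch (data[0]) {
        case 0xFF: return SmbHeaderType::Smb1;             // SMB version 1
        case 0xFE: return SmbHeaderType::Smb2;             // SMB2 header (versions 2 and 3)
        case 0xFD: return SmbHeaderType::Smb2Transform;    // encrypted SMB3 message
        case 0xFC: return SmbHeaderType::Smb2Compression;  // compressed SMB3 message
        default:   return SmbHeaderType::Unknown;
    }
}
```

Only the `Smb2` case would lead to detailed parsing of the message body in the sense described above.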
SMB packets can be chained together. This way, multiple SMB packets can be embedded in the payload of one TCP packet. Offsets into the virtual TCP files differ for chained and unchained packets. Therefore, we need to carefully consider chained packets and distinguish chained packets which are the last part of a chain from other chained packets. This is done by taking the `chainOffset` field and the `related operations` flag of the SMB2 header into account.
For parsing the message bodies of SMB packets, pcapFS provides a dedicated class for almost every message type. These classes represent the respective messages and provide easy access to the information contained in them, which is needed both for describing the message in the control file and for creating and updating SMB server files. Parsing may fail if the message or field sizes are not correct, the `structureSize` field does not equal the documented mandatory value, or the SMB message type is unknown. In that case, a generic `SmbMessage` class is instantiated and no further information is extracted. One special message type is the Error Response, which is sent by an SMB server when an error occurs while handling the client's request message. PcapFS identifies an Error Response message by detecting a `structureSize` value of 9 and a non-zero status code in the SMB2 packet header.
For each SMB message body, its message length is calculated. This is necessary when the respective SMB packets are chained, in order to calculate the correct offsets. When they are not chained, the size field from the NBSS header can be used.
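To illustrate how chained packets and their offsets can be determined, here is a minimal sketch that walks a chain inside one payload. It is not the pcapFS implementation: the type and function names are hypothetical. It relies on the SMB2 header layout from Microsoft's specification (64-byte header, flags at byte offset 16, the next-command field, called `chainOffset` above, at byte offset 20, related-operations flag 0x00000004), and it assumes that the payload passed in is exactly one NBSS message that starts with an SMB2 header.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One SMB2 packet located inside a payload (hypothetical helper type).
struct ChainedPacket {
    std::size_t offset;   // start of the SMB2 header within the payload
    std::size_t length;   // header plus message body (plus padding, if any)
    bool related;         // related-operations flag is set
    bool lastInChain;     // chain offset (NextCommand) is zero
};

// Reads a little-endian 32-bit value.
inline uint32_t readLe32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Splits a payload into its chained SMB2 packets. Stops early and returns
// what was found so far if the data looks malformed.
inline std::vector<ChainedPacket> splitChain(const std::vector<uint8_t>& payload) {
    constexpr std::size_t kHeaderSize = 64;
    constexpr std::size_t kFlagsOffset = 16;
    constexpr std::size_t kNextCommandOffset = 20;
    constexpr uint32_t kRelatedOperations = 0x00000004;

    std::vector<ChainedPacket> packets;
    std::size_t pos = 0;  // invariant: pos <= payload.size()
    while (payload.size() - pos >= kHeaderSize) {
        const uint8_t* hdr = payload.data() + pos;
        if (hdr[0] != 0xFE || hdr[1] != 'S' || hdr[2] != 'M' || hdr[3] != 'B') break;
        const uint32_t flags = readLe32(hdr + kFlagsOffset);
        const std::size_t next = readLe32(hdr + kNextCommandOffset);
        const std::size_t remaining = payload.size() - pos;
        const bool last = (next == 0);
        if (!last && (next < kHeaderSize || next > remaining)) break;  // malformed
        // The last packet of a chain spans the rest of the payload.
        packets.push_back({pos, last ? remaining : next,
                           (flags & kRelatedOperations) != 0, last});
        if (last) break;
        pos += next;
    }
    return packets;
}
```

For an unchained packet the result contains a single entry whose length equals the payload size, which corresponds to taking the size from the NBSS header.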
When detecting an `SMB2_CREATE`, `SMB2_QUERY_DIRECTORY` or `SMB2_QUERY_INFO` message, the SMB manager is invoked to update the state of the SMB server files according to the message. This is explained in the corresponding section below.
But before that, we need to take a closer look at the information, especially the mappings, that has to be kept track of. Some information must be managed per SMB connection, other information globally for all SMB connections to the same SMB server.
While parsing all SMB packets (including the contained SMB messages), pcapFS manages one `smbContext` struct per SMB connection. `smbContext` holds information which needs to be memorized along one SMB connection. This includes, among other things:
- a reference to the underlying virtual TCP file
- information which needs to be remembered between request and response of `SMB2_CREATE`, `SMB2_QUERY_DIRECTORY`, `SMB2_QUERY_INFO` and `SMB2_TREE_CONNECT` messages (file name, fileId, fileInfoClass, ...)
- treeId-treename mapping: Each SMB server can have multiple separate directory trees which are identified by the `treeId` field of the SMB2 header. So, the treeId indicates which directory tree the respective SMB message refers to. The name of a tree corresponds to the name of its root directory (this is also the root directory of the directory tree derived by pcapFS, which contains the known SMB server files located there). The mapping is only unique within one SMB connection. This means that in a scenario with distributed access to the same SMB server over multiple TCP connections at the same time, the same treeId may refer to different trees in different connections. Thus, this mapping needs to be managed per connection. (Other mappings, e.g. fileId-filename, are the same over multiple simultaneous connections to the same SMB server, so they must be memorized globally.) The treeId-treename mapping is extracted from `SMB2_TREE_CONNECT` requests and responses. Because of that, the tree name cannot be determined for a given treeId if the corresponding `SMB2_TREE_CONNECT` request and response were not captured. When this is the case and the corresponding SMB message accesses a file which will be created as an SMB server file by pcapFS, we cannot determine to which tree it belongs, i.e., in which derived directory tree to put it. This results in the creation of a derived tree with the generic tree name "treeid_x", with x being the treeId number. The respective accessed file is then inserted there. In a scenario with accesses to the same SMB server over multiple TCP connections at the same time, this approach can lead to redundancies, with the same files being saved in different derived directory trees (with different tree names according to their treeIds).
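The per-connection treeId-treename mapping with its generic fallback name might look roughly like this. The struct below is a simplified stand-in, not the real `smbContext` (which holds more state), and the method names are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Simplified per-connection state (hypothetical sketch of the tree mapping only).
struct SmbContextSketch {
    std::unordered_map<uint32_t, std::string> treeNames;  // treeId -> tree name

    // Called once an SMB2_TREE_CONNECT request/response pair has been seen.
    void rememberTree(uint32_t treeId, const std::string& treeName) {
        treeNames[treeId] = treeName;
    }

    // Returns the known tree name, or the generic placeholder name if the
    // tree connect exchange for this treeId was not captured.
    std::string treeNameFor(uint32_t treeId) const {
        const auto it = treeNames.find(treeId);
        if (it != treeNames.end()) return it->second;
        return "treeid_" + std::to_string(treeId);
    }
};
```

Because one such object exists per connection, two connections can map the same treeId to different names without interfering with each other.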
The SMB manager is, besides managing SMB server files, also responsible for managing two mappings which need to be maintained globally because they pertain to all connections to the same SMB server or, to be more precise, all connections to the same tree of the same SMB server. Such a tree is identified by the `ServerEndpointTree` struct, which consists of the SMB server's IP address and port as well as the respective tree name. Per `ServerEndpointTree`, the following mappings are memorized by the SMB manager:
- fileId-filename mapping: Each server-side file is addressed using its fileId instead of its file name. The usual procedure for a client to interact with a server-side file is that the file is opened first. For that, the client sends an `SMB2_CREATE` request containing the file name. The server responds with a newly created fileId (file handle) for the file, which is valid until the file is closed with an `SMB2_CLOSE` message. Having the fileId, the client can do whatever it wants with the file (obtain metadata, read, write, ...). The fileId-filename mapping is memorized globally because the same fileId can be used over different connections to the same `ServerEndpointTree`, i.e., it is possible that in one connection the fileId is obtained by an `SMB2_CREATE` response, and in another simultaneous connection to the same tree this fileId is used. For pcapFS, this behavior may lead to undetected SMB server files in the following scenario: Assume we have two SMB connections to the same SMB server at the same time, and the fileId is obtained by an `SMB2_CREATE` response in the connection which started later. When the other connection, which started earlier, then uses this file handle, pcapFS does not know to which file name it belongs. This is because pcapFS does not parse the whole capture file chronologically but connection by connection. Thus, the SMB connection which uses the fileId is parsed before the SMB connection from which the corresponding fileId-filename mapping can be derived.
- filename-FilePtr mapping: This mapping manages the derived server-side files (SMB server files). The FilePtr is a pointer to a virtual `SMBServerFile` (this can also be a directory) which later becomes a real file in the resulting server-side directory hierarchy derived by pcapFS. More on that in the subsequent section.
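The two global mappings, keyed by server endpoint and tree, can be pictured with the following sketch. All type names carry a `Sketch` suffix to mark them as hypothetical; the real `ServerEndpointTree`, `SMBServerFile` and SMB manager classes differ in detail, and the fileId is represented here simply as a hex string.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Identifies one tree of one SMB server (IP address, port, tree name).
struct ServerEndpointTreeSketch {
    std::string serverIp;
    uint16_t serverPort;
    std::string treeName;
    bool operator<(const ServerEndpointTreeSketch& o) const {
        return std::tie(serverIp, serverPort, treeName) <
               std::tie(o.serverIp, o.serverPort, o.treeName);
    }
};

struct ServerFileSketch {
    std::string name;
};
using FilePtrSketch = std::shared_ptr<ServerFileSketch>;

// The two mappings kept per server endpoint tree.
struct TreeStateSketch {
    std::map<std::string, std::string> fileIdToName;  // fileId -> file name
    std::map<std::string, FilePtrSketch> nameToFile;  // file name -> server file
};

class SmbManagerSketch {
public:
    // Records the mapping learned from an SMB2_CREATE response and makes
    // sure a server file exists for the file name.
    void onCreateResponse(const ServerEndpointTreeSketch& tree,
                          const std::string& fileId, const std::string& fileName) {
        TreeStateSketch& state = trees_[tree];
        state.fileIdToName[fileId] = fileName;
        FilePtrSketch& file = state.nameToFile[fileName];
        if (!file) {
            file = std::make_shared<ServerFileSketch>();
            file->name = fileName;
        }
    }

    // Resolves a fileId seen on any connection to the given tree.
    // Returns nullptr if the fileId is (not yet) known for that tree.
    FilePtrSketch lookup(const ServerEndpointTreeSketch& tree,
                         const std::string& fileId) const {
        const auto t = trees_.find(tree);
        if (t == trees_.end()) return nullptr;
        const auto id = t->second.fileIdToName.find(fileId);
        if (id == t->second.fileIdToName.end()) return nullptr;
        const auto f = t->second.nameToFile.find(id->second);
        if (f == t->second.nameToFile.end()) return nullptr;
        return f->second;
    }

private:
    std::map<ServerEndpointTreeSketch, TreeStateSketch> trees_;
};
```

The `nullptr` result of `lookup` corresponds to the scenario described above, where a connection using a fileId is parsed before the connection that established the mapping.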
Now that we roughly know how pcapFS handles SMB parsing and which information needs to be memorized at which abstraction layer, it remains to be explained how SMB server files are extracted and how the server-side directory hierarchies are derived. All of this is done by the SMB manager. Currently, SMB server files are created/updated via three different SMB message types: `SMB2_CREATE` response, `SMB2_QUERY_INFO` response and `SMB2_QUERY_DIRECTORY` response. When one of these message types is detected, the SMB manager takes over right after the message is parsed. Depending on the file metadata contained in the message, different file properties can be set or updated. When the SMB manager encounters a message regarding a file which was previously unknown for the respective tree, a new `SMBServerFile` is created and its metadata is set according to the information contained in the respective SMB message. `SMBServerFile` is a specialized `ServerFile` for SMB. A `ServerFile` differs from other virtual files in that it contains more timestamps and a pointer to its parent directory, which is also a `ServerFile`. So, starting from an `SMBServerFile`, a cascade of parent directory pointers can be followed until the root directory of the corresponding tree (whose name is obtained from the treeId-treename mapping) is reached. This allows pcapFS to easily build up the respective directory hierarchy at the mount point. For each newly created `SMBServerFile`, its parent directories can be determined because the file name of the `SMBServerFile` (as derived from the fileId-filename mapping) always includes its absolute path, beginning with the first subdirectory of the corresponding tree (there is one exception when handling `SMB2_QUERY_DIRECTORY` responses, see below for further info).
Let's look closer at what pcapFS does for each of the three mentioned message types.
For an `SMB2_CREATE` response, the fileId-filename mapping of the file requested via the `SMB2_CREATE` message is first updated for the tree it belongs to. When the file is not yet present as an SMB server file in the filename-FilePtr mapping, a new `SMBServerFile` is created and its metadata is initialized with the file information contained in the `SMB2_CREATE` response (namely timestamps, file size and the information whether it is a directory or not). In order to set the pointer to the file's parent directory, pcapFS iterates backwards through the file's absolute path and recursively creates SMB server files for all parent directories which are not yet represented by an `SMBServerFile` instance. If the file requested via the `SMB2_CREATE` message is already present as an SMB server file in the filename-FilePtr mapping, its metadata is updated if the file's `lastChangeTime` contained in the `SMB2_CREATE` response's message body is newer.
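The handling described here, creating missing parent directories by walking the path backwards and updating metadata only when the change time is newer, can be sketched as follows. The node type and function names are hypothetical and much simpler than the real `SMBServerFile`; timestamps are plain integers, and a node without parent is assumed to sit directly below the tree's root directory.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

struct ServerFileNode {
    std::string path;                        // path relative to the tree root
    bool isDirectory = false;
    uint64_t lastChangeTime = 0;
    uint64_t size = 0;
    std::shared_ptr<ServerFileNode> parent;  // nullptr: directly below the tree root
};
using NodePtr = std::shared_ptr<ServerFileNode>;
using NodeMap = std::map<std::string, NodePtr>;  // stands in for filename-FilePtr

// Returns the node for `path`, creating it and all missing parent directories.
inline NodePtr getOrCreate(NodeMap& files, const std::string& path, bool isDirectory) {
    const auto it = files.find(path);
    if (it != files.end()) return it->second;
    auto node = std::make_shared<ServerFileNode>();
    node->path = path;
    node->isDirectory = isDirectory;
    const auto sep = path.rfind('\\');  // SMB paths use backslashes
    if (sep != std::string::npos) {
        // Everything before the last separator is the parent directory.
        node->parent = getOrCreate(files, path.substr(0, sep), true);
    }
    files[path] = node;
    return node;
}

// Applies the metadata of a create response: always for a new file,
// otherwise only if the reported change time is newer.
inline void applyCreateResponse(NodeMap& files, const std::string& path, bool isDirectory,
                                uint64_t lastChangeTime, uint64_t size) {
    const bool known = files.count(path) != 0;
    const NodePtr node = getOrCreate(files, path, isDirectory);
    if (!known || lastChangeTime > node->lastChangeTime) {
        node->isDirectory = isDirectory;
        node->lastChangeTime = lastChangeTime;
        node->size = size;
    }
}
```

A single response for a deeply nested file thus also yields nodes for all of its parent directories, which initially carry no metadata of their own.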
In contrast to the other two message types, `SMB2_QUERY_DIRECTORY` responses can contain information for multiple files, i.e., all files in the current directory which match the search pattern specified in the `SMB2_QUERY_DIRECTORY` request. The file info classes which are relevant for pcapFS are:
- `FileDirectoryInformation`,
- `FileFullDirectoryInformation`,
- `FileIdFullDirectoryInformation`,
- `FileBothDirectoryInformation`,
- `FileIdBothDirectoryInformation`,
- `FileIdExtdDirectoryInformation`.
They contain all timestamps, the file name and the relevant file attributes for every matching file of the requested directory. The directory is addressed using its fileId and, for creating/updating the SMB server files corresponding to the files listed in the `SMB2_QUERY_DIRECTORY` response, we need to know the directory name (its absolute path). This is no issue when the fileId-filename mapping for that directory is already known. However, pcapFS also needs to handle the case that the name for the fileId is not known. This case should not occur very often because, before an `SMB2_QUERY_DIRECTORY` request, the corresponding directory has to be accessed via `SMB2_CREATE` and, with every `SMB2_CREATE` response, the respective fileId-filename mapping for that directory gets memorized. But it can happen that the `SMB2_CREATE` request and the `SMB2_QUERY_DIRECTORY` request for the same directory are chained together. Then, the client specifies the fileId `fffff...f` in the `SMB2_QUERY_DIRECTORY` request, indicating that it refers to the directory accessed via the `SMB2_CREATE` request right before. Looking only at the `SMB2_QUERY_DIRECTORY` messages, pcapFS then does not know the name of the requested directory (since the fileId `fffff...f` doesn't resolve to a known file name). Thus, pcapFS takes the name specified in the last `SMB2_CREATE` request as the directory name. With that, pcapFS is able to assemble the absolute path for every file listed in the `SMB2_QUERY_DIRECTORY` response and can create/update the corresponding `SMBServerFile` instances. If the name specified in the last `SMB2_CREATE` request is not available (i.e., it is empty), pcapFS puts the respective SMB server files into the root directory of the current tree. This is done because SMB commonly uses `SMB2_CREATE` messages with an empty file name to refer to the root directory. In this case, pcapFS puts the SMB server files into the correct directory. But this may also be wrong, e.g., when the `SMB2_CREATE` messages regarding the directory requested via the `SMB2_QUERY_DIRECTORY` messages are not captured and the directory is not the root directory, or when the `SMB2_CREATE` messages are transmitted in a later-parsed simultaneous connection to the same tree (this corresponds to the issue mentioned above when explaining the fileId-filename mapping). In these cases, SMB server files can mistakenly be put into the tree's root directory.
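The fallback order for finding the directory name can be summarized in a few lines. This is a sketch under simplifying assumptions: the function names are hypothetical, fileIds are plain strings, and an empty directory name stands for the root directory of the current tree.

```cpp
#include <map>
#include <string>

// Resolves the directory a query-directory response refers to:
// 1. use the fileId-filename mapping if the fileId is known,
// 2. otherwise fall back to the name from the last create request,
// 3. which, if empty, means the root directory of the current tree.
inline std::string resolveDirectoryName(const std::map<std::string, std::string>& fileIdToName,
                                        const std::string& fileId,
                                        const std::string& lastCreateName) {
    const auto it = fileIdToName.find(fileId);
    if (it != fileIdToName.end()) return it->second;
    return lastCreateName;  // may be empty, i.e. the tree's root directory
}

// Builds the tree-relative path of one entry listed in the response.
inline std::string entryPath(const std::string& directory, const std::string& entryName) {
    return directory.empty() ? entryName : directory + "\\" + entryName;
}
```

The third step is exactly where the misplacement described above can happen: an empty name cannot be told apart from a directory whose create messages were simply never seen.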
The detection of an `SMB2_QUERY_INFO` response initiates SMB server file creations/changes only if the underlying query info type is `SMB2_0_INFO_FILE` and the file info class is `FileAllInformation`, `FileBasicInformation` or `FileNetworkOpenInformation`. All other info types/classes don't contain (enough of) the needed file information. Like `SMB2_QUERY_DIRECTORY` messages, `SMB2_QUERY_INFO` messages address files via their fileIds. Hence, the fileId-filename mapping for the requested file needs to be known in advance for all file info classes except `FileAllInformation`, which also contains the file name. So, when the fileId-filename mapping is not known for the requested file's fileId and the info class is `FileAllInformation`, pcapFS is still able to establish that mapping through the file name contained in `FileAllInformation`. When the SMB manager handles `SMB2_QUERY_INFO` responses, the same issue as for `SMB2_QUERY_DIRECTORY` responses can occur: when the name for the fileId is not known, because the `SMB2_CREATE` messages corresponding to that file are not captured or are transferred in another SMB connection which is parsed later, and the file name corresponding to the last `SMB2_CREATE` of the current connection is empty, the file is assumed to be in the tree's root directory. This is correct in many cases, but not in all.
Now that we know the absolute path for every SMB server file through the cascade of parent directory pointers, it is pretty easy to derive the resulting directory hierarchies for all SMB server endpoint trees and incorporate them into the directory layout. For that, pcapFS flips the cascade of parent directory pointers for every SMB server file and then inserts each resulting tree at the mount point's subdirectory where the underlying SMB connection satisfies the property corresponding to the directory.
PcapFS typically writes all important metadata about each derived virtual file (especially the offsets into and the identifier of the underlying (virtual) file) into an index file. This has the advantage that, the next time pcapFS is executed with the same capture file(s), the index file can be passed to pcapFS so that the capture file(s) don't need to be parsed again. Instead, the virtual directory hierarchy can be constructed directly from the information saved in the index file.
In order to reconstruct SMB server files from an index file, more information needs to be saved than for other virtual files. The additional timestamps and the information whether the SMB server file is a directory can be put directly into the index file. For memorizing a reference to the parent directory, which is also an SMB server file, each parent directory has a unique representative ID. Instead of a FilePtr object, which cannot be stored in the index file, the parent directory is saved as its ID. This way, the parent directory file pointer can be reconstructed correctly for each SMB server file after all virtual files have been created from the index file.
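Restoring parent pointers from saved IDs naturally takes two passes, since a file may appear in the index before its parent directory. The sketch below illustrates that idea only; the record layout, the names and the use of 0 as "no parent" are assumptions for this example and do not reflect the actual index file format.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One server file entry as it might be read from an index (hypothetical layout).
struct IndexRecord {
    uint64_t id;
    uint64_t parentId;  // 0 is assumed to mean "no parent" in this sketch
    std::string name;
    bool isDirectory;
};

struct RestoredFile {
    std::string name;
    bool isDirectory = false;
    std::shared_ptr<RestoredFile> parent;
};
using RestoredMap = std::map<uint64_t, std::shared_ptr<RestoredFile>>;

inline RestoredMap restoreFromIndex(const std::vector<IndexRecord>& records) {
    RestoredMap byId;
    // Pass 1: create all files, so that every ID can be resolved afterwards.
    for (const IndexRecord& r : records) {
        auto file = std::make_shared<RestoredFile>();
        file->name = r.name;
        file->isDirectory = r.isDirectory;
        byId[r.id] = file;
    }
    // Pass 2: turn the saved parent IDs back into pointers.
    for (const IndexRecord& r : records) {
        if (r.parentId == 0) continue;
        const auto parent = byId.find(r.parentId);
        if (parent != byId.end()) byId[r.id]->parent = parent->second;
    }
    return byId;
}
```

Linking in a second pass keeps the result independent of the order in which the entries are stored.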
The current approach has the following known limitations:
- When SMB server files are located in a tree and the `SMB2_TREE_CONNECT` request/response for that tree is not captured, the respective tree name (i.e., the tree's root directory name) is set to "treeId_x", where x is the treeId number. For multiple captured connections to the same tree, this can lead to redundancies, with the same files being saved in different derived directory trees (with different tree names according to their treeIds). It may also be possible that, among two connections to the same SMB server, the same treeId is used for different trees. Then, when the corresponding `SMB2_TREE_CONNECT` requests/responses of both connections are not captured either, the respective SMB server files of the different trees are merged into the same derived tree directory hierarchy if both connections are in the same property directory. (This is, e.g., the case when the complete virtual directory built by pcapFS is sorted by the property "protocol". Then both connections land in the same directory "smb". When, instead, the sortby argument distinguishes different TCP connections, e.g., through the argument "srcPort", the virtual files corresponding to the SMB connections are saved in different directories.)
- The order in which fileId-filename mappings are derived can be relevant. Assume we have two SMB connections to the same SMB server at the same time, and the fileId is obtained by an `SMB2_CREATE` response in the connection which started later. When the other connection, which started earlier, then uses this file handle, pcapFS does not know to which file name it belongs. This is because pcapFS does not parse the whole capture file chronologically but connection by connection. Thus, the SMB connection which uses the fileId is parsed before the SMB connection from which the corresponding fileId-filename mapping can be derived. This can lead to the case that the respective SMB server file is not created.
- Similarly to the previous point, some metadata saved in SMB control files may not be completely reconstructable when reading from an index file in the following scenario: Once again, consider two SMB connections to the same server at the same time. The connection which starts earlier obtains the fileId for a file via `SMB2_CREATE` messages, and the other connection accesses the respective file. When pcapFS is executed with the respective capture file for the first time, the connection from which the fileId-filename mapping is extracted is parsed before the connection where the file is accessed, so everything is fine. But when pcapFS is executed again with the capture file and the index file is passed, it can happen that the content of the SMB control file for the second connection (where the file is accessed) is read before the content of the SMB control file for the first connection. Then, like above, the fileId-filename mapping is derived too late, and in the SMB control file for the connection which accesses the affected file, the extracted message corresponding to the file access doesn't contain the information which file it refers to. On the other hand, the problem from above, that the corresponding SMB server file might not be created, doesn't occur here since all necessary metadata of the respective SMB server file is saved in the index file.
- When an `SMB2_QUERY_DIRECTORY`/`SMB2_QUERY_INFO` message refers to a fileId whose corresponding file name is not known and the name specified in the last `SMB2_CREATE` request is not available (i.e., it is empty), pcapFS puts the SMB server file(s) derived from the `SMB2_QUERY_DIRECTORY`/`SMB2_QUERY_INFO` response into the root directory of the current tree. This is done because SMB commonly uses `SMB2_CREATE` messages with an empty file name field to refer to the root directory. In that case, pcapFS puts the SMB server files into the correct directory. But this may also be wrong, e.g., when the `SMB2_CREATE` messages regarding the file/directory requested via the `SMB2_QUERY_DIRECTORY`/`SMB2_QUERY_INFO` messages are not captured and the directory is not the root directory, or when the `SMB2_CREATE` messages are transmitted in a different, later-parsed connection to the same tree.
Possible future work:
- Parse `SMB2_READ` and `SMB2_WRITE` messages so that file content can be written into the corresponding SMB server files.
- Create multiple versions of SMB server files, one for each point in time where the file content/metadata is changed.
- Consider the case that SMB server files are renamed via `SMB2_SET_INFO` messages.
- Investigate ASYNC SMB2 packet headers and find out whether they might disturb the way SMB packets are parsed and SMB server files are managed.