-
Notifications
You must be signed in to change notification settings - Fork 7.7k
LwM2M string resource is not a zero terminated C-string according to the LwM2M specification #90719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
acc6a5c
to
f27a77d
Compare
It's against LwM2M specification which does not specify zero-termination for a string resource. It's not required and the current implementation makes a terminating UTF-8 multibyte character invalid and modifies any string which use up the entire assigned buffer without any notice. It may also break compatibility with other LwM2M implementations. This commit fixes `lwm2m_get_string()` and `lwm2m_get_opaque()` implementation which could not return the copied data length. So, the API of these functions breaks compatiblity. However, `lwm2m_get_opaque` had no useful purpose without the information about the copied size. `lwm2m_get_string` still adds zero-termination but requires a buffer that can take the zero-termination character. To fix the implementation, `lwm2m_engine_get()` was modified in way it returns the copied data length. So, a return code >= 0 is okay now. While editing `lwm2m_engine_get()` some "side quests" arise. - `memcpy` works for everything except a LwM2M Time resource. So, a lot of code was dropped. - reading from a null pointer can return `success`. - `read_cb()` unittest implementation returns an error (null pointer). Signed-off-by: Stefan Schwendeler <Stefan.Schwendeler@husqvarnagroup.com>
A read callback can return a null pointer to inidicate an error. `lwm_engine_get()` will then return -ENOENT as response. Signed-off-by: Stefan Schwendeler <Stefan.Schwendeler@husqvarnagroup.com>
Verifies that the memory copy works as expected - return codes are correct - resource length parameter works Signed-off-by: Stefan Schwendeler <Stefan.Schwendeler@husqvarnagroup.com>
f27a77d
to
88c70cf
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable I think. Thanks for adding extra tests!
data_ptr = res->read_cb(obj_inst->obj_inst_id, res->res_id, res_inst->res_inst_id, | ||
&data_len); | ||
} | ||
|
||
if (data_ptr && data_len > 0) { | ||
if (data_ptr == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the change makes sense, IMHO it'd be better to put it in a separate commit for clarity, the fix isn't really related to the string representation cleanup.
int lwm2m_get_opaque(const struct lwm2m_obj_path *path, void *buf, uint16_t buflen, | ||
uint16_t *reslen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of weird we had function like this, how in the world would the user know how large the actual data content is in the buffer.
But anayway, it's still an API change - LwM2M API is unstable, but it should be covered in the migration guide for 4.3, please add an entry, can be in a separate commit.
I fail to see how you can produce truncated stings into LwM2M engine.
|
By the way, referring to the LwM2M specification on this case does not apply. LwM2M specifies the network protocol, nothing about the internal representation in the engine or the C API of the client. So if something is not encoded correctly according to LwM2M spec, the content-type encoder/decoder needs to be fixed. We should not touch the API because of that. |
I run into an issues with the current implementation of LwM2M String resource.
A string resource may be truncated with a zero-terminator when the buffer was used up entirely. In case of a UTF-8 multi-byte character as final symbol the result is an invalid UTF-8 string.
from
lwm2m_engine_set(...)
From the git history, the zero termination was always done in one or another way.
However, that is not in line with LwM2M specification which does specify that it is an UTF-8 string. In UTF-8,
nul
is a character like many other.Many people, which are familiar with the C programming language, think that zero terminated strings are common sense. In reality, it is a concept of the the C programming language.
So, the current implementation which enforces zero-termination may cause compatibility issues with other LwM2M implementations. At least, any unit test should fail that do not
get
what he wasset
. That's the case with the current implementation.My solution proposal
Since the
datalen
field is introduced, a String resource can be handled like Opaque. So, don't add or remove any character in the resource itself.lwm2m_get_string()
andlwm2m_set_string()
can do the C string <-> LwM2M string conversion by adding and removing the C string zero terminator.Actually,
lwm2m_get_string()
andlwm2m_get_opaque()
functions lack also the information about the actual resource length. So, I added a parameter to get the information about the resource size.So, the API of these functions breaks compatibility. Anyway,
lwm2m_get_opaque()
had no useful purpose without that information.lwm2m_get_string()
adds now the C zero-termination but requires a buffer that cantake that additional character.
The first commit is too big. I cleaned some code that hurts my eyes. It was not really necessary.
However, any thoughts about the basic topic?