You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Closes#200
## Rationale
In PHP zend_strings are binary strings with no encoding information. They can contain any byte at any position.
The current implementation use `CString` to transfer zend_strings between Rust and PHP, which prevents zend_strings containing null-bytes to roundtrip through the ffi layer. Moreover, `ZendStr::new()` accepts only a `&str`, which is incorrect since a zend_string is not required to be valid UTF8.
When reading the PHP source code, it is apparent that most functions marked with `ZEND_API` that accept a `const *char` are convenience wrappers that convert the `const *char` to a zend_string and delegate to another function. For example [zend_throw_exception()](https://github.com/php/php-src/blob/eb83e0206c9b9261d786943bf2c5ad61dca287e2/Zend/zend_exceptions.c#L823) takes a `const *char message`, and just converts it to a zend_string before delegating to [zend_throw_exception_zstr()](https://github.com/php/php-src/blob/eb83e0206c9b9261d786943bf2c5ad61dca287e2/Zend/zend_exceptions.c#L795).
I kept this PR focused around `ZendStr` and it's usages in the library, but it should be seen as the first step of a more global effort to remove usages of `CString` everywhere possible.
Also, I didn't change the return type of the string related methods of `Zval` (e.g. I could have made `Zval::set_string()`
accept an `impl AsRef<[u8]>` instead of `&str` and return `()` instead of `Result<()>`). If I get feedback that it should be done in this PR, I'll do it.
## Summary of the changes:
### ZendStr
* [BC break]: `ZendStr::new()` and `ZendStr::new_interned()` now accept an `impl AsRef<[u8]>` instead of just `&str`, and are therefore infaillible (outside of the cases where we panic, e.g. when allocation fails). This is a BC break, but it's impact shouldn't be huge (users will most likely just have to remove a bunch of `?` or add a few `Ok()`).
* [BC break]: Conversely, `ZendStr::as_c_str()` now returns a `Result<&CStr>` since it can fail on strings containing null bytes.
* [BC break]: `ZensStr::as_str()` now returns a `Result<&str>` instead of an `Option<&str>` since we have to return an error in case of invalid UTF8.
* adds method `ZendStr::as_bytes()` to return the underlying byte slice.
* adds convenience methods `ZendStr::as_ptr()` and `ZendStr::as_mut_ptr()` to return raw pointers to the zend_string.
### ZendStr conversion traits
* adds `impl AsRef<[u8]> for ZendStr`
* [BC break]: replaces `impl TryFrom<String> for ZBox<ZendStr>` by `impl From<String> for ZBox<ZendStr>`.
* [BC break]: replaces `impl TryFrom<&str> for ZBox<ZendStr>` by `impl From<&str> for ZBox<ZendStr>`.
* [BC break]: replaces `impl From<&ZendStr> for &CStr` by `impl TryFrom<&ZendStr> for &CStr`.
### Error
* adds new enum member `Error::InvalidUtf8` used when converting a `ZendStr` to `String` or `&str`
0 commit comments