Skip to content

Owning reference to a string with O(1) access to lines #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
# Changelog

## Version 0.1.4 (2020-03-04)

- Added `CachedLines`

## Version 0.1.3 (2021-08-25)

- Added default `alloc` feature, and made the crate `#![no_std]`
Expand Down
2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "line-span"
version = "0.1.3"
version = "0.1.4"
authors = ["Christian Vallentin"]
edition = "2018"
description = "Find line ranges and jump between next and previous lines"
Expand Down
121 changes: 120 additions & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,18 @@
//! // Returns `None` as `substring` is not a part of `string2`
//! assert_eq!(str_to_range(string2, substring), None);
//! ```
//!
//! # Cached line access
//!
//! Use [`CachedLines`] to get O(1) access to a contained String
//!
//!
//! ```
//! use line_span::CachedLines;
//! let cache = CachedLines::with_ending(String::from("hello\nworld"));
//! assert_eq!(&cache[0], "hello\n");
//! assert_eq!(&cache[1], "world");
//! ```

#![forbid(unsafe_code)]
#![deny(missing_docs)]
Expand All @@ -131,7 +143,7 @@ extern crate alloc;

use core::fmt;
use core::iter::FusedIterator;
use core::ops::{Deref, Range};
use core::ops::{Deref, Index, Range};
use core::str::Lines;

/// Find the start (byte index) of the line, which `index` is within.
Expand Down Expand Up @@ -848,10 +860,117 @@ impl LineSpans for alloc::string::String {
}
}

#[derive(Debug, Clone)]
/// Owning pointer to a string that provides O(1) access to line slices
///
/// ```
/// use line_span::CachedLines;
/// let cache = CachedLines::without_ending(String::from("hello\nworld"));
/// assert_eq!(&cache[0], "hello");
/// assert_eq!(&cache[1], "world");
/// ```
///
/// Slices might or might not include line breaks depending on a function used to construct it
///
/// - [`without_ending`](CachedLines::without_ending)
/// - [`with_ending`](CachedLines::with_ending)
///
///
#[cfg(feature = "alloc")]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be easier to avoid needing all the #[cfg(feature = "alloc")], by having a:

#[cfg(feature = "alloc")]
mod cached;

and then adding this implementation to a cached.rs module.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay either way, was mostly trying to keep things in one file, second alternative is

mod cached {
    ...
}

Will do.

pub struct CachedLines {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, maybe it should be called LineSpanCache/CachedLineSpans. If it's changed to collect LineSpan

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Names are hard, sure.

Comment on lines +863 to +880
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the attribute, to after the comment

#[derive(Clone, Debug)]
pub struct CachedLines {

Comment on lines +863 to +880
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also feel like this should probably be feature gated, with a cache = [] feature

content: alloc::string::String,
Comment on lines +880 to +881
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this can use alloc::borrow::Cow instead to allow for borrowing a &str instead of needing to turn it into an owned String

(Not sure if it creates other issues)

Suggested change
pub struct CachedLines {
content: alloc::string::String,
pub struct CachedLines<'a> {
content: alloc::borrow::Cow<'a>,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay either way, but Cow adds complexity without adding significant benefits - CachedLines will have to allocate a vector of sizes regardless, adding a single allocation seems to be fine compared to the alternative of allocating and storing a whole bunch of lines.

splits: alloc::vec::Vec<Range<usize>>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead we could have a:

pub struct LineSpanWithoutText {
    start: usize,
    end: usize,
    ending: usize,
}

and then:

Suggested change
splits: alloc::vec::Vec<Range<usize>>,
splits: alloc::vec::Vec<LineSpanWithoutText>,

That way we could replace with_ending() and without_ending() with new()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while but as far as I remember the reasoning was that you are very unlikely to use both representations at once and having just one representation gives Index and iterator goodies.

}

#[cfg(feature = "alloc")]
impl CachedLines {
/// Splits a [`String`](alloc::string::String) into lines, slices will include linebreaks
///
/// ```
/// use line_span::CachedLines;
/// let cache = CachedLines::with_ending(String::from("hello\nworld"));
/// assert_eq!(&cache[0], "hello\n");
/// assert_eq!(&cache[1], "world");
/// ```
pub fn with_ending(content: alloc::string::String) -> Self {
let splits = content
.line_spans()
.map(|s| s.range_with_ending())
.collect::<alloc::vec::Vec<_>>();
Self { splits, content }
}

/// Splits a [`String`](alloc::string::String) into lines, slices will not include linebreaks
///
/// ```
/// use line_span::CachedLines;
/// let cache = CachedLines::without_ending(String::from("hello\nworld"));
/// assert_eq!(&cache[0], "hello");
/// assert_eq!(&cache[1], "world");
/// ```
pub fn without_ending(content: alloc::string::String) -> Self {
let splits = content
.line_spans()
.map(|s| s.range())
.collect::<alloc::vec::Vec<_>>();
Self { splits, content }
}

/// Returns a reference to a line inside a contained string
///
/// Returns [`None`] if index is out of bounds
/// ```
/// use line_span::CachedLines;
/// let cache = CachedLines::with_ending(String::from("hello\nworld"));
/// assert_eq!(cache.get(0), Some("hello\n"));
/// assert_eq!(cache.get(3), None);
/// ```
pub fn get(&self, index: usize) -> Option<&str> {
self.content.get(self.splits.get(index)?.clone())
Comment on lines +928 to +929
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in extension of the previous comment)

Then have this be something like:

Suggested change
pub fn get(&self, index: usize) -> Option<&str> {
self.content.get(self.splits.get(index)?.clone())
pub fn get(&self, index: usize) -> Option<LineSpan<'_>> {
// i.e. from splits get LineSpanWithoutText + self.contents => LineSpan
...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, for get this works.

}
}

#[cfg(feature = "alloc")]
impl Index<usize> for CachedLines {
type Output = str;

fn index(&self, index: usize) -> &Self::Output {
&self.content[self.splits[index].clone()]
Comment on lines +935 to +938
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I was about to suggest calling .get() and returning the LineSpan, but this needs to return a reference

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index needs to return a reference to so to accommodate both with and without ending - it needs to be decided during construction. Either two different constructors - like I did or using either newtype/type parameter.

}
}

#[cfg(test)]
mod tests {
use super::*;

#[test]
fn test_cached_lines_no_endings() {
let text = "\r\nfoo\nbar\r\nbaz\nqux\r\n\r";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that Rust 1.70 changed the behavior, so trailing \r will appear in the final line. (Just in case this test suddenly)

PR: rust-lang/rust#100311

let c = CachedLines::without_ending(alloc::string::String::from(text));

assert_eq!("", &c[0]);
assert_eq!("foo", &c[1]);
assert_eq!("bar", &c[2]);
assert_eq!("baz", &c[3]);
assert_eq!("qux", &c[4]);
assert_eq!("", &c[5]);
assert_eq!(None, c.get(6));
}

#[test]
fn test_cached_lines() {
let text = "\r\nfoo\nbar\r\nbaz\nqux\r\n\r";
let c = CachedLines::with_ending(alloc::string::String::from(text));

assert_eq!("\r\n", &c[0]);
assert_eq!("foo\n", &c[1]);
assert_eq!("bar\r\n", &c[2]);
assert_eq!("baz\n", &c[3]);
assert_eq!("qux\r\n", &c[4]);
assert_eq!(Some("\r"), c.get(5));
assert_eq!(None, c.get(6));
}

#[test]
fn test_line_spans() {
let text = "\r\nfoo\nbar\r\nbaz\nqux\r\n\r";
Expand Down