
Siddhram

This PR adds a documented R implementation of the Count Segments problem, which counts the words (segments) in a string by detecting word boundaries, together with a test suite.

Overview

The algorithm counts the number of segments (words) in a given string by detecting word boundaries.
A segment is defined as a sequence of non-space characters, and segments are separated by one or more spaces.
Every segment begins either at a non-space first character or at a transition from a space to a non-space character, so counting these starting points gives the number of segments.
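The boundary rule above can be sketched in R as follows. This is a minimal illustration, not the PR's code: the name `count_segments_sketch()`, the use of `strsplit()` to split the string into characters, and the input check are assumptions. Only the space character `" "` counts as a separator, matching the definition above.

```r
# Hypothetical sketch of boundary-based segment counting (not the PR's code).
count_segments_sketch <- function(s) {
  # Assumed validation: a single, non-NA character string
  if (!is.character(s) || length(s) != 1 || is.na(s)) {
    stop("`s` must be a single non-NA character string")
  }
  if (!nzchar(s)) return(0L)      # empty string: no segments
  chars <- strsplit(s, "")[[1]]   # split into individual characters
  count <- 0L
  for (i in seq_along(chars)) {
    # A segment starts at a non-space character that is either the first
    # character or directly preceded by a space
    if (chars[i] != " " && (i == 1 || chars[i - 1] == " ")) {
      count <- count + 1L
    }
  }
  count
}

count_segments_sketch("Hello, my name is John")    # 5
count_segments_sketch("   leading and trailing  ") # 3
count_segments_sketch("")                          # 0
```

Because `||` short-circuits, `chars[i - 1]` is never evaluated when `i == 1`, so the first character needs no special case beyond the condition itself.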


Features

  • Single-pass O(n) boundary detection algorithm
    • Iterates through each character once
    • Increments the count when the current character is not ' ' AND it is either the first character (i == 1) or the previous character is ' '
  • Handles leading, trailing, and multiple consecutive spaces correctly
  • Multiple implementation variants:
    • count_segments() – Primary boundary detection method
    • count_segments_regex() – Regex-based alternative for validation
    • count_segments_vectorized() – Vectorized version for multiple string inputs
  • Robust input validation and error handling
  • Comprehensive test suite with edge cases
  • roxygen2 documentation and code structure consistent with the other string_manipulation scripts
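The two alternative variants might look roughly like the sketch below. The names `count_segments_regex_sketch()` and `count_segments_vectorized_sketch()` are hypothetical stand-ins for `count_segments_regex()` and `count_segments_vectorized()`, and the approach is assumed: the regex version counts maximal runs of non-space characters with `gregexpr()`, while the vectorized version applies the same boundary rule with whole-vector comparisons on each string of a character vector. The PR's actual signatures and validation may differ.

```r
# Hypothetical regex variant: count maximal runs of non-space characters.
count_segments_regex_sketch <- function(s) {
  stopifnot(is.character(s), length(s) == 1, !is.na(s))
  matches <- gregexpr("[^ ]+", s)[[1]]
  # gregexpr() reports "no match" as a single -1
  if (matches[1] == -1) 0L else length(matches)
}

# Hypothetical vectorized variant: one count per element of `x`.
count_segments_vectorized_sketch <- function(x) {
  stopifnot(is.character(x), !anyNA(x))
  vapply(x, function(s) {
    if (!nzchar(s)) return(0L)               # empty string: no segments
    is_space <- strsplit(s, "")[[1]] == " "
    # Previous-character flags, with TRUE standing in for "start of string"
    prev_is_space <- c(TRUE, head(is_space, -1))
    sum(!is_space & prev_is_space)           # count segment starts
  }, integer(1), USE.NAMES = FALSE)
}

inputs <- c("Hello, my name is John", "", "   ", "  a  b  ", "x")
count_segments_vectorized_sketch(inputs)  # 5 0 0 2 1

# The regex variant shares no logic with the boundary rule,
# so agreement between the two is a cheap consistency check
stopifnot(all(count_segments_vectorized_sketch(inputs) ==
              vapply(inputs, count_segments_regex_sketch, integer(1))))
```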

Complexity

  • Time Complexity: O(n), since each character is processed exactly once
  • Space Complexity: O(1) extra state for the scan itself (a counter), plus O(n) if the string is first split into a vector of individual characters, as R code commonly does with strsplit()
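The gap between O(1) and O(n) space comes down to whether the string is split up front. As a hypothetical sketch (not the PR's code; the name `count_segments_streaming()` is illustrative and input validation is omitted), the same rule can read one character at a time with `substr()`, keeping only a counter and a flag. The trade-off is speed: for non-ASCII UTF-8 strings each `substr()` call may have to scan from the start of the string.

```r
# Hypothetical constant-state variant: no character vector is built.
count_segments_streaming <- function(s) {
  count <- 0L
  prev_is_space <- TRUE  # the position before the first character acts as a space
  for (i in seq_len(nchar(s))) {
    is_space <- substr(s, i, i) == " "
    # Same rule as before: non-space character preceded by a space (or the start)
    if (!is_space && prev_is_space) count <- count + 1L
    prev_is_space <- is_space
  }
  count
}

count_segments_streaming(" a  b ")  # 2
```

Initializing `prev_is_space` to `TRUE` folds the `i == 1` case of the boundary rule into the flag.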

Directory

  • Updated DIRECTORY.md
    Added: "Count Segments" entry under String Manipulation

Demonstration

Run the following script to execute built-in examples and test cases:

Rscript "string_manipulation/count_segments.r"
