Parse (split) a string in C++ using string delimiter (standard C++)

Learn various standard C++ techniques to split a string into tokens based on a string delimiter, covering std::string::find, std::string::substr, and std::getline with custom stream buffers.

Splitting a string by a delimiter is a common operation in many programming tasks, from parsing configuration files to processing user input. Unlike some other languages, C++ gives std::string no split member function, so splitting by another string (a string delimiter) has to be assembled from other standard library pieces. (C++20 adds std::views::split, which accepts a range as its pattern; this article sticks to techniques that also work with C++11.) With those pieces you can implement the functionality efficiently and robustly. This article explores several ways to do so, using only standard C++ features and no external libraries.

Method 1: Using std::string::find and std::string::substr

This is one of the most common and straightforward approaches. It involves iteratively searching for the delimiter within the string using std::string::find and then extracting the substrings (tokens) between the delimiters using std::string::substr. This method gives you fine-grained control over the parsing process.

flowchart TD
    A[Start] --> B{Delimiter found from current position?}
    B -->|Yes| C[Extract substring before delimiter]
    C --> D[Advance search position past delimiter]
    D --> B
    B -->|No| E[Extract remaining substring]
    E --> F[End]

Flowchart for splitting a string using find and substr.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Splits s on every occurrence of delimiter. Empty tokens are kept, so
// "a,,b" split on "," yields "a", "", "b".
std::vector<std::string> splitString(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {
        // find("") matches at every position without advancing, which would
        // loop forever; treat an empty delimiter as "do not split".
        tokens.push_back(s);
        return tokens;
    }

    std::size_t lastPos = 0;
    std::size_t pos = s.find(delimiter, lastPos);

    while (pos != std::string::npos) {
        tokens.push_back(s.substr(lastPos, pos - lastPos));
        lastPos = pos + delimiter.length(); // Skip past the delimiter
        pos = s.find(delimiter, lastPos);
    }
    tokens.push_back(s.substr(lastPos)); // Add the last token
    return tokens;
}

int main() {
    std::string text = "apple,banana,orange,grape";
    std::string delimiter = ",";
    std::vector<std::string> result = splitString(text, delimiter);

    for (const std::string& token : result) {
        std::cout << token << '\n';
    }

    std::cout << "\nTesting with multi-character delimiter:\n";
    std::string text2 = "one<->two<->three";
    std::string delimiter2 = "<->";
    std::vector<std::string> result2 = splitString(text2, delimiter2);

    for (const std::string& token : result2) {
        std::cout << token << '\n';
    }

    return 0;
}

C++ function to split a string using std::string::find and std::string::substr.
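The find/substr loop above keeps empty tokens, which appear whenever the input starts or ends with the delimiter or contains two delimiters in a row. If those are unwanted, one option is to skip them while splitting. The following is a minimal sketch under that assumption; the name splitSkipEmpty is hypothetical, and treating an empty delimiter as "do not split" is a design choice made here, not something the standard library prescribes.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of the find/substr split that drops empty tokens.
// Assumption: an empty delimiter means "do not split".
std::vector<std::string> splitSkipEmpty(const std::string& s,
                                        const std::string& delimiter) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {
        if (!s.empty()) tokens.push_back(s);
        return tokens;
    }

    std::string::size_type lastPos = 0;
    std::string::size_type pos;
    while ((pos = s.find(delimiter, lastPos)) != std::string::npos) {
        // Only keep the piece if there is text between the two positions.
        if (pos > lastPos) tokens.push_back(s.substr(lastPos, pos - lastPos));
        lastPos = pos + delimiter.length();
    }
    // Keep the tail only if the string does not end with the delimiter.
    if (lastPos < s.length()) tokens.push_back(s.substr(lastPos));
    return tokens;
}
```

For the input ",a,,b," with delimiter ",", splitString returns five tokens (three of them empty), while this variant returns only "a" and "b". Which behavior is right depends on the data: formats such as CSV usually need the empty fields preserved.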

Method 2: Using std::getline with a Custom Stream Buffer (Advanced)

While std::getline only accepts a single-character delimiter, it can be adapted for string delimiters by wrapping the input in a custom std::streambuf that translates each occurrence of the string delimiter into one character. This approach is more complex, but it is useful when you need stream-like parsing behavior or want to integrate with existing std::istream operations. The example below maps each delimiter to '\n', so it assumes the tokens themselves contain no newline characters. This method is generally overkill for simple splits but demonstrates the flexibility of C++ streams.

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <streambuf>

// Custom streambuf that presents each string delimiter in the source as a
// single '\n', so std::getline can split on it.
// Limitation: a '\n' already present in the input also ends a token.
class StringDelimiterBuffer : public std::streambuf {
public:
    StringDelimiterBuffer(std::streambuf* source, const std::string& delimiter)
        : source_(source), delimiter_(delimiter) {
        setg(nullptr, nullptr, nullptr); // Empty get area initially
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }

        // Read from the source until the next delimiter or end of input.
        buffer_.clear();
        const std::string::size_type n = delimiter_.length();
        int_type c;
        while (!traits_type::eq_int_type(c = source_->sbumpc(), traits_type::eof())) {
            buffer_ += traits_type::to_char_type(c);
            if (n > 0 && buffer_.length() >= n &&
                buffer_.compare(buffer_.length() - n, n, delimiter_) == 0) {
                // Found the delimiter: drop it and end the token with a newline.
                buffer_.erase(buffer_.length() - n);
                buffer_ += '\n';
                break;
            }
        }

        if (buffer_.empty()) {
            return traits_type::eof();
        }

        // Expose the token as the get area. buffer_ is a std::string, so it
        // grows to fit tokens of any length (a fixed-size array would overflow).
        setg(&buffer_[0], &buffer_[0], &buffer_[0] + buffer_.length());
        return traits_type::to_int_type(*gptr());
    }

private:
    std::streambuf* source_;
    std::string delimiter_;
    std::string buffer_;
};

std::vector<std::string> splitStringStream(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> tokens;
    std::istringstream iss(s);
    StringDelimiterBuffer sdb(iss.rdbuf(), delimiter);
    std::istream custom_stream(&sdb);

    std::string token;
    while (std::getline(custom_stream, token)) {
        tokens.push_back(token);
    }
    return tokens;
}

int main() {
    std::string text = "apple<->banana<->orange<->grape";
    std::string delimiter = "<->";
    std::vector<std::string> result = splitStringStream(text, delimiter);

    for (const std::string& token : result) {
        std::cout << token << std::endl;
    }

    std::cout << "\nTesting with another delimiter:\n";
    std::string text2 = "one...two...three";
    std::string delimiter2 = "...";
    std::vector<std::string> result2 = splitStringStream(text2, delimiter2);

    for (const std::string& token : result2) {
        std::cout << token << std::endl;
    }

    return 0;
}

C++ function to split a string using std::getline with a custom std::streambuf.

Choosing the Right Method

When deciding which method to use, consider the following:

  • Simplicity and Performance: For most common string splitting tasks, the std::string::find and std::string::substr method is the most straightforward, efficient, and idiomatic C++ solution. It offers good performance and is easy to understand and maintain.
  • Stream Integration: If you are already working with std::istream objects and need to parse data based on string delimiters within a stream, the custom std::streambuf approach might be considered, though it adds significant complexity.
  • Regular Expressions: For parsing patterns that go beyond a single fixed-string delimiter (e.g., splitting on any run of whitespace, or on several different delimiters), std::regex (available since C++11) is a powerful alternative. It is often noticeably slower than a hand-written find loop in common standard library implementations, and its pattern syntax takes time to learn, so it is best reserved for cases that actually need pattern matching.
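As an illustration of the regular expression option, the sketch below splits on a pattern with std::sregex_token_iterator. The helper name splitRegex is hypothetical. Note that the second argument is a regular expression, not a literal string, so a literal delimiter that contains special characters (such as "." or "+") would need to be escaped first.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split s wherever the regular expression pattern matches.
std::vector<std::string> splitRegex(const std::string& s,
                                    const std::string& pattern) {
    const std::regex re(pattern);
    // Submatch index -1 selects the text between matches, not the matches.
    std::sregex_token_iterator first(s.begin(), s.end(), re, -1);
    std::sregex_token_iterator last; // Default-constructed end iterator
    return std::vector<std::string>(first, last);
}
```

For example, splitRegex("a, b;c", "[,;]\\s*") splits on either a comma or a semicolon followed by optional whitespace and yields "a", "b", and "c". Constructing the std::regex on every call is a simplification; code that splits many strings with the same pattern would likely build the regex once and reuse it.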