Parse (split) a string in C++ using string delimiter (standard C++)

Learn various standard C++ techniques to split a string into tokens based on a string delimiter, covering std::string::find, std::string::substr, and std::getline with custom stream buffers.
Splitting a string by a delimiter is a common operation in many programming tasks, from parsing configuration files to processing user input. Unlike some other languages, standard C++ does not provide a direct, built-in function for splitting a string by another string (a string delimiter). However, with the tools available in the standard library, you can implement this functionality efficiently and robustly. This article explores several methods to achieve this, focusing on standard C++ features without relying on external libraries.
Method 1: Using std::string::find and std::string::substr
This is one of the most common and straightforward approaches. It iteratively searches for the delimiter within the string using std::string::find and then extracts the substrings (tokens) between the delimiters using std::string::substr. This method gives you fine-grained control over the parsing process.
flowchart TD
    A[Start] --> B{Find first delimiter?}
    B -->|Yes| C[Extract substring before delimiter]
    C --> D[Advance search position past delimiter]
    D --> B
    B -->|No| E[Extract remaining substring]
    E --> F[End]
Flowchart for splitting a string using find and substr.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Splits s on every occurrence of delimiter and returns the tokens in order.
// Consecutive delimiters produce empty tokens.
std::vector<std::string> splitString(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {
        // find("") always matches at the current position, so the loop
        // below would never advance. Return the input as a single token.
        tokens.push_back(s);
        return tokens;
    }
    std::size_t lastPos = 0;
    std::size_t pos = s.find(delimiter, lastPos);
    while (pos != std::string::npos) {
        tokens.push_back(s.substr(lastPos, pos - lastPos)); // Token before the delimiter
        lastPos = pos + delimiter.length();                 // Skip past the delimiter
        pos = s.find(delimiter, lastPos);
    }
    tokens.push_back(s.substr(lastPos)); // Add the last token
    return tokens;
}

int main() {
    std::string text = "apple,banana,orange,grape";
    std::string delimiter = ",";
    std::vector<std::string> result = splitString(text, delimiter);
    for (const std::string& token : result) {
        std::cout << token << '\n';
    }

    std::cout << "\nTesting with multi-character delimiter:\n";
    std::string text2 = "one<->two<->three";
    std::string delimiter2 = "<->";
    std::vector<std::string> result2 = splitString(text2, delimiter2);
    for (const std::string& token : result2) {
        std::cout << token << '\n';
    }
    return 0;
}
C++ function to split a string using std::string::find and std::string::substr.
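The "fine-grained control" mentioned above can be illustrated with a small variant. The sketch below assumes C++17 (for std::string_view) and uses a hypothetical helper name, splitTokens, with a hypothetical keepEmpty flag; neither comes from the original function. It follows the same find/substr loop but lets the caller decide whether empty tokens, such as those produced by consecutive or trailing delimiters, are kept.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of the article's splitString): same
// find/substr loop, but empty tokens are dropped unless keepEmpty is true.
std::vector<std::string> splitTokens(std::string_view s,
                                     std::string_view delimiter,
                                     bool keepEmpty = false) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {
        // An empty delimiter cannot split anything; treat s as one token.
        if (keepEmpty || !s.empty()) tokens.emplace_back(s);
        return tokens;
    }
    std::size_t lastPos = 0;
    while (true) {
        const std::size_t pos = s.find(delimiter, lastPos);
        const bool found = (pos != std::string_view::npos);
        // When no delimiter is left, take everything up to the end of s.
        const std::string_view token =
            s.substr(lastPos, found ? pos - lastPos : std::string_view::npos);
        if (keepEmpty || !token.empty()) tokens.emplace_back(token);
        if (!found) break;
        lastPos = pos + delimiter.size();
    }
    return tokens;
}
```

With this sketch, splitTokens("a,,b,", ",") yields the two tokens a and b, while passing true for keepEmpty yields four tokens, two of them empty. Taking std::string_view parameters avoids copying the input while scanning; the tokens themselves are still copied into std::string objects so they remain valid after the input goes away.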
Method 2: Using std::getline with a Custom Stream Buffer (Advanced)
While std::getline is typically used with character delimiters, it can be adapted for string delimiters by creating a custom std::streambuf. This approach is more complex but can be very powerful for scenarios where you need stream-like parsing behavior or want to integrate with existing std::istream operations. This method is generally overkill for simple splits but demonstrates the flexibility of C++ streams.
#include <iostream>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Custom streambuf that reads from another streambuf and replaces each
// occurrence of a string delimiter with a single '\n', so that std::getline
// can split on it. Limitations: a newline already present in the input also
// ends a token, and a trailing delimiter does not produce a final empty token.
class StringDelimiterBuffer : public std::streambuf {
public:
    StringDelimiterBuffer(std::streambuf* source, const std::string& delimiter)
        : source_(source), delimiter_(delimiter) {
        // The get area starts out empty; the first read triggers underflow().
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        // Read from the source until a delimiter or the end of input.
        std::string chunk;
        int_type c;
        while ((c = source_->sbumpc()) != traits_type::eof()) {
            chunk += traits_type::to_char_type(c);
            if (!delimiter_.empty() && chunk.size() >= delimiter_.size() &&
                chunk.compare(chunk.size() - delimiter_.size(),
                              delimiter_.size(), delimiter_) == 0) {
                // Found the delimiter: drop it and end the token with a newline.
                chunk.erase(chunk.size() - delimiter_.size());
                chunk += '\n';
                break;
            }
        }
        if (chunk.empty()) {
            return traits_type::eof();
        }
        // Size the buffer to the chunk so that long tokens cannot overflow it.
        buffer_.assign(chunk.begin(), chunk.end());
        setg(buffer_.data(), buffer_.data(), buffer_.data() + buffer_.size());
        return traits_type::to_int_type(*gptr());
    }

private:
    std::streambuf* source_;
    std::string delimiter_;
    std::vector<char> buffer_;
};

std::vector<std::string> splitStringStream(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> tokens;
    std::istringstream iss(s);
    StringDelimiterBuffer sdb(iss.rdbuf(), delimiter);
    std::istream custom_stream(&sdb);
    std::string token;
    while (std::getline(custom_stream, token)) {
        tokens.push_back(token);
    }
    return tokens;
}

int main() {
    std::string text = "apple<->banana<->orange<->grape";
    std::string delimiter = "<->";
    std::vector<std::string> result = splitStringStream(text, delimiter);
    for (const std::string& token : result) {
        std::cout << token << '\n';
    }

    std::cout << "\nTesting with another delimiter:\n";
    std::string text2 = "one...two...three";
    std::string delimiter2 = "...";
    std::vector<std::string> result2 = splitStringStream(text2, delimiter2);
    for (const std::string& token : result2) {
        std::cout << token << '\n';
    }
    return 0;
}
C++ function to split a string using std::getline with a custom std::streambuf.
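For comparison, when the delimiter is a single character, std::getline needs no custom buffer at all, because its three-argument overload accepts the delimiter character directly. The following is a minimal sketch of that baseline; the helper name splitOnChar is hypothetical and used only for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits s on a single character by passing that
// character to std::getline as its delimiter argument.
std::vector<std::string> splitOnChar(const std::string& s, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream iss(s);
    std::string token;
    // Each call reads up to (and discards) the next delimiter character.
    while (std::getline(iss, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Consecutive delimiters still produce empty tokens here, but a trailing delimiter does not produce a final empty token, since the last std::getline call reaches the end of the stream without extracting anything and fails.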
The custom std::streambuf approach is significantly more complex and generally has higher overhead than the find/substr method. It's best reserved for specific use cases where stream integration is a primary requirement, or for learning purposes.
Choosing the Right Method
When deciding which method to use, consider the following:
- Simplicity and Performance: For most common string splitting tasks, the std::string::find and std::string::substr method is the most straightforward, efficient, and idiomatic C++ solution. It offers good performance and is easy to understand and maintain.
- Stream Integration: If you are already working with std::istream objects and need to parse data based on string delimiters within a stream, the custom std::streambuf approach might be considered, though it adds significant complexity.
- Regular Expressions: For highly complex parsing patterns that go beyond simple fixed-string delimiters (e.g., splitting by any whitespace, or by multiple different delimiters), std::regex (from C++11 onwards) is a powerful alternative. However, std::regex is often noticeably slower than a hand-written find/substr loop and has its own learning curve.
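The regular-expression option in the list above can be sketched with std::sregex_token_iterator. This is an illustrative sketch rather than a recommended implementation; the helper name splitRegex is hypothetical, and the pattern is interpreted as a regular expression, so a literal delimiter containing metacharacters such as . or | would need escaping first.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits s wherever the regular expression matches.
std::vector<std::string> splitRegex(const std::string& s, const std::string& pattern) {
    const std::regex re(pattern);
    // Submatch index -1 selects the text between matches instead of the
    // matches themselves, which is what turns the iterator into a splitter.
    std::sregex_token_iterator first(s.begin(), s.end(), re, -1);
    std::sregex_token_iterator last; // Default-constructed end iterator
    return std::vector<std::string>(first, last);
}
```

For example, splitRegex("a, b;c", "[,;]\\s*") splits on either a comma or a semicolon followed by optional whitespace and returns the tokens a, b, and c.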
Although find/substr is generally fast, the optimal solution can sometimes depend on the specific characteristics of your input data (e.g., very long strings, many delimiters, very short tokens).
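One way to check how a splitter behaves on your own data is to time it on representative input. The sketch below is a rough timing harness, not a rigorous benchmark: the names splitForTiming, makeInput, and averageMicros are hypothetical, the split function is a copy of the find/substr loop so the example stands alone, and the result will vary with compiler, optimization level, and machine.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Copy of the find/substr split so this sketch is self-contained.
// Assumes a non-empty delimiter.
std::vector<std::string> splitForTiming(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> tokens;
    std::size_t lastPos = 0;
    std::size_t pos = s.find(delimiter, lastPos);
    while (pos != std::string::npos) {
        tokens.push_back(s.substr(lastPos, pos - lastPos));
        lastPos = pos + delimiter.length();
        pos = s.find(delimiter, lastPos);
    }
    tokens.push_back(s.substr(lastPos));
    return tokens;
}

// Hypothetical helper: builds an input made of `count` copies of `token`
// joined by `delimiter`, to vary token length and delimiter density.
std::string makeInput(std::size_t count, const std::string& token, const std::string& delimiter) {
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        if (i != 0) out += delimiter;
        out += token;
    }
    return out;
}

// Returns the average time per call in microseconds over `runs` calls.
template <typename SplitFn>
double averageMicros(SplitFn split, const std::string& input,
                     const std::string& delimiter, int runs) {
    std::size_t total = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        total += split(input, delimiter).size();
    }
    const auto stop = std::chrono::steady_clock::now();
    // Store the accumulated count in a volatile so the calls are not optimized away.
    volatile std::size_t sink = total;
    (void)sink;
    return std::chrono::duration<double, std::micro>(stop - start).count() / runs;
}
```

A call such as averageMicros(splitForTiming, makeInput(100000, "ab", "<->"), "<->", 20) measures the many-short-tokens case; changing the arguments to makeInput lets you compare it against a few very long tokens, and passing a different split function lets you compare approaches on the same input.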