This is a work in progress.
I had to attempt the One Billion Row Challenge (1BRC) for fun and see where it gets me. I decided to use Go -- because, why not? My personal challenge is to not use ChatGPT or read anything about the challenge before trying it myself.
So far, I've had several iterations:
- Basic Implementation: Without thinking about concurrency, write the simplest solution to the challenge. The code is in `it1-simple.go` (a rough sketch of this sequential approach is after this list). Result: 123 seconds.
- Bulk-process a chunk of lines: In this iteration, I read the file line by line, collect a given number of lines (in this case one million), then launch a goroutine to process that chunk while the main loop collects the next million lines (sketched after this list). The code is in `it2-bulk-process.go`. Result: 564 seconds -- which turned out to be slower than the basic implementation. Note: acquiring and releasing the lock that many times turned out to be a performance bottleneck.
- Concurrency: For this iteration, I set an initial buffer size for the `bufio` scanner and add buffered channels and a worker pool. I spin up a number of workers (tuned to 100) that process chunks of data as they become available on the channel. I also avoid acquiring the lock to write to the shared hash map in each goroutine; instead, each worker merges its results at the end, as you can see in `it3-concurrent.go` (see the worker-pool sketch after this list). Result: 31 seconds -- a huge improvement, but still not great.
- Read chunks of the file instead of line by line: The main point is to now read the file in large chunks rather than line by line. I decided not to use `bufio.Scanner`, as I realized through tests that it doesn't guarantee filling the buffer size you set. I also added a couple of optimizations based on the CPU profiling data (see the last two sketches after this list):
  - Made a custom float parser
  - Avoided `bytes.Split` and `strings.Split` (`strings.Cut` turns out to be better)
  - Introduced a result collector channel

  Result: 12.09 seconds -- things are looking good and really getting interesting.
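
For reference, here is a minimal sketch of what the first, purely sequential iteration looks like. It is not the actual code in `it1-simple.go`: the `measurements.txt` file name, the `station;temperature` record format, and the output format are assumptions on my part.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// stats accumulates min/max/sum/count for one station.
type stats struct {
	min, max, sum float64
	count         int64
}

func main() {
	f, err := os.Open("measurements.txt") // assumed input file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	results := make(map[string]*stats)

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each record is assumed to look like "StationName;12.3".
		parts := strings.Split(scanner.Text(), ";")
		if len(parts) != 2 {
			continue
		}
		temp, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			continue
		}
		name := parts[0]
		s, exists := results[name]
		if !exists {
			results[name] = &stats{min: temp, max: temp, sum: temp, count: 1}
			continue
		}
		if temp < s.min {
			s.min = temp
		}
		if temp > s.max {
			s.max = temp
		}
		s.sum += temp
		s.count++
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}

	for name, s := range results {
		fmt.Printf("%s=%.1f/%.1f/%.1f\n", name, s.min, s.sum/float64(s.count), s.max)
	}
}
```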
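
A sketch of the second iteration's shape, assuming the batching and locking work roughly like this: collect a million lines, hand them to a goroutine, and let every goroutine fold records into one shared map guarded by a `sync.Mutex`. Locking and unlocking that mutex once per record is the bottleneck described above.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"sync"
)

// stats accumulates min/max/sum/count for one station.
type stats struct {
	min, max, sum float64
	count         int64
}

func main() {
	f, err := os.Open("measurements.txt") // assumed input file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	const chunkSize = 1_000_000 // lines per batch

	results := make(map[string]*stats)
	var mu sync.Mutex
	var wg sync.WaitGroup

	scanner := bufio.NewScanner(f)
	chunk := make([]string, 0, chunkSize)
	for scanner.Scan() {
		chunk = append(chunk, scanner.Text())
		if len(chunk) == chunkSize {
			wg.Add(1)
			go processChunk(chunk, results, &mu, &wg)
			chunk = make([]string, 0, chunkSize) // start collecting the next million lines
		}
	}
	if len(chunk) > 0 {
		wg.Add(1)
		go processChunk(chunk, results, &mu, &wg)
	}
	wg.Wait()

	for name, s := range results {
		fmt.Printf("%s=%.1f/%.1f/%.1f\n", name, s.min, s.sum/float64(s.count), s.max)
	}
}

// processChunk folds one batch of lines into the shared map. Taking the
// lock once per record is what made this iteration slower than the
// sequential one.
func processChunk(lines []string, results map[string]*stats, mu *sync.Mutex, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, line := range lines {
		parts := strings.Split(line, ";")
		if len(parts) != 2 {
			continue
		}
		temp, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			continue
		}
		mu.Lock()
		if s, exists := results[parts[0]]; exists {
			if temp < s.min {
				s.min = temp
			}
			if temp > s.max {
				s.max = temp
			}
			s.sum += temp
			s.count++
		} else {
			results[parts[0]] = &stats{min: temp, max: temp, sum: temp, count: 1}
		}
		mu.Unlock()
	}
}
```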
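
A worker-pool sketch along the lines of the third iteration, again with assumed details: a buffered channel of line chunks, 100 workers each keeping a private map, and a single lock acquisition per worker when it merges its results into the shared map.

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"os"
	"strconv"
	"strings"
	"sync"
)

// stats accumulates min/max/sum/count for one station.
type stats struct {
	min, max, sum float64
	count         int64
}

// observe folds a single temperature reading into m.
func observe(m map[string]*stats, name string, temp float64) {
	s, ok := m[name]
	if !ok {
		m[name] = &stats{min: temp, max: temp, sum: temp, count: 1}
		return
	}
	s.min = math.Min(s.min, temp)
	s.max = math.Max(s.max, temp)
	s.sum += temp
	s.count++
}

func main() {
	f, err := os.Open("measurements.txt") // assumed input file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	const numWorkers, chunkSize = 100, 1_000_000

	chunks := make(chan []string, numWorkers) // buffered channel of line chunks
	global := make(map[string]*stats)
	var mu sync.Mutex
	var wg sync.WaitGroup

	// Each worker keeps a private map and touches the shared map (and its
	// lock) exactly once, after the chunk channel has drained.
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := make(map[string]*stats)
			for chunk := range chunks {
				for _, line := range chunk {
					parts := strings.Split(line, ";")
					if len(parts) != 2 {
						continue
					}
					if temp, err := strconv.ParseFloat(parts[1], 64); err == nil {
						observe(local, parts[0], temp)
					}
				}
			}
			mu.Lock()
			for name, s := range local {
				g, ok := global[name]
				if !ok {
					global[name] = s
					continue
				}
				g.min = math.Min(g.min, s.min)
				g.max = math.Max(g.max, s.max)
				g.sum += s.sum
				g.count += s.count
			}
			mu.Unlock()
		}()
	}

	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 1<<20), 1<<20) // larger initial scanner buffer
	chunk := make([]string, 0, chunkSize)
	for scanner.Scan() {
		chunk = append(chunk, scanner.Text())
		if len(chunk) == chunkSize {
			chunks <- chunk
			chunk = make([]string, 0, chunkSize)
		}
	}
	if len(chunk) > 0 {
		chunks <- chunk
	}
	close(chunks)
	wg.Wait()

	for name, s := range global {
		fmt.Printf("%s=%.1f/%.1f/%.1f\n", name, s.min, s.sum/float64(s.count), s.max)
	}
}
```

The point of the per-worker map is that the shared lock is taken only `numWorkers` times in total instead of once per record, which is what removed the contention seen in the second iteration.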
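
The fourth iteration's chunked reading could look roughly like this. It is only an illustration of the idea: read fixed-size blocks with `os.File.Read`, carry the incomplete trailing line over to the next block, and hand complete-line chunks to the processing code. The 4 MiB block size and the callback shape are my own choices, and the result collector channel isn't shown.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

const blockSize = 4 << 20 // 4 MiB per read; an assumed value, not from the write-up

// readChunks reads the file in fixed-size blocks instead of line by line.
// A partial line at the end of a block is carried over and prepended to the
// next read, so every chunk handed to the callback ends on a newline.
func readChunks(path string, handle func(chunk []byte)) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, blockSize)
	var carry []byte // incomplete trailing line from the previous block
	for {
		n, err := f.Read(buf)
		if n > 0 {
			data := append(carry, buf[:n]...)
			// Everything after the last newline is an incomplete line that
			// belongs to the next chunk.
			cut := len(data)
			for cut > 0 && data[cut-1] != '\n' {
				cut--
			}
			if cut == 0 {
				carry = data // no newline yet: keep accumulating
			} else {
				handle(data[:cut])
				carry = append([]byte(nil), data[cut:]...)
			}
		}
		if err == io.EOF {
			if len(carry) > 0 {
				handle(carry) // final line without a trailing newline
			}
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// Usage example: count newline-terminated records to show the chunking
	// doesn't lose data at block boundaries.
	var lines int
	err := readChunks("measurements.txt", func(chunk []byte) {
		for _, b := range chunk {
			if b == '\n' {
				lines++
			}
		}
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("lines:", lines)
}
```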
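
And a sketch of the parsing optimizations: a hand-rolled float parser plus `strings.Cut` instead of `strings.Split`. The parser assumes the 1BRC temperature format of exactly one fractional digit; that format assumption is mine, the write-up only says a custom float parser was used.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTemp converts a reading like "-12.3" to a float64 without strconv.
// It assumes an optional minus sign, one or two integer digits, a dot, and
// exactly one fractional digit (my assumption about the input format).
func parseTemp(s string) float64 {
	neg := false
	if len(s) > 0 && s[0] == '-' {
		neg = true
		s = s[1:]
	}
	tenths := 0
	for i := 0; i < len(s); i++ {
		if s[i] == '.' {
			continue // skip the decimal point; the digits stay in order
		}
		tenths = tenths*10 + int(s[i]-'0')
	}
	v := float64(tenths) / 10
	if neg {
		return -v
	}
	return v
}

func main() {
	// strings.Cut returns the two halves around the first separator without
	// allocating a slice, unlike strings.Split.
	line := "Hamburg;-3.4"
	name, tempStr, ok := strings.Cut(line, ";")
	if !ok {
		panic("malformed line")
	}
	fmt.Println(name, parseTemp(tempStr)) // Hamburg -3.4
}
```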