Commit 8dde2ec

Stream git commit log to a file before parsing
The commit log for a large git repository may not fit in memory, and import jobs for such repositories started failing. This patch writes the log to a file before parsing it, so that the entire log is never held in memory at once.
1 parent b0aff9f commit 8dde2ec
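
The approach described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the adapter's code: `with_command_output` and `count_commits` are hypothetical names, and the `printf` command stands in for the real git pipeline. The shell redirects the command's output to a file, and the caller then reads that file one line at a time instead of receiving one large string.

```ruby
require 'shellwords'
require 'tmpdir'

# Hypothetical helper: run a shell command with stdout redirected to `path`,
# yield an IO opened on that file, and delete the file afterwards.
def with_command_output(command, path)
  system("#{command} > #{Shellwords.escape(path)}") or raise "command failed: #{command}"
  File.open(path, 'r') { |io| yield io }
ensure
  File.delete(path) if File.exist?(path)
end

# Hypothetical stand-in for a parser: it reads the IO line by line,
# so only one line is held in memory at a time.
def count_commits(io)
  count = 0
  io.each_line { |line| count += 1 if line.start_with?('commit ') }
  count
end

# Usage sketch: `printf` plays the role of the git command.
path = File.join(Dir.tmpdir, "example_#{Process.pid}.log")
total = with_command_output("printf 'commit a\\nfoo\\ncommit b\\n'", path) do |io|
  count_commits(io)
end
puts total
```

The trade-off is extra disk I/O in exchange for a memory footprint that stays roughly constant regardless of the size of the log.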

1 file changed, with 30 additions and 4 deletions
lib/scm/adapters/git/commits.rb

Lines changed: 30 additions & 4 deletions
@@ -40,10 +40,12 @@ def each_commit(opts={})
       # because Ohloh ignores merge diffs anyway.
 
       previous = nil
-      Scm::Parsers::GitStyledParser.parse(log(opts)) do |e|
-        yield fixup_null_merge(e) unless previous && previous.token == e.token
-        previous = e
-      end
+      open_log_file(opts) do |io|
+        Scm::Parsers::GitStyledParser.parse(io) do |e|
+          yield fixup_null_merge(e) unless previous && previous.token == e.token
+          previous = e
+        end
+      end
     end
 
     # Returns a single commit, including its diffs
@@ -85,6 +87,30 @@ def log(opts={})
       end
     end
 
+
+    # Same as log() method above, except that it writes the log to
+    # a file.
+    def open_log_file(opts={})
+      if has_branch?
+        if opts[:after] && opts[:after]==self.head_token
+          '' # Nothing new.
+        else
+          begin
+            run "#{rev_list_command(opts)} | xargs -n 1 #{Scm::Parsers::GitStyledParser.whatchanged} > #{log_filename}"
+            File.open(log_filename, 'r') { |io| yield io }
+          ensure
+            File.delete(log_filename) if FileTest.exist?(log_filename)
+          end
+        end
+      else
+        ''
+      end
+    end
+
+    def log_filename
+      File.join('/tmp', (self.url).gsub(/\W/,'') + '.log')
+    end
+
     def rev_list_command(opts={})
       up_to = opts[:up_to] || branch_name
       range = opts[:after] ? "#{opts[:after]}..#{up_to}" : up_to
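
One observation on `log_filename` in the diff above: it derives a fixed path under `/tmp` from the repository URL, so two jobs working on the same URL at the same time would appear to share one file. A possible variant, which is not part of this commit, lets Ruby's standard `Tempfile` pick a unique name. In this sketch `open_command_log` is a hypothetical name, and `command` stands in for the `rev_list_command ... | xargs ...` pipeline built in the diff.

```ruby
require 'tempfile'
require 'shellwords'

# Hypothetical variant of open_log_file: the log goes to a uniquely named
# temporary file, which Tempfile.create removes when the block exits.
def open_command_log(command)
  Tempfile.create('git_log') do |tmp|
    path = tmp.path
    tmp.close # the shell writes to the path; it is reopened below for reading
    system("#{command} > #{Shellwords.escape(path)}") or raise "command failed: #{command}"
    File.open(path, 'r') { |io| yield io }
  end
end
```

A caller would use it the same way as `open_log_file`, passing a block that receives the IO, for example `open_command_log(cmd) { |io| parser.parse(io) }` where `cmd` and `parser` are placeholders.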
