
Commit c2d6fd9

zhangyi089 authored and tytso committed
jbd2: recheck checkpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from time to time, but it is very difficult to reproduce and analyse. With the help of the JBD2_CYCLE_RECORD option, we found that the checkpointing process fails to write out some buffers that race with another do_get_write_access(). See below for details.

jbd2_log_do_checkpoint() //transaction X
 //buffer A is dirty and does not belong to any transaction
 __buffer_relink_io() //move it to the IO list
 __flush_batch()
  write_dirty_buffer()
                             do_get_write_access()
                             clear_buffer_dirty
                             __jbd2_journal_file_buffer()
                             //add buffer A to a new transaction Y
   lock_buffer(bh)
   //doesn't write out
 __jbd2_journal_remove_checkpoint()
 //checkpoint completes, but buffer A was never written
                             //filesystem corrupt if the new transaction Y
                             //isn't fully written out.

Because the t_checkpoint_list walking loop in jbd2_log_do_checkpoint() already waits for buffers under I/O and for buffers re-added to a new transaction to finish committing, and it also removes cleaned buffers, the list is guaranteed to become empty eventually. So it is fine to leave buffers on the t_checkpoint_list while flushing them out, and to stop using the t_checkpoint_io_list entirely.

Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
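The difference between the old and new checkpoint schemes can be sketched as a small userspace C model. This is not kernel code: struct mbuf, mflush(), old_checkpoint() and new_checkpoint() are hypothetical names. The needs_write flag folds the kernel's dirty, jbddirty and re-logged states into one bit, and the raced flag stands in for a do_get_write_access() that clears the dirty bit just before write_dirty_buffer() runs, so that one write is silently skipped.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUFS 64   /* hypothetical bound for this model */

/* Hypothetical stand-in for a checkpointed buffer. needs_write folds the
 * kernel's dirty / jbddirty / re-logged state into a single flag. */
struct mbuf {
	bool needs_write;
	bool raced;   /* next write is lost to a racing do_get_write_access() */
};

/* Attempt to write a batch. A raced buffer's write is skipped once,
 * mirroring write_dirty_buffer() finding the dirty bit already cleared. */
static void mflush(struct mbuf **batch, int n)
{
	for (int i = 0; i < n; i++) {
		if (batch[i]->raced)
			batch[i]->raced = false;
		else
			batch[i]->needs_write = false;
	}
}

/* Old scheme: queued buffers move to a separate I/O list and are dropped
 * from the checkpoint without a second look. Returns the number of buffers
 * dropped while their data still had not been written. */
static int old_checkpoint(struct mbuf *bufs, int count)
{
	struct mbuf *io[MAX_BUFS];
	int n_io = 0, lost = 0;

	assert(count <= MAX_BUFS);
	for (int i = 0; i < count; i++)
		if (bufs[i].needs_write)
			io[n_io++] = &bufs[i];
	mflush(io, n_io);
	for (int i = 0; i < n_io; i++)
		if (io[i]->needs_write)
			lost++;   /* removed from the checkpoint anyway */
	return lost;
}

/* New scheme: buffers stay on the checkpoint list and are removed only
 * once they are observed clean; walk again until the list is empty.
 * Returns the number of passes needed. */
static int new_checkpoint(struct mbuf *bufs, int count)
{
	bool on_list[MAX_BUFS];
	int remaining = count, passes = 0;

	assert(count <= MAX_BUFS);
	for (int i = 0; i < count; i++)
		on_list[i] = true;
	while (remaining > 0) {
		struct mbuf *batch[MAX_BUFS];
		int n = 0;

		passes++;
		for (int i = 0; i < count; i++) {
			if (!on_list[i])
				continue;
			if (!bufs[i].needs_write) {
				on_list[i] = false;        /* clean: drop it */
				remaining--;
			} else {
				batch[n++] = &bufs[i];     /* queue, but keep it listed */
			}
		}
		mflush(batch, n);
	}
	return passes;
}
```

With one raced buffer among three, old_checkpoint() reports one buffer dropped while still unwritten, whereas new_checkpoint() needs an extra pass but only ever removes buffers it has seen clean.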
Parent: 06c2afb


1 file changed: 29 additions, 73 deletions


fs/jbd2/checkpoint.c

Lines changed: 29 additions & 73 deletions
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
 	}
 }
 
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	__buffer_unlink_first(jh);
-
-	if (!transaction->t_checkpoint_io_list) {
-		jh->b_cpnext = jh->b_cpprev = jh;
-	} else {
-		jh->b_cpnext = transaction->t_checkpoint_io_list;
-		jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
-		jh->b_cpprev->b_cpnext = jh;
-		jh->b_cpnext->b_cpprev = jh;
-	}
-	transaction->t_checkpoint_io_list = jh;
-}
-
 /*
  * Check a checkpoint buffer could be release or not.
  *
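For reference, the removed __buffer_relink_io() unlinks the head of t_checkpoint_list and splices it in as the new head of the circular, doubly linked t_checkpoint_io_list. A userspace sketch of the same pointer manipulation follows; struct node, unlink_node(), insert_head() and relink() are hypothetical names standing in for struct journal_head's b_cpnext/b_cpprev links, __buffer_unlink_first() and the splice itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the checkpoint links of struct journal_head. */
struct node {
	int id;
	struct node *next, *prev;   /* circular, doubly linked */
};

/* Unlink n from the circular list headed by *head
 * (the same steps as __buffer_unlink_first()). */
static void unlink_node(struct node **head, struct node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
	if (*head == n) {
		*head = n->next;
		if (*head == n)   /* n was the only element */
			*head = NULL;
	}
}

/* Splice n in just before *head and make it the new head,
 * as the removed __buffer_relink_io() did for t_checkpoint_io_list. */
static void insert_head(struct node **head, struct node *n)
{
	if (!*head) {
		n->next = n->prev = n;
	} else {
		n->next = *head;
		n->prev = (*head)->prev;
		n->prev->next = n;
		n->next->prev = n;
	}
	*head = n;
}

/* Move n from one list to the other. */
static void relink(struct node **from, struct node **to, struct node *n)
{
	unlink_node(from, n);
	insert_head(to, n);
}
```

After the patch no such second list exists: queued buffers stay on t_checkpoint_list and only the list head is advanced past them.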
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
 		struct buffer_head *bh = journal->j_chkpt_bhs[i];
 		BUFFER_TRACE(bh, "brelse");
 		__brelse(bh);
+		journal->j_chkpt_bhs[i] = NULL;
 	}
 	*batch_count = 0;
 }
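The patched loop compares the list head against j_chkpt_bhs[0] to detect that it has wrapped around to the first queued buffer, so clearing the slots in __flush_batch() keeps that comparison from matching a pointer left over from a batch that was already flushed. A minimal model, assuming hypothetical names (struct buf, flush_batch(), wrapped()) and a counter in place of the real write and release calls:

```c
#include <assert.h>
#include <stddef.h>

#define NR_BATCH 4   /* hypothetical; stands in for JBD2_NR_BATCH */

/* Hypothetical stand-in for struct buffer_head. */
struct buf {
	int id;
};

/* Model of the patched __flush_batch(): "write" every queued buffer,
 * then clear its slot so no stale pointer remains in the array. */
static void flush_batch(struct buf **batch, int *batch_count, int *written)
{
	assert(*batch_count <= NR_BATCH);
	for (int i = 0; i < *batch_count; i++) {
		(*written)++;      /* stands in for write_dirty_buffer() */
		batch[i] = NULL;   /* mirrors journal->j_chkpt_bhs[i] = NULL */
	}
	*batch_count = 0;
}

/* The wrap-around test from jbd2_log_do_checkpoint(): has the list head
 * come back to the first buffer of the current batch? */
static int wrapped(const struct buf *head, struct buf **batch)
{
	return head == batch[0];
}
```

Once the batch is flushed, a walk that reaches the same (now clean, still listed) buffer no longer looks like a wrap-around.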
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 		jh = transaction->t_checkpoint_list;
 		bh = jh2bh(jh);
 
+		/*
+		 * The buffer may be writing back, or flushing out in the
+		 * last couple of cycles, or re-adding into a new transaction,
+		 * need to check it again until it's unlocked.
+		 */
 		if (buffer_locked(bh)) {
 			get_bh(bh);
 			spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 		}
 		if (!buffer_dirty(bh)) {
 			BUFFER_TRACE(bh, "remove from checkpoint");
-			if (__jbd2_journal_remove_checkpoint(jh))
-				/* The transaction was released; we're done */
+			/*
+			 * If the transaction was released or the checkpoint
+			 * list was empty, we're done.
+			 */
+			if (__jbd2_journal_remove_checkpoint(jh) ||
+			    !transaction->t_checkpoint_list)
 				goto out;
-			continue;
+		} else {
+			/*
+			 * We are about to write the buffer, it could be
+			 * raced by some other transaction shrink or buffer
+			 * re-log logic once we release the j_list_lock,
+			 * leave it on the checkpoint list and check status
+			 * again to make sure it's clean.
+			 */
+			BUFFER_TRACE(bh, "queue");
+			get_bh(bh);
+			J_ASSERT_BH(bh, !buffer_jwrite(bh));
+			journal->j_chkpt_bhs[batch_count++] = bh;
+			transaction->t_chp_stats.cs_written++;
+			transaction->t_checkpoint_list = jh->b_cpnext;
 		}
-		/*
-		 * Important: we are about to write the buffer, and
-		 * possibly block, while still holding the journal
-		 * lock. We cannot afford to let the transaction
-		 * logic start messing around with this buffer before
-		 * we write it to disk, as that would break
-		 * recoverability.
-		 */
-		BUFFER_TRACE(bh, "queue");
-		get_bh(bh);
-		J_ASSERT_BH(bh, !buffer_jwrite(bh));
-		journal->j_chkpt_bhs[batch_count++] = bh;
-		__buffer_relink_io(jh);
-		transaction->t_chp_stats.cs_written++;
+
 		if ((batch_count == JBD2_NR_BATCH) ||
-		    need_resched() ||
-		    spin_needbreak(&journal->j_list_lock))
+		    need_resched() || spin_needbreak(&journal->j_list_lock) ||
+		    jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
 			goto unlock_and_flush;
 	}
 
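Because dirty buffers now stay on t_checkpoint_list and the loop only advances the list head past them, the walk eventually comes back to buffers it has already queued; the new `jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0]` condition flushes the batch at that point. A sketch of just the batching step, assuming a hypothetical struct node and fill_batch(), and a list in which every buffer is dirty (the clean-removal branch, which also falls through to this check in the kernel, is omitted):

```c
#include <assert.h>
#include <stddef.h>

#define NR_BATCH 64   /* hypothetical; stands in for JBD2_NR_BATCH */

/* Hypothetical stand-in for a journal_head/buffer_head pair on the
 * circular checkpoint list. */
struct node {
	int dirty;
	struct node *next;
};

/* Queue dirty buffers starting at *head. Queued buffers stay on the list;
 * only the head advances. Stop when the batch is full or the head has come
 * back to the first queued buffer, so no buffer is queued twice. */
static int fill_batch(struct node **head, struct node **batch)
{
	int n = 0;

	for (int i = 0; i < NR_BATCH; i++)
		batch[i] = NULL;
	while (*head) {
		struct node *b = *head;

		assert(b->dirty);   /* model assumes every listed buffer is dirty */
		batch[n++] = b;
		*head = b->next;
		if (n == NR_BATCH || *head == batch[0])
			break;
	}
	return n;
}
```

For a three-buffer list the batch holds each buffer exactly once and the head ends up back at the first one; without the wrap-around test the loop would keep cycling the same buffers into the batch until it reached NR_BATCH.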
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 		goto restart;
 	}
 
-	/*
-	 * Now we issued all of the transaction's buffers, let's deal
-	 * with the buffers that are out for I/O.
-	 */
-restart2:
-	/* Did somebody clean up the transaction in the meanwhile? */
-	if (journal->j_checkpoint_transactions != transaction ||
-	    transaction->t_tid != this_tid)
-		goto out;
-
-	while (transaction->t_checkpoint_io_list) {
-		jh = transaction->t_checkpoint_io_list;
-		bh = jh2bh(jh);
-		if (buffer_locked(bh)) {
-			get_bh(bh);
-			spin_unlock(&journal->j_list_lock);
-			wait_on_buffer(bh);
-			/* the journal_head may have gone by now */
-			BUFFER_TRACE(bh, "brelse");
-			__brelse(bh);
-			spin_lock(&journal->j_list_lock);
-			goto restart2;
-		}
-
-		/*
-		 * Now in whatever state the buffer currently is, we
-		 * know that it has been written out and so we can
-		 * drop it from the list
-		 */
-		if (__jbd2_journal_remove_checkpoint(jh))
-			break;
-	}
 out:
 	spin_unlock(&journal->j_list_lock);
 	result = jbd2_cleanup_journal_tail(journal);
