Applied Patches

Andres Freund pushed:

Simon Riggs pushed:

Peter Eisentraut pushed:

Álvaro Herrera pushed:

  • Fix coding rules violations in walreceiver.c. 1. Since commit b1a9bad9e744 we had pstrdup() inside a spinlock-protected critical section; reported by Andreas Seltenreich. Turn those into strlcpy() to stack-allocated variables instead. Backpatch to 9.6. 2. Since commit 9ed551e0a4fd we had a pfree() uselessly inside a spinlock-protected critical section. Tom Lane noticed in code review. Move down. Backpatch to 9.6. 3. Since commit 64233902d22b we had GetCurrentTimestamp() (a kernel call) inside a spinlock-protected critical section. Tom Lane noticed in code review. Move it up. Backpatch to 9.2. 4. Since commit 1bb2558046cc we did elog(PANIC) while holding spinlock. Tom Lane noticed in code review. Release spinlock before dying. Backpatch to 9.2. Discussion:
  • Fix traversal of half-frozen update chains. When some tuple versions in an update chain are frozen due to them being older than freeze_min_age, the xmax/xmin trail can become broken. This breaks HOT (and probably other things). A subsequent VACUUM can break things in more serious ways, such as leaving orphan heap-only tuples whose root HOT redirect items were removed. This can be seen because index creation (or REINDEX) complains like ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t" Because of relfrozenxid constraints, we cannot avoid the freezing of the early tuples, so we must cope with the results: whenever we see an Xmin of FrozenTransactionId, consider it a match for whatever the previous Xmax value was. This problem seems to have appeared in 9.3 with multixact changes, though strictly speaking it seems unrelated. Since 9.4 we have commit 37484ad2a "Change the way we mark tuples as frozen", so the fix is simple: just compare the raw Xmin (still stored in the tuple header, since freezing merely set an infomask bit) to the Xmax. But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so the original value is lost and we have nothing to compare the Xmax with. To cope with that case we need to compare the Xmin with FrozenXid, assume it's a match, and hope for the best. Sadly, since you can pg_upgrade a 9.3 instance containing half-frozen pages to newer releases, we need to keep the old check in newer versions too, which seems a bit brittle; I hope we can somehow get rid of that. I didn't optimize the new function for performance. The new coding is probably a bit slower than before, since there is a function call rather than a straight comparison, but I'd rather have it work correctly than be fast but wrong. This is a followup after 20b655224249 fixed a few related problems. Apparently, in 9.6 and up there are more ways to get into trouble, but in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so there must be a separate bug. Reported-by: Peter Geoghegan Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood, Yi Wen Wong, Álvaro Discussion:

Tom Lane pushed:

  • Fix race condition with unprotected use of a latch pointer variable. Commit 597a87ccc introduced a latch pointer variable to replace use of a long-lived shared latch in the shared WalRcvData structure. This was not well thought out, because there are now hazards of the pointer variable changing while it's being inspected by another process. This could obviously lead to a core dump in code like if (WalRcv->latch) SetLatch(WalRcv->latch); and there's a more remote risk of a torn read, if we have any platforms where reading/writing a pointer is not atomic. An actual problem would occur only if the walreceiver process exits (gracefully) while the startup process is trying to signal it, but that seems well within the realm of possibility. To fix, treat the pointer variable (not the referenced latch) as being protected by the WalRcv->mutex spinlock. There remains a race condition that we could apply SetLatch to a process latch that no longer belongs to the walreceiver, but I believe that's harmless: at worst it'd cause an extra wakeup of the next process to use that PGPROC structure. Back-patch to v10 where the faulty code was added. Discussion:
  • Allow multiple tables to be specified in one VACUUM or ANALYZE command. Not much to say about this; does what it says on the tin. However, formerly, if there was a column list then the ANALYZE action was implied; now it must be specified, or you get an error. This is because it would otherwise be a bit unclear what the user meant if some tables have column lists and some don't. Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some editorialization by me Discussion:
  • Adjust git_changelog for new-style release tags. It wasn't on board with REL_n_n format.
  • Improve comments in vacuum_rel() and analyze_rel(). Remove obsolete references to get_rel_oids(). Avoid listing specific relkinds in the comments, since we seem unable to keep such things in sync with the code, and it's not all that helpful anyhow. Noted by Michael Paquier, though I rewrote the comments a bit more. Discussion:
  • Fix typo in README. s/BeginInternalSubtransaction/BeginInternalSubTransaction/
  • #ifdef out some dead code in psql/mainloop.c. This pg_send_history() call is unreachable, since the block it's in is currently only entered in !cur_cmd_interactive mode. But rather than just delete it, make it #ifdef NOT_USED, in hopes that we'll remember to enable it if we ever change that decision. Per report from David Binderman. Since this is basically cosmetic, I see no great need to back-patch. Discussion:
  • Fix access-off-end-of-array in clog.c. Sloppy loop coding in set_status_by_pages() resulted in fetching one array element more than it should from the subxids[] array. The odds of this resulting in SIGSEGV are pretty small, but we've certainly seen that happen with similar mistakes elsewhere. While at it, we can get rid of an extra TransactionIdToPage() calculation per loop. Per report from David Binderman. Back-patch to all supported branches, since this code is quite old. Discussion:
  • Fix intra-query memory leakage in nodeProjectSet.c. Both ExecMakeFunctionResultSet() and evaluation of simple expressions need to be done in the per-tuple memory context, not per-query, else we leak data until end of query. This is a consideration that was missed while refactoring code in the ProjectSet patch (note that in pre-v10, ExecMakeFunctionResult is called in the per-tuple context). Per bug #14843 from Ben M. Diagnosed independently by Andres and myself. Discussion:
  • Fix crash when logical decoding is invoked from a PL function. The logical decoding functions do BeginInternalSubTransaction and RollbackAndReleaseCurrentSubTransaction to clean up after themselves. It turns out that AtEOSubXact_SPI has an unrecognized assumption that we always need to cancel the active SPI operation in the SPI context that surrounds the subtransaction (if there is one). That's true when the RollbackAndReleaseCurrentSubTransaction call is coming from the SPI-using function itself, but not when it's happening inside some unrelated function invoked by a SPI query. In practice the affected callers are the various PLs. To fix, record the current subtransaction ID when we begin a SPI operation, and clean up only if that ID is the subtransaction being canceled. Also, remove AtEOSubXact_SPI's assertion that it must have cleaned up the surrounding SPI context's active tuptable. That's proven wrong by the same test case. Also clarify (or, if you prefer, reinterpret) the calling conventions for _SPI_begin_call and _SPI_end_call. The memory context cleanup in the latter means that these have always had the flavor of a matched resource-management pair, but they weren't documented that way before. Per report from Ben Chobot. Back-patch to 9.4 where logical decoding came in. In principle, the SPI changes should go all the way back, since the problem dates back to commit 7ec1c5a86. But given the lack of field complaints it seems few people are using internal subtransactions in this way. So I don't feel a need to take any risks in 9.2/9.3. Discussion:
  • Clean up sloppy maintenance of regression test schedule files. The partition_join test was added to a parallel group that was already at the maximum of 20 concurrent tests. The hash_func test wasn't added to serial_schedule at all. The identity and partition_join tests were added to serial_schedule with the aid of a dartboard, rather than maintaining consistency with parallel_schedule. There are proposals afoot to make these sorts of errors harder to make, but in the meantime let's fix the ones already in place. Discussion:
  • Enforce our convention about max number of parallel regression tests. We have a very old rule that parallel_schedule should have no more than twenty tests in any one parallel group, so as to provide a bound on the number of concurrently running processes needed to pass the tests. But people keep forgetting the rule, so let's add a few lines of code to check it. Discussion:
  • Improve pg_regress's error reporting for schedule-file problems. The previous coding here trashed the line buffer as it scanned it, making it impossible to print the source line in subsequent error messages. With a few save/restore/strdup pushups we can improve that situation. In passing, move the free'ing of the various strings that are collected while processing one set of tests down to the bottom of the loop. That's simpler, less surprising, and should make valgrind less unhappy about the strings that were previously leaked by the last iteration.
  • Reduce "X = X" to "X IS NOT NULL", if it's easy to do so. If the operator is a strict btree equality operator, and X isn't volatile, then the clause must yield true for any non-null value of X, or null if X is null. At top level of a WHERE clause, we can ignore the distinction between false and null results, so it's valid to simplify the clause to "X IS NOT NULL". This is a useful improvement mainly because we'll get a far better selectivity estimate in most cases. Because such cases seldom arise in well-written queries, it is unappetizing to expend a lot of planner cycles looking for them ... but it turns out that there's a place we can shoehorn this in practically for free, because equivclass.c already has to detect and reject candidate equivalences of the form X = X. That doesn't catch every place that it would be valid to simplify to X IS NOT NULL, but it catches the typical case. Working harder doesn't seem justified. Patch by me, reviewed by Petr Jelinek Discussion:
  • Increase distance between flush requests during bulk file copies. copy_file() reads and writes data 64KB at a time (with default BLCKSZ), and historically has issued a pg_flush_data request after each write. This turns out to interact really badly with macOS's new APFS file system: a large file copy takes over 100X longer than it ought to on APFS, as reported by Brent Dearth. While that's arguably a macOS bug, it's not clear whether Apple will do anything about it in the near future, and in any case experimentation suggests that issuing flushes a bit less often can be helpful on other platforms too. Hence, rearrange the logic in copy_file() so that flush requests are issued once per N writes rather than every time through the loop. I set the FLUSH_DISTANCE to 32MB on macOS (any less than that still results in a noticeable speed degradation on APFS), but 1MB elsewhere. In limited testing on Linux and FreeBSD, this seems slightly faster than the previous code, and certainly no worse. It helps noticeably on macOS even with the older HFS filesystem. A simpler change would have been to just increase the size of the copy buffer without changing the loop logic, but that seems likely to trash the processor cache without really helping much. Back-patch to 9.6 where we introduced msync() as an implementation option for pg_flush_data(). The problem seems specific to APFS's mmap/msync support, so I don't think we need to go further back. Discussion:

Robert Haas pushed:

Pending Patches

Tom Lane sent in another revision of a patch to improve eval_const_expressions.

Michaël Paquier sent in another revision of a patch to change detection of corrupted 2PC files to produce a FATAL error and minimize the window between history file and end-of-recovery record.

Claudio Freire sent in another revision of a patch to enable VACUUM to use over 1GB of work_mem.

Emre Hasegeli sent in another revision of a patch to improve geometric types.

Robert Haas sent in four revisions of a patch to widen queryId to 64 bits.

Vik Fearing sent in a patch to log idle checkpoints.

Andres Freund sent in a patch to combine expr{Type,Typmod,Collation}() into one function.

Andres Freund sent in a patch to add pg_strnlen(), a portable implementation of strnlen, and fix pnstrdup() to not memcpy() the maximum allowed length when the source string is shorter.

Ildar Musin sent in another revision of a patch to factor out some repetitive code in RI triggers.

Yura Sokolov sent in two revisions of a patch to make CheckDeadlock do two passes in order to prevent a deadlock condition it could itself cause under high load.

Andres Freund sent in another revision of a patch to add configure infrastructure to detect support for C99's restrict, make it possible to avoid NUL-byte management for stringinfos and use that in format.c, add more efficient functions to the pqformat API, use one stringbuffer for all rows printed in printtup.c, improve the performance of SendRowDescriptionMessage, and replace remaining printtup uses of pq_sendint with pq_sendintXX.

Alexander Korotkov sent in a patch to add TOAST to all system catalog tables with ACL.

Nico Williams sent in three revisions of a patch to add an ALWAYS DEFERRED option for constraints.

Amit Kapila sent in two more revisions of a patch to parallelize queries containing subplans.

Petr Jelínek sent in a patch to fix an issue in logical replication by assigning GetCurrentCommandId in logical replication's estate.

Alexander Kuzmenkov sent in another revision of a patch to enable a full merge join on a comparison clause.

Amit Khandekar sent in another revision of a patch to enable UPDATEs on partition keys in declaratively partitioned tables that would have the effect of moving tuples from one partition to another.

Nathan Bossart sent in a patch to add additional logging for VACUUM and ANALYZE.

Sean Chittenden sent in two more revisions of a patch to fix a bug where the system failed to use optimized shared memory on Solaris.

Andres Freund sent in another revision of a patch to implement JIT compiling.

Badrul Chowdhury sent in two revisions of a patch to implement wire protocol negotiation.

Vaishnavi Prabakaran sent in another revision of a patch to add pipelining batch support to libpq.

Masahiko Sawada sent in another revision of a patch to implement block-level parallel VACUUM.

Thomas Munro sent in another revision of a patch to remove BufFile's isTemp flag and add BufFileSet for sharing temporary files between backends.

Robert Haas sent in another revision of a patch to change walwriter wakeup.

Shubham Barai sent in another revision of a patch to add predicate locking for GiST indexes.

Fabrízio de Royes Mello sent in another revision of a patch to add hooks for session start and end.

Amit Langote sent in a patch to improve error message for check_default_allows_bound.

Peter Geoghegan sent in another revision of a patch to add a Bloom filter data structure implementation and use same to add amcheck verification of indexes against heap.

Craig Ringer sent in a patch to expose the generate_qualified_relation_name functionality to C by adding get_qualified_relation_name().

Jing Wang sent in another revision of a patch to support COMMENT ON DATABASE CURRENT_DATABASE.

Dilip Kumar sent in two more revisions of a patch to improve bitmap costing for lossy pages.

Tom Lane sent in a patch to enforce max test parallelism.