Applied Patches

Andrew Dunstan pushed:

Tom Lane pushed:

  • Make construct_[md_]array return a valid empty array for zero-size input. If construct_array() or construct_md_array() were given a dimension of zero, they'd produce an array that contains no elements but has positive dimension. This violates a general expectation that empty arrays should have ndims = 0; in particular, while arrays like this print as empty, they don't compare equal to other empty arrays. Up to now we've expected callers to avoid making such calls and instead be careful to call construct_empty_array() if there would be no elements. But this has always been an easily missed case, and we've repeatedly had to fix callers to do it right. In bug #14826, Erwin Brandstetter pointed out yet another such oversight, in ts_lexize(); and a bit of examination of other call sites found at least two more with similar issues. So let's fix the problem centrally and permanently by changing these two functions to construct a proper zero-D empty array whenever the array would be empty. This renders a few explicit calls of construct_empty_array() redundant, but the only such place I found that really seemed worth changing was in ExecEvalArrayExpr(). Although this fixes some very old bugs, no back-patch: the problem is pretty minor and the risk of changing behavior seems to outweigh the benefit in stable branches. Discussion:
  • Avoid SIGBUS on Linux when a DSM memory request overruns tmpfs. On Linux, shared memory segments created with shm_open() are backed by swap files created in tmpfs. If the swap file needs to be extended, but there's no tmpfs space left, you get a very unfriendly SIGBUS trap. To avoid this, force allocation of the full request size when we create the segment. This adds a few cycles, but none that we wouldn't expend later anyway, assuming the request isn't hugely bigger than the actual need. Make this code #ifdef __linux__, because (a) there's not currently a reason to think the same problem exists on other platforms, and (b) applying posix_fallocate() to an FD created by shm_open() isn't very portable anyway. Back-patch to 9.4 where the DSM code came in. Thomas Munro, per a bug report from Amul Sul Discussion:
  • Use a blacklist to distinguish original from add-on enum values. Commit 15bc038f9 allowed ALTER TYPE ADD VALUE to be executed inside transaction blocks, by disallowing the use of the added value later in the same transaction, except under limited circumstances. However, the test for "limited circumstances" was heuristic and could reject references to enum values that were created during CREATE TYPE AS ENUM, not just later. This breaks the use-case of restoring pg_dump scripts in a single transaction, as reported in bug #14825 from Balazs Szilfai. We can improve this by keeping a "blacklist" table of enum value OIDs created by ALTER TYPE ADD VALUE during the current transaction. Any visible-but-uncommitted value whose OID is not in the blacklist must have been created by CREATE TYPE AS ENUM, and can be used safely because it could not have a lifespan shorter than its parent enum type. This change also removes the restriction that a renamed enum value can't be used before being committed (unless it was on the blacklist). Andrew Dunstan, with cosmetic improvements by me. Back-patch to v10. Discussion:
  • Remove heuristic same-transaction test from check_safe_enum_use(). The blacklist mechanism added by the preceding commit directly fixes most of the practical cases that the same-transaction test was meant to cover. What remains is use-cases like begin; create type e as enum('x'); alter type e add value 'y'; -- use 'y' somehow commit; However, because the same-transaction test is heuristic, it fails on small variants of that, such as renaming the type or changing its owner. Rather than try to explain the behavior to users, let's remove it and just have a rule that the newly added value can't be used before being committed, full stop. Perhaps later it will be worth the implementation effort and overhead to have a more accurate test for type-was-created-in-this-transaction. We'll wait for some field experience with v10 before deciding to do that. Back-patch to v10. Discussion:
  • Fix failure-to-read-man-page in commit 899bd785c. posix_fallocate() is not quite a drop-in replacement for fallocate(), because it is defined to return the error code as its function result, not in "errno". I (tgl) missed this because RHEL6's version seems to set errno as well. That is not the case on more modern Linuxen, though, as per buildfarm results. Aside from fixing the return-convention confusion, remove the test for ENOSYS; we expect that glibc will mask that for posix_fallocate, though it does not for fallocate. Keep the test for EINTR, because POSIX specifies that as a possible result, and buildfarm results suggest that it can happen in practice. Back-patch to 9.4, like the previous commit. Thomas Munro Discussion:
  • Improve wording of error message added in commit 714805010. Per suggestions from Peter Eisentraut and David Johnston. Back-patch, like the previous commit. Discussion:
  • Revert to 9.6 treatment of ALTER TYPE enumtype ADD VALUE. This reverts commit 15bc038f9, along with the followon commits 1635e80d3 and 984c92074 that tried to clean up the problems exposed by bug #14825. The result was incomplete because it failed to address parallel-query requirements. With 10.0 release so close upon us, now does not seem like the time to be adding more code to fix that. I hope we can un-revert this code and add the missing parallel query support during the v11 cycle. Back-patch to v10. Discussion:
  • Fix behavior when converting a float infinity to numeric. float8_numeric() and float4_numeric() failed to consider the possibility that the input is an IEEE infinity. The results depended on the platform-specific behavior of sprintf(): on most platforms you'd get something like ERROR: invalid input syntax for type numeric: "inf" but at least on Windows it's possible for the conversion to succeed and deliver a finite value (typically 1), due to a nonstandard output format from sprintf and lack of syntax error checking in these functions. Since our numeric type lacks the concept of infinity, a suitable conversion is impossible; the best thing to do is throw an explicit error before letting sprintf do its thing. While at it, let's use snprintf not sprintf. Overrunning the buffer should be impossible if sprintf does what it's supposed to, but this is cheap insurance against a stack smash if it doesn't. Problem reported by Taiki Kondo. Patch by me based on fix suggestion from KaiGai Kohei. Back-patch to all supported branches. Discussion:
  • Marginal improvement for generated code in execExprInterp.c. Avoid the coding pattern "*op->resvalue = f();", as some compilers think that requires them to evaluate "op->resvalue" before the function call. Unless there are lots of free registers, this can lead to a useless register spill and reload across the call. I changed all the cases like this in ExecInterpExpr(), but didn't bother in the out-of-line opcode eval subroutines, since those are presumably not as performance-critical. Discussion:
  • Fix inadequate locking during get_rel_oids(). get_rel_oids used to not take any relation locks at all, but that stopped being a good idea with commit 3c3bb9933, which inserted a syscache lookup into the function. A concurrent DROP TABLE could now produce "cache lookup failed", which we don't want to have happen in normal operation. The best solution seems to be to transiently take a lock on the relation named by the RangeVar (which also makes the result of RangeVarGetRelid a lot less spongy). But we shouldn't hold the lock beyond this function, because we don't want VACUUM to lock more than one table at a time. (That would not be a big problem right now, but it will become one after the pending feature patch to allow multiple tables to be named in VACUUM.) In passing, adjust vacuum_rel and analyze_rel to document that we don't trust the passed RangeVar to be accurate, and allow the RangeVar to possibly be NULL --- which it is anyway for a whole-database VACUUM, though we accidentally didn't crash for that case. The passed RangeVar is in fact inaccurate when dealing with a child partition, as of v10, and it has been wrong for a whole long time in the case of vacuum_rel() recursing to a TOAST table. None of these things present visible bugs up to now, because the passed RangeVar is in fact only consulted for autovacuum logging, and in that particular context it's always accurate because autovacuum doesn't let vacuum.c expand partitions nor recurse to toast tables. Still, this seems like trouble waiting to happen, so let's nail the door at least partly shut. (Further cleanup is planned, in HEAD only, as part of the pending feature patch.) Fix some sadly inaccurate/obsolete comments too. Back-patch to v10. Michael Paquier and Tom Lane Discussion:
  • Support arrays over domains. Allowing arrays with a domain type as their element type was left un-done in the original domain patch, but not for any very good reason. This omission leads to such surprising results as array_agg() not working on a domain column, because the parser can't identify a suitable output type for the polymorphic aggregate. In order to fix this, first clean up the APIs of coerce_to_domain() and some internal functions in parse_coerce.c so that we consistently pass around a CoercionContext along with CoercionForm. Previously, we sometimes passed an "isExplicit" boolean flag instead, which is strictly less information; and coerce_to_domain() didn't even get that, but instead had to reverse-engineer isExplicit from CoercionForm. That's contrary to the documentation in primnodes.h that says that CoercionForm only affects display and not semantics. I don't think this change fixes any live bugs, but it makes things more consistent. The main reason for doing it though is that now build_coercion_expression() receives ccontext, which it needs in order to be able to recursively invoke coerce_to_target_type(). Next, reimplement ArrayCoerceExpr so that the node does not directly know any details of what has to be done to the individual array elements while performing the array coercion. Instead, the per-element processing is represented by a sub-expression whose input is a source array element and whose output is a target array element. This simplifies life in parse_coerce.c, because it can build that sub-expression by a recursive invocation of coerce_to_target_type(). The executor now handles the per-element processing as a compiled expression instead of hard-wired code. The main advantage of this is that we can use a single ArrayCoerceExpr to handle as many as three successive steps per element: base type conversion, typmod coercion, and domain constraint checking. 
The old code used two stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty inefficient, and adding yet another array deconstruction to do domain constraint checking seemed very unappetizing. In the case where we just need a single, very simple coercion function, doing this straightforwardly leads to a noticeable increase in the per-array-element runtime cost. Hence, add an additional shortcut evalfunc in execExprInterp.c that skips unnecessary overhead for that specific form of expression. The runtime speed of simple cases is within 1% or so of where it was before, while cases that previously required two levels of array processing are significantly faster. Finally, create an implicit array type for every domain type, as we do for base types, enums, etc. Everything except the array-coercion case seems to just work without further effort. Tom Lane, reviewed by Andrew Dunstan Discussion:
  • Fix pg_dump to assign domain array type OIDs during pg_upgrade. During a binary upgrade, all type OIDs are supposed to be assigned by pg_dump based on their values in the old cluster. But now that domains have arrays, there's nothing to base the arrays' type OIDs on, if we're upgrading from a pre-v11 cluster. Make pg_dump search for an unused type OID to use for this purpose. Per buildfarm. Discussion:
  • Use a longer connection timeout in pg_isready test. Buildfarm members skink and sungazer have both recently failed this test, with symptoms indicating that the default 3-second timeout isn't quite enough for those very slow systems. There's no reason to be miserly with this timeout, so boost it to 60 seconds. Back-patch to all versions containing this test. That may be overkill, because the failure has only been observed in the v10 branch, but I don't feel like having to revisit this later.
  • Update v10 release notes, and set the official release date. Last(?) round of changes for 10.0.
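
The float-to-numeric item above rejects IEEE infinity explicitly before any text formatting takes place, instead of relying on what the platform's sprintf() prints. A minimal Python sketch of that ordering follows. It is only a model of the idea, not the PostgreSQL C code: the function name float_to_numeric, the error text, and the use of Decimal as a stand-in for numeric are all assumptions made for illustration.

```python
import math
from decimal import Decimal


def float_to_numeric(value: float) -> Decimal:
    """Convert a float to a decimal, refusing infinity before formatting.

    Hypothetical stand-in for the float8 -> numeric conversion; Decimal
    plays the role of a numeric type that has no infinity.
    """
    if math.isnan(value):
        # NaN is passed through as a decimal NaN in this sketch.
        return Decimal("NaN")
    if math.isinf(value):
        # Fail explicitly instead of letting the formatter emit "inf"
        # (or something platform-specific) and parsing that back.
        raise ValueError("cannot convert infinity to numeric")
    # Bounded formatting with 15 significant digits, then parse the text.
    return Decimal(format(value, ".15g"))
```

The check comes first so that the result never depends on how a particular formatter spells infinity.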

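The construct_[md_]array item in Tom Lane's list above moves the empty-array special case out of the callers and into the constructor. The Python sketch below models that design choice under stated assumptions: ArrayValue is a hypothetical record, and the two functions only borrow the names from the commit message without reproducing the C signatures.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ArrayValue:
    """Hypothetical array record: dimension count, sizes, flat elements."""
    ndims: int
    dims: list
    elems: list


def construct_empty_array() -> ArrayValue:
    # The canonical empty array has zero dimensions.
    return ArrayValue(ndims=0, dims=[], elems=[])


def construct_md_array(elems: list, dims: list) -> ArrayValue:
    # Total element count implied by the requested dimensions.
    nelems = prod(dims) if dims else 0
    if nelems == 0:
        # Centralized fix: any zero-size request yields the canonical
        # zero-D empty array, so callers need no special case.
        return construct_empty_array()
    if len(elems) != nelems:
        raise ValueError("element count does not match dimensions")
    return ArrayValue(ndims=len(dims), dims=list(dims), elems=list(elems))
```

With this shape, an array built from a zero dimension compares equal to any other empty array, which is the property the commit message says was violated before.
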
Robert Haas pushed:

Peter Eisentraut pushed:

  • Handle heap rewrites better in logical replication. A FOR ALL TABLES publication naturally considers all base tables to be candidates for replication. This includes transient heaps that are created during a table rewrite during DDL. This causes failures on the subscriber side because it will not have a table like pg_temp_16386 to receive data (and if it did, it would be the wrong table). To prevent this problem, we filter out any tables that match this naming pattern and match an actual table from FOR ALL TABLES publications. This is only a heuristic, meaning that user tables that match that naming could accidentally be omitted. A more robust solution might require an explicit marking of such tables in pg_class somehow. Reported-by: yxq Bug: #14785 Reviewed-by: Andres Freund Reviewed-by: Petr Jelinek
  • Sort pg_basebackup options better. The --slot option somehow ended up under options controlling the output, and some other options were in a nonsensical place or were not moved after recent renamings, so tidy all that up a bit.
  • Turn on log_replication_commands in PostgresNode. This is useful for example for the pg_basebackup and related tests.
  • Add some more pg_receivewal tests. Add some more tests for the --create-slot and --drop-slot options, verifying that the right kind of slot was created and that the slot was dropped. While working on an unrelated patch for pg_basebackup, some of this was temporarily broken without any tests noticing.
  • pg_basebackup: Add option to create replication slot. When requesting a particular replication slot, the new pg_basebackup option -C/--create-slot creates it before starting to replicate from it. Further refactor the slot creation logic to include the temporary slot creation logic into the same function. Add new arguments is_temporary and preserve_wal to CreateReplicationSlot(). Print in --verbose mode that a slot has been created. Author: Michael Banck
  • Get rid of parameterized marked sections in SGML. Previously, we created a variant of the installation instructions for producing the plain-text INSTALL file by marking up certain parts of installation.sgml using SGML parameterized marked sections. Marked sections will not work anymore in XML, so before we can convert the documentation to XML, we need a new approach. DocBook provides a "profiling" feature that allows selecting content based on attributes, which would work here. But it imposes a noticeable overhead when building the full documentation and causes complications when building some output formats, and given that we recently spent a fair amount of effort optimizing the documentation build time, it seems sad to have to accept that. So as an alternative, (1) we create our own mini-profiling layer that adjusts just the text we want, and (2) assemble the pieces of content that we want in the INSTALL file using XInclude. That way, there is no overhead when building the full documentation and most of the "ugly" stuff in installation.sgml can be removed and dealt with out of line.
  • Fix plperl build. The changes in 639928c988c1c2f52bbe7ca89e8c7c78a041b3e2 turned out to require Perl 5.9.3, which is newer than our minimum required version. So revert back to the old code for the normal case and only use the new variant when both coverage and vpath are used. As the minimum Perl version moves forward, we can drop the old code sometime.
  • Improve vpath support in plperl build. Run xsubpp with the -output option instead of redirecting stdout. That ensures that the #line directives in the output file point to the right place in a vpath build. This in turn fixes an error in coverage builds that it can't find the source files. Refactor the makefile rules while we're here. Reviewed-by: Michael Paquier
  • Run only top-level recursive lcov. This is the way lcov was intended to be used. It is much faster and more robust and makes the makefiles simpler than running it in each subdirectory. The previous coding ran gcov before lcov, but that is useless because lcov/geninfo call gcov internally and use that information. Moreover, this led to complications and failures during parallel make. This separates the two targets: You either use "make coverage" to get textual output from gcov or "make coverage-html" to get an HTML report via lcov. (Using both is still problematic because they write the same output files.) Reviewed-by: Michael Paquier
  • Have lcov exclude external files. Call lcov with --no-external option to exclude external files (for example, system headers with inline functions) from output. Reviewed-by: Michael Paquier
  • Remove SGML marked sections. For XML compatibility, replace marked sections <![IGNORE[ ]]> with comments <!-- -->. In some cases it seemed better to remove the ignored text altogether, and in one case the text should not have been ignored.
  • Add lcov --initial. By just running lcov on the produced .gcda data files, we don't account for source files that are not touched by tests at all. To fix that, run lcov --initial to create a base line info file with all zero counters, and merge that with the actual counters when creating the final report. Reviewed-by: Michael Paquier
  • Add PostgreSQL version to coverage output. Also make overriding the title easier. That helps telling where the report came from and labeling different variants of a report. Reviewed-by: Michael Paquier
  • Add background worker type. Add bgw_type field to background worker structure. It is intended to be set to the same value for all workers of the same type, so they can be grouped in pg_stat_activity, for example. The backend_type column in pg_stat_activity now shows bgw_type for a background worker. The ps listing also no longer calls out that a process is a background worker but just shows the bgw_type. That way, being a background worker is more of an implementation detail now that is not shown to the user. However, most log messages still refer to 'background worker "%s"'; otherwise constructing sensible and translatable log messages would become tricky. Reviewed-by: Michael Paquier Reviewed-by: Daniel Gustafsson
  • psql: Update \d sequence display. For \d sequencename, the psql code just did SELECT * FROM sequencename to get the information to display, but this does not contain much interesting information anymore in PostgreSQL 10, because the metadata has been moved to a separate system catalog. This patch creates a newly designed sequence display that is not merely an extension of the general relation/table display as it was previously. Reviewed-by: Fabien COELHO
  • Use Py_RETURN_NONE where suitable. This is more idiomatic style and available as of Python 2.4, which is our minimum.
  • Add list of acknowledgments to release notes. This contains all individuals mentioned in the commit messages during PostgreSQL 10 development. Current through babf18579455e85269ad75e1ddb03f34138f77b6. Discussion:
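
The "Add lcov --initial" item above merges a baseline of all-zero counters with the counters measured during the test run, so that source files no test touched still appear in the report. The Python sketch below models only that merge step. The nested-dictionary layout and the function name merge_coverage are assumptions for illustration and do not reflect lcov's tracefile format.

```python
def merge_coverage(baseline: dict, measured: dict) -> dict:
    """Merge per-file line hit counts.

    Both arguments map a file path to {line_number: hit_count}.
    `baseline` models the all-zero data produced by an initial capture,
    `measured` models the counters collected after running the tests.
    """
    # Start from a copy of the baseline so untouched files survive.
    merged = {path: dict(lines) for path, lines in baseline.items()}
    for path, lines in measured.items():
        target = merged.setdefault(path, {})
        for line, hits in lines.items():
            # Counters for the same line are summed.
            target[line] = target.get(line, 0) + hits
    return merged
```

A file that exists only in the baseline keeps its zero counters, which is what makes it show up as uncovered instead of being missing from the report.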

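The logical replication item at the top of Peter Eisentraut's list describes a heuristic filter: a table is skipped when its name follows the pg_temp_NNN pattern and NNN matches an actual table. A rough Python sketch of that rule follows. The function names and the idea of passing a set of existing table OIDs are assumptions for illustration, not the server's implementation.

```python
import re

# Name pattern of transient heaps created during a table rewrite.
_TRANSIENT_HEAP = re.compile(r"pg_temp_(\d+)")


def is_rewrite_heap(relname: str, existing_oids: set) -> bool:
    """True if relname looks like pg_temp_<oid> and <oid> is a real table."""
    match = _TRANSIENT_HEAP.fullmatch(relname)
    return match is not None and int(match.group(1)) in existing_oids


def publishable_tables(relnames: list, existing_oids: set) -> list:
    """Drop transient rewrite heaps from a FOR ALL TABLES candidate list."""
    return [name for name in relnames
            if not is_rewrite_heap(name, existing_oids)]
```

As the commit message notes, this is only a heuristic: a user table that happens to be named pg_temp_16386 while OID 16386 exists would be filtered out too.
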
Noah Misch pushed:

Dean Rasheed pushed:

Álvaro Herrera pushed:

  • Fix freezing of a dead HOT-updated tuple. Vacuum calls page-level HOT prune to remove dead HOT tuples before doing liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But concurrent transaction commit/abort may turn DEAD some of the HOT tuples that survived the prune, before HeapTupleSatisfiesVacuum tests them. This happens to activate the code that decides to freeze the tuple ... which resuscitates it, duplicating data. (This is especially bad if there are any unique constraints, because those are now internally violated due to the duplicate entries, though you won't know until you try to REINDEX or dump/restore the table.) One possible fix would be to simply skip doing anything to the tuple, and hope that the next HOT prune would remove it. But there is a problem: if the tuple is older than freeze horizon, this would leave an unfrozen XID behind, and if no HOT prune happens to clean it up before the containing pg_clog segment is truncated away, it'd later cause an error when the XID is looked up. Fix the problem by having the tuple freezing routines cope with the situation: don't freeze the tuple (and keep it dead). In the cases that the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag so that there is no need to look up the XID in pg_clog later on. An isolation test is included, authored by Michael Paquier, loosely based on Daniel Wood's original reproducer. It only tests one particular scenario, though, not all the possible ways for this problem to surface; it would be good to have a more reliable way to test this more fully, but it'd require more work. In message I outlined another test case (more closely matching Dan Wood's) that exposed a few more ways for the problem to occur. Backpatch all the way back to 9.3, where this problem was introduced by multixact juggling. In branches 9.3 and 9.4, this includes a backpatch of commit e5ff9fefcd50 (of 9.5 era), since the original is not correctable without matching the coding pattern in 9.5 up. Reported-by: Daniel Wood Diagnosed-by: Daniel Wood Reviewed-by: Yi Wen Wong, Michaël Paquier Discussion:

Andres Freund pushed:

Heikki Linnakangas pushed:

Pending Patches

Rafia Sabih sent in another revision of a patch to speed up gather.

Alexander Kuzmenkov sent in another revision of a patch to implement CSN-based snapshots.

Doug Rady sent in a patch to pgbench to break out timing data for the initialization phases.

Doug Rady sent in a patch to enable building pgbench to use ppoll() instead of select() to allow for more than (FD_SETSIZE - 10) connections.
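
The motivation for moving away from select() is that it is bounded by FD_SETSIZE, while the poll() family takes a list of descriptors with no such fixed cap. The Python sketch below shows the poll-style pattern only as an analogy: Python's standard library exposes poll() rather than ppoll(), select.poll is not available on every platform, and the helper name wait_readable is hypothetical. It is not pgbench code.

```python
import select
import socket


def wait_readable(socks: list, timeout_ms: int = 1000) -> list:
    """Return the sockets that have data to read, using poll()."""
    poller = select.poll()
    by_fd = {}
    for sock in socks:
        # Register each descriptor individually; there is no fixed-size
        # descriptor set as there is with select().
        poller.register(sock.fileno(), select.POLLIN)
        by_fd[sock.fileno()] = sock
    events = poller.poll(timeout_ms)
    return [by_fd[fd] for fd, event in events if event & select.POLLIN]
```

A caller would register all client connections once and then loop on the returned list.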

Thomas Munro sent in another revision of a patch to get parallel queries working with SERIALIZABLE isolation mode.

Amit Langote sent in two more revisions of a patch to set pd_lower correctly in the GIN, BRIN, and SP-GiST metapages.

Peter Geoghegan sent in a patch to consistently canonicalize ICU collations' collcollate as BCP 47.

Maksim Milyutin sent in two revisions of a patch to fix a cache invalidation bug which manifests in queries that contain constants of a temporary composite type.

Shubham Barai sent in another revision of a patch to implement predicate locking for hash indexes.

Michaël Paquier sent in a patch to shore up some shaky coding for vacuuming partitioned relations.

Chen Huajun sent in six more revisions of a patch to make pg_rewind not copy useless WAL files.

Michaël Paquier sent in another revision of a patch to remove ALLOW_DANGEROUS_LO_FUNCTIONS for LO-related superuser checks, replace superuser checks of large object import/export with ACL checks, and move ACL checks for large objects when opening them.

Beena Emerson sent in a PoC patch to implement runtime partition pruning.

Amit Langote sent in a patch to move certain partitioning code to the executor.

Amit Langote sent in a patch to teach ValidatePartitionConstraints to skip validation in more cases and skip scanning default partition's child tables if possible.

Haribabu Kommi sent in another revision of a patch to add a pg_stat_walwrites statistics view.

Amul Sul sent in a patch to restrict concurrent update/delete with UPDATE of a partition key.

Emre Hasegeli sent in another revision of a patch to refactor the geometric functions and operators code, provide a header file for the built-in float datatypes, use the built-in float datatype to implement geometric types, and fix some obvious problems around the line datatype.

Pavel Stěhule sent in two more revisions of a patch to add default namespaces for XPath expressions.

Michaël Paquier sent in a patch to fix an infelicity in the use of RangeVar for partitioned tables in autovacuum.

Jeevan Chalke sent in another revision of a patch to implement partition-wise aggregation/grouping.

Kyotaro HORIGUCHI and Yura Sokolov traded patches to add a failing test (wal_sender_timeout plus logical decoding of a big transaction) and to fix walsender timeouts when decoding a large transaction.

Nathan Bossart sent in three more revisions of a patch to enable specifying multiple tables in VACUUM.

Stas Kelvich sent in another revision of a patch to fix an issue with transactions involving multiple postgres foreign servers by adding a contrib extension called fdw_transaction_resovler.

Amit Langote sent in another revision of a patch to make planner-side changes for partition-pruning, interface changes for partition_bound_{cmp/bsearch}, implement get_partitions_for_keys(), and add more tests for the new partitioning-related planning code.

Amul Sul and Amit Langote traded patches to implement hash partitioning.

Etsuro Fujita sent in a patch to change postgresPlanForeignModify so that it handles "with check option" the same way as for the RETURNING case.

Alexander Kuzmenkov sent in another revision of a patch to implement full merge join on comparison clause.

Tom Lane sent in a patch to modify eqjoinsel_semi by replacing the previous number-of-distinct-values estimate for the inner rel with inner_rel->rows, effectively assuming that the inside of the IN or EXISTS is unique, and dropping the fallback to selectivity 0.5 altogether, instead applying the nd1 vs nd2 heuristic all the time.

Jesper Pedersen sent in a patch to change the message for restarting a server from a directory without a PID file to account for the case where a restart happens after an initdb.

Andres Freund sent in a patch to speed up fmgr_isbuiltin() by keeping an oid -> builtin mapping.

Fabien COELHO sent in a patch to fix an issue where pgbench would fail but get stuck with 100% CPU usage.

Konstantin Knizhnik sent in two revisions of a patch to add pg_prepared_xact_status().

Amit Khandekar sent in another revision of a patch to implement parallel append.

Martin Marques sent in two revisions of a patch to add an option to pg_basebackup to output messages as if it were running in batch mode, as opposed to running in a tty.

Nikolay Shaplov sent in a patch to add a series of tests that triggers reloptions related code in all access methods.

Shubham Barai sent in another revision of a patch to implement predicate locking for GIN indexes.

Nikita Glukhov sent in another revision of a patch to implement SQL/JSON.