Correctifs appliqués

Peter Eisentraut pushed:

Heikki Linnakangas pushed:

Tom Lane pushed:

  • In PQsendQueryStart(), avoid leaking any left-over async result. Ordinarily there would not be an async result sitting around at this point, but it appears that in corner cases there can be. Considering all the work we're about to launch, it's hardly going to cost anything noticeable to check. It's been like this forever, so back-patch to all supported branches. Report: <>
  • Update user docs for switch to POSIX semaphores. Since commit ecb0d20a9 hasn't crashed and burned, here's the promised docs update for it. In addition to explaining that Linux and FreeBSD ports now use POSIX semaphores, I did some wordsmithing on pre-existing wording; in particular trying to clarify which SysV parameters need to be set with an eye to total usage across all applications.
  • Improve documentation for CREATE RECURSIVE VIEW. It was perhaps not entirely clear that internal self-references shouldn't be schema-qualified even if the view name is written with a schema. Spell it out. Discussion: <>
  • Docs: grammatical fix. Fix poor grammar introduced in 741ccd501.
  • Remove "sco" and "unixware" ports. SCO OpenServer and SCO UnixWare are more or less dead platforms. We have never had a buildfarm member testing the "sco" port, and the last "unixware" member was last heard from in 2012, so it's fair to doubt that the code even compiles anymore on either one. Remove both ports. We can always undo this if someone shows up with an interest in maintaining and testing these platforms. Discussion: <>
  • Drop server support for FE/BE protocol version 1.0. While this isn't a lot of code, it's been essentially untestable for a very long time, because libpq doesn't support anything older than protocol 2.0, and has not since release 6.3. There's no reason to believe any other client-side code still uses that protocol, either. Discussion: <>
  • Provide DLLEXPORT markers for C functions via PG_FUNCTION_INFO_V1 macro. This isn't really necessary for our own code, because we use a .DEF file in MSVC builds (see, or --export-all-symbols in MinGW and Cygwin builds, to ensure that all global symbols in loadable modules will be exported on Windows. However, third-party authors might use different build processes that need this marker, and it's harmless enough for our own builds. To some extent, this is an oversight in commit e7128e8db, so back-patch to 9.4 where that was added. Laurenz Albe. Discussion: <>
  • Remove unnecessary int2vector-specific hash function and equality operator. These functions were originally added in commit d8cedf67a to support use of int2vector columns as catcache lookup keys. However, there are no catcaches that use such columns. (Indeed I now think it must always have been dead code: a catcache with such a key column would need an underlying unique index on the column, but we've never had an int2vector btree opclass.) Getting rid of the int2vector-specific operator and function does not lose any functionality, because operations on int2vectors will now fall back to the generic anyarray support. This avoids a wart that a btree index on an int2vector column (made using anyarray_ops) would fail to match equality searches, because int2vectoreq wasn't a member of the opclass. We don't really care much about that, since int2vector is not meant as a type for users to use, but it's silly to have extra code and less functionality. If we ever do want a catcache to be indexed by an int2vector column, we'd need to put back full btree and hash opclasses for int2vector, comparable to the support for oidvector. (The anyarray code can't be used at such a low level, because it needs to do catcache lookups.) But we'll deal with that if/when the need arises. Also worth noting is that removal of the hash int2vector_ops opclass will break any user-created hash indexes on int2vector columns. While hash anyarray_ops would serve the same purpose, it would probably not compute the same hash values and thus wouldn't be on-disk-compatible. Given that int2vector isn't a user-facing type and we're planning other incompatible changes in hash indexes for v10 anyway, this doesn't seem like something to worry about, but it's probably worth mentioning here. Amit Langote Discussion: <>
  • Remove pg_dump/pg_dumpall support for dumping from pre-8.0 servers. The need for dumping from such ancient servers has decreased to about nil in the field, so let's remove all the code that catered to it. Aside from removing a lot of boilerplate variant queries, this allows us to not have to cope with servers that don't have (a) schemas or (b) pg_depend. That means we can get rid of assorted squishy code around that. There may be some nonobvious additional simplifications possible, but this patch already removes about 1500 lines of code. I did not remove the ability for pg_restore to read custom-format archives generated by these old versions (and light testing says that that does still work). If you have an old server, you probably also have a pg_dump that will work with it; but you have an old custom-format backup file, that might be all you have. It'd be possible at this point to remove fmtQualifiedId()'s version argument, but I refrained since that would affect code outside pg_dump. Discussion: <>
  • pg_dump's getTypes() needn't retrieve typinput or typoutput anymore. Commit 64f3524e2 removed the stanza of code that examined these values. I failed to notice they were unnecessary because my compiler didn't warn about the un-read variables. Noted by Peter Eisentraut.
  • Revert addition of PGDLLEXPORT in PG_FUNCTION_INFO_V1 macro. This turns out not to be as harmless as I thought: MSVC will complain if it sees an "extern" declaration without PGDLLEXPORT and then one with. (Seems fairly silly, given that this can be changed after the fact by the linker, but there you have it.) Therefore, contrib modules that have extern's for V1 functions in header files are falling over in the buildfarm, since none of those externs are marked PGDLLEXPORT. We might or might not conclude that we're willing to plaster those declarations with PGDLLEXPORT in HEAD, but in any case there's no way we're going to ship this change in the back branches. Third-party authors would not thank us for breaking their code in a minor release. Hence, revert the addition of PGDLLEXPORT (but let's keep the extra info in the comment). If we do the other changes we can revert this commit in HEAD. Per buildfarm.
  • Fix broken jsonb_set() logic for replacing array elements. Commit 0b62fd036 did a fairly sloppy job of refactoring setPath() to support jsonb_insert() along with jsonb_set(). In its defense, though, there was no regression test case exercising the case of replacing an existing element in a jsonb array. Per bug #14366 from Peng Sun. Back-patch to 9.6 where bug was introduced. Report: <>
  • Fix pg_dumpall regression test to be locale-independent. The expected results in commit b4fc64578 seem to have been generated in a non-C locale, which just points up the fact that the ORDER BY clause was locale-sensitive. Per buildfarm.
  • Clean up handling of anonymous mmap'd shared-memory segment. Fix detaching of the mmap'd segment to have its own on_shmem_exit callback, rather than piggybacking on the one for detaching from the SysV segment. That was confusing, and given the distance between the two attach calls, it was trouble waiting to happen. Make the detaching calls idempotent by clearing AnonymousShmem to show we've already unmapped. I spent quite a bit of time yesterday trying to find a path that would allow the munmap()'s to be done twice, and while I did not succeed, it seems silly that there's even a question. Make the #ifdef logic less confusing by separating "do we want to use anonymous shmem" from EXEC_BACKEND. Even though there's no current scenario where those conditions are different, it is not helpful for different places in the same file to be testing EXEC_BACKEND for what are fundamentally different reasons. Don't do on_exit_reset() in StartBackgroundWorker(). At best that's useless (InitPostmasterChild would have done it already) and at worst it could zap some callback that's unrelated to shared memory. Improve comments, and simplify the huge_pages enablement logic slightly. Back-patch to 9.4 where hugepage support was introduced. Arguably this should go into 9.3 as well, but the code looks significantly different there, and I doubt it's worth the trouble of adapting the patch given I can't show a live bug.
  • Try to find out the actual hugepage size when making a MAP_HUGETLB request. Even if Linux's mmap() is okay with a partial-hugepage request, munmap() is not, as reported by Chris Richards. Therefore it behooves us to try a bit harder to find out the actual hugepage size, instead of assuming that we can skate by with a guess. For the moment, just look into /proc/meminfo to find out the default hugepage size, and use that. Later, on kernels that support requests for nondefault sizes, we might try to consider other alternatives. But that smells more like a new feature than a bug fix, especially if we want to provide any way for the DBA to control it, so leave it for another day. I set this up to allow easy addition of platform-specific code for non-Linux platforms, if needed; but right now there are no reports suggesting that we need to work harder on other platforms. Back-patch to 9.4 where hugepage support was introduced. Discussion: <>
  • Remove dead code in pg_dump. I'm not sure if this provision for "pg_backup" behaving a bit differently from "pg_dump" ever did anything useful in a released version. But it's definitely dead code now. Michael Paquier
  • Fix another bug in merging of inherited CHECK constraints. It's not good for an inherited child constraint to be marked connoinherit; that would result in the constraint not propagating to grandchild tables, if any are created later. The code mostly prevented this from happening but there was one case that was missed. This is somewhat related to commit e55a946a8, which also tightened checks on constraint merging. Hence, back-patch to 9.2 like that one. This isn't so much because there's a concrete feature-related reason to stop there, as to avoid having more distinct behaviors than we have to in this area. Amit Langote Discussion: <>
  • Fix handling of pgstat counters for TRUNCATE in a prepared transaction. pgstat_twophase_postcommit is supposed to duplicate the math in AtEOXact_PgStat, but it had missed out the bit about clearing t_delta_live_tuples/t_delta_dead_tuples for a TRUNCATE. It's harder than you might think to replicate the issue here, because those counters would only be nonzero when a previous transaction in the same backend had added/deleted tuples in the truncated table, and those counts hadn't been sent to the stats collector yet. Evident oversight in commit d42358efb. I've not added a regression test for this; we tried to add one in d42358efb, and had to revert it because it was too timing-sensitive for the buildfarm. Back-patch to 9.5 where d42358efb came in. Stas Kelvich Discussion: <>
  • Fix assorted integer-overflow hazards in varbit.c. bitshiftright() and bitshiftleft() would recursively call each other infinitely if the user passed INT_MIN for the shift amount, due to integer overflow in negating the shift amount. To fix, clamp to -VARBITMAXLEN. That doesn't change the results since any shift distance larger than the input bit string's length produces an all-zeroes result. Also fix some places that seemed inadequately paranoid about input typmods exceeding VARBITMAXLEN. While a typmod accepted by anybit_typmodin() will certainly be much less than that, at least some of these spots are reachable with user-chosen integer values. Andreas Seltenreich and Tom Lane Discussion: <>

Andres Freund pushed:

  • Make regression tests less dependent on hash table order. Upcoming changes to the hash table code used, among others, for grouping and set operations will change the output order for a few queries. To make it less likely that actual bugs are hidden between regression test ordering changes, and to make the tests robust against platform dependant ordering, add ORDER BYs guaranteeing the output order. As it's possible that some of the changes expose platform dependant ordering, push this earlier, to let the buildfarm shake out potentially unstable results. Discussion: <>
  • Fix further hash table order dependent tests. Similar to 0137caf273, this makes contrib and pl tests less dependant on hash-table order. After this commit, at least some order affecting changes to execGrouping.c don't result in regression test changes anymore.
  • Make pg_dumpall's database ACL query independent of hash table order. Previously GRANT order on databases was not well defined, due to the use of EXCEPT without an ORDER BY. Add an ORDER BY, adapt test output. I don't, at the moment, see reason to backpatch this.
  • Use more efficient hashtable for tidbitmap.c to speed up bitmap scans. Use the new simplehash.h to speed up tidbitmap.c uses. For bitmap scan heavy queries speedups of over 100% have been measured. Both lossy and exact scans benefit, but the wins are bigger for mostly exact scans. The conversion is mostly trivial, except that tbm_lossify() now restarts lossifying at the point it previously stopped. Otherwise the hash table becomes unbalanced because the scan in done in hash-order, leaving the end of the hashtable more densely filled then the beginning. That caused performance issues with dynahash as well, but due to the open chaining they were less pronounced than with the linear adressing from simplehash.h. Reviewed-By: Tomas Vondra Discussion: <>
  • Add likely/unlikely() branch hint macros. These are useful for very hot code paths. Because it's easy to guess wrongly about likelihood, and because such likelihoods change over time, they should be used sparingly. Past tests have shown it'd be a good idea to use them in some places, e.g. in error checks around ereports that ERROR out, but that's work for later. Discussion: <>
  • Add a macro templatized hashtable. dynahash.c hash tables aren't quite fast enough for some use-cases. There are several reasons for lacking performance: The use of chaining for collision handling makes them cache inefficient, that's especially an issue when the tables get bigger. As the element sizes for dynahash are only determined at runtime, offset computations are somewhat expensive Hash and element comparisons are indirect function calls, causing unnecessary pipeline stalls It's two level structure has some benefits (somewhat natural partitioning), but increases the number of indirections to fix several of these the hash tables have to be adjusted to the individual use-case at compile-time. C unfortunately doesn't provide a good way to do compile code generation (like e.g. c++'s templates for all their weaknesses do). Thus the somewhat ugly approach taken here is to allow for code generation using a macro-templatized header file, which generates functions and types based on a prefix and other parameters. Later patches use this infrastructure to use such hash tables for tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation, ...). In queries where these use up a large fraction of the time, this has been measured to lead to performance improvements of over 100%. There are other cases where this could be useful (e.g. catcache.c). The hash table design chosen is a variant of linear open-addressing. The biggest disadvantage of simple linear addressing schemes are highly variable lookup times due to clustering, and deletions leaving a lot of tombstones around. To address these issues a variant of "robin hood" hashing is employed. Robin hood hashing optimizes chaining lengths by moving elements close to their optimal bucket ("rich" elements), out of the way if a to-be-inserted element is further away from its optimal position (i.e. it's "poor"). While that can make insertions slower, the average lookup performance is a lot better, and higher fill factors can be used in a still performant manner. To avoid tombstones - which normally solve the issue that a deleted node's presence is relevant to determine whether a lookup needs to continue looking or is done - buckets following a deleted element are shifted backwards, unless they're empty or already at their optimal position. There's further possible improvements that can be made to this implementation. Amongst others: Use distance as a termination criteria during searches. This is generally a good idea, but I've been able to see the overhead of distance calculations in some cases. Consider combining the 'empty' status into the hashvalue, and enforce storing the hashvalue. That could, in some cases, increase memory density and remove a few instructions. Experiment further with the, very conservatively choosen, fillfactor. Make maximum size of hashtable configurable, to allow storing very very large tables. That'd require 64bit hash values to be more common than now, though. some smaller memcpy calls could be optimized to copy larger chunks. But since the new implementation is already considerably faster than dynahash it seem sensible to start using it. Reviewed-By: Tomas Vondra Discussion: <>
  • Use more efficient hashtable for execGrouping.c to speed up hash aggregation. The more efficient hashtable speeds up hash-aggregations with more than a few hundred groups significantly. Improvements of over 120% have been measured. Due to the the different hash table queries that not fully determined (e.g. GROUP BY without ORDER BY) may change their result order. The conversion is largely straight-forward, except that, due to the static element types of simplehash.h type hashes, the additional data some users store in elements (e.g. the per-group working data for hash aggregaters) is now stored in TupleHashEntryData->additional. The meaning of BuildTupleHashTable's entrysize (renamed to additionalsize) has been changed to only be about the additionally stored size. That size is only used for the initial sizing of the hash-table. Reviewed-By: Tomas Vondra Discussion: <>

Robert Haas pushed:

Tatsuo Ishii pushed:

Correctifs en attente

Michaël Paquier sent in a patch to use pg_ctl promote -w in TAP tests.

Michaël Paquier sent in a patch to fix some FSM corruption.

Pavel Stěhule sent in another revision of a patch to add \setfileref to psql.

Heikki Linnakangas sent in two more revisions of a patch to simplify the tape block format, and add pause/resume support for tapes.

Thomas Munro sent in another revision of a patch to enable DISTINCT with btree skip scan.

Mithun Cy sent in another revision of a patch to add some tests to cover hash_index.

Amit Langote sent in a patch to remove the int2vector operator hash infrastructure.

Haribabu Kommi sent in a patch to add a 64-bit (EUI-64) MAC address data type.

Dilip Kumar sent in a patch to enable scan key push down to heap.

Haribabu Kommi sent in a patch to add a pg_stat_sql system view.

Jim Nasby sent in a patch to add support for SRFs and returning composites from pl/tcl.

Robert Haas sent in another revision of a patch to bump up max-parallel-workers.

Amit Kapila sent in a patch to enables parallelism for btree scans.

Prabhat Kumar Sahu and Jeevan Chalke traded patches to do aggregate pushdown to a foreign server.

Aleksander Alekseev sent in a patch to fix some infelicities in pg_filedump.

Michaël Paquier sent in another revision of a patch to rename pg_xlog to pg_wal and two alternatives for renaming pg_clog: pg_transaction, and pg_xact.

Heikki Linnakangas sent in a patch to replace the polyphase merge algorithm with a simple balanced k-way merge.

Michaël Paquier and Heikki Linnakangas traded patches to add SCRAM auth.

Michaël Paquier sent in a patch to improve OOM handling in pg_locale.c.

Mithun Cy sent in another revision of a patch to implement failover on libpq connect level.

Etsuro Fujita sent in another revision of a patch to push down more full joins in postgres_fdw.

Peter Eisentraut sent in a patch to simplify internal archive version handling in pg_dump.

Etsuro Fujita sent in a patch to clarify the mention of self-joins in the DELETE documentation.

Pavel Stěhule sent in another revision of a patch to add xmltable().

Ashutosh Bapat sent in another revision of a patch to optimize partition-wise JOINs in tables with declarative partitions.

Michaël Paquier sent in a patch to improve the durability of pg_dump(all).

Stas Kelvich sent in a patch to fix some of the logging for 2PC.

Christoph Berg sent in a patch to choose a non-empty default log_line_prefix.

Peter Eisentraut sent in a patch to make getrusage() output a little more readable.

Heikki Linnakangas sent in another revision of a patch to support multi-dimensional arrays in PL/python and give a hint when [] is incorrectly used for a composite type in array.

Magnus Hagander sent in another revision of a patch to enable pg_basebackup to stream xlog to tar.

Jeff Janes sent in a patch to change the auth check in postgres_fdw to something a little stricter.

Jim Nasby sent in a patch to return a bitmask of NULL fields in a record.