Correctifs appliqués

Heikki Linnakangas pushed:

Peter Eisentraut pushed:

  • Add missing newline to error messages. Also improve the message style a bit while we're here.
  • doc: Update CREATE DATABASE examples. The example of using CREATE DATABASE with the ENCODING option did not work anymore (except in special circumstances) and did not represent a good general-purpose example, so write some new examples. Reported-by:
  • doc: Document sequence function privileges better. Document the privileges required for each of the sequence functions. This was already in the GRANT reference page, but also add it to the function description for easier reference.
  • Avoid permission failure in pg_sequences.last_value. Before, reading pg_sequences.last_value would fail unless the user had appropriate sequence permissions, which would make the pg_sequences view cumbersome to use. Instead, return null instead of the real value when there are no permissions. From: Michael Paquier <> Reported-by: Shinoda, Noriyoshi <>
  • doc: Some improvements in CREATE SUBSCRIPTION ref page. Add link to description of libpq connection strings. Add link to explanation of replication access control. This currently points to the description of streaming replication access control, which is currently the same as for logical replication, but that might be refined later. Also remove plain-text passwords from the examples, to not encourage that dubious practice. based on suggestions from Simon Riggs
  • Fix relcache leaks in get_object_address_publication_rel().
  • Add CREATE SEQUENCE AS <data type> clause. This stores a data type, required to be an integer type, with the sequence. The sequences min and max values default to the range supported by the type, and they cannot be set to values exceeding that range. The internal implementation of the sequence is not affected. Change the serial types to create sequences of the appropriate type. This makes sure that the min and max values of the sequence for a serial column match the range of values supported by the table column. So the sequence can no longer overflow the table column. This also makes monitoring for sequence exhaustion/wraparound easier, which currently requires various contortions to cross-reference the sequences with the table columns they are used with. This commit also effectively reverts the pg_sequence column reordering in f3b421da5f4addc95812b9db05a24972b8fd9739, because the new seqtypid column allows us to fill the hole in the struct and create a more natural overall column ordering. Reviewed-by: Steve Singer <> Reviewed-by: Michael Paquier <>

Tom Lane pushed:

  • Update comment in relcache.c. Commit 665d1fad9 introduced rd_pkindex, and made RelationGetIndexList responsible for updating it, but didn't bother to fix RelationGetIndexList's header comment to say so.
  • Avoid returning stale attribute bitmaps in RelationGetIndexAttrBitmap(). The problem with the original coding here is that we might receive (and clear) a relcache invalidation signal for the target relation down inside one of the index_open calls we're doing. Since the target is open, we would not drop the relcache entry, just reset its rd_indexvalid and rd_indexlist fields. But RelationGetIndexAttrBitmap() kept going, and would eventually cache and return potentially-obsolete attribute bitmaps. The case where this matters is where the inval signal was from a CREATE INDEX CONCURRENTLY telling us about a new index on a formerly-unindexed column. (In all other cases, the lock we hold on the target rel should prevent any concurrent change in index state.) Even just returning the stale attribute bitmap is not such a problem, because it shouldn't matter during the transaction in which we receive the signal. What hurts is caching the stale data, because it can survive into later transactions, breaking CREATE INDEX CONCURRENTLY's expectation that later transactions will not create new broken HOT chains. The upshot is that there's a window for building corrupted indexes during CREATE INDEX CONCURRENTLY. This patch fixes the problem by rechecking that the set of index OIDs is still the same at the end of RelationGetIndexAttrBitmap() as it was at the start. If not, we loop back and try again. That's a little more than is strictly necessary to fix the bug --- in principle, we could return the stale data but not cache it --- but it seems like a bad idea on general principles for relcache to return data it knows is stale. There might be more hazards of the same ilk, or there might be a better way to fix this one, but this patch definitely improves matters and seems unlikely to make anything worse. So let's push it into today's releases even as we continue to study the problem. Pavan Deolasee and myself Discussion:
  • Release note updates. Add item for last-minute CREATE INDEX CONCURRENTLY fix. Repair a couple of misspellings of patch authors' names. Back-branch updates will follow shortly, but I thought I'd commit this separately just to make it more visible.
  • Correct thinko in last-minute release note item. The CREATE INDEX CONCURRENTLY bug can only be triggered by row updates, not inserts, since the problem would arise from an update incorrectly being made HOT. Noted by Alvaro.
  • Speed up "brin" regression test a little bit. In the large DO block, collect row TIDs into array variables instead of creating and dropping a pile of temporary tables. In a normal build, this reduces the brin test script's runtime from about 1.1 sec to 0.4 sec on my workstation. That's not all that exciting perhaps, but in a CLOBBER_CACHE_ALWAYS test build, the runtime drops from 20 min to 17 min, which is a little more useful. In combination with some other changes I plan to propose, this will help provide a noticeable reduction in cycle time for CLOBBER_CACHE_ALWAYS buildfarm critters.
  • Fix roundoff problems in float8_timestamptz() and make_interval(). When converting a float value to integer microseconds, we should be careful to round the value to the nearest integer, typically with rint(); simply assigning to an int64 variable will truncate, causing apparently off-by-one values in cases that should work. Most places in the datetime code got this right, but not these two. float8_timestamptz() is new as of commit e511d878f (9.6). Previous versions effectively depended on interval_mul() to do roundoff correctly, which it does, so this fixes an accuracy regression in 9.6. The problem in make_interval() dates to its introduction in 9.4. Aside from being careful to round not truncate, let's incorporate the hours and minutes inputs into the result with exact integer arithmetic, rather than risk introducing roundoff error where there need not have been any. float8_timestamptz() problem reported by Erik Nordström, though this is not his proposed patch. make_interval() problem found by me. Discussion:
  • Allow index AMs to cache data across aminsert calls within a SQL command. It's always been possible for index AMs to cache data across successive amgettuple calls within a single SQL command: the IndexScanDesc.opaque field is meant for precisely that. However, no comparable facility exists for amortizing setup work across successive aminsert calls. This patch adds such a feature and teaches GIN, GIST, and BRIN to use it to amortize catalog lookups they'd previously been doing on every call. (The other standard index AMs keep everything they need in the relcache, so there's little to improve there.) For GIN, the overall improvement in a statement that inserts many rows can be as much as 10%, though it seems a bit less for the other two. In addition, this makes a really significant difference in runtime for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated catalog lookups are vastly more expensive. The reason this has been hard up to now is that the aminsert function is not passed any useful place to cache per-statement data. What I chose to do is to add suitable fields to struct IndexInfo and pass that to aminsert. That's not widening the index AM API very much because IndexInfo is already within the ken of ambuild; in fact, by passing the same info to aminsert as to ambuild, this is really removing an inconsistency in the AM API. Discussion:
  • Blind try to fix portability issue in commit 8f93bd851 et al. The S/390 members of the buildfarm are showing failures indicating that they're having trouble with the rint() calls I added yesterday. There's no good reason for that, and I wonder if it is a compiler bug similar to the one we worked around in d9476b838. Try to fix it using the same method as before, namely to store the result of rint() back into a "double" variable rather than immediately converting to int64. (This isn't entirely waving a dead chicken, since on machines with wider-than-double float registers, the extra store forces a width conversion. I don't know if S/390 is like that, but it seems worth trying.) In passing, merge duplicate ereport() calls in float8_timestamptz(). Per buildfarm.

Robert Haas pushed:

  • Cache hash index's metapage in rel->rd_amcache. This avoids a very significant amount of buffer manager traffic and contention when scanning hash indexes, because it's no longer necessary to lock and pin the metapage for every scan. We do need some way of figuring out when the cache is too stale to use any more, so that when we lock the primary bucket page to which the cached metapage points us, we can tell whether a split has occurred since we cached the metapage data. To do that, we use the hash_prevblkno field in the primary bucket page, which would otherwise always be set to InvalidBuffer. This patch contains code so that it will continue working (although less efficiently) with hash indexes built before this change, but perhaps we should consider bumping the hash version and ripping out the compatibility code. That decision can be made later, though. Mithun Cy, reviewed by Jesper Pedersen, Amit Kapila, and by me. Before committing, I made a number of cosmetic changes to the last posted version of the patch, adjusted _hash_getcachedmetap to be more careful about order of operation, and made some necessary updates to the pageinspect documentation and regression tests.
  • Fix compiler warning. Mithun Cy, per a report by Erik Rijkers
  • Allow the element allocator for a simplehash to be specified. This is infrastructure for a pending patch to allow parallel bitmap heap scans. Dilip Kumar, reviewed (in earlier versions) by Andres Freund and (more recently) by me. Some further renaming by me, also.
  • Avoid redefining simplehash_allocate/simplehash_free. There's no generic guard against multiple inclusion in this file, for good reason. But these typedefs need one, as per a report from Jeff Janes.
  • Revise the way the element allocator for a simplehash is specified. This method is more elegant and more efficient. Per a suggestion from Andres Freund, who also briefly reviewed the patch.
  • Add WAL consistency checking facility. When the new GUC wal_consistency_checking is set to a non-empty value, it triggers recording of additional full-page images, which are compared on the standby against the results of applying the WAL record (without regard to those full-page images). Allowable differences such as hints are masked out, and the resulting pages are compared; any difference results in a FATAL error on the standby. Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki Linnakangas. Extensively reviewed and revised by Michael Paquier and by me, with additional reviews and comments from Amit Kapila, Ãlvaro Herrera, Simon Riggs, and Peter Eisentraut.
  • pageinspect: Fix hash_bitmap_info not to read the underlying page. It did that to verify that the page was an overflow page rather than anything else, but that means that checking the status of all the overflow bits requires reading the entire index. So don't do that. The new code validates that the page is not a primary bucket page or bitmap page by looking at the metapage, so that using this on large numbers of pages can be reasonably efficient. Ashutosh Sharma, per a complaint from me, and with further modifications by me.
  • Fix race condition in ConditionVariablePrepareToSleep. Thomas Munro
  • simplehash: Additional tweaks to make specifying an allocator work. Even if we don't emit definitions for SH_ALLOCATE and SH_FREE, we still need prototypes. The user can't define them before including simplehash.h because SH_TYPE isn't available yet. For the allocator to be able to access private_data, it needs to become an argument to SH_CREATE. Previously we relied on callers to set that after returning from SH_CREATE, but SH_CREATE calls SH_ALLOCATE before returning. Dilip Kumar, reviewed by me.
  • Remove all references to "xlog" from SQL-callable functions in pg_proc. Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog directory to pg_wal. To make things consistent, and because "xlog" is terrible terminology for either "transaction log" or "write-ahead log" rename all SQL-callable functions that contain "xlog" in the name to instead contain "wal". (Note that this may pose an upgrade hazard for some users.) Similarly, rename the xlog_position argument of the functions that create slots to be called wal_position. Discussion:
  • Rename user-facing tools with "xlog" in the name to say "wal". This means pg_receivexlog because pg_receivewal, pg_resetxlog becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.
  • Rename dtrace probes for ongoing xlog -> wal conversion. xlog-switch becomes wal-switch, and xlog-insert becomes wal-insert.
  • Rename command line options for ongoing xlog -> wal conversion. initdb and pg_basebackup now have a --waldir option rather --xlogdir, and pg_basebackup now has --wal-method rather than --xlog-method.

Andres Freund pushed:

Simon Riggs pushed:

Noah Misch pushed:

  • Ignore tablespace ACLs when ignoring schema ACLs. The ALTER TABLE ALTER TYPE implementation can issue DROP INDEX and CREATE INDEX to refit existing indexes for the new column type. Since this CREATE INDEX is an implementation detail of an index alteration, the ensuing DefineIndex() should skip ACL checks specific to index creation. It already skips the namespace ACL check. Make it skip the tablespace ACL check, too. Back-patch to 9.2 (all supported versions). Reviewed by Tom Lane.

Correctifs en attente

Amit Khandekar sent in two more revisions of a patch to implement Parallel Append.

Boris Muratshin sent in a patch to implement 3D Z-curve spatial indexes.

Corey Huinker sent in five more revisions of a patch to implement \if and friends in psql.

Nikita Glukhov sent in another revision of a patch to implement KNN for B-trees.

Nikita Glukhov sent in another revision of a patch to implement KNN for SP-GiST.

Beena Emerson sent in another revision of a patch to allow increasing the default WAL segment size.

Nikolay Shaplov sent in another revision of a patch to move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM.

Fujii Masao sent in a patch to fix a bug that made it impossible to shut down a subscriber after DROP SUBSCRIPTION.

Heikki Linnakangas sent in another revision of a patch to implement SCRAM authentication.

Christoph Berg sent in two more revisions of a patch to implement \gx, a one-shot expanded output for queries, in psql.

Mithun Cy sent in two more revisions of a patch to implement auto_prewarm.

Rafia Sabih sent in two more revisions of a patch to enable passing query string to workers.

Takeshi Ideriha sent in another revision of a patch to enable DECLARE STATEMENT to set up a connection in ECPG.

Peter Eisentraut sent in a patch to drop Python 2.3 support.

Pavel Raiskup sent in two revisions of a patch to create a configure-time knob to set default ssl ciphers.

Naoki Okano sent in a patch to implement CREATE OR REPLACE TRIGGER.

Piotr Stefaniak sent in a patch to pg_bsd_indent to implement -lps ("leave preprocessor space").

Dilip Kumar and Robert Haas traded patches to implement parallel bitmap heap scan.

Masahiko Sawada sent in two revisions of a patch to stop the apply worker after DROP SUBSCRIPTION is committed.

Peter Eisentraut sent in a patch to systematically trim the trailing newlines off PQerrorMessage() results in backend uses (dblink, postgres_fdw, libpqwalreceiver).

Peter Eisentraut sent in a patch to implement CREATE COLLATION IF NOT EXISTS.

Michaël Paquier sent in a patch to implement SASLprep(), or NFKC if you want for UTF-8 strings.

Kyle Gearhart sent in a patch to implement an alternate row processor for libpq which is faster for certain use cases than the default one.

Amit Langote sent in two revisions of a patch to implement a check partition strategy in ATExecDropNotNull.

Andres Freund sent in a patch to speed up expression processing, including several JIT PoCs.

Pavel Stěhule sent in two revisions of a patch to enable specifying a template database for pg_regress.

Petr Jelínek sent in another revision of a patch to enable existing data copy for logical replication.

Amit Kapila sent in two more revisions of a patch to implement parallel index scans.

Amit Kapila sent in another revision of a patch to implement WAL for hash indexes.

Amit Langote sent in a patch to optimize partitioned tables by noting that top-level tables are always empty and avoiding that anything that might write to them can't.

Amit Langote sent in a patch to add relkind checks to certain contrib modules.

Peter Geoghegan sent in a patch to add parallel B-tree index build sorting with some testing tools.

Ashutosh Bapat sent in three more revisions of a patch to speed up partition-wise joins on declaratively partitioned tables.

Amit Kapila sent in another revision of a patch to parallelize queries containing initplans.

Simon Riggs sent in a patch to make log_autovacuum_min_duration log the durations of vacuums whether or not they were launched by autovacuum workers.

Simon Riggs sent in a patch to enable reporting xmin from VACUUMs.

Peter Geoghegan sent in a patch to add amcheck extension to contrib.

Michael Banck sent in three revisions of a patch to better document pg_basebackup's behavior in certain corner cases.

Andreas Karlsson sent in another revision of a patch to implement REINDEX CONCURRENTLY.

Tom Lane sent in a patch to preprocess join OR clauses that might be better handled as UNIONs.

Magnus Hagander sent in a patch to enable having fallback servers RADIUS auth.