Code reviews

Applied patches

Robert Haas pushed:

Peter Eisentraut pushed:

Tom Lane pushed:

  • Silence compiler warning about uninitialized variable.
  • Bend parse location rules for the convenience of pg_stat_statements. Generally, the parse location assigned to a multiple-token construct is the location of its leftmost token. This commit breaks that rule for the syntaxes TYPENAME 'LITERAL' and CAST(CONSTANT AS TYPENAME) --- the resulting Const will have the location of the literal string, not the typename or CAST keyword. The cases where this matters are pretty thin on the ground (no error messages in the regression tests change, for example), and it's unlikely that any user would be confused anyway by an error cursor pointing at the literal. But still it's less than consistent. The reason for changing it is that contrib/pg_stat_statements wants to know the parse location of the original literal, and it was agreed that this is the least unpleasant way to preserve that information through parse analysis. Peter Geoghegan
  • Add some infrastructure for contrib/pg_stat_statements. Add a queryId field to Query and PlannedStmt. This is not used by the core backend, except for being copied around at appropriate times. It's meant to allow plug-ins to track a particular query forward from parse analysis to execution. The queryId is intentionally not dumped into stored rules (and hence this commit doesn't bump catversion). You could argue that choice either way, but it seems better that stored rule strings not have any dependency on plug-ins that might or might not be present. Also, add a post_parse_analyze_hook that gets invoked at the end of parse analysis (but only for top-level analysis of complete queries, not cases such as analyzing a domain's default-value expression). This is mainly meant to be used to compute and assign a queryId, but it could have other applications. Peter Geoghegan
  • Improve contrib/pg_stat_statements to lump "similar" queries together. pg_stat_statements now hashes selected fields of the analyzed parse tree to assign a "fingerprint" to each query, and groups all queries with the same fingerprint into a single entry in the pg_stat_statements view. In practice it is expected that queries with the same fingerprint will be equivalent except for values of literal constants. To make the display more useful, such constants are replaced by "?" in the displayed query strings. This mechanism currently supports only optimizable queries (SELECT, INSERT, UPDATE, DELETE). Utility commands are still matched on the basis of their literal query strings. There remain some open questions about how to deal with utility statements that contain optimizable queries (such as EXPLAIN and SELECT INTO) and how to deal with expiring speculative hashtable entries that are made to save the normalized form of a query string. However, fixing these issues should require only localized changes, and since there are other open patches involving contrib/pg_stat_statements, it seems best to go ahead and commit what we've got. Peter Geoghegan, reviewed by Daniel Farina
  • Improve handling of utility statements containing plannable statements. When tracking nested statements, contrib/pg_stat_statements formerly double-counted the execution costs of utility statements that directly contain an executable statement, such as EXPLAIN and DECLARE CURSOR. This was not obvious since the ProcessUtility and Executor hooks would each add their measured costs to the same stats table entry. However, with the new implementation that hashes utility and plannable statements differently, this showed up as seemingly-duplicate stats entries. Fix that by disabling the Executor hooks when the query has a queryId of zero, which was the case already for such statements but is now more clearly specified in the code. (The zero queryId was causing problems anyway because all such statements would add to a single bogus entry.) The PREPARE/EXECUTE case still results in counting the same execution in two different stats table entries, but it should be much less surprising to users that there are two entries in such cases. In passing, include a CommonTableExpr's ctename in the query hash. I had left it out originally on the grounds that we wanted to omit all inessential aliases, but since RTE_CTE RTEs are hashing their referenced names, we'd better hash the CTE names too to make sure we don't hash semantically different queries the same.
  • Improve contrib/pg_stat_statements' handling of PREPARE/EXECUTE statements. It's actually more useful for the module to ignore these. Ignoring EXECUTE (and not incrementing the nesting level) allows the executor hooks to charge the time to the underlying prepared query, which shows up as a stats entry with the original PREPARE as query string (possibly modified by suppression of constants, which might not be terribly useful here but it's not worth avoiding). This is much more useful than cluttering the stats table with a distinct entry for each textually distinct EXECUTE. Experimentation with this idea shows that it's also preferable to ignore PREPARE. If we don't, we get two stats table entries, one with the query string hash and one with the jumble-derived hash, but with the same visible query string (modulo those constants). This is confusing and not very helpful, since the first entry will only receive costs associated with initial planning of the query, which is not something counted at all normally by pg_stat_statements. (And if we do start tracking planning costs, we'd want them blamed on the other hash table entry anyway.)
  • Fix dblink's failure to report correct connection name in error messages. The DBLINK_GET_CONN and DBLINK_GET_NAMED_CONN macros did not set the surrounding function's conname variable, causing errors to be incorrectly reported as having occurred on the "unnamed" connection in some cases. This bug was actually visible in two cases in the regression tests, but apparently whoever added those cases wasn't paying attention. Noted by Kyotaro Horiguchi, though this is different from his proposed patch. Back-patch to 8.4; 8.3 does not have the same type of error reporting so the patch is not relevant.
  • Add PGDLLIMPORT to ScanKeywords and NumScanKeywords. Per buildfarm, this is now needed by contrib/pg_stat_statements.
  • Fix glitch recently introduced in psql tab completion. Over-optimization (by me, looks like :-() broke the case of recognizing a word boundary just before a quoted identifier. Reported and diagnosed by Dean Rasheed.
  • Rename frontend keyword arrays to avoid conflict with backend. ecpg and pg_dump each contain keyword arrays with structure similar to the backend's keyword array. Up to now, we actually named those arrays the same as the backend's and relied on parser/keywords.h to declare them. This seems a tad too cute, though, and it breaks now that we need to PGDLLIMPORT-decorate the backend symbols. Rename to avoid the problem. Per buildfarm. (It strikes me that maybe we should get rid of the separate keywords.c files altogether, and just define these arrays in the modules that use them, but that's a rather more invasive change.)
  • Fix O(N^2) behavior in pg_dump for large numbers of owned sequences. The loop that matched owned sequences to their owning tables required time proportional to number of owned sequences times number of tables; although this work was only expended in selective-dump situations, which is probably why the issue wasn't recognized long since. Refactor slightly so that we can perform this work after the index array for findTableByOid has been set up, reducing the time to O(M log N). Per gripe from Mike Roest. Since this is a longstanding performance bug, backpatch to all supported versions.
  • Fix O(N^2) behavior in pg_dump when many objects are in dependency loops. Combining the loop workspace with the record of already-processed objects might have been a cute trick, but it behaves horridly if there are many dependency loops to repair: the time spent in the first step of findLoop() grows as O(N^2). Instead use a separate flag array indexed by dump ID, which we can check in constant time. The length of the workspace array is now never more than the actual length of a dependency chain, which should be reasonably short in all cases of practical interest. The code is noticeably easier to understand this way, too. Per gripe from Mike Roest. Since this is a longstanding performance bug, backpatch to all supported versions.
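
The grouping of "similar" queries described in the pg_stat_statements entries above can be illustrated with a deliberately simplified Python sketch. The real module hashes selected fields of the analyzed parse tree; the version below only approximates that idea at the text level, replacing literal constants with "?" using a regular expression and hashing the result. The names normalize, fingerprint, and record are hypothetical, and the choice of SHA-1 is an assumption made for illustration.

```python
import hashlib
import re

# Matches single-quoted string literals (with '' escapes) or bare numbers.
# This is a text-level approximation, NOT the parse-tree hashing the module uses.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(query: str) -> str:
    """Collapse whitespace and replace literal constants with '?'."""
    collapsed = " ".join(query.split())
    return _LITERAL.sub("?", collapsed)


def fingerprint(query: str) -> str:
    """Hash the normalized text; queries differing only in constants collide."""
    return hashlib.sha1(normalize(query).encode("utf-8")).hexdigest()


def record(stats: dict, query: str) -> None:
    """Add one call to the stats entry that shares this query's fingerprint."""
    entry = stats.setdefault(
        fingerprint(query), {"query": normalize(query), "calls": 0}
    )
    entry["calls"] += 1


if __name__ == "__main__":
    stats = {}
    record(stats, "SELECT * FROM t WHERE id = 42")
    record(stats, "SELECT * FROM t WHERE id = 7")
    for entry in stats.values():
        print(entry["calls"], entry["query"])
```

Both calls land in a single entry whose displayed text is the normalized form, which mirrors the behavior the commit message describes for optimizable queries.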
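
The owned-sequence fix above replaces a scan of every table per sequence with a lookup in a sorted index. A minimal Python sketch of that idea follows, assuming each table and sequence is a plain dictionary; the field names oid, name, and owning_tab and the function link_owned_sequences are hypothetical and do not reflect pg_dump's actual data structures.

```python
from bisect import bisect_left


def link_owned_sequences(tables, sequences):
    """Map table name -> owned sequence names using binary search.

    Sorting N tables once and probing for each of M sequences costs
    O(N log N + M log N) instead of the O(M * N) nested loop.
    """
    index = sorted(tables, key=lambda t: t["oid"])  # built once
    oids = [t["oid"] for t in index]
    owned = {}
    for seq in sequences:
        pos = bisect_left(oids, seq["owning_tab"])
        # Skip sequences whose owning table is not part of this dump.
        if pos < len(oids) and oids[pos] == seq["owning_tab"]:
            owned.setdefault(index[pos]["name"], []).append(seq["name"])
    return owned


if __name__ == "__main__":
    tables = [{"oid": 30, "name": "c"}, {"oid": 10, "name": "a"}]
    sequences = [{"name": "a_id_seq", "owning_tab": 10}]
    print(link_owned_sequences(tables, sequences))
```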

Andrew Dunstan pushed:

Heikki Linnakangas pushed:

  • Inherit max_safe_fds to child processes in EXEC_BACKEND mode. Postmaster sets max_safe_fds by testing how many file descriptors it can open, and that is normally inherited by all child processes at fork(). Not so on EXEC_BACKEND, i.e. Windows, however. Because of that, we effectively ignored max_files_per_process on Windows, and always assumed a conservative default of 32 simultaneous open files. That could have an impact on performance, if you need to access a lot of different files in a query. After this patch, the value is passed to child processes by save/restore_backend_variables() among many other global variables. It has been like this forever, but given the lack of complaints about it, I'm not backpatching this.
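
The underlying problem is that a child started by exec rather than fork() does not see values the parent computed in memory, so they have to be handed over explicitly. The Python sketch below illustrates that pattern under stated assumptions: probe_usable_fds and launch_child are hypothetical names, the probe opens the null device rather than doing whatever the postmaster does internally, and a command-line argument stands in for the saved backend variables.

```python
import os
import subprocess
import sys


def probe_usable_fds(max_to_probe: int) -> int:
    """Count how many extra descriptors can be opened, up to max_to_probe."""
    opened = []
    try:
        while len(opened) < max_to_probe:
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError:
        pass  # reached the per-process descriptor limit
    finally:
        for fd in opened:
            os.close(fd)
    return len(opened)


# The child is a fresh interpreter: it only knows what is passed to it.
CHILD_CODE = "import sys; print(int(sys.argv[1]))"


def launch_child(max_safe_fds: int) -> int:
    """Start a new process and pass the probed limit to it explicitly."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(max_safe_fds)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    limit = probe_usable_fds(64)
    print("parent probed:", limit, "child received:", launch_child(limit))
```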

Simon Riggs pushed:

  • Correct epoch of txid_current() when executed on a Hot Standby server. Initialise ckptXidEpoch from starting checkpoint and maintain the correct value as we roll forwards. This allows GetNextXidAndEpoch() to return the correct epoch when executed during recovery. Backpatch to 9.0 when the problem is first observable by a user. Bug report from Daniel Farina
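
To see why a stale epoch matters, recall that txid_current() reports a 64-bit value in which the epoch extends the 32-bit transaction ID so that it does not wrap around. The Python sketch below shows that composition, assuming the epoch occupies the high 32 bits; make_txid and split_txid are hypothetical helper names.

```python
XID_BITS = 32
XID_MASK = (1 << XID_BITS) - 1


def make_txid(epoch: int, xid: int) -> int:
    """Combine an epoch and a 32-bit xid into one 64-bit transaction ID."""
    if not 0 <= xid <= XID_MASK:
        raise ValueError("xid must fit in 32 bits")
    return (epoch << XID_BITS) | xid


def split_txid(txid: int):
    """Recover (epoch, xid) from a 64-bit transaction ID."""
    return txid >> XID_BITS, txid & XID_MASK


if __name__ == "__main__":
    # Same 32-bit xid, different epochs: the values differ by 2**32.
    print(make_txid(0, 100), make_txid(1, 100))
```

With an epoch that is off by one, every reported value is off by 2^32, so values read on a standby would not be comparable with those from the primary.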

Rejected patches (to date)

  • No one was disappointed this week :-)

Pending patches

  • Kyotaro HORIGUCHI sent in two more revisions of the patch to create a new tuple storage format for libpq and use same in dblink.
  • Shigeru HANADA sent in two more revisions of the patch to add a PostgreSQL FDW along with infrastructure for same.
  • Peter Eisentraut and Alexander Shulgin traded patches to add a URI format for connection strings in libpq.
  • Fujii Masao sent in two revisions of a patch to make pg_basebackup exit on error.
  • Ants Aasma sent in a patch to use lazy hash aggregation to speed up cases where no actual aggregates are used.
  • Dimitri Fontaine sent in two more revisions of the patch to add finer dependencies for EXTENSIONs.
  • Marco Nenciarini sent in another revision of the patch to allow each element of an array to be an enforced foreign key reference.
  • Peter Eisentraut sent in a patch to fix some infelicities between pgxs, bison and flex.
  • Andrew Dunstan and Joachim Wieland traded patches to implement parallel pg_dump.
  • Zoltan Boszormenyi sent in two more revisions of the ECPG FETCH readahead patch.
  • Daniel Farina sent in another revision of the patch to allow same-role pg_terminate_backend.
  • Pavel Stehule sent in another revision of the CHECK TRIGGER functionality for PL/pgsql.
  • Peter Eisentraut sent in a patch which reverts the default capitalization behavior in psql's tab completion to that prior to a previous patch while expanding the tunability of that capitalization with tab completion.
  • Robert Haas sent in two patches to measure lwlock-related latency spikes.
  • Heikki Linnakangas sent in a patch to set stack_base_ptr in autovacuum.