Git log: commit 9e99a74cc18fdadbfdf969b2a9707fa8df554b86
Author: Cristian R. Silva <rs.cristian@gmail.com>
Date: Mon Sep 25 15:55:40 2023 +0200
Fixing PING documentation (#476)
commit 10cd0a72633538e267ab48c3d0a3240e44eb8954
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Mon Sep 25 13:46:00 2023 +0200
Add tests coverage for Postgres 16. (#478)
* Add tests coverage for Postgres 16.
* Fix GitHub Actions workflow file.
commit af1c231ae757827d95c654debd50557e1df7e912
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Mon Sep 25 13:33:55 2023 +0200
Rewrite the ad-hoc parser for Postgres Archive TOC items. (#469)
* Rewrite the ad-hoc parser for Postgres Archive TOC items.
This time use a tokenizer and hand-roll a grammar/parser on top of it,
expecting tokens in the right order. We have a pretty simple grammar at the
moment, because we don't extend the support to the ACL and COMMENT composite
types. We might want to do that later.
* Reformat parsing of ACL and COMMENTS too.
* Free allocated memory for parsing pg_restore archive catalogs.
* Fix rebase.
* Protect against NULL pointer dereferencing.
* Fix string size compute bug (include 2 bytes for prefix).
commit 379d242baca0d0fa9bf358d7566bbe74e466e87c
Author: lospejos <lospejos@gmail.com>
Date: Fri Sep 22 14:21:27 2023 +0300
fix drop cascade issue (#475)
Co-authored-by: lospejos <lospejos@github.com>
commit 4ff2963cdb9f9252f9f15f19309ff036b37ddaf6
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Mon Sep 18 21:13:17 2023 +0200
Add a way to debug parsing a list file. (#470)
The following command now parses the pg_restore --list output file and
prints the parsed contents to standard output, which makes it possible to
check the parser.
pgcopydb restore parse-list pre.list
commit c22c1026823b41b54d3b37aeceb36ea6d5cdbbb7
Author: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Date: Mon Sep 18 22:33:06 2023 +0530
Implement multi-value inserts (#465)
This commit optimizes multi-value inserts by transferring insert values
from a new statement to an existing one if they target the same table.
This optimization significantly reduces network latency when multiple consecutive
insert statements are executed within a transaction.
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
commit a15881860dc18ad268c7bd8cae11dbe0055538dd
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Sep 13 14:53:55 2023 +0200
Use an intermediate file on-disk for pg_restore --list output. (#467)
That way it's possible to review the pg_restore file separately from the
filtered file that we process in pgcopydb, and because both the files are
left around at the end of the command, it's also possible to diff them.
commit 3091b4567183b186ab3653c6f621771a87cf547d
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Tue Sep 12 12:57:40 2023 +0200
Fix SQL identifier quoting, generalize using format %I. (#464)
Using the Postgres function format with the %I formatter allows our code to
bypass any formatting internally when re-using the SQL object names, both in
our logs and also in the SQL queries we then emit.
Generalize that approach to attribute names, index names, index table names,
index constraint names, and sequence names.
commit 6bc29fc817fe7cd106aab1763c184f76b3223a3d
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Thu Sep 7 14:36:33 2023 +0200
Fix error handling of fork() calls. (#459)
When failing to start a sub-process (out-of-memory can be reported when
calling the fork() system call), properly handle the error rather than
continue trying the migration.
In order to cancel the other sub-processes that were successfully started we
then send a TERM signal to all processes in our process group.
commit 0a37cdd19374cb7250887454dc4618af2e3f44fe
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Tue Sep 5 19:41:19 2023 +0200
Improve connection string TCP keepalive parameters handling. (#457)
* Improve connection string TCP keepalive parameters handling.
Allow overriding at the command-line level, also set the parameters in the
connection string so that they're effective even when connecting.
* Fix computing of the safe URI: always do it now.
commit de3b379ab27757d80cdd9cd6ac5c6ed9e0805e09
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Tue Sep 5 11:38:56 2023 +0200
Fix set_ps_title to avoid calling sformat() on possibly short buffers. (#437)
commit 059da5ce1ebbb63cfc2ca9be6210dfedffeb1570
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Tue Sep 5 11:02:32 2023 +0200
Review the TCP keepalives settings. (#436)
To work with shorter timeouts in network stacks (firewall, NAT settings,
etc), adjust the TCP keepalive settings to quite short values.
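A minimal sketch of what setting keepalives in a connection string can look like, using the standard libpq parameters keepalives, keepalives_idle, keepalives_interval and keepalives_count. The function name append_keepalives and the numeric values are assumptions made for illustration; the commit does not state the values pgcopydb actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative values only: the commit does not state the actual settings. */
#define KEEPALIVES_IDLE 10
#define KEEPALIVES_INTERVAL 10
#define KEEPALIVES_COUNT 6

/*
 * Hypothetical helper: appends libpq TCP keepalive parameters to a
 * keyword/value connection string. Returns false when the destination
 * buffer is too small to hold the result.
 */
bool
append_keepalives(const char *conninfo, char *dst, size_t size)
{
	assert(conninfo != NULL && dst != NULL);

	int len = snprintf(dst, size,
					   "%s keepalives=1 keepalives_idle=%d "
					   "keepalives_interval=%d keepalives_count=%d",
					   conninfo,
					   KEEPALIVES_IDLE,
					   KEEPALIVES_INTERVAL,
					   KEEPALIVES_COUNT);

	/* snprintf returns the length it wanted to write: detect truncation */
	return len >= 0 && (size_t) len < size;
}
```

With these assumed values, a dropped connection would be detected after roughly idle + interval * count seconds of silence.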
commit 93c0305c6169f69151650a2b57a20051e26f8ec9
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Mon Sep 4 16:39:40 2023 +0200
Set Process Titles as seen in ps/top/htop etc. (#435)
* Set Process Titles as seen in ps/top/htop etc.
This helps understand the process hierarchy better and could be useful to
debug or watch long running processes.
* Append current table/index information when we have it.
commit 5d0589f313dcc14976bfa03cf8aca201a8c7805e
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Mon Sep 4 16:06:11 2023 +0200
Fix escaping double-quotes in SQL identifiers. (#434)
* Fix escaping double-quotes in SQL identifiers.
When double-quoting SQL identifiers, any double-quote that is part of the
name should be doubled. Now, rather than implementing that ourselves yet
again in the C code, use format('%I', relname) as relname in the SQL
queries.
* Fix bugs.
* Update test files, the transform output has changed.
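For reference, the quoting rule described in this commit (a double-quote that is part of the name is doubled) can be sketched in C as follows. This is only an illustration of the rule: the commit itself delegates the work to the Postgres format('%I', ...) function, which also leaves names unquoted when quoting is not needed. The name quote_identifier is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: writes name into dst as a double-quoted SQL
 * identifier, doubling any embedded double-quote. Unlike format('%I'),
 * this sketch always adds the surrounding quotes.
 */
bool
quote_identifier(const char *name, char *dst, size_t size)
{
	assert(name != NULL && dst != NULL);

	/* worst case: every byte is doubled, plus two quotes and the NUL */
	if (2 * strlen(name) + 3 > size)
	{
		return false;
	}

	size_t pos = 0;

	dst[pos++] = '"';

	for (const char *p = name; *p != '\0'; p++)
	{
		if (*p == '"')
		{
			dst[pos++] = '"';   /* double the embedded quote */
		}
		dst[pos++] = *p;
	}

	dst[pos++] = '"';
	dst[pos] = '\0';

	return true;
}
```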
commit ca6130b950a4b649e69adf5cbdc47f7f49727644
Author: James Guthrie <JamesGuthrie@users.noreply.github.com>
Date: Mon Sep 4 12:01:44 2023 +0200
Fix memory leaks in transform code (#433)
* Clean up PQExpBuffers in all usages
PQExpBuffers must be destroyed with `destroyPQExpBuffer`, otherwise the
bytes that they contain end up being leaked. In putting together this
patch I audited every call site of `createPQExpBuffer` and the subsequent
code in order to locate missing calls to `destroyPQExpBuffer`. Some
modifications may be classified as pedantic, but overall this patch
removes a number of real-world memory leaks.
* Fix memory leaks in wal2json transformation
The `stream_transform_file` function didn't free memory which was
allocated to contain the file contents.
The `FreeLogicalMessageTuple` function didn't correctly free all items
in its hierarchy.
commit f348a8726e71d4fb96dba3db2970f8a653a0d80f
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Aug 9 17:36:17 2023 +0200
Implement same-table concurrency using Postgres CTID column. (#410)
* Implement same-table concurrency using Postgres CTID column.
The Postgres system column "ctid" is the physical location of the row
version within its table. It is encoded using the tid datatype:
`(pagenumber,rownumber)`.
Every single Postgres table has a "ctid" column and it is always possible to
split a table's contents by using ctid based ranges. The question of this
approach being good at reducing pgcopydb timings remains open: this PR is
meant to allow more experimentation.
* Restrict CTID COPY partitioning to tables using "heap" am.
TID scan might not be supported by other Table AM (such as Citus Columnar).
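A rough sketch of how ctid based ranges could be computed from a table's page count. The function name ctid_range_clause, the even split by page number, and the open-ended last range are assumptions made for illustration, not necessarily what pgcopydb does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes the WHERE clause for partition `index`
 * (0-based) out of `parts`, splitting `relpages` pages evenly. The last
 * partition has no upper bound, so that pages added after the page count
 * was estimated are still covered.
 */
bool
ctid_range_clause(uint32_t relpages, int parts, int index,
				  char *dst, size_t size)
{
	assert(dst != NULL);

	if (parts <= 0 || index < 0 || index >= parts)
	{
		return false;
	}

	/* pages per partition, rounded up, and at least one page */
	uint64_t step = ((uint64_t) relpages + parts - 1) / parts;

	if (step == 0)
	{
		step = 1;
	}

	uint64_t lower = step * index;
	int len = 0;

	if (index == parts - 1)
	{
		len = snprintf(dst, size, "ctid >= '(%llu,0)'::tid",
					   (unsigned long long) lower);
	}
	else
	{
		len = snprintf(dst, size,
					   "ctid >= '(%llu,0)'::tid AND ctid < '(%llu,0)'::tid",
					   (unsigned long long) lower,
					   (unsigned long long) (lower + step));
	}

	return len >= 0 && (size_t) len < size;
}
```

Each worker would then run its COPY with a different clause, so that the ranges do not overlap and together cover the whole table.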
commit b10091642dc14fd4285e6d876104a967d5af8921
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Aug 9 17:28:41 2023 +0200
When applying --follow changes, set synchronous_commit to off. (#425)
This allows much faster replay because we don't have to wait for Postgres
disk sync operation before sending the next SQL command. We need to be
careful with what replay_lsn value is sent back to the replication protocol,
and for that we need to introduce a tracking between the source LSN and the
replay insert LSN.
We still restart applying from the latest LSN that we manage to commit
durably using the replication origin API. The advantage of that API is that
it's as durable as the transactions replayed on the target system.
commit fa7171586245481de028016b0aad98efab79bf2d
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Aug 9 17:20:48 2023 +0200
Do not reset sequences on copy --follow failure. (#424)
commit d5fd588ba3cfbb59b69c097cd4c88bcc600565e2
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Aug 9 17:16:06 2023 +0200
Assorted fixes from review, including stream_read_context retry loop. (#423)
commit e83e53b9532d43c284fd4591700d907223062b52
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Aug 9 17:11:22 2023 +0200
Fix a SEGFAULT that was hidden by our waitpid() calls. (#422)
When a subprocess terminates with a successful return code, it might still
have been terminated by a signal, such as SIGSEGV. Arrange our
code to report when that happens.
This happened in initialisation of the streaming module when trying to call
setvbuf on an unassigned file descriptor. This is fixed in follow.c when
preparing the call.
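A minimal sketch of the kind of check described here: inspecting the waitpid() status with WIFSIGNALED before trusting the exit code. The function names are hypothetical and this is not the pgcopydb code. The demo child uses SIGKILL rather than SIGSEGV only to avoid producing a core dump.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns true only when the child exited normally with status 0. */
bool
child_succeeded(pid_t pid, int status)
{
	if (WIFSIGNALED(status))
	{
		/* the exit code alone would not reveal this case */
		fprintf(stderr, "process %d was terminated by signal %d\n",
				(int) pid, WTERMSIG(status));
		return false;
	}

	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Runs fn in a sub-process and reports whether it really succeeded. */
bool
run_in_subprocess(int (*fn)(void))
{
	assert(fn != NULL);

	pid_t pid = fork();

	if (pid < 0)
	{
		perror("fork");
		return false;
	}

	if (pid == 0)
	{
		_exit(fn());            /* child: never returns to the caller */
	}

	int status = 0;

	if (waitpid(pid, &status, 0) != pid)
	{
		perror("waitpid");
		return false;
	}

	return child_succeeded(pid, status);
}

/* Demo children: one exits cleanly, one is killed by a signal. */
int exits_cleanly(void) { return 0; }
int dies_from_signal(void) { raise(SIGKILL); return 0; }
```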
commit 988dfeccbfc40f9a44368c0433677ddb4d38161d
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Aug 9 16:53:57 2023 +0200
Fix exit() calls that should have been return false; (#421)
commit d19ae863f6672a03be8c8913ceaf38bfbda1e107
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Tue Aug 8 12:30:15 2023 +0200
Use PREPARE/EXECUTE statements for applying DML in follow mode. (#420)
* Introduce Bob Jenkins lookup3 hash function.
* Use PREPARE/EXECUTE statements for applying DML in follow mode.
When replaying INSERT/UPDATE/DELETE statements, switch to using PREPARE and
EXECUTE at the protocol level (using the libpq functions PQprepare and
PQexecPrepared). When a statement has already been prepared previously in
our session, we then only send the EXECUTE statement.
This should provide nice performance improvements.
* Fix ci/banned.h.sh.
Avoid using sscanf(), use strtoull with a base 16 instead. In passing
clean-up some extra logging.
* Code review and refactoring.
* Suppress compilation warnings from Jenkins lookup3.c code.
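As an illustration of the strtoull approach mentioned above, here is a small sketch of parsing a hexadecimal string with full error checking. The commit does not say which value is being parsed, so the helper name parse_hex_u64 and its strictness (no leading whitespace or sign) are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parses str as a base 16 unsigned 64-bit integer.
 * Returns false on empty input, trailing garbage, or overflow.
 */
bool
parse_hex_u64(const char *str, uint64_t *result)
{
	assert(str != NULL && result != NULL);

	/* strtoull would accept leading whitespace and a sign: reject them */
	if (!isxdigit((unsigned char) str[0]))
	{
		return false;
	}

	char *end = NULL;

	errno = 0;
	unsigned long long value = strtoull(str, &end, 16);

	if (errno != 0 || end == str || *end != '\0')
	{
		return false;
	}

	*result = (uint64_t) value;
	return true;
}
```

Unlike sscanf with a %llx conversion, this makes overflow and trailing characters easy to detect through errno and the end pointer.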
commit 595b46f6f8c4acf5d3beee610e7f05ba129f4852
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Thu Aug 3 18:44:33 2023 +0200
Use --table-jobs processes when computing pgcopydb compare data. (#419)
commit 70741cdcdc049e940e807296bd3b0e3680695c5f
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Thu Aug 3 14:33:31 2023 +0200
Refactor the cli_compare code. (#418)
Introduce a new C module "compare.c" and implement the work there, leaving
the "cli_compare.c" module with command line and output handling.
In passing, add the checksum information to the schema.json file at the end
of the pgcopydb compare data command, that might be useful.
commit 452ce599d4683ce46d951e9e82e70cf0a6a93ab1
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Aug 2 18:09:05 2023 +0200
Use Postgres async queries to compute checksums. (#416)
That way pgcopydb compare data only waits for as long as the slowest of the
queries running concurrently on the source and target instances, which is
better than serial execution on two different servers.
This changes the JSON format output, because we also now skip fetching the
target database catalogs.
commit 2da784079d6f64d2bdc400e7fd368bfe25771001
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Wed Aug 2 15:44:22 2023 +0200
Fix filtering on schema names. (#415)
First, the parsing of the schema and table names from the INI file was done
incorrectly, in a way that shows up with long schema and table names.
Then, I just learned we can't use pg_restore --schema in our context,
because it would then skip the CREATE SCHEMA statement and pgcopydb relies
on pg_restore to create the schema on the target database.
commit 913e78e9ad9f7f0c40b83797bfb8d794db045ea0
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Tue Aug 1 17:56:51 2023 +0200
Implement concurrent workers for Large Objects data copy. (#411)
* Implement concurrent workers for Large Objects data copy.
The default is 4 workers for Large Object data, and another process is
created to queue the Large Object metadata (oid) and allow workers to share
the workload. New option available: --large-objects-jobs.
* Fix blob summary.
Reinstall a blob summary file and change its format to JSON.
commit 82600afc1d9847dbf28af6f613f82cb7dd4553ac
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Fri Jul 28 12:53:03 2023 +0200
Improve pgcopydb compare data checksum computation. (#407)
Use an output format that is stable in number of digits and can deal with a
bigint overflow (in Postgres a sum(bigint) is numeric): use an MD5 sum and
represent it as a UUID.
Also, include the row count in the MD5 computation to better protect against
collisions.
Finally, add support for pgcopydb compare data --json.
commit 35a0087a3c158fa4a46f92f89b0649382aeb4998
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Thu Jul 27 16:53:33 2023 +0200
Fix a compile warning on Linux related to printf format strings. (#406)