Git log: commit bbbb4b9572e808f81b16cffc47ab5d6bf602c1f4
Author: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Date: Wed Jul 10 20:08:12 2024 +0530
Adding filtering flag to pgcopydb dump schema (#836)
The following error is seen without this fix:
```
10:46:09.943 284145 INFO Running pgcopydb version 0.0.21.184.g8c85cce from "/home/ubuntu/dimitri/pgcopydb/src/bin/pgcopydb/pgcopydb"
10:46:09.981 284145 INFO Using work dir "/tmp/pgcopydb"
10:46:09.981 284145 INFO Restoring database from existing files at "/tmp/pgcopydb"
10:46:09.984 284145 INFO Current filtering setup is: {"type":"SOURCE_FILTER_TYPE_EXCL","exclude-table-data":[{"schema":"public","name":"metrics_New"}]}
10:46:09.984 284145 INFO Catalog filtering setup is: {"type":"SOURCE_FILTER_TYPE_NONE"}
10:46:09.984 284145 ERROR Catalogs at "/tmp/pgcopydb/schema/source.db" have been setup for a different filtering than the current command, see above for details
10:46:09.984 284145 ERROR Failed to initialize pgcopydb internal catalogs
```
The problem is that `pgcopydb dump schema` without filtering creates the catalog
with filtering type NONE; when we later supply a filter for restore, it fails
with a filter type mismatch.
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
commit 3cbb38cca25733e906a6206d3bf381fb7a4053cd
Author: Cem Eliguzel <cemeliguzel@microsoft.com>
Date: Wed Jul 10 12:50:55 2024 +0300
Fixes probable bug sources (#840)
commit 6df9baecfd49a78323b19df27c7e1fafa9526ead
Author: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Date: Tue Jul 9 20:51:06 2024 +0300
Fix issues with stylechecker tests (#838)
- Update stylechecker image to a new version
- Set a fixed version of checkout action
- Set safe directory for git repository
commit 17b395f7a41b80089f114189325b9e4edd9eb283
Author: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Date: Fri Jul 5 18:20:20 2024 +0530
Fix double sqlite source.db attach in restore parse-list (#835)
Before this commit, pgcopydb restore parse-list failed with the following
error:
```
11:44:52.406 233882 ERROR Failed to attach '/tmp/pgcopydb/schema/source.db' as source
11:44:52.406 233882 ERROR database source is already in use
```
cli_restore_prepare_specs already prepares the catalogs.
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
commit cce0894ead328b9d4e97243b6a730864855a32d9
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Mon Jul 1 19:45:49 2024 +0200
Fix out-of-memory issue in creating our pg_restore list file. (#831)
Instead of creating the whole contents of the file in-memory, write it to
disk one line at a time, keeping only that line in-memory during the whole
operation.
Also arrange the code to only read the pg_restore --list output one line at a
time; no context is needed to be able to parse this file format (the pg_dump
archive Table Of Contents).
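The line-at-a-time approach described above can be sketched as follows (a minimal illustration of the streaming idea, not the actual pgcopydb implementation):

```c
#include <stdio.h>

/* Copy a pg_restore --list style file one line at a time, keeping only the
 * current line in memory instead of buffering the whole file contents. */
int copy_lines(FILE *in, FILE *out)
{
	char line[8192];            /* the only buffer held in memory */

	while (fgets(line, sizeof(line), in) != NULL)
	{
		/* a real parser would inspect the TOC line here before writing */
		if (fputs(line, out) == EOF)
		{
			return -1;
		}
	}
	return ferror(in) ? -1 : 0;
}
```

Because each Table Of Contents entry fits on one line, no cross-line state is required, which is what makes the constant-memory loop possible.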
commit d563f5e10171e446f60b693aaf52d43174c1b4d4
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Fri Jun 28 18:39:17 2024 +0200
Fix/build and test (#830)
* Dockerfile tests optimization.
When running tests and only test files have changed, skip rebuilding the
tested binary all over again.
* Fix Dockerfile caching, introduce maintainer-clean Make target.
* Run `make version` before checking docs in test suite.
* Refactor the CI docs testing.
Starting with the following Docker error:
JSON arguments recommended for ENTRYPOINT/CMD to prevent unintended
behavior related to OS signals
Refactor things in a way that makes it easy to switch to using JSON
arguments there, specifically using a new Makefile in ci/check-docs.mk.
commit 76828fc67677ba67deb6c9882ca3e368ab1ece0b
Author: Cem Eliguzel <cemeliguzel@microsoft.com>
Date: Fri Jun 28 18:03:28 2024 +0300
Review when to create a vacuum job queue (#827)
Fixes #813: Error when copying indexes
commit 00caf45faa2a50f3f39b31e1ac34faaf51e7499c
Author: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Date: Tue Jun 25 16:09:04 2024 +0300
Introduce --skip-analyze (#825)
This option allows the user to skip running vacuumdb --analyze-only on the
source database before calculating table size estimates. This can be useful
when the user has already run the ANALYZE command manually before running
pgcopydb.
commit ffca16ae9926446c699eee13e4af00456448601c
Author: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Date: Tue Jun 25 12:53:55 2024 +0300
Refactor environment variable parsing logic (#807)
We had a repetitive pattern in the code where we checked if an environment
variable was set, and if it was, we parsed it and stored it in a target
variable.
This patch introduces a new function `get_env_using_parser` that takes
all the necessary information needed to parse the environment variable in
a struct, and does the parsing. This way, we can avoid repeating the same
code over and over again.
In passing, I also refactored our logic to set defaults before reading
environment variables, so that we can avoid repeating the same default
values in multiple places.
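The pattern might look roughly like this (a hypothetical sketch: the struct fields and parser callback are illustrative only, not the exact pgcopydb API):

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical descriptor: which variable to read and how to parse it. */
typedef struct EnvParser
{
	const char *name;                           /* environment variable name */
	bool (*parse)(const char *value, void *target);
	void *target;                               /* where the parsed value goes */
} EnvParser;

/* Parse an environment variable only when it is set; when it is not set,
 * the default already stored in *target is left untouched. */
static bool
get_env_using_parser(EnvParser *parser)
{
	const char *value = getenv(parser->name);

	if (value == NULL)
	{
		return true;            /* not set: keep the pre-set default */
	}
	return parser->parse(value, parser->target);
}

/* Example parser callback for integer-valued variables. */
static bool
parse_int(const char *value, void *target)
{
	return sscanf(value, "%d", (int *) target) == 1;
}
```

Setting the default before calling the parser is what lets the "not set" branch simply return, removing the repeated default-value handling mentioned above.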
commit ce11b3a1ade1dc8e3125f28035d021f0702cd159
Author: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Date: Mon Jun 24 16:41:00 2024 +0530
Problem: Pipeline sync deadlock (#823)
**Deadlock call stack:**
```
#0  0x0000787ac168fe16 in select () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x000057db30f718ad in pgsql_sync_pipeline (pgsql=pgsql@entry=0x7ffc8ce1f9c8) at pgsql.c:1979
#2  0x000057db30f5bc2c in stream_apply_file (context=context@entry=0x7ffc8ce1d880) at ld_apply.c:657
#3  0x000057db30f5c495 in stream_apply_catchup (specs=specs@entry=0x7ffc8d030050) at ld_apply.c:112
#4  0x000057db30f549d2 in follow_start_catchup (specs=0x7ffc8d030050) at follow.c:809
#5  0x000057db30f54baa in follow_start_subprocess (specs=specs@entry=0x7ffc8d030050,
    subprocess=subprocess@entry=0x7ffc8d138da8) at follow.c:860
#6  0x000057db30f55383 in follow_prepare_mode_switch (streamSpecs=streamSpecs@entry=0x7ffc8d030050,
    previousMode=previousMode@entry=STREAM_MODE_CATCHUP, currentMode=currentMode@entry=STREAM_MODE_REPLAY) at follow.c:488
#7  0x000057db30f55a0b in follow_main_loop (copySpecs=copySpecs@entry=0x7ffc8d13bd50,
    streamSpecs=streamSpecs@entry=0x7ffc8d030050) at follow.c:340
#8  0x000057db30f37cbb in cli_follow (argc=<optimized out>, argv=<optimized out>) at cli_clone_follow.c:464
#9  0x000057db30f8a17a in commandline_run (command=command@entry=0x7ffc8d24f0d0, argc=0, argc@entry=4, argv=0x7ffc8d24f248,
    argv@entry=0x7ffc8d24f228) at /usr/src/pgcopydb/src/bin/pgcopydb/../lib/subcommands.c/commandline.c:71
#10 0x000057db30f36cf0 in main (argc=4, argv=0x7ffc8d24f228) at main.c:142
```
**Root cause:**
PQpipelineSync might clear the select read condition because it may read
data from the server.
Code path: PQpipelineSync -> pqPipelineSyncInternal -> pqFlush -> pqSendSome -> pqReadData
**Solution:**
Get rid of the select call and read results in blocking mode. This also
reduces CPU utilization, as the function never polls.
Along with fixing the deadlock, this commit also fixes the following:
1) Use-after-free while dealing with PGresult
2) Handling of notifications while reading results
[1] https://github.com/postgres/postgres/blob/fd49e8f32325c675d9bb6e26fcdbe9754249932f/src/interfaces/libpq/fe-misc.c#L856-L926
Fixes https://github.com/timescale/team-data-onboarding/issues/149
Fixes #794
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
commit 5c50c3b1fc0612186acd24271521a0b5363bf067
Author: Ken Barber <kbarber@salesforce.com>
Date: Thu Jun 13 20:21:33 2024 +1000
Standalone sequences do not get reset when schema is filtered (#816)
Because standalone sequences have no table relationships, they were getting skipped.
commit 662289c102bd3925d1d85b63241198fe7b5850f8
Author: Cem Eliguzel <cemeliguzel@microsoft.com>
Date: Tue Jun 11 18:07:57 2024 +0300
Fix initialisation of the start time, fixing timing reports on subsequent runs (#814)
Make sure that the startTime is set on each catalog start timing.
commit b7c7ac48c5a7b48fb7595b7a5ff1eadd6a1a06d9
Author: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Date: Tue Jun 11 18:15:20 2024 +0530
Problem: Vacuum worker exits early (just after table copy) (#815)
commit bbf94e7cbd17d07021302241578317029ccc0307
Author: Cem Eliguzel <cemeliguzel@microsoft.com>
Date: Tue Jun 11 15:35:44 2024 +0300
Use a single dump file to give more dependency information to pg_restore. (#804)
commit f667d46c77d30560de533d0d3f8a827e0e9ba32b
Author: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Date: Fri Jun 7 19:19:46 2024 +0530
Fix Array out of bounds access during transform (#809)
The logic that removes a redundant SET for columns whose value is the same
as in the WHERE clause accesses oldValue using the newValue index. The two
tuples can have a varying number of items, causing out-of-bounds access.
We use the correct index to access the old column name, but an incorrect
index was used to access the old column value.
Solution: use the old column index to access its value.
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
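The bug class can be illustrated with a tiny sketch (hypothetical names; the real pgcopydb code works on parsed logical-decoding tuples):

```c
#include <string.h>

/* Old and new tuples can have different column counts and ordering, so each
 * side must be addressed with its own index. */
typedef struct { const char *name; const char *value; } Column;

/* Return 1 when the SET value for new column n equals the WHERE value of the
 * same-named old column. Using n to index oldCols directly would be the
 * out-of-bounds bug described above. */
int set_is_redundant(Column *oldCols, int oldCount, Column *newCols, int n)
{
	for (int o = 0; o < oldCount; o++)  /* find the old column by name */
	{
		if (strcmp(oldCols[o].name, newCols[n].name) == 0)
		{
			/* correct: old value via old index o, not new index n */
			return strcmp(oldCols[o].value, newCols[n].value) == 0;
		}
	}
	return 0;
}
```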
commit 022f2a37f2317fc19c5376e7c9b5fb110e2c5286
Author: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Date: Thu Jun 6 18:32:16 2024 +0300
Update versions for checkout github action (#806)
There is an issue with the current latest version of the checkout action
that causes the action to fail with the following error:
```
Error relocating /__e/node20_alpine/bin/node: secure_getenv: symbol not found
```
To remedy this, we downgrade our checkout action for stylechecker to the
last known working major version.
commit 39420dd422e522f8b6d4c31ba2dcbe75f980bc22
Author: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Date: Thu Jun 6 16:05:12 2024 +0300
Remove redundant ANALYZE queries (#805)
If the user enabled the --estimate-table-sizes option, we should not run
ANALYZE queries on the tables that will be split by ctid, as we already
ran vacuumdb --analyze-only before.
Sample output from pagila:
12:02:38.720 46 INFO Running vacuumdb --analyze-only on source database before calculating table size estimates
12:02:38.720 46 INFO /usr/bin/vacuumdb --analyze-only --jobs 4 --dbname 'postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
12:02:38.955 46 INFO vacuumdb: vacuuming database "pagila"
12:02:39.318 46 INFO Table public.film_actor is 240 kB large which is larger than --split-tables-larger-than 200 kB, and does not have a unique column of type integer: splitting by CTID
12:02:39.318 46 DEBUG Skipping running ANALYZE on table public.film_actor for CTID split
commit 047c842ea332da13c2f9bbbada05fb81d626f508
Author: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Date: Thu Jun 6 15:58:17 2024 +0300
Add support for limiting same table concurrency (#803)
This commit adds a new option --split-max-parts to limit the number of
jobs that can be run concurrently for the same table. This option is
useful when the number of tables is large and the system is running out
of resources.
Optionally the user can also set the environment variable
COPYDB_SPLIT_MAX_PARTS to the desired value.
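The effect of such a cap can be sketched as follows (illustrative only; the actual partitioning logic lives in pgcopydb's source):

```c
/* Compute how many same-table COPY parts to use: a size-based estimate,
 * capped by splitMaxParts when that cap is set (> 0). */
int compute_split_parts(long long tableBytes, long long splitThreshold,
						int splitMaxParts)
{
	if (splitThreshold <= 0 || tableBytes <= splitThreshold)
	{
		return 1;               /* table too small to split */
	}

	/* one part per threshold-sized chunk, rounded up */
	long long parts = (tableBytes + splitThreshold - 1) / splitThreshold;

	if (splitMaxParts > 0 && parts > splitMaxParts)
	{
		parts = splitMaxParts;
	}
	return (int) parts;
}
```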
commit 493025b5b5a54eeb762be3527d1f157abcc3122f
Author: Cem Eliguzel <cemeliguzel@microsoft.com>
Date: Tue Jun 4 17:03:37 2024 +0300
Fix issues in the docs. (#801)
* Fix some minor issues in the docs
* Update the docs and help messages about the sentinel table
* Update sentinel subcommand doc
commit f6f7dacb8a888dddfa37348a2318b8012e3bb1ba
Author: Cem Eliguzel <cemeliguzel@microsoft.com>
Date: Tue Jun 4 15:06:13 2024 +0300
Log libpq notice messages with proper log levels. (#802)
commit b9590f0f48941c694b83ce5929679ed7f6a12f9c
Author: Cem Eliguzel <cemeliguzel@microsoft.com>
Date: Mon Jun 3 19:33:37 2024 +0300
Get the block size from Postgres source database. (#795)
The Postgres block size defaults to 8192 bytes and can be changed at Postgres compile time. The selected value is exposed as a GUC, and we can query for it using a SQL query.
commit 120601c7796bbb519d2079c9819ea85dfb756a21
Author: Gokhan Gulbiz <ggulbiz@gmail.com>
Date: Thu May 30 17:14:17 2024 +0300
Refactor pgcopydb for standby server support (#655)
To be able to use a standby server, pgcopydb needs to switch from ISOLATION_SERIALIZABLE to ISOLATION_READ_REPEATED, and we also need to avoid calling into ANALYZE, which is not available there.
commit 06ae6560dbf44f765833e3073a8343c99fb514ff
Author: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Date: Thu May 30 15:59:26 2024 +0530
Implement support for filtering materialized view and data (refresh) (#737)
Currently, there is no way to filter a materialized view or materialized view data (refresh).
pg_dump supports this with the following options:
* pg_dump --exclude-table filters the materialized view altogether.
* pg_dump --exclude-table-data filters the REFRESH MATERIALIZED VIEW.
Solution: incorporate materialized view filtering into the existing filtering framework. [exclude-table] and [include-table] can be used to filter the materialized view, and [exclude-table-data] can be used to filter materialized view data (refresh).
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
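With this change, a filter file might look like the following sketch (section names are those from the commit message; the table names are hypothetical):

```ini
# Exclude a materialized view altogether (like pg_dump --exclude-table)
[exclude-table]
public.matview_stats

# Keep the matview but skip its REFRESH (like pg_dump --exclude-table-data)
[exclude-table-data]
public.matview_daily_report
```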
commit b0ce438cd0b30b2d4db0db9be9d9b7bd9515fa75
Author: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Date: Wed May 29 20:37:17 2024 +0300
Introduce new options for estimating table sizes (#793)
This commit introduces a new option --estimate-table-sizes. When set,
pgcopydb will use the relpages value in system catalog to estimate the
size of each table. This is done by multiplying the number of pages by
the page size. PostgreSQL uses a default of 8KB for page sizes, but this can
be changed at build time.
Optionally, the user can set the environment variable
PGCOPYDB_ESTIMATE_TABLE_SIZES to a boolean value to enable the option.
If this option is used, we run vacuumdb --analyze-only on the source to
update the relpages values before calculating the estimates.
In passing, I fixed several things including the following:
- Fix comment references to pg_autoctl
- Remove some unused functions
- Remove reference to non-existing vacuumdb command
- Remove --cache and --drop-cache references that are no longer used
- Fix typos in vacuum logs and comments
- Remove references to pgcopydb_table_size table
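The estimate described above boils down to simple arithmetic (a sketch; pgcopydb reads relpages from the system catalog and the block size from the server):

```c
/* Estimate a table's size from its pg_class.relpages count and the server's
 * block size (8192 bytes by default, configurable at Postgres build time). */
long long estimate_table_bytes(long long relpages, int blockSize)
{
	return relpages * (long long) blockSize;
}
```

For example, a table whose relpages is 128 with the default 8192-byte block size is estimated at 1 MB, without scanning the table itself.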
commit aece0790d9222c6f0e8e75b378c9f0d2fff72e83
Author: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
Date: Wed May 29 18:54:09 2024 +0300
Improve formatting for pretty printed integers (#792)
During development I created a table with 1000 tuples, and in the logs I
saw that the format was not easy to read and understand:
... with an estimated total of 1 0 tuples ...
For numbers with 4, 5, or 6 decimal digits, we should pad the number
with zeros.
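Zero-padding every group after the first fixes the output; here is a sketch of group-of-three formatting with a space separator (hedged: the real pgcopydb formatting helper may differ):

```c
#include <stdio.h>

/* Format a non-negative n with space-separated groups of three digits; every
 * group after the first is zero-padded, so 1000 prints as "1 000", not "1 0". */
void pretty_print_int(long long n, char *buf, size_t size)
{
	if (n < 1000)
	{
		snprintf(buf, size, "%lld", n);
		return;
	}
	char rest[64];

	/* format the leading groups first, then append this zero-padded group */
	pretty_print_int(n / 1000, rest, sizeof(rest));
	snprintf(buf, size, "%s %03lld", rest, n % 1000);
}
```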
commit 19795fca058a34292a9f15a8cd5e360d41bf46c0
Author: VaibhaveS <56480355+VaibhaveS@users.noreply.github.com>
Date: Mon May 27 16:52:37 2024 +0530
Improve foreign key dependency tracking SQL query for extension tables (#790)
commit a988fec7d1bd67daef91904051b5b32afe902da2
Author: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Date: Sat May 25 14:22:08 2024 +0530
Allow exclude-index filtering with other filters (#789)
[exclude-index] works only when no other filter is present.
Solution: fix the index listing query to consider exclude-index filters in
all possible scenarios.
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
commit 2ca40d5ddb39264f5308b01ce17fcf3bc32ca878
Author: VaibhaveS <56480355+VaibhaveS@users.noreply.github.com>
Date: Thu May 23 17:57:19 2024 +0530
Skip disable by ctid (#785)
commit a8d2749959da3900e32f9a618600bf3575cc49f3
Author: Dimitri Fontaine <dim@tapoueh.org>
Date: Thu May 23 12:06:47 2024 +0200
Post release adjustments (#788)
Signed-off-by: Dimitri Fontaine <dim@tapoueh.org>