How to create an Oracle materialized view that performs well on fast refresh ?

Recently, working for a client, we faced an interesting challenge using Oracle database. It's been quite painful to gather scattered piece of information to get to the goal, so I thought I could summarize and share it here.

Let's imagine you have 20 million addresses, and you want to enrich them with the country they belong to. You have a country look up table. And you want to keep track of how the country has been enriched, so you need a dedicated table to store addresses' country (or you would violate normalization rules).

The core problem have been this: how to allow displaying AND modifying data stored in quite big tables (milions of records) with acceptable performance for the end user.

Let's go into it and create our model first!

We create and populate the address table

```sql
CREATE TABLE address (
    address_id   NUMBER,
    address      VARCHAR2(25 CHAR)
);

ALTER TABLE address
ADD PRIMARY KEY (address_id);

BEGIN
    FOR i IN 1..20000000 LOOP
        INSERT INTO address (
            address_id,
            address
        ) VALUES (
            i,
            'address_' || i
        );
    END LOOP;
END;
/
```

Then we create and populate the country look up table:

```sql
CREATE TABLE country (
    country_iso    VARCHAR2(3 CHAR),
    country_name   VARCHAR2(50 CHAR)
);

ALTER TABLE country
ADD PRIMARY KEY (country_iso);

INSERT INTO country (
    country_iso,
    country_name
) VALUES (
    'FRA',
    'France'
);

INSERT INTO country (
    country_iso,
    country_name
) VALUES (
    'USA',
    'USA'
);
```

Finally we create and populate the enrichment table:

```sql
CREATE TABLE country_address (
    address_id   NUMBER,
    country_iso  VARCHAR2(3 CHAR)
);

ALTER TABLE country_address
ADD PRIMARY KEY (address_id);

BEGIN
    FOR i IN 1..20000000 LOOP
        INSERT INTO country_address (
            address_id,
            country_iso
        ) VALUES (
            i,
            'FRA'
        );
    END LOOP;
END;
/
```

Then, selecting data is a bit slow:

```sql
SELECT
    a.address_id,
    a.address,
    ca.country_iso,
    c.country_name
FROM
    address           a
    LEFT JOIN country_address   ca ON a.address_id = ca.address_id
    LEFT JOIN country           c ON ca.country_iso = c.country_iso;
 ```

Takes ~9s with my set up. 

Important remark: all addresses may not have been enriched, but you want to see them all, because the whole point of showing addresses in this context is to enrich the ones that have not been enriched yet. So you have to use outer joins in your select. Moreover, in real life, we have more columns to select and more tables to join so the performance is far from being acceptable.

So we can take advantage of materialized views:

```sql
CREATE MATERIALIZED VIEW enriched_addresses
AS
SELECT
    a.address_id,
    a.address,
    ca.country_iso,
    c.country_name
FROM
    address           a
    LEFT JOIN country_address   ca ON a.address_id = ca.address_id
    LEFT JOIN country c ON ca.country_iso = c.country_iso;
    ```

Which takes ~25s with my personnal setup.

But for selecting:

```sql
SELECT *
FROM enriched_addresses;
```

50 records are fetched within 0.026s.

When the enrichment process is automated and executes overnight, it's perfectly acceptable. But in our case, we need to have a web application that allows user to manually enrich addresses, which is the reason why we want to display and modify data with good performance. Then 25s to refresh the materialized view is even worse than 5s to show the 50 first records.

Luckily for us, Oracle implemented a "fast refresh" mode for materialized views. In order to activate fast refresh, we have to create materialized view logs on the underlying tables. Let's create them:

```sql
CREATE MATERIALIZED VIEW LOG ON address WITH ROWID INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON country_address WITH ROWID INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON country WITH ROWID INCLUDING NEW VALUES;```

Then we can enable fast refresh: 

```sql
ALTER MATERIALIZED VIEW enriched_addresses REFRESH FAST;
```

But then, when we try to refresh:

```sql
EXEC DBMS_MVIEW.REFRESH('enriched_addresses');
```

This error is thrown:

ORA-12004: "REFRESH FAST cannot be used for materialized view"
*Cause: The materialized view log does not exist or cannot be used.
PCT refresh is also not enabled on the materialized view

Since we created the logs, why can't they be used ?

The reason is "the rowids of all the detail tables must appear in the SELECT list of the materialized view query definition", so we have to add them:

```sql
DROP MATERIALIZED VIEW enriched_addresses;

CREATE MATERIALIZED VIEW enriched_addresses
REFRESH FAST
AS
SELECT
    a.rowid address_rowid,
    ca.rowid country_address_rowid,
    c.rowid country_rowid,
    a.address_id,
    a.address,
    ca.country_iso,
    c.country_name
FROM
    address           a
    LEFT JOIN country_address   ca ON a.address_id = ca.address_id
    LEFT JOIN country c ON ca.country_iso = c.country_iso;
```

Which leads to another error:

ORA-12015: cannot create a fast refresh materialized view from a complex query

This one has been a bit tricky to investigate ! This is due to the fact that fast-refresh does not support the Ansi join syntax (as shown on Ask Tom for instance). So we have to re-write our materialized view creation statement:

```sql
CREATE MATERIALIZED VIEW enriched_addresses
REFRESH FAST
AS
SELECT
    a.rowid address_rowid,
    ca.rowid country_address_rowid,
    c.rowid country_rowid,
    a.address_id,
    a.address,
    ca.country_iso,
    c.country_name
FROM
    address           a,
    country_address   ca,
    country c
WHERE  a.address_id = ca.address_id (+)
 AND ca.country_iso = c.country_iso (+);
```

Now, let's update one record and refresh the materialized view:

```sql
UPDATE country_address
SET country_iso = 'USA'
WHERE address_id = 1;

EXEC dbms_mview.refresh('enriched_addresses');
```

 The refresh actions takes ~11 seconds on my laptop. Better, but still not satisfying.

Let's have a look at what happened. When looking at the trace we can see that the statement is this one:

```sql
UPDATE /*+ BYPASS_UJVC */ (
    SELECT /*+ NO_MERGE ("JV$") */
        "SNA$"."COUNTRY_ADDRESS_ROWID"   "C0_0",
        "JV$"."RID$"                     "C1_0",
        "SNA$"."COUNTRY_ROWID"           "C0_1",
        "MAS$0".rowid                    "C1_1",
        "SNA$"."COUNTRY_ISO"             "C0_2",
        "JV$"."COUNTRY_ISO"              "C1_2",
        "SNA$"."COUNTRY_NAME"            "C0_3",
        "MAS$0"."COUNTRY_NAME"           "C1_3"
    FROM
        (
            SELECT
                "MAS$"."ROWID" "RID$",
                "MAS$".*
            FROM
                "MYUSER"."COUNTRY_ADDRESS" "MAS$"
            WHERE
                ROWID IN (
                    SELECT  /*+ HASH_SJ */
                        chartorowid("MAS$"."M_ROW$$") rid$
                    FROM
                        "MYUSER"."MLOG$_COUNTRY_ADDRESS" "MAS$"
                    WHERE
                        "MAS$".snaptime$$ > :b_st1
                )
        ) AS OF SNAPSHOT ( :b_scn ) "JV$",
        "COUNTRY" AS OF SNAPSHOT ( :b_scn ) "MAS$0",
        "ADDRESS" AS OF SNAPSHOT ( :b_scn ) "MAS$2",
        "MYUSER"."ENRICHED_ADDRESSES" "SNA$"
    WHERE
        "MAS$2"."ADDRESS_ID" = "JV$"."ADDRESS_ID"
        AND "JV$"."COUNTRY_ISO" = "MAS$0"."COUNTRY_ISO" (+)
        AND "SNA$"."ADDRESS_ROWID" = "MAS$2".rowid
) uv$
SET
    "C0_0" = "C1_0",
    "C0_1" = "C1_1",
    "C0_2" = "C1_2",
    "C0_3" = "C1_3"
```

And the explain plan says:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  ENRICHED_ADDRESSES (cr=239968 pr=218783 pw=0 time=7114170 us)
         1          1          1   HASH JOIN  (cr=239968 pr=218783 pw=0 time=7114138 us cost=96136 size=207 card=1)
         1          1          1    NESTED LOOPS  (cr=41243 pr=20262 pw=0 time=2321899 us cost=41890 size=61 card=1)
         1          1          1     NESTED LOOPS OUTER (cr=41239 pr=20261 pw=0 time=2321455 us cost=41889 size=43 card=1)
         1          1          1      VIEW  (cr=41237 pr=20261 pw=0 time=2321442 us cost=41888 size=33 card=1)
         1          1          1       HASH JOIN RIGHT SEMI (cr=41237 pr=20261 pw=0 time=2321441 us cost=41888 size=148 card=1)
         2          2          2        TABLE ACCESS FULL MLOG$_COUNTRY_ADDRESS (cr=8 pr=0 pw=0 time=29 us cost=6 size=276 card=2)
  20000000   20000000   20000000        TABLE ACCESS FULL COUNTRY_ADDRESS (cr=41229 pr=20261 pw=0 time=1945833 us cost=41824 size=200000000 card=20000000)
         1          1          1      TABLE ACCESS BY INDEX ROWID COUNTRY (cr=2 pr=0 pw=0 time=11 us cost=1 size=10 card=1)
         1          1          1       INDEX UNIQUE SCAN SYS_C0025420 (cr=1 pr=0 pw=0 time=7 us cost=0 size=0 card=1)(object id 110365)
         1          1          1     INDEX UNIQUE SCAN SYS_C0025419 (cr=4 pr=1 pw=0 time=443 us cost=1 size=18 card=1)(object id 110363)
  20000000   20000000   20000000    MAT_VIEW ACCESS FULL ENRICHED_ADDRESSES (cr=198725 pr=198521 pw=0 time=4149354 us cost=54191 size=2795070282 card=19144317)

We'll address the full table access on the table country_adrdress later, but first we need to add indexes on enriched_addresses:

```sql
CREATE INDEX enr_addr_address_rowid_idx ON enriched_addresses ( address_rowid );
CREATE INDEX enr_addr_countr_addr_rowid_idx ON enriched_addresses ( country_address_rowid );
CREATE INDEX enr_addr_country_rowid_idx ON enriched_addresses ( country_rowid );
```

And then we update a value again and refresh the materialized view:

```sql
UPDATE country_address
SET country_iso = 'FRA'
WHERE address_id = 1;

EXEC dbms_mview.refresh('enriched_addresses');
```

~5s. Better, but still too much.

The statement is the same, but the explain plan became:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  ENRICHED_ADDRESSES (cr=41246 pr=20316 pw=0 time=2382607 us)
         1          1          1   NESTED LOOPS  (cr=41246 pr=20314 pw=0 time=2382373 us cost=41893 size=207 card=1)
         1          1          1    NESTED LOOPS  (cr=41245 pr=20314 pw=0 time=2382369 us cost=41893 size=207 card=1)
         1          1          1     NESTED LOOPS  (cr=41242 pr=20312 pw=0 time=2382049 us cost=41890 size=61 card=1)
         1          1          1      NESTED LOOPS OUTER (cr=41239 pr=20312 pw=0 time=2382044 us cost=41889 size=43 card=1)
         1          1          1       VIEW  (cr=41237 pr=20312 pw=0 time=2382029 us cost=41888 size=33 card=1)
         1          1          1        HASH JOIN RIGHT SEMI (cr=41237 pr=20312 pw=0 time=2382027 us cost=41888 size=148 card=1)
         2          2          2         TABLE ACCESS FULL MLOG$_COUNTRY_ADDRESS (cr=8 pr=0 pw=0 time=30 us cost=6 size=276 card=2)
  20000000   20000000   20000000         TABLE ACCESS FULL COUNTRY_ADDRESS (cr=41229 pr=20312 pw=0 time=2000407 us cost=41824 size=200000000 card=20000000)
         1          1          1       TABLE ACCESS BY INDEX ROWID COUNTRY (cr=2 pr=0 pw=0 time=12 us cost=1 size=10 card=1)
         1          1          1        INDEX UNIQUE SCAN SYS_C0025420 (cr=1 pr=0 pw=0 time=8 us cost=0 size=0 card=1)(object id 110365)
         1          1          1      INDEX UNIQUE SCAN SYS_C0025419 (cr=3 pr=0 pw=0 time=5 us cost=1 size=18 card=1)(object id 110363)
         1          1          1     INDEX RANGE SCAN ENR_ADDR_ADDRESS_ROWID_IDX (cr=3 pr=2 pw=0 time=319 us cost=2 size=0 card=1)(object id 110382)
         1          1          1    MAT_VIEW ACCESS BY INDEX ROWID ENRICHED_ADDRESSES (cr=1 pr=0 pw=0 time=2 us cost=3 size=146 card=1)

This is the other weird behaviour which took me a while to understand. The Oracle knowledge database says "This problem is caused by unpublished Bug 4196039, "Inefficient plan causing poor incremental refresh for Materialized View", and for those not having access, the Ask Tom version, Alberto Dell’Era’s blog version or Dominic Brooks’ one.

So we gather materialized view stats while it’s empty and lock them:

```sql
EXEC sys.dbms_stats.gather_table_stats(user, 'MLOG$_ADDRESS');
EXEC sys.dbms_stats.lock_table_stats(user, 'MLOG$_ADDRESS');
EXEC sys.dbms_stats.gather_table_stats(user, 'MLOG$_COUNTRY');
EXEC sys.dbms_stats.lock_table_stats(user, 'MLOG$_COUNTRY');
EXEC sys.dbms_stats.gather_table_stats(user, 'MLOG$_COUNTRY_ADDRESS');
EXEC sys.dbms_stats.lock_table_stats(user, 'MLOG$_COUNTRY_ADDRESS');
```

And being connected as sys, we set the _mv_refresh_use_stats parameter to true:

```sql
ALTER SYSTEM SET "_mv_refresh_use_stats" = true;
```

Let's update data again:

```sql
UPDATE country_address
SET country_iso = 'USA'
WHERE address_id = 1;

EXEC dbms_mview.refresh('enriched_addresses');
```

Which finally takes 0.099s. Here we are ! 🚀

Becareful with the "_mv_refresh_use_stats”, apparently it can lead to problems in some cases, as mentionned on Dominique Brook’s blog.