Introduction to Row Level Security. Examining access rights differentiation systems implemented in Oracle и PostgreSQL

There are many ways to show the user only the data they need. Row level security (RLS) is one of the most universal, simple, and reliable mechanisms ensuring that the data are presented only to persons having the required access rights. In this article, I will show that there is nothing really difficult in RLS and will explain how to set up an access rights differentiation system using the database tools and without affecting the performance much.

Row level security allows to implement the differentiation of access rights to data using the database tools. Furthermore, this is done transparently for applications interacting with the database. Even if somebody gains direct access to the database (e.g. using the account of the data schema’s owner), RLS can prevent the attacker from viewing protected information. Unlike traditional rights management systems that grant access to the entire database objects (e.g. tables), RLS policies allow to remove entire rows from a selection or hide column values for rows the user does not have access to.

How it works? When a query to the database is executed, the parser checks whether any access policies apply to these tables. If yes, it composes an additional predicate on the basis of each policy and adds it to the query. Predicates can be of any complexity, for instance:

and (512 in (select id_user from v_admin_users where id_department = 255) or 512 in (select id_user from v_bypass_rls))

The main benefit is that predicates are applied to all queries, including those made by administration tools (SQL Developer, Toad, PgAdmin, etc.), and even to queries made by the backup software. This is a single access control mechanism for all applications operating at the DBMS engine level. So, the question is: why isn’t RLS used as commonly as it should? There are several reasons for this, including:

The number of specialists skilled in database management is much less in comparison with programmers skilled in conventional languages. Therefore, in many cases, it’s easier and cheaper to implement access control mechanisms in the application layer;
Transparency. If RLS is disabled, this may not be discovered immediately. The applications will continue running as usual, but they will provide more data than needed. This is not a big issue, but with bad processes and keeping in mind the previous paragraph, it may be risky; and
The execution of queries consumes extra resources. Normally, this isn’t a big deal. If RLS is really needed, it’s enabled, and a small slowdown of the system is accepted as an inevitable fact of life. But if RLS is misconfigured, it may consume times more resources than the effective work of the system.

Overall, row level security is a centralized tool for access rights management. It’s implemented in many modern database management systems, including Oracle, PostgreSQL, and MS SQL Server. In this article, I will show how it works in the first two products.

Oracle

I will use a practical approach to demonstrate how RLS is implemented in Oracle.

An example for Oracle

Let’s implement a simple policy based on a standard HR schema. Ordinary users may only see data available to them. The department head can see all the department-related data. I have to:

identify the employee on whose behalf the session is running;
create a function that composes a predicate for this employee; and
set up a policy that binds the function to the table.

I assume that the app connects to the database on behalf of an HR user account, and this app has an authentication mechanism that allows to understand which employee is working with it. The information on the connected employee will be saved in a context – a special key-value store of attributes that control applications. A standard CLIENT_IDENTIFIER could be used for this purpose, but I am going to create my own identifier. This, in turn, requires to create a package that will work with it.

First, I create a context on behalf of a privileged user and specify which package can change it:

CREATE CONTEXT SECURITY_CONTEXT USING HR.P_SEC_CONTEXT;

Then I create a package to work with this context:

create or replace package P_SEC_CONTEXT is
procedure set_employee(p_employee_id in number);
end P_SEC_CONTEXT;
/
create or replace package body P_SEC_CONTEXT is
procedure set_employee(p_employee_id in number)
is
begin
dbms_session.set_context(namespace => 'SECURITY_CONTEXT',
attribute => 'EMPLOYEE_ID',
value => p_employee_id);
end;
end P_SEC_CONTEXT;
/

Now the app will use p_sec_context.set_employee to specify the code of the employee working in this DB session (i.e. the employee’s identifier in the EMPLOYEES table).

If you use connection pooling, you have to set the context every time a connection is received. The default value can be specified in the logon trigger, but this is not mandatory. Below is a sample trigger:

create trigger tr_hr_logon after logon on HR.SCHEMA
begin
p_sec_context.set_employee(null);
end;
/

If you use CLIENT_IDENTIFIER instead of p_sec_context.set_employee, you have to specify dbms_session.set_identifier.

Next, I need a function to compose the access predicate. The text returned by this function will be added to the general condition block (WHERE) of any query that uses the table with the AND operator. This can be illustrated by the following transformation: WHERE (all my conditions are combined into one block by parentheses) AND access_predicate. The function must take two string parameters: the schema name and the table name. It must not write anything to the database (the purity level is WNDS: write no database state), but it’s not necessary to declare this using pragma RESTRICT_REFERENCES.

create or replace function sec_employees(p_schema_name in varchar2,
p_object_name in varchar2)
return varchar2 is
begin
return ' (employee_id = sys_context (''SECURITY_CONTEXT'', ''EMPLOYEE_ID'')'
|| ' or department_id in '
|| '(select department_id from departments where manager_id = sys_context (''SECURITY_CONTEXT'', ''EMPLOYEE_ID'')))';
end;
/

The maximum length of the created condition must not exceed 32 KB. Aside from this, you are free to use whatever you want: subqueries, table joins, and all the power of SQL are at your service.

info

In this example, the sys_context query is included in the predicate (instead of getting its value and combining it with the other conditions). This is done to reduce the number of hard parses.

Finally, I create a policy for situations when the table is queried. It has to be done by a privileged user:

begin
dbms_rls.add_policy(object_schema => 'HR',
object_name => 'EMPLOYEES',
policy_name => 'TEST_EMPLOYEES',
function_schema => 'HR',
policy_function => 'SEC_EMPLOYEES',
statement_types => 'SELECT',
enable => true,
policy_type => dbms_rls.SHARED_CONTEXT_SENSITIVE);
end;
/

After that, select * from EMPLOYEES returns zero records. I set the context and gain full access:

begin
p_sec_context.set_employee(p_employee_id => 100);
end;

The added predicate is clearly visible in the execution plan.

Performance

There are two reasons why RLS slows down the system.

Time required to compose an additional predicate. To reduce it, you have to use caching to the maximum. The caching type is determined by the policy type. In the above example (SHARED_CONTEXT_SENSITIVE), the predicate isn’t recomposed if the session context hasn’t changed, and it won’t be composed for other objects that use the same function; and
Time required to execute the additional predicate. It can be optimized similar to regular queries.

Problems and possible solutions

In the above example, I entrust the calling code to set the context. Therefore, nothing prevents the attacker from calling p_sec_context.set_employee with different identifiers and checking the results. However, it’s possible to complicate the malefactors’ lives by adding more data to the user code (e.g. a hash on behalf of the user and the current time) and validate the user code based on these data;
Some data can be accessed bypassing RLS. Oracle collects the tables’ statistics: numbers of rows, largest and smallest column values, numbers of unique values, distribution histograms, etc. The policies don’t filter the statistics in any way; accordingly, such data are available to all users. Some additional information can also be obtained by making changes in tables that have foreign keys. Finally, unique constraint validation ignores the policies, too;
In the above example, the code responsible for security is owned by the data schema. This enables the HR user to view and change this object. To complicate the attacker’s life, you can place objects in a separate schema. If necessary, the code can be obfuscated as well – but this won’t help much against a determined hacker; and
Predicates are composed dynamically in the course of their execution; therefore, it’s difficult to debug them. Just accept this.

PostgreSQL

PostgreSQL supports RLS starting from version 9.5. Unlike Oracle, PostgreSQL policies are compilable. This simplifies their development and eliminates cache-related issues, but slightly limits their capabilities.

An example for PostgreSQL

I am going to implement the same policy as in Oracle. Ordinary users may only see data available to them. The department head can see all the department-related data. If something goes wrong, the system must neither report an error nor provide access to the data. I don’t have an HR schema; so, first of all, I have to create a minimum set of tables with data.

create table hr.employees
(
employee_id serial,
first_name VARCHAR(20),
last_name VARCHAR(25),
email VARCHAR(25),
phone_number VARCHAR(20),
hire_date DATE,
job_id VARCHAR(10),
salary float,
commission_pct float,
manager_id integer,
department_id integer
);
insert into hr.employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id)
values (100, 'Steven', 'King', 'SKING', '515.123.4567', to_date('20-03-2058', 'dd-mm-yyyy'), 'AD_PRES', 24000.00, null, null, 90);
insert into hr.employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id)
values (101, 'Neena', 'Kochhar', 'NKOCHHAR', '515.123.4568', to_date('24-06-2060', 'dd-mm-yyyy'), 'AD_VP', 17000.00, null, 100, 90);
insert into hr.employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id)
values (102, 'Lex', 'De Haan', 'LDEHAAN', '515.123.4569', to_date('17-10-2055', 'dd-mm-yyyy'), 'AD_VP', 17000.00, null, 100, 90);
insert into hr.employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id)
values (103, 'Alexander', 'Hunold', 'AHUNOLD', '590.423.4567', to_date('06-10-2060', 'dd-mm-yyyy'), 'IT_PROG', 9000.00, null, 102, 60);
insert into hr.employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id)
values (104, 'Bruce', 'Ernst', 'BERNST', '590.423.4568', to_date('21-02-2062', 'dd-mm-yyyy'), 'IT_PROG', 6000.00, null, 103, 60);
insert into hr.employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id)
values (105, 'David', 'Austin', 'DAUSTIN', '590.423.4569', to_date('28-03-2060', 'dd-mm-yyyy'), 'IT_PROG', 4800.00, null, 103, 60);
create table hr.departments
(
department_id serial primary key,
department_name VARCHAR(30),
manager_id numeric,
location_id numeric
);
insert into hr.departments (department_id, department_name, manager_id, location_id)
values (60, 'IT', 103, 1400);
insert into hr.departments (department_id, department_name, manager_id, location_id)
values (90, 'Executive', 100, 1700);
alter table hr.employees
add constraint fk_emp_dep foreign key (department_id) references hr.departments (department_id)
on update cascade on delete cascade;
create index fki_emp_dep
on hr.employees(department_id);

Then I enable RLS on the table.

alter table hr.employees enable row level security;

Enabling RLS for the table owner. By default, the access policies are bypassed by this user.

alter table hr.employees force row level security;

Adding the access predicate:

create policy test_security on hr.employees
using ((employee_id = current_setting('SECURITY_CONTEXT.EMPLOYEE_ID', true)::integer
or department_id in
(select department_id from hr.departments where manager_id = current_setting('SECURITY_CONTEXT.EMPLOYEE_ID', true)::integer)));

Using contains the exact text to be added to the query when the policy is executed, and its syntax is checked immediately. The current_setting function returns the value of the session parameter that identifies the user. The second parameter suppresses errors if the context is not initialized. The app can set the context by executing the following query:

select set_config('SECURITY_CONTEXT.EMPLOYEE_ID', '100', false);

The setup is finished, and I review the execution plan.

Seq Scan on employees (cost=20.36..21.49 rows=3 width=322)

Filter: ((employee_id = (current_setting(‘SECURITY_CONTEXT.EMPLOYEE_ID’::text, true))::integer) OR (hashed SubPlan 1))

SubPlan 1
-> Seq Scan on departments (cost=0.00..20.35 rows=2 width=4)

Filter: (manager_id = ((current_setting(‘SECURITY_CONTEXT.EMPLOYEE_ID’::text, true))::integer)::numeric)

Alternatively, it’s possible to create a database user for each employee user and configure the policy regulating the use of the database accounts. To do so, I have to drop the existing policy first:

drop policy test_security on hr.employees;

If operations are regulated by several policies, they are combined together using the OR operator; therefore, if I hadn’t dropped test_security, queries to employees would look as follows: where ... ( ((employee_id = current_setting('SECURITY_CONTEXT.EMPLOYEE_ID', true)::integer or department_id in (select department_id from hr.departments where manager_id = current_setting('SECURITY_CONTEXT.EMPLOYEE_ID', true)::integer))) OR (email = current_user)).

Creating a new policy:

create policy test_security_role on hr.employees
for select
using (email = current_user);

This policy allows users to see only their own data. To access these data, users must connect to the database using accounts matching their email addresses (e.g. SKING), not their accounts in the HR table. No additional actions with the app are required. But for each person interacting with the system, a user must be created in the database with all respective rights.

Potential issues

Similar to Oracle, the attacker who has access to the statistics can partially bypass RLS. In addition, functions declared as leakproof can be executed prior to the application of the policies. If such a function, contrary to its specifications, exposes values of its arguments, it can be used to obtain information.

Conclusions

Row level security is not a panacea and has some drawbacks. Its use is justified if:

More than one app works with the data;
Access control has to be implemented without significantly reworking the program; and
Data access control has to implemented in a separate layer.

It’s preferable to use other methods if:

Only one application works with the data; and
The database is already heavily loaded and limits the system performance.

Summarizing the above: if RLS is suitable for your case, don’t hesitate to use it. Hopefully, this article was helpful in mastering the RLS technology. Good luck.