Hive Audit Logs

Hive auditing is similar to YARN in that it has no dedicated audit log file. Audit events are written into the Hive Metastore service log itself, which can make them challenging to isolate. However, the audit logger's class name can be used to identify audit events. Other Hive components, such as HiveServer2, do not have explicit auditing, but audit-like information can still be gleaned from their service logs.

Audit events are tagged with the logger name org.apache.hadoop.hive.metastore.HiveMetaStore.audit, which makes it easy to search the log specifically for audit events.
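Because the logger name is a fixed string, isolating audit events amounts to a substring filter over the metastore log. The sketch below is a minimal Python illustration of that idea, not a Hive tool. The sample log lines are hypothetical, and the exact line layout (timestamp, level, thread name) depends on your log4j configuration.

```python
from typing import Iterable, Iterator

# Logger name that Hive Metastore audit events are tagged with (from the text above).
AUDIT_LOGGER = "org.apache.hadoop.hive.metastore.HiveMetaStore.audit"


def audit_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the metastore log lines emitted by the audit logger."""
    for line in lines:
        if AUDIT_LOGGER in line:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical log excerpt; the real layout depends on the log4j configuration.
    sample = [
        "2017-05-01 10:00:00,123 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: "
        "[pool-5-thread-2]: get_table : db=default tbl=sales\n",
        "2017-05-01 10:00:00,124 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: "
        "[pool-5-thread-2]: ugi=alice\tip=/10.0.0.5\tcmd=get_table : db=default tbl=sales\n",
    ]
    for line in audit_lines(sample):
        print(line)  # only the second (audit) line is printed
```

In practice you would pass an open file object for the metastore service log instead of the sample list; where that log lives varies by distribution.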

In Hive audit events, only the username is shown rather than the full Kerberos UPN (user principal name), and the action performed by the user is identified by the cmd field.
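To pull the user and the cmd value out of each audit event, a regular expression is usually enough. This sketch assumes a tab-separated "ugi=<user> ip=<address> cmd=<action>" layout; the ugi and ip field names and the sample line are assumptions here rather than something stated above, so check them against your own metastore log before relying on the parser.

```python
import re
from typing import Dict, Optional

# Assumed audit message layout: tab-separated ugi=, ip= and cmd= fields.
AUDIT_FIELDS = re.compile(r"ugi=(?P<user>[^\t]*)\tip=(?P<ip>[^\t]*)\tcmd=(?P<cmd>.*)$")


def parse_audit_event(line: str) -> Optional[Dict[str, str]]:
    """Extract the user, client address and command from one audit log line."""
    match = AUDIT_FIELDS.search(line.rstrip("\n"))
    if match is None:
        return None  # not an audit line, or a layout this sketch does not recognise
    event = match.groupdict()
    event["ip"] = event["ip"].lstrip("/")  # addresses often carry a leading "/"
    event["cmd"] = event["cmd"].strip()    # drop any trailing tab after the command
    return event


if __name__ == "__main__":
    # Hypothetical audit line in the assumed layout.
    sample = (
        "2017-05-01 10:00:00,124 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: "
        "[pool-5-thread-2]: ugi=alice\tip=/10.0.0.5\tcmd=get_table : db=default tbl=sales\t"
    )
    print(parse_audit_event(sample))
    # {'user': 'alice', 'ip': '10.0.0.5', 'cmd': 'get_table : db=default tbl=sales'}
```

Lines that do not match return None, so the parser can be mapped over a whole log and the misses discarded before grouping events by user or command.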

Cloudera Impala Audit Logs

Impala writes audit trails to dedicated audit logs maintained by each Impala daemon (impalad). The audit log directory is specified with the flag audit_event_log_dir; a typical choice is /var/log/impalad/audits. These log files are rolled over once they reach a maximum “size,” which is measured as a number of lines and set with the flag max_audit_event_log_file_size. A reasonable setting is 5,000 lines.
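Both settings are ordinary impalad startup flags. The sketch below assembles them in Python and estimates how many rolled files a given number of audit entries would fill. It assumes one audit entry per line, and the helper names are hypothetical; how the flags actually reach impalad depends on how the daemon is deployed.

```python
import math
from typing import List

# Values suggested in the text; adjust for your cluster.
AUDIT_DIR = "/var/log/impalad/audits"
MAX_LINES_PER_FILE = 5000


def impalad_audit_flags(audit_dir: str = AUDIT_DIR,
                        max_lines: int = MAX_LINES_PER_FILE) -> List[str]:
    """Build the impalad startup flags for line-count-rotated audit logging."""
    if max_lines <= 0:
        raise ValueError("max_lines must be a positive number of lines")
    return [
        f"--audit_event_log_dir={audit_dir}",
        f"--max_audit_event_log_file_size={max_lines}",
    ]


def files_needed(events: int, max_lines: int = MAX_LINES_PER_FILE) -> int:
    """Estimate how many audit files `events` entries fill (one entry per line assumed)."""
    return math.ceil(events / max_lines) if events > 0 else 0


if __name__ == "__main__":
    print(" ".join(impalad_audit_flags()))
    print(files_needed(12_000))  # 3 files: 5,000 + 5,000 + 2,000 lines
```

Counting in lines rather than bytes keeps each rolled file a predictable number of entries, which makes it easy to reason about how much history a directory of audit files holds.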

We will discuss HBase, Accumulo, Sentry, and log aggregation in the next part.

Conclusion

As the industry adopts Hadoop-based data lakes, Hadoop security is also maturing, enabling CISOs, CIOs, and business information security analysts to perform forensics from an audit perspective. Telecom and BFSI (banking, financial services, and insurance) clients are demanding audit log capabilities with a “boots on the ground” approach. To that end, a solid understanding of Hadoop and its ecosystem is essential to achieving business objectives.

Published at DZone with permission of Rupam Bora, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.