DFS-Lib


Simple Scala interface for HDFS filesystem operations.


Setup

Add the required Hadoop and logging dependencies to your build.sbt:

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.8.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.8.1"
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.9.4"

Usage

Import the library and initialize your filesystem:

import dfs._
import org.apache.hadoop.fs.FileSystem

implicit val fs: FileSystem = yourHadoopClusterInstance.getFileSystem()
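
If you do not already have a cluster handle, a minimal sketch of obtaining one is to read the Hadoop configuration from the classpath. Note that FileSystem.get and Configuration are standard Hadoop APIs, not part of DFS-Lib:

import org.apache.hadoop.conf.Configuration

// Builds the filesystem from core-site.xml / hdfs-site.xml found on the classpath;
// without them this falls back to the local filesystem.
implicit val fs: FileSystem = FileSystem.get(new Configuration())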

File operations

Create files

Create a file with automatic parent directory creation:

val created = touch("path/to/file.txt")

Create with custom parameters:

val created = touch(
  path = "path/to/file.txt",
  overwrite = true,
  bufferSize = 8192,          // I/O buffer size in bytes
  replicationFactor = 3,
  blockSize = 268435456       // HDFS block size in bytes (256 MB)
)

Create directories

Create a directory and all parent directories:

val created = mkdir("path/to/directory")

Move and rename

Basic move operation:

val moved = mv("source/path", "destination/path")

Move into a directory (creates parents if needed):

val moved = mv.into("source/path", "destination/directory")

Move with overwrite:

val moved = mv.over("source/path", "destination/path")

Copy operations

Copy a single file:

cp(fs, "source/file.txt", "destination/file.txt")

Copy directories recursively:

cp.recursive(fs, "source/directory", "destination/directory")

Remove operations

Remove a file:

val removed = rm("path/to/file.txt")

Remove directories recursively:

val removed = rm.r("path/to/directory")

File inspection

Check existence

val fileExists = exists("path/to/file.txt")
val isDir = isDirectory("path/to/directory")
val isRegularFile = isFile("path/to/file.txt")

Get file information

Get file size:

val fileSize = size("path/to/file.txt")

Get comprehensive file metadata:

val metadata = stat("path/to/file.txt")
println(s"Size: ${metadata.size} bytes")
println(s"Owner: ${metadata.owner}")
println(s"Permissions: ${metadata.permissions}")

List directory contents

List files in directory:

val files = ls(fs, "path/to/directory")

List with detailed information:

val details = ls.details(fs, "path/to/directory")
println(details)

File content operations

Read file contents

Read entire file:

val content = cat(fs, "path/to/file.txt")

Read with line numbers:

val content = cat.numbered(fs, "path/to/file.txt")

Read first N lines:

val head = cat.head(fs, "path/to/file.txt", lines = 20)

Read last N lines:

val tail = cat.tail(fs, "path/to/file.txt", lines = 10)

Permission operations

Change ownership

Change file owner:

chown("path/to/file.txt", "newowner")

Change owner and group:

chown("path/to/file.txt", "newowner", "newgroup")

Recursive ownership change:

chown.r("path/to/directory", "newowner", "newgroup")

Change permissions

Set file permissions using a Unix-style octal string:

val permissions = Perm("755")
chmod("path/to/file.txt", permissions)

Error handling

All operations return a Boolean indicating success or failure; critical errors throw exceptions. Failed operations are logged with detailed error messages.
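
As a minimal sketch (using the mkdir call shown above), the returned Boolean can be checked directly:

// React to a failed directory creation; the library has already logged the cause.
if (!mkdir("path/to/directory")) {
  throw new RuntimeException("could not create path/to/directory")
}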


For developers

Run tests:

sbt test

Acknowledgement

Special thanks to @lihaoyi for his Scala libraries, particularly OS-Lib, which heavily inspired this library's design patterns.
