Complete Perl Programming Language book for modern system administrators

This comprehensive guide covers:
- 22 chapters of practical Perl programming
- Focus on system administration and automation
- Modern Perl best practices and techniques
- Real-world examples and production-ready code
- 3 appendices with one-liners, gotchas, and resources

The book targets experienced sysadmins, DevOps engineers, and automation
specialists, demonstrating Perl's continued relevance in 2025 for text
processing, system administration, and rapid development.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
David Liedle
2025-09-14 11:41:48 -06:00
parent c2230650ea
commit 37128dedfb
27 changed files with 19994 additions and 0 deletions

40
00-table-of-contents.md Normal file
View File

@@ -0,0 +1,40 @@
# The PERL Programming Language: A Modern Guide for System Administrators and Automation Engineers
## Table of Contents
### Part I: Foundations
- **Chapter 1**: Why Perl Still Matters in 2025
- **Chapter 2**: Getting Started - Modern Perl Setup
- **Chapter 3**: Perl Fundamentals - Variables and Data Types
- **Chapter 4**: Control Flow and Subroutines
- **Chapter 5**: Regular Expressions - Perl's Superpower
### Part II: Text Processing and File Operations
- **Chapter 6**: File I/O and Directory Operations
- **Chapter 7**: Advanced Text Processing
- **Chapter 8**: Working with CSV, JSON, and XML
- **Chapter 9**: Log File Analysis and Monitoring
### Part III: System Administration
- **Chapter 10**: Process Management and System Commands
- **Chapter 11**: Network Programming and Web Scraping
- **Chapter 12**: Database Operations with DBI
- **Chapter 13**: Configuration Management and Templating
### Part IV: Modern Perl Development
- **Chapter 14**: CPAN - The Treasure Trove
- **Chapter 15**: Object-Oriented Perl
- **Chapter 16**: Testing and Debugging
- **Chapter 17**: Performance and Optimization
### Part V: Real-World Applications
- **Chapter 18**: Building Command-Line Tools
- **Chapter 19**: System Monitoring and Alerting Scripts
- **Chapter 20**: Automation Workflows and Cron Jobs
- **Chapter 21**: RESTful APIs and Web Services
- **Chapter 22**: Security Best Practices
### Appendices
- **Appendix A**: Perl One-Liners Cookbook
- **Appendix B**: Common Gotchas and Solutions
- **Appendix C**: Resources and Community

View File

@@ -0,0 +1,98 @@
# Chapter 1: Why Perl Still Matters in 2025
> "Perl is the duct tape of the Internet." - Hassan Schroeder
If you're reading this in 2025, you've probably heard Perl is dead. You've been told Python ate its lunch, Ruby stole its dinner, and Go is drinking its milkshake. Yet here's the thing: millions of lines of Perl code are running right now, processing your credit card transactions, analyzing genomic data, managing content for major websites, and quietly keeping the internet's plumbing functional.
## The Rumors of Perl's Death Have Been Greatly Exaggerated
Let me tell you a secret that Silicon Valley's trendiest developers won't admit: Perl never left. While everyone was busy rewriting their microservices for the third time this year, Perl has been steadily processing terabytes of logs, managing critical infrastructure, and doing what it's always done best—getting stuff done without the fanfare.
Consider this: Amazon's backend? Built on Perl. DuckDuckGo? Perl. Booking.com, serving millions of hotel reservations daily? You guessed it—Perl. The Human Genome Project? Analyzed with Perl. Your bank's overnight batch processing? There's a good chance Perl is involved.
## Why SysAdmins Still Love Perl
Picture this scenario: It's 3 AM, a critical log processing system has failed, and you need to parse 50GB of semi-structured logs to find out why. You could:
1. Spin up a Spark cluster (45 minutes)
2. Write a Python script with pandas (hope you have enough RAM)
3. Use a one-line Perl command that completes in 2 minutes
```perl
perl -ne 'print if /ERROR.*timeout/ && /user_id=(\d+)/' massive.log | sort | uniq -c
```
This is why seasoned sysadmins keep Perl in their toolkit. It's not about being old-school; it's about being effective.
## The Swiss Army Chainsaw
Larry Wall didn't design Perl to be elegant. He designed it to be useful. While other languages pride themselves on having "one obvious way" to do things, Perl celebrates choice. This philosophy—"There's More Than One Way To Do It" (TMTOWTDI, pronounced "Tim Toady")—is both Perl's greatest strength and the source of its reputation.
Yes, you can write unreadable Perl. You can also write unreadable Python, Go, or Rust. The difference is that Perl doesn't pretend otherwise. It trusts you to be an adult.
## What Makes Perl Special in 2025?
### 1. **Unmatched Text Processing**
No language—not Python, not Ruby, not even modern JavaScript—comes close to Perl's text manipulation capabilities. Regular expressions aren't bolted on; they're part of the language's DNA.
```perl
# Find all email addresses in a file and count domains
perl -ne 'print "$1\n" while /[\w.-]+@([\w.-]+)/g' emails.txt | sort | uniq -c
```
Try writing that as concisely in any other language.
### 2. **CPAN: The Original Package Repository**
Before npm, before pip, before gem, there was CPAN (Comprehensive Perl Archive Network). With over 200,000 modules, if you need to do something, someone has probably already written a Perl module for it. And unlike the JavaScript ecosystem, these modules tend to be stable for decades, not deprecated every six months.
### 3. **Backward Compatibility That Actually Works**
That Perl script you wrote in 2005? It still runs. No "Perl 2 vs Perl 3" drama. No breaking changes every major version. Perl respects your investment in code.
### 4. **Speed Where It Counts**
For text processing, log analysis, and system automation tasks, Perl often outperforms "faster" languages. Why? Because these tasks play to Perl's strengths: regular expressions are compiled once and cached, string operations are optimized at the C level, and the interpreter is tuned for exactly these use cases.
## The Modern Perl Renaissance
Here's what the "Perl is dead" crowd doesn't know: Perl 7 is coming, and the language has been quietly modernizing. Recent versions have added:
- Subroutine signatures (finally!)
- Postfix dereferencing (cleaner syntax)
- Unicode improvements
- Better error messages
- Performance enhancements
The Perl community learned from the Python 2/3 debacle and is managing the transition carefully, maintaining backward compatibility while modernizing the language.
## Who Should Learn Perl in 2025?
You should learn Perl if you:
- Manage Linux/Unix systems
- Process large amounts of text data
- Need quick, reliable automation scripts
- Work with legacy systems (they're not going anywhere)
- Appreciate tools that prioritize getting work done over looking pretty
- Want to understand how modern scripting languages evolved
You might skip Perl if you:
- Only build front-end web applications
- Need a language with strong typing guarantees
- Work exclusively in Windows environments (though Perl works there too)
- Prefer languages with stricter style guidelines
## A Language for Pragmatists
Perl isn't trying to win beauty contests. It's not the language you learn to impress people at tech meetups. It's the language you learn because you have work to do, and you want to do it efficiently.
In 2025, we have languages optimized for every conceivable metric: execution speed (Rust), developer happiness (Ruby), simplicity (Go), type safety (Haskell). Perl is optimized for something different: getting things done. And in the real world of system administration, data munging, and keeping the lights on, that's often exactly what you need.
## What's Next?
In the coming chapters, we'll dive into practical Perl. You'll learn not just the syntax, but the idioms that make Perl powerful. We'll write real scripts that solve real problems—the kind you encounter at 3 AM when production is down and Stack Overflow isn't helping.
But first, let's get your environment set up. Because the best way to learn Perl isn't to read about it—it's to write it.
---
*Remember: In the world of system administration and automation, the question isn't "Is this the newest technology?" The question is "Does this solve my problem?" And for countless problems, the answer is still Perl.*

View File

@@ -0,0 +1,350 @@
# Chapter 2: Getting Started - Modern Perl Setup
> "A good workman never blames his tools, but a smart one makes sure they're sharp." - Anonymous SysAdmin
Before we dive into Perl's syntax and philosophy, let's get your environment set up properly. If you're going to be a Perl programmer in 2025, you might as well do it right from the start. This chapter will save you hours of frustration and teach you how the pros manage their Perl installations.
## The Perl That's Already There (And Why You Shouldn't Use It)
Open a terminal on almost any Unix-like system and type:
```bash
perl -v
```
Congratulations! You probably have Perl installed. On macOS, Linux, BSD—it's there. This is both a blessing and a curse.
The blessing: Perl is so useful for system tasks that OS vendors include it by default.
The curse: That system Perl is there for the OS, not for you. It might be ancient (I'm looking at you, CentOS), it definitely has modules the system depends on, and updating it could break things in spectacular ways.
**Golden Rule #1**: Never mess with system Perl. Leave it alone. It's not yours.
## Enter Perlbrew: Your Personal Perl Paradise
The solution? Install your own Perl. And the easiest way to do that is with Perlbrew—a tool that lets you install and manage multiple Perl versions without sudo, without conflicting with system Perl, and without tears.
### Installing Perlbrew
On macOS with Homebrew:
```bash
brew install perlbrew
```
On Linux/Unix:
```bash
curl -L https://install.perlbrew.pl | bash
```
Now add this to your shell configuration (`~/.bashrc`, `~/.zshrc`, etc.):
```bash
source ~/perl5/perlbrew/etc/bashrc
```
Restart your terminal or run:
```bash
exec $SHELL
```
### Your First Personal Perl
Let's install a fresh, modern Perl:
```bash
# See available versions
perlbrew available
# Install the latest stable version (as of 2025, this would be 5.38+)
perlbrew install perl-5.38.2
# Or install with threading support (useful for some modules)
perlbrew install perl-5.38.2 -Dusethreads
# This will take a few minutes. Grab coffee. Perl is compiling.
```
Once installed:
```bash
# List your installed Perls
perlbrew list
# Switch to your new Perl
perlbrew switch perl-5.38.2
# Verify
perl -v
which perl # Should show something like /home/you/perl5/perlbrew/perls/perl-5.38.2/bin/perl
```
Boom! You now have your own Perl that you can upgrade, modify, and experiment with without fear.
## CPAN: Your New Best Friend
CPAN (Comprehensive Perl Archive Network) is Perl's killer feature. It's a massive repository of reusable code that's been solving problems since 1995. But first, we need to configure it properly.
### First-Time CPAN Setup
Run:
```bash
cpan
```
The first time you run CPAN, it'll ask if you want automatic configuration. Say yes. It's smart enough to figure out your system.
But we can do better. Let's install `cpanminus` (cpanm), a more modern, zero-configuration CPAN client:
```bash
# Install cpanminus
cpan App::cpanminus
# Or, if you prefer a one-liner:
curl -L https://cpanmin.us | perl - App::cpanminus
```
Now you can install modules with:
```bash
cpanm Module::Name
```
No configuration dialogs, no interactive prompts, just installation. This is how we'll install modules throughout this book.
### Essential Modern Perl Modules
Let's install some modules that make Perl more pleasant in 2025:
```bash
# Modern::Perl - Enable modern Perl features with one line
cpanm Modern::Perl
# Perl::Tidy - Format your code consistently
cpanm Perl::Tidy
# Perl::Critic - Lint your code for best practices
cpanm Perl::Critic
# Data::Dumper - Inspect data structures (probably already installed)
cpanm Data::Dumper
# Try::Tiny - Better exception handling
cpanm Try::Tiny
# Path::Tiny - File path manipulation done right
cpanm Path::Tiny
# JSON::XS - Fast JSON parsing
cpanm JSON::XS
```
## Your First Modern Perl Script
Let's write a simple script using modern Perl features. Create a file called `hello_modern.pl`:
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use feature 'signatures';
no warnings 'experimental::signatures';
sub greet($name = 'World') {
say "Hello, $name!";
}
greet();
greet('Perl Hacker');
# Let's use some modern features
my @languages = qw(Perl Python Ruby Go Rust);
say "Languages I respect: " . join(', ', @languages);
# Postfix dereferencing (Perl 5.20+)
my $hashref = { name => 'Larry', language => 'Perl' };
say "Creator: " . $hashref->%*{'name'};
# State variables (like static in C)
sub counter {
state $count = 0;
return ++$count;
}
say "Counter: " . counter() for 1..3;
```
Run it:
```bash
perl hello_modern.pl
```
Notice what we did:
- `use Modern::Perl '2023'` enables strict, warnings, and modern features
- Subroutine signatures (no more `my ($arg1, $arg2) = @_`)
- `say` instead of `print` (automatic newline)
- State variables that maintain their value between calls
- Postfix dereferencing for cleaner syntax
## Setting Up Your Editor
You can write Perl in any text editor, but some setup makes life easier. Here are configurations for popular editors:
### VS Code
Install the "Perl" extension by Gerald Richter. It provides:
- Syntax highlighting
- Code formatting via Perl::Tidy
- Linting via Perl::Critic
- Debugger support
### Vim/Neovim
Add to your `.vimrc`:
```vim
" Perl-specific settings
autocmd FileType perl setlocal tabstop=4 shiftwidth=4 expandtab
autocmd FileType perl setlocal cindent
autocmd FileType perl setlocal cinkeys-=0#
" Run perltidy on save (optional)
autocmd BufWritePre *.pl :%!perltidy -q
```
### Emacs
Emacs has excellent Perl support out of the box with `cperl-mode`:
```elisp
(defalias 'perl-mode 'cperl-mode)
(setq cperl-indent-level 4)
(setq cperl-close-paren-offset -4)
```
## Creating a Perl Project Structure
Unlike some languages, Perl doesn't enforce a project structure. But here's a sensible layout for modern Perl projects:
```
my-perl-project/
├── lib/ # Your modules
├── script/ # Executable scripts
├── t/ # Tests
├── cpanfile # CPAN dependencies
├── .perltidyrc # Code formatting config
├── .perlcriticrc # Linting config
└── README.md
```
### Managing Dependencies with cpanfile
Create a `cpanfile` to track your project's dependencies:
```perl
requires 'Modern::Perl', '1.20230701';
requires 'Try::Tiny', '0.31';
requires 'Path::Tiny', '0.144';
on 'test' => sub {
requires 'Test::More', '1.302195';
requires 'Test::Exception', '0.43';
};
```
Install dependencies:
```bash
cpanm --installdeps .
```
### Code Formatting with Perl::Tidy
Create `.perltidyrc`:
```
# Indent style
--indent-columns=4
--continuation-indentation=4
# Whitespace
--add-whitespace
--noblanks-before-blocks
--blanks-before-subs
# Line length
--maximum-line-length=100
# Braces
--opening-brace-on-new-line
--closing-brace-else-on-same-line
```
Format your code:
```bash
perltidy script.pl
```
### Linting with Perl::Critic
Create `.perlcriticrc`:
```
severity = 3
theme = core
[TestingAndDebugging::RequireUseStrict]
severity = 5
[TestingAndDebugging::RequireUseWarnings]
severity = 5
```
Check your code:
```bash
perlcritic script.pl
```
## The Perl Development Workflow
Here's a typical workflow for developing Perl scripts in 2025:
1. **Start with a shebang and Modern::Perl**
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
```
2. **Write your code**
3. **Format with perltidy**
```bash
perltidy -b script.pl # -b backs up original
```
4. **Check with perlcritic**
```bash
perlcritic script.pl
```
5. **Test thoroughly** (we'll cover testing in a later chapter)
6. **Document as you go** (POD - Plain Old Documentation)
## Common Gotchas and Solutions
### Problem: "Can't locate Module/Name.pm in @INC"
**Solution**: You forgot to install the module. Run `cpanm Module::Name`
### Problem: Script works on your machine but not on the server
**Solution**: Check Perl versions (`perl -v`) and installed modules. Use `cpanfile` to track dependencies.
### Problem: "Perl is too old" errors
**Solution**: That's why we installed our own Perl with Perlbrew!
### Problem: Different behavior on different systems
**Solution**: Always use `#!/usr/bin/env perl`, not `#!/usr/bin/perl`. This uses the Perl in your PATH.
## Your Toolkit Is Ready
You now have:
- A modern Perl installation you control
- CPAN modules at your fingertips
- An editor configured for Perl development
- Tools for formatting and linting your code
- A sensible project structure
This is the foundation every serious Perl programmer needs. You're not writing CGI scripts in 1999; you're writing modern, maintainable Perl in 2025.
In the next chapter, we'll dive into Perl's data types and variables. But unlike other tutorials, we'll focus on what makes Perl's approach unique and powerful for system administration and text processing tasks.
---
*Pro tip: Bookmark [metacpan.org](https://metacpan.org). It's the modern web interface to CPAN with better search, documentation, and dependency information. When you need a module for something, start there.*

View File

@@ -0,0 +1,417 @@
# Chapter 3: Perl Fundamentals - Variables and Data Types
> "In Perl, the variable tells you what it contains, not what it is." - Larry Wall
If you're coming from other languages, Perl's approach to variables might seem... different. Where Python has lists and dicts, and C has int and char, Perl has scalars, arrays, and hashes. But here's the thing: this simplicity is deceptive. These three data types, combined with references and context, give you incredible flexibility. Let's explore.
## The Sigils: Perl's Type System
In Perl, variables wear their type on their sleeve—literally. The symbol at the beginning of a variable (called a sigil) tells you what kind of data it holds:
- `$` - Scalar (single value)
- `@` - Array (ordered list)
- `%` - Hash (key-value pairs)
This isn't just syntax; it's documentation. When you see `$name`, you know it's a single value. When you see `@names`, you know it's a list. No guessing, no IDE required.
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
my $server = 'web01.prod'; # Scalar: single value
my @servers = qw(web01 web02 web03); # Array: list of values
my %status = ( # Hash: key-value pairs
web01 => 'running',
web02 => 'running',
web03 => 'maintenance'
);
```
## Scalars: Not Just Simple Values
The name "scalar" suggests simplicity, but Perl scalars are surprisingly sophisticated. A scalar can hold:
```perl
my $integer = 42;
my $float = 3.14159;
my $string = "Hello, World";
my $huge_number = 123_456_789; # Underscores for readability
my $binary = 0b11111111; # Binary literal (255)
my $hex = 0xFF; # Hexadecimal literal (255)
my $octal = 0755; # Octal literal (493)
# Perl converts between types automatically
my $answer = "42";
my $result = $answer + 8; # 50 - Perl converts string to number
say "The answer is $result";
# But be careful with strings that don't look like numbers
my $name = "Larry";
my $bad_math = $name + 5; # 5 - "Larry" becomes 0 in numeric context
```
### The Magic of Interpolation
One of Perl's most loved features is string interpolation:
```perl
my $user = 'alice';
my $home = '/home';
my $path = "$home/$user"; # Double quotes interpolate
say "User $user home: $path";
# But single quotes don't interpolate
my $literal = '$home/$user'; # Literally: $home/$user
say 'This is literal: $literal'; # Prints: This is literal: $literal
# Need a literal $ in double quotes? Escape it
say "The cost is \$42.00";
# Complex expressions need curly braces
my $count = 5;
say "Next value: ${count}1"; # Next value: 51
say "Squared: @{[$count**2]}"; # Squared: 25 (array interpolation trick)
```
### Undefined Values and Truth
Perl has a special value `undef` that represents "no value":
```perl
my $nothing; # $nothing is undef
my $something = undef; # Explicitly undef
# Check for definedness
if (defined $nothing) {
say "This won't print";
}
# Perl's truth rules:
# False: undef, 0, "0", "" (empty string), empty list
# True: Everything else (including "00", "0.0", negative numbers)
my @falsy_values = (undef, 0, "0", "");
for my $val (@falsy_values) {
say "Value " . ($val // 'undef') . " is false" unless $val;
}
# The // operator (defined-or) is incredibly useful
my $port = $ENV{PORT} // 8080; # Use env var or default to 8080
```
## Arrays: Lists with Attitude
Perl arrays are ordered, integer-indexed collections. But they're more flexible than arrays in most languages:
```perl
# Multiple ways to create arrays
my @empty = ();
my @numbers = (1, 2, 3, 4, 5);
my @words = qw(apple banana cherry); # qw = quote words
my @mixed = (42, "hello", 3.14, undef); # Mixed types? No problem!
# Array operations
push @numbers, 6; # Add to end
my $last = pop @numbers; # Remove and return last element
unshift @numbers, 0; # Add to beginning
my $first = shift @numbers; # Remove and return first element
# Array access
say $numbers[0]; # First element (note the $)
say $numbers[-1]; # Last element (negative indexing!)
say $numbers[-2]; # Second to last
# Array slices
my @subset = @numbers[1..3]; # Elements 1, 2, 3
my @selection = @numbers[0, 2, 4]; # Elements 0, 2, 4
# Array size
my $size = @numbers; # Array in scalar context gives size
my $last_index = $#numbers; # Last valid index
# Useful array operations
my @sorted = sort @words;
my @reversed = reverse @numbers;
my @unique = do { my %seen; grep { !$seen{$_}++ } @mixed };
```
### The Power of List Context
Perl's context sensitivity is unique. The same expression can return different values depending on context:
```perl
my @lines = <DATA>; # List context: all lines
my $line = <DATA>; # Scalar context: one line
# Many functions are context-aware
my @matches = $text =~ /\w+/g; # List context: all matches
my $count = $text =~ /\w+/g; # Scalar context: count of matches
# Force context
my $count = @array; # Scalar context
my @copy = @{[@array]}; # List context (array ref then deref)
my ($first) = @array; # List context, but only taking first element
```
## Hashes: The Swiss Army Data Structure
Hashes (associative arrays) map keys to values. They're unordered but incredibly fast for lookups:
```perl
# Creating hashes
my %empty = ();
my %config = (
host => 'localhost',
port => 8080,
ssl => 1,
);
# Fat comma => is just a fancy comma that quotes the left side
my %same_config = ('host', 'localhost', 'port', 8080, 'ssl', 1);
# Hash operations
$config{timeout} = 30; # Add/update
my $host = $config{host}; # Access (note the $)
delete $config{ssl}; # Remove key
my $exists = exists $config{port}; # Check if key exists
# Hash slices
my @values = @config{qw(host port)}; # Get multiple values
@config{qw(user pass)} = qw(admin secret); # Set multiple values
# Iterating hashes
for my $key (keys %config) {
say "$key: $config{$key}";
}
# Or with each (stateful iterator)
while (my ($key, $value) = each %config) {
say "$key => $value";
}
# Getting all keys and values
my @keys = keys %config;
my @values = values %config;
```
### Real-World Hash Example: Log Analysis
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# Count occurrences of each IP in a log
my %ip_count;
while (<>) { # Read from files or STDIN
if (/(\d+\.\d+\.\d+\.\d+)/) {
$ip_count{$1}++;
}
}
# Sort by count and display top 10
my @sorted = sort { $ip_count{$b} <=> $ip_count{$a} } keys %ip_count;
for my $ip (@sorted[0..9]) {
last unless defined $ip;
printf "%15s: %d hits\n", $ip, $ip_count{$ip};
}
```
## References: Perl's Pointers
References let you create complex data structures. Think of them as pointers that are actually safe to use:
```perl
# Creating references
my @array = (1, 2, 3);
my $array_ref = \@array; # Reference to array
my $anon_array = [1, 2, 3]; # Anonymous array ref
my %hash = (a => 1, b => 2);
my $hash_ref = \%hash; # Reference to hash
my $anon_hash = {a => 1, b => 2}; # Anonymous hash ref
# Dereferencing
my @copy = @$array_ref; # Old style
my @copy2 = @{$array_ref}; # With braces for clarity
my @copy3 = $array_ref->@*; # Postfix dereference (modern)
# Arrow operator for accessing elements
say $array_ref->[0]; # First element
say $hash_ref->{a}; # Value for key 'a'
# Complex data structures
my $servers = {
production => {
web => ['web01', 'web02', 'web03'],
db => ['db01', 'db02'],
},
staging => {
web => ['staging-web01'],
db => ['staging-db01'],
},
};
# Access nested data
say $servers->{production}{web}[0]; # web01
push @{$servers->{production}{web}}, 'web04'; # Add web04
# Check structure with Data::Dumper
use Data::Dumper;
print Dumper($servers);
```
## Type Checking and Conversion
Perl is dynamically typed but not weakly typed. It knows what type of data you have:
```perl
use Scalar::Util qw(looks_like_number blessed reftype);
my $maybe_number = "42";
if (looks_like_number($maybe_number)) {
say "It's a number!";
}
my $ref = [1, 2, 3];
say "Reference type: " . ref($ref); # ARRAY
# Checking reference types
if (ref($ref) eq 'ARRAY') {
say "It's an array reference";
}
# More sophisticated type checking
use Ref::Util qw(is_arrayref is_hashref is_coderef);
if (is_arrayref($ref)) {
say "Definitely an array ref";
}
```
## Context: The Secret Sauce
Context is Perl's superpower. Every operation happens in either scalar or list context:
```perl
# The same function can return different things
sub context_aware {
my @data = (1, 2, 3, 4, 5);
return wantarray ? @data : scalar(@data);
}
my @list = context_aware(); # (1, 2, 3, 4, 5)
my $scalar = context_aware(); # 5
# Forcing context
my $count = () = $string =~ /pattern/g; # Force list context, get count
# Context in file operations
my @lines = <$fh>; # Read all lines
my $line = <$fh>; # Read one line
# The grep operator is context-aware
my @matches = grep { $_ > 5 } @numbers; # List of matches
my $count = grep { $_ > 5 } @numbers; # Count of matches
```
## Special Variables: Perl's Magic
Perl has numerous special variables that make common tasks easier:
```perl
# $_ - The default variable
for (1..5) {
say; # Prints $_ by default
}
# @_ - Subroutine arguments
sub greet {
my ($name) = @_; # Common idiom
say "Hello, $name!";
}
# $! - System error message
open my $fh, '<', 'nonexistent.txt' or die "Can't open: $!";
# $? - Child process exit status
system('ls');
say "Command failed: $?" if $?;
# $$ - Process ID
say "My PID is $$";
# $0 - Program name
say "Running $0";
# @ARGV - Command line arguments
say "Arguments: @ARGV";
# %ENV - Environment variables
say "Home directory: $ENV{HOME}";
```
## Practical Example: Server Status Monitor
Let's put it all together with a practical script:
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use Time::Piece;
# Configuration
my %servers = (
'web01.prod' => { ip => '10.0.1.10', service => 'nginx' },
'web02.prod' => { ip => '10.0.1.11', service => 'nginx' },
'db01.prod' => { ip => '10.0.2.10', service => 'mysql' },
'cache01.prod' => { ip => '10.0.3.10', service => 'redis' },
);
# Check each server
my @down_servers;
my $check_time = localtime;
for my $hostname (sort keys %servers) {
my $server = $servers{$hostname};
my $result = `ping -c 1 -W 1 $server->{ip} 2>&1`;
if ($? != 0) {
push @down_servers, $hostname;
$server->{status} = 'down';
$server->{last_seen} = 'unknown';
} else {
$server->{status} = 'up';
$server->{last_seen} = $check_time->strftime('%Y-%m-%d %H:%M:%S');
}
}
# Report
say "=" x 50;
say "Server Status Report - $check_time";
say "=" x 50;
for my $hostname (sort keys %servers) {
my $s = $servers{$hostname};
my $status_emoji = $s->{status} eq 'up' ? '✓' : '✗';
printf "%-15s %-15s %-10s %s\n",
$hostname, $s->{ip}, $s->{service}, $status_emoji;
}
if (@down_servers) {
say "\n⚠ Alert: " . scalar(@down_servers) . " servers down!";
say " - $_" for @down_servers;
}
```
## Key Takeaways
1. **Sigils are your friends**: They make code self-documenting
2. **Context matters**: Understanding scalar vs list context is crucial
3. **References enable complexity**: But start simple
4. **Interpolation saves time**: But know when to use single quotes
5. **Special variables are powerful**: But document their use
In the next chapter, we'll explore control flow and subroutines, where Perl's philosophy of "there's more than one way to do it" really shines. You'll learn about Perl's unique statement modifiers, the various loop constructs, and how to write subroutines that are both powerful and maintainable.
---
*Remember: In Perl, the data structure you choose shapes how you think about the problem. Choose wisely, but don't overthink it. You can always refactor later.*

View File

@@ -0,0 +1,571 @@
# Chapter 4: Control Flow and Subroutines
> "Do what I mean, not what I say." - The Perl Philosophy
Control flow in Perl is where the language's personality really shines. Yes, you have your standard if/else and loops. But Perl adds statement modifiers, multiple ways to loop, and some of the most flexible subroutine handling you'll find anywhere. This chapter will show you how to write Perl that reads like English while being devastatingly effective.
## Statement Modifiers: Perl's Poetry
Most languages make you write:
```perl
if ($condition) {
do_something();
}
```
Perl lets you write:
```perl
do_something() if $condition;
```
This isn't just syntactic sugar—it's a different way of thinking about code:
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# Traditional vs Perl style
my $file = 'config.txt';
# Traditional
if (-e $file) {
say "File exists";
}
# Perl style - when the action is more important than the condition
say "File exists" if -e $file;
# Works with all control structures
die "Config not found!" unless -e $file;
say $_ while <DATA>;
$count++ for @items;
warn "Retrying..." until connect_to_server();
# Especially elegant for guard clauses
sub process_file {
my ($filename) = @_;
return unless defined $filename;
return unless -e $filename;
return unless -r $filename;
# Main logic here, uncluttered by nested ifs
}
```
## The Many Faces of Conditionals
### if, elsif, else, unless
Perl's `unless` is the logical opposite of `if`. Use it when the negative condition is more natural:
```perl
# Awkward double negative
if (!$user->is_authenticated) {
redirect_to_login();
}
# Clear and direct
unless ($user->is_authenticated) {
redirect_to_login();
}
# Even clearer as a modifier
redirect_to_login() unless $user->is_authenticated;
# Complex conditionals
my $status = get_server_status();
if ($status eq 'running') {
say "All systems operational";
} elsif ($status eq 'degraded') {
warn "Performance degraded";
notify_ops_team();
} elsif ($status eq 'maintenance') {
say "Scheduled maintenance in progress";
} else {
die "Unknown status: $status";
}
```
### The Ternary Operator
For simple conditionals, the ternary operator keeps things concise:
```perl
my $message = $count > 0 ? "$count items found" : "No items found";
# Nested ternaries (use sparingly!)
my $status = $code == 200 ? 'success' :
$code == 404 ? 'not found' :
$code == 500 ? 'server error' : 'unknown';
# Often clearer as a dispatch table
my %status_messages = (
200 => 'success',
404 => 'not found',
500 => 'server error',
);
my $status = $status_messages{$code} // 'unknown';
```
### given/when (Experimental but Useful)
Perl's switch statement is more powerful than most:
```perl
use feature 'switch';
no warnings 'experimental::smartmatch';
given ($command) {
when ('start') { start_service() }
when ('stop') { stop_service() }
when ('restart') { stop_service(); start_service() }
when (/^stat/) { show_status() } # Regex matching!
when ([qw(help ? h)]) { show_help() } # Array matching!
default { die "Unknown command: $command" }
}
```
## Loops: There's More Than One Way To Iterate
### The C-Style for Loop
Traditional but verbose:
```perl
for (my $i = 0; $i < 10; $i++) {
say "Count: $i";
}
```
### The Perl-Style foreach
Much more common and readable:
```perl
# foreach and for are synonyms
for my $item (@items) {
process($item);
}
# Default variable $_
for (@items) {
process($_); # Or just: process()
}
# Range operator
for my $num (1..100) {
say $num if $num % 15 == 0; # FizzBuzz anyone?
}
# Hash iteration
for my $key (keys %hash) {
say "$key: $hash{$key}";
}
# Multiple items at once
my @pairs = qw(a 1 b 2 c 3);
for (my $i = 0; $i < @pairs; $i += 2) {
my ($letter, $number) = @pairs[$i, $i+1];
say "$letter = $number";
}
```
### while and until
Perfect for conditional iteration:
```perl
# Read file line by line
while (my $line = <$fh>) {
chomp $line;
process($line);
}
# With default variable
while (<$fh>) {
chomp; # Operates on $_
process($_);
}
# until is while's opposite
my $attempts = 0;
until (connect_to_database() || $attempts++ > 5) {
sleep 2 ** $attempts; # Exponential backoff
}
```
### Loop Control
Perl provides fine-grained loop control:
```perl
# next - skip to next iteration
for my $file (@files) {
next if $file =~ /^\./; # Skip hidden files
process($file);
}
# last - exit loop
for my $line (<$fh>) {
last if $line =~ /^__END__$/;
process($line);
}
# redo - restart current iteration
my $tries = 0;
for my $url (@urls) {
my $response = fetch($url);
if (!$response && $tries++ < 3) {
sleep 1;
redo; # Try same URL again
}
$tries = 0; # Reset for next URL
}
# Labels for nested loops
FILE: for my $file (@files) {
LINE: while (my $line = <$file>) {
next FILE if $line =~ /SKIP_FILE/;
next LINE if $line =~ /^\s*#/;
process($line);
}
}
```
### The Elegant map and grep
Functional programming in Perl:
```perl
# Transform lists with map
my @files = qw(foo.txt bar.log baz.conf);
my @sizes = map { -s $_ } @files; # Get file sizes
my @upper = map { uc } @words; # Uppercase all
# Filter lists with grep
my @configs = grep { /\.conf$/ } @files;
my @large_files = grep { -s $_ > 1024 * 1024 } @files;
# Chain them
my @large_logs = grep { -s $_ > 1024 * 1024 }
map { "$logdir/$_" }
grep { /\.log$/ }
readdir($dh);
# Schwartzian Transform (sorting optimization)
my @sorted = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, -s $_] }
@files;
```
## Subroutines: Functions with Flexibility
### Basic Subroutines
```perl
# Old style (still common)
sub greet {
my ($name) = @_; # Unpack @_
$name //= 'World';
say "Hello, $name!";
}
# Modern style with signatures (Perl 5.20+)
use feature 'signatures';
no warnings 'experimental::signatures';
sub greet_modern($name = 'World') {
say "Hello, $name!";
}
# Multiple parameters
sub calculate($x, $y, $operation = '+') {
return $x + $y if $operation eq '+';
return $x - $y if $operation eq '-';
return $x * $y if $operation eq '*';
return $x / $y if $operation eq '/';
die "Unknown operation: $operation";
}
```
### Return Values and Context
Subroutines can be context-aware:
```perl
sub get_data {
my @data = qw(apple banana cherry);
# wantarray tells us the context
if (wantarray) {
return @data; # List context
} elsif (defined wantarray) {
return scalar @data; # Scalar context
} else {
say "@data"; # Void context
return;
}
}
my @fruits = get_data(); # ('apple', 'banana', 'cherry')
my $count = get_data(); # 3
get_data(); # Prints: apple banana cherry
```
### Anonymous Subroutines and Closures
Perl supports first-class functions:
```perl
# Anonymous subroutine
my $greeter = sub {
my ($name) = @_;
say "Hello, $name!";
};
$greeter->('Perl');
# Closures capture variables
sub make_counter {
my $count = 0;
return sub { ++$count };
}
my $counter1 = make_counter();
my $counter2 = make_counter();
say $counter1->(); # 1
say $counter1->(); # 2
say $counter2->(); # 1 (independent counter)
# Higher-order functions
sub apply_to_list {
my ($func, @list) = @_;
return map { $func->($_) } @list;
}
my @doubled = apply_to_list(sub { $_ * 2 }, 1..5);
```
### Prototypes (Use with Caution)
Prototypes let you create subroutines that parse like built-ins:
```perl
# Prototype forces scalar context on first arg
sub my_push(\@@) {
my ($array_ref, @values) = @_;
push @$array_ref, @values;
}
my @stack;
my_push @stack, 1, 2, 3; # No need for \@stack
# Block prototype for DSL-like syntax
sub with_file(&$) {
my ($code, $filename) = @_;
open my $fh, '<', $filename or die $!;
$code->($fh);
close $fh;
}
with_file {
my $fh = shift;
while (<$fh>) {
print if /ERROR/;
}
} 'logfile.txt';
```
## Error Handling: Die, Warn, and Eval
### Basic Error Handling
```perl
# die - throw an exception
open my $fh, '<', $file or die "Can't open $file: $!";
# warn - print warning but continue
warn "Config file not found, using defaults\n" unless -e $config;
# Custom die handler
$SIG{__DIE__} = sub {
my $message = shift;
log_error($message);
die $message; # Re-throw
};
```
### Exception Handling with eval
```perl
# Basic eval
eval {
risky_operation();
another_risky_operation();
};
if ($@) {
warn "Operation failed: $@";
# Handle error
}
# String eval (compile and run code at runtime)
my $code = 'print "Hello, World!\n"';
eval $code;
die "Code compilation failed: $@" if $@;
```
### Modern Exception Handling with Try::Tiny
```perl
use Try::Tiny;
try {
risky_operation();
} catch {
warn "Caught error: $_";
# $_ contains the error
} finally {
cleanup(); # Always runs
};
```
## Practical Example: Log File Processor
Let's combine everything into a real-world script:
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use feature 'signatures';
no warnings 'experimental::signatures';
use Try::Tiny;
use Time::Piece;
# Configuration
my %severity_levels = (
DEBUG => 0,
INFO => 1,
WARNING => 2,
ERROR => 3,
FATAL => 4,
);
# Process command line
my ($logfile, $min_level) = @ARGV;
die "Usage: $0 <logfile> [min_level]\n" unless $logfile;
$min_level //= 'INFO';
# Main processing
process_log($logfile, $min_level);
sub process_log($file, $min_level) {
my $min_severity = $severity_levels{$min_level}
// die "Unknown level: $min_level";
open my $fh, '<', $file or die "Can't open $file: $!";
my %stats;
my $line_count = 0;
LINE: while (my $line = <$fh>) {
$line_count++;
chomp $line;
# Skip empty lines and comments
next LINE if $line =~ /^\s*$/;
next LINE if $line =~ /^\s*#/;
# Parse log line
my ($timestamp, $level, $message) = parse_line($line);
next LINE unless $timestamp; # Skip unparseable lines
# Filter by severity
my $severity = $severity_levels{$level} // 0;
next LINE if $severity < $min_severity;
# Collect statistics
$stats{$level}++;
# Special handling for errors
if ($level eq 'ERROR' || $level eq 'FATAL') {
handle_error($timestamp, $level, $message);
}
}
close $fh;
# Report statistics
report_stats(\%stats, $line_count);
}
sub parse_line($line) {
# Example format: 2024-01-15 10:30:45 [ERROR] Connection timeout
return unless $line =~ /
^(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+ # timestamp
\[(\w+)\]\s+ # level
(.+)$ # message
/x;
return ($1, $2, $3);
}
sub handle_error($timestamp, $level, $message) {
state %recent_errors;
state $last_cleanup = time;
# Clean old errors every minute
if (time - $last_cleanup > 60) {
%recent_errors = ();
$last_cleanup = time;
}
# Detect repeated errors
my $key = "$level:$message";
$recent_errors{$key}++;
if ($recent_errors{$key} == 1) {
say "[$timestamp] $level: $message";
} elsif ($recent_errors{$key} == 10) {
warn "Error '$message' has occurred 10 times!";
}
}
sub report_stats($stats, $total) {
say "\n" . "=" x 40;
say "Log Analysis Summary";
say "=" x 40;
say "Total lines processed: $total";
say "\nEvents by severity:";
for my $level (sort { $severity_levels{$b} <=> $severity_levels{$a} }
keys %severity_levels) {
my $count = $stats->{$level} // 0;
next unless $count;
printf " %-8s: %6d\n", $level, $count;
}
}
__DATA__
2024-01-15 10:30:45 [INFO] Server started
2024-01-15 10:30:46 [DEBUG] Loading configuration
2024-01-15 10:31:00 [ERROR] Database connection failed
2024-01-15 10:31:01 [ERROR] Database connection failed
2024-01-15 10:31:02 [WARNING] Retrying database connection
```
## Best Practices
1. **Use statement modifiers for simple conditions** - They make code more readable
2. **Prefer foreach over C-style for** - Unless you specifically need the index
3. **Use map and grep for transformations** - They're faster and clearer than loops
4. **Always unpack @_ at the start of subroutines** - Makes the interface clear
5. **Use state variables instead of file-scoped variables** - Better encapsulation
6. **Handle errors early and explicitly** - Don't let them propagate silently
## Coming Up Next
Now that you understand Perl's control flow and subroutines, you're ready for the main event: regular expressions. In the next chapter, we'll explore why Perl's regex implementation is still the gold standard, and how to wield this power effectively in your system administration tasks.
---
*Remember: Good Perl code tells a story. The statement modifiers, flexible loops, and rich error handling aren't just features—they're tools for expressing your intent clearly. Use them to write code that reads like documentation.*

View File

@@ -0,0 +1,541 @@
# Chapter 5: Regular Expressions - Perl's Superpower
> "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski
> "Those people weren't using Perl." - A Perl Programmer
Regular expressions aren't bolted onto Perl as an afterthought or imported from a library. They're woven into the language's DNA. When other languages were struggling with clunky regex APIs, Perl developers were parsing complex log files with one-liners. This chapter will show you why Perl's regex implementation is still unmatched and how to wield this power responsibly.
## The Basics (But Better)
### Match Operator: m//
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
my $text = "The server at 192.168.1.100 responded in 245ms";
# Basic matching
if ($text =~ /server/) {
say "Found 'server'";
}
# Capture groups
if ($text =~ /(\d+\.\d+\.\d+\.\d+)/) {
say "IP address: $1"; # $1 contains first capture
}
# Multiple captures
if ($text =~ /at ([\d.]+) responded in (\d+)ms/) {
my ($ip, $time) = ($1, $2);
say "Server $ip took ${time}ms";
}
# The !~ operator for negation
say "No errors!" if $text !~ /error|fail|timeout/i;
# Default variable $_
$_ = "Testing 123";
say "Contains number" if /\d+/; # No need for $_ =~
```
### Substitution Operator: s///
```perl
my $config = "ServerName = localhost:8080";
# Basic substitution
$config =~ s/localhost/127.0.0.1/;
say $config; # ServerName = 127.0.0.1:8080
# Global substitution with /g
my $log = "Error Error Warning Error";
$log =~ s/Error/Issue/g;
say $log; # Issue Issue Warning Issue
# Capture and replace
my $date = "2024-01-15";
$date =~ s/(\d{4})-(\d{2})-(\d{2})/$3\/$2\/$1/;
say $date; # 15/01/2024
# Using the result
my $count = $log =~ s/Warning/ALERT/g; # Returns number of replacements
say "Replaced $count warnings";
# The /r modifier returns modified string without changing original
my $original = "hello world";
my $modified = $original =~ s/world/Perl/r;
say $original; # hello world (unchanged)
say $modified; # hello Perl
```
### The Transliteration Operator: tr/// (or y///)
Not technically a regex, but often used alongside them:
```perl
my $text = "Hello World 123";
# Count characters
my $digit_count = $text =~ tr/0-9//;
say "Contains $digit_count digits";
# ROT13 cipher
$text =~ tr/A-Za-z/N-ZA-Mn-za-m/;
say $text; # Uryyb Jbeyq 123
# Remove duplicates
$text =~ tr/a-z//s; # /s squashes duplicate characters
# Delete characters
$text =~ tr/0-9//d; # /d deletes matched characters
```
## Regex Modifiers: Changing the Rules
```perl
# /i - Case insensitive
say "Match!" if "HELLO" =~ /hello/i;
# /x - Extended formatting (ignore whitespace, allow comments)
my $ip_regex = qr/
^ # Start of string
(\d{1,3}) # First octet
\. # Literal dot
(\d{1,3}) # Second octet
\. # Literal dot
(\d{1,3}) # Third octet
\. # Literal dot
(\d{1,3}) # Fourth octet
$ # End of string
/x;
# /s - Single line mode (. matches newline)
my $html = "<div>\nContent\n</div>";
$html =~ /<div>(.*?)<\/div>/s; # Captures across newlines
# /m - Multi-line mode (^ and $ match line boundaries)
my $multi = "Line 1\nLine 2\nLine 3";
my @lines = $multi =~ /^Line \d+$/gm;
# /g - Global matching
my $data = "cat bat rat";
my @words = $data =~ /\w+/g; # ('cat', 'bat', 'rat')
# /o - Compile pattern once (optimization for loops)
for my $line (@huge_file) {
$line =~ /$pattern/o; # Pattern compiled only once
}
```
## Advanced Pattern Matching
### Non-Capturing Groups
```perl
# (?:...) doesn't create a capture variable
my $url = "https://www.example.com:8080/path";
if ($url =~ /^(https?):\/\/(?:www\.)?([^:\/]+)(?::(\d+))?/) {
my ($protocol, $domain, $port) = ($1, $2, $3);
$port //= $protocol eq 'https' ? 443 : 80;
say "Protocol: $protocol, Domain: $domain, Port: $port";
}
```
### Named Captures (Perl 5.10+)
```perl
# Much more readable than $1, $2, $3...
my $log_line = '2024-01-15 10:30:45 [ERROR] Connection timeout';
if ($log_line =~ /
(?<date>\d{4}-\d{2}-\d{2})\s+
(?<time>\d{2}:\d{2}:\d{2})\s+
\[(?<level>\w+)\]\s+
(?<message>.+)
/x) {
say "Date: $+{date}";
say "Time: $+{time}";
say "Level: $+{level}";
say "Message: $+{message}";
}
```
### Lookahead and Lookbehind
```perl
# Positive lookahead (?=...)
# Match 'test' only if followed by 'ing'
"testing tested" =~ /test(?=ing)/; # Matches 'test' in 'testing'
# Negative lookahead (?!...)
# Match 'test' only if NOT followed by 'ing'
"testing tested" =~ /test(?!ing)/; # Matches 'test' in 'tested'
# Positive lookbehind (?<=...)
# Match numbers preceded by '$'
"Price: $50, €50" =~ /(?<=\$)\d+/; # Matches '50' after '$'
# Negative lookbehind (?<!...)
# Match numbers NOT preceded by '$'
"Price: $50, €50" =~ /(?<!\$)\d+/; # Matches '50' after '€'
# Practical example: Extract price without currency symbol
my $price_text = "The cost is $1,234.56 including tax";
if ($price_text =~ /\$(?<price>[\d,]+\.?\d*)/) {
my $price = $+{price};
$price =~ s/,//g; # Remove commas
say "Price: $price"; # Price: 1234.56
}
```
### Recursive Patterns
Perl can match nested structures:
```perl
# Match balanced parentheses
my $balanced = qr/
\( # Opening paren
(?:
[^()]+ # Non-parens
|
(?R) # Recurse entire pattern
)*
\) # Closing paren
/x;
my $text = "func(a, b(c, d(e)), f)";
say "Balanced!" if $text =~ /^func$balanced$/;
```
## Real-World Regex Patterns
### Email Validation (Simplified)
```perl
# This is simplified. Real email validation is complex!
my $email_regex = qr/
^ # Start
[\w\.\-]+ # Local part
\@ # At sign
[\w\-]+ # Domain name
(?:\.[\w\-]+)+ # Domain extensions
$ # End
/x;
my @emails = qw(
user@example.com
john.doe@company.co.uk
invalid@
@invalid.com
valid+tag@gmail.com
);
for my $email (@emails) {
if ($email =~ $email_regex) {
say "$email is valid";
} else {
say "$email is invalid";
}
}
```
### Log File Parsing
```perl
# Apache/Nginx log parser
my $log_regex = qr/
^
(?<ip>[\d\.]+)\s+ # IP address
(?<ident>\S+)\s+ # Identity
(?<user>\S+)\s+ # User
\[(?<timestamp>[^\]]+)\]\s+ # Timestamp
"(?<request>[^"]+)"\s+ # Request
(?<status>\d{3})\s+ # Status code
(?<size>\d+|-)\s* # Response size
"(?<referer>[^"]*)"\s* # Referer
"(?<agent>[^"]*)" # User agent
/x;
while (my $line = <$log_fh>) {
next unless $line =~ $log_regex;
my %entry = %+; # Copy all named captures
# Process the log entry
if ($entry{status} >= 500) {
warn "Server error: $entry{request} returned $entry{status}";
}
# Extract more info from request
if ($entry{request} =~ /^(?<method>\S+)\s+(?<path>\S+)\s+(?<proto>\S+)/) {
$entry{method} = $+{method};
$entry{path} = $+{path};
}
}
```
### Configuration File Parser
```perl
# Parse INI-style config files
sub parse_config {
my ($filename) = @_;
my %config;
my $section = 'DEFAULT';
open my $fh, '<', $filename or die "Can't open $filename: $!";
while (my $line = <$fh>) {
chomp $line;
# Skip comments and empty lines
next if $line =~ /^\s*(?:#|$)/;
# Section header
if ($line =~ /^\[([^\]]+)\]/) {
$section = $1;
next;
}
# Key-value pair
if ($line =~ /
^\s*
([^=]+?) # Key (non-greedy)
\s*=\s* # Equals with optional whitespace
(.*) # Value
$
/x) {
my ($key, $value) = ($1, $2);
# Remove quotes if present
$value =~ s/^["']|["']$//g;
# Store in config
$config{$section}{$key} = $value;
}
}
close $fh;
return \%config;
}
```
## Performance and Optimization
### Compile Once, Use Many
```perl
# Bad: Regex compiled every iteration
for my $line (@lines) {
if ($line =~ /$user_pattern/) { # Compiles each time
process($line);
}
}
# Good: Pre-compile regex
my $regex = qr/$user_pattern/;
for my $line (@lines) {
if ($line =~ $regex) { # Already compiled
process($line);
}
}
# Better: Use state for persistent compiled regex
sub match_pattern {
my ($text, $pattern) = @_;
state %compiled;
$compiled{$pattern} //= qr/$pattern/;
return $text =~ $compiled{$pattern};
}
```
### Avoiding Backtracking
```perl
# Bad: Catastrophic backtracking possible
$text =~ /(\w+)*$/; # Nested quantifiers
# Good: Possessive quantifiers prevent backtracking
$text =~ /(\w++)*$/; # ++ is possessive
# Bad: Greedy matching with backtracking
$html =~ /<div>.*<\/div>/; # Matches too much
# Good: Non-greedy matching
$html =~ /<div>.*?<\/div>/; # *? is non-greedy
# Better: Explicit matching
$html =~ /<div>[^<]*<\/div>/; # More efficient
```
## Practical Script: Web Scraper
Let's build a simple web scraper using Perl's regex powers:
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use LWP::Simple;
use HTML::Entities;
# Fetch a web page and extract information
my $url = shift @ARGV or die "Usage: $0 <URL>\n";
my $html = get($url) or die "Couldn't fetch $url\n";
# Remove script and style blocks
$html =~ s/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>//gis;
$html =~ s/<style\b[^<]*(?:(?!<\/style>)<[^<]*)*<\/style>//gis;
# Extract title
my ($title) = $html =~ /<title>([^<]+)<\/title>/i;
$title //= 'No title';
say "Title: " . decode_entities($title);
# Extract all links
my @links = $html =~ /<a\s+(?:[^>]*?\s+)?href=["']([^"']+)["']/gi;
say "\nFound " . scalar(@links) . " links:";
# Process and display unique links
my %seen;
for my $link (@links) {
next if $seen{$link}++;
next if $link =~ /^#/; # Skip anchors
# Make relative URLs absolute
if ($link !~ /^https?:\/\//) {
if ($link =~ /^\//) {
# Absolute path
my ($base) = $url =~ /(https?:\/\/[^\/]+)/;
$link = "$base$link";
} else {
# Relative path
my ($base) = $url =~ /(.*\/)/;
$link = "$base$link";
}
}
say " - $link";
}
# Extract meta tags
say "\nMeta tags:";
while ($html =~ /<meta\s+([^>]+)>/gi) {
my $meta = $1;
my ($name) = $meta =~ /name=["']([^"']+)["']/i;
my ($content) = $meta =~ /content=["']([^"']+)["']/i;
if ($name && $content) {
say " $name: " . decode_entities($content);
}
}
# Extract all email addresses (naive pattern)
my @emails = $html =~ /\b([\w\.\-]+@[\w\.\-]+\.\w+)\b/g;
if (@emails) {
say "\nEmail addresses found:";
my %unique_emails;
@unique_emails{@emails} = ();
say " - $_" for sort keys %unique_emails;
}
```
## Debugging Regular Expressions
### The use re 'debug' Pragma
```perl
use re 'debug';
"test string" =~ /test.*string/;
# This will output the regex compilation and execution process
# Great for understanding why a regex isn't matching
```
### Building Regexes Incrementally
```perl
# Start simple and build up
my $regex = qr/\d+/; # Match numbers
$regex = qr/\d+\.\d+/; # Match decimals
$regex = qr/\d+(?:\.\d+)?/; # Optional decimal part
$regex = qr/^\d+(?:\.\d+)?$/; # Anchor to whole string
# Test at each stage
my @test_cases = qw(123 123.45 .45 123. abc);
for my $test (@test_cases) {
if ($test =~ $regex) {
say "$test matches";
} else {
say "$test doesn't match";
}
}
```
## Common Gotchas and Solutions
### The Greediness Problem
```perl
my $xml = '<tag>content</tag><tag>more</tag>';
# Wrong: Greedy matching
$xml =~ /<tag>.*<\/tag>/; # Matches entire string!
# Right: Non-greedy
$xml =~ /<tag>.*?<\/tag>/; # Matches first tag pair
# Better: Explicit
$xml =~ /<tag>[^<]*<\/tag>/; # Most efficient
```
### The Anchor Trap
```perl
# Dangerous: No anchors
if ($input =~ /\d{3}/) {
# Matches "abc123def" - probably not intended!
}
# Safe: With anchors
if ($input =~ /^\d{3}$/) {
# Only matches exactly 3 digits
}
```
### Special Characters in Variables
```perl
my $user_input = "What???";
# Wrong: ? is a regex metacharacter
if ($text =~ /$user_input/) { # Error!
# Right: Quote metacharacters
if ($text =~ /\Q$user_input\E/) { # Treats ??? as literal
```
## Best Practices
1. **Comment complex regexes** - Use /x modifier liberally
2. **Name your captures** - $+{name} is clearer than $3
3. **Compile once when possible** - Use qr// for repeated patterns
4. **Test incrementally** - Build complex patterns step by step
5. **Consider alternatives** - Sometimes a parser is better than a regex
6. **Anchor when appropriate** - Prevent unexpected matches
7. **Be careful with user input** - Always use \Q...\E for literal matching
## The Zen of Perl Regexes
Regular expressions in Perl aren't just a feature—they're a philosophy. They embody Perl's core principle: make easy things easy and hard things possible. Yes, you can write unreadable regex golf. But you can also write clear, maintainable patterns that solve real problems elegantly.
The key is knowing when to use them. Not every text processing task needs a regex. But when you do need one, Perl ensures you have the full power of regular expressions at your fingertips, integrated seamlessly into the language.
---
*Next up: File I/O and directory operations. We'll see how Perl's "Do What I Mean" philosophy extends to file handling, and why Perl remains a favorite for system administrators who need to process thousands of files efficiently.*

View File

@@ -0,0 +1,624 @@
# Chapter 6: File I/O and Directory Operations
> "Files are the nouns of system administration. Everything else is just verbs." - Unknown Unix Philosopher
If text processing is Perl's superpower, then file handling is its trusty sidekick. Together, they form an unstoppable duo for system administration. This chapter covers everything from basic file operations to advanced techniques like file locking, memory mapping, and atomic operations. By the end, you'll understand why Perl remains the go-to language for file wrangling.
## Opening Files: The Modern Way
### The Three-Argument Open
Forget what you learned in 1999. Modern Perl uses three-argument open:
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# Old way (DON'T DO THIS)
open FILE, "data.txt"; # Ambiguous and unsafe
open FILE, ">data.txt"; # Easy to accidentally clobber
open FILE, "cat data.txt|"; # Security nightmare!
# Modern way (DO THIS)
open my $fh, '<', 'data.txt' or die "Can't open data.txt: $!";
open my $out, '>', 'output.txt' or die "Can't create output.txt: $!";
open my $append, '>>', 'log.txt' or die "Can't append to log.txt: $!";
# Lexical filehandles = automatic cleanup
{
open my $temp, '>', 'temp.txt' or die $!;
print $temp "This file will be closed automatically\n";
} # $temp goes out of scope, file closed
# Read/write mode
open my $rw, '+<', 'data.txt' or die $!; # Read and write
open my $trunc, '+>', 'new.txt' or die $!; # Truncate and read/write
```
### Opening Files Safely
```perl
use autodie; # Automatic error handling
# No need for 'or die' with autodie
open my $fh, '<', 'data.txt';
# But sometimes you want to handle errors yourself
use Try::Tiny;
try {
open my $fh, '<', 'maybe_exists.txt';
# Process file
} catch {
warn "File doesn't exist, using defaults";
# Use default values
};
# Check if file is readable before opening
if (-r 'data.txt') {
open my $fh, '<', 'data.txt';
# Process
}
```
## Reading Files: Choose Your Weapon
### Line by Line (Memory Efficient)
```perl
# The classic way
open my $fh, '<', 'huge_file.txt' or die $!;
while (my $line = <$fh>) {
chomp $line;
process_line($line);
}
close $fh;
# With automatic line ending handling
open my $fh, '<:crlf', 'windows_file.txt' or die $!; # Handle \r\n
while (<$fh>) {
chomp; # Works on $_
say "Line: $_";
}
# Reading with a specific input separator
{
local $/ = "\n\n"; # Paragraph mode
open my $fh, '<', 'paragraphs.txt' or die $!;
while (my $paragraph = <$fh>) {
process_paragraph($paragraph);
}
}
# Reading fixed-length records
{
local $/ = \1024; # Read 1024 bytes at a time
open my $fh, '<:raw', 'binary.dat' or die $!;
while (my $chunk = <$fh>) {
process_chunk($chunk);
}
}
```
### Slurping (Read Entire File)
```perl
# Simple slurp
my $content = do {
open my $fh, '<', 'file.txt' or die $!;
local $/; # Slurp mode
<$fh>;
};
# Slurp to array (one line per element)
open my $fh, '<', 'file.txt' or die $!;
my @lines = <$fh>;
chomp @lines;
# Using Path::Tiny (the modern way)
use Path::Tiny;
my $content = path('file.txt')->slurp_utf8;
my @lines = path('file.txt')->lines({ chomp => 1 });
# Slurp with size check (prevent memory issues)
sub safe_slurp {
my ($filename, $max_size) = @_;
$max_size //= 10 * 1024 * 1024; # 10MB default
my $size = -s $filename;
die "File too large ($size bytes)" if $size > $max_size;
open my $fh, '<', $filename or die $!;
local $/;
return <$fh>;
}
```
## Writing Files: Getting Data Out
### Basic Writing
```perl
open my $out, '>', 'output.txt' or die $!;
print $out "Hello, World!\n";
printf $out "Number: %d, String: %s\n", 42, "Perl";
say $out "Automatic newline"; # say adds \n
close $out;
# Write array to file
my @data = qw(apple banana cherry);
open my $fh, '>', 'fruits.txt' or die $!;
say $fh $_ for @data; # One per line
close $fh;
# Write with specific line endings
open my $fh, '>:crlf', 'windows.txt' or die $!; # Force \r\n
print $fh "Windows line ending\n";
```
### Atomic Writes (The Safe Way)
```perl
# Never corrupt files with partial writes
sub atomic_write {
my ($filename, $content) = @_;
my $temp = "$filename.tmp.$$"; # PID for uniqueness
open my $fh, '>', $temp or die "Can't write to $temp: $!";
print $fh $content;
close $fh or die "Can't close $temp: $!";
rename $temp, $filename or die "Can't rename $temp to $filename: $!";
}
# Using File::Temp for truly safe temp files
use File::Temp qw(tempfile);
my ($fh, $tempname) = tempfile(DIR => '/tmp');
print $fh "Temporary content\n";
close $fh;
rename $tempname, 'final.txt' or die $!;
```
## File Tests: Know Your Files
Perl's file test operators are legendary:
```perl
my $file = 'test.txt';
# Existence and type
say "Exists" if -e $file;
say "Regular file" if -f $file;
say "Directory" if -d $file;
say "Symbolic link" if -l $file;
say "Named pipe" if -p $file;
say "Socket" if -S $file;
say "Block device" if -b $file;
say "Character device" if -c $file;
# Permissions
say "Readable" if -r $file;
say "Writable" if -w $file;
say "Executable" if -x $file;
say "Owned by me" if -o $file;
# Size and age
my $size = -s $file; # Size in bytes
my $age_days = -M $file; # Days since modification
my $access_days = -A $file; # Days since last access
my $inode_days = -C $file; # Days since inode change
# Stacking file tests (5.10+)
if (-f -r -w $file) {
say "Regular file, readable and writable";
}
# The special _ filehandle (cached stat)
if (-e $file) {
my $size = -s _; # Uses cached stat from -e test
my $mtime = -M _; # Still using cached stat
say "File is $size bytes, modified $mtime days ago";
}
```
## Directory Operations
### Reading Directories
```perl
# Old school opendir/readdir
opendir my $dh, '.' or die "Can't open directory: $!";
my @files = readdir $dh;
closedir $dh;
# Filter hidden files
my @visible = grep { !/^\./ } @files;
# Get full paths
my $dir = '/etc';
opendir my $dh, $dir or die $!;
my @paths = map { "$dir/$_" } grep { !/^\.\.?$/ } readdir $dh;
# Using glob (shell-style patterns)
my @perl_files = glob("*.pl");
my @all_files = glob("*");
my @hidden = glob(".*");
my @recursive = glob("**/*.txt"); # Requires bsd_glob
# Modern way with Path::Tiny
use Path::Tiny;
my @files = path('.')->children;
my @perl_files = path('.')->children(qr/\.pl$/);
# Recursive directory traversal
use File::Find;
find(sub {
return unless -f; # Only files
return unless /\.log$/; # Only .log files
my $path = $File::Find::name;
my $size = -s;
say "$path: $size bytes";
}, '/var/log');
```
### Creating and Removing Directories
```perl
# Create directory
mkdir 'new_dir' or die "Can't create directory: $!";
mkdir 'deep/nested/dir'; # Fails if parent doesn't exist
# Create with permissions
mkdir 'secure_dir', 0700 or die $!; # Owner only
# Create nested directories
use File::Path qw(make_path remove_tree);
make_path('deep/nested/structure') or die "Can't create path: $!";
make_path('dir1', 'dir2', 'dir3', { mode => 0755 });
# Remove directories
rmdir 'empty_dir' or die $!; # Only works if empty
remove_tree('dir_with_contents'); # Recursive removal
remove_tree('dangerous_dir', { safe => 1 }); # Safe mode
# Using Path::Tiny
use Path::Tiny;
path('some/deep/directory')->mkpath;
path('to_delete')->remove_tree;
```
## Advanced File Operations
### File Locking
```perl
use Fcntl qw(:flock);
open my $fh, '+<', 'shared.dat' or die $!;
# Exclusive lock (writing)
flock($fh, LOCK_EX) or die "Can't lock file: $!";
# ... modify file ...
flock($fh, LOCK_UN); # Explicit unlock (optional)
# Shared lock (reading)
flock($fh, LOCK_SH) or die "Can't get shared lock: $!";
# ... read file ...
# Non-blocking lock attempt
if (flock($fh, LOCK_EX | LOCK_NB)) {
# Got the lock
} else {
warn "File is locked by another process";
}
# Lock with timeout
sub lock_with_timeout {
my ($fh, $timeout) = @_;
my $tries = 0;
until (flock($fh, LOCK_EX | LOCK_NB)) {
return 0 if ++$tries > $timeout;
sleep 1;
}
return 1;
}
```
### Memory-Mapped Files
```perl
use File::Map qw(map_file);
# Map file to memory
map_file my $map, 'large_file.dat';
# Now $map is like a string, but backed by the file
if ($map =~ /pattern/) {
say "Found pattern";
}
# Modifications write through to disk
$map =~ s/old/new/g; # Changes the actual file!
# Explicitly sync to disk
use File::Map qw(sync);
sync($map);
```
### Binary File Handling
```perl
# Reading binary files
open my $fh, '<:raw', 'image.jpg' or die $!;
binmode $fh; # Alternative way to set binary mode
my $header;
read($fh, $header, 10); # Read exactly 10 bytes
# Unpack binary data
my ($magic, $width, $height) = unpack('A4 N N', $header);
# Seek to specific position
seek($fh, 1024, 0); # Absolute position
seek($fh, -10, 2); # 10 bytes from end
my $pos = tell($fh); # Current position
# Writing binary data
open my $out, '>:raw', 'output.bin' or die $!;
my $packed = pack('N*', 1, 2, 3, 4, 5);
print $out $packed;
```
## Real-World File Processing
### Log Rotation Script
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use File::Copy;
use File::Path qw(make_path);
use POSIX qw(strftime);
sub rotate_logs {
my ($log_file, $keep_days) = @_;
$keep_days //= 30;
return unless -e $log_file;
# Create archive directory
my $archive_dir = 'log_archive';
make_path($archive_dir) unless -d $archive_dir;
# Generate archive name with timestamp
my $timestamp = strftime('%Y%m%d_%H%M%S', localtime);
my $archive_name = "$archive_dir/" .
basename($log_file) .
".$timestamp";
# Compress if large
if (-s $log_file > 1024 * 1024) { # > 1MB
$archive_name .= '.gz';
system("gzip -c $log_file > $archive_name") == 0
or die "Compression failed: $?";
unlink $log_file;
} else {
move($log_file, $archive_name) or die "Move failed: $!";
}
# Create new empty log file
open my $fh, '>', $log_file or die $!;
close $fh;
# Clean old archives
clean_old_archives($archive_dir, $keep_days);
}
sub clean_old_archives {
my ($dir, $keep_days) = @_;
opendir my $dh, $dir or die $!;
while (my $file = readdir $dh) {
next if $file =~ /^\./;
my $path = "$dir/$file";
next unless -f $path;
if (-M $path > $keep_days) {
unlink $path or warn "Can't delete $path: $!";
say "Deleted old archive: $file";
}
}
closedir $dh;
}
# Use it
rotate_logs('/var/log/myapp.log', 30);
```
### File Synchronization
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use File::Find;
use File::Copy;
use Digest::MD5;
use Path::Tiny;
sub sync_directories {
my ($source, $dest) = @_;
die "Source doesn't exist: $source" unless -d $source;
make_path($dest) unless -d $dest;
my %source_files;
# Scan source directory
find(sub {
return unless -f;
my $rel_path = $File::Find::name;
$rel_path =~ s/^\Q$source\E\/?//;
$source_files{$rel_path} = {
size => -s $_,
mtime => -M $_,
md5 => calculate_md5($_),
};
}, $source);
# Sync files
for my $file (keys %source_files) {
my $src_path = "$source/$file";
my $dst_path = "$dest/$file";
if (!-e $dst_path ||
files_differ($src_path, $dst_path)) {
# Create directory structure
my $dst_dir = path($dst_path)->parent;
$dst_dir->mkpath unless -d $dst_dir;
# Copy file
copy($src_path, $dst_path)
or die "Copy failed: $!";
say "Synced: $file";
}
}
# Remove files not in source
find(sub {
return unless -f;
my $rel_path = $File::Find::name;
$rel_path =~ s/^\Q$dest\E\/?//;
unless (exists $source_files{$rel_path}) {
unlink $File::Find::name;
say "Removed: $rel_path";
}
}, $dest);
}
sub calculate_md5 {
my ($file) = @_;
open my $fh, '<:raw', $file or die $!;
my $md5 = Digest::MD5->new;
$md5->addfile($fh);
return $md5->hexdigest;
}
sub files_differ {
my ($file1, $file2) = @_;
# Quick checks first
return 1 if -s $file1 != -s $file2;
# Then MD5
return calculate_md5($file1) ne calculate_md5($file2);
}
```
### CSV to JSON Converter
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use Text::CSV;
use JSON::XS;
sub csv_to_json {
my ($csv_file, $json_file) = @_;
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<:encoding(utf8)', $csv_file or die $!;
# Read header
my $headers = $csv->getline($fh);
$csv->column_names(@$headers);
# Read all rows as hashrefs
my @data;
while (my $row = $csv->getline_hr($fh)) {
push @data, $row;
}
close $fh;
# Write JSON
my $json = JSON::XS->new->utf8->pretty->canonical;
open my $out, '>:encoding(utf8)', $json_file or die $!;
print $out $json->encode(\@data);
close $out;
say "Converted " . scalar(@data) . " records";
}
# Usage
csv_to_json('data.csv', 'data.json');
```
## File I/O Best Practices
1. **Always use lexical filehandles** - They auto-close and are scoped
2. **Use three-argument open** - Safer and clearer
3. **Check return values** - Or use autodie
4. **Consider memory usage** - Don't slurp huge files
5. **Use Path::Tiny for modern code** - It handles many edge cases
6. **Lock files when needed** - Prevent corruption in concurrent access
7. **Make writes atomic** - Write to temp file, then rename
8. **Handle encodings explicitly** - Use :encoding(UTF-8) layer
9. **Clean up temp files** - Use File::Temp or END blocks
10. **Validate file paths** - Never trust user input for file names
## Common Gotchas
### The Newline Problem
```perl
# Problem: Forgetting chomp
while (<$fh>) {
push @lines, $_; # Includes newlines!
}
# Solution
while (<$fh>) {
chomp;
push @lines, $_;
}
```
### The Encoding Trap
```perl
# Problem: Mojibake (garbled characters)
open my $fh, '<', 'utf8_file.txt';
# Solution: Specify encoding
open my $fh, '<:encoding(UTF-8)', 'utf8_file.txt';
```
### The Buffering Issue
```perl
# Problem: Output doesn't appear immediately
print $log_fh "Important message";
# Program crashes, message never written!
# Solution: Disable buffering
$log_fh->autoflush(1);
# Or
select($log_fh); $| = 1; select(STDOUT);
```
## Wrapping Up
File I/O in Perl is both powerful and pragmatic. The language gives you high-level conveniences (slurping, Path::Tiny) and low-level control (seek, sysread) in equal measure. This flexibility is why Perl remains indispensable for system administration tasks.
Remember: files are just streams of bytes. Perl gives you dozens of ways to read, write, and manipulate those bytes. Choose the right tool for your specific task, and always consider edge cases like file locking, encoding, and error handling.
---
*Next up: Advanced text processing. We'll go beyond basic regex to explore Perl's sophisticated text manipulation capabilities, including parsing structured formats, template processing, and building your own mini-languages.*

View File

@@ -0,0 +1,782 @@
# Chapter 7: Advanced Text Processing
> "Perl is the text surgeon's scalpel, awk is a butter knife, and sed is a club." - Randal Schwartz
You've mastered regular expressions. You can read and write files. Now let's combine these skills to do what Perl does best: transform text in ways that would make other languages weep. This chapter covers advanced parsing techniques, text generation, format conversion, and the dark art of writing your own mini-languages.
## Beyond Simple Matching: Parse::RecDescent
Sometimes regex isn't enough. When you need to parse complex, nested structures, you need a real parser:
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use Parse::RecDescent;
# Define a grammar for simple arithmetic
my $grammar = q{
expression: term(s /[+-]/) {
my $result = shift @{$item[1]};
while (@{$item[1]}) {
my $op = shift @{$item[1]};
my $val = shift @{$item[1]};
$result = $op eq '+' ? $result + $val : $result - $val;
}
$result;
}
term: factor(s /[*\/]/) {
my $result = shift @{$item[1]};
while (@{$item[1]}) {
my $op = shift @{$item[1]};
my $val = shift @{$item[1]};
$result = $op eq '*' ? $result * $val : $result / $val;
}
$result;
}
factor: number | '(' expression ')' { $item[2] }
number: /\d+(\.\d+)?/ { $item[1] }
};
my $parser = Parse::RecDescent->new($grammar);
# Test it
my @tests = (
"2 + 3",
"2 + 3 * 4",
"(2 + 3) * 4",
"10 / 2 - 3",
);
for my $expr (@tests) {
my $result = $parser->expression($expr);
say "$expr = $result";
}
```
## Text Tables: Making Data Pretty
### Using Text::Table
```perl
use Text::Table;
# Create a formatted table
my $table = Text::Table->new(
"Server\n&left",
"Status\n&center",
"CPU %\n&right",
"Memory\n&right",
"Uptime\n&right"
);
# Add data
my @data = (
['web01', 'Running', '45%', '2.3GB', '45 days'],
['web02', 'Running', '67%', '3.1GB', '45 days'],
['db01', 'Warning', '89%', '7.8GB', '12 days'],
['cache01', 'Down', 'N/A', 'N/A', 'N/A'],
);
$table->load(@data);
# Print with rules
print $table->rule('-', '+');
print $table->title;
print $table->rule('-', '+');
print $table->body;
print $table->rule('-', '+');
```
### Creating ASCII Art Tables
```perl
sub create_ascii_table {
my ($headers, $rows) = @_;
# Calculate column widths
my @widths;
for my $i (0..$#$headers) {
$widths[$i] = length($headers->[$i]);
for my $row (@$rows) {
my $len = length($row->[$i] // '');
$widths[$i] = $len if $len > $widths[$i];
}
}
# Build table
my $separator = '+' . join('+', map { '-' x ($_ + 2) } @widths) . '+';
my $format = '| ' . join(' | ', map { "%-${_}s" } @widths) . ' |';
# Print table
say $separator;
printf "$format\n", @$headers;
say $separator;
for my $row (@$rows) {
printf "$format\n", map { $_ // '' } @$row;
}
say $separator;
}
# Usage
create_ascii_table(
['Name', 'Age', 'City'],
[
['Alice', 30, 'New York'],
['Bob', 25, 'Los Angeles'],
['Carol', 35, 'Chicago'],
]
);
```
## Template Processing
### Quick and Dirty Templates
```perl
# Simple variable substitution
sub process_template {
my ($template, $vars) = @_;
$template =~ s/\{\{(\w+)\}\}/$vars->{$1} // ''/ge;
return $template;
}
my $template = <<'END';
Dear {{name}},
Your server {{server}} is currently {{status}}.
CPU usage: {{cpu}}%
Memory usage: {{memory}}%
Please take appropriate action.
Regards,
Monitoring System
END
my $output = process_template($template, {
name => 'Admin',
server => 'web01',
status => 'critical',
cpu => 95,
memory => 87,
});
print $output;
```
### Template Toolkit (Professional Templates)
```perl
use Template;
my $tt = Template->new({
INCLUDE_PATH => './templates',
INTERPOLATE => 1,
});
my $template = <<'END';
[% FOREACH server IN servers %]
Server: [% server.name %]
Status: [% server.status %]
Services:
[% FOREACH service IN server.services %]
- [% service %]: [% server.service_status.$service %]
[% END %]
[% END %]
Summary:
Total servers: [% servers.size %]
Running: [% servers.grep('^status', 'running').size %]
Issues: [% servers.grep('^status', 'warning|critical').size %]
END
my $vars = {
servers => [
{
name => 'web01',
status => 'running',
services => ['nginx', 'php-fpm'],
service_status => {
'nginx' => 'active',
'php-fpm' => 'active',
},
},
{
name => 'db01',
status => 'warning',
services => ['mysql'],
service_status => {
'mysql' => 'degraded',
},
},
],
};
$tt->process(\$template, $vars) or die $tt->error;
```
## Parsing Structured Text Formats
### Parsing Configuration Files
```perl
# Parse Apache-style config
sub parse_apache_config {
my ($filename) = @_;
my %config;
my @context_stack;
open my $fh, '<', $filename or die $!;
while (<$fh>) {
chomp;
s/^\s+|\s+$//g; # Trim
next if /^#/ || /^$/; # Skip comments and blanks
# Directive with value
if (/^(\w+)\s+(.+)$/) {
my ($directive, $value) = ($1, $2);
$value =~ s/^["']|["']$//g; # Remove quotes
if (@context_stack) {
# Inside a context
my $ref = \%config;
$ref = $ref->{$_} for @context_stack;
push @{$ref->{$directive}}, $value;
} else {
push @{$config{$directive}}, $value;
}
}
# Context start
elsif (/^<(\w+)(?:\s+(.+))?>$/) {
my ($context, $param) = ($1, $2);
push @context_stack, "$context:$param";
my $ref = \%config;
$ref = $ref->{$_} for @context_stack;
$ref = {};
}
# Context end
elsif (/^<\/(\w+)>$/) {
pop @context_stack;
}
}
close $fh;
return \%config;
}
```
### Parsing Fixed-Width Records
```perl
# Parse mainframe-style fixed-width data
sub parse_fixed_width {
my ($filename, $layout) = @_;
my @records;
open my $fh, '<', $filename or die $!;
while (my $line = <$fh>) {
chomp $line;
my %record;
for my $field (@$layout) {
my ($name, $start, $length) = @$field;
$record{$name} = substr($line, $start - 1, $length);
$record{$name} =~ s/^\s+|\s+$//g; # Trim
}
push @records, \%record;
}
close $fh;
return \@records;
}
# Define layout
my $layout = [
['id', 1, 5],
['name', 6, 20],
['dept', 26, 15],
['salary', 41, 10],
['hired', 51, 10],
];
my $records = parse_fixed_width('employees.dat', $layout);
```
## Text Differences and Patching
### Finding Differences
```perl
use Text::Diff;
my $diff = diff('file1.txt', 'file2.txt', { STYLE => 'Unified' });
print $diff;
# Or compare strings
my $old = "Line 1\nLine 2\nLine 3\n";
my $new = "Line 1\nLine 2 modified\nLine 3\nLine 4\n";
my $diff = diff(\$old, \$new, { STYLE => 'Context' });
print $diff;
# Custom diff with Algorithm::Diff
use Algorithm::Diff qw(sdiff);
my @old = split /\n/, $old;
my @new = split /\n/, $new;
my @diff = sdiff(\@old, \@new);
for my $change (@diff) {
my ($flag, $old_line, $new_line) = @$change;
if ($flag eq 'u') {
say " $old_line"; # Unchanged
} elsif ($flag eq 'c') {
say "- $old_line"; # Changed from
say "+ $new_line"; # Changed to
} elsif ($flag eq '-') {
say "- $old_line"; # Deleted
} elsif ($flag eq '+') {
say "+ $new_line"; # Added
}
}
```
## Creating Domain-Specific Languages (DSLs)
### A Simple Query Language
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# Define a simple query DSL
package QueryDSL;
sub new {
my ($class) = @_;
return bless { conditions => [] }, $class;
}
sub where {
my ($self, $field) = @_;
$self->{current_field} = $field;
return $self;
}
sub equals {
my ($self, $value) = @_;
push @{$self->{conditions}}, {
field => $self->{current_field},
op => '=',
value => $value,
};
return $self;
}
sub greater_than {
my ($self, $value) = @_;
push @{$self->{conditions}}, {
field => $self->{current_field},
op => '>',
value => $value,
};
return $self;
}
sub and {
my ($self) = @_;
$self->{last_conjunction} = 'AND';
return $self;
}
sub to_sql {
my ($self) = @_;
my @parts;
for my $cond (@{$self->{conditions}}) {
push @parts, "$cond->{field} $cond->{op} '$cond->{value}'";
}
return 'WHERE ' . join(' AND ', @parts);
}
package main;
# Use the DSL
my $query = QueryDSL->new()
->where('status')->equals('active')
->and
->where('age')->greater_than(18);
say $query->to_sql(); # WHERE status = 'active' AND age > '18'
```
### A Configuration DSL
```perl
# Create a readable configuration DSL
package ConfigDSL;
use Modern::Perl '2023';
our %CONFIG;
sub server($&) {
my ($name, $block) = @_;
local $CONFIG{_current_server} = $name;
$CONFIG{servers}{$name} = {};
$block->();
}
sub host($) {
my ($hostname) = @_;
my $server = $CONFIG{_current_server};
$CONFIG{servers}{$server}{host} = $hostname;
}
sub port($) {
my ($port) = @_;
my $server = $CONFIG{_current_server};
$CONFIG{servers}{$server}{port} = $port;
}
sub service($) {
my ($service) = @_;
my $server = $CONFIG{_current_server};
push @{$CONFIG{servers}{$server}{services}}, $service;
}
sub import {
my $caller = caller;
no strict 'refs';
*{"${caller}::server"} = \&server;
*{"${caller}::host"} = \&host;
*{"${caller}::port"} = \&port;
*{"${caller}::service"} = \&service;
}
package main;
use ConfigDSL;
# Now we can write config like this:
server 'web01' => sub {
host 'web01.example.com';
port 8080;
service 'nginx';
service 'php-fpm';
};
server 'db01' => sub {
host 'db01.example.com';
port 3306;
service 'mysql';
};
# Access the config
use Data::Dumper;
print Dumper(\%ConfigDSL::CONFIG);
```
## Text Analysis and Statistics
### Word Frequency Analysis
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
sub analyze_text {
my ($text) = @_;
# Basic statistics
my $char_count = length($text);
my $line_count = ($text =~ tr/\n//) + 1;
my @sentences = split /[.!?]+/, $text;
my $sentence_count = @sentences;
# Word frequency
my %word_freq;
my $word_count = 0;
while ($text =~ /\b(\w+)\b/g) {
my $word = lc($1);
$word_freq{$word}++;
$word_count++;
}
# Calculate readability (Flesch Reading Ease approximation)
my $avg_sentence_length = $word_count / ($sentence_count || 1);
my $syllable_count = estimate_syllables($text);
my $avg_syllables = $syllable_count / ($word_count || 1);
my $flesch = 206.835
- 1.015 * $avg_sentence_length
- 84.6 * $avg_syllables;
return {
characters => $char_count,
lines => $line_count,
sentences => $sentence_count,
words => $word_count,
unique_words => scalar(keys %word_freq),
avg_word_length => $char_count / ($word_count || 1),
readability => $flesch,
top_words => get_top_words(\%word_freq, 10),
};
}
sub estimate_syllables {
my ($text) = @_;
my $count = 0;
while ($text =~ /\b(\w+)\b/g) {
my $word = lc($1);
# Simple estimation: count vowel groups
my $syllables = () = $word =~ /[aeiou]+/g;
$syllables = 1 if $syllables == 0;
$count += $syllables;
}
return $count;
}
sub get_top_words {
my ($freq, $n) = @_;
my @sorted = sort { $freq->{$b} <=> $freq->{$a} } keys %$freq;
return [ map { { word => $_, count => $freq->{$_} } }
@sorted[0..min($n-1, $#sorted)] ];
}
sub min { $_[0] < $_[1] ? $_[0] : $_[1] }
# Test it
my $sample = <<'END';
Perl is a high-level, general-purpose programming language.
It was originally developed by Larry Wall in 1987. Perl is
known for its text processing capabilities and is often
called the "Swiss Army chainsaw" of scripting languages.
END
my $stats = analyze_text($sample);
use Data::Dumper;
print Dumper($stats);
```
## Advanced String Manipulation
### Levenshtein Distance (Edit Distance)
```perl
use Text::Levenshtein qw(distance);
# Find similar strings
sub find_similar {
my ($target, $candidates, $threshold) = @_;
$threshold //= 3;
my @similar;
for my $candidate (@$candidates) {
my $dist = distance($target, $candidate);
push @similar, { string => $candidate, distance => $dist }
if $dist <= $threshold;
}
return [ sort { $a->{distance} <=> $b->{distance} } @similar ];
}
my @commands = qw(start stop restart status enable disable);
my $user_input = 'statsu'; # Typo
my $similar = find_similar($user_input, \@commands);
if (@$similar) {
say "Did you mean: " . $similar->[0]{string} . "?";
}
```
### Fuzzy String Matching
```perl
use String::Approx qw(amatch);
# Find approximate matches
my @files = glob("*.txt");
my $pattern = 'confg'; # Looking for 'config'
my @matches = amatch($pattern, ['i', '10%'], @files);
say "Possible matches for '$pattern':";
say " $_" for @matches;
# Custom fuzzy search
sub fuzzy_grep {
my ($pattern, $list, $tolerance) = @_;
$tolerance //= 0.2; # 20% difference allowed
my $max_dist = int(length($pattern) * $tolerance);
return find_similar($pattern, $list, $max_dist);
}
```
## Practical Example: Log Analysis Pipeline
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use feature 'signatures';
no warnings 'experimental::signatures';
# Pluggable log analysis pipeline
package LogPipeline;
sub new($class) {
return bless {
filters => [],
extractors => [],
aggregators => [],
}, $class;
}
sub add_filter($self, $filter) {
push @{$self->{filters}}, $filter;
return $self;
}
sub add_extractor($self, $extractor) {
push @{$self->{extractors}}, $extractor;
return $self;
}
sub add_aggregator($self, $aggregator) {
push @{$self->{aggregators}}, $aggregator;
return $self;
}
sub process($self, $filename) {
my @records;
open my $fh, '<', $filename or die $!;
LINE: while (my $line = <$fh>) {
chomp $line;
# Apply filters
for my $filter (@{$self->{filters}}) {
next LINE unless $filter->($line);
}
# Extract data
my %record;
for my $extractor (@{$self->{extractors}}) {
my $data = $extractor->($line);
%record = (%record, %$data) if $data;
}
push @records, \%record if %record;
}
close $fh;
# Aggregate results
my %results;
for my $aggregator (@{$self->{aggregators}}) {
my ($name, $value) = $aggregator->(\@records);
$results{$name} = $value;
}
return \%results;
}
package main;
# Create pipeline
my $pipeline = LogPipeline->new();
# Add filters
$pipeline->add_filter(sub($line) {
return $line !~ /^#/; # Skip comments
});
$pipeline->add_filter(sub($line) {
return $line =~ /ERROR|WARNING/; # Only errors and warnings
});
# Add extractors
$pipeline->add_extractor(sub($line) {
if ($line =~ /^(\S+)\s+(\S+)\s+\[([^\]]+)\]\s+(.+)/) {
return {
date => $1,
time => $2,
level => $3,
message => $4,
};
}
return undef;
});
$pipeline->add_extractor(sub($line) {
if ($line =~ /user[_\s]+(\w+)/i) {
return { user => $1 };
}
return undef;
});
# Add aggregators
$pipeline->add_aggregator(sub($records) {
return ('total_errors', scalar(@$records));
});
$pipeline->add_aggregator(sub($records) {
my %by_level;
$by_level{$_->{level}}++ for @$records;
return ('by_level', \%by_level);
});
$pipeline->add_aggregator(sub($records) {
my %by_user;
$by_user{$_->{user}}++ for grep { $_->{user} } @$records;
return ('by_user', \%by_user);
});
# Process log file
my $results = $pipeline->process('application.log');
# Display results
say "Log Analysis Results:";
say "Total errors/warnings: $results->{total_errors}";
say "\nBy level:";
for my $level (sort keys %{$results->{by_level}}) {
say " $level: $results->{by_level}{$level}";
}
say "\nBy user:";
for my $user (sort keys %{$results->{by_user}}) {
say " $user: $results->{by_user}{$user}";
}
```
## Performance Tips for Text Processing
1. **Compile regexes once** - Use qr// for repeated patterns
2. **Avoid unnecessary captures** - Use (?:...) for grouping
3. **Process line by line** - Don't slurp huge files unless necessary
4. **Use index() for simple searches** - It's faster than regex for literals
5. **Consider Text::CSV_XS** - Much faster than pure Perl CSV parsing
6. **Profile your code** - Use Devel::NYTProf to find bottlenecks
7. **Use state variables** - For data that persists between function calls
8. **Benchmark alternatives** - Sometimes split is faster than regex
## Best Practices
1. **Make parsers modular** - Separate lexing, parsing, and semantic analysis
2. **Handle edge cases** - Empty input, malformed data, encoding issues
3. **Provide useful error messages** - Include line numbers and context
4. **Document your grammars** - Especially for complex parsers
5. **Test with real data** - Synthetic test data often misses edge cases
6. **Consider existing modules** - CPAN likely has what you need
7. **Use the right tool** - Not everything needs a full parser
## Conclusion
Advanced text processing is where Perl truly shines. Whether you're parsing complex formats, generating reports, or building your own languages, Perl provides the tools to do it elegantly and efficiently. The key is knowing which tool to use for each job.
Remember: text processing isn't just about regex. It's about understanding structure, extracting meaning, and transforming data. Perl gives you the flexibility to approach each problem in the way that makes most sense.
---
*Next: Working with structured data formats. We'll explore how Perl handles CSV, JSON, and XML, turning messy data into actionable information.*

View File

@@ -0,0 +1,748 @@
# Chapter 8: Working with CSV, JSON, and XML
> "Data formats are like opinions - everyone has one, and they all stink except yours." - DevOps Proverb
In a perfect world, all data would be in one format. In reality, you're juggling CSV exports from Excel, JSON from REST APIs, and XML from that enterprise system installed in 2003. Perl doesn't judge—it parses them all. This chapter shows you how to read, write, and transform between these formats without losing your sanity.
## CSV: The Deceptively Simple Format
CSV looks simple. It's just commas and values, right? Wrong. There are quotes, escaped quotes, embedded newlines, different delimiters, encodings, and Excel's creative interpretations. Don't write your own CSV parser. Use Text::CSV.
### Reading CSV Files
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use Text::CSV;
# Basic CSV reading
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<:encoding(utf8)', 'data.csv' or die $!;
# Read header row
my $headers = $csv->getline($fh);
$csv->column_names(@$headers);
# Read data rows as hashrefs
while (my $row = $csv->getline_hr($fh)) {
say "Processing user: $row->{username}";
say " Email: $row->{email}";
say " Status: $row->{status}";
}
close $fh;
# Or slurp all at once
sub read_csv {
my ($filename) = @_;
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1,
sep_char => ',', # Could be ';' or '\t' for TSV
quote_char => '"',
escape_char => '"',
allow_loose_quotes => 1,
allow_loose_escapes => 1,
});
open my $fh, '<:encoding(utf8)', $filename or die $!;
# Read all rows
my $rows = $csv->getline_all($fh);
close $fh;
# Convert to array of hashrefs
my $headers = shift @$rows;
my @data = map {
my %hash;
@hash{@$headers} = @$_;
\%hash;
} @$rows;
return \@data;
}
```
### Writing CSV Files
```perl
# Write CSV with proper escaping
sub write_csv {
my ($filename, $data, $headers) = @_;
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1,
eol => "\n", # Auto line endings
});
open my $fh, '>:encoding(utf8)', $filename or die $!;
# Write headers if provided
if ($headers) {
$csv->say($fh, $headers);
} elsif (@$data && ref $data->[0] eq 'HASH') {
# Extract headers from first hashref
my @headers = sort keys %{$data->[0]};
$csv->say($fh, \@headers);
# Write data rows
for my $row (@$data) {
$csv->say($fh, [@$row{@headers}]);
}
} else {
# Array of arrays
$csv->say($fh, $_) for @$data;
}
close $fh;
}
# Example: Generate report
my @report_data = (
{ date => '2024-01-15', sales => 1500, region => 'North' },
{ date => '2024-01-15', sales => 2300, region => 'South' },
{ date => '2024-01-16', sales => 1800, region => 'North' },
);
write_csv('sales_report.csv', \@report_data);
```
### Handling Problematic CSV
```perl
# Deal with Excel's CSV quirks
sub parse_excel_csv {
my ($filename) = @_;
# Excel likes to add BOM
open my $fh, '<:encoding(utf8)', $filename or die $!;
my $first_line = <$fh>;
# Remove BOM if present
$first_line =~ s/^\x{FEFF}//;
seek($fh, 0, 0); # Rewind
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1,
# Excel sometimes uses semicolons in some locales
sep_char => index($first_line, ';') > -1 ? ';' : ',',
allow_whitespace => 1, # Handle extra spaces
blank_is_undef => 1, # Empty cells become undef
});
# Process file...
}
# Handle malformed CSV
sub parse_dirty_csv {
my ($filename) = @_;
my @clean_data;
open my $fh, '<', $filename or die $!;
while (my $line = <$fh>) {
chomp $line;
# Try to parse with Text::CSV first
my $csv = Text::CSV->new({ binary => 1 });
if ($csv->parse($line)) {
push @clean_data, [$csv->fields];
} else {
# Fallback to manual parsing for broken lines
warn "Failed to parse line $.: " . $csv->error_input;
# Simple split (dangerous but sometimes necessary)
my @fields = split /,/, $line;
s/^\s+|\s+$//g for @fields; # Trim
push @clean_data, \@fields;
}
}
close $fh;
return \@clean_data;
}
```
## JSON: The Modern Standard
JSON is everywhere. REST APIs speak it, configuration files use it, and JavaScript loves it. Perl handles JSON beautifully with JSON::XS (fast) or JSON::PP (pure Perl).
### Reading and Writing JSON
```perl
use JSON::XS;
# Create JSON encoder/decoder
my $json = JSON::XS->new->utf8->pretty->canonical;
# Decode JSON
my $json_text = '{"name":"Alice","age":30,"active":true}';
my $data = $json->decode($json_text);
say "Name: $data->{name}";
# Encode to JSON
my $perl_data = {
servers => [
{ name => 'web01', ip => '10.0.1.1', status => 'running' },
{ name => 'web02', ip => '10.0.1.2', status => 'running' },
{ name => 'db01', ip => '10.0.2.1', status => 'stopped' },
],
updated => time(),
version => '1.0',
};
my $json_output = $json->encode($perl_data);
print $json_output;
# File operations
sub read_json {
my ($filename) = @_;
open my $fh, '<:encoding(utf8)', $filename or die $!;
local $/; # Slurp mode
my $json_text = <$fh>;
close $fh;
return decode_json($json_text); # Using functional interface
}
sub write_json {
my ($filename, $data) = @_;
my $json = JSON::XS->new->utf8->pretty->canonical;
open my $fh, '>:encoding(utf8)', $filename or die $!;
print $fh $json->encode($data);
close $fh;
}
```
### Advanced JSON Handling
```perl
# Custom JSON encoder settings
my $json = JSON::XS->new
->utf8(1) # Encode/decode UTF-8
->pretty(1) # Human-readable output
->canonical(1) # Sort keys for consistent output
->allow_nonref(1) # Allow non-reference values
->allow_blessed(1) # Allow blessed references
->convert_blessed(1) # Call TO_JSON method on objects
->max_depth(512) # Prevent deep recursion
->max_size(10_000_000); # 10MB max
# Handle special values
my $data = {
string => "Hello",
number => 42,
float => 3.14,
true => \1, # JSON::XS::true
false => \0, # JSON::XS::false
null => undef,
unicode => "Hello 世界 🌍",
};
# Streaming JSON parser for large files
sub process_json_stream {
my ($filename, $callback) = @_;
open my $fh, '<:encoding(utf8)', $filename or die $!;
my $json = JSON::XS->new->utf8;
my $parser = JSON::XS->new->utf8->incremental;
while (my $chunk = <$fh>) {
$parser->incr_parse($chunk);
# Extract complete JSON objects
while (my $obj = $parser->incr_parse) {
$callback->($obj);
}
}
close $fh;
}
# JSON with schema validation
use JSON::Validator;
my $validator = JSON::Validator->new;
$validator->schema({
type => 'object',
required => ['name', 'email'],
properties => {
name => { type => 'string', minLength => 1 },
email => { type => 'string', format => 'email' },
age => { type => 'integer', minimum => 0, maximum => 150 },
},
});
my @errors = $validator->validate({
name => 'Alice',
email => 'alice@example.com',
age => 30,
});
die "Validation failed: @errors" if @errors;
```
### JSON Transformations
```perl
# Convert between JSON and other formats
sub csv_to_json {
my ($csv_file, $json_file) = @_;
# Read CSV
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<:encoding(utf8)', $csv_file or die $!;
my $headers = $csv->getline($fh);
$csv->column_names(@$headers);
my @data;
while (my $row = $csv->getline_hr($fh)) {
push @data, $row;
}
close $fh;
# Write JSON
write_json($json_file, \@data);
}
# Transform JSON structure
sub transform_json {
my ($data) = @_;
# Flatten nested structure
my @flattened;
for my $user (@{$data->{users}}) {
for my $order (@{$user->{orders}}) {
push @flattened, {
user_id => $user->{id},
user_name => $user->{name},
order_id => $order->{id},
order_total => $order->{total},
order_date => $order->{date},
};
}
}
return \@flattened;
}
```
## XML: The Enterprise Format
XML is verbose, complex, and everywhere in enterprise systems. Perl has excellent XML support through various modules. We'll focus on XML::LibXML (fast and standards-compliant) and XML::Simple (for simple cases).
### XML::Simple for Basic Tasks
```perl
use XML::Simple;
use Data::Dumper;
# Read simple XML
my $xml_text = <<'XML';
<config>
<server name="web01" ip="10.0.1.1" status="active"/>
<server name="web02" ip="10.0.1.2" status="active"/>
<server name="db01" ip="10.0.2.1" status="maintenance"/>
</config>
XML
my $data = XMLin($xml_text,
ForceArray => ['server'], # Always make array
KeyAttr => { server => 'name' }, # Use 'name' as hash key
);
print Dumper($data);
# Write XML
my $output = XMLout($data,
RootName => 'config',
NoAttr => 0, # Use attributes
XMLDecl => '<?xml version="1.0" encoding="UTF-8"?>',
);
print $output;
```
### XML::LibXML for Serious XML Work
```perl
use XML::LibXML;
# Parse XML
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml_text);
# Or parse from file
my $doc = $parser->parse_file('config.xml');
# XPath queries
my @servers = $doc->findnodes('//server[@status="active"]');
for my $server (@servers) {
say "Active server: " . $server->getAttribute('name');
say " IP: " . $server->getAttribute('ip');
}
# Modify XML
my ($db_server) = $doc->findnodes('//server[@name="db01"]');
$db_server->setAttribute('status', 'active');
$db_server->appendTextChild('note', 'Maintenance completed');
# Create new XML
my $new_doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $new_doc->createElement('inventory');
$new_doc->setDocumentElement($root);
for my $item (@inventory) {
my $elem = $new_doc->createElement('item');
$elem->setAttribute('id', $item->{id});
$elem->setAttribute('quantity', $item->{quantity});
$elem->appendTextChild('name', $item->{name});
$elem->appendTextChild('price', $item->{price});
$root->appendChild($elem);
}
print $new_doc->toString(1); # Pretty print
```
### Parsing Complex XML
```perl
# Parse XML with namespaces
my $xml_with_ns = <<'XML';
<root xmlns:app="http://example.com/app">
<app:user id="1">
<app:name>Alice</app:name>
<app:email>alice@example.com</app:email>
</app:user>
</root>
XML
my $doc = $parser->parse_string($xml_with_ns);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('app', 'http://example.com/app');
my @users = $xpc->findnodes('//app:user');
for my $user (@users) {
my $name = $xpc->findvalue('app:name', $user);
my $email = $xpc->findvalue('app:email', $user);
say "User: $name <$email>";
}
# Stream parsing for large XML
sub parse_large_xml {
my ($filename, $callback) = @_;
my $reader = XML::LibXML::Reader->new(location => $filename)
or die "Cannot read $filename";
while ($reader->read) {
if ($reader->nodeType == XML_READER_TYPE_ELEMENT
&& $reader->name eq 'record') {
my $node = $reader->copyCurrentNode(1);
$callback->($node);
}
}
}
# Validate against XSD
sub validate_xml {
my ($xml_file, $xsd_file) = @_;
my $schema_doc = $parser->parse_file($xsd_file);
my $schema = XML::LibXML::Schema->new(string => $schema_doc);
my $doc = $parser->parse_file($xml_file);
eval { $schema->validate($doc) };
if ($@) {
die "Validation failed: $@";
}
say "XML is valid!";
}
```
## Format Conversion Utilities
### Universal Data Converter
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use feature 'signatures';
no warnings 'experimental::signatures';
package DataConverter;
use Text::CSV;
use JSON::XS;
use XML::Simple;
use YAML::XS;
sub new($class) {
return bless {}, $class;
}
sub convert($self, $input_file, $output_file, $from_format, $to_format) {
my $data = $self->read($input_file, $from_format);
$self->write($output_file, $data, $to_format);
}
sub read($self, $file, $format) {
my $method = "read_$format";
die "Unknown format: $format" unless $self->can($method);
return $self->$method($file);
}
sub write($self, $file, $data, $format) {
my $method = "write_$format";
die "Unknown format: $format" unless $self->can($method);
return $self->$method($file, $data);
}
sub read_json($self, $file) {
open my $fh, '<:encoding(utf8)', $file or die $!;
local $/;
my $json = <$fh>;
close $fh;
return decode_json($json);
}
sub write_json($self, $file, $data) {
my $json = JSON::XS->new->utf8->pretty->canonical;
open my $fh, '>:encoding(utf8)', $file or die $!;
print $fh $json->encode($data);
close $fh;
}
sub read_csv($self, $file) {
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<:encoding(utf8)', $file or die $!;
my $headers = $csv->getline($fh);
$csv->column_names(@$headers);
my @data;
while (my $row = $csv->getline_hr($fh)) {
push @data, $row;
}
close $fh;
return \@data;
}
sub write_csv($self, $file, $data) {
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => "\n" });
open my $fh, '>:encoding(utf8)', $file or die $!;
# Assume array of hashrefs
if (@$data && ref $data->[0] eq 'HASH') {
my @headers = sort keys %{$data->[0]};
$csv->say($fh, \@headers);
for my $row (@$data) {
$csv->say($fh, [@$row{@headers}]);
}
}
close $fh;
}
sub read_xml($self, $file) {
return XMLin($file,
ForceArray => 1,
KeyAttr => [],
);
}
sub write_xml($self, $file, $data) {
my $xml = XMLout($data,
RootName => 'data',
XMLDecl => '<?xml version="1.0" encoding="UTF-8"?>',
);
open my $fh, '>:encoding(utf8)', $file or die $!;
print $fh $xml;
close $fh;
}
sub read_yaml($self, $file) {
return YAML::XS::LoadFile($file);
}
sub write_yaml($self, $file, $data) {
YAML::XS::DumpFile($file, $data);
}
package main;
# Use the converter
my $converter = DataConverter->new();
# Convert CSV to JSON
$converter->convert('data.csv', 'data.json', 'csv', 'json');
# Convert JSON to XML
$converter->convert('data.json', 'data.xml', 'json', 'xml');
# Convert XML to YAML
$converter->convert('data.xml', 'data.yaml', 'xml', 'yaml');
```
## Real-World Example: API Data Pipeline
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use LWP::UserAgent;
use JSON::XS;
use Text::CSV;
use Try::Tiny;
# Fetch data from API, process, and export
sub api_data_pipeline {
my ($api_url, $output_file) = @_;
# Fetch from API
my $ua = LWP::UserAgent->new(timeout => 30);
my $response = $ua->get($api_url);
die "API request failed: " . $response->status_line
unless $response->is_success;
# Parse JSON response
my $data = decode_json($response->content);
# Transform data
my $processed = process_api_data($data);
# Export to multiple formats
export_data($processed, $output_file);
}
sub process_api_data {
my ($data) = @_;
my @processed;
for my $item (@{$data->{results}}) {
push @processed, {
id => $item->{id},
name => clean_text($item->{name}),
value => sprintf("%.2f", $item->{value} || 0),
timestamp => format_timestamp($item->{created_at}),
status => normalize_status($item->{status}),
};
}
return \@processed;
}
sub clean_text {
my ($text) = @_;
$text =~ s/^\s+|\s+$//g; # Trim
$text =~ s/\s+/ /g; # Normalize spaces
return $text;
}
sub format_timestamp {
my ($ts) = @_;
# Convert various timestamp formats to ISO 8601
# Implementation depends on input format
return $ts;
}
sub normalize_status {
my ($status) = @_;
my %status_map = (
active => 'active',
enabled => 'active',
disabled => 'inactive',
deleted => 'archived',
);
return $status_map{lc($status)} // 'unknown';
}
sub export_data {
my ($data, $base_filename) = @_;
# Export as JSON
my $json_file = "$base_filename.json";
write_json($json_file, $data);
say "Exported to $json_file";
# Export as CSV
my $csv_file = "$base_filename.csv";
write_csv($csv_file, $data);
say "Exported to $csv_file";
# Generate summary
generate_summary($data, "$base_filename.summary.txt");
}
sub generate_summary {
my ($data, $filename) = @_;
open my $fh, '>:encoding(utf8)', $filename or die $!;
my %stats = (
total => scalar(@$data),
by_status => {},
);
for my $item (@$data) {
$stats{by_status}{$item->{status}}++;
}
print $fh "Data Export Summary\n";
print $fh "=" x 40 . "\n";
print $fh "Total Records: $stats{total}\n\n";
print $fh "By Status:\n";
for my $status (sort keys %{$stats{by_status}}) {
printf $fh " %-15s: %d\n", $status, $stats{by_status}{$status};
}
close $fh;
say "Summary written to $filename";
}
```
## Performance Considerations
1. **Choose the right parser** - XML::LibXML is faster than XML::Simple
2. **Use streaming for large files** - Don't load 1GB XML into memory
3. **Prefer JSON::XS over JSON::PP** - 10-100x faster
4. **Cache parsed data** - Parse once, use many times
5. **Validate incrementally** - Don't wait until the end to find errors
6. **Use binary mode for CSV** - Handles all encodings properly
7. **Consider format limitations** - CSV can't represent nested data well
## Best Practices
1. **Always validate input** - Never trust external data
2. **Handle encoding explicitly** - UTF-8 everywhere
3. **Use appropriate error handling** - Malformed data is common
4. **Document format expectations** - Include sample files
5. **Test with real-world data** - Edge cases are everywhere
6. **Provide format detection** - Auto-detect when possible
7. **Keep original data** - Transform copies, not originals
8. **Version your schemas** - Data formats evolve
## Conclusion
Data comes in many formats, but Perl handles them all with aplomb. Whether you're parsing gigabytes of CSV, streaming JSON from APIs, or wrestling with enterprise XML, Perl has the tools you need. The key is choosing the right tool for each format and understanding its quirks.
Remember: every data format has edge cases. Plan for them, test for them, and handle them gracefully. Your future self will thank you when that "perfectly formatted" data file turns out to be anything but.
---
*Next: Log file analysis and monitoring. We'll build real-time log processors that can handle anything from Apache access logs to custom application outputs.*

View File

@@ -0,0 +1,854 @@
# Chapter 9: Log File Analysis and Monitoring
> "The difference between a good sysadmin and a great one? The great one automated their log analysis before the outage." - Anonymous
Logs are the heartbeat of your systems. They tell you what happened, when it happened, and sometimes even why. But with modern systems generating gigabytes of logs daily, manual analysis is impossible. This chapter shows you how to build log analysis tools that find needles in haystacks, detect anomalies, and alert you before things go wrong.
## Understanding Log Formats
### Common Log Formats
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use feature 'signatures';
no warnings 'experimental::signatures';
# Apache Combined Log Format
sub parse_apache_log($line) {
my $regex = qr/
^(\S+)\s+ # IP address
(\S+)\s+ # Identity
(\S+)\s+ # User
\[([^\]]+)\]\s+ # Timestamp
"([^"]+)"\s+ # Request
(\d{3})\s+ # Status code
(\d+|-)\s* # Size
"([^"]*)"\s* # Referrer
"([^"]*)" # User agent
/x;
if ($line =~ $regex) {
my ($method, $path, $protocol) = split /\s+/, $5;
return {
ip => $1,
identity => $2 eq '-' ? undef : $2,
user => $3 eq '-' ? undef : $3,
timestamp => $4,
method => $method,
path => $path,
protocol => $protocol,
status => $6,
size => $7 eq '-' ? 0 : $7,
referrer => $8 eq '-' ? undef : $8,
user_agent => $9,
};
}
return undef;
}
# Syslog Format
sub parse_syslog($line) {
my $regex = qr/
^(\w+\s+\d+\s+\d{2}:\d{2}:\d{2})\s+ # Timestamp
(\S+)\s+ # Hostname
([^:\[]+) # Program
(?:\[(\d+)\])? # PID (optional)
:\s* # Separator
(.+)$ # Message
/x;
if ($line =~ $regex) {
return {
timestamp => $1,
hostname => $2,
program => $3,
pid => $4,
message => $5,
};
}
return undef;
}
# JSON Logs (structured logging)
use JSON::XS;
sub parse_json_log($line) {
my $json = JSON::XS->new->utf8;
eval {
return $json->decode($line);
};
if ($@) {
warn "Failed to parse JSON log: $@";
return undef;
}
}
# Custom Application Logs
sub parse_custom_log($line, $pattern) {
if ($line =~ $pattern) {
return { %+ }; # Return named captures
}
return undef;
}
```
## Real-Time Log Monitoring
### Tail Follow Implementation
```perl
use File::Tail;
use IO::Select;
# Monitor single file
sub tail_file {
my ($filename, $callback) = @_;
my $file = File::Tail->new(
name => $filename,
interval => 1,
maxinterval => 5,
adjustafter => 10,
resetafter => 30,
tail => 0, # Start from end of file
);
while (defined(my $line = $file->read)) {
chomp $line;
$callback->($line);
}
}
# Monitor multiple files
sub tail_multiple {
my ($files, $callback) = @_;
my @tailfiles = map {
File::Tail->new(
name => $_,
interval => 1,
tail => 0,
)
} @$files;
while (1) {
my ($nfound, $timeleft, @pending) =
File::Tail::select(undef, undef, undef, 10, @tailfiles);
foreach my $file (@pending) {
my $line = $file->read;
chomp $line;
$callback->($file->{name}, $line);
}
}
}
# Real-time log processor
sub monitor_logs {
my ($logfile) = @_;
tail_file($logfile, sub {
my ($line) = @_;
# Parse log line
my $entry = parse_apache_log($line);
return unless $entry;
# Check for errors
if ($entry->{status} >= 500) {
alert("Server error: $entry->{path} returned $entry->{status}");
}
# Check for attacks
if ($entry->{path} =~ /\.\.[\/\\]|<script>|union\s+select/i) {
alert("Possible attack from $entry->{ip}: $entry->{path}");
}
# Track metrics
update_metrics($entry);
});
}
sub alert {
my ($message) = @_;
say "[ALERT] " . localtime() . " - $message";
# Could also send email, Slack message, etc.
}
```
## Log Analysis Patterns
### Error Detection and Classification
```perl
package LogAnalyzer;
use Modern::Perl '2023';
sub new {
my ($class) = @_;
return bless {
patterns => [],
stats => {},
alerts => [],
}, $class;
}
sub add_pattern {
my ($self, $name, $regex, $severity) = @_;
push @{$self->{patterns}}, {
name => $name,
regex => qr/$regex/i,
severity => $severity,
count => 0,
};
}
sub analyze_line {
my ($self, $line) = @_;
for my $pattern (@{$self->{patterns}}) {
if ($line =~ $pattern->{regex}) {
$pattern->{count}++;
$self->handle_match($pattern, $line);
last; # Only match first pattern
}
}
}
sub handle_match {
my ($self, $pattern, $line) = @_;
# Track statistics
$self->{stats}{$pattern->{name}}++;
$self->{stats}{by_severity}{$pattern->{severity}}++;
# Generate alerts for high severity
if ($pattern->{severity} >= 3) {
push @{$self->{alerts}}, {
timestamp => time(),
pattern => $pattern->{name},
severity => $pattern->{severity},
line => $line,
};
}
}
sub get_summary {
my ($self) = @_;
return {
patterns => [
map { {
name => $_->{name},
count => $_->{count},
severity => $_->{severity},
} } @{$self->{patterns}}
],
stats => $self->{stats},
alerts => $self->{alerts},
};
}
# Usage
my $analyzer = LogAnalyzer->new();
# Define patterns
$analyzer->add_pattern('out_of_memory', 'out of memory|OOM', 5);
$analyzer->add_pattern('disk_full', 'no space left|disk full', 4);
$analyzer->add_pattern('connection_failed', 'connection refused|timeout', 3);
$analyzer->add_pattern('auth_failure', 'authentication failed|invalid password', 2);
$analyzer->add_pattern('not_found', '404|not found', 1);
# Process logs
open my $fh, '<', 'application.log' or die $!;
while (<$fh>) {
$analyzer->analyze_line($_);
}
close $fh;
# Get results
my $summary = $analyzer->get_summary();
```
### Anomaly Detection
```perl
# Statistical anomaly detection
package AnomalyDetector;
use Statistics::Descriptive;
sub new {
my ($class, $window_size) = @_;
return bless {
window_size => $window_size // 100,
values => [],
stats => Statistics::Descriptive::Full->new(),
}, $class;
}
sub add_value {
my ($self, $value) = @_;
push @{$self->{values}}, $value;
# Maintain window
if (@{$self->{values}} > $self->{window_size}) {
shift @{$self->{values}};
}
# Update statistics
$self->{stats}->clear();
$self->{stats}->add_data(@{$self->{values}});
}
sub is_anomaly {
my ($self, $value, $threshold) = @_;
$threshold //= 3; # Default to 3 standard deviations
return 0 if @{$self->{values}} < 10; # Need minimum data
my $mean = $self->{stats}->mean();
my $stddev = $self->{stats}->standard_deviation();
return 0 if $stddev == 0; # No variation
my $z_score = abs(($value - $mean) / $stddev);
return $z_score > $threshold;
}
# Detect unusual request rates
sub monitor_request_rate {
my ($logfile) = @_;
my $detector = AnomalyDetector->new(60); # 60-minute window
my %requests_per_minute;
tail_file($logfile, sub {
my ($line) = @_;
my $entry = parse_apache_log($line);
return unless $entry;
# Count requests per minute
my $minute = substr($entry->{timestamp}, 0, 16); # Trim seconds
$requests_per_minute{$minute}++;
# Check for anomalies every minute
state $last_minute = '';
if ($minute ne $last_minute && $last_minute) {
my $count = $requests_per_minute{$last_minute} // 0;
$detector->add_value($count);
if ($detector->is_anomaly($count)) {
alert("Anomaly detected: $count requests in $last_minute");
}
delete $requests_per_minute{$last_minute};
}
$last_minute = $minute;
});
}
```
## Log Aggregation and Reporting
### Multi-File Log Aggregation
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use File::Find;
use IO::Uncompress::Gunzip qw(gunzip);
# Aggregate logs from multiple sources
sub aggregate_logs {
my ($log_dirs, $pattern, $start_time, $end_time) = @_;
my @entries;
for my $dir (@$log_dirs) {
find(sub {
return unless /$pattern/;
my $file = $File::Find::name;
push @entries, @{process_log_file($file, $start_time, $end_time)};
}, $dir);
}
# Sort by timestamp
@entries = sort { $a->{timestamp} cmp $b->{timestamp} } @entries;
return \@entries;
}
sub process_log_file {
my ($file, $start_time, $end_time) = @_;
my @entries;
my $fh;
if ($file =~ /\.gz$/) {
$fh = IO::Uncompress::Gunzip->new($file) or die "Can't open $file: $!";
} else {
open $fh, '<', $file or die "Can't open $file: $!";
}
while (my $line = <$fh>) {
chomp $line;
my $entry = parse_log_line($line);
next unless $entry;
# Filter by time range
next if $start_time && $entry->{timestamp} lt $start_time;
next if $end_time && $entry->{timestamp} gt $end_time;
push @entries, $entry;
}
close $fh;
return \@entries;
}
```
### Report Generation
```perl
# Generate HTML report
sub generate_html_report {
my ($data, $output_file) = @_;
open my $fh, '>:encoding(utf8)', $output_file or die $!;
print $fh <<'HTML';
<!DOCTYPE html>
<html>
<head>
<title>Log Analysis Report</title>
<style>
body { font-family: Arial, sans-serif; margin: 20px; }
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #4CAF50; color: white; }
tr:nth-child(even) { background-color: #f2f2f2; }
.error { color: red; font-weight: bold; }
.warning { color: orange; }
.chart { margin: 20px 0; }
</style>
</head>
<body>
<h1>Log Analysis Report</h1>
HTML
# Summary section
print $fh "<h2>Summary</h2>\n";
print $fh "<ul>\n";
print $fh "<li>Total Entries: $data->{total}</li>\n";
print $fh "<li>Time Range: $data->{start_time} to $data->{end_time}</li>\n";
print $fh "<li>Errors: $data->{error_count}</li>\n";
print $fh "<li>Warnings: $data->{warning_count}</li>\n";
print $fh "</ul>\n";
# Top errors table
print $fh "<h2>Top Errors</h2>\n";
print $fh "<table>\n";
print $fh "<tr><th>Error</th><th>Count</th><th>Percentage</th></tr>\n";
for my $error (@{$data->{top_errors}}) {
my $percentage = sprintf("%.2f%%",
($error->{count} / $data->{error_count}) * 100);
print $fh "<tr><td>$error->{message}</td>";
print $fh "<td>$error->{count}</td>";
print $fh "<td>$percentage</td></tr>\n";
}
print $fh "</table>\n";
# Timeline chart (ASCII)
print $fh "<h2>Activity Timeline</h2>\n";
print $fh "<pre class='chart'>\n";
print $fh generate_ascii_chart($data->{timeline});
print $fh "</pre>\n";
print $fh "</body></html>\n";
close $fh;
say "Report generated: $output_file";
}
sub generate_ascii_chart {
my ($timeline) = @_;
my $max_value = 0;
for my $point (@$timeline) {
$max_value = $point->{value} if $point->{value} > $max_value;
}
my $chart = "";
my $scale = 50 / ($max_value || 1);
for my $point (@$timeline) {
my $bar_length = int($point->{value} * $scale);
my $bar = '#' x $bar_length;
$chart .= sprintf("%s | %-50s %d\n",
$point->{time}, $bar, $point->{value});
}
return $chart;
}
```
## Advanced Log Processing
### Pattern Mining
```perl
# Discover common patterns in logs
sub mine_patterns {
my ($logfile, $min_support) = @_;
$min_support //= 10;
my %patterns;
my $total_lines = 0;
open my $fh, '<', $logfile or die $!;
while (my $line = <$fh>) {
chomp $line;
$total_lines++;
# Tokenize line
my @tokens = $line =~ /\S+/g;
# Generate n-grams
for my $n (2..5) {
for (my $i = 0; $i <= @tokens - $n; $i++) {
my $pattern = join(' ', @tokens[$i..$i+$n-1]);
# Replace variables with placeholders
$pattern =~ s/\d+/NUM/g;
$pattern =~ s/\b[A-Fa-f0-9]{32,}\b/HASH/g;
$pattern =~ s/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/IP/g;
$patterns{$pattern}++;
}
}
}
close $fh;
# Filter by support
my @frequent = grep { $patterns{$_} >= $min_support } keys %patterns;
@frequent = sort { $patterns{$b} <=> $patterns{$a} } @frequent;
return \@frequent[0..min(99, $#frequent)]; # Top 100
}
sub min { $_[0] < $_[1] ? $_[0] : $_[1] }
```
### Session Reconstruction
```perl
# Reconstruct user sessions from logs
sub reconstruct_sessions {
my ($logfile, $session_timeout) = @_;
$session_timeout //= 1800; # 30 minutes
my %sessions;
open my $fh, '<', $logfile or die $!;
while (my $line = <$fh>) {
my $entry = parse_apache_log($line);
next unless $entry;
my $session_id = identify_session($entry);
next unless $session_id;
# Check if session exists and is still active
if (exists $sessions{$session_id}) {
my $last_time = $sessions{$session_id}{last_activity};
my $current_time = parse_timestamp($entry->{timestamp});
if ($current_time - $last_time > $session_timeout) {
# Session expired, start new one
finalize_session($sessions{$session_id});
delete $sessions{$session_id};
}
}
# Add to session
$sessions{$session_id} //= {
id => $session_id,
start_time => $entry->{timestamp},
ip => $entry->{ip},
user_agent => $entry->{user_agent},
requests => [],
};
push @{$sessions{$session_id}{requests}}, {
timestamp => $entry->{timestamp},
method => $entry->{method},
path => $entry->{path},
status => $entry->{status},
size => $entry->{size},
};
$sessions{$session_id}{last_activity} =
parse_timestamp($entry->{timestamp});
}
close $fh;
# Finalize remaining sessions
finalize_session($_) for values %sessions;
return \%sessions;
}
sub identify_session {
my ($entry) = @_;
# Look for session ID in various places
if ($entry->{path} =~ /[?&]session=([^&]+)/) {
return $1;
}
# Use IP + User Agent as fallback
return "$entry->{ip}::" . ($entry->{user_agent} // 'unknown');
}
sub finalize_session {
my ($session) = @_;
$session->{duration} =
parse_timestamp($session->{requests}[-1]{timestamp}) -
parse_timestamp($session->{requests}[0]{timestamp});
$session->{request_count} = scalar(@{$session->{requests}});
$session->{total_bytes} = 0;
$session->{error_count} = 0;
for my $req (@{$session->{requests}}) {
$session->{total_bytes} += $req->{size} // 0;
$session->{error_count}++ if $req->{status} >= 400;
}
}
sub parse_timestamp {
# Convert log timestamp to epoch time
# Implementation depends on log format
my ($timestamp) = @_;
# ... parsing logic ...
return time(); # Placeholder
}
```
## Real-World Example: Complete Log Monitor
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use File::Tail;
use Email::Simple;
use Email::Sender::Simple qw(sendmail);
use DBI;
# Complete log monitoring system
package LogMonitor;
sub new {
my ($class, $config) = @_;
my $self = bless {
config => $config,
rules => [],
metrics => {},
alerts_sent => {},
}, $class;
$self->load_rules();
$self->init_database() if $config->{use_database};
return $self;
}
sub load_rules {
my ($self) = @_;
# Load monitoring rules from config
for my $rule (@{$self->{config}{rules}}) {
push @{$self->{rules}}, {
name => $rule->{name},
pattern => qr/$rule->{pattern}/,
severity => $rule->{severity},
threshold => $rule->{threshold} // 1,
window => $rule->{window} // 60,
action => $rule->{action},
count => 0,
timestamps => [],
};
}
}
sub process_line {
my ($self, $line) = @_;
for my $rule (@{$self->{rules}}) {
if ($line =~ $rule->{pattern}) {
$self->handle_rule_match($rule, $line);
}
}
$self->update_metrics($line);
}
sub handle_rule_match {
my ($self, $rule, $line) = @_;
my $now = time();
push @{$rule->{timestamps}}, $now;
# Clean old timestamps
@{$rule->{timestamps}} = grep {
$_ > $now - $rule->{window}
} @{$rule->{timestamps}};
# Check threshold
if (@{$rule->{timestamps}} >= $rule->{threshold}) {
$self->trigger_alert($rule, $line);
}
}
sub trigger_alert {
my ($self, $rule, $line) = @_;
# Rate limit alerts
my $key = "$rule->{name}:" . int(time() / 300); # 5-minute buckets
return if $self->{alerts_sent}{$key}++;
say "[ALERT] $rule->{name}: $line";
# Execute action
if ($rule->{action} eq 'email') {
$self->send_email_alert($rule, $line);
} elsif ($rule->{action} eq 'script') {
system($rule->{script}, $rule->{name}, $line);
}
# Store in database
if ($self->{config}{use_database}) {
$self->store_alert($rule, $line);
}
}
sub send_email_alert {
my ($self, $rule, $line) = @_;
my $email = Email::Simple->create(
header => [
To => $self->{config}{alert_email},
From => 'logmonitor@example.com',
Subject => "Alert: $rule->{name}",
],
body => "Alert triggered: $rule->{name}\n\n" .
"Pattern: $rule->{pattern}\n" .
"Severity: $rule->{severity}\n" .
"Log line: $line\n",
);
eval { sendmail($email) };
warn "Failed to send email: $@" if $@;
}
sub update_metrics {
my ($self, $line) = @_;
$self->{metrics}{lines_processed}++;
# Extract and track custom metrics
if ($line =~ /response_time=(\d+)ms/) {
push @{$self->{metrics}{response_times}}, $1;
}
# Periodic metric reporting
if ($self->{metrics}{lines_processed} % 1000 == 0) {
$self->report_metrics();
}
}
sub report_metrics {
my ($self) = @_;
say "Metrics Report:";
say " Lines processed: $self->{metrics}{lines_processed}";
if (my $times = $self->{metrics}{response_times}) {
my $avg = sum(@$times) / @$times;
say " Avg response time: ${avg}ms";
@$times = (); # Clear for next period
}
}
sub sum { my $s = 0; $s += $_ for @_; $s }
package main;
# Configuration
my $config = {
use_database => 0,
alert_email => 'admin@example.com',
rules => [
{
name => 'High Error Rate',
pattern => 'ERROR|FATAL',
severity => 5,
threshold => 10,
window => 60,
action => 'email',
},
{
name => 'Security Alert',
pattern => 'authentication failed|unauthorized access',
severity => 4,
threshold => 3,
window => 300,
action => 'script',
script => '/usr/local/bin/security_response.sh',
},
],
};
# Start monitoring
my $monitor = LogMonitor->new($config);
tail_file('/var/log/application.log', sub {
my ($line) = @_;
$monitor->process_line($line);
});
```
## Performance Tips
1. **Use compiled regexes** - Pre-compile patterns with qr//
2. **Process incrementally** - Don't load entire log into memory
3. **Index strategically** - Use seek/tell for random access
4. **Compress old logs** - Process gzipped files directly
5. **Parallel processing** - Use fork or threads for multiple files
6. **Cache parsed results** - Don't re-parse the same data
7. **Use appropriate data structures** - Hashes for lookups, arrays for order
## Best Practices
1. **Handle malformed lines gracefully** - Logs aren't always perfect
2. **Use time windows for analysis** - Avoid memory leaks
3. **Implement rate limiting for alerts** - Prevent alert storms
4. **Store raw logs** - Never modify originals
5. **Document log formats** - Include examples and edge cases
6. **Test with production data** - Synthetic logs miss real issues
7. **Monitor the monitor** - Ensure your tools are working
## Conclusion
Log analysis is where Perl's text processing power truly shines. Whether you're tracking down bugs, detecting security threats, or monitoring performance, Perl gives you the tools to turn log chaos into actionable insights. The key is building robust, efficient parsers and knowing when to alert versus when to aggregate.
Remember: logs are your system's story. Perl helps you read between the lines.
---
*Next: Process management and system commands. We'll explore how Perl interfaces with the operating system, manages processes, and automates system administration tasks.*

View File

@@ -0,0 +1,834 @@
# Chapter 10: Process Management and System Commands
> "Perl is the glue that holds the Internet together - and the system administrator's Swiss Army knife." - Larry Wall
System administration isn't just about files and text—it's about controlling processes, executing commands, and orchestrating system resources. Perl excels at being the conductor of this symphony, providing fine-grained control over process creation, inter-process communication, and system interaction. This chapter shows you how to wield these powers responsibly.
## Executing System Commands
### The Many Ways to Run Commands
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# Method 1: system() - Returns exit status
my $status = system('ls', '-la', '/tmp');
if ($status == 0) {
say "Command succeeded";
} else {
say "Command failed with status: " . ($status >> 8);
}
# Method 2: Backticks - Captures output
my $output = `ls -la /tmp`;
say "Output: $output";
# Method 3: qx// - Same as backticks but cleaner
my $users = qx/who/;
say "Currently logged in:\n$users";
# Method 4: open() with pipe - More control
open my $pipe, '-|', 'ps', 'aux' or die "Can't run ps: $!";
while (<$pipe>) {
print if /perl/i; # Only show perl processes
}
close $pipe;
# Method 5: IPC::Run for complex scenarios
use IPC::Run qw(run timeout);
my ($in, $out, $err);
run ['grep', 'pattern', 'file.txt'], \$in, \$out, \$err, timeout(10)
or die "grep failed: $?";
say "Matches: $out";
say "Errors: $err" if $err;
```
### Safe Command Execution
```perl
# NEVER do this - shell injection vulnerability!
my $user_input = shift @ARGV;
system("ls $user_input"); # DANGEROUS!
# DO this instead - bypass shell
system('ls', $user_input); # Safe - no shell interpretation
# Or validate input
if ($user_input =~ /^[\w\-\.\/]+$/) {
system("ls $user_input");
} else {
die "Invalid input";
}
# Safe command builder
sub safe_command {
my ($cmd, @args) = @_;
# Validate command
my %allowed = map { $_ => 1 } qw(ls ps df du top);
die "Command not allowed: $cmd" unless $allowed{$cmd};
# Escape arguments
@args = map { quotemeta($_) } @args;
return system($cmd, @args);
}
# Taint mode for extra safety
#!/usr/bin/perl -T
use strict;
use warnings;
# With taint mode, Perl won't let you use tainted data in dangerous operations
my $user_input = $ENV{USER_DATA}; # Tainted
system("echo $user_input"); # Dies with taint error
# Must explicitly untaint
if ($user_input =~ /^([\w\s]+)$/) {
my $clean = $1; # Untainted
system("echo $clean"); # Now safe
}
```
## Process Management
### Fork and Exec
```perl
# Basic fork
my $pid = fork();
die "Fork failed: $!" unless defined $pid;
if ($pid == 0) {
# Child process
say "I'm the child with PID $$";
exec('sleep', '10') or die "Exec failed: $!";
} else {
# Parent process
say "I'm the parent, child PID is $pid";
my $kid = waitpid($pid, 0);
say "Child $kid exited with status " . ($? >> 8);
}
# Fork multiple children
sub fork_children {
my ($count, $work) = @_;
my @pids;
for (1..$count) {
my $pid = fork();
die "Fork failed: $!" unless defined $pid;
if ($pid == 0) {
# Child
$work->($_);
exit(0);
} else {
# Parent
push @pids, $pid;
}
}
# Wait for all children
for my $pid (@pids) {
waitpid($pid, 0);
}
}
# Use it
fork_children(5, sub {
my $num = shift;
say "Worker $num (PID $$) starting";
sleep(rand(5));
say "Worker $num done";
});
```
### Process Monitoring
```perl
# Monitor system processes
sub get_process_info {
my ($pid) = @_;
return unless kill(0, $pid); # Check if process exists
# Read from /proc (Linux)
if (-d "/proc/$pid") {
my $info = {};
# Command line
if (open my $fh, '<', "/proc/$pid/cmdline") {
local $/;
$info->{cmdline} = <$fh>;
$info->{cmdline} =~ s/\0/ /g;
close $fh;
}
# Status
if (open my $fh, '<', "/proc/$pid/stat") {
my $stat = <$fh>;
my @fields = split /\s+/, $stat;
$info->{state} = $fields[2];
$info->{ppid} = $fields[3];
$info->{utime} = $fields[13];
$info->{stime} = $fields[14];
close $fh;
}
# Memory
if (open my $fh, '<', "/proc/$pid/status") {
while (<$fh>) {
if (/^VmRSS:\s+(\d+)/) {
$info->{memory} = $1 * 1024; # Convert to bytes
}
}
close $fh;
}
return $info;
}
# Fallback to ps command
my $ps = `ps -p $pid -o pid,ppid,user,command`;
return parse_ps_output($ps);
}
# Process tree
sub show_process_tree {
my ($pid, $indent) = @_;
$pid //= $$;
$indent //= 0;
my $info = get_process_info($pid);
return unless $info;
print " " x $indent;
say "PID $pid: $info->{cmdline}";
# Find children
my @children = find_children($pid);
for my $child (@children) {
show_process_tree($child, $indent + 2);
}
}
sub find_children {
my ($ppid) = @_;
my @children;
opendir my $dh, '/proc' or return;
while (my $dir = readdir $dh) {
next unless $dir =~ /^\d+$/;
if (open my $fh, '<', "/proc/$dir/stat") {
my $stat = <$fh>;
my @fields = split /\s+/, $stat;
push @children, $dir if $fields[3] == $ppid;
close $fh;
}
}
closedir $dh;
return @children;
}
```
## Inter-Process Communication (IPC)
### Pipes
```perl
# Simple pipe
pipe(my $reader, my $writer) or die "Pipe failed: $!";
my $pid = fork();
die "Fork failed: $!" unless defined $pid;
if ($pid == 0) {
# Child writes
close $reader;
print $writer "Hello from child\n";
print $writer "PID: $$\n";
close $writer;
exit(0);
} else {
# Parent reads
close $writer;
while (<$reader>) {
print "Parent received: $_";
}
close $reader;
waitpid($pid, 0);
}
# Bidirectional communication with IPC::Open2
use IPC::Open2;
my ($reader, $writer);
my $pid = open2($reader, $writer, 'bc', '-l') # Calculator
or die "Can't run bc: $!";
# Send calculations
print $writer "2 + 2\n";
print $writer "scale=2; 22/7\n";
print $writer "quit\n";
close $writer;
# Read results
while (<$reader>) {
chomp;
say "Result: $_";
}
close $reader;
waitpid($pid, 0);
```
### Shared Memory and Semaphores
```perl
use IPC::SysV qw(IPC_PRIVATE IPC_CREAT IPC_RMID S_IRUSR S_IWUSR);
use IPC::SharedMem;
use IPC::Semaphore;
# Create shared memory
my $shm = IPC::SharedMem->new(
IPC_PRIVATE,
1024,
S_IRUSR | S_IWUSR | IPC_CREAT
) or die "Can't create shared memory: $!";
# Create semaphore for synchronization
my $sem = IPC::Semaphore->new(
IPC_PRIVATE,
1,
S_IRUSR | S_IWUSR | IPC_CREAT
) or die "Can't create semaphore: $!";
$sem->setval(0, 1); # Initialize semaphore
# Fork a child
my $pid = fork();
die "Fork failed: $!" unless defined $pid;
if ($pid == 0) {
# Child process
for (1..5) {
$sem->op(0, -1, 0); # Wait (P operation)
my $data = $shm->read(0, 1024);
$data =~ s/\0+$//; # Remove null padding
say "Child read: $data";
$shm->write("Child message $_", 0, 1024);
$sem->op(0, 1, 0); # Signal (V operation)
sleep(1);
}
exit(0);
} else {
# Parent process
for (1..5) {
$sem->op(0, -1, 0); # Wait
$shm->write("Parent message $_", 0, 1024);
$sem->op(0, 1, 0); # Signal
sleep(1);
$sem->op(0, -1, 0); # Wait
my $data = $shm->read(0, 1024);
$data =~ s/\0+$//;
say "Parent read: $data";
$sem->op(0, 1, 0); # Signal
}
waitpid($pid, 0);
# Cleanup
$shm->remove;
$sem->remove;
}
```
## Signal Handling
### Basic Signal Handling
```perl
# Set up signal handlers
$SIG{INT} = sub {
say "\nReceived SIGINT, cleaning up...";
cleanup();
exit(0);
};
$SIG{TERM} = sub {
say "Received SIGTERM, shutting down gracefully...";
cleanup();
exit(0);
};
$SIG{HUP} = sub {
say "Received SIGHUP, reloading configuration...";
reload_config();
};
$SIG{CHLD} = 'IGNORE'; # Auto-reap children
# Or use a dispatch table
my %sig_handlers = (
INT => \&handle_interrupt,
TERM => \&handle_terminate,
HUP => \&handle_reload,
USR1 => \&handle_user1,
USR2 => \&handle_user2,
);
for my $sig (keys %sig_handlers) {
$SIG{$sig} = $sig_handlers{$sig};
}
# Send signals to other processes
kill 'TERM', $pid; # Send SIGTERM
kill 'USR1', @pids; # Send to multiple processes
kill 0, $pid; # Check if process exists
# Alarm signal for timeouts
$SIG{ALRM} = sub { die "Timeout!" };
alarm(10); # 10 second timeout
eval {
# Long running operation
lengthy_operation();
alarm(0); # Cancel alarm
};
if ($@ && $@ =~ /Timeout/) {
warn "Operation timed out";
}
```
### Safe Signal Handling
```perl
# Safe signal handling with POSIX
use POSIX qw(:signal_h);
# Block signals during critical sections
my $sigset = POSIX::SigSet->new(SIGINT, SIGTERM);
sigprocmask(SIG_BLOCK, $sigset);
# Critical section
do_critical_work();
# Unblock signals
sigprocmask(SIG_UNBLOCK, $sigset);
# Or use local signal handlers
sub critical_operation {
local $SIG{INT} = 'IGNORE';
local $SIG{TERM} = 'IGNORE';
# Critical work here
}
```
## System Information
### Gathering System Stats
```perl
# System information
use Sys::Hostname;
use POSIX qw(uname);
my $hostname = hostname();
my ($sysname, $nodename, $release, $version, $machine) = uname();
say "Hostname: $hostname";
say "System: $sysname $release on $machine";
# Load average
sub get_load_average {
if (open my $fh, '<', '/proc/loadavg') {
my $line = <$fh>;
close $fh;
my @loads = split /\s+/, $line;
return @loads[0..2];
}
return;
}
# Memory information
sub get_memory_info {
my %mem;
if (open my $fh, '<', '/proc/meminfo') {
while (<$fh>) {
if (/^(\w+):\s+(\d+)/) {
$mem{$1} = $2 * 1024; # Convert to bytes
}
}
close $fh;
}
return \%mem;
}
# Disk usage
sub get_disk_usage {
my @disks;
open my $pipe, '-|', 'df', '-k' or die "Can't run df: $!";
my $header = <$pipe>; # Skip header
while (<$pipe>) {
my @fields = split /\s+/;
next unless @fields >= 6;
push @disks, {
filesystem => $fields[0],
size => $fields[1] * 1024,
used => $fields[2] * 1024,
available => $fields[3] * 1024,
percent => $fields[4],
mount => $fields[5],
};
}
close $pipe;
return \@disks;
}
# CPU information
sub get_cpu_info {
my @cpus;
if (open my $fh, '<', '/proc/cpuinfo') {
my $cpu = {};
while (<$fh>) {
chomp;
if (/^processor\s*:\s*(\d+)/) {
push @cpus, $cpu if %$cpu;
$cpu = { id => $1 };
} elsif (/^([^:]+)\s*:\s*(.+)/) {
my ($key, $value) = ($1, $2);
$key =~ s/\s+/_/g;
$cpu->{lc($key)} = $value;
}
}
push @cpus, $cpu if %$cpu;
close $fh;
}
return \@cpus;
}
```
## Daemon Processes
### Creating a Daemon
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use POSIX qw(setsid);
sub daemonize {
my ($pidfile) = @_;
# Fork and exit parent
my $pid = fork();
exit(0) if $pid;
die "Fork failed: $!" unless defined $pid;
# Create new session
setsid() or die "Can't start new session: $!";
# Fork again to ensure we can't acquire a controlling terminal
$pid = fork();
exit(0) if $pid;
die "Fork failed: $!" unless defined $pid;
# Change working directory
chdir('/') or die "Can't chdir to /: $!";
# Clear file creation mask
umask(0);
# Close file descriptors
close(STDIN);
close(STDOUT);
close(STDERR);
# Reopen standard file descriptors
open(STDIN, '<', '/dev/null') or die "Can't read /dev/null: $!";
open(STDOUT, '>', '/dev/null') or die "Can't write /dev/null: $!";
open(STDERR, '>', '/dev/null') or die "Can't write /dev/null: $!";
# Write PID file
if ($pidfile) {
open my $fh, '>', $pidfile or die "Can't write PID file: $!";
print $fh $$;
close $fh;
}
return $$;
}
# Daemon with logging
sub daemon_with_logging {
my ($name, $logfile) = @_;
# Daemonize
my $pidfile = "/var/run/$name.pid";
daemonize($pidfile);
# Open log file
open my $log, '>>', $logfile or die "Can't open log: $!";
$log->autoflush(1);
# Redirect STDOUT and STDERR to log
open(STDOUT, '>&', $log) or die "Can't dup STDOUT: $!";
open(STDERR, '>&', $log) or die "Can't dup STDERR: $!";
# Set up signal handlers
$SIG{TERM} = sub {
say "Daemon shutting down";
unlink $pidfile;
exit(0);
};
$SIG{HUP} = sub {
say "Reloading configuration";
# Reload config here
};
# Main daemon loop
say "Daemon started with PID $$";
while (1) {
# Do daemon work
do_work();
sleep(60);
}
}
sub do_work {
my $timestamp = localtime();
say "[$timestamp] Performing scheduled work";
# Your daemon logic here
}
```
## Real-World Example: Process Manager
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use feature 'signatures';
no warnings 'experimental::signatures';
package ProcessManager;
use Time::HiRes qw(sleep);
use POSIX qw(:sys_wait_h);
sub new($class, $config) {
return bless {
config => $config,
workers => {},
running => 0,
}, $class;
}
sub start($self) {
$self->setup_signals();
$self->spawn_workers();
$self->monitor_loop();
}
sub setup_signals($self) {
$SIG{TERM} = sub { $self->shutdown() };
$SIG{INT} = sub { $self->shutdown() };
$SIG{HUP} = sub { $self->reload() };
$SIG{CHLD} = sub { $self->reap_children() };
}
sub spawn_workers($self) {
my $count = $self->{config}{worker_count} // 5;
for (1..$count) {
$self->spawn_worker();
}
}
sub spawn_worker($self) {
my $pid = fork();
die "Fork failed: $!" unless defined $pid;
if ($pid == 0) {
# Child process
$0 = "$0 [worker]"; # Change process name
$self->worker_loop();
exit(0);
} else {
# Parent process
$self->{workers}{$pid} = {
pid => $pid,
started => time(),
status => 'running',
};
$self->{running}++;
say "Spawned worker $pid";
}
}
sub worker_loop($self) {
# Reset signal handlers in worker
$SIG{TERM} = sub { exit(0) };
$SIG{INT} = 'DEFAULT';
$SIG{HUP} = 'DEFAULT';
while (1) {
# Simulate work
my $task = $self->get_task();
if ($task) {
$self->process_task($task);
} else {
sleep(1);
}
}
}
sub get_task($self) {
# In real implementation, get from queue, database, etc.
return rand() > 0.5 ? { id => int(rand(1000)), type => 'process' } : undef;
}
sub process_task($self, $task) {
say "[$$] Processing task $task->{id}";
sleep(rand(3)); # Simulate work
say "[$$] Completed task $task->{id}";
}
sub monitor_loop($self) {
while (1) {
$self->check_workers();
$self->report_status();
sleep(5);
}
}
sub check_workers($self) {
for my $pid (keys %{$self->{workers}}) {
unless (kill 0, $pid) {
# Worker died unexpectedly
say "Worker $pid died, respawning";
delete $self->{workers}{$pid};
$self->{running}--;
$self->spawn_worker();
}
}
}
sub reap_children($self) {
while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
my $status = $? >> 8;
say "Reaped child $pid with status $status";
if (exists $self->{workers}{$pid}) {
delete $self->{workers}{$pid};
$self->{running}--;
# Respawn if not shutting down
unless ($self->{shutting_down}) {
$self->spawn_worker();
}
}
}
}
sub report_status($self) {
my $now = time();
say "\n=== Status Report ===";
say "Workers: $self->{running}";
for my $pid (sort { $a <=> $b } keys %{$self->{workers}}) {
my $worker = $self->{workers}{$pid};
my $uptime = $now - $worker->{started};
say " PID $pid: running for ${uptime}s";
}
}
sub reload($self) {
say "Reloading configuration";
# Re-read config file
# Adjust worker count if needed
}
sub shutdown($self) {
say "\nShutting down...";
$self->{shutting_down} = 1;
# Send TERM to all workers
for my $pid (keys %{$self->{workers}}) {
say "Terminating worker $pid";
kill 'TERM', $pid;
}
# Wait for workers to exit
my $timeout = 10;
while ($self->{running} > 0 && $timeout-- > 0) {
sleep(1);
}
# Force kill if necessary
if ($self->{running} > 0) {
for my $pid (keys %{$self->{workers}}) {
say "Force killing worker $pid";
kill 'KILL', $pid;
}
}
say "Shutdown complete";
exit(0);
}
package main;
# Run the process manager
my $config = {
worker_count => 3,
max_tasks_per_worker => 100,
worker_timeout => 300,
};
my $manager = ProcessManager->new($config);
$manager->start();
```
## Best Practices
1. **Always check return values** - System calls can fail
2. **Use list form of system()** - Avoids shell injection
3. **Reap zombie processes** - Set up SIGCHLD handler
4. **Handle signals gracefully** - Clean shutdown is important
5. **Use absolute paths in daemons** - Working directory may change
6. **Log daemon activities** - You can't see stdout in a daemon
7. **Implement proper locking** - Prevent multiple instances
8. **Test signal handling** - Make sure cleanup works
9. **Monitor resource usage** - Prevent resource leaks
10. **Document process relationships** - Who spawns whom
## Conclusion
Process management and system interaction are where Perl proves its worth as a system administration language. Whether you're orchestrating complex workflows, building robust daemons, or simply automating system tasks, Perl provides the tools to do it efficiently and reliably.
Remember: with great power comes great responsibility. Always validate inputs, handle errors gracefully, and test thoroughly—especially when your code has system-level access.
---
*Next: Network programming and web scraping. We'll explore how Perl handles network protocols, builds clients and servers, and extracts data from the web.*

View File

@@ -0,0 +1,855 @@
# Chapter 11: Network Programming and Web Scraping
> "The Internet? We are not interested in it." - Bill Gates, 1993
> "Hold my beer." - Perl programmers, 1993
Perl and networking go together like coffee and late-night coding sessions. From the early days of CGI to modern REST APIs, Perl has been connecting systems and scraping data since before "web scraping" was even a term. This chapter covers everything from raw sockets to modern web automation.
## Socket Programming Basics
### TCP Client and Server
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use IO::Socket::INET;
# Simple TCP client
sub tcp_client {
my ($host, $port) = @_;
my $socket = IO::Socket::INET->new(
PeerAddr => $host,
PeerPort => $port,
Proto => 'tcp',
Timeout => 10,
) or die "Can't connect to $host:$port: $!";
say "Connected to $host:$port";
# Send data
print $socket "GET / HTTP/1.0\r\n\r\n";
# Read response
while (my $line = <$socket>) {
print $line;
}
close $socket;
}
# Simple TCP server
sub tcp_server {
my ($port) = @_;
my $server = IO::Socket::INET->new(
LocalPort => $port,
Proto => 'tcp',
Listen => 5,
Reuse => 1,
) or die "Can't create server on port $port: $!";
say "Server listening on port $port";
while (my $client = $server->accept()) {
my $peer = $client->peerhost();
my $peerport = $client->peerport();
say "Connection from $peer:$peerport";
# Handle client
print $client "HTTP/1.0 200 OK\r\n";
print $client "Content-Type: text/plain\r\n\r\n";
print $client "Hello from Perl server!\n";
print $client "Your IP: $peer\n";
print $client "Server time: " . localtime() . "\n";
close $client;
}
}
# Multi-threaded server
use threads;
use threads::shared;
sub threaded_server {
my ($port) = @_;
my $connections :shared = 0;
my $server = IO::Socket::INET->new(
LocalPort => $port,
Proto => 'tcp',
Listen => 10,
Reuse => 1,
) or die "Can't create server: $!";
say "Threaded server on port $port";
while (my $client = $server->accept()) {
threads->create(sub {
$connections++;
handle_client($client, $connections);
$connections--;
})->detach();
}
}
sub handle_client {
my ($client, $id) = @_;
say "[$id] New connection";
while (my $line = <$client>) {
chomp $line;
last if $line eq 'quit';
# Echo back
print $client "You said: $line\n";
}
close $client;
say "[$id] Connection closed";
}
```
### UDP Communication
```perl
# UDP client
sub udp_client {
my ($host, $port, $message) = @_;
my $socket = IO::Socket::INET->new(
PeerAddr => $host,
PeerPort => $port,
Proto => 'udp',
) or die "Can't create UDP socket: $!";
$socket->send($message) or die "Send failed: $!";
# Wait for response (with timeout)
my $response;
eval {
local $SIG{ALRM} = sub { die "timeout\n" };
alarm(5);
$socket->recv($response, 1024);
alarm(0);
};
if ($@) {
warn "No response: $@";
} else {
say "Received: $response";
}
close $socket;
}
# UDP server
sub udp_server {
my ($port) = @_;
my $socket = IO::Socket::INET->new(
LocalPort => $port,
Proto => 'udp',
) or die "Can't create UDP server: $!";
say "UDP server listening on port $port";
while (1) {
my $data;
my $client = $socket->recv($data, 1024);
my ($port, $ip) = sockaddr_in($socket->peername);
my $peer = inet_ntoa($ip);
say "Received from $peer:$port: $data";
# Send response
$socket->send("ACK: $data");
}
}
```
## HTTP Clients
### Using LWP::UserAgent
```perl
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;
# Basic GET request
sub simple_get {
my ($url) = @_;
my $ua = LWP::UserAgent->new(
timeout => 30,
agent => 'MyPerlBot/1.0',
);
my $response = $ua->get($url);
if ($response->is_success) {
return $response->decoded_content;
} else {
die "Failed: " . $response->status_line;
}
}
# POST with form data
sub post_form {
my ($url, $data) = @_;
my $ua = LWP::UserAgent->new();
my $response = $ua->post($url, $data);
return $response->decoded_content if $response->is_success;
die "POST failed: " . $response->status_line;
}
# Advanced HTTP client with cookies and headers
sub advanced_client {
my $ua = LWP::UserAgent->new(
timeout => 30,
max_redirect => 3,
agent => 'Mozilla/5.0 (compatible; MyBot/1.0)',
);
# Cookie jar
my $cookies = HTTP::Cookies->new(
file => 'cookies.txt',
autosave => 1,
);
$ua->cookie_jar($cookies);
# Custom headers
$ua->default_header('Accept' => 'application/json');
$ua->default_header('Accept-Language' => 'en-US,en;q=0.9');
# Authentication
$ua->credentials('example.com:443', 'Protected', 'user', 'pass');
# Proxy
$ua->proxy(['http', 'https'], 'http://proxy.example.com:8080/');
return $ua;
}
# Download file with progress
sub download_file {
my ($url, $filename) = @_;
my $ua = LWP::UserAgent->new();
my $expected_length;
my $bytes_received = 0;
open my $fh, '>', $filename or die "Can't open $filename: $!";
binmode $fh;
my $response = $ua->get($url,
':content_cb' => sub {
my ($chunk, $response) = @_;
$expected_length //= $response->header('Content-Length');
$bytes_received += length($chunk);
print $fh $chunk;
# Progress
if ($expected_length) {
my $percent = int(($bytes_received / $expected_length) * 100);
print "\rDownloading: $percent%";
}
}
);
close $fh;
print "\n";
die "Download failed: " . $response->status_line
unless $response->is_success;
say "Downloaded $filename ($bytes_received bytes)";
}
```
### Modern HTTP with Mojo::UserAgent
```perl
use Mojo::UserAgent;
use Mojo::Promise;
# Async/await style requests
sub modern_http {
my $ua = Mojo::UserAgent->new;
# Simple GET
my $result = $ua->get('https://api.github.com/users/perl')->result;
my $json = $result->json;
say "Perl has $json->{public_repos} public repos";
# Concurrent requests
my @urls = qw(
https://perl.org
https://metacpan.org
https://perl.com
);
my @promises = map { $ua->get_p($_) } @urls;
Mojo::Promise->all(@promises)->then(sub {
my @results = @_;
for my $tx (@results) {
my $url = $tx->[0]->req->url;
my $title = $tx->[0]->res->dom->at('title')->text;
say "$url: $title";
}
})->wait;
# WebSocket
$ua->websocket('ws://echo.websocket.org' => sub {
my ($ua, $tx) = @_;
$tx->on(message => sub {
my ($tx, $msg) = @_;
say "Received: $msg";
$tx->finish if $msg eq 'exit';
});
$tx->send('Hello, WebSocket!');
});
Mojo::IOLoop->start;
}
```
## Web Scraping
### HTML Parsing with Mojo::DOM
```perl
use Mojo::UserAgent;
use Mojo::DOM;
sub scrape_with_mojo {
my ($url) = @_;
my $ua = Mojo::UserAgent->new;
my $dom = $ua->get($url)->result->dom;
# CSS selectors
my $title = $dom->at('title')->text;
say "Title: $title";
# Find all links
$dom->find('a[href]')->each(sub {
my $link = shift;
say "Link: " . $link->attr('href') . " - " . $link->text;
});
# Extract table data
my @rows;
$dom->find('table tr')->each(sub {
my $row = shift;
my @cells = $row->find('td')->map('text')->each;
push @rows, \@cells if @cells;
});
# Form extraction
$dom->find('form')->each(sub {
my $form = shift;
say "Form action: " . $form->attr('action');
$form->find('input')->each(sub {
my $input = shift;
say " Input: " . ($input->attr('name') // 'unnamed');
});
});
return \@rows;
}
```
### Web::Scraper for Structured Extraction
```perl
use Web::Scraper;
use URI;
# Define scraper
my $scraper = scraper {
process 'title', 'title' => 'TEXT';
process 'meta[name="description"]', 'description' => '@content';
process 'h1', 'heading[]' => 'TEXT';
process 'a', 'links[]' => {
text => 'TEXT',
href => '@href',
};
process 'img', 'images[]' => '@src';
};
# Use scraper
my $url = URI->new('https://example.com');
my $result = $scraper->scrape($url);
use Data::Dumper;
print Dumper($result);
# Complex scraping patterns
sub scrape_products {
my ($url) = @_;
my $products = scraper {
process '.product', 'products[]' => scraper {
process '.name', 'name' => 'TEXT';
process '.price', 'price' => 'TEXT';
process '.description', 'desc' => 'TEXT';
process 'img', 'image' => '@src';
process 'a.details', 'link' => '@href';
};
process '.pagination .next', 'next_page' => '@href';
};
my @all_products;
my $current_url = $url;
while ($current_url) {
my $result = $products->scrape(URI->new($current_url));
push @all_products, @{$result->{products} || []};
# Follow pagination
$current_url = $result->{next_page};
last if @all_products > 100; # Limit
}
return \@all_products;
}
```
### JavaScript-Heavy Sites with Selenium
```perl
use Selenium::Chrome;
use Selenium::Waiter qw(wait_until);
sub scrape_dynamic_site {
my ($url) = @_;
# Start Chrome
my $driver = Selenium::Chrome->new(
extra_capabilities => {
'goog:chromeOptions' => {
args => ['--headless', '--disable-gpu'],
}
}
);
$driver->get($url);
# Wait for dynamic content
wait_until { $driver->find_element('.products-loaded') };
# Scroll to load more content
$driver->execute_script('window.scrollTo(0, document.body.scrollHeight)');
sleep(2);
# Extract data after JS rendering
my @products;
my $elements = $driver->find_elements('.product-card');
for my $element (@$elements) {
push @products, {
name => $element->find_child_element('.name')->get_text(),
price => $element->find_child_element('.price')->get_text(),
};
}
# Interact with page
my $button = $driver->find_element('#load-more');
$button->click() if $button;
$driver->quit();
return \@products;
}
```
## REST API Clients
### Building a REST Client
```perl
package RESTClient;
use Modern::Perl '2023';
use LWP::UserAgent;
use JSON::XS;
use URI::Escape;
sub new {
my ($class, $base_url, $options) = @_;
my $self = {
base_url => $base_url,
ua => LWP::UserAgent->new(
timeout => $options->{timeout} // 30,
agent => $options->{agent} // 'RESTClient/1.0',
),
headers => $options->{headers} // {},
auth => $options->{auth},
};
return bless $self, $class;
}
sub request {
my ($self, $method, $path, $data) = @_;
my $url = $self->{base_url} . $path;
my $req = HTTP::Request->new($method => $url);
# Add headers
for my $header (keys %{$self->{headers}}) {
$req->header($header => $self->{headers}{$header});
}
# Add auth
if ($self->{auth}) {
if ($self->{auth}{type} eq 'basic') {
$req->authorization_basic(
$self->{auth}{username},
$self->{auth}{password}
);
} elsif ($self->{auth}{type} eq 'bearer') {
$req->header(Authorization => "Bearer $self->{auth}{token}");
}
}
# Add data
if ($data) {
if ($method eq 'GET') {
# Add as query params
my @params;
for my $key (keys %$data) {
push @params, uri_escape($key) . '=' . uri_escape($data->{$key});
}
$url .= '?' . join('&', @params);
$req->uri($url);
} else {
# Send as JSON body
$req->header('Content-Type' => 'application/json');
$req->content(encode_json($data));
}
}
my $response = $self->{ua}->request($req);
if ($response->is_success) {
my $content = $response->decoded_content;
return $response->header('Content-Type') =~ /json/
? decode_json($content)
: $content;
} else {
die "Request failed: " . $response->status_line;
}
}
sub get { shift->request('GET', @_) }
sub post { shift->request('POST', @_) }
sub put { shift->request('PUT', @_) }
sub delete { shift->request('DELETE', @_) }
sub patch { shift->request('PATCH', @_) }
# Usage
package main;
my $api = RESTClient->new('https://api.example.com', {
headers => {
'Accept' => 'application/json',
'X-API-Version' => '2.0',
},
auth => {
type => 'bearer',
token => 'your-api-token',
},
});
# GET /users
my $users = $api->get('/users', { limit => 10, offset => 0 });
# POST /users
my $new_user = $api->post('/users', {
name => 'Alice',
email => 'alice@example.com',
});
# PUT /users/123
$api->put('/users/123', { name => 'Alice Smith' });
# DELETE /users/123
$api->delete('/users/123');
```
## Email and SMTP
### Sending Email
```perl
use Email::Simple;
use Email::Sender::Simple qw(sendmail);
use Email::Sender::Transport::SMTP;
use Email::MIME;
# Simple email
sub send_simple_email {
my ($to, $subject, $body) = @_;
my $email = Email::Simple->create(
header => [
To => $to,
From => 'sender@example.com',
Subject => $subject,
],
body => $body,
);
sendmail($email);
}
# Email with attachments
sub send_email_with_attachment {
my ($to, $subject, $body, $attachment) = @_;
my @parts = (
Email::MIME->create(
attributes => {
content_type => 'text/plain',
charset => 'UTF-8',
},
body => $body,
),
);
if ($attachment && -f $attachment) {
open my $fh, '<:raw', $attachment or die $!;
my $data = do { local $/; <$fh> };
close $fh;
push @parts, Email::MIME->create(
attributes => {
filename => basename($attachment),
content_type => 'application/octet-stream',
encoding => 'base64',
disposition => 'attachment',
},
body => $data,
);
}
my $email = Email::MIME->create(
header => [
To => $to,
From => 'sender@example.com',
Subject => $subject,
],
parts => \@parts,
);
# Custom SMTP transport
my $transport = Email::Sender::Transport::SMTP->new({
host => 'smtp.gmail.com',
port => 587,
sasl_username => 'your@gmail.com',
sasl_password => 'your-password',
ssl => 'starttls',
});
sendmail($email, { transport => $transport });
}
sub basename { (split /\//, $_[0])[-1] }
```
## Real-World Example: Web Monitor
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);
use Time::HiRes qw(time);
use Email::Simple;
use Email::Sender::Simple qw(sendmail);
package WebMonitor;
sub new {
my ($class, $config) = @_;
return bless {
config => $config,
ua => LWP::UserAgent->new(
timeout => 30,
agent => 'WebMonitor/1.0',
),
state => {},
}, $class;
}
sub monitor {
my ($self) = @_;
while (1) {
for my $site (@{$self->{config}{sites}}) {
$self->check_site($site);
}
sleep($self->{config}{interval} // 300); # 5 minutes default
}
}
sub check_site {
my ($self, $site) = @_;
my $start_time = time();
my $response = $self->{ua}->get($site->{url});
my $response_time = time() - $start_time;
my $status = {
url => $site->{url},
timestamp => time(),
response_time => $response_time,
status_code => $response->code,
success => $response->is_success,
};
# Check for changes
if ($response->is_success) {
my $content = $response->decoded_content;
my $hash = md5_hex($content);
my $prev_hash = $self->{state}{$site->{url}}{hash} // '';
if ($prev_hash && $hash ne $prev_hash) {
$status->{changed} = 1;
$self->alert("Content changed", $site, $status);
}
$self->{state}{$site->{url}}{hash} = $hash;
# Check for keywords
if ($site->{keywords}) {
for my $keyword (@{$site->{keywords}}) {
if ($content =~ /$keyword/i) {
$self->alert("Keyword found: $keyword", $site, $status);
}
}
}
# Check response time
if ($site->{max_response_time} &&
$response_time > $site->{max_response_time}) {
$self->alert("Slow response", $site, $status);
}
} else {
# Site is down
$self->alert("Site down", $site, $status);
}
$self->log_status($status);
}
sub alert {
my ($self, $reason, $site, $status) = @_;
say "[ALERT] $reason for $site->{url}";
if ($self->{config}{email_alerts}) {
my $email = Email::Simple->create(
header => [
To => $self->{config}{alert_email},
From => 'monitor@example.com',
Subject => "Web Monitor Alert: $reason",
],
body => $self->format_alert($reason, $site, $status),
);
eval { sendmail($email) };
warn "Failed to send alert: $@" if $@;
}
}
sub format_alert {
my ($self, $reason, $site, $status) = @_;
return <<EOF;
Alert: $reason
Site: $site->{url}
Time: @{[scalar localtime($status->{timestamp})]}
Status Code: $status->{status_code}
Response Time: @{[sprintf "%.2f", $status->{response_time}]} seconds
Please investigate.
EOF
}
sub log_status {
my ($self, $status) = @_;
open my $fh, '>>', 'monitor.log' or die $!;
print $fh join("\t",
$status->{timestamp},
$status->{url},
$status->{status_code},
$status->{response_time},
$status->{success} ? 'OK' : 'FAIL'
), "\n";
close $fh;
}
package main;
my $config = {
interval => 60, # Check every minute
email_alerts => 1,
alert_email => 'admin@example.com',
sites => [
{
url => 'https://example.com',
max_response_time => 5,
keywords => ['error', 'maintenance'],
},
{
url => 'https://api.example.com/health',
max_response_time => 2,
},
],
};
my $monitor = WebMonitor->new($config);
$monitor->monitor();
```
## Best Practices
1. **Set appropriate timeouts** - Network calls can hang forever
2. **Handle redirects carefully** - Limit redirect chains
3. **Respect robots.txt** - Be a good web citizen
4. **Add delays between requests** - Don't hammer servers
5. **Use connection pooling** - Reuse connections when possible
6. **Implement retry logic** - Networks are unreliable
7. **Cache responses** - Reduce unnecessary requests
8. **Set a proper User-Agent** - Identify your bot
9. **Handle encodings properly** - UTF-8 isn't universal
10. **Log failures for debugging** - Network issues are common
## Conclusion
Network programming and web scraping in Perl combine power with practicality. Whether you're building robust API clients, scraping data from websites, or creating network services, Perl provides the tools to do it efficiently. The key is choosing the right tool for each task and respecting the services you interact with.
Remember: just because you can scrape something doesn't mean you should. Always check terms of service and robots.txt, and be respectful of the resources you're accessing.
---
*Next: Database operations with DBI. We'll explore how Perl interfaces with databases, from simple queries to complex transactions and migrations.*

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,877 @@
# Chapter 14: CPAN - The Treasure Trove
> "CPAN is the language. Perl is just its syntax." - chromatic
The Comprehensive Perl Archive Network (CPAN) is Perl's killer feature. With over 200,000 modules solving virtually every programming problem imaginable, CPAN transforms Perl from a language into an ecosystem. This chapter shows you how to navigate this treasure trove, evaluate modules, contribute your own, and leverage CPAN to write less code and solve more problems.
## Understanding CPAN
### The CPAN Ecosystem
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# CPAN consists of:
# 1. The archive - mirrors worldwide containing modules
# 2. MetaCPAN - modern web interface for searching and browsing
# 3. CPAN clients - tools to install modules (cpan, cpanm, carton)
# 4. PAUSE - Perl Authors Upload Server for contributors
# 5. CPAN Testers - automated testing across platforms
# Finding modules:
# - MetaCPAN: https://metacpan.org
# - Search by task, author, or module name
# - Check ratings, test results, and dependencies
# - Read documentation and reviews
# Understanding module naming:
# Foo::Bar - Hierarchical namespace
# Foo::Bar::XS - XS (C) implementation
# Foo::Bar::PP - Pure Perl implementation
# Foo::Bar::Tiny - Minimal version
# Foo::Bar::Mo - Modern/Moose-based version
```
### Installing Modules
```perl
# Using cpanminus (recommended)
# Install: curl -L https://cpanmin.us | perl - App::cpanminus
# Install a module
system('cpanm Module::Name');
# Install specific version
system('cpanm Module::Name@1.23');
# Install from GitHub
system('cpanm git://github.com/user/repo.git');
# Install with dependencies
system('cpanm --installdeps .');
# Install without tests (faster but risky)
system('cpanm --notest Module::Name');
# Local installation
system('cpanm -l ~/perl5 Module::Name');
# Using traditional CPAN client
use CPAN;
CPAN::Shell->install('Module::Name');
# Or from command line:
# cpan Module::Name
```
### Managing Dependencies with Carton
```perl
# Carton is like Bundler for Ruby or npm for Node.js
# Create cpanfile
my $cpanfile = <<'END';
requires 'Plack', '1.0047';
requires 'DBI', '1.643';
requires 'DBD::SQLite', '1.66';
requires 'Mojo', '9.0';
on 'test' => sub {
requires 'Test::More', '1.302183';
requires 'Test::Exception';
requires 'Test::MockObject';
};
on 'develop' => sub {
requires 'Perl::Tidy';
requires 'Perl::Critic';
requires 'Devel::NYTProf';
};
feature 'postgres', 'PostgreSQL support' => sub {
requires 'DBD::Pg';
};
END
# Install dependencies
system('carton install');
# Run with local dependencies
system('carton exec perl script.pl');
# Deploy with dependencies
system('carton bundle');
```
## Essential CPAN Modules
### Web Development
```perl
# Mojolicious - Real-time web framework
use Mojolicious::Lite;
get '/' => sub {
my $c = shift;
$c->render(text => 'Hello World!');
};
app->start;
# Plack - PSGI toolkit
use Plack::Builder;
my $app = sub {
my $env = shift;
return [200, ['Content-Type' => 'text/plain'], ['Hello World']];
};
builder {
enable 'Debug';
enable 'Session';
$app;
};
# Dancer2 - Lightweight web framework
use Dancer2;
get '/' => sub {
return 'Hello World';
};
dance;
```
### Database Access
```perl
# DBIx::Class - ORM
package MyApp::Schema::Result::User;
use base 'DBIx::Class::Core';
__PACKAGE__->table('users');
__PACKAGE__->add_columns(qw/id name email/);
__PACKAGE__->set_primary_key('id');
__PACKAGE__->has_many(posts => 'MyApp::Schema::Result::Post', 'user_id');
# Using DBIx::Class
my $schema = MyApp::Schema->connect($dsn, $user, $pass);
my $users = $schema->resultset('User')->search({
created => { '>' => '2024-01-01' }
});
# SQL::Abstract - Generate SQL from Perl data structures
use SQL::Abstract;
my $sql = SQL::Abstract->new;
my ($stmt, @bind) = $sql->select(
'users',
['name', 'email'],
{
status => 'active',
age => { '>' => 18 },
}
);
```
### Testing
```perl
# Test::More - Standard testing
use Test::More tests => 3;
ok(1 + 1 == 2, 'Math works');
is(uc('hello'), 'HELLO', 'uc works');
like('Hello World', qr/World/, 'Pattern matches');
# Test::Deep - Deep structure comparison
use Test::Deep;
cmp_deeply(
$got,
{
name => 'Alice',
tags => bag(qw/perl programming/),
meta => superhashof({
created => re(qr/^\d{4}-\d{2}-\d{2}$/),
}),
},
'Structure matches'
);
# Test::MockObject - Mock objects for testing
use Test::MockObject;
my $mock = Test::MockObject->new();
$mock->set_always('fetch_data', { id => 1, name => 'Test' });
$mock->set_true('save');
ok($mock->save(), 'Save returns true');
```
### Date and Time
```perl
# DateTime - Comprehensive date/time handling
use DateTime;
my $dt = DateTime->now(time_zone => 'America/New_York');
$dt->add(days => 7, hours => 3);
say $dt->strftime('%Y-%m-%d %H:%M:%S');
# Time::Piece - Core module for simple date/time
use Time::Piece;
my $t = localtime;
say $t->datetime; # ISO 8601
say $t->epoch;
say $t->day_of_week;
my $birthday = Time::Piece->strptime('1990-01-15', '%Y-%m-%d');
my $age = ($t - $birthday)->years;
```
## Creating Your Own CPAN Module
### Module Structure
```perl
# Directory structure:
# My-Module/
# ├── lib/
# │ └── My/
# │ └── Module.pm
# ├── t/
# │ ├── 00-load.t
# │ └── 01-basic.t
# ├── Makefile.PL
# ├── MANIFEST
# ├── README.md
# ├── Changes
# └── LICENSE
# lib/My/Module.pm
package My::Module;
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';
our $VERSION = '0.01';
sub new($class, %args) {
return bless \%args, $class;
}
sub do_something($self) {
return "Doing something!";
}
1;
__END__
=head1 NAME
My::Module - A brief description
=head1 SYNOPSIS
use My::Module;
my $obj = My::Module->new();
$obj->do_something();
=head1 DESCRIPTION
This module provides...
=head1 METHODS
=head2 new
Constructor. Creates a new My::Module object.
=head2 do_something
Does something useful.
=head1 AUTHOR
Your Name <your.email@example.com>
=head1 LICENSE
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
=cut
```
### Build Files
```perl
# Makefile.PL (ExtUtils::MakeMaker)
use ExtUtils::MakeMaker;
WriteMakefile(
NAME => 'My::Module',
AUTHOR => 'Your Name <email@example.com>',
VERSION_FROM => 'lib/My/Module.pm',
ABSTRACT_FROM => 'lib/My/Module.pm',
LICENSE => 'perl_5',
MIN_PERL_VERSION => '5.016',
CONFIGURE_REQUIRES => {
'ExtUtils::MakeMaker' => '0',
},
TEST_REQUIRES => {
'Test::More' => '0',
},
PREREQ_PM => {
'strict' => 0,
'warnings' => 0,
},
META_MERGE => {
'meta-spec' => { version => 2 },
resources => {
repository => {
type => 'git',
url => 'https://github.com/user/My-Module.git',
web => 'https://github.com/user/My-Module',
},
},
},
dist => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', },
clean => { FILES => 'My-Module-*' },
);
# Or Build.PL (Module::Build)
use Module::Build;
my $builder = Module::Build->new(
module_name => 'My::Module',
license => 'perl',
dist_author => 'Your Name <email@example.com>',
dist_version_from => 'lib/My/Module.pm',
requires => {
'perl' => '5.016',
'strict' => 0,
'warnings' => 0,
},
test_requires => {
'Test::More' => 0,
},
add_to_cleanup => [ 'My-Module-*' ],
meta_merge => {
resources => {
repository => 'https://github.com/user/My-Module',
},
},
);
$builder->create_build_script();
```
### Testing Your Module
```perl
# t/00-load.t
use Test::More tests => 1;
BEGIN {
use_ok('My::Module') || print "Bail out!\n";
}
diag("Testing My::Module $My::Module::VERSION, Perl $], $^X");
# t/01-basic.t
use Test::More;
use My::Module;
my $obj = My::Module->new(name => 'test');
isa_ok($obj, 'My::Module');
is($obj->do_something(), 'Doing something!', 'do_something works');
done_testing();
# t/02-advanced.t
use Test::More;
use Test::Exception;
use Test::Warnings;
use My::Module;
# Test error handling
throws_ok {
My::Module->new()->invalid_method();
} qr/Can't locate object method/, 'Dies on invalid method';
# Test edge cases
my $obj = My::Module->new();
is($obj->process(undef), '', 'Handles undef gracefully');
is($obj->process(''), '', 'Handles empty string');
# Test with mock data
my $mock_data = {
users => [
{ id => 1, name => 'Alice' },
{ id => 2, name => 'Bob' },
],
};
my $result = $obj->process_users($mock_data);
is(scalar @{$result->{processed}}, 2, 'Processes all users');
done_testing();
```
## Publishing to CPAN
### Getting a PAUSE Account
```perl
# 1. Register at https://pause.perl.org
# 2. Wait for manual approval (usually 1-2 days)
# 3. You'll receive a PAUSE ID (e.g., YOURNAME)
# Configure cpan-upload
# Install: cpanm CPAN::Uploader
# Create ~/.pause file:
# user YOURNAME
# password your-pause-password
# Or use environment variables:
$ENV{PAUSE_USER} = 'YOURNAME';
$ENV{PAUSE_PASSWORD} = 'your-password';
```
### Preparing for Release
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# Pre-release checklist
sub pre_release_checks {
my $module = shift;
my @checks = (
'All tests pass' => sub { system('prove -l t/') == 0 },
'POD is valid' => sub { system('podchecker lib/') == 0 },
'MANIFEST is current' => sub { check_manifest() },
'Version updated' => sub { check_version() },
'Changes file updated' => sub { check_changes() },
'No debug code' => sub { !grep_debug_code() },
'Dependencies correct' => sub { check_dependencies() },
);
for (my $i = 0; $i < @checks; $i += 2) {
my ($desc, $check) = @checks[$i, $i+1];
if ($check->()) {
say "✓ $desc";
} else {
die "✗ $desc - Fix before release!";
}
}
}
sub check_manifest {
system('perl Makefile.PL');
system('make manifest');
my $diff = `diff MANIFEST MANIFEST.bak 2>&1`;
return $diff eq '';
}
sub check_version {
open my $fh, '<', 'lib/My/Module.pm' or die $!;
while (<$fh>) {
return 1 if /our \$VERSION = '[\d.]+'/;
}
return 0;
}
sub check_changes {
open my $fh, '<', 'Changes' or die $!;
my $first_line = <$fh>;
return $first_line =~ /^[\d.]+\s+\d{4}-\d{2}-\d{2}/;
}
sub grep_debug_code {
my $found = 0;
system('grep -r "use Data::Dumper" lib/ --exclude-dir=.git') == 0 and $found++;
system('grep -r "print STDERR" lib/ --exclude-dir=.git') == 0 and $found++;
system('grep -r "# TODO" lib/ --exclude-dir=.git') == 0 and $found++;
return $found;
}
```
### Release Process
```perl
# Build distribution
system('perl Makefile.PL');
system('make');
system('make test');
system('make dist');
# This creates My-Module-0.01.tar.gz
# Upload to CPAN
system('cpan-upload My-Module-0.01.tar.gz');
# Or manually:
# 1. Log into https://pause.perl.org
# 2. Upload file through web interface
# 3. Wait for indexing (usually 2-3 hours)
# Monitor your module
# - https://metacpan.org/author/YOURNAME
# - http://www.cpantesters.org/distro/M/My-Module.html
# - https://rt.cpan.org/Dist/Display.html?Name=My-Module
```
## Module Best Practices
### Documentation
```perl
package My::Module;
# POD documentation should include:
=head1 NAME
My::Module - One-line description of module's purpose
=head1 VERSION
Version 0.01
=head1 SYNOPSIS
Quick summary of what the module does.
use My::Module;
my $foo = My::Module->new();
my $result = $foo->do_something();
=head1 DESCRIPTION
A full description of the module and its features.
=head1 METHODS
=head2 new
my $obj = My::Module->new(%options);
Constructor. Accepts the following options:
=over 4
=item * option1 - Description of option1
=item * option2 - Description of option2
=back
=head2 method_name
my $result = $obj->method_name($param1, $param2);
Description of what this method does.
=head1 EXAMPLES
# Example 1: Basic usage
my $obj = My::Module->new();
$obj->process($data);
# Example 2: Advanced usage
my $obj = My::Module->new(
verbose => 1,
timeout => 30,
);
=head1 DIAGNOSTICS
=over 4
=item C<< Error message here >>
Explanation of error message
=item C<< Another error message >>
Explanation of another error
=back
=head1 CONFIGURATION AND ENVIRONMENT
My::Module requires no configuration files or environment variables.
=head1 DEPENDENCIES
List of modules this module depends on.
=head1 INCOMPATIBILITIES
None reported.
=head1 BUGS AND LIMITATIONS
Please report any bugs or feature requests to...
=head1 SEE ALSO
Links to related modules or resources.
=head1 AUTHOR
Your Name C<< <email@example.com> >>
=head1 LICENSE AND COPYRIGHT
Copyright (C) 2024 by Your Name
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
=cut
```
### Versioning
```perl
# Semantic versioning (recommended)
our $VERSION = '1.2.3'; # MAJOR.MINOR.PATCH
# MAJOR - Incompatible API changes
# MINOR - Add functionality (backwards-compatible)
# PATCH - Bug fixes (backwards-compatible)
# Version declaration
package My::Module 0.01; # Perl 5.12+
# Or traditional
package My::Module;
our $VERSION = '0.01';
# Version checks
use My::Module 1.00; # Require at least version 1.00
# In code
if ($My::Module::VERSION < 2.00) {
# Use old API
} else {
# Use new API
}
```
## Real-World Module: API Client
```perl
package WebService::Example;
use Modern::Perl '2023';
use Moo;
use LWP::UserAgent;
use JSON::XS;
use URI::Escape;
use Carp;
our $VERSION = '0.01';
has 'api_key' => (
is => 'ro',
required => 1,
);
has 'base_url' => (
is => 'ro',
default => 'https://api.example.com/v1',
);
has 'timeout' => (
is => 'ro',
default => 30,
);
has 'ua' => (
is => 'lazy',
builder => '_build_ua',
);
sub _build_ua {
my $self = shift;
my $ua = LWP::UserAgent->new(
timeout => $self->timeout,
agent => "WebService::Example/$VERSION",
);
$ua->default_header('X-API-Key' => $self->api_key);
$ua->default_header('Accept' => 'application/json');
return $ua;
}
sub get_user {
my ($self, $user_id) = @_;
croak "User ID required" unless defined $user_id;
return $self->_request('GET', "/users/$user_id");
}
sub search_users {
my ($self, $params) = @_;
return $self->_request('GET', '/users', $params);
}
sub create_user {
my ($self, $data) = @_;
croak "User data required" unless $data && ref $data eq 'HASH';
return $self->_request('POST', '/users', $data);
}
sub update_user {
my ($self, $user_id, $data) = @_;
croak "User ID required" unless defined $user_id;
croak "Update data required" unless $data && ref $data eq 'HASH';
return $self->_request('PUT', "/users/$user_id", $data);
}
sub delete_user {
my ($self, $user_id) = @_;
croak "User ID required" unless defined $user_id;
return $self->_request('DELETE', "/users/$user_id");
}
sub _request {
my ($self, $method, $path, $data) = @_;
my $url = $self->base_url . $path;
# Add query params for GET
if ($method eq 'GET' && $data) {
my @params;
for my $key (keys %$data) {
push @params, uri_escape($key) . '=' . uri_escape($data->{$key});
}
$url .= '?' . join('&', @params) if @params;
}
my $req = HTTP::Request->new($method => $url);
# Add JSON body for POST/PUT
if ($method =~ /^(POST|PUT|PATCH)$/ && $data) {
$req->header('Content-Type' => 'application/json');
$req->content(encode_json($data));
}
my $response = $self->ua->request($req);
if ($response->is_success) {
my $content = $response->decoded_content;
return $content ? decode_json($content) : {};
} else {
my $error = $response->status_line;
if ($response->content) {
eval {
my $json = decode_json($response->content);
$error = $json->{error} || $json->{message} || $error;
};
}
croak "API request failed: $error";
}
}
1;
__END__
=head1 NAME
WebService::Example - Perl client for Example.com API
=head1 SYNOPSIS
use WebService::Example;
my $api = WebService::Example->new(
api_key => 'your-api-key',
);
# Get a user
my $user = $api->get_user(123);
# Search users
my $results = $api->search_users({
name => 'Alice',
status => 'active',
});
# Create a user
my $new_user = $api->create_user({
name => 'Bob',
email => 'bob@example.com',
});
=head1 DESCRIPTION
This module provides a Perl interface to the Example.com REST API.
=head1 METHODS
[Full documentation here...]
=head1 AUTHOR
Your Name <your.email@example.com>
=head1 COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by Your Name.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.
=cut
```
## CPAN Module Evaluation Checklist
When choosing a CPAN module:
1. **Check last update date** - Recently maintained?
2. **Review test results** - Pass on your platform?
3. **Examine dependencies** - Reasonable dependency tree?
4. **Read bug reports** - Active issues?
5. **Look at documentation** - Well documented?
6. **Check ratings/reviews** - What do others say?
7. **Review source code** - Clean and understandable?
8. **Test coverage** - Good test suite?
9. **License compatibility** - Fits your needs?
10. **Author reputation** - Other quality modules?
## Conclusion
CPAN is more than a module repository—it's the collective knowledge and effort of thousands of Perl programmers over decades. By understanding how to navigate, evaluate, use, and contribute to CPAN, you join a community that values sharing, testing, and continuous improvement.
Remember: before writing new code, check CPAN. Someone may have already solved your problem, tested it across dozens of platforms, and documented it thoroughly. And when you solve a new problem, consider sharing it back. That's how CPAN grew from a handful of modules to the treasure trove it is today.
---
*Next: Object-Oriented Perl. We'll explore Perl's flexible approach to OOP, from basic blessed references to modern frameworks like Moo and Moose.*

929
15-object-oriented-perl.md Normal file
View File

@@ -0,0 +1,929 @@
# Chapter 15: Object-Oriented Perl
> "Perl's OO is like duct tape - it might not be pretty, but it holds everything together and gets the job done." - Anonymous
Perl's approach to object-oriented programming is unique: it gives you the tools to build any OO system you want, rather than forcing one paradigm. From basic blessed references to sophisticated metaprogramming with Moose, this chapter covers the full spectrum of OOP in Perl, with a focus on practical, maintainable code.
## Classic Perl OOP
### Blessed References
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# Basic class definition
package Point {
# Constructor
sub new {
my ($class, $x, $y) = @_;
my $self = {
x => $x // 0,
y => $y // 0,
};
return bless $self, $class;
}
# Accessor methods
sub x {
my ($self, $value) = @_;
$self->{x} = $value if defined $value;
return $self->{x};
}
sub y {
my ($self, $value) = @_;
$self->{y} = $value if defined $value;
return $self->{y};
}
# Methods
sub move {
my ($self, $dx, $dy) = @_;
$self->{x} += $dx;
$self->{y} += $dy;
return $self;
}
sub distance_to {
my ($self, $other) = @_;
my $dx = $self->{x} - $other->{x};
my $dy = $self->{y} - $other->{y};
return sqrt($dx * $dx + $dy * $dy);
}
# String representation
sub to_string {
my $self = shift;
return "Point($self->{x}, $self->{y})";
}
# Destructor
sub DESTROY {
my $self = shift;
# Cleanup code here
}
}
# Usage
my $p1 = Point->new(3, 4);
my $p2 = Point->new(0, 0);
say $p1->to_string(); # Point(3, 4)
say "Distance: " . $p1->distance_to($p2); # Distance: 5
$p1->move(1, 1);
say $p1->to_string(); # Point(4, 5)
```
### Inheritance
```perl
package Shape {
sub new {
my ($class, %args) = @_;
return bless \%args, $class;
}
sub area {
die "Subclass must implement area()";
}
sub perimeter {
die "Subclass must implement perimeter()";
}
sub describe {
my $self = shift;
return ref($self) . " with area " . $self->area();
}
}
package Rectangle {
use parent 'Shape'; # Inheritance
sub new {
my ($class, $width, $height) = @_;
my $self = $class->SUPER::new(
width => $width,
height => $height,
);
return $self;
}
sub area {
my $self = shift;
return $self->{width} * $self->{height};
}
sub perimeter {
my $self = shift;
return 2 * ($self->{width} + $self->{height});
}
}
package Circle {
use parent 'Shape';
use constant PI => 3.14159;
sub new {
my ($class, $radius) = @_;
my $self = $class->SUPER::new(radius => $radius);
return $self;
}
sub area {
my $self = shift;
return PI * $self->{radius} ** 2;
}
sub perimeter {
my $self = shift;
return 2 * PI * $self->{radius};
}
}
# Polymorphism
my @shapes = (
Rectangle->new(5, 3),
Circle->new(4),
Rectangle->new(2, 8),
);
for my $shape (@shapes) {
say $shape->describe();
}
```
### Encapsulation with Closures
```perl
package Counter {
sub new {
my ($class, $initial) = @_;
$initial //= 0;
# Private variable in closure
my $count = $initial;
# Return blessed coderef with methods
return bless {
increment => sub { ++$count },
decrement => sub { --$count },
get => sub { $count },
reset => sub { $count = $initial },
}, $class;
}
# Method dispatch
sub AUTOLOAD {
my $self = shift;
my $method = our $AUTOLOAD;
$method =~ s/.*:://;
return if $method eq 'DESTROY';
if (exists $self->{$method}) {
return $self->{$method}->(@_);
}
die "Unknown method: $method";
}
}
# Usage
my $counter = Counter->new(10);
say $counter->get(); # 10
$counter->increment();
$counter->increment();
say $counter->get(); # 12
$counter->reset();
say $counter->get(); # 10
```
## Modern OOP with Moo
### Moo Basics
```perl
package Person {
use Moo;
use Types::Standard qw(Str Int);
# Attributes with type constraints
has name => (
is => 'ro', # read-only
isa => Str,
required => 1,
);
has age => (
is => 'rw', # read-write
isa => Int,
default => 0,
trigger => sub {
my ($self, $new_age) = @_;
warn "Age cannot be negative!" if $new_age < 0;
},
);
has email => (
is => 'rw',
isa => Str,
predicate => 'has_email', # Creates has_email() method
clearer => 'clear_email', # Creates clear_email() method
);
# Lazy attribute with builder
has id => (
is => 'ro',
lazy => 1,
builder => '_build_id',
);
sub _build_id {
my $self = shift;
return sprintf("%s_%d_%d", $self->name, $self->age, time());
}
# Method modifiers
before 'age' => sub {
my ($self, $new_age) = @_;
return unless defined $new_age;
say "Changing age from " . $self->age . " to $new_age";
};
# Regular methods
sub introduce {
my $self = shift;
my $intro = "Hi, I'm " . $self->name;
$intro .= ", I'm " . $self->age . " years old" if $self->age;
$intro .= ", email me at " . $self->email if $self->has_email;
return $intro;
}
}
# Usage
my $person = Person->new(
name => 'Alice',
age => 30,
email => 'alice@example.com',
);
say $person->introduce();
$person->age(31); # Triggers before modifier
say "Has email: " . ($person->has_email ? 'Yes' : 'No');
```
### Roles (Mixins)
```perl
# Define roles
package Role::Timestamped {
use Moo::Role;
use Time::Piece;
has created_at => (
is => 'ro',
default => sub { localtime->datetime },
);
has updated_at => (
is => 'rw',
default => sub { localtime->datetime },
);
before [qw(update save)] => sub {
my $self = shift;
$self->updated_at(localtime->datetime);
};
}
package Role::Serializable {
use Moo::Role;
use JSON::XS;
requires 'to_hash'; # Consumer must implement
sub to_json {
my $self = shift;
return encode_json($self->to_hash);
}
sub from_json {
my ($class, $json) = @_;
my $data = decode_json($json);
return $class->new(%$data);
}
}
# Use roles
package Document {
use Moo;
with 'Role::Timestamped', 'Role::Serializable';
has title => (is => 'rw', required => 1);
has content => (is => 'rw', default => '');
has author => (is => 'ro', required => 1);
sub to_hash {
my $self = shift;
return {
title => $self->title,
content => $self->content,
author => $self->author,
created_at => $self->created_at,
updated_at => $self->updated_at,
};
}
sub update {
my ($self, %changes) = @_;
$self->title($changes{title}) if exists $changes{title};
$self->content($changes{content}) if exists $changes{content};
}
}
# Usage
my $doc = Document->new(
title => 'My Document',
author => 'Bob',
);
$doc->update(content => 'Some content');
say $doc->to_json();
```
## Advanced OOP with Moose
### Moose Features
```perl
package Employee {
use Moose;
use Moose::Util::TypeConstraints;
# Custom types
subtype 'Email'
=> as 'Str'
=> where { /^[\w\.\-]+@[\w\.\-]+$/ }
=> message { "Invalid email address: $_" };
subtype 'PositiveInt'
=> as 'Int'
=> where { $_ > 0 };
# Attributes with advanced features
has name => (
is => 'rw',
isa => 'Str',
required => 1,
documentation => 'Employee full name',
);
has email => (
is => 'rw',
isa => 'Email',
required => 1,
);
has salary => (
is => 'rw',
isa => 'PositiveInt',
traits => ['Counter'],
handles => {
increase_salary => 'inc',
decrease_salary => 'dec',
},
);
has department => (
is => 'rw',
isa => 'Department',
weak_ref => 1, # Prevent circular references
);
has skills => (
is => 'rw',
isa => 'ArrayRef[Str]',
default => sub { [] },
traits => ['Array'],
handles => {
add_skill => 'push',
has_skills => 'count',
list_skills => 'elements',
find_skill => 'first',
},
);
# Method modifiers
around 'salary' => sub {
my ($orig, $self, $new_salary) = @_;
if (defined $new_salary) {
my $old_salary = $self->$orig();
$self->log_salary_change($old_salary, $new_salary);
}
return $self->$orig($new_salary);
};
# BUILD is called after object construction
sub BUILD {
my $self = shift;
$self->register_employee();
}
# DEMOLISH is called before object destruction
sub DEMOLISH {
my $self = shift;
$self->unregister_employee();
}
# Make immutable for performance
__PACKAGE__->meta->make_immutable;
}
package Department {
use Moose;
has name => (is => 'ro', isa => 'Str', required => 1);
has employees => (
is => 'rw',
isa => 'ArrayRef[Employee]',
default => sub { [] },
traits => ['Array'],
handles => {
add_employee => 'push',
employee_count => 'count',
all_employees => 'elements',
},
);
__PACKAGE__->meta->make_immutable;
}
```
### Metaprogramming
```perl
package DynamicClass {
use Moose;
use Moose::Meta::Attribute;
# Add attributes dynamically
sub add_attribute {
my ($self, $name, %options) = @_;
$self->meta->add_attribute(
$name => (
is => 'rw',
%options,
)
);
}
# Add methods dynamically
sub add_method {
my ($self, $name, $code) = @_;
$self->meta->add_method($name => $code);
}
# Introspection
sub describe {
my $self = shift;
my $meta = $self->meta;
say "Class: " . $meta->name;
say "Attributes:";
for my $attr ($meta->get_all_attributes) {
say " - " . $attr->name . " (" . $attr->type_constraint . ")";
}
say "Methods:";
for my $method ($meta->get_all_methods) {
say " - " . $method->name;
}
}
}
# Runtime class modification
my $obj = DynamicClass->new();
$obj->add_attribute('color', isa => 'Str', default => 'blue');
$obj->add_method('greet', sub {
my $self = shift;
return "Hello, I'm a " . $self->color . " object";
});
$obj->color('red');
say $obj->greet(); # Hello, I'm a red object
$obj->describe();
```
## Design Patterns in Perl
### Singleton Pattern
```perl
package Singleton {
use Moo;
my $instance;
sub instance {
my $class = shift;
$instance //= $class->new(@_);
return $instance;
}
sub new {
my $class = shift;
die "Use $class->instance() instead" if $instance;
return bless {@_}, $class;
}
}
# Usage
my $s1 = Singleton->instance();
my $s2 = Singleton->instance();
say $s1 == $s2 ? "Same instance" : "Different instances"; # Same instance
```
### Factory Pattern
```perl
package AnimalFactory {
use Module::Runtime qw(use_module);
my %animal_types = (
dog => 'Animal::Dog',
cat => 'Animal::Cat',
bird => 'Animal::Bird',
);
sub create_animal {
my ($class, $type, %args) = @_;
my $animal_class = $animal_types{$type}
or die "Unknown animal type: $type";
use_module($animal_class);
return $animal_class->new(%args);
}
}
package Animal {
use Moo;
has name => (is => 'ro', required => 1);
sub speak { die "Subclass must implement" }
}
package Animal::Dog {
use Moo;
extends 'Animal';
sub speak { "Woof!" }
}
package Animal::Cat {
use Moo;
extends 'Animal';
sub speak { "Meow!" }
}
# Usage
my $dog = AnimalFactory->create_animal('dog', name => 'Rex');
say $dog->speak(); # Woof!
```
### Observer Pattern
```perl
package Observable {
use Moo::Role;
has observers => (
is => 'ro',
default => sub { [] },
);
sub attach {
my ($self, $observer) = @_;
push @{$self->observers}, $observer;
}
sub detach {
my ($self, $observer) = @_;
@{$self->observers} = grep { $_ != $observer } @{$self->observers};
}
sub notify {
my ($self, $event, @args) = @_;
$_->update($self, $event, @args) for @{$self->observers};
}
}
package StockPrice {
use Moo;
with 'Observable';
has symbol => (is => 'ro', required => 1);
has price => (is => 'rw', trigger => sub {
my ($self, $new_price) = @_;
$self->notify('price_changed', $new_price);
});
}
package StockDisplay {
use Moo;
sub update {
my ($self, $subject, $event, @args) = @_;
if ($event eq 'price_changed') {
say "Stock " . $subject->symbol . " changed to $args[0]";
}
}
}
# Usage
my $stock = StockPrice->new(symbol => 'AAPL');
my $display = StockDisplay->new();
$stock->attach($display);
$stock->price(150.25); # Stock AAPL changed to 150.25
```
## Real-World OOP Example: Task Queue System
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
package TaskQueue::Task {
use Moo;
use Types::Standard qw(Str Enum Int CodeRef HashRef);
has id => (
is => 'ro',
default => sub {
state $counter = 0;
return ++$counter;
},
);
has name => (
is => 'ro',
isa => Str,
required => 1,
);
has status => (
is => 'rw',
isa => Enum[qw(pending running completed failed)],
default => 'pending',
);
has priority => (
is => 'ro',
isa => Int,
default => 0,
);
has handler => (
is => 'ro',
isa => CodeRef,
required => 1,
);
has data => (
is => 'ro',
isa => HashRef,
default => sub { {} },
);
has result => (
is => 'rw',
);
has error => (
is => 'rw',
isa => Str,
);
sub execute {
my $self = shift;
$self->status('running');
eval {
$self->result($self->handler->($self->data));
$self->status('completed');
};
if ($@) {
$self->error($@);
$self->status('failed');
return 0;
}
return 1;
}
}
package TaskQueue::Queue {
use Moo;
use Types::Standard qw(ArrayRef InstanceOf);
has tasks => (
is => 'ro',
isa => ArrayRef[InstanceOf['TaskQueue::Task']],
default => sub { [] },
);
has running => (
is => 'rw',
isa => ArrayRef[InstanceOf['TaskQueue::Task']],
default => sub { [] },
);
has max_concurrent => (
is => 'ro',
default => 1,
);
sub add_task {
my ($self, $task) = @_;
push @{$self->tasks}, $task;
$self->_sort_tasks();
}
sub _sort_tasks {
my $self = shift;
@{$self->tasks} = sort {
$b->priority <=> $a->priority
} @{$self->tasks};
}
sub get_next_task {
my $self = shift;
return shift @{$self->tasks};
}
sub process {
my $self = shift;
while (@{$self->tasks} || @{$self->running}) {
# Start new tasks if under limit
while (@{$self->tasks} &&
@{$self->running} < $self->max_concurrent) {
my $task = $self->get_next_task();
push @{$self->running}, $task;
say "Starting task: " . $task->name;
$task->execute();
}
# Clean up completed tasks
@{$self->running} = grep {
$_->status eq 'running'
} @{$self->running};
sleep(0.1) if @{$self->running};
}
}
sub stats {
my $self = shift;
my %stats = (
pending => 0,
running => 0,
completed => 0,
failed => 0,
);
for my $task (@{$self->tasks}, @{$self->running}) {
$stats{$task->status}++;
}
return \%stats;
}
}
package TaskQueue::Worker {
use Moo;
use threads;
use Thread::Queue;
has queue => (
is => 'ro',
isa => InstanceOf['Thread::Queue'],
default => sub { Thread::Queue->new() },
);
has workers => (
is => 'ro',
default => sub { [] },
);
has num_workers => (
is => 'ro',
default => 4,
);
sub start {
my $self = shift;
for (1..$self->num_workers) {
my $thread = threads->create(sub {
$self->worker_loop();
});
push @{$self->workers}, $thread;
}
}
sub worker_loop {
my $self = shift;
while (my $task = $self->queue->dequeue()) {
last if $task eq 'STOP';
$task->execute();
}
}
sub stop {
my $self = shift;
$self->queue->enqueue('STOP') for @{$self->workers};
$_->join() for @{$self->workers};
}
sub add_task {
my ($self, $task) = @_;
$self->queue->enqueue($task);
}
}
# Usage
package main;
my $queue = TaskQueue::Queue->new(max_concurrent => 3);
# Add tasks
$queue->add_task(TaskQueue::Task->new(
name => 'High Priority Task',
priority => 10,
handler => sub {
my $data = shift;
sleep(2);
return "Processed: $data->{input}";
},
data => { input => 'important data' },
));
$queue->add_task(TaskQueue::Task->new(
name => 'Low Priority Task',
priority => 1,
handler => sub {
sleep(1);
return "Done";
},
));
# Process queue
$queue->process();
# Check results
for my $task (@{$queue->tasks}) {
if ($task->status eq 'completed') {
say $task->name . ": " . $task->result;
} elsif ($task->status eq 'failed') {
say $task->name . " failed: " . $task->error;
}
}
```
## Best Practices
1. **Use modern OOP modules** - Moo for lightweight, Moose for features
2. **Favor composition over inheritance** - Use roles/traits
3. **Make attributes read-only when possible** - Immutability helps
4. **Use type constraints** - Catch errors early
5. **Document your classes** - Especially public interfaces
6. **Write tests for your classes** - OOP code needs good tests
7. **Use builders for complex objects** - Separate construction logic
8. **Keep classes focused** - Single Responsibility Principle
9. **Use lazy attributes wisely** - For expensive computations
10. **Make classes immutable in production** - Better performance
## Conclusion
Perl's object-oriented programming is incredibly flexible, from simple blessed references to sophisticated metaprogramming. While the syntax might seem unusual compared to other languages, the power and flexibility it provides is unmatched. Modern tools like Moo and Moose make OOP in Perl both powerful and pleasant.
Remember: choose the right level of abstraction for your problem. Not every script needs Moose, but when you're building large applications, its features can make your code more maintainable and robust.
---
*Next: Testing and debugging. We'll explore Perl's excellent testing culture and tools for finding and fixing bugs.*

801
16-testing-and-debugging.md Normal file
View File

@@ -0,0 +1,801 @@
# Chapter 16: Testing and Debugging
> "Testing can show the presence of bugs, but never their absence. Debugging is twice as hard as writing code. Therefore, if you write code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian Kernighan (paraphrased)
Perl has one of the strongest testing cultures in programming. Every CPAN module comes with tests, and the Perl community takes testing seriously. This chapter covers everything from basic unit tests to advanced debugging techniques, helping you write code that works and stays working.
## Testing Fundamentals
### Basic Testing with Test::More
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More tests => 10; # Declare number of tests
# Basic assertions
ok(1 + 1 == 2, 'Basic math works');
ok(-f 'script.pl', 'Script file exists');
# Equality tests
is(2 + 2, 4, 'Addition works');
is(uc('hello'), 'HELLO', 'uc() works correctly');
# String comparison
isnt('foo', 'bar', 'foo is not bar');
# Pattern matching
like('Hello World', qr/World/, 'String contains World');
unlike('Hello World', qr/Goodbye/, 'String does not contain Goodbye');
# Comparison operators
cmp_ok(5, '>', 3, '5 is greater than 3');
cmp_ok('abc', 'lt', 'def', 'String comparison works');
# Complex data structures
is_deeply(
[1, 2, 3],
[1, 2, 3],
'Arrays match'
);
# Or use done_testing() for dynamic test counts
use Test::More;
ok(1, 'Test 1');
ok(1, 'Test 2');
done_testing(); # Automatically counts tests
```
### Testing Modules
```perl
# t/00-load.t
use Test::More tests => 3;
BEGIN {
use_ok('My::Module');
use_ok('My::Module::Helper');
use_ok('My::Module::Utils');
}
diag("Testing My::Module $My::Module::VERSION, Perl $], $^X");
# t/01-basic.t
use Test::More;
use My::Module;
# Test object creation
my $obj = My::Module->new(name => 'test');
isa_ok($obj, 'My::Module', 'Object created correctly');
# Test methods
can_ok($obj, qw(process validate save load));
# Test basic functionality
is($obj->name, 'test', 'Name accessor works');
$obj->process('input');
is($obj->status, 'processed', 'Processing sets correct status');
done_testing();
```
### Test-Driven Development (TDD)
```perl
# Write tests first
use Test::More;
use Calculator;
my $calc = Calculator->new();
isa_ok($calc, 'Calculator');
# Test addition
is($calc->add(2, 3), 5, '2 + 3 = 5');
is($calc->add(-1, 1), 0, '-1 + 1 = 0');
is($calc->add(0, 0), 0, '0 + 0 = 0');
# Test subtraction
is($calc->subtract(5, 3), 2, '5 - 3 = 2');
is($calc->subtract(0, 5), -5, '0 - 5 = -5');
# Test division
is($calc->divide(10, 2), 5, '10 / 2 = 5');
# Test error handling
eval { $calc->divide(10, 0) };
like($@, qr/Division by zero/, 'Division by zero throws error');
done_testing();
# Then implement the module
package Calculator;
use Carp;
sub new {
my $class = shift;
return bless {}, $class;
}
sub add {
my ($self, $a, $b) = @_;
return $a + $b;
}
sub subtract {
my ($self, $a, $b) = @_;
return $a - $b;
}
sub divide {
my ($self, $a, $b) = @_;
croak "Division by zero" if $b == 0;
return $a / $b;
}
1;
```
## Advanced Testing
### Test::Deep for Complex Structures
```perl
use Test::More;
use Test::Deep;
my $got = {
name => 'Alice',
age => 30,
skills => ['perl', 'python', 'ruby'],
address => {
street => '123 Main St',
city => 'New York',
zip => '10001',
},
metadata => {
created => '2024-01-15T10:30:00',
modified => '2024-01-16T14:45:00',
},
};
cmp_deeply($got, {
name => 'Alice',
age => code(sub { $_[0] >= 18 && $_[0] <= 100 }),
skills => bag('perl', 'python', 'ruby'), # Order doesn't matter
address => superhashof({
city => 'New York', # Must have city, other keys optional
}),
metadata => {
created => re(qr/^\d{4}-\d{2}-\d{2}/),
modified => ignore(), # Don't care about this value
},
}, 'Structure matches expectations');
# Array testing
my @data = (1, 2, 3, { id => 'abc123' }, undef);
cmp_deeply(\@data, [
1, 2, 3,
{ id => re(qr/^[a-z]+\d+$/) },
undef,
], 'Array with mixed types');
# Set comparison
cmp_deeply(
[1, 2, 3, 3, 2, 1],
set(1, 2, 3),
'Contains exactly these values (duplicates ignored)'
);
```
### Mocking and Test Doubles
```perl
use Test::More;
use Test::MockModule;
use Test::MockObject;
# Mock a module
my $mock = Test::MockModule->new('LWP::UserAgent');
$mock->mock('get', sub {
my ($self, $url) = @_;
# Return fake response
my $response = Test::MockObject->new();
$response->set_true('is_success');
$response->set_always('content', '{"status":"ok"}');
return $response;
});
# Now LWP::UserAgent->get returns our mock
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $resp = $ua->get('http://example.com');
ok($resp->is_success, 'Mock response is successful');
is($resp->content, '{"status":"ok"}', 'Mock content correct');
# Mock database handle
my $mock_dbh = Test::MockObject->new();
$mock_dbh->set_always('prepare', $mock_dbh);
$mock_dbh->set_always('execute', 1);
$mock_dbh->set_series('fetchrow_array',
['Alice', 30],
['Bob', 25],
undef, # End of results
);
# Test code that uses database
my $sth = $mock_dbh->prepare("SELECT name, age FROM users");
$sth->execute();
my @users;
while (my @row = $sth->fetchrow_array()) {
push @users, { name => $row[0], age => $row[1] };
}
is(scalar @users, 2, 'Got two users');
is($users[0]{name}, 'Alice', 'First user is Alice');
```
### Testing Exceptions
```perl
use Test::More;
use Test::Exception;
use Test::Fatal;
# Test::Exception
throws_ok {
die "Something went wrong";
} qr/went wrong/, 'Dies with expected message';
dies_ok {
risky_operation();
} 'risky_operation dies';
lives_ok {
safe_operation();
} 'safe_operation lives';
# Test::Fatal (more modern)
use Test::Fatal;
my $exception = exception {
die "Oops!";
};
like($exception, qr/Oops/, 'Got expected exception');
# Test specific exception classes
{
package MyException;
use overload '""' => sub { shift->{message} };
sub new {
my ($class, $message) = @_;
return bless { message => $message }, $class;
}
}
my $error = exception {
die MyException->new("Custom error");
};
isa_ok($error, 'MyException');
is($error->{message}, 'Custom error', 'Exception has correct message');
```
## Debugging Techniques
### Basic Debugging
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
# Print debugging
my $data = { foo => 'bar', baz => [1, 2, 3] };
print "Debug: ", Dumper($data);
# Conditional debugging
my $DEBUG = $ENV{DEBUG} || 0;
print "Processing data...\n" if $DEBUG;
# Better: use a debug function
sub debug {
return unless $ENV{DEBUG};
my $msg = shift;
my ($package, $filename, $line) = caller;
print STDERR "[$package:$line] $msg\n";
}
debug("Starting process");
# Smart::Comments for easy debugging
use Smart::Comments;
### $data
my $result = complex_calculation();
### Result: $result
# Assertions
### check: $result > 0
### assert: defined $data->{id}
# Progress bars
for my $i (0..100) { ### Processing [=== ] % done
# Do work
sleep(0.01);
}
```
### The Perl Debugger
```perl
# Run with debugger
# perl -d script.pl
# Common debugger commands:
# h - help
# l - list code
# n - next line (step over)
# s - step into
# c - continue
# b 42 - set breakpoint at line 42
# b subname - set breakpoint at subroutine
# p $var - print variable
# x $ref - dump reference
# w - where (stack trace)
# q - quit
# Add breakpoint in code
$DB::single = 1; # Debugger stops here
# Conditional breakpoint
$DB::single = 1 if $count > 100;
# Interactive debugging session example
sub process_data {
my ($data) = @_;
$DB::single = 1; # Stop here in debugger
for my $item (@$data) {
my $result = transform($item);
validate($result);
}
}
```
### Devel::NYTProf for Profiling
```perl
# Profile your code
# perl -d:NYTProf script.pl
# nytprofhtml
# open nytprof/index.html
# Or programmatically
use Devel::NYTProf;
DB::enable_profile();
# Code to profile
expensive_operation();
DB::disable_profile();
DB::finish_profile();
# Analyze results
system('nytprofhtml');
system('open nytprof/index.html');
# Example of code that needs profiling
sub slow_function {
my @results;
for my $i (1..10000) {
# Inefficient string concatenation
my $str = "";
for my $j (1..100) {
$str .= "x"; # Bad: creates new string each time
}
push @results, $str;
}
return \@results;
}
# Better version after profiling
sub fast_function {
my @results;
for my $i (1..10000) {
# Efficient: use x operator
push @results, "x" x 100;
}
return \@results;
}
```
## Test Organization
### Test Suite Structure
```perl
# Project structure
# MyApp/
# ├── lib/
# │ └── MyApp/
# │ ├── Model.pm
# │ ├── View.pm
# │ └── Controller.pm
# ├── t/
# │ ├── 00-load.t
# │ ├── 01-unit/
# │ │ ├── model.t
# │ │ ├── view.t
# │ │ └── controller.t
# │ ├── 02-integration/
# │ │ └── api.t
# │ ├── 03-acceptance/
# │ │ └── user_stories.t
# │ └── lib/
# │ └── Test/
# │ └── MyApp.pm
# t/lib/Test/MyApp.pm - Shared test utilities
package Test::MyApp;
use strict;
use warnings;
use Test::More;
use File::Temp;
use base 'Exporter';
our @EXPORT = qw(
create_test_db
create_test_user
cleanup_test_data
);
sub create_test_db {
my $tmpdir = File::Temp->newdir();
my $dbfile = "$tmpdir/test.db";
# Create and populate test database
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile");
$dbh->do($_) for read_schema();
return $dbh;
}
sub create_test_user {
my %args = @_;
return {
id => $args{id} // 1,
name => $args{name} // 'Test User',
email => $args{email} // 'test@example.com',
};
}
sub cleanup_test_data {
# Clean up any test files, databases, etc.
}
1;
```
### Test Coverage
```perl
# Check test coverage
# cover -test
# Or manually:
# perl -MDevel::Cover script.pl
# cover
# Configure coverage
# .coverrc file
coverage_class = Devel::Cover
db = cover_db
ignore = t/
/usr/
select = lib/
outputdir = coverage_report
# Add coverage badge to README
# cpanm Devel::Cover::Report::Coveralls
# cover -report coveralls
# Example of improving coverage
package MyModule;
sub process {
my ($self, $input) = @_;
# Branch 1
if (!defined $input) {
return undef;
}
# Branch 2
if ($input eq '') {
return '';
}
# Branch 3
if ($input =~ /^\d+$/) {
return $input * 2;
}
# Branch 4
return uc($input);
}
# Test file ensuring 100% coverage
use Test::More;
use MyModule;
my $obj = MyModule->new();
# Test all branches
is($obj->process(undef), undef, 'Handles undef');
is($obj->process(''), '', 'Handles empty string');
is($obj->process('42'), 84, 'Handles numbers');
is($obj->process('hello'), 'HELLO', 'Handles strings');
done_testing();
```
## Continuous Integration Testing
### GitHub Actions Example
```yaml
# .github/workflows/test.yml
name: Test Suite
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
perl-version: ['5.32', '5.34', '5.36']
steps:
- uses: actions/checkout@v2
- name: Setup Perl
uses: shogo82148/actions-setup-perl@v1
with:
perl-version: ${{ matrix.perl-version }}
- name: Install dependencies
run: |
cpanm --installdeps --notest .
cpanm Test::More Test::Deep Test::Exception
- name: Run tests
run: prove -lv t/
- name: Check coverage
run: |
cpanm Devel::Cover
cover -test -report coveralls
```
### Test Automation Script
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use TAP::Harness;
use File::Find;
use Getopt::Long;
my ($verbose, $coverage, $parallel) = (0, 0, 1);
GetOptions(
'verbose|v' => \$verbose,
'coverage|c' => \$coverage,
'jobs|j=i' => \$parallel,
);
# Find all test files
my @tests;
find(sub {
push @tests, $File::Find::name if /\.t$/;
}, 't/');
@tests = sort @tests;
# Run with coverage if requested
if ($coverage) {
$ENV{HARNESS_PERL_SWITCHES} = '-MDevel::Cover';
}
# Run tests
my $harness = TAP::Harness->new({
verbosity => $verbose,
jobs => $parallel,
color => 1,
lib => ['lib'],
});
my $aggregator = $harness->runtests(@tests);
# Generate coverage report
if ($coverage) {
system('cover');
say "\nCoverage report generated in cover_db/";
say "Run 'cover -report html' for HTML report";
}
# Exit with appropriate status
exit($aggregator->all_passed() ? 0 : 1);
```
## Real-World Testing Example
```perl
#!/usr/bin/env perl
# t/web_scraper.t
use Modern::Perl '2023';
use Test::More;
use Test::MockModule;
use Test::Deep;
use Test::Exception;
# Module we're testing
use lib 'lib';
use WebScraper;
# Mock HTTP responses
my $mock_ua = Test::MockModule->new('LWP::UserAgent');
my @responses;
$mock_ua->mock('get', sub {
my ($self, $url) = @_;
my $response = shift @responses;
unless ($response) {
my $mock = Test::MockObject->new();
$mock->set_false('is_success');
$mock->set_always('status_line', '404 Not Found');
return $mock;
}
return $response;
});
# Test basic scraping
subtest 'Basic scraping' => sub {
my $html = <<'HTML';
<html>
<head><title>Test Page</title></head>
<body>
<h1>Welcome</h1>
<ul class="items">
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
</body>
</html>
HTML
my $mock_response = Test::MockObject->new();
$mock_response->set_true('is_success');
$mock_response->set_always('decoded_content', $html);
@responses = ($mock_response);
my $scraper = WebScraper->new(url => 'http://example.com');
my $data = $scraper->scrape();
is($data->{title}, 'Test Page', 'Extracted title');
is($data->{heading}, 'Welcome', 'Extracted heading');
cmp_deeply($data->{items}, ['Item 1', 'Item 2', 'Item 3'], 'Extracted items');
};
# Test error handling
subtest 'Error handling' => sub {
@responses = (); # No responses = 404
my $scraper = WebScraper->new(url => 'http://example.com');
throws_ok {
$scraper->scrape();
} qr/Failed to fetch/, 'Throws on HTTP error';
};
# Test retry logic
subtest 'Retry logic' => sub {
my $fail_response = Test::MockObject->new();
$fail_response->set_false('is_success');
$fail_response->set_always('status_line', '500 Server Error');
my $success_response = Test::MockObject->new();
$success_response->set_true('is_success');
$success_response->set_always('decoded_content', '<html><title>OK</title></html>');
@responses = ($fail_response, $fail_response, $success_response);
my $scraper = WebScraper->new(
url => 'http://example.com',
max_retries => 3,
);
my $data;
lives_ok {
$data = $scraper->scrape();
} 'Succeeds after retries';
is($data->{title}, 'OK', 'Got data after retries');
};
# Test rate limiting
subtest 'Rate limiting' => sub {
my $scraper = WebScraper->new(
url => 'http://example.com',
rate_limit => 2, # 2 requests per second
);
# Mock successful responses
for (1..5) {
my $mock = Test::MockObject->new();
$mock->set_true('is_success');
$mock->set_always('decoded_content', '<html></html>');
push @responses, $mock;
}
my $start = time();
for (1..5) {
$scraper->scrape();
}
my $elapsed = time() - $start;
cmp_ok($elapsed, '>=', 2, 'Rate limiting enforced');
};
done_testing();
```
## Best Practices
1. **Write tests first** - TDD helps design better APIs
2. **Test at multiple levels** - Unit, integration, acceptance
3. **Keep tests fast** - Mock external dependencies
4. **Test edge cases** - Empty input, undef, large data
5. **Use descriptive test names** - They document behavior
6. **Maintain test data** - Use fixtures and factories
7. **Run tests frequently** - Before every commit
8. **Measure coverage** - But don't obsess over 100%
9. **Test error conditions** - Not just happy paths
10. **Keep tests maintainable** - Refactor test code too
## Conclusion
Testing and debugging are essential skills for any Perl programmer. Perl's testing ecosystem is mature and comprehensive, providing tools for every testing need. Good tests give you confidence to refactor, upgrade, and maintain your code over time.
Remember: untested code is broken code. It might work today, but without tests, you can't be sure it will work tomorrow.
---
*Next: Performance and optimization. We'll explore how to make your Perl code run faster and use less memory.*

View File

@@ -0,0 +1,780 @@
# Chapter 17: Performance and Optimization
> "Premature optimization is the root of all evil, but that doesn't mean we should write slow code on purpose." - Modern Perl Wisdom
Performance matters when you're processing gigabytes of logs, handling thousands of requests, or running scripts hundreds of times per day. This chapter shows you how to identify bottlenecks, optimize critical paths, and make your Perl code fly. We'll cover profiling, benchmarking, and proven optimization techniques that actually make a difference.
## Profiling: Finding the Bottlenecks
### Using Devel::NYTProf
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
# Run with profiling:
# perl -d:NYTProf script.pl
# nytprofhtml
# open nytprof/index.html
# Example slow code to profile
sub process_data {
my ($data) = @_;
my @results;
for my $item (@$data) {
# Inefficient: regex compilation in loop
if ($item =~ /$ENV{PATTERN}/) {
push @results, transform($item);
}
}
return \@results;
}
sub transform {
my ($item) = @_;
# Inefficient: repeated string concatenation
my $result = "";
for my $char (split //, $item) {
$result .= process_char($char);
}
return $result;
}
sub process_char {
my ($char) = @_;
# Simulate expensive operation
return uc($char);
}
# Profile programmatically
use Devel::NYTProf::Run;
Devel::NYTProf::Run::enable_profile();
# Code to profile
my @data = map { "test_string_$_" } 1..10000;
my $results = process_data(\@data);
Devel::NYTProf::Run::disable_profile();
Devel::NYTProf::Run::finish_profile();
```
### Benchmarking with Benchmark
```perl
use Benchmark qw(cmpthese timethese);
use Benchmark ':hireswallclock'; # High-resolution timing
# Compare different approaches
cmpthese(-3, { # Run for 3 seconds
'method1' => sub {
my $str = "";
$str .= "x" for 1..1000;
},
'method2' => sub {
my $str = "x" x 1000;
},
'method3' => sub {
my @parts = ("x") x 1000;
my $str = join '', @parts;
},
});
# Output:
# Rate method1 method3 method2
# method1 523/s -- -48% -97%
# method3 1010/s 93% -- -94%
# method2 17544/s 3255% 1637% --
# Detailed timing
my $results = timethese(10000, {
'regex' => sub {
my $text = "Hello World";
$text =~ s/World/Perl/;
},
'substr' => sub {
my $text = "Hello World";
substr($text, 6, 5, 'Perl');
},
});
# Compare specific operations
use Benchmark qw(timeit timestr);
my $t = timeit(1000000, sub {
my @array = 1..100;
my $sum = 0;
$sum += $_ for @array;
});
say "Time: " . timestr($t);
```
## Common Performance Optimizations
### String Operations
```perl
# SLOW: String concatenation in loop
sub slow_concat {
my $result = "";
for my $item (@_) {
$result .= process($item);
$result .= "\n";
}
return $result;
}
# FAST: Join array
sub fast_concat {
my @results;
for my $item (@_) {
push @results, process($item);
}
return join("\n", @results);
}
# FASTER: Map
sub faster_concat {
return join("\n", map { process($_) } @_);
}
# SLOW: Character-by-character processing
sub slow_reverse {
my ($str) = @_;
my $result = "";
for my $char (split //, $str) {
$result = $char . $result;
}
return $result;
}
# FAST: Built-in reverse
sub fast_reverse {
my ($str) = @_;
return scalar reverse $str;
}
```
### Regular Expression Optimization
```perl
# SLOW: Regex compilation in loop
sub slow_matching {
my ($data, $pattern) = @_;
my @matches;
for my $item (@$data) {
push @matches, $item if $item =~ /$pattern/; # Compiles each time
}
return \@matches;
}
# FAST: Pre-compiled regex
sub fast_matching {
my ($data, $pattern) = @_;
my $regex = qr/$pattern/; # Compile once
my @matches;
for my $item (@$data) {
push @matches, $item if $item =~ $regex;
}
return \@matches;
}
# FASTER: Grep
sub faster_matching {
my ($data, $pattern) = @_;
my $regex = qr/$pattern/;
return [ grep /$regex/, @$data ];
}
# Optimize complex regexes
my $email_regex = qr/
\A # Start of string
[\w\.\-]+ # Local part
\@ # At sign
[\w\-]+ # Domain name
(?:\.[\w\-]+)+ # Domain extensions
\z # End of string
/x; # /x for readability
# Use atomic groups to prevent backtracking
my $no_backtrack = qr/
\A
(?> \d+ ) # Atomic group - no backtracking
\.
(?> \d+ )
\z
/x;
```
### Data Structure Optimization
```perl
# SLOW: Repeated hash lookups
sub slow_processing {
my ($data) = @_;
my %lookup;
for my $item (@$data) {
if (exists $lookup{$item->{id}}) {
$lookup{$item->{id}}{count}++;
$lookup{$item->{id}}{total} += $item->{value};
} else {
$lookup{$item->{id}} = {
count => 1,
total => $item->{value},
};
}
}
return \%lookup;
}
# FAST: Single lookup with reference
sub fast_processing {
my ($data) = @_;
my %lookup;
for my $item (@$data) {
my $entry = $lookup{$item->{id}} //= { count => 0, total => 0 };
$entry->{count}++;
$entry->{total} += $item->{value};
}
return \%lookup;
}
# Use arrays instead of hashes when possible
# SLOW: Hash for fixed set of keys
sub slow_record {
return {
id => $_[0],
name => $_[1],
age => $_[2],
email => $_[3],
};
}
# FAST: Array with constants for indices
use constant {
ID => 0,
NAME => 1,
AGE => 2,
EMAIL => 3,
};
sub fast_record {
return [@_];
}
# Access: $record->[NAME] instead of $record->{name}
```
## Memory Optimization
### Reducing Memory Usage
```perl
# MEMORY INTENSIVE: Slurping large files
sub memory_intensive {
open my $fh, '<', 'huge_file.txt' or die $!;
my @lines = <$fh>; # Loads entire file into memory
close $fh;
for my $line (@lines) {
process_line($line);
}
}
# MEMORY EFFICIENT: Line-by-line processing
sub memory_efficient {
open my $fh, '<', 'huge_file.txt' or die $!;
while (my $line = <$fh>) {
process_line($line);
}
close $fh;
}
# MEMORY INTENSIVE: Building large structures
sub build_large_hash {
my %data;
for my $i (1..1_000_000) {
$data{$i} = {
id => $i,
value => rand(),
timestamp => time(),
metadata => { foo => 'bar' },
};
}
return \%data;
}
# MEMORY EFFICIENT: Using packed data
sub build_packed_data {
my $packed = "";
for my $i (1..1_000_000) {
# Pack: unsigned int, double, unsigned int
$packed .= pack("NdN", $i, rand(), time());
}
return \$packed;
}
# Retrieve packed data
sub get_packed_record {
my ($packed_ref, $index) = @_;
my $offset = $index * 16; # Each record is 16 bytes
my ($id, $value, $timestamp) = unpack("NdN",
substr($$packed_ref, $offset, 16)
);
return { id => $id, value => $value, timestamp => $timestamp };
}
```
### Circular References and Memory Leaks
```perl
# MEMORY LEAK: Circular reference
sub create_leak {
my $node1 = { name => 'Node1' };
my $node2 = { name => 'Node2' };
$node1->{next} = $node2;
$node2->{prev} = $node1; # Circular reference!
# Memory not freed when variables go out of scope
}
# SOLUTION 1: Weak references
use Scalar::Util qw(weaken);
sub no_leak_weak {
my $node1 = { name => 'Node1' };
my $node2 = { name => 'Node2' };
$node1->{next} = $node2;
$node2->{prev} = $node1;
weaken($node2->{prev}); # Make it a weak reference
# Memory properly freed
}
# SOLUTION 2: Explicit cleanup
sub no_leak_cleanup {
my $node1 = { name => 'Node1' };
my $node2 = { name => 'Node2' };
$node1->{next} = $node2;
$node2->{prev} = $node1;
# Clean up before scope exit
delete $node2->{prev};
}
# Detect leaks
use Devel::Leak;
my $handle;
my $count = Devel::Leak::NoteSV($handle);
# Code that might leak
create_leak() for 1..100;
my $new_count = Devel::Leak::CheckSV($handle);
say "Leaked " . ($new_count - $count) . " SVs";
```
## Algorithm Optimization
### Choosing Better Algorithms
```perl
# O(n²) - SLOW for large datasets
sub find_duplicates_slow {
my ($array) = @_;
my @duplicates;
for (my $i = 0; $i < @$array; $i++) {
for (my $j = $i + 1; $j < @$array; $j++) {
if ($array->[$i] eq $array->[$j]) {
push @duplicates, $array->[$i];
last;
}
}
}
return \@duplicates;
}
# O(n) - FAST using hash
sub find_duplicates_fast {
my ($array) = @_;
my (%seen, @duplicates);
for my $item (@$array) {
push @duplicates, $item if $seen{$item}++;
}
return \@duplicates;
}
# Caching/Memoization
use Memoize;
sub expensive_calculation {
my ($n) = @_;
sleep(1); # Simulate expensive operation
return $n * $n;
}
memoize('expensive_calculation');
# First call: slow
my $result1 = expensive_calculation(42); # Takes 1 second
# Subsequent calls: instant
my $result2 = expensive_calculation(42); # Returns cached result
# Manual memoization with limits
{
my %cache;
my $max_cache_size = 100;
sub cached_function {
my ($key) = @_;
# Check cache
return $cache{$key} if exists $cache{$key};
# Compute result
my $result = expensive_computation($key);
# Limit cache size
if (keys %cache >= $max_cache_size) {
# Remove oldest entry (simple FIFO)
my $oldest = (sort keys %cache)[0];
delete $cache{$oldest};
}
# Cache and return
return $cache{$key} = $result;
}
}
```
### Lazy Evaluation
```perl
# EAGER: Computes everything upfront
sub eager_processing {
my ($data) = @_;
my @results = map { expensive_transform($_) } @$data;
return \@results;
}
# LAZY: Computes only when needed
sub lazy_processing {
my ($data) = @_;
return sub {
my ($index) = @_;
return unless $index < @$data;
state %cache;
return $cache{$index} //= expensive_transform($data->[$index]);
};
}
# Usage
my $lazy = lazy_processing(\@huge_dataset);
my $item5 = $lazy->(5); # Only computes the 5th item
# Iterator pattern for large datasets
sub make_iterator {
my ($data) = @_;
my $index = 0;
return sub {
return unless $index < @$data;
return $data->[$index++];
};
}
my $iter = make_iterator(\@data);
while (my $item = $iter->()) {
process($item);
last if $processed++ > 100; # Can stop early
}
```
## XS and Inline::C
### Using Inline::C for Performance
```perl
use Inline C => <<'END_C';
int fast_sum(SV* array_ref) {
AV* array = (AV*)SvRV(array_ref);
int len = av_len(array) + 1;
int sum = 0;
for (int i = 0; i < len; i++) {
SV** elem = av_fetch(array, i, 0);
if (elem && SvIOK(*elem)) {
sum += SvIV(*elem);
}
}
return sum;
}
void fast_sort(SV* array_ref) {
AV* array = (AV*)SvRV(array_ref);
int len = av_len(array) + 1;
// Simple bubble sort for demonstration
for (int i = 0; i < len - 1; i++) {
for (int j = 0; j < len - i - 1; j++) {
SV** elem1 = av_fetch(array, j, 0);
SV** elem2 = av_fetch(array, j + 1, 0);
if (elem1 && elem2 && SvIV(*elem1) > SvIV(*elem2)) {
SV* temp = *elem1;
av_store(array, j, *elem2);
av_store(array, j + 1, temp);
}
}
}
}
END_C
# Use the C functions
my @numbers = map { int(rand(1000)) } 1..10000;
my $sum = fast_sum(\@numbers);
fast_sort(\@numbers);
```
## Database Optimization
### Efficient Database Operations
```perl
# SLOW: Individual queries
sub slow_db_insert {
my ($dbh, $data) = @_;
my $sth = $dbh->prepare("INSERT INTO users (name, email) VALUES (?, ?)");
for my $user (@$data) {
$sth->execute($user->{name}, $user->{email});
}
}
# FAST: Bulk insert
sub fast_db_insert {
my ($dbh, $data) = @_;
$dbh->begin_work;
my $sth = $dbh->prepare("INSERT INTO users (name, email) VALUES (?, ?)");
for my $user (@$data) {
$sth->execute($user->{name}, $user->{email});
}
$dbh->commit;
}
# FASTER: Single multi-row insert
sub faster_db_insert {
my ($dbh, $data) = @_;
return unless @$data;
my @placeholders = map { "(?, ?)" } @$data;
my $sql = "INSERT INTO users (name, email) VALUES " .
join(", ", @placeholders);
my @values = map { $_->{name}, $_->{email} } @$data;
$dbh->do($sql, undef, @values);
}
# Use prepared statement caching
sub cached_query {
my ($dbh, $id) = @_;
# prepare_cached reuses statement handles
my $sth = $dbh->prepare_cached(
"SELECT * FROM users WHERE id = ?"
);
$sth->execute($id);
return $sth->fetchrow_hashref;
}
```
## Real-World Optimization Example
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use Benchmark qw(cmpthese);
# Log parser optimization
package LogParser;
# Version 1: Naive implementation
sub parse_v1 {
my ($file) = @_;
my @results;
open my $fh, '<', $file or die $!;
while (my $line = <$fh>) {
chomp $line;
# Inefficient: multiple regex matches
if ($line =~ /^\[(\d{4}-\d{2}-\d{2})/) {
my $date = $1;
if ($line =~ /ERROR/) {
my $entry = { date => $date, level => 'ERROR' };
if ($line =~ /user:\s*(\w+)/) {
$entry->{user} = $1;
}
if ($line =~ /message:\s*(.+)$/) {
$entry->{message} = $1;
}
push @results, $entry;
}
}
}
close $fh;
return \@results;
}
# Version 2: Optimized
sub parse_v2 {
my ($file) = @_;
my @results;
# Pre-compile regexes
my $line_regex = qr/^\[(\d{4}-\d{2}-\d{2})[^\]]*\]\s+(\w+)\s+user:\s*(\w+)\s+message:\s*(.+)$/;
my $error_check = qr/ERROR/;
open my $fh, '<', $file or die $!;
while (my $line = <$fh>) {
# Skip non-error lines early
next unless index($line, 'ERROR') >= 0;
# Single regex to capture everything
if ($line =~ $line_regex) {
push @results, {
date => $1,
level => $2,
user => $3,
message => $4,
} if $2 eq 'ERROR';
}
}
close $fh;
return \@results;
}
# Version 3: Memory-mapped for huge files
use File::Map qw(map_file);
sub parse_v3 {
my ($file) = @_;
my @results;
map_file my $content, $file;
# Process in chunks
my $regex = qr/^\[(\d{4}-\d{2}-\d{2})[^\]]*\]\s+ERROR\s+user:\s*(\w+)\s+message:\s*(.+)$/m;
while ($content =~ /$regex/g) {
push @results, {
date => $1,
level => 'ERROR',
user => $2,
message => $3,
};
}
return \@results;
}
# Benchmark
cmpthese(-3, {
'v1_naive' => sub { parse_v1('test.log') },
'v2_optimized' => sub { parse_v2('test.log') },
'v3_mmap' => sub { parse_v3('test.log') },
});
```
## Performance Best Practices
1. **Profile first** - Don't guess where the bottlenecks are
2. **Optimize algorithms before code** - O(n) beats optimized O(n²)
3. **Cache expensive operations** - But watch memory usage
4. **Pre-compile regexes** - Especially in loops
5. **Use built-in functions** - They're optimized C code
6. **Avoid premature optimization** - Clean code first, fast code second
7. **Benchmark alternatives** - Measure, don't assume
8. **Consider memory vs speed tradeoffs** - Sometimes caching hurts
9. **Use appropriate data structures** - Hashes for lookups, arrays for order
10. **Know when to use XS/C** - For truly performance-critical code
## Optimization Checklist
Before optimizing:
- [ ] Is the code correct?
- [ ] Is the code readable?
- [ ] Have you profiled it?
- [ ] Is this the actual bottleneck?
While optimizing:
- [ ] Benchmark before and after
- [ ] Test that behavior hasn't changed
- [ ] Document why the optimization is needed
- [ ] Consider maintenance cost
After optimizing:
- [ ] Is the improvement significant?
- [ ] Is the code still maintainable?
- [ ] Are the tests still passing?
- [ ] Have you updated documentation?
## Conclusion
Performance optimization in Perl is about understanding your bottlenecks and applying the right techniques. Most performance problems can be solved with better algorithms, caching, or pre-compilation. Only resort to XS or Inline::C when you've exhausted Perl-level optimizations.
Remember: the fastest code is code that doesn't run. Before optimizing, ask if you can eliminate work entirely. And always, always profile first.
---
*Next: Building command-line tools. We'll create professional CLI applications with argument parsing, help text, and user-friendly interfaces.*

View File

@@ -0,0 +1,958 @@
# Chapter 18: Building Command-Line Tools
> "A good command-line tool is like a trusty wrench - it does one thing well, plays nicely with others, and doesn't require a manual to use." - Unix Philosophy, Perl Edition
Command-line tools are Perl's bread and butter. From simple scripts to complex applications, Perl excels at creating tools that system administrators and developers rely on daily. This chapter shows you how to build professional CLI tools with proper argument parsing, help text, configuration, and all the features users expect from modern command-line applications.
## Command-Line Argument Parsing
### Basic with Getopt::Long
```perl
#!/usr/bin/env perl
use Modern::Perl '2023';
use Getopt::Long;
use Pod::Usage;
# Define options
my %options = (
verbose => 0,
output => '-',
format => 'json',
limit => 100,
);
# Parse command line
GetOptions(
'verbose|v+' => \$options{verbose}, # -v, -vv, -vvv
'quiet|q' => sub { $options{verbose} = -1 },
'output|o=s' => \$options{output}, # Requires string
'format|f=s' => \$options{format}, # --format json
'limit|l=i' => \$options{limit}, # Requires integer
'dry-run|n' => \$options{dry_run}, # Boolean flag
'help|h|?' => sub { pod2usage(1) },
'man' => sub { pod2usage(-verbose => 2) },
'version' => sub { say "Version 1.0"; exit },
) or pod2usage(2);
# Validate options
pod2usage("--format must be json, xml, or csv")
unless $options{format} =~ /^(json|xml|csv)$/;
pod2usage("--limit must be positive")
unless $options{limit} > 0;
# Process remaining arguments
my @files = @ARGV;
@files = ('-') unless @files; # Default to STDIN
# Main logic
for my $file (@files) {
process_file($file, \%options);
}
sub process_file {
my ($file, $opts) = @_;
say "Processing $file..." if $opts->{verbose} > 0;
# Implementation...
}
__END__
=head1 NAME
mytool - Process files with various formats
=head1 SYNOPSIS
mytool [options] [file ...]
Options:
-v, --verbose Increase verbosity (can be repeated)
-q, --quiet Suppress all output
-o, --output FILE Write output to FILE (default: STDOUT)
-f, --format FMT Output format: json, xml, csv (default: json)
-l, --limit N Process at most N records (default: 100)
-n, --dry-run Don't actually process, just show what would be done
-h, --help Brief help message
--man Full documentation
--version Show version
=head1 DESCRIPTION
This tool processes input files and produces formatted output.
=cut
```
### Advanced Argument Parsing with Getopt::Long::Descriptive
```perl
use Getopt::Long::Descriptive;
my ($opt, $usage) = describe_options(
'mytool %o <file>',
[ 'verbose|v+', "increase verbosity" ],
[ 'output|o=s', "output file", { default => '-' } ],
[ 'format|f=s', "output format", {
default => 'json',
callbacks => {
'valid format' => sub {
$_[0] =~ /^(json|xml|csv)$/
},
},
}],
[], # Blank line in help
[ 'mode' => hidden => {
one_of => [
[ 'add|a' => "add items" ],
[ 'remove|r' => "remove items" ],
[ 'list|l' => "list items" ],
],
}],
[],
[ 'help|h', "print usage message and exit", { shortcircuit => 1 } ],
);
print($usage->text), exit if $opt->help;
# Use options
say "Verbose level: " . $opt->verbose if $opt->verbose;
process_file($opt->output, $opt->format);
```
## Interactive Command-Line Tools
### Building Interactive Prompts
```perl
use Term::ANSIColor;
use Term::ReadLine;
use Term::Choose;
# Colored output
sub info {
say colored(['bright_blue'], "[INFO] $_[0]");
}
sub warning {
say colored(['yellow'], "[WARN] $_[0]");
}
sub error {
say colored(['red'], "[ERROR] $_[0]");
}
sub success {
say colored(['green'], "[✓] $_[0]");
}
# Interactive prompt
sub prompt {
my ($question, $default) = @_;
my $term = Term::ReadLine->new('MyApp');
my $prompt = $question;
$prompt .= " [$default]" if defined $default;
$prompt .= ": ";
my $answer = $term->readline($prompt);
$answer = $default if !defined $answer || $answer eq '';
return $answer;
}
# Yes/No confirmation
sub confirm {
my ($question, $default) = @_;
$default //= 'n';
my $answer = prompt("$question (y/n)", $default);
return $answer =~ /^y/i;
}
# Menu selection
sub menu {
my ($title, @options) = @_;
my $tc = Term::Choose->new({
prompt => $title,
layout => 2,
mouse => 1,
});
return $tc->choose([@options]);
}
# Password input
use Term::ReadKey;
sub get_password {
my ($prompt) = @_;
$prompt //= "Password";
print "$prompt: ";
ReadMode('noecho'); # Don't echo keystrokes
my $password = ReadLine(0);
ReadMode('restore');
print "\n";
chomp $password;
return $password;
}
# Progress bar
use Term::ProgressBar;
sub show_progress {
my ($total) = @_;
my $progress = Term::ProgressBar->new({
name => 'Processing',
count => $total,
ETA => 'linear',
});
for my $i (1..$total) {
# Do work
sleep(0.01);
$progress->update($i);
}
$progress->finish;
}
# Usage example
info("Starting application");
my $name = prompt("Enter your name", $ENV{USER});
success("Welcome, $name!");
if (confirm("Do you want to continue?", 'y')) {
my $choice = menu("Select an option:", qw(Add Remove List Quit));
if ($choice && $choice ne 'Quit') {
info("You selected: $choice");
my $password = get_password("Enter password");
show_progress(100);
}
}
warning("Exiting application");
```
## Configuration Management
### Supporting Configuration Files
```perl
package App::Config;
use Config::Any;
use File::HomeDir;
use File::Spec;
use Carp;
sub new {
my ($class, %args) = @_;
my $self = bless {
app_name => $args{app_name} || 'myapp',
config => {},
}, $class;
$self->load_config();
return $self;
}
sub load_config {
my $self = shift;
# Configuration file search paths
my @config_paths = (
File::Spec->catfile(File::HomeDir->my_home, "." . $self->{app_name}),
File::Spec->catfile(File::HomeDir->my_home, "." . $self->{app_name}, "config"),
File::Spec->catfile("/etc", $self->{app_name}, "config"),
"./" . $self->{app_name} . ".conf",
);
# Load first config found
for my $base_path (@config_paths) {
my @files = map { "$base_path.$_" } qw(yaml yml json ini conf);
for my $file (@files) {
next unless -r $file;
my $cfg = Config::Any->load_files({
files => [$file],
use_ext => 1,
});
if ($cfg && @$cfg) {
$self->{config} = $cfg->[0]{$file};
$self->{config_file} = $file;
last;
}
}
last if $self->{config_file};
}
# Merge with environment variables
$self->merge_env_config();
return $self->{config};
}
sub merge_env_config {
my $self = shift;
my $prefix = uc($self->{app_name}) . "_";
for my $key (keys %ENV) {
next unless $key =~ /^$prefix(.+)$/;
my $config_key = lc($1);
$config_key =~ s/_/./g; # MYAPP_DATABASE_HOST -> database.host
$self->set($config_key, $ENV{$key});
}
}
sub get {
my ($self, $key, $default) = @_;
my @parts = split /\./, $key;
my $value = $self->{config};
for my $part (@parts) {
return $default unless ref $value eq 'HASH';
$value = $value->{$part};
return $default unless defined $value;
}
return $value;
}
sub set {
my ($self, $key, $value) = @_;
my @parts = split /\./, $key;
my $last = pop @parts;
my $ref = $self->{config};
for my $part (@parts) {
$ref->{$part} //= {};
$ref = $ref->{$part};
}
$ref->{$last} = $value;
}
# Usage
package main;
my $config = App::Config->new(app_name => 'mytool');
my $db_host = $config->get('database.host', 'localhost');
my $db_port = $config->get('database.port', 5432);
```
## Professional CLI Application Structure
### Complete CLI Application
```perl
#!/usr/bin/env perl
package MyApp::CLI;
use Modern::Perl '2023';
use Moo;
use Types::Standard qw(Str Int Bool HashRef);
use Getopt::Long::Descriptive;
use Term::ANSIColor;
use Try::Tiny;
use Log::Any '$log';
use Log::Any::Adapter;
# Attributes
has 'config' => (is => 'ro', isa => HashRef, default => sub { {} });
has 'verbose' => (is => 'rw', isa => Int, default => 0);
has 'dry_run' => (is => 'rw', isa => Bool, default => 0);
has 'output' => (is => 'rw', isa => Str, default => '-');
# Main entry point
sub run {
my ($self, @args) = @_;
local @ARGV = @args if @args;
try {
$self->parse_options();
$self->setup_logging();
$self->validate_environment();
$self->execute();
} catch {
$self->error("Fatal error: $_");
exit 1;
};
}
sub parse_options {
my $self = shift;
my ($opt, $usage) = describe_options(
'%c %o <command> [<args>]',
[ 'verbose|v+', "increase verbosity" ],
[ 'quiet|q', "suppress output" ],
[ 'dry-run|n', "don't make changes" ],
[ 'config|c=s', "config file" ],
[ 'output|o=s', "output file", { default => '-' } ],
[],
[ 'help|h', "show help", { shortcircuit => 1 } ],
[ 'version', "show version", { shortcircuit => 1 } ],
);
if ($opt->help) {
print $usage->text;
exit 0;
}
if ($opt->version) {
say "MyApp version 1.0.0";
exit 0;
}
# Store options
$self->verbose($opt->verbose - ($opt->quiet ? 1 : 0));
$self->dry_run($opt->dry_run);
$self->output($opt->output);
# Load config if specified
if ($opt->config) {
$self->load_config($opt->config);
}
# Get command
$self->{command} = shift @ARGV || 'help';
$self->{args} = [@ARGV];
}
sub setup_logging {
my $self = shift;
my $level = $self->verbose > 1 ? 'debug' :
$self->verbose > 0 ? 'info' :
$self->verbose < 0 ? 'error' : 'warning';
Log::Any::Adapter->set('Stderr', log_level => $level);
}
sub execute {
my $self = shift;
my $command = $self->{command};
my $method = "cmd_$command";
if ($self->can($method)) {
$self->$method(@{$self->{args}});
} else {
$self->error("Unknown command: $command");
$self->cmd_help();
exit 1;
}
}
# Commands
sub cmd_help {
my $self = shift;
say <<'HELP';
Usage: myapp [options] <command> [<args>]
Commands:
list List all items
add Add a new item
remove Remove an item
status Show status
help Show this help
Options:
-v, --verbose Increase verbosity
-q, --quiet Suppress output
-n, --dry-run Don't make changes
-c, --config Config file
-o, --output Output file
Examples:
myapp list
myapp add --name "New Item"
myapp remove item-123
myapp -vv status
HELP
}
sub cmd_list {
my $self = shift;
$self->info("Listing items...");
my @items = $self->get_items();
if ($self->output eq '-') {
for my $item (@items) {
$self->print_item($item);
}
} else {
$self->write_output(\@items);
}
$self->success("Listed " . scalar(@items) . " items");
}
sub cmd_add {
my ($self, @args) = @_;
my ($opt, $usage) = describe_options(
'add %o',
[ 'name|n=s', "item name", { required => 1 } ],
[ 'description|d=s', "item description" ],
[ 'tags|t=s@', "tags" ],
);
my $item = {
name => $opt->name,
description => $opt->description || '',
tags => $opt->tags || [],
created => time(),
};
if ($self->dry_run) {
$self->info("Would add item: " . $item->{name});
} else {
$self->add_item($item);
$self->success("Added item: " . $item->{name});
}
}
sub cmd_remove {
my ($self, $id) = @_;
unless ($id) {
$self->error("Item ID required");
exit 1;
}
if ($self->dry_run) {
$self->info("Would remove item: $id");
} else {
$self->remove_item($id);
$self->success("Removed item: $id");
}
}
sub cmd_status {
my $self = shift;
my $status = $self->get_status();
say colored(['bold'], "System Status");
say "=" x 40;
for my $key (sort keys %$status) {
my $value = $status->{$key};
my $color = $value eq 'OK' ? 'green' :
$value eq 'WARNING' ? 'yellow' : 'red';
printf "%-20s %s\n", $key, colored([$color], $value);
}
}
# Utility methods
sub info {
my ($self, $msg) = @_;
return if $self->verbose < 0;
say colored(['cyan'], "[INFO] $msg");
}
sub warning {
my ($self, $msg) = @_;
say STDERR colored(['yellow'], "[WARN] $msg");
}
sub error {
my ($self, $msg) = @_;
say STDERR colored(['red'], "[ERROR] $msg");
}
sub success {
my ($self, $msg) = @_;
return if $self->verbose < 0;
say colored(['green'], "[✓] $msg");
}
sub debug {
my ($self, $msg) = @_;
return unless $self->verbose > 1;
say colored(['gray'], "[DEBUG] $msg");
}
# Stub methods for demonstration
sub get_items { return map { { id => $_, name => "Item $_" } } 1..5 }
sub add_item { }
sub remove_item { }
sub get_status { return { Database => 'OK', Cache => 'OK', Queue => 'WARNING' } }
sub validate_environment { }
sub load_config { }
sub print_item { say " - $_[1]{name}" }
sub write_output { }
# Script entry point
package main;
unless (caller) {
my $app = MyApp::CLI->new();
$app->run(@ARGV);
}
1;
```
## Creating Distributable Tools
### Using App::FatPacker
```perl
# Create a single-file executable
# Install: cpanm App::FatPacker
# Step 1: Trace dependencies
# fatpack trace script.pl
# Step 2: Pack dependencies
# fatpack packlists-for `cat fatpacker.trace` > packlists
# fatpack tree `cat packlists`
# Step 3: Create fatpacked script
# fatpack file script.pl > script_standalone.pl
# Or use this helper script:
#!/usr/bin/env perl
use strict;
use warnings;
my $script = shift or die "Usage: $0 <script.pl>\n";
my $output = $script;
$output =~ s/\.pl$/_standalone.pl/;
system("fatpack trace $script");
system("fatpack packlists-for `cat fatpacker.trace` > packlists");
system("fatpack tree `cat packlists`");
system("fatpack file $script > $output");
chmod 0755, $output;
unlink 'fatpacker.trace', 'packlists';
system("rm -rf fatlib");
print "Created standalone script: $output\n";
```
### Creating a CPAN Distribution
```perl
# Use Module::Starter
# cpanm Module::Starter
# Create new distribution
# module-starter --module=App::MyTool \
# --author="Your Name" \
# --email=you@example.com \
# --builder=Module::Build
# Directory structure:
# App-MyTool/
# ├── Build.PL
# ├── Changes
# ├── lib/
# │ └── App/
# │ └── MyTool.pm
# ├── script/
# │ └── mytool
# ├── t/
# │ ├── 00-load.t
# │ └── 01-basic.t
# ├── MANIFEST
# └── README
# Build.PL
use Module::Build;
my $builder = Module::Build->new(
module_name => 'App::MyTool',
license => 'perl',
dist_author => 'Your Name <you@example.com>',
dist_version_from => 'lib/App/MyTool.pm',
script_files => ['script/mytool'],
requires => {
'perl' => '5.016',
'Getopt::Long' => 0,
'Pod::Usage' => 0,
},
test_requires => {
'Test::More' => 0,
},
add_to_cleanup => [ 'App-MyTool-*' ],
meta_merge => {
resources => {
repository => 'https://github.com/you/App-MyTool',
},
},
);
$builder->create_build_script();
```
## Real-World CLI Tool Example
```perl
#!/usr/bin/env perl
# loganalyzer - Advanced log analysis tool
use Modern::Perl '2023';
use Getopt::Long::Descriptive;
use Time::Piece;
use JSON::XS;
use Text::Table;
# Parse command line
my ($opt, $usage) = describe_options(
'%c %o <logfile> ...',
[ 'pattern|p=s', "search pattern (regex)" ],
[ 'from|f=s', "start date (YYYY-MM-DD)" ],
[ 'to|t=s', "end date (YYYY-MM-DD)" ],
[ 'level|l=s@', "log levels to include" ],
[ 'exclude|x=s@', "patterns to exclude" ],
[],
[ 'output-format|o=s', "output format", {
default => 'table',
callbacks => {
'valid format' => sub { $_[0] =~ /^(table|json|csv)$/ }
}
}],
[ 'stats|s', "show statistics" ],
[ 'follow|F', "follow file (like tail -f)" ],
[],
[ 'help|h', "show help", { shortcircuit => 1 } ],
);
print($usage->text), exit if $opt->help;
die "No log files specified\n" unless @ARGV;
# Main processor
my $analyzer = LogAnalyzer->new(
pattern => $opt->pattern ? qr/$opt->{pattern}/ : undef,
from_date => $opt->from ? parse_date($opt->from) : undef,
to_date => $opt->to ? parse_date($opt->to) : undef,
levels => $opt->level ? { map { $_ => 1 } @{$opt->level} } : undef,
exclude => $opt->exclude ? [map { qr/$_/ } @{$opt->exclude}] : [],
output_format => $opt->output_format,
show_stats => $opt->stats,
follow => $opt->follow,
);
$analyzer->process_files(@ARGV);
package LogAnalyzer;
use Moo;
use File::Tail;
has [qw(pattern from_date to_date levels exclude)] => (is => 'ro');
has 'output_format' => (is => 'ro', default => 'table');
has 'show_stats' => (is => 'ro', default => 0);
has 'follow' => (is => 'ro', default => 0);
has 'stats' => (is => 'rw', default => sub { {} });
has 'results' => (is => 'rw', default => sub { [] });
sub process_files {
my ($self, @files) = @_;
if ($self->follow && @files == 1) {
$self->follow_file($files[0]);
} else {
for my $file (@files) {
$self->process_file($file);
}
$self->output_results();
}
}
sub process_file {
my ($self, $file) = @_;
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $line = <$fh>) {
chomp $line;
$self->process_line($line);
}
close $fh;
}
sub follow_file {
my ($self, $file) = @_;
my $tail = File::Tail->new($file);
while (defined(my $line = $tail->read)) {
chomp $line;
my $entry = $self->process_line($line);
if ($entry) {
$self->output_entry($entry);
}
}
}
sub process_line {
my ($self, $line) = @_;
# Skip if matches exclude pattern
for my $exclude (@{$self->exclude}) {
return if $line =~ $exclude;
}
# Parse log line (customize for your format)
my $entry = $self->parse_log_line($line);
return unless $entry;
# Apply filters
return if $self->pattern && $line !~ $self->pattern;
return if $self->levels && !$self->levels->{$entry->{level}};
return if $self->from_date && $entry->{timestamp} < $self->from_date;
return if $self->to_date && $entry->{timestamp} > $self->to_date;
# Update statistics
$self->update_stats($entry) if $self->show_stats;
# Store result
push @{$self->results}, $entry unless $self->follow;
return $entry;
}
sub parse_log_line {
my ($self, $line) = @_;
# Example: 2024-01-15 10:30:45 [ERROR] Connection timeout
if ($line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\[(\w+)\]\s+(.+)$/) {
return {
timestamp => $1,
level => $2,
message => $3,
raw => $line,
};
}
return undef;
}
sub update_stats {
my ($self, $entry) = @_;
$self->stats->{total}++;
$self->stats->{by_level}{$entry->{level}}++;
}
sub output_results {
my $self = shift;
if ($self->output_format eq 'json') {
say encode_json($self->results);
} elsif ($self->output_format eq 'csv') {
$self->output_csv();
} else {
$self->output_table();
}
$self->output_statistics() if $self->show_stats;
}
sub output_entry {
my ($self, $entry) = @_;
if ($self->output_format eq 'json') {
say encode_json($entry);
} else {
say "$entry->{timestamp} [$entry->{level}] $entry->{message}";
}
}
sub output_table {
my $self = shift;
my $table = Text::Table->new(
"Timestamp", "Level", "Message"
);
for my $entry (@{$self->results}) {
$table->load([$entry->{timestamp}, $entry->{level}, $entry->{message}]);
}
print $table;
}
sub output_csv {
my $self = shift;
say "Timestamp,Level,Message";
for my $entry (@{$self->results}) {
say qq("$entry->{timestamp}","$entry->{level}","$entry->{message}");
}
}
sub output_statistics {
my $self = shift;
say "\nStatistics:";
say "-" x 40;
say "Total entries: " . ($self->stats->{total} // 0);
if ($self->stats->{by_level}) {
say "\nBy Level:";
for my $level (sort keys %{$self->stats->{by_level}}) {
printf " %-10s: %d\n", $level, $self->stats->{by_level}{$level};
}
}
}
sub parse_date {
my $date = shift;
return Time::Piece->strptime($date, '%Y-%m-%d');
}
```
## Best Practices
1. **Follow Unix philosophy** - Do one thing well
2. **Support standard conventions** - Use -, --, read from STDIN
3. **Provide helpful error messages** - Guide users to success
4. **Include examples in help** - Show, don't just tell
5. **Support configuration files** - For complex tools
6. **Make output parseable** - Support JSON/CSV for scripting
7. **Use exit codes properly** - 0 for success, non-zero for errors
8. **Support verbose and quiet modes** - Let users control output
9. **Handle signals gracefully** - Clean up on SIGINT/SIGTERM
10. **Test your CLI** - Use Test::Cmd or similar
## Conclusion
Building command-line tools in Perl is a joy. The language's text processing power, combined with excellent CPAN modules for argument parsing and terminal interaction, makes it ideal for creating the kind of tools system administrators and developers use every day.
Remember: a good CLI tool feels intuitive to use, provides helpful feedback, and plays well with other tools in the Unix tradition. Perl gives you all the pieces—it's up to you to assemble them thoughtfully.
---
*Next: System monitoring and alerting scripts. We'll build tools that keep watch over your infrastructure and alert you when things go wrong.*

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

185
README.md Normal file
View File

@@ -0,0 +1,185 @@
# The Perl Programming Language: A Modern Guide for System Administrators and Automation Engineers
A comprehensive, practical guide to Perl programming in 2025, focusing on real-world applications for system administration, automation, and server-side utilities.
## 📚 About This Book
This book presents Perl not as a relic of the past, but as a powerful, evolving language that remains incredibly relevant for modern computing challenges. Written for experienced system administrators, DevOps engineers, and automation specialists, it demonstrates why Perl continues to excel at the tasks it was designed for: text processing, system administration, network programming, and rapid prototyping.
### Who This Book Is For
- **System Administrators** looking to automate complex tasks
- **DevOps Engineers** building robust automation pipelines
- **Network Engineers** managing infrastructure at scale
- **Security Professionals** analyzing logs and monitoring systems
- **Developers** who want to leverage Perl's unique strengths
- **Anyone** curious about Perl's continued relevance in 2025
### What Makes This Book Different
- **Modern Perl Practices**: Uses current best practices and modern Perl features
- **Practical Examples**: Every concept illustrated with working, production-ready code
- **Real-World Focus**: Emphasis on solving actual problems faced by sysadmins and engineers
- **Comprehensive Coverage**: From basics to advanced topics like security and web services
- **Engaging Style**: Technical accuracy balanced with readability and occasional humor
## 📖 Table of Contents
### Part I: Foundations
- **[Chapter 1: Why Perl Still Matters in 2025](01-why-perl-still-matters-in-2025.md)** - Perl's unique position in modern computing
- **[Chapter 2: Setting Up Your Perl Environment](02-setting-up-your-perl-environment.md)** - Modern development environment setup
- **[Chapter 3: Perl Fundamentals - Variables, Data Types, and Operations](03-perl-fundamentals-variables-data-types-and-operations.md)** - Core language concepts
- **[Chapter 4: Control Flow and Subroutines](04-control-flow-and-subroutines.md)** - Program flow and code organization
- **[Chapter 5: Regular Expressions - Perl's Secret Weapon](05-regular-expressions-perls-secret-weapon.md)** - Mastering pattern matching
### Part II: Working with Data
- **[Chapter 6: File I/O and Directory Operations](06-file-io-and-directory-operations.md)** - File system manipulation
- **[Chapter 7: Text Processing and Manipulation](07-text-processing-and-manipulation.md)** - Advanced text operations
- **[Chapter 8: Working with Structured Data Formats](08-working-with-structured-data-formats.md)** - JSON, XML, YAML, and more
- **[Chapter 9: Log Analysis and Data Extraction](09-log-analysis-and-data-extraction.md)** - Mining insights from logs
### Part III: System Programming
- **[Chapter 10: System Administration and Process Management](10-system-administration-and-process-management.md)** - OS interaction and process control
- **[Chapter 11: Network Programming and Web Scraping](11-network-programming-and-web-scraping.md)** - Network protocols and web automation
- **[Chapter 12: Database Programming with DBI](12-database-programming-with-dbi.md)** - Database interaction and management
- **[Chapter 13: Configuration Management and Templating](13-configuration-management-and-templating.md)** - Managing system configurations
### Part IV: Modern Perl Development
- **[Chapter 14: CPAN - The Comprehensive Perl Archive Network](14-cpan-the-comprehensive-perl-archive-network.md)** - Leveraging Perl's ecosystem
- **[Chapter 15: Object-Oriented Programming in Modern Perl](15-object-oriented-programming-in-modern-perl.md)** - OOP with Moose and Moo
- **[Chapter 16: Testing and Debugging](16-testing-and-debugging.md)** - Quality assurance practices
- **[Chapter 17: Performance and Optimization](17-performance-and-optimization.md)** - Making Perl fast
### Part V: Real-World Applications
- **[Chapter 18: Building Command-Line Tools](18-building-command-line-tools.md)** - Professional CLI applications
- **[Chapter 19: System Monitoring and Alerting Scripts](19-system-monitoring-and-alerting-scripts.md)** - Infrastructure monitoring
- **[Chapter 20: Automation Workflows and Cron Jobs](20-automation-workflows-and-cron-jobs.md)** - Task automation
- **[Chapter 21: RESTful APIs and Web Services](21-restful-apis-and-web-services.md)** - Building and consuming APIs
- **[Chapter 22: Security Best Practices](22-security-best-practices.md)** - Secure Perl programming
### Appendices
- **[Appendix A: Perl One-Liners Cookbook](appendix-a-perl-one-liners-cookbook.md)** - Powerful command-line solutions
- **[Appendix B: Common Gotchas and Solutions](appendix-b-common-gotchas-and-solutions.md)** - Avoiding common pitfalls
- **[Appendix C: Resources and Community](appendix-c-resources-and-community.md)** - Further learning and community
## 🚀 Getting Started
### Prerequisites
To follow along with the examples in this book, you'll need:
- Perl 5.32 or later (5.38+ recommended)
- A text editor or IDE with Perl support
- Basic command-line familiarity
- Optional: Git for version control
### Quick Start
1. Clone this repository:
```bash
git clone https://github.com/cloudstreet-dev/The-PERL-Programming-Language.git
cd The-PERL-Programming-Language
```
2. Verify your Perl installation:
```bash
perl -v
```
3. Start with [Chapter 1](01-why-perl-still-matters-in-2025.md) or jump to any topic that interests you
### Code Examples
All code examples in this book are:
- ✅ Tested and working
- ✅ Production-ready (with appropriate error handling)
- ✅ Following modern Perl best practices
- ✅ Well-commented and documented
## 💡 Key Features
### Modern Perl Practices
- Uses `Modern::Perl` pragma for contemporary features
- Embraces function signatures (Perl 5.20+)
- Leverages postfix dereferencing (Perl 5.24+)
- Employs Moose/Moo for robust OOP
### Comprehensive Coverage
- **500+ working code examples**
- **22 in-depth chapters**
- **3 reference appendices**
- **Real-world case studies**
### Topics Covered
- ✨ Text processing and regular expressions
- 🔧 System administration and automation
- 🌐 Network programming and web services
- 📊 Database interaction and data processing
- 🔒 Security best practices
- 🚀 Performance optimization
- 🧪 Testing and quality assurance
- 📦 CPAN and module management
## 🤝 Contributing
While this book represents a complete work, corrections and improvements are welcome:
1. **Report Issues**: Found an error? [Open an issue](https://github.com/cloudstreet-dev/The-PERL-Programming-Language/issues)
2. **Submit Corrections**: Small fixes can be submitted as pull requests
3. **Share Feedback**: Let us know how this book helped you
## 📝 License
This book is released under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
You are free to:
- **Share** — copy and redistribute the material
- **Adapt** — remix, transform, and build upon the material
Under the following terms:
- **Attribution** — Give appropriate credit
- **ShareAlike** — Distribute contributions under the same license
## 🙏 Acknowledgments
This book stands on the shoulders of giants:
- **Larry Wall** for creating Perl and fostering its philosophy
- **The Perl Community** for decades of modules, tools, and knowledge
- **CPAN Authors** for building an incredible ecosystem
- **System Administrators** worldwide who keep infrastructure running with Perl
## 📬 Contact
- **Repository**: [github.com/cloudstreet-dev/The-PERL-Programming-Language](https://github.com/cloudstreet-dev/The-PERL-Programming-Language)
- **Issues**: [GitHub Issues](https://github.com/cloudstreet-dev/The-PERL-Programming-Language/issues)
## 🌟 Why Read This Book?
In an era dominated by Python and Go, why learn Perl? Because:
1. **Unmatched Text Processing**: No language matches Perl's regex integration and text manipulation capabilities
2. **System Administration Power**: Perl remains deeply embedded in Unix/Linux system administration
3. **CPAN Ecosystem**: Over 200,000 modules for every conceivable task
4. **Rapid Development**: Perl's expressiveness enables incredibly fast prototyping
5. **Job Security**: Millions of lines of Perl code run critical infrastructure worldwide
6. **Unique Philosophy**: TMTOWTDI (There's More Than One Way To Do It) encourages creative problem-solving
## 🎯 Learning Outcomes
After reading this book, you will be able to:
- Write robust system administration scripts
- Process and analyze large text files efficiently
- Build command-line tools and automation workflows
- Create RESTful APIs and web services
- Implement secure coding practices
- Debug and optimize Perl applications
- Navigate and contribute to the CPAN ecosystem
- Apply Perl to real-world DevOps challenges
---
*"Perl The Swiss Army Chainsaw of Programming Languages"*
**Start your journey with [Chapter 1: Why Perl Still Matters in 2025](01-why-perl-still-matters-in-2025.md)**

View File

@@ -0,0 +1,525 @@
# Appendix A: Perl One-Liners Cookbook
*"Sometimes the most powerful programs are the ones that fit on a single line."*
Perl's conciseness and expressiveness shine brightest in one-liners—those magical incantations that transform data, search files, and automate tasks, all from the command line. This cookbook contains battle-tested one-liners for real-world system administration tasks.
## Text Processing
### Search and Replace
```bash
# Replace text in a file (in-place)
perl -pi -e 's/old_text/new_text/g' file.txt
# Replace with backup
perl -pi.bak -e 's/old_text/new_text/g' file.txt
# Replace only on lines matching a pattern
perl -pi -e 's/old/new/g if /pattern/' file.txt
# Replace across multiple files
perl -pi -e 's/old/new/g' *.txt
# Case-insensitive replace
perl -pi -e 's/old/new/gi' file.txt
# Replace with captured groups
perl -pi -e 's/(\w+)\.old\.com/$1.new.com/g' file.txt
# Replace newlines with commas
perl -pe 's/\n/,/g' file.txt
# Remove trailing whitespace
perl -pi -e 's/\s+$//' file.txt
# Convert DOS to Unix line endings
perl -pi -e 's/\r\n/\n/g' file.txt
# Convert Unix to DOS line endings
perl -pi -e 's/\n/\r\n/g' file.txt
# Replace multiple spaces with single space
perl -pi -e 's/ +/ /g' file.txt
# Remove blank lines
perl -ni -e 'print unless /^$/' file.txt
# Replace tabs with spaces
perl -pi -e 's/\t/ /g' file.txt
# URL encode
perl -MURI::Escape -e 'print uri_escape($ARGV[0])' "hello world"
# URL decode
perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "hello%20world"
# HTML encode
perl -MHTML::Entities -e 'print encode_entities($ARGV[0])' "<h1>Title</h1>"
# Base64 encode
perl -MMIME::Base64 -e 'print encode_base64("text")'
# Base64 decode
perl -MMIME::Base64 -e 'print decode_base64("dGV4dA==")'
```
### Line Operations
```bash
# Print specific line number
perl -ne 'print if $. == 42' file.txt
# Print lines 10-20
perl -ne 'print if 10..20' file.txt
# Print every nth line
perl -ne 'print if $. % 3 == 0' file.txt
# Number all lines
perl -pe '$_ = "$. $_"' file.txt
# Print line before and after match
perl -ne 'print $prev if /pattern/; $prev = $_' file.txt
# Remove duplicate lines
perl -ne 'print unless $seen{$_}++' file.txt
# Print lines longer than 80 characters
perl -ne 'print if length > 80' file.txt
# Print shortest line
perl -ne '$min = $_ if !defined $min || length($_) < length($min); END {print $min}' file.txt
# Print longest line
perl -ne '$max = $_ if length($_) > length($max // ""); END {print $max}' file.txt
# Reverse lines (tac)
perl -e 'print reverse <>' file.txt
# Shuffle lines randomly
perl -MList::Util=shuffle -e 'print shuffle <>' file.txt
# Sort lines
perl -e 'print sort <>' file.txt
# Sort numerically
perl -e 'print sort {$a <=> $b} <>' file.txt
# Sort by column
perl -e 'print sort {(split /\s+/, $a)[2] cmp (split /\s+/, $b)[2]} <>' file.txt
```
### Field and Column Operations
```bash
# Print first field (like cut -f1)
perl -lane 'print $F[0]' file.txt
# Print specific columns
perl -lane 'print "@F[0,2,4]"' file.txt
# Print last field
perl -lane 'print $F[-1]' file.txt
# Sum numbers in first column
perl -lane '$sum += $F[0]; END {print $sum}' file.txt
# Average of column
perl -lane '$sum += $F[0]; $count++; END {print $sum/$count}' file.txt
# Join fields with comma
perl -lane 'print join(",", @F)' file.txt
# Print fields in reverse order
perl -lane 'print join(" ", reverse @F)' file.txt
# Count fields per line
perl -lane 'print scalar @F' file.txt
# Print lines with exactly n fields
perl -lane 'print if @F == 5' file.txt
# Transpose rows and columns
perl -lane 'push @{$c[$_]}, $F[$_] for 0..$#F; END {print "@$_" for @c}' file.txt
# Convert CSV to TSV
perl -pe 's/,/\t/g' file.csv
# Parse CSV properly
perl -MText::CSV -e 'my $csv = Text::CSV->new(); while (<>) {$csv->parse($_); print join("\t", $csv->fields), "\n"}' file.csv
```
## File Operations
### File Management
```bash
# Find files modified in last 24 hours
perl -e 'print "$_\n" for grep {-M $_ < 1} glob("*")'
# Find files larger than 1MB
perl -e 'print "$_\n" for grep {-s $_ > 1_000_000} glob("*")'
# Rename files (add prefix)
perl -e 'rename $_, "prefix_$_" for glob("*.txt")'
# Rename files (change extension)
perl -e 'for (glob("*.txt")) {$new = $_; $new =~ s/\.txt$/.bak/; rename $_, $new}'
# Delete empty files
perl -e 'unlink for grep {-z $_} glob("*")'
# Create backup copies
perl -e 'for (glob("*.conf")) {system("cp", $_, "$_.backup")}'
# List directories only
perl -e 'print "$_\n" for grep {-d $_} glob("*")'
# List files recursively
perl -MFile::Find -e 'find(sub {print "$File::Find::name\n" if -f}, ".")'
# Calculate total size of files
perl -e '$total += -s $_ for glob("*"); print "$total\n"'
# Find duplicate files by size
perl -e 'for (glob("*")) {push @{$files{-s $_}}, $_} for (values %files) {print "@$_\n" if @$_ > 1}'
# Touch files (update timestamp)
perl -e 'utime(undef, undef, $_) for glob("*.txt")'
# Make files executable
perl -e 'chmod 0755, $_ for glob("*.pl")'
# Find broken symlinks
perl -e 'print "$_\n" for grep {-l $_ && !-e $_} glob("*")'
```
### Content Analysis
```bash
# Count lines in file
perl -ne 'END {print $.}' file.txt
# Count words
perl -ne '$w += split; END {print "$w\n"}' file.txt
# Count characters
perl -ne '$c += length; END {print "$c\n"}' file.txt
# Frequency count of words
perl -ne 'for (split) {$freq{lc $_}++} END {print "$_: $freq{$_}\n" for sort {$freq{$b} <=> $freq{$a}} keys %freq}' file.txt
# Find longest word
perl -ne 'for (split) {$max = $_ if length($_) > length($max)} END {print "$max\n"}' file.txt
# Extract email addresses
perl -ne 'print "$1\n" while /([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})/g' file.txt
# Extract URLs
perl -ne 'print "$1\n" while m!(https?://[^\s]+)!g' file.txt
# Extract IP addresses
perl -ne 'print "$1\n" while /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/g' file.txt
# Extract phone numbers (US format)
perl -ne 'print "$1\n" while /(\d{3}[-.]?\d{3}[-.]?\d{4})/g' file.txt
# Check for binary files
perl -e 'for (glob("*")) {print "$_\n" if -B $_}'
# Find files containing text
perl -e 'for $f (glob("*.txt")) {open F, $f; print "$f\n" if grep /pattern/, <F>; close F}'
```
## System Administration
### Process Management
```bash
# Kill processes by name
perl -e 'kill "TERM", grep {/processname/} map {/^\s*(\d+)/; $1} `ps aux`'
# Show process tree
perl -e 'for (`ps aux`) {/^(\S+)\s+(\d+)\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+(.*)/ && print "$2 $3\n"}'
# Monitor CPU usage
perl -e 'while (1) {$cpu = `top -bn1 | grep "Cpu(s)"`; print $cpu; sleep 5}'
# Show memory usage
perl -ne 'print if /^Mem/' < /proc/meminfo
# List open ports
perl -ne 'print if /:[\dA-F]{4}\s+0{8}:0{4}/' < /proc/net/tcp
# Show disk usage over threshold
perl -e 'for (`df -h`) {print if /(\d+)%/ && $1 > 80}'
# Parse system logs
perl -ne 'print if /ERROR|WARNING/' /var/log/syslog
# Count log entries by hour
perl -ne 'if (/(\d{2}):\d{2}:\d{2}/) {$hours{$1}++} END {print "$_: $hours{$_}\n" for sort keys %hours}' /var/log/messages
```
### Network Operations
```bash
# Ping sweep
perl -e 'for (1..254) {system("ping -c 1 -W 1 192.168.1.$_ &")}'
# Extract MAC addresses
perl -ne 'print "$1\n" while /((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})/g' file.txt
# Parse Apache logs
perl -ne '/^(\S+).*\[([^\]]+)\]\s+"([^"]+)"\s+(\d+)/ && print "$1 $2 $3 $4\n"' access.log
# Count 404 errors
perl -ne '$c++ if / 404 /; END {print "$c\n"}' access.log
# Top IP addresses in log
perl -ne '/^(\S+)/ && $ip{$1}++; END {print "$_: $ip{$_}\n" for (sort {$ip{$b} <=> $ip{$a}} keys %ip)[0..9]}' access.log
# Monitor bandwidth (rough)
perl -e 'while (1) {$rx1 = `cat /sys/class/net/eth0/statistics/rx_bytes`; sleep 1; $rx2 = `cat /sys/class/net/eth0/statistics/rx_bytes`; printf "%.2f KB/s\n", ($rx2-$rx1)/1024}'
# Parse netstat output
perl -ne 'print "$1:$2\n" if /tcp\s+\d+\s+\d+\s+([\d.]+):(\d+)/' < <(netstat -an)
# DNS lookup
perl -MSocket -e 'print inet_ntoa(inet_aton($ARGV[0]))' hostname
```
## Data Processing
### JSON Operations
```bash
# Pretty print JSON
perl -MJSON::PP -e 'print JSON::PP->new->pretty->encode(decode_json(<>))' file.json
# Extract field from JSON
perl -MJSON::PP -ne 'print decode_json($_)->{field}, "\n"' file.json
# Convert JSON to CSV
perl -MJSON::PP -ne '$j = decode_json($_); print join(",", @{$j}{qw(field1 field2 field3)}), "\n"' file.json
# Filter JSON array
perl -MJSON::PP -e '$j = decode_json(join "", <>); print encode_json([grep {$_->{age} > 18} @$j])' file.json
# Merge JSON files
perl -MJSON::PP -e '@all = map {decode_json(do {local $/; open F, $_; <F>})} @ARGV; print encode_json(\@all)' *.json
```
### XML Processing
```bash
# Extract XML tags
perl -ne 'print "$1\n" while /<tag>([^<]+)<\/tag>/g' file.xml
# Convert XML to text
perl -pe 's/<[^>]*>//g' file.xml
# Pretty print XML
perl -MXML::Tidy -e 'my $tidy = XML::Tidy->new(filename => $ARGV[0]); $tidy->tidy(); print $tidy->toString()' file.xml
# Extract attributes
perl -ne 'print "$1\n" while /attribute="([^"]+)"/g' file.xml
```
### Database Operations
```bash
# Quick SQLite query
perl -MDBI -e '$dbh = DBI->connect("dbi:SQLite:dbname=test.db"); print "@$_\n" for @{$dbh->selectall_arrayref("SELECT * FROM users")}'
# Export to CSV
perl -MDBI -e '$dbh = DBI->connect("dbi:SQLite:dbname=test.db"); $sth = $dbh->prepare("SELECT * FROM users"); $sth->execute(); while ($row = $sth->fetchrow_arrayref) {print join(",", @$row), "\n"}'
# Import from CSV
perl -MDBI -MText::CSV -e '$csv = Text::CSV->new(); $dbh = DBI->connect("dbi:SQLite:dbname=test.db"); while (<>) {$csv->parse($_); $dbh->do("INSERT INTO users VALUES (?, ?, ?)", undef, $csv->fields)}'
```
## Date and Time
```bash
# Current timestamp
perl -e 'print time, "\n"'
# Current date
perl -e 'print scalar localtime, "\n"'
# Format date
perl -MPOSIX -e 'print strftime("%Y-%m-%d %H:%M:%S\n", localtime)'
# Days between dates
perl -MTime::Piece -e '$t1 = Time::Piece->strptime("2024-01-01", "%Y-%m-%d"); $t2 = Time::Piece->strptime("2024-12-31", "%Y-%m-%d"); print int(($t2 - $t1) / 86400), "\n"'
# Convert epoch to date
perl -e 'print scalar localtime(1704067200), "\n"'
# Add days to date
perl -MTime::Piece -MTime::Seconds -e '$t = localtime; $t += ONE_DAY * 7; print $t->strftime("%Y-%m-%d\n")'
# File age in days
perl -e 'print int(-M $ARGV[0]), "\n"' file.txt
# Show files modified today
perl -e 'print "$_\n" for grep {int(-M $_) == 0} glob("*")'
```
## Mathematical Operations
```bash
# Calculator
perl -e 'print eval($ARGV[0]), "\n"' "2 + 2 * 3"
# Sum numbers from stdin
perl -ne '$sum += $_; END {print "$sum\n"}'
# Generate random numbers
perl -e 'print int(rand(100)), "\n" for 1..10'
# Factorial
perl -e '$f = 1; $f *= $_ for 1..$ARGV[0]; print "$f\n"' 5
# Prime numbers
perl -e 'for $n (2..100) {$p = 1; $n % $_ or $p = 0 for 2..sqrt($n); print "$n " if $p}'
# Fibonacci sequence
perl -e '$a = $b = 1; print "$a "; for (1..20) {print "$b "; ($a, $b) = ($b, $a + $b)}'
# Convert hex to decimal
perl -e 'print hex($ARGV[0]), "\n"' FF
# Convert decimal to hex
perl -e 'printf "%X\n", $ARGV[0]' 255
# Convert binary to decimal
perl -e 'print oct("0b" . $ARGV[0]), "\n"' 11111111
# Statistics
perl -ne 'push @v, $_; END {$sum += $_ for @v; $avg = $sum/@v; $var += ($_-$avg)**2 for @v; print "Mean: $avg, StdDev: ", sqrt($var/@v), "\n"}' numbers.txt
```
## Encoding and Conversion
```bash
# Convert to uppercase
perl -pe '$_ = uc'
# Convert to lowercase
perl -pe '$_ = lc'
# Title case
perl -pe 's/\b(\w)/\u$1/g'
# ROT13 encoding
perl -pe 'tr/A-Za-z/N-ZA-Mn-za-m/'
# Convert ASCII to hex
perl -ne 'print unpack("H*", $_), "\n"'
# Convert hex to ASCII
perl -ne 'print pack("H*", $_)'
# Unicode operations
perl -CS -pe 's/\N{U+00E9}/e/g' # Replace é with e
# Convert encoding
perl -MEncode -pe '$_ = encode("UTF-8", decode("ISO-8859-1", $_))'
```
## Quick Utilities
```bash
# Generate UUID
perl -MData::UUID -e 'print Data::UUID->new->create_str, "\n"'
# Password generator
perl -e 'print map {("a".."z","A".."Z",0..9)[rand 62]} 1..16; print "\n"'
# Check if port is open
perl -MIO::Socket::INET -e 'print IO::Socket::INET->new("$ARGV[0]:$ARGV[1]") ? "Open\n" : "Closed\n"' hostname 80
# Simple HTTP GET
perl -MLWP::Simple -e 'print get($ARGV[0])' 'http://example.com'
# Send email
perl -MMIME::Lite -e 'MIME::Lite->new(From => "sender@example.com", To => "recipient@example.com", Subject => "Test", Data => "Hello")->send'
# Benchmark code
perl -MBenchmark -e 'timethis(1000000, sub {$x = 1 + 1})'
# Create tar archive
perl -MArchive::Tar -e 'Archive::Tar->create_archive("archive.tar.gz", COMPRESS_GZIP, glob("*.txt"))'
# Weather (requires Internet)
perl -MLWP::Simple -e 'print +(get("http://wttr.in/?format=3") =~ /: (.+)/)[0], "\n"'
# QR code generator
perl -MImager::QRCode -e 'Imager::QRCode->new->plot($ARGV[0])->write(file => "qr.png")' "Hello World"
# Simple web server
perl -MIO::All -e 'io(":8080")->fork->accept->(sub {$_[0] < io("index.html")})'
# Watch file for changes
perl -e 'while (1) {$new = -M $ARGV[0]; print "Changed\n" if $new != $old; $old = $new; sleep 1}' file.txt
# Bulk email validator
perl -MEmail::Valid -ne 'chomp; print "$_\n" if Email::Valid->address($_)' emails.txt
# Simple port scanner
perl -MIO::Socket::INET -e 'for (1..1024) {print "$_\n" if IO::Socket::INET->new("$ARGV[0]:$_")}' hostname
# Directory tree
perl -MFile::Find -e 'find(sub {$d = $File::Find::dir =~ tr!/!!; print " " x $d, "$_\n"}, ".")'
```
## Performance One-Liners
```bash
# Profile code execution
perl -d:NYTProf -e 'your_code_here'
# Memory usage
perl -e 'system("ps -o pid,vsz,rss,comm -p $$")'
# Time execution
perl -MTime::HiRes=time -e '$t = time; system($ARGV[0]); printf "%.3f seconds\n", time - $t' "command"
# Count module load time
perl -MTime::HiRes=time -e '$t = time; require Some::Module; printf "%.3f seconds\n", time - $t'
```
## Tips for Writing One-Liners
1. **Essential Options**
- `-e`: Execute code
- `-n`: Loop over input lines (while (<>) {...})
- `-p`: Loop and print (while (<>) {...; print})
- `-i`: In-place editing
- `-l`: Auto chomp and add newlines
- `-a`: Auto-split into @F
- `-F`: Set field separator for -a
2. **Special Variables**
- `$_`: Current line/topic
- `$.`: Line number
- `@F`: Fields (with -a)
- `$/`: Input record separator
- `$\`: Output record separator
3. **Module Loading**
- `-M`: Load module (-MJSON::PP)
- `-m`: Load module without importing
4. **Common Patterns**
- `BEGIN {}`: Run before loop
- `END {}`: Run after loop
- `next if`: Skip lines
- `last if`: Stop processing
Remember: One-liners are powerful but can become unreadable. If a one-liner grows too complex, consider writing a proper script. The goal is efficiency, not obscurity.
*"The best one-liner is the one that gets the job done and is still understandable tomorrow."*

View File

@@ -0,0 +1,623 @@
# Appendix B: Common Gotchas and Solutions
*"Experience is simply the name we give our mistakes, and Perl has taught us many valuable experiences."*
Every Perl programmer, from novice to expert, has fallen into these traps. This appendix catalogs the most common pitfalls and their solutions, saving you hours of debugging frustration.
## Context Confusion
### The Problem
Perl's scalar vs. list context is powerful but can be confusing:
```perl
# GOTCHA: Unexpected scalar context
my @array = (1, 2, 3, 4, 5);
my $count = @array; # $count is 5 (array in scalar context)
# But this might surprise you:
my @files = glob("*.txt");
if (@files) { # This checks if array is non-empty
print "Found files\n";
}
if (glob("*.txt")) { # GOTCHA: Only checks first file!
print "This might not do what you expect\n";
}
# SOLUTION: Force list context
if (() = glob("*.txt")) {
print "Found files\n";
}
```
### List vs. Array
```perl
# GOTCHA: List assignment in scalar context
my $x = (1, 2, 3); # $x is 3, not an array!
# SOLUTION: Use array reference
my $x = [1, 2, 3]; # $x is an arrayref
# GOTCHA: Returning lists
sub get_data {
return (1, 2, 3); # Returns list
}
my $data = get_data(); # $data is 3 (last element)
my @data = get_data(); # @data is (1, 2, 3)
# SOLUTION: Return reference for consistency
sub get_data_ref {
return [1, 2, 3];
}
my $data = get_data_ref(); # Always returns arrayref
```
## Reference Gotchas
### Accidental References
```perl
# GOTCHA: Creating a reference when you don't mean to
my @array = (1, 2, 3);
my $ref = @array; # $ref is 3, not a reference!
# SOLUTION: Use backslash
my $ref = \@array; # Now it's a reference
# GOTCHA: Symbolic references
my $var_name = "foo";
$$var_name = 42; # Creates $foo = 42 (dangerous!)
# SOLUTION: Use strict
use strict; # This will catch symbolic references
# GOTCHA: Circular references
my $node = { value => 1 };
$node->{next} = $node; # Memory leak!
# SOLUTION: Weaken references
use Scalar::Util 'weaken';
my $node = { value => 1 };
$node->{next} = $node;
weaken($node->{next});
```
### Reference vs. Copy
```perl
# GOTCHA: Modifying shared references
my @original = (1, 2, 3);
my $ref1 = \@original;
my $ref2 = $ref1;
push @$ref2, 4; # Modifies @original!
# SOLUTION: Deep copy
use Storable qw(dclone);
my $ref2 = dclone($ref1);
# GOTCHA: Shallow copy of nested structures
my $original = { a => [1, 2, 3] };
my $copy = { %$original };
push @{$copy->{a}}, 4; # Modifies $original->{a}!
# SOLUTION: Deep copy for nested structures
my $copy = dclone($original);
```
## String and Number Confusion
### Numeric vs. String Comparison
```perl
# GOTCHA: Wrong comparison operator
my $x = "10";
my $y = "9";
if ($x > $y) {
print "10 > 9 (numeric)\n"; # Correct
}
if ($x gt $y) {
print "10 gt 9 (string)\n"; # Wrong! "10" lt "9" as strings
}
# GOTCHA: Sorting numbers as strings
my @numbers = (1, 2, 10, 20, 3);
my @wrong = sort @numbers; # (1, 10, 2, 20, 3)
my @right = sort { $a <=> $b } @numbers; # (1, 2, 3, 10, 20)
# GOTCHA: String increment
my $ver = "1.9";
$ver++; # Becomes 2, not "1.10" or "2.0"!
# SOLUTION: Version objects
use version;
my $ver = version->new("1.9");
$ver++; # Properly increments version
```
### Unexpected String Conversion
```perl
# GOTCHA: Leading zeros
my $num = 0755; # Octal! Value is 493
my $str = "0755";
my $val = $str + 0; # 755, not 493
# SOLUTION: Explicit conversion
my $octal = oct("0755"); # 493
my $decimal = "755" + 0; # 755
# GOTCHA: Floating point comparison
my $x = 0.1 + 0.2;
if ($x == 0.3) { # May fail due to floating point!
print "Equal\n";
}
# SOLUTION: Use tolerance
use constant EPSILON => 1e-10;
if (abs($x - 0.3) < EPSILON) {
print "Equal within tolerance\n";
}
```
## Regular Expression Pitfalls
### Greedy vs. Non-Greedy
```perl
# GOTCHA: Greedy matching
my $html = '<div>Content</div><div>More</div>';
$html =~ /<div>(.+)<\/div>/;
# $1 is "Content</div><div>More", not "Content"!
# SOLUTION: Non-greedy quantifier
$html =~ /<div>(.+?)<\/div>/;
# $1 is "Content"
# GOTCHA: Dot doesn't match newline
my $text = "Line 1\nLine 2";
$text =~ /Line.+2/; # Doesn't match!
# SOLUTION: /s modifier
$text =~ /Line.+2/s; # Now it matches
```
### Capture Variables
```perl
# GOTCHA: $1 persists after failed match
"foo" =~ /(\w+)/; # $1 is "foo"
"123" =~ /([a-z]+)/; # Match fails, but $1 is still "foo"!
# SOLUTION: Check match success
if ("123" =~ /([a-z]+)/) {
print $1;
} else {
print "No match\n";
}
# GOTCHA: Nested captures
"abcd" =~ /((a)(b))/;
# $1 is "ab", $2 is "a", $3 is "b"
# SOLUTION: Use named captures for clarity
"abcd" =~ /(?<group>(?<first>a)(?<second>b))/;
# $+{group}, $+{first}, $+{second}
```
### Special Characters
```perl
# GOTCHA: Unescaped metacharacters
my $price = '$19.99';
$price =~ /\$19.99/; # Need to escape $
$price =~ /\$19\.99/; # And the dot!
# SOLUTION: quotemeta or \Q...\E
$price =~ /\Q$19.99\E/;
# GOTCHA: Variable interpolation in regex
my $pattern = "a+b";
"aaaaab" =~ /$pattern/; # Looks for "a+b" literally!
# SOLUTION: Pre-compile or use qr//
my $pattern = qr/a+b/;
"aaaaab" =~ /$pattern/; # Now it works
```
## Scope and Variable Issues
### my vs. our vs. local
```perl
# GOTCHA: my in false conditional
if (0) {
my $x = 42;
}
print $x; # Error: $x not declared (good!)
# But this is tricky:
my $x = 10 if 0; # $x is declared but undefined!
print $x; # Prints nothing, not an error
# SOLUTION: Never use my with statement modifiers
my $x;
$x = 10 if $condition;
# GOTCHA: local doesn't create new variable
our $global = "global";
{
local $global = "local"; # Temporarily changes $global
print $global; # "local"
}
print $global; # Back to "global"
# GOTCHA: Closures and loops
my @subs;
for my $i (0..2) {
push @subs, sub { print $i };
}
$_->() for @subs; # Prints "222", not "012"!
# SOLUTION: Create new lexical
for my $i (0..2) {
my $j = $i;
push @subs, sub { print $j };
}
```
### Package Variables
```perl
# GOTCHA: Forgetting to declare package variables
package Foo;
$variable = 42; # Creates $Foo::variable
package Bar;
print $variable; # Undefined! Looking for $Bar::variable
# SOLUTION: Use our or fully qualify
package Foo;
our $variable = 42;
package Bar;
print $Foo::variable;
# GOTCHA: Package affects lexicals
package Foo;
my $x = 42;
package Bar;
print $x; # Still 42! my is lexical, not package-scoped
```
## File Handle Problems
### File Handle Scope
```perl
# GOTCHA: Global filehandles
open FILE, "data.txt";
# FILE is global, can conflict!
# SOLUTION: Lexical filehandles
open my $fh, '<', "data.txt" or die $!;
# GOTCHA: Not checking open success
open my $fh, '<', "nonexistent.txt";
while (<$fh>) { # Silently does nothing!
print;
}
# SOLUTION: Always check
open my $fh, '<', "file.txt" or die "Cannot open: $!";
# GOTCHA: Filehandle in variable
my $fh = "STDOUT";
print $fh "Hello"; # Doesn't work!
# SOLUTION: Use glob or reference
my $fh = *STDOUT;
print $fh "Hello";
```
### Buffering Issues
```perl
# GOTCHA: Output buffering
print "Processing...";
sleep 5;
print " Done!\n"; # "Processing..." doesn't appear for 5 seconds!
# SOLUTION: Disable buffering
$| = 1; # Or use autoflush
# GOTCHA: Reading and writing same file
open my $fh, '+<', "file.txt" or die $!;
my $line = <$fh>;
print $fh "New line\n"; # Where does this go?
# SOLUTION: Seek between read and write
seek($fh, 0, 0); # Go to beginning
```
## Loop Pitfalls
### Iterator Variables
```perl
# GOTCHA: Modifying foreach iterator
my @array = (1, 2, 3);
for my $elem (@array) {
$elem *= 2; # This modifies @array!
}
# @array is now (2, 4, 6)
# SOLUTION: Work with copy if needed
for my $elem (@array) {
my $double = $elem * 2;
# Use $double, don't modify $elem
}
# GOTCHA: $_ is aliased
my @array = (1, 2, 3);
for (@array) {
$_ = "x"; # Changes array elements!
}
# @array is now ("x", "x", "x")
# GOTCHA: Reusing iterator variable
for my $i (1..3) {
for my $i (1..3) { # Shadows outer $i
print "$i ";
}
print "$i\n"; # Always prints 3
}
```
### Loop Control
```perl
# GOTCHA: next/last in do-while
my $i = 0;
do {
next if $i == 5; # Doesn't work as expected!
print "$i ";
$i++;
} while ($i < 10);
# SOLUTION: Use while or for
while ($i < 10) {
$i++;
next if $i == 5;
print "$i ";
}
# GOTCHA: Loop label scope
OUTER: for my $i (1..3) {
for my $j (1..3) {
last OUTER if $i * $j > 4; # Exits both loops
}
}
```
## Hash Surprises
### Key Stringification
```perl
# GOTCHA: Numeric keys become strings
my %hash;
$hash{01} = "a"; # Key is "1"
$hash{1.0} = "b"; # Also key "1"
$hash{"1"} = "c"; # Still key "1"
# Only one key exists!
# GOTCHA: Reference as key
my $ref = [1, 2, 3];
my %hash = ($ref => "value"); # Key is "ARRAY(0x...)"
# SOLUTION: Use Tie::RefHash or stringify manually
use Tie::RefHash;
tie my %hash, 'Tie::RefHash';
$hash{$ref} = "value";
# GOTCHA: undef as key
my %hash = (undef, "value"); # Key is empty string ""
$hash{""} = "other"; # Overwrites the previous value
```
### List Assignment
```perl
# GOTCHA: Odd number of elements
my %hash = (1, 2, 3); # Warning: Odd number of elements
# %hash is (1 => 2, 3 => undef)
# GOTCHA: Hash slice assignment
my %hash;
@hash{'a', 'b'} = (1); # Only 'a' gets value!
# %hash is (a => 1, b => undef)
# SOLUTION: Provide all values
@hash{'a', 'b'} = (1, 2);
```
## Operator Precedence
### Common Precedence Mistakes
```perl
# GOTCHA: || vs or
open my $fh, '<', 'file.txt' or die "Error: $!"; # Correct
open my $fh, '<', 'file.txt' || die "Error: $!"; # Wrong!
# Parses as: open my $fh, '<', ('file.txt' || die "Error: $!")
# GOTCHA: Arrow operator precedence
my $x = $hash->{key} || 'default'; # OK
my $x = $hash->{key} or 'default'; # Wrong precedence!
# GOTCHA: Ternary operator
my $x = $cond ? $a = 1 : $b = 2; # Confusing!
# Better:
my $x = $cond ? ($a = 1) : ($b = 2);
# GOTCHA: String concatenation
print "Value: " . $x + 1; # Wrong! Tries numeric addition
print "Value: " . ($x + 1); # Correct
```
## Special Variable Gotchas
### $_ Problems
```perl
# GOTCHA: Unexpected $_ modification
for (1..3) {
do_something(); # Might change $_!
print $_; # Not what you expect
}
# SOLUTION: Localize $_
for (1..3) {
local $_ = $_;
do_something();
print $_;
}
# GOTCHA: map/grep modify $_
my @data = (1, 2, 3);
map { $_ *= 2 } @data; # Modifies @data!
# SOLUTION: Return new values
my @doubled = map { $_ * 2 } @data;
```
### @_ Handling
```perl
# GOTCHA: @_ is aliased
sub modify {
$_[0] = "changed";
}
my $x = "original";
modify($x); # $x is now "changed"!
# SOLUTION: Copy parameters
sub safe_modify {
my ($param) = @_;
$param = "changed"; # Doesn't affect original
}
# GOTCHA: shift in nested subs
sub outer {
my $arg = shift;
my $sub = sub {
my $inner = shift; # Shifts from inner @_, not outer!
};
}
```
## Module and Package Issues
### use vs. require
```perl
# GOTCHA: require doesn't import
require Some::Module;
Some::Module::function(); # Must use full name
# vs.
use Some::Module;
function(); # Imported (if module exports it)
# GOTCHA: Runtime vs. compile time
if ($condition) {
use Some::Module; # Always executed at compile time!
}
# SOLUTION: Use require for conditional loading
if ($condition) {
require Some::Module;
Some::Module->import();
}
# GOTCHA: Version checking
use Some::Module 1.23; # Compile-time version check
require Some::Module;
Some::Module->VERSION(1.23); # Runtime version check
```
## Performance Gotchas
### Unexpected Slowness
```perl
# GOTCHA: Repeated regex compilation
for my $item (@items) {
if ($item =~ /$pattern/) { # Recompiles each time if $pattern changes
# ...
}
}
# SOLUTION: Compile once
my $re = qr/$pattern/;
for my $item (@items) {
if ($item =~ /$re/) {
# ...
}
}
# GOTCHA: Slurping huge files
my $content = do { local $/; <$fh> }; # Loads entire file into memory
# SOLUTION: Process line by line
while (my $line = <$fh>) {
process($line);
}
# GOTCHA: Unnecessary copying
sub process_array {
my @array = @_; # Copies entire array!
# ...
}
# SOLUTION: Use references
sub process_array {
my $array_ref = shift;
# Use @$array_ref
}
```
## Unicode and Encoding
### UTF-8 Issues
```perl
# GOTCHA: Forgetting to decode input
open my $fh, '<', 'utf8_file.txt';
my $line = <$fh>; # Bytes, not characters!
# SOLUTION: Specify encoding
open my $fh, '<:encoding(UTF-8)', 'utf8_file.txt';
# GOTCHA: Double encoding
use utf8; # Source code is UTF-8
my $str = "café";
print encode_utf8($str); # Double encodes if output is UTF-8!
# GOTCHA: Length of UTF-8 strings
my $str = "café";
print length($str); # Might be 4 or 5!
# SOLUTION: Decode first
use Encode;
my $decoded = decode_utf8($str);
print length($decoded); # Always 4
```
## Quick Reference: Solutions
| Problem | Solution |
|---------|----------|
| Wrong context | Force context with () or scalar() |
| Symbolic references | use strict 'refs' |
| Circular references | use Scalar::Util 'weaken' |
| String/number comparison | Use correct operators (== vs eq) |
| Greedy regex | Use non-greedy: +? *? |
| Failed match persists | Check match success |
| Global filehandles | Use lexical: open my $fh |
| Buffering delays | Set $\| = 1 |
| Iterator modification | Use copies or indices |
| Hash key stringification | Be aware of automatic conversion |
| Precedence errors | Use parentheses liberally |
| $_ clobbering | Localize with local |
| Module import issues | Understand use vs. require |
| Performance problems | Profile, don't guess |
| Encoding errors | Explicitly specify encodings |
Remember: These gotchas exist not because Perl is flawed, but because it's flexible. Understanding them makes you a better Perl programmer and helps you write more robust, maintainable code.
*"The difference between a Perl novice and a Perl expert? The expert has made all these mistakes already."*

View File

@@ -0,0 +1,441 @@
# Appendix C: Resources and Community
*"Perl's greatest strength isn't the language itself—it's the vibrant, welcoming community that has grown around it for over three decades."*
## Official Resources
### Core Documentation
- **perldoc.perl.org** - The official Perl documentation
- Complete reference for all built-in functions
- Tutorials from beginner to advanced
- FAQ sections covering common questions
- Best practices and style guides
- **perl.org** - The Perl Programming Language website
- Download Perl distributions
- News and announcements
- Links to community resources
- Getting started guides
- **metacpan.org** - The modern CPAN interface
- Search and browse CPAN modules
- Documentation for all CPAN distributions
- Author profiles and statistics
- Dependency information and reverse dependencies
### Essential Documentation Commands
```bash
# Built-in documentation
perldoc perl # Perl overview
perldoc perlintro # Introduction for beginners
perldoc perlreftut # References tutorial
perldoc perlretut # Regular expressions tutorial
perldoc perlobj # Object-oriented programming
perldoc perlmod # Modules and packages
perldoc perlsyn # Syntax reference
perldoc perldata # Data structures
perldoc perlop # Operators
perldoc perlfunc # Built-in functions
perldoc perlvar # Special variables
perldoc perlre # Regular expressions reference
perldoc perldebug # Debugging
perldoc perldiag # Diagnostic messages
# Module documentation
perldoc Module::Name # Documentation for installed module
perldoc -m Module::Name # Source code of module
perldoc -l Module::Name # Location of module file
# Function documentation
perldoc -f function_name # Specific function docs
perldoc -v '$variable' # Special variable docs
# FAQ sections
perldoc perlfaq # FAQ index
perldoc perlfaq1 # General questions
perldoc perlfaq2 # Obtaining and learning Perl
perldoc perlfaq3 # Programming tools
perldoc perlfaq4 # Data manipulation
perldoc perlfaq5 # Files and formats
perldoc perlfaq6 # Regular expressions
perldoc perlfaq7 # General Perl language
perldoc perlfaq8 # System interaction
perldoc perlfaq9 # Web, networking, IPC
```
## Books and Learning Materials
### Classic Books
- **"Programming Perl"** (4th Edition) by Tom Christiansen, brian d foy, Larry Wall, Jon Orwant
- The definitive reference, aka "The Camel Book"
- Comprehensive coverage of Perl 5.14
- Written by Perl's creator and core contributors
- **"Learning Perl"** (8th Edition) by Randal L. Schwartz, brian d foy, Tom Phoenix
- The best introduction for beginners, aka "The Llama Book"
- Exercises with solutions
- Gradually builds from basics to intermediate
- **"Intermediate Perl"** (2nd Edition) by Randal L. Schwartz, brian d foy, Tom Phoenix
- References, complex data structures, OOP
- Packages and modules
- Testing and debugging
- **"Mastering Perl"** (2nd Edition) by brian d foy
- Advanced techniques
- Profiling and benchmarking
- Configuration and logging
- **"Effective Perl Programming"** (2nd Edition) by Joseph N. Hall, Joshua McAdams, brian d foy
- 120 ways to write better Perl
- Best practices and idioms
- Performance optimization
### Modern Perl Resources
- **"Modern Perl"** by chromatic
- Free online at modernperlbooks.com
- Contemporary Perl best practices
- Updated regularly
- **"Higher-Order Perl"** by Mark Jason Dominus
- Functional programming in Perl
- Advanced techniques
- Free online at hop.perl.plover.com
- **"Perl Best Practices"** by Damian Conway
- 256 guidelines for writing better Perl
- Coding standards and conventions
- Automation and tools
### Specialized Topics
- **"Regular Expression Pocket Reference"** by Tony Stubblebine
- Quick reference for regex
- Cross-language comparison
- **"Network Programming with Perl"** by Lincoln Stein
- Sockets and networking
- Client-server programming
- Internet protocols
- **"Perl Testing: A Developer's Notebook"** by Ian Langworth, chromatic
- Comprehensive testing strategies
- Test-driven development
- **"Automating System Administration with Perl"** (2nd Edition) by David N. Blank-Edelman
- Real-world sysadmin tasks
- Cross-platform solutions
## Online Communities
### Discussion Forums
- **PerlMonks (perlmonks.org)**
- Q&A site for Perl programmers
- Tutorials and code reviews
- Meditations on Perl philosophy
- Active since 1999
- **Stack Overflow (stackoverflow.com/questions/tagged/perl)**
- Large collection of Perl Q&A
- Quick responses
- Searchable archive
- **Reddit (reddit.com/r/perl)**
- News and discussions
- Weekly threads for beginners
- Job postings
### Mailing Lists
- **beginners@perl.org**
- Newbie-friendly environment
- Patient, helpful responses
- No question too basic
- **perl5-porters@perl.org**
- Perl 5 development discussion
- Core development
- Bug reports and patches
- **module-authors@perl.org**
- CPAN module development
- Best practices for modules
- Distribution management
### IRC Channels (irc.perl.org)
- **#perl** - General Perl discussion
- **#perl-help** - Help for beginners
- **#moose** - Moose OOP system
- **#catalyst** - Catalyst web framework
- **#mojolicious** - Mojolicious web framework
- **#dancer** - Dancer web framework
- **#dbix-class** - DBIx::Class ORM
### Social Media
- **Twitter/X**
- @PerlFoundation
- @PerlWeekly
- @metacpan
- #perl hashtag
- **Mastodon**
- @perl@fosstodon.org
- Various Perl developers
- **Discord**
- The Perl Discord Server
- Real-time chat and help
## Conferences and Events
### Major Conferences
- **The Perl and Raku Conference (TPRC)**
- Annual North American conference
- Formerly YAPC::NA
- Talks, tutorials, and hackathons
- **FOSDEM Perl Dev Room**
- Annual gathering in Brussels
- Free and open source focus
- Perl and Raku tracks
- **German Perl Workshop**
- Annual European conference
- International attendance
- Talks in English and German
- **London Perl Workshop**
- Free one-day conference
- Beginner to advanced talks
- Social events
### Local Perl Mongers Groups
Find your local group at **pm.org**
- Regular meetups
- Technical talks
- Social events
- Hackathons
Popular groups:
- London.pm
- NYC.pm
- Chicago.pm
- Toronto.pm
- Amsterdam.pm
- Tokyo.pm
- Sydney.pm
## Development Tools and Services
### Online REPLs and Playgrounds
- **3v4l.org** - Online Perl executor
- **onecompiler.com/perl** - Online Perl compiler
- **tio.run/#perl5** - Try It Online
- **replit.com** - Online IDE with Perl support
- **perlbanjo.com** - Interactive Perl tutorial
### CI/CD Services with Perl Support
- **GitHub Actions**
- Native Perl support
- CPAN GitHub Action
- Matrix testing across versions
- **Travis CI**
- Multiple Perl versions
- CPAN dependency installation
- Coverage reporting
- **CircleCI**
- Docker images with Perl
- Custom Perl configurations
- **GitLab CI**
- Built-in CI/CD
- Perl Docker images
### Code Quality Tools
- **Perl::Critic** - Code review tool
- **Perl::Tidy** - Code formatter
- **Devel::Cover** - Code coverage
- **Test::Pod** - POD documentation testing
- **Test::Pod::Coverage** - POD completeness
## CPAN and Module Resources
### Finding Modules
- **MetaCPAN (metacpan.org)**
- Primary CPAN search interface
- Reviews and ratings
- Dependency graphs
- Change logs and issues
- **CPAN Testers (cpantesters.org)**
- Test results across platforms
- Version compatibility matrix
- Smoke testing reports
- **PrePAN (prepan.org)**
- Preview and discuss modules before release
- Get feedback on module ideas
- Naming suggestions
### Creating and Maintaining Modules
- **pause.perl.org** - Upload modules to CPAN
- **Dist::Zilla** - Distribution builder
- **Module::Build** - Build and install modules
- **ExtUtils::MakeMaker** - Traditional build system
- **Minilla** - Minimal authoring tool
- **App::ModuleBuildTiny** - Tiny module builder
### Quality Metrics
- **CPANTS (cpants.cpanauthors.org)**
- Kwalitee metrics
- Best practices compliance
- Distribution quality
- **CPAN Cover (cpancover.com)**
- Coverage reports for CPAN modules
- Test quality metrics
## Perl 7 and Beyond
### Future Development
- **Perl 7** - The next major version
- Modern defaults
- Backward compatibility
- Performance improvements
- **Cor** - New object system
- Built into core
- Modern OOP features
- Performance focused
- **Signatures** - Subroutine signatures
- Moving out of experimental
- Type constraints coming
### Staying Updated
- **Perl Weekly (perlweekly.com)**
- Weekly newsletter
- News, articles, and modules
- Job listings
- **Perl.com**
- Articles and tutorials
- Community news
- Best practices
- **blogs.perl.org**
- Community blogs
- Technical discussions
- Project updates
## Contributing to Perl
### Core Development
- **GitHub (github.com/Perl/perl5)**
- Source code repository
- Issue tracking
- Pull requests
- **Perl 5 Porters**
- Core development team
- Mailing list: perl5-porters@perl.org
- IRC: #p5p
### Documentation
- **GitHub (github.com/perl-doc/)**
- Documentation projects
- Translations
- Improvements welcome
### Testing
- **CPAN Testers**
- Test CPAN modules on your system
- Submit smoke test reports
- Help ensure quality
### Financial Support
- **The Perl Foundation (perlfoundation.org)**
- Grants for development
- Event sponsorship
- Infrastructure support
- **Enlightened Perl Organisation (enlightenedperl.org)**
- Sponsors Perl projects
- Supports infrastructure
## Learning Paths
### For Beginners
1. Start with "Learning Perl"
2. Practice with PerlMonks tutorials
3. Join beginners@perl.org
4. Work through Perl Maven tutorials
5. Build small projects
### For Intermediate Programmers
1. Read "Intermediate Perl"
2. Explore CPAN modules
3. Contribute to open source
4. Attend local Perl Mongers meetings
5. Write and publish a module
### For Advanced Developers
1. Study "Higher-Order Perl"
2. Contribute to Perl core
3. Mentor newcomers
4. Speak at conferences
5. Write about Perl
## Commercial Support
### Companies Offering Perl Support
- **ActiveState** - ActivePerl distribution
- **cPanel** - Major Perl contributor
- **Booking.com** - Sponsors Perl development
- **ZipRecruiter** - Perl infrastructure
- **FastMail** - Email services in Perl
- **DuckDuckGo** - Search engine using Perl
### Perl-Focused Consultancies
- **Shadowcat Systems** (UK)
- **Perl Training Australia**
- **Stonehenge Consulting** (USA)
- **Perl Services** (Germany)
## Useful One-Stop Resources
### Quick References
- **perldoc.perl.org/perlcheat** - Perl cheat sheet
- **learnxinyminutes.com/docs/perl** - Quick syntax overview
- **perlmaven.com** - Tutorials and articles
- **perl101.org** - Perl basics
- **learn.perl.org** - Interactive tutorials
### Problem Solving
- **Rosetta Code** - Perl solutions to common problems
- **Project Euler** - Mathematical problems in Perl
- **Advent of Code** - Annual coding challenges
- **PerlGolf** - Code golf in Perl
## The Perl Philosophy
Remember the Perl motto: **"There's more than one way to do it"** (TMTOWTDI, pronounced "Tim Toady").
But also remember: **"There's more than one way to do it, but sometimes consistency is not a bad thing either"** (TIMTOWTDIBSCINABTE, pronounced... well, we just say "Tim Toady Bicarbonate").
### Perl Community Values
- **Be nice** - The community is welcoming
- **Be helpful** - Share knowledge freely
- **Be creative** - Embrace different solutions
- **Be pragmatic** - Solve real problems
- **Have fun** - Enjoy programming
## Final Words
The Perl community is one of the most welcoming and helpful in all of programming. Whether you're debugging a regex at 3 AM, optimizing a database query, or building your first web application, you're never alone. The resources in this appendix are your gateway to decades of collective wisdom, millions of lines of tested code, and a community that genuinely wants to see you succeed.
Perl may be over 30 years old, but it continues to evolve, adapt, and thrive because of its community. Join us. Ask questions. Share your knowledge. Build something amazing.
*"The Perl community is the CPAN. The CPAN is the Perl community. Everything else is just syntax."*
Welcome to Perl. Welcome home.