Java Online Training | Parsing XML using Java DOM Parser



hi I'm Julie Johnson with Firebox

Training today I'm going to show you how

to parse an XML file using Java now

there are several ways of parsing an XML

file we have several different types of

parsers by the way a parser is just some

kind of program that can break up an XML

file into more meaningful pieces like

the element name and the attribute name

and text nodes and such so the first

thing that we're going to need is an XML

file to work with what I have open right

now is my Eclipse environment and I'm

going to first of all just create a new

Java project and I'm going to call this

Java DOM Parser the reason why I put DOM

in there is that it stands for Document Object

Model it's a type of parser that simply

parses the document and then stores an

in-memory tree representation of our XML

document and then from there we can

extract meaningful pieces of information

or even manipulate the structure of

that okay so you'll see here that I'm

using Java Standard Edition 1.7 and

this is under my

sandbox 2 directory or wherever you

decide to put your workspace that's

fine okay so I'll hit next and then

finish and then next I'm going to right

click on here and say new Java class ok

and I'm going to make the name of my

class be MyDomParser and I'd like to

have a main method in there which is the

main entry point into our Java program

ok and then the next thing I'm going to

need inside of my project is an XML file

that I can work with and so what I did

was just went into my directory system

here and I'm just going to grab an XML

file that I already had here let me just

grab this and I can just drag it right

into my project like that I'm going to

copy the file and so now I have this file

called people.xml

now if I want to take a look at the

structure I can right click and open

this up the best way to look at it is

with a web browser and this is what it

looks like
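For readers following along without the original file, here is a minimal sketch that writes a stand-in people.xml with the structure described in the video: a people root, two person elements with an ID attribute, and last name and first name children. The exact tag names (lastname, firstname), the lowercase spelling of the id attribute, and the sample names are assumptions made for illustration; the video does not spell them out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SamplePeopleFile {

    // Stand-in for the file shown in the video. The tag names <lastname> and
    // <firstname>, the lowercase "id" attribute, and the names themselves are
    // assumptions, not taken from the original file.
    public static final String SAMPLE_XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<people>\n"
        + "  <person id=\"1\">\n"
        + "    <lastname>Smith</lastname>\n"
        + "    <firstname>Ann</firstname>\n"
        + "  </person>\n"
        + "  <person id=\"2\">\n"
        + "    <lastname>Jones</lastname>\n"
        + "    <firstname>Bob</firstname>\n"
        + "  </person>\n"
        + "</people>\n";

    // Writes the sample document to the given file name and returns its path.
    public static Path write(String fileName) throws IOException {
        Path target = Paths.get(fileName);
        return Files.write(target, SAMPLE_XML.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Written relative to the working directory, which in Eclipse is
        // normally the project root.
        System.out.println("Wrote " + write("people.xml").toAbsolutePath());
    }
}
```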

so my root node is people or my root

element is people and that has two

child nodes each a person the person

elements each have an attribute called

ID so we have ID equals one ID equals

two and then we have some more child

nodes child elements last name and

first name and then these guys right in

here are what we call text nodes okay so

now the next thing I want to do is go

into my Java program and I want to have

some code in here that is going to help

me tap into this XML okay so the class

that I want to first of all use is what

we call a DocumentBuilderFactory it's

just a factory class that will then

allow us to create a DocumentBuilder

object so I'll start with DocumentBuilderFactory

and I'll just call this

variable factory and you'll see here

that I can just click right on here and

perform an import so this belongs to the

javax.xml.parsers package okay

so if I say DocumentBuilderFactory dot

you can see all the different methods

available newInstance is a static

method which returns a DocumentBuilderFactory

object so I can just double

click on here and it will work just fine

okay let me move this over just a little

bit here ok need a semicolon at this

point we're looking good okay now from

the factory let's see what methods are

available you'll see that there's a

method here called newDocumentBuilder

and it returns a DocumentBuilder object

right there that's the return type

DocumentBuilder

okay so I'll double click on that and

here on the left hand side we'll go like

that now this also needs to be imported

okay I also need to click on here and

surround this with a try-catch to handle

those checked exceptions so this can

possibly throw a ParserConfigurationException

exception okay so once we have our

builder object you'll see that we have

the parse method and it's

overloaded what I'm interested in is

this String arg

so this one argument is the name of the

file that we want to parse you'll see

that it returns a Document object ok so

right in here I need to put the name of

my file which is going to be people.xml

so this is relative to my

project now I said that it returns a

Document object so here I'll just say

Document doc equals and you'll see that

we need to import now we have two

different things we can import here what

we're interested in is org.w3c.dom

so this is the W3C's DOM interface for

the parsed document and then I also need to add a

catch clause to the surrounding try so

this can possibly throw a SAXException

or an IOException at this point what

I'd like to do is run my program and if

I have any problems we'll see some error

messages down in the console since I

have no errors we know we're good

so far if I made a typo let's say I

accidentally had the wrong name of

the file like I just did here when I run

it it will tell me what the problem is

ok so let me fix this again and run it
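The steps so far can be sketched roughly as follows. This is a minimal illustration rather than the exact code typed in the video: the class name ParseFileSketch and the helper load are hypothetical, and the error messages are made up. Note that the String overload of parse is documented as taking a URI; a relative file name such as people.xml is normally resolved against the working directory.

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseFileSketch {

    // Parses the XML at the given file name or URI.
    // Returns null if the parser cannot be created or the file cannot be parsed.
    public static Document load(String uri) {
        try {
            // newInstance() is the static factory method mentioned in the video.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Builds the in-memory tree and returns an org.w3c.dom.Document.
            return builder.parse(uri);
        } catch (ParserConfigurationException e) {
            System.err.println("Parser setup failed: " + e.getMessage());
        } catch (SAXException e) {
            System.err.println("XML could not be parsed: " + e.getMessage());
        } catch (IOException e) {
            // A wrong file name ends up here, typically as a FileNotFoundException.
            System.err.println("Could not read " + uri + ": " + e.getMessage());
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = load("people.xml");
        if (doc == null) {
            System.out.println("Parsing failed");
        } else {
            System.out.println("Root element: " + doc.getDocumentElement().getTagName());
        }
    }
}
```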

ok so now we have this variable doc well

what can we do with our document object

if I say doc dot

getElementsByTagName and so here we

can pass a String tag name what I'm

interested in are my person tags and

look at the return data type it's a NodeList

that it returns so the name of my

tag that I'm looking for is person so

it's going to go through the whole thing

here and look for any tag that says

person and by the way this is case

sensitive on the left hand side I'm

going to say that it returns a NodeList

I'll call this personList now let's

perform the import okay so once you have

a handle on your node list you can loop

through your node list so if I say

personList dot notice that we can

reference an item passing in the

position so the first one would be

subscript zero second one would be

subscript one and so on

the other thing that we have available

is getLength which tells us how many

elements are in that node list so what

we can do is a traditional for loop I

can say for int i equals zero i is less

than personList.getLength() i plus

plus so this will loop through for us

twice because we have two person tags
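A small sketch of this lookup and loop, assuming a document with two person elements. To keep the example self-contained it parses an inline string instead of people.xml; the class name PersonListSketch, the helper names, and the inline XML are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PersonListSketch {

    // Inline stand-in for people.xml (assumed content, two person elements).
    public static final String XML =
        "<people><person id=\"1\"/><person id=\"2\"/></people>";

    // Parses XML held in a string instead of a file.
    public static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Visits every element with the given tag name and returns how many were visited.
    public static int countByTag(Document doc, String tagName) {
        NodeList list = doc.getElementsByTagName(tagName);
        int visited = 0;
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i); // item(0) is the first match, item(1) the second
            System.out.println("Found <" + node.getNodeName() + "> at index " + i);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(countByTag(doc, "person")); // two person tags
        System.out.println(countByTag(doc, "Person")); // no match: tag names are case sensitive
    }
}
```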

okay so now what do we do well if I say

personList.item and I pass in the

index you'll see that it returns a Node

I'm going to pass in i okay actually

what I'm going to do is make this a

little more generic and then just check

the node type that's the better way of

doing this so here let me just

say p equals okay so what data type is p

it's a Node

and make sure I import org.w3c.dom okay

now I'm going to have an if statement

I'm going to say well if p.getNodeType()

equals Node dot and look at

all these built-in constants here one of

them is ELEMENT_NODE so there's

different types of nodes we have like

element nodes text nodes comment nodes

and so on so we just want to make sure

that we're dealing with a person tag

which is an element node then we're

going to downcast it so here I'm going

to say Element person equals p so a

couple things we need to import Element

once again from org.w3c.dom and then the

other thing we need to do is add a cast

like this okay so once i have a handle

on that what can i do there well let's

take a look here we can grab the ID

an ID is an attribute so if I want to

grab the value of the ID attribute

you'll see here we have getAttribute

it takes as its argument the attribute

name and it returns the corresponding

value okay so we're looking for the ID

attribute and on the left hand side is

the ID okay so i've captured the ID in

here
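The node type check, the downcast, and the attribute lookup might look like the sketch below. The attribute name is assumed to be lowercase id, and the class name PersonIdSketch, the helper names, and the inline XML are hypothetical. One detail worth knowing: getAttribute returns an empty string, not null, when the attribute is missing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PersonIdSketch {

    // Inline stand-in for people.xml (assumed content).
    public static final String XML =
        "<people><person id=\"1\"/><person id=\"2\"/></people>";

    public static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Collects the "id" attribute of every person element, in document order.
    public static List<String> personIds(Document doc) {
        List<String> ids = new ArrayList<String>();
        NodeList personList = doc.getElementsByTagName("person");
        for (int i = 0; i < personList.getLength(); i++) {
            Node p = personList.item(i);
            // Only element nodes can be cast to Element and carry attributes.
            if (p.getNodeType() == Node.ELEMENT_NODE) {
                Element person = (Element) p;
                ids.add(person.getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(personIds(parse(XML)));
    }
}
```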

now let's capture the last name and

first name okay so once again I already

have a handle on that person node if I

say person.getChildNodes() it's

going to grab all the children now they

may or may not be all elements so once

again I'm going to look at the node type

to make sure this returns a NodeList

so here I'm going to say NodeList nameList like

this and I'm going to loop through it I

want to use a different counter than i

because i is already taken let's use j j

is less than nameList.getLength() j

plus plus okay so in here I'm going to

take nameList.item pass in j

remember that returns a Node let's just

call this n let's check the node type if

n.getNodeType() is equal to

Node.ELEMENT_NODE then we're going

to handle it so here

let's first of all cast it down so I'm

going to say Element name equals n and

I'm going to downcast it now let's do

this I'm going to print out I already

have access to the ID so I'm going to

say person and then ID and then a

literal colon and what else do we want

to put in here how about name dot get

node whoops sorry

name dot

getTagName and then we'll get the text

content do a literal equal sign name dot

getTextContent let's run this one more

time okay so far so good okay of course

we could do some cleanup and change this

to where the person information is just

printed out once for each person but you

get the idea here you can extract pieces

of information from here and if you

wanted to you can also use the API to

manipulate that node but that's in a

different tutorial some other time I

hope you got a lot out of this video

tutorial please visit our website at

www.att.com/biz
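For reference, the whole walkthrough can be assembled into one sketch along the following lines. It is an approximation of the program built in the video, not a copy of it: the XML is inlined so the example runs on its own, the tag names lastname and firstname, the lowercase id attribute, and the sample names are assumptions, and the class name MyDomParserSketch and the helper describePeople are hypothetical. The inner node type check matters because getChildNodes also returns the whitespace between tags as text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MyDomParserSketch {

    // Assumed stand-in for people.xml; indentation creates whitespace text nodes.
    public static final String SAMPLE_XML =
          "<people>\n"
        + "  <person id=\"1\">\n"
        + "    <lastname>Smith</lastname>\n"
        + "    <firstname>Ann</firstname>\n"
        + "  </person>\n"
        + "  <person id=\"2\">\n"
        + "    <lastname>Jones</lastname>\n"
        + "    <firstname>Bob</firstname>\n"
        + "  </person>\n"
        + "</people>\n";

    // Builds one line per child element of each person: "Person <id>: <tag>=<text>".
    public static List<String> describePeople(Document doc) {
        List<String> lines = new ArrayList<String>();
        NodeList personList = doc.getElementsByTagName("person");
        for (int i = 0; i < personList.getLength(); i++) {
            Node p = personList.item(i);
            if (p.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element person = (Element) p;
            String id = person.getAttribute("id");
            NodeList nameList = person.getChildNodes();
            for (int j = 0; j < nameList.getLength(); j++) {
                Node n = nameList.item(j);
                // Skip whitespace text nodes; handle only child elements.
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    Element name = (Element) n;
                    lines.add("Person " + id + ": " + name.getTagName()
                            + "=" + name.getTextContent());
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(SAMPLE_XML)));
        for (String line : describePeople(doc)) {
            System.out.println(line);
        }
    }
}
```

With the sample data above this prints four lines, starting with `Person 1: lastname=Smith`. To read the real file instead, replace the InputSource argument with the file name, as in builder.parse("people.xml").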