Forum Moderators: open

Message Too Old, No Replies

Time involved to import data into a database

its taking forever

         

Tourz

8:59 pm on Feb 11, 2008 (gmt 0)

10+ Year Member



Hi, I have some guys in India that are taking forever to import data into a database and I'm starting to wonder if there is a problem. I'm not familiar with the process...

The data set is very large (2.3 million world place names) so the tab-delimited source text file is 140MB. Very slow to open on my laptop! This 'two week project' has stretched out to a month and a half -- mostly because I wanted to prepopulate the database with this locations info.

He said they needed to break the data into smaller chunks first. Then a week ago he said that it would be all done by Saturday. Now it's Monday and he says that they have only imported 40MB so far, have 100MB to go. Fine, there is not a huge rush on this project but I am wondering if these guys are totally incompetent.

This is all just on their test server. Am I going to have to wait weeks for them to set it up on my server when it's ready? Comments welcome, thanks.

Tourz

1:33 am on Feb 12, 2008 (gmt 0)

10+ Year Member



Cool new quick reply feature! Maybe it will encourage someone to step up and tell me how long it should take to import 140MB of data.

Too many variables?

Frank_Rizzo

10:54 am on Feb 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You don't say which DB, O/S, hardware etc.

Something with around 2 million+ records / 140Mb could be done in minutes quite easily.

The problem could be if

a) the hardware they are using is not upto scratch (not enough ram, too much disk activity)
b) the data is badly presented and needs to be massaged first (different date formats half way through the list)

I get the impression they are not bothered about your request and are treating it as low priority or have lost the data.

If you still have the data do it yourself. Post up the specs of DB, O/S, hardware, and you'll get better assistance here.

ZydoSEO

2:04 pm on Feb 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree with Frank. No reason that should take long to load.

Tourz

5:45 pm on Feb 12, 2008 (gmt 0)

10+ Year Member



Thanks for the info. Not sure about the specs -- they are developing a website for me, this is just part of it. Now that I know a little bit more, I'm going to start quizzing them. They are probably using trash computers.... you get what you pay for I guess.

Tourz

5:53 pm on Feb 13, 2008 (gmt 0)

10+ Year Member



ok, here is the response

We are using Windows 2000 server with IIS and around 1 GB of Ram for server.
Sir people whom you have consulted might be using some other language C++ etc which is faster and handles much bigger files.
Php has many limitation with the size of file.
Once files are splitted (which we are doing) in smaller size data import won't take much time.
its basically splitting files into smaller sized files taking time.
sir we are not delaying this intentionally.

Does this make sense?

I'm running Apache, will this be a problem considering they are doing this on Windows 2000 server?

ZydoSEO

5:46 am on Feb 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not knowledgable w/ the ins and outs of PhP or MySQL for that matter... I have about 20yrs experience w/ Oracle and SQL*Server combined. And there are many ways to 'skin a cat', some faster and/or less messy than others. That was an interesting response.

It sounds to me like you're paying someone to move a pile of dirt from point a to point b... And when you ask why it is taking so long they respond with something like:

We are using a wheelbarrow with 5 cubic feet capacity. Sir people whom you have consulted might be using a shovel to load/unload the dirt into/from the wheelbarrow, respectively. Our tablespoon that we are using to load/unload the dirt into/from the wheelbarrow has limitation with the amount of dirt we can move in one scoop. Once we get the wheelbarrow loaded it won't take much time. It's basically using a spoon instead of a shovel that is taking time. Sir we are not delaying this intentionally.

ROFL... If what they are saying is true about PhP, it sounds to me like, as they practically admitted, they simply picked the wrong tool to load your data for which you are likely getting charged by the hour... If what they are saying about the limitations of PhP are NOT true, then it sounds like they know not what they speak of.

[edited by: ZydoSEO at 5:55 am (utc) on Feb. 14, 2008]

Tourz

6:06 pm on Feb 14, 2008 (gmt 0)

10+ Year Member



using a spoon instead of a shovel

Yes, that sounds accurate. I'm glad that I'm not paying them by the hour for this 'bonus work'.

Frank_Rizzo

9:48 pm on Feb 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If it is a delimited file they don't even need to use php.

It just requires a simple

load data local infile "yourdata.txt" into table ....

mysql would chew over that quicker than they could type their response email to you.

[dev.mysql.com...]

Tourz

10:29 pm on Feb 14, 2008 (gmt 0)

10+ Year Member



mysql would chew over that quicker than they could type their response email to you.

good tip, thanks, I have passed it on.

jtara

11:50 pm on Feb 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Should take 5-10 minutes, if all it is doing is importing place names to a simple table.

Assuming the data is in a well-know text format (comma-separated values, tab-separated values, etc.) it should be possible to handle it with your database manager's "import" facility, without having to write any code.

I load an MySQL database with 800,000 records (a dump from SQLite) in one table and 600,000 in another in < 5 minutes. The tables have maybe 20 fields each. The database occupies maybe 80MB.

Once files are splitted (which we are doing) in smaller size data import won't take much time.
its basically splitting files into smaller sized files taking time.

This makes absolutely no sense at all. There's no sense in splitting the source file(s). Loading the database is a sequential process. You read one line of data, you stick it into the database.

Even if they DID have to split the file into smaller ones, again, that's a few minutes work with the right tools. On a linux system it could be done with command-line tools in minutes.

Sir, the technical term for what you are experiencing is "a snow job".

BTW, if you are running on Windows, you might want to get the utility "V". It's a very fast file viewer that will view any kind of file (including binary). It's ideal for viewing large text dump files like this, if you need to get in and explore the data. Search for "V file viewer".

Tourz

12:10 am on Feb 15, 2008 (gmt 0)

10+ Year Member



cool, thanks

ZydoSEO

5:39 am on Feb 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No one found my spoon/shovel analogy humorous? Tough crowd! hehe

Tourz

6:43 pm on Feb 15, 2008 (gmt 0)

10+ Year Member



I told him it sounded like they were using a spoon instead of a shovel... in this context, it's not funny for me. Ha

Here is the latest response:

1.code will work in apache.
2.http://dev.mysql.com/doc/refman/5.0/en/load-data.html will not work in our case as its not just about importing data as it is.
3. we did search for softwares that could automate splitting files but could not find any which could split files by lines all we found could split files in sizes which surely won't work in this case.
4. feeding data in database was never part of this project we are doing this only to compensate the extra time this project took.
5. About "Snow Job" , sir we don't believe in giving such excuses. if we are not working on your project then we will surely say that we are not working on it and not give excuse that data import is taking time. Moreover how can you say that we are lying ? Are you here everyday with us checking everything that we are doing ?.
6. About the tools for managing text files on linux we do not have idea as we do not use linux OS. so can not comment on that.

we are towards completion.
We assure you that it will be complete this week itself.
Thanks for your cooperation.

OK, well I guess they just aren't that experienced (like me). I'll just have to wait and see. They've had a hard time with this from the get go, I've had to explain a few things about the data that should have been obvious.

Any pointers on the file splitting software?

[edited by: Tourz at 6:44 pm (utc) on Feb. 15, 2008]

jtara

7:32 pm on Feb 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think the real problem here may be that it seems you are offshoring both design and implementation.

You don't seem to know much if anything about the design yourself.

It sounds as if you've defined your widget in broad strokes, and expect them to come up with a widget.

If you are not a software developer yourself, you probably should hire one (separate from your offshore developers) to oversee the design. You need somebody with a higher-level viewpoint than a coder has.

It's really hard to tell if they are being unreasonable or not, since you've given us so little information.

Let's back up a bit. You've got a lit of 2.3 million world place names. What is it that you want to do with it?

We've all been assuming that you simply need the place names loaded into a table. From the response, there's more than that going on.

As far as splitting files, it is trivial. Using the Linux/Unix split command:

split -l 1000 myfile.txt

will split myfile.txt into files of 1000 lines called myfileaa, myfileab, myfileac, etc.

You can get implementations of split that will run on Windows, or there are numerous packages of Linux-like command-line utilities available for Windows. The most popular is Cygwin. While Cygwin is overkill for this, it's a handy tool to have if you are stuck in a Windows world - you get pretty much a complete set of Linux command-line tools, which are pretty handy if you are working with text files a lot.

And in any case, simply writing a command-line program to split a file into pieces is a one-hour job, tops, with bells and whistles.

But there's no need to re-invent this simple tool!

I still don't see the point of splitting the file. The names have to be fed to the database one at a time. No difference if you read 1000 lines each from 1000 files or 1,000,000 lines from one file. It will take the same amount of time.

Not being able to locate a utility to split a text file into chunks of size "n" shows an appalling lack of knowledge. Not about programming, mind you, but simple use of the Internet!

Given a search engine, anybody's 12 year old kid should be able to accomplish this in a few minutes.

Are you here everyday with us checking everything that we are doing ?.

And that, I'm afraid is the biggest problem with offshoring. You really need to start with a small project and assess capabilities and responsiveness before you dive into a big project. You need to establish a LOT of trust, and you can't really do so using some coders-for-hire website's rating system, as they are too easy to game. You need to do it individually, and incrementally.

Tourz

8:02 pm on Feb 15, 2008 (gmt 0)

10+ Year Member



Thanks for the detailed input jtara.

Actually the project is well defined: we are cloning an apartment rental directory.

I made the project a bit more complicated because I wanted to have the list of possible locations prepopulated with place names so I did not have to input every city etc. I want people to be able to select their location from a series of drop down menus.

The breadcrumb navigation trail on each page will reflect the locations heirarchy (ie continent, subregion, country, state, city)

I consider this to be a small test project. They are doing it for $400, plus a bonus for this extra work.

Your tip about hiring an expert here to oversee the oversea project is a good one, thanks. What would you charge for something like this?

jtara

9:19 pm on Feb 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What would you charge for something like this?

A heck of a lot more than $400! :)

And, yes, that's just for top-level design.

You gets what you pays for. $400 MIGHT get you a day's effort from a U.S. based software engineer.

Tourz

10:40 pm on Feb 15, 2008 (gmt 0)

10+ Year Member



$400 MIGHT get you a day's effort from a U.S. based software engineer.

Ha, ya, globalization is a wonderful thing.

Any comments on the project now that you know the scope of it?

Tourz

5:43 pm on Feb 16, 2008 (gmt 0)

10+ Year Member



hey, what do ya know! After a little applied heat these guys finally produced, the website is not completely finished but it's getting there.

Thanks for the tips guys, a little knowledge goes a long ways!